Lexical and Parser Tools


CSE 413, Autumn 2005, Programming Languages
http://www.cs.washington.edu/education/courses/413/05au/

References
» The Lex & Yacc Page (C): http://dinosaur.compilertools.net/
» lex & yacc, Levine, Mason & Brown: http://www.oreilly.com/catalog/lex/index.html
» GNU flex and bison (C): http://www.gnu.org/software/flex/manual/ and http://www.gnu.org/software/bison/manual/
» Modern Compiler Implementation in Java, Appel: http://www.cs.princeton.edu/~appel/modern/java/
» JLex, JFlex and CUP (Java): http://jflex.de/ and http://www2.cs.tum.edu/projects/cup/
» Engineering a Compiler, Cooper and Torczon: http://www.mkp.com/engineeringacompiler

Structure of a Compiler
First approximation:
» Front end (analysis): read the source program and understand its structure and meaning.
» Back end (synthesis): generate an equivalent target language program.
    Source -> Front End -> Back End -> Target

The front end is split into two parts:
» Scanner: responsible for converting the character stream to a token stream; also strips out white space and comments.
» Parser: reads the token stream and generates the IR.
    source -> Scanner -> tokens -> Parser -> IR

Both of these can be generated automatically or written by hand:
» the source language is specified by a formal grammar
» tools read the grammar and generate the scanner and parser (either table-driven or hard-coded)
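As a concrete illustration of this division of labor (this example is not in the original slides, and the token names are invented), consider the source fragment

    rate = rate + 42;   // update

The scanner discards the white space and the comment and hands the parser a token stream something like

    ID(rate)  ASSIGN  ID(rate)  PLUS  INT(42)  SEMI

and the parser then checks that this sequence fits the grammar and builds the corresponding IR, for example an assignment node whose right-hand side is an addition.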

Lex and Yacc
» Lex reads a Lex source file and generates a scanner (lexical analyzer); Yacc reads a Yacc source file and generates a parser.
» At run time the scanner turns the source into tokens, the parser turns the tokens into a parse tree and other data structures, and the rest of the application calls yylex() and yyparse() to drive the two.

Lex is a lexical analyzer generator
» Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream:
  editor-script type transformations, and segmenting input in preparation for a parsing routine, i.e., tokenizing.
» You define the scanner by providing the patterns to recognize and the actions to take.

Lex input
A Lex specification has four parts (a minimal skeleton is sketched below):
» Declarations - optional user-supplied header code
» Definitions - optional definitions to simplify the Rules section
» Rules - token definition patterns and associated actions in C
» User subroutines - optional helper functions

Lex output
Lex generates a scanner function written in C:
» the file is lex.yy.c
» the function is yylex()
» yylex() implements the DFA that corresponds to the regular expressions you supplied, using a transition table for the DFA and action code invoked at the accept states
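The slides do not show a skeleton at this point, so here is a minimal, self-contained specification as a sketch (the file name toks.l, the named patterns, and the messages printed are invented for illustration). It shows all four parts and the %{ %} and %% separators:

%{
/* Declarations: C header code copied verbatim into the generated lex.yy.c. */
#include <stdio.h>
%}

/* Definitions: options and named patterns usable in the rules as {digit}, {letter}. */
%option noyywrap
digit   [0-9]
letter  [a-zA-Z]

%%
{digit}+                    { printf("NUMBER(%s)\n", yytext); }
{letter}({letter}|{digit})* { printf("IDENT(%s)\n", yytext); }
[ \t\n]+                    { /* skip white space */ }
.                           { printf("OTHER(%s)\n", yytext); }
%%

/* User subroutines: a driver so the scanner builds stand-alone. */
int main(void) { yylex(); return 0; }

Assuming flex is installed, something like "flex toks.l && gcc -o toks lex.yy.c" should build it; %option noyywrap avoids the need to link the Lex support library with -ll.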

Lex rules
Lines in the rules section have the form

    expression    action

» expression is the pattern to recognize: regular expressions as defined in class, with extensions for convenience.
» action is the action to take when the pattern is recognized: arbitrary C code that becomes part of the generated DFA code.

Pattern match operators:

    x        the character "x"
    [xy]     any character selected from the list given (in this case, x or y)
    [x-z]    any character from the range given (in this case, x, y, or z)
    [^x]     any character but x, i.e., the complement of the character(s) specified
    .        any single character but newline
    x?       an optional x, i.e., 0 or 1 occurrences of x
    x*       0 or more instances of x
    x+       1 or more instances of x, equivalent to xx*
    x{m,n}   m to n occurrences of x
    ^x       an x at the beginning of a line
    x$       an x at the end of a line
    \x       an "x", even if x otherwise has some special meaning
    "xy"     the string "xy", even if xy would otherwise have some special meaning
    x|y      an x or a y
    x/y      an x, but only if followed by y
    {xx}     the translation of xx from the definitions section

Actions
When an expression is matched, Lex executes the corresponding action:
» the action is defined as one or more C statements
» if a section of the input is not matched by any pattern, the default action is taken, which consists of copying the input to the output

Context for the action code
There are several global variables defined at the point when the action code is called:
» yytext - null-terminated string containing the lexeme
» yyleng - the length of the yytext string
» yylval - structured variable holding attributes of the recognized token
» yylloc - structured variable holding location information about the recognized token
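As an illustration of a few of these operators in use (this example is not from the original lecture; the file name demo.l and the messages are invented), the rules below exercise a named definition, bounded repetition x{m,n}, trailing context x/y, quoted literals, and the yytext and yyleng variables. Input not matched by any rule is simply echoed, which is the default action described above:

%option noyywrap
word    [A-Za-z]+

%%
-?[0-9]{1,3}    { printf("small number: %s (yyleng=%d)\n", yytext, yyleng); }
{word}/":"      { printf("label: %s (a word, but only when followed by ':')\n", yytext); }
"//".*          { printf("comment: %s\n", yytext); }
\n              { }
%%

int main(void) { yylex(); return 0; }

Built with "flex demo.l && gcc -o demo lex.yy.c", the scanner reports small numbers, labels, and // comments, and copies everything else straight to the output.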

count.l - count chars, words, lines (declarations, rules, user code):

%{
    int numchar = 0, numword = 0, numline = 0;
%}
%%
\n          {numline++; numchar++;}
[^ \t\n]+   {numword++; numchar += yyleng;}
.           {numchar++;}
%%
int main() {
    yylex();
    printf("%d\t%d\t%d\n", numchar, numword, numline);
    return 0;
}

Create and run count (the -ll flag links the Lex support library, which supplies the default yywrap() here and a default main() for programs, like histo below, that do not define their own):

create the scanner source file lex.yy.c
    [finson@walnut cse413]$ flex count.l

build the executable program count
    [finson@walnut cse413]$ gcc -o count lex.yy.c -ll

run the program on the definition file count.l
    [finson@walnut cse413]$ ./count < count.l
    220     32      17

run the program on the generated source file lex.yy.c
    [finson@walnut cse413]$ ./count < lex.yy.c
    35824   5090    1509

histo.l - histogram of word lengths (the report is printed from yywrap(), which yylex() calls at end of input):

%{
    int lengs[100];
%}
%%
[a-zA-Z]+   lengs[yyleng]++;
.           ;
\n          ;
%%
int yywrap() {
    int i;
    printf("Length  No. words\n");
    for (i = 0; i < 100; i++)
        if (lengs[i] > 0)
            printf("%5d%10d\n", i, lengs[i]);
    return(1);
}

Create and run histo:

create the scanner source file lex.yy.c
    [finson@walnut cse413]$ flex histo.l

build the executable program histo
    [finson@walnut cse413]$ gcc -o histo lex.yy.c -ll

run the program on the definition file histo.l
    [finson@walnut cse413]$ ./histo < histo.l
    Length  No. words
        1        14
        2         3
        3         3
        5         5
        6         6

Yacc: Yet Another Compiler-Compiler
» Yacc provides a general tool for describing the input to a computer program, i.e., Yacc helps write programs whose actions are directed by a language generated by some grammar.
» The Yacc user specifies the structure of the input, together with code to be invoked as each such structure is recognized.
» Yacc turns the specification into a subroutine that handles the input process.

Yacc input
Like a Lex specification, a Yacc specification has four parts:
» Declarations - optional user-supplied header code
» Definitions - optional definitions to simplify the Rules section
» Productions - the grammar to parse and the associated actions
» User subroutines - optional helper functions

Yacc output
Yacc generates a parser function written in C:
» the file is y.tab.c
» the function is yyparse()
» yyparse() implements a bottom-up, LALR(1) parser for the grammar you supplied

Yacc productions
Lines in the productions section have the form

    production    action

» production is the grammar expression, almost exactly the same as the productions we have defined in class.
» action is the action to take when the production is recognized: arbitrary C code that becomes part of the generated parser code.
A minimal complete specification is sketched below; the larger dp.y example follows.
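To make the four parts and the yylex()/yyparse() hand-off concrete before the dp example, here is a minimal, self-contained sketch (not from the original slides; the file name sum.y, the DIGIT token, and the hand-written scanner are invented for illustration) that sums single digits written on one line:

%{
/* Declarations: header code copied into the generated parser. */
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char *s);
%}

/* Definitions: token declarations for the grammar below. */
%token DIGIT

%%
/* Productions: a line is a sum of single digits, e.g. "1+2+3". */
line  : expr '\n'          { printf("= %d\n", $1); }
      ;
expr  : expr '+' DIGIT     { $$ = $1 + $3; }
      | DIGIT              { $$ = $1; }   /* same effect as the default action */
      ;
%%

/* User subroutines: a tiny hand-written scanner and error handler. */
int yylex(void) {
    int c = getchar();
    if (c == EOF) return 0;              /* 0 tells yyparse() "end of input" */
    if (isdigit(c)) { yylval = c - '0'; return DIGIT; }
    return c;                            /* '+' and '\n' are literal tokens */
}

void yyerror(const char *s) { fprintf(stderr, "Err: %s\n", s); }

int main(void) { return yyparse(); }

With bison, something like "bison sum.y && gcc -o sum sum.tab.c" should build it, and echo "1+2+3" | ./sum prints "= 6"; the hand-written yylex() here simply stands in for a Lex-generated one.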

Example productions

    parameters
        : parameter
        | parameters ',' parameter
        ;
    parameter
        : T_KW_INT T_ID        {printf("PARSED PARAMETER %s\n", $2);}
        ;
    declarations
        : declaration
        | declarations declaration
        ;
    declaration
        : T_KW_INT T_ID ';'    {printf("PARSED LOCAL VARIABLE %s\n", $2);}
        ;

Production format
A grammar production has the form
» A : BODY ;
» A is the non-terminal
» BODY is a sequence of zero or more non-terminals and terminals
» ':' takes the place of '::=' or '→' in our grammars
» ';' means the end of the production or set of productions

Actions
When a production is matched, Yacc executes the corresponding action:
» the action is defined as one or more C statements, and the statements can do anything
» if no action code is specified, the default action is to return the value of the first item on the right-hand side of the production, for use in higher-level accumulations

Context for the action code
The values of the items in the production are available when the action code is called; you can access the value of the parsed items with $1, $2, etc.:

    parameter
        : T_KW_INT T_ID        {printf("PARSED PARAMETER %s\n", $2);}
        ;

dp.y: declarations and definitions

%{
/* miscellaneous C declarations */
#define YYERROR_VERBOSE 1
void yyerror (char *s);
%}

/* token attributes */
%union {
    int intvalue;
    char *stringvalue;
}

/* tokens and keywords */
%token <stringvalue> T_ID
%token <intvalue> T_INT
%token T_KW_INT
%token T_KW_RETURN
%token T_KW_IF
%token T_KW_ELSE
%token T_KW_WHILE
%token T_OP_EQ
%token T_OP_ASSIGN

dp.y: productions and actions

program
    : functiondefinition
    | program functiondefinition
    ;
functiondefinition
    : T_KW_INT T_ID '(' ')' '{' statements '}'
          {printf("PARSED FUNCTION %s\n", $2);}
    | T_KW_INT T_ID '(' parameters ')' '{' statements '}'
          {printf("PARSED FUNCTION %s\n", $2);}
    | T_KW_INT T_ID '(' ')' '{' declarations statements '}'
          {printf("PARSED FUNCTION %s\n", $2);}
    | T_KW_INT T_ID '(' parameters ')' '{' declarations statements '}'
          {printf("PARSED FUNCTION %s\n", $2);}
    ;
etc., etc.

dp.y: user subroutines

/* simple error reporting function */
void yyerror (char *s) {
    fprintf (stderr, "Err: %s\n", s);
}

/* simple main program */
int main (void) {
    return yyparse ();
}

dp.l: lexical analyzer (part 1)

%{
#include "dp.tab.h"   /* token codes and yylval type, generated by bison -d */
%}
alpha       [a-zA-Z]
numeric     [0-9]
comment     "//".*$
whitespace  [ \t\n\r\v\f]+
%%
[;!>+\-*(){},] { return yytext[0]; }
"int"          { return T_KW_INT; }
"return"       { return T_KW_RETURN; }
"if"           { return T_KW_IF; }
"else"         { return T_KW_ELSE; }
"while"        { return T_KW_WHILE; }
"=="           { return T_OP_EQ; }
"="            { return T_OP_ASSIGN; }
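The generated header dp.tab.h is what ties the two specifications together. Its exact contents depend on the bison version and are not shown in the slides, but as a rough, invented sketch it looks something like this: one code per %token, plus the %union type and the shared yylval, so the token codes the scanner returns and the yylval fields it fills match what the parser expects.

/* Sketch only: layout and numeric codes are illustrative, not the real dp.tab.h. */
#define T_ID      258
#define T_INT     259
#define T_KW_INT  260
/* ... one code per %token ... */

typedef union {
    int   intvalue;
    char *stringvalue;
} YYSTYPE;

extern YYSTYPE yylval;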

dp.l: lexical analyzer (part 2)

{whitespace}                    { }
{alpha}({alpha}|{numeric}|_)*   { yylval.stringvalue = strdup(yytext); return T_ID; }
{numeric}+                      { yylval.intvalue = atoi(yytext); return T_INT; }
{comment}                       { }

Sample run of the dp parser

    djohnson@skyo ex20 $ flex dp.l
    djohnson@skyo ex20 $ bison -vdt dp.y
    djohnson@skyo ex20 $ gcc -o parse lex.yy.c dp.tab.c -ll
    djohnson@skyo ex20 $ ./parse < scanx.d
    PARSED PARAMETER x
    PARSED PARAMETER y
    PARSED FUNCTION somefunction
    PARSED PARAMETER x
    PARSED LOCAL VARIABLE asimpleint
    PARSED LOCAL VARIABLE a_s_i_2
    PARSED LOCAL VARIABLE k
    PARSED LOCAL VARIABLE z
    PARSED FUNCTION main
    djohnson@skyo ex20 $

Summary
» The actions can be any code you need.
» If you ever need to build a program that parses character strings into data structures, think Lex/Yacc, flex/bison, JLex/CUP, ...
» You can quickly build a solid parser with well-defined tokens and grammar:
  tokens are regular expressions, and the grammar is standard Backus-Naur Form.