Parser Generators. Aurochs and ANTLR and Yaccs, Oh My. Tuesday, December 1, 2009 Reading: Appel 3.3. Department of Computer Science Wellesley College

Parser Generators: Aurochs and ANTLR and Yaccs, Oh My
Tuesday, December 1, 2009. Reading: Appel 3.3.
CS235 Languages and Automata, Department of Computer Science, Wellesley College

Compiler Structure

  Source Program (character stream)
    -> Lexer (a.k.a. Scanner, Tokenizer) -> Tokens
    -> Parser -> Abstract Syntax Tree (AST)
    -> Type Checker -> Intermediate Representation
    -> Semantic Analysis -> Intermediate Representation
    -> Optimizer -> Intermediate Representation
    -> Code Generator -> Machine code or byte code

  Front End (CS235): Lexer, Parser.
  Middle Stages (CS251/CS301): Type Checker, Semantic Analysis, Optimizer.
  Back End (CS301): Code Generator.
  Global Analysis Information (Symbol and Attribute Tables) is used by all phases of the compiler.

Motivation

We now know how to construct LL(k) and LR(k) parsers by hand. But constructing parser tables from grammars and implementing parsers from parser tables is tedious and error-prone. Isn't there a better way?

Idea: Hey, aren't computers good at automating these sorts of things?

Parser Generators

A parser generator is a program whose inputs are:
  1. a grammar that specifies a tree structure for linear sequences of tokens (typically created by a separate lexer program);
  2. a specification of semantic actions to perform when building the parse tree.

Its output is an automatically constructed program that parses a token sequence into a tree, performing the specified semantic actions along the way.

In the remainder of this lecture, we will explore and compare two parser generators: Yacc and Aurochs.

ml-yacc: An LALR(1) Parser Generator

  Inputs:
    Slip.yacc (.yacc file): token datatype, parsing rules, precedence & associativity info. Processed by ml-yacc.
    Slip.lex (.lex file): token rules & token-handling code. Processed by ml-lex.

  Outputs of ml-yacc:
    Slip.yacc.sig: token datatype
    Slip.yacc.desc: LR parsing table info for debugging shift-reduce and reduce-reduce conflicts
    Slip.yacc.sml: parsing program

  Output of ml-lex:
    Slip.lex.sml: scanner program

  Pipeline:
    program as character stream (a Slip program)
      -> scanner program (SML) -> token stream (Slip tokens)
      -> parser program (SML) -> abstract syntax tree (Slip AST)

Format of a .yacc File

  Header section with SML code
  %%
  Specification of terminals & variables and various declarations (including precedence and associativity)
  %%
  Grammar productions with semantic actions

IntexpUnambiguousAST.yacc (Part 1)

  (* no SML code in this case *)
  %% (* separates initial section from middle section of .yacc file *)
  %name Intexp
  %pos int
  %term INT of int
      | ADD | SUB | MUL | DIV | EXPT
      | LPAREN (* "(" *)
      | RPAREN (* ")" *)
      | EOF (* End of file *)
  %nonterm Start of AST.exp
         | Exp of AST.exp
         | Term of AST.exp
         | Factor of AST.exp
         | Unit of AST.exp
  %start Start
  (* Middle section continued on next slide *)

  Annotations:
    %name: Name used to prefix various names created by ml-yacc.
    %pos: Declares the type of positions for terminals. Each terminal has a left and right position. Here, a position is just an int (a character position in a string or file).
    %term: Specifies the terminals of the language. ml-yacc automatically constructs a Tokens module based on this specification. Tokens specified without "of" will have a constructor of two args: (1) its left position and (2) its right position. Tokens specified with "of" will have a constructor of three args: (1) the component datum (whose type follows "of"); (2) its left position; and (3) its right position.
    %nonterm: Specifies the non-terminals of the language and the kind of value that the parser will generate for them (an AST in this case). The "start" non-terminal should not appear on the right-hand sides of any productions.
    %start: Start symbol of the grammar.

IntexpUnambiguousAST.yacc (Part 2)

  (* Middle section continued from previous slide *)
  %keyword
  %eop EOF
  %noshift EOF
  %nodefault
  %verbose
  %value INT(0)
  (* Grammar productions and semantic actions on next slide *)

  Annotations:
    %keyword: Lists the keywords of the language. The error handler of the ml-yacc generated parser treats keywords specially. In this example, there aren't any keywords.
    %eop: Indicates tokens that end the input.
    %noshift: Tells the parser never to shift the specified tokens.
    %nodefault: Suppresses generation of default reductions.
    %verbose: Generates a description of the LALR parser table in filename.yacc.desc. This is helpful for debugging shift/reduce and reduce/reduce conflicts.
    %value: Specifies default values for value-bearing terminals. Terminals with default values may be created by an ml-yacc-generated parser as part of error correction.

  Note: in this middle section, you can also specify associativity and precedence of operators. See IntexpPrecedenceAST.yacc (later slide) for an example.

AST module for integer expressions

  structure AST = struct
    datatype exp = Int of int
                 | BinApp of binop * exp * exp
    and binop = Add | Sub | Mul | Div | Expt
    (* Various AST functions omitted from this slide *)
  end

IntexpUnambiguousAST.yacc (Part 3)

  %% (* separates middle section from final section *)
  (* Grammar productions and associated semantic actions (in parens) go here *)
  Start : Exp (Exp)
  Exp : Term (Term)
      (* The following rules specify left associativity *)
      | Exp ADD Term (AST.BinApp(AST.Add,Exp,Term))
      | Exp SUB Term (AST.BinApp(AST.Sub,Exp,Term))
  Term : Factor (Factor)
      (* The following rules specify left associativity *)
      | Term MUL Factor (AST.BinApp(AST.Mul,Term,Factor))
      | Term DIV Factor (AST.BinApp(AST.Div,Term,Factor))
  Factor : Unit (Unit)
      (* The following rule specifies right associativity *)
      | Unit EXPT Factor (AST.BinApp(AST.Expt,Unit,Factor))
  Unit : INT (AST.Int(INT))
      | LPAREN Exp RPAREN (Exp)

  Annotations:
    Within parens, nonterminals stand for the value generated by the parser for the nonterminal, as specified in the %nonterm declaration (an AST expression in this case).
    Within parens, terminals stand for the component datum specified after "of" in the %term declaration. E.g., the datum associated with INT is an integer.
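To make the shape of the generated trees concrete, here is a minimal SML sketch that evaluates an AST of the kind the semantic actions above build. The AST datatype is repeated from the slide so the snippet loads on its own; `expt`, `eval`, `sample`, and `sampleValue` are illustrative names assumed for this sketch and are not part of the course code.

```sml
(* Repeats the AST datatype from the slide so this sketch is self-contained. *)
structure AST = struct
  datatype exp = Int of int
               | BinApp of binop * exp * exp
  and binop = Add | Sub | Mul | Div | Expt
end

(* Hypothetical helper: integer exponentiation for non-negative powers. *)
fun expt (base, 0) = 1
  | expt (base, power) = base * expt (base, power - 1)

(* Hypothetical evaluator: folds an AST down to an integer. *)
fun eval (AST.Int n) = n
  | eval (AST.BinApp (rator, left, right)) =
      let val l = eval left
          val r = eval right
      in case rator of
             AST.Add => l + r
           | AST.Sub => l - r
           | AST.Mul => l * r
           | AST.Div => l div r
           | AST.Expt => expt (l, r)
      end

(* The tree the left-recursive Exp rules build for "1-2-3":
   subtraction groups to the left. *)
val sample = AST.BinApp (AST.Sub,
                         AST.BinApp (AST.Sub, AST.Int 1, AST.Int 2),
                         AST.Int 3)
val sampleValue = eval sample   (* (1-2)-3 = ~4 *)
```

Because the grammar encodes associativity in the tree structure, the evaluator needs no knowledge of precedence; it only follows the nesting.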

Interfacing with the Lexer

  (* contents of Intexp.lex *)
  open Tokens
  type pos = int
  type lexresult = (svalue,pos) token
  fun eof () = Tokens.EOF(0,0)
  fun integer(str,lexPos) =
    case Int.fromString(str) of
        NONE => raise Fail("Shouldn't happen: sequence of digits not recognized as integer -- " ^ str)
      | SOME(n) => INT(n,lexPos,lexPos)
  (* For scanner initialization (not needed here) *)
  fun init() = ()
  %%
  %header (functor IntexpLexFun(structure Tokens: Intexp_TOKENS));
  alpha=[a-zA-Z];
  digit=[0-9];
  whitespace=[\ \t\n];
  symbol=[+*^()\[];
  any= [^];
  %%
  {digit}+ => (integer(yytext,yypos));
  "+" => (ADD(yypos,yypos));
  (* more lexer rules go here *)

Parser interface code, Part 1

  structure Intexp = struct (* Module for interfacing with parser and lexer *)

    (* Most of the following is "boilerplate" code that needs to be slightly edited on a per-parser basis. *)
    structure IntexpLrVals = IntexpLrValsFun(structure Token = LrParser.Token)
    structure IntexpLex = IntexpLexFun(structure Tokens = IntexpLrVals.Tokens);
    structure IntexpParser = Join(structure LrParser = LrParser
                                  structure ParserData = IntexpLrVals.ParserData
                                  structure Lex = IntexpLex)

    val invoke = fn lexstream => (* The invoke function invokes the parser given a lexer *)
      let val print_error = fn (str,pos,_) =>
            TextIO.output(TextIO.stdOut,
                          "***Intexp Parser Error at character position "
                          ^ (Int.toString pos) ^ "***\n" ^ str ^ "\n")
      in IntexpParser.parse(0,lexstream,print_error,())
      end

    fun newlexer fcn = (* newlexer creates a lexer from a given input-reading function *)
      let val lexer = IntexpParser.makeLexer fcn
          val _ = IntexpLex.UserDeclarations.init()
      in lexer
      end

    (* continued on next slide *)

Parser interface code, Part 2

    (* continued from previous slide *)

    fun stringtolexer str = (* creates a lexer from a string *)
      let val done = ref false
      in newlexer (fn n => if (!done) then "" else (done := true; str))
      end

    fun filetolexer filename = (* creates a lexer from a file *)
      let val inStream = TextIO.openIn(filename)
      in newlexer (fn n => TextIO.inputAll(inStream))
      end

    fun lexertoparser (lexer) = (* creates a parser from a lexer *)
      let val dummyEOF = IntexpLrVals.Tokens.EOF(0,0)
          val (result,lexer) = invoke lexer
          val (nextToken,lexer) = IntexpParser.Stream.get lexer
      in if IntexpParser.sameToken(nextToken,dummyEOF)
         then result
         else (TextIO.output(TextIO.stdOut, "*** INTEXP PARSER WARNING -- unconsumed input ***\n");
               result)
      end

    val parsestring = lexertoparser o stringtolexer (* parses a string *)
    val parsefile = lexertoparser o filetolexer (* parses a file *)

  end (* struct *)

Putting it all together: the load file

  (* This is the contents of load-intexp-unambiguous-ast.sml *)
  CM.make("$/basis.cm"); (* loads SML basis library *)
  CM.make("$/ml-yacc-lib.cm"); (* loads SML YACC library *)
  use "AST.sml"; (* datatype for integer expression abstract syntax trees *)
  use "IntexpUnambiguousAST.yacc.sig"; (* defines Intexp_TOKENS and other datatypes *)
  use "IntexpUnambiguousAST.yacc.sml"; (* defines shift-reduce parser *)
  use "Intexp.lex.sml"; (* load lexer *after* parser since it uses tokens defined by parser *)
  use "Intexp.sml"; (* load top-level parser interface *)
  Control.Print.printLength := 1000; (* set printing parameters so that *)
  Control.Print.printDepth := 1000; (* we'll see all details of parse trees *)
  Control.Print.stringDepth := 1000; (* and strings *)
  open Intexp; (* open the parsing module so that we can use parsestring and parsefile without qualification. *)

Taking it for a spin

Linux shell:

  [fturbak@cardinal intexp-parsers] ml-lex Intexp.lex
  Number of states = 14
  Number of distinct rows = 3
  Approx. memory size of trans. table = 387 bytes
  [fturbak@cardinal intexp-parsers] ml-yacc IntexpUnambiguousAST.yacc
  [fturbak@cardinal intexp-parsers]

SML interpreter:

  - use "load-intexp-unambiguous-ast.sml";
  (* lots of printed output omitted here *)
  - parsestring "1+2*3^4^5*6+7";
  val it = BinApp (Add,
                   BinApp (Add, Int 1,
                           BinApp (Mul,
                                   BinApp (Mul, Int 2, BinApp (Expt, Int 3, BinApp (Expt, Int 4, Int 5))),
                                   Int 6)),
                   Int 7)
    : IntexpParser.result

IntexpUnambiguousAST.yacc.desc

  state 0:
    Start : . Exp
    INT shift 6
    LPAREN shift 5
    Start goto 19
    Exp goto 4
    Term goto 3
    Factor goto 2
    Unit goto 1
    . error

  state 1:
    Factor : Unit . (reduce by rule 7)
    Factor : Unit . EXPT Factor
    ADD reduce by rule 7
    SUB reduce by rule 7
    MUL reduce by rule 7
    DIV reduce by rule 7
    EXPT shift 7
    RPAREN reduce by rule 7
    EOF reduce by rule 7
    . error

  (lots of states omitted)

  state 18:
    Unit : LPAREN Exp RPAREN . (reduce by rule 10)
    ADD reduce by rule 10
    SUB reduce by rule 10
    MUL reduce by rule 10
    DIV reduce by rule 10
    EXPT reduce by rule 10
    RPAREN reduce by rule 10
    EOF reduce by rule 10
    . error

  state 19:
    EOF accept
    . error

  72 of 104 action table entries left after compaction
  21 goto table entries

IntexpUnambiguousCalc.yacc

We can calculate the result of an integer expression rather than generating an AST.

  (* An integer exponentiation function *)
  fun expt (base,0) = 1
    | expt (base,power) = base*(expt(base,power-1))
  %%
  (* Only changed parts shown *)
  %nonterm Start of int
         | Exp of int
         | Term of int
         | Factor of int
         | Unit of int
  %%
  Start : Exp (Exp)
  Exp : Term (Term)
      (* The following rules specify left associativity *)
      | Exp ADD Term (Exp + Term)
      | Exp SUB Term (Exp - Term)
  Term : Factor (Factor)
      (* The following rules specify left associativity *)
      | Term MUL Factor (Term * Factor)
      | Term DIV Factor (Term div Factor) (* div is integer division *)
  Factor : Unit (Unit)
      (* The following rule specifies right associativity *)
      | Unit EXPT Factor (expt(Unit,Factor))
  Unit : INT (INT)
      | LPAREN Exp RPAREN (Exp)

Testing IntexpUnambiguousCalc

Linux shell:

  [fturbak@cardinal intexp-parsers] ml-yacc IntexpUnambiguousCalc.yacc
  [fturbak@cardinal intexp-parsers]

SML interpreter:

  - use "load-intexp-unambiguous-calc.sml";
  (* lots of printed output omitted here *)
  - parsestring "1+2*3";
  val it = 7 : IntexpParser.result
  - parsestring "1-2-3";
  val it = ~4 : IntexpParser.result

IntexpAmbiguousAST.yacc

We can use an ambiguous grammar. (In this case, it leads to shift/reduce conflicts.)

  %%
  (* Only changed parts shown *)
  %nonterm Start of AST.exp
         | Exp of AST.exp
  %%
  Start : Exp (Exp)
  Exp : INT (AST.Int(INT))
      | Exp ADD Exp (AST.BinApp(AST.Add,Exp1,Exp2))
      | Exp SUB Exp (AST.BinApp(AST.Sub,Exp1,Exp2))
      | Exp MUL Exp (AST.BinApp(AST.Mul,Exp1,Exp2))
      | Exp DIV Exp (AST.BinApp(AST.Div,Exp1,Exp2))
      | Exp EXPT Exp (AST.BinApp(AST.Expt,Exp1,Exp2))

IntexpAmbiguousAST.yacc.desc

  25 shift/reduce conflicts

  error: state 8: shift/reduce conflict (shift EXPT, reduce by rule 6)
  error: state 8: shift/reduce conflict (shift DIV, reduce by rule 6)
  error: state 8: shift/reduce conflict (shift MUL, reduce by rule 6)
  error: state 8: shift/reduce conflict (shift SUB, reduce by rule 6)
  error: state 8: shift/reduce conflict (shift ADD, reduce by rule 6)
  error: state 9: shift/reduce conflict (shift EXPT, reduce by rule 5)
  error: state 9: shift/reduce conflict (shift DIV, reduce by rule 5)
  error: state 9: shift/reduce conflict (shift MUL, reduce by rule 5)
  error: state 9: shift/reduce conflict (shift SUB, reduce by rule 5)
  error: state 9: shift/reduce conflict (shift ADD, reduce by rule 5)
  error: state 10: shift/reduce conflict (shift EXPT, reduce by rule 4)
  error: state 10: shift/reduce conflict (shift DIV, reduce by rule 4)
  error: state 10: shift/reduce conflict (shift MUL, reduce by rule 4)
  error: state 10: shift/reduce conflict (shift SUB, reduce by rule 4)
  error: state 10: shift/reduce conflict (shift ADD, reduce by rule 4)
  error: state 11: shift/reduce conflict (shift EXPT, reduce by rule 3)
  error: state 11: shift/reduce conflict (shift DIV, reduce by rule 3)
  error: state 11: shift/reduce conflict (shift MUL, reduce by rule 3)
  error: state 11: shift/reduce conflict (shift SUB, reduce by rule 3)
  error: state 11: shift/reduce conflict (shift ADD, reduce by rule 3)
  error: state 12: shift/reduce conflict (shift EXPT, reduce by rule 2)
  error: state 12: shift/reduce conflict (shift DIV, reduce by rule 2)
  error: state 12: shift/reduce conflict (shift MUL, reduce by rule 2)
  error: state 12: shift/reduce conflict (shift SUB, reduce by rule 2)
  error: state 12: shift/reduce conflict (shift ADD, reduce by rule 2)

  (state descriptions omitted)

IntexpPrecedenceAST.yacc

We can resolve conflicts with precedence/associativity declarations (the dangling else problem can also be handled this way).

  %%
  (* Only changed parts shown *)
  %nonterm Start of AST.exp
         | Exp of AST.exp
  (* Specify associativity and precedence from low to high *)
  %left ADD SUB
  %left MUL DIV
  %right EXPT
  %%
  Start : Exp (Exp)
  Exp : INT (AST.Int(INT))
      | Exp ADD Exp (AST.BinApp(AST.Add,Exp1,Exp2))
      | Exp SUB Exp (AST.BinApp(AST.Sub,Exp1,Exp2))
      | Exp MUL Exp (AST.BinApp(AST.Mul,Exp1,Exp2))
      | Exp DIV Exp (AST.BinApp(AST.Div,Exp1,Exp2))
      | Exp EXPT Exp (AST.BinApp(AST.Expt,Exp1,Exp2))

Stepping Back

Whew! That's a lot of work! Is it all worth it?

  There's a tremendous amount of initial setup and glue code to get Lex and Yacc working together.
  But once that's done, we can focus on specifying and debugging high-level declarations of the token structure (in Lex) and grammar (in Yacc) rather than on SML programming.
  In Yacc, you generally spend most of the time debugging shift/reduce and reduce/reduce conflicts, so you must be very familiar with the LALR(1) table descriptions in the .yacc.desc file.
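The way `%left` and `%right` declarations settle a shift/reduce conflict can be sketched in a few lines of SML. This is an illustration of the usual Yacc-style rule (compare the precedence of the rule being reduced with that of the lookahead token, and fall back on associativity when they tie), not ml-yacc's actual implementation. The names `assoc`, `action`, `prec`, and `resolve` are assumptions made for this sketch, and tokens are plain strings purely for readability.

```sml
datatype assoc = Left | Right | NonAssoc
datatype action = Shift | Reduce | Error

(* Precedence levels from the slide's declarations, numbered low to high:
   %left ADD SUB, %left MUL DIV, %right EXPT. *)
fun prec "ADD" = (1, Left)
  | prec "SUB" = (1, Left)
  | prec "MUL" = (2, Left)
  | prec "DIV" = (2, Left)
  | prec "EXPT" = (3, Right)
  | prec tok = raise Fail ("no precedence declared for " ^ tok)

(* ruleTok: the operator of the rule that could be reduced.
   lookahead: the next input token that could be shifted. *)
fun resolve (ruleTok, lookahead) =
  let val (rulePrec, _) = prec ruleTok
      val (tokPrec, tokAssoc) = prec lookahead
  in if tokPrec > rulePrec then Shift
     else if tokPrec < rulePrec then Reduce
     else case tokAssoc of
              Left => Reduce
            | Right => Shift
            | NonAssoc => Error
  end

val afterAddSeesMul = resolve ("ADD", "MUL")      (* Shift: 1+2*3 groups as 1+(2*3) *)
val afterSubSeesSub = resolve ("SUB", "SUB")      (* Reduce: 1-2-3 groups as (1-2)-3 *)
val afterExptSeesExpt = resolve ("EXPT", "EXPT")  (* Shift: 3^4^5 groups as 3^(4^5) *)
```

Each of the 25 conflicts reported for the ambiguous grammar is one (rule, lookahead) pair of this kind, which is why three short declarations are enough to remove all of them.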

Other Parser Generators

There is a plethora of other LALR parser generators for many languages, including:
  GNU Bison (Yacc for C, C++)
  jb (Bison for Java)
  CUP (Yacc for Java)

There are also LL(k) parser generators, particularly ANTLR, which generates parsers and other tree-manipulation programs for Java, C, C++, C#, Python, Ruby, and JavaScript. There is also an sml-antlr.

There is a relatively new (starting in 2002) collection of so-called packrat parsers for Parsing Expression Grammars (PEGs) for a wide range of target languages. We will study one of these: Aurochs.

Parsing Expression Grammars (PEGs)

Key problem in building top-down (recursive-descent) parsers: deciding which production to use on the upcoming tokens. Three strategies:

  Predictive parser: determine which production to use based on k tokens of lookahead. Grammars often have to be mangled to make this work.

  Backtracking parser: when there are several choices, try them in order. If one fails, try the next. Very general and high-level, but can take exponential time to parse the input.

  PEG parser: has a prioritized choice operator that unconditionally uses the first successful match.
    Can directly express common disambiguation rules like longest-match, followed-by, and not-followed-by.
    Can express lexing rules and parsing rules in the same grammar! This eliminates glue code between lexer and parser.
    PEGs can be implemented via a packrat parser that memoizes parsing results and guarantees parsing in linear time, at the cost of memory proportional to the input size times the number of grammar rules.
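The effect of prioritized choice can be seen in a small SML sketch. This is a hand-rolled illustration, not Aurochs code: the type `parser` and the combinators `lit`, `seq`, `choice`, and `eof` are names assumed here, and the sketch omits the memoization a packrat parser would add.

```sml
(* A parser takes the input and a position; on success it returns the
   position after the match, on failure NONE. *)
type parser = string * int -> int option

(* Match a literal string at the current position. *)
fun lit (s : string) : parser =
  fn (input, pos) =>
    if pos + size s <= size input
       andalso String.substring (input, pos, size s) = s
    then SOME (pos + size s)
    else NONE

(* Sequence: p, then q from where p stopped. *)
fun seq (p : parser, q : parser) : parser =
  fn (input, pos) =>
    case p (input, pos) of
        NONE => NONE
      | SOME pos' => q (input, pos')

(* Prioritized choice: commit to p if it succeeds; try q only if p fails. *)
fun choice (p : parser, q : parser) : parser =
  fn (input, pos) =>
    case p (input, pos) of
        NONE => q (input, pos)
      | success => success

(* Succeeds only at the end of the input. *)
val eof : parser =
  fn (input, pos) => if pos = size input then SOME pos else NONE

val shortFirst = seq (choice (lit "a", lit "ab"), eof)
val longFirst = seq (choice (lit "ab", lit "a"), eof)

val r1 = shortFirst ("ab", 0)  (* NONE: "a" wins, eof then fails, "ab" is never tried *)
val r2 = longFirst ("ab", 0)   (* SOME 2 *)
val r3 = longFirst ("a", 0)    (* SOME 1 *)
```

A general backtracking parser would retry the choice with "ab" after `eof` fails; a PEG parser never does, which is why the order of alternatives matters.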

Aurochs: a PEG Parser

We will explore PEG parsing using the Aurochs parser generator, developed by Berke Durak and available online.

The input to Aurochs is (1) a file to be parsed and (2) a .peg file specifying PEG declarations for lexing and parsing rules.

The output of Aurochs is an XML parse tree representation. (It is also possible to generate parse trees in particular languages, but for simplicity we'll stick with the XML trees.)

Aurochs Example

  [lynux@localhost slipmm-parsers]$ aurochs -parse simple.slip SlipmmParens.peg
  Aurochs
  Parsing file simple.slip using Nog interpreter
  - Grammar loaded from file SlipmmParens.peg
  - RESULT
  <Root>
    <StmList>
      <Assign var="x">
        <BinApp binop="+">
          <Int value="3"/>
          <Int value="4"/>
        </BinApp>
      </Assign>
      <Print>
        <BinApp binop="*">
          <BinApp binop="-">
            <Id name="x"/>
            <Int value="1"/>
          </BinApp>
          <BinApp binop="+">
            <Id name="x"/>
            <Int value="2"/>
          </BinApp>
        </BinApp>
      </Print>
    </StmList>
  </Root>
  - No targets

  Contents of the file simple.slip:

    begin
      x := (3+4);
      print ((x-1)*(x+2));
    end

Aurochs Grammar Expressions

  Expression       Meaning
  'c'              The single ASCII character 'c'
  "hello"          The string hello
  BOF, EOF         Beginning and end of file tokens
  sigma            Any character
  epsilon          The empty string
  [a-zA-Z0-9]      grep-like character sets
  [^\n\t ]         grep-like complements of character sets
  e1 e2            Concatenation: parse e1, then e2
  e*               Repetition: parses as many occurrences of e as possible, including zero.
  e+               Nonzero repetition: parses at least one occurrence of e.
  e?               Option: parses zero or one occurrence of e.
  &e               Conjunction: parses e, but without consuming it.
  ~e               Negation: attempts to parse e. If parsing e fails, then ~e succeeds without consuming any input. If parsing e succeeds, then ~e fails (also without consuming input).
  e1 | e2          Ordered disjunction (prioritized choice): attempt to parse e1 at the current position. If this succeeds, it will never try to parse e2 at the current position. If it fails, it will never try to parse e1 again at that position, and will try to parse e2. If e2 fails, the whole disjunction fails.
  (e)              Parentheses used for explicit grouping
  name             Parse according to e from production name ::= e

Some Simple Expressions

  Expression: '-'? [1-9][0-9]*
  Meaning: Any nonempty sequence of digits not beginning with 0, preceded by an optional minus sign.

  Expression: &[aeiou] [a-z]+
  Meaning: Any nonempty sequence of lowercase letters beginning with a vowel.

  Expression: ~'0' [0-9]+
  Meaning: Any nonempty sequence of digits not beginning with 0.

  Expression: ~("begin" | "end" | "let") &[a-z] [a-zA-Z0-9]+
  Meaning: Alphanumeric names beginning with lowercase letters, excluding the keywords begin, end, and let.

  Expression: exp ::= '(' exp '+' exp ')' | id '=' exp | id | num
  Meaning: Assume id matches identifiers and num matches numbers. Then this can parse (x=3 + x) and (x=y=3 + (1 + z=y)).

  Expression: exp ::= '(' exp '+' exp ')' | id | id '=' exp | num
  Meaning: Due to prioritized choice, this will never parse x=3, because the alternative id always succeeds first.

  Expression: "if" exp "then" stm "else" stm | "if" exp "then" stm
  Meaning: Solves the dangling else problem by specifying that else goes with the closest if.
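The two lookahead operators, `&e` and `~e`, are easy to model by hand. The following SML sketch mimics the identifier expression `~("begin" | "end" | "let") &[a-z] [a-zA-Z0-9]+` from the table above; it is an illustration under assumed names (`parser`, `lit`, `charIf`, `seq`, `choice`, `plus`, `andp`, `notp`, `keyword`, `name`), not the Aurochs implementation.

```sml
type parser = string * int -> int option

(* Match a literal string. *)
fun lit (s : string) : parser =
  fn (input, pos) =>
    if pos + size s <= size input
       andalso String.substring (input, pos, size s) = s
    then SOME (pos + size s)
    else NONE

(* Match one character satisfying pred (plays the role of a character set). *)
fun charIf (pred : char -> bool) : parser =
  fn (input, pos) =>
    if pos < size input andalso pred (String.sub (input, pos))
    then SOME (pos + 1)
    else NONE

fun seq (p : parser, q : parser) : parser =
  fn (input, pos) =>
    case p (input, pos) of NONE => NONE | SOME pos' => q (input, pos')

fun choice (p : parser, q : parser) : parser =
  fn (input, pos) =>
    case p (input, pos) of NONE => q (input, pos) | success => success

(* e+ : one or more occurrences, as many as possible. *)
fun plus (p : parser) : parser =
  fn (input, pos) =>
    let fun loop cur =
          case p (input, cur) of
              SOME next => if next > cur then loop next else cur
            | NONE => cur
    in case p (input, pos) of
           NONE => NONE
         | SOME first => SOME (loop first)
    end

(* &e : succeed if e matches, but consume nothing. *)
fun andp (p : parser) : parser =
  fn (input, pos) =>
    case p (input, pos) of SOME _ => SOME pos | NONE => NONE

(* ~e : succeed if e does not match, and consume nothing. *)
fun notp (p : parser) : parser =
  fn (input, pos) =>
    case p (input, pos) of SOME _ => NONE | NONE => SOME pos

val keyword = choice (lit "begin", choice (lit "end", lit "let"))
val name = seq (notp keyword,
                seq (andp (charIf Char.isLower),
                     plus (charIf Char.isAlphaNum)))

val ok = name ("x1", 0)            (* SOME 2 *)
val isKeyword = name ("begin", 0)  (* NONE: rejected by ~keyword *)
val upperStart = name ("Begin", 0) (* NONE: rejected by &[a-z] *)
val prefixed = name ("letter", 0)  (* NONE: starts with the keyword "let" *)
```

The last result shows a subtlety of this formulation: `~e` only looks at what comes next, so a name that merely begins with a keyword is rejected as well.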

Numerical Expressions in Aurochs

  (* Tokens *)
  float ::= '-'? decimal+ ('.' decimal+)? ('e' ('-' | '+')? decimal+)?;
  integer ::= '-'? decimal+;
  decimal ::= [0-9];
  sp ::= (' ' | '\n' | '\r' | '\t')*; (* any number of whitespaces *)
  ADD ::= sp '+' sp;
  SUB ::= sp '-' sp;
  MUL ::= sp '*' sp;
  DIV ::= sp '/' sp;
  LPAREN ::= sp '(' sp;
  RPAREN ::= sp ')' sp;

  (* Expressions *)
  start ::= exp sp EOF;
  exp ::= term ADD exp (* right associative *)
        | term SUB exp (* right associative *)
        | term;
  term ::= factor MUL term (* right associative *)
         | factor DIV term (* right associative *)
         | SUB factor (* unary negation *)
         | factor;
  factor ::= number | LPAREN exp RPAREN;
  number ::= float | integer;

XML Parse Trees for Numexps in Aurochs

  (* Tokens the same as before, so omitted *)

  (* Expressions *)
  start ::= exp sp EOF;
  exp ::= <add> term ADD exp </add> (* right associative *)
        | <sub> term SUB exp </sub> (* right associative *)
        | term;
  term ::= <mul> factor MUL term </mul> (* right associative *)
         | <div> factor DIV term </div> (* right associative *)
         | <neg> SUB factor </neg>
         | factor;
  factor ::= number | LPAREN exp RPAREN;
  number ::= <number> value:(float | integer) </number>;

Testing Numexp Parser

  calculator]$ aurochs -quiet -parse exp.txt numexp.peg
  <Root>
    <mul>
      <sub>
        <number value="1"/>
        <sub>
          <number value="2"/>
          <number value="3"/>
        </sub>
      </sub>
      <neg>
        <add>
          <neg>
            <number value="4"/>
          </neg>
          <add>
            <number value="5.6"/>
            <number value="7.8e-9"/>
          </add>
        </add>
      </neg>
    </mul>
  </Root>

  Contents of the file exp.txt:

    (1-2-3)*-(-4+5.6+7.8e-9)

Transforming Left Recursion

  (* Expressions *)
  start ::= exp sp EOF;
  exp ::= term <addsub> (op:(ADD | SUB) exp)* </addsub>;
  term ::= <neg> SUB factor </neg>
         | factor <muldiv> (op:(MUL | DIV) term)* </muldiv>;
  factor ::= number | LPAREN exp RPAREN;
  number ::= <number> value:(float | integer) </number>;
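The right-recursive rules earlier in this section parse 1-2-3 as 1-(2-3). A standard way to get left-associative results from a top-down parser, sketched below in SML, is to rewrite the left-recursive rule `exp ::= exp '-' number | number` as `exp ::= number (('+' | '-') number)*` and let a loop accumulate the value of everything to the left. This is a generic illustration of that transformation, not a transcription of the Aurochs grammar above; `number`, `expLeft`, and `expRight` are names assumed for the sketch, and the input is restricted to unsigned integers without spaces.

```sml
(* Parse a run of digits starting at pos; return (value, next position). *)
fun number (input, pos) =
  let fun loop (i, acc) =
        if i < size input andalso Char.isDigit (String.sub (input, i))
        then loop (i + 1, acc * 10 + (ord (String.sub (input, i)) - ord #"0"))
        else (acc, i)
  in if pos < size input andalso Char.isDigit (String.sub (input, pos))
     then SOME (loop (pos, 0))
     else NONE
  end

(* exp ::= number (('+' | '-') number)*
   The accumulator holds the value of everything parsed so far, so each
   operator is applied to the whole left context: left associativity. *)
fun expLeft input =
  let fun rest (acc, pos) =
        if pos < size input then
          case (String.sub (input, pos), number (input, pos + 1)) of
              (#"+", SOME (n, next)) => rest (acc + n, next)
            | (#"-", SOME (n, next)) => rest (acc - n, next)
            | _ => NONE
        else SOME acc
  in case number (input, 0) of
         NONE => NONE
       | SOME (n, pos) => rest (n, pos)
  end

(* For contrast: exp ::= number '-' exp | number (right recursive). *)
fun expRight (input, pos) =
  case number (input, pos) of
      NONE => NONE
    | SOME (n, next) =>
        if next < size input andalso String.sub (input, next) = #"-"
        then (case expRight (input, next + 1) of
                  SOME (m, final) => SOME (n - m, final)
                | NONE => NONE)
        else SOME (n, next)

val leftResult = expLeft "1-2-3"          (* SOME ~4, i.e. (1-2)-3 *)
val rightResult = expRight ("1-2-3", 0)   (* SOME (2, 5), i.e. 1-(2-3) *)
```

When building trees rather than values, the same loop would wrap the accumulated tree as the left child of each new operator node.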

Left Recursion Transformation Example

  calculator]$ aurochs -quiet -parse sub.txt numexp-leftrec.peg
  <Root>
    <number value="1"/>
    <muldiv/>
    <addsub op="-">
      <number value="2"/>
      <muldiv/>
      <addsub op="-">
        <number value="3"/>
        <muldiv/>
        <addsub op="-">
          <number value="4"/>
          <muldiv/>
          <addsub/>
        </addsub>
      </addsub>
    </addsub>
    <muldiv/>
    <addsub/>
  </Root>

  Contents of the file sub.txt:

    (1-2-3-4)

Moving Spaces from Tokens to Expressions

  (* Tokens *)
  float ::= '-'? decimal+ ('.' decimal+)? ('e' ('-' | '+')? decimal+)?;
  integer ::= '-'? decimal+;
  decimal ::= [0-9];
  sp ::= (' ' | '\n' | '\r' | '\t')*; (* any number of whitespaces *)
  ADD ::= '+';
  SUB ::= '-';
  MUL ::= '*';
  DIV ::= '/';
  LPAREN ::= '(';
  RPAREN ::= ')';

  (* Expressions *)
  start ::= sp exp sp EOF;
  exp ::= <add> sp term sp ADD sp exp sp </add> (* right associative *)
        | <sub> sp term sp SUB sp exp sp </sub> (* right associative *)
        | term;
  term ::= <mul> sp factor sp MUL sp term sp </mul> (* right associative *)
         | <div> sp factor sp DIV sp term sp </div> (* right associative *)
         | <neg> sp SUB sp factor sp </neg>
         | sp factor sp;
  factor ::= sp number sp
           | sp LPAREN sp exp sp RPAREN sp;
  number ::= <number> sp value:(float | integer) sp </number>;

Modifiers: Surrounding, Outfixing, and Friends

The following modifiers take an expression x as parameter and transform each production a ::= b as follows:
  appending x do ... done     transforms them into a ::= b x
  prepending x do ... done    transforms them into a ::= x b
  surrounding x do ... done   transforms them into a ::= x b x

The following modifiers transform each concatenation a b c:
  outfixing x do ... done     transforms them into x a x b x c x
  suffixing x do ... done     transforms them into a x b x c x
  prefixing x do ... done     transforms them into x a x b x c

Note that these modifiers also act within implicit concatenations produced by the operators '*' and '+'.

A Shorthand for all those Expression Spaces

  start ::= sp exp sp EOF;
  outfixing sp do
  surrounding sp do
    exp ::= <add> term ADD exp </add> (* right associative *)
          | <sub> term SUB exp </sub> (* right associative *)
          | term;
    term ::= <mul> factor MUL term </mul> (* right associative *)
           | <div> factor DIV term </div> (* right associative *)
           | <neg> SUB factor </neg>
           | factor;
    factor ::= number | LPAREN exp RPAREN;
    number ::= <number> value:(float | integer) </number>;
  done;
  done;

Aurochs Productions for Slip-- Tokens

  whitespace ::= [ \n\r\t] | BOF; (* BOF = beginning of file token *)
  (* sp is any number of ignorable units (whitespace) *)
  sp ::= whitespace*;
  PRINT ::= "print";
  BEGIN ::= "begin";
  END ::= "end";
  keyword ::= <keyword> value:(PRINT | BEGIN | END) </keyword>;
  ADD ::= '+';
  SUB ::= '-';
  MUL ::= '*';
  DIV ::= '/';
  op ::= <op> value:(ADD | SUB | MUL | DIV) </op>;
  LPAREN ::= "(";
  RPAREN ::= ")";
  SEMI ::= ";";
  GETS ::= ":=";
  punct ::= <punct> value:(LPAREN | RPAREN | SEMI | GETS) </punct>;
  alpha ::= [a-zA-Z];
  alphanumund ::= [a-zA-Z0-9_];
  digit ::= [0-9];
  reserved ::= "print" | "begin" | "end";
  ident ::= <ident> value:(~(reserved sp) alpha alphanumund*) </ident>;
  integer ::= <integer> value:digit+ </integer>;
  token ::= sp (keyword | punct | op | ident | integer) sp;
  tokenlist ::= <tokenlist> token* </tokenlist>;
  start ::= tokenlist EOF;

Slip-- Tokenizing example

  [lynux@localhost slipmm-parsers]$ aurochs -quiet -parse simple.slip SlipmmTokens.peg
  <Root>
    <tokenlist>
      <keyword value="begin"/>
      <ident value="x"/>
      <punct value=":="/>
      <punct value="("/>
      <integer value="3"/>
      <op value="+"/>
      <integer value="4"/>
      <punct value=")"/>
      <punct value=";"/>
      <keyword value="print"/>
      <punct value="("/>
      <punct value="("/>
      <ident value="x"/>
      <op value="-"/>
      <integer value="1"/>
      <punct value=")"/>
      <op value="*"/>
      <punct value="("/>
      <ident value="x"/>
      <op value="+"/>
      <integer value="2"/>
      <punct value=")"/>
      <punct value=")"/>
      <punct value=";"/>
      <keyword value="end"/>
    </tokenlist>
  </Root>

  Contents of the file simple.slip:

    begin
      x := (3+4);
      print ((x-1)*(x+2));
    end

Slip-- Tokens with Positions

  (* Only change to .peg file. *)
  token ::= <token> sp start:position (keyword | punct | op | ident | integer) stop:position sp </token>;

  slipmm-parsers]$ aurochs -quiet -parse simple.slip SlipmmTokensPositions.peg
  <Root>
    <tokenlist>
      <token start="0" stop="5">
        <keyword value="begin"/>
      </token>
      <token start="9" stop="10">
        <ident value="x"/>
      </token>
      <token start="11" stop="13">
        <punct value=":="/>
      </token>
      <token start="14" stop="15">
        <punct value="("/>
      </token>
      <token start="15" stop="16">
        <integer value="3"/>
      </token>
      <token start="16" stop="17">
        <op value="+"/>
      </token>
      (many more tokens)

  Contents of the file simple.slip:

    begin
      x := (3+4);
      print ((x-1)*(x+2));
    end

Line Comments

How do we add line-terminated comments introduced by #?

    begin
      sum := 5+3; # Set sum to 8
      print(sum); # Display sum
    end

  (* Only changes in .peg file *)
  whitespace ::= [ \n\r\t] | BOF; (* BOF = beginning of file token *)
  linecomment ::= '#' [^\n]* '\n'; (* Comment goes from '#' to end of line *)
  (* sp is any number of ignorable units (space or comment) *)
  sp ::= (whitespace | linecomment)*;

Nonnestable Block Comments

    begin
      sum := 5+3; # Set sum to 8
      { Comment out several lines:
        x := sum * 2;
        z := x * x;
      }
      print(sum); # Display sum
    end

  (* Only changes in .peg file *)
  whitespace ::= [ \n\r\t] | BOF; (* BOF = beginning of file token *)
  linecomment ::= '#' [^\n]* '\n'; (* Comment goes from '#' to end of line *)
  blockcomment ::= '{' ((~ '}') sigma)* '}'; (* Nonnestable block comment *)
  (* sp is any number of ignorable units (space or comment) *)
  sp ::= (whitespace | linecomment | blockcomment)*;

Nestable Block Comments

    begin
      sum := 5+3; # Set sum to 8
      { Comment out several lines:
        x := sum * 2;
        { Illustrate nested block comments: y = sum - 3:}
        z := x * x;
      }
      print(sum); # Display sum
    end

  (* Only changes in .peg file *)
  whitespace ::= [ \n\r\t] | BOF; (* BOF = beginning of file token *)
  linecomment ::= '#' [^\n]* '\n'; (* Comment goes from '#' to end of line *)
  blockcomment ::= '{' (blockcomment | ((~ '}') sigma))* '}'; (* Properly nesting block comment *)
  (* sp is any number of ignorable units (space or comment) *)
  sp ::= (whitespace | linecomment | blockcomment)*;
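The difference between the two blockcomment rules can be traced with a short SML sketch that follows each rule by hand. This is an illustration only, assuming comments delimited by '{' and '}' as on the slides; `flatComment`, `blockComment`, and `body` are hypothetical names, and each function returns the position just past the comment, or NONE if no complete comment starts at the given position.

```sml
(* Non-nesting rule: '{' ((~ '}') sigma)* '}'
   Scanning stops at the first '}' after the opening brace. *)
fun flatComment (input, pos) =
  let fun scan i =
        if i >= size input then NONE
        else if String.sub (input, i) = #"}" then SOME (i + 1)
        else scan (i + 1)
  in if pos < size input andalso String.sub (input, pos) = #"{"
     then scan (pos + 1)
     else NONE
  end

(* Nesting rule: '{' (blockcomment | ((~ '}') sigma))* '}'
   A '{' inside the body starts a nested comment, which is tried before
   treating the character as ordinary comment text. *)
fun blockComment (input, pos) =
  if pos < size input andalso String.sub (input, pos) = #"{"
  then body (input, pos + 1)
  else NONE

and body (input, pos) =
  if pos >= size input then NONE                      (* unterminated comment *)
  else case String.sub (input, pos) of
           #"}" => SOME (pos + 1)                      (* closes this level *)
         | #"{" => (case blockComment (input, pos) of  (* nested comment *)
                        SOME next => body (input, next)
                      | NONE => NONE)
         | _ => body (input, pos + 1)                  (* any other character *)

val text = "{ a { b } c } x"
val flatEnd = flatComment (text, 0)              (* SOME 9: stops at the inner '}' *)
val nestedEnd = blockComment (text, 0)           (* SOME 13: stops at the matching '}' *)
val unterminated = blockComment ("{ a { b }", 0) (* NONE: outer comment never closes *)
```

With the non-nesting rule, the text after position 9 (" c } x") would be handed back to the tokenizer, which is what the nesting rule avoids.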

Slip-- Programs

  (* Assume tokens unchanged from before *)
  start ::= program;
  outfixing sp do
  appending sp do
    program ::= stm EOF;
    stm ::= <Assign> var:ident GETS exp </Assign>
          | <Print> PRINT exp </Print>
          | <StmList> BEGIN stmlist END </StmList>;
    stmlist ::= (stm SEMI)*;
    exp ::= <Id> name:ident </Id>
          | <Int> value:integer </Int>
          | <BinApp> LPAREN exp binop:op exp RPAREN </BinApp>;
  done;
  done;

Aurochs Errors

  - Fatal error: exception Aurochs.Parse_error(1033)
    This means that the .peg file has an error at the specified character number (1033 in this case). Note: the start position is 0-indexed, whereas the emacs position used by M-x goto-char is 1-indexed.

  - PARSE ERROR IN FILE nesting-block-comment-test.slip AT CHARACTER 12
    This means that the parser encountered an error in parsing the input file at the specified character number.


CS 11 Ocaml track: lecture 6

CS 11 Ocaml track: lecture 6 CS 11 Ocaml track: lecture 6 n Today: n Writing a computer language n Parser generators n lexers (ocamllex) n parsers (ocamlyacc) n Abstract syntax trees Problem (1) n We want to implement a computer language

More information

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012 LR(0) Drawbacks Consider the unambiguous augmented grammar: 0.) S E $ 1.) E T + E 2.) E T 3.) T x If we build the LR(0) DFA table, we find that there is a shift-reduce conflict. This arises because the

More information

CSC 467 Lecture 3: Regular Expressions

CSC 467 Lecture 3: Regular Expressions CSC 467 Lecture 3: Regular Expressions Recall How we build a lexer by hand o Use fgetc/mmap to read input o Use a big switch to match patterns Homework exercise static TokenKind identifier( TokenKind token

More information

Lecture Notes on Bottom-Up LR Parsing

Lecture Notes on Bottom-Up LR Parsing Lecture Notes on Bottom-Up LR Parsing 15-411: Compiler Design Frank Pfenning Lecture 9 1 Introduction In this lecture we discuss a second parsing algorithm that traverses the input string from left to

More information

Problem Set 6 (Optional) Due: 5pm on Thursday, December 20

Problem Set 6 (Optional) Due: 5pm on Thursday, December 20 CS235 Languages and Automata Handout # 13 Prof. Lyn Turbak December 12, 2007 Wellesley College Problem Set 6 (Optional) Due: 5pm on Thursday, December 20 All problems on this problem are completely optional.

More information

Building Compilers with Phoenix

Building Compilers with Phoenix Building Compilers with Phoenix Parser Generators: ANTLR History of ANTLR ANother Tool for Language Recognition Terence Parr's dissertation: Obtaining Practical Variants of LL(k) and LR(k) for k > 1 PCCTS:

More information

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science Syntax and Parsing COMS W4115 Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science Lexical Analysis (Scanning) Lexical Analysis (Scanning) Goal is to translate a stream

More information

COMPILER (CSE 4120) (Lecture 6: Parsing 4 Bottom-up Parsing )

COMPILER (CSE 4120) (Lecture 6: Parsing 4 Bottom-up Parsing ) COMPILR (CS 4120) (Lecture 6: Parsing 4 Bottom-up Parsing ) Sungwon Jung Mobile Computing & Data ngineering Lab Dept. of Computer Science and ngineering Sogang University Seoul, Korea Tel: +82-2-705-8930

More information

Downloaded from Page 1. LR Parsing

Downloaded from  Page 1. LR Parsing Downloaded from http://himadri.cmsdu.org Page 1 LR Parsing We first understand Context Free Grammars. Consider the input string: x+2*y When scanned by a scanner, it produces the following stream of tokens:

More information

Lecture 8: Deterministic Bottom-Up Parsing

Lecture 8: Deterministic Bottom-Up Parsing Lecture 8: Deterministic Bottom-Up Parsing (From slides by G. Necula & R. Bodik) Last modified: Fri Feb 12 13:02:57 2010 CS164: Lecture #8 1 Avoiding nondeterministic choice: LR We ve been looking at general

More information

Compilers. Bottom-up Parsing. (original slides by Sam

Compilers. Bottom-up Parsing. (original slides by Sam Compilers Bottom-up Parsing Yannis Smaragdakis U Athens Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Bottom-Up Parsing More general than top-down parsing And just as efficient Builds

More information

CMSC 350: COMPILER DESIGN

CMSC 350: COMPILER DESIGN Lecture 11 CMSC 350: COMPILER DESIGN see HW3 LLVMLITE SPECIFICATION Eisenberg CMSC 350: Compilers 2 Discussion: Defining a Language Premise: programming languages are purely formal objects We (as language

More information

Syntax Analysis. Tokens ---> Parse Tree. Main Problems. Grammars. Convert the list of tokens into a parse tree ( hierarchical analysis)

Syntax Analysis. Tokens ---> Parse Tree. Main Problems. Grammars. Convert the list of tokens into a parse tree ( hierarchical analysis) Syntax Analysis Convert the list of tokens into a parse tree ( hierarchical analysis) source program lexical analyzer token get next token parser parse tree The syntactic structure is specified using context-free

More information

Abstract Syntax. Mooly Sagiv. html://www.cs.tau.ac.il/~msagiv/courses/wcc06.html

Abstract Syntax. Mooly Sagiv. html://www.cs.tau.ac.il/~msagiv/courses/wcc06.html Abstract Syntax Mooly Sagiv html://www.cs.tau.ac.il/~msagiv/courses/wcc06.html Outline The general idea Cup Motivating example Interpreter for arithmetic expressions The need for abstract syntax Abstract

More information

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis Lexical and Syntactic Analysis Lexical and Syntax Analysis In Text: Chapter 4 Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input characters and output

More information

LECTURE 7. Lex and Intro to Parsing

LECTURE 7. Lex and Intro to Parsing LECTURE 7 Lex and Intro to Parsing LEX Last lecture, we learned a little bit about how we can take our regular expressions (which specify our valid tokens) and create real programs that can recognize them.

More information

Lecture 7: Deterministic Bottom-Up Parsing

Lecture 7: Deterministic Bottom-Up Parsing Lecture 7: Deterministic Bottom-Up Parsing (From slides by G. Necula & R. Bodik) Last modified: Tue Sep 20 12:50:42 2011 CS164: Lecture #7 1 Avoiding nondeterministic choice: LR We ve been looking at general

More information

Lecture Notes on Bottom-Up LR Parsing

Lecture Notes on Bottom-Up LR Parsing Lecture Notes on Bottom-Up LR Parsing 15-411: Compiler Design Frank Pfenning Lecture 9 September 23, 2009 1 Introduction In this lecture we discuss a second parsing algorithm that traverses the input string

More information

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing Parsing Wrapup Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing LR(1) items Computing closure Computing goto LR(1) canonical collection This lecture LR(1) parsing Building ACTION

More information

LECTURE 11. Semantic Analysis and Yacc

LECTURE 11. Semantic Analysis and Yacc LECTURE 11 Semantic Analysis and Yacc REVIEW OF LAST LECTURE In the last lecture, we introduced the basic idea behind semantic analysis. Instead of merely specifying valid structures with a context-free

More information

MP 3 A Lexer for MiniJava

MP 3 A Lexer for MiniJava MP 3 A Lexer for MiniJava CS 421 Spring 2010 Revision 1.0 Assigned Tuesday, February 2, 2010 Due Monday, February 8, at 10:00pm Extension 48 hours (20% penalty) Total points 50 (+5 extra credit) 1 Change

More information

COMP 181. Prelude. Prelude. Summary of parsing. A Hierarchy of Grammar Classes. More power? Syntax-directed translation. Analysis

COMP 181. Prelude. Prelude. Summary of parsing. A Hierarchy of Grammar Classes. More power? Syntax-directed translation. Analysis Prelude COMP 8 October, 9 What is triskaidekaphobia? Fear of the number s? No aisle in airplanes, no th floor in buildings Fear of Friday the th? Paraskevidedekatriaphobia or friggatriskaidekaphobia Why

More information

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators Jeremy R. Johnson 1 Theme We have now seen how to describe syntax using regular expressions and grammars and how to create

More information

Parsing and Pattern Recognition

Parsing and Pattern Recognition Topics in IT 1 Parsing and Pattern Recognition Week 10 Lexical analysis College of Information Science and Engineering Ritsumeikan University 1 this week mid-term evaluation review lexical analysis its

More information

Programming Languages and Compilers (CS 421)

Programming Languages and Compilers (CS 421) Programming Languages and Compilers (CS 421) Elsa L Gunter 2112 SC, UIUC http://courses.engr.illinois.edu/cs421 Based in part on slides by Mattox Beckman, as updated by Vikram Adve and Gul Agha 10/20/16

More information

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019 Comp 411 Principles of Programming Languages Lecture 3 Parsing Corky Cartwright January 11, 2019 Top Down Parsing What is a context-free grammar (CFG)? A recursive definition of a set of strings; it is

More information

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1 Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery Last modified: Mon Feb 23 10:05:56 2015 CS164: Lecture #14 1 Shift/Reduce Conflicts If a DFA state contains both [X: α aβ, b] and [Y: γ, a],

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-17/cc/ Recap: LR(1) Parsing Outline of Lecture 11 Recap: LR(1)

More information

n (0 1)*1 n a*b(a*) n ((01) (10))* n You tell me n Regular expressions (equivalently, regular 10/20/ /20/16 4

n (0 1)*1 n a*b(a*) n ((01) (10))* n You tell me n Regular expressions (equivalently, regular 10/20/ /20/16 4 Regular Expressions Programming Languages and Compilers (CS 421) Elsa L Gunter 2112 SC, UIUC http://courses.engr.illinois.edu/cs421 Based in part on slides by Mattox Beckman, as updated by Vikram Adve

More information

ML 4 A Lexer for OCaml s Type System

ML 4 A Lexer for OCaml s Type System ML 4 A Lexer for OCaml s Type System CS 421 Fall 2017 Revision 1.0 Assigned October 26, 2017 Due November 2, 2017 Extension November 4, 2017 1 Change Log 1.0 Initial Release. 2 Overview To complete this

More information

MP 3 A Lexer for MiniJava

MP 3 A Lexer for MiniJava MP 3 A Lexer for MiniJava CS 421 Spring 2012 Revision 1.0 Assigned Wednesday, February 1, 2012 Due Tuesday, February 7, at 09:30 Extension 48 hours (penalty 20% of total points possible) Total points 43

More information

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F). CS 2210 Sample Midterm 1. Determine if each of the following claims is true (T) or false (F). F A language consists of a set of strings, its grammar structure, and a set of operations. (Note: a language

More information

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers Plan for Today Review main idea syntax-directed evaluation and translation Recall syntax-directed interpretation in recursive descent parsers Syntax-directed evaluation and translation in shift-reduce

More information

Using an LALR(1) Parser Generator

Using an LALR(1) Parser Generator Using an LALR(1) Parser Generator Yacc is an LALR(1) parser generator Developed by S.C. Johnson and others at AT&T Bell Labs Yacc is an acronym for Yet another compiler compiler Yacc generates an integrated

More information

Compilers and Language Processing Tools

Compilers and Language Processing Tools Compilers and Language Processing Tools Summer Term 2011 Prof. Dr. Arnd Poetzsch-Heffter Software Technology Group TU Kaiserslautern c Prof. Dr. Arnd Poetzsch-Heffter 1 Parser Generators c Prof. Dr. Arnd

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-17/cc/ Recap: LR(1) Parsing LR(1) Items and Sets Observation:

More information

Lexical and Syntax Analysis

Lexical and Syntax Analysis Lexical and Syntax Analysis In Text: Chapter 4 N. Meng, F. Poursardar Lexical and Syntactic Analysis Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input

More information

Syntax-Directed Translation. Lecture 14

Syntax-Directed Translation. Lecture 14 Syntax-Directed Translation Lecture 14 (adapted from slides by R. Bodik) 9/27/2006 Prof. Hilfinger, Lecture 14 1 Motivation: parser as a translator syntax-directed translation stream of tokens parser ASTs,

More information

CS664 Compiler Theory and Design LIU 1 of 16 ANTLR. Christopher League* 17 February Figure 1: ANTLR plugin installer

CS664 Compiler Theory and Design LIU 1 of 16 ANTLR. Christopher League* 17 February Figure 1: ANTLR plugin installer CS664 Compiler Theory and Design LIU 1 of 16 ANTLR Christopher League* 17 February 2016 ANTLR is a parser generator. There are other similar tools, such as yacc, flex, bison, etc. We ll be using ANTLR

More information

A simple syntax-directed

A simple syntax-directed Syntax-directed is a grammaroriented compiling technique Programming languages: Syntax: what its programs look like? Semantic: what its programs mean? 1 A simple syntax-directed Lexical Syntax Character

More information

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis Lexical and Syntactic Analysis Lexical and Syntax Analysis In Text: Chapter 4 Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input characters and output

More information

Introduction to Yacc. General Description Input file Output files Parsing conflicts Pseudovariables Examples. Principles of Compilers - 16/03/2006

Introduction to Yacc. General Description Input file Output files Parsing conflicts Pseudovariables Examples. Principles of Compilers - 16/03/2006 Introduction to Yacc General Description Input file Output files Parsing conflicts Pseudovariables Examples General Description A parser generator is a program that takes as input a specification of a

More information

UNIVERSITY OF CALIFORNIA

UNIVERSITY OF CALIFORNIA UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division CS164 Fall 1997 P. N. Hilfinger CS 164: Midterm Name: Please do not discuss the contents of

More information

CSE 401 Midterm Exam Sample Solution 2/11/15

CSE 401 Midterm Exam Sample Solution 2/11/15 Question 1. (10 points) Regular expression warmup. For regular expression questions, you must restrict yourself to the basic regular expression operations covered in class and on homework assignments:

More information

Programming Languages & Compilers. Programming Languages and Compilers (CS 421) Programming Languages & Compilers. Major Phases of a Compiler

Programming Languages & Compilers. Programming Languages and Compilers (CS 421) Programming Languages & Compilers. Major Phases of a Compiler Programming Languages & Compilers Programming Languages and Compilers (CS 421) Three Main Topics of the Course I II III Sasa Misailovic 4110 SC, UIUC https://courses.engr.illinois.edu/cs421/fa2017/cs421a

More information

Programming Languages and Compilers (CS 421)

Programming Languages and Compilers (CS 421) Programming Languages and Compilers (CS 421) Elsa L Gunter 2112 SC, UIUC http://courses.engr.illinois.edu/cs421 Based in part on slides by Mattox Beckman, as updated by Vikram Adve and Gul Agha 10/30/17

More information

Programming Languages & Compilers. Programming Languages and Compilers (CS 421) I. Major Phases of a Compiler. Programming Languages & Compilers

Programming Languages & Compilers. Programming Languages and Compilers (CS 421) I. Major Phases of a Compiler. Programming Languages & Compilers Programming Languages & Compilers Programming Languages and Compilers (CS 421) I Three Main Topics of the Course II III Elsa L Gunter 2112 SC, UIUC http://courses.engr.illinois.edu/cs421 New Programming

More information

ASTs, Objective CAML, and Ocamlyacc

ASTs, Objective CAML, and Ocamlyacc ASTs, Objective CAML, and Ocamlyacc Stephen A. Edwards Columbia University Fall 2012 Parsing and Syntax Trees Parsing decides if the program is part of the language. Not that useful: we want more than

More information

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1 Review of CFGs and Parsing II Bottom-up Parsers Lecture 5 1 Outline Parser Overview op-down Parsers (Covered largely through labs) Bottom-up Parsers 2 he Functionality of the Parser Input: sequence of

More information

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing. Lecture 11-12 Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 9/22/06 Prof. Hilfinger CS164 Lecture 11 1 Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient

More information

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square) CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square) Introduction This semester, through a project split into 3 phases, we are going

More information

Syntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree.

Syntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree. Syntactic Analysis Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree. Compiler Passes Analysis of input program (front-end) character

More information

CSCI312 Principles of Programming Languages!

CSCI312 Principles of Programming Languages! CSCI312 Principles of Programming Languages!! Chapter 3 Regular Expression and Lexer Xu Liu Recap! Copyright 2006 The McGraw-Hill Companies, Inc. Clite: Lexical Syntax! Input: a stream of characters from

More information

CM-Lex and CM-Yacc User s Manual (Version 2.1)

CM-Lex and CM-Yacc User s Manual (Version 2.1) CM-Lex and CM-Yacc User s Manual (Version 2.1) Karl Crary Carnegie Mellon University December 21, 2017 1 CM-Lex CM-Lex is a lexer generator for Standard ML, OCaml, and Haskell. It converts a specification

More information

Syntax-Directed Translation

Syntax-Directed Translation Syntax-Directed Translation ALSU Textbook Chapter 5.1 5.4, 4.8, 4.9 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 What is syntax-directed translation? Definition: The compilation

More information

Alternatives for semantic processing

Alternatives for semantic processing Semantic Processing Copyright c 2000 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies

More information

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table COMPILER CONSTRUCTION Lab 2 Symbol table LABS Lab 3 LR parsing and abstract syntax tree construction using ''bison' Lab 4 Semantic analysis (type checking) PHASES OF A COMPILER Source Program Lab 2 Symtab

More information

Parsing Expression Grammars and Packrat Parsing. Aaron Moss

Parsing Expression Grammars and Packrat Parsing. Aaron Moss Parsing Expression Grammars and Packrat Parsing Aaron Moss References > B. Ford Packrat Parsing: Simple, Powerful, Lazy, Linear Time ICFP (2002) > Parsing Expression Grammars: A Recognition- Based Syntactic

More information

Question Points Score

Question Points Score CS 453 Introduction to Compilers Midterm Examination Spring 2009 March 12, 2009 75 minutes (maximum) Closed Book You may use one side of one sheet (8.5x11) of paper with any notes you like. This exam has

More information

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input. flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input. More often than not, though, you ll want to use flex to generate a scanner that divides

More information

Compiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava.

Compiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava. Syntactic Analysis Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree. Compiler Passes Analysis of input program (front-end) character

More information

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview Tokens and regular expressions Syntax and context-free grammars Grammar derivations More about parse trees Top-down and bottom-up

More information

In One Slide. Outline. LR Parsing. Table Construction

In One Slide. Outline. LR Parsing. Table Construction LR Parsing Table Construction #1 In One Slide An LR(1) parsing table can be constructed automatically from a CFG. An LR(1) item is a pair made up of a production and a lookahead token; it represents a

More information

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing. Lecture 11-12 Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 2/20/08 Prof. Hilfinger CS164 Lecture 11 1 Administrivia Test I during class on 10 March. 2/20/08 Prof. Hilfinger CS164 Lecture 11

More information

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview n Tokens and regular expressions n Syntax and context-free grammars n Grammar derivations n More about parse trees n Top-down and

More information

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1 CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions

More information

Parsing. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.

Parsing. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren. COMP 520 Winter 2015 Parsing COMP 520: Compiler Design (4 credits) Professor Laurie Hendren hendren@cs.mcgill.ca Parsing (1) COMP 520 Winter 2015 Parsing (2) A parser transforms a string of tokens into

More information

Chapter 2 - Programming Language Syntax. September 20, 2017

Chapter 2 - Programming Language Syntax. September 20, 2017 Chapter 2 - Programming Language Syntax September 20, 2017 Specifying Syntax: Regular expressions and context-free grammars Regular expressions are formed by the use of three mechanisms Concatenation Alternation

More information

Wednesday, September 9, 15. Parsers

Wednesday, September 9, 15. Parsers Parsers What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda

More information

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs: What is a parser Parsers A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda

More information

Lecture Notes on Shift-Reduce Parsing

Lecture Notes on Shift-Reduce Parsing Lecture Notes on Shift-Reduce Parsing 15-411: Compiler Design Frank Pfenning, Rob Simmons, André Platzer Lecture 8 September 24, 2015 1 Introduction In this lecture we discuss shift-reduce parsing, which

More information

The Structure of a Syntax-Directed Compiler

The Structure of a Syntax-Directed Compiler Source Program (Character Stream) Scanner Tokens Parser Abstract Syntax Tree Type Checker (AST) Decorated AST Translator Intermediate Representation Symbol Tables Optimizer (IR) IR Code Generator Target

More information

1 Lexical Considerations

1 Lexical Considerations Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Spring 2013 Handout Decaf Language Thursday, Feb 7 The project for the course is to write a compiler

More information

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science Syntax and Parsing COMS W4115 Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science Lexical Analysis (Scanning) Lexical Analysis (Scanning) Translates a stream of characters

More information

A clarification on terminology: Recognizer: accepts or rejects strings in a language. Parser: recognizes and generates parse trees (imminent topic)

A clarification on terminology: Recognizer: accepts or rejects strings in a language. Parser: recognizes and generates parse trees (imminent topic) A clarification on terminology: Recognizer: accepts or rejects strings in a language Parser: recognizes and generates parse trees (imminent topic) Assignment 3: building a recognizer for the Lake expression

More information

Last Time. What do we want? When do we want it? An AST. Now!

Last Time. What do we want? When do we want it? An AST. Now! Java CUP 1 Last Time What do we want? An AST When do we want it? Now! 2 This Time A little review of ASTs The philosophy and use of a Parser Generator 3 Translating Lists CFG IdList -> id IdList comma

More information

Compiler Construction Assignment 3 Spring 2018

Compiler Construction Assignment 3 Spring 2018 Compiler Construction Assignment 3 Spring 2018 Robert van Engelen µc for the JVM µc (micro-c) is a small C-inspired programming language. In this assignment we will implement a compiler in C++ for µc.

More information

Lexical and Syntax Analysis. Top-Down Parsing

Lexical and Syntax Analysis. Top-Down Parsing Lexical and Syntax Analysis Top-Down Parsing Easy for humans to write and understand String of characters Lexemes identified String of tokens Easy for programs to transform Data structure Syntax A syntax

More information

Syntax. 2.1 Terminology

Syntax. 2.1 Terminology Syntax 2 Once you ve learned to program in one language, learning a similar programming language isn t all that hard. But, understanding just how to write in the new language takes looking at examples

More information

S Y N T A X A N A L Y S I S LR

S Y N T A X A N A L Y S I S LR LR parsing There are three commonly used algorithms to build tables for an LR parser: 1. SLR(1) = LR(0) plus use of FOLLOW set to select between actions smallest class of grammars smallest tables (number

More information

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser JavaCC Parser The Compilation Task Input character stream Lexer stream Parser Abstract Syntax Tree Analyser Annotated AST Code Generator Code CC&P 2003 1 CC&P 2003 2 Automated? JavaCC Parser The initial

More information

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468 Parsers Xiaokang Qiu Purdue University ECE 468 August 31, 2018 What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure

More information

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite LALR r Driver Action Table for CSX-Lite Given the GoTo and parser action tables, a Shift/Reduce (LALR) parser is fairly simple: { S 5 9 5 9 void LALRDriver(){ Push(S ); } R S R R R R5 if S S R S R5 while(true){

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing 1 Recall: Architecture of Compilers, Interpreters Source Parser Static Analyzer Intermediate Representation Front End Back

More information

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017 CS 426 Fall 2017 1 Machine Problem 1 Machine Problem 1 CS 426 Compiler Construction Fall Semester 2017 Handed Out: September 6, 2017. Due: September 21, 2017, 5:00 p.m. The machine problems for this semester

More information