Scanners and parsers. A scanner or lexer transforms a string of characters into a string of tokens: uses a combination of deterministic finite

Size: px
Start display at page:

Download "Scanners and parsers. A scanner or lexer transforms a string of characters into a string of tokens: uses a combination of deterministic finite"

Transcription

1 COMP 520 Fall 2008 Scanners and Parsers (1) COMP 520 Fall 2008 Scanners and Parsers (2) Scanners and parsers A scanner or leer transforms a string of characters into a string of tokens: uses a combination of deterministic finite automata (DFA) plus some glue code to make it work can be generated by tools like fle (or le), JFle,... joos.l fle foo.joos le.yy.c gcc scanner tokens COMP 520 Fall 2008 Scanners and Parsers (3) COMP 520 Fall 2008 Scanners and Parsers (4) A parser transforms a string of tokens into a parse tree, according to some grammar: it corresponds to a deterministic push-down automaton plus some glue code to make it work can be generated by bison (or yacc), CUP, ANTLR, SableCC, Beaver, JavaCC,... joos.y Tokens are defined by regular epressions:, the empty set: a language with no strings ε, the empty string a, where a Σ and Σ is our alphabet M N, alternation: either M or N M N, concatenation: M followed by N M, zero or more occurences of M where M and N are both regular epressions. What are M? and M +? bison tokens y.tab.c gcc parser AST We can write regular epressions for the tokens in our source language using standard POSIX notation: simple operators: "*", "/", "+", "-" parentheses: "(", ")" integer constants: 0 ([1-9][0-9]*) entifiers: [a-za-z_][a-za-z0-9_]* white space: [ \t\n]+

2 COMP 520 Fall 2008 Scanners and Parsers (5) COMP 520 Fall 2008 Scanners and Parsers (6) fle accepts a list of regular epressions (rege), converts each rege internally to an NFA (Thompson construction), and then converts each NFA to a DFA (see Appel, Ch. 2): \t\n \t\n * / ( ) a-za-z a-za-z0-9 Given DFAs D 1,..., D n, ordered by the input rule order, the behaviour of a fle-generated scanner on an input string is: while input is not empty do s i := the longest prefi that D i accepts l := ma{ s i if l > 0 then j := min{i : s i = l remove s j from input end perform the j th action else (error case) move one character from input to output end In nglish: The longest initial substring match forms the net token, and it is subject to some action The first rule to match breaks any ties ach DFA has an associated action. Non-matching characters are echoed back COMP 520 Fall 2008 Scanners and Parsers (7) COMP 520 Fall 2008 Scanners and Parsers (8) Why the longest match principle? ample: keywords Why the first match principle? Again ample: keywords [ \t]+ /* ignore */... import return timport... [a-za-z_][a-za-z0-9_]* { yylval.stringconst = (char *)malloc(strlen(yytet)+1) printf(yylval.stringconst,"%s",yytet) return tidntifir Want to match importedfiles as tidntifir(importedfiles) and not as timport tidntifir(edfiles). Because we prefer longer matches, we get the right result. [ \t]+ /* ignore */... continue return tcontinu... [a-za-z_][a-za-z0-9_]* { yylval.stringconst = (char *)malloc(strlen(yytet)+1) printf(yylval.stringconst,"%s",yytet) return tidntifir Want to match continue foo as tcontinu tidntifir(foo) and not as tidntifir(continue) tidntifir(foo). First match rule gives us the right answer: When both tcontinu and tidntifir match, prefer the first.

3 COMP 520 Fall 2008 Scanners and Parsers (9) COMP 520 Fall 2008 Scanners and Parsers (10) When first longest match (flm) is not enough, look-ahead may help. Another eample taken from FORTRAN: Fortran ignores whitespace FORTRAN allows for the following tokens:.q., 363, 363.,.363 flm analysis of 363.Q.363 gives us: tfloat(363) Q tfloat(0.363) What we actually want is: tintgr(363) tq tintgr(363) fle allows us to use look-ahead, using / : 363/.Q. return tintgr 1. DO5I = 1.25 DO5I=1.25 in C: do5i = DO 5 I = 1,25 DO5I=1,25 in C: for(i=1i<25++i){... (5 is interpreted as a line number here) Case 1: flm analysis correct: tid(do5i) tq tral(1.25) Case 2: want: tdo tint(5) tid(i) tq tint(1) tcomma tint(25) Cannot make decision on tdo until we see the comma! Look-ahead comes to the rescue: DO/({letter {digit)*=({letter {digit)*, return tdo COMP 520 Fall 2008 Scanners and Parsers (11) COMP 520 Fall 2008 Scanners and Parsers (12) $ cat print_tokens.l # fle source code /* includes and other arbitrary C code */ %{ #include <stdio.h> /* for printf */ % /* helper definitions */ DIGIT [0-9] /* rege + action rules come after the first */ [ \t\n]+ printf ("white space, length %i\n", yyleng) "*" printf ("times\n") "/" printf ("div\n") "+" printf ("plus\n") "-" printf ("minus\n") "(" printf ("left parenthesis\n") ")" printf ("right parenthesis\n") 0 ([1-9]{DIGIT*) printf ("integer constant: %s\n", yytet) [a-za-z_][a-za-z0-9_]* printf ("entifier: %s\n", yytet) /* user code comes after the second */ main () { yyle () Using fle to create a scanner is really simple: $ emacs print_tokens.l $ fle print_tokens.l $ gcc -o print_tokens le.yy.c -lfl When input a*(b-17) + 5/c: $ echo "a*(b-17) + 5/c"./print_tokens our print tokens scanner outputs: entifier: a times left parenthesis entifier: b minus integer constant: 17 right parenthesis white space, length 1 plus white space, length 1 integer constant: 5 div entifier: c white space, length 1 You should confirm this for yourself!

4 COMP 520 Fall 2008 Scanners and Parsers (13) COMP 520 Fall 2008 Scanners and Parsers (14) Count lines and characters: %{ int lines = 0, chars = 0 % \n lines++ chars++. chars++ main () { yyle () printf ("#lines = %i, #chars = %i\n", lines, chars) Remove vowels and increment integers: %{ #include <stdlib.h> /* for atoi */ #include <stdio.h> /* for printf */ % [aeiouy] /* ignore */ [0-9]+ printf ("%i", atoi (yytet) + 1) main () { yyle () A contet-free grammar is a 4-tuple (V, Σ, R, S), where we have: V, a set of variables (or non-terminals) Σ, a set of terminals such that V Σ = R, a set of rules, where the LHS is a variable in V and the RHS is a string of variables in V and terminals in Σ S V, the start variable CFGs are stronger than regular epressions, and able to epress recursively-defined constructs. ample: we cannot write a regular epression for any number of matched parentheses: (), (()), ((())),... Using a CFG: ( ) ǫ COMP 520 Fall 2008 Scanners and Parsers (15) COMP 520 Fall 2008 Scanners and Parsers (16) Automatic parser generators use CFGs as input and generate parsers using the machinery of a deterministic pushdown automaton. joos.y Simple CFG eample: A a B A ǫ B b B B c Alternatively: A a B ǫ B b B c bison tokens In both cases we specify S = A. Can you write this grammar as a regular epression? y.tab.c gcc parser We can perform a rightmost derivation by repeatedly replacing variables with their RHS until only terminals remain: AST By limiting the kind of CFG allowed, we get efficient parsers. A a B a b B a b b B a b b c

5 COMP 520 Fall 2008 Scanners and Parsers (17) COMP 520 Fall 2008 Scanners and Parsers (18) There are several different grammar formalisms. First, conser BNF (Backus-Naur Form): stmt ::= stmt_epr "" while_stmt block if_stmt while_stmt ::= WHIL "(" epr ")" stmt block ::= "{" stmt_list "" if_stmt ::= IF "(" epr ")" stmt IF "(" epr ")" stmt LS stmt We have four options for stmt list: 1. stmt list ::= stmt list stmt ǫ 0 or more, left-recursive 2. stmt list ::= stmt stmt list ǫ 0 or more, right-recursive 3. stmt list ::= stmt list stmt stmt 1 or more, left-recursive 4. stmt list ::= stmt stmt list stmt 1 or more, right-recursive Second, conser BNF (tended BNF): BNF derivations BNF A A a b b A a A b { a (left-recursive) A a a b a a A a A b b a A A { a b (right-recursive) a a A a a b where { and are like Kleene * s in regular epressions. Using BNF repetition, our four choices for stmt_list become: 1. stmt_list ::= { stmt 2. stmt_list ::= { stmt 3. stmt_list ::= { stmt stmt 4. stmt_list ::= stmt { stmt COMP 520 Fall 2008 Scanners and Parsers (19) COMP 520 Fall 2008 Scanners and Parsers (20) BNF also has an optional-construct. For eample: stmt_list ::= stmt stmt_list stmt could be written as: Third, conser railroad synta diagrams: (thanks rail.sty!)??? stmt_list ::= stmt [ stmt_list ] And similarly: if_stmt ::= IF "(" epr ")" stmt IF "(" epr ")" stmt LS stmt could be written as: if_stmt ::= IF "(" epr ")" stmt [ LS stmt ] where [ and ] are like? in regular epressions.

6 COMP 520 Fall 2008 Scanners and Parsers (21) COMP 520 Fall 2008 Scanners and Parsers (22)??? S S S L S := num L L, S print ( L ) a := 7 b := c + (d := 5 + 6, d) + ( S, ) S (rightmost derivation) S S S := S := + S := + (S, ) S := + (S, ) S := + ( :=, ) S := + ( := +, ) S := + ( := + num, ) S := + ( := num + num, ) S := + ( := num + num, ) := := + ( := num + num, ) := num := + ( := num + num, ) COMP 520 Fall 2008 Scanners and Parsers (23) COMP 520 Fall 2008 Scanners and Parsers (24) S S S L S := num L L, S print ( L ) a := 7 b := c + (d := 5 + 6, d) + S S ( S, ) S := := num + ( S, ) := + A grammar is ambiguous if a sentence has different parse trees: := + + S := + + S := + The above is harmless, but conser: := - - := + * + Clearly, we need to conser associativity and precedence when designing grammars. num num

7 COMP 520 Fall 2008 Scanners and Parsers (25) COMP 520 Fall 2008 Scanners and Parsers (26) An ambiguous grammar: / ( ) num + may be rewritten to become unambiguous: There are fundamentally two kinds of parser: 1) Top-down, predictive or recursive descent parsers. Used in all languages designed by Wirth, e.g. Pascal, Modula, and Oberon. + T T T F F T T T / F F num T T F F ( ) + T T T * F One can (easily) write a predictive parser by hand, or generate one from an LL(k) grammar: Left-to-right parse Leftmost-derivation and F F k symbol lookahead. Algorithm: look at beginning of input (up to k characters) and unambiguously epand leftmost non-terminal. COMP 520 Fall 2008 Scanners and Parsers (27) COMP 520 Fall 2008 Scanners and Parsers (28) 2) Bottom-up parsers. Algorithm: look for a sequence matching RHS and reduce to LHS. Postpone any decision until entire RHS is seen, plus k tokens lookahead. Can write a bottom-up parser by hand (tricky), or generate one from an LR(k) grammar (easy): Left-to-right parse Rightmost-derivation and k symbol lookahead. The -reduce bottom-up parsing technique. 1) tend the grammar with an end-of-file $, introduce fresh start symbol S : S S$ S S S L S := num L L, S print ( L ) + ( S, ) 2) Choose between the following actions: : move first input token to top of stack reduce: replace α on top of stack by X for some rule X α accept: when S is on the stack

8 COMP 520 Fall 2008 Scanners and Parsers (29) COMP 520 Fall 2008 Scanners and Parsers (30) a:=7 b:=c+(d:=5+6,d)$ :=7 b:=c+(d:=5+6,d)$ 7 b:=c+(d:=5+6,d)$ b:=c+(d:=5+6,d)$ b:=c+(d:=5+6,d)$ b:=c+(d:=5+6,d)$ b:=c+(d:=5+6,d)$ :=c+(d:=5+6,d)$ c+(d:=5+6,d)$ +(d:=5+6,d)$ := := num := S S S S := S := S := S := + S := + ( S := + ( S := + ( := S := + ( := num S := + ( := S := + ( := + +(d:=5+6,d)$ (d:=5+6,d)$ d:=5+6,d)$ :=5+6,d)$ S := + ( := + num S := + ( := + S := + ( := S := + ( S S := + ( S, S := + ( S, S := + ( S, S := + ( S, ) S := + S := S S S S$ S 5+6,d)$ +6,d)$ +6,d)$ 6,d)$,d)$,d)$,d)$,d)$ d)$ )$ )$ $ $ $ $ $ num S := num num + S := (S) + S := S SS S S$ accept 0 S S$ 5 num 1 S S S S := 7 ( S, ) 3 S print ( L ) 8 L 4 9 L L, Use a DFA to choose the action the stack only contains DFA states now. Start with the initial state (s1) on the stack. Lookup (stack top, net input symbol): (n): skip net input symbol and push state n reduce(k): rule k is X α pop α times lookup (stack top, X) in table goto(n): push state n accept: report success error: report failure COMP 520 Fall 2008 Scanners and Parsers (31) COMP 520 Fall 2008 Scanners and Parsers (32) DFA terminals non-terminals state num print, + := ( ) $ S L 1 s4 s7 g2 2 s3 a 3 s4 s7 g5 4 s6 5 r1 r1 r1 6 s20 s10 s8 g11 7 s9 8 s4 s7 g12 9 g15 g14 10 r5 r5 r5 r5 r5 11 r2 r2 s16 r2 12 s3 s18 13 r3 r3 r3 14 s19 s13 15 r8 r8 16 s20 s10 s8 g17 17 r6 r6 s16 r6 r6 18 s20 s10 s8 g21 19 s20 s10 s8 g23 20 r4 r4 r4 r4 r4 21 s22 22 r7 r7 r7 r7 r7 23 r9 s16 r9 s 1 a := 7$ (4) s 1 s 4 := 7$ (6) s 1 s 4 s 6 7$ (10) s 1 s 4 s 6 s 10 $ reduce(5): num s 1 s 4 s 6 ////// s 10 $ lookup(s 6,) = goto(11) s 1 s 4 s 6 s 11 $ reduce(2): S := s 1 //// s 4 //// s 6 ////// s 11 $ lookup(s 1,S) = goto(2) s 1 s 2 $ accept rror transitions omitted.

9 COMP 520 Fall 2008 Scanners and Parsers (33) COMP 520 Fall 2008 Scanners and Parsers (34) LR(1) is an algorithm that attempts to construct a parsing table: Left-to-right parse Rightmost-derivation and 1 symbol lookahead. If no conflicts (/reduce, reduce/reduce) arise, then we are happy otherwise, fi grammar. An LR(1) item (A α. βγ, ) consists of 1. A grammar production, A αβγ 2. The RHS position, represented by. 3. A lookahead symbol, We first compute a set of LR(1) states from our grammar, and then use them to build a parse table. There are four kinds of entry to make: 1. goto: when β is non-terminal 2. : when β is terminal 3. reduce: when β is empty (the net state is the number of the production used) 4. accept: when we have A B. $ Follow construction on the tiny grammar: 0 S $ 2 T 1 T + 3 T An LR(1) state is a set of LR(1) items. The sequence α is on top of the stack, and the head of the input is derivable from βγ. There are two cases for β, terminal or non-terminal. COMP 520 Fall 2008 Scanners and Parsers (35) COMP 520 Fall 2008 Scanners and Parsers (36) Constructing the LR(1) NFA: start with state S. $? state A α. B β l has: ǫ-successor B. γ, if: eists rule B γ, and lookahead(β) B-successor A α B. β l state A α. β l has: -successor A α. β l Constructing the LR(1) DFA: Standard power-set construction, inlining ǫ-transitions. 1 S.$? S.$? 2.T + $.T $ T T. + T.+ $ 3 T. $ T. $ 5 T. + T. $ 6 T+. $ + T + $ T 1 s5 g2 g3 2 a 3 s4 r2 4 s5 g6 g3 5 r3 r3 6 r1 T +. $.T + $.T $ T. $ T. + 4

10 COMP 520 Fall 2008 Scanners and Parsers (37) COMP 520 Fall 2008 Scanners and Parsers (38) Conflicts LR(1) tables may become very large. A.B A C. y no conflict (lookahead deces) Parser generators use LALR(1), which merges states that are entical ecept for lookaheads. A.B A C. /reduce conflict LL(k) LL(1) LR(k) LR(1) A. A C. y /reduce conflict LALR(1) A B. A C. reduce/reduce conflict LL(0) SLR LR(0) A.B A.C B C s i s j / conflict? by construction of the DFA we have s i = s j COMP 520 Fall 2008 Scanners and Parsers (39) COMP 520 Fall 2008 Scanners and Parsers (40) bison (yacc) is a parser generator: it inputs a grammar it computes an LALR(1) parser table it reports conflicts it resolves conflicts using defaults (!) and it creates a C program. Nobody writes (simple) parsers by hand anymore. The grammar: 1 4 / 7 ( ) 2 num is epressed in bison as: %{ /* C declarations */ % /* Bison declarations tokens come from leer (scanner) */ %token tidntifir tintconst %start ep /* Grammar rules after the first */ ep : tidntifir tintconst ep * ep ep / ep ep + ep ep - ep ( ep ) /* User C code after the second */ Input this code into ep.y to follow the eample.

11 COMP 520 Fall 2008 Scanners and Parsers (41) COMP 520 Fall 2008 Scanners and Parsers (42) The grammar is ambiguous: $ bison --verbose ep.y # --verbose produces ep.output ep.y contains 16 /reduce conflicts. $ cat ep.output State 11 contains 4 /reduce conflicts. State 12 contains 4 /reduce conflicts. State 13 contains 4 /reduce conflicts. State 14 contains 4 /reduce conflicts. [...] state 11 ep -> ep. * ep (rule 3) ep -> ep * ep. (rule 3) <-- problem is here ep -> ep. / ep (rule 4) ep -> ep. + ep (rule 5) ep -> ep. - ep (rule 6) *, and go to state 6 /, and go to state 7 +, and go to state 8 -, and go to state 9 * [reduce using rule 3 (ep)] / [reduce using rule 3 (ep)] + [reduce using rule 3 (ep)] - [reduce using rule 3 (ep)] $default reduce using rule 3 (ep) Rewrite the grammar to force reductions: + T T T F F - T T T / F F num T T F F ( ) %token tidntifir tintconst %start ep ep : ep + term ep - term term term : term * factor term / factor factor factor : tidntifir tintconst ( ep ) COMP 520 Fall 2008 Scanners and Parsers (43) COMP 520 Fall 2008 Scanners and Parsers (44) Or use precedence directives: %token tidntifir tintconst %start ep %left + - /* left-associative, lower precedence */ %left * / /* left-associative, higher precedence */ ep : tidntifir tintconst ep * ep ep / ep ep + ep ep - ep ( ep ) which resolve /reduce conflicts: Conflict in state 11 between rule 5 and token + resolved as reduce. <-- Reduce ep + ep. + Conflict in state 11 between rule 5 and token - resolved as reduce. <-- Reduce ep + ep. - Conflict in state 11 between rule 5 and token * resolved as. <-- Shift ep + ep. * Conflict in state 11 between rule 5 and token / resolved as. <-- Shift ep + ep. / The precedence directives are: %left (left-associative) %right (right-associative) %nonassoc (non-associative) When constructing a parse table, an action is chosen based on the precedence of the last symbol on the right-hand se of the rule. Precedences are ordered from lowest to highest on a linewise basis. If precedences are equal, then: %left favors reducing %right favors ing %nonassoc yields an error This usually ends up working. Note that this is not the same state 11 as before.

12 COMP 520 Fall 2008 Scanners and Parsers (45) COMP 520 Fall 2008 Scanners and Parsers (46) state 0 tidntifir, and go to state 1 tintconst, and go to state 2 (, and go to state 3 ep go to state 4 state 1 ep -> tidntifir. (rule 1) $default reduce using rule 1 (ep) state 2 ep -> tintconst. (rule 2) $default reduce using rule 2 (ep)... state 14 ep -> ep. * ep (rule 3) ep -> ep. / ep (rule 4) ep -> ep / ep. (rule 4) ep -> ep. + ep (rule 5) ep -> ep. - ep (rule 6) $default reduce using rule 4 (ep) state 15 $ go to state 16 state 16 $default accept $ cat ep.y %{ #include <stdio.h> /* for printf */ etern char *yytet /* string from scanner */ vo yyerror() { printf ("synta error before %s\n", yytet) % %union { int intconst char *stringconst %token <intconst> tintconst %token <stringconst> tidntifir %start ep %left + - %left * / ep : tidntifir { printf ("load %s\n", $1) tintconst { printf ("push %i\n", $1) ep * ep { printf ("mult\n") ep / ep { printf ("div\n") ep + ep { printf ("plus\n") ep - ep { printf ("minus\n") ( ep ) { COMP 520 Fall 2008 Scanners and Parsers (47) COMP 520 Fall 2008 Scanners and Parsers (48) $ cat ep.l %{ #include "y.tab.h" /* for ep.y types */ #include <string.h> /* for strlen */ #include <stdlib.h> /* for malloc and atoi */ % [ \t\n]+ /* ignore */ "*" return * "/" return / "+" return + "-" return - "(" return ( ")" return ) 0 ([1-9][0-9]*) { yylval.intconst = atoi (yytet) return tintconst [a-za-z_][a-za-z0-9_]* { yylval.stringconst = (char *) malloc (strlen (yytet) + 1) sprintf (yylval.stringconst, "%s", yytet) return tidntifir. /* ignore */ $ cat main.c vo yyparse() int main (vo) { yyparse () Using fle/bison to create a parser is simple: $ fle ep.l $ bison --yacc --defines ep.y # note compatability options $ gcc le.yy.c y.tab.c y.tab.h main.c -o ep -lfl When input a*(b-17) + 5/c: $ echo "a*(b-17) + 5/c"./ep our ep parser outputs the correct order of operations: load a load b push 17 minus mult push 5 load c div plus You should confirm this for yourself!

13 COMP 520 Fall 2008 Scanners and Parsers (49) COMP 520 Fall 2008 Scanners and Parsers (50) If the input contains synta errors, then the bison-generated parser calls yyerror and stops. We may ask it to recover from the error: ep : tidntifir { printf ("load %s\n", $1)... ( ep ) error { yyerror() and on input a@(b-17) ++ 5/c get the output: load a synta error before ( synta error before ( synta error before ( synta error before b push 17 minus synta error before ) synta error before ) synta error before + plus push 5 load c div plus SableCC (by tienne Gagnon, McGill alumnus) is a compiler compiler: it takes a grammatical description of the source language as input, and generates a leer (scanner) and parser for it. joos.sablecc SableCC joos/*.java javac foo.joos scanner& parser CST/AST rror recovery hardly ever works. COMP 520 Fall 2008 Scanners and Parsers (51) COMP 520 Fall 2008 Scanners and Parsers (52) The SableCC 2 grammar for our Tiny language: Package tiny Helpers tab = 9 cr = 13 lf = 10 digit = [ ] lowercase = [ a.. z ] uppercase = [ A.. Z ] letter = lowercase uppercase letter = letter _ char = letter _ digit Tokens eol = cr lf cr lf blank = tab star = * slash = / plus = + minus = - l_par = ( r_par = ) number = 0 [digit- 0 ] digit* = letter char* Productions ep = {plus ep plus factor {minus ep minus factor {factor factor factor = {mult factor star term {divd factor slash term {term term term = {paren l_par ep r_par { {number number Version 2 produces parse trees, a.k.a. concrete synta trees (CSTs). Ignored Tokens blank, eol

14 COMP 520 Fall 2008 Scanners and Parsers (53) The SableCC 3 grammar for our Tiny language: Productions cst_ep {-> ep = {cst_plus cst_ep plus factor {-> New ep.plus(cst_ep.ep,factor.ep) {cst_minus cst_ep minus factor {-> New ep.minus(cst_ep.ep,factor.ep) {factor factor {-> factor.ep factor {-> ep = {cst_mult factor star term {-> New ep.mult(factor.ep,term.ep) {cst_divd factor slash term {-> New ep.divd(factor.ep,term.ep) {term term {-> term.ep term {-> ep = {paren l_par cst_ep r_par {-> cst_ep.ep {cst_ {-> New ep.() {cst_number number {-> New ep.number(number) Abstract Synta Tree ep = {plus [l]:ep [r]:ep {minus [l]:ep [r]:ep {mult [l]:ep [r]:ep {divd [l]:ep [r]:ep { {number number Version 3 generates abstract synta trees (ASTs).

Parsing. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.

Parsing. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren. COMP 520 Winter 2015 Parsing COMP 520: Compiler Design (4 credits) Professor Laurie Hendren hendren@cs.mcgill.ca Parsing (1) COMP 520 Winter 2015 Parsing (2) A parser transforms a string of tokens into

More information

COMP 520 Fall 2012 Scanners and Parsers (1) Scanners and parsers

COMP 520 Fall 2012 Scanners and Parsers (1) Scanners and parsers COMP 520 Fall 2012 Scanners and Parsers (1) Scanners and parsers COMP 520 Fall 2012 Scanners and Parsers (2) A scanner or lexer transforms a string of characters into a string of tokens: uses a combination

More information

Scanning. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.

Scanning. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren. COMP 520 Winter 2016 Scanning COMP 520: Compiler Design (4 credits) Professor Laurie Hendren hendren@cs.mcgill.ca Scanning (1) COMP 520 Winter 2016 Scanning (2) Readings Crafting a Compiler: Chapter 2,

More information

Scanning. COMP 520: Compiler Design (4 credits) Alexander Krolik MWF 13:30-14:30, MD 279

Scanning. COMP 520: Compiler Design (4 credits) Alexander Krolik MWF 13:30-14:30, MD 279 COMP 520 Winter 2017 Scanning COMP 520: Compiler Design (4 credits) Alexander Krolik alexander.krolik@mail.mcgill.ca MWF 13:30-14:30, MD 279 Scanning (1) COMP 520 Winter 2017 Scanning (2) Announcements

More information

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised: EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)

More information

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview n Tokens and regular expressions n Syntax and context-free grammars n Grammar derivations n More about parse trees n Top-down and

More information

Lecture 7: Deterministic Bottom-Up Parsing

Lecture 7: Deterministic Bottom-Up Parsing Lecture 7: Deterministic Bottom-Up Parsing (From slides by G. Necula & R. Bodik) Last modified: Tue Sep 20 12:50:42 2011 CS164: Lecture #7 1 Avoiding nondeterministic choice: LR We ve been looking at general

More information

Lecture 8: Deterministic Bottom-Up Parsing

Lecture 8: Deterministic Bottom-Up Parsing Lecture 8: Deterministic Bottom-Up Parsing (From slides by G. Necula & R. Bodik) Last modified: Fri Feb 12 13:02:57 2010 CS164: Lecture #8 1 Avoiding nondeterministic choice: LR We ve been looking at general

More information

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview Tokens and regular expressions Syntax and context-free grammars Grammar derivations More about parse trees Top-down and bottom-up

More information

Part 5 Program Analysis Principles and Techniques

Part 5 Program Analysis Principles and Techniques 1 Part 5 Program Analysis Principles and Techniques Front end 2 source code scanner tokens parser il errors Responsibilities: Recognize legal programs Report errors Produce il Preliminary storage map Shape

More information

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing Roadmap > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing The role of the parser > performs context-free syntax analysis > guides

More information

Wednesday, September 9, 15. Parsers

Wednesday, September 9, 15. Parsers Parsers What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda

More information

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs: What is a parser Parsers A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda

More information

A Bison Manual. You build a text file of the production (format in the next section); traditionally this file ends in.y, although bison doesn t care.

A Bison Manual. You build a text file of the production (format in the next section); traditionally this file ends in.y, although bison doesn t care. A Bison Manual 1 Overview Bison (and its predecessor yacc) is a tool that take a file of the productions for a context-free grammar and converts them into the tables for an LALR(1) parser. Bison produces

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing 1 Recall: Architecture of Compilers, Interpreters Source Parser Static Analyzer Intermediate Representation Front End Back

More information

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F). CS 2210 Sample Midterm 1. Determine if each of the following claims is true (T) or false (F). F A language consists of a set of strings, its grammar structure, and a set of operations. (Note: a language

More information

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing. Lecture 11-12 Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 9/22/06 Prof. Hilfinger CS164 Lecture 11 1 Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient

More information

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing. Lecture 11-12 Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 2/20/08 Prof. Hilfinger CS164 Lecture 11 1 Administrivia Test I during class on 10 March. 2/20/08 Prof. Hilfinger CS164 Lecture 11

More information

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1 Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery Last modified: Mon Feb 23 10:05:56 2015 CS164: Lecture #14 1 Shift/Reduce Conflicts If a DFA state contains both [X: α aβ, b] and [Y: γ, a],

More information

shift-reduce parsing

shift-reduce parsing Parsing #2 Bottom-up Parsing Rightmost derivations; use of rules from right to left Uses a stack to push symbols the concatenation of the stack symbols with the rest of the input forms a valid bottom-up

More information

Context-free grammars

Context-free grammars Context-free grammars Section 4.2 Formal way of specifying rules about the structure/syntax of a program terminals - tokens non-terminals - represent higher-level structures of a program start symbol,

More information

CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages. Context Free Grammars CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468 Parsers Xiaokang Qiu Purdue University ECE 468 August 31, 2018 What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure

More information

Wednesday, August 31, Parsers

Wednesday, August 31, Parsers Parsers How do we combine tokens? Combine tokens ( words in a language) to form programs ( sentences in a language) Not all combinations of tokens are correct programs (not all sentences are grammatically

More information

Compiler Construction: Parsing

Compiler Construction: Parsing Compiler Construction: Parsing Mandar Mitra Indian Statistical Institute M. Mitra (ISI) Parsing 1 / 33 Context-free grammars. Reference: Section 4.2 Formal way of specifying rules about the structure/syntax

More information

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing Parsing Wrapup Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing LR(1) items Computing closure Computing goto LR(1) canonical collection This lecture LR(1) parsing Building ACTION

More information

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc. Syntax Syntax Syntax defines what is grammatically valid in a programming language Set of grammatical rules E.g. in English, a sentence cannot begin with a period Must be formal and exact or there will

More information

COMPILER (CSE 4120) (Lecture 6: Parsing 4 Bottom-up Parsing )

COMPILER (CSE 4120) (Lecture 6: Parsing 4 Bottom-up Parsing ) COMPILR (CS 4120) (Lecture 6: Parsing 4 Bottom-up Parsing ) Sungwon Jung Mobile Computing & Data ngineering Lab Dept. of Computer Science and ngineering Sogang University Seoul, Korea Tel: +82-2-705-8930

More information

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax Susan Eggers 1 CSE 401 Syntax Analysis/Parsing Context-free grammars (CFG s) Purpose: determine if tokens have the right form for the language (right syntactic structure) stream of tokens abstract syntax

More information

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1 CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions

More information

CSE 3302 Programming Languages Lecture 2: Syntax

CSE 3302 Programming Languages Lecture 2: Syntax CSE 3302 Programming Languages Lecture 2: Syntax (based on slides by Chengkai Li) Leonidas Fegaras University of Texas at Arlington CSE 3302 L2 Spring 2011 1 How do we define a PL? Specifying a PL: Syntax:

More information

UNIT III & IV. Bottom up parsing

UNIT III & IV. Bottom up parsing UNIT III & IV Bottom up parsing 5.0 Introduction Given a grammar and a sentence belonging to that grammar, if we have to show that the given sentence belongs to the given grammar, there are two methods.

More information

Context-free grammars (CFG s)

Context-free grammars (CFG s) Syntax Analysis/Parsing Purpose: determine if tokens have the right form for the language (right syntactic structure) stream of tokens abstract syntax tree (AST) AST: captures hierarchical structure of

More information

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology MIT 6.035 Parse Table Construction Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Parse Tables (Review) ACTION Goto State ( ) $ X s0 shift to s2 error error goto s1

More information

Monday, September 13, Parsers

Monday, September 13, Parsers Parsers Agenda Terminology LL(1) Parsers Overview of LR Parsing Terminology Grammar G = (Vt, Vn, S, P) Vt is the set of terminals Vn is the set of non-terminals S is the start symbol P is the set of productions

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science Prelude COMP Lecture Topdown Parsing September, 00 What is the Tufts mascot? Jumbo the elephant Why? P. T. Barnum was an original trustee of Tufts : donated $0,000 for a natural museum on campus Barnum

More information

Lexical and Syntax Analysis. Bottom-Up Parsing

Lexical and Syntax Analysis. Bottom-Up Parsing Lexical and Syntax Analysis Bottom-Up Parsing Parsing There are two ways to construct derivation of a grammar. Top-Down: begin with start symbol; repeatedly replace an instance of a production s LHS with

More information

Conflicts in LR Parsing and More LR Parsing Types

Conflicts in LR Parsing and More LR Parsing Types Conflicts in LR Parsing and More LR Parsing Types Lecture 10 Dr. Sean Peisert ECS 142 Spring 2009 1 Status Project 2 Due Friday, Apr. 24, 11:55pm The usual lecture time is being replaced by a discussion

More information

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;} Compiler Construction Grammars Parsing source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2012 01 23 2012 parse tree AST builder (implicit)

More information

3. Parsing. Oscar Nierstrasz

3. Parsing. Oscar Nierstrasz 3. Parsing Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/ http://www.cs.purdue.edu/homes/hosking/

More information

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1 CS453 : JavaCUP and error recovery CS453 Shift-reduce Parsing 1 Shift-reduce parsing in an LR parser LR(k) parser Left-to-right parse Right-most derivation K-token look ahead LR parsing algorithm using

More information

Chapter 3: Lexing and Parsing

Chapter 3: Lexing and Parsing Chapter 3: Lexing and Parsing Aarne Ranta Slides for the book Implementing Programming Languages. An Introduction to Compilers and Interpreters, College Publications, 2012. Lexing and Parsing* Deeper understanding

More information

CPS 506 Comparative Programming Languages. Syntax Specification

CPS 506 Comparative Programming Languages. Syntax Specification CPS 506 Comparative Programming Languages Syntax Specification Compiling Process Steps Program Lexical Analysis Convert characters into a stream of tokens Lexical Analysis Syntactic Analysis Send tokens

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science Syntax and Parsing COMS W4115 Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science Lexical Analysis (Scanning) Lexical Analysis (Scanning) Goal is to translate a stream

More information

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End Architecture of Compilers, Interpreters : Organization of Programming Languages ource Analyzer Optimizer Code Generator Context Free Grammars Intermediate Representation Front End Back End Compiler / Interpreter

More information

CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages. Context Free Grammars CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Compiler Passes Analysis of input program (front-end) character stream

More information

Compiler Construction 2016/2017 Syntax Analysis

Compiler Construction 2016/2017 Syntax Analysis Compiler Construction 2016/2017 Syntax Analysis Peter Thiemann November 2, 2016 Outline 1 Syntax Analysis Recursive top-down parsing Nonrecursive top-down parsing Bottom-up parsing Syntax Analysis tokens

More information

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States TDDD16 Compilers and Interpreters TDDB44 Compiler Construction LR Parsing, Part 2 Constructing Parse Tables Parse table construction Grammar conflict handling Categories of LR Grammars and Parsers An NFA

More information

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built) Programming languages must be precise Remember instructions This is unlike natural languages CS 315 Programming Languages Syntax Precision is required for syntax think of this as the format of the language

More information

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers. Part III : Parsing From Regular to Context-Free Grammars Deriving a Parser from a Context-Free Grammar Scanners and Parsers A Parser for EBNF Left-Parsable Grammars Martin Odersky, LAMP/DI 1 From Regular

More information

4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis 4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal

More information

CS 4120 Introduction to Compilers

CS 4120 Introduction to Compilers CS 4120 Introduction to Compilers Andrew Myers Cornell University Lecture 6: Bottom-Up Parsing 9/9/09 Bottom-up parsing A more powerful parsing technology LR grammars -- more expressive than LL can handle

More information

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters : Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Scanner Parser Static Analyzer Intermediate Representation Front End Back End Compiler / Interpreter

More information

In One Slide. Outline. LR Parsing. Table Construction

In One Slide. Outline. LR Parsing. Table Construction LR Parsing Table Construction #1 In One Slide An LR(1) parsing table can be constructed automatically from a CFG. An LR(1) item is a pair made up of a production and a lookahead token; it represents a

More information

4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis 4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal

More information

Chapter 2 - Programming Language Syntax. September 20, 2017

Chapter 2 - Programming Language Syntax. September 20, 2017 Chapter 2 - Programming Language Syntax September 20, 2017 Specifying Syntax: Regular expressions and context-free grammars Regular expressions are formed by the use of three mechanisms Concatenation Alternation

More information

CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages. Context Free Grammars CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

Chapter 2 :: Programming Language Syntax

Chapter 2 :: Programming Language Syntax Chapter 2 :: Programming Language Syntax Michael L. Scott kkman@sangji.ac.kr, 2015 1 Regular Expressions A regular expression is one of the following: A character The empty string, denoted by Two regular

More information

COP 3402 Systems Software Syntax Analysis (Parser)

COP 3402 Systems Software Syntax Analysis (Parser) COP 3402 Systems Software Syntax Analysis (Parser) Syntax Analysis 1 Outline 1. Definition of Parsing 2. Context Free Grammars 3. Ambiguous/Unambiguous Grammars Syntax Analysis 2 Lexical and Syntax Analysis

More information

Chapter 4. Lexical and Syntax Analysis

Chapter 4. Lexical and Syntax Analysis Chapter 4 Lexical and Syntax Analysis Chapter 4 Topics Introduction Lexical Analysis The Parsing Problem Recursive-Descent Parsing Bottom-Up Parsing Copyright 2012 Addison-Wesley. All rights reserved.

More information

Syntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree.

Syntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree. Syntactic Analysis Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree. Compiler Passes Analysis of input program (front-end) character

More information

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012 LR(0) Drawbacks Consider the unambiguous augmented grammar: 0.) S E $ 1.) E T + E 2.) E T 3.) T x If we build the LR(0) DFA table, we find that there is a shift-reduce conflict. This arises because the

More information

S Y N T A X A N A L Y S I S LR

S Y N T A X A N A L Y S I S LR LR parsing There are three commonly used algorithms to build tables for an LR parser: 1. SLR(1) = LR(0) plus use of FOLLOW set to select between actions smallest class of grammars smallest tables (number

More information

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications Agenda for Today Regular Expressions CSE 413, Autumn 2005 Programming Languages Basic concepts of formal grammars Regular expressions Lexical specification of programming languages Using finite automata

More information

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science Syntax and Parsing COMS W4115 Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science Lexical Analysis (Scanning) Lexical Analysis (Scanning) Translates a stream of characters

More information

Compiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava.

Compiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava. Syntactic Analysis Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree. Compiler Passes Analysis of input program (front-end) character

More information

Syntax Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Syntax Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Syntax Analysis (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay September 2007 College of Engineering, Pune Syntax Analysis: 2/124 Syntax

More information

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University Compilation 2012 Parsers and Scanners Jan Midtgaard Michael I. Schwartzbach Aarhus University Context-Free Grammars Example: sentence subject verb object subject person person John Joe Zacharias verb asked

More information

LECTURE 11. Semantic Analysis and Yacc

LECTURE 11. Semantic Analysis and Yacc LECTURE 11 Semantic Analysis and Yacc REVIEW OF LAST LECTURE In the last lecture, we introduced the basic idea behind semantic analysis. Instead of merely specifying valid structures with a context-free

More information

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing Also known as Shift-Reduce parsing More powerful than top down Don t need left factored grammars Can handle left recursion Attempt to construct parse tree from an input string eginning at leaves and working

More information

LR Parsing LALR Parser Generators

LR Parsing LALR Parser Generators LR Parsing LALR Parser Generators Outline Review of bottom-up parsing Computing the parsing DFA Using parser generators 2 Bottom-up Parsing (Review) A bottom-up parser rewrites the input string to the

More information

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators Jeremy R. Johnson 1 Theme We have now seen how to describe syntax using regular expressions and grammars and how to create

More information

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised: EDA180: Compiler Construc6on Context- free grammars Görel Hedin Revised: 2013-01- 28 Compiler phases and program representa6ons source code Lexical analysis (scanning) Intermediate code genera6on tokens

More information

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam Compilers Parsing Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Next step text chars Lexical analyzer tokens Parser IR Errors Parsing: Organize tokens into sentences Do tokens conform

More information

Chapter 3. Describing Syntax and Semantics ISBN

Chapter 3. Describing Syntax and Semantics ISBN Chapter 3 Describing Syntax and Semantics ISBN 0-321-49362-1 Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Copyright 2009 Addison-Wesley. All

More information

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective Concepts Lexical scanning Regular expressions DFAs and FSAs Lex CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 2 Lexical analysis

More information

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

CS2210: Compiler Construction Syntax Analysis Syntax Analysis Comparison with Lexical Analysis The second phase of compilation Phase Input Output Lexer string of characters string of tokens Parser string of tokens Parse tree/ast What Parse Tree? CS2210: Compiler

More information

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino 3. Syntax Analysis Andrea Polini Formal Languages and Compilers Master in Computer Science University of Camerino (Formal Languages and Compilers) 3. Syntax Analysis CS@UNICAM 1 / 54 Syntax Analysis: the

More information

Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park

Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park Prerequisite Ubuntu Version 14.04 or over Virtual machine for Windows user or native OS flex bison gcc Version 4.7 or over Install in Ubuntu

More information

Using an LALR(1) Parser Generator

Using an LALR(1) Parser Generator Using an LALR(1) Parser Generator Yacc is an LALR(1) parser generator Developed by S.C. Johnson and others at AT&T Bell Labs Yacc is an acronym for Yet another compiler compiler Yacc generates an integrated

More information

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011 Syntax Analysis COMP 524: Programming Languages Srinivas Krishnan January 25, 2011 Based in part on slides and notes by Bjoern Brandenburg, S. Olivier and A. Block. 1 The Big Picture Character Stream Token

More information

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh. Bottom-Up Parsing II Different types of Shift-Reduce Conflicts) Lecture 10 Ganesh. Lecture 10) 1 Review: Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient Doesn

More information

Programming Language Syntax and Analysis

Programming Language Syntax and Analysis Programming Language Syntax and Analysis 2017 Kwangman Ko (http://compiler.sangji.ac.kr, kkman@sangji.ac.kr) Dept. of Computer Engineering, Sangji University Introduction Syntax the form or structure of

More information

CS415 Compilers. LR Parsing & Error Recovery

CS415 Compilers. LR Parsing & Error Recovery CS415 Compilers LR Parsing & Error Recovery These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Review: LR(k) items The LR(1) table construction

More information

Topic 5: Syntax Analysis III

Topic 5: Syntax Analysis III Topic 5: Syntax Analysis III Compiler Design Prof. Hanjun Kim CoreLab (Compiler Research Lab) POSTECH 1 Back-End Front-End The Front End Source Program Lexical Analysis Syntax Analysis Semantic Analysis

More information

LR Parsing LALR Parser Generators

LR Parsing LALR Parser Generators Outline LR Parsing LALR Parser Generators Review of bottom-up parsing Computing the parsing DFA Using parser generators 2 Bottom-up Parsing (Review) A bottom-up parser rewrites the input string to the

More information

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

COP 3402 Systems Software Top Down Parsing (Recursive Descent) COP 3402 Systems Software Top Down Parsing (Recursive Descent) Top Down Parsing 1 Outline 1. Top down parsing and LL(k) parsing 2. Recursive descent parsing 3. Example of recursive descent parsing of arithmetic

More information

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars CMSC 330: Organization of Programming Languages Context Free Grammars Where We Are Programming languages Ruby OCaml Implementing programming languages Scanner Uses regular expressions Finite automata Parser

More information

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott Introduction programming languages need to be precise natural languages less so both form (syntax) and meaning

More information

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

More information

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence. Bottom-up parsing Recall For a grammar G, with start symbol S, any string α such that S α is a sentential form If α V t, then α is a sentence in L(G) A left-sentential form is a sentential form that occurs

More information

Compilers. Bottom-up Parsing. (original slides by Sam

Compilers. Bottom-up Parsing. (original slides by Sam Compilers Bottom-up Parsing Yannis Smaragdakis U Athens Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Bottom-Up Parsing More general than top-down parsing And just as efficient Builds

More information

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38 Syntax Analysis Martin Sulzmann Martin Sulzmann Syntax Analysis 1 / 38 Syntax Analysis Objective Recognize individual tokens as sentences of a language (beyond regular languages). Example 1 (OK) Program

More information

CS453 : Shift Reduce Parsing Unambiguous Grammars LR(0) and SLR Parse Tables by Wim Bohm and Michelle Strout. CS453 Shift-reduce Parsing 1

CS453 : Shift Reduce Parsing Unambiguous Grammars LR(0) and SLR Parse Tables by Wim Bohm and Michelle Strout. CS453 Shift-reduce Parsing 1 CS453 : Shift Reduce Parsing Unambiguous Grammars LR(0) and SLR Parse Tables by Wim Bohm and Michelle Strout CS453 Shift-reduce Parsing 1 Plan for Today Finish PA1 this week Friday recitation: help with

More information

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous. Section A 1. What do you meant by parser and its types? A parser for grammar G is a program that takes as input a string w and produces as output either a parse tree for w, if w is a sentence of G, or

More information

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Syntax Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Limits of Regular Languages Advantages of Regular Expressions

More information

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis Chapter 4 Lexical and Syntax Analysis Introduction - Language implementation systems must analyze source code, regardless of the specific implementation approach - Nearly all syntax analysis is based on

More information