Scanners and parsers. A scanner or lexer transforms a string of characters into a string of tokens: uses a combination of deterministic finite
|
|
- Neil Cooper
- 5 years ago
- Views:
Transcription
1 COMP 520 Fall 2008 Scanners and Parsers (1) COMP 520 Fall 2008 Scanners and Parsers (2) Scanners and parsers A scanner or leer transforms a string of characters into a string of tokens: uses a combination of deterministic finite automata (DFA) plus some glue code to make it work can be generated by tools like fle (or le), JFle,... joos.l fle foo.joos le.yy.c gcc scanner tokens COMP 520 Fall 2008 Scanners and Parsers (3) COMP 520 Fall 2008 Scanners and Parsers (4) A parser transforms a string of tokens into a parse tree, according to some grammar: it corresponds to a deterministic push-down automaton plus some glue code to make it work can be generated by bison (or yacc), CUP, ANTLR, SableCC, Beaver, JavaCC,... joos.y Tokens are defined by regular epressions:, the empty set: a language with no strings ε, the empty string a, where a Σ and Σ is our alphabet M N, alternation: either M or N M N, concatenation: M followed by N M, zero or more occurences of M where M and N are both regular epressions. What are M? and M +? bison tokens y.tab.c gcc parser AST We can write regular epressions for the tokens in our source language using standard POSIX notation: simple operators: "*", "/", "+", "-" parentheses: "(", ")" integer constants: 0 ([1-9][0-9]*) entifiers: [a-za-z_][a-za-z0-9_]* white space: [ \t\n]+
2 COMP 520 Fall 2008 Scanners and Parsers (5) COMP 520 Fall 2008 Scanners and Parsers (6) fle accepts a list of regular epressions (rege), converts each rege internally to an NFA (Thompson construction), and then converts each NFA to a DFA (see Appel, Ch. 2): \t\n \t\n * / ( ) a-za-z a-za-z0-9 Given DFAs D 1,..., D n, ordered by the input rule order, the behaviour of a fle-generated scanner on an input string is: while input is not empty do s i := the longest prefi that D i accepts l := ma{ s i if l > 0 then j := min{i : s i = l remove s j from input end perform the j th action else (error case) move one character from input to output end In nglish: The longest initial substring match forms the net token, and it is subject to some action The first rule to match breaks any ties ach DFA has an associated action. Non-matching characters are echoed back COMP 520 Fall 2008 Scanners and Parsers (7) COMP 520 Fall 2008 Scanners and Parsers (8) Why the longest match principle? ample: keywords Why the first match principle? Again ample: keywords [ \t]+ /* ignore */... import return timport... [a-za-z_][a-za-z0-9_]* { yylval.stringconst = (char *)malloc(strlen(yytet)+1) printf(yylval.stringconst,"%s",yytet) return tidntifir Want to match importedfiles as tidntifir(importedfiles) and not as timport tidntifir(edfiles). Because we prefer longer matches, we get the right result. [ \t]+ /* ignore */... continue return tcontinu... [a-za-z_][a-za-z0-9_]* { yylval.stringconst = (char *)malloc(strlen(yytet)+1) printf(yylval.stringconst,"%s",yytet) return tidntifir Want to match continue foo as tcontinu tidntifir(foo) and not as tidntifir(continue) tidntifir(foo). First match rule gives us the right answer: When both tcontinu and tidntifir match, prefer the first.
3 COMP 520 Fall 2008 Scanners and Parsers (9) COMP 520 Fall 2008 Scanners and Parsers (10) When first longest match (flm) is not enough, look-ahead may help. Another eample taken from FORTRAN: Fortran ignores whitespace FORTRAN allows for the following tokens:.q., 363, 363.,.363 flm analysis of 363.Q.363 gives us: tfloat(363) Q tfloat(0.363) What we actually want is: tintgr(363) tq tintgr(363) fle allows us to use look-ahead, using / : 363/.Q. return tintgr 1. DO5I = 1.25 DO5I=1.25 in C: do5i = DO 5 I = 1,25 DO5I=1,25 in C: for(i=1i<25++i){... (5 is interpreted as a line number here) Case 1: flm analysis correct: tid(do5i) tq tral(1.25) Case 2: want: tdo tint(5) tid(i) tq tint(1) tcomma tint(25) Cannot make decision on tdo until we see the comma! Look-ahead comes to the rescue: DO/({letter {digit)*=({letter {digit)*, return tdo COMP 520 Fall 2008 Scanners and Parsers (11) COMP 520 Fall 2008 Scanners and Parsers (12) $ cat print_tokens.l # fle source code /* includes and other arbitrary C code */ %{ #include <stdio.h> /* for printf */ % /* helper definitions */ DIGIT [0-9] /* rege + action rules come after the first */ [ \t\n]+ printf ("white space, length %i\n", yyleng) "*" printf ("times\n") "/" printf ("div\n") "+" printf ("plus\n") "-" printf ("minus\n") "(" printf ("left parenthesis\n") ")" printf ("right parenthesis\n") 0 ([1-9]{DIGIT*) printf ("integer constant: %s\n", yytet) [a-za-z_][a-za-z0-9_]* printf ("entifier: %s\n", yytet) /* user code comes after the second */ main () { yyle () Using fle to create a scanner is really simple: $ emacs print_tokens.l $ fle print_tokens.l $ gcc -o print_tokens le.yy.c -lfl When input a*(b-17) + 5/c: $ echo "a*(b-17) + 5/c"./print_tokens our print tokens scanner outputs: entifier: a times left parenthesis entifier: b minus integer constant: 17 right parenthesis white space, length 1 plus white space, length 1 integer constant: 5 div entifier: c white space, length 1 You should confirm this for yourself!
4 COMP 520 Fall 2008 Scanners and Parsers (13) COMP 520 Fall 2008 Scanners and Parsers (14) Count lines and characters: %{ int lines = 0, chars = 0 % \n lines++ chars++. chars++ main () { yyle () printf ("#lines = %i, #chars = %i\n", lines, chars) Remove vowels and increment integers: %{ #include <stdlib.h> /* for atoi */ #include <stdio.h> /* for printf */ % [aeiouy] /* ignore */ [0-9]+ printf ("%i", atoi (yytet) + 1) main () { yyle () A contet-free grammar is a 4-tuple (V, Σ, R, S), where we have: V, a set of variables (or non-terminals) Σ, a set of terminals such that V Σ = R, a set of rules, where the LHS is a variable in V and the RHS is a string of variables in V and terminals in Σ S V, the start variable CFGs are stronger than regular epressions, and able to epress recursively-defined constructs. ample: we cannot write a regular epression for any number of matched parentheses: (), (()), ((())),... Using a CFG: ( ) ǫ COMP 520 Fall 2008 Scanners and Parsers (15) COMP 520 Fall 2008 Scanners and Parsers (16) Automatic parser generators use CFGs as input and generate parsers using the machinery of a deterministic pushdown automaton. joos.y Simple CFG eample: A a B A ǫ B b B B c Alternatively: A a B ǫ B b B c bison tokens In both cases we specify S = A. Can you write this grammar as a regular epression? y.tab.c gcc parser We can perform a rightmost derivation by repeatedly replacing variables with their RHS until only terminals remain: AST By limiting the kind of CFG allowed, we get efficient parsers. A a B a b B a b b B a b b c
5 COMP 520 Fall 2008 Scanners and Parsers (17) COMP 520 Fall 2008 Scanners and Parsers (18) There are several different grammar formalisms. First, conser BNF (Backus-Naur Form): stmt ::= stmt_epr "" while_stmt block if_stmt while_stmt ::= WHIL "(" epr ")" stmt block ::= "{" stmt_list "" if_stmt ::= IF "(" epr ")" stmt IF "(" epr ")" stmt LS stmt We have four options for stmt list: 1. stmt list ::= stmt list stmt ǫ 0 or more, left-recursive 2. stmt list ::= stmt stmt list ǫ 0 or more, right-recursive 3. stmt list ::= stmt list stmt stmt 1 or more, left-recursive 4. stmt list ::= stmt stmt list stmt 1 or more, right-recursive Second, conser BNF (tended BNF): BNF derivations BNF A A a b b A a A b { a (left-recursive) A a a b a a A a A b b a A A { a b (right-recursive) a a A a a b where { and are like Kleene * s in regular epressions. Using BNF repetition, our four choices for stmt_list become: 1. stmt_list ::= { stmt 2. stmt_list ::= { stmt 3. stmt_list ::= { stmt stmt 4. stmt_list ::= stmt { stmt COMP 520 Fall 2008 Scanners and Parsers (19) COMP 520 Fall 2008 Scanners and Parsers (20) BNF also has an optional-construct. For eample: stmt_list ::= stmt stmt_list stmt could be written as: Third, conser railroad synta diagrams: (thanks rail.sty!)??? stmt_list ::= stmt [ stmt_list ] And similarly: if_stmt ::= IF "(" epr ")" stmt IF "(" epr ")" stmt LS stmt could be written as: if_stmt ::= IF "(" epr ")" stmt [ LS stmt ] where [ and ] are like? in regular epressions.
6 COMP 520 Fall 2008 Scanners and Parsers (21) COMP 520 Fall 2008 Scanners and Parsers (22)??? S S S L S := num L L, S print ( L ) a := 7 b := c + (d := 5 + 6, d) + ( S, ) S (rightmost derivation) S S S := S := + S := + (S, ) S := + (S, ) S := + ( :=, ) S := + ( := +, ) S := + ( := + num, ) S := + ( := num + num, ) S := + ( := num + num, ) := := + ( := num + num, ) := num := + ( := num + num, ) COMP 520 Fall 2008 Scanners and Parsers (23) COMP 520 Fall 2008 Scanners and Parsers (24) S S S L S := num L L, S print ( L ) a := 7 b := c + (d := 5 + 6, d) + S S ( S, ) S := := num + ( S, ) := + A grammar is ambiguous if a sentence has different parse trees: := + + S := + + S := + The above is harmless, but conser: := - - := + * + Clearly, we need to conser associativity and precedence when designing grammars. num num
7 COMP 520 Fall 2008 Scanners and Parsers (25) COMP 520 Fall 2008 Scanners and Parsers (26) An ambiguous grammar: / ( ) num + may be rewritten to become unambiguous: There are fundamentally two kinds of parser: 1) Top-down, predictive or recursive descent parsers. Used in all languages designed by Wirth, e.g. Pascal, Modula, and Oberon. + T T T F F T T T / F F num T T F F ( ) + T T T * F One can (easily) write a predictive parser by hand, or generate one from an LL(k) grammar: Left-to-right parse Leftmost-derivation and F F k symbol lookahead. Algorithm: look at beginning of input (up to k characters) and unambiguously epand leftmost non-terminal. COMP 520 Fall 2008 Scanners and Parsers (27) COMP 520 Fall 2008 Scanners and Parsers (28) 2) Bottom-up parsers. Algorithm: look for a sequence matching RHS and reduce to LHS. Postpone any decision until entire RHS is seen, plus k tokens lookahead. Can write a bottom-up parser by hand (tricky), or generate one from an LR(k) grammar (easy): Left-to-right parse Rightmost-derivation and k symbol lookahead. The -reduce bottom-up parsing technique. 1) tend the grammar with an end-of-file $, introduce fresh start symbol S : S S$ S S S L S := num L L, S print ( L ) + ( S, ) 2) Choose between the following actions: : move first input token to top of stack reduce: replace α on top of stack by X for some rule X α accept: when S is on the stack
8 COMP 520 Fall 2008 Scanners and Parsers (29) COMP 520 Fall 2008 Scanners and Parsers (30) a:=7 b:=c+(d:=5+6,d)$ :=7 b:=c+(d:=5+6,d)$ 7 b:=c+(d:=5+6,d)$ b:=c+(d:=5+6,d)$ b:=c+(d:=5+6,d)$ b:=c+(d:=5+6,d)$ b:=c+(d:=5+6,d)$ :=c+(d:=5+6,d)$ c+(d:=5+6,d)$ +(d:=5+6,d)$ := := num := S S S S := S := S := S := + S := + ( S := + ( S := + ( := S := + ( := num S := + ( := S := + ( := + +(d:=5+6,d)$ (d:=5+6,d)$ d:=5+6,d)$ :=5+6,d)$ S := + ( := + num S := + ( := + S := + ( := S := + ( S S := + ( S, S := + ( S, S := + ( S, S := + ( S, ) S := + S := S S S S$ S 5+6,d)$ +6,d)$ +6,d)$ 6,d)$,d)$,d)$,d)$,d)$ d)$ )$ )$ $ $ $ $ $ num S := num num + S := (S) + S := S SS S S$ accept 0 S S$ 5 num 1 S S S S := 7 ( S, ) 3 S print ( L ) 8 L 4 9 L L, Use a DFA to choose the action the stack only contains DFA states now. Start with the initial state (s1) on the stack. Lookup (stack top, net input symbol): (n): skip net input symbol and push state n reduce(k): rule k is X α pop α times lookup (stack top, X) in table goto(n): push state n accept: report success error: report failure COMP 520 Fall 2008 Scanners and Parsers (31) COMP 520 Fall 2008 Scanners and Parsers (32) DFA terminals non-terminals state num print, + := ( ) $ S L 1 s4 s7 g2 2 s3 a 3 s4 s7 g5 4 s6 5 r1 r1 r1 6 s20 s10 s8 g11 7 s9 8 s4 s7 g12 9 g15 g14 10 r5 r5 r5 r5 r5 11 r2 r2 s16 r2 12 s3 s18 13 r3 r3 r3 14 s19 s13 15 r8 r8 16 s20 s10 s8 g17 17 r6 r6 s16 r6 r6 18 s20 s10 s8 g21 19 s20 s10 s8 g23 20 r4 r4 r4 r4 r4 21 s22 22 r7 r7 r7 r7 r7 23 r9 s16 r9 s 1 a := 7$ (4) s 1 s 4 := 7$ (6) s 1 s 4 s 6 7$ (10) s 1 s 4 s 6 s 10 $ reduce(5): num s 1 s 4 s 6 ////// s 10 $ lookup(s 6,) = goto(11) s 1 s 4 s 6 s 11 $ reduce(2): S := s 1 //// s 4 //// s 6 ////// s 11 $ lookup(s 1,S) = goto(2) s 1 s 2 $ accept rror transitions omitted.
9 COMP 520 Fall 2008 Scanners and Parsers (33) COMP 520 Fall 2008 Scanners and Parsers (34) LR(1) is an algorithm that attempts to construct a parsing table: Left-to-right parse Rightmost-derivation and 1 symbol lookahead. If no conflicts (/reduce, reduce/reduce) arise, then we are happy otherwise, fi grammar. An LR(1) item (A α. βγ, ) consists of 1. A grammar production, A αβγ 2. The RHS position, represented by. 3. A lookahead symbol, We first compute a set of LR(1) states from our grammar, and then use them to build a parse table. There are four kinds of entry to make: 1. goto: when β is non-terminal 2. : when β is terminal 3. reduce: when β is empty (the net state is the number of the production used) 4. accept: when we have A B. $ Follow construction on the tiny grammar: 0 S $ 2 T 1 T + 3 T An LR(1) state is a set of LR(1) items. The sequence α is on top of the stack, and the head of the input is derivable from βγ. There are two cases for β, terminal or non-terminal. COMP 520 Fall 2008 Scanners and Parsers (35) COMP 520 Fall 2008 Scanners and Parsers (36) Constructing the LR(1) NFA: start with state S. $? state A α. B β l has: ǫ-successor B. γ, if: eists rule B γ, and lookahead(β) B-successor A α B. β l state A α. β l has: -successor A α. β l Constructing the LR(1) DFA: Standard power-set construction, inlining ǫ-transitions. 1 S.$? S.$? 2.T + $.T $ T T. + T.+ $ 3 T. $ T. $ 5 T. + T. $ 6 T+. $ + T + $ T 1 s5 g2 g3 2 a 3 s4 r2 4 s5 g6 g3 5 r3 r3 6 r1 T +. $.T + $.T $ T. $ T. + 4
10 COMP 520 Fall 2008 Scanners and Parsers (37) COMP 520 Fall 2008 Scanners and Parsers (38) Conflicts LR(1) tables may become very large. A.B A C. y no conflict (lookahead deces) Parser generators use LALR(1), which merges states that are entical ecept for lookaheads. A.B A C. /reduce conflict LL(k) LL(1) LR(k) LR(1) A. A C. y /reduce conflict LALR(1) A B. A C. reduce/reduce conflict LL(0) SLR LR(0) A.B A.C B C s i s j / conflict? by construction of the DFA we have s i = s j COMP 520 Fall 2008 Scanners and Parsers (39) COMP 520 Fall 2008 Scanners and Parsers (40) bison (yacc) is a parser generator: it inputs a grammar it computes an LALR(1) parser table it reports conflicts it resolves conflicts using defaults (!) and it creates a C program. Nobody writes (simple) parsers by hand anymore. The grammar: 1 4 / 7 ( ) 2 num is epressed in bison as: %{ /* C declarations */ % /* Bison declarations tokens come from leer (scanner) */ %token tidntifir tintconst %start ep /* Grammar rules after the first */ ep : tidntifir tintconst ep * ep ep / ep ep + ep ep - ep ( ep ) /* User C code after the second */ Input this code into ep.y to follow the eample.
11 COMP 520 Fall 2008 Scanners and Parsers (41) COMP 520 Fall 2008 Scanners and Parsers (42) The grammar is ambiguous: $ bison --verbose ep.y # --verbose produces ep.output ep.y contains 16 /reduce conflicts. $ cat ep.output State 11 contains 4 /reduce conflicts. State 12 contains 4 /reduce conflicts. State 13 contains 4 /reduce conflicts. State 14 contains 4 /reduce conflicts. [...] state 11 ep -> ep. * ep (rule 3) ep -> ep * ep. (rule 3) <-- problem is here ep -> ep. / ep (rule 4) ep -> ep. + ep (rule 5) ep -> ep. - ep (rule 6) *, and go to state 6 /, and go to state 7 +, and go to state 8 -, and go to state 9 * [reduce using rule 3 (ep)] / [reduce using rule 3 (ep)] + [reduce using rule 3 (ep)] - [reduce using rule 3 (ep)] $default reduce using rule 3 (ep) Rewrite the grammar to force reductions: + T T T F F - T T T / F F num T T F F ( ) %token tidntifir tintconst %start ep ep : ep + term ep - term term term : term * factor term / factor factor factor : tidntifir tintconst ( ep ) COMP 520 Fall 2008 Scanners and Parsers (43) COMP 520 Fall 2008 Scanners and Parsers (44) Or use precedence directives: %token tidntifir tintconst %start ep %left + - /* left-associative, lower precedence */ %left * / /* left-associative, higher precedence */ ep : tidntifir tintconst ep * ep ep / ep ep + ep ep - ep ( ep ) which resolve /reduce conflicts: Conflict in state 11 between rule 5 and token + resolved as reduce. <-- Reduce ep + ep. + Conflict in state 11 between rule 5 and token - resolved as reduce. <-- Reduce ep + ep. - Conflict in state 11 between rule 5 and token * resolved as. <-- Shift ep + ep. * Conflict in state 11 between rule 5 and token / resolved as. <-- Shift ep + ep. / The precedence directives are: %left (left-associative) %right (right-associative) %nonassoc (non-associative) When constructing a parse table, an action is chosen based on the precedence of the last symbol on the right-hand se of the rule. Precedences are ordered from lowest to highest on a linewise basis. If precedences are equal, then: %left favors reducing %right favors ing %nonassoc yields an error This usually ends up working. Note that this is not the same state 11 as before.
12 COMP 520 Fall 2008 Scanners and Parsers (45) COMP 520 Fall 2008 Scanners and Parsers (46) state 0 tidntifir, and go to state 1 tintconst, and go to state 2 (, and go to state 3 ep go to state 4 state 1 ep -> tidntifir. (rule 1) $default reduce using rule 1 (ep) state 2 ep -> tintconst. (rule 2) $default reduce using rule 2 (ep)... state 14 ep -> ep. * ep (rule 3) ep -> ep. / ep (rule 4) ep -> ep / ep. (rule 4) ep -> ep. + ep (rule 5) ep -> ep. - ep (rule 6) $default reduce using rule 4 (ep) state 15 $ go to state 16 state 16 $default accept $ cat ep.y %{ #include <stdio.h> /* for printf */ etern char *yytet /* string from scanner */ vo yyerror() { printf ("synta error before %s\n", yytet) % %union { int intconst char *stringconst %token <intconst> tintconst %token <stringconst> tidntifir %start ep %left + - %left * / ep : tidntifir { printf ("load %s\n", $1) tintconst { printf ("push %i\n", $1) ep * ep { printf ("mult\n") ep / ep { printf ("div\n") ep + ep { printf ("plus\n") ep - ep { printf ("minus\n") ( ep ) { COMP 520 Fall 2008 Scanners and Parsers (47) COMP 520 Fall 2008 Scanners and Parsers (48) $ cat ep.l %{ #include "y.tab.h" /* for ep.y types */ #include <string.h> /* for strlen */ #include <stdlib.h> /* for malloc and atoi */ % [ \t\n]+ /* ignore */ "*" return * "/" return / "+" return + "-" return - "(" return ( ")" return ) 0 ([1-9][0-9]*) { yylval.intconst = atoi (yytet) return tintconst [a-za-z_][a-za-z0-9_]* { yylval.stringconst = (char *) malloc (strlen (yytet) + 1) sprintf (yylval.stringconst, "%s", yytet) return tidntifir. /* ignore */ $ cat main.c vo yyparse() int main (vo) { yyparse () Using fle/bison to create a parser is simple: $ fle ep.l $ bison --yacc --defines ep.y # note compatability options $ gcc le.yy.c y.tab.c y.tab.h main.c -o ep -lfl When input a*(b-17) + 5/c: $ echo "a*(b-17) + 5/c"./ep our ep parser outputs the correct order of operations: load a load b push 17 minus mult push 5 load c div plus You should confirm this for yourself!
13 COMP 520 Fall 2008 Scanners and Parsers (49) COMP 520 Fall 2008 Scanners and Parsers (50) If the input contains synta errors, then the bison-generated parser calls yyerror and stops. We may ask it to recover from the error: ep : tidntifir { printf ("load %s\n", $1)... ( ep ) error { yyerror() and on input a@(b-17) ++ 5/c get the output: load a synta error before ( synta error before ( synta error before ( synta error before b push 17 minus synta error before ) synta error before ) synta error before + plus push 5 load c div plus SableCC (by tienne Gagnon, McGill alumnus) is a compiler compiler: it takes a grammatical description of the source language as input, and generates a leer (scanner) and parser for it. joos.sablecc SableCC joos/*.java javac foo.joos scanner& parser CST/AST rror recovery hardly ever works. COMP 520 Fall 2008 Scanners and Parsers (51) COMP 520 Fall 2008 Scanners and Parsers (52) The SableCC 2 grammar for our Tiny language: Package tiny Helpers tab = 9 cr = 13 lf = 10 digit = [ ] lowercase = [ a.. z ] uppercase = [ A.. Z ] letter = lowercase uppercase letter = letter _ char = letter _ digit Tokens eol = cr lf cr lf blank = tab star = * slash = / plus = + minus = - l_par = ( r_par = ) number = 0 [digit- 0 ] digit* = letter char* Productions ep = {plus ep plus factor {minus ep minus factor {factor factor factor = {mult factor star term {divd factor slash term {term term term = {paren l_par ep r_par { {number number Version 2 produces parse trees, a.k.a. concrete synta trees (CSTs). Ignored Tokens blank, eol
14 COMP 520 Fall 2008 Scanners and Parsers (53) The SableCC 3 grammar for our Tiny language: Productions cst_ep {-> ep = {cst_plus cst_ep plus factor {-> New ep.plus(cst_ep.ep,factor.ep) {cst_minus cst_ep minus factor {-> New ep.minus(cst_ep.ep,factor.ep) {factor factor {-> factor.ep factor {-> ep = {cst_mult factor star term {-> New ep.mult(factor.ep,term.ep) {cst_divd factor slash term {-> New ep.divd(factor.ep,term.ep) {term term {-> term.ep term {-> ep = {paren l_par cst_ep r_par {-> cst_ep.ep {cst_ {-> New ep.() {cst_number number {-> New ep.number(number) Abstract Synta Tree ep = {plus [l]:ep [r]:ep {minus [l]:ep [r]:ep {mult [l]:ep [r]:ep {divd [l]:ep [r]:ep { {number number Version 3 generates abstract synta trees (ASTs).
Parsing. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.
COMP 520 Winter 2015 Parsing COMP 520: Compiler Design (4 credits) Professor Laurie Hendren hendren@cs.mcgill.ca Parsing (1) COMP 520 Winter 2015 Parsing (2) A parser transforms a string of tokens into
More informationCOMP 520 Fall 2012 Scanners and Parsers (1) Scanners and parsers
COMP 520 Fall 2012 Scanners and Parsers (1) Scanners and parsers COMP 520 Fall 2012 Scanners and Parsers (2) A scanner or lexer transforms a string of characters into a string of tokens: uses a combination
More informationScanning. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.
COMP 520 Winter 2016 Scanning COMP 520: Compiler Design (4 credits) Professor Laurie Hendren hendren@cs.mcgill.ca Scanning (1) COMP 520 Winter 2016 Scanning (2) Readings Crafting a Compiler: Chapter 2,
More informationScanning. COMP 520: Compiler Design (4 credits) Alexander Krolik MWF 13:30-14:30, MD 279
COMP 520 Winter 2017 Scanning COMP 520: Compiler Design (4 credits) Alexander Krolik alexander.krolik@mail.mcgill.ca MWF 13:30-14:30, MD 279 Scanning (1) COMP 520 Winter 2017 Scanning (2) Announcements
More informationEDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:
EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)
More informationCOP4020 Programming Languages. Syntax Prof. Robert van Engelen
COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview n Tokens and regular expressions n Syntax and context-free grammars n Grammar derivations n More about parse trees n Top-down and
More informationLecture 7: Deterministic Bottom-Up Parsing
Lecture 7: Deterministic Bottom-Up Parsing (From slides by G. Necula & R. Bodik) Last modified: Tue Sep 20 12:50:42 2011 CS164: Lecture #7 1 Avoiding nondeterministic choice: LR We ve been looking at general
More informationLecture 8: Deterministic Bottom-Up Parsing
Lecture 8: Deterministic Bottom-Up Parsing (From slides by G. Necula & R. Bodik) Last modified: Fri Feb 12 13:02:57 2010 CS164: Lecture #8 1 Avoiding nondeterministic choice: LR We ve been looking at general
More informationCOP4020 Programming Languages. Syntax Prof. Robert van Engelen
COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview Tokens and regular expressions Syntax and context-free grammars Grammar derivations More about parse trees Top-down and bottom-up
More informationPart 5 Program Analysis Principles and Techniques
1 Part 5 Program Analysis Principles and Techniques Front end 2 source code scanner tokens parser il errors Responsibilities: Recognize legal programs Report errors Produce il Preliminary storage map Shape
More informationParsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing
Roadmap > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing The role of the parser > performs context-free syntax analysis > guides
More informationWednesday, September 9, 15. Parsers
Parsers What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda
More informationParsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:
What is a parser Parsers A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda
More informationA Bison Manual. You build a text file of the production (format in the next section); traditionally this file ends in.y, although bison doesn t care.
A Bison Manual 1 Overview Bison (and its predecessor yacc) is a tool that take a file of the productions for a context-free grammar and converts them into the tables for an LALR(1) parser. Bison produces
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing 1 Recall: Architecture of Compilers, Interpreters Source Parser Static Analyzer Intermediate Representation Front End Back
More informationCS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).
CS 2210 Sample Midterm 1. Determine if each of the following claims is true (T) or false (F). F A language consists of a set of strings, its grammar structure, and a set of operations. (Note: a language
More informationBottom-Up Parsing. Lecture 11-12
Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 9/22/06 Prof. Hilfinger CS164 Lecture 11 1 Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient
More informationBottom-Up Parsing. Lecture 11-12
Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 2/20/08 Prof. Hilfinger CS164 Lecture 11 1 Administrivia Test I during class on 10 March. 2/20/08 Prof. Hilfinger CS164 Lecture 11
More informationLecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1
Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery Last modified: Mon Feb 23 10:05:56 2015 CS164: Lecture #14 1 Shift/Reduce Conflicts If a DFA state contains both [X: α aβ, b] and [Y: γ, a],
More informationshift-reduce parsing
Parsing #2 Bottom-up Parsing Rightmost derivations; use of rules from right to left Uses a stack to push symbols the concatenation of the stack symbols with the rest of the input forms a valid bottom-up
More informationContext-free grammars
Context-free grammars Section 4.2 Formal way of specifying rules about the structure/syntax of a program terminals - tokens non-terminals - represent higher-level structures of a program start symbol,
More informationCMSC 330: Organization of Programming Languages. Context Free Grammars
CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler
More informationParsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468
Parsers Xiaokang Qiu Purdue University ECE 468 August 31, 2018 What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure
More informationWednesday, August 31, Parsers
Parsers How do we combine tokens? Combine tokens ( words in a language) to form programs ( sentences in a language) Not all combinations of tokens are correct programs (not all sentences are grammatically
More informationCompiler Construction: Parsing
Compiler Construction: Parsing Mandar Mitra Indian Statistical Institute M. Mitra (ISI) Parsing 1 / 33 Context-free grammars. Reference: Section 4.2 Formal way of specifying rules about the structure/syntax
More informationParsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing
Parsing Wrapup Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing LR(1) items Computing closure Computing goto LR(1) canonical collection This lecture LR(1) parsing Building ACTION
More informationSyntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.
Syntax Syntax Syntax defines what is grammatically valid in a programming language Set of grammatical rules E.g. in English, a sentence cannot begin with a period Must be formal and exact or there will
More informationCOMPILER (CSE 4120) (Lecture 6: Parsing 4 Bottom-up Parsing )
COMPILR (CS 4120) (Lecture 6: Parsing 4 Bottom-up Parsing ) Sungwon Jung Mobile Computing & Data ngineering Lab Dept. of Computer Science and ngineering Sogang University Seoul, Korea Tel: +82-2-705-8930
More informationSyntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax
Susan Eggers 1 CSE 401 Syntax Analysis/Parsing Context-free grammars (CFG s) Purpose: determine if tokens have the right form for the language (right syntactic structure) stream of tokens abstract syntax
More informationCSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1
CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions
More informationCSE 3302 Programming Languages Lecture 2: Syntax
CSE 3302 Programming Languages Lecture 2: Syntax (based on slides by Chengkai Li) Leonidas Fegaras University of Texas at Arlington CSE 3302 L2 Spring 2011 1 How do we define a PL? Specifying a PL: Syntax:
More informationUNIT III & IV. Bottom up parsing
UNIT III & IV Bottom up parsing 5.0 Introduction Given a grammar and a sentence belonging to that grammar, if we have to show that the given sentence belongs to the given grammar, there are two methods.
More informationContext-free grammars (CFG s)
Syntax Analysis/Parsing Purpose: determine if tokens have the right form for the language (right syntactic structure) stream of tokens abstract syntax tree (AST) AST: captures hierarchical structure of
More informationMIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology
MIT 6.035 Parse Table Construction Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Parse Tables (Review) ACTION Goto State ( ) $ X s0 shift to s2 error error goto s1
More informationMonday, September 13, Parsers
Parsers Agenda Terminology LL(1) Parsers Overview of LR Parsing Terminology Grammar G = (Vt, Vn, S, P) Vt is the set of terminals Vn is the set of non-terminals S is the start symbol P is the set of productions
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler
More informationPrelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science
Prelude COMP Lecture Topdown Parsing September, 00 What is the Tufts mascot? Jumbo the elephant Why? P. T. Barnum was an original trustee of Tufts : donated $0,000 for a natural museum on campus Barnum
More informationLexical and Syntax Analysis. Bottom-Up Parsing
Lexical and Syntax Analysis Bottom-Up Parsing Parsing There are two ways to construct derivation of a grammar. Top-Down: begin with start symbol; repeatedly replace an instance of a production s LHS with
More informationConflicts in LR Parsing and More LR Parsing Types
Conflicts in LR Parsing and More LR Parsing Types Lecture 10 Dr. Sean Peisert ECS 142 Spring 2009 1 Status Project 2 Due Friday, Apr. 24, 11:55pm The usual lecture time is being replaced by a discussion
More informationParsing. source code. while (k<=n) {sum = sum+k; k=k+1;}
Compiler Construction Grammars Parsing source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2012 01 23 2012 parse tree AST builder (implicit)
More information3. Parsing. Oscar Nierstrasz
3. Parsing Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/ http://www.cs.purdue.edu/homes/hosking/
More informationCS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1
CS453 : JavaCUP and error recovery CS453 Shift-reduce Parsing 1 Shift-reduce parsing in an LR parser LR(k) parser Left-to-right parse Right-most derivation K-token look ahead LR parsing algorithm using
More informationChapter 3: Lexing and Parsing
Chapter 3: Lexing and Parsing Aarne Ranta Slides for the book Implementing Programming Languages. An Introduction to Compilers and Interpreters, College Publications, 2012. Lexing and Parsing* Deeper understanding
More informationCPS 506 Comparative Programming Languages. Syntax Specification
CPS 506 Comparative Programming Languages Syntax Specification Compiling Process Steps Program Lexical Analysis Convert characters into a stream of tokens Lexical Analysis Syntactic Analysis Send tokens
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler
More informationSyntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science
Syntax and Parsing COMS W4115 Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science Lexical Analysis (Scanning) Lexical Analysis (Scanning) Goal is to translate a stream
More informationArchitecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End
Architecture of Compilers, Interpreters : Organization of Programming Languages ource Analyzer Optimizer Code Generator Context Free Grammars Intermediate Representation Front End Back End Compiler / Interpreter
More informationCMSC 330: Organization of Programming Languages. Context Free Grammars
CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler
More informationLexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!
Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Compiler Passes Analysis of input program (front-end) character stream
More informationCompiler Construction 2016/2017 Syntax Analysis
Compiler Construction 2016/2017 Syntax Analysis Peter Thiemann November 2, 2016 Outline 1 Syntax Analysis Recursive top-down parsing Nonrecursive top-down parsing Bottom-up parsing Syntax Analysis tokens
More informationLR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States
TDDD16 Compilers and Interpreters TDDB44 Compiler Construction LR Parsing, Part 2 Constructing Parse Tables Parse table construction Grammar conflict handling Categories of LR Grammars and Parsers An NFA
More informationCS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)
Programming languages must be precise Remember instructions This is unlike natural languages CS 315 Programming Languages Syntax Precision is required for syntax think of this as the format of the language
More informationPart III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.
Part III : Parsing From Regular to Context-Free Grammars Deriving a Parser from a Context-Free Grammar Scanners and Parsers A Parser for EBNF Left-Parsable Grammars Martin Odersky, LAMP/DI 1 From Regular
More information4. Lexical and Syntax Analysis
4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal
More informationCS 4120 Introduction to Compilers
CS 4120 Introduction to Compilers Andrew Myers Cornell University Lecture 6: Bottom-Up Parsing 9/9/09 Bottom-up parsing A more powerful parsing technology LR grammars -- more expressive than LL can handle
More informationCMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters
: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Scanner Parser Static Analyzer Intermediate Representation Front End Back End Compiler / Interpreter
More informationIn One Slide. Outline. LR Parsing. Table Construction
LR Parsing Table Construction #1 In One Slide An LR(1) parsing table can be constructed automatically from a CFG. An LR(1) item is a pair made up of a production and a lookahead token; it represents a
More information4. Lexical and Syntax Analysis
4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal
More informationChapter 2 - Programming Language Syntax. September 20, 2017
Chapter 2 - Programming Language Syntax September 20, 2017 Specifying Syntax: Regular expressions and context-free grammars Regular expressions are formed by the use of three mechanisms Concatenation Alternation
More informationCMSC 330: Organization of Programming Languages. Context Free Grammars
CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler
More informationChapter 2 :: Programming Language Syntax
Chapter 2 :: Programming Language Syntax Michael L. Scott kkman@sangji.ac.kr, 2015 1 Regular Expressions A regular expression is one of the following: A character The empty string, denoted by Two regular
More informationCOP 3402 Systems Software Syntax Analysis (Parser)
COP 3402 Systems Software Syntax Analysis (Parser) Syntax Analysis 1 Outline 1. Definition of Parsing 2. Context Free Grammars 3. Ambiguous/Unambiguous Grammars Syntax Analysis 2 Lexical and Syntax Analysis
More informationChapter 4. Lexical and Syntax Analysis
Chapter 4 Lexical and Syntax Analysis Chapter 4 Topics Introduction Lexical Analysis The Parsing Problem Recursive-Descent Parsing Bottom-Up Parsing Copyright 2012 Addison-Wesley. All rights reserved.
More informationSyntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree.
Syntactic Analysis Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree. Compiler Passes Analysis of input program (front-end) character
More informationSimple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012
LR(0) Drawbacks Consider the unambiguous augmented grammar: 0.) S E $ 1.) E T + E 2.) E T 3.) T x If we build the LR(0) DFA table, we find that there is a shift-reduce conflict. This arises because the
More informationS Y N T A X A N A L Y S I S LR
LR parsing There are three commonly used algorithms to build tables for an LR parser: 1. SLR(1) = LR(0) plus use of FOLLOW set to select between actions smallest class of grammars smallest tables (number
More informationRegular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications
Agenda for Today Regular Expressions CSE 413, Autumn 2005 Programming Languages Basic concepts of formal grammars Regular expressions Lexical specification of programming languages Using finite automata
More informationSyntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science
Syntax and Parsing COMS W4115 Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science Lexical Analysis (Scanning) Lexical Analysis (Scanning) Translates a stream of characters
More informationCompiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava.
Syntactic Analysis Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree. Compiler Passes Analysis of input program (front-end) character
More informationSyntax Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay
Syntax Analysis (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay September 2007 College of Engineering, Pune Syntax Analysis: 2/124 Syntax
More informationCompilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University
Compilation 2012 Parsers and Scanners Jan Midtgaard Michael I. Schwartzbach Aarhus University Context-Free Grammars Example: sentence subject verb object subject person person John Joe Zacharias verb asked
More informationLECTURE 11. Semantic Analysis and Yacc
LECTURE 11 Semantic Analysis and Yacc REVIEW OF LAST LECTURE In the last lecture, we introduced the basic idea behind semantic analysis. Instead of merely specifying valid structures with a context-free
More informationBottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing
Also known as Shift-Reduce parsing More powerful than top down Don t need left factored grammars Can handle left recursion Attempt to construct parse tree from an input string eginning at leaves and working
More informationLR Parsing LALR Parser Generators
LR Parsing LALR Parser Generators Outline Review of bottom-up parsing Computing the parsing DFA Using parser generators 2 Bottom-up Parsing (Review) A bottom-up parser rewrites the input string to the
More informationProgramming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson
Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators Jeremy R. Johnson 1 Theme We have now seen how to describe syntax using regular expressions and grammars and how to create
More informationEDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:
EDA180: Compiler Construc6on Context- free grammars Görel Hedin Revised: 2013-01- 28 Compiler phases and program representa6ons source code Lexical analysis (scanning) Intermediate code genera6on tokens
More informationCompilers. Yannis Smaragdakis, U. Athens (original slides by Sam
Compilers Parsing Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Next step text chars Lexical analyzer tokens Parser IR Errors Parsing: Organize tokens into sentences Do tokens conform
More informationChapter 3. Describing Syntax and Semantics ISBN
Chapter 3 Describing Syntax and Semantics ISBN 0-321-49362-1 Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Copyright 2009 Addison-Wesley. All
More informationConcepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective
Concepts Lexical scanning Regular expressions DFAs and FSAs Lex CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 2 Lexical analysis
More informationCS2210: Compiler Construction Syntax Analysis Syntax Analysis
Comparison with Lexical Analysis The second phase of compilation Phase Input Output Lexer string of characters string of tokens Parser string of tokens Parse tree/ast What Parse Tree? CS2210: Compiler
More information3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino
3. Syntax Analysis Andrea Polini Formal Languages and Compilers Master in Computer Science University of Camerino (Formal Languages and Compilers) 3. Syntax Analysis CS@UNICAM 1 / 54 Syntax Analysis: the
More informationLex & Yacc (GNU distribution - flex & bison) Jeonghwan Park
Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park Prerequisite Ubuntu Version 14.04 or over Virtual machine for Windows user or native OS flex bison gcc Version 4.7 or over Install in Ubuntu
More informationUsing an LALR(1) Parser Generator
Using an LALR(1) Parser Generator Yacc is an LALR(1) parser generator Developed by S.C. Johnson and others at AT&T Bell Labs Yacc is an acronym for Yet another compiler compiler Yacc generates an integrated
More informationSyntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011
Syntax Analysis COMP 524: Programming Languages Srinivas Krishnan January 25, 2011 Based in part on slides and notes by Bjoern Brandenburg, S. Olivier and A. Block. 1 The Big Picture Character Stream Token
More informationBottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.
Bottom-Up Parsing II Different types of Shift-Reduce Conflicts) Lecture 10 Ganesh. Lecture 10) 1 Review: Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient Doesn
More informationProgramming Language Syntax and Analysis
Programming Language Syntax and Analysis 2017 Kwangman Ko (http://compiler.sangji.ac.kr, kkman@sangji.ac.kr) Dept. of Computer Engineering, Sangji University Introduction Syntax the form or structure of
More informationCS415 Compilers. LR Parsing & Error Recovery
CS415 Compilers LR Parsing & Error Recovery These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Review: LR(k) items The LR(1) table construction
More informationTopic 5: Syntax Analysis III
Topic 5: Syntax Analysis III Compiler Design Prof. Hanjun Kim CoreLab (Compiler Research Lab) POSTECH 1 Back-End Front-End The Front End Source Program Lexical Analysis Syntax Analysis Semantic Analysis
More informationLR Parsing LALR Parser Generators
Outline LR Parsing LALR Parser Generators Review of bottom-up parsing Computing the parsing DFA Using parser generators 2 Bottom-up Parsing (Review) A bottom-up parser rewrites the input string to the
More informationCOP 3402 Systems Software Top Down Parsing (Recursive Descent)
COP 3402 Systems Software Top Down Parsing (Recursive Descent) Top Down Parsing 1 Outline 1. Top down parsing and LL(k) parsing 2. Recursive descent parsing 3. Example of recursive descent parsing of arithmetic
More informationWhere We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars
CMSC 330: Organization of Programming Languages Context Free Grammars Where We Are Programming languages Ruby OCaml Implementing programming languages Scanner Uses regular expressions Finite automata Parser
More informationprogramming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs
Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott Introduction programming languages need to be precise natural languages less so both form (syntax) and meaning
More informationCSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions
CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory
More informationA left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.
Bottom-up parsing Recall For a grammar G, with start symbol S, any string α such that S α is a sentential form If α V t, then α is a sentence in L(G) A left-sentential form is a sentential form that occurs
More informationCompilers. Bottom-up Parsing. (original slides by Sam
Compilers Bottom-up Parsing Yannis Smaragdakis U Athens Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Bottom-Up Parsing More general than top-down parsing And just as efficient Builds
More informationSyntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38
Syntax Analysis Martin Sulzmann Martin Sulzmann Syntax Analysis 1 / 38 Syntax Analysis Objective Recognize individual tokens as sentences of a language (beyond regular languages). Example 1 (OK) Program
More informationCS453 : Shift Reduce Parsing Unambiguous Grammars LR(0) and SLR Parse Tables by Wim Bohm and Michelle Strout. CS453 Shift-reduce Parsing 1
CS453 : Shift Reduce Parsing Unambiguous Grammars LR(0) and SLR Parse Tables by Wim Bohm and Michelle Strout CS453 Shift-reduce Parsing 1 Plan for Today Finish PA1 this week Friday recitation: help with
More informationSection A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.
Section A 1. What do you meant by parser and its types? A parser for grammar G is a program that takes as input a string w and produces as output either a parse tree for w, if w is a sentence of G, or
More informationCS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University
CS415 Compilers Syntax Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Limits of Regular Languages Advantages of Regular Expressions
More informationCSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis
Chapter 4 Lexical and Syntax Analysis Introduction - Language implementation systems must analyze source code, regardless of the specific implementation approach - Nearly all syntax analysis is based on
More information