Scanners and parsers

COMP 520 Fall 2008: Scanners and Parsers

Scanners and parsers

A scanner or lexer transforms a string of characters into a string of tokens:
- it uses a combination of deterministic finite automata (DFA);
- plus some glue code to make it work;
- it can be generated by tools like flex (or lex), JFlex, ...

[Figure: joos.l -> flex -> lex.yy.c -> gcc -> scanner; the scanner turns foo.joos into tokens]

A parser transforms a string of tokens into a parse tree, according to some grammar:
- it corresponds to a deterministic push-down automaton;
- plus some glue code to make it work;
- it can be generated by bison (or yacc), CUP, ANTLR, SableCC, Beaver, JavaCC, ...

[Figure: joos.y -> bison -> y.tab.c -> gcc -> parser; the parser turns tokens into an AST]

Tokens are defined by regular expressions:
- ∅, the empty set: a language with no strings
- ε, the empty string
- a, where a ∈ Σ and Σ is our alphabet
- M|N, alternation: either M or N
- M·N, concatenation: M followed by N
- M*, zero or more occurrences of M
where M and N are both regular expressions. What are M? and M+?

We can write regular expressions for the tokens in our source language using standard POSIX notation:
- simple operators: "*", "/", "+", "-"
- parentheses: "(", ")"
- integer constants: 0|([1-9][0-9]*)
- identifiers: [a-zA-Z_][a-zA-Z0-9_]*
- white space: [ \t\n]+

flex accepts a list of regular expressions (regex), converts each regex internally to an NFA (Thompson construction), and then converts each NFA to a DFA (see Appel, Ch. 2).

[Figure: DFAs for the token rules above; the surviving edge labels are \t\n, *, /, (, ), a-zA-Z and a-zA-Z0-9]

Each DFA has an associated action.

Given DFAs D_1, ..., D_n, ordered by the input rule order, the behaviour of a flex-generated scanner on an input string is:

    while input is not empty do
      s_i := the longest prefix that D_i accepts
      l := max{|s_i|}
      if l > 0 then
        j := min{i : |s_i| = l}
        remove s_j from input
        perform the j-th action
      else (error case)
        move one character from input to output
      end
    end

In English:
- The longest initial substring match forms the next token, and it is subject to some action.
- The first rule to match breaks any ties.
- Non-matching characters are echoed back.

Why the longest match principle? Example: keywords

    [ \t]+    /* ignore */;
    ...
    import    return tIMPORT;
    ...
    [a-zA-Z_][a-zA-Z0-9_]* {
        yylval.stringconst = (char *)malloc(strlen(yytext)+1);
        sprintf(yylval.stringconst,"%s",yytext);
        return tIDENTIFIER; }

We want to match importedFiles as tIDENTIFIER(importedFiles) and not as tIMPORT tIDENTIFIER(edFiles). Because we prefer longer matches, we get the right result.

Why the first match principle? Again, example: keywords

    [ \t]+    /* ignore */;
    ...
    continue  return tCONTINUE;
    ...
    [a-zA-Z_][a-zA-Z0-9_]* {
        yylval.stringconst = (char *)malloc(strlen(yytext)+1);
        sprintf(yylval.stringconst,"%s",yytext);
        return tIDENTIFIER; }

We want to match "continue foo" as tCONTINUE tIDENTIFIER(foo) and not as tIDENTIFIER(continue) tIDENTIFIER(foo). The first match rule gives us the right answer: when both tCONTINUE and tIDENTIFIER match, prefer the first.
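The longest-match and first-match rules above can be sketched in plain C. This is only an illustration under stated assumptions, not what flex generates: each DFA D_i is replaced by a hand-written matcher function, and the names next_token, match_space, match_import, match_identifier and the RULE_* constants are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each matcher stands in for one DFA D_i: it returns the length of the
   longest prefix of s that the rule accepts, or 0 if there is none. */
typedef size_t (*matcher)(const char *s);

static size_t match_space(const char *s) {
    size_t n = 0;
    while (s[n] == ' ' || s[n] == '\t') n++;
    return n;
}

static size_t match_import(const char *s) {
    return strncmp(s, "import", 6) == 0 ? 6 : 0;
}

static size_t match_identifier(const char *s) {
    size_t n = 0;
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_')) return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
    return n;
}

/* Rule order matters: the keyword rule comes before the identifier rule. */
enum { RULE_SPACE, RULE_IMPORT, RULE_IDENTIFIER, RULE_COUNT };
static const matcher rules[RULE_COUNT] = {
    match_space, match_import, match_identifier
};

/* Picks the rule for the next token of a non-empty input.
   Returns the rule index and stores the lexeme length in *len.
   In the error case it returns -1 and sets *len to 1, so the caller
   can echo one character and continue. */
int next_token(const char *input, size_t *len) {
    assert(input != NULL && len != NULL);
    size_t best = 0;
    int winner = -1;
    for (int i = 0; i < RULE_COUNT; i++) {
        size_t n = rules[i](input);
        /* Strict '>' keeps the earliest rule when lengths tie. */
        if (n > best) { best = n; winner = i; }
    }
    *len = (winner >= 0) ? best : 1;
    return winner;
}
```

Calling next_token on "importedFiles" selects the identifier rule because its match is longer, while on "import x" both the keyword and identifier rules match six characters and the earlier keyword rule wins.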

When "first longest match" (flm) is not enough, look-ahead may help.

FORTRAN allows for the following tokens: .EQ., 363, 363., .363

flm analysis of 363.EQ.363 gives us:
    tFLOAT(363) E Q tFLOAT(0.363)
What we actually want is:
    tINTEGER(363) tEQ tINTEGER(363)
flex allows us to use look-ahead, using '/':
    363/.EQ.    return tINTEGER;

Another example taken from FORTRAN. Fortran ignores whitespace:
1. DO5I = 1.25 is read as DO5I=1.25; in C: do5i = 1.25;
2. DO 5 I = 1,25 is read as DO5I=1,25; in C: for(i=1;i<=25;++i){...} (5 is interpreted as a line number here)

Case 1: flm analysis is correct: tID(DO5I) tEQ tREAL(1.25)
Case 2: we want: tDO tINT(5) tID(I) tEQ tINT(1) tCOMMA tINT(25)

We cannot make a decision on tDO until we see the comma! Look-ahead comes to the rescue:
    DO/({letter}|{digit})*=({letter}|{digit})*,    return tDO;

    $ cat print_tokens.l   # flex source code
    /* includes and other arbitrary C code */
    %{
    #include <stdio.h> /* for printf */
    %}
    /* helper definitions */
    DIGIT [0-9]
    /* regex + action rules come after the first %% */
    %%
    [ \t\n]+    printf ("white space, length %i\n", yyleng);
    "*"         printf ("times\n");
    "/"         printf ("div\n");
    "+"         printf ("plus\n");
    "-"         printf ("minus\n");
    "("         printf ("left parenthesis\n");
    ")"         printf ("right parenthesis\n");
    0|([1-9]{DIGIT}*)       printf ("integer constant: %s\n", yytext);
    [a-zA-Z_][a-zA-Z0-9_]*  printf ("identifier: %s\n", yytext);
    %%
    /* user code comes after the second %% */
    int main (void) {
      yylex ();
      return 0;
    }

Using flex to create a scanner is really simple:
    $ emacs print_tokens.l
    $ flex print_tokens.l
    $ gcc -o print_tokens lex.yy.c -lfl

On input a*(b-17) + 5/c:
    $ echo "a*(b-17) + 5/c" | ./print_tokens
our print_tokens scanner outputs:
    identifier: a
    times
    left parenthesis
    identifier: b
    minus
    integer constant: 17
    right parenthesis
    white space, length 1
    plus
    white space, length 1
    integer constant: 5
    div
    identifier: c
    white space, length 1
You should confirm this for yourself!

Count lines and characters:

    %{
    int lines = 0, chars = 0;
    %}
    %%
    \n    lines++; chars++;
    .     chars++;
    %%
    int main (void) {
      yylex ();
      printf ("#lines = %i, #chars = %i\n", lines, chars);
      return 0;
    }

Remove vowels and increment integers:

    %{
    #include <stdlib.h> /* for atoi */
    #include <stdio.h> /* for printf */
    %}
    %%
    [aeiouy]    /* ignore */
    [0-9]+      printf ("%i", atoi (yytext) + 1);
    %%
    int main (void) {
      yylex ();
      return 0;
    }

A context-free grammar is a 4-tuple (V, Σ, R, S), where we have:
- V, a set of variables (or non-terminals)
- Σ, a set of terminals such that V ∩ Σ = ∅
- R, a set of rules, where the LHS is a variable in V and the RHS is a string of variables in V and terminals in Σ
- S ∈ V, the start variable

CFGs are stronger than regular expressions, and able to express recursively-defined constructs.

Example: we cannot write a regular expression for any number of matched parentheses: (), (()), ((())), ...
Using a CFG:
    E → ( E ) | ε

Automatic parser generators use CFGs as input and generate parsers using the machinery of a deterministic pushdown automaton.

[Figure: joos.y -> bison -> y.tab.c -> gcc -> parser; the parser turns tokens into an AST]

By limiting the kind of CFG allowed, we get efficient parsers.

Simple CFG example:
    A → a B
    A → ε
    B → b B
    B → c
Alternatively:
    A → a B | ε
    B → b B | c
In both cases we specify S = A. Can you write this grammar as a regular expression?

We can perform a rightmost derivation by repeatedly replacing variables with their RHS until only terminals remain:
    A ⇒ a B ⇒ a b B ⇒ a b b B ⇒ a b b c
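To make the matched-parentheses grammar E → ( E ) | ε concrete, here is a minimal C sketch of a recognizer that follows the two productions directly. The recursion plays the role that a finite automaton cannot: it remembers how many parentheses are still open. The function names nested_len and is_nested_parens are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Tries to derive a prefix of s from E -> ( E ) | epsilon.
   Returns the number of characters consumed, or -1 if an opening
   parenthesis is never closed. */
static int nested_len(const char *s) {
    if (s[0] != '(') return 0;           /* E -> epsilon */
    int inner = nested_len(s + 1);       /* E -> ( E )   */
    if (inner < 0 || s[1 + inner] != ')') return -1;
    return inner + 2;
}

/* Returns 1 if the whole string is derivable from E, 0 otherwise. */
int is_nested_parens(const char *s) {
    assert(s != NULL);
    int n = nested_len(s);
    return n >= 0 && s[n] == '\0';
}
```

Note that this grammar only generates nested strings such as (()); a sequence like ()() is rejected because no production places two E's side by side.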

There are several different grammar formalisms. First, consider BNF (Backus-Naur Form):

    stmt ::= stmt_expr ";" | while_stmt | block | if_stmt
    while_stmt ::= WHILE "(" expr ")" stmt
    block ::= "{" stmt_list "}"
    if_stmt ::= IF "(" expr ")" stmt
              | IF "(" expr ")" stmt ELSE stmt

We have four options for stmt_list:
1. stmt_list ::= stmt_list stmt | ε    (0 or more, left-recursive)
2. stmt_list ::= stmt stmt_list | ε    (0 or more, right-recursive)
3. stmt_list ::= stmt_list stmt | stmt    (1 or more, left-recursive)
4. stmt_list ::= stmt stmt_list | stmt    (1 or more, right-recursive)

Second, consider EBNF (Extended BNF):

    BNF                                 derivations                  EBNF
    A → A a | b  (left-recursive)       b                            A → b { a }
                                        A a ⇒ b a
                                        A a ⇒ A a a ⇒ b a a
    A → a A | b  (right-recursive)      b                            A → { a } b
                                        a A ⇒ a b
                                        a A ⇒ a a A ⇒ a a b

where '{' and '}' are like Kleene *'s in regular expressions.

Using EBNF repetition, our four choices for stmt_list become:
1. stmt_list ::= { stmt }
2. stmt_list ::= { stmt }
3. stmt_list ::= { stmt } stmt
4. stmt_list ::= stmt { stmt }

EBNF also has an optional-construct. For example:
    stmt_list ::= stmt stmt_list | stmt
could be written as:
    stmt_list ::= stmt [ stmt_list ]
And similarly:
    if_stmt ::= IF "(" expr ")" stmt
              | IF "(" expr ")" stmt ELSE stmt
could be written as:
    if_stmt ::= IF "(" expr ")" stmt [ ELSE stmt ]
where '[' and ']' are like '?' in regular expressions.

Third, consider railroad syntax diagrams: (thanks rail.sty!)

[Figure: railroad syntax diagrams; not recoverable from the source]

[Figure: railroad syntax diagrams, continued; not recoverable from the source]

Grammar:
    S → S ; S
    S → id := E
    S → print ( L )
    E → id
    E → num
    E → E + E
    E → ( S , E )
    L → E
    L → L , E

Program:
    a := 7;
    b := c + (d := 5 + 6, d)

Rightmost derivation:
    S
    S; S
    S; id := E
    S; id := E + E
    S; id := E + (S, E)
    S; id := E + (S, id)
    S; id := E + (id := E, id)
    S; id := E + (id := E + E, id)
    S; id := E + (id := E + num, id)
    S; id := E + (id := num + num, id)
    S; id := id + (id := num + num, id)
    id := E; id := id + (id := num + num, id)
    id := num; id := id + (id := num + num, id)

[Figure: parse tree of the program above under this grammar]

A grammar is ambiguous if a sentence has different parse trees:
    id := id + id + id

[Figure: two parse trees for id := id + id + id, one grouping the left + first and one grouping the right + first]

The above is harmless, but consider:
    id := id - id - id
    id := id + id * id
Clearly, we need to consider associativity and precedence when designing grammars.

An ambiguous grammar:
    E → id
    E → num
    E → E * E
    E → E / E
    E → E + E
    E → E - E
    E → ( E )
may be rewritten to become unambiguous:
    E → E + T      T → T * F      F → id
    E → E - T      T → T / F      F → num
    E → T          T → F          F → ( E )

[Figure: parse tree under the unambiguous grammar, with E at the root, T below + and F below *]

There are fundamentally two kinds of parser:

1) Top-down, predictive or recursive descent parsers. Used in all languages designed by Wirth, e.g. Pascal, Modula, and Oberon.

One can (easily) write a predictive parser by hand, or generate one from an LL(k) grammar:
- Left-to-right parse;
- Leftmost-derivation; and
- k symbol lookahead.
Algorithm: look at the beginning of the input (up to k characters) and unambiguously expand the leftmost non-terminal.

2) Bottom-up parsers.

Algorithm: look for a sequence matching a RHS and reduce it to the LHS. Postpone any decision until the entire RHS is seen, plus k tokens of lookahead.

One can write a bottom-up parser by hand (tricky), or generate one from an LR(k) grammar (easy):
- Left-to-right parse;
- Rightmost-derivation; and
- k symbol lookahead.

The shift-reduce bottom-up parsing technique.

1) Extend the grammar with an end-of-file $, and introduce a fresh start symbol S′:
    S′ → S$
    S → S ; S
    S → id := E
    S → print ( L )
    E → id
    E → num
    E → E + E
    E → ( S , E )
    L → E
    L → L , E

2) Choose between the following actions:
- shift: move the first input token to the top of the stack
- reduce: replace α on top of the stack by X for some rule X → α
- accept: when S′ is on the stack
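As a rough illustration of the top-down approach, the following C sketch is a hand-written predictive parser for the unambiguous expression grammar, with one function per non-terminal and one character of lookahead. Two adaptations are assumptions of this sketch rather than part of the slides: the left-recursive rules E → E + T and T → T * F are replaced by loops (the EBNF forms E → T { (+|-) T } and T → F { (*|/) F }), because a predictive parser cannot expand a left-recursive rule directly, and only num operands are handled so that the parser can evaluate the expression. All function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *cursor;   /* next unread character */
static int parse_ok;         /* cleared on the first syntax error */

static void skip_spaces(void) { while (*cursor == ' ') cursor++; }
static long parse_expr(void);

/* F -> num | ( E ) */
static long parse_factor(void) {
    skip_spaces();
    if (*cursor == '(') {
        cursor++;
        long v = parse_expr();
        skip_spaces();
        if (*cursor == ')') cursor++; else parse_ok = 0;
        return v;
    }
    if (isdigit((unsigned char)*cursor)) {
        long v = 0;
        while (isdigit((unsigned char)*cursor)) v = v * 10 + (*cursor++ - '0');
        return v;
    }
    parse_ok = 0;            /* neither '(' nor a digit */
    return 0;
}

/* T -> F { (*|/) F }; the loop makes * and / left-associative */
static long parse_term(void) {
    long v = parse_factor();
    for (;;) {
        skip_spaces();
        if (*cursor == '*') { cursor++; v *= parse_factor(); }
        else if (*cursor == '/') {
            cursor++;
            long d = parse_factor();
            if (d == 0) { parse_ok = 0; return 0; }
            v /= d;
        }
        else return v;
    }
}

/* E -> T { (+|-) T }; the loop makes + and - left-associative */
static long parse_expr(void) {
    long v = parse_term();
    for (;;) {
        skip_spaces();
        if (*cursor == '+') { cursor++; v += parse_term(); }
        else if (*cursor == '-') { cursor++; v -= parse_term(); }
        else return v;
    }
}

/* Returns 1 and stores the value in *result if s is a well-formed
   expression, 0 otherwise. */
int eval_expression(const char *s, long *result) {
    assert(s != NULL && result != NULL);
    cursor = s;
    parse_ok = 1;
    long v = parse_expr();
    skip_spaces();
    if (!parse_ok || *cursor != '\0') return 0;
    *result = v;
    return 1;
}
```

Because parse_term is called from inside parse_expr, multiplication binds tighter than addition, which is the same precedence the grammar encodes by layering E, T and F.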

Shift-reduce parse of a:=7; b:=c+(d:=5+6,d)$

    stack                            input                     action
                                     a:=7; b:=c+(d:=5+6,d)$    shift
    id                               :=7; b:=c+(d:=5+6,d)$     shift
    id :=                            7; b:=c+(d:=5+6,d)$       shift
    id := num                        ; b:=c+(d:=5+6,d)$        E → num
    id := E                          ; b:=c+(d:=5+6,d)$        S → id := E
    S                                ; b:=c+(d:=5+6,d)$        shift
    S;                               b:=c+(d:=5+6,d)$          shift
    S; id                            :=c+(d:=5+6,d)$           shift
    S; id :=                         c+(d:=5+6,d)$             shift
    S; id := id                      +(d:=5+6,d)$              E → id
    S; id := E                       +(d:=5+6,d)$              shift
    S; id := E +                     (d:=5+6,d)$               shift
    S; id := E + (                   d:=5+6,d)$                shift
    S; id := E + ( id                :=5+6,d)$                 shift
    S; id := E + ( id :=             5+6,d)$                   shift
    S; id := E + ( id := num         +6,d)$                    E → num
    S; id := E + ( id := E           +6,d)$                    shift
    S; id := E + ( id := E +         6,d)$                     shift
    S; id := E + ( id := E + num     ,d)$                      E → num
    S; id := E + ( id := E + E       ,d)$                      E → E + E
    S; id := E + ( id := E           ,d)$                      S → id := E
    S; id := E + ( S                 ,d)$                      shift
    S; id := E + ( S,                d)$                       shift
    S; id := E + ( S, id             )$                        E → id
    S; id := E + ( S, E              )$                        shift
    S; id := E + ( S, E )            $                         E → ( S , E )
    S; id := E + E                   $                         E → E + E
    S; id := E                       $                         S → id := E
    S; S                             $                         S → S ; S
    S                                $                         shift
    S$                                                         S′ → S$
    S′                                                         accept

The numbered rules:
    0  S′ → S$            5  E → num
    1  S → S ; S          6  E → E + E
    2  S → id := E        7  E → ( S , E )
    3  S → print ( L )    8  L → E
    4  E → id             9  L → L , E

Use a DFA to choose the action; the stack only contains DFA states now.

Start with the initial state (s1) on the stack. Lookup (stack top, next input symbol):
- shift(n): skip the next input symbol and push state n
- reduce(k): rule k is X → α; pop |α| times; lookup (stack top, X) in the table
- goto(n): push state n
- accept: report success
- error: report failure

DFA table (terminals: id num print ; , + := ( ) $; non-terminals: S E L). Each row lists only the non-empty entries; error transitions are omitted.

    state 1:   id s4, print s7; S g2
    state 2:   ; s3, $ a
    state 3:   id s4, print s7; S g5
    state 4:   := s6
    state 5:   ; r1, "," r1, $ r1
    state 6:   id s20, num s10, ( s8; E g11
    state 7:   ( s9
    state 8:   id s4, print s7; S g12
    state 9:   E g15, L g14
    state 10:  ; r5, "," r5, + r5, ) r5, $ r5
    state 11:  ; r2, "," r2, + s16, $ r2
    state 12:  ; s3, "," s18
    state 13:  ; r3, "," r3, $ r3
    state 14:  "," s19, ) s13
    state 15:  "," r8, ) r8
    state 16:  id s20, num s10, ( s8; E g17
    state 17:  ; r6, "," r6, + s16, ) r6, $ r6
    state 18:  id s20, num s10, ( s8; E g21
    state 19:  id s20, num s10, ( s8; E g23
    state 20:  ; r4, "," r4, + r4, ) r4, $ r4
    state 21:  ) s22
    state 22:  ; r7, "," r7, + r7, ) r7, $ r7
    state 23:  "," r9, + s16, ) r9

Example run on a := 7$:

    stack               input       action
    s1                  a := 7$     shift(4)
    s1 s4               := 7$       shift(6)
    s1 s4 s6            7$          shift(10)
    s1 s4 s6 s10        $           reduce(5): E → num
    s1 s4 s6            $           (s10 popped) lookup(s6,E) = goto(11)
    s1 s4 s6 s11        $           reduce(2): S → id := E
    s1                  $           (s4, s6, s11 popped) lookup(s1,S) = goto(2)
    s1 s2               $           accept

LR(1) is an algorithm that attempts to construct a parsing table:
- Left-to-right parse;
- Rightmost-derivation; and
- 1 symbol lookahead.
If no conflicts (shift/reduce, reduce/reduce) arise, then we are happy; otherwise, fix the grammar.

An LR(1) item (A → α . βγ, x) consists of
1. A grammar production, A → αβγ
2. The RHS position, represented by '.'
3. A lookahead symbol, x

An LR(1) state is a set of LR(1) items.

The sequence α is on top of the stack, and the head of the input is derivable from βγ. There are two cases for β, terminal or non-terminal.

We first compute a set of LR(1) states from our grammar, and then use them to build a parse table. There are four kinds of entry to make:
1. goto: when β is non-terminal
2. shift: when β is terminal
3. reduce: when β is empty (the next state is the number of the production used)
4. accept: when we have A → B . $

Follow the construction on the tiny grammar:
    0  S → E$
    1  E → T + E
    2  E → T
    3  T → x

Constructing the LR(1) NFA:
- start with state S → . E$, ?
- state A → α . B β, l has:
  - ε-successor B → . γ, x if there exists a rule B → γ and x ∈ lookahead(β)
  - B-successor A → α B . β, l
- state A → α . x β, l has:
  - x-successor A → α x . β, l

Constructing the LR(1) DFA: standard power-set construction, "inlining" ε-transitions.

    state 1:  S → .E$     ?          state 4:  E → T+.E    $
              E → .T+E    $                    E → .T+E    $
              E → .T      $                    E → .T      $
              T → .x      +                    T → .x      $
              T → .x      $                    T → .x      +
    state 2:  S → E.$     ?          state 5:  T → x.      +
    state 3:  E → T.+E    $                    T → x.      $
              E → T.      $          state 6:  E → T+E.    $

    transitions: 1 -E-> 2, 1 -T-> 3, 1 -x-> 5, 3 -+-> 4, 4 -E-> 6, 4 -T-> 3, 4 -x-> 5

    state   x    +    $    E    T
    1       s5             g2   g3
    2                 a
    3            s4   r2
    4       s5             g6   g3
    5            r3   r3
    6                 r1
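The six-state table for the tiny grammar is small enough to drive by hand. The C sketch below encodes that table and runs the stack algorithm (shift, reduce plus goto, accept, error) over a string made of the characters x and +, treating the terminating NUL as the end-of-file marker $. The encoding (struct entry, the macros, lr_accepts, the fixed stack size) is an assumption of this sketch, not something a parser generator would emit in this form.

```c
#include <assert.h>
#include <stddef.h>

/* Column indices: terminals x + $ followed by non-terminals E T. */
enum { SYM_X, SYM_PLUS, SYM_END, SYM_E, SYM_T, SYM_COUNT };
enum op { OP_ERROR, OP_SHIFT, OP_REDUCE, OP_GOTO, OP_ACCEPT };
struct entry { enum op op; int arg; };

#define SH(n) { OP_SHIFT, n }
#define RE(n) { OP_REDUCE, n }
#define GO(n) { OP_GOTO, n }
#define ACC   { OP_ACCEPT, 0 }
#define ERR   { OP_ERROR, 0 }

/* Rows are states 1..6 (row 0 is unused). */
static const struct entry table[7][SYM_COUNT] = {
    /*        x      +      $      E      T   */
    /* 0 */ { ERR,   ERR,   ERR,   ERR,   ERR   },
    /* 1 */ { SH(5), ERR,   ERR,   GO(2), GO(3) },
    /* 2 */ { ERR,   ERR,   ACC,   ERR,   ERR   },
    /* 3 */ { ERR,   SH(4), RE(2), ERR,   ERR   },
    /* 4 */ { SH(5), ERR,   ERR,   GO(6), GO(3) },
    /* 5 */ { ERR,   RE(3), RE(3), ERR,   ERR   },
    /* 6 */ { ERR,   ERR,   RE(1), ERR,   ERR   },
};

/* Rule k is lhs[k] -> (a right-hand side of length len[k]):
   1: E -> T + E,  2: E -> T,  3: T -> x.  Index 0 is unused. */
static const int rule_lhs[4] = { -1, SYM_E, SYM_E, SYM_T };
static const int rule_len[4] = { 0, 3, 1, 1 };

static int classify(char c) {
    return c == 'x' ? SYM_X : c == '+' ? SYM_PLUS : c == '\0' ? SYM_END : -1;
}

/* Returns 1 if input is a sentence of the tiny grammar, 0 otherwise. */
int lr_accepts(const char *input) {
    assert(input != NULL);
    enum { STACK_MAX = 256 };
    int stack[STACK_MAX];
    int top = 0;
    stack[0] = 1;                              /* initial state s1 */
    for (;;) {
        int sym = classify(*input);
        if (sym < 0) return 0;                 /* unknown character */
        struct entry e = table[stack[top]][sym];
        switch (e.op) {
        case OP_SHIFT:                         /* consume token, push state */
            if (top + 1 >= STACK_MAX) return 0;
            stack[++top] = e.arg;
            input++;
            break;
        case OP_REDUCE: {                      /* pop |rhs|, then goto on lhs */
            top -= rule_len[e.arg];
            struct entry g = table[stack[top]][rule_lhs[e.arg]];
            if (g.op != OP_GOTO) return 0;
            stack[++top] = g.arg;
            break;
        }
        case OP_ACCEPT:
            return 1;
        default:
            return 0;                          /* empty table entry */
        }
    }
}
```

On the input x+x the stack goes through s1, s1 s5, s1 s3, s1 s3 s4, s1 s3 s4 s5, s1 s3 s4 s3, s1 s3 s4 s6 and s1 s2 before accepting, which mirrors the hand trace style used for the larger table.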

Conflicts (each item is shown with its lookahead symbol)

    A → .B   x
    A → C.   y      no conflict (lookahead decides)

    A → .B   x
    A → C.   x      shift/reduce conflict

    A → .x   y
    A → C.   x      shift/reduce conflict

    A → B.   x
    A → C.   x      reduce/reduce conflict

    A → .B          with transitions on B and C to states s_i and s_j:
    A → .C          shift/shift conflict? By construction of the DFA we have s_i = s_j

LR(1) tables may become very large. Parser generators use LALR(1), which merges states that are identical except for lookaheads.

[Figure: containment diagram of the grammar classes LL(0), LL(1), LL(k), LR(0), SLR, LALR(1), LR(1), LR(k)]

bison (yacc) is a parser generator:
- it inputs a grammar;
- it computes an LALR(1) parser table;
- it reports conflicts;
- it resolves conflicts using defaults (!); and
- it creates a C program.
Nobody writes (simple) parsers by hand anymore.

The grammar:
    1  E → id         4  E → E / E      7  E → ( E )
    2  E → num        5  E → E + E
    3  E → E * E      6  E → E - E
is expressed in bison as:

    %{
    /* C declarations */
    %}
    /* Bison declarations; tokens come from lexer (scanner) */
    %token tIDENTIFIER tINTCONST
    %start exp
    /* Grammar rules after the first %% */
    %%
    exp : tIDENTIFIER
        | tINTCONST
        | exp '*' exp
        | exp '/' exp
        | exp '+' exp
        | exp '-' exp
        | '(' exp ')'
    ;
    %%
    /* User C code after the second %% */

Input this code into exp.y to follow the example.

The grammar is ambiguous:

    $ bison --verbose exp.y    # --verbose produces exp.output
    exp.y contains 16 shift/reduce conflicts.
    $ cat exp.output
    State 11 contains 4 shift/reduce conflicts.
    State 12 contains 4 shift/reduce conflicts.
    State 13 contains 4 shift/reduce conflicts.
    State 14 contains 4 shift/reduce conflicts.
    [...]
    state 11
        exp -> exp . '*' exp   (rule 3)
        exp -> exp '*' exp .   (rule 3)    <-- problem is here
        exp -> exp . '/' exp   (rule 4)
        exp -> exp . '+' exp   (rule 5)
        exp -> exp . '-' exp   (rule 6)

        '*'   shift, and go to state 6
        '/'   shift, and go to state 7
        '+'   shift, and go to state 8
        '-'   shift, and go to state 9

        '*'   [reduce using rule 3 (exp)]
        '/'   [reduce using rule 3 (exp)]
        '+'   [reduce using rule 3 (exp)]
        '-'   [reduce using rule 3 (exp)]
        $default   reduce using rule 3 (exp)

Rewrite the grammar to force reductions:

    E → E + T      T → T * F      F → id
    E → E - T      T → T / F      F → num
    E → T          T → F          F → ( E )

    %token tIDENTIFIER tINTCONST
    %start exp
    %%
    exp : exp '+' term
        | exp '-' term
        | term
    ;
    term : term '*' factor
         | term '/' factor
         | factor
    ;
    factor : tIDENTIFIER
           | tINTCONST
           | '(' exp ')'
    ;
    %%

Or use precedence directives:

    %token tIDENTIFIER tINTCONST
    %start exp
    %left '+' '-'    /* left-associative, lower precedence */
    %left '*' '/'    /* left-associative, higher precedence */
    %%
    exp : tIDENTIFIER
        | tINTCONST
        | exp '*' exp
        | exp '/' exp
        | exp '+' exp
        | exp '-' exp
        | '(' exp ')'
    ;
    %%

which resolve shift/reduce conflicts:

    Conflict in state 11 between rule 5 and token '+' resolved as reduce.   <-- Reduce exp + exp . +
    Conflict in state 11 between rule 5 and token '-' resolved as reduce.   <-- Reduce exp + exp . -
    Conflict in state 11 between rule 5 and token '*' resolved as shift.    <-- Shift exp + exp . *
    Conflict in state 11 between rule 5 and token '/' resolved as shift.    <-- Shift exp + exp . /

Note that this is not the same state 11 as before.

The precedence directives are:
- %left (left-associative)
- %right (right-associative)
- %nonassoc (non-associative)

When constructing a parse table, a rule takes the precedence of the last terminal symbol on its right-hand side, and the action is chosen by comparing that precedence with the precedence of the lookahead token.

Precedences are ordered from lowest to highest on a linewise basis.

If precedences are equal, then:
- %left favors reducing
- %right favors shifting
- %nonassoc yields an error
This usually ends up working.

    state 0
        tIDENTIFIER   shift, and go to state 1
        tINTCONST     shift, and go to state 2
        '('           shift, and go to state 3
        exp           go to state 4
    state 1
        exp -> tIDENTIFIER .   (rule 1)
        $default   reduce using rule 1 (exp)
    state 2
        exp -> tINTCONST .   (rule 2)
        $default   reduce using rule 2 (exp)
    ...
    state 14
        exp -> exp . '*' exp   (rule 3)
        exp -> exp . '/' exp   (rule 4)
        exp -> exp '/' exp .   (rule 4)
        exp -> exp . '+' exp   (rule 5)
        exp -> exp . '-' exp   (rule 6)
        $default   reduce using rule 4 (exp)
    state 15
        $   go to state 16
    state 16
        $default   accept

    $ cat exp.y
    %{
    #include <stdio.h> /* for printf */
    extern char *yytext; /* string from scanner */
    void yyerror() {
      printf ("syntax error before %s\n", yytext);
    }
    %}
    %union {
      int intconst;
      char *stringconst;
    }
    %token <intconst> tINTCONST
    %token <stringconst> tIDENTIFIER
    %start exp
    %left '+' '-'
    %left '*' '/'
    %%
    exp : tIDENTIFIER { printf ("load %s\n", $1); }
        | tINTCONST   { printf ("push %i\n", $1); }
        | exp '*' exp { printf ("mult\n"); }
        | exp '/' exp { printf ("div\n"); }
        | exp '+' exp { printf ("plus\n"); }
        | exp '-' exp { printf ("minus\n"); }
        | '(' exp ')' {}
    ;
    %%

    $ cat exp.l
    %{
    #include "y.tab.h" /* for exp.y types */
    #include <string.h> /* for strlen */
    #include <stdlib.h> /* for malloc and atoi */
    %}
    %%
    [ \t\n]+    /* ignore */;
    "*"         return '*';
    "/"         return '/';
    "+"         return '+';
    "-"         return '-';
    "("         return '(';
    ")"         return ')';
    0|([1-9][0-9]*) {
      yylval.intconst = atoi (yytext);
      return tINTCONST;
    }
    [a-zA-Z_][a-zA-Z0-9_]* {
      yylval.stringconst = (char *) malloc (strlen (yytext) + 1);
      sprintf (yylval.stringconst, "%s", yytext);
      return tIDENTIFIER;
    }
    .           /* ignore */
    %%

    $ cat main.c
    int yyparse(void);
    int main (void) {
      yyparse ();
      return 0;
    }

Using flex/bison to create a parser is simple:

    $ flex exp.l
    $ bison --yacc --defines exp.y    # note compatibility options
    $ gcc lex.yy.c y.tab.c main.c -o exp -lfl

On input a*(b-17) + 5/c:
    $ echo "a*(b-17) + 5/c" | ./exp
our exp parser outputs the correct order of operations:
    load a
    load b
    push 17
    minus
    mult
    push 5
    load c
    div
    plus
You should confirm this for yourself!

If the input contains syntax errors, then the bison-generated parser calls yyerror and stops. We may ask it to recover from the error:

    exp : tIDENTIFIER { printf ("load %s\n", $1); }
        ...
        | '(' exp ')'
        | error { yyerror(); }
    ;

and on input a@(b-17) ++ 5/c get the output:

    load a
    syntax error before (
    syntax error before (
    syntax error before (
    syntax error before b
    push 17
    minus
    syntax error before )
    syntax error before )
    syntax error before +
    plus
    push 5
    load c
    div
    plus

Error recovery hardly ever works.

SableCC (by Étienne Gagnon, McGill alumnus) is a compiler compiler: it takes a grammatical description of the source language as input, and generates a lexer (scanner) and parser for it.

[Figure: joos.sablecc -> SableCC -> joos/*.java -> javac -> scanner & parser; the scanner and parser turn foo.joos into a CST/AST]

The SableCC 2 grammar for our Tiny language:

    Package tiny;

    Helpers
      tab = 9;
      cr = 13;
      lf = 10;
      digit = ['0'..'9'];
      lowercase = ['a'..'z'];
      uppercase = ['A'..'Z'];
      letter = lowercase | uppercase;
      idletter = letter | '_';
      idchar = letter | '_' | digit;

    Tokens
      eol = cr | lf | cr lf;
      blank = ' ' | tab;
      star = '*';
      slash = '/';
      plus = '+';
      minus = '-';
      l_par = '(';
      r_par = ')';
      number = '0' | [digit-'0'] digit*;
      id = idletter idchar*;

    Ignored Tokens
      blank, eol;

    Productions
      exp = {plus} exp plus factor
          | {minus} exp minus factor
          | {factor} factor;
      factor = {mult} factor star term
             | {divd} factor slash term
             | {term} term;
      term = {paren} l_par exp r_par
           | {id} id
           | {number} number;

Version 2 produces parse trees, a.k.a. concrete syntax trees (CSTs).

The SableCC 3 grammar for our Tiny language:

    Productions
      cst_exp {-> exp} =
          {cst_plus} cst_exp plus factor
              {-> New exp.plus(cst_exp.exp,factor.exp)}
        | {cst_minus} cst_exp minus factor
              {-> New exp.minus(cst_exp.exp,factor.exp)}
        | {factor} factor {-> factor.exp};

      factor {-> exp} =
          {cst_mult} factor star term
              {-> New exp.mult(factor.exp,term.exp)}
        | {cst_divd} factor slash term
              {-> New exp.divd(factor.exp,term.exp)}
        | {term} term {-> term.exp};

      term {-> exp} =
          {paren} l_par cst_exp r_par {-> cst_exp.exp}
        | {cst_id} id {-> New exp.id(id)}
        | {cst_number} number {-> New exp.number(number)};

    Abstract Syntax Tree
      exp = {plus} [l]:exp [r]:exp
          | {minus} [l]:exp [r]:exp
          | {mult} [l]:exp [r]:exp
          | {divd} [l]:exp [r]:exp
          | {id} id
          | {number} number;

Version 3 generates abstract syntax trees (ASTs).
