Parsing Techniques C41/C413 Introduction to Compilers Tim Teitelbaum Lecture 11: yntax-irected efinitions February 16, 005 LL parsing Computes a Leftmost derivation Builds the derivation top-down LL parsing table indicates which production to use for expanding the rightmost non-terminal LR parsing Computes a Rightmost derivation Builds the derivation bottom-up Uses a set of LR states and a stack of symbols LR parsing table indicates, for each state, what action to perform (shift/reduce) and what state to go to next Use these techniques to construct an AT C 41/413 pring 005 Introduction to Compilers 1 C 41/413 pring 005 Introduction to Compilers AT Review erivation = sequence of applied productions E 1 1 E 1 Parse tree = graph representation of a derivation oesn t capture the order of applying the productions Abstract yntax Tree (AT) discards unnecessary information from the parse tree Parse Tree E 1 E E 3 AT 1 3 AT ata tructures abstract class Expr { class extends Expr { Expr left, right; (Expr L, Expr R) { left = L; right = R; class Num extends Expr { int value; Num (int v) { value = v; Num 1 Num Num 3 C 41/413 pring 005 Introduction to Compilers 3 C 41/413 pring 005 Introduction to Compilers 4 Implicit AT Construction LL/LR parsing techniques implicitly build the AT The parse tree is captured in the derivation LL parsing: AT is implicitly represented by the sequence of applied productions LR parsing: AT is implicitly represented by the sequence of applied reductions We want to explicitly construct the AT during the parsing phase: add code in the parser to explicitly build the AT AT Construction LL parsing: extend procedures for nonterminals Example: E' ' ε E num ( ) vo parse_() { switch (token) { case num: case ( : parse_e(); parse_ (); return; default: throw new ParseError(); Expr parse_() { switch (token) { case num: case ( : Expr left = parse_e(); Expr right = parse_ (); if (right == null) return left; else return new (left, right); default: throw new ParseError(); C 41/413 pring 005 Introduction to Compilers 5 C 41/413 pring 005 Introduction to Compilers 6
AT Construction LR parsing We need again to add code for explicit AT construction AT construction mechanism for LR Parsing tore parts of the tree on the stack For each nonterminal symbol B on stack, also store the sub-tree rooted at B on stack Whenever the parser performs a reduce operation for a production B γ, create an AT node for B AT Construction for LR Parsing Example stack E Before reduction E Num() Num(3) Num(1) E E num ( ) Num(1) After reduction E Num() Num(3) C 41/413 pring 005 Introduction to Compilers 7 C 41/413 pring 005 Introduction to Compilers 8 Problems Unstructured code: mixed parsing code with AT construction code Automatic parser generators The generated parser needs to contain AT construction code How to construct a customized AT data structure using an automatic parser generator? May want to perform other actions concurrently with the parsing phase E.g. semantic checks This can reduce the number of compiler passes yntax-irected efinition olution: syntax-directed definition Extends each grammar production with an associated semantic action (code): E { action The parser generator adds these actions into the generated parser Each action is executed when the corresponding production is reduced C 41/413 pring 005 Introduction to Compilers 9 C 41/413 pring 005 Introduction to Compilers 10 emantic Actions Actions = code in a programming language ame language as the automatically generated parser Examples: Yacc = actions written in C CUP = actions written in Java The actions access the parser stack! Parser generators extend the stack of states (corresponding to RH symbols) symbols with entries for user-defined structures (e.g., parse trees) The action code should be able to refer to the states (corresponding to the RH grammar symbols in the production Need a naming scheme C 41/413 pring 005 Introduction to Compilers 11 Naming cheme Need names for grammar symbols to use in the semantic action code Need to refer to multiple occurrences of the same nonterminal symbol E E 1 E istinguish the nonterminal on the LH E 0 E E C 41/413 pring 005 Introduction to Compilers 1
Naming cheme: CUP CUP: Name RH nonterminal occurrences using distinct, user-defined labels: expr ::= expr:e1 PLU expr:e Use keyword REULT for LH nonterminal CUP Example: expr ::= expr:e1 PLU expr:e {: REULT = e1 e; : Naming cheme: yacc Yacc: Uses keywords: $1 refers to the first RH symbol, $ refers to the second RH symbol, etc. Keyword $$ refers to the LH nonterminal YaccExample: expr ::= expr PLU expr { $$ = $1 $3; C 41/413 pring 005 Introduction to Compilers 13 C 41/413 pring 005 Introduction to Compilers 14 Building the AT Use semantic actions to build the AT AT is built bottom-up along with parsing non terminal Expr expr; User-defined type for semantic objects on the stack Nonterminal name expr ::= NUM:i {: REULT = new Num(i.val); : expr ::= expr:e1 PLU expr:e {: REULT = new (e1,e); : expr ::= expr:e1 MULT expr:e {: REULT = new Mul(e1,e); : expr ::= LPAR expr:e RPAR {: REULT = e; : Example E num (E) EE E*E Parser stack stores value of each nonterminal (1)*3 (1 )*3 (E Num(1) )*3 REULT=new Num(1) (E )*3 (EE Num() )*3 REULT=new Num() (E (, ) )*3 REULT=new (e1,e) (E) *3 E *3 REULT=e C 41/413 pring 005 Introduction to Compilers 15 C 41/413 pring 005 Introduction to Compilers 16 AT esign Keep the AT abstract o not introduce a tree node for every node in parse tree (not very abstract) E ( ) E E 3 1 E? 1 3 um Expr um LPar um RPar Expr Expr um 3 1 Expr C 41/413 pring 005 Introduction to Compilers 17 AT esign o not use one single class AT_node E.g., need information for if, while,, *, I, NUM class AT_node { int node_type; AT_node[ ] children; tring name; int value; etc Problem: must have fields for every different kind of node with attributes Not extensible, Java type checking no help C 41/413 pring 005 Introduction to Compilers 18
Use Class Hierarchy Can use subclassing to solve problem Use an abstract class for each interesting set of non-terminals in grammar (e.g. expressions) E EE E*E -E (E) abstract class Expr { class extends Expr { Expr left, right; class Mult extends Expr { Expr left, right; // or: class BinExpr extends Expr { Oper o; Expr l, r; class Minus extends Expr { Expr e; Another Example E ::= num (E) EE ::= E ; if (E) if (E) else = E ; ; abstract class Expr { class Num extends Expr { Num(int value) class extends Expr { (Expr e1, Expr e) class Id extends Expr { Id(tring name) abstract class tmt { class If extends tmt { If(Expr c, tmt s1, tmt s) class Empty extends tmt { Empty() class Assign extends tmt { Assign(tring, Expr e) C 41/413 pring 005 Introduction to Compilers 19 C 41/413 pring 005 Introduction to Compilers 0 Other yntax-irected efinitions Can use syntax-directed definitions to perform semantic checks during parsing E.g. type-checking Benefit = efficiency One single compiler pass for multiple tasks isadvantage = unstructured code Mixes parsing and semantic checking phases Perform checks while AT is changing Limited to one pass in bottom-up order C 41/413 pring 005 Introduction to Compilers 1 Type eclaration Example T 1, { Type(, T.type);.type = T.type; { Type(, 1.type);.type = 1.type; T int { T.type = inttype; T float { T.type = floattype; C 41/413 pring 005 Introduction to Compilers Propagation of Values Propagate type attributes while building the AT int a, b T.type.type T inttype int.type, Type(,.type) Type(,T.type) T L Another Example {.type = T.type; L.type = T.type; T int { T.type = inttype; T float { T.type = floattype; L { Type(,???); L L 1, { Type(, L 1.type);??? C 41/413 pring 005 Introduction to Compilers 3 C 41/413 pring 005 Introduction to Compilers 4
Propagation of Values Propagate values both bottom-up and top-down int a, b T.type T inttype int LR parsing: AT is built bottom-up! L.type L.type L L Type(,L.Type), Type(,L.type) tructured Approach eparate AT construction from semantic checking phase Traverse the AT and perform semantic checks (or other actions) only after the tree has been built and its structure is stable This approach is more flexible and less error-prone It is better when efficiency is not a critical issue C 41/413 pring 005 Introduction to Compilers 5 C 41/413 pring 005 Introduction to Compilers 6 Where We Are ummary ource code (character stream) Token stream Abstract syntax tree (AT) if (b == 0) a = b; if ( b == 0 ) a = b ; if == b 0 = a b Lexical Analysis yntax Analysis (Parsing) emantic Analysis yntax-directed definitions attach semantic actions to grammar productions Easy to construct the AT using syntax-directed definitions Can use syntax-directed definitions to perform semantic checks eparate AT construction from semantic checks or other actions which traverse the AT C 41/413 pring 005 Introduction to Compilers 7 C 41/413 pring 005 Introduction to Compilers 8