CS2210: Compiler Construction Syntax Analysis Syntax Analysis


 Cameron Johnston
 4 years ago
 Views:
Transcription
1
2 Comparison with Lexical Analysis The second phase of compilation Phase Input Output Lexer string of characters string of tokens Parser string of tokens Parse tree/ast
3 What Parse Tree? CS2210: Compiler Construction A parse tree represents the program structure of the input Programming language constructs usually have recursive structures Ifstmt if (EXPR) then Stmt else Stmt fi Stmt Ifstmt Whilestmt...
4 A Parse Tree Example Code to be compiled:... if x==y then... else... fi Lexer: Parser: Input: sequence of tokens... IF ID==ID THEN... ELSE... FI Desired output: IFSTMT == STMT STMT ID ID
5 How to Specify a Syntax How can we specify a syntax with nested structures? Is it possible to use RE/FA? RE(Regular Expression) FA(Finite Automata)
6 How to Specify a Syntax How can we specify a syntax with nested structures? Is it possible to use RE/FA? RE(Regular Expression) FA(Finite Automata) RE/FA is not powerful enough Example: matching parenthesis: # of ( equals # of ) (x+y)*z ((x+y)+y)*z... (...(((x+y)+y)+y)...) ((x+y)+y)+y)*z
7 RE/FA is Not Powerful Enough Pumping Lemma for Regular Languages In FA, all sufficiently long words may be pumped that is, have a substring repeated an arbitrary number of times If an FA is to accept a string longer than the number of states, there must be a set of states that are repeated (q s... q t in above picture) Let y be the substring that is repeated. Then all strings xy i z (i >= 0) are also accepted and is part of language
8 RE/FA is Not Powerful Enough Matching parentheses is not a Regular Language Suppose it is RL and there is an FA for this language Now construct a string with matching ()s long enough such that there are more (s than the number of states Then, there would be a pumping y filled entirely with ( Then, any string xy i z will also be accepted by FA but not be matching Contradiction QED Similarly, any language with nested structures is not an RL
9 What Language is C Syntax? An example of a Context Free Language (CFL) A broader category of languages that includes languages with nested structures Before discussing CFL, we need to learn a more general way of specifying languages than RE, called Grammars Can specify both RL and CFL and more...
10 Grammar General Language Specification Recall language definition Language set of strings over alphabet Alphabet: finite set of symbols Sentences: strings in the language A grammar is a general way to specify a language E.g. The English grammar specifies the English language
11 An Example CS2210: Compiler Construction Language L = { any string with 00 at the end } 1 A 0 B 0 C 0 Grammar G = { T, N, s, δ } where T = { 0, 1 }, N = { A, B }, s = A, and grammar rule set δ = { A 0A 1A 0B, B 0 } Derivation: from grammar to language A 0A 00B 000 A 1A 10B 100 A 0A 00A 000B 0000 A 0A 01A...
12 Grammar CS2210: Compiler Construction A grammar consists of 4 components (T, N, s, δ) T set of terminal symbols Essentially tokens leaves in the parse tree N set of nonterminal symbols Internal nodes in the parse tree One or more symbols grouped into higher construct example: declaration, statement, loop,... s the nonterminal start symbol where derivation starts δ a set of production rules LHS RHS : lefthandside produces righthandside
13 Production Rule and Derivation Production Rule: LHS RHS Meaning: LHS can be replaced with RHS A derivation a series of applications of production rules Derivation: β α Meaning: String α is derived from β β α 1 step β α 0 or more steps β = α 0 or more steps example: A 0A 00B 000 A = 000 A = + 000
14 Noam Chomsky Grammars Classification of languages based on form of grammar rules Four(4) types of grammars: Type 0 recursive grammar Type 1 context sensitive grammar Type 2 context free grammar Type 3 regular grammar
15 Type 0: Unrestricted Grammar Type 0 grammar unrestricted or recursively enumerable Form of rules α β where α (N T ) +, β (N T ) Implied restrictions: LHS: no ε allowed Example: ab acd ; LHS is shorter than RHS aab ab ; LHS is longer than RHS A ε ; εproductions are allowed Computational complexity: unbounded Derivation strings may contract and expand repeatedly (Since LHS may be longer or shorter than RHS) Unbounded number of productions before target string L(Unrestricted) L(Turing Machine)
16 Type 1: Context Sensitive Grammar Type 1 grammar context sensitive grammar Form of rules αaβ αγβ where A N, α, β (N T ), γ (N T ) + Replace A by γ only if found in the context of α and β Implied restrictions: LHS: shorter or equal to RHS RHS: no ε allowed Example: aab acb ; Replace A with C when in between a and B A C ; Replace A with C regardless of context Computational complexity: likely NPComplete Derivation strings may only expand Bounded number of derivations before target string L(CSG) L(Linear Bounded Turing Machine)
17 Type 2: Context Free Grammar Type 2 grammar context free grammar Form of rules A γ where A N, γ (N T ) + Replace A by γ (no context can be specified) Implied restrictions: LHS: a single nonterminal RHS: no ε allowed Sometimes relaxed to simplify grammar but rules can always be rewritten to exclude εproductions Example: A abc ; Replace A with abc regardless of context Computational complexity: Polynomial O(n ), but most real world CFGs are O(n) L(CFG) L(Pushdown Automaton)
18 Type 3: Regular Grammar Type 3 grammar regular grammar Form of rules A α, or A αb where A, B N, α T In terms of FA: Move from state A to state B on input α Implied restrictions: LHS: a single nonterminal RHS: a terminal or a terminal followed by a nonterminal Example: A 1A 0 Computational complexity: Linear O(n) Derivation string length increases by 1 at each step L(RG) L(Finite Automaton) L(Regular Expression)
19 Differentiating Type 2 (RG) and 3 (CFG) Language L1 = { [ i ] j i, j >= 1} Regular grammar S [ S [ T T ] T ] Language L2 = { [ i ] i i >= 1} Context free grammar S [ S ] [ ]
20 Differentiating Type 1 (CSG) and 2 (CFG) Type 2 grammar (context free) S D U D int x; int y; U x=1; y=1; Type 1 grammar (context sensitive) S D U D int x; int y; int x; U int x; x=1; int y; U int y; y=1;
21 Differentiating Type 1 (CSG) and 2 (CFG) Language from type 2 (CFG): S DU int x; U int x; x=1; S DU int x; U int x; y=1; S DU int y; U int y; x=1; S DU int y; U int y; y=1; Language from type 1 (CSG): S DU int x; U int x; x=1; S DU int y; U int y; y=1;
22 Differentiating Type 1 (CSG) and 2 (CFG) Language from type 2 (CFG): S DU int x; U int x; x=1; S DU int x; U int x; y=1; S DU int y; U int y; x=1; S DU int y; U int y; y=1; Language from type 1 (CSG): S DU int x; U int x; x=1; S DU int y; U int y; y=1; If PLs are contextsensitive, why use CFGs for parsing? CSG parsers are provably inefficient (PSPACEComplete) Most PL constructs are contextfree: Ifstmt, declarations The remaining contextsensitive constructs can be analyzed at the semantic analysis stage (e.g. defbeforeuse, matching formal/actual parameters)
23 Language Classification Regular Grammar CFG CSG Recursive Grammar Type 0 Type 1 Type 2 Type 3
24 Language Classification Regular Grammar CFG CSG Recursive Grammar Type 0 Type 1 Type 2 Type 3 However, L y L x where L x :[ i ] k RG, L y :[ i ] i CFG Is it a problem?
25 Context Free Grammars
26 Grammar and Grammar is used to derive string or construct parser A derivation is a sequence of applications of rules Starting from the start symbol S (sentence) Leftmost and Rightmost derivations At each derivation step, leftmost derivation always replaces the leftmost nonterminal symbol Rightmost derivation always replaces the rightmost one
27 Examples CS2210: Compiler Construction E E * E E + E ( E ) id leftmost derivation E E + E E * E + E id * E + E id * id + E... id * id + id * id rightmost derivation E E + E E + E * E E + E * id E + id * id... id * id + id * id
28 Parse Trees CS2210: Compiler Construction Parse tree structure Describes program structure (defined by the rules applied) Internal nodes are nonterminals, leaves are terminals Structure depends on what production rules were applied Grammars should specify precisely what rules to apply to avoid ambiguity in program structure Structure does not depend on order of application Same tree for previous rightmost/leftmost derivations Order of rule application is implementation dependent
29 Parse Trees CS2210: Compiler Construction Parse tree structure Describes program structure (defined by the rules applied) Internal nodes are nonterminals, leaves are terminals Structure depends on what production rules were applied Grammars should specify precisely what rules to apply to avoid ambiguity in program structure Structure does not depend on order of application Same tree for previous rightmost/leftmost derivations Order of rule application is implementation dependent E E + E E * E E * E id id id id
30 Different Parse Trees Consider the string id * id + id * id can result in 3 different trees (and more) for this grammar E E + E E E * E E E * E E * E E * E id E + E id E * E id id id id id E * E E + E id id id id id
31 Ambiguity CS2210: Compiler Construction A grammar G is ambiguous if there exist a string str L(G) such that more than one parse tree derives str In practice, we prefer unambiguous grammars Fortunately, ambiguity is (often) the property of a grammar and not the language It is (often) possible to rewrite grammar to remove ambiguity Programming languages are designed not to be ambiguous it s the job of compiler writers to express those rules accurately in the grammar
32 How to Remove Ambiguity Step I: Specify precedence Build precedence into grammar by recursion on a different nonterminal for each precedence level. Insight: Lower precedence tend to be higher in tree (close to root) Higher precedence tend to be lower in tree (far from root) Place lower precedence nonterminals higher up in the tree E E + E E  E E * E E / E E E ( E ) id Rewrite it to: E E + T E  T T ; Lowest precedence: +,  T T * F T / F F ; Middle precedence: *, / F B F B ; Highest precedence: B id ( E ) ; Even higher precedence: ()
33 How to Remove Ambiguity Step II: Specify associativity Allow recursion only on either left or right nonterminal Left associative recursion on left nonterminal Right associative recursion on right nonterminal In the previous example, E E + E... ; allows both left/right associativity Rewrite it to: E E + T... F B F... ; + is left associative ; exponent is right associative
34 Properties of Context Free Grammars Decidable: computable using a Turing Machine In other words, guaranteed an answer in finite time E.g. if a string is in a context free language: decidable In fact we can run a CFL parser in polynomial time If a CFG is ambiguous: Undecidable In general, impossible to tell if a CFG is ambiguous Can only check whether it is ambiguous for a given string In practice, parser generators like Yacc check for a more restricted grammar (e.g. LALR(1)) instead LALR(1) is a subset of unambiguous grammars Whether grammar is LALR(1) is easily decidable If two CFGs generate same language: Undecidable Impossible to tell if language changed by tweaking grammar Parsers are regression tested against a test set frequently
35 From Grammar to Parser What exactly is parsing, or syntax analysis? To process an input string for a given grammar, and compose the derivation if the string is in the language Two subtasks to determine if string is in the language or not to construct a parse tree (a representation of the derivation)
36 From Grammar to Parser What exactly is parsing, or syntax analysis? To process an input string for a given grammar, and compose the derivation if the string is in the language Two subtasks to determine if string is in the language or not to construct a parse tree (a representation of the derivation) How would you construct a parser from a grammar?
37 Types of Parsers CS2210: Compiler Construction Universal parser Can parse any CFG e.g. Early s algorithm Powerful but can be very inefficient (O(N 3 ) where N is length of string) Topdown parser Starts from root and expands into leaves Tries to expand start symbol to input string Finds leftmost derivation Only works for a certain class of CFGs called LL But provably works in O(n) time Parser code structure closely mimics grammar Amenable to implementation by hand Automated tools exist to convert to code (e.g. ANTLR)
38 Types of Parsers (cont.) Bottomup parser Starts at leaves and builds up to root Tries to reduce the input string to the start symbol Finds reverse order of the rightmost derivation Works for a wider class of CFGs called LR Also provably works in O(n) time Parser code structure nothing like grammar Very difficult to implement by hand Automated tools exist to convert to code (e.g. Yacc, Bison)
39 What Output do We Want? The output of parsing is parse tree, or abstract syntax tree An abstract syntax tree is abbreviated representation of a parse tree drops some details without compromising meaning some terminal symbols that no longer contribute to semantics are dropped (e.g. parentheses) internal nodes may contain terminal symbols
40 An Example CS2210: Compiler Construction Consider the grammar E int ( E ) E + E and an input 5 + ( ) After lexical analysis, we have a sequence of tokens INT 5 + ( INT 2 + INT 3 )
41 Parse Tree of the Input E E + E INT 5 ( E ) E + E INT 2 INT A parse tree 3 Traces the process of derivation Captures all the rules that were applied but contains redundant information parentheses singlesuccessor nodes
42 Abstract Syntax Tree An Abstract Syntax Tree (AST) for the input PLUS PLUS AST captures all that is semantically relevant AST but is more compact in representation AST abstracts from parse tree (a.k.a. concrete syntax tree) ASTs are used in most compilers rather than parse trees
43 How are ASTs constructed in a compiler? Through implementation of semantic actions Semantic actions: actions triggered on a rule match Already used in lexical analysis to return token tuples For syntactic analysis, involves computing semantic value of whole construct from values of its parts To construct AST, attach an attribute for each symbol X Attributes contain the semantic value for each symbol X.ast the constructed AST for symbol X Then attach semantic actions to production rules, e.g. X Y 1 Y 2...Y n { actions } actions define X.ast using Y i.ast (1 i n)
44 Example CS2210: Compiler Construction For the previous example, we have E int { E.ast = mkleaf(int.lval) } E1 + E2 { E.ast = mkplus(e1.ast, E2.ast) } (E1) { E.ast = E1.ast } Two functions that need to be defined ptr1=mkleaf(n) create a leaf node and assign value n ptr2=mkplus(t1, t2) create a tree node and assign value PLUS, then attach two child trees t1 and t2 ptr1 ptr2 n PLUS t1 t2
45 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottomup LR(1) parser (Order can change depending on parser implementation)
46 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottomup LR(1) parser (Order can change depending on parser implementation) E1.ast=mkleaf(5) 5
47 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottomup LR(1) parser (Order can change depending on parser implementation) E1.ast=mkleaf(5) E2.ast=mkleaf(2) 5 2
48 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottomup LR(1) parser (Order can change depending on parser implementation) E1.ast=mkleaf(5) E2.ast=mkleaf(2) E3.ast=mkleaf(3) 5 2 3
49 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottomup LR(1) parser (Order can change depending on parser implementation) E4.ast=mkplus(E2.ast, E3.ast) PLUS E1.ast=mkleaf(5) E2.ast=mkleaf(2) E3.ast=mkleaf(3) 5 2 3
50 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottomup LR(1) parser (Order can change depending on parser implementation) E4.ast=mkplus(E2.ast, E3.ast) PLUS E1.ast=mkleaf(5) 5 2 3
51 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottomup LR(1) parser (Order can change depending on parser implementation) E5.ast=mkplus(E1.ast, E4.ast) PLUS E4.ast=mkplus(E2.ast, E3.ast) PLUS E1.ast=mkleaf(5) 5 2 3
52 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottomup LR(1) parser (Order can change depending on parser implementation) E5.ast=mkplus(E1.ast, E4.ast) PLUS PLUS 5 2 3
53 Summary CS2210: Compiler Construction Compilers specify program structure using CFG Most programming languages are not context free Context sensitive analysis can easily separate out to semantic analysis phase A parser uses CFG to... answer if an input str L(G)... and build a parse tree... or build an AST instead... and pass it to the rest of compiler
54 Parsing
55 Parsing CS2210: Compiler Construction We will study two approaches Topdown Easier to understand and implement manually Bottomup More powerful, but hard to implement manually
56 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1)
57 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B
58 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B a C
59 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B a C c
60 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B a C b D c
61 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B a C b D c d
62 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B a C b D c d a c b d
63 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B a C b D C c d a c b d
64 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B A a C b D C c d a c b d
65 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B A a C b D C D c d a c b d
66 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B A B a C b D C D c d a c b d
67 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) S A B A B a C b D C D c d a c b d
68 Top Down Parsers CS2210: Compiler Construction Recursive descent parser Implemented using recursive calls to functions that implement the expansion of each nonterminal Goes through all possible expansions by trialanderror until match with input; backtracks when mismatch detected Simple to implement, but may take exponential time Predictive parser Recursive descent parser with prediction (no backtracking) Predict next rule by looking ahead k number of symbols Restrictions on the grammar to avoid backtracking Only works for a class of grammars called LL(k) Nonrecursive predictive parser Predictive parser with no recursive calls Table driven suitable for automated parser generators
69 Recursive Descent Example E T + E T T int * T int ( E ) input string: start symbol: int * int E initial parse tree is E
70 Recursive Descent Example E T + E T T int * T int ( E ) input string: start symbol: int * int E initial parse tree is E Assume: when there are alternative rules, try right rule first
71 Parsing Sequence (using Backtracking) E
72 Parsing Sequence (using Backtracking) E T pick right most rule E T
73 Parsing Sequence (using Backtracking) E T ( E ) pick right most rule E T pick right most rule T (E)
74 Parsing Sequence (using Backtracking) E T ( E ) pick right most rule E T pick right most rule T (E) ( does not match int
75 Parsing Sequence (using Backtracking) E T ( E ) pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level
76 Parsing Sequence (using Backtracking) E T ( E ) pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level
77 Parsing Sequence (using Backtracking) E T ( E ) int pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int
78 Parsing Sequence (using Backtracking) E T ( E ) int pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level
79 Parsing Sequence (using Backtracking) E T ( E ) int pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level
80 Parsing Sequence (using Backtracking) E T ( E ) int int * T pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level pick next T int * T int * matches input int *
81 Parsing Sequence (using Backtracking) E T ( E ) int int * T int * ( E ) pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level pick next T int * T int * matches input int * pick right most rule T (E)
82 Parsing Sequence (using Backtracking) E T ( E ) int int * T int * ( E ) pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level pick next T int * T int * matches input int * pick right most rule T (E) ( does not match input int failure, backtrack one level
83 Parsing Sequence (using Backtracking) E T ( E ) int int * T int * ( E ) pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level pick next T int * T int * matches input int * pick right most rule T (E) ( does not match input int failure, backtrack one level
84 Parsing Sequence (using Backtracking) E T ( E ) int int * T int * ( E ) int * int pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level pick next T int * T int * matches input int * pick right most rule T (E) ( does not match input int failure, backtrack one level pick next T int
85 Parsing Sequence (using Backtracking) E T ( E ) int int * T int * ( E ) int * int pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level pick next T int * T int * matches input int * pick right most rule T (E) ( does not match input int failure, backtrack one level pick next T int match, accept
86 Recursive Descent Parsing uses Backtracking Approach: for a nonterminal in the derivation, productions are tried in some order until A production is found that generates a portion of the input, or No production is found that generates a portion of the input, in which case backtrack to previous nonterminal Terminals of the derivation are compared against input Match advance input, continue parsing Mismatch backtrack, or fail Parsing fails if no production for the start symbol generates the entire input
87 Implementation CS2210: Compiler Construction Create a procedure for each nonterminal 1. For RHS of each production rule, a. For a terminal, match with input symbol and consume b. For a nonterminal, call procedure for that nonterminal c. If match succeeds for entire RHS, return success d. If match fails, regurgitate input and try next production rule 2. If match succeeds for any rule, apply that rule to LHS
88 Sample Code CS2210: Compiler Construction Sample implementation of parser for previous grammar: E T + E T T int * T int ( E ) int match(int token) { return getnext() == token; } int expr() { R1: if (!term() ) goto R2; if (!match(addnum) ) goto R2; if (!expr() ) goto R2; return 1; R2: /* backtrack R1 */ if (!term() ) goto Fail: return 1; Fail: /* backtrack R2 */ return 0; } int term() { R1: if (!match(intnum) ) goto R2; if (!match(starnum) ) goto R2; if (!term() ) goto R2; return 1; R2: /* backtrack R1 */ if (!match(intnum) ) goto R3: return 1; R3: /* backtrack R2 */ if (!match(leftparennum) ) goto Fail; if (!expr() ) goto Fail; if (!match(rightparennum) ) goto Fail; return 1; Fail: /* backtrack R3 */ return 0; }
89 Left Recursion Problem Recursive descent doesn t work with left recursion Right recursion is okay Why is left recursion a problem? For left recursive grammar A A b c We may repeatedly choose to apply A b A A b A b b... Sentence can grow indefinitely w/o consuming input How do you know when to stop recursion and choose c?
90 Left Recursion Problem Recursive descent doesn t work with left recursion Right recursion is okay Why is left recursion a problem? For left recursive grammar A A b c We may repeatedly choose to apply A b A A b A b b... Sentence can grow indefinitely w/o consuming input How do you know when to stop recursion and choose c? Rewrite the grammar so that it is right recursive Which expresses the same language
91 Remove Left Recursion In general, we can eliminate all immediate left recursion A A x y change to A y A A x A ε Not all left recursion is immediate may be hidden in multiple production rules A BC D B AE F... see Section 4.3 for elimination of general left recursion... (not required for this course)
92 Summary of Recursive Descent Recursive descent is a simple and general parsing strategy Leftrecursion must be eliminated first Can be eliminated automatically as in previous slide L(Recursive_Descent) L(CFG) CFL However it is not popular because of backtracking Backtracking requires reparsing the same string Which is inefficient (can take exponential time) Also undoing semantic actions may be difficult E.g. removing already added nodes in parse tree Techniques used in practice do no backtracking... at the cost of restricting the class of grammar
93 Predictive Parsers CS2210: Compiler Construction A topdown parser with no backtracking: choose the appropriate production given next input terminal If first terminal of every alternative production is unique A a B D b B B B c b c e D d parsing input abced requires no backtracking
94 Predictive Parsers CS2210: Compiler Construction A topdown parser with no backtracking: choose the appropriate production given next input terminal If first terminal of every alternative production is unique A a B D b B B B c b c e D d parsing input abced requires no backtracking Rewriting grammars for predictive parsing Left factor to enable prediction A αβ αγ can be changed to A α A A β γ Eliminate left recursion, just like for recursive descent
95 LL(k) Parsers CS2210: Compiler Construction LL(k) Parser A predictive parser that uses k lookahead tokens L left to right scan L leftmost derivation k k symbols of lookahead LL(k) Grammar A grammar that can parsed using a LL(k) parser LL(k) Language A language that can be expressed as a LL(k) grammar LL(k) languages are a restricted subset of CFLs But many languages are LL(k).. in fact many are LL(1)! Can be implemented in a recursive or nonrecursive fashion
96 Nonrecursive Predictive Parser input tokens $ read head syntax stack TOP parser driver output $ parse table M[N,T] Syntax stack holds unmatched portion of derivation string Parser driver next action based on (current token, stack top) Parse table M[A,b] an entry containing rule A... or error Tabledriven: amenable to automatic code generation (just like lexers)
97 A Sample LL(1) Parse Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε Implementation with 2D parse table First column lists all nonterminals First row lists all possible terminals and $ A table entry contains one production One action for each (nonterminal, input) combination It predicts the correct action based on one lookahead No backtracking required
98 Algorithm for Parsing X symbol at the top of the syntax stack a current input symbol Parsing based on (X,a) If X==a==$, then parser halts with success If X==a!=$, then pop X from stack and advance input head If X!=a, then Case (a): if X T, then parser halts with failed, input rejected Case (b): if X N, M[X,a] = X RHS pop X and push RHS to stack in reverse order
99 then CS2210: Compiler Construction Push RHS in Reverse Order X symbol at the top of the syntax stack a current input symbol if M[X,a] = X B c D X... $ B c D... $
100 Applying LL(1) Parsing to a Grammar Given our old grammar E T + E T T int * T int ( E ) No left recursion But require left factoring After rewriting grammar, we have E T X X + E ε T int Y ( E ) Y * T ε
101 Using the Parse Table To recognize int * int input int * int $ parser driver E $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
102 Using the Parse Table To recognize int * int input int * int $ parser driver E $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
103 Using the Parse Table To recognize int * int input int * int $ parser driver T X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
104 Using the Parse Table To recognize int * int input int * int $ parser driver T X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
105 Using the Parse Table To recognize int * int input int * int $ parser driver int Y X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
106 Using the Parse Table To recognize int * int input int * int $ parser driver Y X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
107 Using the Parse Table To recognize int * int input int * int $ parser driver Y X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
108 Using the Parse Table To recognize int * int input int * int $ parser driver * T X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
109 Using the Parse Table To recognize int * int input int * int $ parser driver T X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
110 Using the Parse Table To recognize int * int input int * int $ parser driver T X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
111 Using the Parse Table To recognize int * int input int * int $ parser driver int Y X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
112 Using the Parse Table To recognize int * int input int * int $ parser driver Y X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
113 Using the Parse Table To recognize int * int input int * int $ parser driver Y X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
114 Using the Parse Table To recognize int * int input int * int $ parser driver X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
115 Using the Parse Table To recognize int * int input int * int $ parser driver X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
116 Using the Parse Table To recognize int * int input int * int $ parser driver $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
117 Using the Parse Table To recognize int * int input int * int $ parser driver Accept!!! $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
118 Recognition Sequence It is possible to write in a action list Stack Input Action E $ int * int $ E TX T X $ int * int $ T int Y int Y X $ int * int $ match Y X $ * int $ Y * T * T X $ * int $ match T X $ int $ T int Y int Y X $ int $ match Y X $ $ Y ε X $ $ X ε $ $ halt and accept
119 Constructing the Parse Table Need to know 2 sets First(A): set of terminals that can appear first in A Follow(A): set of terminals that can appear following A
120 Intuitive Meaning of First and Follow S α A a β c γ c First(A) a Follow(A) Why is the Follow Set important?
121 First(α) CS2210: Compiler Construction First(α) = set of terminals that start string of terminals derived from α. Apply followsing rules until no terminal or ε can be added 1). If t T, then First(t)={t}. For example First(+)={+}. 2). If X N and X ε exists, then add ε to First(X). For example, First(Y) = {*, ε}. 3). If X N and X Y 1 Y 2 Y 3...Y m, then Add (First(Y 1 )  ε) to First(X). If First(Y 1 ),..., First(Y k 1 ) all contain ε, then add ( P 1 i k First(Y i) ε) to First(X). If First(Y 1 ),..., First(Y m) all contain ε, then add ε to First(X).
122 Follow(α) CS2210: Compiler Construction Follow(α) = {t S αtβ} Intuition: if X A B, then First(B) Follow(A) little trickier because B may be ε i.e. B * ε Apply followsing rules until no terminal or ε can be added 1). $ Follow(S), where S is the start symbol. e.g. Follow(E) = {$... }. 2). Look at the occurrence of a nonterminal on the right hand side of a production which is followed by something If A αbβ, then First(β){ε} Follow(B) 3). Look at N on the RHS that is not followed by anything, if (A αb) or (A αbβ and ε First(β)), then Follow(A) Follow(B)
123 Informal Interpretation of First and Follow Sets First set of X (focus on LHS) If X T, then terminal symbol itself If X Y Z, then add First(Y) If X ε, then add ε Follow set of X (focus on RHS) If X is start symbol, add $ If... X Y, then add First(Y) If Y X, then add Follow(Y)
124 For the example CS2210: Compiler Construction E T X X + E ε T int Y ( E ) Y * T ε First set (focus on LHS) E T X X + E X ε T int Y T ( E ) Y * T Y ε Follow set (focus on RHS) $ T ( E ) X + E E T X Y * T T int Y
125 Example CS2210: Compiler Construction Symbol ( ) + int Y X T E E T X X + E ε T int Y ( E ) Y * T ε First ( ) + int *, ε +, ε (, int (, int Symbol E X T Y Follow $,) $,) $,),+ $,),+
126 Construction of LL(1) Parse Table To construct the parse table, we check each A α For each terminal a First(α), add A α to M[A,a]. If ε First(α), then for each terminal b Follow(A), add A α to M[A,b].
127 Example CS2210: Compiler Construction E T X X + E ε T int Y ( E ) Y * T ε Symbol ( ) + int Y X T E First ( ) + int *, ε +, ε (, int (, int Symbol E X T Y Follow $,) $,) $,),+ $,),+ Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
128 Determine if Grammar G is LL(1) Observation If a grammar is LL(1), then each of its LL(1) table entry contains at most one rule. Otherwise, it is not LL(1). Two methods to determine if a grammar is LL(1) or not (1) Construct LL(1) table, and check if there is a multirule entry or (2) Checking each rule as if the table is getting constructed. G is LL(1) iff for a rule A α β First(α) First(β) = φ at most one of α and β can derive ε If β derives ε, then First(α) Follow(A) = φ
129 NonLL(1) Grammars Suppose a grammar is not LL(1). What then? (1) The language may still be LL(1). Try to massage grammar to LL(1) grammar: Apply leftfactoring Apply leftrecursion removal Try to remove ambiguity in grammar: Encode precendence into rules Encode associativity into rules (2) If (1) fails, language may not be LL(1) Impossible to resolve conflict at the grammar level Programmer chooses which rule to use for conflicting entry (if choosing that rule is always semantically correct) Otherwise, use a more powerful parser (e.g. LL(k), LR(1))
130 NonLL(1) Grammar Example Some grammars are not LL(1) even after leftfactoring and leftrecursion removal S if C then S if C then S else S a (other statements) C b change to S if C then S X a X else S ε C b problem sentence: if b then if b then a else a else First(X) First(X)ε Follow(S) X else... ε else Follow(X) Such grammars are potentially ambiguous
131 NonLL(1) Even After Removing Ambiguity For the ifthenelse example, how would you rewrite it? S if C then S if C then S2 else S S2 if C then S2 else S2 a C b Now grammar is unambiguous but it is not LL(k) for any k Must lookahead until matching else to choose rule for S That lookahead may be an arbirary number of tokens with arbitrary nesting of ifthenelse statements Some languages are not LL(1) or even LL(k) No amount of lookahead can tell parser whether this statement is an ifthen or ifthenelse Fortunately in this case, can resolve conflicts in table Always choose X else S over X ε on else token Semantically correct, since else should match closest if
132 LL(1) Time and Space Complexity Linear time and space relative to length of input
133 LL(1) Time and Space Complexity Linear time and space relative to length of input Time each input symbol is consumed within a constant number of steps. Why?
134 LL(1) Time and Space Complexity Linear time and space relative to length of input Time each input symbol is consumed within a constant number of steps. Why? If symbol at top of stack is a terminal: Matched immediately in one step If symbol at top of stack is a nonterminal: Matched in at most N steps, where N = number of rules Since no leftrecursion, cannot apply same rule twice without consuming input
135 LL(1) Time and Space Complexity Linear time and space relative to length of input Time each input symbol is consumed within a constant number of steps. Why? If symbol at top of stack is a terminal: Matched immediately in one step If symbol at top of stack is a nonterminal: Matched in at most N steps, where N = number of rules Since no leftrecursion, cannot apply same rule twice without consuming input Space smaller than input (after removing X ε). Why?
136 LL(1) Time and Space Complexity Linear time and space relative to length of input Time each input symbol is consumed within a constant number of steps. Why? If symbol at top of stack is a terminal: Matched immediately in one step If symbol at top of stack is a nonterminal: Matched in at most N steps, where N = number of rules Since no leftrecursion, cannot apply same rule twice without consuming input Space smaller than input (after removing X ε). Why? RHS is always longer or equal to LHS Derivation string expands monotonically Derivation string is always shorter than final input string Stack is a subset of derivation string (unmatched portion)
137 Nonrecursive Parser Summary First and Follow sets are used to construct predictive parsing tables Intuitively, First and Follow sets guide the choice of rules For nonterminal A and lookahead t, use the production rule A α where t First(α) OR For nonterminal A and lookahead t, use the production rule A α where ε First(α) and t Follow(A) There can only be ONE such rule or grammar is not LL(1)
138 Questions CS2210: Compiler Construction What is LL(0)? Why LL(2)... LL(k) are not widely used?
Parsing. Roadmap. > Contextfree grammars > Derivations and precedence > Topdown parsing > Leftrecursion > Lookahead > Tabledriven parsing
Roadmap > Contextfree grammars > Derivations and precedence > Topdown parsing > Leftrecursion > Lookahead > Tabledriven parsing The role of the parser > performs contextfree syntax analysis > guides
More informationSyntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38
Syntax Analysis Martin Sulzmann Martin Sulzmann Syntax Analysis 1 / 38 Syntax Analysis Objective Recognize individual tokens as sentences of a language (beyond regular languages). Example 1 (OK) Program
More informationDerivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012
Derivations vs Parses Grammar is used to derive string or construct parser Context ree Grammars A derivation is a sequence of applications of rules Starting from the start symbol S......... (sentence)
More informationTop down vs. bottom up parsing
Parsing A grammar describes the strings that are syntactically legal A recogniser simply accepts or rejects strings A generator produces sentences in the language described by the grammar A parser constructs
More informationAbstract Syntax Trees & TopDown Parsing
Review of Parsing Abstract Syntax Trees & TopDown Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s L(G)? A parse tree
More information3. Parsing. Oscar Nierstrasz
3. Parsing Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/ http://www.cs.purdue.edu/homes/hosking/
More informationTopDown Parsing and Intro to BottomUp Parsing. Lecture 7
TopDown Parsing and Intro to BottomUp Parsing Lecture 7 1 Predictive Parsers Like recursivedescent but parser can predict which production to use Predictive parsers are never wrong Always able to guess
More informationAbstract Syntax Trees & TopDown Parsing
Abstract Syntax Trees & TopDown Parsing Review of Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s L(G)? A parse tree
More informationAbstract Syntax Trees & TopDown Parsing
Review of Parsing Abstract Syntax Trees & TopDown Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s L(G)? A parse tree
More informationTypes of parsing. CMSC 430 Lecture 4, Page 1
Types of parsing Topdown parsers start at the root of derivation tree and fill in picks a production and tries to match the input may require backtracking some grammars are backtrackfree (predictive)
More informationSyntactic Analysis. TopDown Parsing
Syntactic Analysis TopDown Parsing Copyright 2017, Pedro C. Diniz, all rights reserved. Students enrolled in Compilers class at University of Southern California (USC) have explicit permission to make
More informationTopDown Parsing and Intro to BottomUp Parsing. Lecture 7
TopDown Parsing and Intro to BottomUp Parsing Lecture 7 1 Predictive Parsers Like recursivedescent but parser can predict which production to use Predictive parsers are never wrong Always able to guess
More informationLL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012
Predictive Parsers LL(k) Parsing Can we avoid backtracking? es, if for a given input symbol and given nonterminal, we can choose the alternative appropriately. his is possible if the first terminal of
More informationCS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)
CS1622 Lecture 9 Parsing (4) CS 1622 Lecture 9 1 Today Example of a recursive descent parser Predictive & LL(1) parsers Building parse tables CS 1622 Lecture 9 2 A Recursive Descent Parser. Preliminaries
More information8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing
8 Parsing Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces strings A parser constructs a parse tree for a string
More informationMonday, September 13, Parsers
Parsers Agenda Terminology LL(1) Parsers Overview of LR Parsing Terminology Grammar G = (Vt, Vn, S, P) Vt is the set of terminals Vn is the set of nonterminals S is the start symbol P is the set of productions
More informationCS 321 Programming Languages and Compilers. VI. Parsing
CS 321 Programming Languages and Compilers VI. Parsing Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = words Programs = sentences For further information,
More informationCS 314 Principles of Programming Languages
CS 314 Principles of Programming Languages Lecture 5: Syntax Analysis (Parsing) Zheng (Eddy) Zhang Rutgers University January 31, 2018 Class Information Homework 1 is being graded now. The sample solution
More informationParsing III. (Topdown parsing: recursive descent & LL(1) )
Parsing III (Topdown parsing: recursive descent & LL(1) ) Roadmap (Where are we?) Previously We set out to study parsing Specifying syntax Contextfree grammars Ambiguity Topdown parsers Algorithm &
More informationCA Compiler Construction
CA4003  Compiler Construction David Sinclair A topdown parser starts with the root of the parse tree, labelled with the goal symbol of the grammar, and repeats the following steps until the fringe of
More informationWednesday, September 9, 15. Parsers
Parsers What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda
More informationParsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:
What is a parser Parsers A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda
More informationParsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones
Parsing III (Topdown parsing: recursive descent & LL(1) ) (Bottomup parsing) CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Copyright 2003, Keith D. Cooper,
More informationCompilers. Yannis Smaragdakis, U. Athens (original slides by Sam
Compilers Parsing Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Next step text chars Lexical analyzer tokens Parser IR Errors Parsing: Organize tokens into sentences Do tokens conform
More informationParsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468
Parsers Xiaokang Qiu Purdue University ECE 468 August 31, 2018 What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure
More informationSYNTAX ANALYSIS 1. Define parser. Hierarchical analysis is one in which the tokens are grouped hierarchically into nested collections with collective meaning. Also termed as Parsing. 2. Mention the basic
More informationWednesday, August 31, Parsers
Parsers How do we combine tokens? Combine tokens ( words in a language) to form programs ( sentences in a language) Not all combinations of tokens are correct programs (not all sentences are grammatically
More informationSyntax Analysis Part I
Syntax Analysis Part I Chapter 4: ContextFree Grammars Slides adapted from : Robert van Engelen, Florida State University Position of a Parser in the Compiler Model Source Program Lexical Analyzer Token,
More information3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino
3. Syntax Analysis Andrea Polini Formal Languages and Compilers Master in Computer Science University of Camerino (Formal Languages and Compilers) 3. Syntax Analysis CS@UNICAM 1 / 54 Syntax Analysis: the
More informationCSCI312 Principles of Programming Languages
Copyright 2006 The McGrawHill Companies, Inc. CSCI312 Principles of Programming Languages! LL Parsing!! Xu Liu Derived from Keith Cooper s COMP 412 at Rice University Recap Copyright 2006 The McGrawHill
More informationCS 406/534 Compiler Construction Parsing Part I
CS 406/534 Compiler Construction Parsing Part I Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy and Dr.
More informationBuilding a Parser III. CS164 3:305:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1
Building a Parser III CS164 3:305:00 TT 10 Evans 1 Overview Finish recursive descent parser when it breaks down and how to fix it eliminating left recursion reordering productions Predictive parsers (aka
More informationConcepts Introduced in Chapter 4
Concepts Introduced in Chapter 4 Grammars ContextFree Grammars Derivations and Parse Trees Ambiguity, Precedence, and Associativity Top Down Parsing Recursive Descent, LL Bottom Up Parsing SLR, LR, LALR
More informationParsing Part II (Topdown parsing, leftrecursion removal)
Parsing Part II (Topdown parsing, leftrecursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit
More informationAdministrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:305:00 TT 10 Evans.
Administrativia Building a Parser III CS164 3:305:00 10 vans WA1 due on hu PA2 in a week Slides on the web site I do my best to have slides ready and posted by the end of the preceding logical day yesterday,
More informationAmbiguity, Precedence, Associativity & TopDown Parsing. Lecture 910
Ambiguity, Precedence, Associativity & TopDown Parsing Lecture 910 (From slides by G. Necula & R. Bodik) 9/18/06 Prof. Hilfinger CS164 Lecture 9 1 Administrivia Please let me know if there are continued
More informationCompilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017
Compilerconstructie najaar 2017 http://www.liacs.leidenuniv.nl/~vlietrvan1/coco/ Rudy van Vliet kamer 140 Snellius, tel. 071527 2876 rvvliet(at)liacs(dot)nl college 3, vrijdag 22 september 2017 + werkcollege
More informationBottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.
Bottom up parsing Construct a parse tree for an input string beginning at leaves and going towards root OR Reduce a string w of input to start symbol of grammar Consider a grammar S aabe A Abc b B d And
More informationBottomUp Parsing. Lecture 1112
BottomUp Parsing Lecture 1112 (From slides by G. Necula & R. Bodik) 9/22/06 Prof. Hilfinger CS164 Lecture 11 1 BottomUp Parsing Bottomup parsing is more general than topdown parsing And just as efficient
More informationPART 3  SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309
PART 3  SYNTAX ANALYSIS F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 64 / 309 Goals Definition of the syntax of a programming language using context free grammars Methods for parsing
More informationTableDriven Parsing
TableDriven Parsing It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls [1] The nonrecursive parser looks up the production
More informationEDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:
EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 20170911 This lecture Regular expressions Contextfree grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)
More informationSyntax Analysis Check syntax and construct abstract syntax tree
Syntax Analysis Check syntax and construct abstract syntax tree if == = ; b 0 a b Error reporting and recovery Model using context free grammars Recognize using Push down automata/table Driven Parsers
More informationNote that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).
LL(k) Grammars We need a bunch of terminology. For any terminal string a we write First k (a) is the prefix of a of length k (or all of a if its length is less than k) For any string g of terminal and
More informationCompilers. Predictive Parsing. Alex Aiken
Compilers Like recursivedescent but parser can predict which production to use By looking at the next fewtokens No backtracking Predictive parsers accept LL(k) grammars L means lefttoright scan of input
More informationCS502: Compilers & Programming Systems
CS502: Compilers & Programming Systems Topdown Parsing Zhiyuan Li Department of Computer Science Purdue University, USA There exist two wellknown schemes to construct deterministic topdown parsers:
More information4. Lexical and Syntax Analysis
4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal
More informationCS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University
CS415 Compilers Syntax Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Limits of Regular Languages Advantages of Regular Expressions
More information4. Lexical and Syntax Analysis
4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing 1 Recall: Architecture of Compilers, Interpreters Source Parser Static Analyzer Intermediate Representation Front End Back
More information4 (c) parsing. Parsing. Top down vs. bo5om up parsing
4 (c) parsing Parsing A grammar describes syntac2cally legal strings in a language A recogniser simply accepts or rejects strings A generator produces strings A parser constructs a parse tree for a string
More informationCompiler Design Concepts. Syntax Analysis
Compiler Design Concepts Syntax Analysis Introduction First task is to break up the text into meaningful words called tokens. newval=oldval+12 id = id + num Token Stream Lexical Analysis Source Code (High
More informationVIVA QUESTIONS WITH ANSWERS
VIVA QUESTIONS WITH ANSWERS 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another languagethe
More informationAmbiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int
Administrivia Ambiguity, Precedence, Associativity & opdown Parsing eam assignments this evening for all those not listed as having one. HW#3 is now available, due next uesday morning (Monday is a holiday).
More informationBottomUp Parsing. Lecture 1112
BottomUp Parsing Lecture 1112 (From slides by G. Necula & R. Bodik) 2/20/08 Prof. Hilfinger CS164 Lecture 11 1 Administrivia Test I during class on 10 March. 2/20/08 Prof. Hilfinger CS164 Lecture 11
More informationContextFree Languages & Grammars (CFLs & CFGs) Reading: Chapter 5
ContextFree Languages & Grammars (CFLs & CFGs) Reading: Chapter 5 1 Not all languages are regular So what happens to the languages which are not regular? Can we still come up with a language recognizer?
More informationCSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D1
CSE P 501 Compilers LR Parsing Hal Perkins Spring 2018 UW CSE P 501 Spring 2018 D1 Agenda LR Parsing Tabledriven Parsers Parser States ShiftReduce and ReduceReduce conflicts UW CSE P 501 Spring 2018
More informationChapter 4. Lexical and Syntax Analysis
Chapter 4 Lexical and Syntax Analysis Chapter 4 Topics Introduction Lexical Analysis The Parsing Problem RecursiveDescent Parsing BottomUp Parsing Copyright 2012 AddisonWesley. All rights reserved.
More informationSyntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6Feb2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.
Syntax Analysis Prof. James L. Frankel Harvard University Version of 6:43 PM 6Feb2018 Copyright 2018, 2015 James L. Frankel. All rights reserved. ContextFree Grammar (CFG) terminals nonterminals start
More informationMore BottomUp Parsing
More BottomUp Parsing Lecture 7 Dr. Sean Peisert ECS 142 Spring 2009 1 Status Project 1 Back By Wednesday (ish) savior lexer in ~cs142/s09/bin Project 2 Due Friday, Apr. 24, 11:55pm My office hours 3pm
More informationOptimizing Finite Automata
Optimizing Finite Automata We can improve the DFA created by MakeDeterministic. Sometimes a DFA will have more states than necessary. For every DFA there is a unique smallest equivalent DFA (fewest states
More informationCSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis
Chapter 4 Lexical and Syntax Analysis Introduction  Language implementation systems must analyze source code, regardless of the specific implementation approach  Nearly all syntax analysis is based on
More informationCS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).
CS 2210 Sample Midterm 1. Determine if each of the following claims is true (T) or false (F). F A language consists of a set of strings, its grammar structure, and a set of operations. (Note: a language
More informationIntroduction to parsers
Syntax Analysis Introduction to parsers Contextfree grammars Pushdown automata Topdown parsing LL grammars and parsers Bottomup parsing LR grammars and parsers Bison/Yacc  parser generators Error
More informationLecture 7: Deterministic BottomUp Parsing
Lecture 7: Deterministic BottomUp Parsing (From slides by G. Necula & R. Bodik) Last modified: Tue Sep 20 12:50:42 2011 CS164: Lecture #7 1 Avoiding nondeterministic choice: LR We ve been looking at general
More informationPart 3. Syntax analysis. Syntax analysis 96
Part 3 Syntax analysis Syntax analysis 96 Outline 1. Introduction 2. Contextfree grammar 3. Topdown parsing 4. Bottomup parsing 5. Conclusion and some practical considerations Syntax analysis 97 Structure
More informationContextfree grammars
Contextfree grammars Section 4.2 Formal way of specifying rules about the structure/syntax of a program terminals  tokens nonterminals  represent higherlevel structures of a program start symbol,
More informationTopic 3: Syntax Analysis I
Topic 3: Syntax Analysis I Compiler Design Prof. Hanjun Kim CoreLab (Compiler Research Lab) POSTECH 1 BackEnd FrontEnd The Front End Source Program Lexical Analysis Syntax Analysis Semantic Analysis
More informationProperties of Regular Expressions and Finite Automata
Properties of Regular Expressions and Finite Automata Some token patterns can t be defined as regular expressions or finite automata. Consider the set of balanced brackets of the form [[[ ]]]. This set
More informationExtra Credit Question
TopDown Parsing #1 Extra Credit Question Given this grammar G: E E+T E T T T * int T int T (E) Is the string int * (int + int) in L(G)? Give a derivation or prove that it is not. #2 Revenge of Theory
More informationExample CFG. Lectures 16 & 17 BottomUp Parsing. LL(1) Predictor Table Review. Stacks in LR Parsing 1. Sʹ " S. 2. S " AyB. 3. A " ab. 4.
Example CFG Lectures 16 & 17 BottomUp Parsing CS 241: Foundations of Sequential Programs Fall 2016 1. Sʹ " S 2. S " AyB 3. A " ab 4. A " cd Matt Crane University of Waterloo 5. B " z 6. B " wz 2 LL(1)
More informationPESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru 100 Department of Computer Science and Engineering
TEST 1 Date : 24 02 2015 Marks : 50 Subject & Code : Compiler Design ( 10CS63) Class : VI CSE A & B Name of faculty : Mrs. Shanthala P.T/ Mrs. Swati Gambhire Time : 8:30 10:00 AM SOLUTION MANUAL 1. a.
More informationThe analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
COMPILER DESIGN 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another languagethe target
More informationCompiler Construction: Parsing
Compiler Construction: Parsing Mandar Mitra Indian Statistical Institute M. Mitra (ISI) Parsing 1 / 33 Contextfree grammars. Reference: Section 4.2 Formal way of specifying rules about the structure/syntax
More informationChapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.
Topics Chapter 4 Lexical and Syntax Analysis Introduction Lexical Analysis Syntax Analysis Recursive Descent Parsing BottomUp parsing 2 Language Implementation Compilation There are three possible approaches
More informationIntroduction to Parsing. Comp 412
COMP 412 FALL 2010 Introduction to Parsing Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make
More informationSyntax Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay
Syntax Analysis (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay September 2007 College of Engineering, Pune Syntax Analysis: 2/124 Syntax
More informationDEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
KATHMANDU UNIVERSITY SCHOOL OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING REPORT ON NONRECURSIVE PREDICTIVE PARSER Fourth Year First Semester Compiler Design Project Final Report submitted
More informationLecture 8: Deterministic BottomUp Parsing
Lecture 8: Deterministic BottomUp Parsing (From slides by G. Necula & R. Bodik) Last modified: Fri Feb 12 13:02:57 2010 CS164: Lecture #8 1 Avoiding nondeterministic choice: LR We ve been looking at general
More informationHomework & Announcements
Homework & nnouncements New schedule on line. Reading: Chapter 18 Homework: Exercises at end Due: 11/1 Copyright c 2002 2017 UMaine School of Computing and Information S 1 / 25 COS 140: Foundations of
More informationSyntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill
Syntax Analysis Björn B. Brandenburg The University of North Carolina at Chapel Hill Based on slides and notes by S. Olivier, A. Block, N. Fisher, F. HernandezCampos, and D. Stotts. The Big Picture Character
More informationSection A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.
Section A 1. What do you meant by parser and its types? A parser for grammar G is a program that takes as input a string w and produces as output either a parse tree for w, if w is a sentence of G, or
More informationParsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.
Parsing Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students
More informationCOMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar
COMP Lecture Parsing September, 00 Prelude What is the common name of the fruit Synsepalum dulcificum? Miracle fruit from West Africa What is special about miracle fruit? Contains a protein called miraculin
More informationLexical and Syntax Analysis. TopDown Parsing
Lexical and Syntax Analysis TopDown Parsing Easy for humans to write and understand String of characters Lexemes identified String of tokens Easy for programs to transform Data structure Syntax A syntax
More informationSyntax Analysis/Parsing. Contextfree grammars (CFG s) Contextfree grammars vs. Regular Expressions. BNF description of PL/0 syntax
Susan Eggers 1 CSE 401 Syntax Analysis/Parsing Contextfree grammars (CFG s) Purpose: determine if tokens have the right form for the language (right syntactic structure) stream of tokens abstract syntax
More informationCompilation Lecture 3: Syntax Analysis: TopDown parsing. Noam Rinetzky
Compilation 03683133 Lecture 3: Syntax Analysis: TopDown parsing Noam Rinetzky 1 Recursive descent parsing Define a function for every nonterminal Every function work as follows Find applicable production
More informationCompiler Design 1. TopDown Parsing. Goutam Biswas. Lect 5
Compiler Design 1 TopDown Parsing Compiler Design 2 Nonterminal as a Function In a topdown parser a nonterminal may be viewed as a generator of a substring of the input. We may view a nonterminal
More informationReview of CFGs and Parsing II Bottomup Parsers. Lecture 5. Review slides 1
Review of CFGs and Parsing II Bottomup Parsers Lecture 5 1 Outline Parser Overview opdown Parsers (Covered largely through labs) Bottomup Parsers 2 he Functionality of the Parser Input: sequence of
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Parsing CMSC 330  Spring 2017 1 Recall: Front End Scanner and Parser Front End Token Source Scanner Parser Stream AST Scanner / lexer / tokenizer converts
More informationParsing Part II. (Ambiguity, Topdown parsing, Leftrecursion Removal)
Parsing Part II (Ambiguity, Topdown parsing, Leftrecursion Removal) Ambiguous Grammars Definitions If a grammar has more than one leftmost derivation for a single sentential form, the grammar is ambiguous
More informationPrelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science
Prelude COMP Lecture Topdown Parsing September, 00 What is the Tufts mascot? Jumbo the elephant Why? P. T. Barnum was an original trustee of Tufts : donated $0,000 for a natural museum on campus Barnum
More informationFall Compiler Principles Lecture 3: Parsing part 2. Roman Manevich BenGurion University
Fall 20142015 Compiler Principles Lecture 3: Parsing part 2 Roman Manevich BenGurion University Tentative syllabus Front End Intermediate Representation Optimizations Code Generation Scanning Lowering
More informationSyntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011
Syntax Analysis COMP 524: Programming Languages Srinivas Krishnan January 25, 2011 Based in part on slides and notes by Bjoern Brandenburg, S. Olivier and A. Block. 1 The Big Picture Character Stream Token
More informationCOP 3402 Systems Software Syntax Analysis (Parser)
COP 3402 Systems Software Syntax Analysis (Parser) Syntax Analysis 1 Outline 1. Definition of Parsing 2. Context Free Grammars 3. Ambiguous/Unambiguous Grammars Syntax Analysis 2 Lexical and Syntax Analysis
More informationCSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall
CSE443 Compilers Dr. Carl Alphonce alphonce@buffalo.edu 343 Davis Hall Phases of a compiler Syntactic structure Figure 1.6, page 5 of text Recap Lexical analysis: LEX/FLEX (regex > lexer) Syntactic analysis:
More informationLexical and Syntax Analysis
Lexical and Syntax Analysis (of Programming Languages) TopDown Parsing Lexical and Syntax Analysis (of Programming Languages) TopDown Parsing Easy for humans to write and understand String of characters
More informationParser Generation. BottomUp Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottomup  from leaves to the root
Parser Generation Main Problem: given a grammar G, how to build a topdown parser or a bottomup parser for it? parser : a program that, given a sentence, reconstructs a derivation for that sentence 
More informationLanguages and Compilers
Principles of Software Engineering and Operational Systems Languages and Compilers SDAGE: Level I 201213 3. Formal Languages, Grammars and Automata Dr Valery Adzhiev vadzhiev@bournemouth.ac.uk Office:
More informationChapter 3. Parsing #1
Chapter 3 Parsing #1 Parser source file get next character scanner get token parser AST token A parser recognizes sequences of tokens according to some grammar and generates Abstract Syntax Trees (ASTs)
More information