CS2210: Compiler Construction Syntax Analysis Syntax Analysis
|
|
- Cameron Johnston
- 6 years ago
- Views:
Transcription
1
2 Comparison with Lexical Analysis The second phase of compilation Phase Input Output Lexer string of characters string of tokens Parser string of tokens Parse tree/ast
3 What Parse Tree? CS2210: Compiler Construction A parse tree represents the program structure of the input Programming language constructs usually have recursive structures If-stmt if (EXPR) then Stmt else Stmt fi Stmt If-stmt While-stmt...
4 A Parse Tree Example Code to be compiled:... if x==y then... else... fi Lexer: Parser: Input: sequence of tokens... IF ID==ID THEN... ELSE... FI Desired output: IF-STMT == STMT STMT ID ID
5 How to Specify a Syntax How can we specify a syntax with nested structures? Is it possible to use RE/FA? RE(Regular Expression) FA(Finite Automata)
6 How to Specify a Syntax How can we specify a syntax with nested structures? Is it possible to use RE/FA? RE(Regular Expression) FA(Finite Automata) RE/FA is not powerful enough Example: matching parenthesis: # of ( equals # of ) (x+y)*z ((x+y)+y)*z... (...(((x+y)+y)+y)...) ((x+y)+y)+y)*z
7 RE/FA is Not Powerful Enough Pumping Lemma for Regular Languages In FA, all sufficiently long words may be pumped that is, have a substring repeated an arbitrary number of times If an FA is to accept a string longer than the number of states, there must be a set of states that are repeated (q s... q t in above picture) Let y be the substring that is repeated. Then all strings xy i z (i >= 0) are also accepted and is part of language
8 RE/FA is Not Powerful Enough Matching parentheses is not a Regular Language Suppose it is RL and there is an FA for this language Now construct a string with matching ()s long enough such that there are more (s than the number of states Then, there would be a pumping y filled entirely with ( Then, any string xy i z will also be accepted by FA but not be matching Contradiction QED Similarly, any language with nested structures is not an RL
9 What Language is C Syntax? An example of a Context Free Language (CFL) A broader category of languages that includes languages with nested structures Before discussing CFL, we need to learn a more general way of specifying languages than RE, called Grammars Can specify both RL and CFL and more...
10 Grammar General Language Specification Recall language definition Language set of strings over alphabet Alphabet: finite set of symbols Sentences: strings in the language A grammar is a general way to specify a language E.g. The English grammar specifies the English language
11 An Example CS2210: Compiler Construction Language L = { any string with 00 at the end } 1 A 0 B 0 C 0 Grammar G = { T, N, s, δ } where T = { 0, 1 }, N = { A, B }, s = A, and grammar rule set δ = { A 0A 1A 0B, B 0 } Derivation: from grammar to language A 0A 00B 000 A 1A 10B 100 A 0A 00A 000B 0000 A 0A 01A...
12 Grammar CS2210: Compiler Construction A grammar consists of 4 components (T, N, s, δ) T set of terminal symbols Essentially tokens leaves in the parse tree N set of non-terminal symbols Internal nodes in the parse tree One or more symbols grouped into higher construct example: declaration, statement, loop,... s the non-terminal start symbol where derivation starts δ a set of production rules LHS RHS : left-hand-side produces right-hand-side
13 Production Rule and Derivation Production Rule: LHS RHS Meaning: LHS can be replaced with RHS A derivation a series of applications of production rules Derivation: β α Meaning: String α is derived from β β α 1 step β α 0 or more steps β = α 0 or more steps example: A 0A 00B 000 A = 000 A = + 000
14 Noam Chomsky Grammars Classification of languages based on form of grammar rules Four(4) types of grammars: Type 0 recursive grammar Type 1 context sensitive grammar Type 2 context free grammar Type 3 regular grammar
15 Type 0: Unrestricted Grammar Type 0 grammar unrestricted or recursively enumerable Form of rules α β where α (N T ) +, β (N T ) Implied restrictions: LHS: no ε allowed Example: ab acd ; LHS is shorter than RHS aab ab ; LHS is longer than RHS A ε ; ε-productions are allowed Computational complexity: unbounded Derivation strings may contract and expand repeatedly (Since LHS may be longer or shorter than RHS) Unbounded number of productions before target string L(Unrestricted) L(Turing Machine)
16 Type 1: Context Sensitive Grammar Type 1 grammar context sensitive grammar Form of rules αaβ αγβ where A N, α, β (N T ), γ (N T ) + Replace A by γ only if found in the context of α and β Implied restrictions: LHS: shorter or equal to RHS RHS: no ε allowed Example: aab acb ; Replace A with C when in between a and B A C ; Replace A with C regardless of context Computational complexity: likely NP-Complete Derivation strings may only expand Bounded number of derivations before target string L(CSG) L(Linear Bounded Turing Machine)
17 Type 2: Context Free Grammar Type 2 grammar context free grammar Form of rules A γ where A N, γ (N T ) + Replace A by γ (no context can be specified) Implied restrictions: LHS: a single non-terminal RHS: no ε allowed Sometimes relaxed to simplify grammar but rules can always be rewritten to exclude ε-productions Example: A abc ; Replace A with abc regardless of context Computational complexity: Polynomial O(n ), but most real world CFGs are O(n) L(CFG) L(Pushdown Automaton)
18 Type 3: Regular Grammar Type 3 grammar regular grammar Form of rules A α, or A αb where A, B N, α T In terms of FA: Move from state A to state B on input α Implied restrictions: LHS: a single non-terminal RHS: a terminal or a terminal followed by a non-terminal Example: A 1A 0 Computational complexity: Linear O(n) Derivation string length increases by 1 at each step L(RG) L(Finite Automaton) L(Regular Expression)
19 Differentiating Type 2 (RG) and 3 (CFG) Language L1 = { [ i ] j i, j >= 1} Regular grammar S [ S [ T T ] T ] Language L2 = { [ i ] i i >= 1} Context free grammar S [ S ] [ ]
20 Differentiating Type 1 (CSG) and 2 (CFG) Type 2 grammar (context free) S D U D int x; int y; U x=1; y=1; Type 1 grammar (context sensitive) S D U D int x; int y; int x; U int x; x=1; int y; U int y; y=1;
21 Differentiating Type 1 (CSG) and 2 (CFG) Language from type 2 (CFG): S DU int x; U int x; x=1; S DU int x; U int x; y=1; S DU int y; U int y; x=1; S DU int y; U int y; y=1; Language from type 1 (CSG): S DU int x; U int x; x=1; S DU int y; U int y; y=1;
22 Differentiating Type 1 (CSG) and 2 (CFG) Language from type 2 (CFG): S DU int x; U int x; x=1; S DU int x; U int x; y=1; S DU int y; U int y; x=1; S DU int y; U int y; y=1; Language from type 1 (CSG): S DU int x; U int x; x=1; S DU int y; U int y; y=1; If PLs are context-sensitive, why use CFGs for parsing? CSG parsers are provably inefficient (PSPACE-Complete) Most PL constructs are context-free: If-stmt, declarations The remaining context-sensitive constructs can be analyzed at the semantic analysis stage (e.g. def-before-use, matching formal/actual parameters)
23 Language Classification Regular Grammar CFG CSG Recursive Grammar Type 0 Type 1 Type 2 Type 3
24 Language Classification Regular Grammar CFG CSG Recursive Grammar Type 0 Type 1 Type 2 Type 3 However, L y L x where L x :[ i ] k RG, L y :[ i ] i CFG Is it a problem?
25 Context Free Grammars
26 Grammar and Grammar is used to derive string or construct parser A derivation is a sequence of applications of rules Starting from the start symbol S (sentence) Leftmost and Rightmost derivations At each derivation step, leftmost derivation always replaces the leftmost non-terminal symbol Rightmost derivation always replaces the rightmost one
27 Examples CS2210: Compiler Construction E E * E E + E ( E ) id leftmost derivation E E + E E * E + E id * E + E id * id + E... id * id + id * id rightmost derivation E E + E E + E * E E + E * id E + id * id... id * id + id * id
28 Parse Trees CS2210: Compiler Construction Parse tree structure Describes program structure (defined by the rules applied) Internal nodes are non-terminals, leaves are terminals Structure depends on what production rules were applied Grammars should specify precisely what rules to apply to avoid ambiguity in program structure Structure does not depend on order of application Same tree for previous rightmost/leftmost derivations Order of rule application is implementation dependent
29 Parse Trees CS2210: Compiler Construction Parse tree structure Describes program structure (defined by the rules applied) Internal nodes are non-terminals, leaves are terminals Structure depends on what production rules were applied Grammars should specify precisely what rules to apply to avoid ambiguity in program structure Structure does not depend on order of application Same tree for previous rightmost/leftmost derivations Order of rule application is implementation dependent E E + E E * E E * E id id id id
30 Different Parse Trees Consider the string id * id + id * id can result in 3 different trees (and more) for this grammar E E + E E E * E E E * E E * E E * E id E + E id E * E id id id id id E * E E + E id id id id id
31 Ambiguity CS2210: Compiler Construction A grammar G is ambiguous if there exist a string str L(G) such that more than one parse tree derives str In practice, we prefer unambiguous grammars Fortunately, ambiguity is (often) the property of a grammar and not the language It is (often) possible to rewrite grammar to remove ambiguity Programming languages are designed not to be ambiguous it s the job of compiler writers to express those rules accurately in the grammar
32 How to Remove Ambiguity Step I: Specify precedence Build precedence into grammar by recursion on a different non-terminal for each precedence level. Insight: Lower precedence tend to be higher in tree (close to root) Higher precedence tend to be lower in tree (far from root) Place lower precedence non-terminals higher up in the tree E E + E E - E E * E E / E E E ( E ) id Rewrite it to: E E + T E - T T ; Lowest precedence: +, - T T * F T / F F ; Middle precedence: *, / F B F B ; Highest precedence: B id ( E ) ; Even higher precedence: ()
33 How to Remove Ambiguity Step II: Specify associativity Allow recursion only on either left or right non-terminal Left associative recursion on left non-terminal Right associative recursion on right non-terminal In the previous example, E E + E... ; allows both left/right associativity Rewrite it to: E E + T... F B F... ; + is left associative ; exponent is right associative
34 Properties of Context Free Grammars Decidable: computable using a Turing Machine In other words, guaranteed an answer in finite time E.g. if a string is in a context free language: decidable In fact we can run a CFL parser in polynomial time If a CFG is ambiguous: Undecidable In general, impossible to tell if a CFG is ambiguous Can only check whether it is ambiguous for a given string In practice, parser generators like Yacc check for a more restricted grammar (e.g. LALR(1)) instead LALR(1) is a subset of unambiguous grammars Whether grammar is LALR(1) is easily decidable If two CFGs generate same language: Undecidable Impossible to tell if language changed by tweaking grammar Parsers are regression tested against a test set frequently
35 From Grammar to Parser What exactly is parsing, or syntax analysis? To process an input string for a given grammar, and compose the derivation if the string is in the language Two subtasks to determine if string is in the language or not to construct a parse tree (a representation of the derivation)
36 From Grammar to Parser What exactly is parsing, or syntax analysis? To process an input string for a given grammar, and compose the derivation if the string is in the language Two subtasks to determine if string is in the language or not to construct a parse tree (a representation of the derivation) How would you construct a parser from a grammar?
37 Types of Parsers CS2210: Compiler Construction Universal parser Can parse any CFG e.g. Early s algorithm Powerful but can be very inefficient (O(N 3 ) where N is length of string) Top-down parser Starts from root and expands into leaves Tries to expand start symbol to input string Finds leftmost derivation Only works for a certain class of CFGs called LL But provably works in O(n) time Parser code structure closely mimics grammar Amenable to implementation by hand Automated tools exist to convert to code (e.g. ANTLR)
38 Types of Parsers (cont.) Bottom-up parser Starts at leaves and builds up to root Tries to reduce the input string to the start symbol Finds reverse order of the rightmost derivation Works for a wider class of CFGs called LR Also provably works in O(n) time Parser code structure nothing like grammar Very difficult to implement by hand Automated tools exist to convert to code (e.g. Yacc, Bison)
39 What Output do We Want? The output of parsing is parse tree, or abstract syntax tree An abstract syntax tree is abbreviated representation of a parse tree drops some details without compromising meaning some terminal symbols that no longer contribute to semantics are dropped (e.g. parentheses) internal nodes may contain terminal symbols
40 An Example CS2210: Compiler Construction Consider the grammar E int ( E ) E + E and an input 5 + ( ) After lexical analysis, we have a sequence of tokens INT 5 + ( INT 2 + INT 3 )
41 Parse Tree of the Input E E + E INT 5 ( E ) E + E INT 2 INT A parse tree 3 Traces the process of derivation Captures all the rules that were applied but contains redundant information parentheses single-successor nodes
42 Abstract Syntax Tree An Abstract Syntax Tree (AST) for the input PLUS PLUS AST captures all that is semantically relevant AST but is more compact in representation AST abstracts from parse tree (a.k.a. concrete syntax tree) ASTs are used in most compilers rather than parse trees
43 How are ASTs constructed in a compiler? Through implementation of semantic actions Semantic actions: actions triggered on a rule match Already used in lexical analysis to return token tuples For syntactic analysis, involves computing semantic value of whole construct from values of its parts To construct AST, attach an attribute for each symbol X Attributes contain the semantic value for each symbol X.ast the constructed AST for symbol X Then attach semantic actions to production rules, e.g. X Y 1 Y 2...Y n { actions } actions define X.ast using Y i.ast (1 i n)
44 Example CS2210: Compiler Construction For the previous example, we have E int { E.ast = mkleaf(int.lval) } E1 + E2 { E.ast = mkplus(e1.ast, E2.ast) } (E1) { E.ast = E1.ast } Two functions that need to be defined ptr1=mkleaf(n) create a leaf node and assign value n ptr2=mkplus(t1, t2) create a tree node and assign value PLUS, then attach two child trees t1 and t2 ptr1 ptr2 n PLUS t1 t2
45 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottom-up LR(1) parser (Order can change depending on parser implementation)
46 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottom-up LR(1) parser (Order can change depending on parser implementation) E1.ast=mkleaf(5) 5
47 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottom-up LR(1) parser (Order can change depending on parser implementation) E1.ast=mkleaf(5) E2.ast=mkleaf(2) 5 2
48 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottom-up LR(1) parser (Order can change depending on parser implementation) E1.ast=mkleaf(5) E2.ast=mkleaf(2) E3.ast=mkleaf(3) 5 2 3
49 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottom-up LR(1) parser (Order can change depending on parser implementation) E4.ast=mkplus(E2.ast, E3.ast) PLUS E1.ast=mkleaf(5) E2.ast=mkleaf(2) E3.ast=mkleaf(3) 5 2 3
50 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottom-up LR(1) parser (Order can change depending on parser implementation) E4.ast=mkplus(E2.ast, E3.ast) PLUS E1.ast=mkleaf(5) 5 2 3
51 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottom-up LR(1) parser (Order can change depending on parser implementation) E5.ast=mkplus(E1.ast, E4.ast) PLUS E4.ast=mkplus(E2.ast, E3.ast) PLUS E1.ast=mkleaf(5) 5 2 3
52 AST Construction Steps For input INT 5 + ( INT 2 + INT 3 ) Construction order given is for a bottom-up LR(1) parser (Order can change depending on parser implementation) E5.ast=mkplus(E1.ast, E4.ast) PLUS PLUS 5 2 3
53 Summary CS2210: Compiler Construction Compilers specify program structure using CFG Most programming languages are not context free Context sensitive analysis can easily separate out to semantic analysis phase A parser uses CFG to... answer if an input str L(G)... and build a parse tree... or build an AST instead... and pass it to the rest of compiler
54 Parsing
55 Parsing CS2210: Compiler Construction We will study two approaches Top-down Easier to understand and implement manually Bottom-up More powerful, but hard to implement manually
56 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1)
57 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B
58 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B a C
59 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B a C c
60 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B a C b D c
61 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B a C b D c d
62 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B a C b D c d a c b d
63 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B a C b D C c d a c b d
64 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B A a C b D C c d a c b d
65 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B A a C b D C D c d a c b d
66 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) A B A B a C b D C D c d a c b d
67 Example CS2210: Compiler Construction Consider a CFG grammar G S A B A a C B b D D d C c Actually, this language has only one sentence, i.e. L(G) = { acbd } Leftmost Derivation: S AB (1) acb (2) acb (3) acbd (4) acbd (5) S Rightmost Derivation: S AB (5) AbD (4) Abd (3) acbd (2) acbd (1) S A B A B a C b D C D c d a c b d
68 Top Down Parsers CS2210: Compiler Construction Recursive descent parser Implemented using recursive calls to functions that implement the expansion of each non-terminal Goes through all possible expansions by trial-and-error until match with input; backtracks when mismatch detected Simple to implement, but may take exponential time Predictive parser Recursive descent parser with prediction (no backtracking) Predict next rule by looking ahead k number of symbols Restrictions on the grammar to avoid backtracking Only works for a class of grammars called LL(k) Nonrecursive predictive parser Predictive parser with no recursive calls Table driven suitable for automated parser generators
69 Recursive Descent Example E T + E T T int * T int ( E ) input string: start symbol: int * int E initial parse tree is E
70 Recursive Descent Example E T + E T T int * T int ( E ) input string: start symbol: int * int E initial parse tree is E Assume: when there are alternative rules, try right rule first
71 Parsing Sequence (using Backtracking) E
72 Parsing Sequence (using Backtracking) E T pick right most rule E T
73 Parsing Sequence (using Backtracking) E T ( E ) pick right most rule E T pick right most rule T (E)
74 Parsing Sequence (using Backtracking) E T ( E ) pick right most rule E T pick right most rule T (E) ( does not match int
75 Parsing Sequence (using Backtracking) E T ( E ) pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level
76 Parsing Sequence (using Backtracking) E T ( E ) pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level
77 Parsing Sequence (using Backtracking) E T ( E ) int pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int
78 Parsing Sequence (using Backtracking) E T ( E ) int pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level
79 Parsing Sequence (using Backtracking) E T ( E ) int pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level
80 Parsing Sequence (using Backtracking) E T ( E ) int int * T pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level pick next T int * T int * matches input int *
81 Parsing Sequence (using Backtracking) E T ( E ) int int * T int * ( E ) pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level pick next T int * T int * matches input int * pick right most rule T (E)
82 Parsing Sequence (using Backtracking) E T ( E ) int int * T int * ( E ) pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level pick next T int * T int * matches input int * pick right most rule T (E) ( does not match input int failure, backtrack one level
83 Parsing Sequence (using Backtracking) E T ( E ) int int * T int * ( E ) pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level pick next T int * T int * matches input int * pick right most rule T (E) ( does not match input int failure, backtrack one level
84 Parsing Sequence (using Backtracking) E T ( E ) int int * T int * ( E ) int * int pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level pick next T int * T int * matches input int * pick right most rule T (E) ( does not match input int failure, backtrack one level pick next T int
85 Parsing Sequence (using Backtracking) E T ( E ) int int * T int * ( E ) int * int pick right most rule E T pick right most rule T (E) ( does not match int failure, backtrack one level pick next T int int matches input int however, more tokens remain failure, backtrack one level pick next T int * T int * matches input int * pick right most rule T (E) ( does not match input int failure, backtrack one level pick next T int match, accept
86 Recursive Descent Parsing uses Backtracking Approach: for a non-terminal in the derivation, productions are tried in some order until A production is found that generates a portion of the input, or No production is found that generates a portion of the input, in which case backtrack to previous non-terminal Terminals of the derivation are compared against input Match advance input, continue parsing Mismatch backtrack, or fail Parsing fails if no production for the start symbol generates the entire input
87 Implementation CS2210: Compiler Construction Create a procedure for each non-terminal 1. For RHS of each production rule, a. For a terminal, match with input symbol and consume b. For a non-terminal, call procedure for that non-terminal c. If match succeeds for entire RHS, return success d. If match fails, regurgitate input and try next production rule 2. If match succeeds for any rule, apply that rule to LHS
88 Sample Code CS2210: Compiler Construction Sample implementation of parser for previous grammar: E T + E T T int * T int ( E ) int match(int token) { return getnext() == token; } int expr() { R1: if (!term() ) goto R2; if (!match(addnum) ) goto R2; if (!expr() ) goto R2; return 1; R2: /* backtrack R1 */ if (!term() ) goto Fail: return 1; Fail: /* backtrack R2 */ return 0; } int term() { R1: if (!match(intnum) ) goto R2; if (!match(starnum) ) goto R2; if (!term() ) goto R2; return 1; R2: /* backtrack R1 */ if (!match(intnum) ) goto R3: return 1; R3: /* backtrack R2 */ if (!match(leftparennum) ) goto Fail; if (!expr() ) goto Fail; if (!match(rightparennum) ) goto Fail; return 1; Fail: /* backtrack R3 */ return 0; }
89 Left Recursion Problem Recursive descent doesn t work with left recursion Right recursion is okay Why is left recursion a problem? For left recursive grammar A A b c We may repeatedly choose to apply A b A A b A b b... Sentence can grow indefinitely w/o consuming input How do you know when to stop recursion and choose c?
90 Left Recursion Problem Recursive descent doesn t work with left recursion Right recursion is okay Why is left recursion a problem? For left recursive grammar A A b c We may repeatedly choose to apply A b A A b A b b... Sentence can grow indefinitely w/o consuming input How do you know when to stop recursion and choose c? Rewrite the grammar so that it is right recursive Which expresses the same language
91 Remove Left Recursion In general, we can eliminate all immediate left recursion A A x y change to A y A A x A ε Not all left recursion is immediate may be hidden in multiple production rules A BC D B AE F... see Section 4.3 for elimination of general left recursion... (not required for this course)
92 Summary of Recursive Descent Recursive descent is a simple and general parsing strategy Left-recursion must be eliminated first Can be eliminated automatically as in previous slide L(Recursive_Descent) L(CFG) CFL However it is not popular because of backtracking Backtracking requires re-parsing the same string Which is inefficient (can take exponential time) Also undoing semantic actions may be difficult E.g. removing already added nodes in parse tree Techniques used in practice do no backtracking... at the cost of restricting the class of grammar
93 Predictive Parsers CS2210: Compiler Construction A top-down parser with no backtracking: choose the appropriate production given next input terminal If first terminal of every alternative production is unique A a B D b B B B c b c e D d parsing input abced requires no backtracking
94 Predictive Parsers CS2210: Compiler Construction A top-down parser with no backtracking: choose the appropriate production given next input terminal If first terminal of every alternative production is unique A a B D b B B B c b c e D d parsing input abced requires no backtracking Rewriting grammars for predictive parsing Left factor to enable prediction A αβ αγ can be changed to A α A A β γ Eliminate left recursion, just like for recursive descent
95 LL(k) Parsers CS2210: Compiler Construction LL(k) Parser A predictive parser that uses k lookahead tokens L left to right scan L leftmost derivation k k symbols of lookahead LL(k) Grammar A grammar that can parsed using a LL(k) parser LL(k) Language A language that can be expressed as a LL(k) grammar LL(k) languages are a restricted subset of CFLs But many languages are LL(k).. in fact many are LL(1)! Can be implemented in a recursive or nonrecursive fashion
96 Nonrecursive Predictive Parser input tokens $ read head syntax stack TOP parser driver output $ parse table M[N,T] Syntax stack holds unmatched portion of derivation string Parser driver next action based on (current token, stack top) Parse table M[A,b] an entry containing rule A... or error Table-driven: amenable to automatic code generation (just like lexers)
97 A Sample LL(1) Parse Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε Implementation with 2D parse table First column lists all non-terminals First row lists all possible terminals and $ A table entry contains one production One action for each (non-terminal, input) combination It predicts the correct action based on one lookahead No backtracking required
98 Algorithm for Parsing X symbol at the top of the syntax stack a current input symbol Parsing based on (X,a) If X==a==$, then parser halts with success If X==a!=$, then pop X from stack and advance input head If X!=a, then Case (a): if X T, then parser halts with failed, input rejected Case (b): if X N, M[X,a] = X RHS pop X and push RHS to stack in reverse order
99 then CS2210: Compiler Construction Push RHS in Reverse Order X symbol at the top of the syntax stack a current input symbol if M[X,a] = X B c D X... $ B c D... $
100 Applying LL(1) Parsing to a Grammar Given our old grammar E T + E T T int * T int ( E ) No left recursion But require left factoring After rewriting grammar, we have E T X X + E ε T int Y ( E ) Y * T ε
101 Using the Parse Table To recognize int * int input int * int $ parser driver E $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
102 Using the Parse Table To recognize int * int input int * int $ parser driver E $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
103 Using the Parse Table To recognize int * int input int * int $ parser driver T X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
104 Using the Parse Table To recognize int * int input int * int $ parser driver T X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
105 Using the Parse Table To recognize int * int input int * int $ parser driver int Y X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
106 Using the Parse Table To recognize int * int input int * int $ parser driver Y X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
107 Using the Parse Table To recognize int * int input int * int $ parser driver Y X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
108 Using the Parse Table To recognize int * int input int * int $ parser driver * T X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
109 Using the Parse Table To recognize int * int input int * int $ parser driver T X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
110 Using the Parse Table To recognize int * int input int * int $ parser driver T X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
111 Using the Parse Table To recognize int * int input int * int $ parser driver int Y X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
112 Using the Parse Table To recognize int * int input int * int $ parser driver Y X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
113 Using the Parse Table To recognize int * int input int * int $ parser driver Y X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
114 Using the Parse Table To recognize int * int input int * int $ parser driver X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
115 Using the Parse Table To recognize int * int input int * int $ parser driver X $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
116 Using the Parse Table To recognize int * int input int * int $ parser driver $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
117 Using the Parse Table To recognize int * int input int * int $ parser driver Accept!!! $ Stack Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
118 Recognition Sequence It is possible to write in a action list Stack Input Action E $ int * int $ E TX T X $ int * int $ T int Y int Y X $ int * int $ match Y X $ * int $ Y * T * T X $ * int $ match T X $ int $ T int Y int Y X $ int $ match Y X $ $ Y ε X $ $ X ε $ $ halt and accept
119 Constructing the Parse Table Need to know 2 sets First(A): set of terminals that can appear first in A Follow(A): set of terminals that can appear following A
120 Intuitive Meaning of First and Follow S α A a β c γ c First(A) a Follow(A) Why is the Follow Set important?
121 First(α) CS2210: Compiler Construction First(α) = set of terminals that start string of terminals derived from α. Apply followsing rules until no terminal or ε can be added 1). If t T, then First(t)={t}. For example First(+)={+}. 2). If X N and X ε exists, then add ε to First(X). For example, First(Y) = {*, ε}. 3). If X N and X Y 1 Y 2 Y 3...Y m, then Add (First(Y 1 ) - ε) to First(X). If First(Y 1 ),..., First(Y k 1 ) all contain ε, then add ( P 1 i k First(Y i) ε) to First(X). If First(Y 1 ),..., First(Y m) all contain ε, then add ε to First(X).
122 Follow(α) CS2210: Compiler Construction Follow(α) = {t S αtβ} Intuition: if X A B, then First(B) Follow(A) little trickier because B may be ε i.e. B * ε Apply followsing rules until no terminal or ε can be added 1). $ Follow(S), where S is the start symbol. e.g. Follow(E) = {$... }. 2). Look at the occurrence of a non-terminal on the right hand side of a production which is followed by something If A αbβ, then First(β)-{ε} Follow(B) 3). Look at N on the RHS that is not followed by anything, if (A αb) or (A αbβ and ε First(β)), then Follow(A) Follow(B)
123 Informal Interpretation of First and Follow Sets First set of X (focus on LHS) If X T, then terminal symbol itself If X Y Z, then add First(Y) If X ε, then add ε Follow set of X (focus on RHS) If X is start symbol, add $ If... X Y, then add First(Y) If Y X, then add Follow(Y)
124 For the example CS2210: Compiler Construction E T X X + E ε T int Y ( E ) Y * T ε First set (focus on LHS) E T X X + E X ε T int Y T ( E ) Y * T Y ε Follow set (focus on RHS) $ T ( E ) X + E E T X Y * T T int Y
125 Example CS2210: Compiler Construction Symbol ( ) + int Y X T E E T X X + E ε T int Y ( E ) Y * T ε First ( ) + int *, ε +, ε (, int (, int Symbol E X T Y Follow $,) $,) $,),+ $,),+
126 Construction of LL(1) Parse Table To construct the parse table, we check each A α For each terminal a First(α), add A α to M[A,a]. If ε First(α), then for each terminal b Follow(A), add A α to M[A,b].
127 Example CS2210: Compiler Construction E T X X + E ε T int Y ( E ) Y * T ε Symbol ( ) + int Y X T E First ( ) + int *, ε +, ε (, int (, int Symbol E X T Y Follow $,) $,) $,),+ $,),+ Table int * + ( ) $ E E TX E TX X X +E X ε X ε T T int Y T (E) Y Y *T Y ε Y ε Y ε
128 Determine if Grammar G is LL(1) Observation If a grammar is LL(1), then each of its LL(1) table entry contains at most one rule. Otherwise, it is not LL(1). Two methods to determine if a grammar is LL(1) or not (1) Construct LL(1) table, and check if there is a multi-rule entry or (2) Checking each rule as if the table is getting constructed. G is LL(1) iff for a rule A α β First(α) First(β) = φ at most one of α and β can derive ε If β derives ε, then First(α) Follow(A) = φ
129 Non-LL(1) Grammars Suppose a grammar is not LL(1). What then? (1) The language may still be LL(1). Try to massage grammar to LL(1) grammar: Apply left-factoring Apply left-recursion removal Try to remove ambiguity in grammar: Encode precendence into rules Encode associativity into rules (2) If (1) fails, language may not be LL(1) Impossible to resolve conflict at the grammar level Programmer chooses which rule to use for conflicting entry (if choosing that rule is always semantically correct) Otherwise, use a more powerful parser (e.g. LL(k), LR(1))
130 Non-LL(1) Grammar Example Some grammars are not LL(1) even after left-factoring and left-recursion removal S if C then S if C then S else S a (other statements) C b change to S if C then S X a X else S ε C b problem sentence: if b then if b then a else a else First(X) First(X)-ε Follow(S) X else... ε else Follow(X) Such grammars are potentially ambiguous
131 Non-LL(1) Even After Removing Ambiguity For the if-then-else example, how would you rewrite it? S if C then S if C then S2 else S S2 if C then S2 else S2 a C b Now grammar is unambiguous but it is not LL(k) for any k Must lookahead until matching else to choose rule for S That lookahead may be an arbirary number of tokens with arbitrary nesting of if-then-else statements Some languages are not LL(1) or even LL(k) No amount of lookahead can tell parser whether this statement is an if-then or if-then-else Fortunately in this case, can resolve conflicts in table Always choose X else S over X ε on else token Semantically correct, since else should match closest if
132 LL(1) Time and Space Complexity Linear time and space relative to length of input
133 LL(1) Time and Space Complexity Linear time and space relative to length of input Time each input symbol is consumed within a constant number of steps. Why?
134 LL(1) Time and Space Complexity Linear time and space relative to length of input Time each input symbol is consumed within a constant number of steps. Why? If symbol at top of stack is a terminal: Matched immediately in one step If symbol at top of stack is a non-terminal: Matched in at most N steps, where N = number of rules Since no left-recursion, cannot apply same rule twice without consuming input
135 LL(1) Time and Space Complexity Linear time and space relative to length of input Time each input symbol is consumed within a constant number of steps. Why? If symbol at top of stack is a terminal: Matched immediately in one step If symbol at top of stack is a non-terminal: Matched in at most N steps, where N = number of rules Since no left-recursion, cannot apply same rule twice without consuming input Space smaller than input (after removing X ε). Why?
136 LL(1) Time and Space Complexity Linear time and space relative to length of input Time each input symbol is consumed within a constant number of steps. Why? If symbol at top of stack is a terminal: Matched immediately in one step If symbol at top of stack is a non-terminal: Matched in at most N steps, where N = number of rules Since no left-recursion, cannot apply same rule twice without consuming input Space smaller than input (after removing X ε). Why? RHS is always longer or equal to LHS Derivation string expands monotonically Derivation string is always shorter than final input string Stack is a subset of derivation string (unmatched portion)
137 Nonrecursive Parser Summary First and Follow sets are used to construct predictive parsing tables Intuitively, First and Follow sets guide the choice of rules For non-terminal A and lookahead t, use the production rule A α where t First(α) OR For non-terminal A and lookahead t, use the production rule A α where ε First(α) and t Follow(A) There can only be ONE such rule or grammar is not LL(1)
138 Questions CS2210: Compiler Construction What is LL(0)? Why LL(2)... LL(k) are not widely used?
Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing
Roadmap > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing The role of the parser > performs context-free syntax analysis > guides
More informationSyntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38
Syntax Analysis Martin Sulzmann Martin Sulzmann Syntax Analysis 1 / 38 Syntax Analysis Objective Recognize individual tokens as sentences of a language (beyond regular languages). Example 1 (OK) Program
More informationDerivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012
Derivations vs Parses Grammar is used to derive string or construct parser Context ree Grammars A derivation is a sequence of applications of rules Starting from the start symbol S......... (sentence)
More informationTop down vs. bottom up parsing
Parsing A grammar describes the strings that are syntactically legal A recogniser simply accepts or rejects strings A generator produces sentences in the language described by the grammar A parser constructs
More informationAbstract Syntax Trees & Top-Down Parsing
Review of Parsing Abstract Syntax Trees & Top-Down Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s L(G)? A parse tree
More information3. Parsing. Oscar Nierstrasz
3. Parsing Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/ http://www.cs.purdue.edu/homes/hosking/
More informationTop-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7
Top-Down Parsing and Intro to Bottom-Up Parsing Lecture 7 1 Predictive Parsers Like recursive-descent but parser can predict which production to use Predictive parsers are never wrong Always able to guess
More informationAbstract Syntax Trees & Top-Down Parsing
Abstract Syntax Trees & Top-Down Parsing Review of Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s L(G)? A parse tree
More informationAbstract Syntax Trees & Top-Down Parsing
Review of Parsing Abstract Syntax Trees & Top-Down Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s L(G)? A parse tree
More informationTypes of parsing. CMSC 430 Lecture 4, Page 1
Types of parsing Top-down parsers start at the root of derivation tree and fill in picks a production and tries to match the input may require backtracking some grammars are backtrack-free (predictive)
More informationSyntactic Analysis. Top-Down Parsing
Syntactic Analysis Top-Down Parsing Copyright 2017, Pedro C. Diniz, all rights reserved. Students enrolled in Compilers class at University of Southern California (USC) have explicit permission to make
More informationTop-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7
Top-Down Parsing and Intro to Bottom-Up Parsing Lecture 7 1 Predictive Parsers Like recursive-descent but parser can predict which production to use Predictive parsers are never wrong Always able to guess
More informationLL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012
Predictive Parsers LL(k) Parsing Can we avoid backtracking? es, if for a given input symbol and given nonterminal, we can choose the alternative appropriately. his is possible if the first terminal of
More informationCS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)
CS1622 Lecture 9 Parsing (4) CS 1622 Lecture 9 1 Today Example of a recursive descent parser Predictive & LL(1) parsers Building parse tables CS 1622 Lecture 9 2 A Recursive Descent Parser. Preliminaries
More information8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing
8 Parsing Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces strings A parser constructs a parse tree for a string
More informationMonday, September 13, Parsers
Parsers Agenda Terminology LL(1) Parsers Overview of LR Parsing Terminology Grammar G = (Vt, Vn, S, P) Vt is the set of terminals Vn is the set of non-terminals S is the start symbol P is the set of productions
More informationCS 321 Programming Languages and Compilers. VI. Parsing
CS 321 Programming Languages and Compilers VI. Parsing Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = words Programs = sentences For further information,
More informationCS 314 Principles of Programming Languages
CS 314 Principles of Programming Languages Lecture 5: Syntax Analysis (Parsing) Zheng (Eddy) Zhang Rutgers University January 31, 2018 Class Information Homework 1 is being graded now. The sample solution
More informationParsing III. (Top-down parsing: recursive descent & LL(1) )
Parsing III (Top-down parsing: recursive descent & LL(1) ) Roadmap (Where are we?) Previously We set out to study parsing Specifying syntax Context-free grammars Ambiguity Top-down parsers Algorithm &
More informationCA Compiler Construction
CA4003 - Compiler Construction David Sinclair A top-down parser starts with the root of the parse tree, labelled with the goal symbol of the grammar, and repeats the following steps until the fringe of
More informationWednesday, September 9, 15. Parsers
Parsers What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda
More informationParsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:
What is a parser Parsers A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda
More informationParsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones
Parsing III (Top-down parsing: recursive descent & LL(1) ) (Bottom-up parsing) CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Copyright 2003, Keith D. Cooper,
More informationCompilers. Yannis Smaragdakis, U. Athens (original slides by Sam
Compilers Parsing Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Next step text chars Lexical analyzer tokens Parser IR Errors Parsing: Organize tokens into sentences Do tokens conform
More informationParsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468
Parsers Xiaokang Qiu Purdue University ECE 468 August 31, 2018 What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure
More informationSYNTAX ANALYSIS 1. Define parser. Hierarchical analysis is one in which the tokens are grouped hierarchically into nested collections with collective meaning. Also termed as Parsing. 2. Mention the basic
More informationWednesday, August 31, Parsers
Parsers How do we combine tokens? Combine tokens ( words in a language) to form programs ( sentences in a language) Not all combinations of tokens are correct programs (not all sentences are grammatically
More informationSyntax Analysis Part I
Syntax Analysis Part I Chapter 4: Context-Free Grammars Slides adapted from : Robert van Engelen, Florida State University Position of a Parser in the Compiler Model Source Program Lexical Analyzer Token,
More information3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino
3. Syntax Analysis Andrea Polini Formal Languages and Compilers Master in Computer Science University of Camerino (Formal Languages and Compilers) 3. Syntax Analysis CS@UNICAM 1 / 54 Syntax Analysis: the
More informationCSCI312 Principles of Programming Languages
Copyright 2006 The McGraw-Hill Companies, Inc. CSCI312 Principles of Programming Languages! LL Parsing!! Xu Liu Derived from Keith Cooper s COMP 412 at Rice University Recap Copyright 2006 The McGraw-Hill
More informationCS 406/534 Compiler Construction Parsing Part I
CS 406/534 Compiler Construction Parsing Part I Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy and Dr.
More informationBuilding a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1
Building a Parser III CS164 3:30-5:00 TT 10 Evans 1 Overview Finish recursive descent parser when it breaks down and how to fix it eliminating left recursion reordering productions Predictive parsers (aka
More informationConcepts Introduced in Chapter 4
Concepts Introduced in Chapter 4 Grammars Context-Free Grammars Derivations and Parse Trees Ambiguity, Precedence, and Associativity Top Down Parsing Recursive Descent, LL Bottom Up Parsing SLR, LR, LALR
More informationParsing Part II (Top-down parsing, left-recursion removal)
Parsing Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit
More informationAdministrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.
Administrativia Building a Parser III CS164 3:30-5:00 10 vans WA1 due on hu PA2 in a week Slides on the web site I do my best to have slides ready and posted by the end of the preceding logical day yesterday,
More informationAmbiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10
Ambiguity, Precedence, Associativity & Top-Down Parsing Lecture 9-10 (From slides by G. Necula & R. Bodik) 9/18/06 Prof. Hilfinger CS164 Lecture 9 1 Administrivia Please let me know if there are continued
More informationCompilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017
Compilerconstructie najaar 2017 http://www.liacs.leidenuniv.nl/~vlietrvan1/coco/ Rudy van Vliet kamer 140 Snellius, tel. 071-527 2876 rvvliet(at)liacs(dot)nl college 3, vrijdag 22 september 2017 + werkcollege
More informationBottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.
Bottom up parsing Construct a parse tree for an input string beginning at leaves and going towards root OR Reduce a string w of input to start symbol of grammar Consider a grammar S aabe A Abc b B d And
More informationBottom-Up Parsing. Lecture 11-12
Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 9/22/06 Prof. Hilfinger CS164 Lecture 11 1 Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient
More informationPART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309
PART 3 - SYNTAX ANALYSIS F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 64 / 309 Goals Definition of the syntax of a programming language using context free grammars Methods for parsing
More informationTable-Driven Parsing
Table-Driven Parsing It is possible to build a non-recursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls [1] The non-recursive parser looks up the production
More informationEDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:
EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)
More informationSyntax Analysis Check syntax and construct abstract syntax tree
Syntax Analysis Check syntax and construct abstract syntax tree if == = ; b 0 a b Error reporting and recovery Model using context free grammars Recognize using Push down automata/table Driven Parsers
More informationNote that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).
LL(k) Grammars We need a bunch of terminology. For any terminal string a we write First k (a) is the prefix of a of length k (or all of a if its length is less than k) For any string g of terminal and
More informationCompilers. Predictive Parsing. Alex Aiken
Compilers Like recursive-descent but parser can predict which production to use By looking at the next fewtokens No backtracking Predictive parsers accept LL(k) grammars L means left-to-right scan of input
More informationCS502: Compilers & Programming Systems
CS502: Compilers & Programming Systems Top-down Parsing Zhiyuan Li Department of Computer Science Purdue University, USA There exist two well-known schemes to construct deterministic top-down parsers:
More information4. Lexical and Syntax Analysis
4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal
More informationCS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University
CS415 Compilers Syntax Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Limits of Regular Languages Advantages of Regular Expressions
More information4. Lexical and Syntax Analysis
4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing 1 Recall: Architecture of Compilers, Interpreters Source Parser Static Analyzer Intermediate Representation Front End Back
More information4 (c) parsing. Parsing. Top down vs. bo5om up parsing
4 (c) parsing Parsing A grammar describes syntac2cally legal strings in a language A recogniser simply accepts or rejects strings A generator produces strings A parser constructs a parse tree for a string
More informationCompiler Design Concepts. Syntax Analysis
Compiler Design Concepts Syntax Analysis Introduction First task is to break up the text into meaningful words called tokens. newval=oldval+12 id = id + num Token Stream Lexical Analysis Source Code (High
More informationVIVA QUESTIONS WITH ANSWERS
VIVA QUESTIONS WITH ANSWERS 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another language-the
More informationAmbiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int
Administrivia Ambiguity, Precedence, Associativity & op-down Parsing eam assignments this evening for all those not listed as having one. HW#3 is now available, due next uesday morning (Monday is a holiday).
More informationBottom-Up Parsing. Lecture 11-12
Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 2/20/08 Prof. Hilfinger CS164 Lecture 11 1 Administrivia Test I during class on 10 March. 2/20/08 Prof. Hilfinger CS164 Lecture 11
More informationContext-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5
Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5 1 Not all languages are regular So what happens to the languages which are not regular? Can we still come up with a language recognizer?
More informationCSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1
CSE P 501 Compilers LR Parsing Hal Perkins Spring 2018 UW CSE P 501 Spring 2018 D-1 Agenda LR Parsing Table-driven Parsers Parser States Shift-Reduce and Reduce-Reduce conflicts UW CSE P 501 Spring 2018
More informationChapter 4. Lexical and Syntax Analysis
Chapter 4 Lexical and Syntax Analysis Chapter 4 Topics Introduction Lexical Analysis The Parsing Problem Recursive-Descent Parsing Bottom-Up Parsing Copyright 2012 Addison-Wesley. All rights reserved.
More informationSyntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.
Syntax Analysis Prof. James L. Frankel Harvard University Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved. Context-Free Grammar (CFG) terminals non-terminals start
More informationMore Bottom-Up Parsing
More Bottom-Up Parsing Lecture 7 Dr. Sean Peisert ECS 142 Spring 2009 1 Status Project 1 Back By Wednesday (ish) savior lexer in ~cs142/s09/bin Project 2 Due Friday, Apr. 24, 11:55pm My office hours 3pm
More informationOptimizing Finite Automata
Optimizing Finite Automata We can improve the DFA created by MakeDeterministic. Sometimes a DFA will have more states than necessary. For every DFA there is a unique smallest equivalent DFA (fewest states
More informationCSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis
Chapter 4 Lexical and Syntax Analysis Introduction - Language implementation systems must analyze source code, regardless of the specific implementation approach - Nearly all syntax analysis is based on
More informationCS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).
CS 2210 Sample Midterm 1. Determine if each of the following claims is true (T) or false (F). F A language consists of a set of strings, its grammar structure, and a set of operations. (Note: a language
More informationIntroduction to parsers
Syntax Analysis Introduction to parsers Context-free grammars Push-down automata Top-down parsing LL grammars and parsers Bottom-up parsing LR grammars and parsers Bison/Yacc - parser generators Error
More informationLecture 7: Deterministic Bottom-Up Parsing
Lecture 7: Deterministic Bottom-Up Parsing (From slides by G. Necula & R. Bodik) Last modified: Tue Sep 20 12:50:42 2011 CS164: Lecture #7 1 Avoiding nondeterministic choice: LR We ve been looking at general
More informationPart 3. Syntax analysis. Syntax analysis 96
Part 3 Syntax analysis Syntax analysis 96 Outline 1. Introduction 2. Context-free grammar 3. Top-down parsing 4. Bottom-up parsing 5. Conclusion and some practical considerations Syntax analysis 97 Structure
More informationContext-free grammars
Context-free grammars Section 4.2 Formal way of specifying rules about the structure/syntax of a program terminals - tokens non-terminals - represent higher-level structures of a program start symbol,
More informationTopic 3: Syntax Analysis I
Topic 3: Syntax Analysis I Compiler Design Prof. Hanjun Kim CoreLab (Compiler Research Lab) POSTECH 1 Back-End Front-End The Front End Source Program Lexical Analysis Syntax Analysis Semantic Analysis
More informationProperties of Regular Expressions and Finite Automata
Properties of Regular Expressions and Finite Automata Some token patterns can t be defined as regular expressions or finite automata. Consider the set of balanced brackets of the form [[[ ]]]. This set
More informationExtra Credit Question
Top-Down Parsing #1 Extra Credit Question Given this grammar G: E E+T E T T T * int T int T (E) Is the string int * (int + int) in L(G)? Give a derivation or prove that it is not. #2 Revenge of Theory
More informationExample CFG. Lectures 16 & 17 Bottom-Up Parsing. LL(1) Predictor Table Review. Stacks in LR Parsing 1. Sʹ " S. 2. S " AyB. 3. A " ab. 4.
Example CFG Lectures 16 & 17 Bottom-Up Parsing CS 241: Foundations of Sequential Programs Fall 2016 1. Sʹ " S 2. S " AyB 3. A " ab 4. A " cd Matt Crane University of Waterloo 5. B " z 6. B " wz 2 LL(1)
More informationPESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru -100 Department of Computer Science and Engineering
TEST 1 Date : 24 02 2015 Marks : 50 Subject & Code : Compiler Design ( 10CS63) Class : VI CSE A & B Name of faculty : Mrs. Shanthala P.T/ Mrs. Swati Gambhire Time : 8:30 10:00 AM SOLUTION MANUAL 1. a.
More informationThe analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
COMPILER DESIGN 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another language-the target
More informationCompiler Construction: Parsing
Compiler Construction: Parsing Mandar Mitra Indian Statistical Institute M. Mitra (ISI) Parsing 1 / 33 Context-free grammars. Reference: Section 4.2 Formal way of specifying rules about the structure/syntax
More informationChapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.
Topics Chapter 4 Lexical and Syntax Analysis Introduction Lexical Analysis Syntax Analysis Recursive -Descent Parsing Bottom-Up parsing 2 Language Implementation Compilation There are three possible approaches
More informationIntroduction to Parsing. Comp 412
COMP 412 FALL 2010 Introduction to Parsing Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make
More informationSyntax Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay
Syntax Analysis (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay September 2007 College of Engineering, Pune Syntax Analysis: 2/124 Syntax
More informationDEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
KATHMANDU UNIVERSITY SCHOOL OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING REPORT ON NON-RECURSIVE PREDICTIVE PARSER Fourth Year First Semester Compiler Design Project Final Report submitted
More informationLecture 8: Deterministic Bottom-Up Parsing
Lecture 8: Deterministic Bottom-Up Parsing (From slides by G. Necula & R. Bodik) Last modified: Fri Feb 12 13:02:57 2010 CS164: Lecture #8 1 Avoiding nondeterministic choice: LR We ve been looking at general
More informationHomework & Announcements
Homework & nnouncements New schedule on line. Reading: Chapter 18 Homework: Exercises at end Due: 11/1 Copyright c 2002 2017 UMaine School of Computing and Information S 1 / 25 COS 140: Foundations of
More informationSyntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill
Syntax Analysis Björn B. Brandenburg The University of North Carolina at Chapel Hill Based on slides and notes by S. Olivier, A. Block, N. Fisher, F. Hernandez-Campos, and D. Stotts. The Big Picture Character
More informationSection A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.
Section A 1. What do you meant by parser and its types? A parser for grammar G is a program that takes as input a string w and produces as output either a parse tree for w, if w is a sentence of G, or
More informationParsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.
Parsing Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students
More informationCOMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar
COMP Lecture Parsing September, 00 Prelude What is the common name of the fruit Synsepalum dulcificum? Miracle fruit from West Africa What is special about miracle fruit? Contains a protein called miraculin
More informationLexical and Syntax Analysis. Top-Down Parsing
Lexical and Syntax Analysis Top-Down Parsing Easy for humans to write and understand String of characters Lexemes identified String of tokens Easy for programs to transform Data structure Syntax A syntax
More informationSyntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax
Susan Eggers 1 CSE 401 Syntax Analysis/Parsing Context-free grammars (CFG s) Purpose: determine if tokens have the right form for the language (right syntactic structure) stream of tokens abstract syntax
More informationCompilation Lecture 3: Syntax Analysis: Top-Down parsing. Noam Rinetzky
Compilation 0368-3133 Lecture 3: Syntax Analysis: Top-Down parsing Noam Rinetzky 1 Recursive descent parsing Define a function for every nonterminal Every function work as follows Find applicable production
More informationCompiler Design 1. Top-Down Parsing. Goutam Biswas. Lect 5
Compiler Design 1 Top-Down Parsing Compiler Design 2 Non-terminal as a Function In a top-down parser a non-terminal may be viewed as a generator of a substring of the input. We may view a non-terminal
More informationReview of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1
Review of CFGs and Parsing II Bottom-up Parsers Lecture 5 1 Outline Parser Overview op-down Parsers (Covered largely through labs) Bottom-up Parsers 2 he Functionality of the Parser Input: sequence of
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Parsing CMSC 330 - Spring 2017 1 Recall: Front End Scanner and Parser Front End Token Source Scanner Parser Stream AST Scanner / lexer / tokenizer converts
More informationParsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)
Parsing Part II (Ambiguity, Top-down parsing, Left-recursion Removal) Ambiguous Grammars Definitions If a grammar has more than one leftmost derivation for a single sentential form, the grammar is ambiguous
More informationPrelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science
Prelude COMP Lecture Topdown Parsing September, 00 What is the Tufts mascot? Jumbo the elephant Why? P. T. Barnum was an original trustee of Tufts : donated $0,000 for a natural museum on campus Barnum
More informationFall Compiler Principles Lecture 3: Parsing part 2. Roman Manevich Ben-Gurion University
Fall 2014-2015 Compiler Principles Lecture 3: Parsing part 2 Roman Manevich Ben-Gurion University Tentative syllabus Front End Intermediate Representation Optimizations Code Generation Scanning Lowering
More informationSyntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011
Syntax Analysis COMP 524: Programming Languages Srinivas Krishnan January 25, 2011 Based in part on slides and notes by Bjoern Brandenburg, S. Olivier and A. Block. 1 The Big Picture Character Stream Token
More informationCOP 3402 Systems Software Syntax Analysis (Parser)
COP 3402 Systems Software Syntax Analysis (Parser) Syntax Analysis 1 Outline 1. Definition of Parsing 2. Context Free Grammars 3. Ambiguous/Unambiguous Grammars Syntax Analysis 2 Lexical and Syntax Analysis
More informationCSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall
CSE443 Compilers Dr. Carl Alphonce alphonce@buffalo.edu 343 Davis Hall Phases of a compiler Syntactic structure Figure 1.6, page 5 of text Recap Lexical analysis: LEX/FLEX (regex -> lexer) Syntactic analysis:
More informationLexical and Syntax Analysis
Lexical and Syntax Analysis (of Programming Languages) Top-Down Parsing Lexical and Syntax Analysis (of Programming Languages) Top-Down Parsing Easy for humans to write and understand String of characters
More informationParser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root
Parser Generation Main Problem: given a grammar G, how to build a top-down parser or a bottom-up parser for it? parser : a program that, given a sentence, reconstructs a derivation for that sentence ----
More informationLanguages and Compilers
Principles of Software Engineering and Operational Systems Languages and Compilers SDAGE: Level I 2012-13 3. Formal Languages, Grammars and Automata Dr Valery Adzhiev vadzhiev@bournemouth.ac.uk Office:
More informationChapter 3. Parsing #1
Chapter 3 Parsing #1 Parser source file get next character scanner get token parser AST token A parser recognizes sequences of tokens according to some grammar and generates Abstract Syntax Trees (ASTs)
More information