Syntactic Analysis. Chapter 4. Compiler Construction Syntactic Analysis 1

1 Syntactic Analysis Chapter 4 Compiler Construction Syntactic Analysis 1

2 Context-free Grammars
- The syntax of programming language constructs can be described by context-free grammars (CFGs)
  - Relatively simple and widely used
- More powerful grammars exist
  - Context-sensitive grammars (CSGs)
  - Type-0 grammars
  - Both are too complex and inefficient for general use
- Backus-Naur Form (BNF) and extended BNF (EBNF) are convenient ways to represent CFGs

3 Advantages of CFGs
- Precise, easy-to-understand syntactic specification of a programming language
- Efficient parsers can be generated automatically for some classes of CFGs
- The generation process can reveal ambiguities that might otherwise go undetected during language design
- A well-designed grammar makes translation to object code easier
- An existing grammatical description makes it easier to evolve the language

4 Role of the Syntactic Analyzer
- Second phase of compilation
- Input to the parser is the output of the lexer
- Output of the parser is (usually) a parse tree
[Figure: source code feeds the lexer; the lexer hands a token to the parser each time the parser requests "get next token"; the diagram also shows the symbol table]

5 Parsers
- Universal parsers
  - Cocke-Younger-Kasami algorithm
  - Earley's algorithm
  - Both are too inefficient for production compilers
- Normal parsers
  - Work only on subclasses of CFGs
  - Examples: LL, LR, LALR(1)
  - Automated tools are available for the popular subclasses

6 Context-free Grammar
A context-free grammar (CFG) is a 4-tuple ⟨V_N, V_T, s, P⟩
- V_N is a set of non-terminal symbols
- V_T is a set of terminal symbols
- s is a distinguished element of V_N called the start symbol
- P is a set of productions or rules that specify how legal strings are built
  P ⊆ V_N × (V_N ∪ V_T)*

7 CFG Elements
- Terminals: basic symbols from which strings are formed (typically corresponding to tokens from the lexer)
- Non-terminals: syntactic variables that denote sets of strings and, in particular, language constructs
- Start symbol: a non-terminal; the set of strings denoted by the start symbol is the language defined by the grammar
- Productions: rules that define how terminals and non-terminals can be combined to form strings in the language
  A → bxy | z

8 Example
Symbol table interpreter
G = ⟨V_N, V_T, s, P⟩
V_N = {S}
V_T = {new, id, num, insert, lookup, quit}
s = S
P: S → new id num
     | insert id id num
     | lookup id id
     | quit

9 Example
An arithmetic expression language
G = ⟨V_N, V_T, s, P⟩
V_N = {E}
V_T = {id, +, *, (, )}
s = E
P: E → E + E
   E → E * E
   E → (E)
   E → id

10 Notational Conventions (1)
Dragon book, pages 166, 167
Terminals
- Lower-case letters early in the alphabet (a, b, etc.)
- Operator symbols (+, *, etc.)
- Punctuation symbols (parentheses, commas, etc.)
- Digits
- Boldface strings (id, if, etc.)

11 Notational Conventions (2)
Non-terminals
- Upper-case letters early in the alphabet (A, B, etc.)
- The letter S, if used, is usually the start symbol
- Lower-case italic names (expr, stmt, etc.)

12 Notational Conventions (3)
- Grammar symbols (either terminals or non-terminals)
  - Upper-case letters late in the alphabet (X, Y, etc.)
- Strings of terminals
  - Lower-case letters late in the alphabet (u, v, etc.)
- Strings of grammar symbols
  - Lower-case Greek letters (α, β, etc.)
  - Useful for representing generic productions

13 Notational Conventions (4)
- Productions with the same left side can be merged into one production using the symbol |
  A → α1, A → α2, ..., A → αk becomes A → α1 | α2 | ... | αk
- Unless otherwise indicated, the left side of the first listed production is the start symbol

14 Example
A programming language construct
stmt → ;
     | if ( expr ) stmt else stmt
     | while ( expr ) stmt
     | blk
     | id = expr ;
blk → { stmt }

15 Derivations
- Rewrite rule approach
  - A production is treated as a rewriting rule in which a non-terminal on the left side of the production is replaced by the grammar symbols on the right side
  - Begin with the start symbol and, through a sequence of derivation steps, produce any string in L(G)

16 Derivation
Given the productions
  A → αBβ
  B → λ1 λ2 ... λn
we can derive
  A ⇒ αBβ ⇒ αλ1λ2...λnβ

17 A Derivation
Given the productions
  E → E + E
  E → E * E
  E → (E)
  E → id
we can derive (id + id):
  E ⇒ (E) ⇒ (E + E) ⇒ (id + E) ⇒ (id + id)

18 Derivations
Let α, β, and γ be strings of grammar symbols (terminals and non-terminals)
α ⇒* β means β is derived from α in zero or more derivation steps
1. α ⇒* α (base case)
2. If α ⇒* γ and γ ⇒ β, then α ⇒* β (inductive case)

19 The Language of a Grammar
Given a grammar G, the language of G is L(G)
  L(G) ⊆ V_T*
  L(G) = {w ∈ V_T* | S ⇒* w}

20 Sentential Forms
- Leftmost derivation
  - The leftmost non-terminal is replaced at each step
- Rightmost derivation
  - The rightmost non-terminal is replaced at each step
- Sentential form
  - A string of grammar symbols that may be obtained from the start symbol by a sequence of valid derivation steps
- Leftmost sentential form
  - A string of grammar symbols that may be obtained from the start symbol by a sequence of valid leftmost derivation steps

21 Regular Languages and CFLs
- All regular languages are context-free
- Consider the regular expression a*b*
  Let G = ⟨{A, B}, {a, b}, A, {A → aA | B, B → bB | ɛ}⟩

22 Producing a Grammar from a Regular Language
1. Construct an NFA from the regular expression
2. Each state in the NFA corresponds to a non-terminal symbol
3. For a transition from state A to state B on input symbol x, add a production of the form A → xB
4. If A is a final state, add the production A → ɛ
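
Steps 2 to 4 can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up for this example, the NFA is assumed to have no ɛ-transitions, and ɛ is written as "e" in the output.

```java
import java.util.*;

// Sketch: turn NFA transitions into right-linear productions (hypothetical helper names).
class NfaToGrammar {
    // Each transition is {fromState, inputSymbol, toState}; states double as non-terminals.
    static List<String> productions(String[][] transitions, Set<String> finalStates) {
        List<String> result = new ArrayList<>();
        for (String[] t : transitions) {
            // Step 3: a transition A --x--> B yields A -> xB
            result.add(t[0] + " -> " + t[1] + t[2]);
        }
        for (String f : new TreeSet<>(finalStates)) {
            // Step 4: a final state A yields A -> e (e stands for the empty string)
            result.add(f + " -> e");
        }
        return result;
    }

    public static void main(String[] args) {
        // One possible NFA for a*b*: A loops on a and moves to B on b; B loops on b.
        String[][] transitions = { {"A", "a", "A"}, {"A", "b", "B"}, {"B", "b", "B"} };
        Set<String> finals = new HashSet<>(Arrays.asList("A", "B"));
        for (String p : productions(transitions, finals)) {
            System.out.println(p);
        }
    }
}
```

The resulting grammar differs in shape from the one on slide 21 (which uses the unit production A → B), but it is built from the same regular expression a*b*.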

23 Parse Trees
- A graphical representation of a sequence of derivations
- Each interior node is a non-terminal, and its children are the right side of one of that non-terminal's productions
[Figure: parse tree with root E and children E + E; the left E derives id, and the right E derives E * E with both operands id]

24 Parse Trees
- If you read the leaves of the tree from left to right, they form a sentential form
  - Also called the yield or frontier of the parse tree
- Not all the leaves need be terminals; the parse tree may be incomplete
  - Valid sentential forms can contain non-terminals
[Figure: parse tree with root E and children E + E; the left E derives id, and the right E derives E * E with both operands id]

25 Ambiguity
Given the productions
  E → E + E | E * E | (E) | id
derive id + id * id:
  E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
or
  E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

26 Ambiguity and Parse Trees
A grammar G is ambiguous if a string in L(G) can have more than one parse tree
[Figure: two parse trees for id + id * id; one has E + E at the root with E * E as its right operand, the other has E * E at the root with E + E as its left operand]

27 Consequences of Ambiguity
- Ambiguity is generally bad
  - It often means there is more than one way to interpret a string
  - Add before multiply, or multiply before add?
- An ambiguous grammar should be rewritten to remove the ambiguity

28 Removing the Ambiguity
Consider the rewritten productions
  E → T | E + T
  T → F | T * F
  F → (E) | id
Here only one parse tree is possible
[Figure: the parse tree for id + id * id under the rewritten grammar, with E + T at the root and T * F below the right operand]

29 Disambiguating Rules
Can we provide rules for disambiguating id + (id * id) from (id + id) * id?

30 Top-down Parsing
- Recursive descent is an example
- Grows the parse tree from the root down to the leaves
- Useful for recognizing flow-of-control constructs, since they are always labeled with a keyword (e.g., if, while, do, for)
- Requires each production for the same non-terminal to begin with a unique token

31 Left Factoring
- Can be used to factor out a common prefix in two or more productions
- For example, to parse if...then vs. if...then...else
  C → if E then S else S | if E then S
- Left factor the grammar (factor out the common left expression):
  C → if E then S X
  X → else S | ɛ

32 Top-down Parsing
Two requirements
- Left-factor the grammar
  - Produce a grammar in which no two productions for the same non-terminal have a common prefix
- No left recursion
  - A ⇒+ Aα
  - The parser could get into an infinite loop

33 Top-down Parsing
Top-down parsing produces a sequence of leftmost derivations
  A → Bx | Cy
  B → z
  C → w
Produces two strings: zx and wy

34 Top-down Parsers
Two common approaches are used in top-down parsing
- Recursive descent parser
  - Recursive
  - The structure of the grammar is hard-coded into the parsing program
- Table-driven parser
  - Non-recursive
  - The structure of the language is encoded in a parse table

35 Recursive Descent
- Relatively easy to implement
- Reads the input stream (from the scanner) left to right and verifies its correctness
- Perl has a recursive descent parser (Parse::RecDescent)
- Recursive, since parsing is accomplished via recursive procedures
- Descent, since parsing is top-down (descends from the root down the branches to the leaves)

36 Recursive Descent
Each non-terminal is a subroutine call
  A → Bx | Cy
  B → z
  C → w
[Figure: call trees for the subroutines A, B, and C, with numbered steps showing the order in which calls are made and symbols are matched]

37 Recursive Descent
A candidate grammar (bad because of left recursion):
  E → T | E + T
  T → F | T * F
  F → (E) | d
The grammar can be modified to support a recursive descent parser:
  E  → T E'
  E' → + T E' | ɛ
  T  → F T'
  T' → * F T' | ɛ
  F  → (E) | d
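
The rewrite above follows the standard pattern for immediate left recursion: A → Aα | β becomes A → βA' and A' → αA' | ɛ. A minimal Java sketch of that transformation follows; the class name, method name, and data layout are assumptions made for this illustration, ɛ is written as "e", and only immediate (not indirect) left recursion is handled.

```java
import java.util.*;

class LeftRecursion {
    // Rewrites A -> A a1 | ... | b1 | ... into A -> b1 A' | ... and A' -> a1 A' | ... | e.
    // Each alternative is a list of symbols; "e" stands for the empty string.
    static Map<String, List<List<String>>> eliminate(String a, List<List<String>> alts) {
        String aPrime = a + "'";
        List<List<String>> base = new ArrayList<>();   // new alternatives for A
        List<List<String>> tail = new ArrayList<>();   // alternatives for A'
        for (List<String> alt : alts) {
            if (!alt.isEmpty() && alt.get(0).equals(a)) {
                // Left-recursive alternative A -> A alpha becomes A' -> alpha A'
                List<String> rest = new ArrayList<>(alt.subList(1, alt.size()));
                rest.add(aPrime);
                tail.add(rest);
            } else {
                // Non-recursive alternative A -> beta becomes A -> beta A'
                List<String> rest = new ArrayList<>(alt);
                rest.add(aPrime);
                base.add(rest);
            }
        }
        Map<String, List<List<String>>> out = new LinkedHashMap<>();
        if (tail.isEmpty()) {          // no left recursion: leave the productions alone
            out.put(a, alts);
            return out;
        }
        tail.add(Collections.singletonList("e"));
        out.put(a, base);
        out.put(aPrime, tail);
        return out;
    }

    public static void main(String[] args) {
        // E -> T | E + T
        List<List<String>> alts = Arrays.asList(
            Arrays.asList("T"), Arrays.asList("E", "+", "T"));
        System.out.println(eliminate("E", alts));  // {E=[[T, E']], E'=[[+, T, E'], [e]]}
    }
}
```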

38 Generalized Parser
    public abstract class RecursiveDescent {
        private String input;
        protected int cursor = 0;

        public RecursiveDescent() {
            getInputString();
            if ( parse() && cursor == input.length() ) {
                System.out.println("Accept");
            } else {
                error();
            }
        }

        // Skips blanks and tabs, then consumes one character and reports
        // whether it equals ch; callers restore cursor themselves on failure.
        protected final boolean checkNextToken(char ch) {
            // Ignore whitespace
            while ( cursor < input.length()
                    && (input.charAt(cursor) == ' ' || input.charAt(cursor) == '\t') ) {
                cursor++;
            }
            return (cursor < input.length()) ? input.charAt(cursor++) == ch : false;
        }

        protected static void error() {
            System.out.println("Invalid string");
            System.exit(1);
        }

        protected final void getInputString() {
            // Console.In is not part of the Java standard library; it appears to be
            // a helper supplied with the course materials.
            input = Console.In.getString();
        }

        public abstract boolean parse();
    }

39 Subclass for Given Grammar (1)
    public class Expression extends RecursiveDescent {
        /*
         * Original Grammar:
         *   E  -> T | E + T
         *   T  -> F | T * F
         *   F  -> ( E ) | d
         *
         * Adapted Grammar:
         *   E  -> T E'
         *   E' -> + T E' | e
         *   T  -> F T'
         *   T' -> * F T' | e
         *   F  -> ( E ) | d
         *
         * Note method names: E1() => E' and T1() => T'
         */
        public boolean parse() {
            return E();
        }

        public static void main(String[] args) {
            new Expression();
        }
        // Continued...

40 Subclass for Given Grammar (2)
Production: E → T E'
        private boolean E() {
            int pos = cursor;
            // E -> T E'
            if ( T() && E1() ) {
                return true;
            }
            cursor = pos;  // Backtrack
            return false;
        }

41 Subclass for Given Grammar (3)
Production: E' → + T E' | ɛ
        private boolean E1() {
            int pos = cursor;
            // E' -> + T E'
            if ( checkNextToken('+') && T() && E1() ) {
                return true;
            }
            cursor = pos;  // Backtrack
            // E' -> e
            return true;
        }

42 Subclass for Given Grammar (4)
Production: T → F T'
        private boolean T() {
            int pos = cursor;
            // T -> F T'
            if ( F() && T1() ) {
                return true;
            }
            cursor = pos;  // Backtrack
            return false;
        }

43 Subclass for Given Grammar (5)
Production: T' → * F T' | ɛ
        private boolean T1() {
            int pos = cursor;
            // T' -> * F T'
            if ( checkNextToken('*') && F() && T1() ) {
                return true;
            }
            cursor = pos;  // Backtrack
            // T' -> e
            return true;
        }

44 Subclass for Given Grammar (6)
Production: F → (E) | d
        private boolean F() {
            int pos = cursor;
            // F -> ( E )
            if ( checkNextToken('(') && E() && checkNextToken(')') ) {
                return true;
            }
            cursor = pos;  // Backtrack
            // F -> d
            if ( checkNextToken('d') ) {
                return true;
            }
            cursor = pos;  // Backtrack
            return false;
        }
    }  // end of class Expression

45 Backtracking
- The example recursive descent parser used backtracking
- Recursive descent parsing is criticized as being inefficient due to backtracking
- Some grammars can be written so that no backtracking is required
  - The right side of each production starts with a terminal, so you know immediately which production to apply
- A top-down parser that requires no backtracking is called a predictive parser
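
For comparison with the backtracking parser above, here is a sketch of a predictive parser for the same adapted grammar. It is not taken from the slides: the class name, the one-character-per-token input, and the '$' end marker are all assumptions made for this illustration. Each method chooses its production by looking at one character and never restores the cursor.

```java
class PredictiveParser {
    private final String input;
    private int pos = 0;

    PredictiveParser(String text) {
        // Strip blanks and append '$' as an end-of-input marker (an assumption of this sketch).
        this.input = text.replace(" ", "") + "$";
    }

    private char peek() { return input.charAt(pos); }

    private void match(char expected) {
        if (peek() != expected) {
            throw new IllegalStateException("Expected " + expected + " at position " + pos);
        }
        pos++;
    }

    // E -> T E'
    private void e() { t(); e1(); }

    // E' -> + T E' | e   (chosen by lookahead; nothing is ever undone)
    private void e1() {
        if (peek() == '+') { match('+'); t(); e1(); }
        // otherwise apply E' -> e and consume nothing
    }

    // T -> F T'
    private void t() { f(); t1(); }

    // T' -> * F T' | e
    private void t1() {
        if (peek() == '*') { match('*'); f(); t1(); }
    }

    // F -> ( E ) | d
    private void f() {
        if (peek() == '(') { match('('); e(); match(')'); }
        else { match('d'); }
    }

    static boolean accepts(String text) {
        PredictiveParser p = new PredictiveParser(text);
        try {
            p.e();
            // Accept only if everything up to the appended end marker was consumed.
            return p.pos == p.input.length() - 1;
        } catch (IllegalStateException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("(d + d) * d"));  // true
        System.out.println(accepts("d + * d"));      // false
    }
}
```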

46 The Bad News
- Some grammars cannot be processed with a top-down parser
- We need to determine the characteristics required to make a top-down parser feasible

47 Preprocessing Needed
FIRST(α) is the set of terminals that begin strings derived from α
  A → Bx | Cy
  B → z
  C → w
  FIRST(B) = {z}
  FIRST(C) = {w}
  FIRST(A) = {z, w}

48 One Criterion
Given a production of the form A → α | β, if FIRST(α) ∩ FIRST(β) ≠ ∅, then a predictive top-down parser cannot be used: one symbol of lookahead does not identify which alternative to apply

49 ɛ Productions
- ɛ productions complicate the situation
- FOLLOW(A) is the set of terminals that can appear immediately to the right of A in some sentential form
  A → Bx | Cy
  B → z | ɛ
  C → w
  FIRST(B) = {z, ɛ}
  FIRST(C) = {w}
  FIRST(A) = {z, x, w} (x is included because B can derive ɛ)
  FOLLOW(B) = {x}
  FOLLOW(C) = {y}
  FOLLOW(A) = {$} (end of input)

50 FOLLOW
- Without any ɛ productions, FIRST would be sufficient
- Formally:
  If X ∈ V_N ∪ V_T, then
    FIRST(X) = {X}, if X ∈ V_T
    FIRST(X) = {a | a ∈ V_T and X ⇒* aβ}, otherwise
  If A ∈ V_N, then
    FOLLOW(A) = {a | a ∈ V_T and S ⇒* αAaβ}
- How do we compute FIRST and FOLLOW?

51 FIRST Computation
    SetOfTerminalSymbols FIRST(GrammarSymbol X) {
        if ( X is a terminal )
            F ← {X};                        // FIRST(X) is just X
        else {
            F ← ∅;
            if ( X → ɛ is a production )
                F ← F ∪ {ɛ};                // Add ɛ to FIRST(X)
            for each production X → y1 y2 ... yn {
                if ( ∃ i such that ɛ ∈ FIRST(y1), ɛ ∈ FIRST(y2), ..., ɛ ∈ FIRST(y(i-1)),
                     and a ∈ FIRST(yi), a ≠ ɛ )
                    F ← F ∪ {a};
                if ( ɛ ∈ FIRST(y1), ɛ ∈ FIRST(y2), ..., ɛ ∈ FIRST(yn) )
                    F ← F ∪ {ɛ};            // Add ɛ to FIRST(X)
            }
        }
        return F;
    }

52 FIRST
In a nutshell:
- If A cannot derive ɛ, then FIRST(A) = {a ∈ V_T | A ⇒* aβ}
- Else, if A ⇒* ɛ, then FIRST(A) = {a ∈ V_T | A ⇒* aβ} ∪ {ɛ}
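
One way to make the FIRST computation concrete is to iterate to a fixed point over all productions, which sidesteps the recursion in the pseudocode. The Java sketch below does that for the adapted expression grammar. The class name and grammar encoding are assumptions made for this illustration: an empty list encodes an ɛ-production, and ɛ appears as "e" in the results.

```java
import java.util.*;

class FirstSets {
    // Grammar: non-terminal -> alternatives; each alternative is a list of symbols.
    // An empty list encodes an e-production; "e" in the result stands for the empty string.
    static Map<String, Set<String>> compute(Map<String, List<List<String>>> g) {
        Map<String, Set<String>> first = new TreeMap<>();
        for (String nt : g.keySet()) first.put(nt, new TreeSet<>());
        boolean changed = true;
        while (changed) {                          // repeat until no set grows
            changed = false;
            for (Map.Entry<String, List<List<String>>> entry : g.entrySet()) {
                Set<String> f = first.get(entry.getKey());
                for (List<String> alt : entry.getValue()) {
                    boolean allNullable = true;    // true while y1..yi can all derive e
                    for (String y : alt) {
                        if (!g.containsKey(y)) {   // terminal: FIRST(y) = {y}
                            changed |= f.add(y);
                            allNullable = false;
                            break;
                        }
                        // Copy before iterating, since y may be the left side itself.
                        for (String a : new ArrayList<>(first.get(y))) {
                            if (!a.equals("e")) changed |= f.add(a);
                        }
                        if (!first.get(y).contains("e")) {
                            allNullable = false;
                            break;
                        }
                    }
                    if (allNullable) changed |= f.add("e");
                }
            }
        }
        return first;
    }

    // E -> T E' ; E' -> + T E' | e ; T -> F T' ; T' -> * F T' | e ; F -> ( E ) | d
    static Map<String, List<List<String>>> exampleGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("E",  Arrays.asList(Arrays.asList("T", "E'")));
        g.put("E'", Arrays.asList(Arrays.asList("+", "T", "E'"), Collections.<String>emptyList()));
        g.put("T",  Arrays.asList(Arrays.asList("F", "T'")));
        g.put("T'", Arrays.asList(Arrays.asList("*", "F", "T'"), Collections.<String>emptyList()));
        g.put("F",  Arrays.asList(Arrays.asList("(", "E", ")"), Arrays.asList("d")));
        return g;
    }

    public static void main(String[] args) {
        // Prints {E=[(, d], E'=[+, e], F=[(, d], T=[(, d], T'=[*, e]}
        System.out.println(compute(exampleGrammar()));
    }
}
```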

53 FOLLOW Computation
    SetOfTerminalSymbols FOLLOW(NonTerminalSymbol A) {
        F ← ∅;
        if ( A is the start symbol )
            F ← F ∪ {$};
        if ( B → αAβ is a production )                       // α can be ɛ
            F ← F ∪ (FIRST(β) − {ɛ});
        if ( C → αA or (C → αAγ and ɛ ∈ FIRST(γ)) )
            F ← F ∪ FOLLOW(C);
        return F;
    }

54 FOLLOW
In a nutshell:
- If A can never be the rightmost symbol of a sentential form (there is no α with S ⇒* αA), then FOLLOW(A) = {a ∈ V_T | S ⇒* αAaβ}
- Else, if S ⇒* αA, then FOLLOW(A) = {a ∈ V_T | S ⇒* αAaβ} ∪ {$}

55 FIRST and FOLLOW Example
Compute the FIRST and FOLLOW sets for the grammar from which our recursive descent parser was built:
  E  → T E'
  E' → + T E' | ɛ
  T  → F T'
  T' → * F T' | ɛ
  F  → (E) | d

56 FIRST and FOLLOW Example
  E  → T E'
  E' → + T E' | ɛ
  T  → F T'
  T' → * F T' | ɛ
  F  → (E) | d
The solution:
  FIRST(+) = {+}      FIRST(*) = {*}      FIRST(d) = {d}
  FIRST(() = {(}      FIRST()) = {)}
  FIRST(E)  = {(, d}
  FIRST(E') = {ɛ, +}
  FIRST(T)  = {(, d}
  FIRST(T') = {ɛ, *}
  FIRST(F)  = {(, d}
  FOLLOW(E)  = {$, )}
  FOLLOW(E') = {$, )}
  FOLLOW(T)  = {+, ), $}
  FOLLOW(T') = {+, ), $}
  FOLLOW(F)  = {*, +, ), $}
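
The FOLLOW sets in this solution can be checked mechanically. The Java sketch below takes the FIRST sets listed above as given and iterates the FOLLOW rules to a fixed point. The class name and data layout are assumptions made for this illustration, and ɛ is written as "e".

```java
import java.util.*;

class FollowSets {
    static final Map<String, List<List<String>>> G = new LinkedHashMap<>();
    static final Map<String, Set<String>> FIRST = new HashMap<>();
    static {
        // E -> T E' ; E' -> + T E' | e ; T -> F T' ; T' -> * F T' | e ; F -> ( E ) | d
        G.put("E",  Arrays.asList(Arrays.asList("T", "E'")));
        G.put("E'", Arrays.asList(Arrays.asList("+", "T", "E'"), Collections.<String>emptyList()));
        G.put("T",  Arrays.asList(Arrays.asList("F", "T'")));
        G.put("T'", Arrays.asList(Arrays.asList("*", "F", "T'"), Collections.<String>emptyList()));
        G.put("F",  Arrays.asList(Arrays.asList("(", "E", ")"), Arrays.asList("d")));
        // FIRST sets of the non-terminals, copied from the solution on the slide.
        FIRST.put("E",  new HashSet<>(Arrays.asList("(", "d")));
        FIRST.put("E'", new HashSet<>(Arrays.asList("e", "+")));
        FIRST.put("T",  new HashSet<>(Arrays.asList("(", "d")));
        FIRST.put("T'", new HashSet<>(Arrays.asList("e", "*")));
        FIRST.put("F",  new HashSet<>(Arrays.asList("(", "d")));
    }

    // FIRST of a single grammar symbol; a terminal's FIRST set is the terminal itself.
    static Set<String> firstOf(String sym) {
        return G.containsKey(sym) ? FIRST.get(sym) : Collections.singleton(sym);
    }

    static Map<String, Set<String>> compute(String start) {
        Map<String, Set<String>> follow = new TreeMap<>();
        for (String nt : G.keySet()) follow.put(nt, new TreeSet<>());
        follow.get(start).add("$");                       // rule for the start symbol
        boolean changed = true;
        while (changed) {                                 // repeat until no set grows
            changed = false;
            for (Map.Entry<String, List<List<String>>> entry : G.entrySet()) {
                String lhs = entry.getKey();
                for (List<String> alt : entry.getValue()) {
                    for (int i = 0; i < alt.size(); i++) {
                        String a = alt.get(i);
                        if (!G.containsKey(a)) continue;  // only non-terminals have FOLLOW sets
                        boolean restNullable = true;      // can everything after a derive e?
                        for (int j = i + 1; j < alt.size() && restNullable; j++) {
                            Set<String> f = firstOf(alt.get(j));
                            for (String t : f) {
                                if (!t.equals("e")) changed |= follow.get(a).add(t);
                            }
                            restNullable = f.contains("e");
                        }
                        if (restNullable) {
                            // Copy first, because a and lhs may be the same non-terminal.
                            changed |= follow.get(a).addAll(new ArrayList<>(follow.get(lhs)));
                        }
                    }
                }
            }
        }
        return follow;
    }

    public static void main(String[] args) {
        // Prints {E=[$, )], E'=[$, )], F=[$, ), *, +], T=[$, ), +], T'=[$, ), +]}
        System.out.println(compute("E"));
    }
}
```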

57 LL(1) Grammar
- L: scanning left to right
- L: leftmost derivation
- 1: one symbol of lookahead
- LL(2), ..., LL(k) means 2, ..., k lookahead symbols
- Most parsers have just one symbol of lookahead

58 LL(1) Grammar
Formally, a grammar is LL(1) if and only if, whenever A → α | β:
1. FIRST(α) ∩ FIRST(β) = ∅
2. At most one of α or β can derive ɛ
3. If β ⇒* ɛ, then α does not derive any string that starts with a terminal in FOLLOW(A)
All LL(1) grammars can be parsed by a recursive descent parser, and a predictive (non-backtracking) recursive descent parser with one symbol of lookahead can parse only LL(1) grammars

59 Common Prefixes
Recall the common prefix example:
  C → if E then S else S | if E then S
  FIRST(if E then S else S) = {if}
  FIRST(if E then S) = {if}
Thus the grammar is not LL(1), but the factored grammar is LL(1) (but ambiguous):
  C → if E then S X
  X → else S | ɛ

60 Left Recursion
Consider the grammar:
  E → E + d | d
  FIRST(E + d) = {d}
  FIRST(d) = {d}
Thus the grammar is not LL(1)
A recursive descent parser would succumb to infinite recursion

61 Parse Table from FIRST, FOLLOW
- If A → α and b ∈ FIRST(α), then parseTable[A][b] = A → α
- If X → α and ɛ ∈ FIRST(α), then for each b ∈ FOLLOW(X), parseTable[X][b] = X → α
- If more than one production matches a table entry, then the grammar is not LL(1)
  - In an LL(1) grammar, any two productions P_i, P_j for the same non-terminal satisfy FIRST(P_i) ∩ FIRST(P_j) = ∅

62 Parse Table for Example Grammar
Build an LL(1) parse table for our sample grammar:
  E  → T E'
  E' → + T E' | ɛ
  T  → F T'
  T' → * F T' | ɛ
  F  → (E) | d
FIRST and FOLLOW sets:
  FIRST(+) = {+}      FIRST(*) = {*}      FIRST(d) = {d}
  FIRST(() = {(}      FIRST()) = {)}
  FIRST(E)  = {(, d}
  FIRST(E') = {ɛ, +}
  FIRST(T)  = {(, d}
  FIRST(T') = {ɛ, *}
  FIRST(F)  = {(, d}
  FOLLOW(E)  = {$, )}
  FOLLOW(E') = {$, )}
  FOLLOW(T)  = {+, ), $}
  FOLLOW(T') = {+, ), $}
  FOLLOW(F)  = {*, +, ), $}

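63 Parse Table for Example Grammar
The solution (rows: symbol on top of the stack; columns: input symbol; blank cells are errors):
  Top of stack | d       | +         | *         | (       | )      | $
  E            | E → TE' |           |           | E → TE' |        |
  E'           |         | E' → +TE' |           |         | E' → ɛ | E' → ɛ
  T            | T → FT' |           |           | T → FT' |        |
  T'           |         | T' → ɛ    | T' → *FT' |         | T' → ɛ | T' → ɛ
  F            | F → d   |           |           | F → (E) |        |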

64 LL(1) Table-driven Parser
[Figure: the LL parser reads the input a1 a2 a3 ... an $, maintains a stack, consults the parse table, and produces output]

65 LL(1) Parsing Algorithm
    LL_Parser() {
        stack.push(S);                          // Push start symbol onto empty stack
        a ← scanner.getNextToken();             // Get next token
        while ( not stack.empty() ) {
            X ← stack.top();                    // Look at top of stack
            if ( X is a non-terminal and parseTable[X][a] = X → y1...yk ) {
                stack.pop();                    // Pop off top item
                stack.push(yk...y1);            // Push right side symbols on in reverse order
            } else if ( X = a ) {
                stack.pop();                    // Pop off top item
                a ← scanner.getNextToken();     // Get next token
            } else
                Error();                        // Illegal string
        }
    }
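
The algorithm can be tried out with the parse table for the example grammar hard-coded. The Java sketch below is an illustration only: the class name, the string-keyed table, the one-character tokens, and the '$' end marker pushed at the bottom of the stack are assumptions, not part of the slides.

```java
import java.util.*;

class LL1Driver {
    // Parse table for the example grammar; key = "nonTerminal lookahead",
    // value = right side as space-separated symbols ("" encodes the empty string).
    static final Map<String, String> TABLE = new HashMap<>();
    static final Set<String> NON_TERMINALS =
        new HashSet<>(Arrays.asList("E", "E'", "T", "T'", "F"));
    static {
        TABLE.put("E d", "T E'");    TABLE.put("E (", "T E'");
        TABLE.put("E' +", "+ T E'"); TABLE.put("E' )", "");  TABLE.put("E' $", "");
        TABLE.put("T d", "F T'");    TABLE.put("T (", "F T'");
        TABLE.put("T' +", "");       TABLE.put("T' *", "* F T'");
        TABLE.put("T' )", "");       TABLE.put("T' $", "");
        TABLE.put("F d", "d");       TABLE.put("F (", "( E )");
    }

    static boolean parse(String text) {
        String in = text.replace(" ", "") + "$";   // one character per token, '$' ends the input
        int pos = 0;
        Deque<String> stack = new ArrayDeque<>();
        stack.push("$");                           // bottom-of-stack marker
        stack.push("E");                           // start symbol
        while (!stack.isEmpty()) {
            String x = stack.peek();               // look at top of stack
            String a = String.valueOf(in.charAt(pos));
            if (NON_TERMINALS.contains(x)) {
                String rhs = TABLE.get(x + " " + a);
                if (rhs == null) return false;     // blank table cell: illegal string
                stack.pop();
                if (!rhs.isEmpty()) {
                    String[] syms = rhs.split(" ");
                    // Push the right side in reverse so its first symbol ends up on top.
                    for (int i = syms.length - 1; i >= 0; i--) stack.push(syms[i]);
                }
            } else if (x.equals(a)) {
                stack.pop();                       // terminal matches the lookahead
                pos++;
            } else {
                return false;                      // terminal mismatch: illegal string
            }
        }
        return pos == in.length();                 // all input, including '$', was consumed
    }

    public static void main(String[] args) {
        System.out.println(parse("d + d * d"));    // true
        System.out.println(parse("d + * d"));      // false
    }
}
```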

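66 Parsing Example
(The top of the stack is at the right; a blank rule means the terminal on top of the stack is matched against the input)
  Stack        | Input       | Rule
  $ E          | d + d * d $ | E → T E'
  $ E' T       | d + d * d $ | T → F T'
  $ E' T' F    | d + d * d $ | F → d
  $ E' T' d    | d + d * d $ |
  $ E' T'      | + d * d $   | T' → ɛ
  $ E'         | + d * d $   | E' → + T E'
  $ E' T +     | + d * d $   |
  $ E' T       | d * d $     | T → F T'
  $ E' T' F    | d * d $     | F → d
  $ E' T' d    | d * d $     |
  $ E' T'      | * d $       | T' → * F T'
  $ E' T' F *  | * d $       |
  $ E' T' F    | d $         | F → d
  $ E' T' d    | d $         |
  $ E' T'      | $           | T' → ɛ
  $ E'         | $           | E' → ɛ
  $            | $           | Accept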

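67 Another Parsing Example
(The top of the stack is at the right; a blank rule means the terminal on top of the stack is matched against the input)
  Stack               | Input         | Rule
  $ E                 | (d + d) * d $ | E → T E'
  $ E' T              | (d + d) * d $ | T → F T'
  $ E' T' F           | (d + d) * d $ | F → (E)
  $ E' T' ) E (       | (d + d) * d $ |
  $ E' T' ) E         | d + d) * d $  | E → T E'
  $ E' T' ) E' T      | d + d) * d $  | T → F T'
  $ E' T' ) E' T' F   | d + d) * d $  | F → d
  $ E' T' ) E' T' d   | d + d) * d $  |
  $ E' T' ) E' T'     | + d) * d $    | T' → ɛ
  $ E' T' ) E'        | + d) * d $    | E' → + T E'
  $ E' T' ) E' T +    | + d) * d $    |
  $ E' T' ) E' T      | d) * d $      | T → F T'
  $ E' T' ) E' T' F   | d) * d $      | F → d
  $ E' T' ) E' T' d   | d) * d $      |
  $ E' T' ) E' T'     | ) * d $       | T' → ɛ
  $ E' T' ) E'        | ) * d $       | E' → ɛ
  $ E' T' )           | ) * d $       |
  $ E' T'             | * d $         | T' → * F T'
  $ E' T' F *         | * d $         |
  $ E' T' F           | d $           | F → d
  $ E' T' d           | d $           |
  $ E' T'             | $             | T' → ɛ
  $ E'                | $             | E' → ɛ
  $                   | $             | Accept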

68 Try a Non-LL(1) Grammar
  E → E + id | id
Observe FIRST(E + id) = FIRST(id) = {id}
- Recursive descent parser: infinite recursion
- Parse table (the entry for E on input id holds two productions):
  Top of stack | id                   | $
  E            | E → id, E → E + id   |

69 Top-down Parsing Summary
To produce a top-down parser:
1. Eliminate left recursion and common prefixes; the goal is an LL(1) grammar, although these rewrites do not guarantee one
2. Find the FIRST and FOLLOW sets
3. Build either the recursive descent parser methods or the parsing table

70 Limitations of LL(1) Grammars
- In many cases a grammar G1 can easily be devised to represent the strings in a language L(G1), but G1 is not LL(1)
- Sometimes G1 can be rewritten to form G2, where L(G1) = L(G2) and G2 is LL(1)
- Some context-free languages have no LL(1) grammars

71 Bottom-up Parsing
- Grows the parse tree from the leaves up
- Only two choices when scanning input
  - shift a symbol onto the stack
  - reduce
- The parser reduces in the reverse order of a rightmost derivation
- Bottom-up parsers are more powerful than top-down parsers
  - They can be used to parse a larger variety of grammars

72 Reduction
  E → E + E | E * E | (E) | id
Rightmost derivation:
  E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id
The parser produces this rightmost derivation in reverse, reducing id + id * id back to E

73 Handles
- A handle of a string is a substring that matches the right side of a production and whose reduction to the non-terminal on the left side represents one step along the reverse of a rightmost derivation
- For unambiguous grammars, every right-sentential form has a unique handle

74 Handle
More formally:
- A handle of a right-sentential form γ is a production A → β and a position in γ where β can be found
- If (A → β, k) is a handle, then replacing β in γ at position k with A produces the previous right-sentential form in a rightmost derivation of γ
- The substring to the right of a handle contains only terminal symbols

75 Handle Pruning
- Begin with the string to parse
- Find a handle and replace it with the left side of the production that produces that handle
- Repeat until only the start symbol remains

76 Handle Pruning Example
  E → E + T | T
  T → T * F | F
  F → d
  Sentential Form | Handle
  d + d * d       | (F → d, 1)
  F + d * d       | (T → F, 1)
  T + d * d       | (E → T, 1)
  E + d * d       | (F → d, 3)
  E + F * d       | (T → F, 3)
  E + T * d       | (F → d, 5)
  E + T * F       | (T → T * F, 3)
  E + T           | (E → E + T, 1)
  E               |
Observe that this is a rightmost derivation in reverse

77 Shift-Reduce Parsing
Two problems to solve
- Find the substring to be reduced in a right-sentential form
- Determine which production to choose in case more than one production has that substring on its right side

78 Overview of Process
- The stack contains states and grammar symbols
- The grammar symbols on the stack represent a viable prefix
[Figure: the LR parser with its stack, the input a1 a2 a3 ... an $, and a parse table made up of Action and Goto parts]

79 Parse Table
- Action: shift or reduce
- Goto: next state
[Figure: the LR parser with its stack, the input a1 a2 a3 ... an $, and a parse table made up of Action and Goto parts]

80 Parse Table Actions
- Shift
  - Pushes the input symbol and a state on to the stack
- Reduce
  - Replaces a string of symbols on the stack with a non-terminal
- Symbols on the stack can be either terminals or non-terminals
[Figure: the LR parser with its stack, the input a1 a2 a3 ... an $, and a parse table made up of Action and Goto parts]

81 Shift-Reduce Parsing
- Stack holds grammar symbols
  - $ indicates the bottom of the stack
- Input buffer holds the string to be parsed
  - $ indicates the end of the string
- Parser activity
  - Shifts zero or more input symbols onto the stack until a handle β is on the top of the stack
  - β is then reduced to the left side of a production

82 Shift-Reduce Parsing
- Initial parser state
  - Stack: $    Input: w$
  - (Stack grows to the right; the string is consumed from left to right)
- Final parser state (if no errors)
  - Stack: $S   Input: $
- Parser actions
  - Shift: move the next input symbol to the top of the stack
  - Reduce: replace the handle on top of the stack with a non-terminal
  - Accept: when the string is consumed and S is on the stack
  - Error: when the string cannot be parsed

83 Viable Prefix
A prefix of a right-sentential form that can appear on the stack of a shift-reduce parser

84 Types of Bottom-up Parsers
- SLR
  - Simple LR
  - Built from LR(0) items, which carry no lookahead
- LR
  - LR(1), more powerful, but requires a lot of memory
- LALR
  - Look-ahead LR
  - Yacc is LALR(1)

85 SLR
- We'll concentrate on SLR since it is the simplest form
- To construct an SLR parse table we need items
  - An item consists of a production and a numeric position within that production
  - An item encodes where you are in a production

86 Expression Grammar
  E → E + E | E * E | (E) | id
compare to
  E → E + T | T
  T → T * F | F
  F → (E) | id

87 Canonical LR(0) States
1. Augment the grammar by adding a new production S' → S
2. The closure operation sets up states
3. The goto operation computes transitions between states

88 LR(0) Items
An LR(0) item of a grammar G is a production of G with a dot (·) at some position of the right side.
Example: four items can be derived from the production A → XYZ
  A → ·XYZ
  A → X·YZ
  A → XY·Z
  A → XYZ·

89 Interpreting LR(0) Items
- An item indicates how much of a production we have seen at a given point in the parsing process
- The item [A → X·YZ] means we have seen a string derivable from X and hope to see a string derivable from YZ

90 Closure Algorithm
    ItemSet closure(ItemSet I) {
        J ← I;
        do {
            J_old ← J;
            for each item [A → α·Bβ] ∈ J and each production B → γ ∈ G do {   // B is a non-terminal
                J ← J ∪ {[B → ·γ]};
            }
        } while ( J ≠ J_old );
        return J;
    }
If one B-production is added to the closure with a dot on the left end, then all B-productions will be added to the closure
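
The closure loop translates almost directly into Java. In the sketch below an item is just a string such as "E -> E + . T", with the dot written as a separate "." symbol. That encoding, the class name, and the hard-coded expression grammar are assumptions made for this illustration.

```java
import java.util.*;

class Lr0Closure {
    // Grammar: E -> E + T | T ; T -> T * F | F ; F -> ( E ) | id
    // Each row lists the left side followed by the right-side symbols.
    static final String[][] PRODUCTIONS = {
        {"E", "E", "+", "T"}, {"E", "T"},
        {"T", "T", "*", "F"}, {"T", "F"},
        {"F", "(", "E", ")"}, {"F", "id"}
    };

    // An item is written "A -> x . y": left side, arrow, and right side with a dot symbol.
    static Set<String> closure(Set<String> items) {
        Set<String> j = new LinkedHashSet<>(items);
        boolean changed = true;
        while (changed) {                                  // repeat until J stops growing
            changed = false;
            for (String item : new ArrayList<>(j)) {       // iterate over a snapshot of J
                String[] parts = item.split(" ");
                int dot = Arrays.asList(parts).indexOf(".");
                if (dot + 1 >= parts.length) continue;     // dot is at the right end
                String b = parts[dot + 1];                 // symbol immediately after the dot
                for (String[] p : PRODUCTIONS) {
                    if (!p[0].equals(b)) continue;         // only B-productions qualify
                    StringBuilder sb = new StringBuilder(b + " -> .");
                    for (int k = 1; k < p.length; k++) sb.append(' ').append(p[k]);
                    changed |= j.add(sb.toString());       // add [B -> . gamma]
                }
            }
        }
        return j;
    }

    public static void main(String[] args) {
        Set<String> start = new LinkedHashSet<>(Arrays.asList("E -> E + . T"));
        for (String item : closure(start)) {
            System.out.println(item);
        }
    }
}
```

Running it on the single item [E → E + ·T] adds the two T-productions and then the two F-productions, each with the dot at the left end.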

91 Closure
closure([E → E + ·T]) =
  E → E + ·T
  T → ·T * F
  T → ·F
  F → ·(E)
  F → ·id

92 goto Function
goto(I, X)
- I is a set of items (really just a state)
- X is a grammar symbol
- goto(I, X) is defined as the closure of the set of all items [A → αX·β] such that [A → α·Xβ] is in I
- Intuitively, if I is the set of items valid for a viable prefix γ, then goto(I, X) is the set of items valid for the viable prefix γX

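93 LR(0) Item Sets
[Figure: transition diagram of the LR(0) item sets for the augmented grammar E' → E, E → E + T | T, T → T * F | F, F → (E) | d]
Item sets:
  I0:  E' → ·E,  E → ·E + T,  E → ·T,  T → ·T * F,  T → ·F,  F → ·(E),  F → ·d
  I1:  E' → E·,  E → E· + T
  I2:  E → T·,  T → T· * F
  I3:  T → F·
  I4:  F → (·E),  E → ·E + T,  E → ·T,  T → ·T * F,  T → ·F,  F → ·(E),  F → ·d
  I5:  F → d·
  I6:  F → (E·),  E → E· + T
  I7:  F → (E)·
  I8:  E → E + ·T,  T → ·T * F,  T → ·F,  F → ·(E),  F → ·d
  I9:  T → T * ·F,  F → ·(E),  F → ·d
  I10: T → T * F·
  I11: E → E + T·,  T → T· * F
Transitions:
  I0:  E → I1, T → I2, F → I3, ( → I4, d → I5
  I1:  + → I8
  I2:  * → I9
  I4:  E → I6, T → I2, F → I3, ( → I4, d → I5
  I6:  ) → I7, + → I8
  I8:  T → I11, F → I3, ( → I4, d → I5
  I9:  F → I10, ( → I4, d → I5
  I11: * → I9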

94 Set-of-Items Construction
    SetOfItems items(Grammar G') {                 // G' is the augmented grammar
        C ← { closure({[S' → ·S]}) };
        do {
            C_old ← C;
            for each set of items I ∈ C and each grammar symbol X
                    such that goto(I, X) is not empty do {
                C ← C ∪ { goto(I, X) };
            }
        } while ( C ≠ C_old );
        return C;
    }

95 SLR Parse Table Construction
    BuildSLRParser(Grammar G') {                   // G' is the augmented grammar
        Initialize all the entries in the goto and action tables to "error";
        C ← items(G');                             // C = {I0, I1, ..., In}
        for each item set Ii ∈ C do {
            if [A → α·aβ] ∈ Ii and goto(Ii, a) = Ij        // a is a terminal
                action[i][a] ← "shift j";
            if [A → α·] ∈ Ii and A ≠ S'
                for all a ∈ FOLLOW(A) do
                    action[i][a] ← "reduce A → α";
            if [S' → S·] ∈ Ii
                action[i][$] ← "accept";
            for each non-terminal A ∈ G' do
                if goto(Ii, A) = Ij
                    goto[i][A] ← j;
        }
        The initial state of the parser is i where [S' → ·S] ∈ Ii;
    }

96 SLR Parsing Example
  FOLLOW(E) = {$, +, )}
  FOLLOW(T) = {$, +, *, )}
  FOLLOW(F) = {$, +, *, )}

97 SLR Parse Table
(sN = shift and go to state N; r P = reduce by production P; blank cells are errors)
        | Action                                                          | Goto
  State | d  | +         | *         | (  | )         | $         | E | T  | F
  0     | s5 |           |           | s4 |           |           | 1 | 2  | 3
  1     |    | s8        |           |    |           | accept    |   |    |
  2     |    | r E→T     | s9        |    | r E→T     | r E→T     |   |    |
  3     |    | r T→F     | r T→F     |    | r T→F     | r T→F     |   |    |
  4     | s5 |           |           | s4 |           |           | 6 | 2  | 3
  5     |    | r F→d     | r F→d     |    | r F→d     | r F→d     |   |    |
  6     |    | s8        |           |    | s7        |           |   |    |
  7     |    | r F→(E)   | r F→(E)   |    | r F→(E)   | r F→(E)   |   |    |
  8     | s5 |           |           | s4 |           |           |   | 11 | 3
  9     | s5 |           |           | s4 |           |           |   |    | 10
  10    |    | r T→T*F   | r T→T*F   |    | r T→T*F   | r T→T*F   |   |    |
  11    |    | r E→E+T   | s9        |    | r E→E+T   | r E→E+T   |   |    |

98 LR Parsing Algorithm
    LR_Parser() {
        stack.push(s0);                         // Push initial state onto empty stack
        done ← false;
        a ← scanner.getNextToken();             // Get next token
        while ( not done ) {
            s ← stack.top();                    // Look at state on top of stack
            if ( action[s][a] = shift s' ) {
                stack.push(a);
                stack.push(s');
                a ← scanner.getNextToken();
            } else if ( action[s][a] = reduce A → β ) {
                stack.pop 2·|β| symbols;        // Pop off some symbols
                s' ← stack.top();
                stack.push(A);
                stack.push(goto[s'][A]);
            } else if ( action[s][a] = accept ) {
                done ← true;
            } else {
                Error();                        // Illegal string
            }
        }
    }
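
A small Java driver makes the loop concrete. The sketch below hard-codes the SLR table for the grammar E → E + T | T, T → T * F | F, F → (E) | d. Several details are assumptions made for this illustration rather than part of the slides: the production numbering (1 to 6), the string-keyed tables, one-character tokens, and a stack that holds states only, so a reduce pops |β| entries instead of 2·|β|.

```java
import java.util.*;

class SlrDriver {
    // Productions 1..6: E->E+T, E->T, T->T*F, T->F, F->(E), F->d (numbering is ours).
    static final String[] LHS = {"", "E", "E", "T", "T", "F", "F"};
    static final int[] RHS_LEN = {0, 3, 1, 3, 1, 3, 1};
    static final Map<String, String> ACTION = new HashMap<>();
    static final Map<String, Integer> GOTO = new HashMap<>();

    // Fills one action row from (symbol, action) pairs; "s5" = shift 5, "r2" = reduce by 2.
    static void row(int state, String... pairs) {
        for (int i = 0; i < pairs.length; i += 2) ACTION.put(state + pairs[i], pairs[i + 1]);
    }

    static {
        row(0, "d", "s5", "(", "s4");
        row(1, "+", "s8", "$", "acc");
        row(2, "+", "r2", "*", "s9", ")", "r2", "$", "r2");
        row(3, "+", "r4", "*", "r4", ")", "r4", "$", "r4");
        row(4, "d", "s5", "(", "s4");
        row(5, "+", "r6", "*", "r6", ")", "r6", "$", "r6");
        row(6, "+", "s8", ")", "s7");
        row(7, "+", "r5", "*", "r5", ")", "r5", "$", "r5");
        row(8, "d", "s5", "(", "s4");
        row(9, "d", "s5", "(", "s4");
        row(10, "+", "r3", "*", "r3", ")", "r3", "$", "r3");
        row(11, "+", "r1", "*", "s9", ")", "r1", "$", "r1");
        GOTO.put("0E", 1); GOTO.put("0T", 2);  GOTO.put("0F", 3);
        GOTO.put("4E", 6); GOTO.put("4T", 2);  GOTO.put("4F", 3);
        GOTO.put("8T", 11); GOTO.put("8F", 3); GOTO.put("9F", 10);
    }

    static boolean parse(String text) {
        String in = text.replace(" ", "") + "$";       // one character per token
        int pos = 0;
        Deque<Integer> states = new ArrayDeque<>();    // states only; symbols are implied
        states.push(0);                                // initial state
        while (true) {
            String a = String.valueOf(in.charAt(pos));
            String act = ACTION.get(states.peek() + a);
            if (act == null) return false;             // error entry: illegal string
            if (act.equals("acc")) return pos == in.length() - 1;
            int n = Integer.parseInt(act.substring(1));
            if (act.charAt(0) == 's') {                // shift: push state n, advance input
                states.push(n);
                pos++;
            } else {                                   // reduce by production n
                for (int i = 0; i < RHS_LEN[n]; i++) states.pop();
                states.push(GOTO.get(states.peek() + LHS[n]));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("(d + d) * d"));      // true
        System.out.println(parse("d + * d"));          // false
    }
}
```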

99 Parsing Example
  Stack                      | Input         | Action
  $ S0                       | (d + d) * d $ | Shift 4
  $ S0 ( 4                   | d + d) * d $  | Shift 5
  $ S0 ( 4 d 5               | + d) * d $    | Reduce F → d
  $ S0 ( 4 F 3               | + d) * d $    | Reduce T → F
  $ S0 ( 4 T 2               | + d) * d $    | Reduce E → T
  $ S0 ( 4 E 6               | + d) * d $    | Shift 8
  $ S0 ( 4 E 6 + 8           | d) * d $      | Shift 5
  $ S0 ( 4 E 6 + 8 d 5       | ) * d $       | Reduce F → d
  $ S0 ( 4 E 6 + 8 F 3       | ) * d $       | Reduce T → F
  $ S0 ( 4 E 6 + 8 T 11      | ) * d $       | Reduce E → E + T
  $ S0 ( 4 E 6               | ) * d $       | Shift 7
  $ S0 ( 4 E 6 ) 7           | * d $         | Reduce F → (E)
  $ S0 F 3                   | * d $         | Reduce T → F
  $ S0 T 2                   | * d $         | Shift 9
  $ S0 T 2 * 9               | d $           | Shift 5
  $ S0 T 2 * 9 d 5           | $             | Reduce F → d
  $ S0 T 2 * 9 F 10          | $             | Reduce T → T * F
  $ S0 T 2                   | $             | Reduce E → T
  $ S0 E 1                   | $             | Accept

100 Comparing Grammars
- LR(1) grammars describe languages that are a proper superset of the languages represented by LL(1) grammars
- LR(1) is more powerful than LALR(1)
- LALR(1) is more efficient than LR(1)
- For a language like C:
  - An LR(1) parser has thousands of states
  - An LALR(1) parser has hundreds of states

101 Comparing Context-free Grammars
[Figure: diagram relating the grammar classes LL(1), SLR(1), LALR(1), LR(1), LR(k), and CFGs]

102 Chomsky's Grammar Hierarchy
Consider productions of the form α → β
  Type   | Name              | Criteria        | Recognizer
  Type 3 | Regular           | A → a, A → aB   | Finite automaton
  Type 2 | Context-free      | A → α           | Push-down automaton
  Type 1 | Context-sensitive | |α| ≤ |β|       | Linear bounded automaton
  Type 0 | Unrestricted      | α ≠ ɛ           | Turing machine

103 Grammar Hierarchy
[Figure: nested regions, from innermost to outermost: Regular (Type 3), Context free (Type 2), Context sensitive (Type 1), Unrestricted (Type 0)]

104 Error Handling
- Compilers cannot assume that they will only process syntactically correct programs
- Language specifications do not usually describe how the compiler should respond to syntactic errors
- Review of types of errors
  - Lexical
  - Syntactic
  - Semantic
  - Logical

105 Syntactic Errors
What should be done when the stream of tokens coming from the lexer disobeys the grammatical rules of the language?

106 Goals
- Errors should be reported clearly and accurately
- Some error recovery should be performed so subsequent errors can be detected
- The error detection and reporting mechanism should not significantly slow down the processing of correct programs

107 Issues
- Sometimes an error exists many lines before it is detected
- The types of errors depend on the programming language used
  - See Example 4.1 in the dragon book

108 Error Handling
- Report the location of the detected error
  - At least the line number
  - Possibly the position within that line
  - Report the problem
- Recovery
  - A poor job may produce many spurious errors
  - One strategy: skip the bad tokens, then require a number of good tokens to be parsed successfully before any subsequent errors are reported

109 Error Recovery Strategies (1)
Panic-mode
- Discard tokens until some synchronizing token is detected
- Advantages
  - Simple to implement
  - Won't enter an infinite loop

110 Error Recovery Strategies (2)
Phrase-level
- Perform a local correction on the remaining input (e.g., replace a comma by a semicolon) to allow the parser to continue
- Used first with top-down parsers
- Has difficulty coping with errors that occur before the point of detection

111 Error Recovery Strategies (3)
Error productions
- Augment the grammar with special error rules
- Very useful if certain erroneous constructs are anticipated
- Yacc supports error productions

112 Error Recovery Strategies (4): Global correction. Finds the minimal number of corrections required to turn the erroneous input into one with a valid parse tree. Interesting from a theoretical point of view, but not very practical. The corrected parse tree may not be what the programmer intended!

113 Yacc/Bison. A program used to generate LALR(1) parsers. Developed by S.C. Johnson. YACC stands for "Yet Another Compiler-Compiler". As with Lex, originally for C under Unix, but other platforms are supported. Yacc-generated C code can be linked with Lex-generated C code for a ready-made lexer/parser combination. GNU Bison is the modern version that we will use. We'll just call it Yacc, though.

114 Yacc Specification
    %{
        C/C++ declarations
    %}
        Yacc declarations
    %%
        Rules
    %%
        Programmer functions

115 Yacc Specification (2)
    %{
        C/C++ declarations
    %}
        Yacc declarations
    %%
        Rules
    %%
        Programmer functions
1. C/C++ macros and declarations are placed in the C/C++ declarations section.
2. Yacc token declarations and precedence assignments are placed in the Yacc declarations section.
3. Code to execute when productions are matched is placed in the rules section.
4. Arbitrary C/C++ code is placed in the programmer functions section; functions named yylex() (normally produced by Lex) and yyerror() must be available.

116 Yacc Rules. Each rule consists of a grammar production and an associated action. The Yacc syntax for the rule A → Bx | C is
    A : B x { $$ = new ANode($1, "x"); cout << "Matched A -> Bx" << endl; }
      | C   { $$ = new ANode($1); cout << "Matched A -> C" << endl; }
      ;

117 Yacc Rules. For the rule A → Bx | C:
    A : B x { $$ = new ANode($1, "x"); cout << "Matched A -> Bx" << endl; }
      | C   { $$ = new ANode($1); cout << "Matched A -> C" << endl; }
      ;
The $$ metasymbol represents the value to be returned by the parser when the production is matched; it represents the left-side non-terminal (A in this case). The $1, $2, etc. metasymbols represent the values of the grammar symbols matched on the right side of the production. Since the parser works from the bottom up, the right-side symbols will have already been matched and their values will be available.
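One way to picture why the right-side values are already available is a value stack that runs alongside the parse stack. The C sketch below is a simplified model, not the code Yacc generates: ValueStack, push_value() and reduce_add() are hypothetical names, values are plain ints, there are no bounds checks, and the rule being reduced is assumed to be E : E '+' E { $$ = $1 + $3; }.

```c
#define STACK_MAX 64

/* Hypothetical value stack holding one int per matched grammar symbol. */
typedef struct {
    int values[STACK_MAX];
    int top;               /* number of values currently on the stack */
} ValueStack;

/* Shifting a symbol pushes its semantic value. */
void push_value(ValueStack *s, int v) {
    s->values[s->top++] = v;
}

/*
 * Reduce by E : E '+' E { $$ = $1 + $3; }.
 * The three right-side values are the top three stack entries:
 * $1 at top-3, $2 at top-2, $3 at top-1. They are popped and $$ is pushed.
 */
void reduce_add(ValueStack *s) {
    int d1 = s->values[s->top - 3];   /* $1: left operand  */
    int d3 = s->values[s->top - 1];   /* $3: right operand */
    s->top -= 3;                      /* pop the right side */
    push_value(s, d1 + d3);           /* push $$ for the left side */
}
```

After pushing the values for 2, '+' and 5 and calling reduce_add(), the stack holds the single value 7, which then serves as a $n value for whichever production is reduced next.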

118 Example Yacc Specification
    %{
    /* C/C++ declarations */
    #include <ctype.h>
    #include <stdio.h>   /* printf, feof */
    int yylex();
    void yyerror(char *);
    %}
    /* Yacc declarations */
    %union {
        int value;
        int symbol;
    }
    %type <value> S E I
    %token <symbol> digit
    %left '+'
    %left '*'
    %%
    /* Rules */
    S : E             { printf("%d\n", $1); }
      | /* epsilon */ {}
      ;
    E : E '+' E       { $$ = $1 + $3; }
      | E '*' E       { $$ = $1 * $3; }
      | '(' E ')'     { $$ = $2; }
      | I             { $$ = $1; }
      ;
    I : I digit       { $$ = 10 * $1 + ($2 - '0'); }
      | digit         { $$ = $1 - '0'; }
      ;
    %%
    /* C/C++ code */
    int main() {
        while (!feof(stdin)) {
            yyparse();
        }
        return 0;
    }
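The specification above declares yylex() and yyerror() but does not define them. A hand-written pair might look like the C sketch below. Several things here are assumptions made so the sketch compiles on its own: the token code 258 for digit, the local YYSTYPE and yylval definitions (in a real build these come from the generated parser), the lex_input string buffer used in place of stdin, and the decisions to skip blanks and to treat a newline as end of input.

```c
#include <ctype.h>
#include <stdio.h>

/* Stand-ins for definitions the generated parser would provide.
   The value 258 is an arbitrary code above the character range. */
enum { digit = 258 };
typedef union { int value; int symbol; } YYSTYPE;
YYSTYPE yylval;

/* Hypothetical input buffer; a real lexer would read from stdin. */
const char *lex_input = "";

/* Return the next token code, or 0 at end of input. */
int yylex(void) {
    while (*lex_input == ' ' || *lex_input == '\t') {
        lex_input++;                  /* skip blanks between tokens */
    }
    int c = (unsigned char)*lex_input;
    if (c == '\0') {
        return 0;                     /* buffer exhausted */
    }
    lex_input++;
    if (c == '\n') {
        return 0;                     /* assumption: one expression per line */
    }
    if (isdigit(c)) {
        yylval.symbol = c;            /* the rules compute $1 - '0' from this */
        return digit;
    }
    return c;                         /* '+', '*', '(' and ')' are their own codes */
}

/* Report a syntax error detected by the parser. */
void yyerror(char *msg) {
    fprintf(stderr, "%s\n", msg);
}
```

Returning the character itself for operators and parentheses matches the quoted literals in the rules section, so only digit needs a named token.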

119 Yacc Specification to Parser. [Figure: prog.y (declarations, %%, production rules, %%, C procedures such as main() { yyparse(); }) is translated into y.tab.c, which contains yyparse(), the DFA and the parse table.]

120 Build Process. [Figure: yacc translates prog.y (declarations, %%, production rules, %%, C procedures such as main() { yyparse(); }) into y.tab.c; gcc compiles y.tab.c into the executable prog.]
    yacc prog.y
    gcc -o prog y.tab.c

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino 3. Syntax Analysis Andrea Polini Formal Languages and Compilers Master in Computer Science University of Camerino (Formal Languages and Compilers) 3. Syntax Analysis CS@UNICAM 1 / 54 Syntax Analysis: the

More information

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing Roadmap > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing The role of the parser > performs context-free syntax analysis > guides

More information

Context-free grammars

Context-free grammars Context-free grammars Section 4.2 Formal way of specifying rules about the structure/syntax of a program terminals - tokens non-terminals - represent higher-level structures of a program start symbol,

More information

Compiler Construction: Parsing

Compiler Construction: Parsing Compiler Construction: Parsing Mandar Mitra Indian Statistical Institute M. Mitra (ISI) Parsing 1 / 33 Context-free grammars. Reference: Section 4.2 Formal way of specifying rules about the structure/syntax

More information

3. Parsing. Oscar Nierstrasz

3. Parsing. Oscar Nierstrasz 3. Parsing Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/ http://www.cs.purdue.edu/homes/hosking/

More information

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S. Bottom up parsing Construct a parse tree for an input string beginning at leaves and going towards root OR Reduce a string w of input to start symbol of grammar Consider a grammar S aabe A Abc b B d And

More information

Concepts Introduced in Chapter 4

Concepts Introduced in Chapter 4 Concepts Introduced in Chapter 4 Grammars Context-Free Grammars Derivations and Parse Trees Ambiguity, Precedence, and Associativity Top Down Parsing Recursive Descent, LL Bottom Up Parsing SLR, LR, LALR

More information

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence. Bottom-up parsing Recall For a grammar G, with start symbol S, any string α such that S α is a sentential form If α V t, then α is a sentence in L(G) A left-sentential form is a sentential form that occurs

More information

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form Bottom-up parsing Bottom-up parsing Recall Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form If α V t,thenα is called a sentence in L(G) Otherwise it is just

More information

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous. Section A 1. What do you meant by parser and its types? A parser for grammar G is a program that takes as input a string w and produces as output either a parse tree for w, if w is a sentence of G, or

More information

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38 Syntax Analysis Martin Sulzmann Martin Sulzmann Syntax Analysis 1 / 38 Syntax Analysis Objective Recognize individual tokens as sentences of a language (beyond regular languages). Example 1 (OK) Program

More information

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309 PART 3 - SYNTAX ANALYSIS F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 64 / 309 Goals Definition of the syntax of a programming language using context free grammars Methods for parsing

More information

Syntactic Analysis. Top-Down Parsing

Syntactic Analysis. Top-Down Parsing Syntactic Analysis Top-Down Parsing Copyright 2017, Pedro C. Diniz, all rights reserved. Students enrolled in Compilers class at University of Southern California (USC) have explicit permission to make

More information

Monday, September 13, Parsers

Monday, September 13, Parsers Parsers Agenda Terminology LL(1) Parsers Overview of LR Parsing Terminology Grammar G = (Vt, Vn, S, P) Vt is the set of terminals Vn is the set of non-terminals S is the start symbol P is the set of productions

More information

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing. LR Parsing Compiler Design CSE 504 1 Shift-Reduce Parsing 2 LR Parsers 3 SLR and LR(1) Parsers Last modifled: Fri Mar 06 2015 at 13:50:06 EST Version: 1.7 16:58:46 2016/01/29 Compiled at 12:57 on 2016/02/26

More information

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Formal Languages and Compilers Lecture VII Part 3: Syntactic A Formal Languages and Compilers Lecture VII Part 3: Syntactic Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/

More information

Top down vs. bottom up parsing

Top down vs. bottom up parsing Parsing A grammar describes the strings that are syntactically legal A recogniser simply accepts or rejects strings A generator produces sentences in the language described by the grammar A parser constructs

More information

CA Compiler Construction

CA Compiler Construction CA4003 - Compiler Construction David Sinclair A top-down parser starts with the root of the parse tree, labelled with the goal symbol of the grammar, and repeats the following steps until the fringe of

More information

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017 Compilerconstructie najaar 2017 http://www.liacs.leidenuniv.nl/~vlietrvan1/coco/ Rudy van Vliet kamer 140 Snellius, tel. 071-527 2876 rvvliet(at)liacs(dot)nl college 3, vrijdag 22 september 2017 + werkcollege

More information

SYNTAX ANALYSIS 1. Define parser. Hierarchical analysis is one in which the tokens are grouped hierarchically into nested collections with collective meaning. Also termed as Parsing. 2. Mention the basic

More information

Wednesday, September 9, 15. Parsers

Wednesday, September 9, 15. Parsers Parsers What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda

More information

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs: What is a parser Parsers A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda

More information

UNIT-III BOTTOM-UP PARSING

UNIT-III BOTTOM-UP PARSING UNIT-III BOTTOM-UP PARSING Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing. A general type of bottom-up parser is a shift-reduce

More information

LR Parsing Techniques

LR Parsing Techniques LR Parsing Techniques Introduction Bottom-Up Parsing LR Parsing as Handle Pruning Shift-Reduce Parser LR(k) Parsing Model Parsing Table Construction: SLR, LR, LALR 1 Bottom-UP Parsing A bottom-up parser

More information

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468 Parsers Xiaokang Qiu Purdue University ECE 468 August 31, 2018 What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure

More information

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing 8 Parsing Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces strings A parser constructs a parse tree for a string

More information

Wednesday, August 31, Parsers

Wednesday, August 31, Parsers Parsers How do we combine tokens? Combine tokens ( words in a language) to form programs ( sentences in a language) Not all combinations of tokens are correct programs (not all sentences are grammatically

More information

Syntax Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Syntax Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Syntax Analysis (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay September 2007 College of Engineering, Pune Syntax Analysis: 2/124 Syntax

More information

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals. Bottom-up Parsing: Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals. Basic operation is to shift terminals from the input to the

More information

WWW.STUDENTSFOCUS.COM UNIT -3 SYNTAX ANALYSIS 3.1 ROLE OF THE PARSER Parser obtains a string of tokens from the lexical analyzer and verifies that it can be generated by the language for the source program.

More information

Syntax Analysis Part I

Syntax Analysis Part I Syntax Analysis Part I Chapter 4: Context-Free Grammars Slides adapted from : Robert van Engelen, Florida State University Position of a Parser in the Compiler Model Source Program Lexical Analyzer Token,

More information

S Y N T A X A N A L Y S I S LR

S Y N T A X A N A L Y S I S LR LR parsing There are three commonly used algorithms to build tables for an LR parser: 1. SLR(1) = LR(0) plus use of FOLLOW set to select between actions smallest class of grammars smallest tables (number

More information

Table-Driven Parsing

Table-Driven Parsing Table-Driven Parsing It is possible to build a non-recursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls [1] The non-recursive parser looks up the production

More information

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill Syntax Analysis Björn B. Brandenburg The University of North Carolina at Chapel Hill Based on slides and notes by S. Olivier, A. Block, N. Fisher, F. Hernandez-Campos, and D. Stotts. The Big Picture Character

More information

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011 Syntax Analysis COMP 524: Programming Languages Srinivas Krishnan January 25, 2011 Based in part on slides and notes by Bjoern Brandenburg, S. Olivier and A. Block. 1 The Big Picture Character Stream Token

More information

Syn S t yn a t x a Ana x lysi y s si 1

Syn S t yn a t x a Ana x lysi y s si 1 Syntax Analysis 1 Position of a Parser in the Compiler Model Source Program Lexical Analyzer Token, tokenval Get next token Parser and rest of front-end Intermediate representation Lexical error Syntax

More information

CSE302: Compiler Design

CSE302: Compiler Design CSE302: Compiler Design Instructor: Dr. Liang Cheng Department of Computer Science and Engineering P.C. Rossin College of Engineering & Applied Science Lehigh University February 20, 2007 Outline Recap

More information

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam Compilers Parsing Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Next step text chars Lexical analyzer tokens Parser IR Errors Parsing: Organize tokens into sentences Do tokens conform

More information

Compiler Construction 2016/2017 Syntax Analysis

Compiler Construction 2016/2017 Syntax Analysis Compiler Construction 2016/2017 Syntax Analysis Peter Thiemann November 2, 2016 Outline 1 Syntax Analysis Recursive top-down parsing Nonrecursive top-down parsing Bottom-up parsing Syntax Analysis tokens

More information

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing Parsing Wrapup Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing LR(1) items Computing closure Computing goto LR(1) canonical collection This lecture LR(1) parsing Building ACTION

More information

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1) TD parsing - LL(1) Parsing First and Follow sets Parse table construction BU Parsing Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1) Problems with SLR Aho, Sethi, Ullman, Compilers

More information

4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis 4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal

More information

Types of parsing. CMSC 430 Lecture 4, Page 1

Types of parsing. CMSC 430 Lecture 4, Page 1 Types of parsing Top-down parsers start at the root of derivation tree and fill in picks a production and tries to match the input may require backtracking some grammars are backtrack-free (predictive)

More information

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root Parser Generation Main Problem: given a grammar G, how to build a top-down parser or a bottom-up parser for it? parser : a program that, given a sentence, reconstructs a derivation for that sentence ----

More information

Principles of Programming Languages

Principles of Programming Languages Principles of Programming Languages h"p://www.di.unipi.it/~andrea/dida2ca/plp- 14/ Prof. Andrea Corradini Department of Computer Science, Pisa Lesson 8! Bo;om- Up Parsing Shi?- Reduce LR(0) automata and

More information

CS308 Compiler Principles Syntax Analyzer Li Jiang

CS308 Compiler Principles Syntax Analyzer Li Jiang CS308 Syntax Analyzer Li Jiang Department of Computer Science and Engineering Shanghai Jiao Tong University Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program.

More information

Introduction to Parsing. Comp 412

Introduction to Parsing. Comp 412 COMP 412 FALL 2010 Introduction to Parsing Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make

More information

Outline. 1 Introduction. 2 Context-free Grammars and Languages. 3 Top-down Deterministic Parsing. 4 Bottom-up Deterministic Parsing

Outline. 1 Introduction. 2 Context-free Grammars and Languages. 3 Top-down Deterministic Parsing. 4 Bottom-up Deterministic Parsing Parsing 1 / 90 Outline 1 Introduction 2 Context-free Grammars and Languages 3 Top-down Deterministic Parsing 4 Bottom-up Deterministic Parsing 5 Parser Generation Using JavaCC 2 / 90 Introduction Once

More information

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

Formal Languages and Compilers Lecture VII Part 4: Syntactic A Formal Languages and Compilers Lecture VII Part 4: Syntactic Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/

More information

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

CS2210: Compiler Construction Syntax Analysis Syntax Analysis Comparison with Lexical Analysis The second phase of compilation Phase Input Output Lexer string of characters string of tokens Parser string of tokens Parse tree/ast What Parse Tree? CS2210: Compiler

More information

Syntax Analyzer --- Parser

Syntax Analyzer --- Parser Syntax Analyzer --- Parser ASU Textbook Chapter 4.2--4.9 (w/o error handling) Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 A program represented by a sequence of tokens

More information

4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis 4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal

More information

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6 Compiler Design 1 Bottom-UP Parsing Compiler Design 2 The Process The parse tree is built starting from the leaf nodes labeled by the terminals (tokens). The parser tries to discover appropriate reductions,

More information

Acknowledgements. The slides for this lecture are a modified versions of the offering by Prof. Sanjeev K Aggarwal

Acknowledgements. The slides for this lecture are a modified versions of the offering by Prof. Sanjeev K Aggarwal Acknowledgements The slides for this lecture are a modified versions of the offering by Prof. Sanjeev K Aggarwal Syntax Analysis Check syntax and construct abstract syntax tree if == = ; b 0 a b Error

More information

Parsing. Rupesh Nasre. CS3300 Compiler Design IIT Madras July 2018

Parsing. Rupesh Nasre. CS3300 Compiler Design IIT Madras July 2018 Parsing Rupesh Nasre. CS3300 Compiler Design IIT Madras July 2018 Character stream Lexical Analyzer Machine-Independent Code Code Optimizer F r o n t e n d Token stream Syntax Analyzer Syntax tree Semantic

More information

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4) CS1622 Lecture 9 Parsing (4) CS 1622 Lecture 9 1 Today Example of a recursive descent parser Predictive & LL(1) parsers Building parse tables CS 1622 Lecture 9 2 A Recursive Descent Parser. Preliminaries

More information

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7 Top-Down Parsing and Intro to Bottom-Up Parsing Lecture 7 1 Predictive Parsers Like recursive-descent but parser can predict which production to use Predictive parsers are never wrong Always able to guess

More information

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7 Top-Down Parsing and Intro to Bottom-Up Parsing Lecture 7 1 Predictive Parsers Like recursive-descent but parser can predict which production to use Predictive parsers are never wrong Always able to guess

More information

CS 406/534 Compiler Construction Parsing Part I

CS 406/534 Compiler Construction Parsing Part I CS 406/534 Compiler Construction Parsing Part I Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy and Dr.

More information

VIVA QUESTIONS WITH ANSWERS

VIVA QUESTIONS WITH ANSWERS VIVA QUESTIONS WITH ANSWERS 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another language-the

More information

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis Chapter 4 Lexical and Syntax Analysis Introduction - Language implementation systems must analyze source code, regardless of the specific implementation approach - Nearly all syntax analysis is based on

More information

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice. Parsing Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students

More information

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing Abstract Syntax Trees & Top-Down Parsing Review of Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s L(G)? A parse tree

More information

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing Review of Parsing Abstract Syntax Trees & Top-Down Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s L(G)? A parse tree

More information

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott Introduction programming languages need to be precise natural languages less so both form (syntax) and meaning

More information

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing Review of Parsing Abstract Syntax Trees & Top-Down Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s L(G)? A parse tree

More information

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology MIT 6.035 Parse Table Construction Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Parse Tables (Review) ACTION Goto State ( ) $ X s0 shift to s2 error error goto s1

More information

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved. Syntax Analysis Prof. James L. Frankel Harvard University Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved. Context-Free Grammar (CFG) terminals non-terminals start

More information

Lexical and Syntax Analysis. Top-Down Parsing

Lexical and Syntax Analysis. Top-Down Parsing Lexical and Syntax Analysis Top-Down Parsing Easy for humans to write and understand String of characters Lexemes identified String of tokens Easy for programs to transform Data structure Syntax A syntax

More information

How do LL(1) Parsers Build Syntax Trees?

How do LL(1) Parsers Build Syntax Trees? How do LL(1) Parsers Build Syntax Trees? So far our LL(1) parser has acted like a recognizer. It verifies that input token are syntactically correct, but it produces no output. Building complete (concrete)

More information

Lecture 8: Deterministic Bottom-Up Parsing

Lecture 8: Deterministic Bottom-Up Parsing Lecture 8: Deterministic Bottom-Up Parsing (From slides by G. Necula & R. Bodik) Last modified: Fri Feb 12 13:02:57 2010 CS164: Lecture #8 1 Avoiding nondeterministic choice: LR We ve been looking at general

More information

Compiler Design 1. Top-Down Parsing. Goutam Biswas. Lect 5

Compiler Design 1. Top-Down Parsing. Goutam Biswas. Lect 5 Compiler Design 1 Top-Down Parsing Compiler Design 2 Non-terminal as a Function In a top-down parser a non-terminal may be viewed as a generator of a substring of the input. We may view a non-terminal

More information

Lecture 7: Deterministic Bottom-Up Parsing

Lecture 7: Deterministic Bottom-Up Parsing Lecture 7: Deterministic Bottom-Up Parsing (From slides by G. Necula & R. Bodik) Last modified: Tue Sep 20 12:50:42 2011 CS164: Lecture #7 1 Avoiding nondeterministic choice: LR We ve been looking at general

More information

CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages CS 314 Principles of Programming Languages Lecture 5: Syntax Analysis (Parsing) Zheng (Eddy) Zhang Rutgers University January 31, 2018 Class Information Homework 1 is being graded now. The sample solution

More information

Chapter 4. Lexical and Syntax Analysis

Chapter 4. Lexical and Syntax Analysis Chapter 4 Lexical and Syntax Analysis Chapter 4 Topics Introduction Lexical Analysis The Parsing Problem Recursive-Descent Parsing Bottom-Up Parsing Copyright 2012 Addison-Wesley. All rights reserved.

More information

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised: EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)

More information

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1 CSE P 501 Compilers LR Parsing Hal Perkins Spring 2018 UW CSE P 501 Spring 2018 D-1 Agenda LR Parsing Table-driven Parsers Parser States Shift-Reduce and Reduce-Reduce conflicts UW CSE P 501 Spring 2018

More information

CS 4120 Introduction to Compilers

CS 4120 Introduction to Compilers CS 4120 Introduction to Compilers Andrew Myers Cornell University Lecture 6: Bottom-Up Parsing 9/9/09 Bottom-up parsing A more powerful parsing technology LR grammars -- more expressive than LL can handle

More information

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing. Lecture 11-12 Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 9/22/06 Prof. Hilfinger CS164 Lecture 11 1 Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient

More information

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview n Tokens and regular expressions n Syntax and context-free grammars n Grammar derivations n More about parse trees n Top-down and

More information
