EDA180: Compiler Construction. More Top-Down Parsing, Abstract Syntax Trees. Görel Hedin. Revised:


2 Compiler phases and program representations
Analysis: source code -> Lexical analysis (scanning) -> tokens -> Syntactic analysis (parsing) -> AST -> Semantic analysis -> Attributed AST
Synthesis: Attributed AST -> Intermediate code generation -> intermediate code -> Optimization -> intermediate code -> Machine code generation -> machine code

3 A closer look at the parser
text -> Scanner -> tokens -> Parser (Pure parsing -> concrete parse tree (implicit) -> AST building) -> AST
Defined by:
- Scanner: regular expressions
- Pure parsing: context-free grammar
- AST building: abstract grammar

4 Recall main parsing ideas
[Figure: parse trees over a token sequence, illustrating LL(1) and LR(1) tree building]
LL(1): decides to build Assign after seeing the first token of its subtree. The tree is built top down.
LR(1): decides to build Assign after seeing the first token following its subtree. The tree is built bottom up.
For each production X -> γ we need to compute
- FIRST(γ): the tokens that can appear first in a γ derivation
- NULLABLE(γ): can the empty string be derived from γ?
- FOLLOW(X): the tokens that can follow an X derivation

5 Algorithm for constructing an LL(1) table
    initialize all entries table[Xi, tj] to the empty set.
    for each production p: X -> γ
      for each t ∈ FIRST(γ)
        add p to table[X, t]
      if NULLABLE(γ)
        for each t ∈ FOLLOW(X)
          add p to table[X, t]
Example table with columns t1..t4: row X1 holds the entries p1 and p2, row X2 holds the entries p3, p3, and p4, each in its own cell.
If some entry has more than one element, then the grammar is not LL(1).

6 Exercise: what is NULLABLE(X)?
    Z -> d
    Z -> X Y Z
    Y -> ε
    Y -> c
    X -> Y
    X -> a
Fill in NULLABLE for X, Y, and Z.

7 Solution: what is NULLABLE(X)?
    Z -> d
    Z -> X Y Z
    Y -> ε
    Y -> c
    X -> Y
    X -> a

              X      Y      Z
    NULLABLE  true   true   false

8 Definition of NULLABLE
Informally: NULLABLE(γ) is true if the empty string can be derived from γ.
Formal definition, given G = (N, T, P, S):
    NULLABLE(ε) == true                                    (1)
    NULLABLE(t) == false                                   (2)
      where t ∈ T, i.e., t is a terminal symbol
    NULLABLE(X) == NULLABLE(γ1) || ... || NULLABLE(γn)     (3)
      where X -> γ1, ..., X -> γn are all the productions for X in P
    NULLABLE(sγ) == NULLABLE(s) && NULLABLE(γ)             (4)
      where s ∈ N ∪ T, i.e., s is a nonterminal or a terminal
The equations for NULLABLE are recursive. How would you write a program that computes NULLABLE(X)? Just using recursive functions could lead to nontermination!

9 Fixed-point problems
Computing NULLABLE(X) is an example of a fixed-point problem. These problems have the form:
    x == f(x)
Can we find a value x for which the equation holds (i.e., a solution)? x is then called a fixed point of the function f.
Fixed-point problems can (sometimes) be solved using iteration. Guess an initial value x0, then apply the function iteratively, until the fixed point is reached:
    x1 := f(x0);
    x2 := f(x1);
    ...
    xn := f(xn-1);
    until xn == xn-1
This is called a fixed-point iteration, and xn is the fixed point.
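
The iteration scheme can be sketched in Java as follows. This is only an illustration: the class name FixedPointDemo, the method fixedPoint, and the sample function f(x) = (x + 10) / 2 (integer division) are assumptions made for this example and do not come from the course material.

```java
import java.util.function.UnaryOperator;

class FixedPointDemo {
    // Applies f repeatedly, starting from x0, until f(x) equals x.
    // Note: this loops forever if the iteration never reaches a fixed point.
    static <T> T fixedPoint(UnaryOperator<T> f, T x0) {
        T x = x0;
        while (true) {
            T next = f.apply(x);
            if (next.equals(x)) {
                return x; // x == f(x): a fixed point
            }
            x = next;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample function with integer division.
        // The iterates from 0 are 0, 5, 7, 8, 9, 9.
        Integer result = FixedPointDemo.<Integer>fixedPoint(x -> (x + 10) / 2, 0);
        System.out.println(result); // prints 9
    }
}
```

The sample function has a second fixed point at 10, so which solution the iteration finds depends on the initial guess.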

10 Implement NULLABLE by a fixed-point iteration
    represent NULLABLE as a vector nlbl[ ] of boolean variables
    initialize all nlbl[X] to false
    repeat
      changed = false
      for each nonterminal X with productions X -> γ1, ..., X -> γn do
        newvalue = nlbl(γ1) || ... || nlbl(γn)
        if newvalue != nlbl[X] then
          nlbl[X] = newvalue
          changed = true
        fi
      od
    until !changed
where nlbl(γ) is computed using the current values in nlbl[ ].
The computation will terminate because
- the variables are only changed monotonically (from false to true)
- the number of possible changes is finite (from all false to all true)
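
A minimal Java sketch of this iteration, run on the exercise grammar (Z -> d, Z -> X Y Z, Y -> ε, Y -> c, X -> Y, X -> a). The grammar encoding is an assumption of this example: a map from each nonterminal to its right-hand sides, where an empty list stands for ε and any symbol that is not a key is a terminal. The names NullableDemo, exampleGrammar, nullableSeq, and computeNullable are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NullableDemo {
    // The exercise grammar; nonterminals are visited in the order X, Y, Z.
    static Map<String, List<List<String>>> exampleGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("X", List.of(List.of("Y"), List.of("a")));
        g.put("Y", List.of(List.<String>of(), List.of("c")));
        g.put("Z", List.of(List.of("d"), List.of("X", "Y", "Z")));
        return g;
    }

    // nlbl(γ): true if every symbol in γ is currently marked nullable.
    // Terminals have no entry in nlbl and therefore count as not nullable.
    static boolean nullableSeq(List<String> gamma, Map<String, Boolean> nlbl) {
        for (String s : gamma) {
            if (!nlbl.getOrDefault(s, false)) {
                return false;
            }
        }
        return true; // also covers the empty sequence ε
    }

    static Map<String, Boolean> computeNullable(Map<String, List<List<String>>> g) {
        Map<String, Boolean> nlbl = new LinkedHashMap<>();
        for (String x : g.keySet()) {
            nlbl.put(x, false); // initialize all nlbl[X] to false
        }
        boolean changed;
        do {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                boolean newValue = false;
                for (List<String> gamma : e.getValue()) {
                    newValue = newValue || nullableSeq(gamma, nlbl);
                }
                if (newValue != nlbl.get(e.getKey())) {
                    nlbl.put(e.getKey(), newValue);
                    changed = true;
                }
            }
        } while (changed);
        return nlbl;
    }

    public static void main(String[] args) {
        System.out.println(computeNullable(exampleGrammar())); // prints {X=true, Y=true, Z=false}
    }
}
```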

11 Exercise: compute NULLABLE(X)
    Z -> d
    Z -> X Y Z
    Y -> ε
    Y -> c
    X -> Y
    X -> a

    nlbl[ ]  iter 0  iter 1  iter 2  iter 3
    X        f
    Y        f
    Z        f
In each iteration, compute:
    for each nonterminal X with productions X -> γ1, ..., X -> γn
      newvalue = nlbl(γ1) || ... || nlbl(γn)
where nlbl(γ) is computed using the current values in nlbl[ ].

12 Solution: compute NULLABLE(X)
    Z -> d
    Z -> X Y Z
    Y -> ε
    Y -> c
    X -> Y
    X -> a

    nlbl[ ]  iter 0  iter 1  iter 2  iter 3
    X        f       f       t       t
    Y        f       t       t       t
    Z        f       f       f       f
In each iteration, compute:
    for each nonterminal X with productions X -> γ1, ..., X -> γn
      newvalue = nlbl(γ1) || ... || nlbl(γn)
where nlbl(γ) is computed using the current values in nlbl[ ].

13 Definition of FIRST
Informally: FIRST(γ) is the set of tokens that can occur as the first token in sentences derived from γ.
Formal definition, given G = (N, T, P, S):
    FIRST(ε) == ∅                                                    (1)
    FIRST(t) == { t }                                                (2)
      where t ∈ T, i.e., t is a terminal symbol
    FIRST(X) == FIRST(γ1) ∪ ... ∪ FIRST(γn)                          (3)
      where X -> γ1, ..., X -> γn are all the productions for X in P
    FIRST(sγ) == FIRST(s) ∪ (if NULLABLE(s) then FIRST(γ) else ∅ fi) (4)
      where s ∈ N ∪ T, i.e., s is a nonterminal or a terminal
The equations for FIRST are recursive. Compute using fixed-point iteration.

14 Implement FIRST by a fixed-point iteration
    represent FIRST as a vector FIRST[ ] of token sets
    initialize all FIRST[X] to the empty set
    repeat
      changed = false
      for each nonterminal X with productions X -> γ1, ..., X -> γn do
        newvalue = FIRST(γ1) ∪ ... ∪ FIRST(γn)
        if newvalue != FIRST[X] then
          FIRST[X] = newvalue
          changed = true
        fi
      od
    until !changed
where FIRST(γ) is computed using the current values in FIRST[ ].
The computation will terminate because
- the variables are changed monotonically (using set union)
- the largest possible set is finite: T, the set of all tokens
- the number of possible changes is therefore finite
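
A Java sketch of the FIRST iteration on the same exercise grammar. It assumes NULLABLE is already known and passed in as the set of nullable nonterminals ({X, Y} for this grammar). The grammar encoding (map from nonterminal to right-hand sides, empty list for ε) and the names FirstDemo, firstOfSeq, and computeFirst are assumptions of this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FirstDemo {
    static Map<String, List<List<String>>> exampleGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("X", List.of(List.of("Y"), List.of("a")));
        g.put("Y", List.of(List.<String>of(), List.of("c")));
        g.put("Z", List.of(List.of("d"), List.of("X", "Y", "Z")));
        return g;
    }

    // FIRST(γ) from the current FIRST sets: equation (4) unrolled into a loop.
    static Set<String> firstOfSeq(List<String> gamma, Map<String, List<List<String>>> g,
                                  Set<String> nullable, Map<String, Set<String>> first) {
        Set<String> result = new TreeSet<>();
        for (String s : gamma) {
            if (!g.containsKey(s)) {
                result.add(s); // terminal: FIRST(t) = { t }, never nullable
                break;
            }
            result.addAll(first.get(s));
            if (!nullable.contains(s)) {
                break; // the rest of γ cannot contribute
            }
        }
        return result;
    }

    static Map<String, Set<String>> computeFirst(Map<String, List<List<String>>> g,
                                                 Set<String> nullable) {
        Map<String, Set<String>> first = new LinkedHashMap<>();
        for (String x : g.keySet()) {
            first.put(x, new TreeSet<>()); // initialize all FIRST[X] to the empty set
        }
        boolean changed;
        do {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                Set<String> newValue = new TreeSet<>();
                for (List<String> gamma : e.getValue()) {
                    newValue.addAll(firstOfSeq(gamma, g, nullable, first));
                }
                if (!newValue.equals(first.get(e.getKey()))) {
                    first.put(e.getKey(), newValue);
                    changed = true;
                }
            }
        } while (changed);
        return first;
    }

    public static void main(String[] args) {
        // prints {X=[a, c], Y=[c], Z=[a, c, d]}
        System.out.println(computeFirst(exampleGrammar(), Set.of("X", "Y")));
    }
}
```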

15 Exercise: compute FIRST(X)
    Z -> d
    Z -> X Y Z
    Y -> ε
    Y -> c
    X -> Y
    X -> a

              X   Y   Z
    NULLABLE  t   t   f

    FIRST[ ]  iter 0  iter 1  iter 2  iter 3
    X
    Y
    Z
In each iteration, compute:
    for each nonterminal X with productions X -> γ1, ..., X -> γn
      newvalue = FIRST(γ1) ∪ ... ∪ FIRST(γn)
where FIRST(γ) is computed using the current values in FIRST[ ].

16 Solution: compute FIRST(X)
    Z -> d
    Z -> X Y Z
    Y -> ε
    Y -> c
    X -> Y
    X -> a

              X   Y   Z
    NULLABLE  t   t   f

    FIRST[ ]  iter 0  iter 1     iter 2     iter 3
    X         ∅       {a}        {a, c}     {a, c}
    Y         ∅       {c}        {c}        {c}
    Z         ∅       {a, c, d}  {a, c, d}  {a, c, d}
In each iteration, compute:
    for each nonterminal X with productions X -> γ1, ..., X -> γn
      newvalue = FIRST(γ1) ∪ ... ∪ FIRST(γn)
where FIRST(γ) is computed using the current values in FIRST[ ].

17 Definition of FOLLOW
Informally: FOLLOW(X) is the set of tokens that can occur as the first token following X, in any sentential form derived from the start symbol S.
Formal definition, given G = (N, T, P, S):
The nonterminal X occurs in the right-hand side of a number of productions. Let Y -> γ X δ denote such an occurrence, where γ and δ are arbitrary sequences of terminals and nonterminals.
    FOLLOW(X) == ∪ FOLLOW(Y -> γ X δ),                               (1)
      over all occurrences Y -> γ X δ
    and where
    FOLLOW(Y -> γ X δ) ==                                            (2)
      FIRST(δ) ∪ (if NULLABLE(δ) then FOLLOW(Y) else ∅ fi)
The equations for FOLLOW are recursive. Compute using fixed-point iteration.

18 Implement FOLLOW by a fixed-point iteration
    represent FOLLOW as a vector FOLLOW[ ] of token sets
    initialize all FOLLOW[X] to the empty set
    repeat
      changed = false
      for each nonterminal X do
        newvalue = ∪ FOLLOW(Y -> γ X δ), for each occurrence Y -> γ X δ
        if newvalue != FOLLOW[X] then
          FOLLOW[X] = newvalue
          changed = true
        fi
      od
    until !changed
where FOLLOW(Y -> γ X δ) is computed using the current values in FOLLOW[ ].
Again, the computation will terminate because
- the variables are changed monotonically (using set union)
- the largest possible set is finite: T
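
A Java sketch of the FOLLOW iteration, run on the exercise grammar extended with S -> Z $. NULLABLE and FIRST are taken as given inputs (their values for this grammar are hard-coded in main). The grammar encoding and the names FollowDemo, grammar, and computeFollow are assumptions of this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FollowDemo {
    static Map<String, List<List<String>>> grammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("S", List.of(List.of("Z", "$")));
        g.put("X", List.of(List.of("Y"), List.of("a")));
        g.put("Y", List.of(List.<String>of(), List.of("c")));
        g.put("Z", List.of(List.of("d"), List.of("X", "Y", "Z")));
        return g;
    }

    static Map<String, Set<String>> computeFollow(Map<String, List<List<String>>> g,
            Set<String> nullable, Map<String, Set<String>> first) {
        Map<String, Set<String>> follow = new LinkedHashMap<>();
        for (String x : g.keySet()) {
            follow.put(x, new TreeSet<>()); // initialize all FOLLOW[X] to the empty set
        }
        boolean changed;
        do {
            changed = false;
            for (String x : g.keySet()) {
                Set<String> newValue = new TreeSet<>();
                // Visit every occurrence Y -> γ X δ of x in a right-hand side.
                for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                    for (List<String> rhs : e.getValue()) {
                        for (int i = 0; i < rhs.size(); i++) {
                            if (!rhs.get(i).equals(x)) continue;
                            boolean deltaNullable = true;
                            // Add FIRST(δ), scanning δ until a non-nullable symbol.
                            for (String s : rhs.subList(i + 1, rhs.size())) {
                                if (g.containsKey(s)) newValue.addAll(first.get(s));
                                else newValue.add(s); // terminal
                                if (!nullable.contains(s)) {
                                    deltaNullable = false;
                                    break;
                                }
                            }
                            if (deltaNullable) {
                                newValue.addAll(follow.get(e.getKey())); // FOLLOW(Y)
                            }
                        }
                    }
                }
                if (!newValue.equals(follow.get(x))) {
                    follow.put(x, newValue);
                    changed = true;
                }
            }
        } while (changed);
        return follow;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> first = Map.of(
            "X", Set.of("a", "c"), "Y", Set.of("c"), "Z", Set.of("a", "c", "d"));
        // prints {S=[], X=[a, c, d], Y=[a, c, d], Z=[$]}
        System.out.println(computeFollow(grammar(), Set.of("X", "Y"), first));
    }
}
```

FOLLOW(S) stays empty here because S never occurs in a right-hand side.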

19 Exercise: compute FOLLOW(X)
    S -> Z $
    Z -> d
    Z -> X Y Z
    Y -> ε
    Y -> c
    X -> Y
    X -> a
The grammar has been extended with end of file, $.

        NULLABLE  FIRST
    X   t         {a, c}
    Y   t         {c}
    Z   f         {a, c, d}

    FOLLOW[ ]  iter 0  iter 1  iter 2  iter 3
    X
    Y
    Z
In each iteration, compute:
    newvalue = ∪ FOLLOW(Y -> γ X δ), for each occurrence Y -> γ X δ
where FOLLOW(Y -> γ X δ) is computed using the current values in FOLLOW[ ].

20 Solution: compute FOLLOW(X)
    S -> Z $
    Z -> d
    Z -> X Y Z
    Y -> ε
    Y -> c
    X -> Y
    X -> a
The grammar has been extended with end of file, $.

        NULLABLE  FIRST
    X   t         {a, c}
    Y   t         {c}
    Z   f         {a, c, d}

    FOLLOW[ ]  iter 0  iter 1     iter 2     iter 3
    X          ∅       {a, c, d}  {a, c, d}
    Y          ∅       {a, c, d}  {a, c, d}
    Z          ∅       {$}        {$}
In each iteration, compute:
    newvalue = ∪ FOLLOW(Y -> γ X δ), for each occurrence Y -> γ X δ
where FOLLOW(Y -> γ X δ) is computed using the current values in FOLLOW[ ].

21 Recall: Algorithm for constructing an LL(1) table
    initialize all entries table[Xi, tj] to the empty set.
    for each production p: X -> γ
      for each t ∈ FIRST(γ)
        add p to table[X, t]
      if NULLABLE(γ)
        for each t ∈ FOLLOW(X)
          add p to table[X, t]
Example table with columns t1..t4: row X1 holds the entries p1 and {p1, p2}, row X2 holds the entries p3, p3, and p4.
The entry {p1, p2} is a collision in the table! The grammar is not LL(1)!
The parser generator gives an error message!
Use FIRST and FOLLOW to understand why!
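
A Java sketch of the table construction, applied to the exercise grammar with $ (S -> Z $, Z -> d, Z -> X Y Z, Y -> ε, Y -> c, X -> Y, X -> a) and the NULLABLE, FIRST, and FOLLOW values computed for it, which are hard-coded here. The class LL1TableDemo, the "X,t" string keys for table cells, and the "<empty>" spelling of ε are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class LL1TableDemo {
    static final Map<String, List<List<String>>> G = new LinkedHashMap<>();
    static {
        G.put("S", List.of(List.of("Z", "$")));
        G.put("Z", List.of(List.of("d"), List.of("X", "Y", "Z")));
        G.put("Y", List.of(List.<String>of(), List.of("c")));
        G.put("X", List.of(List.of("Y"), List.of("a")));
    }
    static final Set<String> NULLABLE = Set.of("X", "Y");
    static final Map<String, Set<String>> FIRST = Map.of(
        "X", Set.of("a", "c"), "Y", Set.of("c"), "Z", Set.of("a", "c", "d"));
    static final Map<String, Set<String>> FOLLOW = Map.of(
        "S", Set.<String>of(), "X", Set.of("a", "c", "d"),
        "Y", Set.of("a", "c", "d"), "Z", Set.of("$"));

    // Maps each cell "X,t" to the set of productions placed in it.
    static Map<String, Set<String>> buildTable() {
        Map<String, Set<String>> table = new TreeMap<>();
        for (Map.Entry<String, List<List<String>>> e : G.entrySet()) {
            String x = e.getKey();
            for (List<String> gamma : e.getValue()) {
                String p = x + " -> " + (gamma.isEmpty() ? "<empty>" : String.join(" ", gamma));
                Set<String> tokens = new TreeSet<>();
                boolean gammaNullable = true;
                for (String s : gamma) { // collect FIRST(γ)
                    if (G.containsKey(s)) tokens.addAll(FIRST.get(s));
                    else tokens.add(s);
                    if (!NULLABLE.contains(s)) {
                        gammaNullable = false;
                        break;
                    }
                }
                if (gammaNullable) tokens.addAll(FOLLOW.get(x));
                for (String t : tokens) {
                    table.computeIfAbsent(x + "," + t, k -> new TreeSet<>()).add(p);
                }
            }
        }
        return table;
    }

    // Cells holding more than one production: the grammar is then not LL(1).
    static List<String> conflicts(Map<String, Set<String>> table) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : table.entrySet()) {
            if (e.getValue().size() > 1) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(conflicts(buildTable())); // prints [X,a, Y,c, Z,d]
    }
}
```

For this grammar the sketch reports three colliding cells, so the grammar is not LL(1). For example, both Z -> d and Z -> X Y Z land in cell (Z, d) because d is in FIRST(X Y Z).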

22 Example 1: JavaCC gives a warning
Lang.jj:
    ...
    void stmt() : {}
    {
      <ID> "(" idlist() ")"
    | <ID> "=" exp()
    }
    ...
JavaCC detects a conflict and gives a warning:
    > javacc Lang.jj
    Reading from Lang.jj...
    Warning: Choice conflict involving two expansions at line 56, column 3 and line 57, column 3 respectively. A common prefix is: <ID>
    Consider using a lookahead of 2 for earlier expansion.
    Parser generated with 0 errors and 1 warnings.
Note! JavaCC considers this a warning, and not an error. It will resolve the conflict by selecting the first production. This is probably not the behavior you want!

23 Example 1: resolving the conflict with LOOKAHEAD
Lang.jj:
    ...
    void stmt() : {}
    {
      LOOKAHEAD(2) <ID> "(" idlist() ")"
    | <ID> "=" exp()
    }
    ...
JavaCC now uses local lookahead 2 for the first production. There is no longer any conflict:
    > javacc Lang.jj
    Reading from Lang.jj...
    Parser generated successfully.
Note! The LOOKAHEAD(2) command makes JavaCC use two tokens to match that production. Later productions (without LOOKAHEAD commands) will be matched with only one token. If they also match, JavaCC will give no warning or error.

24 Example 2: Add another production that also matches
Lang.jj:
    ...
    void stmt() : {}
    {
      LOOKAHEAD(2) <ID> "(" idlist() ")"
    | <ID> "(" exp() ")"
    }
    ...
JavaCC uses local lookahead 2 for the first production. The first production will match if the lookahead is <ID> "(". The second production also matches. But JavaCC just selects the first rule, and gives no warning or error:
    > javacc Lang.jj
    Reading from Lang.jj...
    Parser generated successfully.
So, beware! If you use LOOKAHEAD just to silence the warnings, without understanding how it works, you might get unexpected results! In this case, you need a lookahead of 3!

25 Example 3: An unlimited common prefix
Lang.jj:
    ...
    void stmt() : {}
    {
      name() "(" idlist() ")"
    | name() "=" exp()
    }
    void name() : {}
    {
      <ID> ("." <ID>)*
    }
    ...
JavaCC suggests lookahead, but will it help?
    > javacc Lang.jj
    Reading from Lang.jj...
    Warning: Choice conflict involving two expansions at line 91, column 3 and line 92, column 3 respectively. A common prefix is: <ID> "."
    Consider using a lookahead of 3 or more for earlier expansion.
    Parser generated with 0 errors and 1 warnings.
No! You need to factor out the common prefix!
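
Factoring out the common prefix means parsing name() once and only then choosing between the two continuations, which needs a single token of lookahead. The hand-written Java sketch below illustrates the idea; it is not JavaCC output. The class LeftFactoringDemo, the space-separated token format, the rule idlist -> ID ("," ID)*, and the Call/Assign result strings are all assumptions of this example.

```java
import java.util.ArrayList;
import java.util.List;

class LeftFactoringDemo {
    private final String[] tokens;
    private int pos = 0;

    LeftFactoringDemo(String... tokens) { this.tokens = tokens; }

    private boolean at(String t) { return pos < tokens.length && tokens[pos].equals(t); }

    private String next() {
        if (pos >= tokens.length) throw new IllegalStateException("unexpected end of input");
        return tokens[pos++];
    }

    private void expect(String t) {
        String n = next();
        if (!n.equals(t)) throw new IllegalStateException("expected " + t + " but found " + n);
    }

    // name -> ID ("." ID)*
    private String name() {
        StringBuilder sb = new StringBuilder(next());
        while (at(".")) {
            next();
            sb.append('.').append(next());
        }
        return sb.toString();
    }

    // Factored rule: stmt -> name ( "(" idlist ")" | "=" exp )
    // The arbitrarily long prefix is consumed first; one token then decides.
    String stmt() {
        String n = name();
        if (at("(")) {
            next();
            List<String> ids = new ArrayList<>();
            ids.add(next()); // assumed: idlist -> ID ("," ID)*
            while (at(",")) {
                next();
                ids.add(next());
            }
            expect(")");
            return "Call(" + n + ", " + ids + ")";
        }
        expect("=");
        return "Assign(" + n + ", " + next() + ")"; // assumed: exp -> ID
    }

    static String parse(String input) {
        return new LeftFactoringDemo(input.split(" ")).stmt();
    }

    public static void main(String[] args) {
        System.out.println(parse("a . b . c ( x , y )")); // prints Call(a.b.c, [x, y])
        System.out.println(parse("a . b = e"));           // prints Assign(a.b, e)
    }
}
```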

26 Example 4: A left-recursive grammar
Lang.jj:
    ...
    void exp() : {}
    {
      exp() "+" term()
    | term()
    }
    ...
JavaCC recognizes the left recursion, and cannot generate a parser.
    > javacc Lang.jj
    Reading from Lang.jj...
    Error: Line 82, Column 1: Left recursion detected: "exp... --> exp..."
    Detected 1 errors and 0 warnings.
You need to modify the grammar to remove the left recursion!
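
One common way to remove the left recursion is to rewrite exp as an iteration, exp -> term ("+" term)*, and to build the tree left-associatively inside the loop. The hand-written Java sketch below shows the idea; the class LeftRecursionDemo, the token list input, and the toString format are assumptions of this example, and term is simplified to a single identifier.

```java
import java.util.List;

class LeftRecursionDemo {
    static abstract class Exp { }

    static class Add extends Exp {
        final Exp left, right;
        Add(Exp left, Exp right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }

    static class IdExp extends Exp {
        final String id;
        IdExp(String id) { this.id = id; }
        @Override public String toString() { return id; }
    }

    private final List<String> tokens;
    private int pos = 0;

    LeftRecursionDemo(List<String> tokens) { this.tokens = tokens; }

    // exp -> term ("+" term)*  replaces  exp -> exp "+" term | term
    Exp exp() {
        Exp result = term();
        while (pos < tokens.size() && tokens.get(pos).equals("+")) {
            pos++; // consume "+"
            result = new Add(result, term()); // the tree so far becomes the left child
        }
        return result;
    }

    // term -> ID (simplified for this sketch)
    Exp term() {
        if (pos >= tokens.size() || tokens.get(pos).equals("+")) {
            throw new IllegalStateException("identifier expected at token " + pos);
        }
        return new IdExp(tokens.get(pos++));
    }

    static String parse(String... tokens) {
        return new LeftRecursionDemo(List.of(tokens)).exp().toString();
    }

    public static void main(String[] args) {
        System.out.println(parse("a", "+", "b", "+", "c")); // prints ((a + b) + c)
    }
}
```

A direct recursive-descent coding of the original rule would call exp() as its first action and never consume a token, which is why the left-recursive form cannot be used.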

27 The "dangling else" problem
    S -> "if" E "then" S ["else" S]
    S -> ID "=" E
    E -> ID
Construct a parse tree for:
    if a then if b then c = d else e = f
Two possible parse trees! The grammar is ambiguous!
[Figure: the two parse trees for the statement, one of them marked as the desired tree]

28 Solutions to the "dangling else" problem
Rewrite to equivalent unambiguous grammar
- possible, but results in more complex grammar
Use the ambiguous grammar
- by relying on "rule priority", the parser can build the correct tree
- works for the dangling else problem, but not for ambiguous grammars in general
Change the language
- e.g., add a keyword "fi" that closes the "if"-statement
- not always an option
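
The "rule priority" option can be illustrated with a hand-written recursive-descent sketch in Java: after parsing the then-part, the parser takes an "else" whenever one is the next token, so each "else" attaches to the nearest unmatched "if". The class DanglingElseDemo, the space-separated token format, and the if(...) result strings are assumptions of this example.

```java
class DanglingElseDemo {
    private final String[] tokens;
    private int pos = 0;

    DanglingElseDemo(String... tokens) { this.tokens = tokens; }

    private boolean at(String t) { return pos < tokens.length && tokens[pos].equals(t); }

    private String next() {
        if (pos >= tokens.length) throw new IllegalStateException("unexpected end of input");
        return tokens[pos++];
    }

    private void expect(String t) {
        String n = next();
        if (!n.equals(t)) throw new IllegalStateException("expected " + t + " but found " + n);
    }

    // S -> "if" E "then" S ["else" S] | ID "=" E,   E -> ID
    String stmt() {
        if (at("if")) {
            next();
            String cond = next();
            expect("then");
            String thenPart = stmt();
            if (at("else")) {
                // Rule priority: prefer the "else" part over the empty alternative.
                next();
                return "if(" + cond + ", " + thenPart + ", " + stmt() + ")";
            }
            return "if(" + cond + ", " + thenPart + ")";
        }
        String id = next();
        expect("=");
        return id + "=" + next();
    }

    static String parse(String input) {
        return new DanglingElseDemo(input.split(" ")).stmt();
    }

    public static void main(String[] args) {
        // prints if(a, if(b, c=d, e=f))
        System.out.println(parse("if a then if b then c = d else e = f"));
    }
}
```

In the printed result the else-part e=f belongs to the inner if (the one testing b), not to the outer one.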

29 Handling Dangling Else in JavaCC
Lang.jj:
    void stmt() : {}
    {
      "if" exp() "then" stmt() [ "else" stmt() ]
    | <ID> "=" exp()
    }
This is equivalent to:
    void stmt() : {}
    {
      "if" exp() "then" stmt() optelse()
    | <ID> "=" exp()
    }
    void optelse() : {}
    {
      "else" stmt()
    | {} // the empty production
    }

30 Running it through JavaCC
Lang.jj:
    void stmt() : {}
    {
      "if" exp() "then" stmt() [ "else" stmt() ]
    | <ID> "=" exp()
    }
JavaCC output:
    > javacc Lang.jj
    Reading from Lang.jj...
    Warning: Choice conflict in [...] construct at line 66, column 28. Expansion nested within construct and expansion following construct have common prefixes, one of which is: "else"
    Consider using a lookahead of 2 or more for nested expansion.
    Parser generated with 0 errors and 1 warnings.
JavaCC will favour the "else" part over the empty production, but will generate a warning.

31 Selecting the "else" part explicitly using LOOKAHEAD
Lang.jj:
    void stmt() : {}
    {
      "if" exp() "then" stmt() [ LOOKAHEAD(1) "else" stmt() ]
    | <ID> "=" exp()
    }
or:
    void stmt() : {}
    {
      "if" exp() "then" stmt() optelse()
    | <ID> "=" exp()
    }
    void optelse() : {}
    {
      LOOKAHEAD(1) "else" stmt()
    | {} // the empty production
    }
JavaCC output:
    > javacc Lang.jj
    Reading from Lang.jj...
    Parser generated successfully.

32 Advice when you get conflicts in JavaCC
Look in detail at the conflict, to understand what goes on.
Make sure you understand how JavaCC's LOOKAHEAD works.
Is there left-recursion?
- Refactor the grammar.
Is there an ambiguity?
- Usually refactor, but sometimes, like in the dangling-else situation, you can solve it by rule priority (adding LOOKAHEAD).
- Beware that JavaCC might report ambiguities as a common prefix problem.
Is there a common prefix?
- Is it small and limited? Use local LOOKAHEAD.
- Can it be arbitrarily long? Refactor the grammar. Beware that JavaCC might suggest adding LOOKAHEAD, but that will not work.

33 The goal of parsing: an Abstract Syntax Tree
text -> Scanner -> tokens -> Parser (Pure parsing -> concrete parse tree (implicit) -> AST building) -> AST
Defined by:
- Scanner: regular expressions
- Pure parsing: context-free grammar
- AST building: abstract grammar

34 Concrete vs Abstract grammar
What does it describe?
- Concrete grammar: describes the concrete text representation of programs
- Abstract grammar: describes the abstract structure of programs
Main use
- Concrete grammar: parsing text to trees
- Abstract grammar: model representing the program inside the compiler
Underlying formalism
- Concrete grammar: context-free grammar
- Abstract grammar: recursive data types
What is named?
- Concrete grammar: only nonterminals (productions are usually anonymous)
- Abstract grammar: both nonterminals and productions
What tokens occur in the grammar?
- Concrete grammar: all tokens corresponding to "words" in the text
- Abstract grammar: usually only tokens with values (identifiers, literals)
Independence
- Concrete grammar: independent of abstract structure
- Abstract grammar: independent of parser and parser algorithm

35 Example: Concrete vs Abstract
Concrete grammar:
    Exp -> Exp "+" Term
    Exp -> Term
    Term -> ID
Abstract grammar:
    Add: Exp -> Exp Exp
    IdExp: Exp -> ID
Productions are named!
[Figure: the concrete parse tree (Exp, Term nodes) and the abstract syntax tree (Add, IdExp nodes) for ID + ID]

36 Abstract grammar vs. OO model
    Abstract grammar   OO model     Other terminology used (algebraic datatypes)
    nonterminal        superclass   type, sort
    production         subclass     constructor, operator
Abstract grammar:
    Add: Exp -> Exp Exp
    IdExp: Exp -> ID
[Figure: class diagram with superclass Exp and subclasses Add and IdExp]
A canonical abstract grammar corresponds to a two-level class hierarchy!

37 Example Java implementation
Abstract grammar:
    Add: Exp -> Exp Exp
    IdExp: Exp -> ID
[Figure: class diagram with superclass Exp and subclasses Add and IdExp]
    abstract class Exp { }
    class Add extends Exp {
      Exp exp1, exp2;
    }
    class IdExp extends Exp {
      String ID;
    }

38 JastAdd
A compiler generation tool. Generates Java code.
Supports ASTs and modular computations on ASTs.
JastAdd: "Just add computations to the ast"
Independent of the parser used.
Developed at LTH, see
[Figure: tool flow. Parser specification L.jjt -> JJTree -> L.jj -> JavaCC -> Parser (*.java). Abstract grammar (*.ast) and Computations (*.jadd) -> JastAdd -> AST classes (*.java). The parser creates objects of the AST classes.]

39 JastAdd abstract grammars (compared to canonical abstract grammars)
- Classes instead of nonterminals and productions
- Classes can be abstract (like in Java)
- Arbitrarily deep inheritance hierarchy (not just two levels)
- Support for optional, list, and token components
- Components can be named
- Right-hand side can be inherited from superclass (see BinExpr)
- No parentheses! You need to name all node classes in the AST.

    Program ::= Stmt*;
    abstract Stmt;
    Assignment : Stmt ::= IdExpr Expr;
    IfStmt : Stmt ::= Expr Then:Stmt [Else:Stmt];
    abstract Expr;
    IdExpr : Expr ::= <ID:String>;
    IntExpr : Expr ::= <INT:String>;
    BinExpr : Expr ::= Left:Expr Right:Expr;
    Add : BinExpr;

40 Generated Java API, ordinary components
Abstract grammar:
    abstract Stmt;
    WhileStmt : Stmt ::= Cond:Expr Stmt;
Generated API:
    abstract class Stmt extends ASTNode { }
    class WhileStmt extends Stmt {
      Expr getCond();
      Stmt getStmt();
    }
[Figure: example AST, a WhileStmt node with an Expr child reached by getCond() and a Stmt child reached by getStmt()]
A general class ASTNode is used as implicit superclass.
A traversal API with get methods is generated.
If component names are given, they are used in the API (getCond). Otherwise the type names are used (getStmt).

41 Generated Java API, lists
Abstract grammar:
    Program ::= Stmt*;
Generated API:
    class Program extends ASTNode {
      int getNumStmt();        // 0 if empty
      Stmt getStmt(int i);     // numbered from 0
      List<Stmt> getStmts();   // iterator
    }
[Figure: example AST, a Program node with a List<Stmt> child reached by getStmts(), holding three Stmt nodes]
The list is represented by a List object that can be used as an iterator:
    Program p = ...;
    for (Stmt s : p.getStmts()) { ... }
Or access a specific statement:
    Program p = ...;
    if (p.getNumStmt() >= 1) {
      Stmt s = p.getStmt(0);
      ...
    }

42 Generated Java API, optionals
Abstract grammar:
    A ::= B [C];
Generated API:
    class A extends ASTNode {
      B getB();
      boolean hasC();
      C getC();   // Exception if not hasC()
    }
[Figure: example AST, an A node with a B child and an Opt<C> child that may hold a C node]
A general class ASTNode is used as implicit superclass.
The traversal API includes a has method for the optional component.

43 Connection to JavaCC
Abstract grammar:
    A ::= B [C];
    B ::= ...;
    C ::= ...;
[Figure: class hierarchy. Node and SimpleNode are generated by JavaCC/JJTree. ASTNode and its subclasses A, B, C, Opt, and List are generated by JastAdd.]
    class ASTNode extends SimpleNode {
      int getNumChild();
      ASTNode getChild(int i);
      ASTNode getParent();   // null for the root
      void dumpTree(String indent, java.io.PrintStream pStream);
    }
Low-level traversal API. Will stop also at Opt and List nodes. Don't use normally. The high-level API is much more readable.
The method dumpTree prints the AST. Useful for testing.

44 Defining an abstract grammar
This is object-oriented modeling!
What kinds of objects are there in the AST?
- E.g., Program, WhileStmt, Assignment, Add, ...
What are the generalized concepts (abstract classes)?
- E.g., Statement, Expression, ...
What are the components of an object?
- E.g., an Assignment has an Identifier and an Expression ...

    Program ::= ...;
    abstract Statement;
    abstract Expression;
    WhileStmt : Statement ::= ...;
    Assignment : Statement ::= Identifier Expression;
    ...

45 Use good names!
When you write the rule on the left, the sentence on the right should make sense:
    A : B ::= ...      An A is a special kind of B
    C ::= D E F        A C has a D, an E, and an F
    D ::= X:E Y:E      A D has one E called X and another E called Y
    G ::= [H]          A G may have an H
    J ::= <K:T>        A J has a K token of type T
    L ::= M*           An L has a number of Ms
Examples of bad naming (from inexperienced students):
    A ::= [OptParam];
    OptParam ::= Name Type;
    A ::= Stmts*;
    abstract Stmts;
    While : Stmts ::= Exp Stmt;
Better naming:
    A ::= [Param];
    Param ::= Name Type;
    A ::= Stmt*;
    abstract Stmt;
    While : Stmt ::= Exp Stmt;

46 Design simple abstract grammars!
Don't model detailed parsing restrictions in the abstract grammar. Such grammars are just complex to write computations for. The special cases are easily ruled out either in the parser, or by additional semantic checks.
Unnecessarily complex grammar:
    A ::= First:B Rest:B*
Better grammar (simpler):
    A ::= B*

47 Design an LL concrete grammar
Design the abstract grammar first.
Then design the concrete grammar, making it as similar as possible to the abstract grammar.
- Use an alternation rule for each superclass that occurs in a rhs.
- Use a composition rule for each concrete subclass.
Make the grammar LL(1)
- Factor out common prefixes, or use local lookahead.
- Eliminate ambiguities.
- Eliminate left-recursion using EBNF lists (*).
Don't try to detect semantic errors during parsing, e.g., type errors. You cannot detect them all anyway, and they are easy to detect later.

48 Building the Abstract Syntax Tree
text -> Scanner -> tokens -> Parser (Pure parsing -> concrete parse tree (implicit) -> AST building) -> AST
Defined by:
- Scanner: regular expressions
- Pure parsing: context-free grammar
- AST building: abstract grammar

49 Semantic actions
Code that is added to a parser, to perform actions during parsing. Usually, to build the AST.
Old-style 1-pass compilers did the whole compilation as semantic actions.
Parser generators support semantic actions in the parser specification.

50 JavaCC example
CFG:
    stmt -> ifstmt | assignment
    ifstmt -> IF "(" expr ")" stmt
    assignment -> ID ASSIGN expr
Abstract grammar:
    abstract Stmt;
    IfStmt : Stmt ::= Expr Stmt;
    Assignment : Stmt ::= IdExpr Expr;
    IdExpr : Expr ::= <ID:String>;
Lang.jj without semantic actions:
    void stmt() : {}
    {
      assignment() | ifstmt()
    }
    void ifstmt() : {}
    {
      <IF> "(" expr() ")" stmt()
    }
    void assignment() : {}
    {
      <ID> <ASSIGN> expr()
    }

51 JavaCC example
CFG:
    stmt -> ifstmt | assignment
    ifstmt -> IF "(" expr ")" stmt
    assignment -> ID ASSIGN expr
Abstract grammar:
    abstract Stmt;
    IfStmt : Stmt ::= Expr Stmt;
    Assignment : Stmt ::= IdExpr Expr;
    IdExpr : Expr ::= <ID:String>;
Lang.jj with semantic actions:
    Stmt stmt() : { Stmt s; }
    {
      ( s = assignment() | s = ifstmt() )
      { return s; }
    }
    IfStmt ifstmt() : { Expr e; Stmt s; }
    {
      <IF> "(" e = expr() ")" s = stmt()
      { return new IfStmt(e, s); }
    }
    Assignment assignment() : { Token t; Expr e; }
    {
      t = <ID> <ASSIGN> e = expr()
      { return new Assignment(new IdExpr(t.image), e); }
    }

52 JJTree
A tree-building preprocessor for JavaCC.
[Figure: tool flow. L.jjt (parser specification with tree-building directives) -> JJTree -> L.jj (parser specification with semantic actions for building trees) -> JavaCC -> Parser (*.java)]
Assignment 2: Use JJTree to automatically create a concrete syntax tree.
Assignment 3 and onwards: Use JJTree to build a JastAdd AST.
- Set the nodedefaultvoid option to true in the build file.
- Use #-directives in the .jjt file to build all AST nodes explicitly.
- Use JastAdd to generate the AST classes, bypassing JJTree generation of such classes.

53 Building ASTs with the JJTree stack
JJTree has a stack for collecting the children of a node. It is easy to:
- build an AST node corresponding to a concrete rule
- skip building nodes for concrete rules without abstract counterpart
- build JastAdd Opt and List nodes
- build left-recursive ASTs from an iterating concrete grammar
- build left-recursive ASTs from a right-recursive concrete grammar
- build desired ASTs even when grammar has been left factored to eliminate a common prefix

54 Building an AST node for a concrete rule
CFG:
    ifstmt -> IF "(" expr ")" stmt
Abstract grammar:
    IfStmt : Stmt ::= Expr Stmt;
Lang.jjt with directives:
    void ifstmt() #IfStmt : {}
    {
      <IF> "(" expr() ")" stmt()
    }
What happens:
- The current top of stack is marked
- expr() pushes an Expr node
- stmt() pushes a Stmt node
- #IfStmt causes a new IfStmt node to be created
- the nodes above the mark are popped
- they become children of the IfStmt node
- the new IfStmt node is pushed
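
The mark, push, pop sequence can be simulated with a small stack in Java. This sketch only mimics the behaviour described in the steps above; it is not JJTree's generated code, and the names NodeStackDemo, openScope, closeScope, push, and top are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class NodeStackDemo {
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        @Override public String toString() {
            return children.isEmpty() ? name : name + children;
        }
    }

    private final List<Node> nodes = new ArrayList<>();      // node stack, top is last
    private final Deque<Integer> marks = new ArrayDeque<>(); // stack heights at the marks

    // Entering a rule with a # directive: mark the current top of stack.
    void openScope() { marks.push(nodes.size()); }

    // A sub-rule pushes the node it has built (leaf nodes in this sketch).
    void push(String name) { nodes.add(new Node(name)); }

    // Leaving the rule: create the node, pop everything above the mark,
    // make the popped nodes its children (in order), and push the new node.
    void closeScope(String name) {
        int mark = marks.pop();
        Node n = new Node(name);
        List<Node> above = nodes.subList(mark, nodes.size());
        n.children.addAll(above);
        above.clear(); // removes the popped nodes from the stack
        nodes.add(n);
    }

    Node top() { return nodes.get(nodes.size() - 1); }

    // Replays the steps for: ifstmt -> IF "(" expr ")" stmt
    static String ifStmtExample() {
        NodeStackDemo s = new NodeStackDemo();
        s.openScope();          // the current top of stack is marked
        s.push("Expr");         // expr() pushes an Expr node
        s.push("Stmt");         // stmt() pushes a Stmt node
        s.closeScope("IfStmt"); // a new IfStmt node adopts the nodes above the mark
        return s.top().toString();
    }

    public static void main(String[] args) {
        System.out.println(ifStmtExample()); // prints IfStmt[Expr, Stmt]
    }
}
```

Closing a scope with nothing above the mark yields a node without children, which corresponds to an always-built node for an empty list.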

55 Skip building an AST node for a concrete rule
CFG:
    stmt -> ifstmt | assignment
Abstract grammar:
    abstract Stmt;
    IfStmt : Stmt ::= Expr Stmt;
    Assignment : Stmt ::= IdExpr Expr;
Lang.jjt:
    void stmt() : {}
    {
      ifstmt() | assignment()
    }
We don't want to create an instance of Stmt here: it's an abstract class!
- Simply do not specify any #... directive
- ifstmt() or assignment() will create and push a (subclass of) a Stmt node
- callers of stmt() can use that node for AST building

56 Building an AST node with a token
CFG:
    idexpr -> ID
Abstract grammar:
    abstract Expr;
    IdExpr : Expr ::= <ID:String>;
Lang.jjt:
    void idexpr() #IdExpr : { Token t; }
    {
      t = <ID>
      { jjtThis.setID(t.image); }
    }
jjtThis refers to the new IdExpr node.
The method setID(String) has been generated by JastAdd.

57 Building a List AST node
CFG:
    program -> "begin" [ stmt (";" stmt)* ] "end"
Abstract grammar:
    Program ::= Stmt*;
Lang.jjt:
    void program() #Program : {}
    {
      "begin" ( [ stmt() (";" stmt())* ] ) #List(true) "end"
    }
[Figure: desired AST, a Program node with a List child holding three Stmt nodes]
What happens:
- A marker is put on the stack.
- The left parenthesis causes another marker to be put on the stack.
- One Stmt is pushed for each call to stmt().
- At #List(true), a new List node is created, and all nodes above the latest marker are popped and added as children to it. The new List node is pushed.
- The parsing completes. A new Program node is created, and the List node (the only node above the remaining marker) is added as its child. The new Program node is pushed.

58 Why #List(true)?
CFG:
    program -> "begin" [ stmt (";" stmt)* ] "end"
Abstract grammar:
    Program ::= Stmt*;
Lang.jjt:
    void program() #Program : {}
    {
      "begin" ( [ stmt() (";" stmt())* ] ) #List(true) "end"
    }
[Figure: desired AST, a Program node with a List child holding three Stmt nodes]
With #List(false), building the List node would be conditional: it will not be built if the list is empty.
JastAdd requires the List node even if there are no Stmt nodes in it.
So that's why we use #List(true): the List node will always be built.

59 Building an Opt AST node
CFG:
    classdecl -> "class" classid ["extends" classid] classbody
Abstract grammar:
    ClassDecl ::= ClassId [Super:ClassId] ClassBody;
Lang.jjt:
    void classdecl() #ClassDecl : {}
    {
      "class" classid() ( ["extends" classid()] ) #Opt(true) classbody()
    }
[Figure: desired AST, a ClassDecl node with children ClassId, Opt (holding an optional ClassId), and ClassBody]
Similarly here, the Opt node should always be built, regardless of whether there is a child or not.

60 Building Left-Recursive ASTs
CFG:
    expr -> factor ( "*" factor | "/" factor )*
Abstract grammar:
    abstract Expr;
    BinExpr : Expr ::= Left:Expr Right:Expr;
    Mul : BinExpr;
    Div : BinExpr;
Lang.jjt:
    void expr() : {}
    {
      factor() ( "*" factor() #Mul(2) | "/" factor() #Div(2) )*
    }
[Figure: desired AST, a Div node whose left child is a Mul node over two IdExpr nodes and whose right child is an IdExpr node]
#Mul(2)
- creates a Mul node
- pops 2 nodes from the stack
- adds these nodes as children of the Mul node
- and pushes the Mul node on the stack.
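
The effect of #Mul(2) and #Div(2) inside the iteration can be simulated in Java with an explicit stack: each operator pops the two topmost nodes and pushes a new node that has them as children, so the tree grows to the left. This only mimics the described behaviour and is not JJTree's generated code; the class LeftAssocStackDemo, the string representation of nodes, and the assumption that factor is a single identifier are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class LeftAssocStackDemo {
    private final Deque<String> stack = new ArrayDeque<>();

    // Mimics #Name(2): pop 2 nodes, make them children of a new node, push it.
    private void buildBinary(String name) {
        String right = stack.pop(); // pushed last
        String left = stack.pop();
        stack.push(name + "(" + left + ", " + right + ")");
    }

    // expr -> factor ( "*" factor | "/" factor )*   with factor -> ID
    // The tokens are expected to alternate: ID, operator, ID, operator, ID, ...
    String expr(String... tokens) {
        stack.push(tokens[0]); // the first factor()
        for (int i = 1; i + 1 < tokens.length; i += 2) {
            stack.push(tokens[i + 1]); // the factor() after the operator
            buildBinary(tokens[i].equals("*") ? "Mul" : "Div");
        }
        return stack.pop();
    }

    static String parse(String... tokens) {
        return new LeftAssocStackDemo().expr(tokens);
    }

    public static void main(String[] args) {
        System.out.println(parse("a", "*", "b", "/", "c")); // prints Div(Mul(a, b), c)
    }
}
```

After the first operator the Mul node is the only node on the stack, so the following Div picks it up as its left child.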

61 Returning the AST to the client
CFG:
    start -> expr
Abstract grammar:
    Start ::= Expr;
Lang.jjt:
    Start start() #Start : {}
    {
      expr()
      { return jjtThis; }
    }
[Figure: desired AST, a Start node with an Expr child]
Add a return type and a return statement.
jjtThis refers to the Start node created by #Start.

62 Summary questions
- What is the definition of NULLABLE, FIRST, and FOLLOW?
- What is a fixed-point problem? How can it be solved using iteration?
- What do error messages and warnings from LL parser generators mean?
- What is the "dangling else" problem, and how can it be solved?
- What is the difference between an abstract and a concrete grammar?
- What is the correspondence between an abstract grammar and an object-oriented model?
- Orientation about JastAdd abstract grammars, traversal API, and connection to JavaCC.
- What are properties of a good abstract grammar?
- What is a "semantic action"?
- How can JJTree be used for building ASTs?

LL(k) Compiler Construction. Choice points in EBNF grammar. Left recursive grammar

LL(k) Compiler Construction. Choice points in EBNF grammar. Left recursive grammar LL(k) Compiler Construction More LL parsing Abstract syntax trees Lennart Andersson Revision 2012 01 31 2012 Related names top-down the parse tree is constructed top-down recursive descent if it is implemented

More information

LL(k) Compiler Construction. Top-down Parsing. LL(1) parsing engine. LL engine ID, $ S 0 E 1 T 2 3

LL(k) Compiler Construction. Top-down Parsing. LL(1) parsing engine. LL engine ID, $ S 0 E 1 T 2 3 LL(k) Compiler Construction More LL parsing Abstract syntax trees Lennart Andersson Revision 2011 01 31 2010 Related names top-down the parse tree is constructed top-down recursive descent if it is implemented

More information

LL parsing Nullable, FIRST, and FOLLOW

LL parsing Nullable, FIRST, and FOLLOW EDAN65: Compilers LL parsing Nullable, FIRST, and FOLLOW Görel Hedin Revised: 2014-09- 22 Regular expressions Context- free grammar ATribute grammar Lexical analyzer (scanner) SyntacKc analyzer (parser)

More information

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a EDA180: Compiler Construc6on Top- down parsing Görel Hedin Revised: 2013-01- 30a Compiler phases and program representa6ons source code Lexical analysis (scanning) Intermediate code genera6on tokens intermediate

More information

.jj file with actions

.jj file with actions Hand-coded parser without actions Compiler Construction Computations on ASTs Lennart Andersson Revision 2011-02-07 2011 void stmt() { switch(token) { case IF: accept(if); expr(); accept(then); stmt();

More information

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised: EDA180: Compiler Construc6on Context- free grammars Görel Hedin Revised: 2013-01- 28 Compiler phases and program representa6ons source code Lexical analysis (scanning) Intermediate code genera6on tokens

More information

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised: EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing Görel Hedin Revised: 2017-09-04 This lecture Regular expressions Context-free grammar Attribute grammar

More information

Semantic analysis. Compiler Construction. An expression evaluator. Examples of computations. Computations on ASTs Aspect oriented programming

Semantic analysis. Compiler Construction. An expression evaluator. Examples of computations. Computations on ASTs Aspect oriented programming Semantic analysis Compiler Construction source code scanner semantic analysis Computations on ASTs Aspect oriented programming tokens AST with attributes Lennart Andersson parser code generation Revision

More information

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised: EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)

More information

Visitors. Move functionality

Visitors. Move functionality Visitors Compiler Construction Visitor pattern, Semantic analysis Lennart Andersson How to modularize in Java (or any other OO language) if we do not have access to AOP mechanisms? Revision 2011-02-08

More information

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing Roadmap > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing The role of the parser > performs context-free syntax analysis > guides

More information

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;} Compiler Construction Grammars Parsing source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2012 01 23 2012 parse tree AST builder (implicit)

More information

CA Compiler Construction

CA Compiler Construction CA4003 - Compiler Construction David Sinclair A top-down parser starts with the root of the parse tree, labelled with the goal symbol of the grammar, and repeats the following steps until the fringe of

More information

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers. Part III : Parsing From Regular to Context-Free Grammars Deriving a Parser from a Context-Free Grammar Scanners and Parsers A Parser for EBNF Left-Parsable Grammars Martin Odersky, LAMP/DI 1 From Regular

CS 314 Principles of Programming Languages CS 314 Principles of Programming Languages Lecture 5: Syntax Analysis (Parsing) Zheng (Eddy) Zhang Rutgers University January 31, 2018 Class Information Homework 1 is being graded now. The sample solution

Context-free grammars (CFG s) Syntax Analysis/Parsing Purpose: determine if tokens have the right form for the language (right syntactic structure) stream of tokens abstract syntax tree (AST) AST: captures hierarchical structure of

Topic 3: Syntax Analysis I Topic 3: Syntax Analysis I Compiler Design Prof. Hanjun Kim CoreLab (Compiler Research Lab) POSTECH 1 Back-End Front-End The Front End Source Program Lexical Analysis Syntax Analysis Semantic Analysis

Programming Assignment 2 LALR Parsing and Building ASTs Lund University Computer Science Jesper Öqvist, Görel Hedin, Niklas Fors, Christoff Bürger Compilers EDAN65 2016-09-05 Programming Assignment 2 LALR Parsing and Building ASTs The goal of this assignment

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012 Predictive Parsers LL(k) Parsing Can we avoid backtracking? Yes, if for a given input symbol and given nonterminal, we can choose the alternative appropriately. This is possible if the first terminal of

Course Overview. Introduction (Chapter 1) Compiler Frontend: Today. Compiler Backend: Course Overview Introduction (Chapter 1) Compiler Frontend: Today Lexical Analysis & Parsing (Chapter 2,3,4) Semantic Analysis (Chapter 5) Activation Records (Chapter 6) Translation to Intermediate Code

Wednesday, September 9, 15. Parsers Parsers What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs: What is a parser Parsers A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda

Part 3. Syntax analysis. Syntax analysis 96 Part 3 Syntax analysis Syntax analysis 96 Outline 1. Introduction 2. Context-free grammar 3. Top-down parsing 4. Bottom-up parsing 5. Conclusion and some practical considerations Syntax analysis 97 Structure

3. Parsing. Oscar Nierstrasz 3. Parsing Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/ http://www.cs.purdue.edu/homes/hosking/

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! Any questions about the syllabus?! Course Material available at www.cs.unic.ac.cy/ioanna! Next time reading assignment [ALSU07]

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax Susan Eggers 1 CSE 401 Syntax Analysis/Parsing Context-free grammars (CFG s) Purpose: determine if tokens have the right form for the language (right syntactic structure) stream of tokens abstract syntax

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc. Syntax Syntax Syntax defines what is grammatically valid in a programming language Set of grammatical rules E.g. in English, a sentence cannot begin with a period Must be formal and exact or there will

4 (c) parsing. Parsing. Top down vs. bottom up parsing 4 (c) parsing Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces strings A parser constructs a parse tree for a string

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468 Parsers Xiaokang Qiu Purdue University ECE 468 August 31, 2018 What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure

Programming Assignment 2 LALR Parsing and Building ASTs Lund University Computer Science Jesper Öqvist, Görel Hedin, Niklas Fors, Christoff Bürger Compilers EDAN65 2018-09-10 Programming Assignment 2 LALR Parsing and Building ASTs The goal of this assignment

Syntactic Analysis. Top-Down Parsing Syntactic Analysis Top-Down Parsing Copyright 2017, Pedro C. Diniz, all rights reserved. Students enrolled in Compilers class at University of Southern California (USC) have explicit permission to make

Abstract Syntax Trees & Top-Down Parsing Review of Parsing Abstract Syntax Trees & Top-Down Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s ∈ L(G)? A parse tree

Building Compilers with Phoenix Building Compilers with Phoenix Syntax-Directed Translation Structure of a Compiler Character Stream Intermediate Representation Lexical Analyzer Machine-Independent Optimizer token stream Intermediate

Top down vs. bottom up parsing Parsing A grammar describes the strings that are syntactically legal A recogniser simply accepts or rejects strings A generator produces sentences in the language described by the grammar A parser constructs

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing 8 Parsing Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces strings A parser constructs a parse tree for a string

Types of parsing. CMSC 430 Lecture 4, Page 1 Types of parsing Top-down parsers start at the root of derivation tree and fill in picks a production and tries to match the input may require backtracking some grammars are backtrack-free (predictive)

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4) CS1622 Lecture 9 Parsing (4) CS 1622 Lecture 9 1 Today Example of a recursive descent parser Predictive & LL(1) parsers Building parse tables CS 1622 Lecture 9 2 A Recursive Descent Parser. Preliminaries

CSCI312 Principles of Programming Languages Copyright 2006 The McGraw-Hill Companies, Inc. CSCI312 Principles of Programming Languages LL Parsing Xu Liu Derived from Keith Cooper's COMP 412 at Rice University Recap Copyright 2006 The McGraw-Hill

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev Fall 2017-2018 Compiler Principles Lecture 2: LL parsing Roman Manevich Ben-Gurion University of the Negev 1 Books Compilers Principles, Techniques, and Tools Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman

Lexical and Syntax Analysis. Top-Down Parsing Lexical and Syntax Analysis Top-Down Parsing Easy for humans to write and understand String of characters Lexemes identified String of tokens Easy for programs to transform Data structure Syntax A syntax

2.2 Syntax Definition 42 CHAPTER 2. A SIMPLE SYNTAX-DIRECTED TRANSLATOR sequence of "three-address" instructions; a more complete example appears in Fig. 2.2. This form of intermediate code takes its name from instructions

shift-reduce parsing Parsing #2 Bottom-up Parsing Rightmost derivations; use of rules from right to left Uses a stack to push symbols the concatenation of the stack symbols with the rest of the input forms a valid bottom-up

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev Fall 2016-2017 Compiler Principles Lecture 2: LL parsing Roman Manevich Ben-Gurion University of the Negev 1 Books Compilers Principles, Techniques, and Tools Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman

Fall Compiler Principles Lecture 3: Parsing part 2. Roman Manevich Ben-Gurion University Fall 2014-2015 Compiler Principles Lecture 3: Parsing part 2 Roman Manevich Ben-Gurion University Tentative syllabus Front End Intermediate Representation Optimizations Code Generation Scanning Lowering

Chapter 3. Parsing #1 Chapter 3 Parsing #1 Parser source file get next character scanner get token parser AST token A parser recognizes sequences of tokens according to some grammar and generates Abstract Syntax Trees (ASTs)

A Simple Syntax-Directed Translator Chapter 2 A Simple Syntax-Directed Translator 1-1 Introduction The analysis phase of a compiler breaks up a source program into constituent pieces and produces an internal representation for it, called

CPS 506 Comparative Programming Languages. Syntax Specification CPS 506 Comparative Programming Languages Syntax Specification Compiling Process Steps Program Lexical Analysis Convert characters into a stream of tokens Lexical Analysis Syntactic Analysis Send tokens

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7 Top-Down Parsing and Intro to Bottom-Up Parsing Lecture 7 1 Predictive Parsers Like recursive-descent but parser can predict which production to use Predictive parsers are never wrong Always able to guess

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill Syntax Analysis Björn B. Brandenburg The University of North Carolina at Chapel Hill Based on slides and notes by S. Olivier, A. Block, N. Fisher, F. Hernandez-Campos, and D. Stotts. The Big Picture Character

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38 Syntax Analysis Martin Sulzmann Martin Sulzmann Syntax Analysis 1 / 38 Syntax Analysis Objective Recognize individual tokens as sentences of a language (beyond regular languages). Example 1 (OK) Program

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7 Top-Down Parsing and Intro to Bottom-Up Parsing Lecture 7 1 Predictive Parsers Like recursive-descent but parser can predict which production to use Predictive parsers are never wrong Always able to guess

Compiler Construction Compiler Construction Introduction and overview Görel Hedin Reviderad 2013-01-22 2013 Compiler Construction 2013 F01-1 Agenda Course registration, structure, etc. Course overview Compiler Construction

Defining syntax using CFGs Defining syntax using CFGs Roadmap Last time Defined context-free grammar This time CFGs for specifying a language s syntax Language membership List grammars Resolving ambiguity CFG Review G = (N,Σ,P,S)

Compilers. Predictive Parsing. Alex Aiken Compilers Like recursive-descent but parser can predict which production to use By looking at the next few tokens No backtracking Predictive parsers accept LL(k) grammars L means left-to-right scan of input

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview Tokens and regular expressions Syntax and context-free grammars Grammar derivations More about parse trees Top-down and bottom-up

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1 Building a Parser III CS164 3:30-5:00 TT 10 Evans 1 Overview Finish recursive descent parser when it breaks down and how to fix it eliminating left recursion reordering productions Predictive parsers (aka

Compilation (Semester A, 2013/14) Compilation 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky Slides credit: Roman Manevich, Mooly Sagiv, Jeff Ullman, Eran

JastAdd an aspect-oriented compiler construction system JastAdd an aspect-oriented compiler construction system Görel Hedin Eva Magnusson Department of Computer Science, Lund University, Sweden Abstract We describe JastAdd, a Java-based system for compiler

Compiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava. Syntactic Analysis Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree. Compiler Passes Analysis of input program (front-end) character

Abstract Syntax Trees & Top-Down Parsing Abstract Syntax Trees & Top-Down Parsing Review of Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s ∈ L(G)? A parse tree

Abstract Syntax Trees & Top-Down Parsing Review of Parsing Abstract Syntax Trees & Top-Down Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s ∈ L(G)? A parse tree

A simple syntax-directed Syntax-directed is a grammar-oriented compiling technique Programming languages: Syntax: what its programs look like? Semantic: what its programs mean? 1 A simple syntax-directed Lexical Syntax Character

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview n Tokens and regular expressions n Syntax and context-free grammars n Grammar derivations n More about parse trees n Top-down and

Syntax Analysis Check syntax and construct abstract syntax tree Syntax Analysis Check syntax and construct abstract syntax tree if == = ; b 0 a b Error reporting and recovery Model using context free grammars Recognize using Push down automata/table Driven Parsers

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon's slides that they prepared for COMP 412 at Rice. Parsing Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon's slides that they prepared for COMP 412 at Rice. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students

CSE P 501 Compilers. Implementing ASTs (in Java) Hal Perkins Winter /22/ Hal Perkins & UW CSE H-1 CSE P 501 Compilers Implementing ASTs (in Java) Hal Perkins Winter 2008 1/22/2008 2002-08 Hal Perkins & UW CSE H-1 Agenda Representing ASTs as Java objects Parser actions Operations on ASTs Modularity

CSE P 501 Compilers. Implementing ASTs (in Java) Hal Perkins Autumn /20/ Hal Perkins & UW CSE H-1 CSE P 501 Compilers Implementing ASTs (in Java) Hal Perkins Autumn 2009 10/20/2009 2002-09 Hal Perkins & UW CSE H-1 Agenda Representing ASTs as Java objects Parser actions Operations on ASTs Modularity

Context-free grammars Context-free grammars Section 4.2 Formal way of specifying rules about the structure/syntax of a program terminals - tokens non-terminals - represent higher-level structures of a program start symbol,

Related Course Objectives Syntax 9/18/17 1 Related Course Objectives Develop grammars and parsers of programming languages 9/18/17 2 Syntax And Semantics Programming language syntax: how programs look, their form and structure Syntax

Syntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree. Syntactic Analysis Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree. Compiler Passes Analysis of input program (front-end) character

CMPT 379 Compilers. Parse trees CMPT 379 Compilers Anoop Sarkar http://www.cs.sfu.ca/~anoop 10/25/07 1 Parse trees Given an input program, we convert the text into a parse tree Moving to the backend of the compiler: we will produce intermediate

Syntax Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Syntax Analysis (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay September 2007 College of Engineering, Pune Syntax Analysis: 2/124 Syntax

Compiler Construction: Parsing Compiler Construction: Parsing Mandar Mitra Indian Statistical Institute M. Mitra (ISI) Parsing 1 / 33 Context-free grammars. Reference: Section 4.2 Formal way of specifying rules about the structure/syntax

Lexical and Syntax Analysis Lexical and Syntax Analysis (of Programming Languages) Top-Down Parsing Lexical and Syntax Analysis (of Programming Languages) Top-Down Parsing Easy for humans to write and understand String of characters

Compiler Theory. (Semantic Analysis and Run-Time Environments) Compiler Theory (Semantic Analysis and Run-Time Environments) 005 Semantic Actions A compiler must do more than recognise whether a sentence belongs to the language of a grammar it must do something useful

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2). LL(k) Grammars We need a bunch of terminology. For any terminal string a we write First k (a) is the prefix of a of length k (or all of a if its length is less than k) For any string g of terminal and

CS 406/534 Compiler Construction Parsing Part I CS 406/534 Compiler Construction Parsing Part I Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy and Dr.

Defining syntax using CFGs Defining syntax using CFGs Roadmap Last time Defined context-free grammar This time CFGs for syntax design Language membership List grammars Resolving ambiguity CFG Review G = (N,Σ,P,S) means derives derives

Parsing II Top-down parsing. Comp 412 COMP 412 FALL 2018 Parsing II Top-down parsing Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled

CS 536 Midterm Exam Spring 2013 CS 536 Midterm Exam Spring 2013 ID: Exam Instructions: Write your student ID (not your name) in the space provided at the top of each page of the exam. Write all your answers on the exam itself. Feel free

A programming language requires two major definitions A simple one pass compiler A programming language requires two major definitions A simple one pass compiler [Syntax: what the language looks like A context-free grammar written in BNF (Backus-Naur Form) usually suffices. [Semantics:

CSE 3302 Programming Languages Lecture 2: Syntax CSE 3302 Programming Languages Lecture 2: Syntax (based on slides by Chengkai Li) Leonidas Fegaras University of Texas at Arlington CSE 3302 L2 Spring 2011 1 How do we define a PL? Specifying a PL: Syntax:

Monday, September 13, Parsers Parsers Agenda Terminology LL(1) Parsers Overview of LR Parsing Terminology Grammar G = (Vt, Vn, S, P) Vt is the set of terminals Vn is the set of non-terminals S is the start symbol P is the set of productions

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1 CSE P 501 Compilers Parsing & Context-Free Grammars Hal Perkins Spring 2018 UW CSE P 501 Spring 2018 C-1 Administrivia Project partner signup: please find a partner and fill out the signup form by noon

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott Introduction programming languages need to be precise natural languages less so both form (syntax) and meaning

Compiler Design Concepts. Syntax Analysis Compiler Design Concepts Syntax Analysis Introduction First task is to break up the text into meaningful words called tokens. newval=oldval+12 id = id + num Token Stream Lexical Analysis Source Code (High

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011 Syntax Analysis COMP 524: Programming Languages Srinivas Krishnan January 25, 2011 Based in part on slides and notes by Bjoern Brandenburg, S. Olivier and A. Block. 1 The Big Picture Character Stream Token

SYNTAX ANALYSIS 1. Define parser. Hierarchical analysis is one in which the tokens are grouped hierarchically into nested collections with collective meaning. Also termed as Parsing. 2. Mention the basic

CS2210: Compiler Construction Syntax Analysis Syntax Analysis Comparison with Lexical Analysis The second phase of compilation Phase Input Output Lexer string of characters string of tokens Parser string of tokens Parse tree/ast What Parse Tree? CS2210: Compiler

Introduction to Parsing. Lecture 5 Introduction to Parsing Lecture 5 1 Outline Regular languages revisited Parser overview Context-free grammars (CFG s) Derivations Ambiguity 2 Languages and Automata Formal languages are very important

Bottom-Up Parsing. Lecture 11-12 Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 9/22/06 Prof. Hilfinger CS164 Lecture 11 1 Bottom-Up Parsing Bottom-up parsing is more general than top-down parsing And just as efficient

In this simple example, it is quite clear that there are exactly two strings that match the above grammar, namely: abc and abcc JavaCC: LOOKAHEAD MiniTutorial 1. WHAT IS LOOKAHEAD The job of a parser is to read an input stream and determine whether or not the input stream conforms to the grammar. This determination in its most

Parsing III. (Top-down parsing: recursive descent & LL(1) ) Parsing III (Top-down parsing: recursive descent & LL(1) ) Roadmap (Where are we?) Previously We set out to study parsing Specifying syntax Context-free grammars Ambiguity Top-down parsers Algorithm &

Syntax Analysis, III Comp 412 COMP 412 FALL 2017 Syntax Analysis, III Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp

Compilers Course Lecture 4: Context Free Grammars Compilers Course Lecture 4: Context Free Grammars Example: attempt to define simple arithmetic expressions using named regular expressions: num = [0-9]+ sum = expr "+" expr expr = "(" sum ")" num Appears

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Parsing III (Top-down parsing: recursive descent & LL(1) ) (Bottom-up parsing) CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Copyright 2003, Keith D. Cooper,

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1 CSE P 501 Compilers Parsing & Context-Free Grammars Hal Perkins Winter 2008 1/15/2008 2002-08 Hal Perkins & UW CSE C-1 Agenda for Today Parsing overview Context free grammars Ambiguous grammars Reading:

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1 Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery Last modified: Mon Feb 23 10:05:56 2015 CS164: Lecture #14 1 Shift/Reduce Conflicts If a DFA state contains both [X: α aβ, b] and [Y: γ, a],

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation Concepts Introduced in Chapter 2 A more detailed overview of the compilation process. Parsing Scanning Semantic Analysis Syntax-Directed Translation Intermediate Code Generation Context-Free Grammar A

More Bottom-Up Parsing More Bottom-Up Parsing Lecture 7 Dr. Sean Peisert ECS 142 Spring 2009 1 Status Project 1 Back By Wednesday (ish) savior lexer in ~cs142/s09/bin Project 2 Due Friday, Apr. 24, 11:55pm My office hours 3pm
