Abstract Syntax Trees & Top-Down Parsing


Review of Parsing
Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree.
Issues:
How do we recognize that s ∈ L(G)?
A parse tree of s describes how s ∈ L(G).
Ambiguity: more than one parse tree (possible interpretation) for some string s.
Error: no parse tree for some string s.
How do we construct the parse tree?

Abstract Syntax Trees
So far, a parser traces the derivation of a sequence of tokens.
The rest of the compiler needs a structural representation of the program.
Abstract syntax trees are like parse trees but ignore some details.
Abbreviated as AST.

Abstract Syntax Trees (Cont.)
Consider the grammar
E → int | ( E ) | E + E
and the string 5 + (2 + 3).
After lexical analysis (a list of tokens): int5 + ( int2 + int3 )
During parsing we build a parse tree.

Example of Parse Tree
[Figure: parse tree for 5 + (2 + 3), built from the productions E → int, E → ( E ), and E → E + E.]
Traces the operation of the parser.
Captures the nesting structure.
But too much info: parentheses, single-successor nodes.

Example of Abstract Syntax Tree
[Figure: the AST PLUS(5, PLUS(2, 3)).]
Also captures the nesting structure.
But abstracts from the concrete syntax: more compact and easier to use.
An important data structure in a compiler.

Semantic Actions
This is what we will use to construct ASTs.
Each grammar symbol may have attributes.
An attribute is a property of a programming language construct.
For terminal symbols (lexical tokens), attributes can be calculated by the lexer.
Each production may have an action, written as:
X → Y1 ... Yn   { action }
that can refer to or compute symbol attributes.

Semantic Actions: An Example
Consider the grammar
E → int | E + E | ( E )
For each symbol X define an attribute X.val:
For terminals, val is the associated lexeme.
For non-terminals, val is the expression's value (which is computed from the values of subexpressions).
We annotate the grammar with actions:
E → int       { E.val = int.val }
  | E1 + E2   { E.val = E1.val + E2.val }
  | ( E1 )    { E.val = E1.val }
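
The following is a minimal sketch (not from the slides) of how these semantic actions compute the synthesized attribute E.val, assuming the parse tree is represented as nested Python tuples; the node tags "int", "plus", and "paren" are invented for illustration.

    def eval_E(node):
        """Compute the synthesized attribute E.val bottom-up over a parse tree."""
        kind = node[0]
        if kind == "int":                 # E -> int        { E.val = int.val }
            return node[1]
        if kind == "plus":                # E -> E1 + E2    { E.val = E1.val + E2.val }
            return eval_E(node[1]) + eval_E(node[2])
        if kind == "paren":               # E -> ( E1 )     { E.val = E1.val }
            return eval_E(node[1])
        raise ValueError("unknown node kind: %r" % kind)

    # Parse tree for 5 + (2 + 3), written as nested tuples.
    tree = ("plus", ("int", 5), ("paren", ("plus", ("int", 2), ("int", 3))))
    print(eval_E(tree))                   # prints 10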

Semantic Actions: An Example (Cont.)
String: 5 + (2 + 3)
Tokens: int5 + ( int2 + int3 )

Productions           Equations
E  → E1 + E2          E.val = E1.val + E2.val
E1 → int5             E1.val = int5.val = 5
E2 → ( E3 )           E2.val = E3.val
E3 → E4 + E5          E3.val = E4.val + E5.val
E4 → int2             E4.val = int2.val = 2
E5 → int3             E5.val = int3.val = 3

Semantic Actions: Dependencies
Semantic actions specify a system of equations.
The order of executing the actions is not specified.
Example: E3.val = E4.val + E5.val
Must compute E4.val and E5.val before E3.val.
We say that E3.val depends on E4.val and E5.val.
The parser must find the order of evaluation.

Dependency Graph
[Figure: parse tree for 5 + (2 + 3) with one val slot per E node and arrows showing which attributes each val depends on.]
Each node labeled with a non-terminal E has one slot for its val attribute.
Note the dependencies.

Evaluating Attributes
An attribute must be computed after all its successors in the dependency graph have been computed.
In the previous example, attributes can be computed bottom-up.
Such an order exists when there are no cycles.
Cyclically defined attributes are not legal.

Semantic Actions: Notes (Cont.)
Synthesized attributes:
Calculated from attributes of descendants in the parse tree.
E.val is a synthesized attribute.
Can always be calculated in a bottom-up order.
Grammars with only synthesized attributes are called S-attributed grammars.
The most common kind of grammar.

Inherited Attributes
Another kind of attribute.
Calculated from attributes of the parent node(s) and/or siblings in the parse tree.
Example: a line calculator.

A Line Calculator
Each line contains an expression
E → int | E + E
Each line is terminated with the = sign
L → E = | + E =
In the second form, the value of the previous line is used as the starting value.
A program is a sequence of lines
P → ε | P L

Attributes for the Line Calculator
Each E has a synthesized attribute val, calculated as before.
Each L has a synthesized attribute val:
L → E =     { L.val = E.val }
  | + E =   { L.val = E.val + L.prev }
We need the value of the previous line.
We use an inherited attribute L.prev.

Attributes for the Line Calculator (Cont.)
Each P has a synthesized attribute val: the value of its last line.
P → ε      { P.val = 0 }
  | P1 L   { P.val = L.val; L.prev = P1.val }
Each L has an inherited attribute prev.
L.prev is inherited from the sibling attribute P1.val.

Example of Inherited Attributes
[Figure: parse tree for a program whose last line has the form + E = (here + 2 + 3 =), with val attributes (synthesized) computed bottom-up and prev attributes (inherited) passed from the sibling P, which contributes 0 via P → ε.]
All can be computed in depth-first order.

Semantic Actions: Notes (Cont.)
Semantic actions can be used to build ASTs.
And many other things as well: also used for type checking, code generation, ...
The process is called syntax-directed translation.
A substantial generalization over CFGs.

Constructing an AST
We first define the AST data type.
Consider an abstract tree type with two constructors:
mkleaf(n) builds a leaf node holding the integer n.
mkplus(T1, T2) builds a PLUS node with subtrees T1 and T2.
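
A small sketch of the line-calculator attribute flow, assuming each line has already been reduced to its E.val; the tags "line" and "cont" are invented names for the two forms of L (E = and + E =).

    def eval_program(lines):
        """Value of the last line of a line-calculator program."""
        prev = 0                           # P -> ε  { P.val = 0 }
        for kind, e_val in lines:          # e_val stands for the already-computed E.val
            if kind == "line":             # L -> E =    { L.val = E.val }
                prev = e_val
            else:                          # L -> + E =  { L.val = E.val + L.prev }
                prev = e_val + prev        # L.prev: inherited value of the previous line
        return prev                        # P.val is the value of its last line

    # "5 ="  followed by  "+ 3 ="  evaluates to 8.
    print(eval_program([("line", 5), ("cont", 3)]))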

Constructing a Parse Tree
We define a synthesized attribute ast.
Values of ast are ASTs.
We assume that int.lexval is the value of the integer lexeme.
Computed using semantic actions:
E → int       { E.ast = mkleaf(int.lexval) }
  | E1 + E2   { E.ast = mkplus(E1.ast, E2.ast) }
  | ( E1 )    { E.ast = E1.ast }

Parse Tree Example
Consider the string int5 + ( int2 + int3 ).
A bottom-up evaluation of the ast attribute:
E.ast = mkplus(mkleaf(5), mkplus(mkleaf(2), mkleaf(3)))
[Figure: the resulting AST, PLUS(5, PLUS(2, 3)).]

Review of Abstract Syntax Trees
We can specify language syntax using a CFG.
A parser will answer whether s ∈ L(G) and will build a parse tree,
which we convert to an AST and pass on to the rest of the compiler.
Next two and a half lectures: how do we answer s ∈ L(G) and build a parse tree?
After that: from AST to assembly language.

Second Half of Lecture: Outline
Implementation of parsers.
Two approaches: top-down and bottom-up.
These slides: top-down.
Easier to understand and program manually.
Then: bottom-up.
More powerful and used by most parser generators.
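
A sketch of the two constructors as plain Python tuples; the "INT" and "PLUS" tags are an assumed representation, not one given in the slides.

    def mkleaf(n):
        """Leaf constructor: an integer leaf."""
        return ("INT", n)

    def mkplus(t1, t2):
        """PLUS constructor: a PLUS node with two subtrees."""
        return ("PLUS", t1, t2)

    # E.ast for int5 + ( int2 + int3 ); note that the ( E ) production adds no node.
    ast = mkplus(mkleaf(5), mkplus(mkleaf(2), mkleaf(3)))
    print(ast)    # ('PLUS', ('INT', 5), ('PLUS', ('INT', 2), ('INT', 3)))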

Introduction to Top-Down Parsing
Terminals are seen in order of appearance in the token stream: t2 t5 t6 t8 t9.
The parse tree is constructed from the top, and from left to right.
[Figure: a parse tree with internal nodes A, B, C, D whose leaves t2, t5, t6, t8, t9 appear left to right.]

Recursive Descent Parsing: Example
Consider the grammar
E → T + E | T
T → ( E ) | int | int * T
Token stream is: int5 * int2
Start with the top-level non-terminal E.
Try the rules for E in order.

Recursive Descent Parsing: Example (Cont.)
Try E0 → T1 + E2.
Then try a rule for T1 → ( E3 ), but ( does not match the input token int5.
Try T1 → int. The token matches, but + after T1 does not match the input token *.
Try T1 → int * T2. This will match and will consume the two tokens.
Try T2 → int (matches), but + after T1 will be unmatched.
Try T2 → int * T3, but * does not match the end of input.
We have exhausted the choices for T1.
Backtrack to the choice for E0.

Recursive Descent Parsing: Example (Cont.)
Try E0 → T1.
Follow the same steps as before for T1,
and succeed with T1 → int5 * T2 and T2 → int2.
[Figure: the resulting parse tree: E0 with child T1; T1 with children int5, *, T2; T2 with child int2.]
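
A compact sketch of a backtracking recursive-descent parser for this grammar. To stay short it tries T → int * T before T → int (whereas the slides try alternatives in grammar order and backtrack), which is enough to accept int5 * int2; it is an illustration, not the slides' parser.

    def parse_E(toks, i):
        """Try E -> T + E, then fall back to E -> T; return the next index or None."""
        j = parse_T(toks, i)
        if j is not None and j < len(toks) and toks[j] == "+":
            k = parse_E(toks, j + 1)
            if k is not None:
                return k
        return parse_T(toks, i)            # backtrack to E -> T

    def parse_T(toks, i):
        if i < len(toks) and toks[i] == "(":                   # T -> ( E )
            j = parse_E(toks, i + 1)
            if j is not None and j < len(toks) and toks[j] == ")":
                return j + 1
        if i < len(toks) and toks[i] == "int":
            if i + 1 < len(toks) and toks[i + 1] == "*":       # T -> int * T
                j = parse_T(toks, i + 2)
                if j is not None:
                    return j
            return i + 1                                       # T -> int
        return None                                            # no alternative matched

    toks = ["int", "*", "int"]             # int5 * int2
    print(parse_E(toks, 0) == len(toks))   # True: the whole token stream is derived from E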

Recursive Descent Parsing: Notes
Easy to implement by hand.
Somewhat inefficient (due to backtracking).
But it does not always work.

When Recursive Descent Does Not Work
Consider a production S → S a:
bool S1() { return S() && term(a); }
bool S()  { return S1(); }
S() will get into an infinite loop.
A left-recursive grammar has a non-terminal S with S →+ S α for some α.
Recursive descent does not work in such cases; it goes into an infinite loop.

Elimination of Left Recursion
Consider the left-recursive grammar
S → S α | β
S generates all strings starting with a β and followed by any number of α's.
The grammar can be rewritten using right-recursion:
S  → β S'
S' → α S' | ε

More Elimination of Left-Recursion
In general
S → S α1 | ... | S αn | β1 | ... | βm
All strings derived from S start with one of β1, ..., βm and continue with several instances of α1, ..., αn.
Rewrite as
S  → β1 S' | ... | βm S'
S' → α1 S' | ... | αn S' | ε
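
A sketch showing that, after the rewriting, recursive descent no longer loops: the left-recursive S → S a | b becomes S → b S' and S' → a S' | ε, and each call consumes input or returns.

    def parse_S(toks, i):
        """S -> b S'   (the left-recursive S -> S a | b after rewriting)."""
        if i < len(toks) and toks[i] == "b":
            return parse_Sprime(toks, i + 1)
        return None

    def parse_Sprime(toks, i):
        """S' -> a S' | ε"""
        if i < len(toks) and toks[i] == "a":
            return parse_Sprime(toks, i + 1)
        return i                               # ε: consume nothing

    print(parse_S(["b", "a", "a"], 0) == 3)    # True: "baa" is derived from S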

General Left Recursion
The grammar
S → A α | δ
A → S β
is also left-recursive because S →+ S β α.
This left-recursion can also be eliminated.

Summary of Recursive Descent
Simple and general parsing strategy.
Left-recursion must be eliminated first, but that can be done automatically.
Unpopular because of backtracking: thought to be too inefficient.
In practice, backtracking is eliminated by restricting the grammar.
[See a compilers book for a general algorithm.]

Predictive Parsers
Like recursive descent, but the parser can predict which production to use
by looking at the next few tokens, with no backtracking.
Predictive parsers accept LL(k) grammars:
the first L means left-to-right scan of the input,
the second L means leftmost derivation,
k means predict based on k tokens of lookahead.
In practice, LL(1) is used.

LL(1) Languages
In recursive descent, for each non-terminal and input token there may be a choice of productions.
LL(1) means that for each non-terminal and token there is only one production that could lead to success.
Can be specified via a 2D table:
one dimension for the current non-terminal to expand,
one dimension for the next token.
A table entry contains one production.

Predictive Parsing and Left Factoring
Recall the grammar for arithmetic expressions
E → T + E | T
T → ( E ) | int | int * T
Hard to predict because:
For T, two productions start with int.
For E, it is not clear how to predict.
A grammar must be left-factored before it is used for predictive parsing.

Left-Factoring Example
Recall the grammar
E → T + E | T
T → ( E ) | int | int * T
Factor out common prefixes of productions:
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
This grammar is equivalent to the original one.

LL(1) Parsing Table Example
Left-factored grammar:
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
The LL(1) parsing table ($ is the end marker):

      int      *      +      (        )    $
  E   T X                    T X
  X                   + E             ε    ε
  T   int Y                  ( E )
  Y            * T                    ε    ε

LL(1) Parsing Table Example (Cont.)
Consider the [E, int] entry:
when the current non-terminal is E and the next input is int, use production E → T X.
This production can generate an int in the first place.
Consider the [Y, +] entry:
when the current non-terminal is Y and the current token is +, get rid of Y.
Y can be followed by + only in a derivation in which Y → ε.
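
The same table written as a Python dict, as a sketch: an empty list [] stands for the ε production and "$" for the end marker. It is the form consumed by the table-driven parser sketched after the LL(1) parsing algorithm below.

    TABLE = {
        ("E", "int"): ["T", "X"],    ("E", "("): ["T", "X"],
        ("X", "+"):   ["+", "E"],    ("X", ")"): [],              ("X", "$"): [],
        ("T", "int"): ["int", "Y"],  ("T", "("): ["(", "E", ")"],
        ("Y", "*"):   ["*", "T"],    ("Y", "+"): [],
        ("Y", ")"):   [],            ("Y", "$"): [],
    }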

LL(1) Parsing Tables: Errors
Blank entries indicate error situations.
Consider the [E, *] entry:
there is no way to derive a string starting with * from non-terminal E.

Using Parsing Tables
Method similar to recursive descent, except:
For each non-terminal X, we look at the next token a
and choose the production shown at [X, a].
We use a stack to keep track of pending non-terminals.
We reject when we encounter an error state.
We accept when we encounter end-of-input.

LL(1) Parsing Algorithm
initialize stack = <S $> and next
repeat
  case stack of
    <X, rest> : if T[X, *next] == Y1 ... Yn
                then stack ← <Y1 ... Yn rest>
                else error();
    <t, rest> : if t == *next++
                then stack ← <rest>
                else error();
until stack == < >

LL(1) Parsing Example
(using the LL(1) parsing table shown above)

Stack           Input          Action
E $             int * int $    T X
T X $           int * int $    int Y
int Y X $       int * int $    terminal
Y X $           * int $        * T
* T X $         * int $        terminal
T X $           int $          int Y
int Y X $       int $          terminal
Y X $           $              ε
X $             $              ε
$               $              ACCEPT
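
A table-driven LL(1) parser sketch using the TABLE dict from the earlier sketch: an explicit stack, no recursion, and the non-terminal set hard-coded for this grammar.

    NONTERMINALS = {"E", "X", "T", "Y"}

    def ll1_parse(tokens, table, start="E"):
        """Table-driven LL(1) parsing with an explicit stack (no recursion)."""
        toks = tokens + ["$"]              # append the end marker
        stack = [start, "$"]               # parser stack, top of stack at index 0
        pos = 0
        while stack:
            top = stack.pop(0)
            nxt = toks[pos]
            if top in NONTERMINALS:        # non-terminal: consult the table
                if (top, nxt) not in table:
                    return False           # blank entry: syntax error
                stack = table[(top, nxt)] + stack   # push the chosen RHS
            else:                          # terminal (or $): must match the input
                if top != nxt:
                    return False
                pos += 1
        return pos == len(toks)            # accept when stack and input are both consumed

    print(ll1_parse(["int", "*", "int"], TABLE))    # True
    print(ll1_parse(["int", "+"], TABLE))           # False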

Constructing Parsing Tables
LL(1) languages are those defined by a parsing table for the LL(1) algorithm where no table entry is multiply defined.
Once we have the table, parsing is simple and fast, and no backtracking is necessary.

Constructing Parsing Tables (Cont.)
If A → α, where in the line of A do we place α?
In the column of t where t can start a string derived from α:
α →* t β
We say that t ∈ First(α).
In the column of t if α is ε and t can follow an A:
S →* β A t δ
We say that t ∈ Follow(A).
We want to generate parsing tables from the CFG.

Computing First Sets
Definition: First(X) = { t | X →* t α } ∪ { ε | X →* ε }
Algorithm sketch:
1. First(t) = { t }
2. ε ∈ First(X) if X → ε is a production
3. ε ∈ First(X) if X → A1 ... An and ε ∈ First(Ai) for each 1 ≤ i ≤ n
4. First(α) ⊆ First(X) if X → A1 ... An α and ε ∈ First(Ai) for each 1 ≤ i ≤ n

Computing First Sets (Cont.)
Definition: First(X) = { t | X →* t α } ∪ { ε | X →* ε }
More constructive algorithm:
1. First(t) = { t }
2. For all productions X → A1 ... An:
   Add First(A1) − {ε} to First(X). Stop if ε ∉ First(A1).
   Add First(A2) − {ε} to First(X). Stop if ε ∉ First(A2).
   ...
   Add First(An) − {ε} to First(X). Stop if ε ∉ First(An).
   Add {ε} to First(X).
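
A sketch of the First-set fixpoint computation for a grammar given as a dict from non-terminal to a list of right-hand sides (each a list of symbols); "eps" is an invented token standing for ε.

    GRAMMAR = {
        "E": [["T", "X"]],
        "X": [["+", "E"], []],
        "T": [["(", "E", ")"], ["int", "Y"]],
        "Y": [["*", "T"], []],
    }

    def first_sets(grammar):
        """Fixpoint computation of First for every non-terminal ('eps' stands for ε)."""
        first = {nt: set() for nt in grammar}

        def first_of(sym):
            return first[sym] if sym in grammar else {sym}   # First(t) = {t} for terminals

        changed = True
        while changed:
            changed = False
            for nt, rhss in grammar.items():
                for rhs in rhss:
                    before = len(first[nt])
                    for sym in rhs:
                        first[nt] |= first_of(sym) - {"eps"}
                        if "eps" not in first_of(sym):
                            break
                    else:                    # every symbol of rhs can derive ε (or rhs is empty)
                        first[nt].add("eps")
                    changed |= len(first[nt]) != before
        return first

    print(first_sets(GRAMMAR))
    # e.g. First(E) = {'(', 'int'}, First(X) = {'+', 'eps'}, First(Y) = {'*', 'eps'}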

First Sets: Example
Recall the grammar
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
First sets:
First( ( ) = { ( }
First( ) ) = { ) }
First( int ) = { int }
First( + ) = { + }
First( * ) = { * }
First( T ) = { int, ( }
First( E ) = { int, ( }
First( X ) = { +, ε }
First( Y ) = { *, ε }

Computing Follow Sets
Definition: Follow(X) = { t | S →* β X t δ }
Intuition:
If X → A B then First(B) ⊆ Follow(A) and Follow(X) ⊆ Follow(B).
Also, if B →* ε then Follow(X) ⊆ Follow(A).
If S is the start symbol then $ ∈ Follow(S).

Computing Follow Sets (Cont.)
Algorithm sketch:
1. $ ∈ Follow(S)
2. First(β) − {ε} ⊆ Follow(X) for each production A → α X β
3. Follow(A) ⊆ Follow(X) for each production A → α X β where ε ∈ First(β)

Computing Follow Sets (Cont.)
Definition: Follow(X) = { t | S →* β X t δ }
More constructive algorithm:
1. First compute the First sets for all non-terminals.
2. If S is the start symbol, add $ to Follow(S).
3. For all productions Y → ... X A1 ... An:
   Add First(A1) − {ε} to Follow(X). Stop if ε ∉ First(A1).
   Add First(A2) − {ε} to Follow(X). Stop if ε ∉ First(A2).
   ...
   Add First(An) − {ε} to Follow(X). Stop if ε ∉ First(An).
   Add Follow(Y) to Follow(X).
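
A sketch of the corresponding Follow-set fixpoint computation, reusing GRAMMAR and first_sets from the First-set sketch above.

    def follow_sets(grammar, first, start="E"):
        """Fixpoint computation of Follow for every non-terminal."""
        follow = {nt: set() for nt in grammar}
        follow[start].add("$")                       # $ ∈ Follow(start symbol)

        def first_of(sym):
            return first[sym] if sym in grammar else {sym}

        changed = True
        while changed:
            changed = False
            for lhs, rhss in grammar.items():
                for rhs in rhss:
                    for i, sym in enumerate(rhs):
                        if sym not in grammar:       # only non-terminals have Follow sets
                            continue
                        before = len(follow[sym])
                        rest_nullable = True
                        for nxt in rhs[i + 1:]:      # First(β) - {ε} ⊆ Follow(sym)
                            follow[sym] |= first_of(nxt) - {"eps"}
                            if "eps" not in first_of(nxt):
                                rest_nullable = False
                                break
                        if rest_nullable:            # Follow(lhs) ⊆ Follow(sym)
                            follow[sym] |= follow[lhs]
                        changed |= len(follow[sym]) != before
        return follow

    first = first_sets(GRAMMAR)
    print(follow_sets(GRAMMAR, first))
    # e.g. Follow(E) = {')', '$'}, Follow(T) = Follow(Y) = {'+', ')', '$'}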

Follow Sets: Example
Recall the grammar
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
Follow sets:
Follow( + ) = { int, ( }
Follow( * ) = { int, ( }
Follow( ( ) = { int, ( }
Follow( E ) = { ), $ }
Follow( X ) = { ), $ }
Follow( T ) = { +, ), $ }
Follow( ) ) = { +, ), $ }
Follow( Y ) = { +, ), $ }
Follow( int ) = { *, +, ), $ }

Constructing LL(1) Parsing Tables
Construct a parsing table T for CFG G.
For each production A → α in G do:
For each terminal t ∈ First(α), do T[A, t] = α.
If ε ∈ First(α), for each t ∈ Follow(A), do T[A, t] = α.
If ε ∈ First(α) and $ ∈ Follow(A), do T[A, $] = α.

Notes on LL(1) Parsing Tables
If any entry is multiply defined then G is not LL(1):
if G is ambiguous,
if G is left-recursive,
if G is not left-factored,
and in other cases as well.
Most programming language grammars are not LL(1).
There are tools that build LL(1) tables.

Review
For some grammars there is a simple parsing strategy: predictive parsing (LL(1)).
Next time: a more powerful parsing strategy.
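
Finally, a sketch tying the pieces together: the table-construction rule from the "Constructing LL(1) Parsing Tables" slide, reusing GRAMMAR, first_sets, and follow_sets from the sketches above.

    def build_ll1_table(grammar, first, follow):
        """Place A -> α under every t in First(α); if α can derive ε,
        also under every t in Follow(A), including $."""
        def first_of_string(rhs):
            out = set()
            for sym in rhs:
                sym_first = first[sym] if sym in grammar else {sym}
                out |= sym_first - {"eps"}
                if "eps" not in sym_first:
                    return out
            out.add("eps")                           # the whole RHS can derive ε
            return out

        table = {}
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                f = first_of_string(rhs)
                targets = f - {"eps"}
                if "eps" in f:
                    targets |= follow[lhs]           # Follow already contains $ when appropriate
                for t in targets:
                    if (lhs, t) in table:            # multiply defined entry: not LL(1)
                        raise ValueError(f"grammar is not LL(1) at {(lhs, t)}")
                    table[(lhs, t)] = rhs
        return table

    first = first_sets(GRAMMAR)
    follow = follow_sets(GRAMMAR, first)
    print(build_ll1_table(GRAMMAR, first, follow)[("E", "int")])   # ['T', 'X']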