Computer Science 160 Translation of Programming Languages

Similar documents
Syntactic Analysis. Top-Down Parsing

CS 406/534 Compiler Construction Parsing Part I

Parsing Part II (Top-down parsing, left-recursion removal)

Types of parsing. CMSC 430 Lecture 4, Page 1

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Syntax Analysis, III Comp 412

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

CSCI312 Principles of Programming Languages

Syntax Analysis, III Comp 412

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Syntactic Analysis. Top-Down Parsing. Parsing Techniques. Top-Down Parsing. Remember the Expression Grammar? Example. Example

Introduction to Parsing. Comp 412

Parsing II Top-down parsing. Comp 412

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

3. Parsing. Oscar Nierstrasz

CA Compiler Construction

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Parsing II Top-down parsing. Comp 412

CS 314 Principles of Programming Languages

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Chapter 4: LR Parsing

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Context-free grammars

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Academic Formalities. CS Modern Compilers: Theory and Practise. Images of the day. What, When and Why of Compilers

4. Lexical and Syntax Analysis

COMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar

4. Lexical and Syntax Analysis

Compiler Construction: Parsing

A programming language requires two major definitions A simple one pass compiler

Compilers. Bottom-up Parsing. (original slides by Sam

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

Top down vs. bottom up parsing

Bottom-up Parser. Jungsik Choi

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Chapter 4. Lexical and Syntax Analysis

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Chapter 4: Syntax Analyzer

Concepts Introduced in Chapter 4


Compiler Construction 2016/2017 Syntax Analysis

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

CMSC 330: Organization of Programming Languages

Introduction to Parsing

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

CS502: Compilers & Programming Systems

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

MIT Top-Down Parsing. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Lexical and Syntax Analysis. Top-Down Parsing

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

CSE302: Compiler Design

CS 406/534 Compiler Construction Parsing Part II LL(1) and LR(1) Parsing

SLR parsers. LR(0) items

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Syntax Analysis Part I

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

CS 230 Programming Languages

Topic 3: Syntax Analysis I

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Lexical and Syntax Analysis. Bottom-Up Parsing

컴파일러입문 제 6 장 구문분석

The role of the parser

Chapter 3. Parsing #1

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

Compiler Design Concepts. Syntax Analysis

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Lexical and Syntax Analysis

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Wednesday, August 31, Parsers

CMSC 330: Organization of Programming Languages

A parser is some system capable of constructing the derivation of any sentence in some language L(G) based on a grammar G.

Context-Free Languages and Parse Trees

LECTURE 7. Lex and Intro to Parsing

ICOM 4036 Spring 2004

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Outline CS412/413. Administrivia. Review. Grammars. Left vs. Right Recursion. More tips forll(1) grammars Bottom-up parsing LR(0) parser construction

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Monday, September 13, Parsers

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Table-Driven Top-Down Parsers

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CS 321 Programming Languages and Compilers. VI. Parsing

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

Transcription:

Computer Science 160 Translation of Programming Languages Instructor: Christopher Kruegel

Top-Down Parsing

Parsing Techniques Top-down parsers (LL(1), recursive descent parsers) Start at the root of the parse tree from the start symbol and grow toward leaves (similar to a derivation) Pick a production and try to match the input Bad pick may need to backtrack Some grammars are backtrack-free (predictive parsing) Bottom-up parsers (LR(1), shift-reduce parsers) Start at the leaves and grow toward root We can think of the process as reducing the input string to the start symbol At each reduction step, a particular substring matching the right-side of a production is replaced by the symbol on the left-side of the production Bottom-up parsers handle a large class of grammars

Top-down Parsing Algorithm Construct the root node of the parse tree, label it with the start symbol, and set the current-node to root node Repeat until all the input is consumed (i.e., until the frontier of the parse tree matches the input string) 1 If the label of the current node is a non-terminal node A, select a production with A on its lhs and, for each symbol on its rhs, construct the appropriate child 2 If the current node is a terminal symbol: If it matches the input string, consume it (advance the input pointer) If it does not match the input string, backtrack 3 Set the current node to the next node in the frontier of the parse tree If there is no node left in the frontier of the parse tree and input is not consumed, then backtrack The key is picking the right production in step 1 That choice should be guided by the input string

Example Using version with correct precedence and associativity 1 S Expr 2 Expr Expr + Term 3 Expr - Term 4 Term 5 Term Term * Factor 6 Term / Factor 7 Factor 8 Factor num 9 id And the input: x 2 * y

Example Let s try x 2 * y : S Rule Sentential Form Input - S x 2 * y 1 Expr x 2 * y 2 Expr + Term x 2 * y 4 Term + Term x 2 * y 7 Factor + Term x 2 * y 9 <id,x> + Term x 2 * y <id,x> + Term x 2 * y Expr Term Fact. Expr + Term <id,x>

Example Let s try x 2 * y : S Rule Sentential Form Input - S x 2 * y 1 Expr x 2 * y 2 Expr + Term x 2 * y 4 Term + Term x 2 * y 7 Factor + Term x 2 * y 9 <id,x> + Term x 2 * y <id,x> + Term x 2 * y Expr Term Fact. Expr + Term <id,x> The parser must backtrack to here Note that doesn t match +

Example Continuing with x 2 * y : S Rule Sentential Form Input - S x 2 * y 1 Expr x 2 * y 3 Expr Term x 2 * y 4 Term Term x 2 * y 7 Factor Term x 2 * y 9 <id,x> Term x 2 * y - <id,x> Term x 2 * y - <id,x> Term x 2 * y Expr Term Fact. Expr Term <id,x>

Example Continuing with x 2 * y : S Rule Sentential Form Input - S x 2 * y 1 Expr x 2 * y 3 Expr Term x 2 * y 4 Term Term x 2 * y 7 Factor Term x 2 * y 9 <id,x> Term x 2 * y - <id,x> Term x 2 * y - <id,x> Term x 2 * y Expr Term Fact. Expr Term <id,x> This time and matched We can advance past to look at 2 Now we need to extend Term, the last NT in the fringe of the parse tree

Example Trying to match the 2 in x 2 * y : S Rule Sentential Form Input <id,x> Term x 2 * y 7 <id,x> Factor x 2 * y 9 <id,x> <num,2> x 2 * y <id,x> <num,2> x 2 * y Expr Term Expr Term Fact. Where are we? num matches 2 Fact. <id,x> We have more input, but no NTs left to expand The expansion terminated too soon Need to backtrack <num,2>

Example Trying again with 2 in x 2 * y : S Rule Sentential Form Input - <id,x> Term x 2 * y 5 <id,x> Term * Factor x 2 * y 7 <id,x> Factor * Factor x 2 * y 8 <id,x> <num,2> * Factor x 2 * y - <id,x> <num,2> * Factor x 2 * y - <id,x> <num,2> * Factor x 2 * y 9 <id,x> <num,2> * <id,y> x 2 * y - <id,x> <num,2> * <id,y> x 2 * y Expr Term Fact. Expr Term Fact. Term * Fact. <id,y> <id,x> <num,2> This time, we matched and consumed all the input Success!

Another possible parse Other choices for expansion are possible Rule Sentential Form Input S x - 2 * y 1 Expr x - 2 * y 2 Expr + Term x - 2 * y 2 Expr + Term +Term x - 2 * y 2 Expr + Term + Term +Term x - 2 * y 2 Expr +Term + Term + +Term x - 2 * y consuming no input! This does not terminate Wrong choice of expansion leads to non-termination, the parser will not backtrack since it does not get to a point where it can backtrack Non-termination is a bad property for a parser to have Parser must make the right choice

Left Recursion Top-down parsers cannot handle left-recursive grammars Formally, A grammar is left recursive if there exists a non-terminal A such that there exists a derivation A + Aα, for some string α (NT T ) + Our expression grammar is left-recursive This can lead to non-termination in a top-down parser For a top-down parser, any recursion must be right recursion We would like to convert the left recursion to right recursion (without changing the language that is defined by the grammar)

Eliminating Immediate Left Recursion To remove left recursion, we can transform the grammar Consider a grammar fragment of the form A A α β where α or β are strings of terminal and non-terminal symbols and neither α nor β start with A A We can rewrite this as A β R R α R ε where R is a new non-terminal This accepts the same language, but uses only right recursion β A β α A R α A α R α R ε

Eliminating Immediate Left Recursion The expression grammar contains two cases of left recursion Expr Expr + Term Expr Term Term Term Term * Factor Term / Factor Factor Applying the transformation yields Expr Term Exprʹ Exprʹ + Term Exprʹ - Term Exprʹ ε Term Factor Termʹ Termʹ * Factor Termʹ / Factor Termʹ ε These fragments use only right recursion

Eliminating Immediate Left Recursion Substituting back into the grammar yields 1 S Expr 2 Expr Term Exprʹ 3 Exprʹ + Term Exprʹ 4 - Term Exprʹ 5 ε 6 Term Factor Termʹ 7 Termʹ * Factor Termʹ 8 / Factor Termʹ 9 ε 10 Factor num 11 id This grammar is correct, if somewhat non-intuitive. A top-down parser will terminate using it.

Left-Recursive and Right-Recursive Grammar 1 S Expr 2 Expr Expr + Term 3 Expr Term 4 Term 5 Term Term * Factor 6 Term / Factor 7 Factor 8 Factor num 9 id 1 S Expr 2 Expr Term Exprʹ 3 Exprʹ + Term Exprʹ 4 Term Exprʹ 5 ε 6 Term Factor Termʹ 7 Termʹ * Factor Termʹ 8 / Factor Termʹ 9 ε 10 Factor num 11 id

Preserves Precedence S S E E E T T E T T * F F T T E F F <id,x> <num,2> <id,y> <id,x> ε F <num,2> T ε * F T <id,y> ε

Eliminating Left Recursion The previous transformation eliminates immediate left recursion What about more general, indirect left recursion? The general algorithm (Algorithm 4.1 in the Textbook): Arrange the NTs into some order A 1, A 2,, A n for i 1 to n for j 1 to i-1 replace each production A i A j γ with A i δ 1 γ δ 2 γ δ k γ, where A j δ 1 δ 2 δ k are all the current productions for A j eliminate any immediate left recursion on A i using the direct transformation This assumes that the initial grammar has no cycles (A i + A i ), and no epsilon productions (A i ε )

Eliminating Left Recursion How does this algorithm work? 1. Impose arbitrary order on the non-terminals 2. Outer loop cycles through NT in order 3. Inner loop ensures that a production expanding A i has no non-terminal A j in its rhs, for j < i 4. Last step in outer loop converts any direct recursion on A i to right recursion using the transformation showed earlier 5. New non-terminals are added at the end of the order and have no left recursion At the start of the i th outer loop iteration For all k < i, no production that expands A k contains a non-terminal A s in its rhs, for s < k

Picking the Right Production If it picks the wrong production, a top-down parser may backtrack Alternative is to look ahead in input & use context to pick correctly How much look-ahead is needed? In general, an arbitrarily large amount Use the Cocke-Younger, Kasami, or Earley s algorithm Complexity is O( x 3 ) where x is the input string Fortunately, Large subclasses of context free grammars can be parsed efficiently with limited look-ahead Linear complexity, O( x ) where x is the input string Most programming language constructs fall in those subclasses Among the interesting subclasses are LL(1) and LR(1) grammars

Left-Recursive and Right-Recursive Grammar 1 S Expr 2 Expr Expr + Term 3 Expr Term 4 Term 5 Term Term * Factor 6 Term / Factor 7 Factor 8 Factor num 9 id 1 S Expr 2 Expr Term Exprʹ 3 Exprʹ + Term Exprʹ 4 Term Exprʹ 5 ε 6 Term Factor Termʹ 7 Termʹ * Factor Termʹ 8 / Factor Termʹ 9 ε 10 Factor num 11 id Why is this better? It is no longer left recursive so we eventually need to eat a token before we can continue expanding our parse tree. We can use this token to help us figure out which rule to apply. Enter Predictive Parsing

Predictive Parsing Basic idea Given A α β, the parser should be able to choose between α & β based on peeking at the next token in the stream FIRST sets For a string of grammar symbols α, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α That is, x FIRST(α) iff α * x γ, for some γ ( * means a bunch of (0 or more) productions applied in series) The LL(1) Property If A α and A β both appear in the grammar, we would like FIRST(α) FIRST(β) = This would allow the parser to make a correct choice with a look-ahead of exactly one symbol! (Pursuing this idea leads to LL(1) parser generators...)

Recursive Descent Parsing Recursive-descent parsing A top-down parsing method The term descent refers to the direction in which the parse tree is traversed (or built). Use a set of mutually recursive procedures (one procedure for each non-terminal symbol) Start the parsing process by calling the procedure that corresponds to the start symbol Each production becomes one clause in procedure We consider a special type of recursive-descent parsing called predictive parsing Use a look-ahead symbol to decide which production to use

Recursive Descent Parsing 1 S if E then S else S 2 begin S L 3 print E 4 L end 5 ; S L 6 E num = num void S() { switch(lookahead) { case IF: match(if); E(); match(then); S(); match(else); S(); break; case BEGIN: match(begin); S(); L(); break; case PRINT: match(print); E(); break; default: error(); } } void E() { match(num); match(eq); match(num); } void match(int token) { if (lookahead==token) lookahead=getnexttoken(); else error(); } void L() { switch(lookahead) { case END: match(end); break; case SEMI: match(semi); S(); L(); break; default: error(); } } void main() { lookahead=getnexttoken(); S(); match(eof); }

Execution For Input: if 2=2 then print 5=5 else print 1=1 main: call S(); S 1 : find the production for (S, IF) : S if E then S else S S 1 : match(if); S 1 : call E(); E 1 : find the production for (E, NUM): E num = num E 1 : match(num); match(eq); match(num); E 1 : return from E 1 to S 1 S 1 : match(then); S 1 :call S(); S 2 : find the production for (S, PRINT): S print E S 2 : match(print); S 2 : call E(); E 2 : find the production for (E, NUM): E num = num E 2 : match(num); match(eq); match(num); E 2 : return from E 2 to S 2 S 2 : return from S 2 to S 1 S 1 : match(else); S 1 : call S(); S 3 : find the production for (S, PRINT): S print E S 3 : match(print); S 3 : call E(); E 3 : find the production for (E, NUM): E num = num E 3 : match(num); match(eq); match(num); E 3 : return from E 2 to S 3 S 3 : return from S 3 to S 1 S 1 : return from S 1 to main main: match(eof); return success;

Left Factoring What if the grammar does not have the LL(1) property? We already learned one transformation: Removing left-recursion There is another transformation called left-factoring Left-Factoring Algorithm: A NT, find the longest prefix α that occurs in two or more right-hand sides of A if α ε then replace all of the A productions, A αβ 1 αβ 2 αβ n γ 1 γ 2 γ k, with A α Z γ 1 γ 2 γ k Z β 1 β 2 β n where Z is a new element of NT Repeat until no common prefixes remain

Left Factoring A graphical explanation for the left-factoring A αβ 1 αβ 2 αβ n A αβ 1 αβ 2 becomes αβ 3 A α Z Z β 1 β 2 β n A αz β 1 β 2 β 3

Left Factoring - Example Consider the following fragment of the expression grammar 1 Factor Id 2 Id [ ExprList ] 3 Id ( ExprList ) FIRST(rhs 1 ) = { Id } FIRST(rhs 2 ) = { Id } FIRST(rhs 3 ) = { Id } After left factoring, it becomes 1 Factor Id Arguments 2 Arguments [ ExprList ] 3 ( ExprList ) 4 ε This grammar accepts the same language, and it has the the LL(1) property FIRST(rhs 1 ) = { Id } FIRST(rhs 2 ) = { [ } FIRST(rhs 3 ) = { ( } FIRST(rhs 4 ) =? (Intuitively, we can think of the FOLLOW of Arguments as the first of rhs 4 ) FOLLOW(Arguments)=FOLLOW(Factor) = { $ } They are all distinct Grammar has the LL(1) property