CSCI312 Principles of Programming Languages

Similar documents
Parsing III. (Top-down parsing: recursive descent & LL(1) )

Syntactic Analysis. Top-Down Parsing

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Syntax Analysis, III Comp 412

Parsing Part II (Top-down parsing, left-recursion removal)

Introduction to Parsing. Comp 412

Syntax Analysis, III Comp 412

Parsing II Top-down parsing. Comp 412

CS 406/534 Compiler Construction Parsing Part I

Types of parsing. CMSC 430 Lecture 4, Page 1

Computer Science 160 Translation of Programming Languages

CA Compiler Construction

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Syntactic Analysis. Top-Down Parsing. Parsing Techniques. Top-Down Parsing. Remember the Expression Grammar? Example. Example

3. Parsing. Oscar Nierstrasz

Parsing II Top-down parsing. Comp 412

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Top down vs. bottom up parsing

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

Chapter 3. Parsing #1

COMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

CS 406/534 Compiler Construction Parsing Part II LL(1) and LR(1) Parsing

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

CS 314 Principles of Programming Languages

Chapter 4: LR Parsing

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

4 (c) parsing. Parsing. Top down vs. bo5om up parsing

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

CS502: Compilers & Programming Systems

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Bottom-up Parser. Jungsik Choi

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

1 Introduction. 2 Recursive descent parsing. Predicative parsing. Computer Language Implementation Lecture Note 3 February 4, 2004

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Compilers. Predictive Parsing. Alex Aiken

Table-Driven Top-Down Parsers

Syntax Analysis, VI Examples from LR Parsing. Comp 412

Monday, September 13, Parsers

Outline CS412/413. Administrivia. Review. Grammars. Left vs. Right Recursion. More tips forll(1) grammars Bottom-up parsing LR(0) parser construction

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10


A programming language requires two major definitions A simple one pass compiler

CS 4120 Introduction to Compilers

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Chapter 4. Lexical and Syntax Analysis

The procedure attempts to "match" the right hand side of some production for a nonterminal.

4. Lexical and Syntax Analysis

Context-free grammars

Compiler Construction 2016/2017 Syntax Analysis

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

4. Lexical and Syntax Analysis

Compiler Construction: Parsing

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Lexical and Syntax Analysis. Top-Down Parsing

Parsing. Lecture 11: Parsing. Recursive Descent Parser. Arithmetic grammar. - drops irrelevant details from parse tree

CSE 3302 Programming Languages Lecture 2: Syntax

How do LL(1) Parsers Build Syntax Trees?

CMSC 330: Organization of Programming Languages

CSX-lite Example. LL(1) Parse Tables. LL(1) Parser Driver. Example of LL(1) Parsing. An LL(1) parse table, T, is a twodimensional

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

Wednesday, September 9, 15. Parsers

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Table-Driven Parsing

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

Bottom-Up Parsing. Lecture 11-12

Concepts Introduced in Chapter 4

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

CIT 3136 Lecture 7. Top-Down Parsing

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Academic Formalities. CS Modern Compilers: Theory and Practise. Images of the day. What, When and Why of Compilers

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Bottom-Up Parsing. Lecture 11-12

Lexical and Syntax Analysis

Example CFG. Lectures 16 & 17 Bottom-Up Parsing. LL(1) Predictor Table Review. Stacks in LR Parsing 1. Sʹ " S. 2. S " AyB. 3. A " ab. 4.

Wednesday, August 31, Parsers

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 1. Top-Down Parsing. Lect 5. Goutam Biswas

Transcription:

Copyright 2006 The McGraw-Hill Companies, Inc. CSCI312 Principles of Programming Languages! LL Parsing!! Xu Liu Derived from Keith Cooper s COMP 412 at Rice University

Recap Copyright 2006 The McGraw-Hill Companies, Inc.

Copyright 2006 The McGraw-Hill Companies, Inc. Syntactic Analysis Phase also known as: parser Purpose is to recognize source structure Input: tokens Output: parse tree or abstract syntax tree A recursive descent parser is one in which each nonterminal in the grammar is converted to a function which recognizes input derivable from the nonterminal.

Copyright 2006 The McGraw-Hill Companies, Inc. Terms Nullable non-terminals First set

Copyright 2006 The McGraw-Hill Companies, Inc. T(EBNF) = Code: A w 1 If w is nonterminal, call it. 2 If w is terminal, match it against given token. 3 If w is { w' }: while (token in First(w')) T(w') 4 If w is: w1... wn, switch (token) { case First(w1): T(w1); break;... case First(wn): T(wn); break;

Copyright 2006 The McGraw-Hill Companies, Inc. 5 Switch (cont.): If some wi is empty, use: default: break; Otherwise default: error(token); 6 If w = [ w' ], rewrite as ( w' ) and use rule 4. 7 If w = X1... Xn, T(w) = T(X1);... T(Xn);

Copyright 2006 The McGraw-Hill Companies, Inc. Outline See more general problems in a top down parser Backtracking select appropriate productions Left recursion revise grammars Predictive parsing more than recursive descent Table-driven parsing

Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production & try to match the input Bad pick may need to backtrack Some grammars are backtrack-free (predictive parsing) Bottom-up parsers (LR(1), operator precedence) Start at the leaves and grow toward root As input is consumed, encode possibilities in an internal state Start in a state valid for legal first tokens Bottom-up parsers handle a large class of grammars

Top-down Parsing A top-down parser starts with the root of the parse tree The root node is labeled with the goal symbol of the grammar Top-down parsing algorithm: Construct the root node of the parse tree Repeat until lower fringe of the parse tree matches the input string 1 At a node labeled A, select a production with A on its lhs and, for each symbol on its rhs, construct the appropriate child 2 When a terminal symbol is added to the fringe and it doesn t match the fringe, backtrack 3 Find the next node to be expanded (label NT) The key is picking the right production in step 1 That choice should be guided by the input string

Remember the expression grammar? We will call this version the classic expression grammar from last lecture 0 Goal Expr 1 Expr Expr + Term 2 Expr - Term 3 Term 4 Term Term * Factor 5 Term / Factor 6 Factor 7 Factor ( Expr ) 8 number 9 id And the input x 2 * y

Example Let s try x 2 * y : Goal Rule Sentential Form Input Goal x - 2 * y is the position in the input buffer

Example Let s try x 2 * y : Goal Rule Sentential Form Input Goal x - 2 * y Expr 0 Expr x - 2 * y Expr + Term 1 Expr +Term x - 2 * y 3 Term +Term x - 2 * y Term 6 Factor +Term x - 2 * y 9 <id,x> +Term x - 2 * y <id,x> +Term x - 2 * y Fact. <id,x> This worked well, except that doesn t match + The parser must backtrack to here is the position in the input buffer

Example Continuing with x 2 * y : Goal Rule Sentential Form Input Goal x - 2 * y Expr 0 Expr x - 2 * y 2 Expr -Term x - 2 * y Expr Term 3 Term -Term x - 2 * y 6 Factor -Term x - 2 * y Term 9 <id,x> - Term x - 2 * y <id,x> -Term x - 2 * y Fact. <id,x> -Term x - 2 * y <id,x> Now, - and - match Now we can expand Term to match 2 Now, we need to expand Term - the last NT on the fringe

Example Trying to match the 2 in x 2 * y : Goal Rule Sentential Form Input <id,x> - Term x - 2 * y Expr 6 <id,x> - Factor x - 2 * y Expr - Term 8 <id,x> - <num,2> x - 2 * y <id,x> - <num,2> x - 2 * y Term Fact. Where are we? 2 matches 2 Fact. <id,x> We have more input, but no NTs left to expand The expansion terminated too soon Need to backtrack <num,2>

Example Trying again with 2 in x 2 * y : Goal Rule Sentential Form Input <id,x> - Term x - 2 * y Expr 4 <id,x> - Term * Factor x - 2 * y Expr Term 6 <id,x> - Factor * Factor x - 2 * y 8 <id,x> - <num,2> * Factor x - 2 * y <id,x> - <num,2> * Factor x - 2 * y <id,x> - <num,2> * Factor x - 2 * y Term Fact. Term Fact. * Fact. <id,y> 9 <id,x> - <num,2> * <id,y> x - 2 * y <id,x> - <num,2> * <id,y> x - 2 * y <id,x> <num,2> The Point: This time, we matched & consumed all the input The parser must make the right choice when it expands a NT. Success! Wrong choices lead to wasted effort.

Another possible parse Other choices for expansion are possible!! Rule Sentential Form Input Goal x - 2 * y 0 Expr x - 2 * y 1 Expr +Term x - 2 * y 1 Expr + Term +Term x - 2 * y 1 Expr + Term +Term + Term x - 2 * y 1 And so on. x - 2 * y Consumes no input! This expansion doesn t terminate Wrong choice of expansion leads to non-termination Non-termination is a bad property for a parser to have Parser must make the right choice

Left Recursion! Top-down parsers cannot handle left-recursive grammars Formally, A grammar is left recursive if A NT such that a derivation A + Aα, for some string α (NT T ) + Our classic expression grammar is left recursive This can lead to non-termination in a top-down parser In a top-down parser, any recursion must be right recursion We would like to convert the left recursion to right recursion Non-termination is always a bad property in a compiler

Eliminating Left Recursion To remove left recursion, we can transform the grammar Consider a grammar fragment of the form Fee Fee α β where neither α nor β start with Fee We can rewrite this fragment as Fee β Fie Fie α Fie ε where Fie is a new non-terminal The new grammar defines the same language as the old grammar, using only right recursion. Added a reference to the empty string

Eliminating Left Recursion The expression grammar contains two cases of left recursion Expr Expr + Term Term Term * Factor Expr - Term Term * Factor Term Factor Applying the transformation yields Expr Term Expr Term Factor Term Expr + Term Expr - Term Expr ε Term * Factor Term / Factor Term ε These fragments use only right recursion Right recursion often means right associativity. In this case, the grammar does not display any particular associative bias. Comp 412, Fall 2010 13

Picking the Right Production! If it picks the wrong production, a top-down parser may backtrack Alternative is to look ahead in input & use context to pick correctly How much lookahead is needed? In general, an arbitrarily large amount Fortunately, Large subclasses of CFGs can be parsed with limited lookahead Most programming language constructs fall in those subclasses Among the interesting subclasses are LL(1) and LR(1) grammars We will focus, for now, on LL(1) grammars & predictive parsing

Predictive Parsing Basic idea Given A α β, the parser should be able to choose between α & β FIRST sets For some rhs α G, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α That is, x FIRST(α) iff α * x γ, for some γ

Predictive Parsing Basic idea Given A α β, the parser should be able to choose between α & β FIRST sets For some rhs α G, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α That is, x FIRST(α) iff α * x γ, for some γ The LL(1) Property If A α and A β both appear in the grammar, we would like FIRST(α) FIRST(β) = This would allow the parser to make a correct choice with a lookahead of exactly one symbol! This is almost correct See the next slide

Predictive Parsing What about ε-productions? They complicate the definition of LL(1) If A α and A β and ε FIRST(α), then we need to ensure that FIRST(β) is disjoint from FOLLOW(A), too, where FOLLOW(A) = the set of terminal symbols that can immediately follow A in a sentential form Define FIRST + (A α) as FIRST(α) FOLLOW(A), if ε FIRST(α) FIRST(α), otherwise Then, a grammar is LL(1) iff A α and A β implies FIRST + (A α) FIRST + (A β) =

Recursive Descent Parsing Recall the expression grammar, after transformation 0 Goal 1 Expr 2 Expr Expr Term Expr + Term Expr 3 - Term Expr 4 ε 5 Term 6 Term Factor Term * Factor Term 7 / Factor Term 8 ε 9 Factor ( Expr ) 10 number 11 id This produces a parser with six mutually recursive routines: Goal Expr EPrime Term TPrime Factor Each recognizes one NT or T The term descent refers to the direction in which the parse tree is built.

What If My Grammar Is Not LL(1)? Can we transform a non-ll(1) grammar into an LL(1) grammar? In general, the answer is no In some cases, however, the answer is yes Assume a grammar G with productions A α β 1 and A α β 2 If α derives anything other than ε, then FIRST + (A α β 1 ) FIRST+ (A α β 2 ) And the grammar is not LL(1) If we pull the common prefix, α, into a separate production, we may make the grammar LL(1). A α A, A β 1 and A β 2 Now, if FIRST + (A β 1 ) FIRST+ (A β 2 ) =, G may be LL(1)

What If My Grammar Is Not LL(1)? Left Factoring!!!!!!!! For each nonterminal A find the longest prefix α common to 2 or more alternatives for A if α ε then replace all of the productions A α β 1 α β 2 α β 3 α β n γ with A α A γ A β 1 β 2 β 3 β n Repeat until no nonterminal has alternative rhs with a common prefix This transformation makes some grammars into LL(1) grammars There are languages for which no LL(1) grammar exists

Left Factoring Example Consider a simple right-recursive expression grammar 0 Goal Expr 1 Expr Term + Expr 2 Term - Expr 3 Term 4 Term Factor * Term 5 Factor / Term 6 Factor 7 Factor number 8 id

Left Factoring Example After Left Factoring, we have 0 Goal Expr 1 Expr Term Expr 2 Expr + Expr 3 - Expr 4 ε 5 Term Factor Term 6 Term * Term 7 / Term 8 ε 9 Factor number 10 id This transformation makes some grammars into LL(1) grammars. There are languages for which no LL(1) grammar exists.

FIRST and FOLLOW Sets FIRST(α) For some α (T NT )*, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α That is, x FIRST(α) iff α * x γ, for some γ FOLLOW(A) For some A NT, define FOLLOW(A) as the set of symbols that can occur immediately after A in a valid sentential form FOLLOW(S) = {EOF}, where S is the start symbol To build FOLLOW sets, we need FIRST sets

Computing FIRST Sets Already studied in previous lectures

Computing FOLLOW Sets for each A NT, FOLLOW(A) Ø FOLLOW(S) {EOF } while (FOLLOW sets are still changing) for each p P, of the form A B 1 B 2 B k TRAILER FOLLOW(A) for i k down to 1 if B i NT then // domain check FOLLOW(B i ) FOLLOW(B i ) TRAILER if ε FIRST(B i ) // add right context then TRAILER TRAILER ( FIRST(B i ) { ε } ) else TRAILER FIRST(B i ) // no ε => no right context else TRAILER {B i } // B i T => only 1 symbol

Classic Expression Grammar 0 Goal Expr 1 Expr Term Expr 2 Expr + Term Expr 3 - Term Expr 4 ε 5 Term Factor Term 6 Term * Factor 7 / Factor 8 ε 9 Factor number 10 id 11 ( Expr ) FIRST + (A β ) is identical to FIRST(β ) except for productiond 4 and 8 Comp 412, Fall 2010 FIRST + (Expr ε) is {ε,), eof} Symbol FIRST FOLLOW num num Ø id id Ø + + Ø - - Ø * * Ø / / Ø ( ( Ø ) ) Ø eof eof Ø ε ε Ø Goal (,id,num eof Expr (,id,num ), eof Expr +, -, ε ), eof Term (,id,num +, -, ), eof Term *, /, ε +,-, ), eof Factor (,id,num +,-,*,/,),eof 14

Classic Expression Grammar 0 Goal Expr 1 Expr Term Expr 2 Expr + Term Expr 3 - Term Expr 4 ε 5 Term Factor Term 6 Term * Factor 7 / Factor 8 ε 9 Factor number 10 id 11 ( Expr ) Prod n FIRST+ 0 (,id,num 1 (,id,num 2 + 3-4 ε,), eof 5 (,id,num 6 * 7 / 8 ε,+,-, ), eof 9 number 10 id 11 (

Building Top-down Parsers for LL(1) Grammars Given an LL(1) grammar, and its FIRST & FOLLOW sets Emit a routine for each non-terminal Nest of if-then-else statements to check alternate rhs s Each returns true on success and throws an error on false Simple, working (perhaps ugly) code This automatically constructs a recursive-descent parser Improving matters Nest of if-then-else statements may be slow Good case statement implementation would be better What about a table to encode the options? Interpret the table with a skeleton, as we did in scanning I don t know of a system that does this

Building Top-down Parsers Strategy Encode knowledge in a table Use a standard skeleton parser to interpret the table Example The non-terminal Factor has 3 expansions ( Expr ) or Identifier or Number Table might look like: Terminal Symbols 0 Goal Expr 1 Expr Term Expr 2 Expr + Term Expr 3 - Term Expr 4 ε 5 Term Factor Term 6 Term * Factor Term 7 / Factor Term 8 ε 9 Factor number 10 id 11 ( Expr ) Nonterminal Symbols + - * / Id. Num. ( ) EOF Factor 10 9 11 Cannot expand Factor into an operator error Expand Factor by rule 9 with input number

Building Top-down Parsers Building the complete table Need a row for every NT & a column for every T

LL(1) Expression Parsing Table Row we built earlier + * / Id Num ( ) EOF Goal 0 0 0 Expr 1 1 1 Expr 2 3 4 4 Term 5 5 5 Term 8 8 6 7 8 8 Factor 10 9 11

Building Top-down Parsers Building the complete table Need a row for every NT & a column for every T Need an interpreter for the table (skeleton parser)

LL(1) Skeleton Parser word NextWord() // Initial conditions, including push EOF onto Stack // a stack to track local goals push the start symbol, S, onto Stack TOS top of Stack loop forever if TOS = EOF and word = EOF then break & report success // exit on success else if TOS is a terminal then if TOS matches word then pop Stack // recognized TOS word NextWord() else report error looking for TOS // error exit else // TOS is a non-terminal if TABLE[TOS,word] is A B 1 B 2 B k then pop Stack // get rid of A push B k, B k-1,, B 1 // in that order else break & report error expanding TOS TOS top of Stack

Building Top-down Parsers Building the complete table Need a row for every NT & a column for every T Need a table-driven interpreter for the table Need an algorithm to build the table Filling in TABLE[X,y], X NT, y T 1. entry is the rule X β, if y FIRST + (X β) entry is error if rule 1 does not define If any entry has more than one rule, G is not LL(1)! We call this algorithm the LL(1) table construction algorithm