Syntax Analysis: Top-Down Parsing

CMPSC 470 Lecture 05
Topics: A. Overview; B. Recursive-descent parser; C. First and Follow; D. Recursive predictive parser; E. Error recovery

A. Overview
Top-down parsing constructs a parse tree for the input string starting from the root and creates the nodes of the parse tree in preorder (depth-first: visit a node, then its children from left to right). Top-down parsing can therefore be viewed as finding a leftmost derivation for the input.

Example) Consider the following grammar:
E → TE'
T → FT'
F → id
E' → +TE' | ε
T' → *FT' | ε

A top-down parser builds the parse tree by repeating the following steps:
1. Determine the production to be applied for a nonterminal, say A.
2. Once an A-production is selected, match the terminal symbols in the production body against the input string, advancing to the next input token after each match.

Top-down parsers include recursive-descent parsers and recursive predictive parsers; the latter use LL(1) grammars.

B. Recursive-Descent Parser
In a recursive-descent parser, each nonterminal becomes a procedure (or function). In general, this may require backtracking. The following example shows how such a parser can be implemented and how backtracking is handled.

Example) Consider the following grammar:
S → cAd
A → aAb | a

Its corresponding recursive-descent parser can be:

S()
1. x ← input pointer location
2. // production S → cAd
3. match c with the input symbol (current token) and advance the input pointer (move to the next token)
4. call A()
5. match d, and advance
6. if all of lines 3-5 succeed, return success
7. // no more productions
8. return fail

A()
1. x ← input pointer location
2. // production A → aAb
3. match a, and advance
4. call A()
5. match b, and advance
6. if all of lines 3-5 succeed, return success
7. // production A → a
8. reset the input pointer location to x
9. match a, and advance
10. if line 9 succeeds, return success
11. // no more productions
12. return fail

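Parsing: Parsing starts by calling the procedure for the start symbol, S(), and may require backtracking. Parsing steps for the input w = "cad":
1. Call S(): match c, and advance.
2. Call A() and try A → aAb: match a, advance, and call A() recursively. With d as the current token, neither production of the inner A() can match a, so the inner A() returns fail.
3. The outer A() resets the input pointer to x (just after c) and tries A → a: it matches a and returns success.
4. Back in S(), match d. All of lines 3-5 succeed, so S() returns success and the input is accepted.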
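The procedures S() and A() above can be sketched in C as follows. This is a minimal illustration, not the lecture's code: the names parse() and match() and the use of a C string as the token stream are assumptions. As in the pseudocode, A() commits to the first alternative that succeeds; S() does not ask A() for another alternative when a later match fails.

```c
#include <stdbool.h>
#include <stddef.h>

/* Backtracking recursive-descent parser for
 *   S -> c A d
 *   A -> a A b | a
 * The input is a C string; pos plays the role of the input pointer. */
static const char *input;
static size_t pos;

/* Match one terminal and advance on success. */
static bool match(char t) {
    if (input[pos] == t) {
        pos++;
        return true;
    }
    return false;
}

static bool A(void) {
    size_t x = pos;                        /* save input pointer location */
    if (match('a') && A() && match('b'))   /* production A -> a A b */
        return true;
    pos = x;                               /* reset, then try A -> a */
    if (match('a'))
        return true;
    pos = x;                               /* no more productions */
    return false;
}

static bool S(void) {
    size_t x = pos;
    if (match('c') && A() && match('d'))   /* production S -> c A d */
        return true;
    pos = x;                               /* no more productions */
    return false;
}

/* Accept w only if S() succeeds and consumes the whole string. */
bool parse(const char *w) {
    input = w;
    pos = 0;
    return S() && input[pos] == '\0';
}
```

For example, parse("cad") and parse("caabd") succeed, while parse("cab") fails because d is never matched.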

Example 2) How can the following production be implemented?
E' → +TE' | ε

E'()
1. x ← input pointer location
2. // production E' → +TE'
3. match +, and advance
4. call T()
5. call E'()
6. if all of lines 3-5 succeed, return success
7. // production E' → ε
8. reset the input pointer location to x
9. return success (the ε-production matches without consuming input)

C. First and Follow
A recursive-descent parser that relies on backtracking can be time-consuming, because it may re-read the same input several times. This can be improved by using a recursive predictive parser. First() and Follow() are functions used in constructing top-down (recursive predictive) and bottom-up parsers that do not require backtracking. In top-down parsing, First and Follow help the parser choose which production to apply.

Definition: First(α)
First(α) is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in First(α).

Example) Given the grammar A → aa | ba | a | b, the language is L(A) = { } and First(A) = { }.
Find First for each of the following grammars:
- A → a: First(A) =
- A → a | ε: First(A) =
- A → Ba | ε, B → b: First(A) =
- A → Aa | b | ε: First(A) =
- A → BCa | ε, B → Cb | ε, C → c | ε: First(A) =
- A → a | ε, B → b | ε, C → c | ε: First(ABC) =

Determine First(X):
1. If X is a terminal, First(X) = {X}.
2. If X → Y1 Y2 … Yk is a production, determine First(X) as follows:
   1. Add every non-ε symbol of First(Y1) to First(X).
   2. If ε ∈ First(Y1), also add the non-ε symbols of First(Y2).
   3. If ε ∈ First(Y1) and ε ∈ First(Y2), also add the non-ε symbols of First(Y3).
   …
   n. If ε ∈ First(Y1), …, ε ∈ First(Yk), add ε to First(X).
3. If X → ε is a production, add ε to First(X).
Repeat these rules until no First set changes.
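The rules for First(X) can be run as a fixed-point iteration: apply them to every production until no set grows. The sketch below does this for the expression grammar used later in this lecture (E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id). Sets are bitmasks; the symbol encoding (terminals as bit positions, nonterminals offset by 100, Ep and Tp for E' and T') is an assumption made for brevity.

```c
#include <stdbool.h>

/* Terminals are bit positions in an unsigned set; EPS marks epsilon. */
enum { ID, PLUS, STAR, LPAREN, RPAREN, END, EPS };
/* Nonterminals of the expression grammar (Ep = E', Tp = T'). */
enum { E, Ep, T, Tp, F, NUM_NT };

#define BIT(t) (1u << (t))
#define NT(n) (100 + (n))      /* nonterminal codes are offset past the terminals */
#define IS_NT(s) ((s) >= 100)

typedef struct { int lhs; int len; int rhs[3]; } Production;

static const Production G[] = {
    {E,  2, {NT(T), NT(Ep)}},          /* E  -> T E'     */
    {Ep, 3, {PLUS, NT(T), NT(Ep)}},    /* E' -> + T E'   */
    {Ep, 0, {0}},                      /* E' -> eps      */
    {T,  2, {NT(F), NT(Tp)}},          /* T  -> F T'     */
    {Tp, 3, {STAR, NT(F), NT(Tp)}},    /* T' -> * F T'   */
    {Tp, 0, {0}},                      /* T' -> eps      */
    {F,  3, {LPAREN, NT(E), RPAREN}},  /* F  -> ( E )    */
    {F,  1, {ID}},                     /* F  -> id       */
};
#define NUM_PROD (sizeof G / sizeof G[0])

unsigned first[NUM_NT];

/* Rule 1: First of a terminal is itself; otherwise use the current set. */
static unsigned first_sym(int s) {
    return IS_NT(s) ? first[s - 100] : BIT(s);
}

/* Apply rules 2 and 3 to every production until nothing changes. */
void compute_first(void) {
    for (int n = 0; n < NUM_NT; n++) first[n] = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (unsigned p = 0; p < NUM_PROD; p++) {
            unsigned add = 0;
            int i;
            for (i = 0; i < G[p].len; i++) {
                unsigned f = first_sym(G[p].rhs[i]);
                add |= f & ~BIT(EPS);          /* non-eps symbols of Yi */
                if (!(f & BIT(EPS))) break;    /* Yi not nullable: stop here */
            }
            if (i == G[p].len) add |= BIT(EPS); /* all Yi nullable, or X -> eps */
            unsigned old = first[G[p].lhs];
            first[G[p].lhs] |= add;
            if (first[G[p].lhs] != old) changed = true;
        }
    }
}
```

After compute_first(), first[E], first[T], and first[F] hold {(, id}, first[Ep] holds {+, ε}, and first[Tp] holds {*, ε}.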

Concept) How is First used? Consider the following grammar G:
A → B | C
B → b | c
C → d | e
In G, First(B) = {b, c} and First(C) = {d, e} are disjoint sets. When parsing with nonterminal A, if the next input symbol is b or c, the parser selects the production A → B. If the next input symbol is d or e, the parser selects A → C.

Definition: Follow(A)
Follow(A), for a nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form; that is, the set of terminals a such that there exists a derivation S ⇒* αAaβ for some α and β. If A can be the rightmost symbol in some sentential form (S ⇒* αA), then $ ∈ Follow(A), where $ is a special endmarker symbol.

Find Follow for each of the following grammars:
- S → Ab, A → a | ε: Follow(A) =
- S → bA, A → a | ε: Follow(A) =
- S → aBCd, B → b | ε, C → c | ε: Follow(B) =, Follow(C) =

- S → aBCe, B → b | ε, C → c | ε: First(C) =, Follow(B) =
- S → aBCf, B → b | c | ε, C → d | e | ε: First(C) =, Follow(B) =
- S → aBCD, B → b | ε, C → c | ε, D → d | ε: First(C) =, First(D) =, Follow(B) =

Determine Follow(A) for all nonterminals A:
1. Place $ in Follow(S), where S is the start symbol.
2. If there is a production A → αBβ, then everything in First(β) except ε is in Follow(B).
3. If there is a production A → αB, or a production A → αBβ with ε ∈ First(β), then everything in Follow(A) is in Follow(B).
Apply rules 2 and 3 until nothing can be added to any Follow set.
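Rules 1-3 for Follow can also be iterated to a fixed point. The sketch below assumes the First sets of the expression grammar are already known (they are hard-coded) and walks each production body from right to left, carrying a "trailer" set of terminals that may follow the current position. The trailer technique is one common way to apply rules 2 and 3 together, not necessarily the lecture's; the symbol encoding is the same assumed bitmask scheme as before.

```c
#include <stdbool.h>

enum { ID, PLUS, STAR, LPAREN, RPAREN, END, EPS };   /* terminal bits; END is $ */
enum { E, Ep, T, Tp, F, NUM_NT };                    /* Ep = E', Tp = T' */

#define BIT(t) (1u << (t))
#define NT(n) (100 + (n))
#define IS_NT(s) ((s) >= 100)

typedef struct { int lhs; int len; int rhs[3]; } Production;

static const Production G[] = {
    {E,  2, {NT(T), NT(Ep)}},       {Ep, 3, {PLUS, NT(T), NT(Ep)}},
    {Ep, 0, {0}},                   {T,  2, {NT(F), NT(Tp)}},
    {Tp, 3, {STAR, NT(F), NT(Tp)}}, {Tp, 0, {0}},
    {F,  3, {LPAREN, NT(E), RPAREN}}, {F, 1, {ID}},
};
#define NUM_PROD (sizeof G / sizeof G[0])

/* First sets of the nonterminals, assumed already computed. */
static const unsigned FIRST[NUM_NT] = {
    [E]  = BIT(LPAREN) | BIT(ID),
    [Ep] = BIT(PLUS) | BIT(EPS),
    [T]  = BIT(LPAREN) | BIT(ID),
    [Tp] = BIT(STAR) | BIT(EPS),
    [F]  = BIT(LPAREN) | BIT(ID),
};

unsigned follow[NUM_NT];

void compute_follow(int start) {
    for (int n = 0; n < NUM_NT; n++) follow[n] = 0;
    follow[start] = BIT(END);                          /* rule 1: $ in Follow(S) */
    bool changed = true;
    while (changed) {
        changed = false;
        for (unsigned p = 0; p < NUM_PROD; p++) {
            const Production *pr = &G[p];
            unsigned trailer = follow[pr->lhs];        /* rule 3 seed: Follow(A) */
            for (int i = pr->len - 1; i >= 0; i--) {
                int s = pr->rhs[i];
                if (!IS_NT(s)) {                       /* a terminal ends the trailer */
                    trailer = BIT(s);
                    continue;
                }
                unsigned old = follow[s - 100];
                follow[s - 100] |= trailer;            /* what may follow B here */
                if (follow[s - 100] != old) changed = true;
                unsigned f = FIRST[s - 100];
                /* rule 2 adds First(B) - eps; Follow(A) survives only if B is nullable */
                trailer = (f & BIT(EPS)) ? (trailer | (f & ~BIT(EPS)))
                                         : (f & ~BIT(EPS));
            }
        }
    }
}
```

Calling compute_follow(E) yields Follow(E) = Follow(E') = {), $}, Follow(T) = Follow(T') = {+, ), $}, and Follow(F) = {+, *, ), $}.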

D. Recursive Predictive Parser
a) Overview
Consider the following grammar:
stmt → if ( expr ) stmt else stmt    (α)
     | while ( expr ) stmt           (β)
     | { stmt_list }                 (γ)
Given the next input symbol lah (the lookahead token), a production can be predicted and selected using the following rules:
1. If lah is if ∈ First(α), choose stmt → α.
2. If lah is while ∈ First(β), choose stmt → β.
3. If lah is { ∈ First(γ), choose stmt → γ.
The prediction rules can be written as a parsing table M[A, a]:

Nonterminal | if                                | while                      | {
stmt        | stmt → if ( expr ) stmt else stmt | stmt → while ( expr ) stmt | stmt → { stmt_list }

During recursive-descent parsing, if the current nonterminal is stmt and the lookahead lah is if, while, or {, the right production can be selected from the prediction table M, so no backtracking is needed.

b) LL(1) Grammar
LL(1) grammars can be used to construct predictive parsers (recursive-descent parsers that need no backtracking). LL(1) stands for:
- L: the input is scanned from left to right,
- L: a leftmost derivation is produced,
- 1: one input symbol of lookahead is used at each step to make parsing decisions.

A grammar G is LL(1) if and only if, whenever A → α | β are two distinct productions of G, the following conditions hold:
b1. For no terminal a do both α and β derive strings beginning with a.
b2. At most one of α and β can derive the empty string.
b3. If β ⇒* ε, then α does not derive any string beginning with a terminal in Follow(A). Likewise, if α ⇒* ε, then β does not derive any string beginning with a terminal in Follow(A).
Consequently, an LL(1) grammar is neither left-recursive nor ambiguous. The converse does not hold: a grammar can be non-left-recursive and unambiguous yet still violate b1 (for example, A → ab | ac).
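The stmt grammar from a) satisfies these conditions, since its three alternatives begin with the distinct terminals if, while, and {. A recursive predictive parser for it can therefore be sketched in C as a switch on the lookahead, one case per entry of M. The grammar gives no productions for expr or stmt_list, so this sketch assumes expr → id and stmt_list → stmt stmt_list | ε, and encodes tokens as single characters ('i' = if, 'w' = while, 'e' = else, 'x' = id); all of these, and the name parse_stmt, are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical one-character tokens: 'i' = if, 'w' = while, 'e' = else,
 * 'x' = id (assumed expr -> id); ( ) { } stand for themselves; '\0' is $. */
static const char *tok;   /* *tok is the lookahead lah */
static bool ok;           /* cleared on the first syntax error */

static void match(char t) {
    if (ok && *tok == t) tok++;
    else ok = false;
}

static void stmt(void);

/* Assumed: stmt_list -> stmt stmt_list | eps. Loop while lah is in First(stmt). */
static void stmt_list(void) {
    while (ok && (*tok == 'i' || *tok == 'w' || *tok == '{'))
        stmt();
}

/* One lookahead symbol selects the production; no backtracking. */
static void stmt(void) {
    if (!ok) return;
    switch (*tok) {
    case 'i':                                   /* stmt -> if ( expr ) stmt else stmt */
        match('i'); match('('); match('x'); match(')');
        stmt(); match('e'); stmt();
        break;
    case 'w':                                   /* stmt -> while ( expr ) stmt */
        match('w'); match('('); match('x'); match(')');
        stmt();
        break;
    case '{':                                   /* stmt -> { stmt_list } */
        match('{'); stmt_list(); match('}');
        break;
    default:                                    /* empty entry in M: syntax error */
        ok = false;
    }
}

bool parse_stmt(const char *w) {
    tok = w;
    ok = true;
    stmt();
    return ok && *tok == '\0';                  /* all input must be consumed */
}
```

For instance, parse_stmt("w(x){}") and parse_stmt("i(x){}e{}") return true, while parse_stmt("i(x){}") returns false because else is missing.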

c) Constructing a predictive parse table
Idea) Given the productions A → α | β:
1. If the next input symbol lah (lookahead token) is in First(α), choose A → α.
2. If α = ε or α ⇒* ε, and lah ∈ Follow(A) (including the case lah = $ ∈ Follow(A)), also choose A → α.

Construction algorithm:
INPUT: Grammar G. Example)
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
OUTPUT: Parsing table M.
METHOD:
1. Determine First and Follow for G.
Then, for each production A → α, do the following:
2. For each terminal a ∈ First(α), add A → α to M[A, a].

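3. If ε ∈ First(α), then for each terminal b ∈ Follow(A), add A → α to M[A, b]. If ε ∈ First(α) and $ ∈ Follow(A), add A → α to M[A, $] as well.
4. If, after performing the steps above, M[A, a] contains no production at all, set M[A, a] to error (normally represented by an empty entry in the table).

The final parsing table M is:

Nonterminal | id      | +         | *         | (       | )      | $
E           | E → TE' |           |           | E → TE' |        |
E'          |         | E' → +TE' |           |         | E' → ε | E' → ε
T           | T → FT' |           |           | T → FT' |        |
T'          |         | T' → ε    | T' → *FT' |         | T' → ε | T' → ε
F           | F → id  |           |           | F → (E) |        |

This table M means that when nonterminal A must be expanded and the lookahead is a, the parser applies the production in M[A, a]; an empty entry signals a syntax error.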
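Steps 2-4 translate directly into code. This sketch hard-codes the First and Follow sets of the example grammar (assumed already computed), fills M with production indices (-1 for an error entry), and reports whether any entry received two productions, i.e. whether the grammar is LL(1). Productions are numbered 0-7 in the order E → TE', E' → +TE', E' → ε, T → FT', T' → *FT', T' → ε, F → (E), F → id; the bitmask encoding is an assumption.

```c
#include <stdbool.h>

enum { ID, PLUS, STAR, LPAREN, RPAREN, END, NUM_T, EPS = NUM_T };  /* END is $ */
enum { E, Ep, T, Tp, F, NUM_NT };                                   /* Ep = E', Tp = T' */

#define BIT(t) (1u << (t))
#define NT(n) (100 + (n))
#define IS_NT(s) ((s) >= 100)

typedef struct { int lhs; int len; int rhs[3]; } Production;

static const Production G[] = {
    {E,  2, {NT(T), NT(Ep)}},       {Ep, 3, {PLUS, NT(T), NT(Ep)}},
    {Ep, 0, {0}},                   {T,  2, {NT(F), NT(Tp)}},
    {Tp, 3, {STAR, NT(F), NT(Tp)}}, {Tp, 0, {0}},
    {F,  3, {LPAREN, NT(E), RPAREN}}, {F, 1, {ID}},
};
#define NUM_PROD ((int)(sizeof G / sizeof G[0]))

/* Step 1, assumed done: First and Follow of each nonterminal. */
static const unsigned FIRST[NUM_NT] = {
    [E] = BIT(LPAREN) | BIT(ID), [Ep] = BIT(PLUS) | BIT(EPS),
    [T] = BIT(LPAREN) | BIT(ID), [Tp] = BIT(STAR) | BIT(EPS),
    [F] = BIT(LPAREN) | BIT(ID),
};
static const unsigned FOLLOW[NUM_NT] = {
    [E]  = BIT(RPAREN) | BIT(END),            [Ep] = BIT(RPAREN) | BIT(END),
    [T]  = BIT(PLUS) | BIT(RPAREN) | BIT(END), [Tp] = BIT(PLUS) | BIT(RPAREN) | BIT(END),
    [F]  = BIT(PLUS) | BIT(STAR) | BIT(RPAREN) | BIT(END),
};

/* M[A][a] holds a production index, or -1 for an empty (error) entry. */
int M[NUM_NT][NUM_T];

/* First(alpha) for a production body, including EPS if the body is nullable. */
static unsigned first_of_body(const Production *p) {
    unsigned out = 0;
    for (int i = 0; i < p->len; i++) {
        int s = p->rhs[i];
        unsigned f = IS_NT(s) ? FIRST[s - 100] : BIT(s);
        out |= f & ~BIT(EPS);
        if (!(f & BIT(EPS))) return out;
    }
    return out | BIT(EPS);
}

/* Fill M; returns false if some entry receives two productions. */
bool build_table(void) {
    bool ll1 = true;
    for (int A = 0; A < NUM_NT; A++)
        for (int a = 0; a < NUM_T; a++) M[A][a] = -1;    /* step 4: error entries */
    for (int p = 0; p < NUM_PROD; p++) {
        unsigned f = first_of_body(&G[p]);
        unsigned targets = f & ~BIT(EPS);                /* step 2 */
        if (f & BIT(EPS)) targets |= FOLLOW[G[p].lhs];   /* step 3, $ included */
        for (int a = 0; a < NUM_T; a++) {
            if (!(targets & BIT(a))) continue;
            if (M[G[p].lhs][a] != -1) ll1 = false;       /* multiply defined entry */
            M[G[p].lhs][a] = p;
        }
    }
    return ll1;
}
```

For this grammar build_table() returns true, and, for example, M[Ep][RPAREN] holds 2 (E' → ε).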

Note: For every LL(1) grammar, each parse-table entry contains at most one production, so the parser's choice is uniquely determined. If a grammar is left-recursive or ambiguous, at least one entry of the parse table M will contain two or more productions. Some languages have no LL(1) grammar at all, even after left-recursion elimination and left factoring are applied; the dangling-else problem is an example.

Dangling-else problem: The following is an abstract form of the dangling-else grammar after left-recursion elimination and left factoring:
S → iEtSS' | a
S' → eS | ε
E → b
Its parse table is:

Nonterminal | a     | b     | e               | i          | t | $
S           | S → a |       |                 | S → iEtSS' |   |
S'          |       |       | S' → eS, S' → ε |            |   | S' → ε
E           |       | E → b |                 |            |   |

Entry M[S', e] contains two productions, so the grammar is not LL(1).
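Conditions b1-b3 for a pair of alternatives can be checked mechanically once First and Follow are known. A minimal sketch, assuming terminal sets are bitmasks and that the hypothetical flag EPS_BIT marks an alternative that derives ε; ll1_pair_ok is likewise an illustrative name.

```c
#include <stdbool.h>

/* Terminal sets are bitmasks; EPS_BIT flags an alternative that derives eps. */
#define EPS_BIT (1u << 31)

/* Check b1-b3 for A -> alpha | beta, given First(alpha), First(beta), Follow(A). */
bool ll1_pair_ok(unsigned first_alpha, unsigned first_beta, unsigned follow_A) {
    unsigned ta = first_alpha & ~EPS_BIT, tb = first_beta & ~EPS_BIT;
    bool na = (first_alpha & EPS_BIT) != 0;    /* alpha =>* eps */
    bool nb = (first_beta & EPS_BIT) != 0;     /* beta  =>* eps */
    if (ta & tb) return false;                 /* b1: a common first terminal */
    if (na && nb) return false;                /* b2: both derive eps */
    if (nb && (ta & follow_A)) return false;   /* b3: beta =>* eps */
    if (na && (tb & follow_A)) return false;   /* b3: alpha =>* eps */
    return true;
}
```

For E' → +TE' | ε with Follow(E') = {), $}, all three conditions hold. For S' → eS | ε with Follow(S') = {e, $}, the check fails on b3, which is exactly the double entry in M[S', e].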

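d) Recursive Predictive Parser
Given a predictive parse table M, a recursive predictive parser can be built directly: one procedure per nonterminal, and one branch per nonempty table entry. For the expression grammar, the table is:

Nonterminal | id      | +         | *         | (       | )      | $
E           | E → TE' |           |           | E → TE' |        |
E'          |         | E' → +TE' |           |         | E' → ε | E' → ε
T           | T → FT' |           |           | T → FT' |        |
T'          |         | T' → ε    | T' → *FT' |         | T' → ε | T' → ε
F           | F → id  |           |           | F → (E) |        |

The corresponding parser is shown below. The token encoding and the helpers report(), match(), and main() are assumptions added so that the code runs; E' and T' are spelled Eprime and Tprime because ' is not allowed in C identifiers.

    #include <stdio.h>
    #include <stdlib.h>

    /* ID stands for an id token, END for the endmarker $;
       other tokens are their own characters. */
    enum { ID = 'i', END = '$' };

    static const int *input;   /* token stream, terminated by END */
    static int lah;            /* lookahead token */

    static void report(const char *msg) {
        fprintf(stderr, "%s\n", msg);
        exit(EXIT_FAILURE);
    }

    /* Consume token t, or report an error if the lookahead differs. */
    static void match(int t) {
        if (lah == t) lah = *++input;
        else report("syntax error");
    }

    static void E(void), Eprime(void), T(void), Tprime(void), F(void);

    static void E(void) {
        if (lah == ID) { T(); Eprime(); }
        else if (lah == '(') { T(); Eprime(); }
        else report("syntax error");
    }

    static void Eprime(void) {
        if (lah == '+') { match('+'); T(); Eprime(); }
        else if (lah == ')') { }   /* E' -> eps: do nothing */
        else if (lah == END) { }   /* E' -> eps: do nothing */
        else report("syntax error");
    }

    static void T(void) {
        if (lah == ID) { F(); Tprime(); }
        else if (lah == '(') { F(); Tprime(); }
        else report("syntax error");
    }

    static void Tprime(void) {
        if (lah == '+') { }        /* T' -> eps: do nothing */
        else if (lah == '*') { match('*'); F(); Tprime(); }
        else if (lah == ')') { }   /* T' -> eps: do nothing */
        else if (lah == END) { }   /* T' -> eps: do nothing */
        else report("syntax error");
    }

    static void F(void) {
        if (lah == ID) { match(ID); }
        else if (lah == '(') { match('('); E(); match(')'); }
        else report("syntax error");
    }

    int main(void) {
        static const int tokens[] = { ID, '+', ID, '*', ID, END };  /* id + id * id */
        input = tokens;
        lah = *input;
        E();
        if (lah != END) report("syntax error");  /* input must be fully consumed */
        puts("accepted");
        return 0;
    }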

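Let the input be id + id * id. Calling E() proceeds as follows:
1. E() sees id and calls T(), which calls F(); F() matches id.
2. T'() sees +, which selects T' → ε, so it returns without consuming input.
3. E'() sees +, matches it, and calls T().
4. T() calls F(), which matches id.
5. T'() sees *, matches it, calls F() (which matches id), and then calls T'() again.
6. This T'() sees $ and returns (T' → ε).
7. E'() sees $ and returns (E' → ε); E() returns with only $ left, so the input is accepted.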

e) Non-recursive Predictive Parser
A non-recursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls.
(Figure: input buffer a + b * c $, stack X Y Z $, and the predictive parsing program.)
Given input w, the parser starts in a configuration where the input buffer holds w$ and the stack holds the start symbol S of grammar G above $. The following program produces a predictive parse for w using the predictive parsing table M:

    a ← the first symbol of w
    X ← the top stack symbol
    while ( X ≠ $ ) {                          // stack is not empty
        if ( X = a ) {
            pop the stack
            a ← the next symbol of w
        }
        else if ( X is a terminal ) error()
        else if ( M[X, a] is an error entry ) error()
        else if ( M[X, a] = X → Y1 Y2 … Yk ) {
            output the production X → Y1 Y2 … Yk
            pop the stack
            push Yk, Yk−1, …, Y1 onto the stack, with Y1 on top
        }
        X ← the top stack symbol
    }
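The predictive parsing program above can be sketched in C with an explicit array stack. The token codes, the fixed stack size of 256, and the hard-coded table (the one built for the expression grammar) are assumptions of this sketch. One detail goes beyond the pseudocode: after the stack empties, the sketch also requires that the whole input was consumed.

```c
#include <stdbool.h>
#include <stdio.h>

enum { ID, PLUS, STAR, LPAREN, RPAREN, END, NUM_T };   /* terminals; END is $ */
enum { E = 100, Ep, T, Tp, F };                       /* nonterminals; Ep = E', Tp = T' */

typedef struct { int lhs; int len; int rhs[3]; const char *text; } Production;

static const Production G[] = {
    {E,  2, {T, Ep},             "E -> T E'"},
    {Ep, 3, {PLUS, T, Ep},       "E' -> + T E'"},
    {Ep, 0, {0},                 "E' -> eps"},
    {T,  2, {F, Tp},             "T -> F T'"},
    {Tp, 3, {STAR, F, Tp},       "T' -> * F T'"},
    {Tp, 0, {0},                 "T' -> eps"},
    {F,  3, {LPAREN, E, RPAREN}, "F -> ( E )"},
    {F,  1, {ID},                "F -> id"},
};

/* Predictive table M[X][a]: production index, or -1 for an error entry. */
static const int M[5][NUM_T] = {
    /*         id   +   *   (   )   $ */
    /* E  */ { 0, -1, -1,  0, -1, -1},
    /* E' */ {-1,  1, -1, -1,  2,  2},
    /* T  */ { 3, -1, -1,  3, -1, -1},
    /* T' */ {-1,  5,  4, -1,  5,  5},
    /* F  */ { 7, -1, -1,  6, -1, -1},
};

/* Parse w (terminated by END), printing each production that is output. */
bool ll1_parse(const int *w) {
    int stack[256], top = 0;
    stack[top++] = END;                   /* $ at the bottom */
    stack[top++] = E;                     /* start symbol above $ */
    int ip = 0, a = w[ip];                /* a: current input symbol */
    int X = stack[top - 1];
    while (X != END) {                    /* stack is not empty */
        if (X == a) {                     /* X matches the input symbol */
            top--;
            a = w[++ip];
        } else if (X < NUM_T) {           /* terminal that does not match */
            return false;
        } else {
            int p = M[X - E][a];
            if (p < 0) return false;      /* error entry */
            printf("%s\n", G[p].text);    /* output the production */
            top--;                        /* pop X */
            for (int k = G[p].len - 1; k >= 0; k--)
                stack[top++] = G[p].rhs[k];   /* push Yk ... Y1, Y1 on top */
        }
        X = stack[top - 1];
    }
    return a == END;                      /* all input consumed */
}
```

With the token array {ID, PLUS, ID, STAR, ID, END}, ll1_parse returns true and prints the productions of the leftmost derivation of id + id * id in order; {ID, PLUS, STAR, ID, END} is rejected at the empty entry M[T, *].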

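Consider the same parse table M and the input id + id * id. The productions output by the parser trace out a leftmost derivation:
E ⇒lm TE' ⇒lm FT'E' ⇒lm id T'E' ⇒lm id E' ⇒lm id + TE' ⇒lm id + FT'E' ⇒lm id + id T'E' ⇒lm id + id * FT'E' ⇒lm id + id * id T'E' ⇒lm id + id * id E' ⇒lm id + id * id
The changes of configuration during parsing (stack top on the left) are:

Matched   | Stack   | Input     | Action
          | E$      | id+id*id$ |
          | TE'$    | id+id*id$ | output E → TE'
          | FT'E'$  | id+id*id$ | output T → FT'
          | idT'E'$ | id+id*id$ | output F → id
id        | T'E'$   | +id*id$   | match id
id        | E'$     | +id*id$   | output T' → ε
id        | +TE'$   | +id*id$   | output E' → +TE'
id+       | TE'$    | id*id$    | match +
id+       | FT'E'$  | id*id$    | output T → FT'
id+       | idT'E'$ | id*id$    | output F → id
id+id     | T'E'$   | *id$      | match id
id+id     | *FT'E'$ | *id$      | output T' → *FT'
id+id*    | FT'E'$  | id$       | match *
id+id*    | idT'E'$ | id$       | output F → id
id+id*id  | T'E'$   | $         | match id
id+id*id  | E'$     | $         | output T' → ε
id+id*id  | $       | $         | output E' → ε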

E. Error Recovery
If a compiler had to process only correct programs, its design and implementation would be greatly simplified. In practice, however, a compiler is expected to locate and track down errors.

a) Types of programming errors
- Lexical errors: misspellings of identifiers, keywords, or operators, etc.
- Syntactic errors: misplaced semicolons, extra braces, a case statement without an enclosing switch, etc.
- Semantic errors: type mismatches between operators and operands, such as returning an int value from a void function.
- Logical errors: anything resulting from incorrect reasoning on the part of the programmer.

b) Simplest error-recovery mode
When the first error is discovered, the parser reports it and stops.

c) Panic-mode recovery
When an error is discovered, the parser discards input symbols one at a time until it finds a designated synchronizing (sync) token. This recovery strategy can be implemented by adding sync tokens to the parse table:
1. Add sync entries to the parse table. Here, sync is placed in M[A, b] for each terminal b ∈ Follow(A) whose entry is otherwise empty:

Nonterminal | id      | +         | *         | (       | )      | $
E           | E → TE' |           |           | E → TE' | sync   | sync
E'          |         | E' → +TE' |           |         | E' → ε | E' → ε
T           | T → FT' | sync      |           | T → FT' | sync   | sync
T'          |         | T' → ε    | T' → *FT' |         | T' → ε | T' → ε
F           | F → id  | sync      | sync      | F → (E) | sync   | sync

2. During parsing:
- If M[A, a] is blank, skip the input symbol a.
- If M[A, a] is sync, pop the nonterminal A from the stack.
- If the terminal on top of the stack does not match the input symbol a, pop that terminal from the stack.
Example) Input: ) id + id * id
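The three panic-mode rules can be added to the table-driven driver. In this sketch (an assumption, not the lecture's code), sync entries sit in M[A, b] for the terminals b ∈ Follow(A) whose entries are otherwise empty, and the hypothetical function parse_with_recovery returns the number of errors it detected. Applied literally, the sync rule pops the start symbol E when the input begins with ), which empties the stack at once; the sketch is therefore exercised on id * + id, where F is popped at the sync entry M[F, +] and parsing resumes.

```c
#include <stdbool.h>

enum { ID, PLUS, STAR, LPAREN, RPAREN, END, NUM_T };   /* terminals; END is $ */
enum { E = 100, Ep, T, Tp, F };                       /* Ep = E', Tp = T' */
enum { BLANK = -1, SYNC = -2 };                       /* special table entries */

typedef struct { int len; int rhs[3]; } Body;

/* Production bodies 0-7: E->TE', E'->+TE', E'->eps, T->FT', T'->*FT', T'->eps, F->(E), F->id */
static const Body G[] = {
    {2, {T, Ep}}, {3, {PLUS, T, Ep}}, {0, {0}},
    {2, {F, Tp}}, {3, {STAR, F, Tp}}, {0, {0}},
    {3, {LPAREN, E, RPAREN}}, {1, {ID}},
};

static const int M[5][NUM_T] = {
    /*            id     +      *      (      )     $   */
    /* E  */ {    0, BLANK, BLANK,     0,  SYNC, SYNC},
    /* E' */ {BLANK,     1, BLANK, BLANK,     2,    2},
    /* T  */ {    3,  SYNC, BLANK,     3,  SYNC, SYNC},
    /* T' */ {BLANK,     5,     4, BLANK,     5,    5},
    /* F  */ {    7,  SYNC,  SYNC,     6,  SYNC, SYNC},
};

/* Parse w (terminated by END) with panic-mode recovery; returns the error count. */
int parse_with_recovery(const int *w) {
    int stack[256], top = 0, errors = 0, ip = 0;
    stack[top++] = END;
    stack[top++] = E;
    while (stack[top - 1] != END) {
        int X = stack[top - 1], a = w[ip];
        if (X == a) {                                 /* match */
            top--;
            ip++;
        } else if (X < NUM_T) {                       /* mismatch: pop terminal X */
            top--;
            errors++;
        } else {
            int p = M[X - E][a];
            if (p == BLANK) {                         /* blank entry: skip a */
                errors++;
                if (a == END) top--;                  /* safety guard: never skip $ */
                else ip++;
            } else if (p == SYNC) {                   /* sync entry: pop nonterminal X */
                top--;
                errors++;
            } else {                                  /* normal expansion */
                top--;
                for (int k = G[p].len - 1; k >= 0; k--)
                    stack[top++] = G[p].rhs[k];
            }
        }
    }
    if (w[ip] != END) errors++;                       /* input left over */
    return errors;
}
```

On the correct input id + id * id the function reports no errors; on id * + id it reports one error (the sync entry M[F, +]) and still parses the remainder + id.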

d) Phrase-level Recovery
On discovering an error, the parser may perform local correction on the remaining input, such as replacing a prefix of the input, so that it can continue parsing. This can be done by filling the blank entries of the parse table with pointers to error routines that add, remove, or replace input symbols (tokens) or pop the stack, and then issue error messages.
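Phrase-level recovery can be sketched by letting each table entry carry a function pointer. Everything here is hypothetical (the Parser and Entry types, the routines missing_operand and extra_token, and consult()); it only shows the mechanism of replacing a blank entry with a routine that pops the stack or edits the input and then reports an error.

```c
#include <stdio.h>

/* Hypothetical parser state: a symbol stack plus a position in the token stream. */
typedef struct {
    int stack[64];
    int top;              /* number of symbols currently on the stack */
    const int *input;     /* token stream */
    int pos;              /* index of the current input token */
    int errors;           /* number of errors reported so far */
} Parser;

/* An error routine repairs the parser state locally and reports the error. */
typedef void (*ErrorRoutine)(Parser *);

/* A table entry holds a production number, or an error routine for a blank entry. */
typedef struct {
    int production;       /* -1 marks an error entry */
    ErrorRoutine on_error;
} Entry;

/* Example routine: an operand is missing (e.g. M[F, +] on "id * + id").
 * Pop the nonterminal as if the operand had been read, and report it. */
void missing_operand(Parser *ps) {
    fprintf(stderr, "error: missing operand\n");
    ps->top--;
    ps->errors++;
}

/* Example routine: delete an unexpected input token, and report it. */
void extra_token(Parser *ps) {
    fprintf(stderr, "error: unexpected token deleted\n");
    ps->pos++;
    ps->errors++;
}

/* Consult an entry of M: return its production, or run its error routine and return -1. */
int consult(const Entry *entry, Parser *ps) {
    if (entry->production >= 0)
        return entry->production;
    entry->on_error(ps);
    return -1;
}
```

A driver loop would call consult() where the non-recursive parser looks up M[X, a]; after an error routine returns, the driver simply continues with the repaired stack and input.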