Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Similar documents
CS 406/534 Compiler Construction Parsing Part I

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Parsing II Top-down parsing. Comp 412

Introduction to Parsing

Parsing Part II (Top-down parsing, left-recursion removal)

Computer Science 160 Translation of Programming Languages

Syntax Analysis, III Comp 412

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Types of parsing. CMSC 430 Lecture 4, Page 1

Syntactic Analysis. Top-Down Parsing

CSCI312 Principles of Programming Languages

Syntax Analysis, III Comp 412

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

EECS 6083 Intro to Parsing Context Free Grammars

Parsing II Top-down parsing. Comp 412

3. Parsing. Oscar Nierstrasz

Introduction to Parsing. Comp 412

Syntactic Analysis. Top-Down Parsing. Parsing Techniques. Top-Down Parsing. Remember the Expression Grammar? Example. Example

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CS 314 Principles of Programming Languages

Introduction to Parsing. Lecture 5

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

CA Compiler Construction

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

Compiler Design Concepts. Syntax Analysis

Introduction to Parsing. Lecture 5

Syntax Analysis Check syntax and construct abstract syntax tree

Part 5 Program Analysis Principles and Techniques

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Context-Free Grammars

COMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Defining syntax using CFGs

Introduction to Parsing. Lecture 8

Compilers Course Lecture 4: Context Free Grammars

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Context-free grammars

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

Context-Free Languages and Parse Trees

Front End. Hwansoo Han

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

Compiler Construction: Parsing

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Introduction to Parsing Ambiguity and Syntax Errors


MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

Syntax Analysis Part I

Introduction to Parsing Ambiguity and Syntax Errors

Chapter 4. Lexical and Syntax Analysis

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Ambiguity. Lecture 8. CS 536 Spring

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Parsing: Derivations, Ambiguity, Precedence, Associativity. Lecture 8. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Chapter 4: LR Parsing

4. Lexical and Syntax Analysis

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Introduction to Syntax Analysis. The Second Phase of Front-End

Concepts Introduced in Chapter 4

S Y N T A X A N A L Y S I S LR

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

Top down vs. bottom up parsing

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Grammars and ambiguity. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 8 1

4. Lexical and Syntax Analysis

CMSC 330: Organization of Programming Languages

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

More on Syntax. Agenda for the Day. Administrative Stuff. More on Syntax In-Class Exercise Using parse trees

Compiler Construction 2016/2017 Syntax Analysis

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Chapter 3. Parsing #1

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Introduction to Syntax Analysis

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Lexical and Syntax Analysis. Top-Down Parsing

Announcements. Written Assignment 1 out, due Friday, July 6th at 5PM.

CSE 3302 Programming Languages Lecture 2: Syntax

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Plan for Today. Regular Expressions: repetition and choice. Syntax and Semantics. Context Free Grammars

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Compilers. Bottom-up Parsing. (original slides by Sam

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Bottom-Up Parsing. Lecture 11-12

LL(1) predictive parsing

Transcription:

Parsing Part II (Ambiguity, Top-down parsing, Left-recursion Removal)

Ambiguous Grammars Definitions If a grammar has more than one leftmost derivation for a single sentential form, the grammar is ambiguous If a grammar has more than one rightmost derivation for a single sentential form, the grammar is ambiguous The leftmost and rightmost derivations for a sentential form may differ, even in an unambiguous grammar Classic example the if-then-else problem Stmt if then Stmt if then Stmt else Stmt other stmts This ambiguity is entirely grammatical in nature from Cooper & Torczon 2

Ambiguity This sentential form has two derivations (If a derivation has more than 1 parse tree, the grammar is ambiguous if 1 then if 2 then Stmt 1 else Stmt 2 if if E 1 then else E 1 then if S 2 if E 2 then E 2 then else S 1 S 1 S 2 production 2, then production 1 production 1, then production 2 from Cooper & Torczon 3

Ambiguity Removing the ambiguity Must rewrite the grammar to avoid generating the problem Match each else to innermost unmatched if (common sense rule) 1 S tmt Wi th E lse 2 N o E ls e 3 Wi th E lse If E xpr t he n Wi th E lse el s e Wi th E lse 4 o t h er s t mt s 5 N o E ls e If E xpr t he n S tmt 6 If E xpr t he n Wi th E lse el s e N o E ls e With this grammar, the example has only one derivation from Cooper & Torczon 4

Ambiguity if 1 then if 2 then Stmt 1 else Stmt 2 R u le S e n te n t ia l Fo rm St m t 2 N oe ls e 5 if E x p r th e n St m t? if E 1 th e n St m t 1 if E 1 th e n W ith E ls e 3 if E 1 th e n if E x p r th e n W ith E ls e e ls e W ith E ls e? if E 1 th e n if E 2 th e n W ith E ls e e ls e W ith E ls e 4 if E 1 th e n if E 2 th e n S 1 e ls e W ith E ls e 4 if E 1 th e n if E 2 th e n S 1 e ls e S 2 This binds the else controlling S 2 to the inner if from Cooper & Torczon 5

Deeper Ambiguity Ambiguity usually refers to confusion in the CFG Overloading can create deeper ambiguity a = f(17) In some languages, f could be either a function or a subscripted variable Disambiguating this one requires context Need values of declarations Really an issue of type, not context-free syntax Requires an extra-grammatical solution (not in CFG) Must to handle these with a different mechanism > Step outside grammar rather than use a more complex grammar from Cooper & Torczon 6

Ambiguity - the Final Word Ambiguity arises from two distinct sources Confusion in the context-free syntax (if-then-else) Confusion that requires context to resolve (overloading) Resolving ambiguity To remove context-free ambiguity, rewrite the grammar To handle context-sensitive ambiguity takes cooperation > Knowledge of declarations, types, > Accept a superset of L(G) & check it with other means > This is a language design problem Sometimes, the compiler writer accepts an ambiguous grammar > Parsing techniques that do the right thing See Chapter 4 from Cooper & Torczon 7

Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production & try to match the input Bad pick may need to backtrack Some grammars are backtrack-free (predictive parsing) Bottom-up parsers (LR(1), operator precedence) Start at the leaves and grow toward root As input is consumed, encode possibilities in an internal state Start in a state valid for legal first tokens Bottom-up parsers handle a large class of grammars from Cooper & Torczon 8

Top-down Parsing A top-down parser starts with the root of the parse tree The root node is labeled with the goal symbol of the grammar Top-down parsing algorithm: Construct the root node of the parse tree Repeat until the fringe of the parse tree matches the input string 1 At a node labeled A, select a production with A on its lhs and, for each symbol on its rhs, construct the appropriate child 2 When a terminal symbol is added to the fringe and it doesn t match the fringe, backtrack 3 Find the next node to be expanded (label NT) The key is picking the right production in step 1 > That choice should be guided by the input string from Cooper & Torczon 9

Remember the expression grammar? Version with precedence derived last lecture 1 G oa l E xpr 2 E xpr E xpr + Ter m 3 E xpr Ter m 4 Ter m 5 Ter m Ter m * Fact o r 6 Ter m / Fact o r 7 Fact o r 8 Fact o r n u m ber 9 id And the input x - 2 * y from Cooper & Torczon 10

Example Let s try x - 2 * y : Goal R u le S en t e n tial For m In pu t G oa l x - 2 * y 1 E xpr x - 2 * y + 2 E xpr + Ter m x - 2 * y 4 Ter m + Ter m x - 2 * y 7 Fact o r + Ter m x - 2 * y 9 <id, x> + Ter m x - 2 * y Fact. 9 <id, x> + Ter m x - 2 * y <id,x> from Cooper & Torczon 11

Example Let s try x - 2 * y : Goal R u le S en t e n tial For m In pu t G oa l x - 2 * y 1 E xpr x - 2 * y + 2 E xpr + Ter m x - 2 * y 4 Ter m + Ter m x - 2 * y 7 Fact o r + Ter m x - 2 * y 9 <id, x> + Ter m x - 2 * y Fact. 9 <id, x> + Ter m x - 2 * y <id,x> This worked well, except that - doesn t match + The parser must backtrack to here from Cooper & Torczon 12

Example Continuing with x - 2 * y : Goal R u le S en t e n tial For m In pu t G oa l x - 2 * y 1 E xpr x - 2 * y 3 E xpr - Ter m x - 2 * y - 4 Ter m - Ter m x - 2 * y 7 Fact o r - Ter m x - 2 * y 9 <id, x> - Ter m x - 2 * y Fact. 9 <id, x> - Ter m x - 2 * y <id, x> - Ter m x - 2 * y <id,x> from Cooper & Torczon 13

Example Continuing with x - 2 * y : Goal R u le S en t e n tial For m In pu t G oa l x - 2 * y 1 E xpr x - 2 * y 3 E xpr - Ter m x - 2 * y - 4 Ter m - Ter m x - 2 * y 7 Fact o r - Ter m x - 2 * y 9 <id, x> - Ter m x - 2 * y Fact. 9 <id, x> - Ter m x - 2 * y <id, x> - Ter m x - 2 * y <id,x> This time, - and - matched We can advance past - to look at 2 Now, we need to expand - the last NT on the fringe from Cooper & Torczon 14

Example Trying to match the 2 in x - 2 * y : Goal R u le S en t e n tial For m In pu t <id, x> - Ter m x - 2 * y 7 <id, x> - Fact o r x - 2 * y - 9 <id, x> - <n um,2> x - 2 * y <id, x> - <n um,2> x - 2 * y Fact. Fact. <num,2> <id,x> from Cooper & Torczon 15

Example Trying to match the 2 in x - 2 * y : Goal R u le S en t e n tial For m In pu t <id, x> - Ter m x - 2 * y 7 <id, x> - Fact o r x - 2 * y - 9 <id, x> - <n um,2> x - 2 * y <id, x> - <n um,2> x - 2 * y Fact. Where are we? Fact. <num,2> 2 matches 2 <id,x> We have more input, but no NTs left to expand The expansion terminated too soon Need to backtrack from Cooper & Torczon 16

Example Trying again with 2 in x - 2 * y : Goal R u le S en t e n tial For m In pu t <id, x> - Ter m x - 2 * y 5 <id, x> - Ter m * Fact o r x - 2 * y 7 <id, x> - Fact o r * Fact o r x - 2 * y - 8 <id, x> - <n um,2> * Fact o r x - 2 * y <id, x> - <n um,2> * Fact o r x - 2 * y * Fact. <id, x> - <n um,2> * Fact o r x - 2 * y Fact. Fact. <id,y> 9 <id, x> - <n um,2> * < id, y > x - 2 * y <id, x> - <n um,2> * < id, y > x - 2 * y <id,x> <num,2> This time, we matched & consumed all the input Success! from Cooper & Torczon 17

Another possible parse Other choices for expansion are possible R u le S en t e n tial For m In pu t G oa l x - 2 * y 1 E xpr x - 2 * y 2 E xpr + Ter m x - 2 * y 2 E xpr + Ter m +Ter m x - 2 * y 2 E xpr + Ter m + Ter m +Ter m x - 2 * y 2 E xpr +Ter m + Ter m + +Ter m x - 2 * y consuming no input! This doesn t terminate (obviously) Wrong choice of expansion leads to non-termination Non-termination is a bad property for a parser to have Parser must make the right choice from Cooper & Torczon 18

Left Recursion Top-down parsers cannot handle left-recursive grammars Formally, A grammar is left recursive if A NT such that a derivation A + Aα, for some string α (NT T ) + Our expression grammar is left recursive This can lead to non-termination in a top-down parser For a top-down parser, any recursion must be right recursion We would like to convert the left recursion to right recursion Non-termination is a bad property in any part of a compiler from Cooper & Torczon 19

Eliminating Left Recursion To remove left recursion, we can transform the grammar Consider a grammar fragment of the form Fee Fee α β where neither α nor β start with Fee We can rewrite this as Fee β Fie Fie α Fie ε where Fie is a new non-terminal This accepts the same language, but uses only right recursion from Cooper & Torczon 20

Eliminating Left Recursion The expression grammar contains two cases of left recursion Ex pr Ex pr + T e rm Ex pr T e rm T e rm T e rm T e rm * Fa c t o r T e rm / F a c t o r Fa c t o r Applying the transformation yields Ex pr T e rm Ex pr Ex pr + Ter m Ex pr - T e rm Ex pr ε T e rm Fa c t o r T e rm T e rm * Fa c t o r T e rm / Fa c t o r T e rm ε These fragments use only right recursion They retains the original left associativity from Cooper & Torczon 21

Eliminating Left Recursion Substituting back into the grammar yields 1 G o a l Ex pr 2 Ex pr Te rm E x p r 3 + Te rm Exp r 4 - Te rm Exp r 5 ε 6 Te rm Fac to r Te rm 7 * Fa ct or T e rm 8 / Fa ct or T e rm 9 ε 10 Fac to r nu mb er 11 id This grammar is correct, if somewhat non-intuitive. It is left associative, as was the original A top-down parser will terminate using it. A top-down parser may need to backtrack with it. from Cooper & Torczon 22

Eliminating Left Recursion The transformation eliminates immediate left recursion What about more general, indirect left recursion The general algorithm: arrange the NTs into some order A 1, A 2,, A n for i 1 to n replace each production A i A s γ with A i δ 1 γ δ 2 γ δ k γ, where A s δ 1 δ 2 δ k are all the current productions for A s eliminate any immediate left recursion on A i using the direct transformation This assumes that the initial grammar has no cycles (A i + A i ), and no epsilon productions from Cooper & Torczon 23

Eliminating Left Recursion How does this algorithm work? 1. Impose arbitrary order on the non-terminals 2. Outer loop cycles through NT in order 3. Inner loop ensures that a production expanding A i has no non-terminal A s in its rhs, for s < I 4. Last step in outer loop converts any direct recursion on A i to right recursion using the transformation showed earlier 5. New non-terminals are added at the end of the order & have no left recursion At the start of the i th outer loop iteration For all k < I, no production that expands A k contains a non-terminal A s in its rhs, for s < k from Cooper & Torczon 24