EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Similar documents
EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

LL parsing Nullable, FIRST, and FOLLOW

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

CSE 3302 Programming Languages Lecture 2: Syntax

3. Parsing. Oscar Nierstrasz

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

A simple syntax-directed

EECS 6083 Intro to Parsing Context Free Grammars

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Dr. D.M. Akbar Hussain

CPS 506 Comparative Programming Languages. Syntax Specification

Principles of Programming Languages COMP251: Syntax and Grammars

CSCI312 Principles of Programming Languages!

CS 314 Principles of Programming Languages

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

Defining syntax using CFGs

Part 5 Program Analysis Principles and Techniques

COP 3402 Systems Software Syntax Analysis (Parser)

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CMSC 330: Organization of Programming Languages

Principles of Programming Languages COMP251: Syntax and Grammars

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Syntax Analysis Part I

A programming language requires two major definitions A simple one pass compiler

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

3. Context-free grammars & parsing

Building Compilers with Phoenix

Syntax Analysis Check syntax and construct abstract syntax tree

Context-Free Grammar (CFG)

CMSC 330: Organization of Programming Languages

Syntax. In Text: Chapter 3

CMSC 330: Organization of Programming Languages. Context Free Grammars

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

LL(k) Compiler Construction. Top-down Parsing. LL(1) parsing engine. LL engine ID, $ S 0 E 1 T 2 3

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Parsing II Top-down parsing. Comp 412

Habanero Extreme Scale Software Research Project

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Introduction to Parsing. Lecture 5

Introduction to Parsing

A Simple Syntax-Directed Translator

CMSC 330: Organization of Programming Languages. Context Free Grammars

EDA180: Compiler Construc6on. More Top- Down Parsing Abstract Syntax Trees Görel Hedin Revised:

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Lecture 4: Syntax Specification

Compilers - Chapter 2: An introduction to syntax analysis (and a complete toy compiler)

Grammars and Parsing. Paul Klint. Grammars and Parsing

CS 406/534 Compiler Construction Parsing Part I

LL(k) Compiler Construction. Choice points in EBNF grammar. Left recursive grammar

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

CMPT 755 Compilers. Anoop Sarkar.

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Top down vs. bottom up parsing

Lexical and Syntax Analysis. Top-Down Parsing

Chapter 3. Describing Syntax and Semantics

Syntax. A. Bellaachia Page: 1

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Syntax and Grammars 1 / 21

Context-free grammars (CFG s)

Introduction to Lexing and Parsing

Describing Syntax and Semantics

Parsing Expression Grammars and Packrat Parsing. Aaron Moss

Lexical and Syntax Analysis

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

CSE302: Compiler Design

Part 3. Syntax analysis. Syntax analysis 96

Syntax-Directed Translation. Lecture 14

CS5363 Final Review. cs5363 1

This book is licensed under a Creative Commons Attribution 3.0 License

ECE251 Midterm practice questions, Fall 2010

Wednesday, August 31, Parsers

Lecture 8: Deterministic Bottom-Up Parsing

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Chapter 3. Parsing #1

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

CSCE 314 Programming Languages

Monday, September 13, Parsers

2.2 Syntax Definition

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Transcription:

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing Görel Hedin Revised: 2017-09-04

This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser) Semantic analyzer source code (text) tokens AST (Abstract syntax tree) Attributed AST runtime system activation records objects stack garbage collection heap Intermediate code generator Interpreter Optimizer intermediate code Virtual machine code and data intermediate code Target code generator target code machine 2

Space of context-free grammars Unambiguous LR LL Ambiguous All context-free grammars LL: Builds tree top-down Simple to understand LR: Builds tree bottom-up More powerful 3

Ambiguous grammars 4

Recall: the definition of ambiguity Grammar: -> "+" -> "*" -> A CFG is ambiguous if there is a sentence in the language that can be derived by two (or more) different parse trees. "+" "*" "*" "+" 5

Strategies for dealing with ambiguities First, decide which parse tree is the desired one. Eliminate the ambiguity: Create an equivalent unambiguous grammar. Usually possible, but there exists grammars for which it cannot be done. Unambiguous: Only one parse tree can be derived for any string. However, the parse tree will be different from the original desired one. Alternatively, use extra rules: Usethe ambiguous grammar. Add priority and associativity rules to instruct the parser to select the desired parse tree. Works for some ambiguities. Supported by some parser generators. 6

Eliminating ambiguity Unambiguous Ambiguous Goal: transform an ambiguous grammar to an equivalent unambiguous grammar. 7

Equivalent grammars Two grammars, G 1 and G 2, are equivalent if they generate the same language. I.e., a sentence can be derived from one of the grammars, iff it can be derived also from the other grammar: L(G 1 ) = L(G 2 ) 8

Common kinds of ambiguities Operators with different priorities: a + b * c == d,... Associativity of operators of the same priority: a + b c + d,... Dangling else: if (a) if (b) c = d; else e = f; 9

Example ambiguity: Priority (also called precedence) -> "+" -> "*" -> Two parse trees for "+" "*" "+" "*" "*" "+" prio("*") > prio("+") (according to tradition) prio("+") > prio("*") (would be unexpected and confusing) 10

Example ambiguity: Associativity -> "+" -> "-" -> "**" -> For operators with the same priority, how do several in a sequence associate? "+" "**" "-" "**" Left-associative (usual for most operators) Right-associative (usual for the power operator) 11

Example ambiguity: Non-associativity -> "<" -> For some operators, it does not make sense to have several in a sequence at all. They are non-associative. "<" "<" "<" "<" We would like to forbid both trees. I.e., rule out the sentence from the langauge. 12

Disambiguating expression grammars How can we change the grammar so that only the desired trees can be derived? Idea: Restrict certain subtrees by introducing new nonterminals. Priority: Introduce a new nonterminal for each priority level: Term, Factor, Primary,... Left associativity: Restrict the right operand so it only can contain expressions of higher priority Right associativity: Restrict the left operand... Non-associativity: Restrict both operands 13

Exercise Ambiguous grammar: Equivalent unambiguous grammar: r -> r "+" r r -> r "*" r r -> ID r -> "(" r ")" 14

Solution You will do this in Assignment 2! Ambiguous grammar: r -> r "+" r r -> r "*" r r -> ID r -> "(" r ")" Equivalent unambiguous grammar: r -> r "+" Term r -> Term Term -> Term "*" Factor Term -> Factor Factor -> ID Factor -> "(" r ")" Here, we introduce a new nonterminal, Term, that is more restricted than r. That is, from Term, we can not derive any new additions. For the addition production, we use Term as the right operand, to make sure no new additions will appear to the right. This gives left-associativity. For the multiplication production, we use Term, and the even more restricted nonterminal Factor to make sure no additions can appear as children (without using parentheses). This gives multiplication higher priority than addition. 15

Real-world example: The Java expression grammar ression -> Lambdaression Assignmentression Assignmentression -> Conditionalression Assignment Conditionalression ->... Additiveression -> Multiplicativeression Additiveression + Multiplicativeression Additiveression Multiplicativeression Multiplicativeression -> Unaryression Multiplicativeression * Unaryression... Unaryression ->...... Primary -> PrimaryNoNewArray ArrayCreationression PrimaryNoNewArray -> Literal this ( ression ) FieldAccess... More than 15 priority levels. See the Java Language Specification, Java SE 8, Chapter 19, Syntax http://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html 16

The "dangling else" problem S -> "if" "(" E ")" S ["else" S] S -> ID "=" E ";" E -> ID Construct a parse tree for: if (a) if (b) c = d; else e = f; The desired tree if (a) if (b) c = d; else e = f; S S Two possible parse trees! The grammar is ambiguous! S S if (a) if (b) c = d; else e = f; if ( a ) if ( b ) c = d; else e = f; if ( a ) if ( b ) c = d; else e = f; 17

Solutions to the "dangling else" problem Rewrite to equivalent unambiguous grammar - possible, but results in more complex grammar (several similar rules) Use the ambiguous grammar - use "rule priority", the parser can select the correct rule. - works for the dangling else problem, but not for ambiguous grammars in general - not all parser generators support it well Change the language - e.g., add a keyword "fi" that closes the "if"-statement - restrict the "then" part to be a block: "{... }". - only an option if you are designing the language yourself. The Java Language Specification rewrites the grammar to be unambiguous. 18

Finding ambiguities in practice You try to run a CFG through an LL or LR parser generator - If it is accepted by the parser generator, the grammar is unambiguous - If not, the grammar could be ambiguous, or unambiguous, but outside of the parser generator grammar class. In any case, you need to analyze that particular problem. This can be quite tricky, especially for large grammars. Perhaps you can find an ambiguity, or some other known LL/LR difficulty. 19

EBNF, BNF, Canonical form 20

Recall: different notations for CFGs A -> B d e C f A -> g A Canonical form sequence of terminals and nonterminals C -> D a b b E F a C BNF (Backus-Naur Form) alternative productions (......... ) G -> H* i (d E)+ F [d C] EBNF (Extended Backus-Naur Form) repetition (* and +) optionals [...] parentheses (...) 21

Writing the grammar in different notations Equivalent BNF (Backus-Naur Form): Canonical form: r -> r "+" Term r -> Term Term -> Term "*" Factor Term -> Factor Factor -> Factor -> "(" r ")" Use alternatives insteadofseveral productions per nonterminal. Equivalent EBNF (Extended BNF): Use repetition instead of recursion, where possible. 22

Writing the grammar in different notations Equivalent BNF (Backus-Naur Form): Canonical form: r -> r "+" Term r -> Term Term -> Term "*" Factor Term -> Factor Factor -> Factor -> "(" r ")" r -> r "+" Term Term Term -> Term "*" Factor Factor Factor -> "(" r ")" Use alternatives instead of several productions per nonterminal. Equivalent EBNF (Extended BNF): r -> Term ("+" Term)* Term -> Factor ("*" Factor)* Factor -> "(" r ")" Use repetition instead of recursion, where possible. 23

EBNF Translating EBNF to Canonical form Equivalent canonical form Top levelrepetition X -> g 1 g 2 * g 3 Top levelalternative X -> g 1 g 2 Top level parentheses X -> g 1 (...) g 2 Where g k is a sequence of terminals and nonterminals 24

EBNF Translating EBNF to Canonical form Equivalent canonical form Top levelrepetition X -> g 1 g 2 * g 3 X -> g 1 N g 3 N -> e N -> g 2 N Top levelalternative X -> g 1 g 2 X -> g 1 X -> g 2 Top level parentheses X -> g 1 (...) g 2 X -> g 1 N g 2 N ->... 25

Exercise: Translate from EBNF to Canonical form EBNF: Equivalent Canonical Form r -> Term ("+" Term)* 26

Solution: Translate from EBNF to Canonical form EBNF: r -> Term ("+" Term)* Equivalent Canonical Form r -> Term N N -> e N -> "+" Term N 27

Can we prove that these are equivalent? trivial Equivalent Canonical Form r -> Term N N -> e N -> "+" Term N EBNF: r -> Term ("+" Term)* Alternative Equivalent Canonical Form non-trivial r -> r "+" Term r -> Term 28

Example proof 1. We start with this: r -> Term ("+" Term)* 2. We can move the repetition: r -> (Term "+")* Term 3. Eliminate the repetition: r -> N Term N -> e N -> N Term "+" 4. Replace N Term by r in the third production: r -> N Term N -> e N -> r "+" We would like this: r -> r "+" Term r -> Term 5. Eliminate N: r -> r "+" Term r -> Term Done! 29

Equivalence of grammars Given two context-free grammars, G1 and G2. Are they equivalent? I.e., is L(G1) = L(G2)? Undecidable problem: a general algorithm cannot be constructed. We need to rely on our ingenuity to find out. (In the general case.) 30

Adapting grammars to LL parsing 31

Create equivalent LL grammar Unambiguous LL Ambiguous Typically, need to eliminate Left Recursion and Common Prefixes. (But this maynot be enough.) The parse trees will be different from the original desired ones. Try to build the desired ASTs anyway. EBNF helps: relatively easy to build the desired AST. 32

Recall: LL(1) parsing Assign Assign -> ID = ; -> Name Params Name... Name -> ID (. ID )*? What node should be built? ID = ID.ID; ID = ID.ID ( ID ); Common prefix! Cannot be handled by LL(1). This grammar is not even LL(k). LL(1): decides to build the node after seeing the firsttoken ofitssubtree. The tree is built top down. 33

Eliminating the common prefix Rewrite to an equivalent grammar without the common prefix -> Name Params Name With common prefix - not LL(1) 34

Eliminating the common prefix Rewrite to an equivalent grammar without the common prefix -> Name Params Name With common prefix - not LL(1) -> Name OptParams OptParams -> Params e Without common prefix - LL(1) Eliminating a common prefix this way is called left factoring. 35

A -> B s A -> B t B -> u v Exercise If two productions of the same nonterminal can derive a sentence starting in the same way, they share a common prefix. A -> s B A -> s C B -> t C -> u A -> s B B -> s C B -> t C C -> u Which nonterminals have common prefix productions? What is the common prefix? Is the grammar LL(1), LL(2),...? 36

Solution If two productions of a nonterminal can derive a sentence starting in the same way, they share a common prefix. A -> s B A -> s C B -> t C -> u A -> B s A -> B t B -> u v A -> s B B -> s C B -> t C C -> u A has two rules that can derive the prefix s The grammar is LL(2) A has two rules that can derive the prefix u v The grammar is LL(3) This is not a common prefix problem. The two rules that start the same cannot be derived from the same nonterminal. The grammar is LL(1) Which nonterminals have common prefix productions? What is the common prefix? Is the grammar LL(1), LL(2),...? 37

The common prefix can be indirect A -> B A -> C A -> D B -> t s C -> t v D -> x A has two rules that can derive the prefix t The grammar is LL(2) A -> B s A -> B t B -> B u B -> v A has two rules that can derive the prefix v u* So, the prefix can become arbitrarily long. The grammar is not LL(k), no matter what k we use. We need to rewrite the grammar, or use another parsing method than LL. (For example, LR has no problem with common prefixes) Which nonterminals have common prefix productions? What is the common prefix? Is the grammar LL(1), LL(2),...? 38

Eliminating the common prefix Rewrite to an equivalent grammar without the common prefix A -> B A -> C B -> t s B -> x D B -> y C -> t v D -> B C Indirect common prefix 39

Eliminating the common prefix Rewrite to an equivalent grammar without the common prefix A -> B A -> C B -> t s B -> x D B -> y C -> t v D -> B C Indirect common prefix First, make the common prefix directly visible: Substitute all B right-hand sides into the A -> B rule We can't remove the B rules since B is used in other places. Similarly for the A -> C rule A -> t s A -> x D A -> y B -> t s B -> x D B -> y A -> t v C -> t v D -> B C Direct common prefix Then, eliminate the direct common prefix, as previously. 40

Left recursion assign expr assign -> ID "=" expr ";" expr -> expr "+" term term term -> ID? What node should be built? ID = ID + ID + ID ; The grammar is left recursive. The grammar is not LL(k). An LL parser would go into endlessrecursion. (LR parsers can handle left recursion.) 41

Dealing with left recursion in LL parsers Method 1: Eliminate the leftrecursion (A bit cumbersome) Left-recursive grammar. Not LL(k) E -> E "+" T E -> T T -> ID Rewrite to right-recursion! But there is now a common prefix! Still not LL(k). E -> T "+" E E -> T T -> ID Eliminate the common prefix. The grammar is now LL(1) E -> T E' E' -> "+" E E' -> e T -> ID With a little work, it is possible to write code that builds a left-recursive AST, even if the parse is right-recursive. 42

Dealing with left recursion in LL parsers Method 2: Rewrite to EBNF (Easy!) Left-recursive grammar. Not LL(k) E -> E "+" T E -> T T -> ID Rewrite to EBNF! E -> T ( "+" T )* T -> ID A left-recursive AST can easily be built during the iteration. 43

Advice when using an LL-based parser generator If the LL parser generator does not accept your grammar, the reason might be Ambiguity usually eliminate it. In some cases, rule priority can be used. Left recursion can you use EBNF instead? Otherwise, eliminate. Common prefix is it limited? You can then use a local lookahead, for example 2. Otherwise, factor out the common prefix. You might be able to solve the problem, but the grammar might become large and less readable. 44

Different parsing algorithms Unambiguous LR LL Ambiguous All context-free grammars LL: Left-to-right scan Leftmost derivation Builds tree top-down Simple to understand LR: Left-to-right scan Rightmost derivation Builds tree bottom-up More powerful 45

Parses input LL(k) vs LR(k) LL(k) Left-to-right LR(k) Derivation Leftmost Rightmost Lookahead k symbols Build the tree top down bottom up Select rule after seeing itsfirst k tokens after seeing all its tokens, and an additional k tokens Left recursion Cannot handle Can handle! Unlimited common prefix Can resolve some ambiguities through rule priority Cannot handle Dangling else Can handle! Dangling else, associativity, priority Error recovery Trial-and-error Good algorithms exist Implement by hand? Possible. But better to use a generator. Too complicated. Use a generator. 46

Summary questions What does it mean for a grammar to be ambiguous? What does it mean for two grammars to be equivalent? Exemplify some common kinds of ambiguities. Exemplify how expression grammars with can be disambiguated. What is the "dangling else"-problem, and how can it be solved? When should we use canonical form, and when BNF or EBNF? Translate an example EBNF grammar to canonical form. Can we write an algorithm to check if two grammars are equivalent? What is a common prefix? Exemplify how a common prefix can be eliminated. What is left factoring? What is left recursion? Exemplify how left recursion can be eliminated in a grammar on canonical form. Exemplify how left recursion can be eliminated using EBNF. Can LL(k) parsing algorithms handle common prefixes and left recursion? Can LR(k) parsing algorithms handle common prefixes and left recursion? 47