Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

Similar documents
Context-Free Grammars

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

Syntax Analysis Check syntax and construct abstract syntax tree

Grammars and ambiguity. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 8 1

Parsing: Derivations, Ambiguity, Precedence, Associativity. Lecture 8. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

E E+E E E (E) id. id + id E E+E. id E + E id id + E id id + id. Overview. derivations and parse trees. Grammars and ambiguity. ambiguous grammars

Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Parsing

Introduction to Parsing Ambiguity and Syntax Errors

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

Compiler Design Concepts. Syntax Analysis

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

CS 314 Principles of Programming Languages

EECS 6083 Intro to Parsing Context Free Grammars

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Introduction to Parsing. Lecture 5

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Principles of Programming Languages COMP251: Syntax and Grammars

Parsing II Top-down parsing. Comp 412

Part 5 Program Analysis Principles and Techniques

Introduction to Parsing. Lecture 5

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Intro To Parsing. Step By Step

CS 406/534 Compiler Construction Parsing Part I

3. Parsing. Oscar Nierstrasz

Compilers Course Lecture 4: Context Free Grammars

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

Context-Free Grammars

CMPT 755 Compilers. Anoop Sarkar.

Defining syntax using CFGs

Context Free Languages

CPS 506 Comparative Programming Languages. Syntax Specification

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Ambiguity. Lecture 8. CS 536 Spring

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Properties of Regular Expressions and Finite Automata

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Wednesday, August 31, Parsers

Syntax Analysis Part I

Optimizing Finite Automata

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

COP 3402 Systems Software Syntax Analysis (Parser)

Formal Languages and Compilers Lecture I: Introduction to Compilers

Introduction to Syntax Analysis

Context-free grammars

CS 321 Programming Languages and Compilers. VI. Parsing

Principles of Programming Languages COMP251: Syntax and Grammars

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Dr. D.M. Akbar Hussain

CMSC 330: Organization of Programming Languages. Context Free Grammars

Monday, September 13, Parsers

Formal Languages and Compilers Lecture VI: Lexical Analysis

Compiler Construction: Parsing

Introduction to Syntax Analysis. The Second Phase of Front-End

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

CMSC 330: Organization of Programming Languages

Conflicts in LR Parsing and More LR Parsing Types

CMSC 330: Organization of Programming Languages. Context Free Grammars

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:


([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

CMSC 330: Organization of Programming Languages

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

2.2 Syntax Definition

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

Fall Compiler Principles Context-free Grammars Refresher. Roman Manevich Ben-Gurion University of the Negev

Introduction to Parsing. Lecture 8

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

CSE 3302 Programming Languages Lecture 2: Syntax

Compilers and computer architecture From strings to ASTs (2): context free grammars

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

LL(1) predictive parsing

CSE302: Compiler Design

Habanero Extreme Scale Software Research Project

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

3. Context-free grammars & parsing

CSCI312 Principles of Programming Languages!

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Transcription:

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Grammars Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal Languages and Compilers BSc course 2017/18 First Semester

Summary of Lecture V Generating Languages from Grammars Ambiguous Grammars

Notion of Derivation To characterize a Language starting from a Grammar G = (V T,V N,S,P) we need to introduce the notion of Derivation. The notion of Derivation uses Productions P to generate a string starting from another string. Direct Derivation (in symbols ). If α β P and γ, δ V, then, γαδ γβδ. Derivation (in symbols ). If α 1 α 2, α 2 α 3,..., α n 1 α n, then, α 1 α n.

Generating Languages from Grammars Generative Definition of a Language. We say that a Language L is generated by the Grammar G= (V T,V N,S,P), in symbols L(G), if: L(G) = {w V T S w}. Example. Consider the following CF Grammar for arithmetic expressions: E E + E E E (E) E id The sequence of Tokens (id + id) is a well-formed sentence: E E (E) (E + E) (id + E) (id + id) Note: Token is a synonym of Terminal Symbol when talking of Grammars for programming languages.

Derivation Trees for Context-Free Grammars Derivation Trees, called also Parse Trees, are a visual method of describing any derivation in a context-free grammar. Let G= (V T,V N,S,P) be a CFG. A tree is a derivation tree for G if: 1 Every node has a label, which is a symbol of V; 2 The label of the root is S; 3 If a node, n, labeled with A has at least one descendant, then A must be in V N ; 4 If nodes n 1, n 2,..., n k are direct descendants of node n, with labels A 1, A 2,..., A k, respectively, then: A A 1, A 2,..., A k must be a production in P.

Choices in Derivations At each step in a Derivation there are two choices to be made: 1 Which non-terminal to replace; 2 Which Production to use for that non-terminal. W.r.t. point (1), we have two derivations for (id + id): 1. E E (E) (E + E) (id + E) (id + id) 2. E E (E) (E + E) (E + id) (id + id) A Parser will consider either Leftmost Derivations the leftmost non-terminal is chosen or Rightmost Derivations.

Parse-Trees and Derivations A Parse-Tree is a visualization of a Derivation that ignores variations in the order in which non-terminal are replaced point (1) above. The Parse-Tree associated to the two Derivations in the previous slide is E - E ( ) E E id + E id Every Parse-Tree is associated with a unique leftmost and a unique rightmost derivation, and viceversa. Problem of Ambiguity: A sentence can have more than one Parse-Tree. Related to point (2) above: Which Production to use for a given non-terminal

Summary of Lecture V Generating Languages from Grammars Ambiguous Grammars

Ambiguity A grammar is ambiguous if it has more than one Parse-Tree for some string. Equivalently, there is more than one right-most (or left-most) derivation for some string. Ambiguity is bad: Leaves meaning of some programs ill-defined since we cannot decide its syntactical structure uniquely. Ambiguity is a property of Grammars, not of Languages. Two alternative solutions: 1 Disambiguate the grammar 2 Use extra-grammatical mechanisms, like disambiguating rules, to discard alternative Parse-Trees.

Ambiguity: Arithmetic Expressions Consider the Grammar for arithmetic expressions: E E + E E E (E) E id The sequence of Tokens id + id id has two Parse-Trees E E E + E E * E id E * E E + E id id id id id The first Parse-Tree reflects the usual assumption that * takes precedence on +.

Eliminating Ambiguity by Disambiguating the Grammar Sometime it is possible to eliminate ambiguity by rewriting the Grammar. Example. Let us rewrite the Grammar for arithmetic expressions: E E+T T T T F F F (E) id Enforces precedence of * over +; Enforces left-associativity of + and *

Eliminating Ambiguity: Example E E+T T T T F F F (E) id The sequence of Tokens id + id id has now only one Parse-Tree E E + T T T * F F F id id id

Ambiguity: The Dangling Else Consider the Grammar for if-then-else statements: Stmt if Expr then Stmt if Expr then Stmt else Stmt other This Grammar is ambiguous. Example. Consider the statement: if E 1 then if E 2 then S 1 else S 2.

The Dangling Else: Example The statement: if E 1 then if E 2 then S 1 else S 2, has two Parse-Trees Stmt Stmt if E 1 then Stmt if E 1 then Stmt else S 2 if E 2 then S 1 else S 2 if E 2 then S 1 Typically, the first Parse-Tree is preferred. Disambiguating Rule: Match each else with the closest unmatched then.

Disambiguating Dangling Else Disambiguating Rule: Match each else with the closest unmatched then. The rule can be incorporated into the Grammar if we distinguish between matched and unmatched statements. A statement between a then-else must be matched. Stmt Matched stmt Unmatched stmt Matched stmt if Expr then Matched stmt else Matched stmt Other-Stmt Unmatched stmt if Expr then Stmt if Expr then Matched stmt else Unmatched stmt This Grammar generates the same set of strings as the previous one but gives just one Parse-Tree for if-then-else statements.

Disambiguating Rules: Precedence and Associativity Declarations Instead of rewriting the Grammar: 1 Use the more natural (ambiguous) Grammar; 2 Along with disambiguating declarations. Most tools (e.g. YACC) allow precedence and associativity declarations for terminals (e.g, * takes precedence over + ) to disambiguate grammars (see the Book, Sections 4.8-4.9, for more details).

Inherent Ambiguity It would be nice if for every ambiguous grammar, there were some way to fix the ambiguity. Unfortunately, certain CFLs are inherently ambiguous, meaning that every grammar for the language is ambiguous.

Example: Inherent Ambiguity The language {0 i 1 j 2 k i = j or j = k} is inherently ambiguous. Intuitively, at least some of the strings of the form 0 n 1 n 2 n must be generated by two different parse trees, one based on checking the 0 s and 1 s, the other based on checking the 1 s and 2 s.

One Possible Ambiguous Grammar S AB CD A 0A1 01 A generates equal 0 s and 1 s B 2B 2 B generates any sequence of 2 s C 0C 0 C generates any sequence of 0 s D 1D2 12 D generates equal 1 s and 2 s There are two derivations of every string with equal numbers of 0 s, 1 s, and 2 s. E.g.: S AB 01B 012 S CD 0D 012

Ambiguity: Concluding Remarks No general techniques for handling ambiguity. Impossible to convert automatically an ambiguous Grammar to an unambiguous one. Used with care, ambiguity can simplify the Grammar Sometimes ambiguous Grammars allow for more natural definitions But then we need extra-grammatical disambiguation mechanisms.

Summary of Lecture V Generating Languages from Grammars Ambiguous Grammars