Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Similar documents
3. Context-free grammars & parsing

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Dr. D.M. Akbar Hussain

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

EECS 6083 Intro to Parsing Context Free Grammars

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Syntax. In Text: Chapter 3

Chapter 3. Describing Syntax and Semantics ISBN

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Habanero Extreme Scale Software Research Project

A Simple Syntax-Directed Translator

Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Parsing Ambiguity and Syntax Errors

COP 3402 Systems Software Syntax Analysis (Parser)

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Introduction to Parsing. Lecture 5

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

CS 314 Principles of Programming Languages

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Introduction to Parsing. Lecture 5

Principles of Programming Languages COMP251: Syntax and Grammars

Syntax Analysis Check syntax and construct abstract syntax tree

It parses an input string of tokens by tracing out the steps in a leftmost derivation.

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

Describing Syntax and Semantics

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Chapter 3. Describing Syntax and Semantics

CSCI312 Principles of Programming Languages!

Building Compilers with Phoenix

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

This book is licensed under a Creative Commons Attribution 3.0 License

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

More Assigned Reading and Exercises on Syntax (for Exam 2)

Introduction to Parsing

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

Programming Language Syntax and Analysis

Part 5 Program Analysis Principles and Techniques

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

CSE 3302 Programming Languages Lecture 2: Syntax

CSCE 314 Programming Languages

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Parsing II Top-down parsing. Comp 412

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

2.2 Syntax Definition

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

CPS 506 Comparative Programming Languages. Syntax Specification

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Compiler Design Concepts. Syntax Analysis

Chapter 3. Describing Syntax and Semantics ISBN

Syntax. A. Bellaachia Page: 1

Introduction to Parsing. Lecture 8

Principles of Programming Languages COMP251: Syntax and Grammars

Compilers and computer architecture From strings to ASTs (2): context free grammars

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CS /534 Compiler Construction University of Massachusetts Lowell. NOTHING: A Language for Practice Implementation

A simple syntax-directed

Optimizing Finite Automata

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

Context-Free Grammars

Intro To Parsing. Step By Step

Grammars and ambiguity. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 8 1

Syntax Analysis Part I

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Programming Languages Third Edition

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

A language is a subset of the set of all strings over some alphabet. string: a sequence of symbols alphabet: a set of symbols

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Defining syntax using CFGs

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Chapter 3. Describing Syntax and Semantics

Specifying Syntax COMP360

CSC 4181 Compiler Construction. Parsing. Outline. Introduction

ECE251 Midterm practice questions, Fall 2010

A programming language requires two major definitions A simple one pass compiler

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Context-Free Languages and Parse Trees

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Syntax Intro and Overview. Syntax

Compilers Course Lecture 4: Context Free Grammars

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Introduction to Syntax Analysis. The Second Phase of Front-End

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Context-Free Grammars

Introduction to Syntax Analysis

CMSC 330: Organization of Programming Languages

Defining Languages GMU

Chapter 4. Lexical and Syntax Analysis

Lecture 4: Syntax Specification

CMSC 330: Organization of Programming Languages. Context Free Grammars

Transcription:

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees 3.3.1 Parse trees 1. Derivation V.S. Structure Derivations do not uniquely represent the structure of the strings There are many derivations for the same string. The string of tokens: ( - ) * There exist two different derivations for above string (1) => op [ op ] (2) => op [ ] (3) => * [op * ] (4) => ( ) * [ ( ) ] (5) =>( op ) * [ op } (6) => ( op ) * [ ] (7) => ( - ) * [op - ] (8) => ( - ) * [ ] -------------- (1) => op [ op ] (2) => () op [ ( )] (3) => ( op ) op [ op ] (4) => ( op ) op [ ] (5) =>( - ) op [op - ] (6) => ( - ) op [ ] (7) => ( - ) * [op *] (8) =>( - ) * [ ] 2. Parsing Tree A parse tree corresponding to a derivation is a labeled tree. The interior nodes are labeled by non-terminals, the leaf nodes are labeled by terminals; And the children of each internal node represent the replacement of the associated Parsing non-terminal Tree in one step of the derivation. The example: => op => op => + => + The example: The example: => op => op => + => => op + => op => + => + Corresponding to the parse tree: Corresponding to the parse tree: op + The above parse tree is corresponds to the three derivations: 1

The above parse tree is corresponds to the three derivations: Left most derivation (1) => op (2) => op (3) => + (4) => + Right most derivation (1) => op (2) => op (3) => + (4) => + Neither leftmost nor rightmost derivation (1) => op (2) => + (3) => + (4) => + Generally, a parse tree corresponds to many derivations represent the same basic structure for the parsed string of terminals. It is possible to distinguish particular derivations that are uniquely associated with the parse tree. A leftmost derivation: A derivation in which the leftmost non-terminal is replaced at each step in the derivation. Corresponds to the preorder ing of the internal nodes of its associated parse tree. A rightmost derivation: Parsing Tree A derivation in which the rightmost non-terminal is replaced at each step in the derivation. The parse Corresponds tree corresponds to the postorder to ing the first of the internal nodes of its associated parse tree. derivation. The parse tree corresponds to the first derivation. 1 2 3 op 4 Example: The + ression (34-3)*42 Example: The ression (34-3)*42 The parse The tree parse for tree the above for the arithmetic above arithmetic ression ression 1 4 3 op 2 ( 5 ) * 8 7 op 6 2

Way Abstract Syntax-Tree CS 4203 Compiler Theory if (0) other else other statement if-stmt 3.3.2 Abstract syntax trees 1. Why Abstract The parse Syntax-Tree tree contains more information than The is absolutely parse tree contains necessary more for information a compiler than is absolutely necessary for a compiler For the example: 3*4 For the example: 3*4 Why Abstract Syntax-Tree op The principle of * syntax-directed translation (3) (4) The meaning, or semantics, of the string 3+4 should be directly related to its syntactic structure as represented by the parse tree. The principle of syntax-directed translation In this The case, meaning, the parse or semantics, tree should of the imply string 3+4 that should be directly related to its the value syntactic 3 and structure the value as represented 4 are to be by added. the parse tree. In A this much case, simpler the parse way tree to should represent imply that this the same value 3 and the value 4 are to be added. A information, much simpler way namely, to represent as the this treesame information, namely, as the tree + Tree for ression (34-3)*42 3 4 Tree for ression (34-3)*42 The ression (34-3)*42 whose parse tree can be The ression (34-3)*42 represented whose more parse simply tree can by the be represented tree: more simply by the tree: * - 42 34 3 The parentheses The tokens parentheses have actually tokens disappeared have actually disappeared still represents still precisely represents the precisely semantic the content semantic of content subtracting of 3 from 34, and then multiplying subtracting by 42. 3 from 34, and then multiplying by 42. 2. Abstract Syntax Trees or Syntax Trees Syntax trees represent abstractions of the actual source code token sequences, The token sequences cannot be recovered from them (unlike parse trees). Nevertheless they contain all the information needed for translation, in a more efficient form than parse trees. Example 3.8: The grammar for simplified if-statements statement if-stmt other if-stmt if ( ) statement if ( ) statement else statement 0 1 The parse tree for the string: if ( ) statement 3 else statement 0 other other

Using the grammar of Example 3.6 statement if-stmt other if-stmt if ( ) statement else-part else-part else statement 0 1 This same string has the following parse tree: if (0) other else other statement if-stmt if ( ) statement else-part 0 other else statement other A syntax tree for the previous string (using either the grammar of Example 3.4 or 3.6) would be: if (0) other else other if 0 other other Example 3.9: The grammar of a sequence of statements separated by semicolons from Example 3.7: stmt-sequence stmt ; stmt-sequence stmt stmt s The string s; s; s has the following parse tree with respect to this grammar: stmt-sequence stmt ; stmt-sequence s stmt ; stmt-sequence s A possible syntax tree for this same string is: ; s ; stmt s s s Bind all the statement nodes in a sequence together with just one node, so that the previous syntax tree would become seq s s s 4

3.4 Ambiguity 1. What is Ambiguity Parse trees and syntax trees uniquely ress the structure of syntax But it is possible for a grammar to permit a string to have more than one parse tree For example, the simple integer arithmetic grammar: op ( ) op + - * What is Ambiguity The string: 34-3*42 This string has two different parse trees. op op * _ op op - What is Ambiguity * Corresponding Corresponding to the two to leftmost the two derivations leftmost derivations => op => op op, => op op => - op => - op => - * => - * What is Ambiguity The associated syntax trees are * _ 42 => op => op => - => - op => - op => - * => - * 34 3 AND _ 34 * 3 42 2. An Ambiguous Grammar A grammar that generates a string with two distinct parse trees Such a grammar represents a serious problem for a parser Not specify precisely the syntactic structure of a program In some sense, an ambiguous grammar is like a non-deterministic automaton Two separate paths can accept the same string Ambiguity in grammars cannot be removed nearly as easily as non-determinism in finite automata No algorithm for doing so, unlike the situation in the case of automata 5

Ambiguous grammars always fail the tests that we introduce later for the standard parsing algorithms A body of standard techniques have been developed to deal with typical ambiguities that come up in programming languages. 3. Two Basic Methods dealing with Ambiguity One is to state a rule that specifies in each ambiguous case which of the parse trees (or syntax trees) is the correct one, called a disambiguating rule. The advantage: it corrects the ambiguity without changing (and possibly complicating) the grammar. The disadvantage: the syntactic structure of the language is no longer given by the grammar alone. The alternative is to Change the grammar into a form that forces the construction of the correct parse tree, thus removing the ambiguity. Of course, in either method we must first decide which of the trees in an ambiguous case is the correct one. 4. Remove The Ambiguity in Simple Expression Grammar Simply state a disambiguating rule that establishes the relative precedence of the three operations represented. The standard solution is to give addition and subtraction the same precedence, and to give multiplication a higher precedence. A further disambiguating rule is the associativity of each of the operations of addition, subtraction, and multiplication. Specify that all three of these operations are left associative Specify that an operation is nonassociative A sequence of more than one operator in an ression is not allowed. For instance, writing simple ression grammar in the following form: fully parenthesized ressions factor op factor factor factor ( ) op + - * Strings such as 34-3-42 and even 34-3*42 are now illegal, and must instead be written with parentheses such as (34-3) -42 and 34- (3*42). Not only changed the grammar, also changed the language being recognized. 3.4.2 Precedence and Associativity 1. Group of Equal Precedence The precedence can be added to our simple ression grammar as follows: addop term addop + - term term mulop term factor mulop * factor ( ) Addition and subtraction will appear "higher" (that is, closer to the root) in the parse and syntax trees Receive lower precedence. 2. Precedence Cascade Grouping operators into different precedence levels. Cascade is a standard method in syntactic specification using BNF. Replacing the rule addop term 6

by addop term term or term addop term A left recursive rule makes operators associate on the left A right recursive rule makes them associate on the right 3. Removal of Ambiguity Removal of ambiguity in the BNF rules for simple arithmetic ressions write the rules to make all the operations left associative addop term term addop + - term term mulop factor factor mulop * factor ( ) New Parse Tree New Parse Tree The parse tree for the ression 34-3*42 is addop term term _ term mulop factor factor factor * New Parse Tree The parse tree for the ression 34-3-42 addop term addop term _ factor term _ factor factor The precedence cascades cause the parse trees to become much more complex The syntax trees, however, are not affected The precedence cascades cause the parse trees to become much more complex The syntax trees, however, are not affected 3.4.3 The dangling else problem 1. An Ambiguity Grammar Consider the grammar from: statement if-stmt other if-stmt if ( ) statement if ( ) statement else statement 0 1 This grammar is ambiguous as a result of the optional else. Consider the string if (0) if (1) other else other 7

This string has two parse trees: statement if-stmt if ( ) statement else statement 0 if-stmt other if ( ) statement 1 other 2. Dangling else problem Which tree is correct depends on associating the single else-part with the first or the second if-statement. The first associates the else-part with the first if-statement; The second associates it with the second if-statement. This ambiguity called dangling else problem This disambiguating rule is the most closely nested rule implies that the second parse tree above is the correct one. An Example For example: if (x!= 0) if (y = = 1/x) ok = TRUE; else z = 1/x; Note that, if we wanted we could associate the else-part with the first if-statement by using brackets {...} in C, as in if (x!= 0) { if (y = = 1/x) ok = TRUE; } else z = 1/x; 3. A Solution to the dangling else ambiguity in the BNF statement matched-stmt unmatched-stmt matched-stmt if ( ) matched-stmt else matched-stmt other unmatched-stmt if ( ) statement if ( ) matched-stmt else unmatched-stmt 0 1 Permitting only a matched-stmt to come before an else in an if-statement, thus forcing all else-parts to be matched as soon as possible. 8

The associated parse tree for our sample string now becomes statement unmatched-stmt if ( ) statement 0 matched-stmt if ( ) matched-stmt else matched-stmt 1 other other Which indeed associates the else part with the second if-statement 3.5 Extended Notations: EBNF and Syntax Diagrams 3.5.1 EBNF Notation 1. Special Notations for Repetitive Constructs Repetition A A (left recursive), and A A (right recursive) where and are arbitrary strings of terminals and non-terminals, and In the first rule does not begin with A and In the second does not end with A Notation for repetition as regular ressions use, the asterisk *. A *, and A * EBNF opts to use curly brackets {...} to ress repetition A { }, and A { } The problem with any repetition notation is that it obscures how the parse tree is to be constructed, but, as we have seen, we often do not care. Example: The case of statement sequences The grammar as follows, in right recursive form: stmt-sequence stmt ; stmt-sequence stmt stmt s In EBNF this would appear as stmt-sequence { stmt ; } stmt (right recursive form) stmt-sequence stmt { ; stmt} (left recursive form) A more significant problem occurs when the associativity matters addop term term term { addop term } (imply left associativity) {term addop } term (imply right associativity) 9

2. Special Notations for Optional Constructs Optional construct are indicated by surrounding them with square brackets [...]. The grammar rules for if-statements with optional else-parts would be written as follows in EBNF: statement if-stmt other if-stmt if ( ) statement [ else statement ] 0 1 stmt-sequence stmt; stmt-sequence stmt is written as stmt-sequence stmt [ ; stmt-sequence ] 3.5.2 Syntax Diagrams Syntax Diagrams Syntax Diagrams: Graphical representations for visually representing EBNF rules. An example: consider the grammar rule factor ( ) The syntax diagram: factor ( ) Boxes representing terminals and non-terminals. Arrowed lines representing sequencing and choices. Non-terminal labels for each diagram representing the grammar rule defining that Non-terminal. A round or oval box is used to indicate terminals in a diagram. A square or rectangular box is used to indicate non-terminals. A repetition : A {B} A B An optional : A [B] A B Example: Consider the example of simple arithmetic ressions. addop term term 10

addop + - term term mulop factor factor mulop * factor ( ) This BNF includes associativity and precedence The corresponding EBNF is term { addop term } addop + - term factor { mulop factor } mulop * factor ( ) r, The corresponding syntax diagrams are given as follows: ex p term addop term addo p + - term factor factor mulop mulop * ( ) factor Example: Consider the grammar of simplified if-statements, the BNF Statement if-stmt other if-stmt if ( ) statement if ( ) statement else statement 0 1 and the EBNF statement if-stmt other if-stmt if ( ) statement [ else statement ] 0 1 The corresponding syntax diagrams are given in following figure. 11

if-stmt statement if-stmt if ( ex ) stateme p nt else stateme nt 0 1 12