MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011


Exercise: Define a context-free grammar that represents (a simplification of) expressions in a typical programming language, such that an expression may contain + (addition) and * (multiplication) as operators and identifiers as operands. The identifiers can be formed from the letters a and b and the digits 0 and 1 only. Every identifier must begin with a or b, which may be followed by any string in $\{a, b, 0, 1\}^*$.

Solution: We need two variables in this grammar. To represent expressions we use the variable $E$. It is the start symbol and represents the language of expressions we are defining. The other variable, $I$, represents identifiers. Its language is actually regular: it is the language of the regular expression $(a + b)(a + b + 0 + 1)^*$. The rules of the grammar are as follows:

Table 1: Rules of the context-free grammar.
1. $E \to I$
2. $E \to E + E$
3. $E \to E * E$
4. $E \to (E)$
5. $I \to a$
6. $I \to b$
7. $I \to Ia$
8. $I \to Ib$
9. $I \to I0$
10. $I \to I1$

The grammar for expressions is stated formally as $G = (\{E, I\}, T, R, E)$, where $T$ is the set of symbols $\{+, *, (, ), a, b, 0, 1\}$, $R$ is the set of rules shown in Table 1, and $E$ is the start symbol.

We interpret the rules as follows. Rule (1) is the basis for expressions: it says that an expression can be a single identifier. Rules (2)-(4) describe the inductive case for expressions. Rule (2) (resp. Rule (3)) says that an expression can be two expressions connected by a plus (resp. multiplication) sign. Rule (4) says that if we take any expression and put matching parentheses around it, the result is also an expression. Rules (5)-(10) describe identifiers. Rules (5) and (6) say that a and b are identifiers. The remaining four rules are the inductive case: they say that if we have any identifier, we can follow it by a, b, 0, or 1, and the result will be another identifier.
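To make Table 1 concrete, here is a small Python sketch (an illustration, not part of the original notes). It assumes every grammar symbol is a single character, so the rules can be stored as plain strings in a dictionary; the names `RULES`, `identifiers_up_to` and `regex_matches_up_to` are hypothetical. It builds the identifiers that $I$ derives, following rules (5)-(10), up to a length bound and checks, for that bound only, that they coincide with the strings matched by $(a + b)(a + b + 0 + 1)^*$.

```python
import itertools
import re

# Table 1, writing every symbol as one character so that rule bodies and
# sentential forms can be plain strings. Keys are the variables V = {E, I}.
RULES = {
    "E": ["I", "E+E", "E*E", "(E)"],                 # rules (1)-(4)
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],         # rules (5)-(10)
}
START = "E"
TERMINALS = {"+", "*", "(", ")", "a", "b", "0", "1"}

def identifiers_up_to(n):
    """All identifiers of length 1..n (n >= 1) that I derives."""
    basis = {body for body in RULES["I"] if "I" not in body}              # rules (5), (6)
    suffixes = [body[1:] for body in RULES["I"] if body.startswith("I")]  # rules (7)-(10)
    found, frontier = set(), basis
    while frontier:
        found |= frontier
        # inductive case: follow an existing identifier by a, b, 0 or 1
        frontier = {w + c for w in frontier for c in suffixes if len(w) < n}
    return found

# The regular expression (a + b)(a + b + 0 + 1)* in Python's regex syntax.
ID_REGEX = re.compile(r"[ab][ab01]*")

def regex_matches_up_to(n):
    """All strings over {a, b, 0, 1} of length 1..n matched by ID_REGEX."""
    return {
        w
        for k in range(1, n + 1)
        for w in map("".join, itertools.product("ab01", repeat=k))
        if ID_REGEX.fullmatch(w)
    }

print(identifiers_up_to(4) == regex_matches_up_to(4))   # True
```

A finite comparison like this is only a sanity check; the equality of the two languages for all lengths follows from the inductive reading of rules (5)-(10).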

Derivations Using a Grammar (What is the language L(G) defined by a CFG G?)

The process of deriving strings of the language $L(G)$ from a CFG $G$ by applying rules requires the definition of a new relation symbol, $\Rightarrow$. Suppose $G = (V, T, R, S)$ is a CFG. Let $\alpha A \beta$ be a string of variables and terminals, with $A$ a variable; that is, $\alpha$ and $\beta$ are strings in $(V \cup T)^*$ and $A$ is in $V$. Let $A \to \gamma$ be a rule of $G$. Then we say $\alpha A \beta \Rightarrow_G \alpha \gamma \beta$. If $G$ is understood, we just say $\alpha A \beta \Rightarrow \alpha \gamma \beta$. Note that one derivation step replaces any variable anywhere in the string by the body of one of its rules.

We may extend the relation $\Rightarrow$ to represent zero, one, or many derivation steps, much as the transition function $\delta$ of a finite automaton was extended to $\hat{\delta}$. For derivations, we use $\Rightarrow^*$ to denote zero or more steps, as follows:
Basis: For any string $\alpha$ of terminals and variables, $\alpha \Rightarrow^* \alpha$. That is, any string derives itself.
Induction: If $\alpha \Rightarrow^* \beta$ and $\beta \Rightarrow \gamma$, then $\alpha \Rightarrow^* \gamma$.

Example: Using the rules of Table 1, we can infer that $(a0+b)*(a+b)$ is in the language of the variable $E$ by exhibiting a derivation starting with $E$:
$$
\begin{aligned}
E &\Rightarrow E*E \Rightarrow (E)*E \Rightarrow (E+E)*E \Rightarrow (I+E)*E \Rightarrow (I0+E)*E \Rightarrow (a0+E)*E \Rightarrow (a0+I)*E \\
  &\Rightarrow (a0+b)*E \Rightarrow (a0+b)*(E) \Rightarrow (a0+b)*(E+E) \Rightarrow (a0+b)*(I+E) \\
  &\Rightarrow (a0+b)*(a+E) \Rightarrow (a0+b)*(a+I) \Rightarrow (a0+b)*(a+b)
\end{aligned}
$$

The Language of a Grammar
If $G = (V, T, R, S)$ is a CFG, the language of $G$, denoted $L(G)$, is the set of all terminal strings that have derivations from the start symbol $S$. That is, $L(G) = \{ w \in T^* : S \Rightarrow^* w \}$. If a language $L$ is the language of some context-free grammar, then $L$ is said to be a context-free language (CFL). Two grammars $G_1$ and $G_2$ are said to be equivalent if and only if $L(G_1) = L(G_2)$.

Leftmost and Rightmost Derivations
Leftmost derivation: At each step we replace the leftmost variable by one of its rule bodies. For leftmost derivations we use $\Rightarrow_{lm}$ and $\Rightarrow^*_{lm}$ for one and many steps, respectively. (The derivation in the example above is in fact a leftmost derivation.)
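The definitions of $\Rightarrow$ and $\Rightarrow^*$ can be checked mechanically. The sketch below assumes single-character symbols, and its helper names (`derives_in_one_step`, `is_derivation`, `witnesses_membership`) are hypothetical. It tests whether each consecutive pair of sentential forms is one derivation step, which mirrors the basis/induction definition of $\Rightarrow^*$, and then checks the derivation of $(a0+b)*(a+b)$ from the example.

```python
# Table 1, with each symbol written as one character so that sentential
# forms are plain strings. Keys are the variables; values are rule bodies.
RULES = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}
START = "E"

def derives_in_one_step(alpha, beta):
    """alpha => beta: some variable A of alpha, at any position, is replaced
    by the body gamma of a rule A -> gamma."""
    for i, symbol in enumerate(alpha):
        if symbol in RULES:
            for gamma in RULES[symbol]:
                if alpha[:i] + gamma + alpha[i + 1:] == beta:
                    return True
    return False

def is_derivation(forms):
    """forms[0] =>* forms[-1] via the listed steps. A one-element list is the
    basis case (every string derives itself); each extra step is the induction."""
    return all(derives_in_one_step(a, b) for a, b in zip(forms, forms[1:]))

def witnesses_membership(forms):
    """True if forms is a derivation S =>* w ending in a terminal string w, so w is in L(G)."""
    return (forms[0] == START
            and all(symbol not in RULES for symbol in forms[-1])
            and is_derivation(forms))

DERIVATION = [
    "E", "E*E", "(E)*E", "(E+E)*E", "(I+E)*E", "(I0+E)*E", "(a0+E)*E",
    "(a0+I)*E", "(a0+b)*E", "(a0+b)*(E)", "(a0+b)*(E+E)", "(a0+b)*(I+E)",
    "(a0+b)*(a+E)", "(a0+b)*(a+I)", "(a0+b)*(a+b)",
]
print(witnesses_membership(DERIVATION))   # True
print(is_derivation(["E", "(E)*E"]))      # False: that needs two steps
```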
Rightmost derivation: At each step we replace the rightmost variable by one of its rule bodies. For rightmost derivations we use $\Rightarrow_{rm}$ and $\Rightarrow^*_{rm}$ for one and many steps, respectively.
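Whether a derivation is leftmost or rightmost depends only on which variable each step rewrites. A minimal sketch, assuming single-character symbols and assuming each step has already been verified as a valid derivation step; `is_leftmost_step`, `is_rightmost_step` and `classify` are hypothetical names. The sample derivations use the grammar of Table 1: two derivations of $a * b$ and a derivation of the sentential form $E*(I+E)$.

```python
# Table 1 with single-character symbols; "E" and "I" are the variables.
RULES = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}

def _replaces_at(alpha, beta, i):
    """True if rewriting the variable at position i of alpha with one of its bodies gives beta."""
    return any(alpha[:i] + body + alpha[i + 1:] == beta for body in RULES[alpha[i]])

def is_leftmost_step(alpha, beta):
    positions = [i for i, s in enumerate(alpha) if s in RULES]
    return bool(positions) and _replaces_at(alpha, beta, positions[0])

def is_rightmost_step(alpha, beta):
    positions = [i for i, s in enumerate(alpha) if s in RULES]
    return bool(positions) and _replaces_at(alpha, beta, positions[-1])

def classify(forms):
    """Return (is_leftmost, is_rightmost) for the derivation forms[0] => ... => forms[-1]."""
    steps = list(zip(forms, forms[1:]))
    return (all(is_leftmost_step(a, b) for a, b in steps),
            all(is_rightmost_step(a, b) for a, b in steps))

LM = ["E", "E*E", "I*E", "a*E", "a*I", "a*b"]
RM = ["E", "E*E", "E*I", "E*b", "I*b", "a*b"]
MIXED = ["E", "E*E", "E*(E)", "E*(E+E)", "E*(I+E)"]

print(classify(LM))     # (True, False)
print(classify(RM))     # (False, True)
print(classify(MIXED))  # (False, False)
```

A derivation in which every sentential form contains at most one variable counts as both leftmost and rightmost, which is why `classify` returns two independent flags rather than a single label.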

Sentential Forms
Derivations from the start symbol produce strings that have a special role. We call these sentential forms. That is, if $G = (V, T, R, S)$ is a CFG, then any string $\alpha$ in $(V \cup T)^*$ such that $S \Rightarrow^* \alpha$ is a sentential form. If $S \Rightarrow^*_{lm} \alpha$, then $\alpha$ is a left-sentential form, and if $S \Rightarrow^*_{rm} \alpha$, then $\alpha$ is a right-sentential form. Note that the language $L(G)$ consists of those sentential forms that are in $T^*$ (i.e., they consist solely of terminals).

Example: Consider the grammar for expressions from Table 1. The string $E*(I+E)$ is a sentential form, since there is a derivation
$$E \Rightarrow E*E \Rightarrow E*(E) \Rightarrow E*(E+E) \Rightarrow E*(I+E)$$
However, this derivation is neither leftmost nor rightmost, since at the last step the middle $E$ is replaced.

Parse Trees
Let $G = (V, T, R, S)$ be a context-free grammar. The parse trees for $G$ are trees with the following conditions:
1. Each interior node is labeled by a variable in $V$.
2. Each leaf is labeled by either (i) a variable, (ii) a terminal, or (iii) $\epsilon$. However, if a leaf is labeled by $\epsilon$, then it must be the only child of its parent.
3. If an interior node is labeled by $A$, and its children are labeled $X_1, X_2, \ldots, X_k$, respectively, from the left, then $A \to X_1 X_2 \cdots X_k$ is a rule in $R$.

Example:
[Figure 1: (i) A parse tree showing the derivation of $E*(E)$ from $E$; (ii) a parse tree showing that $(a0+b)*(a+b)$ is in the language of the CFG in Table 1.]

Definition: The yield of a parse tree is the concatenation of its leaves, read from left to right.
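Conditions 1-3 and the yield can be expressed directly over a tree data structure. In this sketch the `Node` class and the helpers `node`, `is_parse_tree` and `tree_yield` are hypothetical, and $\epsilon$ is written as the string `"ε"`. `is_parse_tree` checks the three conditions recursively and `tree_yield` reads the leaves from left to right; the two trees built at the end correspond to the two panels of Figure 1.

```python
from dataclasses import dataclass, field

# Table 1 with single-character symbols; "E" and "I" are the variables.
RULES = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}
TERMINALS = set("+*()ab01")
EPS = "ε"   # label used for an epsilon leaf

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def node(label, *children):
    return Node(label, list(children))

def is_parse_tree(t):
    """Check conditions 1-3 at every node of the tree rooted at t."""
    if not t.children:   # condition 2: a leaf is a variable, a terminal, or ε
        return t.label in RULES or t.label in TERMINALS or t.label == EPS
    if t.label not in RULES:
        return False     # condition 1: interior nodes are labeled by variables
    labels = [c.label for c in t.children]
    if EPS in labels and len(labels) > 1:
        return False     # condition 2: an ε leaf must be an only child
    body = "" if labels == [EPS] else "".join(labels)
    # condition 3: A -> X1 X2 ... Xk must be a rule, and every subtree must be valid
    return body in RULES[t.label] and all(is_parse_tree(c) for c in t.children)

def tree_yield(t):
    """Concatenate the leaf labels from left to right (an ε leaf contributes nothing)."""
    if not t.children:
        return "" if t.label == EPS else t.label
    return "".join(tree_yield(c) for c in t.children)

# Figure 1 (i): E => E*E => E*(E)
fig_i = node("E", node("E"), node("*"), node("E", node("("), node("E"), node(")")))

# Figure 1 (ii): the tree for (a0+b)*(a+b)
left = node("E", node("("),
            node("E", node("E", node("I", node("I", node("a")), node("0"))),
                      node("+"),
                      node("E", node("I", node("b")))),
            node(")"))
right = node("E", node("("),
             node("E", node("E", node("I", node("a"))),
                       node("+"),
                       node("E", node("I", node("b")))),
             node(")"))
fig_ii = node("E", left, node("*"), right)

print(tree_yield(fig_i), is_parse_tree(fig_i))    # E*(E) True
print(tree_yield(fig_ii), is_parse_tree(fig_ii))  # (a0+b)*(a+b) True
```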

A yield is always a string derived from the root variable. If the root is the start symbol $S$, the yield is a sentential form; if, in addition, every leaf is labeled by a terminal or $\epsilon$, the yield is a string in $L(G)$.

Ambiguity in Grammars and Languages
Consider the sentential form $E + E * E$ for the grammar defined in Table 1. It has two derivations from $E$:
1. $E \Rightarrow E + E \Rightarrow E + E * E$
2. $E \Rightarrow E * E \Rightarrow E + E * E$
Notice that in derivation (1), the second $E$ is replaced by $E * E$, while in derivation (2), the first $E$ is replaced by $E + E$. Figure 2 shows the two parse trees, which are distinct trees.

[Figure 2: Two parse trees with the same yield $E + E * E$: (a) root rule $E \to E + E$, (b) root rule $E \to E * E$.]

The difference between these two derivations is significant. As far as the structure of the expressions is concerned, derivation (1) says that the second and third expressions are multiplied, and the result is added to the first expression, while derivation (2) adds the first two expressions and multiplies the result by the third. In more concrete terms, the first derivation suggests that 1+2*3 should be grouped 1+(2*3) = 7, while the second derivation suggests the same expression should be grouped (1+2)*3 = 9. Obviously, the first of these (not the second) matches our notion of correct grouping of arithmetic expressions.

Since the grammar of Table 1 gives two different structures to any string of terminals that is derived by replacing the three expressions in $E + E * E$ by identifiers, we see that this grammar is not a good one for providing unique structure. In particular, while it can give strings the correct grouping as arithmetic expressions, it also gives them incorrect groupings.

On the other hand, the mere existence of different derivations for a string (as opposed to different parse trees) does not imply a defect in the grammar. For example, the string $a * b$ has many derivations. Two examples are:
1. $E \Rightarrow E * E \Rightarrow I * E \Rightarrow a * E \Rightarrow a * I \Rightarrow a * b$
2. $E \Rightarrow E * E \Rightarrow E * I \Rightarrow E * b \Rightarrow I * b \Rightarrow a * b$
Both derivations correspond to the same parse tree; they differ only in the order in which the variables are replaced.
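The ambiguity of $E + E * E$ can be observed by enumerating every parse tree for a concrete string. This is an illustrative sketch under several assumptions: trees are nested tuples, each identifier subtree is collapsed into a single `("I", name)` leaf (safe here because rules (7)-(10) always append the last character, so every identifier has exactly one $I$-subtree), and the function names and the values bound to identifiers are hypothetical. Binding `a`, `b` and `ab` to 1, 2 and 3 turns `a+b*ab` into the `1+2*3` example, and the two trees evaluate to the two groupings.

```python
import re
from functools import lru_cache

# The language of I, i.e. (a + b)(a + b + 0 + 1)*, as a Python regex.
ID = re.compile(r"[ab][ab01]*")

def parse_trees(w):
    """All parse trees with root E and yield w for the grammar of Table 1.
    A tree is a nested tuple: ("I", name), ("+", l, r), ("*", l, r) or ("()", inner)."""
    @lru_cache(maxsize=None)
    def trees(i, j):                        # trees for E with yield w[i:j]
        found = []
        if ID.fullmatch(w[i:j]):            # rule (1): E -> I
            found.append(("I", w[i:j]))
        for k in range(i + 1, j - 1):       # rules (2), (3): E -> E + E | E * E
            if w[k] in "+*":
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        found.append((w[k], left, right))
        if j - i >= 3 and w[i] == "(" and w[j - 1] == ")":   # rule (4): E -> (E)
            found.extend(("()", inner) for inner in trees(i + 1, j - 1))
        return tuple(found)
    return list(trees(0, len(w)))

def evaluate(tree, env):
    """Value of an expression tree, with identifiers looked up in env."""
    kind = tree[0]
    if kind == "I":
        return env[tree[1]]
    if kind == "()":
        return evaluate(tree[1], env)
    left, right = evaluate(tree[1], env), evaluate(tree[2], env)
    return left + right if kind == "+" else left * right

ENV = {"a": 1, "b": 2, "ab": 3}             # hypothetical values for the identifiers
all_trees = parse_trees("a+b*ab")
print(len(all_trees), sorted(evaluate(t, ENV) for t in all_trees))   # 2 [7, 9]
print(len(parse_trees("a*b")))                                      # 1
```

The single tree for `a*b` matches the observation that the many derivations of $a * b$ all share one parse tree.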

Definition: A CFG $G = (V, T, R, S)$ is ambiguous if there is at least one string $w \in T^*$ for which two different parse trees exist, each with root labeled $S$ and yield $w$. If each string has at most one parse tree in the grammar, then the grammar is said to be unambiguous.
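The definition suggests a brute-force experiment: count the parse trees of every terminal string up to some length and report the first string that has two or more. This is only a one-sided test. Ambiguity of context-free grammars is undecidable in general, so a bounded search can exhibit a witness but can never prove a grammar unambiguous. The counting recursion below mirrors the rules of Table 1; the function names and length bounds are hypothetical.

```python
import itertools
import re
from functools import lru_cache

TERMINALS = "+*()ab01"              # T for the grammar of Table 1
ID = re.compile(r"[ab][ab01]*")     # the language of I

def count_parse_trees(w):
    """Number of parse trees with root E and yield w."""
    @lru_cache(maxsize=None)
    def count(i, j):                                  # trees for E over w[i:j]
        total = 1 if ID.fullmatch(w[i:j]) else 0      # E -> I (one I-subtree per identifier)
        for k in range(i + 1, j - 1):                 # E -> E + E and E -> E * E
            if w[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        if j - i >= 3 and w[i] == "(" and w[j - 1] == ")":   # E -> (E)
            total += count(i + 1, j - 1)
        return total
    return count(0, len(w))

def find_ambiguous_string(max_len):
    """First string in T* of length <= max_len with two or more parse trees, else None."""
    for n in range(1, max_len + 1):
        for chars in itertools.product(TERMINALS, repeat=n):
            w = "".join(chars)
            if count_parse_trees(w) >= 2:
                return w
    return None

print(find_ambiguous_string(4))            # None
w = find_ambiguous_string(5)
print(w, count_parse_trees(w))             # a+a+a 2
```

For this grammar no string of length 4 or less is ambiguous; the first witnesses appear at length 5, where a string has the shape identifier, operator, identifier, operator, identifier.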