Context-Free Grammars

Informal Comments
A context-free grammar is a notation for describing languages. It is more powerful than finite automata or REs, but still cannot define all possible languages. It is useful for nested structures, e.g., parentheses in programming languages.

Informal Comments (2)
The basic idea is to use variables to stand for sets of strings (i.e., languages). These variables are defined recursively, in terms of one another. Recursive rules ("productions") involve only concatenation. Alternative rules for a variable allow union.

Context-Free Grammar
G = (V, Σ, P, S)
V: alphabet of non-terminals
Σ: alphabet of terminals
P: set of productions of the form A -> α, where A ∈ V and α ∈ (V ∪ Σ)*
S: an element of V, called the start symbol
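The four-part definition can be sketched as a small data structure. This is only an illustration: the class name `CFG`, its field names, and the choice to store each production as a (head, body) pair are assumptions made here, not part of the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """Hypothetical container for G = (V, Sigma, P, S)."""
    variables: frozenset   # V
    terminals: frozenset   # Sigma
    productions: tuple     # P: pairs (head, body); body is a tuple of symbols
    start: str             # S

    def __post_init__(self):
        # V and Sigma must not share symbols.
        if self.variables & self.terminals:
            raise ValueError("V and Sigma must be disjoint")
        # The start symbol is an element of V.
        if self.start not in self.variables:
            raise ValueError("start symbol must be in V")
        symbols = self.variables | self.terminals
        for head, body in self.productions:
            # Each production is A -> alpha with A in V ...
            if head not in self.variables:
                raise ValueError(f"head {head!r} is not a variable")
            # ... and alpha a string over V union Sigma (possibly empty).
            if any(sym not in symbols for sym in body):
                raise ValueError(f"body of {head!r} uses an unknown symbol")


# The grammar with productions S -> 01 and S -> 0S1.
g = CFG(
    variables=frozenset({"S"}),
    terminals=frozenset({"0", "1"}),
    productions=(("S", ("0", "1")), ("S", ("0", "S", "1"))),
    start="S",
)
```

Constructing a grammar whose production mentions a symbol outside V ∪ Σ raises `ValueError`, which mirrors the side conditions in the definition.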

Example: CFG for {0^n 1^n | n >= 1}
Productions:
S -> 01
S -> 0S1
Basis: 01 is in the language.
Induction: if w is in the language, then so is 0w1.
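The basis and induction rules translate directly into a loop. The sketch below is illustrative only; the function name `zeros_ones` is made up here.

```python
def zeros_ones(max_n):
    """Yield the strings 0^n 1^n for n = 1 .. max_n."""
    w = "01"                 # basis: S -> 01
    for _ in range(max_n):
        yield w
        w = "0" + w + "1"    # induction: S -> 0S1 wraps the previous string


words = list(zeros_ones(3))
```

Each pass through the loop corresponds to one more use of S -> 0S1 before the derivation finishes with S -> 01.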

CFG Formalism
Terminals = symbols of the alphabet of the language being defined.
Variables = nonterminals = a finite set of other symbols, each of which represents a language.
Start symbol = the variable whose language is the one being defined.

Productions
A production has the form variable -> string of variables and terminals.
Convention:
A, B, C, ... are variables.
a, b, c, ... are terminals.
..., X, Y, Z are either terminals or variables.
..., w, x, y, z are strings of terminals only.
α, β, γ, ... are strings of terminals and/or variables.

Example: Formal CFG
Here is a formal CFG for {0^n 1^n | n >= 1}.
Terminals = {0, 1}.
Variables = {S}.
Start symbol = S.
Productions =
S -> 01
S -> 0S1

Derivations: Intuition
We derive strings in the language of a CFG by starting with the start symbol and repeatedly replacing some variable A by the right side of one of its productions. The productions for A are those that have A on the left side of the ->.

Derivations: Formalism
We say αAβ => αγβ if A -> γ is a production. => is a binary relation over (V ∪ Σ)*.
Example: S -> 01; S -> 0S1.
S => 0S1 => 00S11 => 000111.
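A single derivation step can be sketched as a function that tries every production at every position. The encoding is an assumption made for illustration: each symbol is one character, and a production is a (head, body) pair of strings.

```python
def one_step(form, productions):
    """Return every string reachable from `form` in exactly one derivation step."""
    results = set()
    for i, symbol in enumerate(form):
        for head, body in productions:
            if symbol == head:
                # Replace this one occurrence of the variable by the body.
                results.add(form[:i] + body + form[i + 1:])
    return results


P = [("S", "01"), ("S", "0S1")]
after_s = one_step("S", P)
after_0s1 = one_step("0S1", P)
```

A string with no variables, such as 0011, has no successors, so `one_step` returns the empty set for it.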

Iterated Derivation
=>* means zero or more derivation steps. It is a binary relation over (V ∪ Σ)*, which relates two strings.
Basis: α =>* α for any string α.
Induction: if α =>* β and β => γ, then α =>* γ.

Example: Iterated Derivation
S -> 01; S -> 0S1.
S => 0S1 => 00S11 => 000111.
So S =>* S; S =>* 0S1; S =>* 00S11; S =>* 000111.
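Whether one string derives another in zero or more steps can be checked by a breadth-first search over sentential forms. This is a minimal sketch under two assumptions that hold for the example grammar: symbols are single characters, and no production has an empty body, so strings never shrink and anything longer than the target can be discarded. The function name `derives` is made up here.

```python
from collections import deque


def derives(start, target, productions):
    """Decide start =>* target, assuming no production has an empty body."""
    limit = len(target)          # longer forms can never shrink back to target
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:       # zero steps counts: every string derives itself
            return True
        for i, symbol in enumerate(form):
            for head, body in productions:
                if symbol != head:
                    continue
                nxt = form[:i] + body + form[i + 1:]
                if len(nxt) <= limit and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


P = [("S", "01"), ("S", "0S1")]
checks = [derives("S", t, P) for t in ("S", "0S1", "00S11", "000111")]
```

The four targets are the ones listed in the example, and a string such as 0101 is rejected because no sequence of steps reaches it.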

Another Example
Terminals = {a, b}.
Variables = {S}.
Start symbol = S.
Productions =
S -> aSa
S -> bSb
S -> ε

Sentential Forms
Any string of variables and/or terminals derived from the start symbol is called a sentential form. Formally, α is a sentential form iff S =>* α.

Language of a Grammar
If G is a CFG, then L(G), the language of G, is {w | S =>* w}.
Note: w must be a terminal string, and S is the start symbol.
Example: G has productions S -> ε and S -> 0S1. L(G) = {0^n 1^n | n >= 0}.
Note: ε is a legitimate right side.
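The definition of L(G) suggests a brute-force enumeration: explore sentential forms from the start symbol and keep the ones made of terminals only. The sketch below makes several assumptions for illustration: symbols are single characters, variables are upper-case letters, ε is the empty string, and every production either adds a terminal or erases a variable (true of the grammars used here), which keeps the search finite. The name `language` is hypothetical.

```python
def language(productions, start, max_len):
    """Return all terminal strings w with start =>* w and len(w) <= max_len."""
    def terminal_count(form):
        return sum(1 for c in form if not c.isupper())

    seen, stack, words = {start}, [start], set()
    while stack:
        form = stack.pop()
        if not any(c.isupper() for c in form):
            words.add(form)              # a sentential form with no variables
            continue
        for i, symbol in enumerate(form):
            for head, body in productions:
                if symbol == head:
                    nxt = form[:i] + body + form[i + 1:]
                    # Terminals never disappear, so prune on their count.
                    if terminal_count(nxt) <= max_len and nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return words


G = [("S", ""), ("S", "0S1")]
short_words = language(G, "S", 4)

# The earlier {a, b} example, read here as S -> aSa | bSb | ε (an assumption).
PAL = [("S", "aSa"), ("S", "bSb"), ("S", "")]
short_pals = language(PAL, "S", 2)
```

With the ε-production the empty string is included, which is the n = 0 case of the example.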

Context-Free Grammar
A grammar G = (V, Σ, P, S) is a context-free grammar if each element of P is of the form X -> α, where X ∈ V and α ∈ (V ∪ Σ)*.

Context-Free Languages
A language L is a context-free language (CFL) if there is a context-free grammar G such that L = L(G), where L(G) is the language generated by the grammar G.
There are CFLs that are not regular languages, such as the example just given. But not all languages are CFLs.
Intuitively: CFLs can count two things, not three.

Another Example
S -> aB
S -> bA
A -> aS
A -> bAA
A -> a
B -> bS
B -> aBB
B -> b
G = (V, Σ, P, S) with V = {S, A, B} and Σ = {a, b}.

Derivation Example
S => aB => aaBB => aaBbS => aaBbbA => aaBbba
Questions: Which nonterminal in the current string should be replaced? And which rule should be used?

Leftmost and Rightmost Derivations
Derivations allow us to replace any of the variables in a string. This leads to many different derivations of the same string. By forcing the leftmost variable (or alternatively, the rightmost variable) to be replaced, we avoid these distinctions without a difference.

Derivation Example
S => aB => aaBB => aabSB => aabaBB => aababB => aababb

Derivation Example
Any string x ∈ Σ* such that S =>* x will always have a leftmost derivation.
Every nonterminal is rewritten independently of its context: a production A -> x places no constraint on the context in which A occurs. Hence the name "context-free".

Leftmost Derivations
Say wAα =>lm wβα if w is a string of terminals only and A -> β is a production.
Also, α =>*lm β if α becomes β by a sequence of 0 or more =>lm steps.

Example: Leftmost Derivations
Balanced-parentheses grammar: S -> SS | (S) | ()
S =>lm SS =>lm (S)S =>lm (())S =>lm (())()
Thus, S =>*lm (())().
S => SS => S() => (S)() => (())() is a derivation, but not a leftmost derivation.

Rightmost Derivations
Say αAw =>rm αβw if w is a string of terminals only and A -> β is a production.
Also, α =>*rm β if α becomes β by a sequence of 0 or more =>rm steps.

Example: Rightmost Derivations
Balanced-parentheses grammar: S -> SS | (S) | ()
S =>rm SS =>rm S() =>rm (S)() =>rm (())()
Thus, S =>*rm (())().
S => SS => SSS => S()S => ()()S => ()()() is neither a rightmost nor a leftmost derivation.
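The leftmost and rightmost conditions can be checked mechanically for a written-out derivation. The sketch below is illustrative: the function name `follows` and the single-character, upper-case-variable encoding are assumptions. It only compares strings, so a step that is consistent with both disciplines (such as SS => SSS) is accepted by both.

```python
def follows(derivation, productions, side):
    """Check that every step can be explained by rewriting the leftmost
    (side="left") or rightmost (side="right") variable."""
    pick = min if side == "left" else max
    for cur, nxt in zip(derivation, derivation[1:]):
        positions = [i for i, c in enumerate(cur) if c.isupper()]
        if not positions:
            return False                      # nothing left to rewrite
        i = pick(positions)
        candidates = [cur[:i] + body + cur[i + 1:]
                      for head, body in productions if head == cur[i]]
        if nxt not in candidates:
            return False
    return True


P = [("S", "SS"), ("S", "(S)"), ("S", "()")]
lm = ["S", "SS", "(S)S", "(())S", "(())()"]
rm = ["S", "SS", "S()", "(S)()", "(())()"]
mixed = ["S", "SS", "SSS", "S()S", "()()S", "()()()"]

report = {
    "lm": (follows(lm, P, "left"), follows(lm, P, "right")),
    "rm": (follows(rm, P, "left"), follows(rm, P, "right")),
    "mixed": (follows(mixed, P, "left"), follows(mixed, P, "right")),
}
```

The third derivation fails both checks at the step SSS => S()S, where the middle variable is rewritten.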

Examining a grammar
G1 = ({S}, {0, 1}, {S -> 01, S -> 0S1}, S)
Claim: G1 generates precisely all strings of the form 0^n 1^n, n >= 1.
To show: L(G1) ⊆ {0^n 1^n | n >= 1} and {0^n 1^n | n >= 1} ⊆ L(G1).

Examining a grammar
Basis: n = 1, S => 01.
IH: G1 generates all strings of the form 0^n 1^n.
To show: 0^(n+1) 1^(n+1) can be generated.
We can prove that S =>* 0^n S 1^n always holds: assume S =>* 0^n S 1^n and show S =>* 0^(n+1) S 1^(n+1).
Other side: this grammar generates strings only of this form.
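The induction step can be written out as a short derivation. This is a sketch in LaTeX notation, where 0^n stands for n repeated zeros; the case n = 0 of the first line is the trivial S =>* S.

```latex
\begin{align*}
S &\Rightarrow^{*} 0^{n} S 1^{n}
  && \text{(induction hypothesis)} \\
  &\Rightarrow 0^{n}\,0S1\,1^{n} = 0^{n+1} S 1^{n+1}
  && \text{(apply } S \to 0S1\text{)} \\[4pt]
0^{n} S 1^{n} &\Rightarrow 0^{n}\,01\,1^{n} = 0^{n+1} 1^{n+1}
  && \text{(apply } S \to 01 \text{ to finish)}
\end{align*}
```

Combining the first and last lines gives S =>* 0^(n+1) 1^(n+1) for every n >= 0, which covers all strings of the claimed form.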

The other grammar G:
S -> aB
S -> bA
A -> aS
A -> bAA
A -> a
B -> bS
B -> aBB
B -> b
G = (V, Σ, P, S) with V = {S, A, B} and Σ = {a, b}.

A word about the grammar
L(G) = {x ∈ {a, b}* | |x| >= 2 and x has an equal number of a's and b's}
If S =>* x ∈ {a, b}*, then x has an equal number of a's and b's, and any such string can be generated from S.
If A =>* x ∈ {a, b}*, then x has one more a than b, and any string with one more a than b can be generated from A.
If B =>* x ∈ {a, b}*, then x has one more b than a, and any string with one more b than a can be generated from B.
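The three claims can be spot-checked by brute force for short strings. The sketch below assumes the productions read S -> aB | bA, A -> aS | bAA | a, B -> bS | aBB | b, and it expands only the leftmost variable, which is enough because every derivable terminal string has a leftmost derivation. Since no production shrinks a string, forms longer than the bound are discarded. The helper names are made up, and a finite check is evidence, not a proof.

```python
from itertools import product

P = [("S", "aB"), ("S", "bA"),
     ("A", "aS"), ("A", "bAA"), ("A", "a"),
     ("B", "bS"), ("B", "aBB"), ("B", "b")]


def generated(start, max_len):
    """Terminal strings of length <= max_len derivable from `start`."""
    seen, stack, words = {start}, [start], set()
    while stack:
        form = stack.pop()
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:
            words.add(form)
            continue
        for head, body in P:
            if head == form[i]:
                nxt = form[:i] + body + form[i + 1:]
                if len(nxt) <= max_len and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return words


def expected(diff, max_len):
    """Nonempty strings over {a, b} with (#a - #b) == diff."""
    return {"".join(t)
            for n in range(1, max_len + 1)
            for t in product("ab", repeat=n)
            if t.count("a") - t.count("b") == diff}


N = 6
s_ok = generated("S", N) == expected(0, N)
a_ok = generated("A", N) == expected(1, N)
b_ok = generated("B", N) == expected(-1, N)
```

For this bound the sets agree for S, A and B, which matches the statement that S yields balanced strings while A and B yield strings with one extra a or b.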

P1: L(G) = {x ∈ {a, b}* | S =>* x}
P2: {x ∈ {a, b}* | |x| >= 2 and the numbers of a's and b's are equal}
Q1: {x ∈ {a, b}* | A =>* x}
Q2: {x ∈ {a, b}* | the number of a's is one more than the number of b's}
R1: {x ∈ {a, b}* | B =>* x}
R2: {x ∈ {a, b}* | the number of b's is one more than the number of a's}
To show: P1 = P2, Q1 = Q2 and R1 = R2.
Induction over the length of the string helps us show: P2 ⊆ P1, Q2 ⊆ Q1, R2 ⊆ R1.
Induction over the length of the derivation helps us show: P1 ⊆ P2, Q1 ⊆ Q2, R1 ⊆ R2.

Proof outline for P2 ⊆ P1
Simultaneous induction (together with the claims for A and B).
Base case: S =>* ab and S =>* ba.
IH: Any string w ∈ {a, b}* with n a's and n b's can be generated by S, i.e. S =>* w.
To show: Any string x with n+1 a's and n+1 b's can be generated by S, i.e. S =>* x.
Examine the structure of x; it starts with either a or b.
Case 1: x = ay. Then y has one more b than a, so B can generate it: B =>* y [IH]. Hence we can use S => aB to generate x.
Case 2: x = bz. Then z has one more a than b, so A can generate it: A =>* z [IH]. Hence we can use S => bA to generate x.
Hence, proved.

Proof outline for Q2 ⊆ Q1
Base case: A =>* a.
IH: Any string w ∈ {a, b}* with n a's and n-1 b's can be generated by A, i.e. A =>* w.
To show: Any string x with n+1 a's and n b's can be generated by A, i.e. A =>* x.
Examine the structure of x; it starts with either a or b.
Case 1: x = ay. Then y has an equal number of a's and b's, so it can be generated by S. To generate x, we can use A => aS, and thus A => aS =>* ay.
Case 2: x = bz. Then z has 2 more a's than b's, so z can be written as z = vw, where each of v and w has one more a than b. Both v and w can be generated by A. Therefore, to generate x, we can use A => bAA =>* bvw = bz.

To show: P1 ⊆ P2, Q1 ⊆ Q2, R1 ⊆ R2
1. Every string over {a, b} derived from S has an equal number of a's and b's.
2. Every string over {a, b} derived from A has one more a than b.
3. Every string over {a, b} derived from B has one more b than a.
Induction over the length of the derivation S => α1 => α2 => ... => αn ∈ {a, b}*.
Base case: we do not get a terminal string in one step from S.
For S, the base case is a 2-step derivation: ab and ba (one can verify that both can be derived from S).
For A, we can derive a in 1 step but no other terminal string within 2 steps.
For B, we can derive b in 1 step but no other terminal string within 2 steps.

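IH: All derivations of length up to k satisfy (1)-(3).
Consider a derivation of length k + 1 starting with S: S => α1 => α2 => ... => α(k+1) ∈ {a, b}*.
First step: it can be either S => aB or S => bA.
Assume we take S => aB => α2 => ... => α(k+1). Then α(k+1) = ay, where y is generated from B in k steps. By the IH, y has one more b than a, so α(k+1) has an equal number of a's and b's.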

Parse Trees
Definitions
Relationship to Left- and Rightmost Derivations
Ambiguity in Grammars

Parse Trees
Parse trees are trees labeled by symbols of a particular CFG.
Leaves: labeled by a terminal or ε.
Interior nodes: labeled by a variable. Children are labeled by the right side of a production for the parent.
Root: must be labeled by the start symbol.
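The three conditions on leaves, interior nodes and the root can be turned into a small checker. This is a sketch with an assumed encoding: a tree is a pair (label, children), variables are upper-case letters, an ε leaf has the label "ε", and productions are (head, body) string pairs with "" for an ε body. The function names are hypothetical.

```python
def well_formed(tree, productions):
    """Check the leaf and interior-node conditions below (and including) `tree`."""
    label, children = tree
    if not children:
        # Leaves carry a terminal or ε, never a variable.
        return not label.isupper()
    labels = [child[0] for child in children]
    body = "" if labels == ["ε"] else "".join(labels)
    # The children must spell the right side of a production for the parent.
    return (label, body) in productions and all(
        well_formed(child, productions) for child in children)


def is_parse_tree(tree, productions, start="S"):
    """The root must additionally be labeled by the start symbol."""
    return tree[0] == start and well_formed(tree, productions)


P = {("S", "SS"), ("S", "(S)"), ("S", "()")}
inner = ("S", [("(", []), (")", [])])
left = ("S", [("(", []), inner, (")", [])])
right = ("S", [("(", []), (")", [])])
tree = ("S", [left, right])

valid = is_parse_tree(tree, P)
invalid = is_parse_tree(("S", [("(", []), ("(", [])]), P)
```

The second tree is rejected because its children spell "((", which is not the right side of any production for S.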

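Example: Parse Tree
Grammar: S -> SS | (S) | ()
[Figure: parse tree. The root S has two children, S and S. The left S has children (, S, ), and its inner S has children ( and ). The right S has children ( and ).]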

Yield of a Parse Tree
The concatenation of the labels of the leaves in left-to-right order (that is, in the order of a preorder traversal) is called the yield of the parse tree.
Example: the yield of the parse tree with root S, whose left subtree derives (()) and whose right subtree derives (), is (())().
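Computing the yield is a short recursion over the tree. The sketch below uses a hypothetical `Node` class and treats a leaf labeled "ε" as contributing the empty string; both choices are assumptions made for illustration.

```python
class Node:
    """A parse-tree node: a label plus an ordered list of children."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def tree_yield(node):
    """Concatenate the leaf labels from left to right."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(child) for child in node.children)


# Parse tree for (())() under S -> SS | (S) | ()
tree = Node("S", [
    Node("S", [Node("("), Node("S", [Node("("), Node(")")]), Node(")")]),
    Node("S", [Node("("), Node(")")]),
])
result = tree_yield(tree)
```

Visiting children in order and emitting only the leaves is what makes the result match the left-to-right reading of the tree.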