Normal Forms and Parsing. CS154 Chris Pollett Mar 14, 2007.


Outline
- Chomsky Normal Form
- The CYK Algorithm
- Greibach Normal Form

Chomsky Normal Form

To get an efficient parsing algorithm for general CFGs, it is convenient to have them in some kind of normal form. Chomsky Normal Form is the one most often used. A CFG is in Chomsky Normal Form if every rule is of the form A-->BC or of the form A-->a, where A, B, C are variables and a is a terminal. In addition, the rule S-->λ is permitted.
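The definition above can be captured in a few lines of code. Below is a minimal sketch of a CNF checker, assuming rules are given as (head, body) pairs with variables written as uppercase strings and terminals as lowercase characters (a convention of this sketch, not part of the formal definition):

```python
def is_cnf(rules, start="S"):
    """Check whether every rule has one of the permitted CNF shapes:
    A -> BC, A -> a, or S -> lambda (empty body, start variable only).
    rules: iterable of (head, body) pairs, body a tuple of symbols.
    Convention (an assumption of this sketch): uppercase = variable,
    lowercase = terminal."""
    for head, body in rules:
        if body == () and head == start:
            continue  # S -> lambda is permitted
        if len(body) == 1 and body[0].islower():
            continue  # A -> a
        if len(body) == 2 and all(s.isupper() for s in body):
            continue  # A -> BC
        return False
    return True
```

Note this checks only the rule shapes from the definition; the full conversion proof additionally keeps the start variable off every right-hand side.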

Conversion to Chomsky Normal Form

Any CFL L can be generated by a CFG in Chomsky Normal Form.

Proof: Let G be a CFG for L.
- First, we add a new start variable and the rule S0-->S. This guarantees the start variable does not occur on the RHS of any rule.
- Second, we remove any λ-rule A-->λ where A is not the start variable. Then for each occurrence of A on the RHS of a rule, say R-->uAv, we add the rule R-->uv. We do this for each occurrence of A: for R-->uAvAw, we would add the rules R-->uvAw, R-->uAvw, and R-->uvw. If we had the rule R-->A, we add the rule R-->λ unless we previously removed that rule, and then we repeat the process with R.
- Next, we handle unit rules A-->B. To do this, we delete the rule, and then for each rule of the form B-->u we add the rule A-->u, unless this is a unit rule that was previously removed. We repeat until all unit rules are eliminated.
- Finally, we convert all the remaining rules to the proper form. For any rule A-->u1u2...uk where k >= 3 and each ui is a variable or a terminal symbol, we replace the rule with A-->u1A1, A1-->u2A2, ..., Ak-2-->uk-1uk. For any rule with k = 2, we replace each terminal ui with a new variable Ui and add the rule Ui-->ui.
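The final step of the conversion (splitting right-hand sides of length at least 3 into a chain of binary rules) can be sketched as follows. The fresh variable names X1, X2, ... are an assumption of this sketch, not the notation used in the proof:

```python
def split_long_rules(rules):
    """Replace each rule A -> u1 u2 ... uk (k >= 3) with the chain
    A -> u1 X1, X1 -> u2 X2, ..., X(k-2) -> u(k-1) uk.
    rules: list of (head, body) pairs, body a tuple of symbols.
    Rules with bodies of length <= 2 pass through unchanged."""
    out, counter = [], 0
    for head, body in rules:
        while len(body) > 2:
            counter += 1
            fresh = f"X{counter}"  # fresh variable (hypothetical naming scheme)
            out.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        out.append((head, body))
    return out
```

For instance, S -> Aba becomes S -> A X1 together with X1 -> ba (the terminals in length-2 bodies would then be replaced in the last step of the conversion).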

Example

Use the algorithm to convert S-->Aba, A-->aab, B-->Ac to Chomsky Normal Form.
Step 1: Add a new start variable to get: S0-->S, S-->Aba, A-->aab, B-->Ac.
Step 2: Remove λ-rules. In this case there aren't any, so we still have: S0-->S, S-->Aba, A-->aab, B-->Ac.
Step 3: Remove unit rules. We only have one, involving S0: S0-->Aba, S-->Aba, A-->aab, B-->Ac.
Step 4: Split up rules with RHS longer than 2: S0-->AC1, C1-->ba, S-->AD1, D1-->ba, A-->aE1, E1-->ab, B-->Ac.
Step 5: Put each rule with RHS of length 2 into the correct format: S0-->AC1, C1-->B1A1, B1-->b, A1-->a, S-->AD1, D1-->B2A2, B2-->b, A2-->a, A-->A3E1, A3-->a, E1-->A4B4, B4-->b, A4-->a, B-->AC2, C2-->c. This last grammar is the answer.

Introduction to the Cocke-Younger-Kasami (CYK) Algorithm (1960)

This is an O(n^3) algorithm to check whether a string w can be generated by a CFG in Chomsky Normal Form. As cubic algorithms tend to be slow, in practice people use algorithms based on restricted types of CFGs with a fixed amount of lookahead: either top-down LL parsing or bottom-up LR parsing. These algorithms are based on the PDA model. There have been improvements to the CYK algorithm which reduce the run-time slightly below cubic (about n^2.8), and to quadratic in the case of an unambiguous grammar.

The CYK Algorithm

The idea is to build a table such that table(i,j) contains those variables that can generate the substring of w starting at location i and ending at location j.

Algorithm: On input w = w1w2...wn:
1. If w = ε and S-->ε is a rule, accept.
2. For i = 1 to n: [set up the substrings of length 1]
3.   For each variable A:
4.     Test whether A-->b is a rule, where b = wi.
5.     If so, place A in table(i,i).
6. For l = 2 to n: [l is the length of a substring]
7.   For i = 1 to n - l + 1: [i is the start of the substring]
8.     Let j = i + l - 1. [j is the end of the substring]
9.     For k = i to j - 1: [k is a place to split the substring]
10.      For each rule A-->BC:
11.        If table(i,k) contains B and table(k+1,j) contains C, put A in table(i,j).
12. If S is in table(1,n), accept. Otherwise, reject.
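The pseudocode above translates directly into a short dynamic-programming routine. Here is a sketch in Python, assuming a nonempty input string and a grammar supplied as separate lists of terminal rules and binary rules (the ε case of line 1 is omitted for brevity):

```python
def cyk(w, terminal_rules, binary_rules, start="S"):
    """Decide membership of a nonempty string w for a CNF grammar.
    terminal_rules: iterable of (A, a) pairs for rules A -> a.
    binary_rules:   iterable of (A, B, C) triples for rules A -> BC.
    table[i][j] holds the variables generating w[i..j] (0-based)."""
    n = len(w)
    table = [[set() for _ in range(n)] for _ in range(n)]
    # Substrings of length 1.
    for i in range(n):
        for A, a in terminal_rules:
            if w[i] == a:
                table[i][i].add(A)
    # Substrings of length 2 through n.
    for l in range(2, n + 1):
        for i in range(n - l + 1):
            j = i + l - 1
            for k in range(i, j):          # split point
                for A, B, C in binary_rules:
                    if B in table[i][k] and C in table[k + 1][j]:
                        table[i][j].add(A)
    return start in table[0][n - 1]

# A sample CNF grammar: S -> AT | c, T -> SB, A -> a, B -> b.
terminal_rules = [("S", "c"), ("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "T"), ("T", "S", "B")]
```

With this grammar, `cyk("aacbb", terminal_rules, binary_rules)` returns True.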

Example

Consider the context-free grammar S-->AT, S-->c, T-->SB, A-->a, B-->b. Let's look at the steps CYK would take to check whether aacbb is in the language. We'll abbreviate table(i,j) as T(i,j).
First, lines 2-5 are used to set T(1,1) = {A}, T(2,2) = {A}, T(3,3) = {S}, T(4,4) = {B}, T(5,5) = {B}.
The l=2 pass of lines 8-11 then fills in the table for substrings of length 2. We get T(1,2)={}, T(2,3)={}, T(3,4)={T}, T(4,5)={}. Notice we added T to T(3,4) because T(3,3) was {S}, T(4,4) was {B}, and we have the rule T-->SB.
The l=3 pass then fills in the table for substrings of length 3. We get T(1,3)={}, T(2,4)={S}, T(3,5)={}. Here T(2,4)={S}, since T(2,2)={A}, T(3,4)={T}, and we have the rule S-->AT.
The l=4 pass then fills in the table for substrings of length 4. We get T(1,4)={} and T(2,5)={T}. This follows as T(2,4)={S}, T(5,5)={B}, and T-->SB.
Finally, the l=5 pass fills in the table for the substring of length 5, i.e., the whole string aacbb. In this case T(1,5)={S}, since T(1,1)={A}, T(2,5)={T}, and we have the rule S-->AT. As T(1,5) contains the start variable, we know the string aacbb is generated by the grammar.

Greibach Normal Form

A CFG is said to be in Greibach Normal Form if all productions are of the form A-->ax, where a is a terminal and x is a string of variables (possibly the empty string). It turns out that any CFG is equivalent to one in Greibach Normal Form. Notice that, unlike an s-grammar, a grammar in Greibach Normal Form is allowed to have multiple rules with the same pair (A, a). Greibach Normal Form is interesting for two reasons: (1) it can be used in proofs of the equivalence of CFLs with the languages recognized by a certain type of automaton with a stack; (2) it also gives a slightly different upper bound on the time/space needed to parse a string in a CFG. To see this, notice that if a string is generated by a CFG in Greibach Normal Form, then, as each rule application consumes one letter of the string, the derivation will be of linear length. The brute-force algorithm on this string might still take exponential time, since if we are currently reading an a, there might be several applicable rules. However, as all derivations are of linear length, the algorithm is a linear-space algorithm. Further, if we could nondeterministically guess which rule to apply, we could verify a derivation in linear time. This shows the context-free languages are in nondeterministic linear time.

Example: Converting to Greibach Normal Form

Consider the CFG S-->abSb | aa. To convert to GNF we introduce new variables A, B with rules A-->a and B-->b. Then the grammar S-->aBSB | aA, A-->a, B-->b is in GNF.
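The linear-length-derivation property can be seen in a small backtracking recognizer. Below is a sketch, assuming the grammar is stored as a dict mapping each variable to its (terminal, variables) alternatives; the sample grammar is S-->aBSB | aA, A-->a, B-->b, one GNF conversion of S-->abSb | aa:

```python
def gnf_accepts(rules, stack, s):
    """Backtracking recognizer for a grammar in Greibach Normal Form.
    rules: dict mapping a variable to a list of (terminal, tuple_of_variables)
    alternatives. stack: tuple of variables still to expand. Each rule
    application consumes exactly one letter, so derivations have linear
    length and the recursion depth is bounded by len(s)."""
    if not stack:
        return s == ""
    if len(stack) > len(s):
        return False  # every remaining variable must produce at least one letter
    A, rest = stack[0], stack[1:]
    for t, variables in rules.get(A, []):
        if s and s[0] == t and gnf_accepts(rules, variables + rest, s[1:]):
            return True
    return False

# GNF form of S -> abSb | aa:  S -> aBSB | aA, A -> a, B -> b.
rules = {"S": [("a", ("B", "S", "B")), ("a", ("A",))],
         "A": [("a", ())],
         "B": [("b", ())]}
```

For example, `gnf_accepts(rules, ("S",), "abaab")` returns True (via S ⇒ abSb ⇒ abaab in the original grammar), while "ab" is rejected. The backtracking over the two S-alternatives starting with a is exactly the nondeterministic choice discussed above.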