Solving systems of regular expression equations



Consider the following DFSM over {0, 1}, with states A1, A2, A3, start state A1, and final state A3. Its transitions are: from A1, input 1 loops to A1 and input 0 goes to A2; from A2, input 0 goes to A3 and input 1 goes back to A1; from A3, input 0 loops to A3 and input 1 goes back to A1. For each transition from state Q to state R under input a, we create a production of the form Q → aR, and for each final state P, a production of the form P → ε. Productions lead to equations in the obvious way. So we have the equations

A1 = 1A1 + 0A2
A2 = 0A3 + 1A1
A3 = 0A3 + 1A1 + ε

These equations can be solved as follows. First solve for A3, using the rule (Arden's rule) that X = αX + β has solution X = α*β:

A3 = 0*1A1 + 0*

Then substitute A3 into the equation for A2:

A2 = 00*1A1 + 00* + 1A1

Finally, substituting A2 into the equation for A1, we have

A1 = 1A1 + 000*1A1 + 000* + 01A1 = (1 + 000*1 + 01)* 000*

the obvious answer!
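The solution can be checked mechanically. The sketch below is an illustration, not part of the original slides: it simulates the DFSM directly (the transition table is read off the equations above) and compares it against Python's re engine on every string up to length 10.

```python
import itertools
import re

# DFSM transitions, read off the equations A1 = 1A1 + 0A2, etc.
delta = {
    ("A1", "1"): "A1", ("A1", "0"): "A2",
    ("A2", "0"): "A3", ("A2", "1"): "A1",
    ("A3", "0"): "A3", ("A3", "1"): "A1",
}

def dfa_accepts(w):
    """Run the DFSM from start state A1; accept iff we end in A3."""
    state = "A1"
    for c in w:
        state = delta[(state, c)]
    return state == "A3"

# the solved expression (1 + 000*1 + 01)* 000*, in Python regex syntax
solution = re.compile(r"(?:1|000*1|01)*000*")

# exhaustively compare the DFSM and the regex on all strings up to length 10
for n in range(11):
    for w in map("".join, itertools.product("01", repeat=n)):
        assert dfa_accepts(w) == (solution.fullmatch(w) is not None)
print("regex agrees with the DFSM on all strings up to length 10")
```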

Parsing

In our compiler model, the parser accepts tokens and verifies that the string can be generated by the grammar.

[Diagram: source program → lexical analyzer → parser (exchanging "token" / "get next token") → parse tree → rest of front end; both phases consult the symbol table.]

Three types of parsers:
-- tabular
-- top-down
-- bottom-up

Backtrack parsing

Backtrack parsing simulates a nondeterministic machine. There are both top-down and bottom-up forms of backtrack parsing, and both may take an exponential amount of time to parse. Let us discuss only the top-down version. The name top-down parsing comes from the idea that we attempt to produce a parse tree for the input string starting from the top (root) and working down to the leaves.

For top-down backtrack parsing, we begin with a tree containing one node labeled S, the start symbol of the grammar. We assume that the alternate productions for every nonterminal have been ordered; that is, if A → α1 | α2 | ... | αn are all the A-productions in the grammar, we assign some ordering to the αi's (the alternates for A).

Recursively proceed as follows:
-- If the active node is labeled by a nonterminal, say A, then choose the first alternate, say X1 ... Xk, for A and create k direct descendants of A labeled X1, X2, ..., Xk. Make X1 the active node. If k = 0, then make the node immediately to the right of A active.
-- If the active node is labeled by a terminal, say a, then compare the current input symbol with a. If they match, then make active the node immediately to the right of a and move the input pointer one symbol to the right. If a does not match the current input symbol, go back to the node where the previous production was applied, adjust the input pointer if necessary, and try the next alternate. If no alternate is possible, go back to the next previous node, and so forth.

Consider the grammar

S → aSbS | aS | c

and the input string aacbc. [Figure: the sequence of parse trees built as alternates for S are tried, with "no match -- backtrack" steps wherever a leaf disagrees with the input.]

[Figure, continued: the remaining parse trees, ending with the successful parse of aacbc.]
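The backtracking procedure above can be sketched as a short program. This is an illustrative implementation, not the slides' own: rather than manipulating an explicit tree with an active node, parse explores the same alternatives by returning the set of input positions each sequence of grammar symbols can reach, trying the alternates for S in the stated order.

```python
# Ordered alternates for S, as on the slide: S -> aSbS | aS | c
ALTS = [["a", "S", "b", "S"], ["a", "S"], ["c"]]

def parse(symbols, w, i):
    """Return every input position reachable after matching `symbols` from i."""
    if not symbols:
        return {i}
    first, rest = symbols[0], symbols[1:]
    positions = set()
    if first == "S":
        for alt in ALTS:                   # try the alternates in order
            for j in parse(alt, w, i):
                positions |= parse(rest, w, j)
    elif i < len(w) and w[i] == first:     # terminal: must match the input
        positions |= parse(rest, w, i + 1)
    return positions                       # empty set = every alternate failed

def accepts(w):
    return len(w) in parse(["S"], w, 0)

print(accepts("aacbc"))  # True: S => aSbS => a(aS)b(S) => aacbc
print(accepts("abc"))    # False
```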

The Cocke-Younger-Kasami Algorithm

Tabular methods include the Cocke-Younger-Kasami algorithm and Earley's algorithm. These are essentially dynamic programming methods and are discussed mainly because of their simplicity. However, they are highly inefficient (O(n³) time and O(n²) space). They are the parsing equivalent of bubble sort.

Definition: A context-free grammar G = (N, Σ, P, S) is said to be in Chomsky normal form (CNF) if each production in P has one of the forms:
(1) A → BC with A, B, C in N;
(2) A → a with a in Σ; or
(3) if ε is in L(G), then S → ε is a production and S does not appear on the right-hand side of any production.

Theorem: Let L be a context-free language (i.e. a language that can be generated by a context-free grammar). Then L = L(G) for some CFG G in Chomsky normal form.


Example: The CFG G defined by

S → aAB | BA
A → BBB | a
B → AS | b

can be converted to the equivalent CNF grammar G′:

S → <a><AB> | BA
A → B<BB> | a
B → AS | b
<AB> → AB
<BB> → BB
<a> → a

where G = (N, {a, b}, P, S) and G′ = (N′, {a, b}, P′, S). Here P and P′ correspond to the two sets of productions above, and N′ = {S, A, B, <AB>, <BB>, <a>}.

Let w = a1a2...an be an input string which is to be parsed according to G. We assume that each ai is in Σ for 1 ≤ i ≤ n. The essence of the algorithm is the construction of a parse table T, whose elements we denote tij for 1 ≤ i ≤ n and 1 ≤ j ≤ n−i+1. Each tij will have a value which is a subset of N. A will be in tij iff A ⇒+ ai ai+1 ... ai+j−1, that is, iff A derives the j input symbols beginning at position i. As a special case, the input string w is in L(G) iff S is in t1n.
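This table construction can be rendered directly in Python. The sketch below is an illustration (the dictionary encoding of the grammar is an assumption); it uses the CNF grammar G′ from the previous slide.

```python
# CNF grammar G' from the example slide; "<a>" is the nonterminal
# introduced to stand for the terminal a.
binary = {
    "S": [("<a>", "<AB>"), ("B", "A")],
    "A": [("B", "<BB>")],
    "B": [("A", "S")],
    "<AB>": [("A", "B")],
    "<BB>": [("B", "B")],
}
unary = {"A": ["a"], "B": ["b"], "<a>": ["a"]}

def cyk(w):
    n = len(w)
    # t[i][j] = set of nonterminals deriving the j symbols starting at position i
    # (0-based positions here, versus 1-based in the slide's notation)
    t = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, c in enumerate(w):
        t[i][1] = {lhs for lhs, terms in unary.items() if c in terms}
    for j in range(2, n + 1):           # length of the substring
        for i in range(n - j + 1):      # starting position
            for k in range(1, j):       # split into lengths k and j-k
                for lhs, rhss in binary.items():
                    for (x, y) in rhss:
                        if x in t[i][k] and y in t[i + k][j - k]:
                            t[i][j].add(lhs)
    return "S" in t[0][n]               # w in L(G') iff S derives all of w

print(cyk("aab"))  # True:  S => <a><AB> => a AB => a a b
print(cyk("ab"))   # False
```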


Top-down parsing. A non-backtracking form of top-down parser is called a predictive parser (section 2.4 of the textbook). The main idea is that at each point it is possible to know, by looking at the next input token, which production should be applied. For most programming languages, the fact that each type of statement is initiated by a different keyword (e.g. for, while, if) facilitates the development of top-down predictive parsers.

Consider the following grammar:

type → simple | * id | array [ simple ] of type
simple → integer | char | num : num

Here the tokens are in bold. Starting from type, it is always clear which production to apply:
-- if the next token is integer, char, or num, the production type → simple should be applied;
-- if it is *, then the production type → * id should be applied;
-- finally, if it is array, type → array [ simple ] of type should be applied.

Clearly, no need for backtracking ever arises.

Predictive parsing relies on information about which first symbols can be generated by the right-hand side of a production. Thus,
-- FIRST(simple) = {integer, char, num}
-- FIRST(* id) = {*}
-- FIRST(array [ simple ] of type) = {array}
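For grammars without ε-productions, such FIRST sets can be computed by a simple fixed-point iteration over the productions. The sketch below is an illustration (the dictionary encoding of the grammar is an assumption, and nullable right-hand sides are deliberately ignored, which is safe for this grammar):

```python
# The slide's grammar: nonterminals "type" and "simple"; all other
# symbols are tokens.
productions = {
    "type": [["simple"], ["*", "id"], ["array", "[", "simple", "]", "of", "type"]],
    "simple": [["integer"], ["char"], ["num", ":", "num"]],
}

def first_sets(productions):
    """Iterate to a fixed point: FIRST(A) collects FIRST of each alternate's
    leading symbol (the symbol itself when it is a token)."""
    first = {nt: set() for nt in productions}
    changed = True
    while changed:
        changed = False
        for nt, alts in productions.items():
            for alt in alts:
                head = alt[0]
                add = first[head] if head in productions else {head}
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
    return first

print(sorted(first_sets(productions)["type"]))
```

With this encoding, FIRST(type) comes out as {integer, char, num, *, array}, matching the sets listed above.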


Problems with predictive parsing. Consider the grammar

S → S a | b

By just looking at the next token it is impossible to determine how many times the first production should be applied. For example, if the input is baaaaa, then the first production must be applied five times before the second one is. Grammars like this are said to be left recursive. Left recursion can be eliminated by changing the grammar, in this case to:

S → b S′
S′ → a S′ | ε
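The rewritten grammar parses with one token of lookahead and no backtracking, which a few lines of code make concrete (an illustrative sketch, not from the slides):

```python
def parse_S(w, i=0):
    # S -> b S'
    if i < len(w) and w[i] == "b":
        return parse_Sp(w, i + 1)
    raise SyntaxError(f"expected 'b' at position {i}")

def parse_Sp(w, i):
    # S' -> a S' | ε, chosen by a single token of lookahead
    if i < len(w) and w[i] == "a":
        return parse_Sp(w, i + 1)
    return i  # ε-alternate: consume nothing

def accepts(w):
    try:
        return parse_S(w) == len(w)   # accept iff all input is consumed
    except SyntaxError:
        return False

print(accepts("baaaaa"))  # True
```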

A second problem is that it may not be possible to determine, by looking only at the next token, which production to apply. Consider the following grammar:

S → i E t S | i E t S e S | a
E → b

By just looking at the next token it is impossible to determine whether the first or the second production should be applied. Again, changing the grammar can solve this problem:

S → i E t S S′ | a
S′ → e S | ε
E → b

Example. Consider the grammar:

A → id = E
E → E + T | E - T | T | - T
T → T * F | T / F | F
F → ( E ) | id | number

After eliminating the left recursion, we obtain

A → id = E
E → T E′ | - T E′
E′ → - T E′ | + T E′ | ε
T → F T′
T′ → * F T′ | / F T′ | ε
F → ( E ) | id | number

from which we can create the recursive descent parser.

procedure A() {
    match(id); t = idnum; match('=');
    E(); emit('fstore'); emitln(t);
}

procedure E() {
    if (lookahead == '-') {
        match('-'); T(); emit('fneg');
    } else
        T();
    E′();
}

procedure E′() {
    if (lookahead == '-') {
        match('-'); T(); emit('fsub'); E′();
    } else if (lookahead == '+') {
        match('+'); T(); emit('fadd'); E′();
    }
}

procedure T() {
    F(); T′();
}

procedure T′() {
    if (lookahead == '*') {
        match('*'); F(); emitln('fmul'); T′();
    } else if (lookahead == '/') {
        match('/'); F(); emitln('fdiv'); T′();
    }
}

procedure F() {
    if (lookahead == num) {
        emit('ldc'); emitln(value); match(num);
    } else if (lookahead == '(') {
        match('('); E(); match(')');
    } else {
        match(id); emit('fload'); emitln(idnum);
    }
}
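For comparison, the same E/E′/T/T′/F structure can be written as a runnable Python sketch. This is an illustration, not the slides' code: instead of emitting instructions such as fneg and fadd, each procedure returns the value of the phrase it parses, and identifiers and the assignment production are omitted for brevity.

```python
import re

def tokenize(s):
    # number and operator/punctuation tokens, with "$" as an end marker
    return re.findall(r"\d+|[-+*/()]", s) + ["$"]

class Parser:
    """Recursive-descent evaluator mirroring the slide's procedures,
    with emit() calls replaced by arithmetic on the parsed values."""
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    @property
    def lookahead(self):
        return self.toks[self.pos]

    def match(self, t):
        assert self.lookahead == t, f"expected {t}, got {self.lookahead}"
        self.pos += 1

    def E(self):                            # E -> T E' | - T E'
        if self.lookahead == "-":
            self.match("-")
            v = -self.T()                   # where the slide emits fneg
        else:
            v = self.T()
        return self.Ep(v)

    def Ep(self, v):                        # E' -> - T E' | + T E' | ε
        if self.lookahead == "-":
            self.match("-")
            return self.Ep(v - self.T())    # fsub
        if self.lookahead == "+":
            self.match("+")
            return self.Ep(v + self.T())    # fadd
        return v                            # ε

    def T(self):                            # T -> F T'
        return self.Tp(self.F())

    def Tp(self, v):                        # T' -> * F T' | / F T' | ε
        if self.lookahead == "*":
            self.match("*")
            return self.Tp(v * self.F())    # fmul
        if self.lookahead == "/":
            self.match("/")
            return self.Tp(v / self.F())    # fdiv
        return v                            # ε

    def F(self):                            # F -> ( E ) | number
        if self.lookahead == "(":
            self.match("(")
            v = self.E()
            self.match(")")
            return v
        v = int(self.lookahead)             # number token
        self.pos += 1
        return v

print(Parser(tokenize("-2*3+(4-1)")).E())  # -3
```

Note how the single-token lookahead in Ep and Tp selects the alternate, exactly as the predictive-parsing discussion above requires.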