Parsing. For a given CFG G, parsing a string w is to check whether w ∈ L(G) and, if it is, to find a sequence of production rules that derives w.


Parsing

For a given CFG G, parsing a string w is to check whether w ∈ L(G) and, if it is, to find a sequence of production rules that derives w. Since many grammars can generate the same language L, parsing must be based on the grammar G, not on its language L(G). Consider the following two context-free grammars G1 and G2, which generate the same language {a^i b^i | i > 0}.

G1: S → aSb | ab
G2: S → aA    A → Sb | b

Clearly, the following PDA recognizes this language. However, the PDA provides no information for identifying the grammar, nor a way to generate a given string with one of the grammars.

[Figure: state diagram of the PDA, entered at "start", with transitions labeled (a, Z0/aZ0), (a, a/aa), (b, a/ε), (b, a/ε) and (ε, Z0/Z0).]

Two Derivation Rules

Recall that if a context-free grammar G is unambiguous, then for each string x in L(G) there is a unique parse tree that yields x. So a parse tree could be a good output form for parsing. However, it is not practical to output parse trees in two-dimensional form. How about representing them in one-dimensional form, i.e., as the sequence of production rules applied? There is a problem with this approach: in general, more than one sequence of production rules can generate the same string x. This is true even when the grammar is unambiguous.

Recall that a string x is in L(G) if x can be derived by applying a sequence of production rules of G, one rule at a time. Suppose that a sentential form (i.e., a string of terminals and nonterminals) containing several nonterminal symbols is derived in the middle of such a sequence. The final result does not depend on which of those nonterminal symbols is chosen to derive the next sentential form.

We should choose one of these multiple sequences of production rules. The chosen sequence should be uniquely identifiable and effective to work with. Each of the following two derivation rules singles out a unique sequence.

Leftmost (rightmost) derivation: a string is derived by iteratively applying a production rule to the leftmost (rightmost) nonterminal symbol of the current sentential form.

For the following grammar G, the leftmost and rightmost derivations of aaacc are as shown below.

G: S → ABC    A → aa    B → a    C → cC | c

Leftmost derivation:  S ⇒ ABC ⇒ aaBC ⇒ aaaC ⇒ aaacC ⇒ aaacc
Rightmost derivation: S ⇒ ABC ⇒ ABcC ⇒ ABcc ⇒ Aacc ⇒ aaacc

[Figure: parse tree of aaacc. The root S has children A, B and C; A yields aa, B yields a, and C has children c and C, where the lower C yields c.]

Notice that the sequence of productions applied with the leftmost derivation rule corresponds to the top-down, left-to-right traversal of the parse tree, and the reverse of the sequence applied with the rightmost derivation rule corresponds to the bottom-up, left-to-right traversal of the parse tree.
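The two derivation rules can be replayed mechanically. The following is a minimal sketch, assuming single-character symbols with uppercase letters as nonterminals; the function name derive and the two step lists are hypothetical names introduced only for this illustration.

```python
# Grammar G from the slide: S -> ABC, A -> aa, B -> a, C -> cC | c.
# Assumption: uppercase letters are nonterminals, lowercase letters are terminals.

def derive(start, steps, leftmost=True):
    """Apply each (nonterminal, right side) step to the leftmost or rightmost
    nonterminal of the current sentential form; return every form produced."""
    form = start
    forms = [form]
    for lhs, rhs in steps:
        positions = [i for i, ch in enumerate(form) if ch.isupper()]
        if not positions:
            raise ValueError("no nonterminal left to expand")
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != lhs:
            raise ValueError(f"expected {lhs} at position {pos}, found {form[pos]}")
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms

# The same five productions, ordered for each derivation rule.
LEFTMOST_STEPS = [("S", "ABC"), ("A", "aa"), ("B", "a"), ("C", "cC"), ("C", "c")]
RIGHTMOST_STEPS = [("S", "ABC"), ("C", "cC"), ("C", "c"), ("B", "a"), ("A", "aa")]

if __name__ == "__main__":
    print(" => ".join(derive("S", LEFTMOST_STEPS, leftmost=True)))
    print(" => ".join(derive("S", RIGHTMOST_STEPS, leftmost=False)))
```

Both calls end in aaacc, and the intermediate sentential forms match the two derivations listed above.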

The Basic Strategy for LL(k) Parsing

Now we investigate how we can use a DPDA for parsing. Consider the following CFG G, which generates the language {a^10 x | x = b^i or x = c^i, i > 0}. (For convenience, when we refer to a rule of G, we shall use the rule number shown with each rule.)

G: (1) S → AB    (2) S → AC    (3) A → aaaaaaaaaa    (4) B → bB    (5) B → b    (6) C → cC    (7) C → c

We want to design a DPDA which, given a string x ∈ {a, b, c}* on the input tape, outputs a sequence of production rules that generates x, if x ∈ L(G). We assume that the machine has an output port, as shown in the figure below, and that the grammar is stored in its memory as a lookup table.

Let's first try a simple greedy strategy of generating, in the stack, a string that matches the string x on the input tape. Since any string in L(G) must be generated from the start symbol S, the machine initially pushes S onto the stack, enters a working state q1, and examines the input to choose a proper production rule for S. Recall that the conventional PDA sees the stack top, which is S, and decides whether or not it will read the next input symbol.

[Figure: the machine in state q1 with the grammar G in memory, input tape aaaaaaaaaabbb, stack SZ0 and an output port.]

Without reading the input, the machine has no information available for choosing rule (1) or rule (2) for S. So we let the machine read the input. Suppose that the symbol read is a, as shown in the figure below. This information does not help, because both rules (1) and (2) generate the same leading a's (actually 10 a's). The b's located after the a's in the input string indicate that the first production rule to apply to generate the input string is rule (1), S → AB. Using the conventional DPDA, it is impossible to choose this production rule correctly.

[Figure: the machine in state q1 with input tape aaaaaaaaaabbb and stack SZ0.]

To overcome this problem, we equip the DPDA with the capability of looking ahead in the input string by some constant number k of cells. For the current grammar the look-ahead length k must be at least 11, because the first symbol b appears 11 cells away from the current input position. (Notice that the count includes the current cell under the read head.) This symbol b is the nearest information in the input string that helps in choosing the correct production rule for S, which is rule (1) in this example.

Now, by looking 11 symbols ahead, the machine knows that the input string must be derived by applying production rule (1) first, if it is a string generated by grammar G. So the machine replaces S at the stack top with the right side of rule (1) and outputs rule number (1), as shown in figure (a) below. (Notice that looking ahead does not involve any move of the read head.) Whenever a terminal symbol appears at the stack top, the machine reads the input symbol, compares it with the stack top, and pops the stack if they match. Otherwise, the input is rejected.

[Figure: (a) the machine in state q1 after applying rule (1), with input tape aaaaaaaaaabbb, stack ABZ0 and (1) on the output port; (b) a machine in state q with remaining input α and stack β, abbreviated as the machine configuration (q, α, β).]

For convenience, let (q, α, β) denote a configuration of the machine with current state q (including G), the input portion α still to be read, and current stack content β. From now on we shall use this triple for the machine configuration instead of a diagram.

The initial configuration (q0, aaaaaaaaaabbb, Z0) is routinely changed to the ready configuration (q1, aaaaaaaaaabbb, SZ0). Based on the information obtained by looking ahead 11 positions, this configuration is changed by applying rule (1), as shown below. Then, seeing A at the stack top, the machine replaces A with the right side of rule (3). For this operation the machine does not need to look ahead, because there is only one production rule for A. Now the first 10 a's of the input can be successfully matched one by one with the 10 a's appearing at the stack top, as follows. (The number in parentheses after a ⊢ refers to the production rule applied.)

(q0, aaaaaaaaaabbb, Z0) ⊢ (q1, aaaaaaaaaabbb, SZ0) ⊢(1) (q1, aaaaaaaaaabbb, ABZ0) ⊢(3) (q1, aaaaaaaaaabbb, aaaaaaaaaaBZ0) ⊢ ... ⊢ (q1, abbb, aBZ0) ⊢ (q1, bbb, BZ0) ⊢ ?

Now symbol B appears at the stack top. Which of the production rules B → bB | b should be applied to generate the next input symbol b? Since more than one b remains, the next input b must be generated by rule B → bB. To see whether more than one b remains, the machine needs to look ahead 2 cells. Thus the machine applies rule B → bB whenever it sees two b's ahead, and applies rule B → b when it sees only one b. This way the machine successfully parses the remaining input, as the following slide shows. The last configuration (q1, ε, Z0), with an empty stack and no input left to parse, implies that the parsing has completed successfully. The output is the sequence of production rules applied whenever a nonterminal symbol appeared at the stack top.

(q1, bbb, BZ0) ⊢(4) (q1, bbb, bBZ0) ⊢ (q1, bb, BZ0) ⊢(4) (q1, bb, bBZ0) ⊢ (q1, b, BZ0) ⊢(5) (q1, b, bZ0) ⊢ (q1, ε, Z0)

The sequence of productions applied by this machine is shown below; it follows exactly the order of the leftmost derivation.

S ⇒(1) AB ⇒(3) aaaaaaaaaaB ⇒(4) aaaaaaaaaabB ⇒(4) aaaaaaaaaabbB ⇒(5) aaaaaaaaaabbb

We can easily see that the machine, given a string x on the input tape, generates the sequence of production rules in the order applied in the leftmost derivation of x if and only if x is in L(G). This machine parses the input string by reading it left to right, looking ahead at most 11 cells, and generates the sequence of production rules applied according to the leftmost derivation. We call this machine an LL(11) parser.

Conventionally, an LL(k) parser is represented with a table that shows, depending on the nonterminal symbol at the stack top and the look-ahead contents, which production rule should be applied. The operations of reading input symbols to match terminal symbols at the stack top, and popping them, are usually omitted for convenience.

The parse table for the above example is shown below. Combinations of stack top and look-ahead contents that are not listed are undefined (i.e., the input is rejected), and x in the look-ahead contents is a don't-care (wild card) symbol.

Stack top   Look-ahead (11 cells)   Right side applied
S           aaaaaaaaaab             AB
S           aaaaaaaaaac             AC
A           ε (no look-ahead)       aaaaaaaaaa
B           bbxxxxxxxxx             bB
B           b                       b
C           ccxxxxxxxxx             cC
C           c                       c
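The table-driven operation described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the machine from the slides: the names RULES, TABLE and parse are hypothetical, each look-ahead pattern is stored as a prefix of the look-ahead window (so the trailing don't-care x's are left implicit), and patterns for one nonterminal are tried in the order listed, which is why bb comes before b.

```python
K = 11  # look-ahead length of the LL(11) parser

# Production rules numbered as in the slides: number -> (left side, right side).
RULES = {
    1: ("S", "AB"), 2: ("S", "AC"), 3: ("A", "a" * 10),
    4: ("B", "bB"), 5: ("B", "b"), 6: ("C", "cC"), 7: ("C", "c"),
}

# Parse table: stack-top nonterminal -> ordered list of (look-ahead prefix, rule).
TABLE = {
    "S": [("a" * 10 + "b", 1), ("a" * 10 + "c", 2)],
    "A": [("", 3)],                 # no look-ahead needed for A
    "B": [("bb", 4), ("b", 5)],
    "C": [("cc", 6), ("c", 7)],
}

def parse(text, table=TABLE, rules=RULES, k=K, start="S"):
    """Return the rule numbers of the leftmost derivation, or None if rejected."""
    stack = [start]          # the top of the stack is the end of the list
    pos = 0                  # position of the read head
    output = []
    while stack:
        top = stack.pop()
        if top.isupper():    # nonterminal: consult the table, do not move the head
            window = text[pos:pos + k]
            for prefix, number in table.get(top, []):
                if window.startswith(prefix):
                    break
            else:
                return None  # no table entry: reject
            output.append(number)
            stack.extend(reversed(rules[number][1]))
        elif pos < len(text) and text[pos] == top:
            pos += 1         # terminal on the stack top matches the input symbol
        else:
            return None
    return output if pos == len(text) else None

if __name__ == "__main__":
    print(parse("aaaaaaaaaabbb"))   # [1, 3, 4, 4, 5]
```

For the input aaaaaaaaaabbb the sketch outputs the rule sequence (1) (3) (4) (4) (5), the same sequence as in the walkthrough above; strings outside L(G) yield None.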

Example 1. Construct an LL(k) parser for the following CFG with minimum k.

(1) S → aSb    (2) S → aabbb

This grammar generates the language {a^i aabbb b^i | i ≥ 0}. Consider the string aaaaabbbbbb and its leftmost derivation:

S ⇒(1) aSb ⇒(1) aaSbb ⇒(1) aaaSbbb ⇒(2) aaaaabbbbbb

Notice that the aabbb at the center of this string is generated by rule S → aabbb. If we let our parser look ahead 3 cells, it can select the correct production rule for generating the next input symbol as follows. If it sees aaa, then the first a in this look-ahead content must have been generated by rule S → aSb. If it sees aab, then this string aab, together with the two b's that follow it, must have been generated by production rule S → aabbb. Based on this observation, our LL(3) parser parses the string aaaaabbbbbb as follows. First the parser gets ready by pushing S onto the stack.

(q0, aaaaabbbbbb, Z0) ⊢ (q1, aaaaabbbbbb, SZ0) ⊢ ?

Our parser, seeing aaa ahead, applies rule (1) S → aSb and then, seeing a at the stack top, pops it while reading a from the input tape. Thus the configuration changes as follows.

(q1, aaaaabbbbbb, SZ0) ⊢(1) (q1, aaaaabbbbbb, aSbZ0) ⊢ (q1, aaaabbbbbb, SbZ0) ⊢ ?

Again seeing aaa ahead, the parser applies rule (1) S → aSb two more times, as follows.

(q1, aaaabbbbbb, SbZ0) ⊢(1) (q1, aaaabbbbbb, aSbbZ0) ⊢ (q1, aaabbbbbb, SbbZ0) ⊢(1) (q1, aaabbbbbb, aSbbbZ0) ⊢ (q1, aabbbbbb, SbbbZ0) ⊢ ?

Now our parser sees aab ahead, applies rule (2) S → aabbb, and then matches the remaining input symbols with the ones appearing at the stack top, as follows.

(q1, aabbbbbb, SbbbZ0) ⊢(2) (q1, aabbbbbb, aabbbbbbZ0) ⊢ ... ⊢ (q1, ε, Z0)

The sequence of productions applied, (1) (1) (1) (2), is exactly the one applied in the leftmost derivation of aaaaabbbbbb. In fact, the parser derived the string in the stack according to the leftmost derivation rule. Clearly, this parser operates according to the following parsing table.

Stack top   3 look-ahead
            aaa     aab
S           aSb     aabbb
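The configuration sequence of Example 1 can be reproduced with a small tracer. The sketch below is an assumption-laden illustration: the names RULES, TABLE and trace are hypothetical, the state component is omitted because it is always q1 after initialization, and the stack is kept as a string with its top on the left and Z0 written at the bottom.

```python
K = 3
RULES = {1: ("S", "aSb"), 2: ("S", "aabbb")}
# Parsing table of Example 1: (stack top, 3 look-ahead) -> rule number.
TABLE = {("S", "aaa"): 1, ("S", "aab"): 2}

def trace(text):
    """Return (rule numbers or None, list of (remaining input, stack) pairs)."""
    stack = "S"                      # stack as a string, top on the left
    rest = text                      # input portion still to be read
    configs = [(rest, stack + "Z0")]
    output = []
    while stack:
        top, below = stack[0], stack[1:]
        if top.isupper():            # nonterminal: choose a rule by look-ahead
            number = TABLE.get((top, rest[:K]))
            if number is None:
                return None, configs
            output.append(number)
            stack = RULES[number][1] + below
        elif rest[:1] == top:        # terminal: match and pop
            rest, stack = rest[1:], below
        else:
            return None, configs
        configs.append((rest, stack + "Z0"))
    return (output if not rest else None), configs

if __name__ == "__main__":
    rules_used, configs = trace("aaaaabbbbbb")
    print(rules_used)                # [1, 1, 1, 2]
    for rest, stack in configs:
        print(f"(q1, {rest or 'ε'}, {stack})")
```

The printed configurations include every intermediate matching step, so they are a superset of the configurations shown in the walkthrough.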

Example 2. Construct an LL(k) parser with minimum k for the following grammar.

(1) S → abA    (2) S → ε    (3) A → Saa    (4) A → b

We will build an LL(2) parser by examining how it can parse the string ababaaaa by deriving it in the stack according to the following leftmost derivation.

S ⇒(1) abA ⇒(3) abSaa ⇒(1) ababAaa ⇒(3) ababSaaaa ⇒(2) ababaaaa

Following the routine initialization operation, we have S at the stack top, as follows.

(q0, ababaaaa, Z0) ⊢ (q1, ababaaaa, SZ0) ⊢ ?

By looking ahead two cells on the tape, the parser sees ab. The next rule applied in the leftmost derivation is definitely rule (1), which is the only rule that produces a leading ab. So the parser applies rule (1) and the configuration changes as follows.

(q1, ababaaaa, SZ0) ⊢(1) (q1, ababaaaa, abAZ0) ⊢ ... ⊢ (q1, abaaaa, AZ0) ⊢ ?

For convenience the grammar is repeated here.

(1) S → abA    (2) S → ε    (3) A → Saa    (4) A → b

If rule (4) were applied for A, the terminal symbol in the next cell would be b, not a. The next input symbol must therefore be generated from S. So rule (3) must have been applied next to generate the input string. Thus the configuration changes as follows.

(q1, abaaaa, AZ0) ⊢(3) (q1, abaaaa, SaaZ0) ⊢ ?

Now the parser looks ahead two cells and sees ab. Rule (1) must have been applied to derive the input string. If rule (2) were applied, the two look-ahead cells would contain either ε (for the case of the null input string) or aa (generated by rule (3)). Thus the parser applies rule (1) for S at the stack top, and changes its configuration as follows.

(q1, abaaaa, SaaZ0) ⊢(1) (q1, abaaaa, abAaaZ0) ⊢ ... ⊢ (q1, aaaa, AaaZ0) ⊢ ?

(1) S → abA    (2) S → ε    (3) A → Saa    (4) A → b

Again, since the next input symbol is a, the next rule applied cannot be rule (4). It must be rule (3). Thus the configuration changes as follows.

(q1, aaaa, AaaZ0) ⊢(3) (q1, aaaa, SaaaaZ0) ⊢ ?

Now the parser sees aa ahead, which cannot be generated from S by rule (1); it must have been generated by the earlier applications of rule (3). Thus the parser applies rule (2), as follows, and then matches the remaining input with the string in the stack.

(q1, aaaa, SaaaaZ0) ⊢(2) (q1, aaaa, aaaaZ0) ⊢ ... ⊢ (q1, ε, Z0)

The sequence of rules applied by the parser at the stack top is exactly the same as the sequence applied in the leftmost derivation of ababaaaa.

Clearly, this parser can parse the language with the following parsing table.

Stack top   2 look-ahead
            ab      aa      bX      BB
S           abA     ε               ε
A           Saa     Saa     b

B: blank    X: don't care
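The blank (B) and don't-care (X) entries of this table need slightly more than plain string comparison. The following sketch shows one way to handle them, assuming the input alphabet is {a, b} so that B and X never occur in the input itself; the names matches, parse, TABLE and RULES are hypothetical, and the look-ahead window is padded with B past the end of the input.

```python
K = 2
BLANK, DONT_CARE = "B", "X"   # symbols used in the table above

RULES = {1: ("S", "abA"), 2: ("S", ""), 3: ("A", "Saa"), 4: ("A", "b")}

# Parsing table of Example 2: stack top -> list of (look-ahead pattern, rule).
TABLE = {
    "S": [("ab", 1), ("aa", 2), ("BB", 2)],
    "A": [("ab", 3), ("aa", 3), ("bX", 4)],
}

def matches(pattern, window):
    """True if every pattern cell equals the window cell or is a don't-care."""
    return all(p == DONT_CARE or p == w for p, w in zip(pattern, window))

def parse(text):
    """Return the rule numbers of the leftmost derivation, or None if rejected."""
    stack = ["S"]             # the top of the stack is the end of the list
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        if top.isupper():
            # Pad with blanks so the window always has exactly K cells.
            window = text[pos:pos + K].ljust(K, BLANK)
            number = next((n for pat, n in TABLE[top] if matches(pat, window)), None)
            if number is None:
                return None
            output.append(number)
            stack.extend(reversed(RULES[number][1]))   # ε pushes nothing
        elif pos < len(text) and text[pos] == top:
            pos += 1
        else:
            return None
    return output if pos == len(text) else None

if __name__ == "__main__":
    print(parse("ababaaaa"))   # [1, 3, 1, 3, 2]
    print(parse(""))           # [2]
```

The first call reproduces the rule sequence (1) (3) (1) (3) (2) of the walkthrough, and the empty input is accepted through the BB entry for S.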

Example 3. The grammar below is not an LL(k) grammar for any fixed integer k.

S → A | B    A → aA | 0    B → aB | 1

Notice that the language of this grammar is {a^n x | n ≥ 0, x ∈ {0, 1}}. The strings in this language can have an arbitrarily large number of a's followed by either 0 or 1, depending on whether the string is generated with rule S → A or S → B, respectively. With a finite look-ahead range k, it is impossible to see the crucial indicator (0 or 1) that is needed to decide which production rule was applied to generate the input string. For the given grammar, there is no LL(k) parser for any finite k.

[Figure: the machine in state q1 with stack SZ0 reading the input aaaaaaaaaaaa0.]

It is easy to see that for the following grammar, which generates the same language, we can construct an LL(1) parser.

S → aS | D    D → …
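The argument against the first grammar of Example 3 (S → A | B) can be checked on concrete strings. The sketch below, with the hypothetical helper names first_rule and conflicting_pair, builds for any k two strings of the language that agree on their first k symbols yet require different rules for S.

```python
def first_rule(word):
    """Rule for S that derives word in the grammar S -> A | B, A -> aA | 0,
    B -> aB | 1, or None if word is not in the language."""
    if word and set(word[:-1]) <= {"a"} and word[-1] in "01":
        return "S -> A" if word[-1] == "0" else "S -> B"
    return None

def conflicting_pair(k):
    """Two strings with the same k-symbol prefix but different first rules."""
    x, y = "a" * k + "0", "a" * k + "1"
    assert x[:k] == y[:k]            # the k look-ahead contents are identical
    return x, y, first_rule(x), first_rule(y)

if __name__ == "__main__":
    for k in (1, 2, 11):
        print(k, conflicting_pair(k))
```

Whatever k is chosen, the pair a^k 0 and a^k 1 looks identical to a parser with k look-ahead cells when S is at the stack top, so no table entry can pick between S → A and S → B.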

Formal Definition of LL(k) Grammars

Notation: Let (k)ω denote the prefix of length k of string ω. If |ω| < k, then (k)ω = ω. For example, (2)ababaa = ab and (3)ab = ab.

Definition (LL(k) grammar). Let G = (V_T, V_N, P, S) be a CFG. Grammar G is an LL(k) grammar for some fixed integer k if it has the following property: for any two leftmost derivations

S ⇒* ωAα ⇒ ωβα ⇒* ωy   and   S ⇒* ωAα ⇒ ωγα ⇒* ωx,

where A ∈ V_N, α, β, γ ∈ (V_T ∪ V_N)* and ω, x, y ∈ V_T*, if (k)x = (k)y, then β = γ.

If a CFG G has this property, then for every x ∈ L(G) we can determine the leftmost derivation that generates x by scanning x left to right, looking ahead at most k symbols. (If you are interested in the proof of this claim, see The Theory of Computation by D. Wood, or a book on compiler construction.)


COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview Tokens and regular expressions Syntax and context-free grammars Grammar derivations More about parse trees Top-down and bottom-up

More information

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4) CS1622 Lecture 9 Parsing (4) CS 1622 Lecture 9 1 Today Example of a recursive descent parser Predictive & LL(1) parsers Building parse tables CS 1622 Lecture 9 2 A Recursive Descent Parser. Preliminaries

More information

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing 8 Parsing Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces strings A parser constructs a parse tree for a string

More information

PDA s. and Formal Languages. Automata Theory CS 573. Outline of equivalence of PDA s and CFG s. (see Theorem 5.3)

PDA s. and Formal Languages. Automata Theory CS 573. Outline of equivalence of PDA s and CFG s. (see Theorem 5.3) CS 573 Automata Theory and Formal Languages Professor Leslie Lander Lecture # 20 November 13, 2000 Greibach Normal Form (GNF) Sheila Greibach s normal form (GNF) for a CFG is one where EVERY production

More information

Bottom-Up Parsing II. Lecture 8

Bottom-Up Parsing II. Lecture 8 Bottom-Up Parsing II Lecture 8 1 Review: Shift-Reduce Parsing Bottom-up parsing uses two actions: Shift ABC xyz ABCx yz Reduce Cbxy ijk CbA ijk 2 Recall: he Stack Left string can be implemented by a stack

More information

CSE431 Translation of Computer Languages

CSE431 Translation of Computer Languages CSE431 Translation of Computer Languages Top Down Parsers Doug Shook Top Down Parsers Two forms: Recursive Descent Table Also known as LL(k) parsers: Read tokens from Left to right Produces a Leftmost

More information

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6 Compiler Design 1 Bottom-UP Parsing Compiler Design 2 The Process The parse tree is built starting from the leaf nodes labeled by the terminals (tokens). The parser tries to discover appropriate reductions,

More information

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F). CS 2210 Sample Midterm 1. Determine if each of the following claims is true (T) or false (F). F A language consists of a set of strings, its grammar structure, and a set of operations. (Note: a language

More information

Compiler Construction

Compiler Construction Compiler Construction Exercises 1 Review of some Topics in Formal Languages 1. (a) Prove that two words x, y commute (i.e., satisfy xy = yx) if and only if there exists a word w such that x = w m, y =

More information

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012 Predictive Parsers LL(k) Parsing Can we avoid backtracking? es, if for a given input symbol and given nonterminal, we can choose the alternative appropriately. his is possible if the first terminal of

More information

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Outline Limitations of regular languages Introduction to Parsing Parser overview Lecture 8 Adapted from slides by G. Necula Context-free grammars (CFG s) Derivations Languages and Automata Formal languages

More information

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! Any questions about the syllabus?! Course Material available at www.cs.unic.ac.cy/ioanna! Next time reading assignment [ALSU07]

More information

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2). LL(k) Grammars We need a bunch of terminology. For any terminal string a we write First k (a) is the prefix of a of length k (or all of a if its length is less than k) For any string g of terminal and

More information

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing Roadmap > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing The role of the parser > performs context-free syntax analysis > guides

More information

3. Parsing. Oscar Nierstrasz

3. Parsing. Oscar Nierstrasz 3. Parsing Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/ http://www.cs.purdue.edu/homes/hosking/

More information

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG) CS5371 Theory of Computation Lecture 8: Automata Theory VI (PDA, PDA = CFG) Objectives Introduce Pushdown Automaton (PDA) Show that PDA = CFG In terms of descriptive power Pushdown Automaton (PDA) Roughly

More information

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview n Tokens and regular expressions n Syntax and context-free grammars n Grammar derivations n More about parse trees n Top-down and

More information

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees Derivations of a CFG MACM 300 Formal Languages and Automata Anoop Sarkar http://www.cs.sfu.ca/~anoop strings grow on trees strings grow on Noun strings grow Object strings Verb Object Noun Verb Object

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Lecture 11 Ana Bove April 26th 2018 Recap: Regular Languages Decision properties of RL: Is it empty? Does it contain this word? Contains

More information

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology MIT 6.035 Parse Table Construction Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Parse Tables (Review) ACTION Goto State ( ) $ X s0 shift to s2 error error goto s1

More information

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309 PART 3 - SYNTAX ANALYSIS F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 64 / 309 Goals Definition of the syntax of a programming language using context free grammars Methods for parsing

More information

CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages CS 314 Principles of Programming Languages Lecture 5: Syntax Analysis (Parsing) Zheng (Eddy) Zhang Rutgers University January 31, 2018 Class Information Homework 1 is being graded now. The sample solution

More information

Types of parsing. CMSC 430 Lecture 4, Page 1

Types of parsing. CMSC 430 Lecture 4, Page 1 Types of parsing Top-down parsers start at the root of derivation tree and fill in picks a production and tries to match the input may require backtracking some grammars are backtrack-free (predictive)

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Parsing CMSC 330 - Spring 2017 1 Recall: Front End Scanner and Parser Front End Token Source Scanner Parser Stream AST Scanner / lexer / tokenizer converts

More information

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form Bottom-up parsing Bottom-up parsing Recall Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form If α V t,thenα is called a sentence in L(G) Otherwise it is just

More information

CS502: Compilers & Programming Systems

CS502: Compilers & Programming Systems CS502: Compilers & Programming Systems Top-down Parsing Zhiyuan Li Department of Computer Science Purdue University, USA There exist two well-known schemes to construct deterministic top-down parsers:

More information

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University Compilation 2012 Parsers and Scanners Jan Midtgaard Michael I. Schwartzbach Aarhus University Context-Free Grammars Example: sentence subject verb object subject person person John Joe Zacharias verb asked

More information

Plan for Today. Regular Expressions: repetition and choice. Syntax and Semantics. Context Free Grammars

Plan for Today. Regular Expressions: repetition and choice. Syntax and Semantics. Context Free Grammars Plan for Today Context Free s models for specifying programming languages syntax semantics example grammars derivations Parse trees yntax-directed translation Used syntax-directed translation to interpret

More information

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing. Lecture 11-12 Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 2/20/08 Prof. Hilfinger CS164 Lecture 11 1 Administrivia Test I during class on 10 March. 2/20/08 Prof. Hilfinger CS164 Lecture 11

More information

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino 3. Syntax Analysis Andrea Polini Formal Languages and Compilers Master in Computer Science University of Camerino (Formal Languages and Compilers) 3. Syntax Analysis CS@UNICAM 1 / 54 Syntax Analysis: the

More information

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing. Lecture 11-12 Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 9/22/06 Prof. Hilfinger CS164 Lecture 11 1 Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient

More information

Talen en Compilers. Johan Jeuring , period 2. January 17, Department of Information and Computing Sciences Utrecht University

Talen en Compilers. Johan Jeuring , period 2. January 17, Department of Information and Computing Sciences Utrecht University Talen en Compilers 2015-2016, period 2 Johan Jeuring Department of Information and Computing Sciences Utrecht University January 17, 2016 13. LR parsing 13-1 This lecture LR parsing Basic idea The LR(0)

More information

Optimizing Finite Automata

Optimizing Finite Automata Optimizing Finite Automata We can improve the DFA created by MakeDeterministic. Sometimes a DFA will have more states than necessary. For every DFA there is a unique smallest equivalent DFA (fewest states

More information

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011 MA53: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 8 Date: September 2, 20 xercise: Define a context-free grammar that represents (a simplification of) expressions

More information

UNIT-III BOTTOM-UP PARSING

UNIT-III BOTTOM-UP PARSING UNIT-III BOTTOM-UP PARSING Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing. A general type of bottom-up parser is a shift-reduce

More information

Formal Languages. Grammar. Ryan Stansifer. Department of Computer Sciences Florida Institute of Technology Melbourne, Florida USA 32901

Formal Languages. Grammar. Ryan Stansifer. Department of Computer Sciences Florida Institute of Technology Melbourne, Florida USA 32901 Formal Languages Grammar Ryan Stansifer Department of Computer Sciences Florida Institute of Technology Melbourne, Florida USA 32901 http://www.cs.fit.edu/~ryan/ March 15, 2018 A formal language is a set

More information

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1 Review of CFGs and Parsing II Bottom-up Parsers Lecture 5 1 Outline Parser Overview op-down Parsers (Covered largely through labs) Bottom-up Parsers 2 he Functionality of the Parser Input: sequence of

More information

ITEC2620 Introduction to Data Structures

ITEC2620 Introduction to Data Structures ITEC2620 Introduction to Data Structures Lecture 9b Grammars I Overview How can a computer do Natural Language Processing? Grammar checking? Artificial Intelligence Represent knowledge so that brute force

More information

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence. Bottom-up parsing Recall For a grammar G, with start symbol S, any string α such that S α is a sentential form If α V t, then α is a sentence in L(G) A left-sentential form is a sentential form that occurs

More information

CSCI312 Principles of Programming Languages

CSCI312 Principles of Programming Languages Copyright 2006 The McGraw-Hill Companies, Inc. CSCI312 Principles of Programming Languages! LL Parsing!! Xu Liu Derived from Keith Cooper s COMP 412 at Rice University Recap Copyright 2006 The McGraw-Hill

More information

SYNTAX ANALYSIS 1. Define parser. Hierarchical analysis is one in which the tokens are grouped hierarchically into nested collections with collective meaning. Also termed as Parsing. 2. Mention the basic

More information

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh. Bottom-Up Parsing II Different types of Shift-Reduce Conflicts) Lecture 10 Ganesh. Lecture 10) 1 Review: Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient Doesn

More information

Wednesday, August 31, Parsers

Wednesday, August 31, Parsers Parsers How do we combine tokens? Combine tokens ( words in a language) to form programs ( sentences in a language) Not all combinations of tokens are correct programs (not all sentences are grammatically

More information