Formal Languages and Compilers Lecture VII Part 4: Syntactic A

Similar documents
Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Compiler Construction: Parsing

S Y N T A X A N A L Y S I S LR

Context-free grammars

LALR Parsing. What Yacc and most compilers employ.

Principles of Programming Languages

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

LR Parsing Techniques

Lexical and Syntax Analysis. Bottom-Up Parsing

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Conflicts in LR Parsing and More LR Parsing Types

LR Parsing LALR Parser Generators

Bottom-Up Parsing. Lecture 11-12

LR Parsing LALR Parser Generators

In One Slide. Outline. LR Parsing. Table Construction

Compiler Construction 2016/2017 Syntax Analysis

LR Parsers. Aditi Raste, CCOEW

Compilers. Bottom-up Parsing. (original slides by Sam

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

LR Parsing Techniques

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

MODULE 14 SLR PARSER LR(0) ITEMS

UNIT-III BOTTOM-UP PARSING

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Syntax Analyzer --- Parser

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Monday, September 13, Parsers



Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Wednesday, August 31, Parsers

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser

Lecture 8: Deterministic Bottom-Up Parsing

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Lecture Bottom-Up Parsing

Lecture 7: Deterministic Bottom-Up Parsing

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Bottom-Up Parsing. Lecture 11-12

UNIT III & IV. Bottom up parsing

Concepts Introduced in Chapter 4

SLR parsers. LR(0) items

LR Parsing. Table Construction

LR(0) Parsing Summary. LR(0) Parsing Table. LR(0) Limitations. A Non-LR(0) Grammar. LR(0) Parsing Table CS412/CS413

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Downloaded from Page 1. LR Parsing

How do LL(1) Parsers Build Syntax Trees?

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

CS 314 Principles of Programming Languages

Acknowledgements. The slides for this lecture are a modified versions of the offering by Prof. Sanjeev K Aggarwal

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Bottom-Up Parsing LR Parsing

CS415 Compilers. LR Parsing & Error Recovery

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

Review: Shift-Reduce Parsing. Bottom-up parsing uses two actions: Bottom-Up Parsing II. Shift ABC xyz ABCx yz. Lecture 8. Reduce Cbxy ijk CbA ijk

Bottom up parsing. General idea LR(0) SLR LR(1) LALR To best exploit JavaCUP, should understand the theoretical basis (LR parsing);

Bottom-Up Parsing II. Lecture 8

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Parsing. Rupesh Nasre. CS3300 Compiler Design IIT Madras July 2018

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

shift-reduce parsing

CS308 Compiler Principles Syntax Analyzer Li Jiang

LR Parsing - Conflicts

CSE302: Compiler Design

LR Parsing - The Items

More Bottom-Up Parsing

VIVA QUESTIONS WITH ANSWERS

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

Example CFG. Lectures 16 & 17 Bottom-Up Parsing. LL(1) Predictor Table Review. Stacks in LR Parsing 1. Sʹ " S. 2. S " AyB. 3. A " ab. 4.

Configuration Sets for CSX- Lite. Parser Action Table

4. Lexical and Syntax Analysis

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

LALR stands for look ahead left right. It is a technique for deciding when reductions have to be made in shift/reduce parsing. Often, it can make the

4. Lexical and Syntax Analysis

COMPILER (CSE 4120) (Lecture 6: Parsing 4 Bottom-up Parsing )

Lecture Notes on Bottom-Up LR Parsing

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Compilation 2013 Parser Generators, Conflict Management, and ML-Yacc

Building Compilers with Phoenix

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Lecture Notes on Bottom-Up LR Parsing

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

Syn S t yn a t x a Ana x lysi y s si 1

CS143 Handout 14 Summer 2011 July 6 th, LALR Parsing

Compiler Construction

Transcription:

Formal Languages and Compilers Lecture VII Part 4: Syntactic Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal Languages and Compilers BSc course 2017/18 First Semester

Summary of Lecture IV Part 4 Inadequacy of SLR Parsing LR(1) Parsing Closure and Goto Operations, Canonical Collection LR(1) Parsing Tables LALR(1) Parsing Conflicts in LALR Parsers LALR(1) Parsing Tables LALR Vs. LR Parsers Dealing with Ambiguous Grammars

When SLR is not Adequate Example. Consider the following Grammar and the Canonical Collection: Augmented Grammar S S S L = R S R L R L id R L I 0 : S. S S. L = R S. R L. R L. id R. L I 1 : S S. I 2 : S L. = R R L. I 3 : S R. I 4 : L. R R. L L. R L. id I 5 : L id. I 6 : S L =. R R. L L. R L. id I 7 : L R. I 8 : R L. I 9 : S L = R.

When SLR is not Adequate (Cont.) Consider now the set of items in I 2 : 1 If we consider the first item, then, action[2, =] = shift 6. 2 If we consider the second item, since = Follow(R), then, action[2, =] = reduce R L. Shift/Reduce Conflict. There is both a shift and a reduce action thus the entry for action[2, =] is multiply defined! The SLR parsing technique is not powerful enough!

Why SLR fails The problem can be modeled as follow: 1 The stack contains the string βα, with α on top; 2 The item set I i contains the item [A α. ]; 3 Lookahead symbol is a, and a Follow(A). Problem. The state s i is on top of the stack BUT reducing with A α is such that βa cannot be followed by a in a right-sentential form. Remark. In general, Follow(A) is not limited to right-most derivations.

Why SLR fails: An Example S I 0 I 1 L R * id I 2 I 3 * I 4 = * R L I 6 R I 9 L I 7 I 8 If we are in state I 2 with lookahead = then the stack contains 0L2 with L on top. If we reduce with R L then the stack is 0R3. In state I 3 we can only reduce with S R and lookahead $. Since the current lookahead is = then the parsing fails: action[3, =] = error, and the parser must Backtrack! id I 5 id

Summary Inadequacy of SLR Parsing LR(1) Parsing Closure and Goto Operations, Canonical Collection LR(1) Parsing Tables LALR(1) Parsing Conflicts in LALR Parsers LALR(1) Parsing Tables LALR Vs. LR Parsers Dealing with Ambiguous Grammars

Towards LR Parsing: A Solution to SLR failure Add more information in the state to rule out invalid reductions. Each state of an LR(1) Parser indicates what symbol can follow a handle for which there is a possible reduction. Definition. An LR(1) Item is then a pair [A α. β, a], where a V T is called the Lookahead of the Item. Furthermore, a Follow(A).

LR(1) Items [A α. β, a] describes a context of the parser: We are trying to find an A followed by an a, and We have α already on top of the stack Thus we need to see next something derivable from βa along right-most derivations. The lookahead has no effect in items of the form [A α. β, a], with β ϵ, BUT On items [A α., a], the parser reduces with A α only if the next input symbol is a.

Summary Inadequacy of SLR Parsing LR(1) Parsing Closure and Goto Operations, Canonical Collection LR(1) Parsing Tables LALR(1) Parsing Conflicts in LALR Parsers LALR(1) Parsing Tables LALR Vs. LR Parsers Dealing with Ambiguous Grammars

The Closure Operation To construct the Collection of sets of Items we proceed as in the construction of the LR(0) Canonical Collection but with new closure and goto operations. Algorithm. Closure(I). Let I be a set of LR(1) items for an augmented Grammar G, then closure(i ) is a set of items such that: 1 Initially every item in I is added to closure(i); 2 If [A α. Bβ, a] closure(i) and B γ, then we add the item [B. γ, b] to closure(i), where b First(βa). Go to step 1 until no more items can be added to closure(i). Intuition. Considering the addition of the productions involving B the parser must take into account the context of B.

The Goto Operation and LR(1) Item Collection Definition. (Goto) If I is a set of items and X is a grammar symbol, then, goto(i, X ) is the closure of the set of all items [A αx. β, a] such that [A α. X β, a] is in I. LR(1) Item Collection. We proceed as in the construction of the LR(0) Canonical Collection but with new closure and goto operations. The initial condition is: C = {closure({[s. S, $]})}.

LR(1) Item Collection: An Example Example. Build the LR(1) Item Collection for the following augmented Grammar: S S S CC C cc d

Summary Inadequacy of SLR Parsing LR(1) Parsing Closure and Goto Operations, Canonical Collection LR(1) Parsing Tables LALR(1) Parsing Conflicts in LALR Parsers LALR(1) Parsing Tables LALR Vs. LR Parsers Dealing with Ambiguous Grammars

LR(1) Parsing Table Algorithm. LR(1) Parsing Tables Action and Goto. 1 Construct C = {I 0, I 1,..., I n }, the LR(1) item collection for the augmented grammar G. 2 To each item set I i we create a new state s i. Then the action table is: action[s i, a] = shift s j ", if [A α. aβ, b] I i, and goto(i i, a) = I j. action[s i, a] = reduce A α", for all [A α., a] I i. Here A S. action[s i, $] = accept", if [S S., $] I i. 3 goto[s i, A] = s j, if goto(i i, A) = I j. 4 All the entries not defined by rules (2) and (3) are made error. 5 The initial state is the one constructed from the closure of [S. S, $].

Canonical LR(1) Parsers The parsing table produced by the above algorithm is called the Canonical LR(1) Parsing Table; A parser using this table is called a Canonical LR(1) Parser; If the table has no multiply-defined entries then the Grammar is called an LR(1) Grammar.

Summary Inadequacy of SLR Parsing LR(1) Parsing Closure and Goto Operations, Canonical Collection LR(1) Parsing Tables LALR(1) Parsing Conflicts in LALR Parsers LALR(1) Parsing Tables LALR Vs. LR Parsers Dealing with Ambiguous Grammars

LALR Parsers LALR stands for LookAhead-LR Parser. It is often used in practice since the parsing tables are considerably smaller than the canonical LR tables. Main Idea: Merge the states (Item sets) whose items differ only in the lookahead. We say that such states have the same core. (SLR and LALR) Vs. LR. The comparison is in term of size. For a language like Pascal we go from hundreds of states to thousand of states.

LALR Parsers: An Example Example. The Grammar: S S S CC C cc d Has items I 3 I 6, I 4 I 7, and I 8 I 9 with the same core. By merging items with the same core we obtain: I 36 : C c. C, c/d/$ C. cc, c/d/$ C. d, c/d/$ I 47 : C d., c/d/$ I 89 : C cc., c/d/$ Consider the merged state I 47. The LALR parser will reduce d to C on any input, even in situations where the original LR parser would generate an error! In any case, the LALR parser will communicate the same error but at a later stage.

Summary Inadequacy of SLR Parsing LR(1) Parsing Closure and Goto Operations, Canonical Collection LR(1) Parsing Tables LALR(1) Parsing Conflicts in LALR Parsers LALR(1) Parsing Tables LALR Vs. LR Parsers Dealing with Ambiguous Grammars

Introducing Conflicts by Merging States Theorem. Starting from an LR(1) Grammar and merging states with the same core we can only introduce reduce/reduce conflicts. We first show that no shift/reduce conflicts are present. Suppose there is a merged state with a shift/reduce conflict on input a: I LALR : A α., a/... B β. aγ, b/...... Then, there is an LR(1) state such that: I LR : A α., a B β. aγ, c (for some c)... And I LR presents a shift/reduce conflict on input a. Absurd!

Introducing Conflict by Merging States (Cont.) We now show how it is possible to introduce reduce/reduce conflicts. Consider the following LR(1) states: I i : A c., d B c., e I j : A c., e B c., d Neither of these sets generates a conflict, however, their merge: I ij : A c., d/e B c., d/e generates a reduce/reduce conflict on both input d and e. Note. In case of a reduce/reduce conflict the Grammar is no more LALR. Thus, LALR is a simpler Grammar then an LR Grammar.

Summary Inadequacy of SLR Parsing LR(1) Parsing Closure and Goto Operations, Canonical Collection LR(1) Parsing Tables LALR(1) Parsing Conflicts in LALR Parsers LALR(1) Parsing Tables LALR Vs. LR Parsers Dealing with Ambiguous Grammars

LALR Parsing Table The algorithm we present serves primarily for defining LALR Grammars there are more efficient techniques. Algorithm. LALR(1) Parsing Tables Action and Goto. 1 Construct C = {I 0, I 1,..., I n }, the LR(1) item collection for the augmented grammar G. 2 Find all LR(1) states having the same core and merge them. Let C = {J 0, J 1,..., J m } the resulting item collection. 3 To each item set J i we create a new state s i. Then the action table is constructed as in case of LR(1) parsers. 4 The goto function (and thus, the goto table) is constructed as follows. If J = I 1... I k, then the cores of goto(i 1, X ),...,goto(i k, X ) are all the same. Let K the union of all sets of items having the same core as goto(i 1, X ), i.e., K = i=1,...,k goto(i i, X ), then goto(j, X ) = K.

LALR Parsing Table (Cont.) If there is a reduce/reduce conflict in the above construction the Grammar is not LALR(1); The parsing table produced by the above algorithm is called the LALR(1) Parsing Table; A parser using this table is called an LALR(1) Parser.

LALR Parser: An Example Example. The Grammar: S S S CC C cc d Has the following LALR(1) Item Collection: C = {I 0, I 1, I 2, I 5, I 36, I 47, I 89 }, where: I 36 : C c. C, c/d/$ C. cc, c/d/$ C. d, c/d/$ I 47 : C d., c/d/$ I 89 : C cc., c/d/$ The LALR Parsing Table is: c d $ S C 0 s36 s47 1 2 1 acc 2 s36 s47 5 36 s36 s47 89 47 r3 r3 r3 5 r1 89 r2 r2 r2

Summary Inadequacy of SLR Parsing LR(1) Parsing Closure and Goto Operations, Canonical Collection LR(1) Parsing Tables LALR(1) Parsing Conflicts in LALR Parsers LALR(1) Parsing Tables LALR Vs. LR Parsers Dealing with Ambiguous Grammars

Behavior of LR Vs. LALR On a correct input both parsers make the same sequence of shift and reduce, only the state s names change. On erroneous input the LALR parser makes more moves (reductions) than the LR parser.

Behavior of LR Vs. LALR: An Example Example. Consider as input the string ccd. The LR parser will push onto the stack: 0c3c3d4 Ending with an error in state 4 and current input $. The LALR parser will push onto the stack: 0c36c36d47 State 47 with input $ reduces with C d and the stack is: 0c36c36C89 State 89 with input $ reduces with C cc and the stack is: 0c36C89 A similar reduction applies and the stack is: 0C2 Ending finally with an error in state 2 and current input $.

Summary of Lecture IV Part 4 Inadequacy of SLR Parsing LR(1) Parsing Closure and Goto Operations, Canonical Collection LR(1) Parsing Tables LALR(1) Parsing Conflicts in LALR Parsers LALR(1) Parsing Tables LALR Vs. LR Parsers Dealing with Ambiguous Grammars

Ambiguous Grammars Ambiguous Grammars provide a shorter and more natural specification. This is reflected by a parser requiring a minor number of states. Ambiguous Grammars are not LR: A parsing table will have multiply-defined entries; A parsing table will contain reduce/reduce and/or shift/reduce conflicts. To deal with ambiguous grammars we use disambiguating rules that eliminate all the conflicts allowing for only one Parse-Tree.

Precedence and Associativity Rules The following ambiguous Grammar for arithmetic expressions does not specify the associativity and the precedence of +, : E E + E E E (E) id The unambiguous Grammar: E E+T T T T F F F (E) id Enforces precedence of over +, and left-associativity of + and. Note. The parser for the unambiguous Grammar is less efficient: The time spent for reducing the unit productions E T and T F has the sole function to enforce precedence.

Precedence and Associativity Rules (Cont.) Let us consider the following LR(0) items for the ambiguous augmented Grammar (see the Book, Sect. 4.8, for the complete set of states): I 7 : E E + E. I 8 : E E E. E E. +E E E. +E E E. E E E. E Follow(E) = {$, +, } State I 7 generates a conflict between reduce with E E + E, and shift with input + and. State I 8 generates a conflict between reduce with E E E, and shift with input + and. Both Conflicts can be solved using the Precedence and Associativity for the operations + and.

Precedence and Associativity Rules (Cont.) Consider the input id + id id and the following configuration: Stack 0E1 + 4E7 Input id$ Assuming that takes precedence over +, the parser should shift onto the stack instead of reducing with E E + E. Consider the input id + id + id and the following configuration: Stack 0E1 + 4E7 Input +id$ Assuming that + is left-associative, the parser should reduce with E E + E instead of shift + onto the stack.

Precedence and Associativity Rules (Cont.) Summary. Precedence and Associativity uniquely determine the actions of the parser eliminating the ambiguity: 1 Left-Associativity of +: action(7, +) = reduce with E E + E 2 Left-Associativity of : action(8, ) = reduce with E E E 3 Precedence of over +: action(7, ) = shift 4 Precedence of over +: action(8, +) = reduce with E E E

Dangling-Else Ambiguity Consider the following Grammar for IF-THEN-ELSE statements: S S S ises is a Where, i stands for if expr then, e stands for else, and a stands for all other productions. Let us consider the following LR(0) item for this ambiguous Grammar (see the Book for the complete set of states): I 4 : S is. es S is. Follow(S) = {$, else} Assuming that the stack contains: if expr then stmt (is) with state 4 on top, and the next input is else there is a shift/reduce conflict. Solution. else must match the last if: action(4, else) = shift else. Since Yacc solves shift/reduce conflicts in favor of shift the dangling-else is handled correctly.

Summary of Lecture IV Part 4 Inadequacy of SLR Parsing LR(1) Parsing Closure and Goto Operations, Canonical Collection LR(1) Parsing Tables LALR(1) Parsing Conflicts in LALR Parsers LALR(1) Parsing Tables LALR Vs. LR Parsers Dealing with Ambiguous Grammars