CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

Similar documents
CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

UNIT III & IV. Bottom up parsing

LR Parsing Techniques

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

UNIT-III BOTTOM-UP PARSING

Downloaded from Page 1. LR Parsing

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

Lecture 7: Deterministic Bottom-Up Parsing

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

Chapter 4: LR Parsing

Compilers. Bottom-up Parsing. (original slides by Sam

Lecture 8: Deterministic Bottom-Up Parsing


Context-free grammars

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Compiler Construction: Parsing

Lexical and Syntax Analysis. Bottom-Up Parsing

4. Lexical and Syntax Analysis

Compiler Construction 2016/2017 Syntax Analysis

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Principles of Programming Languages

4. Lexical and Syntax Analysis

CSE 401 Midterm Exam Sample Solution 2/11/15

LR Parsers. Aditi Raste, CCOEW

SLR parsers. LR(0) items

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Table-Driven Parsing

shift-reduce parsing

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

S Y N T A X A N A L Y S I S LR

Chapter 4. Lexical and Syntax Analysis

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones


MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

CS 4120 Introduction to Compilers

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Bottom-up Parser. Jungsik Choi

Bottom-Up Parsing. Lecture 11-12

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

Lecture Bottom-Up Parsing

Bottom-Up Parsing. Lecture 11-12

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

MODULE 14 SLR PARSER LR(0) ITEMS

Bottom-Up Parsing II. Lecture 8

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

LR Parsing Techniques

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

LR Parsing E T + E T 1 T

Review: Shift-Reduce Parsing. Bottom-up parsing uses two actions: Bottom-Up Parsing II. Shift ABC xyz ABCx yz. Lecture 8. Reduce Cbxy ijk CbA ijk

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Monday, September 13, Parsers

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

Question Bank. 10CS63:Compiler Design

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Syntax-Directed Translation. Lecture 14

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

LALR Parsing. What Yacc and most compilers employ.

Introduction to Parsing. Comp 412

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

CS606- compiler instruction Solved MCQS From Midterm Papers

Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

Configuration Sets for CSX- Lite. Parser Action Table

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

CSCI312 Principles of Programming Languages

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

LR Parsing - The Items

Example CFG. Lectures 16 & 17 Bottom-Up Parsing. LL(1) Predictor Table Review. Stacks in LR Parsing 1. Sʹ " S. 2. S " AyB. 3. A " ab. 4.

Programming Language Syntax and Analysis

Revisit the example. Transformed DFA 10/1/16 A B C D E. Start

Wednesday, August 31, Parsers

The procedure attempts to "match" the right hand side of some production for a nonterminal.

Concepts Introduced in Chapter 4

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

Transcription:

CSE 401 Compilers LR Parsing Hal Perkins Autumn 2011 10/10/2011 2002-11 Hal Perkins & UW CSE D-1

Agenda LR Parsing Table-driven Parsers Parser States Shift-Reduce and Reduce-Reduce conflicts 10/10/2011 2002-11 Hal Perkins & UW CSE D-2

LR(1) Parsing We ll look at LR(1) parsers Left to right scan, Rightmost derivation, 1 symbol lookahead Almost all practical programming languages have an LR(1) grammar LALR(1), SLR(1), etc. subsets of LR(1) LALR(1) can parse most real languages, tables are more compact, and is used by YACC/Bison/ CUP/etc. 10/10/2011 2002-11 Hal Perkins & UW CSE D-3

Bottom-Up Parsing Idea: Read the input left to right Whenever we ve matched the right hand side of a production, reduce it to the appropriate non-terminal and add that non-terminal to the parse tree The upper edge of this partial parse tree is known as the frontier 10/10/2011 2002-11 Hal Perkins & UW CSE D-4

Example Grammar Bottom-up Parse S ::= aab e A ::= Abc b B ::= d a b b c d e 10/10/2011 2002-11 Hal Perkins & UW CSE D-5

Details The bottom-up parser reconstructs a reverse rightmost derivation Given the rightmost derivation S => 1 => 2 => => n-2 => n-1 => n = w the parser will first discover n-1 => n, then n-2 => n-1, etc. Parsing terminates when 1 reduced to S (start symbol, success), or No match can be found (syntax error) 10/10/2011 2002-11 Hal Perkins & UW CSE D-6

How Do We Parse with This? Key: given what we ve already seen and the next input symbol, decide what to do. Choices: Perform a reduction Look ahead further Can reduce A=> if both of these hold: A=> is a valid production A=> is a step in this rightmost derivation This is known as a shift-reduce parser 10/10/2011 2002-11 Hal Perkins & UW CSE D-7

Sentential Forms If S =>*, the string is called a sentential form of the of the grammar In the derivation S => 1 => 2 => => n-2 => n-1 => n = w each of the i are sentential forms A sentential form in a rightmost derivation is called a right-sentential form (similarly for leftmost and left-sentential) 10/10/2011 2002-11 Hal Perkins & UW CSE D-8

Handles Informally, a substring of the tree frontier that matches the right side of a production Even if A::= is a production, is a handle only if it matches the frontier at a point where A::= was used in that derivation may appear in many other places in the frontier without being a handle for that particular production 10/10/2011 2002-11 Hal Perkins & UW CSE D-9

Handles (cont.) Formally, a handle of a right-sentential form is a production A ::= and a position in where may be replaced by A to produce the previous rightsentential form in the rightmost derivation of 10/10/2011 2002-11 Hal Perkins & UW CSE D-10

Handle Examples In the derivation S => aabe => aade => aabcde => abbcde abbcde is a right sentential form whose handle is A::=b at position 2 aabcde is a right sentential form whose handle is A::=Abc at position 4 Note: some books take the left of the match as the position 10/10/2011 2002-11 Hal Perkins & UW CSE D-11

Implementing Shift-Reduce Parsers Key Data structures A stack holding the frontier of the tree A string with the remaining input 10/10/2011 2002-11 Hal Perkins & UW CSE D-12

Shift-Reduce Parser Operations Reduce if the top of the stack is the right side of a handle A::=, pop the right side and push the left side A Shift push the next input symbol onto the stack Accept announce success Error syntax error discovered 10/10/2011 2002-11 Hal Perkins & UW CSE D-13

Shift-Reduce Example S ::= aabe A ::= Abc b B ::= d Stack Input Action $ abbcde$ shift 10/10/2011 2002-11 Hal Perkins & UW CSE D-14

How Do We Automate This? Def. Viable prefix a prefix of a rightsentential form that can appear on the stack of the shift-reduce parser Equivalent: a prefix of a right-sentential form that does not continue past the rightmost handle of that sentential form Idea: Construct a DFA to recognize viable prefixes given the stack and remaining input Perform reductions when we recognize them 10/10/2011 2002-11 Hal Perkins & UW CSE D-15

DFA for prefixes of S ::= aabe A ::= Abc b B ::= d e accept 8 9 $ B start 1 a 2 A 3 b 6 c 7 S ::= aabe A ::= Abc b d 4 5 A ::= b B ::= d 10/10/2011 2002-11 Hal Perkins & UW CSE D-16

Trace S ::= aabe A ::= Abc b B ::= d Stack Input $ abbcde$ accept e 8 9 B $ start a A b c 1 2 3 6 7 S ::= aabe A ::= Abc b d 4 5 A ::= b B ::= d 10/10/2011 2002-11 Hal Perkins & UW CSE D-17

Observations Way too much backtracking We want the parser to run in time proportional to the length of the input Where the heck did this DFA come from anyway? From the underlying grammar We ll defer construction details for now 10/10/2011 2002-11 Hal Perkins & UW CSE D-18

Avoiding DFA Rescanning Observation: after a reduction, the contents of the stack are the same as before except for the new non-terminal on top Scanning the stack will take us through the same transitions as before until the last one If we record state numbers on the stack, we can go directly to the appropriate state when we pop the right hand side of a production from the stack 10/10/2011 2002-11 Hal Perkins & UW CSE D-19

Stack Change the stack to contain pairs of states and symbols from the grammar $s 0 X 1 s 1 X 2 s 2 X n s n State s 0 represents the accept state (Not always added depends on particular presentation) Observation: in an actual parser, only the state numbers need to be pushed, since they implicitly contain the symbol information, but for explanations it s clearer to use both. 10/10/2011 2002-11 Hal Perkins & UW CSE D-20

Encoding the DFA in a Table A shift-reduce parser s DFA can be encoded in two tables One row for each state action table encodes what to do given the current state and the next input symbol goto table encodes the transitions to take after a reduction 10/10/2011 2002-11 Hal Perkins & UW CSE D-21

Actions (1) Given the current state and input symbol, the main possible actions are si shift the input symbol and state i onto the stack (i.e., shift and move to state i ) rj reduce using grammar production j The production number tells us how many <symbol, state> pairs to pop off the stack 10/10/2011 2002-11 Hal Perkins & UW CSE D-22

Actions (2) Other possible action table entries accept blank no transition syntax error A LR parser will detect an error as soon as possible on a left-to-right scan A real compiler needs to produce an error message, recover, and continue parsing when this happens 10/10/2011 2002-11 Hal Perkins & UW CSE D-23

Goto When a reduction is performed, <symbol, state> pairs are popped from the stack revealing a state uncovered_s on the top of the stack goto[uncovered_s, A] is the new state to push on the stack when reducing production A ::= (after popping and revealing state uncovered_s on top) 10/10/2011 2002-11 Hal Perkins & UW CSE D-24

Reminder: DFA for S ::= aabe A ::= Abc b B ::= d accept 8 e 9 $ B start 1 a 2 A 3 b 6 c 7 S ::= aabe A ::= Abc b d 4 5 A ::= b B ::= d 10/10/2011 2002-11 Hal Perkins & UW CSE D-25

LR Parse Table for 1. S ::= aabe 2. A ::= Abc 3. A ::= b 4. B ::= d State action goto a b c d e $ A B S 1 s2 acc g1 2 s4 g3 3 s6 s5 g8 4 r3 r3 r3 r3 r3 r3 5 r4 r4 r4 r4 r4 r4 6 s7 7 r2 r2 r2 r2 r2 r2 8 s9 9 r1 r1 r1 r1 r1 r1 10/10/2011 2002-11 Hal Perkins & UW CSE D-26

LR Parsing Algorithm (1) word = scanner.gettoken(); while (true) { s = top of stack; if (action[s, word] = si ) { push word; push i (state); word = scanner.gettoken(); } else if (action[s, word] = rj ) { pop 2 * length of right side of production j (2* ); uncovered_s = top of stack; push left side A of production j ; push state goto[uncovered_s, A]; } } else if (action[s, word] = accept ) { return; } else { // no entry in action table report syntax error; halt or attempt recovery; } 10/10/2011 2002-11 Hal Perkins & UW CSE D-27

Example 1. S ::= aabe 2. A ::= Abc 3. A ::= b 4. B ::= d Stack Input $ abbcde$ action goto S a b c d e $ A B S 1 s2 ac g1 2 s4 g3 3 s6 s5 g8 4 r3 r3 r3 r3 r3 r3 5 r4 r4 r4 r4 r4 r4 6 s7 7 r2 r2 r2 r2 r2 r2 8 s9 9 r1 r1 r1 r1 r1 r1 10/10/2011 2002-11 Hal Perkins & UW CSE D-28

LR States Idea is that each state encodes The set of all possible productions that we could be looking at, given the current state of the parse, and Where we are in the right hand side of each of those productions 10/10/2011 2002-11 Hal Perkins & UW CSE D-29

Items An item is a production with a dot in the right hand side Example: Items for production A ::= XY A ::=.XY A ::= X.Y A ::= XY. Idea: The dot represents a position in the production 10/10/2011 2002-11 Hal Perkins & UW CSE D-30

DFA for S ::= aabe A ::= Abc b B ::= d 1 2 4 S ::=.aabe a S ::= a.abe A ::=.Abc A ::=.b b A ::= b. $ A 3 5 accept S ::= aa.be A ::= A.bc B ::=.d d B ::= d. 8 9 B 6 b 7 S ::= aab.e e S ::= aabe. A ::= Ab.c c A ::= Abc. 10/10/2011 2002-11 Hal Perkins & UW CSE D-31

Problems with Grammars Grammars can cause problems when constructing a LR parser Shift-reduce conflicts Reduce-reduce conflicts 10/10/2011 2002-11 Hal Perkins & UW CSE D-32

Shift-Reduce Conflicts Situation: both a shift and a reduce are possible at a given point in the parse (equivalently: in a particular state of the DFA) Classic example: if-else statement S ::= ifthen S ifthen S else S 10/10/2011 2002-11 Hal Perkins & UW CSE D-33

Parser States for 1. S ::= ifthen S 2. S ::= ifthen S else S 1 2 3 4 S ::=. ifthen S S ::=. ifthen S else S ifthen S ::= ifthen. S S ::= ifthen. S else S S S ::= ifthen S. S ::= ifthen S. else S else S ::= ifthen S else. S State 3 has a shiftreduce conflict Can shift past else into state 4 (s4) Can reduce (r1) S ::= ifthen S (Note: other S ::=. ifthen items not included in states 2-4 to save space) 10/10/2011 2002-11 Hal Perkins & UW CSE D-34

Solving Shift-Reduce Conflicts Fix the grammar Done in Java reference grammar, others Use a parse tool with a longest match rule i.e., if there is a conflict, choose to shift instead of reduce Does exactly what we want for if-else case Guideline: a few shift-reduce conflicts are fine, but be sure they do what you want 10/10/2011 2002-11 Hal Perkins & UW CSE D-35

Reduce-Reduce Conflicts Situation: two different reductions are possible in a given state Contrived example S ::= A S ::= B A ::= x B ::= x 10/10/2011 2002-11 Hal Perkins & UW CSE D-36

Parser States for 1. S ::= A 2. S ::= B 3. A ::= x 4. B ::= x 1 2 S ::=.A S ::=.B A ::=.x B ::=.x x A ::= x. B ::= x. State 2 has a reduce-reduce conflict (r3, r4) 10/10/2011 2002-11 Hal Perkins & UW CSE D-37

Handling Reduce-Reduce Conflicts These normally indicate a serious problem with the grammar. Fixes Use a different kind of parser generator that takes lookahead information into account when constructing the states Most practical tools use this information Fix the grammar 10/10/2011 2002-11 Hal Perkins & UW CSE D-38

Another Reduce-Reduce Conflict Suppose the grammar separates arithmetic and boolean expressions expr ::= aexp bexp aexp ::= aexp * aident aident bexp ::= bexp && bident bident aident ::= id bident ::= id This will create a reduce-reduce conflict 10/10/2011 2002-11 Hal Perkins & UW CSE D-39

Covering Grammars A solution is to merge aident and bident into a single non-terminal (or use id in place of aident and bident everywhere they appear) This is a covering grammar Includes some programs that are not generated by the original grammar Use the type checker or other static semantic analysis to weed out illegal programs later 10/10/2011 2002-11 Hal Perkins & UW CSE D-40

Coming Attractions Constructing LR tables We ll present a simple version (SLR(0)) in lecture, then talk about extending it to LR(1) LL parsers and recursive descent Continue reading ch. 3 10/10/2011 2002-11 Hal Perkins & UW CSE D-41