Syntax Analysis, VI: Examples from LR Parsing. Comp 412


Syntax Analysis, VI: Examples from LR Parsing
COMP 412, Fall 2018 — Chapter 3 in EaC2e

Midterm Exam: Thursday, October 18, 7 PM, Herzstein Amphitheater

[Figure: compiler pipeline — source code → Front End → IR → Optimizer → IR → Back End → target code]

Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Roadmap

Last class
- Bottom-up parsers build reverse rightmost derivations
- The mystical concept of a handle
  - Easy to understand if we are given an oracle
  - Opaque (at this point) unless we are given an oracle
- Saw a bottom-up, shift-reduce parser at work on x - 2 * y

This class
- Structure & operation of an LR(1) parser
  - Both a skeleton parser & the LR(1) tables
- Example from the parentheses language
  - Look at how the LR(1) parser uses lookahead to determine shift vs. reduce
- Lay the groundwork for the table-construction lecture
  - LR(1) items, Closure(), and Goto()

LR(1) Parsers

This week will focus on LR(1) parsers.

LR(1) parsers are table-driven, shift-reduce parsers that use a limited right context (1 word) for handle recognition. The class of grammars that these parsers recognize is called the set of LR(1) grammars.

Informal definition: A grammar is LR(1) if, given a rightmost derivation

    S ⇒ γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn-1 ⇒ γn ⇒ sentence

we can
1. isolate the handle of each right-sentential form γi, and
2. determine the production by which to reduce,
by scanning γi from left to right, going at most 1 word beyond the right end of the handle of γi.

I always find this definition to be unsatisfying because it isn't an operational definition. LR(1) implies a left-to-right scan of the input, a rightmost derivation (in reverse), and 1 word of lookahead.

Bottom-up Parser

Our conceptual shift-reduce parser from last lecture:

    push INVALID
    word ← NextWord()
    repeat until (top of stack = Goal and word = EOF)
        if the top of the stack forms a handle A → β then   // reduce β to A
            pop |β| symbols off the stack
            push A onto the stack
        else if (word ≠ EOF) then                            // shift
            push word
            word ← NextWord()
        else                                                 // error: out of input
            report an error
    report success                                           // accept

Shift-reduce parsers have four kinds of actions:
- Shift: move the next word from the input to the stack
- Reduce: handle is at TOS; pop the RHS of the handle, push the LHS of the handle
- Accept: stop & report success
- Error: report an error

Shift & Accept are O(1). Reduce is O(|RHS|). (The |RHS| symbols correspond to parse-tree nodes.)

Key insight: the parser shifts until a handle appears at TOS.

Table-Driven LR Parsers

A table-driven LR(1) parser is a bottom-up shift-reduce parser.

[Figure: the scanner feeds <word, category> pairs to a skeleton LR parser, which consults the Action & Goto tables and emits IR; an LR(1) parser generator builds those tables from the specification (a CFG). Knowledge is encoded in the tables to drive the skeleton.]

Grammatical knowledge is encoded in two tables: Action & Goto
- They encode the handle-finding automaton
- They are constructed by an LR(1) parser generator

Why two tables? Reduce needs more information, and more complex information, than shift. Goto holds that extra information.

The LR(1) Skeleton Parser   (On Handout)

    stack.push( INVALID );
    stack.push( s0 );                        // initial state
    word ← NextWord();
    loop forever {
        s ← stack.top();
        if ( ACTION[s,word] == "reduce A → β" ) then {
            stack.popnum( 2 * |β| );         // pop RHS off stack
            s ← stack.top();
            stack.push( A );                 // push LHS, A
            stack.push( GOTO[s,A] );         // push next state
        }
        else if ( ACTION[s,word] == "shift si" ) then {
            stack.push( word );
            stack.push( si );
            word ← NextWord();
        }
        else if ( ACTION[s,word] == "accept" and word == EOF ) then
            break;
        else
            throw a syntax error;
    }
    report success;

The skeleton LR(1) parser
- follows the basic shift-reduce scheme from the last slide
- relies on a stack & a scanner
- stacks <symbol, state> pairs
- encodes the handle finder in two tables: ACTION & GOTO
- shifts |words| times
- reduces |derivation| times
- accepts at most once
- detects errors by failure of the handle finder, not by exhausting the input

Given tables, we have a parser.
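
The pseudocode above is close to executable. Below is a minimal sketch of the same driver in Python (not part of the original slides); it assumes the ACTION and GOTO tables are plain dictionaries, with each reduce entry carrying the production's LHS and the length of its RHS so the driver can pop 2 * |β| stack entries, exactly as the skeleton does.

    def parse(tokens, ACTION, GOTO, start_state=0):
        """Skeleton LR(1) driver (a sketch, not the course's implementation).
        ACTION maps (state, terminal) to ('shift', s), ('reduce', lhs, rhs_len),
        or ('accept',); GOTO maps (state, nonterminal) to a state."""
        words = iter(tokens)                  # tokens must end with 'EOF'
        stack = ['$', start_state]            # alternating <symbol, state> entries
        word = next(words)
        while True:
            s = stack[-1]
            act = ACTION.get((s, word))
            if act is None:                   # blank table entry
                raise SyntaxError("no action in state %s on %r" % (s, word))
            if act[0] == 'shift':
                stack += [word, act[1]]
                word = next(words)
            elif act[0] == 'reduce':
                _, lhs, rhs_len = act
                del stack[len(stack) - 2 * rhs_len:]    # pop RHS symbols & states
                stack += [lhs, GOTO[(stack[-1], lhs)]]  # push LHS & next state
            else:                             # ('accept',)
                return True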

The Parentheses Language

Language of balanced parentheses: any sentence made up of properly nested, matched ( 's and ) 's
- Beyond the power of regular expressions
- Classic justification for context-free grammars

    1   Goal → List
    2   List → List Pair
    3   List → Pair
    4   Pair → ( List )
    5   Pair → ( )

A good example to elucidate the role of context in LR(1) parsing.
This grammar and its tables differ, slightly, from the similar-looking one in EaC2e.

LR(1) Tables for the Parentheses Grammar   (On Handout)

ACTION table

    State   (      )      EOF
    s0      s3
    s1      s3            acc
    s2      r3            r3
    s3      s7     s8
    s4      r2            r2
    s5      s7     s10
    s6      r3     r3
    s7      s7     s12
    s8      r5            r5
    s9      r2     r2
    s10     r4            r4
    s11     s7     s13
    s12     r5     r5
    s13     r4     r4

GOTO table

    State   List   Pair
    s0      1      2
    s1             4
    s2
    s3      5      6
    s4
    s5             9
    s6
    s7      11     6
    s8
    s9
    s10
    s11            9
    s12
    s13

    1   Goal → List
    2   List → List Pair
    3   List → Pair
    4   Pair → ( List )
    5   Pair → ( )

- s23 means shift & go to state 23
- r18 means reduce by production 18 (& find the next state in the GOTO table)
- A blank is an error entry
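
As a concrete check (an addition, not from the handout), here are the same tables encoded for the parse() sketch given after the skeleton parser. Reduce entries name the production's LHS and its RHS length; the production numbers appear in comments.

    ACTION = {
        (0, '('): ('shift', 3),
        (1, '('): ('shift', 3),   (1, 'EOF'): ('accept',),
        (2, '('): ('reduce', 'List', 1),  (2, 'EOF'): ('reduce', 'List', 1),   # r3
        (3, '('): ('shift', 7),   (3, ')'): ('shift', 8),
        (4, '('): ('reduce', 'List', 2),  (4, 'EOF'): ('reduce', 'List', 2),   # r2
        (5, '('): ('shift', 7),   (5, ')'): ('shift', 10),
        (6, '('): ('reduce', 'List', 1),  (6, ')'): ('reduce', 'List', 1),     # r3
        (7, '('): ('shift', 7),   (7, ')'): ('shift', 12),
        (8, '('): ('reduce', 'Pair', 2),  (8, 'EOF'): ('reduce', 'Pair', 2),   # r5
        (9, '('): ('reduce', 'List', 2),  (9, ')'): ('reduce', 'List', 2),     # r2
        (10, '('): ('reduce', 'Pair', 3), (10, 'EOF'): ('reduce', 'Pair', 3),  # r4
        (11, '('): ('shift', 7),  (11, ')'): ('shift', 13),
        (12, '('): ('reduce', 'Pair', 2), (12, ')'): ('reduce', 'Pair', 2),    # r5
        (13, '('): ('reduce', 'Pair', 3), (13, ')'): ('reduce', 'Pair', 3),    # r4
    }
    GOTO = {
        (0, 'List'): 1, (0, 'Pair'): 2, (1, 'Pair'): 4,
        (3, 'List'): 5, (3, 'Pair'): 6, (5, 'Pair'): 9,
        (7, 'List'): 11, (7, 'Pair'): 6, (11, 'Pair'): 9,
    }

    print(parse(list('()') + ['EOF'], ACTION, GOTO))      # True
    print(parse(list('(())()') + ['EOF'], ACTION, GOTO))  # True

The two calls follow exactly the traces shown on the next slides.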

Parsing ( ) — The Parentheses Language

    State   Lookahead   Stack            Handle          Action
    —       (           $ 0              none            —
    0       (           $ 0              none            shift 3
    3       )           $ 0 ( 3          none            shift 8
    8       EOF         $ 0 ( 3 ) 8      Pair → ( )      reduce 5
    2       EOF         $ 0 Pair 2       List → Pair     reduce 3
    1       EOF         $ 0 List 1       Goal → List     accept

The Lookahead column shows the contents of word in the algorithm.

    1   Goal → List
    2   List → List Pair
    3   List → Pair
    4   Pair → ( List )
    5   Pair → ( )

Parsing ( ( ) ) ( ) — The Parentheses Language

    State   Lookahead   Stack                   Handle             Action
    —       (           $ 0                     none               —
    0       (           $ 0                     none               shift 3
    3       (           $ 0 ( 3                 none               shift 7
    7       )           $ 0 ( 3 ( 7             none               shift 12
    12      )           $ 0 ( 3 ( 7 ) 12        Pair → ( )         reduce 5
    6       )           $ 0 ( 3 Pair 6          List → Pair        reduce 3
    5       )           $ 0 ( 3 List 5          none               shift 10
    10      (           $ 0 ( 3 List 5 ) 10     Pair → ( List )    reduce 4
    2       (           $ 0 Pair 2              List → Pair        reduce 3
    1       (           $ 0 List 1              none               shift 3
    3       )           $ 0 List 1 ( 3          none               shift 8
    8       EOF         $ 0 List 1 ( 3 ) 8      Pair → ( )         reduce 5
    4       EOF         $ 0 List 1 Pair 4       List → List Pair   reduce 2
    1       EOF         $ 0 List 1              Goal → List        accept

    1   Goal → List
    2   List → List Pair
    3   List → Pair
    4   Pair → ( List )
    5   Pair → ( )

Parsing ( ( ) ) ( ) — The Parentheses Language   (same trace as the previous slide)

Let's look at how the parser reduces ( ). We have seen 3 examples.

Parsing ( ) — The Parentheses Language   (same trace as before)

In the string ( ), reducing by production 5 reveals state s0.
Goto(s0, Pair) is s2, which leads to the chain of reductions by productions 3 & 1.

Parsing ( ( ) ) ( ) — The Parentheses Language   (same trace as before)

Here, reducing by 5 reveals state s3, which represents the left context of an unmatched (. There will be one s3 per unmatched ( — they count the remaining ( 's that must be matched.

Goto(s3, Pair) is s6, a state in which the parser expects a ). That state leads to reductions by 3 and then 4.

Parsing ( ( ) ) ( ) — The Parentheses Language   (same trace as before)

Here, reducing by 5 reveals state s1, which represents the left context of a previously recognized List.

Goto(s1, Pair) is s4, a state in which the parser will reduce List Pair to List (production 2) on a lookahead of either ( or EOF. Here, the lookahead is EOF, which leads to reduction by 2, then by 1.

LR(1) Parsers

Recap: How does an LR(1) parser work?
- Unambiguous grammar ⇒ unique rightmost derivation
- Keep the upper fringe on a stack
  - All active handles include the top of stack (TOS)
  - Shift inputs until TOS is the right end of a handle
- The language of handles is regular (finite)
  - Build a handle-recognizing DFA to control the stack-based recognizer
  - The ACTION & GOTO tables encode the DFA

LR(1) Parsers

[Figure: the control DFA for the parentheses language — states S0 through S13, with transitions on (, ), List, and Pair.]

    1   Goal → List
    2   List → List Pair
    3   List → Pair
    4   Pair → ( List )
    5   Pair → ( )

The control DFA for the parentheses language is embedded in the ACTION and GOTO tables:
- Transitions on terminals represent shift actions   [ACTION table]
- Transitions on nonterminals follow reduce actions  [GOTO table]

The table construction derives this DFA from the grammar.

This point is not obvious. To see it, compare the ACTION & GOTO tables for the parentheses language with the DFA.

LR(1) Parsers

Recap: How does an LR(1) parser work?
- Unambiguous grammar ⇒ unique rightmost derivation
- Keep the upper fringe on a stack
  - All active handles include the top of stack (TOS)
  - Shift inputs until TOS is the right end of a handle
- The language of handles is regular (finite)
  - Build a handle-recognizing DFA to control the stack-based recognizer
  - The ACTION & GOTO tables encode the DFA
- To match a subterm, the parser, effectively, invokes the DFA recursively
  - Leave the old DFA state on the stack and go on
- A final state in the DFA ⇒ a reduce action (& a return from the "recursion")
  - Pop the RHS off the stack to reveal the invoking state
    ("It would be legal to recognize an X, and we did.")
  - The new state is GOTO[revealed state, NT on LHS]
    (Take a DFA transition on the new NT — the LHS we just pushed)

Building LR(1) Tables

How do we generate the ACTION and GOTO tables?
- Use the grammar to build a model of the control DFA
- Encode actions & transitions in the ACTION & GOTO tables
- If the construction succeeds, the grammar is LR(1)
  - "Succeeds" means it defines each table entry uniquely — an operational definition

The big picture
- Model the state of the parser with LR(1) items
- Use two functions, goto(s, X) and closure(s)
  - goto() is analogous to move() in the subset construction (X is a grammar symbol, T or NT)
  - Given a partial state, closure() adds all the items implied by the partial state
- Build up the states and transition functions of the DFA
- Use this information to fill in the ACTION and GOTO tables
  (a fixed-point algorithm, similar to the subset construction)

LR(1) Table Construction

To understand the algorithms, we need to understand the data structure that they use: LR(1) items.

The LR(1) table construction algorithm models the set of possible states that the parser can enter
- Mildly reminiscent of the subset construction (NFA → DFA)
- The construction needs a representation for the parser's state, as a function of the context it has seen and might see

LR(1) Items
The LR(1) table construction algorithm represents each valid configuration of an LR(1) parser with an LR(1) item.
An LR(1) item is a pair [P, δ], where
- P is a production A → β with a • at some position in the RHS
- δ is a single-symbol lookahead (a word or EOF)

LR(1) Items

The intermediate representation of the LR(1) table construction algorithm.

An LR(1) item is a pair [P, δ], where
- P is a production A → β with a • at some position in the RHS
- δ is a single-symbol lookahead (a word or EOF)

The • in an item indicates the position of the top of the stack.

[A → •βγ, a] means that the input seen so far is consistent with the use of A → βγ immediately after the symbol on top of the stack. We call an item like this a possibility.

[A → β•γ, a] means that the input seen so far is consistent with the use of A → βγ at this point in the parse, and that the parser has already recognized β (that is, β is on top of the stack). We call an item like this a partially complete item.

[A → βγ•, a] means that the parser has seen βγ, and that a lookahead symbol of a is consistent with reducing to A. This item is complete.

LR(k) parsers rely on items with a lookahead of k symbols. That leads to LR(k) items, with correspondingly longer δ.

[Might STOP]
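
One concrete way to represent items (an assumption made for the sketches added to these notes, not the course's data structure) is a tuple that records the production, the position of the •, and the lookahead:

    # An LR(1) item as (lhs, rhs, dot, lookahead): rhs is a tuple of grammar
    # symbols and dot is the index of the . within that tuple.

    def is_complete(item):
        lhs, rhs, dot, la = item
        return dot == len(rhs)                # [A -> beta gamma . , a]

    def next_symbol(item):
        lhs, rhs, dot, la = item
        return None if dot == len(rhs) else rhs[dot]

    # The three flavors from this slide, for Pair -> ( List ) with lookahead EOF:
    possibility        = ('Pair', ('(', 'List', ')'), 0, 'EOF')   # [Pair -> . ( List ), EOF]
    partially_complete = ('Pair', ('(', 'List', ')'), 1, 'EOF')   # [Pair -> ( . List ), EOF]
    complete           = ('Pair', ('(', 'List', ')'), 3, 'EOF')   # [Pair -> ( List ) . , EOF]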

LR(1) Items

The production A → β, where β = B1 B2 B3, with lookahead a, can give rise to 4 items:

    [A → •B1 B2 B3, a], [A → B1 •B2 B3, a], [A → B1 B2 •B3, a], & [A → B1 B2 B3 •, a]

The set of LR(1) items for a grammar is finite.

What's the point of all these lookahead symbols?
- Carry them along to help choose the correct reduction
- Lookaheads are bookkeeping, unless the item has the • at its right end
  - The lookahead has no direct use in [A → β•γ, a]
  - In [A → β•, a], a lookahead of a implies a reduction by A → β
  - For { [A → β•, a], [B → γ•δ, b] }: lookahead a ⇒ reduce to A; lookahead in FIRST(δ) ⇒ shift
    ⇒ Limited right context is enough to pick the actions
    (a ∈ FIRST(δ) ⇒ a conflict — the grammar is not LR(1))
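
The next lecture covers table filling in detail, but the rule just stated is easy to sketch (an illustration added here, not the actual construction): given the items of one DFA state, propose an ACTION entry per lookahead and flag any lookahead that receives two different actions.

    def propose_actions(state_items, terminals):
        """Sketch: derive ACTION entries for one state from its LR(1) items.
        Reduce on a complete item's lookahead; shift when a terminal follows
        the dot.  (The real construction also records the shift's target state.)
        Two different proposals for one lookahead means the grammar is not LR(1)."""
        actions = {}
        def record(la, act):
            if la in actions and actions[la] != act:
                raise ValueError("conflict on %r: %s vs %s" % (la, actions[la], act))
            actions[la] = act
        for lhs, rhs, dot, la in state_items:
            if dot == len(rhs):                      # [A -> beta . , a]  => reduce on a
                record(la, ('reduce', lhs, rhs))
            elif rhs[dot] in terminals:              # a terminal in FIRST(delta) => shift
                record(rhs[dot], ('shift',))
        return actions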

LR(1) Items: Why should you know this stuff?

Debugging a grammar
- When you build an LR(1) parser, it is possible (likely) that the initial grammar is not LR(1)
- The tools will provide you with debugging output

Below is a sample of bison's output for the if-then-else grammar:

    goal      : stmt_list
    stmt_list : stmt_list stmt
              | stmt
    stmt      : IF EXPR THEN stmt
              | IF EXPR THEN stmt ELSE stmt
              | OTHER

    state 10
        4 stmt : IF EXPR THEN stmt .
        5        IF EXPR THEN stmt . ELSE stmt

        ELSE      shift, and go to state 11
        ELSE      [reduce using rule 4 (stmt)]
        $default  reduce using rule 4 (stmt)

The state is described by its LR(1) items. That period is the • of the items.

[Likely STOP]

LR(1) Table Construction

High-level overview

1. Build the canonical collection of sets of LR(1) items, I
   a. Begin in an appropriate state, s0
      - [S' → •S, EOF], along with any equivalent items
      - Derive the equivalent items as closure(s0)
   b. Repeatedly compute, for each sk and each grammar symbol X, goto(sk, X)
      - If the set is not already in the collection, add it
      - Record all the transitions created by goto()
   This eventually reaches a fixed point.

   (S is the start symbol. To simplify things, we add S' → S to create a unique goal production.)
   (goto(si, X) contains the set of LR(1) items that represent the possible parser configurations if the parser recognizes an X while in state si.)

2. Fill in the table from the canonical collection of sets of LR(1) items
   - The sets in the canonical collection form the states of the control DFA
   - The construction traces the DFA's transitions

LR(1) Table Construction

High-level overview (repeated)

1. Build the canonical collection of sets of LR(1) items, I
   a. Begin in an appropriate state, s0: [S' → •S, EOF], along with any equivalent items, derived as closure(s0)
   b. Repeatedly compute, for each sk and each grammar symbol X, goto(sk, X); if the set is not already in the collection, add it, and record all the transitions created by goto()
   This eventually reaches a fixed point.
2. Fill in the table from the canonical collection of sets of LR(1) items.

Let's build the tables for the left-recursive SheepNoise grammar:

    0   Goal → SheepNoise
    1   SheepNoise → SheepNoise baa
    2   SheepNoise → baa

(S is Goal.)

Computing Closures

Closure(s) adds all the items implied by the items already in s
- Any item [A → β•Bδ, a], where B ∈ NT, implies [B → •τ, x] for each production that has B on the LHS, and each x ∈ FIRST(δa)
- Since βBδ is valid, any way to derive βBδ is valid, too

The algorithm:

    Closure(s)
        while (s is still changing)
            ∀ items [A → β•Bδ, a] ∈ s
                ∀ productions B → τ ∈ P
                    ∀ b ∈ FIRST(δa)          // δ might be ε
                        if [B → •τ, b] ∉ s
                            then s ← s ∪ { [B → •τ, b] }

- A classic fixed-point method
- Halts because s ⊆ I, the (finite) set of items
- A worklist version is faster
- Closure "fills out" a state
- Generates new lookaheads (see the note on p. 128)
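
A direct transcription of this algorithm into Python (a sketch under the item encoding assumed earlier, not the course's code). It assumes PRODS maps each nonterminal to its right-hand sides, FIRST maps every grammar symbol — including terminals and EOF — to its FIRST set, and NULLABLE is the set of nonterminals that derive ε.

    def first_of_seq(seq, FIRST, NULLABLE):
        """FIRST of a sequence of grammar symbols, e.g. delta followed by a."""
        out = set()
        for sym in seq:
            out |= FIRST[sym]
            if sym not in NULLABLE:
                return out
        return out

    def closure(items, PRODS, FIRST, NULLABLE):
        s = set(items)
        changed = True
        while changed:                                    # classic fixed-point loop
            changed = False
            for (lhs, rhs, dot, la) in list(s):
                if dot < len(rhs) and rhs[dot] in PRODS:  # [A -> beta . B delta, a]
                    B, delta = rhs[dot], rhs[dot + 1:]
                    for b in first_of_seq(delta + (la,), FIRST, NULLABLE):
                        for tau in PRODS[B]:
                            item = (B, tau, 0, b)         # [B -> . tau, b]
                            if item not in s:
                                s.add(item)
                                changed = True
        return frozenset(s)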

Computing Closures

Generating closures is the place where a human is most likely to make a mistake. With everything going on in the construction, it is easy to lose track of the lookahead a — and of the fact that it refers to the item [A → β•Bδ, a], not to the production currently being added.

    Closure(s)
        while (s is still changing)
            ∀ items [A → β•Bδ, a] ∈ s
                ∀ productions B → τ ∈ P
                    ∀ b ∈ FIRST(δa)          // δ might be ε
                        if [B → •τ, b] ∉ s
                            then s ← s ∪ { [B → •τ, b] }

The lookahead computation is a great example of why these table constructions should be done by computers, not human beings.

Example from SheepNoise

(This is the left-recursive SheepNoise; EaC2e shows the right-recursive version.)

The initial step builds the item [Goal → •SheepNoise, EOF] and takes its Closure().

Closure( [Goal → •SheepNoise, EOF] ):

    Item                                      Source
    [Goal → •SheepNoise, EOF]                 original item
    [SheepNoise → •SheepNoise baa, EOF]       iteration 1, production 1, δa is EOF
    [SheepNoise → •baa, EOF]                  iteration 1, production 2, δa is EOF
    [SheepNoise → •SheepNoise baa, baa]       iteration 2, production 1, δa is baa EOF
    [SheepNoise → •baa, baa]                  iteration 2, production 2, δa is baa EOF

So, S0 is { [Goal → •SheepNoise, EOF], [SheepNoise → •SheepNoise baa, EOF], [SheepNoise → •baa, EOF], [SheepNoise → •SheepNoise baa, baa], [SheepNoise → •baa, baa] }

    0   Goal → SheepNoise
    1   SheepNoise → SheepNoise baa
    2   SheepNoise → baa

    Symbol        FIRST
    Goal          { baa }
    SheepNoise    { baa }
    baa           { baa }
    EOF           { EOF }

Computing Gotos

Goto(s, X) computes the state that the parser would reach if it recognized an X while in state s
- Goto( { [A → β•Xδ, a] }, X ) produces [A → βX•δ, a]   (obviously)
- It finds all such items & uses Closure() to fill out the state

The algorithm:

    Goto(s, X)
        new ← Ø
        ∀ items [A → β•Xδ, a] ∈ s
            new ← new ∪ { [A → βX•δ, a] }
        return Closure(new)

- Not a fixed-point method!
- A straightforward computation
- Uses Closure()
- Goto() models a transition in the automaton

Goto in this construction is analogous to Move in the subset construction.
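
The corresponding sketch of Goto(), reusing the closure() above: advance the • over X in every item that has X immediately after the •, then close the result.

    def goto(s, X, PRODS, FIRST, NULLABLE):
        """Sketch of Goto(s, X): not a fixed-point computation, just a pass
        over the items of s followed by one call to closure()."""
        new = {(lhs, rhs, dot + 1, la)
               for (lhs, rhs, dot, la) in s
               if dot < len(rhs) and rhs[dot] == X}       # [A -> beta . X delta, a]
        return closure(new, PRODS, FIRST, NULLABLE)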

Example from SheepNoise

Assume that S0 is { [Goal → •SheepNoise, EOF], [SheepNoise → •SheepNoise baa, EOF], [SheepNoise → •baa, EOF], [SheepNoise → •SheepNoise baa, baa], [SheepNoise → •baa, baa] }   (from the earlier slide)

Goto(S0, baa):

The loop produces

    Item                          Source
    [SheepNoise → baa •, EOF]     item 3 in S0
    [SheepNoise → baa •, baa]     item 5 in S0

Closure adds nothing, since the • is at the end of the RHS in each item.

In the construction, this produces s2: { [SheepNoise → baa •, {EOF, baa}] }
- New, but obvious, notation for the two distinct items [SheepNoise → baa •, EOF] & [SheepNoise → baa •, baa]

    0   Goal → SheepNoise
    1   SheepNoise → SheepNoise baa
    2   SheepNoise → baa

Building the Canonical Collection

Start from s0 = Closure( [S' → •S, EOF] )
Repeatedly construct new states, until all are found

The algorithm:

    s0 ← Closure( [S' → •S, EOF] )
    S ← { s0 }
    k ← 1
    while (S is still changing)
        ∀ sj ∈ S and ∀ x ∈ (T ∪ NT)
            sk ← Goto(sj, x)
            record sj → sk on x
            if sk ∉ S then
                S ← S ∪ { sk }
                k ← k + 1

- A fixed-point computation
- The loop adds to S
- S ⊆ 2^ITEMS, so S is finite
- A worklist version is faster because it avoids duplicated effort

The membership / equality test (sk ∉ S) requires a careful and/or clever implementation.
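
A worklist version of this loop, as a sketch built on the closure() and goto() above; it returns the states and the recorded transitions.

    def canonical_collection(start_item, PRODS, FIRST, NULLABLE, symbols):
        """Sketch of the canonical-collection construction (worklist variant).
        Returns the set of states and the transitions { (state, symbol): state }."""
        s0 = closure({start_item}, PRODS, FIRST, NULLABLE)
        states, transitions = {s0}, {}
        worklist = [s0]
        while worklist:
            s = worklist.pop()
            for x in symbols:                   # x ranges over terminals and nonterminals
                t = goto(s, x, PRODS, FIRST, NULLABLE)
                if not t:
                    continue                    # no item in s had x after the dot
                transitions[(s, x)] = t         # record s -> t on x
                if t not in states:
                    states.add(t)
                    worklist.append(t)
        return states, transitions

Because closure() returns frozensets, set membership here is exactly the equality test that the slide warns must be implemented carefully.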

Example from SheepNoise

Starts with S0:

    S0: { [Goal → •SheepNoise, EOF], [SheepNoise → •SheepNoise baa, EOF], [SheepNoise → •baa, EOF],
          [SheepNoise → •SheepNoise baa, baa], [SheepNoise → •baa, baa] }

Iteration 1 computes

    S1 = Goto(S0, SheepNoise) = { [Goal → SheepNoise •, EOF], [SheepNoise → SheepNoise •baa, EOF],
                                  [SheepNoise → SheepNoise •baa, baa] }
    S2 = Goto(S0, baa) = { [SheepNoise → baa •, EOF], [SheepNoise → baa •, baa] }

Iteration 2 computes

    S3 = Goto(S1, baa) = { [SheepNoise → SheepNoise baa •, EOF], [SheepNoise → SheepNoise baa •, baa] }

Nothing more to compute, since the • is at the end of every item in S3.

    0   Goal → SheepNoise
    1   SheepNoise → SheepNoise baa
    2   SheepNoise → baa
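
As a sanity check (not from the slides), running the construction sketches on the left-recursive SheepNoise grammar reproduces exactly the four states S0 through S3:

    PRODS = {
        'Goal':       [('SheepNoise',)],
        'SheepNoise': [('SheepNoise', 'baa'), ('baa',)],
    }
    FIRST = {'Goal': {'baa'}, 'SheepNoise': {'baa'}, 'baa': {'baa'}, 'EOF': {'EOF'}}
    NULLABLE = set()
    symbols = ['Goal', 'SheepNoise', 'baa']          # T union NT

    start = ('Goal', ('SheepNoise',), 0, 'EOF')      # [Goal -> . SheepNoise, EOF]
    states, transitions = canonical_collection(start, PRODS, FIRST, NULLABLE, symbols)
    print(len(states))        # 4  (S0 through S3)
    print(len(transitions))   # 3  (S0 -> S1 on SheepNoise, S0 -> S2 on baa, S1 -> S3 on baa)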

Example from SheepNoise

    S0: { [Goal → •SheepNoise, EOF], [SheepNoise → •SheepNoise baa, EOF], [SheepNoise → •baa, EOF],
          [SheepNoise → •SheepNoise baa, baa], [SheepNoise → •baa, baa] }

    S1 = Goto(S0, SheepNoise) = { [Goal → SheepNoise •, EOF], [SheepNoise → SheepNoise •baa, EOF],
                                  [SheepNoise → SheepNoise •baa, baa] }

    S2 = Goto(S0, baa) = { [SheepNoise → baa •, EOF], [SheepNoise → baa •, baa] }

    S3 = Goto(S1, baa) = { [SheepNoise → SheepNoise baa •, EOF], [SheepNoise → SheepNoise baa •, baa] }

    0   Goal → SheepNoise
    1   SheepNoise → SheepNoise baa
    2   SheepNoise → baa

[Hard STOP]