EDAN65: Compilers, Lecture 06A: LR parsing. Görel Hedin. Revised: 2017-09-11

This lecture
[Overview diagram of the compiler phases: the lexical analyzer (scanner, regular expressions) turns source code into tokens, the syntactic analyzer (parser, context-free grammar) builds the AST, the semantic analyzer (attribute grammar) produces an attributed AST, followed by intermediate code generation, optimization, and target code generation or interpretation on a virtual machine, supported by a runtime system (activation records, objects, stack, heap, garbage collection). This lecture covers LR parsing in the parser.]

LR parsing

Recall main parsing ideas
[Diagram: two parse trees over the token sequence ... t r u v r u r u ..., showing when the subtree for a nonterminal X can be started.]
LL(1): decides to build X after seeing the first token of its subtree. The tree is built top down.
LR(1): decides to build X after seeing the first token following its subtree. The tree is built bottom up.

Recall different parsing algorithms
[Diagram: within all context-free grammars, the unambiguous grammars contain the LR grammars, which in turn contain the LL grammars. This lecture covers LR.]
LL: Left-to-right scan, leftmost derivation. Builds the tree top-down. Simple to understand.
LR: Left-to-right scan, rightmost derivation. Builds the tree bottom-up. More powerful.

Recall: LL(k) vs LR(k)
Parses input: left-to-right (both).
Derivation: leftmost (LL) vs rightmost (LR).
Lookahead: k symbols (both).
Builds the tree: top down (LL) vs bottom up (LR).
Selects rule: after seeing its first k tokens (LL) vs after seeing all its tokens and an additional k tokens (LR).
Left recursion: no (LL), yes (LR).
Unlimited common prefix: no (LL), yes (LR).
Resolve ambiguities through rule priority: dangling else (LL); dangling else, associativity, priority (LR).
Error recovery: trial-and-error (LL); good algorithms exist (LR).
Implement by hand? Possible (LL); too complicated for LR, use a generator.

LR parsing
Add the EOF token ($) and an extra start rule. The parser uses a stack of symbols (terminals and nonterminals). The parser looks at the current input token and decides to do one of the following actions:
shift: push the input token onto the stack and read the next token.
reduce: match the top symbols on the stack with a production right-hand side. Pop those symbols and push the left-hand side nonterminal. At the same time, build this part of the tree.
accept: when the parser is about to shift $, the parse is complete.
The parser uses a finite state automaton (encoded in a table) to decide which action to take and which state to go to after each shift action.

LR parsing example
Grammar:
p0: S -> Stmt $
p1: Stmt -> ID "=" Exp
p2: Exp -> ID
p3: Exp -> Exp "+" ID
Actions: shift (push token onto stack, read next token), reduce (pop rhs, push lhs, build part of tree), accept (the tree is ready).
Start configuration: empty stack, input ID = ID + ID $.

LR parsing example (continued)
Stack            Input            Action
                 ID = ID + ID $   shift
ID               = ID + ID $      shift
ID =             ID + ID $        shift
ID = ID          + ID $           reduce Exp -> ID
ID = Exp         + ID $           shift
ID = Exp +       ID $             shift
ID = Exp + ID    $                reduce Exp -> Exp "+" ID
ID = Exp         $                reduce Stmt -> ID "=" Exp
Stmt             $                accept
Follow the reduction steps in reverse order. They correspond to a rightmost derivation:
Stmt => ID "=" Exp => ID "=" Exp "+" ID => ID "=" ID "+" ID

LR(1) items
The parser uses a DFA (a deterministic finite automaton) to decide whether to shift or reduce. The states in the DFA are sets of LR items.
LR(1) item: X -> a . b , t,s
An LR(1) item is a production extended with:
A dot (.), corresponding to the position in the input sentence.
One or more possible lookahead terminal symbols, t,s (we will use ? when the lookahead doesn't matter).
The LR(1) item corresponds to a state where:
The topmost part of the stack is a.
The first part of the remaining input is expected to match b, followed by t or s.
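In code, such an item can be represented as a production plus a dot position and a lookahead set. The following Java sketch is only an assumed illustration (the names Item, lhs, rhs, dot, and lookahead are not from the lecture):

import java.util.List;
import java.util.Set;

// Hypothetical representation of one LR(1) item: production + dot position + lookahead set.
// The item  E -> T . "+" E , $  has dot = 1 and lookahead = {"$"}.
public record Item(String lhs, List<String> rhs, int dot, Set<String> lookahead) {
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(lhs + " ->");
        for (int i = 0; i <= rhs.size(); i++) {
            if (i == dot) sb.append(" .");
            if (i < rhs.size()) sb.append(" ").append(rhs.get(i));
        }
        return sb + " , " + lookahead;
    }

    public static void main(String[] args) {
        Item item = new Item("E", List.of("T", "\"+\"", "E"), 1, Set.of("$"));
        System.out.println(item);   // prints: E -> T . "+" E , [$]
    }
}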

Constructing state 1
Grammar:
p0: S -> E $
p1: E -> T "+" E
p2: E -> T
p3: T -> ID
First, take the start production and place the dot in the beginning:
S -> . E $ , ?
Note that there is a nonterminal E right after the dot, and it is followed by the terminal $. Add the productions for E, with $ as the lookahead:
E -> . T "+" E , $
E -> . T , $
Note that there is a nonterminal T right after the dot, followed by either "+" or $. Add the productions for T, with "+" and $ as the lookahead (we write them on the same line as a shorthand):
T -> . ID , +,$
We have already added productions for all nonterminals that are right after the dot. Nothing more can be added, so we are finished constructing state 1:
State 1:
S -> . E $ , ?
E -> . T "+" E , $
E -> . T , $
T -> . ID , +,$
Adding new productions for nonterminals following the dot, until no more productions can be added, is called taking the closure of the LR item set.

Constructing the next states
Note that the dot is followed by E, T, and ID in state 1. For each of these symbols, create a new set of LR items by advancing the dot past that symbol. Then complete the states by taking the closure (nothing had to be added for these states):
State 2 (from state 1 on E): S -> E . $ , ?
State 3 (from state 1 on T): E -> T . "+" E , $     E -> T . , $
State 4 (from state 1 on ID): T -> ID . , +,$

Completing the LR DFA
Complete the DFA by advancing the dot, creating new states, and completing them by taking the closure. If there is already a state with the same items, we use that state instead. For the example grammar this gives:
State 1: S -> . E $ , ?     E -> . T "+" E , $     E -> . T , $     T -> . ID , +,$
State 2 (from 1 on E): S -> E . $ , ?
State 3 (from 1 on T): E -> T . "+" E , $     E -> T . , $
State 4 (from 1 or 5 on ID): T -> ID . , +,$
State 5 (from 3 on "+"): E -> T "+" . E , $     E -> . T "+" E , $     E -> . T , $     T -> . ID , +,$
State 6 (from 5 on E): E -> T "+" E . , $
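The closure and dot-advancing steps above can be sketched in code. The following Java sketch is only an illustration under simplifying assumptions: it uses LR(0)-style items, so the lookahead bookkeeping from the slides is omitted, and all names (Item, closure, gotoSet, GRAMMAR) are invented for the example grammar.

import java.util.*;

// Minimal sketch of closure and goto over LR(0)-style items (lookaheads omitted).
// An item is a production plus a dot position; the grammar is the one from the slides.
public class LrDfaSketch {
    record Production(String lhs, List<String> rhs) {}
    record Item(Production p, int dot) {
        String symbolAfterDot() {
            return dot < p.rhs().size() ? p.rhs().get(dot) : null;
        }
    }

    static final List<Production> GRAMMAR = List.of(
        new Production("S", List.of("E", "$")),      // p0
        new Production("E", List.of("T", "+", "E")), // p1
        new Production("E", List.of("T")),           // p2
        new Production("T", List.of("ID"))           // p3
    );

    // closure: repeatedly add items X -> . gamma for every nonterminal X right after a dot
    static Set<Item> closure(Set<Item> items) {
        Deque<Item> work = new ArrayDeque<>(items);
        Set<Item> result = new LinkedHashSet<>(items);
        while (!work.isEmpty()) {
            String x = work.pop().symbolAfterDot();
            if (x == null) continue;
            for (Production p : GRAMMAR) {
                if (p.lhs().equals(x)) {
                    Item added = new Item(p, 0);
                    if (result.add(added)) work.push(added);
                }
            }
        }
        return result;
    }

    // goto: advance the dot past symbol x in every item where x follows the dot,
    // then take the closure of the resulting set
    static Set<Item> gotoSet(Set<Item> state, String x) {
        Set<Item> moved = new LinkedHashSet<>();
        for (Item it : state) {
            if (x.equals(it.symbolAfterDot())) moved.add(new Item(it.p(), it.dot() + 1));
        }
        return closure(moved);
    }

    public static void main(String[] args) {
        Set<Item> state1 = closure(Set.of(new Item(GRAMMAR.get(0), 0)));
        Set<Item> state3 = gotoSet(state1, "T");  // { E -> T . "+" E ,  E -> T . }
        Set<Item> state5 = gotoSet(state3, "+");  // the closure adds the E and T productions again
        System.out.println(state1.size() + " " + state3.size() + " " + state5.size()); // prints: 4 2 4
    }
}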

Constructing the LR table
For each token edge t, from state j to state k, add a shift action "s k" (shift and goto state k) to table[j,t].
For each nonterminal edge X, from state j to state k, add a goto action "g k" (goto state k) to table[j,X].
For a state j containing an LR item with the dot to the left of $, add an accept action "a" to table[j,$].
For each state j that contains an LR item where the dot is at the end, add a reduce action "r p" (reduce p) to table[j,t], where p is the production and t is the lookahead token.
(Exercise: fill in the table with rows for states 1-6 and columns "+", ID, $, E, T.)

Constructing the LR table (continued)
Applying these rules to the DFA gives the following table:
state   "+"     ID      $       E       T
1               s 4             g 2     g 3
2                       a
3       s 5             r p2
4       r p3            r p3
5               s 4             g 6     g 3
6                       r p1

Using the LR table for parsing
Use a symbol stack and a state stack. The current state is the state stack top. Push state 1 to the state stack, then perform an action for each token:
Case Shift s: push the token to the symbol stack, and push s to the state stack. The current state is now s.
Case Reduce p: pop symbols for the rhs of p and push the lhs symbol X of p. Pop the same number of states. Let s1 = the top of the state stack and s2 = table[s1,X]. Push s2 to the state stack. The current state is now s2.
Case Accept: report a successful parse.
(The parse table is the one constructed above.)
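As a concrete illustration of this driver loop, here is a minimal Java sketch that hard-codes the example grammar and the table above. The class and the encoding (TABLE, LHS, RHS_LENGTH, the strings "s 4", "r 2", and so on) are made up for the example; a real generator such as CUP or Beaver produces much more machinery, including tree building and error recovery.

import java.util.*;

// Minimal table-driven LR driver for the example grammar:
//   p0: S -> E $   p1: E -> T "+" E   p2: E -> T   p3: T -> ID
// Action encoding: "s k" shift to state k, "g k" goto state k, "r p" reduce by production p, "a" accept.
public class LrDriverSketch {
    static final Map<Integer, Map<String, String>> TABLE = Map.of(
        1, Map.of("ID", "s 4", "E", "g 2", "T", "g 3"),
        2, Map.of("$", "a"),
        3, Map.of("+", "s 5", "$", "r 2"),
        4, Map.of("+", "r 3", "$", "r 3"),
        5, Map.of("ID", "s 4", "E", "g 6", "T", "g 3"),
        6, Map.of("$", "r 1")
    );
    static final String[] LHS = {"S", "E", "E", "T"};   // left-hand sides of p0..p3
    static final int[] RHS_LENGTH = {2, 3, 1, 1};       // right-hand side lengths of p0..p3

    static boolean parse(List<String> input) {
        Deque<Integer> states = new ArrayDeque<>();
        Deque<String> symbols = new ArrayDeque<>();
        states.push(1);
        int pos = 0;
        while (true) {
            String token = input.get(pos);
            String action = TABLE.get(states.peek()).get(token);
            if (action == null) return false;            // no entry: parse error
            if (action.equals("a")) return true;         // accept: about to shift $
            if (action.startsWith("s")) {                // shift: push token and new state
                symbols.push(token);
                states.push(Integer.parseInt(action.substring(2)));
                pos++;
            } else {                                     // reduce by production p
                int p = Integer.parseInt(action.substring(2));
                for (int i = 0; i < RHS_LENGTH[p]; i++) { symbols.pop(); states.pop(); }
                symbols.push(LHS[p]);                    // push the lhs nonterminal
                String go = TABLE.get(states.peek()).get(LHS[p]);
                states.push(Integer.parseInt(go.substring(2)));   // goto entry for the lhs
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("ID", "+", "ID", "$")));  // true
        System.out.println(parse(List.of("ID", "+", "+", "$")));   // false (parse error)
    }
}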

Example of LR parsing
Grammar and parse table: as above.
We parse the input ID + ID $, starting with state stack 1, an empty symbol stack, and the full input remaining.

Example of LR parsing (continued)
State stack   Symbol stack   Input        Action
1                            ID + ID $    shift 4
1 4           ID             + ID $       reduce p3
1 3           T              + ID $       shift 5
1 3 5         T +            ID $         shift 4
1 3 5 4       T + ID         $            reduce p3
1 3 5 3       T + T          $            reduce p2
1 3 5 6       T + E          $            reduce p1
1 2           E              $            accept

Conflict in an LR table
Grammar:
p0: S -> E $
p1: E -> E "+" E
p2: E -> E "*" E
p3: E -> ID
Part of the DFA: a state (state 3) containing the items
E -> E . "+" E , ?
E -> E "*" E . , "+"
with an edge on "+" to state 5.
Part of the parse table: the row for state 3, column "+".
Fill in the parse table. What is the problem?

Conflict in an LR table (continued)
Filling in the table, entry table[3,"+"] gets both s 5 and r p2: there is a shift-reduce conflict. The grammar is ambiguous.
In this case, we can resolve the conflict by selecting one of the actions. To understand which one, think about what the top of the stack looks like, and about what will happen later if we take the shift action or the reduce action.

Analyzing LR conflicts
Example output from a parser generator (CUP):
*** Shift/Reduce conflict found in state #5
between expr ::= expr PLUS expr (*)
and expr ::= expr (*) PLUS expr
under symbol PLUS
Resolved in favor of shifting.
Note: the dot is written as "(*)". Note: the parser generator automatically resolves the conflict by shifting. Is this what we want?
Line up the dots in the state:
expr -> expr PLUS expr .
expr -> expr . PLUS expr
The top of the stack and the remaining input may look like:
... expr PLUS expr     PLUS expr ...
(top of stack)         (remaining input)

Analyzing LR conflicts (continued)
Line up the dots in the state:
expr -> expr PLUS expr . , PLUS
expr -> expr . PLUS expr , ?
[Diagram: the two possible parse trees for ... expr PLUS expr PLUS expr ...: if we shift, the second PLUS is grouped first (right-associative tree); if we reduce, the first PLUS is grouped first (left-associative tree).]
Which rule should we choose?

Different kinds of conflicts
A shift-reduce conflict:
E -> E . "+" E , ?
E -> E "*" E . , "+"
A reduce-reduce conflict:
A -> B C . , t
D -> C . , t
Shift-reduce conflicts can sometimes be solved with precedence rules, in particular for binary expressions with priority and associativity. For other cases, you need to carefully analyze the shift-reduce conflicts to see if precedence rules are applicable, or if you need to change the grammar. For reduce-reduce conflicts, it is advisable to think through the problems and change the grammar.

Typical precedence rules for an LR parser generator
Grammar:
E -> E "==" E
E -> E "**" E
E -> E "*" E
E -> E "/" E
E -> E "+" E
E -> E "-" E
E -> ID
E -> INT
Precedence declarations:
precedence nonassoc EQ
precedence left PLUS, MINUS
precedence left TIMES, DIV
precedence right POWER
Shift-reduce conflicts can be automatically resolved using precedence rules. Operators in the same rule have the same priority (e.g., PLUS and MINUS). Operators in a later rule have higher priority (e.g., TIMES has higher priority than PLUS).

How the precedence rules work
A rule is given the priority and associativity of its rightmost token.
For two conflicting rules with different priority, the rule with the highest priority is chosen:
E -> E . + E , ?   and   E -> E * E . , + : reduce is chosen (the "*" rule has higher priority).
E -> E . * E , ?   and   E -> E + E . , * : shift is chosen (the lookahead "*" has higher priority).
Two conflicting rules with the same priority have the same associativity. Left-associativity favors reduce, right-associativity favors shift, and non-associativity removes both rules from the table (input following that pattern will cause a parse error):
E -> E + E . , +     and   E -> E . + E , ?  : reduce is chosen.
E -> E ** E . , **   and   E -> E . ** E , ? : shift is chosen.
E -> E == E . , ==   and   E -> E . == E , ? : no rule is chosen.
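The decision amounts to a small comparison of priorities and associativity. A hypothetical Java sketch (the names and the numeric priorities are invented for illustration, not taken from any particular generator):

// Hypothetical sketch of how a generator might resolve a shift-reduce conflict
// using precedence declarations.
public class PrecedenceSketch {
    enum Assoc { LEFT, RIGHT, NONASSOC }

    // shiftPrio: priority of the lookahead token;
    // reducePrio, assoc: priority and associativity of the reduce rule's rightmost token.
    static String resolve(int shiftPrio, int reducePrio, Assoc assoc) {
        if (shiftPrio > reducePrio) return "shift";   // e.g. lookahead "*" against rule E -> E "+" E .
        if (shiftPrio < reducePrio) return "reduce";  // e.g. lookahead "+" against rule E -> E "*" E .
        return switch (assoc) {                       // same priority: use associativity
            case LEFT -> "reduce";                    // E -> E "+" E . on "+"
            case RIGHT -> "shift";                    // E -> E "**" E . on "**"
            case NONASSOC -> "error";                 // remove both actions: parse error on such input
        };
    }

    public static void main(String[] args) {
        // assumed priorities following the slide: "==" = 1, "+" = 2, "*" = 3, "**" = 4
        System.out.println(resolve(2, 3, Assoc.LEFT));      // reduce ("+" after E -> E "*" E .)
        System.out.println(resolve(3, 2, Assoc.LEFT));      // shift  ("*" after E -> E "+" E .)
        System.out.println(resolve(2, 2, Assoc.LEFT));      // reduce (left-associative "+")
        System.out.println(resolve(1, 1, Assoc.NONASSOC));  // error  (non-associative "==")
    }
}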

Different variants of LR(k) parsers
LR(0): LR items without lookahead. Not very useful in practice.
SLR (Simple LR): looks at the FOLLOW set to decide where to put reduce actions. Can parse some useful grammars.
LALR(1) (used in practice): merges states that have the same LR items but different lookaheads (LA). Leads to much smaller tables than LR(1). Used by most well-known tools: Yacc, CUP, Beaver, SableCC, ... Sufficient for most practical parsing problems.
LR(1): slightly more powerful than LALR(1). Not used in practice, since the tables become very large.
LR(k): much too large tables for k > 1.

Different variants of LR parsers
[Diagram: nested grammar classes. Within all context-free grammars lie the unambiguous grammars; within those, LR(k) contains LR(1), which contains LALR(1), SLR, and LR(0). GLR covers all context-free grammars, including ambiguous ones.]

GLR (Generalized LR)
A universal parsing algorithm: it can parse any context-free grammar, including ambiguous grammars! It returns a parse forest (all possible parse trees); an additional mechanism is needed to select which of the trees to use. It can parse grammars with shift-reduce and reduce-reduce conflicts (it spawns parallel parsers). It has cubic worst-case complexity (in the length of the input), and is often much better than that in practice, but still slower than LALR. Used in several research systems. There is a recent universal LL algorithm as well: GLL (from 2010).

Some well-known parser generators
Name        Type, host language
JavaCC      LL, Java
ANTLR       Adaptive LL(*), Java (also earlier versions for C, C#, ...)
yacc        LALR, C ("yet another compiler compiler", developed for AT&T Unix in the 1970s)
bison       LALR, C++, GNU GPL
CUP         LALR, Java
beaver      LALR, Java
SDF/SGLR    Scannerless GLR, C, Java
For more examples, see http://en.wikipedia.org/wiki/comparison_of_parser_generators

Parsing Expression Grammars (PEGs)
Look like CFGs, but productions are ordered.
Always unambiguous.
Unlimited lookahead.
Backtracks (tries the next alternative) if an alternative is not successful.
Does not allow left recursion (but supports EBNF).
A straightforward implementation is exponential; the "packrat" implementation is linear, but uses memory proportional to the input.
Often implemented using parser combinators: higher-order functions that combine subparsers. Parser combinator libraries are alternatives to using parser generators.
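To make the ordered-choice and backtracking behavior concrete, here is a minimal parser-combinator sketch in Java. It only recognizes token sequences (no tree building), and all names (Parser, lit, seq, choice) are invented for the example.

import java.util.List;

// Minimal PEG-style recognizer: a parser maps (input, position) to the new position
// on success, or -1 on failure. Ordered choice tries the alternatives in order and
// retries the next one from the same position when an alternative fails (backtracking).
public class PegSketch {
    interface Parser { int parse(List<String> input, int pos); }

    static Parser lit(String token) {      // match a single token
        return (in, pos) -> pos < in.size() && in.get(pos).equals(token) ? pos + 1 : -1;
    }

    static Parser seq(Parser... ps) {      // match all parsers in sequence
        return (in, pos) -> {
            for (Parser p : ps) {
                pos = p.parse(in, pos);
                if (pos < 0) return -1;
            }
            return pos;
        };
    }

    static Parser choice(Parser... ps) {   // ordered choice: the first alternative that succeeds wins
        return (in, pos) -> {
            for (Parser p : ps) {
                int next = p.parse(in, pos);
                if (next >= 0) return next;
            }
            return -1;                     // all alternatives failed
        };
    }

    public static void main(String[] args) {
        // Exp <- ID "+" Exp / ID   (ordered: the longer alternative is tried first)
        Parser[] exp = new Parser[1];
        Parser expRef = (in, pos) -> exp[0].parse(in, pos);   // indirection to allow recursion
        exp[0] = choice(seq(lit("ID"), lit("+"), expRef), lit("ID"));
        System.out.println(exp[0].parse(List.of("ID", "+", "ID"), 0)); // 3: the whole input matched
        System.out.println(exp[0].parse(List.of("ID", "+"), 0));       // 1: falls back to the plain ID alternative
    }
}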

Summary questions: LR parsing
How do LR parsers differ from LL parsers?
What does it mean to shift? What does it mean to reduce?
Explain how LR parsing works on an example.
What is an LR item? What does an LR state consist of?
What does it mean to take the closure of a set of LR items?
What do the edges in an LR DFA represent?
How can an LR table be constructed from an LR DFA?
How is the LR table used for parsing?
What is meant by a shift-reduce conflict and a reduce-reduce conflict? How can such a conflict be analyzed?
How can precedence rules be used in an LR parser?
What is LR(0) and SLR parsing?
What is the difference between LALR(1) and LR(1)? Explain why the LALR(1) algorithm is most commonly used in parser generators.
What is a GLR parser?