Downloaded from Page 1. LR Parsing

Similar documents
LR Parsers. Aditi Raste, CCOEW

UNIT-III BOTTOM-UP PARSING

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

LR Parsing Techniques

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

MODULE 14 SLR PARSER LR(0) ITEMS

S Y N T A X A N A L Y S I S LR

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

Chapter 3. Parsing #1

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

UNIT III & IV. Bottom up parsing

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

CS606- compiler instruction Solved MCQS From Midterm Papers

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

LALR Parsing. What Yacc and most compilers employ.

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

LR Parsing - Conflicts


Wednesday, August 31, Parsers

shift-reduce parsing

CSE 3302 Programming Languages Lecture 2: Syntax

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Lexical and Syntax Analysis. Bottom-Up Parsing

Principles of Programming Languages

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

Lecture 8: Deterministic Bottom-Up Parsing

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Question Bank. 10CS63:Compiler Design

2068 (I) Attempt all questions.

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

CS 4120 Introduction to Compilers

Conflicts in LR Parsing and More LR Parsing Types

Lecture 7: Deterministic Bottom-Up Parsing

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

Introduction to Parsing

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

LR Parsing Techniques

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

Bottom-up Parser. Jungsik Choi

Chapter 2 :: Programming Language Syntax

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Bottom-Up Parsing. Lecture 11-12

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Formal Languages and Compilers Lecture VII Part 4: Syntactic A


VIVA QUESTIONS WITH ANSWERS

3. Parsing. Oscar Nierstrasz

Concepts Introduced in Chapter 4

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Top down vs. bottom up parsing

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Parsing II Top-down parsing. Comp 412

1. Explain the input buffer scheme for scanning the source program. How the use of sentinels can improve its performance? Describe in detail.

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

General Overview of Compiler

Monday, September 13, Parsers

Bottom-Up Parsing. Lecture 11-12

Revisit the example. Transformed DFA 10/1/16 A B C D E. Start

Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

4. Lexical and Syntax Analysis

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

4. Lexical and Syntax Analysis

Compilers. Bottom-up Parsing. (original slides by Sam

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Front End. Hwansoo Han

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Lecture Bottom-Up Parsing

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

QUESTIONS RELATED TO UNIT I, II And III

Context-free grammars

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Syntax-Directed Translation. Lecture 14

CS308 Compiler Principles Syntax Analyzer Li Jiang

EECS 6083 Intro to Parsing Context Free Grammars

CSE 401 Midterm Exam Sample Solution 2/11/15

Compiler Construction: Parsing

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

Compiler Construction 2016/2017 Syntax Analysis

Lecture Notes on Bottom-Up LR Parsing

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Bottom-Up Parsing LR Parsing

Introduction to Parsing. Comp 412

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Transcription:

Downloaded from http://himadri.cmsdu.org Page 1 LR Parsing We first understand Context Free Grammars. Consider the input string: x+2*y When scanned by a scanner, it produces the following stream of tokens: id(x) + num(2) * id(y) The goal is to parse this expression and construct a data structure (a parse tree) to represent it. A possible syntax for expressions is given by the following grammar G1: E ::= E + T E T T T ::= T * F T / F F F ::= num id where E, T and F stand for expression, term, and factor respectively. For example, the rule for E indicates that an expression E can take one of the following 3 forms: an expression followed by the token + followed by a term, or an expression followed by the token followed by a term, or simply a term. The first rule for E is actually a shorthand of 3 productions: E ::= E + T E ::= E T E ::= T G1 is an example of a context free grammar; the symbols E, T and F are non terminals and should be defined using production rules, while +,, *, /, num, and id are terminals (i.e., tokens) produced by the scanner. The non terminal E is the start symbol of the grammar. In general, a context free grammar (CFG) has a finite set of terminals (tokens), a finite set of nonterminals from which one is the start symbol, and a finite set of productions of the form: A ::= X 1 X 2... X n where A is a non terminal and each X i is either a terminal or non terminal symbol.

Downloaded from http://himadri.cmsdu.org Page 2 Given two sequences of symbols a and b (can be any combination of terminals and non terminals) and a production A : : = X 1 X 2...X n, the form aab = > ax 1 X 2...X n b is called a derivation. That is, the non terminal symbol A is replaced by the rhs (right hand side) of the production for A. For example, T / F + 1 x => T * F / F + 1 x is a derivation since we used the production T := T * F. Top down parsing starts from the start symbol of the grammar S and applies derivations until the entire input string is derived (ie, a sequence of terminals that matches the input tokens). For example, E => E + T => E + T * F => T + T * F => T + F * F => T + num * F => F + num * F => id + num * F => id + num * id which matches the input sequence id(x) + num(2) * id(y). Top down parsing occasionally requires backtracking. For example, suppose that we used the derivation E => E T instead of the first derivation. Then, later we would have to backtrack because the derived symbols will not match the input tokens. This is true for all non terminals that have more than one production since it indicates that there is a choice of which production to use. We will learn how to construct parsers for many types of CFGs that never backtrack. These parsers are based on a method called predictive parsing. One issue to consider is which non terminal to replace when there is a choice. For example, in T + F * F we have 3 choices: we can use a derivation for T, for the first F, or for the second F. When we always replace the leftmost nonterminal, it is called leftmost derivation. In contrast to top down parsing, bottom up parsing starts from the input string and uses derivations in the opposite directions (i.e., by replacing the rhs sequence X 1 X 2...X n of a production A : : = X 1 X 2...X n with the non terminal A. It stops when it derives the start symbol. For example, id(x) + num(2) * id(y) => id(x) + num(2) * F => id(x) + F * F => id(x) + T * F => id(x) + T => F + T => T + T => E + T => E An LR Parser is a type of bottom up Parser for context free grammars that is very commonly used by computer programming language compilers (and other associated tools). The technique that can be

Downloaded from http://himadri.cmsdu.org Page 3 used to parse a large class of context free grammars is called LR(k) parsing; the L is for left to right scanning of the input, the R for constructing a rightmost derivation in reverse and the k for the number of input symbols of look ahead that are used in making parsing decisions. When (k) is omitted, it is assumed to be 1. LR parsing is attractive for a variety of reasons. 1. LR parsers can be constructed to recognize virtually all programming language constructs for which context free grammars can be written. 2. The LR parsing method is the most general non backtracking shift reduce parsing method known, yet it can be implemented as efficiently as other shift reduce methods. 3. The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers. 4. An LR parser can detect a syntactic error as soon as it is possible to do so on a left to right scan of the input. The principal drawback of the method is that it is too much work to construct an LR parser by hand for a typical programming language grammar. One needs a specialized tool an LR parser generator. With such a generator, one can write a context free grammar and have the generator automatically produce a parser for that grammar. If the grammar contains ambiguities or other constructs that are difficult to parse in a left to right scan of the input, then the parser generator can locate these constructs and inform the compiler designer of their presence.

Downloaded from http://himadri.cmsdu.org Page 4 There are three techniques for constructing an LR parsing table for a grammar. Simple LR (SLR for short), is the easiest to implement, but the least powerful of the three. It may fail to produce a parsing table for certain grammars on which the other methods succeed. A Simple LR parser or SLR parser is an LR parser for which the parsing tables are generated as for an LR(0) parser except that it only performs a reduction with a grammar rule A w if the next symbol on the input stream is in the follow set of A. Such a parser can prevent certain shiftreduce and reduce reduce conflicts that occur in LR(0) parsers and it can therefore deal with more grammars. However, it still cannot parse all grammars that can be parsed by an LR(1) parser. A grammar that can be parsed by an SLR parser is called a SLR grammar. Example A grammar that can be parsed by an SLR parser [but not by an LR(0) parser] is the following: (1) E 1 E (2) E 1 Constructing the action and goto table as is done for LR(0) parsers would give the following item sets and tables: Item set 0 S E + E 1 E + E 1 Item set 1 E 1 E E 1 + E 1 E + E 1 Item set 2 S E Item set 3 E 1 E The action and goto tables:

Downloaded from http://himadri.cmsdu.org Page 5 action goto state 1 $ E 0 s1 2 1 s1/r2 r2 3 2 acc 3 r1 r1 As can be observed there is a shift reduce conflict for state 1 and terminal '1'. However, the follow set of E is { $ } so the reduce actions r1 and r2 are only valid in the column for $. The results are the following conflict less action and goto table: action goto state 1 $ E 0 s1 2 1 s1 r2 3 2 acc 3 r1 Canonical LR, is the most powerful and expensive. A canonical LR parser or LR(1) parser is an LR parser whose parsing tables are constructed in a similar way as with LR(0) parsers except that the items in the item sets also contain a follow, i.e., a terminal that is expected by the parser after the right hand side of the rule. Such an item for a rule A > BC is for example of the form A > B C, a which would mean that the parser has read a string corresponding to B and expects next a string corresponding to C followed by the terminal 'a'. LR(1) parsers can deal with a very large class of grammars but their parsing tables are often very big. This can often be solved by merging item sets if they are identical except for the follows, which results in so called LALR parsers. Look Ahead LR (LALR for short), is intermediate in power and cost between the other two. This method is often used in practice because the tables obtained by it are considerably smaller than the Canonical LR tables, yet most common syntactic constructs of programming languages can be expressed conveniently by an LALR grammar. References [1] http://www.economicexpert.com/a/canonical:lr:parser.html assessed on December 13, 2009 at 2235 Hrs [2] http://www.economicexpert.com/a/simple:lr:parser.htm assessed on December 13, 2009 at 2305 Hrs [3] http://lambda.uta.edu/cse5317/notes/node12.html assessed on December 13, 2009 at 2330 Hrs [4] http://203.208.166.84/dtanvirahmed/cse309n/cse309 4 Rest.ppt retrieved on December 11, 2009 at 1620 Hrs