Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Similar documents
Parsing III. (Top-down parsing: recursive descent & LL(1) )

Introduction to Parsing. Comp 412

CSCI312 Principles of Programming Languages

Parsing Part II (Top-down parsing, left-recursion removal)

Syntactic Analysis. Top-Down Parsing

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

Parsing II Top-down parsing. Comp 412

Syntax Analysis, III Comp 412

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CS 406/534 Compiler Construction Parsing Part I

CS 406/534 Compiler Construction Parsing Part II LL(1) and LR(1) Parsing

Syntax Analysis, III Comp 412

Bottom-up Parser. Jungsik Choi

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Chapter 4: LR Parsing

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Types of parsing. CMSC 430 Lecture 4, Page 1

Computer Science 160 Translation of Programming Languages

Parsing II Top-down parsing. Comp 412

3. Parsing. Oscar Nierstrasz

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

CS 314 Principles of Programming Languages

Chapter 3. Parsing #1

Syntactic Analysis. Top-Down Parsing. Parsing Techniques. Top-Down Parsing. Remember the Expression Grammar? Example. Example

Syntax Analysis, VI Examples from LR Parsing. Comp 412

Chapter 4. Lexical and Syntax Analysis

COMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

Top down vs. bottom up parsing

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CA Compiler Construction

4. Lexical and Syntax Analysis

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

4. Lexical and Syntax Analysis

Lexical and Syntax Analysis. Top-Down Parsing

CS 406/534 Compiler Construction Putting It All Together

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

Lexical and Syntax Analysis

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Syntax Analysis, VII One more LR(1) example, plus some more stuff. Comp 412 COMP 412 FALL Chapter 3 in EaC2e. target code.


EECS 6083 Intro to Parsing Context Free Grammars

Building a Parser Part III

Compiler Construction 2016/2017 Syntax Analysis

Introduction to Parsing

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Table-Driven Parsing

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

CSE 3302 Programming Languages Lecture 2: Syntax

Syntax Analysis Check syntax and construct abstract syntax tree

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Monday, September 13, Parsers

CS415 Compilers. Lexical Analysis

Wednesday, August 31, Parsers

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Introduction to Bottom-Up Parsing

CS 4120 Introduction to Compilers

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Lecture 8: Deterministic Bottom-Up Parsing

Context-free grammars

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Lexical and Syntax Analysis

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Syntax Analysis Part I

Compiler Construction: Parsing

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Lecture 7: Deterministic Bottom-Up Parsing

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

Defining syntax using CFGs

Computing Inside The Parser Syntax-Directed Translation, II. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Computing Inside The Parser Syntax-Directed Translation. Comp 412

Front End. Hwansoo Han

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Bottom-Up Parsing. Lecture 11-12

MIT Top-Down Parsing. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Compilers. Bottom-up Parsing. (original slides by Sam

LECTURE 7. Lex and Intro to Parsing

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

LR Parsing Techniques

Transcription:

Parsing III (Top-down parsing: recursive descent & LL(1) ) (Bottom-up parsing) CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use.

Left Factoring (Generality) Question Answer Example By eliminating left recursion and left factoring, can we transform an arbitrary CFG to a form where it meets the LL(1) condition? (and can be parsed predictively with a single token lookahead?) Given a CFG that doesn t meet the LL(1) condition, it is undecidable whether or not an equivalent LL(1) grammar exists. {a n 0 b n n 1} {a n 1 b 2n n 1} has no LL(1) grammar

Language that Cannot Be LL(1) Example {a n 0 b n n 1} {a n 1 b 2n n 1} has no LL(1) grammar G aab abbb A aab 0 Problem: need an unbounded number of a characters before you can determine whether you are in the A group or the B group. B abbb 1

Recursive Descent (Summary) 1. Build FIRST (and FOLLOW) sets 2. Massage grammar to have LL(1) condition a. Remove left recursion b. Left factor it 3. Define a procedure for each non-terminal a. Implement a case for each right-hand side b. Call procedures as needed for non-terminals 4. Add extra code, as needed a. Perform context-sensitive checking b. Build an IR to record the code Can we automate this process?

FIRST and FOLLOW Sets FIRST(α) For some α T NT, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α That is, x FIRST(α) iff α * x γ, for some γ FOLLOW(α) For some α NT, define FOLLOW(α) as the set of symbols that can occur immediately after α in a valid sentence. FOLLOW(S) = {EOF}, where S is the start symbol To build FIRST sets, we need FOLLOW sets

Building Top-down Parsers Given an LL(1) grammar, and its FIRST & FOLLOW sets Emit a routine for each non-terminal Nest of if-then-else statements to check alternate rhs s Each returns true on success and throws an error on false Simple, working (, perhaps ugly,) code This automatically constructs a recursive-descent parser Improving matters Nest of if-then-else statements may be slow Good case statement implementation would be better I don t know of a system that does this What about a table to encode the options? Interpret the table with a skeleton, as we did in scanning

Building Top-down Parsers Strategy Encode knowledge in a table Use a standard skeleton parser to interpret the table Example The non-terminal Factor has three expansions ( Expr ) or Identifier or Number Table might look like: Terminal Symbols + - * / Id. Num. EOF Non-terminal Symbols Factor 10 11 Error on `+ Reduce by rule 10 on `Id

Building Top Down Parsers Building the complete table Need a row for every NT & a column for every T Need a table-driven interpreter for the table

LL(1) Skeleton Parser token next_token() push EOF onto Stack push the start symbol, S, onto Stack TOS top of Stack loop forever if TOS = EOF and token = EOF then break & report success exit on success else if TOS is a terminal then if TOS matches token then pop Stack // recognized TOS token next_token() else report error looking for TOS else // TOS is a non-terminal if TABLE[TOS,token] is A B 1 B 2 B k then pop Stack // get rid of A push B k, B k-1,, B 1 // in that order else report error expanding TOS TOS top of Stack

Building Top Down Parsers Building the complete table Need a row for every NT & a column for every T Need an algorithm to build the table Filling in TABLE[X,y], X NT, y T 1. entry is the rule X β, if y FIRST(β ) 2. entry is the rule X ε if y FOLLOW(X ) and X ε G 3. entry is error if neither 1 nor 2 define it If any entry is defined multiple times, G is not LL(1) This is the LL(1) table construction algorithm

Recursive Descent in Object-Oriented Languages Shortcomings of Recursive Descent Too procedural No convenient way to build parse tree Solution Associate a class with each non-terminal symbol Allocated object contains pointer to the parse tree Class NonTerminal { public: NonTerminal(Scanner & scnr) { s = &scnr; tree = NULL; } virtual ~NonTerminal() { } virtual bool ispresent() = 0; TreeNode * absyntree() { return tree; } protected: } Scanner * s; TreeNode * tree;

Non-terminal Classes Class Expr : public NonTerminal { public: Expr(Scanner & scnr) : NonTerminal(scnr) { } virtual bool ispresent(); } Class EPrime : public NonTerminal { public: EPrime(Scanner & scnr, TreeNode * p) : NonTerminal(scnr) { exprsofar = p; } virtual bool ispresent(); protected: TreeNode * exprsofar; } // definitions for Term and TPrime Class Factor : public NonTerminal { public: Factor(Scanner & scnr) : NonTerminal(scnr) { }; virtual bool ispresent(); }

Implementation of ispresent bool Expr::isPresent() { Term * operand1 = new Term(*s); if (!operand1->ispresent()) return FALSE; Eprime * operand2 = new EPrime(*s, NULL); if (!operand2->ispresent()) // do nothing; } return TRUE;

Implementation of ispresent bool EPrime::isPresent() { token_type op = s->nexttoken(); if (op == PLUS op == MINUS) { s->advance(); Term * operand2 = new Term(*s); if (!operand2->ispresent()) throw SyntaxError(*s); Eprime * operand3 = new EPrime(*s, NULL); if (operand3->ispresent()); //do nothing } return TRUE; } else return FALSE;

Abstract Syntax Tree Construction bool Expr::isPresent() { // with semantic processing Term * operand1 = new Term(*s); if (!operand1->ispresent()) return FALSE; tree = operand1->absyntree(); EPrime * operand2 = new EPrime(*s, tree); if (operand2->ispresent()) tree = operand2->abssyntree(); } // here tree is either the tree for the Term // or the tree for Term followed by EPrime return TRUE;

Abstract Syntax Tree Construction bool EPrime::isPresent() { // with semantic processing token_type op = s->nexttoken(); if (op == PLUS op == MINUS) { s->advance(); Term * operand2 = new Term(*s); if (!operand2->ispresent()) throw SyntaxError(*s); TreeNode * t2 = operand2->abssyntree(); tree = new TreeNode(op, exprsofar, t2); } Eprime * operand3 = new Eprime(*s, tree); if (operand3->ispresent()) tree = operand3->abssyntree(); return TRUE; } else return FALSE;

Factor bool Factor::isPresent() { // with semantic processing token_type op = s->nexttoken(); } if (op == IDENTIFIER op == NUMBER) { tree = new TreeNode(op, s->tokenvalue()); s->advance(); return TRUE; } else if (op == LPAREN) { s->advance(); Expr * operand = new Expr(*s); if (!operand->ispresent()) throw SyntaxError(*s); if (s->nexttoken()!= RPAREN) throw SyntaxError(*s); s->advance(); tree = operand->abssyntree(); return TRUE; } else return FALSE;

Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production & try to match the input Bad pick may need to backtrack Some grammars are backtrack-free (predictive parsing) Bottom-up parsers (LR(1), operator precedence) Start at the leaves and grow toward root As input is consumed, encode possibilities in an internal state Start in a state valid for legal first tokens Bottom-up parsers handle a large class of grammars

Bottom-up Parsing (definitions) The point of parsing is to construct a derivation A derivation consists of a series of rewrite steps S γ 0 γ 1 γ 2 γ n 1 γ n sentence Each γ i is a sentential form If γ contains only terminal symbols, γ is a sentence in L(G) If γ contains 1 non-terminals, γ is a sentential form To get γ i from γ i 1, expand some NT A γ i 1 by using A β Replace the occurrence of A γ i 1 with β to get γ i In a leftmost derivation, it would be the first NT A γ i 1 A left-sentential form occurs in a leftmost derivation A right-sentential form occurs in a rightmost derivation

Bottom-up Parsing A bottom-up parser builds a derivation by working from the input sentence back toward the start symbol S S γ 0 γ 1 γ 2 γ n 1 γ n sentence bottom-up To reduce γ i to γ i 1 match some rhs β against γ i then replace β with its corresponding lhs, A. (assuming the production A β) In terms of the parse tree, this is working from leaves to root Nodes with no parent in a partial tree form its upper fringe Since each replacement of β with A shrinks the upper fringe, we call it a reduction. The parse tree need not be built, it can be simulated parse tree nodes = words + reductions

Finding Reductions Consider the simple grammar And the input string abbcde The trick is scanning the input and finding the next reduction The mechanism for doing this must be efficient

Finding Reductions (Handles) The parser must find a substring β of the tree s frontier that matches some production A β that occurs as one step in the rightmost derivation ( β A is in RRD) Informally, we call this substring β a handle Formally, A handle of a right-sentential form γ is a pair <A β,k> where A β P and k is the position in γ of β s rightmost symbol. If <A β,k> is a handle, then replacing β at k with A produces the right sentential form from which γ is derived in the rightmost derivation. Because γ is a right-sentential form, the substring to the right of a handle contains only terminal symbols the parser doesn t need to scan past the handle (very far)

Finding Reductions (Handles) Critical Insight (Theorem?) If G is unambiguous, then every right-sentential form has a unique handle. If we can find those handles, we can build a derivation! Sketch of Proof: 1 G is unambiguous rightmost derivation is unique 2 a unique production A β applied to derive γ i from γ i 1 3 a unique position k at which A β is applied 4 a unique handle <A β,k> This all follows from the definitions

Example (a very busy slide) The expression grammar Handles for rightmost derivation of x 2 * y This is the inverse of Figure 3.9 in EaC

Handle-pruning, Bottom-up Parsers The process of discovering a handle & reducing it to the appropriate left-hand side is called handle pruning Handle pruning forms the basis for a bottom-up parsing method To construct a rightmost derivation S γ 0 γ 1 γ 2 γ n 1 γ n w Apply the following simple algorithm for i n to 1 by 1 This takes 2n steps Find the handle <A i β i, k i > in γ i Replace β i with A i to generate γ i 1

Handle-pruning, Bottom-up Parsers One implementation technique is the shift-reduce parser push INVALID token next_token( ) repeat until (top of stack = Goal and token = EOF) if the top of the stack is a handle A β then // reduce β to A pop β symbols off the stack push A onto the stack else if (token EOF) then // shift push token token next_token( ) else // need to shift, but out of input report an error How do errors show up? failure to find a handle hitting EOF & needing to shift (final else clause) Either generates an error Figure 3.7 in EAC

Back to x - 2 * y Stack Input Handle Action $ id num * id none shift $ id num * id 1. Shift until the top of the stack is the right end of a handle 2. Find the left end of the handle & reduce

Back to x - 2 * y Stack Input Handle Action $ id num * id none shift $ id num * id 9,1 red. 9 $ Factor num * id 7,1 red. 7 $ Term num * id 4,1 red. 4 $ Expr num * id 1. Shift until the top of the stack is the right end of a handle 2. Find the left end of the handle & reduce

Back to x - 2 * y Stack Input Handle Action $ id num * id none shift $ id num * id 9,1 red. 9 $ Factor num * id 7,1 red. 7 $ Term num * id 4,1 red. 4 $ Expr num * id none shift $ Expr num * id none shift $ Expr num * id 1. Shift until the top of the stack is the right end of a handle 2. Find the left end of the handle & reduce

Back to x - 2 * y Stack Input Handle Action $ id num * id none shift $ id num * id 9,1 red. 9 $ Factor num * id 7,1 red. 7 $ Term num * id 4,1 red. 4 $ Expr num * id none shift $ Expr num * id none shift $ Expr num * id 8,3 red. 8 $ Expr Factor * id 7,3 red. 7 $ Expr Term * id 1. Shift until the top of the stack is the right end of a handle 2. Find the left end of the handle & reduce

Back to x - 2 * y Stack Input Handle Action $ id num * id none shift $ id num * id 9,1 red. 9 $ Factor num * id 7,1 red. 7 $ Term num * id 4,1 red. 4 $ Expr num * id none shift $ Expr num * id none shift $ Expr num * id 8,3 red. 8 $ Expr Factor * id 7,3 red. 7 $ Expr Term * id none shift $ Expr Term * id none shift $ Expr Term * id 1. Shift until the top of the stack is the right end of a handle 2. Find the left end of the handle & reduce

Back to x 2 * y Stack Input Handle Action $ id num * id none shift $ id num * id 9,1 red. 9 $ Factor num * id 7,1 red. 7 $ Term num * id 4,1 red. 4 $ Expr num * id none shift $ Expr num * id none shift $ Expr num * id 8,3 red. 8 $ Expr Factor * id 7,3 red. 7 $ Expr Term * id none shift $ Expr Term * id none shift $ Expr Term * id 9,5 red. 9 $ Expr Term * Factor 5,5 red. 5 $ Expr Term 3,3 red. 3 $ Expr 1,1 red. 1 $ Goal none accept 5 shifts + 9 reduces + 1 accept 1. Shift until the top of the stack is the right end of a handle 2. Find the left end of the handle & reduce

Example Goal Expr Expr Term Term Term * Fact. Fact. Fact. <id,y> <id,x> <num,2>

Shift-reduce Parsing Shift reduce parsers are easily built and easily understood A shift-reduce parser has just four actions Shift next word is shifted onto the stack Reduce right end of handle is at top of stack Locate left end of handle within the stack Pop handle off stack & push appropriate lhs Accept stop parsing & report success Error call an error reporting/recovery routine Accept & Error are simple Shift is just a push and a call to the scanner Reduce takes rhs pops & 1 push Handle finding is key handle is on stack finite set of handles use a DFA! If handle-finding requires state, put it in the stack 2x work

An Important Lesson about Handles To be a handle, a substring of a sentential form γ must have two properties: It must match the right hand side β of some rule A β There must be some rightmost derivation from the goal symbol that produces the sentential form γ with A β as the last production applied Simply looking for right hand sides that match strings is not good enough Critical Question: How can we know when we have found a handle without generating lots of different derivations? Answer: we use look ahead in the grammar along with tables produced as the result of analyzing the grammar. LR(1) parsers build a DFA that runs over the stack & finds them

An Important Lesson about Handles To be a handle, a substring of a sentential form γ must have two properties: It must match the right hand side β of some rule A β There must be some rightmost derivation from the goal symbol that produces the sentential form γ with A β as the last production applied We have seen that simply looking for right hand sides that match strings is not good enough Critical Question: How can we know when we have found a handle without generating lots of different derivations? Answer: we use look ahead in the grammar along with tables produced as the result of analyzing the grammar. o There are a number of different ways to do this. o We will look at two: operator precedence and LR parsing