Prelude: COMP 181, Tufts University Computer Science

COMP 181 Lecture: Top-down Parsing (September)

Prelude
  What is the Tufts mascot? Jumbo the elephant.
  Why?
    P. T. Barnum was an original trustee of Tufts and donated money for a natural history museum on campus: the Barnum Museum, later Barnum Hall.
    Jumbo was a famous circus elephant. After Jumbo died, he was stuffed and donated to Tufts.
    A fire later destroyed Barnum Hall, and Jumbo with it.

Last time
  Finished scanning
    Produces a stream of tokens
    Removes things we don't care about, like white space and comments
  Context-free grammars
    Formal description of language syntax
    Deriving strings using a CFG
    Depicting a derivation as a parse tree

Grammar issues
  Often: more than one way to derive a string
    Why is this a problem?
    Parsing: is the string a member of L(G)?
    We want more than a yes or no answer
  Key:
    Represent the derivation as a parse tree
    We want the structure of the parse tree to capture the meaning of the sentence

  Grammar:
    1  expr → expr op expr
    2       | number
    3       | id
    4  op   → +
    5       | -
    6       | *
    7       | /

Parse tree: x - 2 * y
  Rightmost derivation:

    Rule  Sentential form
    -     expr
    1     expr op expr
    3     expr op <id,y>
    6     expr * <id,y>
    1     expr op expr * <id,y>
    2     expr op <num,2> * <id,y>
    5     expr - <num,2> * <id,y>
    3     <id,x> - <num,2> * <id,y>

  [Figure: parse tree for x - 2 * y]

Abstract syntax tree
  The parse tree contains extra junk
    Eliminate intermediate nodes
    Move operators up to parent nodes
    Result: abstract syntax tree
  [Figure: parse tree and abstract syntax tree for x - 2 * y]

Left vs right derivations
  Two derivations of x - 2 * y

  Leftmost derivation:

    Rule  Sentential form
    -     expr
    1     expr op expr
    3     <id,x> op expr
    5     <id,x> - expr
    1     <id,x> - expr op expr
    2     <id,x> - <num,2> op expr
    6     <id,x> - <num,2> * expr
    3     <id,x> - <num,2> * <id,y>

  Rightmost derivation:

    Rule  Sentential form
    -     expr
    1     expr op expr
    3     expr op <id,y>
    6     expr * <id,y>
    1     expr op expr * <id,y>
    2     expr op <num,2> * <id,y>
    5     expr - <num,2> * <id,y>
    3     <id,x> - <num,2> * <id,y>

Derivations
  One captures meaning, the other doesn't
  [Figure: trees for x - 2 * y from the leftmost derivation and from the rightmost derivation]

With precedence
  Last time: ways to force the right tree shape
  Add productions to represent precedence

  Original grammar:
    expr → expr op expr | number | id
    op   → + | - | * | /

  Grammar with precedence:
    1  expr   → expr + term
    2         | expr - term
    3         | term
    4  term   → term * factor
    5         | term / factor
    6         | factor
    7  factor → number
    8         | id

  [Figure: parse tree for x - 2 * y using the precedence grammar]

Parsing
  What is parsing?
    Discovering the derivation of a string, if one exists
    Harder than generating strings (not surprisingly)
  Two major approaches
    Top-down parsing
    Bottom-up parsing
  They don't work on all context-free grammars
    Properties of the grammar determine parseability
    Our goal: make parsing efficient
    We may be able to transform a grammar

Two approaches
  Top-down parsers: LL(1), recursive descent
    Start at the root of the parse tree and grow toward the leaves
    Pick a production and try to match the input
    Bad pick: may need to backtrack
  Bottom-up parsers: LR(1), operator precedence
    Start at the leaves and grow toward the root
    As input is consumed, encode possible parse trees in an internal state (similar to our NFA → DFA conversion)
    Bottom-up parsers handle a large class of grammars

Grammars and parsers
  LL(1) parsers
    Left-to-right input
    Leftmost derivation
    1 symbol of lookahead
    Grammars that these can handle are called LL(1) grammars
  LR(1) parsers
    Left-to-right input
    Rightmost derivation
    1 symbol of lookahead
    Grammars that these can handle are called LR(1) grammars
  Also: LL(k), LR(k), SLR, LALR, ...

Top-down parsing
  Start with the root of the parse tree
    Root of the tree: node labeled with the start symbol
  Algorithm:
    Repeat until the fringe of the parse tree matches the input string
      At a node A, select a production for A
      Add a child node for each symbol on the rhs
      If a terminal symbol is added that doesn't match, backtrack
      Find the next node to be expanded (a nonterminal)
  Done when:
    Leaves of the parse tree match the input string (success)
    All productions are exhausted in backtracking (failure)

Example
  Expression grammar (with precedence):
    1  expr   → expr + term
    2         | expr - term
    3         | term
    4  term   → term * factor
    5         | term / factor
    6         | factor
    7  factor → number
    8         | id
  Input string: x - 2 * y

Example
  (↑ marks the current position in the input stream)

    Rule  Sentential form   Input string
    -     expr              ↑x - 2 * y
    1     expr + term       ↑x - 2 * y
    3     term + term       ↑x - 2 * y
    6     factor + term     ↑x - 2 * y
    8     <id> + term       ↑x - 2 * y
    -     <id,x> + term     x ↑- 2 * y

Backtracking
  Same derivation as above. After matching <id,x>, the next terminal in the sentential form is + but the next input token is -.
  Problem:
    Can't match the next terminal
    We guessed wrong at an earlier step
  Undo all these productions:
    Roll back the productions
    Choose a different production for expr
    Continue

Retrying

    Rule  Sentential form     Input string
    -     expr                ↑x - 2 * y
    2     expr - term         ↑x - 2 * y
    3     term - term         ↑x - 2 * y
    6     factor - term       ↑x - 2 * y
    8     <id> - term         ↑x - 2 * y
    -     <id,x> - term       x ↑- 2 * y
    6     <id,x> - factor     x - ↑2 * y
    7     <id,x> - <num>      x - ↑2 * y

  Problem:
    More input to read
    Another cause of backtracking

Successful parse

    Rule  Sentential form               Input string
    -     expr                          ↑x - 2 * y
    2     expr - term                   ↑x - 2 * y
    3     term - term                   ↑x - 2 * y
    6     factor - term                 ↑x - 2 * y
    8     <id> - term                   ↑x - 2 * y
    -     <id,x> - term                 x ↑- 2 * y
    4     <id,x> - term * factor        x - ↑2 * y
    6     <id,x> - factor * factor      x - ↑2 * y
    7     <id,x> - <num> * factor       x - ↑2 * y
    -     <id,x> - <num,2> * factor     x - 2 ↑* y
    8     <id,x> - <num,2> * <id>       x - 2 * ↑y

  All terminals match: we're done

Other possible parses

    Rule  Sentential form                 Input string
    -     expr                            ↑x - 2 * y
    1     expr + term                     ↑x - 2 * y
    1     expr + term + term              ↑x - 2 * y
    1     expr + term + term + term       ↑x - 2 * y
    1     ...                             ↑x - 2 * y

  Problem: termination
    A wrong choice leads to infinite expansion (more importantly: without consuming any input!)
    May not be as obvious as this
    Our grammar is left recursive

Left recursion
  Formally, a grammar is left recursive if there exists a nonterminal A such that A →* A α (for some string of symbols α)
  What does →* mean? Zero or more derivation steps, so the recursion may be indirect:
    A → B x
    B → A y
  Bad news: top-down parsers cannot handle left recursion
  Good news: we can systematically eliminate left recursion

Notation
  Nonterminals: capital letters: A, B, C
  Terminals: lowercase, underlined: x, y, z
  Some mix of terminals and nonterminals: Greek letters: α, β, γ
  Example:
    A → B + x
    A → B α, with α = + x

Eliminating left recursion
  Consider this grammar:
    foo → foo α
        | β
    The language is β followed by zero or more α
  Rewrite as:
    foo → β bar
    bar → α bar
        | ε
    bar is a new nonterminal
    The foo production gives you one β
    The two bar productions give you zero or more α
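The foo/bar rewrite above can be sketched in a few lines of Python. This is a minimal sketch, not code from the lecture: the function name, the list-of-lists grammar representation, and the use of an empty list for ε are all assumptions made for illustration. It handles only immediate left recursion (A → A α), not the indirect case.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives, new_name):
    """Rewrite  A -> A a1 | ... | b1 | ...  as  A -> b1 A' | ...,  A' -> a1 A' | ... | epsilon.

    `alternatives` is a list of right-hand sides, each a list of symbols.
    An empty list stands for epsilon. `new_name` is the fresh nonterminal (A').
    """
    # Right-hand sides that start with the nonterminal itself: keep only the tail (alpha).
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == nonterminal]
    # All other right-hand sides (beta).
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != nonterminal]
    if not alphas:
        return {nonterminal: alternatives}  # nothing to do
    return {
        nonterminal: [beta + [new_name] for beta in betas],
        new_name: [alpha + [new_name] for alpha in alphas] + [[]],
    }


# The grammar from the slide: foo -> foo α | β
print(eliminate_immediate_left_recursion("foo", [["foo", "α"], ["β"]], "bar"))

# The expr rules of the expression grammar
print(eliminate_immediate_left_recursion(
    "expr", [["expr", "+", "term"], ["expr", "-", "term"], ["term"]], "expr2"))
```

Applied to the expr rules, the sketch yields expr → term expr2 and expr2 → + term expr2 | - term expr2 | ε.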

Back to expressions
  Two cases of left recursion:
    expr → expr + term
         | expr - term
         | term
    term → term * factor
         | term / factor
         | factor
  Transform as follows:
    expr  → term expr2
    expr2 → + term expr2
          | - term expr2
          | ε
    term  → factor term2
    term2 → * factor term2
          | / factor term2
          | ε

Eliminating left recursion
  Resulting grammar:
     1  expr   → term expr2
     2  expr2  → + term expr2
     3         | - term expr2
     4         | ε
     5  term   → factor term2
     6  term2  → * factor term2
     7         | / factor term2
     8         | ε
     9  factor → number
    10         | id
  All right recursive
  Retains the original language and associativity
  Not as intuitive to read
  Top-down parser
    Will always terminate
    May still backtrack
  There's a lovely algorithm to do this automatically, which we will skip

Top-down parsers
  Problem: left recursion
  Solution: technique to remove it
  What about backtracking?
    The current algorithm is brute force
  Problem: how to choose the right production?
    Idea: use the next input token (duh)
    How? Look at our right-recursive grammar

Right-recursive grammar
  (rules 1 to 10 as above)
  Two productions with no choice at all
  All other productions are uniquely identified by a terminal symbol at the start of the RHS
  We can choose the right production by looking at the next input symbol
    This is called lookahead
    BUT, this can be tricky

Lookahead
  Goal: avoid backtracking
    Look at future input symbols
    Use extra context to make the right choice
  How much lookahead is needed?
    In general, an arbitrary amount is needed for the full class of context-free grammars
    Use a fancy-dancy algorithm: the CYK algorithm, O(n^3)
  Fortunately,
    Many CFGs can be parsed with limited lookahead
    Covers most programming languages (not C++ or Perl)

Top-down parsing
  Goal: given productions A → α | β, the parser should be able to choose between α and β
  Trying to match A:
    How can the next input token help us decide?
  Solution: FIRST sets (almost a solution)
    Informally: FIRST(α) is the set of tokens that could appear as the first symbol in a string derived from α
    Def: x is in FIRST(α) iff α →* x γ, for some γ
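One common way to compute FIRST sets is a fixed-point iteration: start with empty sets and keep adding symbols until nothing changes. The sketch below applies it to the right-recursive expression grammar. It is an illustration under stated assumptions, not necessarily the algorithm the course presents: the dictionary representation, the names `compute_first` and `first_of_sequence`, and the use of an empty list for an ε production are all made up for this example.

```python
EPSILON = "ε"

# Right-recursive expression grammar; [] is an epsilon production.
GRAMMAR = {
    "expr":   [["term", "expr2"]],
    "expr2":  [["+", "term", "expr2"], ["-", "term", "expr2"], []],
    "term":   [["factor", "term2"]],
    "term2":  [["*", "factor", "term2"], ["/", "factor", "term2"], []],
    "factor": [["number"], ["id"]],
}


def first_of_sequence(symbols, first):
    """FIRST of a string of symbols, given the FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        # A terminal's FIRST set is just the terminal itself.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result  # this symbol cannot vanish, so stop here
    result.add(EPSILON)  # every symbol can derive epsilon (or the string is empty)
    return result


def compute_first(grammar):
    """Iterate until no FIRST set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]))
```

For this grammar, every alternative of expr2 and term2 starts with a distinct terminal or is ε, which is what makes the choice by lookahead possible.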

Top-down parsing
  Building FIRST sets
    We'll look at this algorithm later
  The LL(1) property
    Given A → α and A → β, we would like:
      FIRST(α) ∩ FIRST(β) = ∅
    The parser can make the right choice by looking at one lookahead token
    ...almost...

Top-down parsing
  What about ε productions?
    They complicate the definition of LL(1)
    Consider A → α and A → β, where α may be empty
    In this case there is no symbol to identify α
  Example:
    1  A → x B
    2    | y C
    3    | ε
    What is FIRST(3)? = { ε }
    What lookahead symbol tells us we are matching production 3?

Top-down parsing
  If A was empty
    What will the next symbol be?
    It must be one of the symbols that immediately follow an A
  Solution
    Build a FOLLOW set for each production with ε
    Extra condition for LL: FIRST(β) must be disjoint from FIRST(α) and FOLLOW(A)

FOLLOW sets
  Example:
    1  A → x B
    2    | y C
    3    | ε
    4  E → A z
    FIRST(1) = { x }
    FIRST(2) = { y }
    FIRST(3) = { ε }
  What can follow A?
    Look at the context of all uses of A
    FOLLOW(A) = { z }
  Now we can uniquely identify each production:
    If we are trying to match an A and the next token is z, then we matched production 3

More on FIRST and FOLLOW
  Notice:
    FIRST and FOLLOW may be sets
    FIRST may contain ε in addition to other symbols
  Example:
    [Grammar fragment; arrows and some symbols lost in extraction: A B C, B y, E A z, F A w]
    FIRST(1) = { x, y, ε }
    FOLLOW(A) = { z, w }
  Question: when would we care about FOLLOW(A)?
  Answer: if FIRST(C) contains ε

LL(1) property
  Including ε productions
    FOLLOW(A) = the set of terminal symbols that can immediately follow A
    Def: FIRST+(A → α) is
      FIRST(α) ∪ FOLLOW(A), if ε ∈ FIRST(α)
      FIRST(α), otherwise
    Def: a grammar is LL(1) iff A → α and A → β imply
      FIRST+(A → α) ∩ FIRST+(A → β) = ∅
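The FIRST+ definition and the LL(1) test can be checked mechanically. The sketch below takes the FIRST set of each alternative and FOLLOW(A) as given inputs rather than computing them. The function names are hypothetical, and the example values assume the small grammar A → x B | y C | ε with FOLLOW(A) = { z }.

```python
EPSILON = "ε"


def first_plus(first_alpha, follow_a):
    """FIRST+(A -> alpha): add FOLLOW(A) only when alpha can derive epsilon."""
    if EPSILON in first_alpha:
        return set(first_alpha) | set(follow_a)
    return set(first_alpha)


def is_ll1(firsts_of_alternatives, follow_a):
    """True if the FIRST+ sets of all alternatives of A are pairwise disjoint."""
    plus_sets = [first_plus(f, follow_a) for f in firsts_of_alternatives]
    for i in range(len(plus_sets)):
        for j in range(i + 1, len(plus_sets)):
            if plus_sets[i] & plus_sets[j]:
                return False
    return True


# Assumed example: A -> x B | y C | epsilon, with FOLLOW(A) = { z }
firsts = [{"x"}, {"y"}, {EPSILON}]
print(is_ll1(firsts, {"z"}))   # alternatives are distinguishable
print(is_ll1(firsts, {"x"}))   # x could start alternative 1 or follow an empty A
```

The second call shows why FOLLOW matters: if x can also follow A, seeing x no longer tells the parser whether to expand A → x B or A → ε.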

LL(1) property
  Question
    Can there be two rules A → α and A → β in an LL(1) grammar such that ε ∈ FIRST(α) and ε ∈ FIRST(β)?
  Answer
    No. FOLLOW is defined per nonterminal, so both rules share FOLLOW(A). Both FIRST+ sets would then contain ε and all of FOLLOW(A), so they could not be disjoint.

Parsing LL(1) grammar
  Given an LL(1) grammar
    Code: a simple, fast routine to recognize each production
    Given A → β1 | β2 | β3, with FIRST+(βi) ∩ FIRST+(βj) = ∅ for all i != j

      /* find rule for A */
      if (current token ∈ FIRST+(β1))
        select A → β1
      else if (current token ∈ FIRST+(β2))
        select A → β2
      else if (current token ∈ FIRST+(β3))
        select A → β3
      else
        report an error and return false

Predictive parsing
  The parser can predict the correct expansion
    Using lookahead and FIRST and FOLLOW sets
  Two kinds of predictive parsers
    Recursive descent: often handwritten
    Table-driven: generate tables from FIRST and FOLLOW sets

Recursive descent
   1  goal   → expr
   2  expr   → term expr2
   3  expr2  → + term expr2
   4         | - term expr2
   5         | ε
   6  term   → factor term2
   7  term2  → * factor term2
   8         | / factor term2
   9         | ε
  10  factor → number
  11         | id
  12         | ( expr )
  This produces a parser with six mutually recursive routines:
    Goal, Expr, Expr2, Term, Term2, Factor
  Each recognizes one NT or T
  The "descent" refers to the direction in which the parse tree is built.

Example code
  Goal symbol:

    main()
      /* Match goal → expr */
      tok = nextToken();
      if (expr() && tok == EOF)
        then proceed to next step;
        else return false;

  Top-level expression:

    expr()
      /* Match expr → term expr2 */
      if (term() && expr2())
        return true;
      else return false;

Example code
  Match expr2:

    expr2()
      /* Match expr2 → + term expr2 */
      /* Match expr2 → - term expr2 */
      if (tok == '+' or tok == '-')
        tok = nextToken();
        if (term())
          then return expr2();
          else return false;
      /* Match expr2 → empty */
      return true;

  Check FIRST and FOLLOW sets to distinguish the cases.
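The routines above translate almost line for line into a runnable recognizer. The following Python sketch covers the 12-rule grammar with one method per nonterminal. The tokenizer, the class name `Parser`, and the helper `recognizes` are assumptions added for illustration; the lecture does not specify them. For the ε alternatives the sketch simply returns true and lets a later check reject bad input, rather than testing FOLLOW explicitly.

```python
import re

# Hypothetical tokenizer: integers, identifiers, and single-character operators.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(text):
    tokens = []
    for number, ident, other in TOKEN_RE.findall(text):
        if number:
            tokens.append("number")
        elif ident:
            tokens.append("id")
        elif other.strip():          # skip stray whitespace
            tokens.append(other)
    tokens.append("EOF")
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    @property
    def tok(self):
        return self.tokens[self.pos]

    def next_token(self):
        self.pos += 1

    def goal(self):                  # goal -> expr
        return self.expr() and self.tok == "EOF"

    def expr(self):                  # expr -> term expr2
        return self.term() and self.expr2()

    def expr2(self):                 # expr2 -> + term expr2 | - term expr2 | epsilon
        if self.tok in ("+", "-"):
            self.next_token()
            return self.term() and self.expr2()
        return True                  # epsilon alternative

    def term(self):                  # term -> factor term2
        return self.factor() and self.term2()

    def term2(self):                 # term2 -> * factor term2 | / factor term2 | epsilon
        if self.tok in ("*", "/"):
            self.next_token()
            return self.factor() and self.term2()
        return True                  # epsilon alternative

    def factor(self):                # factor -> number | id | ( expr )
        if self.tok == "(":
            self.next_token()
            if self.expr() and self.tok == ")":
                self.next_token()
                return True
            return False             # syntax error: expecting ')'
        if self.tok in ("number", "id"):
            self.next_token()
            return True
        return False


def recognizes(text):
    return Parser(text).goal()


print(recognizes("x - 2 * y"))
print(recognizes("x - * y"))
```

Because the grammar is LL(1), each method decides what to do from the current token alone and never backtracks.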

Example code

    factor()
      /* Match factor → ( expr ) */
      if (tok == '(')
        tok = nextToken();
        if (expr() && tok == ')')
          tok = nextToken();
          return true;
        else
          syntax error: expecting ')'
          return false;
      /* Match factor → num */
      if (tok is a num)
        tok = nextToken();
        return true;
      /* Match factor → id */
      if (tok is an id)
        tok = nextToken();
        return true;
      return false;

Top-down parsing
  So far:
    Gives us a yes or no answer
    We want to build the parse tree
    How?
  Add actions to matching routines
    Create a node for each production
    How do we assemble the tree?

Building a parse tree
  Notice:
    Recursive calls match the shape of the tree
    [Figure: call chain main, expr, term, factor]
  Idea: use a stack
    Each routine:
      Pops off the children it needs
      Creates its own node
      Pushes that node back on the stack

Building a parse tree
  With stack operations:

    expr()
      /* Match expr → term expr2 */
      if (term() && expr2())
        expr2_node = pop();
        term_node = pop();
        expr_node = new Node(term_node, expr2_node);
        push(expr_node);
        return true;
      else return false;

Next time
  Finish top-down parsing
    Table-driven parsers
    Building FIRST and FOLLOW sets
  Start bottom-up parsing
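The stack idea can be tried out with a small Python sketch. Each routine pushes the tokens it consumes as leaves, then pops its children and pushes its own node, as described above. Everything concrete here is an assumption for illustration: the class name `TreeParser`, the tuple-based node representation, the helpers `shift_leaf`, `reduce` and `leaves`, and the choice to pass in a pre-split token list with `None` as the end marker.

```python
class TreeParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + [None]   # None marks end of input
        self.pos = 0
        self.stack = []

    def tok(self):
        return self.tokens[self.pos]

    def shift_leaf(self):
        """Consume the current token and push it as a leaf."""
        self.stack.append(self.tokens[self.pos])
        self.pos += 1

    def reduce(self, label, count):
        """Pop `count` children, create a node, push it back."""
        start = len(self.stack) - count
        children = self.stack[start:]
        del self.stack[start:]
        self.stack.append((label, children))

    def expr(self):                            # expr -> term expr2
        if self.term() and self.expr2():
            self.reduce("expr", 2)
            return True
        return False

    def expr2(self):                           # + term expr2 | - term expr2 | epsilon
        if self.tok() in ("+", "-"):
            self.shift_leaf()
            if self.term() and self.expr2():
                self.reduce("expr2", 3)
                return True
            return False
        self.reduce("expr2", 0)                # epsilon: node with no children
        return True

    def term(self):                            # term -> factor term2
        if self.factor() and self.term2():
            self.reduce("term", 2)
            return True
        return False

    def term2(self):                           # * factor term2 | / factor term2 | epsilon
        if self.tok() in ("*", "/"):
            self.shift_leaf()
            if self.factor() and self.term2():
                self.reduce("term2", 3)
                return True
            return False
        self.reduce("term2", 0)
        return True

    def factor(self):                          # number | id | ( expr )
        if self.tok() == "(":
            self.shift_leaf()
            if self.expr() and self.tok() == ")":
                self.shift_leaf()
                self.reduce("factor", 3)
                return True
            return False
        if isinstance(self.tok(), str) and self.tok().isalnum():
            self.shift_leaf()                  # a number or an identifier
            self.reduce("factor", 1)
            return True
        return False


def parse(tokens):
    """Return the parse tree, or None if the input is not in the language."""
    parser = TreeParser(tokens)
    if parser.expr() and parser.tok() is None:
        return parser.stack.pop()
    return None


def leaves(node):
    """The fringe of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1] for leaf in leaves(child)]


tree = parse("x - 2 * y".split())
print(tree)
print(leaves(tree))
```

On success the fringe of the resulting tree is exactly the input token sequence, which matches the stopping condition of the top-down algorithm.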