Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

Similar documents
EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Context-free grammars (CFG s)

Compiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava.

Syntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree.

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

Top down vs. bottom up parsing

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Downloaded from Page 1. LR Parsing

Bottom-Up Parsing. Lecture 11-12

Lecture 7: Deterministic Bottom-Up Parsing

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Lecture 8: Deterministic Bottom-Up Parsing

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Building Compilers with Phoenix

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Context-free grammars

4. Lexical and Syntax Analysis

Bottom-Up Parsing. Lecture 11-12

Revisit the example. Transformed DFA 10/1/16 A B C D E. Start

UNIT III & IV. Bottom up parsing

Compiler Construction: Parsing

4. Lexical and Syntax Analysis

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

3. Parsing. Oscar Nierstrasz

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

CS 314 Principles of Programming Languages

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Chapter 2 :: Programming Language Syntax

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Compilers. Bottom-up Parsing. (original slides by Sam

Chapter 3: Lexing and Parsing

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite

Parsing Expression Grammars and Packrat Parsing. Aaron Moss

2068 (I) Attempt all questions.

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

LL(k) Compiler Construction. Top-down Parsing. LL(1) parsing engine. LL engine ID, $ S 0 E 1 T 2 3

LECTURE 7. Lex and Intro to Parsing

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

COP 3402 Systems Software Syntax Analysis (Parser)

Wednesday, August 31, Parsers

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Parsing II Top-down parsing. Comp 412

Chapter 4. Lexical and Syntax Analysis

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

Syntax Analysis Part I

LL(k) Compiler Construction. Choice points in EBNF grammar. Left recursive grammar

Monday, September 13, Parsers

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

CJT^jL rafting Cm ompiler

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

Syntax-Directed Translation. Lecture 14

Optimizing Finite Automata

Compilers. Predictive Parsing. Alex Aiken

Chapter 3. Parsing #1

Using an LALR(1) Parser Generator

QUESTIONS RELATED TO UNIT I, II And III

CSE431 Translation of Computer Languages

Syntax. In Text: Chapter 3

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser


Question Bank. 10CS63:Compiler Design

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

LR Parsing Techniques

Principles of Programming Languages COMP251: Syntax and Grammars

Bottom-Up Parsing II. Lecture 8

shift-reduce parsing

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

CS606- compiler instruction Solved MCQS From Midterm Papers

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Compiler Construction

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

A programming language requires two major definitions A simple one pass compiler

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

CSCI312 Principles of Programming Languages

Lexical and Syntax Analysis

Transcription:

Parsing: continued David Notkin Autumn 2008 Parsing Algorithms Earley s algorithm (1970) works for all CFGs O(N 3 ) worst case performance O(N 2 ) for unambiguous grammars Based on dynamic programming, used primarily for computational linguistics Different parsing algorithms generally place various restrictions on the grammar of the language to be parsed Top-down Bottom-up Recursive descent LL LR LALR SLR CYK GLR Simple precedence parser Bounded context ACM digital library returned 5600+ articles matching parsing algorithm Google Scholar almost 34,000 CSE401 Au08 2 Top Down Parsing Build parse tree from the top (start symbol) down to leaves (terminals) Basic issue: when expanding a nonterminal, which right hand side should be selected? Solution: look at input tokens to decide Stmts ::= Call Assign If While Call ::= Id ( Expr {,Expr ) Assign ::= Id := Expr ; If ::= if Test then Stmts end if Test then Stmts else Stmts end While ::= while Test do Stmts end CSE401 Au08 3 Predictive Parser Predictive parser: top-down parser that uses at most the next k tokens to select production (the lookahead) Efficient: no backtracking needed, linear time to parse Implementations (analogous to lexing) recursive-descent parser each nonterminal parsed by a procedure call other procedures to parse sub-nonterminals, recursively typically written by hand table-driven parser push-down automata: essentially a table-driven FSA, plus stack to do recursive calls typically generated by a tool from a grammar specification CSE401 Au08 4 1

LL(k) Grammars LL(k) kfind Left-to-right tokens lookahead derivation scan Can construct predictive parser automatically and easily if grammar is LL(k) Left-to-right scan of input, finds leftmost derivation k tokens of look ahead needed Some restrictions including no ambiguity no common prefixes of length k: If ::= if Test then Stmts end if Test then Stmts else Stmts end no left recursion (e.g., E ::= E Op E...) Restrictions guarantee that, given k input tokens, can always select correct right hand side to expand nonterminal. What if there are common prefixes? Left factor common prefixes to eliminate them create new nonterminal for different suffixes delay choice until after common prefix Before If ::= if Test then Stmts end if Test then Stmts else Stmts end After If ::= if Test then Stmts IfCont IfCont ::= end else Stmts end CSE401 Au08 5 CSE401 Au08 6 Left recursion? Rewrite Table-driven predictive parser Before E ::= E + T T T ::= T * F F F ::= id... After E ::= T ECon ECon ::= + T ECon T ::= F TCon TCon ::= * F TCon F ::= id... Automatically compute PREDICT table from grammar PREDICT(nonterminal,input-token) => right hand side May not be as clear; can sugar it E ::= T { + T T ::= F { * F F ::= id ( E ) Greater distance from concrete syntax to abstract syntax CSE401 Au08 7 CSE401 Au08 8 2

Compute PREDICT table Example for you to do: if you want Compute FIRST set for each right hand side All tokens that can appear first in a derivation from that right hand side In case right hand side can be empty Compute FOLLOW set for each non-terminal All tokens that can appear immediately after that non-terminal in a derivation Compute FIRST and FOLLOW sets mutually recursively PREDICT then depends on the FIRST set CSE401 Au08 9 CSE401 Au08 10 PREDICT and LL(1) If PREDICT table has at most one entry per cell Then the grammar is LL(1) There is always exactly one right choice So it s fast to parse and easy to implement If multiple entries in each cell Ex: common prefixes, left recursion, ambiguity Can rewrite grammar (sometimes) Can patch table manually, if you know what to do Or can use more powerful parsing technique CSE401 Au08 11 Top down implementation For years the 401 compiler was a topdown predictive parser, implemented by a method for each nonterminal We have shifted to a bottom-up, automatically generated parser But if you re going to build a simple one, this is usually best Examples from http://en.wikibooks.org/ wiki/compiler_construct ion Helper functions on right int accept(symbol s) { if (sym == s) { getsym(); return 1; return 0; int expect(symbol s) { if (accept(s)) return 1; error("expect: unexpected symbol"); return 0; CSE401 Au08 12 3

Example method void factor(void) { if (accept(ident)) { ; else if (accept(number)) { ; else if (accept(lparen)) { expression(); expect(rparen); else { error("factor: syntax error"); getsym(); CSE401 Au08 13 Example method void statement(void) { if (accept(ident)) { expect(becomes); expression(); else if (accept(ifsym)) { condition(); expect(thensym); statement(); else if (accept(whilesym)) { condition(); expect(dosym); statement(); CSE401 Au08 14 Bottom up parsing Shift-reduce strategy Construct parse tree for input from leaves up reducing a string of tokens to single start symbol by inverting productions Bottom-up parsing is more general than top-down parsing and just as efficient generally preferred in practice int * int + int int * T + int T + int T + T T + E E T ::= int T ::= int * T T ::= int E ::= T E ::= T + E Read the productions found by bottom-up parse bottom to top; this is a rightmost derivation! CSE401 Au08 15 read ( shift ) tokens until the right hand side of correct production has been seen reduce handle to nonterminal, then continue done when all input read and reduced to start nonterminal xyzabcdef ^ A ::= bc.d CSE401 Au08 16 4

LR(k) LR(k) parsing Left-to-right scan of input, rightmost derivation k tokens of look ahead Strictly more general than LL(k) Gets to look at whole right hand side of production before deciding what to do, not just first k tokens Can handle left recursion and common prefixes As efficient as any top-down parsing Complex to implement Generally need automatic tools to construct parser from grammar CSE401 Au08 17 LR Parsing Tables Construct parsing tables implementing a FSA with a stack rows: states of parser columns: token(s) of lookahead entries: action of parser shift, goto state X reduce production X ::= RHS accept error Algorithm to construct FSA similar to algorithm to build DFA from NFA each state represents set of possible places in parsing LR(k) algorithm may build huge tables CSE401 A8 18 Questions? Ada language/compiler color US DoD wanted (roughly) a single, high-level programming language They wrote requirements for this language and received 14 bids (1977) Four semi-finalists (1978): green (Cii), red for (Intermetrics), blue (SofTech), yellow for (SRI) Two finalists: green and red requirements finalized as Steelman document CSE401 Au08 19 CSE401 Au08 20 5

General syntax: examples from Steelman York Ada compiler (c. 1986) Facts and Figures About the York Ada Compiler (Wand et al.) 2A. Character Set. The full set of character graphics that may be used in source problems shall be given in the language definition. Every source program shall also have a representation that uses only the following 55 character subset of the ASClI graphics: 2B. Grammar. The language should have a simple, uniform, and easily parsed grammar and lexical structure. The language shall have free form syntax and should use familiar notations where such use does not conflict with other goals. 2D. Other Syntactic Issues. Multiple occurrences of a language defined symbol appearing in the same context shall not have essentially different meanings. 2E. Mnemonic identifiers. Mnemonically significant identifiers shall be allowed. There shall be a break character for use within identifiers. The language and its translators shall not permit identifiers or reserved words to be abbreviated. 2G. Numeric Literals. There shall be built-in decimal literals. There shall be no implicit truncation or rounding of integer and fixed point literals Written in C About 80 KLOC for compiler Front-end about 57 KLOC, code gen about 20 KLOC, VAX-specific code gen about 3 KLOC 7 KLOC for run-time It is difficult to make an accurate estimate of the time taken to write the compiler because the compiler writers had other demands on their time (completing PhDs, teaching, etc.). Fourteen individuals have been involved at various times during the project and have contributed approximately 20 man years to the design and construction of the software. The money spent directly to support the construction of the compiler was [approximately $340k], however this included neither the salaries of four members of the project nor the cost of computer time (we used approximately 30% of a VAX-11/780 over a five year period). CSE401 Au08 21 CSE401 Au08 22 6