Top down vs. bottom up parsing

Parsing

A grammar describes the strings that are syntactically legal. A recogniser simply accepts or rejects strings. A generator produces sentences in the language described by the grammar. A parser constructs a derivation or parse tree for a sentence, if possible. There are two common types of parsers: bottom-up (data-driven) and top-down (hypothesis-driven). A recursive descent parser is a particularly simple way to implement a top-down parser.

Top down vs. bottom up parsing

The parsing problem is to connect the root node S with the tree leaves, the input (e.g., A = 1 + 3 * 4 / 5). Top-down parsers start constructing the parse tree at the top (root) and move down towards the leaves. They are easy to implement by hand, but require restricted grammars; e.g., predictive parsers (LL(k)). Bottom-up parsers build nodes at the bottom of the parse tree first. They are suitable for automatic parser generation and handle a larger class of grammars; e.g., shift-reduce parsers (LR(k) parsers). Both are general techniques that can be made to work for all languages (but not all grammars!).

Top down vs. bottom up parsing

Both are general techniques that can be made to work for all languages (but not all grammars!). Recall that a given language can be described by several grammars. Both of these grammars describe the same language:

E -> E + Num | Num
E -> Num + E | Num

The first one, with its left recursion, causes problems for top-down parsers. For a given parsing technique, we may have to transform the grammar to work with it.

Parsing complexity

How hard is the parsing task? Parsing an arbitrary CFG is O(n^3), i.e., it can take time proportional to the cube of the number of input symbols. This is bad! If we constrain the grammar somewhat, we can always parse in linear time. This is good! Linear-time parsers: LL parsers recognize LL grammars and use a top-down strategy; LR parsers recognize LR grammars and use a bottom-up strategy. LL(n): Left-to-right scan, Leftmost derivation, look ahead at most n symbols. LR(n): Left-to-right scan, Rightmost derivation in reverse, look ahead at most n symbols.

Top Down Parsing Methods

The simplest method is a full-backup recursive descent parser, often used for parsing simple languages. Write recursive recognizers (subroutines) for each grammar rule. If a rule succeeds, perform some action (i.e., build a tree node, emit code, etc.). If a rule fails, return failure; the caller may try another choice or fail. On failure the parser backs up.

Top Down Parsing Methods: Problems

When going forward, the parser consumes tokens from the input, so what happens if we have to back up? Algorithms that use backup tend, in general, to be inefficient: there might be a large number of possibilities to try before finding the right one or giving up. Grammar rules which are left-recursive lead to non-termination!

Recursive Descent Parsing: Example

For the grammar:

<term> -> <factor> {(* | /) <factor>}

we could use the following recursive descent parsing subprogram (this one is written in C):

void term() {
    factor();    /* parse first factor */
    while (next_token == ast_code || next_token == slash_code) {
        lexical();    /* get next token */
        factor();     /* parse next factor */
    }
}

Problems

Some grammars cause problems for top-down parsers. Top-down parsers do not work with left-recursive grammars, e.g., one with a rule like E -> E + T. We can transform a left-recursive grammar into one which is not. A top-down parser can limit backtracking if the grammar has only one rule per non-terminal; the technique of rule factoring can be used to eliminate multiple rules for a non-terminal.

Left-recursive grammars

A grammar is left-recursive if it has rules like X -> X β, or if it has indirect left recursion, as in X -> A β, A -> X. Q: Why is this a problem? A: It can lead to non-terminating recursion!

Left-recursive grammars

Consider E -> E + Num, E -> Num. We can rewrite a grammar, manually or automatically, to remove left recursion, making it suitable for a top-down parser.

Elimination of Left Recursion

Consider the left-recursive grammar

S -> S α | β

S generates strings β, β α, β α α, ... Rewrite using right recursion:

S  -> β S'
S' -> α S' | ε

Concretely:

T -> T + id | id

T generates strings id, id + id, id + id + id, ... Rewrite using right recursion:

T  -> id T'
T' -> + id T' | ε

More Elimination of Left-Recursion

In general:

S -> S α1 | ... | S αn | β1 | ... | βm

All strings derived from S start with one of β1, ..., βm and continue with several instances of α1, ..., αn. Rewrite as:

S  -> β1 S' | ... | βm S'
S' -> α1 S' | ... | αn S' | ε

General Left Recursion

The grammar

S -> A α | δ
A -> S β

is also left-recursive, because S =>+ S β α, where =>+ means "can be rewritten in one or more steps". This indirect left recursion can also be automatically eliminated.

Summary of Recursive Descent

A simple and general parsing strategy. Left recursion must be eliminated first, but that can be done automatically. The approach was long unpopular because backtracking was thought to be too inefficient. In practice, backtracking is eliminated by restricting the grammar, allowing us to successfully predict which rule to use. The gcc compiler now uses recursive descent.

Predictive Parser

A predictive parser uses information from the first terminal symbol of each expression to decide which production to use. A predictive parser, aka an LL(k) parser, does a Left-to-right parse, constructs a Leftmost derivation, and uses k-symbol lookahead. Grammars where one can decide which rule to use by examining only the first token are LL(1). LL(1) grammars are widely used in practice; the syntax of a PL can usually be adjusted to enable it to be described with an LL(1) grammar.

Predictive Parser

Example: consider the grammar

S -> if E then S else S
S -> begin S L
S -> print E
L -> end
L -> ; S L
E -> num = num

An S expression starts with an IF, BEGIN, or PRINT token; an L expression starts with an END or a SEMICOLON token; and an E expression has only one production.

Remember

Given a grammar and a string in the language defined by the grammar, there may be more than one way to derive the string leading to the same parse tree; it just depends on the order in which you apply the rules and what parts of the string you choose to rewrite next. All of the derivations are valid. To simplify the problem and the algorithms, we often focus on one of two: a leftmost derivation or a rightmost derivation.

LL(k) and LR(k) parsers

Two important parser classes are LL(k) and LR(k). The name LL(k) means: L - Left-to-right scanning of the input; L - constructing a Leftmost derivation; k - max number of input symbols needed to select a parser action. The name LR(k) means: L - Left-to-right scanning of the input; R - constructing a Rightmost derivation in reverse; k - max number of input symbols needed to select a parser action. An LL(1) or LR(1) parser never needs to look ahead more than one input token to know which production rule applies.

Predictive Parsing and Left Factoring

Consider the grammar

E -> T + E
E -> T
T -> int
T -> int * T
T -> ( E )

It is hard to predict because, for T, two productions start with int, and for E, it is not clear how to predict which rule to use. A grammar must be left-factored before use for predictive parsing. Left-factoring involves rewriting the rules so that, if a non-terminal has more than one rule, each begins with a distinct terminal.

Left-Factoring Example

Add new non-terminals X and Y to factor out common prefixes of rules. The grammar

E -> T + E
E -> T
T -> int
T -> int * T
T -> ( E )

becomes

E -> T X
X -> + E | ε
T -> ( E ) | int Y
Y -> * T | ε

Left Factoring

Consider a rule of the form

A -> a B1 | a B2 | a B3 | ... | a Bn

A top-down parser generated from this grammar is not efficient, as it requires backtracking. To avoid this problem we left-factor the grammar: collect all productions with the same left-hand side that begin with the same symbols on the right-hand side; combine the common prefix into a single production ending in a new non-terminal; and create new productions using this new non-terminal for each of the suffixes. After left factoring, the above grammar is transformed into:

A  -> a A1
A1 -> B1 | B2 | B3 | ... | Bn

Using Parsing Tables

LL(1) means that for each non-terminal and token there is only one production, so the parser can be specified via a 2D table: one dimension for the current non-terminal to expand, one dimension for the next token; a table entry contains one production. The method is similar to recursive descent, except: for each non-terminal S, we look at the next token a and choose the production shown at [S, a]; we use a stack to keep track of pending non-terminals; we reject when we encounter an error state; we accept when we encounter end-of-input.

LL(1) Parsing Table Example

The left-factored grammar ($ is the end-of-input symbol):

E -> T X
X -> + E | ε
T -> ( E ) | int Y
Y -> * T | ε

The LL(1) parsing table:

        int      *      +      (      )      $
E       T X                    T X
X                       + E           ε      ε
T       int Y                  ( E )
Y                * T    ε             ε      ε

LL(1) Parsing Table Example

Consider the [E, int] entry: when the current non-terminal is E and the next input is int, use production E -> T X; this is the only production that can generate an int in the next place. Consider the [Y, +] entry: when the current non-terminal is Y and the current token is +, get rid of Y; Y can be followed by + only in a derivation where Y -> ε. Consider the [E, *] entry: it is blank because there is no way to derive a string starting with * from the non-terminal E. Blank entries indicate error situations.

LL(1) Parsing Algorithm

initialize stack = <S $> and next
repeat
  case stack of
    <X, rest> : if T[X, *next] = Y1 ... Yn
                  then stack <- <Y1 ... Yn rest>
                  else error()
    <t, rest> : if t == *next++
                  then stack <- <rest>
                  else error()
until stack == < >

where: (1) next points to the next input token; (2) X matches some non-terminal; (3) t matches some terminal.

LL(1) Parsing Example

For the grammar E -> T X; X -> + E | ε; T -> ( E ) | int Y; Y -> * T | ε, and the table from the previous slide:

Stack        Input          Action
E $          int * int $    pop(); push(T X)
T X $        int * int $    pop(); push(int Y)
int Y X $    int * int $    pop(); next++
Y X $        * int $        pop(); push(* T)
* T X $      * int $        pop(); next++
T X $        int $          pop(); push(int Y)
int Y X $    int $          pop(); next++
Y X $        $              pop()
X $          $              pop()
$            $              ACCEPT!

Constructing Parsing Tables

LL(1) languages are those defined by a parsing table for the LL(1) algorithm in which no table entry is multiply defined. We want to generate parsing tables from a CFG. If A -> α, where in the row of A do we place α? In the column of t where t can start a string derived from α, i.e., α =>* t β; we say that t ∈ First(α). Also in the column of t if α can derive ε and t can follow an A, i.e., S =>* β A t δ; we say t ∈ Follow(A).

Computing First Sets

Definition: First(X) = { t | X =>* t α } ∪ { ε | X =>* ε }

Algorithm sketch (see book for details):

1. for all terminals t do First(t) <- { t }
2. for each production X -> ε do add ε to First(X)
3. if X -> A1 ... An α and ε ∈ First(Ai) for 1 <= i <= n, then add First(α) - {ε} to First(X)
4. for each X -> A1 ... An such that ε ∈ First(Ai) for 1 <= i <= n, do add ε to First(X)
5. repeat steps 3 and 4 until no First set can be grown

First Sets. Example

Recall the grammar:

E -> T X
X -> + E | ε
T -> ( E ) | int Y
Y -> * T | ε

First sets:

First( ( )   = { ( }
First( ) )   = { ) }
First( + )   = { + }
First( * )   = { * }
First( int ) = { int }
First( E )   = { int, ( }
First( T )   = { int, ( }
First( X )   = { +, ε }
First( Y )   = { *, ε }

Computing Follow Sets

Definition: Follow(X) = { t | S =>* β X t δ }

Intuition: if S is the start symbol, then $ ∈ Follow(S). If X -> A B, then First(B) ⊆ Follow(A) and Follow(X) ⊆ Follow(B). Also, if B =>* ε, then Follow(X) ⊆ Follow(A).

Computing Follow Sets

Algorithm sketch:

1. Follow(S) <- { $ }
2. for each production A -> α X β, add First(β) - {ε} to Follow(X)
3. for each production A -> α X β where ε ∈ First(β), add Follow(A) to Follow(X)

Repeat steps 2 and 3 until no Follow set grows.

Follow Sets. Example

Recall the grammar:

E -> T X
X -> + E | ε
T -> ( E ) | int Y
Y -> * T | ε

Follow sets:

Follow( + )   = { int, ( }
Follow( * )   = { int, ( }
Follow( ( )   = { int, ( }
Follow( E )   = { ), $ }
Follow( X )   = { ), $ }
Follow( T )   = { +, ), $ }
Follow( ) )   = { +, ), $ }
Follow( Y )   = { +, ), $ }
Follow( int ) = { *, +, ), $ }

Constructing LL(1) Parsing Tables

Construct a parsing table T for CFG G. For each production A -> α in G do:

- for each terminal t ∈ First(α), set T[A, t] = α
- if ε ∈ First(α), for each t ∈ Follow(A), set T[A, t] = α
- if ε ∈ First(α) and $ ∈ Follow(A), set T[A, $] = α

Notes on LL(1) Parsing Tables

If any entry is multiply defined, then G is not LL(1). This happens if G is ambiguous, if G is left-recursive, or if G is not left-factored. Most programming language grammars are not LL(1) as written. There are tools that build LL(1) tables.

Bottom-up Parsing

YACC uses bottom-up parsing. There are two important operations that bottom-up parsers use: shift and reduce. In abstract terms, the parser simulates a push-down automaton (much as a scanner simulates a finite-state automaton). Input: the string to be parsed and the set of productions. Goal: trace a rightmost derivation in reverse by starting with the input string and working backwards to the start symbol.

Algorithm

1. Start with an empty stack and a full input buffer. (The string to be parsed is in the input buffer.)
2. Repeat until the input buffer is empty and the stack contains the start symbol:
   a. Shift zero or more input symbols onto the stack from the input buffer until a handle β is found on top of the stack. If no handle is found, report a syntax error and exit.
   b. Reduce the handle to the non-terminal A. (There is a production A -> β.)
3. Accept the input string and return some representation of the derivation sequence found (e.g., a parse tree).

The four key operations in bottom-up parsing are shift, reduce, accept, and error; bottom-up parsing is also referred to as shift-reduce parsing. The important thing is to know when to shift, when to reduce, and which production to reduce by.

Example of Bottom-up Parsing

For the grammar:

E -> E + T | E - T | T
T -> T * F | T / F | F
F -> ( E ) | id | -E | num

STACK          INPUT BUFFER         ACTION
$              num1+num2*num3$      shift
$num1          +num2*num3$          reduce (F -> num)
$F             +num2*num3$          reduce (T -> F)
$T             +num2*num3$          reduce (E -> T)
$E             +num2*num3$          shift
$E+            num2*num3$           shift
$E+num2        *num3$               reduce (F -> num)
$E+F           *num3$               reduce (T -> F)
$E+T           *num3$               shift
$E+T*          num3$                shift
$E+T*num3      $                    reduce (F -> num)
$E+T*F         $                    reduce (T -> T*F)
$E+T           $                    reduce (E -> E+T)
$E             $                    accept