LL(1) Grammars. Example. Recursive Descent Parsers. S A a {b,d,a} A B D {b, d, a} B b { b } B λ {d, a} D d { d } D λ { a }

Similar documents
Building A Recursive Descent Parser. Example: CSX-Lite. match terminals, and calling parsing procedures to match nonterminals.

Table-Driven Top-Down Parsers

CSX-lite Example. LL(1) Parse Tables. LL(1) Parser Driver. Example of LL(1) Parsing. An LL(1) parse table, T, is a twodimensional

Configuration Sets for CSX- Lite. Parser Action Table

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite

Chapter 3. Parsing #1

How do LL(1) Parsers Build Syntax Trees?

Error Detection in LALR Parsers. LALR is More Powerful. { b + c = a; } Eof. Expr Expr + id Expr id we can first match an id:

The procedure attempts to "match" the right hand side of some production for a nonterminal.

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Types of parsing. CMSC 430 Lecture 4, Page 1

CS502: Compilers & Programming Systems

Top down vs. bottom up parsing

CSE431 Translation of Computer Languages

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Monday, September 13, Parsers

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

CA Compiler Construction

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

CSCI312 Principles of Programming Languages

Lexical and Syntax Analysis (2)

Let s look at the CUP specification for CSX-lite. Recall its CFG is

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

CSE 3302 Programming Languages Lecture 2: Syntax

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Compilers. Predictive Parsing. Alex Aiken

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

CS 314 Principles of Programming Languages

CS 536 Midterm Exam Spring 2013

1 Introduction. 2 Recursive descent parsing. Predicative parsing. Computer Language Implementation Lecture Note 3 February 4, 2004

Optimizing Finite Automata

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

Parsing II Top-down parsing. Comp 412

3. Parsing. Oscar Nierstrasz

Course Overview. Introduction (Chapter 1) Compiler Frontend: Today. Compiler Backend:

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Properties of Regular Expressions and Finite Automata

Wednesday, August 31, Parsers

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing


Abstract Syntax Trees & Top-Down Parsing

Parser. Larissa von Witte. 11. Januar Institut für Softwaretechnik und Programmiersprachen. L. v. Witte 11. Januar /23

LR Parsing. The first L means the input string is processed from left to right.

Alternatives for semantic processing

Defining syntax using CFGs

Chapter 4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis

Recursive Descent Parsers

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Lexical and Syntax Analysis

4. Lexical and Syntax Analysis

Concepts of Programming Languages Recitation 1: Predictive Parsing

Syntax Analysis Part I

A parser is some system capable of constructing the derivation of any sentence in some language L(G) based on a grammar G.

Syntactic Analysis. Top-Down Parsing

Revisit the example. Transformed DFA 10/1/16 A B C D E. Start

CS 230 Programming Languages

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Programming Language Syntax and Analysis

Chapter 4: Syntax Analyzer

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

컴파일러구성 제 2 강 Recursive-descent Parser / Predictive Parser

It parses an input string of tokens by tracing out the steps in a leftmost derivation.

Syntax-Directed Translation. Lecture 14

Syntax Analysis Top Down Parsing

LL parsing Nullable, FIRST, and FOLLOW

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Downloaded from Page 1. LR Parsing

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Class Information ANNOUCEMENTS

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

A programming language requires two major definitions A simple one pass compiler

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

PESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru -100 Department of Computer Science and Engineering

Syntax. In Text: Chapter 3

Chapter 3. Describing Syntax and Semantics ISBN

Part 3. Syntax analysis. Syntax analysis 96

Table-Driven Parsing

Compiler construction in4303 lecture 3

Parsing Part II (Top-down parsing, left-recursion removal)

Transcription:

LL(1) Grammars A context-free grammar whose Predict sets are always disjoint (for the same non-terminal) is said to be LL(1). LL(1) grammars are ideally suited for top-down parsing because it is always possible to correctly predict the expansion of any nonterminal. No backup is ever needed. Formally, let First(X 1...X n )= {a in V t A X 1...X n * a... Follow(A) = {a in V t S +...Aa... Predict(A X 1...X n ) = If X 1...X n * λ Then First(X 1...X n ) U Follow(A) Else First(X 1...X n ) If some CFG, G, has the property that for all pairs of distinct productions with the same lefthand side, A X 1...X n and A Y 1...Y m it is the case that Predict(A X 1...X n ) Predict(A Y 1...Y m ) = φ then G is LL(1). LL(1) grammars are easy to parse in a top-down manner since predictions are always correct. 247 248 Example Recursive Descent Parsers Production Predict Set S A a {b,d,a A B D {b, d, a B b { b B λ {d, a D d { d D λ { a Since the predict sets of both B productions and both D productions are disjoint, this grammar is LL(1). An early implementation of topdown (LL(1)) parsing was recursive descent. A parser was organized as a set of parsing procedures, one for each non-terminal. Each parsing procedure was responsible for parsing a sequence of tokens derivable from its non-terminal. For example, a parsing procedure, A, when called, would call the scanner and match a token sequence derivable from A. Starting with the start symbol s parsing procedure, we would then match the entire input, which must be derivable from the start symbol. 249 250

This approach is called recursive descent because the parsing procedures were typically recursive, and they descended down the input s parse tree (as top-down parsers always do). Building A Recursive Descent Parser We start with a procedure Match, that matches the current input token against a predicted token: void Match(Terminal a) { if (a == currenttoken) currenttoken = Scanner() else SyntaxErrror() To build a parsing procedure for a non-terminal A, we look at all productions with A on the lefthand side: A X 1...X n A Y 1...Y m... We use predict sets to decide which production to match (LL(1) grammars always have disjoint predict sets). We match a production s righthand side by calling Match to 251 252 match terminals, and calling parsing procedures to match nonterminals. The general form of a parsing procedure for A X 1...X n A Y 1...Y m... is Usually this general form isn t used. Instead, each production is macro-expanded into a sequence of Match and parsing procedure calls. void A() { if (currenttoken in Predict(A X 1...X n )) for(i=1i<=ni++) if (X[i] is a terminal) Match(X[i]) else X[i]() else if (currenttoken in Predict(A Y 1...Y m )) for(i=1i<=mi++) if (Y[i] is a terminal) Match(Y[i]) else Y[i]() else // Handle other A... productions else // No production predicted SyntaxError() 253 254

Example: CSX-Lite CSX-Lite Parsing Procedures Production Predict Set Prog { { Stmt id if void Prog() { Match("{") Match("") λ Stmt id = id Stmt if ( ) Stmt if id Etail id Etail + + Etail - - Etail λ ) void () { if (currenttoken == id currenttoken == if){ Stmt() else { /* null */ void Stmt() { if (currenttoken == id){ Match(id) Match("=") () else { Match(if) Match("(") () Match(")") Stmt() 255 256 void () { Match(id) Etail() void Etail() { if (currenttoken == "+") { Match("+") () else if (currenttoken == "-"){ Match("-") () else { /* null */ Let s use recursive descent to parse { a = b + c We start by calling Prog() since this represents the start symbol. Calls Pending Prog() Match("{") Match("") Match("") { a = b + c { a = b + c a = b + c Stmt() Match("") a = b + c Match(id) Match("=") () Match("") a = b + c 257 258

Calls Pending Calls Pending Match("=") () Match("") = b + c Match("+") () Match("") + c () Match("") b + c () Match("") c Match(id) Etail() Match("") b + c Match(id) Etail() Match("") c Etail() Match("") + c Etail() Match("") /* null */ Match("") 259 260 Calls Pending Match("") Match("") /* null */ Match("") Match("") Done! All input matched Syntax Errors in Recursive Descent Parsing In recursive descent parsing, syntax errors are automatically detected. In fact, they are detected as soon as possible (as soon as the first illegal token is seen). How? When an illegal token is seen by the parser, either it fails to predict any valid production or it fails to match an expected token in a call to Match. Let s see how the following illegal CSX-lite program is parsed: { b + c = a (Where should the first syntax error be detected?) 261 262

Calls Pending Prog() Match("{") Match("") Match("") Stmt() Match("") Match(id) Match("=") () Match("") { b + c = a { b + c = a b + c = a b + c = a b + c = a Calls Pending Match("=") () Match("") + c = a Call to Match fails! + c = a 263 264 Table-Driven Top-Down Parsers Recursive descent parsers have many attractive features. They are actual pieces of code that can be read by programmers and extended. This makes it fairly easy to understand how parsing is done. Parsing procedures are also convenient places to add code to build ASTs, or to do typechecking, or to generate code. A major drawback of recursive descent is that it is quite inconvenient to change the grammar being parsed. Any change, even a minor one, may force parsing procedures to be reprogrammed, as productions and predict sets are modified. To a less extent, recursive descent parsing is less efficient than it might be, since subprograms are called just to match a single token or to recognize a righthand side. An alternative to parsing procedures is to encode all prediction in a parsing table. A pre-programed driver program can use a parse table (and list of productions) to parse any LL(1) grammar. If a grammar is changed, the parse table and list of productions will change, but the driver need not be changed. 265 266

LL(1) Parse Tables CSX-lite Example An LL(1) parse table, T, is a twodimensional array. Entries in T are production numbers or blank (error) entries. T is indexed by: A, a non-terminal. A is the nonterminal we want to expand. CT, the current token that is to be matched. T[A][CT] = A X 1...X n if CT is in Predict(A X 1...X n ) T[A][CT] = error if CT predicts no production with A as its lefthand side Production Predict Set 1 Prog { { 2 Stmt id if 3 λ 4 Stmt id = id 5 Stmt if ( ) Stmt if 6 id Etail id 7 Etail + + 8 Etail - - 9 Etail λ ) { if ( ) id = + - eof Prog 1 3 2 2 Stmt 5 4 6 Etail 9 7 8 9 267 268 LL(1) Parser Driver Example of LL(1) Parsing Here is the driver we ll use with the LL(1) parse table. We ll also use a parse stack that remembers symbols we have yet to match. We ll again parse { a = b + c We start by placing Prog (the start symbol) on the parse stack. void LLDriver(){ Push(StartSymbol) while(! stackempty()){ //Let X=Top symbol on parse stack //Let CT = current token to match if (isterminal(x)) { match(x) //CT is updated pop() //X is updated else if (T[X][CT]!= Error){ //Let T[X][CT] = X Y 1...Y m Replace X with Y 1...Y m on parse stack else SyntaxError(CT) Parse Stack Prog { Stmt { a = b + c { a = b + c a = b + c a = b + c 269 270

Parse Stack Parse Stack id = = id Etail a = b + c = b + c b + c b + c Etail + id Etail + c + c c c 271 272 Parse Stack Etail Done! All input matched Syntax Errors in LL(1) Parsing In LL(1) parsing, syntax errors are automatically detected as soon as the first illegal token is seen. How? When an illegal token is seen by the parser, either it fetches an error entry from the LL(1) parse table or it fails to match an expected token. Let s see how the following illegal CSX-lite program is parsed: { b + c = a (Where should the first syntax error be detected?) 273 274

Parse Stack Parse Stack Prog { { b + c = a { b + c = a b + c = a = Current token (+) fails to match expected token (=)! + c = a + c = a Stmt b + c = a id = b + c = a 275 276