The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

Similar documents
Chapter 4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

ICOM 4036 Spring 2004

Lexical and Syntax Analysis (2)

4. LEXICAL AND SYNTAX ANALYSIS

Chapter 3. Describing Syntax and Semantics ISBN

Programming Language Syntax and Analysis

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

CS 230 Programming Languages

Introduction. Introduction. Introduction. Lexical Analysis. Lexical Analysis 4/2/2019. Chapter 4. Lexical and Syntax Analysis.

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Lexical and Syntax Analysis

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Chapter 4: LR Parsing

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones


Syntax. In Text: Chapter 3

Revisit the example. Transformed DFA 10/1/16 A B C D E. Start

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Top down vs. bottom up parsing

Bottom-up Parser. Jungsik Choi

Bottom-Up Parsing. Lecture 11-12

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

Context-free grammars

Introduction to Parsing. Comp 412

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Bottom-Up Parsing. Lecture 11-12

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

CSCI312 Principles of Programming Languages

CS 406/534 Compiler Construction Parsing Part I

Compiler Construction: Parsing

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Syntax-Directed Translation. Lecture 14

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

Parsing Part II (Top-down parsing, left-recursion removal)

Concepts Introduced in Chapter 4

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

Syntax Analysis Part I

UNIT III & IV. Bottom up parsing

Compiler Construction 2016/2017 Syntax Analysis

Lecture 7: Deterministic Bottom-Up Parsing

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Recursive Descent Parsers

Parsing II Top-down parsing. Comp 412

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Lecture 8: Deterministic Bottom-Up Parsing

LR Parsing Techniques

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

3. Context-free grammars & parsing

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

Computer Science 160 Translation of Programming Languages

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Syntactic Analysis. Top-Down Parsing

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

CS502: Compilers & Programming Systems

CS 4120 Introduction to Compilers

3. Parsing. Oscar Nierstrasz

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

LR Parsing LALR Parser Generators

Types of parsing. CMSC 430 Lecture 4, Page 1

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

10/12/17. Motivating Example. Lexical and Syntax Analysis (2) Recursive-Descent Parsing. Recursive-Descent Parsing. Recursive-Descent Parsing

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Wednesday, August 31, Parsers

Lexical and Syntax Analysis. Bottom-Up Parsing

Roll No. :... Invigilator's Signature :. CS/B.Tech(CSE)/SEM-7/CS-701/ LANGUAGE PROCESSOR. Time Allotted : 3 Hours Full Marks : 70

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

S Y N T A X A N A L Y S I S LR

CS 314 Principles of Programming Languages

Monday, September 13, Parsers

LECTURE 7. Lex and Intro to Parsing

Parsing - 1. What is parsing? Shift-reduce parsing. Operator precedence parsing. Shift-reduce conflict Reduce-reduce conflict

A programming language requires two major definitions A simple one pass compiler

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

Software II: Principles of Programming Languages

Syntax Analysis, III Comp 412

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

MIDTERM EXAM (Solutions)

Syntactic Analysis. Top-Down Parsing. Parsing Techniques. Top-Down Parsing. Remember the Expression Grammar? Example. Example

Transcription:

ICOM 4036 Programming Languages Lexical and Syntax Analysis Lexical Analysis The Parsing Problem Recursive-Descent Parsing Bottom-Up Parsing This lecture covers review questions 14-27 This lecture covers problem sets 1-6 Dr. Amirhossein Chinaei, ECE, UPRM Spring 2010 ***Some slides are adapted from the Sebesta s textbook The Parsing Problem (cont d) The Complexity of Parsing Parsers that work for any unambiguous grammar are complex and inefficient ( O(n 3 ), where n is the length of the input ) Compilers use parsers that only work for a subset of all unambiguous grammars, but do it in linear time ( O(n), where n is the length of the input ) ICOM 4036. Ch.4. 396 Recursive-Descent Parsing There is a subprogram for each nonterminal in the grammar, which can parse sentences that can be generated by that nonterminal EBNF is ideally suited for being the basis for a recursive-descent parser, because EBNF minimizes the number of nonterminals A grammar for simple expressions: <expr> <term> {(+ -) <term> <term> <factor> {(* /) <factor> <factor> id int_constant ( <expr> ) ICOM 4036. Ch.4. 397 ICOM 4036. Ch.4. 398 1

Assume we have a lexical analyzer named lex, which puts the next token code in nexttoken The coding process when there is only one RHS: For each terminal symbol in the RHS, compare it with the next input token; if they match, continue, else there is an error For each nonterminal symbol in the RHS, call its associated parsing subprogram ICOM 4036. Ch.4. 399 /* Function expr Parses strings in the language generated by the rule: <expr> <term> {(+ -) <term> */ void expr() { /* Parse the first term */ term(); /* As long as the next token is + or -, call lex to get the next token and parse the next term */ while (nexttoken == ADD_OP nexttoken == SUB_OP){ term(); This particular routine does not detect errors Convention: Every parsing routine leaves the next token in nexttoken 400 /* term Parses strings in the language generated by the rule: <term> -> <factor> {(* /) <factor>) */ void term() { printf("enter <term>\n"); /* Parse the first factor */ factor(); /* As long as the next token is * or /, call lex to get the next token and parse the next factor */ while (nexttoken == MULT_OP nexttoken == DIV_OP) { factor(); printf("exit <term>\n"); /* End of function term */ ICOM 4036. Ch.4. 401 A nonterminal that has more than one RHS requires an initial process to determine which RHS it is to parse The correct RHS is chosen on the basis of the next token of input (the lookahead) The next token is compared with the first token that can be generated by each RHS until a match is found If no match is found, it is a syntax error ICOM 4036. Ch.4. 402 2

/* Function factor Parses strings in the language generated by the rule: <factor> -> id int_const (<expr>) */ void factor() { /* Determine which RHS */ if (nexttoken == ID_CODE nexttoken == INT_CODE) /* For the RHS id or int-const, just call lex */ /* If the RHS is (<expr>) call lex to pass over the left parenthesis, call expr, and check for the right parenthesis */ else if (nexttoken == LP_CODE) { expr(); if (nexttoken == RP_CODE) else error(); /* End of else if (nexttoken == LP_CODE */ The LL Grammar Class The Left Recursion Problem If a grammar has left recursion, either direct or indirect, it cannot be the basis for a top-down parser A grammar can be modified to remove left recursion For each nonterminal, A, 1. Group the A-rules as A Aα 1 Aα m β 1 β 2 β n where none of the β s begins with A 2. Replace the original A-rules with A β 1 A β 2 A β n A A α 1 A α 2 A α m A ε else error(); /* Neither RHS matches */ 403 ICOM 4036. Ch.4. 404 The other characteristic of grammars that disallows top-down parsing is the lack of pairwise disjointness The inability to determine the correct RHS on the basis of one token of lookahead Def: FIRST( ) = {a =>* a (If =>*, is in FIRST( )) Pairwise Disjointness Test: For each nonterminal, A, in the grammar that has more than one RHS, for each pair of rules, A i and A j, it must be true that FIRST( i ) FIRST( j ) = Examples: A a bb A a ab cab ICOM 4036. Ch.4. 405 ICOM 4036. Ch.4. 406 3

Left factoring can resolve the problem Replace <variable> identifier identifier [<expression>] with <variable> identifier <new> <new> [<expression>] or <variable> identifier [[<expression>]] (the outer brackets are metasymbols of EBNF) Bottom-up Parsing The parsing problem is finding the correct RHS in a right-sentential form to reduce to get the previous right-sentential form in the derivation ICOM 4036. Ch.4. 407 ICOM 4036. Ch.4. 408 Bottom-up Parsing (cont d) Intuition about handles: Def: is the handle of the right sentential form = w if and only if S =>* rm Aw => rm w Def: is a phrase of the right sentential form if and only if S =>* = 1 A 2 =>+ 1 2 Def: is a simple phrase of the right sentential form if and only if S =>* = 1 A 2 => 1 2 Bottom-up Parsing (cont d) Intuition about handles (cont d): The handle of a right sentential form is its leftmost simple phrase Given a parse tree, it is now easy to find the handle Parsing can be thought of as handle pruning ICOM 4036. Ch.4. 409 ICOM 4036. Ch.4. 410 4

Shift-Reduce Algorithms Reduce is the action of replacing the handle on the top of the parse stack with its corresponding LHS Shift is the action of moving the next token to the top of the parse stack Advantages of LR parsers: They will work for nearly all grammars that describe programming languages. They work on a larger class of grammars than other bottom-up algorithms, but are as efficient as any other bottom-up parser. They can detect syntax errors as soon as it is possible. The LR class of grammars is a superset of the class parsable by LL parsers. ICOM 4036. Ch.4. 411 ICOM 4036. Ch.4. 412 Bottom-up Parsing (cont d) LR parsers must be constructed with a tool Knuth s insight: A bottom-up parser could use the entire history of the parse, up to the current point, to make parsing decisions There were only a finite and relatively small number of different parse situations that could have occurred, so the history could be stored in a parser state, on the parse stack An LR configuration stores the state of an LR parser (S 0 X 1 S 1 X 2 S 2 X m S m, a i a i +1 a n $) ICOM 4036. Ch.4. 413 ICOM 4036. Ch.4. 414 5

Structure of an LR Parser LR parsers are table driven, where the table has two components, an ACTION table and a GOTO table The ACTION table specifies the action of the parser, given the parser state and the next token Rows are state names; columns are terminals The GOTO table specifies which state to put on top of the parse stack after a reduction action is done Rows are state names; columns are nonterminals ICOM 4036. Ch.4. 415 ICOM 4036. Ch.4. 416 Initial configuration: (S 0, a 1 a n $) Parser actions: If ACTION[S m, a i ] = Shift S, the next configuration is: (S 0 X 1 S 1 X 2 S 2 X m S m a i S, a i+1 a n $) If ACTION[S m, a i ] = Reduce A and S = GOTO[S m-r, A], where r = the length of, the next configuration is: (S 0 X 1 S 1 X 2 S 2 X m-r S m-r AS, a i a i+1 a n $) Parser actions (continued): If ACTION[S m, a i ] = Accept, the parse is complete and no errors were found. If ACTION[S m, a i ] = Error, the parser calls an error-handling routine. ICOM 4036. Ch.4. 417 ICOM 4036. Ch.4. 418 6

LR Parsing Table A parser table can be generated from a given grammar with a tool, e.g., yacc 419 ICOM 4036. Ch.4. 420 Summary Syntax analysis is a common part of language implementation A lexical analyzer is a pattern matcher that isolates small-scale parts of a program Detects syntax errors Produces a parse tree A recursive-descent parser is an LL parser EBNF Parsing problem for bottom-up parsers: find the substring of current sentential form The LR family of shift-reduce parsers is the most common bottom-up parsing approach ICOM 4036. Ch.4. 421 7