Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

Similar documents
Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

How do LL(1) Parsers Build Syntax Trees?

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Building Compilers with Phoenix

A programming language requires two major definitions A simple one pass compiler

Lexical and Syntax Analysis (2)

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.


Chapter 4. Lexical and Syntax Analysis

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Top down vs. bottom up parsing

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Configuration Sets for CSX- Lite. Parser Action Table

Types of parsing. CMSC 430 Lecture 4, Page 1

Table-Driven Top-Down Parsers

4. Lexical and Syntax Analysis

Wednesday, August 31, Parsers

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

4. Lexical and Syntax Analysis

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Chapter 4: LR Parsing

Syntax Analysis Part I

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Monday, September 13, Parsers

Lecture 7: Deterministic Bottom-Up Parsing

Context-free grammars

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

CSE 3302 Programming Languages Lecture 2: Syntax

UNIVERSITY OF CALIFORNIA

CSX-lite Example. LL(1) Parse Tables. LL(1) Parser Driver. Example of LL(1) Parsing. An LL(1) parse table, T, is a twodimensional

CS 230 Programming Languages

Downloaded from Page 1. LR Parsing

Compiler Construction: Parsing

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Syntax. In Text: Chapter 3

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

It parses an input string of tokens by tracing out the steps in a leftmost derivation.

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Bottom-Up Parsing. Lecture 11-12

Abstract Syntax Trees & Top-Down Parsing

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Chapter 4: Syntax Analyzer

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

CSCI312 Principles of Programming Languages

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

CS502: Compilers & Programming Systems

Recursive Descent Parsers

CS 314 Principles of Programming Languages

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

CSE302: Compiler Design

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

CSE431 Translation of Computer Languages

Lecture 8: Deterministic Bottom-Up Parsing

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Revisit the example. Transformed DFA 10/1/16 A B C D E. Start

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser

Bottom-Up Parsing. Lecture 11-12

CS 4120 Introduction to Compilers

Outline CS412/413. Administrivia. Review. Grammars. Left vs. Right Recursion. More tips forll(1) grammars Bottom-up parsing LR(0) parser construction

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

In One Slide. Outline. LR Parsing. Table Construction

Conflicts in LR Parsing and More LR Parsing Types

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

CPS 506 Comparative Programming Languages. Syntax Specification

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Parsing - 1. What is parsing? Shift-reduce parsing. Operator precedence parsing. Shift-reduce conflict Reduce-reduce conflict

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Programming Language Syntax and Analysis

Lecture Bottom-Up Parsing

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Compilers. Predictive Parsing. Alex Aiken

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

CA Compiler Construction

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

Compiler Construction 2016/2017 Syntax Analysis

Transcription:

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

Outline. Top-down versus Bottom-up Parsing. Recursive Descent Parsing. Left Recursion Removal. Left Factoring. Predictive Parsing.

Introduction. We want to describe now how to write parsers for a grammar written in EBNF. At its most basic a parser needs to be able to recognizes programs in the programming language specified by the grammar. Most likely also want to be able to build a abstract syntax tree and attach some semantics to the program.

Top-down versus Bottom-up Parsing. There are two approaches to parsing: Bottom-up: Here we start from the program and try to match initial segments to left hand sides of rules. When we get a match, the right hand-side of a rule is replaced (reduced) with its left hand side on the stack. These parsers are sometimes called shift/reduce parsers, because one often shifts token onto the stack prior to deciding to do a reduction. Top-down: One starts at the start symbol for the grammar, and replaces non-terminals with the righthand-side of rules until one gets down to terminals which match the input. (Mentioned last-day.) Both methods can be automated. Yacc/Bison is a shift/reduce parser.

Recursive Descent Parsing. One common way to write a top-down parser by hand is to rely on the run-time stack of the language you are writing the compiler in. That is, for each non-terminal you write a procedure. This procedure is supposed to be able to do parsing for that non-terminal. If that nonterminal is the right hand-side of a rule, the procedure will try to match any tokens in the rule to the input, and recursively call procedures for non-terminals on the right hand side of the rule.

Example. Consider: sentence -> nounphrase verbphrase. nounphrase -> article noun. article -> a the. This might yield procedures such as: void sentence (void) {nounphrase();verbphrase;} void nounphrase(void) {article(); noun();} void article(void) {if (token== a ) match( a ); else if (token == the ) match( the ); else error(); } We imagine token is a global variable provided by the scanner/lexer.

What if a non-terminal has multiple things it goes to? We mentioned last day that one problem with ambiguous grammars was that we can t figure out what to put on the stack when doing top-down parsing. Isn t this the same problem? No. As long as the grammar is not ambiguous, we can take an approach like for article above. Consider S->AB CD. We could write: void S() { A(); B(); if(parseerror()) {rewind(); C(); D(); }}. rewind() returns the string to where we started parsing S. This is called backtracking. Ideally, we want to design our grammars so we don t need to do backtracking.

Left Recursion Removal. Another issue with recursive descent is that it will tend to go into an infinite loop if you have a leftrecursive rule. For example, a rule like expr -> expr + term term where left hand side nonterminal is also the leftmost nonterminal on the right-hand side of the rule. This can be fixed by changing the above to expr - > term + expr term, but note this makes + into a right associative operator. Code would look like: void expr(void) {term(); if(token == + ){match( + ); expr();}}

Fixing Associativity. Notice if we write the above in EBNF it becomes expr -> term {+ term}, a term followed by 0 or more + term s. So we see this could be handled by using a loop rather than recursion in our procedure: void expr() {term(); while(token == + ){match( + ); term();}} Just after the second call to term we can handle the associativity as we desire.

Left Factoring. A right recursive rule like: <expr> -> <term> @ <expr> <term>. Can also be rewritten in EBNF as: <expr> -> <term> [@ <expr> ]. This is called left factoring. Consider: <if-statement> -> if(<expr>) <statement> if(<expr>) <statement> else <statement>. This cannot be directly translated into code as both rules begin with the same prefix, but we can factor out the prefix: <if-statement> -> if(<expr>) <statement> [else <statement>]. This can be code viewing the [ ] as an if clause: void ifstatement() {match( if ); match( ( ); expression(); match( ) ); statement(); if(token== else ){match( else ); statement();}}

Predictive Parsing. As we mentioned above, we would like to avoid backtracking. This means we need a way to predict which rule to select for a given nonterminal. For grammars which meet two conditions we now describe this can be done. The idea is that the parser will do a single-symbol lookahead ahead and use that to determine which rule to use.

More on Predictive Parsing. Consider a rule of the form A -> α 1 α 2 α 3 α n. The first condition is that the first symbols of each of the rules must be distinct. For example, for the grammar: <factor> ::= (<expr>) <number>. <number> ::= <digit> {<digit>}. <digit> ::= 0 1 2 3 4 5 6 7 8 9. We need First((<expr>)) and First(<number>) to be disjoint. First((<expr>)) = {(} and First (<number>) = First(<digit>) = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} so the condition holds.

Second Criteria for Predictive Parsing. If we have rule of the form A->α[β]γ (I.e., we have an optional β), then the set of first tokens β can go to must be distinct from the set of tokens that could immediately follow β. For example A -> B [C] D. C-> ae bf. D-> cg. Then First( C ) = {a, b} and Follow( C ) ={c}. So the criteria would hold.