Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

Similar documents
Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Error Handling Syntax-Directed Translation Recursive Descent Parsing. Lecture 6

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Syntax Analysis Check syntax and construct abstract syntax tree

Compilers Course Lecture 4: Context Free Grammars

A programming language requires two major definitions A simple one pass compiler

CS 314 Principles of Programming Languages

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Parsing II Top-down parsing. Comp 412

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Extra Credit Question

CMSC 330: Organization of Programming Languages

Defining syntax using CFGs

Types of parsing. CMSC 430 Lecture 4, Page 1

LL Parsing: A piece of cake after LR

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Parsing Ambiguity and Syntax Errors

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Syntax Analysis Part I

CSCI312 Principles of Programming Languages

Introduction to Parsing. Lecture 5

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Introduction to Parsing. Lecture 8

Introduction to Parsing. Lecture 5

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

CMSC 330: Organization of Programming Languages. Context Free Grammars

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

CMSC 330: Organization of Programming Languages

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Top down vs. bottom up parsing

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Context-Free Grammars

CS 536 Midterm Exam Spring 2013

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Computer Science 160 Translation of Programming Languages

CMPT 755 Compilers. Anoop Sarkar.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Chapter 4: Syntax Analyzer

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

CS 406/534 Compiler Construction Parsing Part I

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

Parsing: Derivations, Ambiguity, Precedence, Associativity. Lecture 8. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Table-Driven Top-Down Parsers

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

CSE 3302 Programming Languages Lecture 2: Syntax

Parsing Part II (Top-down parsing, left-recursion removal)

3. Parsing. Oscar Nierstrasz

A Simple Syntax-Directed Translator

Introduction to Bottom-Up Parsing

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

Chapter 3. Parsing #1

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Syntax-Directed Translation. Lecture 14

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

CMSC 330: Organization of Programming Languages. Context Free Grammars

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

Compiler Design Concepts. Syntax Analysis

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Optimizing Finite Automata

Bottom-Up Parsing. Lecture 11-12

CSCE 314 TAMU Fall CSCE 314: Programming Languages Dr. Flemming Andersen. Functional Parsers

CS153: Compilers Lecture 4: Recursive Parsing

Lecture 14 Sections Mon, Mar 2, 2009

CSCE 314 Programming Languages. Functional Parsers

Building Compilers with Phoenix

Lexical and Syntax Analysis

Principles of Programming Languages COMP251: Syntax and Grammars

Bottom-Up Parsing. Lecture 11-12

Compilers. Compiler Construction Tutorial The Front-end

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Abstract Syntax Trees Synthetic and Inherited Attributes

Properties of Regular Expressions and Finite Automata

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

In One Slide. Outline. LR Parsing. Table Construction

Dr. D.M. Akbar Hussain

Parsing II Top-down parsing. Comp 412

Transcription:

Administrativia Building a Parser II CS164 3:30-5:00 TT 10 Evans PA2 assigned today due in 12 days WA1 assigned today due in a week it s a practice for the exam First midterm Oct 5 will contain some project-inspired questions 1 2 Overview Grammars derivations Recursive descent parser Eliminating left recursion Grammars Programming language constructs have recursive structure. which is why our hand-written parser had this structure, too An expression is either: number, or variable, or expression + expression, or expression - expression, or ( expression ), or 3 4 Context-free grammars (CFG) Example: arithmetic expressions a natural notation for this recursive structure grammar for our balanced parens expressions: BalancedExpression d a ( BalancedExpression ) describes (generates) strings of symbols: a, (a), ((a)), (((a))), like regular expressions but can refer to other expressions (here, BalancedExpression) and do this recursively (giving is non-finite state ) Simple arithmetic expressions: E d n id ( E ) E + E E * E Some elements of this language: id n ( n ) n + id id * ( id + id ) 5 6 1

Symbols: Terminals and Nonterminals grammars use two kinds of symbols terminals: no rules for replacing them once generated, terminals are permanent these are tokens of our language nonterminals: to be replaced (expanded) in regular expression lingo, these serve as names of expressions start non-terminal: the first symbol to be expanded Notational Conventions In these lecture notes, let s adopt a notation: Non-terminals are written upper-case Terminals are written lower-case or as symbols, e.g., token LPAR is written as ( The start symbol is the left-hand side of the first production 7 8 Derivations This is how a grammar generates strings: think of grammar rules (called productions) as rewrite rules Derivation: the process of generating a string 1. begin with the start non-terminal 2. rewrite the non-terminal with some of its productions 3. select a non-terminal in your current string i. if no non-terminal left, done. ii. otherwise go to step 2. Example: derivation Grammar: E d n id ( E ) E + E E * E a derivation: E rewrite E with ( E ) ( E ) rewrite E with n ( n ) this is the final string of terminals another derivation (written more concisely): E d ( E ) d ( E * E ) d ( E + E * E ) d ( n + E * E ) d ( n + id * E ) d ( n + id * id ) 9 10 So how do derivations help us in parsing? A program (a string of tokens) has no syntax error if it can be derived from the grammar. but so far you only know how to derive some (any) string, not how to check if a given string is derivable So how to do parsing? a naïve solution: derive all possible strings and check if your program is among them not as bad as it sounds: there are parsers that do this, kind of. Coming soon. Decaf Example A fragment of Decaf: STMT d while ( EXPR ) STMT id ( EXPR ) ; EXPR d EXPR + EXPR EXPR EXPR EXPR < EXPR ( EXPR ) id 11 12 2

Decaf Example (Cont.) Some elements of the (fragment of) language: id ( id ) ; id ( ( ( ( id ) ) ) ) ; while ( id < id ) id ( id ) ; while ( while ( id ) ) id ( id ) ; while ( id ) while ( id ) while ( id ) id ( id ) ; Question: One of the strings is not from the language. Which one? STMT EXPR d while ( EXPR ) STMT id ( EXPR ) ; d EXPR + EXPR EXPR EXPR EXPR < EXPR ( EXPR ) id CFGs (definition) A CFG consists of A set of terminal symbols T A set of non-terminal symbols N A start symbol S (a non-terminal) A set of productions: produtions are of two forms (X N) X d ε, or X d Y 1 Y 2... Y n where Y i N 4 T 13 14 context-free grammars Now let s parse a string what is context-free? means the grammar is not context-sensitive context-sensitive gramars can describe more languages than CFGs because their productions restrict when a nonterminal can be rewritten. An example production: d N d d A B c meaning: N can be rewritten into ABc only when preceded by d can be used to encode semantic checks, but parsing is hard recursive descent parser derives all strings until it matches derived string with the input string or until it is sure there is a syntax error 15 16 Recursive Descent Parsing Consider the grammar E T + E T T int int * T ( E ) Token stream is: int 5 * int 2 Start with top-level non-terminal E Try the rules for E in order Recursive Descent Parsing. Example (Cont.) Try E 0 T 1 + E 2 Then try a rule for T 1 ( E 3 ) But ( does not match input token int 5 Try T 1 int. Token matches. But + after T 1 does not match input token * Try T 1 int * T 2 This will match but + after T 1 will be unmatched Have exhausted the choices for T 1 Backtrack to choice for E 0 17 18 3

Recursive Descent Parsing. Example (Cont.) Try E 0 T 1 Follow same steps as before for T 1 And succeed with T 1 int * T 2 and T 2 int With the following parse tree E 0 T 1 int 5 * T 2 A Recursive Descent Parser (2) Define boolean functions that check the token string for a match of A given token terminal bool term(token tok) { return in[next++] == tok; } A given production of S (the n th ) bool S n () { } Any production of S: bool S() { } int 2 These functions advance next 19 20 A Recursive Descent Parser (3) A Recursive Descent Parser (4) For production E T + E bool E 1 () { return T() && term(plus) && E(); } For production E T bool E 2 () { return T(); } For all productions of E (with backtracking) bool E() { int save = next; return (next = save, E 1 ()) (next = save, E 2 ()); } Functions for non-terminal T bool T 1 () { return term(open) && E() && term(close); } bool T 2 () { return term(int) && term(times) && T(); } bool T 3 () { return term(int); } bool T() { int save = next; return (next = save, T 1 ()) (next = save, T 2 ()) (next = save, T 3 ()); } 21 22 Recursive Descent Parsing. Notes. Recursive Descent Parsing. Notes. To start the parser Initialize next to point to first token Invoke E() Notice how this simulates our backtracking example from lecture Easy to implement by hand Predictive parsing is more efficient Easy to implement by hand An example implementation is provided as a supplement Recursive Descent Parsing But does not always work 23 24 4

Recursive-Descent Parsing Parsing: given a string of tokens t 1 t 2... t n, find its parse tree Recursive-descent parsing: Try all the productions exhaustively At a given moment the fringe of the parse tree is: t 1 t 2 t k A Try all the productions for A: if A d BC is a production, the new fringe is t 1 t 2 t k B C Backtrack when the fringe doesn t match the string Stop when there are no more non-terminals When Recursive Descent Does Not Work Consider a production S S a: In the process of parsing S we try the above rule What goes wrong? A left-recursive grammar has a non-terminal S S + Sα for some α Recursive descent does not work in such cases It goes into an loop 25 26 Elimination of Left Recursion Elimination of Left-Recursion. Example Consider the left-recursive grammar S S α β Consider the grammar S d 1 S 0 ( β = 1 and α = 0 ) S generates all strings starting with a β and followed by a number of α Can rewrite using right-recursion S βs S αs ε can be rewritten as S d 1 S S d 0 S ε 27 28 More Elimination of Left-Recursion General Left Recursion In general S S α 1 S α n β 1 β m All strings derived from S start with one of β 1,,β m and continue with several instances of α 1,,α n Rewrite as S β 1 S β m S S α 1 S α n S ε The grammar S A α δ A S β is also left-recursive because S + S β α This left-recursion can also be eliminated See [ASU], Section 4.3 for general algorithm 29 30 5

Summary of Recursive Descent Simple and general parsing strategy Left-recursion must be eliminated first but that can be done automatically Unpopular because of backtracking Thought to be too inefficient In practice, backtracking is eliminated by restricting the grammar 31 6