Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Similar documents
Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Error Handling Syntax-Directed Translation Recursive Descent Parsing. Lecture 6

Parsing II Top-down parsing. Comp 412

Abstract Syntax Trees & Top-Down Parsing

CMSC 330: Organization of Programming Languages

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Extra Credit Question

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Introduction to Parsing Ambiguity and Syntax Errors

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Introduction to Parsing Ambiguity and Syntax Errors

Syntax Analysis Check syntax and construct abstract syntax tree

Compilers Course Lecture 4: Context Free Grammars

Introduction to Parsing. Lecture 5

CMSC 330: Organization of Programming Languages

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Defining syntax using CFGs

CMSC 330: Organization of Programming Languages. Context Free Grammars

LL Parsing: A piece of cake after LR

Error Handling Syntax-Directed Translation Recursive Descent Parsing

A programming language requires two major definitions A simple one pass compiler

Introduction to Parsing. Lecture 5

CMSC 330: Organization of Programming Languages. Context Free Grammars

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

CMSC 330: Organization of Programming Languages

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Types of parsing. CMSC 430 Lecture 4, Page 1

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Introduction to Parsing. Lecture 8

Chapter 3. Parsing #1

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Syntax Analysis Part I

CSCI312 Principles of Programming Languages

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Syntax-Directed Translation. Lecture 14

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

Optimizing Finite Automata

CS 314 Principles of Programming Languages

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

CSE 3302 Programming Languages Lecture 2: Syntax

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Principles of Programming Languages COMP251: Syntax and Grammars

Properties of Regular Expressions and Finite Automata

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

A Simple Syntax-Directed Translator

Context-Free Grammars

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CSE 401 Midterm Exam Sample Solution 2/11/15

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

CMPT 755 Compilers. Anoop Sarkar.

MIT Top-Down Parsing. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

CMSC 330: Organization of Programming Languages. Context Free Grammars

CSCE 314 TAMU Fall CSCE 314: Programming Languages Dr. Flemming Andersen. Functional Parsers

Introduction to Parsing

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

CSCE 314 Programming Languages. Functional Parsers

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Compilers. Compiler Construction Tutorial The Front-end

Parsing: Derivations, Ambiguity, Precedence, Associativity. Lecture 8. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

3. Parsing. Oscar Nierstrasz

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Downloaded from Page 1. LR Parsing

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

CS 406/534 Compiler Construction Parsing Part I

COP 3402 Systems Software Syntax Analysis (Parser)

Parsing a primer. Ralf Lämmel Software Languages Team University of Koblenz-Landau

Top down vs. bottom up parsing

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Chapter 4: Syntax Analyzer

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

A simple syntax-directed

Table-Driven Top-Down Parsers

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

Ambiguity. Lecture 8. CS 536 Spring

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Computer Science 160 Translation of Programming Languages

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

CS 536 Midterm Exam Spring 2013

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Parsing Part II (Top-down parsing, left-recursion removal)

Transcription:

Building a Parser II CS164 3:30-5:00 TT 10 Evans 1

Grammars Programming language constructs have recursive structure. which is why our hand-written parser had this structure, too An expression is either: number, or variable, or expression + expression, or expression - expression, or ( expression ), or 4

Context-free grammars (CFG) a natural notation for this recursive structure grammar for our balanced parens expressions: BalancedExpression a ( BalancedExpression ) describes (generates) strings of symbols: a, (a), ((a)), (((a))), like regular expressions but can refer to other expressions (here, BalancedExpression) and do this recursively (giving is non-finite state ) 5

Example: arithmetic expressions Simple arithmetic expressions: E n id ( E ) E + E E * E Some elements of this language: id n ( n ) n + id id * ( id + id ) 6

Symbols: Terminals and Nonterminals grammars use two kinds of symbols terminals: no rules for replacing them once generated, terminals are permanent these are tokens of our language nonterminals: to be replaced (expanded) in regular expression lingo, these serve as names of expressions start non-terminal: the first symbol to be expanded 7

Notational Conventions In these lecture notes, let s adopt a notation: Non-terminals are written upper-case Terminals are written lower-case or as symbols, e.g., token LPAR is written as ( The start symbol is the left-hand side of the first production 8

Derivations This is how a grammar generates strings: think of grammar rules (called productions) as rewrite rules Derivation: the process of generating a string 1. begin with the start non-terminal 2. rewrite the non-terminal with some of its productions 3. select a non-terminal in your current string i. if no non-terminal left, done. ii. otherwise go to step 2. 9

Example: derivation Grammar: E n id ( E ) E + E E * E a derivation: E rewrite E with ( E ) ( E ) rewrite E with n ( n ) this is the final string of terminals another derivation (written more concisely): E ( E ) ( E * E ) ( E + E * E ) ( n + E * E ) ( n + id * E ) ( n + id * id ) 10

So how do derivations help us in parsing? A program (a string of tokens) has no syntax error if it can be derived from the grammar. but so far you only know how to derive some (any) string, not how to check if a given string is derivable So how to do parsing? a naïve solution: derive all possible strings and check if your program is among them not as bad as it sounds: there are parsers that do this, kind of. Coming soon. 11

Decaf Example A fragment of Decaf: STMT while ( EXPR ) STMT id ( EXPR ) ; EXPR EXPR + EXPR EXPR EXPR EXPR < EXPR ( EXPR ) id 12

Decaf Example (Cont.) Some elements of the (fragment of) language: id ( id ) ; id ( ( ( ( id ) ) ) ) ; while ( id < id ) id ( id ) ; while ( while ( id ) ) id ( id ) ; while ( id ) while ( id ) while ( id ) id ( id ) ; Question: One of the strings is not from the language. Which one? STMT EXPR while ( EXPR ) STMT id ( EXPR ) ; EXPR + EXPR EXPR EXPR EXPR < EXPR ( EXPR ) id 13

CFGs (definition) A CFG consists of A set of terminal symbols T A set of non-terminal symbols N A start symbol S (a non-terminal) A set of productions: produtions are of two forms (X N) X, or X Y 1 Y 2... Y n where Y i N T 14

context-free grammars what is context-free? means the grammar is not context-sensitive context-sensitive gramars can describe more languages than CFGs because their productions restrict when a nonterminal can be rewritten. An example production: d N d A B c meaning: N can be rewritten into ABc only when preceded by d can be used to encode semantic checks, but parsing is hard 15

Now let s parse a string recursive descent parser derives all strings until it matches derived string with the input string or until it is sure there is a syntax error 16

Recursive Descent Parsing Consider the grammar E T + E T T int int * T ( E ) Token stream is: int 5 * int 2 Start with top-level non-terminal E Try the rules for E in order 17

Recursive Descent Parsing. Example (Cont.) Try E 0 T 1 + E 2 Then try a rule for T 1 ( E 3 ) But ( does not match input token int 5 Try T 1 int. Token matches. But + after T 1 does not match input token * Try T 1 int * T 2 This will match but + after T 1 will be unmatched Have exhausted the choices for T 1 Backtrack to choice for E 0 18

Recursive Descent Parsing. Example (Cont.) Try E 0 T 1 Follow same steps as before for T 1 And succeed with T 1 int * T 2 and T 2 int With the following parse tree E 0 T 1 int 5 * T 2 int 2 19

A Recursive Descent Parser (2) Define boolean functions that check the token string for a match of A given token terminal bool term(token tok) { return in[next++] == tok; } A given production of S (the n th ) bool S n () { } Any production of S: bool S() { } These functions advance next 20

A Recursive Descent Parser (3) For production E T + E bool E 1 () { return T() && term(plus) && E(); } For production E T bool E 2 () { return T(); } For all productions of E (with backtracking) bool E() { int save = next; return (next = save, E 1 ()) (next = save, E 2 ()); } 21

A Recursive Descent Parser (4) Functions for non-terminal T bool T 1 () { return term(open) && E() && term(close); } bool T 2 () { return term(int) && term(times) && T(); } bool T 3 () { return term(int); } bool T() { int save = next; return (next = save, T 1 ()) (next = save, T 2 ()) (next = save, T 3 ()); } 22

Recursive Descent Parsing. Notes. To start the parser Initialize next to point to first token Invoke E() Notice how this simulates our backtracking example from lecture Easy to implement by hand Predictive parsing is more efficient 23

Recursive Descent Parsing. Notes. Easy to implement by hand An example implementation is provided as a supplement Recursive Descent Parsing But does not always work 24

Recursive-Descent Parsing Parsing: given a string of tokens t 1 t 2... t n, find its parse tree Recursive-descent parsing: Try all the productions exhaustively At a given moment the fringe of the parse tree is: t 1 t 2 t k A Try all the productions for A: if A BC is a production, the new fringe is t 1 t 2 t k B C Backtrack when the fringe doesn t match the string Stop when there are no more non-terminals 25

When Recursive Descent Does Not Work Consider a production S S a: In the process of parsing S we try the above rule What goes wrong? A left-recursive grammar has a non-terminal S S + S for some Recursive descent does not work in such cases It goes into an loop 26

Elimination of Left Recursion Consider the left-recursive grammar S S S generates all strings starting with a and followed by a number of Can rewrite using right-recursion S S S S 27

Elimination of Left-Recursion. Example Consider the grammar S 1 S 0 ( = 1 and = 0 ) can be rewritten as S 1 S S 0 S 28

More Elimination of Left-Recursion In general S S 1 S n 1 m All strings derived from S start with one of 1,, m and continue with several instances of 1,, n Rewrite as S 1 S m S S 1 S n S 29

General Left Recursion The grammar S A A S is also left-recursive because S + S This left-recursion can also be eliminated See [ASU], Section 4.3 for general algorithm 30

Summary of Recursive Descent Simple and general parsing strategy Left-recursion must be eliminated first but that can be done automatically Unpopular because of backtracking Thought to be too inefficient In practice, backtracking is eliminated by restricting the grammar 31