MIT Top-Down Parsing. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology


MIT 6.035 Top-Down Parsing Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Orientation
  Language specification
    Lexical structure: regular expressions
    Syntactic structure: grammar
  This lecture: recursive descent parsers
    Code the parser as a set of mutually recursive procedures
    Structure of the program matches structure of the grammar

Starting Point
  Assume lexical analysis has produced a sequence of tokens
    Each token has a type and a value
    Types correspond to terminals
    Values correspond to the contents of the token read in
  Examples
    Int 549: integer token with value 549 read in
    if: if keyword, no need for a value
    AddOp +: add operator, value +
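The token stream assumed above can be sketched in a few lines of Python. This is only an illustration: the token type MulOp, the Skip rule for whitespace, and the function name tokenize are assumptions that do not appear in the slides.

```python
import re

# Token types. Int and AddOp are named in the slides; MulOp and Skip are
# assumptions made for this sketch.
TOKEN_SPEC = [
    ("Int", r"[0-9][0-9]*"),
    ("AddOp", r"[+-]"),
    ("MulOp", r"[*/]"),
    ("Skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (type, value) pairs for the whole input string."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "Skip":
            # Int tokens carry a numeric value; operators carry their own text.
            value = int(m.group()) if m.lastgroup == "Int" else m.group()
            tokens.append((m.lastgroup, value))
        pos = m.end()
    return tokens
```

For example, tokenize("549 + 2") yields an Int token with value 549, an AddOp token with value +, and an Int token with value 2.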

Basic Approach
  Start with the Start symbol
  Build a leftmost derivation
    If the leftmost symbol is a nonterminal, choose a production and apply it
    If the leftmost symbol is a terminal, match it against the input
    If all terminals match, we have found a parse!
  Key: find the correct productions for nonterminals

Graphical Illustration of Leftmost Derivation
  Sentential form: NT1 T1 T2 T3 NT2 NT3
  Apply a production at NT1 (the leftmost nonterminal), not at NT2 or NT3

Grammar for Parsing Example
  Start → Expr
  Expr → Expr + Term
  Expr → Expr - Term
  Expr → Term
  Term → Term * Int
  Term → Term / Int
  Term → Int
  Set of tokens is { +, -, *, /, Int }, where Int = [0-9][0-9]*
  For convenience, may represent each Int n token by n

Parsing Example
  Input: 2-2*2
  Each row shows the sentential form and the remaining input after the action.
  Action                        Sentential Form     Remaining Input
  (start)                       Start               2-2*2
  Apply Start → Expr            Expr                2-2*2
  Apply Expr → Expr - Term      Expr - Term         2-2*2
  Apply Expr → Term             Term - Term         2-2*2
  Apply Term → Int              Int - Term          2-2*2
  Match input token Int 2       2 - Term            -2*2
  Match input token -           2 - Term            2*2
  Apply Term → Term * Int       2 - Term * Int      2*2
  Apply Term → Int              2 - Int * Int       2*2
  Match input token Int 2       2 - 2 * Int         *2
  Match input token *           2 - 2 * Int         2
  Match input token Int 2       2 - 2 * 2           (empty)
  Parse complete!

Summary
  Three actions (mechanisms)
    Apply a production to expand the current nonterminal in the parse tree
    Match the current terminal (consuming input)
    Accept the parse as correct
  Parser generates a preorder traversal of the parse tree
    Visit parents before children
    Visit siblings from left to right

Policy Problem
  Which production to use for each nonterminal?
  Classical separation of policy and mechanism
  One approach: backtracking
    Treat it as a search problem
    At each choice point, try the next alternative
    If it is clear that the current try fails, go back to the previous choice and try something different
  General technique for searching
    Used a lot in classical AI and natural language processing (parsing, speech recognition)

Backtracking Example
  Input: 2-2*2
  Each row shows the sentential form and the remaining input after the action.
  Action                        Sentential Form     Remaining Input
  (start)                       Start               2-2*2
  Apply Start → Expr            Expr                2-2*2
  Apply Expr → Expr + Term      Expr + Term         2-2*2
  Apply Expr → Term             Term + Term         2-2*2
  Apply Term → Int              Int + Term          2-2*2
  Match input token Int 2       2 + Term            -2*2
  Can't match input token: + expected, - found
  So backtrack!                 Expr                2-2*2
  Apply Expr → Expr - Term      Expr - Term         2-2*2
  Apply Expr → Term             Term - Term         2-2*2
  Apply Term → Int              Int - Term          2-2*2
  Match input token Int 2       2 - Term            -2*2
  Match input token -           2 - Term            2*2
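The search behind the backtracking example can be sketched as a Python generator that tries each production in order and falls through to the next alternative when a match fails. This is a minimal sketch, not the lecture's code. It assumes a hypothetical right-recursive variant of the grammar, because a naive search like this one would recurse forever on left-recursive productions. The names GRAMMAR, derive, token_matches, and accepts are illustrative.

```python
# Hypothetical right-recursive variant of the example grammar (an assumption,
# chosen so that the naive search below terminates).
GRAMMAR = {
    "Start": [["Expr"]],
    "Expr": [["Term", "+", "Expr"], ["Term", "-", "Expr"], ["Term"]],
    "Term": [["Int", "*", "Term"], ["Int", "/", "Term"], ["Int"]],
}


def token_matches(terminal, token):
    """Int matches any digit string; other terminals match themselves."""
    return token.isdigit() if terminal == "Int" else token == terminal


def derive(symbols, tokens, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Choice point: try each production in order. Exhausting one
        # alternative and moving on to the next is the backtracking step.
        for production in GRAMMAR[first]:
            yield from derive(production + rest, tokens, pos)
    elif pos < len(tokens) and token_matches(first, tokens[pos]):
        yield from derive(rest, tokens, pos + 1)


def accepts(tokens):
    """True if some sequence of choices derives exactly the token list."""
    return any(end == len(tokens) for end in derive(["Start"], tokens, 0))
```

Calling accepts(list("2-2*2")) first tries the + alternative, fails to match -, and then succeeds with the - alternative, mirroring the trace above.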

Left Recursion + Top-Down Parsing = Infinite Loop
  Example production: Term → Term * Num
  Potential parsing steps:
    Term
    Term * Num
    Term * Num * Num
    Term * Num * Num * Num
    ...

General Search Issues
  Three components
    Search space (parse trees)
    Search algorithm (parsing algorithm)
    Goal to find (parse tree for input program)
  Would like to (but can't always) ensure that
    We find the goal (hopefully quickly) if it exists
    The search terminates if it does not
  Handled in various ways in various contexts
    Finite search space makes it easy
    Exploration strategies for infinite search space
    Sometimes one goal is more important (model checking)
  For parsing, hack the grammar to remove left recursion

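Eliminating Left Recursion
  Start with productions of the form
    A → A α
    A → β
    α, β = sequences of terminals and nonterminals that do not start with A
  Repeated application of A → A α builds a left-leaning parse tree:
    A at the root, with children A and α; the child A again has children A and α; the innermost A expands to β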

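Eliminating Left Recursion
  Replacement productions
    A → A α and A → β are replaced by
    A → β R
    R → α R
    R → ε
    R is a new nonterminal
  Old parse tree: left-leaning chain of A nodes, each adding an α on its right, ending in β
  New parse tree: A has children β and R; each R has children α and R, ending in ε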

Hacked Grammar
  Original grammar fragment
    Term → Term * Int
    Term → Term / Int
    Term → Int
  New grammar fragment
    Term → Int Term'
    Term' → * Int Term'
    Term' → / Int Term'
    Term' → ε
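The rewrite from A → A α | β to A → β R, R → α R | ε is mechanical, so it can be sketched as a small function. The representation (right-hand sides as tuples of symbol names) and the convention of naming the new nonterminal by appending a prime are assumptions made for this illustration.

```python
EPSILON = ()  # the empty right-hand side


def eliminate_left_recursion(nt, productions):
    """Rewrite A -> A alpha | beta as A -> beta R, R -> alpha R | epsilon.

    `productions` is a list of right-hand sides, each a tuple of symbols.
    The new nonterminal R is named nt + "'" (a naming assumption).
    Only direct left recursion is handled.
    """
    # alpha: what follows the leading A in each left-recursive production
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    # beta: every right-hand side that does not start with A
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not alphas:
        return {nt: list(productions)}  # no direct left recursion, nothing to do
    r = nt + "'"
    return {
        nt: [beta + (r,) for beta in betas],
        r: [alpha + (r,) for alpha in alphas] + [EPSILON],
    }
```

Applied to the productions Term → Term * Int, Term → Term / Int, Term → Int, this returns the new fragment with Term → Int Term' and the three Term' productions.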

Parse Tree Comparisons
  [Figure: parse trees for the same product of Int tokens. Original grammar: left-leaning tree. New grammar: right-leaning tree.]

Eliminating Left Recursion
  Changes the search space exploration algorithm
    Eliminates direct infinite recursion
    But the grammar is less intuitive
  Sets things up for predictive parsing

Predictive Parsing
  Alternative to backtracking
  Useful for programming languages, which can be designed to make parsing easier
  Basic idea
    Look ahead in the input stream
    Decide which production to apply based on the next tokens in the input stream
    We will use one token of lookahead

Predictive Parsing Example Grammar
  Start → Expr
  Expr → Term Expr'
  Expr' → + Term Expr'
  Expr' → - Term Expr'
  Expr' → ε
  Term → Int Term'
  Term' → * Int Term'
  Term' → / Int Term'
  Term' → ε

Choice Points
  Assume Term' is the current position in the parse tree
  Have three possible productions to apply
    Term' → * Int Term'
    Term' → / Int Term'
    Term' → ε
  Use the next token to decide
    If the next token is *, apply Term' → * Int Term'
    If the next token is /, apply Term' → / Int Term'
    Otherwise, apply Term' → ε

Predictive Parsing + Hand Coding = Recursive Descent Parser
  One procedure per nonterminal NT
    Productions NT → β1, ..., NT → βn
    Procedure examines the current input symbol T to determine which production to apply
      If T ∈ First(βk)
        Apply production k
        Consume terminals in βk (check for correct terminal)
        Recursively call procedures for nonterminals in βk
  Current input symbol stored in global variable token
  Procedures return
    true if parse succeeds
    false if parse fails

Example
  Grammar
    Term → Int Term'
    Term' → * Int Term'
    Term' → / Int Term'
    Term' → ε
  Boolean Term()
    if (token == Int n)
      token = NextToken(); return TermPrime();
    else return false;
  Boolean TermPrime()
    if (token == *)
      token = NextToken();
      if (token == Int n)
        token = NextToken(); return TermPrime();
      else return false;
    else if (token == /)
      token = NextToken();
      if (token == Int n)
        token = NextToken(); return TermPrime();
      else return false;
    else return true;    // Term' → ε
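The two procedures translate almost line for line into Python. The sketch below is a recognizer only. It makes a few assumptions that are not in the slides: tokens are plain strings, an end marker "$" is appended to the input, and a wrapper recognize_term checks that all input was consumed.

```python
class Parser:
    """Recognizer for Term -> Int Term', Term' -> * Int Term' | / Int Term' | epsilon."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" is an assumed end-of-input marker
        self.pos = 0

    @property
    def token(self):
        """The current input symbol (one token of lookahead)."""
        return self.tokens[self.pos]

    def next_token(self):
        self.pos += 1

    def term(self):
        # Term -> Int Term'
        if self.token.isdigit():
            self.next_token()
            return self.term_prime()
        return False

    def term_prime(self):
        # Term' -> * Int Term'  |  / Int Term'
        if self.token in ("*", "/"):
            self.next_token()
            if self.token.isdigit():
                self.next_token()
                return self.term_prime()
            return False
        # Term' -> epsilon: consume nothing
        return True


def recognize_term(tokens):
    """True if the whole token list is a Term (end-of-input check is an addition)."""
    parser = Parser(tokens)
    return parser.term() and parser.token == "$"
```

With this sketch, recognize_term(["2", "*", "3", "/", "4"]) succeeds, while recognize_term(["2", "*"]) fails because no Int follows the operator.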

Multiple Productions With Same Prefix in RHS
  Example grammar
    NT → if then
    NT → if then else
  Assume NT is the current position in the parse tree, and if is the next token
  Unclear which production to apply
    Multiple k such that T ∈ First(βk)
    if ∈ First(if then)
    if ∈ First(if then else)

Solution: Left Factor the Grammar
  New grammar factors the common prefix into a single production
    NT → if then NT'
    NT' → else
    NT' → ε
  No choice when the next token is if!
  All choices have been unified in one production.
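Left factoring can be sketched as a function that pulls the longest prefix shared by all alternatives into a single production. This is a simplified illustration: it only factors a prefix common to every right-hand side, and the tuple representation and the primed name for the new nonterminal are assumptions.

```python
def common_prefix(sequences):
    """Longest sequence of symbols that starts every right-hand side."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor(nt, productions):
    """Factor NT -> p s1 | p s2 | ... into NT -> p NT', NT' -> s1 | s2 | ...

    `productions` is a list of right-hand sides, each a tuple of symbols.
    The new nonterminal is named nt + "'" (a naming assumption).
    """
    prefix = common_prefix(productions)
    if not prefix or len(productions) < 2:
        return {nt: list(productions)}  # nothing to factor
    new_nt = nt + "'"
    suffixes = [rhs[len(prefix):] for rhs in productions]  # () stands for epsilon
    return {nt: [prefix + (new_nt,)], new_nt: suffixes}
```

For the two if productions above, left_factor returns NT → if then NT' together with the alternatives ε and else for NT'.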

Nonterminals
  What about productions with nonterminals?
    NT → NT1 α1
    NT → NT2 α2
  Must choose based on the possible first terminals that NT1 and NT2 can generate
  What if NT1 or NT2 can generate ε?
    Must choose based on α1 and α2

NT derives ε
  Two rules
    NT → ε implies NT derives ε
    NT → NT1 ... NTn and for all 1 ≤ i ≤ n NTi derives ε implies NT derives ε

Fixed Point Algorithm for Derives ε
  for all nonterminals NT
    set NT derives ε to be false
  for all productions of the form NT → ε
    set NT derives ε to be true
  while (some NT derives ε changed in last iteration)
    for all productions of the form NT → NT1 ... NTn
      if (for all 1 ≤ i ≤ n, NTi derives ε)
        set NT derives ε to be true
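The fixed point algorithm can be written directly in Python. In this sketch the grammar is a dictionary from nonterminal names to lists of right-hand sides, with the empty tuple standing for ε; that representation and the name derives_epsilon are assumptions.

```python
def derives_epsilon(grammar):
    """Map each nonterminal to True if it can derive the empty string.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples of
    symbols). Symbols that are not keys of `grammar` are terminals.
    """
    nullable = {nt: False for nt in grammar}
    # Rule 1: NT -> epsilon implies NT derives epsilon.
    for nt, productions in grammar.items():
        if () in productions:
            nullable[nt] = True
    # Rule 2: iterate until no entry changes.
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            if nullable[nt]:
                continue
            for rhs in productions:
                # Terminals are never nullable, hence the False default.
                if rhs and all(nullable.get(sym, False) for sym in rhs):
                    nullable[nt] = True
                    changed = True
                    break
    return nullable
```

On the predictive parsing grammar this marks Expr' and Term' as deriving ε, and Start, Expr, and Term as not deriving ε.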

First(β)
  T ∈ First(β) if T can appear as the first symbol in a derivation starting from β
    1) T ∈ First(T)
    2) First(S) ⊆ First(S β)
    3) NT derives ε implies First(β) ⊆ First(NT β)
    4) NT → S β implies First(S β) ⊆ First(NT)
  Notation
    T is a terminal, NT is a nonterminal, S is a terminal or nonterminal, and β is a sequence of terminals or nonterminals

Rules + Request Generate System of Subset Inclusion Constraints
  Grammar
    Term' → * Int Term'
    Term' → / Int Term'
    Term' → ε
  Rules
    1) T ∈ First(T)
    2) First(S) ⊆ First(S β)
    3) NT derives ε implies First(β) ⊆ First(NT β)
    4) NT → S β implies First(S β) ⊆ First(NT)
  Request: What is First(Term')?
  Constraints
    First(* Num Term') ⊆ First(Term')
    First(/ Num Term') ⊆ First(Term')
    First(*) ⊆ First(* Num Term')
    First(/) ⊆ First(/ Num Term')
    * ∈ First(*)
    / ∈ First(/)

Constraint Propagation Algorithm
  Grammar
    Term' → * Int Term'
    Term' → / Int Term'
    Term' → ε
  Constraints
    First(* Num Term') ⊆ First(Term')
    First(/ Num Term') ⊆ First(Term')
    First(*) ⊆ First(* Num Term')
    First(/) ⊆ First(/ Num Term')
    * ∈ First(*)
    / ∈ First(/)
  Initialize sets to {}, then propagate constraints until a fixed point is reached
  Solution, step by step
    Step 1: First(Term') = {}      First(* Num Term') = {}    First(/ Num Term') = {}    First(*) = {*}   First(/) = {/}
    Step 2: First(Term') = {}      First(* Num Term') = {*}   First(/ Num Term') = {/}   First(*) = {*}   First(/) = {/}
    Step 3: First(Term') = {*,/}   First(* Num Term') = {*}   First(/ Num Term') = {/}   First(*) = {*}   First(/) = {/}
    No further changes: fixed point reached
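Constraint propagation for First sets can be sketched as a loop that keeps adding terminals until nothing changes. The sketch below computes First only for nonterminals and tracks which nonterminals derive ε along the way. The grammar representation (dictionary of tuples, empty tuple for ε) and the function name first_sets are assumptions, and the sketch works on whole productions instead of building the explicit constraint list shown on the slide.

```python
def first_sets(grammar):
    """Return First(NT) for every nonterminal by propagating to a fixed point.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples of
    symbols). Symbols that are not keys of `grammar` are terminals.
    """
    first = {nt: set() for nt in grammar}      # initialize sets to {}
    nullable = {nt: False for nt in grammar}   # NT derives epsilon
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                all_nullable = True
                for sym in rhs:
                    if sym in grammar:
                        new = first[sym] - first[nt]   # First(sym) flows into First(nt)
                        sym_nullable = nullable[sym]
                    else:
                        new = {sym} - first[nt]        # T is in First(T)
                        sym_nullable = False
                    if new:
                        first[nt] |= new
                        changed = True
                    if not sym_nullable:
                        # Later symbols cannot contribute to First(nt).
                        all_nullable = False
                        break
                if all_nullable and not nullable[nt]:
                    nullable[nt] = True
                    changed = True
    return first
```

For the Term' productions this reaches the same fixed point as the slide: First(Term') = {*, /}.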

Building A Parse Tree
  Have each procedure return the section of the parse tree for the part of the string it parsed
  Use exceptions to make the code structure clean

Building Parse Tree In Example
  Term()
    if (token == Int n)
      oldToken = token; token = NextToken();
      node = TermPrime();
      if (node == NULL) return oldToken;
      else return new TermNode(oldToken, node);
    else throw SyntaxError
  TermPrime()
    if ((token == *) || (token == /))
      first = token; next = NextToken();
      if (next == Int n)
        token = NextToken();
        return new TermPrimeNode(first, next, TermPrime());
      else throw SyntaxError
    else return NULL
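The same structure can be sketched in Python, with nested tuples standing in for the node classes and SyntaxError for the thrown exception. The tuple layout, the "$" end marker, the final end-of-input check, and the name parse_term_tree are assumptions made for this illustration.

```python
def parse_term_tree(tokens):
    """Return the concrete parse tree for a Term, or raise SyntaxError.

    Tokens are strings. Nodes are tuples: ("Int", n), ("Term", leaf, rest)
    and ("Term'", op, leaf, rest), where rest is None for Term' -> epsilon.
    """
    tokens = list(tokens) + ["$"]  # "$" is an assumed end-of-input marker
    pos = 0

    def term():
        nonlocal pos
        if not tokens[pos].isdigit():
            raise SyntaxError(f"expected Int, found {tokens[pos]!r}")
        leaf = ("Int", int(tokens[pos]))
        pos += 1
        rest = term_prime()
        # As in the slide: a lone Int is returned without a wrapper node.
        return leaf if rest is None else ("Term", leaf, rest)

    def term_prime():
        nonlocal pos
        if tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            if not tokens[pos].isdigit():
                raise SyntaxError(f"expected Int, found {tokens[pos]!r}")
            leaf = ("Int", int(tokens[pos]))
            pos += 1
            return ("Term'", op, leaf, term_prime())
        return None  # Term' -> epsilon

    tree = term()
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

Because errors are raised as exceptions, neither procedure needs to test a success flag after each recursive call.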

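Parse Tree for 2*3*4
  [Figure: concrete parse tree, with Int 2 followed by a right-leaning chain holding * Int 3 and * Int 4, beside the desired abstract parse tree, in which the top * has the subtree Int 2 * Int 3 as its left child and Int 4 as its right child.]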

Why Use Hand-Coded Parser?
  Why not use a parser generator?
  What do you do if your parser doesn't work?
    Recursive descent parser: write more code
    Parser generator: hack the grammar
    But if the parser generator doesn't work, there is nothing you can do
  If you have a complicated grammar
    Increased chance of going outside the comfort zone of the parser generator
    Your parser may NEVER work

Bottom Line
  Recursive descent parser properties
    Probably more work
    But less risk of a disaster: you can almost always make a recursive descent parser work
    May have an easier time dealing with the resulting code
      Single language system
      No need to deal with a potentially flaky parser generator
      No integration issues with automatically generated code
  If your parser development time is small compared to the rest of the project, or you have a really complicated language, use a hand-coded recursive descent parser

Summary
  Top-down parsing
  Use lookahead to avoid backtracking
  Parser is a hand-coded set of mutually recursive procedures

Direct Generation of Abstract Tree
  TermPrime builds an incomplete tree
    Missing leftmost child
    Returns root and incomplete node
  (root, incomplete) = TermPrime()
    Called with token = *
    Remaining tokens = 3 * 4
  [Figure: root points to the top * node, whose right child is Int 4; incomplete points to the lower * node, whose right child is Int 3 and whose missing left child is to be filled in by the caller.]

Code for Term
  Term()
    if (token == Int n)
      leftmostInt = token; token = NextToken();
      (root, incomplete) = TermPrime();
      if (root == NULL) return leftmostInt;
      incomplete.leftChild = leftmostInt;
      return root;
    else throw SyntaxError
  Input to parse: 2*3*4
  On entry: token = Int 2
  After TermPrime() returns: leftmostInt = Int 2, root is the top * node (right child Int 4), and incomplete is the lower * node (right child Int 3), whose left child is then set to Int 2

Code for TermPrime
  TermPrime()
    if ((token == *) || (token == /))
      op = token; next = NextToken();
      if (next == Int n)
        token = NextToken();
        (root, incomplete) = TermPrime();
        if (root == NULL)
          root = new ExprNode(NULL, op, next);
          return (root, root);
        else
          newChild = new ExprNode(NULL, op, next);
          incomplete.leftChild = newChild;
          return (root, newChild);
      else throw SyntaxError
    else return (NULL, NULL)
  The left child of the returned incomplete node is missing, to be filled in by the caller
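The (root, incomplete) technique can be sketched in Python as follows. The ExprNode class mirrors the node used in the pseudocode; the to_tuple helper, the "$" end marker, the end-of-input check, and the name parse_ast are assumptions added for this illustration.

```python
class ExprNode:
    """Binary operator node; `left` may be None while the tree is incomplete."""

    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

    def to_tuple(self):
        """Hypothetical helper: render the tree as nested (left, op, right) tuples."""
        left = self.left.to_tuple() if isinstance(self.left, ExprNode) else self.left
        return (left, self.op, self.right)


def parse_ast(tokens):
    """Build a left-associative abstract tree for Int (('*' | '/') Int)*."""
    tokens = list(tokens) + ["$"]  # "$" is an assumed end-of-input marker
    pos = 0

    def term():
        nonlocal pos
        if not tokens[pos].isdigit():
            raise SyntaxError(f"expected Int, found {tokens[pos]!r}")
        leftmost = int(tokens[pos])
        pos += 1
        root, incomplete = term_prime()
        if root is None:
            return leftmost
        incomplete.left = leftmost  # fill in the missing leftmost child
        return root

    def term_prime():
        nonlocal pos
        if tokens[pos] not in ("*", "/"):
            return None, None  # Term' -> epsilon
        op = tokens[pos]
        pos += 1
        if not tokens[pos].isdigit():
            raise SyntaxError(f"expected Int, found {tokens[pos]!r}")
        operand = int(tokens[pos])
        pos += 1
        root, incomplete = term_prime()
        new_node = ExprNode(None, op, operand)
        if root is None:
            return new_node, new_node  # single node: it is root and incomplete
        incomplete.left = new_node     # hang this operator below the later ones
        return root, new_node

    tree = term()
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For the input 2*3*4, the call to term_prime returns the top * node as root and the lower * node as incomplete, and term then installs 2 as its left child, giving the tree for (2*3)*4.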