CS 314 Principles of Programming Languages

Similar documents
Types of parsing. CMSC 430 Lecture 4, Page 1

Syntax Analysis Check syntax and construct abstract syntax tree

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

3. Parsing. Oscar Nierstrasz

CS 406/534 Compiler Construction Parsing Part I

Syntax Analysis Part I

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Parsing II Top-down parsing. Comp 412

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Introduction to Parsing. Comp 412

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

EECS 6083 Intro to Parsing Context Free Grammars

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CSCI312 Principles of Programming Languages

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Principles of Programming Languages COMP251: Syntax and Grammars

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Context-Free Grammars

Introduction to Parsing

Compilers Course Lecture 4: Context Free Grammars

CSE302: Compiler Design

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

CA Compiler Construction

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Defining syntax using CFGs

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Parsing Part II (Top-down parsing, left-recursion removal)

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Compiler Design Concepts. Syntax Analysis

Syntactic Analysis. Top-Down Parsing

afewadminnotes CSC324 Formal Language Theory Dealing with Ambiguity: Precedence Example Office Hours: (in BA 4237) Monday 3 4pm Wednesdays 1 2pm

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Top down vs. bottom up parsing

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7


EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Abstract Syntax Trees & Top-Down Parsing

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CSE 3302 Programming Languages Lecture 2: Syntax

Part 5 Program Analysis Principles and Techniques

CS 314 Principles of Programming Languages. Lecture 3

Introduction to Parsing. Lecture 5

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Syntax. In Text: Chapter 3

CMSC 330: Organization of Programming Languages

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

A Simple Syntax-Directed Translator

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

CMSC 330: Organization of Programming Languages. Context Free Grammars

Concepts Introduced in Chapter 4

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Chapter 4: LR Parsing

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Announcements. Written Assignment 1 out, due Friday, July 6th at 5PM.

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Computer Science 160 Translation of Programming Languages

Context-free grammars

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages

More on Syntax. Agenda for the Day. Administrative Stuff. More on Syntax In-Class Exercise Using parse trees

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Compiler Construction 2016/2017 Syntax Analysis

Introduction to Parsing. Lecture 5

Compiler Construction: Parsing

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

COMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar

Dr. D.M. Akbar Hussain

A programming language requires two major definitions A simple one pass compiler

Habanero Extreme Scale Software Research Project

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Chapter 3. Parsing #1

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Monday, September 13, Parsers

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

Wednesday, August 31, Parsers

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Optimizing Finite Automata

Introduction to Parsing. Lecture 8

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Bottom-Up Parsing. Lecture 11-12

CS 314 Principles of Programming Languages

Transcription:

CS 314 Principles of Programming Languages Lecture 5: Syntax Analysis (Parsing) Zheng (Eddy) Zhang Rutgers University January 31, 2018

Class Information Homework 1 is being graded now. The sample solution will be posted soon. Homework 2 posted. Due in one week. 2

Review: Context Free Grammars (CFGs) A formalism to for describing languages A CFG G = < T, N, P, S >: 1. A set T of terminal symbols (tokens). 2. A set N of nonterminal symbols. 3. A set P production (rewrite) rules. 4. A special start symbol S. The language L(G) is the set of sentences of terminal symbols in T* that can be derived from the start symbol S: L(G) = {w T* S * w} 3

Review: Context Free Grammar <if-stmt> ::= if <expr> then <stmt> <expr> ::= id <= id <stmt> ::= id := num Rule 1 $1 1& Rule 2 $0 0$ Rule 3 &1 1$ Rule 4 &0 0& Rule 5 $# A Rule 6 &# B Context free grammar Not a context free grammar CFGs are rewrite systems with restrictions on the form of rewrite (production) rules that can be used. The left hand side of a production rule can only be one non-terminal symbol. 4

A Language May Have Many Grammars Consider G : The Original Grammar G: 1. < letter > ::= A B C Z 2. < digit > ::= 0 1 2 9 3. < ident > ::= < letter > 1. < letter > ::= A B C Z 2. < digit > ::= 0 1 2 9 3. < identifier > ::= < letter > 4. < ident > < letterordigit > 4. < identifier > < letter > 5. < stmt > ::= < ident > := 0 5. < identifier > < digit > 6. < letterordigit > ::= < letter > < digit > 6. < stmt > ::= < identifier > := 0 <stmt> <ident> := 0 X2 := 0 <stmt> <identifier> := 0 <ident> <letterordigit> <identifier> <digit> <letter> <digit> <letter> 2 X 2 X 5

Review: Grammars and Programming Languages Many grammars may correspond to one programming language. Good grammars: Captures the logic structure of the language structure carries some semantic information (example: expression grammar) Use meaningful names Are easy to read Are unambiguous 6

Review: Ambiguous Grammars Time flies like an arrow; fruit flies like a banana. A grammar G is ambiguous iff there exists a w L(G) such that there are: two distinct parse trees for w, or two distinct leftmost derivations for w, or two distinct rightmost derivations for w. We want a unique semantics of our programs, which typically requires a unique syntactic structure. 7

Review: Arithmetic Expression Grammar Parse 8-3 * 2 : < expr > < start > ::= < expr > < expr > :: = < expr > - < expr > < expr > * < expr > < d > < l > < d > :: = 0 1 2 3 9 < l > :: = a b c z Two parse trees! < expr > < expr > - < expr > < expr > * < expr > < d > 8 < expr > * < expr > < d > < d > < expr > - < expr > < d > < d > < d > 2 3 2 8 3 8

Review: Arithmetic Expression Grammar Parse 8-3 * 2 : Two leftmost derivations! < expr > < expr > - < expr > < expr > < expr > * < expr > < d > < expr > * < expr > < d > < d > < expr > - < expr > < d > < d > < d > 8 3 2 8 3 2 leftmost derivation <expr> <expr> - <expr> <d> - <expr> <d> - <expr> * <expr> <d> - <d> * <expr> <d> - <d> * <d> L L L L L leftmost derivation <expr> <expr> * <expr> L L <expr> - <expr> * <expr> L <d> - <expr> * <expr> <d> - <d> * <expr> <d> - <d> * <d> L L 9

Review: Ambiguity How to deal with ambiguity? Change the language Example: Adding new terminal symbols as delimiters. Fix the dangling else, expression grammars. Change the grammar Example: Impose associativity and precedence in an arithmetic expression grammar. 10

Changing the Grammar to Impose Precedence < start > ::= < expr > < expr > ::= < expr > + < expr > < expr > - < expr > < expr > * < expr > < expr > / < expr > < d > < l > < d > ::= 0 1 2 3 9 < l > ::= a b c z < start > ::= < expr > < expr > ::= < expr > + < expr > < expr > - < expr > < term > < term > ::= < term > * < term > < term > / < term > < d > < l > < d > ::= 0 1 2 3 9 < l > ::= a b c z Original Grammar G Modified Grammar G 11

Grouping in Parse Tree Now Reflects Precedence Parse 8-3 * 2 : < start > ::= < expr > < expr > < expr > :: = < expr > + < expr > < expr > - < expr > < expr > - < expr > < term > < term > < term > * < term > < term > ::= < term > * < term > < term > / < term > < d > < d > < d > < d > < l > 8 3 2 < d > :: = 0 1 2 3 9 < l > :: = a b c z Modified Grammar G Only One Possible Parse Tree 12

Precedence Low Precedence: Addition + and Subtraction - Medium Precedence: Multiplication * and Division / Highest Precedence: Exponentiation ^ < start > ::= < expr > < expr > :: = < expr > + < expr > < expr > - < expr > < term > < term > ::= < term > * < term > < term > / < term > < d > < l > < d > :: = 0 1 2 3 9 < l > :: = a b c z 13

Still Have Ambiguity How about 3-2 - 1? < start > 3-2 - 1 OR? < expr > - < expr > < expr > 1 3 < expr > - < expr > 3 2 < start > 3-2 - 1 - < expr > < expr > - < expr > 2 1 Eval: (3-2) - 1 = 0 Eval: 3 - (2-1) = 2 < start > ::= < expr > < expr > :: = < expr > + < expr > < expr > - < expr > < term > < term > ::= < term > * < term > < term > / < term > < d > < l > < d > :: = 0 1 2 3 9 < l > :: = a b c z 14

Still Have Ambiguity How about 3-2 - 1? 3-2 - 1 OR? < start > < expr > - < expr > < expr > 1 3 < expr > - < expr > 3 2 3-2 - 1 < start > - < expr > < expr > - < expr > 2 1 < start > ::= < expr > < expr > :: = < expr > + < expr > < expr > - < expr > < term > < term > ::= < term > * < term > < term > / < term > < d > < l > < d > :: = 0 1 2 3 9 < l > :: = a b c z 15

Still Have Ambiguity Grouping of operators of same precedence not disambiguated. Non-commutative operators: only one parse tree correct. < start > ::= < expr > < expr > :: = < expr > + < expr > < expr > - < expr > < term > < term > ::= < term > * < term > < term > / < term > < d > < l > < d > :: = 0 1 2 3 9 < l > :: = a b c z 16

Imposing Associativity Same grammar with left / right recursion for - : Our choices: < expr > :: = < d > - < expr > < d > < d > :: = 0 1 2 3 9 Or: < expr > :: = < expr > - < d > < d > < d > :: = 0 1 2 3 9 <expr> < d > - <expr> < d > - < d > - <expr> < d > - < d > - < d > - <expr> <expr> <expr> - < d > <expr> - < d > - < d > <expr> - < d > - < d > - <d> Which one do we want for - in the calculator language? 17

Associativity Deals with operators of same precedence Implicit grouping or parenthesizing Left to right: *, /, +, - Right to left: ^ 18

Complete, Unambiguous Arithmetic Expression Grammar < start > ::= < expr > < expr > :: = < expr > + < expr > < expr > - < expr > < expr > * < expr > < expr > / < expr > < expr > ^ < expr > < d > < l > < d > :: = 0 1 2 3 9 < l > :: = a b c z Original Ambiguous Grammar G < start > ::= < expr > < expr > ::= < expr > + < term > < expr > < term > < term > < term > ::= < term > * < factor > < term > / < factor > < factor > < factor >::= < g > ^ < factor > < g > < g > ::= ( < expr > ) < d > < l > < d > ::= 0 1 2 9 < l > ::= a b c z Unambiguous Grammar G 19

Abstract versus Concrete Syntax Concrete Syntax: Representation of a construct in a particular language, including placement of keywords and delimiters. Abstract Syntax: Structure of meaningful components of each language construct. 20

Example: Consider A * B C: Concrete Syntax Tree < S > < E > < E > < T > < S > ::= < E > < E > ::= < E > < T > < T > < T > ::= < T > * id id Abstract Syntax Tree (AST) < T > < T > * id id id id * id id 21

Abstract versus Concrete Syntax Same abstract syntax, different concrete syntax: Pascal while x <> A[i], do i := i + 1 end C while ( x!= A[i]) i = i + 1; 22

Regular vs. Context Free All Regular languages are context free languages Not all context free languages are regular languages Example: N ::= X Y X ::= a X b Y ::= c Y c is equivalent to: a b* c + Question: Is {a n b n n 0} a context free language? Is {a n b n n 0} a regular language? 23

Regular Grammars CFGs with restrictions on the shape of production rules. Left-linear: X ::= a X b N ::= X a b Right-linear: Y ::= a b a b Y N ::= b b Y 24

Complexity of Parsing Classification of languages that can be recognized by specific grammars. Complexity: Regular grammars DFAs O(n) LR grammars Kunth s algorithm O(n) Arbitrary CFGs Earley s algorithm O(n 3 ) CFG CSG RG LR Reading: Arbitrary CSGs LBAs P-SPACE COMPLETE Scott Chapter 2.3.4 (for LR parser) and Chapter 2.4.3 for language class. Earley, Jay (1970), An efficient context-free parsing algorithm, Communications of the ACM. 25

Top - Down Parsing - LL(1) <S> <A> α β Basic Idea: x The parse tree is constructed from the root, expanding non-terminal nodes on the tree s frontier following a leftmost derivation. The input program is read from left to right, and input tokens are read (consumed) as the program is parsed. The next non-terminal symbol is replaced using one of its rules. The particular choice has to be unique and uses parts of the input (partially parsed program), for instance the first token of the remaining input. y 26

Top - Down Parsing - LL(1) (cont.) Example: S ::= a S b ε How can we parse (automatically construct a leftmost derivation) the input string a a a b b b using a PDA (push-down automaton) and only the first symbol of the remaining input? INPUT: a a a b b b eof 27

Predictive Parsing Basic idea: For any two productions A ::= α β, we would like a distinct way of choosing the correct production to expand. For some rhs α G, define FIRST(α) as the set of tokens that appear as the first symbol in some string derived from α. That is x FIRST(α) iff α xγ for some γ, and ε FIRST(α) iff α ε 28

Predictive Parsing For a non-terminal A, define FOLLOW(A) as the set of terminals that can appear immediately to the right of A in some sentential form. Thus, a non-terminal s FOLLOW set specifies the tokens that can legally appear after it. A terminal symbol has no FOLLOW set. FIRST and FOLLOW sets can be constructed automatically 29

Predictive Parsing (cont.) Key Property: Whenever two productions A ::= α and A ::= β both appear in the grammar, we would like FIRST (α) FIRST (β) =, and if α * ε, then FIRST (β) FOLLOW (A) = Analogue case for β * ε. Note: due to first condition, at most one of α and β can derive ε. This would allow the parser to make a correct choice with a lookahead of only one symbol! 30

LL(1) Grammar Define FIRST+(δ) for rule A ::= δ FIRST (δ) - { ε } U Follow (A), if ε FIRST(δ) FIRST (δ) otherwise A Grammar is LL(1) iff (A ::= α and A ::= β) implies FIRST + (α) FIRST + (β) = 31

Next Lecture Next Time: Review and more on LL(1) parsing Read Scott, Chapter 2.3.1-2.3.2 (and some chapter on companion site) 32