Introduction to Parsing. Lecture 5

Similar documents
Introduction to Parsing. Lecture 5

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Parsing. Lecture 8

Grammars and ambiguity. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 8 1

Parsing: Derivations, Ambiguity, Precedence, Associativity. Lecture 8. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Intro To Parsing. Step By Step

E E+E E E (E) id. id + id E E+E. id E + E id id + E id id + id. Overview. derivations and parse trees. Grammars and ambiguity. ambiguous grammars

Context-Free Grammars

Ambiguity. Lecture 8. CS 536 Spring

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Compilers and computer architecture From strings to ASTs (2): context free grammars

Context-Free Grammars

Syntax Analysis Check syntax and construct abstract syntax tree

Compilers Course Lecture 4: Context Free Grammars

Bottom-Up Parsing. Lecture 11-12

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Bottom-Up Parsing. Lecture 11-12

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

CS 314 Principles of Programming Languages

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Ambiguity and Errors Syntax-Directed Translation

Compiler Design Concepts. Syntax Analysis

COP 3402 Systems Software Syntax Analysis (Parser)

Optimizing Finite Automata

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

LR Parsing LALR Parser Generators

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Lecture 8: Deterministic Bottom-Up Parsing

A Simple Syntax-Directed Translator

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

LR Parsing LALR Parser Generators

CMSC 330: Organization of Programming Languages

Fall Compiler Principles Context-free Grammars Refresher. Roman Manevich Ben-Gurion University of the Negev

Properties of Regular Expressions and Finite Automata

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Lecture 7: Deterministic Bottom-Up Parsing

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

afewadminnotes CSC324 Formal Language Theory Dealing with Ambiguity: Precedence Example Office Hours: (in BA 4237) Monday 3 4pm Wednesdays 1 2pm

Parsing II Top-down parsing. Comp 412

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Semantic Analysis. Lecture 9. February 7, 2018

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

CMSC 330: Organization of Programming Languages

Syntax-Directed Translation. Lecture 14

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CMSC 330: Organization of Programming Languages. Context Free Grammars

In One Slide. Outline. LR Parsing. Table Construction

Conflicts in LR Parsing and More LR Parsing Types

CMSC 330: Organization of Programming Languages

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

EECS 6083 Intro to Parsing Context Free Grammars

Syntax Analysis Part I

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

LECTURE 3. Compiler Phases

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

CSE 401 Midterm Exam Sample Solution 2/11/15

Context-Free Languages and Parse Trees

Introduction to Lexing and Parsing

Defining syntax using CFGs

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

CMSC 330: Organization of Programming Languages. Context Free Grammars

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Announcements. Written Assignment 1 out, due Friday, July 6th at 5PM.

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

2.2 Syntax Definition

CSCI312 Principles of Programming Languages!

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Lecture 4: Syntax Specification

Context-Free Grammars

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

CS 406/534 Compiler Construction Parsing Part I

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

CSE 3302 Programming Languages Lecture 2: Syntax

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

Topic 5: Syntax Analysis III

CSCI312 Principles of Programming Languages!

LR Parsing. Table Construction

syntax tree - * * * - * * * * * 2 1 * * 2 * (2 * 1) - (1 + 0)

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

COL728 Minor1 Exam Compiler Design Sem II, Answer all 5 questions Max. Marks: 20

Transcription:

Introduction to Parsing Lecture 5 1

Outline Regular languages revisited Parser overview Context-free grammars (CFG s) Derivations Ambiguity 2

Languages and Automata Formal languages are very important in CS specially in programming languages Regular languages The weakest formal languages widely used Many applications (as we ve seen) We will also study context-free languages, tree languages 3

Beyond Regular Languages Difficulty with regular languages is that many languages are not regular Some are very important They can t be expressed using Rs and FAs x. Strings of balanced parentheses are not regular: Note this is fairly representative of lots of programming constructs {() i i i 0} Note: given as set not R 4

Beyond Regular Languages x. Nested arithmetic expressions ((1+2) * 3) x. Nested if then else statements if if if fi fi then fi then then if here acts like ( in previous example Note that even if language doesn t have the fi like Cool, it is usually implied 5

An xample To Help Understand the Limitations Consider the following DFA 1 0 x: 1111111 1 What does it recognize? 0 Note: doesn t have any way of knowing length of input string 6

Beyond Regular Languages In general: Nesting constructs cannot be handled by regular expressions Raises the questions: What can be expressed? Why are Rs insufficient for recognizing arbitrary nesting constructs? 7

What Can Regular Languages xpress? Languages requiring counting modulo a fixed integer.g., parity Intuition: A finite automaton that runs long enough must repeat states Finite automaton can t remember # of times it has visited a particular state 8

The Functionality of the Parser Input: sequence of tokens from lexer Output: parse tree of the program (But some parsers never produce a parse tree...) 9

xample Cool if x = y then 1 else 2 fi Parser input (from lexical analyzer) IF ID = ID THN INT LS INT FI Parser output IF-THN-LS = INT INT ID ID 10

xample Note: nesting structure has been made explicit by tree Also the three components of the if then else Predicate Then branch lse branch IF-THN-LS = INT INT ID ID 11

Comparison with Lexical Analysis Phase Input Output Lexer Parser String of characters String of tokens String of tokens Parse tree 12

Couple of things As mentioned, sometimes parse tree is only implicit More on this later Many compilers do build full parse tree, many do not There are compilers that combine lexer and parser phases into one phase verything done by the parser Parsing technology powerful enough to express lexical analysis in addition to parsing But most compilers use two phases, because Rs are such a good match for lexical analysis 13

The Role of the Parser Not all strings of tokens are programs...... parser must distinguish between valid and invalid strings of tokens And give error messages for the invalid ones We need A language for describing valid strings of tokens An algorithm for distinguishing valid from invalid strings of tokens 14

Context-Free Grammars Programming language constructs have recursive structure An XPR in Cool can be if XPR then XPR else XPR fi while XPR loop XPR pool Note: Recursively composed of other expressions Context-free grammars are a natural notation for this recursive structure 15

What is a Context-Free Grammar (CFG)? A CFG consists of A set of terminals T A set of non-terminals N A start symbol S (a non-terminal) A set of productions X Y 1 Y 2!Y n where X N and Y T N { ε} i 16

Notational Conventions In these lecture notes Non-terminals are written upper-case Terminals are written lower-case The start symbol is the left-hand side of the first production This is standard for CFGs 17

xamples of CFGs S ( S ) S ε 18

xamples of CFGs S ( S ) S ε What are the parts of the grammar: N =? T =? Start =? 19

xamples of CFGs S ( S ) S ε What are the parts of the grammar: N = { S } T = { (, ) } Start = S (the only nonterminal) 20

xamples of CFGs S ( S ) S ε What are the parts of the grammar: N = { S } T = { (, ) } Start = S (the only nonterminal) Productions? 21

xamples of CFGs A fragment of Cool: XPR if XPR then XPR else XPR fi while XPR loop XPR pool id 22

xamples of CFGs (cont.) Simple arithmetic expressions: + ( ) id 23

The Language of a CFG Read productions as rules: X Y 1!Y n Means X can be replaced by Y 1!Y n That is, in general, the right hand side can replace the left hand side. 24

Key Idea 1. Begin with a string consisting of the start symbol S 2. Replace any non-terminal X in the string by a the right-hand side of some production X Y 1!Y n 3. Repeat (2) until there are no non-terminals in the string So note, the string is changing over time 25

The Language of a CFG (Cont.) More formally, write X 1! X i! X n X 1! X i 1 Y 1!Y m X i+1! X n if there is a production X i Y 1!Y m and say that the left hand side derives the right, or can derive the right hand side, etc. This is one step of a context-free derivation. 26

The Language of a CFG (Cont.) Write if X 1! X n Y 1!Y m in 0 or more steps X 1! X n!! Y 1!Y m We say the left hand side rewrites in zero or more steps to the right hand side 27

So, in general When we write X 0 X n it is shorthand for saying that there is some sequence of individual productions (rules) that get us from X 0 to X n in zero or more steps 28

The Language of a CFG Let G be a context-free grammar with start symbol S. Then the language, L(G), of G is: # & $ a 1 a n S a 1 a n and every a i is a terminal' % ( 29

Terminals Terminals are so-called because there are no rules for replacing them Once generated, terminals are permanent feature of the string Terminals ought to be tokens of the language 30

Recall earlier xample L(G) is the language of CFG G Strings of balanced parentheses {() i i i 0} Two grammars: S S ( S) ε OR S ( S) ε 31

Cool xample A fragment of COOL: XPR if XPR then XPR else XPR fi while XPR loop XPR pool id Recall: Non-terminals are written upper-case Terminals are written lower-case Also, could have written as three productions 32

Cool xample (Cont.) Some elements of the language (why?) id if id then id else id fi while id loop id pool if while id loop id pool then id else id if if id then id else id fi then id else id fi 33

Arithmetic xample Simple arithmetic expressions: + () id Some elements of the language: id id + id (id) id id (id) id id (id) 34

Notes The idea of a CFG is a big step. But: Membership in a language is yes or no ; also need parse tree of the input Must handle errors gracefully Need an implementation of CFG s (e.g., bison) 35

More Notes Form of the grammar is important Many grammars generate the same language Tools are sensitive to the grammar Note: Tools for regular languages (e.g., flex) are sensitive to the form of the regular expression, but this is rarely a problem in practice 36

Completely Off Topic, But Relevant What the heck do we mean by context-free when we use the term context-free grammar Well, it s not context-sensitive Let s look at a couple of context sensitive grammars: 37

Completely Off Topic, But Relevant How do we recognize CFG vs CSG? In a CFG, all productions have only a single nonterminal on the left side of any production In a CSG, some production has more than a single non-terminal on the left side of a production 38

Completely Off Topic, But Relevant What does this have to do with context? In a CFG, you can replace a given nonterminal, using a production, without regard to what symbols surround it (the context ) 39

Completely Off Topic, But Relevant What does this have to do with context? But in a CSG, context matters. For example, in the first grammar, if c preceeds a B, you can replace that with WB. But if b preceeds a B, then you replace it with bb. That is, what you can replace B with depends on the surrounding symbols (the context ) 40

Completely Off Topic, But Relevant What does this have to do with context? In the second CSG below, one can replace an A with the terminal a, but one cannot replace B with a unless B is preceded by A (again, context matters) 41

Why is this Relevant Because when we later cover some of the material in semantic analysis, we ll mention that some programming language constructs are not context-free (and thus cannot be represented by a CFG) 42

Derivations and Parse Trees A derivation is a sequence of productions S!!! A derivation can be drawn as a tree Start symbol is the tree s root For a production add children to node X X Y 1!Y n X Y 1!Y n Y 1 Y n 43

Derivation xample Grammar + () id String id id + id We wish to parse the string 44

Derivation xample (Cont.) + + id id id + id + id + id + * id id id parse tree (of the input string) 45

Derivation in Detail (1) 46

Derivation in Detail (2) + + 47

Derivation in Detail (3) + + + * 48

Derivation in Detail (4) + + + * id + id 49

Derivation in Detail (5) + + + id + * id id + id id 50

Derivation in Detail (6) + + + id + * id id id id + id + id id id 51

Some Interesting Things About Parse Trees A parse tree has Terminals at the leaves Non-terminals at the interior nodes An in-order traversal of the leaves is the original input Let s go back and take a look The parse tree shows the association of operations, the input string does not Note * binds more tightly than + because * is a subtree of the parse tree 52

Aside: Inorder Traversal 53

An Interesting Question How did I know to pick this particular parse tree for the derivation? It turns out that there is more than one 54

Left-most and Right-most Derivations The example we did is a left-most derivation At each step, replace the left-most non-terminal There is an equivalent notion of a right-most derivation + + id + id id + id id + id 55

Left-most and Right-most Derivations The example we did is a left-most derivation At each step, replace the left-most non-terminal NOT unique Depends on sequence of productions There is an equivalent notion of a right-most derivation + +id + id id + id id id + id 56

Right-most Derivation in Detail (1) 57

Right-most Derivation in Detail (2) + + 58

Right-most Derivation in Detail (3) + + + id id 59

Right-most Derivation in Detail (4) + + +id * id + id 60

Right-most Derivation in Detail (5) + + +id + id * id id + id id 61

Right-most Derivation in Detail (6) + +id + + id * id id + id id id + id id id 62

Derivations and Parse Trees Note that right-most and left-most derivations have the same parse tree In this case And this is not an accident The difference is the order in which branches are added Finally, there could be other parse trees that arise from neither left-most or right-most derivation But we are most interested in left-most and rightmost 63

Summary of Derivations We are not just interested in whether s is in L(G) We need a parse tree for s A derivation defines a parse tree But one parse tree may have many derivations Left-most and right-most derivations are important in parser implementation 64

Ambiguity Grammar + () id String id id + id 65

Ambiguity (Cont.) This string has two parse trees + * * id id + id id id id 66

Ambiguity (Cont.) A grammar is ambiguous if it has more than one parse tree for some string quivalently, there is more than one right-most or left-most derivation for some string Ambiguity is BAD Leaves meaning of some programs ill-defined ffectively, you re leaving it up to the compiler to pick which of the multiple interpretations of the program is to be used 67

Dealing with Ambiguity There are several ways to handle ambiguity Most direct method is to rewrite grammar unambiguously + ' ' id ʹ id () ʹ () ' nforces precedence of * over + Because it forces the + to be used higher in tree

Ambiguity in Arithmetic xpressions Recall the grammar + * ( ) int The string int * int + int has two parse trees: + * * int int + int int int int 69

Ambiguity: The Dangling lse Consider the grammar if then if then else OTHR This grammar is also ambiguous Recall parse subtree for if-then-else has 3 branches predicate then clause else clause Parse subtree for if-then has 2 branches predicate then clause 70

The Dangling lse: xample The expression if 1 then if 2 then 3 else 4 has two parse trees if if 1 if 4 1 if 2 3 2 3 4 if ( 1 ) then (if 2 then 3 ) else 4 if ( 1 ) then (if 2 then 3 else 4 )

The Dangling lse: xample The expression if 1 then if 2 then 3 else 4 has two parse trees if if 1 if 4 1 if 2 3 2 3 4 Typically we want the second form else goes with closest if 72

The Dangling lse: A Fix else matches the closest unmatched then We can describe this in the grammar matched if MIF /* all then are matched */ UIF /* some then is unmatched */ MIF if then MIF else MIF OTHR UIF if then if then MIF else UIF Describes the same set of strings unmatched if 73

The Dangling lse: xample Revisited The expression if 1 then if 2 then 3 else 4 if if 1 if 1 if 4 2 3 4 2 3 A valid parse tree (for a UIF) Not valid because the then expression is not a MIF 74

Ambiguity No general techniques for handling ambiguity Impossible to automatically convert an ambiguous grammar to an unambiguous one Used with care, ambiguity can simplify the grammar Sometimes allows more natural definitions We need disambiguation mechanisms 75

Precedence and Associativity Declarations Instead of rewriting the grammar Use the more natural (ambiguous) grammar Along with disambiguating declarations Most tools allow precedence and associativity declarations to disambiguate grammars 76

Precedence and Associativity Declarations Associativity: An operator is left-associative if an operand with this operator on both sides of it belongs to the left operator x. The + operator is typically left-associative, so in the expression 4 + 6 + 10, the 6 belongs to the first plus sign, i.e., (4+6) + 10 Similar definition for right-associative x. The assignment operator = is typically rightassociative, so a = b = c is interpreted as a = (b = c) 77

Precedence and Associativity Declarations What about 5 * 3 6? Not a case of associativity, since different operators on each side of 3 But, we d expect this to be (5 *3) 6, because of the usual precedence rules Tools like Bison and Java CUP allow users to specify associativity and precedence rules using, amazingly enough, precedence and associativity declarations 78

Associativity Declarations Consider the grammar + int Ambiguous: two parse trees of int + int + int + + + int int + int int int int Left associativity declaration: %left + 79

Precedence Declarations Consider the grammar + * int And the string int + int * int * + + int int * int int int Precedence declarations: %left + %left * int 80

Precedence Declarations Consider the grammar + * int And the string int + int * int * + + int int * int int int Precedence declarations: %left + %left * Note that higher precedence operator is lower in list int