Introduction to Parsing. Lecture 5. Professor Alex Aiken (modified by Professor Vijay Ganesh)

Outline
Regular languages revisited
Parser overview
Context-free grammars (CFGs)
Derivations
Ambiguity

Languages and Automata
Formal languages are very important in CS, especially in programming languages.
Regular languages:
  the weakest formal languages that are widely used
  many applications
We will also study context-free languages and tree languages.

Beyond Regular Languages
Many languages are not regular.
Strings of balanced parentheses are not regular: $\{\, (^i\, )^i \mid i \ge 0 \,\}$

What Can Regular Languages Express?
Languages requiring counting modulo a fixed integer.
Intuition: a finite automaton that runs long enough must repeat states.
A finite automaton can't remember the number of times it has visited a particular state.
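To make "counting modulo a fixed integer" concrete, here is a minimal sketch of a finite automaton whose only memory is a remainder. The function name `count_mod_accepts`, the counted symbol, and the modulus are assumptions chosen for illustration.

```python
def count_mod_accepts(s: str, symbol: str = "a", k: int = 3) -> bool:
    """Accept s iff the number of occurrences of `symbol` is divisible by k.

    The automaton has exactly k states (the remainders 0..k-1), so it
    cannot tell apart two counts that differ by a multiple of k.
    """
    state = 0  # current state = count seen so far, modulo k
    for ch in s:
        if ch == symbol:
            state = (state + 1) % k
    return state == 0  # state 0 is the only accepting state
```

Because the state set is fixed in advance, the same construction cannot be stretched to match an unbounded number of opening parentheses against closing ones.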

The Functionality of the Parser
Input: sequence of tokens from the lexer
Output: parse tree of the program
(But some parsers never produce a parse tree...)

Example
Cool source: if x = y then 1 else 2 fi
Parser input: IF ID = ID THEN INT ELSE INT FI
Parser output (parse tree):
      IF-THEN-ELSE
     /     |      \
    =     INT     INT
   / \
  ID  ID
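The parser input in this example comes from the lexer. The sketch below shows one way such a token stream could be produced for this tiny fragment only; the keyword set, the regular expression, and the name `tokenize` are assumptions for illustration, not the Cool lexer.

```python
import re

# Assumed keyword set for this fragment only (not the full Cool language).
KEYWORDS = {"if", "then", "else", "fi"}
# One token per match: an integer literal, a word, or the '=' operator.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(=))")

def tokenize(src: str) -> list[str]:
    """Turn a string of characters into a list of token names."""
    src = src.rstrip()
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, word, _eq = m.groups()
        if number is not None:
            tokens.append("INT")
        elif word is not None:
            tokens.append(word.upper() if word in KEYWORDS else "ID")
        else:
            tokens.append("=")
        pos = m.end()
    return tokens
```

Calling `tokenize("if x = y then 1 else 2 fi")` yields the token sequence shown as the parser input above.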

Comparison with Lexical Analysis
Phase    Input                  Output
Lexer    String of characters   String of tokens
Parser   String of tokens       Parse tree

The Role of the Parser
Not all strings of tokens are programs...
... the parser must distinguish between valid and invalid strings of tokens.
We need
  a language for describing valid strings of tokens
  a method for distinguishing valid from invalid strings of tokens

Context-Free Grammars
Programming language constructs have recursive structure.
An EXPR is
  if EXPR then EXPR else EXPR fi
  while EXPR loop EXPR pool
Context-free grammars are a natural notation for this recursive structure.

CFGs (Cont.)
A CFG consists of
  a set of terminals $T$
  a set of non-terminals $N$
  a start symbol $S$ (a non-terminal)
  a set of productions $X \to Y_1 Y_2 \cdots Y_n$, where $X \in N$ and $Y_i \in T \cup N \cup \{\varepsilon\}$
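The four components can be written down directly as a data structure. This is a minimal sketch under assumed names (`Grammar`, `is_well_formed`, `PARENS`); an empty right-hand side is used here to encode ε.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    terminals: frozenset
    nonterminals: frozenset
    start: str
    productions: tuple  # pairs (X, (Y1, ..., Yn)); () encodes epsilon

    def is_well_formed(self) -> bool:
        """Check the side conditions of the definition."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and self.terminals.isdisjoint(self.nonterminals)
            and all(
                lhs in self.nonterminals and all(y in symbols for y in rhs)
                for lhs, rhs in self.productions
            )
        )

# Illustrative instance: balanced parentheses, S -> ( S ) | epsilon.
PARENS = Grammar(
    terminals=frozenset({"(", ")"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("(", "S", ")")), ("S", ())),
)
```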

Notational Conventions
In these lecture notes
  non-terminals are written upper-case
  terminals are written lower-case
  the start symbol is the left-hand side of the first production

Examples of CFGs
A fragment of Cool:
EXPR → if EXPR then EXPR else EXPR fi
     | while EXPR loop EXPR pool
     | id

Examples of CFGs (Cont.)
Simple arithmetic expressions:
E → E + E
  | E * E
  | ( E )
  | id

The Language of a CFG
Read productions as rules: $X \to Y_1 \cdots Y_n$ means that $X$ can be replaced by $Y_1 \cdots Y_n$.

Key Idea
1. Begin with a string consisting of the start symbol $S$.
2. Replace any non-terminal $X$ in the string by the right-hand side of some production $X \to Y_1 \cdots Y_n$.
3. Repeat (2) until there are no non-terminals in the string.
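The three steps can be sketched as a small rewriting loop. The grammar used here (S → ( S ) | ε), the name `derive`, and the convention of passing the production choices as a list of indices are all assumptions for illustration; the loop always rewrites the first non-terminal it finds, although any non-terminal would do.

```python
# Assumed example grammar: S -> ( S ) | epsilon (empty list encodes epsilon).
PRODUCTIONS = {"S": [["(", "S", ")"], []]}

def derive(choices, start="S"):
    """Apply one production per entry of `choices`; return every intermediate string."""
    form = [start]                      # step 1: begin with the start symbol
    steps = [start]
    for choice in choices:              # step 3: repeat
        positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[0]                # step 2: pick a non-terminal (here: the first)
        form[i:i + 1] = PRODUCTIONS[form[i]][choice]
        steps.append(" ".join(form) or "ε")
    if any(sym in PRODUCTIONS for sym in form):
        raise ValueError("derivation is incomplete: non-terminals remain")
    return steps
```

For instance, `derive([0, 0, 1])` walks through S, ( S ), ( ( S ) ), ( ( ) ).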

The Language of a CFG (Cont.)
More formally, write
$X_1 \cdots X_i \cdots X_n \to X_1 \cdots X_{i-1}\, Y_1 \cdots Y_m\, X_{i+1} \cdots X_n$
if there is a production $X_i \to Y_1 \cdots Y_m$.

The Language of a CFG (Cont.)
Write
$X_1 \cdots X_n \to^{*} Y_1 \cdots Y_m$
if $X_1 \cdots X_n \to \cdots \to \cdots \to Y_1 \cdots Y_m$ in 0 or more steps.

The Language of a CFG
Let $G$ be a context-free grammar with start symbol $S$. Then the language of $G$ is:
$\{\, a_1 \cdots a_n \mid S \to^{*} a_1 \cdots a_n \text{ and every } a_i \text{ is a terminal} \,\}$
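For a small grammar, this definition can be explored by brute force. The sketch below enumerates every terminal string of at most `max_len` symbols by searching over strings reachable from the start symbol. It assumes the arithmetic grammar E → E + E | E * E | ( E ) | id and relies on that grammar having no ε productions, so strings never shrink and long ones can be pruned; the name `language_up_to` is hypothetical.

```python
from collections import deque

# Assumed grammar without epsilon productions.
GRAMMAR = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]]}

def language_up_to(max_len, start="E"):
    """All terminal strings with at most max_len symbols derivable from start."""
    seen = {(start,)}
    queue = deque(seen)
    result = set()
    while queue:
        form = queue.popleft()
        # Index of the left-most non-terminal, or None if the string is all terminals.
        idx = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if idx is None:
            result.add(" ".join(form))
            continue
        for rhs in GRAMMAR[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) <= max_len and new not in seen:  # prune: strings never shrink
                seen.add(new)
                queue.append(new)
    return result
```

Expanding only the left-most non-terminal loses nothing, since every string in the language has a left-most derivation.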

Terminals
Terminals are so called because there are no rules for replacing them.
Once generated, terminals are permanent.
Terminals ought to be tokens of the language.

Examples
$L(G)$ is the language of CFG $G$.
Strings of balanced parentheses: $\{\, (^i\, )^i \mid i \ge 0 \,\}$
Two grammars (the same productions in two notations):
S → ( S )
S → ε
OR
S → ( S ) | ε
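A membership test for this language can follow the grammar S → ( S ) | ε production by production. This is a minimal sketch with a hypothetical function name `is_in_language`; it operates on a plain string of characters rather than on tokens.

```python
def is_in_language(s: str) -> bool:
    """Decide whether s has the form '(' * i + ')' * i, following S -> ( S ) | epsilon."""

    def parse_s(pos):
        """Return the position just after an S starting at pos, or None on failure."""
        if pos < len(s) and s[pos] == "(":
            # Production S -> ( S )
            end = parse_s(pos + 1)
            if end is not None and end < len(s) and s[end] == ")":
                return end + 1
            return None
        # Production S -> epsilon: consume nothing.
        return pos

    return parse_s(0) == len(s)
```

The recursion depth plays the role of the counter that a finite automaton lacks.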

Cool Example
A fragment of COOL:
EXPR → if EXPR then EXPR else EXPR fi
     | while EXPR loop EXPR pool
     | id

Cool Example (Cont.)
Some elements of the language:
  id
  if id then id else id fi
  while id loop id pool
  if while id loop id pool then id else id fi
  if if id then id else id fi then id else id fi

Arithmetic Example
Simple arithmetic expressions:
E → E + E | E * E | ( E ) | id
Some elements of the language:
  id
  id + id
  ( id )
  id * id
  ( id ) * id
  id * ( id )

Notes
The idea of a CFG is a big step. But:
  membership in a language is "yes" or "no"; we also need a parse tree of the input
  we must handle errors gracefully
  we need an implementation of CFGs (e.g., bison)

More Notes
The form of the grammar is important:
  many grammars generate the same language
  tools are sensitive to the grammar
Note: tools for regular languages (e.g., flex) are sensitive to the form of the regular expression, but this is rarely a problem in practice.

Derivations and Parse Trees
A derivation is a sequence of productions $S \to \cdots \to \cdots \to \cdots$
A derivation can be drawn as a tree:
  the start symbol is the tree's root
  for a production $X \to Y_1 \cdots Y_n$, add children $Y_1 \cdots Y_n$ to node $X$

Derivation Example
Grammar: E → E + E | E * E | ( E ) | id
String: id * id + id

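Derivation Example (Cont.)
E → E + E
  → E * E + E
  → id * E + E
  → id * id + E
  → id * id + id

Parse tree:
        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id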

Derivation in Detail (1) to (6)
The parse tree grows by one production per step:
(1) E
(2) E → E + E
(3)   → E * E + E
(4)   → id * E + E
(5)   → id * id + E
(6)   → id * id + id

Notes on Derivations
A parse tree has
  terminals at the leaves
  non-terminals at the interior nodes
An in-order traversal of the leaves is the original input.
The parse tree shows the association of operations; the input string does not.

Left-most and Right-most Derivations
The example is a left-most derivation: at each step, replace the left-most non-terminal.
There is an equivalent notion of a right-most derivation:
E → E + E
  → E + id
  → E * E + id
  → E * id + id
  → id * id + id

Right-most Derivation in Detail (1) to (6)
The parse tree grows by one production per step:
(1) E
(2) E → E + E
(3)   → E + id
(4)   → E * E + id
(5)   → E * id + id
(6)   → id * id + id

Derivations and Parse Trees
Note that right-most and left-most derivations have the same parse tree.
The difference is the order in which branches are added.
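The relationship between one parse tree and its two derivations can be checked mechanically. In this sketch a tree node is a pair `(symbol, children)` and a leaf is a plain string; that encoding and the names `TREE`, `render`, and `derivation` are assumptions for illustration.

```python
# Parse tree of id * id + id under E -> E + E | E * E | ( E ) | id.
# Interior node: (non-terminal, [children]); leaf: terminal string.
TREE = ("E", [
    ("E", [("E", ["id"]), "*", ("E", ["id"])]),
    "+",
    ("E", ["id"]),
])

def render(form):
    """Print unexpanded subtrees by their root symbol, terminals as themselves."""
    return " ".join(n if isinstance(n, str) else n[0] for n in form)

def derivation(tree, leftmost=True):
    """Read a derivation off a parse tree by expanding one node per step."""
    form = [tree]
    steps = [render(form)]
    while True:
        pending = [i for i, n in enumerate(form) if not isinstance(n, str)]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        _symbol, children = form[i]
        form[i:i + 1] = children  # apply the production recorded at this node
        steps.append(render(form))
```

Both `derivation(TREE)` and `derivation(TREE, leftmost=False)` start at E and end at id * id + id; they differ only in the order in which the same nodes are expanded.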

Summary of Derivations
We are not just interested in whether $s \in L(G)$; we need a parse tree for $s$.
A derivation defines a parse tree, but one parse tree may have many derivations.
Left-most and right-most derivations are important in parser implementation.

Ambiguity
Grammar: E → E + E | E * E | ( E ) | id
String: id * id + id

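Ambiguity (Cont.)
This string has two parse trees.

Tree with + at the root, grouping (id * id) + id:
        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id

Tree with * at the root, grouping id * (id + id):
     E
   / | \
  E  *  E
  |   / | \
  id E  +  E
     |     |
     id    id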

Ambiguity (Cont.)
A grammar is ambiguous if it has more than one parse tree for some string.
Equivalently, there is more than one right-most or left-most derivation for some string.
Ambiguity is BAD: it leaves the meaning of some programs ill-defined.
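For a fixed string, ambiguity can be observed by counting parse trees. The sketch below does this for the grammar E → E + E | E * E | ( E ) | id by trying every way to split the token list at an operator; the name `count_parse_trees` is hypothetical, and the counting scheme is specific to this grammar rather than a general algorithm.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Number of parse trees for tokens under E -> E + E | E * E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of ways E derives tokens[i:j]."""
        n = 0
        if j - i == 1 and tokens[i] == "id":           # E -> id
            n += 1
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)                   # E -> ( E )
        for k in range(i + 1, j - 1):                  # E -> E + E and E -> E * E
            if tokens[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))
```

A result greater than 1 for some string witnesses that the grammar is ambiguous; a result of 1 for one string says nothing about other strings.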

Dealing with Ambiguity
There are several ways to handle ambiguity.
The most direct method is to rewrite the grammar unambiguously:
E  → E' + E | E'
E' → id * E' | id | ( E ) * E' | ( E )
This enforces precedence of * over +.
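One way to see the effect of an unambiguous grammar of the form E → E' + E | E' and E' → id * E' | id | ( E ) * E' | ( E ) is a hand-written recursive-descent parser with one function per non-terminal. This is a sketch under assumed names (`parse`, `parse_e`, `parse_e_prime`) that returns nested tuples as a stand-in for a parse tree.

```python
def parse(tokens):
    """Parse a token list; return a nested tuple (op, left, right) or 'id'."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1

    def parse_e():
        # E -> E' + E | E'
        left = parse_e_prime()
        if peek() == "+":
            expect("+")
            return ("+", left, parse_e())
        return left

    def parse_e_prime():
        # E' -> id * E' | id | ( E ) * E' | ( E )
        if peek() == "id":
            expect("id")
            atom = "id"
        else:
            expect("(")
            atom = parse_e()
            expect(")")
        if peek() == "*":
            expect("*")
            return ("*", atom, parse_e_prime())
        return atom

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree
```

Because + can only appear at the E level and * only at the E' level, `id * id + id` has exactly one result, with the product nested under the sum.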

Ambiguity in Arithmetic Expressions
Recall the grammar E → E + E | E * E | ( E ) | int
The string int * int + int has two parse trees:
  one with + at the root, grouping (int * int) + int
  one with * at the root, grouping int * (int + int)

Ambiguity: The Dangling Else
Consider the grammar
E → if E then E
  | if E then E else E
  | OTHER
This grammar is also ambiguous.

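The Dangling Else: Example
The expression if E1 then if E2 then E3 else E4 has two parse trees.

First form (else belongs to the outer if):
      if
    / |  \
  E1  if  E4
     /  \
    E2   E3

Second form (else belongs to the inner if):
      if
     /  \
   E1    if
       / | \
     E2  E3  E4

Typically we want the second form.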

The Dangling Else: A Fix
else matches the closest unmatched then.
We can describe this in the grammar:
E   → MIF    /* all then are matched */
    | UIF    /* some then is unmatched */
MIF → if E then MIF else MIF
    | OTHER
UIF → if E then E
    | if E then MIF else UIF
This describes the same set of strings.
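The rule "else matches the closest unmatched then" can also be obtained without the MIF/UIF rewrite, by a recursive-descent parser that greedily attaches an else to the innermost open if. Note that this swaps in a different technique from the grammar above; the function name `parse_if` and the tuple encoding of trees are assumptions for illustration.

```python
def parse_if(tokens):
    """Parse if/then/else expressions; any other token counts as OTHER."""
    pos = 0

    def parse_expr():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if tok in ("then", "else"):
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        pos += 1
        if tok != "if":
            return tok  # OTHER
        cond = parse_expr()
        if pos >= len(tokens) or tokens[pos] != "then":
            raise SyntaxError(f"expected 'then' at position {pos}")
        pos += 1
        then_branch = parse_expr()
        # Greedy choice: the innermost if still being parsed claims the else.
        if pos < len(tokens) and tokens[pos] == "else":
            pos += 1
            return ("if", cond, then_branch, parse_expr())
        return ("if", cond, then_branch)

    tree = parse_expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree
```

On `if E1 then if E2 then E3 else E4` the inner call sees the else first, so the result has the else attached to the inner if.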

The Dangling Else: Example Revisited
The expression if E1 then if E2 then E3 else E4

      if
     /  \
   E1    if
       / | \
     E2  E3  E4
A valid parse tree (for a UIF).

      if
    / |  \
  E1  if  E4
     /  \
    E2   E3
Not valid, because the then expression is not a MIF.

Ambiguity
There are no general techniques for handling ambiguity.
It is impossible to automatically convert an ambiguous grammar to an unambiguous one.
Used with care, ambiguity can simplify the grammar:
  it sometimes allows more natural definitions
  we need disambiguation mechanisms

Precedence and Associativity Declarations
Instead of rewriting the grammar
  use the more natural (ambiguous) grammar
  along with disambiguating declarations
Most tools allow precedence and associativity declarations to disambiguate grammars.
Examples follow.

Associativity Declarations
Consider the grammar E → E + E | int
Ambiguous: two parse trees of int + int + int
  one grouping (int + int) + int
  one grouping int + (int + int)
Left associativity declaration: %left +
With this declaration the first grouping is chosen.

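Precedence Declarations
Consider the grammar E → E + E | E * E | int
and the string int + int * int
  one parse tree groups (int + int) * int
  the other groups int + (int * int)
Precedence declarations:
%left +
%left *
In bison, operators declared later have higher precedence, so * binds tighter than + and the second grouping is chosen.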
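The effect of such declarations can be imitated in a hand-written parser with precedence climbing. This is not how a parser generator implements the declarations (bison uses them to resolve conflicts in its tables); it is a different technique that produces the same groupings. The table `PRECEDENCE` and the function names are assumptions for illustration.

```python
# Mirrors "%left +" followed by "%left *": higher number binds tighter,
# and both operators are treated as left-associative.
PRECEDENCE = {"+": 1, "*": 2}

def parse_expr(tokens, pos=0, min_prec=1):
    """Parse tokens[pos:] for E -> E + E | E * E | int; return (tree, next_pos)."""
    if pos >= len(tokens) or tokens[pos] != "int":
        raise SyntaxError(f"expected 'int' at position {pos}")
    left = "int"
    pos += 1
    while (pos < len(tokens)
           and tokens[pos] in PRECEDENCE
           and PRECEDENCE[tokens[pos]] >= min_prec):
        op = tokens[pos]
        # Left associativity: the right operand may only contain tighter operators.
        right, pos = parse_expr(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos

def parse(tokens):
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree
```

Using `PRECEDENCE[op]` instead of `PRECEDENCE[op] + 1` in the recursive call would make the operators right-associative instead.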