( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Similar documents
Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Introduction to Parsing. Lecture 5

Introduction to Parsing. Lecture 5

E E+E E E (E) id. id + id E E+E. id E + E id id + E id id + id. Overview. derivations and parse trees. Grammars and ambiguity. ambiguous grammars

Introduction to Parsing Ambiguity and Syntax Errors

Parsing: Derivations, Ambiguity, Precedence, Associativity. Lecture 8. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Introduction to Parsing Ambiguity and Syntax Errors

Grammars and ambiguity. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 8 1

Intro To Parsing. Step By Step

Introduction to Parsing. Lecture 8

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Context-Free Grammars

Ambiguity. Lecture 8. CS 536 Spring

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

Context-Free Grammars

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

Ambiguity and Errors Syntax-Directed Translation

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Syntax Analysis Check syntax and construct abstract syntax tree

Bottom-Up Parsing. Lecture 11-12

Compilers and computer architecture From strings to ASTs (2): context free grammars

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Bottom-Up Parsing. Lecture 11-12

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Compiler Design Concepts. Syntax Analysis

Compilers Course Lecture 4: Context Free Grammars

LR Parsing LALR Parser Generators

COP 3402 Systems Software Syntax Analysis (Parser)

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Principles of Programming Languages COMP251: Syntax and Grammars

CS 314 Principles of Programming Languages

Syntax-Directed Translation. Lecture 14

LR Parsing LALR Parser Generators

Fall Compiler Principles Context-free Grammars Refresher. Roman Manevich Ben-Gurion University of the Negev

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

A Simple Syntax-Directed Translator

afewadminnotes CSC324 Formal Language Theory Dealing with Ambiguity: Precedence Example Office Hours: (in BA 4237) Monday 3 4pm Wednesdays 1 2pm

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Syntax Analysis Part I

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Introduction to Parsing

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Context-Free Grammars

CMSC 330: Organization of Programming Languages

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

Lecture 8: Deterministic Bottom-Up Parsing

syntax tree - * * * - * * * * * 2 1 * * 2 * (2 * 1) - (1 + 0)

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Properties of Regular Expressions and Finite Automata

CSCE 314 Programming Languages

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CSE 3302 Programming Languages Lecture 2: Syntax

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Lecture 7: Deterministic Bottom-Up Parsing

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

EECS 6083 Intro to Parsing Context Free Grammars

Optimizing Finite Automata

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Announcements. Written Assignment 1 out, due Friday, July 6th at 5PM.

CMSC 330: Organization of Programming Languages

Abstract Syntax Trees & Top-Down Parsing

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

CMSC 330: Organization of Programming Languages. Context Free Grammars

Conflicts in LR Parsing and More LR Parsing Types

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Part 5 Program Analysis Principles and Techniques

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Error Handling Syntax-Directed Translation Recursive Descent Parsing

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

In One Slide. Outline. LR Parsing. Table Construction

MIDTERM EXAMINATION - CS130 - Spring 2003

Parsing II Top-down parsing. Comp 412

Lecture Outline. COOL operational semantics. Operational Semantics of Cool. Motivation. Lecture 13. Notation. The rules. Evaluation Rules So Far

CSCI312 Principles of Programming Languages!

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Context-Free Grammar (CFG)

CS 406/534 Compiler Construction Parsing Part I

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

CS153: Compilers Lecture 4: Recursive Parsing

Context-Free Languages and Parse Trees

Introduction to Lexing and Parsing

A simple syntax-directed

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

CSCI312 Principles of Programming Languages!

Transcription:

Outline Regular languages revisited Introduction to Parsing Lecture 5 Parser overview Context-free grammars (CFG s) Derivations Prof. Aiken CS 143 Lecture 5 1 Ambiguity Prof. Aiken CS 143 Lecture 5 2 Languages and Automata Formal languages are very important in CS specially in programming languages Regular languages The weakest formal languages wely used Many applications Beyond Regular Languages Many languages are not regular Strings of balanced parentheses are not regular: i i ( ) i 0 We will also study context-free languages, tree languages Prof. Aiken CS 143 Lecture 5 3 Prof. Aiken CS 143 Lecture 5 4 What Can Regular Languages xpress? Languages requiring counting modulo a fixed integer Intuition: A finite automaton that runs long enough must repeat states The Functionality of the Parser Input: sequence of tokens from lexer Output: parse tree of the program (But some parsers never produce a parse tree...) Finite automaton can t remember # of times it has visited a particular state Prof. Aiken CS 143 Lecture 5 5 Prof. Aiken CS 143 Lecture 5 6 1

xample Cool x = y then 1 else 2 fi Parser input IF ID = ID THN INT LS INT FI Parser output IF-THN-LS = INT INT Comparison with Lexical Analysis Phase Input Output Lexer Parser String of characters String of tokens String of tokens Parse tree ID ID Prof. Aiken CS 143 Lecture 5 7 Prof. Aiken CS 143 Lecture 5 8 The Role of the Parser Context-Free Grammars Not all strings of tokens are programs...... Parser must distinguish between val and inval strings of tokens We need A language for describing val strings of tokens A method for distinguishing val from inval strings of tokens Prof. Aiken CS 143 Lecture 5 9 Programming language constructs have recursive structure An XPR is XPR then XPR else XPR fi while XPR loop XPR pool Context-free grammars are a natural notation for this recursive structure Prof. Aiken CS 143 Lecture 5 10 CFGs (Cont.) A CFG consists of A set of terminals T A set of non-terminals N A start symbol S (a non-terminal) A set of productions Notational Conventions In these lecture notes Non-terminals are written upper-case Terminals are written lower-case The start symbol is the left-hand se of the first production X YY Y 1 2 n where X N and Y T N i Prof. Aiken CS 143 Lecture 5 11 Prof. Aiken CS 143 Lecture 5 12 2

xamples of CFGs A fragment of Cool: xamples of CFGs (cont.) Simple arithmetic expressions: XPR XPR then XPR else XPR fi while XPR loop XPR pool Prof. Aiken CS 143 Lecture 5 13 Prof. Aiken CS 143 Lecture 5 14 The Language of a CFG Read productions as rules: X Y Y 1 n Means X can be replaced by Y1 Yn Key Idea 1. Begin with a string consisting of the start symbol S 2. Replace any non-terminal X in the string by a the right-hand se of some production X Y Y 1 n 3. Repeat (2) until there are no non-terminals in the string Prof. Aiken CS 143 Lecture 5 15 Prof. Aiken CS 143 Lecture 5 16 The Language of a CFG (Cont.) More formally, write X X X X X Y Y X X 1 i n 1 i1 1 m i1 n there is a production X Y Y i 1 m The Language of a CFG (Cont.) Write in 0 or more steps 1 n 1 X X Y Y X X Y Y 1 n 1 m m Prof. Aiken CS 143 Lecture 5 17 Prof. Aiken CS 143 Lecture 5 18 3

The Language of a CFG Let G be a context-free grammar with start symbol S. Then the language of G is: a a S a a and every a is a terminal 1 n 1 n i Terminals Terminals are so-called because there are no rules for replacing them Once generated, terminals are permanent Terminals ought to be tokens of the language Prof. Aiken CS 143 Lecture 5 19 Prof. Aiken CS 143 Lecture 5 20 xamples L(G) is the language of CFG G Strings of balanced parentheses Two grammars: S ( S) S OR i i ( ) i 0 S ( S) Cool xample A fragment of COOL: XPR XPR then XPR else XPR fi while XPR loop XPR pool Prof. Aiken CS 143 Lecture 5 21 Prof. Aiken CS 143 Lecture 5 22 Cool xample (Cont.) Some elements of the language then else fi while loop pool while loop pool then else then else fi then else fi Prof. Aiken CS 143 Lecture 5 23 Arithmetic xample Simple arithmetic expressions: () Some elements of the language: () () () Prof. Aiken CS 143 Lecture 5 24 4

Notes The ea of a CFG is a big step. But: Membership in a language is yes or no ; also need parse tree of the input Must handle errors gracefully More Notes Form of the grammar is important Many grammars generate the same language Tools are sensitive to the grammar Note: Tools for regular languages (e.g., flex) are sensitive to the form of the regular expression, but this is rarely a problem in practice Need an implementation of CFG s (e.g., bison) Prof. Aiken CS 143 Lecture 5 25 Prof. Aiken CS 143 Lecture 5 26 Derivations and Parse Trees A derivation is a sequence of productions S Derivation xample Grammar () A derivation can be drawn as a tree Start symbol is the tree s root For a production X Y add children 1Yn to node X Y Y 1 n String Prof. Aiken CS 143 Lecture 5 27 Prof. Aiken CS 143 Lecture 5 28 Derivation xample (Cont.) Derivation in Detail (1) Prof. Aiken CS 143 Lecture 5 29 Prof. Aiken CS 143 Lecture 5 30 5

Derivation in Detail (2) Derivation in Detail (3) Prof. Aiken CS 143 Lecture 5 31 Prof. Aiken CS 143 Lecture 5 32 Derivation in Detail (4) Derivation in Detail (5) Prof. Aiken CS 143 Lecture 5 33 Prof. Aiken CS 143 Lecture 5 34 Derivation in Detail (6) Notes on Derivations A parse tree has Terminals at the leaves Non-terminals at the interior nodes An in-order traversal of the leaves is the original input The parse tree shows the association of operations, the input string does not Prof. Aiken CS 143 Lecture 5 35 Prof. Aiken CS 143 Lecture 5 36 6

Left-most and Right-most Derivations Right-most Derivation in Detail (1) The example is a leftmost derivation At each step, replace the left-most non-terminal There is an equivalent notion of a right-most derivation Prof. Aiken CS 143 Lecture 5 37 Prof. Aiken CS 143 Lecture 5 38 Right-most Derivation in Detail (2) Right-most Derivation in Detail (3) Prof. Aiken CS 143 Lecture 5 39 Prof. Aiken CS 143 Lecture 5 40 Right-most Derivation in Detail (4) Right-most Derivation in Detail (5) Prof. Aiken CS 143 Lecture 5 41 Prof. Aiken CS 143 Lecture 5 42 7

Right-most Derivation in Detail (6) Derivations and Parse Trees Note that right-most and left-most derivations have the same parse tree The dference is the order in which branches are added Prof. Aiken CS 143 Lecture 5 43 Prof. Aiken CS 143 Lecture 5 44 Summary of Derivations We are not just interested in whether s e L(G) We need a parse tree for s Ambiguity Grammar String () A derivation defines a parse tree But one parse tree may have many derivations Left-most and right-most derivations are important in parser implementation Prof. Aiken CS 143 Lecture 5 45 Prof. Aiken CS 143 Lecture 5 46 Ambiguity (Cont.) Ambiguity (Cont.) This string has two parse trees A grammar is ambiguous it has more than one parse tree for some string quivalently, there is more than one right-most or left-most derivation for some string Ambiguity is BAD Leaves meaning of some programs ill-defined Prof. Aiken CS 143 Lecture 5 47 Prof. Aiken CS 143 Lecture 5 48 8

Dealing with Ambiguity Ambiguity in Arithmetic xpressions There are several ways to handle ambiguity Most direct method is to rewrite grammar unambiguously ' ' ' () () nforces precedence of over Prof. Aiken CS 143 Lecture 5 49 Recall the grammar ( ) int The string int int int has two parse trees: int int int int int int Prof. Aiken CS 143 Lecture 5 50 Ambiguity: The Dangling lse Conser the grammar then then else OTHR The Dangling lse: xample The expression 1 then 2 then 3 else 4 has two parse trees This grammar is also ambiguous 1 4 1 2 3 2 3 4 Typically we want the second form Prof. Aiken CS 143 Lecture 5 51 Prof. Aiken CS 143 Lecture 5 52 The Dangling lse: A Fix else matches the closest unmatched then We can describe this in the grammar MIF / all then are matched / UIF / some then is unmatched / MIF then MIF else MIF OTHR UIF then then MIF else UIF Describes the same set of strings Prof. Aiken CS 143 Lecture 5 53 The Dangling lse: xample Revisited The expression 1 then 2 then 3 else 4 1 2 3 4 A val parse tree (for a UIF) 1 2 3 4 Not val because the then expression is not a MIF Prof. Aiken CS 143 Lecture 5 54 9

Ambiguity No general techniques for handling ambiguity Impossible to convert automatically an ambiguous grammar to an unambiguous one Used with care, ambiguity can simply the grammar Sometimes allows more natural definitions We need disambiguation mechanisms Prof. Aiken CS 143 Lecture 5 55 Precedence and Associativity Declarations Instead of rewriting the grammar Use the more natural (ambiguous) grammar Along with disambiguating declarations Most tools allow precedence and associativity declarations to disambiguate grammars xamples Prof. Aiken CS 143 Lecture 5 56 Associativity Declarations Precedence Declarations Conser the grammar int Ambiguous: two parse trees of int int int int int int int int int Left associativity declaration: %left Prof. Aiken CS 143 Lecture 5 57 Conser the grammar int And the string int int int int int int int int Precedence declarations: %left %left Prof. Aiken CS 143 Lecture 5 58 int 10