Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

Similar documents
( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Introduction to Parsing. Lecture 5

Introduction to Parsing. Lecture 5

Introduction to Parsing Ambiguity and Syntax Errors

E E+E E E (E) id. id + id E E+E. id E + E id id + E id id + id. Overview. derivations and parse trees. Grammars and ambiguity. ambiguous grammars

Introduction to Parsing Ambiguity and Syntax Errors

Parsing: Derivations, Ambiguity, Precedence, Associativity. Lecture 8. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Grammars and ambiguity. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 8 1

Introduction to Parsing. Lecture 8

Intro To Parsing. Step By Step

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Context-Free Grammars

Ambiguity. Lecture 8. CS 536 Spring

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Context-Free Grammars

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

Ambiguity and Errors Syntax-Directed Translation

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

Syntax Analysis Check syntax and construct abstract syntax tree

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing. Lecture 11-12

Compiler Design Concepts. Syntax Analysis

Compilers and computer architecture From strings to ASTs (2): context free grammars

Compilers Course Lecture 4: Context Free Grammars

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

LR Parsing LALR Parser Generators

CS 314 Principles of Programming Languages

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Fall Compiler Principles Context-free Grammars Refresher. Roman Manevich Ben-Gurion University of the Negev

COP 3402 Systems Software Syntax Analysis (Parser)

LR Parsing LALR Parser Generators

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

A Simple Syntax-Directed Translator

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Principles of Programming Languages COMP251: Syntax and Grammars

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Syntax-Directed Translation. Lecture 14

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

Syntax Analysis Part I

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Context-Free Grammars

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

afewadminnotes CSC324 Formal Language Theory Dealing with Ambiguity: Precedence Example Office Hours: (in BA 4237) Monday 3 4pm Wednesdays 1 2pm

CMSC 330: Organization of Programming Languages

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CSE 3302 Programming Languages Lecture 2: Syntax

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Conflicts in LR Parsing and More LR Parsing Types

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Abstract Syntax Trees & Top-Down Parsing

Announcements. Written Assignment 1 out, due Friday, July 6th at 5PM.

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

CMSC 330: Organization of Programming Languages

Introduction to Parsing

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

CMSC 330: Organization of Programming Languages. Context Free Grammars

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Lecture 8: Deterministic Bottom-Up Parsing

syntax tree - * * * - * * * * * 2 1 * * 2 * (2 * 1) - (1 + 0)

Properties of Regular Expressions and Finite Automata

Error Handling Syntax-Directed Translation Recursive Descent Parsing

CMSC 330: Organization of Programming Languages

CSCE 314 Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

In One Slide. Outline. LR Parsing. Table Construction

CSCI312 Principles of Programming Languages!

Lecture 7: Deterministic Bottom-Up Parsing

EECS 6083 Intro to Parsing Context Free Grammars

Optimizing Finite Automata

Context Free Languages

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

CS 406/534 Compiler Construction Parsing Part I

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Context-Free Languages and Parse Trees

Defining syntax using CFGs

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Introduction to Lexing and Parsing

Part 5 Program Analysis Principles and Techniques

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Introduction to Syntax Analysis. The Second Phase of Front-End

Introduction to Syntax Analysis

Topic 5: Syntax Analysis III

MIDTERM EXAMINATION - CS130 - Spring 2003

Parsing II Top-down parsing. Comp 412

Lecture Outline. COOL operational semantics. Operational Semantics of Cool. Motivation. Lecture 13. Notation. The rules. Evaluation Rules So Far

Transcription:

Outline Regular languages revisited Introduction to Parsing Lecture 5 Parser overview Context-free grammars (CFG s) Derivations Prof. Aiken CS 143 Lecture 5 1 Ambiguity Prof. Aiken CS 143 Lecture 5 2 Languages and Automata Formal languages are very important in CS specially in programming languages Regular languages The weakest formal languages wely used Many applications Beyond Regular Languages Many languages are not regular Strings of balanced parentheses are not regular: {() i i i 0} We will also study context-free languages, tree languages Prof. Aiken CS 143 Lecture 5 3 Prof. Aiken CS 143 Lecture 5 4 1!

What Can Regular Languages xpress? The Functionality of the Parser Languages requiring counting modulo a fixed integer Intuition: A finite automaton that runs long enough must repeat states Input: sequence of tokens from lexer Output: parse tree of the program (But some parsers never produce a parse tree...) Finite automaton can t remember # of times it has visited a particular state Prof. Aiken CS 143 Lecture 5 5 Prof. Aiken CS 143 Lecture 5 6 xample Comparison with Lexical Analysis Cool x = y then 1 else 2 fi Parser input IF ID = ID THN INT LS INT FI Parser output IF-THN-LS = INT INT Phase Input Output Lexer Parser String of characters String of tokens String of tokens Parse tree ID ID Prof. Aiken CS 143 Lecture 5 7 Prof. Aiken CS 143 Lecture 5 8 2!

The Role of the Parser Context-Free Grammars Not all strings of tokens are programs...... parser must distinguish between val and inval strings of tokens We need A language for describing val strings of tokens A method for distinguishing val from inval strings of tokens Prof. Aiken CS 143 Lecture 5 9 Programming language constructs have recursive structure An XPR is XPR then XPR else XPR fi while XPR loop XPR pool Context-free grammars are a natural notation for this recursive structure Prof. Aiken CS 143 Lecture 5 10 CFGs (Cont.) A CFG consists of A set of terminals T A set of non-terminals N A start symbol S (a non-terminal) A set of productions Notational Conventions In these lecture notes Non-terminals are written upper-case Terminals are written lower-case The start symbol is the left-hand se of the first production X dy 1 Y 2 Y n where X c N and Y i ct 4 N 4 { } Prof. Aiken CS 143 Lecture 5 11 Prof. Aiken CS 143 Lecture 5 12 3!

xamples of CFGs A fragment of Cool: XPR XPR then XPR else XPR fi while XPR loop XPR pool xamples of CFGs (cont.) Simple arithmetic expressions: ( ) Prof. Aiken CS 143 Lecture 5 13 Prof. Aiken CS 143 Lecture 5 14 The Language of a CFG Read productions as rules: X dy 1 Y n means X can be replaced by Y 1 Y n Key Idea 1. Begin with a string consisting of the start symbol S 2. Replace any non-terminal X in the string by a the right-hand se of some production X dy 1 Y n 3. Repeat (2) until there are no non-terminals in the string Prof. Aiken CS 143 Lecture 5 15 Prof. Aiken CS 143 Lecture 5 16 4!

The Language of a CFG (Cont.) More formally, write The Language of a CFG (Cont.) Write X 1 X i-1 X i X i1... X n d X 1 X i-1 Y 1... Y m X i1... X n there is a production X 1 X n d Y 1... Y m X 1 X n d d d Y 1... Y m X i dy 1 Y m in 0 or more steps Prof. Aiken CS 143 Lecture 5 17 Prof. Aiken CS 143 Lecture 5 18 The Language of a CFG Let G be a context-free grammar with start symbol S. Then the language of G is: Terminals Terminals are so-called because there are no rules for replacing them {a 1 a n S da 1 a n and every a i is a terminal } Once generated, terminals are permanent Terminals ought to be tokens of the language Prof. Aiken CS 143 Lecture 5 19 Prof. Aiken CS 143 Lecture 5 20 5!

xamples L(G) is the language of CFG G Strings of balanced parentheses Two grammars: S ( S) S ε OR {() i i i 0} S ( S) ε Cool xample A fragment of COOL: XPR XPR then XPR else XPR fi while XPR loop XPR pool Prof. Aiken CS 143 Lecture 5 21 Prof. Aiken CS 143 Lecture 5 22 Cool xample (Cont.) Some elements of the language then else fi while loop pool while loop pool then else then else fi then else fi Prof. Aiken CS 143 Lecture 5 23 Arithmetic xample Simple arithmetic expressions: () Some elements of the language: () () () Prof. Aiken CS 143 Lecture 5 24 6!

Notes The ea of a CFG is a big step. But: Membership in a language is yes or no We also need a parse tree of the input Must handle errors gracefully More Notes Form of the grammar is important Many grammars generate the same language Tools are sensitive to the grammar Note: Tools for regular languages (e.g., flex) are sensitive to the form of the regular expression, but this is rarely a problem in practice Need an implementation of CFG s (e.g., bison) Prof. Aiken CS 143 Lecture 5 25 Prof. Aiken CS 143 Lecture 5 26 Derivations and Parse Trees A derivation is a sequence of productions S d d A derivation can be drawn as a tree Start symbol is the tree s root For a production X dy 1 Y n add children Y 1 Y n to node X Derivation xample Grammar String () Prof. Aiken CS 143 Lecture 5 27 Prof. Aiken CS 143 Lecture 5 28 7!

Derivation xample (Cont.) Derivation in Detail (1) Prof. Aiken CS 143 Lecture 5 29 Prof. Aiken CS 143 Lecture 5 30 Derivation in Detail (2) Derivation in Detail (3) Prof. Aiken CS 143 Lecture 5 31 Prof. Aiken CS 143 Lecture 5 32 8!

Derivation in Detail (4) Derivation in Detail (5) Prof. Aiken CS 143 Lecture 5 33 Prof. Aiken CS 143 Lecture 5 34 Derivation in Detail (6) Notes on Derivations A parse tree has Terminals at the leaves Non-terminals at the interior nodes An in-order traversal of the leaves is the original input The parse tree shows the association of operations, the input string does not Prof. Aiken CS 143 Lecture 5 35 Prof. Aiken CS 143 Lecture 5 36 9!

Left-most and Right-most Derivations Right-most Derivation in Detail (1) The example is a leftmost derivation At each step, replace the left-most non-terminal There is an equivalent notion of a right-most derivation Prof. Aiken CS 143 Lecture 5 37 Prof. Aiken CS 143 Lecture 5 38 Right-most Derivation in Detail (2) Right-most Derivation in Detail (3) Prof. Aiken CS 143 Lecture 5 39 Prof. Aiken CS 143 Lecture 5 40 10!

Right-most Derivation in Detail (4) Right-most Derivation in Detail (5) Prof. Aiken CS 143 Lecture 5 41 Prof. Aiken CS 143 Lecture 5 42 Right-most Derivation in Detail (6) Derivations and Parse Trees Note that right-most and left-most derivations have the same parse tree The dference is the order in which branches are added Prof. Aiken CS 143 Lecture 5 43 Prof. Aiken CS 143 Lecture 5 44 11!

Summary of Derivations We are not just interested in whether s c L(G) We need a parse tree for s Ambiguity Grammar String () A derivation defines a parse tree But one parse tree may have many derivations Left-most and right-most derivations are important in parser implementation Prof. Aiken CS 143 Lecture 5 45 Prof. Aiken CS 143 Lecture 5 46 Ambiguity (Cont.) Ambiguity (Cont.) This string has two parse trees A grammar is ambiguous it has more than one parse tree for some string quivalently, there is more than one right-most or left-most derivation for some string Ambiguity is BAD Leaves meaning of some programs ill-defined Prof. Aiken CS 143 Lecture 5 47 Prof. Aiken CS 143 Lecture 5 48 12!

Dealing with Ambiguity Ambiguity in Arithmetic xpressions There are several ways to handle ambiguity Most direct method is to rewrite grammar unambiguously ' ' ' () () nforces precedence of over ʹ Prof. Aiken CS 143 Lecture 5 49 ʹ Recall the grammar ( ) int The string int int int has two parse trees: int int int int int int Prof. Aiken CS 143 Lecture 5 50 Ambiguity: The Dangling lse Conser the grammar then then else OTHR The Dangling lse: xample The expression 1 then 2 then 3 else 4 has two parse trees This grammar is also ambiguous 1 4 1 2 3 2 3 4 Typically we want the second form Prof. Aiken CS 143 Lecture 5 51 Prof. Aiken CS 143 Lecture 5 52 13!

The Dangling lse: A Fix else matches the closest unmatched then We can describe this in the grammar MIF / all then are matched / UIF / some then is unmatched / MIF then MIF else MIF OTHR UIF then then MIF else UIF Describes the same set of strings Prof. Aiken CS 143 Lecture 5 53 The Dangling lse: xample Revisited The expression 1 then 2 then 3 else 4 1 2 3 4 A val parse tree (for a UIF) 1 2 3 4 Not val because the then expression is not a MIF Prof. Aiken CS 143 Lecture 5 54 Ambiguity No general techniques for handling ambiguity Impossible to convert automatically an ambiguous grammar to an unambiguous one Used with care, ambiguity can simply the grammar Sometimes allows more natural definitions We need disambiguation mechanisms Precedence and Associativity Declarations Instead of rewriting the grammar Use the more natural (ambiguous) grammar Along with disambiguating declarations Most tools allow precedence and associativity declarations to disambiguate grammars xamples Prof. Aiken CS 143 Lecture 5 55 Prof. Aiken CS 143 Lecture 5 56 14!

Associativity Declarations Conser the grammar int Ambiguous: two parse trees of int int int int int int int int int Left associativity declaration: %left Prof. Aiken CS 143 Lecture 5 57 Precedence Declarations Conser the grammar int And the string int int int int int int int int Precedence declarations: %left %left Prof. Aiken CS 143 Lecture 5 58 int 15!