Introduction to Parsing Ambiguity and Syntax Errors

Outline
- Regular languages revisited
- Parser overview
- Context-free grammars (CFGs)
- Derivations
- Ambiguity
- Syntax errors

Languages and Automata
- Formal languages are very important in CS, especially in programming languages
- Regular languages
  - The weakest formal languages widely used
  - Many applications
- We will also study context-free languages

Limitations of Regular Languages
- Intuition: a finite automaton that runs long enough must repeat states
- A finite automaton cannot remember the number of times it has visited a particular state, because a finite automaton has finite memory
  - Only enough to store which state it is in
  - Cannot count, except up to a finite limit
- Many languages are not regular
  - E.g., the language of balanced parentheses is not regular: $\{ (^i \, )^i \mid i \ge 0 \}$
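
To make the counting argument concrete, here is a minimal Python sketch of a recognizer for strings of the form i opening parentheses followed by i closing parentheses. The function name `is_balanced_block` is hypothetical. The point is that it relies on an integer counter that can grow without bound, which is exactly the memory a finite automaton does not have.

```python
def is_balanced_block(s: str) -> bool:
    """Return True if s is '(' repeated i times followed by ')' repeated i times."""
    count = 0  # unbounded counter: the resource a finite automaton lacks
    i = 0
    # Count the leading run of '('.
    while i < len(s) and s[i] == "(":
        count += 1
        i += 1
    # Match each ')' in the following run against one counted '('.
    while i < len(s) and s[i] == ")":
        count -= 1
        i += 1
    # Accept only if the whole string was consumed and the counts agree.
    return i == len(s) and count == 0
```

For any fixed bound on `count` the same check could be done by a finite automaton; it is the absence of a bound that takes the language out of the regular class.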

The Functionality of the Parser
- Input: sequence of tokens from the lexer
- Output: parse tree of the program

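Example
- If-then-else statement: if (x == y) then z = 1; else z = 2;
- Parser input: IF (ID == ID) THEN ID = INT; ELSE ID = INT;
- Possible parser output: a tree with root IF-THEN-ELSE and three children
  - == with children ID, ID
  - = with children ID, INT
  - = with children ID, INT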

Comparison with Lexical Analysis
Phase | Input | Output
Lexer | Sequence of characters | Sequence of tokens
Parser | Sequence of tokens | Parse tree

The Role of the Parser
- Not all sequences of tokens are programs
- The parser must distinguish between valid and invalid sequences of tokens
- We need
  - A language for describing valid sequences of tokens
  - A method for distinguishing valid from invalid sequences of tokens

Context-Free Grammars
- Many programming language constructs have a recursive structure
- E.g., a STMT is of the form
  - if COND then STMT else STMT, or
  - while COND do STMT, or
  - ...
- Context-free grammars are a natural notation for this recursive structure

CFGs (Cont.)
- A CFG consists of
  - A set of terminals $T$
  - A set of non-terminals $N$
  - A start symbol $S$ (a non-terminal)
  - A set of productions
- Assuming $X \in N$, the productions are of the form
  - $X \to \varepsilon$, or
  - $X \to Y_1 Y_2 \ldots Y_n$ where $Y_i \in N \cup T$
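
The four components of a CFG map directly onto a small data structure. The following is a sketch only: the class name `Grammar`, its field names, and the method `is_well_formed` are hypothetical, and the requirement that terminals and non-terminals be disjoint is the usual convention rather than something stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    start: str                # S
    productions: tuple        # pairs (X, (Y1, ..., Yn)); () encodes epsilon

    def is_well_formed(self) -> bool:
        """Check the side conditions from the definition of a CFG."""
        symbols = self.terminals | self.nonterminals
        return (
            # Assumed convention: T and N do not share symbols.
            self.terminals.isdisjoint(self.nonterminals)
            # The start symbol is a non-terminal.
            and self.start in self.nonterminals
            # Every production has X in N and every Yi in N union T.
            and all(
                lhs in self.nonterminals and all(y in symbols for y in rhs)
                for lhs, rhs in self.productions
            )
        )


# Balanced parentheses: S -> ( S ) | epsilon
PARENS = Grammar(
    terminals=frozenset({"(", ")"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("(", "S", ")")), ("S", ())),
)
```

Here `PARENS.is_well_formed()` holds, while a grammar whose start symbol is missing from the non-terminal set would be rejected.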

Notational Conventions
- In these lecture notes
  - Non-terminals are written upper-case
  - Terminals are written lower-case
  - The start symbol is the left-hand side of the first production

Examples of CFGs
A fragment of our example language (simplified):
STMT → if COND then STMT else STMT
     | while COND do STMT
     | id = int

Examples of CFGs (cont.)
Grammar for simple arithmetic expressions:
E → E * E
  | E + E
  | ( E )
  | id

The Language of a CFG
Read productions as replacement rules:
- $X \to Y_1 \ldots Y_n$ means $X$ can be replaced by $Y_1 \ldots Y_n$ (in this order)
- $X \to \varepsilon$ means $X$ can be erased (replaced with the empty string)

Key Idea
(1) Begin with a string consisting of the start symbol $S$
(2) Replace any non-terminal $X$ in the string by a right-hand side of some production $X \to Y_1 \ldots Y_n$
(3) Repeat (2) until there are no non-terminals in the string

The Language of a CFG (Cont.)
More formally, we write
$X_1 \ldots X_i \ldots X_n \to X_1 \ldots X_{i-1} \, Y_1 \ldots Y_m \, X_{i+1} \ldots X_n$
if there is a production $X_i \to Y_1 \ldots Y_m$

The Language of a CFG (Cont.)
Write
$X_1 \ldots X_n \to^{*} Y_1 \ldots Y_m$
if
$X_1 \ldots X_n \to \cdots \to Y_1 \ldots Y_m$
in 0 or more steps

The Language of a CFG
Let $G$ be a context-free grammar with start symbol $S$. Then the language of $G$ is:
$\{ a_1 \ldots a_n \mid S \to^{*} a_1 \ldots a_n \text{ and every } a_i \text{ is a terminal} \}$
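
The definition can be turned into a small enumeration procedure: start from the start symbol, repeatedly replace the left-most non-terminal by each possible right-hand side, and collect the forms that contain only terminals. The sketch below is illustrative only. The function name `language_up_to` and the dictionary encoding of productions are assumptions, and the search is only guaranteed to terminate for grammars in which repeated expansion eventually adds terminals (it would loop on something like S → S S | ε).

```python
from collections import deque


def language_up_to(productions, start, max_len):
    """Enumerate the terminal strings with at most max_len tokens derivable from start.

    productions maps each non-terminal to a list of right-hand sides, each a
    tuple of symbols; any symbol that is not a key of the dict is a terminal.
    """
    seen = {(start,)}
    queue = deque([(start,)])
    result = set()
    while queue:
        form = queue.popleft()
        # Position of the left-most non-terminal in this sentential form, if any.
        idx = next((i for i, sym in enumerate(form) if sym in productions), None)
        if idx is None:
            result.add(" ".join(form))  # only terminals left: a string of L(G)
            continue
        for rhs in productions[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            # Terminals are permanent, so a form with too many can be pruned.
            terminal_count = sum(1 for sym in new_form if sym not in productions)
            if terminal_count <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return result


# Balanced parentheses: S -> ( S ) | epsilon
BALANCED = {"S": [("(", "S", ")"), ()]}
```

With this encoding, `language_up_to(BALANCED, "S", 4)` yields the empty string, `( )`, and `( ( ) )`. Restricting the search to left-most replacement loses nothing, because every derivable terminal string also has a left-most derivation.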

Terminals
- Terminals are called so because there are no rules for replacing them
- Once generated, terminals are permanent
- Terminals ought to be tokens of the language

Examples
- $L(G)$ is the language of the CFG $G$
- Strings of balanced parentheses: $\{ (^i \, )^i \mid i \ge 0 \}$
- Two grammars:
  S → ( S )
  S → ε
  or
  S → ( S ) | ε

Example
A fragment of our example language (simplified):
STMT → if COND then STMT
     | if COND then STMT else STMT
     | while COND do STMT
     | id = int
COND → (id == id)
     | (id != id)

Example (Cont.)
Some elements of the language:
- id = int
- if (id == id) then id = int else id = int
- while (id != id) do id = int
- while (id == id) do while (id != id) do id = int
- if (id != id) then if (id == id) then id = int else id = int

Arithmetic Example
Simple arithmetic expressions:
E → E + E | E * E | ( E ) | id
Some elements of the language:
- id
- id + id
- (id)
- id * id
- (id) * id
- id * (id)

Notes
The idea of a CFG is a big step. But:
- Membership in a language is just "yes" or "no"; we also need the parse tree of the input
- Must handle errors gracefully
- Need an implementation of CFGs (e.g., yacc)

More Notes
- The form of the grammar is important
  - Many grammars generate the same language
  - Parsing tools are sensitive to the grammar
- Note: tools for regular languages (e.g., lex/ml-lex) are also sensitive to the form of the regular expression, but this is rarely a problem in practice

Derivations and Parse Trees
- A derivation is a sequence of productions $S \to \cdots \to \cdots \to \cdots$
- A derivation can be drawn as a tree
  - The start symbol is the tree's root
  - For a production $X \to Y_1 \ldots Y_n$, add children $Y_1 \ldots Y_n$ to node $X$

Derivation Example
- Grammar: E → E + E | E * E | ( E ) | id
- String: id * id + id

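Derivation Example (Cont.)
Derivation:
E
→ E + E
→ E * E + E
→ id * E + E
→ id * id + E
→ id * id + id
Parse tree: the root E has children E, +, E; the left child E has children E, *, E, each deriving id; the right child E derives id.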

Derivation in Detail
Each step extends the derivation by one production and adds the corresponding children to the parse tree:
(1) E
(2) → E + E
(3) → E * E + E
(4) → id * E + E
(5) → id * id + E
(6) → id * id + id

Notes on Derivations
- A parse tree has
  - Terminals at the leaves
  - Non-terminals at the interior nodes
- An in-order traversal of the leaves is the original input
- The parse tree shows the association of operations; the input string does not

Left-most and Right-most Derivations
- What was shown before was a left-most derivation
  - At each step, we replaced the left-most non-terminal
- There is an equivalent notion of a right-most derivation:
E
→ E + E
→ E + id
→ E * E + id
→ E * id + id
→ id * id + id

Right-most Derivation in Detail
Each step replaces the right-most non-terminal and adds the corresponding children to the parse tree:
(1) E
(2) → E + E
(3) → E + id
(4) → E * E + id
(5) → E * id + id
(6) → id * id + id

Derivations and Parse Trees
- Note that right-most and left-most derivations have the same parse tree
- The difference is just in the order in which branches are added
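
That one parse tree encodes both derivations can be checked mechanically. In the sketch below, a parse tree is either a terminal string or a pair (non-terminal, list of children); this encoding and the names `render` and `derivation` are assumptions made for illustration. The function reads off the left-most or right-most derivation by repeatedly expanding the left-most or right-most interior node on the frontier.

```python
def render(frontier):
    """Show a frontier of tree nodes as a sentential form."""
    return " ".join(n[0] if isinstance(n, tuple) else n for n in frontier)


def derivation(tree, leftmost=True):
    """Return the sentential forms of the derivation encoded by a parse tree."""
    frontier = [tree]
    steps = [render(frontier)]
    while True:
        # Interior nodes (tuples) still waiting to be expanded.
        positions = [i for i, node in enumerate(frontier) if isinstance(node, tuple)]
        if not positions:
            return steps
        i = positions[0] if leftmost else positions[-1]
        # Apply the production at node i: replace the node by its children.
        frontier[i:i + 1] = frontier[i][1]
        steps.append(render(frontier))


# Parse tree of id * id + id with + at the root.
TREE = ("E", [("E", [("E", ["id"]), "*", ("E", ["id"])]), "+", ("E", ["id"])])
```

Calling `derivation(TREE)` gives E, E + E, E * E + E, id * E + E, id * id + E, id * id + id, while `derivation(TREE, leftmost=False)` gives E, E + E, E + id, E * E + id, E * id + id, id * id + id. Both come from the same tree; only the expansion order differs.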

Summary of Derivations
- We are not just interested in whether $s \in L(G)$
  - We need a parse tree for $s$
- A derivation defines a parse tree
  - But one parse tree may have many derivations
- Left-most and right-most derivations are important in parser implementation

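Ambiguity
- Grammar: E → E + E | E * E | ( E ) | int
- The string int * int + int has two parse trees
  - One with + at the root, grouping as (int * int) + int
  - One with * at the root, grouping as int * (int + int)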

Ambiguity (Cont.)
- A grammar is ambiguous if it has more than one parse tree for some string
  - Equivalently, there is more than one right-most or left-most derivation for some string
- Ambiguity is bad
  - Leaves the meaning of some programs ill-defined
- Ambiguity is common in programming languages
  - Arithmetic expressions
  - IF-THEN-ELSE
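
For a small grammar, ambiguity of a particular string can be checked by counting its parse trees. The sketch below does this for the grammar E → E + E | E * E | ( E ) | int by trying every way of splitting a token span; the function name `count_parse_trees` is hypothetical, and the approach is a brute-force illustration rather than a general ambiguity test (ambiguity of arbitrary CFGs is undecidable).

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | ( E ) | int."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving tokens[i:j] from E."""
        n = 0
        if j - i == 1 and tokens[i] == "int":
            n += 1                                  # E -> int
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)                # E -> ( E )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)  # E -> E op E, split at k
        return n

    return count(0, len(tokens))
```

For `"int * int + int".split()` the count is 2, matching the two trees described above, while `"( int + int ) * int".split()` has exactly one tree because the parentheses force the grouping.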

Dealing with Ambiguity
- There are several ways to handle ambiguity
- The most direct method is to rewrite the grammar unambiguously:
E → T + E | T
T → int * T | int | ( E )
- This grammar enforces precedence of * over +
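
A grammar in the stratified form E → T + E | T, T → int * T | int | ( E ) can be parsed by a straightforward recursive-descent parser with one function per non-terminal. The sketch below is one possible hand-written version, not the output of any tool; the function name `parse_expression` and the tuple encoding of the result are assumptions.

```python
def parse_expression(tokens):
    """Parse tokens with  E -> T + E | T  and  T -> int * T | int | ( E )."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1

    def parse_e():
        left = parse_t()
        if peek() == "+":                 # E -> T + E
            expect("+")
            return ("+", left, parse_e())
        return left                       # E -> T

    def parse_t():
        if peek() == "(":                 # T -> ( E )
            expect("(")
            inner = parse_e()
            expect(")")
            return inner
        expect("int")
        if peek() == "*":                 # T -> int * T
            expect("*")
            return ("*", "int", parse_t())
        return "int"                      # T -> int

    tree = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree
```

On `"int * int + int".split()` this returns `("+", ("*", "int", "int"), "int")`: the multiplication sits below the addition because * can only be introduced inside a T. Note that the grammar as written is right-recursive, so a chain of + operators groups to the right.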

Ambiguity: The Dangling Else
- Consider the following grammar:
S → if C then S
  | if C then S else S
  | OTHER
- This grammar is also ambiguous

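The Dangling Else: Example
- The expression if C1 then if C2 then S3 else S4 has two parse trees
  - First form: the else belongs to the outer if, i.e. if C1 then (if C2 then S3) else S4
  - Second form: the else belongs to the inner if, i.e. if C1 then (if C2 then S3 else S4)
- Typically we want the second form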

The Dangling Else: A Fix
- else should match the closest unmatched then
- We can describe this in the grammar:
S → MIF   /* all then are matched */
  | UIF   /* some then are unmatched */
MIF → if C then MIF else MIF
    | OTHER
UIF → if C then S
    | if C then MIF else UIF
- Describes the same set of strings
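
In a hand-written recursive-descent parser, the rule that else matches the closest unmatched then falls out of consuming an else greedily as soon as the inner statement has been parsed. The following sketch illustrates that behavior; it is not a direct transcription of the MIF/UIF grammar. The name `parse_statement`, the tuple encoding, and the simplification that any single non-keyword token stands for a condition or for OTHER are all assumptions.

```python
def parse_statement(tokens):
    """Parse  S -> if C then S | if C then S else S | OTHER,
    attaching each else to the closest unmatched then."""
    pos = 0

    def take(expected=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def parse_s():
        if pos < len(tokens) and tokens[pos] == "if":
            take("if")
            cond = take()                 # simplification: a condition is one token
            take("then")
            then_branch = parse_s()
            # Greedy choice: an else seen here belongs to this (innermost) if.
            if pos < len(tokens) and tokens[pos] == "else":
                take("else")
                return ("if", cond, then_branch, parse_s())
            return ("if", cond, then_branch)
        return take()                     # simplification: any token stands for OTHER

    tree = parse_s()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For `"if C1 then if C2 then S3 else S4".split()` the result is `("if", "C1", ("if", "C2", "S3", "S4"))`, so the else is attached to the inner if.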

The Dangling Else: Example Revisited
- The expression if C1 then if C2 then S3 else S4
  - if C1 then (if C2 then S3 else S4): a valid parse tree (for a UIF)
  - if C1 then (if C2 then S3) else S4: not valid, because the then expression is not a MIF

Ambiguity
- No general techniques for handling ambiguity
- Impossible to convert automatically an ambiguous grammar to an unambiguous one
- Used with care, ambiguity can simplify the grammar
  - Sometimes allows more natural definitions
  - We need disambiguation mechanisms

Precedence and Associativity Declarations
- Instead of rewriting the grammar
  - Use the more natural (ambiguous) grammar
  - Along with disambiguating declarations
- Most tools allow precedence and associativity declarations to disambiguate grammars
- Examples follow

Associativity Declarations
- Consider the grammar E → E + E | int
- Ambiguous: two parse trees of int + int + int
  - (int + int) + int
  - int + (int + int)
- Left associativity declaration: %left +

Precedence Declarations
- Consider the grammar E → E + E | E * E | int
  - And the string int + int * int
- Two parse trees
  - (int + int) * int
  - int + (int * int)
- Precedence declarations:
  %left +
  %left *
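
The effect of such declarations can be imitated in a hand-written parser by a table of operator precedences and associativities. The sketch below uses precedence climbing, which is a different technique from what yacc-style tools do internally (they use the declarations to resolve conflicts in the parsing table). The table `OPERATORS` and the function `parse_with_precedence` are hypothetical names.

```python
# Plays the role of "%left +" followed by "%left *":
# a later declaration binds tighter, and both operators are left-associative.
OPERATORS = {"+": (1, "left"), "*": (2, "left")}


def parse_with_precedence(tokens):
    """Parse the ambiguous grammar E -> E + E | E * E | int using the table."""
    pos = 0

    def parse(min_prec):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "int":
            raise SyntaxError(f"expected 'int' at position {pos}")
        left = "int"
        pos += 1
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            op = tokens[pos]
            prec, assoc = OPERATORS[op]
            if prec < min_prec:
                break                     # operator belongs to an enclosing call
            pos += 1
            # A left-associative operator only lets strictly tighter
            # operators into its right operand.
            right = parse(prec + 1 if assoc == "left" else prec)
            left = (op, left, right)
        return left

    tree = parse(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree
```

With this table, `"int + int * int".split()` parses as `("+", "int", ("*", "int", "int"))` and `"int + int + int".split()` as `("+", ("+", "int", "int"), "int")`, which are the trees selected by the declarations above.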

Error Handling
- Purpose of the compiler is
  - To detect non-valid programs
  - To translate the valid ones
- Many kinds of possible errors (e.g., in C):
Error kind | Example | Detected by
Lexical | $ | Lexer
Syntax | x*% | Parser
Semantic | int x; y = x(3); | Type checker
Correctness | your favorite program | Tester/User

Syntax Error Handling
- The error handler should
  - Report errors accurately and clearly
  - Recover from an error quickly
  - Not slow down compilation of valid code
- Good error handling is not easy to achieve

Approaches to Syntax Error Recovery
- From simple to complex
  - Panic mode
  - Error productions
  - Automatic local or global correction
- Not all are supported by all parser generators

Error Recovery: Panic Mode
- Simplest, most popular method
- When an error is detected:
  - Discard tokens until one with a clear role is found
  - Continue from there
- Such tokens are called synchronizing tokens
  - Typically the statement or expression terminators
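
A minimal sketch of panic-mode recovery, assuming a toy language in which every statement has the token shape `id = int ;` and the terminator `;` serves as the synchronizing token. The function name `parse_statements` and the error message format are hypothetical.

```python
def parse_statements(tokens):
    """Parse a sequence of `id = int ;` statements with panic-mode recovery.

    Returns (starts, errors): the start positions of the well-formed
    statements and one message per syntax error.
    """
    pattern = ("id", "=", "int", ";")
    starts, errors = [], []
    pos = 0
    while pos < len(tokens):
        if tuple(tokens[pos:pos + len(pattern)]) == pattern:
            starts.append(pos)
            pos += len(pattern)
            continue
        errors.append(f"syntax error in statement starting at token {pos}")
        # Panic mode: discard tokens until the synchronizing token ';' ...
        while pos < len(tokens) and tokens[pos] != ";":
            pos += 1
        pos += 1  # ... then skip the ';' itself and continue from there
    return starts, errors
```

For the input `"id = int ; id = = int ; id = int ;".split()` the parser reports one error for the middle statement and still recognizes the statements starting at tokens 0 and 9, so a single mistake does not hide the rest of the program.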

Syntax Error Recovery: Panic Mode (Cont.)
- Consider the erroneous expression (1 + + 2) + 3
- Panic-mode recovery:
  - Skip ahead to the next integer and then continue
- (ML)-Yacc: use the special terminal error to describe how much input to skip
E → int | E + E | ( E ) | error int | ( error )

Syntax Error Recovery: Error Productions
- Idea: specify in the grammar known common mistakes
- Essentially promotes common errors to alternative syntax
- Example:
  - Write 5 x instead of 5 * x
  - Add the production E → ... | E E
- Disadvantage
  - Complicates the grammar
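
The idea of an error production can be sketched in a hand-written parser for products of operands: besides the normal rule with an explicit `*`, the parser also accepts two adjacent operands, builds the tree the user most likely meant, and records a warning. The grammar fragment P → P * atom with the error production P → P atom, the set `ATOMS`, and the function `parse_product` are all assumptions made for this illustration.

```python
ATOMS = {"int", "id"}


def parse_product(tokens):
    """Parse  P -> P * atom | atom  plus the error production  P -> P atom.

    Returns (tree, warnings).
    """
    warnings = []
    if not tokens or tokens[0] not in ATOMS:
        raise SyntaxError("expected an operand")
    tree = tokens[0]
    pos = 1
    while pos < len(tokens):
        if tokens[pos] == "*":
            pos += 1                      # normal production: P -> P * atom
        elif tokens[pos] in ATOMS:
            # Error production P -> P atom: accept, but report the mistake.
            warnings.append(f"missing '*' before token {pos}")
        else:
            raise SyntaxError(f"unexpected token {tokens[pos]!r}")
        if pos >= len(tokens) or tokens[pos] not in ATOMS:
            raise SyntaxError("expected an operand")
        tree = ("*", tree, tokens[pos])
        pos += 1
    return tree, warnings
```

The token sequence `["int", "id"]`, corresponding to 5 x, yields the same tree as `["int", "*", "id"]` together with one warning, so parsing continues as if the operator had been written. The extra branch in the loop is the price: the parser, like the grammar, becomes more complicated.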

Syntax Error Recovery: Past and Present
- Past
  - Slow recompilation cycle (even once a day)
  - Find as many errors in one cycle as possible
  - Researchers could not let go of the topic
- Present
  - Quick recompilation cycle
  - Users tend to correct one error per cycle
  - Complex error recovery is needed less
  - Panic mode seems enough