Error Handling, Syntax-Directed Translation, Recursive Descent Parsing

Error Handling, Syntax-Directed Translation, Recursive Descent Parsing. Lecture 6 (by Professor Vijay Ganesh)

Outline
- Recursive descent
- Extensions of CFG for parsing
- Precedence declarations
- Error handling
- Semantic actions
- Constructing a parse tree

Recursive Descent Parsing: Worked Example
Grammar:
    E → T | T + E
    T → int | int * T | ( E )
Input: ( int₅ )
1. Start with the top-level non-terminal E; the input pointer is at (.
2. Try E → T.
3. Try T → int. Mismatch: int does not match (. Backtrack.
4. Back at E → T, with the input pointer still at (.
5. Try T → int * T. Mismatch: int does not match (. Backtrack.
6. Back at E → T again.
7. Try T → ( E ). The ( matches. Advance the input.
8. The input pointer is now at int₅, and the inner E remains to be expanded.
9. Try E → T for the inner E.
10. Try T → int. int₅ matches. Advance the input.
11. The ) matches. Advance the input.
12. End of input: accept.
Resulting parse tree: E → T → ( E ), with the inner E → T → int₅.

A Recursive Descent Parser: Preliminaries
- Let TOKEN be the type of tokens
  - Special tokens INT, OPEN, CLOSE, PLUS, TIMES
- Let the global next point to the next token

A (Limited) Recursive Descent Parser (2)
- Define boolean functions that check the token string for a match of
  - A given token terminal:
        bool term(TOKEN tok) { return *next++ == tok; }
  - The nth production of S:
        bool Sn() { ... }
  - Any production of S (try them all):
        bool S() { ... }

A (Limited) Recursive Descent Parser (3)
- For production E → T:
      bool E1() { return T(); }
- For production E → T + E:
      bool E2() { return T() && term(PLUS) && E(); }
- For all productions of E (with backtracking):
      bool E() {
          TOKEN *save = next;
          return (next = save, E1())
              || (next = save, E2());
      }

A (Limited) Recursive Descent Parser (4)
- Functions for non-terminal T:
      bool T1() { return term(INT); }
      bool T2() { return term(INT) && term(TIMES) && T(); }
      bool T3() { return term(OPEN) && E() && term(CLOSE); }
      bool T() {
          TOKEN *save = next;
          return (next = save, T1())
              || (next = save, T2())
              || (next = save, T3());
      }

Recursive Descent Parsing: Notes
- To start the parser
  - Initialize next to point to the first token
  - Invoke E()
- Notice how this simulates the example parse
- Easy to implement by hand
- But not completely general
  - Cannot backtrack once a production is successful
  - Works for grammars where at most one production can succeed for a non-terminal

Example
Grammar:
    E → T | T + E
    T → int | int * T | ( E )
Input: ( int )
Code:
    bool term(TOKEN tok) { return *next++ == tok; }
    bool E1() { return T(); }
    bool E2() { return T() && term(PLUS) && E(); }
    bool E() { TOKEN *save = next;
               return (next = save, E1()) || (next = save, E2()); }
    bool T1() { return term(INT); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(OPEN) && E() && term(CLOSE); }
    bool T() { TOKEN *save = next;
               return (next = save, T1()) || (next = save, T2())
                   || (next = save, T3()); }
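The functions from these slides can be assembled into a compilable C sketch. This is one possible arrangement, not necessarily the lecture's own code: the END sentinel token, the forward declarations, and the parse() wrapper are assumptions added here so that the end of input can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Token kinds from the slides; END is an assumed end-of-input sentinel. */
typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

static const TOKEN *next;          /* points to the next unread token */

static bool E(void);
static bool T(void);

/* Match one terminal and advance, whether or not it matched. */
static bool term(TOKEN tok) { return *next++ == tok; }

/* E -> T | T + E */
static bool E1(void) { return T(); }
static bool E2(void) { return T() && term(PLUS) && E(); }
static bool E(void) {
    const TOKEN *save = next;      /* restore point for backtracking */
    return (next = save, E1()) || (next = save, E2());
}

/* T -> int | int * T | ( E ) */
static bool T1(void) { return term(INT); }
static bool T2(void) { return term(INT) && term(TIMES) && T(); }
static bool T3(void) { return term(OPEN) && E() && term(CLOSE); }
static bool T(void) {
    const TOKEN *save = next;
    return (next = save, T1()) || (next = save, T2()) || (next = save, T3());
}

/* Hypothetical wrapper: accept only if E() consumes every token before END. */
bool parse(const TOKEN *tokens) {
    assert(tokens != NULL);
    next = tokens;
    return E() && *next == END;
}
```

With this sketch, the token sequence for ( int ) is accepted. The sequence for int * int is rejected even though the grammar derives it: T1 succeeds on the first int, and the parser cannot go back and try T2 once a production has succeeded, which is the limitation described in the notes.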

When Recursive Descent Does Not Work
- Consider a production S → S a
      bool S1() { return S() && term(a); }
      bool S() { return S1(); }
- S() goes into an infinite loop
- A left-recursive grammar has a non-terminal S such that S →⁺ S α for some α
- Recursive descent does not work in such cases

Elimination of Left Recursion
- Consider the left-recursive grammar
      S → S α | β
- S generates all strings starting with a β and followed by any number of α
- Can rewrite using right-recursion
      S → β S'
      S' → α S' | ε
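As a small illustration, the right-recursive form S → β S', S' → α S' | ε can be turned directly into a recursive descent recognizer that terminates. The sketch below assumes the concrete terminals α = 'a' and β = 'b' and reads from a C string. The names S_prime and matches are hypothetical, chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static const char *p;              /* current input position */

/* S' -> a S' | epsilon */
static bool S_prime(void) {
    if (*p == 'a') {               /* take the production S' -> a S' */
        p++;
        return S_prime();
    }
    return true;                   /* take S' -> epsilon, consuming nothing */
}

/* S -> b S' */
static bool S(void) {
    if (*p != 'b') {
        return false;
    }
    p++;
    return S_prime();
}

/* Accept exactly the strings b, ba, baa, ... */
bool matches(const char *input) {
    assert(input != NULL);
    p = input;
    return S() && *p == '\0';
}
```

Each recursive call consumes one input character before recursing, so the recursion depth is bounded by the input length. The left-recursive version S → S α would instead call S() again without consuming anything.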

More Elimination of Left-Recursion
- In general
      S → S α₁ | … | S αₙ | β₁ | … | βₘ
- All strings derived from S start with one of β₁, …, βₘ and continue with several instances of α₁, …, αₙ
- Rewrite as
      S → β₁ S' | … | βₘ S'
      S' → α₁ S' | … | αₙ S' | ε

General Left Recursion
- The grammar
      S → A α | δ
      A → S β
  is also left-recursive because S →⁺ S β α
- This left-recursion can also be eliminated
- See the Dragon Book, Section 4.3, for the general algorithm

Summary of Recursive Descent
- Simple and general parsing strategy
  - Left-recursion must be eliminated first
  - But that can be done automatically
- Unpopular because of backtracking
  - Thought to be too inefficient
- In practice, backtracking is eliminated by restricting the grammar

Error Handling
- Purpose of the compiler is
  - To detect non-valid programs
  - To translate the valid ones
- Many kinds of possible errors (e.g. in C)

    Error kind     Example                  Detected by
    Lexical        $                        Lexer
    Syntax         x *%                     Parser
    Semantic       int x; y = x(3);         Type checker
    Correctness    your favorite program    Tester/User

Syntax Error Handling
- Error handler should
  - Report errors accurately and clearly
  - Recover from an error quickly
  - Not slow down compilation of valid code
- Good error handling is not easy to achieve

Approaches to Syntax Error Recovery
- From simple to complex
  - Panic mode
  - Error productions
  - Automatic local or global correction
- Not all are supported by all parser generators

Error Recovery: Panic Mode
- Simplest, most popular method
- When an error is detected:
  - Discard tokens until one with a clear role is found
  - Continue from there
- Such tokens are called synchronizing tokens
  - Typically the statement or expression terminators

Syntax Error Recovery: Panic Mode (Cont.)
- Consider the erroneous expression
      (1 + + 2) + 3
- Panic-mode recovery:
  - Skip ahead to the next integer and then continue
- Bison: use the special terminal error to describe how much input to skip
      E → int | E + E | ( E ) | error int | ( error )
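The skipping step of panic mode can be sketched in a few lines of C. This is an illustration of the idea only, not how Bison implements its error terminal. The Token enum, the TOK_END marker, and the function name skip_to_sync are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; TOK_END is an assumed end-of-input marker. */
typedef enum { TOK_INT, TOK_PLUS, TOK_OPEN, TOK_CLOSE, TOK_END } Token;

/* Panic mode: starting at pos, discard tokens until the synchronizing
   token sync (or the end marker) is reached, and return its index. */
size_t skip_to_sync(const Token *toks, size_t pos, Token sync) {
    assert(toks != NULL);
    while (toks[pos] != sync && toks[pos] != TOK_END) {
        pos++;                     /* discard one token */
    }
    return pos;
}
```

For (1 + + 2) + 3 the token sequence is OPEN INT PLUS PLUS INT CLOSE PLUS INT. The error is detected at the second PLUS (index 3), and skipping to the next integer resumes parsing at index 4.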

Syntax Error Recovery: Error Productions
- Idea: specify in the grammar known common mistakes
- Essentially promotes common errors to alternative syntax
- Example:
  - Write 5 x instead of 5 * x
  - Add the production E → E E
- Disadvantage
  - Complicates the grammar

Error Recovery: Local and Global Correction
- Idea: find a correct nearby program
  - Try token insertions and deletions
  - Exhaustive search
- Disadvantages:
  - Hard to implement
  - Slows down parsing of correct programs
  - Nearby is not necessarily the intended program
  - Not all tools support it

Syntax Error Recovery: Past and Present
- Past
  - Slow recompilation cycle (even once a day)
  - Find as many errors in one cycle as possible
  - Researchers could not let go of the topic
- Present
  - Quick recompilation cycle
  - Users tend to correct one error per cycle
  - Complex error recovery is less compelling
  - Panic mode seems enough

Abstract Syntax Trees
- So far a parser traces the derivation of a sequence of tokens
- The rest of the compiler needs a structural representation of the program
- Abstract syntax trees
  - Like parse trees but ignore some details
  - Abbreviated as AST

Abstract Syntax Tree (Cont.)
- Consider the grammar
      E → int | ( E ) | E + E
- And the string
      5 + (2 + 3)
- After lexical analysis (a list of tokens)
      int₅ + ( int₂ + int₃ )
- During parsing we build a parse tree

Example of Parse Tree
    E
    ├─ E
    │  └─ int₅
    ├─ +
    └─ E
       ├─ (
       ├─ E
       │  ├─ E
       │  │  └─ int₂
       │  ├─ +
       │  └─ E
       │     └─ int₃
       └─ )
- Traces the operation of the parser
- Does capture the nesting structure
- But too much info
  - Parentheses
  - Single-successor nodes

Example of Abstract Syntax Tree
    PLUS
    ├─ 5
    └─ PLUS
       ├─ 2
       └─ 3
- Also captures the nesting structure
- But abstracts from the concrete syntax => more compact and easier to use
- An important data structure in a compiler

Semantic Actions
- This is what we'll use to construct ASTs
- Each grammar symbol may have attributes
  - For terminal symbols (lexical tokens) attributes can be calculated by the lexer
- Each production may have an action
  - Written as: X → Y₁ … Yₙ { action }
  - That can refer to or compute symbol attributes

Semantic Actions: An Example
- Consider the grammar
      E → int | E + E | ( E )
- For each symbol X define an attribute X.val
  - For terminals, val is the associated lexeme
  - For non-terminals, val is the expression's value (and is computed from values of subexpressions)
- We annotate the grammar with actions:
      E → int          { E.val = int.val }
        | E₁ + E₂      { E.val = E₁.val + E₂.val }
        | ( E₁ )       { E.val = E₁.val }

Semantic Actions: An Example (Cont.)
- String: 5 + (2 + 3)
- Tokens: int₅ + ( int₂ + int₃ )

    Productions        Equations
    E  → E₁ + E₂       E.val  = E₁.val + E₂.val
    E₁ → int₅          E₁.val = int₅.val = 5
    E₂ → ( E₃ )        E₂.val = E₃.val
    E₃ → E₄ + E₅       E₃.val = E₄.val + E₅.val
    E₄ → int₂          E₄.val = int₂.val = 2
    E₅ → int₃          E₅.val = int₃.val = 3

Semantic Actions: Notes
- Semantic actions specify a system of equations
  - Order of resolution is not specified
- Example:
      E₃.val = E₄.val + E₅.val
  - Must compute E₄.val and E₅.val before E₃.val
  - We say that E₃.val depends on E₄.val and E₅.val
- The parser must find the order of evaluation

Dependency Graph
[Figure: parse tree for 5 + (2 + 3) with nodes E, E₁, E₂, E₃, E₄, E₅ and leaves int₅, int₂, int₃; arrows show which val slots depend on which.]
- Each node labeled E has one slot for the val attribute
- Note the dependencies

Evaluating Attributes
- An attribute must be computed after all its successors in the dependency graph have been computed
  - In the previous example attributes can be computed bottom-up
- Such an order exists when there are no cycles
  - Cyclically defined attributes are not legal

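Dependency Graph
The same tree with every val slot filled in:
    E (val = 10)
    ├─ E₁ (val = 5)
    │  └─ int₅ (val = 5)
    ├─ +
    └─ E₂ (val = 5)
       ├─ (
       ├─ E₃ (val = 5)
       │  ├─ E₄ (val = 2)
       │  │  └─ int₂ (val = 2)
       │  ├─ +
       │  └─ E₅ (val = 3)
       │     └─ int₃ (val = 3)
       └─ )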

Semantic Actions: Notes (Cont.)
- Synthesized attributes
  - Calculated from attributes of descendants in the parse tree
  - E.val is a synthesized attribute
  - Can always be calculated in a bottom-up order
- Grammars with only synthesized attributes are called S-attributed grammars
  - Most common case

Inherited Attributes
- Another kind of attribute
- Calculated from attributes of parent and/or siblings in the parse tree
- Example: a line calculator

A Line Calculator
- Each line contains an expression
      E → int | E + E
- Each line is terminated with the = sign
      L → E = | + E =
- In the second form the value of the previous line is used as the starting value
- A program is a sequence of lines
      P → ε | P L

Attributes for the Line Calculator
- Each E has a synthesized attribute val
  - Calculated as before
- Each L has an attribute val
      L → E =        { L.val = E.val }
        | + E =      { L.val = E.val + L.prev }
- We need the value of the previous line
- We use an inherited attribute L.prev

Attributes for the Line Calculator (Cont.)
- Each P has a synthesized attribute val
  - The value of its last line
      P → ε          { P.val = 0 }
        | P₁ L       { P.val = L.val; L.prev = P₁.val }
- Each L has an inherited attribute prev
  - L.prev is inherited from the sibling's P₁.val
- Example

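Example of Inherited Attributes
[Figure: parse tree for a program with the single line + 2 + 3 =. The root P expands to P L; the inner P derives ε and has val 0; L expands to + E₃ =; E₃ expands to E₄ + E₅ with E₄ → int₂ and E₅ → int₃. The attribute slots are still empty apart from the inner P.]
- val is synthesized
- prev is inherited
- All can be computed in depth-first order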

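Example of Inherited Attributes
The same tree with all attributes filled in:
    P (val = 5)
    ├─ P (val = 0)
    │  └─ ε
    └─ L (prev = 0, val = 5)
       ├─ +
       ├─ E₃ (val = 5)
       │  ├─ E₄ (val = 2)
       │  │  └─ int₂ (val = 2)
       │  ├─ +
       │  └─ E₅ (val = 3)
       │     └─ int₃ (val = 3)
       └─ =
- val is synthesized
- prev is inherited
- All can be computed in depth-first order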
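One way to picture the two kinds of attribute in code is to pass an inherited attribute down as a function argument and hand a synthesized attribute back as the return value. The C sketch below follows the line calculator's rules under that reading. The Line struct and the function names are hypothetical, and each line's E.val is assumed to have been computed already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One calculator line, with its expression already evaluated. */
typedef struct {
    bool uses_prev;   /* true for the form  + E =  */
    int  expr_val;    /* E.val */
} Line;

/* L.val is returned (synthesized); L.prev arrives as an argument (inherited). */
int line_val(const Line *line, int prev) {
    assert(line != NULL);
    return line->uses_prev ? line->expr_val + prev   /* L -> + E = */
                           : line->expr_val;         /* L -> E =   */
}

/* P.val: 0 for the empty program, otherwise the value of the last line. */
int program_val(const Line *lines, size_t count) {
    int val = 0;                               /* P -> epsilon { P.val = 0 } */
    for (size_t i = 0; i < count; i++) {
        val = line_val(&lines[i], val);        /* L.prev = P1.val; P.val = L.val */
    }
    return val;
}
```

For the slide's program, the single line + 2 + 3 = has E.val = 5 and prev = 0, so program_val gives 5.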

Semantic Actions: Notes (Cont.)
- Semantic actions can be used to build ASTs
- And many other things as well
  - Also used for type checking, code generation, …
- Process is called syntax-directed translation
  - Substantial generalization over CFGs

Constructing an AST
- We first define the AST data type
  - Supplied by us for the project
- Consider an abstract tree type with two constructors:
  - mkleaf(n) builds a leaf node holding n
  - mkplus(T₁, T₂) builds a PLUS node with children T₁ and T₂

Constructing a Parse Tree
- We define a synthesized attribute ast
  - Values of the ast attribute are ASTs
  - We assume that int.lexval is the value of the integer lexeme
  - Computed using semantic actions
      E → int          E.ast = mkleaf(int.lexval)
        | E₁ + E₂      E.ast = mkplus(E₁.ast, E₂.ast)
        | ( E₁ )       E.ast = E₁.ast

Parse Tree Example
- Consider the string int₅ + ( int₂ + int₃ )
- A bottom-up evaluation of the ast attribute:
      E.ast = mkplus(mkleaf(5), mkplus(mkleaf(2), mkleaf(3)))
- Resulting AST:
      PLUS
      ├─ 5
      └─ PLUS
         ├─ 2
         └─ 3
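The two constructors can be sketched in C as follows. The slides say the real AST data type is supplied for the project, so the Ast struct layout and the helper names new_node, ast_val, and ast_free below are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { AST_LEAF, AST_PLUS } AstKind;

typedef struct Ast {
    AstKind kind;
    int value;                  /* meaningful only for AST_LEAF */
    struct Ast *left, *right;   /* non-NULL only for AST_PLUS */
} Ast;

static Ast *new_node(AstKind kind, int value, Ast *left, Ast *right) {
    Ast *node = malloc(sizeof *node);
    if (node == NULL) {
        abort();                /* out of memory: give up in this sketch */
    }
    node->kind = kind;
    node->value = value;
    node->left = left;
    node->right = right;
    return node;
}

/* mkleaf(n): a leaf holding the integer n. */
Ast *mkleaf(int n) { return new_node(AST_LEAF, n, NULL, NULL); }

/* mkplus(t1, t2): a PLUS node with children t1 and t2. */
Ast *mkplus(Ast *t1, Ast *t2) {
    assert(t1 != NULL && t2 != NULL);
    return new_node(AST_PLUS, 0, t1, t2);
}

/* Bottom-up (post-order) computation of the val attribute over the AST. */
int ast_val(const Ast *t) {
    assert(t != NULL);
    return t->kind == AST_LEAF ? t->value
                               : ast_val(t->left) + ast_val(t->right);
}

void ast_free(Ast *t) {
    if (t == NULL) {
        return;
    }
    ast_free(t->left);
    ast_free(t->right);
    free(t);
}
```

Building mkplus(mkleaf(5), mkplus(mkleaf(2), mkleaf(3))) reproduces the tree on the slide, and ast_val on it computes 5 + (2 + 3) = 10 by visiting children before parents.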

Summary
- We can specify language syntax using a CFG
- A parser will answer whether s ∈ L(G)
- And will build a parse tree
- Which we convert to an AST
- And pass on to the rest of the compiler

Intro to Top-Down Parsing: The Idea
- The parse tree is constructed
  - From the top
  - From left to right
- Terminals are seen in order of appearance in the token stream:
      t₂ t₅ t₆ t₈ t₉
[Figure: parse tree whose nodes are labeled 1, t₂, 3, 4, t₅, t₆, 7, t₈, t₉.]

Recursive Descent Parsing
- Consider the grammar
      E → T | T + E
      T → int | int * T | ( E )
- Token stream is: ( int₅ )
- Start with the top-level non-terminal E
- Try the rules for E in order