Building a Parser Part III

COMP 506, Rice University, Spring 2018
Building a Parser, Part III
With Practical Application To Lab One

[Figure: compiler structure. source code → Front End → IR → Optimizer → IR → Back End → target code]

Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 506 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Section numbers refer to EaC2e.

Where Are We?

Lab 1 is due in 17 days. You need to get started. You may feel that you need more knowledge.

Last Class
- Ambiguous grammars, precedence, encoding meaning into grammar
- Basics of a bison input file
- Introduction to semantic actions in bison

Today's Class
- How to get started with flex and bison: jump in and start coding
- Understanding what bison is trying to tell you [1]
- Handling syntax errors in bison

[1] You are justified in thinking that it does a poor job of trying to tell you what is wrong with your grammar. It takes a lot of context to understand the diagnostic information.

The Grammar for Demo

In the lab 1 materials, you are given a grammar for Demo.
- The grammar is not an LR(1) grammar. You must make it into one.
- You can use bison to identify the problems with the grammar.
- You will find that we have already fixed some of the problems (e.g., precedence).

You need to build flex and bison input files for Demo.
- There are more details to learn about tokens, types, line numbers, interfaces, and so on.
- You will want to build some support routines (e.g., yyerror()).
- You will need to build a driver to call your parser.
- You will need to test your parser on code we provide and on your own examples.

You will want to build better error handling than bison provides by default.
- The error handling facilities are somewhat cryptic and hard to use.
- You will need to experiment with use and placement of the error token.

From Last Lecture
Ambiguity: The Classic Example, the if-then-else construct

The straightforward grammar for the if-then-else construct is ambiguous.
- It allows for both an if-then and an if-then-else.
- It allows one if-then or if-then-else to control another.

    0 Stmt → if Expr then Stmt
    1      | if Expr then Stmt else Stmt
    2      | other statements

The problem lies in matching else clauses with then clauses. The matching should be (1) consistent, (2) unambiguous, and (3) obvious.

This control-flow construct appears in nearly all programming languages.

IF-THEN-ELSE: A Final Word

This ambiguity is a bit more subtle than it looks.

    Stmts → Stmts Stmt
          | Stmt
    Stmt  → Reference = Expr
          | IF ( Expr ) THEN Stmt
          | IF ( Expr ) THEN Stmt ELSE Stmt

Reference and Expr are non-terminals defined elsewhere. We know how to fix this ambiguity, using the WithElse transformation.

What happens if we add a Stmt that contains Stmt?

    Stmt → WHILE ( Expr ) Stmt

Since Stmt can expand to an if-then-else, you get an ambiguity:

    if (a > b) then while (c > d) if (e < f) then Stmt1 else Stmt2

The fix: only allow a WithElse, unless it is enclosed in some bracket construct, such as { } or begin and end.

IF-THEN-ELSE: A Final Word (continued)

Demo has both for and while loops. The loop constructs have C-style brackets around the statement list. The brackets were added precisely to allow an if-then inside a loop.

Yet another insight into why programming languages look as they do?

Using bison on the if-then-else grammar

When you invoke bison, it produces multiple files.

    diana% ls
    ITE.y
    diana% bison -vd ITE.y
    ITE.y: conflicts: 1 shift/reduce
    diana% ls -1
    ITE.output
    ITE.tab.c
    ITE.tab.h
    ITE.y
    diana%

- ITE.tab.c contains the parser itself.
- ITE.tab.h is the set of token definitions for inclusion in the flex scanner.
- ITE.output describes all of the states of the parser.

File ITE.y, the bison version of if-then-else:

    %token <int> if
    %token <int> then
    %token <int> else
    %token <int> Expr
    %token <int> assignment
    %type <int> Start
    %type <int> Stmt
    %%
    Start : Stmt
          ;
    Stmt  : if Expr then Stmt
          | if Expr then Stmt else Stmt
          | assignment
          ;
    %%

Each terminal is declared as a token with stack type int; each nonterminal has stack type int, too.

Using bison on the if-then-else grammar (continued)

What about this error message? The answer takes some context.

- ITE.tab.h is produced by the -d option.
- ITE.output is produced by the -v option.

Using bison
About that error message

    diana% bison -vd ITE.y
    ITE.y: conflicts: 1 shift/reduce
    diana%

The key to LR(1) parsing is that the parser can decide, from the combination of the state, the stack, and the next word, whether to shift, reduce, accept, or throw an error.

The error message means that, in one parser configuration, the parser generator is unable to decide whether to shift or to reduce: a shift/reduce conflict.
- Reduce/reduce conflicts are also possible.
- We need to eliminate the ambiguities that create these conflicts.
- We need to know where to look.

Using bison

When you invoke bison, it produces multiple files. The ITE.output file is the key (-v option).
- It shows all the parser configurations.
- Each configuration forms a state.
- Any errors are called out by state.

The ITE.output file has two parts.
- The top of the file is a summary: error messages, if any; a listing of the grammar; and a directory of where symbols appear.
- The bottom is a list of all the states and parser configurations.

Messages:

    State 8 conflicts: 1 shift/reduce

Grammar:

    Grammar
        0 $accept: Start $end
        1 Start: Stmt
        2 Stmt: if Expr then Stmt
        3     | if Expr then Stmt else Stmt
        4     | assignment

Directory:

    Terminals, with rules where they appear
    $end (0) 0
    error (256)
    if (258) 2 3
    then (259) 2 3
    else (260) 3
    Expr (261) 2 3
    assignment (262) 4

    Nonterminals, with rules where they appear
    $accept (8)
        on left: 0
    Start (9)
        on left: 1, on right: 0
    Stmt (10)
        on left: 2 3 4, on right: 1 2 3

Using bison

Each state has a four-part structure. Each state is labeled with a number.

List of states and configurations:

    state 0
        0 $accept: . Start $end
        if          shift, and go to state 1
        assignment  shift, and go to state 2
        Start  go to state 3
        Stmt   go to state 4

    state 1
        2 Stmt: if . Expr then Stmt
        3     | if . Expr then Stmt else Stmt
        Expr  shift, and go to state 5

    state 2
        4 Stmt: assignment .
        $default  reduce using rule 4 (Stmt)

    state 3
        0 $accept: Start . $end
        $end  shift, and go to state 6

Using bison

Each state has a four-part structure.
- Each state is labeled with a number.
- The state of the parse is given as a collection of LR(1) items: productions with placeholders. A parser state, or configuration, is a collection of LR(1) items (often large). The listing shows the useful LR(1) items.

See §3.4.2 in EaC2e.

LR(1) Items

LR(1) items are the intermediate representation of the LR(1) table construction algorithm. An LR(1) item represents a valid configuration of an LR(1) parser.

An LR(1) item is a pair [P, δ], where
- P is a production A → β with a • at some position in the RHS, and
- δ is a single-symbol lookahead (a symbol is a word or EOF).

The • in an item indicates the position of the top of the stack.

- [A → •βγ, a] means that the input seen so far is consistent with the use of A → βγ immediately after the symbol on top of the stack. We call an item like this a possibility.
- [A → β•γ, a] means that the input seen so far is consistent with the use of A → βγ at this point in the parse, and that the parser has already recognized β (that is, β is on top of the stack). We call an item like this a partially complete item.
- [A → βγ•, a] means that the parser has seen βγ, and that a lookahead symbol of a is consistent with reducing to A. This item is complete.

See §3.4.2 in EaC2e.
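The three kinds of item can be told apart purely by where the placeholder sits. The following is a minimal sketch in C, assuming one character per grammar symbol; the names Item, ItemKind, and classify_item are hypothetical and do not come from bison or EaC2e.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of an LR(1) item [A -> beta . gamma, a].
 * Assumption: each grammar symbol is a single character. */
typedef struct {
    char lhs;          /* left-hand side A */
    const char *rhs;   /* right-hand side, one character per symbol */
    size_t dot;        /* number of symbols to the left of the placeholder */
    char lookahead;    /* lookahead symbol a */
} Item;

typedef enum { ITEM_POSSIBILITY, ITEM_PARTIAL, ITEM_COMPLETE } ItemKind;

/* Classify an item by the position of its placeholder. An item with an
 * empty right-hand side is treated as complete. */
ItemKind classify_item(const Item *item)
{
    if (item->dot >= strlen(item->rhs))
        return ITEM_COMPLETE;      /* placeholder at the right end */
    if (item->dot == 0)
        return ITEM_POSSIBILITY;   /* placeholder at the left end */
    return ITEM_PARTIAL;           /* placeholder in the middle */
}
```

For example, an Item with rhs "bg" and dot 1 models [A → β•γ, a] and is classified as partially complete.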

Using bison

Each state has a four-part structure (continued).
- A set of parser ACTIONs: <word, action> pairs. Actions are shift, reduce, or accept.

Using bison

Each state has a four-part structure (continued).
- A set of GOTO transitions (may be empty): <NT, action> pairs, used on reduction to find the new state.

The actions and transitions form the ACTION and GOTO tables, respectively.

Using bison

Each state has a four-part structure.
- A label that gives the parser state number.
- The state of the parse as a collection of LR(1) items. The LR(1) items encode what the parser has seen and what the possibilities are.
- A set of parser ACTIONs. Actions are taken based on the next word in the input stream: shift, reduce, or accept. No action means an error.
- A set of GOTO transitions. GOTOs map a non-terminal (the LHS of a reduction) and a state into the parser's next state. They thread together the derivation.

What about those GOTO actions?

The GOTO transitions thread together the derivation (what?)

Look closely at the reduce action. It
- removes the RHS from the stack,
- saves the revealed state in s, and
- pushes the LHS and some new state: GOTO[s, LHS].

The revealed state encodes where the parser was when it began looking for an LHS.
- The reduce action says we found the LHS we needed.
- The GOTO transition moves the parser ahead by the LHS symbol.

    stack.push(INVALID);
    stack.push(s0);                      // initial state
    word ← NextWord();
    loop forever {
        s ← stack.top();
        if (ACTION[s,word] == "reduce A → β") then {
            stack.popnum(2 * |β|);       // pop RHS off stack
            s ← stack.top();
            stack.push(A);               // push LHS, A
            stack.push(GOTO[s,A]);       // push next state
        }
        else if (ACTION[s,word] == "shift si") then {
            stack.push(word);
            stack.push(si);
            word ← NextWord();
        }
        else if (ACTION[s,word] == "accept" & word == EOF)
            then break;
        else throw a syntax error;
    }
    report success;
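To see the reduce-then-GOTO step run, here is a sketch of the skeleton parser in C, driven by hand-built tables for the if-then-else grammar. It rests on several assumptions. States 0 to 3, 8, and 9 follow the ITE.output excerpts in these notes; the numbering of states 4 to 7 and 10 is a guess at what bison would produce. The stack holds states only, so a reduction pops |β| entries rather than 2 * |β|. Accept is signalled directly in state 3 on EOF instead of shifting $end into state 6. The conflict in state 8 is resolved as a shift, which is bison's default. All C names (ACTION, GOTO, parse, reductions) are hypothetical.

```c
#include <stddef.h>

enum { T_IF, T_EXPR, T_THEN, T_ELSE, T_ASSIGN, T_EOF, NUM_TOKENS };
enum { NT_START, NT_STMT, NUM_NONTERMS };
enum { NUM_STATES = 11, ACC = 100, MAX_DEPTH = 256, MAX_TRACE = 64 };

/* ACTION[state][token]: n > 0 shifts and goes to state n, n < 0 reduces
 * by rule -n, ACC accepts, and 0 is a syntax error. */
static const int ACTION[NUM_STATES][NUM_TOKENS] = {
    /*          if Expr then else asgn  EOF */
    /*  0 */ {   1,  0,   0,   0,   2,    0 },
    /*  1 */ {   0,  5,   0,   0,   0,    0 },
    /*  2 */ {   0,  0,   0,  -4,   0,   -4 },
    /*  3 */ {   0,  0,   0,   0,   0,  ACC },
    /*  4 */ {   0,  0,   0,   0,   0,   -1 },
    /*  5 */ {   0,  0,   7,   0,   0,    0 },
    /*  6 */ {   0,  0,   0,   0,   0,    0 },  /* unused in this sketch */
    /*  7 */ {   1,  0,   0,   0,   2,    0 },
    /*  8 */ {   0,  0,   0,   9,   0,   -2 },  /* else: shift wins */
    /*  9 */ {   1,  0,   0,   0,   2,    0 },
    /* 10 */ {   0,  0,   0,  -3,   0,   -3 }
};

/* GOTO[state][nonterminal]; 0 means no transition. */
static const int GOTO[NUM_STATES][NUM_NONTERMS] = {
    {3, 4}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0},
    {0, 0}, {0, 8}, {0, 0}, {0, 10}, {0, 0}
};

/* Rule 1: Start -> Stmt; 2: Stmt -> if Expr then Stmt;
 * 3: Stmt -> if Expr then Stmt else Stmt; 4: Stmt -> assignment. */
static const int RULE_LEN[] = { 0, 1, 4, 6, 1 };
static const int RULE_LHS[] = { 0, NT_START, NT_STMT, NT_STMT, NT_STMT };

int reductions[MAX_TRACE];   /* rule numbers, in the order they were reduced */
int num_reductions;

/* Parse a token array that ends with T_EOF. Returns 1 on accept and 0 on
 * a syntax error. Tokens are assumed to be valid T_* values. */
int parse(const int *words)
{
    int stack[MAX_DEPTH];
    int top = 0;
    size_t next = 0;
    int word = words[next++];

    stack[0] = 0;                          /* initial state */
    num_reductions = 0;
    for (;;) {
        int s = stack[top];
        int act = ACTION[s][word];
        if (act == ACC) {
            return 1;
        } else if (act < 0) {              /* reduce by rule -act */
            int rule = -act;
            top -= RULE_LEN[rule];         /* pop the RHS */
            s = stack[top];                /* the revealed state */
            if (num_reductions < MAX_TRACE)
                reductions[num_reductions++] = rule;
            stack[++top] = GOTO[s][RULE_LHS[rule]];
        } else if (act > 0) {              /* shift */
            if (top + 1 >= MAX_DEPTH)
                return 0;                  /* stack overflow treated as failure */
            stack[++top] = act;
            word = words[next++];
        } else {
            return 0;                      /* no action: syntax error */
        }
    }
}
```

Parsing if Expr then if Expr then assignment else assignment records the reductions 4, 4, 3, 2, 1: rule 3 fires before rule 2, so the else binds to the inner if.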

Using bison
About that error message

    diana% bison -vd ITE.y
    ITE.y: conflicts: 1 shift/reduce
    diana%

The parser has two conflicting options,
- shift the else onto the stack, or
- reduce the if-then, binding the else to a surrounding if-then,
and it cannot choose between them.

To reach state 8, the parser must have already seen one or more if-thens. Either option can lead to a derivation. The grammar allows either one.

    State 8 conflicts: 1 shift/reduce
    ... lots of stuff ...
    state 8
        2 Stmt: if Expr then Stmt .
        3     | if Expr then Stmt . else Stmt
        else  shift, and go to state 9
        else      [reduce using rule 2 (Stmt)]
        $default  reduce using rule 2 (Stmt)

The way to read this state is that the parser has two legal actions on an else, because of an ambiguity in productions 2 and 3. The placeholders in the LR(1) items show how the productions line up for the ambiguity. And, we know how to fix it.

The if-then-else grammar is used as an example in §3.2.2 & §3.4.3 of EaC2e.

Using bison
What about a reduce/reduce conflict?

A reduce/reduce conflict indicates that we have two identical RHSs.
- This is the third clause in our definition of ambiguity from last lecture.
- Again, this situation is an ambiguity in the grammar.
- Typically, this situation arises from overloading some symbol.
- Typically, this situation is a deliberate decision by the language designer.

Example:

    0 Reference → id
    1           | ArrayRef
    2           | FuncCall
    3 ArrayRef  → id ( ExprList )
    4 FuncCall  → id ( ExprList )

Fortran, PL/I, and other languages used the same syntax for array subscript lists and function call parameter lists.

Using bison

To fix this kind of ambiguity, use just one production for the syntax.
- The parser generator is saying that ArrayRef and FuncCall are syntactically identical.
- The parser generator is correct.

The compiler needs some other way to distinguish between these two cases, typically the type information on id.
- Use one production.
- Decide the correct use later.

The difference between ArrayRef and FuncCall is not syntax, but meaning. Parsers look at syntax. We will revisit this "decide later" strategy in a couple of lectures.

Before:

    0 Reference → id
    1           | ArrayRef
    2           | FuncCall
    3 ArrayRef  → id ( ExprList )
    4 FuncCall  → id ( ExprList )

After:

    0 Reference → id
    1           | id ( ExprList )
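One way to picture "decide later" is a lookup that runs after the single production id ( ExprList ) has been recognized. The sketch below is only an illustration under stated assumptions: the tiny symbol table, its two entries, and the names lookup_kind and resolve_reference are all hypothetical, and a real compiler would consult whatever symbol table and type information it already maintains.

```c
#include <stddef.h>
#include <string.h>

typedef enum { KIND_UNKNOWN, KIND_ARRAY, KIND_FUNCTION } SymKind;
typedef enum { REF_ERROR, REF_ARRAY_ELEMENT, REF_FUNCTION_CALL } RefKind;

typedef struct {
    const char *name;
    SymKind kind;
} Symbol;

/* Hypothetical symbol table: "a" was declared as an array, "f" as a function. */
static const Symbol symtab[] = {
    { "a", KIND_ARRAY },
    { "f", KIND_FUNCTION }
};

/* Linear search is enough for a sketch. */
SymKind lookup_kind(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof symtab / sizeof symtab[0]; i++) {
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].kind;
    }
    return KIND_UNKNOWN;
}

/* Called after the parser has matched id ( ExprList ): the syntax is the
 * same in both cases, so the declared kind of id decides the meaning. */
RefKind resolve_reference(const char *id)
{
    switch (lookup_kind(id)) {
    case KIND_ARRAY:    return REF_ARRAY_ELEMENT;
    case KIND_FUNCTION: return REF_FUNCTION_CALL;
    default:            return REF_ERROR;   /* undeclared or not subscriptable */
    }
}
```

The grammar stays free of the reduce/reduce conflict because the parser never has to choose; the choice moves into code that can see declarations.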

Automatic Generation of Scanners and Parsers

Three time frames:
- At design time, the compiler writer writes specifications for the microsyntax (spelling) and the syntax (grammar).
- At build time, the tools convert specifications to code and compile that code to produce the actual compiler.
- At compile time, the user invokes the compiler to translate an application into an executable form.

What about errors in the input program when the compiler runs (at compile time)? How do we handle those in a bison-built parser?

[Figure: front end. At compiler-build time, specifications as REs feed the Scanner Generator and specifications as CFGs feed the Parser Generator. At compile time, a stream of characters flows through the Scanner, Parser, and Semantic Elaboration to produce IR + annotations.]

Error handling in bison

First things first: bison's default error messages are terrible.

Enable more verbose (more precise? more explanatory?) error messages.
- Define the preprocessor symbol YYERROR_VERBOSE.
- Insert #define YYERROR_VERBOSE in the code section at the top of the file.
- Note that the code section is offset with the brackets %{ and %}.

Top of the file DEMOgram.y in the lab1_ref parser:

    %{
    /* Copyright 2016, Keith D. Cooper & Linda Torczon
     *
     * Written at Rice University, Houston, Texas as part
     * of the instructional materials for COMP 506.
     */
    #define YYERROR_VERBOSE
    #include <stdio.h>
    #include "demo.h"
    int yylineno;
    char *yytext;
    %}
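Whatever message bison produces is handed to yyerror(), which the compiler writer supplies. A minimal sketch of such a support routine follows, assuming the yylineno counter declared in the prologue above is kept up to date by the scanner. The helper format_syntax_error and the counter syntax_error_count are hypothetical names added for illustration; they are not part of bison or of the lab materials.

```c
#include <stdio.h>

int yylineno = 1;            /* line counter; assumed to be maintained by the scanner */
int syntax_error_count = 0;  /* hypothetical: lets a driver report a total at the end */

/* Build "line N: message" in buf. Returns the length snprintf reports. */
int format_syntax_error(char *buf, size_t size, int line, const char *msg)
{
    return snprintf(buf, size, "line %d: %s", line, msg);
}

/* Called by the generated parser when it detects a syntax error. */
void yyerror(const char *msg)
{
    char buf[256];
    format_syntax_error(buf, sizeof buf, yylineno, msg);
    fprintf(stderr, "%s\n", buf);
    syntax_error_count++;
}
```

Keeping the formatting in its own function makes the message easy to check without capturing stderr, and the counter gives the driver a simple way to decide its exit status.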

Error Handling in bison

What does bison do with a syntax error? (Action[state, lookahead] is set to invalid.)
- The parser pops <symbol, state> pairs off the stack until either it finds a state where the token error is valid, or it empties the parse stack and halts.
- error is valid if Action[state, error] is set to something other than invalid.
- If it finds a state where error is valid, it goes back to shifting and reducing.
- It generates an error message and passes it to yyerror().
- It remains in error mode until it does three shifts or the code calls yyerrok().
- The compiler writer has the option of restarting with the current token (the default) or discarding the current token (call yyclearin()).

To recover from an error and keep parsing, there must be entries for error.
- The compiler writer should add productions containing error tokens.
- Adding error productions is subtle and takes practice.

error is a pseudo-token; it is treated like a token, but you need not declare it.
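The pop-until-error-is-valid step can be modelled in a few lines. This is a simplified sketch of the mechanism described above, not bison's actual code: the Parser struct, the error_shift array (the state entered by shifting error, or NO_ERROR_SHIFT), and the function names are all hypothetical, and the stack holds states only.

```c
enum { MAX_STACK = 64, NO_ERROR_SHIFT = -1, ERROR_MODE_SHIFTS = 3 };

typedef struct {
    int states[MAX_STACK];    /* state stack; states[depth - 1] is the top */
    int depth;
    int shifts_until_normal;  /* greater than 0 while in error mode */
} Parser;

/* Pop states until one has an action on the error token, then shift error.
 * error_shift[s] is the state reached by shifting error in state s, or
 * NO_ERROR_SHIFT. Returns 1 if parsing can resume, 0 if the stack emptied. */
int recover(Parser *p, const int *error_shift)
{
    while (p->depth > 0) {
        int s = p->states[p->depth - 1];
        if (error_shift[s] != NO_ERROR_SHIFT) {
            if (p->depth >= MAX_STACK)
                return 0;                         /* no room to shift error */
            p->states[p->depth++] = error_shift[s];
            p->shifts_until_normal = ERROR_MODE_SHIFTS;
            return 1;
        }
        p->depth--;                               /* discard this state */
    }
    return 0;                                     /* stack empty: halt */
}

/* Each successful token shift moves the parser one step out of error mode. */
void note_shift(Parser *p)
{
    if (p->shifts_until_normal > 0)
        p->shifts_until_normal--;
}

/* Models the effect of yyerrok: leave error mode immediately. */
void error_ok(Parser *p)
{
    p->shifts_until_normal = 0;
}
```

With a stack of states 0, 3, 7, 9 and an error action only in state 3, recover discards 9 and 7, shifts error from state 3, and leaves the parser in error mode for the next three shifts.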

Error Handling in bison

How does the Action table entry for <state, error> get set?
- The compiler writer adds rules that contain the (pseudo-)token error.
- These rules create shifts and reduces, just like other tokens.

A production with an error token allows the compiler writer to specify
- situations in which she believes that an error is likely, and
- a specific action that the parser should take if an error is detected.

The key to good error handling in a bison parser is use of error tokens. The mechanism is not at all intuitive. It takes experimentation and practice. Lab 1 will give you that experience.

Error handling in bison

To catch a bad statement in Demo, you might add

    Stmt → Reference = Expr ;
         | { Stmts }
         | error              { yyclearin; yyerrok; }

To catch a missing right bracket, you might add

    Stmt → { Stmts error      /* no yyclearin; */

Error handling in bison

Not all errors can be caught with error tokens. To catch an empty statement or an empty statement list, you might add explicit productions describing the errors.

    Stmt → Reference = Expr ;
         | ;                  { yyerror("empty statement, ;");
                                yyclearin; /* throw away ; */ }
         | { Stmts }
         | { }                { yyerror("empty statement list"); }
         | error              { yyclearin; yyerrok; }

It takes some practice and experimentation. That is why we provided you with the error input files.