Programming Project II

CS 322 Compiler Construction
Winter Quarter 2006
Due: Saturday, January 28, at 11:59pm

START EARLY!

Description

In this phase, you will produce a parser for our version of Pascal. Your parser will parse the token stream (the output of the lexical analyzer) and build an abstract syntax tree. In particular, for this assignment you will:

1. Slightly modify your lexical analyzer and incorporate it with a Bison syntax analyzer.
2. Modify the supplied grammar to recognize multidimensional arrays.
3. Modify the supplied grammar so that it handles or resolves conflicts properly.
4. Build and print out the syntax tree (in mixed infix and prefix notation).
5. Perform some basic error recovery.

Preparation

START EARLY! Read this handout before you start writing code. Read the manual for Bison. Study the given code. Study the given header files and the grammar. Study the standard output files.

The Abstract Syntax Tree

As the input is parsed, an abstract syntax tree will be created. Each node of the tree represents a symbol (terminal or nonterminal). Every time the generated parser performs a reduction, it will need to create a new node for the symbol it is reducing to, as a parent of the nodes for the symbols on the right-hand side of the production. To do that, you will need to specify a semantic action for each production (rule). The purpose of this action is to compute the semantic value of the left-hand side from the semantic values of the symbols on the right-hand side of the production. Semantic values are values of certain attributes associated with symbols. In this stage, the semantic value of each symbol is of type Node*, representing a node in the abstract syntax tree that is being created. Note that the nodes for certain terminal symbols are created in flex.l.

Files

The following files are needed for the project:

flex.l

You will use the flex.l file you created for phase 1 of the compiler, with some modifications. Since the semantic value is a Node*, the yylval of tokens such as TUINT and TIDENT will now be a new node for that token. In addition, your scanner should no longer echo the input and line numbers as it did before. The provided flex.l file already contains the most important of these modifications. You just have to:

- Add any additional declarations from your flex.l.
- Add the actions for comments (without echoing the comments this time).
- Add the actions for strings. Do not forget to set yylval.

Note: remove any code that prints line numbers and echoes the input. flex.l should only print when there is an error. The scanning function has been removed since the process will now be controlled by the parser. main() and yyerror() have been moved to the grammar file for the same reason.

grammar.y

This is the grammar for our subset of Pascal. A skeleton file is provided. Notice that YYSTYPE is again defined at the beginning of grammar.y. It specifies the data type (Node*) for the semantic values of the tokens. Thus, the constructs $$ and $n (where $n is the value of the nth component in a rule) are always Node*s. Study the grammar carefully to see how it produces the language. Note the line yydebug = 1; in main(). If you uncomment it, you will get a trace of Bison's parsing actions (i.e., whether it is shifting or reducing, what state it is in, etc.). This is very useful in deciding whether your disambiguation is correct.

Hint: The command bison -v grammar.y will give you a file grammar.output that contains the conflicts, rules, parser states and the goto table of the LALR(1) parser.

You have to modify grammar.y as follows:

Complete the Actions

Most of the rules have actions that compute the value of the left-hand-side nonterminal from those on the right-hand side. In several cases, you need to decide for yourself whether an action is needed or not. Typically, an action will create an AST node of some type. The constructor arguments are either the children of the node (i.e., the nodes that were created earlier for certain symbols on the right-hand side of the production) or values related to those children. See the rules that we already wrote for simple type and type declaration part. Do not forget that the default semantic action is $$ = $1.

Recognize Multiple Subscripts

You will need to modify the grammar slightly to recognize multidimensional arrays (declarations and references). A multidimensional array may be declared as follows:

type x = array[1..2, 1..3] of integer

and an element of the array may be referenced in two ways:

- Comma-separated subscripts. Example: arr[1,4]
- Bracket-separated subscripts. Example: arr[1][4]

These two forms may be intermixed and are equivalent. For example, an element of a three-dimensional array may be referenced as arr[1,2][3]. You have to add new rules for parsing such statements. When a multidimensional array is declared, a new MultiArrayType node should be created and then immediately split into a sequence of simple arrays. For example, the node for the array declaration shown above would essentially become

ARRAY [1..2] OF ARRAY [1..3] OF INTEGER

Multidimensional array references should be handled in a similar way.

Disambiguate the Grammar

You will note that the supplied grammar is ambiguous. There are reduce-reduce conflicts as well as precedence-related ambiguities involving expressions. Refer to the Bison documentation to see how you can resolve them, and modify the grammar accordingly. Some of the conflicts may be properly handled by Bison's default behavior. You don't have to do anything in those cases.

A note regarding the ambiguity caused by the three rules that handle variables and expressions that reduce to parameters: the goal is to instruct the parser to always reduce a variable directly to a parameter instead of reducing the variable to an expression and the expression to a parameter. A conflict may still exist in the final parser, but it must always be handled correctly. However, avoid having more than 3-4 conflicts in the final product.

Implement Some Error Recovery

Your parser should be able to recover from the following types of error:

- An error in the condition of a while or if statement.
- An extra semicolon before the token TEND in a block of statements (in Pascal, semicolons separate statements; therefore, the last statement in a block is not followed by a semicolon).
- A missing comma or extra/wrong characters between variable names in lists of variables or parameters.

You will have to implement yyerror() with some slight modifications. It should print out not only the error message as before, but also some information about where the error occurred (the line as well as the approximate location). Bison provides special variables and macros that may be useful here. See the provided output files for an example.

ast.h

Contains class declarations for the abstract syntax tree. It is strongly recommended that you not modify this file, as it will be used throughout the project.

ast.cpp

Definitions of the AST classes' member functions. The nodes of the tree must be visited depth-first. On a visit, the root may print something before visiting its children, between visits to its children, and/or after visits to its children. This way, when the tree is printed, it will look like the input Pascal program in infix/prefix form. Most of the debugging information is printed in infix notation, except binary expressions, which are in prefix for grading purposes. See the standard output files provided in the test directory. You will notice several special symbols such as #, @, etc. We use these as markers so we can tell whether the correct type of node has been created (and whether your parser has reduced correctly). For example, a single variable is represented by a Variable* node and, when it is visited, a # is printed in front of its name. If that variable is also reduced to an expression (this will not always be the case), a new VarExpr* node is created for it. When that node is visited, a @ is printed before the name. We have provided a skeleton file with the functions you need to implement. The cerr statements are there for debugging purposes. You can use them to see in what order the nodes are visited.

symtab.h, symtab.cpp

Use your symtab.* from PA1, with no modifications.

Testing

As before, we provide test cases as well as sample output which your code must match. YOUR CODE MUST MATCH THE TEST CASES EXACTLY! We will be using diff to compare your results to ours, so other than extra newlines, the rest must match.

Submitting your code

Submit flex.l, grammar.y, ast.* and symtab.* in a tarball. As usual, email your code to c22@cs.northwestern.edu. CAREFUL! cs.nwu.edu will bounce.