CS 91.406/534 Compiler Construction
University of Massachusetts Lowell
Professor Li Xu
Fall 2004

Lab Project 2: Parser and Type Checker for NOTHING
Due: Sunday, November 14, 2004, 11:59 PM

Parts of this document are based on the teaching materials of Prof. Keith Cooper, Prof. Ken Kennedy and Dr. Linda Torczon at Rice University. All rights reserved.

1 Introduction

This project is intended to give you experience building a parser and conducting context-sensitive analysis. You will build a parser for the demonstration language NOTHING. You will either use an automatic parser generator tool or build a hand-coded, recursive-descent parser. In either case, you will get experience manipulating a programming language grammar and performing some limited context-sensitive analysis. Read this document and the accompanying NOTHING document completely before embarking on this adventure.

2 Project Requirements

Your task is to construct a parser that accepts the NOTHING programming language. The parser will obtain input by calling a lexical analyzer (scanner) on a word-by-word basis. The parser will perform the following functions:

1. It will determine whether or not the input NOTHING program presented for compilation is, in fact, a syntactically correct NOTHING program. If the program is not syntactically correct, a listing of the various syntax errors should be printed to the standard error file (stderr).

2. It will perform some limited context-sensitive analysis to discover whether the alleged NOTHING program presented for compilation adheres to the context-sensitive rules for NOTHING programs. If the program has context-sensitive errors, they should be reported to the user on the stderr file.

3. It will build an abstract syntax tree (AST) for the input program if the program can be successfully parsed. The parser then performs a tree walk on the AST and prints out the following statistics about the input program:

   - the number of INTEGER, FLOAT, CHARACTER and ARRAY variables that have been declared in both the main program and subprograms.

   - the number of user functions and procedures that have been defined.
   - the number of Invocation and WhileStmt statements in the program. For WhileStmt, the parser should also print out an itemized breakdown ordered by the nesting levels of the statements.

   The statistics data should be printed to the stdout file.

   Operator                  Precedence
   *, MOD                    5
   +, -                      4
   <, <=, >, >=, =, <>       3
   NOT                       2
   AND, OR                   1

   Figure 1: Operator precedences in NOTHING

3 The Parser

You are to construct a parser. Either use an automatic parser generator like ANTLR or yacc/bison, or hand-code a recursive-descent parser. The directions in this section assume that you are using ANTLR. Similar steps apply to other parser generators. For a hand-coded recursive-descent parser, you will follow different, but analogous, steps. You should proceed in the following steps:

1. Read the ANTLR documentation and familiarize yourself with the tool. In the course directory ~cs406/lab2/parser example, we have installed a sample arithmetic expression parser along with the AST generator. A brief description of the sample parser and how to build and run it can be found on the course web page. You may also look at other examples from the ANTLR source package.

2. Convert the grammar to a form suitable for use with ANTLR. This includes translating it into the ANTLR input format, massaging it to ensure that the proper associativity and precedence are enforced, and ensuring that the resulting grammar is accepted by ANTLR with no errors. (Note: the NOTHING grammar, as written, is ambiguous.) Check the robustness of your parser against the test files provided in ~cs406/lab2/test files.

3. Extend the simple parser by adding actions to be performed on the various productions. The code in these actions should check for compliance with the context-sensitive rules described later in this document and in the NOTHING document.

4. Improve the context-free error handling of your parser. ANTLR provides an exception-based error handling mechanism. You need to define parsing exception handlers that recover from common syntax errors and provide a meaningful diagnosis of possible errors. The parser should be able to detect as many syntax errors as possible during a single pass over the input program.

5. Build the AST representation of the input program. You can either annotate the ANTLR grammar and use the built-in AST generator, or generate the tree nodes from scratch in the action code for the grammar rules. Write a separate AST walker that traverses the tree in post-order and prints out the required statistics of the input program.

4 NOTHING Specification

The syntax of NOTHING is specified in a document titled "NOTHING: A Language For Practice Implementation". The grammar given in that document will need to be massaged to create one that is acceptable as input to ANTLR. The following additional information may be of use.
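
For the hand-coded option, one common way to enforce the precedences of Figure 1 together with left associativity is precedence climbing. The Python sketch below is illustrative only and is not part of the course materials: the function name parse_expr, the tuple-shaped trees, the treatment of any non-operator token as an operand, and the support for parentheses are all assumptions of the sketch. With ANTLR, the analogous step is usually to write one grammar rule per precedence level.

```python
# Binary operator precedences copied from Figure 1 (higher binds tighter).
BINARY_PREC = {
    "*": 5, "MOD": 5,
    "+": 4, "-": 4,
    "<": 3, "<=": 3, ">": 3, ">=": 3, "=": 3, "<>": 3,
    "AND": 1, "OR": 1,
}
NOT_PREC = 2  # unary NOT, also from Figure 1


def parse_expr(tokens, pos=0, min_prec=1):
    """Parse tokens[pos:] and return (tree, next_pos).

    Trees are nested tuples: (op, left, right) or ("NOT", operand).
    Operands are plain tokens; parentheses are an assumption of this sketch.
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of expression")
    tok = tokens[pos]
    if tok == "NOT":
        # NOT applies to everything that binds at least as tightly as itself.
        operand, pos = parse_expr(tokens, pos + 1, NOT_PREC)
        left = ("NOT", operand)
    elif tok == "(":
        left, pos = parse_expr(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        pos += 1
    elif tok in BINARY_PREC or tok == ")":
        raise SyntaxError(f"unexpected token {tok!r}")
    else:
        left, pos = tok, pos + 1

    # Left associativity: the right operand may only contain operators that
    # bind strictly tighter than the current one (hence prec + 1).
    while pos < len(tokens) and BINARY_PREC.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        right, pos = parse_expr(tokens, pos + 1, BINARY_PREC[op] + 1)
        left = (op, left, right)
    return left, pos


if __name__ == "__main__":
    tree, _ = parse_expr("a - b - c * d".split())
    print(tree)
```

Passing BINARY_PREC[op] + 1 for the right operand is what makes every binary operator left associative; passing BINARY_PREC[op] instead would make it right associative.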

1. All the operators in NOTHING should be left associative.

2. Operator precedences in NOTHING are specified in Figure 1. Multiplication and MOD have the highest priority; the logical operators have the lowest, with NOT binding more tightly than AND and OR.

3. The scope of a name is the region of the program in which the name can be used. Variables declared in a subprogram are only visible within that subprogram. They obscure an identically named variable or procedure in the surrounding scope. The scopes of distinct subprograms are disjoint: a name declared in one subprogram is not visible inside another subprogram. The scope defined by the main program is called the global scope. Names declared in the global scope are accessible from the point of declaration to the end of the program, including the body of the main program and any subprograms that do not redeclare the same name. All variables declared in the main program are in the global scope.

4. Function names are in the global scope. This has two important implications. First, a function can be called from any other function, provided that the calling function has not redefined that name. (This is true even if the function being called appears later in the source text.) Second, any function can be called recursively.

5. The lifetime of a variable is limited to the execution of the procedure (main program or function) that defines the scope in which it is declared. Thus, the value of a variable x declared in function f ceases to exist after the function returns.

5 Type Checking and Context-Sensitive Analysis

Compiling a NOTHING program requires a large amount of knowledge that cannot be encoded into a context-free grammar. Thus, you must augment the standard ANTLR parser to compute the required information. For this lab, your context-sensitive analysis should detect the following errors:

1. a variable is referenced but not declared
2. a variable is declared multiple times in a single scope
3. a variable is declared but never referenced (defined or used)
4. any type mismatch (illegal mixed type expression); see Section 6
5. incorrect subscript type in an array variable reference
6. a constant-valued subscript that is outside the declared bounds of an array
7. a procedure call that invokes a function (incorrectly discarding the return value)
8. a function call that invokes a procedure (incorrectly using a non-existent return value)
9. any type mismatch between actual parameters at a call site and the definition of the called procedure

This list should not be considered exhaustive. As you discover other errors, add them to your project's list. For each one, write a test program that exposes the error and include it in the materials that you submit as your final report.

6 Mixed Type Expressions

NOTHING supports three basic data types: integer, floating point, and character. Each expression and subexpression has a type that can be determined at compile time. Your lab should determine (1) the type of each subexpression, (2) where coercions must be inserted, and (3) where invalid type combinations exist. Figure 2 gives type conversion tables for several of the NOTHING operators. Several operators are idiosyncratic.
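
The first three errors listed in Section 5 can be detected with a small scoped symbol table. The following Python sketch shows one possible shape for it under simplifying assumptions: the class name SymbolTable, its method names, and the error wording are hypothetical, declarations are assumed to be processed in source order, and the forward visibility of function names (Section 4, item 4) is not modeled.

```python
class SymbolTable:
    """Two-level scope sketch: the global scope plus at most one subprogram scope."""

    def __init__(self):
        self.global_scope = {}   # name -> {"type": str, "referenced": bool}
        self.local_scope = None  # a dict only while inside a subprogram
        self.errors = []

    def enter_subprogram(self):
        self.local_scope = {}

    def exit_subprogram(self):
        self._report_unused(self.local_scope)
        self.local_scope = None  # subprogram scopes are disjoint

    def declare(self, name, type_name):
        scope = self.global_scope if self.local_scope is None else self.local_scope
        if name in scope:
            self.errors.append(f"'{name}' is declared multiple times in one scope")
        else:
            scope[name] = {"type": type_name, "referenced": False}

    def reference(self, name):
        """Record a definition or use; return the type, or None if undeclared."""
        # The local scope is searched first, so a local name obscures a global one.
        for scope in (self.local_scope, self.global_scope):
            if scope is not None and name in scope:
                scope[name]["referenced"] = True
                return scope[name]["type"]
        self.errors.append(f"'{name}' is referenced but not declared")
        return None

    def finish(self):
        """Call once at the end of the program; returns all collected errors."""
        self._report_unused(self.global_scope)
        return self.errors

    def _report_unused(self, scope):
        for name, info in scope.items():
            if not info["referenced"]:
                self.errors.append(f"'{name}' is declared but never referenced")


if __name__ == "__main__":
    import sys

    table = SymbolTable()
    table.declare("x", "INTEGER")
    table.declare("x", "FLOAT")      # duplicate declaration in the global scope
    table.enter_subprogram()
    table.declare("x", "CHARACTER")  # legal: obscures the global x
    table.reference("x")
    table.reference("y")             # never declared
    table.exit_subprogram()
    for message in table.finish():
        print(message, file=sys.stderr)
```

Collecting the messages in a list rather than printing them immediately keeps the checks easy to test; the driver at the bottom writes them to stderr, as the project requirements ask.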

Table for +, -, *

             int      float
   int       int      float
   float     float    float

Table for AND, OR

             char     int      float
   char      char     error    error
   int       error    int      int
   float     error    int      float

Table for <, <=, >, >=, =, <>

             char     int      float
   char      int      error    error
   int       error    int      int
   float     error    int      int

Figure 2: Conversion tables for mixed mode expressions

1. The type of a subscripted name is wholly determined by the array's type declaration; it is independent of the type of the subscript expression. The type of the subscript expression must match the type in the array's dimension declaration.

2. Similarly, the type of a function call is determined by the function's definition rather than by the types of any actual parameters at the call site.

3. Assignment uses an idiosyncratic and asymmetric rule. For a left-hand side of type integer or float, the right-hand side is converted to the type of the left-hand side. If the left-hand side is of type character, the right-hand side must have type character.

4. The result of a NOT always has type integer.

5. Relational operators always produce an integer. Comparisons between characters and numbers make no sense; they are illegal. Comparisons between integers and floats produce integer results. To perform the comparison, the integer is converted to a float.

Your lab will need to recognize when conversions are required and report any expressions that would require an illegal coercion. It may be useful during debugging to have the parser report all coercions, both legal and illegal. This is a good use for a command-line debugging flag.

7 Debugging Advice

It is next to impossible to debug your parser by entering grammar rules and action statements for the whole NOTHING language and attempting to debug it all at once. Instead, enter actions for a few rules and test, then continue to add more rules and actions and test again. ANTLR provides several useful debugging aids. You can turn on the -traceparser option to trace the entry/exit of matching production rules.
You may also use the more advanced parse-tree debugging feature to print out the derivations during parsing. Refer to the ANTLR manual for more details.

8 Error Handling

Your parser should recover from errors found during parsing by printing an appropriate error message and continuing to process the input. ANTLR provides an exception-based error recovery and handling mechanism. You can define exception handlers for a specific production rule or non-terminal symbol rule set. ANTLR will also generate an exception handler if no explicit handler is defined. The default exception handler will report an error, synchronize to the follow set of the rule, and return from that rule. You should extend this relatively simple error handling to provide a more informative diagnosis of parsing errors.

9 Electronic Turn-in and Due Date

The due date of lab2 is Sunday, November 14, 2004, 11:59 PM (including documentation). We will run and test your code on the CS network machines. You should make sure your code can compile and run on mercury.cs.uml.edu or similar CS machines.

Your turn-in package should include your lab implementation with all supporting files: source code, make/build file, test files, etc. Your turn-in should also include a README file and a brief lab report (3 to 5 pages). The README file should list and describe the source files in your directory and give directions on how to build and run your parser. Your lab report should provide a brief discussion of your implementation, a summary of your testing procedures (with pointers to any test files you created on your own), and a discussion of your parser's error handling capabilities. Your lab report should be in either PostScript or PDF format. The files in your turn-in package, including the documentation, should have a last modification time no later than 11:59 PM on the due date.

To turn in the lab, leave all the code and the documentation in a directory named Lab2 on your CS account and send email to cs406@cs.uml.edu, indicating the directory location in the message body. Be sure to set the permissions so that the professor can read the directory and execute your code.

10 Grading Criteria

The criteria for grading this lab are as follows:

   - 50% of your grade will be based on the functionality and error detection capability of your parser;
   - 20% will be based on your testing procedures and the coverage of your test cases;
   - 15% will be based on the AST generation and tree walker functionality; and
   - 15% will be based on documentation.
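
As a closing illustration of the mixed-type rules in Section 6, the conversion tables of Figure 2 can be encoded directly as lookup tables keyed by the two operand types. The Python sketch below is one possible encoding, not a prescribed design: the names ARITHMETIC, LOGICAL, RELATIONAL, TABLE_FOR, and result_type are hypothetical, and returning None for combinations that Figure 2 does not list (for example, a character operand of +) is an assumption of the sketch, since the handout does not say how those cases are typed.

```python
ERROR = "error"

# Table for +, -, * (Figure 2 lists only int and float operands).
ARITHMETIC = {
    ("int", "int"): "int",     ("int", "float"): "float",
    ("float", "int"): "float", ("float", "float"): "float",
}

# Table for AND, OR.
LOGICAL = {
    ("char", "char"): "char", ("char", "int"): ERROR,  ("char", "float"): ERROR,
    ("int", "char"): ERROR,   ("int", "int"): "int",   ("int", "float"): "int",
    ("float", "char"): ERROR, ("float", "int"): "int", ("float", "float"): "float",
}

# Table for <, <=, >, >=, =, <> (relational operators always produce int).
RELATIONAL = {
    ("char", "char"): "int",  ("char", "int"): ERROR,  ("char", "float"): ERROR,
    ("int", "char"): ERROR,   ("int", "int"): "int",   ("int", "float"): "int",
    ("float", "char"): ERROR, ("float", "int"): "int", ("float", "float"): "int",
}

# Map each operator to the table that governs it.
TABLE_FOR = {}
for _op in ("+", "-", "*"):
    TABLE_FOR[_op] = ARITHMETIC
for _op in ("AND", "OR"):
    TABLE_FOR[_op] = LOGICAL
for _op in ("<", "<=", ">", ">=", "=", "<>"):
    TABLE_FOR[_op] = RELATIONAL


def result_type(op, left, right):
    """Return the result type of `left op right`.

    Returns "error" for combinations Figure 2 marks as illegal, and None
    (an assumption of this sketch) when Figure 2 has no entry for the case.
    """
    table = TABLE_FOR.get(op)
    if table is None:
        return None
    return table.get((left, right))


if __name__ == "__main__":
    for op, left, right in [("+", "int", "float"), ("<", "char", "int")]:
        print(op, left, right, "->", result_type(op, left, right))
```

Keeping the tables as data makes it straightforward to print every lookup when a debugging flag is set, which is the kind of coercion report Section 6 suggests.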