Compiler Construction Assignment 3 Spring 2018

Robert van Engelen

µc for the JVM

µc (micro-c) is a small C-inspired programming language. In this assignment we will implement a compiler in C++ for µc. The compiler compiles µc programs to Java class files for execution with the Java virtual machine. To implement the compiler, we can reuse the code-generation concepts from programming assignment 1 and parts of the lexical analyzer you implemented in programming assignment 2. We will implement a new parser based on Yacc/Bison. This new parser uses translation schemes defined in Yacc grammars to emit Java bytecode.

In the next programming assignment (the last assignment, following this one) we will further extend the capabilities of our µc compiler by adding static semantics: data types, type checking, and scoping rules for functions and blocks.

Download

Download the Pr3.zip file from http://www.cs.fsu.edu/~engelen/courses/cop5621/pr3.zip. After unzipping you will get the following files:

    Makefile        A makefile
    bytecode.c      The bytecode emitter (same as Pr1)
    bytecode.h      The bytecode definitions (same as Pr1)
    error.c         Error reporter
    global.h        Global definitions
    init.c          Symbol table initialization
    javaclass.c     Java class file operations (same as Pr1)
    javaclass.h     Java class file definitions (same as Pr1)
    mycc.l       *) Lex specification
    mycc.y       *) Yacc specification and main program
    symbol.c     *) Symbol table operations
    test#.uc        A number of µc test programs

The files marked *) are incomplete. For this assignment you are required to complete these files. You can reuse parts of the code you wrote for Pr1 and Pr2.

Download RE/flex from https://sourceforge.net/projects/re-flex and build with ./build.sh. The Flex documentation is at http://dinosaur.compilertools.net/flex. The RE/flex documentation is at
http://www.cs.fsu.edu/~engelen/doc/reflex/html (RE/flex has additional features compared to Flex). The Makefile assumes that reflex is located in a local directory in the current project directory, but you are free to adapt this to reflect your installation of RE/flex.

We will use the following µc programming constructs in this assignment to implement a parser:

    stmts → stmts stmt                          Statement sequencing
          | ε
    stmt  → ;                                   Empty statement
          | expr ;                              Expression statement (assignments, function calls)
          | if ( expr ) stmt                    If-then
          | if ( expr ) stmt else stmt          If-then-else (disambiguation: else matches closest if)
          | while ( expr ) stmt                 While loop
          | do stmt while ( expr ) ;            Do-while loop
          | for ( expr ; expr ; expr ) stmt     For loop with start, while-condition, and update expr
          | return expr ;                       Return from program (return from function in Pr4)
          | { stmts }                           Compound statement block

The grammar for these statements and expressions in Yacc notation is (see mycc.y):

    stmts : stmts stmt
          | /* empty */
          ;

    stmt  : ';'
          | expr ';'
              { emit(pop); /* do not leave a value on the stack */ }
          | IF '(' expr ')' stmt
              { /* TO BE COMPLETED */ error("if-then not implemented"); }
          | IF '(' expr ')' stmt ELSE stmt
              { /* TO BE COMPLETED */ error("if-then-else not implemented"); }
          | WHILE '(' expr ')' stmt
              { /* TO BE COMPLETED */ error("while-loop not implemented"); }
          | DO stmt WHILE '(' expr ')' ';'
              { /* TO BE COMPLETED */ error("do-while-loop not implemented"); }
          | FOR '(' expr ';' expr ';' expr ')' stmt
              { /* TO BE COMPLETED */ error("for-loop not implemented"); }
          | RETURN expr ';'
              { emit(istore_2); /* return val goes in local var 2 */ }
          | '{' stmts '}'
          | error ';'
              { yyerrok; }
          ;

    expr  : ID '=' expr
              { emit(dup); emit2(istore, $1->localvar); }
          ...

Expressions expr are a subset of ANSI C expressions. They are composed of identifiers (variables), integer constants, character constants, a special form to refer to the program's command-line arguments denoted by $0, $1, $2, etc., and most of the ANSI C operators defined for expressions (see the mycc.y file).

The return statement returns from the program with a return value (this behavior will be extended to function returns in Pr4, when we implement functions).
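The actions for the assignment expression and the expression statement work together to keep the JVM operand stack balanced: dup keeps a copy of the assigned value as the value of the expression, istore consumes one copy, and the final pop discards the other. The following is a minimal sketch of that bookkeeping, assuming a hypothetical four-instruction model of the operand stack. The names Op, Insn, run, and assign_statement and the local variable index 3 are made up for illustration and are not part of the Pr3 sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mini-model of the JVM operand stack. The opcode names mirror
// the ones used in the grammar, but this enum and run() are illustration only.
enum Op { BIPUSH, DUP, ISTORE, POP };

struct Insn {
    Op op;
    int arg;  // constant for BIPUSH, local variable index for ISTORE, else unused
};

// Executes the instructions and returns the final operand stack depth.
// locals receives the values written by ISTORE.
std::size_t run(const std::vector<Insn>& code, std::vector<int>& locals) {
    std::vector<int> stack;
    for (const Insn& i : code) {
        switch (i.op) {
        case BIPUSH:
            stack.push_back(i.arg);
            break;
        case DUP:
            assert(!stack.empty());
            stack.push_back(stack.back());
            break;
        case ISTORE:
            assert(!stack.empty());
            locals.at(static_cast<std::size_t>(i.arg)) = stack.back();
            stack.pop_back();
            break;
        case POP:
            assert(!stack.empty());
            stack.pop_back();
            break;
        }
    }
    return stack.size();
}

// Instructions for the statement "x = 5;" with x assumed to live in local 3:
// the constant, then the assignment action (dup, istore), then the
// expression-statement action (pop).
std::vector<Insn> assign_statement() {
    return {{BIPUSH, 5}, {DUP, 0}, {ISTORE, 3}, {POP, 0}};
}
```

Running assign_statement() through run() with four local slots stores 5 in slot 3 and leaves the stack empty. Without the dup, a nested assignment such as a = b = 5 would have no value left for the outer store.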

In its current incomplete state the µc compiler mycc can be built with make. However, it won't run on any input, since the minimum requirement for keyword lookup and parsing is not complete yet (in symbol.c). You will notice a lot of parse table conflicts generated by Yacc (or Bison). These conflicts (except the one for the if-then-else ambiguity) should be resolved by adding Yacc declarations for associativity and precedence. The yacc and bison option -v produces y.output with the LALR(1) parse table and its conflicts. You should inspect y.output to view the LALR(1) states with shift/reduce (and possibly reduce/reduce) conflicts.

First complete the symbol.c insert/lookup operations, e.g. using the code you wrote for the previous project(s). Now you can build the mycc compiler and run it on a very simple program such as test0.uc. Then execute the resulting Code class as follows:

    $ ./mycc test0.uc
    Compilation successful: saving Code.class
    $ java Code
    123

If you get a java.lang.NoClassDefFoundError, set the CLASSPATH shell variable to include "." (the current directory).

The next test requires one command-line argument:

    $ ./mycc test1.uc
    Compilation successful: saving Code.class
    $ java Code 102109
    102109

The next program doesn't compile, because the operators and their associativity and precedence levels have not been defined yet:

    $ ./mycc test2.uc
    + operator not implemented

Lex Specification

Our compiler accepts a subset of the ANSI C grammar. It relies on the lexical analyzer mycc.l to provide a token stream that is compliant with ANSI C (although some simplifications are applicable). Thus, the lexical analyzer must be able to recognize the full set of ANSI C keywords, operators, identifiers, and literal constants. You can reuse the Lex specification of assignment 2 to extend mycc.l. The Lex actions should provide tokens and token values (yylval) to the Yacc-based parser.

Yacc Specification

The grammar and main program are defined in mycc.y.
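Both specifications depend on the symbol table. The lexer has to tell keywords from identifiers, and the parser receives the symbol table entry of each ID. The following is a minimal sketch of insert/lookup with preloaded keywords. Only the localvar field comes from this handout; the Symbol layout, the SymbolTable class, the token codes, and the helper names are assumptions for illustration and will differ from the declarations in global.h and symbol.c.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical token codes; the real ones are generated by Yacc/Bison.
enum Token { ID = 258, IF, ELSE, WHILE };

// Hypothetical layout: only localvar is taken from the handout.
struct Symbol {
    std::string name;
    int token;     // token code returned to the parser
    int localvar;  // JVM local variable index, -1 if not assigned yet
};

class SymbolTable {
public:
    // Adds a symbol, or returns the existing entry if name is already present.
    Symbol* insert(const std::string& name, int token) {
        auto result = table_.emplace(name, Symbol{name, token, -1});
        return &result.first->second;
    }
    // Returns the entry for name, or nullptr if it was never inserted.
    Symbol* lookup(const std::string& name) {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }
private:
    // Pointers to elements of an unordered_map stay valid across rehashing.
    std::unordered_map<std::string, Symbol> table_;
};

// Preloads a few keywords so the lexer can tell them apart from identifiers.
void init_keywords(SymbolTable& t) {
    t.insert("if", IF);
    t.insert("else", ELSE);
    t.insert("while", WHILE);
}

// What a Lex action for a name might do: keywords keep their token code,
// unknown names are entered as ID.
Symbol* lookup_or_insert_id(SymbolTable& t, const std::string& name) {
    Symbol* s = t.lookup(name);
    return s ? s : t.insert(name, ID);
}
```

In this sketch a Lex action would call lookup_or_insert_id on the matched text, store the result in yylval.sym, and return the entry's token code, so a second occurrence of the same variable name yields the same entry.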
The grammar should be completed with semantic actions to emit the correct Java bytecode. To implement semantic actions you should use marker nonterminals. The actions for three marker nonterminals L, N, and M are already defined in mycc.y. These three marker nonterminals have actions that are used to direct the control flow for conditional programming constructs and loops with conditional and unconditional jumps. As usual, backpatching should be used to set the target location of a forward jump.

The current grammar for expressions is ambiguous. The appropriate declarations for operator associativity and precedence should be added to disambiguate the grammar, except for the ambiguous if-then-else. Thus, only one shift-reduce conflict should remain.

Arithmetic expressions are only performed on integers. So you can ignore floats and strings, even though we define the content of the Yacc stack's attributes to include symbols (for identifiers), numbers (for integers), floats, strings, and locations (for backpatching). The alternative data types of the synthesized attributes are defined in a Yacc %union:

    %union
    { Symbol *sym;   /* token value yylval.sym is the symbol table entry of an ID */
      unsigned num;  /* token value yylval.num is the value of an int constant */
      float flt;     /* token value yylval.flt is the value of a float constant */
      char *str;     /* token value yylval.str is the value of a string constant */
      unsigned loc;  /* location of instruction to backpatch */
    }

Each grammar symbol can have one of these types for its synthesized attribute. For example:

    %token <sym> ID

defines ID to be a token with attribute type sym. Thus, yylval.sym is the attribute value of token ID in the Lex specification, and the attribute value in the Yacc specification is referenced with $i, as in:

    ID { emit2(iload, $1->localvar); }

Note that a global counter localvar in mycc.l is used to assign JVM local variable indexes to variables in the source code, so all variables in the µc source code are mapped to JVM local variables.

The objective of this assignment is to implement all semantic actions required to support the procedural programming constructs and integer arithmetic of µc.

A Note on Short-Circuit Operators

For this assignment you should not implement short-circuit code for the operators ||, &&, and ! (textbook section 6.6.2 uses short-circuit code for conditional operators). Rather, these should be implemented as logical operators taking values 0 (false) or 1 (true), using the following mapping:

    a||b = a|b
    a&&b = a&b
    !a   = 1-a
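The mapping for the logical operators can be checked with plain integer arithmetic. The sketch below assumes, as the note does, that every operand is already 0 or 1; the function names are made up for illustration.

```cpp
#include <cassert>

// Operands are assumed to already be 0 (false) or 1 (true).
int logical_or(int a, int b)  { return a | b; }
int logical_and(int a, int b) { return a & b; }
int logical_not(int a)        { return 1 - a; }

// Compares the mapping with C++'s own ||, &&, and ! on every 0/1 combination.
bool mapping_matches_truth_tables() {
    for (int a = 0; a <= 1; ++a) {
        if (logical_not(a) != static_cast<int>(!a)) return false;
        for (int b = 0; b <= 1; ++b) {
            if (logical_or(a, b) != static_cast<int>(a || b)) return false;
            if (logical_and(a, b) != static_cast<int>(a && b)) return false;
        }
    }
    return true;
}
```

Unlike short-circuit code, both operands are always evaluated here, so the generated code needs no jumps. The mapping does not hold for operands outside 0 and 1: for example, 1 - 2 is -1, whereas !2 is 0.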

Bonus for Extra Credit

You can earn 1% extra credit on the total final grade of this course by implementing a break statement that terminates the (closest) loop construct (do, while, or for loop). Note that loops may be nested, and multiple break statements may appear at the same or at different loop levels. Implementing this is not trivial, but the challenge is rewarding. Evaluate your approach first before implementing it.

- End -
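One possible way to think about break is as another forward jump whose target is unknown until the end of the enclosing loop, which makes it a candidate for backpatching. The following is a minimal sketch of that idea, keeping one list of pending jumps per open loop. The Code and LoopStack types and their member functions are hypothetical stand-ins, not the functions from bytecode.h, and the sketch stores absolute targets in one cell each, whereas a real JVM goto takes a 16-bit offset relative to the instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kGoto = 167;   // JVM opcode for goto (0xa7)
const int kUnset = -1;   // placeholder for a jump target not known yet

// Hypothetical code buffer: one cell per opcode or jump target.
struct Code {
    std::vector<int> cells;

    std::size_t pc() const { return cells.size(); }

    // Emits a goto with an unknown target and returns the target's location.
    std::size_t emit_jump() {
        cells.push_back(kGoto);
        cells.push_back(kUnset);
        return cells.size() - 1;
    }
    // Fills in a target that was left open by emit_jump().
    void backpatch(std::size_t loc, std::size_t target) {
        assert(cells.at(loc) == kUnset);
        cells[loc] = static_cast<int>(target);
    }
};

// Keeps one list of pending break jumps for every loop that is still open.
class LoopStack {
public:
    // Called when the parser starts a do, while, or for loop.
    void enter_loop() { pending_.emplace_back(); }

    // Called for "break;": emits a forward jump and records it for the
    // closest loop. Returns false (and emits nothing) outside of any loop.
    bool add_break(Code& code) {
        if (pending_.empty()) return false;
        pending_.back().push_back(code.emit_jump());
        return true;
    }

    // Called after the last instruction of the loop: every break recorded
    // for this loop now jumps to the current location.
    void exit_loop(Code& code) {
        assert(!pending_.empty());
        for (std::size_t loc : pending_.back()) code.backpatch(loc, code.pc());
        pending_.pop_back();
    }
private:
    std::vector<std::vector<std::size_t>> pending_;
};
```

In a Yacc grammar, enter_loop would be called from a marker nonterminal's action at the start of each loop and exit_loop from the action at the end of the loop production. Because the lists are stacked, a break inside an inner loop is patched when the inner loop ends and never leaks into the outer one.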