CPSC 411, Fall 2010 Midterm Examination

Name:

Q1: 10   Q2: 15   Q3: 15   Q4: 10   Q5: 10   Total: 60

Please do not open this exam until you are told to do so. But please do read this entire first page now.

Some of the problems on this quiz ask for written answers, rather than code or numbers. The best answers to such questions are short, concrete and to-the-point. Sometimes a small example program can help to make the answer clear. Do not attempt to write "cover all the bases" answers to such questions; overly long answers will receive no marks.

Write all your answers on the front side of each page.

1. High Level Question [10 pts]

So far we have covered 4 compiler stages. The initial input is a program in textual form. The final output is IR. In the space below, in order from first to last, write the name of each stage and briefly describe what it does.

Stage 1
  name:        Lexical Analysis
  input:       Stream of characters
  output:      Stream of tokens
  description: Divides the input into tokens. Tokens are "word-like" entities in the input, such as identifiers, numbers, operators, punctuation marks, etc. It is also the job of this stage to skip over whitespace and comments.

Stage 2
  name:        Parsing
  input:       Stream of tokens
  output:      Parse tree (AST)
  description: This stage analyzes the "phrase structure" of the program. It constructs a parse tree or abstract syntax tree that represents this structure, while abstracting away from the details of the concrete syntax.

Stage 3
  name:        Semantic Analysis
  input:       Parse tree
  output:      Decorated AST + error messages + symbol table
  description: This stage's main function is to check for "semantic errors", such as type errors and undefined identifiers. It may also decorate the AST with extra information (e.g. expressions may be decorated with their type). The stage makes heavy use of tables (symbol tables) that keep track of information about identifiers defined in a particular context.

Stage 4
  name:        Translation to IR
  input:       Decorated AST, symbol table
  output:      IR trees
  description: This stage converts an (assumed to be semantically correct) program into equivalent IR code. IR code is an "intermediate representation" that is at the same time closer to assembly language and also abstract enough to not be specific to a particular target architecture or source language.
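To make the data flow between the four stages concrete, here is a minimal Java sketch of how they compose. All type and method names are hypothetical placeholders for illustration; they are not the course project's actual classes.

    import java.util.List;

    // A minimal sketch of the four-stage pipeline described above.
    interface Stage<I, O> { O run(I input); }

    class Pipeline<Tok, Ast, DecAst, Ir> {
        Stage<String, List<Tok>> lexer;      // Stage 1: character stream -> token stream
        Stage<List<Tok>, Ast>    parser;     // Stage 2: token stream -> parse tree / AST
        Stage<Ast, DecAst>       checker;    // Stage 3: AST -> decorated AST (+ symbol table, errors)
        Stage<DecAst, Ir>        translator; // Stage 4: decorated AST -> IR trees

        Ir compile(String sourceText) {
            return translator.run(checker.run(parser.run(lexer.run(sourceText))));
        }
    }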

2. Symbol Tables [4pt]

Below are two simple Java interfaces for symbol tables. The differences are highlighted in bold.

    interface XXXTable<Value> {
        Value lookup(Symbol key);
        void insert(Symbol key, Value v);
    }

    interface YYYTable<Value> {
        Value lookup(Symbol key);
        YYYTable<Value> insert(Symbol key, Value v);
    }

a) [1pt] Which of the two is the functional style: XXX / YYY (circle one)

b) [1pt] One of the two interfaces is incomplete: it requires a way to remove symbols from the table. To provide this functionality we might provide two additional methods:

    void beginScope();
    void endScope();

Which table needs these two methods? Functional / Imperative (circle one)

c) [2pt] Explain briefly why the other style of table does not require any methods for removing entries. (Note: there is much more space below here than you really need.)

The reason we need a "delete" is to be able to "undo" inserts when we are leaving a scope and need to go back to a higher scope. In the functional style we don't need an undo to restore the symbol table to a previous scope, because in the functional style the table from the previous scope still exists (an insert creates a new table without modifying the original table, so the original table still exists).
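To make the answer to (c) concrete, here is a minimal sketch of what a functional-style table can look like: an immutable linked list of bindings where insert allocates a new node and never touches the old table. This is hypothetical code, with String standing in for Symbol; a balanced search tree would give the logarithmic bounds discussed on the next page.

    // Hypothetical functional-style symbol table: insert returns a NEW table and never
    // mutates the old one, so restoring an outer scope is just keeping a reference to
    // the older table. (A real implementation would use a balanced tree, not a list.)
    class FunTable<Value> {
        private final String key;            // binding stored at this node (null marks the empty table)
        private final Value value;
        private final FunTable<Value> rest;  // the table that this binding extends

        private FunTable(String key, Value value, FunTable<Value> rest) {
            this.key = key; this.value = value; this.rest = rest;
        }

        static <V> FunTable<V> empty() { return new FunTable<>(null, null, null); }

        FunTable<Value> insert(String key, Value v) { return new FunTable<>(key, v, this); }

        Value lookup(String key) {
            for (FunTable<Value> t = this; t.key != null; t = t.rest)
                if (t.key.equals(key)) return t.value;
            return null;  // not found
        }
    }

With this shape, the table for an inner scope simply extends the outer one; when the inner scope ends, the compiler goes back to using the outer table, which was never modified.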

Symbol Tables continued [11pts]

We can use either hashtables or trees to implement each style of symbol table, but of course one is more natural than the other and yields better performance. Below, provide "big O" estimates of the expected performance of a good quality implementation.

Important note: hash tables are implemented using side effects. A correct implementation of a "functional style interface" may use side effects internally, but from a user's point of view it must behave as a purely functional implementation.

a) [4pt] Estimate the time consumed by each operation:

                  Functional style interface      Imperative style interface
                  lookup         insert           lookup         insert
    Tree-based    O(log(n))      O(log(n))        O(log(n))      O(log(n))
    Hash-based    O(1)           O(n)             O(1)           O(1)

b) [2pt] Estimate the memory consumed by the insert operation:

                  Functional style interface      Imperative style interface
    Tree-based    O(log(n))                       O(1)
    Hash-based    O(n)                            O(1)

Note: "memory consumed" means how much additional memory, on average, would be allocated during the execution of the insert operation. Do not consider potential garbage collection that may (or may not) occur after the insert operation is complete.

c) [5pt] Assume that we are implementing an efficient single-pass compiler for Pascal. All you need to know about Pascal for this question is that Pascal was designed to make it easy to implement a single-pass compiler (the declaration of any identifier must always occur before it is first used).

What style of interface would you choose? Functional / Imperative (circle one)
What implementation would you choose? Hash / Tree (circle one)

Explain briefly why:

Generally the imperative hash-based implementation is the most efficient (in terms of both time and space). The only reason to choose a functional/tree-style interface is that it supports persistence in an efficient manner (at the cost of making inserts and lookups a bit more expensive). Since our compiler is "single pass" it will only enter/exit each scope in the program exactly once, so we would have little use for the "persistence" feature. Hence we have no good reason to choose the less efficient functional/tree-based implementation.
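For comparison, a minimal sketch of the imperative hash-based style chosen in (c): beginScope/endScope are implemented with an undo stack that records which keys were inserted since the scope began. This is hypothetical code, again with String standing in for Symbol.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;

    // Hypothetical imperative symbol table: each key maps to a stack of bindings
    // (innermost scope on top), and an undo stack records which keys were inserted
    // since the last beginScope(), so endScope() can pop them off again.
    class ImpTable<Value> {
        private final HashMap<String, Deque<Value>> bindings = new HashMap<>();
        private final Deque<String> undo = new ArrayDeque<>();
        private static final String MARK = "<scope-mark>";  // sentinel separating scopes

        void insert(String key, Value v) {
            bindings.computeIfAbsent(key, k -> new ArrayDeque<>()).push(v);
            undo.push(key);
        }

        Value lookup(String key) {
            Deque<Value> chain = bindings.get(key);
            return (chain == null || chain.isEmpty()) ? null : chain.peek();
        }

        void beginScope() { undo.push(MARK); }

        void endScope() {
            while (!undo.isEmpty()) {
                String key = undo.pop();
                if (key.equals(MARK)) return;
                bindings.get(key).pop();  // remove the binding made in the scope being closed
            }
        }
    }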

3. Parsing [4pts]

The following is an EBNF definition for a simple expression language.

    Exp   ::= Exp ("+" | "-") Exp
            | Exp ":" Exp
            | "(" Exp ")"
            | <IDE>
            | <NUM>
    <IDE> ::= <LETTER> ( <LETTER> | <DIGIT> )*
    <NUM> ::= (<DIGIT>)+

This syntax is inspired by the Haskell language. The ":" operator is an infix version of Scheme's cons operator for creating lists. For example, assuming the identifier empty is bound to a representation of the empty list, a list of three elements [1,2,3] can be constructed as follows:

    1:2:3:empty

a) This grammar is ambiguous. Give an example that demonstrates this and explain the example by drawing parse trees.

There are many examples; essentially, any expression that uses at least two operators will do. A simple example is

    1+2-3

It can be parsed as

        +                   -
       / \        or       / \
      1   -                +   3
         / \              / \
        2   3            1   2

Thus, the grammar is ambiguous since there are some strings in the language that can be parsed in more than one way.
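The same two parse trees can be spelled out as Java AST constructions. Exp, Num and OpExp here are hypothetical classes, shaped like the OpExp used in the JavaCC actions on the next page.

    // Hypothetical AST classes, just to spell out the two readings of 1+2-3 as trees.
    class Exp {}
    class Num extends Exp { int value; Num(int v) { value = v; } }
    class OpExp extends Exp {
        String op; Exp left, right;
        OpExp(String op, Exp left, Exp right) { this.op = op; this.left = left; this.right = right; }
    }

    class AmbiguityDemo {
        public static void main(String[] args) {
            // Reading 1: 1 + (2 - 3)  -- the "-" grouped to the right
            Exp rightGrouped = new OpExp("+", new Num(1), new OpExp("-", new Num(2), new Num(3)));
            // Reading 2: (1 + 2) - 3  -- the intended left-associative reading
            Exp leftGrouped  = new OpExp("-", new OpExp("+", new Num(1), new Num(2)), new Num(3));
        }
    }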

Parsing (continued) [6pts]

To make the grammar unambiguous we should take operator precedence and associativity into account. The table below shows the intended precedence rules.

                                                     Example          Equivalent to
    Rule 1: "+" and "-" associate left               3+4-5+6          ((3+4)-5)+6
    Rule 2: ":" associates right                     1:2:3:empty      1:(2:(3:empty))
    Rule 3: "+" and "-" have highest precedence      1+2:3-4:empty    (1+2):((3-4):empty)

Below is an excerpt from a JavaCC definition.

    Exp Exp() :
    { Token t; Exp e1, e2; }
    {
      e1=SimplerExp()
      ( (t="+" | t="-") e2=SimplerExp()
          { e1 = new OpExp(t.image, e1, e2); }
      )*
      { return e1; }
    }

    Exp SimplerExp() :
    { Token t; Exp e1, e2; }
    {
      e1=PrimExp()
      ( t=":" e2=PrimExp()
          { e1 = new OpExp(t.image, e1, e2); }
      )*
      { return e1; }
    }

    Exp PrimExp() :
    { Exp e; }
    {
      ( e=Num() | e=Ide() | "(" e=Exp() ")" )
      { return e; }
    }

b) The JavaCC file compiles without error, but it has logical bugs. Below, answer for each precedence rule whether the parser correctly follows that rule.

    Rule 1: "+" and "-" associate left              Correct / Incorrect (circle one)
    Rule 2: ":" associates right                    Correct / Incorrect (circle one)
    Rule 3: "+" and "-" have highest precedence     Correct / Incorrect (circle one)

Parsing (continued) [5pts]

On this page, write a correct replacement for the JavaCC code on the previous page.

Two things need to be done:
  - switch the "+"/"-" tokens with the ":" token to fix the precedence between them
  - then fix the ":" rule to make it build the parse tree right to left

    Exp Exp() :
    { Token t; Exp e1, e2; }
    {
      e1=SimplerExp()
      ( t=":" e2=Exp()
          { e1 = new OpExp(t.image, e1, e2); }
      )?
      { return e1; }
    }

    Exp SimplerExp() :
    { Token t; Exp e1, e2; }
    {
      e1=PrimExp()
      ( (t="+" | t="-") e2=PrimExp()
          { e1 = new OpExp(t.image, e1, e2); }
      )*
      { return e1; }
    }

    Exp PrimExp() : ... unchanged ...
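For comparison with the JavaCC version, here is a hand-written recursive-descent sketch of the corrected grammar. It is a toy (tokens are pre-split strings), reusing the hypothetical Exp/OpExp/Num classes from the earlier sketch and adding an Ide leaf; it shows why recursing on the right operand nests ":" to the right while the loop folds "+"/"-" to the left.

    import java.util.List;

    // Hypothetical identifier leaf, matching the earlier toy AST classes.
    class Ide extends Exp { String name; Ide(String n) { name = n; } }

    class MiniParser {
        private final List<String> toks;
        private int pos = 0;
        MiniParser(List<String> toks) { this.toks = toks; }
        private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }
        private String next() { return toks.get(pos++); }

        Exp exp() {                                    // Exp ::= SimplerExp ( ":" Exp )?
            Exp left = simplerExp();
            if (peek().equals(":")) { next(); return new OpExp(":", left, exp()); }  // right-nested
            return left;
        }

        Exp simplerExp() {                             // SimplerExp ::= PrimExp ( ("+"|"-") PrimExp )*
            Exp e = primExp();
            while (peek().equals("+") || peek().equals("-"))
                e = new OpExp(next(), e, primExp());   // left-folded
            return e;
        }

        Exp primExp() {                                // PrimExp ::= <NUM> | <IDE> | "(" Exp ")"
            if (peek().equals("(")) { next(); Exp e = exp(); next(); return e; }  // consume "(" Exp ")"
            String t = next();
            return t.matches("[0-9]+") ? new Num(Integer.parseInt(t)) : new Ide(t);
        }
    }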

4. Translation into IR Code [10pts]

Using the IR syntax provided in the appendix, write out plausible IR code that a hypothetical translator might produce. You may assume that the code is from a valid MiniJava program and that any variables in the code are allocated to a TEMP of the same name (e.g. x -> TEMP(x)).

a) [2pts]  sum = sum+i;

    MOVE( TEMP(sum), PLUS( TEMP(sum), TEMP(i) ) )

b) [2pts]  i = i + 1;

    MOVE( TEMP(i), PLUS( TEMP(i), CONST(1) ) )

c) [2pts] (Hint: you may "reuse" the code from a and b here by drawing a box labeled with a or b.)

    while (i<10) { sum = sum+i; i=i+1; }

    SEQ( LABEL(start),
         CJUMP(LT, TEMP(i), CONST(10), body, end),
         LABEL(body),
         [A],
         [B],
         JUMP(NAME(start)),
         LABEL(end) )
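To see how a translator could assemble the answer to (c) mechanically, here is a self-contained Java toy in which IR trees are generic nodes that print themselves in the appendix's syntax. It is purely illustrative (not the course's IR classes) and uses the SEQ(s1, ..., sn) shorthand allowed by the appendix.

    import java.util.Arrays;
    import java.util.stream.Collectors;

    // Toy IR node: an operator name plus children (nodes or plain label/constant strings).
    class IR {
        final String head; final Object[] kids;
        IR(String head, Object... kids) { this.head = head; this.kids = kids; }
        public String toString() {
            return head + "(" + Arrays.stream(kids).map(Object::toString)
                                      .collect(Collectors.joining(", ")) + ")";
        }

        // Emits the while-loop pattern from part (c).
        static IR translateWhile(IR condLeft, IR condRight, IR bodyA, IR bodyB) {
            return new IR("SEQ",
                new IR("LABEL", "start"),
                new IR("CJUMP", "LT", condLeft, condRight, "body", "end"),
                new IR("LABEL", "body"),
                bodyA, bodyB,                              // the translations of [A] and [B]
                new IR("JUMP", new IR("NAME", "start")),
                new IR("LABEL", "end"));
        }

        public static void main(String[] args) {
            IR a = new IR("MOVE", new IR("TEMP", "sum"),
                          new IR("PLUS", new IR("TEMP", "sum"), new IR("TEMP", "i")));
            IR b = new IR("MOVE", new IR("TEMP", "i"),
                          new IR("PLUS", new IR("TEMP", "i"), new IR("CONST", "1")));
            System.out.println(translateWhile(new IR("TEMP", "i"), new IR("CONST", "10"), a, b));
        }
    }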

For the last part of this question, you are to consider two different translator implementations, used to translate the following (Mini)Java statement:

    if (flag>10) result=doThis(); else result=doThat();

d) [2pts] Write plausible IR produced by a translator implementation that uses the classes Nx, Ex and Cx to generate IR for expressions, specific to the context of their use.

    SEQ( CJUMP( GT, TEMP(flag), CONST(10), thn, els),
         LABEL(thn),
         MOVE( TEMP(result), CALL(doThis) ),
         JUMP(NAME(end)),
         LABEL(els),
         MOVE( TEMP(result), CALL(doThat) ),
         LABEL(end) )

Key feature of this solution: it contains only ONE CJUMP.

e) [2pts] Write plausible IR, assuming a translator implementation that translates an AST Expression node directly into an IR Exp node, and then wraps some glue code around it to fit it into the context of use.

    SEQ( CJUMP( EQ, ***, CONST(1), thn, els),
         LABEL(thn),
         MOVE( TEMP(result), CALL(doThis) ),
         JUMP(NAME(end)),
         LABEL(els),
         MOVE( TEMP(result), CALL(doThat) ),
         LABEL(end) )

where *** is the result of translating "flag>10" into an IR Exp node:

    *** = ESEQ( SEQ( MOVE( TEMP(rr), CONST(0) ),
                     CJUMP(GT, TEMP(flag), CONST(10), tt, ff),
                     LABEL(tt),
                     MOVE( TEMP(rr), CONST(1) ),
                     LABEL(ff) ),
                TEMP(rr) )

Key feature of this solution: it contains TWO CJUMPs.
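A minimal sketch of the idea behind the Ex and Cx classes (after Appel; the exact names and shapes here are assumptions), reusing the toy IR node from the previous sketch: a Cx already knows how to jump, so embedding it in an if-statement costs one CJUMP, whereas an Ex must first be forced into a 0/1 value and the glue code then adds a second CJUMP to test that value.

    // Abstract result of translating an expression: it can be used as a value (unEx)
    // or as a conditional jump to a true/false label (unCx).
    abstract class TrExp {
        abstract IR unEx();                     // use the translated expression as a value
        abstract IR unCx(String t, String f);   // use it as a conditional jump to t or f
    }

    class Cx extends TrExp {                    // e.g. the translation of "flag > 10"
        final String relop; final IR left, right;
        Cx(String relop, IR left, IR right) { this.relop = relop; this.left = left; this.right = right; }

        IR unCx(String t, String f) { return new IR("CJUMP", relop, left, right, t, f); }

        IR unEx() {                             // materialise 0/1 into a temp, as in *** of part (e)
            return new IR("ESEQ",
                new IR("SEQ",
                    new IR("MOVE", new IR("TEMP", "rr"), new IR("CONST", "0")),
                    unCx("tt", "ff"),
                    new IR("LABEL", "tt"),
                    new IR("MOVE", new IR("TEMP", "rr"), new IR("CONST", "1")),
                    new IR("LABEL", "ff")),
                new IR("TEMP", "rr"));
        }
    }

    class Ex extends TrExp {                    // a translator that only ever builds Exp nodes
        final IR exp;
        Ex(IR exp) { this.exp = exp; }
        IR unEx() { return exp; }
        IR unCx(String t, String f) {           // glue code: compare the 0/1 value against 1
            return new IR("CJUMP", "EQ", exp, new IR("CONST", "1"), t, f);
        }
    }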

5. Canonicalization [10pts]

We have seen a technique to transform IR into a more manageable canonical form. The algorithm (amongst others) conceptually implements a set of rewriting rules that lift ESEQ nodes out of expressions and statements. The rewriting rules in the book (Figure 8.1) are a subset of the rules necessary to eliminate all ESEQs from expressions. Show the right-hand side(s) for each of the following incomplete rules. Note that in some cases you may need to write two right-hand sides, depending on whether something commutes (just as in parts (3) and (4) of Figure 8.1).

a) MOVE(TEMP t, ESEQ(s, e))

   =>  SEQ( s, MOVE(TEMP t, e) )

b) EXP(CALL(e_f, ESEQ(s, e_1), e_2))

   If e_f and s commute:
   =>  SEQ( s, EXP(CALL(e_f, e_1, e_2)) )

   If e_f and s do not commute (or are not known to commute):
   =>  SEQ( MOVE(TEMP f, e_f),
            s,
            EXP(CALL(TEMP f, e_1, e_2)) )
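As a small illustration of how such a rule becomes code, here is a sketch of rule (a) as a rewrite on the toy IR nodes introduced earlier. It is a hypothetical helper, not the book's canonicalizer, which pattern-matches on proper IR classes.

    // Rule (a): MOVE(TEMP t, ESEQ(s, e))  =>  SEQ(s, MOVE(TEMP t, e))
    class Canon {
        static IR liftEseqOutOfMoveTemp(IR move) {
            // expects move = MOVE(TEMP t, ESEQ(s, e))
            IR temp = (IR) move.kids[0];
            IR eseq = (IR) move.kids[1];
            Object s = eseq.kids[0];
            Object e = eseq.kids[1];
            return new IR("SEQ", s, new IR("MOVE", temp, e));  // the statement s now runs first
        }
    }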

APPENDIX: IR Tree Syntax

Items between /*..*/ are comments indicating the purpose of an element. They are not part of the actual syntax. E.g. MOVE contains two subexpressions, the first one is the "destination".

    Exp   ::= CONST( int )
            | NAME( Label )
            | TEMP( Temp )
            | BINOP( Op, Exp, Exp )
            | MEM( Exp )
            | CALL( Exp /*fun*/ (, Exp)* /*args*/ )
            | ESEQ( Stm, Exp )

    Stm   ::= MOVE( Exp /*dst*/, Exp /*src*/ )
            | EXP( Exp )
            | JUMP( Label )
            | CJUMP( RelOp, Exp, Exp, Label /*thn*/, Label /*els*/ )
            | SEQ( Stm, Stm )
            | LABEL( Label )

    Op    ::= PLUS, MINUS, MUL, DIV, AND, OR, LSHIFT, ...
    RelOp ::= EQ, NE, LT, GT, LE, GE, ULT, ULE, ...
    Label ::= <IDENTIFIER>
    Temp  ::= <IDENTIFIER>

To make IR code more readable, it is acceptable to use

    SEQ(s1, s2, ..., sn)

as shorthand for

    SEQ(s1, SEQ(s2, ..., SEQ(sn-1, sn)...))

It is also acceptable to use shorthand for BINOPs. For example, you can write PLUS(..., ...) instead of BINOP(PLUS, ..., ...).

Use meaningful formatting and indentation conventions to make the tree/nesting structure of the IR clear. For example:

    ESEQ( SEQ( MOVE(TEMP(x), CONST(0)),
               MOVE(TEMP(y), CALL(NAME(foo), TEMP(x), CONST(5)))),
          TEMP(y))