
CS 4120 Lecture 14
Syntax-directed translation
26 September 2011
Lecturer: Andrew Myers

We want to translate from a high-level programming language into an intermediate representation (IR). This lecture introduces syntax-directed translation as a concise way to formally describe the translation process.

1 The target

We are using an IR based on Appel's tree-structured IR:

    s  ::= MOVE(e_dest, e_src) | EXP(e) | SEQ(s1, ..., sn) | JUMP(e)
         | CJUMP(e, l1, l2) | LABEL(l) | RETURN
    e  ::= CONST(i) | TEMP(t) | OP(e1, e2) | MEM(e)
         | CALL(e_f, e1, ..., en) | NAME(l) | ESEQ(s, e)
    OP ::= ADD | SUB | MUL | DIV | MOD | SHL | SHR | ASHR

We've written these as a grammar, but the grammar also stands for an abstract syntax tree representation of the IR.

2 Translation spec

We want to implement a translation from high-level AST nodes to these IR nodes, something like the following methods:

    class Stmt {
        /** Return an IR statement node implementing this AST node. */
        IRStmtNode translate() { ... }
    }

    class Expr {
        /** Return an IR expression node implementing this AST node. */
        IRExprNode translate() { ... }
    }
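As a minimal sketch of how these methods might be fleshed out, here is one possible shape in Java. Only IRStmtNode/IRExprNode come from the notes; the node classes (Op, Const, Add, IntLit) and the string-printing are hypothetical illustration, showing how translate() recurses on subtrees just as the translation functions below will:

```java
// Hypothetical IR node classes; toString() prints the tree in the
// grammar's notation so the result of a translation is easy to inspect.
abstract class IRExprNode {}

class Op extends IRExprNode {
    String op; IRExprNode l, r;
    Op(String op, IRExprNode l, IRExprNode r) { this.op = op; this.l = l; this.r = r; }
    public String toString() { return op + "(" + l + ", " + r + ")"; }
}

class Const extends IRExprNode {
    int i; Const(int i) { this.i = i; }
    public String toString() { return "CONST(" + i + ")"; }
}

// Hypothetical AST classes following the translate() interface above.
abstract class Expr {
    /** Return an IR expression node implementing this AST node. */
    abstract IRExprNode translate();
}

class IntLit extends Expr {
    int value; IntLit(int value) { this.value = value; }
    IRExprNode translate() { return new Const(value); }
}

class Add extends Expr {
    Expr e1, e2; Add(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    // Recursion on subexpressions mirrors E[[e1 + e2]] = ADD(E[[e1]], E[[e2]]).
    IRExprNode translate() { return new Op("ADD", e1.translate(), e2.translate()); }
}
```

For example, `new Add(new IntLit(1), new IntLit(2)).translate()` prints as `ADD(CONST(1), CONST(2))`.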

We will formally describe these two methods with translation functions S[[s]] and E[[e]]. If s is an Xi statement, S[[s]] is its translation into an IR statement node that has the same effect. If e is an Xi expression, E[[e]] is its translation into an IR expression node that has the same side effects and evaluates to the same value (or more precisely, the correct IR-level representation of the same value). We will define these two translation functions recursively.

When developing translations, the key is to clearly define the specification of the translation, and to choose the right specification. Once the specification is right, the translation pretty much writes itself. Think of the specification as a contract: the actual translation chosen must guarantee the contract, but it can rely on the contract being satisfied by any recursive uses of the same specification.

3 Memory layout

The IR is considerably lower-level than the source level: it eliminates all higher-level data structures and control structures, making memory accesses and jumps explicit. This means we have to make some decisions about how built-in types are represented in memory, as depicted in Figure 1.

[Figure 1: Memory layouts. Array layout: the length word, followed by element 0, element 1, element 2, ...; a variable x: t[] points at element 0. Stack layout for function calls: arguments, return PC, previous FP (FP points here), then temporaries down to SP.]

Arrays need to store their own length, to be able to implement the length operation. We'll represent a value of array type as a reference to element 0, and store the length just before that in memory.

Functions will be implemented using the C calling conventions for IA-32 (Intel 32-bit) processors: the arguments can be accessed relative to the frame pointer register, which we will write as TEMP(FP) or just FP in the IR. The first argument is at address FP + 8, the second at FP + 12, and so on.

To keep things simple for now, we ignore expressions with tuple type.
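The argument addressing can be sketched as a small helper. The names (Frame, argOffset, argAccess) are mine, not from the notes, and the 4-byte word size is an assumption read off the FP+8/FP+12 spacing for IA-32:

```java
// Hypothetical helper for the IA-32 argument layout described above:
// the i-th parameter (0-based) lives at FP + 8 + 4*i, because the
// return PC and saved FP sit between the pushed arguments and FP.
class Frame {
    static final int WORD = 4;  // assumed IA-32 word size in bytes

    /** FP-relative offset of the i-th (0-based) function parameter. */
    static int argOffset(int i) { return 8 + WORD * i; }

    /** IR-notation string for the memory access fetching parameter i. */
    static String argAccess(int i) {
        return "MEM(ADD(TEMP(FP), CONST(" + argOffset(i) + ")))";
    }
}
```

So the first parameter is read as MEM(ADD(TEMP(FP), CONST(8))), the second at offset 12, and so on.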
Because the result of these expressions is larger than a single, word-sized IR value, we can't translate them directly to IR expressions.

4 Translating expressions

We define the translation of expression e by considering each of the possible syntactic forms that e can take. For each form, there is a single rule that can be chosen to translate the expression. Therefore translation does not involve any searching; it is syntax-directed, just as the type systems we wrote earlier were.

Different kinds of variables are translated differently. Here we've chosen, for function parameters, to store them in the stack location where arguments are pushed as part of the function calling conventions. Alternatively we could use a TEMP, and start each function with a prologue that copies all the stack locations into TEMP nodes.

Arithmetic is straightforward to translate. Notice that we have to use the translation function E[[e]] recursively on the subexpressions. (Because it is always used on proper subterms on the right-hand side, the definition of E[[e]] is a well-founded inductive definition.)

    E[[e1 + e2]] = ADD(E[[e1]], E[[e2]])
    E[[e1 - e2]] = SUB(E[[e1]], E[[e2]])
    E[[e1 * e2]] = MUL(E[[e1]], E[[e2]])
    E[[e1 / e2]] = DIV(E[[e1]], E[[e2]])
    E[[e1 % e2]] = MOD(E[[e1]], E[[e2]])

We'll assume that we have the types of expressions available to disambiguate array accesses from function applications. Given this, we can translate array indexing and function calls. For example, consider what happens when we translate the source expression gcd(x, y-x). Using the rules we have so far, we get:

    E[[gcd(x, y - x)]] = CALL(NAME(gcd), TEMP(x), SUB(TEMP(y), TEMP(x)))

5 Translating statements

The translation function S[[s]] translates a statement s into an IR statement.

    S[[x = e]] = MOVE(E[[x]], E[[e]])
    S[[e1[e2] = e3]] = MOVE(E[[e1[e2]]], E[[e3]])

For statements that involve control, we need to generate labels and use JUMP statements. The statement labels should be fresh: not used anywhere else in the translation.

    S[[if (e) s]] = SEQ(CJUMP(E[[e]], lt, lf), LABEL(lt), S[[s]], LABEL(lf))

(The translation with else is similar.)

    S[[while (e) s]] = SEQ(LABEL(lh), CJUMP(E[[e]], lt, le),
                           LABEL(lt), S[[s]], JUMP(NAME(lh)), LABEL(le))

6 Function definitions

Suppose that a function is written as f(x1: t1, ..., xn: tn) = s. We assume there is a special temporary TEMP(RV) that holds the return value of a function (which at the IR level must be a single word). The function body s is translated to an IR statement as:

    SEQ(LABEL(f), S[[s]])

We translate return simply:
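The fresh-label discipline in the statement rules can be sketched in Java. The class and method names (Labels, freshLabel, translateIf) are hypothetical, and IR is again rendered as strings for illustration:

```java
// Hypothetical fresh-label generator: each call returns a label not
// used anywhere else in the translation.
class Labels {
    private static int next = 0;
    static String freshLabel() { return "l" + (next++); }
}

// Renders S[[if (e) s]] = SEQ(CJUMP(E[[e]], lt, lf), LABEL(lt), S[[s]], LABEL(lf)),
// taking the already-translated condition and body as strings.
class StmtTranslator {
    static String translateIf(String condIR, String bodyIR) {
        String lt = Labels.freshLabel(), lf = Labels.freshLabel();
        return "SEQ(CJUMP(" + condIR + ", " + lt + ", " + lf + "), "
             + "LABEL(" + lt + "), " + bodyIR + ", LABEL(" + lf + "))";
    }
}
```

Because the labels are drawn from a single counter, two different if statements in the same translation can never collide on a label.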

    S[[return]] = RETURN
    S[[return e]] = SEQ(MOVE(TEMP(RV), E[[e]]), RETURN)

Returning multiple values from a function must be accomplished in this IR by storing them into a memory location. We leave how to do this for later.

7 Booleans and control flow

We might be tempted to translate the boolean conjunction operation as follows:

    E[[e1 & e2]] = AND(E[[e1]], E[[e2]])   (BAD!)

This doesn't work because we need the & operator to short-circuit if the first argument is false. We can code this up explicitly using control flow, but the result is rather verbose:

    E[[e1 & e2]] = ESEQ(SEQ(MOVE(TEMP(x), 0),
                            CJUMP(E[[e1]], l1, lf),
                            LABEL(l1),
                            CJUMP(E[[e2]], l2, lf),
                            LABEL(l2),
                            MOVE(TEMP(x), 1),
                            LABEL(lf)),
                        TEMP(x))

Imagine applying this translation to the statement if (e1 & e2) s. We will obtain a big chunk of IR that combines all the IR nodes from this translation with those of if. Instead, here is a much shorter translation of if (e1 & e2) s:

    SEQ(CJUMP(E[[e1]], l1, lf),
        LABEL(l1),
        CJUMP(E[[e2]], l2, lf),
        LABEL(l2),
        S[[s]],
        LABEL(lf))

How can we get an efficient translation like this one? The key is to observe that the efficient translation never records the values of e1 and e2. Instead, it turns their results into immediate control flow decisions. The fact that we arrive at label l1 means that e1 is true. Because e1 and e2 appear only as children of CJUMP nodes, we will be able to generate more efficient code when we translate IR to assembly, too.

8 Translating booleans to control flow

Therefore, we develop a third syntax-directed translation, which we call C for control. The specification of our translation is as follows: C[[e, t, f]] is an IR statement that has all the side effects of e, and then jumps to label t if e evaluates to true, and to label f if e evaluates to false. Given this spec, we can easily translate simple boolean expressions:
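Java itself happens to have both operators, which makes the problem with the naive AND translation concrete: `&` on booleans evaluates both operands (like the BAD rule above), while `&&` short-circuits. The helper class and method names here are hypothetical:

```java
// Illustration of why & must short-circuit: the bounds test only
// protects the array access if the second operand can be skipped.
class ShortCircuit {
    /** Safe: && short-circuits, so a[i] is read only when i is in bounds. */
    static boolean inBoundsAndPositive(int[] a, int i) {
        return i < a.length && a[i] > 0;
    }

    /** Unsafe: the eager & always evaluates a[i], so this throws
        ArrayIndexOutOfBoundsException whenever i is out of bounds. */
    static boolean eagerVersion(int[] a, int i) {
        return i < a.length & a[i] > 0;
    }
}
```

Calling `inBoundsAndPositive(new int[]{}, 0)` quietly returns false, while `eagerVersion` on the same inputs throws.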

    C[[true, t, f]] = JUMP(NAME(t))
    C[[false, t, f]] = JUMP(NAME(f))
    C[[e, t, f]] = CJUMP(E[[e]], t, f)   (We use this rule if no other rule applies.)

The payoff comes for boolean operators, e.g.:

    C[[e1 & e2, t, f]] = SEQ(C[[e1, l1, f]], LABEL(l1), C[[e2, t, f]])

We can then translate if and while more efficiently by using the C[[.]] translation instead:

    S[[if (e) s]] = SEQ(C[[e, l1, l2]], LABEL(l1), S[[s]], LABEL(l2))
    S[[while (e) s]] = SEQ(LABEL(lh), C[[e, l1, le]],
                           LABEL(l1), S[[s]], JUMP(NAME(lh)), LABEL(le))

Now when we translate if (e1 & e2) s using these new rules, the result is compact and efficient.
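The C[[e, t, f]] translation can be sketched as a recursive method over a tiny boolean AST. All class names here (BExpr, True, False, Var, And) are hypothetical, and IR is rendered as strings in the notes' notation:

```java
// Hypothetical sketch of C[[e, t, f]]: each node emits an IR statement
// that jumps to t if it evaluates to true and to f otherwise.
abstract class BExpr {
    abstract String control(String t, String f);  // C[[this, t, f]]
}

class True extends BExpr {        // C[[true, t, f]] = JUMP(NAME(t))
    String control(String t, String f) { return "JUMP(NAME(" + t + "))"; }
}

class False extends BExpr {       // C[[false, t, f]] = JUMP(NAME(f))
    String control(String t, String f) { return "JUMP(NAME(" + f + "))"; }
}

class Var extends BExpr {         // fallback rule: CJUMP on the value
    String name; Var(String name) { this.name = name; }
    String control(String t, String f) {
        return "CJUMP(TEMP(" + name + "), " + t + ", " + f + ")";
    }
}

class And extends BExpr {         // C[[e1 & e2, t, f]]
    BExpr e1, e2; static int fresh = 0;
    And(BExpr e1, BExpr e2) { this.e1 = e1; this.e2 = e2; }
    String control(String t, String f) {
        String l1 = "l" + (fresh++);  // fresh label between the two tests
        return "SEQ(" + e1.control(l1, f) + ", LABEL(" + l1 + "), "
             + e2.control(t, f) + ")";
    }
}
```

Note that And never materializes a 0/1 value: each operand's result becomes an immediate control-flow decision, exactly the property that made the short translation of if (e1 & e2) s possible.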