CS 432 Fall Mike Lam, Professor. Code Generation

Similar documents
CS415 Compilers. Intermediate Represeation & Code Generation

Intermediate Code Generation

6.035 Project 3: Unoptimized Code Generation. Jason Ansel MIT - CSAIL

Intermediate Code Generation

Intermediate Code Generation

CS5363 Final Review. cs5363 1

CS 432 Fall Mike Lam, Professor. Data-Flow Analysis

CS2210: Compiler Construction. Code Generation

Announcements. My office hours are today in Gates 160 from 1PM-3PM. Programming Project 3 checkpoint due tomorrow night at 11:59PM.

Code Generation. Lecture 30

CS 406/534 Compiler Construction Putting It All Together

Implementing Control Flow Constructs Comp 412

Compiler Design. Code Shaping. Hwansoo Han

Where We Are. Lexical Analysis. Syntax Analysis. IR Generation. IR Optimization. Code Generation. Machine Code. Optimization.

Introduction to Programming Using Java (98-388)

Lecture Outline. Code Generation. Lecture 30. Example of a Stack Machine Program. Stack Machines

Intermediate Representations

The CPU and Memory. How does a computer work? How does a computer interact with data? How are instructions performed? Recall schematic diagram:

CSE 504: Compiler Design. Code Generation

Intermediate Representations

Acknowledgement. CS Compiler Design. Intermediate representations. Intermediate representations. Semantic Analysis - IR Generation

Static Checking and Intermediate Code Generation Pat Morin COMP 3002

CS 261 Fall Mike Lam, Professor. x86-64 Control Flow

CSE 401/M501 Compilers

Module 27 Switch-case statements and Run-time storage management

CS 432 Fall Mike Lam, Professor. Compilers. Advanced Systems Elective

CS 261 Fall Mike Lam, Professor Integer Encodings

Homework I - Solution

CS 261 Fall Mike Lam, Professor. x86-64 Control Flow

Semantic actions for declarations and expressions

Intermediate Code Generation

CSE P 501 Compilers. Intermediate Representations Hal Perkins Spring UW CSE P 501 Spring 2018 G-1

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

Semantic actions for declarations and expressions. Monday, September 28, 15

1 Lexical Considerations

Semantic actions for expressions

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

Languages and Compiler Design II IR Code Generation I

(just another operator) Ambiguous scalars or aggregates go into memory

Concepts Introduced in Chapter 6

Intermediate Code Generation

CS 432 Fall Mike Lam, Professor. Compilers. Advanced Systems Elective

Code Shape II Expressions & Assignment

IR Optimization. May 15th, Tuesday, May 14, 13

Compilers. Compiler Construction Tutorial The Front-end

Semantic actions for declarations and expressions

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

CSE P 501 Exam Sample Solution 12/1/11

Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

Intermediate Representations

Modules, Structs, Hashes, and Operational Semantics

INTERMEDIATE REPRESENTATIONS RTL EXAMPLE

Chapter 2 A Quick Tour

Lexical Considerations

CSE 401/M501 Compilers

CMPT 379 Compilers. Anoop Sarkar. 11/13/07 1. TAC: Intermediate Representation. Language + Machine Independent TAC

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Intermediate Representations

opt. front end produce an intermediate representation optimizer transforms the code in IR form into an back end transforms the code in IR form into

An Overview of Compilation

Intermediate Representations Part II

Intermediate representation

Code Generation. Lecture 31 (courtesy R. Bodik) CS164 Lecture14 Fall2004 1

Computing Inside The Parser Syntax-Directed Translation, II. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

CS 6353 Compiler Construction Project Assignments

Introduction to C. Why C? Difference between Python and C C compiler stages Basic syntax in C

Practical Malware Analysis

CS356: Discussion #6 Assembly Procedures and Arrays. Marco Paolieri

Writing Evaluators MIF08. Laure Gonnord

Formats of Translated Programs

IR Generation. May 13, Monday, May 13, 13

Program Representations

Compiler Construction I

CSE P 501 Exam 11/17/05 Sample Solution

IR Lowering. Notation. Lowering Methodology. Nested Expressions. Nested Statements CS412/CS413. Introduction to Compilers Tim Teitelbaum

Digital Forensics Lecture 3 - Reverse Engineering

CSE 504. Expression evaluation. Expression Evaluation, Runtime Environments. One possible semantics: Problem:

Computing Inside The Parser Syntax-Directed Translation. Comp 412

Intermediate Code Generation (ICG)

Project Compiler. CS031 TA Help Session November 28, 2011

Object Code (Machine Code) Dr. D. M. Akbar Hussain Department of Software Engineering & Media Technology. Three Address Code

Three-Address Code IR

Topics Power tends to corrupt; absolute power corrupts absolutely. Computer Organization CS Data Representation

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

x86-64 Programming III & The Stack

Intermediate Representa.on

CS 261 Fall Mike Lam, Professor. CPU architecture

CS 11 C track: lecture 8

Intermediate Representations

Intermediate Representations & Symbol Tables

Implementing Classes, Arrays, and Assignments

Control Flow. COMS W1007 Introduction to Computer Science. Christopher Conway 3 June 2003

Programming Language Processor Theory

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 19: Efficient IL Lowering 5 March 08

Code Generation. CS 1622: Code Generation & Register Allocation. Arrays. Why is Code Generation Hard? Multidimensional Arrays. Array Element Address

THEORY OF COMPILATION

Transcription:

CS 432 Fall 2015 Mike Lam, Professor Code Generation

Compilers "Back end" Source code Tokens Syntax tree Machine code char data[20]; int main() { float x = 42.0; return 7; } 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00... Lexing Parsing Code Generation & Optimization "Front end" Current focus

Our Project Current status: type-checked AST Next step: convert to ILOC This step is called code generation Convert from a tree-based IR to a linear IR (or directly to machine code) Use a tree traversal to linearize the program But first, more general code gen topics

Goals Code generator outputs Stack code (push a, push b, multiply, pop c) Three-address code (c = a + b) Machine code (movq a, %eax; addq b, %eax; movq %eax, c) Code generator requirements Must preserve semantics Should produce efficient code Should run efficiently

Obstacles Generating the most optimal code is undecidable Unlike front-end transformations (e.g., lexing & parsing) Must use heuristics and approximation algorithms This is why most compilers research since 1960s has been on the back end

Phases Instruction selection Map IR to target instructions Difficulty is directly related to uniformity and completeness of target instruction set Register allocation/assignment Allocation: selecting which variables to store in registers Assignment: selecting which register to use for each variable General problem is NP-complete Instruction scheduling Optimize for pipelined architectures w/ caching Take advantage of speculative execution

Syntax-Directed Translation Similar to attribute grammars (Figure 4.15) Associate bits of code with each production This code performs the translation or code gen Save intermediate results in temporary registers for now In our project, we will use a visitor Still syntax-based (actually AST-based) Not dependent on original grammar

ILOC Linear IR based on research compiler from Rice See Appendix A (and ILOCInstruction.java) I have made some modifications Removed most immediate instructions (i.e., subi) Removed binary shift instructions Removed character-based instructions Removed jump tables Removed comparison-based conditional jumps Added labels and function call mechanisms (call, param, return) Added symbol address referencing (loads) Added binary not and arithmetic neg Added print and nop instructions

SSA Form Static single-assignment Naming convention that uses a unique name for each newlycalculated value Values are collapsed at control flow points using Φ-functions (not actual executed!) Useful for various types of analysis cmp_lt r1, r2 r3 cbr r3 l1, l2 l1: loadi 4 r4 jmp l3 l2: loadi 8 r5 jmp l3 l3: r6 = Φ(r4, r5)

Assigning Storage Locations Memory regions Code ("text") Static ("data") Heap Stack Registers General Special

Boolean Encoding Integers: 0 for false, 1 for true Difference from book No comparison-based conditional branches Conditional branching uses boolean values instead Short-circuiting Not in Decaf!

String Handling Arrays of chars vs. encapsulated type Former is faster, latter is easier/safer C uses the former, Java uses the latter Mutable vs. immutable Former is more intuitive, latter is (sometimes) faster C uses the former, Java uses the latter Decaf: immutable string constants only No string variables

Array Accesses Generalization to multidimensions: base + (i_1 * w_1) + (i_2 * w_2) +... + (i_k * w_k) Alternate definition: 1d: base + width * (i_1) 2d: base + width * (i_1 * n_2 + i_2) nd: base + width * ((... ((i_1 * n_2 + i_2) * n_3 + i_3)... ) * n_k + i_k) * width Row-major vs. column-major In Decaf: row-major one-dimensional global arrays

Struct and Record Types How to access member values? Static offsets from base of struct/record OO adds another level of complexity Now classes have methods Class instance records and virtual method tables In Decaf: no structs or classes

Control Flow Introduce program labels Named location in the program Generated sequentially using static newlabel() call Generate goto instructions using templates Also called "jumps" or "branches" In ILOC: cbr instruction Templates are composable

Control Flow if statement: if (E) B1 re = << E code >> cbr re b1, skip b1: << B1 code >> skip:

Control Flow if statement: if (E) B1 else B2 re = << E code >> cbr re b1, b2 b1: << B1 code >> jmp done b2: << B2 code >> done:

Control Flow while loop: while (E) B cond: re = << E code >> cbr re body, done body: << B code >> jmp cond done: ; CONTINUE target ; BREAK target

Control Flow for loop: for V in E1, E2 B rx = << E1 code >> ry = << E2 code >> rv = rx cond: cmp_ge rv, ry rc cbr rc done, body body: << B code >> rv = rv + 1 jmp cond done: ; CONTINUE target ; BREAK target NOT CURRENTLY IN DECAF

Control Flow switch statement: switch (E) { case V1: B1 case V2: B2 default: BD } NOT CURRENTLY IN DECAF b1: b2: l3: re = << E code >> if re == V1 goto b1 if re == V2 goto b2 << BD code >> jmp end << B1 code >> jmp end << B2 code >> jmp end

Control Flow For sequential values starting with constant (C): ("jump table") re = << E code >> jmp jt(re) jt: jmp l1 jmp l2 (...)

Procedure Calls These are harder We'll talk about them next week