Intermediate Representations

Intermediate Representations (EaC Chapter 5)

    Source Code -> Front End -> IR -> Middle End -> IR -> Back End -> Target Code

- Front end: produces an intermediate representation (IR)
- Middle end: transforms the IR into an equivalent IR that runs more efficiently
- Back end: transforms the IR into native code
- The IR encodes the compiler's knowledge of the program
- The middle end usually consists of several passes

Intermediate Representations

Decisions in IR design affect the speed and efficiency of the compiler.

Some important IR properties:
- Ease of generation
- Ease of manipulation
- Procedure size
- Freedom of expression
- Level of abstraction

The importance of different properties varies between compilers. Selecting an appropriate IR for a compiler is critical.

Types of Intermediate Representations

Three major categories:
- Structural
  - Graphically oriented
  - Heavily used in source-to-source translators
  - Tend to be large
  - Examples: trees, DAGs
- Linear
  - Pseudo-code for an abstract machine
  - Level of abstraction varies
  - Simple, compact data structures
  - Easier to rearrange
  - Examples: three-address code, stack machine code
- Hybrid
  - Combination of graphs and linear code
  - Example: control-flow graph

Level of Abstraction

The level of detail exposed in an IR influences the profitability and feasibility of different optimizations. Two different representations of an array reference:

High-level AST (good for memory disambiguation):

        subscript
        /   |   \
       A    i    j

Low-level linear code (good for address calculation):

    loadI 1       => r1
    sub   rj, r1  => r2
    loadI 10      => r3
    mult  r2, r3  => r4
    sub   ri, r1  => r5
    add   r4, r5  => r6
    loadI @A      => r7
    add   r7, r6  => r8
    load  r8      => rAij

Level of Abstraction

Structural IRs are usually considered high-level, and linear IRs are usually considered low-level. This is not necessarily true:

Low-level AST (written in nested prefix form):

    load(+(@A, +(*(-(j, 1), 10), -(i, 1))))

High-level linear code:

    loadArray A,i,j
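The low-level sequence above can be read as an ordinary address computation. The following is a minimal sketch, assuming (from the constants in the linear code, not from any stated declaration) a lower bound of 1 in both dimensions, a first dimension of length 10, and an element size of one address unit. The function name `element_address` and its parameters are hypothetical.

```python
def element_address(base, i, j, rows=10, lower=1):
    """Compute base + (j - lower) * rows + (i - lower), step by step.

    Each line mirrors one operation of the low-level linear code.
    The defaults (rows=10, lower=1) are assumptions read off its constants.
    """
    r2 = j - lower      # sub  rj, r1 => r2
    r4 = r2 * rows      # mult r2, r3 => r4
    r5 = i - lower      # sub  ri, r1 => r5
    r6 = r4 + r5        # add  r4, r5 => r6
    r8 = base + r6      # add  r7, r6 => r8
    return r8


first = element_address(1000, 1, 1)   # first element sits at the base address
other = element_address(1000, 3, 2)   # one full column (10) plus two elements
```

Exposing each step as a separate value is what lets an optimizer reuse, for example, `i - 1` across several references; the high-level `subscript` node hides those steps.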

Abstract Syntax Tree

An abstract syntax tree (AST) is the procedure's parse tree with the nodes for most non-terminal symbols removed.

Example: x - 2 * y

        -
       / \
      x   *
         / \
        2   y

- Can use a linearized form of the tree, which is easier to manipulate than pointers:
    x 2 y * -    in postfix form
    - x * 2 y    in prefix form
- S-expressions are (essentially) ASTs

Directed Acyclic Graph

A directed acyclic graph (DAG) is an AST with a unique node for each value.

Example:

    z ← x - 2 * y
    w ← x / 2

Here the DAG holds a single node for x and a single node for 2, and both expressions point to them.

- Makes sharing explicit
- Encodes redundancy

The same expression appearing twice means that the compiler might arrange to evaluate it just once!
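Both ideas can be sketched in a few lines. This is an illustration only: the tuple encoding `(op, left, right)` for AST nodes and the `DagBuilder` class are hypothetical choices, and the DAG is built by interning nodes in a dictionary, which is one common way to get a unique node per value.

```python
def postfix(node):
    """Linearize an AST with operands first, operator last."""
    if not isinstance(node, tuple):
        return [str(node)]
    op, left, right = node
    return postfix(left) + postfix(right) + [op]


def prefix(node):
    """Linearize an AST with the operator first, then its operands."""
    if not isinstance(node, tuple):
        return [str(node)]
    op, left, right = node
    return [op] + prefix(left) + prefix(right)


class DagBuilder:
    """Hands out one node id per distinct (operator, operands) key."""

    def __init__(self):
        self.nodes = {}

    def _intern(self, key):
        if key not in self.nodes:
            self.nodes[key] = len(self.nodes)
        return self.nodes[key]

    def leaf(self, value):
        return self._intern(("leaf", value))

    def op(self, operator, left, right):
        return self._intern((operator, left, right))


ast = ("-", "x", ("*", 2, "y"))          # x - 2 * y
postfix_form = " ".join(postfix(ast))
prefix_form = " ".join(prefix(ast))

dag = DagBuilder()
z = dag.op("-", dag.leaf("x"), dag.op("*", dag.leaf(2), dag.leaf("y")))
w = dag.op("/", dag.leaf("x"), dag.leaf(2))   # reuses the x and 2 nodes
```

Two separate trees for z and w would need eight nodes; the builder ends up with six because x and 2 are interned once.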

Stack Machine Code

Originally used for stack-based computers, now Java.

Example: x - 2 * y becomes

    push x
    push 2
    push y
    multiply
    subtract

Advantages:
- Compact form
- Introduced names are implicit, not explicit
- Simple to generate and execute code

Useful where code is transmitted over slow communication links (the net). Implicit names take up no space, where explicit ones do!

Three Address Code

There are several different representations of three-address code. In general, three-address code has statements of the form

    x ← y op z

with 1 operator (op) and, at most, 3 names (x, y, z).

Example: z ← x - 2 * y becomes

    t ← 2 * y
    z ← x - t

Advantages:
- Resembles many machines
- Introduces a new set of names
- Compact form
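To make the "simple to generate and execute" claim concrete, here is a small sketch that emits the stack code for x - 2 * y from a tuple-encoded AST with a postorder walk and then runs it. The tuple encoding, the `OP_NAMES` table and both function names are hypothetical, and only the two operators used in the example are supported.

```python
OP_NAMES = {"*": "multiply", "-": "subtract"}


def gen_stack_code(node):
    """Postorder walk: code for the operands, then the operator."""
    if isinstance(node, tuple):
        op, left, right = node
        return gen_stack_code(left) + gen_stack_code(right) + [(OP_NAMES[op],)]
    return [("push", node)]


def run(code, env):
    """Execute stack code; variable names are looked up in env."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            operand = instr[1]
            stack.append(env[operand] if isinstance(operand, str) else operand)
        else:
            right = stack.pop()          # operands come off in reverse order
            left = stack.pop()
            if instr[0] == "multiply":
                stack.append(left * right)
            else:
                stack.append(left - right)
    return stack.pop()


code = gen_stack_code(("-", "x", ("*", 2, "y")))
result = run(code, {"x": 10, "y": 3})
```

No temporary names appear anywhere in `code`: intermediate values live only on the stack, which is what keeps the form compact.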

Three Address Code: Quadruples

- Naïve representation of three-address code
- Table of k * 4 small integers
- Simple record structure
- Easy to reorder
- Explicit names

The original FORTRAN compiler used "quads".

RISC assembly code:

    load  r1, y
    loadI r2, 2
    mult  r3, r2, r1
    load  r4, x
    sub   r5, r4, r3

Quadruples:

    load    1   y
    loadi   2   2
    mult    3   2   1
    load    4   x
    sub     5   4   3

Three Address Code: Triples

- Index used as implicit name
- 25% less space consumed than quads
- Much harder to reorder

    (1)  load   y
    (2)  loadI  2
    (3)  mult   (1)  (2)
    (4)  load   x
    (5)  sub    (4)  (3)

Implicit names take no space!
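The relationship between the two tables can be sketched as a conversion: drop the explicit destination field and replace each use of a destination name by the index of the triple that computes it. The tuple layouts, the `("ref", n)` marker and the function name below are hypothetical, not the representation of any particular compiler.

```python
QUADS = [
    # (operator, destination, operand 1, operand 2)
    ("load",  "r1", "y",  None),
    ("loadI", "r2", 2,    None),
    ("mult",  "r3", "r2", "r1"),
    ("load",  "r4", "x",  None),
    ("sub",   "r5", "r4", "r3"),
]


def quads_to_triples(quads):
    """Return triples whose operands refer to earlier results by position."""
    index_of = {}                        # destination name -> 1-based index

    def as_operand(arg):
        if isinstance(arg, str) and arg in index_of:
            return ("ref", index_of[arg])
        return arg

    triples = []
    for position, (op, dest, arg1, arg2) in enumerate(quads, start=1):
        triples.append((op, as_operand(arg1), as_operand(arg2)))
        index_of[dest] = position
    return triples


TRIPLES = quads_to_triples(QUADS)
fields_in_quads = 4 * len(QUADS)
fields_in_triples = 3 * len(TRIPLES)
```

Going from four fields to three per operation is where the 25% saving comes from. The cost is that moving a triple changes its index, so every `("ref", n)` that points at it has to be rewritten.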

Three Address Code: Indirect Triples

- List triples in a statement list data structure
- Implicit name space
- Uses more space than triples, but easier to reorder

    stmt array      triples
    (100)           (100)  load   y
    (101)           (101)  loadI  2
    (102)           (102)  mult   (100)  (101)
    (103)           (103)  load   x
    (104)           (104)  sub    (103)  (102)

The major tradeoff between quads and triples is compactness versus ease of manipulation. (Note: multiple occurrences of the same statement in the stmt array are possible.)
- Is compile-time space critical?
- Is compilation speed more important?

Static Single Assignment Form (SSA)

The main idea: each name is defined exactly once in the program. Introduce φ-functions to make it work.

Original:

    x ← ...
    y ← ...
    while (x < k)
        x ← x + 1
        y ← y + x

SSA-form:

          x0 ← ...
          y0 ← ...
          if (x0 >= k) goto next
    loop: x1 ← φ(x0, x2)
          y1 ← φ(y0, y2)
          x2 ← x1 + 1
          y2 ← y1 + x2
          if (x2 < k) goto loop
    next: ...

Strengths of SSA-form:
- Sharper analysis
- φ-functions give hints about placement
- (Sometimes) faster algorithms
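One way to check that a φ-function only selects a value according to the path taken is to execute both versions of the loop side by side. The sketch below is an illustration under assumptions: the function names and the `came_from_entry` flag are hypothetical, the entry test is written as `x0 >= k` (the exact negation of the loop condition), and the values returned at `next` are chosen by path, which a full SSA form would express with further φ-functions.

```python
def original_loop(x, y, k):
    """The source-level loop, with x and y redefined on every iteration."""
    while x < k:
        x = x + 1
        y = y + x
    return x, y


def ssa_loop(x0, y0, k):
    """The same computation with each name assigned in exactly one place."""
    if x0 >= k:                              # loop never entered
        return x0, y0
    came_from_entry = True
    x2 = y2 = None                           # defined once control reaches the body
    while True:
        x1 = x0 if came_from_entry else x2   # x1 <- phi(x0, x2)
        y1 = y0 if came_from_entry else y2   # y1 <- phi(y0, y2)
        x2 = x1 + 1
        y2 = y1 + x2
        came_from_entry = False
        if not (x2 < k):                     # if (x2 < k) goto loop
            return x2, y2


same = all(
    original_loop(x, y, k) == ssa_loop(x, y, k)
    for x in range(-2, 5)
    for y in range(0, 3)
    for k in range(-2, 6)
)
```

Python rebinds `x1`, `x2` and so on each time around the loop, so "assigned once" here refers to the program text, which is also how the SSA property is defined.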

Two Address Code

Allows statements of the form

    x ← x op y

with 1 operator (op) and, at most, 2 names (x and y).

Example: z ← x - 2 * y becomes

    t1 ← 2
    t2 ← load y
    t2 ← t2 * t1
    z  ← load x
    z  ← z - t2

- Can be very compact

Problems:
- Machines no longer rely on destructive operations
- Difficult name space
  - Destructive operations make reuse hard
  - Good model for machines with destructive ops (PDP-11)

Control-flow Graph (CFG)

Models the transfer of control in the procedure.
- Nodes in the graph are basic blocks, which can be represented with quads or any other linear representation
- Edges in the graph represent control flow

Example:

              if (x = y)
             /          \
       a ← 2              a ← 3
       b ← 5              b ← 4
             \          /
              c ← a * b

Basic blocks: maximal-length sequences of straight-line code.
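A control-flow graph like the one in the example can be recovered from linear code by finding block leaders. The sketch below assumes a hypothetical instruction encoding `(label, kind, payload)`, where `kind` is `"assign"`, `"branch"` (conditional) or `"jump"`, and `payload` is the target label for the last two; the linear code for the if-then-else is likewise an assumed lowering.

```python
CODE = [
    (None,   "branch", "else"),       # if x != y goto else
    (None,   "assign", "a <- 2"),
    (None,   "assign", "b <- 5"),
    (None,   "jump",   "join"),
    ("else", "assign", "a <- 3"),
    (None,   "assign", "b <- 4"),
    ("join", "assign", "c <- a * b"),
]


def build_cfg(code):
    """Split code into basic blocks and compute successor edges."""
    targets = {ins[2] for ins in code if ins[1] in ("branch", "jump")}
    leaders = set()
    for i, (label, kind, _) in enumerate(code):
        if i == 0 or label in targets:
            leaders.add(i)               # entry point or jump target
        if kind in ("branch", "jump") and i + 1 < len(code):
            leaders.add(i + 1)           # instruction after a transfer
    starts = sorted(leaders)
    blocks = [code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])]

    block_of_label = {b[0][0]: n for n, b in enumerate(blocks) if b[0][0]}
    edges = {}
    for n, block in enumerate(blocks):
        _, kind, payload = block[-1]
        successors = set()
        if kind in ("branch", "jump"):
            successors.add(block_of_label[payload])
        if kind != "jump" and n + 1 < len(blocks):
            successors.add(n + 1)        # fall through to the next block
        edges[n] = sorted(successors)
    return blocks, edges


BLOCKS, EDGES = build_cfg(CODE)
```

For this input the four blocks correspond to the test, the two arms and the join, and `EDGES` reproduces the diamond shape of the example.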

Using Multiple Representations

    Source Code -> Front End -> IR 1 -> Middle End -> IR 2 -> Middle End -> IR 3 -> Back End -> Target Code

- Repeatedly lower the level of the intermediate representation
- Each intermediate representation is suited towards certain optimizations
- Example: the Open64 compiler's WHIRL intermediate format consists of 5 different IRs that are progressively more detailed

Memory Models

Two major models:
- Register-to-register model
  - Keep all values that can legally be stored in a register in registers
  - Ignore machine limitations on number of registers
  - Compiler back end must insert loads and stores
- Memory-to-memory model
  - Keep all values in memory
  - Only promote values to registers directly before they are used
  - Compiler back end can remove loads and stores

Compilers for RISC machines usually use register-to-register:
- Reflects programming model
- Easier to determine when registers are used

The Rest of the Story

Representing the code is only part of an IR. There are other necessary components:
- Symbol table (already discussed)
- Constant table
  - Representation, type
  - Storage class, offset
- Storage map
  - Overall storage layout
  - Overlap information
  - Virtual register assignments

Virtual Machines

Can interpret IR using a virtual machine. Examples:
- P-code for Pascal
- PostScript for display devices
- Java byte code for everywhere

Result: easy and portable, but much slower.

Just-in-time compilation (JIT):
- Begin interpreting IR
- Find performance-critical section(s)
- Compile section(s) to native code, or just compile the entire program
- Compilation time becomes execution time

Java Virtual Machine (JVM)

The JVM consists of four parts:
- Memory
  - Stack (for function call frames)
  - Heap (for dynamically allocated memory)
  - Constant pool (shared constant data)
  - Code segment (instructions of class files)
- Registers
  - Stack pointer (SP), local stack pointer (LSP), program counter (PC)
- Condition codes
  - Stores result of last conditional instruction
- Execution unit
  1. Reads current JVM instruction
  2. Changes state of virtual machine
  3. Increments PC (modify if call, branch)
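The three steps of the execution unit can be sketched as a small fetch-execute loop. The opcodes below (`push`, `load`, `store`, `add`, `if_lt`, `goto`) are hypothetical stand-ins chosen for readability, not real JVM bytecodes, and the machine state is reduced to an operand stack, a dictionary of local variables and a program counter.

```python
def execute(program, local_vars):
    """Run a list of (opcode, *args) tuples until the PC leaves the program."""
    stack, pc = [], 0
    while pc < len(program):
        op, *args = program[pc]          # 1. read the current instruction
        next_pc = pc + 1                 # 3. default: advance the PC
        if op == "push":                 # 2. change the machine state
            stack.append(args[0])
        elif op == "load":
            stack.append(local_vars[args[0]])
        elif op == "store":
            local_vars[args[0]] = stack.pop()
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "if_lt":              # branch: overrides the default PC
            right = stack.pop()
            left = stack.pop()
            if left < right:
                next_pc = args[0]
        elif op == "goto":
            next_pc = args[0]
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc = next_pc
    return local_vars


# while (i < n) { i = i + 1; total = total + i; }
PROGRAM = [
    ("load", "i"), ("load", "n"), ("if_lt", 4), ("goto", 13),
    ("load", "i"), ("push", 1), ("add",), ("store", "i"),
    ("load", "total"), ("load", "i"), ("add",), ("store", "total"),
    ("goto", 0),
]
final = execute(PROGRAM, {"i": 0, "n": 4, "total": 0})
```

Every instruction pays for the dispatch on `op`, which is one reason interpretation is slower than native code and why a JIT compiles the sections that run most often.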

Java Byte Codes

[Figures: bytecode listings, not preserved in this transcription.]

Java Byte Code Interpreter

[Figure: not preserved in this transcription.]