FSU. Av oiding Unconditional Jumps by Code Replication. Frank Mueller and David B. Whalley. Florida State University DEPARTMENT OF COMPUTER SCIENCE

Size: px
Start display at page:

Download "FSU. Av oiding Unconditional Jumps by Code Replication. Frank Mueller and David B. Whalley. Florida State University DEPARTMENT OF COMPUTER SCIENCE"

Transcription

1 Av oiding Unconditional Jumps by Code Replication by Frank Mueller and David B. Whalley Florida State University

2 Overview uncond. jumps occur often in programs [4-0% dynamically] produced by loops, conditional statements, etc. can almost always be avoided technique to avoid uncond. jumps introduction evaluation 2

3 Motivation code generated includes uncond. jumps for while-loops and for-loops typically if-then-else construct always other language constructs (break, goto, continue in C) asaside-effect of optimizations uncond. jump instruction can be avoided when code is replicated from the target methods often employed for certain class of loops in frontend new method ispart of optimizations in back-end of compiler can be applied universally to all uncond. jumps may introduce sources for other optimizations 3

4 Example While-Loop (RTLs for 68020) without replication i=; while (x[i]) { x[i-] = x[i]; i++; } with replication a[0]=a[6]+x.+; NZ=B[a[6]+x.+]?0; a[]=a[0]; PC=NZ==0,L6; L5 a[0]=a[6]+x.; NZ=B[a[0]]?0; L000 PC=NZ==0,L6; B[a[0]]=B[a[0]+]; B[a[0]-]=B[a[]++]; a[0]=a[0]+; a[0]=a[0]+; NZ=B[a[0]+]?0; PC=L5; PC=NZ!=0,L000; L6... L6... 4

5 Example 2 For-Loop (RTLs for 68020) without replication for (i = k; i < 0; i++) x[i] = y[i]; with replication L9 L8 d[0]=l[a[6]+k.]; a[0]=d[0]+a[6]+x.; a[]=d[0]+a[6]+y.; PC=L8; B[a[0]++]=B[a[]++]; d[0]=d[0]+; L9 d[0]=l[a[6]+k.]; NZ=d[0]?0; PC=NZ>=0,L000; a[0]=d[0]+a[6]+x.; a[]=d[0]+a[6]+y.; B[a[0]++]=B[a[]++]; d[0]=d[0]+; NZ=d[0]?0; NZ=d[0]?0; PC=NZ<0,L9; PC=NZ<0,L9;... L

6 Example 3 Exit Condition in the Middle of a Loop (RTLs for 68020) i=; while (i++ < n) x[i-] = x[i]; without replication with replication L5 d[]=; a[0]=a[6]+x.; d[0]=; d[]=2; NZ=d[0]?L[_n]; PC=NZ>=0,L6; a[0]=a[6]+x.+; d[0]=d[]; a[0]=a[0]+; d[]=d[]+; L000 NZ=d[0]?L[_n]; B[a[0]]=B[a[0]+]; PC=NZ>=0,L6; a[0]=a[0]+; B[a[0]]=B[a[0]+]; d[0]=d[]; PC=L5; d[]=d[]+; L6... NZ=d[0]?L[_n] PC=NZ<0,L000; L6... 6

7 Example 4 If-Then-Else Statement (RTLs for 68020) without replication if (i > 5) i=i/n; else i=i*n; return(i); with replication L22 L23 NZ=L[a[6]+i.]?5; PC=NZ<=0,L22; d[0]=l[a[6]+i.]; d[0]=d[0]/l[a[6]+n.]; L[a[6]+i.]=d[0]; PC=L23; d[0]=l[a[6]+i.]; d[0]=d[0]*l[a[6]+n.]; L[a[6]+i.]=d[0]; a[6]=uk; PC=RT; L22 NZ=L[a[6]+i.]?5; PC=NZ<=0,L22; d[0]=l[a[6]+i.]; d[0]=d[0]/l[a[6]+n.]; L[a[6]+i.]=d[0]; a[6]=uk; PC=RT; d[0]=l[a[6]+i.]; d[0]=d[0]*l[a[6]+n.]; L[a[6]+i.]=d[0]; a[6]=uk; PC=RT; 7

8 Remote Preheader No Preheader Remote Preheader 8

9 Algorithm JUMPS. set up matrix (used to find shortest replication sequence) 2. traverse basic blocks until uncond. jump found 3. choose replication sequence (towards return or loop) 4. expand replication sequence to include all blocks within a loop 5. replicate code and adjust its control flow 6. adjust control flow ofportion of loops which was not copied 7. if control flow has become non-reducible then remove replicated code 9

10 Interference with Natural Loops Without Replication With Partial Replication With Loop Replication

11 Partial Overlapping of Natural Loops Initial Control Flow After Replication Adjusted Control Flow 2

12 Measurements for Motorola 68020/6888 and Sun SPARC test programs included benchmarks UNIX utilities applications instrumentation of programs at code generation time compiled with different sets of optimizations: SIMPLE: standard opt. LOOPS: standard opt. + code replication at loops JUMPS: standard opt. + generalized code repl. 3

13 Measurements (cont.) Number of Static and Dynamic Instructions (Sun SPARC) Sun SPARC program static instructions dynamic instructions executed SIMPLE LOOPS JUMPS SIMPLE LOOPS JUMPS cal % +2.89% 37, % -3.5% quicksort % +50.6% 836, % -4.2% wc % % 540, % -.96% grep % %,930, % -3.57% sort, % +89.7%,8, % -0.49% od, % +95.9% 2,336, % -0.22% mincost, % % 335, % -3.9% bubblesort % +5.4% 29,07, % -0.07% matmult % +3.67% 4,403, % -0.28% banner % % 2, % -0.25% sieve % +3.23% 2,84, % -3.73% compact, % +75.8% 3,409, % -4.86% queens % +7.89% 263, % -0.03% deroff 7, % % 448,58-0.0% -3.3% average, % % 4,784, % -5.7% 5

14 Measurements (cont.) Number of Static and Dynamic Instructions (Motorola 68020) Motorola program static instructions dynamic instructions executed SIMPLE LOOPS JUMPS SIMPLE LOOPS JUMPS cal % % 36, % -3.7% quicksort % % 536, % -3.96% wc % % 42, % -5.32% grep % %,309, % -3.44% sort, % % 902, % -2.43% od, % %,980, % -0.30% mincost % % 302, % -5.3% bubblesort % +2.92% 20,340, % -8.92% matmult % +3.42% 4,89, % -0.2% banner % % 2, % -3.34% sieve % +.43%,759, % -8.53% compact, % % 0,602, % -5.26% queens % +2.77% 89, % -0.05% deroff 5, % +55.7% 360, % -7.05% average % % 3,6, % -6.94% 6

15 Future Work handle indirect jumps in algorithm should improve dynamic savings may reduce size of generated code limit length of replication sequence, use depth-bound DFS should reduce size of generated code should improve compile-time overhead of optimization phase determine best phase ordering trade-off compile-time / exploit optimizations 8

16 Conclusions code replication avoids almost all uncond. jumps reduces number of executed instructions by 6% increases number of instructions between branches by.5 on SPARC results in 4% decreased cache work (except for small caches) increases code size by 53% outperforms traditional methods to avoid uncond. jumps should be applied in the back-end of highly optimizing compilers 9

17 Order of Optimizations branch chaining; dead code elimination; reorder basic blocks to minimize jumps; code replication (either JUMPS or LOOPS); dead code elimination; instruction selection; register assignment; if (change) instruction selection; do { register allocation by register coloring; instruction selection; common subexpression elimination; dead variable elimination; code motion; strength reduction; recurrences; instruction selection; branch chaining; constant folding at conditional branches; code replication (either JUMPS or LOOPS); dead code elimination; } while (change); filling of delay slots for RISCs; 20

Avoiding Unconditional Jumps by Code Replication. Frank Mueller and David B. Whalley. from the execution of a number of programs.

Avoiding Unconditional Jumps by Code Replication. Frank Mueller and David B. Whalley. from the execution of a number of programs. voiding Unconditional Jumps by Code Replication Frank Mueller and David B Whalley Department of Computer Science, B-73 Florida State University Tallahassee, Florida 32306-09 e-mail: whalley@csfsuedu bstract

More information

What Do Compilers Do? How Can the Compiler Improve Performance? What Do We Mean By Optimization?

What Do Compilers Do? How Can the Compiler Improve Performance? What Do We Mean By Optimization? What Do Compilers Do? Lecture 1 Introduction I What would you get out of this course? II Structure of a Compiler III Optimization Example Reference: Muchnick 1.3-1.5 1. Translate one language into another

More information

Evaluating Heuristic Optimization Phase Order Search Algorithms

Evaluating Heuristic Optimization Phase Order Search Algorithms Evaluating Heuristic Optimization Phase Order Search Algorithms by Prasad A. Kulkarni David B. Whalley Gary S. Tyson Jack W. Davidson Computer Science Department, Florida State University, Tallahassee,

More information

Control flow graphs and loop optimizations. Thursday, October 24, 13

Control flow graphs and loop optimizations. Thursday, October 24, 13 Control flow graphs and loop optimizations Agenda Building control flow graphs Low level loop optimizations Code motion Strength reduction Unrolling High level loop optimizations Loop fusion Loop interchange

More information

FSU DEPARTMENT OF COMPUTER SCIENCE

FSU DEPARTMENT OF COMPUTER SCIENCE mueller@cs.fsu.edu whalley@cs.fsu.edu of Computer Science Department State University Florida On Debugging RealTime Applications Frank Mueller and David Whalley Tallahassee, FL 323044019 email: On Debugging

More information

r[2] = M[x]; M[x] = r[2]; r[2] = M[x]; M[x] = r[2];

r[2] = M[x]; M[x] = r[2]; r[2] = M[x]; M[x] = r[2]; Using a Swap Instruction to Coalesce Loads and Stores Apan Qasem, David Whalley, Xin Yuan, and Robert van Engelen Department of Computer Science, Florida State University Tallahassee, FL 32306-4530, U.S.A.

More information

Compiler Optimization

Compiler Optimization Compiler Optimization The compiler translates programs written in a high-level language to assembly language code Assembly language code is translated to object code by an assembler Object code modules

More information

Lecture Notes on Loop Optimizations

Lecture Notes on Loop Optimizations Lecture Notes on Loop Optimizations 15-411: Compiler Design Frank Pfenning Lecture 17 October 22, 2013 1 Introduction Optimizing loops is particularly important in compilation, since loops (and in particular

More information

CSC D70: Compiler Optimization

CSC D70: Compiler Optimization CSC D70: Compiler Optimization Prof. Gennady Pekhimenko University of Toronto Winter 2018 The content of this lecture is adapted from the lectures of Todd Mowry and Phillip Gibbons CSC D70: Compiler Optimization

More information

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 Compiler Optimizations Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 2 Local vs. Global Optimizations Local: inside a single basic block Simple forms of common subexpression elimination, dead code elimination,

More information

Tour of common optimizations

Tour of common optimizations Tour of common optimizations Simple example foo(z) { x := 3 + 6; y := x 5 return z * y } Simple example foo(z) { x := 3 + 6; y := x 5; return z * y } x:=9; Applying Constant Folding Simple example foo(z)

More information

Lecture 4: Instruction Set Design/Pipelining

Lecture 4: Instruction Set Design/Pipelining Lecture 4: Instruction Set Design/Pipelining Instruction set design (Sections 2.9-2.12) control instructions instruction encoding Basic pipelining implementation (Section A.1) 1 Control Transfer Instructions

More information

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 Compiler Optimizations Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 2 Local vs. Global Optimizations Local: inside a single basic block Simple forms of common subexpression elimination, dead code elimination,

More information

Static Single Assignment Form in the COINS Compiler Infrastructure

Static Single Assignment Form in the COINS Compiler Infrastructure Static Single Assignment Form in the COINS Compiler Infrastructure Masataka Sassa (Tokyo Institute of Technology) Background Static single assignment (SSA) form facilitates compiler optimizations. Compiler

More information

In Search of Near-Optimal Optimization Phase Orderings

In Search of Near-Optimal Optimization Phase Orderings In Search of Near-Optimal Optimization Phase Orderings Prasad A. Kulkarni David B. Whalley Gary S. Tyson Jack W. Davidson Languages, Compilers, and Tools for Embedded Systems Optimization Phase Ordering

More information

Languages and Compiler Design II IR Code Optimization

Languages and Compiler Design II IR Code Optimization Languages and Compiler Design II IR Code Optimization Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU Spring 2010 rev.: 4/16/2010 PSU CS322 HM 1 Agenda IR Optimization

More information

Optimization Prof. James L. Frankel Harvard University

Optimization Prof. James L. Frankel Harvard University Optimization Prof. James L. Frankel Harvard University Version of 4:24 PM 1-May-2018 Copyright 2018, 2016, 2015 James L. Frankel. All rights reserved. Reasons to Optimize Reduce execution time Reduce memory

More information

COP5621 Exam 4 - Spring 2005

COP5621 Exam 4 - Spring 2005 COP5621 Exam 4 - Spring 2005 Name: (Please print) Put the answers on these sheets. Use additional sheets when necessary. Show how you derived your answer when applicable (this is required for full credit

More information

Intermediate Code & Local Optimizations

Intermediate Code & Local Optimizations Lecture Outline Intermediate Code & Local Optimizations Intermediate code Local optimizations Compiler Design I (2011) 2 Code Generation Summary We have so far discussed Runtime organization Simple stack

More information

FSU DEPARTMENT OF COMPUTER SCIENCE

FSU DEPARTMENT OF COMPUTER SCIENCE mueller@cs.fsu.edu whalley@cs.fsu.edu of Computer Science Department State University Florida Predicting Instruction Cache Behavior Frank Mueller () David Whalley () Marion Harmon(FAMU) Tallahassee, FL

More information

for (sp = line; *sp; sp++) { for (sp = line; ; sp++) { switch (*sp) { switch (*sp) { case p :... case k :...

for (sp = line; *sp; sp++) { for (sp = line; ; sp++) { switch (*sp) { switch (*sp) { case p :... case k :... Eectively Exploiting Indirect Jumps GANG-RYUNG UH Lucent Technologies, Allentown, PA 18103 tel: 610-712-2447, email:uh@lucent.com and DAVID B. WHALLEY Department of Computer Science, Florida State University,

More information

Goals of Program Optimization (1 of 2)

Goals of Program Optimization (1 of 2) Goals of Program Optimization (1 of 2) Goal: Improve program performance within some constraints Ask Three Key Questions for Every Optimization 1. Is it legal? 2. Is it profitable? 3. Is it compile-time

More information

CODE GENERATION Monday, May 31, 2010

CODE GENERATION Monday, May 31, 2010 CODE GENERATION memory management returned value actual parameters commonly placed in registers (when possible) optional control link optional access link saved machine status local data temporaries A.R.

More information

Methods for Saving and Restoring Register Values across Function Calls

Methods for Saving and Restoring Register Values across Function Calls Methods for Saving and Restoring Register Values across Function Calls JACK W DAVIDSON AND DAVID B WHALLEY Department of Computer Science, University of Virginia, Charlottesville, VA 22903, USA SUMMARY

More information

ECE 486/586. Computer Architecture. Lecture # 7

ECE 486/586. Computer Architecture. Lecture # 7 ECE 486/586 Computer Architecture Lecture # 7 Spring 2015 Portland State University Lecture Topics Instruction Set Principles Instruction Encoding Role of Compilers The MIPS Architecture Reference: Appendix

More information

Loops. Lather, Rinse, Repeat. CS4410: Spring 2013

Loops. Lather, Rinse, Repeat. CS4410: Spring 2013 Loops or Lather, Rinse, Repeat CS4410: Spring 2013 Program Loops Reading: Appel Ch. 18 Loop = a computation repeatedly executed until a terminating condition is reached High-level loop constructs: While

More information

Lecture 3 Local Optimizations, Intro to SSA

Lecture 3 Local Optimizations, Intro to SSA Lecture 3 Local Optimizations, Intro to SSA I. Basic blocks & Flow graphs II. Abstraction 1: DAG III. Abstraction 2: Value numbering IV. Intro to SSA ALSU 8.4-8.5, 6.2.4 Phillip B. Gibbons 15-745: Local

More information

Optimization. ASU Textbook Chapter 9. Tsan-sheng Hsu.

Optimization. ASU Textbook Chapter 9. Tsan-sheng Hsu. Optimization ASU Textbook Chapter 9 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction For some compiler, the intermediate code is a pseudo code of a virtual machine.

More information

Decreasing Process Memory Requirements by Overlapping Program Portions

Decreasing Process Memory Requirements by Overlapping Program Portions Decreasing Process Memory Requirements by Overlapping Program Portions Richard L. Bowman Emily J. Ratliff David B. Whalley Harris, Melbourne IBM, Fort Lauderdale Florida State Univ., CS Dept bowman@harris.com

More information

4.1 Interval Scheduling

4.1 Interval Scheduling 41 Interval Scheduling Interval Scheduling Interval scheduling Job j starts at s j and finishes at f j Two jobs compatible if they don't overlap Goal: find maximum subset of mutually compatible jobs a

More information

Compiler and Architectural Support for. Jerey S. Snyder, David B. Whalley, and Theodore P. Baker. Department of Computer Science

Compiler and Architectural Support for. Jerey S. Snyder, David B. Whalley, and Theodore P. Baker. Department of Computer Science Fast Context Switches: Compiler and Architectural Support for Preemptive Scheduling Jerey S. Snyder, David B. Whalley, and Theodore P. Baker Department of Computer Science Florida State University, Tallahassee,

More information

More Code Generation and Optimization. Pat Morin COMP 3002

More Code Generation and Optimization. Pat Morin COMP 3002 More Code Generation and Optimization Pat Morin COMP 3002 Outline DAG representation of basic blocks Peephole optimization Register allocation by graph coloring 2 Basic Blocks as DAGs 3 Basic Blocks as

More information

Control Flow Analysis & Def-Use. Hwansoo Han

Control Flow Analysis & Def-Use. Hwansoo Han Control Flow Analysis & Def-Use Hwansoo Han Control Flow Graph What is CFG? Represents program structure for internal use of compilers Used in various program analyses Generated from AST or a sequential

More information

CMSC 611: Advanced Computer Architecture

CMSC 611: Advanced Computer Architecture CMSC 611: Advanced Computer Architecture Compilers Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science

More information

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

USC 227 Office hours: 3-4 Monday and Wednesday  CS553 Lecture 1 Introduction 4 CS553 Compiler Construction Instructor: URL: Michelle Strout mstrout@cs.colostate.edu USC 227 Office hours: 3-4 Monday and Wednesday http://www.cs.colostate.edu/~cs553 CS553 Lecture 1 Introduction 3 Plan

More information

Lecture 8: Induction Variable Optimizations

Lecture 8: Induction Variable Optimizations Lecture 8: Induction Variable Optimizations I. Finding loops II. III. Overview of Induction Variable Optimizations Further details ALSU 9.1.8, 9.6, 9.8.1 Phillip B. Gibbons 15-745: Induction Variables

More information

Final Exam Review Questions

Final Exam Review Questions EECS 665 Final Exam Review Questions 1. Give the three-address code (like the quadruples in the csem assignment) that could be emitted to translate the following assignment statement. However, you may

More information

Pipelining and Vector Processing

Pipelining and Vector Processing Chapter 8 Pipelining and Vector Processing 8 1 If the pipeline stages are heterogeneous, the slowest stage determines the flow rate of the entire pipeline. This leads to other stages idling. 8 2 Pipeline

More information

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009 What Compilers Can and Cannot Do Saman Amarasinghe Fall 009 Optimization Continuum Many examples across the compilation pipeline Static Dynamic Program Compiler Linker Loader Runtime System Optimization

More information

Software Exorcism: A Handbook for Debugging and Optimizing Legacy Code

Software Exorcism: A Handbook for Debugging and Optimizing Legacy Code Software Exorcism: A Handbook for Debugging and Optimizing Legacy Code BILL BLUNDEN Apress About the Author Acknowledgments Introduction xi xiii xv Chapter 1 Preventative Medicine 1 1.1 Core Problems 2

More information

HY425 Lecture 09: Software to exploit ILP

HY425 Lecture 09: Software to exploit ILP HY425 Lecture 09: Software to exploit ILP Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 4, 2010 ILP techniques Hardware Dimitrios S. Nikolopoulos HY425 Lecture 09: Software to exploit

More information

Optimizations. Optimization Safety. Optimization Safety CS412/CS413. Introduction to Compilers Tim Teitelbaum

Optimizations. Optimization Safety. Optimization Safety CS412/CS413. Introduction to Compilers Tim Teitelbaum Optimizations CS412/CS413 Introduction to Compilers im eitelbaum Lecture 24: s 24 Mar 08 Code transformations to improve program Mainly: improve execution time Also: reduce program size Can be done at

More information

Aggressive Loop Unrolling in a Retargetable, Optimizing Compiler JACK W. DAVIDSON and SAN JAY JINTURKAR. Abstract

Aggressive Loop Unrolling in a Retargetable, Optimizing Compiler JACK W. DAVIDSON and SAN JAY JINTURKAR. Abstract Aggressive Loop Unrolling in a Retargetable, Optimizing Compiler JACK W. DAVIDSON and SAN JAY JINTURKAR {jwd, sj 3e} Ovirginia. edu Department of Computer Science, Thornton Hall University of Virginia,

More information

HY425 Lecture 09: Software to exploit ILP

HY425 Lecture 09: Software to exploit ILP HY425 Lecture 09: Software to exploit ILP Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 4, 2010 Dimitrios S. Nikolopoulos HY425 Lecture 09: Software to exploit ILP 1 / 44 ILP techniques

More information

Compiler Theory. (Intermediate Code Generation Abstract S yntax + 3 Address Code)

Compiler Theory. (Intermediate Code Generation Abstract S yntax + 3 Address Code) Compiler Theory (Intermediate Code Generation Abstract S yntax + 3 Address Code) 006 Why intermediate code? Details of the source language are confined to the frontend (analysis phase) of a compiler, while

More information

Pipelining, Branch Prediction, Trends

Pipelining, Branch Prediction, Trends Pipelining, Branch Prediction, Trends 10.1-10.4 Topics 10.1 Quantitative Analyses of Program Execution 10.2 From CISC to RISC 10.3 Pipelining the Datapath Branch Prediction, Delay Slots 10.4 Overlapping

More information

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210 Intermediate Representations CS2210 Lecture 11 Reading & Topics Muchnick: chapter 6 Topics today: Intermediate representations Automatic code generation with pattern matching Optimization Overview Control

More information

Machine-Independent Optimizations

Machine-Independent Optimizations Chapter 9 Machine-Independent Optimizations High-level language constructs can introduce substantial run-time overhead if we naively translate each construct independently into machine code. This chapter

More information

Induction Variable Identification (cont)

Induction Variable Identification (cont) Loop Invariant Code Motion Last Time Uses of SSA: reaching constants, dead-code elimination, induction variable identification Today Finish up induction variable identification Loop invariant code motion

More information

Avoiding Conditional Branches by Code Replication

Avoiding Conditional Branches by Code Replication Avoiding Conditional Branches by Code Replication FRANK MUELLER DAVID B. WHALLEY Fachbereich Informatik Department of Computer Science Humboldt-Universitȧ. tzuberlin Florida State University Unter den

More information

Compiler Optimization and Code Generation

Compiler Optimization and Code Generation Compiler Optimization and Code Generation Professor: Sc.D., Professor Vazgen Melikyan 1 Course Overview Introduction: Overview of Optimizations 1 lecture Intermediate-Code Generation 2 lectures Machine-Independent

More information

Multi-dimensional Arrays

Multi-dimensional Arrays Multi-dimensional Arrays IL for Arrays & Local Optimizations Lecture 26 (Adapted from notes by R. Bodik and G. Necula) A 2D array is a 1D array of 1D arrays Java uses arrays of pointers to arrays for >1D

More information

CIS 1.5 Course Objectives. a. Understand the concept of a program (i.e., a computer following a series of instructions)

CIS 1.5 Course Objectives. a. Understand the concept of a program (i.e., a computer following a series of instructions) By the end of this course, students should CIS 1.5 Course Objectives a. Understand the concept of a program (i.e., a computer following a series of instructions) b. Understand the concept of a variable

More information

ARSITEKTUR SISTEM KOMPUTER. Wayan Suparta, PhD 17 April 2018

ARSITEKTUR SISTEM KOMPUTER. Wayan Suparta, PhD   17 April 2018 ARSITEKTUR SISTEM KOMPUTER Wayan Suparta, PhD https://wayansuparta.wordpress.com/ 17 April 2018 Reduced Instruction Set Computers (RISC) CISC Complex Instruction Set Computer RISC Reduced Instruction Set

More information

CS553 Lecture Profile-Guided Optimizations 3

CS553 Lecture Profile-Guided Optimizations 3 Profile-Guided Optimizations Last time Instruction scheduling Register renaming alanced Load Scheduling Loop unrolling Software pipelining Today More instruction scheduling Profiling Trace scheduling CS553

More information

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University Code Generation Frédéric Haziza Department of Computer Systems Uppsala University Spring 2008 Operating Systems Process Management Memory Management Storage Management Compilers Compiling

More information

Liveness Analysis and Register Allocation. Xiao Jia May 3 rd, 2013

Liveness Analysis and Register Allocation. Xiao Jia May 3 rd, 2013 Liveness Analysis and Register Allocation Xiao Jia May 3 rd, 2013 1 Outline Control flow graph Liveness analysis Graph coloring Linear scan 2 Basic Block The code in a basic block has: one entry point,

More information

Implementation of Additional Optimizations for VPO on the Power Platform

Implementation of Additional Optimizations for VPO on the Power Platform Implementation of Additional Optimizations for VPO on the Power Platform Chungsoo Lim, Gary A. Smith, Won So {clim, gasmith, wso}@ncsu.edu http://www4.ncsu.edu/~gasmith/csc791a_proj Spring 2005 CSC791A

More information

Soot A Java Bytecode Optimization Framework. Sable Research Group School of Computer Science McGill University

Soot A Java Bytecode Optimization Framework. Sable Research Group School of Computer Science McGill University Soot A Java Bytecode Optimization Framework Sable Research Group School of Computer Science McGill University Goal Provide a Java framework for optimizing and annotating bytecode provide a set of API s

More information

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis Synthesis of output program (back-end) Intermediate Code Generation Optimization Before and after generating machine

More information

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection An Overview of GCC Architecture (source: wikipedia) CS553 Lecture Control-Flow, Dominators, Loop Detection, and SSA Control-Flow Analysis and Loop Detection Last time Lattice-theoretic framework for data-flow

More information

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space Code optimization Have we achieved optimal code? Impossible to answer! We make improvements to the code Aim: faster code and/or less space Types of optimization machine-independent In source code or internal

More information

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210 Redundant Computation Elimination Optimizations CS2210 Lecture 20 Redundancy Elimination Several categories: Value Numbering local & global Common subexpression elimination (CSE) local & global Loop-invariant

More information

5008: Computer Architecture HW#2

5008: Computer Architecture HW#2 5008: Computer Architecture HW#2 1. We will now support for register-memory ALU operations to the classic five-stage RISC pipeline. To offset this increase in complexity, all memory addressing will be

More information

INSTITUTE OF AERONAUTICAL ENGINEERING

INSTITUTE OF AERONAUTICAL ENGINEERING INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad -500 043 INFORMATION TECHNOLOGY TUTORIAL QUESTION BANK Course Name : DESIGN AND ANALYSIS OF ALGORITHMS Course Code : AIT001 Class

More information

Bentley Rules for Optimizing Work

Bentley Rules for Optimizing Work 6.172 Performance Engineering of Software Systems SPEED LIMIT PER ORDER OF 6.172 LECTURE 2 Bentley Rules for Optimizing Work Charles E. Leiserson September 11, 2012 2012 Charles E. Leiserson and I-Ting

More information

Reducing the Cost of Branches by Using Registersf-

Reducing the Cost of Branches by Using Registersf- Reducing the Cost of Branches by Using Registersf- JACK W. DAVIDSON AND DAVID B. WHALLEY Department of Computer Science University of Virginia Charlottesville, VA 22903, U. S. A. ABSTRACT In an attempt

More information

Using a Swap Instruction to Coalesce Loads and Stores

Using a Swap Instruction to Coalesce Loads and Stores Using a Swap Instruction to Coalesce Loads and Stores Apan Qasem, David Whalley, Xin Yuan, and Robert van Engelen Department of Computer Science, Florida State University Tallahassee, FL 32306-4530, U.S.A.

More information

Loop Invariant Code Motion. Background: ud- and du-chains. Upward Exposed Uses. Identifying Loop Invariant Code. Last Time Control flow analysis

Loop Invariant Code Motion. Background: ud- and du-chains. Upward Exposed Uses. Identifying Loop Invariant Code. Last Time Control flow analysis Loop Invariant Code Motion Loop Invariant Code Motion Background: ud- and du-chains Last Time Control flow analysis Today Loop invariant code motion ud-chains A ud-chain connects a use of a variable to

More information

Using Cache Line Coloring to Perform Aggressive Procedure Inlining

Using Cache Line Coloring to Perform Aggressive Procedure Inlining Using Cache Line Coloring to Perform Aggressive Procedure Inlining Hakan Aydın David Kaeli Department of Electrical and Computer Engineering Northeastern University Boston, MA, 02115 {haydin,kaeli}@ece.neu.edu

More information

COMS W4115 Programming Languages and Translators Lecture 21: Code Optimization April 15, 2013

COMS W4115 Programming Languages and Translators Lecture 21: Code Optimization April 15, 2013 1 COMS W4115 Programming Languages and Translators Lecture 21: Code Optimization April 15, 2013 Lecture Outline 1. Code optimization strategies 2. Peephole optimization 3. Common subexpression elimination

More information

Principles of Compiler Design

Principles of Compiler Design Principles of Compiler Design Code Generation Compiler Lexical Analysis Syntax Analysis Semantic Analysis Source Program Token stream Abstract Syntax tree Intermediate Code Code Generation Target Program

More information

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall CSE443 Compilers Dr. Carl Alphonce alphonce@buffalo.edu 343 Davis Hall http://www.cse.buffalo.edu/faculty/alphonce/sp17/cse443/index.php https://piazza.com/class/iybn4ndqa1s3ei Announcements Grading survey

More information

Intermediate representation

Intermediate representation Intermediate representation Goals: encode knowledge about the program facilitate analysis facilitate retargeting facilitate optimization scanning parsing HIR semantic analysis HIR intermediate code gen.

More information

Intermediate Representations Part II

Intermediate Representations Part II Intermediate Representations Part II Types of Intermediate Representations Three major categories Structural Linear Hybrid Directed Acyclic Graph A directed acyclic graph (DAG) is an AST with a unique

More information

EECS 583 Class 3 More on loops, Region Formation

EECS 583 Class 3 More on loops, Region Formation EECS 583 Class 3 More on loops, Region Formation University of Michigan September 19, 2016 Announcements & Reading Material HW1 is out Get busy on it!» Course servers are ready to go Today s class» Trace

More information

Office Hours: Mon/Wed 3:30-4:30 GDC Office Hours: Tue 3:30-4:30 Thu 3:30-4:30 GDC 5.

Office Hours: Mon/Wed 3:30-4:30 GDC Office Hours: Tue 3:30-4:30 Thu 3:30-4:30 GDC 5. CS380C Compilers Instructor: TA: lin@cs.utexas.edu Office Hours: Mon/Wed 3:30-4:30 GDC 5.512 Jia Chen jchen@cs.utexas.edu Office Hours: Tue 3:30-4:30 Thu 3:30-4:30 GDC 5.440 January 21, 2015 Introduction

More information

Memory Bandwidth Optimizations for Wide-Bus Machines

Memory Bandwidth Optimizations for Wide-Bus Machines Reprinted from Proceedings of the Hawaii International Conference on System Sciences, January 1993 Memory Bandwidth Optimizations for Wide-Bus Machines Michael A. Alexander, Mark W. Bailey, Bruce R. Childers,

More information

Intermediate Code Generation

Intermediate Code Generation Intermediate Code Generation In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target

More information

William Stallings Computer Organization and Architecture. Chapter 12 Reduced Instruction Set Computers

William Stallings Computer Organization and Architecture. Chapter 12 Reduced Instruction Set Computers William Stallings Computer Organization and Architecture Chapter 12 Reduced Instruction Set Computers Major Advances in Computers(1) The family concept IBM System/360 1964 DEC PDP-8 Separates architecture

More information

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern Introduction WCET of program ILP Formulation Requirement SPM allocation for code SPM allocation for data Conclusion

More information

Topic 9: Control Flow

Topic 9: Control Flow Topic 9: Control Flow COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer 1 The Front End The Back End (Intel-HP codename for Itanium ; uses compiler to identify parallelism)

More information

Model-based Software Development

Model-based Software Development Model-based Software Development 1 SCADE Suite Application Model in SCADE (data flow + SSM) System Model (tasks, interrupts, buses, ) SymTA/S Generator System-level Schedulability Analysis Astrée ait StackAnalyzer

More information

Intermediate Code & Local Optimizations. Lecture 20

Intermediate Code & Local Optimizations. Lecture 20 Intermediate Code & Local Optimizations Lecture 20 Lecture Outline Intermediate code Local optimizations Next time: global optimizations 2 Code Generation Summary We have discussed Runtime organization

More information

Fast Searches for Effective Optimization Phase Sequences

Fast Searches for Effective Optimization Phase Sequences Fast Searches for Effective Optimization Phase Sequences Prasad Kulkarni, Stephen Hines, Jason Hiser David Whalley, Jack Davidson, Douglas Jones Computer Science Department, Florida State University, Tallahassee,

More information

Topic I (d): Static Single Assignment Form (SSA)

Topic I (d): Static Single Assignment Form (SSA) Topic I (d): Static Single Assignment Form (SSA) 621-10F/Topic-1d-SSA 1 Reading List Slides: Topic Ix Other readings as assigned in class 621-10F/Topic-1d-SSA 2 ABET Outcome Ability to apply knowledge

More information

Why Global Dataflow Analysis?

Why Global Dataflow Analysis? Why Global Dataflow Analysis? Answer key questions at compile-time about the flow of values and other program properties over control-flow paths Compiler fundamentals What defs. of x reach a given use

More information

Calvin Lin The University of Texas at Austin

Calvin Lin The University of Texas at Austin Loop Invariant Code Motion Last Time SSA Today Loop invariant code motion Reuse optimization Next Time More reuse optimization Common subexpression elimination Partial redundancy elimination February 23,

More information

Instruction-Level Parallelism. Instruction Level Parallelism (ILP)

Instruction-Level Parallelism. Instruction Level Parallelism (ILP) Instruction-Level Parallelism CS448 1 Pipelining Instruction Level Parallelism (ILP) Limited form of ILP Overlapping instructions, these instructions can be evaluated in parallel (to some degree) Pipeline

More information

Compiler Optimization Techniques

Compiler Optimization Techniques Compiler Optimization Techniques Department of Computer Science, Faculty of ICT February 5, 2014 Introduction Code optimisations usually involve the replacement (transformation) of code from one sequence

More information

Lecture 1 Introduc-on

Lecture 1 Introduc-on Lecture 1 Introduc-on What would you get out of this course? Structure of a Compiler Op9miza9on Example 15-745: Introduc9on 1 What Do Compilers Do? 1. Translate one language into another e.g., convert

More information

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz Compiler Design Fall 2015 Control-Flow Analysis Sample Exercises and Solutions Prof. Pedro C. Diniz USC / Information Sciences Institute 4676 Admiralty Way, Suite 1001 Marina del Rey, California 90292

More information

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications University of Dortmund Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications Robert Pyka * Christoph Faßbach * Manish Verma + Heiko Falk * Peter Marwedel

More information

A main goal is to achieve a better performance. Code Optimization. Chapter 9

A main goal is to achieve a better performance. Code Optimization. Chapter 9 1 A main goal is to achieve a better performance Code Optimization Chapter 9 2 A main goal is to achieve a better performance source Code Front End Intermediate Code Code Gen target Code user Machineindependent

More information

Tuning Programs For Performance

Tuning Programs For Performance Tuning Programs For Performance Don t tune your program unless there is a performance issue Use performance instrumentation tools: prof: performance profiler based on sampling gprof: extension to prof

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

Other Forms of Intermediate Code. Local Optimizations. Lecture 34

Other Forms of Intermediate Code. Local Optimizations. Lecture 34 Other Forms of Intermediate Code. Local Optimizations Lecture 34 (Adapted from notes by R. Bodik and G. Necula) 4/18/08 Prof. Hilfinger CS 164 Lecture 34 1 Administrative HW #5 is now on-line. Due next

More information

Administrative. Other Forms of Intermediate Code. Local Optimizations. Lecture 34. Code Generation Summary. Why Intermediate Languages?

Administrative. Other Forms of Intermediate Code. Local Optimizations. Lecture 34. Code Generation Summary. Why Intermediate Languages? Administrative Other Forms of Intermediate Code. Local Optimizations HW #5 is now on-line. Due next Friday. If your test grade is not glookupable, please tell us. Please submit test regrading pleas to

More information

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11 Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed

More information

CISC / RISC. Complex / Reduced Instruction Set Computers

CISC / RISC. Complex / Reduced Instruction Set Computers Systems Architecture CISC / RISC Complex / Reduced Instruction Set Computers CISC / RISC p. 1/12 Instruction Usage Instruction Group Average Usage 1 Data Movement 45.28% 2 Flow Control 28.73% 3 Arithmetic

More information