IA-64 Compiler Technology

Size: px
Start display at page:

Download "IA-64 Compiler Technology"

Transcription

1 IA-64 Compiler Technology David Sehr, Jay Bharadwaj, Jim Pierce, Priti Shrivastav (speaker), Carole Dulong Microcomputer Software Lab Page-1

2 Introduction IA-32 compiler optimizations Profile Guidance (PGOPTI) Inter-procedural Optimizer (IPO) Machine independent optimizations Vectorizer Machine dependent optimizations (Strength reduction, Code Scheduling) IA-64 compilers are focused on optimizations increasing instruction level parallelism Page-2

3 Compiler Optimizations designed to achieve the best performance on IA-64 Agenda Overview of s IA-64 Compiler Machine independent optimizations IA-64 code samples IA-64 specific optimizations IA-64 Compiler technology is ready, and exploits the architecture Page-3

4 Overview of s IA-64 compiler Compiler Architecture C++ Front End FORTRAN 90 Front End Profile Guidance (PGOPTI) Inter-procedural Optimizer (IPO) High-Level Optimizer (HLO) Global Optimizer (IL0) Code Generator (ECG) Page-4

5 Overview of s IA-64 compiler Global Optimizer (IL0) Loop Unrolling Register Variable Detection Constant Prop. Convert Opt. Strength Reduction Copy Propagation Re-association Redundancy Elim. Dead Store Elim. Dead Code Elim. Page-5

6 Overview of s IA-64 compiler Code generator (ECG) Machine instruction lowering Software Pipelining (rotating registers, loop branches) Predication (parallel compares) Global scheduling (control and data speculation, multi-way branches) Block ordering (branch hints) Global register allocation (register stack, ALAT, UNAT) Function splitting (I cache and TLB locality) Page-6

7 Compiler Optimizations designed to achieve the best performance on IA-64 Agenda Overview of s IA-64 Compiler Machine independent optimizations IA-64 code samples IA-64 specific optimizations Page-7

8 Machine independent optimizations Profile guided optimizations (PGOPTI) Inter-procedural optimizations (IPO) The larger the compilation scope, the better the performance Page-8

9 Machine independent optimizations Profile guided optimizations (PGOPTI) Execution profiling: Profile feedback Compile once with counters inserted. Run the profiled program with a sample input set. Compile again, using the counter values. Its effects: Branch probabilities guide Code generator region. Predication regions. Execution frequencies guide Procedure inlining/partial inlining Code placement Speculation heuristics Profile information key to get best out of predication and speculation techniques Page-9

10 Machine independent optimizations Inter-procedural optimizations (IPO) Procedure Integration: Inlining Copying a function body into a call site. Partial inlining Copy the hot portion of a function into a call site. Remainder becomes a splinter function. Cloning Specializing a function to a specific class of call sites. Page-10 Its effects: Exact interprocedural information for a specific call site. Larger code region to schedule Spreads function boundaries Larger scope enables better scheduling and register allocation

11 Machine independent optimizations Inter-procedural optimizations (IPO) Procedure integration: Inlining BEFORE: void func1() { int i; for (i=0; ) func2(i); } inlining AFTER: void func1() { int i; for (i=0; ) a[i] = 1.0 } void func2(int x) { a[x] = 1.0; } Eliminates function call overhead Page-11

12 Machine independent optimizations Inter-procedural optimizations (IPO) Procedure integration: Partial inlining BEFORE: void func(tree *p) { func2(p); } void func2(tree *p) { if (p->left == NULL) return;. printf( ); } partial inlining Page-12 AFTER: void func(tree *p) { if (p->left == NULL) return; else { splinter(p); } } void splinter(tree *p) {. printf( ); } Avoid unnecessary function calls

13 Machine independent optimizations Inter-procedural optimizations (IPO) Procedure integration: Cloning BEFORE: void func1() { func2(1, n); func2(1, m); func2(i, m); } void func2(int i, int j) { if (i == 0) // do something else // do something else } cloning Page-13 AFTER: void func1() { func2_0(1, n); func2_0(1, m); func2(i, m); } void func2_0(int i, int j) { // do something } Eliminate branches

14 Machine independent optimizations Inter-procedural optimizations (IPO) Inter-procedural Analysis: Alias propagation Mod/ref propagation Constant propagation Page-14 ITS EFFECTS: Alias propagation Indirect call conversion Better disambiguation Better scheduling, speculation Mod/ref propagation Improves registerization Constant propagation Makes constant parameters and globals explicit Remove unnecessary conditional code IPO improves many classical scalar optimizations

15 Compiler Optimizations designed to achieve the best performance on IA-64 Agenda Overview of s IA-64 Compiler Machine independent optimizations IA-64 code samples IA-64 specific optimizations Page-15

16 IA-64 code samples Strength reduction post increments Techniques to leverage IA-64 architecture features Page-16

17 IA-64 code samples Strength Reduction WHEN IT APPLIES: One operand is a loop invariant. Other operand is an induction variable. ITS EFFECTS: Multiplication is replaced by addition. Better chances for postincrement creation. Strength reduction key for optimizations of loops Page-17

18 IA-64 code samples Post-Increment Loads and Stores WHEN IT APPLIES: Memory address is modified after memory reference. ITS EFFECTS: Addition proceeds in parallel with load/store. No extra instructions required for induction variable update. Post increment increases parallelism Page-18

19 IA-64 Code samples Integer Multiplies in Loops (cont.) SOURCE: FUNCTION FUNC(A,B,N) REAL A(N,N), B(N,N) DO I = 1, N A(1,I) = B(1,I) RETURN END Algorithm: B(1,I) = base of B + size of element * (I-1) naive translation ASSEMBLY: r35 = base of B loop: setf.sig f32 = r32 setf.sig f33 = r33 xma f34 = f32,f33 getf.sig r34 = f34 add r36 = r34, r35 ld f34 = [r36] br... Page-19

20 IA-64 code samples Strength Reduction BEFORE: r35 = base of B loop: setf.sig f32 = r32 setf.sig f33 = r33 xma f34 = f32,f33 getf.sig r34 = f34 add r36 = r34, r35 ld f34 = [r36] br... strength reduction Page-20 AFTER: r36 = base of B r35 = stride of B loop: ld f34 = [r36] add r36 = r36, r35 br... Strength reduction replaces multiplication with additions

21 IA-64 code samples Strength Reduction+post increment BEFORE: r36 = base of B r35 = stride of B loop: ld f34 = [r36] cycle 0 add r36 = r36, r35 cycle 1 br... post increment AFTER: r36 = base of B r35 = stride of B loop: ld f34 = [r36],r35 br... Post increment replaces addition Page-21

22 Compiler Optimizations designed to achieve the best performance on IA-64 Agenda Overview of s IA-64 Compiler Machine independent optimizations IA-64 code samples Machine specific optimizations Page-22

23 Machine specific optimizations Predication Software Pipelining Page-23

24 Machine specific optimizations Predication S+=A A < B *p=s S+=B p1,p2=a<b (p1) S+=A (p2) S+=B *p=s Objective Eliminate hard to predict branches. Increase parallelism Reduces critical path A Side effect Reduce register usage (as compared to speculation) Eliminates branches, increases parallelism Page-24

25 Machine specific optimizations Predication with parallel compare A>B B>C Objective Reduce control height Reduce number of branches X Y p1=0 cmp.le.or p1,p0=a,b cmp.le.or p1,p0=b,c (p1) br X Y Page-25 If (a>b && b>c) then Y else X Parallel compare reduces control height

26 Machine specific optimizations Multi way branches with Predication X A>B B>C Y cmp.le p1,p0=a,b cmp.gt p2,p0=b,c (p1) br X (p2) br Y X Use Multi way branches Speculate compare (i.e.move above branch) Avoids predicate initialization More than 1 branch in a single cycle Allows n-way branching If (a>b && b>c) then Y else X Predication and Multi-way increase ILP Page-26

27 Machine specific optimizations Software Pipelining: Loop Example Convert string to uppercase for (i=0, i< len, i++) { if (IS_LOWERCASE(line[i])) newline[i] = CNVT_TO_UPPERCASE(line[i]); else newline[i] = line[i]; } After macro expansion for (i=0, i< len, i++) { if (line[i] >= a && line[i] <= z ) newline[i] = line[i]-32; else newline[i] = line[i]; } Typical integer-type loop Page-27

28 Machine specific optimizations Software Pipelining Overlapping execution of different loop iterations vs. Whole loop computation in one cycle Overlap iterations for parallelism Page-28

29 Machine specific optimizations Software Pipelining Cycle 1 ld ld Input 2 ld cmps 3 ld cmps sub 4 ld cmps sub st Kernel cmps sub st Output Page-29

30 Machine specific optimizations Software Pipelining: introducing Rotating Registers GR , FR can rotate Separate Rotating Register Base for each: GRs, FRs Loop branches decrement all register rotating bases (RRB) Instructions contain a virtual register number RRB + virtual register number = physical register number. Rotating registers make data transfer between stages transparent Page-30

31 Machine specific optimizations Software Pipelining: Pipelined Loop RRB = 0 Kernel code Physical register file + Virtual register loop: ld r34 = xx r34 = xx s1 s2 s3 s4 ld r34 = [ra], 1 cmp p1 = true cmp.and p1 = (r35>96) cmp.and p1 = (r35<123) (p1) sub r36 = r36, 32 st [rb] = r37, 1 br.ctop loop cmp< cmp> sub r35 = xx r36 = xx r35 = xx r36 = xx st r37 = xx r37 = xx Page-31

32 Machine specific optimizations Software Pipelining: Fill the pipe... G o B l u e! Physical register file + RRB = 0 Virtual register ld r34 = G r34 = G Execute prologue stage Kernel code loop: ld r34 = [ra], 1 cmp p1 = true cmp.and p1 = (r35>96) cmp.and p1 = (r35<123) (p1) sub r36 = r36, 32 st [rb] = r37, 1 br.ctop loop Page-32 cmp< cmp> sub st r35 = xx r36 = xx r37 = xx r35 = xx r36 = xx r37 = xx

33 Machine specific optimizations Software Pipelining: Fill the pipe... G o B l u e! Physical register file + RRB = 0 Virtual register ld r34 = G r34 = G Perform a loop branch Decrement lc Rotate registers by decrementing RRB cmp< cmp> r35 = xx r35 = xx sub r36 = xx r36 = xx st r37 = xx r37 = xx Page-33

34 Machine specific optimizations Software Pipelining: Fill the pipe... G o B l u e! Physical register file + RRB = -1 Virtual register ld r33 = o r34 = o Execute prologue stage Kernel code loop: ld r34 = [ra], 1 cmp p1 = true cmp.and p1 = (r35>96) cmp.and p1 = (r35<123) (p1) sub r36 = r36, 32 st [rb] = r37, 1 br.ctop loop Page-34 cmp< cmp> sub st r34 = G r35 = xx r36 = xx r35 = G r36 = xx r37 = xx

35 Machine specific optimizations Software Pipelining: Fill the pipe... G o B l u e! Physical register file + RRB = -2 Virtual register ld r32 = _ r34 = _ Execute prologue stage Kernel code loop: ld r34 = [ra], 1 cmp p1 = true cmp.and p1 = (r35>96) cmp.and p1 = (r35<123) (p1) sub r36 = r36, 32 st [rb] = r37, 1 br.ctop loop Page-35 cmp< cmp> sub st r33 = o r34 = G r35 = xx r35 = o r36 = G r37 = xx

36 Machine specific optimizations Software Pipelining: Execute the Kernel G o B l u e! Physical register file + RRB = -3 Virtual register Execute kernel Whole iteration per cycle Kernel code loop: ld r34 = [ra], 1 cmp p1 = true cmp.and p1 = (r35>96) cmp.and p1 = (r35<123) (p1) sub r36 = r36, 32 st [rb] = r37, 1 br.ctop loop G Page-36 ld cmp< cmp> sub st r37 = B r32 = _ r33 = o r34 = G r34 = B r35 = _ r36 = o r37 = G

37 Machine specific optimizations Software Pipelining: Execute the Kernel G o B l u e! Physical register file + RRB = -4 Virtual register Execute kernel Whole iteration per cycle Kernel code loop: ld r34 = [ra], 1 cmp p1 = true cmp.and p1 = (r35>96) cmp.and p1 = (r35<123) (p1) sub r36 = r36, 32 st [rb] = r37, 1 br.ctop loop G O Page-37 ld cmp< cmp> sub st r36 = l r37 = B r32 = _ r33 = O r34 = l r35 = B r36 = _ r37 = O

38 Machine specific optimizations Software Pipelining: Execute the Kernel G o B l u e! Physical register file + RRB = -5 Virtual register Execute kernel Whole iteration per cycle Kernel code loop: ld r34 = [ra], 1 cmp p1 = true cmp.and p1 = (r35>96) cmp.and p1 = (r35<123) (p1) sub r36 = r36, 32 st [rb] = r37, 1 br.ctop loop G O Page-38 ld cmp< cmp> sub st r35 = u r36 = l r37 = B r32 = _ r34 = u r35 = l r36 = B r37 = _

39 Machine specific optimizations Software Pipelining: IA-64 features that make this possible Full Predication Special branch handling features Register rotation: removes loop copy overhead Traditional architectures use loop unrolling High overhead: extra code for loop body Especially Useful for Integer Code with Small Number of Loop Iterations Page-39

40 Summary IA-64 compilers use existing compiler technologies Inter procedure optimizations to increase the compilation score profile guided optimizations to make efficient use of predication and speculation Compilers take advantage of IA-64 architecture with new compiler technology predication to remove branches, and increase parallelism architecture features allow significant speed up in software pipelined loops Page-40

41 IA-64 Compiler Status C/C++ FORTRAN Microsoft IBM/SCO Linux Novell Sun Win64 Monterey64 IA-64 Linux Modesto Solaris MS, IBM, IBM, EPC Cygnus, EPC Sun SGI, EPC, EPC, Compaq EPC PGI, EPC N/A PGI COBOL Fujitsu IBM N/A Fujitsu JAVA IBM IBM Novell Sun Early access versions planned for Q1 00 Engaging with additional tools vendors in Q3 99 Production compilers in sync with Itanium system availability Page-41

42 Call To Action IA-64 compiler technology is ready and able. Get started on IA-64 today. Review the IA-64 Track software sessions at intel.com/design/ia64/.com/design/ia64/devinfo.htm Getting Software Ready for IA-64 Learn how to get the best performance from your apps Learn what you can do to get started today OSV Breakout Sessions Get the latest OS status from the vendors themselves Learn more details on OS tools availability and optimizations Page-42

HPL-PD A Parameterized Research Architecture. Trimaran Tutorial

HPL-PD A Parameterized Research Architecture. Trimaran Tutorial 60 HPL-PD A Parameterized Research Architecture 61 HPL-PD HPL-PD is a parameterized ILP architecture It serves as a vehicle for processor architecture and compiler optimization research. It admits both

More information

Final CSE 131B Winter 2003

Final CSE 131B Winter 2003 Login name Signature Name Student ID Final CSE 131B Winter 2003 Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Page 7 Page 8 _ (20 points) _ (25 points) _ (21 points) _ (40 points) _ (30 points) _ (25 points)

More information

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Several Common Compiler Strategies Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Basic Instruction Scheduling Reschedule the order of the instructions to reduce the

More information

UCI. Intel Itanium Line Processor Efforts. Xiaobin Li. PASCAL EECS Dept. UC, Irvine. University of California, Irvine

UCI. Intel Itanium Line Processor Efforts. Xiaobin Li. PASCAL EECS Dept. UC, Irvine. University of California, Irvine Intel Itanium Line Processor Efforts Xiaobin Li PASCAL EECS Dept. UC, Irvine Outline Intel Itanium Line Roadmap IA-64 Architecture Itanium Processor Microarchitecture Case Study of Exploiting TLP at VLIW

More information

HY425 Lecture 09: Software to exploit ILP

HY425 Lecture 09: Software to exploit ILP HY425 Lecture 09: Software to exploit ILP Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 4, 2010 ILP techniques Hardware Dimitrios S. Nikolopoulos HY425 Lecture 09: Software to exploit

More information

HY425 Lecture 09: Software to exploit ILP

HY425 Lecture 09: Software to exploit ILP HY425 Lecture 09: Software to exploit ILP Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 4, 2010 Dimitrios S. Nikolopoulos HY425 Lecture 09: Software to exploit ILP 1 / 44 ILP techniques

More information

Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world.

Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world. Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world. Supercharge your PS3 game code Part 1: Compiler internals.

More information

Advanced Computer Architecture

Advanced Computer Architecture ECE 563 Advanced Computer Architecture Fall 2010 Lecture 6: VLIW 563 L06.1 Fall 2010 Little s Law Number of Instructions in the pipeline (parallelism) = Throughput * Latency or N T L Throughput per Cycle

More information

Mike Martell - Staff Software Engineer David Mackay - Applications Analysis Leader Software Performance Lab Intel Corporation

Mike Martell - Staff Software Engineer David Mackay - Applications Analysis Leader Software Performance Lab Intel Corporation Mike Martell - Staff Software Engineer David Mackay - Applications Analysis Leader Software Performance Lab Corporation February 15-17, 17, 2000 Agenda: Key IA-64 Software Issues l Do I Need IA-64? l Coding

More information

Optimising with the IBM compilers

Optimising with the IBM compilers Optimising with the IBM Overview Introduction Optimisation techniques compiler flags compiler hints code modifications Optimisation topics locals and globals conditionals data types CSE divides and square

More information

Tour of common optimizations

Tour of common optimizations Tour of common optimizations Simple example foo(z) { x := 3 + 6; y := x 5 return z * y } Simple example foo(z) { x := 3 + 6; y := x 5; return z * y } x:=9; Applying Constant Folding Simple example foo(z)

More information

VLIW/EPIC: Statically Scheduled ILP

VLIW/EPIC: Statically Scheduled ILP 6.823, L21-1 VLIW/EPIC: Statically Scheduled ILP Computer Science & Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind

More information

Preface. Intel Technology Journal Q4, Lin Chao Editor Intel Technology Journal

Preface. Intel Technology Journal Q4, Lin Chao Editor Intel Technology Journal Preface Lin Chao Editor Intel Technology Journal With the close of the year 1999, it is appropriate that we look forward to Intel's next generation architecture--the Intel Architecture (IA)-64. The IA-64

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

Lecture: Static ILP. Topics: predication, speculation (Sections C.5, 3.2)

Lecture: Static ILP. Topics: predication, speculation (Sections C.5, 3.2) Lecture: Static ILP Topics: predication, speculation (Sections C.5, 3.2) 1 Scheduled and Unrolled Loop Loop: L.D F0, 0(R1) L.D F6, -8(R1) L.D F10,-16(R1) L.D F14, -24(R1) ADD.D F4, F0, F2 ADD.D F8, F6,

More information

Final Exam Review Questions

Final Exam Review Questions EECS 665 Final Exam Review Questions 1. Give the three-address code (like the quadruples in the csem assignment) that could be emitted to translate the following assignment statement. However, you may

More information

Code Merge. Flow Analysis. bookkeeping

Code Merge. Flow Analysis. bookkeeping Historic Compilers Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials

More information

Languages and Compiler Design II IR Code Optimization

Languages and Compiler Design II IR Code Optimization Languages and Compiler Design II IR Code Optimization Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU Spring 2010 rev.: 4/16/2010 PSU CS322 HM 1 Agenda IR Optimization

More information

Lecture 6: Static ILP

Lecture 6: Static ILP Lecture 6: Static ILP Topics: loop analysis, SW pipelining, predication, speculation (Section 2.2, Appendix G) Assignment 2 posted; due in a week 1 Loop Dependences If a loop only has dependences within

More information

Agenda. What is the Itanium Architecture? Terminology What is the Itanium Architecture? Thomas Siebold Technology Consultant Alpha Systems Division

Agenda. What is the Itanium Architecture? Terminology What is the Itanium Architecture? Thomas Siebold Technology Consultant Alpha Systems Division What is the Itanium Architecture? Thomas Siebold Technology Consultant Alpha Systems Division thomas.siebold@hp.com Agenda Terminology What is the Itanium Architecture? 1 Terminology Processor Architectures

More information

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST Chapter 4. Advanced Pipelining and Instruction-Level Parallelism In-Cheol Park Dept. of EE, KAIST Instruction-level parallelism Loop unrolling Dependence Data/ name / control dependence Loop level parallelism

More information

CS553 Lecture Profile-Guided Optimizations 3

CS553 Lecture Profile-Guided Optimizations 3 Profile-Guided Optimizations Last time Instruction scheduling Register renaming alanced Load Scheduling Loop unrolling Software pipelining Today More instruction scheduling Profiling Trace scheduling CS553

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Part 10 Compiler Techniques / VLIW Israel Koren ECE568/Koren Part.10.1 FP Loop Example Add a scalar

More information

Introduction to Compilers

Introduction to Compilers Introduction to Compilers Compilers are language translators input: program in one language output: equivalent program in another language Introduction to Compilers Two types Compilers offline Data Program

More information

Lecture: Static ILP. Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2)

Lecture: Static ILP. Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2) Lecture: Static ILP Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2) 1 Static vs Dynamic Scheduling Arguments against dynamic scheduling: requires complex structures

More information

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism

More information

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1 CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level

More information

CS553 Lecture Dynamic Optimizations 2

CS553 Lecture Dynamic Optimizations 2 Dynamic Optimizations Last time Predication and speculation Today Dynamic compilation CS553 Lecture Dynamic Optimizations 2 Motivation Limitations of static analysis Programs can have values and invariants

More information

Computer Science 146. Computer Architecture

Computer Science 146. Computer Architecture Computer rchitecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 11: Software Pipelining and Global Scheduling Lecture Outline Review of Loop Unrolling Software Pipelining

More information

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

USC 227 Office hours: 3-4 Monday and Wednesday  CS553 Lecture 1 Introduction 4 CS553 Compiler Construction Instructor: URL: Michelle Strout mstrout@cs.colostate.edu USC 227 Office hours: 3-4 Monday and Wednesday http://www.cs.colostate.edu/~cs553 CS553 Lecture 1 Introduction 3 Plan

More information

Code generation and local optimization

Code generation and local optimization Code generation and local optimization Generating assembly How do we convert from three-address code to assembly? Seems easy! But easy solutions may not be the best option What we will cover: Instruction

More information

CS425 Computer Systems Architecture

CS425 Computer Systems Architecture CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order

More information

Group B Assignment 8. Title of Assignment: Problem Definition: Code optimization using DAG Perquisite: Lex, Yacc, Compiler Construction

Group B Assignment 8. Title of Assignment: Problem Definition: Code optimization using DAG Perquisite: Lex, Yacc, Compiler Construction Group B Assignment 8 Att (2) Perm(3) Oral(5) Total(10) Sign Title of Assignment: Code optimization using DAG. 8.1.1 Problem Definition: Code optimization using DAG. 8.1.2 Perquisite: Lex, Yacc, Compiler

More information

Code generation and local optimization

Code generation and local optimization Code generation and local optimization Generating assembly How do we convert from three-address code to assembly? Seems easy! But easy solutions may not be the best option What we will cover: Instruction

More information

計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches

計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches 4.1 Basic Compiler Techniques for Exposing ILP 計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches 吳俊興高雄大學資訊工程學系 To avoid a pipeline stall, a dependent instruction must be

More information

Understanding the IA-64 Architecture

Understanding the IA-64 Architecture Understanding the IA-64 Architecture Gautam Doshi Senior Architect IA-64 Processor Division Corporation August 31, 99 - September 2, 99 Agenda Today s Architecture Challenges IA-64 Architecture Performance

More information

AMD S X86 OPEN64 COMPILER. Michael Lai AMD

AMD S X86 OPEN64 COMPILER. Michael Lai AMD AMD S X86 OPEN64 COMPILER Michael Lai AMD CONTENTS Brief History AMD and Open64 Compiler Overview Major Components of Compiler Important Optimizations Recent Releases Performance Applications and Libraries

More information

Lecture 13 - VLIW Machines and Statically Scheduled ILP

Lecture 13 - VLIW Machines and Statically Scheduled ILP CS 152 Computer Architecture and Engineering Lecture 13 - VLIW Machines and Statically Scheduled ILP John Wawrzynek Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~johnw

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation.

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation. UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation. July 14) (June 2013) (June 2015)(Jan 2016)(June 2016) H/W Support : Conditional Execution Also known

More information

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3) Hardware Speculation and Precise

More information

IA-64 Architecture Innovations

IA-64 Architecture Innovations IA-64 Architecture Innovations Carole Dulong Principal Engineer IA-64 Computer Architect Intel Corporation February 19, 1998 Today s Objectives Provide background on key IA-64 architecture concepts Today

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

The SGI Pro64 Compiler Infrastructure - A Tutorial

The SGI Pro64 Compiler Infrastructure - A Tutorial The SGI Pro64 Compiler Infrastructure - A Tutorial Guang R. Gao (U of Delaware) J. Dehnert (SGI) J. N. Amaral (U of Alberta) R. Towle (SGI) Acknowledgement The SGI Compiler Development Teams The MIPSpro/Pro64

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

Supercomputing in Plain English Part IV: Henry Neeman, Director

Supercomputing in Plain English Part IV: Henry Neeman, Director Supercomputing in Plain English Part IV: Henry Neeman, Director OU Supercomputing Center for Education & Research University of Oklahoma Wednesday September 19 2007 Outline! Dependency Analysis! What is

More information

As the amount of ILP to exploit grows, control dependences rapidly become the limiting factor.

As the amount of ILP to exploit grows, control dependences rapidly become the limiting factor. Hiroaki Kobayashi // As the amount of ILP to exploit grows, control dependences rapidly become the limiting factor. Branches will arrive up to n times faster in an n-issue processor, and providing an instruction

More information

Compiler Optimization

Compiler Optimization Compiler Optimization The compiler translates programs written in a high-level language to assembly language code Assembly language code is translated to object code by an assembler Object code modules

More information

Cell Programming Tips & Techniques

Cell Programming Tips & Techniques Cell Programming Tips & Techniques Course Code: L3T2H1-58 Cell Ecosystem Solutions Enablement 1 Class Objectives Things you will learn Key programming techniques to exploit cell hardware organization and

More information

Code generation and optimization

Code generation and optimization Code generation and timization Generating assembly How do we convert from three-address code to assembly? Seems easy! But easy solutions may not be the best tion What we will cover: Peephole timizations

More information

Computer Science 246 Computer Architecture

Computer Science 246 Computer Architecture Computer Architecture Spring 2009 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Compiler ILP Static ILP Overview Have discussed methods to extract ILP from hardware Why can t some of these

More information

Instruction Scheduling

Instruction Scheduling Instruction Scheduling Michael O Boyle February, 2014 1 Course Structure Introduction and Recap Course Work Scalar optimisation and dataflow L5 Code generation L6 Instruction scheduling Next register allocation

More information

Page # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two?

Page # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two? Exploiting ILP through Software Approaches Venkatesh Akella EEC 270 Winter 2005 Based on Slides from Prof. Al. Davis @ cs.utah.edu Let the Compiler Do it Pros and Cons Pros No window size limitation, the

More information

HW 2 is out! Due 9/25!

HW 2 is out! Due 9/25! HW 2 is out! Due 9/25! CS 6290 Static Exploitation of ILP Data-Dependence Stalls w/o OOO Single-Issue Pipeline When no bypassing exists Load-to-use Long-latency instructions Multiple-Issue (Superscalar),

More information

Compiling for Performance on hp OpenVMS I64. Doug Gordon Original Presentation by Bill Noyce European Technical Update Days, 2005

Compiling for Performance on hp OpenVMS I64. Doug Gordon Original Presentation by Bill Noyce European Technical Update Days, 2005 Compiling for Performance on hp OpenVMS I64 Doug Gordon Original Presentation by Bill Noyce European Technical Update Days, 2005 Compilers discussed C, Fortran, [COBOL, Pascal, BASIC] Share GEM optimizer

More information

(just another operator) Ambiguous scalars or aggregates go into memory

(just another operator) Ambiguous scalars or aggregates go into memory Handling Assignment (just another operator) lhs rhs Strategy Evaluate rhs to a value (an rvalue) Evaluate lhs to a location (an lvalue) > lvalue is a register move rhs > lvalue is an address store rhs

More information

Lecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections )

Lecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections ) Lecture 10: Static ILP Basics Topics: loop unrolling, static branch prediction, VLIW (Sections 4.1 4.4) 1 Static vs Dynamic Scheduling Arguments against dynamic scheduling: requires complex structures

More information

Martin Kruliš, v

Martin Kruliš, v Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal

More information

Program Optimization

Program Optimization Program Optimization Professor Jennifer Rexford http://www.cs.princeton.edu/~jrex 1 Goals of Today s Class Improving program performance o When and what to optimize o Better algorithms & data structures

More information

Administration CS 412/413. Instruction ordering issues. Simplified architecture model. Examples. Impact of instruction ordering

Administration CS 412/413. Instruction ordering issues. Simplified architecture model. Examples. Impact of instruction ordering dministration CS 1/13 Introduction to Compilers and Translators ndrew Myers Cornell University P due in 1 week Optional reading: Muchnick 17 Lecture 30: Instruction scheduling 1 pril 00 1 Impact of instruction

More information

Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S

Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Computer Architectures 521480S Dynamic Branch Prediction Performance = ƒ(accuracy, cost of misprediction) Branch History Table (BHT) is simplest

More information

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville Lecture : Exploiting ILP with SW Approaches Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Basic Pipeline Scheduling and Loop

More information

ECE 4750 Computer Architecture, Fall 2015 T16 Advanced Processors: VLIW Processors

ECE 4750 Computer Architecture, Fall 2015 T16 Advanced Processors: VLIW Processors ECE 4750 Computer Architecture, Fall 2015 T16 Advanced Processors: VLIW Processors School of Electrical and Computer Engineering Cornell University revision: 2015-11-30-13-42 1 Motivating VLIW Processors

More information

Computer Architecture. MIPS Instruction Set Architecture

Computer Architecture. MIPS Instruction Set Architecture Computer Architecture MIPS Instruction Set Architecture Instruction Set Architecture An Abstract Data Type Objects Registers & Memory Operations Instructions Goal of Instruction Set Architecture Design

More information

Data Prefetch and Software Pipelining. Stanford University CS243 Winter 2006 Wei Li 1

Data Prefetch and Software Pipelining. Stanford University CS243 Winter 2006 Wei Li 1 Data Prefetch and Software Pipelining Wei Li 1 Data Prefetch Software Pipelining Agenda 2 Why Data Prefetching Increasing Processor Memory distance Caches do work!!! IF Data set cache-able, able, accesses

More information

Lecture 21. Software Pipelining & Prefetching. I. Software Pipelining II. Software Prefetching (of Arrays) III. Prefetching via Software Pipelining

Lecture 21. Software Pipelining & Prefetching. I. Software Pipelining II. Software Prefetching (of Arrays) III. Prefetching via Software Pipelining Lecture 21 Software Pipelining & Prefetching I. Software Pipelining II. Software Prefetching (of Arrays) III. Prefetching via Software Pipelining [ALSU 10.5, 11.11.4] Phillip B. Gibbons 15-745: Software

More information

Introduction to optimizations. CS Compiler Design. Phases inside the compiler. Optimization. Introduction to Optimizations. V.

Introduction to optimizations. CS Compiler Design. Phases inside the compiler. Optimization. Introduction to Optimizations. V. Introduction to optimizations CS3300 - Compiler Design Introduction to Optimizations V. Krishna Nandivada IIT Madras Copyright c 2018 by Antony L. Hosking. Permission to make digital or hard copies of

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 4A: Instruction Level Parallelism - Static Scheduling Avinash Kodi, kodi@ohio.edu Agenda 2 Dependences RAW, WAR, WAW Static Scheduling Loop-carried Dependence

More information

16.10 Exercises. 372 Chapter 16 Code Improvement. be translated as

16.10 Exercises. 372 Chapter 16 Code Improvement. be translated as 372 Chapter 16 Code Improvement 16.10 Exercises 16.1 In Section 16.2 we suggested replacing the instruction r1 := r2 / 2 with the instruction r1 := r2 >> 1, and noted that the replacement may not be correct

More information

Computer Architecture and Engineering. CS152 Quiz #5. April 23rd, Professor Krste Asanovic. Name: Answer Key

Computer Architecture and Engineering. CS152 Quiz #5. April 23rd, Professor Krste Asanovic. Name: Answer Key Computer Architecture and Engineering CS152 Quiz #5 April 23rd, 2009 Professor Krste Asanovic Name: Answer Key Notes: This is a closed book, closed notes exam. 80 Minutes 8 Pages Not all questions are

More information

Preventing Stalls: 1

Preventing Stalls: 1 Preventing Stalls: 1 2 PipeLine Pipeline efficiency Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls Ideal pipeline CPI: best possible (1 as n ) Structural hazards:

More information

CS152 Computer Architecture and Engineering VLIW, Vector, and Multithreaded Machines

CS152 Computer Architecture and Engineering VLIW, Vector, and Multithreaded Machines CS152 Computer Architecture and Engineering VLIW, Vector, and Multithreaded Machines Assigned April 7 Problem Set #5 Due April 21 http://inst.eecs.berkeley.edu/~cs152/sp09 The problem sets are intended

More information

5008: Computer Architecture

5008: Computer Architecture 5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage

More information

COSC 6385 Computer Architecture - Review for the 2 nd Quiz

COSC 6385 Computer Architecture - Review for the 2 nd Quiz COSC 6385 Computer Architecture - Review for the 2 nd Quiz Fall 2006 Covered topic area End of section 3 Multiple issue Speculative execution Limitations of hardware ILP Section 4 Vector Processors (Appendix

More information

Simone Campanoni Loop transformations

Simone Campanoni Loop transformations Simone Campanoni simonec@eecs.northwestern.edu Loop transformations Outline Simple loop transformations Loop invariants Induction variables Complex loop transformations Simple loop transformations Simple

More information

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009 What Compilers Can and Cannot Do Saman Amarasinghe Fall 009 Optimization Continuum Many examples across the compilation pipeline Static Dynamic Program Compiler Linker Loader Runtime System Optimization

More information

Lecture: Static ILP. Topics: loop unrolling, software pipelines (Sections C.5, 3.2)

Lecture: Static ILP. Topics: loop unrolling, software pipelines (Sections C.5, 3.2) Lecture: Static ILP Topics: loop unrolling, software pipelines (Sections C.5, 3.2) 1 Loop Example for (i=1000; i>0; i--) x[i] = x[i] + s; Source code FPALU -> any: 3 s FPALU -> ST : 2 s IntALU -> BR :

More information

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture VLIW, Vector, and Multithreaded Machines Assigned 3/24/2019 Problem Set #4 Due 4/5/2019 http://inst.eecs.berkeley.edu/~cs152/sp19

More information

ECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 14 Very Long Instruction Word Machines

ECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 14 Very Long Instruction Word Machines ECE 252 / CPS 220 Advanced Computer Architecture I Lecture 14 Very Long Instruction Word Machines Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html

More information

Just-In-Time Compilers & Runtime Optimizers

Just-In-Time Compilers & Runtime Optimizers COMP 412 FALL 2017 Just-In-Time Compilers & Runtime Optimizers Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved.

More information

IF1/IF2. Dout2[31:0] Data Memory. Addr[31:0] Din[31:0] Zero. Res ALU << 2. CPU Registers. extension. sign. W_add[4:0] Din[31:0] Dout[31:0] PC+4

IF1/IF2. Dout2[31:0] Data Memory. Addr[31:0] Din[31:0] Zero. Res ALU << 2. CPU Registers. extension. sign. W_add[4:0] Din[31:0] Dout[31:0] PC+4 12 1 CMPE110 Fall 2006 A. Di Blas 110 Fall 2006 CMPE pipeline concepts Advanced ffl ILP ffl Deep pipeline ffl Static multiple issue ffl Loop unrolling ffl VLIW ffl Dynamic multiple issue Textbook Edition:

More information

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 15 Very Long Instruction Word Machines

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 15 Very Long Instruction Word Machines ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 15 Very Long Instruction Word Machines Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html

More information

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation TDT4255 Lecture 9: ILP and speculation Donn Morrison Department of Computer Science 2 Outline Textbook: Computer Architecture: A Quantitative Approach, 4th ed Section 2.6: Speculation Section 2.7: Multiple

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year & Semester : III/VI Section : CSE-1 & CSE-2 Subject Code : CS2354 Subject Name : Advanced Computer Architecture Degree & Branch : B.E C.S.E. UNIT-1 1.

More information

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer

More information

CS 701. Class Meets. Instructor. Teaching Assistant. Key Dates. Charles N. Fischer. Fall Tuesdays & Thursdays, 11:00 12: Engineering Hall

CS 701. Class Meets. Instructor. Teaching Assistant. Key Dates. Charles N. Fischer. Fall Tuesdays & Thursdays, 11:00 12: Engineering Hall CS 701 Charles N. Fischer Class Meets Tuesdays & Thursdays, 11:00 12:15 2321 Engineering Hall Fall 2003 Instructor http://www.cs.wisc.edu/~fischer/cs703.html Charles N. Fischer 5397 Computer Sciences Telephone:

More information

COMPUTER ORGANIZATION & ARCHITECTURE

COMPUTER ORGANIZATION & ARCHITECTURE COMPUTER ORGANIZATION & ARCHITECTURE Instructions Sets Architecture Lesson 5a 1 What are Instruction Sets The complete collection of instructions that are understood by a CPU Can be considered as a functional

More information

The CPU IA-64: Key Features to Improve Performance

The CPU IA-64: Key Features to Improve Performance The CPU IA-64: Key Features to Improve Performance Ricardo Freitas Departamento de Informática, Universidade do Minho 4710-057 Braga, Portugal freitas@fam.ulusiada.pt Abstract. The IA-64 architecture is

More information

CS 6354: Static Scheduling / Branch Prediction. 12 September 2016

CS 6354: Static Scheduling / Branch Prediction. 12 September 2016 1 CS 6354: Static Scheduling / Branch Prediction 12 September 2016 On out-of-order RDTSC 2 Because of pipelines, etc.: RDTSC can actually take its time measurement after it starts Earlier instructions

More information

Outline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need??

Outline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need?? Outline EEL 7 Graduate Computer Architecture Chapter 3 Limits to ILP and Simultaneous Multithreading! Limits to ILP! Thread Level Parallelism! Multithreading! Simultaneous Multithreading Ann Gordon-Ross

More information

Optimization Prof. James L. Frankel Harvard University

Optimization Prof. James L. Frankel Harvard University Optimization Prof. James L. Frankel Harvard University Version of 4:24 PM 1-May-2018 Copyright 2018, 2016, 2015 James L. Frankel. All rights reserved. Reasons to Optimize Reduce execution time Reduce memory

More information

Hiroaki Kobayashi 12/21/2004

Hiroaki Kobayashi 12/21/2004 Hiroaki Kobayashi 12/21/2004 1 Loop Unrolling Static Branch Prediction Static Multiple Issue: The VLIW Approach Software Pipelining Global Code Scheduling Trace Scheduling Superblock Scheduling Conditional

More information

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING 16 MARKS CS 2354 ADVANCE COMPUTER ARCHITECTURE 1. Explain the concepts and challenges of Instruction-Level Parallelism. Define

More information

Spring 2 Spring Loop Optimizations

Spring 2 Spring Loop Optimizations Spring 2010 Loop Optimizations Instruction Scheduling 5 Outline Scheduling for loops Loop unrolling Software pipelining Interaction with register allocation Hardware vs. Compiler Induction Variable Recognition

More information

Last time: forwarding/stalls. CS 6354: Branch Prediction (con t) / Multiple Issue. Why bimodal: loops. Last time: scheduling to avoid stalls

Last time: forwarding/stalls. CS 6354: Branch Prediction (con t) / Multiple Issue. Why bimodal: loops. Last time: scheduling to avoid stalls CS 6354: Branch Prediction (con t) / Multiple Issue 14 September 2016 Last time: scheduling to avoid stalls 1 Last time: forwarding/stalls add $a0, $a2, $a3 ; zero or more instructions sub $t0, $a0, $a1

More information

Computer Science 160 Translation of Programming Languages

Computer Science 160 Translation of Programming Languages Computer Science 160 Translation of Programming Languages Instructor: Christopher Kruegel Code Optimization Code Optimization What should we optimize? improve running time decrease space requirements decrease

More information

CS 614 COMPUTER ARCHITECTURE II FALL 2005

CS 614 COMPUTER ARCHITECTURE II FALL 2005 CS 614 COMPUTER ARCHITECTURE II FALL 2005 DUE : November 9, 2005 HOMEWORK III READ : - Portions of Chapters 5, 6, 7, 8, 9 and 14 of the Sima book and - Portions of Chapters 3, 4, Appendix A and Appendix

More information

Lecture 7: Static ILP, Branch prediction. Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections )

Lecture 7: Static ILP, Branch prediction. Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections ) Lecture 7: Static ILP, Branch prediction Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections 2.2-2.6) 1 Predication A branch within a loop can be problematic to schedule Control

More information

ARM Assembly Programming II

ARM Assembly Programming II ARM Assembly Programming II Computer Organization and Assembly Languages Yung-Yu Chuang 2007/11/26 with slides by Peng-Sheng Chen GNU compiler and binutils HAM uses GNU compiler and binutils gcc: GNU C

More information

Instruction-Level Parallelism (ILP)

Instruction-Level Parallelism (ILP) Instruction Level Parallelism Instruction-Level Parallelism (ILP): overlap the execution of instructions to improve performance 2 approaches to exploit ILP: 1. Rely on hardware to help discover and exploit

More information