Lectures 14 & 15: Instruction Scheduling
Saman Amarasinghe, MIT 6.035, Fall 2005

Simple Machine Model

- Instructions are executed in sequence: fetch, decode, execute, store results, one instruction at a time.
- For branch instructions, start fetching from a different location if needed: check the branch condition; the next instruction may come from a new location given by the branch instruction.

Simple Execution Model

- 5-stage pipeline: fetch, decode, execute, memory, writeback (IF DE EX MEM WB).
- Fetch: get the next instruction.
- Decode: figure out what that instruction is.
- Execute: perform the ALU operation (the address calculation, for a memory op).
- Memory: do the memory access, for a memory op.
- Write Back: write the results back.
- [Pipeline diagram: instructions 1 through 5 each pass through IF DE EX MEM WB, overlapped one stage apart in successive cycles.]

From a Simple Machine Model to a Real Machine Model

- Many pipeline stages: Pentium 5, Pentium Pro 10, Pentium IV (130nm) 20, Pentium IV (90nm) 31.
- Different instructions take different amounts of time to execute.

Real Machine Model, cont.

- Most modern processors have multiple execution units (superscalar).
- If the instruction sequence is right, multiple operations will happen in the same cycle, so it is even more important to have the right instruction sequence.
- Hardware stalls the pipeline if an instruction uses a result that is not ready.
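As a rough illustration of why the instruction order matters on such a machine, here is a minimal sketch (not from the lecture) that counts issue cycles, including stall bubbles, on an idealized single-issue, in-order pipeline. The instruction tuples and the latency table are made-up examples.

```python
# Count the cycles an in-order, single-issue pipeline spends issuing a block,
# stalling whenever an operand is not ready yet. Latencies are illustrative.

def issue_cycles(instrs, latency):
    """instrs: list of (opcode, dest, srcs); latency[op] = result latency."""
    ready_at = {}                # register -> first cycle a consumer may issue
    next_issue = 0
    for op, dest, srcs in instrs:
        # stall until every source operand is available
        issue = max([next_issue] + [ready_at.get(r, 0) for r in srcs])
        next_issue = issue + 1
        if dest is not None:
            ready_at[dest] = issue + latency[op]
    return next_issue            # issue slots used, including stall bubbles

example = [("load", "r2", ["r1"]),          # r2 = *(r1 + 4)
           ("add",  "r4", ["r2", "r3"]),    # stalls on the load result
           ("add",  "r5", ["r4", "r4"])]
print(issue_cycles(example, {"load": 3, "add": 1}))   # -> 5, not 3
```

Reordering independent instructions into the slots between a load and its consumer shrinks this count; that is exactly what the scheduling algorithms below aim to do.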

Constraints on Scheduling

- Data dependencies
- Control dependencies
- Resource constraints

Data Dependency Between Instructions

- If two instructions access the same variable, they can be dependent.
- Kinds of dependencies: true (write then read), anti (read then write), output (write then write).
- What to do if two instructions are dependent: the order of execution cannot be reversed, which reduces the possibilities for scheduling.

Computing Dependencies

- For basic blocks, compute dependencies by walking through the instructions.
- Identifying register dependencies is simple: is it the same register?
- For memory accesses it is harder:
  - simple: base1 + offset1 ?= base2 + offset2
  - data dependence analysis: a[i] ?= a[i+1]
  - interprocedural analysis: global ?= parameter
  - pointer alias analysis: p1 ?= p2

Representing Dependencies

- Use a dependence DAG, one per basic block: nodes are instructions, edges represent dependencies.
- Example:

    1: r2 = *(r1 + 4)
    2: r3 = *(r1 + 8)
    3: r4 = r2 + r3
    4: r5 = r2 - 1

  Instruction 3 depends on 1 and 2, and instruction 4 depends on 1.
- Each edge is labeled with a latency: v(i -> j) is the delay required between the initiation times of i and j, minus the execution time required by i.
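The register-only part of this dependence computation fits in a few lines. Below is a minimal sketch assuming three-address instructions over registers only (the Instr tuple and its field names are my own); memory dependences would additionally need the alias and dependence analyses listed above.

```python
# Sketch: build the dependence edges of one basic block, register operands only.

from collections import namedtuple

Instr = namedtuple("Instr", "idx dest srcs")

def build_dependence_dag(block):
    """Return edges (i, j, kind) meaning instruction j must stay after i."""
    edges = []
    last_write = {}               # reg -> index of its most recent writer
    last_reads = {}               # reg -> readers since that write
    for ins in block:
        for r in ins.srcs:        # true (flow) dependence: write -> read
            if r in last_write:
                edges.append((last_write[r], ins.idx, "true"))
        if ins.dest is not None:
            d = ins.dest
            if d in last_write:   # output dependence: write -> write
                edges.append((last_write[d], ins.idx, "output"))
            for r in last_reads.get(d, []):     # anti dependence: read -> write
                edges.append((r, ins.idx, "anti"))
            last_write[d] = ins.idx
            last_reads[d] = []
        for r in ins.srcs:
            last_reads.setdefault(r, []).append(ins.idx)
    return edges

block = [Instr(1, "r2", ["r1"]),        # r2 = *(r1 + 4)
         Instr(2, "r3", ["r1"]),        # r3 = *(r1 + 8)
         Instr(3, "r4", ["r2", "r3"]),  # r4 = r2 + r3
         Instr(4, "r5", ["r2"])]        # r5 = r2 - 1
print(build_dependence_dag(block))      # edges 1->3, 2->3, 1->4 (all true)
```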

Another Example

    1: r2 = *(r1 + 4)
    2: *(r1 + 4) = r3
    3: r4 = r2 + r3
    4: r5 = r2 - 1

- Here instruction 2 stores to memory that the load in instruction 1 may read, so the load and the store cannot be freely reordered.

Control Dependencies and Resource Constraints

- For now, let's only worry about basic blocks.
- For now, let's look at simple pipelines.

List Scheduling Algorithm

- Idea: do a topological sort of the dependence DAG.
- Consider when an instruction can be scheduled without causing a stall.
- Schedule an instruction once it causes no stall and all its predecessors are already scheduled.
- Optimal list scheduling is NP-complete: use heuristics when necessary.

Running example: a basic block of nine x86-64 instructions mixing one-cycle ALU operations (lea, add, inc, and) with multi-cycle loads and stores through the stack pointer. Issued in program order the block stalls repeatedly; the goal is to reorder it so those stall slots are filled.

List Scheduling Algorithm, steps

- Create a dependence DAG of the basic block.
- Topological sort: READY = nodes with no predecessors.
- Loop until READY is empty: schedule each node in READY when it causes no stalling, then update READY.

Heuristics for selecting from the READY list (detailed below, and used in the sketch that follows):

- pick the node with the longest path to a leaf in the dependence graph,
- pick a node with the most immediate successors,
- pick a node that can go to a less busy pipeline (in a superscalar).
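A minimal sketch of this list-scheduling loop for a single-issue pipeline, assuming each DAG edge carries the delay in cycles that must separate the start times of its endpoints; the priority argument stands in for whichever heuristic above is chosen.

```python
# Sketch: list scheduling on one pipeline, no resource constraints.

def list_schedule(nodes, edges, priority):
    """edges: (i, j, delay) = j may start `delay` cycles after i starts."""
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for i, j, d in edges:
        preds[j].append((i, d))
        succs[i].append(j)
    start = {}
    ready = [n for n in nodes if not preds[n]]
    cycle = 0
    while ready:
        # among READY nodes, keep those whose operands arrive by this cycle
        ok = [n for n in ready
              if all(start[p] + d <= cycle for p, d in preds[n])]
        if not ok:
            cycle += 1                    # nothing can issue without stalling
            continue
        n = max(ok, key=priority)         # apply the chosen heuristic
        start[n] = cycle
        ready.remove(n)
        cycle += 1
        for s in succs[n]:                # READY once all predecessors placed
            if s not in ready and s not in start and \
               all(p in start for p, _ in preds[s]):
                ready.append(s)
    return start

# Toy run on the r1..r5 block (loads given a 2-cycle start-to-start delay,
# priority = program order as a tie-break):
print(list_schedule([1, 2, 3, 4], [(1, 3, 2), (2, 3, 2), (1, 4, 2)],
                    priority=lambda n: -n))
# {1: 0, 2: 1, 4: 2, 3: 3}: instruction 4 fills the load-delay slot.
```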

Heuristics for Selection

- Longest path to a leaf in the dependence graph. Algorithm (for node x): if x has no successors, d_x = 0; otherwise d_x = MAX(d_y + c_xy) over all successors y of x.
- Most immediate successors. Algorithm (for node x): f_x = number of successors of x, computed in reverse breadth-first visitation order.

Results

- Annotating the nine-instruction example block with d and f values and scheduling by these priorities gives a 9-cycle schedule, several cycles fewer than the stalled program-order schedule.

Resource Constraints

- Modern machines have many resource constraints.
- Superscalar architectures can run a few operations in parallel, but they have constraints.
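Both priorities come straight from the DAG. A small sketch, assuming succs maps each node to its immediate successors and c gives the edge latencies (these inputs are my own naming, not lecture code):

```python
# Sketch: the two selection priorities described above.

def critical_path_priority(nodes, succs, c):
    """d[x] = longest latency-weighted path from x down to a leaf."""
    d = {}
    def depth(x):
        if x not in d:
            ys = succs.get(x, [])
            d[x] = 0 if not ys else max(depth(y) + c[(x, y)] for y in ys)
        return d[x]
    for x in nodes:
        depth(x)
    return d

def fanout_priority(nodes, succs):
    """f[x] = number of immediate successors of x."""
    return {x: len(succs.get(x, [])) for x in nodes}
```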

Resource Constraints of a Superscalar Processor

- Example machine: one fully pipelined reg-to-reg unit, with all integer operations taking one cycle, in parallel with one fully pipelined memory-to/from-reg unit, where data loads take two cycles and data stores take one cycle.

List Scheduling Algorithm with Resource Constraints

- Represent the superscalar architecture as multiple pipelines; each pipeline represents some resource. Here: one single-cycle reg-to-reg ALU unit and one two-cycle pipelined reg-to/from-memory unit (reservation rows ALUop, MEM1, MEM2).
- Create a dependence DAG of the basic block.
- Topological sort: READY = nodes with no predecessors.
- Loop until READY is empty: let n in READY be the node with the highest priority; schedule n in the earliest slot that satisfies both precedence and resource constraints; update READY.

The next slides step this algorithm through the nine-instruction example block (its loads, stores, lea, add, and inc instructions annotated with d and f priorities) on the two-pipeline machine, as sketched below.
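A sketch of the resource-constrained variant, modelling each functional unit as one issue slot per cycle (fully pipelined), in the spirit of the ALU/MEM machine above. The unit_of map, the latencies, and the toy example at the end are illustrative assumptions.

```python
# Sketch: list scheduling with a per-unit reservation table.

def list_schedule_resources(nodes, edges, unit_of, latency, priority):
    preds = {n: [] for n in nodes}
    for i, j in edges:
        preds[j].append(i)
    start, busy = {}, {}                  # busy[(unit, cycle)] = node issued there
    unscheduled = set(nodes)
    while unscheduled:
        ready = [n for n in unscheduled
                 if all(p in start for p in preds[n])]
        n = max(ready, key=priority)      # highest-priority READY node
        # earliest cycle allowed by the predecessors' latencies
        earliest = max([0] + [start[p] + latency[p] for p in preds[n]])
        cycle = earliest
        while (unit_of[n], cycle) in busy:    # then earliest free slot on its unit
            cycle += 1
        busy[(unit_of[n], cycle)] = n
        start[n] = cycle
        unscheduled.remove(n)
    return start

# Toy run: two 2-cycle loads (MEM unit) feeding two 1-cycle ALU ops.
nodes = [1, 2, 3, 4]
edges = [(1, 3), (2, 3), (1, 4)]
unit_of = {1: "MEM", 2: "MEM", 3: "ALU", 4: "ALU"}
lat = {1: 2, 2: 2, 3: 1, 4: 1}
print(list_schedule_resources(nodes, edges, unit_of, lat,
                              priority=lambda n: -n))
# {1: 0, 2: 1, 4: 2, 3: 3}: the loads share the MEM pipe, the ALU fills in.
```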

Worked example, step by step

- At each step the highest-priority READY node is placed in the earliest slot of its unit (the ALUop row, or the two-stage MEM1/MEM2 memory pipeline) that respects its predecessors' latencies; it is then removed from READY, and any node whose predecessors are now all scheduled is added.
- The READY set shrinks one placement at a time (eventually only instructions 8 and 9 remain, then the last node), while the two reservation tables fill in. The process repeats until READY is empty and the tables hold the final schedule.

Scheduling Across Basic Blocks

- The number of instructions in a basic block is small, so scheduling within one block cannot keep multiple units with long pipelines busy.
- We need to handle control dependence: scheduling constraints across basic blocks, and a scheduling policy.

Moving Across Basic Blocks

- Downward, into an adjacent successor block: is there a path into that block that does not execute the block the instruction came from? If so, the moved instruction would execute when it should not.
- Upward, into an adjacent predecessor block: is there a path out of that block that does not reach the instruction's original block? If so, the moved instruction would again execute when it should not.

Control Dependencies

- Constraints on moving instructions across basic blocks. Example 1: if (...) a = b op c. Hoisting the assignment above the branch is unsafe if a is live on the path where the condition is false.
- Example 2: if (valid address?) d = *(a). Hoisting the load above the test may dereference an invalid address and fault.

Trace Scheduling

- Find the most common trace of basic blocks, using profile information.
- Combine the basic blocks in the trace and schedule them as one block.
- Create clean-up code if the execution goes off-trace.

Large Basic Blocks via Code Duplication

- Create large extended basic blocks by duplicating blocks that have multiple predecessors, then schedule the larger blocks.

Scheduling Loops

- Loop bodies are small, but a lot of time is spent in loops because of the large number of iterations.
- We need better ways to schedule loops.
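The trace-selection step can be sketched as a greedy walk over profiled branch frequencies. The succ_counts structure and the example counts below are assumptions for illustration; combining the blocks and generating the off-trace clean-up code are not shown.

```python
# Sketch: pick the most frequently executed trace from edge profile counts.

def pick_trace(entry, succ_counts):
    """succ_counts[b] = {successor: number of times that edge was taken}."""
    trace, b, seen = [], entry, set()
    while b is not None and b not in seen:   # stop at a loop back-edge
        trace.append(b)
        seen.add(b)
        nexts = succ_counts.get(b, {})
        b = max(nexts, key=nexts.get) if nexts else None
    return trace

cfg_counts = {"A": {"B": 90, "C": 10}, "B": {"D": 90}, "C": {"D": 10}, "D": {}}
print(pick_trace("A", cfg_counts))           # ['A', 'B', 'D']
```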

Loop Example

- Machine: one load/store unit and two arithmetic units (add, branch, multiply), each operation taking a small fixed number of cycles; both kinds of units are pipelined and can initiate one operation each cycle.
- Source code: for i = 1 to N: A[i] = A[i] * b
- Assembly for the loop body: a load of A[i] through (%rdi,%rax) into %r10, a multiply by b (kept in a register), a store back through (%rdi,%rax), an increment of the index in %rax, and the closing branch bge loop.
- Scheduling this body as a single basic block gives 9 cycles per iteration; the multiply latency and the branch leave idle slots.

Loop Unrolling

- Unroll the loop body a few times.
- Pros: creates a much larger basic block for the body, and eliminates a few loop-bounds checks.
- Cons: a much larger program; setup code is needed when the number of iterations is less than the unroll factor; the beginning and end of the schedule can still have unused slots.
- Unrolling the example loop by two brings the schedule to 8 cycles per iteration; a source-level picture appears below.
- Next step: rename registers, using different registers in different iterations.
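At the source level, unrolling the example loop by a factor of two looks like the sketch below: a Python stand-in for the compiled loop, with an explicit remainder loop for the setup/cleanup case mentioned above. A and b follow the lecture's A[i] = A[i] * b example; the function names are mine.

```python
# Sketch: the A[i] = A[i] * b loop, plain and unrolled by two.

def scale(A, b):
    for i in range(len(A)):
        A[i] = A[i] * b

def scale_unrolled2(A, b):
    n = len(A)
    i = 0
    while i + 1 < n:              # two elements per iteration: one bounds
        A[i]     = A[i]     * b   # check and one branch amortized over both
        A[i + 1] = A[i + 1] * b
        i += 2
    while i < n:                  # leftover iteration when n is odd
        A[i] = A[i] * b
        i += 1
```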

Loop Unrolling, cont.

- Rename registers: use different registers in different iterations. In the example, the second copy of the body loads into %rcx instead of %r10, removing the anti and output dependences between the two copies.
- Eliminate unnecessary dependencies: again, use more registers to eliminate true, anti, and output dependencies, and eliminate dependent chains of calculations when possible. Giving each copy its own index register (%rax and %rbx, each stepped by 8) breaks the chain through the single index.
- The resulting schedule reaches roughly 2.5 cycles per iteration.

Software Pipelining

- Try to overlap multiple iterations so that the stall slots are filled.
- Find the steady-state window in which all the instructions of the loop body are executed, but coming from different iterations.
- In the example's steady state, several iterations are in flight at once: one iteration's store, a later iteration's multiply, and an even later iteration's load all issue in the same window. The value of b never changes, but each in-flight iteration needs its own address and value registers, so the same registers can only be reused after that many kernel blocks; the loop body is therefore emitted as that many copies (otherwise register moves would be needed).
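A rough source-level picture of the software-pipelined steady state for the same loop: each kernel iteration stores element i-1, multiplies element i, and loads element i+1, so one iteration's load latency is hidden behind its neighbours' work. This is a Python stand-in with a made-up structure, not code a compiler would emit.

```python
# Sketch: prologue / kernel / epilogue structure of the software-pipelined loop.

def scale_swp(A, b):
    n = len(A)
    if n == 0:
        return
    if n == 1:
        A[0] *= b
        return
    loaded = A[0]                  # prologue: first load
    product = loaded * b           # prologue: first multiply,
    loaded = A[1]                  #   overlapped with the second load
    for i in range(1, n - 1):      # kernel: store(i-1) | mul(i) | load(i+1)
        A[i - 1] = product
        product = loaded * b
        loaded = A[i + 1]
    A[n - 2] = product             # epilogue: drain the last two elements
    A[n - 1] = loaded * b
```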

Software Pipelining, cont.

- Optimal use of resources, but it needs a lot of registers: values from multiple iterations must be kept live.
- Issues with dependencies: a store from one iteration may execute before the loop branch of an earlier iteration has resolved (writing when it should not have); loads and stores are issued out of order, so their dependences must be worked out before reordering them.
- Code generation issues: generate pre-amble and post-amble code, and emit multiple copies of the kernel block so that no register copies are needed.

Register Allocation and Instruction Scheduling

- If register allocation runs before instruction scheduling, it restricts the choices for scheduling. Example: two loads from the stack frame, each feeding an add. If the allocator reuses %rax for both loads, the second load cannot be hoisted past the first add (an anti dependence); if the second load gets its own register (%r10), the two load/add pairs can be interleaved on the ALU and MEM pipelines to hide the load latency.
- If instruction scheduling runs before register allocation, the allocator may spill registers, and the spill code will change the carefully constructed schedule.

Superscalar: Where Have All the Transistors Gone?

- Out-of-order execution: if an instruction stalls, go beyond it and start executing non-dependent instructions. Pros: hardware does the scheduling and tolerates unpredictable latencies. Cons: the instruction window is small.
- Register renaming: if an anti or output dependence on a register stalls the pipeline, use a different hardware register. Pros: avoids anti and output dependencies. Cons: the hardware cannot do the more complex transformations that eliminate dependencies.

Hardware vs. Compiler

- In a superscalar, hardware and compiler scheduling can work hand in hand.
- Hardware can reduce the burden when behavior is not predictable by the compiler.
- The compiler can still greatly enhance performance: it has a large instruction window for scheduling and many program transformations that increase parallelism.
- The compiler is even more critical when there is no hardware support: VLIW machines (Itanium, DSPs).
