CMSC Computer Architecture Lecture 5: Pipelining. Prof. Yanjing Li University of Chicago

Similar documents
Design of Digital Circuits Lecture 14: Pipelining. Prof. Onur Mutlu ETH Zurich Spring April 2018

CMSC Computer Architecture Lecture 2: ISA. Prof. Yanjing Li Department of Computer Science University of Chicago

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

CMSC Computer Architecture Lecture 4: Single-Cycle uarch and Pipelining. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 3: ISA and Introduction to Microarchitecture. Prof. Yanjing Li University of Chicago

CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor Advanced Issues

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design

This Unit: Dynamic Scheduling. Can Hardware Overcome These Limits? Scheduling: Compiler or Hardware. The Problem With In-Order Pipelines

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago

Appendix D. Controller Implementation

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution

CS252 Spring 2017 Graduate Computer Architecture. Lecture 6: Out-of-Order Processors

Arquitectura de Computadores

End Semester Examination CSE, III Yr. (I Sem), 30002: Computer Organization

Elementary Educational Computer

Chapter 4 The Datapath

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Determined by ISA and compiler. Determined by CPU hardware

CMSC Computer Architecture Lecture 15: Multi-Core. Prof. Yanjing Li University of Chicago

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Multiprocessors. HPC Prof. Robert van Engelen

Lecture 8: Data Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Announcements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components

COMPUTER ORGANIZATION AND DESIGN

Data diverse software fault tolerance techniques

Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.

Lecture 3. RTL Design Methodology. Transition from Pseudocode & Interface to a Corresponding Block Diagram

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Switching Hardware. Spring 2018 CS 438 Staff, University of Illinois 1

Pipelining. CSC Friday, November 6, 2015

Instruction and Data Streams

Lecture 7 Pipelining. Peng Liu.

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000.

Recursion. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Review: Method Frames

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1

DESIGN AND ANALYSIS OF LDPC DECODERS FOR SOFTWARE DEFINED RADIO

Uniprocessors. HPC Prof. Robert van Engelen

Lecture 19 Introduction to Pipelining

. Written in factored form it is easy to see that the roots are 2, 2, i,

Computer Graphics Hardware An Overview

Lecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein

Threads and Concurrency in Java: Part 1

Threads and Concurrency in Java: Part 1

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

SPIRAL DSP Transform Compiler:

CIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13

Design of Digital Circuits Lecture 16: Out-of-Order Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018

CMPT 125 Assignment 2 Solutions

EE University of Minnesota. Midterm Exam #1. Prof. Matthew O'Keefe TA: Eric Seppanen. Department of Electrical and Computer Engineering

CS 152 Computer Architecture and Engineering

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Heaps. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015

Lecture 1: Introduction and Fundamental Concepts 1

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

The Simeck Family of Lightweight Block Ciphers

Computer Architecture

(Basic) Processor Pipeline

Computer Architecture Lecture 8: SIMD Processors and GPUs. Prof. Onur Mutlu ETH Zürich Fall October 2017

Chapter 10. Defining Classes. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Course Site: Copyright 2012, Elsevier Inc. All rights reserved.

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Chapter 5: Processor Design Advanced Topics. Microprogramming: Basic Idea

Lecture 2. RTL Design Methodology. Transition from Pseudocode & Interface to a Corresponding Block Diagram

Analysis of Algorithms

Examples and Applications of Binary Search

1. SWITCHING FUNDAMENTALS

COMP2611: Computer Organization. The Pipelined Processor

Pipelining. Maurizio Palesi

Background: Pipelining Basics. Instruction Scheduling. Pipelining Details. Idealized Instruction Data-Path. Last week Register allocation

COSC 6385 Computer Architecture - Pipelining

COMP Parallel Computing. PRAM (1): The PRAM model and complexity measures

Introduction to Computing Systems: From Bits and Gates to C and Beyond 2 nd Edition

Python Programming: An Introduction to Computer Science

Algorithm Design Techniques. Divide and conquer Problem

COMPUTER ORGANIZATION AND DESIGN

Design of Digital Circuits Lecture 17: Out-of-Order, DataFlow, Superscalar Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018

How do we evaluate algorithms?

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Processor (II) - pipelining. Hwansoo Han

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?

Lecture 5. Counting Sort / Radix Sort

Instruction Pipelining

Python Programming: An Introduction to Computer Science

Instruction Pipelining

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Design of Digital Circuits Lecture 21: SIMD Processors II and Graphics Processing Units

Homework 1 Solutions MA 522 Fall 2017

Introduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Isn t It Time You Got Faster, Quicker?

Lecture 10: Pipelined Implementations: Hazards and Resolutions. Instruction Pipeline Reality

ΕΠΛ 605 Εργαστήριο 5. Παναγιώτα Νικολάου 11/10/18. Slides from: Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin

EE260: Digital Design, Spring /16/18. n Example: m 0 (=x 1 x 2 ) is adjacent to m 1 (=x 1 x 2 ) and m 2 (=x 1 x 2 ) but NOT m 3 (=x 1 x 2 )

1 Hazards COMP2611 Fall 2015 Pipelined Processor

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control

Transcription:

CMSC 22200 Computer Architecture Lecture 5: Pipeliig Prof. Yajig Li Uiversity of Chicago

Admiistrative Stuff Lab1 Due toight Lab2: out later today; due 2 weeks from ow Review sessio this Friday Turig award lecture Tomorrow 2

Lecture Outlie Pipeliig basics ad discussios No-ideal pipelie 3

Sigle Cycle uarch: Datapath & Cotrol **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 4

Sigle Cycle uarch: Summary Iefficiet All istructios ru as slow as the slowest istructio Not ecessarily the simplest way to implemet a ISA Sigle-cycle implemetatio of REP MOVS (x86)? Not easy to optimize/improve performace Optimizig the commo case (e.g. commo istructios) does ot work Need to optimize the worst case all the time All resources are ot fully utilized e.g., data memory access ca t overlap with ALU operatio How to do better? 5

Sigle-Cycle, Multi-Cycle, Pipeliig Sigle-cycle: 1 cycle per istructio, log cycle time F D E M W F D E M W Multi-cycle: 5 cycles per istructio, short cycle time F D E M W F D E M W F D E M W Pipelie: 1 cycle per istructio (steady state), short cycle time F D E M W F D E M W F D E M W F D E M W Time 6

Istructio Pipeliig: Basic Idea Pipelie the executio of multiple istructios Idea: Divide the istructio processig ito distict stages of processig Esure there are eough hardware resources to process oe istructio i each stage Process a differet istructio i each stage Istructios cosecutive i program order are processed i cosecutive stages Beefit: Icreases istructio processig throughput Dowside: Start thikig about this 7

Pipeliig Istructio Processig 8

Remember: Istructio Processig Steps 1. Istructio fetch (IF) 2. Istructio decode ad register operad fetch (ID/RF) 3. Execute/Evaluate memory address (EX/AG) 4. Memory operad fetch (MEM) 5. Store/writeback result (WB) 9

Remember the Sigle-Cycle Uarch Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 10

Pipelie Operatio Examples We ll look at load & store Show pipelie usage i a sigle cycle Highlight resources used 11

Review: LEGv8 Sigle-Cycle Datapath **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 12

Addig Pipelie Registers Registers betwee stages to hold iformatio produced i previous cycle Imm E B M AoutW BE IR D PC D A E PC E Aout M PC M MDR W **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 13

IF for Load, Store, Cycle 1 **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 14

ID for Load, Store, Cycle 2 **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 15

EX for Load Cycle 3 **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 16

MEM for Load Cycle 4 **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 17

WB for Load Cycle 5 Wrog register umber **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 18

Corrected Datapath for Load Cycle 5 **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 19

EX for Store **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 20

MEM for Store **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 21

WB for Store **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 22

Pipelie Operatio Examples Cosider the followig istructio seueces LDUR X10, [X1, 40] SUB X11, X2, X3 ADD X12, X3, X4 LDUR X13, [X1, 48] ADD X14, X5, X6 23

Fillig up the Pipelie **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 24

Pipelie: Steady State State of pipelie at the 5th cycles **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 25

Illustratig Pipelie Operatio: Operatio View t 0 t 1 t 2 t 3 t 4 t 5 Ist 0 Ist 1 Ist 2 Ist 3 Ist 4 IF ID IF EX ID IF MEM EX ID IF WB MEM EX ID IF steady state (full pipelie) WB MEM EX ID IF WB MEM EX ID IF WB MEM EX ID IF 26

Illustratig Pipelie Operatio: Resource View t 0 t 1 t 2 t 3 t 4 t 5 t 6 t 7 t 8 t 9 t 10 IF I 0 I 1 I 2 I 3 I 4 I 5 I 6 I 7 I 8 I 9 I 10 ID I 0 I 1 I 2 I 3 I 4 I 5 I 6 I 7 I 8 I 9 EX I 0 I 1 I 2 I 3 I 4 I 5 I 6 I 7 I 8 MEM I 0 I 1 I 2 I 3 I 4 I 5 I 6 I 7 WB I 0 I 1 I 2 I 3 I 4 I 5 I 6 27

Pipelied Cotrol Idetical set of cotrol poits as the sigle-cycle uarch!! **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 28

Pipelied Cotrol Cotrol sigals derived from istructio Decode oce as i sigle-cycle implemetatio Buffer sigals util cosumed What other optios are there to derive pipelie cotrol sigals? **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 29

Pipelied Cotrol + Datapath Note: 1. Reg2Loc==0: istructio[20:16] is selected; ad Reg2Loc==1: istructio[4:0] is selected; 2. istructio[9:5] is the iput to Read register1 **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 30

Performace Aalysis 31

Termiologies ad Defiitios CPI: cycle per istructio IPC: istructio per cycle, which is 1/CPI Executio time of a istructio {CPI} x {clock cycle time} Executio time of a program Iro Law Sum over all istructios [ {CPI} x {clock cycle time} ] {# of istructios} x {average CPI} x {clock cycle time} 32

Examples Remember: executio time of a program Sum over all istructios [ {CPI} x {clock cycle time} ] {# of istructios} x {average CPI} x {clock cycle time} Sigle-cycle uarch CPI = 1, but clock cycle time is log Multi-cycle uarch (with 5 stages) CPI = 5, but clock cycle time is short Pipelied uarch (with 5 stages) CPI = 1 (steady state), clock cycle time same with multi-cycle This is the ideal case 33

Pipeliig: Discussios 34

Pipelied uarch Is this a good partitioig? Why ot 4 or 6 stages? Why ot differet boudaries? **Based o origial figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.] 35

Pipelie Cosideratios How to partitio? How may stages? 36

Pipelie Partitioig: Resource Reuiremet The goal: o shared resources amog differet pipelie stages i.e., No resource is used by more tha 1 stage Otherwise, we have resource cotetio or structural hazard Example: eed to be able to fetch istructios (i IF stage) ad load data (i MEM stage) at the same time Sigle memory iterface ot sufficiet Solutio 1: provide two separate iterfaces via istructio ad data caches Solutio 2:?? 37

How May Pipelie Stages? BW (badwidth), a.k.a. throughput (1/ cycle time) Ideally, seuetial elemets (pipelie registers) do ot impose additioal delays/cost combiatioal logic (F,D,E,M,W) T ps BW=~(1/T) T/2 ps (F,D,E) T/2 ps (M,W) BW=~(2/T) T/3 ps (F,D) T/3 ps (E,M) T/3 ps (M,W) BW=~(3/T) 38

Pipelie Stages ad Impact o Performace Nopipelied versio with delay T BW = 1/(T+S) where S = seuetial elemet delay T ps k-stage pipelied versio BW k-stage = 1 / (T/k +S ) BW max = 1 / (1 gate delay + S ) Seuetial elemet delay reduces BW (switchig overhead betwee stages) T/k ps T/k ps 39

Pipelie Stages ad Impact o HW Cost Nopipelied versio with combiatioal cost G Cost = G+L where L = seuetial elemet cost G gates k-stage pipelied versio Cost k-stage ~= G + Lk Seuetial elemets icrease hardware cost G/k G/k It is critical to balace the tradeoffs i.e., how may stages ad what is doe i each stage 40

Ideal vs. No Ideal Pipelies 41

Properties of A Ideal Pipelie Goal: Icrease throughput with little icrease i cost (hardware cost, i case of istructio processig) Repetitio of idetical operatios The same operatio is repeated o a large umber of differet iputs (e.g., all laudry loads go through the same steps) Uiformly partitioable suboperatios Processig a be evely divided ito uiform-latecy suboperatios (that do ot share resources) Repetitio of idepedet operatios No depedecies betwee repeated operatios Ca you implemet a ideal pipelie for istructio processig? 42

Istructio Pipelie: Not Ideal Idetical operatios... NOT! Þ differet istructios à ot all eed the same stages Forcig differet istructios to go through the same pipe stages à exteral fragmetatio (some pipe stages idle for some istructios) Uiform suboperatios... NOT! Þ differet pipelie stages à ot the same latecy Need to force each stage to be cotrolled by the same clock à iteral fragmetatio (some pipe stages are too fast but all take the same clock cycle time) Idepedet operatios... NOT! Þ istructios are ot idepedet of each other Need to detect ad resolve iter-istructio depedecies to esure the pipelie provides correct results à pipelie stalls (pipelie is ot always movig) 43

Istructio Pipelie: Not Ideal Idetical operatios... NOT! Þ differet istructios à ot all eed the same stages Forcig differet istructios to go through the same pipe stages à exteral fragmetatio (some pipe stages idle for some istructios) Examples Add, Brach: o eed to go through the MEM stage Others? Performace impact? 44

Istructio Pipelie: Not Ideal Uiform suboperatios... NOT! Þ differet pipelie stages à ot the same latecy Need to force each stage to be cotrolled by the same clock à iteral fragmetatio (some pipe stages are too fast but all take the same clock cycle time) 45

No-Uiform Operatios: Laudry Aalogy Time Task order A B C D 6 PM 7 8 9 10 11 12 1 2 AM Time 6 PM 7 8 9 10 11 12 1 2 AM Task order A B C D Based o origial figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] the slowest step decides throughput or cycle time 46

No-Uiform Operatios: Example 200ps 100ps 200ps 200ps 100ps Imm E B M AoutW BE IR D PC D A E PC E Aout M PC M MDR W Based o origial figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] 47

No-Uiform Operatios: Example Program executio order Time (i istructios) lw $1, 100($0) Istructio fetch 2 2004 400 6 600 8 800 1000 1200 1400 160016 180018 Reg ALU Data access Reg lw $2, 200($0) 800ps 8 s Istructio fetch Reg ALU Data access Reg lw $3, 300($0) Program executio Time order (i istructios) lw $1, 100($0) Istructio fetch 800ps 8 s 200 400 600 800 1000 1200 1400 2 4 6 8 10 12 14 Reg ALU Data access Reg Istructio fetch 800ps 8 s... lw $2, 200($0) 200ps 2 s Istructio fetch Reg ALU Data access Reg lw $3, 300($0) 200ps 2 s Istructio fetch Reg ALU Data access Reg 200ps 200ps 200ps 200ps 200ps 2 s 2 s 2 s 2 s 2 s 48

Istructio Pipelie: Not Ideal Idepedet operatios... NOT! Þ istructios are ot idepedet of each other Need to detect ad resolve iter-istructio depedecies to esure the pipelie provides correct results à pipelie stalls (pipelie is ot always movig) 49

Depedecies ad Their Types Also called hazards Two types Data depedecy Cotrol depedecy 50

Data Depedecy Hadlig 51

Data Depedecy Types Flow depedecy r 3 r 1 op r 2 Read-after-Write (RAW) r 5 r 3 op r 4 Ati depedecy r 5 r 3 op r 4 Write-after-Read (WAR) r 3 r 6 op r 7 Output-depedecy r 3 r 1 op r 2 Write-after-Write (WAW) r 5 r 3 op r 4 r 3 r 6 op r 7 52

Data Depedecy Types Flow depedecies always eed to be obeyed because they costitute true depedece o a value Ati ad output depedecies exist due to limited umber of architectural registers They are depedece o a ame, ot a value Ati ad output depedeces are easier to hadle Write to the destiatio i oe stage ad i program order Flow depedeces are more iterestig 53

Ways of Hadlig Flow Depedecies Detect ad wait util value is available i register file Detect ad forward/bypass data to depedet istructio Detect ad elimiate the depedece at the software level No eed for the hardware to detect depedece Predict the eeded value(s), execute speculatively, ad verify Do somethig else (fie-graied multithreadig) No eed to detect 54

Flow Depedecy Example Cosider this seuece: SUB X2, X1,X3 AND X12,X2,X5 OR X13,X2,X6 ADD X14,X2,X2 STUR X15,[X2,#100]

Flow Depedecy Example Time SUB X2, X1, X3 IF ID EX MEM WB AND X12, X2, X5 IF ID EX MEM WB OR ADD X13, X2, X6 X14, X2, X2 IF ID EX MEM? IF ID EX STUR X15, [X2, #100] IF ID SUB writig to X2 ad ADD readig from it i the same cycle Assume iteral forwardig i register file i.e., ADD gets the ew X2 value produced from SUB 56

How to Detect Flow Depedecies i HW? R/I-Type LDUR STUR B IF ID read RF read RF read RF EX MEM WB write RF write RF Istructios I A ad I B (where I A comes before I B ) have RAW depedecy iff ad I B (R/I, LDUR, or STUR) reads a register writte by I A (R/I or LDUR) dist(i A, I B ) < dist(id, WB) = 3 57

Flow Depedecy Check Logic Helper fuctios Op1(I) ad Op2(I) returs the 1 st ad 2 d register operad field of I, respectively Use_Op1(I) returs true if I reuires the 1 st register operads ad the register is ot X31; similarly for Use_Op2(I) Flow depedecy occurs whe or or or (Op1(IR ID )==dest EX ) && use_op1(ir ID ) && RegWrite EX (Op1(IR ID )==dest MEM ) && use_op1(ir ID ) && RegWrite MEM (Op2(IR ID )==dest EX ) && use_op2(ir ID ) && RegWrite EX (Op2(IR ID )==dest MEM ) && use_op2(ir ID ) && RegWrite MEM 58

Resolvig Data Depedece Optio 1: Stall the pipelie (i.e., Isertig bubbles ) t 0 t 1 t 2 t 3 t 4 t 5 Ist h IF ID ALU MEM WB Ist i i IF ID ALU MEM WB Ist j j IF ID ALU ID MEM ALU ID WB MEM ALU WB MEM Ist k IF ID IF ALU ID IF MEM ALU ID WB MEM ALU Ist l IF ID IF ALU ID IF MEM ALU ID IF ID IF ALU ID i: r x _ j: bubble _ r IF ID x dist(i,j)=1 IF Stall = make the depedet istructio j: bubble _ r x dist(i,j)=2 IF j: _ r x dist(i,j)=3 wait util its source data value is available 1. stop all up-stream stages 2. drai all dow-stream stages 59

Resolvig Data Depedece Optio 1: Stall the pipelie (i.e., Isertig bubbles ) t 0 t 1 t 2 t 3 t 4 t 5 Ist h IF ID ALU MEM WB Ist i i i IF ID ALU MEM WB Ist Bubble j (op) j IF ID j ALU ID MEM ALU WB MEM WB Ist Bubble k (op) IF k IF ID j ALU ID MEM ALU WB MEM Ist lj j IF k IF ID j ID ALU ALU ID MEM Ist k k IF IF ID i: r x _ k ID ALU j: bubble _ r IF ID x dist(i,j)=1 IF j: _ r x dist(i,j)=2 IF 60