What do we have so far? Multi-Cycle Datapath

Transcription:

What do we have so far?
Multi-cycle datapath CPI: R-type = 4, Load = 5, Store = 4, Branch = 3
Only one instruction is being processed in the datapath at a time.
How can CPI be lowered further?
#1 Lec #8 Spring 2000 4-11-2000

Pipelining
Pipelining is a CPU implementation technique where multiple operations on a number of instructions are overlapped. The next instruction is fetched in the next cycle without waiting for the current instruction to complete.
An instruction execution pipeline involves a number of steps, where each step completes one part of an instruction. Each step is called a pipeline stage or a pipeline segment.
The stages or steps are connected one to the next to form a pipeline: instructions enter at one end, progress through the stages, and exit at the other end when completed.
Pipeline throughput: the instruction completion rate of the pipeline, determined by how often an instruction exits the pipeline.
The time to move an instruction one step down the line is equal to the machine cycle and is determined by the stage with the longest processing delay.
Pipeline latency: the time required to complete an instruction = Cycle time x Number of pipeline stages.
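The throughput and latency definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `pipeline_timing` and the stage delays passed to it are assumptions, not values taken from the slides.

```python
def pipeline_timing(stage_delays_ns):
    """Return (cycle_time, latency, throughput) for the given stage delays."""
    # The machine cycle is set by the stage with the longest processing delay.
    cycle_time = max(stage_delays_ns)
    # Latency of one instruction = cycle time x number of pipeline stages.
    latency = cycle_time * len(stage_delays_ns)
    # Once the pipeline is full, one instruction exits per cycle.
    throughput = 1.0 / cycle_time
    return cycle_time, latency, throughput


# Hypothetical delays in ns for IF, ID, EX, MEM, WB (assumed for illustration).
cycle, latency, throughput = pipeline_timing([2, 1, 2, 2, 1])
print(cycle, latency, throughput)
```

Note that the unbalanced stages still pay the full cycle time, which is why the latency here is longer than the sum of the individual stage delays.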

Single Cycle vs. Pipelining
Single cycle: lw $1, 100($0), lw $2, 200($0), and lw $3, 300($0) execute one after another, each taking 8 ns (instruction fetch, register read, ALU, data access, register write).
Time for 1000 instructions = 8 x 1000 = 8000 ns
5-stage pipeline: the same three loads start 2 ns apart, and each stage takes one 2 ns cycle.
Time for 1000 instructions = time to fill pipeline + cycle time x 1000 = 8 + 2 x 1000 = 2008 ns
Pipelining speedup = 8000/2008 = 3.98

Pipelining: Design Goals
The length of the machine clock cycle is determined by the time required for the slowest pipeline stage.
An important pipeline design consideration is to balance the length of each pipeline stage.
If all stages are perfectly balanced, then the time per instruction on a pipelined machine (assuming ideal conditions with no stalls) is:
    Time per instruction on unpipelined machine / Number of pipe stages
Under these ideal conditions:
    Speedup from pipelining = the number of pipeline stages = k
    One instruction is completed every cycle: CPI = 1.

From MIPS Multi-cycle Datapath: Five Stages of Load
        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
Load    IF       ID       EX       MEM      WB
1- Instruction Fetch (IF): Fetch the instruction from the Instruction Memory.
2- Instruction Decode (ID): Registers fetch and instruction decode.
3- Execute (EX): Calculate the memory address.
4- Memory (MEM): Read the data from the Data Memory.
5- Write Back (WB): Write the data back to the register file.

Pipelined Processing Representation
Time in clock cycles:
Instruction   1    2    3    4    5    6    7    8    9
I             IF   ID   EX   MEM  WB
I+1                IF   ID   EX   MEM  WB
I+2                     IF   ID   EX   MEM  WB
I+3                          IF   ID   EX   MEM  WB
I+4                               IF   ID   EX   MEM  WB
Time to fill the pipeline: cycles 1 to 4. First instruction, I, completed at the end of cycle 5. Last instruction, I+4, completed at the end of cycle 9.
Pipeline stages: IF = Instruction Fetch, ID = Instruction Decode, EX = Execution, MEM = Memory Access, WB = Write Back
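The staggered chart for this slide can be generated rather than drawn by hand. The following is a minimal sketch, assuming an ideal pipeline with no stalls; `pipeline_chart` and `format_chart` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage held in clock cycle c + 1."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            # Instruction i enters stage s in cycle i + s + 1 (no stalls assumed).
            row[i + s] = name
        rows.append(row)
    return rows


def format_chart(rows):
    """Render the rows as fixed-width text, labelled I, I+1, I+2, ..."""
    lines = []
    for i, row in enumerate(rows):
        label = "I" if i == 0 else f"I+{i}"
        cells = "".join(f"{cell:<4}" for cell in row).rstrip()
        lines.append(f"{label:<4}{cells}")
    return "\n".join(lines)


print(format_chart(pipeline_chart(5)))
```

For five instructions the chart spans 5 + 5 - 1 = 9 clock cycles, matching the cycle count on the slide.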

Pipelined Processing Representation
[Figure: six instructions, each passing through IF ID EX MEM WB and each starting one cycle after the previous one. Horizontal axis: Time. Vertical axis: Program Flow.]

Single Cycle, Multi-cycle, vs. Pipeline
Single Cycle Implementation (8 ns clock):
  Cycle 1: Load
  Cycle 2: Store, with 2 ns wasted
Multiple Cycle Implementation (Cycle 1 to Cycle 10):
  Load:   IF ID EX MEM WB
  Store:  IF ID EX MEM
  R-type: IF
Pipeline Implementation:
  Load:   IF ID EX MEM WB
  Store:     IF ID EX MEM WB
  R-type:       IF ID EX MEM WB

Single Cycle, Multi-cycle, Pipeline: Performance Comparison Example
For 1000 instructions, execution time:
Single Cycle Machine: 8 ns/cycle x 1 CPI x 1000 inst = 8000 ns
Multicycle Machine: 2 ns/cycle x 4.6 CPI (due to inst mix) x 1000 inst = 9200 ns
Ideal pipelined machine, 5-stages: 2 ns/cycle x (1 CPI x 1000 inst + 4 cycle fill) = 2008 ns
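The three execution-time figures in this comparison follow from one formula each. The sketch below recomputes them in Python; the function names are illustrative, and the default cycle times and CPI are the ones quoted on the slide.

```python
def single_cycle_time(n, cycle_ns=8):
    """Every instruction takes one long cycle (CPI = 1)."""
    return cycle_ns * 1 * n


def multicycle_time(n, cycle_ns=2, avg_cpi=4.6):
    """Short cycle, but several cycles per instruction on average."""
    return cycle_ns * avg_cpi * n


def pipelined_time(n, cycle_ns=2, stages=5):
    """Ideal pipeline: one instruction per cycle plus (stages - 1) fill cycles."""
    return cycle_ns * (n + (stages - 1))


n = 1000
print(single_cycle_time(n), multicycle_time(n), pipelined_time(n))
print(round(single_cycle_time(n) / pipelined_time(n), 2))  # speedup over single cycle
```

As `n` grows, the fill cycles matter less and the speedup over the single-cycle machine approaches 8 ns / 2 ns = 4.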

MIPS Pipeline Stage Identification
IF: Instruction fetch
ID: Instruction decode / register file read
EX: Execute / address calculation
MEM: Memory access
WB: Write back
[Figure: the single-cycle MIPS datapath (PC, instruction memory, registers, sign extend, ALU, data memory, adders, shift left 2, multiplexers) divided into the five stages.]
What is needed to divide the datapath into pipeline stages?

MIPS: An Initial Pipelined Datapath
[Figure: the MIPS datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB separating the five stages: IF (Instruction Fetch), ID (Instruction Decode), EX (Execution), MEM (Memory), WB (Write Back).]
Can you find a problem even if there are no dependencies?
What instructions can we execute to manifest the problem?

A Corrected Pipelined Datapath
[Figure: the corrected MIPS pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB separating the five stages: IF (Instruction Fetch), ID (Instruction Decode), EX (Execution), MEM (Memory), WB (Write Back).]

Representing Pipelines Graphically
Time (in clock cycles)  CC 1   CC 2   CC 3   CC 4   CC 5   CC 6
lw  $10, 20($1)         IM     Reg    ALU    DM     Reg
sub $11, $2, $3                IM     Reg    ALU    DM     Reg
Can help with answering questions like:
How many cycles does it take to execute this code?
What is the ALU doing during cycle 4?
Use this representation to help understand datapaths.

Adding Pipeline Control Points
[Figure: the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) annotated with its control lines: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, and MemtoReg. Instruction [15-0] is sign-extended from 16 to 32 bits, its low 6 bits feed the ALU control, and RegDst selects between Instruction [20-16] and Instruction [15-11] as the write register.]

Pipeline Control
Pass the needed control signals along from one stage to the next as the instruction travels through the pipeline, just like the data.

             Execution/Address Calculation   Memory access             Write-back
             stage control lines             stage control lines       stage control lines
Instruction  RegDst ALUOp1 ALUOp0 ALUSrc     Branch MemRead MemWrite   RegWrite MemtoReg
R-format     1      1      0      0          0      0       0          1        0
lw           0      0      0      1          0      1       0          1        1
sw           X      0      0      1          0      0       1          0        X
beq          X      0      1      0          1      0       0          0        X

[Figure: the Control unit output is split into WB, M, and EX groups in ID/EX; the WB and M groups continue into EX/MEM, and the WB group continues into MEM/WB.]
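One way to picture control signals travelling with the instruction is a bundle that shrinks as each stage consumes its group. The sketch below is illustrative only: it uses the standard MIPS textbook signal names, the zero entries are filled in as an assumption where the slide shows no value, and `advance` is a hypothetical helper.

```python
# Control values per instruction class, grouped by the stage that consumes them.
# "X" marks a don't-care. Zero entries are assumed from the standard MIPS table.
CONTROL = {
    "R-format": {"EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw":       {"EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
                 "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw":       {"EX": {"RegDst": "X", "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
                 "WB": {"RegWrite": 0, "MemtoReg": "X"}},
    "beq":      {"EX": {"RegDst": "X", "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
                 "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 0, "MemtoReg": "X"}},
}


def advance(bundle, consumed_stage):
    """Drop the group used in this stage; the rest moves to the next pipeline register."""
    return {stage: sig for stage, sig in bundle.items() if stage != consumed_stage}


id_ex = CONTROL["lw"]             # written into ID/EX at the end of decode
ex_mem = advance(id_ex, "EX")     # EX group consumed in the execute stage
mem_wb = advance(ex_mem, "MEM")   # MEM group consumed in the memory stage
print(sorted(id_ex), sorted(ex_mem), sorted(mem_wb))
```

Each pipeline register therefore only needs to store the control bits for the stages still ahead of the instruction.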

Pipeline Control
The Main Control generates the control signals during Reg/Dec.
Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later.
Control signals for Mem (MemWr, Branch) are used 2 cycles later.
Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later.
[Figure: the Main Control output passes from the IF/ID register through the ID/Ex, Ex/Mem, and Mem/Wb registers. ExtOp, ALUSrc, ALUOp, and RegDst are consumed in Exec; MemWr and Branch in Mem; MemtoReg and RegWr in Wr.]

Pipelined Datapath with Control Added
[Figure: the pipelined datapath with the Control unit in the ID stage. The WB, M, and EX control fields are carried in ID/EX, the WB and M fields in EX/MEM, and the WB field in MEM/WB. Control lines shown: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, MemtoReg.]
Target address of branch determined in MEM.

Basic Performance Issues In Pipelining
Pipelining increases the CPU instruction throughput: the number of instructions completed per unit time. Under ideal conditions, instruction throughput is one instruction per machine cycle, or CPI = 1.
Pipelining does not reduce the execution time of an individual instruction: the time needed to complete all processing steps of an instruction (also called instruction completion latency).
It usually slightly increases the execution time of each instruction over unpipelined implementations, due to the increased control overhead of the pipeline and pipeline stage register delays.

Pipelining Performance Example
Example: For an unpipelined machine: Clock cycle = 10 ns, 4 cycles for ALU operations and branches and 5 cycles for memory operations, with instruction frequencies of 40%, 20% and 40%, respectively.
If pipelining adds 1 ns to the machine clock cycle, then the speedup in instruction execution from pipelining is:
Non-pipelined average instruction execution time = Clock cycle x Average CPI
    = 10 ns x ((40% + 20%) x 4 + 40% x 5) = 10 ns x 4.4 = 44 ns
In the pipelined implementation, five stages are used with an average instruction execution time of: 10 ns + 1 ns = 11 ns
Speedup from pipelining = Instruction time unpipelined / Instruction time pipelined = 44 ns / 11 ns = 4 times
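The arithmetic of this example can be checked with a short Python sketch. The helper name `average_cpi` is an assumption; the frequencies, cycle counts, and clock values are the ones stated in the example.

```python
def average_cpi(mix):
    """mix: iterable of (frequency, cycles) pairs whose frequencies sum to 1."""
    return sum(freq * cycles for freq, cycles in mix)


# ALU operations 40% x 4 cycles, branches 20% x 4 cycles, memory operations 40% x 5 cycles.
mix = [(0.4, 4), (0.2, 4), (0.4, 5)]

unpipelined_ns = 10 * average_cpi(mix)   # 10 ns clock x average CPI
pipelined_ns = 10 + 1                    # one instruction per (10 + 1) ns cycle
speedup = unpipelined_ns / pipelined_ns

print(round(unpipelined_ns, 2), pipelined_ns, round(speedup, 2))
```

The speedup is below the ideal factor of five because the pipeline registers lengthen the clock and the unpipelined machine already averages fewer than five cycles per instruction.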

Pipeline Hazards
Hazards are situations in pipelining which prevent the next instruction in the instruction stream from executing during the designated clock cycle, resulting in one or more stall cycles.
Hazards reduce the ideal speedup gained from pipelining and are classified into three classes:
Structural hazards: Arise from hardware resource conflicts when the available hardware cannot support all possible combinations of instructions.
Data hazards: Arise when an instruction depends on the results of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
Control hazards: Arise from the pipelining of conditional branches and other instructions that change the PC.

Structural Hazards
In pipelined machines, overlapped instruction execution requires pipelining of functional units and duplication of resources to allow all possible combinations of instructions in the pipeline.
If a resource conflict arises because a hardware resource is required by more than one instruction in a single cycle, and one or more such instructions cannot be accommodated, then a structural hazard has occurred. For example:
  when a machine has only one register file write port, or
  when a pipelined machine has a shared single-memory pipeline for data and instructions.
Resolution: stall the pipeline for one cycle for register writes or memory data access.

Structural Hazard Example: Single Memory For Instructions & Data
[Figure: a Load followed by Instr 1 to Instr 4, each passing through Mem, Reg, ALU, Mem, Reg. Horizontal axis: Time (clock cycles). Vertical axis: Instr. Order. The data access of the Load and the fetch of a later instruction need the single memory in the same cycle.]
Detection is easy in this case (right half highlight means read, left half write).

Data Hazards Example
Problem with starting the next instruction before the first is finished.
Data dependencies here that go backward in time create data hazards.
Time (in clock cycles)  CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8   CC 9
Value of register $2:   10     10     10     10     10/-20 -20    -20    -20    -20
sub $2, $1, $3          IM     Reg    ALU    DM     Reg
and $12, $2, $5                IM     Reg    ALU    DM     Reg
or  $13, $6, $2                       IM     Reg    ALU    DM     Reg
add $14, $2, $2                              IM     Reg    ALU    DM     Reg
sw  $15, 100($2)                                    IM     Reg    ALU    DM     Reg

Data Hazard Resolution: Stall Cycles
Stall the pipeline by a number of cycles. The control unit must detect the need to insert stall cycles. In this case two stall cycles are needed.
Time (in clock cycles)  CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8   CC 9   CC 10  CC 11
Value of register $2:   10     10     10     10     10/-20 -20    -20    -20    -20    -20    -20
sub $2, $1, $3          IM     Reg    ALU    DM     Reg
and $12, $2, $5                IM     STALL  STALL  Reg    ALU    DM     Reg
or  $13, $6, $2                       STALL  STALL  IM     Reg    ALU    DM     Reg
add $14, $2, $2                                            IM     Reg    ALU    DM     Reg
sw  $15, 100($2)                                                  IM     Reg    ALU    DM     Reg

Performance of Pipelines with Stalls
Hazards in pipelines may make it necessary to stall the pipeline by one or more cycles, thus degrading performance from the ideal CPI of 1.
    CPI pipelined = Ideal CPI + Pipeline stall clock cycles per instruction
If pipelining overhead is ignored and we assume that the stages are perfectly balanced, then:
    Speedup = CPI unpipelined / (1 + Pipeline stall cycles per instruction)
When all instructions take the same number of cycles, equal to the number of pipeline stages, then:
    Speedup = Pipeline depth / (1 + Pipeline stall cycles per instruction)
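The two relations above translate directly into code. This is a minimal sketch under the slide's stated assumptions (no pipelining overhead, perfectly balanced stages); the function names and the sample stall rate of 0.25 are illustrative.

```python
def cpi_pipelined(stall_cycles_per_instr, ideal_cpi=1.0):
    """CPI pipelined = ideal CPI + pipeline stall cycles per instruction."""
    return ideal_cpi + stall_cycles_per_instr


def speedup(cpi_unpipelined, stall_cycles_per_instr):
    """Speedup = CPI unpipelined / (1 + stall cycles per instruction)."""
    return cpi_unpipelined / (1.0 + stall_cycles_per_instr)


# With every instruction taking 5 cycles unpipelined, CPI unpipelined equals the depth.
depth = 5
print(speedup(depth, 0.0))    # no stalls: speedup equals the pipeline depth
print(speedup(depth, 0.25))   # hypothetical 0.25 stall cycles per instruction
```

Even a quarter of a stall cycle per instruction removes a fifth of the ideal speedup in this setting.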

Data Hazard Resolution: Compiler Scheduling
The compiler can guarantee that no data hazards exist by re-ordering instructions and/or adding NOP instructions where needed.
For the previous example:
    sub $2, $1, $3
    nop
    nop
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
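The NOP-padding half of this idea can be sketched as a small pass over the instruction list. This is not a real compiler scheduler: `insert_nops` is a hypothetical helper, each instruction is hand-annotated with the register it writes and the registers it reads, and the required gap of two instructions is taken from the two stall cycles in the previous example.

```python
def insert_nops(program, min_gap=2):
    """program: list of (text, dest, sources) tuples.

    Returns the instruction texts with nops added so that at least min_gap
    instructions separate a register write from any later read of it.
    """
    out = []           # scheduled instruction texts, nops included
    last_write = {}    # register name -> index in `out` of its latest writer
    for text, dest, sources in program:
        earliest = len(out)  # first slot this instruction could occupy
        for reg in sources:
            if reg in last_write:
                earliest = max(earliest, last_write[reg] + min_gap + 1)
        out.extend(["nop"] * (earliest - len(out)))
        out.append(text)
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out


program = [
    ("sub $2, $1, $3",   "$2",  ["$1", "$3"]),
    ("and $12, $2, $5",  "$12", ["$2", "$5"]),
    ("or $13, $6, $2",   "$13", ["$6", "$2"]),
    ("add $14, $2, $2",  "$14", ["$2", "$2"]),
    ("sw $15, 100($2)",  None,  ["$15", "$2"]),
]
print("\n".join(insert_nops(program)))
```

Only the first consumer needs padding here, because the two nops also push every later reader of $2 far enough from the sub.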

Data Hazard Resolution: Forwarding
Observation: why not use temporary results produced by the memory/ALU instead of waiting for them to be written back to the register bank?
Forwarding is a hardware-based technique (also called register bypassing or short-circuiting) that makes use of this observation to eliminate or minimize data hazard stalls.
Using forwarding hardware, the result of an instruction is copied directly from where it is produced (ALU, memory read port, etc.) to where subsequent instructions need it (ALU input register, memory write port, etc.).

Data Hazard Resolution: Forwarding
Register file forwarding to handle read/write to the same register
ALU forwarding

Pipelined Datapath With Forwarding
[Figure: the pipelined datapath with a forwarding unit. ID/EX carries the Rs, Rt, and Rd fields (from IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd). The forwarding unit compares Rs and Rt against EX/MEM.RegisterRd and MEM/WB.RegisterRd and controls the multiplexers on the ALU inputs.]
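The comparisons a forwarding unit makes can be sketched as follows. The slide does not spell out the conditions, so this follows the usual textbook formulation as an assumption; `forward_select` and the dictionary layout are hypothetical.

```python
def forward_select(id_ex_rs, id_ex_rt, ex_mem, mem_wb):
    """Pick the source for each ALU operand of the instruction in EX.

    ex_mem and mem_wb are dicts with 'RegWrite' (bool) and 'Rd' (register number).
    Returns (forward_a, forward_b), each 'REG', 'EX/MEM' or 'MEM/WB'.
    """
    def select(src):
        # The most recent result wins: check the EX/MEM register first.
        if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src:
            return "EX/MEM"
        if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == src:
            return "MEM/WB"
        return "REG"  # no hazard: use the value read from the register file

    return select(id_ex_rs), select(id_ex_rt)


# and $12, $2, $5 is in EX while sub $2, $1, $3 is in MEM.
print(forward_select(2, 5, {"RegWrite": True, "Rd": 2}, {"RegWrite": False, "Rd": 0}))
# One cycle later: or $13, $6, $2 is in EX, and is in MEM, sub is in WB.
print(forward_select(6, 2, {"RegWrite": True, "Rd": 12}, {"RegWrite": True, "Rd": 2}))
```

Register 0 is excluded because it always reads as zero in MIPS, so a pending write to it must never be forwarded.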

Data Hazard Example With Forwarding
Time (in clock cycles)  CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8   CC 9
Value of register $2:   10     10     10     10     10/-20 -20    -20    -20    -20
Value of EX/MEM:        X      X      X      -20    X      X      X      X      X
Value of MEM/WB:        X      X      X      X      -20    X      X      X      X
sub $2, $1, $3          IM     Reg    ALU    DM     Reg
and $12, $2, $5                IM     Reg    ALU    DM     Reg
or  $13, $6, $2                       IM     Reg    ALU    DM     Reg
add $14, $2, $2                              IM     Reg    ALU    DM     Reg
sw  $15, 100($2)                                    IM     Reg    ALU    DM     Reg

A Data Hazard Requiring A Stall
A load followed by an R-type instruction that uses the loaded value.
Time (in clock cycles)  CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8   CC 9
lw  $2, 20($1)          IM     Reg    ALU    DM     Reg
and $4, $2, $5                 IM     Reg    ALU    DM     Reg
or  $8, $2, $6                        IM     Reg    ALU    DM     Reg
add $9, $4, $2                               IM     Reg    ALU    DM     Reg
slt $1, $6, $7                                      IM     Reg    ALU    DM     Reg
Even with forwarding in place, a stall cycle is needed.
This condition must be detected by hardware.

A Data Hazard Requiring A Stall
A load followed by an R-type instruction that uses the loaded value.
Time (in clock cycles)  CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8   CC 9   CC 10
lw  $2, 20($1)          IM     Reg    ALU    DM     Reg
and $4, $2, $5                 IM     Reg    Reg    ALU    DM     Reg
or  $8, $2, $6                        IM     IM     Reg    ALU    DM     Reg
add $9, $4, $2                                      IM     Reg    ALU    DM     Reg
slt $1, $6, $7                                             IM     Reg    ALU    DM     Reg
A bubble occupies the ALU stage in CC 4.
We can stall the pipeline by keeping an instruction in the same stage.

Compiler Scheduling Example
Reorder the instructions to avoid as many pipeline stalls as possible:
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
The data hazard occurs on register $16 between the second lw and the first sw, resulting in a stall cycle.
With forwarding we need to find only one independent instruction to place between them. Swapping the lw instructions works:
    lw $16, 4($2)
    lw $15, 0($2)
    sw $16, 0($2)
    sw $15, 4($2)
Without forwarding we need three independent instructions to place between them, so in addition two nops are added:
    lw $16, 4($2)
    lw $15, 0($2)
    nop
    nop
    sw $16, 0($2)
    sw $15, 4($2)

Datapath With Hazard Detection Unit
A load followed by an instruction that uses the loaded value is detected and a stall cycle is inserted.
[Figure: the pipelined datapath with forwarding plus a hazard detection unit. The hazard detection unit takes ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt as inputs and drives PCWrite and IF/IDWrite. The forwarding unit takes Rs, Rt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd.]
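The load-use check performed by the hazard detection unit can be sketched as below. The condition follows the usual textbook formulation, which is an assumption here since the slide only names the signals; `hazard_unit` and the `InsertBubble` key are hypothetical names.

```python
def hazard_unit(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Detect a load in EX whose destination is read by the instruction in ID."""
    stall = bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))
    return {
        "PCWrite": not stall,       # hold the PC so the same instruction is refetched
        "IF/IDWrite": not stall,    # hold IF/ID so the instruction in ID is decoded again
        "InsertBubble": stall,      # hypothetical name: zero the control signals entering ID/EX
    }


# lw $2, 20($1) is in EX (Rt = 2) while and $4, $2, $5 is in ID (Rs = 2, Rt = 5).
print(hazard_unit(True, 2, 2, 5))
# An R-type instruction in EX never triggers the stall, whatever its registers.
print(hazard_unit(False, 2, 2, 5))
```

After the one-cycle stall, the loaded value is available in MEM/WB and forwarding covers the remaining distance.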

Control Hazards: Example
Three other instructions are in the pipeline before the branch target decision is made, when BEQ is in the MEM stage.
Time (in clock cycles)  CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8   CC 9
40 beq $1, $3, 7        IM     Reg    ALU    DM     Reg
44 and $12, $2, $5             IM     Reg    ALU    DM     Reg
48 or  $13, $6, $2                    IM     Reg    ALU    DM     Reg
52 add $14, $2, $2                           IM     Reg    ALU    DM     Reg
72 lw  $4, 50($7)                                   IM     Reg    ALU    DM     Reg
In the above diagram, we are predicting branch not taken.
Need to add hardware for flushing the three following instructions if we are wrong, losing three cycles.

Reducing Delay of Taken Branches
Next PC of a branch known in MEM stage: costs three lost cycles if taken.
If next PC is known in EX stage, one cycle is saved.
Branch address calculation can be moved to ID stage using a register comparator, costing only one cycle if branch is taken.
[Figure: the pipelined datapath with the branch target adder (shift left 2, sign extend) and a register comparator (=) in the ID stage, an IF.Flush control line, the hazard detection unit, and the forwarding unit.]

Pipeline Performance Example
Assume the following MIPS instruction mix:
Type          Frequency
Arith/Logic   40%
Load          30%   of which 25% are followed immediately by an instruction using the loaded value
Store         10%
Branch        20%   of which 45% are taken
What is the resulting CPI for the pipelined MIPS with forwarding and branch address calculation in ID stage?
CPI = Ideal CPI + Pipeline stall clock cycles per instruction
    = 1 + stalls by loads + stalls by branches
    = 1 + .3 x .25 x 1 + .2 x .45 x 1
    = 1 + .075 + .09
    = 1.165
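The CPI calculation above generalizes to other mixes and penalties. The following sketch is illustrative: `pipeline_cpi` and its parameter names are assumptions, with the default penalties of one cycle matching forwarding plus branch resolution in the ID stage.

```python
def pipeline_cpi(load_freq, load_use_frac, branch_freq, taken_frac,
                 load_penalty=1, branch_penalty=1, ideal_cpi=1.0):
    """CPI = ideal CPI + stall cycles per instruction from loads and taken branches."""
    load_stalls = load_freq * load_use_frac * load_penalty
    branch_stalls = branch_freq * taken_frac * branch_penalty
    return ideal_cpi + load_stalls + branch_stalls


# Mix from the example: 30% loads (25% load-use), 20% branches (45% taken).
cpi = pipeline_cpi(0.30, 0.25, 0.20, 0.45)
print(round(cpi, 3))

# Same mix if a taken branch cost three cycles (branch resolved in the MEM stage).
print(round(pipeline_cpi(0.30, 0.25, 0.20, 0.45, branch_penalty=3), 3))
```

Comparing the two results shows how much of the stall budget comes from the branch penalty alone for this mix.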