TDT4255 Friday the 21st of October. Real world examples of pipelining? How does pipelining influence instruction


Review, Friday the 21st of October
Real-world examples of pipelining?
How does pipelining influence instruction latency?
How does pipelining influence instruction throughput?
What are the three types of hazard in a processor pipeline?
What are the five stages of the MIPS pipeline?

Today, Friday the 21st of October
4.7 Data Hazards: Forwarding vs. Stalling
4.8 Control Hazards (branches)
4.9 Exceptions

Datapath with control from last week

Data Hazards and Forwarding
There are dependences in the sequence below: the last four instructions all use register $2. Assume that $2 is 10 before the sub instruction and that $1 - $3 is -20. The and instruction should use -20 for $2, but reads 10 from the register file. The or instruction also reads the stale value 10 from the register file. The add and sw instructions read the correct value -20 from the register file.
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

CC 1: sub $2, $1, $3 is fetched. $2 = 10.

CC 2: and $12, $2, $5 is fetched; sub is decoded. $2 = 10.

CC 3: or $13, $6, $2 is fetched; and is decoded; sub is in EX. $2 = 10. The and instruction reads 10 for $2 and stores it in ID/EX.

CC 4: add $14, $2, $2 is fetched; or is decoded; and is in EX; sub is in MEM. $2 = 10. The or instruction reads 10 for $2 and stores it in ID/EX; the and instruction uses 10 in its ALU operation.

CC 5: sw $15, 100($2) is fetched; add is decoded; or is in EX; and is in MEM; sub is in WB. $2 = 10/-20 (sub writes -20 in the first half of the cycle). The add instruction reads the new value -20 for $2 from the register file; or uses 10 in its ALU operation.

CC 6: sw is decoded; add is in EX; or is in MEM; and is in WB. $2 = -20. The sw instruction reads the new value -20 for $2 from the register file; add uses -20 in its ALU operation; and writes register $12 with a value calculated with $2 = 10.
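The cycle-by-cycle argument above reduces to comparing two cycle numbers. Below is a minimal Python sketch, assuming an ideal five-stage pipeline that issues one instruction per cycle with no stalls, and a register file that is written in the first half of a cycle and read in the second half. The function names are hypothetical and chosen only for illustration.

```python
from typing import Optional

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_in_cycle(index: int, cycle: int) -> Optional[str]:
    """Stage occupied by instruction `index` (0-based) in clock cycle `cycle`
    (1-based), or None if it is not fetched yet or has already left."""
    offset = cycle - 1 - index
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def reads_stale_value(producer: int, consumer: int) -> bool:
    """True if `consumer` reads the register file in ID before `producer`
    writes it back in WB. A read in the WB cycle itself sees the new value,
    because the write happens in the first half of that cycle."""
    id_cycle = consumer + 2   # instruction i is fetched in cycle i + 1
    wb_cycle = producer + 5
    return id_cycle < wb_cycle


program = ["sub $2, $1, $3", "and $12, $2, $5", "or $13, $6, $2",
           "add $14, $2, $2", "sw $15, 100($2)"]

for i in range(1, len(program)):
    status = "stale (10)" if reads_stale_value(0, i) else "new (-20)"
    print(f"{program[i]:<18} reads {status} for $2 in CC {i + 2}")
```

Running it reports stale reads for and (CC 3) and or (CC 4), and the new value for add (CC 5) and sw (CC 6), matching the walk-through.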

When is the data needed, and when is it produced?


CC 3: EX/MEM.ALUOut gets the new value for $2 (-20). $2 = 10. The and instruction reads 10 for $2 and stores it in ID/EX.

CC 4: The ALU needs the new $2 value, which is available in EX/MEM. $2 = 10. The or instruction reads 10 for $2 and stores it in ID/EX; the and instruction can use -20 from EX/MEM.ALUOut.

CC 5: The ALU needs the new $2 value, which is now available in MEM/WB. $2 = 10/-20. The add instruction reads the new value -20 for $2 from the register file; or can use -20 from MEM/WB.ALUOut.
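The pattern in CC 3 to CC 5 depends only on how many instructions separate the producer from the consumer. A minimal Python sketch, assuming an ALU producer, full forwarding and no stalls (the function name is hypothetical):

```python
def operand_source(distance: int) -> str:
    """Where an instruction `distance` positions after an ALU instruction
    finds that instruction's result when it reaches EX."""
    if distance < 1:
        raise ValueError("the consumer must come after the producer")
    if distance == 1:
        return "EX/MEM"          # producer is in MEM during the consumer's EX
    if distance == 2:
        return "MEM/WB"          # producer is in WB during the consumer's EX
    return "register file"       # producer wrote back no later than the consumer's ID


program = ["sub $2, $1, $3", "and $12, $2, $5", "or $13, $6, $2",
           "add $14, $2, $2", "sw $15, 100($2)"]

for d in range(1, len(program)):
    print(f"{program[d]:<18} gets $2 from {operand_source(d)}")
```

Distance 3 already works without forwarding because the register file is written in the first half of the cycle in which the consumer reads it.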

How can a need for forwarding be detected?

Detecting hazards: types of hazard conditions
Notation: EX/MEM.RegisterRd (field RegisterRd in the EX/MEM pipeline register), EX/MEM.ALUOut (field ALUOut in the EX/MEM pipeline register).
Hazard conditions:
1a) EX/MEM.RegisterRd = ID/EX.RegisterRs
1b) EX/MEM.RegisterRd = ID/EX.RegisterRt
2a) MEM/WB.RegisterRd = ID/EX.RegisterRs
2b) MEM/WB.RegisterRd = ID/EX.RegisterRt

CC 4: which hazard type applies? sub $2, $1, $3 is in MEM and and $12, $2, $5 is in EX, so EX/MEM.RegisterRd = ID/EX.RegisterRs = $2: condition 1a).

CC 5: which hazard type applies? sub $2, $1, $3 is in WB and or $13, $6, $2 is in EX, so MEM/WB.RegisterRd = ID/EX.RegisterRt = $2: condition 2b).

Hazard detection continued (R-type)
Detection is performed in the EX stage.
There is no hazard if the previous instruction will not write its result: RegWrite for the earlier instruction must be asserted.
$0 is always 0, so a write to $0 does not create dependences.
1a) EX/MEM.RegWrite and EX/MEM.RegisterRd ≠ 0 and EX/MEM.RegisterRd = ID/EX.RegisterRs
1b) EX/MEM.RegWrite and EX/MEM.RegisterRd ≠ 0 and EX/MEM.RegisterRd = ID/EX.RegisterRt
2a) MEM/WB.RegWrite and MEM/WB.RegisterRd ≠ 0 and MEM/WB.RegisterRd = ID/EX.RegisterRs
2b) MEM/WB.RegWrite and MEM/WB.RegisterRd ≠ 0 and MEM/WB.RegisterRd = ID/EX.RegisterRt
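The four conditions can be evaluated directly. This is a minimal Python sketch; the classes `WriteInfo` and `Operands` are hypothetical stand-ins for the few pipeline-register fields the test uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WriteInfo:
    """Fields of EX/MEM or MEM/WB used by the hazard test."""
    reg_write: bool = False
    register_rd: int = 0


@dataclass
class Operands:
    """Source register numbers carried in ID/EX."""
    register_rs: int
    register_rt: int


def hazard_conditions(ex_mem: WriteInfo, mem_wb: WriteInfo,
                      id_ex: Operands) -> List[str]:
    """Return the labels (1a, 1b, 2a, 2b) of all conditions that hold."""
    found = []
    for prefix, stage in (("1", ex_mem), ("2", mem_wb)):
        # No hazard unless the earlier instruction writes a register other than $0.
        if not stage.reg_write or stage.register_rd == 0:
            continue
        if stage.register_rd == id_ex.register_rs:
            found.append(prefix + "a")
        if stage.register_rd == id_ex.register_rt:
            found.append(prefix + "b")
    return found


# CC 4: sub $2, $1, $3 in MEM; and $12, $2, $5 in EX; nothing older in WB.
print(hazard_conditions(WriteInfo(True, 2), WriteInfo(), Operands(2, 5)))
# CC 5: and $12, ... in MEM; sub $2, ... in WB; or $13, $6, $2 in EX.
print(hazard_conditions(WriteInfo(True, 12), WriteInfo(True, 2), Operands(6, 2)))
```

The two calls reproduce the answers for CC 4 (1a) and CC 5 (2b).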

Sequence with forwarding
The dependences now run from the pipeline registers to the inputs of the ALU, so the required data exists in time for later instructions. It is possible to supply the inputs to the ALU needed by the and instruction and the or instruction by forwarding the results found in the pipeline registers.

Adding forwarding logic
If the inputs to the ALU can be taken from any pipeline register, the proper data can be forwarded. By adding multiplexers to the inputs of the ALU, the pipeline can run at full speed in the presence of dependences.

We must not forget the immediate values: the ALUSrc multiplexer still has to choose between the (possibly forwarded) register value and the sign-extended immediate.

Control lines from the forwarding unit
ForwardA = 00 (source ID/EX): ALU operand A comes from the register file.
ForwardA = 10 (source EX/MEM): ALU operand A is forwarded from the previous ALU result.
ForwardA = 01 (source MEM/WB): ALU operand A is forwarded from a memory read or an earlier ALU result.
ForwardB = 00 (source ID/EX): ALU operand B comes from the register file.
ForwardB = 10 (source EX/MEM): ALU operand B is forwarded from the previous ALU result.
ForwardB = 01 (source MEM/WB): ALU operand B is forwarded from a memory read or an earlier ALU result.
1a) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 10
1b) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB <= 10
2a) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 01
2b) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB <= 01

CC 1: add $1, $1, $2 is fetched. Sequence: add $1, $1, $2; add $1, $1, $3; add $1, $1, $4; add $1, $1, $5.

CC 2: add $1, $1, $3 is fetched; add $1, $1, $2 is decoded.

CC 3: add $1, $1, $4 is fetched; add $1, $1, $3 is decoded and reads the old value of $1 from the register file; add $1, $1, $2 is in EX.

CC 4: add $1, $1, $5 is fetched; add $1, $1, $4 is decoded and reads the old value of $1 from the register file; add $1, $1, $3 is in EX and gets the forwarded value from EX/MEM (condition 1a: ForwardA <= 10); add $1, $1, $2 is in MEM.

CC 5: add $1, $1, $4 is in EX. Both condition 1a) (add $1, $1, $3 in MEM) and condition 2a) (add $1, $1, $2 in WB) are true. The correct operand is the most recent result, held in EX/MEM, so condition 2a) must not apply when 1a) does.

Control lines from the forwarding unit, revised for the double data hazard (the control table is unchanged)
1a) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 10
1b) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB <= 10
2a) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 01
2b) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRt) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB <= 01
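The revised conditions can be sketched in Python as a forwarding unit plus one of the ALU input multiplexers. The function names and the string encoding of the 2-bit codes are assumptions for illustration. Instead of the simplified term EX/MEM.RegisterRd ≠ ID/EX.RegisterRs, this sketch gives EX/MEM priority only when 1a/1b actually fires; the two readings differ only when the EX/MEM instruction does not write a register (or writes $0) but carries a matching rd field, in which case the sketch still forwards from MEM/WB.

```python
from typing import Tuple


def forwarding_unit(ex_mem_reg_write: bool, ex_mem_rd: int,
                    mem_wb_reg_write: bool, mem_wb_rd: int,
                    id_ex_rs: int, id_ex_rt: int) -> Tuple[str, str]:
    """Return (ForwardA, ForwardB) as the 2-bit codes of the control table."""
    def select(src: int) -> str:
        # Checking EX/MEM first gives the most recent result priority.
        if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src:
            return "10"
        if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src:
            return "01"
        return "00"
    return select(id_ex_rs), select(id_ex_rt)


def alu_input(forward: str, register_value: int,
              ex_mem_alu_out: int, mem_wb_value: int) -> int:
    """Three-input multiplexer in front of one ALU input."""
    return {"00": register_value, "10": ex_mem_alu_out, "01": mem_wb_value}[forward]


# CC 5 of the add $1 sequence: add $1, $1, $4 in EX, add $1, $1, $3 in MEM,
# add $1, $1, $2 in WB. Both 1a and 2a match, and EX/MEM wins.
forward_a, forward_b = forwarding_unit(True, 1, True, 1, id_ex_rs=1, id_ex_rt=4)
print(forward_a, forward_b)
```

For the add $1 sequence this prints `10 00`: operand A comes from EX/MEM, operand B from the register file.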

Datapath modified to resolve hazards
The forwarding unit is in the EX stage (with MUXes). Operand register numbers are passed on from the ID stage via the ID/EX pipeline register. Some details are left out, such as the sign-extension unit.
What about store instructions following R-type instructions (add $2, $1, $3 followed by sw $2, ($3), or add $2, $1, $3 followed by sw $5, ($2)), or stores following loads (lw $2, ($3) followed by sw $2, ($4))?

Figure (PAT6F2.eps), CC 1 to CC 4: add $2, $1, $3 followed by sw $2, ($3). In CC 4, add is in MEM and sw is in EX, so condition 1b) holds: if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB <= 10. The value read from register port B is replaced by the result for rd held in EX/MEM.

Figure (PAT6F2.eps), CC 1 to CC 4: add $2, $1, $3 followed by sw $5, ($2). In CC 4, add is in MEM and sw is in EX, so condition 1a) holds: if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 10. The value read from register port A (the base address) is replaced by the result for rd held in EX/MEM.

Figure (PAT6F2.eps), CC 1 to CC 5: lw $2, ($3) followed by sw $2, ($4).
CC 4: lw is in MEM and sw is in EX. Condition 1b) holds: if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB <= 10, so sw is forwarded the wrong value for $2: EX/MEM holds the load address, not the loaded data.
CC 5: lw is in WB and sw is in MEM. The loaded value is only available in MEM/WB once sw has already reached the MEM stage. How can it reach the store?

Data hazards and stalls (6.5)
lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
Forwarding cannot avoid stalling the pipeline when an instruction tries to read a register following a load instruction that writes the same register. The data is still being read from memory in clock cycle 4 while the ALU is performing the operation for the following instruction. Something must stall the pipeline for the combination of a load followed by an instruction that reads its result. Hazard detection is needed.

CC 1: lw $2, 20($1) is fetched.

CC 2: and $4, $2, $5 is fetched; lw is decoded.

CC 3: or $8, $2, $6 is fetched; and is decoded; lw is in EX. The and instruction reads the old value of $2 and stores it in ID/EX.

CC 4: add $9, $4, $2 is fetched; or is decoded; and is in EX; lw is in MEM. The or instruction reads the old value of $2 and stores it in ID/EX. The and instruction needs the new $2 value, but it is not available yet.

CC 5: slt $1, $6, $7 is fetched; add is decoded; or is in EX; and is in MEM; lw is in WB. The add instruction reads $2 and stores it in ID/EX; the or instruction can get ALU input A from the MEM/WB register.

Data hazards and stalls: hazard detection
The ID stage must test whether the previous instruction is a load. If so, it must check whether the source registers match the destination register of the load:
if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))) stall the pipeline
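The load-use test is a single Boolean expression. A minimal Python sketch, where `Decoded` is a hypothetical bundle of the decoded fields the test needs:

```python
from typing import NamedTuple


class Decoded(NamedTuple):
    """Decoded fields of one instruction."""
    text: str
    mem_read: bool   # True only for loads
    rs: int
    rt: int


def must_stall(id_ex: Decoded, if_id: Decoded) -> bool:
    """Load-use test, evaluated while `if_id` is being decoded and
    `id_ex` (the previous instruction) is about to enter EX."""
    return id_ex.mem_read and (id_ex.rt == if_id.rs or id_ex.rt == if_id.rt)


lw = Decoded("lw $2, 20($1)", True, 1, 2)      # rt = $2 is the load's destination
and_ = Decoded("and $4, $2, $5", False, 2, 5)
or_ = Decoded("or $8, $2, $6", False, 2, 6)
slt = Decoded("slt $1, $6, $7", False, 6, 7)

print(must_stall(lw, and_))    # the and must wait one cycle
print(must_stall(lw, slt))     # no dependence on $2
print(must_stall(and_, or_))   # not a load: forwarding handles it
```

The three calls print `True`, `False`, `False`.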

Stall insertion (figure PAT6F35.eps, CC 1 to CC 10; program order: lw $2, 20($1); and becomes nop; and $4, $2, $5; or $8, $2, $6; add $9, $4, $2)
CC 2: and is fetched and lw is decoded.
CC 3: and is decoded, or is fetched and lw is executed.
CC 4: and is decoded again, or is fetched again, a nop is executed and lw is in the MEM stage.
The nop can be achieved by setting harmless control signals (all zeros).

Stall / no operation
Both instructions in the ID and IF stages must be stalled so that the fetched instructions are not lost. Preventing these two instructions from making progress is accomplished simply by preventing the PC register and the IF/ID pipeline register from changing. The back half of the pipeline, starting with the EX stage, is executing instructions that have no effect: nops, which act like bubbles.
Deasserting all 9 control signals in the EX, MEM and WB stages creates a "do nothing" or nop instruction. No registers or memories are written.
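The cost of the stall shows up directly in the cycle count. A minimal Python sketch, assuming the five-stage pipeline with forwarding where the only stall is one bubble for a load immediately followed by a user of its result; `Instr` and `total_cycles` are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[int]        # register written, None if nothing is written
    srcs: Tuple[int, ...]      # registers read
    is_load: bool = False


def total_cycles(program: List[Instr]) -> int:
    """Clock cycles until the last instruction leaves WB."""
    bubbles = sum(
        1
        for prev, cur in zip(program, program[1:])
        # Load-use: PC and IF/ID are held for one cycle and a nop enters EX.
        if prev.is_load and prev.dest is not None and prev.dest in cur.srcs
    )
    # 4 cycles to fill the pipeline, then one completion per instruction
    # plus one extra cycle per bubble.
    return 4 + len(program) + bubbles


program = [
    Instr("lw $2, 20($1)", 2, (1,), is_load=True),
    Instr("and $4, $2, $5", 4, (2, 5)),
    Instr("or $8, $2, $6", 8, (2, 6)),
    Instr("add $9, $4, $2", 9, (4, 2)),
    Instr("slt $1, $6, $7", 1, (6, 7)),
]
print(total_cycles(program))
```

It prints 10: five instructions need nine cycles, and the single bubble after lw adds one more.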

Pipeline with control, forwarding and hazard detection

The big picture (page 374)
Although the hardware may or may not rely on the compiler to resolve hazard dependences to ensure correct execution, the compiler must understand the pipeline to achieve the best performance. Otherwise, unexpected stalls will reduce the performance of the compiled code.

Branch hazards / control hazards (6.6)
The decision whether to branch or not is taken in the MEM stage. Without intervention, the three sequential instructions following the branch will be fetched and begin execution. Control hazards are less frequent than data hazards, but a three-instruction flush is costly.

Assume branch not taken
Stalling until the branch is complete is too slow. Improvement: assume that the branch will not be taken and continue execution. If it is taken, the instructions that are being fetched and decoded must be discarded. If branches are untaken half the time, and if it costs little to discard the instructions, this optimization halves the cost of control hazards.
To discard instructions, change the original control values to 0s. The three instructions in the IF, ID and EX stages must also be changed when the branch reaches the MEM stage. Discarding instructions means that we must be able to flush instructions in the IF, ID and EX stages of the pipeline.
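The "halves the cost" claim is simple arithmetic. A minimal Python sketch, assuming a 3-cycle penalty (the three-instruction flush above), a base CPI of 1.0, and a hypothetical branch frequency of 20% chosen purely for illustration:

```python
def average_branch_penalty(penalty: float, taken_fraction: float,
                           predict_not_taken: bool) -> float:
    """Average stall cycles per branch."""
    if not 0.0 <= taken_fraction <= 1.0:
        raise ValueError("taken_fraction must be between 0 and 1")
    if predict_not_taken:
        return taken_fraction * penalty   # only taken branches are flushed
    return penalty                        # every branch waits until it resolves


def effective_cpi(base_cpi: float, branch_fraction: float,
                  avg_penalty: float) -> float:
    """CPI including the branch stalls."""
    return base_cpi + branch_fraction * avg_penalty


stall = average_branch_penalty(3, 0.5, predict_not_taken=False)
guess = average_branch_penalty(3, 0.5, predict_not_taken=True)
print(stall, guess)   # 3 1.5
print(effective_cpi(1.0, 0.2, stall), effective_cpi(1.0, 0.2, guess))
```

With these assumed numbers the CPI drops from about 1.6 to about 1.3; the real gain depends on the actual branch frequency and taken ratio of the program.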

Reducing the delay of branches
If branch execution is moved earlier in the pipeline, fewer instructions need to be flushed. (So far we have assumed that the next PC for a branch is selected in the MEM stage.) Many branches rely only on simple tests that do not require a full ALU operation. Moving the branch decision up requires two actions to occur earlier: computing the branch target address and evaluating the branch decision.

Reducing the delay of branches
1. Early branch detection: we already have the PC and the immediate field in the IF/ID pipeline register, so we just move the branch adder from the EX stage to the ID stage.
2. For BEQ, we compare the two registers read during the ID stage (simple logic).
Moving the branch test to the ID stage implies additional forwarding and hazard detection hardware, since a branch that depends on a result still in the pipeline must still work properly with this optimization.

Example (page 379)
36 sub $10, $4, $8
40 beq $1, $3, 7    # PC-relative branch to 40 + 4 + 7*4 = 72
44 and $12, $2, $5
48 or $13, $2, $6
52 add $14, $4, $2
56 and $15, $6, $7
...
72 lw $4, 50($7)
The example assumes that the pipeline is optimized for branches not taken and that branch execution is moved to the ID stage. The ID stage of CC 3 determines that the branch must be taken, so 72 is selected as the next PC address and the instruction fetched in CC 3 (the and at 44) is zeroed. CC 4 shows the instruction at location 72 being fetched and the single bubble (nop) that results from the taken branch.
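The target in the comment can be computed the way the ID-stage branch adder does it. A minimal Python sketch of MIPS beq target arithmetic (the function names are hypothetical):

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def beq_target(pc: int, imm16: int) -> int:
    """Target of a beq at address `pc` with 16-bit immediate `imm16`:
    PC + 4 + (sign-extended immediate shifted left by 2), kept to 32 bits."""
    return (pc + 4 + (sign_extend_16(imm16) << 2)) & 0xFFFFFFFF


print(beq_target(40, 7))        # 72, as in the example
print(beq_target(40, 0xFFFF))   # offset of -1 word: 40 + 4 - 4 = 40
```

The immediate counts words relative to PC + 4, which is why the shift by 2 (the "Shift left 2" box in the datapath) is needed.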

Dynamic branch prediction
In a pipeline deeper than the five-stage MIPS pipeline, a simple static prediction scheme will probably waste too much performance. Dynamic branch prediction uses runtime information to decide where to begin fetching new instructions.
A branch prediction buffer or branch history table is a small memory indexed by the lower portion of the address of the branch instruction. The memory contains one bit indicating whether the branch was recently taken or not.
One problem is that we don't know whether the prediction is the right one: it may have been put there by another branch with the same low-order address bits. Another shortcoming: even if a branch is almost always taken, we will predict incorrectly twice, rather than once, when it is not taken.
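Both shortcomings can be reproduced with a tiny simulation. A minimal Python sketch of a 1-bit branch history table, assuming 16 entries and indexing with the address bits just above the two always-zero byte-offset bits; the class and helper names are hypothetical.

```python
from typing import List


class OneBitPredictor:
    """Branch history table with one bit per entry."""

    def __init__(self, index_bits: int = 4) -> None:
        self.mask = (1 << index_bits) - 1
        self.taken = [False] * (1 << index_bits)   # start as "not taken"

    def index(self, pc: int) -> int:
        return (pc >> 2) & self.mask   # low-order bits of the word address

    def predict(self, pc: int) -> bool:
        return self.taken[self.index(pc)]

    def update(self, pc: int, outcome: bool) -> None:
        self.taken[self.index(pc)] = outcome


def mispredictions(p: OneBitPredictor, pc: int, outcomes: List[bool]) -> int:
    misses = 0
    for outcome in outcomes:
        if p.predict(pc) != outcome:
            misses += 1
        p.update(pc, outcome)
    return misses


loop_branch = [True] * 9 + [False]    # taken 9 times, then falls through
p = OneBitPredictor()
mispredictions(p, 0x400, loop_branch)          # first run warms up the entry
print(mispredictions(p, 0x400, loop_branch))   # 2: first and last iteration
print(p.index(0x400) == p.index(0x440))        # True: the two addresses alias
```

On every re-entry of the loop the bit still says "not taken" from the previous exit, which causes the second misprediction.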

2-bit prediction scheme
By using 2 bits rather than 1, a branch that strongly favors taken or not taken will be mispredicted only once.
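A 2-bit saturating counter is one common way to realize this scheme; the state diagram in the textbook may differ in some transitions. A minimal Python sketch, with the same hypothetical loop pattern as the 1-bit case (taken 9 times, then not taken):

```python
from typing import List


class TwoBitPredictor:
    """2-bit saturating counters: 0 and 1 predict not taken, 2 and 3 predict
    taken. A single surprise only moves a strong state to a weak one."""

    def __init__(self, index_bits: int = 4) -> None:
        self.mask = (1 << index_bits) - 1
        self.counters = [0] * (1 << index_bits)   # start "strongly not taken"

    def index(self, pc: int) -> int:
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self.index(pc)] >= 2

    def update(self, pc: int, outcome: bool) -> None:
        i = self.index(pc)
        if outcome:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def mispredictions(p: TwoBitPredictor, pc: int, outcomes: List[bool]) -> int:
    misses = 0
    for outcome in outcomes:
        if p.predict(pc) != outcome:
            misses += 1
        p.update(pc, outcome)
    return misses


loop_branch = [True] * 9 + [False]
p = TwoBitPredictor()
mispredictions(p, 0x400, loop_branch)          # warm-up run
print(mispredictions(p, 0x400, loop_branch))   # 1: only the loop exit
```

After the loop exit the counter only drops to "weakly taken", so the next entry into the loop is still predicted correctly.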

Other branch-handling strategies
Delayed branch: always execute the following instruction. Compilers and assemblers try to fill the slot after the branch with an instruction without dependences. The technique loses its effect on long pipelines and multiple-issue pipelines.
Branch target buffer: store the expected jump address in a buffer.
Global dynamic prediction: use the global branch behavior to determine the prediction. This is effective when combined with local branch prediction.

The final datapath and control for Chapter 4

Exceptions (4.9)
add $1, $2, $1: suppose it overflows. We must:
transfer control to the exception routine immediately after this instruction;
flush the instructions following the add from the pipeline and begin fetching instructions from the new address.
This uses the same mechanisms as for taken branches, but with the exception deasserting the control lines.

Datapath with controls to handle exceptions (Fig. 4.66, page 387)
ID.Flush is ORed with the stall signal from the hazard detection unit. To flush instructions in the EX stage, EX.Flush causes MUXes to zero the control lines. An additional input to the PC multiplexer lets the pipeline fetch instructions from 8000 0180hex, which is the exception location for overflow.

Causes of exceptions (page 385)
1) I/O device request
2) Hardware malfunction
3) Invoking an operating system service from a user program
4) Using an undefined instruction
5) Overflow
1) and 2) are not associated with a specific instruction, so the implementation has some flexibility as to when to interrupt the pipeline, using the mechanism used for other exceptions. In the case of multiple simultaneous exceptions, the normal solution is to prioritize the exceptions.

Example
40hex sub $11, $2, $4
44hex and $12, $2, $5
48hex or $13, $2, $6
4Chex add $1, $2, $1
50hex slt $15, $6, $7
54hex lw $16, 50($7)
Instructions to be invoked: the exception handler starting at 4000 0040hex, whose first two instructions (at 4000 0040hex and 4000 0044hex) store $25 and $26.
The overflow for add is detected in the EX stage, and 4000 0040hex is forced into the PC. CC 7 shows that add and the following instructions are flushed and the first instruction of the exception code is fetched. The address of the instruction following add is saved: 4Chex + 4 = 50hex. The and and or instructions complete.
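Which instructions complete and which are flushed follows from their positions in the pipeline. A minimal Python sketch, assuming no stalls (instruction i is fetched in cycle i + 1) and, as on the slide, that the saved address is that of the faulting instruction plus 4. The function name is hypothetical, and the Cause register and the handler's own code are not modeled.

```python
from typing import List, Tuple


def overflow_in_ex(program: List[Tuple[int, str]], faulting: int, cycle: int,
                   handler: int) -> Tuple[List[str], List[str], int, int]:
    """Split the in-flight instructions when program[faulting] overflows in
    EX during clock `cycle`. Returns (instructions allowed to complete,
    instructions flushed, saved address, next PC)."""
    if cycle - 1 - faulting != 2:
        raise ValueError("the faulting instruction must be in EX in this cycle")
    completing, flushed = [], []
    for i, (_, text) in enumerate(program):
        stage = cycle - 1 - i          # 0 = IF, 1 = ID, 2 = EX, 3 = MEM, 4 = WB
        if not 0 <= stage <= 4:
            continue                   # not fetched yet, or already retired
        if i < faulting:
            completing.append(text)    # older instructions finish normally
        else:
            flushed.append(text)       # the faulting one and everything younger
    saved = program[faulting][0] + 4   # address of the instruction after add
    return completing, flushed, saved, handler


program = [(0x40, "sub $11, $2, $4"), (0x44, "and $12, $2, $5"),
           (0x48, "or $13, $2, $6"), (0x4C, "add $1, $2, $1"),
           (0x50, "slt $15, $6, $7"), (0x54, "lw $16, 50($7)")]

done, flushed, saved, pc = overflow_in_ex(program, faulting=3, cycle=6,
                                          handler=0x40000040)
print(done)                  # ['and $12, $2, $5', 'or $13, $2, $6']
print(flushed)               # ['add $1, $2, $1', 'slt $15, $6, $7', 'lw $16, 50($7)']
print(hex(saved), hex(pc))   # 0x50 0x40000040
```

sub has already left WB by CC 6, so only and and or are still completing when the overflow is detected.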

Figure (PAT6F43.eps): the datapath in clock cycles 6 and 7 of the overflow example, with the IF.Flush, ID.Flush and EX.Flush signals, the hazard detection and forwarding units, and the Cause and EPC registers. In CC 6, lw $16, 50($7), slt, add, or and and occupy the IF to WB stages. In CC 7, the flushed instructions have become bubbles (nops), or is in WB, and sw $25 from the exception handler is fetched.

HW/SW interface (1/2)
Hardware and OS work in conjunction so that exceptions behave as expected. The hardware contract is normally to stop the offending instruction in midstream, let all prior instructions complete, flush all following instructions, set a register to show the cause of the exception, save the address of the offending instruction, and jump to a prearranged address.

HW/SW interface (2/2)
The OS contract is to look at the cause of the exception and act appropriately. For an undefined instruction, hardware failure or overflow, it kills the program and returns an indicator of the reason. For an I/O request or OS service call, it saves the state of the program, performs the desired task, and restores the program to continue execution.
One of the most important and frequent uses of exceptions is handling page faults and TLB exceptions (Chapter 7).