Review: Computer Organization

Pipelining
Chansu Yu

Laundry Example
Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, fold, and put away.
Washer takes 30 minutes.
Dryer takes 30 minutes.
Folder takes 30 minutes.
Stasher takes 30 minutes to put clothes into drawers.

Sequential Laundry
[Figure: timeline from 6 PM to 2 AM. Loads A, B, C, and D run one after another, each through four 30-minute steps.]
Sequential laundry takes 8 hours for 4 loads.
If they learned pipelining, how long would laundry take?

Faster Laundry - Pipelining
[Figure: timeline starting at 6 PM. Loads A, B, C, and D overlap, with a new load starting every 30 minutes.]
Pipelined laundry takes 3.5 hours for 4 loads!
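The two totals above follow from simple counting. A minimal sketch, assuming four 30-minute steps per load and a pipeline that starts a new load every 30 minutes; the function names are illustrative and do not come from the slides:

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load finishes all of its stages before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs `stages` steps; each later load adds one more step."""
    return (stages + loads - 1) * stage_minutes


seq = sequential_minutes(loads=4, stages=4, stage_minutes=30)
pipe = pipelined_minutes(loads=4, stages=4, stage_minutes=30)

print(seq / 60)              # 8.0 hours
print(pipe / 60)             # 3.5 hours
print(round(seq / pipe, 2))  # 2.29, the speedup for only four loads
```

The speedup is well below 4 here because only four loads share the time spent filling and draining the pipeline.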

5 Stages of MIPS
Step 1, Instruction fetch (all instructions):
  IR = Memory[PC]
  PC = PC + 4
Step 2, Instruction decode/register fetch (all instructions):
  A = Reg[IR[25-21]]
  B = Reg[IR[20-16]]
  ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3, Execution, address computation, branch/jump completion:
  R-type: ALUOut = A op B
  Memory-reference: ALUOut = A + sign-extend(IR[15-0])
  Branches: if (A == B) then PC = ALUOut
  Jumps: PC = PC[31-28] || (IR[25-0] << 2)
Step 4, Memory access or R-type completion:
  R-type: Reg[IR[15-11]] = ALUOut
  Load: MDR = Memory[ALUOut]
  Store: Memory[ALUOut] = B
Step 5, Memory read completion:
  Load: Reg[IR[20-16]] = MDR

Single cycle vs. Multicycle
[Figure: clock-by-clock timing of an add (instruction fetch, register read, ALU operation, register write) followed by a load (instruction fetch, register read, ALU operation, memory read, register write), drawn for the single-cycle and the multicycle implementation.]
What are the advantages of a multicycle implementation?
What are the disadvantages of a multicycle implementation?

Multicycle vs. Pipelined
[Figure: the same add and load sequence, drawn for the multicycle and the pipelined implementation.]
What are the advantages of a pipelined implementation?
What are the disadvantages of a pipelined implementation?

Lessons from Pipelined Laundry
[Figure: pipelined laundry timeline starting at 6 PM for loads A, B, C, and D in 30-minute steps.]
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
Potential speedup = number of pipe stages.
Pipeline rate is limited by the slowest pipeline stage.
Unbalanced lengths of pipe stages reduce speedup.
Time to fill the pipeline and time to drain it reduce speedup.
Multiple tasks operate simultaneously using different resources.
Any dependencies, any conflicts???
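The lessons about stage balance and fill/drain time can be checked numerically. This is a rough sketch under a simple model (the clock is set by the slowest stage, and n tasks need k + n - 1 clock periods in a k-stage pipeline); the unbalanced stage times are hypothetical values chosen only for illustration:

```python
def speedup(stage_times, tasks):
    """Speedup of a pipeline over running every task start to finish."""
    sequential = tasks * sum(stage_times)
    cycle = max(stage_times)  # the slowest stage limits the pipeline rate
    pipelined = (len(stage_times) + tasks - 1) * cycle
    return sequential / pipelined


balanced = [30, 30, 30, 30]    # the laundry example
unbalanced = [30, 40, 20, 30]  # hypothetical: same total work, uneven stages

print(round(speedup(balanced, 4), 2))       # few tasks: fill and drain dominate
print(round(speedup(balanced, 1000), 2))    # many tasks: approaches 4 stages
print(round(speedup(unbalanced, 1000), 2))  # the 40-minute stage caps the gain
```

With many tasks the balanced pipeline approaches a speedup equal to the number of stages, while the unbalanced one stays near 120/40 = 3.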

Can pipelining get us into trouble?
If any two stages use the same resource, there must be a conflict.
[Figure: instructions 1 to 5 in successive pipeline stages versus time step (clock cycle).]

Hazards
Hazard = when an instruction's stage is unable to execute during the current cycle.
Hazards can always be resolved by waiting. Pipeline control must:
  detect the hazard
  take action (or delay action) to resolve the hazard
[Figure: stage 3 of instruction #2 is unable to continue, so the instruction stalls for one cycle; instructions versus time step (clock cycle).]

Structural Hazards
A needed functional unit is busy executing a previous instruction (an attempt to use the same resource in two different ways at the same time).
Example: our sample MIPS pipeline has none. What if the PC + 4 computation used the main ALU instead of a separate adder?
[Figure: instructions versus time step (clock cycle); later instructions stall while the shared unit is busy.]

Control Hazards
While executing a previous branch, the next instruction address might not yet be known (an attempt to make a decision before the condition is evaluated).
[Figure: a conditional branch followed by its branch target, which stalls for two cycles. The branch calculates PC + 4, then computes the branch target address, then performs the branch test and sets PC to the target.]

Data Hazards
Needed data is still being computed by a previous instruction (an attempt to use an item before it is ready).
[Figure: an add writes $s3; the following sw and lw both use $s3 as their base register and stall; a final add reads $s5, which the lw loads, and stalls as well. Instructions versus time step (clock cycle).]

Pipelined Approach
- Cycle time, number of stages
- Resource conflict
[Figure: the laundry loads A, B, C, D next to instructions 1 to 5, both overlapped in time.]

Resource Conflicts (revisit)
[Table: the five execution steps from "5 Stages of MIPS", annotated with the conflicts below.]
ALU conflict
Memory conflict
Register file conflict (read or write)

Basic Pipeline
IF: Instruction fetch
ID: Instruction decode/register file read
EX: Execute/address calculation
MEM: Memory access
WB: Write back
[Figure: single-cycle datapath (PC, instruction memory, register file, sign extend, shift left 2, adders, ALU, data memory) divided into the five stages.]
Instructions and data move generally from left to right through the five stages as they complete execution, except in two cases:
- WB stage
- PC selection

Basic Pipeline
[Table: the five execution steps from "5 Stages of MIPS", repeated.]
Why move?? ZF is available during the EX stage, anyway.
Why do we still need 2 ALUs at the EX stage? (One for A-B and the other for PC+IR.)

Pipelined Datapath
Add IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers to the basic pipeline in order to actually split the datapath into stages.
[Figure: the basic pipeline datapath with the four pipeline registers inserted between stages.]
The info must be placed in a pipeline register; otherwise, it is lost when the next instruction enters that pipeline stage.
For a store instruction: (?) => ID/EX pipeline register => EX/MEM pipeline register => (?)

Content of Pipeline Registers
Which data should be passed through the stages? I.e., what are the contents of the pipeline registers?
In IF/ID pipeline register: PC (32), Inst. (32)
In ID/EX pipeline register: PC (32), Reg. 1 (32), Reg. 2 (32), Offset (32), Reg. no. 2 and 3 (10)
In EX/MEM pipeline register: PC (32), ZF (1), ALUOut (32), Reg. 2 (32), Reg. no. (5)
In MEM/WB pipeline register: Memory data (32), ALUOut (32), Reg. no. (5)

Example
Five instructions go through the MIPS pipeline:
  lw  $10, 20($1)
  sub $11, $2, $3
  and $12, $4, $5
  or  $13, $6, $7
  add $14, $8, $9
[Table: each instruction's 32-bit hexadecimal encoding, plus the initial register contents ($pc and $1 through $9) and memory contents.]
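Adding up the field widths gives the data width of each pipeline register. A small sketch, assuming the field lists above are complete for the data path and leaving out control bits; the dictionary keys are illustrative names:

```python
# Data fields (in bits) held by each pipeline register, control bits excluded.
PIPELINE_REGISTERS = {
    "IF/ID": {"PC": 32, "Inst": 32},
    "ID/EX": {"PC": 32, "Reg1": 32, "Reg2": 32, "Offset": 32, "RegNo2and3": 10},
    "EX/MEM": {"PC": 32, "ZF": 1, "ALUOut": 32, "Reg2": 32, "RegNo": 5},
    "MEM/WB": {"MemoryData": 32, "ALUOut": 32, "RegNo": 5},
}

widths = {name: sum(fields.values()) for name, fields in PIPELINE_REGISTERS.items()}

for name, bits in widths.items():
    print(f"{name}: {bits} bits")
```

Under these assumptions the registers hold 64, 138, 102, and 69 data bits respectively.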

Five instructions go through the MIPS pipeline:
  lw  $10, 20($1)
  sub $11, $2, $3
  and $12, $4, $5
  or  $13, $6, $7
  add $14, $8, $9
[Table: each instruction's 32-bit hexadecimal encoding, plus register contents ($pc and $1 through $9) and memory contents.]
[Figure: pipelined datapath with all five instructions in flight (add in IF, or in ID, and in EX, sub in MEM, lw in WB). The signal values labeled (a) through (z) are to be filled in.]

9 Control Signals
4 multiplexor selectors
3 read/write signals
2 ALU signals

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch (PCSrc)  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0               1       0
lw           0       1       1         1         1        0         0               0       0
sw           X       1       X         0         0        1         0               0       0
beq          X       0       X         0         0        0         1               0       1

Q1: In which stage is the control circuit?
Q2: If one stage is executing "and" while another stage is executing "lw", is MemtoReg 0 or 1?

Pipeline Control
Generate the control signals all at once at the ID stage, and pass them through the stages just like the data.

             EX stage control lines           MEM stage control lines     WB stage control lines
Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc   Branch  MemRead  MemWrite   RegWrite  MemtoReg
R-format     1       1       0       0        0       0        0          1         0
lw           0       0       0       1        0       1        0          1         1
sw           X       0       0       1        0       0        1          0         X
beq          X       0       1       0        1       0        0          0         X

[Figure: the control values generated from the instruction in ID travel through the ID/EX, EX/MEM, and MEM/WB pipeline registers.]
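The idea of generating all nine signals in ID and then consuming them stage by stage can be sketched as a lookup table. The signal values below assume the usual textbook settings for this MIPS subset; `None` stands for a don't-care (X), and `stage_controls` is a hypothetical helper name:

```python
X = None  # don't-care

# All nine control signals, produced once in the ID stage.
CONTROL = {
    "R-format": dict(RegDst=1, ALUOp1=1, ALUOp0=0, ALUSrc=0,
                     Branch=0, MemRead=0, MemWrite=0, RegWrite=1, MemtoReg=0),
    "lw":       dict(RegDst=0, ALUOp1=0, ALUOp0=0, ALUSrc=1,
                     Branch=0, MemRead=1, MemWrite=0, RegWrite=1, MemtoReg=1),
    "sw":       dict(RegDst=X, ALUOp1=0, ALUOp0=0, ALUSrc=1,
                     Branch=0, MemRead=0, MemWrite=1, RegWrite=0, MemtoReg=X),
    "beq":      dict(RegDst=X, ALUOp1=0, ALUOp0=1, ALUSrc=0,
                     Branch=1, MemRead=0, MemWrite=0, RegWrite=0, MemtoReg=X),
}

# Which stage consumes which signals as they ride along the pipeline registers.
STAGE_GROUPS = {
    "EX": ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc"),
    "MEM": ("Branch", "MemRead", "MemWrite"),
    "WB": ("RegWrite", "MemtoReg"),
}


def stage_controls(instruction: str) -> dict:
    """Split one instruction's control word into per-stage groups."""
    word = CONTROL[instruction]
    return {stage: {name: word[name] for name in names}
            for stage, names in STAGE_GROUPS.items()}


print(stage_controls("lw")["WB"])  # the two signals still needed in write-back
```

Each pipeline register only has to carry the groups that later stages still need, so the control word shrinks from nine signals in ID/EX to two in MEM/WB.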

Datapath with Control
[Figure: pipelined datapath with the control unit in ID. Control values travel through the ID/EX, EX/MEM, and MEM/WB registers and drive PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, and MemtoReg.]

Graphically Representing Pipelines
[Figure: time (in clock cycles, CC 1 to CC 6) versus program execution order (in instructions) for lw $10, 20($1) and sub $11, $2, $3, each drawn as IM, Reg, ALU, DM, Reg.]
Can help with answering questions like:
  How many cycles does it take to execute this code?
  What is the ALU doing during cycle 4?
Use this representation to help understand datapaths.

Data Hazards
Needed data is still being computed by a previous instruction.
  sub $2, $1, $3
  and $12, $2, $5
  or  $13, $6, $2
  add $14, $2, $2
  sw  $15, 100($2)
Assume $1 = 10, $2 = 10, $3 = 30.

Data Hazards: Dependencies
Problem with starting the next instruction before the first is finished: dependencies that go backward in time are data hazards.
[Figure: time (CC 1 to CC 9) versus program execution order for the five instructions, each drawn as IM, Reg, ALU, DM, Reg. The value of register $2 is 10 through CC 4, 10/-20 in CC 5, and -20 afterwards.]
  and has a problem
  or has a problem
  add ???
  sw is OK
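Which instructions "have a problem" can be worked out from register numbers and distance alone. A minimal sketch, assuming the register file is written in the first half of a cycle and read in the second half, so only the two instructions right after a producer are exposed; the tuple layout and `raw_hazards` are illustrative:

```python
# (name, destination register or None, source registers)
PROGRAM = [
    ("sub", 2, (1, 3)),
    ("and", 12, (2, 5)),
    ("or", 13, (6, 2)),
    ("add", 14, (2, 2)),
    ("sw", None, (15, 2)),
]


def raw_hazards(program, window=2):
    """List (producer, consumer, register, distance) read-after-write hazards.

    `window` is how many following instructions read the register file
    before the producer's write-back has happened.
    """
    hazards = []
    for i, (name, _dest, sources) in enumerate(program):
        for j in range(max(0, i - window), i):
            producer, produced, _ = program[j]
            if produced is not None and produced != 0 and produced in sources:
                hazards.append((producer, name, produced, i - j))
    return hazards


for hazard in raw_hazards(PROGRAM):
    print(hazard)
```

Under that assumption only and (distance 1) and or (distance 2) are flagged; add reads $2 in the same cycle that sub writes it, and sw reads it one cycle later.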

Data Hazards: Forwarding
While the result is not written back until WB:
  sub $2, $1, $3
  and $12, $2, $5   (stalls for two cycles)
it is calculated earlier, in EX:
  The result is actually available after the EX stage (not WB).
  The operand is actually needed at the EX stage (not ID).
Add forwarding hardware to allow, e.g., the EX stage's output (located in the EX/MEM pipeline register) to be the EX stage's input.

Forwarding: All 2 Cases
[Figure: time (CC 1 to CC 9) versus program execution order for sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2).]
Value of register $2: 10  10  10  10  10/-20  -20  -20  -20  -20
Value of EX/MEM:      X   X   X   -20  X      X    X    X    X
Value of MEM/WB:      X   X   X   X    -20    X    X    X    X
  and has a problem -> fixed
  or has a problem -> fixed
  add ??? -> OK
  sw is OK

Data Hazards (again)
Needed data is still being computed by a previous instruction.
[Example sequence: a sub whose result register is read by the following and and or instructions, then an add and a sw.]
[Figures: the pipelined datapath, one clock cycle per figure, as sub, and, or, add, and sw enter the pipeline, with signal values labeled (a) through (z). The Rs register number and value and the Rd register number are tracked for each instruction in flight. "???" marks the values that are wrong without forwarding: first the Rs operand of and while sub is still in EX, then the ALU inputs and result, and finally the data memory and write-back values.]

Forwarding: Implementation
[Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers.]
Additional datapath for forwarding?
How to control the forwarding path?

Forwarding: Forwarding Unit
[Figure: pipelined datapath with a forwarding unit next to the ALU. IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd are carried into ID/EX as Rs, Rt, and Rd; the forwarding unit also receives EX/MEM.RegisterRd and MEM/WB.RegisterRd.]
Forwarding unit: 6-input, 2-output combinational circuit.

Forwarding Control
Control logic:
ForwardA = 10 if (EX/MEM.Rd = ID/EX.Rs)   <- get operand from EX/MEM
           01 if (MEM/WB.Rd = ID/EX.Rs)   <- get operand from MEM/WB
           00 otherwise                   <- get operand from ID/EX
ForwardB = 10 if (EX/MEM.Rd = ID/EX.Rt)   <- get operand from EX/MEM
           01 if (MEM/WB.Rd = ID/EX.Rt)   <- get operand from MEM/WB
           00 otherwise                   <- get operand from ID/EX

Forwarding Control Circuit
ForwardA = 10 if ((EX/MEM.Rd = ID/EX.Rs) && EX/MEM.RegWrite && (EX/MEM.Rd != 0))
           01 if ((MEM/WB.Rd = ID/EX.Rs) && MEM/WB.RegWrite && (MEM/WB.Rd != 0) && (EX/MEM.Rd != ID/EX.Rs))
           00 otherwise
ForwardB = 10 if ((EX/MEM.Rd = ID/EX.Rt) && EX/MEM.RegWrite && (EX/MEM.Rd != 0))
           01 if ((MEM/WB.Rd = ID/EX.Rt) && MEM/WB.RegWrite && (MEM/WB.Rd != 0) && (EX/MEM.Rd != ID/EX.Rt))
           00 otherwise

Data Hazards: All Considered???
Forwarding helps, but it doesn't eliminate all hazards:
[Figure: a lw that loads $s5, followed by add $s7, $s5, $s6, which stalls for one cycle.]
especially when we remember that memory access is really often much longer than a single cycle:
[Figure: the same lw and add; the add now stalls for three cycles.]
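The forwarding control conditions translate almost literally into code. This sketch follows the conditions as stated on the slide, with one function serving both ForwardA (pass Rs) and ForwardB (pass Rt); the parameter names are illustrative:

```python
def forward_select(id_ex_reg, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Return the 2-bit mux select for one ALU operand.

    0b10: take the operand from EX/MEM (result of the previous instruction)
    0b01: take the operand from MEM/WB (result from two instructions back)
    0b00: take the operand read from the register file (ID/EX)
    """
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_reg:
        return 0b10
    if (mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_reg
            and ex_mem_rd != id_ex_reg):
        return 0b01
    return 0b00


# "and $12, $2, $5" in EX while "sub $2, $1, $3" is in MEM:
print(format(forward_select(2, ex_mem_rd=2, ex_mem_regwrite=True,
                            mem_wb_rd=0, mem_wb_regwrite=False), "02b"))  # 10

# "or $13, $6, $2" in EX: "and" (Rd = 12) is in MEM and "sub" (Rd = 2) is in WB:
print(format(forward_select(2, ex_mem_rd=12, ex_mem_regwrite=True,
                            mem_wb_rd=2, mem_wb_regwrite=True), "02b"))   # 01
```

The Rd != 0 checks keep register $0 from ever being forwarded, and the extra EX/MEM.Rd test makes the most recent result win when both stages hold the same destination.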

Data Hazards: Stalling
Stall the pipeline by keeping an instruction in the same stage.
[Figure: time (CC 1 to CC 10) versus program execution order for lw $2, 20($1); and $4, $2, $5; or $8, $2, $6; add $9, $4, $2; slt $1, $6, $7. A bubble is inserted after the lw; the lw-and and lw-or dependencies are marked.]
At CC5, the MEM stage is empty!!!

Data Hazards: Stalling
Stalling detection and control
Detection happens during the ID stage, when the lw instruction is in the EX stage.
The following two instructions are in the ID ("and") and IF ("or") stages, respectively.
If detected:
  Stall the following instruction (in the ID stage, "and") so that it repeats the ID stage => the IF/ID pipeline register should not be changed.
  Stall the second instruction (in the IF stage, "or") so that it repeats the IF stage => the PC should not be changed.

Data Hazards: Stalling
lw hazard detection:
  if (ID/EX.MemRead and ((ID/EX.Rt = IF/ID.Rs) or (ID/EX.Rt = IF/ID.Rt)))
    stall the pipeline
Control signals generated by the hazard detection unit:
  IF/IDWrite, to prevent the IF/ID register from changing
  PCWrite, to prevent the PC from changing
  MUX control, to delay forwarding the control signals (pass null signals)

Stalling: Detection Unit
Stall by letting an instruction that won't write anything go forward.
[Figure: pipelined datapath with a hazard detection unit in ID and the forwarding unit in EX. The hazard detection unit receives ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt.]
Hazard detection unit: 4-input, 3-output combinational circuit.
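The load-use check and its three outputs can be sketched as follows. The condition is the one stated above; the output names `PCWrite`, `IFIDWrite`, and `ZeroControl` are labels chosen here for the three control signals, not necessarily the names used in the original figure:

```python
def load_use_hazard(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the register a lw in EX will load."""
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)


def hazard_controls(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Outputs of a hazard detection unit (4 inputs, 3 outputs)."""
    stall = load_use_hazard(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt)
    return {
        "PCWrite": 0 if stall else 1,      # hold the PC: IF is repeated
        "IFIDWrite": 0 if stall else 1,    # hold IF/ID: ID is repeated
        "ZeroControl": 1 if stall else 0,  # send a bubble (null signals) into EX
    }


# lw $2, 20($1) is in EX (Rt = 2) while and $4, $2, $5 is in ID (Rs = 2, Rt = 5):
print(hazard_controls(id_ex_memread=1, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
```

After the one-cycle bubble, the loaded value is in MEM/WB and ordinary forwarding can deliver it to the stalled instruction.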

Stalling: What happens in the pipeline?
[Figure: a lw that loads $s5 followed by add $s7, $s5, $s6, over CC1 to CC12. The stalled add repeats its ID stage because IF/IDWrite holds the IF/ID register; the instruction behind it repeats its IF stage because PCWrite holds the PC; the resulting bubbles perform no EX, no MEM, and no WB work because their control signals are zero.]

Branch (Control) Hazards
While executing a previous branch, the next instruction address might not yet be known.
[Figure: a conditional branch followed by its branch target, which stalls for two cycles. The branch calculates PC + 4, then computes the branch target address, then performs the branch test and sets PC to the target. Instructions versus time step (clock cycle).]

Branch Hazards
We can stall the pipeline for every branch instruction.
  Too slow (3 instructions).
Or, continue execution down the sequential instruction stream, assuming that the branch will not be taken ("predict branch not taken").
  If the condition is not met, OK! (The prediction is successful.)
  If the condition is met, the prediction is wrong.
    Some unwanted instructions are in the pipeline!
    Need to flush those instructions.
How do you compare the above two?
  If branches are taken half the time, and if it costs little to discard the instructions, the second approach halves the cost of control hazards.
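The "halves the cost" claim is a one-line expected-value calculation. A rough sketch, assuming a 3-cycle penalty per stalled or mispredicted branch and no extra cost for discarding instructions; the function names are illustrative:

```python
def always_stall_cost(penalty_cycles: float) -> float:
    """Average lost cycles per branch when every branch stalls the pipeline."""
    return penalty_cycles


def predict_not_taken_cost(penalty_cycles: float, taken_fraction: float) -> float:
    """Average lost cycles per branch when only taken branches pay the penalty."""
    return penalty_cycles * taken_fraction


stall = always_stall_cost(3)
predict = predict_not_taken_cost(3, taken_fraction=0.5)
print(stall, predict, predict / stall)  # 3 1.5 0.5
```

The saving scales with how often branches fall through: the rarer taken branches are, the closer the average cost gets to zero.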

Stalling: What happens in the pipeline?
A new control signal, IF.Flush, is introduced to flush the instruction in the IF stage.
It zeros the instruction field of the IF/ID pipeline register, which in fact can be decoded as sll $0, $0, 0.
In fact, nop = sll $0, $0, 0.
[Figure: a beq, the add that follows it, and the target of the beq over CC1 to CC7. IF.Flush at CC3 will do.]
  The ID stage executes a null instruction (sll $0, $0, 0) at CC3.
  The EX stage executes a null instruction (sll $0, $0, 0) at CC4.
  The MEM stage executes a null instruction (sll $0, $0, 0) at CC5.
  The WB stage executes a null instruction (sll $0, $0, 0) at CC6.

Branch Hazards: Branch Delay Slots
While determining the next instruction address, go ahead and execute the sequentially following instruction(s).
[Figure: a conditional branch, the branch delay instruction, and the branch target versus time step (clock cycle). The branch computes the branch target address, then performs the branch test and sets PC to the target; the delay-slot instruction executes meanwhile, and the correct target is fetched next.]
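Why an all-zero instruction field acts as a nop follows from the MIPS R-type encoding, in which sll has opcode 0 and function code 0. A small sketch; `encode_r_type` is a helper written for this illustration:

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack the six R-type fields (6/5/5/5/5/6 bits) into one 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


SLL_FUNCT = 0x00
ADD_FUNCT = 0x20

nop = encode_r_type(rs=0, rt=0, rd=0, shamt=0, funct=SLL_FUNCT)  # sll $0, $0, 0
add = encode_r_type(rs=8, rt=9, rd=14, shamt=0, funct=ADD_FUNCT)  # add $14, $8, $9

print(f"{nop:08x}")  # 00000000
print(f"{add:08x}")  # 01097020
```

Because the flushed word writes only to $0, which always reads as zero, it passes through ID, EX, MEM, and WB without changing any state.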

Branch Hazards: Branch Delay Slots
Advantage:
  Can avoid one stall per delay slot.
Disadvantages:
  Makes assembly-language programming more difficult.
  Can be difficult to find appropriate code for the slot.
  Exposes an implementation detail that could change. Later implementations without a stall must still emulate the slot.
Most modern processors avoid them.

Branch Hazards: Branch Prediction
Guess which instruction is next, and start executing it.
What if the guess is wrong? Flush the pipeline.
Simplest guesses: Always Taken or Never Taken.
When to do prediction?
  Static prediction: compiler
  Dynamic prediction: processor

Dynamic Branch Prediction
Branch prediction buffer (branch history table):
A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.
[Figure: the PC indexes the instruction memory and the branch prediction buffer (BPB) in parallel; the BPB outputs a prediction (T or NT).]

Dynamic Branch Prediction
1-bit predictor
[State diagram: two states, "Predict taken" and "Predict not taken". A taken branch (T) leads to "Predict taken"; a not-taken branch (NT) leads to "Predict not taken".]
Prediction accuracy for a loop that ends in a beq and runs 10 times:
  1st: ?, 2nd: correct, 3rd: correct, ..., 9th: correct, 10th: incorrect
  => 80% accuracy (because the first prediction is incorrect in the second execution of the same code).
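The 80% figure can be reproduced by simulating the 1-bit predictor. A minimal sketch, assuming the loop branch is taken nine times and then falls through, and that the predictor simply remembers the last outcome; the initial state for the first run is an arbitrary choice:

```python
def run_one_bit_predictor(outcomes, prediction):
    """Simulate a 1-bit predictor; return (number correct, final state)."""
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # remember only the most recent outcome
    return correct, prediction


# One execution of a 10-iteration loop: taken 9 times, then not taken at exit.
loop = [True] * 9 + [False]

first_correct, state = run_one_bit_predictor(loop, prediction=True)
second_correct, state = run_one_bit_predictor(loop, prediction=state)

print(first_correct / len(loop))   # depends on the initial state
print(second_correct / len(loop))  # 0.8
```

From the second execution on, the predictor enters the loop still predicting "not taken" from the previous exit, so it misses both the first and the last iteration.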

Exceptions
Another form of control hazard involves exceptions.
When an arithmetic overflow occurs while executing add $1, $2, $1:
  Transfer control to the exception routine.
  This is the same as executing a branch instruction.
Necessary actions are:
  Stop executing the current instruction and start the exception routine.
  Following instructions already in the pipe must be wiped out (flush the pipeline registers).
  Return to the offending instruction.

Flush Control Signals
Similar to the taken branch, we need to flush pipeline registers. The question is which pipeline register(s).
Arithmetic overflow is detected at the end of the EX stage, and thus flushing takes place at the MEM stage (in the next cycle).
Since the three following instructions are already in the pipeline (in the IF, ID, and EX stages), we need to flush those three instructions.
Therefore, we need ID.Flush and EX.Flush control signals in addition to IF.Flush.

[Figure: pipelined datapath with hazard detection unit, forwarding unit, and exception support (Cause and Except PC registers, overflow output OF from the ALU). IF.Flush acts on the instruction in the IF stage, ID.Flush on the instruction in the ID stage, and EX.Flush on the instruction in the EX stage.]

Challenges
What if more than one instruction generates exceptions?
  While add causes an overflow exception at CC5 in EX, another instruction causes an invalid opcode exception at CC5 in ID.
  It is not OK to generate all flushing signals.
And how does the exception service routine correctly identify the instruction that causes the exception? => Imprecise exception

Precise and Imprecise Exceptions
Precise exceptions
  The hardware (CPU) correctly identifies the offending instruction and makes sure all prior instructions complete.
  All instructions following it are not allowed to complete their execution and have not modified the process state.
Imprecise exceptions
  The hardware does not guarantee this and leaves it up to the operating system to determine which instruction caused the problem.
  Some instructions following the offending instruction are allowed to complete their execution and modify the process state.
Most modern CPUs support precise exceptions.