Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Similar documents
The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Chapter 4. The Processor

Processor (II) - pipelining. Hwansoo Han

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

COMPUTER ORGANIZATION AND DESIGN

CSEE 3827: Fundamentals of Computer Systems

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

LECTURE 3: THE PROCESSOR

COMPUTER ORGANIZATION AND DESIGN

Chapter 4 The Processor 1. Chapter 4B. The Processor

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

ECE260: Fundamentals of Computer Engineering

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Thomas Polzer Institut für Technische Informatik

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Chapter 4. The Processor

Chapter 4. The Processor

Chapter 4. The Processor

zhandling Data Hazards The objectives of this module are to discuss how data hazards are handled in general and also in the MIPS architecture.

EIE/ENE 334 Microprocessors

Chapter 4. The Processor

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Outline Marquette University

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Pipelining. CSC Friday, November 6, 2015

ECE473 Computer Architecture and Organization. Pipeline: Data Hazards

Pipelined datapath Staging data. CS2504, Spring'2007 Dimitris Nikolopoulos

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

ELE 655 Microprocessor System Design

Chapter 4 The Processor 1. Chapter 4A. The Processor

LECTURE 9. Pipeline Hazards

1 Hazards COMP2611 Fall 2015 Pipelined Processor

Chapter 4. The Processor. Jiang Jiang

CENG 3420 Lecture 06: Pipeline

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12

COMPUTER ORGANIZATION AND DESI

Chapter 4. The Processor

DEE 1053 Computer Organization Lecture 6: Pipelining

14:332:331 Pipelined Datapath

ECE154A Introduction to Computer Architecture. Homework 4 solution

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 4. The Processor

Chapter 4 (Part II) Sequential Laundry

ECE260: Fundamentals of Computer Engineering

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

The Processor: Improving the performance - Control Hazards

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

EE557--FALL 1999 MIDTERM 1. Closed books, closed notes

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

ECE232: Hardware Organization and Design

CS 251, Winter 2018, Assignment % of course mark

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

(Basic) Processor Pipeline

COSC121: Computer Systems. ISA and Performance

COSC 6385 Computer Architecture - Pipelining

CS 251, Winter 2019, Assignment % of course mark

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

ECE Exam II - Solutions November 8 th, 2017

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Lecture 2: Processor and Pipelining 1

ECS 154B Computer Architecture II Spring 2009

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Design a MIPS Processor (2/2)

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

5 th Edition. The Processor We will examine two MIPS implementations A simplified version A more realistic pipelined version

CS 2506 Computer Organization II Test 2. Do not start the test until instructed to do so! printed

CS/CoE 1541 Exam 1 (Spring 2019).

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

Unresolved data hazards. CS2504, Spring'2007 Dimitris Nikolopoulos

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Question 1: (20 points) For this question, refer to the following pipeline architecture.

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

MIPS An ISA for Pipelining

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

Pipeline Data Hazards. Dealing With Data Hazards

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

HY425 Lecture 05: Branch Prediction

Computer Architecture Spring 2016

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

CS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz

Modern Computer Architecture

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation

ECE331: Hardware Organization and Design

Lecture 9 Pipeline and Cache

Computer Architecture

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

6.004 Tutorial Problems L22 Branch Prediction

ECE/CS 552: Pipeline Hazards

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

Transcription:

Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu

Hazards What are hazards? Situations that prevent starting the next instruction in the next cycle Structure hazards A required resource is busy Data hazards Need to wait for previous instruction to complete its data read/write Control hazards Deciding on control action depends on previous instruction 2

Structure Hazards Conflict for use of a resource In MIPS pipeline with a single memory Load/store requires data access Instruction fetch would have to stall for that cycle Would cause a pipeline bubble Hence, pipelined datapaths require separate instruction/data memories Or separate instruction/data caches 3

Data Hazards (1) An instruction depends on completion of data access by a previous instruction add $s0, $t0, $t1 sub $t2, $s0, $t3 4

Data Hazards (2) Read After Write (RAW) Inst J tries to read operand before Inst I writes it: I: add $t1, $t2, $t3 J: sub $t4, $t1, $t3 Caused by a (true) dependence. This hazard results from an actual need for communication. 5

Data Hazards (3) Write After Read (WAR) Inst J writes operand before Inst I reads it: I: sub $t4, $t1, $t3 J: add $t1, $t2, $t3 K: add $t6, $t1, $t7 Called an anti-dependence by compiler writers. This results from the reuse of the name $t1. 6

Data Hazards (4) Write After Write (WAW) Inst J writes operand before Inst I writes it: I: sub $t1, $t4, $t3 J: add $t1, $t2, $t3 K: add $t6, $t1, $t7 Called an output dependence by compiler writers. This also results from the reuse of the name $t1. 7

Forwarding Forwarding (aka Bypassing) Use result when it is computed Don t wait for it to be stored in a register Requires extra connections in the datapath 8

Forwarding Example Consider this sequence: sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) We can resolve hazards with forwarding How do we detect when to forward? 9

Dependencies & Forwarding 10

Forwarding Conditions (1) Detecting the need to forward Pass register numbers along pipeline e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register ALU operand register numbers in EX stage are given by ID/EX.RegisterRs & ID/EX.RegisterRt Data hazards when 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt Fwd from EX/MEM pipeline reg Fwd from MEM/WB pipeline reg 11

Forwarding Conditions (2) Detecting the need to forward (cont d) But only if forwarding instruction will write to a register! EX/MEM.RegWrite, MEM/WB.RegWrite And only if Rd for that instruction is not $zero EX/MEM.RegisterRd 0, MEM/WB.RegisterRd 0 12

Forwarding Conditions (3) Forwarding paths 13

Forwarding Conditions (4) EX hazard if (EX/MEM.RegWrite and (EX/MEM.RegisterRd 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10 if (EX/MEM.RegWrite and (EX/MEM.RegisterRd 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10 MEM hazard if (MEM/WB.RegWrite and (MEM/WB.RegisterRd 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 14

Double Data Hazard Consider this sequence: add $1, $1, $2 add $1, $1, $3 add $1, $1, $4 Both hazards occur Want to use the most recent Revise MEM hazard condition Only forward if EX hazard condition isn t true 15

Rev d Forwarding Conditions MEM hazard if (MEM/WB.RegWrite and (MEM/WB.RegisterRd 0) and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd 0) and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 16

Datapath with Forwarding 17

Load-Use Data Hazard (1) Load-Use data hazards Can t always avoids stalls by forwarding If value not computed when needed Can t forward backward in time! 18

Load-Use Data Hazard (2) Need to stall for one cycle 19

Load-Use Data Hazard (3) Load-use hazard detection Check when using instruction is decoded in ID stage ALU operand register numbers in ID stage are given by IF/ID.RegisterRs & IF/ID.RegisterRt Load-use hazard when ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt)) If detected, stall and insert bubble 20

Load-Use Data Hazard (4) How to stall the pipeline? Force control values in ID/EX register to 0 EX, MEM, and WB do nop (no-operation) Prevent update of PC and IF/ID register Using instruction is decoded again Following instruction is fetched again 1-cycle stall allows MEM to read data for lw» Can subsequently forward to EX stage 21

Load-Use Data Hazard (5) Stall/Bubble in the pipeline Stall inserted here 22

Load-Use Data Hazard (6) Stall/Bubble in the pipeline (cont d) Or, more accurately... 23

Datapath + Hazard Detection 24

Compiler Approach (1) Stalls reduce performance But are required to get correct results Compiler can arrange code to avoid hazards and stalls Requires knowledge of the pipeline structure 25

Compiler Approach (2) Code scheduling to avoid stalls Reorder code to avoid use of load result in the next instruction (C code) A = B + E; C = B + F; stall stall lw $t1, 0($t0) lw $t2, 4($t0) add $t3, $t1, $t2 sw $t3, 12($t0) lw $t4, 8($t0) add $t5, $t1, $t4 sw $t5, 16($t0) lw $t1, 0($t0) lw $t2, 4($t0) lw $t4, 8($t0) add $t3, $t1, $t2 sw $t3, 12($t0) add $t5, $t1, $t4 sw $t5, 16($t0) 13 cycles 11 cycles 26

Control Hazards (1) Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always fetch correct instruction Still working on ID stage of branch In MIPS pipeline Need to compare registers and compute target early in the pipeline Add hardware to do it in ID stage 27

Control Hazards (2) Example: branch outcome determined in MEM 3 stall cycles for taken branch Flush these instructions (Set control values to 0) PC 28

Control Hazards (3) Branch handling in ID + Stall on branch Branch outcome at the end of ID stage Wait until branch outcome determined before fetching next instruction One cycle penalty for every branch instruction 29

Control Hazards (4) Reducing branch delay Move hardware to determine outcome to ID stage Target address adder Register comparator Example: branch taken 36: sub $10, $4, $8 40: beq $1, $3, 7 44: and $12, $2, $5 48: or $13, $2, $6 52: add $14, $4, $2 56: slt $15, $6, $7... 72: lw $4, 50($7) 30

Control Hazards (5) 31

Control Hazards (6) 32

Data Hazards for Branches (1) Data hazards for branches If a comparison register is a destination of 2nd or 3rd preceding ALU instruction Can resolve using forwarding add $1, $2, $3 add $4, $5, $6 IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB beq $1, $4, target IF ID EX MEM WB 33

Data Hazards for Branches (2) Data hazards for branches (cont d) If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction Need 1 stall cycle lw $1, addr add $4, $5, $6 IF ID EX MEM WB IF ID EX MEM WB beq stalled IF ID beq $1, $4, target ID EX MEM WB 34

Data Hazards for Branches (3) Data hazards for branches (cont d) If a comparison register is a destination of immediately preceding load instruction Need 2 stall cycles lw $1, addr IF ID EX MEM WB beq stalled IF ID beq stalled ID beq $1, $0, target ID EX MEM WB 35

Branch Prediction (1) Branch prediction Longer pipelines can t readily determine branch outcome early Stall penalty becomes unacceptable Predict outcome of branch Only stall if prediction is wrong In MIPS pipeline Can predict branches not taken Fetch instruction after branch, with no delay 36

Branch Prediction (2) Static branch prediction Based on typical branch behavior Example: loop and if-statement branches Predict backward branches taken Predict forward branches not taken Dynamic branch prediction Hardware measures actual branch behavior e.g., record recent history of each branch Assume future behavior will continue the trend When wrong, stall while re-fetching, and update history 37

Static Branch Prediction MIPS with Predict Not Taken Prediction correct Prediction incorrect 38

Dynamic Branch Prediction (1) Dynamic branch prediction In deeper and superscalar pipelines, branch penalty is more significant Use dynamic prediction Branch prediction buffer (aka branch history table) Indexed by recent branch instruction addresses Stores outcome (taken/not taken) To execute a branch» Check table, expect the same outcome» Start fetching from fall-through or target» If wrong, flush pipeline and flip prediction 39

Dynamic Branch Prediction (2) 1-bit predictor: shortcoming Inner loop branches mispredicted twice! outer: inner: beq,, inner beq,, outer Mispredict as taken on last iteration of inner loop Then mispredict as not taken on first iteration of inner loop next time around 40

Dynamic Branch Prediction (3) 2-bit predictor Only change prediction on two successive mispredictions 41

Dynamic Branch Prediction (4) Calculating the branch target Even with predictor, still need to calculate the target address 1-cycle penalty for a taken branch Branch target buffer Cache of target addresses Indexed by PC when instruction fetched» If hit and instruction is branch predicted taken, can fetch target immediately 42

Delayed Branch (1) Branch delay slot filled by a useful instruction A branch always executes the following instruction. Done by compilers and assemblers 43

Delayed Branch (2) Compiler approach Simple and effective for a single branch delay slot Used in MIPS jal Addr: R[31]=PC+8; PC=PC+4[31..28] Addr*4 Modern processors favor dynamic branch prediction Branch delay slot becomes longer Longer pipelines Multiple instructions issue per clock cycle Dynamic branch prediction is more flexible The growth in available transistors 44