ECE/CS 552: Pipeline Hazards

Similar documents
Pipelined Processors. Ideal Pipelining. Example: FP Multiplier. 55:132/22C:160 Spring Jon Kuhl 1

CSE 533: Advanced Computer Architectures. Pipelining. Instructor: Gürhan Küçük. Yeditepe University

(Basic) Processor Pipeline

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

LECTURE 3: THE PROCESSOR

Pipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

COSC 6385 Computer Architecture - Pipelining

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Appendix C: Pipelining: Basic and Intermediate Concepts

Unresolved data hazards. CS2504, Spring'2007 Dimitris Nikolopoulos

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Chapter 4 The Processor 1. Chapter 4B. The Processor

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1

COMPUTER ORGANIZATION AND DESIGN

1 Hazards COMP2611 Fall 2015 Pipelined Processor

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

ECE154A Introduction to Computer Architecture. Homework 4 solution

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Computer Architecture

LECTURE 10. Pipelining: Advanced ILP

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

CSEE 3827: Fundamentals of Computer Systems

Pipelining. CSC Friday, November 6, 2015

COMPUTER ORGANIZATION AND DESIGN

Chapter 4. The Processor

EITF20: Computer Architecture Part2.2.1: Pipeline-1

COMPUTER ORGANIZATION AND DESI

Full Datapath. Chapter 4 The Processor 2

Pipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations

Thomas Polzer Institut für Technische Informatik

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

EITF20: Computer Architecture Part2.2.1: Pipeline-1

CENG 3420 Lecture 06: Pipeline

Basic Pipelining Concepts

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Chapter 4 The Processor 1. Chapter 4A. The Processor

Computer Organization and Structure

Very Simple MIPS Implementation

DEE 1053 Computer Organization Lecture 6: Pipelining

ECE/CS 552: Pipelining

Predict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch

Full Datapath. Chapter 4 The Processor 2

There are different characteristics for exceptions. They are as follows:

Very Simple MIPS Implementation

ECE/CS 552: Introduction to Superscalar Processors

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

ECEC 355: Pipelining

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Processor (II) - pipelining. Hwansoo Han

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti

Modern Computer Architecture

DLX Unpipelined Implementation

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Chapter 4. The Processor

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

MIPS An ISA for Pipelining

Lecture 2: Processor and Pipelining 1

EIE/ENE 334 Microprocessors

The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Chapter 4. The Processor

COSC4201 Pipelining. Prof. Mokhtar Aboelaze York University

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

Computer Architecture Spring 2016

Chapter 4. The Processor

Pipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations

Instruction Pipelining Review

Lecture 7 Pipelining. Peng Liu.

Chapter 3. Pipelining. EE511 In-Cheol Park, KAIST

The Pipelined RiSC-16

CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions

The Processor: Instruction-Level Parallelism

ECS 154B Computer Architecture II Spring 2009

CS/CoE 1541 Exam 1 (Spring 2019).

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

ECE473 Computer Architecture and Organization. Pipeline: Data Hazards

CPU Pipelining Issues

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

CS 251, Winter 2019, Assignment % of course mark

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Question 1: (20 points) For this question, refer to the following pipeline architecture.

Pipelining. Pipeline performance

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

Chapter 4. The Processor

Instruction Pipelining

The Processor: Improving the performance - Control Hazards

Advanced Computer Architecture

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

Advanced issues in pipelining

14:332:331 Pipelined Datapath

Transcription:

ECE/CS 552: Pipeline Hazards Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith

Pipeline Hazards Forecast Program Dependences Data Hazards Stalls Forwarding Control Hazards Exceptions

Sequential Execution Model MIPS ISA requires the appearance of sequential execution Precise exceptions True of most general purpose ISAs

Program Dependences i1: xxxx i1 i1: A true dependence between two instructions may only involve one subcomputation of each instruction. i2: xxxx i2 i2: i3: xxxx i3 i3: The implied sequential precedences are an overspecification. It is sufficient but not necessary to ensure program correctness.

Program Data Dependences True dependence (RAW) j cannot execute until i produces its result Anti-dependence (WAR) j cannot write its result until i has read its sources Output dependence (WAW) j cannot write its result until i has written its result D( i) R( j) R( i) D( j) D( i) D( j)

Control Dependences Conditional branches Branch must execute to determine which instruction to fetch next Instructions following a conditional branch are control dependent on the branch instruction

Example (quicksort/mips) # for (; (j < high) && (array[j] < array[low]) ; ++j ); # $10 = j # $9 = high # $6 = array # $8 = low bge done, $10, $9 mul $15, $10, 4 addu $24, $6, $15 lw $25, 0($24) mul $13, $8, 4 addu $14, $6, $13 lw $15, 0($14) bge done, $25, $15 cont: addu $10, $10, 1... done: addu $11, $11, -1

Pipeline Hazards Pipeline hazards Potential violations of program dependences Must ensure program dependences are not violated Hazard resolution Static: compiler/programmer guarantees correctness Dynamic: hardware performs checks at runtime Pipeline interlock Hardware mechanism for dynamic hazard resolution Must detect and enforce dependences at runtime

Pipeline Hazards Necessary conditions: WAR: write stage earlier than read stage Is this possible in IF-RD-EX-MEM-WB? WAW: write stage earlier than write stage Is this possible in IF-RD-EX-MEM-WB? RAW: read stage earlier than write stage Is this possible in IF-RD-EX-MEM-WB? If conditions not met, no need to resolve Check for both register and memory

Pipeline Hazard Analysis IF ID RD RD ALU ALU MEM WB Memory hazards RAW: Yes/No? D WAdd WAR: Yes/No? S1 Register WAW: Yes/No? RAdd1 File Register hazards S2 RAW: Yes/No? WAR: Yes/No? WAW: Yes/No? RAdd2 RData1 WData RData2 W/R

RAW Hazard Earlier instruction produces a value used by a later instruction: add $1, $2, $3 sub $4, $5, $1 Cycle: Instr: add sub 1 2 3 4 5 6 7 8 9 1 0 F D X M W F D X M W 1 1 1 2 1 3

RAW Hazard - Stall Detect dependence and stall: add $1, $2, $3 sub $4, $5, $1 Cycle: Instr: add sub 1 2 3 4 5 6 7 8 9 1 0 F D X M W F D X M W 1 1 1 2 1 3

Control Dependence One instruction affects which executes next sw $4, 0($5) bne $2, $3, loop sub $6, $7, $8 Cycle: Instr: sw bne sub 1 2 3 4 5 6 7 8 9 1 0 F D X M W F D X M W F D X M W 1 1 1 2 1 3

Control Dependence - Stall Detect dependence and stall sw $4, 0($5) bne $2, $3, loop sub $6, $7, $8 Cycle: Instr: sw bne sub 1 2 3 4 5 6 7 8 9 1 0 F D X M W F D X M W F D X M W 1 1 1 2 1 3

Pipelined Control Controlled by different instructions Decode instructions and pass the signals down the pipe Control sequencing is embedded in the pipeline No explicit FSM Instead, distributed FSM

Pipelined Control WB Instruction Control M WB EX M WB IF/ID ID/EX EX/MEM MEM/WB

RAW Hazards Must first detect RAW hazards Pipeline analysis proves that WAR/WAW don t occur ID/EX.WriteRegister = IF/ID.ReadRegister1 ID/EX.WriteRegister = IF/ID.ReadRegister2 EX/MEM.WriteRegister = IF/ID.ReadRegister1 EX/MEM.WriteRegister = IF/ID.ReadRegister2 MEM/WB.WriteRegister = IF/ID.ReadRegister1 MEM/WB.WriteRegister = IF/ID.ReadRegister2

RAW Hazards Not all hazards because WriteRegister not used (e.g. sw) ReadRegister not used (e.g. addi, jump) Do something only if necessary

RAW Hazards Hazard Detection Unit Several 5-bit comparators Response? Stall pipeline Instructions in IF and ID stay IF/ID pipeline latch not updated Send nop down pipeline (called a bubble) PCWrite, IF/IDWrite, and nop mux

RAW Hazard Forwarding A better response forwarding Also called bypassing Comparators ensure register is read after it is written Instead of stalling until write occurs Use mux to select forwarded value rather than register value Control mux with hazard detection logic

Forwarding Paths (ALU instructions) IF ID RD i+1: R1 i+2: R1 i+3: R1 c b a ALU i: R1 i+1: R1 i+2: R1 ALU FORWARDING PATHS MEM WB (i i+1) Forwarding via Path a i: R1 (i i+2) Forwarding via Path b i+1: i: R1 (i i+3) i writes R1 before i+3 reads R1 2005 Mikko Lipasti 21

Write before Read RF Register file design 2-phase clocks common Write RF on first phase Read RF on second phase Hence, same cycle: Write $1 Read $1 No bypass needed If read before write or DFF-based, need bypass

ALU Forwarding Register File Comp Comp Comp Comp 1 0 1 0 1 0 1 0 ALU ALU 2005 Mikko Lipasti 23

IF Forwarding Paths (Load instructions) ID RD i+1: R1 i+1: R1 i+2: R1 e d LOAD FORWARDING PATH(s) ALU MEM i:r1 MEM[] i:r1 MEM[] i+1: R1 WB i:r1 MEM[] (i i+1) Stall i+1 (i i+1) Forwarding via Path d (i i+2) i writes R1 before i+2 reads R1 2005 Mikko Lipasti 24

Implementation of Load Forwarding Register File CompComp CompComp A dd r D ata D-Cache 1 0 1 0 LOAD 1 0 1 0 1 0 1 0 Load Stall IF,ID,RD ALU ALU

Control Flow Hazards Control flow instructions branches, jumps, jals, returns Can t fetch until branch outcome known Too late for next IF

Control Flow Hazards What to do? Always stall Easy to implement Performs poorly 1/6 th instructions are branches each branch takes 3 cycles CPI = 1 + 3 x 1/6 = 1.5 (lower bound)

Control Flow Hazards Predict branch not taken Send sequential instructions down pipeline Kill instructions later if incorrect Must stop memory accesses and RF writes Late flush of instructions on misprediction Complex Global signal (wire delay)

Control Flow Hazards Even better but more complex Predict taken Predict both (eager execution) Predict one or the other dynamically Adapt to program branch patterns Lots of chip real estate these days Core i7, ARM A15 Current research topic More later, covered in detail in ECE752

Control Flow Hazards Another option: delayed branches Always execute following instruction delay slot (later example on MIPS pipeline) Put useful instruction there, otherwise nop A mistake to cement this into ISA Just a stopgap (one cycle, one instruction) Superscalar processors (later) Delay slot just gets in the way

Exceptions and Pipelining add $1, $2, $3 overflows A surprise branch Earlier instructions flow to completion Kill later instructions Save PC in EPC, set PC to EX handler, etc. Costs a lot of designer sanity 554 teams that try this sometimes fail

Exceptions Even worse: in one cycle I/O interrupt User trap to OS (EX) Illegal instruction (ID) Arithmetic overflow Hardware error Etc. Interrupt priorities must be supported

Pipeline Hazards Program Dependences Data Hazards Stalls Forwarding Control Hazards Exceptions