Pipelining is Hazardous!

Similar documents
Implementing a MIPS Processor. Readings:

Processor Design Pipelined Processor (II) Hung-Wei Tseng

1 Hazards COMP2611 Fall 2015 Pipelined Processor

ELE 655 Microprocessor System Design

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

Implementing a MIPS Processor. Readings:

Lecture 7 Pipelining. Peng Liu.

Quiz 5 Mini project #1 solution Mini project #2 assigned Stalling recap Branches!

CSE Lecture 13/14 In Class Handout For all of these problems: HAS NOT CANNOT Add Add Add must wait until $5 written by previous add;

What about branches? Branch outcomes are not known until EXE What are our options?

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Chapter 4 The Processor 1. Chapter 4B. The Processor

ECEC 355: Pipelining

What about branches? Branch outcomes are not known until EXE What are our options?

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

Pipelining. CSC Friday, November 6, 2015

Quiz 5 Mini project #1 solution Mini project #2 assigned Stalling recap Branches!

Unresolved data hazards. CS2504, Spring'2007 Dimitris Nikolopoulos

CS 251, Winter 2018, Assignment % of course mark

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

Processor Design Pipelined Processor. Hung-Wei Tseng

ECE 486/586. Computer Architecture. Lecture # 12

LECTURE 3: THE PROCESSOR

PIPELINING: HAZARDS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

CS 251, Winter 2019, Assignment % of course mark

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.

Pipelining. lecture 15. MIPS data path and control 3. Five stages of a MIPS (CPU) instruction. - factory assembly line (Henry Ford years ago)

Pipelined Processor Design

Instr. execution impl. view

Pipeline Data Hazards. Dealing With Data Hazards

CS 152, Spring 2011 Section 2

Announcement. ECE475/ECE4420 Computer Architecture L4: Advanced Issues in Pipelining. Edward Suh Computer Systems Laboratory

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Pipeline Architecture RISC

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

DLX Unpipelined Implementation

Processor (II) - pipelining. Hwansoo Han

Full Datapath. Chapter 4 The Processor 2

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Chapter 4 The Processor 1. Chapter 4A. The Processor

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

CPE 335 Computer Organization. Basic MIPS Pipelining Part I

LECTURE 10. Pipelining: Advanced ILP

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

Basic Pipelining Concepts

Instruction Pipelining

COMPUTER ORGANIZATION AND DESI

Full Datapath. Chapter 4 The Processor 2

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

CSEE 3827: Fundamentals of Computer Systems

COSC 6385 Computer Architecture - Pipelining

COMPUTER ORGANIZATION AND DESIGN

Pipeline Review. Review

zhandling Data Hazards The objectives of this module are to discuss how data hazards are handled in general and also in the MIPS architecture.

CPU Pipelining Issues

Chapter 4. The Processor

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz

CS2100 Computer Organisation Tutorial #10: Pipelining Answers to Selected Questions

Chapter 4. The Processor

Pipeline design. Mehran Rezaei

CS146 Computer Architecture. Fall Midterm Exam

ECE331: Hardware Organization and Design

EITF20: Computer Architecture Part2.2.1: Pipeline-1

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

Computer Organization and Structure

Advanced Computer Architecture

Pipelined Processor Design

Limiting The Data Hazards by Combining The Forwarding with Delay Slots Operations to Improve Dynamic Branch Prediction in Superscalar Processor

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

ECE/CS 552: Pipeline Hazards

EITF20: Computer Architecture Part2.2.1: Pipeline-1

ECE260: Fundamentals of Computer Engineering

ECE260: Fundamentals of Computer Engineering

EITF20: Computer Architecture Part2.2.1: Pipeline-1

2. [3 marks] Show your work in the computation for the following questions involving CPI and performance.

Computer Architecture Spring 2016

RISC Pipeline. Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. See: P&H Chapter 4.6

ECE154A Introduction to Computer Architecture. Homework 4 solution

Multiple Instruction Issue. Superscalars

The Processor: Improving the performance - Control Hazards

In-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution

Computer Architecture

Lecture 7: Pipelining Contd. More pipelining complications: Interrupts and Exceptions

Instruction Pipelining

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

ECS 154B Computer Architecture II Spring 2009

Complex Pipelines and Branch Prediction

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1

CS/CoE 1541 Mid Term Exam (Fall 2018).

ECE260: Fundamentals of Computer Engineering

Transcription:

Pipelining is Hazardous! Hazards are situations where pipelining does not work as elegantly as we would like Three kinds Structural hazards -- we have run out of a hardware resource. Data hazards -- an input is not available on the cycle it is needed. Control hazards -- the next instruction is not known. Dealing with hazards increases complexity or decreases performance (or both) Dealing efficienctly with hazards is much of what makes processor design hard. That, and the Quartus tools ;-) 56

Hazards: Key Points Hazards cause imperfect pipelining They prevent us from achieving CPI = 1 They are generally causes by counter flow data dependences in the pipeline Three kinds Structural -- contention for hardware resources Data -- a data value is not available when/where it is needed. Control -- the next instruction to execute is not known. Two ways to deal with hazards Removal -- add hardware and/or complexity to work around the hazard so it does not occur Stall -- Hold up the execution of new instructions. Let the older instructions finish, so the hazard will clear. 57

Structural hazard Why does a structural hazard exist here? add $1, $2, $3 lw $4, 0($5) sub $6, $7, $8 sub $9,$10, $1 sw $1, 0($12) EXE MEM EXE WB MEM EXE A. The register file is trying to read and write the same register at the same cycle B. The ALU and data memory are both active at the same cycle C. A value is used before it s produced D. Both A and B E. Both A and C 58

Structural hazard The original pipeline incurs structural hazard when two instructions compete for the same register. Solution: write early, read late Writes occur at the clock edge and complete long enough before the end of the clock cycle. The read occurs later in the clock cycle We will use this approach from now on. add $1, $2, $3 lw $4, 0($5) sub $6, $7, $8 sub $9,$10, $1 sw $1, 0($12) EXE MEM EXE WB MEM EXE WB MEM EXE WB MEM EXE WB MEM WB 59

How does a structural hazard arise in this pipeline? add $1, $2, $3 lw $4, 0($5) sub $6, $7, $8 sub $9,$10,$11 sw $1, 0($12) EXE A. The register file and memory are both active at the same cycle B. The ALU and memory are both active at the same cycle C. The processor needs to fetch an instruction and access memory at the same cycle D. Both A and B E. Both A and C MEM EXE WB MEM EXE 60

Data Dependences A data dependence occurs whenever one instruction needs a value produced by another. Register values Also memory accesses (more on this later) add $s0, $t0, $t1 sub $t2, $s0, $t3 add $t3, $s0, $t4 add $t3, $t2, $t4 63

Dependences in the pipeline In our simple pipeline, these instructions cause a data hazard Time Cyc 1 Cyc 2 Cyc 3 Cyc 4 Cyc 5 64

Solution : Stall When you need a value that is not ready, stall Suspend the execution of the executing instruction and those that follow. This introduces a pipeline bubble. A bubble is a lack of work to do, it propagates through the pipeline like nop instructions Cyc 1 Cyc 2 Cyc 3 Cyc 4 Cyc 5 Cyc 6 Cyc 7 Cyc 8 Cyc 9 Cyc 10 Both of these instructions are stalled One instruction or nop completes each cycle 68

Stalling the pipeline Freeze all pipeline stages before the stage where the hazard occurred. Disable the PC update Disable the pipeline registers This is equivalent to inserting a nop into the pipeline when a hazard exists Insert nop control bits at stalled stage (decode in our example) 69

Calculating CPI for Stalls In this case, the bubble lasts for 2 cycles. As a result, in cycle (6 and 7), no instruction completes. What happens to CPI? We assign the 2 stall cycle to the instruction that stalled In this case, it is the sub insturction Rule: CPI for an instruction = (Cycles from fetch to writeback) (#of pipeline stages) + 1 Cyc 1 Cyc 2 Cyc 3 Cyc 4 Cyc 5 Cyc 6 Cyc 7 Cyc 8 Cyc 9 Cyc 10

Hardware for Stalling Turn off the enables on the earlier pipeline stages The earlier stages will keep processing the same instruction over and over. No new instructions get fetched. Insert control and data values corresponding to a nop into the downstream pipeline register. This will create the bubble. The nops will flow downstream, doing nothing. When the stall is over, re-enable the pipeline registers The instructions in the upstream stages will start moving again. New instructions will start entering the pipeline again. 71

The Impact of Stalling On Performance ET = I * CPI * CT I and CT are constant What is the impact of stalling on CPI? What do we need to know to figure it out? 72

The Impact of Stalling On Performance ET = I * CPI * CT I and CT are constant What is the impact of stalling on CPI? Fraction of instructions that stall: 30% Baseline CPI = 1 Stall CPI = 1 + 2 = 3 New CPI = 0.3*3 + 0.7*1 = 1.6 73

Solution 3: Bypassing/Forwarding Data values are computed in Ex and Mem but publicized in write back The data exists! We should use it. 74

Bypassing or Forwarding Take the values, where ever they are 75

Forwarding Paths 76

PC etch/dec Dec/Exec Exec/Mem Mem/WB Forwarding in Hardware Add 4 Read Address Add Instruction Memory Read Addr 1 Register Read Addr 2 File Write Addr Write Data Read Data 1 Read Data 2 Shift left 2 Add ALU Address Write Data Data Memory Read Data Sign 16 Extend 32

Hardware Cost of Forwarding In our pipeline, adding forwarding required relatively little hardware. For deeper pipelines it gets much more expensive Roughly: ALU * pipe_stages you need to forward over Some modern processor have multiple ALUs (4-5) And deeper pipelines (4-5 stages of to forward across) Not all forwarding paths need to be supported. If a path does not exist, the processor will need to stall. 79

80

Pros and Cons Punt to the compiler This is what MIPS does and is the source of the loaddelay slot Future versions must emulate a single load-delay slot. The compiler fills the slot if possible, or drops in a nop. Always stall. The compiler is oblivious, but performance will suffer 10-15% of instructions are loads, and the CPI for loads will be 2 Forward when possible, stall otherwise Here the compiler can order instructions to avoid the stall. If the compiler can t fix it, the hardware will. 81

Stalling for Load Only four stages are occupied. What s in Mem? All stages of the pipeline earlier than the stall stand still. To stall we insert a noop in place of the instruction and freeze the earlier stages of the pipeline 82

Inserting Noops The noop is in Mem To stall we insert a noop in place of the instruction and freeze the earlier stages of the pipeline 83