Processor (II) - pipelining. Hwansoo Han

Pipelining Analogy
- Pipelined laundry: overlapping execution
  - Parallelism improves performance
- Four loads: Speedup = 8/3.5 = 2.3
- Non-stop: Speedup = 2n / (0.5n + 1.5) ≈ 4 = number of stages
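The laundry arithmetic can be checked with a short sketch. It assumes four stages of 0.5 hours each, which is what the 2n and 0.5n + 1.5 terms imply; the function name `laundry_speedup` is made up for illustration.

```python
def laundry_speedup(n_loads, n_stages=4, stage_hours=0.5):
    """Speedup of pipelined over sequential laundry (assumed equal stage times)."""
    # Sequential: every load runs all stages before the next load starts.
    sequential = n_loads * n_stages * stage_hours          # 2n hours
    # Pipelined: fill the pipeline once, then one load finishes per stage time.
    pipelined = (n_stages + n_loads - 1) * stage_hours     # 0.5n + 1.5 hours
    return sequential / pipelined

print(round(laundry_speedup(4), 1))        # 2.3
print(round(laundry_speedup(10_000), 2))   # 4.0
```

With many loads the fill time becomes negligible, so the speedup approaches the number of stages.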

MIPS Pipeline
- Five stages, one step per stage
  1. IF: Instruction fetch from memory
  2. ID: Instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register

Pipeline Performance
- Assume time for stages is
  - 100 ps for register read or write
  - 200 ps for other stages
- Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
sw        200 ps       100 ps         200 ps  200 ps         -               700 ps
R-format  200 ps       100 ps         200 ps  -              100 ps          600 ps
beq       200 ps       100 ps         200 ps  -              -               500 ps

Pipeline Performance
- Figure: timing diagrams for the two datapaths
  - Single-cycle (T_c = 800 ps)
  - Pipelined (T_c = 200 ps)

Pipeline Speedup
- If all stages are balanced (i.e., all take the same time):
  Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
- If not balanced, speedup is less
- Speedup due to increased throughput
  - Latency (time for each instruction) does not decrease
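As a rough check of the numbers above, the sketch below derives both clock periods from the stage times in the Pipeline Performance table and compares total execution times. The dictionary layout and function names are assumptions made for this example, and hazards are ignored.

```python
# Stage latencies in picoseconds, taken from the Pipeline Performance table.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction class uses, per the same table.
USES = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}

def instr_time(kind):
    return sum(STAGE_PS[s] for s in USES[kind])

# Single-cycle clock must fit the slowest instruction (lw).
TC_SINGLE = max(instr_time(k) for k in USES)
# Pipelined clock must fit the slowest stage.
TC_PIPE = max(STAGE_PS.values())

def total_time(n, pipelined):
    """Time in ps to run n instructions, assuming no hazards."""
    if pipelined:
        return (len(STAGE_PS) + n - 1) * TC_PIPE
    return n * TC_SINGLE

def speedup(n):
    return total_time(n, False) / total_time(n, True)

print(TC_SINGLE, TC_PIPE)                          # 800 200
print(total_time(3, False), total_time(3, True))   # 2400 1400
print(round(speedup(1_000_000), 2))                # 4.0
```

Because the stages are not balanced (100 ps vs. 200 ps), the long-run speedup is 800/200 = 4 rather than the ideal 5, and each instruction's latency actually rises from 800 ps to 1000 ps.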

Pipelining and ISA Design
- MIPS ISA designed for pipelining
  - All instructions are 32 bits
    - Easier to fetch and decode in one cycle
    - cf. x86: 1- to 15-byte instructions
  - Few and regular instruction formats
    - Can decode and read registers in one step
  - Load/store addressing
    - Can calculate address in 3rd stage, and access memory in 4th stage
  - Alignment of memory operands
    - Memory access takes only one cycle

Hazards
- Situations that prevent starting the next instruction in the next cycle
- Structure hazard
  - A required resource is busy
- Data hazard
  - Need to wait for previous instruction to complete its data read/write
- Control hazard
  - Deciding on control action depends on previous instruction

Structure Hazards
- Conflict for use of a resource
- In MIPS pipeline with a single memory
  - Load/store requires data access
  - Instruction fetch would have to stall for that cycle
    - Would cause a pipeline "bubble"
- Hence, pipelined datapaths require separate instruction/data memories
  - Or separate instruction/data caches

Data Hazards
- An instruction depends on completion of data access by a previous instruction

    add $s0, $t0, $t1
    sub $t2, $s0, $t3
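The dependence in this pair can be found mechanically. The following is a minimal sketch that only understands three-register ALU instructions written as strings; `parse` and `raw_dependences` are hypothetical helpers, not part of any MIPS tool.

```python
import re

def parse(instr):
    """Return (dest, sources) for a three-register ALU instruction."""
    _, operands = instr.split(None, 1)
    regs = re.findall(r"\$\w+", operands)
    return regs[0], regs[1:]

def raw_dependences(program):
    """List (producer_index, consumer_index, register) read-after-write pairs."""
    found = []
    for i, producer in enumerate(program):
        dest, _ = parse(producer)
        for j in range(i + 1, len(program)):
            later_dest, sources = parse(program[j])
            if dest in sources:
                found.append((i, j, dest))
            if later_dest == dest:
                break  # a later write replaces this value
    return found

program = ["add $s0, $t0, $t1", "sub $t2, $s0, $t3"]
print(raw_dependences(program))   # [(0, 1, '$s0')]
```

Here `sub` reads `$s0` one instruction after `add` writes it, which is exactly the case the pipeline has to handle.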

Forwarding (aka Bypassing)
- Use result when it is computed
  - Don't wait for it to be stored in a register
  - Requires extra connections in the datapath

Load-Use Data Hazard
- Can't always avoid stalls by forwarding
  - If value not computed when needed
  - Can't forward backward in time!

Code Scheduling to Avoid Stalls
- Reorder code to avoid use of load result in the next instruction
- C code for A = B + E; C = B + F;

Original order (13 cycles):

    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    add $t3, $t1, $t2     # stall
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    add $t5, $t1, $t4     # stall
    sw  $t5, 16($t0)

Reordered (11 cycles):

    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
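The 13- and 11-cycle counts can be reproduced with a small model. This sketch assumes a five-stage pipeline with full forwarding, where the only stall is one cycle when an instruction uses a load result immediately after the load; instructions are written by hand as (op, dest, sources) tuples, and `count_cycles` is a made-up name.

```python
def count_cycles(program, stages=5):
    """Cycles to run program, charging one stall per load-use pair."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    # (stages - 1) cycles to fill the pipeline, then one per instruction.
    return (stages - 1) + len(program) + stalls

original = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),   # uses $t2 right after its load
    ("sw", None, ["$t3", "$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),   # uses $t4 right after its load
    ("sw", None, ["$t5", "$t0"]),
]
# Same instructions with the third load hoisted above the first add.
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(count_cycles(original))    # 13
print(count_cycles(reordered))   # 11
```

Moving the third `lw` up separates each load from its first use, so both stalls disappear without changing the result.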

Control Hazards
- Branch determines flow of control
  - Fetching next instruction depends on branch outcome
  - Pipeline can't always fetch correct instruction
    - Still working on ID stage of branch
- In MIPS pipeline
  - Need to compare registers and compute target early in the pipeline
  - Add hardware to do it in ID stage

Stall on Branch
- Wait until branch outcome determined before fetching next instruction

Branch Prediction
- Longer pipelines can't readily determine branch outcome early
  - Stall penalty becomes unacceptable
- Predict outcome of branch
  - Only stall if prediction is wrong
- In MIPS pipeline
  - Can predict branches not taken
  - Fetch instruction after branch, with no delay

MIPS with Predict Not Taken
- Figure: pipeline diagrams for two cases
  - Prediction correct
  - Prediction incorrect

More-Realistic Branch Prediction
- Static branch prediction
  - Based on typical branch behavior
  - Example: loop and if-statement branches
    - Predict backward branches taken
    - Predict forward branches not taken
- Dynamic branch prediction
  - Hardware measures actual branch behavior
    - e.g., record recent history of each branch
  - Assume future behavior will continue the trend
    - When wrong, stall while re-fetching, and update history
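Both styles of prediction can be sketched in a few lines. The static rule below follows the slide (backward taken, forward not taken). The dynamic predictor is an assumption: the slide only says "record recent history", so this example uses the simplest scheme, remembering the last outcome per branch address. All names and the address `0x40` are illustrative.

```python
def static_predict(branch_pc, target_pc):
    """Backward branches (loops) predicted taken, forward ones not taken."""
    return target_pc < branch_pc

class LastOutcomePredictor:
    """Assumed 1-bit scheme: predict whatever the branch did last time."""
    def __init__(self):
        self.history = {}

    def predict(self, pc):
        return self.history.get(pc, False)   # default: not taken

    def update(self, pc, taken):
        self.history[pc] = taken

def mispredictions(predictor, pc, outcomes):
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1                        # would stall and re-fetch
        predictor.update(pc, taken)           # update history either way
    return wrong

# A loop branch: taken 9 times, then falls through; the loop runs twice.
loop = ([True] * 9 + [False]) * 2
print(mispredictions(LastOutcomePredictor(), 0x40, loop))   # 4
print(static_predict(branch_pc=0x40, target_pc=0x20))       # True
```

On this pattern the static backward-taken rule would be wrong only at the two loop exits, while the last-outcome scheme is also wrong on each loop re-entry.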

Pipeline Summary
- Pipelining improves performance by increasing instruction throughput
  - Executes multiple instructions in parallel
  - Each instruction has the same latency
- Subject to hazards
  - Structure, data, control
- Instruction set design affects complexity of pipeline implementation

MIPS Pipelined Datapath
- Figure: pipelined datapath
  - Right-to-left flow (from the MEM and WB stages) leads to hazards

Pipeline Registers
- Need registers between stages
  - To hold information produced in previous cycle

Pipeline Operation
- Cycle-by-cycle flow of instructions through the pipelined datapath
  - "Single-clock-cycle" pipeline diagram
    - Shows pipeline usage in a single cycle
    - Highlight resources used
  - cf. "multi-clock-cycle" diagram
    - Graph of operation over time
- We'll look at "single-clock-cycle" diagrams for load & store

IF for Load, Store, ...

ID for Load, Store, ...

EX for Load

MEM for Load

WB for Load
- Wrong register number

Corrected Datapath for Load

EX for Store

MEM for Store

WB for Store

Multi-Cycle Pipeline Diagram
- Form showing resource usage

Multi-Cycle Pipeline Diagram
- Traditional form

Single-Cycle Pipeline Diagram
- State of pipeline in a given cycle

Pipelined Control (Simplified)

Pipelined Control
- Control signals derived from instruction
  - As in single-cycle implementation

Pipelined Control

Data Hazards in ALU Instructions
- Consider this sequence:

    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

- We can resolve hazards with forwarding
  - How do we detect when to forward?

Dependencies & Forwarding

Detecting the Need to Forward
- Pass register numbers along pipeline
  - e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register
- ALU operand register numbers in EX stage are given by
  - ID/EX.RegisterRs, ID/EX.RegisterRt
- Data hazards when
  - Fwd from EX/MEM pipeline reg:
    1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
    1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
  - Fwd from MEM/WB pipeline reg:
    2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
    2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
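To see which of the four conditions fire, the sketch below walks the `sub`/`and`/`or`/`add`/`sw` sequence from the earlier slide. It assumes no stalls, so when instruction i is in EX, instruction i-1 sits in EX/MEM and instruction i-2 in MEM/WB. The `Instr` tuple and `classify` function are hypothetical, and `sw` is encoded with no destination register.

```python
from collections import namedtuple

# rd is None for instructions that write no register (sw).
Instr = namedtuple("Instr", "text rd rs rt")

program = [
    Instr("sub $2, $1, $3", 2, 1, 3),
    Instr("and $12, $2, $5", 12, 2, 5),
    Instr("or $13, $6, $2", 13, 6, 2),
    Instr("add $14, $2, $2", 14, 2, 2),
    Instr("sw $15, 100($2)", None, 2, 15),
]

def classify(program):
    """Return (condition, producer, consumer) for each hazard 1a/1b/2a/2b."""
    found = []
    for i, in_ex in enumerate(program):
        ex_mem = program[i - 1] if i >= 1 else None
        mem_wb = program[i - 2] if i >= 2 else None
        checks = (("1a", ex_mem, in_ex.rs), ("1b", ex_mem, in_ex.rt),
                  ("2a", mem_wb, in_ex.rs), ("2b", mem_wb, in_ex.rt))
        for label, older, operand in checks:
            if older is not None and older.rd is not None and older.rd == operand:
                found.append((label, older.text, in_ex.text))
    return found

for hazard in classify(program):
    print(hazard)
# ('1a', 'sub $2, $1, $3', 'and $12, $2, $5')
# ('2b', 'sub $2, $1, $3', 'or $13, $6, $2')
```

By the time `add` and `sw` reach EX, `sub` has already left the MEM/WB register, so none of these four conditions apply to them.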

Detecting the Need to Forward
- But only if forwarding instruction will write to a register!
  - EX/MEM.RegWrite, MEM/WB.RegWrite
- And only if Rd for that instruction is not $zero
  - EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0

Forwarding Paths

Forwarding Conditions
- EX hazard

    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 10
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 10

- MEM hazard

    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01

Double Data Hazard
- Consider the sequence:

    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4

- Both hazards occur
  - Want to use the most recent
- Revise MEM hazard condition
  - Only fwd if EX hazard condition isn't true

Revised Forwarding Condition
- MEM hazard

    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01
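The EX and revised MEM conditions can be expressed as one small function. This is a software sketch of the logic, not a hardware description: pipeline-register fields are passed as plain arguments, each older instruction is a `(reg_write, rd)` pair, and the function name is made up. Checking the EX hazard first plays the role of the "not EX hazard" term in the revised MEM condition.

```python
def forwarding_unit(id_ex_rs, id_ex_rt, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) mux selects.

    ex_mem and mem_wb are (RegWrite, RegisterRd) pairs for the
    instructions currently held in those pipeline registers.
    """
    def select(operand_reg):
        ex_write, ex_rd = ex_mem
        mem_write, mem_rd = mem_wb
        # EX hazard: the most recent result wins.
        if ex_write and ex_rd != 0 and ex_rd == operand_reg:
            return 0b10
        # MEM hazard: only reached when there is no EX hazard.
        if mem_write and mem_rd != 0 and mem_rd == operand_reg:
            return 0b01
        return 0b00   # no forwarding: use the register file value

    return select(id_ex_rs), select(id_ex_rt)

# Third instruction of: add $1,$1,$2 / add $1,$1,$3 / add $1,$1,$4
fwd_a, fwd_b = forwarding_unit(1, 4, ex_mem=(True, 1), mem_wb=(True, 1))
print(format(fwd_a, "02b"), format(fwd_b, "02b"))   # 10 00
```

In the double-hazard case both older instructions write `$1`, and the function picks the EX/MEM value (10), which is the most recent one.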

Datapath with Forwarding

Load-Use Data Hazard
- Need to stall for one cycle

Load-Use Hazard Detection
- Check when using instruction is decoded in ID stage
- ALU operand register numbers in ID stage are given by
  - IF/ID.RegisterRs, IF/ID.RegisterRt
- Load-use hazard when

    ID/EX.MemRead and
      ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
       (ID/EX.RegisterRt = IF/ID.RegisterRt))

- If detected, stall and insert bubble
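The detection condition translates directly into a boolean function. In this sketch the pipeline-register fields are plain arguments, and the example instruction pair (`lw $2, 20($1)` followed by `and $4, $2, $5`) is chosen for illustration rather than taken from the slide.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the result of a load in EX."""
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)

# lw $2, 20($1) is in EX (MemRead set, Rt = 2); and $4, $2, $5 is in ID.
print(load_use_hazard(True, 2, if_id_rs=2, if_id_rt=5))    # True
# Same registers, but the older instruction is not a load.
print(load_use_hazard(False, 2, if_id_rs=2, if_id_rt=5))   # False
```

The check uses ID/EX.RegisterRt because a MIPS load writes the register named in its Rt field.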

How to Stall the Pipeline
- Force control values in ID/EX register to 0
  - EX, MEM and WB do nop (no-operation)
- Prevent update of PC and IF/ID register
  - Using instruction is decoded again
  - Following instruction is fetched again
  - 1-cycle stall allows MEM to read data for lw
    - Can subsequently forward to EX stage
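The stall mechanism can be traced with a toy simulator. This is a sketch under simplifying assumptions: instructions are (op, dest, sources) tuples, the only hazard modelled is load-use, holding the IF and ID slots stands in for freezing the PC and IF/ID register, and a `BUBBLE` tuple stands in for zeroed ID/EX control values. The two-instruction program is illustrative.

```python
BUBBLE = ("nop", None, ())

def load_use(id_instr, ex_instr):
    """Load in EX whose destination is a source of the instruction in ID."""
    if id_instr is None or ex_instr is None:
        return False
    return ex_instr[0] == "lw" and ex_instr[1] in id_instr[2]

def simulate(program):
    """Return the per-cycle pipeline contents [IF, ID, EX, MEM, WB]."""
    pipe = [None] * 5
    pc = 0
    trace = []
    while pc < len(program) or any(s is not None for s in pipe[:4]):
        if load_use(pipe[1], pipe[2]):
            # Stall: IF and ID hold, a bubble enters EX, later stages advance.
            pipe = [pipe[0], pipe[1], BUBBLE, pipe[2], pipe[3]]
        else:
            fetched = program[pc] if pc < len(program) else None
            if fetched is not None:
                pc += 1
            pipe = [fetched] + pipe[:4]
        trace.append(pipe)
    return trace

program = [
    ("lw", "$2", ("$1",)),           # lw  $2, 20($1)
    ("and", "$4", ("$2", "$5")),     # and $4, $2, $5
]
trace = simulate(program)
print(len(trace))   # 7
for cycle, stages in enumerate(trace, start=1):
    print(cycle, [s[0] if s else "-" for s in stages])
```

Without the hazard the two instructions would finish in 6 cycles; the bubble adds one, and in cycle 5 the `lw` is in WB while `and` is in EX, so the loaded value can be forwarded.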

Stall/Bubble in the Pipeline
- Stall inserted here

Stall/Bubble in the Pipeline
- Or, more accurately

Datapath with Hazard Detection

Stalls and Performance
- Stalls reduce performance
  - But are required to get correct results
- Compiler can arrange code to avoid hazards and stalls
  - Requires knowledge of the pipeline structure