CSEE 3827: Fundamentals of Computer Systems

Similar documents
Processor (II) - pipelining. Hwansoo Han

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2

COMPUTER ORGANIZATION AND DESIGN

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

LECTURE 3: THE PROCESSOR

Chapter 4. The Processor

The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Chapter 4 The Processor 1. Chapter 4B. The Processor

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

COMPUTER ORGANIZATION AND DESIGN

Chapter 4. The Processor

Chapter 4. The Processor

Thomas Polzer Institut für Technische Informatik

EIE/ENE 334 Microprocessors

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

Chapter 4. The Processor

Pipelining. CSC Friday, November 6, 2015

Outline Marquette University

ECE260: Fundamentals of Computer Engineering

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12

Chapter 4 The Processor 1. Chapter 4A. The Processor

COMPUTER ORGANIZATION AND DESI

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Chapter 4. The Processor

Chapter 4. The Processor. Jiang Jiang

zhandling Data Hazards The objectives of this module are to discuss how data hazards are handled in general and also in the MIPS architecture.

Pipelined datapath Staging data. CS2504, Spring'2007 Dimitris Nikolopoulos

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

ECE473 Computer Architecture and Organization. Pipeline: Data Hazards

LECTURE 9. Pipeline Hazards

Chapter 4. The Processor

The Processor: Improving the performance - Control Hazards

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

CENG 3420 Lecture 06: Pipeline

14:332:331 Pipelined Datapath

ELE 655 Microprocessor System Design

1 Hazards COMP2611 Fall 2015 Pipelined Processor

ECE260: Fundamentals of Computer Engineering

Chapter 4 (Part II) Sequential Laundry

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 4. The Processor

DEE 1053 Computer Organization Lecture 6: Pipelining

Design a MIPS Processor (2/2)

EE557--FALL 1999 MIDTERM 1. Closed books, closed notes

ECS 154B Computer Architecture II Spring 2009

MIPS An ISA for Pipelining

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Lecture 9 Pipeline and Cache

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

COMPUTER ORGANIZATION AND DESIGN

Unresolved data hazards. CS2504, Spring'2007 Dimitris Nikolopoulos

ECE154A Introduction to Computer Architecture. Homework 4 solution

CS 251, Winter 2018, Assignment % of course mark

ECEC 355: Pipelining

5 th Edition. The Processor We will examine two MIPS implementations A simplified version A more realistic pipelined version

CS 2506 Computer Organization II Test 2. Do not start the test until instructed to do so! printed

ECE Exam II - Solutions November 8 th, 2017

Basic Pipelining Concepts

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

COMP2611: Computer Organization. The Pipelined Processor

Lecture 2: Processor and Pipelining 1

COSC 6385 Computer Architecture - Pipelining

CS 251, Winter 2019, Assignment % of course mark

Chapter 4. The Processor

Instr. execution impl. view

ECE232: Hardware Organization and Design

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

COSC121: Computer Systems. ISA and Performance

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!

Pipelining. Pipeline performance

There are different characteristics for exceptions. They are as follows:

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

CS/CoE 1541 Exam 1 (Spring 2019).

LECTURE 10. Pipelining: Advanced ILP

Computer Architecture

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

ECE331: Hardware Organization and Design

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

Pipelined Processor Design

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Modern Computer Architecture

Instruction Pipelining Review

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

ECE260: Fundamentals of Computer Engineering

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.

Transcription:

CSEE 3827: Fundamentals of Computer Systems Lecture 21 and 22 April 22 and 27, 2009 martha@cs.columbia.edu

Amdahl s Law Be aware when optimizing... T = improved Taffected improvement factor + T unaffected Example: On machine A, multiplication accounts for 80s out of 100s total CPU time. How much improvement in multiplication performance to get 5x speedup overall? Corollary: make the common case fast 2

Single-Cycle CPU Performance Issues Longest delay determines clock period Critical path: load instruction instruction memory register file ALU data memory register file Not feasible to vary clock period for different instructions We will improve performance by pipelining 3

Pipelining Laundry Analogy 4

MIPS Pipeline Five stages, one step per stage IF: Instruction fetch from memory ID: Instruction decode and register read EX: Execute operation or calculate address MEM: Access memory operand WB: Write result back to register 5

MIPS Pipeline Illustration 1 6

MIPS Pipeline Illustration 2 7

Pipeline Performance 1 Assume time for stages is 100ps for register read or write 200ps for other stages Compare pipelined datapath to single-cycle datapath Instr IF ID EX MEM WB Total (PS) lw 200 100 200 200 100 800 sw 200 100 200 200 700 R-format 200 100 200 100 600 beq 200 100 200 500 8

Pipeline Performance 2 Single-cycle Tclock = 800ps Pipelined Tclock = 200ps 9

Pipeline Speedup Speedup due to increased throughput. If all stages are balanced (i.e., all take the same time) Pipeline instr. completion rate = Single-cycle instr. completion rate * Number of stages If not balanced, speedup is less 10

Hazard A hazard is a situation that prevents starting the next instruction in the next cycle Structure hazards occur when a required resource is busy Data hazards occur when an instruction needs to wait for an earlier instruction to complete its data write Control hazards occur when the control action (i.e., next instruction to fetch) depends on a value that is not yet ready 11

Structure Hazard Conflict for use of a resource In a MIPS pipeline with a single memory Load/store requires memory access Instruction fetch would have to stall for that cycle This introduces a pipeline bubble Hence, pipelined datapaths require separate instruction and data memories (or separate instruction and data caches) 12

Data Hazards An instruction depends on completion of data access by a previous instruction add $s0, $t0, $t1 sub $t2, $s0, $t3 13

Forwarding (aka Bypassing) Use result when it is computed Don t wait for it to be stored in a register Requires extra connections in the datapath 14

Load-Use Data Hazard Can t always avoid stalls by forwarding If value not computed when needed Can t forward backward in time! 15

Code Scheduling to Avoid Stalls Reorder code to avoid use of load result in the next instruction MIPS assembly code for A = B + E; C = B + F; stall stall lw $t1, 0($t0) lw $t2, 4($t0) add $t3, $t1, $t2 sw $t3, 12($t0) lw $t4, 8($t0) add $t5, $t1, $t4 sw $t5, 16($t0) lw $t1, 0($t0) lw $t2, 4($t0) lw $t4, 8($t0) add $t3, $t1, $t2 sw $t3, 12($t0) add $t5, $t1, $t4 sw $t5, 16($t0) 13 cycles 11 cycles 16

Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always fetch correct instruction Still working on ID stage of branch In MIPS pipeline Need to compare registers and compute target early in the pipeline Add hardware to do it in ID stage (See Sec. 4.8) 17

Stall on Branch Wait until branch outcome determined before fetching next instruction 18

Branch Prediction Longer pipelines can t readily determine branch outcome early Stall penalty becomes unacceptable Predict outcome of branch Only stall if prediction is wrong In MIPS pipeline Can predict branches not taken Fetch instruction after branch, with no delay 19

MIPS with Predict Not Taken prediction correct prediction incorrect 20

More-Realistic Branch Prediction Static branch prediction Based on typical branch behavior Example: loop and if-statement branches Predict backward branches taken Predict forward branches not taken Dynamic branch prediction Hardware measures actual branch behavior e.g., record recent history of each branch Assume future behavior will continue the trend When wrong, stall while re-fetching, and update history 21

Pipeline Summary Pipelining improves performance by increasing instruction throughput Executes multiple instructions in parallel Each instruction has the same latency Subject to hazards Structure, data, control Instruction set design affects complexity of pipeline implementation 22

MIPS Pipelined Datapath

MIPS Pipelined Datapath MEM Right-toleft flow leads to hazards WB 24

Pipeline registers Need registers between stages, to hold information produced in previous cycle 25

IF for Load 26

ID for Load 27

EX for Load 28

MEM for Load 29

WB for Load wrong register number! 30

Corrected Datapath for Load (A single-cycle pipeline diagram) 31

Pipeline Operation Cycle-by-cycle flow of instructions through the pipelined datapath Single-clock-cycle pipeline diagram Shows pipeline usage in a single cycle Highlight resources used c.f. multi-clock-cycle diagram Graph of operation over time We ll look at single-clock-cycle diagrams for load 32

Multi-Cycle Pipeline Diagram 1 Form showing resource usage over time 33

Multi-Cycle Pipeline Diagram 2 Traditional form 34

Single-Cycle Pipeline Diagram State of pipeline in a given cycle 35

Pipelined Control (Simplified) 36

Pipelined Control Scheme Control signals derived from instruction As in single-cycle implementation 37

Pipeline Control Values Control signals are conceptually the same as they were in the single cycle CPU. ALU Control is the same. Main control also unchanged. Table below shows same control signals grouped by pipeline stage 38

Controlled Pipelined CPU 39

Data Hazards in ALU Instructions Consider this instruction sequence: sub $2,$1,$3 and $12,$2,$5 or $13,$6,$2 add $14,$2,$2 sw $15,100($2) We can resolve hazards with forwarding How do we detect when to forward? 40

Dependencies & Forwarding 41

Detecting the Need to Forward Pass register numbers along pipeline e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register ALU operand register numbers in EX stage are given by ID/EX.RegisterRs, ID/EX.RegisterRt Data hazards when 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt Fwd from EX/MEM pipeline reg Fwd from MEM/WB pipeline reg 42

Detecting the Need to Forward 2 But only if forwarding instruction will write to a register other than $zero! EX/MEM.RegWrite, MEM/WB.RegWrite EX/MEM.RegisterRd 0, MEM/WB.RegisterRd 0 43

Simplified Pipeline w. No Forwarding 44

Simplified Pipeline w. Forwarding Paths 45

Simplified Pipeline w. Forwarding Paths 1 keep track of register sources/targets for in-flight instructions 46

Simplified Pipeline w. Forwarding Paths 2 option of routing previously calculated values directly to ALU 47

Simplified Pipeline w. Forwarding Paths 3 Operand forwarding (aka register bypass) controlled by forwarding unit 48

Forwarding Conditions EX hazard if (EX/MEM.RegWrite and (EX/MEM.RegisterRd 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10 if (EX/MEM.RegWrite and (EX/MEM.RegisterRd 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10 MEM hazard if (MEM/WB.RegWrite and (MEM/WB.RegisterRd 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 49

Double Data Hazard Consider the sequence: add $1,$1,$2 add $1,$1,$3 add $1,$1,$4 Both hazards occur Want to use the most recent Revise MEM hazard condition Only fwd if EX hazard condition isn t true 50

Revised Forwarding Condition MEM hazard Condition for EX hazard on RegisterRs if (MEM/WB.RegWrite and (MEM/WB.RegisterRd 0) and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 Condition for EX hazard on RegisterRt if (MEM/WB.RegWrite and (MEM/WB.RegisterRd 0) and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 51

Datapath with Forwarding 52

Load-Use Data Hazard Need to stall for one cycle 53

Load-Use Hazard Detection Check when using instruction is decoded in ID stage ALU operand register numbers in ID stage are given by IF/ID.RegisterRs, IF/ID.RegisterRt Load-use hazard when ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt)) If detected, stall and insert bubble 54

How to Stall the Pipeline Force control values in ID/EX register to 0 Prevent update of PC and IF/ID register Using instruction is decoded again Following instruction is fetched again 1-cycle stall allows MEM to read data for lw Can subsequently forward to EX stage 55

Stall/Bubble in the Pipeline Stall inserted here 56

Stall/Bubble in the Pipeline Stall inserted here 57

Datapath with Hazard Detection 58

Stalls and Performance Stalls reduce performance But are required to get correct results Compiler can arrange code to avoid hazards and stalls Requires knowledge of the pipeline structure 59

Branch Hazards Determine branch outcome and target as early as possible Move hardware to determine outcome to ID stage Target address adder Register comparator 60

Branch Taken 1 61

Branch Taken 2 62

Data Hazards for Branches 1 If a comparison register is a destination of 2nd or 3rd preceding ALU instruction can resolve using forwarding add $1, $2, $3 IF ID EX MEM WB add $4, $5, $6 IF ID EX MEM WB IF ID EX MEM WB beq $1, $4, target IF ID EX MEM WB 63

Data Hazards for Branches 2 If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction need 1 stall cycle lw $1, addr IF ID EX MEM WB add $4, $5, $6 IF ID EX MEM WB beq stalled IF ID beq $1, $4, target ID EX MEM WB 64

Data Hazards for Branches 3 If a comparison register is a destination of immediately preceding load instruction need 2 stall cycles lw $1, addr IF ID EX MEM WB beq stalled IF ID beq stalled ID beq $1, $0, target ID EX MEM WB 65

Exceptions and Interrupts Unexpected events requiring change in flow of control Exception Arises within the CPU (e.g., undefined opcode, overflow, syscall, ) Interrupt From an external I/O controller Dealing with them without sacrificing performance is hard 66

Handling Exceptions In MIPS, exceptions managed by a System Control Coprocessor (CP0) Save PC of offending (or interrupted) instruction In MIPS: Exception Program Counter (EPC) Save indication of the problem In MIPS: Cause register We ll assume 1-bit 0 for undefined opcode, 1 for overflow Jump to handler at 8000 00180 67

Handler Actions Read cause, and transfer to relevant handler Determine action required If restartable Take corrective action use EPC to return to program Otherwise Terminate program Report error using EPC, cause, 68

Exceptions in a Pipeline Another form of control hazard Consider overflow on add in EX stage add $1, $2, $1 Prevent $1 from being clobbered Complete previous instructions Flush add and subsequent instructions Set Cause and EPC register values Transfer control to handler Similar to mispredicted branch Use much of the same hardware 69