Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Similar documents
Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12

Full Datapath. Chapter 4 The Processor 2

LECTURE 3: THE PROCESSOR

COMPUTER ORGANIZATION AND DESIGN

CSEE 3827: Fundamentals of Computer Systems

Chapter 4. The Processor

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Processor (II) - pipelining. Hwansoo Han

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Chapter 4. The Processor

The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

COMPUTER ORGANIZATION AND DESIGN

Chapter 4. The Processor

Full Datapath. Chapter 4 The Processor 2

EIE/ENE 334 Microprocessors

Chapter 4 The Processor 1. Chapter 4B. The Processor

The Processor: Improving the performance - Control Hazards

Thomas Polzer Institut für Technische Informatik

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Chapter 4. The Processor

Chapter 4. The Processor

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Chapter 4 (Part II) Sequential Laundry

Outline Marquette University

COMPUTER ORGANIZATION AND DESI

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. The Processor. Jiang Jiang

Pipelining. CSC Friday, November 6, 2015

14:332:331 Pipelined Datapath

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

What do we have so far? Multi-Cycle Datapath (Textbook Version)

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

Chapter 4. The Processor

Comp 303 Computer Architecture A Pipelined Datapath Control. Lecture 13

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 4. The Processor

Improve performance by increasing instruction throughput

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Chapter 4 The Processor 1. Chapter 4A. The Processor

CENG 3420 Lecture 06: Pipeline

1 Hazards COMP2611 Fall 2015 Pipelined Processor

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

ECE260: Fundamentals of Computer Engineering

PS Midterm 2. Pipelining

MIPS An ISA for Pipelining

DEE 1053 Computer Organization Lecture 6: Pipelining

Chapter Six. Dataı access. Reg. Instructionı. fetch. Dataı. Reg. access. Dataı. Reg. access. Dataı. Instructionı fetch. 2 ns 2 ns 2 ns 2 ns 2 ns

Lecture 9 Pipeline and Cache

COMP2611: Computer Organization. The Pipelined Processor

TDT4255 Friday the 21st of October. Real world examples of pipelining? How does pipelining influence instruction

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

5 th Edition. The Processor We will examine two MIPS implementations A simplified version A more realistic pipelined version

LECTURE 9. Pipeline Hazards

1048: Computer Organization

ECE331: Hardware Organization and Design

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer Organization and Structure

ECE Exam II - Solutions November 8 th, 2017

ECS 154B Computer Architecture II Spring 2009

CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W12-M

Lecture 8: Data Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 2: Processor and Pipelining 1

Processor (I) - datapath & control. Hwansoo Han

ECE232: Hardware Organization and Design

Processor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed

COMPUTER ORGANIZATION AND DESIGN

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Designing a Pipelined CPU

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

Advanced Computer Architecture Pipelining

EC 413 Computer Organization - Fall 2017 Problem Set 3 Problem Set 3 Solution

CS 251, Winter 2018, Assignment % of course mark

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

ECE154A Introduction to Computer Architecture. Homework 4 solution

Solutions for Chapter 6 Exercises

Improving Performance: Pipelining

EE557--FALL 1999 MIDTERM 1. Closed books, closed notes

Lecture 6: Pipelining

Processor Design Pipelined Processor (II) Hung-Wei Tseng

Review: Computer Organization

Design a MIPS Processor (2/2)

Lecture 7 Pipelining. Peng Liu.

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

Pipelined Processor Design

There are different characteristics for exceptions. They are as follows:

Pipeline design. Mehran Rezaei

Transcription:

Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing (2)

Pipeline Performance Assume time for stages is ps for register read or write 2ps for other stages Compare pipelined path with singlecycle path Instr Instr fetch Register read ALU op emory access Register write Total time lw 2ps ps 2ps 2ps ps 8ps sw 2ps ps 2ps 2ps 7ps R-format 2ps ps 2ps ps 6ps beq 2ps ps 2ps 5ps (3) Pipeline Performance Single-cycle (T c = 8ps) Pipelined (T c = 2ps) (4) 2

Pipeline Speedup If all stages are balanced i.e., all take the same time Inter instruction gap pipelined = Inter instruction gap nonpipelined number of stages If not balanced, speedup is less Speedup due to increased throughput Latency (time for each instruction) does not decrease (5) Basic Idea All instructions are 32-bits Few & regular instruction formats Alignment of operands (6) 3

Pipelining What makes it easy All instructions are the same length Simple instruction formats emory operands appear only in loads and stores What makes it hard? structural hazards: suppose we had only one control hazards: need to worry about branch instructions hazards: an instruction depends on a preious instruction What really makes it hard: exception handling trying to improe performance with out-of-order execution, etc. (7) Pipeline registers Need registers between stages To hold information produced in preious cycle Pipeline stage execution time (8) 4

Graphically Representing Pipelines Time 2 4 6 8 lw IF ID EX E add IF ID EX E Shading indicates the unit is being used by the instruction Shading on the right half of the register file (ID or ) or means the element is being read in that stage Shading on the left half means the element is being written in that stage (9) Graphically Representing Pipelines Program execution order (in instructions) lw $, 2($) Time (in clock cycles) CC CC 2 CC 3 CC 4 CC 5 CC 6 I Reg ALU D Reg sub $, $2, $3 I Reg ALU D Reg Can help with answering questions like: how many cycles does it take to execute this code? what is the ALU doing during cycle 4? use this representation to help understand paths () 5

Structural Hazard Time 2 4 6 8 lw IF ID EX E add IF ID EX E sub IF ID EX E add IF ID EX E Need to separate instruction and () IF for Load, Store, Pipeline stage execution time (2) 6

ID for Load, Store, Pipeline stage execution time (3) EX for Load Pipeline stage execution time (4) 7

E for Load Pipeline stage execution time (5) for Load Wrong register number (6) 8

Corrected Datapath for Load Pipeline stage execution time (7) EX for Store Pipeline stage execution time (8) 9

E for Store Pipeline stage execution time (9) for Store Pipeline stage execution time (2)

Pipelining Example add $4, $5, $6 lw $3, 24($) add $2, $3, $4 sub $, $2, $3 lw $, 2($) Note what is happening in the register file IF/ID ID/EX EX/E E/ 4 Shift left 2 PC ress Pipeline stage execution time register register 2 Registers register [2 6] [5 ] 6 2 Sign extend 32 u x RegDst Zero ALU ALU ress Data (2) Pipelined Control (Simplified) (22)

Pipelined Control Execution/ress Calculation stage control lines emory access stage control lines -back stage control lines Reg Dst ALU Op ALU Op ALU Src Branch em em Reg write em to Reg R-format lw sw X X beq X X Control signals deried from instruction As in single-cycle implementation Pass control signals along like (23) Pipelined Control (24) 2

Datapath with Control IF: lw $, 9($) PCSrc ID/EX EX/E Control E/ IF/ID EX PC 4 ress register register 2 Registers register Reg 2 Shift left 2 ALUSrc Zero ALU ALU Branch em ress Data emtoreg [5 ] 6 32 Sign extend 6 ALU control em [2 6] [5 ] ALUOp RegDst (25) Datapath with Control IF: sub $, $2, $3 ID: lw $, 9($) PCSrc lw Control ID/EX EX/E E/ IF/ID E X PC 4 ress register register 2 Registers register Reg 2 Shift left 2 ALUSrc Zero ALU ALU Branch em ress Data emtoreg [5 ] 6 32 Sign extend 6 ALU control em [2 6] [5 ] ALUOp RegDst (26) 3

Datapath with Control IF: and $2, $4, $5 PCSrc ID: sub $, $2, $3 EX: lw $, 9($) IF/ID sub Control ID/EX EX EX/E E/ PC 4 ress register register 2 Registers register Reg 2 Shift left 2 ALUSrc Zero ALU ALU Branch em ress Data emtoreg [5 ] 6 32 Sign extend 6 ALU control em [2 6] [5 ] ALUOp RegDst (27) Datapath with Control IF: or $3, $6, $7 PCSrc ID: and $2, $4, $5 EX: sub $, $2, $3 E: lw $, 9($) IF/ID and Control ID/EX EX EX/E E/ PC 4 ress register register 2 Registers register Reg 2 Shift left 2 ALUSrc Zero ALU ALU Branch em ress Data emtoreg [5 ] 6 32 Sign extend 6 ALU control em [2 6] [5 ] ALUOp RegDst (28) 4

Datapath with Control IF: add $4, $8, $9 PCSrc ID: or $3, $6, $7 EX: and $2, $4, $5 E: sub $,.. : lw $, 9($) IF/ID or Control ID/EX EX EX/E E/ PC 4 ress register register 2 Registers register Reg 2 Shift left 2 ALUSrc Zero ALU ALU Branch em ress Data emtoreg [5 ] 6 32 Sign extend 6 ALU control em [2 6] [5 ] ALUOp RegDst (29) Datapath with Control IF: xxxx ID: add $4, $8, $9 EX: or $3, $6, $7 E: and $2 : sub $,.. PCSrc IF/ID add Control ID/EX EX EX/E E/ PC 4 ress register register 2 Registers register Reg 2 Shift left 2 ALUSrc Zero ALU ALU Branch em ress Data emtoreg [5 ] 6 32 Sign extend 6 ALU control em [2 6] [5 ] ALUOp RegDst (3) 5

Datapath with Control IF: xxxx ID: xxxx EX: add $4, $8, $9 E: or $3,.. : and $2 PCSrc IF/ID Control ID/EX EX EX/E E/ PC 4 ress register register 2 Registers register Reg 2 Shift left 2 ALUSrc Zero ALU ALU Branch em ress Data emtoreg [5 ] 6 32 Sign extend 6 ALU control em [2 6] [5 ] ALUOp RegDst (3) Datapath with Control IF: xxxx ID: xxxx EX: xxxx E: add $4,.. : or $3 PCSrc IF/ID Control ID/EX EX EX/E E/ PC 4 ress register register 2 Registers register Reg 2 Shift left 2 ALUSrc Zero ALU ALU Branch em ress Data emtoreg [5 ] 6 32 Sign extend 6 ALU control em [2 6] [5 ] ALUOp RegDst (32) 6

Datapath with Control IF: xxxx ID: xxxx EX: xxxx E: xxxx : add $4.. PCSrc ID/EX EX/E Control E/ IF/ID EX PC 4 ress register register 2 Registers register Reg 2 Shift left 2 ALUSrc Zero ALU ALU Branch em ress Data emtoreg [5 ] 6 32 Sign extend 6 ALU control em [2 6] [5 ] ALUOp RegDst (33) Data Hazards (4.7) An instruction depends on completion of access by a preious instruction add $s, $t, $t sub $t2, $s, $t3 (34) 7

Dependencies Problem with starting next instruction before first is finished dependencies that go backward in time are hazards Time (in clock cycles) Value of register $2: Program execution order (in instructions) sub $2, $, $3 CC CC 2 CC 3 CC 4 CC 5 CC 6 I Reg CC 7 CC 8 CC 9 / 2 2 2 2 2 D Reg and $2, $2, $5 I Reg D Reg or $3, $6, $2 I Reg D Reg add $4, $2, $2 I Reg D Reg sw $5, ($2) I Reg D Reg (35) Hae compiler guarantee no hazards Where do we insert the nops? sub $2, $, $3 and $2, $2, $5 or $3, $6, $2 add $4, $2, $2 sw $5, ($2) Software Solution Problem: this really slows us down! (36) 8

A Better Solution Consider this sequence: sub $2, $,$3 and $2,$2,$5 or $3,$6,$2 add $4,$2,$2 sw $5,($2) We can resole hazards with forwarding How do we detect when to forward? (37) Dependencies & Forwarding Do not wait for s to be written to the register file find them in the pipeline à forward to ALU (38) 9

Forwarding IF: add $4, $8, $9 PCSrc ID: or $3, $6, $7 EX: and $6, $4, $5 E: sub $,.. : lw $, 9($) IF/ID or Control ID/EX EX EX/E E/ PC 4 ress register register 2 Registers register Reg 2 Shift left 2 ALUSrc Zero ALU ALU Branch em ress Data emtoreg [5 ] 6 32 Sign extend 6 ALU control em [2 6] [5 ] ALUOp RegDst (39) Forwarding (simplified) ID/EX EX/E E/ Register File ALU Data emory UX (4) 2

Forwarding (from EX/E) ID/EX EX/E E/ Register File UX ALU UX Data emory UX (4) Forwarding (from E/) ID/EX EX/E E/ Register File UX ALU UX Data emory UX (42) 2

Forwarding (operand selection) ID/EX EX/E E/ Register File UX ALU UX Data emory UX Forwarding Unit (43) Forwarding (operand propagation) ID/EX EX/E E/ Register File UX ALU UX Data emory UX Rd Rt UX Rt Rs Forwarding Unit EX/E Rd E/ Rd Combinational Logic! (44) 22

Detecting the Need to Forward Pass register numbers along pipeline e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register ALU operand register numbers in EX stage are gien by ID/EX.RegisterRs, ID/EX.RegisterRt Data hazards when a. EX/E.RegisterRd = ID/EX.RegisterRs b. EX/E.RegisterRd = ID/EX.RegisterRt Fwd from EX/E pipeline reg 2a. E/.RegisterRd = ID/EX.RegisterRs 2b. E/.RegisterRd = ID/EX.RegisterRt Fwd from E/ pipeline reg (45) Detecting the Need to Forward But only if forwarding instruction will write to a register! EX/E.Reg, E/.Reg And only if Rd for that instruction is not $zero EX/E.RegisterRd, E/.RegisterRd (46) 23

Forwarding Paths (47) Forwarding Conditions EX hazard if (EX/E.Reg and (EX/E.RegisterRd ) and (EX/E.RegisterRd = ID/EX.RegisterRs)) ForwardA = if (EX/E.Reg and (EX/E.RegisterRd ) and (EX/E.RegisterRd = ID/EX.RegisterRt)) ForwardB = E hazard if (E/.Reg and (E/.RegisterRd ) and (E/.RegisterRd = ID/EX.RegisterRs)) ForwardA = if (E/.Reg and (E/.RegisterRd ) and (E/.RegisterRd = ID/EX.RegisterRt)) ForwardB = (48) 24

Consider the sequence: add $,$,$2 add $,$,$3 add $,$,$4 Both hazards occur Want to use the most recent Double Data Hazard Reise E hazard condition Only forward if EX hazard condition isn t true (49) E hazard Reised Forwarding Condition if (E/.Reg and (E/.RegisterRd ) and not (EX/E.Reg and (EX/E.RegisterRd ) and (EX/E.RegisterRd = ID/EX.RegisterRs)) and (E/.RegisterRd = ID/EX.RegisterRs)) ForwardA = if (E/.Reg and (E/.RegisterRd ) and not (EX/E.Reg and (EX/E.RegisterRd ) and (EX/E.RegisterRd = ID/EX.RegisterRt)) and (E/.RegisterRd = ID/EX.RegisterRt)) ForwardB = Checking precedence of EX hazard (5) 25

Datapath with Forwarding (5) Concurrent Execution Correct execution is about managing dependencies Producer-consumer Structural (using the same hardware component) We will come across other types of dependencies later! (52) 26

Load-Use Data Hazard Need to stall for one cycle (53) Forwarding IF: add $4, $8, $9 PCSrc ID: or $3, $6, $7 EX: and $6, $4, $ E: lw $, ($2) : lw $, 9($) IF/ID or Control ID/EX EX EX/E E/ PC 4 ress register register 2 Registers register Reg 2 Shift left 2 ALUSrc Zero ALU ALU Branch em ress Data emtoreg [5 ] 6 32 Sign extend 6 ALU control em [2 6] [5 ] ALUOp RegDst (54) 27

Load-Use Hazard Detection Check when using instruction is decoded in ID stage ALU operand register numbers in ID stage are gien by IF/ID.RegisterRs, IF/ID.RegisterRt Load-use hazard when ID/EX.em and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt)) If detected, stall and insert bubble (55) Code Scheduling to Aoid Stalls Reorder code to aoid use of load in the next instruction C code for A = B + E; C = B + F; stall stall lw $t, ($t) lw $t2, 4($t) add $t3, $t, $t2 sw $t3, 2($t) lw $t4, 8($t) add $t5, $t, $t4 sw $t5, 6($t) 3 cycles lw $t, ($t) lw $t2, 4($t) lw $t4, 8($t) add $t3, $t, $t2 sw $t3, 2($t) add $t5, $t, $t4 sw $t5, 6($t) cycles (56) 28

How to Stall the Pipeline Force control alues in ID/EX register to EX, E and perform a nop (no-operation) Preent update of PC and IF/ID register Using instruction is decoded again Following instruction is fetched again -cycle stall allows E to read for lw o Can subsequently forward to EX stage (57) Stall/Bubble in the Pipeline Stall inserted here (58) 29

Stall/Bubble in the Pipeline Or, more accurately (59) Datapath with Hazard Detection Pipeline stage execution time ALUSrc m is missing! (6) 3

Control Hazards (4.8) Branch instruction determines flow of control Fetching next instruction depends on branch outcome Pipeline cannot always fetch correct instruction o Still working on ID stage of branch In IPS pipeline Need to compare registers and determine the branch condition (6) Branch Hazards If branch outcome determined in E Flush these instructions (Set control alues to ) PC (62) 3

Reducing Branch Delay oe hardware to determine outcome to ID stage Target address adder Register comparator IF.Flush signal to squash IF/ID register Example: branch taken 36: sub $, $4, $8 4: beq $, $3, 72 44: and $2, $2, $5 48: or $3, $2, $6 52: add $4, $4, $2 56: slt $5, $6, $7... 72: lw $4, 5($7) (63) Example: Branch Taken (64) 32

Example: Branch Taken (65) Data Hazards for Branches If a comparison register is a destination of 2 nd or 3 rd preceding ALU instruction add $, $2, $3 IF ID EX E add $4, $5, $6 IF ID EX E IF ID EX E beq $, $4, target IF ID EX E n Can resole using forwarding to ID n Need to add forwarding logic! (66) 33

Data Hazards for Branches If a comparison register is a destination of preceding ALU instruction or 2 nd preceding load instruction Need stall cycle lw $, addr IF ID EX E add $4, $5, $6 IF ID EX E beq stalled IF ID beq $, $4, target ID EX E (67) Data Hazards for Branches If a comparison register is a destination of immediately preceding load instruction Need 2 stall cycles lw $, addr IF ID EX E beq stalled IF ID beq stalled ID beq $, $, target ID EX E (68) 34

Delay Slot (IPS) Expose pipeline Load and jump/branch entail a delay slot The instruction right after the jump or branch is executed before the jump/branch jal add lw function_a $4, $5, $6 ; executed before jmp $2, 8($4) ; executed after return Jump/branch and the delay slot instruction are considered indiisible In the delay slot, the compiler needs to schedule A useful instruction (either before the jmp, or after the jmp w/o side effect) otherwise a NOP (69) Branch Prediction Longer pipelines cannot readily determine branch outcome early Stall penalty becomes unacceptable Predict outcome of branch Only stall if prediction is wrong In IPS pipeline Can predict branches not taken Fetch instruction after branch, with no delay (7) 35

IPS with Predict Not Taken Prediction correct Prediction incorrect (7) -Bit Predictor: Shortcoming Inner loop branches mispredicted twice! outer: inner: beq,, inner beq,, outer n n ispredict as taken on last iteration of inner loop Then mispredict as not taken on first iteration of inner loop next time around (72) 36

2-Bit Predictor: State achine Only change prediction on two successie mispredictions (73) ore-realistic Branch Prediction Static branch prediction Based on typical branch behaior Example: loop and if-statement branches o o Predict backward branches taken Predict forward branches not taken Dynamic branch prediction Hardware measures actual branch behaior o e.g., record recent history of each branch Assume future behaior will continue the trend o When wrong, stall while re-fetching, and update history (74) 37

AD Bobcat ECE 6 Later in this course ECE 6 Leel Parallelism (ILP) Later in this course http://hothardware.com (75) Intel Sandy Bridge bdti.com (76) 38

Exceptions and Interrupts (4.9) Unexpected eents requiring change in flow of control Different ISAs use the terms differently Exception Arises within the CPU o e.g., undefined opcode, oerflow, syscall, Interrupt From an external I/O controller Dealing with them without sacrificing performance is hard (77) Handling Exceptions In IPS, exceptions managed by a System Control Coprocessor (CP) Sae PC of offending (or interrupted) instruction In IPS: Exception Program Counter (EPC) Sae indication of the problem In IPS: Cause register We ll assume -bit o for undefined opcode, for oerflow Jump to handler at 88 (78) 39

An Alternate echanism Vectored Interrupts Handler address determined by the cause Example: Undefined opcode: C Oerflow: C 2 : C 4 s either Deal with the interrupt, or Jump to real handler (79) Handler Actions cause, and transfer to releant handler Determine action required If restartable Take correctie action use EPC to return to program Otherwise Terminate program Report error using EPC, cause, (8) 4

Another form of control hazard Exceptions in a Pipeline Consider oerflow on add in EX stage add $, $2, $ Preent $ from being clobbered Complete preious instructions Flush add and subsequent instructions Set Cause and EPC register alues Transfer control to handler Similar to mispredicted branch Use much of the same hardware (8) Pipeline with Exceptions (82) 4

Exception Properties Restartable exceptions Pipeline can flush the instruction Handler executes, then returns to the instruction o Re-fetched and executed from scratch PC saed in EPC register Identifies causing instruction Actually PC + 4 is saed o Handler must adjust (83) Exception Example Exception on add in 4 sub $, $2, $4 44 and $2, $2, $5 48 or $3, $2, $6 4C add $, $2, $ 5 slt $5, $6, $7 54 lw $6, 5($7) Handler 88 sw $25, ($) 884 sw $26, 4($) (84) 42

Exception Example (85) Exception Example (86) 43

ultiple Exceptions Pipelining oerlaps multiple instructions Could hae multiple exceptions at once Simple approach: deal with exception from earliest instruction Flush subsequent instructions Precise exceptions In complex pipelines ultiple instructions issued per cycle Out-of-order completion aintaining precise exceptions is difficult! (87) Imprecise Exceptions Just stop pipeline and sae state Including exception cause(s) Let the handler work out Which instruction(s) had exceptions Which to complete or flush o ay require manual completion Simplifies hardware, but more complex handler software Not feasible for complex multiple-issue out-of-order pipelines (88) 44

Performance How do we assess the impact of stall cycles? How close do we approach the ideal of one instruction per cycle execution time? Back to the CPI model! (89) Recall: Program Execution time Number of instruction classes # n & ExecutionTime = % C i CPI ( i= i cycle_time $ % '( ~= _count * CPI ag * clock_cycle_time algorithms/compiler architecture technology Clock Cycles CPI ag = Count = n i= " CPI i Count % i $ ' # Count & Relatie frequency (9) 45

Study Guide Gien a code block, and initial register alues (those that are accessed) be able to determine state of all pipeline registers at some future clock cycle. Determine the size of each pipeline register Track pipeline state in the case of forwarding and branches Compute the number of cycles to execute a code block odify the path to include forwarding and hazard detection for branches (this is trickier and time consuming but well worth it) (9) Study Guide (cont.) Schedule code (manually) to improe performance, for example to eliminate hazards and fill delay slots odify the path to add new instructions such as j odify the path to accommodate a two cycle access, i.e., the itself is a two cycle pipeline odify the forwarding and hazard control logic Gien a code sequence, be able to compute the number of stall cycles (92) 46

Study Guide (cont.) Track the state of the 2-bit branch predictor oer a sequence of branches in a code segment, for example a for-loop Show the pipeline state before and after an exception has taken place. (93) Glossary Branch prediction Branch hazards Branch delay Control hazard Data hazard Delay slot Dynamic instruction issue Forwarding Imprecise exception scheduling leel parallelism (ILP) Load-to-use hazard Pipeline bubbles Stall cycles Static instruction issue Structural hazard (94) 47