Chapter 3. Pipelining. EE511 In-Cheol Park, KAIST

2 Terminology
- Pipeline stage
- Throughput
- Pipeline register
- Ideal speedup
  - Assume the stages are perfectly balanced
  - Assume no overhead on pipeline registers
  - Speedup = number of stages
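
As a rough numeric check of the ideal-speedup claim, the following sketch counts cycles with and without pipelining under the slide's assumptions (balanced stages, no pipeline-register overhead). The function names and the one-cycle-per-stage model are illustrative assumptions, not part of the course material.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction occupies the whole datapath for n_stages cycles."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """The first instruction needs n_stages cycles to drain through;
    every later one completes one cycle after its predecessor."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def speedup(n_instructions: int, n_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, round(speedup(n, 5), 3))
```

For a long instruction stream the ratio approaches the number of stages, which is the ideal speedup stated above; for short streams the fill time of the pipeline keeps it lower.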

4
- Stages
  - IF (Instruction fetch)
  - ID (Instruction decode / Register fetch)
  - EX (Execution / Effective address)
  - MEM (Memory access / Branch completion)
  - WB (Write-back)
- Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB
- Stage order: IF, ID, EX, MEM, WB
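
The overlap of the five stages can be made concrete with a small sketch that reports which stage each instruction occupies in a given cycle. It assumes one instruction enters IF per cycle with no stalls; `stage_at` and `diagram` are hypothetical helper names introduced only for this illustration.

```python
from typing import Optional

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(instr_index: int, cycle: int) -> Optional[str]:
    """Stage occupied by instruction instr_index in the given cycle.

    Instruction i enters IF in cycle i (both counted from 0).
    Returns None if the instruction is not in the pipeline that cycle.
    """
    position = cycle - instr_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


def diagram(n_instructions: int) -> str:
    """Text pipeline diagram: one row per instruction, one column per cycle."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        cells = [(stage_at(i, c) or "").ljust(4) for c in range(total_cycles)]
        rows.append(f"I{i}: " + "".join(cells).rstrip())
    return "\n".join(rows)


if __name__ == "__main__":
    print(diagram(4))
```

Each row is shifted one column to the right of the row above it, so in steady state all five stages are busy with different instructions.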

9 Hazards
- Prevent the next instruction from executing during its desired clock cycle (pipeline stall)
- Earlier instructions must continue, while later instructions are stalled
- Classes
  - Structural hazards: resource conflicts
  - Data hazards: data dependency
  - Control hazards: caused by instructions that change the PC, such as branches

10
- Functional unit conflict
  - For example, a functional unit (FU) that is not fully pipelined
- Memory access conflict
  - For example, MEM and IF
  - Solutions: separate I$ and D$, dual-port memory, on-chip I$, instruction queue
- Register file access conflict
  - For example, ID and WB
  - Solutions: multi-port register file, time-multiplexed R/W access

14 Data hazard classification
- RAW (true data dependency)
  - Internal data forwarding
    - To reduce forwarding logic, the register file write is done before the read
  - Instruction scheduling
    - Rearranges the code sequence
  - Delayed load
    - Insert a NOP if there is no proper instruction to be inserted into the delay slot
- WAR (anti-dependency), WAW (output dependency)
  - Register renaming
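
The three classes can be told apart by comparing the registers an earlier and a later instruction read and write. The sketch below is a minimal illustration: the `Instr` record and `classify_hazards` function are hypothetical names, and it only detects the dependency by register name. Whether a dependency becomes an actual stall depends on the pipeline and its forwarding paths.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read


def classify_hazards(earlier: Instr, later: Instr) -> List[str]:
    """Name the dependencies of `later` on `earlier` (program order)."""
    hazards = []
    # RAW: later reads what earlier writes (true dependency).
    if earlier.dest is not None and earlier.dest in later.srcs:
        hazards.append("RAW")
    # WAR: later writes what earlier reads (anti-dependency).
    if later.dest is not None and later.dest in earlier.srcs:
        hazards.append("WAR")
    # WAW: both write the same register (output dependency).
    if earlier.dest is not None and earlier.dest == later.dest:
        hazards.append("WAW")
    return hazards


if __name__ == "__main__":
    add = Instr("ADD R1, R2, R3", "R1", ("R2", "R3"))
    sub = Instr("SUB R4, R1, R5", "R4", ("R1", "R5"))
    print(classify_hazards(add, sub))
```

WAR and WAW involve only register names, not data flow, which is why renaming the destination register removes them while RAW needs forwarding, scheduling, or a stall.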

22
- Pipeline stalls until the new PC is available
  - Branch delay
  - Turns into a branch penalty
- Solutions
  - Stall the pipeline when a branch instruction is found
    - Fill with NOPs
  - Rearrange the code sequence
    - Delayed branch
- To reduce the branch penalty
  - Resolve branch instructions as early as possible
    - Target, taken/not-taken
  - Delayed branch / Squashed branch / Annulled branch
  - Branch prediction

33
- Static prediction at compile time
  - Predict taken or predict not-taken as a whole
  - Predict on the basis of branch direction
    - Backward-going: taken
    - Forward-going: not taken
  - Profile-based prediction: branch prediction for each individual branch instruction
    - Individual branch instructions are highly biased
    - Introduce a prediction bit in the instruction format
- Dynamic prediction at run time
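
The two static schemes that look at individual branches can be sketched in a few lines. This is only an illustration under stated assumptions: addresses are plain integers, a branch to its own address is treated as backward, and ties in the profile counts are resolved as "taken". The function names are hypothetical.

```python
def predict_by_direction(branch_pc: int, target_pc: int) -> bool:
    """Backward-going branches (typically loop back-edges) are predicted
    taken; forward-going branches are predicted not taken."""
    return target_pc <= branch_pc


def profile_prediction_bit(taken_count: int, not_taken_count: int) -> int:
    """Prediction bit a compiler could place in the instruction, chosen
    from counts gathered in a profiling run (1 = predict taken).
    Ties are resolved as taken, which is an arbitrary choice here."""
    return 1 if taken_count >= not_taken_count else 0


if __name__ == "__main__":
    print(predict_by_direction(branch_pc=0x120, target_pc=0x100))
    print(profile_prediction_bit(taken_count=3, not_taken_count=97))
```

The direction rule needs no profiling run, while the profile-based bit exploits the observation above that individual branches are highly biased.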

36 Exception / Interrupt
- Classification
  - Synchronous / Asynchronous
  - User requested / Coerced
  - User maskable / Nonmaskable
  - Within / Between instructions
  - Resume / Terminate
- Restartability: supported by almost all machines
- Precise exception
- Restarting execution
  1. Force a trap instruction into the pipeline on the next IF
  2. Turn off all writes for the faulting instruction and all following instructions in the pipeline, but not the preceding instructions
  3. Save the PC of the faulting instruction

39
- Initiation interval = repeat interval
  - The number of cycles that must elapse between issuing two operations of a given type
- Latency
  - The number of intervening cycles between an instruction that produces a result and an instruction that uses the result
  - Number of EX stages - 1
- Problems in longer latency pipelines
  - Structural hazards
  - Multiple register writes
    - Stall before it issues
    - Stall a conflicting instruction when it tries to enter the MEM stage
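
The two definitions translate directly into small formulas. The sketch below assumes full forwarding and in-order issue, and uses a hypothetical unit with 4 EX stages as the example; `distance` is an illustrative parameter counting how many instructions after the producer the consumer is issued (1 means back-to-back).

```python
def latency_from_ex_stages(n_ex_stages: int) -> int:
    """Latency as defined above: number of EX stages minus one."""
    return n_ex_stages - 1


def stall_cycles(latency: int, distance: int) -> int:
    """Stalls a consumer suffers when issued `distance` instructions
    after its producer. The distance - 1 instructions in between hide
    that many cycles of the latency."""
    return max(0, latency - (distance - 1))


def earliest_next_issue(prev_issue_cycle: int, initiation_interval: int) -> int:
    """Earliest cycle in which another operation of the same type may
    issue after one issued in prev_issue_cycle."""
    return prev_issue_cycle + initiation_interval


if __name__ == "__main__":
    lat = latency_from_ex_stages(4)
    for d in (1, 2, 4):
        print(d, stall_cycles(lat, d))
```

A single-stage EX unit has latency 0 under this definition, so a dependent instruction can follow immediately; longer units need either independent instructions in between or stall cycles.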

44
- WAW hazards: instructions no longer reach WB in order
  - Delay the issue of the later instruction
  - Stamp out the result of the former instruction so that it does not write its result
- Instructions can complete in an order different from the order in which they were issued (out-of-order completion)
  - Leads to imprecise exceptions
- RAW hazards are more frequent

47 Precise / Imprecise
- Precise exceptions
  - Exception is checked at the WB stage
  - Hardware posts all exceptions in a status vector, which is carried along as the instruction goes down the pipeline
  - Once an exception indication is set in the status vector, all writes are turned off

48
- Ignore the problem and settle for imprecise exceptions
  - Two operating modes: fast but imprecise / slower but precise
- Buffer the results of an instruction until all the instructions that were issued earlier are complete
  - History file / future file
  - Smith and Pleszkun, "Implementing precise interrupts in pipelined processors," IEEE Trans. Computers, 1988
- Keep enough information so that the trap-handling routines can create a precise sequence for the exception
  - Hwu and Patt, "Checkpoint repair for out-of-order execution machines," ISCA 1987
- Allow instruction issue only if it is certain that all the instructions before the issuing instruction will complete without causing an exception

51
- Variable instruction lengths and running times can lead to imbalance among pipeline stages
  - The added complexity is sometimes justified (for example, caches)
- Sophisticated addressing modes can complicate pipeline control and make it difficult to keep the pipeline flowing
- Writes into instruction space (self-modifying code) can cause trouble for pipelining
- Implicitly set condition codes increase the difficulty of finding when a branch has been decided and the difficulty of scheduling branch delays

52
- Deeper integer pipeline: 8 stages
  - IF IS RF EX DF DS TC WB
  - IF: first half of instruction fetch
  - IS: second half of instruction fetch, completion of I$ access
  - DF: first half of D$ access
  - DS: second half of data fetch, completion of D$ access
  - TC: tag check, determines whether the D$ access hit
- Two-cycle load delay
- Three-cycle branch delay
  - Single-cycle delayed branch
  - Predict-not-taken for the remaining two cycles
  - If taken, two idle cycles
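
The branch-delay figures above can be turned into a rough estimate of lost cycles. The sketch below is a simplified model, not a measurement: it assumes an unfilled delay slot costs one cycle, a taken branch costs the two idle cycles stated above, and everything else runs at one cycle per instruction. The frequencies passed in the example call are made-up inputs.

```python
def branch_idle_cycles(taken: bool, delay_slot_filled: bool) -> int:
    """Idle cycles for one branch with a 3-cycle branch delay:
    one delay slot (wasted only if filled with a NOP) plus two
    predict-not-taken cycles that are lost when the branch is taken."""
    slot_cost = 0 if delay_slot_filled else 1
    taken_cost = 2 if taken else 0
    return slot_cost + taken_cost


def cpi_with_branches(branch_freq: float, taken_fraction: float,
                      slot_fill_fraction: float) -> float:
    """Estimated CPI when branch idle cycles are the only stall source."""
    avg_idle = taken_fraction * 2 + (1.0 - slot_fill_fraction) * 1
    return 1.0 + branch_freq * avg_idle


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, half taken, 60% of slots filled.
    print(cpi_with_branches(0.2, 0.5, 0.6))
```

The model ignores load delays and cache misses, so it only shows how the branch term alone grows with the taken fraction in a deeper pipeline.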

58
- Pitfall: Unexpected execution sequences may cause unexpected hazards
- Pitfall: Extensive pipelining can impact other aspects of a design, leading to overall worse cost/performance
- Fallacy: Increasing the number of pipeline stages always increases performance
- Pitfall: Evaluating a compile-time scheduler on the basis of unoptimized code
