Recap: Summary of Pipelining Basics



CS152 Computer Architecture and Engineering
Lecture 14: Pipelining Control Continued; Introduction to Advanced Pipelining
March 8, 2001
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides:

Recap: Summary of Pipelining Basics
- 5 stages:
  - Fetch: fetch instruction from memory
  - Decode: get register values and decode control information
  - Execute: execute arithmetic operations/calculate addresses
  - Memory: do memory ops (load or store)
  - Writeback: write results back to registers (i.e. COMMIT)
- Pipelines pass control information down the pipe just as data moves down the pipe
- Forwarding/stalls handled by local control
- Balancing the length of instructions makes pipelining much smoother
- Increasing the length of the pipe increases the impact of hazards; pipelining helps instruction bandwidth, not latency

Recap: Can pipelining get us into trouble?
- Yes: pipeline hazards
  - structural hazards: attempt to use the same resource two different ways at the same time
    - E.g., a combined washer/dryer would be a structural hazard, or the folder is busy doing something else (watching TV)
  - data hazards: attempt to use an item before it is ready
    - E.g., one sock of a pair is in the dryer and one in the washer; can't fold until the sock from the washer gets through the dryer
    - instruction depends on the result of a prior instruction still in the pipeline
  - control hazards: attempt to make a decision before the condition is evaluated
    - E.g., washing football uniforms and needing to get the proper detergent level; need to see the load after the dryer before putting the next load in
    - branch instructions
- Can always resolve hazards by waiting
  - pipeline control must detect the hazard
  - take action (or delay action) to resolve hazards

Recap: Hazards
[Figure: pipeline timing diagrams illustrating a structural hazard, a control hazard (jump), and RAW (read after write), WAW (write after write), and WAR (write after read) data hazards]
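To make "bandwidth, not latency" concrete, here is a minimal Python sketch that counts cycles on an idealized pipeline. It assumes balanced one-cycle stages and no hazards or stalls; the function names are hypothetical and not taken from the lecture.

```python
def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles to complete n instructions on an ideal pipeline with no stalls."""
    if n_instructions <= 0:
        return 0
    # The first instruction needs n_stages cycles to fill the pipe;
    # after that, one instruction completes every cycle.
    return n_stages + (n_instructions - 1)


def instruction_latency(n_stages: int) -> int:
    """Cycles from fetch to writeback for any single instruction."""
    # Pipelining does not shorten this: each instruction still visits every stage.
    return n_stages


if __name__ == "__main__":
    n = 1000
    for stages in (5, 8):
        total = pipelined_cycles(n, stages)
        print(f"{stages} stages: latency {instruction_latency(stages)} cycles, "
              f"{n} instructions in {total} cycles, CPI {total / n:.3f}")
```

Per-instruction latency grows with pipeline depth while cycles per instruction stay close to 1; the hazards discussed next are what push real CPI above that ideal.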

Recap: Control Diagram
[Figure: register transfers performed in each stage for each instruction class, e.g. IR <- Mem[PC]; PC <- PC+4; A <- R[rs]; B <- R[rt]; S <- A + B; S <- A or ZX; S <- A + SX; If Cond PC <- PC+SX; M <- Mem[S]; R[rd] <- S; R[rt] <- S; Mem[S] <- B]

Recap: Pipelined Processor for slides
[Figure: pipelined datapath with separate control for the Dcd, Ex, Mem, and WB stages, instruction-valid bits, next-PC logic, and bubble/stall signals]
- Separate control at each stage
- Stalls propagate backwards to freeze previous stages
- Bubbles in the pipeline are introduced by placing "Noops" into the local stage and stalling previous stages

The Big Picture: Where are We Now?
- The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
- Today's Topics:
  - Recap last lecture
  - Review MIPS R3000 pipeline
  - Administrivia
  - Advanced Pipelining
  - Superscalar, VLIW/EPIC

Recall: Single cycle control!
[Figure: single-cycle datapath: next-address logic, ideal instruction memory, 32 x 32-bit register file (Rs, Rt, Rd), ALU, ideal data memory, and control that turns the instruction and datapath conditions into control signals]
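The freeze-and-bubble rule from the pipelined-processor slide can be sketched as a single clock step over a list of stage contents. This assumes a five-stage pipe and uses a hypothetical `clock` function and "nop" marker; it models only instruction movement, not the datapath.

```python
from typing import List, Optional

STAGES = ("IF", "ID", "EX", "MEM", "WB")
BUBBLE = "nop"  # a no-op placed into a stage to create a bubble


def clock(pipe: List[str], fetched: str, stall_stage: Optional[int] = None) -> List[str]:
    """Return pipeline contents after one clock edge.

    pipe[k] is the instruction held in STAGES[k]; the WB occupant retires.
    With stall_stage=None, everything shifts down one stage and `fetched` enters IF.
    Otherwise the stalled stage and every stage above it keep their contents
    (the stall propagates backwards, and `fetched` is ignored because the PC is
    frozen), a bubble enters the stage just below, and later stages keep draining.
    """
    if stall_stage is None:
        return [fetched] + pipe[:-1]
    if not 0 <= stall_stage < len(pipe) - 1:
        raise ValueError("stall_stage must be a stage before WB")
    frozen = pipe[: stall_stage + 1]
    draining = pipe[stall_stage + 1 : -1]
    return frozen + [BUBBLE] + draining


if __name__ == "__main__":
    pipe = ["i4", "i3", "i2", "i1", "i0"]
    print(clock(pipe, "i5"))                 # ['i5', 'i4', 'i3', 'i2', 'i1']
    print(clock(pipe, "i5", stall_stage=1))  # ['i4', 'i3', 'nop', 'i2', 'i1']
```

Stalling in ID (index 1) holds IF and ID, pushes a nop into EX, and lets the older instructions continue toward WB.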

Datapath + Stationary Control
[Figure: pipelined datapath in which the decoded control word travels with the instruction; each pipeline register carries a valid bit (v) plus ex, mem, and wb control fields, dropping each group once its stage has used it]

Stationary Control
- The Main Control generates the control signals during Reg/Dec
  - Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  - Control signals for Mem (MemWr, Branch) are used 2 cycles later
  - Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control feeding the ID/Ex register (ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr), then the Ex/Mem register (MemWr, Branch, MemtoReg, RegWr), then the Mem/Wr register (MemtoReg, RegWr)]

Let's Try it Out
10 lw r1, r2(35)
(these addresses are octal)

Start: Fetch 10
[Figure: lw r1, r2(35) being fetched from address 10; all later stages hold invalid ("n") instructions]

Fetch 14, Decode 10
- Decode: lw r1, r2(35) reads r2 and the immediate; later stages still invalid

Fetch 20, Decode 14, Exec 10
- Decode: addi r2, r2, 3 reads r2
- Exec: lw r1 with base r2 and offset 35

Fetch 24, Decode 20, Exec 14, Mem 10
- Decode: sub r3, r4, r5 reads r4 and r5
- Exec: addi r2 with r2 and 3
- Mem: lw r1 accesses address r2+35

Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
- Decode: beq r6, r7, 100 reads r6 and r7
- Exec: sub r3 with r4 and r5
- Mem: addi r2 carries r2+3
- WB: lw writes r1 <- M[r2+35]
- Note Delayed Branch: always execute ori after beq

Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
- Decode: ori r8, r9, 17 reads r9 (rt not needed)
- Exec: beq compares r6 and r7; branch target 100
- Mem: sub r3 carries r4-r5
- WB: addi writes r2 <- r2+3 (r1 now holds M[r2+35])

Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20
- Fill it in yourself!

Fetch 110, Dcd 104, Ex 100, Mem 30, WB 24
- Fill it in yourself!

Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30
- Fill it in yourself!
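The trace slides follow one simple rule: with no stalls, the instruction in each stage is the one fetched that many cycles earlier. The sketch below regenerates the stage/address headings from that rule. It assumes the fetch order shown in the trace (the beq at 24 taken to 100, the ori at 30 in the delay slot); `FETCH_ORDER` and `occupancy` are hypothetical names.

```python
# Fetch order from the trace (addresses are octal). Assumes no stalls, the beq at
# 24 taken to 100, and the ori at 30 executed in the branch delay slot.
FETCH_ORDER = [0o10, 0o14, 0o20, 0o24, 0o30, 0o100, 0o104, 0o110, 0o114]
STAGES = ("Fetch", "Dcd", "Ex", "Mem", "WB")


def occupancy(cycle: int) -> dict:
    """Map each stage to the octal address it holds during `cycle` (0-based)."""
    row = {}
    for depth, stage in enumerate(STAGES):
        i = cycle - depth  # the instruction fetched `depth` cycles earlier
        if 0 <= i < len(FETCH_ORDER):
            row[stage] = format(FETCH_ORDER[i], "o")
    return row


if __name__ == "__main__":
    for cycle in range(len(FETCH_ORDER)):
        row = occupancy(cycle)
        print(", ".join(f"{s} {row[s]}" for s in STAGES if s in row))
```

Printing the rows reproduces the trace headings, e.g. `Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10`; the three "Fill it in yourself!" slides correspond to the last three rows.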

Administrivia
- Updated Lab 4 schedule:
  - Submit by midnight tomorrow night (Friday 3/9)
  - Demo to TA next Wednesday in section
- Updated Lab 5 schedule:
  - Up there now (sorry about that)
  - Mail problem 0 to TA by tomorrow night at midnight (evaluation of your partners)
  - Mail Lab 5 breakdowns to your TAs by tomorrow at midnight
- Get started on Lab 5:
  - Pipelining is difficult to get right!
  - Be sure that we will test gotcha cases in our mystery programs
- Tuesday: advanced pipelining
  - Out-of-order execution/register renaming
  - Reorder buffers
- Solutions to Midterm I will be up later tonight.

Recap: Hazards
- Avoid some by design
  - eliminate WAR by always fetching operands early (DCD) in pipe
  - eliminate WAW by doing all WBs in order (last stage, static)
- Detect and resolve remaining ones
  - stall or forward (if possible)
[Figure: pipeline timing diagrams of RAW, WAW, and WAR hazards]

Hazard Detection
- Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
- A RAW hazard exists on register $\rho$ if $\rho \in \mathrm{Rregs}(i) \cap \mathrm{Wregs}(j)$
  - Keep a record of pending writes (for insts in the pipe) and compare with operand regs of current instruction.
  - When an instruction issues, reserve its result register.
  - When an operation completes, remove its write reservation.
- A WAW hazard exists on register $\rho$ if $\rho \in \mathrm{Wregs}(i) \cap \mathrm{Wregs}(j)$
- A WAR hazard exists on register $\rho$ if $\rho \in \mathrm{Wregs}(i) \cap \mathrm{Rregs}(j)$
- Window on execution: only pending instructions can cause exceptions
[Figure: new instruction i entering the pipe behind instruction j]

Record of Pending Writes In Pipeline Registers
[Figure: pipeline registers holding op and destination register (rd) for the ex, mem, and wb stages, alongside the current operand registers rs and rt]
- Current operand registers: rs, rt
- Pending writes: rd_ex, rd_mem, rd_wb (each valid when regW is set in that stage)

$$\text{hazard} \leftarrow ((rs = rd_{ex}) \land regW_{ex}) \lor ((rs = rd_{mem}) \land regW_{mem}) \lor ((rs = rd_{wb}) \land regW_{wb}) \lor ((rt = rd_{ex}) \land regW_{ex}) \lor ((rt = rd_{mem}) \land regW_{mem}) \lor ((rt = rd_{wb}) \land regW_{wb})$$
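Both detection rules translate directly into code: the set intersections for RAW/WAW/WAR, and the six-comparator `hazard` equation over pending writes. This is a minimal sketch; the `Inst` and `PendingWrite` types and the register naming are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class Inst(NamedTuple):
    text: str
    rregs: FrozenSet[str]  # registers the instruction reads
    wregs: FrozenSet[str]  # registers the instruction writes


def classify(i: Inst, j: Inst) -> dict:
    """Hazards of instruction i (about to issue) against an older instruction j in the pipe."""
    return {
        "RAW": i.rregs & j.wregs,
        "WAW": i.wregs & j.wregs,
        "WAR": i.wregs & j.rregs,
    }


class PendingWrite(NamedTuple):
    rd: int      # destination register latched in this stage's pipeline register
    reg_w: bool  # does the instruction in this stage write a register?


def raw_hazard(rs: int, rt: int, ex: PendingWrite, mem: PendingWrite, wb: PendingWrite) -> bool:
    """Six comparators ORed together, one per (operand, pending write) pair."""
    return any(w.reg_w and op == w.rd for op in (rs, rt) for w in (ex, mem, wb))


if __name__ == "__main__":
    add = Inst("add r1,r2,r3", frozenset({"r2", "r3"}), frozenset({"r1"}))
    sub = Inst("sub r4,r1,r3", frozenset({"r1", "r3"}), frozenset({"r4"}))
    print(classify(sub, add))  # RAW on r1 only
```

As in the slide's equation, the comparator does not special-case register 0; a MIPS implementation would normally exclude it, since writes to r0 are discarded.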

Resolve RAW by forwarding (or bypassing)
[Figure: forward mux feeding the operand latches from the alu and mem pipeline registers]
- Detect nearest valid write op operand register and forward into op latches, bypassing remainder of the pipe
- Increase muxes to add paths from pipeline registers
- Data Forwarding = Data Bypassing

What about memory operations?
- If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations!
- What does delaying WB on arithmetic operations cost? cycles? hardware?
- What about data dependence on loads?
    R1 <- R4 + R5
    R2 <- Mem[R2 + I]
    R3 <- R2 + R1
  => "Delayed Loads"
- Can recognize this in decode stage and introduce bubble while stalling fetch stage (hint for lab 5!)
- Tricky situation:
    R1 <- Mem[R2 + I]
    Mem[R3+34] <- R1
  Handle with bypass in memory stage!
[Figure: pipeline stages op Rd Ra Rb, with Rd results going to the reg file]

Compiler Avoiding Load Stalls
% loads stalling pipeline:
  Benchmark  unscheduled  scheduled
  gcc        54%          31%
  spice      42%          14%
  tex        65%          25%

What about Interrupts, Traps, Faults?
- External Interrupts:
  - Allow pipeline to drain, fill with NOPs
  - Load PC with interrupt address
- Faults (within instruction, restartable)
  - Force trap instruction into IF
  - disable writes till trap hits WB
  - must save multiple PCs or PC + state

Recall: Precise Exceptions
- State of the machine is preserved as if program executed up to the offending instruction
  - All previous instructions completed
  - Offending instruction and all following instructions act as if they have not even started
  - Same system code will work on different implementations
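Forwarding and the delayed-load check can be sketched as two small decode-stage functions. This assumes the organization on these slides, where operands are latched during decode and can be bypassed from the ex, mem, and wb stages; `Pending`, `forward_from`, and `load_use_stall` are hypothetical names, not a reference design.

```python
from typing import NamedTuple, Sequence, Tuple


class Pending(NamedTuple):
    rd: int        # destination register held in this stage's pipeline register
    reg_wr: bool   # does the instruction in this stage write a register?
    is_load: bool  # True if the value only exists after the memory stage


def forward_from(src: int, later: Sequence[Tuple[str, Pending]]) -> str:
    """Choose where operand register `src` comes from.

    `later` lists the downstream stages nearest first (e.g. ex, mem, wb); the
    first valid match is the most recent write, so it wins. Falls back to the
    register file when nothing in flight writes `src`.
    """
    for stage, p in later:
        if p.reg_wr and p.rd == src and src != 0:  # r0 is hardwired to zero
            return stage
    return "regfile"


def load_use_stall(rs: int, rt: int, ex: Pending) -> bool:
    """Decode-stage check: a load still in ex cannot be bypassed in time,
    so decode and fetch stall and a bubble goes into ex."""
    return ex.is_load and ex.reg_wr and ex.rd != 0 and ex.rd in (rs, rt)
```

For the delayed-load example (R3 <- R2 + R1 right after the load into R2), `load_use_stall` fires for R2, while R1 can be bypassed from the mem stage.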

Exception/Interrupts: Implementation questions
- 5 instructions, executing in 5 different pipeline stages!
- Who caused the interrupt?
  Stage  Problem interrupts occurring
  IF     Page fault on instruction fetch; misaligned memory access; memory-protection violation
  ID     Undefined or illegal opcode
  EX     Arithmetic exception
  MEM    Page fault on data fetch; misaligned memory access; memory-protection violation; memory error
- How do we stop the pipeline? How do we restart it?
- Do we interrupt immediately or wait?
- How do we sort all of this out to maintain preciseness?

Exception Handling
[Figure: pipeline with an exception (Excp) signal from each stage]
- Fetch: detect bad instruction address
- Decode: detect bad instruction
- Execute: detect overflow
- Memory: detect bad data address
- Allow exception to take effect

Another look at the exception problem
[Figure: overlapped instructions over time in program flow, raising a data TLB fault, a bad instruction, an instruction TLB fault, and an overflow in different stages]
- Use pipeline to sort this out!
  - Pass exception status along with instruction.
  - Keep track of PCs for every instruction in pipeline.
  - Don't act on exception until it reaches WB stage
- Handle interrupts through "faulting noop" in IF stage
- When instruction reaches end of MEM stage:
  - Save PC => EPC, Interrupt vector addr => PC
  - Turn all instructions in earlier stages into noops!

Resolution: Freeze above & Bubble below
[Figure: pipeline with the stages above the faulting stage frozen and bubbles inserted below it]
- Flush accomplished by setting "invalid" bit in pipeline
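The "pass exception status along, act only at the end of MEM" policy can be sketched as below. It assumes a five-stage pipe, leaves out the WB instruction (which is older and completes normally), and uses a hypothetical vector address `EXC_VECTOR`; none of the names come from the lecture.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

EXC_VECTOR = 0x80000080  # assumed exception vector address (hypothetical for this sketch)


@dataclass
class Slot:
    pc: int
    text: str
    exc: Optional[str] = None  # exception detected earlier, carried along with the instruction
    valid: bool = True         # cleared to turn the instruction into a noop


def end_of_mem(pipe: List[Slot]) -> Optional[Tuple[int, int]]:
    """pipe holds the IF, ID, EX, and MEM slots (youngest first, MEM last).

    Called when the MEM instruction finishes its MEM stage. Exceptions are acted
    on only here, so an older instruction's exception is always handled before a
    younger one's, even if the younger one was detected first.
    Returns (EPC, new PC) if an exception is taken, else None.
    """
    mem = pipe[-1]
    if not (mem.valid and mem.exc):
        return None
    for slot in pipe:  # faulting instruction and everything behind it become noops
        slot.valid = False
    return mem.pc, EXC_VECTOR
```

If a younger instruction in ID has an illegal opcode while an older load in MEM takes a data TLB fault, `end_of_mem` reports the load's PC; the younger exception is squashed with its instruction and will recur when it is re-executed.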

FYI: MIPS R3000 clocking discipline
- 2-phase non-overlapping clocks (phi1, phi2)
- Pipeline stage is two (level sensitive) latches
[Figure: timing of the phi1/phi2 clocks compared with an edge-triggered register]

MIPS R3000 Instruction Pipeline
[Figure: stages Inst Fetch, Decode/Reg. Read, ALU / E.A., Memory, Write Reg, with resource usage per clock phase: TLB and I-cache, RF, ALU operation, E.A. and TLB, D-cache, WB]
- Write in phase 1, read in phase 2 => eliminates bypass from WB

Recall: Hazard on r1
[Figure: add r1,r2,r3 followed by sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11, plotted against time (clock cycles) through IF, ID/RF, EX, MEM, WB]
- With MIPS R3000 pipeline, no need to forward from WB stage

MIPS R3000 Multicycle Operations
[Figure: pipeline stages op Rd Ra Rb, with a mul Rd Ra Rb instruction and Rd results going to the reg file]
- Ex: Multiply, Divide, Cache Miss
- Use control word of local stage to step through multicycle operation
- Stall all stages above multicycle operation in the pipeline
- Drain (bubble) stages below it
- Alternatively, launch multiply/divide to autonomous unit, only stall pipe if attempt to get result before ready
  - This means stall mflo/mfhi in decode stage if multiply/divide still executing
  - Extra credit in Lab 5 does this
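With the autonomous multiply/divide alternative, only mflo/mfhi ever wait. A sketch of the resulting decode-stage timing, assuming in-order single issue, one instruction per cycle otherwise, and a latency parameter `muldiv_latency` chosen by the caller (not an R3000 figure); `issue_schedule` is a hypothetical name.

```python
from typing import List, Tuple


def issue_schedule(program: List[str], muldiv_latency: int) -> Tuple[int, int]:
    """Return (total decode cycles, stall cycles) for an in-order stream of ops.

    mult/div launch into an autonomous unit; mflo/mfhi issued before the result
    is ready stall in decode. Every other op issues one per cycle.
    muldiv_latency is the number of cycles from launch until mflo/mfhi may issue.
    """
    cycle = 0     # cycle in which the current op tries to issue
    ready_at = 0  # first cycle the mult/div result can be read
    stalls = 0
    for op in program:
        if op in ("mflo", "mfhi") and cycle < ready_at:
            stalls += ready_at - cycle  # bubbles enter the pipe while decode waits
            cycle = ready_at
        if op in ("mult", "div"):
            ready_at = cycle + muldiv_latency
        cycle += 1
    return cycle, stalls
```

Placing independent instructions between `mult` and `mflo` hides the latency: `["mult", "mflo"]` stalls 4 cycles with a latency of 5, while four intervening adds remove the stall entirely.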

Is CPI = 1 for our pipeline?
- Remember that CPI is an "average # cycles/inst"
[Figure: four instructions overlapped in IFetch, Dcd, Exec, Mem, WB]
- CPI here is 1, since the average throughput is 1 instruction every cycle.
- What if there are stalls or multi-cycle execution?
- Usually CPI > 1. How close can we get to 1?

Case Study: MIPS R4000 (200 MHz)
- 8 Stage Pipeline:
  - IF: first half of fetching of instruction; PC selection happens here as well as initiation of instruction cache access.
  - IS: second half of access to instruction cache.
  - RF: instruction decode and register fetch, hazard checking and also instruction cache hit detection.
  - EX: execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation.
  - DF: data fetch, first half of access to data cache.
  - DS: second half of access to data cache.
  - TC: tag check, determine whether the data cache access hit.
  - WB: write back for loads and register-register operations.
- 8 Stages: What is impact on Load delay? Branch delay? Why?

Case Study: MIPS R4000
- TWO Cycle Load Latency
- THREE Cycle Branch Latency (conditions evaluated during EX phase)
  - Delay slot plus two stalls
  - Branch likely cancels delay slot if not taken
[Figure: overlapped R4000 pipelines (IF IS RF EX DF DS TC WB) showing the load and branch delays]

MIPS R4000 Floating Point
- FP Adder, FP Multiplier, FP Divider
- Last step of FP Multiplier/Divider uses FP Adder HW
- 8 kinds of stages in FP units:
  Stage  Functional unit  Description
  A      FP adder         Mantissa ADD stage
  D      FP divider       Divide pipeline stage
  E      FP multiplier    Exception test stage
  M      FP multiplier    First stage of multiplier
  N      FP multiplier    Second stage of multiplier
  R      FP adder         Rounding stage
  S      FP adder         Operand shift stage
  U                       Unpack FP numbers
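The R4000 latencies translate into stall counts as follows. This sketch assumes full forwarding, counts load delay by how many instructions separate the load from its first use, and follows the slide's branch accounting (delay slot plus two stalls) without modeling whether the branch is taken or branch-likely cancellation. The names are hypothetical.

```python
LOAD_LATENCY = 2   # "TWO cycle load latency" (R4000 slide)
BRANCH_STALLS = 2  # "delay slot plus two stalls" (R4000 slide)


def load_use_stalls(distance: int, load_latency: int = LOAD_LATENCY) -> int:
    """Stall cycles for a consumer `distance` instructions after a load
    (distance=1: the very next instruction), assuming full forwarding and
    no other stalls in between."""
    if distance < 1:
        raise ValueError("distance must be >= 1")
    # Each independent instruction placed between load and use hides one cycle.
    return max(0, load_latency - (distance - 1))


def branch_cycles_lost(delay_slot_useful: bool) -> int:
    """Cycles lost to one branch under the three-cycle branch latency:
    two stall cycles, plus the delay slot if it does no useful work."""
    return BRANCH_STALLS + (0 if delay_slot_useful else 1)
```

This is where the "1 or 2 clock cycles" of load stalls and the "2 cycles + unfilled slots" of branch stalls on the performance slide come from.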

MIPS FP Pipe Stages
  FP Instr        Pipe stages
  Add, Subtract   U, S+A, A+R, R+S
  Multiply        U, E+M, M, M, M, N, N+A, R
  Divide          U, A, R, D (repeated), D+A, D+R, D+A, D+R, A, R
  Square root     U, E, (A+R)^108, A, R
  Negate          U, S
  Absolute value  U, S
  FP compare      U, A, R
- Stages: M First stage of multiplier; A Mantissa ADD stage; N Second stage of multiplier; D Divide pipeline stage; R Rounding stage; E Exception test stage; S Operand shift stage; U Unpack FP numbers

R4000 Performance
- Not ideal CPI of 1:
  - Load stalls (1 or 2 clock cycles)
  - Branch stalls (2 cycles + unfilled slots)
  - FP result stalls: RAW data hazard (latency)
  - FP structural stalls: Not enough FP hardware (parallelism)
[Chart: CPI for eqntott, espresso, gcc, li, doduc, nasa7, ora, spice2g6, su2cor, and tomcatv, broken down into base CPI, load stalls, branch stalls, FP result stalls, and FP structural stalls]

Summary
- Hazards limit performance
  - Structural: need more HW resources
  - Data: need forwarding, compiler scheduling
  - Control: early evaluation & PC, delayed branch, prediction
- Data hazards must be handled carefully:
  - RAW data hazards handled by forwarding
  - WAW and WAR hazards don't exist in 5-stage pipeline
- MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)
- Exceptions in 5-stage pipeline recorded when they occur, but acted on only at WB (end of MEM) stage
  - Must flush all instructions in earlier pipeline stages (those that follow the excepting instruction)
- More performance from deeper pipelines, parallelism
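The performance slide's breakdown is the usual additive CPI model: base CPI plus stall cycles per instruction from each source. A minimal sketch; the function names are hypothetical, and the event frequencies and penalties in the example are made up for illustration, not the R4000 measurements in the chart.

```python
from typing import Dict


def stall_cpi(events_per_instruction: float, cycles_per_event: float) -> float:
    """Stall cycles per instruction contributed by one kind of event."""
    return events_per_instruction * cycles_per_event


def effective_cpi(stalls: Dict[str, float], base_cpi: float = 1.0) -> float:
    """Effective CPI = ideal base CPI + sum of per-instruction stall contributions."""
    return base_cpi + sum(stalls.values())


if __name__ == "__main__":
    # Made-up frequencies and penalties, for illustration only (not R4000 data).
    stalls = {
        "load": stall_cpi(0.25, 1.0),
        "branch": stall_cpi(0.125, 2.0),
        "fp_result": 0.5,
        "fp_structural": 0.0,
    }
    print(effective_cpi(stalls))
```

Each bar segment in the chart corresponds to one entry of `stalls`; shrinking any single segment (for example, through compiler scheduling of loads) lowers the total CPI by exactly that amount.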
