CA226 Advanced Computer Architecture


Stephen Blott

Today: data hazards

MIPS Pipeline

Recall:
- the MIPS pipeline implements instruction-level parallelism
- ideally, up to five instructions are executed (in part) on any clock cycle
- if one instruction were to exit the pipeline on each cycle, then the CPI would be 1
- and, ideally, the MIPS pipeline approaches a CPI of 1
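The claim that the CPI approaches 1 can be checked with a little arithmetic. The sketch below is not from the slides: it assumes an idealised five-stage pipeline with no stalls at all, and the function names `ideal_cycles` and `ideal_cpi` are made up for illustration.

```python
def ideal_cycles(num_instructions: int, num_stages: int = 5) -> int:
    """Cycles needed by a stall-free pipeline.

    The first instruction needs num_stages cycles to pass through the
    pipeline; after that, one further instruction completes per cycle.
    """
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    return num_stages + (num_instructions - 1)


def ideal_cpi(num_instructions: int, num_stages: int = 5) -> float:
    """Average cycles per instruction for the stall-free case."""
    return ideal_cycles(num_instructions, num_stages) / num_instructions


if __name__ == "__main__":
    for n in (1, 5, 100, 10_000):
        print(f"{n:6d} instructions: {ideal_cycles(n):6d} cycles, "
              f"CPI = {ideal_cpi(n):.4f}")
```

Under these assumptions, five instructions take 9 cycles (a CPI of 1.8), and the CPI only gets close to 1 once the instruction count is large compared with the pipeline depth.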

Example Speedup

    daddi r1,r1,1
    daddi r2,r2,1
    daddi r3,r3,1
    daddi r4,r4,1
    daddi r5,r5,1

Note to self: see pipeline.s.

Ideally:
- each instruction takes 5 cycles to execute
- however, 5 instructions are in the pipeline
- so the number of cycles per instruction approaches 1

Note to self: observe the effect on CPI of repeating the previous block of instructions.

Hazards

Types of Hazard

The major hurdle to effective pipeline implementation is hazards:
- Structural hazards: resource conflicts; the hardware cannot support all instruction combinations simultaneously
- Data hazards: when one instruction depends upon the result (which is not yet available) of a previous instruction (today)
- Control hazards: when the address of the next instruction cannot be determined immediately

Data Hazards

Example

Consider:

    dadd r1,r2,r3     ; instruction 1
    dsub r4,r1,r5     ; instruction 2
    and  r6,r1,r5     ; instruction 3
    or   r8,r1,r9     ; instruction 4
    xor  r10,r1,r11   ; instruction 5

Instructions 2, 3, 4 and 5 each depend upon the result of instruction 1.

Ok

Turn off forwarding, and let's try running that.

Note to self: see hazards1.s.

Illustration

Table 1. Two read-after-write (RAW) pipeline stalls:

    cycle            1    2    3    4      5      6
    dadd r1,r2,r3    IF   ID   Ex   Mem    WB*
    dsub r4,r1,r5         IF   ID   RAW    RAW    *Ex
    and  r6,r1,r5              IF   stall  stall  ID
    or   r8,r1,r9                                 IF

Observations

This is known as a read-after-write (or RAW) stall:
- instruction 2 is blocked at ID because one of its arguments (registers) is not yet available
- in this case, all subsequent instructions are blocked too
- which is known as a pipeline stall

Note: this assumes that we can both write and read the register file in a single clock cycle. Typically, the write happens in the first half of the cycle, and the read in the second half.

Next,

Consider the effect of replacing instruction 2 with a nop instruction (or any other, non-dependent instruction).

Illustration

Table 2. Still one RAW stall:

    cycle            1    2    3    4    5      6     7
    dadd r1,r2,r3    IF   ID   Ex   Mem  WB*
    nop                   IF   ID   Ex   Mem    WB
    and  r6,r1,r5              IF   ID   RAW    *Ex   Mem
    or   r8,r1,r9                   IF   stall  ID    Ex

Next,

Finally, consider the effect of replacing instruction 3 with a nop instruction as well (or any other, non-dependent instruction).

Illustration

Table 3. No stalls:

    cycle            1    2    3    4    5    6     7
    dadd r1,r2,r3    IF   ID   Ex   Mem  WB*
    nop                   IF   ID   Ex   Mem  WB
    nop                        IF   ID   Ex   Mem
    or   r8,r1,r9                   IF   ID   *Ex   Mem
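The pattern in Tables 1 to 3 (two stalls, then one, then none) follows from a simple timing rule, which the sketch below tries to capture. It is a simplified model and not the WinMIPS64 implementation: it assumes a five-stage in-order pipeline without forwarding, a register file that is written in the first half of a cycle and read in the second half, and a made-up instruction encoding of `(destination, sources)`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (destination register or None, source registers).
Instr = Tuple[Optional[str], Tuple[str, ...]]


def raw_stalls_without_forwarding(program: List[Instr]) -> List[int]:
    """Return the number of stall cycles each instruction spends in ID."""
    writeback_cycle: Dict[str, int] = {}  # register -> cycle of its pending WB
    stalls = []
    previous_id = 1  # so that the first instruction decodes in cycle 2
    for dest, sources in program:
        earliest_id = previous_id + 1
        # Operands can be read in the same cycle in which they are written back.
        operands_ready = max((writeback_cycle.get(r, 0) for r in sources),
                             default=0)
        id_cycle = max(earliest_id, operands_ready)
        stalls.append(id_cycle - earliest_id)
        if dest is not None and dest != "r0":
            writeback_cycle[dest] = id_cycle + 3  # Ex, Mem, then WB
        previous_id = id_cycle
    return stalls


DADD: Instr = ("r1", ("r2", "r3"))
DSUB: Instr = ("r4", ("r1", "r5"))
AND: Instr = ("r6", ("r1", "r5"))
OR: Instr = ("r8", ("r1", "r9"))
NOP: Instr = (None, ())

if __name__ == "__main__":
    print("Table 1:", raw_stalls_without_forwarding([DADD, DSUB, AND, OR]))
    print("Table 2:", raw_stalls_without_forwarding([DADD, NOP, AND, OR]))
    print("Table 3:", raw_stalls_without_forwarding([DADD, NOP, NOP, OR]))
```

Each list holds the stall cycles per instruction, in program order. Only stalls caused by waiting for a register are counted; the instructions queued behind a stalled one are delayed by the same amount, as in the tables.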

We could:
- find (two) other (independent) instructions to insert between such write-read dependencies
- but such dependencies are common
- and we rarely have enough instructions to fill the gaps

However, such hazards are not insurmountable:
- the ALU produces the necessary value in cycle 3 (although it is not written back to the register file until cycle 5)
- that value is not needed by instruction 2 until cycle 4

Forwarding

Table 4. The value is available after cycle 3:

    cycle            1    2    3     4      5      6
    dadd r1,r2,r3    IF   ID   Ex**  Mem    WB*
    dsub r4,r1,r5         IF   ID    RAW    RAW    *Ex
    and  r6,r1,r5              IF    stall  stall  ID
    or   r8,r1,r9                                  IF

Solution: data paths are added:
- EX/Mem.ALUOutput -> ID/EX.A (output)
- EX/Mem.ALUOutput -> ID/EX.B (output)
- Mem/WB.ALUOutput -> ID/EX.A (output)
- Mem/WB.ALUOutput -> ID/EX.B (output)

When a read-after-write is detected, the ALU input (either ID/EX.A or ID/EX.B) is switched to one of the two available ALUOutput pipeline registers (EX/Mem or Mem/WB).

MIPS Pipeline

Forwarding

    cycle            1    2    3     4     5    6    7
    dadd r1,r2,r3    IF   ID   Ex**  Mem   WB
    dsub r4,r1,r5         IF   ID    **Ex  Mem  WB
    and  r6,r1,r5              IF    ID    Ex   Mem  WB
    or   r8,r1,r9                    IF    ID   Ex   Mem

One of:
- EX/Mem.ALUOutput -> ID/EX.A
- EX/Mem.ALUOutput -> ID/EX.B

Forwarding

    cycle            1    2    3    4      5     6    7
    dadd r1,r2,r3    IF   ID   Ex   Mem**  WB
    nop                   IF   ID   Ex     Mem   WB
    and  r6,r1,r5              IF   ID     **Ex  Mem  WB
    or   r8,r1,r9                   IF     ID    Ex   Mem

One of:
- Mem/WB.ALUOutput -> ID/EX.A
- Mem/WB.ALUOutput -> ID/EX.B

The WinMIPS64 Simulator

The WinMIPS64 simulator:
- supports forwarding
- it can be either enabled or disabled
- see: Configure/Enable Forwarding
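Which forwarding path is used depends only on how far the consumer is behind the producer. The following sketch illustrates that choice for the hazards1.s sequence. It is a simplification under stated assumptions: every producer is an ALU instruction, nothing stalls, and the `(destination, sources)` encoding and the function name `forwarding_sources` are invented for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (destination register or None, ALU source registers).
Instr = Tuple[Optional[str], Tuple[str, ...]]


def forwarding_sources(program: List[Instr]) -> List[Tuple[int, str, str]]:
    """Report, for every ALU operand, where its value comes from.

    Returns (instruction number, register, origin) triples, with
    instruction numbers starting at 1 as on the slides.
    """
    last_writer: Dict[str, int] = {}  # register -> index of latest writer
    report = []
    for index, (dest, sources) in enumerate(program):
        for reg in sources:
            distance = index - last_writer[reg] if reg in last_writer else None
            if distance == 1:
                origin = "EX/Mem.ALUOutput"   # producer is one stage ahead
            elif distance == 2:
                origin = "Mem/WB.ALUOutput"   # producer is two stages ahead
            else:
                origin = "register file"      # already written back (or never written)
            report.append((index + 1, reg, origin))
        if dest is not None and dest != "r0":
            last_writer[dest] = index
    return report


hazards1: List[Instr] = [
    ("r1", ("r2", "r3")),     # dadd r1,r2,r3
    ("r4", ("r1", "r5")),     # dsub r4,r1,r5
    ("r6", ("r1", "r5")),     # and  r6,r1,r5
    ("r8", ("r1", "r9")),     # or   r8,r1,r9
    ("r10", ("r1", "r11")),   # xor  r10,r1,r11
]

if __name__ == "__main__":
    for number, reg, origin in forwarding_sources(hazards1):
        if reg == "r1":
            print(f"instruction {number}: r1 from {origin}")
```

In this model instruction 2 takes r1 from EX/Mem, instruction 3 takes it from Mem/WB, and instructions 4 and 5 read it from the register file, relying on the write-then-read behaviour within a single cycle.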

Try turning on forwarding, and running the example again (hazards1.s).

Now, consider the following:

    daddi r1,r2,123   ; instruction 1
    ld    r4,0(r1)    ; instruction 2
    sd    r4,8(r1)    ; instruction 3

Here:
- there is a RAW dependency between the daddi instruction and the address calculation in both of the following instructions
- the address calculation is handled by the ALU, so these are handled by forwarding, as before

Illustration

Table 5. No stalls due to address calculation:

    cycle             1    2    3     4      5     6    7
    daddi r1,r2,123   IF   ID   Ex**  Mem++  WB
    ld    r4,0(r1)         IF   ID    **Ex   Mem   WB
    sd    r4,8(r1)              IF    ID     ++Ex  Mem  WB

- EX/Mem.ALUOutput -> ID/EX.A for cycle 4
- Mem/WB.ALUOutput -> ID/EX.A for cycle 5

And, again

    daddi r1,r2,123   ; instruction 1
    ld    r4,0(r1)    ; instruction 2
    sd    r4,8(r1)    ; instruction 3

Also: the sd instruction depends upon the result of the ld.

Table 6. This can be solved by forwarding too:

    cycle             1    2    3    4    5      6      7
    daddi r1,r2,123   IF   ID   Ex   Mem  WB
    ld    r4,0(r1)         IF   ID   Ex   Mem**  WB
    sd    r4,8(r1)              IF   ID   Ex     **Mem  WB

Here: Mem/WB.LMD -> EX/Mem.B for cycle 6

In full

    cycle             1    2    3     4      5      6      7
    daddi r1,r2,123   IF   ID   Ex++  Mem==  WB
    ld    r4,0(r1)         IF   ID    ++Ex   Mem**  WB
    sd    r4,8(r1)              IF    ID     ==Ex   **Mem  WB

In all, four pipeline stalls are eliminated (note to self: see stalls1.s):
- EX/Mem.ALUOutput -> ID/EX.A for cycle 4
- Mem/WB.ALUOutput -> ID/EX.A for cycle 5
- Mem/WB.LMD -> EX/Mem.B for cycle 6

MIPS Pipeline

Unfortunately

Forwarding cannot solve all RAW problems:

    ld   r1,n(r0)
    dadd r2,r1,r0

An Insurmountable Stall

Table 7. You can't forward backwards in time:

    cycle            1    2    3    4      5    6
    ld   r1,n(r0)    IF   ID   Ex   Mem**  WB
    dadd r2,r1,r0         IF   ID   **Ex   Mem  WB

Clearly, this is not possible: the loaded value only becomes available at the end of cycle 4, but the dadd would need it at the start of that same cycle.

Table 8. An inevitable stall of one cycle:

    cycle            1    2    3    4      5     6
    ld   r1,n(r0)    IF   ID   Ex   Mem**  WB
    dadd r2,r1,r0         IF   ID   RAW    **Ex  Mem

More generally,

Unlike arithmetic instructions:
- loads yield values only after the Mem stage of the pipeline
- so stalls at Ex cannot be avoided

Suggestion

When possible, replace:

    dadd r3,r2,r1     ; some other, unrelated instruction
    ld   r4,n(r0)
    dadd r6,r5,r4     ; stall - can't forward backwards!

Suggestion

With:

    ld   r4,n(r0)
    dadd r3,r2,r1     ; some other, unrelated instruction
    dadd r6,r5,r4     ; doesn't stall - can forward from ld

Now, when the final dadd reaches Ex, Mem/WB.LMD is available for forwarding.

Note: a good compiler (or you!) should be able to spot such stalls and reorder the operations. We spot such stalls by observing that an ALU instruction immediately follows a load upon which it depends.
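The rule for spotting such stalls is mechanical enough to write down. The sketch below is one possible rendering of it, not a description of how any particular compiler works. The `Instr` record and its field names are invented here, and `ex_sources` lists only the registers the ALU needs in Ex (for a store, that is the base register, not the value being stored).

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this illustration."""
    text: str
    op: str
    dest: Optional[str]
    ex_sources: Tuple[str, ...]  # registers the ALU reads in Ex


def load_use_hazards(program: List[Instr]) -> List[int]:
    """Indices of instructions that immediately follow a load they depend on."""
    hazards = []
    for i in range(1, len(program)):
        prev, cur = program[i - 1], program[i]
        if prev.op == "ld" and prev.dest != "r0" and prev.dest in cur.ex_sources:
            hazards.append(i)
    return hazards


before = [
    Instr("dadd r3,r2,r1", "dadd", "r3", ("r2", "r1")),
    Instr("ld r4,n(r0)", "ld", "r4", ("r0",)),
    Instr("dadd r6,r5,r4", "dadd", "r6", ("r5", "r4")),
]
after = [before[1], before[0], before[2]]  # load moved up by one position

if __name__ == "__main__":
    for name, program in (("before", before), ("after", after)):
        stalled = [program[i].text for i in load_use_hazards(program)]
        print(f"{name}: stalls at {stalled}")
```

With the load first, the unrelated dadd fills the slot behind it and the check reports no hazard.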

Example

Compile:

    int a = b + c;
    int d = e + f;

Note to self: see psched1.s and psched2.s.

    ld   r1,b(r0)     ; a = b + c
    ld   r2,c(r0)
    dadd r5,r1,r2
    sd   r5,a(r0)
    ld   r1,e(r0)     ; d = e + f
    ld   r2,f(r0)
    dadd r5,r1,r2
    sd   r5,d(r0)

Example

First, spot the problem:

    ld   r1,b(r0)     ; a = b + c
    ld   r2,c(r0)
    dadd r5,r1,r2     ; stall, r2 not ready
    sd   r5,a(r0)
    ld   r1,e(r0)     ; d = e + f
    ld   r2,f(r0)
    dadd r5,r1,r2     ; stall, r2 not ready
    sd   r5,d(r0)

Example

Then, rewrite instructions such that there are no stalls.

Well, it's helpful to use different registers:

    ld   r1,b(r0)     ; a = b + c
    ld   r2,c(r0)
    dadd r5,r1,r2     ; stall, r2 not ready
    sd   r5,a(r0)
    ld   r3,e(r0)     ; d = e + f
    ld   r4,f(r0)
    dadd r5,r3,r4     ; stall, r4 not ready
    sd   r5,d(r0)

Example

No stalls:

    ld   r1,b(r0)
    ld   r2,c(r0)
    ld   r3,e(r0)     ; prevent stall (pulled up)
    dadd r5,r1,r2     ; no stall
    ld   r4,f(r0)
    sd   r5,a(r0)     ; prevent stall (pushed down)
    dadd r5,r3,r4     ; no stall
    sd   r5,d(r0)

This is known as pipeline scheduling.

In this case:
- use two extra registers
- avoid two stalls
- 13 cycles, instead of 15

Aside

The "13 versus 15 cycles" statement is misleading: it includes cycles for the pipeline to fill and empty.

Actually, disregarding the filling of the pipeline:
- it's 8 cycles, instead of 10
- so a speedup of 1.25

Summary 1

Forwarding is simple:
- if the necessary data is available somewhere in the pipeline by the time it is needed
- then it can be forwarded to where it's needed

The implementation in hardware of these strategies is an engineering decision:
- it is correct, in all cases, to stall the pipeline when such hazards are detected
- forwarding, however, improves performance at the cost of some additional complexity
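The "8 cycles instead of 10" figure from the scheduling example can be reproduced with a small count. This is a rough model rather than a simulation: it assumes forwarding is enabled, that the only remaining stalls are one-cycle load-use stalls, and that pipeline fill and drain are ignored. The tuple encoding and the function name `count_load_use_stalls` are made up for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (opcode, destination or None, registers the ALU reads in Ex).
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def count_load_use_stalls(program: List[Instr]) -> int:
    """Count instructions that directly follow a load whose result they use in Ex."""
    stalls = 0
    for (prev_op, prev_dest, _), (_, _, cur_sources) in zip(program, program[1:]):
        if prev_op == "ld" and prev_dest in cur_sources:
            stalls += 1
    return stalls


naive: List[Instr] = [
    ("ld", "r1", ("r0",)),           # ld   r1,b(r0)
    ("ld", "r2", ("r0",)),           # ld   r2,c(r0)
    ("dadd", "r5", ("r1", "r2")),    # dadd r5,r1,r2   (stall)
    ("sd", None, ("r0",)),           # sd   r5,a(r0)
    ("ld", "r1", ("r0",)),           # ld   r1,e(r0)
    ("ld", "r2", ("r0",)),           # ld   r2,f(r0)
    ("dadd", "r5", ("r1", "r2")),    # dadd r5,r1,r2   (stall)
    ("sd", None, ("r0",)),           # sd   r5,d(r0)
]

scheduled: List[Instr] = [
    ("ld", "r1", ("r0",)),           # ld   r1,b(r0)
    ("ld", "r2", ("r0",)),           # ld   r2,c(r0)
    ("ld", "r3", ("r0",)),           # ld   r3,e(r0)   (pulled up)
    ("dadd", "r5", ("r1", "r2")),    # dadd r5,r1,r2
    ("ld", "r4", ("r0",)),           # ld   r4,f(r0)
    ("sd", None, ("r0",)),           # sd   r5,a(r0)   (pushed down)
    ("dadd", "r5", ("r3", "r4")),    # dadd r5,r3,r4
    ("sd", None, ("r0",)),           # sd   r5,d(r0)
]

# One cycle per instruction plus one per stall, ignoring fill and drain.
naive_cycles = len(naive) + count_load_use_stalls(naive)
scheduled_cycles = len(scheduled) + count_load_use_stalls(scheduled)
speedup = naive_cycles / scheduled_cycles

if __name__ == "__main__":
    print(naive_cycles, scheduled_cycles, speedup)
```

The store's data register is left out of the Ex sources on purpose, since the store data is only needed in Mem and can be forwarded there.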

Summary 2

Some types of (RAW) stall are unavoidable:
- however, it is often possible to reorder instructions such that they do not occur

Done
