Pipelining & Hazards. Prof. Thomas Wenisch GAS STATION. Lecture 3 EECS 470. Slide 1

Adrian Williams
5 years ago

1 Pipelining & Hazards
Fall 2, Prof. Thomas Wenisch, edu/courses/eecs4
Wenisch 2 -- Portions Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.

2 Announcements
Reminders:
  HW # due Friday 9/4. Hand in at start of discussion.
  Programming assignment # due Friday 9/4. Electronic hand-in by midnight.
Tools: problem with the vcs GUI.
  You can use the GUI on relic or sunlogin.
  Be sure to provide a relative path to 4submit.

3 Readings
For today:
  "Cramming More Components onto ICs." G.E. Moore
  H & P Chapter A. A.6
For Monday:
  No new readings

4 Outline: Understanding the Execution Core
1. 370's 5-stage pipeline (review)
2. Implementing pipeline interlocks (review)
3. Scoreboard scheduling (CDC 6600)
4. Tomasulo's OoO scheduling algorithm (IBM 360)
5. Precise interrupts with a Reorder Buffer (P6)
6. Modern OoO (MIPS R10K, Netburst)

5 Pipelining Idealism
Motivation: increase throughput with little increase in hardware.
Repetition of identical operations
  The same operations are to be performed repeatedly on a large number of different inputs.
Uniform suboperations
  The operation to be pipelined can be evenly partitioned into uniform-latency suboperations.
Repetition of independent operations
  All the repetitions of the same operation are mutually independent, i.e., no data dependence and no resource conflicts.
Good examples: automobile assembly line, floating-point multiplier. But instruction pipeline???

6 Pipeline Illustrated
[Figure: a block of combinational logic divided into 1, 2, and 3 stages.]
  One stage: n gate delays, BW = ~(1/n)
  Two stages: n/2 gate delays each, BW = ~(2/n)
  Three stages: n/3 gate delays each, BW = ~(3/n)
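
The bandwidth figures on this slide can be checked with a small sketch. The function name `bandwidth` and the `latch_overhead` parameter are assumptions made for illustration; the slide itself only gives the ideal case, which corresponds to `latch_overhead = 0`.

```python
def bandwidth(n_gate_delays: float, stages: int, latch_overhead: float = 0.0) -> float:
    """Results produced per gate delay when the logic is cut into equal stages.

    latch_overhead is a hypothetical per-stage register delay; the slide's
    ~(k/n) figures correspond to latch_overhead = 0.
    """
    if stages < 1:
        raise ValueError("stages must be at least 1")
    cycle_time = n_gate_delays / stages + latch_overhead
    return 1.0 / cycle_time


if __name__ == "__main__":
    n = 12.0  # hypothetical total depth of the combinational logic
    for k in (1, 2, 3):
        print(f"{k} stage(s): BW = {bandwidth(n, k):.4f} results per gate delay")
```

With a nonzero `latch_overhead` the gain from each extra stage shrinks, which is one way to see why the ideal ~(k/n) scaling is only approximate.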

7 Instruction Pipeline Design
Identical operations? Unifying instruction types
  Coalescing instruction types into one multi-function pipe.
  Minimize external fragmentation (some idling stages).
Uniform suboperations? Balance pipeline stages
  Stage quantization to yield balanced stages.
  Minimize internal fragmentation (some waiting stages).
Independent operations? Resolve data and resource hazards
  Inter-instruction dependency detection and resolution.
  Minimize performance loss.

8 370 Processor Pipeline Review
[Figure: five-stage datapath. Stages: Fetch, Decode, Execute, Memory, (Write-back). Components: PC, I-cache, Reg File, ALU, D-cache.]
T_pipeline = T_base / 5

9 Stage 1: Fetch
Fetch an instruction from memory every cycle.
  Use PC to index memory.
  Increment PC (assume no branches for now).
Write state to the pipeline register (IF/ID).
  The next stage will read this pipeline register.
  Note that the pipeline register must be edge triggered.

10 [Figure: Stage 1 (Fetch) datapath. PC with enable, instruction memory/cache, instruction bits, IF/ID pipeline register with enable, rest of pipelined datapath.]

11 Stage 2: Decode
Decode opcode bits.
  May set up control signals for later stages.
Read input operands from the register file.
  Specified by the regA and regB fields of the instruction bits.
Write state to the pipeline register (ID/EX):
  Opcode
  Register contents
  Offset & destination fields
  PC (even though decode didn't use it)

12 [Figure: Stage 2 (Decode) datapath. The IF/ID pipeline register (instruction bits, PC) from the Stage 1 fetch datapath drives regA and regB into the register file, which also has destReg, data, and enable inputs. Contents of regA, contents of regB, PC, and control signals go to the ID/EX pipeline register and the rest of the pipelined datapath.]

13 Stage 3: Execute
Perform ALU operation.
  Input operands can be the contents of regA or regB, or the offset field of the instruction.
  Branches: calculate the target from PC and offset.
Write state to the pipeline register (EX/Mem):
  ALU result, contents of regB, and the branch target (PC plus offset)
  Instruction bits for opcode and destReg specifiers

14 [Figure: Stage 3 (Execute) datapath. The ID/EX pipeline register (contents of regA, contents of regB, PC, offset, control signals) from the Stage 2 decode datapath feeds the ALU. The ALU result, contents of regB, branch target, and control signals go to the EX/Mem pipeline register and the rest of the pipelined datapath.]

15 Stage 4: Memory Operation
Perform data cache access for memory ops.
  ALU result contains the address for ld and st.
  Opcode bits control memory R/W and enable signals.
Write state to the pipeline register (Mem/WB):
  ALU result and MemData
  Instruction bits for opcode and destReg specifiers

16 [Figure: Stage 4 (Memory) datapath. The EX/Mem pipeline register (ALU result, contents of regB, branch target, control signals) from the Stage 3 execute datapath feeds the data memory (enable, R/W). Memory read data, ALU result, and control signals go to the Mem/WB pipeline register and the rest of the pipelined datapath. Annotations: "This goes back to the MUX before the PC in stage 1"; "MUX control for PC input".]

17 Stage 5: Write back
Write result to the register file (if required).
  Write MemData to destReg for ld instruction.
  Write ALU result to destReg for arithmetic instruction.
  Opcode bits control the register write enable signal.

18 [Figure: Stage 5 (Write back) datapath. The Mem/WB pipeline register (ALU result, memory read data, control signals) comes from the Stage 4 memory datapath. Annotations: "This goes back to data input of register file"; "This goes back to the destination register specifier" (bits 0-2); "register write enable".]

19 Sample Code (Simple)
Run the following code on a pipelined datapath:
  add 1 2 3   ; reg 3 = reg 1 + reg 2
  nand 4 5 6  ; reg 6 = ~(reg 4 & reg 5)
  lw 2 4 20   ; reg 4 = Mem[reg 2 + 20]
  add 2 5 5   ; reg 5 = reg 2 + reg 5
  sw 3 7 10   ; Mem[reg 3 + 10] = reg 7

20 [Figure: full pipelined datapath. PC, instruction memory, register file (R0-R7), ALU, data memory, and the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers carrying PC, valA, valB, offset, target, eq?, ALU result, mdata, dest, and op.]

21 [Figure: initial state. All four pipeline registers hold noop.]

22 [Figure: Time 1. Fetch: add 1 2 3. IF/ID: add; ID/EX, EX/Mem, Mem/WB: noop.]

23 [Figure: Time 2. Fetch: nand 4 5 6. IF/ID: nand; ID/EX: add; EX/Mem, Mem/WB: noop.]

24 [Figure: Time 3. Fetch: lw 2 4 20. IF/ID: lw; ID/EX: nand; EX/Mem: add; Mem/WB: noop.]

25 [Figure: Time 4. Fetch: add 2 5 5. IF/ID: add; ID/EX: lw; EX/Mem: nand; Mem/WB: add.]

26 [Figure: Time 5. Fetch: sw 3 7 10. IF/ID: sw; ID/EX: add; EX/Mem: lw; Mem/WB: nand.]

27 [Figure: Time 6. No more instructions. ID/EX: sw; EX/Mem: add; Mem/WB: lw.]

28 [Figure: Time 7. No more instructions. EX/Mem: sw; Mem/WB: add.]

29 [Figure: Time 8. No more instructions. Mem/WB: sw.]

30 [Figure: Time 9. No more instructions. The sw completes and the pipeline registers are empty.]

31 Time graphs
Time: 1         2         3         4         5         6         7         8         9
add   fetch     decode    execute   memory    writeback
nand            fetch     decode    execute   memory    writeback
lw                        fetch     decode    execute   memory    writeback
add                                 fetch     decode    execute   memory    writeback
sw                                            fetch     decode    execute   memory    writeback
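
The time graph can be generated mechanically. The sketch below assumes an ideal pipeline (one fetch per cycle, no stalls); the helper names `ideal_schedule`, `total_cycles`, and `render` are made up for this example.

```python
STAGES = ("fetch", "decode", "execute", "memory", "writeback")


def ideal_schedule(instructions):
    """Map each instruction to {cycle: stage}, assuming one fetch per cycle and no stalls."""
    return [
        (name, {start + offset: stage for offset, stage in enumerate(STAGES)})
        for start, name in enumerate(instructions, start=1)
    ]


def total_cycles(schedule):
    """Cycle in which the last instruction finishes writeback."""
    return max(max(cycles) for _, cycles in schedule)


def render(schedule):
    """Return the schedule as a plain-text time graph, one row per instruction."""
    last = total_cycles(schedule)
    lines = ["time  " + "".join(f"{c:<10}" for c in range(1, last + 1))]
    for name, cycles in schedule:
        cells = "".join(f"{cycles.get(c, ''):<10}" for c in range(1, last + 1))
        lines.append(f"{name:<6}" + cells)
    return "\n".join(line.rstrip() for line in lines)


if __name__ == "__main__":
    print(render(ideal_schedule(["add", "nand", "lw", "add", "sw"])))
```

For N instructions and S stages the last writeback lands in cycle N + S - 1, which is why five instructions occupy nine cycles here.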

32 Balancing Pipeline Stages
Stage latencies:
  IF:  T_IF = 6 units
  ID:  T_ID = 2 units
  EX:  T_EX = 9 units
  MEM: T_MEM = 5 units
  WB:  T_WB = 9 units
Without pipelining: T_cyc = T_IF + T_ID + T_EX + T_MEM + T_WB = 31
Pipelined: T_cyc >= max{T_IF, T_ID, T_EX, T_MEM, T_WB} = 9
Speedup = 31 / 9
Can we do better in terms of either performance or efficiency?
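
As a quick check of the arithmetic on this slide, the sketch below recomputes both cycle times from the five stage latencies (6, 2, 9, 5, and 9 units). The dictionary and function names are illustrative only.

```python
# Stage latencies from the slide, in arbitrary time units.
STAGE_LATENCY = {"IF": 6, "ID": 2, "EX": 9, "MEM": 5, "WB": 9}


def unpipelined_cycle(latency):
    """Without pipelining, one cycle must cover every stage back to back."""
    return sum(latency.values())


def pipelined_cycle(latency):
    """With pipelining, the cycle is bounded below by the slowest stage."""
    return max(latency.values())


def speedup(latency):
    return unpipelined_cycle(latency) / pipelined_cycle(latency)


if __name__ == "__main__":
    print(unpipelined_cycle(STAGE_LATENCY), pipelined_cycle(STAGE_LATENCY))
    print(f"speedup = {speedup(STAGE_LATENCY):.2f} with {len(STAGE_LATENCY)} stages")
```

The speedup falls well short of the stage count because the 2-unit and 5-unit stages sit idle for most of each 9-unit cycle.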

33 Granularity of Pipeline Stages
Coarser-grained machine cycle: 4 machine cyc / instruction cyc
  Stage 1: IF & ID, T_IF&ID = 8 units
  Stage 2: EX, T_EX = 9 units
  Stage 3: MEM, T_MEM = 5 units
  Stage 4: WB, T_WB = 9 units
Finer-grained machine cycle: 11 machine cyc / instruction cyc
  T_cyc = 3 units
[Figure: both pipelines drawn as numbered stages, with DELAY blocks padding the stages that an operation does not fill.]
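
The two machine-cycle counts follow from rounding each operation up to a whole number of machine cycles. The sketch below is one way to reproduce them; `quantize` and `wasted_time` are hypothetical helper names, and the 9-unit cycle for the coarser design is an assumption (the longest merged stage), since the slide states only the 3-unit cycle explicitly.

```python
import math

FINE = {"IF": 6, "ID": 2, "EX": 9, "MEM": 5, "WB": 9}    # operation latencies, in units
COARSE = {"IF&ID": 8, "EX": 9, "MEM": 5, "WB": 9}        # IF and ID merged into one stage


def quantize(latency, t_cyc):
    """Machine cycles each operation needs when the clock period is t_cyc units."""
    return {stage: math.ceil(t / t_cyc) for stage, t in latency.items()}


def wasted_time(latency, t_cyc):
    """Internal fragmentation: time spent waiting because stages are not perfectly balanced."""
    cycles = quantize(latency, t_cyc)
    return sum(cycles[stage] * t_cyc - latency[stage] for stage in latency)


if __name__ == "__main__":
    for label, latency, t_cyc in (("coarse", COARSE, 9), ("fine", FINE, 3)):
        cycles = quantize(latency, t_cyc)
        print(label, cycles, "total:", sum(cycles.values()),
              "wasted units:", wasted_time(latency, t_cyc))
```

The finer clock wastes fewer units per instruction, at the cost of more pipeline registers and more instructions in flight.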

34 Balancing Pipeline Stages
Two methods for stage quantization:
  Merging multiple subcomputations into one.
  Subdividing a subcomputation into multiple subcomputations.
Current trends:
  Deeper pipelines (more and more stages).
  Multiplicity of different subpipelines.
  Pipelining of memory access (tricky).

35 Why Not Deeper Pipelines
Instruction pipelines are not ideal, i.e., instructions in different stages can have dependencies.
Suppose:
  add 1 2 3
  nand 3 4 5
RAW!!
[Timing diagram, t0 to t5: the add proceeds F D E M W; the nand and the instruction after it stall until the add result is available.]

36 Types of Dependencies and Hazards
Data dependence (both memory and register)
  True dependence (RAW): an instruction must wait for all required input operands.
  Anti dependence (WAR): a later write must not clobber a still-pending earlier read.
  Output dependence (WAW): an earlier write must not clobber an already-finished later write.
Control dependence (aka procedural dependence)
  Conditional branches cause uncertainty in instruction sequencing.
  Instructions following a conditional branch depend on the resolution of the branch instruction (more exact definition later).
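
The three data dependences reduce to set intersections on what each instruction reads and writes. The following is a minimal sketch under that view; the `Instr` record and `dependences` function are hypothetical, and the register numbers in the usage example are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    reads: frozenset   # registers (or memory locations) read
    writes: frozenset  # registers (or memory locations) written


def dependences(earlier: Instr, later: Instr) -> set:
    """Classify the data dependences from an earlier instruction to a later one."""
    found = set()
    if earlier.writes & later.reads:
        found.add("RAW")  # true dependence: later reads what earlier writes
    if earlier.reads & later.writes:
        found.add("WAR")  # anti dependence: later overwrites what earlier reads
    if earlier.writes & later.writes:
        found.add("WAW")  # output dependence: both write the same location
    return found


if __name__ == "__main__":
    add = Instr("add", reads=frozenset({1, 2}), writes=frozenset({3}))
    nand = Instr("nand", reads=frozenset({3, 4}), writes=frozenset({5}))
    print(dependences(add, nand))
```

A dependence is a property of the program; whether it becomes a hazard depends on how far apart the two instructions are in a particular pipeline.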

37 Terminology
Pipeline hazards:
  Potential violations of program dependences.
  Must ensure program dependences are not violated.
Hazard resolution:
  Static method: performed at compile time in software.
  Dynamic method: performed at run time using hardware.
Pipeline interlock:
  Hardware mechanisms for dynamic hazard resolution.
  Must detect and enforce dependences at run time.

38 Necessary Conditions for Data Hazards
           WAW hazard               WAR hazard               RAW hazard
stage X    j: r_k <- _ (reg write)  j: r_k <- _ (reg write)  j: _ <- r_k (reg read)
stage Y    i: r_k <- _ (reg write)  i: _ <- r_k (reg read)   i: r_k <- _ (reg write)
Hazard distance:
  dist(i,j) <= dist(X,Y)  ->  Hazard!!
  dist(i,j) > dist(X,Y)   ->  Safe

39 Handling Data Hazards
Avoidance (static)
  Make sure there are no hazards in the code.
Detect and stall (dynamic)
  Stall until earlier instructions finish.
Detect and forward (dynamic)
  Get the correct value from elsewhere in the pipeline.

40 Handling Data Hazards: Avoidance
Programmer/compiler must know implementation details.
  Insert noops between dependent instructions.
  add 1 2 3   ; write R3 in cycle 5
  noop
  noop
  nand 3 4 5  ; read R3 in cycle 6
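
A compiler-side version of this idea might look like the sketch below. The tuple encoding `(mnemonic, dest, sources)`, the function `insert_noops`, and the `min_distance` parameter are all assumptions for illustration; `min_distance=3` is chosen so that two noops separate back-to-back dependent instructions, as on the slide, and a different pipeline would need a different value.

```python
NOOP = ("noop", None, ())


def insert_noops(program, min_distance=3):
    """Pad a program so every consumer sits at least min_distance slots after its producer.

    program is a list of (mnemonic, dest_register_or_None, source_registers).
    """
    out = []
    for instr in program:
        _, _, sources = instr
        needed = 0
        # Look back only as far as a producer could still be too close.
        for back, prev in enumerate(reversed(out), start=1):
            if back >= min_distance:
                break
            if prev[1] is not None and prev[1] in sources:
                needed = max(needed, min_distance - back)
        out.extend([NOOP] * needed)
        out.append(instr)
    return out


if __name__ == "__main__":
    program = [("add", 3, (1, 2)), ("nand", 5, (3, 4))]
    for mnemonic, dest, sources in insert_noops(program):
        print(mnemonic, dest, sources)
```

Because `min_distance` is baked into the generated code, a binary padded for one pipeline may be wrong or wasteful on another, which is the compatibility problem with this approach.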

41 Problems with Avoidance
Binary compatibility
  New implementations may require more noops.
Code size
  Higher instruction cache footprint.
  Longer binary load times.
  Worse in machines that execute multiple instructions / cycle.
  Intel Itanium: 25-40% of instructions are noops.
Slower execution
  CPI = 1, but many instructions are noops.

42 Handling Data Hazards: Detect & Stall
Detection
  Compare regA & regB with destReg of preceding insn.
  3-bit comparators.
Stall
  Do not advance pipeline register for Fetch/Decode.
  Pass noop to Execute.
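
The detection step is a handful of equality comparisons. The sketch below models it in software; `must_stall` and its arguments are hypothetical names, and which pipeline registers supply `pending_dests` is an assumption that depends on the design (for example, whether the register file is written before it is read within a cycle).

```python
def must_stall(reg_a, reg_b, pending_dests):
    """Return True if the instruction in decode reads a register still waiting to be written.

    pending_dests holds the destination register of each older instruction that
    has not yet written back, with None for noops or instructions without a destination.
    """
    return any(dest is not None and dest in (reg_a, reg_b) for dest in pending_dests)


if __name__ == "__main__":
    # A nand reading registers 3 and 4 while an older add that writes 3 is in flight.
    print(must_stall(3, 4, pending_dests=[3, None]))
    # Same nand once nothing in flight writes 3 or 4.
    print(must_stall(3, 4, pending_dests=[None, None]))
```

When `must_stall` is true, the hardware holds the PC and the Fetch/Decode pipeline register and injects a noop into Execute, as the slide describes.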

43 [Figure: pipelined datapath with the stages labeled Fetch, Decode, Execute, Memory, WB.]

44 [Figure: the same datapath, with the dest and op fields carried in each pipeline register.]

45 [Figure: the same datapath, with fwd fields added to the ID/EX, EX/Mem, and Mem/WB pipeline registers.]

46 [Figure: end of cycle 1. IF/ID: add 1 2 3.]

47 [Figure: end of cycle 2. IF/ID: nand; ID/EX: add.]

48 [Figure: first half of cycle 3. Hazard detection logic examines the nand in IF/ID; ID/EX: add.]

49 [Figure: hazard detected. Comparators check regA and regB of the nand against destination register 3 of the add.]

50 [Figure: hazard detected. The regA comparison matches on register 3.]

51 [Figure: first half of cycle 3. Hazard on register 3; the hazard logic drives the enable (en) inputs of the PC and the IF/ID register. ID/EX: add.]

52 [Figure: end of cycle 3. IF/ID: nand (held); ID/EX: noop; EX/Mem: add.]

53 [Figure: first half of cycle 4. Hazard on register 3 again. IF/ID: nand; ID/EX: noop; EX/Mem: add.]

54 [Figure: end of cycle 4. IF/ID: nand (held); ID/EX: noop; EX/Mem: noop; Mem/WB: add.]

55 [Figure: first half of cycle 5. No hazard. IF/ID: nand; ID/EX: noop; EX/Mem: noop; Mem/WB: add.]

56 [Figure: end of cycle 5. IF/ID: the next add; ID/EX: nand; EX/Mem: noop; Mem/WB: noop.]

57 Problems with Detect & Stall
CPI increases on every hazard.
Are these stalls necessary? Not always!
  The new value for R3 is in the EX/Mem register.
  Reroute the result to the nand.
  Called "forwarding" or "bypassing".

58 Handling Data Hazards: Detect & Forward
Detection
  Same as detect and stall, but each possible hazard requires different forwarding paths.
Forward
  Add data paths for all possible sources.
  Add a mux in front of the ALU to select the source.
Bypassing logic is often a critical path in wide-issue machines.
  Number of paths grows quadratically with machine width.
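
The mux in front of the ALU can be modeled as a priority choice among the possible sources. This is a sketch under assumed conventions, not the datapath from the slides: `select_operand` is a hypothetical name, each pipeline register is reduced to a `(dest_register, value)` pair, and the youngest producer is assumed to win when several match.

```python
def select_operand(src_reg, regfile_value, ex_mem, mem_wb):
    """Pick the value the ALU should see for src_reg.

    ex_mem and mem_wb are (dest_register_or_None, value) pairs describing the
    results held in those pipeline registers. The EX/Mem entry belongs to the
    younger instruction, so it takes priority over Mem/WB.
    """
    if ex_mem[0] is not None and ex_mem[0] == src_reg:
        return ex_mem[1]
    if mem_wb[0] is not None and mem_wb[0] == src_reg:
        return mem_wb[1]
    return regfile_value


if __name__ == "__main__":
    # Register 3 is stale in the register file; a newer value sits in EX/Mem.
    print(select_operand(3, regfile_value=0, ex_mem=(3, 42), mem_wb=(None, 0)))
```

Forwarding from EX/Mem does not help when the producer is a load, because the loaded data is not available until after the memory stage; that case still needs a stall before the value can be forwarded.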

59
  add 1 2 3   // r3 = r1 + r2
  nand 3 4 5  // r5 = r3 NAND r4
  add 6 3 7   // r7 = r3 + r6
  lw 3 6 10   // r6 = MEM[r3+10]
  sw 6 2 12   // MEM[r6+12] = r2
[Figure: the instructions add, nand, add, lw, sw drawn with their register dependences.]

60 [Figure: first half of cycle 3. Hazard on register 3. IF/ID: nand; ID/EX: add. The ID/EX, EX/Mem, and Mem/WB registers carry fwd fields.]

61 [Figure: end of cycle 3. IF/ID: add 6 3 7; ID/EX: nand, marked H1; EX/Mem: add.]

62 [Figure: first half of cycle 4. New hazard on register 3. IF/ID: add 6 3 7; ID/EX: nand (H1); EX/Mem: add.]

63 [Figure: end of cycle 4. IF/ID: lw 3 6 10; ID/EX: add, marked H2; EX/Mem: nand (H1); Mem/WB: add.]

64 [Figure: first half of cycle 5. No hazard. IF/ID: lw 3 6 10; ID/EX: add (H2); EX/Mem: nand (H1); Mem/WB: add.]

65 [Figure: end of cycle 5. IF/ID: sw; ID/EX: lw; EX/Mem: add (H2); Mem/WB: nand (H1).]

66 [Figure: first half of cycle 6. Hazard on register 6; the enable (en) inputs of the PC and the IF/ID register are driven by the hazard logic. IF/ID: sw; ID/EX: lw; EX/Mem: add (H2); Mem/WB: nand (H1).]

67 [Figure: end of cycle 6. IF/ID: sw (held); ID/EX: noop; EX/Mem: lw; Mem/WB: add (H2).]

68 [Figure: first half of cycle 7. Hazard on register 6. IF/ID: sw; ID/EX: noop; EX/Mem: lw; Mem/WB: add (H2).]

69 [Figure: end of cycle 7. ID/EX: sw, marked H3; EX/Mem: noop; Mem/WB: lw.]

70 [Figure: first half of cycle 8. ID/EX: sw (H3); EX/Mem: noop; Mem/WB: lw.]

71 [Figure: end of cycle 8. EX/Mem: sw (H3); Mem/WB: noop.]

72 Load Delay Slot (MIPS R2000)
[Timing diagram, t0 to t5: i, j, and k each proceed F D E M W, one cycle apart.]
  h: R_k <- --
  i: R_k <- MEM[ - ]
  j: -- <- R_k
  k: -- <- R_k
The effect of a delayed load is not visible to the instructions in its delay slots.
Which R_k do we really mean?

73 Control Hazards
[Timing diagram, t0 to t5: a beq followed by a sub, each drawn as F D E M W, with the sub marked squash.]

74 Handling Control Hazards
Avoidance (static)
  No branches?
  Convert branches to predication: control dependence becomes data dependence.
Detect and stall (dynamic)
  Stop fetch until branch resolves.
Speculate and squash (dynamic)
  Keep going past branch, throw away instructions if wrong.

75 Avoidance: if-conversion
Source:
  if (a == b) {
    x++;
    y = n / d;
  }
With a branch:
  sub t1 <- a, b
  jnz t1, PC+2
  add x <- x, #1
  div y <- n, d
Predicated:
  sub t1 <- a, b
  add(t1) x <- x, #1
  div(t1) y <- n, d
With conditional moves:
  sub t1 <- a, b
  add t2 <- x, #1
  div t3 <- n, d
  cmov(t1) x <- t2
  cmov(t1) y <- t3

76 Handling Control Hazards: Detect & Stall
Detection
  In decode, check if opcode is branch or jump.
Stall
  Hold next instruction in Fetch.
  Pass noop to Decode.

77 Problems with Detect & Stall
CPI increases on every branch.
Are these stalls necessary? Not always!
  Branch is only taken half the time.
  Assume branch is NOT taken.
  Keep fetching, treat branch as noop.
  If wrong, make sure bad instructions don't complete.

78 Handling Control Hazards: Speculate & Squash
Speculate
  Assume branch is not taken.
Squash
  Overwrite opcodes in Fetch, Decode, Execute with noop.
  Pass target to Fetch.
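
A rough way to compare stalling with speculating is to estimate CPI under each policy. The sketch below is a back-of-the-envelope model, not a measurement: the function names are made up, the 3-cycle penalty is taken from the three squashed stages (Fetch, Decode, Execute) and is assumed to apply to stalling as well, the taken rate of one half comes from the earlier slide, and the 20% branch frequency is purely hypothetical.

```python
def cpi_detect_and_stall(branch_fraction, penalty):
    """Every branch pays the full penalty while fetch waits for it to resolve."""
    return 1.0 + branch_fraction * penalty


def cpi_speculate_not_taken(branch_fraction, taken_fraction, penalty):
    """Only taken branches pay the penalty; not-taken branches cost nothing extra."""
    return 1.0 + branch_fraction * taken_fraction * penalty


if __name__ == "__main__":
    branch_fraction = 0.2  # hypothetical share of instructions that are branches
    taken_fraction = 0.5   # "branch is only taken half the time"
    penalty = 3            # assumed: Fetch, Decode, Execute are squashed or idle
    print(f"stall:     CPI = {cpi_detect_and_stall(branch_fraction, penalty):.2f}")
    print(f"speculate: CPI = "
          f"{cpi_speculate_not_taken(branch_fraction, taken_fraction, penalty):.2f}")
```

Under this model the gap between the two policies grows with the penalty, so deeper pipelines benefit more from guessing well.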

79 [Figure: squash in the pipelined datapath. A beq is followed by sub, add, and nand. When the equal signal from the ALU resolves the beq, control logic overwrites the instructions behind it in the pipeline registers with noop and passes the target to the PC.]

80 Problems with Speculate & Squash
Always assumes branch is not taken.
Can we do better? Yes.
  Predict branch direction and target!
  Why possible? Program behavior repeats.
More on branch prediction to come...

81 Branch Delay Slot (MIPS, SPARC)
Without a delay slot (t0 to t5):
  branch: F D E M W
  next:   F, then squash
  target: F D E M W
With a delay slot, the instruction in the delay slot executes even on a taken branch:
  branch: F D E M W
  delay:  F D E M W
  target: F D E M W
Example:
  i: beq 1, 2, tgt
  j: add 3, 4, 5
What can we put here?

82 Pipeline Hazard Checklist
Memory data dependences
  Output dependence (WAW)
  Anti dependence (WAR)
  True data dependence (RAW)
Register data dependences
  Output dependence (WAW)
  Anti dependence (WAR)
  True data dependence (RAW)
Control dependences

83 Instruction Dependences and Pipeline Hazards
Sequential code semantics:
  i1: xxxx
  i2: xxxx
  i3: xxxx
A true dependence between two instructions may only involve one substep of each instruction.
The implied sequential precedences are overspecifications. They are sufficient but not necessary to ensure program correctness.
