EECS 470 Lecture 4. Pipelining & Hazards II. Fall 2018 Jon Beaumont


1 GAS STATION
Pipelining & Hazards II
Fall 2018, Jon Beaumont
Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, University of Pennsylvania, and University of Wisconsin.

2 Class Question
Which of the following best explains why pipelining results in speedup?
a) Instructions are executed with shorter latency
b) Clock period is reduced
c) More instructions are executed at the same time
d) Magnets

3 Announcements and Readings
- Reminder: Lab #2 due Thursday before lab
  - Get checked off during GSI/IA OH
- Verilog assignment #2 due Tuesday 9/25
  - Submit to autograder by 11:59p
- HW #1 due Tuesday 9/18 (tomorrow)
  - Submit through Gradescope by 11:59p
- Arm at career fair

4 Last Time
- Baseline processor discussion
- Review 5-stage pipeline from EECS 370

5 Today
- Hazards
  - Detection
  - Resolution
    - Software (avoidance)
    - Hardware (stalling, forwarding)

6 Hazards
- Instruction pipelines are not ideal, i.e., instructions in different stages can have dependencies
- Suppose:
    add 1 2 3
    nand 3 4 5    RAW!!
- [Figure: pipeline timing diagram over cycles t0 to t5 for Inst 0 (add) and Inst 1 (nand); the nand stalls until the add's result is available]
- Why does this discourage deep pipelines?

7 Terminology
- Dependency: Anything that requires the result of a previous instruction
- Pipeline Hazards: Anything that might cause a delay for a given implementation
  - Must ensure program dependences are not violated
- Hazard Resolution:
  - Static Method: Performed at compile time in software
  - Dynamic Method: Performed at run time using hardware
- Pipeline Interlock: Hardware mechanisms for dynamic hazard resolution
  - Must detect and enforce dependences at run time

8 Types of Dependencies and Hazards
- Data Dependence (both memory and register)
  - True dependence (RAW): instruction must wait for all required input operands
  - Anti-dependence (WAR): later write must not clobber a still-pending earlier read
  - Output dependence (WAW): earlier write must not clobber an already-completed later write
  - The last two are also called name dependences
- Control Dependence (aka Procedural Dependence)
  - Conditional branches may change instruction sequence
  - Instructions after a conditional branch depend on its outcome
- Structural Hazard
  - Any limitation of hardware resources
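
The three register data dependences on this slide can be told apart by comparing the registers each instruction reads and writes. The following is a minimal sketch, assuming a hypothetical `Insn` record with one optional destination and a set of sources; the names are illustrative and do not come from the lecture.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Insn:
    """Hypothetical instruction record: one optional dest, a set of sources."""
    name: str
    dest: Optional[int]
    srcs: FrozenSet[int]


def dependences(earlier: Insn, later: Insn) -> Set[str]:
    """Return the register dependences of `later` on `earlier`."""
    found = set()
    if earlier.dest is not None and earlier.dest in later.srcs:
        found.add("RAW")  # true dependence: later reads what earlier writes
    if later.dest is not None and later.dest in earlier.srcs:
        found.add("WAR")  # anti-dependence: later overwrites an earlier source
    if earlier.dest is not None and earlier.dest == later.dest:
        found.add("WAW")  # output dependence: both write the same register
    return found


add = Insn("add", dest=3, srcs=frozenset({1, 2}))
nand = Insn("nand", dest=5, srcs=frozenset({3, 4}))
print(dependences(add, nand))  # {'RAW'}
```

Swapping the two arguments turns the same register overlap into a WAR dependence, which is why program order matters when classifying.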

9 Hazard Distance
Necessary Conditions for Data Hazards

           WAW Hazard      WAR Hazard      RAW Hazard
stage X    j: r_k <- _     j: r_k <- _     j: _ <- r_k
           (Reg Write)     (Reg Write)     (Reg Read)
stage Y    i: r_k <- _     i: _ <- r_k     i: r_k <- _
           (Reg Write)     (Reg Read)      (Reg Write)

dist(i,j) <= dist(X,Y)  =>  Hazard!!
dist(i,j) >  dist(X,Y)  =>  Safe

10 Handling Data Hazards
- Avoidance (static)
  - Make sure there are no hazards in the code
- Detect and Stall (dynamic)
  - Stall until earlier instructions finish
- Detect and Forward (dynamic)
  - Get correct value from elsewhere in pipeline

11 Handling Data Hazards: Avoidance
- Programmer/compiler must know implementation details
- Insert nops between dependent instructions
    add 1 2 3     write R3 in cycle 5
    nop
    nop
    nand 3 4 5    read R3 in cycle 6
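
The nop insertion on this slide can be sketched as a small compiler-style pass. This is an illustration only: the tuple layout, the `insert_nops` name, and the `gap` parameter (how many slots must separate a producer from its consumer, set to 2 to match the two nops in the slide's example) are assumptions, not part of the lecture.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (mnemonic, source registers, destination or None)
Insn = Tuple[str, Tuple[int, ...], Optional[int]]
NOP: Insn = ("nop", (), None)


def insert_nops(program: List[Insn], gap: int = 2) -> List[Insn]:
    """Pad with nops so each consumer has `gap` slots after its producer."""
    out: List[Insn] = []
    for insn in program:
        srcs = insn[1]
        # Only the last `gap` emitted slots can still be too close.
        window = out[-gap:] if gap > 0 else []
        needed = 0
        for back, prev in enumerate(reversed(window), start=1):
            if prev[2] is not None and prev[2] in srcs:
                # Producer is `back` slots behind; it must be gap + 1 behind.
                needed = max(needed, gap - back + 1)
        out.extend([NOP] * needed)
        out.append(insn)
    return out


program = [("add", (1, 2), 3), ("nand", (3, 4), 5)]
print([i[0] for i in insert_nops(program)])  # ['add', 'nop', 'nop', 'nand']
```

Because `gap` depends on the pipeline implementation, a binary padded for one machine may be wrong or wasteful on another.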

12 Problems with Avoidance
- Binary compatibility
  - New implementations may require more nops
- Code size
  - Higher instruction cache footprint
  - Longer binary load times
  - Worse in machines that execute multiple instructions / cycle
    - Intel Itanium: 25-40% of instructions are nops
- Slower execution
  - CPI=1, but many instructions are nops
- How to handle non-deterministic latencies?
  - E.g., cache misses and floating-point?

13 Handling Data Hazards: Detect & Stall
- Detection
  - Compare regA & regB with DestReg of preceding insn(s)
  - n-bit comparators
- Stall
  - Do not advance pipeline register for Fetch/Decode
  - Pass nop to Execute
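
The detection and stall steps above can be sketched in software. This is a simplified model under stated assumptions: only the destinations held in ID/EX and EX/Mem are compared (a value in Mem/WB is taken to be written before Decode reads it), every instruction is treated as reading both regA and regB, and the function names are hypothetical.

```python
from typing import Optional


def hazard_detected(reg_a: int, reg_b: int,
                    dest_id_ex: Optional[int],
                    dest_ex_mem: Optional[int]) -> bool:
    """Four comparators: each source register against each in-flight dest."""
    in_flight = [d for d in (dest_id_ex, dest_ex_mem) if d is not None]
    return any(src == d for src in (reg_a, reg_b) for d in in_flight)


def stall_cycles(reg_a: int, reg_b: int,
                 dest_id_ex: Optional[int],
                 dest_ex_mem: Optional[int]) -> int:
    """Count cycles the Decode-stage instruction is held before advancing."""
    cycles = 0
    while hazard_detected(reg_a, reg_b, dest_id_ex, dest_ex_mem):
        cycles += 1
        # Stall: IF/ID keeps its value, a noop (dest None) enters ID/EX,
        # and the older instruction moves from ID/EX to EX/Mem.
        dest_id_ex, dest_ex_mem = None, dest_id_ex
    return cycles


# nand reads registers 3 and 4 while the add writing register 3 is in ID/EX.
print(stall_cycles(3, 4, dest_id_ex=3, dest_ex_mem=None))  # 2
```

Under these assumptions the dependent nand is held for two cycles, the same cost as the two nops used by static avoidance.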

14 [Figure: five-stage pipelined datapath. Stages: Fetch, Decode, Execute, Memory, WB. Pipeline registers: IF/ID, ID/EX, EX/Mem, Mem/WB. Components: PC, instruction memory, register file (R0-R7, read ports regA and regB), ALU, data memory.]

15 [Figure: the same pipelined datapath.]

16 [Figure: end of cycle 1. add 1 2 3 is in IF/ID.]

17 [Figure: end of cycle 2. nand is in IF/ID; add is in ID/EX.]

18 [Figure: first half of cycle 3. Hazard detection: a source register of the nand (3) matches the dest of the add (3).]

19 [Figure: hazard detected. Four comparators check regA and regB from IF/ID against the dest fields held in later pipeline registers.]

20 [Figure: hazard detected, comparator detail for regA and regB.]

21 [Figure: first half of cycle 3. Hazard on register 3; the enable (en) inputs of the PC and IF/ID hold the nand in Decode.]

22 [Figure: end of cycle 3. nand still in IF/ID; noop in ID/EX; add in EX/Mem.]

23 [Figure: first half of cycle 4. Hazard on register 3 still detected; nand in IF/ID; noop in ID/EX; add in EX/Mem.]

24 [Figure: end of cycle 4. nand in IF/ID; noop in ID/EX and EX/Mem; add in Mem/WB.]

25 [Figure: first half of cycle 5. No hazard; nand in IF/ID; noop in ID/EX and EX/Mem; add in Mem/WB.]

26 [Figure: end of cycle 5. Next add in IF/ID; nand in ID/EX; noop in EX/Mem and Mem/WB.]

27 Problems with Detect & Stall
- CPI increases on every hazard
- Are these stalls necessary? Not always!
  - The new value for R3 is in the EX/Mem register
  - Reroute the result to the nand
  - Called forwarding or bypassing

28 Handling Data Hazards: Detect & Forward
- Detection
  - Same as detect and stall, but each possible hazard requires different forwarding paths
- Forward
  - Add data paths for all possible sources
  - Add mux in front of ALU to select source
- Bypassing logic is often a critical path in wide-issue machines
  - # paths grows quadratically with machine width
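
The mux in front of the ALU can be sketched as a priority choice among possible sources. This is an illustrative model, not the lecture's datapath: the `Latch` type and `select_operand` name are hypothetical, and it assumes the EX/Mem entry already holds a usable result (a load whose data is not yet available would still need a stall, which is not modeled here).

```python
from typing import Optional, Tuple

# Hypothetical latch contents: (dest register, value), or None for a noop.
Latch = Optional[Tuple[int, int]]


def select_operand(src_reg: int, regfile_value: int,
                   ex_mem: Latch, mem_wb: Latch) -> int:
    """Mux in front of the ALU: prefer the newest in-flight value."""
    if ex_mem is not None and ex_mem[0] == src_reg:
        return ex_mem[1]   # result of the immediately preceding instruction
    if mem_wb is not None and mem_wb[0] == src_reg:
        return mem_wb[1]   # result from two instructions back
    return regfile_value   # no match: the register file value is current


# Register 3 was just computed as 42 and has not been written back yet.
print(select_operand(3, regfile_value=0, ex_mem=(3, 42), mem_wb=None))  # 42
```

EX/Mem is checked first because, when both latches target the same register, it holds the more recent write in program order.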

29 Sample Code
- Reminder: run the following code on a pipelined datapath:
    nand 3 4 5    ; reg 5 = reg 3 ~& reg 4
    add 6 3 7     ; reg 7 = reg 6 + reg 3
    lw 3 6 10     ; reg 6 = Mem[reg3+10]
    sw 6 2 10     ; Mem[reg6+10] = reg 2

30 [Figure: first half of cycle 3. nand in IF/ID; add in ID/EX. Hazard detected on register 3; forwarding (fwd) controls are set.]

31 [Figure: end of cycle 3. add in IF/ID; nand in ID/EX, marked H1; first add in EX/Mem.]

32 [Figure: first half of cycle 4. New hazard detected for the add in IF/ID; nand in ID/EX (H1); first add in EX/Mem.]

33 [Figure: end of cycle 4. lw in IF/ID; add in ID/EX (H2); nand in EX/Mem (H1); first add in Mem/WB.]

34 [Figure: first half of cycle 5. No hazard for the lw in IF/ID.]

35 [Figure: end of cycle 5. sw in IF/ID; lw in ID/EX; add in EX/Mem (H2); nand in Mem/WB (H1).]

36 [Figure: first half of cycle 6. Hazard on register 6 between the sw in IF/ID and the lw in ID/EX; the enable (en) inputs of the PC and IF/ID are used to stall.]

37 [Figure: end of cycle 6. sw still in IF/ID; noop in ID/EX; lw in EX/Mem; add in Mem/WB (H2).]

38 [Figure: first half of cycle 7. Hazard on register 6 detected for the sw; noop in ID/EX; lw in EX/Mem; add in Mem/WB (H2).]

39 [Figure: end of cycle 7. sw in ID/EX (H3); noop in EX/Mem; lw in Mem/WB.]

40 [Figure: first half of cycle 8. sw in ID/EX (H3); noop in EX/Mem; lw in Mem/WB.]

41 [Figure: end of cycle 8. sw in EX/Mem (H3); noop in Mem/WB.]

42 Control Hazards
[Figure: pipeline timing diagram over cycles t0 to t5. A beq proceeds through F, D, E, M, W; the sub fetched after it enters the pipeline and is squashed.]

43 Handling Control Hazards
- Avoidance (static)
  - No branches?
  - Convert branches to predication
    - Control dependence becomes data dependence
- Detect and Stall (dynamic)
  - Stop fetch until branch resolves
- Speculate and squash (dynamic)
  - Keep going past branch, throw away instructions if wrong

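44 Avoidance: if-conversion

Source:
    if (a == b) {
      x++;
      y = n / d;
    }

Branch version:
    sub t1 <- a, b
    jnz t1, PC+2
    add x <- x, #1
    div y <- n, d

Predicated version:
    sub t1 <- a, b
    add(t1) x <- x, #1
    div(t1) y <- n, d

Conditional-move version:
    sub t1 <- a, b
    add t2 <- x, #1
    div t3 <- n, d
    cmov(t1) x <- t2
    cmov(t1) y <- t3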
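
The equivalence between the branch form and the conditional-move form can be checked with a short sketch. The function names are hypothetical, Python conditional expressions stand in for `cmov`, and integer division is an assumption made for the example.

```python
from typing import Tuple


def with_branch(a: int, b: int, x: int, y: int, n: int, d: int) -> Tuple[int, int]:
    """Control dependence: the body runs only when a == b."""
    if a == b:
        x = x + 1
        y = n // d
    return x, y


def with_cmov(a: int, b: int, x: int, y: int, n: int, d: int) -> Tuple[int, int]:
    """Data dependence: both results are computed, then selected on t1."""
    t1 = a - b                  # sub: t1 is zero exactly when a == b
    t2 = x + 1                  # add: always executed
    t3 = n // d                 # div: always executed
    x = t2 if t1 == 0 else x    # cmov on t1
    y = t3 if t1 == 0 else y    # cmov on t1
    return x, y


print(with_branch(1, 1, 5, 0, 8, 2), with_cmov(1, 1, 5, 0, 8, 2))  # (6, 4) (6, 4)
print(with_branch(1, 2, 5, 0, 8, 2), with_cmov(1, 2, 5, 0, 8, 2))  # (5, 0) (5, 0)
```

One consequence of this form is that the division runs even when `a != b`, so `d == 0` raises in `with_cmov` where `with_branch` would not; if-converted code has to account for faults in operations that the original program would have skipped.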

45 Handling Control Hazards: Detect & Stall
- Detection
  - In decode, check if opcode is branch or jump
- Stall
  - Hold next instruction in Fetch
  - Pass noop to Decode

46 Problems with Detect & Stall
- CPI increases on every branch
- Are these stalls necessary? Not always!
  - Branch is only taken half the time
  - Assume branch is NOT taken
    - Keep fetching, treat branch as noop
    - If wrong, make sure bad instructions don't complete

47 Handling Control Hazards: Speculate & Squash
- Speculate
  - Assume branch is not taken
- Squash
  - Overwrite opcodes in Fetch, Decode, Execute with noop
  - Pass target to Fetch
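
The benefit of speculating over stalling can be estimated with a simple CPI model. This is a back-of-the-envelope sketch: it assumes a base CPI of 1, the same penalty in cycles for a stalled branch and a squashed one, and hypothetical workload numbers (20% branches, a 3-cycle penalty); only the "taken half the time" figure comes from the slides.

```python
def cpi_detect_and_stall(branch_frac: float, penalty: float) -> float:
    """Every branch pays the full penalty while fetch waits for it to resolve."""
    return 1.0 + branch_frac * penalty


def cpi_speculate_and_squash(branch_frac: float, taken_frac: float,
                             penalty: float) -> float:
    """Only taken branches pay: their wrongly fetched successors are squashed."""
    return 1.0 + branch_frac * taken_frac * penalty


# Hypothetical workload: 20% branches, half of them taken, 3-cycle penalty.
stall = cpi_detect_and_stall(0.20, 3)
spec = cpi_speculate_and_squash(0.20, 0.5, 3)
print(f"stall CPI = {stall:.2f}, speculate CPI = {spec:.2f}")
# stall CPI = 1.60, speculate CPI = 1.30
```

In this model the two schemes cost the same only when every branch is taken, so any fraction of not-taken branches favors speculation.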

48 [Figure: pipelined datapath for speculate and squash. Instruction memory holds beq, sub, add, nand; noop inputs are available at the IF/ID, ID/EX, and EX/Mem registers; the ALU's equal output feeds the control logic.]

49 Problems with Speculate & Squash
- Always assumes branch is not taken
- Can we do better? Yes.
  - Predict branch direction and target!
  - Why possible? Program behavior repeats.
- More on branch prediction to come...

50 Pipeline Hazard Checklist
- Memory Data Dependences
  - Output Dependence (WAW)
  - Anti Dependence (WAR)
  - True Data Dependence (RAW)
- Register Data Dependences
  - Output Dependence (WAW)
  - Anti Dependence (WAR)
  - True Data Dependence (RAW)
- Control Dependences

51 Instruction Level Parallelism

52 Limitations of Scalar Pipelines
- Upper Bound on Scalar Pipeline Throughput
  - Limited by IPC=1 (Flynn Bottleneck)
- Inefficient Unification Into Single Pipeline
  - Long latency for each instruction
- Performance Lost Due to Rigid In-order Pipeline
  - Unnecessary stalls

53 Architectures for Instruction-Level Parallelism

54 Superscalar Machine

55 What is the real problem?
- CPI of in-order pipelines degrades very sharply if the machine parallelism is increased beyond a certain point, i.e., when NxM approaches the average distance between dependent instructions
  - Forwarding is no longer effective
  - Pipeline may never be full due to frequent dependency stalls!

56 ILP: Instruction-Level Parallelism
- ILP is a measure of the amount of inter-dependencies between instructions
- Average ILP = no. of instructions / no. of cycles required

  code1: ILP = 1, i.e., must execute serially
    r1 <- r2 + 1
    r3 <- r1 / 17
    r4 <- r0 - r3

  code2: ILP = 3, i.e., can execute at the same time
    r1 <- r2 + 1
    r3 <- r9 / 17
    r4 <- r0 - r10
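
The average ILP defined on this slide can be computed from a dependence analysis. This is a minimal sketch under simplifying assumptions: every instruction takes one cycle, only true (RAW) dependences constrain issue, and there is no limit on how many instructions issue per cycle. The `average_ilp` name and the tuple encoding are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (dest register or None, source registers)
Insn = Tuple[Optional[str], Tuple[str, ...]]


def average_ilp(code: List[Insn]) -> float:
    """ILP = instructions / cycles, with unit latency and only RAW dependences."""
    if not code:
        return 0.0
    ready: Dict[str, int] = {}  # register -> cycle in which it is produced
    last_cycle = 0
    for dest, srcs in code:
        # An instruction issues one cycle after its latest producer.
        cycle = 1 + max((ready.get(s, 0) for s in srcs), default=0)
        if dest is not None:
            ready[dest] = cycle
        last_cycle = max(last_cycle, cycle)
    return len(code) / last_cycle


# Register-level view of the two snippets; immediates carry no dependences.
code1 = [("r1", ("r2",)), ("r3", ("r1",)), ("r4", ("r0", "r3"))]
code2 = [("r1", ("r2",)), ("r3", ("r9",)), ("r4", ("r0", "r10"))]
print(average_ilp(code1), average_ilp(code2))  # 1.0 3.0
```

The result depends on the window being analyzed: widening the window across several code sequences can raise the measured ILP, since independent instructions from later code can overlap with an earlier dependence chain.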

57 Purported Limits on ILP

Study                          ILP
Weiss and Smith [1984]         1.58
Sohi and Vajapeyam [1987]      1.81
Tjaden and Flynn [1970]        1.86
Tjaden and Flynn [1973]        1.96
Uht [1986]                     2.00
Smith et al. [1989]            2.00
Jouppi and Wall [1988]         2.40
Johnson [1991]                 2.50
Acosta et al. [1986]           2.79
Wedig [1982]                   3.00
Butler et al. [1991]           5.8
Melvin and Patt [1991]         6
Wall [1991]                    7
Kuck et al. [1972]             8
Riseman and Foster [1972]      51
Nicolau and Fisher [1984]      90

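58 Scope of ILP Analysis

    r1 <- r2 + 1
    r3 <- r1 / 17       ILP = 1
    r4 <- r0 - r3

    r1 <- r2 + 1
    r3 <- r9 / 17       ILP = 3
    r4 <- r0 - r20

- Taken together: ILP = 1.5 (6 instructions in 4 cycles when the two blocks run back to back)
  - = 2.0 more accurately (6 instructions in 3 cycles when the second block overlaps the first)
- Out-of-order execution exposes more ILP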

59 The Problem With In-Order Pipelines

                   1   2   3   4   5   6   7   8   9   10  11
addf f0,f1,f2      F   D   E+  E+  E+  W
mulf f2,f3,f2          F   D   d*  d*  E*  E*  E*  E*  E*  W
subf f0,f1,f4              F   p*  p*  D   E+  E+  E+  W

- What's happening in cycle 4?
  - mulf stalls due to RAW hazard
    - OK, this is a fundamental problem
  - subf stalls due to pipeline hazard
    - Why? subf can't proceed into D because mulf is there
    - That is the only reason, and it isn't a fundamental one
- Why can't subf go into D in cycle 4 and E+ in cycle 5?
