Pipelining: Hazards Ver. Jan 14, 2014

Size: px

Start display at page:

Download "Pipelining: Hazards Ver. Jan 14, 2014"

Shon Mathews
5 years ago
Views:

1 POLITECNICO DI MILANO Parallelism in wonderland: are you ready to see how deep the rabbit hole goes? Pipelining: Hazards Ver. Jan 14, 2014 Marco D. Santambrogio: Simone Campanoni:

2 2 Back to the Intro to the Pipeline

3 Sequential vs. Pipelining Execution I1 I2 IF ID EX MEM IF ID EX MEM 10 ns 10 ns I1 IF ID EX MEM Time I2 2 ns IF ID EX MEM I3 2 ns IF ID EX MEM I4 2 ns IF ID EX MEM I5 2 ns IF ID EX MEM 3

4 Implementation of MIPS pipeline M U X ID Instruction Decode EX Execution MEM Memory Access Write Back +4 Adder IF /ID ID/E X Adder EX/MEM ME M/ WR 2- bit Left Shifter PC Read Address Instruction Ins truc tion Memory [25-21] [20-16] M [15-11] U X Register Read 1 Register Read 2 Register Write Write Data RF Content register 1 Content register 2 M U X OP ALU Zero Result Read Address Write Address Write Data WR RD Read Data Data Memory M U X IF Instruction Fetch [15-0] 16 bit Sign extension 32 bit 4

Note: Optimized Pipeline! Register File used in 2 stages: Read access during ID and write access during! What happens if read and write refer to the same register in the same clock cycle?

5 Note: Optimized Pipeline! Register File used in 2 stages: Read access during ID and write access during! What happens if read and write refer to the same register in the same clock cycle?! It is necessary to insert one stall From now on,! Optimized Pipeline: the RF read occurs in the second half of clock cycle and the RF write in the first half of clock cycle this is the Pipeline! What happens if read and write refer to the same register in the same clock cycle? we are going to use! It is not necessary to insert one stall 5

6 Note: Optimized Pipeline! Register File used in 2 stages: Read access during ID and write access during! What happens if read and write refer to the same register in the same clock cycle?! It is necessary to insert one stall! Optimized Pipeline: the RF read occurs in the second half of clock cycle and the RF write in the first half of clock cycle! What happens if read and write refer to the same register in the same clock cycle?! It is not necessary to insert one stall 6

7 Today 7

8 Outline! The Problem of Pipeline Hazards! Performance Issues in Pipelining 8

9 The Problem of Hazards! A hazard is created whenever there is a dependence between instructions, and instructions are close enough that the overlap caused by pipelining would change the order of access to the operands involved in the dependence.! Hazards prevent the next instruction in the pipeline from executing during its designated clock cycle.! Hazards reduce the performance from the ideal speedup gained by pipelining. 9

10 Three Classes of Hazards! Structural Hazards: Attempt to use the same resource from different instructions simultaneously! Example: Single memory for instructions and data! Data Hazards: Attempt to use a result before it is ready! Example: Instruction depending on a result of a previous instruction still in the pipeline! Control Hazards: Attempt to make a decision on the next instruction to execute before the condition is evaluated! Example: Conditional branch execution 10

11 Structural Hazards! No structural hazards in MIPS architecture:! Instruction Memory separated from Data Memory! Register File used in the same clock cycle: Read access by an instruction and write access by another instruction I1 I2 I3 I4 I5 IM REG A L DM REG U 2 ns IM REG A L DM REG U 2 ns IM REG A L DM REG U 2 ns IM REG A L DM REG U 2 ns Time IM REG A L DM REG U 11

12 Speed Up Equation for Pipelining CPI pipelined = Ideal CPI + Average Stall cycles per Inst Ideal CPI Pipeline depth Speedup = Ideal CPI + Pipeline stall CPI Cycle Cycle Time Time unpipelined pipelined For simple RISC pipeline, CPI = 1: Pipeline depth Speedup = 1 + Pipeline stall CPI Cycle Cycle Time Time unpipelined pipelined 12

13 Data Hazards! If the instructions executed in the pipeline are dependent, data hazards can arise when instructions are too close! Example: sub $2, $1, $3 # Reg. $2 written by sub and $12, $2, $5 # 1 operand ($2) depends on sub or $13, $6, $2 # 2 operand ($2) depend on sub add $14, $2, $2 # 1 ($2) & 2 ($2) depend on sub sw $15,100($2) # Base reg. ($2) depends on sub 13

14 Data Hazards in the Optimized Pipeline: Example sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15,100($2) A IM REG DM REG L U 2 ns A IM REG DM REG L U 2 ns A IM REG DM REG L U 2 ns A IM REG DM REG L U 2 ns Time A IM REG DM REG L U Instruction order It is necessary to insert two stalls 14

15 Type of Data Hazard Read After Write (RAW) Instr J tries to read operand before Instr I writes it I: add r1,r2,r3 J: sub r4,r1,r3 Caused by a Dependence (in compiler nomenclature). This hazard results from an actual need for communication. 15

16 Data Hazards: Possible Solutions! Compilation Techniques:! Insertion of nop (no operation) instructions! Instructions Scheduling to avoid that correlating instructions are too close! The compiler tries to insert independent instructions among correlating instructions! When the compiler does not find independent instructions, it insert nops.! Hardware Techniques:! Insertion of bubbles or stalls in the pipeline! Data Forwarding or Bypassing 16

17 Just an example... sub and or add sw A IM REG DM REG L U 2 ns 2 ns Ordine di esecuzione delle istruzioni A IM REG DM REG L U A IM REG DM REG L U 2 ns A IM REG DM REG L U 2 ns Tempo A IM REG DM REG L U I need to insert 2 bubbles or 2 stalls 17

18 Insertion of nops sub $2, $1, $3 nop nop and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15,100($2) IF ID EX ME IF ID EX ME IF ID EX ME IF ID EX ME IF ID EX ME IF ID EX ME IF ID EX ME 18

19 Insertion of bubbles and stalls CK1 CK2 CK3 CK4 CK5 CK6 CK7 CK8 CK9 CK10 CK11 CK12 sub $2, $1, $3 IF ID EX MEM Time (clock cycles) and $12, $2, $5 IF bubble bubble ID EX MEM or $13, $6, $2 IF ID EX MEM add $14, $2, $2 IF ID EX MEM sw $15,100( $2 ) IF ID EX MEM Contenuto di $

20 Scheduling: Example sub $2, $1, $3 sub $2, $1, $3 and $12, $2, $5 add $4, $10, $11 or $13, $6, $2 and $7, $8, $9 add $14, $2, $2 lw $16, 100($18) sw $15,100($2) lw $17, 200($19) add $4, $10, $11 and $12, $2, $5 and $7, $8, $9 or $13, $6, $2 lw $16, 100($18) add $14, $2, $2 lw $17, 200($19) sw $15,100($2) 20

Forwarding! Data forwarding uses temporary results stored in the pipeline registers instead of waiting for the write back of results in the RF.

21 Forwarding! Data forwarding uses temporary results stored in the pipeline registers instead of waiting for the write back of results in the RF.! We need to add multiplexers at the inputs of ALU to fetch inputs from pipeline registers to avoid the insertion of stalls in the pipeline. 21

22 Forwarding: Example EX/EX path MEM/EX path sub $2, $1, $3 IF ID EX ME MEM/ID path and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15,100($2) IF ID EX ME IF ID EX ME IF ID EX ME IF ID EX ME 22

23 Implementation of MIPS with Forwarding Unit IF/ID ID/EX EX/MEM MEM/ P C Instr Memory I n s t r u c t i o n path Reg. M u x M u x M u x M u x A L U Data Memory M u x I F / I D. R e g i s t e r R s R s I F / I D. R e g i s t e r R t R t I F / I D. R e g i s t e r R t I F / I D. R e g i s t e r R d R t R d M u x E X / M E M. R e g i s t e r R d MEM/ID path F o r w a r d i n g u n i t M E M / W B. R e g i s t e r R d MEM/EX path EX/EX path 23

24 Data Hazards: Load/Use Hazard L1: lw $s0, 4($t1) # $s0 <- M [4 + $t1] L2: add $s5, $s0, $s1 # 1 operand depends from L1 " CK1 CK2 CK3 CK4 CK5 CK6 CK7 lw $s0, 4($t1) IF ID EX MEM add $s5,$s0,$s1 IF ID EX MEM 24

25 Data Hazards: Load/Use Hazard! With forwarding using the MEM/EX path: 1 stall CK1 CK2 CK3 CK4 CK5 CK6 CK7 lw $s0, 4($t1) IF ID EX MEM add $s5,$s0,$s1 IF ID EX MEM 25

26 Data Hazards: Load/Store L1: lw $s0, 4($t1) # $s0 <- M [4 + $t1] L2: sw $s0, 4($t2) # M[4 + $t2] <- $s0 " CK1 CK2 CK3 CK4 CK5 CK6 CK7 lw $s0, 4($t1) IF ID EX MEM sw $s0, 4($t2) IF ID EX MEM Contenuto di $s / Ø Without forwarding : 3 stalls 26

27 ! Forwarding: Stall = 0 Data Hazards: Load/Store! We need a forwarding path to bring the load result from the memory (in MEM/) to the memory s input for the store. CK1 CK2 CK3 CK4 CK5 CK6 CK7 lw $s0, 4($t1) IF ID EX MEM sw $s0, 4($t2) IF ID EX MEM 27

28 Implementation of MIPS with Forwarding Unit IF/ID ID/ EX/MEM MEM/ EX P C Memoria Istruzioni I n s t r u c t i o n path Reg. M u x M u x M u x M u x A L U M u x Mem. Dati M u x I F / I D. R e g i s t e r R s R s I F / I D. R e g i s t e r R t R t MEM/MEM path I F / I D. R e g i s t e r R t I F / I D. R e g i s t e r R d R t R d M u x E X / M E M. R e g i s t e r R d MEM/ID path F o r w a r d i n g u n i t M E M / W B. R e g i s t e r R d MEM/EX path EX/EX path Ø Ø Ø Ø EX/EX path MEM/EX path MEM/ID path MEM/MEM path 28

29 Forwarding to Avoid LW-SW Data Hazard Time (clock cycles) I n s t r. add r1,r2,r3 lw r4, 0(r1) Ifetch Reg Ifetch ALU Reg DMem ALU Reg DMem Reg O r d e r sw r4,12(r1) or r8,r6,r9 Ifetch Reg Ifetch ALU Reg DMem ALU Reg DMem Reg xor r10,r9,r11 Ifetch Reg ALU DMem Reg 29

30 Data Hazard Even with Forwarding Time (clock cycles) I n s t r. lw r1, 0(r2) sub r4,r1,r6 Ifetch Reg Ifetch ALU Reg DMem ALU Reg DMem Reg O r d e r and r6,r1,r7 or r8,r1,r9 Ifetch Reg Ifetch ALU Reg DMem ALU Reg DMem Reg 30

31 Software Scheduling to Avoid Load Hazards Try producing fast code for a = b + c; d = e f; assuming a, b, c, d,e, and f in memory. Slow code: LW LW ADD SW LW LW Rb,b Rc,c Ra,Rb,Rc a,ra Re,e Rf,f SUB Rd,Re,Rf SW d,rd Fast code: LW LW LW ADD LW SW SUB Rb,b Rc,c Re,e Ra,Rb,Rc Rf,f a,ra Rd,Re,Rf SW d,rd 31 Compiler optimizes for performance. Hardware checks for safety.

32 Data Hazards! Data hazards analyzed up to now are:! RAW (READ AFTER WRITE) hazards: instruction n+1 tries to read a source register before the previous instruction n has written it in the RF.! Example: add $r1, $r2, $r3 sub $r4, $r1, $r5! By using forwarding, it is always possible to solve this conflict without introducing stalls, except for the load/use hazards where it is necessary to add one stall 32

33 Data Hazards! Other types of data hazards in the pipeline:! WAW (WRITE AFTER WRITE)! WAR (WRITE AFTER READ) 33

34 Data Hazards: WAW (WRITE AFTER WRITE)! Instruction n+1 tries to write a destination operand before it has been written by the previous instruction n write operations executed in the wrong order! This type of hazards could not occur in the MIPS pipeline because all the register write operations occur in the stage 34

35 Data Hazards: WAW (WRITE AFTER WRITE)! Example: If we assume the register write in the ALU instructions occurs in the fourth stage and that load instructions require two stages (MEM1 and MEM2) to access the data memory, we can have: CK1 CK2 CK3 CK4 CK5 CK6 CK7 lw $r1, 0($r2) IF ID EX MEM 1 MEM2 add $r1,$r2,$r3 IF ID EX 35

36 Data Hazards: WAW (WRITE AFTER WRITE)! Example: If we assume the floating point ALU operations require a multi-cycle execution, we can have: CK1 CK2 CK3 CK4 CK5 CK6 CK7 CK8 mul $f6,$f2,$f2 IF ID MUL1 MUL2 MUL3 MUL4 MEM add $f6,$f2,$f2 IF ID AD1 AD2 MEM 36

37 WAW Data Hazards! Write After Write (WAW) Instr J writes operand before Instr I writes it. I: sub r1,r4,r3 J: add r1,r2,r3 K: mul r6,r1,r7! Called an output dependence by compiler writers This also results from the reuse of name r1.! Can t happen in MIPS 5 stage pipeline because:! All instructions take 5 stages, and! Writes are always in stage 5! Will see WAR and WAW in more complicated pipes 37

38 Data Hazards: WAR (WRITE AFTER READ)! Instruction n+1 tries to write a destination operand before it has been read from the previous instruction n instruction n reads the wrong value.! This type of hazards could not occur in the MIPS pipeline because the operand read operations occur in the ID stage and the write operations in the stage.! As before, if we assume the register write in the ALU instructions occurs in the fourth stage and that we need two stages to access the data memory, some instructions could read operands too late in the pipeline. 38

39 Data Hazards: WAR (WRITE AFTER READ)! Example: Instruction sw reads $r2 in the second half of MEM2 stage and instruction add writes $r2 in the first half of stage sw reads the new value of $r2. CK1 CK2 CK3 CK4 CK5 CK6 CK7 sw $r1, 0($r2) IF ID EX MEM 1 MEM2 add $r2, $r3, $r4 IF ID EX 39

40 WAR Data Hazards! Write After Read (WAR) Instr J writes operand before Instr I reads it I: sub r4,r1,r3 J: add r1,r2,r3 K: mul r6,r1,r7! Called an anti-dependence by compiler writers. This results from reuse of the name r1.! Can t happen in MIPS 5 stage pipeline because:! All instructions take 5 stages, and! Reads are always in stage 2, and! Writes are always in stage 5 40

41 Performance Issues in Pipelining! Pipelining increases the CPU instruction throughput (number of instructions completed per unit of time), but it does not reduce the execution time (latency) of a single instruction.! Pipelining usually slightly increases the latency of each instruction due to imbalance among the pipeline stages and overhead in the control of the pipeline.! Imbalance among pipeline stages reduces performance since the clock can run no faster than the time needed for the slowest pipe stage.! Pipeline overhead arises from pipeline register delay and clock skew. 41

42 Performance Issues in Pipelining! The average instruction execution time for the unpipelined processor is: Ave. Exec. Time Unpipelined = Ave. CPI Unp. x Clock Cycle Unp. Pipeline Speedup = Ave. Exec. Time Unpipelined = Ave. Exec. Time Pipelined = Ave. CPI Unp. x Clock Cycle Unp. = Ave. CPI Pipe Clock Cycle Pipe 42

43 Performance Issues in Pipelining! The ideal CPI on a pipelined processor is almost always 1, but stalls cause the pipeline performance to degrade form the ideal performance, so we have: Ave. CPI Pipe = Ideal CPI + Pipe Stall Cycles per Instruction = 1 + Pipe Stall Cycles per Instruction! Pipe Stall Cycles per Instruction are due to Structural Hazards + Data Hazards + Control Hazards 43

44 Performance Issues in Pipelining! If we ignore the cycle time overhead of pipelining and we assume the stages are perfectly balanced, the clock cycle time of two processors can be equal, so: Pipeline Speedup = Ave. CPI Unp. 1 + Pipe Stall Cycles per Instruction! Simple case: All instructions take the same number of cycles, which must also equal to the number of pipeline stages (called pipeline depth): Pipeline Speedup = Pipeline Depth 1 + Pipe Stall Cycles per Instruction! If there are no pipeline stalls (ideal case), this leads to the intuitive result that pipelining can improve performance by the depth of the pipeline. 44

45 ! CI = # of Instructios Performances! # Clock Cycles = CI + # Stall Cycles + 4! CPI = # Clock Cycles / CI = (CI + # Stall Cycles + 4) / CI! MIPS = f clock / (CPI * 10 6 ) 45

Performance Time sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15,100($2) Instruction order A IM REG DM REG L U 2 ns A IM REG DM REG L U 2 ns A IM REG DM REG L U 2 ns A IM REG DM

46 Performance Time sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15,100($2) Instruction order A IM REG DM REG L U 2 ns A IM REG DM REG L U 2 ns A IM REG DM REG L U 2 ns A IM REG DM REG L U 2 ns A IM REG DM REG L U It is necessary to insert two stalls IC = 5 # Clock Cycles = IC + # Stall Cycles + 4 = = 11 CPI = # Clock Cycles/ CI = 11 / 5 = 2.2 MIPS = f clock / (CPI * 10 6 ) = 500 MHz / 2.2 * 10 6 =

47 Asymptotic Performance! We have n iterations of a cycle defined using m instructions. We have k stalls for the m instructions! IC AS = m * n! # Clock Cycles = IC AS + (# Stall Cycles ) AS + 4! CPI AS = lim n -> ( IC AS + # Stall Cycles AS +4) /IC AS = lim n -> ( m *n + k * n + 4 ) / m * n = (m + k) / m! MIPS AS = f clock / (CPI AS * 10 6 ) 47

48 Questions

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017! Advanced Topics on Heterogeneous System Architectures Pipelining! Politecnico di Milano! Seminar Room @ DEIB! 30 November, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2 Outline!