TDT4255: Friday the 21st of October

1 Review (Friday the 21st of October)
  Real-world examples of pipelining?
  How does pipelining influence instruction latency?
  How does pipelining influence instruction throughput?
  What are the three types of hazard in a processor pipeline?
  What are the five stages of the MIPS pipeline?

2 Today: Friday the 21st of October
  4.7 Data Hazards: Forwarding vs Stalling
  4.8 Control Hazards (branches)
  4.9 Exceptions

3 Datapath with control from last week

4 Data Hazards and Forwarding
  sub $2, $1, $3
  and $12, $2, $5
  or  $13, $6, $2
  add $14, $2, $2
  sw  $15, 100($2)
There are dependencies in this sequence: all of the last four instructions use register $2. Assume that $2 is 10 before the sub instruction and that $1 - $3 is -20. The and instruction should use -20 for $2, but it reads 10 from the register file. The or instruction also reads 10 from the register file. The add and sw instructions read the correct value, -20, from the register file.
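The timing argument on this slide can be checked with a few lines of Python. This is only a sketch under stated assumptions: a five-stage pipeline (IF, ID, EX, MEM, WB) with one cycle per stage, no stalls, and a register file that is written in the first half of a cycle and read in the second half. The function names are hypothetical and introduced for illustration.

```python
# Sketch: which instructions read a stale register value when there is no forwarding.
# Assumptions: stages IF, ID, EX, MEM, WB take one cycle each, nothing stalls, and a
# register written in WB can be read by an instruction that is in ID in the same cycle.

def id_cycle(index):
    """Clock cycle in which instruction number `index` (1-based) is in ID."""
    return index + 1

def wb_cycle(index):
    """Clock cycle in which instruction number `index` (1-based) is in WB."""
    return index + 4

def reads_stale_value(producer_index, consumer_index):
    """True if the consumer reads the register file before the producer has written it."""
    return id_cycle(consumer_index) < wb_cycle(producer_index)

# The sequence from the slide; sub (instruction 1) produces $2.
program = ["sub", "and", "or", "add", "sw"]
stale = [name for i, name in enumerate(program[1:], start=2)
         if reads_stale_value(1, i)]
print(stale)  # ['and', 'or']
```

Only the two instructions directly behind the sub read the old value, which is exactly the pair that forwarding has to serve.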

5 CC 1: sub $2, $1, $3 in IF. $2 = 10.

6 CC 2: and $12, $2, $5 in IF; sub in ID. $2 = 10.

7 CC 3: or $13, $6, $2 in IF; and in ID; sub in EX. $2 = 10. The and instruction reads 10 for $2 and stores it in ID/EX.

8 CC 4: add $14, $2, $2 in IF; or in ID; and in EX; sub in MEM. $2 = 10. The or instruction reads 10 for $2 and stores it in ID/EX; the and instruction uses 10 in its ALU operation.

9 CC 5: sw $15, 100($2) in IF; add in ID; or in EX; and in MEM; sub in WB. $2 = 10/-20. add reads the new value -20 for $2 from the register file; or uses 10 in its ALU operation.

10 CC 6: sw in ID; add in EX; or in MEM; and in WB. $2 = -20. sw reads the new value -20 for $2 from the register file; add uses -20 in its ALU operation; and writes register $12 with a value calculated with $2 = 10.

11 When is the data needed and when is it produced?

12 CC 1: sub $2, $1, $3 in IF. $2 = 10.

13 CC 2: and $12, $2, $5 in IF; sub in ID. $2 = 10.

14 CC 3: or $13, $6, $2 in IF; and in ID; sub in EX. $2 = 10. EX/MEM.ALUOut gets the new value for $2 (-20). The and instruction reads 10 for $2 and stores it in ID/EX.

15 CC 4: add $14, $2, $2 in IF; or in ID; and in EX; sub in MEM. $2 = 10. The ALU needs the new $2 value, which is available in EX/MEM. The or instruction reads 10 for $2 and stores it in ID/EX; the and instruction can use -20 from EX/MEM.ALUOut.

16 CC 5: sw $15, 100($2) in IF; add in ID; or in EX; and in MEM; sub in WB. $2 = 10/-20. The ALU needs the new $2 value, which is now available in MEM/WB. add reads the new value -20 for $2 from the register file; or can use -20 from MEM/WB.ALUOut.

17 How can a need for forwarding be detected?

18 Detecting data hazards - types of hazard conditions
Notation: EX/MEM.RegisterRd (RegisterRd in the EX/MEM register), EX/MEM.ALUOut (register ALUOut in the EX/MEM register)
Hazard conditions:
  1a) EX/MEM.RegisterRd = ID/EX.RegisterRs
  1b) EX/MEM.RegisterRd = ID/EX.RegisterRt
  2a) MEM/WB.RegisterRd = ID/EX.RegisterRs
  2b) MEM/WB.RegisterRd = ID/EX.RegisterRt

19 CC 4 - Which hazard type? add $14, $2, $2 in IF; or $13, $6, $2 in ID; and $12, $2, $5 in EX; sub $2, $1, $3 in MEM.
  1a) EX/MEM.RegisterRd = ID/EX.RegisterRs
  1b) EX/MEM.RegisterRd = ID/EX.RegisterRt
  2a) MEM/WB.RegisterRd = ID/EX.RegisterRs
  2b) MEM/WB.RegisterRd = ID/EX.RegisterRt

20 CC 5 - Which hazard type? sw $15, 100($2) in IF; add $14, $2, $2 in ID; or $13, $6, $2 in EX; and $12, $2, $5 in MEM; sub $2, $1, $3 in WB.
  1a) EX/MEM.RegisterRd = ID/EX.RegisterRs
  1b) EX/MEM.RegisterRd = ID/EX.RegisterRt
  2a) MEM/WB.RegisterRd = ID/EX.RegisterRs
  2b) MEM/WB.RegisterRd = ID/EX.RegisterRt

21 Data hazard detection continued (R-type)
Detection is performed in the EX stage.
There is no hazard if the previous instruction will not write the result: RegWrite for the earlier instruction must be asserted.
$0 is always 0, and a write to $0 will not create dependencies.
  1a) EX/MEM.RegWrite and EX/MEM.RegisterRd != 0 and EX/MEM.RegisterRd = ID/EX.RegisterRs
  1b) EX/MEM.RegWrite and EX/MEM.RegisterRd != 0 and EX/MEM.RegisterRd = ID/EX.RegisterRt
  2a) MEM/WB.RegWrite and MEM/WB.RegisterRd != 0 and MEM/WB.RegisterRd = ID/EX.RegisterRs
  2b) MEM/WB.RegWrite and MEM/WB.RegisterRd != 0 and MEM/WB.RegisterRd = ID/EX.RegisterRt
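The four conditions can be written out as a small Python function. This is a sketch rather than the course's implementation: the class and field names are hypothetical, and the register numbers in the usage lines assume the sequence reads sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2.

```python
from dataclasses import dataclass

@dataclass
class PipelineRegs:
    """Hypothetical snapshot of the pipeline-register fields used for detection."""
    ex_mem_reg_write: bool
    ex_mem_rd: int
    mem_wb_reg_write: bool
    mem_wb_rd: int
    id_ex_rs: int
    id_ex_rt: int

def detect_hazards(p):
    """Return the set of hazard labels ('1a', '1b', '2a', '2b') that hold."""
    found = set()
    # Conditions 1a/1b: the instruction now in MEM writes a register read in EX.
    if p.ex_mem_reg_write and p.ex_mem_rd != 0:
        if p.ex_mem_rd == p.id_ex_rs:
            found.add("1a")
        if p.ex_mem_rd == p.id_ex_rt:
            found.add("1b")
    # Conditions 2a/2b: the instruction now in WB writes a register read in EX.
    if p.mem_wb_reg_write and p.mem_wb_rd != 0:
        if p.mem_wb_rd == p.id_ex_rs:
            found.add("2a")
        if p.mem_wb_rd == p.id_ex_rt:
            found.add("2b")
    return found

# CC 4: and $12, $2, $5 is in EX (rs=2, rt=5), sub is in MEM (rd=2), WB is empty.
cc4 = detect_hazards(PipelineRegs(True, 2, False, 0, 2, 5))
# CC 5: or $13, $6, $2 is in EX (rs=6, rt=2), and is in MEM (rd=12), sub is in WB (rd=2).
cc5 = detect_hazards(PipelineRegs(True, 12, True, 2, 6, 2))
print(cc4, cc5)  # {'1a'} {'2b'}
```

Under these assumptions the and instruction triggers condition 1a in CC 4 and the or instruction triggers condition 2b in CC 5.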

22 Sequence with forwarding
Dependence between the pipeline registers and the inputs to the ALU.
The required data exists in time for later instructions.
It is possible to supply the inputs to the ALU needed by the and instruction and the or instruction by forwarding the results found in the pipeline registers.

23 Adding forwarding logic
If the inputs to the ALU can be taken from any pipeline register, the proper data can be forwarded.
By adding multiplexers to the inputs of the ALU, the pipeline can be run at full speed in the presence of data dependencies.

24 We must not forget the immediate values

25 Control lines from the forwarding unit
  Mux control     Source   Explanation
  ForwardA = 00   ID/EX    ALU operand A comes from the register file
  ForwardA = 10   EX/MEM   ALU operand A comes from the previous cycle's ALU result
  ForwardA = 01   MEM/WB   ALU operand A comes from the previous cycle's memory read or an earlier ALU result
  ForwardB = 00   ID/EX    ALU operand B comes from the register file
  ForwardB = 10   EX/MEM   ALU operand B comes from the previous cycle's ALU result
  ForwardB = 01   MEM/WB   ALU operand B comes from the previous cycle's memory read or an earlier ALU result
  1a) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 10
  1b) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB <= 10
  2a) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 01
  2b) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB <= 01

26 CC 1: add $1, $1, $2 in IF. Sequence: add $1, $1, $2; add $1, $1, $3; add $1, $1, $4; add $1, $1, $5.

27 CC 2: add $1, $1, $3 in IF; add $1, $1, $2 in ID.

28 CC 3: add $1, $1, $4 in IF; add $1, $1, $3 in ID; add $1, $1, $2 in EX. add $1, $1, $3 reads the old value from the register file.

29 CC 4: add $1, $1, $5 in IF; add $1, $1, $4 in ID; add $1, $1, $3 in EX; add $1, $1, $2 in MEM. add $1, $1, $4 reads the old value from the register file. add $1, $1, $3 gets the forwarded value from EX/MEM:
  1a) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 10

30 CC 5: add $1, $1, $6 in IF; add $1, $1, $5 in ID; add $1, $1, $4 in EX; add $1, $1, $3 in MEM; add $1, $1, $2 in WB. Both conditions hold for the add in EX:
  1a) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 10
  2a) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 01

31 Control lines from the forwarding unit
  Mux control     Source   Explanation
  ForwardA = 00   ID/EX    ALU operand A comes from the register file
  ForwardA = 10   EX/MEM   ALU operand A comes from the previous cycle's ALU result
  ForwardA = 01   MEM/WB   ALU operand A comes from the previous cycle's memory read or an earlier ALU result
  ForwardB = 00   ID/EX    ALU operand B comes from the register file
  ForwardB = 10   EX/MEM   ALU operand B comes from the previous cycle's ALU result
  ForwardB = 01   MEM/WB   ALU operand B comes from the previous cycle's memory read or an earlier ALU result
  1a) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 10
  1b) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB <= 10
  2a) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (EX/MEM.RegisterRd != ID/EX.RegisterRs) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 01
  2b) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (EX/MEM.RegisterRd != ID/EX.RegisterRt) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB <= 01
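The refined conditions give the most recent result priority when both EX/MEM and MEM/WB hold a value for the same register. A minimal Python sketch of the forwarding unit follows. The function and parameter names are hypothetical, and the mux encodings 00, 10 and 01 are assumed to be the textbook ones.

```python
def forwarding_unit(ex_mem_reg_write, ex_mem_rd,
                    mem_wb_reg_write, mem_wb_rd,
                    id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB) as two-character mux-control strings."""

    def select(src):
        # Conditions 1a/1b: take the value from EX/MEM (most recent result).
        if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src:
            return "10"
        # Conditions 2a/2b as refined on the slide: take the value from MEM/WB
        # only when EX/MEM does not name the same register.
        if (mem_wb_reg_write and mem_wb_rd != 0
                and ex_mem_rd != src and mem_wb_rd == src):
            return "01"
        # No hazard: the operand comes from the register file via ID/EX.
        return "00"

    return select(id_ex_rs), select(id_ex_rt)

# An add with rs=1, rt=4 in EX while both older instructions wrote register 1:
# operand A must come from EX/MEM, not from MEM/WB.
print(forwarding_unit(True, 1, True, 1, 1, 4))   # ('10', '00')
# An or with rs=6, rt=2 in EX while the instruction in WB wrote register 2.
print(forwarding_unit(True, 12, True, 2, 6, 2))  # ('00', '01')
```

Checking the EX/MEM case first is a design choice that makes the priority explicit; the extra inequality in the second test is kept so the function matches the slide's condition term by term.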

32 Datapath modified to resolve hazards
Forwarding unit in the EX stage (with MUXes).
Operand register numbers are passed on from the ID stage via the ID/EX pipeline register.
Some details are left out, like the sign-extension unit.
What about store instructions following R-type instructions? An add that writes $2 may be followed by a store that uses $2 as the data to store (sw $2, ($3)), or an add may write a register that the store uses in sw $5, ($2).
Or a store following a load: lw $2, ($3) followed by sw $2, ($4).

33 CC 1 (figure PAT6F2.eps: pipelined datapath with the stages instruction fetch, instruction decode, execution, memory and write back, separated by the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB): the add that writes $2 is in IF; sw $2, ($3) follows.

34 CC 2: sw $2, ($3) in IF; add in ID.

35 CC 3: sw $2, ($3) in ID; add in EX.

36 CC 4: sw $2, ($3) in EX; add in MEM.
  1b) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB <= 10
The data read from port B is replaced by the value for rd held in EX/MEM.

37 CC 1 (same datapath figure): the add that writes $2 is in IF; sw $5, ($2) follows.

38 CC 2: sw $5, ($2) in IF; add in ID.

39 CC 3: sw $5, ($2) in ID; add in EX.

40 CC 4: sw $5, ($2) in EX; add in MEM.
  1a) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA <= 10
The data read from port A is replaced by the value for rd held in EX/MEM.

41 CC 1 (same datapath figure): lw $2, ($3) in IF; sw $2, ($4) follows.

42 CC 3: sw $2, ($4) in ID; lw $2, ($3) in EX.

43 CC 4: sw $2, ($4) in EX; lw $2, ($3) in MEM.

44 CC 4: sw $2, ($4) in EX; lw $2, ($3) in MEM.
  1b) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB <= 10
sw is forwarded the wrong value for $2: EX/MEM holds the address computed by the lw, not the data that the lw is still reading from memory.

45 CC 5: sw $2, ($4) in MEM; lw $2, ($3) in WB. ???

46 Data hazards and stalls (6.5)
  lw  $2, 20($1)
  and $4, $2, $5
  or  $8, $2, $6
  add $9, $4, $2
  slt $1, $6, $7
Forwarding cannot avoid stalling the pipeline when an instruction tries to read a register following a load instruction that writes the same register. The data is still being read from memory in clock cycle 4 while the ALU is performing the operation for the following instruction. Something must stall the pipeline for the combination of a load followed by an instruction that reads its result. Hazard detection is needed.

47 CC 1: lw $2, 20($1) in IF.

48 CC 2: and $4, $2, $5 in IF; lw in ID.

49 CC 3: or $8, $2, $6 in IF; and in ID; lw in EX. The and instruction reads the old value for $2 and stores it in ID/EX.

50 CC 4: add $9, $4, $2 in IF; or in ID; and in EX; lw in MEM. The or instruction reads the old value for $2 and stores it in ID/EX. The and instruction needs the new $2 value, but it is not available.

51 CC 5: slt $1, $6, $7 in IF; add in ID; or in EX; and in MEM; lw in WB. The add instruction reads the value for $2 and stores it in ID/EX. The or instruction can get ALU operand A from the MEM/WB register.

52 Data hazards and stalls - hazard detection
The ID stage must test whether the previous instruction is a load. Then it must be decided whether the source registers match the destination register of the load:
  if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))) stall the pipeline
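The stall condition translates directly into a boolean function. The sketch below uses hypothetical parameter names, and the register numbers in the usage lines assume the load is lw $2, 20($1) followed by and $4, $2, $5.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use hazard check performed in the ID stage.

    id_ex_mem_read: True when the instruction now in EX is a load.
    id_ex_rt:       destination register of that load.
    if_id_rs/rt:    source registers of the instruction now in ID.
    """
    return bool(id_ex_mem_read and
                (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))

# lw $2, 20($1) is in EX (rt=2) while and $4, $2, $5 is in ID (rs=2, rt=5).
print(must_stall(True, 2, 2, 5))   # True
# The instruction in EX is not a load, so forwarding alone is enough.
print(must_stall(False, 4, 2, 6))  # False
```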

53 Stall insertion (figure PAT6F35.eps: time in clock cycles, CC 1 to CC 10, against program execution order in instructions: lw $2, 20($1); and becomes nop; and $4, $2, $5; or $8, $2, $6; add $9, $4, $2, with a bubble inserted)
CC 2: and is fetched and lw is decoded.
CC 3: and is decoded, or is fetched and lw is executed.
CC 4: and is decoded, or is fetched, a nop is executed and lw is in the MEM stage.
The nop can be achieved by setting harmless control signals.

54 Stall / No operation
Both the instructions in the ID and IF stages must be stalled so that the fetched instructions are not lost. Preventing these two instructions from making progress is accomplished simply by preventing the PC register and the IF/ID pipeline register from changing.
The back half of the pipeline, starting with the EX stage, is executing instructions that have no effect: nops, which act like bubbles.
Deasserting all 9 control signals in the EX, MEM and WB stages will create a do-nothing or nop instruction. No registers or memories are written.
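The effect of holding the PC and IF/ID for one cycle can be seen by computing when each instruction reaches EX. The following is a rough Python sketch, not a full pipeline model: it assumes full forwarding, a single bubble per load-use pair, and a hypothetical tuple format (name, is_load, destination register, source registers). It ignores the special case of register $0.

```python
def ex_cycles(program):
    """Return (name, cycle) pairs giving the clock cycle in which each instruction is in EX."""
    schedule = []
    next_ex = 3   # first instruction: IF in CC 1, ID in CC 2, EX in CC 3
    prev = None
    for name, is_load, dest, sources in program:
        # Load-use hazard: the previous instruction is a load whose destination
        # is read here, so the PC and IF/ID are held and a nop goes through EX.
        if prev is not None and prev[1] and prev[2] in sources:
            next_ex += 1
        schedule.append((name, next_ex))
        next_ex += 1
        prev = (name, is_load, dest, sources)
    return schedule

# Assumed reading of the slide's sequence: lw $2, 20($1); and $4, $2, $5; ...
program = [
    ("lw",  True,  2, (1,)),
    ("and", False, 4, (2, 5)),
    ("or",  False, 8, (2, 6)),
    ("add", False, 9, (4, 2)),
]
print(ex_cycles(program))
# [('lw', 3), ('and', 5), ('or', 6), ('add', 7)]
```

The and instruction executes in CC 5 instead of CC 4, and every later instruction slips by the same single cycle.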

55 Pipeline with control, forwarding and hazard detection

56 The big picture (page 374)
Although the hardware may or may not rely on the compiler to resolve hazard dependences to ensure correct execution, the compiler must understand the pipeline to achieve the best performance. Otherwise, unexpected stalls will reduce the performance of the compiled code.

57 Branch hazards / control hazards (6.6)
The decision whether to branch or not is taken in the MEM stage.
Without intervention, the three sequential instructions following the branch will be fetched and begin execution.
Control hazards are less frequent than data hazards, but a three-instruction flush is costly.

58 Assume branch not taken
Stalling until the branch is complete is too slow.
Improvement: assume the branch will not be taken and continue execution. If it is taken, the instructions that are being fetched and decoded must be discarded.
If branches are untaken half the time, and if it costs little to discard the instructions, this optimization halves the cost of control hazards.
To discard instructions: change the original control values to 0s. Also change the three instructions in the IF, ID and EX stages when the branch reaches the MEM stage.
Discarding instructions means we must be able to flush instructions in the IF, ID and EX stages of the pipeline.
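The "halves the cost" claim is simple arithmetic, sketched here in Python. The function name is hypothetical, and the model assumes that a correctly guessed (not taken) branch costs nothing while a taken branch costs a full flush.

```python
def expected_branch_penalty(taken_fraction, flush_cycles):
    """Average cycles lost per branch when every branch is assumed not taken."""
    return taken_fraction * flush_cycles

# Always stalling costs 3 cycles per branch when the decision is made in MEM.
always_stall = 3
# Assuming not taken, with half of the branches actually taken:
assume_not_taken = expected_branch_penalty(0.5, 3)
print(always_stall, assume_not_taken)  # 3 1.5
```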

59 Reducing the delay of branches
If branch execution is moved earlier in the pipeline, fewer instructions need to be flushed. (So far we have assumed that the next PC for a branch is selected in the MEM stage.)
Many branches rely on simple tests that do not require a full ALU operation.
Moving the branch decision up requires two actions to occur earlier: computing the branch target address and evaluating the branch decision.

60 Reducing the delay of branches - early branch detection
1. We already have the PC and the immediate field in the IF/ID pipeline register, so we just move the branch adder from the EX stage to the ID stage.
2. BEQ: we would compare the two registers (simple logic) read during the ID stage.
Moving the branch test to the ID stage implies additional forwarding and hazard detection hardware, since a branch dependent on a result still in the pipeline must still work properly with this optimization.

61 Example
  sub $10, $4, $8
  40 beq $1, $3, 7   # PC-relative branch to 72
  and $12, $2, $5
  48 or $13, $2, $6
  52 beq $4, $4, $2
  56 and $5, $6, $7
  72 lw $4, 50($7)
Assumes that the pipeline is optimized for branches not taken and that branch execution is moved to the ID stage.
The ID stage of CC 3 determines that a branch must be taken, so 72 is selected as the next PC address and the instruction fetched for the next CC is zeroed.
CC 4 shows the instruction at location 72 being fetched and the single bubble or nop instruction as a result of the taken branch.
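The target address 72 follows from the usual MIPS PC-relative rule: the address of the instruction after the branch plus the word offset shifted left by two. The short Python check below assumes the beq sits at address 40 with an offset field of 7; the function name is hypothetical.

```python
def branch_target(branch_address, word_offset):
    """PC-relative target: (address of branch + 4) + (offset in words * 4)."""
    return branch_address + 4 + (word_offset << 2)

print(branch_target(40, 7))  # 72
```

This is the computation done by the branch adder once it is moved into the ID stage, using the PC and immediate field held in IF/ID.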

62 Dynamic branch prediction
In a deeper pipeline (than the 5-stage MIPS) a simple static prediction scheme will probably waste too much performance.
Dynamic branch prediction uses runtime information to decide where to begin fetching new instructions.
A branch prediction buffer or branch history table is a small memory indexed by the lower portion of the address of the branch instruction. The memory contains one bit indicating whether the branch was recently taken or not.
A problem is that we don't know if the prediction is the right one, and it may have been put there by another branch with the same low-order bits.
Another shortcoming: even if a branch is almost always taken, we will predict incorrectly twice, rather than once, when it is not taken.
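Both shortcomings show up in a small simulation. The Python sketch below models a 1-bit branch history table; the class name, the table size of 16 entries and the branch addresses are assumptions made for illustration.

```python
class OneBitPredictor:
    """Branch history table with one bit per entry: was the branch last taken?"""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [False] * entries

    def _index(self, pc):
        # Index with the low-order bits of the word address of the branch.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, taken):
        self.table[self._index(pc)] = taken

def count_mispredictions(predictor, pc, outcomes):
    """Run one branch through the predictor and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses

# A loop branch that is taken nine times and then falls through, executed twice.
loop = [True] * 9 + [False]
predictor = OneBitPredictor()
first_pass = count_mispredictions(predictor, 40, loop)
second_pass = count_mispredictions(predictor, 40, loop)
print(first_pass, second_pass)  # 2 2

# Aliasing: a branch 16 words further on shares the same table entry.
print(predictor._index(40) == predictor._index(40 + 16 * 4))  # True
```

The branch is taken 90% of the time, yet each pass through the loop costs two mispredictions: one on the loop exit and one on re-entry.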

63 2-bit prediction scheme
By using 2 bits rather than 1, a branch that strongly favors taken or not taken will be mispredicted only once.
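One common way to realize a 2-bit scheme is a saturating counter, sketched below in Python. This is an assumed variant for illustration and not necessarily the exact state machine drawn on the slide; the function name and the starting state are hypothetical.

```python
def mispredictions_2bit(outcomes, counter=3):
    """Count wrong predictions of a 2-bit saturating counter.

    counter ranges over 0..3; values 2 and 3 predict taken, 0 and 1 predict not taken.
    """
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

# The same loop branch as before: taken nine times, then not taken, executed twice.
loop = [True] * 9 + [False]
print(mispredictions_2bit(loop * 2))  # 2
```

With the counter starting in the strongly-taken state, only the loop exit is mispredicted, so two passes cost two mispredictions instead of the three or four a single bit would give.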

64 Other branch handling strategies
Delayed branch: always execute the following instruction. Compilers and assemblers try to fill in the following instruction with one without dependencies. Loses effect on long pipelines and multiple-issue pipelines.
Branch target buffer: store the expected jump address in a buffer.
Global dynamic prediction: use the global branch behavior to determine the prediction. Effective if combined with local branch prediction.

65 The final datapath and control for chapter 4

66 Exceptions (4.9)
add $1, $2, $1: suppose overflow. We must:
Transfer control to the exception routine immediately after this instruction.
Flush the instructions following the add from the pipeline and begin fetching instructions from the new address.
Same mechanisms as for taken branches, but with the exception deasserting the control lines.

67 Datapath with controls to handle exceptions (fig. page 387)
ID.Flush is ORed with the stall signal from the hazard detection unit.
To flush instructions in the EX stage: EX.Flush, causing MUXes to zero the control lines.
An additional input to the PC is added to be able to fetch instructions from 8000 0180 hex, which is the exception location for overflow.

68 Causes of exceptions (page 385):
1) I/O device request
2) Hardware malfunction
3) Invoking an operating system service from a user program
4) Using an undefined instruction
5) Overflow
1) and 2) are not associated with a specific instruction, so the implementation has some flexibility as to when to interrupt the pipeline, using the mechanism used for other exceptions.
In case of simultaneous multiple exceptions the normal solution is to prioritize the exceptions.

69 Example
  40 hex  sub $11, $2, $4
  44 hex  and $12, $2, $5
  48 hex  or  $13, $2, $6
  4C hex  add $1, $2, $1
  50 hex  slt $15, $6, $7
  54 hex  lw  $16, 50($7)
Instructions to be invoked:
  4000 0040 hex  sw $25, 1000($0)
  4000 0044 hex  sw $26, 1004($0)
Overflow for add in the EX stage: 4000 0040 hex is forced into the PC. CC 7 shows that add and the following instructions are flushed and the first instruction of the exception code is fetched. The address of the instruction following add is saved: 4C hex + 4 = 50 hex. and and or complete.
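The bookkeeping on this slide can be summarized in a few lines of Python. This is a sketch only: the function name, the dictionary layout and the handler address passed in the usage line are assumptions for illustration, and the saved address follows the slide's convention of storing the address of the instruction after the offending one.

```python
def take_overflow_exception(offending_pc, handler_address):
    """Return the pipeline actions for an overflow detected in the EX stage."""
    return {
        # Address of the instruction following the offending one, as on the slide.
        "EPC": offending_pc + 4,
        # The handler address is forced into the PC.
        "PC": handler_address,
        # The offending instruction and the two younger ones are turned into bubbles.
        "flush": ["IF", "ID", "EX"],
    }

# add at 4C hex overflows; the handler address here is an assumed value.
state = take_overflow_exception(0x4C, 0x40000040)
print(hex(state["EPC"]), hex(state["PC"]), state["flush"])
# 0x50 0x40000040 ['IF', 'ID', 'EX']
```

Instructions older than the add (in MEM and WB) are not listed under flush, which matches the remark that and and or complete.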

70 Figure (PAT6F43.eps): datapath with IF.Flush, ID.Flush and EX.Flush, the hazard detection unit, the forwarding unit, and the Cause and EPC registers. Clock 6: lw $16, 50($7) in IF; slt $15, $6, $7 in ID; add $1, $2, $1 in EX; or $13, ... in MEM; and $12, ... in WB. Clock 7: sw $25, 1000($0) in IF; bubbles (nops) in ID, EX and MEM; or $13, ... in WB.

71 Same figure as slide 70.

72 HW/SW interface (1/2)
HW and OS work in conjunction so that exceptions behave as expected.
The HW contract is normally to stop the offending instruction in midstream, let all prior instructions complete, flush all following instructions, set a register to show the cause of the exception, save the address of the offending instruction, and jump to the prearranged address.
(Figure: the computer's components: processor with control and datapath, memory, input, output.)

73 HW/SW interface (2/2)
The OS contract is to look at the cause of the exception and act appropriately:
Undefined instruction, HW failure, overflow: kills the program and returns an indicator of the reason.
I/O request or OS service call: saves the state of the program, performs the desired task, restores the program to continue execution.
One of the most important and frequent uses of exceptions is handling page faults and TLB exceptions (chapter 7).

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts CS359: Compter Architectre Chapter 3 & Appendi C Pipelining Part A: Basic and Intermediate Concepts Yanyan Shen Department of Compter Science and Engineering Shanghai Jiao Tong University 1 Otline Introdction

More information

PS Midterm 2. Pipelining

PS Midterm 2. Pipelining PS idterm 2 Pipelining Seqential Landry 6 P 7 8 9 idnight Time T a s k O r d e r A B C D 3 4 2 3 4 2 3 4 2 3 4 2 Seqential landry takes 6 hors for 4 loads If they learned pipelining, how long wold landry

More information

1048: Computer Organization

1048: Computer Organization 8: Compter Organization Lectre 6 Pipelining Lectre6 - pipelining (cwli@twins.ee.nct.ed.tw) 6- Otline An overview of pipelining A pipelined path Pipelined control Data hazards and forwarding Data hazards

More information

Review: Computer Organization

Review: Computer Organization Review: Compter Organization Pipelining Chans Y Landry Eample Landry Eample Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 3 mintes A B C D Dryer takes 3 mintes

More information

Comp 303 Computer Architecture A Pipelined Datapath Control. Lecture 13

Comp 303 Computer Architecture A Pipelined Datapath Control. Lecture 13 Comp 33 Compter Architectre A Pipelined path Lectre 3 Pipelined path with Signals PCSrc IF/ ID ID/ EX EX / E E / Add PC 4 Address Instrction emory RegWr ra rb rw Registers bsw [5-] [2-6] [5-] bsa bsb Sign

More information

Solutions for Chapter 6 Exercises

Solutions for Chapter 6 Exercises Soltions for Chapter 6 Eercises Soltions for Chapter 6 Eercises 6. 6.2 a. Shortening the ALU operation will not affect the speedp obtained from pipelining. It wold not affect the clock cycle. b. If the

More information

EEC 483 Computer Organization. Branch (Control) Hazards

EEC 483 Computer Organization. Branch (Control) Hazards EEC 483 Compter Organization Section 4.8 Branch Hazards Section 4.9 Exceptions Chans Y Branch (Control) Hazards While execting a previos branch, next instrction address might not yet be known. s n i o

More information

What do we have so far? Multi-Cycle Datapath

What do we have so far? Multi-Cycle Datapath What do we have so far? lti-cycle Datapath CPI: R-Type = 4, Load = 5, Store 4, Branch = 3 Only one instrction being processed in datapath How to lower CPI frther? #1 Lec # 8 Spring2 4-11-2 Pipelining pipelining

More information

Enhanced Performance with Pipelining

Enhanced Performance with Pipelining Chapter 6 Enhanced Performance with Pipelining Note: The slides being presented represent a mi. Some are created by ark Franklin, Washington University in St. Lois, Dept. of CSE. any are taken from the

More information

Pipelining. Chapter 4

Pipelining. Chapter 4 Pipelining Chapter 4 ake processor rns faster Pipelining is an implementation techniqe in which mltiple instrctions are overlapped in eection Key of making processor fast Pipelining Single cycle path we

More information

Chapter 6 Enhancing Performance with. Pipelining. Pipelining. Pipelined vs. Single-Cycle Instruction Execution: the Plan. Pipelining: Keep in Mind

Chapter 6 Enhancing Performance with. Pipelining. Pipelining. Pipelined vs. Single-Cycle Instruction Execution: the Plan. Pipelining: Keep in Mind Pipelining hink of sing machines in landry services Chapter 6 nhancing Performance with Pipelining 6 P 7 8 9 A ime ask A B C ot pipelined Assme 3 min. each task wash, dry, fold, store and that separate

More information

CS 251, Winter 2019, Assignment % of course mark

CS 251, Winter 2019, Assignment % of course mark CS 25, Winter 29, Assignment.. 3% of corse mark De Wednesday, arch 3th, 5:3P Lates accepted ntil Thrsday arch th, pm with a 5% penalty. (7 points) In the diagram below, the mlticycle compter from the corse

More information

Overview of Pipelining

Overview of Pipelining EEC 58 Compter Architectre Pipelining Department of Electrical Engineering and Compter Science Cleveland State University Fndamental Principles Overview of Pipelining Pipelined Design otivation: Increase

More information

Chapter 6: Pipelining

Chapter 6: Pipelining Chapter 6: Pipelining Otline An overview of pipelining A pipelined path Pipelined control Data hazards and forwarding Data hazards and stalls Branch hazards Eceptions Sperscalar and dynamic pipelining

More information

CS 251, Winter 2018, Assignment % of course mark

CS 251, Winter 2018, Assignment % of course mark CS 25, Winter 28, Assignment 4.. 3% of corse mark De Wednesday, arch 7th, 4:3P Lates accepted ntil Thrsday arch 8th, am with a 5% penalty. (6 points) In the diagram below, the mlticycle compter from the

More information

Exceptions and interrupts

Exceptions and interrupts Eceptions and interrpts An eception or interrpt is an nepected event that reqires the CPU to pase or stop the crrent program. Eception handling is the hardware analog of error handling in software. Classes

More information

The extra single-cycle adders

The extra single-cycle adders lticycle Datapath As an added bons, we can eliminate some of the etra hardware from the single-cycle path. We will restrict orselves to sing each fnctional nit once per cycle, jst like before. Bt since

More information

EXAMINATIONS 2003 END-YEAR COMP 203. Computer Organisation

EXAMINATIONS 2003 END-YEAR COMP 203. Computer Organisation EXAINATIONS 2003 COP203 END-YEAR Compter Organisation Time Allowed: 3 Hors (180 mintes) Instrctions: Answer all qestions. There are 180 possible marks on the eam. Calclators and foreign langage dictionaries

More information

Review. A single-cycle MIPS processor

Review. A single-cycle MIPS processor Review If three instrctions have opcodes, 7 and 5 are they all of the same type? If we were to add an instrction to IPS of the form OD $t, $t2, $t3, which performs $t = $t2 OD $t3, what wold be its opcode?

More information
