Design a MIPS Processor (2/2)


1 93-2 Digital System Design: Design a MIPS Processor (2/2)
Lecturer: Chihhao Chao
Advisor: Prof. An-Yeu Wu
2005/5/13 Friday
ACCESS IC LABORATORY

2 Outline
- 6.1 An Overview of Pipelining
- 6.2 A Pipelined Datapath
- 6.3 Pipelined Control
- 6.4 Data Hazards and Forwarding
- 6.5 Data Hazards and Stalls
- 6.6 Branch Hazards
- 6.8 Exceptions

3 Pipelining is Natural!
- Pipelining provides a method for executing multiple instructions at the same time.
- Laundry example
  - Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
  - Washer takes 30 minutes
  - Dryer takes 40 minutes
  - Folder takes 20 minutes

4 Sequential Laundry
[Figure: timeline from 6 PM to midnight, task order A, B, C, D]
- Sequential laundry takes 6 hours for 4 loads
- If they learned pipelining, how long would laundry take?

5 Pipelined Laundry: Start Work ASAP
[Figure: timeline from 6 PM to midnight, task order A, B, C, D]
- Pipelined laundry takes 3.5 hours for 4 loads
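The 6-hour and 3.5-hour figures can be checked with a small sketch. It assumes the stage times given earlier (washer 30, dryer 40, folder 20 minutes) and that a finished load may wait in a basket until the next machine is free; the function names are illustrative, not from the slides.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Each load starts a stage as soon as both it and the machine are ready."""
    free_at = [0] * len(stage_minutes)  # time at which each machine is next free
    for _ in range(loads):
        ready = 0  # time at which this load has left the previous stage
        for stage, minutes in enumerate(stage_minutes):
            start = max(ready, free_at[stage])
            ready = start + minutes
            free_at[stage] = ready
    return free_at[-1]


LAUNDRY = [30, 40, 20]  # washer, dryer, folder (minutes)
print(sequential_minutes(LAUNDRY, 4) / 60)  # hours for 4 loads, one after another
print(pipelined_minutes(LAUNDRY, 4) / 60)   # hours for 4 loads, overlapped
```

The pipelined total is dominated by the 40-minute dryer, which is the slowest stage.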

6 Pipelining Lessons
[Figure: pipelined laundry timeline starting at 6 PM, task order A, B, C, D]
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously using different resources
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to fill the pipeline and time to drain it reduce speedup
- Stall for dependences

7 The 5 Stages of the Load Instruction
        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
Load    IFetch   Reg/Dec  Exec     Mem      Wr
- IFetch: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: Calculate the memory address
- Mem: Read the data from the Data Memory
- Wr: Write the data back to the register file

8 Pipeline Execution Time
[Figure: program flow of six overlapping instructions, each passing through IFetch, Dcd, Exec, Mem, WB]
- On a processor, multiple instructions are in various stages at the same time.
- Assume each instruction takes five cycles
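As a rough sketch of what "five cycles per instruction" means for total execution time, the functions below count cycles for an ideal pipeline with no stalls, one instruction entering per cycle. The function names and the no-hazard assumption are mine, not the slides'.

```python
def unpipelined_cycles(num_instructions, num_stages=5):
    """Every instruction finishes all its stages before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages=5):
    """The first instruction fills the pipeline; each later one adds one cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def speedup(num_instructions, num_stages=5):
    """Ideal speedup; approaches num_stages as the instruction count grows."""
    return unpipelined_cycles(num_instructions, num_stages) / pipelined_cycles(
        num_instructions, num_stages
    )


for n in (1, 5, 1000):
    print(n, pipelined_cycles(n), round(speedup(n), 2))
```

For a single instruction there is no gain at all, which matches the point that pipelining improves throughput rather than latency.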

9 Single Cycle, Multi-cycle, Pipelined
[Figure: clock timing of three implementations]
- Single Cycle Implementation (Cycle 1, Cycle 2): Load, then Store, with a "Waste" interval marked after Store
- Multiple Cycle Implementation (Cycle 1 to Cycle 10): Load (IFetch, Reg, Exec, Mem, Wr), Store (IFetch, Reg, Exec, Mem), R-type (IFetch, ...)
- Pipeline Implementation: Load (IFetch, Reg, Exec, Mem, Wr), Store (IFetch, Reg, Exec, Mem, Wr), R-type (IFetch, Reg, Exec, Mem, Wr), overlapped

10 Why Pipeline? Because the Resources Are There!
[Figure: time (clock cycles) versus instruction order for Inst 0 to Inst 4; each instruction uses the Im, Reg, ALU, Dm, and Reg resources]

11 Pipelining MIPS Execution

12 Pipeline Hazards
- Structural hazard
  - An occurrence in which a planned instruction cannot execute in the proper clock cycle because the hardware cannot support the combination of instructions that are set to execute in the given clock cycle.
- Data hazard
  - Also called pipeline data hazard. An occurrence in which a planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction is not yet available.
- Control hazard
  - Also called branch hazard. An occurrence in which the proper instruction cannot execute in the proper clock cycle because the instruction that was fetched is NOT the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.

13 Data Hazard
- Forwarding
  - Also called bypassing. A method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.
- Load-use data hazard
  - A specific form of data hazard in which the data requested by a load instruction has not yet become available when it is requested.
- Pipeline stall
  - Also called bubble. A stall initiated in order to resolve a hazard.

14 Control Hazard
- Untaken branch
  - One that falls through to the successive instruction. A taken branch is one that causes transfer to the branch target.
- Branch prediction
  - A method of resolving a branch hazard that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.

15 Pipeline Overview Summary
- Latency (pipeline)
  - The number of stages in a pipeline or the number of stages between two instructions during execution.
- Throughput (pipeline)
  - The number of instructions executed per unit time.


17 Designing a Pipelined Processor
- Examine the datapath and control diagram
  - Starting with single- or multi-cycle datapath?
  - Single- or multi-cycle control?
- Partition datapath into stages:
  - IF (instruction fetch), ID (instruction decode and register file read), EX (execution or address calculation), MEM (data memory access), WB (write back)
- Associate resources with states
- Ensure that flows do not conflict, or figure out how to resolve
- Assert control in appropriate stage

18 Use Multi-cycle Execution Steps
- But use the single-cycle datapath (separate memories; why?)

19 Split Single-cycle Datapath
- What to add to split the datapath into stages?

20 Add Pipeline Registers (Flip/Flop)
- Use registers between stages to carry data and control

21 Consider load
        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
lw      Ifetch   Reg/Dec  Exec     Mem      Wr
- IF: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- ID: Instruction Decode
  - Registers fetch and instruction decode
- EX: Calculate the memory address
- MEM: Read the data from the Data Memory
- WB: Write the data back to the register file

22 Pipelining load
Cycle   1       2       3       4       5       6       7
1st lw  Ifetch  Reg/Dec Exec    Mem     Wr
2nd lw          Ifetch  Reg/Dec Exec    Mem     Wr
3rd lw                  Ifetch  Reg/Dec Exec    Mem     Wr
- 5 functional units in the pipeline datapath are:
  - Instruction Memory for the IFetch stage
  - Register File's read ports (busA and busB) for the Reg/Dec stage
  - ALU for the Exec stage
  - Data Memory for the MEM stage
  - Register File's write port (busW) for the WB stage

23 IF Stage of load word
- IF/ID = mem[PC]; PC = PC + 4

24 ID Stage of load word
- ID/EX(A) = Reg[IR[25-21]]; ID/EX(B) = Reg[IR[20-16]]; ID/EX = sign-extension of IR[15-0]

25 EX Stage of load word
- EX/MEM = A + sign-ext(IR[15-0])   % address computation

26 MEM Stage of load word
- MEM/WB = mem[ALUout]

27 WB Stage of load
- Reg[IR[20-16]] = MEM/WB
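The five register transfers for load word can be traced in software. This is a minimal sketch, assuming dictionaries stand in for the pipeline registers and memories; it also assumes the destination register number is carried forward through the pipeline registers (a detail the transfers above leave implicit), and all field names are illustrative.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_lw(pc, regs, imem, dmem):
    """Push one lw through IF, ID, EX, MEM, WB and return the updated PC."""
    # IF: IF/ID = mem[PC]; PC = PC + 4
    if_id = {"IR": imem[pc]}
    pc += 4
    # ID: read both registers and sign-extend the immediate
    ir = if_id["IR"]
    rs, rt = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F
    id_ex = {"A": regs[rs], "B": regs[rt], "imm": sign_extend16(ir), "rt": rt}
    # EX: address computation
    ex_mem = {"ALUout": id_ex["A"] + id_ex["imm"], "rt": id_ex["rt"]}
    # MEM: read the data memory
    mem_wb = {"data": dmem[ex_mem["ALUout"]], "rt": ex_mem["rt"]}
    # WB: Reg[IR[20-16]] = MEM/WB
    regs[mem_wb["rt"]] = mem_wb["data"]
    return pc


# lw $10, 20($1): opcode 0x23, rs = 1, rt = 10, immediate = 20
LW = (0x23 << 26) | (1 << 21) | (10 << 16) | 20
regs = [0] * 32
regs[1] = 100
new_pc = run_lw(0, regs, {0: LW}, {120: 0xABCD})
print(new_pc, hex(regs[10]))
```

In real hardware the five steps belong to five different clock cycles; the sketch runs them back to back only to show what each pipeline register holds.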

28 Pipelined Datapath

29 The Four Stages of R-type
        Cycle 1  Cycle 2  Cycle 3  Cycle 4
R-type  Ifetch   Reg/Dec  Exec     Wr
- IF: fetch the instruction from the Instruction Memory
- ID: registers fetch and instruction decode
- EX:
  - ALU operates on the two register operands
  - Update PC
- WB: write ALU output back to the register file

30 Pipelining R-type and load
Cycle   1       2       3       4       5       6       7       8       9
R-type  Ifetch  Reg/Dec Exec    Wr
R-type          Ifetch  Reg/Dec Exec    Wr
Load                    Ifetch  Reg/Dec Exec    Mem     Wr
R-type                          Ifetch  Reg/Dec Exec    Wr
R-type                                  Ifetch  Reg/Dec Exec    Wr
We have a problem!
- We have a structural hazard:
  - Two instructions try to write to the register file at the same time!
  - Only one write port

31 Important Observation
- Each functional unit can only be used once per instruction
- Each functional unit must be used at the same stage for all instructions:
  - Load uses Register File's write port during its 5th stage
    Load    Ifetch  Reg/Dec Exec    Mem     Wr
  - R-type uses Register File's write port during its 4th stage
    R-type  Ifetch  Reg/Dec Exec    Wr
Several ways to solve: 1) forwarding, 2) adding pipeline bubble, 3) making instructions same length

32 Solution 1: Insert Bubble
[Figure: pipeline diagram over Cycles 1 to 9 for a mix of R-type and Load instructions, with a pipeline bubble inserted so that no two instructions write the register file in the same cycle]
- Insert a bubble into the pipeline to prevent two writes in the same cycle
  - The control logic can be complex
  - Lose instruction fetch and issue opportunity
- No instruction is started in Cycle 6

33 Solution 2: Delay R-type's Write
- Delay R-type's register write by one cycle:
  - R-type also uses Reg File's write port at Stage 5
  - MEM is a NOP stage: nothing is being done.
    R-type  Ifetch  Reg/Dec Exec    Mem     Wr
Cycle   1       2       3       4       5       6       7       8       9
R-type  Ifetch  Reg/Dec Exec    Mem     Wr
R-type          Ifetch  Reg/Dec Exec    Mem     Wr
Load                    Ifetch  Reg/Dec Exec    Mem     Wr
R-type                          Ifetch  Reg/Dec Exec    Mem     Wr
R-type                                  Ifetch  Reg/Dec Exec    Mem     Wr
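The write-port conflict and its fix can be checked with a short sketch. It assumes one instruction issues per cycle starting in Cycle 1 and that each instruction writes the register file in its last stage; the names below are illustrative only.

```python
from collections import defaultdict


def write_port_conflicts(sequence, stages_per_kind):
    """Return {cycle: [instruction positions]} for cycles with more than one write."""
    writers = defaultdict(list)
    for position, kind in enumerate(sequence):
        issue_cycle = position + 1                       # one issue per cycle
        write_cycle = issue_cycle + stages_per_kind[kind] - 1
        writers[write_cycle].append(position)
    return {cycle: who for cycle, who in writers.items() if len(who) > 1}


SEQUENCE = ["R-type", "R-type", "Load", "R-type", "R-type"]

# R-type writes in its 4th stage, Load in its 5th: the Load and the R-type
# issued right after it both need the write port in Cycle 7.
print(write_port_conflicts(SEQUENCE, {"R-type": 4, "Load": 5}))

# With R-type padded to 5 stages (MEM as a NOP), every write lands in a
# different cycle.
print(write_port_conflicts(SEQUENCE, {"R-type": 5, "Load": 5}))
```

Padding costs one extra cycle of latency per R-type instruction but keeps the control logic simple, since every instruction then uses each functional unit in the same stage.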

34 The Four Stages of store
        Cycle 1  Cycle 2  Cycle 3  Cycle 4
Store   Ifetch   Reg/Dec  Exec     Mem      Wr
- IF: fetch the instruction from the Instruction Memory
- ID: registers fetch and instruction decode
- EX: calculate the memory address
- MEM: write the data into the Data Memory
Add an extra stage:
- WB: NOP

35 The Four Stages of beq
        Cycle 1  Cycle 2  Cycle 3  Cycle 4
Beq     Ifetch   Reg/Dec  Exec     Mem      Wr
- IF: fetch the instruction from the Instruction Memory
- ID: registers fetch and instruction decode
- EX:
  - Compare the two register operands
  - Select the correct branch target address
  - Latch into PC
- Add two extra stages:
  - MEM: NOP
  - WB: NOP

36 Use Graphical Representation for Pipelined MIPS
- The graph can help to answer questions like:
  - How many cycles to execute this code?
  - What is the ALU doing during cycle 4?
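Both questions can also be answered by a few lines of code that build the same diagram as text. This sketch assumes an ideal five-stage pipeline with no stalls and borrows the five-instruction sample program that appears later in the deck; the helper names are made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def stage_in_cycle(index, cycle):
    """Stage occupied in `cycle` by the instruction at 0-based position `index`."""
    offset = cycle - (index + 1)  # instruction `index` is fetched in cycle index + 1
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def total_cycles(program):
    """Cycles until the last instruction leaves WB."""
    return len(program) + len(STAGES) - 1 if program else 0


def who_is_in(program, stage, cycle):
    """Instruction occupying `stage` during `cycle`, or None if the stage is idle."""
    for index, instr in enumerate(program):
        if stage_in_cycle(index, cycle) == stage:
            return instr
    return None


def diagram(program):
    """One text row per instruction, one column per clock cycle."""
    rows = []
    for index, instr in enumerate(program):
        cells = [
            (stage_in_cycle(index, c) or "").ljust(4)
            for c in range(1, total_cycles(program) + 1)
        ]
        rows.append(instr.ljust(18) + " ".join(cells).rstrip())
    return "\n".join(rows)


print(diagram(PROGRAM))
print(total_cycles(PROGRAM))            # how many cycles to execute this code
print(who_is_in(PROGRAM, "EX", 4))      # what the ALU is doing during cycle 4
```

Under these assumptions the program needs 9 cycles, and in cycle 4 the ALU is executing the sub instruction.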

37 Example 1: Cycle 1

38 Example 1: Cycle 2

39 Example 1: Cycle 3

40 Example 1: Cycle 4

41 Example 1: Cycle 5

42 Example 1: Cycle 6


44 Pipeline Control: Control Signals

45 Group Signals According to Stages
- Can use control signals of single-cycle CPU

46 Data Stationary Control
- Pass control signals along just like the data
- Main control generates control signals during ID

47 Data Stationary Control (cont.)
- Signals for EX (ExtOp, ALUSrc, ...) are used 1 cycle later
- Signals for MEM (MemWr, Branch) are used 2 cycles later
- Signals for WB (MemtoReg, RegWr) are used 3 cycles later
[Figure: stages Reg/Dec, Exec, Mem, Wr. Main Control, fed by the IF/ID register, generates ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr. The ID/Ex register carries all of them; the Ex/Mem register carries MemWr, Branch, MemtoReg, RegWr; the Mem/Wr register carries MemtoReg, RegWr]
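A small sketch can make the "1, 2, 3 cycles later" timing concrete. It models each pipeline register as a dictionary of control groups keyed by the stage that consumes them; the signal values shown for lw are illustrative assumptions, and the `clock` helper is hypothetical.

```python
# Control word decoded for lw during ID, grouped by consuming stage.
# The individual values are illustrative assumptions.
LW_CONTROL = {
    "EX": {"ExtOp": 1, "ALUSrc": 1, "ALUOp": "add", "RegDst": 0},
    "MEM": {"MemWr": 0, "Branch": 0},
    "WB": {"MemtoReg": 1, "RegWr": 1},
}

EMPTY = {"ID/EX": {}, "EX/MEM": {}, "MEM/WB": {}}


def clock(regs, decoded_control):
    """Advance one clock edge: every register passes on only the groups still needed."""
    return {
        "ID/EX": dict(decoded_control),
        "EX/MEM": {k: v for k, v in regs["ID/EX"].items() if k in ("MEM", "WB")},
        "MEM/WB": {k: v for k, v in regs["EX/MEM"].items() if k == "WB"},
    }


regs = clock(EMPTY, LW_CONTROL)  # 1 cycle after decode: EX signals are used
print(sorted(regs["ID/EX"]))
regs = clock(regs, {})           # 2 cycles after decode: MEM signals are used
print(sorted(regs["EX/MEM"]))
regs = clock(regs, {})           # 3 cycles after decode: WB signals are used
print(regs["MEM/WB"])
```

Because the control bits travel with the instruction, no stage needs to know which instruction it is working on; it only reads the bits in the register in front of it.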

48 Datapath with Control

49 Let's Try it Out
Sample Assembly Program
    lw  $10, 20($1)
    sub $11, $2, $3
    and $12, $4, $5
    or  $13, $6, $7
    add $14, $8, $9

50 Example 2: Cycle 1

51 Example 2: Cycle 2

52 Example 2: Cycle 3

53 Example 2: Cycle 4

54 Example 2: Cycle 5

55 Example 2: Cycle 6

56 Example 2: Cycle 7

57 Example 2: Cycle 8

58 Example 2: Cycle 9

59 Summary of Pipeline Basics
- Pipelining is a fundamental concept
  - Multiple steps using distinct resources
- Utilize capabilities of datapath by pipelined instruction processing
  - Start next instruction while working on the current one
  - Limited by length of longest stage (plus fill/flush)
  - Need to detect and resolve hazards
- What makes it easy in MIPS?
  - All instructions are of the same length
  - Just a few instruction formats
  - Memory operands only in loads and stores
- What makes pipelining hard? Hazards


61 Data Hazards
- Order of operand accesses changed by pipeline
- Starting next instruction before first is finished
- Dependencies go backward in time

62 Handling Data Hazards
- Detect
- Resolve remaining ones
  - Compiler inserts NOP
  - Stall
  - Forward

63 Software Solution
- Have compiler guarantee no hazards
- Where do we insert the NOPs?
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- Problem: not efficient enough!
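One way to answer "where do we insert the NOPs?" is to let a small script do what the compiler would. The sketch below assumes there is no forwarding and that the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three instruction slots after its producer. That distance, the simplified parser, and the function names are assumptions for illustration.

```python
import re


def parse(instr):
    """Return (destination, sources) for the few instruction formats used here."""
    op, operands = instr.split(None, 1)
    regs = re.findall(r"\$\d+", operands)
    if op == "sw":                 # sw rt, offset(rs): reads both, writes nothing
        return None, regs
    return regs[0], regs[1:]       # R-type and lw: the first register is written


def insert_nops(program, min_distance=3):
    """Pad with nops so every reader is min_distance slots after its producer."""
    out = []
    last_write = {}                # register -> position in `out` of its producer
    for instr in program:
        dest, sources = parse(instr)
        needed = 0
        for reg in sources:
            if reg in last_write:
                needed = max(needed, last_write[reg] + min_distance - len(out))
        out.extend(["nop"] * needed)
        if dest is not None and dest != "$0":
            last_write[dest] = len(out)
        out.append(instr)
    return out


PROGRAM = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]
for line in insert_nops(PROGRAM):
    print(line)
```

Under these assumptions two NOPs go between sub and and; the later readers of $2 are then far enough away. Those two wasted slots are the inefficiency the slide points to.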

64 Detecting Data Hazards
- Hazard conditions:
  - 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
  - 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
  - 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
  - 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
- Two optimizations:
  - Don't forward if instruction does not write register: check if RegWrite is asserted
  - Don't forward if destination register is $0: check if RegisterRd = 0

65 Detecting Data Hazards (cont.)
- Hazard conditions using control signals:
  - At EX stage (EX hazard):
    if (EX/MEM.RegWrite and (EX/MEM.RegRd ≠ 0) and (EX/MEM.RegRd = ID/EX.RegRs)) ForwardA = 10

66 Detecting Data Hazards (cont.)
- Hazard conditions using control signals:
  - At MEM stage:
    MEM/WB.RegWrite and (MEM/WB.RegRd ≠ 0) and (MEM/WB.RegRd = ID/EX.RegRs)
  - (Substitute ID/EX.RegRt for ID/EX.RegRs to get the other two conditions)

67 Resolving Hazards: Forwarding
- Use temporary results, e.g., those in pipeline registers; don't wait for them to be written

68 Forwarding Logic
- Forwarding: input to ALU from any pipe registers
  - Add multiplexors to ALU input
  - Control forwarding in EX: carry Rs in ID/EX
- Control signals for forwarding:
  - If both WB and MEM forward, e.g., add $1,$1,$2; add $1,$1,$3; add $1,$1,$4; => let MEM forward
  - EX hazard:
    if (EX/MEM.RegWrite and (EX/MEM.RegRd ≠ 0) and (EX/MEM.RegRd = ID/EX.RegRs)) ForwardA = 10
  - MEM hazard:
    if (MEM/WB.RegWrite and (MEM/WB.RegRd ≠ 0) and (EX/MEM.RegRd ≠ ID/EX.RegRs) and (MEM/WB.RegRd = ID/EX.RegRs)) ForwardA = 01
  - (For the second ALU operand, replace ID/EX.RegRs with ID/EX.RegRt and ForwardA with ForwardB)
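The EX-hazard and MEM-hazard conditions translate almost directly into code. The following is a sketch of a forwarding unit under the assumption that each pipeline register is a dictionary with the fields named in the conditions; the dictionary layout and function name are illustrative.

```python
def forwarding_unit(id_ex, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) as 2-bit values following the slide's conditions."""

    def select(src):
        # EX hazard: the instruction one ahead produces the operand.
        if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src:
            return 0b10
        # MEM hazard: the instruction two ahead produces it, and the closer
        # instruction does not write the same register.
        if (
            mem_wb["RegWrite"]
            and mem_wb["Rd"] != 0
            and ex_mem["Rd"] != src
            and mem_wb["Rd"] == src
        ):
            return 0b01
        return 0b00  # no forwarding: use the value read from the register file

    return select(id_ex["Rs"]), select(id_ex["Rt"])


# add $1,$1,$2; add $1,$1,$3; add $1,$1,$4 with the third add in EX:
# both older adds write $1, and the newer result (in EX/MEM) must win.
third_add = {"Rs": 1, "Rt": 4}
second_add = {"RegWrite": 1, "Rd": 1}   # now in EX/MEM
first_add = {"RegWrite": 1, "Rd": 1}    # now in MEM/WB
print(forwarding_unit(third_add, second_add, first_add))
```

The extra EX/MEM.RegRd ≠ ID/EX.RegRs term in the MEM-hazard test is what gives the most recent result priority when both older instructions write the same register.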

69 No Forwarding

70 With Forwarding

71 Pipeline with Forwarding

72 Example 3: Cycle 3

73 Example 3: Cycle 4

74 Example 3: Cycle 5

75 Example 3: Cycle 6


77 Can't Always Forward
- lw can still cause a hazard:
  - if it is followed by an instruction that reads the loaded register

78 Stalling
- Stall the pipeline by keeping instructions in the same stage and inserting a NOP instead

79 Handling Stalls
- Hazard detection unit in ID to insert a stall between a load instruction and its use:
    if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))) stall the pipeline for one cycle
  (ID/EX.MemRead = 1 indicates a load instruction)
- How to stall?
  - Stall instructions in IF and ID: do not change PC and IF/ID => the stages re-execute the instructions
  - What to move into EX: insert a NOP by changing the EX, MEM, WB control fields of the ID/EX pipeline register to 0
    - As the control signals propagate, all control signals to EX, MEM, WB are deasserted and no registers or memories are written
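The stall condition and the "how to stall" steps can be sketched as two small functions. Pipeline registers are again modelled as dictionaries; the field names follow the condition above, while the function names and the `NOP_CONTROL` constant are hypothetical.

```python
NOP_CONTROL = {"EX": 0, "MEM": 0, "WB": 0}  # all control fields deasserted


def must_stall(id_ex, if_id):
    """Load-use check: the load in EX writes a register that the instruction in ID reads."""
    return bool(id_ex["MemRead"]) and id_ex["Rt"] in (if_id["Rs"], if_id["Rt"])


def front_end_step(pc, if_id, fetched, decoded_control, stall):
    """Return the next (PC, IF/ID contents, control written into ID/EX)."""
    if stall:
        # Hold PC and IF/ID so IF and ID repeat; send a bubble down the pipe.
        return pc, if_id, dict(NOP_CONTROL)
    return pc + 4, fetched, decoded_control


# lw $2, 20($1) is in EX while and $4, $2, $5 is in ID.
load_in_ex = {"MemRead": 1, "Rt": 2}
and_in_id = {"Rs": 2, "Rt": 5}
stall = must_stall(load_in_ex, and_in_id)
print(stall)
print(front_end_step(8, and_in_id, {"Rs": 4, "Rt": 2}, {"EX": 1, "MEM": 0, "WB": 1}, stall))
```

After the one-cycle bubble, the loaded value sits in MEM/WB and can be forwarded to the dependent instruction when it reaches EX.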

80 Pipeline with Stalling Unit
- Forwarding controls ALU inputs; hazard detection controls PC, IF/ID, and control signals

81 Example 4: Cycle 2

82 Example 4: Cycle 3

83 Example 4: Cycle 4

84 Example 4: Cycle 5

85 Example 4: Cycle 6

86 Example 4: Cycle 7

87 Outline
- 6.1 An Overview of Pipelining
- 6.2 A Pipelined Datapath
- 6.3 Pipelined Control
- 6.4 Data Hazards and Forwarding
- 6.5 Data Hazards and Stalls
- 6.6 Branch Hazards
- 6.8 Exceptions (optional)

88 Branch Hazards
- When we decide to branch, other instructions are still in the pipe

89 Handling Branch Hazard
- Predict branch always not taken
  - Need to add hardware for flushing instructions if wrong
  - Branch decision made at MEM => need to flush instructions in IF, ID, EX by changing control values to 0
- Reduce delay of taken branch by moving branch execution earlier in the pipeline
  - Move up branch address calculation to ID
  - Check branch equality at ID (using XOR) by comparing the two registers read during ID
  - Branch decision made at ID => one instruction to flush
  - Add a control signal, IF.Flush, to zero the instruction field of IF/ID => making the instruction a NOP
- Dynamic branch prediction
- Compiler rescheduling, delayed branch
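The cost of a mispredicted branch follows from how many younger instructions are already in the pipeline when the decision is made. This sketch assumes predict-not-taken and one instruction fetched per cycle; the branch frequency and taken rate in the usage lines are made-up numbers for illustration, not measurements from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def instructions_to_flush(decision_stage):
    """Younger instructions in the pipe when a taken branch is resolved in decision_stage."""
    return STAGES.index(decision_stage)


def effective_cpi(branch_fraction, taken_fraction, decision_stage, base_cpi=1.0):
    """Average CPI when every taken branch costs one cycle per flushed instruction."""
    penalty = instructions_to_flush(decision_stage)
    return base_cpi + branch_fraction * taken_fraction * penalty


print(instructions_to_flush("MEM"))  # instructions in IF, ID, EX are flushed
print(instructions_to_flush("ID"))   # only the instruction in IF is flushed

# Hypothetical workload: 20% branches, half of them taken.
print(effective_cpi(0.2, 0.5, "MEM"))
print(effective_cpi(0.2, 0.5, "ID"))
```

Moving the equality check to ID cuts the taken-branch penalty from three cycles to one, which is why the extra comparator in ID is worth its hardware cost.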

90 Delayed Branch
- Predict-not-taken + branch decision at ID => the following instruction is always executed => branches take effect 1 cycle later
- 0 clock cycles per branch instruction if the compiler can find an instruction to put in the slot (about 50% of the time)

91 Pipeline with Flushing

92 Example 5: Cycle 3

93 Example 5: Cycle 4


95 What about Exceptions?
- 5 instructions executing in a 5-stage pipeline
  - How to stop the pipeline? Restart?
  - Who caused the interrupt?
  - Who to serve first, if multiple interrupts occur at the same time?
- Need to know in which stage an exception can occur

Stage   Problem interrupts occurring
IF      Page fault; misaligned memory access; memory-protection violation
ID      Undefined or illegal opcode
EX      Arithmetic exception
MEM     Page fault; misaligned memory access; memory error; memory-protection violation

96 Handling Exceptions
- Suppose overflow occurs at add $1,$2,$1
- Disable writes of instructions till trap hits WB, e.g., flush following instructions using IF.Flush, ID.Flush, EX.Flush to cause multiplexors to zero control signals (overflow exception detected at EX => flush offending instruction)
- Force trap instruction into IF, e.g., fetch from hex by adding hex to PC input MUX
- Save address of offending instruction in EPC
- Multiple interrupts: use priority hardware to choose the earliest instruction to interrupt
- External interrupts: flexible in when to interrupt

97 Pipeline with Exception
