
93-2 Digital System Design
Design a MIPS Processor (2/2)
Lecturer: Chihhao Chao
Advisor: Prof. An-Yeu Wu
2005/5/13 Friday
ACCESS IC LABORATORY

Outline
- 6.1 An Overview of Pipelining
- 6.2 A Pipelined Datapath
- 6.3 Pipelined Control
- 6.4 Data Hazards and Forwarding
- 6.5 Data Hazards and Stalls
- 6.6 Branch Hazards
- 6.8 Exceptions

Pipelining is Natural!
- Pipelining provides a method for executing multiple instructions at the same time.
- Laundry example
  - Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold
  - Washer takes 30 minutes
  - Dryer takes 40 minutes
  - Folder takes 20 minutes

Sequential Laundry
[Figure: timeline from 6 PM to midnight; loads A to D run one after another, each taking 30 + 40 + 20 minutes]
- Sequential laundry takes 6 hours for 4 loads
- If they learned pipelining, how long would laundry take?

Pipelined Laundry: Start work ASAP
[Figure: timeline starting at 6 PM; loads A to D overlap, with stage times 30, 40, 40, 40, 40, 20 minutes]
- Pipelined laundry takes 3.5 hours for 4 loads
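The two laundry totals can be reproduced with a short calculation. The sketch below is only an illustration: the function names are made up here, and it assumes a load enters a stage as soon as it has left the previous stage and that stage is free.

```python
# Stage durations in minutes, taken from the laundry example.
STAGES = [30, 40, 20]  # washer, dryer, folder

def sequential_minutes(loads, stages=STAGES):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stages)

def pipelined_minutes(loads, stages=STAGES):
    """A load enters a stage once it has left the previous stage
    and the stage itself is free (assumption of this sketch)."""
    stage_free = [0] * len(stages)  # time at which each stage becomes free
    finish = 0
    for _ in range(loads):
        t = 0  # time at which this load is ready for its next stage
        for i, duration in enumerate(stages):
            start = max(t, stage_free[i])
            t = start + duration
            stage_free[i] = t
        finish = t
    return finish

print(sequential_minutes(4) / 60)  # 6.0 hours
print(pipelined_minutes(4) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 x 40 + 20 = 210 minutes: the 40-minute dryer, the slowest stage, sets the rate once the pipeline is full.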

Pipelining Lessons
[Figure: pipelined laundry timeline for loads A to D, starting at 6 PM]
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously using different resources
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to fill the pipeline and time to drain it reduce speedup
- Stall for dependences

The 5 Stages of the Load Instruction
[Figure: Load passes through IFetch, Reg/Dec, Exec, Mem, Wr in cycles 1 to 5]
- IFetch: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: Calculate the memory address
- Mem: Read the data from the Data Memory
- Wr: Write the data back to the register file

Pipeline Execution Time
[Figure: six instructions in program flow, each passing through IFetch, Dcd, Exec, Mem, WB and starting one cycle after the previous one]
- On a processor, multiple instructions are in various stages at the same time.
- Assume each instruction takes five cycles
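The overlap in the diagram can be written down as a small model. This is a sketch under the assumption of an ideal pipeline (one instruction issued per cycle, no stalls); the helper names are hypothetical.

```python
STAGE_NAMES = ["IFetch", "Dcd", "Exec", "Mem", "WB"]

def total_cycles(n_instructions, n_stages=len(STAGE_NAMES)):
    """Cycles needed when one instruction issues per cycle and nothing stalls."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1

def stage_in_cycle(instr_index, cycle):
    """Stage occupied by instruction instr_index (0-based) in a 1-based cycle,
    or None if the instruction is not in the pipeline during that cycle."""
    pos = cycle - 1 - instr_index
    return STAGE_NAMES[pos] if 0 <= pos < len(STAGE_NAMES) else None

# Print the pipeline diagram for six instructions.
for i in range(6):
    row = [stage_in_cycle(i, c) or "." for c in range(1, total_cycles(6) + 1)]
    print(f"inst {i}: " + " ".join(f"{s:>6}" for s in row))
```

Each instruction still takes five cycles, but six instructions complete in 5 + 6 - 1 = 10 cycles instead of 30.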

Single Cycle, Multi-cycle, Pipelined
[Figure: clock timing of the three implementations]
- Single-cycle implementation: Load in cycle 1, Store in cycle 2, with part of the Store cycle wasted
- Multiple-cycle implementation: Load (IFetch, Reg, Exec, Mem, Wr), then Store (IFetch, Reg, Exec, Mem), then R-type (IFetch, ...) over cycles 1 to 10
- Pipeline implementation: Load, Store, and R-type overlap, each passing through IFetch, Reg, Exec, Mem, Wr

Why Pipeline? Because the Resources Are There!
[Figure: Inst 0 to Inst 4 in instruction order against time in clock cycles; each instruction uses Im, Reg, ALU, Dm, Reg in successive cycles]

Pipelining MIPS Execution [figure]

Pipeline Hazards
- Structural hazard
  - An occurrence in which a planned instruction cannot execute in the proper clock cycle because the hardware cannot support the combination of instructions that are set to execute in the given clock cycle.
- Data hazard
  - Also called pipeline data hazard. An occurrence in which a planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction is not yet available.
- Control hazard
  - Also called branch hazard. An occurrence in which the proper instruction cannot execute in the proper clock cycle because the instruction that was fetched is NOT the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.

Data Hazard
- Forwarding
  - Also called bypassing. A method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.
- Load-use data hazard
  - A specific form of data hazard in which the data requested by a load instruction has not yet become available when it is requested.
- Pipeline stall
  - Also called bubble. A stall initiated in order to resolve a hazard.

Control Hazard
- Untaken branch
  - One that falls through to the successive instruction. A taken branch is one that causes transfer to the branch target.
- Branch prediction
  - A method of resolving a branch hazard that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.

Pipeline Overview Summary
- Latency (pipeline)
  - The number of stages in a pipeline or the number of stages between two instructions during execution.
- Throughput (pipeline)
  - The number of instructions executed per unit time.


Designing a Pipelined Processor
- Examine the datapath and control diagram
  - Start with the single-cycle or the multi-cycle datapath?
  - Single-cycle or multi-cycle control?
- Partition the datapath into stages:
  - IF (instruction fetch), ID (instruction decode and register file read), EX (execution or address calculation), MEM (data memory access), WB (write back)
- Associate resources with states
- Ensure that flows do not conflict, or figure out how to resolve conflicts
- Assert control in the appropriate stage

Use Multi-cycle Execution Steps
- But use the single-cycle datapath (separate memories; why?)

Split Single-cycle Datapath
- What should be added to split the datapath into stages?

Add Pipeline Registers (Flip/Flop)
- Use registers between stages to carry data and control

Consider load
[Figure: lw passes through Ifetch, Reg/Dec, Exec, Mem, Wr in cycles 1 to 5]
- IF: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- ID: Instruction Decode
  - Registers fetch and instruction decode
- EX: Calculate the memory address
- MEM: Read the data from the Data Memory
- WB: Write the data back to the register file

Pipelining load
[Figure: three consecutive lw instructions, each starting one cycle after the previous one and passing through Ifetch, Reg/Dec, Exec, Mem, Wr, over cycles 1 to 7]
- The 5 functional units in the pipeline datapath are:
  - Instruction Memory for the IFetch stage
  - Register File's read ports (busA and busB) for the Reg/Dec stage
  - ALU for the Exec stage
  - Data Memory for the MEM stage
  - Register File's write port (busW) for the WB stage

IF Stage of load word
- IF/ID = mem[PC]; PC = PC + 4

ID Stage of load word
- ID/EX(A) = Reg[IR[25-21]]; ID/EX(B) = Reg[IR[20-16]]; ID/EX = sign-extension of IR[15-0]

EX Stage of load word
- EX/MEM = A + sign-ext(IR[15-0])   % address computation

MEM Stage of load word
- MEM/WB = mem[ALUout]

WB Stage of load word
- Reg[IR[20-16]] = MEM/WB
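The five register-transfer steps of load word can be traced in software. The following is a minimal sketch, not the lecture's datapath: memories are modelled as Python dicts, each pipeline register as a dict, and the field names `Imm`, `Rt`, and `Data` are hypothetical labels added so that each value can be carried to the stage that uses it.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def run_lw(instr_mem, data_mem, regs, pc):
    """Carry one lw through IF, ID, EX, MEM, WB; returns the updated PC."""
    # IF: IF/ID = mem[PC]; PC = PC + 4
    if_id = {"IR": instr_mem[pc]}
    pc += 4
    # ID: read both registers and sign-extend IR[15-0]
    ir = if_id["IR"]
    rs = (ir >> 21) & 0x1F  # IR[25-21]
    rt = (ir >> 16) & 0x1F  # IR[20-16]
    id_ex = {"A": regs[rs], "B": regs[rt], "Imm": sign_extend16(ir), "Rt": rt}
    # EX: address computation
    ex_mem = {"ALUout": id_ex["A"] + id_ex["Imm"], "Rt": id_ex["Rt"]}
    # MEM: read the data memory
    mem_wb = {"Data": data_mem[ex_mem["ALUout"]], "Rt": ex_mem["Rt"]}
    # WB: Reg[IR[20-16]] = MEM/WB
    regs[mem_wb["Rt"]] = mem_wb["Data"]
    return pc

regs = [0] * 32
regs[1] = 100
instr_mem = {0: 0x8C2A0014}  # encoding of lw $10, 20($1)
data_mem = {120: 42}
pc = run_lw(instr_mem, data_mem, regs, 0)
print(pc, regs[10])  # 4 42
```

The destination register number travels with the instruction through every pipeline register, because by the time the data reaches WB the IF/ID register already holds a later instruction.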

Pipelined Datapath [figure]

The Four Stages of R-type
[Figure: R-type passes through Ifetch, Reg/Dec, Exec, Wr in cycles 1 to 4]
- IF: fetch the instruction from the Instruction Memory
- ID: registers fetch and instruction decode
- EX:
  - ALU operates on the two register operands
  - Update PC
- WB: write the ALU output back to the register file

Pipelining R-type and load
[Figure: timeline over cycles 1 to 9 for R-type, R-type, Load, R-type, R-type; each R-type uses Ifetch, Reg/Dec, Exec, Wr and the Load uses Ifetch, Reg/Dec, Exec, Mem, Wr; annotated "We have a problem!"]
- We have a structural hazard:
  - Two instructions try to write to the register file at the same time!
  - Only one write port
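The write-port collision can be found by computing the cycle in which each instruction reaches its write stage. This is an illustrative sketch that assumes one instruction is issued per cycle starting at cycle 1; the function and table names are hypothetical.

```python
from collections import defaultdict

# Position of the write stage within each instruction type (1-based):
# R-type writes in its 4th stage, load in its 5th.
WRITE_STAGE = {"R-type": 4, "Load": 5}

def write_port_conflicts(program, write_stage=WRITE_STAGE):
    """program[i] is issued in cycle i + 1 (assumption of this sketch).
    Returns {cycle: [instruction indices]} for every cycle in which more
    than one instruction needs the register file's single write port."""
    writers = defaultdict(list)
    for i, kind in enumerate(program):
        issue_cycle = i + 1
        writers[issue_cycle + write_stage[kind] - 1].append(i)
    return {cycle: idx for cycle, idx in writers.items() if len(idx) > 1}

program = ["R-type", "R-type", "Load", "R-type", "R-type"]
print(write_port_conflicts(program))  # {7: [2, 3]}
```

With this issue order the load and the R-type behind it both write in cycle 7. Passing a table in which every instruction writes in stage 5 returns an empty result.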

Important Observation
- Each functional unit can only be used once per instruction
- Each functional unit must be used at the same stage for all instructions:
  - Load uses the Register File's write port during its 5th stage (Ifetch, Reg/Dec, Exec, Mem, Wr)
  - R-type uses the Register File's write port during its 4th stage (Ifetch, Reg/Dec, Exec, Wr)
- Several ways to solve: 1) forwarding, 2) adding a pipeline bubble, 3) making instructions the same length

Solution 1: Insert Bubble
[Figure: timeline over cycles 1 to 9; a pipeline bubble delays the R-type instructions that follow the Load]
- Insert a bubble into the pipeline to prevent two writes in the same cycle
  - The control logic can be complex
  - Lose an instruction fetch and issue opportunity
- No instruction is started in Cycle 6

Solution 2: Delay R-type's Write
- Delay R-type's register write by one cycle:
  - R-type also uses the Register File's write port at Stage 5
  - MEM is a NOP stage: nothing is being done
  - R-type: Ifetch, Reg/Dec, Exec, Mem, Wr
[Figure: R-type, R-type, Load, R-type, R-type over cycles 1 to 9, each passing through Ifetch, Reg/Dec, Exec, Mem, Wr]

The Four Stages of store
[Figure: Store passes through Ifetch, Reg/Dec, Exec, Mem in cycles 1 to 4, followed by an added Wr stage]
- IF: fetch the instruction from the Instruction Memory
- ID: registers fetch and instruction decode
- EX: calculate the memory address
- MEM: write the data into the Data Memory
- Add an extra stage:
  - WB: NOP

The Three Stages of beq
[Figure: beq passes through Ifetch, Reg/Dec, Exec, followed by added Mem and Wr stages]
- IF: fetch the instruction from the Instruction Memory
- ID: registers fetch and instruction decode
- EX:
  - Compare the two register operands
  - Select the correct branch target address
  - Latch it into PC
- Add two extra stages:
  - MEM: NOP
  - WB: NOP

Use Graphical Representation for Pipelined MIPS
- The graph can help to answer questions like:
  - How many cycles does it take to execute this code?
  - What is the ALU doing during cycle 4?

Example 1: Cycles 1 to 6 [figures, one per cycle]


Pipeline Control: Control Signals [figure]

Group Signals According to Stages
- Can use the control signals of the single-cycle CPU

Data Stationary Control
- Pass control signals along just like the data
- Main control generates the control signals during ID

Data Stationary Control (cont.)
- Signals for EX (ExtOp, ALUSrc, ...) are used 1 cycle later
- Signals for MEM (MemWr, Branch) are used 2 cycles later
- Signals for WB (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control decodes the contents of the IF/ID register during Reg/Dec. ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, and RegWr enter the ID/Ex register; MemWr, Branch, MemtoReg, and RegWr continue to the Ex/Mem register; MemtoReg and RegWr continue to the Mem/Wr register]
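The way control signals travel with the instruction can be sketched as a shift through the pipeline registers. This is an illustrative model only: the grouping follows the slide, but the signal values chosen for lw, the `clock` helper, and the use of an empty dict as a bubble are assumptions of this sketch.

```python
# Control word generated by main control during ID, grouped by the stage
# that consumes each signal. The values are assumed, illustrative settings for lw.
LW_CONTROL = {
    "EX":  {"ExtOp": 1, "ALUSrc": 1, "ALUOp": "add", "RegDst": 0},
    "MEM": {"MemWr": 0, "Branch": 0},
    "WB":  {"MemtoReg": 1, "RegWr": 1},
}

def clock(regs, new_control):
    """One clock edge: every control group moves one pipeline register down.
    Groups that have already been used are dropped."""
    return {
        "ID/EX": new_control,
        "EX/MEM": {k: v for k, v in regs["ID/EX"].items() if k != "EX"},
        "MEM/WB": {k: v for k, v in regs["EX/MEM"].items() if k == "WB"},
    }

regs = {"ID/EX": {}, "EX/MEM": {}, "MEM/WB": {}}
regs = clock(regs, LW_CONTROL)  # 1 cycle after decode: EX signals available
regs = clock(regs, {})          # 2 cycles after decode: MEM signals available
regs = clock(regs, {})          # 3 cycles after decode: WB signals available
print(regs["MEM/WB"])           # {'WB': {'MemtoReg': 1, 'RegWr': 1}}
```

Each pipeline register only needs to be wide enough for the groups that are still unused, which is why the control portion narrows from ID/EX to MEM/WB.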

Datapath with Control [figure]

Let's Try it Out
Sample Assembly Program
    lw  $10, 20($1)
    sub $11, $2, $3
    and $12, $4, $5
    or  $13, $6, $7
    add $14, $8, $9

Example 2: Cycles 1 to 9 [figures, one per cycle]

Summary of Pipeline Basics
- Pipelining is a fundamental concept
  - Multiple steps using distinct resources
- Utilize the capabilities of the datapath by pipelined instruction processing
  - Start the next instruction while working on the current one
  - Limited by the length of the longest stage (plus fill/flush)
  - Need to detect and resolve hazards
- What makes it easy in MIPS?
  - All instructions are of the same length
  - Just a few instruction formats
  - Memory operands only in loads and stores
- What makes pipelining hard? Hazards


Data Hazards
- Order of operand accesses is changed by the pipeline
- Starting the next instruction before the first is finished
- Dependencies go backward in time

Handling Data Hazards
- Detect
- Resolve remaining ones
  - Compiler inserts NOP
  - Stall
  - Forward

Software Solution
- Have the compiler guarantee no hazards
- Where do we insert the NOPs?
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- Problem: not efficient enough!
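One way to answer the NOP question is to let a small scheduler pad the code. The sketch below is not the lecture's method: it assumes no forwarding, and it assumes the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three instruction slots after its producer. The tuple format and function name are hypothetical.

```python
def insert_nops(program, min_distance=3):
    """program: list of (text, dest_register_or_None, source_registers).
    Pads with NOPs so that every reader is at least min_distance slots
    after the latest writer of each register it reads."""
    out = []         # instruction texts, including inserted NOPs
    last_write = {}  # register -> slot index (in out) of its latest writer
    for text, dest, sources in program:
        need = 0
        for reg in sources:
            if reg in last_write:
                need = max(need, last_write[reg] + min_distance - len(out))
        out.extend(["nop"] * need)
        if dest is not None:
            last_write[dest] = len(out)
        out.append(text)
    return out

program = [
    ("sub $2, $1, $3",   "$2",  ["$1", "$3"]),
    ("and $12, $2, $5",  "$12", ["$2", "$5"]),
    ("or $13, $6, $2",   "$13", ["$6", "$2"]),
    ("add $14, $2, $2",  "$14", ["$2", "$2"]),
    ("sw $15, 100($2)",  None,  ["$15", "$2"]),
]
for line in insert_nops(program):
    print(line)
```

Under these assumptions two NOPs go between `sub` and `and`; the later instructions are then far enough from `sub` to need none. The two wasted slots are the inefficiency the slide points out.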

Detecting Data Hazards
- Hazard conditions:
  - (1a) EX/MEM.RegisterRd = ID/EX.RegisterRs
  - (1b) EX/MEM.RegisterRd = ID/EX.RegisterRt
  - (2a) MEM/WB.RegisterRd = ID/EX.RegisterRs
  - (2b) MEM/WB.RegisterRd = ID/EX.RegisterRt
- Two optimizations:
  - Don't forward if the instruction does not write a register: check if RegWrite is asserted
  - Don't forward if the destination register is $0: check if RegisterRd = 0

Detecting Data Hazards (cont.)
- Hazard conditions using control signals:
  - At EX stage (EX hazard):
    if (EX/MEM.RegWrite and (EX/MEM.RegRd ≠ 0) and (EX/MEM.RegRd = ID/EX.RegRs)) ForwardA = 10

Detecting Data Hazards (cont.)
- Hazard conditions using control signals:
  - At MEM stage:
    MEM/WB.RegWrite and (MEM/WB.RegRd ≠ 0) and (MEM/WB.RegRd = ID/EX.RegRs)
  - (Substitute ID/EX.RegRt for ID/EX.RegRs to get the other two conditions)

Resolving Hazards: Forwarding
- Use temporary results, e.g., those in pipeline registers; don't wait for them to be written

Forwarding Logic
- Forwarding: input to the ALU from any pipeline register
  - Add multiplexors to the ALU inputs
  - Control forwarding in EX: carry Rs in ID/EX
- Control signals for forwarding:
  - If both WB and MEM forward, e.g., add $1,$1,$2; add $1,$1,$3; add $1,$1,$4; => let MEM forward
  - EX hazard:
    if (EX/MEM.RegWrite and (EX/MEM.RegRd ≠ 0) and (EX/MEM.RegRd = ID/EX.RegRs)) ForwardA = 10
  - MEM hazard:
    if (MEM/WB.RegWrite and (MEM/WB.RegRd ≠ 0) and (EX/MEM.RegRd ≠ ID/EX.RegRs) and (MEM/WB.RegRd = ID/EX.RegRs)) ForwardA = 01
  - (For ForwardB, use ID/EX.RegRt in place of ID/EX.RegRs)
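The forwarding conditions translate almost directly into code. The sketch below is one possible rendering, with the pipeline registers passed as dicts and the select values returned as bit strings. Two details are assumptions: "00" stands for "no forwarding", and the MEM hazard is tested only when the EX hazard does not apply, which gives the EX/MEM result priority in the same way as the extra inequality term in the slide's MEM condition.

```python
def forwarding_unit(id_ex_rs, id_ex_rt, ex_mem, mem_wb):
    """ex_mem and mem_wb are dicts with keys "RegWrite" (bool) and "RegRd" (int).
    Returns (ForwardA, ForwardB) as two-bit strings."""
    def select(src):
        # EX hazard: the instruction one ahead produces the operand.
        if ex_mem["RegWrite"] and ex_mem["RegRd"] != 0 and ex_mem["RegRd"] == src:
            return "10"
        # MEM hazard: the instruction two ahead produces the operand.
        if mem_wb["RegWrite"] and mem_wb["RegRd"] != 0 and mem_wb["RegRd"] == src:
            return "01"
        return "00"  # assumed encoding for "use the register file value"
    return select(id_ex_rs), select(id_ex_rt)

# add $1,$1,$2; add $1,$1,$3; add $1,$1,$4 with the third add in EX:
# both earlier adds write $1, and the most recent one (in EX/MEM) wins.
ex_mem = {"RegWrite": True, "RegRd": 1}
mem_wb = {"RegWrite": True, "RegRd": 1}
print(forwarding_unit(1, 4, ex_mem, mem_wb))  # ('10', '00')
```

The check against register 0 matters because $0 always reads as zero, so a result nominally destined for $0 must never be forwarded.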

No Forwarding [figure]

With Forwarding [figure]

Pipeline with Forwarding [figure]

Example 3: Cycles 3 to 6 [figures, one per cycle]


Can't Always Forward
- lw can still cause a hazard:
  - if it is followed by an instruction that reads the loaded register

Stalling
- Stall the pipeline by keeping instructions in the same stage and inserting a NOP instead

Handling Stalls
- A hazard detection unit in ID inserts a stall between a load instruction and its use:
    if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
        stall the pipeline for one cycle
  (ID/EX.MemRead = 1 indicates a load instruction)
- How to stall?
  - Stall the instructions in IF and ID: do not change PC and IF/ID => the stages re-execute the instructions
  - What to move into EX: insert a NOP by changing the EX, MEM, WB control fields of the ID/EX pipeline register to 0
  - As the control signals propagate, all control signals to EX, MEM, WB are deasserted and no registers or memories are written
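The stall condition and its effect on PC, IF/ID, and ID/EX can be sketched as follows. This is an illustration under simplifying assumptions: pipeline registers are dicts, only three control fields are modelled, and the names `load_use_hazard` and `hazard_step` are hypothetical.

```python
def load_use_hazard(id_ex, if_id):
    """True when the instruction in EX is a load whose destination (Rt)
    is a source of the instruction currently in ID."""
    return bool(id_ex["MemRead"]) and id_ex["Rt"] in (if_id["Rs"], if_id["Rt"])

# Control fields zeroed to turn the ID/EX contents into a NOP (bubble).
BUBBLE = {"MemRead": 0, "MemWrite": 0, "RegWrite": 0, "Rt": 0}

def hazard_step(pc, if_id, id_ex, fetched, decoded):
    """Returns the next (PC, IF/ID, ID/EX).
    fetched: the instruction fetched this cycle; decoded: the ID/EX
    contents that ID would normally produce for the instruction in IF/ID."""
    if load_use_hazard(id_ex, if_id):
        # Hold PC and IF/ID so IF and ID repeat; send a bubble into EX.
        return pc, if_id, dict(BUBBLE)
    return pc + 4, fetched, decoded

# A load of $2 in EX followed by an instruction reading $2 in ID (illustrative values).
id_ex = {"MemRead": 1, "MemWrite": 0, "RegWrite": 1, "Rt": 2}
if_id = {"Rs": 2, "Rt": 5}
print(load_use_hazard(id_ex, if_id))  # True
```

After the one-cycle stall the load has reached MEM/WB, so its data can be forwarded to the dependent instruction when that instruction enters EX.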

Pipeline with Stalling Unit [figure]
- Forwarding controls the ALU inputs; hazard detection controls PC, IF/ID, and the control signals

Example 4: Cycles 2 to 7 [figures, one per cycle]


Branch Hazards
- When the decision to branch is made, other instructions are still in the pipe

Handling Branch Hazard
- Predict branch always not taken
  - Need to add hardware for flushing instructions if the prediction is wrong
  - Branch decision made at MEM => need to flush the instructions in IF, ID, EX by changing control values to 0
- Reduce the delay of a taken branch by moving branch execution earlier in the pipeline
  - Move up the branch address calculation to ID
  - Check branch equality at ID (using XOR) by comparing the two registers read during ID
  - Branch decision made at ID => one instruction to flush
  - Add a control signal, IF.Flush, to zero the instruction field of IF/ID => making the instruction a NOP
- Dynamic branch prediction
- Compiler rescheduling, delayed branch
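The cost of a mispredicted branch follows from where the decision is made: every stage before the deciding stage holds an instruction fetched after the branch. A small sketch, assuming predict-not-taken and a five-stage pipeline; the function name is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def flushed_stages(decision_stage):
    """Stages holding instructions fetched after the branch at the moment
    the branch is resolved in decision_stage; these are flushed when a
    branch predicted not taken turns out to be taken."""
    return STAGES[:STAGES.index(decision_stage)]

print(flushed_stages("MEM"))  # ['IF', 'ID', 'EX']
print(flushed_stages("ID"))   # ['IF']
```

Deciding in MEM costs three instructions on a taken branch, while deciding in ID costs only the instruction in IF, which is the one that IF.Flush turns into a NOP.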

Delayed Branch
- Predict-not-taken + branch decision at ID => the following instruction is always executed => branches take effect 1 cycle later
- 0 clock cycles per branch instruction if an instruction can be found to put in the slot (50% of the time)

Pipeline with Flushing [figure]

Example 5: Cycles 3 to 4 [figures, one per cycle]


What about Exceptions?
- 5 instructions executing in a 5-stage pipeline
  - How to stop the pipeline? Restart?
  - Who caused the interrupt?
  - Who to serve first, if multiple interrupts occur at the same time?
- Need to know in which stage an exception can occur

  Stage   Problem interrupts occurring
  IF      Page fault; misaligned memory access; memory-protection violation
  ID      Undefined or illegal opcode
  EX      Arithmetic exception
  MEM     Page fault; misaligned memory access; memory error; memory-protection violation

Handling Exceptions
- Suppose overflow occurs at add $1,$2,$1
  - Disable writes of instructions until the trap hits WB, e.g., flush the following instructions using IF.Flush, ID.Flush, EX.Flush to cause the multiplexors to zero the control signals (overflow exception detected at EX => flush the offending instruction)
  - Force a trap instruction into IF, e.g., fetch from 4000 0040hex by adding 4000 0040hex to the PC input MUX
  - Save the address of the offending instruction in EPC
- Multiple interrupts: use priority hardware to choose the earliest instruction to interrupt
- External interrupts: flexible in when to interrupt
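The three actions taken on an overflow can be summarized in a few lines. This is a rough sketch, not a model of the actual datapath: the machine state is a plain dict, the function name is hypothetical, and the three flush signals are simply recorded as flags.

```python
EXCEPTION_VECTOR = 0x40000040  # handler address given on the slide

def take_overflow_exception(state, offending_pc):
    """Apply the slide's steps when overflow is detected in EX."""
    # Save the address of the offending instruction.
    state["EPC"] = offending_pc
    # Zero the control signals of the offending and the following instructions.
    state["IF.Flush"] = state["ID.Flush"] = state["EX.Flush"] = True
    # Redirect fetch to the exception handler.
    state["PC"] = EXCEPTION_VECTOR
    return state

state = {"PC": 0x58, "EPC": 0, "IF.Flush": False, "ID.Flush": False, "EX.Flush": False}
take_overflow_exception(state, offending_pc=0x4C)  # addresses are illustrative
print(hex(state["PC"]), hex(state["EPC"]))  # 0x40000040 0x4c
```

Flushing EX as well as IF and ID keeps the overflowing add from writing its incorrect result to $1, which matters here because $1 is also one of its source operands.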

Pipeline with Exception [figure]