ECE68 Computer Organization and Architecture Designing a Multiple Cycle Processor ECE68 Multipath. -- op 6 Instr<:6> RegDst A Single Cycle Processor busw RegWr Main imm6 Instr<:> Rb -bit Registers 6 op RegDst Src : busb Extender Branch Jump busa ExtOp func Instr<:> 6 Src Instruction Fetch Unit ctr Data In ECE68 Multipath. -- Instruction<:> <:> <6:> MemWr WrEn Adr Data ctr <:> <:> Imm6 MemtoReg
Concepts Datapath Components Connections Functional/Combinational Sate//Sequential Bus for data Signals instruction signals ECE68 Multipath. -- Instruction Fetch Unit Why does the use -bit counter? imm6 Instruction<:> 6 <:8> Target Instruction<:> Adder SignExt 6 Adder Jump Addr<:> Addr<:> Instruction Instruction<:> Branch ECE68 Multipath. --
The Main op<>.. op<>.. op<>.. op<>.. op<>.. op<>.. <> <> <> <> <> op<> R-type ori lw sw beq jump RegWrite Src RegDst MemtoReg MemWrite Branch Jump ExtOp op<> op<> op<> ECE68 Multipath. -- Drawbacks of this Single Cycle Processor Long cycle time: Cycle time must be long enough for the load instruction: - s Clock -to-q + - Instruction Access Time + - Register File Access Time + - Delay (address calculation) + - Data Access Time + - Register File Setup Time + - Clock Skew Cycle time is much longer than needed for all other instructions. Examples: R-type instructions do not require data memory access Jump does not require operation nor data memory access ECE68 Multipath.6 --
Overview of a Multiple Cycle Implementation The root of the single cycle processor s problems: The cycle time has to be long enough for the slowest instruction Solution: Break the instruction into smaller steps Execute each step (instead of the entire instruction) in one cycle - Cycle time: time it takes to execute the longest step, not the longest instruction - Keep all the steps to have similar length This is the essence of the multiple cycle processor The advantages of the multiple cycle processor: Cycle time is much shorter Different instructions take different number of cycles to complete - Load takes five cycles - Jump only takes three cycles Allows a functional unit to be used more than once per instruction - Adder + - Instruction mem + Data mem ECE68 Multipath.7 -- The Five Steps of a Load Instruction,,, Op, Func ctr Old Value Instruction Fetch Instr Decode / Reg. Fetch -to-q New Value Old Value Old Value Address Instruction Access Time New Value Delay through Logic New Value Data Reg Wr ExtOp Old Value New Value Src Old Value New Value RegWr Old Value New Value Register File Access Time busa Old Value New Value Delay through Extender & busb Old Value New Value Delay Address Old Value New Value Data Access Time busw Old Value New Register File Write Time ECE68 Multipath.8 --
Register File & Write Timing: vs. Reality In previous lectures for -cycle machine, register file and memory are simplified: Write happens at the clock tick Address, data, and write enable must be stable one set-up time before the clock tick WrEn Adr In real life for m-cycle machines: Neither register file nor ideal memory has the clock input The write path is a combinational logic delay path: - Write enable goes to and Din settles down - write access delay - Din is written into mem[address] Important: Address and Data must be stable BEFORE Write Enable goes to WrEn Adr ECE68 Multipath.9 -- ce Condition Between Address and Write Enable What is race condition? This real (no clock input) register file may not work reliably in the single cycle processor because: We cannot guarantee will be stable BEFORE RegWr = There is a race between (address) and RegWr (write enable) RegWr Rb busa Reg File busb busw The real (no clock input) memory may not work reliably in the single cycle processor because: We cannot guarantee Address will be stable BEFORE WrEn = There is a race between Adr and WrEn WrEn Adr ECE68 Multipath. --
How to Avoid this ce Condition? Solution for the multiple cycle implementation: Make sure Address is stable by the end of Cycle N Assert Write Enable signal ONE cycle later at Cycle (N + ) Address cannot change until Write Enable is disasserted ECE68 Multipath. -- Dual-Port Dual Port Independent Read (, Dout) and Write (WAdr, Din) ports Read and write (to different location) can occur at the same instruction cycle Read Port is a combinational path: Read Address Valid --> Read Access Delay --> Data Out Valid Write Port is also a combinational path: MemWrite = --> Write Access Delay --> Data In is written into location[wradr] MemWr <:> <:> WrAdr ECE68 Multipath. --
Instruction Fetch Cycle: In the Beginning Every cycle begins right AFTER the clock tick: mem[] <:> + You are here! One Logic Clock Cycle Wr=? MemWr=? WrAdr Dout Din IRWr=? ECE68 Multipath. -- op=? Instruction Fetch Cycle: The End Every cycle ends AT the next clock tick (storage element updates): IR mem[] <:> <:> + One Logic Clock Cycle Wr= You are here! MemWr= WrAdr Din Dout IRWr= Op = Add ECE68 Multipath. --
Instruction Fetch Cycle: Overall Picture Why do we need Wr Op=Add unlike in -cycle machine? : Wr, IRWr x: WrCond RegDst, MemR Others: s Wr= WrCond=x Src= BrWr= IorD= MemWr= IRWr= SelA= Target WrAdr Ifetch busa busb SelB= Op=Add ECE68 Multipath. -- Register Fetch / Instruction Decode busa RegFile[rs] ; busb RegFile[rt] ; is not being used: ctr = xx Wr= WrCond= IorD=x MemWr= IRWr= WrAdr Go to the Op Func 6 6 RegDst=x Imm 6 RegWr= Rb busa Reg File busw busb Src=x SelA=x SelB=xx Op=xx ECE68 Multipath.6 --
Register Fetch / Instruction Decode (Continue) busa Reg[rs] ; busb Reg[rt] ; Target + SignExt(Imm6)* (speculative calculation) Wr= WrCond= IorD=x MemWr= IRWr= RegDst=x Beq ype Ori : WrAdr Op 6 Func 6 Imm 6 ExtOp= ECE68 Multipath.7 -- Extend Rfetch/Decode Op=Add Why can we not further send target : BrWr, ExtOp address to? SelB= x: RegDst, Src IorD, MemtoReg Others: s Src=x BrWr= RegWr= SelA= Target Rb busa Reg File busw busb << SelB= Op=Add Branch Completion if (busa == busb) Target Wr= WrCond= IorD=x MemWr= IRWr= WrAdr RegDst=x Imm 6 ExtOp=x BrComplete Op=Sub SelB= x: IorD, MemReg RegDst, ExtOp : WrCond SelA Src ECE68 Multipath.8 -- Extend RegWr= Rb busa Reg File busw busb << Src= SelA= SelB= BrWr= Target Op=Sub
Instruction Decode: We have a R-type! Next Cycle: R-type Execution Wr= WrCond= IorD=x MemWr= IRWr= RegDst=x WrAdr Beq ype Ori : Op 6 Func 6 Imm 6 ExtOp= Extend RegWr= Rb busa Reg File busw busb << Src=x SelA= SelB= BrWr= Target Op=Add ECE68 Multipath.9 -- R-type Execution Output busa op busb RExec : RegDst SelA SelB= Op=ype x: Src, IorD Wr= WrCond= MemtoReg ExtOp Src=x BrWr= IorD=x MemWr= IRWr= RegDst= RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Why not set RegDst= at next cycle? Imm Extend 6 ExtOp=x MemtoReg=x Op=ype SelB= ECE68 Multipath. --
R-type Completion Rfinish Op=ype R[rd] <- Output : RegDst, RegWr sela SelB= Wr= WrCond= x: IorD, Src ExtOp Src=x BrWr= IorD=x MemWr= IRWr= RegDst= RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Imm Extend 6 ExtOp=x MemtoReg= Op=ype SelB= ECE68 Multipath. -- A Multiple Cycle Delay Path There is no register to save the results between: Register Fetch: busa Reg[rs] ; busb Reg[rt] R-type Execution: output busa op busb R-type Completion: Reg[rd] output Register here to save outputs of Rfetch? sela IRWr= busa Rb Reg File busw busb selb Op Register here to save outputs of RExec? ECE68 Multipath. --
A Multiple Cycle Delay Path (Continue) Register is NOT needed to save the outputs of Register Fetch: IRWr = : busa and busb will not change after Register Fetch Register is NOT needed to save the outputs of R-type Execution: busa and busb will not change after Register Fetch signals SelA, SelB, and Op will not change after R-type Execution Consequently output will not change after R-type Execution In theory, you need a register to hold a signal value if: () The signal is computed in one clock cycle and used in another. () AND the inputs to the functional block that computes this signal can change before the signal is written into a state element. You can save a register if Cond is true BUT Cond is false: But in practice, this will introduce a multiple cycle delay path: - A logic delay path that takes multiple cycles to propagate from one storage element to the next storage element ECE68 Multipath. -- Pros and Cons of a Multiple Cycle Delay Path A -cycle path example: IR (storage) Reg File Read Reg File Write (storage) Advantages: Register savings We can share time among cycles: - If takes longer than one cycle, still a OK as long as the entire path takes less than cycles to finish busa Rb Reg File busw busb selb ECE68 Multipath. --
Pros and Cons of a Multiple Cycle Delay Path (Continue) Disadvantage: Static timing analyzer, which ONLY looks at delay between two storage elements, will report this as a timing violation You have to ignore the static timing analyzer s warnings But you may end up ignoring real timing violations Always TRY to put in registers between cycles to avoid MCDP busa Rb Reg File busw busb selb ECE68 Multipath. -- Instruction Decode: We have an Ori! Next Cycle: Ori Execution Wr= WrCond= IorD=x MemWr= IRWr= RegDst=x WrAdr Beq ype Ori : Intruction Reg Op 6 Func 6 Imm 6 ExtOp= Extend RegWr= Rb busa Reg File busw busb << Src=x SelA= SelB= BrWr= Target Op=Add ECE68 Multipath.6 --
Ori Execution output busa or Ext[Imm6] Op=Or OriExec : SelA SelB= x: MemtoReg IorD, Src Wr= WrCond= Src=x BrWr= IorD=x MemWr= IRWr= RegDst= RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Imm Extend 6 ExtOp= MemtoReg=x Op=Or SelB= ECE68 Multipath.7 -- Ori Completion OriFinish Op=Or Reg[rt] output x: IorD, Src SelB= : SelA Wr= WrCond= RegWr Src=x BrWr= IorD=x MemWr= IRWr= RegDst= RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Imm Extend 6 ExtOp= MemtoReg= Op=Or SelB= ECE68 Multipath.8 --
Instruction Decode: We have a Access! Next Cycle: Address Calculation Wr= WrCond= IorD=x MemWr= IRWr= Beq ype Ori : WrAdr Op 6 Func 6 RegDst=x Imm 6 ExtOp= ECE68 Multipath.9 -- Extend RegWr= Rb busa Reg File busw busb << Src=x SelA= SelB= BrWr= Target Op=Add Address Calculation AdrCal : ExtOp SelA SelB= output busa + SignExt[Imm6] Op=Add x: MemtoReg Src Wr= WrCond= Src=x BrWr= IorD=x MemWr= IRWr= RegDst=x RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Imm Extend 6 ExtOp= MemtoReg=x Op=Add SelB= ECE68 Multipath. --
Access for Store : ExtOp SWmem MemWr SelA mem[ output] busb SelB= Op=Add Wr= WrCond= x: Src,RegDst MemtoReg Src=x BrWr= IorD=x MemWr= IRWr= RegDst=x RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Keep SelA, SelB, Op same as previous cycle! Imm Extend 6 ExtOp= MemtoReg=x Op=Add SelB= ECE68 Multipath. -- Access for Load : ExtOp LWmem SelA, IorD SelB= Mem Dout mem[ output] Op=Add x: MemtoReg Src Wr= WrCond= Src=x BrWr= IorD= MemWr= IRWr= RegDst= RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Imm Extend 6 ExtOp= MemtoReg=x Op=Add SelB= ECE68 Multipath. --
Write Back for Load LWwr : SelA RegWr, ExtOp Reg[rt] Mem Dout MemtoReg SelB= Op=Add Wr= WrCond= x: Src IorD Src=x BrWr= IorD=x MemWr= IRWr= RegDst= RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Imm Extend 6 ExtOp= MemtoReg= Op=Add SelB= ECE68 Multipath. -- Putting it all together: Multiple Cycle Datapath Wr WrCond Src BrWr IorD MemWr IRWr RegDst RegWr SelA Target WrAdr Rb busa Reg File busw busb << Imm 6 ExtOp Extend MemtoReg SelB Op ECE68 Multipath. --
Summary Disadvantages of the Single Cycle Processor Long cycle time Cycle time is too long for all instructions except the Load Multiple Cycle Processor: Divide the instructions into smaller steps Execute each step (instead of the entire instruction) in one cycle Do NOT confuse Multiple Cycle Processor with Multiple Cycle Delay Path Multiple Cycle Processor executes each instruction in multiple clock cycles Multiple Cycle Delay Path: a combinational logic path between two storage elements that takes more than one clock cycle to complete It is possible (desirable) to build a MC Processor without MCDP: Use a register to save a signal s value whenever a signal is generated in one clock cycle and used in another cycle later ECE68 Multipath. -- Putting it all together: State Diagram Ifetch Rfetch/Decode BrComplete Op=Add Op=Add Op=Sub : BrWr, ExtOp SelB= : Wr, IRWr x: WrCond SelB= beq x: IorD, MemReg RegDst, MemR x: RegDst, Src RegDst, ExtOp Others: s IorD, MemtoReg : WrCond SelA Others: s Src lw or sw Ori ype AdrCal : ExtOp LWmem : RegDst RExec OriExec : ExtOp SelA, IorD SelA SelA Op=Or SelB= SelB= SelB= : SelA Op=ype Op=Add lw Op=Add SelB= x: Src, IorD x: MemtoReg x: MemtoReg MemtoReg x: MemtoReg Src Src ExtOp IorD, Src sw Rfinish OriFinish LWwr : ExtOp SWMem : SelA MemWr Op=ype Op=Or RegWr, ExtOp SelA : RegDst, RegWr MemtoReg x: IorD, Src SelB= sela SelB= Op=Add SelB= Op=Add SelB= x: Src,RegDst x: IorD, Src : SelA x: Src MemtoReg IorD ExtOp RegWr ECE68 Multipath.6 --
Where to get more information? Next two lectures: Multiple Cycle ler: Appendix C of your text book. Microprogramming: Section. of your text book. D. Patterson, Microprograming, Scientific America, March 98. D. Patterson and D. Ditzel, The Case for the Reduced Instruction Set Computer, Computer Architecture News 8, 6 (October, 98) Homework. See the website. ECE68 Multipath.7 --