CPU Design Steps

1. Analyze instruction set operations using independent RTN => datapath requirements.
2. Select a set of datapath components and establish the clock methodology.
3. Assemble a datapath meeting the requirements.
4. Analyze the implementation of each instruction to determine the setting of control points that effects the register transfer.
5. Assemble the control logic.

#1 Lec # 5 Winter 2000 12-20-2000
CPU Design & Implementation Process

- Bottom-up design: assemble components in the target technology to establish critical timing.
- Top-down design: specify component behavior from high-level requirements.
- Iterative refinement: establish a partial solution, then expand and improve it.

[Figure: design hierarchy. Instruction Set Architecture => processor (datapath, control) => Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer => cells => gates]
Single Cycle MIPS Datapath: CPI = 1, Long Clock Cycle

[Figure: single-cycle MIPS datapath. The instruction memory supplies Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. Blocks: next-PC logic (PC, two adders, PC Ext, mux), register file (Ra, Rb, Rw, busA, busB, busW), extender, ALU, data memory. Control signals: npc_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg. The datapath returns Equal.]
Drawback of Single Cycle Processor

- Long cycle time.
- All instructions must take as much time as the slowest: the cycle time needed for a load is longer than any other instruction requires.
- Real memory is not as well-behaved as idealized memory: a data access cannot always complete in one (short) cycle.
Abstract View of Single Cycle CPU

[Figure: abstract view of the single-cycle CPU. Main control (input: op) and ALU control (input: fun) drive npc_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, and RegWr. The datapath proceeds through Next PC / PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), and Result Store (Reg. Wrt), and returns Equal to the control.]
Single Cycle Instruction Timing

- Arithmetic & Logical: PC -> Inst Memory -> Reg File -> mux -> ALU -> mux -> setup
- Load (critical path): PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem -> mux
- Store: PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
- Branch: PC -> Inst Memory -> Reg File -> cmp -> mux -> setup
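To make the timing comparison concrete, the sketch below sums component delays along each of the four paths and picks the slowest. The delay values are invented for illustration and do not come from the lecture; only the path lists follow the slide.

```python
# Hypothetical component delays in ns (illustrative values, NOT from the lecture).
DELAY_NS = {
    "PC": 1, "Inst Memory": 10, "Reg File": 5, "mux": 1,
    "ALU": 8, "Data Mem": 10, "cmp": 3, "setup": 1,
}

# Component sequence on each instruction class's path, as listed on the slide.
PATHS = {
    "Arithmetic & Logical": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "mux", "setup"],
    "Load": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem", "mux"],
    "Store": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem"],
    "Branch": ["PC", "Inst Memory", "Reg File", "cmp", "mux", "setup"],
}


def path_delay(path):
    """Total delay of one path: the sum of its component delays."""
    return sum(DELAY_NS[component] for component in path)


# Delay of every instruction class under the assumed component delays.
delays = {name: path_delay(path) for name, path in PATHS.items()}

# In a single-cycle design the clock must cover the slowest path.
critical = max(delays, key=delays.get)
single_cycle_time = delays[critical]
```

With these assumed delays the load path is the longest, so every instruction pays the load's cycle time.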
Reducing Cycle Time: Multi-Cycle Design

- Cut the combinational dependency graph by inserting registers / latches.
- The same work is done in two or more fast cycles, rather than one slow cycle.

[Figure: storage element -> acyclic combinational logic -> storage element, transformed into storage element -> acyclic combinational logic (A) -> storage element -> acyclic combinational logic (B) -> storage element]
Clock Cycle Time & Critical Path

- Critical path: the slowest path between any two storage devices.
- Cycle time is a function of the critical path. It must be greater than:
  Clock-to-Q + Longest Path through the Combinational Logic + Setup
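The bound above can be checked with a few lines of arithmetic. This is a minimal sketch: the function name and all delay numbers are hypothetical, chosen only to show how splitting one long block of logic into two stages shortens the clock.

```python
def min_cycle_time(clk_to_q, logic_delays, setup):
    """Lower bound on the clock period.

    logic_delays holds the combinational delay of each register-to-register
    stage; the longest stage is the critical path.
    """
    return clk_to_q + max(logic_delays) + setup


# Hypothetical numbers in ns (not from the lecture).
CLK_TO_Q = 1
SETUP = 1

# One slow cycle: a single 30 ns block of combinational logic.
single = min_cycle_time(CLK_TO_Q, [30], SETUP)

# The same logic cut into blocks (A) and (B) by an inserted register.
split = min_cycle_time(CLK_TO_Q, [16, 14], SETUP)
```

Under these assumptions the single-stage clock must exceed 32 ns, while the two-stage clock only needs to exceed 18 ns. The work takes two cycles (36 ns in total) because each stage pays the register overhead, which is the cost of the shorter cycle.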
Instruction Processing Cycles

Common steps for all instructions:
- Instruction Fetch: obtain the instruction from program storage.
- Next Instruction: update the program counter to the address of the next instruction.
- Instruction Decode: determine the instruction type; obtain operands from registers.

Steps that depend on the instruction:
- Execute: compute the result value or status.
- Result Store: store the result in a register or memory if needed (usually called Write Back).
Partitioning The Single Cycle Datapath

Add registers between the smallest steps.

[Figure: single-cycle datapath partitioned into Instruction Fetch, Operand Fetch, Exec, Mem Access, and Result Store steps, with Next PC logic and control signals npc_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr]
Example Multi-cycle Datapath

[Figure: multi-cycle datapath. Next PC / PC -> Instruction Fetch -> IR -> Operand Fetch (Reg File) -> A, B -> Ext / ALU -> R -> Mem Access (Data Mem) -> M -> Result Store (Reg. File). Control signals: npc_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr. The datapath returns Equal.]

Registers added:
- IR: instruction register.
- A, B: two registers to hold operands read from the register file.
- R (or ALUOut): holds the output of the ALU.
- M (or Memory Data Register, MDR): holds data read from data memory.
Operations In Each Cycle

Instruction Fetch:
- All instruction classes: IR ← Mem[PC]

Instruction Decode:
- R-Type: A ← R[rs]; B ← R[rt]
- Logic Immediate: A ← R[rs]
- Load: A ← R[rs]
- Store: A ← R[rs]; B ← R[rt]
- Branch: A ← R[rs]; B ← R[rt]

Execution:
- R-Type: R ← A + B
- Logic Immediate: R ← A OR ZeroExt[imm16]
- Load: R ← A + SignExt(imm16)
- Store: R ← A + SignExt(imm16)
- Branch: if Equal = 1, PC ← PC + 4 + (SignExt(imm16) x 4); else PC ← PC + 4

Memory:
- Load: M ← Mem[R]
- Store: Mem[R] ← B

Write Back:
- R-Type: R[rd] ← R
- Logic Immediate: R[rt] ← R
- Load: R[rt] ← M
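The per-cycle register transfers can be traced with a small behavioral sketch. It is not the lecture's implementation: the function names are hypothetical, memory is a plain dictionary, values are unbounded Python integers rather than 32-bit words, and the branch entry is read using the usual MIPS convention (PC + 4 plus the shifted, sign-extended offset).

```python
def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_instruction(kind, rs, rt, rd, imm16, regs, mem, pc):
    """Apply the register transfers for one instruction, cycle by cycle.

    Returns (cycles_used, next_pc); regs and mem are updated in place.
    kind is one of "rtype", "ori", "load", "store", "branch".
    """
    if kind not in {"rtype", "ori", "load", "store", "branch"}:
        raise ValueError(f"unknown instruction class: {kind}")

    cycles = 1                     # Instruction Fetch: IR <- Mem[PC]
    a, b = regs[rs], regs[rt]      # Instruction Decode: A <- R[rs], B <- R[rt]
    cycles += 1                    # (B is read for every class to keep the sketch short)
    next_pc = pc + 4

    cycles += 1                    # Execution
    if kind == "branch":
        if a == b:                 # Equal = 1
            next_pc = pc + 4 + sign_ext16(imm16) * 4
        return cycles, next_pc
    if kind == "rtype":
        r = a + b                  # R <- A + B
    elif kind == "ori":
        r = a | (imm16 & 0xFFFF)   # R <- A OR ZeroExt[imm16]
    else:
        r = a + sign_ext16(imm16)  # load/store address

    if kind == "load":
        m = mem[r]                 # Memory: M <- Mem[R]
        regs[rt] = m               # Write Back: R[rt] <- M
        cycles += 2
    elif kind == "store":
        mem[r] = b                 # Memory: Mem[R] <- B
        cycles += 1
    elif kind == "rtype":
        regs[rd] = r               # Write Back: R[rd] <- R
        cycles += 1
    else:
        regs[rt] = r               # Write Back: R[rt] <- R
        cycles += 1
    return cycles, next_pc


# Example: load R[2] from address R[1] + 4.
regs = {0: 0, 1: 100, 2: 0, 3: 5}
mem = {104: 7}
load_result = run_instruction("load", rs=1, rt=2, rd=0, imm16=4, regs=regs, mem=mem, pc=0)
```

In this sketch a load uses five cycles, R-type, immediate, and store instructions use four, and a branch uses three.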
Finite State Machine (FSM) Control Model

- The state specifies the control points for the register transfer.
- The transfer occurs upon exiting the state (same falling edge).

[Figure: FSM block diagram. Inputs (conditions) feed the next-state logic; the control state register holds the current state; the output logic produces the outputs (control points). Each state X is labeled with its register transfer and control points, and the next state depends on the inputs.]
Control Specification For Multi-cycle CPU: Finite State Machine (FSM)

Instruction fetch:
- IR ← MEM[PC]

Decode / operand fetch:
- A ← R[rs]; B ← R[rt]

Execute:
- R-type: R ← A fun B
- ORi: R ← A or ZX
- LW: R ← A + SX
- SW: R ← A + SX
- BEQ & Equal: PC ← PC + SX || 00
- BEQ & ~Equal (branch not taken)

Memory:
- LW: M ← MEM[R]
- SW: MEM[R] ← B, then to instruction fetch

Write-back:
- R-type: R[rd] ← R, then to instruction fetch
- ORi: R[rt] ← R, then to instruction fetch
- LW: R[rt] ← M, then to instruction fetch
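Traditional FSM Controller

[Figure: truth (transition) table with columns state, op, cond, next state, and control points. Its inputs are the 4-bit State, the 6-bit op field, and the Equal condition from the datapath (11 input bits); its outputs are the next State and the control points sent to the datapath.]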
Traditional FSM Controller

datapath + state diagram => control

- Translate RTN statements into control points.
- Assign states.
- Implement the controller.
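Mapping RTNs To Control Points: Examples & State Assignments

State assignments:
- 0000 instruction fetch: IR ← MEM[PC]
- 0001 decode / operand fetch: A ← R[rs]; B ← R[rt]
- 0100 execute, R-type: R ← A fun B
- 0110 execute, ORi: R ← A or ZX
- 1000 execute, LW: R ← A + SX
- 1011 execute, SW: R ← A + SX
- 0011 execute, BEQ & ~Equal
- 0010 execute, BEQ & Equal: PC ← PC + SX || 00
- 1001 memory, LW: M ← MEM[S]
- 1100 memory, SW: MEM[S] ← B, then to instruction fetch (state 0000)
- 0101 write-back, R-type: R[rd] ← R, then to instruction fetch (state 0000)
- 0111 write-back, ORi: R[rt] ← R, then to instruction fetch (state 0000)
- 1010 write-back, LW: R[rt] ← M, then to instruction fetch (state 0000)

Example control points:
- Instruction fetch: imem_rd, IRen
- Decode / operand fetch: Aen, Ben
- R-type execute: ALUfun, Sen
- R-type write-back: RegDst, RegWr, PCen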
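Detailed Control Specification

A dash (-) marks an entry the table leaves unspecified.

                            IR   PC       Ops   Exec          Mem      Write-Back
State  Op field  Eq  Next   en   en  sel  A  B  Ex Sr ALU S   R  W  M  M-R Wr Dst
0000   ??????    ?   0001   1    -   -    -  -  -  -  -   -   -  -  -  -   -  -
0001   BEQ       0   0011   -    -   -    1  1  -  -  -   -   -  -  -  -   -  -
0001   BEQ       1   0010   -    -   -    1  1  -  -  -   -   -  -  -  -   -  -
0001   R-type    x   0100   -    -   -    1  1  -  -  -   -   -  -  -  -   -  -
0001   ori       x   0110   -    -   -    1  1  -  -  -   -   -  -  -  -   -  -
0001   LW        x   1000   -    -   -    1  1  -  -  -   -   -  -  -  -   -  -
0001   SW        x   1011   -    -   -    1  1  -  -  -   -   -  -  -  -   -  -
0010   xxxxxx    x   0000   -    1   1    -  -  -  -  -   -   -  -  -  -   -  -
0011   xxxxxx    x   0000   -    1   0    -  -  -  -  -   -   -  -  -  -   -  -
0100   xxxxxx    x   0101   -    -   -    -  -  0  1  fun 1   -  -  -  -   -  -
0101   xxxxxx    x   0000   -    1   0    -  -  -  -  -   -   -  -  -  0   1  1
0110   xxxxxx    x   0111   -    -   -    -  -  0  0  or  1   -  -  -  -   -  -
0111   xxxxxx    x   0000   -    1   0    -  -  -  -  -   -   -  -  -  0   1  0
1000   xxxxxx    x   1001   -    -   -    -  -  1  0  add 1   -  -  -  -   -  -
1001   xxxxxx    x   1010   -    -   -    -  -  -  -  -   -   1  0  0  -   -  -
1010   xxxxxx    x   0000   -    1   0    -  -  -  -  -   -   -  -  -  1   1  0
1011   xxxxxx    x   1100   -    -   -    -  -  1  0  add 1   -  -  -  -   -  -
1100   xxxxxx    x   0000   -    1   0    -  -  -  -  -   -   0  1  -  -   -  -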
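The next-state columns of this specification translate directly into a lookup. The sketch below encodes only the State / Op / Eq / Next part of the table; the function and variable names are hypothetical, and the control-point outputs are left out.

```python
# Next state out of decode (0001) for the non-branch opcodes.
NEXT_FROM_DECODE = {"R-type": "0100", "ori": "0110", "LW": "1000", "SW": "1011"}

# States whose successor does not depend on the opcode or on Equal.
UNCONDITIONAL = {
    "0000": "0001",
    "0010": "0000", "0011": "0000",
    "0100": "0101", "0101": "0000",
    "0110": "0111", "0111": "0000",
    "1000": "1001", "1001": "1010", "1010": "0000",
    "1011": "1100", "1100": "0000",
}


def next_state(state, op, equal):
    """Return the next state for the current state, opcode class, and Equal flag."""
    if state == "0001":
        if op == "BEQ":
            return "0010" if equal else "0011"
        return NEXT_FROM_DECODE[op]
    return UNCONDITIONAL[state]


def trace(op, equal=False):
    """List the states one instruction visits, starting at fetch (0000)."""
    states = ["0000"]
    while True:
        following = next_state(states[-1], op, equal)
        if following == "0000":   # back at instruction fetch: instruction done
            return states
        states.append(following)


lw_states = trace("LW")
```

The length of each trace is the number of cycles that instruction class takes: five states for LW, four for R-type, ori, and SW, and three for BEQ.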
Alternative Multiple Cycle Datapath (In Textbook)

Minimizes hardware: 1 memory, 1 adder.

[Figure: textbook multi-cycle datapath. A single ideal memory (RAdr, WrAdr, Din, Dout) is shared by instructions and data; the Instruction Reg feeds the register file (Ra, Rb, Rw, busA, busB, busW); a single ALU with ALU Control produces Zero and feeds the Target and ALU Out registers; Imm16 passes through Extend and << 2. Control signals: PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg.]
Alternative Multiple Cycle Datapath (In Textbook)

- Shared instruction/data memory unit.
- A single ALU shared among instructions.
- Shared units require additional or widened multiplexors.
- Temporary registers hold data between clock cycles of the instruction. Additional registers: Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut.
Operations In Each Cycle

Instruction Fetch:
- All instruction classes: IR ← Mem[PC]

Instruction Decode:
- All instruction classes: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)

Execution:
- R-Type: ALUout ← A + B
- Logic Immediate: ALUout ← A OR ZeroExt[imm16]
- Load: ALUout ← A + SignExt(imm16)
- Store: ALUout ← A + SignExt(imm16)
- Branch: if Equal = 1, PC ← ALUout

Memory:
- Load: M ← Mem[ALUout]
- Store: Mem[ALUout] ← B

Write Back:
- R-Type: R[rd] ← ALUout
- Logic Immediate: R[rt] ← ALUout
- Load: R[rt] ← M
High-Level View of Finite State Machine Control

- The first steps are independent of the instruction class.
- Then comes a series of sequences that depend on the instruction opcode.
- Then control returns to fetch a new instruction.
- Each box in the diagram represents one or several states.
Instruction Fetch and Decode FSM States

[Figure: state diagram for instruction fetch and decode]

Load/Store Instructions FSM States

[Figure: state diagram for load and store instructions]

R-Type Instructions FSM States

[Figure: state diagram for R-type instructions]

Branch Instruction Single State / Jump Instruction Single State

[Figure: single-state diagrams for the branch and jump instructions]
Finite State Machine (FSM) Specification

Instruction fetch:
- 0000: IR ← MEM[PC]; PC ← PC + 4

Decode:
- 0001: A ← R[rs]; B ← R[rt]; ALUout ← PC + SX

Execute:
- 0100 R-type: ALUout ← A fun B
- 0110 ORi: ALUout ← A or ZX
- 1000 LW: ALUout ← A + SX
- 1011 SW: ALUout ← A + SX
- 0010 BEQ: if A = B then PC ← ALUout

Memory:
- 1001 LW: M ← MEM[ALUout]
- 1100 SW: MEM[ALUout] ← B, then to instruction fetch

Write-back:
- 0101 R-type: R[rd] ← ALUout, then to instruction fetch
- 0111 ORi: R[rt] ← ALUout, then to instruction fetch
- 1010 LW: R[rt] ← M, then to instruction fetch
MIPS Multi-cycle Datapath Performance Evaluation

What is the average CPI?
- The state diagram gives the CPI for each instruction type.
- The workload below gives the frequency of each type.

Type          CPI_i for type   Frequency   CPI_i x freq_i
Arith/Logic   4                40%         1.6
Load          5                30%         1.5
Store         4                10%         0.4
Branch        3                20%         0.6
Average CPI:                               4.1

This is better than CPI = 5, the value if all instructions took the same number of clock cycles (5).
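The average is a frequency-weighted sum of the per-type CPIs. A short sketch of that arithmetic, using the slide's workload (the variable and function names are hypothetical):

```python
# (CPI for the type, fraction of executed instructions), taken from the slide.
WORKLOAD = {
    "Arith/Logic": (4, 0.40),
    "Load": (5, 0.30),
    "Store": (4, 0.10),
    "Branch": (3, 0.20),
}


def average_cpi(workload):
    """Weighted average: sum of CPI_i x freq_i over all instruction types."""
    return sum(cpi * freq for cpi, freq in workload.values())


avg = average_cpi(WORKLOAD)

# Relative gain over a design where every instruction takes 5 cycles,
# assuming the same clock period and instruction count.
speedup_vs_fixed_5 = 5 / avg
```

The result is 4.1 cycles per instruction for this mix, so the gain over a fixed 5-cycle design is modest (about 1.22x) and depends on how load-heavy the workload is.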