Designing a Multicycle Processor Arquitectura de Computadoras Arturo Díaz D PérezP Centro de Investigación n y de Estudios Avanzados del IPN adiaz@cinvestav.mx Arquitectura de Computadoras Multicycle-
Recap: Putting it All Together: A Single Cycle Processor op 6 Instr<3:26> RegDst busw Clk Main Control RegWr Rd Mux imm6 Instr<5:> Rt 5 5 Rs 5 Rw Ra Rb -bit Registers 6 ALUop RegDst ALUSrc : Rt busb Extender busa ExtOp Clk 3 npc_sel Mux ALUSrc Instruction Fetch Unit ALUctr func Instr<5:> 6 Data In ALU Clk Zero Instruction<3:> Rt ALU Control <2:25> WrEn Rs <6:2> MemWr Adr Data Memory ALUctr Rd <:5> 3 <:5> Imm6 Mux MemtoReg
The Truth Table for the Main Control op 6 Main Control RegDst ALUSrc : ALUop 3 func 6 ALU Control (Local) ALUctr 3 op R-type ori lw sw beq jump RegDst ALUSrc MemtoReg RegWrite MemWrite npc_sel Jump ExtOp ALUop (Symbolic) x R-type Or Add x x Add x x x Subtract x x x x xxx ALUop <2> x ALUop <> x ALUop <> x
PLA Implementation of the Main Control op<5>.. <> op<5>.. op<5>.. op<5>.. op<5>.. op<5>.. <> <> <> <> op<> R-type ori lw sw beq jump RegWrite ALUSrc RegDst MemtoReg MemWrite Branch Jump ExtOp ALUop<2> ALUop<> ALUop<> Arquitectura de Computadoras Multicycle- 4
Systematic Generation of Control Decode OPcode Control Logic / Store (PLA, ROM) Instruction Conditions Control Points microinstruction Datapath In our single-cycle processor, each instruction is realized by exactly one control command or microinstruction in general, the controller is a finite state machine microinstruction can also control sequencing (see later) Arquitectura de Computadoras Multicycle- 5
Outline of Today s s Lecture Recap: Single cycle processor Faster designs Multicycle Datapath Performance Analysis Multicycle Control Arquitectura de Computadoras Multicycle- 6
Abstract View of our single cycle processor op Main Control fun ALU control ALU Data Mem Next PC PC Instruction Fetch Register Fetch Ext Mem Access Reg. Wrt Result Store npc_sel Equal ExtOp ALUSrc ALUctr MemWr MemRd RegDst RegWr MemWr looks like a FSM with PC as state Arquitectura de Computadoras Multicycle- 7
What s s wrong with our CPI= processor? Arithmetic & Logical PC Inst Memory Reg File mux ALU mux setup setup Load PC Inst Memory Reg File mux ALU Data Mem mux Critical Path Store PC Inst Memory Reg File mux ALU Data Mem setup Branch PC Inst Memory Reg File cmp mux Long Cycle Time All instructions take as much time as the slowest Real memory is not as nice as our idealized memory cannot always get the job done in one (short) cycle Arquitectura de Computadoras Multicycle- 8
Memory Access Time Physics => fast memories are small (large memories are slow) address Storage Array selected word line storage cell bit line address decoder question: register file vs. memory => Use a hierarchy of memories Processor Cache proc. bus L2 Cache sense amps Arquitectura de Computadoras Multicycle- 9 mem. bus memory time-period 2-5 time-periods 2-3 time-periods
Reducing Cycle Time Cut combinational dependency graph and insert register / latch Do same work in two fast cycles, rather than one slow one May be able to short-circuit path and remove some components for some instructions! storage element Acyclic Combinational Logic storage element Acyclic Combinational Logic (A) storage element storage element Acyclic Combinational Logic (B) Arquitectura de Computadoras Multicycle- storage element
Basic Limits on Cycle Time Next address logic PC <= branch? PC + offset : PC + 4 Instruction Fetch InstructionReg <= Mem[PC] Register Access A <= R[rs] ALU operation R <= A + B Control Next PC PC Instruction Fetch npc_sel Operand Fetch ExtOp ALUSrc ALUctr MemRd MemWr RegDst RegWr MemWr Exec Mem Access Reg. File Data Mem Arquitectura de Computadoras Multicycle- Result Store
Partitioning the CPI= Datapath Add registers between smallest steps Exec Data Mem Next PC PC Instruction Fetch Operand Fetch Mem Access Reg. File Result Store npc_sel Equal ExtOp ALUSrc ALUctr MemWr MemRd RegDst RegWr MemWr Place enables on all registers Arquitectura de Computadoras Multicycle- 2
Example Multicycle Datapath Instruction Fetch Operand Fetch Next PC PC npc_sel Reg File Data Mem Result Store Mem Access IR Ext ALU Reg. File Equal ExtOp ALUSrc ALUctr MemWr MemRd MemToReg RegDst RegWr E A B S M Critical Path? Arquitectura de Computadoras Multicycle- 3
Recall: Step-by by-step Processor Design Step : ISA => Logical Register Transfers Step 2: Components of the Datapath Step 3: RTL + Components => Datapath Step 4: Datapath + Logical RTs => Physical RTs Step 5: Physical RTs => Control Arquitectura de Computadoras Multicycle- 4
Step 4: R-rtypeR (add, sub,...) Logical Register Transfer Physical Register Transfers Time inst Logical Register Transfers ADDU R[rd] < R[rs] + R[rt]; PC < PC + 4 inst Physical Register Transfers IR < MEM[pc] ADDU A< R[rs]; B < R[rt] S < A + B R[rd] < S; PC < PC + 4 E Next PC PC Inst. Mem IR Reg File A B Exec S Mem Access M Reg. File Data Mem Arquitectura de Computadoras Multicycle- 5
Step 4: Logical immed Logical Register Transfer Physical Register Transfers inst Logical Register Transfers ORI R[rt] < R[rs] OR ZExt(Im6); PC < PC + 4 inst Physical Register Transfers Time IR < MEM[pc] ORI A< R[rs]; B < R[rt] S < A or ZExt(Im6) R[rt] < S; PC < PC + 4 E Next PC PC Inst. Mem IR Reg File A B Exec S Mem Access M Reg. File Data Mem Arquitectura de Computadoras Multicycle- 6
Step 4 : Load Logical Register Transfer inst LW inst Logical Register Transfers R[rt] < MEM[R[rs] + SExt(Im6)]; PC < PC + 4 Physical Register Transfers IR < MEM[pc] Physical Register Transfers Time LW A< R[rs]; B < R[rt] S < A + SExt(Im6) M < MEM[S] R[rd] < M; PC < PC + 4 E Next PC PC Inst. Mem IR Reg File A B Exec S Mem Access M Reg. File Data Mem Arquitectura de Computadoras Multicycle- 7
Step 4 : Store Logical Register Transfer Physical Register Transfers Time inst SW inst SW Logical Register Transfers MEM[R[rs] + SExt(Im6)] < R[rt]; PC < PC + 4 Physical Register Transfers IR < MEM[pc] A< R[rs]; B < R[rt] S < A + SExt(Im6); MEM[S] < B PC < PC + 4 E Next PC PC Inst. Mem IR Reg File A B Exec S Mem Access M Reg. File Data Mem Arquitectura de Computadoras Multicycle- 8
Step 4 : Branch Logical Register Transfer Physical Register Transfers Time inst BEQ Logical Register Transfers if R[rs] == R[rt] then PC <= PC + 4+SExt(Im6) else PC <= PC + 4 inst Physical Register Transfers IR < MEM[pc] BEQ E< (R[rs] = R[rt]) if E then PC < PC + 4 else PC < PC+4+SExt(Im6) E Next PC PC Inst. Mem IR Reg File A B Exec S Mem Access M Reg. File Data Mem Arquitectura de Computadoras Multicycle- 9
Alternative datapath (book): Multiple Cycle Datapath Minimizes Hardware: memory, adder PCWr PC IorD Mux PCWrCond PCSrc BrWr Zero MemWr IRWr RegDst RegWr ALUSelA Target RAdr Ideal Memory WrAdr Din Dout Instruction Reg Rs Rt Rt Mux Rd 5 5 Mux Ra Rb Reg File Rw busa busw busb << 2 4 Mux 2 3 Mux Zero ALU ALU Control ALU Out Imm 6 ExtOp Extend MemtoReg ALUOp Arquitectura de Computadoras Multicycle- 2 ALUSelB
Our Control Model State specifies control points for Register Transfer Transfer occurs upon exiting state (same falling edge) inputs (conditions) Next State Logic Control State State X Register Transfer Control Points Depends on Input Output Logic outputs (control points) Arquitectura de Computadoras Multicycle- 2
Step 4 Control Specification for multicycle proc IR <= MEM[PC] instruction fetch A <= R[rs] B <= R[rt] decode / operand fetch R-type ORi LW SW BEQ S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC,Equal) Execute M <= MEM[S] MEM[S] <= B PC <= PC + 4 Memory R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4 Write-back
Traditional FSM Controller state op cond next state control points Truth Table Equal 6 next State control points 4 State datapath State op Arquitectura de Computadoras Multicycle- 23
Step 5 (datapath + state diagram control) Translate RTs into control points Assign states Then go build the controller Arquitectura de Computadoras Multicycle- 24
Mapping RTs to Control Points IR <= MEM[PC] imem_rd, IRen instruction fetch A <= R[rs] B <= R[rt] Aen, Ben, Een decode R-type S <= A fun B ALUfun, Sen ORi S <= A or ZX LW S <= A + SX SW S <= A + SX BEQ PC <= Next(PC,Equal) Execute R[rd] <= S PC <= PC + 4 RegDst, RegWr, PCen R[rt] <= S PC <= PC + 4 M <= MEM[S] R[rt] <= M PC <= PC + 4 MEM[S] <= B PC <= PC + 4 Memory Write-back
Assigning States IR <= MEM[PC] instruction fetch A <= R[rs] B <= R[rt] decode R-type S <= A fun B ORi S <= A or ZX LW S <= A + SX SW BEQ S <= A + SX PC <= Next(PC) Execute R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 M <= MEM[S] R[rt] <= M PC <= PC + 4 MEM[S] <= B PC <= PC + 4 Memory Write-back
Detailed Control Specification State Op field Eq Next IR PC Ops Exec Mem Write-Back en sel A B Ex Sr ALU S R W M M-R Wr Dst??????? BEQ x R-type x ori x LW x SW x -all same in Moore machine BEQ: xxxxxx xxxxxx R: xxxxxx x fun xxxxxx x ORi: xxxxxx x or xxxxxx x LW: xxxxxx x add xxxxxx x xxxxxx x SW: xxxxxx x add xxxxxx x
Performance Evaluation What is the average CPI? state diagram gives CPI for each instruction type workload gives frequency of each type Type CPI i for type Frequency CPI i x freqi i Arith/Logic 4 4%.6 Load 5 3%.5 Store 4 %.4 branch 3 2%.6 Average CPI:4. Arquitectura de Computadoras Multicycle- 28
Controller Design The state diagrams that arise define the controller for an instruction set processor are highly structured Use this structure to construct a simple microsequencer Control reduces to programming this very simple device microprogramming sequencer control datapath control microinstruction micro-pc sequencer Arquitectura de Computadoras Multicycle- 29
Example: Jump-Counter i i+ i op-code Map ROM None of above: Do nothing (for wait states) Counter zero inc load Arquitectura de Computadoras Multicycle- 3
Using a Jump Counter IR <= MEM[PC] inc instruction fetch A <= R[rs] decode B <= R[rt] load R-type ORi LW SW BEQ S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC) inc inc inc inc M <= MEM[S] MEM[S] <= B zero PC <= PC + 4 inc R[rd] <= S R[rt] <= S R[rt] <= M PC <= PC + 4 PC <= PC + 4 PC <= PC + 4 zero zero zero zero Execute Memory Write-back
Our Microsequencer taken Z I L datapath control Micro-PC op-code Map ROM Arquitectura de Computadoras Multicycle-
Microprogram Control Specification BEQ R: ORi: LW: SW: µpc Taken Next IR PC Ops Exec Mem Write-Back en sel A B Ex Sr ALU S R W M M-R Wr Dst? inc load zero zero x inc fun x zero x inc or x zero x inc add x inc x zero x inc add x zero Arquitectura de Computadoras Multicycle- 33
Mapping ROM R-type BEQ ori LW SW Arquitectura de Computadoras Multicycle- 34
Example: Controlling Memory PC addr Instruction Memory data Inst. Reg InstMem_rd IM_wait IR_en Arquitectura de Computadoras Multicycle- 35
Controller handles non-ideal memory IR <= MEM[PC] ~wait A <= R[rs] B <= R[rt] instruction fetch wait decode / operand fetch R-type ORi S <= A fun B S <= A or ZX LW S <= A + SX SW S <= A + SX BEQ PC <= Next(PC) Execute R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 M <= MEM[S] ~wait wait R[rt] <= M PC <= PC + 4 MEM[S] <= B ~wait wait PC <= PC + 4 Memory Write-back
Really Simple Time-State Control instruction fetch Execute decode R-type S <= A fun B ORi S <= A or ZX IR <= MEM[PC] ~wait A <= R[rs] B <= R[rt] LW S <= A + SX SW wait S <= A + SX BEQ write-back Memory R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 M <= MEM[S] wait R[rt] <= M PC <= PC + 4 MEM[S] <= B PC <= PC + 4 wait PC <= Next(PC)
Time-state Control Path Local decode and control at each stage Valid Data Mem Mem Access Exec Reg. File A B S PC Reg File Equal M Next PC Inst. Mem IR Dcd Ctrl IRex Ex Ctrl IRmem Mem Ctrl IRwb WB Ctrl Arquitectura de Computadoras Multicycle- 38
Overview of Control Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique. Initial Representation Finite State Diagram Microprogram Sequencing Control Explicit Next State Microprogram counter Function + Dispatch ROMs Logic Representation Logic Equations Truth Tables Implementation PLA ROM Technique hardwired control microprogrammed control Arquitectura de Computadoras Multicycle- 39
Summary Disadvantages of the Single Cycle Processor Long cycle time Cycle time is too long for all instructions except the Load Multiple Cycle Processor: Divide the instructions into smaller steps Execute each step (instead of the entire instruction) in one cycle Partition datapath into equal size chunks to minimize cycle time ~ levels of logic between latches Follow same 5-step method for designing real processor Arquitectura de Computadoras Multicycle- 4
Summary (cont d) Control is specified by finite state diagram Specialize state-diagrams easily captured by microsequencer simple increment & branch fields datapath control fields Control design reduces to Microprogramming Control is more complicated with: complex instruction sets restricted datapaths (see the book) Simple Instruction set and powerful datapath simple control could try to reduce hardware (see the book) rather go for speed => many instructions at once! Arquitectura de Computadoras Multicycle- 4