CS 152 Computer Architecture and Engineering Lecture 1: Designing a Multicycle Processor October 1, 1997 Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/ cs 152 L1 Multicycle.1
Recap: Processor Design is a Process
- Bottom-up: assemble components in the target technology to establish critical timing
- Top-down: specify component behavior from high-level requirements
- Iterative refinement: establish a partial solution, then expand and improve it
[Figure: design hierarchy. Instruction Set Architecture => processor (datapath, control) => Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer => Cells, Gates]
Recap: A Single Cycle Datapath
- We have everything except the control signals (underlined in the figure)
- Today's lecture will show you how to generate the control signals
[Figure: single-cycle datapath with Instruction Fetch Unit, 32 32-bit Registers (Rw, Ra, Rb, busA, busB, busW), Extender, ALU (Zero output), and Data Memory (WrEn, Adr, Data In). Instruction<31:0> fields: Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. Control signals: npc_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg]
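Recap: The Truth Table for the Main Control
Main Control takes op (6 bits) and produces RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, Branch, Jump, ExtOp, and ALUop (3 bits). ALU Control (local) takes ALUop and func (6 bits) and produces ALUctr (3 bits).

op                 00 0000   00 1101   10 0011   10 1011   00 0100    00 0010
                   R-type    ori       lw        sw        beq        jump
RegDst             1         0         0         x         x          x
ALUSrc             0         1         1         1         0          x
MemtoReg           0         0         1         x         x          x
RegWrite           1         1         1         0         0          0
MemWrite           0         0         0         1         0          0
Branch             0         0         0         0         1          0
Jump               0         0         0         0         0          1
ExtOp              x         0         1         1         x          x
ALUop (Symbolic)   R-type    Or        Add       Add       Subtract   xxx
ALUop<2>           1         0         0         0         0          x
ALUop<1>           0         1         0         0         0          x
ALUop<0>           0         0         0         0         1          x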
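The main control is a pure lookup from opcode to a bundle of control signals, so it can be sketched as a table in Python. This is a minimal illustration, not the course's implementation: the signal values are assumed from the standard textbook control for this MIPS subset, and `SIGNALS`, `CONTROL`, and `main_control` are hypothetical names.

```python
# Main control as a lookup table: opcode -> control word.
# ASSUMPTION: values follow the standard MIPS-subset control table;
# "x" marks a don't-care. All names here are illustrative.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemWrite", "Branch", "Jump", "ExtOp", "ALUop")

CONTROL = {
    0b000000: ("1", "0", "0", "1", "0", "0", "0", "x", "R-type"),    # R-type
    0b001101: ("0", "1", "0", "1", "0", "0", "0", "0", "Or"),        # ori
    0b100011: ("0", "1", "1", "1", "0", "0", "0", "1", "Add"),       # lw
    0b101011: ("x", "1", "x", "0", "1", "0", "0", "1", "Add"),       # sw
    0b000100: ("x", "0", "x", "0", "0", "1", "0", "x", "Subtract"),  # beq
    0b000010: ("x", "x", "x", "0", "0", "0", "1", "x", "xxx"),       # jump
}


def main_control(opcode):
    """Return the control word for a 6-bit opcode as a dict."""
    return dict(zip(SIGNALS, CONTROL[opcode]))
```

For example, `main_control(0b100011)` gives the control word for `lw`, with `MemtoReg` and `RegWrite` both asserted.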
Recap: PLA Implementation of the Main Control
[Figure: PLA. The AND plane decodes op<5>..op<0> into the product terms R-type, ori, lw, sw, beq, jump; the OR plane generates RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>]
Recap: Systematic Generation of Control
[Figure: OPcode => Decode => Control Logic / Store (PLA, ROM) => microinstruction => Control Points => Datapath, with Instruction and Conditions coming back from the datapath]
- In our single-cycle processor, each instruction is realized by exactly one control command, or microinstruction
- In general, the controller is a finite state machine
- A microinstruction can also control sequencing (see later)
The Big Picture: Where are We Now?
- The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
- Today's Topic: Designing the Multiple Clock Cycle Datapath
Outline of Today's Lecture
- Recap: single cycle processor
- VHDL versions
- Faster designs
- Multicycle Datapath
- Performance Analysis
- Multicycle Control
Behavioral models of Datapath Components
[Figure: 16-bit adder block with inputs A (16), B (16), Cin and outputs DOUT (16), Cout]

entity adder16 is
  generic (ccout_delay    : TIME := 12 ns;
           adderout_delay : TIME := 12 ns);
  port (A, B : in  vlbit_1d(15 downto 0);
        DOUT : out vlbit_1d(15 downto 0);
        CIN  : in  vlbit;
        COUT : out vlbit);
end adder16;

architecture behavior of adder16 is
begin
  adder16_process: process (A, B, CIN)
    variable tmp       : vlbit_1d(18 downto 0);
    variable adder_out : vlbit_1d(15 downto 0);  -- same width as DOUT
    variable carry     : vlbit;
  begin
    tmp       := addum(addum(A, B), CIN);
    adder_out := tmp(15 downto 0);   -- low 16 bits are the sum
    carry     := tmp(16);            -- bit 16 is the carry out
    COUT <= carry     after ccout_delay;
    DOUT <= adder_out after adderout_delay;
  end process;
end behavior;
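The same behavioral idea can be sketched in Python: compute a 17-bit sum, then split it into a 16-bit result and a carry out. This is only an illustration of the arithmetic; it does not model the `after` delays, and `adder16` as a Python function is a hypothetical stand-in for the VHDL entity.

```python
def adder16(a, b, cin=0):
    """Behavioral 16-bit adder: returns (dout, cout).

    Inputs are masked to 16 bits (cin to 1 bit) so the function behaves
    like fixed-width hardware. Timing is not modeled.
    """
    total = (a & 0xFFFF) + (b & 0xFFFF) + (cin & 1)  # up to 17 bits wide
    dout = total & 0xFFFF          # low 16 bits are the sum
    cout = (total >> 16) & 1       # bit 16 is the carry out
    return dout, cout
```

For instance, `adder16(0xFFFF, 1)` wraps to a zero sum with the carry out set.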
Behavioral Specification of Control Logic

entity maincontrol is
  port (opcode     : in  vlbit_1d(5 downto 0);
        equal_cond : in  vlbit;
        extop      : out vlbit;
        ALUsrc     : out vlbit;
        ALUop      : out vlbit_1d(1 downto 0);
        MEMwr      : out vlbit;
        MemtoReg   : out vlbit;
        RegWr      : out vlbit;
        RegDst     : out vlbit;
        npc        : out vlbit);
end maincontrol;

- The decode / control-store address is modeled by a case statement
- Each arm drives the control signals for that operation, just like the microinstruction
- Either can be symbolic
Abstract View of our single cycle processor
[Figure: Main Control (input op) and ALU control (input fun) drive npc_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr across the blocks Next PC, PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), and Result Store (Reg. Wrt); Equal feeds back to the control]
- Looks like an FSM with the PC as state
What's wrong with our CPI=1 processor?
Arithmetic & Logical:  PC, Inst Memory, Reg File, mux, ALU, mux, setup
Load (critical path):  PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
Store:                 PC, Inst Memory, Reg File, mux, ALU, Data Mem
Branch:                PC, Inst Memory, Reg File, cmp, mux
- Long cycle time
- All instructions take as much time as the slowest
- Real memory is not as nice as our idealized memory: it cannot always get the job done in one (short) cycle
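To make the "slowest instruction sets the clock" point concrete, the sketch below sums component delays along each path and takes the maximum. The paths are the ones listed on the slide; the delay values in `DELAY_NS` are invented for illustration and are not from the lecture.

```python
# HYPOTHETICAL component delays in nanoseconds (illustrative values only).
DELAY_NS = {"PC": 1, "Inst Memory": 10, "Reg File": 5, "mux": 1,
            "ALU": 8, "Data Mem": 10, "cmp": 3, "setup": 1}

# Components on each instruction class's path, in order.
PATHS = {
    "Arithmetic & Logical": ["PC", "Inst Memory", "Reg File", "mux",
                             "ALU", "mux", "setup"],
    "Load": ["PC", "Inst Memory", "Reg File", "mux",
             "ALU", "Data Mem", "mux", "setup"],
    "Store": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem"],
    "Branch": ["PC", "Inst Memory", "Reg File", "cmp", "mux"],
}


def path_delay(path):
    """Total delay of one path: the sum of its component delays."""
    return sum(DELAY_NS[component] for component in path)


delays = {name: path_delay(path) for name, path in PATHS.items()}
critical = max(delays, key=delays.get)   # slowest instruction class
cycle_time_ns = delays[critical]         # single-cycle clock period
```

With these assumed delays the load path is the longest, so every instruction, including a branch that needs far less time, pays the load's cycle time.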
Memory Access Time
- Physics => fast memories are small (large memories are slow)
[Figure: storage array with address decoder, selected word line, storage cells, bit lines, and sense amps]
- Question: register file vs. memory
- => Use a hierarchy of memories
[Figure: Processor => Cache (1 cycle) => proc. bus => L2 Cache (2-3 cycles) => mem. bus => memory (2-5 cycles)]
Reducing Cycle Time
- Cut the combinational dependency graph and insert a register / latch
- Do the same work in two fast cycles, rather than one slow one
[Figure: storage element => Acyclic Combinational Logic => storage element, transformed into storage element => Acyclic Combinational Logic (A) => storage element => Acyclic Combinational Logic (B) => storage element]
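A small numeric sketch of the trade-off: the clock period is set by the slowest block of logic between storage elements, plus the register overhead. The delays and the overhead figure are assumptions chosen for illustration, and `cycle_time` is a hypothetical helper.

```python
def cycle_time(stage_delays_ns, reg_overhead_ns):
    """Clock period: the slowest stage between registers plus register overhead."""
    return max(stage_delays_ns) + reg_overhead_ns


# ASSUMED delays: logic (A) takes 20 ns, logic (B) takes 18 ns,
# and each storage element adds 2 ns of setup / clock-to-Q overhead.
one_slow_cycle = cycle_time([20 + 18], 2)   # uncut logic: a single stage
one_fast_cycle = cycle_time([20, 18], 2)    # cut into (A) and (B)
latency_after_cut = 2 * one_fast_cycle      # the work now takes two cycles
```

The cut nearly halves the clock period but slightly increases the total latency of the work, because the register overhead is paid twice and the two halves are not perfectly balanced.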
Basic Limits on Cycle Time
- Next address logic: PC <= branch ? PC + offset : PC + 4
- Instruction Fetch: InstructionReg <= Mem[PC]
- Register Access: A <= R[rs]
- ALU operation: R <= A + B
[Figure: Control drives npc_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr across Next PC, PC, Instruction Fetch, Operand Fetch (Reg. File), Exec, Mem Access (Data Mem), and Result Store]
Partitioning the CPI=1 Datapath
- Add registers between smallest steps
[Figure: the single-cycle datapath (Next PC, PC, Instruction Fetch, Operand Fetch, Exec, Mem Access, Result Store; Reg. File, Data Mem) with control signals npc_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr]
Example Multicycle Datapath
[Figure: Next PC, PC, Instruction Fetch, IR, Operand Fetch (Reg File), registers A and B, Ext, ALU, register R, Mem Access (Data Mem), register M, Result Store (Reg. File); Equal condition; control signals npc_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr]
- Critical Path?
Administrative Issues
- Read Chapter 5
- This lecture and the next one are slightly different from the book
- Midterm next Wednesday 10/8/97: 5:30pm to 8:30pm, 36 Soda
  - No class on that day
- Midterm reminders:
  - Pencil, calculator, one 8.5 x 11 sheet (both sides) of handwritten notes
  - Sit in every other chair, every other row (odd row & odd seat)
- Meet at LaVal's pizza after the midterm
Recall: Step-by-step Processor Design
- Step 1: ISA => Logical Register Transfers
- Step 2: Components of the Datapath
- Step 3: RTL + Components => Datapath
- Step 4: Datapath + Logical RTs => Physical RTs
- Step 5: Physical RTs => Control
Step 4: R-type (add, sub, ...)
Logical Register Transfers
  ADDU   R[rd] <= R[rs] + R[rt]; PC <= PC + 4
Physical Register Transfers
  ADDU   IR <= MEM[pc]
         A <= R[rs]; B <= R[rt]
         S <= A + B
         R[rd] <= S; PC <= PC + 4
[Figure: multicycle datapath: Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access (Data Mem), M, Reg. File; Equal condition]
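The physical register transfers for ADDU can be traced one cycle at a time. The sketch below is a minimal Python model under stated assumptions: the register file is a 32-entry list, the instruction arrives already decoded (so the fetch is counted as a cycle but not simulated), and `run_addu` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def run_addu(regs, rs, rt, rd, pc):
    """Apply the physical RTs of ADDU; return (new_pc, cycles_used)."""
    cycles = 0
    cycles += 1                    # cycle 1: IR <= MEM[pc] (fetch not simulated)
    a, b = regs[rs], regs[rt]      # cycle 2: A <= R[rs]; B <= R[rt]
    cycles += 1
    s = (a + b) & MASK32           # cycle 3: S <= A + B, wrapping at 32 bits
    cycles += 1
    regs[rd] = s                   # cycle 4: R[rd] <= S
    pc = (pc + 4) & MASK32         #          PC <= PC + 4
    cycles += 1
    return pc, cycles
```

Starting from `regs[1] = 5` and `regs[2] = 7`, calling `run_addu(regs, 1, 2, 3, 0x100)` leaves 12 in `regs[3]` and advances the PC by 4 after four cycles.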
Step 4: Logical immed
Logical Register Transfers
  ORI    R[rt] <= R[rs] OR zx(im16); PC <= PC + 4
Physical Register Transfers
  ORI    IR <= MEM[pc]
         A <= R[rs]; B <= R[rt]
         S <= A or ZeroExt(Im16)
         R[rt] <= S; PC <= PC + 4
[Figure: multicycle datapath: Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access (Data Mem), M, Reg. File; Equal condition]
Step 4: Load
Logical Register Transfers
  LW     R[rt] <= MEM(R[rs] + sx(im16)); PC <= PC + 4
Physical Register Transfers
  LW     IR <= MEM[pc]
         A <= R[rs]; B <= R[rt]
         S <= A + SignEx(Im16)
         M <= MEM[S]
         R[rt] <= M; PC <= PC + 4
[Figure: multicycle datapath: Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access (Data Mem), M, Reg. File; Equal condition]
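The load adds a sign-extended 16-bit immediate to the base register before accessing memory. A minimal Python sketch of that address calculation and the transfers that follow it; data memory is modeled as a dict from address to word, and `sign_extend16` and `run_lw` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_lw(regs, mem, rs, rt, imm16, pc):
    """Apply the physical RTs of LW after the fetch; return the new PC."""
    a = regs[rs]                              # A <= R[rs]
    s = (a + sign_extend16(imm16)) & MASK32   # S <= A + SignEx(Im16)
    m = mem[s]                                # M <= MEM[S]
    regs[rt] = m                              # R[rt] <= M
    return (pc + 4) & MASK32                  # PC <= PC + 4
```

With a base of `0x1000` and an immediate of `0xFFFC` (that is, -4), the load reads address `0x0FFC`.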
Step 4: Store
Logical Register Transfers
  SW     MEM(R[rs] + sx(im16)) <= R[rt]; PC <= PC + 4
Physical Register Transfers
  SW     IR <= MEM[pc]
         A <= R[rs]; B <= R[rt]
         S <= A + SignEx(Im16)
         MEM[S] <= B; PC <= PC + 4
[Figure: multicycle datapath: Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access (Data Mem), M, Reg. File; Equal condition]
Step 4: Branch
Logical Register Transfers
  BEQ    if R[rs] == R[rt] then PC <= PC + sx(im16) else PC <= PC + 4
Physical Register Transfers
  BEQ & ~Eq   IR <= MEM[pc]
              PC <= PC + 4
  BEQ & Eq    IR <= MEM[pc]
              PC <= PC + sx(im16)
[Figure: multicycle datapath: Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access (Data Mem), M, Reg. File; Equal condition]
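The branch's next-PC choice can be written directly from the logical register transfer. This sketch follows the slide's simplified form, PC + sx(im16); the actual MIPS BEQ adds the offset shifted left by 2 to PC + 4. `beq_next_pc` and `sign_extend16` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc, rs_value, rt_value, imm16):
    """Next PC for BEQ, using the slide's simplified target PC + sx(im16)."""
    equal = rs_value == rt_value              # the Equal condition
    if equal:
        return (pc + sign_extend16(imm16)) & MASK32
    return (pc + 4) & MASK32
```

A taken branch with a negative immediate moves the PC backwards, which is how loops are formed.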
Alternative datapath (book): Multiple Cycle Datapath
- Minimizes hardware: 1 memory, 1 adder
[Figure: book's multicycle datapath with PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Reg File (Ra, Rb, Rw, busA, busB, busW), Extend, << 2, ALU, ALU Control, Target, and ALU Out. Control signals: PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg; Zero condition]
Our Control Model
- State specifies control points for Register Transfer
- Transfer occurs upon exiting state (same falling edge)
[Figure: inputs (conditions) => Next State Logic => Control State (State X) => Output Logic => outputs (control points); State X is annotated with its Register Transfer and Control Points, and the next state depends on the input]
Step 4 => Control Specification for multicycle proc
instruction fetch:       IR <= MEM[PC]
decode / operand fetch:  A <= R[rs]; B <= R[rt]

Path            Execute         Memory                         Write-back
R-type          S <= A fun B                                   R[rd] <= S; PC <= PC + 4
ORi             S <= A or ZX                                   R[rt] <= S; PC <= PC + 4
LW              S <= A + SX     M <= MEM[S]                    R[rt] <= M; PC <= PC + 4
SW              S <= A + SX     MEM[S] <= B; PC <= PC + 4
BEQ & ~Equal    PC <= PC + 4
BEQ & Equal     PC <= PC + SX
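The state diagram can be sketched as a next-state function. In this minimal Python version the state names (`r_exec`, `lw_mem`, and so on) are invented labels for the states of the control specification; counting transitions until the machine returns to fetch gives the cycles per instruction type.

```python
# Successor of every state that has exactly one exit (hypothetical labels).
AFTER = {
    "r_exec": "r_wb", "r_wb": "fetch",
    "ori_exec": "ori_wb", "ori_wb": "fetch",
    "lw_exec": "lw_mem", "lw_mem": "lw_wb", "lw_wb": "fetch",
    "sw_exec": "sw_mem", "sw_mem": "fetch",
    "br_taken": "fetch", "br_not_taken": "fetch",
}
DISPATCH = {"rtype": "r_exec", "ori": "ori_exec",
            "lw": "lw_exec", "sw": "sw_exec"}


def next_state(state, op=None, equal=False):
    """Next control state; op and equal only matter in the decode state."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        if op == "beq":
            return "br_taken" if equal else "br_not_taken"
        return DISPATCH[op]
    return AFTER[state]


def cycles(op, equal=False):
    """Number of states visited by one instruction, fetch included."""
    state, count = "fetch", 0
    while True:
        count += 1
        state = next_state(state, op, equal)
        if state == "fetch":
            return count
```

Under this model an R-type or store takes 4 cycles, a load 5, and a branch 3.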
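Traditional FSM Controller
[Figure: a truth table with columns state, op, cond, next state, control points. The inputs are the current State (4 bits), op (6 bits) and the Equal condition from the datapath; the outputs are the next State and the control points (11) that drive the datapath]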
Step 5: datapath + state diagram => control
- Translate RTs into control points
- Assign states
- Then go build the controller
Mapping RTs to Control Points
State                Register transfers            Control points
instruction fetch    IR <= MEM[PC]                 imem_rd, IRen
decode               A <= R[rs]; B <= R[rt]        Aen, Ben
R-type execute       S <= A fun B                  ALUfun, Sen
R-type write-back    R[rd] <= S; PC <= PC + 4      RegDst, RegWr, PCen
[Figure: the full state diagram; the ORi, LW, SW and BEQ states are shown without control-point annotations]
Assigning States
State                 Register transfers             Code
instruction fetch     IR <= MEM[PC]                  0000
decode                A <= R[rs]; B <= R[rt]         0001
BEQ & Equal           PC <= PC + SX                  0010
BEQ & ~Equal          PC <= PC + 4                   0011
R-type execute        S <= A fun B                   0100
R-type write-back     R[rd] <= S; PC <= PC + 4       0101
ORi execute           S <= A or ZX                   0110
ORi write-back        R[rt] <= S; PC <= PC + 4       0111
LW execute            S <= A + SX                    1000
LW memory             M <= MEM[S]                    1001
LW write-back         R[rt] <= M; PC <= PC + 4       1010
SW execute            S <= A + SX                    1011
SW memory             MEM[S] <= B; PC <= PC + 4      1100
Detailed Control Specification
Control-point columns: IR (en), PC (en, sel), Ops (A, B), Exec (Ex, Sr, ALU, S), Mem (R, W, M), Write-Back (M-R, Wr, Dst)
[Table: the individual control-point bits did not survive extraction; the next-state columns and the ALU operation are listed below]

       State   Op field   Eq   Next    ALU
       0000    ??????     ?    0001
       0001    BEQ        0    0011
       0001    BEQ        1    0010
       0001    R-type     x    0100    (decode outputs are all the same in a Moore machine)
       0001    ori        x    0110
       0001    LW         x    1000
       0001    SW         x    1011
       0010    xxxxxx     x    0000
       0011    xxxxxx     x    0000
R:     0100    xxxxxx     x    0101    fun
       0101    xxxxxx     x    0000
ORi:   0110    xxxxxx     x    0111    or
       0111    xxxxxx     x    0000
LW:    1000    xxxxxx     x    1001    add
       1001    xxxxxx     x    1010
       1010    xxxxxx     x    0000
SW:    1011    xxxxxx     x    1100    add
       1100    xxxxxx     x    0000
Performance Evaluation
- What is the average CPI?
  - the state diagram gives the CPI for each instruction type
  - the workload gives the frequency of each type

Type           CPI_i for type   Frequency   CPI_i x freq_i
Arith/Logic    4                40%         1.6
Load           5                30%         1.5
Store          4                10%         0.4
Branch         3                20%         0.6
Average CPI: 4.1
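The average CPI is the frequency-weighted sum of the per-type CPIs. The sketch below redoes the slide's arithmetic in Python, taking the frequencies as 40%, 30%, 10% and 20%, which are the values implied by the per-type products 1.6, 1.5, 0.4 and 0.6; `MIX` and `average_cpi` are hypothetical names.

```python
# (cycles per instruction, fraction of the workload) for each type.
MIX = {
    "Arith/Logic": (4, 0.40),
    "Load":        (5, 0.30),
    "Store":       (4, 0.10),
    "Branch":      (3, 0.20),
}


def average_cpi(mix):
    """Weighted average: sum over types of CPI_i * freq_i."""
    return sum(cpi * freq for cpi, freq in mix.values())


avg = average_cpi(MIX)
print(f"Average CPI: {avg:.1f}")
```

The same function can be reused to see how a different workload mix shifts the average, for example a load-heavy mix pulling it toward 5.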
Controller Design
- The state diagrams that define the controller for an instruction set processor are highly structured
- Use this structure to construct a simple "microsequencer"
- Control reduces to programming this very simple device: microprogramming
[Figure: a microinstruction is made of sequencer control and datapath control fields; the sequencer drives the micro-PC]
Example: Jump-Counter
[Figure: a Counter holding state i whose next value is 0 (zero), i+1 (inc), or the output of a Map ROM indexed by the op-code (load)]
Using a Jump Counter
State                 Code   Register transfers             Sequencing
instruction fetch     0000   IR <= MEM[PC]                  inc
decode                0001   A <= R[rs]; B <= R[rt]         load (inc when BEQ & Equal)
BEQ & Equal           0010   PC <= PC + SX                  zero
BEQ & ~Equal          0011   PC <= PC + 4                   zero
R-type execute        0100   S <= A fun B                   inc
R-type write-back     0101   R[rd] <= S; PC <= PC + 4       zero
ORi execute           0110   S <= A or ZX                   inc
ORi write-back        0111   R[rt] <= S; PC <= PC + 4       zero
LW execute            1000   S <= A + SX                    inc
LW memory             1001   M <= MEM[S]                    inc
LW write-back         1010   R[rt] <= M; PC <= PC + 4       zero
SW execute            1011   S <= A + SX                    inc
SW memory             1100   MEM[S] <= B; PC <= PC + 4      zero
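A jump counter needs only three next-state choices: zero, inc, and load from the map ROM. The sketch below models that in Python and walks a load through its states. The 4-bit state codes are an assumption consistent with consecutive states being reached by inc, and `next_upc`, `trace`, `LW_SEQ`, and `LW_MAP` are hypothetical names.

```python
def next_upc(upc, seq, map_rom=None, opcode=None):
    """Next micro-PC for a 4-bit jump counter."""
    if seq == "zero":
        return 0                      # back to instruction fetch
    if seq == "inc":
        return (upc + 1) & 0xF        # fall through to the next state
    if seq == "load":
        return map_rom[opcode]        # dispatch on the op-code
    raise ValueError(f"unknown sequencing field: {seq}")


def trace(opcode, seq_field, map_rom):
    """List the micro-PC values visited by one instruction."""
    upc, visited = 0, []
    while True:
        visited.append(upc)
        upc = next_upc(upc, seq_field[upc], map_rom, opcode)
        if upc == 0:
            return visited


# ASSUMED codes: fetch=0, decode=1, LW execute/memory/write-back=8, 9, 10.
LW_SEQ = {0: "inc", 1: "load", 8: "inc", 9: "inc", 10: "zero"}
LW_MAP = {"lw": 8}
lw_states = trace("lw", LW_SEQ, LW_MAP)
```

Because each instruction's states are numbered consecutively, only the decode state needs the map ROM; everything else is an increment or a return to zero.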
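Our Microsequencer
[Figure: the Micro-PC is updated by the sequencing signals Z (zero), I (inc) and L (load); load takes its value from the Map ROM indexed by the op-code; the taken condition comes from the datapath; the microinstruction supplies the datapath control]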
Microprogram Control Specification
Control-point columns: IR (en), PC (en, sel), Ops (A, B), Exec (Ex, Sr, ALU, S), Mem (R, W, M), Write-Back (M-R, Wr, Dst)
[Table: the individual control-point bits did not survive extraction; the sequencing columns and the ALU operation are listed below]

       µpc     Taken   Next    ALU
       0000    ?       inc
       0001    0       load
       0001    1       inc
       0010    x       zero
BEQ:   0011    x       zero
R:     0100    x       inc     fun
       0101    x       zero
ORi:   0110    x       inc     or
       0111    x       zero
LW:    1000    x       inc     add
       1001    x       inc
       1010    x       zero
SW:    1011    x       inc     add
       1100    x       zero
Mapping ROM
Instruction   op-code   State
R-type        000000    0100
BEQ           000100    0011
ori           001101    0110
LW            100011    1000
SW            101011    1011
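A mapping ROM is just an array indexed by the 6-bit op-code. The Python sketch below builds a 64-entry table and dispatches on bits 31..26 of an instruction word. The state codes are the assumed 4-bit assignments for each instruction's first execute state, unused entries are left as `None`, and `MAP_ROM` and `dispatch` are hypothetical names.

```python
ROM_SIZE = 64                      # one entry per 6-bit op-code
MAP_ROM = [None] * ROM_SIZE        # None marks an unimplemented op-code

# (op-code, first state after decode); state codes are ASSUMED.
for opcode, state in [
    (0b000000, 0b0100),            # R-type
    (0b000100, 0b0011),            # BEQ
    (0b001101, 0b0110),            # ori
    (0b100011, 0b1000),            # LW
    (0b101011, 0b1011),            # SW
]:
    MAP_ROM[opcode] = state


def dispatch(instruction):
    """Look up the next state from the op-code field of a 32-bit word."""
    opcode = (instruction >> 26) & 0x3F
    state = MAP_ROM[opcode]
    if state is None:
        raise ValueError(f"op-code {opcode:06b} is not implemented")
    return state
```

For example, the word `0x8FA80004` (a `lw`) has op-code `100011` and dispatches to state `1000`.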
Example: Controlling Memory
[Figure: PC drives addr of the Instruction Memory, whose data output feeds the Inst. Reg; control signals InstMem_rd and IR_en, status signal IM_wait]
Controller handles non-ideal memory
instruction fetch:       IR <= MEM[PC]; stay in this state while wait, advance on ~wait
decode / operand fetch:  A <= R[rs]; B <= R[rt]

Path            Execute         Memory                                       Write-back
R-type          S <= A fun B                                                 R[rd] <= S; PC <= PC + 4
ORi             S <= A or ZX                                                 R[rt] <= S; PC <= PC + 4
LW              S <= A + SX     M <= MEM[S] (stay while wait)                R[rt] <= M; PC <= PC + 4
SW              S <= A + SX     MEM[S] <= B; PC <= PC + 4 (stay while wait)
BEQ & ~Equal    PC <= PC + 4
BEQ & Equal     PC <= PC + SX
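The wait loops turn each memory state into a variable number of cycles. A minimal Python sketch of that behavior: the controller samples `wait` once per cycle and leaves the state on the first cycle where it is deasserted. The function names and the example wait patterns are assumptions for illustration.

```python
def cycles_in_memory_state(wait_per_cycle):
    """Cycles spent in a memory state, given wait sampled once per cycle."""
    cycles = 0
    for wait in wait_per_cycle:
        cycles += 1
        if not wait:               # ~wait: the transfer completes, move on
            return cycles
    raise RuntimeError("wait was never deasserted")


def lw_cycles(fetch_wait, data_wait):
    """Total cycles for a load: fetch, decode, execute, memory, write-back."""
    fetch = cycles_in_memory_state(fetch_wait)
    memory = cycles_in_memory_state(data_wait)
    return fetch + 1 + 1 + memory + 1
```

With ideal memory, `lw_cycles([False], [False])` is the usual 5 cycles; one wait cycle on the fetch and two on the data access stretch it to 8.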
Really Simple Time-State Control
instruction fetch:  IR <= MEM[PC] (stay while wait, advance on ~wait)
decode:             A <= R[rs]; B <= R[rt]
Execute:            R-type: S <= A fun B;  ORi: S <= A or ZX;  LW: S <= A + SX;  SW: S <= A + SX;  BEQ & Equal, BEQ & ~Equal: no transfer shown
Memory:             LW: M <= MEM[S] (stay while wait);  SW: MEM[S] <= B (stay while wait)
write-back:         R-type: R[rd] <= S; PC <= PC + 4
                    ORi: R[rt] <= S; PC <= PC + 4
                    LW: R[rt] <= M; PC <= PC + 4
                    SW: PC <= PC + 4
                    BEQ & ~Equal: PC <= PC + 4
                    BEQ & Equal: PC <= PC + SX
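Time-state Control Path
- Local decode and control at each stage
[Figure: datapath (Next PC, PC, Inst. Mem, Reg File, A, B, Exec, S, Mem Access, Data Mem, M, Reg. File; Equal) with a copy of the instruction passed along from stage to stage (IR, IRex, IRmem, IRwb) and a local controller at each stage (Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl); Valid signal]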
Overview of Control
Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.

Initial Representation      Finite State Diagram            Microprogram
Sequencing Control          Explicit Next State Function    Microprogram counter + Dispatch ROMs
Logic Representation        Logic Equations                 Truth Tables
Implementation Technique    PLA                             ROM
                            "hardwired control"             "microprogrammed control"
Summary
- Disadvantages of the Single Cycle Processor
  - Long cycle time
  - Cycle time is too long for all instructions except the Load
- Multiple Cycle Processor:
  - Divide the instructions into smaller steps
  - Execute each step (instead of the entire instruction) in one cycle
- Partition the datapath into equal size chunks to minimize cycle time
  - ~10 levels of logic between latches
- Follow the same 5-step method for designing a "real" processor
Summary (cont'd)
- Control is specified by a finite state diagram
- Specialized state diagrams are easily captured by a microsequencer
  - simple increment & "branch" fields
  - datapath control fields
- Control design reduces to microprogramming
- Control is more complicated with:
  - complex instruction sets
  - restricted datapaths (see the book)
- Simple instruction set and powerful datapath => simple control
  - could try to reduce hardware (see the book)
  - rather go for speed => many instructions at once!
Where to get more information?
- Next two lectures:
  - Multiple Cycle Controller: Appendix C of your text book
  - Microprogramming: Section 5.5 of your text book
- D. Patterson, "Microprogramming," Scientific American, March 1983.
- D. Patterson and D. Ditzel, "The Case for the Reduced Instruction Set Computer," Computer Architecture News 8, 6 (October 15, 1980).