CS Computer Architecture Single-Cycle CPU Datapath & Control Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS6C
Processor Design: 5 steps Step : Analyze instruction set to determine datapath requirements Meaning of each instruction is given by register transfers Datapath must include storage element for ISA registers Datapath must support each register transfer Step 2: Select set of datapath components & establish clock methodology Step 3: Assemble datapath components that meet the requirements Step 4: Analyze implementation of each instruction to determine setting of control points that realizes the register transfer Step 5: Assemble the control logic 2
Clk Register-Register Timing: One Complete Cycle (Add/Sub) PC Old Value Rs,, Rd, Op, Func ALUctr New Value Old Value Old Value Instruction Access Time New Value Delay through Control Logic New Value RegWr Old Value New Value Register File Access Time busa, B Old Value New Value ALU Delay busw Old Value New Value RegWr Rd Rs ALUctr 5 5 5 Register Write Rw Ra Rb busa busw Occurs Here RegFile busb ALU 3
A Single Cycle Datapath 4 PC Ext imm6 Adder Adder npc_sel Mux Inst PC Adr Rs RegDst Rd RegWr busw <2:25> imm6 <6:2> Rs 5 5 Rw Ra Rb RegFile 6 Rd 5 <:5> Extender busa busb ExtOp <:5> Imm6 Equal Instruction<3:> ALUSrc ALUctr MemtoReg MemWr = ALU Data In WrEn Adr Data 4
Step 3b: Add & Subtract R[rd] = R[rs] op R[rt] (addu rd,rs,rt) Ra, Rb, and Rw come from instruction s Rs,, and Rd fields 3 26 2 6 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits ALUctr and RegWr: control logic after decoding the instruction busw RegWr Rd Rs 5 5 5 Rw Ra Rb x -bit Registers busa busb Already defined the register file & ALU ALUctr ALU Result 5
3c: Logical Op (or) with Immediate R[rt] = R[rs] op ZeroExt[imm6] RegDst RegWr Rd imm6 Rs 5 5 Rw Ra Rb RegFile 6 5 3 26 2 op rs rt immediate 3 6 bits 5 bits 5 bits 6 5 6 bits ZeroExt busa busb 6 bits ALUSrc ALUctr 6 immediate 6 bits Writing to register (not Rd)!! ALU What about Read? 6
3d: Load Operations R[rt] = Mem[R[rs] + SignExt[imm6]] Example: lw rt,rs,imm6 3 26 2 6 op rs rt immediate 6 bits 5 bits 5 bits 6 bits RegDst RegWr busw Rd imm6 Rs 5 5 Rw Ra Rb RegFile 6 ExtOp 5 Extender busa busb ALUSrc ALUctr ALU MemtoReg Adr Data 7
Mem[ R[rs] + SignExt[imm6] ] = R[rt] Ex.: sw rt, rs, imm6 3 26 2 6 op rs rt immediate 6 bits 5 bits 5 bits 6 bits RegDst RegWr busw 3e: Store Operations Rd imm6 Rs 5 5 Rw Ra Rb RegFile 6 ExtOp 5 Extender busa busb ALUSrc ALUctr ALU Data In MemWr WrEn Adr Data MemtoReg 8
3e: Store Operations Mem[ R[rs] + SignExt[imm6] ] = R[rt] Ex.: sw rt, rs, imm6 3 26 2 6 op rs rt immediate 6 bits 5 bits 5 bits 6 bits RegDst RegWr busw Rd imm6 Rs 5 5 Rw Ra Rb RegFile 6 ExtOp 5 Extender busa busb ALUSrc ALUctr ALU Data In MemWr WrEn Adr Data MemtoReg 9
3f: The Branch Instruction 3 26 2 6 op rs rt immediate 6 bits 5 bits 5 bits 6 bits beq rs, rt, imm6 mem[pc] Fetch the instruction from memory Equal = (R[rs] == R[rt]) Calculate branch condition if (Equal) Calculate the next instruction s address PC = PC + 4 + ( SignExt(imm6) x 4 ) else PC = PC + 4
Datapath for Branch Operations beq rs, rt, imm6 3 26 2 6 op rs rt immediate 6 bits 5 bits 5 bits 6 bits Datapath generates condition (Equal) Inst Address 4 PC Ext imm6 Adder Adder npc_sel Mux PC RegWr busw Rs 5 5 5 Rw Ra Rb RegFile busa busb Equal ALUctr Already have mux, adder, need special sign extender for PC, need equal compare (sub?) = ALU
Instruction Fetch Unit including Branch 3 26 2 6 op rs rt immediate if (Zero == ) then PC = PC + 4 + SignExt[imm6]*4 ; else PC = PC + 4 npc_sel Inst Adr Instruction<3:> Equal MUX npc_sel ctrl How to encode npc_sel? Direct MUX select? imm6 4 PC Ext Adder Adder PC Mux Branch inst. / not branch inst. Let s pick 2nd option npc_sel zero? MUX x Q: What logic gate? 2
Putting it All Together:A Single Cycle Datapath 4 PC Ext imm6 npc_sel & Equal Adder Adder Mux Inst PC Adr Rs RegDst Rd RegWr busw <2:25> imm6 <6:2> Rs 5 5 Rw Ra Rb RegFile 6 Rd 5 <:5> Extender busa busb ExtOp <:5> Imm6 Equal Instruction<3:> ALUSrc ALUctr = ALU Data In MemWr WrEn Adr Data MemtoReg 3
Question What new instruction would need no new datapath hardware? A: branch if reg==immediate B: add two registers and branch if result zero C: store with auto-increment of base address: sw rt, rs, offset // rs incremented by offset after store D: shift left logical by two bits 4 PC Ext imm6 npc_sel & Equal Adder Adder Mux Inst Adr PC RegDst RegWr Rs 5 5 busw <2:25> Rs Rd imm6 <6:2> Rw Ra Rb RegFile 6 Rd 5 <:5> Extender busa busb ExtOp <:5> Imm6 Equal Instruction<3:> ALUSrc ALUctr =ALU Data In MemWr WrEn Adr Data MemtoReg 4
Processor Design: 5 steps Step : Analyze instruction set to determine datapath requirements Meaning of each instruction is given by register transfers Datapath must include storage element for ISA registers Datapath must support each register transfer Step 2: Select set of datapath components & establish clock methodology Step 3: Assemble datapath components that meet the requirements Step 4: Analyze implementation of each instruction to determine setting of control points that realizes the register transfer Step 5: Assemble the control logic
4 Datapath Control Signals ExtOp: zero, sign ALUsrc: => regb; => immed ALUctr: ADD, SUB, OR npc_sel: => branch PC Ext imm6 Adder Adder Inst Address npc_sel & Equal Mux PC RegDst RegWr busw Rd imm6 Rs 5 5 Rw Ra Rb RegFile 6 ExtOp 5 MemWr: => write memory MemtoReg: => ALU; => Mem RegDst: => rt ; => rd RegWr: => write register Extender busa busb ALUSrc ALUctr Data In ALU MemWr WrEn MemtoReg Adr Data 6
Given Datapath: RTL à Control Instruction<3:> Instruction Address <26:3> <:5> <2:25> <6:2> <:5> <:5> Op Fun Rs Rd Imm6 Control npc_sel RegWr RegDst ExtOp ALUSrc ALUctr MemWr MemtoReg DATA PATH 7
RTL: The Add Instruction 3 26 2 6 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits add rd, rs, rt MEM[PC] Fetch the instruction from memory R[rd] = R[rs] + R[rt] The actual operation PC = PC + 4 Calculate the next instruction s address 8
Instruction Fetch Unit at the Beginning of Add Fetch the instruction from Instruction memory: Instruction = MEM[PC] same for all instructions Inst Instruction<3:> 4 PC Ext Adder Adder npc_sel PC Mux Inst Address imm6 9
Single Cycle Datapath during Add 3 26 2 op rs rt rd shamt funct R[rd] = R[rs] + R[rt] RegDst= Rd RegWr= busw 6 npc_sel=+4 imm6 Rs 5 5 Rw Ra Rb RegFile 6 5 ExtOp=x Extender busa busb instr fetch unit zero ALUSrc= 6 Instruction<3:> <2:25> Rs Rd Imm6 ALUctr=ADD MemtoReg= = Data In ALU <6:2> MemWr= WrEn <:5> Adr Data <:5> 2
Instruction Fetch Unit at End of Add PC = PC + 4 Same for all instructions except: Branch and Jump Inst 4 PC Ext npc_sel=+4 Adder Adder PC Mux Inst Address imm6 2
J-type RegDst = RegWr = busw Single Cycle Datapath during Jump 3 Clk op Rd imm6 26 Mux 25 New PC = { PC[3..28], target address, } 5 5 Rs 5 Rw Ra Rb x -bit Registers 6 Jump= npc_sel= busb Extender Clk busa target address ALUctr = Mux ALUSrc = Instruction Fetch Unit ALU Data In Clk Zero Instruction<3:> Rs <2:25> WrEn <6:2> Rd Imm6 TA26 MemtoReg = MemWr = Adr Data <:5> jump <:5> Mux <:25> ExtOp = 22
Single Cycle Datapath during Jump J-type 3 RegDst = x RegWr = busw Clk op Rd Mux imm6 26 25 New PC = { PC[3..28], target address, } 5 5 Rs 5 Rw Ra Rb x -bit Registers 6 Jump= npc_sel=? busb Extender Clk busa target address ALUctr =x Mux ALUSrc = x Instruction Fetch Unit ALU Data In Clk Zero Instruction<3:> Rs <2:25> WrEn <6:2> Rd Imm6 TA26 MemtoReg = x MemWr = Adr Data <:5> jump <:5> Mux <:25> ExtOp = x 23
Instruction Fetch Unit at the End of Jump J-type New PC = { PC[3..28], target address, } Jump npc_sel Zero 3 op 26 25 Inst Adr target address Instruction<3:> jump npc_mux_sel 4 Adder Mux PC How do we modify this to account for jumps? imm6 Adder Clk 24
Instruction Fetch Unit at the End of Jump J-type New PC = { PC[3..28], target address, } Jump npc_sel Zero imm6 3 4 op 26 Adder 25 npc_mux_sel Adder Mux TA 26 target address 4 (MSBs) Mux Inst Clk PC Adr jump Instruction<3:> Query Can Zero still get asserted? Does npc_sel need to be? If not, what? 25
Question Which of the following is TRUE? A. The clock can have a shorter period for instructions that don t use memory B. The ALU is used to set PC to PC+4 when necessary C. Worst-delay path in Instruction Fetch unit is Add+mux delay D. The CPU s control needs only opcode to determine the next PC value to select E. npc_sel affects the next PC address on a jump 26
inst Summary of the Control Signals (/2) Register Transfer add R[rd] R[rs] + R[rt]; PC PC + 4 ALUsrc=RegB, ALUctr= ADD, RegDst=rd, RegWr, npc_sel= +4 sub R[rd] R[rs] R[rt]; PC PC + 4 ALUsrc=RegB, ALUctr= SUB, RegDst=rd, RegWr, npc_sel= +4 ori R[rt] R[rs] + zero_ext(imm6); PC PC + 4 ALUsrc=Im, Extop= Z, ALUctr= OR, RegDst=rt,RegWr, npc_sel= +4 lw R[rt] MEM[ R[rs] + sign_ext(imm6)]; PC PC + 4 ALUsrc=Im, Extop= sn, ALUctr= ADD, MemtoReg, RegDst=rt, RegWr, npc_sel = +4 sw MEM[ R[rs] + sign_ext(imm6)] R[rs]; PC PC + 4 ALUsrc=Im, Extop= sn, ALUctr = ADD, MemWr, npc_sel = +4 beq if (R[rs] == R[rt]) then PC PC + sign_ext(imm6)] else PC PC + 4 npc_sel = br, ALUctr = SUB 27
Summary of the Control Signals (2/2) See func We Don t Care :-) Appendix A op RegDst ALUSrc MemtoReg RegWrite MemWrite npcsel Jump ExtOp ALUctr<2:> add sub ori lw sw beq jump x Add x Subtract Or Add x x Add x x x Subtract x x x? x x R-type 3 26 2 6 6 op rs rt rd shamt funct add, sub I-type op rs rt immediate ori, lw, sw, beq J-type op target address jump 28
Boolean Expressions for Controller RegDst = add + sub ADD ss ssst tttt dddd d ALUSrc = ori + lw + sw SUB ss ssst tttt dddd d MemtoReg = lw ORI ss ssst tttt iiii iiii iiii iiii RegWrite = add + sub + ori + lw LW ss ssst tttt iiii iiii iiii iiii MemWrite = sw SW ss ssst tttt iiii iiii iiii iiii npcsel = beq BEQ ss ssst tttt iiii iiii iiii iiii Jump = jump JUMP ii iiii iiii iiii iiii iiii iiii ExtOp = lw + sw ALUctr[] = sub + beq (assume ALUctr is ADD, SUB, OR) ALUctr[] = or Where: rtype = ~op 5 ~op 4 ~op 3 ~op 2 ~op ~op ori = ~op 5 ~op 4 op 3 op 2 ~op op lw = op 5 ~op 4 ~op 3 ~op 2 op op sw = op 5 ~op 4 op 3 ~op 2 op op beq = ~op 5 ~op 4 ~op 3 op 2 ~op ~op jump = ~op 5 ~op 4 ~op 3 ~op 2 op ~op How do we implement this in gates? add = rtype func 5 ~func 4 ~func 3 ~func 2 ~func ~func sub = rtype func 5 ~func 4 ~func 3 ~func 2 func ~func 29
Controller Implementation opcode func AND logic add sub ori lw sw beq jump OR logic RegDst ALUSrc MemtoReg RegWrite MemWrite npcsel Jump ExtOp ALUctr[] ALUctr[] 3
P&H Figure 4.7 3
Summary: Single-cycle Processor Five steps to design a processor:. Analyze instruction set à datapath requirements 2. Select set of datapath components & establish clock methodology 3. Assemble datapath meeting the requirements Processor Control Datapath 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer. 5. Assemble the control logic Formulate Logic Equations Design Circuits Input Output
Machine Interpretation Levels of Representation/Interpretation High Level Language Program (e.g., C) Compiler Assembly Language Program (e.g., MIPS) Assembler Machine Language Program (MIPS) Hardware Architecture Description (e.g., block diagrams) Architecture Implementation temp = v[k]; v[k] = v[k+]; v[k+] = temp; lw $t, ($2) lw $t, 4($2) sw $t, ($2) sw $t, 4($2) Anything can be represented as a number, i.e., data or instructions Logic Circuit Description (Circuit Schematic Diagrams) 33
No More Magic! Application (ex: browser) Digital Design Circuit Design Operating Compiler System Software Assembler (Mac OSX) Instruction Set Architecture Hardware Processor I/O system Datapath & Control Transistors 34