CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath CPE 442 single-cycle datapath.1
Outline of Today s Lecture Recap and Introduction Where are we with respect to the BIG picture? (1 minutes) The Steps of Designing a Processor (1 minutes) Datapath and timing for Reg-Reg Operations (15 minutes) Datapath for Logical Operations with Immediate (5 minutes) Datapath for Load and Store Operations (1 minutes) Datapath for Branch and Jump Operations (1 minutes) CPE 442 single-cycle datapath.2
Course Overview Computer Design Instruction Set Deign Machine Language Compiler View "Computer Architecture" "Instruction Set Processor" "Building Architect" Computer Hardware Design Machine Implementation Logic Designer's View "Processor Architecture" "Computer Organization Construction Engineer CPE 442 single-cycle datapath.3
The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Memory Input Datapath Output Today s Topic: Datapath Design CPE 442 single-cycle datapath.4
The Big Picture: The Performance Perspective Performance of a machine was determined by: Instruction count Clock cycle time Clock cycles per instruction Processor design (datapath and control) will determine: Clock cycle time Clock cycles per instruction In the next two lectures: Single cycle processor: - Advantage: One clock cycle per instruction - Disadvantage: long cycle time CPE 442 single-cycle datapath.5
Recall the MIPS Instruction Set Architecture CPE 442 single-cycle datapath.6
CPE 442 single-cycle datapath.7
CPE 442 single-cycle datapath.8
CPE 442 single-cycle datapath.9
The MIPS Instruction Formats All MIPS instructions are bits long. The three instruction formats: 31 26 21 16 11 6 R-type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits I-type 31 26 21 16 op rs rt immediate 6 bits 5 bits 5 bits 16 bits J-type 31 26 op target address 6 bits 26 bits The different fields are: op: operation of the instruction rs, rt, rd: the source and destination register specifiers shamt: shift amount funct: selects the variant of the operation in the op field address / immediate: address offset or immediate value target address: target address of the jump instruction CPE 442 single-cycle datapath.1
The MIPS Subset ADD and subtract add rd, rs, rt sub rd, rs, rt OR Immediate: ori rt, rs, imm16 31 31 26 21 16 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 26 21 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 16 LOAD and STORE lw rt, rs, imm16 sw rt, rs, imm16 BRANCH: beq rs, rt, imm16 JUMP: 31 26 j target op target address 6 bits 26 bits CPE 442 single-cycle datapath.11
An Abstract View of the Implementation PC Instruction Fetch Instruction Address Ideal Instruction Memory Rd 5 Instruction Rs 5 Rt 5 Operand Fetch Imm 16 Rw Ra Rb -bit Registers Execute ALU Data Address Data In Ideal Data Memory DataOut CPE 442 single-cycle datapath.12
Clocking Methodology Setup Hold Don t Care Setup Hold............ All storage elements are clocked by the same clock edge Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew CPE 442 single-cycle datapath.13
An Abstract View of the Critical Path Register file and ideal memory: The CLK input is a factor ONLY during write operation During read operation, behave as combinational logic: - Address valid => Output valid after access time. PC Ideal Instruction Memory Instruction Address Rd 5 Instruction Rs 5 Rt 5 Imm 16 Critical Path (Load Operation) = PC s -to-q + Instruction Memory s Access Time + Register File s Access Time + ALU to Perform a -bit Add + Data Memory Access Time + Setup Time for Register File Write + Clock Skew Rw Ra Rb -bit Registers ALU Data Address Data In Ideal Data Memory DataOut CPE 442 single-cycle datapath.14
Outline of Today s Lecture Recap and Introduction (5 minutes) Where are we with respect to the BIG picture? (1 minutes) The Steps of Designing a Processor (1 minutes) Datapath and timing for Reg-Reg Operations (15 minutes) Datapath for Logical Operations with Immediate (5 minutes) Datapath for Load and Store Operations (1 minutes) Datapath for Branch and Jump Operations (1 minutes) CPE 442 single-cycle datapath.15
The Steps of Designing a Processor ) Instruction Set Architecture => Register Transfer Language Program Convert each instruction to RTL statements 1)Register Transfer Language program => Datapath Design Determine Datapath components Determine Datapath interconnects 2)Datapath components => Control signals 3)Control signals => Control logic => Control Unit Design CPE 442 single-cycle datapath.16
Step: ) Instruction Set Architecture => Register Transfer Language Program Register Transfer Language-RTL: The ADD Instruction add rd, rs, rt mem[pc] Fetch the instruction from memory R[rd] <- R[rs] + R[rt] The ADD operation PC <- PC + 4 Calculate the next instruction s address CPE 442 single-cycle datapath.17
RTL: The Load Instruction lw rt, rs, imm16 mem[pc] Fetch the instruction from memory Addr <- R[rs] + SignExt(imm16) Calculate the memory address R[rt] <- Mem[Addr] Load the data into the register PC <- PC + 4 CPE 442 single-cycle datapath.18 Calculate the next instruction s address
Step 1) : RTL To Data Path Design Data Path Components, Adder MUX A B A Select MUX Adder CarryIn Y B Sum Carry Combinational Logic Elements ALU OP A B ALU Result Zero CPE 442 single-cycle datapath.19
Step 1) : RTL To Data Path Design Data Path Components, Storage Elements: Register Register Similar to the D Flip Flop except - N-bit input and output - Write Enable input Write Enable: - : Data Out will not change - 1: Data Out will become Data In Write Enable Data In N Data Out N CPE 442 single-cycle datapath.2
Step 1) : RTL To Data Path Design Data Path Components, Storage Elements: Register File Register File consists of registers: Two -bit output busses: busa and busb One -bit input bus: busw Register is selected by: RA selects the register to put on busa RB selects the register to put on busb RW selects the register to be written via busw when Write Enable is 1 Write Enable busw RW RA RB 5 5 5 -bit Registers Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: RA or RB valid => busa or busb valid after access time. busa busb CPE 442 single-cycle datapath.21
Step 1) : RTL To Data Path Design Data Path Components, Storage Elements: Idealized Memory Memory (idealized) One input bus: Data In One output bus: Data Out Data In Write Enable Memory word is selected by: Address selects the word to put on Data Out Write Enable = 1: address selects the memory memory word to be written via the Data In bus Address DataOut Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: - Address valid => Data Out valid after access time. CPE 442 single-cycle datapath.22
Step 1) : RTL To Data Path Design Data Path Components, Overview of the Instruction Fetch Unit The common RTL operations Fetch the Instruction: mem[pc] Update the program counter: - Sequential Code: PC <- PC + 4 - Branch and Jump PC <- something else PC Next Address Logic Address Instruction Memory Instruction Word CPE 442 single-cycle datapath.23
Outline of Today s Lecture Recap and Introduction Where are we with respect to the BIG picture? (1 minutes) The Steps of Designing a Processor (1 minutes) Datapath and timing for Reg-Reg Operations (5 minutes) Datapath for Logical Operations with Immediate (5 minutes) Datapath for Load and Store Operations (1 minutes) Datapath for Branch and Jump Operations (1 minutes) CPE 442 single-cycle datapath.24
Step 1): RTL To Data-Path Design Determine Data-path interconnects RTL: The ADD Instruction 31 26 21 16 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits add rd, rs, rt mem[pc] Fetch the instruction from memory R[rd] <- R[rs] + R[rt] The actual operation PC <- PC + 4 Calculate the next instruction s address CPE 442 single-cycle datapath.25
RTL: The Subtract Instruction 31 26 21 16 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits sub rd, rs, rt mem[pc] Fetch the instruction from memory R[rd] <- R[rs] - R[rt] The actual operation PC <- PC + 4 Calculate the next instruction s address CPE 442 single-cycle datapath.26
RTL To Data Path Design Data-path for Register-Register Operations R[rd] <- R[rs] op R[rt] Example: add rd, rs, rt Ra, Rb, and Rw comes from instruction s rs, rt, and rd fields ALUctr and RegWr: control logic after decoding the instruction 31 26 21 16 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits RegWr busw Rd Rs 5 5 5 Rt Rw Ra Rb -bit Registers busa busb ALUctr ALU Result CPE 442 single-cycle datapath.27
Register-Register Timing PC Rs, Rt, Rd, Op, Func ALUctr Old Value -to-q New Value Old Value Old Value Instruction Memory Access Time New Value Delay through Control Logic New Value RegWr Old Value New Value Register File Access Time busa, B Old Value New Value ALU Delay busw Old Value New Value RegWr busw Rd Rs Rt 5 5 5 Rw Ra Rb -bit Registers busa busb ALUctr ALU Result Register Write Occurs Here CPE 442 single-cycle datapath.28
Outline of Today s Lecture Recap and Introduction Where are we with respect to the BIG picture? (5 minutes) The Steps of Designing a Processor (1 minutes) Datapath and timing for Reg-Reg Operations (15 minutes) Datapath for Logical Operations with Immediate (5 minutes) Datapath for Load and Store Operations (1 minutes) Datapath for Branch and Jump Operations (1 minutes) CPE 442 single-cycle datapath.29
RTL: The OR Immediate Instruction 31 26 21 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 16 ori rt, rs, imm16 mem[pc] Fetch the instruction from memory R[rt] <- R[rs] or ZeroExt(imm16) The OR operation PC <- PC + 4 Calculate the next instruction s address 31 16 15 16 bits immediate 16 bits CPE 442 single-cycle datapath.3
Datapath for Logical Operations with Immediate R[rt] <- R[rs] op ZeroExt[imm16]] Example: ori rt, rs, imm16 31 26 21 op rs rt immediate 6 bits 5 bits 5 bits rd 16 bits 16 11 Rd Rt RegDst Mux Rs RegWr 5 5 5 busw imm16 Rw Ra Rb -bit Registers 16 Don t Care (Rt) busb ZeroExt busa Mux ALUSrc ALUctr ALU Result CPE 442 single-cycle datapath.31
Outline of Today s Lecture Recap and Introduction Where are we with respect to the BIG picture? (5 minutes) The Steps of Designing a Processor (1 minutes) Datapath and timing for Reg-Reg Operations (15 minutes) Datapath for Logical Operations with Immediate (5 minutes) Datapath for Load and Store Operations (1 minutes) Datapath for Branch and Jump Operations (1 minutes) Assignment #3 CPE 442 single-cycle datapath.
RTL: The Load Instruction lw rt, rs, imm16 31 26 21 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 16 mem[pc] Fetch the instruction from memory Addr <- R[rs] + SignExt(imm16) Calculate the memory address R[rt] <- Mem[Addr] Load the data into the register PC <- PC + 4 Calculate the next instruction s address SignExt operation 31 16 15 16 bits 31 16 15 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 bits immediate 16 bits immediate 16 bits CPE 442 single-cycle datapath.33
Datapath for Load Operations R[rt] <- Mem[R[rs] + SignExt[imm16]] Example: lw rt, rs, imm16 31 26 21 op rs rt immediate 6 bits 5 bits 5 bits rd 16 bits 16 11 Rd Rt RegDst Mux Rs RegWr 5 5 5 busw imm16 Rw Ra Rb -bit Registers 16 Don t Care (Rt) busb Extender busa Mux ALUSrc ALUctr ALU Data In MemWr WrEn Adr Data Memory MemtoReg Mux CPE 442 single-cycle datapath.34 ExtOp
RTL: The Store Instruction 31 26 21 16 op rs rt immediate 6 bits 5 bits 5 bits 16 bits sw rt, rs, imm16 mem[pc] Fetch the instruction from memory Addr <- R[rs] + SignExt(imm16) Calculate the memory address Mem[Addr] <- R[rt] Store the register into memory PC <- PC + 4 Calculate the next instruction s address CPE 442 single-cycle datapath.35
Datapath for Store Operations Mem[R[rs] + SignExt[imm16] <- R[rt]] Example: sw rt, rs, imm16 31 26 21 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 16 RegDst busw Rd RegWr Mux imm16 Rt 5 5 Rs 5 Rw Ra Rb -bit Registers 16 Rt busb Extender busa Mux ALUSrc ALUctr Data In ALU MemWr WrEn Adr Data Memory MemtoReg Mux CPE 442 single-cycle datapath.36 ExtOp
Outline of Today s Lecture Recap and Introduction Where are we with respect to the BIG picture? (1 minutes) The Steps of Designing a Processor (1 minutes) Datapath and timing for Reg-Reg Operations (15 minutes) Datapath for Logical Operations with Immediate (5 minutes) Datapath for Load and Store Operations (1 minutes) Datapath for Branch and Jump Operations (1 minutes) CPE 442 single-cycle datapath.37
RTL: The Branch Instruction 31 26 21 16 op rs rt immediate 6 bits 5 bits 5 bits 16 bits beq rs, rt, imm16 mem[pc] Fetch the instruction from memory Cond <- R[rs] - R[rt] Calculate the branch condition if (COND eq ) Calculate the next instruction s address PC <- PC + 4 + ( SignExt(imm16) x 4 ) else PC <- PC + 4 CPE 442 single-cycle datapath.38
Step 1) : RTL To Data Path Design Data Path Components, Overview of the Instruction Fetch Unit The common RTL operations Fetch the Instruction: mem[pc] Update the program counter: - Sequential Code: PC <- PC + 4 - Branch and Jump PC <- something else PC Next Address Logic Address Instruction Memory Instruction Word CPE 442 single-cycle datapath.39
Datapath for Branch Operations beq rs, rt, imm16 We need to compare Rs and Rt! 31 26 21 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 16 Rd Rt RegDst Mux Rs RegWr 5 5 5 busw imm16 Rw Ra Rb -bit Registers 16 Rt busb Extender busa Mux ALUSrc ALUctr ALU imm16 16 Branch Zero PC Next Address Logic To Instruction Memory CPE 442 single-cycle datapath.4 ExtOp
Binary Arithmetics for the Next Address In theory, the PC is a -bit byte address into the instruction memory: Sequential operation: PC<31:> = PC<31:> + 4 Branch operation: PC<31:> = PC<31:> + 4 + SignExt[Imm16] * 4 The magic number 4 always comes up because: The -bit PC is a byte address And all our instructions are 4 bytes ( bits) long In other words: The 2 LSBs of the -bit PC are always zeros There is no reason to have hardware to keep the 2 LSBs In practice, we can simply the hardware by using a 3-bit PC<31:2>: Sequential operation: PC<31:2> = PC<31:2> + 1 Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16] In either case: Instruction Memory Address = PC<31:2> concat CPE 442 single-cycle datapath.41
Next Address Logic: Expensive and Fast Solution Using a 3-bit PC: Sequential operation: PC<31:2> = PC<31:2> + 1 Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16] In either case: Instruction Memory Address = PC<31:2> concat PC 3 1 imm16 Instruction<15:> 16 Adder SignExt 3 3 3 3 Adder 3 Mux 1 Addr<31:2> Addr<1:> Instruction Memory Instruction<31:> Branch Zero CPE 442 single-cycle datapath.42
Next Address Logic: Cheap and Slow Solution Why is this slow? Cannot start the address add until Zero (output of ALU) is valid Does it matter that this is slow in the overall scheme of things? Probably not here. Critical path is the load operation. 3 PC imm16 Instruction<15:> 16 3 SignExt 3 Mux 1 3 1 Carry In Adder 3 Addr<31:2> Addr<1:> Instruction Memory Instruction<31:> Branch Zero CPE 442 single-cycle datapath.43
RTL: The Jump Instruction 31 26 op target address 6 bits 26 bits j target mem[pc] Fetch the instruction from memory PC<31:2> <- PC<31:28> concat target<25:> Calculate the next instruction s address CPE 442 single-cycle datapath.44
Instruction Fetch Unit j target PC<31:2> <- PC<31:29> concat target<25:> PC<31:28> Target Instruction<25:> 3 4 26 3 3 1 Mux Addr<31:2> Addr<1:> Instruction Memory PC 3 1 imm16 Instruction<15:> 16 Adder SignExt 3 Adder 3 3 Mux 1 Jump Instruction<31:> Branch equal Zero CPE 442 single-cycle datapath.45
Putting it All Together: A Single Cycle Datapath We have everything except control signals (underline) RegDst busw 1 RegWr Rd Mux imm16 Rt 5 5 Rs 5 Rw Ra Rb -bit Registers 16 Rt busb Extender Branch Jump busa Mux 1 ALUSrc Instruction Fetch Unit ALUctr Data In ALU Zero Instruction<31:> Rt <21:25> WrEn Rs <16:2> MemWr Adr Data Memory Rd <11:15> <:15> Imm16 Mux 1 MemtoReg CPE 442 single-cycle datapath.46 ExtOp