ELEC 5200/6200 Computer Architecture and Design, Spring 2017
Lecture 4: Datapath and Control
Ujjwal Guin, Assistant Professor, Department of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849
http://www.auburn.edu/~uzg5/
Adapted from Dr. Chen-Huan Chiang (Intel) and Prof. Vishwani D. Agrawal (Auburn University) [Adapted from Computer Organization and Design, Patterson & Hennessy, 2014]
2/6/2017 ELEC 5200-001/6200-001 Lecture 4
Von Neumann Kitchen
[Figure: a processor (ALU, control, registers, PC) connected to memory (program and data), input, and output. Other labels in the figure: "Start", "My choice".]
Where Does It All Begin?
- In a register called the program counter (PC).
- The PC contains the memory address of the next instruction to be executed.
- In the beginning, the PC contains the address of the memory location where the program begins.
Where Is the Program?
[Figure: the program counter (a register in the processor) holds the start address, which points to the machine code of the program in memory.]
How Does It Run?
1. Start: the PC has the memory address where the program begins.
2. Fetch the instruction word from the memory address in the PC and increment the PC (PC = PC + 4) to point to the next instruction.
3. Decode the instruction.
4. Execute the instruction.
5. Save the result in a register or memory.
6. Program complete? If no, go back to step 2. If yes, STOP.
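The flowchart on this slide can be sketched as a small interpreter loop. This is only an illustration: the tuple-based instruction encoding and the three operations (`li`, `add`, `halt`) are hypothetical stand-ins, not MIPS machine code, and the `max_steps` guard is an added safety assumption.

```python
def run(program, start=0, max_steps=1000):
    """Toy fetch-decode-execute loop.

    `program` maps byte addresses to (op, *operands) tuples. This tuple
    encoding and the operations below are hypothetical, not MIPS machine code.
    """
    regs = {}
    pc = start                      # PC holds the address where the program begins
    for _ in range(max_steps):
        instr = program[pc]         # fetch the instruction word at the PC
        pc += 4                     # PC = PC + 4 points to the next instruction
        op, *args = instr           # decode
        if op == "halt":            # program complete: stop
            break
        if op == "li":              # execute, then save the result in a register
            rd, value = args
            regs[rd] = value
        elif op == "add":
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        else:
            raise ValueError(f"unknown operation: {op}")
    return regs, pc


program = {
    0: ("li", "t0", 2),
    4: ("li", "t1", 3),
    8: ("add", "t2", "t0", "t1"),
    12: ("halt",),
}
regs, final_pc = run(program)
print(regs["t2"], final_pc)
```

Note that the PC is incremented during fetch, before the instruction executes, which is why the final PC is one word past the `halt` instruction.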
Datapath and Control
- Datapath
  - Memory, registers, adders, ALU, and communication buses.
  - Each step (fetch, decode, execute, save result) requires communication (data transfer) paths between memory, registers, and ALU.
- Control
  - The datapath for each step is set up by control signals that set dataflow directions on the communication buses and select ALU and memory functions.
  - Control signals are generated by a control unit consisting of one or more finite-state machines.
Single-Cycle Processor Simplified MIPS - Datapath
Abstract View of MIPS
[Figure: PC, an adder with the constant 4, instruction memory (address in, instruction out), register file (three register-number inputs, data), ALU, and data memory (address, data).]
Instruction Fetch
- Fetch instructions from the Instruction Memory.
- Update the PC for the next instruction.
[Figure: the PC drives the Instruction Memory address input; an adder computes PC + 4.]
Register File: A Datapath Component
[Figure: register file with two 5-bit read-register inputs (Reg 1, Reg 2), a 5-bit Write register input, a 32-bit Write Data input, two 32-bit outputs (Reg 1 Data, Reg 2 Data), and a RegWrite control signal.]
Instruction Decode
- R-Type: the 6-bit opcode and 6-bit funct go to the Control Unit; two registers (rs and rt) are read.
- I-Type: the 6-bit opcode goes to the Control Unit; one register (rs) is read.
- J-Type?
[Figure: the instruction feeds the Control Unit and the Register File inputs (Reg 1, Reg 2, Write Reg, Write Data); the Register File outputs Data 1 and Data 2.]
Execute: R-Type
Bits:   31-26   25-21  20-16  15-11  10-6   5-0
Field:  opcode  rs     rt     rd     shamt  funct
- Why RegWrite?
[Figure: the instruction selects Reg 1, Reg 2, and Write Reg in the Register File; Data 1 and Data 2 feed the ALU (control: ALU Operation; outputs: result, zero); the ALU result returns to Write Data under RegWrite.]
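The field layout on this slide can be illustrated with shifts and masks. The sketch below assumes the standard 32-bit MIPS R-type layout (opcode 31-26, rs 25-21, rt 20-16, rd 15-11, shamt 10-6, funct 5-0); the function name `decode` and the dictionary layout are illustrative choices.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its R-type fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31-26
        "rs":     (instr >> 21) & 0x1F,  # bits 25-21
        "rt":     (instr >> 16) & 0x1F,  # bits 20-16
        "rd":     (instr >> 11) & 0x1F,  # bits 15-11
        "shamt":  (instr >> 6) & 0x1F,   # bits 10-6
        "funct":  instr & 0x3F,          # bits 5-0
    }


# 0x01095020 encodes add $t2, $t0, $t1 ($t0 = 8, $t1 = 9, $t2 = 10).
fields = decode(0x01095020)
print(fields)
```

For this word the opcode is 0 (R-type), rs and rt select the two source registers, rd selects the destination, and funct (0x20) tells the ALU control to add.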
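Execute: Load/Store
Bits:   31-26   25-21  20-16  15-0
Field:  opcode  rs     rt     16-bit address
- lw $rt, offset($rs)
- sw $rt, offset($rs)
[Figure: the Register File supplies Data 1 to the ALU; the 16-bit address field is sign-extended to 32 bits and supplies the second ALU input; the ALU result is the Data Memory address; Data 2 is the Data Memory Write Data; the Data Memory output returns to the Register File Write Data. Control signals: RegWrite, ALU operation, MemWrite, MemRead.]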
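The address calculation for lw and sw (base register plus sign-extended 16-bit offset) can be sketched as follows. The helper names `sign_extend16` and `effective_address` are illustrative, and wrapping the sum to 32 bits is an assumption about how the 32-bit adder behaves.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit field to a Python int (the 16 -> 32 Sign Extend box)."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base, imm16):
    """Address used by lw/sw: register value plus sign-extended offset, 32-bit wrap."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


print(hex(effective_address(0x10010000, 8)))       # offset +8
print(hex(effective_address(0x10010000, 0xFFFC)))  # offset -4
```

Sign extension matters because offsets may be negative: the field 0xFFFC denotes -4, not 65532.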
Execute: Branch
- bne $t0, $t1, Label
- beq $t0, $t1, Label
[Figure: the Register File supplies Data 1 and Data 2 to the ALU, whose zero output goes to the branch control logic. One adder computes PC + 4; a second adder adds PC + 4 to the sign-extended (16 to 32 bits) offset shifted left by 2 to form the branch target address.]
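The branch datapath computes two candidate next-PC values and picks one. The sketch below shows this for beq, assuming the taken condition is Branch AND zero; the function names are illustrative. For bne the zero test would be inverted.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit field to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """(PC + 4) + (sign-extended offset shifted left by 2), wrapped to 32 bits."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def next_pc(pc, imm16, branch, zero):
    """Select between PC + 4 and the branch target (beq: taken when zero is 1)."""
    taken = bool(branch) and bool(zero)
    return branch_target(pc, imm16) if taken else (pc + 4) & 0xFFFFFFFF


print(hex(next_pc(0x00400000, 3, branch=1, zero=1)))  # taken
print(hex(next_pc(0x00400000, 3, branch=1, zero=0)))  # not taken
```

The offset counts instructions (words) relative to PC + 4, which is why it is shifted left by 2 before the addition.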
Execute: Jump
- Instruction format: op, 26-bit address.
- The jump operation updates the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits (converting to a byte address).
[Figure: the PC addresses the Instruction Memory; an adder computes PC + 4; the 26-bit address field is shifted left by 2 to 28 bits and joined with the 4 MSBs of PC + 4 to form the jump address.]
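The jump address formation described on this slide can be sketched in a few lines. The function name `jump_target` is illustrative; the example instruction word uses the MIPS `j` opcode (000010), which comes from the MIPS ISA rather than from the slide.

```python
def jump_target(pc, instr):
    """Jump address: 4 MSBs of PC + 4 joined with (26-bit address field << 2)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    addr26 = instr & 0x03FFFFFF                    # lower 26 bits of the instruction
    return (pc_plus_4 & 0xF0000000) | (addr26 << 2)


# 0x08100008 is a j instruction whose 26-bit field is 0x100008.
print(hex(jump_target(0x00400000, 0x08100008)))
```

Because only 28 bits come from the instruction, a jump can reach any word within the same 256 MB region as PC + 4.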
Assembling Datapath
- Assemble the datapath segments.
- Add control lines and multiplexors as needed.
- Single-cycle design: fetch, decode, and execute each instruction in one clock cycle.
  - No datapath resource can be used more than once per instruction.
  - Resources must be duplicated if needed (e.g., separate Instruction Memory and Data Memory, several adders).
  - Multiplexors are needed at the input of shared elements, with control lines to do the selection.
  - Write signals control writing to the Register File and Data Memory.
- Cycle time is determined by the length of the longest path.
Datapath (Except Jump)
[Figure: complete single-cycle datapath without jump. The Control Unit takes Instr[31-26] and produces ALUOp, RegWrite, ALUSrc, PCSrc, MemWrite, MemtoReg, and RegDst. Instr[25-21] and Instr[20-16] select Reg 1 and Reg 2; a multiplexor chooses Instr[20-16] or Instr[15-11] as the write address; Instr[15-0] is sign-extended from 16 to 32 bits; Instr[5-0] feeds the ALU control. Multiplexors also select the second ALU input, the Register File write data, and the next PC (PC + 4 or the branch target).]
Datapath and Control (Except Jump)
[Figure: the same single-cycle datapath with the control lines drawn in. The Control Unit takes Instr[31-26] and produces RegDst, Branch, RegWrite, ALUSrc, MemWrite, MemtoReg, and ALUOp; PCSrc is formed from Branch and the ALU zero output; Instr[5-0] and ALUOp feed the ALU control.]
Arithmetic Logic Unit (ALU)
Operation select   ALU function
0000               AND
0001               OR
0010               Add
0110               Subtract
0111               Set on less than
1100               NOR
- zero = 1 when all bits of the result are 0.
[Figure: ALU with a 4-bit operation select from control and outputs zero, overflow, and result.]
Building a 32-bit ALU
1-Bit ALU: AND, OR, ADD, SUB, NOR
ALU: slt
- slt produces a 1 if rs < rt and 0 otherwise.
- Use subtraction: (a - b) < 0 implies a < b.
ALU: Branch
ALU Control
ALU Control Lines   Function
0000                AND
0001                OR
0010                add
0110                subtract
0111                set on less than
1100                NOR
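A behavioral sketch of the ALU may help tie the control lines to the functions. It assumes the 4-bit encoding from the Patterson & Hennessy text (0000 AND, 0001 OR, 0010 add, 0110 subtract, 0111 set on less than, 1100 NOR) and 32-bit two's-complement operands. The overflow output is omitted, and slt is modeled as a direct signed comparison rather than by inspecting the sign of a - b, which is a simplification of the hardware.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, a, b):
    """Return (result, zero) for a 4-bit ALU control code and 32-bit inputs."""
    a &= MASK
    b &= MASK
    if op == 0b0000:
        result = a & b
    elif op == 0b0001:
        result = a | b
    elif op == 0b0010:
        result = a + b
    elif op == 0b0110:
        result = a - b
    elif op == 0b0111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif op == 0b1100:
        result = ~(a | b)
    else:
        raise ValueError(f"unsupported ALU control code: {op:04b}")
    result &= MASK                      # keep 32 bits
    return result, int(result == 0)     # zero = 1 when all result bits are 0


print(alu(0b0010, 2, 3))   # add
print(alu(0b0110, 7, 7))   # subtract equal values: zero is set, as beq needs
```

The zero output from a subtraction is what the branch logic uses to decide whether two registers are equal.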
Single-Cycle Processor Simplified MIPS - Control
Datapath and Control (Except Jump)
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
ALU Control
- Load and store word instructions: the ALU computes the target memory address by addition.
  - Base address + displacement
  - Base register + sign_ext(imm16)
- R-type instructions: the ALU performs one of the following 5 actions, depending on the value of the 6-bit funct field.
  - AND, OR, subtract, add, set on less than
- Branch: the ALU performs a subtraction.
  - Check the ZERO output.
- A 2-bit ALUOp, derived from the opcode (Instr[31:26]), distinguishes these 3 types of instructions.
  - lw/sw (00), beq (01), R-type (10)
  - Note that the binary encoding (11) is not used.
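Recall: ALU Control Inputs
- 4 bits are required for the ALU control inputs, ALUctr.
- Remember this in ALU design?
  - 0000 = and
  - 0001 = or
  - 0010 = add
  - 0110 = subtract
  - 0111 = slt
  - 1100 = NOR
[Figure: the 6-bit op field feeds the Main Control, which produces the 2-bit ALUop; ALUop and the 6-bit funct field feed the ALU Control, which produces the 4-bit ALUctr sent to the ALU.]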
What's in the Box?
[Figure: op (6 bits) to Main Control, which produces ALUop (2 bits); ALUop and funct (6 bits) to ALU Control, which produces ALUctr (4 bits) to the ALU.]

Opcode        ALUOp  Operation         Function code  Desired ALU action  ALU control input
LW            00     Load word         xxxxxx         add                 0010
SW            00     Store word        xxxxxx         add                 0010
Branch equal  01     Branch equal      xxxxxx         subtract            0110
R-type        10     Addition          100000         add                 0010
R-type        10     Subtraction       100010         subtract            0110
R-type        10     AND               100100         and                 0000
R-type        10     OR                100101         or                  0001
R-type        10     Set-on-less-than  101010         set-on-less-than    0111

ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  ALU control input
0       0       X   X   X   X   X   X   0010
X       1       X   X   X   X   X   X   0110
1       X       X   X   0   0   0   0   0010
1       X       X   X   0   0   1   0   0110
1       X       X   X   0   1   0   0   0000
1       X       X   X   0   1   0   1   0001
1       X       X   X   1   0   1   0   0111
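One way to read the ALU control tables is as a small lookup function. The sketch below assumes the standard Patterson & Hennessy encodings (ALUOp 00 for lw/sw, 01 for beq, 10 for R-type) and treats F5 and F4 as don't-cares by looking only at the low four funct bits. The name `alu_control` and the choice to reject the unused ALUOp value 11 are illustrative.

```python
# Low four funct bits (F3..F0) -> 4-bit ALU control input, for R-type instructions.
R_TYPE_TABLE = {
    0b0000: 0b0010,  # add (funct 100000)
    0b0010: 0b0110,  # subtract (funct 100010)
    0b0100: 0b0000,  # AND (funct 100100)
    0b0101: 0b0001,  # OR (funct 100101)
    0b1010: 0b0111,  # set on less than (funct 101010)
}


def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp and 6-bit funct field to the 4-bit ALU control input."""
    if alu_op == 0b00:           # lw/sw: add base and offset
        return 0b0010
    if alu_op == 0b01:           # beq: subtract and check zero
        return 0b0110
    if alu_op == 0b10:           # R-type: decided by the funct field
        return R_TYPE_TABLE[funct & 0xF]
    raise ValueError("ALUOp 11 is not used")


print(format(alu_control(0b10, 0b101010), "04b"))  # slt
```

Splitting control into a main unit (opcode to ALUOp) and this second-level unit keeps the main control small, since it never has to look at the funct field.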
Control Unit
[Figure: single-cycle datapath with the Control Unit highlighted. The Control Unit takes Instr[31-26] and produces RegDst, Branch, RegWrite, ALUSrc, MemWrite, MemtoReg, and ALUOp; PCSrc is formed from Branch and the ALU zero output; Instr[5-0] and ALUOp feed the ALU control.]
R-Type Instructions
add $x, $y, $z
R-type: op (31-26), rs (25-21), rt (20-16), rd (15-11), shamt (10-6), funct (5-0)
- Instruction Fetch (IF): An instruction is fetched from the instruction memory and the PC is incremented.
- Instruction Decode (ID): Two registers, $y and $z, are read from the register file.
- Execution (EX): The ALU operates on the data read from the register file, using the function code (bits 5-0 of the instruction) to generate the ALU function.
- Write Back (WB): The result from the ALU is written into the register file, using bits 15-11 of the instruction to select the destination register ($x).
R-Type Instructions
add $x, $y, $z
[Figure: single-cycle datapath and control with the active paths for an R-type instruction: Instr[25-21] and Instr[20-16] select the source registers, Instr[15-11] selects the write address, and the ALU result is written back to the Register File.]
I-Type: Load
lw $x, offset($y)
I-type: op (31-26), rs (25-21), rt (20-16), offset (15-0)
- Instruction Fetch (IF): An instruction is fetched from the instruction memory and the PC is incremented.
- Instruction Decode (ID): A register ($y) value is read from the register file.
- Address Calculation (EX): The ALU computes the sum of the value read from the register file and the sign-extended lower 16 bits of the instruction (offset).
- Memory Operation (MEM): The sum from the ALU is used as the address for the data memory.
- Write Back (WB): The data from the memory unit is written into the register file; the destination register ($x) is given by bits 20-16 of the instruction.
I-Type: Load
lw $x, offset($y)
[Figure: single-cycle datapath and control with the active paths for a load: the ALU adds the base register and the sign-extended Instr[15-0], the result addresses the Data Memory, and the memory data is written to the register selected by Instr[20-16].]
I-Type: Branch
beq $x, $y, offset
I-type: op (31-26), rs (25-21), rt (20-16), offset (15-0)
- Instruction Fetch (IF): An instruction is fetched from the instruction memory and the PC is incremented.
- Instruction Decode (ID): Two registers, $x and $y, are read from the register file.
- Branch Address Calculation (EX): The ALU performs a subtract on the data values read from the register file. The value of PC + 4 is added to the sign-extended lower 16 bits of the instruction (offset), shifted left by 2; the result is the branch target address.
- Branch Decision: The Zero result from the ALU is used to decide which adder result to store into the PC.
I-Type: beq
beq $x, $y, offset
[Figure: single-cycle datapath and control with the active paths for beq: the ALU subtracts the two register values, and Branch together with the ALU zero output drives PCSrc to select between PC + 4 and the branch target.]
Control Signals
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
[Figure: the Control Unit takes the opcode bits op[0] through op[5] as inputs and produces the outputs RegDst, ALUSrc, ..., ALUOp1, ALUOp0.]
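Since the main control is combinational, it can be sketched as a lookup from opcode to signal values. The table below assumes the standard Patterson & Hennessy settings, with the column labeled "Mem" taken to be MemRead, and uses the MIPS opcodes for R-format (000000), lw (100011), sw (101011), and beq (000100), which come from the MIPS ISA rather than from the slide. `None` stands for a don't-care (X), and the names `SIGNALS` and `main_control` are illustrative.

```python
X = None  # don't-care value

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

CONTROL_TABLE = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 1, 0),  # R-format
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0, 0),  # lw
    0b101011: (X, 1, X, 0, 0, 1, 0, 0, 0),  # sw
    0b000100: (X, 0, X, 0, 0, 0, 1, 0, 1),  # beq
}


def main_control(opcode):
    """Return the control signal settings for a 6-bit opcode as a dict."""
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))


lw_signals = main_control(0b100011)
print(lw_signals)
```

The don't-cares for sw and beq reflect that neither instruction writes the register file, so the RegDst and MemtoReg multiplexor settings have no effect.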
Adding Jump Hardware
- Instruction format: op, 26-bit address.
- Note: the 26-bit address is a word address. It must be multiplied by 4 to obtain the byte address, i.e., shifted left by 2.
- PC[31:28] or PC+4[31:28]?
[Figure: the low-order 26 bits of the jump instruction are combined with 4 bits from the PC to form the 32-bit value loaded into the PC.]
[Figure: single-cycle datapath and control with jump hardware added. Instr[25-0] is shifted left by 2 (26 to 28 bits) and joined with the 4 bits labeled PC[31-28] to form the 32-bit jump address; an additional multiplexor controlled by the Jump signal selects the jump address for the PC. The Control Unit takes Instr[31-26] and produces Jump, Branch, RegWrite, RegDst, ALUSrc, MemWrite, MemtoReg, and ALUOp.]
Limitations
- Inefficient clocking
  - The clock cycle must be timed to accommodate the slowest instruction.
  - Problematic for more complex instructions like floating-point multiply.
  - [Figure: timing diagram with lw filling Cycle 1 and sw finishing early in Cycle 2; the remainder of Cycle 2 is marked "Waste".]
- May be wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle.
- BUT it is simple and easy to understand, especially the design of the main control unit (combinational logic).
Critical Path (Load)
Critical Path = PC's Clk-to-Q
  + Instruction Memory's Access Time
  + Register File's Access Time
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Setup Time for Register File Write
  + Clock Skew
- Register file and ideal memory:
  - The CLK input is a factor ONLY during write operations.
  - During read operations, they behave as combinational logic: address valid => output valid after the access time (i.e., a delay).
[Figure: abstract MIPS datapath with Clk inputs on the PC, register file, and data memory.]
Cycle Time
Instruction class      Steps
Arithmetic & Logical   IF  ID  EXE       WB
Load                   IF  ID  EXE  MEM  WB   (critical path)
Store                  IF  ID  EXE  MEM
Branch                 IF  ID  EXE
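To make the cycle-time argument concrete, the sketch below sums per-step delays for each instruction class and takes the maximum as the single-cycle clock period. The delay values in picoseconds are illustrative assumptions, not figures from this lecture, and clock-to-Q, setup time, and skew are ignored for simplicity.

```python
# Hypothetical per-step delays in picoseconds (assumed values for illustration).
STEP_DELAY_PS = {"IF": 200, "ID": 100, "EXE": 200, "MEM": 200, "WB": 100}

# Steps used by each instruction class in the single-cycle design.
STEPS = {
    "Arithmetic & Logical": ["IF", "ID", "EXE", "WB"],
    "Load":                 ["IF", "ID", "EXE", "MEM", "WB"],
    "Store":                ["IF", "ID", "EXE", "MEM"],
    "Branch":               ["IF", "ID", "EXE"],
}


def path_delay(instr_class):
    """Total delay of the steps an instruction class passes through."""
    return sum(STEP_DELAY_PS[step] for step in STEPS[instr_class])


def single_cycle_period():
    """The clock must accommodate the slowest instruction class."""
    return max(path_delay(c) for c in STEPS)


for instr_class in STEPS:
    print(instr_class, path_delay(instr_class), "ps")
print("clock period:", single_cycle_period(), "ps")
```

Under these assumed delays, the load sets the clock period, and every other class leaves part of each cycle unused.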
Multicycle Datapath Approach
- Let an instruction take more than 1 clock cycle to complete.
  - Break up instructions into steps where each step takes a cycle, while trying to
    - balance the amount of work to be done in each step
    - restrict each cycle to use only one major functional unit
  - Not every instruction takes the same number of clock cycles.
- In addition to faster clock rates, multicycle allows functional units to be used more than once per instruction, as long as they are used in different clock cycles. Hence:
  - One memory, but only one memory access per cycle (recall the separate instruction and data memories in the single-cycle processor).
  - One ALU/adder, but only one ALU operation per cycle (recall one adder for PC + 4 and one ALU/adder for the rest in the single-cycle processor).
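Because instructions take different numbers of cycles in a multicycle design, the average cycles per instruction (CPI) depends on the instruction mix. The sketch below assumes one cycle per step, with step counts taken from the IF/ID/EXE/MEM/WB breakdown (4 for arithmetic and logical, 5 for load, 4 for store, 3 for branch), and uses a made-up mix of 100 instructions purely for illustration.

```python
# Cycles per instruction class, assuming one clock cycle per step.
CYCLES = {"Arithmetic & Logical": 4, "Load": 5, "Store": 4, "Branch": 3}

# Hypothetical instruction counts for some program (not measured data).
MIX = {"Arithmetic & Logical": 50, "Load": 20, "Store": 10, "Branch": 20}


def average_cpi(cycles, mix):
    """Total cycles divided by total instructions for the given mix."""
    total_cycles = sum(cycles[c] * count for c, count in mix.items())
    total_instructions = sum(mix.values())
    return total_cycles / total_instructions


print(average_cpi(CYCLES, MIX))
```

Whether the multicycle design is faster overall then depends on how much shorter its clock period is compared with the single-cycle period.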
Reducing Cycle Time
- Cut the combinational dependency graph and insert a register or latch.
- Do the same work in two fast cycles, rather than one slow one.
[Figure: before: storage element, acyclic combinational logic, storage element. After: storage element, acyclic combinational logic (A), storage element, acyclic combinational logic (B), storage element.]
Multicycle Datapath: Abstract View
- Single memory unit, single ALU, and temporary registers after each major functional unit.
- End of a cycle: all data needed in subsequent clock cycles must be stored in an internal register (not visible to programmers).
- All internal registers except IR hold data only between a pair of adjacent clock cycles, so no write control signal is needed for them.
- Internal registers:
  - IR: Instruction Register
  - MDR: Memory Data Register
  - A, B: register-file read data registers
  - ALUout: ALU output register
[Figure: the PC addresses a single memory (instructions or data) whose output goes to IR and MDR; the Register File outputs Data 1 and Data 2 go to A and B; the ALU result goes to ALUout.]
Next: Pipelining
https://www.youtube.com/watch?v=ijarlbd9r3
https://www.youtube.com/watch?v=anxgje6i3g8
https://www.youtube.com/watch?v=5lp4ebfpati