Computer Hardware Engineering IS2, spring 27 Lecture 9: LU and s ssociate Professor, KTH Royal Institute of Technology Slides version. 2 Course Structure Module : C and ssembly Programming LE LE2 LE EX LB Module : Processor Design LE9 LE EX S2 LB LE S LB2 Module 2: I/O Systems Module : Hierarchy LE LE6 EX2 LB LE EX S Module : Logic Design LE7 LE8 EX PROJ STRT LD-LB Module 6: Parallel Processors and Programs LE2 LE EX6 S Proj. Expo LE I II
bstractions in Computer Systems Networked Systems and Systems of Systems Computer System pplication Software Software Operating System Set rchitecture Hardware/Software Interface Microarchitecture Logic and Building Blocks Digital Hardware Design Digital Circuits nalog Circuits nalog Design and Physics Devices and Physics I II genda I II I II
cknowledgement: The structure and several of the good examples are derived from the book Digital Design and Computer rchitecture (2) by D. M. Harris and S. L. Harris. I II 6 (LU) n LU saves hardware by combining different arithmetic and logic operations in one single unit/element. N LU N Y B N F F LU LU symbol: both figures have the same function B N N Y N Input F specifies the function that the LU should perform LUs can have different functions and be designed differently. n LU can also include output flags, for instance: Overflow flag (adder overflowed) Zero flag (output is zero) I II
(LU) H E 7 F 2 B N N Exercise: Determine the functional behavior for each value of F. Most significant bit (number 2) N [N-] Bits to Zero Extend I 2 F : Extracts the most significant bit. dds zeros for bits N- to. N N Y F 2: Function ND B OR B B not used ND!B OR!B B SLT Zero extend examples (N=): Input: xffff2. Output: x Input: xff676a. Output: x II 8 I cknowledgement: The structure and several of the good examples are derived from the book Digital Design and Computer rchitecture (2) by D. M. Harris and S. L. Harris. I II
9 Path and Control Unit processor is typically divided into two parts Path Operates on a word of data. Consists of elements such as registers, memory, LUs etc. Control Unit Gets the current instruction from the data path and tells the data path how to execute the instruction. I II s In this lecture, we construct a microarchitecture for a subset of a MIPS processor with the following instructions R-Type: add, sub, and, or, slt / logic instructions instructions I-Type: addi, lw, sw, beq immediate instruction J-Type: Branch instructions j I II
State Elements (/) Program Counter and Register File The architectural states for this MIPS processor are the program counter () and the registers ($, $t, $s, $s, etc.) Reads -bit data -bit address results in 2 = registers at the next clock cycle next at the current clock cycle Program Counter () -bit register Two read ports ( and 2) and one write port (). 2 2 registers of size -bit I II -bit address State Elements (2/) s and Memories Reads -bit word of data -bit address 2 Writes on the rising clock edge and when write enable () is true simplified instruction memory, modeled as a read-only memory (ROM) Reads or writes -bit word of data Non-architectural states are used to simplify logic or improve performance (introduced in the next lecture). I II
State Elements (/) Reading combinationally, writing at clock edge ll the blocks below read combinationally: when the address changes, the data on the read port change after some propagation time. There is no clock involved. The register file and the data memory write at the raising clock edge. 2 2 I II Read from the Current next First step. Read the instruction at the current address. -bit instruction Inst is fetched. I II
lw instruction Read Base ddress E 26 2 op rs 6 bits bits 2 2 rt bits 6 imm 6 bits Read out the base address from the register file. 2:2 cuts out the bits from the instruction. Example lw $s,($s) Base address in rs has now the address stored in $s (in the above example). next 2:2 2 2 I II 6 lw instruction Read Offset 26 2 op rs 6 bits bits next 2 2 The offset is found in the least significant 6 bits of the instruction. rt bits 6 : imm 6 bits 2:2 2 2 Simm Example lw $s,($s) The offset is stored in the imm field. The offset is signed. Sign extend to bits. That is: Simm : = : Simm :6 = I II
7 lw instruction Read Word The base address and the offset are added together Control code for Example lw $s,($s) next 2:2 2 2 2 LU : Simm Reads out the data word from data memory. I II 8 lw instruction Write Back 26 2 2 2 op rs rt 6 imm Example lw $s,($s) next Reads out bits of the rt register to enable write back of result. : 2:2 2 2:6 2 Write enable Simm 2 LU Write back the result I II
9 lw instruction Increment next 2:2 2 2:6 2 2 LU : Simm Increment the by. (Next instruction is at address ) This is the complete data path for the load word (lw) instruction. I II 2 lw instruction Timing next 2:2 2 2:6 2 2 LU : Simm Combinational logic during clock cycle: read instruction, sign extend, read from register file, perform LU operation, and read from the data memory. t the raising clock edge: Write to the register file and update the. I II
2 sw instruction Increment We need to read the base address, read the offset, and compute an address. Good news: We have already done that!. next The word to be stored is saved in a register (rt-field in the I-type) : 2:2 2:6 2:6 2 2 Write enable must be false Simm 2 LU Example sw $s,($s) The data word is stored into the memory I II 22 R-type instructions Machine Encoding We are now going to handle all R-type instructions the same uniform way. That is, we should handle add, sub, and, or, and slt. 26 2 2 2 6 I-Type op rs rt imm 6 bits bits bits 6 bits The rs and rt fields are in the same place for both I-Type and R-Type R-Type 26 2 2 2 6 6 op rs rt rd shamt funct 6 bits bits bits bits bits 6 bits I II
2 R-type instructions LU Usage We want to send the second operand to the LU, but still be compatible with the lw-instruction. Different LU control signals for different instructions. next 2:2 2:6 2 : 2 LUSrc = LUControl LU MemWrite = : Simm I II 2 R-type instructions Write to Register R-Type instructions write to registers and not to memory. Bypass the data memory if an R-Type instruction next 2:2 2:6 2 : 2 LUSrc = LUControl LU MemWrite = MemToReg = : Simm I II
2 R-type instructions Machine Encoding 26 2 2 2 6 I-Type op rs rt imm 6 bits bits bits 6 bits For the lw instruction, the target register is stored in the rt field (bits 2:6) For R-type instructions, the target register is stored in the rd field (bits :) R-Type 26 2 2 2 6 6 op rs rt rd shamt funct 6 bits bits bits bits bits 6 bits I II 26 next R-type instructions Use the rd field Write to register using rd field instead of rt. 2:2 2:6 : 2 RegWrite = 2 2:6 : RegDst = LUSrc = LUControl LU Create a control variable for register write. MemWrite = MemToReg = I II
27 beq instruction Machine Encoding Recall that the beq instruction is a branch instruction, encoded in the I-Type. 26 2 2 2 6 I-Type op rs rt imm 6 bits bits bits 6 bits The rs and rt fields specify the registers that should be compared. The imm field is used when computing the branch target address (BT) Recall how to compute the BT: BT = imm * Example beq $s,$s,loop I II 28 beq instruction Compare if equal next Branch taken if equal, else increment by 2:2 2:6 : 2 RegWrite = RegDst =? 2 2:6 : LUSrc = Branch = LUControl LU <<2 Zero MemWrite = MemToReg =? Compute BT I II
29 Pseudo-Direct ddressing (Revisited) The J and JL instructions are encoded using the J-type. But, the address is not bits, only 26 bits. 26 2 op addr 6 bits 26 bits -bit Pseudo-Direct ddress is computed as follows: Bits to (least significant) are always zero because word alignment of code. Bits 27 to 2 is taken directly from the addr field of the machine code instruction. Bits to 28 are obtained from the four most significant bits from. I II j instruction Jump 2:2 Zero 2:6 2 2 :28 27: <<2 2: : RegWrite Branch RegDst LUSrc LUControl 2:6 : LU <<2 MemWrite MemToReg I II
Path for s H add,sub,and,or,slt,addi,lw,sw,beq,j Note: we have indirect support for addi Branch (with sign extension), but ori would need RegDst additional logic. RegWrite LUSrc Jump LUControl 2:2 Zero 2:6 2 2 :28 27: <<2 2: : 2:6 : LU <<2 MemWrite MemToReg I II II cknowledgement: The structure and several of the good examples are derived from the book Digital Design and Computer rchitecture (2) by D. M. Harris and S. L. Harris. I II
What to Control? We should set the control RegWrite Branch RegDst Jump signals depending on the instruction. LUSrc LUControl 2:2 Zero 2:6 2 2 :28 27: <<2 2: : 2:6 : LU <<2 MemWrite MemToReg I II Control Unit Input: Machine Code R-Type op rs rt rd shamt funct 6 bits bits bits bits bits 6 bits For most R-Type instructions, the op field is. The control signals depend on the funct field. I-Type op rs rt imm 6 bits bits bits 6 bits For I-Type and J-Type, the control signals depend on the op field J-Type op addr 6 bits 26 bits I II
Control Unit Structure H The 6 bits op field from all instruction types Internal signal LUOp LUOp = means add LUOp = means subtract LUOp = look at the funct field LUOp = n/a op 6 LUOp Main Decoder 2 RegWrite RegDst LUSrc Branch MemWrite MemToReg Jump Control signals to the data path funct 6 LU Decoder LUControl The 6 bits funct field from the R-type. Ignored if other types. I II 6 LU Decoder Enough to check one bit (faster decoding) LUOp 2 funct 6 LU Decoder LUControl LUOp funct LUControl? (add)?? (subtract)? (add) (add)? (sub) (subtract)? (and) (and)? (or) (or)? (slt) (set less than) I II
Main Decoder H E 7 op RegWrite RegDst LUSrc Branch MemWrite MemToReg Jump LUOp R-Type lw sw?? beq?? addi j?????? I II Performance nalysis (/2) General View 8 How should we analyze the performance of a computer? By clock frequency? By instructions per program? Execution time (in seconds) = # instructions clock cycles instruction seconds clock cycle Number of instructions in a program (# = number of) Determined by programmer or the compiler or both. verage cycles per instruction (CPI) Determined by the microarchitecture implementation. Seconds per cycle = clock period T C. Determined by the critical path in the logic. Problem: Your program may have many inputs. Not only one specific program might be interesting. Solution: Use a benchmark (a set of programs). Example: SPEC CPU Benchmark I II
Performance nalysis (2/2) 9 Execution time (in seconds) = # instructions clock cycles instruction seconds clock cycle Number of instructions in a program (# = number of) Determined by programmer or the compiler or both. verage cycles per instruction (CPI) Determined by the microarchitecture implementation. Seconds per cycle = clock period T C. Determined by the critical path in the logic. Each instruction takes one clock cycle. That is, CPI =. The lw instruction has longer path than R-Type instructions. However, because of synchronous logic, the clock period is determined by the slowest instruction. The main problem with this design is the long critical path. I II Critical Path Example: Load Word (lw) Jump 2:2 Zero 2:6 2 2 :28 27: <<2 2: : RegWrite Branch RegDst LUSrc LUControl 2:6 : LU <<2 MemWrite MemToReg I II
You can soon stretch your legs but wait just a second more I II 2 Reading Guidelines Module : Processor Design Lecture 9: LU and Single-Cycled Processors H&H Chapters.2., 7.-7.. Lecture : Pipelined processors H&H Chapters 7., 7.8.-7.8.2, 7.9 Reading Guidelines See the course webpage for more information. I II
Summary Some key take away points: The LU performs most of the arithmetic and logic computations in the processor. The data path consists of sequential logic that performs processing of words in the processor. The control unit decodes instructions and tells the data path what to do. The single-cycle processor has a long critical path. We will solve this in the next lecture by introducing a pipelined processor. Thanks for listening! I II