ENE 334 Microprocessors Lecture 6: Datapath and Control : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 3 th & 4 th Edition, Patterson & Hennessy, 2005/2008, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/
Outlines: ENE 334 Datapath and Control Page 2 Arithmetic Logic Unit(ALU) The datapath A Single Cycle Implementation A Multicycle Implementation The Control unit
Logic Circuits: ENE 334 Datapath and Control Page 3 Combinational Logic: Decoders
Logic Circuits: ENE 334 Datapath and Control Page 4 Combinational Logic: Multiplexors I
Logic Circuits: ENE 334 Datapath and Control Page 5 Combinational Logic: Multiplexors II
Logic Circuits: ENE 334 Datapath and Control Page 6 Combinational Logic: Adder
ENE 334 Datapath and Control Page 7 ALU: 1-Bit The device that performs the arithmetic operations like addition and subtraction or logical operations like AND and OR.
ALU: 1-Bit ENE 334 Datapath and Control Page 8 Subtraction?
ALU: 1-Bit ENE 334 Datapath and Control Page 9 NOR?
ALU: 32-Bit ENE 334 Datapath and Control Page 10
ALU: MIPS ENE 334 Datapath and Control Page 11
ALU: MIPS bit0-bit30 ENE 334 Datapath and Control Page 12
ALU: MIPS bit31 ENE 334 Datapath and Control Page 13
ALU: 32-bit ENE 334 Datapath and Control Page 14
ALU: ENE 334 Datapath and Control Page 15
Datapath: ENE 334 Datapath and Control Page 16 The core MIPS instruction set: - the memory-reference instructions: load word(lw) and store word(sw) - the arithmetic-logical instructions: add, sub, and, or and slt - the instructions: branch equal(beq) and jump (j)
Datapath: ENE 334 Datapath and Control Page 17 For every instruction, the first two steps are identical: 1] Send the program counter(pc) to the memory that contains the code and fetch the instruction from that memory. 2] Read one or two registers, using fields of the instruction. All instructions use the ALU after reading the registers Why? memory-reference? arithmetic? control flow?
MIPS: Implementation ENE 334 Datapath and Control Page 18 Abstract / Simplified View: Two types of functional units: elements that operate on data values (combinational) elements that contain state (sequential)
MIPS: Implementation ENE 334 Datapath and Control Page 19
Logic Design Basics ENE 334 Datapath and Control Page 20
Sequential Elements ENE 334 Datapath and Control Page 21
Sequential Elements ENE 334 Datapath and Control Page 22
Logic Design conventions: Clocks Clock methodology: Any input to a state element must reach a stable value before the active clock edge causes the state to be updated ENE 334 Datapath and Control Page 23
Logic Design conventions: Clocks ENE 334 Datapath and Control Page 24 Clock methodology: The clock period must be as large as : t prop +t combination +t setup
Logic Design conventions: Clocks ENE 334 Datapath and Control Page 25 Clock methodology: The clock period must be as large as : t prop +t combination +t setup +t skew
Logic Design conventions: Clocks ENE 334 Datapath and Control Page 26 Clock methodology: An edge-triggered methodology allows a state element to be read and written in the same clock cycle
Building a Datapath: ENE 334 Datapath and Control Page 27 Datapath elements: We need: a memory unit to store a program and supply instructions given an address.
Building a Datapath: ENE 334 Datapath and Control Page 28 Fetching instructions and incrementing the PC: We need: a memory unit to store a program and supply instructions given an address.
R-Format Instruction ENE 334 Datapath and Control Page 29
Building a Datapath: R-format ENE 334 Datapath and Control Page 30 Datapath elements: Ex: add $s1,$s2,$s3 31 26 25 21 20 16 15 11 10 6 5 0 opcode rs rt rd shamt funct We need: the register file and the ALU to implement R-format ALU operations.
Load/Store Instruction ENE 334 Datapath and Control Page 31
Branch Instruction ENE 334 Datapath and Control Page 32
Building a Datapath: I-format ENE 334 Datapath and Control Page 33 Datapath elements: Ex: lw $s1,100($s2) 31 26 25 21 20 16 15 11 10 6 5 0 opcode rs rt immediate We need: the data memory unit and a unit to sign-extend for lw, sw
ENE 334 Datapath and Control Page 34 Building a Datapath: I,R-format 31 26 25 21 20 16 15 11 10 6 5 0 opcode rs rt rd shamt funct 31 26 25 21 20 16 15 11 10 6 5 0 opcode rs rt immediate
Building a Datapath:beq(I-format) ENE 334 Datapath and Control Page 35 Datapath elements: Ex: beq $t1,$t2,offset -branch target address -branch taken -branch not taken o 31 26 25 21 20 16 15 11 10 6 5 0 opcode rs rt immediate
ENE 334 Datapath and Control Page 36 Building a Datapath: I,R-format 31 26 25 21 20 16 15 11 10 6 5 0 opcode rs rt rd shamt funct 31 26 25 21 20 16 15 11 10 6 5 0 opcode rs rt immediate
A Simple Scheme: ALU Control ENE 334 Datapath and Control Page 37 ALU control lines Function 0000 AND 0001 OR 0010 add 0110 subtract 0111 set on less than 1100 NOR lw,sw and,or,add,subtract,set on less than Beq Why is the code for subtract 0110 and not 0011? - Why is the code for subtract 0110 and not 0011?
ENE 334 Datapath and Control Page 38 A Simple Scheme: ALU Control Instruction opcode ALUOp Instruction operation Funct field Desired ALU action ALU control input LW 00 load word XXXXXX add 0010 SW 00 store word XXXXXX add 0010 Branch equal 01 branch equal XXXXXX subtract 0110 R-type 10 add 100000 add 0010 R-type 10 subtract 100010 subtract 0110 R-type 10 AND 100100 and 0000 R-type 10 OR 100101 or 0001 R-type 10 set on less than 101010 set on less than 0111 ALUOp,Funct field: ALU control lines
ENE 334 Datapath and Control Page 39 A Simple Scheme: ALU Control ALUOp Funct field Operations ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 0 0 X X X X X X 0010 X 1 X X X X X X 0110 1 X X X 0 0 0 0 0010 1 X X X 0 0 1 0 0110 1 X X X 0 1 0 0 0000 1 X X X 0 1 0 1 0001 1 X X X 1 0 1 0 0111 The truth table for the three ALU control bits
A Simple Scheme: ENE 334 Datapath and Control Page 40 a.) R-type instruction 31 26 25 21 20 16 15 11 10 6 5 0 opcode rs rt rd shamt funct b.) Load or store instruction 31 26 25 21 20 16 15 11 10 6 5 0 opcode rs rt immediate c.) Branch instruction 31 26 25 21 20 16 15 11 10 6 5 0 opcode rs rt immediate
A Simple Scheme: control lines ENE 334 Datapath and Control Page 41
ENE 334 Datapath and Control Page 42 A Simple Scheme: the main control Instruction RegDst ALUSrc Memto Reg Reg Write Mem Read Mem Write Branch ALUOp1 ALUOp0 R-format 1 0 0 1 0 0 0 1 0 lw 0 1 1 1 1 0 0 0 0 sw X 1 X 0 0 1 0 0 0 beq X 0 X 0 0 0 1 0 1 ALUOpcode: control lines
ENE 334 Datapath and Control Page 43 A Simple Scheme: the main control Simple combinational logic (truth tables) Inputs ALUOp ALUOp0 ALUOp1 ALU control block Op5 Op4 Op3 Op2 Op1 Op0 F (5 0) F3 F2 F1 F0 Operation2 Operation1 Operation0 Operation R-format Iw sw beq Outputs RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOpO
A Simple Scheme: Datapath+control ENE 334 Datapath and Control Page 44
A Simple Scheme: R-type ENE 334 Datapath and Control Page 45
A Simple Scheme: a load instruction ENE 334 Datapath and Control Page 46
A Simple Scheme: a beq instr. ENE 334 Datapath and Control Page 47
A Simple Scheme: the control table Input or Output Inputs Outputs ENE 334 Datapath and Control Page 48 Signal name R-format lw sw beq Op5 0 1 1 0 Op4 0 0 0 0 Op3 0 0 1 0 Op2 0 0 0 1 Op1 0 1 1 0 Op0 0 1 1 0 RegDst 1 0 X X ALUSrc 0 1 1 0 MemtoReg 0 1 X X RegWrite 1 1 0 0 MemRead 0 1 0 0 MemWrite 0 0 1 0 Branch 0 0 0 1 ALUOp1 1 0 0 0 ALUOp0 0 0 0 1
Implementing Jumps ENE 334 Datapath and Control Page 49
A Simple Scheme: jump ENE 334 Datapath and Control Page 50
Performance Issues ENE 334 Datapath and Control Page 51
Pipelineing Analogy ENE 334 Datapath and Control Page 52
Single Cycle Implementation ENE 334 Datapath and Control Page 53 Calculate cycle time assuming negligible delays except: memory (200ps), ALU and adders (100ps), register file access (50ps) PCSrc Add M ux 4 Add ALU result Shift left 2 PC Read address Instruction Instruction memory Read register 1 Read register 2 Registers Write register Write data RegWrite Read data 1 Read data 2 ALUSrc M ux 4 ALU ALU operation Zero ALU result Address Write data MemWrite Read data Data memory MemtoReg M ux 16 Sign 32 extend MemRead
Where we are headed ENE 334 Datapath and Control Page 54 Single Cycle Problems: what if we had a more complicated instruction like floating point? wasteful of area One Solution: use a smaller cycle time have different instructions take different numbers of cycles a multicycle datapath: PC Address Instruction or data Memory Data Instruction register Memory data register Data Register # Registers Register # Register # A B ALU ALUOut
Basic: the multicycle datapath ENE 334 Datapath and Control Page 55 The Approach: - Break up the instructions into steps, each step takes a cycle - balance each step - reuse functional units - Store values at the end of a cycle - for later cycles - introduce additional registers
Basic: the multicycle datapath ENE 334 Datapath and Control Page 56 IR with write input MDR R R R Share functional units
Basic: the multicycle datapath ENE 334 Datapath and Control Page 57
MIPS: with the control lines ENE 334 Datapath and Control Page 58
MIPS: with the control lines ENE 334 Datapath and Control Page 59 - Neither register A nor B requires a write signal, since their contents are only read on the cycle immediately after it is written. - the clock cycle cannot accommodate the time required for both the memory access and the register file write. MDR
MIPS: the complete datapath ENE 334 Datapath and Control Page 60
MIPS: Multicycle steps ENE 334 Datapath and Control Page 61 1) Instruction fetch: IR <= Memory[PC] PC <= PC + 4 the control signals: MemRead, IRWrite, IorD,... 2) Instruction decode and register fetch: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC+SignExtend(IR[15:00]<<2);
MIPS: Multicycle steps ENE 334 Datapath and Control Page 62 3) Execution, memory address computation, or branch completion: determined by the instruction class: Memory reference (lw, sw): ALUOp = 00 ALUOut <= A+(SignExtend(IR[15:00])); Arithmetic-logical instruction (R-type): ALUOp = 10 ALUOut <= A op B;
MIPS: Multicycle steps ENE 334 Datapath and Control Page 63 3) Execution, memory address computation, or branch completion: determined by the instruction class: Branch: ALUOp = 01 if (A==B) PC <= ALUOut; Jump: PC <= {PC[31:28],IR[25:0],2'b00};
MIPS: Multicycle steps ENE 334 Datapath and Control Page 64 4) Memory access or R-type instruction completion: Memory reference (lw, sw): MDR <= Memory[ALUOut]; or Memory[ALUOut] <= B; Arithmetic-logical instruction (R-type): Reg[IR[15:11]] <= ALUOut;
MIPS: Multicycle steps ENE 334 Datapath and Control Page 65 5) Memory read completion: load: Reg[IR[20:16]] <= MDR;
MIPS: Multicycle steps ENE 334 Datapath and Control Page 66
MIPS: Question ENE 334 Datapath and Control Page 67 How many cycles will it take to execute this code? Loop: sll $t1,$s3,2 # $t1 = 4*i add $t1,$t1,$s6 # $t1 = addr lw $t0,0($t1) # $t0 = save[i] bne $t0,$s5,exit # save[i]=k? addi $s3,$s3,1 # i = i+1 j Loop # go to Loop Exit:
Control: techniques ENE 334 Datapath and Control Page 68 1) Finite State Machine: 2) Microprogramming: 3) Sequencer:
FSM: Finite State Machine ENE 334 Datapath and Control Page 69
FSM: Implementation ENE 334 Datapath and Control Page 70
FSM: The high level ENE 334 Datapath and Control Page 71
FSM: Step 1, 2 ENE 334 Datapath and Control Page 72
FSM: memory-reference instr. ENE 334 Datapath and Control Page 73 Step 3, 4,...
FSM: R-type instr. ENE 334 Datapath and Control Page 74 Step 3, 4
FSM: branch instr. ENE 334 Datapath and Control Page 75 Step 3
FSM: jump instr. ENE 334 Datapath and Control Page 76 Step 3
FSM: implementation ENE 334 Datapath and Control Page 77
FSM: the complete ENE 334 Datapath and Control Page 78 10 states:? bits S3,S2,S1,S0
FSM: the control unit ENE 334 Datapath and Control Page 79 NS: Next State
PLA Implementation ENE 334 Datapath and Control Page 80 If I picked a horizontal or vertical line could you explain it? Op5 Op4 Op3 Op2 Op1 Op0 S3 S2 S1 S0 PCWrite PCWriteCond IorD MemRead MemWrite IRWrite MemtoReg PCSource1 PCSource0 ALUOp1 ALUOp0 ALUSrcB1 ALUSrcB0 ALUSrcA RegWrite RegDst NS3 NS2 NS1 NS0
ROM Implementation ENE 334 Datapath and Control Page 81 ROM = "Read Only Memory" values of memory locations are fixed ahead of time A ROM can be used to implement a truth table if the address is m-bits, we can address 2 m entries in the ROM. our outputs are the bits of data that the address points to. m is the "height", and n is the "width" m n
ROM Implementation ENE 334 Datapath and Control Page 82 How many inputs are there? 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2 10 = 1024 different addresses) How many outputs are there? 16 datapath-control outputs, 4 state bits = 20 outputs ROM is 2 10 x 20 = 20K bits (and a rather unusual size) Rather wasteful, since for lots of the entries, the outputs are the same i.e., opcode is often ignored
ENE 334 Datapath and Control Page 83 ROM vs PLA Break up the table into two parts 4 state bits tell you the 16 outputs, 2 4 x 16 bits of ROM 10 bits tell you the 4 next state bits, 2 10 x 4 bits of ROM Total: 4.3K bits of ROM PLA is much smaller can share product terms only need entries that produce an active output can take into account don't cares Size is (#inputs #product-terms) + (#outputs #product-terms) For this example = (10x17)+(20x17) = 510 PLA cells PLA cells usually about the size of a ROM cell (slightly bigger)
Another Implementation Style ENE 334 Datapath and Control Page 84 Op[5 0] Complex instructions: the "next state" is often current state + 1 Control unit PLA or ROM Input 1 Outputs PCWrite PCWriteCond IorD MemRead MemWrite IRWrite BWrite MemtoReg PCSource ALUOp ALUSrcB ALUSrcA RegWrite RegDst AddrCtl Adder State Address select logic Instruction register opcode field
Another Implementation Style ENE 334 Datapath and Control Page 85 Dispatch ROM 1 Dispatch ROM 2 Op Opcode name Value Op Opcode name Value 000000 R-format 0110 100011 lw 0011 000010 jmp 1001 101011 sw 0101 000100 beq 1000 100011 lw 0010 101011 sw 0010 1 PLA or ROM State Adder Mux 3 2 1 0 AddrCtl 0 Dispatch ROM 2 Dispatch ROM 1 Address select logic State number Address-control action Value of AddrCtl 0 Use incremented state 3 1 Use dispatch ROM 1 1 2 Use dispatch ROM 2 2 3 Use incremented state 3 4 Replace state number by 0 0 5 Replace state number by 0 0 6 Use incremented state 3 7 Replace state number by 0 0 8 Replace state number by 0 0 9 Replace state number by 0 0 Instruction register opcode field
Microprogramming ENE 334 Datapath and Control Page 86 Control unit Microcode memory Outputs Input 1 PCWrite PCWriteCond IorD MemRead MemWrite IRWrite BWrite MemtoReg PCSource ALUOp ALUSrcB ALUSrcA RegWrite RegDst AddrCtl Datapath Adder Microprogram counter Address select logic Instruction register opcode field What are the microinstructions?
Microprogramming ENE 334 Datapath and Control Page 87 A specification methodology appropriate if hundreds of opcodes, modes, cycles, etc. signals specified symbolically using microinstructions Label ALU control SRC1 SRC2 Register control Memory PCWrite control Sequencing Fetch Add PC 4 Read PC ALU Seq Add PC Extshft Read Dispatch 1 Mem1 Add A Extend Dispatch 2 LW2 Read ALU Seq Write MDR Fetch SW2 Write ALU Fetch Rformat1 Func code A B Seq Write ALU Fetch BEQ1 Subt A B ALUOut-cond Fetch JUMP1 Jump address Fetch Will two implementations of the same architecture have the same microcode? What would a microassembler do?
Microprogramming Field name Value Signals active Comment Add ALUOp = 00 Cause the ALU to add. ALU control Subt ALUOp = 01 Cause the ALU to subtract; this implements the compare for branches. Func code ALUOp = 10 Use the instruction's function code to determine ALU control. SRC1 PC ALUSrcA = 0 Use the PC as the first ALU input. A ALUSrcA = 1 Register A is the first ALU input. B ALUSrcB = 00 Register B is the second ALU input. SRC2 4 ALUSrcB = 01 Use 4 as the second ALU input. Extend ALUSrcB = 10 Use output of the sign extension unit as the second ALU input. Extshft ALUSrcB = 11 Use the output of the shift-by-two unit as the second ALU input. Read Read two registers using the rs and rt fields of the IR as the register numbers and putting the data into registers A and B. Write ALU RegWrite, Write a register using the rd field of the IR as the register number and Register RegDst = 1, the contents of the ALUOut as the data. control MemtoReg = 0 Write MDR RegWrite, Write a register using the rt field of the IR as the register number and RegDst = 0, the contents of the MDR as the data. MemtoReg = 1 Read PC MemRead, Read memory using the PC as address; write result into IR (and lord = 0 the MDR). Memory Read ALU MemRead, Read memory using the ALUOut as address; write result into MDR. lord = 1 Write ALU MemWrite, Write memory using the ALUOut as address, contents of B as the lord = 1 data. ALU PCSource = 00 Write the output of the ALU into the PC. PCWrite PC write control ALUOut-cond PCSource = 01, If the Zero output of the ALU is active, write the PC with the contents PCWriteCond of the register ALUOut. jump address PCSource = 10, Write the PC with the jump address from the instruction. PCWrite Seq AddrCtl = 11 Choose the next microinstruction sequentially. Sequencing Fetch AddrCtl = 00 Go to the first microinstruction to begin a new instruction. Dispatch 1 AddrCtl = 01 Dispatch using the ROM 1. Dispatch 2 AddrCtl = 10 Dispatch using the ROM 2. ENE 334 Datapath and Control Page 88
Maximally vs. Minimally Encoded ENE 334 Datapath and Control Page 89 No encoding: 1 bit for each datapath operation faster, requires more memory (logic) used for Vax 780 an astonishing 400K of memory! Lots of encoding: send the microinstructions through logic to get control signals uses less memory, slower Historical context of CISC: Too much logic to put on a single chip with everything else Use a ROM (or even RAM) to hold the microcode It s easy to add new instructions
Microcode: Trade-offs Distinction between specification and implementation is sometimes blurred Specification Advantages: Easy to design and write Design architecture and microcode in parallel Implementation (off-chip ROM) Advantages Easy to change since values are in memory Can emulate other architectures Can make use of internal registers Implementation Disadvantages, SLOWER now that: Control is implemented on same chip as processor ROM is no longer faster than RAM No need to go back and make changes ENE 334 Datapath and Control Page 90
Historical Perspective ENE 334 Datapath and Control Page 91 In the 60s and 70s microprogramming was very important for implementing machines This led to more sophisticated ISAs and the VAX In the 80s RISC processors based on pipelining became popular Pipelining the microinstructions is also possible! Implementations of IA-32 architecture processors since 486 use: hardwired control for simpler instructions (few cycles, FSM control implemented using PLA or random logic) microcoded control for more complex instructions (large numbers of cycles, central control store) The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store
Exceptions: or Interrupts ENE 334 Datapath and Control Page 92 An unscheduled event that disrupts program execution. - I/O device request - Invoke the operating system from user program - Arithmetic overflow - Using an undefined instruction - Hardware malfunction
Exceptions: How to handle ENE 334 Datapath and Control Page 93 Save the address of the offending instruction in the exception program counter (EPC) then transfer control to the operating system (OS) at the specified address. I/O device request OS then take appropriate actions and can restart the execution using EPC by means of: - a status register - vectored interrupts
Exceptions: Implementation ENE 334 Datapath and Control Page 94 Two more registers: - EPC - Cause Control signals: - EPCWrite - CauseWrite - IntCause: set the low-order bit of the Cause register appropriately (bit '0' = undefined instruction, bit '1' = arithmetic overflow) Need to be able to write the exception address into the PC.
Exceptions: Implementation ENE 334 Datapath and Control Page 95
Exceptions: Control Checks ENE 334 Datapath and Control Page 96
Pentium 4 ENE 334 Datapath and Control Page 97 Somewhere in all that control we must handle complex instructions Control Control I/O interface Instruction cache Enhanced floating point and multimedia Control Data cache Integer datapath Secondary cache and memory interface Advanced pipelining hyperthreading support Control Processor executes simple microinstructions, 70 bits wide (hardwired) 120 control lines for integer datapath (400 for floating point) If an instruction requires more than 4 microinstructions to implement, control from microcode ROM (8000 microinstructions) Its complicated!
ENE 334 Datapath and Control Page 98 Conclusions: If we understand the instructions We can build a simple processor! If instructions take different amounts of time, multi-cycle is better Datapath implemented using: Combinational logic for arithmetic State holding elements to remember bits Control implemented using: Combinational logic for single-cycle implementation Finite state machine for multi-cycle implementation
Questions?: ENE 334 Datapath and Control Page 99 Midterm: Thursday 11 th oct, 2012: 9am-12pm