ENG8 Computer Organization and Architecture
MIPS: Data Path Design Part
Winter 7
S. Areibi
School of Engineering
University of Guelph

Introduction

Topics
- Building a Complete Data Path for MIPS
- Multi Cycle Computer
  - Datapath Design
  - Design of the Control Unit
  - Advantages & Disadvantages
- Summary

Single Cycle Implementation Cycle Time
- Unfortunately, though simple, the single cycle approach is not used because it is very slow.
- The clock cycle must have the same length for every instruction.
- What is the longest (slowest) path (slowest instruction)?

With thanks to W. Stallings, Hamacher, J. Hennessy, M. J. Irwin for lecture slide contents.
Many slides adapted from the PPT slides accompanying the textbook and CSE Course.

References
I. Computer Organization and Architecture: Designing for Performance, th edition, by William Stallings, Pearson.
II. Computer Organization and Design: The Hardware/Software Interface, th edition, by D. Patterson and J. Hennessy, Morgan Kaufmann.
III. Computer Organization and Architecture: Themes and Variations, by Alan Clements, Cengage Learning.

Review: Single Cycle Data and Control Path
[Figure: single-cycle datapath with its control unit]
Single Cycle Disadvantages & Advantages
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  - especially problematic for more complex instructions like floating point multiply
- May be wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
- but it is simple and easy to understand
[Figure: single-cycle timing for lw followed by sw, showing the wasted time in the sw cycle]

Our Multicycle Approach
- Break up the instructions into steps where each step takes a clock cycle, while trying to
  - balance the amount of work to be done in each step
  - use only one major functional unit per clock cycle
- At the end of a clock cycle:
  - Store values needed in a later clock cycle by the current instruction in a state element (internal register not visible to the programmer):
      IR      Instruction Register
      MDR     Memory Data Register
      A and B Register File read data registers
      ALUout  ALU output register
    All (except IR) hold data only between a pair of adjacent clock cycles (so they don't need a write control signal).
  - Data used by subsequent instructions are stored in programmer visible state elements (i.e., Register File, PC, or Memory).

The Multicycle Datapath High Level View
- Registers have to be added after every major functional unit to hold the output value until it is used in a subsequent clock cycle.
[Figure: MIPS multicycle data path, high level view]

Multicycle Implementation Overview: Clocking the Multicycle Datapath
- Each instruction step takes 1 clock cycle.
  - Therefore, an instruction takes more than 1 clock cycle to complete (fetch, decode, execute, ...).
- Not every instruction takes the same number of clock cycles to complete.
- Multicycle implementations allow
  - faster clock rates
  - different instructions to take a different number of clock cycles
  - functional units to be used more than once per instruction, as long as they are used on different clock cycles; as a result
    - only need one memory
    - only need one ALU/adder
[Figure: system clock and one clock cycle marked on the multicycle datapath]
The Multicycle Datapath High Level View
- Multiplexors have to be added since we are using a single memory and a single ALU.
[Figure: multicycle datapath with the added multiplexors]

The Multicycle Datapath with more Support
- Although this datapath supports normal incrementing of the PC, a few more connections and a multiplexor will be needed for branches and jumps.
[Figure: the additions versus the single-clock datapath]

The MC Datapath with support to Branch/jump
The value written into the PC can come from three sources:
1. The output of the ALU, which is the value PC + 4 during instruction fetch. This value should be stored directly into the PC.
2. The register ALUOut, which is where we will store the address of the branch target after it is computed.
3. The lower 26 bits of the instruction register (IR) shifted left by two and concatenated with the upper 4 bits of the incremented PC, which is the source when the instruction is a jump.

The PC is written both unconditionally and conditionally.
- During a normal increment and for jumps, the PC is written unconditionally.
- If the instruction is a conditional branch, the incremented PC is replaced with the value in ALUOut only if the two registers are equal.

The Complete Multicycle Data with Control
[Figure: complete multicycle datapath with control signals]
The Instruction & Instruction Register
Instruction formats (bit positions in brackets):
  R-type: op [31-26] | rs [25-21] | rt [20-16] | rd [15-11] | shamt [10-6] | funct [5-0]
  I-type: op [31-26] | rs [25-21] | rt [20-16] | address offset [15-0]
  J-type: op [31-26] | jump target address [25-0]
[Figure: Instruction Register outputs; Instr[31-26] goes to the control unit as the opcode, the offset field is used for branches, and the target field is used for jumps]

Five Instruction Steps
1. Instruction Fetch
2. Instruction Decode and Register Fetch
3. R-type Instruction Execution, Memory Read/Write Address Computation, Branch Completion, or Jump Completion
4. Memory Read Access, Memory Write Completion, or R-type Instruction Completion
5. Memory Read Completion (Write Back)
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

Review: Our ALU Control
Controlling the ALU uses multiple decoding levels:
- the main control unit generates the ALUOp bits
- the ALU control unit generates the ALU control bits

  Instr   funct    ALUOp   ALU action
  lw      xxxxxx   00      add
  sw      xxxxxx   00      add
  beq     xxxxxx   01      subtract
  add     100000   10      add
  subt    100010   10      subtract
  and     100100   10      and
  or      100101   10      or
  xor     100110   10      xor
  nor     100111   10      nor
  slt     101010   10      slt

Our Multicycle Approach, con't
- Reading from or writing to any of the internal registers, the Memory Data Register, or the PC occurs (quickly) at the beginning (for a read) or the end (for a write) of a clock cycle.
- Reading from the Register File takes ~50% of a clock cycle since it has additional control and access overhead (but reading can be done in parallel with decode).
- Had to add multiplexors in front of several of the functional unit input ports (e.g., Memory, ALU) because they are now shared by different clock cycles and/or do multiple jobs.
- All operations occurring in one clock cycle occur in parallel.
  - This limits us to one ALU operation, one Memory access, and one Register File access per clock cycle.

Step 1: Instruction Fetch
- Use the PC to get the instruction from memory and put it in the Instruction Register.
- Increment the PC by 4 and put the result back in the PC.
- Can be described succinctly using RTL ("Register-Transfer Language"):
    IR = Memory[PC];
    PC = PC + 4;
- Can we figure out the values of the control signals?
- What is the advantage of updating the PC now?
[Figure: datapath activity during instruction fetch]
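The instruction formats and the Step 1 RTL can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `decode_fields` and `fetch_step`, and the choice of a dictionary mapping byte addresses to 32-bit words as the memory model, are assumptions made for the example.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction word into its R-, I-, and J-type fields."""
    return {
        "op":     (instr >> 26) & 0x3F,   # Instr[31-26]
        "rs":     (instr >> 21) & 0x1F,   # Instr[25-21]
        "rt":     (instr >> 16) & 0x1F,   # Instr[20-16]
        "rd":     (instr >> 11) & 0x1F,   # Instr[15-11] (R-type only)
        "shamt":  (instr >> 6) & 0x1F,    # Instr[10-6]  (R-type only)
        "funct":  instr & 0x3F,           # Instr[5-0]   (R-type only)
        "imm16":  instr & 0xFFFF,         # Instr[15-0]  (I-type offset)
        "target": instr & 0x3FFFFFF,      # Instr[25-0]  (J-type target)
    }


def fetch_step(memory, pc):
    """Step 1 RTL: IR = Memory[PC]; PC = PC + 4.

    `memory` is a hypothetical model: a dict from byte address to 32-bit word.
    Returns the new (IR, PC) pair.
    """
    ir = memory[pc]
    return ir, (pc + 4) & 0xFFFFFFFF


# Encoding of "add $t2, $t0, $t1": op=0, rs=8, rt=9, rd=10, shamt=0, funct=0x20
ADD_WORD = 0x01095020
```

Every field is extracted unconditionally, which mirrors the hardware: the IR outputs are all wired up, and the control unit decides later which ones are meaningful for the instruction at hand.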
[Figure: control signal settings for the Instruction Fetch and Decode states]

Step 2: Instruction Decode and Register Fetch
- Don't know what the instruction is yet, so can only
  - read registers rs and rt in case we need them
  - compute the branch address in case the instruction is a branch
- The RTL:
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);
- Note we aren't setting any control lines based on the instruction, since we don't know what it is yet (the control logic is busy "decoding" the op code bits).
[Figure: datapath activity during instruction decode]

Step 3 (instruction dependent)
The ALU is performing one of four functions, based on instruction type:
1. Memory reference (lw and sw):
     ALUOut = A + sign-extend(IR[15-0]);
2. R-type:
     ALUOut = A op B;
3. Branch:
     if (A==B) PC = ALUOut;
4. Jump:
     PC = PC[31-28] || (IR[25-0] << 2);
[Figure: datapath activity during (3) lw & sw, where ALUOut = A + sign-extend(IR[15-0])]
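The address arithmetic in Steps 2 and 3 can be checked with a short Python sketch. The helper names (`sign_extend16`, `branch_target`, `mem_address`, `jump_target`) are illustrative assumptions, and 32-bit wraparound is modelled with a mask since Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret a 16-bit field (IR[15-0]) as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Step 2: ALUOut = PC + (sign-extend(IR[15-0]) << 2).

    `pc` is the already-incremented PC (PC + 4 from Step 1).
    """
    return (pc + (sign_extend16(imm16) << 2)) & MASK32


def mem_address(a, imm16):
    """Step 3 for lw/sw: ALUOut = A + sign-extend(IR[15-0])."""
    return (a + sign_extend16(imm16)) & MASK32


def jump_target(pc, target26):
    """Step 3 for j: PC = PC[31-28] || (IR[25-0] << 2)."""
    return (pc & 0xF0000000) | ((target26 << 2) & 0x0FFFFFFF)
```

Note that the branch offset is shifted left by two (it counts words) while the load/store offset is not (it counts bytes); this is the only difference between the two ALU additions.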
[Figure: datapath activity during (3) R-type, where ALUOut = A op B]
[Figure: datapath activity during (3) beq, where if (A==B) PC = ALUOut]
[Figure: datapath activity during (3) j, where PC = PC[31-28] || (IR[25-0] << 2)]

Decoding in the finite state machine
After state 1 the signals asserted depend on the class of instruction. Thus, the finite state machine has four arcs exiting state 1, corresponding to the four instruction classes:
- Memory reference (lw, sw)
- R-type
- Branch on equal
- Jump
This process of branching to different states depending on the instruction is called decoding.

Step 4: Memory Read or Write (also instruction dependent)
- Memory reference:
    MDR = Memory[ALUOut];   -- lw
  or
    Memory[ALUOut] = B;     -- sw
- R-type instruction completion (write to Reg File):
    Reg[IR[15-11]] = ALUOut;
- Remember, the register write actually takes place at the end of the cycle, on the clock edge.
[Figure: datapath activity during lw memory access]
[Figure: datapath activity during sw memory access]
[Figure: datapath activity during R-type completion]

Step 5: Memory Read Completion (Write Back)
- All we have left is to write the data just read from memory back into the register file for the load (lw) instruction:
    Reg[IR[20-16]] = MDR;
- This writes the load data, which was stored into MDR in the previous cycle, into the register file.
- What about all the other instructions?
[Figure: datapath activity during lw write back]
[Figure: finite state machine extended with the Memory Access and Write Back states]
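To see the five steps work together, here is a small Python walk-through of a single lw. It is a sketch under stated assumptions: `run_lw` is a hypothetical helper, registers are a 32-entry list, memory is a dict from byte address to word, and each assignment stands for one clock cycle's register transfer.

```python
def sign_extend16(imm16):
    """Interpret IR[15-0] as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_lw(regs, memory, pc):
    """Execute one lw instruction as five RTL steps; returns the updated PC."""
    # Step 1: instruction fetch
    ir = memory[pc]
    pc = pc + 4
    # Step 2: decode and register fetch (B is read but not needed by lw)
    a = regs[(ir >> 21) & 0x1F]
    b = regs[(ir >> 16) & 0x1F]
    alu_out = pc + (sign_extend16(ir & 0xFFFF) << 2)  # speculative branch target
    # Step 3: memory address computation overwrites ALUOut
    alu_out = a + sign_extend16(ir & 0xFFFF)
    # Step 4: memory read access
    mdr = memory[alu_out]
    # Step 5: memory read completion (write back into register rt)
    regs[(ir >> 16) & 0x1F] = mdr
    return pc


# "lw $t0, 4($t1)": op=0x23, rs=9 ($t1), rt=8 ($t0), offset=4
LW_WORD = 0x8D280004
```

The branch target computed in Step 2 is simply discarded for lw, which illustrates why that computation costs nothing: the ALU would otherwise be idle during decode.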
RTL Summary (3 to 5 cycles)
Instruction Fetch and Decode are common to all instructions.

Step 1, Instr fetch (all classes):
    IR = Memory[PC];
    PC = PC + 4;
Step 2, Decode (all classes):
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);
Step 3, Execute:
    R-type:   ALUOut = A op B;
    Mem Ref:  ALUOut = A + sign-extend(IR[15-0]);
    Branch:   if (A==B) PC = ALUOut;
    Jump:     PC = PC[31-28] || (IR[25-0] << 2);
Step 4, Memory access:
    R-type:   Reg[IR[15-11]] = ALUOut;
    Mem Ref:  MDR = Memory[ALUOut]; or Memory[ALUOut] = B;
    Branch:   X
    Jump:     X
Step 5, Writeback:
    R-type:   X
    Mem Ref:  Reg[IR[20-16]] = MDR;
    Branch:   X
    Jump:     X
Cycles per instruction class:
    R-type: 4    Mem Ref: 4/5    Branch: 3    Jump: 3

Answering Simple Questions
How many cycles will it take to execute this code?

    lw  $t, ($t)
    lw  $t, ($t)
    beq $t, $t, Label     # assume not taken
    add $t5, $t, $t
    sw  $t5, 8($t)
    Label: ...

  5 + 5 + 3 + 4 + 4 = 21 cycles

- What is going on during the 8th cycle of execution?
    The address for the second lw is being calculated.
- In what cycle does the actual addition in the add instruction take place?
    16th cycle
- In what cycle is the branch target address calculated?
    12th cycle
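The cycle-counting questions can be answered mechanically from the cycles-per-class row of the RTL summary. The sketch below is illustrative only: the names `CYCLES`, `STEPS`, `total_cycles`, and `activity_at` are assumptions, and the step names are generic labels for steps 1 through 5 (for an R-type instruction, step 4 is really the register write).

```python
# Cycles per instruction class, taken from the RTL summary
CYCLES = {"lw": 5, "sw": 4, "rtype": 4, "beq": 3, "j": 3}
STEPS = ["fetch", "decode", "execute", "memory", "writeback"]


def total_cycles(classes):
    """Total clock cycles for a sequence of instruction classes."""
    return sum(CYCLES[c] for c in classes)


def activity_at(classes, cycle):
    """Return (instruction index, class, step name) for a 1-based cycle number."""
    start = 1
    for index, cls in enumerate(classes):
        length = CYCLES[cls]
        if cycle < start + length:
            return index, cls, STEPS[cycle - start]
        start += length
    raise ValueError("cycle is past the end of the sequence")


# lw, lw, beq (not taken), add, sw
PROGRAM = ["lw", "lw", "beq", "rtype", "sw"]
```

For example, `activity_at(PROGRAM, 12)` lands on the decode step of the beq, which is where the branch target is computed speculatively.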
Control Unit Design
- Recall that in the Single-Cycle datapath we used a set of (hardwired) truth tables that specified the setting of the control signals based on the instruction class.
- For the Multi-Cycle datapath, the control is more complex!! Why?
  - Because the instruction is executed in a series of steps.
  - The control for the Multi-Cycle datapath must specify the signals to be set in any step and the next step in the sequence.
- Possible Implementations:
  - Finite State Machine (FSM)
  - Microprogrammed Control

Multicycle Control
- Multicycle datapath control signals are not determined solely by the bits in the instruction
  - e.g., op code bits tell what operation the ALU should be doing, but not what instruction cycle is to be done next
- We can use a finite state machine for control:
  - a set of states (current state stored in the State Register)
  - next state function (determined by current state and the input)
  - output function (determined by current state) (Type of FSM?)
- So we are using a Moore machine (datapath control signals based only on current state).
[Figure: combinational control logic with the instruction opcode and State Register as inputs, and the datapath control points and Next State as outputs]

Finite State Machine Implementation
- Use D-FFs or JK-FFs to realize the control unit.
- From the State Diagram obtain the State Table.
- Determine the number of FFs required.
- Use excitation tables to design the input logic for the FFs.
[Figure: combinational control logic with inputs Op5 to Op0 (Inst[31-26]) and the State Register, clocked by the system clock; outputs are the datapath control signals and Next State]

Algorithmic State Machine (ASM) Implementation
We can also use Algorithmic State Machines (ASM) to implement the Control Unit. Translate the FSM to an ASM and then:
- Use one flip-flop per state
- Use a Sequence Register and Decoder
The ASM implementation based on VHDL can be realized using:
- Schematic Capture
- Structural VHDL
- Behavioral VHDL

Multicycle Datapath Finite State Machine
[Figure: complete finite state machine for the multicycle datapath, states 0 through 9, with the control signal settings of each state]
State Assignment: total of 10 states

The Complete Multicycle Data with Control
[Figure: complete multicycle datapath with control signals]
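The next-state function of such a Moore machine can be sketched in Python. The state numbering below (0 fetch, 1 decode, 2 address computation, 3 lw memory access, 4 write back, 5 sw memory access, 6 R-type execute, 7 R-type completion, 8 branch completion, 9 jump completion) follows the common textbook assignment and is an assumption here, since the state diagram itself is only available as a figure. The function names are hypothetical.

```python
# Standard MIPS opcodes for the instruction classes handled by this control unit
OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x23, 0x2B

# Arcs leaving the decode state (state 1), one per instruction class
DECODE_ARCS = {OP_LW: 2, OP_SW: 2, OP_RTYPE: 6, OP_BEQ: 8, OP_J: 9}


def next_state(state, opcode):
    """Next-state function: depends on the current state and the opcode input."""
    if state == 0:                      # fetch always goes to decode
        return 1
    if state == 1:                      # decode: four arcs, by instruction class
        if opcode not in DECODE_ARCS:
            raise ValueError("illegal opcode")
        return DECODE_ARCS[opcode]
    if state == 2:                      # address computation: lw or sw
        if opcode == OP_LW:
            return 3
        if opcode == OP_SW:
            return 5
        raise ValueError("illegal opcode in state 2")
    if state == 3:                      # lw memory access -> write back
        return 4
    if state == 6:                      # R-type execute -> R-type completion
        return 7
    if state in (4, 5, 7, 8, 9):        # final step of every class returns to fetch
        return 0
    raise ValueError("unknown state")


def state_sequence(opcode):
    """List the states visited by one instruction, starting at fetch."""
    states = [0]
    while True:
        nxt = next_state(states[-1], opcode)
        if nxt == 0:
            return states
        states.append(nxt)
```

Only states 1 and 2 look at the opcode; every other transition is fixed, which is why the next-state truth table is mostly don't-care entries.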
Datapath Control Outputs Truth Table
[Table: datapath control outputs as a function of the input values Current State[3-0]; X marks a don't-care entry]
Next State Truth Table
[Table: next state as a function of Current State[3-0] and Inst[31-26] (Op[5-0]), with one column each for R-type, jmp, beq, lw, sw, and any other opcode (illegal); all final states return to state 0]

Recall.. Simple Sequencer..
[Figure: simple sequencer built from the Instruction Register, mapping logic, a MUX, a register with enable, and an incrementer; its outputs are the control signals (micro-operations)]

Simplifying the Control Unit Design
- For an implementation of the full MIPS ISA
  - instructions can take from a few clock cycles to many more
  - resulting in finite state machines with hundreds to thousands of states and even more arcs (state sequences)
  - such state machine representations become impossibly complex
- Instead, we can represent the set of control signals that are asserted during a state as a low-level control instruction to be executed by the datapath: microinstructions.
- Executing the microinstruction is equivalent to asserting the control signals specified by the microinstruction.

Microprogramming Control Units
- A microinstruction has to specify
  - what control signals should be asserted
  - what microinstruction should be executed next
- Each microinstruction corresponds to one state in the FSM and is assigned a state number (or "address").
  1. Sequential behavior: increment the state (address) of the current microinstruction to get to the state (address) of the next.
  2. Jump to the microinstruction that begins execution of the next MIPS instruction (state 0).
  3. Branch to a microinstruction based on control unit input using dispatch tables (will discuss this later!!)
     - need one for microinstructions following state 1
     - need another for microinstructions following state 2
- The set of microinstructions that define a MIPS assembly language instruction (macroinstruction) is its microroutine.
Microcode Implementation
[Figure: microcode implementation; the opcode bits Op5 to Op0 (Inst[31-26]) feed the address select logic, which together with an adder drives the microprogram counter; the PLA outputs are the datapath control signals plus the AddrCtl sequencing control]

Defining a Microinstruction Format
- Format: the fields of the microinstruction and the control signals that are affected by each field
  - control signals specified by a field usually have functions that are related
  - the format is chosen to simplify the representation and to make it difficult to write inconsistent microinstructions, i.e., ones that allow a given control signal to be set to two different values
- Make each field of the microinstruction responsible for specifying a nonoverlapping set of control signals
  - signals that are never asserted simultaneously may share the same field
  - seven fields for our simple machine: ALU control; SRC1; SRC2; Register control; Memory; PCWrite control; Sequencing

Our Microinstruction Format

Field: ALU control
  Add        ALUOp = 00    Cause ALU to add
  Subt       ALUOp = 01    Cause ALU to subtract (compare op for beq)
  Func code  ALUOp = 10    Use IR function code to determine ALU control
Field: SRC1
  PC         ALUSrcA = 0   Use PC as top ALU input
  A          ALUSrcA = 1   Use reg A as top ALU input
Field: SRC2
  B          ALUSrcB = 00  Use reg B as bottom ALU input
  4          ALUSrcB = 01  Use 4 as bottom ALU input
  Extend     ALUSrcB = 10  Use sign ext output as bottom ALU input
  Extshft    ALUSrcB = 11  Use shift-by-two output as bottom ALU input
Field: Register control
  Read                                            Read RegFile using rs and rt fields of IR as read addr's; put data into A and B
  Write ALU  RegWrite, RegDst = 1, MemtoReg = 0   Write RegFile using rd field of IR as write addr and ALUOut as write data
  Write MDR  RegWrite, RegDst = 0, MemtoReg = 1   Write RegFile using rt field of IR as write addr and MDR as write data

Our Microinstruction Format, con't

Field: Memory
  Read PC    MemRead, IorD = 0, IRWrite   Read memory using PC as addr; write result into IR (and MDR)
  Read ALU   MemRead, IorD = 1            Read memory using ALUOut as addr; write results into MDR
  Write ALU  MemWrite, IorD = 1           Write memory using ALUOut as addr and B as write data
Field: PC write control
  ALU           PCSource = 00, PCWrite      Write PC with output of ALU
  ALUOut-cond   PCSource = 01, PCWriteCond  If Zero output of ALU is true, write PC with the contents of ALUOut
  Jump address  PCSource = 10, PCWrite      Write PC with jump address after shift-by-two
Field: Sequencing
  Seq         AddrCtl = 11  Choose next microinstruction sequentially
  Fetch       AddrCtl = 00  Jump to the first microinstruction (i.e., Fetch) to begin a new instruction
  Dispatch 1  AddrCtl = 01  Branch using PLA_1
  Dispatch 2  AddrCtl = 10  Branch using PLA_2

Creating the Microprogram
Fetch microinstruction:

  Label (Addr)  ALU control  SRC1  SRC2  Reg control  Memory   PCWrite control  Seq'ing
  Fetch (0)     Add          PC    4                  Read PC  ALU              Seq

- Add, PC, 4: compute PC + 4
- Read PC: fetch the instruction into IR
- ALU: write the ALU output into the PC
- Seq: go to microinstruction 1
- The Label field represents the state (address) of the microinstruction.
- The Fetch microinstruction is assigned state (address) 0.

The Entire Microprogram

  Addr  ALU control  SRC1  SRC2      Reg control  Memory     PCWrite control  Seq'ing
  0     Add          PC    4                      Read PC    ALU              Seq
  1     Add          PC    Ext shft  Read                                     Disp 1
  2     Add          A     Extend                                             Disp 2
  3                                               Read ALU                    Seq
  4                                  Write MDR                                Fetch
  5                                               Write ALU                   Fetch
  6     Func code    A     B                                                  Seq
  7                                  Write ALU                                Fetch
  8     Subt         A     B                                 ALUOut-cond      Fetch
  9                                                          Jump address     Fetch
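The sequencing field of the microprogram can be exercised with a small Python model of the microprogram counter. This is a sketch under stated assumptions: the sequencing column and the two dispatch tables below follow the standard ten-microinstruction layout (addresses 0 through 9), and the names `SEQUENCING`, `DISPATCH1`, `DISPATCH2`, `next_address`, and `microroutine` are invented for the example.

```python
# Sequencing field of each microinstruction, indexed by microprogram address
SEQUENCING = [
    "seq",        # 0: fetch
    "dispatch1",  # 1: decode
    "dispatch2",  # 2: memory address computation
    "seq",        # 3: lw memory read
    "fetch",      # 4: lw write back
    "fetch",      # 5: sw memory write
    "seq",        # 6: R-type execute
    "fetch",      # 7: R-type completion
    "fetch",      # 8: beq completion
    "fetch",      # 9: jump completion
]

# Dispatch tables indexed by opcode (R-type, j, beq, lw, sw)
DISPATCH1 = {0x00: 6, 0x02: 9, 0x04: 8, 0x23: 2, 0x2B: 2}
DISPATCH2 = {0x23: 3, 0x2B: 5}


def next_address(addr, opcode):
    """Address select logic: pick the next microprogram counter value."""
    control = SEQUENCING[addr]
    if control == "seq":
        return addr + 1            # increment the microprogram counter
    if control == "fetch":
        return 0                   # start the next MIPS instruction
    if control == "dispatch1":
        return DISPATCH1[opcode]   # branch on opcode after decode
    return DISPATCH2[opcode]       # dispatch2: branch on opcode after address computation


def microroutine(opcode):
    """Addresses of the microinstructions executed for one MIPS instruction."""
    addrs = [0]
    while SEQUENCING[addrs[-1]] != "fetch":
        addrs.append(next_address(addrs[-1], opcode))
    return addrs
```

Compared with an explicit next-state function, only the two dispatch tables depend on the opcode; everything else is an increment or a reset, which is what keeps the address select logic small.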
Implementing Dispatches
[Figure: dispatch tables mapping the opcode to the next microinstruction address]
Multicycle Advantages & Disadvantages
- Uses the clock cycle efficiently: the clock cycle is timed to accommodate the slowest instruction step
  - balance the amount of work to be done in each step
  - restrict each step to use only one major functional unit
- Multicycle implementations allow
  - faster clock rates
  - different instructions to take a different number of clock cycles
  - functional units to be used more than once per instruction, as long as they are used on different clock cycles
- but
  - requires additional internal state registers, muxes, and more complicated (FSM) control

Control Path Design Alternatives
- Initial representation: finite state diagram, or microprogram
- Sequencing control: explicit next state function, or microprogram counter + dispatch PLAs
- Logic representation and implementation technique: logic equations in a Programmable Logic Array (PLA), or microcode in ROM/RAM
- Microprogram representation advantages: easier to design, write, and debug

Single Cycle vs. Multiple Cycle Timing
- Single Cycle Implementation: [Figure: lw in Cycle 1, sw in Cycle 2 with wasted time at the end of the cycle]
- Multiple Cycle Implementation: [Figure: Cycles 1 through 10; lw takes IFetch, Dec, Exec, Mem, WB; sw takes IFetch, Dec, Exec, Mem; an R-type instruction then begins with IFetch]
- The multicycle clock is slower than 1/5th of the single cycle clock, due to state register overhead.