3-3. Learning Outcomes 3-3. Spiral 3-3 Single Cycle CPU I understand how the single-cycle CPU datapath supports each type of instruction I understand why each mux is needed to select appropriate inputs to the datapath components I know how to design the control signals as a function of the type of instruction 3-3.3 Sorting: Software Implementation 3-3. Hardware vs. Software REVIEW To perform the algorithm in software means the processor fetches instructions, executes them, which causes the processor to then read and write the data in memory into it's sorted positions Sorting 6 element on a.8 GHz Xeon processor 6 microseconds Can we do better w/ more HW? Processor D C Memory 78 fffff 6 3 Custom (Sort) HW
Sorting: Hardware Implementation 3-3. Sorting: Final Comparison 3-3.6...from memory ( per clock) Sorting 6 element on a.8 GHz Xeon processor [SW only] 6 microseconds Sorting 6 numbers in [old] custom HW period = 3 ns => 6 microseconds total 3 ns is due to the 8 number HW sorter Merging (Select-Val) stages are < ns Can we improve? X X X X3 X X X6 X7 3 ns HW Sorting Network Y Y Y Y3 Y Y Y6 Y7...... FIFO/Queue a/b 8 FIFO/Queue a/b 8 ns ns ns SelectVal FIFO/Queue a/b 6 FIFO/Queue a/b 6 What did we do to reduce period in this design? SelectVal FIFO/Queue a/b FIFO/Queue a/b SelectVal 6...to memory Sorting 6 element on a.8 GHz Xeon processor [SW only] 6 microseconds total time Sorting 6 numbers in [old] custom HW period = 3 ns => 6 microseconds total = ~.x speedup Sorting 6 numbers in [old] pipelined HW period = ns => microseconds total = ~8x speedup Processor Processor is freed to do other work D C fffff Custom (Sort) HW Memory 78 6 3 3-3.7 CPU Organization Scope 3-3.8 uilding hardware to execute software GENERL PURPOSE HRDWRE We will build a CPU to implement our subset of the MIPS IS Memory Reference Instructions: Load Word (LW) Store Word (SW) rithmetic and Logic Instructions: DD, SU, ND, OR, SLT ranch and Instructions: ranch if equal (EQ) unconditional (J) These basic instructions exercise a majority of the necessary datapath and control logic for a more complete implementation 8
3-3.9 3-3. Single-Cycle CPU path Fetch [3:6] [:] [:] [:6] [:] Reg. # Mem & Mem Op[:] Src Reg data 6 INST[:] Op[:] ranch Src control Src Mem Mem 9 ddress in is used to fetch instruction while it is also incremented by to point to the next instruction Remember, the doesn t update until the end of the clock cycle / beginning of next cycle Mux provides a path for branch target addresses Fetch x8 xc x8 branch target xa8 DD $6,$9,$ opcode rs rt rd shamt func Decode Opcode and func. field are decoded to produce other control signals Execution of an instruction (DD $3,$,$) requires reading register values and writing the result to a third REG is an enable signal indicating the write data should be written to the specified register Instruction Word DD $3,$,$ opcode rs rt shamt rd func Reg. # Logic data REG als Value of $ Value of $ Result from add 3-3. is the collection of GPR s. Our register file has 3 (ability to concurrently read or write a register). To see why we need 3, consider an DD $3,$,$. We need to read two operands (i.e. $ $) and for the result ($3) registers each storing -bits registers => Muxes to choose desired value register => Decoder and registers w/ enable data Reg -to- decoder converts -bit write reg. # to -of- output signals to enable that register to capture the write data on the next edge. If Regis the decoder is disabled making all outputs go to and thus no register updates. [:] EN 3 D EN D EN D EN $ $ $3 Reg # Reg # 3 3 data 3-3. Each Mux chooses which register value to output based on the -bit reg. # provided by the instruction
3-3.3 3-3. path for instruction takes inputs from register file and performs the add, sub, and, or, slt, operations Result is written back to dest. register Memory ccess path Operands are read from register file while offset is sign extended calculates Memory access is performed If LW, LW $,xfff8($) SW $3,xa($) word DD $3,$,$ 3 Reg. # data $ value $ value op 3 Reg. # data $ value xffff fff8 DD 3 Reg. # data $ value xa $3 value 3-3. 3-3.6 ranch path EQ requires for comparison (examine zero output) extension unit for branch offset dder to add and offset Need a separate adder since is used to perform comparison Fetch path Question Can the adder used to increment the be an and be used/shared for instructions like DD/SU/etc. In a single-cycle CPU, word EQ $,$,offset word offset Reg. # data (incremented ) Shift byte offset op $ value $ value dder extended word offset ranch Target ddress to ZERO Next = Current / ddress S / I-MEM Instruction Word 6
3-3.7 3-3.8 Fetch path Question RegFile Question Do we need the enable signal on the register for our single-cycle CPU? In the single-cycle CPU, Why do we need the write enable signal, REG? Next = Current / ddress S / I-MEM Instruction Word 7 Instruction Word ex. instruc. opcode rs rt shamt rd func Reg. # Logic data REG als Value of $ Value of $ Result from add 8 3-3.9 3-3. RegFile Question Extension Unit Can write to registers be level sensitive or does it have to be edge-sensitive? Instruction Word ex. instruc. opcode rs rt shamt rd func Logic Reg. # data REG als Value of $ Value of $ Result from add 9 In a LW or SW instructions with their base register offset format, the instruction only contains the offset as a 6-bit value Example: LW $,-8($) Machine Code: x8cfff8-8 = xfff8 The 6-bit offset must be extended to -bits before being added to base register LW $,xfff8($) opcode rs rt offset offset = xfff8 6 xfffffff8
3-3. 3-3. Extension Questions ranch path Question What logic is inside a sign-extension unit? How do we sign extend a number? Do you need a shift register? Is it okay to start adding branch offset even before determining whether the branch is taken or not? b b b b 3 b b b b b 3 b 6-bit offset -bit sign-extended output word EQ $,$,offset word offset Reg. # data (incremented ) Shift $ value $ value dder op extended word offset ranch Target ddress to ZERO (To control logic) 3-3.3 3-3. Combining paths Src Mux Now we will take the datapaths for each instruction type and try to combine them into one nywhere we have multiple options for a certain input we can use a mux to select the appropriate value for the given instruction Select bits must be generated to control the mux Mux controlling second input to instruction provides Register data to the nd input of LW/SW uses nd input of as an offset to form effective address 3 Reg. # Instruction data $ value $ value op Reg. # data $ value xffff fff8 DD Mem. Instruction data 3 Src
3-3. 3-3.6 Mux Src Mux Mux controlling writeback value to register file instructions use the result of the LW uses the read data from data memory Next instruction can either be at the next sequential address () or the branch target address (offset) Reg. # data 6 Reg. # data 6 ranch Target ddress Src 6 3-3.7 3-3.8 Mux Single-Cycle CPU path Different destination register ID fields for and LW instructions R-Type () rs rt rd shamt func 3-6 - -6 - -6 - Destination Register Number I-Type (LW) 3 or 3 rs rt address offset rs rt rd 3-6 - -6 - Reg. # data 6 7 [:] [:] [:6] [:] Reg Reg. # data 6 INST[:] Op[:] Src ranch control Src Mem 8 Mem
3-3.9 3-3.3 Single-Cycle CPU path Implementation [3:6] [:] [:] [:6] [:] Reg. # Mem & Mem Op[:] Src Reg data 6 INST[:] Op[:] ranch Src control Src Mem Mem 9 [:] [:] [3:6] 6 8 [:] [:6] [:] Reg. # Mem & Mem Op[:] Src Reg data 6 INST[:] Op[:] ddress = {New[3:8], INST[:],} ranch Src Next ddress control ranch ddress Src Mem Mem 3 3-3.3 3-3. Unit Design for Single-Cycle CPU Unit Unit: Maps instruction to control signals Traditional Unit FSM: Produces control signals asserted at different times Design NSL, SM, OFL Single-Cycle Unit Every cycle we perform the same steps: Fetch, Decode, Execute als are not necessarily time based but instruction based => only combinational logic Inputs (Instruction/Opcode) NSL SM Traditional Unit Outputs OFL # of FF s in tightly-encoded state assignment: -8 states:, 9-6 states: Inputs (Instruction/Opcode) NSL SM State Outputs OFL Most control signals are a function of the opcode (i.e. LW/SW, R-Type, ranch, ) is a function of opcode ND function bits. OpCode ([3:6]) Func. ([:]) OpCode ([3:6]) Unit Unit Func. ([:]) ranch Mem Mem Src Reg [:] ranch Mem Mem Src Reg Op[:] to Single-Cycle Unit Only state 3=> FF s
3-3.33 3-3.3 Truth Table needs to know what instruction type it is: R-Type (op. depends on func. code) LW/SW (op. = DD) EQ (op. = SU) Let main control unit produce Op[:] to indicate, then use function bits if necessary to tell the what to do OpCode ([3:6]) Unit Op[:] Func. ([:]) to [:] is a function of: Op[:] and Func.[:] Op[:] Instruction Operation Func.[:] Desired ction LW Load word X dd SW Store word X dd ranch EQ X Subtract R-Type ND nd R-Type OR Or Instruction Op[:] R-Type dd dd LW/SW R-Type Sub Subtract ranch R-Type unit maps instruction opcode to Op[:] encoding R-Type SLT Set on less than Produce each [:] bit from the Op and Func. inputs 3 al Generation Other control signals are a function of the opcode We could write a full truth table or (because we are only implementing a small subset of instructions) simply decode the opcodes of the specific instructions we are implementing and use those intermediate signals to generate the actual control signals OpCode ([3:6]) Unit ranch Mem Mem Src Reg Op[:] Could generate each control signal by writing a full truth table of the 6-bit opcode OpCode ([3:6]) Decoder R-Type LW SW EQ Unit ranch Mem Mem Src Reg Op[:] Simpler for human to design if we decode the opcode and then use individual instruction signals to generate desired 3 control signals 3-3.3 R- Type al Truth Table LW SW EQ J ranch Reg Dst Src Memto- Reg Reg Mem Mem Op[] X X X [:] [:] [3:6] [:] [:6] [:] 6 8 Reg. # Reg data Mem & Mem Op[:] Src 6 INST[:] Op[:] ddress ranch Src Next ddress control ranch ddress Src Mem Mem 36 3-3.36 Op[]
al Logic 3-3.37 Credits 3-3.38 Op[] Op[] Op[3] Op[] Op[] Op[] Decoder These slides were derived from Gandhi Puvvada s EE 7 Class Notes R-Type LW SW EQ J ranch Src Reg Mem Mem Op Op 37