Single Cycle Datapath
Lecture notes from MKP, H. H. Lee and S. Yalamanchili

Reading
- Sections 4.1-4.4
- Appendices B.3, B.7, B.8, B.11, D.2
- Note: Appendices A-E in the hardcopy text correspond to chapters 7-11 in the online text.
- Practice Problems: 1, 4, 6, 9

Introduction
- We will examine two MIPS implementations
  o A simplified version (this module)
  o A more realistic pipelined version
- Simple subset, shows most aspects
  o Memory reference: lw, sw
  o Arithmetic/logical: add, sub, and, or, slt
  o Control transfer: beq, j

Instruction Execution
- PC -> instruction memory, fetch instruction
- Register numbers -> register file, read registers
- Depending on instruction class
  1. Use ALU to calculate
     o Arithmetic result
     o Memory address for load/store
     o Branch target address
  2. Access data memory for load/store
  3. PC <- target address or PC + 4
[Figure: the PC supplies the address into an encoded program: 8d0b0000 014b5020 21080004 2129ffff 1520fffc 000a082a ....]

Basic Ingredients
- Include the functional units we need for each instruction, both combinational and sequential
[Figure: basic datapath elements: instruction memory (instruction address in, instruction out); program counter (PC); adder (Add, Sum); register file (5-bit read register 1, read register 2, and write register numbers; write data; read data 1; read data 2; RegWrite); ALU (3-bit ALU control, Zero, ALU result); data memory unit (address, write data, read data, MemWrite, MemRead); sign-extension unit (16 bits to 32 bits)]

Sequential Elements (4.2, B.7, B.11)
- Register: stores data in a circuit
  o Uses a clock signal to determine when to update the stored value
  o Edge-triggered: update when Clk changes from 0 to 1
[Figure: D flip-flop built from two D latches; timing diagram of Clk, D, and Q with the rising and falling edges marked]

Sequential Elements
- Register with write control
  o Only updates on clock edge when write control input is 1
  o Used when stored value is required later
[Figure: timing diagram of Clk, Write, D, and Q over one cycle time; D flip-flop built from two D latches]

Clocking Methodology
- Combinational logic transforms data during clock cycles
  o Between clock edges
  o Input from state elements, output to state element
  o Longest delay determines clock period
- Synchronous vs. asynchronous operation
- Recall: critical path delay
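
The register with write control described above can be sketched as a small behavioral model. This is only an illustration in Python, not part of the lecture material: the class name `Register`, the method `tick`, and its parameter names are assumptions made for the example. It shows the two conditions for an update: a rising clock edge and a write control input of 1.

```python
class Register:
    """Illustrative model of an edge-triggered register with write control."""

    def __init__(self, width=32):
        self.mask = (1 << width) - 1
        self.q = 0           # stored value (output Q)
        self._prev_clk = 0   # last clock level seen, used to detect edges

    def tick(self, clk, d, write):
        """Apply the clock level, data input D, and write control; return Q."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d & self.mask   # update only on a rising edge with write = 1
        self._prev_clk = clk
        return self.q


reg = Register()
trace = [
    reg.tick(clk=1, d=7, write=1),  # rising edge, write = 1: Q is updated
    reg.tick(clk=1, d=9, write=1),  # clock held high: no edge, Q is held
    reg.tick(clk=0, d=9, write=1),  # falling edge: ignored
    reg.tick(clk=1, d=9, write=0),  # rising edge but write = 0: Q is held
    reg.tick(clk=0, d=9, write=1),  # falling edge: ignored
    reg.tick(clk=1, d=9, write=1),  # rising edge, write = 1: Q is updated
]
```

Setup and hold times are ignored in this model, so it captures only when the stored value changes, not the timing constraints on D.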

Register File (B.8)
- Built using D flip-flops (remember ECE 2020!)
[Figure: read ports: read register number 1 and read register number 2 each drive a multiplexer that selects among Register 0 ... Register n-1 to produce read data 1 and read data 2]

Register File
- Note: we still use the real clock to determine when to write
[Figure: write port: a decoder on the write register number, gated with Write, drives the C input of Register 0 ... Register n-1; the register data input feeds D of every register]
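
A minimal behavioral sketch of the register file follows, assuming 32 registers of 32 bits. The names `RegisterFile`, `read`, and `clock_edge` are hypothetical and chosen for this example. Reads are combinational (the two multiplexers in the figure), while a write takes effect only at a clock edge and only when the write control is asserted. Keeping register 0 at zero is the MIPS `$zero` convention, not something the slides state.

```python
class RegisterFile:
    """Illustrative register file: two read ports, one write port."""

    def __init__(self, n=32):
        self.regs = [0] * n

    def read(self, read_reg1, read_reg2):
        # Combinational read: each port is a mux over all registers.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write_reg, write_data, reg_write):
        # The decoder selects one register; the write happens only when
        # reg_write is 1. Register 0 stays zero (MIPS $zero convention).
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(write_reg=8, write_data=42, reg_write=1)   # written
rf.clock_edge(write_reg=9, write_data=5, reg_write=0)    # ignored: reg_write = 0
rf.clock_edge(write_reg=0, write_data=99, reg_write=1)   # ignored: $zero
data1, data2 = rf.read(8, 9)
```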

Building a Datapath (4.3)
- Datapath: elements that process data and addresses in the CPU
  o Registers, ALUs, muxes, memories, ...
- We will build a MIPS datapath incrementally
  o Refining the overview design

High Level Description
[Figure: Control; Fetch Instructions; Execute Instructions; Memory Operations]
- Single instruction single data stream model of execution
  o Serial execution model
- Commonly known as the von Neumann execution model
  o Stored program model
  o Instructions and data share memory
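
Instruction Fetch
[Figure: the PC (a 32-bit register, clocked by clk) addresses the instruction memory; an adder increments the PC by 4 for the next instruction; timing diagram: one clk cycle time from the start of instruction fetch to the completion of instruction fetch]

R-Format Instructions
- Read two register operands
- Perform arithmetic/logical operation
- Write register result
  Fields: op | rs | rt | rd | shamt | funct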
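
To make the R-format fields concrete, the sketch below splits a 32-bit instruction word into op, rs, rt, rd, shamt, and funct using the standard MIPS bit positions (31:26, 25:21, 20:16, 15:11, 10:6, 5:0). The function name `decode_r_format` is hypothetical. The example word 0x014b5020 is one of the words in the encoded program listed earlier in these notes.

```python
def decode_r_format(word):
    """Split a 32-bit MIPS instruction word into its R-format fields."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31:26
        "rs":    (word >> 21) & 0x1F,  # bits 25:21
        "rt":    (word >> 16) & 0x1F,  # bits 20:16
        "rd":    (word >> 11) & 0x1F,  # bits 15:11
        "shamt": (word >> 6) & 0x1F,   # bits 10:6
        "funct": word & 0x3F,          # bits 5:0
    }


fields = decode_r_format(0x014B5020)
# op = 0 and funct = 0b100000 identify an add; rs = 10, rt = 11, rd = 10
# are registers $t2, $t3, $t2, so the word encodes add $t2, $t2, $t3.
```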

Executing R-Format Instructions
[Figure: register file (5-bit read register 1, read register 2, write register; write data; read data 1; read data 2; RegWrite) feeding the ALU (3-bit ALU control, Zero, ALU result)]
  Fields: op | rs | rt | rd | shamt | funct

Load/Store Instructions
- Read register operands
- Calculate address using 16-bit offset
  o Use ALU, but sign-extend offset
- Load: read memory and update register
- Store: write register value to memory
  Fields: op | rs | rt | 16-bit constant
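
The load/store address calculation above can be sketched as follows. The helper names `sign_extend16` and `effective_address`, and the base address used in the example, are assumptions for illustration. The sketch shows why the 16-bit offset must be sign-extended before the ALU adds it to the base register: a negative offset has to move the address downward.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Address = base register + sign-extended 16-bit offset (32-bit wraparound)."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


# Hypothetical base register value; offsets of +4 and -4 (0xFFFC).
addr_plus = effective_address(0x10010000, 0x0004)
addr_minus = effective_address(0x10010000, 0xFFFC)
```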

Executing I-Format Instructions
[Figure: register file (read register 1, read register 2, write register, RegWrite), sign-extend unit (16 bits to 32 bits), and data memory (address, write data, read data, MemWrite, MemRead)]
  Fields: op | rs | rt | 16-bit constant

Branch Instructions
- Read register operands
- Compare operands
  o Use ALU, subtract and check Zero output
- Calculate target address
  o Sign-extend displacement
  o Shift left 2 places (word displacement)
  o Add to PC + 4 (already calculated by instruction fetch)
  Fields: op | rs | rt | 16-bit constant
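
The branch target calculation listed above (sign-extend, shift left 2, add to PC + 4) can be written out as a short sketch. The function names and the example PC value are assumptions made for illustration; the branch is taken only when the ALU subtraction yields a zero result.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, imm16):
    """Target = (PC + 4) + (sign-extended displacement shifted left by 2)."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def next_pc(pc, imm16, branch, rs_value, rt_value):
    """Select the branch target when Branch = 1 and the ALU Zero output is 1."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0   # ALU subtract, check Zero
    return branch_target(pc, imm16) if (branch and zero) else (pc + 4) & 0xFFFFFFFF


# Hypothetical PC; displacement 0xFFFC is -4 words, i.e. 16 bytes back from PC + 4.
taken = next_pc(0x00400010, 0xFFFC, branch=1, rs_value=5, rt_value=5)
not_taken = next_pc(0x00400010, 0xFFFC, branch=1, rs_value=5, rt_value=6)
```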

Branch Instructions
[Figure: branch datapath. Annotations: the shift left 2 just re-routes wires; in the sign-extend unit the sign-bit wire is replicated]
  Fields: op | rs | rt | 16-bit constant

Updating the Program Counter
[Figure: computation of the branch address: the PC addresses the instruction memory (Instruction [31-0], with fields [25-21], [20-16], [15-11], [15-0]); one adder computes PC + 4; Instruction [15-0] is sign-extended from 16 to 32 bits, shifted left 2, and added to PC + 4; a mux (inputs 0 and 1) controlled by Branch and the ALU result selects the next PC]

    loop: beq  $t0, $0, exit
          addi $t0, $t0, -1
          lw   $a0, arg1($t1)
          lw   $a1, arg2($t2)
          jal  func
          add  $t3, $t3, $0
          addi $t1, $t1, 4
          addi $t2, $t2, 4
          j    loop

Composing the Elements
- First-cut datapath does an instruction in one clock cycle
  o Each datapath element can only do one function at a time
  o Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions
[Figure: the PC supplies the address into an encoded program: 014b5020 21080004 2129ffff 1520fffc 000a082a ....]

Full Single Cycle Datapath
- Destination register is instruction-specific
  o lw $t0, 0($t4) vs. add $t0, $t1, $t2
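
The instruction-specific destination register is a typical use of a multiplexer. The sketch below is a minimal illustration, assuming the usual convention that a select value of 0 picks rt (bits 20:16) and 1 picks rd (bits 15:11); the names `mux2` and `write_register` are hypothetical. The two instruction words are the standard MIPS encodings of the two instructions named on the slide.

```python
def mux2(select, in0, in1):
    """2-to-1 multiplexer: output in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def write_register(instr, reg_dst):
    """Choose the destination register number for an instruction word."""
    rt = (instr >> 16) & 0x1F   # bits 20:16, destination for lw
    rd = (instr >> 11) & 0x1F   # bits 15:11, destination for R-format
    return mux2(reg_dst, rt, rd)


LW_T0_0_T4 = 0x8D880000     # lw  $t0, 0($t4)
ADD_T0_T1_T2 = 0x012A4020   # add $t0, $t1, $t2

dest_lw = write_register(LW_T0_0_T4, reg_dst=0)
dest_add = write_register(ADD_T0_T1_T2, reg_dst=1)
```

Both instructions write register 8 ($t0), but the register number comes from a different field in each case, which is why the mux is needed.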

ALU Control (4.4, D.2)
- ALU used for
  o Load/Store: Function = add
  o Branch: Function = subtract
  o R-type: Function depends on funct field

  ALU control   Function
  000           AND
  001           OR
  010           add
  110           subtract
  111           set-on-less-than

ALU Control
- Assume 2-bit ALUOp derived from opcode
  o Combinational logic derives ALU control
  o X = don't care

  opcode   ALUOp   Operation          funct    ALU function       ALU control
  lw       00      load word          XXXXXX   add                010
  sw       00      store word         XXXXXX   add                010
  beq      01      branch equal       XXXXXX   subtract           110
  R-type   10      add                100000   add                010
                   subtract           100010   subtract           110
                   AND                100100   AND                000
                   OR                 100101   OR                 001
                   set-on-less-than   101010   set-on-less-than   111

- How do we turn this description into gates?

ALU Controller
- ALUOp is generated from decoding inst[31:26]; funct = inst[5:0]

  ALUOp1 ALUOp0   F5 F4 F3 F2 F1 F0   ALU control   Instruction
  0      0        X  X  X  X  X  X    010           lw/sw (add)
  X      1        X  X  X  X  X  X    110           beq (sub)
  1      X        X  X  0  0  0  0    010           arith: add
  1      X        X  X  0  0  1  0    110           arith: sub
  1      X        X  X  0  1  0  0    000           arith: and
  1      X        X  X  0  1  0  1    001           arith: or
  1      X        X  X  1  0  1  0    111           arith: slt

[Figure: ALUOp and funct = inst[5:0] feed the ALU control block, whose 3-bit output drives the ALU (Zero, ALU result)]

ALU Control
- Simple combinational logic (truth tables)
[Figure: ALU control block: inputs ALUOp (ALUOp1, ALUOp0) and F(5-0), of which F3, F2, F1, F0 are used; outputs Operation2, Operation1, Operation0]
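
The ALU control truth table can be checked with a small software model. This is an illustrative sketch rather than the gate-level design: the function name `alu_control` and the dictionary `FUNCT_TO_CONTROL` are assumptions. It follows the table row by row: ALUOp = 00 gives add, ALUOp0 = 1 gives subtract, and otherwise the low four funct bits (F3..F0) select the operation.

```python
# Low four funct bits (F3 F2 F1 F0) -> 3-bit ALU control, for R-type rows.
FUNCT_TO_CONTROL = {
    0b0000: 0b010,  # add
    0b0010: 0b110,  # subtract
    0b0100: 0b000,  # AND
    0b0101: 0b001,  # OR
    0b1010: 0b111,  # set-on-less-than
}


def alu_control(alu_op, funct):
    """Derive the 3-bit ALU control from 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:          # lw/sw: add, funct is a don't care
        return 0b010
    if alu_op & 0b01:           # ALUOp0 = 1 (beq): subtract, funct is a don't care
        return 0b110
    return FUNCT_TO_CONTROL[funct & 0b1111]   # ALUOp1 = 1: decode F3..F0


controls = [alu_control(0b10, f) for f in (0b100000, 0b100010, 0b100100, 0b100101, 0b101010)]
```

F5 and F4 are ignored here, as in the table, because they are don't cares for the five supported R-type functions.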

The Main Control Unit
- Control signals derived from instruction

  R-type       0          rs      rt      rd      shamt   funct
               31:26      25:21   20:16   15:11   10:6    5:0
  Load/Store   35 or 43   rs      rt      address
               31:26      25:21   20:16   15:0
  Branch       4          rs      rt      address
               31:26      25:21   20:16   15:0

  o Bits 31:26: opcode
  o rs (25:21): always read
  o rt (20:16): read, except for load
  o rd (R-type) or rt (load): write for R-type and load
  o address (15:0): sign-extend and add

Datapath With Control
- Use rt, not rd, as the destination register for lw

  Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
  R-format      1        0        0          1          0         0          0        1        0
  lw            0        1        1          1          1         0          0        0        0
  sw            X        1        X          0          0         1          0        0        0
  beq           X        0        X          0          0         0          1        0        1

Commodity Processors
[Figure: ARM 7 single cycle datapath]

Control Unit Signals
- Inputs: Op5 Op4 Op3 Op2 Op1 Op0 = Inst[31:26]
- Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0 (to harness the datapath)

  Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
  R-format      1        0        0          1          0         0          0        1        0
  lw            0        1        1          1          1         0          0        0        0
  sw            X        1        X          0          0         1          0        0        0
  beq           X        0        X          0          0         0          1        0        1

- Adding a new instruction?
- Programmable logic array (PLA) implementation (B.3)
[Figure: PLA with opcode inputs Op5-Op0, product terms R-format, lw, sw, beq, and the control outputs listed above]

Controller Implementation

    LIBRARY IEEE;
    USE IEEE.STD_LOGIC_1164.ALL;
    USE IEEE.STD_LOGIC_ARITH.ALL;
    USE IEEE.STD_LOGIC_SIGNED.ALL;

    ENTITY control IS
      PORT(
        SIGNAL Opcode       : IN  STD_LOGIC_VECTOR( 5 DOWNTO 0 );
        SIGNAL RegDst       : OUT STD_LOGIC;
        SIGNAL ALUSrc       : OUT STD_LOGIC;
        SIGNAL MemtoReg     : OUT STD_LOGIC;
        SIGNAL RegWrite     : OUT STD_LOGIC;
        SIGNAL MemRead      : OUT STD_LOGIC;
        SIGNAL MemWrite     : OUT STD_LOGIC;
        SIGNAL Branch       : OUT STD_LOGIC;
        SIGNAL ALUOp        : OUT STD_LOGIC_VECTOR( 1 DOWNTO 0 );
        SIGNAL clock, reset : IN  STD_LOGIC );
    END control;

Controller Implementation (cont.)

    ARCHITECTURE behavior OF control IS
      SIGNAL R_format, Lw, Sw, Beq : STD_LOGIC;
    BEGIN
      -- Code to generate control signals using opcode bits
      R_format <= '1' WHEN Opcode = "000000" ELSE '0';
      Lw       <= '1' WHEN Opcode = "100011" ELSE '0';
      Sw       <= '1' WHEN Opcode = "101011" ELSE '0';
      Beq      <= '1' WHEN Opcode = "000100" ELSE '0';

      -- Implementation of each table column
      RegDst     <= R_format;
      ALUSrc     <= Lw OR Sw;
      MemtoReg   <= Lw;
      RegWrite   <= R_format OR Lw;
      MemRead    <= Lw;
      MemWrite   <= Sw;
      Branch     <= Beq;
      ALUOp( 1 ) <= R_format;
      ALUOp( 0 ) <= Beq;
    END behavior;

  Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
  R-format      1        0        0          1          0         0          0        1        0
  lw            0        1        1          1          1         0          0        0        0
  sw            X        1        X          0          0         1          0        0        0
  beq           X        0        X          0          0         0          1        0        1
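
As a cross-check of the control table, the same column equations can be mirrored in a few lines of Python. This is only a software sketch for experimenting with the table, not a replacement for the VHDL controller; the names `Control` and `main_control` are assumptions. As in the VHDL, the don't-care (X) entries for sw and beq come out as 0.

```python
from collections import namedtuple

Control = namedtuple(
    "Control",
    "reg_dst alu_src mem_to_reg reg_write mem_read mem_write branch alu_op",
)


def main_control(opcode):
    """Derive the control signals from the 6-bit opcode (Inst[31:26])."""
    r_format = opcode == 0b000000
    lw = opcode == 0b100011
    sw = opcode == 0b101011
    beq = opcode == 0b000100
    return Control(
        reg_dst=int(r_format),
        alu_src=int(lw or sw),
        mem_to_reg=int(lw),
        reg_write=int(r_format or lw),
        mem_read=int(lw),
        mem_write=int(sw),
        branch=int(beq),
        alu_op=(int(r_format) << 1) | int(beq),   # ALUOp1 = R-format, ALUOp0 = beq
    )


signals = {name: main_control(op) for name, op in
           [("R-format", 0), ("lw", 35), ("sw", 43), ("beq", 4)]}
```

Adding a new instruction in this model means adding one opcode comparison and OR-ing it into each column where the new table row holds a 1.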

R-Type Instruction
[Figure: datapath operation for an R-type instruction]

Load Instruction
[Figure: datapath operation for a load instruction]

Branch-on-Equal Instruction
[Figure: datapath operation for a branch-on-equal instruction]

Implementing Jumps

  Jump   2       address
         31:26   25:0

- Jump uses word address
- Update PC with concatenation of
  o Top 4 bits of old PC
  o 26-bit jump address
  o 00
- Need an extra control signal decoded from opcode
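
The concatenation that forms the jump target can be sketched as below. The function name `jump_target` and the example addresses are assumptions for illustration. The upper 4 bits are taken from PC + 4 here, which is the usual MIPS definition; they differ from the upper bits of the old PC only when PC + 4 crosses a 256 MB boundary.

```python
def jump_target(pc, instr):
    """Concatenate the top 4 bits of PC + 4, the 26-bit address field, and 00."""
    address26 = instr & 0x03FFFFFF        # bits 25:0 of the jump instruction
    upper4 = (pc + 4) & 0xF0000000        # top 4 bits of the incremented PC
    return upper4 | (address26 << 2)      # shifting left by 2 appends the 00


# Hypothetical example: opcode 2 (j) with word address 0x0100000, fetched at 0x00400020.
J_INSTR = (2 << 26) | (0x00400000 >> 2)
target = jump_target(0x00400020, J_INSTR)
```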
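
Datapath With Jumps Added
[Figure: single cycle datapath with jump support, clocked by clk]

Example: ARM Cortex M3
[Figure: Fitbit Flex teardown showing the ARM processor and the Bluetooth IC. Image sources: www.ifixit.com, zembedded.com]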

Our Simple Control Structure
- All of the logic is combinational
- We wait for everything to settle down, and the right thing to be done
  o ALU might not produce the right answer right away
  o We use write signals along with the clock to determine when to write
- Cycle time determined by length of the longest path
[Figure: State element 1 -> combinational logic -> State element 2, within one clock cycle]
- We are ignoring some details like setup and hold times

Performance Issues
- Longest delay determines clock period
  o Critical path: load instruction
  o Instruction memory -> register file -> ALU -> data memory -> register file
- Not feasible to vary period for different instructions
- Violates design principle
  o Making the common case fast
- We will improve performance by pipelining
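
The effect of the critical path on the clock period can be worked through with a small calculation. The component delays below are illustrative values chosen for this sketch, not figures from the lecture, and the names `DELAY_PS` and `PATHS` are hypothetical. Each instruction class uses a subset of the units in sequence; the single-cycle clock period must cover the slowest one.

```python
# Assumed component delays in picoseconds (illustrative values only).
DELAY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Units each instruction class passes through, in order.
PATHS = {
    "R-format": ["imem", "reg_read", "alu", "reg_write"],
    "lw":       ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw":       ["imem", "reg_read", "alu", "dmem"],
    "beq":      ["imem", "reg_read", "alu"],
}

latency_ps = {name: sum(DELAY_PS[unit] for unit in path) for name, path in PATHS.items()}
cycle_time_ps = max(latency_ps.values())            # longest path sets the clock period
critical_instruction = max(latency_ps, key=latency_ps.get)
```

With these assumed delays, every instruction pays the load's latency, even a beq that would need much less time if the period could vary.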

Summary
- Single cycle datapath
  o All instructions execute in one clock cycle
  o Not all instructions take the same amount of time
  o Software sees a simple interface
  o Can memory operations really take one cycle?
- Improve performance via pipelining, multicycle operation, parallelism, or customization
- We will address these next

Study Guide
- Given an instruction, be able to specify the values of all control signals required to execute that instruction
- Add new instructions: modify the datapath and control to effect its execution
  o Modify the dataflow in support, e.g., jal, jr, shift, etc.
  o Modify the VHDL controller
- Given delays of various components, determine the cycle time of the datapath
- Distinguish between those parts of the datapath that are unique to each instruction and those components that are shared across all instructions

Study Guide (cont.)
- Given a set of control signal values, determine what operation the datapath performs
- Know the bit width of each signal in the datapath
- Add support for procedure calls: the jal instruction

Glossary
- Asynchronous
- Clock
- Controller
- Critical path
- Cycle time
- Dataflow
- Flip-flop
- Program counter
- Register file
- Sign extension
- Synchronous