Lecture 5: The Processor CSCE 26 Computer Organization Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed from various books, websites, authors pages, and other sources for academic purpose only. The instructor does not claim any originality.
Lecture Outline Construction of a Simple IPS Processor Single Cycle Processor ulticycle Processor 2
High Level Language Program Levels of Representation Compiler Assembly Language Program Assembler achine Language Program achine Interpretation temp = v[k]; v[k] = v[k+]; v[k+] = temp; lw$5,($2) lw$6,4($2) sw $6, ($2) sw $5, 4($2) Control Signal Specification ALUOP[:3] <= InstReg[9:] & ASK 3
Execution Cycle Fetch Decode Operand Fetch Execute Result Store Next Obtain instruction from program storage Determine required actions and instruction size Locate and obtain operand Compute result value or status Deposit results in storage for later use Determine successor instruction 4
The Processor: Datapath & Control We're ready to look at an implementation of the IPS Simplified to contain only: memory-reference instructions: lw, sw arithmetic-logical ti i l instructions:add, ti sub, and, or, slt control flow instructions: beq, j Generic Implementation: use the program counter (PC) to supply instruction address get the instruction from memory read registers use the instruction to decide exactly what to do 5
ore Implementation Details Abstract / Simplified View Two types of ffunctional units: elements that operate on values (combinational) elements that contain state (sequential) Data Register # PC Address Registers ALU memory Register # Register # Address Data memory Data 6
Basic Building Blocks Latches Flip-flops Combinational Elements Sequential Elements Clocking strategies 7
Register File: Operation Built using D flip-flops (Combinational in nature) register number register number 2 Register R egister Register n Register n u x u 2 x 3
Register File: Operation We still use the real clock to determine when to write C D Register Register number n-to- decoder n C D Register n Register C Register n D C D Register n 4
Register File: Block Diagram register number register number 2 register Register file 2 Three Address ports One Data Input Port Two Data Output Ports One Control Signal 5
a: emory Functional Units - I After an instruction address is put, the instruction residing at the address appears at the output port b: Program Counter -- A simple up counter c: Adder -- A 2 s complement adder address memory PC Add Sum a. memory b. Program counter c. Adder 6
a: Register File Functional Units - II It s construction, read, and write operations as discussed previously b: ALU (Arithmetic ti & Logic Unit) Register numbers Recall the ALU design we have discussed in last two classes Note the Zero output Data 5 register 5 register 2 Registers 5 register 2 Reg Data 3 ALU control Zero ALU ALU result a. Register File b. ALU 7
Functional Units -III a: Data emory Unit Similar to emory Unit, only that it can written into as well Two input ports for address and, one output port (for read out) Two control signals: for and operations b: Sign Extension Unit -- Extends 6-bit input operand to 32 bits em Address Data memory 6 Sign 32 e x te n d em a. Data m emory unit b. Sign-extension unit 8
Datapath for Fetch (Piece I) Fetching s and Incrementing the program counter by 4 Add 4 P C ad dre ss memory 9
Datapath for R-type s (Piece II) Datapath for R-type s register register 2 Registers register 2 3 ALU operation Zero ALU ALU result Reg 2
Datapath for Load/Store (Piece III) Datapath for load or store () Register Access; (2) emory Address calculation; (3) / (4) into Register file (if the instruction is a load) register register 2 Registers register Reg 2 3 ALU operation Zero ALU ALU result Address em Data memory 6 Sign 32 extend em 2
Datapath for Branch (Piece IV) Unit Shift left 2 adds at the low-order end of the sign-extended offset Control logic is used to decide whether the incremented PC or branch target should replace the PC based on the Zero output of the ALU PC + 4 from instruction path Add Sum Branch target Shift left 2 register register 2 Registers register Reg 2 6 Sign 32 extend ALU operation 3 To branch ALU Zero control logic 22
Datapath Construction Strategy Now, we have pieces of path that are capable of performing distinct functions We want to stitch them together to yield a final path that can execute all the instructions (lw, sw, add, sub, and, or,slt, beq, j) We will use multiplexors (or muxes for short) for stitching the path 23
i l Datapath Construction (erge Pieces II&III) Piece II + Piece III x ti ri r register register 2 Registers register register register 2 Registers register Reg 2 Reg 2 6 Sign 32 extend 3 ALU operation Zero ALU ALU result Address 3 ALU operation Zero ALU ALU result em Data memory em 24
And you will get.. Rule: Whenever we have more than one input feeding a functional unit, introduce a multiplexor (this gives rise to a control signal, more later..) Registers register register 2 register 2 Reg ALUSrc u x ALU operation 3 em Zero ALU ALU result Address Data memory emtoreg u x 6 Sign 32 extend em 25
Datapath Construction (merge Piece I) Just tack the Fetch and PC increment logic at the front! 4 Add PC address memory Registers register register 2 register 2 Reg ALUSrc ux ALU operation 3 em Zero ALU ALU result Address Data memory emtoreg ux 6 Sign 32 extend em 26
Datapath Construction (merge Piece IV) PCSrc Add ux 4 Shift left 2 Add ALU result PC address ti memory Registers register register 2 register Reg 2 6 Sign 32 extend ALUSrc u x ALU operation 3 em Zero ALU ALU result Address em Data memory emtoreg ux 27
Final Datapath Data flows through various paths under the influence of control signals There are seven control signals (of type,, or ux Select) PCSrc Add ux 4 ALU Add result Reg Shift left 2 PC address [3 ] memory [25 2] [2 6] [5 ] ux x RegDst [5 ] register register 2 2 register Registers 6 Sign 32 extend ALUSrc u x ALU control ALU Zero ALU result em Address Data memory em emtoreg u x [5 ] ALUOp 28
Defining the Control.. Selecting the operations to perform (ALU, read/write, etc.) Controlling the flow of (multiplexor inputs) Information comes from the 32 bits of the instruction Example: add $8, $7, $8 Format: op rs rt rd shamt funct ALU's operation based on instruction type and function code We will design two control units: ()ALU Control to generate appropriate function select signals for the ALU (2)ain Control to generate signals for functional units other than the ALU 29
Defining the ALU Control... Contd. e.g., what should the ALU do with this instruction Example: lw $, ($2) 35 2 op rs rt 6 bit offset ALU control input AND OR add subtract set-on-less-than than 3
ALU Control Design ust describe hardware to compute 3-bit ALU control input given instruction type = lw, sw = beq, = arithmetic function code for arithmetic ALUOp computed from instruction type ALU Control inputs How are they determined? Funct Desired ALU Control Opcode ALUOp Operation Field ALU Action Operation LW load word XXXXXX add SW store word XXXXXX add Branch equal branch equal XXXXXX subtract R-type add add R-type subtract subtract R-type AND and R-type OR or R-type set on less than set on less than 3
ALU Control - Truth Table & Implementation Describe it using a truth table (can turn into gates): ALUOp Funct field Operation ALUOp ALUOp F5 F4 F3 F2 F F X X X X X X X X X X X X X X X X X X X X X X X X X X X X ALUOp ALUOp ALUOp ALU control block F3 Operation2 F (5 ) F2 F F Operation Operation Operation 32
Designing the ain Control 4 Add [3 26] Control R egdst Branch em emtoreg ALUOp em Shift left 2 Add ALU result u x ALUSrc Reg PC address memory [3 ] ti [25 2] [2 6] [5 ] register register 2 Registers 2 register u x u x Zero ALU ALU result Address Data memory u x [5 ] 6 Sign 32 extend ALU control [5 ] 33
Control Signals and their Effects Signal Name Effect When deasserted Effect when asserted RegDst The register destination number for the The register destination number for the register comes from the rt field (bits 2-6) register comes from the rd field (bits 5-) Reg NONE The register on the register input is written with the value on the input ALUSrc The second ALU operand comes from the The second ALU operand is the sign-extended second register file output ( 2) lower 6 bits of the instruction PCSrc The PC is replaced by the output of the adder that The PC is replaced by the output of the adder computes the value of PC + 4. that computes the branch target em None Data emory contents designated by the address input are put on the output em None Data memory contents designated by the address input are replaced by the value on the input emtoreg The value fed to the register input The value fed to the register input comes comes from the ALU from the memory. 34
ain Control: Truth Table & Implementation emto- Reg em em RegDst ALUSrc Reg Branch ALUOp ALUp R-format lw sw X X beq X X Inputs Op5 O p 4 Op3 Op2 Op O p R -fo r m a t Iw s w b e q Outputs RegDst ALUSrc emtoreg R e g W rite em em Branch ALUOp ALUOpO 35
Our Simple Control Structure All of the logic is combinational We wait for everything to settle down, and the right thingtobedone ALU might not produce right answer right away we use write signals along with clock to determine when to write Cycle time determined by length of the longest path State element Combinational logic State element 2 Clock cycle We are ignoring some details like setup and hold times 36
How does the single cycle path work? Let us understand this by highlighting g g the portions of the path when an R-type instruction is executed For an R-type instruction we go through the following phases: Phase : Fetch Phase 2: Register File Phase 3: ALU execution Phase 4: the Result into the Register File NOTE: All the four phases are completed in only ONE clock cycle and hence it is a single cycle implementation 37
R-type Phase ( Fetch) u 4 Add [3 26] Control RegDst Branch em emtoreg ALUOp em ALUSrc Reg Shift left 2 Add result ALU x PC address memory ti [3 ] [25 2] [2 6] [5 ] ux register register 2 Registers 2 register ux Zero ALU ALU result Address Data memory ux [5 ] 6 Sign 32 extend ALU control ti [5 ] 38
R-type Phase 2 (Register ) u 4 Add [3 26] Control RegDst Branch em emtoreg ALUOp em ALUSrc Reg Shift left 2 Add result ALU x PC address memory ti [3 ] [25 2] [2 6] [5 ] ux register register 2 Registers 2 register ux Zero ALU ALU result Address Data memory ux [5 ] 6 Sign 32 extend ALU control ti [5 ] 39
R-type Phase 3 ( ALU execution) u 4 Add [3 26] Control RegDst Branch em emtoreg ALUOp em ALUSrc Reg Shift left 2 Add result ALU x PC address memory ti [3 ] [25 2] [2 6] [5 ] ux register register 2 Registers 2 register ux Zero ALU ALU result Address Data memory ux [5 ] 6 Sign 32 extend ALU control ti [5 ] 4
R-type Phase 4 ( the Result) u 4 Add [3 26] Control RegDst Branch em emtoreg ALUOp em ALUSrc Reg Shift left 2 Add result ALU x PC address memory ti [3 ] [25 2] [2 6] [5 ] ux register register 2 Registers 2 register ux Zero ALU ALU result Address Data memory ux [5 ] 6 Sign 32 extend ALU control ti [5 ] 4
How do we handle jump? [25 ] Shift Jump address [3 ] left 2 26 28 4 Add PC+4 [3 28] [3 26] Control RegDst Jump Branch em emtoreg ALUOp em ALUSrc Reg Shift left 2 Add result ALU u x u x PC address memory [3 ] [25 2] [2 6] [5 ] register register 2 Zero Registers ALU ALU 2 result u x register u x Address Data memory u x [5 ] 6 Sign 32 extend ALU control [5 ] 42
Single Cycle Implementation: Summary All instructions are executed in only clock cycle We built asingle cycle path th from scratch th We designed appropriate controller to generate correct correct signals All instructions are not born equal; that some require more work, some less => disadvantage of single cycle implementation is that the slowest instruction determines the clock cycle width In reality, no body implements single cycle approach. Given the single cycle path, you should be able to highlight active portions of the path for any given instruction. 43
Single Cycle Implementation - Issues Single Cycle Problems: what if we had a more complicated instruction like floating point? wasteful of area Cycle width determined by the slowest instruction One Solution: use a smaller cycle time have different instructions ti take different numbers of cycles a multicycle path: 44
Single Cycle, ultiple Cycle, vs. Pipeline Clk Cycle Single Cycle Implementation: Cycle 2 Load Store Waste Clk Cycle Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle ultiple Cycle Implementation: Load Store Ifetch Reg Exec em Wr Ifetch Reg Exec em R-type Ifetch Pipeline Implementation: Load Ifetch Reg Exec em Wr Store Ifetch Reg Exec em Wr R-type Ifetch Reg Exec em Wr 45
ulticycle Approach Wewill be reusing functional units ALU used to compute address and to increment PC emory used for instruction and Our control signals will not be determined solely by instruction e.g., what should the ALU do for a subtract instruction? We ll use a finite state machine for control Break up the instructions into steps, each step takes a cycle balance the amount of work to be done restrict each cycle to use only one major functional unit At the end of a cycle store values for use in later cycles (easiest thing to do) introduce additional internal registers 46
ulticycle Approach High Level View PC Address emory or Data register emory register Data Register # Registers Register # Register # A B ALU ALUOut 47
ulticycle Approach handles the basic instructions PC u x Address emory emdata [25 2] [2 6] [5 ] register [5 ] emory register [5 ] u x u x 6 register register 2 Registers register 2 Sign extend 32 Shift left 2 A B 4 u x u 2 x 3 Zero ALU ALU result ALUOut 48
Fetch Five Execution Steps ti Decode and Register Fetch Execution, emory Address Computation, or Branch Completion emory Access or R-type instruction completion -back step INSTRUCTIONS TAKE FRO 3-5 CYCLES! 49
Step : Fetch Use PC to get instruction and put it in the Register. Increment the PC by 4 and put the result back in the PC. Can be described succinctly using RTL "Register- Transfer Language IR <= emory[pc]; PC <= PC + 4; 5
Step 2: Decode and Register Fetch registers rs and rt in case we need them Compute the branch address in case the instruction is a branch RTL: A <= Reg[IR[25-2]]; B <= Reg[IR[2-6]]; ALUOut <= PC + ( sign-extend(ir[5- ]) << 2 ); 5
Step 3: (instruction dependent) ALU is performing one of three functions, based on instruction type emory Reference: ALUOut <= A + sign-extend(ir[5-]); R-type: ALUOut <= A op B; Branch: if (A==B) PC <= ALUOut; 52
Step 4: (R-type or memory-access) Loads and stores access memory DR <= emory[aluout]; or emory[aluout] <= B; R-type instructions finish Reg[IR[5-]] <= ALUOut; The write actually takes place at the end of the cycle on the edge. 53
Step 5: -back step Reg[IR[2-6]] <= DR; Summary of Steps taken to execute any instruction class. Step name Action for R-type instructions Action for memory-reference instructions Action for branches Action for jumps fetch IR <= emory[pc] PC <= PC + 4 A <= Reg [IR[25-2]] decode/register fetch B <= Reg g[ [IR[2-6]][ ALUOut <= PC + (sign-extend (IR[5-]) << 2) Execution, address ALUOut <= A op B ALUOut <= A + sign-extend if (A == B) then PC <= PC [3-28] II computation, branch/ (IR[5-]) PC <= ALUOut (IR[25-]<<2) jump completion emory access or R-type Reg [IR[5-]] <= Load: DR <= emory[aluout] completion ALUOut or Store: emory [ALUOut] <= B emory read completion Load: Reg[IR[2-6]] <= DR This sequence suggests what controller must do on each clock-cycle cycle. 54
ulticycle Datapath with Control Lines IorD em em IR RegDst Reg ALUSrcA PC u x Address emory emdata [25 2] register u A x [2 6] register 2 Registers [5 ] register 2 B register [5 ] [5 ] ux ux 4 u 2 x 3 Zero ALU ALU result ALUOut emory register 6 Sign extend 32 Shift left 2 ALU control [5 ] emtoreg ALUSrcB ALUOp 55
ulticycle Datapath with Controller PC u x Address emory emdata [3-26] [25 2] [2 6] [5 ] register [5 ] PCCond PC IorD em em emtoreg IR Outputs Control Op [5 ] PCSource ALUOp ALUSrcB ALUSrcA Reg RegDst [25 ] 26 Shift 28 left 2 [5 ] ux ux register register 2 Registers register 2 A B 4 ux u 2 x 3 PC [3-28] Zero ALU ALU result Jump address [3-] ALUOut ux 2 emory register 6 Sign extend 32 Shift left 2 ALU control [5 ] 56
Implementing the Control Value of control signals is dependent upon: what instruction is being executed which step is being performed Use the information we ve accumulated to specify a finite state machine specify the finite state machine graphically, or Use microprogramming g Implementation can be derived from specification 57
Actions of the -bit control signals Signal Name Effect When deasserted Effect when asserted RegDst The register destination number for the The register destination number for the register comes from the rt tfield register comes from the rd field Reg NONE The general purpose register selected by the register number is written with the value of the input. ALUSrcA The first ALU operand is the PC. The first ALU operand comes from the A register. em None Content of emory at the location specified by the Address input is put on emory output. em None emory contents at the location specified by the Address input is replaced by value on input. emtoreg The value fed to the register file inputthe value fed to the register file input comes comes from ALUOut from the DR. IorD The PC is used to supply the address to the ALUOut is used to supply the address to the memory memory unit. unit. IR None The output of the memory is written into the IR. PC None The PC is written; the source is controlled by PCSource PCCond None The PC is written if the Zero output from the ALU is activ 58
Actions of the 2-bit control signals Signal Name Value Effect when asserted "" The ALU performs an add operation. ALUOp "" The ALU performs a subtract operation. "" The funct field of the instruction determines the ALU operation "" The second input to the ALU comes from the B register. ALUSrcB "" The second input to the ALU is the constant 4. "" The second input to the ALU is the sign-extended, lower 6 bits of the IR. "" The second input to the ALU is the sign-extended, lower 6 bits of the IR shifted left 2 bits PCSource "" Output of the ALU (PC + 4) is sent to the PC for writing. "" The contents of the ALUOut (the branch target address) are sent to the PC for writing. "" The jump target address (IR[25-] shifted left 2 bits and concatenated with PC + 4[3-28]) is sent to the PC for writing 59
Finite state machines: Finite state machines a set of states and next state function (determined by current state and the input) output function (determined by current state and possibly input) We ll use a oore machine (output based only on current state) Current state Next-state function Next state Inputs Clock Output function Outputs 6
Finite State achine Control for ulticycle Implementation ti Start fetch/decode and register fetch (Figure 5.37) 5.32 ) emory access instructions (Figure 5.38) R-type instructions Branch instruction Jump instruction (Figure 5.39) (Figure 5.4) (Figure 5.4) 5.33) 5.34) 5.35) 5.36) 6
Fetch & Decode (Fig 5.32) fetch decode/ Register fetch em ALUSrcA = IorD = IR ALUSrcA = Startt ALUSrcB = ALUSrcB = ALUOp = ALUOp = PC PCSource = Op = 'JP') (O emory reference FS R-type FS Branch FS Jump FS (Figure 5.33) (Figure 5.34) (Figure 5.35) (Figure 5.36) 62
emory Reference s (Fig. 5.33) 2 From state (Op='LW')or(Op='SW') emory address computation ALUSrcA = ALUSrcB = ALUOp = 3 (Op ='LW') emory access 5 emory access em IorD = em IorD = 4 -back step Reg emtoreg = RegDst = To state (Figure 5.32) 63
6 7 From state (Op = R-type) ALUSrcA = ALUSrcB = ALUOp = Execution R-type, Branch, Jump.. R-type completion 8 From state (Op = 'BEQ') ALUSrcA = ALUSrcB = ALUOp = PCCond PCSource = Branch completion 9 From state t (Op = 'J') PC PCSource = Jump completion RegDst = Reg emtoreg = To state (Figure 5.32) To state (Figure 5.32) Tostate t (Figure 5.32) 64
Graphical Specification of FS 2 emory address computation ALUSrcA = ALUSrcB = ALUOp = Start fetch em ALUSrcA = IorD = IR ALUSrcB = ALUOp = PC PCSource = 6 Execution ALUSrcA = ALUSrcB = ALUOp= 8 Branch completion ALUSrcA = ALUSrcB = ALUOp = PCCond PCSource = ti decode/ d register fetch 9 ALUSrcA = ALUSrcB = ALUOp = (Op = 'J') Jump completion PC PCSource = 3 (Op = 'LW') emory access 5 emory access 7 R-type completion em IorD = em IorD = RegDst = Reg emtoreg = 4 -back step RegDst= Reg emtoreg = 65
Finite State achine for Control PC Control logic Inputs Outputs PCCond IorD em em IRW rite emtoreg PCSource ALUOp ALUSrcB ALUSrcA R egw rite RegDst NS3 NS2 NS NS Op p5 Op p4 Op p3 Op p2 Op p Op p S3 S2 S S register opcode field State register 66
ulticycle Datapath to Handle Exceptions 67
ulticycle Control to Handle Exceptions 68
Controller Implementation: Big Picture Initial representation Finite state diagram icroprogram Sequencing control Explicit next state function icroprogram counter +dispatchros Logic Logic Truth representation equations tables Implementation Programmable only technique logic array memory 69
Summary Single-cycle implementation ulti-cycle implementation Is an effective implementation: slower instructions take more clock cycles, faster, less! Higher resource sharing (area is less) State diagrams can be used to specify the control From FS spec, we can automatically synthesize the controller implementation. Controller Implementation Three choices: RO, PLA, and icroprogramming PLAs is more efficient in terms of area compared to RO icroprogramming is a flexible style (popularized by CISC) 7