Lecture 5: The Processor

Similar documents
Processor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control

The Processor: Datapath & Control

ECE369. Chapter 5 ECE369

CPE 335. Basic MIPS Architecture Part II

Multicycle Approach. Designing MIPS Processor

Chapter 5: The Processor: Datapath and Control

Processor: Multi- Cycle Datapath & Control

Topic #6. Processor Design

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W10-M

Lets Build a Processor

Systems Architecture I

CC 311- Computer Architecture. The Processor - Control

Computer Organization & Design The Hardware/Software Interface Chapter 5 The processor : Datapath and control

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions

LECTURE 6. Multi-Cycle Datapath and Control

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

ENE 334 Microprocessors

Lecture 6: Pipelining

Computer and Information Sciences College / Computer Science Department The Processor: Datapath and Control

Chapter 4 The Processor (Part 2)

Major CPU Design Steps

CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W9-W

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

CSE 2021 COMPUTER ORGANIZATION

EECE 417 Computer Systems Architecture

Processor (I) - datapath & control. Hwansoo Han

Systems Architecture

CSE 2021 COMPUTER ORGANIZATION

Mapping Control to Hardware

LECTURE 5. Single-Cycle Datapath and Control

Implementing the Control. Simple Questions

ECE232: Hardware Organization and Design

RISC Processor Design

Review: Abstract Implementation View

CENG 3420 Lecture 06: Datapath

Chapter 5 Solutions: For More Practice

Single-Cycle Examples, Multi-Cycle Introduction

CPE 335 Computer Organization. Basic MIPS Architecture Part I

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Data Paths and Microprogramming

Outline. Combinational Element. State (Sequential) Element. Clocking Methodology. Input/Output of Elements

Multiple Cycle Data Path

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)

Note- E~ S. \3 \S U\e. ~ ~s ~. 4. \\ o~ (fw' \i<.t. (~e., 3\0)

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization

Chapter 4. The Processor

Single Cycle Datapath

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

Chapter 4. The Processor. Computer Architecture and IC Design Lab

CPU Organization (Design)

Computer Science 141 Computing Hardware

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

ECE260: Fundamentals of Computer Engineering

Alternative to single cycle. Drawbacks of single cycle implementation. Multiple cycle implementation. Instruction fetch

comp 180 Lecture 25 Outline of Lecture The ALU Control Operation & Design The Datapath Control Operation & Design HKUST 1 Computer Science

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

Design of the MIPS Processor

Lecture 10 Multi-Cycle Implementation

Single Cycle Datapath

Chapter 4. The Processor

Inf2C - Computer Systems Lecture Processor Design Single Cycle

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

RISC Architecture: Multi-Cycle Implementation

TDT4255 Computer Design. Lecture 4. Magnus Jahre. TDT4255 Computer Design

Design of the MIPS Processor (contd)

Chapter 4. The Processor Designing the datapath

Initial Representation Finite State Diagram. Logic Representation Logic Equations

The MIPS Processor Datapath

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Ch 5: Designing a Single Cycle Datapath

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

EECS 322 Computer Architecture Improving Memory Access: the Cache

Lab 8: Multicycle Processor (Part 1) 0.0

Processor Implementation in VHDL. University of Ulster at Jordanstown University of Applied Sciences, Augsburg

The Processor: Datapath & Control

CS Computer Architecture Spring Week 10: Chapter

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

ECE 313 Computer Organization EXAM 2 November 9, 2001

CSE 2021: Computer Organization Fall 2010 Solution to Assignment # 3: Multicycle Implementation

CSE140: Components and Design Techniques for Digital Systems

RISC Architecture: Multi-Cycle Implementation

RISC Design: Multi-Cycle Implementation

What do we have so far? Multi-Cycle Datapath (Textbook Version)

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

The overall datapath for RT, lw,sw beq instrucution

Introduction. ENG3380 Computer Organization and Architecture MIPS: Data Path Design Part 3. Topics. References. School of Engineering 1

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

Chapter 4. The Processor

Control Unit for Multiple Cycle Implementation

Transcription:

Lecture 5: The Processor CSCE 26 Computer Organization Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed from various books, websites, authors pages, and other sources for academic purpose only. The instructor does not claim any originality.

Lecture Outline Construction of a Simple IPS Processor Single Cycle Processor ulticycle Processor 2

High Level Language Program Levels of Representation Compiler Assembly Language Program Assembler achine Language Program achine Interpretation temp = v[k]; v[k] = v[k+]; v[k+] = temp; lw$5,($2) lw$6,4($2) sw $6, ($2) sw $5, 4($2) Control Signal Specification ALUOP[:3] <= InstReg[9:] & ASK 3

Execution Cycle Fetch Decode Operand Fetch Execute Result Store Next Obtain instruction from program storage Determine required actions and instruction size Locate and obtain operand Compute result value or status Deposit results in storage for later use Determine successor instruction 4

The Processor: Datapath & Control We're ready to look at an implementation of the IPS Simplified to contain only: memory-reference instructions: lw, sw arithmetic-logical ti i l instructions:add, ti sub, and, or, slt control flow instructions: beq, j Generic Implementation: use the program counter (PC) to supply instruction address get the instruction from memory read registers use the instruction to decide exactly what to do 5

ore Implementation Details Abstract / Simplified View Two types of ffunctional units: elements that operate on values (combinational) elements that contain state (sequential) Data Register # PC Address Registers ALU memory Register # Register # Address Data memory Data 6

Basic Building Blocks Latches Flip-flops Combinational Elements Sequential Elements Clocking strategies 7

Register File: Operation Built using D flip-flops (Combinational in nature) register number register number 2 Register R egister Register n Register n u x u 2 x 3

Register File: Operation We still use the real clock to determine when to write C D Register Register number n-to- decoder n C D Register n Register C Register n D C D Register n 4

Register File: Block Diagram register number register number 2 register Register file 2 Three Address ports One Data Input Port Two Data Output Ports One Control Signal 5

a: emory Functional Units - I After an instruction address is put, the instruction residing at the address appears at the output port b: Program Counter -- A simple up counter c: Adder -- A 2 s complement adder address memory PC Add Sum a. memory b. Program counter c. Adder 6

a: Register File Functional Units - II It s construction, read, and write operations as discussed previously b: ALU (Arithmetic ti & Logic Unit) Register numbers Recall the ALU design we have discussed in last two classes Note the Zero output Data 5 register 5 register 2 Registers 5 register 2 Reg Data 3 ALU control Zero ALU ALU result a. Register File b. ALU 7

Functional Units -III a: Data emory Unit Similar to emory Unit, only that it can written into as well Two input ports for address and, one output port (for read out) Two control signals: for and operations b: Sign Extension Unit -- Extends 6-bit input operand to 32 bits em Address Data memory 6 Sign 32 e x te n d em a. Data m emory unit b. Sign-extension unit 8

Datapath for Fetch (Piece I) Fetching s and Incrementing the program counter by 4 Add 4 P C ad dre ss memory 9

Datapath for R-type s (Piece II) Datapath for R-type s register register 2 Registers register 2 3 ALU operation Zero ALU ALU result Reg 2

Datapath for Load/Store (Piece III) Datapath for load or store () Register Access; (2) emory Address calculation; (3) / (4) into Register file (if the instruction is a load) register register 2 Registers register Reg 2 3 ALU operation Zero ALU ALU result Address em Data memory 6 Sign 32 extend em 2

Datapath for Branch (Piece IV) Unit Shift left 2 adds at the low-order end of the sign-extended offset Control logic is used to decide whether the incremented PC or branch target should replace the PC based on the Zero output of the ALU PC + 4 from instruction path Add Sum Branch target Shift left 2 register register 2 Registers register Reg 2 6 Sign 32 extend ALU operation 3 To branch ALU Zero control logic 22

Datapath Construction Strategy Now, we have pieces of path that are capable of performing distinct functions We want to stitch them together to yield a final path that can execute all the instructions (lw, sw, add, sub, and, or,slt, beq, j) We will use multiplexors (or muxes for short) for stitching the path 23

i l Datapath Construction (erge Pieces II&III) Piece II + Piece III x ti ri r register register 2 Registers register register register 2 Registers register Reg 2 Reg 2 6 Sign 32 extend 3 ALU operation Zero ALU ALU result Address 3 ALU operation Zero ALU ALU result em Data memory em 24

And you will get.. Rule: Whenever we have more than one input feeding a functional unit, introduce a multiplexor (this gives rise to a control signal, more later..) Registers register register 2 register 2 Reg ALUSrc u x ALU operation 3 em Zero ALU ALU result Address Data memory emtoreg u x 6 Sign 32 extend em 25

Datapath Construction (merge Piece I) Just tack the Fetch and PC increment logic at the front! 4 Add PC address memory Registers register register 2 register 2 Reg ALUSrc ux ALU operation 3 em Zero ALU ALU result Address Data memory emtoreg ux 6 Sign 32 extend em 26

Datapath Construction (merge Piece IV) PCSrc Add ux 4 Shift left 2 Add ALU result PC address ti memory Registers register register 2 register Reg 2 6 Sign 32 extend ALUSrc u x ALU operation 3 em Zero ALU ALU result Address em Data memory emtoreg ux 27

Final Datapath Data flows through various paths under the influence of control signals There are seven control signals (of type,, or ux Select) PCSrc Add ux 4 ALU Add result Reg Shift left 2 PC address [3 ] memory [25 2] [2 6] [5 ] ux x RegDst [5 ] register register 2 2 register Registers 6 Sign 32 extend ALUSrc u x ALU control ALU Zero ALU result em Address Data memory em emtoreg u x [5 ] ALUOp 28

Defining the Control.. Selecting the operations to perform (ALU, read/write, etc.) Controlling the flow of (multiplexor inputs) Information comes from the 32 bits of the instruction Example: add $8, $7, $8 Format: op rs rt rd shamt funct ALU's operation based on instruction type and function code We will design two control units: ()ALU Control to generate appropriate function select signals for the ALU (2)ain Control to generate signals for functional units other than the ALU 29

Defining the ALU Control... Contd. e.g., what should the ALU do with this instruction Example: lw $, ($2) 35 2 op rs rt 6 bit offset ALU control input AND OR add subtract set-on-less-than than 3

ALU Control Design ust describe hardware to compute 3-bit ALU control input given instruction type = lw, sw = beq, = arithmetic function code for arithmetic ALUOp computed from instruction type ALU Control inputs How are they determined? Funct Desired ALU Control Opcode ALUOp Operation Field ALU Action Operation LW load word XXXXXX add SW store word XXXXXX add Branch equal branch equal XXXXXX subtract R-type add add R-type subtract subtract R-type AND and R-type OR or R-type set on less than set on less than 3

ALU Control - Truth Table & Implementation Describe it using a truth table (can turn into gates): ALUOp Funct field Operation ALUOp ALUOp F5 F4 F3 F2 F F X X X X X X X X X X X X X X X X X X X X X X X X X X X X ALUOp ALUOp ALUOp ALU control block F3 Operation2 F (5 ) F2 F F Operation Operation Operation 32

Designing the ain Control 4 Add [3 26] Control R egdst Branch em emtoreg ALUOp em Shift left 2 Add ALU result u x ALUSrc Reg PC address memory [3 ] ti [25 2] [2 6] [5 ] register register 2 Registers 2 register u x u x Zero ALU ALU result Address Data memory u x [5 ] 6 Sign 32 extend ALU control [5 ] 33

Control Signals and their Effects Signal Name Effect When deasserted Effect when asserted RegDst The register destination number for the The register destination number for the register comes from the rt field (bits 2-6) register comes from the rd field (bits 5-) Reg NONE The register on the register input is written with the value on the input ALUSrc The second ALU operand comes from the The second ALU operand is the sign-extended second register file output ( 2) lower 6 bits of the instruction PCSrc The PC is replaced by the output of the adder that The PC is replaced by the output of the adder computes the value of PC + 4. that computes the branch target em None Data emory contents designated by the address input are put on the output em None Data memory contents designated by the address input are replaced by the value on the input emtoreg The value fed to the register input The value fed to the register input comes comes from the ALU from the memory. 34

ain Control: Truth Table & Implementation emto- Reg em em RegDst ALUSrc Reg Branch ALUOp ALUp R-format lw sw X X beq X X Inputs Op5 O p 4 Op3 Op2 Op O p R -fo r m a t Iw s w b e q Outputs RegDst ALUSrc emtoreg R e g W rite em em Branch ALUOp ALUOpO 35

Our Simple Control Structure All of the logic is combinational We wait for everything to settle down, and the right thingtobedone ALU might not produce right answer right away we use write signals along with clock to determine when to write Cycle time determined by length of the longest path State element Combinational logic State element 2 Clock cycle We are ignoring some details like setup and hold times 36

How does the single cycle path work? Let us understand this by highlighting g g the portions of the path when an R-type instruction is executed For an R-type instruction we go through the following phases: Phase : Fetch Phase 2: Register File Phase 3: ALU execution Phase 4: the Result into the Register File NOTE: All the four phases are completed in only ONE clock cycle and hence it is a single cycle implementation 37

R-type Phase ( Fetch) u 4 Add [3 26] Control RegDst Branch em emtoreg ALUOp em ALUSrc Reg Shift left 2 Add result ALU x PC address memory ti [3 ] [25 2] [2 6] [5 ] ux register register 2 Registers 2 register ux Zero ALU ALU result Address Data memory ux [5 ] 6 Sign 32 extend ALU control ti [5 ] 38

R-type Phase 2 (Register ) u 4 Add [3 26] Control RegDst Branch em emtoreg ALUOp em ALUSrc Reg Shift left 2 Add result ALU x PC address memory ti [3 ] [25 2] [2 6] [5 ] ux register register 2 Registers 2 register ux Zero ALU ALU result Address Data memory ux [5 ] 6 Sign 32 extend ALU control ti [5 ] 39

R-type Phase 3 ( ALU execution) u 4 Add [3 26] Control RegDst Branch em emtoreg ALUOp em ALUSrc Reg Shift left 2 Add result ALU x PC address memory ti [3 ] [25 2] [2 6] [5 ] ux register register 2 Registers 2 register ux Zero ALU ALU result Address Data memory ux [5 ] 6 Sign 32 extend ALU control ti [5 ] 4

R-type Phase 4 ( the Result) u 4 Add [3 26] Control RegDst Branch em emtoreg ALUOp em ALUSrc Reg Shift left 2 Add result ALU x PC address memory ti [3 ] [25 2] [2 6] [5 ] ux register register 2 Registers 2 register ux Zero ALU ALU result Address Data memory ux [5 ] 6 Sign 32 extend ALU control ti [5 ] 4

How do we handle jump? [25 ] Shift Jump address [3 ] left 2 26 28 4 Add PC+4 [3 28] [3 26] Control RegDst Jump Branch em emtoreg ALUOp em ALUSrc Reg Shift left 2 Add result ALU u x u x PC address memory [3 ] [25 2] [2 6] [5 ] register register 2 Zero Registers ALU ALU 2 result u x register u x Address Data memory u x [5 ] 6 Sign 32 extend ALU control [5 ] 42

Single Cycle Implementation: Summary All instructions are executed in only clock cycle We built asingle cycle path th from scratch th We designed appropriate controller to generate correct correct signals All instructions are not born equal; that some require more work, some less => disadvantage of single cycle implementation is that the slowest instruction determines the clock cycle width In reality, no body implements single cycle approach. Given the single cycle path, you should be able to highlight active portions of the path for any given instruction. 43

Single Cycle Implementation - Issues Single Cycle Problems: what if we had a more complicated instruction like floating point? wasteful of area Cycle width determined by the slowest instruction One Solution: use a smaller cycle time have different instructions ti take different numbers of cycles a multicycle path: 44

Single Cycle, ultiple Cycle, vs. Pipeline Clk Cycle Single Cycle Implementation: Cycle 2 Load Store Waste Clk Cycle Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle ultiple Cycle Implementation: Load Store Ifetch Reg Exec em Wr Ifetch Reg Exec em R-type Ifetch Pipeline Implementation: Load Ifetch Reg Exec em Wr Store Ifetch Reg Exec em Wr R-type Ifetch Reg Exec em Wr 45

ulticycle Approach Wewill be reusing functional units ALU used to compute address and to increment PC emory used for instruction and Our control signals will not be determined solely by instruction e.g., what should the ALU do for a subtract instruction? We ll use a finite state machine for control Break up the instructions into steps, each step takes a cycle balance the amount of work to be done restrict each cycle to use only one major functional unit At the end of a cycle store values for use in later cycles (easiest thing to do) introduce additional internal registers 46

ulticycle Approach High Level View PC Address emory or Data register emory register Data Register # Registers Register # Register # A B ALU ALUOut 47

ulticycle Approach handles the basic instructions PC u x Address emory emdata [25 2] [2 6] [5 ] register [5 ] emory register [5 ] u x u x 6 register register 2 Registers register 2 Sign extend 32 Shift left 2 A B 4 u x u 2 x 3 Zero ALU ALU result ALUOut 48

Fetch Five Execution Steps ti Decode and Register Fetch Execution, emory Address Computation, or Branch Completion emory Access or R-type instruction completion -back step INSTRUCTIONS TAKE FRO 3-5 CYCLES! 49

Step : Fetch Use PC to get instruction and put it in the Register. Increment the PC by 4 and put the result back in the PC. Can be described succinctly using RTL "Register- Transfer Language IR <= emory[pc]; PC <= PC + 4; 5

Step 2: Decode and Register Fetch registers rs and rt in case we need them Compute the branch address in case the instruction is a branch RTL: A <= Reg[IR[25-2]]; B <= Reg[IR[2-6]]; ALUOut <= PC + ( sign-extend(ir[5- ]) << 2 ); 5

Step 3: (instruction dependent) ALU is performing one of three functions, based on instruction type emory Reference: ALUOut <= A + sign-extend(ir[5-]); R-type: ALUOut <= A op B; Branch: if (A==B) PC <= ALUOut; 52

Step 4: (R-type or memory-access) Loads and stores access memory DR <= emory[aluout]; or emory[aluout] <= B; R-type instructions finish Reg[IR[5-]] <= ALUOut; The write actually takes place at the end of the cycle on the edge. 53

Step 5: -back step Reg[IR[2-6]] <= DR; Summary of Steps taken to execute any instruction class. Step name Action for R-type instructions Action for memory-reference instructions Action for branches Action for jumps fetch IR <= emory[pc] PC <= PC + 4 A <= Reg [IR[25-2]] decode/register fetch B <= Reg g[ [IR[2-6]][ ALUOut <= PC + (sign-extend (IR[5-]) << 2) Execution, address ALUOut <= A op B ALUOut <= A + sign-extend if (A == B) then PC <= PC [3-28] II computation, branch/ (IR[5-]) PC <= ALUOut (IR[25-]<<2) jump completion emory access or R-type Reg [IR[5-]] <= Load: DR <= emory[aluout] completion ALUOut or Store: emory [ALUOut] <= B emory read completion Load: Reg[IR[2-6]] <= DR This sequence suggests what controller must do on each clock-cycle cycle. 54

ulticycle Datapath with Control Lines IorD em em IR RegDst Reg ALUSrcA PC u x Address emory emdata [25 2] register u A x [2 6] register 2 Registers [5 ] register 2 B register [5 ] [5 ] ux ux 4 u 2 x 3 Zero ALU ALU result ALUOut emory register 6 Sign extend 32 Shift left 2 ALU control [5 ] emtoreg ALUSrcB ALUOp 55

ulticycle Datapath with Controller PC u x Address emory emdata [3-26] [25 2] [2 6] [5 ] register [5 ] PCCond PC IorD em em emtoreg IR Outputs Control Op [5 ] PCSource ALUOp ALUSrcB ALUSrcA Reg RegDst [25 ] 26 Shift 28 left 2 [5 ] ux ux register register 2 Registers register 2 A B 4 ux u 2 x 3 PC [3-28] Zero ALU ALU result Jump address [3-] ALUOut ux 2 emory register 6 Sign extend 32 Shift left 2 ALU control [5 ] 56

Implementing the Control Value of control signals is dependent upon: what instruction is being executed which step is being performed Use the information we ve accumulated to specify a finite state machine specify the finite state machine graphically, or Use microprogramming g Implementation can be derived from specification 57

Actions of the -bit control signals Signal Name Effect When deasserted Effect when asserted RegDst The register destination number for the The register destination number for the register comes from the rt tfield register comes from the rd field Reg NONE The general purpose register selected by the register number is written with the value of the input. ALUSrcA The first ALU operand is the PC. The first ALU operand comes from the A register. em None Content of emory at the location specified by the Address input is put on emory output. em None emory contents at the location specified by the Address input is replaced by value on input. emtoreg The value fed to the register file inputthe value fed to the register file input comes comes from ALUOut from the DR. IorD The PC is used to supply the address to the ALUOut is used to supply the address to the memory memory unit. unit. IR None The output of the memory is written into the IR. PC None The PC is written; the source is controlled by PCSource PCCond None The PC is written if the Zero output from the ALU is activ 58

Actions of the 2-bit control signals Signal Name Value Effect when asserted "" The ALU performs an add operation. ALUOp "" The ALU performs a subtract operation. "" The funct field of the instruction determines the ALU operation "" The second input to the ALU comes from the B register. ALUSrcB "" The second input to the ALU is the constant 4. "" The second input to the ALU is the sign-extended, lower 6 bits of the IR. "" The second input to the ALU is the sign-extended, lower 6 bits of the IR shifted left 2 bits PCSource "" Output of the ALU (PC + 4) is sent to the PC for writing. "" The contents of the ALUOut (the branch target address) are sent to the PC for writing. "" The jump target address (IR[25-] shifted left 2 bits and concatenated with PC + 4[3-28]) is sent to the PC for writing 59

Finite state machines: Finite state machines a set of states and next state function (determined by current state and the input) output function (determined by current state and possibly input) We ll use a oore machine (output based only on current state) Current state Next-state function Next state Inputs Clock Output function Outputs 6

Finite State achine Control for ulticycle Implementation ti Start fetch/decode and register fetch (Figure 5.37) 5.32 ) emory access instructions (Figure 5.38) R-type instructions Branch instruction Jump instruction (Figure 5.39) (Figure 5.4) (Figure 5.4) 5.33) 5.34) 5.35) 5.36) 6

Fetch & Decode (Fig 5.32) fetch decode/ Register fetch em ALUSrcA = IorD = IR ALUSrcA = Startt ALUSrcB = ALUSrcB = ALUOp = ALUOp = PC PCSource = Op = 'JP') (O emory reference FS R-type FS Branch FS Jump FS (Figure 5.33) (Figure 5.34) (Figure 5.35) (Figure 5.36) 62

emory Reference s (Fig. 5.33) 2 From state (Op='LW')or(Op='SW') emory address computation ALUSrcA = ALUSrcB = ALUOp = 3 (Op ='LW') emory access 5 emory access em IorD = em IorD = 4 -back step Reg emtoreg = RegDst = To state (Figure 5.32) 63

6 7 From state (Op = R-type) ALUSrcA = ALUSrcB = ALUOp = Execution R-type, Branch, Jump.. R-type completion 8 From state (Op = 'BEQ') ALUSrcA = ALUSrcB = ALUOp = PCCond PCSource = Branch completion 9 From state t (Op = 'J') PC PCSource = Jump completion RegDst = Reg emtoreg = To state (Figure 5.32) To state (Figure 5.32) Tostate t (Figure 5.32) 64

Graphical Specification of FS 2 emory address computation ALUSrcA = ALUSrcB = ALUOp = Start fetch em ALUSrcA = IorD = IR ALUSrcB = ALUOp = PC PCSource = 6 Execution ALUSrcA = ALUSrcB = ALUOp= 8 Branch completion ALUSrcA = ALUSrcB = ALUOp = PCCond PCSource = ti decode/ d register fetch 9 ALUSrcA = ALUSrcB = ALUOp = (Op = 'J') Jump completion PC PCSource = 3 (Op = 'LW') emory access 5 emory access 7 R-type completion em IorD = em IorD = RegDst = Reg emtoreg = 4 -back step RegDst= Reg emtoreg = 65

Finite State achine for Control PC Control logic Inputs Outputs PCCond IorD em em IRW rite emtoreg PCSource ALUOp ALUSrcB ALUSrcA R egw rite RegDst NS3 NS2 NS NS Op p5 Op p4 Op p3 Op p2 Op p Op p S3 S2 S S register opcode field State register 66

ulticycle Datapath to Handle Exceptions 67

ulticycle Control to Handle Exceptions 68

Controller Implementation: Big Picture Initial representation Finite state diagram icroprogram Sequencing control Explicit next state function icroprogram counter +dispatchros Logic Logic Truth representation equations tables Implementation Programmable only technique logic array memory 69

Summary Single-cycle implementation ulti-cycle implementation Is an effective implementation: slower instructions take more clock cycles, faster, less! Higher resource sharing (area is less) State diagrams can be used to specify the control From FS spec, we can automatically synthesize the controller implementation. Controller Implementation Three choices: RO, PLA, and icroprogramming PLAs is more efficient in terms of area compared to RO icroprogramming is a flexible style (popularized by CISC) 7