Multicycle Approach. Designing MIPS Processor

Similar documents
Chapter 4 The Processor (Part 2)

The Processor: Datapath & Control

Lets Build a Processor

Implementing the Control. Simple Questions

ENE 334 Microprocessors

ECE369. Chapter 5 ECE369

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

EECE 417 Computer Systems Architecture

Systems Architecture I

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control

CC 311- Computer Architecture. The Processor - Control

CPE 335. Basic MIPS Architecture Part II

Processor: Multi- Cycle Datapath & Control

Topic #6. Processor Design

Mapping Control to Hardware

Lecture 5: The Processor

Outline. Combinational Element. State (Sequential) Element. Clocking Methodology. Input/Output of Elements

Multiple Cycle Data Path

CSE 2021 COMPUTER ORGANIZATION

Computer Science 141 Computing Hardware

EECE 417 Computer Systems Architecture

CSE 2021 COMPUTER ORGANIZATION

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Data Paths and Microprogramming

Chapter 5: The Processor: Datapath and Control

Alternative to single cycle. Drawbacks of single cycle implementation. Multiple cycle implementation. Instruction fetch

Multicycle conclusion

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

RISC Processor Design

LECTURE 6. Multi-Cycle Datapath and Control

Chapter 5 Solutions: For More Practice

5.7. Microprogramming: Simplifying Control Design 5.7

Processor (multi-cycle)

Initial Representation Finite State Diagram. Logic Representation Logic Equations

ALUOut. Registers A. I + D Memory IR. combinatorial block. combinatorial block. combinatorial block MDR

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

Microprogramming. Microprogramming

Points available Your marks Total 100

Major CPU Design Steps

Initial Representation Finite State Diagram Microprogram. Sequencing Control Explicit Next State Microprogram counter

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization

RISC Architecture: Multi-Cycle Implementation

Design of the MIPS Processor

Control Unit for Multiple Cycle Implementation

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

Introduction. ENG3380 Computer Organization and Architecture MIPS: Data Path Design Part 3. Topics. References. School of Engineering 1

Control & Execution. Finite State Machines for Control. MIPS Execution. Comp 411. L14 Control & Execution 1

CSE 2021: Computer Organization Fall 2010 Solution to Assignment # 3: Multicycle Implementation

RISC Design: Multi-Cycle Implementation

Design of the MIPS Processor (contd)

RISC Architecture: Multi-Cycle Implementation

Microprogrammed Control Approach

Note- E~ S. \3 \S U\e. ~ ~s ~. 4. \\ o~ (fw' \i<.t. (~e., 3\0)

Single vs. Multi-cycle Implementation

ECE 3056: Architecture, Concurrency and Energy of Computation. Single and Multi-Cycle Datapaths: Practice Problems

Materials: 1. Projectable Version of Diagrams 2. MIPS Simulation 3. Code for Lab 5 - part 1 to demonstrate using microprogramming

CS152 Computer Architecture and Engineering. Lecture 8 Multicycle Design and Microcode John Lazzaro (

TDT4255 Computer Design. Lecture 4. Magnus Jahre. TDT4255 Computer Design

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

Review: Abstract Implementation View

Computer Organization & Design The Hardware/Software Interface Chapter 5 The processor : Datapath and control

Materials: 1. Projectable Version of Diagrams 2. MIPS Simulation 3. Code for Lab 5 - part 1 to demonstrate using microprogramming

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

ECE468 Computer Organization and Architecture. Designing a Multiple Cycle Controller

CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W10-M

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

CENG 3420 Lecture 06: Datapath

ECE 313 Computer Organization EXAM 2 November 9, 2001

EE457. Note: Parts of the solutions are extracted from the solutions manual accompanying the text book.

October 24. Five Execution Steps

Processor (I) - datapath & control. Hwansoo Han

CS152 Computer Architecture and Engineering Lecture 13: Microprogramming and Exceptions. Review of a Multiple Cycle Implementation

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination

CpE 442. Designing a Multiple Cycle Controller

Computer Architecture Chapter 5. Fall 2005 Department of Computer Science Kent State University

Lecture 9: Microcontrolled Multi-Cycle Implementations. Who Am I?

CSE Lecture In Class Example Handout

CSE Computer Architecture I Fall 2009 Lecture 13 In Class Notes and Problems October 6, 2009

Designing a Multicycle Processor

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions

Systems Architecture

Multi-cycle Approach. Single cycle CPU. Multi-cycle CPU. Requires state elements to hold intermediate values. one clock cycle or instruction

ECE232: Hardware Organization and Design

Lecture 10 Multi-Cycle Implementation

These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions.

ECE 486/586. Computer Architecture. Lecture # 7

LECTURE 5. Single-Cycle Datapath and Control

Single Cycle Datapath

Lab 8: Multicycle Processor (Part 1) 0.0

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

ECE 313 Computer Organization FINAL EXAM December 11, Multicycle Processor Design 30 Points

Micro-programmed Control Ch 17

ECE 30, Lab #8 Spring 2014

Hardwired Control (4) Micro-programmed Control Ch 17. Micro-programmed Control (3) Machine Instructions vs. Micro-instructions

Midterm I October 6, 1999 CS152 Computer Architecture and Engineering

Processor Implementation in VHDL. University of Ulster at Jordanstown University of Applied Sciences, Augsburg

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

Transcription:

CSE 675.2: Introduction to Computer Architecture Multicycle Approach 8/8/25 Designing MIPS Processor (Multi-Cycle) Presentation H Slides by Gojko Babić and Elsevier Publishing We will be reusing functional units ALU used to compute address and to increment PC used for instruction and data Our control signals will not be determined directly by instruction e.g., what should the ALU do for a subtract instruction? We ll use a finite state machine for control Multicycle Approach Break up the instructions into steps, each step takes a cycle balance the amount of work to be done restrict each cycle to use only one major functional unit At the end of a cycle store values for use in later cycles (easiest thing to do) introduce additional internal registers PC M ux Address [25 2] register M A ux data [2 6] register 2 MemData M Registers [5 ] ux Write Write [5 ] register data 2 B data 4 M register Write ux data 2 M 3 [5 ] ux data register 6 32 Sign extend Shift left 2 Zero ALU ALU result ALUOut s from ISA perspective Consider each instruction from perspective of ISA. Example: The add instruction changes a register. Register specified by bits 5: of instruction. specified by the PC. New value is the sum ( op ) of two registers. Registers specified by bits 25:2 and 2:6 of the instruction Reg[[PC][5:]] <= Reg[[PC][25:2]] op Reg[[PC][2:6]] In order to accomplish this we must break up the instruction. (kind of like introducing variables when programming) g. babic Presentation H 4

Breaking down an instruction ISA definition of arithmetic: Reg[[PC][5:]] <= Reg[[PC][25:2]] op Reg[[PC][2:6]] Could break down to: IR <= [PC] A <= Reg[IR[25:2]] B <= Reg[IR[2:6]] ALUOut <= A op B Reg[IR[2:6]] <= ALUOut We forgot an important part of the definition of arithmetic! PC <= PC + 4 Idea behind multicycle approach We define each instruction from the ISA perspective (do this!) Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps) Introduce new registers as needed (e.g, A, B, ALUOut, MDR, etc.) Finally try and pack as much work into each step (avoid unnecessary cycles) while also trying to share steps where possible (minimizes control, helps to simplify solution) Result: Our book s multicycle Implementation! g. babic Presentation H 5 g. babic Presentation H 6 Fetch Decode and Register Fetch Execution, Address Computation, or Branch Completion Access or R-type instruction completion Write-back step Five Execution Steps INSTRUCTIONS TAKE FROM 3-5 CYCLES! Step : Fetch Use PC to get instruction and put it in the Register. Increment the PC by 4 and put the result back in the PC. Can be described succinctly using RTL "Register-Transfer Language" IR <= [PC]; PC <= PC + 4; Can we figure out the values of the control signals? What is the advantage of updating the PC now?

Step 2: Decode and Register Fetch registers rs and rt in case we need them Compute the branch address in case the instruction is a branch RTL: A <= Reg[IR[25:2]]; B <= Reg[IR[2:6]]; ALUOut <= PC + (sign-extend(ir[5:]) << 2); We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic) Step 3 (instruction dependent) ALU is performing one of three functions, based on instruction type Reference: R-type: Branch: ALUOut <= A + sign-extend(ir[5:]); ALUOut <= A op B; if (A==B) PC <= ALUOut; Step 4 (R-type or memoryaccess) Loads and stores access memory MDR <= [ALUOut]; or [ALUOut] <= B; Write-back step Reg[IR[2:6]] <= MDR; Which instruction needs this? R-type instructions finish Reg[IR[5:]] <= ALUOut; The write actually takes place at the end of the cycle on the edge

Summary: Simple Questions How many cycles will it take to execute this code? lw $t2, ($t3) lw $t3, 4($t3) beq $t2, $t3, Label add $t5, $t2, $t3 sw $t5, 8($t3) Label:... #assume not What is going on during the 8th cycle of execution? In what cycle does the actual addition of $t2 and $t3 takes place? Cond IorD MemtoReg IRWrite Outputs Op [5 ] PCSource ALUOp ALUSrcB ALUSrcA RegDst Shift [25-] 26 28 left 2 [3 26] PC PC [3 28] M ux Address [25 2] register M A ux data [2 6] Zero register 2 MemData ALU Registers ALU M [5 ] ux Write result Write [5 ] register data 2 B data 4 M register Write ux data 2 M 3 [5 ] ux 6 32 data Sign Shift ALU register extend left 2 control Jump M address ux [3 ] 2 ALUOut Review: finite state machines Finite state machines: a set of states and next state function (determined by current state and the input) output function (determined by current state and possibly input) Inputs Current state Clock Next-state function Output function Next state Outputs [5 ] We ll use a Moore machine (output based only on current state) g. babic Presentation H 5

Review: finite state machines Example: B. 37 A friend would like you to build an electronic eye for use as a fake security device. The device consists of three lights lined up in a row, controlled by the outputs Left, Middle, and Right, which, if asserted, indicate that a light should be on. Only one light is on at a time, and the light moves from left to right and then from right to left, thus scaring away thieves who believe that the device is monitoring their activity. Draw the graphical representation for the finite state machine used to specify the electronic eye. Note that the rate of the eye s movement will be controlled by the clock speed (which should not be too great) and that there are essentially no inputs. Implementing the Value of control signals is dependent upon: what instruction is being executed which step is being performed Use the information we ve accumulated to specify a finite state machine specify the finite state machine graphically, or use microprogramming Implementation can be derived from specification Finite State Machines Introduction Finite State Machine Example: 3 ones Draw the FSM Truth table PS Input NS Output g. babic Presentation H 9 g. babic Presentation H 2

Hardware Implementation of FSM + =? A current state is kept in the Current state register. Next state function is determined by current state and the input. Output function is determined by current state and input. We will use a Moore machine, where output is based only on current state. Figure B.. Inputs Finite State Machines Current state Clock Next-state function Output function Next state Outputs g. babic Presentation H 2 g. babic Presentation H 22 Finite State Machine Graph for Unit Figure 5.38 with corrections in red 2 3 4 address computation ALUSrcA = ALUSrcB = ALUOp = (Op ='LW') IorD = access RegDst= MemtoReg= (Op = 'SW') 5 Write-back step Start (Op = 'LW ') or (Op = 'SW ') IorD = access 6 7 ALUSrcA = IorD = IRW rite ALUSrcB = ALUOp = PCSource = Execution ALUSrcA = ALUSrcB = ALUOp= RegDst = MemtoReg = fetch R-type completion (Op = R-type) Branch completion 8 ALUSrcA = ALUSrcB = ALUOp = Cond PCSource = decode/ register fetch ALUSrcA = ALUSrcB = ALUOp = Jump completion PCSource = g. babic Presentation H 23 (Op = 'BEQ') 9 (Op = 'J') ALUSrcA= ALUSrcB= ALUOp= Graphical Specification of FSM Start Note: don t care if not mentioned asserted if name only otherwise exact value How many state bits will we need? 2 3 4 address computation ALUSrcA = ALUSrcB = ALUOp = IorD = access RegDst = MemtoReg = read completon step 6 IorD = fetch decode/ register fetch ALUSrcA = IorD = IRWrite ALUSrcA = ALUSrcB = ALUSrcB = ALUOp = ALUOp = PCSource = Execution ALUSrcA = ALUSrcB = ALUOp = access 5 7 8 RegDst = MemtoReg = Branch completion ALUSrcA = ALUSrcB = ALUOp = Cond PCSource = R-type completion 9 Jump completion PCSource =

Step : Fetch Use PC to get instruction and put it in the Register, i.e. IR [PC]; IorD=,, IRWrite Increment the PC by 4 and put the result back in the PC, i.e. PC [PC] + 4; ALUSrcA=, ALUSrcB=, ALUOp=, PCSource=, Can we figure out the values of the control signals? Here are rules for signals that are omitted: If signal for mux is not stated, it is don t care If ALU signals are not stated, they are don t care If,,, IRWrite, or Cond is not stated, it is unasserted, i.e. logical. Step 2: Decode & Register Fetch We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic) registers rs and rt in case we need them: A Reg[IR[25-2]]; B Reg[IR[2-6]]; Done automatically Compute the branch address in case the instruction is a branch: ALUOut PC + (sign-extend(ir[5-]) << 2); ALUSrcA=, ALUSrcB=, ALUOp= g. babic Presentation H 25 g. babic Presentation H 26 Step 3: Dependent ALU is performing one of three functions, based on instruction type Reference (lw or sw): ALUOut A + sign-extend(ir[5-]); ALUSrcA=, ALUSrcB=, ALUop= R-type: ALUOut A op B; ALUSrcA=, ALUSrcB=, ALUp= Branch on Equal: if (A==B) PC ALUOut; ALUSrcA=, ALUSrcB=, ALUp= PCSource=, Cond Note: beq instruction is done, thus this instruction requires 3 clock cycles to execute. g. babic Presentation H 27 Steps 4 and 5: Dependent Step 4: R-type and Access Loads and stores access memory MDR [ALUOut] (load); or [ALUOut] B (store); R-type instructions finish Reg[IR[5-]] ALUOut; Register write actually takes place at the end of the cycle on the falling edge Store and R-type instructions are done in 4 clock cycles Step 5: Write back (load only) Reg[IR[2-6]] MDR IorD=, IorD=, RegDst=, MemToReg=, RegDst=, MemToReg=, g. babic Presentation H 28

Summary of Executions Figure 5.5 Finite State Machine for Step name fetch decode/register fetch Action for R-type instructions Action for memory-reference instructions A Reg [IR[25-2]] IR [PC] PC PC + 4 Action for branches B Reg [IR[2-6]] ALUOut PC + (sign-extend (IR[5-]) << 2) Action for jumps Implementation: logic Cond IorD IRWrite MemtoReg PCSource Execution, address ALUOut A op B ALUOut A + sign-extend if (A ==B) then PC PC [3-28] II Outputs ALUOp ALUSrcB computation, branch/ (IR[5-]) PC ALUOut (IR[25-]<<2) jump completion ALUSrcA RegDst access or R-type Reg [IR[5-]] Load: MDR [ALUOut] completion ALUOut or Store: [ALUOut] B Op5 Op4 Op3 Op2 Inputs Op Op S3 S2 S S NS3 NS2 NS NS read completion Load: Reg[IR[2-6]] MDR register opcode field State register Note: Jump instruction added: PCSource=, g. babic Presentation H 29 PLA Implementation Op5 Op4 Op3 Op2 Op Op S3 S2 ROM Implementation ROM = " Only " values of memory locations are fixed ahead of time A ROM can be used to implement a truth table if the address is m-bits, we can address 2 m entries in the ROM. our outputs are the bits of data that the address points to. S S Cond IorD IRWrite MemtoReg PCSource PCSource ALUOp ALUOp ALUSrcB ALUSrcB ALUSrcA RegDst NS3 NS2 NS NS m m is the "height", and n is the "width" n

ROM Implementation How many inputs are there? 6 bits for opcode, 4 bits for state = address lines (i.e., 2 = 24 different addresses) How many outputs are there? 6 datapath-control outputs, 4 state bits = 2 outputs ROM is 2 x 2 = 2K bits (and a rather unusual size) Rather wasteful, since for lots of the entries, the outputs are the same i.e., opcode is often ignored ROM vs PLA Break up the table into two parts 4 state bits tell you the 6 outputs, 2 4 x 6 bits of ROM bits tell you the 4 next state bits, 2 x 4 bits of ROM Total: 4.3K bits of ROM PLA is much smaller can share product terms only need entries that produce an active output can take into account don't cares Size is (#inputs #product-terms) + (#outputs #product-terms) For this example = (x7)+(2x7) = 5 PLA cells PLA cells usually about the size of a ROM cell (slightly bigger) Another Implementation Style Complex instructions: the "next state" is often current state + unit PLA or ROM Input Outputs Cond IorD IRWrite BWrite MemtoReg PCSource ALUOp ALUSrcB ALUSrcA RegDst AddrCtl Details Dispatch ROM Dispatch ROM 2 Op Opcode name Value Op Opcode name Value R-format lw jmp sw beq lw PLA or ROM sw Adder Dispatch ROM 2 State Mux 3 2 Dispatch ROM Address select logic AddrCtl Adder State Address select logic Op[5 ] register opcode field State number Address-control action Value of AddrCtl Use incremented state 3 Use dispatch ROM 2 Use dispatch ROM 2 2 3 Use incremented state 3 4 Replace state number by 5 Replace state number by 6 Use incremented state 3 7 Replace state number by 8 Replace state number by 9 Replace state number by register opcode field

Microprogramming unit Cond IorD Microcode memory Datapath IRWrite BWrite Outputs MemtoReg PCSource ALUOp ALUSrcB ALUSrcA RegDst AddrCtl Input Microprogram counter Adder Address select logic register opcode field Microprogramming A specification methodology appropriate if hundreds of opcodes, modes, cycles, etc. signals specified symbolically using microinstructions Label ALU control SRC SRC2 Register control control Sequencing Fetch Add PC 4 PC ALU Seq Add PC Extshft Dispatch Mem Add A Extend Dispatch 2 LW2 ALU Seq Write MDR Fetch SW2 Write ALU Fetch Rformat Func code A B Seq Write ALU Fetch BEQ Subt A B ALUOut-cond Fetch JUMP Jump address Fetch Will two implementations of the same architecture have the same microcode? What would a microassembler do? Microinstruction format Field name Value Signals active Comment Add ALUOp = Cause the ALU to add. ALU control Subt ALUOp = Cause the ALU to subtract; this implements the compare for branches. Func code ALUOp = Use the instruction's function code to determine ALU control. SRC PC ALUSrcA = Use the PC as the first ALU input. A ALUSrcA = Register A is the first ALU input. B ALUSrcB = Register B is the second ALU input. SRC2 4 ALUSrcB = Use 4 as the second ALU input. Extend ALUSrcB = Use output of the sign extension unit as the second ALU input. Extshft ALUSrcB = Use the output of the shift-by-two unit as the second ALU input. two registers using the rs and rt fields of the IR as the register numbers and putting the data into registers A and B. Write ALU, Write a register using the rd field of the IR as the register number and Register RegDst =, the contents of the ALUOut as the data. control MemtoReg = Write MDR, Write a register using the rt field of the IR as the register number and RegDst =, the contents of the MDR as the data. MemtoReg = PC, memory using the PC as address; write result into IR (and lord = the MDR). ALU, memory using the ALUOut as address; write result into MDR. lord = Write ALU, Write memory using the ALUOut as address, contents of B as the lord = data. ALU PCSource = Write the output of the ALU into the PC. PC write control ALUOut-cond PCSource =, If the Zero output of the ALU is active, write the PC with the contents Cond of the register ALUOut. jump address PCSource =, Write the PC with the jump address from the instruction. Seq AddrCtl = Choose the next microinstruction sequentially. Sequencing Fetch AddrCtl = Go to the first microinstruction to begin a new instruction. Dispatch AddrCtl = Dispatch using the ROM. Dispatch 2 AddrCtl = Dispatch using the ROM 2. Maximally vs. Minimally Encoded No encoding: bit for each datapath operation faster, requires more memory (logic) used for Vax 78 an astonishing 4K of memory! Lots of encoding: send the microinstructions through logic to get control signals uses less memory, slower Historical context of CISC: Too much logic to put on a single chip with everything else Use a ROM (or even RAM) to hold the microcode It s easy to add new instructions

Microcode: Trade-offs Distinction between specification and implementation is sometimes blurred Specification Advantages: Easy to design and write Design architecture and microcode in parallel Implementation (off-chip ROM) Advantages Easy to change since values are in memory Can emulate other architectures Can make use of internal registers Implementation Disadvantages, SLOWER now that: is implemented on same chip as processor ROM is no longer faster than RAM No need to go back and make changes Historical Perspective In the 6s and 7s microprogramming was very important for implementing machines This led to more sophisticated ISAs and the VAX In the 8s RISC processors based on pipelining became popular Pipelining the microinstructions is also possible! Implementations of IA-32 architecture processors since 486 use: hardwired control for simpler instructions (few cycles, FSM control implemented using PLA or random logic) microcoded control for more complex instructions (large numbers of cycles, central control store) The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store g. babic Presentation H 42 Pentium 4 Pipelining is important (last IA-32 without it was 8386 in 985) Pentium 4 Somewhere in all that control we must handle complex instructions cache Enhanced floating point and multimedia Data cache Integer datapath I/O interface Secondary cache and memory interface Chapter 7 Chapter 6 cache Enhanced floating point and multimedia Data cache Integer datapath I/O interface Secondary cache and memory interface Advanced pipelining hyperthreading support Advanced pipelining hyperthreading support Pipelining is used for the simple instructions favored by compilers Simply put, a high performance implementation needs to ensure that the simple instructions execute quickly, and that the burden of the complexities of the instruction set penalize the complex, less frequently used, instructions g. babic Presentation H 43 Processor executes simple microinstructions, 7 bits wide (hardwired) 2 control lines for integer datapath (4 for floating point) If an instruction requires more than 4 microinstructions to implement, control from microcode ROM (8 microinstructions) Its complicated! g. babic Presentation H 44

Chapter 5 Summary If we understand the instructions We can build a simple processor! If instructions take different amounts of time, multi-cycle is better Datapath implemented using: Combinational logic for arithmetic State holding elements to remember bits implemented using: Combinational logic for single-cycle implementation Finite state machine for multi-cycle implementation g. babic Presentation H 45