Review Multicycle: What is Happening. Controlling The Multicycle Design


Review Multicycle: What is Happening

[Figure: multicycle MIPS datapath. Labelled elements: PC, memory (address, write data, memory data), instruction register IR with fields [31-26], [25-21], [20-16], [15-11], [15-0], memory data register, register file, A and B registers, sign extend, shift left 2, ALU (Zero, Result), ALUOut. Control signals shown: IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemToReg, ALUSrcA, ALUSrcB, ALUOp, PCSource, PCWrite.]

Controlling The Multicycle Design

[Figure: the same multicycle datapath, annotated with its control signals.]

Stage 1: Instruction fetch & PC increment

IR = Mem[PC]
PC = PC + 4

Controls: PCWrite, IorD (memory address comes from the PC), MemRead, IRWrite, ALUSrcA (selects the PC), ALUSrcB (selects the constant 4), ALUOp == add, PCSource (selects the ALU result)

[Figure: multicycle datapath with the instruction fetch and PC increment paths highlighted.]

Register File

Devoting a whole cycle only to reading registers is a waste.

[Figure: multicycle datapath with the register file read path highlighted.]

Stage 2: Register fetch & branch target

- Read the source registers.
- Compute the branch target address.

Controls: ALUSrcA (selects the PC), ALUSrcB (selects the sign-extended, shifted offset), ALUOp == add

[Figure: multicycle datapath with the register read and branch target paths highlighted.]

Stage 3 (beq): Branch completion

- Use the target address computed in stage 2.
- Check for equality of the register contents.

Controls: ALUSrcA (selects A), ALUSrcB (selects B), ALUOp == sub, PCSource (selects ALUOut), PCWrite conditional on Zero

[Figure: multicycle datapath with the comparison and PC update paths highlighted.]

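Finite-state machine for the control unit

[Figure: state diagram. The states, the control signals each one sets, and the transitions are:]

- Instruction fetch and PC increment (IorD, MemRead, IRWrite, ALUSrcA, ALUSrcB, ALUOp, PCSource, PCWrite). Next: Register fetch and branch computation.
- Register fetch and branch computation (ALUSrcA, ALUSrcB, ALUOp). Next, by opcode: Op = BEQ goes to Branch completion, Op = R-type goes to R-type execution, Op = LW/SW goes to Effective address computation.
- Branch completion (ALUSrcA, ALUSrcB, ALUOp, PCWrite = Zero, PCSource). Next: Instruction fetch.
- R-type execution (ALUSrcA, ALUSrcB, ALUOp = func). Next: R-type writeback.
- R-type writeback (RegWrite, RegDst, MemToReg). Next: Instruction fetch.
- Effective address computation (ALUSrcA, ALUSrcB, ALUOp). Next, by opcode: Op = SW goes to Memory write, Op = LW goes to Memory read.
- Memory write (MemWrite, IorD). Next: Instruction fetch.
- Memory read (MemRead, IorD). Next: Register write.
- Register write (RegWrite, RegDst, MemToReg). Next: Instruction fetch.

All instructions are the same for stages 1 and 2.

Comparing instruction execution times

In the single-cycle datapath, each instruction needs an entire clock cycle, or 8ns, to execute. With the multicycle CPU, different instructions need different numbers of clock cycles:

- A branch needs 3 cycles, or 3 x 2ns = 6ns.
- Arithmetic and sw instructions each require 4 cycles, or 8ns.
- Finally, a lw takes 5 cycles, or 10ns.

We can make some observations about performance already. Loads take longer with this multicycle implementation, branches are faster, and arithmetic and store instructions take the same 8ns as before. So if our program doesn't have too many loads, then we should see an increase in performance.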
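The cycle counts quoted for each instruction class can be checked by walking the state diagram. The following is a minimal sketch, not the control unit itself: the state names (`fetch`, `decode`, and so on) are hypothetical labels for the nine states in the diagram, and the 2ns clock period is the one assumed in the comparison above.

```python
CYCLE_NS = 2  # assumed multicycle clock period, in nanoseconds


def next_state(state, op):
    """Return the state that follows `state` for instruction class `op`."""
    if state == "fetch":
        return "decode"  # register fetch and branch computation
    if state == "decode":
        return {"beq": "branch", "rtype": "rtype_exec",
                "lw": "mem_addr", "sw": "mem_addr"}[op]
    if state == "rtype_exec":
        return "rtype_wb"
    if state == "mem_addr":
        return "mem_read" if op == "lw" else "mem_write"
    if state == "mem_read":
        return "reg_write"
    # branch, rtype_wb, mem_write and reg_write all return to fetch
    return "fetch"


def cycles(op):
    """Count the states visited before control returns to fetch."""
    state, count = "fetch", 0
    while True:
        count += 1
        state = next_state(state, op)
        if state == "fetch":
            return count


for op in ("beq", "rtype", "sw", "lw"):
    print(op, cycles(op), "cycles,", cycles(op) * CYCLE_NS, "ns")
```

Each pass through the loop stands for one clock cycle, so the count is simply the length of the path from the fetch state back to itself.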

The gcc example

Let's assume the gcc instruction mix:

Instruction | Frequency
Arithmetic | 48%
Loads | 22%
Stores | 11%
Branches | 19%

In a single-cycle datapath, all instructions take 8ns. The average execution time for an instruction on the multicycle processor works out to 8.06ns:

(48% x 8ns) + (22% x 10ns) + (11% x 8ns) + (19% x 6ns) = 3.84 + 2.20 + 0.88 + 1.14 = 8.06ns

The multicycle implementation is actually slightly slower.

Reconsider Memory's Role

Memory is 50ns, implying single-cycle = 104ns, implying a 9.6MHz clock rate.

For multi-cycle w/cache, let the processor stall on a cache miss:

- Keep the 2ns cycle time, or 500MHz clock rate.
- Instruction execution for GCC: 8.06ns.

Consider executing 10^9 instructions w/ 10^6 memory references: 50ns * 10^6 = 5 * 10^7 ns

- single-cycle = 104 seconds, for a total of 104.05 sec
- multi-cycle = memory time + instruction execution time = 0.05 + 8.06 seconds, for a total of 8.11 sec
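The averaging above is a weighted sum, and the totals add a fixed memory penalty to the instruction time. The sketch below reproduces that arithmetic. The specific figures are assumptions read from the slide: a mix of 48% arithmetic, 22% loads, 11% stores and 19% branches, 10^9 instructions, 10^6 memory references at 50ns each, and a 104ns single-cycle clock. The function names are illustrative only.

```python
# Instruction class -> (fraction of instructions, multicycle time in ns).
GCC_MIX = {
    "arithmetic": (0.48, 8),
    "loads": (0.22, 10),
    "stores": (0.11, 8),
    "branches": (0.19, 6),
}


def average_ns(mix):
    """Weighted average execution time per instruction, in ns."""
    return sum(fraction * ns for fraction, ns in mix.values())


def total_seconds(instructions, ns_per_instruction, memory_refs, memory_ns):
    """Instruction execution time plus memory stall time, in seconds."""
    total_ns = instructions * ns_per_instruction + memory_refs * memory_ns
    return total_ns * 1e-9


avg = average_ns(GCC_MIX)
print(f"average multicycle time: {avg:.2f} ns")
print(f"multicycle total:   {total_seconds(10**9, avg, 10**6, 50):.2f} s")
print(f"single-cycle total: {total_seconds(10**9, 104, 10**6, 50):.2f} s")
```

Changing the fractions in `GCC_MIX` shows how sensitive the comparison is to the share of loads, which is the only class that gets slower.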

Return: Finite-state machine for control

[Figure: the control unit state diagram, repeated from the earlier slide.]

Recall: Implementing the FSM

The FSM can be translated into a state table. The first 4 states:

Current State | Input (Op) | Next State
Instr Fetch | X | Reg Fetch
Reg Fetch | BEQ | Branch compl
Reg Fetch | R-type | R-type execute
Reg Fetch | LW/SW | Compute eff addr

[Output (control signals) columns: PCWrite, IorD, MemRead, MemWrite, IRWrite, RegWrite, MemToReg, RegDst, ALUSrcA, ALUSrcB, ALUOp, PCSource. The cell values did not survive extraction; X marks a don't-care.]

You can implement this the hard way (you don't want to do this):

- Represent the current state using flip-flops or a register.
- Find equations for the next state and (control signal) outputs in terms of the current state and input (instruction word).

Or you can use the easy way:

- Stick the whole state table into a memory, like a ROM.
- This would be much easier, since you don't have to derive equations.
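The "easy way" can be sketched as a lookup table addressed by the current state and the opcode. This is an illustration under stated assumptions rather than the slide's design: the 4-bit state numbering and the state names are invented here, unsupported opcodes fall back to the fetch state, and only the next-state column is stored because the control-signal values are not given. The opcode numbers are the standard MIPS encodings (R-type 0, beq 4, lw 35, sw 43).

```python
STATES = ["fetch", "decode", "branch", "rtype_exec", "rtype_wb",
          "mem_addr", "mem_read", "reg_write", "mem_write"]
S = {name: number for number, name in enumerate(STATES)}  # hypothetical encoding

RTYPE, BEQ, LW, SW = 0, 4, 35, 43  # MIPS opcodes


def address(state, opcode):
    """ROM address: 4 state bits followed by the 6 opcode bits."""
    return (state << 6) | opcode


def build_rom():
    # Default every entry to fetch, which also covers the states that
    # always return to fetch (branch, rtype_wb, mem_write, reg_write).
    rom = [S["fetch"]] * (16 * 64)
    for opcode in range(64):  # transitions that ignore the opcode
        rom[address(S["fetch"], opcode)] = S["decode"]
        rom[address(S["rtype_exec"], opcode)] = S["rtype_wb"]
        rom[address(S["mem_read"], opcode)] = S["reg_write"]
    rom[address(S["decode"], BEQ)] = S["branch"]
    rom[address(S["decode"], RTYPE)] = S["rtype_exec"]
    rom[address(S["decode"], LW)] = S["mem_addr"]
    rom[address(S["decode"], SW)] = S["mem_addr"]
    rom[address(S["mem_addr"], LW)] = S["mem_read"]
    rom[address(S["mem_addr"], SW)] = S["mem_write"]
    return rom


ROM = build_rom()
state = S["fetch"]
for _ in range(5):  # one lw instruction: five lookups, no equations
    print(STATES[state])
    state = ROM[address(state, LW)]
```

The table has 1024 entries for nine states, which shows the trade-off: no logic equations to derive, at the cost of a memory that is mostly don't-care entries.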

Motivation for microprogramming

Think of the control unit's state diagram as a program:

- Each state represents a command, or a set of control signals that tells the datapath what to do.
- Several commands are executed sequentially.
- Branches may be taken depending on the instruction opcode.
- The state machine loops by returning to the initial state.

We could invent a special language for the control unit:

- We could devise a more readable, higher-level notation rather than dealing directly with binary control signals and state transitions.
- We would design control units by writing programs in this language.
- We would depend on a hardware or software translator to convert our programs into a circuit for the control unit.

A good notation is very useful

Instead of specifying the exact binary values for each control signal, we will define a symbolic notation that's easier to work with.

- As a simple example, we might replace ALUSrcB = 01 with ALUSrcB = 4, meaning the constant 4.
- We can also create symbols that combine several control signals together. Instead of IorD = 0, MemRead = 1, IRWrite = 1, it would be nicer to just say something like Read PC.

Microinstructions

Label | ALU control | Src1 | Src2 | Register control | Memory | PCWrite control | Next

For the MIPS multicycle datapath we could define microinstructions with eight fields.

- These fields will be filled in symbolically, instead of in binary.
- They determine all the control signals for the datapath. There are only 8 fields because some of them specify more than one of the actual control signals.
- A microinstruction corresponds to one execution stage, or one cycle.

You can see that in each microinstruction, we can do something with the ALU, register file, memory, and program counter units.

Specifying ALU operations

Label | ALU control | Src1 | Src2 | Register control | Memory | PCWrite control | Next

ALU control selects the ALU operation:

- Add indicates addition for memory offsets or PC increments.
- Sub performs source register comparisons for beq.
- Func denotes the execution of R-type instructions.

SRC1 is either PC or A, for the ALU's first operand.

SRC2, the second operand, can be one of four different values:

- B, for R-type instructions and branch comparisons.
- 4, the constant to increment the PC.
- Extend, the sign-extended constant field for memory references.
- Extshift, the sign-extended, shifted constant for branch targets.

These correspond to the ALUOp, ALUSrcA and ALUSrcB control signals, except we use names like Add rather than the actual bit values.

Specifying register and memory actions

Label | ALU control | Src1 | Src2 | Register control | Memory | PCWrite control | Next

Register control selects a register file action:

- Read reads from registers rs and rt of the instruction word.
- Write ALU writes ALUOut into destination register rd.
- Write MDR saves MDR into destination register rt.

Memory chooses the memory unit's action:

- Read PC reads an instruction from address PC into IR.
- Read ALU reads data from address ALUOut into MDR.
- Write ALU writes B to memory address ALUOut.

Specifying PC actions

Label | ALU control | Src1 | Src2 | Register control | Memory | PCWrite control | Next

PCWrite control determines what happens to the PC:

- ALU sets PC to ALUOut, used in incrementing the PC.
- ALU-Zero writes ALUOut to PC only if the ALU's Zero condition is true. This is used to complete a branch instruction.

Next determines the next microinstruction to be executed:

- Seq causes the next microinstruction to be executed.
- Fetch returns to the initial instruction fetch stage.
- Dispatch i is similar to a switch or case statement; it branches depending on the actual instruction word.

Microprogramming the first stage

Below are two lines of microcode to implement the first two multicycle execution stages, instruction fetch and register fetch.

Label | ALU control | Src1 | Src2 | Register control | Memory | PCWrite control | Next
Fetch | Add | PC | 4 | | Read PC | ALU | Seq
 | Add | PC | Extshift | Read | | | Dispatch 1

The first line, labeled Fetch, involves several actions:

- Read from memory address PC.
- Use the ALU to compute PC + 4, and return it to the PC.
- Continue on to the next sequential microinstruction.

The second stage

Label | ALU control | Src1 | Src2 | Register control | Memory | PCWrite control | Next
Fetch | Add | PC | 4 | | Read PC | ALU | Seq
 | Add | PC | Extshift | Read | | | Dispatch 1

The second line implements the register fetch stage:

- Read registers rs and rt from the register file.
- Pre-compute PC + (sign-extend(IR[15-0]) << 2) for branches.
- Determine the next microinstruction based on the opcode of the current MIPS program instruction:

    switch (opcode) {
        case 4: goto BEQ1;
        case 0: goto Rtype1;
        case 43:
        case 35: goto Mem1;
    }
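The branch target pre-computation and the opcode extraction used by the dispatch can be written out directly. The sketch below assumes 32-bit MIPS instruction words, with the opcode in bits 31-26 and the branch offset in bits 15-0; the function names are hypothetical. Note that by the second stage the PC has already been incremented, so the value passed in is PC + 4.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def opcode(instruction):
    """Extract IR[31-26], the field the dispatch table switches on."""
    return (instruction >> 26) & 0x3F


def branch_target(incremented_pc, instruction):
    """Compute PC + (sign-extend(IR[15-0]) << 2), wrapped to 32 bits."""
    offset_bytes = sign_extend16(instruction) << 2
    return (incremented_pc + offset_bytes) & 0xFFFFFFFF


beq_back_one = 0x1000FFFF  # beq $0, $0, -1 (offset field is 0xFFFF)
print(opcode(beq_back_one))
print(hex(branch_target(0x00400008, beq_back_one)))
```

The target is computed for every instruction, before the opcode is known to be a branch, which is why the result is simply discarded when the dispatch goes elsewhere.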

Completing a beq instruction

Label | ALU control | Src1 | Src2 | Register control | Memory | PCWrite control | Next
BEQ1 | Sub | A | B | | | ALU-Zero | Fetch

Control would transfer to this microinstruction if the opcode was beq.

- Compute A-B, to set the ALU's Zero bit if A=B.
- Update PC with ALUOut (which contains the branch target from the previous cycle) if Zero is set.
- The beq is completed, so fetch the next instruction.

The 1 in the label BEQ1 reminds us that we came here via the first branch point ("dispatch table 1"), from the second execution stage.

Completing an arithmetic instruction

Label | ALU control | Src1 | Src2 | Register control | Memory | PCWrite control | Next
Rtype1 | func | A | B | | | | Seq
 | | | | Write ALU | | | Fetch

When the opcode indicates an R-type instruction:

- The first cycle performs an operation on registers A and B, based on the MIPS instruction's func field.
- The next stage writes the ALU output to register rd from the MIPS instruction word.

We can then go back to the Fetch microinstruction, to fetch and execute the next MIPS instruction.

Completing data transfer instructions

Label | ALU control | Src1 | Src2 | Register control | Memory | PCWrite control | Next
Mem1 | Add | A | Extend | | | | Dispatch 2
SW2 | | | | | Write ALU | | Fetch
LW2 | | | | | Read ALU | | Seq
 | | | | Write MDR | | | Fetch

For both sw and lw instructions, we should first compute the effective memory address, A + sign-extend(IR[15-0]).

Another dispatch or branch distinguishes between stores and loads:

- For sw, we store data (from B) to the effective memory address.
- For lw, we copy data from the effective memory address to register rt.

In either case, we continue on to Fetch when done.

Microprogramming vs. programming

Microinstructions correspond to control signals.

- They describe what is done in a single clock cycle.
- These are the most basic operations available in a processor.

Microprograms implement higher-level MIPS instructions.

- MIPS assembly language instructions are comparatively complex, each possibly requiring multiple clock cycles to execute.
- But each complex MIPS instruction can be implemented with several simpler microinstructions.
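Putting the microinstructions together, the Next field alone is enough to drive a small microsequencer. The following sketch is an illustration, not a definitive implementation: the row contents and the labels Fetch, BEQ1, Rtype1, Mem1, SW2 and LW2 are taken as read from the microcode tables, while the dispatch tables are keyed by instruction name instead of opcode bits for readability, and `trace` is a hypothetical helper.

```python
# Fields: label, ALU control, Src1, Src2, Register control, Memory,
# PCWrite control, Next.
MICROPROGRAM = [
    ("Fetch",  "Add",  "PC", "4",        "",          "Read PC",   "ALU",      "Seq"),
    ("",       "Add",  "PC", "Extshift", "Read",      "",          "",         "Dispatch 1"),
    ("BEQ1",   "Sub",  "A",  "B",        "",          "",          "ALU-Zero", "Fetch"),
    ("Rtype1", "func", "A",  "B",        "",          "",          "",         "Seq"),
    ("",       "",     "",   "",         "Write ALU", "",          "",         "Fetch"),
    ("Mem1",   "Add",  "A",  "Extend",   "",          "",          "",         "Dispatch 2"),
    ("SW2",    "",     "",   "",         "",          "Write ALU", "",         "Fetch"),
    ("LW2",    "",     "",   "",         "",          "Read ALU",  "",         "Seq"),
    ("",       "",     "",   "",         "Write MDR", "",          "",         "Fetch"),
]

# Row index of every labelled microinstruction.
LABELS = {row[0]: index for index, row in enumerate(MICROPROGRAM) if row[0]}

DISPATCH = {
    1: {"beq": "BEQ1", "rtype": "Rtype1", "lw": "Mem1", "sw": "Mem1"},
    2: {"sw": "SW2", "lw": "LW2"},
}


def trace(instruction):
    """Return the row indices executed for one MIPS instruction."""
    row, executed = LABELS["Fetch"], []
    while True:
        executed.append(row)
        next_field = MICROPROGRAM[row][7]
        if next_field == "Seq":
            row += 1
        elif next_field == "Fetch":
            return executed  # the next instruction would start here
        else:  # "Dispatch i": pick the target from dispatch table i
            table = int(next_field.split()[1])
            row = LABELS[DISPATCH[table][instruction]]


for name in ("beq", "rtype", "sw", "lw"):
    print(name, trace(name))
```

One row is executed per clock cycle, so the length of each trace matches the cycle count of the corresponding path through the state machine.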

Similarities with assembly language

Microcode is intended to make control unit design easier.

- We defined symbols like Read PC to replace binary control signals.
- A translator converts microinstructions into a real control unit.
- The translation is straightforward, because each microinstruction corresponds to one set of control values.

This sounds similar to MIPS assembly language!

- We use mnemonics like lw instead of binary opcodes like 100011.
- MIPS programs must be assembled to produce real machine code.
- Each MIPS instruction corresponds to a 32-bit instruction word.

Managing complexity

It looks like all we've done is devise a new notation that makes it easier to specify control signals. And that's exactly right! The issue is managing complexity.

- Control units are probably the most challenging part of CPU design.
- Large instruction sets require large state machines with many states, branches and outputs.
- Control units for multicycle processors are difficult to create and maintain.

Applying programming ideas to hardware design is a useful technique.
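The translation step can be pictured as a table lookup from each symbolic field value to the control signals it stands for. The sketch below covers four of the eight fields. The bit values are an assumption: they follow the encodings commonly used for the textbook multicycle MIPS datapath (for example ALUSrcB = 01 for the constant 4), and the field keys and `translate` function are hypothetical names.

```python
# Symbolic field value -> control signals it asserts (assumed encodings).
FIELD_SIGNALS = {
    "src1": {
        "PC": {"ALUSrcA": 0},
        "A": {"ALUSrcA": 1},
    },
    "src2": {
        "B": {"ALUSrcB": 0b00},
        "4": {"ALUSrcB": 0b01},
        "Extend": {"ALUSrcB": 0b10},
        "Extshift": {"ALUSrcB": 0b11},
    },
    "register": {
        "Read": {},  # register reads need no control signal
        "Write ALU": {"RegWrite": 1, "RegDst": 1, "MemtoReg": 0},
        "Write MDR": {"RegWrite": 1, "RegDst": 0, "MemtoReg": 1},
    },
    "memory": {
        "Read PC": {"IorD": 0, "MemRead": 1, "IRWrite": 1},
        "Read ALU": {"IorD": 1, "MemRead": 1},
        "Write ALU": {"IorD": 1, "MemWrite": 1},
    },
}


def translate(**fields):
    """Merge the control signals implied by each symbolic field value."""
    signals = {}
    for field, symbol in fields.items():
        signals.update(FIELD_SIGNALS[field][symbol])
    return signals


# The Fetch microinstruction's Src1, Src2 and Memory fields.
print(translate(src1="PC", src2="4", memory="Read PC"))
```

Because each symbol maps to a fixed set of values, the translator needs no knowledge of the instruction set, which is the sense in which the translation is straightforward.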

Cases when microprogramming is bad

One disadvantage of microprograms is that looking up control signals in a ROM can be slower than generating them from simplified circuits.

Sometimes complex instructions implemented in hardware are slower than equivalent assembly programs written using simpler instructions.

- Complex instructions are usually very general, so they can be used more often. But this also means they can't be optimized for specific operands or situations.
- Some microprograms just aren't written very efficiently. But since they're built into the CPU, people are stuck with them (at least until the next processor upgrade).

How microcode is used today

Modern CISC processors (like x86) use a combination of hardwired logic and microcode to balance design effort with performance.

- Control for many simple instructions can be implemented in hardwired control, which can be faster than reading a microcode ROM.
- Less-used or very complex instructions are microprogrammed to make the design easier and more flexible (floats, divide).

In this way, designers respect the first law of performance: make the common case fast!