Outline. Combinational Element. State (Sequential) Element. Clocking Methodology. Input/Output of Elements

Transcription:

Outline
- Combinational & sequential logic
- Single-cycle CPU
- Multi-cycle CPU

Combinational Element
- Examples of Combinational Elements

State (Sequential) Element
[Figure: state element]

Clocking Methodology

Input/Output of Elements
[Figure: element inputs and outputs]

Register File
[Figure: register file]

MIPS64 Instruction Formats
[Figure: instruction format field layouts]

Common Steps in Execution
Execution of all instructions requires the following steps:
- send the PC to memory and fetch the instruction stored at the location specified by the PC
- read 1-2 registers, using the fields of the instruction that specify the registers
All instructions use ALU functionality:
- data transfer instructions: compute the address
- ALU instructions: execute ALU operations
- branch instructions: comparison & address computation

Differences in Execution
- Data transfer (strictly load/store ISA)
  - load: access memory for read data {ld R1, 0(R2)}
  - store: access memory for write data {sd R1, 0(R2)}
- ALU instruction
  - no memory access for operands
  - access a register for write of result {add R1, R2, R3}
- Branch instruction
  - change PC content based on comparison {bnez R1, Loop}

Summary
[Figure: summary of instruction execution]

Datapath & Control Path
- Datapath is the signal path through which data in the CPU flows, including the functional elements
- Elements of the datapath:
  - combinational elements
  - state (sequential) elements
- Control path:
  - the signal path from the controller to the datapath elements
  - exercises timing & control over datapath elements

What Should be in the Datapath
At a minimum we need combinational and sequential logic elements in the datapath to support the following functions:
- fetch instructions and data from memory
- read registers
- decode instructions and dispatch them to the execution unit
- execute arithmetic & logic operations
- update state elements (registers and memory)

Datapath Schematic
[Figure: datapath schematic]

Datapath Building Blocks: Instruction Access
- Program Counter (PC): a register that points to the next instruction to be fetched; it is incremented each clock cycle
- Content of PC is input to Instruction Memory
- The instruction is fetched and supplied to upstream datapath elements
- Adder is used to increment PC by 4 in preparation for the next instruction (why 4?)
- Adder: an ALU with control input hardwired to perform add instruction only
- For reasons that will become clear later, we assume separate memory units for instructions & data

Datapath Building Blocks: R-Type Instruction
- Used for arithmetic & logic operations
  - read two registers, rs and rt
  - ALU operates on the registers' content
  - write result to rd
- Example: add R1, R2, R3: rs=R2, rt=R3, rd=R1
- Controls:
  - RegWrite is asserted to enable write at clock edge
  - ALUop to control operation

I-Type Instruction: load/store
- rs contains the base field for the displacement address mode
- rt specifies the register
  - to load from memory for load
  - to write to memory for store
- Immediate contains the address offset
- To compute the memory address, we must
  - sign-extend the 16-bit immediate to 64 bits
  - add it to the base in rs

Required Datapath Elements for load/store
- Register file
  - load: registers to read for base address & to write for data
  - store: registers to read for base address & for data
- Sign extender
  - to sign-extend and condition the immediate field for 2's complement addition of the address offset using the 64-bit ALU
- ALU
  - to add base address and sign-extended immediate field
- Data memory to load/store data
  - memory address; data input for store; data output for load
  - control inputs: MemRead, MemWrite, clock
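The load/store address computation described above (sign-extend the 16-bit immediate, then add it to the base register) can be sketched in a few lines of Python. This is only an illustrative model: the function names `sign_extend` and `effective_address` are made up for this sketch, and the 64-bit datapath width is taken as an assumption from the slides.

```python
def sign_extend(value: int, bits: int = 16) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def effective_address(base: int, imm16: int, width: int = 64) -> int:
    """Base register plus sign-extended offset, wrapped to the datapath width."""
    return (base + sign_extend(imm16)) & ((1 << width) - 1)


# Hypothetical example values: base 0x1000 with offsets +8 and -4.
print(hex(effective_address(0x1000, 0x0008)))  # 0x1008
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc
```

The second call shows why sign extension matters: 0xFFFC must be treated as -4, not as 65532.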

Datapath Building Blocks: load/store
[Figure: load/store datapath. Instruction fields: opcode (6 bits), rs (5), rt (5), immediate (16). Elements: Registers (read reg 1, read reg 2, write reg, RegWrite), sign extend, ALU (ALUop, zero), data memory (MemRead, MemWrite)]

I-Type Instruction: bne
- Branch datapath must compute the branch condition & the branch address
- rs and rt refer to the registers to be compared for the branch condition
  - if Reg[rs] != Reg[rt], PC = PC + Imm<<2 (note that at this point PC is already incremented; in effect PC current = (PC previous + 4) + Imm<<2)
  - else if Reg[rs] == Reg[rt], PC remains unchanged: PC current = (PC previous + 4), and the next sequential instruction is taken
- Required functional elements: RegFile, sign extender, adder, shifter

Sign Extend & Shift Operations
- Sign extension is required because
  - the 16-bit offset must be expanded to 64 bits in order to be used in the 64-bit adder
  - we are using 2's complement arithmetic
- Shift by 2 is required because
  - instructions are 32 bits wide and are aligned on a word (4 bytes) boundary
  - in effect we are using an 18-bit offset instead of 16

Datapath Building Blocks: bne
[Figure: bne datapath]

Computing PC & Branch Condition
- The operands of bne are compared in the same ALU we use for load/store/arithmetic/logic instructions
  - the ALU provides a ZERO output signal to indicate the condition
  - the ZERO signal controls what instruction will be fetched next, depending on whether the branch is taken or not
- We also need to compute the address
  - we may not be able to use the ALU if it is being used to compute the branch condition (more on this later)
  - need an additional ADDER (an ALU hardwired to add only) to compute the branch address

Putting it All Together
- Combine datapath building blocks to build the full datapath
  - now we must decide some specifics of implementation
- Single-cycle CPU
  - each instruction executes in one clock cycle
  - CPI=1 for all instructions
- Multi-cycle CPU
  - instructions execute in multiples of a shorter clock cycle
  - different instructions have different CPI
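As a rough illustration of the bne next-PC rule above, the following Python sketch computes the next PC from the two register values and the 16-bit immediate. The function names are hypothetical, and the model ignores datapath width; it only mirrors the "PC + 4, then add the sign-extended offset shifted left by 2" rule.

```python
def sign_extend16(imm: int) -> int:
    """Treat the low 16 bits as a two's complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc_bne(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """Next PC for bne: the PC is incremented first, then the offset is applied."""
    pc_plus_4 = pc + 4
    if rs_val != rt_val:
        # Branch taken: word offset becomes a byte offset via the shift by 2.
        return pc_plus_4 + (sign_extend16(imm16) << 2)
    # Branch not taken: fall through to the next sequential instruction.
    return pc_plus_4


print(hex(next_pc_bne(0x100, 1, 2, 3)))       # 0x110
print(hex(next_pc_bne(0x100, 1, 1, 3)))       # 0x104
print(hex(next_pc_bne(0x100, 1, 2, 0xFFFF)))  # 0x100
```

The last call uses an offset of -1 word, which branches back to the bne itself.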

Single-Cycle CPU
- One clock cycle for all instructions
- No datapath resource can be used more than once per clock cycle
  - results in resource duplication for elements that must be used more than once
  - examples: separate memory units for instruction and data; two ALUs for conditional branches
- Some datapath elements may be shared through multiplexing as long as they are used once

The Processor: Datapath & Control
- We're ready to look at an implementation of the MIPS
- Simplified to contain only:
  - memory-reference instructions: lw, sw
  - arithmetic-logical instructions: add, sub, and, or, slt
  - control flow instructions: beq, j
- Generic Implementation:
  - use the program counter (PC) to supply the instruction address
  - get the instruction from memory
  - read registers
  - use the instruction to decide exactly what to do
- All instructions use the ALU after reading the registers. Why? memory-reference? arithmetic? control flow?

MIPS Fetch-Execute Processor Architecture
[Figure sequence: processor block diagram, one step highlighted per slide]
- Initialize first instruction
- Activate
- Route to Register

- Route instruction to Instruction Register (IR)
- Select appropriate data from Register File
- Route data to Arithmetic Logic Unit (ALU)
- Do the computation
- Store the result
- Increment PC, point to next instruction

- Increment PC, point to next instruction
- Execute next instruction

State Elements
- Unclocked vs. Clocked
- Clocks are used in synchronous logic
  - when should an element that contains state be updated?
[Figure: clock waveform showing falling edge, rising edge, clock period (cycle time)]

An unclocked state element
- The set-reset latch
  - output depends on present inputs and also on past inputs
[Figure: set-reset latch with inputs R, S and outputs Q and its complement]

Latches and Flip-flops
- Output is equal to the stored value inside the element (don't need to ask for permission to look at the value)
- Change of state (value) is based on the clock
- Latches: output changes whenever the inputs change and the clock is asserted (level-triggered methodology)
- Flip-flops: state changes only on a clock edge (edge-triggered methodology)
- A clocking methodology defines when signals can be read and written; we wouldn't want to read a signal at the same time it was being written

D-latch
- Two inputs:
  - the data value to be stored (D)
  - the clock signal (C) indicating when to read & store D
- Two outputs:
  - the value of the internal state (Q) and its complement
[Figure: D-latch]
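The difference between level-triggered latches and edge-triggered flip-flops described above can be seen in a small behavioural sketch. The classes below are illustrative only: the names `DLatch` and `DFlipFlop` are made up here, and the flip-flop is assumed to trigger on the rising edge (the slides do not say which edge is used).

```python
class DLatch:
    """Level-triggered: Q follows D for as long as the clock C is asserted."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, c: int) -> int:
        if c:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered: Q samples D only on a 0 -> 1 clock transition (assumed rising edge)."""

    def __init__(self) -> None:
        self.q = 0
        self._prev_c = 0

    def update(self, d: int, c: int) -> int:
        if c and not self._prev_c:
            self.q = d
        self._prev_c = c
        return self.q


# (D, C) pairs: D changes to 0 while the clock is still high in the third sample.
samples = [(1, 0), (1, 1), (0, 1), (0, 0), (1, 0), (1, 1)]
latch, flop = DLatch(), DFlipFlop()
latch_trace = [latch.update(d, c) for d, c in samples]
flop_trace = [flop.update(d, c) for d, c in samples]
print(latch_trace)  # [0, 1, 0, 0, 0, 1]
print(flop_trace)   # [0, 1, 1, 1, 1, 1]
```

The traces differ from the third sample on: the latch passes the changed D through while the clock is high, whereas the flip-flop holds the value it captured at the edge.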

D flip-flop
- Output changes only on the clock edge
[Figure: D flip-flop built from two D latches]

Our Implementation
- An edge-triggered methodology
- Typical execution:
  - read contents of some state elements
  - send values through some combinational logic
  - write results to one or more state elements
[Figure: State element 1 -> Combinational logic -> State element 2, within one clock cycle]

Register File
- Built using D flip-flops
[Figure: register file with two read register numbers, two read data outputs, a write register number and write data; Register 0 ... Register n selected through multiplexors]

Abstraction
- Make sure you understand the abstractions!
- Sometimes it is easy to think you do, when you don't
[Figure: a wide multiplexor built from 1-bit multiplexors sharing one Select input]
- Do you understand? What is the above?

Register File
- Note: we still use the real clock to determine when to write
[Figure: register file write logic with an n-to-2^n decoder on the register number, Write, and register data]

Simple Implementation
- Include the functional units we need for each instruction
[Figure: a. Instruction memory, b. Program counter, c. Adder; a. Data memory unit, b. Sign-extension unit; a. Registers, b. ALU]

Building the Datapath
- Use multiplexors to stitch them together
[Figure: single-cycle datapath with PCSrc, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite and ALU operation controls]

Control
- Selecting the operations to perform (ALU, read/write, etc.)
- Controlling the flow of data (multiplexor inputs)
- Information comes from the 32 bits of the instruction
- Example: add $8, $17, $18
  Instruction format:
  000000 | 10001 | 10010 | 01000 | 00000 | 100000
  op     | rs    | rt    | rd    | shamt | funct
- ALU's operation is based on instruction type and function code

Control
- e.g., what should the ALU do with this instruction?
- Example: lw $1, 100($2)
  35 | 2  | 1  | 100
  op | rs | rt | 16 bit offset
- ALU control input:
  0000 AND
  0001 OR
  0010 add
  0110 subtract
  0111 set-on-less-than
  1100 NOR
- Why is the code for subtract 0110 and not 0011?

Control
- Must describe hardware to compute the 4-bit ALU control input
  - given instruction type: 00 = lw, sw; 01 = beq; 10 = arithmetic
  - function code for arithmetic
- ALUOp is computed from instruction type
- Describe it using a truth table (can turn into gates)

Control
[Figure: single-cycle datapath with control unit; control outputs RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite]

Instruction | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0
R-format    | 1      | 0      | 0        | 1        | 0       | 0        | 0      | 1      | 0
lw          | 0      | 1      | 1        | 1        | 1       | 0        | 0      | 0      | 0
sw          | X      | 1      | X        | 0        | 0       | 1        | 0      | 0      | 0
beq         | X      | 0      | X        | 0        | 0       | 0        | 1      | 0      | 1
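To make the two-level decoding above concrete, here is a minimal Python sketch that splits an R-format word into its fields and derives the 4-bit ALU control input from ALUOp and the function code. The function names are hypothetical. The funct values used (add 100000, sub 100010, and 100100, or 100101, slt 101010) are the standard MIPS encodings, not something spelled out in the slides.

```python
def decode_r_format(word: int) -> dict:
    """Split a 32-bit instruction word into op, rs, rt, rd, shamt, funct."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# funct field -> 4-bit ALU control input, for the arithmetic instructions in the subset.
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Compute the ALU control input from the 2-bit ALUOp and the funct field."""
    if alu_op == 0b00:   # lw, sw: add to form the address
        return 0b0010
    if alu_op == 0b01:   # beq: subtract to compare
        return 0b0110
    return FUNCT_TO_ALU[funct]  # arithmetic: defer to the function code


# add $8, $17, $18 encodes as 000000 10001 10010 01000 00000 100000.
fields = decode_r_format(0x02324020)
print(fields)
print(format(alu_control(0b10, fields["funct"]), "04b"))  # 0010
```

Keeping ALUOp as a small intermediate code means the main control only looks at the opcode, while the funct field is consulted by the separate ALU control block.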

[Figures: Load; Branch on Equal]

Control
- Simple combinational logic (truth tables)
[Figure: ALU control block with inputs ALUOp1, ALUOp0 and F3-F0 (function field bits 5-0), outputs Operation2-Operation0; main control with inputs Op5-Op0 decoding R-format, lw, sw, beq, and outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]

Our Simple Control Structure
- All of the logic is combinational
- We wait for everything to settle down, and the right thing to be done
  - ALU might not produce the "right answer" right away
  - we use write signals along with the clock to determine when to write
- Cycle time is determined by the length of the longest path
[Figure: State element 1 -> Combinational logic -> State element 2, within one clock cycle]
- We are ignoring some details like setup and hold times

Single Cycle Implementation
- Calculate cycle time assuming negligible delays except:
  - memory (200ps), ALU and adders (100ps), register file access (50ps)
[Figure: single-cycle datapath]

Where we are headed
- Single Cycle Problems:
  - what if we had a more complicated instruction like floating point?
  - wasteful of area
- One Solution:
  - use a "smaller" cycle time
  - have different instructions take different numbers of cycles
  - a "multicycle" datapath:
[Figure: multicycle datapath with PC, Memory (address, data), Instruction register, Memory data register, Registers, A, B, ALU, ALUOut]
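The cycle-time calculation asked for above can be worked through with a short script. This is a sketch under stated assumptions: the delays are taken as 200 ps for memory, 100 ps for the ALU and adders, and 50 ps for a register file access, and the list of units each instruction class passes through is the usual critical path for this single-cycle design, written out here by hand.

```python
MEM_PS, ALU_PS, REG_PS = 200, 100, 50  # assumed unit delays in picoseconds

# Units on the critical path of each instruction class, in order of use.
CRITICAL_PATH_PS = {
    "R-type": [MEM_PS, REG_PS, ALU_PS, REG_PS],       # fetch, reg read, ALU, reg write
    "lw": [MEM_PS, REG_PS, ALU_PS, MEM_PS, REG_PS],   # fetch, reg read, ALU, data mem, reg write
    "sw": [MEM_PS, REG_PS, ALU_PS, MEM_PS],           # fetch, reg read, ALU, data mem
    "beq": [MEM_PS, REG_PS, ALU_PS],                  # fetch, reg read, ALU compare
    "j": [MEM_PS],                                    # fetch only
}

latency_ps = {name: sum(path) for name, path in CRITICAL_PATH_PS.items()}
cycle_time_ps = max(latency_ps.values())  # the clock must fit the slowest instruction

for name, ps in latency_ps.items():
    print(f"{name:7s} {ps} ps")
print("single-cycle clock period:", cycle_time_ps, "ps")
```

Under these assumptions the load sets the clock period, so every other instruction wastes part of the cycle, which is the motivation for the multicycle design that follows.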

Multicycle Approach
- Reuse functional units
  - ALU used to compute address and to increment PC
  - Memory used for instruction and data
- Our control signals will not be determined directly by instruction
  - e.g., what should the ALU do for a "subtract" instruction?
- There must be some sequencing involved, leading to the use of a finite state machine for control

Multicycle Approach
- Break up the instructions into steps, each step takes a cycle
  - balance the amount of work to be done
  - restrict each cycle to use only one major functional unit
- At the end of a cycle
  - store values for use in later cycles (easiest thing to do)
  - this introduces additional "internal" registers
[Figure: multicycle datapath with PC, Memory, Instruction register (fields [25-21], [20-16], [15-11], [15-0]), Memory data register, Registers, A, B, sign extend, shift left 2, ALU (Zero), ALUOut]

Instructions from ISA perspective
- Consider each instruction from the perspective of the ISA
- Example: the add instruction changes a register
  - instruction specified by the PC
  - destination register specified by bits 15:11 of the instruction
  - new value is the sum ("op") of two registers
  - source registers specified by bits 25:21 and 20:16 of the instruction
    Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]
- In order to accomplish this, we must break up the instruction (kind of like introducing variables when programming)

Breaking down an instruction
- ISA definition of arithmetic:
    Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]
- Could break down to:
    IR <= Memory[PC]
    A <= Reg[IR[25:21]]
    B <= Reg[IR[20:16]]
    ALUOut <= A op B
    Reg[IR[15:11]] <= ALUOut
- Don't forget an important part of the operation:
    PC <= PC + 4

Idea behind multicycle approach
- We define each instruction from the ISA perspective
- Break it down into steps following the rule that data flows through, at most, one major functional unit (e.g., balance work across steps)
- Introduce new registers as needed (A, B, ALUOut, MDR, etc.)
- Finally, try to pack as much work into each step as possible (avoid unnecessary cycles) while also trying to share steps where possible (minimizes control and likely hardware, helps to simplify the solution)
- Result: the textbook's multicycle implementation

Five Execution Steps
- Instruction Fetch
- Instruction Decode and Register Fetch
- Execution, Memory Address Computation, or Branch Completion
- Memory Access or R-type instruction completion
- Write-back step
INSTRUCTIONS TAKE FROM 3-5 CYCLES!

Step 1: Instruction Fetch
- Use PC to get the instruction and put it in the Instruction Register
- Increment the PC by 4 and put the result back in the PC
- Can be described succinctly using RTL, "Register-Transfer Language":
    IR <= Memory[PC];
    PC <= PC + 4;
- What is the advantage of updating the PC now?

Step 2: Instruction Decode and Register Fetch
- Read registers rs and rt in case we need them
- Compute the branch address in case the instruction is a branch
- RTL:
    A <= Reg[IR[25:21]];
    B <= Reg[IR[20:16]];
    ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
- We aren't setting any control lines based on the instruction type

Step 3 (instruction dependent)
- ALU is performing one of three functions, based on instruction type
- Memory Reference:
    ALUOut <= A + sign-extend(IR[15:0]);
- R-type:
    ALUOut <= A op B;
- Branch:
    if (A==B) PC <= ALUOut;

Step 4 (R-type or memory-access)
- Loads and stores access memory
    MDR <= Memory[ALUOut];
  or
    Memory[ALUOut] <= B;
- R-type instructions finish
    Reg[IR[15:11]] <= ALUOut;
- The write actually takes place at the end of the cycle on the edge

Step 5: Write-back step
    Reg[IR[20:16]] <= MDR;
- Which instruction needs this?

Summary:
[Table: actions of each step for R-type, memory-reference, branch and jump instructions]
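The five RTL steps above can be traced with a small Python model. This is a sketch, not the textbook's implementation: the dictionary-based `cpu` state and the function names are invented for illustration, only lw, sw, add, sub and beq are modelled, memory is a word-addressed dictionary keyed by byte address, and the return value is simply the number of steps (cycles) the instruction used.

```python
def sign_extend16(word: int) -> int:
    """Sign-extend the low 16 bits of an instruction word."""
    imm = word & 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def run_instruction(cpu: dict) -> int:
    """Execute one instruction step by step; return the number of cycles used."""
    mem, reg = cpu["mem"], cpu["reg"]
    # Step 1: IR <= Memory[PC]; PC <= PC + 4
    ir = mem[cpu["pc"]]
    cpu["pc"] += 4
    # Step 2: A, B <= registers rs, rt; ALUOut <= branch target (computed speculatively)
    op = ir >> 26
    a = reg[(ir >> 21) & 0x1F]
    b = reg[(ir >> 16) & 0x1F]
    alu_out = cpu["pc"] + (sign_extend16(ir) << 2)
    # Step 3 onwards depends on the instruction type.
    if op == 0x04:                      # beq completes in step 3
        if a == b:
            cpu["pc"] = alu_out
        return 3
    if op == 0x00:                      # R-type (only add and sub modelled)
        funct = ir & 0x3F
        if funct not in (0x20, 0x22):
            raise ValueError("unsupported funct")
        alu_out = (a + b if funct == 0x20 else a - b) & 0xFFFFFFFF
        reg[(ir >> 11) & 0x1F] = alu_out    # Step 4: Reg[rd] <= ALUOut
        return 4
    if op not in (0x23, 0x2B):
        raise ValueError("unsupported opcode")
    alu_out = a + sign_extend16(ir)     # Step 3: memory address computation
    if op == 0x2B:                      # sw, Step 4: Memory[ALUOut] <= B
        mem[alu_out] = b
        return 4
    mdr = mem[alu_out]                  # lw, Step 4: MDR <= Memory[ALUOut]
    reg[(ir >> 16) & 0x1F] = mdr        # lw, Step 5: Reg[rt] <= MDR
    return 5


# Hypothetical one-instruction program: lw $10, 0($11) with $11 = 100 and Memory[100] = 7.
cpu = {"pc": 0, "reg": [0] * 32, "mem": {0: (0x23 << 26) | (11 << 21) | (10 << 16), 100: 7}}
cpu["reg"][11] = 100
print(run_instruction(cpu), cpu["reg"][10])  # 5 7
```

Note that the model collapses each instruction into one function call; a real multicycle datapath would hold IR, A, B, ALUOut and MDR in registers between clock edges.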

Simple Questions
- How many cycles will it take to execute this code?
    lw $t2, 0($t3)
    lw $t3, 4($t3)
    beq $t2, $t3, Label
    nop
    add $t5, $t2, $t3
    sw $t5, 8($t3)
  Label: ...
- What is going on during the 8th cycle of execution?
- In what cycle does the actual addition of $t2 and $t3 take place?

[Figure: complete multicycle datapath with control. Control outputs: PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst; jump address formed from the instruction bits shifted left 2 and the upper bits of the PC]

Review: finite state machines
- Finite state machines:
  - a set of states and
  - next state function (determined by current state and the input)
  - output function (determined by current state and possibly input)
[Figure: inputs and current state feed the next-state function, clocked into the next state; the output function produces the outputs]
- We'll use a Moore machine (output based only on current state)

Review: finite state machines
- Example: B. 3 A friend would like you to build an "electronic eye" for use as a fake security device. The device consists of three lights lined up in a row, controlled by the outputs Left, Middle, and Right, which, if asserted, indicate that a light should be on. Only one light is on at a time, and the light "moves" from left to right and then from right to left, thus scaring away thieves who believe that the device is monitoring their activity. Draw the graphical representation for the finite state machine used to specify the electronic eye. Note that the rate of the eye's movement will be controlled by the clock speed (which should not be too great) and that there are essentially no inputs.

Implementing the Control
- Value of control signals is dependent upon:
  - what instruction is being executed
  - which step is being performed
- Use the information we've accumulated to specify a finite state machine
  - specify the finite state machine graphically, or
  - use microprogramming
- Implementation can be derived from the specification

Graphical Specification of FSM
- Note:
  - don't care if not mentioned
  - asserted if name only
  - otherwise exact value
- How many state bits will we need?
- States:
  - 0, Instruction fetch (Start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
  - 1, Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
  - 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
  - 3, Memory access (load): MemRead, IorD = 1
  - 4, Memory read completion step: RegDst = 0, RegWrite, MemtoReg = 1
  - 5, Memory access (store): MemWrite, IorD = 1
  - 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
  - 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0
  - 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
  - 9, Jump completion: PCWrite, PCSource = 10
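One way to read the state diagram is as a next-state function, which the following Python sketch encodes directly. It assumes the usual ten-state numbering for this multicycle design (0 fetch, 1 decode, 2 address computation, 3 and 5 memory access, 4 load write-back, 6 and 7 R-type, 8 branch, 9 jump) and the standard MIPS opcodes; the constant and function names are made up for this illustration.

```python
LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02  # standard MIPS opcodes


def next_state(state: int, opcode: int) -> int:
    """Next-state function of the control FSM (the opcode only matters in states 1 and 2)."""
    if state == 0:
        return 1
    if state == 1:
        return {LW: 2, SW: 2, RTYPE: 6, BEQ: 8, J: 9}[opcode]
    if state == 2:
        return {LW: 3, SW: 5}[opcode]
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8, 9 finish the instruction and return to fetch


def state_sequence(opcode: int) -> list:
    """States visited by one instruction, starting at fetch; one state per clock cycle."""
    seq = [0]
    while True:
        nxt = next_state(seq[-1], opcode)
        if nxt == 0:
            return seq
        seq.append(nxt)


for name, op in [("lw", LW), ("sw", SW), ("R-type", RTYPE), ("beq", BEQ), ("j", J)]:
    seq = state_sequence(op)
    print(f"{name:7s} states {seq} -> {len(seq)} cycles")
```

Because the machine is a Moore machine, the control outputs in each cycle depend only on the current state, so the length of each sequence is also the cycle count for that instruction class.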

Finite State Machine for Control
- Implementation:
[Figure: control logic block. Inputs: Op5-Op0 (instruction register opcode field) and state bits S3-S0. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and next-state bits NS3-NS0 fed to the state register]

PLA Implementation
- If I picked a horizontal or vertical line could you explain it?
[Figure: PLA with inputs Op5-Op0 and S3-S0, and outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst, NS3-NS0]

ROM Implementation
- ROM = "Read Only Memory"
  - values of memory locations are fixed ahead of time
- A ROM can be used to implement a truth table
  - if the address is m bits, we can address 2^m entries in the ROM
  - our outputs are the bits of data that the address points to
  - m is the "height", and n is the "width"
[Figure: ROM with m address inputs and n data outputs]

ROM Implementation
- How many inputs are there?
  6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)
- How many outputs are there?
  16 datapath-control outputs, 4 state bits = 20 outputs
- ROM is 2^10 x 20 = 20K bits (and a rather unusual size)
- Rather wasteful, since for lots of the entries, the outputs are the same
  - i.e., the opcode is often ignored

ROM vs PLA
- Break up the table into two parts
  - 4 state bits tell you the 16 outputs, 2^4 x 16 bits of ROM
  - 10 bits tell you the 4 next state bits, 2^10 x 4 bits of ROM
  - Total: 4.3K bits of ROM
- PLA is much smaller
  - can share product terms
  - only need entries that produce an active output
  - can take into account don't cares
- Size is (#inputs x #product-terms) + (#outputs x #product-terms)
  For this example = (10x17)+(20x17) = 510 PLA cells
- PLA cells are usually about the size of a ROM cell (slightly bigger)

Another Implementation Style
- Complex instructions: the "next state" is often current state + 1
[Figure: control unit with a PLA or ROM, a State register, an adder (+1), and address select logic driven by AddrCtl and the instruction register opcode field Op[5-0]. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, AddrCtl]
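The ROM and PLA size comparison above is plain arithmetic, reproduced here as a short Python check. The figures assumed are the ones on the slides: 6 opcode bits, 4 state bits, 16 datapath-control outputs, 4 next-state outputs and 17 product terms; the variable names are only for this sketch.

```python
OPCODE_BITS, STATE_BITS = 6, 4
CONTROL_OUTPUTS, NEXT_STATE_OUTPUTS = 16, 4
PRODUCT_TERMS = 17

inputs = OPCODE_BITS + STATE_BITS                 # 10 address lines
outputs = CONTROL_OUTPUTS + NEXT_STATE_OUTPUTS    # 20 output bits

# One big ROM: every (opcode, state) pair stores all 20 output bits.
single_rom_bits = 2 ** inputs * outputs

# Two ROMs: control outputs depend on state only; next state depends on opcode and state.
control_rom_bits = 2 ** STATE_BITS * CONTROL_OUTPUTS
next_state_rom_bits = 2 ** inputs * NEXT_STATE_OUTPUTS
split_rom_bits = control_rom_bits + next_state_rom_bits

# PLA: an AND plane (inputs x terms) plus an OR plane (outputs x terms).
pla_cells = inputs * PRODUCT_TERMS + outputs * PRODUCT_TERMS

print(single_rom_bits, single_rom_bits / 1024)   # 20480 20.0
print(split_rom_bits, split_rom_bits / 1024)     # 4352 4.25
print(pla_cells)                                 # 510
```

The split-ROM total comes out at 4.25K bits, which the slides round to 4.3K.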

Details

Dispatch ROM 1
Op     | Opcode name | Value
000000 | R-format    | 0110
000010 | jmp         | 1001
000100 | beq         | 1000
100011 | lw          | 0010
101011 | sw          | 0010

Dispatch ROM 2
Op     | Opcode name | Value
100011 | lw          | 0011
101011 | sw          | 0101

[Figure: PLA or ROM, State register, adder (+1), multiplexor selected by AddrCtl with inputs 3, 2, 1, 0 (incremented state, Dispatch ROM 2, Dispatch ROM 1, 0), address select logic, instruction register opcode field]

State number | Address-control action    | Value of AddrCtl
0            | Use incremented state     | 3
1            | Use dispatch ROM 1        | 1
2            | Use dispatch ROM 2        | 2
3            | Use incremented state     | 3
4            | Replace state number by 0 | 0
5            | Replace state number by 0 | 0
6            | Use incremented state     | 3
7            | Replace state number by 0 | 0
8            | Replace state number by 0 | 0
9            | Replace state number by 0 | 0

Microprogramming
[Figure: control unit with microcode memory, microprogram counter, adder (+1), address select logic driven by AddrCtl and the instruction register opcode field; outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst go to the datapath]
- What are the "microinstructions"?

Microprogramming
- A specification methodology
  - appropriate if hundreds of opcodes, modes, cycles, etc.
  - signals specified symbolically using microinstructions

Label    | ALU control | SRC1 | SRC2    | Register control | Memory    | PCWrite control | Sequencing
Fetch    | Add         | PC   | 4       |                  | Read PC   | ALU             | Seq
         | Add         | PC   | Extshft | Read             |           |                 | Dispatch 1
Mem1     | Add         | A    | Extend  |                  |           |                 | Dispatch 2
LW2      |             |      |         |                  | Read ALU  |                 | Seq
         |             |      |         | Write MDR        |           |                 | Fetch
SW2      |             |      |         |                  | Write ALU |                 | Fetch
Rformat1 | Func code   | A    | B       |                  |           |                 | Seq
         |             |      |         | Write ALU        |           |                 | Fetch
BEQ1     | Subt        | A    | B       |                  |           | ALUOut-cond     | Fetch
JUMP1    |             |      |         |                  |           | Jump address    | Fetch

- Will two implementations of the same architecture have the same microcode?
- What would a microassembler do?

Microinstruction format
Field name       | Value        | Signals active                     | Comment
ALU control      | Add          | ALUOp = 00                         | Cause the ALU to add.
ALU control      | Subt         | ALUOp = 01                         | Cause the ALU to subtract; this implements the compare for branches.
ALU control      | Func code    | ALUOp = 10                         | Use the instruction's function code to determine ALU control.
SRC1             | PC           | ALUSrcA = 0                        | Use the PC as the first ALU input.
SRC1             | A            | ALUSrcA = 1                        | Register A is the first ALU input.
SRC2             | B            | ALUSrcB = 00                       | Register B is the second ALU input.
SRC2             | 4            | ALUSrcB = 01                       | Use 4 as the second ALU input.
SRC2             | Extend       | ALUSrcB = 10                       | Use output of the sign extension unit as the second ALU input.
SRC2             | Extshft      | ALUSrcB = 11                       | Use the output of the shift-by-two unit as the second ALU input.
Register control | Read         |                                    | Read two registers using the rs and rt fields of the IR as the register numbers and putting the data into registers A and B.
Register control | Write ALU    | RegWrite, RegDst = 1, MemtoReg = 0 | Write a register using the rd field of the IR as the register number and the contents of the ALUOut as the data.
Register control | Write MDR    | RegWrite, RegDst = 0, MemtoReg = 1 | Write a register using the rt field of the IR as the register number and the contents of the MDR as the data.
Memory           | Read PC      | MemRead, IorD = 0                  | Read memory using the PC as address; write result into IR (and the MDR).
Memory           | Read ALU     | MemRead, IorD = 1                  | Read memory using the ALUOut as address; write result into MDR.
Memory           | Write ALU    | MemWrite, IorD = 1                 | Write memory using the ALUOut as address, contents of B as the data.
PC write control | ALU          | PCSource = 00, PCWrite             | Write the output of the ALU into the PC.
PC write control | ALUOut-cond  | PCSource = 01, PCWriteCond         | If the Zero output of the ALU is active, write the PC with the contents of the register ALUOut.
PC write control | jump address | PCSource = 10, PCWrite             | Write the PC with the jump address from the instruction.
Sequencing       | Seq          | AddrCtl = 11                       | Choose the next microinstruction sequentially.
Sequencing       | Fetch        | AddrCtl = 00                       | Go to the first microinstruction to begin a new instruction.
Sequencing       | Dispatch 1   | AddrCtl = 01                       | Dispatch using the ROM 1.
Sequencing       | Dispatch 2   | AddrCtl = 10                       | Dispatch using the ROM 2.

Maximally vs. Minimally Encoded
- No encoding:
  - 1 bit for each datapath operation
  - faster, requires more memory (logic)
  - used for Vax 780, an astonishing 400K of memory!
- Lots of encoding:
  - send the microinstructions through logic to get control signals
  - uses less memory, slower
- Historical context of CISC:
  - Too much logic to put on a single chip with everything else
  - Use a ROM (or even RAM) to hold the microcode
  - It's easy to add new instructions

Microcode: Trade-offs
- Distinction between specification and implementation is sometimes blurred
- Specification Advantages:
  - Easy to design and write
  - Design architecture and microcode in parallel
- Implementation (off-chip ROM) Advantages:
  - Easy to change since values are in memory
  - Can emulate other architectures
  - Can make use of internal registers
- Implementation Disadvantages, SLOWER now that:
  - Control is implemented on same chip as processor
  - ROM is no longer faster than RAM
  - No need to go back and make changes
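The sequencing scheme above (a state counter plus two dispatch ROMs, selected by AddrCtl) can be sketched in Python as follows. The table contents are the assumed values for the ten-state control described in these slides: AddrCtl 0 means "go to state 0", 1 and 2 select dispatch ROM 1 and 2, and 3 means "use the incremented state". All names are illustrative.

```python
# AddrCtl value for each state 0..9.
ADDR_CTL = [3, 1, 2, 3, 0, 0, 3, 0, 0, 0]

# Dispatch ROMs: opcode -> next state.
DISPATCH_ROM_1 = {0b000000: 6, 0b000010: 9, 0b000100: 8, 0b100011: 2, 0b101011: 2}
DISPATCH_ROM_2 = {0b100011: 3, 0b101011: 5}


def next_microaddress(state: int, opcode: int) -> int:
    """Address select logic: pick the next state from AddrCtl, the ROMs, or the adder."""
    ctl = ADDR_CTL[state]
    if ctl == 0:
        return 0                        # back to fetch
    if ctl == 1:
        return DISPATCH_ROM_1[opcode]   # first dispatch, after decode
    if ctl == 2:
        return DISPATCH_ROM_2[opcode]   # second dispatch, lw versus sw
    return state + 1                    # sequential: incremented state


def trace(opcode: int) -> list:
    """States visited for one instruction, starting from fetch."""
    states = [0]
    while True:
        nxt = next_microaddress(states[-1], opcode)
        if nxt == 0:
            return states
        states.append(nxt)


print(trace(0b100011))  # lw:  [0, 1, 2, 3, 4]
print(trace(0b101011))  # sw:  [0, 1, 2, 5]
print(trace(0b000100))  # beq: [0, 1, 8]
```

Compared with a full next-state table, the opcode is consulted only at the two dispatch points, which is why the dispatch ROMs can be so small.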

Historical Perspective
- In the '60s and '70s microprogramming was very important for implementing machines
- This led to more sophisticated ISAs and the VAX
- In the '80s RISC processors based on pipelining became popular
- Pipelining the microinstructions is also possible!
- Implementations of IA-32 architecture processors since the 486 use:
  - "hardwired control" for simpler instructions (few cycles, FSM control implemented using PLA or random logic)
  - "microcoded control" for more complex instructions (large numbers of cycles, central control store)
- The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store

Pentium 4
- Pipelining is important (last IA-32 without it was the 80386 in 1985)
[Figure: Pentium 4 die: Control, I/O interface, Instruction cache, Data cache, Enhanced floating point and multimedia, Integer datapath, Secondary cache and memory interface, Advanced pipelining hyperthreading support; annotated Chapter 7 and Chapter 6]
- Pipelining is used for the simple instructions favored by compilers
  "Simply put, a high performance implementation needs to ensure that the simple instructions execute quickly, and that the burden of the complexities of the instruction set penalize the complex, less frequently used, instructions"

Pentium 4
- Somewhere in all that "control" we must handle complex instructions
[Figure: Pentium 4 die: Control, I/O interface, Instruction cache, Data cache, Enhanced floating point and multimedia, Integer datapath, Secondary cache and memory interface, Advanced pipelining hyperthreading support]
- Processor executes simple microinstructions, 70 bits wide (hardwired)
- 120 control lines for integer datapath (400 for floating point)
- If an instruction requires more than 4 microinstructions to implement, control from microcode ROM (8000 microinstructions)
- It's complicated!

Chapter 5 Summary
- If we understand the instructions, we can build a simple processor!
- If instructions take different amounts of time, multi-cycle is better
- Datapath implemented using:
  - Combinational logic for arithmetic
  - State holding elements to remember bits
- Control implemented using:
  - Combinational logic for single-cycle implementation
  - Finite state machine for multi-cycle implementation