Designing a Multicycle Processor

Designing a Multicycle Processor. Arquitectura de Computadoras. Arturo Díaz Pérez, Centro de Investigación y de Estudios Avanzados del IPN (adiaz@cinvestav.mx)

Recap: Putting it All Together: A Single Cycle Processor
[Figure: single-cycle datapath. Main Control decodes op (Instr<31:26>) and drives RegDst, RegWr, ALUSrc, ExtOp, ALUop, MemWr, MemtoReg and npc_sel. ALU Control combines ALUop with func (Instr<5:0>) to produce ALUctr. The Instruction Fetch Unit supplies Rs (Instr<25:21>), Rt (Instr<20:16>), Rd (Instr<15:11>) and Imm16 (Instr<15:0>) to a file of 32 32-bit registers, the extender, the ALU and the data memory.]

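The Truth Table for the Main Control
Main Control takes the 6-bit op field and produces the datapath control signals plus a 3-bit ALUop. The local ALU Control combines ALUop with the 6-bit func field to produce the 3-bit ALUctr.
                   R-type   ori      lw       sw       beq       jump
op                 00 0000  00 1101  10 0011  10 1011  00 0100   00 0010
RegDst             1        0        0        x        x         x
ALUSrc             0        1        1        1        0         x
MemtoReg           0        0        1        x        x         x
RegWrite           1        1        1        0        0         0
MemWrite           0        0        0        1        0         0
npc_sel            0        0        0        0        1         0
Jump               0        0        0        0        0         1
ExtOp              x        0        1        1        x         x
ALUop (symbolic)   R-type   Or       Add      Add      Subtract  xxx
ALUop<2>           1        0        0        0        0         x
ALUop<1>           0        1        0        0        0         x
ALUop<0>           0        0        0        0        1         x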

PLA Implementation of the Main Control
[Figure: PLA. The AND plane decodes op<5>..op<0> into one product term per instruction class (R-type, ori, lw, sw, beq, jump). The OR plane combines those terms to drive RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1> and ALUop<0>.]

Systematic Generation of Control
[Figure: the instruction's opcode is decoded by control logic or a control store (PLA, ROM). Together with conditions from the datapath, it produces a microinstruction that drives the datapath's control points.]
- In our single-cycle processor, each instruction is realized by exactly one control command, or microinstruction.
- In general, the controller is a finite state machine.
- A microinstruction can also control sequencing (see later).

Outline of Today's Lecture
- Recap: single cycle processor
- Faster designs
- Multicycle datapath
- Performance analysis
- Multicycle control

Abstract View of Our Single Cycle Processor
[Figure: Next PC, PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access, Data Mem and Result Store (Reg. Wrt) in sequence. Main Control (from op) and ALU control (from fun) drive npc_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst and RegWr. The datapath returns Equal.]
The processor looks like an FSM with the PC as its state.

What's Wrong with Our CPI=1 Processor?
Path through the datapath for each instruction class:
- Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
- Load (the critical path): PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
- Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem, setup
- Branch: PC, Inst Memory, Reg File, cmp, mux
Long cycle time: all instructions take as much time as the slowest one.
Real memory is not as nice as our idealized memory: it cannot always get the job done in one (short) cycle.
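The point of the slide above can be sketched numerically: in a single-cycle design the clock period is set by the slowest instruction path. The component delays below are hypothetical values chosen only for illustration, and the placement of the setup terms in each path is an assumption read off the slide.

```python
# Hypothetical component delays in nanoseconds (illustrative, not from the slides).
DELAY_NS = {
    "PC": 1, "Inst Memory": 10, "Reg File": 3, "mux": 1,
    "ALU": 5, "Data Mem": 10, "cmp": 2, "setup": 1,
}

# Components each instruction class passes through in one cycle.
PATHS = {
    "arith": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "mux", "setup"],
    "load": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem", "mux", "setup"],
    "store": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem", "setup"],
    "branch": ["PC", "Inst Memory", "Reg File", "cmp", "mux"],
}


def path_delay(path):
    """Sum the delays of the components along one path."""
    return sum(DELAY_NS[component] for component in path)


def single_cycle_time(paths):
    """Return (critical instruction class, cycle time, per-class delays)."""
    delays = {name: path_delay(path) for name, path in paths.items()}
    critical = max(delays, key=delays.get)
    return critical, delays[critical], delays


critical, cycle_ns, delays = single_cycle_time(PATHS)
print(delays)              # per-class path delays
print(critical, cycle_ns)  # the class that sets the clock period
```

With these assumed numbers the branch path needs about half the time of the load path, yet it still occupies a full load-length cycle.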

Memory Access Time
Physics => fast memories are small (large memories are slow).
[Figure: a storage array with an address decoder, selected word line, storage cells, bit lines and sense amps.]
Question: register file vs. memory?
=> Use a hierarchy of memories: Processor, Cache (1 time-period), proc. bus, L2 Cache (2-3 time-periods), mem. bus, memory (20-50 time-periods).

Reducing Cycle Time
- Cut the combinational dependency graph and insert a register or latch.
- Do the same work in two fast cycles, rather than one slow one.
- It may be possible to short-circuit the path and remove some components for some instructions.
[Figure: one block of acyclic combinational logic between two storage elements is split into blocks (A) and (B), with an extra storage element between them.]
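A minimal sketch of the trade-off described above, assuming hypothetical delays: splitting one combinational block into two shortens the clock period, at the cost of one register overhead per stage. An instruction that can skip the second block then finishes sooner than it would in the unsplit design.

```python
def cycle_time(stage_delays_ns, register_overhead_ns):
    """Clock period = slowest stage plus the storage element's overhead."""
    return max(stage_delays_ns) + register_overhead_ns


REG_NS = 1            # hypothetical setup + clock-to-Q overhead of a storage element
ONE_BLOCK = [20]      # hypothetical delay of the original combinational block
TWO_BLOCKS = [12, 8]  # the same logic cut into blocks (A) and (B)

slow_cycle = cycle_time(ONE_BLOCK, REG_NS)   # one slow cycle
fast_cycle = cycle_time(TWO_BLOCKS, REG_NS)  # shorter cycle after the cut

full_instruction = 2 * fast_cycle   # uses both (A) and (B)
short_instruction = 1 * fast_cycle  # short-circuits past (B)

print(slow_cycle, fast_cycle, full_instruction, short_instruction)
```

Note that the instruction using both blocks takes slightly longer in total than before the cut, because the register overhead is paid twice. The gain comes from instructions that need fewer steps.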

Basic Limits on Cycle Time
- Next address logic: PC <= branch ? PC + offset : PC + 4
- Instruction fetch: InstructionReg <= Mem[PC]
- Register access: A <= R[rs]
- ALU operation: R <= A + B
[Figure: Control drives Next PC, PC, Instruction Fetch, Operand Fetch, Exec, Mem Access (Data Mem) and Result Store (Reg. File) through npc_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst and RegWr.]

Partitioning the CPI=1 Datapath
- Add registers between the smallest steps.
- Place enables on all registers.
[Figure: Next PC, PC, Instruction Fetch, Operand Fetch, Exec, Mem Access (Data Mem) and Result Store (Reg. File), with control signals npc_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst and RegWr, and the Equal condition.]

Example Multicycle Datapath
[Figure: the datapath with registers IR, A, B, S and M inserted between Instruction Fetch, Operand Fetch, Exec (Ext, ALU), Mem Access (Data Mem) and Result Store (Reg. File), plus the E (Equal) flag feeding Next PC. Control signals: npc_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr.]
Critical path?

Recall: Step-by-Step Processor Design
Step 1: ISA => Logical Register Transfers
Step 2: Components of the Datapath
Step 3: RTL + Components => Datapath
Step 4: Datapath + Logical RTs => Physical RTs
Step 5: Physical RTs => Control

Step 4: R-type (add, sub, ...)
Logical register transfers:
ADDU: R[rd] <= R[rs] + R[rt]; PC <= PC + 4
Physical register transfers (one line per cycle, in time order):
IR <= MEM[PC]
A <= R[rs]; B <= R[rt]
S <= A + B
R[rd] <= S; PC <= PC + 4
[Figure: multicycle datapath with registers IR, A, B, S, M and flag E.]

Step 4: Logical Immediate
Logical register transfers:
ORI: R[rt] <= R[rs] OR ZExt(Imm16); PC <= PC + 4
Physical register transfers (one line per cycle, in time order):
IR <= MEM[PC]
A <= R[rs]; B <= R[rt]
S <= A or ZExt(Imm16)
R[rt] <= S; PC <= PC + 4
[Figure: multicycle datapath with registers IR, A, B, S, M and flag E.]

Step 4: Load
Logical register transfers:
LW: R[rt] <= MEM[R[rs] + SExt(Imm16)]; PC <= PC + 4
Physical register transfers (one line per cycle, in time order):
IR <= MEM[PC]
A <= R[rs]; B <= R[rt]
S <= A + SExt(Imm16)
M <= MEM[S]
R[rt] <= M; PC <= PC + 4
[Figure: multicycle datapath with registers IR, A, B, S, M and flag E.]

Step 4: Store
Logical register transfers:
SW: MEM[R[rs] + SExt(Imm16)] <= R[rt]; PC <= PC + 4
Physical register transfers (one line per cycle, in time order):
IR <= MEM[PC]
A <= R[rs]; B <= R[rt]
S <= A + SExt(Imm16)
MEM[S] <= B; PC <= PC + 4
[Figure: multicycle datapath with registers IR, A, B, S, M and flag E.]

Step 4: Branch
Logical register transfers:
BEQ: if R[rs] == R[rt] then PC <= PC + 4 + SExt(Imm16) else PC <= PC + 4
Physical register transfers (one line per cycle, in time order):
IR <= MEM[PC]
E <= (R[rs] == R[rt])
if not E then PC <= PC + 4 else PC <= PC + 4 + SExt(Imm16)
[Figure: multicycle datapath with registers IR, A, B, S, M and flag E.]
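The physical register transfers on the Step 4 slides can be traced with a small interpreter. This is a minimal sketch, not the lecture's implementation: instructions are pre-decoded Python dicts rather than 32-bit words, the names `run_one` and `sext16` are hypothetical, register 0 is not hard-wired to zero, and the branch target follows the slides' RTL (no shift of the immediate).

```python
def sext16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def run_one(cpu, imem, dmem):
    """Execute one instruction as physical register transfers; return cycles used."""
    R = cpu["R"]
    ir = imem[cpu["PC"]]                          # cycle 1: IR <= MEM[PC]
    A, B = R[ir["rs"]], R[ir["rt"]]               # cycle 2: A <= R[rs]; B <= R[rt]
    E = A == B                                    #          E <= (R[rs] == R[rt])
    op = ir["op"]
    if op == "beq":                               # cycle 3: PC <= Next(PC, E)
        cpu["PC"] += 4 + (sext16(ir["imm"]) if E else 0)
        return 3
    if op == "addu":
        S = (A + B) & 0xFFFFFFFF                  # cycle 3: S <= A + B
        R[ir["rd"]] = S                           # cycle 4: R[rd] <= S
        cycles = 4
    elif op == "ori":
        S = A | (ir["imm"] & 0xFFFF)              # cycle 3: S <= A or ZExt(Imm16)
        R[ir["rt"]] = S                           # cycle 4: R[rt] <= S
        cycles = 4
    elif op == "lw":
        S = (A + sext16(ir["imm"])) & 0xFFFFFFFF  # cycle 3: S <= A + SExt(Imm16)
        M = dmem.get(S, 0)                        # cycle 4: M <= MEM[S]
        R[ir["rt"]] = M                           # cycle 5: R[rt] <= M
        cycles = 5
    elif op == "sw":
        S = (A + sext16(ir["imm"])) & 0xFFFFFFFF  # cycle 3: S <= A + SExt(Imm16)
        dmem[S] = B                               # cycle 4: MEM[S] <= B
        cycles = 4
    else:
        raise ValueError(f"unsupported op: {op}")
    cpu["PC"] += 4                                # PC <= PC + 4 in the final cycle
    return cycles


# A tiny illustrative program, keyed by byte address.
imem = {
    0: {"op": "ori", "rs": 0, "rt": 1, "imm": 5},
    4: {"op": "ori", "rs": 0, "rt": 2, "imm": 7},
    8: {"op": "addu", "rs": 1, "rt": 2, "rd": 3},
    12: {"op": "sw", "rs": 0, "rt": 3, "imm": 64},
    16: {"op": "lw", "rs": 0, "rt": 4, "imm": 64},
    20: {"op": "beq", "rs": 3, "rt": 4, "imm": 8},
}
cpu = {"PC": 0, "R": [0] * 32}
dmem = {}
total_cycles = 0
while cpu["PC"] in imem:
    total_cycles += run_one(cpu, imem, dmem)
print(total_cycles, cpu["R"][3], cpu["R"][4], cpu["PC"])
```

Each instruction class costs a different number of cycles here (3 for the branch, 5 for the load, 4 for the rest), which is the property the later performance slide relies on.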

Alternative Datapath (Book): Multiple Cycle Datapath
Minimizes hardware: 1 memory, 1 adder.
[Figure: the textbook multicycle datapath. A single ideal memory (RAdr, WrAdr, Din, Dout) serves both instructions and data, with its address chosen by the IorD mux. It also shows the Instruction Reg, the register file (Ra, Rb, Rw, busA, busB, busW), the Extend unit, a shift left by 2, a Target register, and one ALU with ALU Control, a Zero output and ALU Out. Control signals: PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, ExtOp, MemtoReg, ALUOp, ALUSelA, ALUSelB.]

Our Control Model
- A state specifies the control points for a register transfer.
- The transfer occurs upon exiting the state (same falling edge).
[Figure: inputs (conditions) feed the Next State Logic, which updates the Control State. The state feeds the Output Logic, which produces the outputs (control points). State X holds the control points for one register transfer, and its successor depends on the input.]

Step 4: Control Specification for the Multicycle Processor
State diagram, listed by stage:
- Instruction fetch: IR <= MEM[PC]
- Decode / operand fetch: A <= R[rs]; B <= R[rt]
- Execute: R-type: S <= A fun B. ORi: S <= A or ZX. LW: S <= A + SX. SW: S <= A + SX. BEQ: PC <= Next(PC,Equal).
- Memory: LW: M <= MEM[S]. SW: MEM[S] <= B; PC <= PC + 4.
- Write-back: R-type: R[rd] <= S; PC <= PC + 4. ORi: R[rt] <= S; PC <= PC + 4. LW: R[rt] <= M; PC <= PC + 4.

Traditional FSM Controller
[Figure: a truth table with columns state, op, cond, next state and control points. The 4-bit State register, the 6-bit op field and the Equal condition feed the logic that produces the next state and the control points for the datapath.]

Step 5: Datapath + State Diagram => Control
- Translate RTs into control points.
- Assign states.
- Then go build the controller.

Mapping RTs to Control Points
State diagram, listed by stage, with the control points named on the slide in parentheses:
- Instruction fetch: IR <= MEM[PC] (imem_rd, IRen)
- Decode: A <= R[rs]; B <= R[rt] (Aen, Ben, Een)
- Execute: R-type: S <= A fun B (ALUfun, Sen). ORi: S <= A or ZX. LW: S <= A + SX. SW: S <= A + SX. BEQ: PC <= Next(PC,Equal).
- Memory: LW: M <= MEM[S]. SW: MEM[S] <= B; PC <= PC + 4.
- Write-back: R-type: R[rd] <= S; PC <= PC + 4 (RegDst, RegWr, PCen). ORi: R[rt] <= S; PC <= PC + 4. LW: R[rt] <= M; PC <= PC + 4.

Assigning States
Each node of the state diagram is assigned a state:
- Instruction fetch: IR <= MEM[PC]
- Decode: A <= R[rs]; B <= R[rt]
- Execute: R-type: S <= A fun B. ORi: S <= A or ZX. LW: S <= A + SX. SW: S <= A + SX. BEQ: PC <= Next(PC).
- Memory: LW: M <= MEM[S]. SW: MEM[S] <= B; PC <= PC + 4.
- Write-back: R-type: R[rd] <= S; PC <= PC + 4. ORi: R[rt] <= S; PC <= PC + 4. LW: R[rt] <= M; PC <= PC + 4.
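The state diagram above can be written down as a next-state table. The sketch below is one possible encoding under stated assumptions: the state names (`r_exec`, `lw_mem`, and so on) are hypothetical labels, not the binary state codes from the lecture. Walking the table from fetch back to fetch gives the number of cycles each instruction type takes.

```python
# Decode is the only state whose successor depends on the opcode.
DISPATCH = {
    "rtype": "r_exec",
    "ori": "ori_exec",
    "lw": "lw_exec",
    "sw": "sw_exec",
    "beq": "beq_exec",
}

# Every other state has exactly one successor.
NEXT = {
    "fetch": "decode",
    "r_exec": "r_wb", "r_wb": "fetch",
    "ori_exec": "ori_wb", "ori_wb": "fetch",
    "lw_exec": "lw_mem", "lw_mem": "lw_wb", "lw_wb": "fetch",
    "sw_exec": "sw_mem", "sw_mem": "fetch",
    "beq_exec": "fetch",
}


def next_state(state, op):
    """Next-state function of the controller FSM."""
    if state == "decode":
        return DISPATCH[op]
    return NEXT[state]


def states_for(op):
    """List the states visited by one instruction, from fetch until the FSM returns to fetch."""
    state, visited = "fetch", []
    while True:
        visited.append(state)
        state = next_state(state, op)
        if state == "fetch":
            return visited


CYCLES = {op: len(states_for(op)) for op in DISPATCH}
print(CYCLES)
print(states_for("lw"))
```

The table has twelve states in total, so a 4-bit state register is enough to hold an encoding of it.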

Detailed Control Specification
[Table: one row per state. Columns: State, Op field, Eq, Next; IR (en); PC (en, sel); Ops (A, B); Exec (Ex, Sr, ALU, S); Mem (R, W, M); Write-Back (M-R, Wr, Dst). The decode state has one row per opcode (BEQ, R-type, ori, LW, SW), and its outputs are all the same because this is a Moore machine. In the execute rows the ALU column is fun for R-type, or for ORi, and add for LW and SW.]

Performance Evaluation
What is the average CPI?
- The state diagram gives the CPI for each instruction type.
- The workload gives the frequency of each type.
Type         CPI_i  Frequency  CPI_i x freq_i
Arith/Logic  4      40%        1.6
Load         5      30%        1.5
Store        4      10%        0.4
Branch       3      20%        0.6
Average CPI: 4.1
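The average is the frequency-weighted sum of the per-type CPIs. A short sketch of that arithmetic follows, reading the table's frequencies as 40%, 30%, 10% and 20%; the names `WORKLOAD` and `average_cpi` are illustrative.

```python
# (CPI for the type, fraction of the instruction mix)
WORKLOAD = {
    "arith_logic": (4, 0.40),
    "load": (5, 0.30),
    "store": (4, 0.10),
    "branch": (3, 0.20),
}


def average_cpi(workload):
    """Weighted average: sum over types of CPI_i * freq_i."""
    return sum(cpi * freq for cpi, freq in workload.values())


total_freq = sum(freq for _, freq in WORKLOAD.values())
avg = average_cpi(WORKLOAD)
print(round(avg, 2))
```

The multicycle design therefore needs a clock period well under a quarter of the single-cycle period to come out ahead on this mix.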

Controller Design
- The state diagrams that define the controller for an instruction set processor are highly structured.
- Use this structure to construct a simple microsequencer.
- Control reduces to programming this very simple device: microprogramming.
[Figure: a sequencer and micro-PC select a microinstruction, whose fields provide sequencer control and datapath control.]

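Example: Jump-Counter
[Figure: a counter holding the current address i, with the op-code fed through a Map ROM.]
The counter supports three operations:
- zero: go to 0
- inc: go to i+1
- load: go to the address the Map ROM gives for the op-code
- None of the above: do nothing (for wait states)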

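Using a Jump Counter
State diagram, listed by stage, with the counter operation that leaves each state in parentheses:
- Instruction fetch: IR <= MEM[PC] (inc)
- Decode: A <= R[rs]; B <= R[rt] (load)
- Execute: R-type: S <= A fun B (inc). ORi: S <= A or ZX (inc). LW: S <= A + SX (inc). SW: S <= A + SX (inc). BEQ: PC <= Next(PC) (zero).
- Memory: LW: M <= MEM[S] (inc). SW: MEM[S] <= B; PC <= PC + 4 (zero).
- Write-back: R-type: R[rd] <= S; PC <= PC + 4 (zero). ORi: R[rt] <= S; PC <= PC + 4 (zero). LW: R[rt] <= M; PC <= PC + 4 (zero).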
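A jump counter of this kind takes only a few lines to model. In the sketch below the micro-PC addresses and the Map ROM contents are assumptions made for illustration (the slides' actual binary addresses are not reproduced here); only the zero/inc/load behaviour and the per-state operations follow the diagram.

```python
# One entry per microinstruction: (label, counter operation).
# The addresses (list indices) are a hypothetical assignment.
MICROCODE = [
    ("fetch", "inc"),     # 0: IR <= MEM[PC]
    ("decode", "load"),   # 1: A <= R[rs]; B <= R[rt]
    ("beq", "zero"),      # 2: PC <= Next(PC)
    ("r_exec", "inc"),    # 3: S <= A fun B
    ("r_wb", "zero"),     # 4: R[rd] <= S; PC <= PC + 4
    ("ori_exec", "inc"),  # 5: S <= A or ZX
    ("ori_wb", "zero"),   # 6: R[rt] <= S; PC <= PC + 4
    ("lw_exec", "inc"),   # 7: S <= A + SX
    ("lw_mem", "inc"),    # 8: M <= MEM[S]
    ("lw_wb", "zero"),    # 9: R[rt] <= M; PC <= PC + 4
    ("sw_exec", "inc"),   # 10: S <= A + SX
    ("sw_mem", "zero"),   # 11: MEM[S] <= B; PC <= PC + 4
]

# Hypothetical Map ROM: op-code -> address of its first execute microinstruction.
MAP_ROM = {"beq": 2, "rtype": 3, "ori": 5, "lw": 7, "sw": 10}


def next_upc(upc, op, wait=False):
    """Apply the counter operation of the current microinstruction."""
    if wait:
        return upc  # none of zero/inc/load: hold the address (wait state)
    ctl = MICROCODE[upc][1]
    if ctl == "zero":
        return 0
    if ctl == "inc":
        return upc + 1
    if ctl == "load":
        return MAP_ROM[op]
    raise ValueError(f"unknown counter operation: {ctl}")


def trace(op):
    """Labels of the microinstructions executed for one instruction."""
    upc, labels = 0, []
    while True:
        labels.append(MICROCODE[upc][0])
        upc = next_upc(upc, op)
        if upc == 0:
            return labels


print(trace("lw"))
print(trace("beq"))
```

The design choice this illustrates is that sequential states are laid out at consecutive addresses, so most microinstructions need no explicit next-state field at all.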

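Our Microsequencer
[Figure: the Micro-PC is controlled by the Z (zero), I (inc) and L (load) bits and by the taken condition. Its load value comes from the op-code through the Map ROM. The remaining outputs are the datapath control.]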

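Microprogram Control Specification
[Table: one microinstruction per row. Columns: µPC, Taken, Next; IR (en); PC (en, sel); Ops (A, B); Exec (Ex, Sr, ALU, S); Mem (R, W, M); Write-Back (M-R, Wr, Dst). Next field by row: fetch inc; decode load; BEQ zero (for both values of Taken); R-type inc, zero; ORi inc, zero; LW inc, inc, zero; SW inc, zero. The ALU field is fun for R-type, or for ORi, and add for LW and SW.]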

Mapping ROM
[Table: the micro-PC entry address for each op-code: R-type, BEQ, ori, LW, SW.]

Example: Controlling Memory
[Figure: the PC drives the addr input of the Instruction Memory, whose data output feeds the Inst. Reg. The controller asserts InstMem_rd and IR_en, and the memory returns IM_wait.]

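Controller Handles Non-Ideal Memory
State diagram, listed by stage. States that access memory loop on themselves while wait is asserted and advance on ~wait:
- Instruction fetch: IR <= MEM[PC] (stay while wait; advance on ~wait)
- Decode / operand fetch: A <= R[rs]; B <= R[rt]
- Execute: R-type: S <= A fun B. ORi: S <= A or ZX. LW: S <= A + SX. SW: S <= A + SX. BEQ: PC <= Next(PC).
- Memory: LW: M <= MEM[S] (stay while wait; advance on ~wait). SW: MEM[S] <= B; PC <= PC + 4 (stay while wait; advance on ~wait).
- Write-back: R-type: R[rd] <= S; PC <= PC + 4. ORi: R[rt] <= S; PC <= PC + 4. LW: R[rt] <= M; PC <= PC + 4.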
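The effect of the wait loops on instruction latency can be sketched as follows. This assumes, for illustration only, that the memory's wait signal is sampled once per cycle and supplied as a list of booleans; the function names are hypothetical.

```python
def cycles_in_state(wait_samples):
    """Cycles spent in a memory state: stay while wait is asserted, leave on ~wait."""
    cycles = 0
    for wait in wait_samples:
        cycles += 1
        if not wait:
            return cycles
    raise ValueError("wait was never deasserted")


def load_cycles(ifetch_wait, dmem_wait):
    """Total cycles for LW: fetch, decode, execute, memory, write-back."""
    fetch = cycles_in_state(ifetch_wait)
    decode = 1
    execute = 1
    memory = cycles_in_state(dmem_wait)
    write_back = 1
    return fetch + decode + execute + memory + write_back


ideal = load_cycles([False], [False])                 # memory always ready
stalled = load_cycles([True, True, False], [True, False])  # two fetch stalls, one data stall
print(ideal, stalled)
```

With ideal memory the load takes the 5 cycles given by the state diagram. Each cycle in which wait is asserted adds one cycle, without any change to the other states.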

Really Simple Time-State Control
The same register transfers, grouped by time-state:
- Instruction fetch: IR <= MEM[PC] (stay while wait; advance on ~wait)
- Decode: A <= R[rs]; B <= R[rt]
- Execute: R-type: S <= A fun B. ORi: S <= A or ZX. LW: S <= A + SX. SW: S <= A + SX.
- Memory: LW: M <= MEM[S] (stay while wait). SW: MEM[S] <= B; PC <= PC + 4 (stay while wait).
- Write-back: R-type: R[rd] <= S; PC <= PC + 4. ORi: R[rt] <= S; PC <= PC + 4. LW: R[rt] <= M; PC <= PC + 4.
- BEQ: PC <= Next(PC)

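Time-State Control Path
Local decode and control at each stage.
[Figure: the instruction travels with its data through IR, IRex, IRmem and IRwb, which feed Dcd Ctrl, Ex Ctrl, Mem Ctrl and WB Ctrl respectively. The datapath shows Next PC, PC, Inst. Mem, Reg File, Exec, Mem Access (Data Mem) and Reg. File write, with registers A, B, S and M, the Equal flag and a Valid signal.]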

Overview of Control
Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently. The control can then be implemented with one of several methods using a structured logic technique.
                          Hardwired control            Microprogrammed control
Initial Representation    Finite State Diagram         Microprogram
Sequencing Control        Explicit Next State Function Microprogram counter + Dispatch ROMs
Logic Representation      Logic Equations              Truth Tables
Implementation Technique  PLA                          ROM

Summary
Disadvantages of the single cycle processor:
- Long cycle time.
- Cycle time is too long for all instructions except the load.
Multiple cycle processor:
- Divide the instructions into smaller steps.
- Execute each step (instead of the entire instruction) in one cycle.
Partition the datapath into equal-size chunks to minimize cycle time: ~10 levels of logic between latches.
Follow the same 5-step method for designing a "real" processor.

Summary (cont'd)
- Control is specified by a finite state diagram.
- Specialized state diagrams are easily captured by a microsequencer: simple increment and branch fields, plus datapath control fields.
- Control design reduces to microprogramming.
- Control is more complicated with complex instruction sets and restricted datapaths (see the book).
- A simple instruction set and a powerful datapath give simple control. One could try to reduce hardware (see the book), but would rather go for speed => many instructions at once!