ECE4680. Computer Organization and Architecture. Designing a Multiple Cycle Processor

Similar documents
ECE468. Computer Organization and Architecture. Designing a Multiple Cycle Processor

ECE 361 Computer Architecture Lecture 10: Designing a Multiple Cycle Processor

361 multipath..1. EECS 361 Computer Architecture Lecture 10: Designing a Multiple Cycle Processor

ECE4680. Computer Organization and Architecture. Designing a Multiple Cycle Processor

ECE 361 Computer Architecture Lecture 11: Designing a Multiple Cycle Controller. Review of a Multiple Cycle Implementation

Outline of today s lecture. EEL-4713 Computer Architecture Designing a Multiple-Cycle Processor. What s wrong with our CPI=1 processor?

ECE468 Computer Organization and Architecture. Designing a Multiple Cycle Controller

CpE 442. Designing a Multiple Cycle Controller

CS152 Computer Architecture and Engineering Lecture 13: Microprogramming and Exceptions. Review of a Multiple Cycle Implementation

COMP303 Computer Architecture Lecture 9. Single Cycle Control

ECE468 Computer Organization and Architecture. Designing a Single Cycle Datapath

ECE4680 Computer Organization and Architecture. Designing a Pipeline Processor

The Big Picture: Where are We Now? EEM 486: Computer Architecture. Lecture 3. Designing a Single Cycle Datapath

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

How to design a controller to produce signals to control the datapath

CPU Design Steps. EECC550 - Shaaban

CS3350B Computer Architecture Winter Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2)

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

Hardwired Control Unit Ch 14

Major CPU Design Steps

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath

Hardwired Control Unit Ch 16

Lecture #17: CPU Design II Control

Designing a Multicycle Processor

361 control.1. EECS 361 Computer Architecture Lecture 9: Designing Single Cycle Control

UC Berkeley CS61C : Machine Structures

Recap: A Single Cycle Datapath. CS 152 Computer Architecture and Engineering Lecture 8. Single-Cycle (Con t) Designing a Multicycle Processor

CS 152 Computer Architecture and Engineering. Lecture 10: Designing a Multicycle Processor

CS152 Computer Architecture and Engineering Lecture 10: Designing a Single Cycle Control. Recap: The MIPS Instruction Formats

EEM 486: Computer Architecture. Lecture 3. Designing Single Cycle Control

CPU Organization (Design)

UC Berkeley CS61C : Machine Structures

ECE170 Computer Architecture. Single Cycle Control. Review: 3b: Add & Subtract. Review: 3e: Store Operations. Review: 3d: Load Operations

CS 110 Computer Architecture Single-Cycle CPU Datapath & Control

MIPS-Lite Single-Cycle Control

CENG 3420 Lecture 06: Datapath

CS359: Computer Architecture. The Processor (A) Yanyan Shen Department of Computer Science and Engineering

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Single- Cycle CPU Datapath & Control Part 2

CS61C : Machine Structures

Lecture 6 Datapath and Controller

Multi-cycle Approach. Single cycle CPU. Multi-cycle CPU. Requires state elements to hold intermediate values. one clock cycle or instruction

CS152 Computer Architecture and Engineering. Lecture 8 Multicycle Design and Microcode John Lazzaro (

Lecture 10: Control Unit

Lecture 10: Control Unit

Initial Representation Finite State Diagram Microprogram. Sequencing Control Explicit Next State Microprogram counter

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction

inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 19 CPU Design: The Single-Cycle II & Control !

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic

Ch 5: Designing a Single Cycle Datapath

Instructor: Randy H. Katz hcp://inst.eecs.berkeley.edu/~cs61c/fa13. Fall Lecture #18. Warehouse Scale Computer

CS61C : Machine Structures

Working on the Pipeline

CS 61C: Great Ideas in Computer Architecture Control and Pipelining

CS3350B Computer Architecture Quiz 3 March 15, 2018

Midterm I March 12, 2003 CS152 Computer Architecture and Engineering

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 28: Single- Cycle CPU Datapath Control Part 1

Single Cycle CPU Design. Mehran Rezaei

CS61C : Machine Structures

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU

University of California College of Engineering Computer Science Division -EECS. CS 152 Midterm I

Midterm I October 6, 1999 CS152 Computer Architecture and Engineering

The Processor: Datapath & Control

CS61C : Machine Structures

Recap: The MIPS Subset ADD and subtract EEL Computer Architecture shamt funct add rd, rs, rt Single-Cycle Control Logic sub rd, rs, rt

CS420/520 Homework Assignment: Pipelining

CS 61C: Great Ideas in Computer Architecture Lecture 12: Single- Cycle CPU, Datapath & Control Part 2

Review: Abstract Implementation View

The Big Picture: Where are We Now? CS 152 Computer Architecture and Engineering Lecture 11. The Five Classic Components of a Computer

Processor (multi-cycle)

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

CSE 141 Computer Architecture Summer Session Lecture 3 ALU Part 2 Single Cycle CPU Part 1. Pramod V. Argade

Computer Science 141 Computing Hardware

Lecture 7 Pipelining. Peng Liu.

Lecture 12: Single-Cycle Control Unit. Spring 2018 Jason Tang

Midterm I March 12, 2003 CS152 Computer Architecture and Engineering

CPE 335 Computer Organization. Basic MIPS Architecture Part I

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

CPE 335. Basic MIPS Architecture Part II

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

Midterm I March 1, 2001 CS152 Computer Architecture and Engineering

CC 311- Computer Architecture. The Processor - Control

CS 152 Computer Architecture and Engineering. Lecture 12: Multicycle Controller Design

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions

UC Berkeley CS61C : Machine Structures

ECE 313 Computer Organization EXAM 2 November 9, 2001

inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 18 CPU Design: The Single-Cycle I ! Nasty new windows vulnerability!

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Single- Cycle CPU Datapath Control Part 1

RISC Design: Multi-Cycle Implementation

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Systems Architecture I

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

CS152 Computer Architecture and Engineering Lecture 16: Memory System

CPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner

Adding Support for jal to Single Cycle Datapath (For More Practice Exercise 5.20)

Transcription:

ECE68 Computer Organization and Architecture Designing a Multiple Cycle Processor ECE68 Multipath. -- op 6 Instr<:6> RegDst A Single Cycle Processor busw RegWr Main imm6 Instr<:> Rb -bit Registers 6 op RegDst Src : busb Extender Branch Jump busa ExtOp func Instr<:> 6 Src Instruction Fetch Unit ctr Data In ECE68 Multipath. -- Instruction<:> <:> <6:> MemWr WrEn Adr Data ctr <:> <:> Imm6 MemtoReg

Concepts Datapath Components Connections Functional/Combinational Sate//Sequential Bus for data Signals instruction signals ECE68 Multipath. -- Instruction Fetch Unit Why does the use -bit counter? imm6 Instruction<:> 6 <:8> Target Instruction<:> Adder SignExt 6 Adder Jump Addr<:> Addr<:> Instruction Instruction<:> Branch ECE68 Multipath. --

The Main op<>.. op<>.. op<>.. op<>.. op<>.. op<>.. <> <> <> <> <> op<> R-type ori lw sw beq jump RegWrite Src RegDst MemtoReg MemWrite Branch Jump ExtOp op<> op<> op<> ECE68 Multipath. -- Drawbacks of this Single Cycle Processor Long cycle time: Cycle time must be long enough for the load instruction: - s Clock -to-q + - Instruction Access Time + - Register File Access Time + - Delay (address calculation) + - Data Access Time + - Register File Setup Time + - Clock Skew Cycle time is much longer than needed for all other instructions. Examples: R-type instructions do not require data memory access Jump does not require operation nor data memory access ECE68 Multipath.6 --

Overview of a Multiple Cycle Implementation The root of the single cycle processor s problems: The cycle time has to be long enough for the slowest instruction Solution: Break the instruction into smaller steps Execute each step (instead of the entire instruction) in one cycle - Cycle time: time it takes to execute the longest step, not the longest instruction - Keep all the steps to have similar length This is the essence of the multiple cycle processor The advantages of the multiple cycle processor: Cycle time is much shorter Different instructions take different number of cycles to complete - Load takes five cycles - Jump only takes three cycles Allows a functional unit to be used more than once per instruction - Adder + - Instruction mem + Data mem ECE68 Multipath.7 -- The Five Steps of a Load Instruction,,, Op, Func ctr Old Value Instruction Fetch Instr Decode / Reg. Fetch -to-q New Value Old Value Old Value Address Instruction Access Time New Value Delay through Logic New Value Data Reg Wr ExtOp Old Value New Value Src Old Value New Value RegWr Old Value New Value Register File Access Time busa Old Value New Value Delay through Extender & busb Old Value New Value Delay Address Old Value New Value Data Access Time busw Old Value New Register File Write Time ECE68 Multipath.8 --

Register File & Write Timing: vs. Reality In previous lectures for -cycle machine, register file and memory are simplified: Write happens at the clock tick Address, data, and write enable must be stable one set-up time before the clock tick WrEn Adr In real life for m-cycle machines: Neither register file nor ideal memory has the clock input The write path is a combinational logic delay path: - Write enable goes to and Din settles down - write access delay - Din is written into mem[address] Important: Address and Data must be stable BEFORE Write Enable goes to WrEn Adr ECE68 Multipath.9 -- ce Condition Between Address and Write Enable What is race condition? This real (no clock input) register file may not work reliably in the single cycle processor because: We cannot guarantee will be stable BEFORE RegWr = There is a race between (address) and RegWr (write enable) RegWr Rb busa Reg File busb busw The real (no clock input) memory may not work reliably in the single cycle processor because: We cannot guarantee Address will be stable BEFORE WrEn = There is a race between Adr and WrEn WrEn Adr ECE68 Multipath. --

How to Avoid this ce Condition? Solution for the multiple cycle implementation: Make sure Address is stable by the end of Cycle N Assert Write Enable signal ONE cycle later at Cycle (N + ) Address cannot change until Write Enable is disasserted ECE68 Multipath. -- Dual-Port Dual Port Independent Read (, Dout) and Write (WAdr, Din) ports Read and write (to different location) can occur at the same instruction cycle Read Port is a combinational path: Read Address Valid --> Read Access Delay --> Data Out Valid Write Port is also a combinational path: MemWrite = --> Write Access Delay --> Data In is written into location[wradr] MemWr <:> <:> WrAdr ECE68 Multipath. --

Instruction Fetch Cycle: In the Beginning Every cycle begins right AFTER the clock tick: mem[] <:> + You are here! One Logic Clock Cycle Wr=? MemWr=? WrAdr Dout Din IRWr=? ECE68 Multipath. -- op=? Instruction Fetch Cycle: The End Every cycle ends AT the next clock tick (storage element updates): IR mem[] <:> <:> + One Logic Clock Cycle Wr= You are here! MemWr= WrAdr Din Dout IRWr= Op = Add ECE68 Multipath. --

Instruction Fetch Cycle: Overall Picture Why do we need Wr Op=Add unlike in -cycle machine? : Wr, IRWr x: WrCond RegDst, MemR Others: s Wr= WrCond=x Src= BrWr= IorD= MemWr= IRWr= SelA= Target WrAdr Ifetch busa busb SelB= Op=Add ECE68 Multipath. -- Register Fetch / Instruction Decode busa RegFile[rs] ; busb RegFile[rt] ; is not being used: ctr = xx Wr= WrCond= IorD=x MemWr= IRWr= WrAdr Go to the Op Func 6 6 RegDst=x Imm 6 RegWr= Rb busa Reg File busw busb Src=x SelA=x SelB=xx Op=xx ECE68 Multipath.6 --

Register Fetch / Instruction Decode (Continue) busa Reg[rs] ; busb Reg[rt] ; Target + SignExt(Imm6)* (speculative calculation) Wr= WrCond= IorD=x MemWr= IRWr= RegDst=x Beq ype Ori : WrAdr Op 6 Func 6 Imm 6 ExtOp= ECE68 Multipath.7 -- Extend Rfetch/Decode Op=Add Why can we not further send target : BrWr, ExtOp address to? SelB= x: RegDst, Src IorD, MemtoReg Others: s Src=x BrWr= RegWr= SelA= Target Rb busa Reg File busw busb << SelB= Op=Add Branch Completion if (busa == busb) Target Wr= WrCond= IorD=x MemWr= IRWr= WrAdr RegDst=x Imm 6 ExtOp=x BrComplete Op=Sub SelB= x: IorD, MemReg RegDst, ExtOp : WrCond SelA Src ECE68 Multipath.8 -- Extend RegWr= Rb busa Reg File busw busb << Src= SelA= SelB= BrWr= Target Op=Sub

Instruction Decode: We have a R-type! Next Cycle: R-type Execution Wr= WrCond= IorD=x MemWr= IRWr= RegDst=x WrAdr Beq ype Ori : Op 6 Func 6 Imm 6 ExtOp= Extend RegWr= Rb busa Reg File busw busb << Src=x SelA= SelB= BrWr= Target Op=Add ECE68 Multipath.9 -- R-type Execution Output busa op busb RExec : RegDst SelA SelB= Op=ype x: Src, IorD Wr= WrCond= MemtoReg ExtOp Src=x BrWr= IorD=x MemWr= IRWr= RegDst= RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Why not set RegDst= at next cycle? Imm Extend 6 ExtOp=x MemtoReg=x Op=ype SelB= ECE68 Multipath. --

R-type Completion Rfinish Op=ype R[rd] <- Output : RegDst, RegWr sela SelB= Wr= WrCond= x: IorD, Src ExtOp Src=x BrWr= IorD=x MemWr= IRWr= RegDst= RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Imm Extend 6 ExtOp=x MemtoReg= Op=ype SelB= ECE68 Multipath. -- A Multiple Cycle Delay Path There is no register to save the results between: Register Fetch: busa Reg[rs] ; busb Reg[rt] R-type Execution: output busa op busb R-type Completion: Reg[rd] output Register here to save outputs of Rfetch? sela IRWr= busa Rb Reg File busw busb selb Op Register here to save outputs of RExec? ECE68 Multipath. --

A Multiple Cycle Delay Path (Continue) Register is NOT needed to save the outputs of Register Fetch: IRWr = : busa and busb will not change after Register Fetch Register is NOT needed to save the outputs of R-type Execution: busa and busb will not change after Register Fetch signals SelA, SelB, and Op will not change after R-type Execution Consequently output will not change after R-type Execution In theory, you need a register to hold a signal value if: () The signal is computed in one clock cycle and used in another. () AND the inputs to the functional block that computes this signal can change before the signal is written into a state element. You can save a register if Cond is true BUT Cond is false: But in practice, this will introduce a multiple cycle delay path: - A logic delay path that takes multiple cycles to propagate from one storage element to the next storage element ECE68 Multipath. -- Pros and Cons of a Multiple Cycle Delay Path A -cycle path example: IR (storage) Reg File Read Reg File Write (storage) Advantages: Register savings We can share time among cycles: - If takes longer than one cycle, still a OK as long as the entire path takes less than cycles to finish busa Rb Reg File busw busb selb ECE68 Multipath. --

Pros and Cons of a Multiple Cycle Delay Path (Continue) Disadvantage: Static timing analyzer, which ONLY looks at delay between two storage elements, will report this as a timing violation You have to ignore the static timing analyzer s warnings But you may end up ignoring real timing violations Always TRY to put in registers between cycles to avoid MCDP busa Rb Reg File busw busb selb ECE68 Multipath. -- Instruction Decode: We have an Ori! Next Cycle: Ori Execution Wr= WrCond= IorD=x MemWr= IRWr= RegDst=x WrAdr Beq ype Ori : Intruction Reg Op 6 Func 6 Imm 6 ExtOp= Extend RegWr= Rb busa Reg File busw busb << Src=x SelA= SelB= BrWr= Target Op=Add ECE68 Multipath.6 --

Ori Execution output busa or Ext[Imm6] Op=Or OriExec : SelA SelB= x: MemtoReg IorD, Src Wr= WrCond= Src=x BrWr= IorD=x MemWr= IRWr= RegDst= RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Imm Extend 6 ExtOp= MemtoReg=x Op=Or SelB= ECE68 Multipath.7 -- Ori Completion OriFinish Op=Or Reg[rt] output x: IorD, Src SelB= : SelA Wr= WrCond= RegWr Src=x BrWr= IorD=x MemWr= IRWr= RegDst= RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Imm Extend 6 ExtOp= MemtoReg= Op=Or SelB= ECE68 Multipath.8 --

Instruction Decode: We have a Access! Next Cycle: Address Calculation Wr= WrCond= IorD=x MemWr= IRWr= Beq ype Ori : WrAdr Op 6 Func 6 RegDst=x Imm 6 ExtOp= ECE68 Multipath.9 -- Extend RegWr= Rb busa Reg File busw busb << Src=x SelA= SelB= BrWr= Target Op=Add Address Calculation AdrCal : ExtOp SelA SelB= output busa + SignExt[Imm6] Op=Add x: MemtoReg Src Wr= WrCond= Src=x BrWr= IorD=x MemWr= IRWr= RegDst=x RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Imm Extend 6 ExtOp= MemtoReg=x Op=Add SelB= ECE68 Multipath. --

Access for Store : ExtOp SWmem MemWr SelA mem[ output] busb SelB= Op=Add Wr= WrCond= x: Src,RegDst MemtoReg Src=x BrWr= IorD=x MemWr= IRWr= RegDst=x RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Keep SelA, SelB, Op same as previous cycle! Imm Extend 6 ExtOp= MemtoReg=x Op=Add SelB= ECE68 Multipath. -- Access for Load : ExtOp LWmem SelA, IorD SelB= Mem Dout mem[ output] Op=Add x: MemtoReg Src Wr= WrCond= Src=x BrWr= IorD= MemWr= IRWr= RegDst= RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Imm Extend 6 ExtOp= MemtoReg=x Op=Add SelB= ECE68 Multipath. --

Write Back for Load LWwr : SelA RegWr, ExtOp Reg[rt] Mem Dout MemtoReg SelB= Op=Add Wr= WrCond= x: Src IorD Src=x BrWr= IorD=x MemWr= IRWr= RegDst= RegWr= SelA= Target Rb busa Reg File WrAdr busw busb << Imm Extend 6 ExtOp= MemtoReg= Op=Add SelB= ECE68 Multipath. -- Putting it all together: Multiple Cycle Datapath Wr WrCond Src BrWr IorD MemWr IRWr RegDst RegWr SelA Target WrAdr Rb busa Reg File busw busb << Imm 6 ExtOp Extend MemtoReg SelB Op ECE68 Multipath. --

Summary Disadvantages of the Single Cycle Processor Long cycle time Cycle time is too long for all instructions except the Load Multiple Cycle Processor: Divide the instructions into smaller steps Execute each step (instead of the entire instruction) in one cycle Do NOT confuse Multiple Cycle Processor with Multiple Cycle Delay Path Multiple Cycle Processor executes each instruction in multiple clock cycles Multiple Cycle Delay Path: a combinational logic path between two storage elements that takes more than one clock cycle to complete It is possible (desirable) to build a MC Processor without MCDP: Use a register to save a signal s value whenever a signal is generated in one clock cycle and used in another cycle later ECE68 Multipath. -- Putting it all together: State Diagram Ifetch Rfetch/Decode BrComplete Op=Add Op=Add Op=Sub : BrWr, ExtOp SelB= : Wr, IRWr x: WrCond SelB= beq x: IorD, MemReg RegDst, MemR x: RegDst, Src RegDst, ExtOp Others: s IorD, MemtoReg : WrCond SelA Others: s Src lw or sw Ori ype AdrCal : ExtOp LWmem : RegDst RExec OriExec : ExtOp SelA, IorD SelA SelA Op=Or SelB= SelB= SelB= : SelA Op=ype Op=Add lw Op=Add SelB= x: Src, IorD x: MemtoReg x: MemtoReg MemtoReg x: MemtoReg Src Src ExtOp IorD, Src sw Rfinish OriFinish LWwr : ExtOp SWMem : SelA MemWr Op=ype Op=Or RegWr, ExtOp SelA : RegDst, RegWr MemtoReg x: IorD, Src SelB= sela SelB= Op=Add SelB= Op=Add SelB= x: Src,RegDst x: IorD, Src : SelA x: Src MemtoReg IorD ExtOp RegWr ECE68 Multipath.6 --

Where to get more information? Next two lectures: Multiple Cycle ler: Appendix C of your text book. Microprogramming: Section. of your text book. D. Patterson, Microprograming, Scientific America, March 98. D. Patterson and D. Ditzel, The Case for the Reduced Instruction Set Computer, Computer Architecture News 8, 6 (October, 98) Homework. See the website. ECE68 Multipath.7 --