

The final datapath

[Figure: the complete single-cycle datapath. It shows the PC, instruction memory, register file, sign extender, shift-left-2 unit, ALU, data memory, and the PC + 4 and branch-target adders. It also shows the control signals RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg and PCSrc.]

Control

The control unit is responsible for setting all the control signals so that each instruction is executed properly. The control unit's input is the 32-bit instruction word. The outputs are values for the blue control signals in the datapath. Most of the signals can be generated from the instruction opcode alone, and not the entire 32-bit word. To illustrate the relevant control signals, we will show the route that is taken through the datapath by R-type, lw, sw and beq instructions.

R-type instruction path

The R-type instructions include add, sub, and, or, and slt. The ALUOp is determined by the instruction's func field.

[Figure: the route an R-type instruction takes through the single-cycle datapath.]

lw instruction path

An example load instruction is lw $t0, -4($sp). The ALUOp must be set to add, to compute the effective address.

[Figure: the route a lw instruction takes through the single-cycle datapath.]
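The effective-address arithmetic fits in a few lines of Python. This is a minimal model that assumes 32-bit wraparound. `sign_extend16`, `effective_address` and the `$sp` value are hypothetical names and data chosen for illustration; they do not come from the slides.

```python
def sign_extend16(imm: int) -> int:
    """Treat the low 16 bits of imm as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base: int, imm16: int) -> int:
    """Add the base register to the sign-extended offset, wrapping to 32 bits."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


# lw $t0, -4($sp): the 16-bit offset field holds -4 as 0xFFFC.
sp = 0x7FFFEFFC  # hypothetical value of $sp
addr = effective_address(sp, 0xFFFC)
print(hex(addr))  # 0x7fffeff8
```

Sign extension happens before the addition, so a negative offset moves the address downward instead of adding 0xFFFC.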

sw instruction path

An example store instruction is sw $a0, 16($sp). The ALUOp must be set to add, again to compute the effective address.

[Figure: the route a sw instruction takes through the single-cycle datapath.]

beq instruction path

One sample branch instruction is beq $at, $0, offset. The ALUOp is set to subtract, to test for equality. The branch may or may not be taken, depending on the ALU's Zero output.

[Figure: the route a beq instruction takes through the single-cycle datapath.]

Control signal table

Operation  RegDst  RegWrite  ALUSrc  ALUOp  MemWrite  MemRead  MemToReg
add        1       1         0       add    0         0        0
sub        1       1         0       sub    0         0        0
and        1       1         0       and    0         0        0
or         1       1         0       or     0         0        0
slt        1       1         0       slt    0         0        0
lw         0       1         1       add    0         1        1
sw         X       0         1       add    1         0        X
beq        X       0         0       sub    0         0        X

sw and beq are the only instructions that do not write any registers. lw and sw are the only instructions that use the constant field. They also depend on the ALU to compute the effective memory address. ALUOp for R-type instructions depends on the instruction's func field. The PCSrc control signal (not listed) should be set if the instruction is beq and the ALU's Zero output is true.
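One way to make the table concrete is to store it as data. The sketch below is a hypothetical Python encoding; the `Control` tuple, `TABLE` and `pc_src` are illustrative names. `None` stands for a don't-care (X). The mux polarities follow the usual convention: RegDst = 1 selects rd, ALUSrc = 1 selects the constant, and MemToReg = 1 selects the data memory output.

```python
from typing import NamedTuple, Optional


class Control(NamedTuple):
    reg_dst: Optional[int]     # 1 selects rd, 0 selects rt; None = don't care
    reg_write: int
    alu_src: int               # 1 selects the sign-extended constant
    alu_op: str
    mem_write: int
    mem_read: int
    mem_to_reg: Optional[int]  # 1 selects the data memory output; None = don't care


# All five R-type rows share the same signals apart from the ALU operation.
R_TYPE = {op: Control(1, 1, 0, op, 0, 0, 0) for op in ("add", "sub", "and", "or", "slt")}
TABLE = {
    **R_TYPE,
    "lw": Control(0, 1, 1, "add", 0, 1, 1),
    "sw": Control(None, 0, 1, "add", 1, 0, None),
    "beq": Control(None, 0, 0, "sub", 0, 0, None),
}


def pc_src(instr: str, zero: bool) -> int:
    """PCSrc is 1 only for a beq whose subtraction produced zero."""
    return int(instr == "beq" and zero)


print(sorted(name for name, c in TABLE.items() if c.reg_write == 0))  # ['beq', 'sw']
```

The two observations in the text, that only sw and beq skip the register write and only lw and sw use the constant, can be read straight off the data.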

Generating control signals

The control unit needs 13 bits of inputs. Six bits make up the instruction's opcode, six bits come from the instruction's func field, and it also needs the Zero output of the ALU. The control unit generates 10 bits of output, corresponding to the signals mentioned on the previous page. You can build the actual circuit by using big K-maps, big Boolean algebra, or big circuit design programs. The textbook presents a slightly different control unit.

[Figure: the control unit. Its inputs are Instruction [31-26], Instruction [5-0] and Zero. Its outputs are RegDst, RegWrite, ALUSrc, ALUOp, MemWrite, MemRead, MemToReg and PCSrc.]
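Decoding begins by slicing the opcode and func fields out of the 32-bit word. The sketch below uses the standard MIPS encodings for this subset: opcode 0 for R-type, 0x23 for lw, 0x2B for sw and 0x04 for beq, plus func values 0x20, 0x22, 0x24, 0x25 and 0x2A for add, sub, and, or and slt. The function and dictionary names are hypothetical.

```python
OPCODES = {0x00: "R-type", 0x23: "lw", 0x2B: "sw", 0x04: "beq"}
FUNCT = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or", 0x2A: "slt"}


def decode(word: int) -> str:
    """Name the instruction from its opcode and, for R-type only, its func field.

    Raises KeyError for encodings outside this small subset.
    """
    opcode = (word >> 26) & 0x3F  # Instruction [31-26]
    func = word & 0x3F            # Instruction [5-0]
    kind = OPCODES[opcode]
    return FUNCT[func] if kind == "R-type" else kind


print(decode(0x012AA020))  # add $s4, $t1, $t2 -> add
print(decode(0x8FA8FFFC))  # lw $t0, -4($sp) -> lw
```

The opcode alone is enough for lw, sw and beq. The func field is consulted only when the opcode says R-type, which matches the table's note that ALUOp for R-type instructions depends on func.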

Summary - Single Cycle Datapath

A datapath contains all the functional units and connections necessary to implement an instruction set architecture. For our single-cycle implementation, we use two separate memories, an ALU, some extra adders, and lots of multiplexers. MIPS is a 32-bit machine, so most of the buses are 32 bits wide. The control unit tells the datapath what to do, based on the instruction that's currently being executed. Our processor has ten control signals that regulate the datapath. The control signals can be generated by a combinational circuit with the instruction's 32-bit binary encoding as input. Now we'll see the performance limitations of this single-cycle machine and try to improve upon it.

Multicycle datapath

We just saw a single-cycle datapath and control unit for our simple MIPS-based instruction set. A multicycle processor fixes some shortcomings in the single-cycle CPU: faster instructions are not held back by slower ones, the clock cycle time can be decreased, and we don't have to duplicate any hardware units. A multicycle processor requires a somewhat simpler datapath, which we'll see today, but a more complex control unit, which we'll see later.

The single-cycle design again

A control unit (not shown) generates the control signals from the instruction's op and func fields.

[Figure: the single-cycle datapath.]

The example add from last time

Consider the instruction add $s4, $t1, $t2, whose encoding is divided into the fields op, rs, rt, rd, shamt and func. Assume $t1 and $t2 initially contain 1 and 2 respectively. Executing this instruction involves several steps.
1. The instruction word is read from the instruction memory, and the program counter is incremented by 4.
2. The sources $t1 and $t2 are read from the register file.
3. The values 1 and 2 are added by the ALU.
4. The result (3) is stored back into $s4 in the register file.
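Packing the example instruction by hand is a quick way to check the field layout. This minimal Python sketch uses the standard MIPS register numbers ($t1 = 9, $t2 = 10, $s4 = 20) and func value 0x20 for add. `encode_rtype` is an illustrative helper, not a real assembler API.

```python
REGS = {"$t1": 9, "$t2": 10, "$s4": 20}


def encode_rtype(rd: int, rs: int, rt: int, funct: int, shamt: int = 0) -> int:
    """Pack the R-type fields: op(6) rs(5) rt(5) rd(5) shamt(5) func(6)."""
    op = 0  # every R-type instruction has opcode 0
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


word = encode_rtype(REGS["$s4"], REGS["$t1"], REGS["$t2"], funct=0x20)
print(f"{word:032b}")  # fields: 000000 01001 01010 10100 00000 100000
print(hex(word))       # 0x12aa020
```

In the assembly syntax the destination register comes first. In the encoding, rd sits after rs and rt, which is why the helper takes named fields instead of relying on operand order.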

How the add goes through the datapath

[Figure: the add instruction's path through the single-cycle datapath, including the PC + 4 update.]

State elements

In an instruction like add $t1, $t1, $t2, how do we know $t1 is not updated until after its original value is read?

[Figure: the state elements of the datapath, namely the PC, the register file and the data memory.]

The datapath and the clock

STEP 1: A new instruction is loaded from memory. The control unit sets the datapath signals appropriately so that registers are read, ALU output is generated, data memory is read and branch target addresses are computed.
STEP 2: The register file is updated for arithmetic or lw instructions. Data memory is written for a sw instruction. The PC is updated to point to the next instruction.
In a single-cycle datapath everything in Step 1 must complete within one clock cycle.
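The two-step timing answers the state-element question: registers are read during the cycle but written only at the clock edge. The hypothetical `RegisterFile` class below is a software sketch of that behavior, not a hardware description.

```python
class RegisterFile:
    """Reads are immediate; a write is held until clock_edge() (illustrative model)."""

    def __init__(self) -> None:
        self.regs = [0] * 32
        self.pending = None  # (index, value) waiting for the next clock edge

    def read(self, i: int) -> int:
        return self.regs[i]

    def write(self, i: int, value: int) -> None:
        # RegWrite: the value is presented now but stored only at the edge.
        self.pending = (i, value)

    def clock_edge(self) -> None:
        if self.pending is not None:
            i, value = self.pending
            if i != 0:  # $0 is hard-wired to zero
                self.regs[i] = value & 0xFFFFFFFF
            self.pending = None


rf = RegisterFile()
rf.regs[9], rf.regs[10] = 1, 2  # $t1 = 1, $t2 = 2
# STEP 1: add $t1, $t1, $t2 reads both sources and the ALU forms the sum.
rf.write(9, rf.read(9) + rf.read(10))
print(rf.read(9))  # 1: $t1 still holds its original value
# STEP 2: the clock edge updates the register file.
rf.clock_edge()
print(rf.read(9))  # 3
```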

The slowest instruction...

If all instructions must complete within one clock cycle, then the cycle time has to be large enough to accommodate the slowest instruction. For example, lw $t0, -4($sp) needs 8ns, assuming the delays shown here:
reading the instruction memory: 2ns
reading the base register $sp: 1ns
computing $sp-4: 2ns
reading the data memory: 2ns
storing data back to $t0: 1ns
total: 8ns

...determines the clock cycle time

If we make the cycle time 8ns then every instruction will take 8ns, even if it doesn't need that much time. For example, the instruction add $s4, $t1, $t2 really needs just 6ns:
reading the instruction memory: 2ns
reading registers $t1 and $t2: 1ns
computing $t1 + $t2: 2ns
storing the result into $s4: 1ns
total: 6ns

How bad is this?

With these same component delays, a sw instruction would need 7ns, and beq would need just 5ns. Let's consider the gcc instruction mix from p. 89 of the textbook:

Instruction  Arithmetic  Loads  Stores  Branches
Frequency    48%         22%    11%     19%

With a single-cycle datapath, each instruction would require 8ns. But if we could execute instructions as fast as possible, the average time per instruction for gcc would be:

(48% × 6ns) + (22% × 8ns) + (11% × 7ns) + (19% × 5ns) = 6.36ns

The single-cycle datapath is about 1.26 times slower (8ns / 6.36ns)!
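The arithmetic above can be reproduced from the component delays: 2ns for each memory and for the ALU, and 1ns for a register file read or write. A small Python sketch; the dictionary names are illustrative.

```python
# Component delays (ns) from the example above.
IMEM, REG_READ, ALU, DMEM, REG_WRITE = 2, 1, 2, 2, 1

LATENCY = {
    "arithmetic": IMEM + REG_READ + ALU + REG_WRITE,         # 6 ns
    "load":       IMEM + REG_READ + ALU + DMEM + REG_WRITE,  # 8 ns
    "store":      IMEM + REG_READ + ALU + DMEM,              # 7 ns
    "branch":     IMEM + REG_READ + ALU,                     # 5 ns
}
GCC_MIX = {"arithmetic": 0.48, "load": 0.22, "store": 0.11, "branch": 0.19}

single_cycle = max(LATENCY.values())  # every instruction pays the worst case
average = sum(GCC_MIX[k] * LATENCY[k] for k in GCC_MIX)
print(single_cycle, round(average, 2), round(single_cycle / average, 2))  # 8 6.36 1.26
```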

It gets worse...

We've made very optimistic assumptions about memory latency. Main memory accesses on modern machines take >50ns. For comparison, an ALU on the Pentium 4 takes ~0.3ns. Our worst-case cycle (loads/stores) includes 2 memory accesses, so a modern single-cycle implementation would be stuck at <10MHz. Caches will improve common-case access time, not the worst case. Tying frequency to the worst-case path violates the first law of performance (make the common case fast)!!

A multistage approach to instruction execution

We've informally described instructions as executing in several steps:
1. Instruction fetch and PC increment.
2. Reading sources from the register file.
3. Performing an ALU computation.
4. Reading or writing (data) memory.
5. Storing data back to the register file.
What if we made these stages explicit in the hardware design?

Performance benefits

Each instruction can execute only the stages that are necessary:
Arithmetic: stages 1, 2, 3 and 5
Load: stages 1, 2, 3, 4 and 5
Store: stages 1, 2, 3 and 4
Branches: stages 1, 2 and 3
This would mean that instructions complete as soon as possible, instead of being limited by the slowest instruction.

The clock cycle

Things are simpler if we assume that each stage takes one clock cycle. This means instructions will require multiple clock cycles to execute. But since a single stage is fairly simple, the cycle time can be low. For the proposed execution stages and the sample datapath delays shown earlier, each stage needs 2ns at most. This accounts for the slowest devices, the ALU and data memory. A 2ns clock cycle time corresponds to a 500MHz clock rate!
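The cycle-time argument can be written out directly: the clock period is set by the slowest stage, and each instruction class takes one cycle per stage it uses. This sketch assumes the stage delays quoted earlier. It also assumes this stage usage per class: arithmetic uses stages 1, 2, 3 and 5; loads use 1 to 5; stores use 1 to 4; branches use 1 to 3. The names are illustrative.

```python
# Stage delays (ns): 1 fetch, 2 register read, 3 ALU, 4 memory, 5 write-back.
STAGE_DELAY = {1: 2, 2: 1, 3: 2, 4: 2, 5: 1}
STAGES_USED = {
    "arithmetic": [1, 2, 3, 5],
    "load": [1, 2, 3, 4, 5],
    "store": [1, 2, 3, 4],
    "branch": [1, 2, 3],
}

cycle_ns = max(STAGE_DELAY.values())  # the slowest stage sets the clock period
clock_mhz = 1000 / cycle_ns           # 1 / (2 ns) = 500 MHz
cycles = {kind: len(stages) for kind, stages in STAGES_USED.items()}
print(cycle_ns, clock_mhz)  # 2 500.0
print(cycles)               # arithmetic 4, load 5, store 4, branch 3
```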

Cost benefits

As an added bonus, we can eliminate some of the extra hardware from the single-cycle datapath. We will restrict ourselves to using each functional unit once per cycle, just like before. But since instructions require multiple cycles, we could reuse some units in a different cycle during the execution of a single instruction. For example, we could use the same ALU to increment the PC (first clock cycle) and for arithmetic operations (third clock cycle).

Two extra adders

Our original single-cycle datapath had an ALU and two adders. The arithmetic-logic unit had two responsibilities: doing an operation on two registers for arithmetic instructions, and adding a register to a sign-extended constant to compute effective addresses for lw and sw instructions. One of the extra adders incremented the PC by computing PC + 4. The other adder computed branch targets by adding a sign-extended, shifted offset to (PC + 4).
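The two adders' jobs can be sketched as a next-PC function. The names are hypothetical and the sketch assumes 32-bit wraparound. The final selection plays the role of the PCSrc multiplexer.

```python
def sign_extend16(imm: int) -> int:
    """Treat the low 16 bits of imm as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc(pc: int, offset16: int, branch_taken: bool) -> int:
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF  # first extra adder
    # Second extra adder: (PC + 4) plus the sign-extended offset shifted left 2.
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
    return target if branch_taken else pc_plus_4  # PCSrc mux


print(hex(next_pc(0x00400010, 3, True)))       # 0x400020 (forward 3 words)
print(hex(next_pc(0x00400010, 0xFFFE, True)))  # 0x40000c (back 2 words)
print(hex(next_pc(0x00400010, 3, False)))      # 0x400014 (not taken)
```

The shift by 2 turns a word offset into a byte offset. The offset is counted from PC + 4, not from the branch itself.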

The extra single-cycle adders

[Figure: the single-cycle datapath with the PC + 4 adder and the branch-target adder highlighted.]

Our new adder setup

We can eliminate both extra adders in a multicycle datapath and instead use just one ALU, with multiplexers to select the proper inputs. A 2-to-1 mux, ALUSrcA, sets the first ALU input to be the PC or a register. A 4-to-1 mux, ALUSrcB, selects the second ALU input from among:
the register file (for arithmetic operations),
a constant 4 (to increment the PC),
a sign-extended constant (for effective addresses), and
a sign-extended and shifted constant (for branch targets).
This permits a single ALU to perform all of the necessary functions: arithmetic operations on two register operands, incrementing the PC, computing effective addresses for lw and sw, and adding a sign-extended, shifted offset to (PC + 4) for branches.
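A rough software model of the two ALU input muxes shows how one adder covers all four uses. The select encodings are an assumption. ALUSrcA 0 = PC and 1 = register A. ALUSrcB 0 to 3 follow the order in which the inputs are listed above. The function names are hypothetical.

```python
def sign_extend16(imm: int) -> int:
    """Treat the low 16 bits of imm as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def alu_inputs(alu_src_a: int, alu_src_b: int, pc: int, a: int, b: int, imm16: int):
    """Return the (first, second) ALU operands chosen by the two input muxes."""
    first = pc if alu_src_a == 0 else a  # 2-to-1 mux: PC or register A
    ext = sign_extend16(imm16)
    # 4-to-1 mux: register B, constant 4, sign-extended, sign-extended and shifted.
    second = (b, 4, ext, ext << 2)[alu_src_b]
    return first, second


def alu_add(x: int, y: int) -> int:
    return (x + y) & 0xFFFFFFFF


pc, a, b = 0x00400010, 0x7FFFEFFC, 5
print(hex(alu_add(*alu_inputs(0, 1, pc, a, b, 0))))       # 0x400014: PC + 4
print(hex(alu_add(*alu_inputs(1, 2, pc, a, b, 0xFFFC))))  # 0x7fffeff8: $sp + (-4)
print(hex(alu_add(*alu_inputs(0, 3, pc + 4, a, b, 3))))   # 0x400020: branch target
```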

The multicycle adder setup highlighted

[Figure: the multicycle datapath with the ALU input muxes highlighted. ALUSrcA selects the PC or a register. ALUSrcB selects a register, the constant 4, a sign-extended constant, or a sign-extended and shifted constant.]

Eliminating a memory

Similarly, we can get by with one unified memory, which will store both program instructions and data (a Princeton architecture). This memory is used in both the instruction fetch and data access stages, and the address could come from either the PC register (when we're fetching an instruction) or the ALU output (for the effective address of a lw or sw). We add another 2-to-1 mux, IorD, to decide whether the memory is being accessed for instructions or for data.
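A unified memory plus the IorD mux might be modeled like this. It is only a sketch: the memory is a Python dict keyed by byte address, and the IorD polarity (0 = PC, 1 = ALUOut) is assumed. `UnifiedMemory` and `memory_address` are hypothetical names.

```python
class UnifiedMemory:
    """One store for both instructions and data, keyed by byte address."""

    def __init__(self) -> None:
        self.words = {}

    def read(self, addr: int) -> int:
        return self.words.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self.words[addr] = value & 0xFFFFFFFF


def memory_address(i_or_d: int, pc: int, alu_out: int) -> int:
    """IorD mux: 0 selects the PC (instruction fetch), 1 selects ALUOut (data access)."""
    return pc if i_or_d == 0 else alu_out


mem = UnifiedMemory()
mem.write(0x00400000, 0x8FA8FFFC)  # the word for lw $t0, -4($sp)
mem.write(0x7FFFEFF8, 42)          # a data word in the same memory
print(hex(mem.read(memory_address(0, 0x00400000, 0x7FFFEFF8))))  # 0x8fa8fffc
print(mem.read(memory_address(1, 0x00400000, 0x7FFFEFF8)))       # 42
```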

The new memory setup highlighted

[Figure: the multicycle datapath with the single memory and the IorD address mux highlighted.]

Intermediate registers

Sometimes we need the output of a functional unit in a later clock cycle during the execution of one instruction. The instruction word fetched in stage 1 determines the destination of the register write in stage 5. The ALU result for an address computation in stage 3 is needed as the memory address for lw or sw in stage 4. These outputs will have to be stored in intermediate registers for future use. Otherwise they would probably be lost by the next clock cycle.
The instruction read in stage 1 is saved in the instruction register.
Register file outputs from stage 2 are saved in registers A and B.
The ALU output will be stored in a register ALUOut.
Any data fetched from memory in stage 4 is kept in the memory data register, also called MDR.
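To see why the intermediate registers matter, here is a hypothetical Python walk-through of one lw. It executes one stage per cycle and latches each result in IR, A, B, ALUOut or MDR. It assumes the standard lw encoding (rs in bits 25-21, rt in bits 20-16, a 16-bit offset) and uses a dict to stand in for the single memory. `run_lw` is an illustrative name.

```python
def run_lw(mem: dict, regs: list, pc: int) -> int:
    """Step one lw through five cycles; return the updated PC."""
    # Cycle 1: fetch into IR; the ALU computes PC + 4.
    ir = mem[pc]
    pc = pc + 4
    # Cycle 2: read the sources into A and B (B is latched even though lw ignores it).
    rs, rt = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F
    a, b = regs[rs], regs[rt]
    # Cycle 3: the ALU adds A and the sign-extended offset; the result goes to ALUOut.
    imm = ir & 0xFFFF
    imm = imm - 0x10000 if imm & 0x8000 else imm
    alu_out = (a + imm) & 0xFFFFFFFF
    # Cycle 4: read memory at ALUOut into MDR.
    mdr = mem[alu_out]
    # Cycle 5: write MDR to rt. IR still holds the instruction, so rt is still known.
    regs[rt] = mdr
    return pc


mem = {0x00400000: 0x8FA8FFFC, 0x7FFFEFF8: 42}  # lw $t0, -4($sp) and one data word
regs = [0] * 32
regs[29] = 0x7FFFEFFC  # $sp
pc = run_lw(mem, regs, 0x00400000)
print(regs[8], hex(pc))  # 42 0x400004
```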

The final multicycle datapath

[Figure: the final multicycle datapath. It shows the PC, the single memory with the IorD address mux, the instruction register (IR) and memory data register (MDR), and the register file with the RegDst and MemToReg muxes. It also shows the intermediate registers A, B and ALUOut, the sign extender, the shift-left-2 unit, and the ALU with the ALUSrcA and ALUSrcB input muxes.]

Register write control signals

We have to add a few more control signals to the datapath. Since instructions now take a variable number of cycles to execute, we cannot update the PC on each cycle. Instead, a PCWrite signal controls the loading of the PC. The instruction register also has a write signal, IRWrite. We need to keep the instruction word for the duration of its execution, and must explicitly re-load the instruction register when needed. The other intermediate registers, MDR, A, B and ALUOut, will store data for only one clock cycle at most, and do not need write control signals.

Summary

A single-cycle CPU has two main disadvantages: the cycle time is limited by the worst-case latency, and it requires more hardware than necessary. A multicycle processor splits instruction execution into several stages. Instructions only execute as many stages as required. Each stage is relatively simple, so the clock cycle time is reduced. Functional units can be reused on different cycles. We made several modifications to the single-cycle datapath. The two extra adders and one memory were removed. Multiplexers were inserted so the ALU and memory can be used for different purposes in different execution stages. New registers are needed to store intermediate results. Next time, we'll look at controlling this datapath.