The single-cycle design from last time


Multicycle datapath

Last time we saw a single-cycle datapath and control unit for our simple MIPS-based instruction set. A multicycle processor fixes some shortcomings in the single-cycle CPU.
- Faster instructions are not held back by slower ones.
- The clock cycle time can be decreased.
- We don't have to duplicate any hardware units.
A multicycle processor requires a somewhat simpler datapath, which we'll see today, but a more complex control unit that we'll save for next week.

The single-cycle design from last time

A control unit (not shown) generates all the control signals from the instruction's op and func fields.

[Figure: the single-cycle datapath, with the PC, instruction memory, register file, ALU, data memory, sign-extend and shift-left-2 units, two extra adders, and the control signals PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite and MemToReg.]

The example add from last time

Consider the instruction add $s4, $t1, $t2. It is encoded with the fields op, rs, rt, rd, shamt and func. Assume $t1 and $t2 initially contain 1 and 2 respectively. Executing this instruction involves several steps.
1. The instruction word is read from the instruction memory, and the program counter is incremented by 4.
2. The sources $t1 and $t2 are read from the register file.
3. The values 1 and 2 are added by the ALU.
4. The result (3) is stored back into $s4 in the register file.

How the add goes through the datapath

[Figure: the add instruction traced through the single-cycle datapath.]

Edge-triggered state elements

In an instruction like add $t1, $t1, $t2, how do we know $t1 is not updated until after its original value is read? We'll assume that our state elements are positive edge triggered, and can be updated only on the positive edge of a clock signal.
- The register file and data memory have explicit write control signals, RegWrite and MemWrite. These units can be written to only if the control signal is asserted and there is a positive clock edge.
- In a single-cycle machine the PC is updated on each clock cycle, so we don't bother to give it an explicit write control signal.

[Figure: the register file with RegWrite, the data memory with MemWrite and MemRead, and the PC.]
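
The edge-triggered behaviour can be sketched in a few lines of Python. This is a toy model, not the hardware: the class name, method names and the register names $t1 and $t2 are assumptions chosen for illustration. It shows that a read during the cycle still sees the old value, and the new value only appears after the clock edge with RegWrite asserted.

```python
class RegisterFile:
    """Toy positive-edge-triggered register file (illustrative only)."""

    def __init__(self):
        self.regs = {"$t1": 1, "$t2": 2}
        self.pending = None  # (register, value) waiting for the next clock edge

    def read(self, name):
        # Reads are combinational: they see whatever is currently stored.
        return self.regs[name]

    def request_write(self, name, value, reg_write):
        # The write is only armed when the RegWrite control signal is asserted.
        self.pending = (name, value) if reg_write else None

    def clock_edge(self):
        # State changes happen here, and only here.
        if self.pending is not None:
            name, value = self.pending
            self.regs[name] = value
            self.pending = None


rf = RegisterFile()
total = rf.read("$t1") + rf.read("$t2")   # add $t1, $t1, $t2 reads the old $t1
rf.request_write("$t1", total, reg_write=True)
before_edge = rf.read("$t1")              # still the original value
rf.clock_edge()
after_edge = rf.read("$t1")               # updated only after the edge
print(before_edge, after_edge)
```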

The datapath and the clock

1. On a positive clock edge, the PC is updated with a new address.
2. A new instruction can then be loaded from memory. The control unit sets the datapath signals appropriately so that registers are read, ALU output is generated, data memory is read or written, and branch target addresses are computed.
3. Several things happen on the next positive clock edge.
   - The register file is updated for arithmetic or lw instructions.
   - Data memory is written for a sw instruction.
   - The PC is updated to point to the next instruction.
In a single-cycle datapath everything in Step 2 must complete within one clock cycle, before the next positive clock edge.

The slowest instruction...

If all instructions must complete within one clock cycle, then the cycle time has to be large enough to accommodate the slowest instruction. For example, lw $t0, -4($sp) needs 8ns, assuming the delays shown here.

reading the instruction memory       2ns
reading the base register $sp        1ns
computing memory address $sp-4       2ns
reading the data memory              2ns
storing data back to $t0             1ns
total                                8ns

[Figure: the datapath annotated with component delays: 2ns each for the instruction memory, the ALU and the data memory, and 1ns for a register file read or write.]

...determines the clock cycle time

If we make the cycle time 8ns then every instruction will take 8ns, even if it doesn't need that much time. For example, the instruction add $s4, $t1, $t2 really needs just 6ns.

reading the instruction memory              2ns
reading registers $t1 and $t2               1ns
computing $t1 + $t2                         2ns
storing the result into the destination     1ns
total                                       6ns

[Figure: the same datapath annotated with component delays.]
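
The arithmetic behind these figures can be checked with a short Python sketch. The component delays are the ones quoted on the slides; the dictionary keys and the exact list of components per instruction are assumptions made for illustration (the sw and beq paths are chosen to match the 7ns and 5ns totals the slides quote for them).

```python
# Component delays in nanoseconds, as quoted on the slides.
DELAY_NS = {
    "instruction_memory": 2,
    "register_read": 1,
    "alu": 2,
    "data_memory": 2,
    "register_write": 1,
}

# Components each instruction has to pass through (illustrative names).
CRITICAL_PATH = {
    "lw":  ["instruction_memory", "register_read", "alu", "data_memory", "register_write"],
    "add": ["instruction_memory", "register_read", "alu", "register_write"],
    "sw":  ["instruction_memory", "register_read", "alu", "data_memory"],
    "beq": ["instruction_memory", "register_read", "alu"],
}


def latency_ns(instruction):
    """Sum the delays of every component on the instruction's path."""
    return sum(DELAY_NS[component] for component in CRITICAL_PATH[instruction])


# A single-cycle clock must be long enough for the slowest instruction.
single_cycle_time_ns = max(latency_ns(name) for name in CRITICAL_PATH)

for name in CRITICAL_PATH:
    print(name, latency_ns(name), "ns")
print("single-cycle clock:", single_cycle_time_ns, "ns")
```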

How bad is this?

With these same component delays, a sw instruction would need 7ns, and beq would need just 5ns. Let's consider the gcc instruction mix from p. 89 of the textbook (ed2).

Instruction    Frequency
Arithmetic     48%
Loads          22%
Stores         11%
Branches       19%

With a single-cycle datapath, each instruction would require 8ns. But if we could execute instructions as fast as possible, the average time per instruction for gcc would be:

(48% × 6ns) + (22% × 8ns) + (11% × 7ns) + (19% × 5ns) = 6.36ns

The single-cycle datapath is about 1.26 times slower!
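
As a quick check of the weighted average, here is a small Python sketch. The frequencies and per-instruction times are the ones on this slide; the variable names are assumptions for illustration.

```python
# (frequency, time in ns) for each instruction class in the gcc mix.
MIX = {
    "arithmetic": (0.48, 6),
    "loads":      (0.22, 8),
    "stores":     (0.11, 7),
    "branches":   (0.19, 5),
}

SINGLE_CYCLE_NS = 8  # every instruction takes the full cycle

# Average time if each instruction took only as long as it needs.
average_ns = sum(freq * time_ns for freq, time_ns in MIX.values())

# How much slower the single-cycle design is than that ideal.
slowdown = SINGLE_CYCLE_NS / average_ns

print(f"average: {average_ns:.2f} ns, slowdown: {slowdown:.2f}x")
```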

It gets worse...

Our small instruction set includes only very simple operations. If we supported more complex, time-consuming instructions, then the performance penalty of a single-cycle machine could be much higher.
- Integer multiplication and division, or floating-point operations
- Complex addressing modes like the 8086
- Vector-based, SIMD instructions like MMX

...and worse...

A single-cycle datapath also uses extra hardware: one ALU is not enough, since we must do up to three calculations in one clock cycle for a beq.

[Figure: the single-cycle datapath, with the ALU and the two extra adders.]

...and worse

This is also why we used a Harvard architecture with two memories; you can't easily read two addresses from the same memory in one cycle.

[Figure: the single-cycle datapath, with its separate instruction and data memories.]

A multistage approach to instruction execution

We've informally described instructions as executing in several steps.
1. Instruction fetch and PC increment.
2. Reading sources from the register file.
3. Performing an ALU computation.
4. Reading or writing (data) memory.
5. Storing data back to the register file.
What if we made these stages explicit in the hardware design?

Performance benefits

Each instruction can execute only the stages that are necessary.
- Arithmetic operations never read or write memory.
- A sw instruction does not save anything to the register file.
- Branches neither access memory nor write to the registers.
This would mean that instructions complete as soon as possible, instead of being limited by the slowest instruction.

Proposed execution stages
1. Instruction fetch and PC increment
2. Reading sources from the register file
3. Performing an ALU computation
4. Reading or writing (data) memory
5. Storing data back to the register file
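
One way to picture "only the stages that are necessary" is as a filter over the five proposed stages. The sketch below is a simplification under stated assumptions: the short stage labels and the SKIPPED table are invented for illustration and follow the three observations on this slide.

```python
# Short labels for the five proposed execution stages (labels are assumptions).
STAGES = ["fetch", "register_read", "alu", "memory", "write_back"]

# Stages each instruction class can skip, following the slide's observations.
SKIPPED = {
    "arithmetic": {"memory"},                # never reads or writes memory
    "lw":         set(),                     # needs every stage
    "sw":         {"write_back"},            # saves nothing to the register file
    "beq":        {"memory", "write_back"},  # no memory access, no register write
}


def stages_for(kind):
    """Return, in order, the stages an instruction of this kind executes."""
    return [stage for stage in STAGES if stage not in SKIPPED[kind]]


for kind in SKIPPED:
    print(kind, len(stages_for(kind)), stages_for(kind))
```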

The clock cycle

Things are simpler if we assume that each stage takes one clock cycle.
- This means instructions will require multiple clock cycles to execute.
- But since a single stage is fairly simple, the cycle time can be low.
For the five proposed execution stages and the sample datapath delays shown earlier, each stage needs 2ns at most.
- This accounts for the slowest devices, the ALU and memory.
- A 2ns clock cycle time corresponds to a 500MHz clock rate!
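
The cycle time and clock rate follow directly from the slowest stage. A minimal Python sketch, assuming the per-stage delays are the component delays quoted earlier in the slides (the stage labels are illustrative):

```python
# Worst-case delay of each stage in nanoseconds (labels are assumptions).
STAGE_DELAY_NS = {
    "fetch": 2,          # instruction memory
    "register_read": 1,  # register file read
    "alu": 2,            # ALU operation
    "memory": 2,         # data memory access
    "write_back": 1,     # register file write
}

# Every stage gets one clock cycle, so the cycle must fit the slowest stage.
cycle_time_ns = max(STAGE_DELAY_NS.values())

# 1 / (t ns) = (1000 / t) MHz
clock_rate_mhz = 1000 / cycle_time_ns

print(cycle_time_ns, "ns per cycle ->", clock_rate_mhz, "MHz")
```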

Cost benefits

As an added bonus, we can eliminate some of the extra hardware from the single-cycle datapath.
- We are still restricted to using each functional unit once per cycle, just like before.
- But since instructions require multiple cycles, we could reuse some units in a different cycle during the execution of a single instruction.
For example, we could use an ALU to increment the PC in the first clock cycle of an instruction execution, and then reuse that ALU for arithmetic operations in the third cycle.

Two extra adders

Our original single-cycle datapath had an ALU and two adders. The arithmetic-logic unit had two responsibilities.
- Doing an operation on two registers for arithmetic instructions. (3rd stage)
- Adding a register to a sign-extended constant, to compute effective addresses for lw and sw instructions. (3rd stage)
One of the extra adders incremented the PC by computing PC + 4. (1st stage)
The other adder computed branch targets, by adding a sign-extended, shifted offset to (PC + 4). (3rd stage)

The extra single-cycle adders

[Figure: the single-cycle datapath, with the PC + 4 adder and the branch-target adder shown alongside the ALU.]

Our new adder setup

We can eliminate both extra adders in a multicycle datapath, and instead use just one ALU, with multiplexers to select the proper inputs.
- A 2-to-1 mux ALUSrcA sets the first ALU input to be the PC or a register.
- A 4-to-1 mux ALUSrcB selects the second ALU input from among: the register file (for arithmetic operations), a constant 4 (to increment the PC), a sign-extended constant (for effective addresses), and a sign-extended and shifted constant (for branch targets).
This permits a single ALU to perform all of the necessary functions.
- Arithmetic operations on two register operands.
- Incrementing the PC.
- Computing effective addresses for lw and sw.
- Adding a sign-extended, shifted offset to (PC + 4) for branches.
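
The two input multiplexers can be sketched as plain selection functions. This is a hedged illustration, not the actual hardware description: the function names are invented, and the select encodings (ALUSrcA: 0 = PC, 1 = register; ALUSrcB: 0 = register, 1 = constant 4, 2 = sign-extended constant, 3 = sign-extended and shifted constant) are assumptions that follow the order in which the slide lists the inputs.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def alu_inputs(alu_src_a, alu_src_b, pc, reg_a, reg_b, imm16):
    """Return the two ALU operands chosen by ALUSrcA and ALUSrcB."""
    first = pc if alu_src_a == 0 else reg_a          # 2-to-1 mux
    extended = sign_extend16(imm16)
    second = [reg_b, 4, extended, extended << 2][alu_src_b]  # 4-to-1 mux
    return first, second


# Incrementing the PC: PC and the constant 4.
print(alu_inputs(0, 1, pc=100, reg_a=0, reg_b=0, imm16=0))
# Arithmetic on two registers.
print(alu_inputs(1, 0, pc=100, reg_a=1, reg_b=2, imm16=0))
# Effective address: register plus sign-extended offset (0xFFFC is -4).
print(alu_inputs(1, 2, pc=100, reg_a=1000, reg_b=0, imm16=0xFFFC))
# Branch target: PC plus the sign-extended offset shifted left by 2.
print(alu_inputs(0, 3, pc=104, reg_a=0, reg_b=0, imm16=3))
```

With a single ALU, the choice of operands is the only thing that differs between these four uses; the addition itself is the same.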

The multicycle adder setup highlighted

[Figure: the multicycle datapath with the single ALU and its two input multiplexers, ALUSrcA and ALUSrcB, highlighted.]

Eliminating a memory

Similarly, we can get by with one unified memory, which will store both program instructions and data. This memory is used in both the instruction fetch and data access stages, and the address could come from either:
- the PC register (when we're fetching an instruction), or
- the ALU output (for the effective address of a lw or sw).
We add another 2-to-1 mux, IorD, to decide whether the memory is being accessed for instructions or for data.
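
The address selection is small enough to sketch directly. In the Python below, the function name, the encoding (0 = instruction fetch from the PC, 1 = data access from the ALU output) and the memory contents are all assumptions for illustration.

```python
def memory_address(i_or_d, pc, alu_out):
    """2-to-1 mux: 0 selects the PC, 1 selects the ALU output."""
    return pc if i_or_d == 0 else alu_out


# One unified memory holding both an instruction and a data word.
memory = {0: "lw $t0, -4($sp)", 996: 42}

# Instruction fetch stage: the address comes from the PC.
fetched = memory[memory_address(0, pc=0, alu_out=996)]
# Data access stage: the address comes from the ALU output.
loaded = memory[memory_address(1, pc=0, alu_out=996)]

print(fetched, loaded)
```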

The new memory setup highlighted

[Figure: the multicycle datapath with the unified memory and the IorD multiplexer on its address input highlighted.]

Intermediate registers

Sometimes we need the output of a functional unit in a later clock cycle during the execution of one instruction.
- The instruction word fetched in stage 1 determines the destination of the register write in stage 5.
- The ALU result for an address computation in stage 3 is needed as the memory address for lw or sw in stage 4.
These outputs will have to be stored in intermediate registers for future use. Otherwise they would probably be lost by the next clock cycle.
- The instruction read in stage 1 is saved in the instruction register (IR).
- Register file outputs from stage 2 are saved in registers A and B.
- The ALU output will be stored in a register ALUOut.
- Any data fetched from memory in stage 4 is kept in the memory data register, also called MDR.
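
To see why each intermediate register is needed, it can help to trace one load through the five stages. The Python below is a rough sketch under stated assumptions: instructions are stored as already-decoded tuples rather than 32-bit words, the function name and the memory contents are invented, and each local variable stands in for one intermediate register.

```python
def execute_lw(pc, memory, regs):
    """Trace one lw through five stages, one stage per clock cycle."""
    # Stage 1: fetch the instruction into IR and increment the PC.
    ir = memory[pc]
    pc = pc + 4
    _, rt, offset, rs = ir           # fields decoded from IR in later stages

    # Stage 2: read the register file into A and B.
    a = regs[rs]
    b = regs[rt]                     # read but not used by lw

    # Stage 3: ALUOut holds the effective address A + offset.
    alu_out = a + offset

    # Stage 4: MDR holds the data read from memory at ALUOut.
    mdr = memory[alu_out]

    # Stage 5: write MDR back; the destination still comes from IR.
    regs[rt] = mdr
    return pc, {"IR": ir, "A": a, "B": b, "ALUOut": alu_out, "MDR": mdr}


memory = {0: ("lw", "$t0", -4, "$sp"), 996: 42}
regs = {"$t0": 0, "$sp": 1000}
new_pc, latched = execute_lw(0, memory, regs)
print(new_pc, latched, regs)
```

Each value computed in one stage is consumed in a later one, which is exactly what IR, A, B, ALUOut and MDR preserve across clock cycles.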

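The final multicycle datapath

[Figure: the complete multicycle datapath: the PC, the unified memory, the instruction register IR with fields [31-26], [25-21], [20-16], [15-11] and [15-0], the memory data register, the register file with output registers A and B, the sign-extend and shift-left-2 units, the ALU and ALUOut, and the control signals PCWrite, IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, PCSource and MemToReg.]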

Register write control signals

We have to add a few more control signals to the datapath.
- Since instructions now take a variable number of cycles to execute, we cannot update the PC on each cycle. Instead, a PCWrite signal controls the loading of the PC.
- The instruction register also has a write signal, IRWrite. We need to keep the instruction word for the duration of its execution, and must explicitly re-load the instruction register when needed.
- The other intermediate registers, MDR, A, B and ALUOut, will store data for only one clock cycle at most, and do not need write control signals.
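
The difference between a register with a write signal and one without can be modelled in a few lines. This is a toy sketch: the Register class and its parameters are assumptions for illustration, not part of the slides.

```python
class Register:
    """Toy edge-triggered register, optionally gated by a write signal."""

    def __init__(self, value=0, has_write_signal=False):
        self.value = value
        self.has_write_signal = has_write_signal

    def clock_edge(self, data_in, write=False):
        # Registers without a write signal reload on every clock edge;
        # gated registers (PC, IR) only load when their signal is asserted.
        if not self.has_write_signal or write:
            self.value = data_in


pc = Register(0, has_write_signal=True)   # gated by PCWrite
alu_out = Register()                      # no write signal

pc.clock_edge(4, write=False)             # PCWrite deasserted: PC holds 0
held = pc.value
pc.clock_edge(4, write=True)              # PCWrite asserted: PC becomes 4
loaded = pc.value

alu_out.clock_edge(7)                     # overwritten every cycle
alu_out.clock_edge(9)
print(held, loaded, alu_out.value)
```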

Summary

A single-cycle CPU has two main disadvantages.
- The cycle time is limited by the slowest instruction.
- It requires more hardware than necessary.
A multicycle processor splits instruction execution into several stages.
- Instructions only execute as many stages as required.
- Each stage is relatively simple, so the clock cycle time is reduced.
- Functional units can be reused on different cycles.
We made several modifications to the single-cycle datapath.
- The two extra adders and one memory were removed.
- Multiplexers were inserted so the ALU and memory can be used for different purposes in different execution stages.
- New registers are needed to store intermediate results.
Next Monday we'll look at controlling this beast, which will also help us understand how this datapath works.