Pipelined Datapath. One register file is enough

Similar documents
CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W12-M

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

The extra single-cycle adders

Single-Cycle Examples, Multi-Cycle Introduction

ECE 154A Introduction to. Fall 2012

CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W9-W

Computer Science 141 Computing Hardware

Multicycle conclusion

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts

Processor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Multi-cycle Approach. Single cycle CPU. Multi-cycle CPU. Requires state elements to hold intermediate values. one clock cycle or instruction

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

CPE 335 Computer Organization. Basic MIPS Architecture Part I

EE 457 Unit 6a. Basic Pipelining Techniques

The final datapath. M u x. Add. 4 Add. Shift left 2. PCSrc. RegWrite. MemToR. MemWrite. Read data 1 I [25-21] Instruction. Read. register 1 Read.

Review: Abstract Implementation View

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Chapter 4 (Part II) Sequential Laundry

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?

Processor: Multi- Cycle Datapath & Control

Processor (multi-cycle)

COMP2611: Computer Organization. The Pipelined Processor

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

ECE 313 Computer Organization EXAM 2 November 9, 2001

RISC Design: Multi-Cycle Implementation

Chapter 4. The Processor

Chapter 4. The Processor

More advanced CPUs. August 4, Howard Huang 1

Improving Performance: Pipelining

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12

Pipelined Processor Design

Chapter 4 The Processor 1. Chapter 4A. The Processor

Review Multicycle: What is Happening. Controlling The Multicycle Design

Designing a Pipelined CPU

Lecture 5: The Processor

Multicycle Approach. Designing MIPS Processor

The single-cycle design from last time

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

COSC 6385 Computer Architecture - Pipelining

Basic Pipelining Concepts

The overall datapath for RT, lw,sw beq instrucution

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions

Lecture 7 Pipelining. Peng Liu.

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

Computer and Information Sciences College / Computer Science Department The Processor: Datapath and Control

Multiple Cycle Data Path

Computer Architectures. DLX ISA: Pipelined Implementation

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

14:332:331 Pipelined Datapath

Pipelining: Basic Concepts

COMPUTER ORGANIZATION AND DESIGN

CPE 335. Basic MIPS Architecture Part II

SI232 Set #20: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Chapter 6 ADMIN. Reading for Chapter 6: 6.1,

ECS 154B Computer Architecture II Spring 2009

CSE 141 Computer Architecture Spring Lectures 11 Exceptions and Introduction to Pipelining. Announcements

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Lecture 10 Multi-Cycle Implementation

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Pipelining. Pipeline performance

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Unpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory

Pipelined Processor Design

Appendix C: Pipelining: Basic and Intermediate Concepts

The Pipelined MIPS Processor

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

CPE 335 Computer Organization. Basic MIPS Pipelining Part I

Learning Outcomes. Spiral 3-3. Sorting: Software Implementation REVIEW

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Advanced processor designs

Alternative to single cycle. Drawbacks of single cycle implementation. Multiple cycle implementation. Instruction fetch

Working on the Pipeline

Pipeline Architecture RISC

Lecture 6: Pipelining

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

EC 413 Computer Organization - Fall 2017 Problem Set 3 Problem Set 3 Solution

What do we have so far? Multi-Cycle Datapath (Textbook Version)

Topic #6. Processor Design

Ch 5: Designing a Single Cycle Datapath

CISC 662 Graduate Computer Architecture Lecture 5 - Pipeline Pipelining

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

Processor (II) - pipelining. Hwansoo Han

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Module 4c: Pipelining

ECE369. Chapter 5 ECE369

Mapping Control to Hardware

Pipelining. Chapter 4

are Softw Instruction Set Architecture Microarchitecture are rdw

CENG 3420 Lecture 06: Datapath

Design of the MIPS Processor

Full Datapath. Chapter 4 The Processor 2

ECEC 355: Pipelining

Simple Instruction Pipelining

Transcription:

ipelined path The goal of pipelining is to allow multiple instructions execute at the same time We may need to perform several operations in a cycle Increment the and add s at the same time. Fetch one instruction while another one reads or writes. lock cycle 3 5 6 7 lw $t, ($sp) IF ID E E sub$v, $a, $a IF ID E E and$t, $t, $t3 IF ID E E or $s, $s, $s IF ID E E add$t5, $t6, $ IF ID E E Thus, like the single-cycle path, a pipelined processor needs to duplicate hardware elements that are needed in the same clock cycle One file is enough We need only one file to support both the ID and stages. s and writes go to separate ports on the file s occur in the first half of the cycle, reads occur in the second half

Single-cycle path, slightly rearranged Src Reg [3-] em ress Src Op emtoreg Instr [5 - ] Instr [ - 6] Instr [5 - ] RegDst em 3 What s been changed? Almost nothing! This is equivalent to the original single-cycle path There are separate memories for instructions and There are adders for -based computations and one The control signals are the same Only some cosmetic changes were made to make the diagram smaller A few labels are missing, and the muxes are smaller. The has only one ress input. The actual operation can be determined from the em and em control signals. The path components have also been moved around in preparation for adding pipeline s

ultiple cycles In pipelining, instruction is execution divided into multiple cycles Information computed during one cycle may be needed in a later cycle The instruction read in the IF stage determines which s are fetched in the ID stage, what constant is used for the E stage, and what the destination is for The s read in ID are used in the E and/or E stages The output produced in the E stage is an effective for the E stage or a result for the stage We added several intermediate s to the multicycle path to preserve information between stages, as highlighted on the next slide 5 added to the multi-cycle IorD SrcA u x em ress emory em em IR [3-6] [5-] [-6] [5-] [5-] RegDst u x Reg A B u x 3 Op Out u x Source emory u x SrcB emtoreg 6 3

ipeline s We ll add intermediate s to our pipelined path There s a lot of information to save, however. We ll simplify our diagrams by drawing just one big pipeline between each stage The s are named for the stages they connect. ID/E E/E E/ No is needed after the stage, because after the instruction is done 7 ipelined path Src ID/E E/E E/ Reg [3-] em ress Src Op emtoreg Instr [5 - ] Instr [ - 6] Instr [5 - ] RegDst em

ropagating values forward Any values required in later stages must be propagated through the pipeline s The most extreme example is the destination The rd field of the instruction word, retrieved in the first stage (IF), determines the destination. But that isn t updated until the fifth stage () Thus, the rd field must be passed through all of the pipeline stages, as shown in red on the next slide Notice that we can t keep a single instruction like we did before in the multicycle path, because the pipelined machine needs to fetch a new instruction every clock cycle The destination Src ID/E E/E E/ Reg [3-] em ress Src Op emtoreg Instr [5 - ] Instr [ - 6] Instr [5 - ] RegDst em 5

What about control signals? The control signals are generated in the same way as in the single-cycle processor after an instruction is fetched, the processor decodes it and produces all of the appropriate control values But just like before, some of the control signals will not be needed until some later stage and clock cycle These signals must be propagated through the pipeline until they reach the appropriate stage. We can just pass them in the pipeline s, along with the other. ontrol signals can be categorized by the pipeline stage that uses them ipelined path and control ID/E Src ontrol E/E E/ E Reg [3-] em ress Stage ontrol Instr [5 signals - ] needed E Src Op RegDst Instr [ - 6] E em Instr em [5 - ] Src Reg emtoreg Src RegDst Op em emtoreg 6

ipelined path and control ID/E Src ontrol E/E E/ E Reg [3-] em ress Src Op emtoreg Instr [5 - ] Instr [ - 6] Instr [5 - ] RegDst em 3 Notes about the diagram The control signals are grouped together in the pipeline s, just to make the diagram a little clearer. Not all of the s have a write enable signal Because the path fetches one instruction per cycle, the must also be updated on each clock cycle. Including a write enable for the would be redundant Similarly, the pipeline s are also written on every cycle, so no explicit write signals are needed 7

An example execution sequence Here s a sample sequence of instructions to execute. es in decimal : lw $, ($) : sub $, $, $5 : and $, $, $ : or $6, $7, $ 6: add $3, $, $ ake some assumptions about actual values Each contains its number plus. For instance, $ contains, $ contains, etc Every location contains. Our pipeline diagrams will follow some conventions: An indicates values that aren t important, like the constant field of an R-type instruction Question marks indicate values we don t know, usually resulting from instructions coming before and after the ones in our example 5 ycle (filling) IF: lw $, ($) ID: E: E: : Src ontrol Reg (?) ID/E E E/E E/ [3-] Src (?) RegDst (?) Op () em (?) ress em (?) emtoreg (?) 6

ycle IF: sub $, $, $5 ID: lw $, ($) E: E: : Src [3-] ontrol Reg (?) ID/E E Src (?) RegDst (?) Op () E/E em (?) ress em (?) E/ emtoreg (?) 7 ycle 3 IF: and $, $, $ ID: sub $, $, $5 E: lw $, ($) E: : Src [3-] 5 ontrol Reg (?) 5 ID/E E Src () RegDst () Op (add) 33 E/E em (?) ress em (?) E/ emtoreg (?)

ycle IF: or $6, $7, $ ID: and $, $, $ E: sub $, $, $5 E: lw $, ($) : Src 6 [3-] ontrol Reg (?) ID/E E 5 Src () RegDst () Op (sub) E/E 33 em () ress em () E/ emtoreg (?) ycle 5 (full) IF: add $3, $, $ ID: or $6, $7, $ E: and $, $, $ E: sub $, $, $5 : lw $, ($) 6 Src [3-] 7 6 ontrol Reg () 7 ID/E E Src () RegDst () Op (and) E/E - 5 em () ress em () E/ emtoreg () 33

ycle 6 (emptying) IF: ID: add $3, $, $ E: or $6, $7, $ E: and $, $, $ : sub $, $, $5 Src [3-] - 3 ontrol Reg () ID/E E 6 7 Src () RegDst () Op (or) 6 E/E em () ress em () E/ emtoreg () ycle 7 IF: ID: E: add $3, $, $ E: or $6, $7, $ : and $, $, $ Src [3-] ontrol Reg () ID/E E 3 Src () RegDst () Op (add) 3 E/E 6 em () ress em () E/ emtoreg ()

ycle IF: ID: E: E: add $3, $, $ : or $6, $7, $ Src [3-] 6 ontrol Reg () ID/E E Src (?) RegDst (?) Op () E/E 3 em () ress em () E/ emtoreg () 6 3 ycle IF: ID: E: E: : add $3, $, $ Src [3-] 3 ontrol Reg () ID/E E Src (?) RegDst (?) Op () E/E? em (?) ress em (?) E/ emtoreg () 3

That s a lot of diagrams there lock cycle 3 5 6 7 lw $t, ($sp) IF ID E E sub$v, $a, $a IF ID E E and$t, $t, $t3 IF ID E E or $s, $s, $s IF ID E E add$t5, $t6, $ IF ID E E ompare the last nine slides with the pipeline diagram above You can see how instruction executions are overlapped Each functional unit is used by a different instruction in each cycle The pipeline s save control and values generated in previous clock cycles for later use When the pipeline is full in clock cycle 5, all of the hardware units are utilized. This is the ideal situation, and what makes pipelined processors so fast 5 erformance Revisited Assuming the following functional unit latencies: 3ns ns ns 3ns ns Inst mem Reg em Reg What is the cycle time of a single-cycle implementation? What is its throughput? What is the cycle time of a ideal pipelined implementation? What is its steady-state throughput? How much faster is pipelining? 6 3

Ideal speedup lock cycle 3 5 6 7 lw $t, ($sp) IF ID E E sub$v, $a, $a IF ID E E and$t, $t, $t3 IF ID E E or $s, $s, $s IF ID E E add$sp, $sp, - IF ID E E This implies that the maximum speedup is 5 times In general, the ideal speedup equals the pipeline depth Why was our speedup on the previous slide only x? The pipeline stages are imbalanced: a file and operations can be done in ns, but we must stretch that out to 3ns to keep the ID, E, and stages synchronized with IF and E Balancing the stages is one of the many hard parts in designing a pipelined processor 7 The pipelining paradox lock cycle 3 5 6 7 lw $t, ($sp) IF ID E E sub$v, $a, $a IF ID E E and$t, $t, $t3 IF ID E E or $s, $s, $s IF ID E E add$sp, $sp, - IF ID E E ipelining does not improve the execution time of any single instruction. Each instruction here actually takes longer to execute than in a single-cycle path (5ns vs. ns)! Instead, pipelining increases the throughput, or the amount of work done per unit time. Here, several instructions are executed together in a clock cycle The result is improved execution time for a sequence of instructions, such as an entire program

set architectures & pipelining The IS instruction set was designed especially for easy pipelining. All instructions are 3-bits long, so the instruction fetch stage just needs to read one word on every clock cycle Fields are in the same position in different instruction formats the opcode is always the first six bits, rs is the next five bits, etc. This makes things easy for the ID stage IS is a -to- architecture, so arithmetic operations cannot contain references. This keeps the pipeline shorter and simpler ipelining is harder for older, more complex ISAs If different instructions had different lengths or formats, the fetch and decode stages would need extra time to determine the actual length of each instruction and the position of the fields With -to- instructions, additional pipeline stages may be needed to compute effective es and read before the E stage Summary The pipelined path combines ideas from the single and multicycle processors that we saw earlier It uses multiple memories and s execution is split into several stages ipeline s propagate and control values to later stages The IS instruction set architecture supports pipelining with uniform instruction formats and simple ing modes Next, we ll begin talking about Hazards 3 5