Pipelining. Pipeline performance

Pipelining

- Basic concept of assembly line
  - Split a job A into n sequential subjobs (A_1, A_2, ..., A_n), with each A_i taking approximately the same time
  - Each subjob is processed by a different substation (or resource) or, equivalently, passes through a series of stages
  - Subjobs of different jobs overlap their execution, i.e., when subjob A_1 of job A is finished in stage 1, subjob A_2 of job A will start executing in stage 2 while subjob B_1 of job B will start executing in stage 1

2/15/99 CSE378 Pipelining

Pipeline performance

- Latency of a single job can be longer
  - because each stage takes as long as the longest one, say t_max
  - because all jobs have to go through all stages even if they don't do anything in one of the stages
- Throughput is enhanced
  - Ideally, in steady state, one job completes every t_max rather than after (t_1 + t_2 + ... + t_n)
  - In the ideal case (all t_i are the same), throughput is n times better if there are n stages
- But:
  - Execution time of a job could take less than n stages
  - We assume that the pipeline can be kept full all the time
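The latency and throughput claims above can be checked with a little arithmetic. The following is a minimal sketch, assuming hypothetical stage delays and an ideal pipeline that never stalls; the function names are illustrative and not part of the course material.

```python
def unpipelined_time(stage_times, num_jobs):
    """Each job runs through all stages before the next job starts."""
    return num_jobs * sum(stage_times)


def pipelined_time(stage_times, num_jobs):
    """Every stage is clocked at the slowest stage time t_max. The first job
    needs n cycles to pass through the pipeline; after that, one job
    completes per cycle (assumes the pipeline is always full, no stalls)."""
    t_max = max(stage_times)
    n = len(stage_times)
    return (n + num_jobs - 1) * t_max


# Hypothetical stage delays in ns (assumption, not from the slides).
uneven = [2, 1, 2, 2, 1]
even = [2, 2, 2, 2, 2]

# Latency of a single job: 8 ns unpipelined, 10 ns pipelined.
print(unpipelined_time(uneven, 1), pipelined_time(uneven, 1))

# Speedup over many jobs approaches sum(t_i) / t_max.
for times in (uneven, even):
    speedup = unpipelined_time(times, 1000) / pipelined_time(times, 1000)
    print(times, round(speedup, 2))
```

With uneven stages the speedup stays near 4 because the clock is set by the slowest stage; with equal stages it approaches the number of stages, 5.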

Pipeline applied to instruction execution

[Figure: execution of Instr. i, Instr. i+1, and Instr. i+2 in the multiple cycle implementation versus in pipeline mode; idle time is marked.]

Pipeline parallelism

- Note that at any given time after the pipeline is full, we have 5 instructions in progress, but each individual instruction has a longer latency

[Figure: Instr. i through Instr. i+5 staggered over time steps t to t+9, with 5 instructions in progress.]
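The staggered timing in the pipeline parallelism slide can be reproduced with a short sketch. It assumes the five stage names used elsewhere in these slides, that instruction k enters the first stage in cycle k, and that nothing ever stalls; `stage_at` and `in_progress` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "Mem", "WB"]


def stage_at(instr_index, cycle):
    """Return the stage that instruction `instr_index` occupies in `cycle`,
    or None if it has not started yet or has already finished.
    Assumes instruction k enters IF in cycle k and never stalls."""
    position = cycle - instr_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


def in_progress(cycle, num_instrs):
    """Indices of all instructions that occupy some stage in `cycle`."""
    return [i for i in range(num_instrs) if stage_at(i, cycle) is not None]


# Print one row per instruction, one column per cycle.
for i in range(6):
    row = [(stage_at(i, c) or ".").ljust(4) for c in range(10)]
    print(f"Instr. i+{i}: " + "".join(row))
```

From cycle 4 onward (while instructions keep arriving) `in_progress` returns five indices, matching the "5 instructions in progress" annotation.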

Pipeline implementation requirements

- 5 stages are active on 5 different instructions
  - Thus all the resources needed for single cycle implementation will be required (at least)
- All stages are independent and isolated from each other. This implies that all information gathered at stage i and needed in stages i+1, i+2, etc. must be kept and passed from stage to stage.
- For that purpose we will use pipeline registers (or latches) between stages

Example of what to store in pipeline registers

- The following is not an exhaustive list (just a sample)
  - The register number where the result will be stored
    - Known at stage 2; needed at stage 5
  - The number of the register holding the data to be written by a store, as well as the contents of that register
    - Known at stage 2; needed at stage 4
  - The immediate value
    - Known at stage 2; needed at stage 3
  - The updated PC
  - etc.
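To see why a value known at stage 2 has to travel through the pipeline registers until stage 5, consider the destination register number. The sketch below is a deliberately simplified model (an assumption for illustration): each pipeline register holds only that one field, and `advance` is a hypothetical name for one clock edge.

```python
def advance(latches, from_decode):
    """One clock edge: ID/EX takes the newly decoded value, EX/Mem takes
    what ID/EX held, and Mem/WB takes what EX/Mem held."""
    id_ex, ex_mem, _mem_wb = latches
    return (from_decode, id_ex, ex_mem)


# (ID/EX, EX/Mem, Mem/WB), initially empty.
latches = (None, None, None)

# Three consecutive instructions with hypothetical destination registers.
for dest_reg in [8, 9, 10]:
    latches = advance(latches, dest_reg)

print(latches)
```

After three clock edges the first instruction's destination register (8) sits in Mem/WB, ready for write back, while the registers behind it hold the values belonging to the two younger instructions. Without a copy per stage, the later decodes would have overwritten it.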

Pipeline data path (highly abstracted)

[Figure: the five stages IF, ID, EX, Mem, WB separated by the pipeline registers IF/ID, ID/EX, EX/Mem, Mem/WB.]

Pipeline data path (a little less abstracted)

- Let's look back at Fig. 5.17 (single cycle implementation) and see where the pipeline registers should be put and what additional resources, if any, are needed.

Hazards

- Hazards are what prevent the pipeline from being ideal
- Three types of hazards:
  - Structural hazards, where two instructions at different stages want to use the same resource. Solved by using more resources (e.g., separate instruction and data memories; several ALUs)
  - Data hazards, when an instruction in the pipeline is dependent on another instruction still in the pipeline. Some of these hazards will be resolved by bypassing and some will result in the pipeline being stalled
  - Control hazards, which happen on a successful branch (or procedure call/return)
- We'll investigate several techniques to take care of these hazards.

Notation (cf. Figure 6.18)

- Pipeline registers keeping information between stages are labeled by the names of the stages they connect (IF/ID, ID/EX, etc.)
- Information flows from left to right except for
  - writing the result register (potential for data hazard)
  - modifying the PC (potential for control hazard)
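The data hazard condition described under Hazards (an instruction depends on another instruction still in the pipeline) can be sketched as a search for nearby producer/consumer pairs. This is only an illustration: the `Instr` tuple, the `window` parameter (how many earlier instructions are still in flight), and the register numbers are all assumptions, and the sketch does not decide whether bypassing or a stall resolves each case.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[int]     # register written, or None if none is written
    srcs: Tuple[int, ...]   # registers read


def data_dependences(program, window):
    """Return (producer, consumer) index pairs where the consumer reads a
    register written by a producer at most `window` instructions earlier."""
    pairs = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i].dest
            if dest is not None and dest in consumer.srcs:
                pairs.append((i, j))
    return pairs


program = [
    Instr(dest=1, srcs=(2, 3)),   # writes r1
    Instr(dest=4, srcs=(1, 5)),   # reads r1 written one instruction earlier
    Instr(dest=6, srcs=(7, 8)),   # independent
    Instr(dest=9, srcs=(1, 6)),   # reads r6 (one earlier) and r1 (three earlier)
]

print(data_dependences(program, window=2))
```

With `window=2` the sketch reports the pairs (0, 1) and (2, 3); widening the window to 3 also reports (0, 3).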

Tracing one instruction through all 5 stages (St 1)

- Stage 1: Instruction fetch and increment PC (same for all instructions)
- Instruction fetched from instruction memory
  - The instruction will be needed in subsequent stages. Hence stored in IF/ID
  - Recall (i) the IR register in multiple cycle implementation, and (ii) that in single cycle implementation we had two memories, one for instruction and one for data
- PC <- PC + 4
  - The incremented PC will be needed to fetch the next instruction at the next cycle, but it will also be needed if we have to do branch target computation. Hence incremented PC saved in IF/ID
- Resources needed: instruction memory and ALU
- IF/ID contains instruction and incremented PC

Tracing one instruction through all 5 stages (St 2)

- Stage 2: Instruction decode and register read (same for all instructions)
- We save in ID/EX everything that can be needed in next stages:
  - The instruction and the incremented PC (from IF/ID) (e.g., function bits can be needed, the name of the register to be written, etc.)
  - The contents of registers rs and rt (recall registers A and B)
  - The sign-extended immediate field (for imm. inst., load/store, branch)
- Note that we do not compute the branch target address here. We could, and we will do so as an optimization later on.
- Resources needed: register file
- ID/EX contains PC, instruction, contents of rs and rt, extended imm. field
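Stages 1 and 2 as traced above can be sketched in a few lines. The sketch assumes the MIPS I-format field layout (rs in bits 25-21, rt in bits 20-16, immediate in bits 15-0) and models the pipeline registers as plain dictionaries; the function names, dictionary keys, and the sample register values are illustrative choices, not taken from the slides.

```python
def sign_extend16(value):
    """Sign-extend the low 16 bits of `value` to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def fetch(pc, instr_mem):
    """Stage 1: read the instruction and increment the PC.
    Returns the new PC and the contents of IF/ID."""
    if_id = {"instr": instr_mem[pc], "pc": pc + 4}
    return pc + 4, if_id


def decode(if_id, regs):
    """Stage 2: read rs and rt, sign-extend the immediate, and pass the
    instruction and incremented PC along in ID/EX."""
    instr = if_id["instr"]
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    return {
        "pc": if_id["pc"],
        "instr": instr,
        "rs_val": regs[rs],
        "rt_val": regs[rt],
        "imm": sign_extend16(instr),
    }


# A load word with base register 9, target register 8, offset -4
# (opcode 0x23 is assumed to be MIPS lw).
LW = (0x23 << 26) | (9 << 21) | (8 << 16) | 0xFFFC

regs = [0] * 32
regs[9] = 100          # hypothetical base address
pc, if_id = fetch(0, {0: LW})
id_ex = decode(if_id, regs)
print(pc, id_ex["rs_val"], id_ex["imm"])
```

Note that `decode` reads both rs and rt regardless of the instruction type, which is what lets stage 2 be the same for all instructions.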

Tracing one instruction through all 5 stages (St 3)

- Stage 3: depends on the type of instruction being executed. Let's assume a load. Hence this stage is address computation
- ALU sources come from ID/EX (rs and immediate field)
- ALU result stored in EX/Mem
- Contents of rs, rt, and imm. field not needed any longer
- Looks like PC not needed either, but we'll keep it because of possible exceptions (see later in the course)
- Need to keep in EX/Mem part of the instruction (name of the rt register) and the indication we have a load

Stage 3 (ct'd)

- Resource needed: ALU
  - But if we had a branch we would need two ALUs, one for the branch target computation and one for comparing registers.
- EX/Mem needs to keep
  - result of ALU (for load/store and arith. instructions)
  - name of result register (for load and arith. instructions)
  - contents of rt (for store)
  - PC (in case of exceptions)
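For the load case, stage 3 reduces to one addition plus bookkeeping. A minimal sketch, assuming ID/EX is modeled as a dictionary with the hypothetical keys shown below and that addresses wrap to 32 bits:

```python
def execute_load(id_ex):
    """Stage 3 for a load: the ALU adds the contents of rs and the
    sign-extended immediate; EX/Mem keeps the result, the name of the
    rt register, the load indication, and the PC (for exceptions)."""
    rt = (id_ex["instr"] >> 16) & 0x1F
    return {
        "alu_result": (id_ex["rs_val"] + id_ex["imm"]) & 0xFFFFFFFF,
        "dest_reg": rt,
        "is_load": True,
        "pc": id_ex["pc"],
    }


# Hypothetical ID/EX contents for a load with base register 9 (holding 100),
# target register 8, and offset -4.
id_ex = {
    "instr": (0x23 << 26) | (9 << 21) | (8 << 16) | 0xFFFC,
    "rs_val": 100,
    "rt_val": 0,
    "imm": -4,
    "pc": 4,
}
ex_mem = execute_load(id_ex)
print(ex_mem["alu_result"], ex_mem["dest_reg"])
```

The contents of rs, rt, and the immediate are consumed here and do not appear in EX/Mem, matching the list above for a load.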

Tracing one instruction through all 5 stages (St 4)

- Stage 4: Access data memory. Still assuming we have a load
- Access data memory with address kept in EX/Mem
- Keep result of load in Mem/WB
- Resource needed: data memory
- Mem/WB needs to keep
  - Result of load if load instruction
  - Result of arith. instr. if arith. instr. (from EX/Mem)
  - Name of result register
- No need to keep PC any longer (no exception occurs in last stage)

Tracing one instruction through all 5 stages (St 5)

- Stage 5: Write back register (for load and arith.)
- Contents of what to write and where to write it are in Mem/WB
- Nothing to be kept
- Resource needed: registers
  - Ah! But registers were also needed in Stage 2!
  - We allow writing registers in first part of cycle and reading registers in second part of cycle
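The last two stages, and the rule that the register file is written in the first part of the cycle and read in the second, can be sketched as follows. The dictionary keys, function names, and memory contents are assumptions made for illustration.

```python
def memory_access(ex_mem, data_mem):
    """Stage 4 for a load: read data memory at the address held in EX/Mem
    and keep the loaded value and the result register name in Mem/WB."""
    return {
        "result": data_mem[ex_mem["alu_result"]],
        "dest_reg": ex_mem["dest_reg"],
    }


def register_file_cycle(regs, mem_wb, read_regs):
    """One clock cycle of the register file: the stage 5 write happens in
    the first part of the cycle, the stage 2 reads in the second part."""
    if mem_wb is not None:
        regs[mem_wb["dest_reg"]] = mem_wb["result"]   # first part: write
    return [regs[r] for r in read_regs]               # second part: read


regs = [0] * 32
data_mem = {96: 1234}                                 # hypothetical contents
mem_wb = memory_access({"alu_result": 96, "dest_reg": 8}, data_mem)

# An instruction in stage 2 reads registers 8 and 9 in the same cycle in
# which the load writes register 8.
values = register_file_cycle(regs, mem_wb, [8, 9])
print(values)
```

Because the write comes first, the instruction decoding in that cycle already sees the loaded value in register 8.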

Summary of requirements of ideal pipeline data path

- Stages 1 and 2: info to be kept is similar for all instructions
- The width of the pipeline registers must be such that it fits the maximum amount of info that will be needed
- Instructions (except branches, see soon) must pass through all stages even if nothing is done in that stage
- The state of the machine is modified only in the last stage (except for branches)