Unpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory

Similar documents
CISC 662 Graduate Computer Architecture Lecture 5 - Pipeline Pipelining

CISC 662 Graduate Computer Architecture Lecture 5 - Pipeline. Pipelining. Pipelining the Idea. Similar to assembly line in a factory:

What is Pipelining? RISC remainder (our assumptions)

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

ECEC 355: Pipelining

What do we have so far? Multi-Cycle Datapath (Textbook Version)

Lecture 19 Introduction to Pipelining

Pipelining: Basic Concepts

C.1 Introduction. What Is Pipelining? C-2 Appendix C Pipelining: Basic and Intermediate Concepts

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

COSC 6385 Computer Architecture - Pipelining

Pipeline Review. Review

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

What is Pipelining. work is done at each stage. The work is not finished until it has passed through all stages.

Pipelining: Hazards Ver. Jan 14, 2014

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts

Instruction Pipelining Review

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Lecture 5: Pipelining Basics

mywbut.com Pipelining

Instr. execution impl. view

Pipelining. Maurizio Palesi

Multi-cycle Datapath (Our Version)

Advanced Computer Architecture Pipelining

What do we have so far? Multi-Cycle Datapath

COMP2611: Computer Organization. The Pipelined Processor

MIPS ISA AND PIPELINING OVERVIEW Appendix A and C

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts

Lecture: Pipelining Basics

Processor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed

Updated Exercises by Diana Franklin

Advanced Computer Architecture CMSC 611 Homework 3. Due in class Oct 17 th, 2012

DLX Unpipelined Implementation

Lecture 6: Pipelining

COSC4201 Pipelining. Prof. Mokhtar Aboelaze York University

Page 1. Pipelining: Its Natural! Chapter 3. Pipelining. Pipelined Laundry Start work ASAP. Sequential Laundry A B C D. 6 PM Midnight

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation

Chapter 4 (Part II) Sequential Laundry

Computer System. Hiroaki Kobayashi 6/16/2010. Ver /16/2010 Computer Science 1

Appendix A. Overview

Improve performance by increasing instruction throughput

Assignment 1 solutions

Module 4c: Pipelining

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

Designing a Pipelined CPU

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Computer Architectures. DLX ISA: Pipelined Implementation

Computer System. Agenda

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.

Pipelining and Exploiting Instruction-Level Parallelism (ILP)

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

EE 457 Unit 6a. Basic Pipelining Techniques

Overview. Appendix A. Pipelining: Its Natural! Sequential Laundry 6 PM Midnight. Pipelined Laundry: Start work ASAP

Ti Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr

Introduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!

CS433 Homework 2 (Chapter 3)

Pipelining. Each step does a small fraction of the job All steps ideally operate concurrently

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations?

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!

第三章 Instruction-Level Parallelism and Its Dynamic Exploitation. 陈文智 浙江大学计算机学院 2014 年 10 月

EITF20: Computer Architecture Part2.2.1: Pipeline-1

The Pipelined MIPS Processor

ECE 2300 Digital Logic & Computer Organization. More Caches Measuring Performance

CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Lecture: Pipeline Wrap-Up and Static ILP

ECE 341. Lecture # 15

14:332:331 Pipelined Datapath

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Chapter Six. Dataı access. Reg. Instructionı. fetch. Dataı. Reg. access. Dataı. Reg. access. Dataı. Instructionı fetch. 2 ns 2 ns 2 ns 2 ns 2 ns

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Appendix C: Pipelining: Basic and Intermediate Concepts

CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W12-M

Lecture 15: Pipelining. Spring 2018 Jason Tang

Lecture 7 Pipelining. Peng Liu.

Module 5 Introduction to Parallel Processing Systems

Week 11: Assignment Solutions

3Introduction. Memory Hierarchy. Chapter 2. Memory Hierarchy Design. Computer Architecture A Quantitative Approach, Fifth Edition

CPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Chapter 3. Pipelining. EE511 In-Cheol Park, KAIST

EE 4683/5683: COMPUTER ARCHITECTURE

UNIT I (Two Marks Questions & Answers)

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Pipeline: Introduction

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013

CSEE W3827 Fundamentals of Computer Systems Homework Assignment 5 Solutions

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Appendix C. Abdullah Muzahid CS 5513

Good luck and have fun!

Pipelined Datapath. One register file is enough

Computer Architecture

Computer Architecture. Lecture 6.1: Fundamentals of

Transcription:

Pipelining the Idea Similar to assembly line in a factory Divide instruction into smaller tasks Each task is performed on subset of resources Overlap the execution of multiple instructions by completing different tasks from different instructions in parallel Ideally this should lead to throughput of one instruction per clock cycle 1 npipelined achine 2 instructions completed in 6 cycles t t+1 t+2 t+3 t+4 t+5 t+6 2 Pipelined achine 2 instructions completed in 4 cycles t t+1 t+2 t+3 t+4 t+5 t+6 3 Pipelining Overview Each task is called pipe stage or pipe segment All stages must be able to proceed at the same time: Stage duration is called the processor cycle It is determined by the slowest stage Goal of pipelining is to increase throughput number of instructions completed per clock cycle In ideally balanced pipeline with n stages: Tinstruction _ unpipelined Tinstruction = Speedup = n n 4 Pipelining Overview Pipelining increases throughput, reduces the average execution time per instruction It does not reduce the time needed to execute each instruction Frequently this time is slightly increased due to overhead involved in passing between stages 5 IPS npipelined Fetch instruction from em[pc] PC=PC+4 In parallel: Decode the instruction Read source registers; compare registers for possible branch Sign-extend the offset field if needed; compute possible branch target address by adding sign-extended offset to PC+4 (branch can now be completed) 6 Write data into register file either for LOAD or for instruction If instruction is: Load: load data from memory Store: store data from register to memory If instruction is: emory reference: add base register and offset to form memory address : perform the operation; for immediate sign-extend the second operand 1

IPS Pipelined Each task becomes a pipeline stage Start a new instruction in each clock cycle Instruction i Instruction i+1 1 2 3 4 5 6 7 8 9 For this to work we have to make sure that pipeline stages use distinct resources IPS Pipelined Separate instruction and data memory emory must deliver 5 times the bandwidth ister file must support two reads and one write We will perform them in half-cycles, first write and then read We need adder to increment PC and to perform branch target calculation 7 8 Pipeline Data Path Example IF/ID ID/E E/E E/WB Pipeline overhead= register delay+ clock skew Consider the unpipelined processor and assume that it has a 1 ns clock cycle and that it uses 4 cycles for operations and branches and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20% and 40%, respectively. Suppose that due to clock skew and setup, pipelining the processor adds 0.2 ns of overhead to the clock. Ignoring any latency impact: how much speedup will we gain from a pipeline? how many stages has this pipeline? 9 10 Pipeline Hazards Conflicts that prevent the instruction A from executing during its designated clock cycle: 1. Structural hazards instruction A requires a resource occupied by some previous instruction 2. Data hazards a source operand of instruction A is the output of some previous instruction and is not ready 3. Control hazards pipelining branches and other instructions that change PC Pipeline Hazards Hazards may make it necessary to stall the pipeline: Instruction A and all subsequent instructions are delayed until the hazard resolves All previous instructions proceed with execution. 11 12 2

Performance Degradation Due to Hazards Tper _ instruction _ unpipelined Speedup = = Tper _ instruction _ pipelined CPIunpipelined* CCunpipelined = CPI pipelined* CC pipelined n 1+ Stall _ Cycles per _ instruction Structural Hazards Resource conflicts: E.g. using same memory for data and instructions will create a hazard whenever we have STORE or LOAD Instructions are stalled until resource becomes available Conflicts can be avoided through resource duplication If structural hazards are not that frequent it may be cheaper to allow them 13 14 Load instruction Instruction i+1 Structural Hazards 1 2 3 4 5 6 7 8 9 stall IF ID E E IF ID E IF ID IF Example Suppose that data references constitute 40% of the mix, and that the ideal CPI of the pipelined processor, ignoring the structural hazard is 1. Assume that the processor with the structural hazard (data and instructions are stored in the same memory) has a clock rate that is 1.05 times higher than the clock rate of the processor without the hazard. Is the pipeline with or without structural hazard faster, and by how much? Case when we have instructions and data stored in the same memory 15 16 Source operand of the current instruction is the output of a previous instruction and is not ready 17 18 3

19 20 LD R4, 0(R1) LD R4, 0(R1) SD R4, 12(R1) SD R4, 12(R1) 21 22 D can be generalized as passing the result from one functional unit to another unit that needs it This is done through pipeline registers, not directly 23 24 4

Requiring Stalls Load followed by instructions that use the result Requiring Stalls 25 26 Requiring Stalls Requiring Stalls A piece of hardware called pipeline interlock is added to detect a hazard and stall the pipeline 27 28 5