Lecture 6: Pipelining

Lecture 6: Pipelining
CSCE 26 Computer Organization
Instructor: Saraju P. Mohanty, Ph. D.
NOTE: The figures, text, etc. included in the slides are borrowed from various books, websites, authors' pages, and other sources for academic purposes only. The instructor does not claim any originality.

The Big Picture: Where Are We Now?
We know the five classic components of a computer.
We understand how the instruction set plays a key role in determining the performance and the design complexity of the datapath and controller.
We designed a processor, comprising a datapath and a controller, for a small set of instructions.
Datapath:
  Single-cycle implementation: too slow.
  Multi-cycle implementation: large instructions take longer, small instructions take less time.
Controller:
  Approach 1: FSM-based approach. A structured approach to derive a circuit implementation from the FSM specification. Implementation styles: (1) random logic, (2) PLA, (3) ROM.
  Approach 2: Microprogramming. Control is written as a program using microinstructions. Flexible but slower.

Pipelining is Natural! Laundry Example
Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold.
Washing takes 30 minutes.
Drying takes 30 minutes.
Folding takes 30 minutes.
Putting away (placing the clothes into drawers) takes 30 minutes.

Sequential Laundry
[Figure: timeline from 6 PM to 2 AM with loads A, B, C, D in task order; each load runs through four 30-minute steps before the next load starts.]
Sequential laundry takes 8 hours for 4 loads. If they learned pipelining, how long would laundry take?

Pipelined Laundry: Start work ASAP
[Figure: timeline starting at 6 PM with loads A, B, C, D in task order; each load starts 30 minutes after the previous one.]
Pipelined laundry takes 3.5 hours for 4 loads!
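The two totals can be checked with a short calculation. This is a minimal sketch, assuming four steps (wash, dry, fold, put away) of 30 minutes each, which is what the 8-hour and 3.5-hour figures imply; the function names are illustrative and not from the slides.

```python
def sequential_minutes(loads, steps, step_minutes):
    # Each load runs through every step before the next load starts.
    return loads * steps * step_minutes


def pipelined_minutes(loads, steps, step_minutes):
    # The first load needs all the steps; every later load finishes
    # one step time after the load ahead of it.
    return (steps + loads - 1) * step_minutes


seq = sequential_minutes(loads=4, steps=4, step_minutes=30)
pipe = pipelined_minutes(loads=4, steps=4, step_minutes=30)
print(seq / 60, "hours sequential")
print(pipe / 60, "hours pipelined")
```

With these assumptions the sequential schedule comes to 480 minutes and the pipelined one to 210 minutes.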

Pipelining Lessons
[Figure: the pipelined laundry timeline starting at 6 PM, loads A, B, C, D in task order, 30-minute steps.]
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
Multiple tasks operate simultaneously using different resources.
Potential speedup = number of pipeline stages.
Pipeline rate is limited by the slowest pipeline stage.
Unbalanced lengths of pipe stages reduce speedup.
Time to fill the pipeline and time to drain it reduce speedup.
Stall for dependences.
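Several of these lessons can be seen in one small model. The sketch below assumes every stage is clocked at the pace of the slowest stage and that there are no stalls; `pipeline_speedup` and the stage times passed to it are illustrative choices, not values from the slides.

```python
def pipeline_speedup(stage_times, tasks):
    """Speedup of a pipelined schedule over running the tasks back to back."""
    unpipelined = tasks * sum(stage_times)
    # The slowest stage sets the rate at which work moves forward.
    cycle = max(stage_times)
    # Fill time (stages - 1 cycles) plus one cycle per task.
    pipelined = (len(stage_times) + tasks - 1) * cycle
    return unpipelined / pipelined


balanced = [30, 30, 30, 30]
unbalanced = [30, 40, 20, 30]  # same total work, uneven stages

print(pipeline_speedup(balanced, tasks=1))       # single task: no gain in latency
print(pipeline_speedup(balanced, tasks=4))       # fill and drain time cost speedup
print(pipeline_speedup(balanced, tasks=1000))    # approaches the stage count
print(pipeline_speedup(unbalanced, tasks=1000))  # slowest stage limits the rate
```

For a long run of tasks the balanced pipeline approaches a speedup of 4, while the unbalanced one stays near 3 because every stage waits for the 40-minute step.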

MIPS Case: The Five Stages of Load
[Figure: Load occupies Cycle 1 to Cycle 5 as Ifetch, Reg/Dec, Exec, Mem, WB.]
Ifetch: Instruction Fetch. Fetch the instruction from the Instruction Memory.
Reg/Dec: Registers Fetch and Instruction Decode.
Exec: Calculate the memory address.
Mem: Read the data from the Data Memory.
WB: Write the data back to the register file.

Conventional Pipelined Execution Representation
[Figure: time on the horizontal axis, program flow on the vertical axis; six instructions each pass through IFetch, Dcd, Exec, Mem, WB, with each instruction offset from the one before it.]

Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: clock-cycle timelines for three implementations.
Single-cycle implementation: Load takes Cycle 1; Store takes Cycle 2, with part of the cycle wasted.
Multiple-cycle implementation (Cycle 1 to Cycle 10): Load (Ifetch, Reg, Exec, Mem, Wr), then Store (Ifetch, Reg, Exec, Mem), then R-type (Ifetch, ...).
Pipeline implementation: Load (Ifetch, Reg, Exec, Mem, Wr), Store (Ifetch, Reg, Exec, Mem, Wr), and R-type (Ifetch, Reg, Exec, Mem, Wr) overlap in time.]
Pipelining improves performance by increasing instruction throughput.

Single Cycle vs. Pipelining
[Figure: program execution order (in instructions) against time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0), showing the instruction fetch, ALU, and data access steps. Top panel: single cycle, 800 ps per instruction. Bottom panel: pipelined, 200 ps per stage.]
Single cycle: overall execution time = 3 x 800 ps = 2400 ps.
Pipelined: overall execution time = 3 x 200 ps = 600 ps. This counts one instruction per 200 ps and ignores the time needed to fill the pipeline.
Speedup = 2400/600 = 4.
Ideal speedup is the number of stages in the pipeline. Do we achieve this?
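The closing question can be explored numerically. The sketch below assumes an 800 ps single-cycle instruction time, a 200 ps pipeline clock, and five stages; these constants and the function names are assumptions for illustration. It compares the steady-state estimate used above with a count that also includes the cycles needed to fill the pipeline.

```python
SINGLE_CYCLE_PS = 800    # assumed time per instruction, single-cycle design
PIPELINE_CLOCK_PS = 200  # assumed clock period, pipelined design
STAGES = 5               # assumed pipeline depth


def single_cycle_total(n):
    return n * SINGLE_CYCLE_PS


def pipelined_steady_state(n):
    # One instruction completes per clock; fill time is ignored.
    return n * PIPELINE_CLOCK_PS


def pipelined_with_fill(n):
    # The first instruction needs STAGES clocks; each later one adds one clock.
    return (STAGES + n - 1) * PIPELINE_CLOCK_PS


for n in (3, 1_000_000):
    base = single_cycle_total(n)
    print(n, base / pipelined_steady_state(n), base / pipelined_with_fill(n))
```

Under these assumptions, three instructions take 1400 ps once fill time is counted, so the measured speedup is well below 4; it only approaches 4 for long instruction streams. It approaches 4 rather than 5 because the 800 ps single-cycle time is shorter than five 200 ps stages.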

Pipelining: What Makes It Easy/Hard?
What makes it easy?
  All instructions are the same length.
  There are just a few instruction formats.
  Memory operands appear only in loads and stores.
What makes it hard?
  Structural hazards: suppose we had only one memory.
  Data hazards: an instruction depends on a previous instruction.
  Control hazards: need to worry about branch instructions.
We'll build a simple pipeline and look at these issues.

Pipelining the Datapath: Basic Idea
IF: Instruction fetch
ID: Instruction decode/register file read
EX: Execute/address calculation
MEM: Memory access
WB: Write back
[Figure: the single-cycle datapath (PC, instruction memory, registers, ALU, data memory, sign extend, adders, and multiplexers) divided into the five stages above.]
What do we need to add to actually split the datapath into stages?

Pipelined Datapath
[Figure: the datapath with pipeline registers between stages: IF/ID (64-bit), ID/EX (128-bit), EX/MEM (97-bit), and MEM/WB (64-bit).]
Follow Fig. 2 (page-389) to Fig. 4 (page-39) to understand pipelined execution of the lw instruction. Other instructions will be similar.

Corrected Datapath
[Figure: the pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB, corrected so that the write register number travels with the instruction.]
For a load instruction, the write register number is needed in the last stage, so it must be passed along through the pipeline registers in order to be preserved.

Graphically Representing Pipelines
[Figure: program execution order (in instructions) against time (in clock cycles, CC 1 to CC 6) for a lw instruction followed by a sub instruction, each drawn as a row of datapath units.]
A pipeline can be thought of as a series of datapaths shifted in time.
The graphic above can help answer questions like:
  How many cycles does it take to execute this code?
  What is the ALU doing during cycle 4?
Use this representation to help understand datapaths.
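Both questions can be answered mechanically for an ideal pipeline. This is a minimal sketch, assuming five stages, one instruction issued per cycle, and no stalls; the stage names and helper functions are illustrative, and the two-instruction program mirrors the lw/sub pair in the figure without its operands.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(instr_index, cycle):
    """Stage held by instruction instr_index (0-based) in a 1-based cycle."""
    pos = cycle - 1 - instr_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None


def total_cycles(num_instructions):
    # Fill the pipeline once, then finish one instruction per cycle.
    return len(STAGES) + num_instructions - 1


def in_stage(program, stage, cycle):
    """Instruction occupying the given stage in the given cycle, if any."""
    for i, instr in enumerate(program):
        if stage_at(i, cycle) == stage:
            return instr
    return None


program = ["lw", "sub"]
print("cycles needed:", total_cycles(len(program)))
print("in EX during cycle 4:", in_stage(program, "EX", 4))
for i, instr in enumerate(program):
    row = [(stage_at(i, c) or "").ljust(4) for c in range(1, total_cycles(len(program)) + 1)]
    print(instr.ljust(5) + "".join(row).rstrip())
```

Under these assumptions the two instructions need six cycles, and the ALU (EX stage) is working on sub during cycle 4.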

Pipeline Control
[Figure: the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) annotated with the control signals PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, and MemtoReg.]

Pipeline Control
We have 5 stages. What needs to be controlled in each stage?
  Instruction Fetch and PC Increment
  Instruction Decode / Register Fetch
  Execution
  Memory Stage
  Write Back

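Pipeline Control
Pass control signals along just like the data.

              Execution/Address Calculation     Memory access stage         Write-back stage
              stage control lines               control lines               control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite   RegWrite  MemtoReg
R-format      1       1       0       0         0       0        0          1         0
lw            0       0       0       1         0       1        0          1         1
sw            X       0       0       1         0       0        1          0         X
beq           X       0       1       0         1       0        0          0         X

[Figure: the Control unit produces the EX, M, and WB control fields, which are carried through the ID/EX, EX/MEM, and MEM/WB pipeline registers.]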
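The idea of passing control signals along with the data can be sketched as a lookup plus a rule for which groups each pipeline register still carries. The values below follow the standard textbook MIPS control table and should be treated as an assumption here; `None` stands for a don't-care (X), and the names `CONTROL`, `CARRIED`, and `control_in_registers` are illustrative.

```python
# Control values per instruction class, grouped by the stage that uses them.
# Assumed from the standard textbook MIPS control table; None means don't-care.
CONTROL = {
    "R-format": {
        "EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX": {"RegDst": None, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": None},
    },
    "beq": {
        "EX": {"RegDst": None, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": None},
    },
}

# Each pipeline register carries only the groups that later stages still need.
CARRIED = {
    "ID/EX": ("EX", "MEM", "WB"),
    "EX/MEM": ("MEM", "WB"),
    "MEM/WB": ("WB",),
}


def control_in_registers(instr_class):
    """Control groups held in each pipeline register for one instruction."""
    groups = CONTROL[instr_class]
    return {reg: {g: groups[g] for g in names} for reg, names in CARRIED.items()}


for reg, held in control_in_registers("lw").items():
    print(reg, held)
```

Each stage consumes its own group and forwards the rest, which is why the registers get narrower in control content toward the end of the pipeline.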

Datapath with Control
[Figure: the pipelined datapath with the Control unit added. The WB, M, and EX control fields are stored in ID/EX; WB and M continue into EX/MEM; WB continues into MEM/WB. Signals shown: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, and MemtoReg.]

Pipeline Hazards: Major Hurdles of Pipelining
Dictionary meaning of hazard: a source of danger.
Structural hazards: an attempt to use the same resource in two different ways at the same time. For example, a combined washer/dryer would be a structural hazard.
Data hazards: an attempt to use an item before it is ready. The instruction depends on the result of a prior instruction still in the pipeline.
Control hazards: an attempt to make a decision before the condition is evaluated. This arises with branch instructions.
One solution: wait until dependencies are resolved. Pipeline control must detect the hazard and take action (or delay action) to resolve it.
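The "wait until dependencies are resolved" approach to data hazards can be sketched as a small scheduler. The model is an assumption for illustration: five stages, no forwarding, and a register file that can be written and read in the same cycle, so a dependent instruction may decode in the cycle its producer writes back. The `Instr` type, the helper functions, and the sample program are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]    # registers read


def decode_cycles(program):
    """Cycle in which each instruction decodes, stalling on unready sources."""
    ready = {}   # register -> cycle in which its newest value is written back
    cycles = []
    prev = 1     # so that the first instruction decodes in cycle 2
    for ins in program:
        earliest = prev + 1  # in-order: decode after the previous instruction
        for reg in ins.srcs:
            earliest = max(earliest, ready.get(reg, 0))
        cycles.append(earliest)
        if ins.dest is not None:
            ready[ins.dest] = earliest + 3  # ID -> EX -> MEM -> WB
        prev = earliest
    return cycles


def stall_count(program):
    if not program:
        return 0
    # Without stalls, instruction i (0-based) would decode in cycle i + 2.
    return decode_cycles(program)[-1] - (len(program) + 1)


program = [
    Instr("sub $2, $1, $3", "$2", ("$1", "$3")),
    Instr("and $12, $2, $5", "$12", ("$2", "$5")),
    Instr("or $13, $6, $2", "$13", ("$6", "$2")),
    Instr("add $14, $2, $2", "$14", ("$2", "$2")),
]
print(decode_cycles(program))
print("stall cycles:", stall_count(program))
```

In this example the and instruction waits two cycles for $2; the instructions behind it are delayed by the same amount and need no further stalls.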