Lecture 8 Introduction to Pipelines Adapated from slides by David Patterson

Similar documents
UCB CS61C : Machine Structures

ECE331: Hardware Organization and Design

Introduction To Pipelining. Chapter Pipelining1 1

Lecture #22 Pipelining II, Cache I

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1

CS 61C: Great Ideas in Computer Architecture. Pipelining Hazards. Instructor: Senior Lecturer SOE Dan Garcia

COSC 6385 Computer Architecture. - Pipelining

Computer Science 141 Computing Hardware

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

The Processor: Improving Performance Data Hazards

Computer Architecture. Pipelining and Instruction Level Parallelism An Introduction. Outline of This Lecture

Lecture Topics ECE 341. Lecture # 12. Control Signals. Control Signals for Datapath. Basic Processing Unit. Pipelining

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruc>on Level Parallelism

User Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4)

COEN-4730 Computer Architecture Lecture 2 Review of Instruction Sets and Pipelines

Administrivia. CMSC 411 Computer Systems Architecture Lecture 5. Data Hazard Even with Forwarding Figure A.9, Page A-20

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Chapter 4 (Part III) The Processor: Datapath and Control (Pipeline Hazards)

You Are Here! Review: Hazards. Agenda. Agenda. Review: Load / Branch Delay Slots 7/28/2011

CS 2461: Computer Architecture 1 Program performance and High Performance Processors

CENG 3420 Computer Organization and Design. Lecture 07: MIPS Processor - II. Bei Yu

CS61C : Machine Structures

CS 61C: Great Ideas in Computer Architecture Instruc(on Level Parallelism: Mul(ple Instruc(on Issue

CENG 3420 Lecture 07: Pipeline

Computer Architecture. Lecture 6.1: Fundamentals of

CSE4201. Computer Architecture

Pipeline: Introduction

CPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner

Review from last lecture

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

Outline Marquette University

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

CS 61C: Great Ideas in Computer Architecture Control and Pipelining

Lecture 6: Pipelining

CS61C : Machine Structures

Modern Computer Architecture

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Any modern computer system will incorporate (at least) two levels of storage:

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Pipelining. Maurizio Palesi

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Page 1. Pipelining: Its Natural! Chapter 3. Pipelining. Pipelined Laundry Start work ASAP. Sequential Laundry A B C D. 6 PM Midnight

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Working on the Pipeline

Computer Systems Architecture Spring 2016

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts

Review: Moore s Law. EECS 252 Graduate Computer Architecture Lecture 2. Review: Joy s Law in ManyCore world. Bell s Law new class per decade

ECE 154A Introduction to. Fall 2012

Pre-requisites. This is a textbook-based course. Chapter 1. Pipelines, Performance, Caches, and Virtual Memory. January 2009 Paul H J Kelly

A Novel Parallel Deadlock Detection Algorithm and Architecture

ELCT 501: Digital System Design

Multidimensional Testing

Accelerating Storage with RDMA Max Gurtovoy Mellanox Technologies

Computer Architecture

Chapter 8. Pipelining

Pipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010

THE THETA BLOCKCHAIN

Overview of Control. CS 152 Computer Architecture and Engineering Lecture 11. Multicycle Controller Design

Pipelining: Overview. CPSC 252 Computer Organization Ellen Walker, Hiram College

Chapter 5 (a) Overview

Chapter 4 The Processor 1. Chapter 4A. The Processor

EITF20: Computer Architecture Part2.2.1: Pipeline-1

ECE260: Fundamentals of Computer Engineering

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

ELE 655 Microprocessor System Design

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Lecture 7 Pipelining. Peng Liu.

dc - Linux Command Dc may be invoked with the following command-line options: -V --version Print out the version of dc

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

Pipeline Processors David Rye :: MTRX3700 Pipelining :: Slide 1 of 15

Modeling a shared medium access node with QoS distinction

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

COSC 6385 Computer Architecture - Pipelining

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Pipelining. CSC Friday, November 6, 2015

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Lecture 15: Pipelining. Spring 2018 Jason Tang

Module 4c: Pipelining

Chapter 4. The Processor

A Memory Efficient Array Architecture for Real-Time Motion Estimation

ANALYTIC PERFORMANCE MODELS FOR SINGLE CLASS AND MULTIPLE CLASS MULTITHREADED SOFTWARE SERVERS

EE 6900: Interconnection Networks for HPC Systems Fall 2016

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

Overview. Appendix A. Pipelining: Its Natural! Sequential Laundry 6 PM Midnight. Pipelined Laundry: Start work ASAP

Design a MIPS Processor (2/2)

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3

Chapter 4 (Part II) Sequential Laundry

CSE 141 Computer Architecture Spring Lectures 11 Exceptions and Introduction to Pipelining. Announcements

Conversion Functions for Symmetric Key Ciphers

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts

MapReduce Optimizations and Algorithms 2015 Professor Sasu Tarkoma

Appendix A. Overview

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Chapter 4. The Processor

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

Processor Architecture

Transcription:

Lectue 8 Intoduction to Pipelines Adapated fom slides by David Patteson http://www-inst.eecs.bekeley.edu/~cs61c/ * 1

Review (1/3) Datapath is the hadwae that pefoms opeations necessay to execute pogams. Contol instucts datapath on what to do next. Datapath needs: access to stoage (geneal pupose egistes and memoy) computational ability () helpe hadwae (local egistes and PC) * 2

Review (2/3) Five stages of datapath (executing an instuction): 1. Instuction Fetch (Incement PC) 2. Instuction Decode (Read Registes) 3. (Computation) 4. Memoy Access 5. Wite to Registes ALL instuctions must go though ALL five stages. Datapath designed in hadwae. * 3

Example Datapath PC instuction memoy d s t egistes Data memoy +4 imm 1. Instuction Fetch 2. Decode/ Registe Read 3. Execute 4. Memoy 5. Wite Back * 4

Outline Pipelining Analogy Pipelining Instuction Execution Hazads Advanced Pipelining Concepts by Analogy * 5

Gotta Do Laundy Ann, Bian, Cathy, Dave each have one load of clothes to wash, dy, fold, and put away Washe takes 30 minutes Dye takes 30 minutes A B C D Folde takes 30 minutes Stashe takes 30 minutes to put clothes into dawes * 6

Sequential Laundy 6 PM 7 8 9 10 11 12 1 2 AM T a s k O d e A B C D 3030 30 30 30 30 30 30 3030 30 30 3030 30 30 Time Sequential laundy takes 8 hous fo 4 loads * 7

Pipelined Laundy 12 2 AM 6 PM 7 8 9 10 11 1 T a s k O d e A B C D 3030 30 30 30 30 30 Time Pipelined laundy takes 3.5 hous fo 4 loads! * 8

Geneal Definitions Latency: time to completely execute a cetain task fo example, time to ead a secto fom disk is disk access time o disk latency Thoughput: amount of wok that can be done ove a peiod of time * 9

T a s k O d e Pipelining Lessons (1/2) Pipelining doesn t help 6 PM 7 8 9 latency of single task, it helps thoughput of Time entie wokload A B C D 30 30 30 30 30 30 30 Multiple tasks opeating simultaneously using diffeent esouces Potential speedup = Numbe pipe stages Time to fill pipeline and time to dain it educes speedup: 2.3X v. 4X in this example * 10

T a s k O d e Pipelining Lessons (2/2) Suppose new 6 PM 7 8 9 A B C D Time 30 30 30 30 30 30 30 Washe takes 20 minutes, new Stashe takes 20 minutes. How much faste is pipeline? Pipeline ate limited by slowest pipeline stage Unbalanced lengths of pipe stages also educes speedup * 11

Steps in Executing MIPS 1) IFetch: Fetch Instuction, Incement PC 2) Decode Instuction, Read Registes 3) Execute: Mem-ef: Calculate Addess Aith-log: Pefom Opeation 4) Memoy: Load: Stoe: Read Data fom Memoy Wite Data to Memoy 5) Wite Back: Wite Data to Registe * 12

Pipelined Execution Repesentation Time IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB Evey instuction must take same numbe of steps, also called pipeline stages, so some will go idle sometimes * 13

Review: Datapath fo MIPS PC +4 instuction memoy Stage 1 Stage 2 Stage 3Stage 4 Stage 5 1. Instuction Fetch d s t imm Use datapath figue to epesent pipeline IFtch Dcd egistes 2. Decode/ Registe Read Exec Mem WB Data memoy 3. Execute 4. Memoy5. Wite Back I$ Reg D$ Reg * 14

I n s t. O d e Gaphical Pipeline Repesentation (In Reg, ight half highlight ead, left half wite) Time (clock cycles) Load Add Stoe Sub O I$ Reg I$ Reg I$ D$ Reg I$ * 15 Reg D$ Reg I$ Reg D$ Reg Reg D$ Reg D$ Reg

Example Suppose 2 ns fo memoy access, 2 ns fo opeation, and 1 ns fo egiste file ead o wite Nonpipelined Execution: lw : IF + Read Reg + + Memoy + Wite Reg = 2 + 1 + 2 + 2 + 1 = 8 ns add: IF + Read Reg + + Wite Reg = 2 + 1 + 2 + 1 = 6 ns Pipelined Execution: Max(IF,Read Reg,,Memoy,Wite Reg) = 2 ns * 16

Pipeline Hazad: Matching socks in late load 12 2 AM 6 PM 7 8 9 10 11 1 T a s k A B 3030 30 30 30 30 30 bubble Time O d e C D E F A depends on D; stall since folde tied up * 17

Poblems fo Computes Limits to pipelining: Hazads pevent next instuction fom executing duing its designated clock cycle Stuctual hazads: HW cannot suppot this combination of instuctions (single peson to fold and put clothes away) Contol hazads: Pipelining of banches & othe instuctions stall the pipeline until the hazad bubbles in the pipeline Data hazads: Instuction depends on esult of pio instuction still in the pipeline (missing sock) * 18

Stuctual Hazad #1: Single Memoy (1/2) I n s t. O d e Load Inst 1 Inst 2 Inst 3 Inst 4 Time (clock cycles) I$ Reg D$ Reg I$ Reg D$ Reg I$ Reg D$ Reg I$ Read same memoy twice in same clock cycle * 19 Reg D$ Reg I$ Reg D$ Reg

I n s t. Stuctual Hazad #2: Registes (1/2) Load Inst 1 Time (clock cycles) I$ Reg D$ Reg I$ Reg D$ Reg O d e Inst 2 Inst 3 Inst 4 I$ Reg D$ Reg I$ Reg D$ Reg I$ Reg D$ Reg Can t ead and wite to egistes simultaneously * 20

Stuctual Hazad #2: Registes (2/2) Fact: Registe access is VERY fast: takes less than half the time of stage Solution: intoduce convention always Wite to Registes duing fist half of each clock cycle always Read fom Registes duing second half of each clock cycle Result: can pefom Read and Wite duing same clock cycle * 21

Contol Hazad: Banching (1/6) Suppose we put banch decisionmaking hadwae in stage then two moe instuctions afte the banch will always be fetched, whethe o not the banch is taken Desied functionality of a banch if we do not take the banch, don t waste any time and continue executing nomally if we take the banch, don t execute any instuctions afte the banch, just go to the desied label * 22

Contol Hazad: Banching (2/6) Initial Solution: Stall until decision is made inset no-op instuctions: those that accomplish nothing, just take time Dawback: banches take 3 clock cycles each (assuming compaato is put in stage) * 23

Contol Hazad: Banching (3/6) Optimization #1: move compaato up to Stage 2 as soon as instuction is decoded (Opcode identifies is as a banch), immediately make a decision and set the value of the PC (if necessay) Benefit: since banch is complete in Stage 2, only one unnecessay instuction is fetched, so only one no-op is needed Side Note: This means that banches ae idle in Stages 3, 4 and 5. * 24

I n s t. O d e Contol Hazad: Banching (4/6) Inset a single no-op (bubble) Add Beq Load Time (clock cycles) I$ Reg D$ Reg I$ Reg D$ Reg bub ble I$ Reg D$ Reg Impact: 2 clock cycles pe banch instuction slow * 25

Contol Hazad: Banching (5/6) Optimization #2: Redefine banches Old definition: if we take the banch, none of the instuctions afte the banch get executed by accident New definition: whethe o not we take the banch, the single instuction immediately following the banch gets executed (called the banch-delay slot) * 26

Contol Hazad: Banching (6/6) Notes on Banch-Delay Slot Wost-Case Scenaio: can always put a no-op in the banch-delay slot Bette Case: can find an instuction peceding the banch which can be placed in the banch-delay slot without affecting flow of the pogam - e-odeing instuctions is a common method of speeding up pogams - compile must be vey smat in ode to find instuctions to do this - usually can find such an instuction at least 50% of the time * 27

Example: Nondelayed vs. Delayed Banch Nondelayed Banch o $8, $9,$10 Delayed Banch add $1,$2,$3 add $1,$2,$3 sub $4, $5,$6 beq $1, $4, Exit xo $10, $1,$11 sub $4, $5,$6 beq $1, $4, Exit o $8, $9,$10 xo $10, $1,$11 Exit: Exit: * 28

Things to Remembe (1/2) Optimal Pipeline Each stage is executing pat of an instuction each clock cycle. One instuction finishes duing each clock cycle. On aveage, execute fa moe quickly. What makes this wok? Similaities between instuctions allow us to use same stages fo all instuctions (geneally). Each stage takes about the same amount of time as all othes: little wasted time. * 29

Advanced Pipelining Concepts (if time) Out-of-ode Execution Supescala execution State-of-the-At Micopocesso * 30

Review Pipeline Hazad: Stall is dependency T a s k O d e 12 2 AM 6 PM 7 8 9 10 11 1 A B C D E F 3030 30 30 30 30 30 bubble Time A depends on D; stall since folde tied up * 31

Out-of-Ode Laundy: Don t Wait T a s k O d e 12 2 AM 6 PM 7 8 9 10 11 1 A B C D E F 3030 30 30 30 30 30 bubble Time A depends on D; est continue; need moe esouces to allow out-of-ode * 32

Supescala Laundy: Paallel pe stage T a s k O d e 12 2 AM 6 PM 7 8 9 10 11 1 A B C D E F 3030 30 30 30 Time (light clothing) (dak clothing) (vey dity clothing) (light clothing) (dak clothing) (vey dity clothing) Moe esouces, HW to match mix of paallel tasks? * 33

Supescala Laundy: Mismatch Mix 12 2 AM 6 PM 7 8 9 10 11 1 T a s k O d e A B C D Time 3030 30 30 30 30 30 (light clothing) (light clothing) (dak clothing) (light clothing) Task mix undeutilizes exta esouces * 34

Compaq Alpha 21264 Vey simila instuction set to MIPS 1 64KB Instuction cache, 1 64 KB Data cache on chip; 16MB L2 cache off chip Clock cycle = 1.5 nanoseconds, o 667 MHz clock ate Supescala: fetch up to 6 instuctions /clock cycle, eties up to 4 instuction/clock cycle Execution out-of-ode 15 million tansistos, 90 watts! * 35

Things to Remembe (1/2) Optimal Pipeline Each stage is executing pat of an instuction each clock cycle. One instuction finishes duing each clock cycle. On aveage, execute fa moe quickly. What makes this wok? Similaities between instuctions allow us to use same stages fo all instuctions (geneally). Each stage takes about the same amount of time as all othes: little wasted time. * 36

Things to Remembe (2/2) Pipelining a Big Idea: widely used concept What makes it less than pefect? Stuctual hazads: two diffeent instuctions equie same hadwae Need moe HW esouces Contol hazads: need to woy about banch instuctions? Delayed banch Data hazads: an instuction depends on a pevious instuction? * 37