COSC 6385 Computer Architecture - Pipelining

Spring 2012
Some of the slides are based on a lecture by David Culler.

Pipelining
- Pipelining is an implementation technique whereby multiple instructions are overlapped in execution
  - Split an expensive operation into several sub-operations
  - Execute the sub-operations in a staggered manner
- Real-world analogy: assembly line in car manufacturing
  - Each station is doing something different
  - Each station is working on a separate car
- Pipelining increases the throughput, but does not reduce the latency of an operation
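The throughput-versus-latency point can be checked with a little arithmetic. The sketch below assumes an idealized pipeline in which every stage takes exactly one cycle and no hazards occur; the function names are illustrative and do not come from the slides.

```python
def unpipelined_cycles(num_instructions, num_stages):
    """Each instruction runs through all stages before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """The first instruction fills the pipeline; every later instruction
    completes one cycle after its predecessor (ideal case, no stalls)."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def speedup(num_instructions, num_stages):
    """Ratio of unpipelined to pipelined cycle counts (needs >= 1 instruction)."""
    return (unpipelined_cycles(num_instructions, num_stages)
            / pipelined_cycles(num_instructions, num_stages))
```

Under these assumptions, 4 instructions on a 5-stage pipeline need 8 cycles instead of 20, while a single instruction still needs 5 cycles: the latency is unchanged, only the throughput improves. For long instruction streams the speedup approaches the number of stages.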

Classes of instructions
- ALU instructions
  - Take either 2 registers as operands, or 1 register and one 16-bit immediate offset
  - Results are stored in a 3rd register
- Load and store instructions
- Branches and jumps

Typical implementation of an instruction (I)
1. Instruction fetch cycle (IF)
   - Send PC to memory
   - Fetch current instruction
   - Update PC to next sequential PC (+4 bytes)
2. Instruction decode/register fetch cycle (ID)
   - Decode instruction
   - Read registers corresponding to register source specifiers from register file
   - Sign-extend offset fields if needed
   - Compute possible branch target address

Typical implementation of an instruction (II)
3. Execution/effective address cycle (EX)
   - ALU adds base register and offset to form the effective address, or
   - performs an operation on the values read from the register file, or
   - performs an operation on a value read from a register and the sign-extended immediate
4. Memory access cycle (MEM)
   - If the instruction is a load, read memory using the effective address computed in step 3
   - If the instruction is a store, write the data from the second register read of the register file to the effective address
5. Write-back cycle (WB)
   - Write result into register file
     - From memory for a load instruction
     - From ALU for an ALU instruction

Typical implementation of an instruction (III)
[Figure: five-stage datapath with the stages Instruction Fetch, Instruction Decode / Register Fetch, Execute / Address Calculation, Memory Access, and Write Back]
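The five steps can be walked through in software. The following is a minimal sketch, not a model of the datapath in the figure: the dictionary-based instruction format, the tiny opcode set (`lw`, `sw`, `add`, `addi`) and the function name `run_program` are assumptions made for illustration, and instructions are executed one at a time without any overlap.

```python
def run_program(program, regs, mem):
    """Execute instructions sequentially, one five-step pass per instruction.

    program: list of dicts such as {"op": "lw", "rd": 1, "rs1": 2, "imm": 4}
             (hypothetical format); regs: list of ints; mem: dict address -> value.
    """
    pc = 0
    while pc // 4 < len(program):
        # IF: fetch the current instruction, then advance the PC by 4 bytes.
        instr = program[pc // 4]
        pc += 4

        # ID: decode and read the source registers and the immediate.
        op = instr["op"]
        a = regs[instr.get("rs1", 0)]
        b = regs[instr.get("rs2", 0)]
        imm = instr.get("imm", 0)

        # EX: effective address for loads/stores, ALU result otherwise.
        if op in ("lw", "sw", "addi"):
            alu_out = a + imm
        elif op == "add":
            alu_out = a + b
        else:
            raise ValueError(f"unsupported opcode: {op}")

        # MEM: only loads and stores touch data memory.
        loaded = None
        if op == "lw":
            loaded = mem[alu_out]
        elif op == "sw":
            mem[alu_out] = b  # data comes from the second register read

        # WB: loads write the memory value, ALU instructions the ALU result.
        if op == "lw":
            regs[instr["rd"]] = loaded
        elif op in ("add", "addi"):
            regs[instr["rd"]] = alu_out
    return regs, mem
```

Branches are left out of this sketch on purpose, since they change the PC update in the IF step.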

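Details (I)
Fetching instructions and incrementing the program counter (PC)
[Figure: the PC supplies the read address to the instruction memory, which outputs the instruction; an adder computes PC + 4]

Details (II)
ALU instructions, e.g. add R1, R2, R3
- Register number input is 5 bits wide if you have 32 (= 2^5) registers
- ALU operation control signal (4 bits)
[Figure: register file with three 5-bit register number inputs (read register 1, read register 2, write register), a write data input and a write control signal; the outputs read data 1 and read data 2 feed the ALU, which takes the 4-bit ALU operation signal and produces Zero and the ALU result]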

Details (III)
Load/Store instructions, e.g. LW R1, offset(R2)
[Figure: data memory with address and write data inputs, read data output, and the memory write and memory read control signals; a sign-extend unit widens 16 bits to 32 bits]
Basic steps for a load/store operation
- Sign-extend the offset from 16 to 32 bits
- Add the sign-extended offset to R2
- Load the content of the resulting address into R1, or store the data from R1 into the resulting memory address

Details (IV)
Combining Load/Store and ALU instructions
[Figure: register file, ALU and data memory combined; one multiplexer selects the second ALU input (read data 2 or the sign-extended 16-bit immediate), a second multiplexer selects the value written back to the register file (ALU result or data read from memory)]
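The load/store address computation can be sketched as follows. The helper names are illustrative, and wrapping the result to 32 bits is an assumption about the address width.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Base register value plus sign-extended 16-bit offset, wrapped to 32 bits."""
    return (base + sign_extend_16(offset16)) & 0xFFFFFFFF
```

For example, with R2 = 0x1000 the encoded offset 0xFFFC stands for -4, so `LW R1, -4(R2)` reads from address 0x0FFC.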

Details (V)
Branches, e.g. beq R1, R2, offset
Basic steps for a branch-equal instruction
- Compute the branch target address
  - Sign-extend the offset field
  - Shift the offset field left by 2 bits in order to ensure a word offset
  - Add the shifted, sign-extended offset to the PC (the datapath uses the already incremented PC + 4)
- Compare registers R1 and R2

Details (VI)
Implementation of branches, e.g. beq R1, R2, offset
[Figure: the 16-bit offset is sign-extended to 32 bits, shifted left by 2 and added to PC + 4 from the instruction datapath to form the branch target; the register file outputs read data 1 and read data 2 feed the ALU, whose result goes to the branch control logic]
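A small sketch of the branch-equal steps, assuming 32-bit addresses and a target computed relative to PC + 4 as in the datapath figure. The function names are illustrative.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """Sign-extend the offset, shift left by 2 (word offset), add to PC + 4."""
    return (pc + 4 + (sign_extend_16(offset16) << 2)) & 0xFFFFFFFF


def next_pc_beq(pc, r1_value, r2_value, offset16):
    """beq: take the branch only if both register values are equal."""
    if r1_value == r2_value:
        return branch_target(pc, offset16)
    return pc + 4
```

With an offset field of 3, a branch at 0x400000 targets 0x400010; an offset field of 0xFFFF (that is, -1) targets the branch instruction itself.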

Visualizing pipelining
[Figure: pipeline diagram with instruction order on the vertical axis and time in clock cycles (Cycle 1 to Cycle 7) on the horizontal axis; each instruction passes through IF, ID, EX, MEM and WB]

Effects of pipelining
- A pipeline of depth n requires n times the memory bandwidth of a non-pipelined processor for the same clock rate
  - Separate data and instruction caches eliminate some memory conflicts
- The register file is used in stage ID and in stage WB
  - Usually not a conflict, since writes are executed in the first half of the clock cycle and reads in the second half
- Instructions in the pipeline should not attempt to use the same hardware resources at the same time
- Introducing pipeline registers between successive stages of the pipeline
  - Registers are named after the stages they connect (e.g. IF/ID, ID/EX, etc.)
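The pipeline diagram can be regenerated programmatically. This sketch assumes an ideal pipeline with no stalls, in which instruction i enters IF in cycle i + 1; the stage names follow the slides, while the function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c + 1."""
    total_cycles = len(STAGES) + num_instructions - 1 if num_instructions else 0
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i is one cycle behind instruction i - 1
        rows.append(row)
    return rows


def format_diagram(rows):
    """Render the rows as fixed-width text, one line per instruction."""
    return "\n".join(
        " ".join(f"{cell:<3}" for cell in row).rstrip() for row in rows
    )
```

Reading the table column by column shows that, in any given cycle, every stage is occupied by at most one instruction, which is the condition stated above about not using the same hardware resource at the same time.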

[Figure: pipelined datapath with the stages Instruction Fetch, Instruction Decode / Register Fetch, Execute / Address Calculation, Memory Access and Write Back, separated by the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB; the destination register number RD is carried along through the pipeline registers]

Pipeline Hazards
- Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  - Structural hazards: HW cannot support this combination of instructions
  - Data hazards: instruction depends on the result of a prior instruction still in the pipeline
  - Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

One Memory Port/Structural Hazards
[Figure: pipeline diagram over Cycle 1 to Cycle 7, instruction order Load, Instr 1, Instr 2, Instr 3, Instr 4]

One Memory Port/Structural Hazards
[Figure: the same diagram with a stall: Load, Instr 1, Instr 2, Stall (a row of bubbles), Instr 3]
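The stall in the second diagram can be reproduced with a small scheduling sketch. It assumes a single memory port shared by instruction fetch (IF) and data access (MEM), that MEM happens three cycles after IF, and that no other hazards occur; the function name and the opcode strings are illustrative.

```python
def fetch_cycles(ops):
    """Return the cycle in which each instruction is fetched.

    ops: list of opcode strings; "lw" and "sw" use the memory port in MEM.
    A fetch is delayed while the port is busy with an earlier load/store.
    """
    port_busy = set()  # cycles in which a MEM stage occupies the port
    cycles = []
    cycle = 1
    for op in ops:
        while cycle in port_busy:
            cycle += 1  # stall: a bubble enters the pipeline instead
        cycles.append(cycle)
        if op in ("lw", "sw"):
            port_busy.add(cycle + 3)  # MEM is three stages after IF
        cycle += 1
    return cycles
```

For the sequence Load, Instr 1, Instr 2, Instr 3, Instr 4 this yields fetch cycles 1, 2, 3, 5, 6: the load occupies the port in cycle 4, so Instr 3 is fetched one cycle late, matching the position of the Stall row in the figure.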

Data Hazard on R1
[Figure: pipeline diagram (stages IF, ID, EX, MEM, WB) in instruction order for
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11]

Three Generic Data Hazards
Read After Write (RAW): Instr J tries to read an operand before Instr I writes it
  I: add r1,r2,r3
  J: sub r4,r1,r3
- Caused by a "dependence" (in compiler nomenclature). This hazard results from an actual need for communication.
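Which of the instructions in the example actually hit the hazard can be worked out mechanically. The sketch below assumes no forwarding, register reads in ID and writes in WB, and the split-cycle register file described earlier (write in the first half of the cycle, read in the second half), so a consumer three or more instructions behind the producer is safe. The tuple format `(op, dest, src1, src2)` is an illustrative assumption.

```python
def raw_hazards(program):
    """Return sorted (producer_index, consumer_index) pairs with a RAW hazard.

    Only the two instructions directly ahead of a consumer can still be
    short of WB when the consumer reads its registers in ID.
    """
    hazards = set()
    for j, (_op, _dest, *srcs) in enumerate(program):
        for src in set(srcs):
            # Look at the nearest earlier writer first.
            for i in range(j - 1, max(j - 3, -1), -1):
                if program[i][1] == src:
                    hazards.add((i, j))
                    break
    return sorted(hazards)


example = [
    ("add", "r1", "r2", "r3"),
    ("sub", "r4", "r1", "r3"),
    ("and", "r6", "r1", "r7"),
    ("or", "r8", "r1", "r9"),
    ("xor", "r10", "r1", "r11"),
]
```

Under these assumptions `raw_hazards(example)` reports the pairs (0, 1) and (0, 2): `sub` and `and` read r1 too early, while `or` reads it in the same cycle in which `add` writes it and `xor` reads it afterwards.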

Three Generic Data Hazards
Write After Read (WAR): Instr J writes an operand before Instr I reads it
  I: sub r4,r1,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7
- Called an "anti-dependence" by compiler writers. This results from reuse of the name "r1".
- Can't happen in our 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Reads are always in stage 2, and
  - Writes are always in stage 5

Three Generic Data Hazards
Write After Write (WAW): Instr J writes an operand before Instr I writes it
  I: sub r1,r4,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7
- Called an "output dependence" by compiler writers. This also results from the reuse of the name "r1".
- Can't happen in our 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Writes are always in stage 5
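The three dependence types can be told apart by comparing register names. A minimal sketch, assuming the illustrative tuple format `(op, dest, src1, src2)` and that instruction I precedes instruction J in program order:

```python
def classify_dependences(instr_i, instr_j):
    """Return the set of dependence kinds from instr_i to the later instr_j."""
    _, dest_i, *srcs_i = instr_i
    _, dest_j, *srcs_j = instr_j
    kinds = set()
    if dest_i in srcs_j:
        kinds.add("RAW")  # J reads what I writes (true dependence)
    if dest_j in srcs_i:
        kinds.add("WAR")  # J overwrites what I reads (anti-dependence)
    if dest_i == dest_j:
        kinds.add("WAW")  # both write the same name (output dependence)
    return kinds
```

The function only detects the dependences; whether one turns into a hazard depends on the pipeline. In the 5-stage pipeline described here only RAW can, for the reasons listed above.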

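Forwarding to Avoid Data Hazard
[Figure: pipeline diagram, instruction order versus time in clock cycles, for
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11]

Data Hazard even with Forwarding
[Figure: pipeline diagram in instruction order for
  lw  r1, 0(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or  r8,r1,r9]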

Data Hazard Even with Forwarding
[Figure: the same sequence with bubbles inserted after the load:
  lw  r1, 0(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or  r8,r1,r9]

Branches: Pipelined Datapath
[Figure: pipelined datapath (Instruction Fetch, Instruction Decode / Register Fetch, Execute / Address Calculation, Memory Access, Write Back, with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB) including an additional adder and the Zero? test for branches]
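The load-use case can be counted with a short sketch. It assumes full forwarding, so that the only remaining data stall is one bubble when an instruction uses the result of the load immediately before it; the tuple format `(op, dest, sources)` and the function names are illustrative.

```python
def load_use_stalls(program):
    """Count one bubble for every instruction that reads the destination
    of the load directly preceding it."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        if prev_op == "lw" and prev_dest in curr[2]:
            stalls += 1
    return stalls


def total_cycles(program, num_stages=5):
    """Ideal pipelined cycle count plus load-use bubbles."""
    if not program:
        return 0
    return num_stages + len(program) - 1 + load_use_stalls(program)


example = [
    ("lw", "r1", ("r2",)),
    ("sub", "r4", ("r1", "r6")),
    ("and", "r6", ("r1", "r7")),
    ("or", "r8", ("r1", "r9")),
]
```

For the sequence in the figure this gives one stall and 9 cycles in total instead of 8. The loaded value only becomes available at the end of MEM, which is too late for the EX stage of the instruction right behind the load.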

Four Branch Hazard Alternatives
#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
- Execute successor instructions in sequence
- "Squash" instructions in pipeline if branch actually taken
- Advantage of late pipeline state update
- 47% of branches not taken on average
- PC+4 already calculated, so use it to get the next instruction
#3: Predict Branch Taken
- 53% of branches taken on average
- But the branch target address has not been calculated yet, so this still incurs a 1-cycle branch penalty
- Other machines: branch target known before outcome

Four Branch Hazard Alternatives
#4: Delayed Branch
- Define branch to take place AFTER a following instruction
    branch instruction
    sequential successor 1
    sequential successor 2
    ...
    sequential successor n
    branch target if taken
  (branch delay of length n)
- A 1-slot delay allows proper decision and branch target address in the 5-stage pipeline
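The first three alternatives can be compared with a rough cost model. This sketch uses the 53% taken figure from the slide and assumes a 1-cycle penalty whenever the fetched instruction has to be discarded; the 20% branch frequency in the example below is a made-up illustration value, not a number from the slides.

```python
TAKEN_FRACTION = 0.53  # from the slide: 53% of branches taken on average


def average_branch_penalty(scheme, taken_fraction=TAKEN_FRACTION):
    """Average penalty in cycles per branch, assuming a 1-cycle branch delay."""
    if scheme == "stall":
        return 1.0  # every branch waits one cycle
    if scheme == "predict_not_taken":
        return taken_fraction * 1.0  # only taken branches squash an instruction
    if scheme == "predict_taken":
        return 1.0  # target not known in time, so one cycle is lost either way
    raise ValueError(f"unknown scheme: {scheme}")


def effective_cpi(scheme, branch_fraction, base_cpi=1.0):
    """Base CPI plus the branch stall cycles per instruction."""
    return base_cpi + branch_fraction * average_branch_penalty(scheme)
```

With a hypothetical branch frequency of 20%, stalling gives a CPI of 1.2 and predict-not-taken about 1.106 under this model.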

Delayed Branch
- Where to get instructions to fill the branch delay slot?
  - Before the branch instruction
  - From the target address: only valuable when branch taken
  - From fall through: only valuable when branch not taken
- Compiler effectiveness for a single branch delay slot:
  - Fills about 60% of branch delay slots
  - About 80% of instructions executed in branch delay slots are useful in computation
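Combining the two compiler-effectiveness figures gives the share of delay slots that do useful work. A minimal sketch, assuming the two percentages are independent and that an unfilled or useless slot costs one full cycle; the function names are illustrative.

```python
def useful_slot_fraction(fill_rate=0.60, useful_rate=0.80):
    """Fraction of delay slots that are both filled and useful."""
    return fill_rate * useful_rate


def residual_branch_penalty(slot_cycles=1, fill_rate=0.60, useful_rate=0.80):
    """Average cycles per branch still wasted in the delay slot."""
    return slot_cycles * (1.0 - useful_slot_fraction(fill_rate, useful_rate))
```

With the numbers above, 0.60 x 0.80 = 0.48, so roughly half of all delay slots are usefully filled and about 0.52 cycles per branch remain as overhead under this model.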