CISC 662 Graduate Computer Architecture Lecture 6 - Hazards


CISC 662 Graduate Computer Architecture
Lecture 6 - Hazards
Michela Taufer
http://www.cis.udel.edu/~taufe/teaching/cis662f07
PowerPoint Lecture Notes from John Hennessy and David Patterson's Computer Architecture, 4th edition
Additional teaching material from: Jelena Mirkovic (U Del) and John Kubiatowicz (UC Berkeley)

Pipelining is not quite that easy!
Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle.
- Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
- Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
- Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

One Memory Port / Structural Hazards (Figure A.4, Page A-14)
[Figure: pipeline diagram, time in clock cycles (Cycle 1 to Cycle 7) against instruction order: Load, Instr 1, Instr 2, Instr 3, Instr 4]

One Memory Port / Structural Hazards (similar to Figure A.5, Page A-15)
[Figure: pipeline diagram, time in clock cycles (Cycle 1 to Cycle 7) against instruction order: Load, Instr 1, Instr 2, Stall (a row of bubbles), Instr 3]
How do you "bubble" the pipe?

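Speed Up Equation for Pipelining

$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$

For a simple RISC pipeline, Ideal CPI = 1:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$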

Example: Dual-port vs. Single-port
- Machine A: Dual-ported memory ("Harvard Architecture")
- Machine B: Single-ported memory, but its pipelined implementation has a 1.05 times faster clock rate
- Ideal CPI = 1 for both
- Loads are 40% of instructions executed

SpeedUp A = Pipeline Depth / (1 + 0) x (clock_unpipe / clock_pipe)
          = Pipeline Depth
SpeedUp B = Pipeline Depth / (1 + 0.4 x 1) x (clock_unpipe / (clock_unpipe / 1.05))
          = (Pipeline Depth / 1.4) x 1.05
          = 0.75 x Pipeline Depth
SpeedUp A / SpeedUp B = Pipeline Depth / (0.75 x Pipeline Depth) = 1.33

Machine A is 1.33 times faster.
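The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function name `speedup` and the pipeline depth of 5 are assumptions made for illustration (the depth cancels in the final ratio), while the 40% load fraction, the one-cycle stall per load, and the 1.05 clock advantage come from the slide.

```python
def speedup(depth, stall_cpi, clock_ratio, ideal_cpi=1.0):
    """Speedup over the unpipelined machine.

    clock_ratio is CycleTime_unpipelined / CycleTime_pipelined.
    """
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi) * clock_ratio

DEPTH = 5  # assumed pipeline depth; it cancels out of the A/B ratio

# Machine A: dual-ported memory, so no structural stalls.
speedup_a = speedup(DEPTH, stall_cpi=0.0, clock_ratio=1.0)

# Machine B: every load (40% of instructions) stalls one cycle,
# but the clock is 1.05 times faster.
speedup_b = speedup(DEPTH, stall_cpi=0.4 * 1, clock_ratio=1.05)

ratio = speedup_a / speedup_b
print(f"SpeedUp A = {speedup_a:.2f}")
print(f"SpeedUp B = {speedup_b:.2f} ({speedup_b / DEPTH:.2f} x depth)")
print(f"A is {ratio:.2f} times faster than B")
```

Changing `DEPTH` leaves the final ratio unchanged, which is why the slide can state the result without fixing a pipeline depth.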

Data Hazard on R1 (Figure A.6, Page A-17)
[Figure: pipeline diagram, time in clock cycles against instruction order, stages IF, ID/RF, EX, MEM, WB]
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11

Three Generic Data Hazards
Read After Write (RAW): Instr J tries to read operand before Instr I writes it
    I: add r1,r2,r3
    J: sub r4,r1,r3
Caused by a "Dependence" (in compiler nomenclature). This hazard results from an actual need for communication.

Three Generic Data Hazards
Write After Read (WAR): Instr J writes operand before Instr I reads it
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
Called an "anti-dependence" by compiler writers. This results from reuse of the name "r1".
Can't happen in MIPS 5 stage pipeline because:
- All instructions take 5 stages, and
- Reads are always in stage 2, and
- Writes are always in stage 5

Three Generic Data Hazards
Write After Write (WAW): Instr J writes operand before Instr I writes it
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
Called an "output dependence" by compiler writers. This also results from the reuse of name "r1".
Can't happen in MIPS 5 stage pipeline because:
- All instructions take 5 stages, and
- Writes are always in stage 5
Will see WAR and WAW in more complicated pipes
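The three definitions above reduce to comparing the destination and source registers of an earlier instruction I and a later instruction J. The following Python sketch classifies a pair that way; the `Instr` tuple and the `dependences` function are illustrative names, not part of the lecture. It reports dependences between register names only. Whether a dependence becomes an actual hazard depends on the pipeline, as the slides note for WAR and WAW.

```python
from typing import NamedTuple, Set, Tuple

class Instr(NamedTuple):
    op: str
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read

def dependences(i: Instr, j: Instr) -> Set[str]:
    """Dependences of later instruction j on earlier instruction i."""
    found = set()
    if i.dest in j.srcs:
        found.add("RAW")    # j reads what i writes (true dependence)
    if j.dest in i.srcs:
        found.add("WAR")    # j writes what i reads (anti-dependence)
    if i.dest == j.dest:
        found.add("WAW")    # both write the same name (output dependence)
    return found

# The instruction pairs used on the three slides.
raw = dependences(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3")))
war = dependences(Instr("sub", "r4", ("r1", "r3")), Instr("add", "r1", ("r2", "r3")))
waw = dependences(Instr("sub", "r1", ("r4", "r3")), Instr("add", "r1", ("r2", "r3")))
print(raw, war, waw)
```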

Forwarding to Avoid Data Hazard (Figure A.7, Page A-19)
[Figure: pipeline diagram, time in clock cycles against instruction order, with forwarding paths]
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11

HW Change for Forwarding (Figure A.23, Page A-37)
[Figure: datapath with forwarding. Labels: NextPC, Registers, ID/EX, mux, mux, EX/MEM, Data Memory, MEM/WR, Immediate, mux]
What circuit detects and resolves this hazard?
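One common answer to the question on this slide is a forwarding unit that compares the source register of the instruction in EX with the destination registers held in the EX/MEM and MEM/WB pipeline registers. The sketch below follows the usual textbook conditions rather than anything spelled out in these slides; the function name, parameter names, and return labels are assumptions chosen for illustration.

```python
def forward_select(src_reg, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Choose where one ALU operand of the instruction in EX comes from.

    Register numbers are plain integers; register 0 is hardwired to zero
    in MIPS, so it is never forwarded.
    """
    # The most recent producer (one instruction ahead) takes priority.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"
    # Otherwise take the value from the producer two instructions ahead.
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"
    # No hazard: use the value read from the register file in ID.
    return "REGFILE"

# sub r4,r1,r3 in EX while add r1,r2,r3 sits in EX/MEM:
print(forward_select(1, ex_mem_rd=1, ex_mem_regwrite=True,
                     mem_wb_rd=0, mem_wb_regwrite=False))
# and r6,r1,r7 in EX: sub (writes r4) is in EX/MEM, add (writes r1) in MEM/WB:
print(forward_select(1, ex_mem_rd=4, ex_mem_regwrite=True,
                     mem_wb_rd=1, mem_wb_regwrite=True))
```

In hardware, the returned label corresponds to the select input of the mux in front of each ALU operand.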

Pipeline Control
Pass control signals along just like the data

              Execution/Address Calculation     Memory access stage         Write-back stage
              stage control lines               control lines               control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite   RegWrite  MemtoReg
R-format      1       1       0       0         0       0        0          1         0
lw            0       0       0       1         0       1        0          1         1
sw            X       0       0       1         0       0        1          0         X
beq           X       0       1       0         1       0        0          0         X

[Figure: the Control unit decodes the instruction into WB, M, and EX fields; the fields are carried through the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, each stage consuming its own field]

Datapath with Control
[Figure: pipelined datapath with control signals]

Forwarding to Avoid LW-SW Data Hazard (Figure A.8, Page A-20)
[Figure: pipeline diagram, time in clock cycles against instruction order, with forwarding paths]
    add r1,r2,r3
    lw  r4,0(r1)
    sw  r4,12(r1)
    or  r8,r6,r9
    xor r10,r9,r11

Data Hazard Even with Forwarding (Figure A.9, Page A-21)
[Figure: pipeline diagram, time in clock cycles against instruction order]
    lw  r1,0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or  r8,r1,r9

Data Hazard Even with Forwarding (similar to Figure A.10, Page A-21)
[Figure: pipeline diagram, time in clock cycles against instruction order; a bubble is inserted in sub, and, and or after the load]
    lw  r1,0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or  r8,r1,r9
How is this detected?
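A typical way to detect this case is a check in the decode stage: stall when the instruction currently in EX is a load and its destination matches a source register of the instruction being decoded. The Python sketch below uses the usual textbook formulation, which these slides do not spell out; the function and parameter names are assumptions for illustration. In MIPS a load writes the register named by its rt field.

```python
def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the instruction in ID must wait one cycle.

    id_ex_memread: the instruction in EX is a load
    id_ex_rt:      destination register of that load
    if_id_rs/rt:   source registers of the instruction in ID
    """
    return bool(id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt))

# lw r1,0(r2) in EX, sub r4,r1,r6 in ID: one bubble is needed.
print(load_use_stall(True, 1, if_id_rs=1, if_id_rt=6))
# One cycle later sub is in EX (not a load), and r6,r1,r7 in ID: no stall.
print(load_use_stall(False, 4, if_id_rs=1, if_id_rt=7))
```

When the check fires, the hardware holds the PC and the IF/ID register for one cycle and zeroes the control signals entering ID/EX, which creates the bubble.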

Software Scheduling to Avoid Load Hazards
Try producing fast code for
    a = b + c;
    d = e - f;
assuming a, b, c, d, e, and f in memory.

Slow code:
    LW   Rb,b
    LW   Rc,c
    ADD  Ra,Rb,Rc
    SW   a,Ra
    LW   Re,e
    LW   Rf,f
    SUB  Rd,Re,Rf
    SW   d,Rd

Fast code:
    LW   Rb,b
    LW   Rc,c
    LW   Re,e
    ADD  Ra,Rb,Rc
    LW   Rf,f
    SW   a,Ra
    SUB  Rd,Re,Rf
    SW   d,Rd

Compiler optimizes for performance. Hardware checks for safety.
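To see why the reordered sequence is faster, one can count load-use stalls in both versions. The Python sketch below assumes full forwarding, so the only stall is a load immediately followed by an instruction that reads the loaded register. The tuple encoding and the function name are illustrative choices, not from the lecture.

```python
def count_load_use_stalls(program):
    """Count adjacent (load, consumer) pairs; each costs one stall cycle.

    Each instruction is (opcode, destination register or None, source registers).
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_srcs = curr
        if prev_op == "LW" and prev_dest in curr_srcs:
            stalls += 1
    return stalls

slow = [
    ("LW", "Rb", ()), ("LW", "Rc", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
fast = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()),
    ("SW", None, ("Ra",)), ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]

print("slow code stalls:", count_load_use_stalls(slow))
print("fast code stalls:", count_load_use_stalls(fast))
```

Under these assumptions the slow sequence stalls after `LW Rc,c` and after `LW Rf,f`, while the fast sequence places an independent instruction after every load.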

Control Hazard

Control Hazard on Branches: Three Stage Stall
    10: beq r1,r3,36
    14: and r2,r3,r5
    18: or  r6,r1,r7
    22: add r8,r1,r9
    36: xor r10,r1,r11
What do you do with the 3 instructions in between? How do you do it? Where is the "commit"?

Branch Stall Impact
If CPI = 1, 30% branch, Stall 3 cycles => new CPI = 1.9!
Two part solution:
- Determine branch taken or not sooner, AND
- Compute taken branch address earlier
MIPS branch tests if register = 0 or ≠ 0
MIPS Solution:
- Move Zero test to ID/RF stage
- Adder to calculate new PC in ID/RF stage
- 1 clock cycle penalty for branch versus 3
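The CPI figure on this slide is the base CPI plus the branch fraction times the stall per branch. A short Python sketch (the function name is an assumption for illustration) reproduces the 1.9 and shows the effect of cutting the penalty to one cycle:

```python
def cpi_with_branches(base_cpi, branch_fraction, penalty_cycles):
    """Average CPI when every branch stalls for penalty_cycles."""
    return base_cpi + branch_fraction * penalty_cycles

cpi_3_cycle = cpi_with_branches(1.0, 0.30, 3)  # branch resolved late
cpi_1_cycle = cpi_with_branches(1.0, 0.30, 1)  # zero test and adder moved to ID/RF
print(f"3-cycle penalty: CPI = {cpi_3_cycle:.2f}")
print(f"1-cycle penalty: CPI = {cpi_1_cycle:.2f}")
```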

Pipelined MIPS Datapath (Figure A.24, page A-38)
[Figure: five-stage datapath. Stages: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. The Zero? test and the adder for the next PC sit in the decode stage.]
Interplay of instruction set design and cycle time.

Four Branch Hazard Alternatives
Static alternatives: fixed for each branch during the entire execution
#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
- Execute successor instructions in sequence
- "Squash" instructions in pipeline if branch actually taken
- Advantage of late pipeline state update
- 47% MIPS branches not taken on average
- PC+4 already calculated, so use it to get next instruction
#3: Predict Branch Taken
- 53% MIPS branches taken on average
- But haven't calculated branch target address in MIPS
  - MIPS still incurs 1 cycle branch penalty
  - Other machines: branch target known before outcome

Four Branch Hazard Alternatives
#4: Delayed Branch
Define branch to take place AFTER a following instruction

    branch instruction
    sequential successor 1
    sequential successor 2
    ...
    sequential successor n      <- branch delay of length n
    branch target if taken

- 1 slot delay allows proper decision and branch target address in 5 stage pipeline
- MIPS uses this

Scheduling Branch Delay Slots (Fig A.14)

A. From before branch
    add $1,$2,$3
    if $2=0 then
        [delay slot]
  becomes
    if $2=0 then
        add $1,$2,$3

B. From branch target
    sub $4,$5,$6
    add $1,$2,$3
    if $1=0 then
        [delay slot]
  becomes
    add $1,$2,$3
    if $1=0 then
        sub $4,$5,$6

C. From fall through
    add $1,$2,$3
    if $1=0 then
        [delay slot]
    sub $4,$5,$6
  becomes
    add $1,$2,$3
    if $1=0 then
        sub $4,$5,$6

- A is the best choice: it fills the delay slot and reduces instruction count (IC)
- In B, the sub instruction may need to be copied, increasing IC
- In B and C, it must be okay to execute sub when the branch goes in the unexpected direction

Delayed Branch
Compiler effectiveness for single branch delay slot:
- Fills about 60% of branch delay slots
- About 80% of instructions executed in branch delay slots useful in computation
- About 50% (60% x 80%) of slots usefully filled
Delayed Branch downside: As processors go to deeper pipelines and multiple issue, the branch delay grows and needs more than one delay slot
- Delayed branching has lost popularity compared to more expensive but more flexible dynamic approaches
- Growth in available transistors has made dynamic approaches relatively cheaper

Evaluating Branch Alternatives

$$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$$

Assume 4% unconditional branch, 6% conditional branch-untaken, 10% conditional branch-taken

Scheduling scheme    Branch penalty   CPI    Speedup v. unpipelined   Speedup v. stall
Stall pipeline       3                1.60   3.1                      1.0
Predict taken        1                1.20   4.2                      1.33
Predict not taken    1                1.14   4.4                      1.40
Delayed branch       0.5              1.10   4.5                      1.45
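The table can be reproduced from the branch mix with a short Python sketch. Two points are assumptions here rather than statements from the slide: a pipeline depth of 5, which matches the "speedup v. unpipelined" column, and the reading that predict-not-taken pays its one-cycle penalty only on unconditional and taken conditional branches, which matches the CPI of 1.14. All names in the code are illustrative.

```python
DEPTH = 5  # assumed pipeline depth, consistent with the table values

# Fraction of all instructions in each branch category (from the slide).
FREQ = {"unconditional": 0.04, "cond_untaken": 0.06, "cond_taken": 0.10}

# Penalty in cycles per branch category for each scheme.
SCHEMES = {
    "Stall pipeline":    {"unconditional": 3,   "cond_untaken": 3,   "cond_taken": 3},
    "Predict taken":     {"unconditional": 1,   "cond_untaken": 1,   "cond_taken": 1},
    "Predict not taken": {"unconditional": 1,   "cond_untaken": 0,   "cond_taken": 1},
    "Delayed branch":    {"unconditional": 0.5, "cond_untaken": 0.5, "cond_taken": 0.5},
}

def cpi(penalties):
    """Ideal CPI of 1 plus the average branch stall cycles per instruction."""
    return 1.0 + sum(FREQ[kind] * penalties[kind] for kind in FREQ)

results = {}
stall_cpi = cpi(SCHEMES["Stall pipeline"])
for name, penalties in SCHEMES.items():
    c = cpi(penalties)
    # (CPI, speedup v. unpipelined, speedup v. stall)
    results[name] = (c, DEPTH / c, stall_cpi / c)

for name, (c, vs_unpipelined, vs_stall) in results.items():
    print(f"{name:18s} CPI={c:.2f}  v.unpipelined={vs_unpipelined:.1f}  v.stall={vs_stall:.2f}")
```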