PIPELINING: HAZARDS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Similar documents
ILP: CONTROL FLOW. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

ILP: COMPILER-BASED TECHNIQUES

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1

(Basic) Processor Pipeline

Lecture 7 Pipelining. Peng Liu.

COSC 6385 Computer Architecture - Pipelining

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

are Softw Instruction Set Architecture Microarchitecture are rdw

Multi-cycle Instructions in the Pipeline (Floating Point)

Pipelining. CSC Friday, November 6, 2015

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

EITF20: Computer Architecture Part2.2.1: Pipeline-1

BRANCH PREDICTORS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

Readings. H+P Appendix A, Chapter 2.3 This will be partly review for those who took ECE 152

Computer Architecture

HARDWARE SPECULATION. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

ECE154A Introduction to Computer Architecture. Homework 4 solution

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

CPE300: Digital System Architecture and Design

Pipelining and Exploiting Instruction-Level Parallelism (ILP)

Pipelining. Maurizio Palesi

1 Hazards COMP2611 Fall 2015 Pipelined Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor

Lecture 5: Pipelining Basics

Pipelining: Hazards Ver. Jan 14, 2014

Modern Computer Architecture

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Processor Design Pipelined Processor (II) Hung-Wei Tseng

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

CS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz

Basic Pipelining Concepts

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts

LECTURE 3: THE PROCESSOR

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Pipelining Review

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

CSEE 3827: Fundamentals of Computer Systems

Lecture 9: Dynamic ILP. Topics: out-of-order processors (Sections )

Complications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow?

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Chapter 4. The Processor

Processor: Superscalars Dynamic Scheduling

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods

Background: Pipelining Basics. Instruction Scheduling. Pipelining Details. Idealized Instruction Data-Path. Last week Register allocation

Pipelining. Each step does a small fraction of the job All steps ideally operate concurrently

Appendix C. Abdullah Muzahid CS 5513

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)

Full Datapath. Chapter 4 The Processor 2

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

CAD for VLSI 2 Pro ject - Superscalar Processor Implementation

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

Ti Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr

Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

ECE 571 Advanced Microprocessor-Based Design Lecture 4

Mikko Lipasti Spring 2002 ECE/CS 552 : Introduction to Computer Architecture IN-CLASS MIDTERM EXAM March 14th, 2002

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

In embedded systems there is a trade off between performance and power consumption. Using ILP saves power and leads to DECREASING clock frequency.

Advanced Computer Architecture

Processor (II) - pipelining. Hwansoo Han

What is Pipelining? RISC remainder (our assumptions)

Page 1. Pipelining: Its Natural! Chapter 3. Pipelining. Pipelined Laundry Start work ASAP. Sequential Laundry A B C D. 6 PM Midnight

Full Datapath. Chapter 4 The Processor 2

Lecture 19: Instruction Level Parallelism

COSC4201 Pipelining. Prof. Mokhtar Aboelaze York University

ECE 4750 Computer Architecture, Fall 2017 T05 Integrating Processors and Memories

Pipelining is Hazardous!

COMPUTER ORGANIZATION AND DESIGN

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Thomas Polzer Institut für Technische Informatik

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

RISC Pipeline. Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. See: P&H Chapter 4.6

Instr. execution impl. view

ECEC 355: Pipelining

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Lecture 5: Instruction Pipelining. Pipeline hazards. Sequential execution of an N-stage task: N Task 2

Copyright 2012, Elsevier Inc. All rights reserved.

Static vs. Dynamic Scheduling

Processor Design Pipelined Processor. Hung-Wei Tseng

Announcements. EE382A Lecture 6: Register Renaming. Lecture 6 Outline. Dynamic Branch Prediction Using History. 1. Branch Prediction (epilog)

zhandling Data Hazards The objectives of this module are to discuss how data hazards are handled in general and also in the MIPS architecture.

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions

Transcription:

PIPELINING: HAZARDS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture

Overview Announcement Homework 1 submission deadline: Jan. 30 th This lecture Impacts of pipelining on performance The MIPS five-stage pipeline Pipeline hazards n Structural hazards n Data hazards

Pipelining Technique Improving throughput at the expense of latency Delay: D = T + nδ Throughput: IPS = n/(t + nδ) Combinational Logic Critical Path Delay = 30 D = IPS = Combinational Logic Critical Path Delay = 15 Combinational Logic Critical Path Delay = 15 D = IPS = Comb. Logic Delay = 10 Comb. Logic Delay = 10 Comb. Logic Delay = 10 D = IPS =

Pipelining Technique Improving throughput at the expense of latency Delay: D = T + nδ Throughput: IPS = n/(t + nδ) Combinational Logic Critical Path Delay = 30 D = 31 IPS = 1/31 Combinational Logic Critical Path Delay = 15 Combinational Logic Critical Path Delay = 15 D = 32 IPS = 2/32 Comb. Logic Delay = 10 Comb. Logic Delay = 10 Comb. Logic Delay = 10 D = 33 IPS = 3/33

Pipelining Latency vs. Throughput Theoretical delay and throughput models for perfect pipelining Relative Performance 20 15 10 5 0 Delay (D) Throughput (IPS) 0 50 100 150 200 Number of Pipeline Stages

Five Stage MIPS Pipeline

Simple Five Stage Pipeline A pipelined load-store architecture that processes up to one instruction per cycle Write Back PC Inst. Memory Register File ALU Data Memory Inst. Fetch Inst. Decode Execute Memory

Instruction Fetch Read an instruction from memory (I-Cache) Use the program counter (PC) to index into the I- Memory Compute NPC by incrementing current PC n What about branches? Update pipeline registers Write the instruction into the pipeline registers

Instruction Fetch clock Branch Target clock PC + NPC NPC = PC + 4 Memory 4 Instruction Why increment by 4? Pipeline Register

Instruction Fetch clock P3 Branch Target clock PC + NPC NPC = PC + 4 Memory P1 4 P2 Instruction Why increment by 4? Critical Path = Max{P1, P2, P3} Pipeline Register

Instruction Decode Generate control signals for the opcode bits Read source operands from the register file (RF) Use the specifiers for indexing RF n How many read ports are required? Update pipeline registers Send the operand and immediate values to next stage Pass control signals and NPC to next stage

Instruction Decode target NPC NPC Instruction Register File decode reg reg ctrl Pipeline Register Pipeline Register

Execute Stage Perform ALU operation Compute the result of ALU n Operation type: control signals n First operand: contents of a register n Second operand: either a register or the immediate value Compute branch target n Target = NPC + immediate Update pipeline registers Control signals, branch target, ALU results, and destination

Execute Stage NPC + Target reg reg ctrl ALU ctrl reg Res Pipeline Register Pipeline Register

Memory Access Access data memory Load/store address: ALU outcome Control signals determine read or write access Update pipeline registers ALU results from execute Loaded data from D-Memory Destination register

Memory Access Target Res reg ctrl addr data Memory data ctrl Dat Res Pipeline Register Pipeline Register

Register Write Back Update register file Control signals determine if a register write is needed Only one write port is required n Write the ALU result to the destination register, or n Write the loaded data into the register file

Five Stage Pipeline Ideal pipeline: IPC=1 Is there enough resources to keep the pipeline stages busy all the time? Inst. Fetch Decode Execute Memory Writeback PC + + Mem 4 Reg. File ALU Mem Reg. File

Pipeline Hazards

Pipeline Hazards Structural hazards: multiple instructions compete for the same resource Data hazards: a dependent instruction cannot proceed because it needs a value that hasn t been produced Control hazards: the next instruction cannot be fetched because the outcome of an earlier branch is unknown

Structural Hazards 1. Unified memory for instruction and data R1ß Mem[R2] R3ß Mem[R20] R6ß R4-R5 R7ß R1+R0

Structural Hazards 1. Unified memory for instruction and data R1ß Mem[R2] R3ß Mem[R20] R6ß R4-R5 R7ß R1+R0 Separate inst. and data memories.

Structural Hazards 1. Unified memory for instruction and data 2. Register file with shared read/write access ports R1ß Mem[R2] R3ß Mem[R20] R6ß R4-R5 R7ß R1+R0

Structural Hazards 1. Unified memory for instruction and data 2. Register file with shared read/write access ports R1ß Mem[R2] R3ß Mem[R20] R6ß R4-R5 R7ß R1+R0 Register access in half cycles.

Data Hazards True dependence: read-after-write (RAW) Consumer has to wait for producer Loading data from memory. R1ß Mem[R2] R3ß R1+R0 R4ß R1-R3

Data Hazards True dependence: read-after-write (RAW) Consumer has to wait for producer Loaded data will be available two cycles later. R1ß Mem[R2] R3ß R1+R0 R4ß R1-R3

Data Hazards True dependence: read-after-write (RAW) Consumer has to wait for producer Inserting two bubbles. R1ß Mem[R2] Nothing Nothing R3ß R1+R0 R4ß R1-R3

Data Hazards True dependence: read-after-write (RAW) Consumer has to wait for producer Inserting single bubble + RF bypassing. R1ß Mem[R2] Nothing R3ß R1+R0 R4ß R1-R3 Load delay slot. SW vs. HW management?

Data Hazards True dependence: read-after-write (RAW) Consumer has to wait for producer Using the result of an ALU instruction. R1ß R2+R3 R5ß R1+R0 R3ß R1+R0 R4ß R1-R3

Data Hazards True dependence: read-after-write (RAW) Consumer has to wait for producer Using the result of an ALU instruction. R1ß R2+R3 R5ß R1+R0 R3ß R1+R0 R4ß R1-R3 Forwarding ALU result.

Data Hazards True dependence: read-after-write (RAW) Anti dependence: write-after-read (WAR) Write must wait for earlier read R1ß R2+R1 R2ß R8+R9

Data Hazards True dependence: read-after-write (RAW) Anti dependence: write-after-read (WAR) Write must wait for earlier read R1ß R2+R1 R2ß R8+R9 No WAR hazards in 5-stage pipeline!

Data Hazards True dependence: read-after-write (RAW) Anti dependence: write-after-read (WAR) Output dependence: write-after-write (WAW) Old writes must not overwrite the younger write R1ß R2+R3 R1ß R8+R9

Data Hazards True dependence: read-after-write (RAW) Anti dependence: write-after-read (WAR) Output dependence: write-after-write (WAW) Old writes must not overwrite the younger write R1ß R2+R3 R1ß R8+R9 No WAW hazards in 5-stage pipeline!

Data Hazards Forwarding with additional hardware

Data Hazards How to detect and resolve data hazards Show all of the data hazards in the code below R1ß Mem[R2] R2ß R1+R0 R1ß R1-R2 Mem[R3] ß R2

Data Hazards How to detect and resolve data hazards Show all of the data hazards in the code below R1ß Mem[R2] WAR WAW R2ß R1+R0 R1ß R1-R2 RAW Mem[R3] ß R2