Lecture 5: Pipelining Basics

Similar documents
Lecture 2: Pipelining Basics. Today: chapter 1 wrap-up, basic pipelining implementation (Sections A.1 - A.4)

Lecture 4: Advanced Pipelines. Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)

What is Pipelining? RISC remainder (our assumptions)

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

Lecture: Pipelining Basics

CS252 Prerequisite Quiz. Solutions Fall 2007

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Pipelining Review

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

Lecture: Pipeline Wrap-Up and Static ILP

Pipeline Review. Review

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

DLX Unpipelined Implementation

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts

ECEC 355: Pipelining

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Pipelining. CSC Friday, November 6, 2015

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Unpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Pipelining. Maurizio Palesi

Appendix C. Abdullah Muzahid CS 5513

EITF20: Computer Architecture Part2.2.1: Pipeline-1

COSC 6385 Computer Architecture - Pipelining

PIPELINING: HAZARDS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Lecture 7: Static ILP and branch prediction. Topics: static speculation and branch prediction (Appendix G, Section 2.3)

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

Background: Pipelining Basics. Instruction Scheduling. Pipelining Details. Idealized Instruction Data-Path. Last week Register allocation

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

EITF20: Computer Architecture Part2.2.1: Pipeline-1

C.1 Introduction. What Is Pipelining? C-2 Appendix C Pipelining: Basic and Intermediate Concepts

Computer System. Agenda

Lecture 5: Instruction Pipelining. Pipeline hazards. Sequential execution of an N-stage task: N Task 2

第三章 Instruction-Level Parallelism and Its Dynamic Exploitation. 陈文智 浙江大学计算机学院 2014 年 10 月

ECE260: Fundamentals of Computer Engineering

Appendix C. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

COSC4201 Pipelining. Prof. Mokhtar Aboelaze York University

Lecture: Static ILP. Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2)

Page 1. Pipelining: Its Natural! Chapter 3. Pipelining. Pipelined Laundry Start work ASAP. Sequential Laundry A B C D. 6 PM Midnight

Computer System. Hiroaki Kobayashi 6/16/2010. Ver /16/2010 Computer Science 1

CS422 Computer Architecture

Pipelining. Pipeline performance

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Pipeline Architecture RISC

Overview. Appendix A. Pipelining: Its Natural! Sequential Laundry 6 PM Midnight. Pipelined Laundry: Start work ASAP

CIS 662: Midterm. 16 cycles, 6 stalls

CS / ECE 6810 Midterm Exam - Oct 21st 2008

Appendix A. Overview

Basic Pipelining Concepts

Lecture 8: Instruction Fetch, ILP Limits. Today: advanced branch prediction, limits of ILP (Sections , )

Pipelining: Hazards Ver. Jan 14, 2014

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Chapter 4. The Processor

ECE 505 Computer Architecture

Module 4c: Pipelining

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation

Processor (II) - pipelining. Hwansoo Han

MIPS ISA AND PIPELINING OVERVIEW Appendix A and C

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Appendix C: Pipelining: Basic and Intermediate Concepts

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Lecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections )

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Hardware-based Speculation

1 Hazards COMP2611 Fall 2015 Pipelined Processor

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Full Datapath. Chapter 4 The Processor 2

Advanced Computer Architecture

Exercises (III) Output dependencies: Instruction 6 has output dependence with instruction 1 for access to R1

This course provides an overview of the SH-2 32-bit RISC CPU core used in the popular SH-2 series microcontrollers

Pipelining. Each step does a small fraction of the job All steps ideally operate concurrently

Lecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ

Pipelining: Overview. CPSC 252 Computer Organization Ellen Walker, Hiram College

Lecture 7: Static ILP, Branch prediction. Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections )

Computer Architecture CS372 Exam 3

Ti Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr

ECE 486/586. Computer Architecture. Lecture # 12

Lecture 1 An Overview of High-Performance Computer Architecture. Automobile Factory (note: non-animated version)

Updated Exercises by Diana Franklin

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

Lecture 16: Core Design. Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue

Course on Advanced Computer Architectures

Lecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S

Computer Architecture

CSEE 3827: Fundamentals of Computer Systems

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

MIPS pipelined architecture

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

Readings. H+P Appendix A, Chapter 2.3 This will be partly review for those who took ECE 152

Lecture 15: Pipelining. Spring 2018 Jason Tang

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

ECE 4750 Computer Architecture, Fall 2017 T05 Integrating Processors and Memories

Transcription:

Lecture 5: Pipelining Basics Biggest contributors to performance: clock speed, parallelism Today: basic pipelining implementation (Sections A.1-A.3) 1

The Assembly Line Unpipelined Start and finish a job before moving to the next Jobs Time A B C A B C Break the job into smaller stages A B C Pipelined A B C 2

Performance Improvements? Does it take longer to finish each individual job? Does it take shorter to finish a series of jobs? What assumptions were made while answering these questions? Is a 10-stage pipeline better than a 5-stage pipeline? 3

Quantitative Effects As a result of pipelining: Time in ns per instruction goes up Number of cycles per instruction goes up (note the increase in clock speed) Total execution time goes down, resulting in lower time per instruction Average cycles per instruction increases slightly Under ideal conditions, speedup = ratio of elapsed times between successive instruction completions = number of pipeline stages = increase in clock speed 4

A 5-Stage Pipeline 5

A 5-Stage Pipeline Use the PC to access the I-cache and increment PC by 4 6

A 5-Stage Pipeline Read registers, compare registers, compute branch target; for now, assume branches take 2 cyc (there is enough work that branches can easily take more) 7

A 5-Stage Pipeline ALU computation, effective address computation for load/store 8

A 5-Stage Pipeline Memory access to/from data cache, stores finish in 4 cycles 9

A 5-Stage Pipeline Write result of ALU computation or load into register file 10

Conflicts/Problems I-cache and D-cache are accessed in the same cycle it helps to implement them separately Registers are read and written in the same cycle easy to deal with if register read/write time equals cycle time/2 (else, use bypassing) Branch target changes only at the end of the second stage -- what do you do in the meantime? Data between stages get latched into registers (overhead that increases latency per instruction) 11

Hazards Structural hazards: different instructions in different stages (or the same stage) conflicting for the same resource Data hazards: an instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction Control hazard: fetch cannot continue because it does not know the outcome of an earlier branch special case of a data hazard separate category because they are treated in different ways 12

Structural Hazards Example: a unified instruction and data cache stage 4 (MEM) and stage 1 (IF) can never coincide The later instruction and all its successors are delayed until a cycle is found when the resource is free these are pipeline bubbles Structural hazards are easy to eliminate increase the number of resources (for example, implement a separate instruction and data cache) 13

Data Hazards Example: DADD R1, R2, R3 DSUB R4, R1, R5 AND R6, R1, R7 OR R8, R1, R9 XOR R10, R1, R11 14

Bypassing Some data hazard stalls can be eliminated: bypassing 15

Bypassing Example 2 DADD R1, R2, R3 LD R4, 0(R1) SD R4, 12(R1) 16

Data Hazard Stalls Not all data hazards can be eliminated! Example: LD R1, 0(R2) DSUB R4, R1, R5 AND R6, R1, R7 OR R8, R1, R9 Pipeline interlock: hardware that detects this condition and stalls the instruction 17

Control Hazards Simple techniques to handle control hazard stalls: for every branch, introduce a stall cycle (note: every 6 th instruction is a branch!) assume the branch is not taken and start fetching the next instruction if the branch is taken, need hardware to cancel the effect of the wrong-path instruction fetch the next instruction (branch delay slot) and execute it anyway if the instruction turns out to be on the correct path, useful work was done if the instruction turns out to be on the wrong path, hopefully program state is not lost 18

Branch Delay Slots Any ideas on how to deal with data hazards? 19

Slowdowns from Stalls Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num instructions) speedup = increase in clock speed = num pipeline stages With hazards and stalls, some cycles (= stall time) go by during which no instruction completes, and then the stalled instruction completes Total cycles = number of instructions + stall cycles Slowdown because of stalls = 1/ (1 + stall cycles per instr) 20

Pipeline Implementation Signals for the muxes have to be generated some of this can happen during ID Need look-up tables to identify situations that merit bypassing/stalling the number of inputs to the muxes goes up 21

Detecting Control Signals Situation No dependence Dependence requiring stall Dependence overcome by forwarding Example code LD R1, 45(R2) DADD R5, R6, R7 DSUB R8, R6, R7 OR R9, R6, R7 LD R1, 45(R2) DADD R5, R1, R7 DSUB R8, R6, R7 OR R9, R6, R7 LD R1, 45(R2) DADD R5, R6, R7 DSUB R8, R1, R7 OR R9, R6, R7 Action No hazards Detect use of R1 during ID of DADD and stall Detect use of R1 during ID of DSUB and set mux control signal that accepts result from bypass path Dependence with accesses in order LD R1, 45(R2) DADD R5, R6, R7 DSUB R8, R6, R7 OR R9, R1, R7 No action required 22

Summary Basic 5-stage pipeline Structural and data hazards: bypassing, stalling Control hazards: branch delay slots Next class: difficulties in pipelining, long latency operations, scoreboarding 23

Title Bullet 24