Chapter 3. Pipelining. EE511 In-Cheol Park, KAIST


Terminology
  - Pipeline stage
  - Throughput
  - Pipeline register
Ideal speedup
  - Assume
      - The stages are perfectly balanced
      - No overhead on pipeline registers
  - Speedup = # of stages
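As a rough illustration of the ideal-speedup assumptions above, the following Python sketch computes steady-state speedup from per-stage delays. The function name `pipeline_speedup`, its parameters, and the example timings are hypothetical and do not come from the slides.

```python
def pipeline_speedup(unpipelined_time_ns, stage_times_ns, register_overhead_ns=0.0):
    """Steady-state speedup of a pipeline over the unpipelined machine.

    Assumes one instruction completes per cycle (no stalls).
    """
    # The clock period is set by the slowest stage plus pipeline-register overhead.
    cycle_time_ns = max(stage_times_ns) + register_overhead_ns
    return unpipelined_time_ns / cycle_time_ns


# Perfectly balanced stages, no register overhead: speedup equals the stage count.
ideal = pipeline_speedup(10.0, [2.0] * 5)

# Unbalanced stages plus register overhead: speedup falls below the stage count.
realistic = pipeline_speedup(10.0, [2.0, 2.0, 3.0, 2.0, 1.0], register_overhead_ns=0.5)

print(ideal, realistic)
```

With the two assumptions relaxed, the slowest stage and the register overhead set the clock, so the speedup is smaller than the number of stages.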

Stages
  - IF: Instruction fetch
  - ID: Instruction decode / Register fetch
  - EX: Execution / Effective address
  - MEM: Memory access / Branch completion
  - WB: Write-back
Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB

Hazards
  - Prevent the next instruction from executing during its designated clock cycle, causing a pipeline stall
  - Earlier instructions must continue, while later instructions are stalled
Classes
  - Structural hazards: resource conflicts
  - Data hazards: data dependencies
  - Control hazards: caused by instructions that change the PC, such as branches

Functional unit conflict
  - For example, an FU that is not fully pipelined
Memory access conflict
  - For example, MEM and IF
  - Solutions
      - Separate I$ and D$
      - Dual-port memory
      - On-chip I$
      - Instruction queue
Register file access conflict
  - For example, ID and WB
  - Solutions
      - Multi-port register file
      - Time-multiplexed R/W access

Data hazard classification
  - RAW (true data dependency)
      - Internal data forwarding
          - To reduce forwarding logic, the register file write is done before the read
      - Instruction scheduling
          - Rearranges the code sequence
      - Delayed load
          - Insert a NOP if no suitable instruction can be placed in the delay slot
  - WAR (anti-dependency), WAW (output dependency)
      - Register renaming
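The classification above can be sketched as a check on the registers two instructions read and write. This is a minimal Python sketch; the `Instr` type, the `classify_dependences` function, and the register names are hypothetical. It reports dependences only; whether a dependence becomes an actual hazard depends on the pipeline.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    dest: Optional[str]   # register written, or None (e.g. a store)
    srcs: frozenset       # registers read


def classify_dependences(first, second):
    """Return the dependences of `second` on `first` (first comes earlier in program order)."""
    found = set()
    if first.dest is not None and first.dest in second.srcs:
        found.add("RAW")  # second reads what first writes (true dependency)
    if second.dest is not None and second.dest in first.srcs:
        found.add("WAR")  # second overwrites what first reads (anti-dependency)
    if first.dest is not None and first.dest == second.dest:
        found.add("WAW")  # both write the same register (output dependency)
    return found


add = Instr(dest="r1", srcs=frozenset({"r2", "r3"}))   # r1 = r2 + r3
sub = Instr(dest="r4", srcs=frozenset({"r1", "r5"}))   # r4 = r1 - r5
mov = Instr(dest="r2", srcs=frozenset({"r6"}))         # r2 = r6
ld = Instr(dest="r1", srcs=frozenset({"r7"}))          # r1 = mem[r7]

print(classify_dependences(add, sub))
print(classify_dependences(add, mov))
print(classify_dependences(add, ld))
```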

The pipeline stalls until the new PC is available
  - The branch delay turns into a branch penalty
Solutions
  - Stall the pipeline when a branch instruction is found
  - Fill with NOPs
  - Rearrange the code sequence (delayed branch)
To reduce the branch penalty
  - Resolve branch instructions as early as possible
      - Target, taken/not-taken
  - Delayed branch / squashed branch / annulled branch
  - Branch prediction
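To see why resolving branches earlier matters, the branch penalty can be folded into an effective CPI. The Python sketch below uses the common first-order model CPI = base CPI + branch frequency x penalty; the function name and the numbers (20% branches, penalties of 3 and 1 cycles) are illustrative assumptions, not figures from the slides.

```python
def effective_cpi(base_cpi, branch_frequency, penalty_cycles, penalized_fraction=1.0):
    """First-order CPI model: every penalized branch adds `penalty_cycles` stall cycles."""
    return base_cpi + branch_frequency * penalized_fraction * penalty_cycles


# Stall on every branch, resolved late (3-cycle penalty) vs. early (1-cycle penalty).
cpi_late = effective_cpi(1.0, 0.20, 3)
cpi_early = effective_cpi(1.0, 0.20, 1)

# Predict-not-taken: only taken branches (assumed 60% here) pay the penalty.
cpi_predict_not_taken = effective_cpi(1.0, 0.20, 1, penalized_fraction=0.6)

print(cpi_late, cpi_early, cpi_predict_not_taken)
```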

Static prediction (at compile time)
  - Predict taken or predict not-taken for all branches as a whole
  - Predict on the basis of branch direction
      - Backward-going: taken
      - Forward-going: not taken
  - Profile-based prediction: a prediction for each individual branch instruction
      - Individual branch instructions are highly biased
      - Introduce a prediction bit in the instruction format
Dynamic prediction (at run time)
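The direction-based rule (backward taken, forward not taken) can be sketched in a few lines of Python. The function names, addresses, and the branch trace below are hypothetical; the trace models a loop-closing branch that is taken 9 times and then falls through once.

```python
def predict_taken(branch_pc, target_pc):
    """Direction-based static prediction: backward branches are predicted taken."""
    return target_pc < branch_pc


def prediction_accuracy(trace):
    """trace: list of (branch_pc, target_pc, actually_taken) tuples."""
    correct = sum(1 for pc, target, taken in trace if predict_taken(pc, target) == taken)
    return correct / len(trace)


# A loop-closing branch at 0x40 jumping back to 0x20: taken 9 times, then not taken.
loop_trace = [(0x40, 0x20, True)] * 9 + [(0x40, 0x20, False)]
loop_accuracy = prediction_accuracy(loop_trace)

print(loop_accuracy)
```

Loop-closing branches are the case this heuristic targets: the only misprediction in the trace is the final loop exit.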

Exception / Interrupt
  - Synchronous / asynchronous
  - User requested / coerced
  - User maskable / nonmaskable
  - Within / between instructions
  - Resume / terminate
Restartability
  - Supported by almost all machines
Precise exception
Restarting execution
  1. Force a trap instruction into the pipeline on the next IF
  2. Turn off all writes for the faulting instruction and all following instructions in the pipeline, but not for the preceding instructions
  3. Save the PC of the faulting instruction

Initiation interval (= repeat interval)
  - The number of cycles that must elapse between issuing two operations of a given type
Latency
  - The number of intervening cycles between an instruction that produces a result and an instruction that uses the result
  - (# of EX stages) - 1
Problems in longer-latency pipelines
  - Structural hazards
  - Multiple register writes
      - Stall the instruction before it issues
      - Stall a conflicting instruction when it tries to enter the MEM stage
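The "stall before it issues" option for multiple register writes can be sketched as a reservation table for the register write port that is shifted every cycle. This is a minimal Python sketch under the assumption of a single write port; the class name, method names, and latencies are hypothetical.

```python
class WritePortReservation:
    """Tracks which future cycles already have the single register write port reserved."""

    def __init__(self, max_latency):
        # slots[k] is True if some issued instruction will write k cycles from now.
        self.slots = [False] * (max_latency + 1)

    def try_issue(self, cycles_until_write):
        """Reserve the write port; return False if the instruction must stall instead."""
        if self.slots[cycles_until_write]:
            return False
        self.slots[cycles_until_write] = True
        return True

    def tick(self):
        """Advance one clock cycle: every reservation moves one cycle closer."""
        self.slots = self.slots[1:] + [False]


port = WritePortReservation(max_latency=7)
first_ok = port.try_issue(4)      # long operation, writes in 4 cycles
port.tick()
second_ok = port.try_issue(3)     # would write in the same cycle: must stall
port.tick()
retry_ok = port.try_issue(3)      # one cycle later the conflict is gone

print(first_ok, second_ok, retry_ok)
```

Checking at issue keeps all interlock logic in one place, at the cost of stalling an instruction that might not have conflicted by the time it reached the write stage.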

WAW hazards
  - Instructions no longer reach WB in order
  - Delay the issue of the later instruction
  - Stamp out the result of the earlier instruction so that it does not write its result
Out-of-order completion
  - Instructions can complete in an order different from the order in which they were issued
  - Leads to imprecise exceptions
RAW hazards are more frequent
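A small Python sketch of how out-of-order completion produces a WAW hazard: instructions issue one per cycle in program order, and each writes its result a fixed number of cycles after issue. The function name, instruction names, and latencies are hypothetical, and the sketch ignores any intervening reads of the register.

```python
def find_waw_violations(instrs):
    """instrs: list of (name, dest, cycles_until_write), issued one per cycle in order.

    Returns (earlier, later) pairs where the later instruction writes the same
    register no later than the earlier one, so the stale value would end up last.
    """
    violations = []
    for i, (name_i, dest_i, lat_i) in enumerate(instrs):
        write_i = i + lat_i                      # issue cycle + latency
        for j in range(i + 1, len(instrs)):
            name_j, dest_j, lat_j = instrs[j]
            write_j = j + lat_j
            if dest_i == dest_j and write_j <= write_i:
                violations.append((name_i, name_j))
    return violations


program = [
    ("MUL.D", "F0", 7),   # writes F0 at cycle 7
    ("ADD.D", "F2", 4),   # writes F2 at cycle 5
    ("L.D", "F0", 1),     # writes F0 at cycle 3, before MUL.D does
]
violations = find_waw_violations(program)

print(violations)
```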

Precise / Imprecise
Precise exceptions
  - Exceptions are checked at the WB stage
  - Hardware posts all exceptions in a status vector, which is carried along as the instruction goes down the pipeline
  - Once an exception indication is set in the status vector, all writes are turned off
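The status-vector scheme above can be sketched as follows. Each instruction carries a record of exceptions posted by earlier stages, and the record is examined only when the instruction reaches WB, which happens in program order. This is a simplified Python model; the function name, tuple layout, and example program are hypothetical.

```python
def run_to_writeback(program):
    """program: list of (name, dest, value, status_vector) in program order.

    status_vector maps a stage name to the exception posted in that stage
    (empty dict if the instruction raised nothing).
    """
    registers = {}
    writes_enabled = True
    faulting = None
    for name, dest, value, status_vector in program:   # WB order == program order
        if writes_enabled and status_vector:
            writes_enabled = False    # turn off this and all following writes
            faulting = name
        if writes_enabled:
            registers[dest] = value
    return registers, faulting


program = [
    ("ADD", "r1", 10, {}),
    ("LW", "r2", 20, {"MEM": "data page fault"}),
    # SUB's fetch fault happens earlier in time than LW's MEM fault,
    # but it is only examined when SUB would reach WB, after LW.
    ("SUB", "r3", 30, {"IF": "instruction page fault"}),
]
registers, faulting = run_to_writeback(program)

print(registers, faulting)
```

Because the check is deferred to WB, the exception reported is the one belonging to the earliest instruction in program order, not the one detected first in time.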

Ignore the problem and settle for imprecise exceptions
  - Two operating modes: fast but imprecise / slower but precise
Buffer the results of an instruction until all the instructions that were issued earlier are complete
  - History file / future file
  - Smith and Pleszkun, "Implementing precise interrupts in pipelined processors," IEEE Trans. Computers, 1988
Keep enough information so that the trap-handling routines can create a precise sequence for the exception
  - Hwu and Patt, "Checkpoint repair for out-of-order execution machines," ISCA 1987
Allow an instruction to issue only if it is certain that all the instructions before it will complete without causing an exception

Variable instruction lengths and running times can lead to imbalance among pipeline stages
  - The benefits sometimes justify the added complexity (for example, caches)
Sophisticated addressing modes can complicate pipeline control and make it difficult to keep the pipeline flowing
Writes into instruction space (self-modifying code) can cause trouble for pipelining
Implicitly set condition codes make it harder to determine when a branch has been decided and harder to schedule branch delays

Deeper integer pipeline: 8 stages
  IF IS RF EX DF DS TC WB
  - IF: First half of instruction fetch
  - IS: Second half of instruction fetch, completion of I$ access
  - DF: First half of D$ access
  - DS: Second half of data fetch, completion of D$ access
  - TC: Tag check, determines whether the D$ access hit
Two-cycle load delay
Three-cycle branch delay
  - Single-cycle delayed branch
  - Predict-not-taken for the remaining two cycles
  - If taken, two idle cycles
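The two-cycle load delay translates into stall cycles that depend on how far the first use sits from the load. A minimal Python sketch, assuming full forwarding and counting distance 1 as the instruction immediately after the load; the function name and default are illustrative.

```python
def load_use_stalls(distance, load_delay=2):
    """Stall cycles for the first consumer of a load.

    distance: position of the consumer after the load (1 = next instruction).
    load_delay: cycles before the loaded value can be forwarded (2 in this pipeline).
    """
    # Each independent instruction placed between load and use hides one delay cycle.
    return max(0, load_delay - (distance - 1))


stalls_by_distance = {d: load_use_stalls(d) for d in (1, 2, 3, 4)}

print(stalls_by_distance)
```

Scheduling two independent instructions between a load and its first use hides the delay completely.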

Pitfall: Unexpected execution sequences may cause unexpected hazards
Pitfall: Extensive pipelining can impact other aspects of a design, leading to overall worse cost/performance
Fallacy: Increasing the number of pipeline stages always increases performance
Pitfall: Evaluating a compile-time scheduler on the basis of unoptimized code
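The fallacy about pipeline depth can be illustrated with a toy model in Python. It assumes that the cycle time is the logic delay divided by the depth plus a fixed register overhead, and that hazard stalls per instruction grow linearly with depth. The function name, this stall model, and all the numbers are assumptions made for illustration only.

```python
def time_per_instruction(depth, logic_ns=10.0, register_overhead_ns=0.5, hazard_rate=0.1):
    """Average time per instruction (ns) under a toy deep-pipeline model."""
    cycle_time_ns = logic_ns / depth + register_overhead_ns
    # Assumed: a hazard costs (depth - 1) cycles and hits `hazard_rate` of instructions.
    cpi = 1.0 + hazard_rate * (depth - 1)
    return cycle_time_ns * cpi


depths = range(1, 41)
best_depth = min(depths, key=time_per_instruction)

print(best_depth, time_per_instruction(best_depth), time_per_instruction(40))
```

Under these assumptions the time per instruction improves up to a moderate depth and then gets worse, because the growing stall penalty and the fixed register overhead outweigh the shorter cycle.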