Lecture 15 Pipelining & ILP Instructor: Prof. Falsafi

Similar documents
EECS 470. Branches: Address prediction and recovery (And interrupt recovery too.) Lecture 7 Winter 2018

15-740/ Computer Architecture Lecture 5: Precise Exceptions. Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture Lecture 13: State Maintenance and Recovery. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013

Advanced issues in pipelining

EECS 470. Branches: Address prediction and recovery (And interrupt recovery too.) Lecture 6 Winter 2018

Wide Instruction Fetch

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti

CSE502: Computer Architecture CSE 502: Computer Architecture

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)

To read more. CS 6354: Homework 1 Related / Precise Interrupts. Homework 1. On paper reviews

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Two Dynamic Scheduling Methods

EECS 470 Lecture 7. Branches: Address prediction and recovery (And interrupt recovery too.)

Complex Pipelining: Out-of-order Execution & Register Renaming. Multiple Function Units


CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

EECS 470. Lecture 15. Prefetching. Fall 2018 Jon Beaumont. History Table. Correlating Prediction Table

Outline Review: Basic Pipeline Scheduling and Loop Unrolling Multiple Issue: Superscalar, VLIW. CPE 631 Session 19 Exploiting ILP with SW Approaches

The memory unit has three sets of reservation stations, not one:

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley

Reorder Buffer Implementation (Pentium Pro) Reorder Buffer Implementation (Pentium Pro)

Pipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

Pipelining to Superscalar

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation

Precise Exceptions and Out-of-Order Execution. Samira Khan

CS152 Computer Architecture and Engineering. Complex Pipelines

CS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming

Dynamic Scheduling. Chapter 2: Instruction Level Parallelism. Dynamic Scheduling. Dynamic Scheduling

Chapter. Out of order Execution

15-740/ Computer Architecture Lecture 10: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/3/2011

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

ECE 4750 Computer Architecture, Fall 2017 T10 Advanced Processors: Out-of-Order Execution

15-740/ Computer Architecture Lecture 8: Issues in Out-of-order Execution. Prof. Onur Mutlu Carnegie Mellon University

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 9 Instruction-Level Parallelism Part 2

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

POLITECNICO DI MILANO. Exception handling. Donatella Sciuto:

Chapter 3: Instruc0on Level Parallelism and Its Exploita0on

Lecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory

CSE502: Computer Architecture CSE 502: Computer Architecture

Hardware-based Speculation

E0-243: Computer Architecture

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

15-740/ Computer Architecture Lecture 23: Superscalar Processing (III) Prof. Onur Mutlu Carnegie Mellon University

Multiple Instruction Issue and Hardware Based Speculation

Computer Systems Architecture I. CSE 560M Lecture 5 Prof. Patrick Crowley

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars

Module 5: "MIPS R10000: A Case Study" Lecture 9: "MIPS R10000: A Case Study" MIPS R A case study in modern microarchitecture.

EECS 470 Lecture 6. Branches: Address prediction and recovery (And interrupt recovery too.)

ECE/CS 552: Pipeline Hazards

Computer Architecture Spring 2016

LSU EE 4720 Dynamic Scheduling Study Guide Fall David M. Koppelman. 1.1 Introduction. 1.2 Summary of Dynamic Scheduling Method 3

Lecture 16: Core Design. Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue

CS152 Computer Architecture and Engineering SOLUTIONS Complex Pipelines, Out-of-Order Execution, and Speculation Problem Set #3 Due March 12

Computer Architecture Lecture 14: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013

CPE 631 Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation

Lecture 9: Dynamic ILP. Topics: out-of-order processors (Sections )

Advanced Computer Architecture

University of Toronto Faculty of Applied Science and Engineering

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution

exam length Exam Review 1 exam focus exam format

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EECS 470. Lecture 14 Advanced Caches. DEC Alpha. Fall Jon Beaumont

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation

Eric Rotenberg Jim Smith North Carolina State University Department of Electrical and Computer Engineering

Chapter 3 (CONT II) Instructor: Josep Torrellas CS433. Copyright J. Torrellas 1999,2001,2002,2007,

15-740/ Computer Architecture Lecture 12: Issues in OoO Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/7/2011

Project #3: Dynamic Instruction Scheduling. Due: Wednesday, Dec. 4, 11:00 PM

Appendix C: Pipelining: Basic and Intermediate Concepts

Out of Order Processing

EECS 470. Lecture 18. Simultaneous Multithreading. Fall 2018 Jon Beaumont

Speculation and Future-Generation Computer Architecture

HARDWARE SPECULATION. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Chapter 06: Instruction Pipelining and Parallel Processing

Tutorial 11. Final Exam Review

Computer Science 146. Computer Architecture

ROB: head/tail. exercise: result of processing rest? 2. rename map (for next rename) log. phys. free list: X11, X3. PC log. reg prev.

Superscalar Organization

Pipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations

ECE/CS 552: Introduction to Superscalar Processors

Chapter 3. Pipelining. EE511 In-Cheol Park, KAIST

Instr. execution impl. view

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1

The Processor: Improving the performance - Control Hazards

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism

Portland State University ECE 587/687. Superscalar Issue Logic

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Multithreaded Processors. Department of Electrical Engineering Stanford University

The Nios II Family of Configurable Soft-core Processors

Lecture 7: Pipelining Contd. More pipelining complications: Interrupts and Exceptions

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Chapter 4 The Processor 1. Chapter 4D. The Processor

Handout 2 ILP: Part B

Computer Architecture EE 4720 Final Examination

Lecture 11: Out-of-order Processors. Topics: more ooo design details, timing, load-store queue

Processor: Superscalars Dynamic Scheduling

Topics. Digital Systems Architecture EECE EECE Predication, Prediction, and Speculation

Transcription:

18-547 Lecture 15 Pipelining & ILP Instructor: Prof. Falsafi Electrical & Computer Engineering Carnegie Mellon University Slides developed by Profs. Hill, Wood, Sohi, and Smith of University of Wisconsin Madison 1 Homework 4 posted => due Monday Project Status report next week 15-minute meetings during office hours Will extend discussion on a per-project basis Preliminary results done coding done selecting benchmarks plot numbers (distributions, performance, prediction coverage) 2 1

Imprecise interrupts - ignore the problem! makes page faults (any restartable exceptions) hard IEEE standard strongly suggests precise interrupts In-order completion - stall pipe if necessary Software clean-up - save info for trap handlers implementation dependent Hardware clean-up - restore to consistent state re-order instructions at writeback stage complete out-of-order, commit in-order 3 Jim Smith and Andy Pleszkun An instruction taking i cycles reserves stage i when it issues stalls if stage i is already taken (tries for i+1) Shift down each cycle Instruction in stage 1 gets result bus Result shift register (RSR) control result bus - at most one completion per cycle 4 2

Reserve preceding stages an instrn taking j stages reserves all available i<j stages prevents later (sequentially) instr from completing before it Must synchronize stores wait for RSR to empty put dummy store in RSR What if store causes exception? 5 Instructions complete out-of-order Re-order buffer reorganizes instructions Modify state (writeback) in-order Add tag to RSR to point to corresponding ROB entry 6 3

On dispatch allocate ROB entry and RSR entry make RSR point to ROB On completion put result in ROB entry pointed to by RSR tag mark ROB entry valid When head of ROB is valid, commit Hardware update register and exceptions for performance reasons need bypass paths from ROB implies a lot of associative compares (what is this?) 7 Instead of reorder buffer instructions complete out-of-order register file updated immediately hardware cleans-up on exception History buffer (HB) logs previous value of register used to undo instructions 8 4

On issue, copy current value of dest reg to HB On completion, mark HB entry valid On exception, mark exception bit When head of HB contains an entry marked valid reuse entry (increment head) 9 When an entry has exception Stall issue, wait for pipe to empty Roll back register file from tail to head PC at head is precise PC Hardware 3 read ports bypassing is slightly more complex (what?) 10 5

Separate register file Architectural file updated in-order Future file updated out-of-order Re-order buffer controls updates to architecture file 11 Future file is managed like an ordinary imprecise pipeline Architecture file managed like re-order buffer scheme when head entry valid On exception architecture file contains precise state Advantages state saving is easy no extra bypassing 12 6

Simulations using livermore loops Compare three options in-order completion reorder buffer reorder buffer with bypass same as future file and history file Examine both ways of handling stores blocking and non-blocking Relative performance: Time precise interrupts / Time imprecise interrupts 13 In-order completion 16%-23% increase in exec time ROB with 10 entries reduces it to 12%-18% ROB with 10 entries with bypasses reduces it to 3%!!! Must allow out-of-order execution 14 7

RSR is not the only way to control the result bus RSR will work only if timing is known apriori (caches?) Could have bus arbitration to decide who gets the bus Arbiter will be requested as and when needed and not apriori The key for precise state is reorder buffer (or the history buffer) complete out of order (overlap execution of instrs in buffer) commit in order consider interrupts of ONLY instruction at commit point if committing instr interrupts, squash all later instructions 15 8