Wrong Path Events and Their Application to Early Misprediction Detection and Recovery

Similar documents
Understanding The Effects of Wrong-path Memory References on Processor Performance

An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors

Techniques for Efficient Processing in Runahead Execution Engines

Performance-Aware Speculation Control Using Wrong Path Usefulness Prediction. Chang Joo Lee Hyesoon Kim Onur Mutlu Yale N. Patt

Wish Branch: A New Control Flow Instruction Combining Conditional Branching and Predicated Execution

Branch statistics. 66% forward (i.e., slightly over 50% of total branches). Most often Not Taken 33% backward. Almost all Taken

Computer Architecture: Branch Prediction. Prof. Onur Mutlu Carnegie Mellon University

Execution-based Prediction Using Speculative Slices

Performance-Aware Speculation Control using Wrong Path Usefulness Prediction

November 7, 2014 Prediction

18-447: Computer Architecture Lecture 23: Tolerating Memory Latency II. Prof. Onur Mutlu Carnegie Mellon University Spring 2012, 4/18/2012

Instruction Level Parallelism (Branch Prediction)

Looking for Instruction Level Parallelism (ILP) Branch Prediction. Branch Prediction. Importance of Branch Prediction

Branch Prediction & Speculative Execution. Branch Penalties in Modern Pipelines

Looking for Instruction Level Parallelism (ILP) Branch Prediction. Branch Prediction. Importance of Branch Prediction

15-740/ Computer Architecture Lecture 5: Precise Exceptions. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 10: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/3/2011

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)

15-740/ Computer Architecture Lecture 29: Control Flow II. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 11/30/11

15-740/ Computer Architecture Lecture 28: Prefetching III and Control Flow. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 11/28/11

High Performance Systems Group Department of Electrical and Computer Engineering The University of Texas at Austin Austin, Texas

15-740/ Computer Architecture Lecture 16: Prefetching Wrap-up. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 14: Runahead Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/12/2011

Lecture 12 Branch Prediction and Advanced Out-of-Order Superscalars

Wide Instruction Fetch

15-740/ Computer Architecture Lecture 10: Runahead and MLP. Prof. Onur Mutlu Carnegie Mellon University

Design of Digital Circuits Lecture 18: Branch Prediction. Prof. Onur Mutlu ETH Zurich Spring May 2018

Reduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction

Dynamic Memory Dependence Predication

Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

Use-Based Register Caching with Decoupled Indexing

Lecture 8: Instruction Fetch, ILP Limits. Today: advanced branch prediction, limits of ILP (Sections , )

Address-Value Delta (AVD) Prediction: A Hardware Technique for Efficiently Parallelizing Dependent Cache Misses. Onur Mutlu Hyesoon Kim Yale N.

Computer Architecture Lecture 15: Load/Store Handling and Data Flow. Prof. Onur Mutlu Carnegie Mellon University Spring 2014, 2/21/2014

Fall 2011 Prof. Hyesoon Kim

Architectures for Instruction-Level Parallelism

Computer Architecture Lecture 14: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013

Computer Architecture: Multithreading (III) Prof. Onur Mutlu Carnegie Mellon University

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation

Computer Architecture Lecture 13: State Maintenance and Recovery. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013

A Mechanism for Verifying Data Speculation

1993. (BP-2) (BP-5, BP-10) (BP-6, BP-10) (BP-7, BP-10) YAGS (BP-10) EECC722

An Efficient Indirect Branch Predictor

Lecture 13: Branch Prediction

COSC 6385 Computer Architecture Dynamic Branch Prediction

Lecture 7: Static ILP and branch prediction. Topics: static speculation and branch prediction (Appendix G, Section 2.3)

Eric Rotenberg Karthik Sundaramoorthy, Zach Purser

Static Branch Prediction

Performance of tournament predictors In the last lecture, we saw the design of the tournament predictor used by the Alpha

Chapter 3 (CONT) Instructor: Josep Torrellas CS433. Copyright J. Torrellas 1999,2001,2002,2007,2013 1

Computer Architecture, Fall 2010 Midterm Exam I

For this problem, consider the following architecture specifications: Functional Unit Type Cycles in EX Number of Functional Units

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

Computer Architecture: Out-of-Order Execution II. Prof. Onur Mutlu Carnegie Mellon University

Bottleneck Identification and Scheduling in Multithreaded Applications. José A. Joao M. Aater Suleman Onur Mutlu Yale N. Patt

Chapter 4. The Processor

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars

Static Branch Prediction

Lecture 7: Static ILP, Branch prediction. Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections )

Computer Architecture Area Fall 2009 PhD Qualifier Exam October 20 th 2008

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Instruction Fetching based on Branch Predictor Confidence. Kai Da Zhao Younggyun Cho Han Lin 2014 Fall, CS752 Final Project

Outline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need??

CS252 Spring 2017 Graduate Computer Architecture. Lecture 8: Advanced Out-of-Order Superscalar Designs Part II

15-740/ Computer Architecture

EE382A Lecture 5: Branch Prediction. Department of Electrical Engineering Stanford University

2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set

15-740/ Computer Architecture Lecture 8: Issues in Out-of-order Execution. Prof. Onur Mutlu Carnegie Mellon University

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

Old formulation of branch paths w/o prediction. bne $2,$3,foo subu $3,.. ld $2,..

Microarchitecture-Based Introspection: A Technique for Transient-Fault Tolerance in Microprocessors. Moinuddin K. Qureshi Onur Mutlu Yale N.

HANDLING MEMORY OPS. Dynamically Scheduling Memory Ops. Loads and Stores. Loads and Stores. Loads and Stores. Memory Forwarding

DIVERGE-MERGE PROCESSOR: GENERALIZED AND ENERGY-EFFICIENT DYNAMIC PREDICATION

Instruction-Level Parallelism Dynamic Branch Prediction. Reducing Branch Penalties

Dynamic Branch Prediction

Hardware-Based Speculation

Full Datapath. Chapter 4 The Processor 2

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

ECE 571 Advanced Microprocessor-Based Design Lecture 7

Simultaneous Multithreading Processor

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Achieving Out-of-Order Performance with Almost In-Order Complexity

ECE 571 Advanced Microprocessor-Based Design Lecture 8

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation.

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

18-740/640 Computer Architecture Lecture 5: Advanced Branch Prediction. Prof. Onur Mutlu Carnegie Mellon University Fall 2015, 9/16/2015

Control Hazards. Prediction

EE482: Advanced Computer Organization Lecture #3 Processor Architecture Stanford University Monday, 8 May Branch Prediction

Appendix A.2 (pg. A-21 A-26), Section 4.2, Section 3.4. Performance of Branch Prediction Schemes

Bloom Filtering Cache Misses for Accurate Data Speculation and Prefetching

15-740/ Computer Architecture Lecture 22: Superscalar Processing (II) Prof. Onur Mutlu Carnegie Mellon University

Copyright 2012, Elsevier Inc. All rights reserved.

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

Exploitation of instruction level parallelism

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window

Towards a More Efficient Trace Cache

Transcription:

Wrong Path Events and Their Application to Early Misprediction Detection and Recovery David N. Armstrong Hyesoon Kim Onur Mutlu Yale N. Patt University of Texas at Austin

Motivation Branch predictors are not perfect. Branch misprediction penalty causes significant performance degradation. We would like to reduce the misprediction penalty by detecting the misprediction and initiating recovery before the mispredicted branch is resolved.

Branch Misprediction Resolution Latency

Performance Potential of Early Misprediction Recovery

An Observation and A Question In an out-of-order processor, some instructions are executed on the mispredicted path (wrong-path instructions). Is the behavior of wrong-path instructions different from the behavior of correct-path instructions? If so, we can use the difference in behavior for early misprediction detection and recovery.

Talk Outline Concept of Wrong Path Events Types of Wrong Path Events Experimental Evaluation Realistic Early Misprediction Recovery Shortcomings and Future Research Conclusions

What is a Wrong Path Event? An instance of illegal or unusual behavior that is more likely to occur on the wrong path than on the correct path. Wrong Path Event = WPE Probability (wrong path WPE) ~ 1

Why Does a WPE Occur? A wrong-path instruction may be executed before the mispredicted branch is executed. Because the mispredicted branch may be dependent on a long-latency instruction. The wrong-path instruction may consume a data value that is not properly initialized.

WPE Example from eon: NULL pointer dereference!"# $%%&&& '( )(

Beginning of the loop Array boundary i = 0 Array of pointers to structs x8abcd0 xeff8b0 x0 x0!"# $%%&&& '( )(

First iteration Array boundary i = 0 ptr = x8abcd0 Array of pointers to structs x8abcd0 xeff8b0 x0 x0!"# $%%&&& '( )(

First iteration Array boundary i = 0 ptr = x8abcd0 Array of pointers to structs x8abcd0 xeff8b0 x0 x0 * ptr!"# $%%&&& '( )(

Loop branch correctly predicted Array boundary i = 1 Array of pointers to structs x8abcd0 xeff8b0 x0 x0!"# $%%&&& '( )(

Second iteration Array boundary i = 1 ptr = xeff8b0 Array of pointers to structs x8abcd0 xeff8b0 x0 x0!"# $%%&&& '( )(

Second iteration Array boundary i = 1 ptr = xeff8b0 Array of pointers to structs x8abcd0 xeff8b0 x0 x0 * ptr!"# $%%&&& '( )(

Loop exit branch mispredicted Array boundary i = 2 Array of pointers to structs x8abcd0 xeff8b0 x0 x0!"# $%%&&& '( )(

Third iteration on wrong path Array boundary i = 2 ptr = 0 Array of pointers to structs x8abcd0 xeff8b0 x0 x0!"# $%%&&& '( )(

Wrong Path Event Array boundary i = 2 ptr = 0 Array of pointers to structs x8abcd0 xeff8b0 x0 x0 * ptr!"# $%%&&& '( )( NULL pointer dereference!

Types of WPEs Due to memory instructions NULL pointer dereference Write to read-only page Unaligned access (illegal in the Alpha ISA) Access to an address out of segment range Data access to code segment Multiple concurrent TLB misses

Types of WPEs (continued) Due to control-flow instructions Misprediction under misprediction If three branches are executed and resolved as mispredicts while there are older unresolved branches in the processor, it is almost certain that one of the older unresolved branches is mispredicted. Return address stack underflow Unaligned instruction fetch address (illegal in Alpha) Due to arithmetic instructions Some arithmetic exceptions e.g. Divide by zero

Experimental Evaluation

Two Empirical Questions 1. How often do WPEs occur? 2. When do WPEs occur on the wrong path?

Simulation Methodology Execution-driven Alpha ISA simulator Faithful modeling of wrong-path execution and wrong-path misprediction recoveries 8-wide out-of-order processor 256-entry instruction window Large and accurate branch predictors 64K-entry gshare and 64K-entry PAs hybrid 64K-entry target cache for indirect branches 64-entry return address stack Minimum 30-cycle branch misprediction penalty Minimum 500-cycle memory latency, 1 MB L2 cache SPEC 2000 integer benchmarks

Mispredictions Resulting in WPEs

Distribution of WPEs

When do WPEs Occur?

Early Misprediction Recovery Using WPEs

Early Misprediction Recovery Once a WPE is detected, there may be: no older unresolved branch in the window Do nothing (false alarm) one older unresolved branch in the window The processor initiates early recovery for that branch, guessing that it is mispredicted. multiple older unresolved branches in the window May be different instances of the same static branch. To initiate early misprediction recovery, we need a mechanism that decides which branch is the oldest mispredicted branch in the window.

A Realistic Recovery Mechanism The distance in the instruction window between the WPE-generating instruction and the oldest mispredicted branch is predictable. The first time a WPE is encountered, the processor records the distance between the WPE-generating instruction and the mispredicted branch. The next time the same instruction generates a WPE, the processor predicts that the branch that has the same distance to the WPE-generating instruction is mispredicted.

Ld C Generates a WPE Instr. SeqNo PC Br A1 10 PC A Br A2 20 PC A Ld C 40 PC C NULL pointer dereference

WPE is Recorded Instr. SeqNo PC Br A1 10 PC A Br A2 20 PC A WPE Record SeqNo PC Ld C 40 PC C 40 PC C

Br A1 Resolves as a Mispredict Instr. SeqNo PC Br A1 10 PC A Misprediction Br A2 20 PC A WPE Record SeqNo PC Ld C 40 PC C 40 PC C

Distance is Recorded in a Table Valid Distance PC C XOR 1 30 Global History

Ld C Again Generates a WPE Instr. SeqNo PC Br A1 25 PC A Br A2 35 PC A Ld C 55 PC C NULL pointer dereference

Distance Table is Accessed Valid Distance PC C Global History XOR 1 30

Br A1 is Predicted To Be a Mispredict Instr. SeqNo PC Br A1 25 PC A Br A2 35 PC A Distance = 30 Ld C 55 PC C

Three Issues in Realistic Recovery Recovery may be initiated for a correctly-predicted correct-path branch. Happens for 3.8% of WPE detections. A WPE may occur on the correct path! Happens rarely (0.05% of all WPEs). We see no recovery initiated in this case. Early recovery for indirect branches The processor needs to predict a new target address. Target addresses can also be recorded in the distance prediction table.

IPC Improvement

Shortcomings 1. WPEs do not occur frequently. Their coverage of mispredicted branches is low. 2. WPEs do not occur early enough. 3. Staying on the wrong path a few more cycles is sometimes more beneficial for performance than recovering early.

Future Research Directions Finding/generating new types of WPEs Can the compiler help? Better understanding of the differences between wrong path and correct path needed. Identifying what makes a particular wrong-path episode beneficial for performance Understanding the trade-offs related to the wrong path would be useful in deciding when to initiate early recovery. What else can WPEs be used for? Other kinds of speculation? Energy savings?

Conclusions Wrong-path instructions sometimes exhibit illegal and unusual behavior, called wrong path events (WPEs). WPEs can be used to guess that the processor is on the wrong path and to initiate early misprediction recovery. A realistic and accurate early misprediction recovery mechanism is described. Shortcomings of WPEs are discussed. Future research should focus on finding new WPEs.

Backup Slides

Related Work Glew proposed the use of bad memory addresses and illegal instructions as strong indicators of branch mispredictions [Wisc 00]. Jacobsen et al. explored branch confidence estimation to predict the likelihood that a branch is mispredicted [ISCA 96]. Manne et al. used branch confidence to gate the pipeline when there is a high likelihood that the processor is on the wrong path [ISCA 98]. Jiménez et al. proposed an overriding predictor scheme, where a more accurate slow predictor overrides the prediction made by a less accurate fast predictor [MICRO 00]. Falcón et al. proposed the use of predictions made for younger branches to re-evaluate the prediction made for an older branch [ISCA 04].

Contributions 1. A new idea in branch prediction. 2. We observe that wrong-path instructions exhibit illegal and unusual behavior, called wrong path events (WPEs). 3. We describe a novel mechanism to trigger early misprediction detection and recovery using WPEs. 4. We analyze and discuss the shortcomings of WPEs and propose new research areas to address these shortcomings.

Distribution of Cycles Between WPE and Branch Resolution

Distance Predictor Accuracy

WPE and Misprediction Rate