Speculative Versioning Cache: Unifying Speculation and Coherence

Similar documents
Speculative Versioning Cache

CS533: Speculative Parallelization (Thread-Level Speculation)

Complexity Analysis of A Cache Controller for Speculative Multithreading Chip Multiprocessors

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

Beyond ILP. Hemanth M Bharathan Balaji. Hemanth M & Bharathan Balaji

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

Optimizing Replication, Communication, and Capacity Allocation in CMPs

Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution

Shared Memory SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB

SPECULATIVE MULTITHREADED ARCHITECTURES

Multi-Version Caches for Multiscalar Processors. Manoj Franklin. Clemson University. 221-C Riggs Hall, Clemson, SC , USA

Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Complexity Analysis of Cache Mechanisms for Speculative Multithreading Chip Multiprocessors

Annotated Memory References: A Mechanism for Informed Cache Management

Multiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems

Computer Science 146. Computer Architecture

CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Symmetric Multiprocessors Part 1 (Chapter 5)

Data Speculation Support for a Chip Multiprocessor Lance Hammond, Mark Willey, and Kunle Olukotun

Portland State University ECE 588/688. IBM Power4 System Microarchitecture

Multithreading Processors and Static Optimization Review. Adapted from Bhuyan, Patterson, Eggers, probably others

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Outline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA

Computer Architecture

Lecture 3: Directory Protocol Implementations. Topics: coherence vs. msg-passing, corner cases in directory protocols

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars

EECS 570 Final Exam - SOLUTIONS Winter 2015

Multiple Instruction Issue and Hardware Based Speculation

Chapter 5. Multiprocessors and Thread-Level Parallelism

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence

Eric Rotenberg Jim Smith North Carolina State University Department of Electrical and Computer Engineering

Multiprocessors & Thread Level Parallelism

Task Selection for a Multiscalar Processor

Portland State University ECE 588/688. Cray-1 and Cray T3E

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013

Limitations of Scalar Pipelines

Thread-level Parallelism. Synchronization. Explicit multithreading Implicit multithreading Redundant multithreading Summary

Computer Architecture EE 4720 Final Examination

Speculative Multithreaded Processors

Computer Architecture

Execution-based Prediction Using Speculative Slices

CSE502: Computer Architecture CSE 502: Computer Architecture

Lecture 5: Directory Protocols. Topics: directory-based cache coherence implementations

Lecture 15 Pipelining & ILP Instructor: Prof. Falsafi

Lecture 16: Core Design. Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue

EE 660: Computer Architecture Superscalar Techniques

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

A Dynamic Approach to Improve the Accuracy of Data Speculation

Multiprocessors 1. Outline

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

A Chip-Multiprocessor Architecture with Speculative Multithreading

Memory Hierarchy in a Multiprocessor

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects

Limiting the Number of Dirty Cache Lines

Bus-Based Coherent Multiprocessors

ECE 485/585 Microprocessor System Design

Computer Science 146. Computer Architecture

Lecture 8: Snooping and Directory Protocols. Topics: split-transaction implementation details, directory implementations (memory- and cache-based)

Lecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ

Chapter 3 (CONT II) Instructor: Josep Torrellas CS433. Copyright J. Torrellas 1999,2001,2002,2007,

4 Chip Multiprocessors (I) Chip Multiprocessors (ACS MPhil) Robert Mullins

Modern CPU Architectures

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

CSC 631: High-Performance Computer Architecture

Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Supporting Speculative Multithreading on Simultaneous Multithreaded Processors

Cache Coherence. (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri

Page 1. SMP Review. Multiprocessors. Bus Based Coherence. Bus Based Coherence. Characteristics. Cache coherence. Cache coherence

COMP9242 Advanced OS. Copyright Notice. The Memory Wall. Caching. These slides are distributed under the Creative Commons Attribution 3.

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism

Computer Systems Architecture

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

Page 1. Cache Coherence

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

Supertask Successor. Predictor. Global Sequencer. Super PE 0 Super PE 1. Value. Predictor. (Optional) Interconnect ARB. Data Cache

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I

Handout 2 ILP: Part B

Token Coherence. Milo M. K. Martin Dissertation Defense

h Coherence Controllers

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

Cache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O

Lecture-13 (ROB and Multi-threading) CS422-Spring

Speculative Execution via Address Prediction and Data Prefetching

A Dynamic Multithreading Processor

Characterization of Silent Stores

Wide Instruction Fetch

Advanced Parallel Programming I

Chapter 5. Multiprocessors and Thread-Level Parallelism

1. Memory technology & Hierarchy

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

ECE/CS 552: Introduction to Superscalar Processors

CMP Support for Large and Dependent Speculative Threads

Handout 3 Multiprocessor and thread level parallelism

CHAPTER 15. Exploiting Load/Store Parallelism via Memory Dependence Prediction

EC 513 Computer Architecture

Getting CPI under 1: Outline

Transient-Fault Recovery Using Simultaneous Multithreading

Transcription:

Speculative Versioning Cache: Unifying Speculation and Coherence Sridhar Gopal T.N. Vijaykumar, Jim Smith, Guri Sohi Multiscalar Project Computer Sciences Department University of Wisconsin, Madison Electrical and Computer Engineering, Purdue University

Motivation Enable a single hardware platform to support... P P P P SV$ SV$ SV$ SV$ 1. SMP execution model - explicitly parallel programs Private L1 coherent caches for high performance 2. Hierarchical execution model - sequential programs Multiscalar, Trace Processors, Agassiz, Hydra, Stampede, WarpEngine Memory Renaming or Speculative Versioning required SVC = CACHE COHERENCE + SPECULATIVE VERSIONING Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 2

Outline Motivation Hierarchical Execution Model Hierarchical Execution Multiscalar Speculative Versioning Unifying Speculation and Coherence Speculative Versioning Cache (SVC) Address Resolution Buffer (ARB) Performance Evaluation Conclusions Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 3

Hierarchical Execution Model Extract Parallelism in Sequential Programs 1. Using Speculation 2. Using Multiple Processors Group instructions into tasks, traces or speculation regions Employ task level speculation Sequential Execution Hierarchical Execution A B A, B: Tasks Predict Predict Predict A B Speculatively execute A and B in parallel Dependences between A and B? Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 4

Hierarchical Execution: Multiscalar Higher Level: Tasks Incorrect control prediction: squash task state and re-execute Task completed: commit task state sequentially Lower Level: Instructions Register dependences: Hardware + Software Memory dependences: Hierarchical hardware Intra-task: Load Store Queue in each processor Inter-task: SVC or ARB Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 5

Outline Motivation Hierarchical Execution Model Speculative Versioning Problem: Guarantee sequential program semantics Example execution orders Key requirements of a solution Unifying Speculation and Coherence Speculative Versioning Cache (SVC) Address Resolution Buffer (ARB) Performance Evaluation Conclusions Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 6

Speculative Versioning: Problem Program Order Tasks execute in parallel T 0 T 1 T2 T 3 Time Locally ordered load/store streams Load/Store Order seen by memory system DIFFERENT Address: Accesses can be reordered SAME ADDRESS: WHICH ORDERS ARE ALLOWED? Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 7

Execution Order: Examples Program Order S L S L Ld/St to the same address T0 T1 T2 T 3 Time S L S L Execution Orders S L S L No action! S S L L Out of order loads okay!? S S L L Buffer green store, Supply correct value S L L S Blue load is incorrect!? S S L L Stores out of order, Write back in order CANNOT REORDER DEPENDENT LOADS AND STORES Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 8

Speculative Versioning: Solution Definition: Version of a location Each store creates a new version What should a solution provide? Buffer multiple versions and track order Supply the correct version for a load Detect incorrect loads Write back versions in order Definition: Version Ordering List (VOL) of a location Execution order of loads and stores DIRECTORY OF VERSION ORDERING LISTS Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 9

Speculative Versioning Unification Multiple copies of multiple speculative versions Execution order tracked using an ordered list Cache Coherence Multiple Reader Multiple Writer Protocol Multiple copies of a single version Sharers tracked using an unordered list Multiple Reader Single Writer Protocol SVC = CACHE COHERENCE + SPECULATIVE VERSIONING Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 10

Outline Motivation Hierarchical Execution Model Speculative Versioning Speculative Versioning Cache (SVC) Simple Cache Coherence Protocol Base SVC Protocol Advanced SVC Protocol Examples Address Resolution Buffer (ARB) Performance Evaluation Conclusions Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 11

Speculative Versioning Cache P P P P SV$ Snooping Bus Architected Storage Bus Arbiter Version Control Logic Extensions to SMP coherent cache Both speculative and committed data More state information VOL: a pointer in each line VCL: maintain VOL on bus requests Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 12

Simplified SMP Cache Coherence C BWr/- St/BWr BRd/Wb Ld/BRd St/BWr I D BWr/- I C D Ld St BRd BWr Wb Invalid Clean Dirty Load Store BusRead BusWrite Writeback TALK Cache block size = Load/Store granularity = One word PAPER Realistic linesize, Cast-outs Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 13

Base SVC Design: Processor Requests C St/BWr Ld/BRd Co/Wb D Sq/- St/BWr Co/- Sq/- I DL Co/Wb Sq/- Co: Commit Sq: Squash DL: Dirty after Load Load bit or DL: detect incorrect loads DL remembers use before definition Write back all dirty lines on commit Invalidate all clean lines on squash Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 14

BusRead: Supplying the Correct Version C BRd/ Fl D Fl: Flush Newer I DL BRd/ Fl Examples Program Order T 0 T 1 T 2 T3 T4 1. D D C I D 2. C I C I D VCL: Closest older version is Flushed Self Loops: Cannot write back speculative versions Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 15

BusWrite: Detecting Incorrect Loads C BWr/ In D Examples In: Invalidate Older I BWr/ In DL Program Order T 0 T 1 T 2 T 3 1. C I C C 2. C I C C T4 D DL VCL: Copies used by newer tasks Invalidated DL is also invalidated because of incorrect use Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 16

Base Design Problems and Solutions Write back dirty lines on commit New tasks begin after all write backs Bottleneck: Tasks commit sequentially Commit (C) bit Invalidate clean lines on commit/squash Every task begins with a cold cache stale (T) and Architectural (A) bits Reference Spreading Private caches: Multiple misses to the same data Snarfing Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 17

Outline Motivation Hierarchical Execution Model Speculative Versioning Unifying Speculation and Coherence Speculative Versioning Cache (SVC) Address Resolution Buffer (ARB) Previously proposed shared cache solution Performance Evaluation Conclusions Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 18

Shared Cache Solution: ARB P P P P ARB Interconnection Network Cache Next Level Memory Shared speculative ARB + Shared Cache - Every load and store incurs interconnect latency - On commits, speculative versions must be written back - Allocates space for non-existent versions Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 19

Performance Evaluation: SPEC95 6 5 ARB (3 cycle hit) ARB (2 cycle hit) ARB (1 cycle hit) SVC (1 cycle hit) IPC 4 3 2 1 4-way Multiscalar gcc perl mgrid fpppp compress vortex ijpeg apsi turb3d 64KB data cache with 8KB ARB 2-way ooo superscalar Ps TRADES-OFF HIT-RATE FOR HIT-LATENCY 4x16KB SVC Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 20

Speculative Versioning Cache Unifies speculative versioning and cache coherence Private cache solution for speculative versioning Hold both speculative and committed data in one cache Decouples write backs from commits Eliminates unnecessary write backs Trades-off hit-rate for hit-latency SVC = CACHE COHERENCE + SPECULATIVE VERSIONING Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 21

Decoupling Commit from Write backs CC Committed Clean CD Committed Dirty BRd/- BWr/ - Sq/- Sq/- I BRd/ Wb BWr/Wb C D Co/- Ld/BRd St/BWr Co/- Ld/BRd St/BWr CC CD Explicit pointers necessary to track order among versions VCL determines the CD line to be written back; all other CD lines are purged Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 22

Keeping Cache Warm: After Commits SC Stale Clean C Co/- SCC Stale Committed Clean B BRd, BWr B/ Stale B/ Stale SC Co/- CC Ld/- B/ Stale B/Stale Ld/BRd SCC Problem All C lines are invalidated when a task commits Solution stale bit: to distinguish between stale and clean copies CC LINES ARE CACHE HITS ON LOADS Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 23

Keeping Cache Warm: After Squashes AC Architectural Clean B BRd, BWr C B/ Arch B/ Arch AC Ld/BRd Ld/BRd BWr/- Sq/- BWr/- Problem A squashed task could fetch the same data multiple times Solution Architectural bit: to retain architectural versions after a squash I AC LINES ARE CACHE HITS ON LOADS Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 24

Base SVC Design: Examples Tag V S L Data V: Valid S: Store L: Load W..Z: Caches State Data 0..3: Tasks Program Order Task # 0 st 0, A 1 2 3 st 1, A st 3, A Program order Execution time order S 0 S 0 X/0 X/0 S 3 W/3 Y/1 S 3 W/3 Y/1 L 0 st 1, A 5 st 5, A 6 S 0 X/0 W/- Y/1 Z/- S 1 S 3 S 0 X/0 W/3 Y/1 L 0 S 1 Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 25

Adv. SVC Design I: Efficient Commits Tag V S L C Pointer Data V: Valid S: Store L: Load C: Commit Program Order Task # 0 st 0, A 1 2 3 st 1, A st 3, A S 3 CS 0 X/4 W/3 Y/5 CS 1 S 3 X/4 W/3 Y/5 L 1 5 st 5, A CS 0 6 S 3 X/4 W/3 Y/5 CS 1 st 5, A S 3 X/4 W/3 Y/5 S 5 Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 26

Multiple Committed Versions CS 0 CS 4 CS 3 X/20 W/23 Y/21 2 CS 3 X/20 W/23 Y/21 2 CS W 0 X/20 CS - 3 W/23 Y/21 CS X 3 2 - CS 4 X/20 W/23 Y/21 2 Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 27

Adv. SVC Design II: Stale Copies S Y 0 CS Y 0 X/0 X/4 W/3 Y/1 S Z 1 W/7 Y/5 CS Z 1 Z/6 1 CL 1 L - - S - 3 CS Y 0 CS Y 0 X/4 X/4 W/3 Y/5 CS Z 1 CS - 3 W/7 Y/5 CS Z 1 Z/6 L W 1 CL W 1 Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 28

Adv. SVC Design II: stale bit Tag V S L C T A Pointer Data V: Valid C: Commit S: Store T: stale L: Load A: Architectural ST Y 0 X/0 W/3 Y/1 L - 1 S Z 1 CST Y 0 X/4 W/7 Y/5 CS Z/6 CL - 1 Z 1 S - 3 CST Y 0 X/4 W/3 Y/5 LT W 1 CST Z 1 CS - 3 CST Y 0 X/4 W/7 Y/5 Z/6 CLT W 1 CST Z 1 Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 29

Adv. SVC Design II: Architectural Copies AST Y 0 X/0 W/3 Y/1 S - 1 AST Y 0 X/0 W/3 Y/1 L - 1 S Z 1 CSAT Y 0 X/4 W/3 Y/5 CS - 1 X/4 W/3 Y/5 AL 1 - CSAT Y 0 X/4 S - 3 W/3 Y/5 CST W 1 S - 3 X/4 W/3 Y/5 ALT W 1 Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 30

Adv. SVC Designs: Task Squashes S - 3 CST Y 0 X/4 W/3 Y/1 ST W 1 CST Y 0 X/- W/- Y/1 ST W 1 X/4 W/3 Y/1 S Z 1 L - 1 Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 31

False sharing SVC Design: Realistic Linesize Unnecessary invalidates similar to that for parallel programs - Worse, these invalidates lead to false squashes Versioning Block Similar to Sub-blocking (Transfer and Address Blocks) Definition: Granularity at which speculative versioning is provided Replicate only the L and S bits for every versioning block Sridhar Gopal, UW-Madison Speculative Versioning Cache February 3, 1998 32