CCICheck: Using µhb Graphs to Verify the Coherence-Consistency Interface
1 CCICheck: Using µhb Graphs to Verify the Coherence-Consistency Interface Yatin A. Manerkar, Daniel Lustig, Michael Pellauer*, and Margaret Martonosi Princeton University *NVIDIA MICRO-48 1
2 Coherence and Consistency At a high level: Coherence Protocols: Propagation of writes to other cores Consistency Models: Ordering rules for visibility of reads and writes 2
3 Coherence and Consistency Coherence Verifiers Consistency Verifiers Arch. Level 3
4 Coherence and Consistency Coherence Verifiers Consistency Verifiers Arch. Level Coherence and consistency often interwoven µarch. Level 4
5 Coherence and Consistency Coherence Verifiers Ignore consistency even when protocol affects consistency! Consistency Verifiers Assume abstract coherence instead of protocol in use! Arch. Level Coherence and consistency often interwoven µarch. Level 5
6 Coherence and Consistency Coherence Verifiers Ignore consistency even when protocol affects consistency! Consistency Verifiers Assume abstract coherence instead of protocol in use! CCI Arch. Level Coherence and consistency often interwoven µarch. Level 6
7 Motivating Example Peekaboo 7
8 Motivating Example Peekaboo 1. Invalidation before use Repeated inv before use => livelock [Kubiatowicz et al. ASPLOS 1992] 8
9 Motivating Example Peekaboo 1. Invalidation before use Repeated inv before use => livelock [Kubiatowicz et al. ASPLOS 1992] 2. Livelock avoidance: allow destination core to perform one operation on data when it arrives, even if already invalidated [Sorin et al. Primer] Does not break coherence Sometimes intentionally returns stale data 9
10 Motivating Example Peekaboo 1. Invalidation before use Repeated inv before use => livelock [Kubiatowicz et al. ASPLOS 1992] 2. Livelock avoidance: allow destination core to perform one operation on data when it arrives, even if already invalidated [Sorin et al. Primer] Does not break coherence Sometimes intentionally returns stale data 3. Prefetching 10
11 Motivating Example Peekaboo 1. Invalidation before use Repeated inv before use => livelock [Kubiatowicz et al. ASPLOS 1992] 2. Livelock avoidance: allow destination core to perform one operation on data when it arrives, even if already invalidated [Sorin et al. Primer] Does not break coherence Sometimes intentionally returns stale data 3. Prefetching Individual Opt. => No violation Combination of Opts. => Violation! 11
12 Motivating Example Peekaboo Consider mp with the livelock-avoidance mechanism: Core 0 Core 1 x: Shared y: Modified x: Invalid y: Invalid [x] 1 [y] 1 r1 [y] r2 [x] 12
13 Motivating Example Peekaboo Consider mp with the livelock-avoidance mechanism: Core 0 Prefetch x Core 1 x: Shared y: Modified [x] 1 [y] 1 x: Invalid y: Invalid r1 [y] r2 [x] 13
14 Motivating Example Peekaboo Consider mp with the livelock-avoidance mechanism: Core 0 Prefetch x Core 1 Data (x = 0) x: Shared y: Modified [x] 1 [y] 1 x: Invalid y: Invalid r1 [y] r2 [x] 14
15 Motivating Example Peekaboo Consider mp with the livelock-avoidance mechanism: Core 0 Prefetch x Core 1 Data (x = 0) x: Shared y: Modified [x] 1 [y] 1 Inv x: Invalid y: Invalid r1 [y] r2 [x] 15
16 Motivating Example Peekaboo Consider mp with the livelock-avoidance mechanism: Core 0 Prefetch x Core 1 Data (x = 0) x: Shared y: Modified [x] 1 [y] 1 Inv Inv-Ack x: Invalid y: Invalid r1 [y] r2 [x] 16
17 Motivating Example Peekaboo Consider mp with the livelock-avoidance mechanism: Core 0 Prefetch x Core 1 Data (x = 0) x: Modified y: Modified [x] 1 [y] 1 Inv Inv-Ack x: Invalid y: Invalid r1 [y] r2 [x] 17
19 Motivating Example Peekaboo Consider mp with the livelock-avoidance mechanism: Core 0 Prefetch x Core 1 Data (x = 0) x: Modified y: Modified [x] 1 [y] 1 Inv Inv-Ack Request y x: Invalid y: Invalid r1 [y] r2 [x] 19
20 Motivating Example Peekaboo Consider mp with the livelock-avoidance mechanism: Core 0 Prefetch x Core 1 Data (x = 0) x: Modified y: Shared [x] 1 [y] 1 Inv Inv-Ack Request y Data (y = 1) x: Invalid y: Shared r1 = 1 r2 [x] 20
22 Motivating Example Peekaboo Consider mp with the livelock-avoidance mechanism: Core 0 Prefetch x Core 1 Data (x = 0) x: Modified y: Shared [x] 1 [y] 1 Inv Inv-Ack Request y Data (y = 1) x: Invalid y: Shared r1 = 1 r2 = 0 22
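The walk-through above can be replayed with a toy model. The following is a minimal sketch, not the protocol from the talk: `Core1Cache`, `run_mp`, the flat `memory` dictionary, and the point at which the prefetched data arrives are all assumptions made for illustration. It only shows how prefetching plus the "one operation on arrival" rule can yield r1 = 1, r2 = 0 on mp.

```python
from dataclasses import dataclass, field


@dataclass
class Core1Cache:
    """Hypothetical model of Core 1's private cache (illustration only)."""
    peekaboo: bool                                  # livelock avoidance enabled?
    lines: dict = field(default_factory=dict)       # address -> valid cached value
    one_shot: dict = field(default_factory=dict)    # address -> value usable once

    def data_arrives(self, addr, value, already_invalidated):
        if not already_invalidated:
            self.lines[addr] = value
        elif self.peekaboo:
            # The line was invalidated while the data was in flight, but the
            # mechanism still lets one operation use the arriving data.
            self.one_shot[addr] = value

    def load(self, addr, memory):
        if addr in self.one_shot:
            return self.one_shot.pop(addr)          # single use of stale data
        if addr not in self.lines:
            self.lines[addr] = memory[addr]         # demand miss: current value
        return self.lines[addr]


def run_mp(peekaboo):
    """Replay the slide sequence; returns (r1, r2) observed by Core 1."""
    memory = {"x": 0, "y": 0}                       # assumed flat backing store
    core1 = Core1Cache(peekaboo=peekaboo)
    prefetched_x = memory["x"]                      # Prefetch x, Data (x = 0) in flight
    # Inv / Inv-Ack: Core 0 gains write permission before the data is used.
    core1.data_arrives("x", prefetched_x, already_invalidated=True)
    memory["x"] = 1                                 # Core 0: [x] <- 1
    memory["y"] = 1                                 # Core 0: [y] <- 1
    r1 = core1.load("y", memory)                    # Core 1: r1 <- [y]
    r2 = core1.load("x", memory)                    # Core 1: r2 <- [x]
    return r1, r2
```

In this sketch, `run_mp(True)` reproduces the outcome on the last slide of the sequence, while `run_mp(False)` drops the invalidated prefetch and the second load misses and fetches the new value.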
23 The Coherence-Consistency Interface (CCI) SWMR, DVI, No Stale Data + Expected Coherence = Consistency CCI = guarantees that coherence protocol provides to rest of microarchitecture + memory ordering guarantees that rest of microarch. expects from coherence protocol 23
28 The Coherence-Consistency Interface (CCI) SWMR, DVI, No Livelock + Expected Coherence = Consistency CCI = guarantees that coherence protocol provides to rest of microarchitecture + memory ordering guarantees that rest of microarch. expects from coherence protocol 28
30 The Coherence-Consistency Interface (CCI) SWMR, DVI, No Livelock + Expected Coherence = CCI Mismatch Consistency Violation! CCI = guarantees that coherence protocol provides to rest of microarchitecture + memory ordering guarantees that rest of microarch. expects from coherence protocol 30
31 Our Work: CCICheck Static CCI-aware consistency verification [Figure: two-core pipeline with Fetch, Dec., Exec., Mem., and WB stages, store buffers (SB), loads (Lds.), per-core L1 caches, and a shared L2] Coherence Orderings (SWMR, DVI, etc.) Microarch spec Litmus Test 31
32 Our Work: CCICheck Static CCI-aware consistency verification [Figure: two-core pipeline with Fetch, Dec., Exec., Mem., and WB stages, store buffers (SB), loads (Lds.), per-core L1 caches, and a shared L2] Coherence Orderings (SWMR, DVI, etc.) Microarch spec Litmus Test Microarchitectural happens-before (µhb) graph 32
33 Background: PipeCheck Litmus Test mp Exhaustive enumeration of executions using µhb graphs Cyclic graph forbidden by µarch Acyclic graph allowed by µarch [Lustig et al. MICRO-47] 33
35 Background: PipeCheck Litmus Test mp Exhaustive enumeration of executions using µhb graphs Cyclic graph forbidden by µarch Acyclic graph allowed by µarch Prior techniques cannot model CCI events! [Lustig et al. MICRO-47] 35
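The cyclic/acyclic rule can be illustrated with a plain depth-first search. This is a sketch under simplifying assumptions: real µhb graph nodes are (instruction, pipeline location) pairs, whereas the hypothetical example below uses one node per instruction, and `has_cycle` is an illustrative helper rather than PipeCheck's code.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2            # unvisited, on DFS stack, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for succ in graph[node]:
            if colour[succ] == GREY:        # back edge: a cycle exists
                return True
            if colour[succ] == WHITE and visit(succ):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


# Hypothetical, simplified happens-before edges for mp
# (i1: [x] <- 1, i2: [y] <- 1, i3: r1 <- [y], i4: r2 <- [x]).
forbidden_outcome = [("i1", "i2"), ("i2", "i3"), ("i3", "i4"), ("i4", "i1")]
allowed_outcome = [("i1", "i2"), ("i2", "i3"), ("i3", "i4"), ("i1", "i4")]
```

Here `forbidden_outcome` encodes r1 = 1, r2 = 0 (the load of x happens before the store to x) and is cyclic, while `allowed_outcome` encodes r2 = 1 and is acyclic.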
36 Modelling CCI Events Need to model per-cache occupancy Lazy coherence and partial incoherence (e.g. GPUs) Need to model coherence transitions that relate to consistency (e.g. Peekaboo) 36
38 ViCL: Value in Cache Lifetime 4-tuple: (cache_id, address, data_value, generation_id) cache_id and generation_id uniquely identify each cache line A ViCL 4-tuple maps on to the period of time over which the cache line serves the data value for the address ViCLs start at a ViCL Create event and end at a ViCL Expire event 38
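One way to picture the 4-tuple is as an immutable record with two associated graph nodes. This is a minimal sketch: the field names follow the slide, but the class, its methods, and the node encoding are assumptions for illustration and not CCICheck's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViCL:
    """Value in Cache Lifetime: (cache_id, address, data_value, generation_id)."""
    cache_id: str
    address: str
    data_value: int
    generation_id: int

    def create_node(self):
        # Hypothetical encoding of the event that starts the lifetime.
        return ("ViCL Create", self)

    def expire_node(self):
        # Hypothetical encoding of the event that ends the lifetime.
        return ("ViCL Expire", self)

    def lifetime_edge(self):
        # A ViCL is created before it expires.
        return (self.create_node(), self.expire_node())


# Two lifetimes of the same address in the same cache differ by generation.
first = ViCL("core1.L1", "x", 0, generation_id=0)
second = ViCL("core1.L1", "x", 1, generation_id=1)
```

Because the record is frozen, it is hashable and can be used directly as part of a node key in a graph of (src, dst) edges.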
42 ViCL: Value in Cache Lifetime Conventional co-mp timeline (M = Modified, S = Shared) Litmus Test co-mp 42
48 ViCL: Value in Cache Lifetime Now with ViCLs Litmus Test co-mp 48
52 ViCL: Value in Cache Lifetime Can model requests, downgrades, etc. Litmus Test co-mp 52
54 ViCLs in µhb Graphs Litmus Test co-mp Use pipeline model from PipeCheck, but add ViCL nodes and edges 54
60 CCICheck Toolflow CCICheck µarch specification 1. Instruction Paths 2. Per-Stage Orderings 3. Constraints for Instr. Paths Litmus Tests Path Enum. Constraint Satisfaction Pruning (Cycle Checking) Compare CCICheck Pass/ Fail 60
65 Path Enumeration Constraint Satisfaction 65
69 Path Enumeration Constraint Satisfaction Unsatisfiable Constraint => Invalid Scenario 69
73 Path Enumeration Constraint Satisfaction Cyclic Graph => Prune Unsatisfiable Constraint => Invalid Scenario 73
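The toolflow stages (path enumeration, constraint satisfaction, cycle-based pruning, compare) can be sketched as one loop. Everything here is a hypothetical encoding: `check_litmus`, its callback parameters, and the pass/fail rule (fail only when an outcome expected to be forbidden has an acyclic scenario) are assumptions, not the tool's interface.

```python
from itertools import product


def is_cyclic(edges):
    """Kahn-style check: the graph is cyclic if some node can never be removed."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for src, dst in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return removed != len(nodes)


def check_litmus(choices, satisfiable, build_edges, expected_forbidden):
    """choices maps each instruction to its candidate paths (assumed encoding)."""
    instrs = sorted(choices)
    observable = False
    for combo in product(*(choices[i] for i in instrs)):     # path enumeration
        scenario = dict(zip(instrs, combo))
        if not satisfiable(scenario):
            continue                  # unsatisfiable constraint: invalid scenario
        if is_cyclic(build_edges(scenario)):
            continue                  # cyclic graph: prune
        observable = True             # an acyclic graph means the outcome can occur
        break
    # Compare against the expectation from the consistency model.
    return "Fail" if expected_forbidden and observable else "Pass"


# Toy usage with invented edges: the load i4 either reads stale or fresh data,
# and the tested outcome requires the stale value.
base = [("i1", "i2"), ("i2", "i3"), ("i3", "i4")]
choices = {"i4": ["stale", "fresh"]}
needs_stale = lambda scenario: scenario["i4"] == "stale"
ordered_design = lambda scenario: base + [("i4", "i1")]   # extra edge closes a cycle
unordered_design = lambda scenario: base                  # no such edge
```

In this toy setup the design that adds the closing edge passes, and the design without it fails because the forbidden outcome has an acyclic graph.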
74 Case Studies and Results 74
75 Peekaboo Livelock prevention mechanism allows use of stale data Peekaboo edge completes cycle => outcome forbidden Consistency maintained 75
80 Partial Incoherence: GPUs e.g.: mp with membar fences [Alglave et al. ASPLOS15] If fence does not enforce InvCache ordering => no cycle 80
83 Verification Times Runtimes remain reasonable due to intelligent pruning and unsatisfiable constraint detection Subsequent research has used SMT solver-based techniques to run most tests in just seconds! [ASPLOS 2016] 83
85 Conclusion CCI verification is critical to correct operation of complex parallel systems CCICheck: static CCI-aware microarchitectural consistency verification Partial incoherence (GPUs), lazy coherence, and more! µhb graphs, ViCLs, and constraint-based enumeration Comprehensive and intuitive µarch modelling Allows designers to build correct systems with greater ease and confidence 85
86 CCICheck: Using µhb Graphs to Verify the Coherence-Consistency Interface Yatin A. Manerkar, Daniel Lustig, Michael Pellauer, and Margaret Martonosi Code available at 86
87 Lazy Coherence (TSO-CC) No eager invalidation of sharers, but InvCache edges model the invalidation of a core's private cache on an L1 miss Thus, TSO is maintained 87
89 Constraint-Based Enumeration i3 needs a source for its value L1 ViCL with same address and data 89
92 Constraint-Based Enumeration i3 needs a source for its value L1 ViCL with same address and data => Two possibilities enumerated. 92
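The sourcing constraint (a load needs an L1 ViCL with the same address and data) amounts to a filter over candidate ViCLs. This is a small sketch: `candidate_sources` and the example ViCLs are invented for illustration, and the two matches here need not correspond to the two possibilities shown on the slide.

```python
from collections import namedtuple

ViCL = namedtuple("ViCL", "cache_id address data_value generation_id")


def candidate_sources(load_cache, load_address, load_value, vicls):
    """Return every ViCL in the load's cache with matching address and data."""
    return [
        v for v in vicls
        if v.cache_id == load_cache
        and v.address == load_address
        and v.data_value == load_value
    ]


# Hypothetical ViCLs: two generations hold x = 1 in Core 1's L1, one holds x = 0.
vicls = [
    ViCL("core1.L1", "x", 0, 0),
    ViCL("core1.L1", "x", 1, 1),
    ViCL("core1.L1", "x", 1, 2),
    ViCL("core0.L1", "x", 1, 0),
]
sources = candidate_sources("core1.L1", "x", 1, vicls)
```

Each returned candidate would become one enumerated scenario, and a load with no candidates would make the scenario unsatisfiable.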
CCICheck: Using µhb Graphs to Verify the Coherence-Consistency Interface
Iheck: Using µhb Graphs to Verify the oherence-onsistency Interface Yatin A. Manerkar Daniel Lustig Michael Pellauer Margaret Martonosi Princeton University NVIDIA {manerkar,dlustig,mrm}@princeton.edu
More informationRTLCheck: Verifying the Memory Consistency of RTL Designs
RTLCheck: Verifying the Memory Consistency of RTL Designs Yatin A. Manerkar, Daniel Lustig*, Margaret Martonosi, and Michael Pellauer* Princeton University *NVIDIA MICRO-50 http://check.cs.princeton.edu/
More informationResearch Faculty Summit Systems Fueling future disruptions
Research Faculty Summit 2018 Systems Fueling future disruptions Hardware-Aware Security Verification and Synthesis Margaret Martonosi H. T. Adams 35 Professor Dept. of Computer Science Princeton University
More informationPipeProof: Automated Memory Consistency Proofs for Microarchitectural Specifications
PipeProof: Automated Memory Consistency Proofs for Microarchitectural Specifications Yatin A. Manerkar Daniel Lustig Margaret Martonosi Aarti Gupta Princeton University NVIDIA {manerkar,mrm,aartig}@princeton.edu
More informationTriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
TriCheck: Verification at the Trisection of Software, Hardware, and ISA Caroline Trippel, Yatin A. Manerkar, Daniel Lustig*, Michael Pellauer*, Margaret Martonosi Princeton University *NVIDIA ASPLOS 2017
More informationTSO-CC Specification
TSO-CC Specification Marco Elver marco.elver@ed.ac.uk December 17, 2015 Contents 1 Introduction 1 2 Storage Requirements 1 3 Assumptions & Definitions 1 4 Protocol State Table 2 4.1 Private Cache Controller..................................
More informationTSO-CC: Consistency-directed Coherence for TSO. Vijay Nagarajan
TSO-CC: Consistency-directed Coherence for TSO Vijay Nagarajan 1 People Marco Elver (Edinburgh) Bharghava Rajaram (Edinburgh) Changhui Lin (Samsung) Rajiv Gupta (UCR) Susmit Sarkar (St Andrews) 2 Multicores
More informationRTLCheck: Verifying the Memory Consistency of RTL Designs
RTLCheck: Verifying the Memory Consistency of RTL Designs ABSTRACT Yatin A. Manerkar Daniel Lustig Margaret Martonosi Michael Pellauer Princeton University NVIDIA {manerkar,mrm}@princeton.edu {dlustig,mpellauer}@nvidia.com
More informationC11 Compiler Mappings: Exploration, Verification, and Counterexamples
C11 Compiler Mappings: Exploration, Verification, and Counterexamples Yatin Manerkar Princeton University manerkar@princeton.edu http://check.cs.princeton.edu November 22 nd, 2016 1 Compilers Must Uphold
More informationTRANSISTENCY MODELS:MEMORY ORDERING AT THE HARDWARE OS INTERFACE
... TRANSISTENCY MODELS:MEMORY ORDERING AT THE HARDWARE OS INTERFACE... Daniel Lustig Princeton University Geet Sethi Abhishek Bhattacharjee Rutgers University Margaret Martonosi Princeton University THIS
More informationFull-Stack Memory Model Verification with TriCheck
THEME ARTICLE: Top Picks Full-Stack Memory Model Verification with TriCheck Caroline Trippel and Yatin A. Manerkar Princeton University Daniel Lustig and Michael Pellauer Nvidia Margaret Martonosi Princeton
More informationUnderstanding POWER multiprocessors
Understanding POWER multiprocessors Susmit Sarkar 1 Peter Sewell 1 Jade Alglave 2,3 Luc Maranget 3 Derek Williams 4 1 University of Cambridge 2 Oxford University 3 INRIA 4 IBM June 2011 Programming shared-memory
More informationGetting to Work with OpenPiton
Getting to Work with OpenPiton Jonathan Balkind, Michael McKeown, Yaosheng Fu, Tri Nguyen, Yanqi Zhou, Alexey Lavrov, Mohammad Shahrad, Adi Fuchs, Samuel Payne, Xiaohua Liang, Matthew Matl, David Wentzlaff
More informationarxiv: v1 [cs.pl] 29 Aug 2018
Memory Consistency Models using Constraints Özgür Akgün, Ruth Hoffmann, and Susmit Sarkar arxiv:1808.09870v1 [cs.pl] 29 Aug 2018 School of Computer Science, University of St Andrews, UK {ozgur.akgun, rh347,
More informationarxiv: v1 [cs.cr] 11 Feb 2018
MeltdownPrime and SpectrePrime: Automatically-Synthesized Attacks Exploiting Invalidation-Based Coherence Protocols Caroline Trippel Daniel Lustig* Margaret Martonosi Princeton University *NVIDIA {ctrippel,mrm@princetonedu
More informationShared Memory Consistency Models: A Tutorial
Shared Memory Consistency Models: A Tutorial By Sarita Adve, Kourosh Gharachorloo WRL Research Report, 1995 Presentation: Vince Schuster Contents Overview Uniprocessor Review Sequential Consistency Relaxed
More informationRelaxed Memory: The Specification Design Space
Relaxed Memory: The Specification Design Space Mark Batty University of Cambridge Fortran meeting, Delft, 25 June 2013 1 An ideal specification Unambiguous Easy to understand Sound w.r.t. experimentally
More informationLecture 12: TM, Consistency Models. Topics: TM pathologies, sequential consistency, hw and hw/sw optimizations
Lecture 12: TM, Consistency Models Topics: TM pathologies, sequential consistency, hw and hw/sw optimizations 1 Paper on TM Pathologies (ISCA 08) LL: lazy versioning, lazy conflict detection, committing
More informationTriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA Caroline Trippel Yatin A. Manerkar Daniel Lustig* Michael Pellauer* Margaret Martonosi Princeton University *NVIDIA
More informationGPU Concurrency: Weak Behaviours and Programming Assumptions
GPU Concurrency: Weak Behaviours and Programming Assumptions Jyh-Jing Hwang, Yiren(Max) Lu 03/02/2017 Outline 1. Introduction 2. Weak behaviors examples 3. Test methodology 4. Proposed memory model 5.
More information4 Chip Multiprocessors (I) Chip Multiprocessors (ACS MPhil) Robert Mullins
4 Chip Multiprocessors (I) Robert Mullins Overview Coherent memory systems Introduction to cache coherency protocols Advanced cache coherency protocols, memory systems and synchronization covered in the
More informationTSOtool: A Program for Verifying Memory Systems Using the Memory Consistency Model
TSOtool: A Program for Verifying Memory Systems Using the Memory Consistency Model Sudheendra Hangal, Durgam Vahia, Chaiyasit Manovit, Joseph Lu and Sridhar Narayanan tsotool@sun.com ISCA-2004 Sun Microsystems
More informationTriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA Caroline Trippel Yatin A. Manerkar Daniel Lustig* Michael Pellauer* Margaret Martonosi Princeton University {ctrippel,manerkar,mrm}@princeton.edu
More informationFormal Specification of RISC-V Systems Instructions
Formal Specification of RISC-V Systems Instructions Arvind Andy Wright, Sizhuo Zhang, Thomas Bourgeat, Murali Vijayaraghavan Computer Science and Artificial Intelligence Lab. MIT RISC-V Workshop, MIT,
More informationLamport Clocks: Verifying A Directory Cache-Coherence Protocol. Computer Sciences Department
Lamport Clocks: Verifying A Directory Cache-Coherence Protocol * Manoj Plakal, Daniel J. Sorin, Anne E. Condon, Mark D. Hill Computer Sciences Department University of Wisconsin-Madison {plakal,sorin,condon,markhill}@cs.wisc.edu
More informationCS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II
CS 252 Graduate Computer Architecture Lecture 11: Multiprocessors-II Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste http://inst.eecs.berkeley.edu/~cs252
More informationSpecifying, Verifying, and Translating. Between Memory Consistency Models
Specifying, Verifying, and Translating Between Memory Consistency Models Daniel Joseph Lustig A Dissertation Presented to the Faculty of Princeton University in Candidacy for the Degree of Doctor of Philosophy
More informationCheck Quick Start. Figure 1: Closeup of icon for starting a terminal window.
Check Quick Start 1 Getting Started 1.1 Downloads The tools in the tutorial are distributed to work within VirtualBox. If you don t already have Virtual Box installed, please download VirtualBox from this
More informationModule 15: "Memory Consistency Models" Lecture 34: "Sequential Consistency and Relaxed Models" Memory Consistency Models. Memory consistency
Memory Consistency Models Memory consistency SC SC in MIPS R10000 Relaxed models Total store ordering PC and PSO TSO, PC, PSO Weak ordering (WO) [From Chapters 9 and 11 of Culler, Singh, Gupta] [Additional
More informationLecture 13: Consistency Models. Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models
Lecture 13: Consistency Models Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models 1 Coherence Vs. Consistency Recall that coherence guarantees
More informationDirectory Implementation. A High-end MP
ectory Implementation Distributed memory each processor (or cluster of processors) has its own memory processor-memory pairs are connected via a multi-path interconnection network snooping with broadcasting
More informationInterconnect Routing
Interconnect Routing store-and-forward routing switch buffers entire message before passing it on latency = [(message length / bandwidth) + fixed overhead] * # hops wormhole routing pipeline message through
More informationA Fault Tolerant Superscalar Processor
A Fault Tolerant Superscalar Processor 1 [Based on Coverage of a Microarchitecture-level Fault Check Regimen in a Superscalar Processor by V. Reddy and E. Rotenberg (2008)] P R E S E N T E D B Y NAN Z
More informationModule 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 17: Multiprocessor Organizations and Cache Coherence. The Lecture Contains:
The Lecture Contains: Shared Memory Multiprocessors Shared Cache Private Cache/Dancehall Distributed Shared Memory Shared vs. Private in CMPs Cache Coherence Cache Coherence: Example What Went Wrong? Implementations
More informationCMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 15: Memory Consistency and Synchronization Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 5 (multi-core) " Basic requirements: out later today
More informationSpeculative Parallelization in Decoupled Look-ahead
Speculative Parallelization in Decoupled Look-ahead Alok Garg, Raj Parihar, and Michael C. Huang Dept. of Electrical & Computer Engineering University of Rochester, Rochester, NY Motivation Single-thread
More informationCache Coherence Protocols for Chip Multiprocessors - I
Cache Coherence Protocols for Chip Multiprocessors - I John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 5 6 September 2016 Context Thus far chip multiprocessors
More informationShared memory. Caches, Cache coherence and Memory consistency models. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16
Shared memory Caches, Cache coherence and Memory consistency models Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 Shared memory Caches, Cache
More informationNOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.
Memory Consistency Model Background for Debate on Memory Consistency Models CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley for a SAS specifies constraints on the order in which
More informationThe Bulk Multicore Architecture for Programmability
The Bulk Multicore Architecture for Programmability Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu Acknowledgments Key contributors: Luis Ceze Calin
More informationAConsiderable consensus has been reached that cache
TC-Release++: An Efficient Timestamp-Based Coherence Protocol for Many-Core Architectures Yuan Yao, Wenzhi Chen, Tulika Mitra and Yang Xiang 1 Abstract As we enter the era of many-core, providing the shared
More informationData-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.
Data-Centric Consistency Models The general organization of a logical data store, physically distributed and replicated across multiple processes. Consistency models The scenario we will be studying: Some
More informationDesigning Memory Consistency Models for. Shared-Memory Multiprocessors. Sarita V. Adve
Designing Memory Consistency Models for Shared-Memory Multiprocessors Sarita V. Adve Computer Sciences Department University of Wisconsin-Madison The Big Picture Assumptions Parallel processing important
More informationModule 9: "Introduction to Shared Memory Multiprocessors" Lecture 16: "Multiprocessor Organizations and Cache Coherence" Shared Memory Multiprocessors
Shared Memory Multiprocessors Shared memory multiprocessors Shared cache Private cache/dancehall Distributed shared memory Shared vs. private in CMPs Cache coherence Cache coherence: Example What went
More informationBandwidth Adaptive Snooping
Two classes of multiprocessors Bandwidth Adaptive Snooping Milo M.K. Martin, Daniel J. Sorin Mark D. Hill, and David A. Wood Wisconsin Multifacet Project Computer Sciences Department University of Wisconsin
More informationLecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization
Lecture 25: Multiprocessors Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 Snooping-Based Protocols Three states for a block: invalid,
More informationLecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 13: Memory Consistency + a Course-So-Far Review Parallel Computer Architecture and Programming Today: what you should know Understand the motivation for relaxed consistency models Understand the
More informationPage 1. Outline. Coherence vs. Consistency. Why Consistency is Important
Outline ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Memory Consistency Models Copyright 2006 Daniel J. Sorin Duke University Slides are derived from work by Sarita
More informationMultiprocessors 1. Outline
Multiprocessors 1 Outline Multiprocessing Coherence Write Consistency Snooping Building Blocks Snooping protocols and examples Coherence traffic and performance on MP Directory-based protocols and examples
More informationUChicago CMSC Computer Architecture, Autumn 2018 Lab 5 Extra Credit: Multi-Core and The MESI Cache Coherence Protocol
UChicago CMSC-22200 Computer Architecture, Autumn 2018 Lab 5 Extra Credit: Multi-Core and The MESI Cache Coherence Protocol Instructor: Prof. Yanjing Li TA: Miao (Tony) He Assigned: Thur., 11/29, 2018
More informationECE/CS 757: Advanced Computer Architecture II
ECE/CS 757: Advanced Computer Architecture II Instructor:Mikko H Lipasti Spring 2017 University of Wisconsin-Madison Lecture notes based on slides created by John Shen, Mark Hill, David Wood, Guri Sohi,
More informationSELECTED TOPICS IN COHERENCE AND CONSISTENCY
SELECTED TOPICS IN COHERENCE AND CONSISTENCY Michel Dubois Ming-Hsieh Department of Electrical Engineering University of Southern California Los Angeles, CA90089-2562 dubois@usc.edu INTRODUCTION IN CHIP
More informationA Memory Model for RISC-V
A Memory Model for RISC-V Arvind (joint work with Sizhuo Zhang and Muralidaran Vijayaraghavan) Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology Barcelona Supercomputer
EECS 570 Lecture 13. Directory & Optimizations. Winter 2018 Prof. Satish Narayanasamy
Directory & Optimizations Winter 2018 Prof. Satish Narayanasamy http://www.eecs.umich.edu/courses/eecs570/ Slides developed in part by Profs. Adve, Falsafi, Hill, Lebeck, Martin, Narayanasamy, Nowatzyk,
Declarative semantics for concurrency. 28 August 2017
Declarative semantics for concurrency Ori Lahav Viktor Vafeiadis 28 August 2017 An alternative way of defining the semantics 2 Declarative/axiomatic concurrency semantics Define the notion of a program
Lecture 25: Multiprocessors
Lecture 25: Multiprocessors Today's topics: Virtual memory wrap-up Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 TLB and Cache Is the cache indexed
Memory Hierarchy in a Multiprocessor
EEC 581 Computer Architecture Multiprocessor and Coherence Department of Electrical Engineering and Computer Science Cleveland State University Hierarchy in a Multiprocessor Shared cache Fully-connected
Crossing Guard: Mediating Host-Accelerator Coherence Interactions
Crossing Guard: Mediating Host-Accelerator Coherence Interactions Lena E. Olson Mark D. Hill David A. Wood University of Wisconsin-Madison {lena, markhill, david} @cs.wisc.edu Abstract Specialized hardware
Coherence and Consistency
Coherence and Consistency 30 The Meaning of Programs An ISA is a programming language To be useful, programs written in it must have meaning or semantics Any sequence of instructions must have a meaning.
Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.
Coherence Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L25-1 Coherence Avoids Stale Data Multicores have multiple private caches for performance Need to provide the illusion
Hardware models: inventing a usable abstraction for Power/ARM. Friday, 11 January 13
Hardware models: inventing a usable abstraction for Power/ARM 1 Hardware models: inventing a usable abstraction for Power/ARM Disclaimer: 1. ARM MM is analogous to Power MM all this is your next phone!
DeLorean: Recording and Deterministically Replaying Shared Memory Multiprocessor Execution Efficiently
: Recording and Deterministically Replaying Shared Memory Multiprocessor Execution Efficiently, Luis Ceze* and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign
Problem Set 5 Solutions CS152 Fall 2016
Problem Set 5 Solutions CS152 Fall 2016 Problem P5.1: Sequential Consistency Problem P5.1.A Can X hold value of 4 after all three threads have completed? Please explain briefly. Yes / No C1, B1-B6, A1-A4,
Reducing Fair Stuttering Refinement of Transaction Systems
Reducing Fair Stuttering Refinement of Transaction Systems Rob Sumners Advanced Micro Devices robert.sumners@amd.com November 16th, 2015 Rob Sumners (AMD) Transaction Progress Checking November 16th, 2015
Relaxed Memory Consistency
Relaxed Memory Consistency Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
Duke University Department of Electrical and Computer Engineering
Duke University Department of Electrical and Computer Engineering Senior Honors Thesis Spring 2008 Proving the Completeness of Error Detection Mechanisms in Simple Core Chip Multiprocessors Michael Edward
Lecture 12: Hardware/Software Trade-Offs. Topics: COMA, Software Virtual Memory
Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory 1 Capacity Limitations P P P P B1 C C B1 C C Mem Coherence Monitor Mem Coherence Monitor B2 In a Sequent NUMA-Q design above,
Introduction to GPU programming with CUDA
Introduction to GPU programming with CUDA Dr. Juan C Zuniga University of Saskatchewan, WestGrid UBC Summer School, Vancouver. June 12th, 2018 Outline 1 Overview of GPU computing a. what is a GPU? b. GPU
Computer Architecture and Engineering CS152 Quiz #5 April 27th, 2016 Professor George Michelogiannakis Name: <ANSWER KEY>
Computer Architecture and Engineering CS152 Quiz #5 April 27th, 2016 Professor George Michelogiannakis Name: This is a closed book, closed notes exam. 80 Minutes 19 pages Notes: Not all questions
Fall 2015 :: CSE 610 Parallel Computer Architectures. Cache Coherence. Nima Honarmand
Cache Coherence Nima Honarmand Cache Coherence: Problem (Review) Problem arises when There are multiple physical copies of one logical location Multiple copies of each cache block (In a shared-mem system)
Page 1. Cache Coherence
Page 1 Cache Coherence 1 Page 2 Memory Consistency in SMPs CPU-1 CPU-2 A 100 cache-1 A 100 cache-2 CPU-Memory bus A 100 memory Suppose CPU-1 updates A to 200. write-back: memory and cache-2 have stale
Distributed Shared Memory and Memory Consistency Models
Lectures on distributed systems Distributed Shared Memory and Memory Consistency Models Paul Krzyzanowski Introduction With conventional SMP systems, multiple processors execute instructions in a single
Spring 2010 Prof. Hyesoon Kim. Thanks to Prof. Loh & Prof. Prvulovic
Spring 2010 Prof. Hyesoon Kim Thanks to Prof. Loh & Prof. Prvulovic C/C++ program Compiler Assembly Code (binary) Processor 0010101010101011110 Memory MAR MDR INPUT Processing Unit OUTPUT ALU TEMP PC Control
Cache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem
Cache Coherence Bryan Mills, PhD Slides provided by Rami Melhem Cache coherence Programmers have no control over caches and when they get updated. x = 2; /* initially */ y0 eventually ends up = 2 y1 eventually
EECS 583 Class 13 Software Pipelining
EECS 583 Class 13 Software Pipelining University of Michigan October 29, 2012 Announcements + Reading Material Project proposals» Due Friday, Nov 2, 5pm» 1 paragraph summary of what you plan to work on
EXAM 1 SOLUTIONS. Midterm Exam. ECE 741 Advanced Computer Architecture, Spring Instructor: Onur Mutlu
Midterm Exam ECE 741 Advanced Computer Architecture, Spring 2009 Instructor: Onur Mutlu TAs: Michael Papamichael, Theodoros Strigkos, Evangelos Vlachos February 25, 2009 EXAM 1 SOLUTIONS Problem Points
The University of Michigan - Department of EECS EECS578 Correct Operation for Processor and Embedded Systems Midterm exam December 3, 2015
The University of Michigan - Department of EECS EECS578 Correct Operation for Processor and Embedded Systems Midterm exam December 3, 2015 Name: This exam is CLOSED BOOKS, CLOSED NOTES. You can only have
Today's Outline: Shared Memory Review. Shared Memory & Concurrency. Concurrency v. Parallelism. Thread-Level Parallelism. CS758: Multicore Programming
CS758: Multicore Programming Today's Outline: Shared Memory Review Shared Memory & Concurrency Introduction to Shared Memory Thread-Level Parallelism Shared Memory Prof. David A. Wood University of Wisconsin-Madison
Overview: Memory Consistency
Overview: Memory Consistency the ordering of memory operations basic definitions; sequential consistency comparison with cache coherency relaxing memory consistency write buffers the total store ordering
Lect. 6: Directory Coherence Protocol
Lect. 6: Directory Coherence Protocol Snooping coherence Global state of a memory line is the collection of its state in all caches, and there is no summary state anywhere All cache controllers monitor
Foundations of Computer Systems
18-600 Foundations of Computer Systems Lecture 21: Multicore Cache Coherence John P. Shen & Zhiyi Yu November 14, 2016 Prevalence of multicore processors: 2006: 75% for desktops, 85% for servers 2007:
Visualization of OpenCL Application Execution on CPU-GPU Systems
Visualization of OpenCL Application Execution on CPU-GPU Systems A. Ziabari*, R. Ubal*, D. Schaa**, D. Kaeli* *NUCAR Group, Northeastern University **AMD Northeastern University Computer Architecture Research
administrivia final hour exam next Wednesday covers assembly language like hw and worksheets
administrivia final hour exam next Wednesday covers assembly language like hw and worksheets today last worksheet start looking at more details on hardware not covered on ANY exam probably won't finish
Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.
Coherence Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L5- Coherence Avoids Stale Data Multicores have multiple private caches for performance Need to provide the illusion
Fall 2012 Parallel Computer Architecture Lecture 15: Speculation I. Prof. Onur Mutlu Carnegie Mellon University 10/10/2012
18-742 Fall 2012 Parallel Computer Architecture Lecture 15: Speculation I Prof. Onur Mutlu Carnegie Mellon University 10/10/2012 Reminder: Review Assignments Was Due: Tuesday, October 9, 11:59pm. Sohi
Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM
Lecture 4: Directory Protocols and TM Topics: corner cases in directory protocols, lazy TM 1 Handling Reads When the home receives a read request, it looks up memory (speculative read) and directory in
Piton: A 25-core Academic Manycore Research Processor
Piton: A 25-core Academic Manycore Research Processor Michael McKeown, Yaosheng Fu, Tri Nguyen, Yanqi Zhou, Jonathan Balkind, Alexey Lavrov, Mohammad Shahrad, Samuel Payne*, and David Wentzlaff Hot Chips
Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)
Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,
Sequential Consistency & TSO. Subtitle
Sequential Consistency & TSO Subtitle Core C1 Core C2 data = 0, flag ≠ SET S1: store data = NEW S2: store flag = SET L1: load r1 = flag B1: if (r1 ≠ SET) goto L1 L2: load r2 = data; Will r2 always be set to
ISCA Tutorial Reading List
ISCA Tutorial Reading List Caroline Trippel Yatin A. Manerkar Margaret Martonosi Princeton University {ctrippel,manerkar,mrm}@princeton.edu June, 2017 References [1] Sarita Adve and Kourosh Gharachorloo.
ronny@mit.edu www.cag.lcs.mit.edu/scale Introduction Architectures are all about exploiting the parallelism inherent to applications Performance Energy The Vector-Thread Architecture is a new approach
Distributed Operating Systems Memory Consistency
Faculty of Computer Science Institute for System Architecture, Operating Systems Group Distributed Operating Systems Memory Consistency Marcus Völp (slides Julian Stecklina, Marcus Völp) SS2014 Concurrent
Final Exam May 8th, 2018 Professor Krste Asanovic Name:
Notes: CS 152 Computer Architecture and Engineering Final Exam May 8th, 2018 Professor Krste Asanovic Name: This is a closed book, closed notes exam. 170 Minutes. 26 pages. Not all questions are of equal
EC 513 Computer Architecture
EC 513 Computer Architecture Cache Coherence - Directory Cache Coherence Prof. Michel A. Kinsy Shared Memory Multiprocessor Processor Cores Local Memories Memory Bus P 1 Snoopy Cache Physical Memory P
Instruction Level Parallelism (ILP)
1 / 26 Instruction Level Parallelism (ILP) ILP: The simultaneous execution of multiple instructions from a program. While pipelining is a form of ILP, the general application of ILP goes much further into
Handout 3 Multiprocessor and thread level parallelism
Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed
Relaxed Memory-Consistency Models
Relaxed Memory-Consistency Models Review. Why are relaxed memory-consistency models needed? How do relaxed MC models require programs to be changed? The safety net between operations whose order needs
More informationRelaxed Memory-Consistency Models
Relaxed Memory-Consistency Models [ 9.1] In Lecture 13, we saw a number of relaxed memory-consistency models. In this lecture, we will cover some of them in more detail. Why isn't sequential consistency
ECE 313 Computer Organization FINAL EXAM December 13, 2000
This exam is open book and open notes. You have until 11:00AM. Credit for problems requiring calculation will be given only if you show your work. 1. Floating Point Representation / MIPS Assembly Language
Shared Memory SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB
Shared SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB 1 Review: Snoopy Cache Protocol Write Invalidate Protocol: Multiple readers, single writer Write to shared data: an