CCICheck: Using µhb Graphs to Verify the Coherence-Consistency Interface

Yatin A. Manerkar, Daniel Lustig, Michael Pellauer*, and Margaret Martonosi
Princeton University, *NVIDIA
MICRO-48

Coherence and Consistency

At a high level:
- Coherence protocols: propagation of writes to other cores
- Consistency models: ordering rules for the visibility of reads and writes

Existing verifiers:
- Coherence verifiers ignore consistency, even when the protocol affects consistency!
- Consistency verifiers assume abstract coherence instead of the protocol in use!

[Diagram: coherence verifiers and consistency verifiers at the arch. level, with the CCI between them; at the µarch. level, coherence and consistency are often interwoven.]

Motivating Example: Peekaboo

1. Invalidation before use: repeated invalidation before use => livelock [Kubiatowicz et al. ASPLOS 1992]
2. Livelock avoidance: allow the destination core to perform one operation on data when it arrives, even if it has already been invalidated [Sorin et al. Primer]
   - Does not break coherence
   - Sometimes intentionally returns stale data
3. Prefetching

Individual optimization => no violation. Combination of optimizations => violation!

Consider mp with the livelock-avoidance mechanism:

Core 0: [x] ← 1; [y] ← 1 (initial cache state: x: Shared, y: Modified)
Core 1: r1 ← [y]; r2 ← [x] (initial cache state: x: Invalid, y: Invalid)

Events, in the order the slides add them:
1. Prefetch x
2. Data (x = 0)
3. Inv
4. Inv-Ack; Core 0 now holds x: Modified
5. Request y
6. Data (y = 1); y is now Shared in both caches, and r1 = 1
7. r2 = 0

Outcome: r1 = 1, r2 = 0.
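
The mp walkthrough above can be replayed as a toy script. This is only a sketch under stated assumptions: the function `run_peekaboo` and its flag `allow_one_use` are hypothetical names, the event order is fixed by hand, and the script assumes that the Data (x = 0) response reaches Core 1 only after the Inv for x has been acknowledged. It is not a model of any real coherence protocol.

```python
def run_peekaboo(allow_one_use: bool) -> tuple[int, int]:
    """Replay the mp event order by hand and return (r1, r2).

    allow_one_use=True mimics the livelock-avoidance rule: one operation
    may be performed on arriving data even if it was already invalidated.
    """
    memory = {"x": 0, "y": 0}  # up-to-date values as seen by Core 0

    # Core 1 prefetches x; the response carrying x = 0 is still in flight
    # (assumption: it is delayed past the invalidation below).
    in_flight = ("x", memory["x"])

    # Inv for x reaches Core 1 before the data does; Core 1 sends Inv-Ack.
    invalidated_early = {"x"}

    # Core 0 now owns x and performs both stores of mp.
    memory["x"] = 1
    memory["y"] = 1

    # Core 1 requests y and reads the new value.
    r1 = memory["y"]

    # The delayed data for x finally arrives at Core 1.
    addr, stale_value = in_flight
    if addr in invalidated_early and not allow_one_use:
        r2 = memory[addr]   # discard the stale line and fetch x again
    else:
        r2 = stale_value    # use the arriving data once, even though stale
    return r1, r2


print(run_peekaboo(allow_one_use=True))   # (1, 0)
print(run_peekaboo(allow_one_use=False))  # (1, 1)
```

With the one-use rule enabled, the script reproduces the r1 = 1, r2 = 0 outcome from the slides; without it, the stale value is never consumed.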

The Coherence-Consistency Interface (CCI)

CCI = guarantees that the coherence protocol provides to the rest of the microarchitecture
    + memory ordering guarantees that the rest of the microarchitecture expects from the coherence protocol

- SWMR, DVI, No Stale Data + Expected Coherence = Consistency
- SWMR, DVI, No Livelock + Expected Coherence = CCI Mismatch => Consistency Violation!

Our Work: CCICheck

Static CCI-aware consistency verification

[Figure: two pipelines (Fetch, Dec., Exec., Mem., WB, with SB and Lds.), each with an L1, above an L2]

Coherence orderings (SWMR, DVI, etc.) + microarch. spec + litmus test => microarchitectural happens-before (µhb) graph

Background: PipeCheck [Lustig et al. MICRO-47]

- Exhaustive enumeration of executions of a litmus test (e.g. mp) using µhb graphs
- Cyclic graph => forbidden by µarch
- Acyclic graph => allowed by µarch
- Prior techniques cannot model CCI events!
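
The cyclic/acyclic rule above amounts to a cycle check on a directed graph. The following is a minimal sketch of such a check using depth-first search; the function names `has_cycle` and `verdict` and the node labels (`i1.Mem` and so on) are illustrative assumptions, not the data structures used by PipeCheck or CCICheck.

```python
from collections import defaultdict


def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the DFS stack, finished
    colour = defaultdict(int)

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:   # back edge: nxt is on the current path
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))


def verdict(edges):
    """Map the graph shape to the slide's wording."""
    return "forbidden by µarch" if has_cycle(edges) else "allowed by µarch"


# Hypothetical happens-before edges between four instruction events.
acyclic = [("i1.Mem", "i2.Mem"), ("i2.Mem", "i3.Mem"), ("i3.Mem", "i4.Mem")]
cyclic = acyclic + [("i4.Mem", "i1.Mem")]

print(verdict(acyclic))
print(verdict(cyclic))
```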

Modelling CCI Events

- Need to model per-cache occupancy
  - Lazy coherence and partial incoherence (e.g. GPUs)
- Need to model coherence transitions that relate to consistency (e.g. Peekaboo)

ViCL: Value in Cache Lifetime

4-tuple: (cache_id, address, data_value, generation_id)

- cache_id and generation_id uniquely identify each cache line
- A ViCL 4-tuple maps onto the period of time over which the cache line serves the data value for the address
- ViCLs start at a ViCL Create event and end at a ViCL Expire event
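
The 4-tuple above maps naturally onto a small record type. The sketch below is one possible encoding, assuming string cache and address names and integer values; the class layout and the `lifetime_edge` helper are hypothetical and are not taken from the CCICheck code.

```python
from typing import NamedTuple


class ViCL(NamedTuple):
    """Value in Cache Lifetime: (cache_id, address, data_value, generation_id)."""
    cache_id: str
    address: str
    data_value: int
    generation_id: int

    def lifetime_edge(self):
        """Return the (Create, Expire) node pair that bounds this ViCL."""
        return (("Create", self), ("Expire", self))


# Two lifetimes of the same value for the same address in the same cache
# are kept apart by their generation_id.
first = ViCL("Core1.L1", "x", 0, 0)
second = ViCL("Core1.L1", "x", 0, 1)

print(first != second)
print(first.lifetime_edge())
```

Making the tuple immutable and hashable lets ViCL Create and ViCL Expire nodes be used directly as graph nodes.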

[Figure: conventional timeline for litmus test co-mp (M = Modified, S = Shared)]

[Figure: the same co-mp timeline, now with ViCLs]

ViCLs can model requests, downgrades, etc.

ViCLs in µhb Graphs

Use the pipeline model from PipeCheck, but add ViCL nodes and edges.

[Figure: µhb graph for litmus test co-mp with ViCL nodes and edges]

CCICheck Toolflow

Inputs:
- CCICheck µarch specification
  1. Instruction paths
  2. Per-stage orderings
  3. Constraints for instruction paths
- Litmus tests

Flow: path enumeration -> constraint satisfaction -> pruning (cycle checking) -> compare -> CCICheck pass/fail

Path Enumeration and Constraint Satisfaction

[Figure: enumerated scenarios for a litmus test]

- Unsatisfiable constraint => invalid scenario
- Cyclic graph => prune
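
The two rules above (discard scenarios with an unsatisfiable constraint, prune scenarios whose graph is cyclic) can be sketched as a filter over enumerated path choices. Everything concrete here is a made-up toy: the path names `hit` and `miss`, the single constraint, and the edges produced by `edges_for` are assumptions for illustration and do not come from the CCICheck specification format.

```python
from itertools import product
from graphlib import TopologicalSorter, CycleError


def is_cyclic(edges):
    """Cycle check via the standard-library topological sorter."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
        preds.setdefault(src, set())
    try:
        TopologicalSorter(preds).prepare()
    except CycleError:
        return True
    return False


def classify(scenario, constraints, edges_for):
    """Label one scenario as invalid, pruned, or kept."""
    if not all(check(scenario) for check in constraints):
        return "invalid"            # unsatisfiable constraint
    if is_cyclic(edges_for(scenario)):
        return "pruned"             # cyclic graph
    return "kept"


# Hypothetical per-instruction path choices.
choices = {"i1": ["hit", "miss"], "i2": ["hit", "miss"]}

# Hypothetical constraint: the two instructions cannot both hit.
constraints = [lambda s: not (s["i1"] == "hit" and s["i2"] == "hit")]


def edges_for(scenario):
    """Hypothetical edges: an i2 miss adds an edge that closes a cycle."""
    edges = [("A", "B")]
    if scenario["i2"] == "miss":
        edges.append(("B", "A"))
    return edges


counts = {"invalid": 0, "pruned": 0, "kept": 0}
for combo in product(*choices.values()):
    scenario = dict(zip(choices, combo))
    counts[classify(scenario, constraints, edges_for)] += 1

print(counts)
```

Checking constraints before building the graph means an invalid scenario never reaches the cycle check.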

Case Studies and Results

Peekaboo

- Livelock prevention mechanism allows use of stale data
- Peekaboo edge completes cycle => outcome forbidden
- Consistency maintained

Partial Incoherence: GPUs

- e.g. mp with membar fences [Alglave et al. ASPLOS 2015]
- If the fence does not enforce InvCache ordering => no cycle

Verification Times

- Runtimes remain reasonable due to intelligent pruning and unsatisfiable-constraint detection
- Subsequent research has used SMT solver-based techniques to run most tests in just seconds! [ASPLOS 2016]

Conclusion

- CCI verification is critical to the correct operation of complex parallel systems
- CCICheck: static CCI-aware microarchitectural consistency verification
  - Partial incoherence (GPUs), lazy coherence, and more!
- µhb graphs, ViCLs, and constraint-based enumeration
  - Comprehensive and intuitive µarch modelling
- Allows designers to build correct systems with greater ease and confidence

CCICheck: Using µhb Graphs to Verify the Coherence-Consistency Interface
Yatin A. Manerkar, Daniel Lustig, Michael Pellauer, and Margaret Martonosi
Code available at

Lazy Coherence (TSO-CC)

- No eager invalidation of sharers, but InvCache edges model the invalidation of a core's private cache on an L1 miss
- Thus, TSO is maintained

Constraint-Based Enumeration

- i3 needs a source for its value: an L1 ViCL with the same address and data
- => Two possibilities enumerated.
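
The source-matching constraint above can be sketched as a filter over the ViCLs in the load's L1. The slide does not say which two ViCLs are the candidates, so the ViCL list, the `Load` record, and the `candidate_sources` helper below are hypothetical; the example is merely arranged so that two ViCLs match.

```python
from typing import NamedTuple


class ViCL(NamedTuple):
    cache_id: str
    address: str
    data_value: int
    generation_id: int


class Load(NamedTuple):
    name: str
    cache_id: str      # the L1 this load reads from
    address: str
    data_value: int    # value the litmus test outcome requires


def candidate_sources(load, vicls):
    """Return every ViCL in the load's L1 with the same address and data."""
    return [
        v for v in vicls
        if v.cache_id == load.cache_id
        and v.address == load.address
        and v.data_value == load.data_value
    ]


# Hypothetical ViCLs: two lifetimes of x = 0 in Core 1's L1, plus non-matches.
vicls = [
    ViCL("Core1.L1", "x", 0, 0),
    ViCL("Core1.L1", "x", 0, 1),
    ViCL("Core1.L1", "x", 1, 2),   # wrong data value
    ViCL("Core0.L1", "x", 0, 0),   # wrong cache
]
i3 = Load("i3", "Core1.L1", "x", 0)

for source in candidate_sources(i3, vicls):
    print(source)
```

Each candidate returned would become its own scenario to enumerate.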


More information

Page 1. Outline. Coherence vs. Consistency. Why Consistency is Important

Page 1. Outline. Coherence vs. Consistency. Why Consistency is Important Outline ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Memory Consistency Models Copyright 2006 Daniel J. Sorin Duke University Slides are derived from work by Sarita

More information

Multiprocessors 1. Outline

Multiprocessors 1. Outline Multiprocessors 1 Outline Multiprocessing Coherence Write Consistency Snooping Building Blocks Snooping protocols and examples Coherence traffic and performance on MP Directory-based protocols and examples

More information

UChicago CMSC Computer Architecture, Autumn 2018 Lab 5 Extra Credit: Multi-Core and The MESI Cache Coherence Protocol

UChicago CMSC Computer Architecture, Autumn 2018 Lab 5 Extra Credit: Multi-Core and The MESI Cache Coherence Protocol UChicago CMSC-22200 Computer Architecture, Autumn 2018 Lab 5 Extra Credit: Multi-Core and The MESI Cache Coherence Protocol Instructor: Prof. Yanjing Li TA: Miao (Tony) He Assigned: Thur., 11/29, 2018

More information

ECE/CS 757: Advanced Computer Architecture II

ECE/CS 757: Advanced Computer Architecture II ECE/CS 757: Advanced Computer Architecture II Instructor:Mikko H Lipasti Spring 2017 University of Wisconsin-Madison Lecture notes based on slides created by John Shen, Mark Hill, David Wood, Guri Sohi,

More information

SELECTED TOPICS IN COHERENCE AND CONSISTENCY

SELECTED TOPICS IN COHERENCE AND CONSISTENCY SELECTED TOPICS IN COHERENCE AND CONSISTENCY Michel Dubois Ming-Hsieh Department of Electrical Engineering University of Southern California Los Angeles, CA90089-2562 dubois@usc.edu INTRODUCTION IN CHIP

More information

A Memory Model for RISC-V

A Memory Model for RISC-V A Memory Model for RISC-V Arvind (joint work with Sizhuo Zhang and Muralidaran Vijayaraghavan) Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology Barcelona Supercomputer

More information

EECS 570 Lecture 13. Directory & Optimizations. Winter 2018 Prof. Satish Narayanasamy

EECS 570 Lecture 13. Directory & Optimizations. Winter 2018 Prof. Satish Narayanasamy Directory & Optimizations Winter 2018 Prof. Satish Narayanasamy http://www.eecs.umich.edu/courses/eecs570/ Slides developed in part by Profs. Adve, Falsafi, Hill, Lebeck, Martin, Narayanasamy, Nowatzyk,

More information

Declarative semantics for concurrency. 28 August 2017

Declarative semantics for concurrency. 28 August 2017 Declarative semantics for concurrency Ori Lahav Viktor Vafeiadis 28 August 2017 An alternative way of defining the semantics 2 Declarative/axiomatic concurrency semantics Define the notion of a program

More information

Lecture 25: Multiprocessors

Lecture 25: Multiprocessors Lecture 25: Multiprocessors Today s topics: Virtual memory wrap-up Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 TLB and Cache Is the cache indexed

More information

Memory Hierarchy in a Multiprocessor

Memory Hierarchy in a Multiprocessor EEC 581 Computer Architecture Multiprocessor and Coherence Department of Electrical Engineering and Computer Science Cleveland State University Hierarchy in a Multiprocessor Shared cache Fully-connected

More information

Crossing Guard: Mediating Host-Accelerator Coherence Interactions

Crossing Guard: Mediating Host-Accelerator Coherence Interactions Crossing Guard: Mediating Host-Accelerator Coherence Interactions Lena E. Olson Mark D. Hill David A. Wood University of Wisconsin-Madison {lena, markhill, david} @cs.wisc.edu Abstract Specialized hardware

More information

Coherence and Consistency

Coherence and Consistency Coherence and Consistency 30 The Meaning of Programs An ISA is a programming language To be useful, programs written in it must have meaning or semantics Any sequence of instructions must have a meaning.

More information

Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. Coherence Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L25-1 Coherence Avoids Stale Data Multicores have multiple private caches for performance Need to provide the illusion

More information

Hardware models: inventing a usable abstraction for Power/ARM. Friday, 11 January 13

Hardware models: inventing a usable abstraction for Power/ARM. Friday, 11 January 13 Hardware models: inventing a usable abstraction for Power/ARM 1 Hardware models: inventing a usable abstraction for Power/ARM Disclaimer: 1. ARM MM is analogous to Power MM all this is your next phone!

More information

DeLorean: Recording and Deterministically Replaying Shared Memory Multiprocessor Execution Efficiently

DeLorean: Recording and Deterministically Replaying Shared Memory Multiprocessor Execution Efficiently : Recording and Deterministically Replaying Shared Memory Multiprocessor Execution Efficiently, Luis Ceze* and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign

More information

Problem Set 5 Solutions CS152 Fall 2016

Problem Set 5 Solutions CS152 Fall 2016 Problem Set 5 Solutions CS152 Fall 2016 Problem P5.1: Sequential Consistency Problem P5.1.A Can X hold value of 4 after all three threads have completed? Please explain briefly. Yes / No C1, B1-B6, A1-A4,

More information

Reducing Fair Stuttering Refinement of Transaction Systems

Reducing Fair Stuttering Refinement of Transaction Systems Reducing Fair Stuttering Refinement of Transaction Systems Rob Sumners Advanced Micro Devices robert.sumners@amd.com November 16th, 2015 Rob Sumners (AMD) Transaction Progress Checking November 16th, 2015

More information

Relaxed Memory Consistency

Relaxed Memory Consistency Relaxed Memory Consistency Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

Duke University Department of Electrical and Computer Engineering

Duke University Department of Electrical and Computer Engineering Duke University Department of Electrical and Computer Engineering Senior Honors Thesis Spring 2008 Proving the Completeness of Error Detection Mechanisms in Simple Core Chip Multiprocessors Michael Edward

More information

Lecture 12: Hardware/Software Trade-Offs. Topics: COMA, Software Virtual Memory

Lecture 12: Hardware/Software Trade-Offs. Topics: COMA, Software Virtual Memory Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory 1 Capacity Limitations P P P P B1 C C B1 C C Mem Coherence Monitor Mem Coherence Monitor B2 In a Sequent NUMA-Q design above,

More information

Introduction to GPU programming with CUDA

Introduction to GPU programming with CUDA Introduction to GPU programming with CUDA Dr. Juan C Zuniga University of Saskatchewan, WestGrid UBC Summer School, Vancouver. June 12th, 2018 Outline 1 Overview of GPU computing a. what is a GPU? b. GPU

More information

Computer Architecture and Engineering CS152 Quiz #5 April 27th, 2016 Professor George Michelogiannakis Name: <ANSWER KEY>

Computer Architecture and Engineering CS152 Quiz #5 April 27th, 2016 Professor George Michelogiannakis Name: <ANSWER KEY> Computer Architecture and Engineering CS152 Quiz #5 April 27th, 2016 Professor George Michelogiannakis Name: This is a closed book, closed notes exam. 80 Minutes 19 pages Notes: Not all questions

More information

Fall 2015 :: CSE 610 Parallel Computer Architectures. Cache Coherence. Nima Honarmand

Fall 2015 :: CSE 610 Parallel Computer Architectures. Cache Coherence. Nima Honarmand Cache Coherence Nima Honarmand Cache Coherence: Problem (Review) Problem arises when There are multiple physical copies of one logical location Multiple copies of each cache block (In a shared-mem system)

More information

Page 1. Cache Coherence

Page 1. Cache Coherence Page 1 Cache Coherence 1 Page 2 Memory Consistency in SMPs CPU-1 CPU-2 A 100 cache-1 A 100 cache-2 CPU-Memory bus A 100 memory Suppose CPU-1 updates A to 200. write-back: memory and cache-2 have stale

More information

Distributed Shared Memory and Memory Consistency Models

Distributed Shared Memory and Memory Consistency Models Lectures on distributed systems Distributed Shared Memory and Memory Consistency Models Paul Krzyzanowski Introduction With conventional SMP systems, multiple processors execute instructions in a single

More information

Spring 2010 Prof. Hyesoon Kim. Thanks to Prof. Loh & Prof. Prvulovic

Spring 2010 Prof. Hyesoon Kim. Thanks to Prof. Loh & Prof. Prvulovic Spring 2010 Prof. Hyesoon Kim Thanks to Prof. Loh & Prof. Prvulovic C/C++ program Compiler Assembly Code (binary) Processor 0010101010101011110 Memory MAR MDR INPUT Processing Unit OUTPUT ALU TEMP PC Control

More information

Cache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem

Cache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem Cache Coherence Bryan Mills, PhD Slides provided by Rami Melhem Cache coherence Programmers have no control over caches and when they get updated. x = 2; /* initially */ y0 eventually ends up = 2 y1 eventually

More information

EECS 583 Class 13 Software Pipelining

EECS 583 Class 13 Software Pipelining EECS 583 Class 13 Software Pipelining University of Michigan October 29, 2012 Announcements + Reading Material Project proposals» Due Friday, Nov 2, 5pm» 1 paragraph summary of what you plan to work on

More information

EXAM 1 SOLUTIONS. Midterm Exam. ECE 741 Advanced Computer Architecture, Spring Instructor: Onur Mutlu

EXAM 1 SOLUTIONS. Midterm Exam. ECE 741 Advanced Computer Architecture, Spring Instructor: Onur Mutlu Midterm Exam ECE 741 Advanced Computer Architecture, Spring 2009 Instructor: Onur Mutlu TAs: Michael Papamichael, Theodoros Strigkos, Evangelos Vlachos February 25, 2009 EXAM 1 SOLUTIONS Problem Points

More information

The University of Michigan - Department of EECS EECS578 Correct Operation for Processor and Embedded Systems Midterm exam December 3, 2015

The University of Michigan - Department of EECS EECS578 Correct Operation for Processor and Embedded Systems Midterm exam December 3, 2015 The University of Michigan - Department of EECS EECS578 Correct Operation for Processor and Embedded Systems Midterm exam December 3, 2015 Name: This exam is CLOSED BOOKS, CLOSED NOTES. You can only have

More information

Today s Outline: Shared Memory Review. Shared Memory & Concurrency. Concurrency v. Parallelism. Thread-Level Parallelism. CS758: Multicore Programming

Today s Outline: Shared Memory Review. Shared Memory & Concurrency. Concurrency v. Parallelism. Thread-Level Parallelism. CS758: Multicore Programming CS758: Multicore Programming Today s Outline: Shared Memory Review Shared Memory & Concurrency Introduction to Shared Memory Thread-Level Parallelism Shared Memory Prof. David A. Wood University of Wisconsin-Madison

More information

Overview: Memory Consistency

Overview: Memory Consistency Overview: Memory Consistency the ordering of memory operations basic definitions; sequential consistency comparison with cache coherency relaxing memory consistency write buffers the total store ordering

More information

Lect. 6: Directory Coherence Protocol

Lect. 6: Directory Coherence Protocol Lect. 6: Directory Coherence Protocol Snooping coherence Global state of a memory line is the collection of its state in all caches, and there is no summary state anywhere All cache controllers monitor

More information

Foundations of Computer Systems

Foundations of Computer Systems 18-600 Foundations of Computer Systems Lecture 21: Multicore Cache Coherence John P. Shen & Zhiyi Yu November 14, 2016 Prevalence of multicore processors: 2006: 75% for desktops, 85% for servers 2007:

More information

Visualization of OpenCL Application Execution on CPU-GPU Systems

Visualization of OpenCL Application Execution on CPU-GPU Systems Visualization of OpenCL Application Execution on CPU-GPU Systems A. Ziabari*, R. Ubal*, D. Schaa**, D. Kaeli* *NUCAR Group, Northeastern Universiy **AMD Northeastern University Computer Architecture Research

More information

administrivia final hour exam next Wednesday covers assembly language like hw and worksheets

administrivia final hour exam next Wednesday covers assembly language like hw and worksheets administrivia final hour exam next Wednesday covers assembly language like hw and worksheets today last worksheet start looking at more details on hardware not covered on ANY exam probably won t finish

More information

Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. Coherence Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L5- Coherence Avoids Stale Data Multicores have multiple private caches for performance Need to provide the illusion

More information

Fall 2012 Parallel Computer Architecture Lecture 15: Speculation I. Prof. Onur Mutlu Carnegie Mellon University 10/10/2012

Fall 2012 Parallel Computer Architecture Lecture 15: Speculation I. Prof. Onur Mutlu Carnegie Mellon University 10/10/2012 18-742 Fall 2012 Parallel Computer Architecture Lecture 15: Speculation I Prof. Onur Mutlu Carnegie Mellon University 10/10/2012 Reminder: Review Assignments Was Due: Tuesday, October 9, 11:59pm. Sohi

More information

Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM

Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM Lecture 4: Directory Protocols and TM Topics: corner cases in directory protocols, lazy TM 1 Handling Reads When the home receives a read request, it looks up memory (speculative read) and directory in

More information

Piton: A 25-core Academic Manycore Research Processor

Piton: A 25-core Academic Manycore Research Processor Piton: A 25-core Academic Manycore Research Processor Michael McKeown, Yaosheng Fu, Tri Nguyen, Yanqi Zhou, Jonathan Balkind, Alexey Lavrov, Mohammad Shahrad, Samuel Payne*, and David Wentzlaff Hot Chips

More information

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6) Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,

More information

Sequential Consistency & TSO. Subtitle

Sequential Consistency & TSO. Subtitle Sequential Consistency & TSO Subtitle Core C1 Core C2 data = 0, 1lag SET S1: store data = NEW S2: store 1lag = SET L1: load r1 = 1lag B1: if (r1 SET) goto L1 L2: load r2 = data; Will r2 always be set to

More information

ISCA Tutorial Reading List

ISCA Tutorial Reading List ISCA Tutorial Reading List Caroline Trippel Yatin A. Manerkar Margaret Martonosi Princeton University {ctrippel,manerkar,mrm}@princeton.edu June, 2017 References [1] Sarita Adve and Kourosh Gharachorloo.

More information

ronny@mit.edu www.cag.lcs.mit.edu/scale Introduction Architectures are all about exploiting the parallelism inherent to applications Performance Energy The Vector-Thread Architecture is a new approach

More information

Distributed Operating Systems Memory Consistency

Distributed Operating Systems Memory Consistency Faculty of Computer Science Institute for System Architecture, Operating Systems Group Distributed Operating Systems Memory Consistency Marcus Völp (slides Julian Stecklina, Marcus Völp) SS2014 Concurrent

More information

Final Exam May 8th, 2018 Professor Krste Asanovic Name:

Final Exam May 8th, 2018 Professor Krste Asanovic Name: Notes: CS 152 Computer Architecture and Engineering Final Exam May 8th, 2018 Professor Krste Asanovic Name: This is a closed book, closed notes exam. 170 Minutes. 26 pages. Not all questions are of equal

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Cache Coherence - Directory Cache Coherence Prof. Michel A. Kinsy Shared Memory Multiprocessor Processor Cores Local Memories Memory Bus P 1 Snoopy Cache Physical Memory P

More information

Instruction Level Parallelism (ILP)

Instruction Level Parallelism (ILP) 1 / 26 Instruction Level Parallelism (ILP) ILP: The simultaneous execution of multiple instructions from a program. While pipelining is a form of ILP, the general application of ILP goes much further into

More information

Handout 3 Multiprocessor and thread level parallelism

Handout 3 Multiprocessor and thread level parallelism Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed

More information

Relaxed Memory-Consistency Models

Relaxed Memory-Consistency Models Relaxed Memory-Consistency Models Review. Why are relaxed memory-consistency models needed? How do relaxed MC models require programs to be changed? The safety net between operations whose order needs

More information

Relaxed Memory-Consistency Models

Relaxed Memory-Consistency Models Relaxed Memory-Consistency Models [ 9.1] In Lecture 13, we saw a number of relaxed memoryconsistency models. In this lecture, we will cover some of them in more detail. Why isn t sequential consistency

More information

ECE 313 Computer Organization FINAL EXAM December 13, 2000

ECE 313 Computer Organization FINAL EXAM December 13, 2000 This exam is open book and open notes. You have until 11:00AM. Credit for problems requiring calculation will be given only if you show your work. 1. Floating Point Representation / MIPS Assembly Language

More information

Shared Memory SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB

Shared Memory SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB Shared SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB 1 Review: Snoopy Cache Protocol Write Invalidate Protocol: Multiple readers, single writer Write to shared data: an

More information