Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization

Size: px
Start display at page:

Download "Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization"

Transcription

1 Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization Aritra Sengupta, Man Cao, Michael D. Bond and Milind Kulkarni 1 PPPJ 2015, Melbourne, Florida, USA

2 Programming Language Semantics? Data Races Java provides weak semantics 2

3 Weak Semantics T1 A a = null; boolean init = false; T2 a = new A(); init = true; if (init) a.field++; 3

4 Weak Semantics T1 a = new A(); init = true; No data dependence T2 if (init) a.field++; 4

5 Weak Semantics T1 A a = null; boolean init = false; T2 a = new A(); init = true; if (init) a.field++; 5

6 Weak Semantics T1 T2 init = true; if (init) a.field++; a = new A(); 6

7 Weak Semantics T1 T2 Null dereference init = true; if (init) a.field++; a = new A(); 7

8 Java Memory Model JMM (Manson et al., POPL, 2005) variant of DRF0 (Adve and Hill, ISCA, 1990) Atomicity of synchronization-free regions for datarace-free programs Data races: weak semantics 8

9 Need for Stronger Memory Models The inability to define reasonable semantics for programs with data races is not just a theoretical shortcoming, but a fundamental hole in the foundation of our languages and systems Give better semantics to programs with data races Stronger memory models Adve and Boehm, CACM,

10 Run-time cost Memory Models: Run-time cost vs Strength Synchronization-free region serializability 1 SC Statically Bounded Region Serializability DRF0 Strength 1. Ouyang et al.... and region serializability for all. In HotPar,

11 Run-time cost Memory Models: Run-time cost vs Strength Synchronization-free region serializability SC Statically Bounded Region Serializability 2 DRF0 Strength Sengupta et al. Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability. ASPLOS, 2015.

12 Statically Bounded Region Serializability (SBRS) Compiler demarcated regions execute atomically Execution is an interleaving of these regions Sengupta et al., ASPLOS,

13 Statically Bounded Region Serializability (SBRS) acq(lock) Synchronization operations Method calls Loop backedges methodcall() rel(lock) 13

14 Statically Bounded Region Serializability (SBRS) acq(lock) methodcall() rel(lock) 14

15 Statically Bounded Region Serializability (SBRS) Statically and dynamically bounded Loop backedges 15

16 Overview Enforcement of SBRS with dynamic locks Precise dynamic locks: EnfoRSer-D (our prior work), low contention Enforcement of SBRS with static locks Imprecise static locks: EnfoRSer-S, low instrumentation overhead Hybridization of locks EnfoRSer-H: static and dynamic locks, right locks for right sites? Results EnfoRSer-H does at least as well as either. Some cases significant benefit 16

17 Enforcement of SBRS Prevent two concurrent accesses to the same memory location where one is a write 17

18 Enforcement of SBRS Prevent two Prevent concurrent regions accesses that to have the same memory location races where from one running is a write concurrently! 18

19 Enforcement of SBRS Acquire locks before each memory access Acquire locks at the start of the region 19

20 Enforcement of SBRS Acquire locks before each memory access Precise object locks: dynamic locks Acquire locks at the start of the region 20

21 Enforcement of SBRS Acquire locks before each memory access Acquire locks at the start of the region Region level locks: statically chosen for an access site 21

22 EnfoRSer-D Precise dynamic locks Per-access locks with retry mechanism Compiler Transformations: Speculative execution 22

23 SBRS with Dynamic Locks Y = = X Dynamic per-access locks Z = 23

24 SBRS with Dynamic Locks Y = Program access = X Z = 24

25 SBRS with Dynamic Locks Y = = X Ownership transferred Z = 25

26 SBRS with Dynamic Locks Y = Dynamic locks: precise location, hence no reasoning about races statically! = X Ownership transferred Z = 26

27 SBRS with Dynamic Locks Precise conflict detectionefficient for high-conflicting regions Y = High instrumentation cost at each access = X High overhead for lowconflicting programs Ownership transferred Z = 27

28 Experimental Methodology Benchmarks DaCapo 2006, 9.12-bach Fixed-workload versions of SPECjbb2000 and SPECjbb2005 Platform Intel Xeon system: 32 cores 28

29 Implementation and Evaluation Developed in Jikes RVM Code publicly available on Jikes RVM Research Archive 29

30 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 30

31 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D % overhead on average hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 31

32 %overhead over unmodified JVM Run-time Performance EnfoRSer-D Precise conflict detection but high instrumentation overhead! Complex compiler transformations (additional code) Can we do better? 15-5 hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 32

33 EnfoRSer with Static Locks Reduce the instrumentation overhead of EnfoRSer-D Less complex code generation 33

34 EnfoRSer-S Static region level locks Racing sites acquire same lock Coarsened locks to reduce instrumentation overhead 34

35 SBRS with Locks on Static Sites L0 Static locks: racy sites protected by same lock L1 L2 Static Region Locks Y = = X Z = 35

36 SBRS with Locks on Static Sites L0 L1 L2 All locks acquired before access Y = = X Z = 36

37 SBRS with Locks on Static Sites L0 L1 L2 Y = Ownership transferred = X Z = 37

38 SBRS with Locks on Static Sites L0 L1 Imprecise and does not lower instrumentation overhead! L2 Y = Ownership transferred = X Z = 38

39 SBRS with Locks on Static Sites L012 Coarsened single lock Y = = X Z = 39

40 SBRS with Locks on Static Sites Low instrumentation L012 overhead Coarsened single lock Imprecise Y = conflict detection = X Z = 40

41 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 41

42 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 42

43 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 RACE 43

44 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 RACE Naik et al. s 2006 race detection algorithm, Chord 44

45 EnfoRSer-S R1 R2 R3 R4 L3 s0 s2 s5 s7 s3 L2 s8 s1 s4 s6 L4 L0 L1 45

46 EnfoRSer-S R1 R2 R3 R4 L0 L0 L1 L3 s0 s2 s5 s7 s3 s8 s1 s4 s6 46

47 EnfoRSer-S R1 R2 R3 R4 L0 L0 L1 L3 L1 L1 L2 L2 L4 s0 s2 s5 s7 s3 s8 s1 s4 s6 47

48 EnfoRSer-S R1 R2 R3 R4 L0 L0 L1 L3 L1 L1 L2 L2 L4 s0 s2 Does not reduce s3 instrumentation overhead! s5 s7 s8 s1 s4 s6 48

49 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 49

50 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 Same Region Static Locks (SRSL) RACE 50

51 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 Same Region Static Locks (SRSL) RACE L0 L1 51

52 EnfoRSer-S R1 R2 R3 R4 L0 L0 L0 L1 s0 s2 s5 s7 s3 s8 s1 s4 s6 52

53 EnfoRSer-S R1 R2 R3 R4 s0 L0 Reduces instrumentation L0 L0 L1 overhead s2 s5 s7 Increases contention s3 s8 s1 s4 s6 53

54 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D EnfoRSer-S 11,000 3,500 12,000 33,000 10,000 4,300 3, , hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 54

55 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D EnfoRSer-S 11,000 3,500 12,000 33,000 10,000 4,300 3, , % overhead on average hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 55

56 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D EnfoRSer-S 11,000 3,500 12,000 33,000 10,000 4,300 3, , EnfoRSer-S performs better 2600% overhead on average 15-5 hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 56

57 Hybridizing Locks High contention sites: Precise dynamic locks (precise conflict detection) Low contention sites: Single static lock (low instrumentation overhead) 57

58 EnfoRSer-H Static locks to reduce instrumentation Dynamic locks for precise conflict detection Correctly and efficiently combine: best of both 58

59 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D EnfoRSer-S EnfoRSer-H 11,000 3,500 12,000 33,000 10,000 4,300 3, , % overhead on average hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 59

60 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D EnfoRSer-S EnfoRSer-H 11,000 3,500 12,000 33,000 10,000 4,300 3, , Provides nearly the best of either approaches! % reduction in overhead 15-5 hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 60

61 %overhead over unmodified JVM Run-time Performance EnfoRSer-D EnfoRSer-S EnfoRSer-H 11,000 3,500 12,000 33,000 10,000 4,300 3, ,600 49% reduction in lock acquires and not increasing contention! % reduction in overhead hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 61

62 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D EnfoRSer-S EnfoRSer-H 11,000 3,500 12,000 33,000 10,000 4,300 3, , % reduction in lock acquires Dramatically improves on both when hybridization helps! 87% reduction in overhead hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 62

63 Hybridizing Locks Right synchronization for right program sites Combining different synchronization mechanisms Best of different synchronization mechanisms 63

64 EnfoRSer-H Costbenefit model Profiling Right locks for right sites Assignment algorithm Combine static and dynamic locks 64

65 Two-iteration Methodology Program execution First iteration Profiling (use RACE relation) Assignment Algorithm Between iterations Compute SRSL and estimate cost Program exectuion Second iteration Use hybridized instrumentation 65

66 Assignment Algorithm 66

67 Profiled data Compute initial cost Change a static lock to dynamic lock Change all its racing sites to dynamic locks Profiled data Recompute cost current cost < previous cost No Yes Retain state Revert to previous state 67

68 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 estimatedcost = N i estimatecost(ri) ; 68

69 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 estimatedcost = N i estimatecost(ri) ; 1. Conflicts on each site 2. Lock acquires on each site 69

70 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 70

71 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 71

72 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 If (current cost < previous cost) // retain state 72

73 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 Switch on next site 73

74 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 74

75 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 If (current cost < previous cost) // retain state 75

76 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 Switch on next site 76

77 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 estimatedcost = N i estimatecost(ri) ; current cost > previous cost // revert state 77

78 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 78

79 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 s1, s9 79

80 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 Final State! 80

81 Lock Assignment R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 L1 81

82 Lock Assignment R1 R2 R3 R4 L1 L1 L1 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 L1 82

83 Lock Assignment R1 R2 R3 R4 L1 L1 L1 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 L1 83

84 Lock Assignment R1 R2 R3 R4 L1 L1 L1 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 84

85 Lock Assignment R1 R2 R3 R4 L1 L1 L1 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 Per-object locks and transformations 85

86 Related Work Use of static locks Chimera, Lee et al., PLDI 2012 Use of static analysis Static Conflict Analysis for Multi-Threaded Object-Oriented Programs, Von Praun and Gross, PLDI, Goldilocks, Elmas et al., PLDI Red Card, Flanagan and Freund, ECOOP 2013 Hybridizing locks Hybrid Tracking, Cao et al., WODET

87 Conclusion Combine synchronization mechanisms Best of different kinds of synchronization Efficient enforcement of SBRS Static locks Dynamic locks Precise conflict detection Low instrumentation overhead Select the best for a region 87

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Programming

More information

Legato: Bounded Region Serializability Using Commodity Hardware Transactional Memory. Aritra Sengupta Man Cao Michael D. Bond and Milind Kulkarni

Legato: Bounded Region Serializability Using Commodity Hardware Transactional Memory. Aritra Sengupta Man Cao Michael D. Bond and Milind Kulkarni Legato: Bounded Region Serializability Using Commodity Hardware al Memory Aritra Sengupta Man Cao Michael D. Bond and Milind Kulkarni 1 Programming Language Semantics? Data Races C++ no guarantee of semantics

More information

Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization

Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization Aritra Sengupta Man Cao Michael D. Bond Milind Kulkarni Ohio State University Purdue University {sengupta,caoma,mikebond}@cse.ohio-state.edu

More information

Lightweight Data Race Detection for Production Runs

Lightweight Data Race Detection for Production Runs Lightweight Data Race Detection for Production Runs Swarnendu Biswas, UT Austin Man Cao, Ohio State University Minjia Zhang, Microsoft Research Michael D. Bond, Ohio State University Benjamin P. Wood,

More information

Hybrid Static Dynamic Analysis for Statically Bounded Region Serializability

Hybrid Static Dynamic Analysis for Statically Bounded Region Serializability Hybrid Static Dynamic Analysis for Statically Bounded Region Serializability Aritra Sengupta Swarnendu Biswas Minjia Zhang Michael D. Bond Milind Kulkarni Ohio State University Purdue University {sengupta,biswass,zhanminj,mikebond@cse.ohio-state.edu

More information

Legato: End-to-End Bounded Region Serializability Using Commodity Hardware Transactional Memory

Legato: End-to-End Bounded Region Serializability Using Commodity Hardware Transactional Memory Legato: End-to-End Bounded Region Serializability Using Commodity Hardware Transactional Memory Aritra Sengupta Man Cao Michael D. Bond Milind Kulkarni Ohio State University (USA) Purdue University (USA)

More information

Valor: Efficient, Software-Only Region Conflict Exceptions

Valor: Efficient, Software-Only Region Conflict Exceptions Valor: Efficient, Software-Only Region Conflict Exceptions Swarnendu Biswas PhD Student, Ohio State University biswass@cse.ohio-state.edu Abstract Data races complicate programming language semantics,

More information

Avoiding Consistency Exceptions Under Strong Memory Models

Avoiding Consistency Exceptions Under Strong Memory Models Avoiding Consistency Exceptions Under Strong Memory Models Minjia Zhang Microsoft Research (USA) minjiaz@microsoft.com Swarnendu Biswas University of Texas at Austin (USA) sbiswas@ices.utexas.edu Michael

More information

DoubleChecker: Efficient Sound and Precise Atomicity Checking

DoubleChecker: Efficient Sound and Precise Atomicity Checking DoubleChecker: Efficient Sound and Precise Atomicity Checking Swarnendu Biswas, Jipeng Huang, Aritra Sengupta, and Michael D. Bond The Ohio State University PLDI 2014 Impact of Concurrency Bugs Impact

More information

Towards Parallel, Scalable VM Services

Towards Parallel, Scalable VM Services Towards Parallel, Scalable VM Services Kathryn S McKinley The University of Texas at Austin Kathryn McKinley Towards Parallel, Scalable VM Services 1 20 th Century Simplistic Hardware View Faster Processors

More information

11/19/2013. Imperative programs

11/19/2013. Imperative programs if (flag) 1 2 From my perspective, parallelism is the biggest challenge since high level programming languages. It s the biggest thing in 50 years because industry is betting its future that parallel programming

More information

Lightweight Data Race Detection for Production Runs

Lightweight Data Race Detection for Production Runs Lightweight Data Race Detection for Production Runs Ohio State CSE Technical Report #OSU-CISRC-1/15-TR01; last updated August 2016 Swarnendu Biswas Ohio State University biswass@cse.ohio-state.edu Michael

More information

NDSeq: Runtime Checking for Nondeterministic Sequential Specs of Parallel Correctness

NDSeq: Runtime Checking for Nondeterministic Sequential Specs of Parallel Correctness EECS Electrical Engineering and Computer Sciences P A R A L L E L C O M P U T I N G L A B O R A T O R Y NDSeq: Runtime Checking for Nondeterministic Sequential Specs of Parallel Correctness Jacob Burnim,

More information

Lightweight Data Race Detection for Production Runs

Lightweight Data Race Detection for Production Runs Lightweight Data Race Detection for Production Runs Swarnendu Biswas University of Texas at Austin (USA) sbiswas@ices.utexas.edu Man Cao Ohio State University (USA) caoma@cse.ohio-state.edu Minjia Zhang

More information

Adaptive Multi-Level Compilation in a Trace-based Java JIT Compiler

Adaptive Multi-Level Compilation in a Trace-based Java JIT Compiler Adaptive Multi-Level Compilation in a Trace-based Java JIT Compiler Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu and Toshio Nakatani IBM Research Tokyo IBM Research T.J. Watson Research Center October

More information

Valor: Efficient, Software-Only Region Conflict Exceptions

Valor: Efficient, Software-Only Region Conflict Exceptions to Reuse * * Evaluated * OOPSLA * Artifact * AEC Valor: Efficient, Software-Only Region Conflict Exceptions Swarnendu Biswas Minjia Zhang Michael D. Bond Brandon Lucia Ohio State University (USA) Carnegie

More information

PACER: Proportional Detection of Data Races

PACER: Proportional Detection of Data Races PACER: Proportional Detection of Data Races Michael D. Bond Katherine E. Coons Kathryn S. McKinley Department of Computer Science, The University of Texas at Austin {mikebond,coonske,mckinley}@cs.utexas.edu

More information

How s the Parallel Computing Revolution Going? Towards Parallel, Scalable VM Services

How s the Parallel Computing Revolution Going? Towards Parallel, Scalable VM Services How s the Parallel Computing Revolution Going? Towards Parallel, Scalable VM Services Kathryn S McKinley The University of Texas at Austin Kathryn McKinley Towards Parallel, Scalable VM Services 1 20 th

More information

Probabilistic Calling Context

Probabilistic Calling Context Probabilistic Calling Context Michael D. Bond Kathryn S. McKinley University of Texas at Austin Why Context Sensitivity? Static program location not enough at com.mckoi.db.jdbcserver.jdbcinterface.execquery():213

More information

A Dynamic Evaluation of the Precision of Static Heap Abstractions

A Dynamic Evaluation of the Precision of Static Heap Abstractions A Dynamic Evaluation of the Precision of Static Heap Abstractions OOSPLA - Reno, NV October 20, 2010 Percy Liang Omer Tripp Mayur Naik Mooly Sagiv UC Berkeley Tel-Aviv Univ. Intel Labs Berkeley Tel-Aviv

More information

The Java Memory Model

The Java Memory Model Jeremy Manson 1, William Pugh 1, and Sarita Adve 2 1 University of Maryland 2 University of Illinois at Urbana-Champaign Presented by John Fisher-Ogden November 22, 2005 Outline Introduction Sequential

More information

EECS 570 Lecture 13. Directory & Optimizations. Winter 2018 Prof. Satish Narayanasamy

EECS 570 Lecture 13. Directory & Optimizations. Winter 2018 Prof. Satish Narayanasamy Directory & Optimizations Winter 2018 Prof. Satish Narayanasamy http://www.eecs.umich.edu/courses/eecs570/ Slides developed in part by Profs. Adve, Falsafi, Hill, Lebeck, Martin, Narayanasamy, Nowatzyk,

More information

Hybrid Static Dynamic Analysis for Region Serializability

Hybrid Static Dynamic Analysis for Region Serializability Hybrid Static Dynamic Analysis for Region Serializability Aritra Sengupta Swarnendu Biswas Minjia Zhang Michael D. Bond Milind Kulkarni + Ohio State University + Purdue University {sengupta,biswass,zhanminj,mikebond@cse.ohio-state.edu,

More information

Trace-based JIT Compilation

Trace-based JIT Compilation Trace-based JIT Compilation Hiroshi Inoue, IBM Research - Tokyo 1 Trace JIT vs. Method JIT https://twitter.com/yukihiro_matz/status/533775624486133762 2 Background: Trace-based Compilation Using a Trace,

More information

High-level languages

High-level languages High-level languages High-level languages are not immune to these problems. Actually, the situation is even worse: the source language typically operates over mixed-size values (multi-word and bitfield);

More information

Chí Cao Minh 28 May 2008

Chí Cao Minh 28 May 2008 Chí Cao Minh 28 May 2008 Uniprocessor systems hitting limits Design complexity overwhelming Power consumption increasing dramatically Instruction-level parallelism exhausted Solution is multiprocessor

More information

Atomicity for Reliable Concurrent Software. Race Conditions. Motivations for Atomicity. Race Conditions. Race Conditions. 1. Beyond Race Conditions

Atomicity for Reliable Concurrent Software. Race Conditions. Motivations for Atomicity. Race Conditions. Race Conditions. 1. Beyond Race Conditions Towards Reliable Multithreaded Software Atomicity for Reliable Concurrent Software Cormac Flanagan UC Santa Cruz Joint work with Stephen Freund Shaz Qadeer Microsoft Research 1 Multithreaded software increasingly

More information

Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs

Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs , pp.45-49 http://dx.doi.org/10.14257/astl.2014.76.12 Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs Hyun-Ji Kim 1, Byoung-Kwi Lee 2, Ok-Kyoon Ha 3, and Yong-Kee Jun 1 1 Department

More information

Lu Fang, University of California, Irvine Liang Dou, East China Normal University Harry Xu, University of California, Irvine

Lu Fang, University of California, Irvine Liang Dou, East China Normal University Harry Xu, University of California, Irvine Lu Fang, University of California, Irvine Liang Dou, East China Normal University Harry Xu, University of California, Irvine 2015-07-09 Inefficient code regions [G. Jin et al. PLDI 2012] Inefficient code

More information

FastTrack: Efficient and Precise Dynamic Race Detection (+ identifying destructive races)

FastTrack: Efficient and Precise Dynamic Race Detection (+ identifying destructive races) FastTrack: Efficient and Precise Dynamic Race Detection (+ identifying destructive races) Cormac Flanagan UC Santa Cruz Stephen Freund Williams College Multithreading and Multicore! Multithreaded programming

More information

Reference Object Processing in On-The-Fly Garbage Collection

Reference Object Processing in On-The-Fly Garbage Collection Reference Object Processing in On-The-Fly Garbage Collection Tomoharu Ugawa, Kochi University of Technology Richard Jones, Carl Ritson, University of Kent Weak Pointers Weak pointers are a mechanism to

More information

Compiler Techniques For Memory Consistency Models

Compiler Techniques For Memory Consistency Models Compiler Techniques For Memory Consistency Models Students: Xing Fang (Purdue), Jaejin Lee (Seoul National University), Kyungwoo Lee (Purdue), Zehra Sura (IBM T.J. Watson Research Center), David Wong (Intel

More information

Java Memory Model. Jian Cao. Department of Electrical and Computer Engineering Rice University. Sep 22, 2016

Java Memory Model. Jian Cao. Department of Electrical and Computer Engineering Rice University. Sep 22, 2016 Java Memory Model Jian Cao Department of Electrical and Computer Engineering Rice University Sep 22, 2016 Content Introduction Java synchronization mechanism Double-checked locking Out-of-Thin-Air violation

More information

CalFuzzer: An Extensible Active Testing Framework for Concurrent Programs Pallavi Joshi 1, Mayur Naik 2, Chang-Seo Park 1, and Koushik Sen 1

CalFuzzer: An Extensible Active Testing Framework for Concurrent Programs Pallavi Joshi 1, Mayur Naik 2, Chang-Seo Park 1, and Koushik Sen 1 CalFuzzer: An Extensible Active Testing Framework for Concurrent Programs Pallavi Joshi 1, Mayur Naik 2, Chang-Seo Park 1, and Koushik Sen 1 1 University of California, Berkeley, USA {pallavi,parkcs,ksen}@eecs.berkeley.edu

More information

Atomicity via Source-to-Source Translation

Atomicity via Source-to-Source Translation Atomicity via Source-to-Source Translation Benjamin Hindman Dan Grossman University of Washington 22 October 2006 Atomic An easier-to-use and harder-to-implement primitive void deposit(int x){ synchronized(this){

More information

Chronicler: Lightweight Recording to Reproduce Field Failures

Chronicler: Lightweight Recording to Reproduce Field Failures Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and Gail Kaiser Columbia University, New York, NY USA What happens when software crashes? Stack Traces Aren

More information

MAMA: Mostly Automatic Management of Atomicity

MAMA: Mostly Automatic Management of Atomicity MAMA: Mostly Automatic Management of Atomicity Christian DeLozier Joseph Devietti Milo M. K. Martin Computer and Information Science Department, University of Pennsylvania {delozier, devietti, milom}@cis.upenn.edu

More information

Programming Language Memory Models: What do Shared Variables Mean?

Programming Language Memory Models: What do Shared Variables Mean? Programming Language Memory Models: What do Shared Variables Mean? Hans-J. Boehm 10/25/2010 1 Disclaimers: This is an overview talk. Much of this work was done by others or jointly. I m relying particularly

More information

A Case for System Support for Concurrency Exceptions

A Case for System Support for Concurrency Exceptions A Case for System Support for Concurrency Exceptions Luis Ceze, Joseph Devietti, Brandon Lucia and Shaz Qadeer University of Washington {luisceze, devietti, blucia0a}@cs.washington.edu Microsoft Research

More information

Taming release-acquire consistency

Taming release-acquire consistency Taming release-acquire consistency Ori Lahav Nick Giannarakis Viktor Vafeiadis Max Planck Institute for Software Systems (MPI-SWS) POPL 2016 Weak memory models Weak memory models provide formal sound semantics

More information

Nondeterminism is Unavoidable, but Data Races are Pure Evil

Nondeterminism is Unavoidable, but Data Races are Pure Evil Nondeterminism is Unavoidable, but Data Races are Pure Evil Hans-J. Boehm HP Labs 5 November 2012 1 Low-level nondeterminism is pervasive E.g. Atomically incrementing a global counter is nondeterministic.

More information

Atomicity. Experience with Calvin. Tools for Checking Atomicity. Tools for Checking Atomicity. Tools for Checking Atomicity

Atomicity. Experience with Calvin. Tools for Checking Atomicity. Tools for Checking Atomicity. Tools for Checking Atomicity 1 2 Atomicity The method inc() is atomic if concurrent ths do not interfere with its behavior Guarantees that for every execution acq(this) x t=i y i=t+1 z rel(this) acq(this) x y t=i i=t+1 z rel(this)

More information

C. Flanagan Dynamic Analysis for Atomicity 1

C. Flanagan Dynamic Analysis for Atomicity 1 1 Atomicity The method inc() is atomic if concurrent threads do not interfere with its behavior Guarantees that for every execution acq(this) x t=i y i=t+1 z rel(this) acq(this) x y t=i i=t+1 z rel(this)

More information

Part 3: Beyond Reduction

Part 3: Beyond Reduction Lightweight Analyses For Reliable Concurrency Part 3: Beyond Reduction Stephen Freund Williams College joint work with Cormac Flanagan (UCSC), Shaz Qadeer (MSR) Busy Acquire Busy Acquire void busy_acquire()

More information

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu and Toshio Nakatani IBM Research Tokyo IBM Research T.J. Watson Research Center April

More information

Cooperative Concurrency for a Multicore World

Cooperative Concurrency for a Multicore World Cooperative Concurrency for a Multicore World Stephen Freund Williams College Cormac Flanagan, Tim Disney, Caitlin Sadowski, Jaeheon Yi, UC Santa Cruz Controlling Thread Interference: #1 Manually x = 0;

More information

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters. Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters. Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo June 15, 2012 ISMM 2012 at Beijing, China Motivation

More information

Motivation & examples Threads, shared memory, & synchronization

Motivation & examples Threads, shared memory, & synchronization 1 Motivation & examples Threads, shared memory, & synchronization How do locks work? Data races (a lower level property) How do data race detectors work? Atomicity (a higher level property) Concurrency

More information

Static and Dynamic Program Analysis: Synergies and Applications

Static and Dynamic Program Analysis: Synergies and Applications Static and Dynamic Program Analysis: Synergies and Applications Mayur Naik Intel Labs, Berkeley CS 243, Stanford University March 9, 2011 Today s Computing Platforms Trends: parallel cloud mobile Traits:

More information

Semantic Atomicity for Multithreaded Programs!

Semantic Atomicity for Multithreaded Programs! P A R A L L E L C O M P U T I N G L A B O R A T O R Y Semantic Atomicity for Multithreaded Programs! Jacob Burnim, George Necula, Koushik Sen! Parallel Computing Laboratory! University of California, Berkeley!

More information

Enforcing Textual Alignment of

Enforcing Textual Alignment of Parallel Hardware Parallel Applications IT industry (Silicon Valley) Parallel Software Users Enforcing Textual Alignment of Collectives using Dynamic Checks and Katherine Yelick UC Berkeley Parallel Computing

More information

Efficient Runtime Tracking of Allocation Sites in Java

Efficient Runtime Tracking of Allocation Sites in Java Efficient Runtime Tracking of Allocation Sites in Java Rei Odaira, Kazunori Ogata, Kiyokuni Kawachiya, Tamiya Onodera, Toshio Nakatani IBM Research - Tokyo Why Do You Need Allocation Site Information?

More information

A Parameterized Type System for Race-Free Java Programs

A Parameterized Type System for Race-Free Java Programs A Parameterized Type System for Race-Free Java Programs Chandrasekhar Boyapati Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology {chandra, rinard@lcs.mit.edu Data races

More information

Foundations of the C++ Concurrency Memory Model

Foundations of the C++ Concurrency Memory Model Foundations of the C++ Concurrency Memory Model John Mellor-Crummey and Karthik Murthy Department of Computer Science Rice University johnmc@rice.edu COMP 522 27 September 2016 Before C++ Memory Model

More information

Continuous Object Access Profiling and Optimizations to Overcome the Memory Wall and Bloat

Continuous Object Access Profiling and Optimizations to Overcome the Memory Wall and Bloat Continuous Object Access Profiling and Optimizations to Overcome the Memory Wall and Bloat Rei Odaira, Toshio Nakatani IBM Research Tokyo ASPLOS 2012 March 5, 2012 Many Wasteful Objects Hurt Performance.

More information

7/6/2015. Motivation & examples Threads, shared memory, & synchronization. Imperative programs

7/6/2015. Motivation & examples Threads, shared memory, & synchronization. Imperative programs Motivation & examples Threads, shared memory, & synchronization How do locks work? Data races (a lower level property) How do data race detectors work? Atomicity (a higher level property) Concurrency exceptions

More information

Towards Transactional Memory Semantics for C++

Towards Transactional Memory Semantics for C++ Towards Transactional Memory Semantics for C++ Tatiana Shpeisman Intel Corporation Santa Clara, CA tatiana.shpeisman@intel.com Yang Ni Intel Corporation Santa Clara, CA yang.ni@intel.com Ali-Reza Adl-Tabatabai

More information

The Java Memory Model

The Java Memory Model The Java Memory Model The meaning of concurrency in Java Bartosz Milewski Plan of the talk Motivating example Sequential consistency Data races The DRF guarantee Causality Out-of-thin-air guarantee Implementation

More information

The C/C++ Memory Model: Overview and Formalization

The C/C++ Memory Model: Overview and Formalization The C/C++ Memory Model: Overview and Formalization Mark Batty Jasmin Blanchette Scott Owens Susmit Sarkar Peter Sewell Tjark Weber Verification of Concurrent C Programs C11 / C++11 In 2011, new versions

More information

FastTrack: Efficient and Precise Dynamic Race Detection By Cormac Flanagan and Stephen N. Freund

FastTrack: Efficient and Precise Dynamic Race Detection By Cormac Flanagan and Stephen N. Freund FastTrack: Efficient and Precise Dynamic Race Detection By Cormac Flanagan and Stephen N. Freund doi:10.1145/1839676.1839699 Abstract Multithreaded programs are notoriously prone to race conditions. Prior

More information

IBM Research - Tokyo 数理 計算科学特論 C プログラミング言語処理系の最先端実装技術. Trace Compilation IBM Corporation

IBM Research - Tokyo 数理 計算科学特論 C プログラミング言語処理系の最先端実装技術. Trace Compilation IBM Corporation 数理 計算科学特論 C プログラミング言語処理系の最先端実装技術 Trace Compilation Trace JIT vs. Method JIT https://twitter.com/yukihiro_matz/status/533775624486133762 2 Background: Trace-based Compilation Using a Trace, a hot path identified

More information

MultiJav: A Distributed Shared Memory System Based on Multiple Java Virtual Machines. MultiJav: Introduction

MultiJav: A Distributed Shared Memory System Based on Multiple Java Virtual Machines. MultiJav: Introduction : A Distributed Shared Memory System Based on Multiple Java Virtual Machines X. Chen and V.H. Allan Computer Science Department, Utah State University 1998 : Introduction Built on concurrency supported

More information

A Causality-Based Runtime Check for (Rollback) Atomicity

A Causality-Based Runtime Check for (Rollback) Atomicity A Causality-Based Runtime Check for (Rollback) Atomicity Serdar Tasiran Koc University Istanbul, Turkey Tayfun Elmas Koc University Istanbul, Turkey RV 2007 March 13, 2007 Outline This paper: Define rollback

More information

ANALYZING THREADS FOR SHARED MEMORY CONSISTENCY BY ZEHRA NOMAN SURA

ANALYZING THREADS FOR SHARED MEMORY CONSISTENCY BY ZEHRA NOMAN SURA ANALYZING THREADS FOR SHARED MEMORY CONSISTENCY BY ZEHRA NOMAN SURA B.E., Nagpur University, 1998 M.S., University of Illinois at Urbana-Champaign, 2001 DISSERTATION Submitted in partial fulfillment of

More information

Multicore Programming Java Memory Model

Multicore Programming Java Memory Model p. 1 Multicore Programming Java Memory Model Peter Sewell Jaroslav Ševčík Tim Harris University of Cambridge MSR with thanks to Francesco Zappa Nardelli, Susmit Sarkar, Tom Ridge, Scott Owens, Magnus O.

More information

Coherence, Consistency, and Déjà vu: Memory Hierarchies in the Era of Specialization

Coherence, Consistency, and Déjà vu: Memory Hierarchies in the Era of Specialization Coherence, Consistency, and Déjà vu: Memory Hierarchies in the Era of Specialization Sarita Adve University of Illinois at Urbana-Champaign sadve@illinois.edu w/ Johnathan Alsop, Rakesh Komuravelli, Matt

More information

Efficient Processor Support for DRFx, a Memory Model with Exceptions

Efficient Processor Support for DRFx, a Memory Model with Exceptions Efficient Processor Support for DRFx, a Memory Model with Exceptions Abhayendra Singh Daniel Marino Satish Narayanasamy Todd Millstein Madanlal Musuvathi University of Michigan, Ann Arbor University of

More information

Language- Level Memory Models

Language- Level Memory Models Language- Level Memory Models A Bit of History Here is a new JMM [5]! 2000 Meyers & Alexandrescu DCL is not portable in C++ [3]. Manson et. al New shiny C++ memory model 2004 2008 2012 2002 2006 2010 2014

More information

Phase-based Adaptive Recompilation in a JVM

Phase-based Adaptive Recompilation in a JVM Phase-based Adaptive Recompilation in a JVM Dayong Gu Clark Verbrugge Sable Research Group, School of Computer Science McGill University, Montréal, Canada {dgu1, clump}@cs.mcgill.ca April 7, 2008 Sable

More information

Software Transactions: A Programming-Languages Perspective

Software Transactions: A Programming-Languages Perspective Software Transactions: A Programming-Languages Perspective Dan Grossman University of Washington 5 December 2006 A big deal Research on software transactions broad Programming languages PLDI, POPL, ICFP,

More information

Towards Garbage Collection Modeling

Towards Garbage Collection Modeling Towards Garbage Collection Modeling Peter Libič Petr Tůma Department of Distributed and Dependable Systems Faculty of Mathematics and Physics, Charles University Malostranské nám. 25, 118 00 Prague, Czech

More information

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, lazy implementation 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end

More information

Types for Precise Thread Interference

Types for Precise Thread Interference Types for Precise Thread Interference Jaeheon Yi, Tim Disney, Cormac Flanagan UC Santa Cruz Stephen Freund Williams College October 23, 2011 2 Multiple Threads Single Thread is a non-atomic read-modify-write

More information

Efficient Context Sensitivity for Dynamic Analyses via Calling Context Uptrees and Customized Memory Management

Efficient Context Sensitivity for Dynamic Analyses via Calling Context Uptrees and Customized Memory Management to Reuse * * Evaluated * OOPSLA * Artifact * AEC Efficient Context Sensitivity for Dynamic Analyses via Calling Context Uptrees and Customized Memory Management Jipeng Huang Michael D. Bond Ohio State

More information

Duet: Static Analysis for Unbounded Parallelism

Duet: Static Analysis for Unbounded Parallelism Duet: Static Analysis for Unbounded Parallelism Azadeh Farzan and Zachary Kincaid University of Toronto Abstract. Duet is a static analysis tool for concurrent programs in which the number of executing

More information

An introduction to weak memory consistency and the out-of-thin-air problem

An introduction to weak memory consistency and the out-of-thin-air problem An introduction to weak memory consistency and the out-of-thin-air problem Viktor Vafeiadis Max Planck Institute for Software Systems (MPI-SWS) CONCUR, 7 September 2017 Sequential consistency 2 Sequential

More information

Maximal Causality Reduction for TSO and PSO. Shiyou Huang Jeff Huang Parasol Lab, Texas A&M University

Maximal Causality Reduction for TSO and PSO. Shiyou Huang Jeff Huang Parasol Lab, Texas A&M University Maximal Causality Reduction for TSO and PSO Shiyou Huang Jeff Huang huangsy@tamu.edu Parasol Lab, Texas A&M University 1 A Real PSO Bug $12 million loss of equipment curpos = new Point(1,2); class Point

More information

Chimera: Hybrid Program Analysis for Determinism

Chimera: Hybrid Program Analysis for Determinism Chimera: Hybrid Program Analysis for Determinism Dongyoon Lee, Peter Chen, Jason Flinn, Satish Narayanasamy University of Michigan, Ann Arbor - 1 - * Chimera image from http://superpunch.blogspot.com/2009/02/chimera-sketch.html

More information

Process Synchronization. Mehdi Kargahi School of ECE University of Tehran Spring 2008

Process Synchronization. Mehdi Kargahi School of ECE University of Tehran Spring 2008 Process Synchronization Mehdi Kargahi School of ECE University of Tehran Spring 2008 Producer-Consumer (Bounded Buffer) Producer Consumer Race Condition Producer Consumer Critical Sections Structure of

More information

NOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.

NOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem. Memory Consistency Model Background for Debate on Memory Consistency Models CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley for a SAS specifies constraints on the order in which

More information

Program logics for relaxed consistency

Program logics for relaxed consistency Program logics for relaxed consistency UPMARC Summer School 2014 Viktor Vafeiadis Max Planck Institute for Software Systems (MPI-SWS) 1st Lecture, 28 July 2014 Outline Part I. Weak memory models 1. Intro

More information

DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages

DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages Daniel Marino Abhayendra Singh Todd Millstein Madanlal Musuvathi Satish Narayanasamy University of California, Los Angeles

More information

Program Calling Context. Motivation

Program Calling Context. Motivation Program alling ontext Motivation alling context enhances program understanding and dynamic analyses by providing a rich representation of program location ollecting calling context can be expensive The

More information

Optimistic Shared Memory Dependence Tracing

Optimistic Shared Memory Dependence Tracing Optimistic Shared Memory Dependence Tracing Yanyan Jiang1, Du Li2, Chang Xu1, Xiaoxing Ma1 and Jian Lu1 Nanjing University 2 Carnegie Mellon University 1 powered by Understanding Non-determinism Concurrent

More information

Part IV: Exploiting Purity for Atomicity

Part IV: Exploiting Purity for Atomicity Busy Acquire Part IV: Exploiting Purity for Atomicity atomic void busy_acquire() { if (CAS(m,0,1)) break; CAS(m,0,1) CAS(m,0,1) CAS(m,0,1) (fails) (fails) (succeeds) boolean b[max]; // b[i]==true iff block

More information

Deterministic Parallel Programming

Deterministic Parallel Programming Deterministic Parallel Programming Concepts and Practices 04/2011 1 How hard is parallel programming What s the result of this program? What is data race? Should data races be allowed? Initially x = 0

More information

A Scalable Lock Manager for Multicores

A Scalable Lock Manager for Multicores A Scalable Lock Manager for Multicores Hyungsoo Jung Hyuck Han Alan Fekete NICTA Samsung Electronics University of Sydney Gernot Heiser NICTA Heon Y. Yeom Seoul National University @University of Sydney

More information

Weak Memory Models: an Operational Theory

Weak Memory Models: an Operational Theory Opening Weak Memory Models: an Operational Theory INRIA Sophia Antipolis 9th June 2008 Background on weak memory models Memory models, what are they good for? Hardware optimizations Contract between hardware

More information

Deterministic Replay and Data Race Detection for Multithreaded Programs

Deterministic Replay and Data Race Detection for Multithreaded Programs Deterministic Replay and Data Race Detection for Multithreaded Programs Dongyoon Lee Computer Science Department - 1 - The Shift to Multicore Systems 100+ cores Desktop/Server 8+ cores Smartphones 2+ cores

More information

Combining the Logical and the Probabilistic in Program Analysis. Xin Zhang Xujie Si Mayur Naik University of Pennsylvania

Combining the Logical and the Probabilistic in Program Analysis. Xin Zhang Xujie Si Mayur Naik University of Pennsylvania Combining the Logical and the Probabilistic in Program Analysis Xin Zhang Xujie Si Mayur Naik University of Pennsylvania What is Program Analysis? int f(int i) {... } Program Analysis x may be null!...

More information

Phosphor: Illuminating Dynamic. Data Flow in Commodity JVMs

Phosphor: Illuminating Dynamic. Data Flow in Commodity JVMs Phosphor: Illuminating Dynamic Fork me on Github Data Flow in Commodity JVMs Jonathan Bell and Gail Kaiser Columbia University, New York, NY USA Dynamic Data Flow Analysis: Taint Tracking Output that is

More information

The New Java Technology Memory Model

The New Java Technology Memory Model The New Java Technology Memory Model java.sun.com/javaone/sf Jeremy Manson and William Pugh http://www.cs.umd.edu/~pugh 1 Audience Assume you are familiar with basics of Java technology-based threads (

More information

Sound Thread Local Analysis for Lockset-Based Dynamic Data Race Detection

Sound Thread Local Analysis for Lockset-Based Dynamic Data Race Detection Wellesley College Wellesley College Digital Scholarship and Archive Honors Thesis Collection 2017 Sound Thread Local Analysis for Lockset-Based Dynamic Data Race Detection Samuila Mincheva sminchev@wellesley.edu

More information

Potential violations of Serializability: Example 1

Potential violations of Serializability: Example 1 CSCE 6610:Advanced Computer Architecture Review New Amdahl s law A possible idea for a term project Explore my idea about changing frequency based on serial fraction to maintain fixed energy or keep same

More information

AtomChase: Directed Search Towards Atomicity Violations

AtomChase: Directed Search Towards Atomicity Violations AtomChase: Directed Search Towards Atomicity Violations Mahdi Eslamimehr, Mohsen Lesani Viewpoints Research Institute, 1025 Westwood Blvd 2nd flr, Los Angeles, CA 90024 t: (310) 208-0524 AtomChase: Directed

More information

Designing Memory Consistency Models for. Shared-Memory Multiprocessors. Sarita V. Adve

Designing Memory Consistency Models for. Shared-Memory Multiprocessors. Sarita V. Adve Designing Memory Consistency Models for Shared-Memory Multiprocessors Sarita V. Adve Computer Sciences Department University of Wisconsin-Madison The Big Picture Assumptions Parallel processing important

More information

Dynamic Analysis of Inefficiently-Used Containers

Dynamic Analysis of Inefficiently-Used Containers Dynamic Analysis of Inefficiently-Used Containers Shengqian (Victor) Yang 1 Dacong Yan 1 Guoqing Xu 2 Atanas Rountev 1 1 Ohio State University 2 University of California, Irvine PRESTO: Program Analyses

More information

Effective Performance Measurement and Analysis of Multithreaded Applications

Effective Performance Measurement and Analysis of Multithreaded Applications Effective Performance Measurement and Analysis of Multithreaded Applications Nathan Tallent John Mellor-Crummey Rice University CSCaDS hpctoolkit.org Wanted: Multicore Programming Models Simple well-defined

More information

Static Conflict Analysis for Multi-Threaded Object Oriented Programs

Static Conflict Analysis for Multi-Threaded Object Oriented Programs Static Conflict Analysis for Multi-Threaded Object Oriented Programs Christoph von Praun and Thomas Gross Presented by Andrew Tjang Authors Von Praun Recent PhD Currently at IBM (yorktown( heights) Compilers

More information

RELAXED CONSISTENCY 1

RELAXED CONSISTENCY 1 RELAXED CONSISTENCY 1 RELAXED CONSISTENCY Relaxed Consistency is a catch-all term for any MCM weaker than TSO GPUs have relaxed consistency (probably) 2 XC AXIOMS TABLE 5.5: XC Ordering Rules. An X Denotes

More information