Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization
|
|
- Mark Peters
- 5 years ago
- Views:
Transcription
1 Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization Aritra Sengupta, Man Cao, Michael D. Bond and Milind Kulkarni 1 PPPJ 2015, Melbourne, Florida, USA
2 Programming Language Semantics? Data Races Java provides weak semantics 2
3 Weak Semantics T1 A a = null; boolean init = false; T2 a = new A(); init = true; if (init) a.field++; 3
4 Weak Semantics T1 a = new A(); init = true; No data dependence T2 if (init) a.field++; 4
5 Weak Semantics T1 A a = null; boolean init = false; T2 a = new A(); init = true; if (init) a.field++; 5
6 Weak Semantics T1 T2 init = true; if (init) a.field++; a = new A(); 6
7 Weak Semantics T1 T2 Null dereference init = true; if (init) a.field++; a = new A(); 7
8 Java Memory Model JMM (Manson et al., POPL, 2005) variant of DRF0 (Adve and Hill, ISCA, 1990) Atomicity of synchronization-free regions for datarace-free programs Data races: weak semantics 8
9 Need for Stronger Memory Models The inability to define reasonable semantics for programs with data races is not just a theoretical shortcoming, but a fundamental hole in the foundation of our languages and systems Give better semantics to programs with data races Stronger memory models Adve and Boehm, CACM,
10 Run-time cost Memory Models: Run-time cost vs Strength Synchronization-free region serializability 1 SC Statically Bounded Region Serializability DRF0 Strength 1. Ouyang et al.... and region serializability for all. In HotPar,
11 Run-time cost Memory Models: Run-time cost vs Strength Synchronization-free region serializability SC Statically Bounded Region Serializability 2 DRF0 Strength Sengupta et al. Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability. ASPLOS, 2015.
12 Statically Bounded Region Serializability (SBRS) Compiler demarcated regions execute atomically Execution is an interleaving of these regions Sengupta et al., ASPLOS,
13 Statically Bounded Region Serializability (SBRS) acq(lock) Synchronization operations Method calls Loop backedges methodcall() rel(lock) 13
14 Statically Bounded Region Serializability (SBRS) acq(lock) methodcall() rel(lock) 14
15 Statically Bounded Region Serializability (SBRS) Statically and dynamically bounded Loop backedges 15
16 Overview Enforcement of SBRS with dynamic locks Precise dynamic locks: EnfoRSer-D (our prior work), low contention Enforcement of SBRS with static locks Imprecise static locks: EnfoRSer-S, low instrumentation overhead Hybridization of locks EnfoRSer-H: static and dynamic locks, right locks for right sites? Results EnfoRSer-H does at least as well as either. Some cases significant benefit 16
17 Enforcement of SBRS Prevent two concurrent accesses to the same memory location where one is a write 17
18 Enforcement of SBRS Prevent two Prevent concurrent regions accesses that to have the same memory location races where from one running is a write concurrently! 18
19 Enforcement of SBRS Acquire locks before each memory access Acquire locks at the start of the region 19
20 Enforcement of SBRS Acquire locks before each memory access Precise object locks: dynamic locks Acquire locks at the start of the region 20
21 Enforcement of SBRS Acquire locks before each memory access Acquire locks at the start of the region Region level locks: statically chosen for an access site 21
22 EnfoRSer-D Precise dynamic locks Per-access locks with retry mechanism Compiler Transformations: Speculative execution 22
23 SBRS with Dynamic Locks Y = = X Dynamic per-access locks Z = 23
24 SBRS with Dynamic Locks Y = Program access = X Z = 24
25 SBRS with Dynamic Locks Y = = X Ownership transferred Z = 25
26 SBRS with Dynamic Locks Y = Dynamic locks: precise location, hence no reasoning about races statically! = X Ownership transferred Z = 26
27 SBRS with Dynamic Locks Precise conflict detectionefficient for high-conflicting regions Y = High instrumentation cost at each access = X High overhead for lowconflicting programs Ownership transferred Z = 27
28 Experimental Methodology Benchmarks DaCapo 2006, 9.12-bach Fixed-workload versions of SPECjbb2000 and SPECjbb2005 Platform Intel Xeon system: 32 cores 28
29 Implementation and Evaluation Developed in Jikes RVM Code publicly available on Jikes RVM Research Archive 29
30 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 30
31 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D % overhead on average hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 31
32 %overhead over unmodified JVM Run-time Performance EnfoRSer-D Precise conflict detection but high instrumentation overhead! Complex compiler transformations (additional code) Can we do better? 15-5 hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 32
33 EnfoRSer with Static Locks Reduce the instrumentation overhead of EnfoRSer-D Less complex code generation 33
34 EnfoRSer-S Static region level locks Racing sites acquire same lock Coarsened locks to reduce instrumentation overhead 34
35 SBRS with Locks on Static Sites L0 Static locks: racy sites protected by same lock L1 L2 Static Region Locks Y = = X Z = 35
36 SBRS with Locks on Static Sites L0 L1 L2 All locks acquired before access Y = = X Z = 36
37 SBRS with Locks on Static Sites L0 L1 L2 Y = Ownership transferred = X Z = 37
38 SBRS with Locks on Static Sites L0 L1 Imprecise and does not lower instrumentation overhead! L2 Y = Ownership transferred = X Z = 38
39 SBRS with Locks on Static Sites L012 Coarsened single lock Y = = X Z = 39
40 SBRS with Locks on Static Sites Low instrumentation L012 overhead Coarsened single lock Imprecise Y = conflict detection = X Z = 40
41 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 41
42 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 42
43 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 RACE 43
44 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 RACE Naik et al. s 2006 race detection algorithm, Chord 44
45 EnfoRSer-S R1 R2 R3 R4 L3 s0 s2 s5 s7 s3 L2 s8 s1 s4 s6 L4 L0 L1 45
46 EnfoRSer-S R1 R2 R3 R4 L0 L0 L1 L3 s0 s2 s5 s7 s3 s8 s1 s4 s6 46
47 EnfoRSer-S R1 R2 R3 R4 L0 L0 L1 L3 L1 L1 L2 L2 L4 s0 s2 s5 s7 s3 s8 s1 s4 s6 47
48 EnfoRSer-S R1 R2 R3 R4 L0 L0 L1 L3 L1 L1 L2 L2 L4 s0 s2 Does not reduce s3 instrumentation overhead! s5 s7 s8 s1 s4 s6 48
49 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 49
50 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 Same Region Static Locks (SRSL) RACE 50
51 EnfoRSer-S R1 R2 R3 R4 s0 s2 s5 s7 s3 s8 s1 s4 s6 Same Region Static Locks (SRSL) RACE L0 L1 51
52 EnfoRSer-S R1 R2 R3 R4 L0 L0 L0 L1 s0 s2 s5 s7 s3 s8 s1 s4 s6 52
53 EnfoRSer-S R1 R2 R3 R4 s0 L0 Reduces instrumentation L0 L0 L1 overhead s2 s5 s7 Increases contention s3 s8 s1 s4 s6 53
54 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D EnfoRSer-S 11,000 3,500 12,000 33,000 10,000 4,300 3, , hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 54
55 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D EnfoRSer-S 11,000 3,500 12,000 33,000 10,000 4,300 3, , % overhead on average hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 55
56 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D EnfoRSer-S 11,000 3,500 12,000 33,000 10,000 4,300 3, , EnfoRSer-S performs better 2600% overhead on average 15-5 hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 56
57 Hybridizing Locks High contention sites: Precise dynamic locks (precise conflict detection) Low contention sites: Single static lock (low instrumentation overhead) 57
58 EnfoRSer-H Static locks to reduce instrumentation Dynamic locks for precise conflict detection Correctly and efficiently combine: best of both 58
59 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D EnfoRSer-S EnfoRSer-H 11,000 3,500 12,000 33,000 10,000 4,300 3, , % overhead on average hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 59
60 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D EnfoRSer-S EnfoRSer-H 11,000 3,500 12,000 33,000 10,000 4,300 3, , Provides nearly the best of either approaches! % reduction in overhead 15-5 hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 60
61 %overhead over unmodified JVM Run-time Performance EnfoRSer-D EnfoRSer-S EnfoRSer-H 11,000 3,500 12,000 33,000 10,000 4,300 3, ,600 49% reduction in lock acquires and not increasing contention! % reduction in overhead hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 61
62 %overhead over unmodified JVM Run-time Performance 115 EnfoRSer-D EnfoRSer-S EnfoRSer-H 11,000 3,500 12,000 33,000 10,000 4,300 3, , % reduction in lock acquires Dramatically improves on both when hybridization helps! 87% reduction in overhead hsqldb6 xalan6 avrora9 luindex9 lusearch6 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean 62
63 Hybridizing Locks Right synchronization for right program sites Combining different synchronization mechanisms Best of different synchronization mechanisms 63
64 EnfoRSer-H Costbenefit model Profiling Right locks for right sites Assignment algorithm Combine static and dynamic locks 64
65 Two-iteration Methodology Program execution First iteration Profiling (use RACE relation) Assignment Algorithm Between iterations Compute SRSL and estimate cost Program exectuion Second iteration Use hybridized instrumentation 65
66 Assignment Algorithm 66
67 Profiled data Compute initial cost Change a static lock to dynamic lock Change all its racing sites to dynamic locks Profiled data Recompute cost current cost < previous cost No Yes Retain state Revert to previous state 67
68 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 estimatedcost = N i estimatecost(ri) ; 68
69 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 estimatedcost = N i estimatecost(ri) ; 1. Conflicts on each site 2. Lock acquires on each site 69
70 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 70
71 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 71
72 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 If (current cost < previous cost) // retain state 72
73 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 Switch on next site 73
74 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 74
75 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 If (current cost < previous cost) // retain state 75
76 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 Switch on next site 76
77 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 estimatedcost = N i estimatecost(ri) ; current cost > previous cost // revert state 77
78 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 78
79 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 s1, s9 79
80 Assignment Algorithm R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 Final State! 80
81 Lock Assignment R1 R2 R3 R4 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 L1 81
82 Lock Assignment R1 R2 R3 R4 L1 L1 L1 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 L1 82
83 Lock Assignment R1 R2 R3 R4 L1 L1 L1 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 L1 83
84 Lock Assignment R1 R2 R3 R4 L1 L1 L1 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 84
85 Lock Assignment R1 R2 R3 R4 L1 L1 L1 s0 s2 s4 s7 s1 s3 s5 s8 s9 s6 Per-object locks and transformations 85
86 Related Work Use of static locks Chimera, Lee et al., PLDI 2012 Use of static analysis Static Conflict Analysis for Multi-Threaded Object-Oriented Programs, Von Praun and Gross, PLDI, Goldilocks, Elmas et al., PLDI Red Card, Flanagan and Freund, ECOOP 2013 Hybridizing locks Hybrid Tracking, Cao et al., WODET
87 Conclusion Combine synchronization mechanisms Best of different kinds of synchronization Efficient enforcement of SBRS Static locks Dynamic locks Precise conflict detection Low instrumentation overhead Select the best for a region 87
Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability
Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Programming
More informationLegato: Bounded Region Serializability Using Commodity Hardware Transactional Memory. Aritra Sengupta Man Cao Michael D. Bond and Milind Kulkarni
Legato: Bounded Region Serializability Using Commodity Hardware al Memory Aritra Sengupta Man Cao Michael D. Bond and Milind Kulkarni 1 Programming Language Semantics? Data Races C++ no guarantee of semantics
More informationToward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization
Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization Aritra Sengupta Man Cao Michael D. Bond Milind Kulkarni Ohio State University Purdue University {sengupta,caoma,mikebond}@cse.ohio-state.edu
More informationLightweight Data Race Detection for Production Runs
Lightweight Data Race Detection for Production Runs Swarnendu Biswas, UT Austin Man Cao, Ohio State University Minjia Zhang, Microsoft Research Michael D. Bond, Ohio State University Benjamin P. Wood,
More informationHybrid Static Dynamic Analysis for Statically Bounded Region Serializability
Hybrid Static Dynamic Analysis for Statically Bounded Region Serializability Aritra Sengupta Swarnendu Biswas Minjia Zhang Michael D. Bond Milind Kulkarni Ohio State University Purdue University {sengupta,biswass,zhanminj,mikebond@cse.ohio-state.edu
More informationLegato: End-to-End Bounded Region Serializability Using Commodity Hardware Transactional Memory
Legato: End-to-End Bounded Region Serializability Using Commodity Hardware Transactional Memory Aritra Sengupta Man Cao Michael D. Bond Milind Kulkarni Ohio State University (USA) Purdue University (USA)
More informationValor: Efficient, Software-Only Region Conflict Exceptions
Valor: Efficient, Software-Only Region Conflict Exceptions Swarnendu Biswas PhD Student, Ohio State University biswass@cse.ohio-state.edu Abstract Data races complicate programming language semantics,
More informationAvoiding Consistency Exceptions Under Strong Memory Models
Avoiding Consistency Exceptions Under Strong Memory Models Minjia Zhang Microsoft Research (USA) minjiaz@microsoft.com Swarnendu Biswas University of Texas at Austin (USA) sbiswas@ices.utexas.edu Michael
More informationDoubleChecker: Efficient Sound and Precise Atomicity Checking
DoubleChecker: Efficient Sound and Precise Atomicity Checking Swarnendu Biswas, Jipeng Huang, Aritra Sengupta, and Michael D. Bond The Ohio State University PLDI 2014 Impact of Concurrency Bugs Impact
More informationTowards Parallel, Scalable VM Services
Towards Parallel, Scalable VM Services Kathryn S McKinley The University of Texas at Austin Kathryn McKinley Towards Parallel, Scalable VM Services 1 20 th Century Simplistic Hardware View Faster Processors
More information11/19/2013. Imperative programs
if (flag) 1 2 From my perspective, parallelism is the biggest challenge since high level programming languages. It s the biggest thing in 50 years because industry is betting its future that parallel programming
More informationLightweight Data Race Detection for Production Runs
Lightweight Data Race Detection for Production Runs Ohio State CSE Technical Report #OSU-CISRC-1/15-TR01; last updated August 2016 Swarnendu Biswas Ohio State University biswass@cse.ohio-state.edu Michael
More informationNDSeq: Runtime Checking for Nondeterministic Sequential Specs of Parallel Correctness
EECS Electrical Engineering and Computer Sciences P A R A L L E L C O M P U T I N G L A B O R A T O R Y NDSeq: Runtime Checking for Nondeterministic Sequential Specs of Parallel Correctness Jacob Burnim,
More informationLightweight Data Race Detection for Production Runs
Lightweight Data Race Detection for Production Runs Swarnendu Biswas University of Texas at Austin (USA) sbiswas@ices.utexas.edu Man Cao Ohio State University (USA) caoma@cse.ohio-state.edu Minjia Zhang
More informationAdaptive Multi-Level Compilation in a Trace-based Java JIT Compiler
Adaptive Multi-Level Compilation in a Trace-based Java JIT Compiler Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu and Toshio Nakatani IBM Research Tokyo IBM Research T.J. Watson Research Center October
More informationValor: Efficient, Software-Only Region Conflict Exceptions
to Reuse * * Evaluated * OOPSLA * Artifact * AEC Valor: Efficient, Software-Only Region Conflict Exceptions Swarnendu Biswas Minjia Zhang Michael D. Bond Brandon Lucia Ohio State University (USA) Carnegie
More informationPACER: Proportional Detection of Data Races
PACER: Proportional Detection of Data Races Michael D. Bond Katherine E. Coons Kathryn S. McKinley Department of Computer Science, The University of Texas at Austin {mikebond,coonske,mckinley}@cs.utexas.edu
More informationHow s the Parallel Computing Revolution Going? Towards Parallel, Scalable VM Services
How s the Parallel Computing Revolution Going? Towards Parallel, Scalable VM Services Kathryn S McKinley The University of Texas at Austin Kathryn McKinley Towards Parallel, Scalable VM Services 1 20 th
More informationProbabilistic Calling Context
Probabilistic Calling Context Michael D. Bond Kathryn S. McKinley University of Texas at Austin Why Context Sensitivity? Static program location not enough at com.mckoi.db.jdbcserver.jdbcinterface.execquery():213
More informationA Dynamic Evaluation of the Precision of Static Heap Abstractions
A Dynamic Evaluation of the Precision of Static Heap Abstractions OOSPLA - Reno, NV October 20, 2010 Percy Liang Omer Tripp Mayur Naik Mooly Sagiv UC Berkeley Tel-Aviv Univ. Intel Labs Berkeley Tel-Aviv
More informationThe Java Memory Model
Jeremy Manson 1, William Pugh 1, and Sarita Adve 2 1 University of Maryland 2 University of Illinois at Urbana-Champaign Presented by John Fisher-Ogden November 22, 2005 Outline Introduction Sequential
More informationEECS 570 Lecture 13. Directory & Optimizations. Winter 2018 Prof. Satish Narayanasamy
Directory & Optimizations Winter 2018 Prof. Satish Narayanasamy http://www.eecs.umich.edu/courses/eecs570/ Slides developed in part by Profs. Adve, Falsafi, Hill, Lebeck, Martin, Narayanasamy, Nowatzyk,
More informationHybrid Static Dynamic Analysis for Region Serializability
Hybrid Static Dynamic Analysis for Region Serializability Aritra Sengupta Swarnendu Biswas Minjia Zhang Michael D. Bond Milind Kulkarni + Ohio State University + Purdue University {sengupta,biswass,zhanminj,mikebond@cse.ohio-state.edu,
More informationTrace-based JIT Compilation
Trace-based JIT Compilation Hiroshi Inoue, IBM Research - Tokyo 1 Trace JIT vs. Method JIT https://twitter.com/yukihiro_matz/status/533775624486133762 2 Background: Trace-based Compilation Using a Trace,
More informationHigh-level languages
High-level languages High-level languages are not immune to these problems. Actually, the situation is even worse: the source language typically operates over mixed-size values (multi-word and bitfield);
More informationChí Cao Minh 28 May 2008
Chí Cao Minh 28 May 2008 Uniprocessor systems hitting limits Design complexity overwhelming Power consumption increasing dramatically Instruction-level parallelism exhausted Solution is multiprocessor
More informationAtomicity for Reliable Concurrent Software. Race Conditions. Motivations for Atomicity. Race Conditions. Race Conditions. 1. Beyond Race Conditions
Towards Reliable Multithreaded Software Atomicity for Reliable Concurrent Software Cormac Flanagan UC Santa Cruz Joint work with Stephen Freund Shaz Qadeer Microsoft Research 1 Multithreaded software increasingly
More informationDynamic Monitoring Tool based on Vector Clocks for Multithread Programs
, pp.45-49 http://dx.doi.org/10.14257/astl.2014.76.12 Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs Hyun-Ji Kim 1, Byoung-Kwi Lee 2, Ok-Kyoon Ha 3, and Yong-Kee Jun 1 1 Department
More informationLu Fang, University of California, Irvine Liang Dou, East China Normal University Harry Xu, University of California, Irvine
Lu Fang, University of California, Irvine Liang Dou, East China Normal University Harry Xu, University of California, Irvine 2015-07-09 Inefficient code regions [G. Jin et al. PLDI 2012] Inefficient code
More informationFastTrack: Efficient and Precise Dynamic Race Detection (+ identifying destructive races)
FastTrack: Efficient and Precise Dynamic Race Detection (+ identifying destructive races) Cormac Flanagan UC Santa Cruz Stephen Freund Williams College Multithreading and Multicore! Multithreaded programming
More informationReference Object Processing in On-The-Fly Garbage Collection
Reference Object Processing in On-The-Fly Garbage Collection Tomoharu Ugawa, Kochi University of Technology Richard Jones, Carl Ritson, University of Kent Weak Pointers Weak pointers are a mechanism to
More informationCompiler Techniques For Memory Consistency Models
Compiler Techniques For Memory Consistency Models Students: Xing Fang (Purdue), Jaejin Lee (Seoul National University), Kyungwoo Lee (Purdue), Zehra Sura (IBM T.J. Watson Research Center), David Wong (Intel
More informationJava Memory Model. Jian Cao. Department of Electrical and Computer Engineering Rice University. Sep 22, 2016
Java Memory Model Jian Cao Department of Electrical and Computer Engineering Rice University Sep 22, 2016 Content Introduction Java synchronization mechanism Double-checked locking Out-of-Thin-Air violation
More informationCalFuzzer: An Extensible Active Testing Framework for Concurrent Programs Pallavi Joshi 1, Mayur Naik 2, Chang-Seo Park 1, and Koushik Sen 1
CalFuzzer: An Extensible Active Testing Framework for Concurrent Programs Pallavi Joshi 1, Mayur Naik 2, Chang-Seo Park 1, and Koushik Sen 1 1 University of California, Berkeley, USA {pallavi,parkcs,ksen}@eecs.berkeley.edu
More informationAtomicity via Source-to-Source Translation
Atomicity via Source-to-Source Translation Benjamin Hindman Dan Grossman University of Washington 22 October 2006 Atomic An easier-to-use and harder-to-implement primitive void deposit(int x){ synchronized(this){
More informationChronicler: Lightweight Recording to Reproduce Field Failures
Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and Gail Kaiser Columbia University, New York, NY USA What happens when software crashes? Stack Traces Aren
More informationMAMA: Mostly Automatic Management of Atomicity
MAMA: Mostly Automatic Management of Atomicity Christian DeLozier Joseph Devietti Milo M. K. Martin Computer and Information Science Department, University of Pennsylvania {delozier, devietti, milom}@cis.upenn.edu
More informationProgramming Language Memory Models: What do Shared Variables Mean?
Programming Language Memory Models: What do Shared Variables Mean? Hans-J. Boehm 10/25/2010 1 Disclaimers: This is an overview talk. Much of this work was done by others or jointly. I m relying particularly
More informationA Case for System Support for Concurrency Exceptions
A Case for System Support for Concurrency Exceptions Luis Ceze, Joseph Devietti, Brandon Lucia and Shaz Qadeer University of Washington {luisceze, devietti, blucia0a}@cs.washington.edu Microsoft Research
More informationTaming release-acquire consistency
Taming release-acquire consistency Ori Lahav Nick Giannarakis Viktor Vafeiadis Max Planck Institute for Software Systems (MPI-SWS) POPL 2016 Weak memory models Weak memory models provide formal sound semantics
More informationNondeterminism is Unavoidable, but Data Races are Pure Evil
Nondeterminism is Unavoidable, but Data Races are Pure Evil Hans-J. Boehm HP Labs 5 November 2012 1 Low-level nondeterminism is pervasive E.g. Atomically incrementing a global counter is nondeterministic.
More informationAtomicity. Experience with Calvin. Tools for Checking Atomicity. Tools for Checking Atomicity. Tools for Checking Atomicity
1 2 Atomicity The method inc() is atomic if concurrent ths do not interfere with its behavior Guarantees that for every execution acq(this) x t=i y i=t+1 z rel(this) acq(this) x y t=i i=t+1 z rel(this)
More informationC. Flanagan Dynamic Analysis for Atomicity 1
1 Atomicity The method inc() is atomic if concurrent threads do not interfere with its behavior Guarantees that for every execution acq(this) x t=i y i=t+1 z rel(this) acq(this) x y t=i i=t+1 z rel(this)
More informationPart 3: Beyond Reduction
Lightweight Analyses For Reliable Concurrency Part 3: Beyond Reduction Stephen Freund Williams College joint work with Cormac Flanagan (UCSC), Shaz Qadeer (MSR) Busy Acquire Busy Acquire void busy_acquire()
More informationA Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler
A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu and Toshio Nakatani IBM Research Tokyo IBM Research T.J. Watson Research Center April
More informationCooperative Concurrency for a Multicore World
Cooperative Concurrency for a Multicore World Stephen Freund Williams College Cormac Flanagan, Tim Disney, Caitlin Sadowski, Jaeheon Yi, UC Santa Cruz Controlling Thread Interference: #1 Manually x = 0;
More informationIdentifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters. Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo
Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo June 15, 2012 ISMM 2012 at Beijing, China Motivation
More informationMotivation & examples Threads, shared memory, & synchronization
1 Motivation & examples Threads, shared memory, & synchronization How do locks work? Data races (a lower level property) How do data race detectors work? Atomicity (a higher level property) Concurrency
More informationStatic and Dynamic Program Analysis: Synergies and Applications
Static and Dynamic Program Analysis: Synergies and Applications Mayur Naik Intel Labs, Berkeley CS 243, Stanford University March 9, 2011 Today s Computing Platforms Trends: parallel cloud mobile Traits:
More informationSemantic Atomicity for Multithreaded Programs!
P A R A L L E L C O M P U T I N G L A B O R A T O R Y Semantic Atomicity for Multithreaded Programs! Jacob Burnim, George Necula, Koushik Sen! Parallel Computing Laboratory! University of California, Berkeley!
More informationEnforcing Textual Alignment of
Parallel Hardware Parallel Applications IT industry (Silicon Valley) Parallel Software Users Enforcing Textual Alignment of Collectives using Dynamic Checks and Katherine Yelick UC Berkeley Parallel Computing
More informationEfficient Runtime Tracking of Allocation Sites in Java
Efficient Runtime Tracking of Allocation Sites in Java Rei Odaira, Kazunori Ogata, Kiyokuni Kawachiya, Tamiya Onodera, Toshio Nakatani IBM Research - Tokyo Why Do You Need Allocation Site Information?
More informationA Parameterized Type System for Race-Free Java Programs
A Parameterized Type System for Race-Free Java Programs Chandrasekhar Boyapati Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology {chandra, rinard@lcs.mit.edu Data races
More informationFoundations of the C++ Concurrency Memory Model
Foundations of the C++ Concurrency Memory Model John Mellor-Crummey and Karthik Murthy Department of Computer Science Rice University johnmc@rice.edu COMP 522 27 September 2016 Before C++ Memory Model
More informationContinuous Object Access Profiling and Optimizations to Overcome the Memory Wall and Bloat
Continuous Object Access Profiling and Optimizations to Overcome the Memory Wall and Bloat Rei Odaira, Toshio Nakatani IBM Research Tokyo ASPLOS 2012 March 5, 2012 Many Wasteful Objects Hurt Performance.
More information7/6/2015. Motivation & examples Threads, shared memory, & synchronization. Imperative programs
Motivation & examples Threads, shared memory, & synchronization How do locks work? Data races (a lower level property) How do data race detectors work? Atomicity (a higher level property) Concurrency exceptions
More informationTowards Transactional Memory Semantics for C++
Towards Transactional Memory Semantics for C++ Tatiana Shpeisman Intel Corporation Santa Clara, CA tatiana.shpeisman@intel.com Yang Ni Intel Corporation Santa Clara, CA yang.ni@intel.com Ali-Reza Adl-Tabatabai
More informationThe Java Memory Model
The Java Memory Model The meaning of concurrency in Java Bartosz Milewski Plan of the talk Motivating example Sequential consistency Data races The DRF guarantee Causality Out-of-thin-air guarantee Implementation
More informationThe C/C++ Memory Model: Overview and Formalization
The C/C++ Memory Model: Overview and Formalization Mark Batty Jasmin Blanchette Scott Owens Susmit Sarkar Peter Sewell Tjark Weber Verification of Concurrent C Programs C11 / C++11 In 2011, new versions
More informationFastTrack: Efficient and Precise Dynamic Race Detection By Cormac Flanagan and Stephen N. Freund
FastTrack: Efficient and Precise Dynamic Race Detection By Cormac Flanagan and Stephen N. Freund doi:10.1145/1839676.1839699 Abstract Multithreaded programs are notoriously prone to race conditions. Prior
More informationIBM Research - Tokyo 数理 計算科学特論 C プログラミング言語処理系の最先端実装技術. Trace Compilation IBM Corporation
数理 計算科学特論 C プログラミング言語処理系の最先端実装技術 Trace Compilation Trace JIT vs. Method JIT https://twitter.com/yukihiro_matz/status/533775624486133762 2 Background: Trace-based Compilation Using a Trace, a hot path identified
More informationMultiJav: A Distributed Shared Memory System Based on Multiple Java Virtual Machines. MultiJav: Introduction
: A Distributed Shared Memory System Based on Multiple Java Virtual Machines X. Chen and V.H. Allan Computer Science Department, Utah State University 1998 : Introduction Built on concurrency supported
More informationA Causality-Based Runtime Check for (Rollback) Atomicity
A Causality-Based Runtime Check for (Rollback) Atomicity Serdar Tasiran Koc University Istanbul, Turkey Tayfun Elmas Koc University Istanbul, Turkey RV 2007 March 13, 2007 Outline This paper: Define rollback
More informationANALYZING THREADS FOR SHARED MEMORY CONSISTENCY BY ZEHRA NOMAN SURA
ANALYZING THREADS FOR SHARED MEMORY CONSISTENCY BY ZEHRA NOMAN SURA B.E., Nagpur University, 1998 M.S., University of Illinois at Urbana-Champaign, 2001 DISSERTATION Submitted in partial fulfillment of
More informationMulticore Programming Java Memory Model
p. 1 Multicore Programming Java Memory Model Peter Sewell Jaroslav Ševčík Tim Harris University of Cambridge MSR with thanks to Francesco Zappa Nardelli, Susmit Sarkar, Tom Ridge, Scott Owens, Magnus O.
More informationCoherence, Consistency, and Déjà vu: Memory Hierarchies in the Era of Specialization
Coherence, Consistency, and Déjà vu: Memory Hierarchies in the Era of Specialization Sarita Adve University of Illinois at Urbana-Champaign sadve@illinois.edu w/ Johnathan Alsop, Rakesh Komuravelli, Matt
More informationEfficient Processor Support for DRFx, a Memory Model with Exceptions
Efficient Processor Support for DRFx, a Memory Model with Exceptions Abhayendra Singh Daniel Marino Satish Narayanasamy Todd Millstein Madanlal Musuvathi University of Michigan, Ann Arbor University of
More informationLanguage- Level Memory Models
Language- Level Memory Models A Bit of History Here is a new JMM [5]! 2000 Meyers & Alexandrescu DCL is not portable in C++ [3]. Manson et. al New shiny C++ memory model 2004 2008 2012 2002 2006 2010 2014
More informationPhase-based Adaptive Recompilation in a JVM
Phase-based Adaptive Recompilation in a JVM Dayong Gu Clark Verbrugge Sable Research Group, School of Computer Science McGill University, Montréal, Canada {dgu1, clump}@cs.mcgill.ca April 7, 2008 Sable
More informationSoftware Transactions: A Programming-Languages Perspective
Software Transactions: A Programming-Languages Perspective Dan Grossman University of Washington 5 December 2006 A big deal Research on software transactions broad Programming languages PLDI, POPL, ICFP,
More informationTowards Garbage Collection Modeling
Towards Garbage Collection Modeling Peter Libič Petr Tůma Department of Distributed and Dependable Systems Faculty of Mathematics and Physics, Charles University Malostranské nám. 25, 118 00 Prague, Czech
More informationLecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation
Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, lazy implementation 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end
More informationTypes for Precise Thread Interference
Types for Precise Thread Interference Jaeheon Yi, Tim Disney, Cormac Flanagan UC Santa Cruz Stephen Freund Williams College October 23, 2011 2 Multiple Threads Single Thread is a non-atomic read-modify-write
More informationEfficient Context Sensitivity for Dynamic Analyses via Calling Context Uptrees and Customized Memory Management
to Reuse * * Evaluated * OOPSLA * Artifact * AEC Efficient Context Sensitivity for Dynamic Analyses via Calling Context Uptrees and Customized Memory Management Jipeng Huang Michael D. Bond Ohio State
More informationDuet: Static Analysis for Unbounded Parallelism
Duet: Static Analysis for Unbounded Parallelism Azadeh Farzan and Zachary Kincaid University of Toronto Abstract. Duet is a static analysis tool for concurrent programs in which the number of executing
More informationAn introduction to weak memory consistency and the out-of-thin-air problem
An introduction to weak memory consistency and the out-of-thin-air problem Viktor Vafeiadis Max Planck Institute for Software Systems (MPI-SWS) CONCUR, 7 September 2017 Sequential consistency 2 Sequential
More informationMaximal Causality Reduction for TSO and PSO. Shiyou Huang Jeff Huang Parasol Lab, Texas A&M University
Maximal Causality Reduction for TSO and PSO Shiyou Huang Jeff Huang huangsy@tamu.edu Parasol Lab, Texas A&M University 1 A Real PSO Bug $12 million loss of equipment curpos = new Point(1,2); class Point
More informationChimera: Hybrid Program Analysis for Determinism
Chimera: Hybrid Program Analysis for Determinism Dongyoon Lee, Peter Chen, Jason Flinn, Satish Narayanasamy University of Michigan, Ann Arbor - 1 - * Chimera image from http://superpunch.blogspot.com/2009/02/chimera-sketch.html
More informationProcess Synchronization. Mehdi Kargahi School of ECE University of Tehran Spring 2008
Process Synchronization Mehdi Kargahi School of ECE University of Tehran Spring 2008 Producer-Consumer (Bounded Buffer) Producer Consumer Race Condition Producer Consumer Critical Sections Structure of
More informationNOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.
Memory Consistency Model Background for Debate on Memory Consistency Models CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley for a SAS specifies constraints on the order in which
More informationProgram logics for relaxed consistency
Program logics for relaxed consistency UPMARC Summer School 2014 Viktor Vafeiadis Max Planck Institute for Software Systems (MPI-SWS) 1st Lecture, 28 July 2014 Outline Part I. Weak memory models 1. Intro
More informationDRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages
DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages Daniel Marino Abhayendra Singh Todd Millstein Madanlal Musuvathi Satish Narayanasamy University of California, Los Angeles
More informationProgram Calling Context. Motivation
Program alling ontext Motivation alling context enhances program understanding and dynamic analyses by providing a rich representation of program location ollecting calling context can be expensive The
More informationOptimistic Shared Memory Dependence Tracing
Optimistic Shared Memory Dependence Tracing Yanyan Jiang1, Du Li2, Chang Xu1, Xiaoxing Ma1 and Jian Lu1 Nanjing University 2 Carnegie Mellon University 1 powered by Understanding Non-determinism Concurrent
More informationPart IV: Exploiting Purity for Atomicity
Busy Acquire Part IV: Exploiting Purity for Atomicity atomic void busy_acquire() { if (CAS(m,0,1)) break; CAS(m,0,1) CAS(m,0,1) CAS(m,0,1) (fails) (fails) (succeeds) boolean b[max]; // b[i]==true iff block
More informationDeterministic Parallel Programming
Deterministic Parallel Programming Concepts and Practices 04/2011 1 How hard is parallel programming What s the result of this program? What is data race? Should data races be allowed? Initially x = 0
More informationA Scalable Lock Manager for Multicores
A Scalable Lock Manager for Multicores Hyungsoo Jung Hyuck Han Alan Fekete NICTA Samsung Electronics University of Sydney Gernot Heiser NICTA Heon Y. Yeom Seoul National University @University of Sydney
More informationWeak Memory Models: an Operational Theory
Opening Weak Memory Models: an Operational Theory INRIA Sophia Antipolis 9th June 2008 Background on weak memory models Memory models, what are they good for? Hardware optimizations Contract between hardware
More informationDeterministic Replay and Data Race Detection for Multithreaded Programs
Deterministic Replay and Data Race Detection for Multithreaded Programs Dongyoon Lee Computer Science Department - 1 - The Shift to Multicore Systems 100+ cores Desktop/Server 8+ cores Smartphones 2+ cores
More informationCombining the Logical and the Probabilistic in Program Analysis. Xin Zhang Xujie Si Mayur Naik University of Pennsylvania
Combining the Logical and the Probabilistic in Program Analysis Xin Zhang Xujie Si Mayur Naik University of Pennsylvania What is Program Analysis? int f(int i) {... } Program Analysis x may be null!...
More informationPhosphor: Illuminating Dynamic. Data Flow in Commodity JVMs
Phosphor: Illuminating Dynamic Fork me on Github Data Flow in Commodity JVMs Jonathan Bell and Gail Kaiser Columbia University, New York, NY USA Dynamic Data Flow Analysis: Taint Tracking Output that is
More informationThe New Java Technology Memory Model
The New Java Technology Memory Model java.sun.com/javaone/sf Jeremy Manson and William Pugh http://www.cs.umd.edu/~pugh 1 Audience Assume you are familiar with basics of Java technology-based threads (
More informationSound Thread Local Analysis for Lockset-Based Dynamic Data Race Detection
Wellesley College Wellesley College Digital Scholarship and Archive Honors Thesis Collection 2017 Sound Thread Local Analysis for Lockset-Based Dynamic Data Race Detection Samuila Mincheva sminchev@wellesley.edu
More informationPotential violations of Serializability: Example 1
CSCE 6610:Advanced Computer Architecture Review New Amdahl s law A possible idea for a term project Explore my idea about changing frequency based on serial fraction to maintain fixed energy or keep same
More informationAtomChase: Directed Search Towards Atomicity Violations
AtomChase: Directed Search Towards Atomicity Violations Mahdi Eslamimehr, Mohsen Lesani Viewpoints Research Institute, 1025 Westwood Blvd 2nd flr, Los Angeles, CA 90024 t: (310) 208-0524 AtomChase: Directed
More informationDesigning Memory Consistency Models for. Shared-Memory Multiprocessors. Sarita V. Adve
Designing Memory Consistency Models for Shared-Memory Multiprocessors Sarita V. Adve Computer Sciences Department University of Wisconsin-Madison The Big Picture Assumptions Parallel processing important
More informationDynamic Analysis of Inefficiently-Used Containers
Dynamic Analysis of Inefficiently-Used Containers Shengqian (Victor) Yang 1 Dacong Yan 1 Guoqing Xu 2 Atanas Rountev 1 1 Ohio State University 2 University of California, Irvine PRESTO: Program Analyses
More informationEffective Performance Measurement and Analysis of Multithreaded Applications
Effective Performance Measurement and Analysis of Multithreaded Applications Nathan Tallent John Mellor-Crummey Rice University CSCaDS hpctoolkit.org Wanted: Multicore Programming Models Simple well-defined
More informationStatic Conflict Analysis for Multi-Threaded Object Oriented Programs
Static Conflict Analysis for Multi-Threaded Object Oriented Programs Christoph von Praun and Thomas Gross Presented by Andrew Tjang Authors Von Praun Recent PhD Currently at IBM (yorktown( heights) Compilers
More informationRELAXED CONSISTENCY 1
RELAXED CONSISTENCY 1 RELAXED CONSISTENCY Relaxed Consistency is a catch-all term for any MCM weaker than TSO GPUs have relaxed consistency (probably) 2 XC AXIOMS TABLE 5.5: XC Ordering Rules. An X Denotes
More information