Lightweight Data Race Detection for Production Runs

Similar documents
Lightweight Data Race Detection for Production Runs

Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization

Lightweight Data Race Detection for Production Runs

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability

Legato: Bounded Region Serializability Using Commodity Hardware Transactional Memory. Aritra Sengupta Man Cao Michael D. Bond and Milind Kulkarni

DoubleChecker: Efficient Sound and Precise Atomicity Checking

Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization

PACER: Proportional Detection of Data Races

Valor: Efficient, Software-Only Region Conflict Exceptions

Towards Parallel, Scalable VM Services

11/19/2013. Imperative programs

Nondeterminism is Unavoidable, but Data Races are Pure Evil

Hybrid Static Dynamic Analysis for Statically Bounded Region Serializability

Avoiding Consistency Exceptions Under Strong Memory Models

Valor: Efficient, Software-Only Region Conflict Exceptions

Sound Thread Local Analysis for Lockset-Based Dynamic Data Race Detection

Motivation & examples Threads, shared memory, & synchronization

Probabilistic Calling Context

How s the Parallel Computing Revolution Going? Towards Parallel, Scalable VM Services

7/6/2015. Motivation & examples Threads, shared memory, & synchronization. Imperative programs

DataCollider: Effective Data-Race Detection for the Kernel

FastTrack: Efficient and Precise Dynamic Race Detection (+ identifying destructive races)

Legato: End-to-End Bounded Region Serializability Using Commodity Hardware Transactional Memory

A Dynamic Evaluation of the Precision of Static Heap Abstractions

Static and Dynamic Program Analysis: Synergies and Applications

Lu Fang, University of California, Irvine Liang Dou, East China Normal University Harry Xu, University of California, Irvine

Efficient Context Sensitivity for Dynamic Analyses via Calling Context Uptrees and Customized Memory Management

Scalable Thread Sharing Analysis

RaceMob: Crowdsourced Data Race Detec,on

NDSeq: Runtime Checking for Nondeterministic Sequential Specs of Parallel Correctness

Chronicler: Lightweight Recording to Reproduce Field Failures

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

Hybrid Static Dynamic Analysis for Region Serializability

Adaptive Multi-Level Compilation in a Trace-based Java JIT Compiler

Chimera: Hybrid Program Analysis for Determinism

Efficient Data Race Detection for Unified Parallel C

Embedded System Programming

Program Calling Context. Motivation

Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics on Heterogeneous Systems

Automatically Classifying Benign and Harmful Data Races Using Replay Analysis

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters. Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo

RaceMob: Crowdsourced Data Race Detection

Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs

Deterministic Replay and Data Race Detection for Multithreaded Programs

High-level languages

Deterministic Parallel Programming

MultiThreading. Object Orientated Programming in Java. Benjamin Kenwright

Method-Level Phase Behavior in Java Workloads

CalFuzzer: An Extensible Active Testing Framework for Concurrent Programs Pallavi Joshi 1, Mayur Naik 2, Chang-Seo Park 1, and Koushik Sen 1

Semantic Atomicity for Multithreaded Programs!

Dynamic Race Detection with LLVM Compiler

Static Analysis of Embedded C

A Case for System Support for Concurrency Exceptions

A Causality-Based Runtime Check for (Rollback) Atomicity

Effective Performance Measurement and Analysis of Multithreaded Applications

Optimistic Shared Memory Dependence Tracing

FastTrack: Efficient and Precise Dynamic Race Detection By Cormac Flanagan and Stephen N. Freund

Continuous Object Access Profiling and Optimizations to Overcome the Memory Wall and Bloat

Symbolic Execution, Dynamic Analysis

The Java Memory Model

ProRace: Practical Data Race Detection for Production Use

Atomicity for Reliable Concurrent Software. Race Conditions. Motivations for Atomicity. Race Conditions. Race Conditions. 1. Beyond Race Conditions

Java Memory Model. Jian Cao. Department of Electrical and Computer Engineering Rice University. Sep 22, 2016

EECS 570 Lecture 13. Directory & Optimizations. Winter 2018 Prof. Satish Narayanasamy

Phosphor: Illuminating Dynamic. Data Flow in Commodity JVMs

Runtime Atomicity Analysis of Multi-threaded Programs

Dynamic Analysis of Inefficiently-Used Containers

Redflag: A Framework for Analysis of Kernel-Level Concurrency

Static and dynamic analysis: synergy and duality

Combining the Logical and the Probabilistic in Program Analysis. Xin Zhang Xujie Si Mayur Naik University of Pennsylvania

Static Analysis of Embedded C Code

AtomChase: Directed Search Towards Atomicity Violations

Towards Garbage Collection Modeling

Reference Object Processing in On-The-Fly Garbage Collection

CHESS: Analysis and Testing of Concurrent Programs

Trace-based JIT Compilation

Learning from Executions

Threads. What is a thread? Motivation. Single and Multithreaded Processes. Benefits

Transaction Memory for Existing Programs Michael M. Swift Haris Volos, Andres Tack, Shan Lu, Adam Welc * University of Wisconsin-Madison, *Intel

Continuously Measuring Critical Section Pressure with the Free-Lunch Profiler

Lockout: Efficient Tes0ng for Deadlock Bugs. Ali Kheradmand, Baris Kasikci, George Candea

New Algorithms for Static Analysis via Dyck Reachability

DEP+BURST: Online DVFS Performance Prediction for Energy-Efficient Managed Language Execution

Development of Technique for Healing Data Races based on Software Transactional Memory

Data-flow Analysis for Interruptdriven Microcontroller Software

Large-Scale API Protocol Mining for Automated Bug Detection

BERKELEY PAR LAB. UPC-THRILLE Demo. Chang-Seo Park

CSE544 Database Architecture

Simplified Semantics and Debugging of Concurrent Programs via Targeted Race Detection

A Serializability Violation Detector for Shared-Memory Server Programs

Predicting Data Races from Program Traces

Efficient Runtime Tracking of Allocation Sites in Java

Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics on Heterogeneous Systems

CMSC 132: Object-Oriented Programming II

Part 3: Beyond Reduction

Eraser : A dynamic data race detector for multithreaded programs By Stefan Savage et al. Krish 9/1/2011

Efficient Data Race Detection for C/C++ Programs Using Dynamic Granularity

DMP Deterministic Shared Memory Multiprocessing

Diagnosing Production-Run Concurrency-Bug Failures. Shan Lu University of Wisconsin, Madison

Lightweight Fault Detection in Parallelized Programs

Transcription:

Lightweight Data Race Detection for Production Runs Swarnendu Biswas, UT Austin Man Cao, Ohio State University Minjia Zhang, Microsoft Research Michael D. Bond, Ohio State University Benjamin P. Wood, Wellesley College CC 2017

A Java Program With a Data Race Object X = null; boolean done= false; Thread T1 Thread T2 X = new Object(); done = true; while (!done) {} X.compute();

Object X = null; boolean done= false; write Thread T1 X = new Object(); done = true; Thread T2 while (!done) {} X.compute(); read Data race Conflicting accesses two threads access the same shared variable where at least one access is a write Concurrent accesses accesses are not ordered by synchronization operations

LICM Thread T1 Thread T2 X = new Object(); temp = done; Infinite loop done = true; while (!temp) {} Thread T1 Thread T2 done = true; X = new Object(); while (!done) {} X.compute(); NPE

Data Races are Evil Challenging to reason about the correctness of racy executions May unpredictably break code Lack of semantic guarantees in most mainstream multithreaded languages Usually indicate other concurrency errors Atomicity, order, or sequential consistency violations S. Adve and H. Boehm. Memory Models: A Case for Rethinking Parallel Languages and Hardware. CACM 2010. S. Adve. Data Races Are Evil with No Exceptions: Technical Perspective. CACM 2010.

Far-Reaching Impact of Data Races ~50 million people affected

Get Rid of Data Races!!! Avoiding and/or eliminating data races efficiently is a challenging and unsolved problem

Data Race Detection on Production Systems Avoiding and/or eliminating data races efficiently is a challenging and unsolved problem No satisfactory solution to date

Data Race Detection Techniques Static and predictive analyses Too many false positives, do not scale

Data Race Detection Techniques Dynamic analysis sound no missed races precise no false races Lockset analysis Happens-before analysis Expensive, reports many false positives Sound and precise Expensive, not scalable, incurs space overhead Coverage limited to observed executions C. Flanagan and S. Freund. FastTrack: Efficient and Precise Dynamic Data Race Detection. PLDI 2009.

Existing Approaches for Data Race Detection on Production Runs Happens-before-based sampling approaches E.g., LiteRace 1, Pacer 2 Overheads are still too high for a reasonable sampling rate Pacer with 3% sampling rate incurs 86% overhead!!! 1. D. Marino et al. LiteRace: Effective Sampling for Lightweight Data-Race Detection. PLDI 2009. 2. M. D. Bond et al. Pacer: Proportional Detection of Data Races. PLDI 2010.

Existing Approaches for Data Race Detection on Production Runs Happens-before-based sampling approaches E.g., LiteRace 1, Pacer 2 Overheads are still too high for a reasonable sampling rate Pacer with 3% sampling rate incurs 86% overhead!!! RaceMob 3 Optimizes tracking of happens-before relations Monitors only one race per run to minimize overhead Cannot bound overhead, limited scalability and coverage 1. D. Marino et al. LiteRace: Effective Sampling for Lightweight Data-Race Detection. PLDI 2009. 2. M. D. Bond et al. Pacer: Proportional Detection of Data Races. PLDI 2010. 3. B. Kasikci et al. RaceMob: Crowdsourced Data Race Detection. SOSP 2013.

Existing Approaches for Data Race Detection on Production Runs DataCollider 4 Tries to collide racy accesses, synchronization oblivious Samples accesses, and uses hardware debug registers for performance Dependence on debug registers Not portable, and may not scale well Few debug registers Cannot bound overhead 4. J. Erickson et al. Effective Data-Race Detection for the Kernel. OSDI 2009.

Outline Data Races Problems and Challenges Data Race Detection in Production Systems Drawbacks of existing approaches Our contribution: efficient, complementary analyses RaceChaser: Precise data race detection Caper: Sound data race detection

Our Insight Decouple data race detection into two lightweight and complementary analysis

Efficient Our Contributions Decouple data race detection into two lightweight and complementary analysis RaceChaser Precise data race detector Under-approximates data races Caper Dynamically sound data race detector Over-approximates data races Can miss true data races Can report false data races

RaceChaser: Precise Data Race Detection Desired Properties: Performance and Scalability? Bounded time and space overhead? Coverage and Portability? Design: Monitor one data race (two source locations) per run Use collision analysis Bound overhead introduced

avrora.sim.radio.medium:access$302() byte offset 0 avrora.sim.radio.medium:access$402() byte offset 2 Two static sites involved in a potential data race Max overhead 5% RaceChaser True data race! Data race not reproduced RaceChaser Algorithm

Instrumenting Racy Accesses Limited to one potential race pair avrora.sim.radio.medium: access$302() byte offset 0 avrora.sim.radio.medium: access$402() byte offset 2

Randomly Sample Racy Accesses Use frequency of samples taken and Compute overhead introduced by waiting avrora.sim.radio.medium: access$302() byte offset 0 Dynamic instance 992 Dynamic instance 993 avrora.sim.radio.medium: access$402() byte offset 2 Sampled

Try to Collide Racy Accesses Block thread for some time avrora.sim.radio.medium: access$302() byte offset 0 Dynamic instance 992 Dynamic instance 993 avrora.sim.radio.medium: access$402() byte offset 2 Sampled

Collision is Successful avrora.sim.radio.medium: access$302() byte offset 0 Dynamic instance 992 Dynamic instance 993 avrora.sim.radio.medium: access$402() byte offset 2 Sampled True data race detected Dynamic instance 215

Collision is Unsuccessful Thread unblocks, resets the analysis state, and continues execution avrora.sim.radio.medium: access$302() byte offset 0 Dynamic instance 992 Dynamic instance 993 avrora.sim.radio.medium: access$402() byte offset 2 Sampled Next instruction

Evaluation of RaceChaser Implementation is publicly available Jikes RVM 3.1.3 Benchmarks Large workload sizes of DaCapo 2006 and 9.12- bach suite Fixed-workload versions of SPECjbb2000 and SPECjbb2005 Platform 64-core AMD Opteron 6272

50 Run-time Overhead (%) of RaceChaser 45 40 35 37% 30 25 20 20% 15 10 5 0 hsqdb6 lusearch6 xalan6 avrora9 luindex9 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean RaceChaser (max = 0%) RaceChaser (max = 5%) RaceChaser (max = 10%) LiteHB FullHB < 2% adapted from 1. B. Kasikci et al. RaceMob: Crowdsourced Data Race Detection. SOSP 2013.

Effectiveness of RaceChaser Collision analysis can potentially detect data races that are hidden by spurious happensbefore relations Data race coverage of collision analysis depends on the perturbation and the delay Prior studies seem to indicate that data races often occur close in time RaceChaser did as well as RaceMob/LiteHB over a number of runs

Outline Data Races Problems and Challenges Data Race Detection in Production Systems Drawbacks of existing approaches Our contribution: efficient, complementary analyses RaceChaser: Precise data race detection Caper: Sound data race detection

Sound, Efficient Data Race Detection Options Use static analysis offline Too many false positives Use dynamic analysis online Efficient enough for production runs?

multiple runs Caper: Sound Data Race Detection Input program Caper algorithm dynamically sound Set of potential race pairs Static analysis Dynamic analysis

multiple runs Caper Algorithm Input program Static analysis Dynamic analysis Set of potential race pairs Input program Static data race detector Static race pairs (sppairs) Dynamic analysis dynamic race pairs (dppairs)

Sound Dynamic Escape Analysis for Data Race Detection Reachability-based analysis q ESCAPED f g p NOT_ESCAPED NOT_ESCAPED q.f = p q ESCAPED p ESCAPED ESCAPED f g

Caper s Dynamic Analysis desites = { s ( s s, s sppairs dppairs) s escaped in an analyzed execution } dppairs = { s 1, s 2 s 1 desites s 2 desites }

120 Run-time Overhead (%) of Caper 100 80 60 40 27% 20 0 hsqldb6 lusearch6 xalan6 avrora9 luindex9 lusearch9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean DEA Caper (first run) Caper (steady state) 3% 9%

Effectiveness of Caper Sound static data race detector Caper Dynamic alias analysis hsqldb6 212,205 1,612 757 lusearch6 4,692 302 292 xalan6 83,488 1,241 581 avrora9 61,193 19,941 570 luindex9 10,257 192 193 lusearch9 7,303 34 39 sunflow9 28,587 200 1,086 xalan9 20,036 1,861 600 pjbb2000 29,604 11,243 1,679 pjbb2005 2,552 984 447

% of dynamic accesses that need to be instrumented 100 Efficiency vs Precision 90 80 70 Static data race detector, e.g., Chord 60 50 40 30 20 10 Caper Dynamic alias analysis 0 0 200 400 600 800 1000 1200 1400 Run-time overhead (%)

Usefulness of Caper Improve performance of analyses whose correctness relies on knowing all data races Record and replay systems Atomicity checking Software transactional memory Generate potential data races for analyses like RaceChaser/RaceMob/DataCollider

Lightweight Data Race Detection for Production Runs Swarnendu Biswas, UT Austin Man Cao, Ohio State University Minjia Zhang, Microsoft Research Michael D. Bond, Ohio State University Benjamin P. Wood, Wellesley College CC 2017