DoubleChecker: Efficient Sound and Precise Atomicity Checking

Swarnendu Biswas, Jipeng Huang, Aritra Sengupta, and Michael D. Bond
The Ohio State University
PLDI 2014

Impact of Concurrency Bugs
Northeastern blackout, 2003

Atomicity Violations
Constitute 69% of all non-deadlock concurrency bugs [1]
[1] S. Lu et al. Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics. In ASPLOS, 2008.

Atomicity
Concurrency correctness property
Synonymous with serializability
Program execution must be equivalent to some serial execution of the atomic regions

Atomicity Violation Example
Thread 1 and Thread 2 each run:

    void execute() {
      while (...) {
        preparelist();
        processlist();
      }
      resetlist();
    }

Every method synchronizes on l1, so the program is data-race-free:

    void preparelist() {
      synchronized (l1) { list.add(new Object()); }
    }
    void processlist() {
      synchronized (l1) { Object head = list.get(0); }
    }
    void resetlist() {
      synchronized (l1) { list = null; }
    }

If Thread 2 runs resetlist() between Thread 1's preparelist() and processlist(), processlist() hits a null pointer dereference
Intended atomicity specification: execute() is declared atomic (atomic void execute())
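The interleaving behind this example can be replayed deterministically on a single thread. The sketch below is a minimal illustration, not code from the talk: the class name `AtomicityViolationReplay` and the two helper methods are hypothetical, and `processlist()` returns the head element (the slide version is `void`) so the outcome can be observed.

```java
import java.util.ArrayList;
import java.util.List;

class AtomicityViolationReplay {
    private final Object l1 = new Object();
    private List<Object> list = new ArrayList<>();

    void preparelist() { synchronized (l1) { list.add(new Object()); } }

    // Returns the head so callers can observe the result (assumption: the slide version is void).
    Object processlist() { synchronized (l1) { return list.get(0); } }

    void resetlist() { synchronized (l1) { list = null; } }

    /** Replays the schedule: Thread 1 preparelist, Thread 2 resetlist, Thread 1 processlist. */
    static boolean badInterleavingThrows() {
        AtomicityViolationReplay shared = new AtomicityViolationReplay();
        shared.preparelist();       // Thread 1
        shared.resetlist();         // Thread 2 runs between Thread 1's two calls
        try {
            shared.processlist();   // Thread 1 dereferences the null list
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Replays a serial schedule in which execute()'s body is not interrupted. */
    static boolean serialOrderIsSafe() {
        AtomicityViolationReplay shared = new AtomicityViolationReplay();
        shared.preparelist();
        Object head = shared.processlist();
        shared.resetlist();
        return head != null;
    }

    public static void main(String[] args) {
        System.out.println("bad interleaving throws: " + badInterleavingThrows());
        System.out.println("serial order is safe: " + serialOrderIsSafe());
    }
}
```

Every access is protected by the same lock, so no data race exists, yet the first schedule still fails. That is why a race detector alone does not catch this bug.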

Detecting Atomicity Violations
Check for conflict serializability
  Build a transactional dependence graph
  Check for cycles
Existing work
  Velodrome, Flanagan et al., PLDI 2008
  Farzan and Madhusudan, CAV 2008

Transactional Dependence Graph
[Figure: Threads 1-3 over time. Transactions contain the events acq lock, wr o.f, wr o.g, wr o.f, rel lock; a later rd o.f adds the dependence edge that closes a cycle.]
Cycle means Atomicity Violation
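The check described here (build a graph of cross-transaction dependences, then look for a cycle) can be sketched as a small directed graph with a depth-first search. This is an illustrative sketch only: the class `TxDependenceGraph`, its method names, and the string transaction labels are assumptions, and it does not reproduce how Velodrome or DoubleChecker represent transactions internally.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TxDependenceGraph {
    // Adjacency sets: an edge from -> to means "to" depends on an earlier access in "from".
    private final Map<String, Set<String>> edges = new HashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        edges.computeIfAbsent(to, k -> new HashSet<>());
    }

    /** A cycle means the execution is not conflict serializable. */
    boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String tx : edges.keySet()) {
            if (dfs(tx, done, onPath)) return true;
        }
        return false;
    }

    private boolean dfs(String tx, Set<String> done, Set<String> onPath) {
        if (onPath.contains(tx)) return true;   // back edge: cycle found
        if (done.contains(tx)) return false;    // already fully explored
        onPath.add(tx);
        for (String next : edges.get(tx)) {
            if (dfs(next, done, onPath)) return true;
        }
        onPath.remove(tx);
        done.add(tx);
        return false;
    }

    public static void main(String[] args) {
        TxDependenceGraph g = new TxDependenceGraph();
        g.addEdge("Tx1", "Tx2");   // hypothetical: Tx2 writes o.f after Tx1 wrote it
        g.addEdge("Tx2", "Tx1");   // hypothetical: Tx1 later reads o.f written by Tx2
        System.out.println("cycle: " + g.hasCycle());
    }
}
```

A real checker builds the graph incrementally during execution and must discard finished transactions; this sketch only shows the cycle criterion.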

Prior Work is Slow
Velodrome [1]
  Paper reports 12.7X overhead
  6.1X in our experiments
[1] C. Flanagan et al. Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs. In PLDI, 2008.

High Overheads of Prior Work
Precise tracking is expensive
  Must record the last transaction(s) to read/write every field
Need atomic updates in instrumentation

Instrumentation Approach
[Figure: an uninstrumented program performs only the program access; an instrumented program wraps each program access with analysis work.]

Precise Tracking is Expensive!
Precise tracking of dependences: each program access is accompanied by analysis-specific work that updates metadata
Can lead to remote cache misses for mostly read-only variables

Synchronized Updates are Expensive!
Instrumented program: lock metadata access, program access, unlock metadata access, so that the metadata update and the program access are atomic
Synchronization on every access slows programs
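To make the cost concrete, the following sketch shows what lock-protected instrumentation around a single field might look like. It is a hypothetical illustration of the pattern (lock metadata, update metadata, perform the access, unlock), not Velodrome's actual instrumentation; the class `InstrumentedField` and all of its members are invented for this example.

```java
class InstrumentedField {
    private int value;                       // the program's own field
    private long lastWriterTx = -1;          // analysis metadata: last transaction to write
    private long lastReaderTx = -1;          // analysis metadata: last transaction to read
    private long metadataLockAcquisitions;   // counts how often the analysis had to synchronize
    private final Object metadataLock = new Object();

    void write(long txId, int newValue) {
        synchronized (metadataLock) {        // lock metadata
            metadataLockAcquisitions++;
            lastWriterTx = txId;             // analysis-specific work
            value = newValue;                // program access, atomic with the metadata update
        }                                    // unlock metadata
    }

    int read(long txId) {
        synchronized (metadataLock) {
            metadataLockAcquisitions++;
            lastReaderTx = txId;             // even a read writes shared metadata
            return value;
        }
    }

    long metadataLockAcquisitions() {
        synchronized (metadataLock) { return metadataLockAcquisitions; }
    }

    long lastWriterTx() {
        synchronized (metadataLock) { return lastWriterTx; }
    }

    public static void main(String[] args) {
        InstrumentedField f = new InstrumentedField();
        f.write(1L, 42);
        f.read(1L);
        f.read(2L);
        System.out.println("lock acquisitions: " + f.metadataLockAcquisitions());
    }
}
```

In this sketch every access pays for a lock acquisition, including accesses that never leave one thread, and reads modify shared metadata, which is one way read-mostly data can start to cause remote cache misses.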

DoubleChecker

DoubleChecker's Contributions
Dynamic atomicity checker based on conflict serializability
Precise
Sound and unsound operation modes
Incurs 2-4 times lower overheads
Makes dynamic atomicity checking more practical

Key Insights
Avoid high costs of precise tracking of dependences at every access
  Common case: no dependences
  Most accesses are thread local
Tracks dependences imprecisely
  Soundly over-approximates dependences
  Recovers precision when required
Turns out to be a lot cheaper

Staged Analysis
Imprecise cycle detection (ICD)
Precise cycle detection (PCD)

Imprecise Cycle Detection
[Figure: program execution and atomicity specifications feed ICD (sound tracking), which outputs imprecise cycles.]
Processes every program access
Soundly over-approximates dependences, is cheap
Could have false positives

Precise Cycle Detection
[Figure: imprecise cycles, static program locations, and access information feed PCD, which outputs precise violations.]
Processes a subset of program accesses
Performs precise analysis
No false positives

Staged Analyses: ICD and PCD
[Figure: program execution and atomicity specifications feed ICD (sound tracking); ICD passes imprecise cycles, static program locations, and access information to PCD, which reports precise violations.]

ICD is Sound
[Figure: the same pipeline; the imprecise cycles reported by ICD include the true atomicity violations.]

Role of ICD
[Figure: program execution and atomicity specifications feed ICD (sound tracking), which outputs imprecise cycles.]
Most accesses in a program are thread-local
Uses Octet [1] for tracking cross-thread dependences
Acts as a dynamically sound transaction filter
[1] M. Bond et al. Octet: Capturing and Controlling Cross-Thread Dependences Efficiently. In OOPSLA, 2013.

Role of PCD
[Figure: imprecise cycles, static program locations, and access information feed PCD, which outputs precise violations.]
Processes transactions involved in an ICD cycle
Performs precise serializability analysis
PCD has to do much less work
  A program conforming to its atomicity specification will have very few cycles

Different Modes of Operation
Single-run mode
Multi-run mode

Single-Run Mode
[Figure: program execution and atomicity specifications feed ICD+PCD in one run; ICD passes ICD cycles and read/write logs to PCD, which reports atomicity violations.]

Multi-run Mode
[Figure: in the first run, program execution and atomicity specifications feed ICD (sound tracking), which outputs potentially imprecise cycles as static transaction information. In the second run, program execution and the monitored transactions feed ICD+PCD, which reports atomicity violations.]

Design Choices
Multi-run mode
  Conditionally instruments non-transactional accesses
  Otherwise overhead increases by 29%
Could use Velodrome for the second run
  But performance is worse
  Second run has to process many accesses
  ICD is still effective as a dynamic transaction filter

Examples
Imprecise analysis
Precise analysis

Imprecise Analysis
[Figure: Threads 1-4 over time, accesses annotated with object states: wr o.f (WrEx T1), rd o.g (RdEx T2), rd o.f (RdSh c), rd o.h (fence), wr o.f (WrEx T1).]

Precise Analysis
[Figure: Threads 1-4 over time, field-level accesses: rd o.g, rd o.f, rd o.h, wr o.f.]
No Precise Violation

ICD Cycle
[Figure: Threads 1-4 over time, accesses annotated with object states: wr o.f (WrEx T1), rd o.g (RdEx T2), wr o.f (WrEx T1), rd o.h (RdEx T2), rd o.f (RdSh c), rd o.h (fence).]

Precise Analysis
[Figure: Threads 1-4 over time, field-level accesses: wr o.f, rd o.g, rd o.h, rd o.f, rd o.h, wr o.f.]
Precise Violation
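The difference between the two stages can be sketched with a deliberately simplified model. The model is an assumption made for illustration: the imprecise stage treats any two conflicting accesses to the same object as a dependence, while the precise stage also requires the same field. This is not Octet's state machine (WrEx, RdEx, RdSh, fence), and the class `StagedCheckSketch`, the `Access` type, and the transaction labels are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StagedCheckSketch {
    static final class Access {
        final String tx, object, field;
        final boolean isWrite;
        Access(String tx, String object, String field, boolean isWrite) {
            this.tx = tx; this.object = object; this.field = field; this.isWrite = isWrite;
        }
    }

    /** Adds an edge earlierTx -> laterTx for each conflicting pair of accesses in trace order. */
    static Map<String, Set<String>> buildEdges(List<Access> trace, boolean fieldPrecise) {
        Map<String, Set<String>> edges = new HashMap<>();
        for (int j = 0; j < trace.size(); j++) {
            Access later = trace.get(j);
            edges.computeIfAbsent(later.tx, k -> new HashSet<>());
            for (int i = 0; i < j; i++) {
                Access earlier = trace.get(i);
                boolean sameLocation = earlier.object.equals(later.object)
                        && (!fieldPrecise || earlier.field.equals(later.field));
                boolean conflict = earlier.isWrite || later.isWrite;
                if (sameLocation && conflict && !earlier.tx.equals(later.tx)) {
                    edges.get(earlier.tx).add(later.tx);
                }
            }
        }
        return edges;
    }

    static boolean hasCycle(Map<String, Set<String>> edges) {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String tx : edges.keySet()) {
            if (dfs(tx, edges, done, onPath)) return true;
        }
        return false;
    }

    private static boolean dfs(String tx, Map<String, Set<String>> edges,
                               Set<String> done, Set<String> onPath) {
        if (onPath.contains(tx)) return true;
        if (done.contains(tx)) return false;
        onPath.add(tx);
        for (String next : edges.get(tx)) {
            if (dfs(next, edges, done, onPath)) return true;
        }
        onPath.remove(tx);
        done.add(tx);
        return false;
    }

    /** Stage 1 filters cheaply at object granularity; stage 2 runs only if stage 1 finds a cycle. */
    static boolean reportsViolation(List<Access> trace) {
        if (!hasCycle(buildEdges(trace, false))) return false;  // imprecise stage: no cycle, done
        return hasCycle(buildEdges(trace, true));               // precise stage confirms or rejects
    }

    /** Imprecise cycle only: A writes o.f, B reads o.g (different field), B writes p.h, A reads p.h. */
    static List<Access> falsePositiveTrace() {
        return Arrays.asList(new Access("A", "o", "f", true), new Access("B", "o", "g", false),
                new Access("B", "p", "h", true), new Access("A", "p", "h", false));
    }

    /** Precise cycle: same as above, but B reads o.f, the field A wrote. */
    static List<Access> trueViolationTrace() {
        return Arrays.asList(new Access("A", "o", "f", true), new Access("B", "o", "f", false),
                new Access("B", "p", "h", true), new Access("A", "p", "h", false));
    }
}
```

Because the object-level edges are a superset of the field-level edges, an execution with no imprecise cycle cannot have a precise one, so skipping the second stage in that case loses nothing in this model.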

Evaluation Methodology
Implementation
Atomicity specifications
Experiments

Implementation
DoubleChecker and Velodrome
  Developed in Jikes RVM 3.1.3
  Artifact successfully evaluated
  Code shared on Jikes RVM Research Archive

Experimental Methodology
Benchmarks: DaCapo 2006, 9.12-bach, Java Grande, other benchmarks used in prior work [1]
Platform: 3.30 GHz 4-core Intel i5 processor
[1] C. Flanagan et al. Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs. In PLDI, 2008.

Atomicity Specifications
Assumed to be provided by the programmers
We reuse prior work's approach to infer the specifications
  Start with all methods except main(), run(), callers of join(), wait(), etc.
  Run DoubleChecker/Velodrome and ask: new violations reported?
    Yes: the violating methods are considered non-atomic, and the check repeats
    No: the result is the atomicity specification
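The inference loop on this slide can be sketched as iterative refinement: run the checker, drop whatever it reports from the specification, and stop when a run reports nothing new. In the sketch below the checker is abstracted as a function from the current specification to the set of methods it blames. That interface, the class `SpecRefinementSketch`, and the toy checker in `demo()` are assumptions made for illustration, not part of DoubleChecker or Velodrome.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

class SpecRefinementSketch {
    /**
     * Starts from all methods minus the excluded ones and removes methods the
     * checker blames until a run reports no new violations.
     */
    static Set<String> refine(Set<String> allMethods, Set<String> excluded,
                              Function<Set<String>, Set<String>> checker) {
        Set<String> spec = new HashSet<>(allMethods);
        spec.removeAll(excluded);
        while (true) {
            Set<String> violating = new HashSet<>(checker.apply(spec));
            violating.retainAll(spec);            // only methods still assumed atomic count as new
            if (violating.isEmpty()) return spec; // no new violations: spec is final
            spec.removeAll(violating);            // blamed methods are considered non-atomic
        }
    }

    /** Toy run: "a" is blamed first; once "a" is non-atomic, "b" is blamed; "c" survives. */
    static Set<String> demo() {
        Set<String> all = new HashSet<>(Arrays.asList("main", "run", "a", "b", "c"));
        Set<String> excluded = new HashSet<>(Arrays.asList("main", "run"));
        Function<Set<String>, Set<String>> toyChecker = spec -> {
            if (spec.contains("a")) return Collections.singleton("a");
            if (spec.contains("b")) return Collections.singleton("b");
            return Collections.emptySet();
        };
        return refine(all, excluded, toyChecker);
    }

    public static void main(String[] args) {
        System.out.println("inferred specification: " + demo());
    }
}
```

The loop terminates because the specification shrinks on every iteration that reports a violation.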

Soundness Experiments
Generated atomicity violations with
  Velodrome: sound and precise
  DoubleChecker
    Single-run mode: sound and precise
    Multi-run mode: unsound
Results match closely for Velodrome and the single-run mode
Multi-run mode finds 83% of all violations

Performance Experiments
Single-run mode: 1.9 times faster than Velodrome
Multi-run mode
  First run: 5.6 times faster
  Second run: 3.7 times faster

DoubleChecker
2-4 times lower overhead than the current state of the art
Makes dynamic atomicity checking more practical

Related Work
Type systems
  Flanagan and Qadeer, PLDI 2003
  Flanagan et al., TOPLAS 2008
Model checking
  Farzan and Madhusudan, CAV 2006
  Flanagan, SPIN 2004
  Hatcliff et al., VMCAI 2004

Related Work
Dynamic analysis
  Conflict-serializability-based approaches: Flanagan et al., PLDI 2008; Farzan and Madhusudan, CAV 2008
  Inferring atomicity: Lu et al., ASPLOS 2006; Xu et al., PLDI 2005; Hammer et al., ICSE 2008
  Predictive approaches: Sinha et al., MEMOCODE 2011; Sorrentino et al., FSE 2010
  Other approaches: Wang and Stoller, PPoPP 2006; Wang and Stoller, TSE 2006

What Has DoubleChecker Achieved?
Improved overheads over the current state of the art
Makes dynamic atomicity checking more practical
Cheaper to over-approximate dependences
Showcases a judicious separation of tasks to recover precision