Eraser: Dynamic Data Race Detection

Similar documents
Eraser : A dynamic data race detector for multithreaded programs By Stefan Savage et al. Krish 9/1/2011

Detecting Data Races in Multi-Threaded Programs

W4118: concurrency error. Instructor: Junfeng Yang

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Computer Systems Engineering: Spring Quiz I Solutions

Symbolic Execution, Dynamic Analysis

The Art and Science of Memory Allocation

Learning from Executions

Evaluating Research on Software Design and Productivity

Native POSIX Thread Library (NPTL) CSE 506 Don Porter

CMSC 714 Lecture 14 Lamport Clocks and Eraser

Birth of Optimistic Methods. On Optimistic Methods for Concurrency Control. Basic performance arg

24-vm.txt Mon Nov 21 22:13: Notes on Virtual Machines , Fall 2011 Carnegie Mellon University Randal E. Bryant.

FastTrack: Efficient and Precise Dynamic Race Detection (+ identifying destructive races)

Identifying Ad-hoc Synchronization for Enhanced Race Detection

Eraser: A Dynamic Data Race Detector for Multi-Threaded Programs

CS162 Operating Systems and Systems Programming Lecture 11 Page Allocation and Replacement"

Software Analysis Techniques for Detecting Data Race

Processes and Threads

Synchronization. CS61, Lecture 18. Prof. Stephen Chong November 3, 2011

Evolution of Virtual Machine Technologies for Portability and Application Capture. Bob Vandette Java Hotspot VM Engineering Sept 2004

Chianti: A Tool for Change Impact Analysis of Java Programs

CS5460: Operating Systems

CSE 120 Principles of Operating Systems

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

3/3/2014! Anthony D. Joseph!!CS162! UCB Spring 2014!

The Art and Science of Memory Alloca4on

CS533 Concepts of Operating Systems. Jonathan Walpole

Concurrency, Thread. Dongkun Shin, SKKU

Typed Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts

Deterministic Process Groups in

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

Atomicity via Source-to-Source Translation

Concurrent & Distributed Systems Supervision Exercises

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process

CS 31: Intro to Systems Virtual Memory. Kevin Webb Swarthmore College November 15, 2018

CS414 Final Exam, Version 1 (December 2, 2004) 75 Minutes, closed book. Five questions, 20 points each

Multiprocessor Systems. COMP s1

Advanced Memory Management

CS-537: Midterm Exam (Fall 2013) Professor McFlub

Multiprocessor Systems. Chapter 8, 8.1

Lecture 4: Memory Management & The Programming Interface

12: Memory Management

Software Engineering at VMware Dan Scales May 2008

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University

Runtime Atomicity Analysis of Multi-threaded Programs

Concurrency. Glossary

CSE 333 Lecture 1 - Systems programming

Linux kernel synchronization. Don Porter CSE 506

15 Sharing Main Memory Segmentation and Paging

Thread Synchronization: Too Much Milk

Atomicity for Reliable Concurrent Software. Race Conditions. Motivations for Atomicity. Race Conditions. Race Conditions. 1. Beyond Race Conditions

Lecture 12: Demand Paging

Review: Easy Piece 1

Checking System Rules Using System-Specific, Programmer- Written Compiler Extensions

Lecture 9 Dynamic Compilation

Page 1. Goals for Today" TLB organization" CS162 Operating Systems and Systems Programming Lecture 11. Page Allocation and Replacement"

Design and Implementation of Modern Programming Languages (Seminar)

Motivation. What s the Problem? What Will we be Talking About? What s the Solution? What s the Problem?

Overview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions

Static Analysis of Embedded C

Fall 2014:: CSE 506:: Section 2 (PhD) Threading. Nima Honarmand (Based on slides by Don Porter and Mike Ferdman)

Portland State University ECE 587/687. Virtual Memory and Virtualization

CMPSC 311- Introduction to Systems Programming Module: Systems Programming

Non-blocking Array-based Algorithms for Stacks and Queues. Niloufar Shafiei

MODERN SYSTEMS: EXTENSIBLE KERNELS AND CONTAINERS

CSE 374 Programming Concepts & Tools

Memory Management Topics. CS 537 Lecture 11 Memory. Virtualizing Resources

The Java Memory Model

Parallelism and Concurrency. Motivation, Challenges, Impact on Software Development CSE 110 Winter 2016

7/6/2015. Motivation & examples Threads, shared memory, & synchronization. Imperative programs

RCU. ò Walk through two system calls in some detail. ò Open and read. ò Too much code to cover all FS system calls. ò 3 Cases for a dentry:

VIRTUAL MEMORY READING: CHAPTER 9

VFS, Continued. Don Porter CSE 506

ECE 3574: Applied Software Design. Threads

DataCollider: Effective Data-Race Detection for the Kernel

Synchronization COMPSCI 386

Performance Profiling

Static Analysis of Embedded C Code

Motivation & examples Threads, shared memory, & synchronization

16 Sharing Main Memory Segmentation and Paging

COREMU: a Portable and Scalable Parallel Full-system Emulator

Computer Fundamentals: Operating Systems, Concurrency. Dr Robert Harle

Safety and liveness for critical sections

CSE544 Database Architecture

Fast dynamic program analysis Race detection. Konstantin Serebryany May

Towards automated detection of buffer overrun vulnerabilities: a first step. NDSS 2000 Feb 3, 2000

Threads. Computer Systems. 5/12/2009 cse threads Perkins, DW Johnson and University of Washington 1

CMSC 330: Organization of Programming Languages

Operating Systems Design Exam 2 Review: Spring 2011

CSCI-1200 Data Structures Fall 2009 Lecture 25 Concurrency & Asynchronous Computing

Midterm Exam. October 20th, Thursday NSC

Static Analysis versus Software Model Checking for bug finding

CS 416: Opera-ng Systems Design March 23, 2012

(Refer Slide Time: 1:26)

Virtualization and memory hierarchy

Fine-grained synchronization & lock-free programming

Threads. CS-3013 Operating Systems Hugh C. Lauer. CS-3013, C-Term 2012 Threads 1

A Gentle Introduction to Program Analysis

Intro to Threads. Two approaches to concurrency. Threaded multifinger: - Asynchronous I/O (lab) - Threads (today s lecture)

Lecture 24: Multiprocessing Computer Architecture and Systems Programming ( )

Transcription:

Eraser: Dynamic Data Race Detection 15-712 Topics overview Concurrency and race detection Framework: dynamic, static Sound vs. unsound Tools, generally: Binary rewriting (ATOM, Etch,...) and analysis (BAP, etc) - see also use in JITs and VMMs. Techniques used for state space reduction (c.f. model checking) What are the eval criteria they used? How did they argue? Debugging Concurrency is Hard Few tools beyond per-proc gdb Debugging multi-process systems is harder than single process, even without... Threads come with unique problems Deadlock, data races, non-determinism High performance typically implies lots of locks For max parallelism, vs. one big lock Shows up in modern kernel evolution as SMP grows As #cores grow, parallelism will have to extend further and further into apps/libs to keep getting faster Eraser: Lint for multi-threading Multiple threads in single address space Shared memory, one CPU Assumes: pthreads lock() is sync, not monitors libc memory allocation Doesn t work out causality Costly bookkeeping; requires observing right interleaving Instead looks for idiom/style Must hold lock to access shared variable Data race: simultaneous access w/1+ writer Limitation: must be consistent locking, not either of two locks e.g., DB pages or multi-page locks

Binary Rewriting See Savage Slides Metaprogramming tool for executables Read & partially understand binary Could also be done in compiler... Write new extended code after adding function (Harder on x86, but it s been done - variable length binary instruction set vs. more RISC-like systems) DEC ATOM tool for Alpha Insert counters -- super gprof Idea: Add memory restriction for safer code (dynamic bounds checking) Add lock analysis to lint the sync code. compare to: modifying the source, interpreting the instruction stream, trapping system calls in the kernel http://www.cs.ucsd.edu/~savage/papers/ Sosp97Slides.pdf Next bunch of slides taken in very large part, mostly verbatim, from the SOSP talk. Some annotations. Blame dga, not SOSP talk, for bugs How happens-before misses races Thread 1 Thread 2 y := y + 1; lock(mu); v := v + 1; unlock(mu); lock(mu); v := v + 1; unlock(mu); y := y + 1; Not detected as a race by happens-before Lockset algorithm Dynamic analysis Require programmer to adhere to convention Locking Discipline: Consistently hold lock when using resource Automatically infers which lock(s) protect resource Finds more bugs than happens-before But can generate many false positives!

Checking simple discipline Program Refining the candidate set New value of C(v) C(v): locks that might protect variable v Initialize C(v) := set of all locks On each access to v by thread t, C(v) := C(v)! locks_held(t) If C(v) is empty, then issue a warning lock(mu1); v := v + 1; unlock(mu1);... lock(mu2); v := v + 1; unlock(mu2); C(v) = {mu1, mu2} C(v) = {mu1} C(v) = {} False Positives FPs are the defining problem with this type of approach We ll see later some other examples... Really hurt usability. 1000 FPs + 1 bug isn t useful. Much of the rest of paper is about avoiding FPs Ideally without reducing true positives (does it?) Limitations of simple algorithm! Initialization " Don t need locks until data is shared! Read-shared data " Don t need locks if all accesses are reads! Reader/writer locks " Read locks can t protect writes

Modified algorithm Mapping variables to sets of locks! Assume first thread is initializer v " Only update C(v) after two threads touch v! Only report races after data is known to be write-shared! Track read and write locks separately " Remove read locks from C(v) on a write Program memory 2 &v + offset mu1 Shadow memory Lockset index table mu4 Lock vector Performance Experiences! Fast enough to be useful " 10-30x user-time slowdown! Lots of opportunities for optimization " Half of overhead due to ATOM! Tested real programs " AltaVista web server and index library " Vesta cache server " Petal distributed disk server " Undergrad coursework from intro OS class! Most programs found to contain races! False alarms easy to manage

Case study: AltaVista Program characterization Program AltaVista Lines of code Threads Locks Lock sets Ni2 20,000 10 900 3600 mhttpd 5,000 10 100 250 Vesta 30,000 10 26 70 Petal 25,000 64! Double blind experiment " Two old races reintroduced " Previously undetected for several months " Found and fixed in 30 minutes! Several additional (minor) races found! Several benign races " Tricky optimizations Re-introducing bugs is a useful and now common technique Benign race example Serious race (subtle) if (p->fp == 0) { lock(p->lock); if (p->fp == 0) { p->fp = open_file(); } unlock(p->lock); } if (p->fp == 0) { lock(p->lock); if (p->fp == 0) { p->fp = open_file(); } unlock(p->lock); } pos = p->fp->pos; Advanced programmers make automatic inference hard... Subtle optimization, hard to reason about its correctness

Case study: Undergraduate OS Overall races detected! Four simple synchronization problems " e.g. producer consumer! ~180 homeworks tested! Found data races in more than 10% Program Serious races Minor races Benign races AltaVista ä ä Vesta ä ä Petal ä ä Undergrad assignments Kinds of false alarms! Private memory allocators " e.g. free list " Need to reinitialize C(v)! Private lock implementations " e.g. reader/writer locks " Need to know when locks are held! Benign races Removing false alarms! Simple program annotations! Number of annotations needed to remove all false alarms: " AltaVista (19) " Vesta (10) " Petal (4)

end SOSP talk slides Eval Modern systems eval Real impl (distributable; recently rediscovered value for research code) Injected faults by sticking old bugs back into code This technique used in many subsequent evals cvs/svn/git/etc. history Applied to large, real applications and multiple mostly independent tests (students) Would have been nice to see more quantitative comparison between systems, but that s another paper... :) Eval 2 Not perfect tool: 10x slowdown Processor specific for dead processor No guarantee to catch all races In particular: Engler papers suggest many bugs lurk in infrequently hit code -- error handling, etc. Dynamic detection has hard time catching those Just like testing does. But useful for an otherwise hard problem Design Space Static vs. Dynamic analysis Dynamic: Depends on execution order Static: Intractable (but can work well, today) Sound vs. Unsound Note tension between pragmatists and correct-ists Existing languages (C...) vs. Better languages Actual code? Annotations? Model? Type systems; correct by construction? Code generation: prove correct, then generate code. Eval questions: False positives? False negatives? Coverage? Running time? Run in tests vs. in production system?

Follow-on Want more? Bugs as Deviant Behavior: A General Approach to Inferring Errors in Systems Code SOSP 2001. Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf The Daikon system for dynamic detection of likely invariants Science of Computer Programming 2007. Michael D. Ernst, Jeff H. Perkins, Philip J. Guo, Stephen McCamant, Carlos Pacheco, Matthew S. Tschantz, Chen Xiao. Both examine more general invariants Daikon examines more invariants; Engler et al. is static Both use machine-learning/statistical techniques to infer invariants