Chimera: Hybrid Program Analysis for Determinism

1 Chimera: Hybrid Program Analysis for Determinism
Dongyoon Lee, Peter Chen, Jason Flinn, Satish Narayanasamy
University of Michigan, Ann Arbor

2 Deterministic Replay
Goal: record and reproduce a multithreaded execution, for uses such as:
- Debugging concurrency bugs
- Offline heavyweight dynamic analysis
- Forensics and intrusion detection
- and many more
Problem: multithreaded record-and-replay is either too slow (>2x overhead) or requires custom hardware.

3 Multithreaded Record-and-Replay is Slow
[Figure: Thread 1, Thread 2, and Thread 3 perform writes and reads to shared memory that create cross-thread dependencies]
- Checkpoint memory and register state
- Log non-deterministic program input: interrupts, I/O values, DMA, etc.
- Log shared-memory dependencies

4 Replay for Data-Race-Free Programs is Cheap
[Figure: T1 executes Lock(l), X=1, Y=1, Unlock(l), Z=1, Signal(c); T2 executes X=0, Y=0, Unlock(l); T3 executes Wait(c), X=2, Y=2, Z=2. The figure contrasts the order of memory operations with the order of synchronization operations.]
- In data-race-free programs, shared-memory accesses are well ordered by synchronization operations
- Recording the happens-before order of synchronization operations is therefore sufficient
Problem: programs with data races
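
As a sketch of why recording only synchronization order suffices for data-race-free code, the hypothetical C record/replay lock below (POSIX threads) logs which thread acquires the lock in record mode and, in replay mode, lets a thread acquire it only when its id is next in the log. The names rr_lock, rr_acquire, rr_release, and run_two are invented for illustration; this is not Chimera's modified pthread library.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define LOG_MAX 64

enum rr_mode { RECORD, REPLAY };

/* Hypothetical record/replay lock (illustrative sketch only). */
struct rr_lock {
    pthread_mutex_t m;      /* protects every field below */
    pthread_cond_t turn;    /* broadcast whenever the lock is released */
    int held;               /* 1 while some thread holds the lock */
    enum rr_mode mode;
    int log[LOG_MAX];       /* acquisition order, as thread ids */
    size_t len;             /* entries in log */
    size_t next;            /* next entry to replay */
};

/* A replay lock is given a previously recorded log; a record lock starts empty. */
void rr_init(struct rr_lock *l, enum rr_mode mode, const int *log, size_t len) {
    assert(len <= LOG_MAX);
    pthread_mutex_init(&l->m, NULL);
    pthread_cond_init(&l->turn, NULL);
    l->held = 0;
    l->mode = mode;
    l->len = len;
    l->next = 0;
    for (size_t i = 0; i < len; i++)
        l->log[i] = log[i];
}

void rr_acquire(struct rr_lock *l, int tid) {
    pthread_mutex_lock(&l->m);
    if (l->mode == RECORD) {
        while (l->held)
            pthread_cond_wait(&l->turn, &l->m);
        assert(l->len < LOG_MAX);
        l->log[l->len++] = tid;             /* log the happens-before order */
    } else {
        /* Block until the lock is free and this thread is next in the log. */
        while (l->held || (l->next < l->len && l->log[l->next] != tid))
            pthread_cond_wait(&l->turn, &l->m);
        l->next++;
    }
    l->held = 1;
    pthread_mutex_unlock(&l->m);
}

void rr_release(struct rr_lock *l) {
    pthread_mutex_lock(&l->m);
    l->held = 0;
    pthread_cond_broadcast(&l->turn);       /* only the logged next thread proceeds */
    pthread_mutex_unlock(&l->m);
}

struct worker { struct rr_lock *lock; int tid; int *shared; };

static void *worker_main(void *arg) {
    struct worker *w = arg;
    rr_acquire(w->lock, w->tid);
    *w->shared = w->tid;                    /* data-race-free: always under the lock */
    rr_release(w->lock);
    return NULL;
}

/* Runs threads 1 and 2 against l; returns the final value of the shared int. */
int run_two(struct rr_lock *l) {
    int shared = 0;
    pthread_t th[2];
    struct worker w[2] = { { l, 1, &shared }, { l, 2, &shared } };
    for (int i = 0; i < 2; i++)
        pthread_create(&th[i], NULL, worker_main, &w[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(th[i], NULL);
    return shared;
}
```

For example, replaying a log of {2, 1} forces thread 2 in first, so the shared value always ends as 1; no individual memory access is logged, only the lock order.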

5 Our Contribution: A Hybrid Analysis Chimera Potentially racy program P Data-race-free program P Sound static data race analysis Add synchronizations for potential data races Problem: Too many false positives Profiling non-concurrent code regions Symbolic bounds analysis - 5 -

6 Roadmap: Motivation; Chimera Analysis: 1) Static data race analysis, 2) Profiling non-concurrent code regions, 3) Symbolic bounds analysis; Weak-lock Design; Evaluation; Conclusion

8 Static Data Race Analysis
- Find potential data races using a sound static data race detector: RELAY [Voung et al., FSE 07]
- Protect all potential data races using weak-locks, a new time-out lock that may be preempted (discussed later)
- Record and replay the happens-before order of weak-locks

9 Protect Potential Races using Weak-locks

    void foo() {
      X = 0;              // potential racy pair with X = 1 in bar()
      for (i = ...) {
        Y[tid][i] = 0;    // potential racy pair with Y[tid][i] = 1 in bar()
      }
    }

    void bar() {
      X = 1;
      for (i = ...) {
        Y[tid][i] = 1;
      }
      Z = 1;              // no race report
    }

Static analysis helps avoid instrumentation for the access to Z.
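
A minimal sketch of instruction-level weak-lock instrumentation for foo() and bar(), assuming a plain pthread mutex stands in for each weak-lock (real weak-locks also time out and have their acquisition order logged). The names wl_X, wl_Y, weak_lock, weak_unlock, and acquisitions, and the sizes N and NTHREADS, are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define N 4
#define NTHREADS 2

int X, Y[NTHREADS][N], Z;

/* One weak-lock per racy pair reported by the static analysis. */
pthread_mutex_t wl_X = PTHREAD_MUTEX_INITIALIZER;   /* X = 0 / X = 1 */
pthread_mutex_t wl_Y = PTHREAD_MUTEX_INITIALIZER;   /* Y[tid][i] = 0 / = 1 */
atomic_long acquisitions;   /* atomic: bumped under two different locks */

static void weak_lock(pthread_mutex_t *m) {
    pthread_mutex_lock(m);
    atomic_fetch_add(&acquisitions, 1);
}

static void weak_unlock(pthread_mutex_t *m) { pthread_mutex_unlock(m); }

void foo(int tid) {
    assert(tid >= 0 && tid < NTHREADS);
    weak_lock(&wl_X); X = 0; weak_unlock(&wl_X);
    for (int i = 0; i < N; i++) {
        weak_lock(&wl_Y);            /* one acquisition per iteration */
        Y[tid][i] = 0;
        weak_unlock(&wl_Y);
    }
}

void bar(int tid) {
    assert(tid >= 0 && tid < NTHREADS);
    weak_lock(&wl_X); X = 1; weak_unlock(&wl_X);
    for (int i = 0; i < N; i++) {
        weak_lock(&wl_Y);
        Y[tid][i] = 1;
        weak_unlock(&wl_Y);
    }
    Z = 1;                           /* no race report: left uninstrumented */
}
```

Each call costs 1 + N acquisitions because the Y racy pair is locked once per loop iteration; this per-instruction cost is what coarser weak-locks aim to remove.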

10 Sources of False Positives in RELAY
The sound data race detector reports too many false data races (53x overhead when all of them are protected).
- Source 1: non-mutex synchronizations are ignored. Lockset-based analysis ignores fork-join, barrier, signal-wait, etc., so it may report a false data race between memory instructions that can never execute concurrently.
- Source 2: conservative pointer analysis. It overestimates the variables accessed by a memory instruction, so it may report a false data race between memory instructions that can never access the same location.
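
Source 1 can be made concrete with a simplified Eraser-style lockset check in C. This illustrates the lockset idea only; it is not RELAY's static algorithm, it omits Eraser's initialization states, and the names lockset, var_state, and on_access are hypothetical. Each variable keeps a candidate set of locks C(v), intersected with the locks held at every access; an empty set is reported as a race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t lockset;             /* bit i set => lock i is held */
#define ALL_LOCKS ((lockset)~0u)

struct var_state { lockset candidates; };   /* C(v) */

void var_init(struct var_state *v) { v->candidates = ALL_LOCKS; }

/* Record one access to v made while holding `held`.
   Returns true if C(v) is now empty, i.e. a potential race is reported. */
bool on_access(struct var_state *v, lockset held) {
    v->candidates &= held;
    return v->candidates == 0;
}
```

A barrier is not a lock, so it never enters the set: two unlocked writes to X on opposite sides of a barrier empty C(X) and are reported even though they can never run concurrently.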

11 Roadmap: Motivation; Chimera Analysis: 1) Static data race analysis, 2) Profiling non-concurrent code regions, 3) Symbolic bounds analysis; Weak-lock Design; Evaluation; Conclusion

12 Profiling Non-concurrent Code Regions
[Figure: T1 runs foo() before a BARRIER and T2 runs bar() after it, so the two functions never execute concurrently]
Problem: lockset-based analysis ignores non-mutex synchronization operations.
Solution:
- Profile non-concurrent code regions (e.g., functions)
- Increase the granularity of weak-locks to protect a larger code region instead of each potentially racy instruction
- Parallelism is preserved unless a region is mis-profiled
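
One way to turn a profile into non-concurrency facts is sketched below in C, assuming a hypothetical trace format in which each record says that thread tid was inside function fn from time start to time end. Two functions count as concurrent if an execution of one overlapped in time with an execution of the other on a different thread; struct interval and seen_concurrent are illustrative names, not Chimera's profiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical profile record: thread `tid` was inside `fn` during [start, end). */
struct interval { int tid; const char *fn; long start, end; };

static bool overlaps(const struct interval *a, const struct interval *b) {
    return a->start < b->end && b->start < a->end;
}

/* Returns true if the profile ever saw fn_a and fn_b running concurrently. */
bool seen_concurrent(const struct interval *trace, size_t n,
                     const char *fn_a, const char *fn_b) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(trace[i].fn, fn_a) != 0)
            continue;
        for (size_t j = 0; j < n; j++) {
            if (strcmp(trace[j].fn, fn_b) != 0)
                continue;
            if (trace[i].tid != trace[j].tid && overlaps(&trace[i], &trace[j]))
                return true;
        }
    }
    return false;
}
```

Pairs never observed concurrently (such as foo() and bar() separated by a barrier) become candidates for a shared function-level weak-lock. The profile covers only the inputs it ran, so a mis-profiled pair stays correct (the weak-lock still orders it) but loses parallelism.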

13 Function-Level Weak-Locks

    void foo() {
      X = 0;
      for (i = ...) {
        Y[tid][i] = 0;
      }
    }

    void bar() {
      X = 1;
      for (i = ...) {
        Y[tid][i] = 1;
      }
      Z = 1;
    }

If the profiler says foo() and bar() are not likely to run concurrently (e.g., foo() runs before a BARRIER and bar() after it), one function-level weak-lock can protect both function bodies.
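
A minimal sketch of the function-level version, again with a pthread mutex standing in for a weak-lock and with hypothetical names (wl_foo_bar, weak_lock, acquisitions) and sizes (N, NTHREADS): because the profile suggests foo() and bar() do not overlap, a single weak-lock covers both function bodies.

```c
#include <assert.h>
#include <pthread.h>

#define N 4
#define NTHREADS 2

int X, Y[NTHREADS][N], Z;
pthread_mutex_t wl_foo_bar = PTHREAD_MUTEX_INITIALIZER;  /* shared by foo() and bar() */
long acquisitions;          /* only updated while wl_foo_bar is held */

static void weak_lock(pthread_mutex_t *m)   { pthread_mutex_lock(m); acquisitions++; }
static void weak_unlock(pthread_mutex_t *m) { pthread_mutex_unlock(m); }

void foo(int tid) {
    assert(tid >= 0 && tid < NTHREADS);
    weak_lock(&wl_foo_bar);         /* one acquisition for the whole body */
    X = 0;
    for (int i = 0; i < N; i++)
        Y[tid][i] = 0;
    weak_unlock(&wl_foo_bar);
}

void bar(int tid) {
    assert(tid >= 0 && tid < NTHREADS);
    weak_lock(&wl_foo_bar);
    X = 1;
    for (int i = 0; i < N; i++)
        Y[tid][i] = 1;
    Z = 1;
    weak_unlock(&wl_foo_bar);
}
```

Each call now costs one acquisition rather than one per racy instruction and loop iteration; if the profile was wrong and foo() and bar() do overlap, the lock simply serializes them.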

14 Roadmap: Motivation; Chimera Analysis: 1) Static data race analysis, 2) Profiling non-concurrent code regions, 3) Symbolic bounds analysis; Weak-lock Design; Evaluation; Conclusion

15 Imprecision in Conservative Pointer Analysis
[Figure: T1 runs foo() and T2 runs bar(), both before the BARRIER, so the two functions may run concurrently]

16 Imprecision in Conservative Pointer Analysis

    void foo() {
      for (i = 0 to N) {
        Y[tid][i] = 0;    // reported as a potential racy pair with bar()
      }
    }

    void bar() {
      for (i = 0 to N) {
        Y[tid][i] = 1;
      }
    }

The reported pair is a false race: Thread 1 and Thread 2 write different rows of Y[][].
- RELAY uses Steensgaard's and Andersen's pointer analyses
- Both are flow-insensitive and context-insensitive (FICI) analyses
- Naming of heap objects is conservative
- The result overestimates the variables accessed by a memory instruction
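
To illustrate how unification-based pointer analysis overestimates, here is a simplified Steensgaard-style analysis in C (no fields, types, or calls; the pt_* names are hypothetical, and this is not RELAY's implementation). Every assignment merges the classes of the abstract locations involved, so once p = q appears anywhere, p and q are treated as pointing to one merged object regardless of program order.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCS 32

static int parent[MAX_LOCS];    /* union-find parent */
static int pointee[MAX_LOCS];   /* class pointed to (valid at roots), or -1 */
static int n_locs;

void pt_init(int n) {           /* locations 0..n-1 are program variables */
    assert(n <= MAX_LOCS);
    for (int i = 0; i < n; i++) { parent[i] = i; pointee[i] = -1; }
    n_locs = n;
}

int pt_find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   /* path halving */
        x = parent[x];
    }
    return x;
}

static void pt_join(int a, int b) {     /* unify two classes and their pointees */
    a = pt_find(a);
    b = pt_find(b);
    if (a == b) return;
    parent[b] = a;
    if (pointee[a] == -1) pointee[a] = pointee[b];
    else if (pointee[b] != -1) pt_join(pointee[a], pointee[b]);
}

static int deref(int p) {        /* class of *p, creating a fresh one if needed */
    int rp = pt_find(p);
    if (pointee[rp] == -1) {
        assert(n_locs < MAX_LOCS);
        parent[n_locs] = n_locs;
        pointee[n_locs] = -1;
        pointee[rp] = n_locs++;
    }
    return pt_find(pointee[rp]);
}

void pt_addr_of(int p, int x)   { pt_join(deref(p), x); }        /* p = &x */
void pt_copy(int p, int q)      { pt_join(deref(p), deref(q)); } /* p = q  */
bool pt_may_alias(int p, int q) { return deref(p) == deref(q); }
```

After p = q, the distinct variables a and b fall into one abstract object. Andersen's inclusion-based analysis would keep q's points-to set {b} separate from p's {a, b}, but it is still flow- and context-insensitive.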

17 Symbolic Bounds Analysis
Our solution: derive the symbolic lower and upper bounds of the addresses that a racy code region (e.g., a loop) may access [Rugina and Rinard, PLDI 00].

    void foo() {
      for (i = 0 to N) {
        Y[tid][i] = 0;
      }
    }

Symbolic bounds analysis yields the bounds &Y[tid][0] to &Y[tid][N].
- Increase the granularity of weak-locks to protect a larger code region, for a set of addresses specified by a symbolic expression
- Parallelism is preserved if the bounds are precise enough
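
A minimal C sketch of how the symbolic bounds &Y[tid][0] .. &Y[tid][N] can be evaluated at run time and compared across threads. It assumes inclusive bounds (matching "0 to N") and a row length COLS = N + 1; the names addr_range, loop_bounds, and ranges_overlap are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N 7
#define COLS (N + 1)   /* assumed row length */
#define NTHREADS 4

int Y[NTHREADS][COLS];

struct addr_range { uintptr_t lo, hi; };    /* inclusive byte addresses */

/* Evaluate the symbolic bounds &Y[tid][0] .. &Y[tid][N] for one thread. */
struct addr_range loop_bounds(int tid) {
    assert(tid >= 0 && tid < NTHREADS);
    struct addr_range r;
    r.lo = (uintptr_t)&Y[tid][0];
    r.hi = (uintptr_t)&Y[tid][N] + sizeof(int) - 1;   /* last byte of Y[tid][N] */
    return r;
}

bool ranges_overlap(struct addr_range a, struct addr_range b) {
    return a.lo <= b.hi && b.lo <= a.hi;
}
```

Because the bounds depend on tid, two threads running the same loop get disjoint ranges, so a weak-lock keyed on those ranges need not make them wait.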

18 Loop-level Weak-locks

    void foo() {
      X = 0;
      // loop-level weak-lock on (&Y[tid][0], &Y[tid][N])
      for (i = 0 to N) {
        Y[tid][i] = 0;
      }
    }

    void bar() {
      X = 1;
      // loop-level weak-lock on (&Y[tid][0], &Y[tid][N])
      for (i = 0 to N) {
        Y[tid][i] = 1;
      }
      Z = 1;
    }

Symbolic bounds: &Y[tid][0] ~ &Y[tid][N]
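
A loop-level weak-lock can be sketched as a lock over address ranges: a thread waits only while another thread holds an overlapping range. The C sketch below is a hypothetical design using POSIX threads, without time-out or logging; range_lock, range_acquire, range_try_acquire, and range_release are invented names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_HELD 16

struct range { uintptr_t lo, hi; };   /* inclusive address range */

struct range_lock {
    pthread_mutex_t m;          /* protects held[] and nheld */
    pthread_cond_t changed;     /* broadcast on every release */
    struct range held[MAX_HELD];
    int nheld;
};

void range_lock_init(struct range_lock *rl) {
    pthread_mutex_init(&rl->m, NULL);
    pthread_cond_init(&rl->changed, NULL);
    rl->nheld = 0;
}

/* True if r overlaps any range currently held. Caller holds rl->m. */
static bool conflicts(const struct range_lock *rl, struct range r) {
    for (int i = 0; i < rl->nheld; i++)
        if (r.lo <= rl->held[i].hi && rl->held[i].lo <= r.hi)
            return true;
    return false;
}

void range_acquire(struct range_lock *rl, struct range r) {
    assert(r.lo <= r.hi);
    pthread_mutex_lock(&rl->m);
    while (conflicts(rl, r) || rl->nheld == MAX_HELD)
        pthread_cond_wait(&rl->changed, &rl->m);
    rl->held[rl->nheld++] = r;
    pthread_mutex_unlock(&rl->m);
}

/* Non-blocking variant: returns false instead of waiting. */
bool range_try_acquire(struct range_lock *rl, struct range r) {
    pthread_mutex_lock(&rl->m);
    bool ok = !conflicts(rl, r) && rl->nheld < MAX_HELD;
    if (ok)
        rl->held[rl->nheld++] = r;
    pthread_mutex_unlock(&rl->m);
    return ok;
}

void range_release(struct range_lock *rl, struct range r) {
    pthread_mutex_lock(&rl->m);
    for (int i = 0; i < rl->nheld; i++) {
        if (rl->held[i].lo == r.lo && rl->held[i].hi == r.hi) {
            rl->held[i] = rl->held[--rl->nheld];   /* swap-remove */
            break;
        }
    }
    pthread_cond_broadcast(&rl->changed);
    pthread_mutex_unlock(&rl->m);
}
```

Before foo()'s loop a thread would acquire the range for &Y[tid][0] .. &Y[tid][N] and release it after the loop; threads with different tid values then proceed in parallel, while two threads touching the same row are ordered.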

19 Imprecise Symbolic Bounds
Sources of imprecision:
- Bounds that depend on a value computed inside the code region
- Bounds that depend on arithmetic operations not supported by the analysis, e.g., modulo operations, logical AND/OR, etc.

    void qux() {
      for (i = 0 to N) {
        prev = Z[prev];
      }
    }

Symbolic bounds analysis yields the bounds -INF to +INF.
Choosing the optimal granularity: if the bounds are too imprecise and the loop body is long enough, resort to instruction-level (basic-block-level) weak-locks to preserve parallelism.
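
The granularity choice can be expressed as a small policy function. The C sketch below is hypothetical: the threshold LONG_LOOP_THRESHOLD, the fields of struct loop_info, and the enum names are assumptions for illustration, not values or names from Chimera.

```c
#include <assert.h>
#include <stdbool.h>

enum granularity { LOOP_LEVEL, BASIC_BLOCK_LEVEL };

struct loop_info {
    bool bounds_imprecise;     /* e.g. the analysis returned -INF .. +INF */
    long est_work;             /* estimated instructions executed per loop run */
};

#define LONG_LOOP_THRESHOLD 1000L  /* hypothetical cut-off */

enum granularity choose_granularity(struct loop_info li) {
    assert(li.est_work >= 0);
    if (li.bounds_imprecise && li.est_work >= LONG_LOOP_THRESHOLD)
        return BASIC_BLOCK_LEVEL;   /* keep parallelism inside a long loop */
    return LOOP_LEVEL;              /* precise bounds, or too short to matter */
}
```

A loop such as qux(), whose bounds come out as -INF to +INF, would otherwise need a loop-level weak-lock that in effect covers all of memory, serializing every thread that runs it.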

20 Roadmap: Motivation; Chimera Analysis; Weak-lock Design; Evaluation; Conclusion

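21 Deadlock due to Weak-locks
- No deadlocks among weak-locks: they are ordered function-level > loop-level > instruction-level
- Deadlock between weak-locks and the original synchronization operations is possible
[Figure: T1 blocks in wait(cv) while holding a weak-lock that T2 needs before it can call signal(cv); the stalled weak-lock acquisition ends in a time-out]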
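
The ordering rule can be sketched as a check performed before each weak-lock acquisition. This C sketch assumes, for illustration, that a thread holds at most one weak-lock per level and must acquire them in strictly decreasing level order; wl_level and may_acquire are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Higher value = coarser weak-lock. */
enum wl_level { INSTR_LEVEL = 0, LOOP_LEVEL = 1, FUNC_LEVEL = 2 };

/* held[0..nheld-1]: levels of weak-locks this thread holds, oldest first.
   A new weak-lock must be strictly finer than the most recent one held. */
bool may_acquire(const enum wl_level *held, size_t nheld, enum wl_level next) {
    assert(nheld == 0 || held != NULL);
    return nheld == 0 || next < held[nheld - 1];
}
```

Under this rule every wait-for edge points from a held lock to a lock of strictly lower level, so no cycle, and hence no deadlock, can form among weak-locks alone. The rule says nothing about the program's own wait(cv) and signal(cv), which is why the time-out is still needed.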

22 Weak-lock Time-out
A weak-lock may time out; the runtime then invokes a special system call to handle it.
[Figure: T1, the current owner of a weak-lock, blocks in wait(cv); T2's acquisition times out, T2 becomes the current owner and calls signal(cv); the logged order of weak-locks records the ownership change]
Weak-lock guarantee: only one thread holds a given weak-lock at any given time. Mutual exclusion may be compromised, but this is sufficient for replay.
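
A hypothetical C sketch of a time-out weak-lock that reproduces the slide's scenario with POSIX threads. Chimera handles the time-out with a special system call, which is not modeled here; instead the waiter simply takes ownership after timeout_ms and records that it did so. The names weak_lock, wl_acquire, wl_release, and run_scenario, and the 50 ms time-out, are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define WL_LOG_MAX 64

struct weak_lock {
    pthread_mutex_t m;          /* protects the fields below */
    pthread_cond_t freed;       /* signaled when the owner releases */
    int owner;                  /* owning thread id, or -1 if free */
    int log[WL_LOG_MAX];        /* order of ownership changes, for replay */
    bool preempted[WL_LOG_MAX]; /* entry took ownership after a time-out */
    int len;
    long timeout_ms;
};

void wl_init(struct weak_lock *wl, long timeout_ms) {
    pthread_mutex_init(&wl->m, NULL);
    pthread_cond_init(&wl->freed, NULL);
    wl->owner = -1;
    wl->len = 0;
    wl->timeout_ms = timeout_ms;
}

static struct timespec deadline_after(long ms) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);   /* default clock of pthread_cond_timedwait */
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) { ts.tv_sec++; ts.tv_nsec -= 1000000000L; }
    return ts;
}

void wl_acquire(struct weak_lock *wl, int tid) {
    pthread_mutex_lock(&wl->m);
    struct timespec deadline = deadline_after(wl->timeout_ms);
    bool timed_out = false;
    while (wl->owner != -1 && !timed_out)
        timed_out = pthread_cond_timedwait(&wl->freed, &wl->m, &deadline) == ETIMEDOUT;
    assert(wl->len < WL_LOG_MAX);
    wl->preempted[wl->len] = (wl->owner != -1);   /* took over a live owner */
    wl->log[wl->len++] = tid;
    wl->owner = tid;
    pthread_mutex_unlock(&wl->m);
}

void wl_release(struct weak_lock *wl, int tid) {
    pthread_mutex_lock(&wl->m);
    if (wl->owner == tid) {     /* a preempted thread no longer owns the lock */
        wl->owner = -1;
        pthread_cond_signal(&wl->freed);
    }
    pthread_mutex_unlock(&wl->m);
}

/* Slide scenario: T1 holds W and waits on cv; T2 must own W before it
   signals cv. Without the time-out, this would deadlock. */
struct weak_lock W;
static pthread_mutex_t app_m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool signaled;

static void *t2_main(void *arg) {
    (void)arg;
    wl_acquire(&W, 2);          /* times out and takes W from T1 */
    pthread_mutex_lock(&app_m);
    signaled = true;
    pthread_cond_signal(&cv);   /* the program's signal(cv) */
    pthread_mutex_unlock(&app_m);
    wl_release(&W, 2);
    return NULL;
}

/* The calling thread plays T1. Returns the number of logged ownership changes. */
int run_scenario(void) {
    pthread_t t2;
    wl_init(&W, 50);
    wl_acquire(&W, 1);
    pthread_create(&t2, NULL, t2_main, NULL);
    pthread_mutex_lock(&app_m);
    while (!signaled)
        pthread_cond_wait(&cv, &app_m);   /* the program's wait(cv), holding W */
    pthread_mutex_unlock(&app_m);
    wl_release(&W, 1);          /* no-op: T2 preempted T1 */
    pthread_join(t2, NULL);
    return W.len;
}
```

The log {1, 2}, with the second entry marked as preempted, is what replay needs; the interval in which T1 and T2 both run inside the weak-lock's region is the compromised mutual exclusion that the slide accepts.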

23 Roadmap: Motivation; Chimera Analysis; Weak-lock Design; Evaluation; Conclusion

24 Implementation
- Source-to-source instrumentation, implemented in OCaml using CIL as a front end
- Static analysis
  - Data race detection: RELAY [Voung et al., FSE 07], including all library source code for soundness (uClibc's libc, libm, etc.)
  - Symbolic bounds analysis [Rugina and Rinard, PLDI 00]: intra-procedural analysis, for racy loops only
- Runtime system
  - Modified Linux kernel to record/replay program input
  - Modified pthread library to record/replay the happens-before order of original synchronization operations and weak-locks

25 Evaluation Setup
Test environment:
- 2.66 GHz 8-core Xeon processor with 4 GB of RAM
- Different sets of inputs for profiling and for performance evaluation
- Average of five trials with 4 worker threads; 2, 4, and 8 threads for scalability results
Benchmarks:
- Desktop applications: aget, pfscan, and pbzip2
- Server programs: knot and apache
- SPLASH-2 suite: ocean, water-nsq, fft, and radix

26 Record and Replay Performance
[Figure: normalized performance overhead (% slowdown) of record and replay for aget, pfscan, pbzip2, knot, apache, ocean, water, fft, radix, and the average; callouts mark an 86% slowdown and a 39% slowdown]
- Recording: 39% overhead on average
- Replay: similar to recording; much lower for I/O-intensive programs

27-31 Effectiveness of Coarse-grained Weak-locks
[Figure: normalized recording overhead for aget, pfscan, pbzip2, knot, apache, ocean, water, fft, radix, and the average under five weak-lock configurations: instr, instr + func, instr + loop, instr + loop + func, and instr + bb + loop + func; callouts mark > 53x (instr only) and 1.39x (instr + bb + loop + func)]
- Coarse-grained weak-locks reduce the cost of instrumentation
- Exception: control-flow dependency (e.g., pfscan)

32-33 Breakdown of Recording Overhead
[Figure: normalized recording overhead for aget, pfscan, pbzip2, knot, apache, ocean, water, fft, and radix, split into function-level, loop-level, and instruction/basic-block-level weak-lock costs (each divided into wait and log time) plus synchronization-operation and system logging; callouts mark high loop-lock contention and high instr/bb-lock contention]
Weak-lock overhead = contention (waiting) cost + logging cost

34 Scalability
[Figure: normalized recording overhead with 2, 4, and 8 threads (2p, 4p, 8p) for aget, pfscan, pbzip2, knot, apache, ocean, water, fft, radix, and the average]
Scientific applications scale worse due to imprecise symbolic bounds analysis.

35 Conclusion
Goal: software-only deterministic multiprocessor replay systems
Chimera analysis:
- Static data race analysis: find and protect potential data races with weak-locks (instruction/basic-block-level weak-locks)
- Profiling non-concurrent code regions: addresses the inadequacy of the lockset-based algorithm (function-level weak-locks)
- Symbolic bounds analysis: addresses the imprecision of conservative pointer analysis (loop-level weak-locks)
Low recording overhead: 39% for 4 worker threads

36 Thank you


More information

Threads and Too Much Milk! CS439: Principles of Computer Systems February 6, 2019

Threads and Too Much Milk! CS439: Principles of Computer Systems February 6, 2019 Threads and Too Much Milk! CS439: Principles of Computer Systems February 6, 2019 Bringing It Together OS has three hats: What are they? Processes help with one? two? three? of those hats OS protects itself

More information

CS377P Programming for Performance Multicore Performance Synchronization

CS377P Programming for Performance Multicore Performance Synchronization CS377P Programming for Performance Multicore Performance Synchronization Sreepathi Pai UTCS October 21, 2015 Outline 1 Synchronization Primitives 2 Blocking, Lock-free and Wait-free Algorithms 3 Transactional

More information

Parallelization Primer. by Christian Bienia March 05, 2007

Parallelization Primer. by Christian Bienia March 05, 2007 Parallelization Primer by Christian Bienia March 05, 2007 What is Parallelization? Answer: The creation of a new algorithm! Trivial case: Run sequential algorithm on multiple CPUs, throw locks around shared

More information

Optimistic Shared Memory Dependence Tracing

Optimistic Shared Memory Dependence Tracing Optimistic Shared Memory Dependence Tracing Yanyan Jiang, Du Li, Chang Xu, Xiaoxing Ma, Jian Lu State Key Laboratory for Novel Software Technology, Nanjing University Department of Computer Science and

More information

Samuel T. King, George W. Dunlap, and Peter M. Chen University of Michigan. Presented by: Zhiyong (Ricky) Cheng

Samuel T. King, George W. Dunlap, and Peter M. Chen University of Michigan. Presented by: Zhiyong (Ricky) Cheng Samuel T. King, George W. Dunlap, and Peter M. Chen University of Michigan Presented by: Zhiyong (Ricky) Cheng Outline Background Introduction Virtual Machine Model Time traveling Virtual Machine TTVM

More information

Motivation & examples Threads, shared memory, & synchronization

Motivation & examples Threads, shared memory, & synchronization 1 Motivation & examples Threads, shared memory, & synchronization How do locks work? Data races (a lower level property) How do data race detectors work? Atomicity (a higher level property) Concurrency

More information

Execution Replay for Multiprocessor Virtual Machines

Execution Replay for Multiprocessor Virtual Machines Execution Replay for Multiprocessor Virtual Machines George W. Dunlap, Dominic G. Lucchetti, Peter M. Chen Electrical Engineering and Computer Science Dept. University of Michigan Ann Arbor, MI 48109-2122

More information

Static and Dynamic Program Analysis: Synergies and Applications

Static and Dynamic Program Analysis: Synergies and Applications Static and Dynamic Program Analysis: Synergies and Applications Mayur Naik Intel Labs, Berkeley CS 243, Stanford University March 9, 2011 Today s Computing Platforms Trends: parallel cloud mobile Traits:

More information

Software-Controlled Multithreading Using Informing Memory Operations

Software-Controlled Multithreading Using Informing Memory Operations Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University

More information

CS5460: Operating Systems

CS5460: Operating Systems CS5460: Operating Systems Lecture 9: Implementing Synchronization (Chapter 6) Multiprocessor Memory Models Uniprocessor memory is simple Every load from a location retrieves the last value stored to that

More information

Light64: Ligh support for data ra. Darko Marinov, Josep Torrellas. a.cs.uiuc.edu

Light64: Ligh support for data ra. Darko Marinov, Josep Torrellas.   a.cs.uiuc.edu : Ligh htweight hardware support for data ra ce detection ec during systematic testing Adrian Nistor, Darko Marinov, Josep Torrellas University of Illinois, Urbana Champaign http://iacoma a.cs.uiuc.edu

More information

Threads. What is a thread? Motivation. Single and Multithreaded Processes. Benefits

Threads. What is a thread? Motivation. Single and Multithreaded Processes. Benefits CS307 What is a thread? Threads A thread is a basic unit of CPU utilization contains a thread ID, a program counter, a register set, and a stack shares with other threads belonging to the same process

More information

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts. CPSC/ECE 3220 Fall 2017 Exam 1 Name: 1. Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.) Referee / Illusionist / Glue. Circle only one of R, I, or G.

More information

Towards Production-Run Heisenbugs Reproduction on Commercial Hardware

Towards Production-Run Heisenbugs Reproduction on Commercial Hardware Towards Production-Run Heisenbugs Reproduction on Commercial Hardware Shiyou Huang, Bowen Cai, and Jeff Huang, Texas A&M University https://www.usenix.org/conference/atc17/technical-sessions/presentation/huang

More information

A Serializability Violation Detector for Shared-Memory Server Programs

A Serializability Violation Detector for Shared-Memory Server Programs A Serializability Violation Detector for Shared-Memory Server Programs Min Xu Rastislav Bodík Mark Hill University of Wisconsin Madison University of California, Berkeley Serializability Violation Detector:

More information

Understanding and Genera-ng High Quality Patches for Concurrency Bugs. Haopeng Liu, Yuxi Chen and Shan Lu

Understanding and Genera-ng High Quality Patches for Concurrency Bugs. Haopeng Liu, Yuxi Chen and Shan Lu 1 Understanding and Genera-ng High Quality Patches for Concurrency Bugs Haopeng Liu, Yuxi Chen and Shan Lu 2 What are concurrency bugs Synchroniza-on mistakes in mul--threaded programs 3 What are concurrency

More information

Synchronization I. Jo, Heeseung

Synchronization I. Jo, Heeseung Synchronization I Jo, Heeseung Today's Topics Synchronization problem Locks 2 Synchronization Threads cooperate in multithreaded programs To share resources, access shared data structures Also, to coordinate

More information

Lightweight Data Race Detection for Production Runs

Lightweight Data Race Detection for Production Runs Lightweight Data Race Detection for Production Runs Swarnendu Biswas, UT Austin Man Cao, Ohio State University Minjia Zhang, Microsoft Research Michael D. Bond, Ohio State University Benjamin P. Wood,

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Multithreading Multiprocessors Description Multiple processing units (multiprocessor) From single microprocessor to large compute clusters Can perform multiple

More information

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation Why are threads useful? How does one use POSIX pthreads? Michael Swift 1 2 What s in a process? Organizing a Process A process

More information

Diagnosing Production-Run Concurrency-Bug Failures. Shan Lu University of Wisconsin, Madison

Diagnosing Production-Run Concurrency-Bug Failures. Shan Lu University of Wisconsin, Madison Diagnosing Production-Run Concurrency-Bug Failures Shan Lu University of Wisconsin, Madison 1 Outline Myself and my group Production-run failure diagnosis What is this problem What are our solutions CCI

More information

CMSC 132: Object-Oriented Programming II

CMSC 132: Object-Oriented Programming II CMSC 132: Object-Oriented Programming II Synchronization in Java Department of Computer Science University of Maryland, College Park Multithreading Overview Motivation & background Threads Creating Java

More information

Multiprocessors and Locking

Multiprocessors and Locking Types of Multiprocessors (MPs) Uniform memory-access (UMA) MP Access to all memory occurs at the same speed for all processors. Multiprocessors and Locking COMP9242 2008/S2 Week 12 Part 1 Non-uniform memory-access

More information

Definition Multithreading Models Threading Issues Pthreads (Unix)

Definition Multithreading Models Threading Issues Pthreads (Unix) Chapter 4: Threads Definition Multithreading Models Threading Issues Pthreads (Unix) Solaris 2 Threads Windows 2000 Threads Linux Threads Java Threads 1 Thread A Unix process (heavy-weight process HWP)

More information

Synchronization Principles

Synchronization Principles Synchronization Principles Gordon College Stephen Brinton The Problem with Concurrency Concurrent access to shared data may result in data inconsistency Maintaining data consistency requires mechanisms

More information

Threads and Too Much Milk! CS439: Principles of Computer Systems January 31, 2018

Threads and Too Much Milk! CS439: Principles of Computer Systems January 31, 2018 Threads and Too Much Milk! CS439: Principles of Computer Systems January 31, 2018 Last Time CPU Scheduling discussed the possible policies the scheduler may use to choose the next process (or thread!)

More information

Hints for Writing About Your Research

Hints for Writing About Your Research Hints for Writing About Your Research School of Computer & Communication Sciences Good Writing so I wait for you like a lonely house till you will see me again and live in me. Till then my windows ache.

More information

High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han

High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han Seoul National University, Korea Dongduk Women s University, Korea Contents Motivation and Background

More information

CS533 Concepts of Operating Systems. Jonathan Walpole

CS533 Concepts of Operating Systems. Jonathan Walpole CS533 Concepts of Operating Systems Jonathan Walpole Introduction to Threads and Concurrency Why is Concurrency Important? Why study threads and concurrent programming in an OS class? What is a thread?

More information

Parallel Programming using OpenMP

Parallel Programming using OpenMP 1 OpenMP Multithreaded Programming 2 Parallel Programming using OpenMP OpenMP stands for Open Multi-Processing OpenMP is a multi-vendor (see next page) standard to perform shared-memory multithreading

More information

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University CS 571 Operating Systems Midterm Review Angelos Stavrou, George Mason University Class Midterm: Grading 2 Grading Midterm: 25% Theory Part 60% (1h 30m) Programming Part 40% (1h) Theory Part (Closed Books):

More information

Understanding the Interleaving Space Overlap across Inputs and So7ware Versions

Understanding the Interleaving Space Overlap across Inputs and So7ware Versions Understanding the Interleaving Space Overlap across Inputs and So7ware Versions Dongdong Deng, Wei Zhang, Borui Wang, Peisen Zhao, Shan Lu University of Wisconsin, Madison 1 Concurrency bug detec3on is

More information

Capriccio : Scalable Threads for Internet Services

Capriccio : Scalable Threads for Internet Services Capriccio : Scalable Threads for Internet Services - Ron von Behren &et al - University of California, Berkeley. Presented By: Rajesh Subbiah Background Each incoming request is dispatched to a separate

More information