Chimera: Hybrid Program Analysis for Determinism
1 Chimera: Hybrid Program Analysis for Determinism
Dongyoon Lee, Peter Chen, Jason Flinn, Satish Narayanasamy
University of Michigan, Ann Arbor
* Chimera image from
2 Deterministic Replay
Goal: record and reproduce multithreaded execution
- Debugging concurrency bugs
- Offline heavyweight dynamic analysis
- Forensics and intrusion detection
- and many more uses
Problem: multithreaded record-and-replay is too slow (>2x) or requires custom hardware
3 Multithreaded Record-and-Replay is Slow
[Diagram: Thread 1, Thread 2, and Thread 3 issuing writes and a read to shared memory]
- Checkpoint memory and register state
- Log non-deterministic program input: interrupts, I/O values, DMA, etc.
- Log shared memory dependencies
4 Replay for Data-Race-Free Programs is Cheap
[Diagram: threads T1, T2, and T3 execute Lock(l)/Unlock(l), Signal(c)/Wait(c), and writes to X, Y, and Z; arrows mark the order of memory operations and the order of synchronization operations]
Data-race-free programs
- Shared memory accesses are well ordered by synchronization operations
- Recording the happens-before order of synchronization operations is sufficient
Problem: programs with data races
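The idea that logging synchronization order is enough for a data-race-free program can be sketched as follows. This is a minimal illustration, not the recorder used in the talk: the `sync_log` type, its single global order, and the function names are all assumptions made for the example, and the log itself is not thread-safe.

```c
#include <stdbool.h>
#include <stddef.h>

#define LOG_CAPACITY 1024

/* One logged acquisition of a synchronization object (hypothetical layout). */
typedef struct {
    int lock_id;
    int thread_id;
} sync_event;

typedef struct {
    sync_event events[LOG_CAPACITY];
    size_t count;  /* events recorded so far */
    size_t cursor; /* next event to enforce during replay */
} sync_log;

/* Recording: append "thread_id acquired lock_id".
 * Not thread-safe by itself; a real recorder would serialize
 * or partition the log. Returns false when the log is full. */
bool log_record(sync_log *log, int lock_id, int thread_id)
{
    if (log->count == LOG_CAPACITY)
        return false;
    log->events[log->count].lock_id = lock_id;
    log->events[log->count].thread_id = thread_id;
    log->count++;
    return true;
}

/* Replay: a thread may acquire lock_id only if that exact
 * acquisition is the next one in the recorded order. */
bool log_replay_may_acquire(const sync_log *log, int lock_id, int thread_id)
{
    if (log->cursor >= log->count)
        return false;
    const sync_event *next = &log->events[log->cursor];
    return next->lock_id == lock_id && next->thread_id == thread_id;
}

/* Replay: consume the event once the acquisition has happened. */
void log_replay_advance(sync_log *log)
{
    if (log->cursor < log->count)
        log->cursor++;
}
```

During replay, a thread that is not next in the log would wait and retry, so the lock hand-offs, and with them the ordering of the protected memory accesses, repeat as recorded.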
5 Our Contribution: A Hybrid Analysis
Chimera: potentially racy program P -> data-race-free program P'
- Sound static data race analysis
- Add synchronizations for potential data races
Problem: too many false positives
- Profiling non-concurrent code regions
- Symbolic bounds analysis
6 Roadmap
- Motivation
- Chimera Analysis
  1) Static data race analysis
  2) Profiling non-concurrent code regions
  3) Symbolic bounds analysis
- Weak-lock Design
- Evaluation
- Conclusion
8 Static Data Race Analysis
- Find potential data races using a sound static data race detector: RELAY [Voung et al., FSE '07]
- Protect all potential data races using weak-locks, a new time-out lock that may be preempted (discussed later)
- Record and replay the happens-before order of weak-locks
9 Protect Potential Races using Weak-locks
    void foo() {
      X = 0;                 /* potential racy pair with X = 1 in bar() */
      for (i = ...) {
        Y[tid][i] = 0;       /* potential racy pair with Y[tid][i] = 1 in bar() */
      }
    }
    void bar() {
      X = 1;
      for (i = ...) {
        Y[tid][i] = 1;
      }
      Z = 1;                 /* no race report */
    }
Static analysis helps avoid instrumentation for the access to Z
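A rough sketch of what instruction-level protection of these accesses could look like is shown below. The `weak_lock_acquire` and `weak_lock_release` functions are hypothetical stand-ins that only count calls, and the lock ids, `NTHREADS`, and `N` are assumptions for the example; the real instrumentation and runtime are not shown in the slides.

```c
#include <stddef.h>

#define NTHREADS 2
#define N 4

int X, Z;
int Y[NTHREADS][N];

/* Hypothetical stand-ins for the weak-lock runtime: they only count calls. */
size_t weak_lock_acquisitions;

void weak_lock_acquire(int lock_id)
{
    (void)lock_id;
    weak_lock_acquisitions++;
}

void weak_lock_release(int lock_id)
{
    (void)lock_id;
}

/* One weak-lock per reported racy pair (assumed numbering). */
enum { LOCK_X = 0, LOCK_Y = 1 };

void foo(int tid)
{
    weak_lock_acquire(LOCK_X);
    X = 0;                      /* reported racy with X = 1 in bar() */
    weak_lock_release(LOCK_X);

    for (int i = 0; i < N; i++) {
        weak_lock_acquire(LOCK_Y);
        Y[tid][i] = 0;          /* reported racy with Y[tid][i] = 1 in bar() */
        weak_lock_release(LOCK_Y);
    }
}

void bar(int tid)
{
    weak_lock_acquire(LOCK_X);
    X = 1;
    weak_lock_release(LOCK_X);

    for (int i = 0; i < N; i++) {
        weak_lock_acquire(LOCK_Y);
        Y[tid][i] = 1;
        weak_lock_release(LOCK_Y);
    }

    Z = 1;                      /* no race report: left uninstrumented */
}
```

With `N` iterations, each call performs `N + 1` acquisitions, which hints at why per-instruction protection becomes expensive inside loops.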
10 Sources of False Positives in RELAY
A sound data-race detector reports too many false data races: 53x overhead
Source 1: Non-mutex synchronizations are ignored
- Lockset-based analysis ignores fork-join, barrier, signal-wait, etc.
- May report a false data race between memory instructions that can never execute concurrently
Source 2: Conservative pointer analysis
- Overestimates the variables accessed by a memory instruction
- May report a false data race between memory instructions that can never access the same location
12 Profiling Non-concurrent Code Regions
[Diagram: T1 runs foo() and T2 runs bar(), separated by BARRIERs]
Problem: lockset-based analysis ignores non-mutex synchronization operations
Solution
- Profile non-concurrent code regions (e.g., functions)
- Increase the granularity of weak-locks to protect a larger code region instead of each potential racy instruction
- Parallelism is preserved unless mis-profiled
13 Function-Level Weak-Locks
    void foo() {
      X = 0;
      for (i = ...) {
        Y[tid][i] = 0;
      }
    }
    void bar() {
      X = 1;
      for (i = ...) {
        Y[tid][i] = 1;
      }
      Z = 1;
    }
Used if the profiler says foo() and bar() are not likely to run concurrently
[Diagram: foo() and bar() separated by BARRIERs]
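As a hedged sketch of the coarser granularity, the version below wraps each function body in one function-level weak-lock. As before, `weak_lock_acquire` and `weak_lock_release` are hypothetical counting stubs, and the shared lock id `LOCK_FOO_BAR` is an assumption for illustration.

```c
#include <stddef.h>

#define NTHREADS 2
#define N 4

int X, Z;
int Y[NTHREADS][N];

/* Hypothetical stand-ins for the weak-lock runtime: they only count calls. */
size_t weak_lock_acquisitions;

void weak_lock_acquire(int lock_id)
{
    (void)lock_id;
    weak_lock_acquisitions++;
}

void weak_lock_release(int lock_id)
{
    (void)lock_id;
}

/* A single function-level lock shared by foo() and bar() (assumed id). */
enum { LOCK_FOO_BAR = 0 };

void foo(int tid)
{
    weak_lock_acquire(LOCK_FOO_BAR);   /* one acquisition per call */
    X = 0;
    for (int i = 0; i < N; i++)
        Y[tid][i] = 0;
    weak_lock_release(LOCK_FOO_BAR);
}

void bar(int tid)
{
    weak_lock_acquire(LOCK_FOO_BAR);
    X = 1;
    for (int i = 0; i < N; i++)
        Y[tid][i] = 1;
    weak_lock_release(LOCK_FOO_BAR);

    Z = 1;                             /* no race report: left uninstrumented */
}
```

Each call now costs one acquisition regardless of the loop length. If the profile is right and the two functions never overlap in time, the lock is uncontended and no parallelism is lost.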
15 Imprecision in Conservative Pointer Analysis
[Diagram: T1 runs foo() and T2 runs bar() between the same pair of BARRIERs, so the two functions may run concurrently]
16 Imprecision in Conservative Pointer Analysis
    void foo() {
      for (i = 0 to N) {
        Y[tid][i] = 0;       /* potential racy pair (false race) */
      }
    }
    void bar() {
      for (i = 0 to N) {
        Y[tid][i] = 1;
      }
    }
[Diagram: Thread 1 and Thread 2 accessing Y[][]]
- RELAY uses Steensgaard's and Andersen's pointer analyses
- Flow-insensitive and context-insensitive (FICI) analysis
- Naming of heap objects is conservative
- Overestimates the variables accessed by a memory instruction
17 Symbolic Bounds Analysis
Our solution: derive the symbolic lower and upper bounds that a racy code region (e.g., a loop) may access [Rugina and Rinard, PLDI '00]
    void foo() {
      for (i = 0 to N) {
        Y[tid][i] = 0;
      }
    }
  Symbolic bounds analysis gives the bounds: &Y[tid][0] to &Y[tid][N]
- Increase the granularity of weak-locks to protect a larger code region for a set of addresses specified by a symbolic expression
- Parallelism is preserved if the bounds are precise enough
18 Loop-level Weak-locks
    void foo() {
      X = 0;
      /* loop-level weak-lock over (&Y[tid][0], &Y[tid][N]) */
      for (i = 0 to N) {
        Y[tid][i] = 0;
      }
    }
    void bar() {
      X = 1;
      /* loop-level weak-lock over (&Y[tid][0], &Y[tid][N]) */
      for (i = 0 to N) {
        Y[tid][i] = 1;
      }
      Z = 1;
    }
Symbolic bounds: &Y[tid][0] ~ &Y[tid][N]
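One way to picture why address-range weak-locks preserve parallelism is an overlap test on the bounds evaluated at loop entry. The sketch below is an assumption-laden illustration: `addr_range`, `loop_bounds`, and `ranges_overlap` are hypothetical names, and the loop is taken to touch elements 0 through N-1 of the thread's own row.

```c
#include <stdbool.h>
#include <stdint.h>

#define NTHREADS 2
#define N 8

int Y[NTHREADS][N];

/* Inclusive address range guarded by one loop-level weak-lock
 * (hypothetical type, not taken from the slides). */
typedef struct {
    uintptr_t lo;
    uintptr_t hi;
} addr_range;

/* Bounds of a loop that writes Y[tid][0] .. Y[tid][N - 1],
 * evaluated with the concrete tid at loop entry. */
addr_range loop_bounds(int tid)
{
    addr_range r = { (uintptr_t)&Y[tid][0], (uintptr_t)&Y[tid][N - 1] };
    return r;
}

/* Two holders conflict only if their inclusive ranges intersect. */
bool ranges_overlap(addr_range a, addr_range b)
{
    return a.lo <= b.hi && b.lo <= a.hi;
}
```

Threads with different `tid` values get disjoint ranges, so both loops could proceed at once; only holders whose ranges intersect would have to be ordered.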
19 Imprecise Symbolic Bounds
Sources
- Bounds depend on a value computed inside the code region
- Bounds depend on arithmetic operations not supported in the analysis, e.g., modulo operations, logical AND/OR, etc.
    void qux() {
      for (i = 0 to N) {
        prev = Z[prev];
      }
    }
  Symbolic bounds analysis gives the bounds: -INF to +INF
Choosing the optimal granularity
- If the bounds are too imprecise and the loop body is long enough, resort to instruction (basic-block) level weak-locks for parallelism
21 Deadlock due to Weak-locks
- No deadlocks between weak-locks: function-level > loop-level > instruction-level
- Deadlock between weak-locks and original synchronization operations is possible
[Diagram: T1 calls wait(cv), T2 calls signal(cv); annotated "Time-out!!"]
22 Weak-lock Time-out
A weak-lock might time out
- Invoke a special system call to handle it
[Diagram: T1 (current owner) calls wait(cv); after the time-out, T2 becomes the current owner and calls signal(cv); the order of weak-locks is logged]
Weak-lock guarantee
- Only one thread holds a given weak-lock at any given time
- Mutual exclusion may be compromised, but this is sufficient for replay
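The time-out behaviour can be approximated with a lock whose acquisition gives up after a bounded number of attempts. This is only a sketch under stated assumptions: the `weak_lock` type, the spin budget `max_spins`, and the function names are invented for the example, and the ownership hand-off and logging done by the special system call are not modeled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical weak-lock: a single flag, acquired with a bounded
 * number of attempts instead of blocking forever. */
typedef struct {
    atomic_flag held;
} weak_lock;

#define WEAK_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try to take the lock at most max_spins times.
 * Returns true if acquired, false on time-out. On a time-out the
 * real system would hand control to its time-out handler; here the
 * caller just sees the failure. */
bool weak_lock_acquire(weak_lock *l, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
            return true;
    }
    return false;
}

/* Release the lock so a later acquire can succeed. */
void weak_lock_release(weak_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Because acquisition can fail instead of blocking, a thread stuck behind an owner that is itself waiting on an original synchronization operation is not blocked indefinitely, which is the deadlock scenario the time-out is meant to break.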
24 Implementation
Source-to-source instrumentation
- Implemented in OCaml using CIL as a front end
Static analysis
- Data race detection: RELAY [Voung et al., FSE '07]
- Include all library source code for soundness (uClibc's libc, libm, etc.)
- Symbolic bounds analysis: [Rugina and Rinard, PLDI '00]
- Intra-procedural analysis for racy loops only
Runtime system
- Modified Linux kernel to record/replay program input
- Modified pthread library to record/replay the happens-before order of original synchronization operations and weak-locks
25 Evaluation Setup
Test environment
- 2.66 GHz 8-core Xeon processor with 4 GB of RAM
- Different sets of inputs for profiling and performance evaluation
- Average of five trials with 4 worker threads
- 2, 4, 8 threads for scalability results
Benchmarks
- Desktop applications: aget, pfscan, and pbzip2
- Server programs: knot and apache
- SPLASH-2 suite: ocean, water-nsq, fft, and radix
26 Record and Replay Performance
[Bar chart: normalized performance overhead, record vs. replay, for aget, pfscan, pbzip2, knot, apache, ocean, water, fft, radix, and the average; chart annotations include "86% slowdown" and "39%"]
- Recording: 39% on average
- Replay: similar to recording; much lower for I/O-intensive programs
27-31 Effectiveness of Coarse-grained Weak-locks
[Bar chart, built up over five slides: normalized recording overhead for aget, pfscan, pbzip2, knot, apache, ocean, water, fft, radix, and the average, under five configurations: instr; instr + func; instr + loop; instr + loop + func; instr + bb + loop + func. Chart annotations: "> 53x" (slide 27) and "> 1.39x" (slide 31)]
- Coarse-grained weak-locks reduce the cost of instrumentation
- Exception: control-flow dependency (e.g., pfscan)
32-33 Breakdown of Recording Overhead
[Stacked bar chart: normalized recording overhead for aget, pfscan, pbzip2, knot, apache, ocean, water, fft, and radix, split into func wait, func log, loop wait, loop log, instr/bb wait, instr/bb log, and sync op & system log]
- Weak-lock overhead = contention (waiting) cost + logging cost
- Chart callouts: high loop-lock contention; high instr/bb-lock contention
34 Scalability
[Bar chart: normalized recording overhead with 2, 4, and 8 processors (2p, 4p, 8p) for aget, pfscan, pbzip2, knot, apache, ocean, water, fft, radix, and the average]
- Scientific applications scale worse due to imprecise symbolic bounds analysis
35 Conclusion
Goal: software-only deterministic multiprocessor replay systems
Chimera analysis
- Static data race analysis: find and protect potential data races with weak-locks (instruction/basic-block-level weak-locks)
- Profiling non-concurrent code regions: address the inadequacy of the lockset-based algorithm (function-level weak-locks)
- Symbolic bounds analysis: address the imprecision of conservative pointer analysis (loop-level weak-locks)
Low recording overhead: 39% for 4 worker threads
36 Thank you
More informationSynchronization I. Jo, Heeseung
Synchronization I Jo, Heeseung Today's Topics Synchronization problem Locks 2 Synchronization Threads cooperate in multithreaded programs To share resources, access shared data structures Also, to coordinate
More informationLightweight Data Race Detection for Production Runs
Lightweight Data Race Detection for Production Runs Swarnendu Biswas, UT Austin Man Cao, Ohio State University Minjia Zhang, Microsoft Research Michael D. Bond, Ohio State University Benjamin P. Wood,
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Multithreading Multiprocessors Description Multiple processing units (multiprocessor) From single microprocessor to large compute clusters Can perform multiple
More informationQuestions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process
Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation Why are threads useful? How does one use POSIX pthreads? Michael Swift 1 2 What s in a process? Organizing a Process A process
More informationDiagnosing Production-Run Concurrency-Bug Failures. Shan Lu University of Wisconsin, Madison
Diagnosing Production-Run Concurrency-Bug Failures Shan Lu University of Wisconsin, Madison 1 Outline Myself and my group Production-run failure diagnosis What is this problem What are our solutions CCI
More informationCMSC 132: Object-Oriented Programming II
CMSC 132: Object-Oriented Programming II Synchronization in Java Department of Computer Science University of Maryland, College Park Multithreading Overview Motivation & background Threads Creating Java
More informationMultiprocessors and Locking
Types of Multiprocessors (MPs) Uniform memory-access (UMA) MP Access to all memory occurs at the same speed for all processors. Multiprocessors and Locking COMP9242 2008/S2 Week 12 Part 1 Non-uniform memory-access
More informationDefinition Multithreading Models Threading Issues Pthreads (Unix)
Chapter 4: Threads Definition Multithreading Models Threading Issues Pthreads (Unix) Solaris 2 Threads Windows 2000 Threads Linux Threads Java Threads 1 Thread A Unix process (heavy-weight process HWP)
More informationSynchronization Principles
Synchronization Principles Gordon College Stephen Brinton The Problem with Concurrency Concurrent access to shared data may result in data inconsistency Maintaining data consistency requires mechanisms
More informationThreads and Too Much Milk! CS439: Principles of Computer Systems January 31, 2018
Threads and Too Much Milk! CS439: Principles of Computer Systems January 31, 2018 Last Time CPU Scheduling discussed the possible policies the scheduler may use to choose the next process (or thread!)
More informationHints for Writing About Your Research
Hints for Writing About Your Research School of Computer & Communication Sciences Good Writing so I wait for you like a lonely house till you will see me again and live in me. Till then my windows ache.
More informationHigh-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han
High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han Seoul National University, Korea Dongduk Women s University, Korea Contents Motivation and Background
More informationCS533 Concepts of Operating Systems. Jonathan Walpole
CS533 Concepts of Operating Systems Jonathan Walpole Introduction to Threads and Concurrency Why is Concurrency Important? Why study threads and concurrent programming in an OS class? What is a thread?
More informationParallel Programming using OpenMP
1 OpenMP Multithreaded Programming 2 Parallel Programming using OpenMP OpenMP stands for Open Multi-Processing OpenMP is a multi-vendor (see next page) standard to perform shared-memory multithreading
More informationCS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University
CS 571 Operating Systems Midterm Review Angelos Stavrou, George Mason University Class Midterm: Grading 2 Grading Midterm: 25% Theory Part 60% (1h 30m) Programming Part 40% (1h) Theory Part (Closed Books):
More informationUnderstanding the Interleaving Space Overlap across Inputs and So7ware Versions
Understanding the Interleaving Space Overlap across Inputs and So7ware Versions Dongdong Deng, Wei Zhang, Borui Wang, Peisen Zhao, Shan Lu University of Wisconsin, Madison 1 Concurrency bug detec3on is
More informationCapriccio : Scalable Threads for Internet Services
Capriccio : Scalable Threads for Internet Services - Ron von Behren &et al - University of California, Berkeley. Presented By: Rajesh Subbiah Background Each incoming request is dispatched to a separate
More information