Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs

Similar documents
Development of Technique for Healing Data Races based on Software Transactional Memory

GENERAL MESSAGE RACES IN DATA DISTRIBUTION SERVICE PROGRAMS FOR AIRBORNE SOFTWARE

Chimera: Hybrid Program Analysis for Determinism

Embedded System Programming

CalFuzzer: An Extensible Active Testing Framework for Concurrent Programs Pallavi Joshi 1, Mayur Naik 2, Chang-Seo Park 1, and Koushik Sen 1

Debugging Multi-Threaded Applications using Pin-Augmented GDB (PGDB)

FastTrack: Efficient and Precise Dynamic Race Detection (+ identifying destructive races)

Identifying Ad-hoc Synchronization for Enhanced Race Detection

Dynamic Race Detection with LLVM Compiler

FastTrack: Efficient and Precise Dynamic Race Detection By Cormac Flanagan and Stephen N. Freund

7/6/2015. Motivation & examples Threads, shared memory, & synchronization. Imperative programs

Trajectory Planning for Mobile Robots with Considering Velocity Constraints on Xenomai

Motivation & examples Threads, shared memory, & synchronization

Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization

Configuration Tool for ARINC 653 Operating Systems. {etchoi, jassmin, Abstract

Survey of Dynamic Instrumentation of Operating Systems

CS 261 Fall Mike Lam, Professor. Threads

Intel Threading Tools

DoubleChecker: Efficient Sound and Precise Atomicity Checking

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability

Deterministic Replay and Data Race Detection for Multithreaded Programs

Concurrency, Thread. Dongkun Shin, SKKU

Cilk. Cilk In 2008, ACM SIGPLAN awarded Best influential paper of Decade. Cilk : Biggest principle

Optimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems

Dynamic Recognition of Synchronization Operations for Improved Data Race Detection

Efficient Data Race Detection for Unified Parallel C

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

Sound Thread Local Analysis for Lockset-Based Dynamic Data Race Detection

Hybrid Dynamic Data Race Detection

Chapter 4: Threads. Overview Multithreading Models Thread Libraries Threading Issues Operating System Examples Windows XP Threads Linux Threads

Hardware Support for Software Debugging

PACER: Proportional Detection of Data Races

Efficient Data Race Detection for C/C++ Programs Using Dynamic Granularity

Questions from last time

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

Dynamic Analysis of Embedded Software using Execution Replay

The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.

On-the-fly Race Detection in Multi-Threaded Programs

Chapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues

A Personal Information Retrieval System in a Web Environment

Performance Tools for Technical Computing

CMSC 714 Lecture 14 Lamport Clocks and Eraser

CS420: Operating Systems

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Computer Systems Engineering: Spring Quiz I Solutions

Threads. CS3026 Operating Systems Lecture 06

Scalable Critical Path Analysis for Hybrid MPI-CUDA Applications

Lecture 13. Shared memory: Architecture and programming

Programming with Shared Memory. Nguyễn Quang Hùng

Analysis of Virtual Machine Scalability based on Queue Spinlock

A Serializability Violation Detector for Shared-Memory Server Programs

Runtime Atomicity Analysis of Multi-threaded Programs

Supporting Collaborative 3D Editing over Cloud Storage

Automatically Classifying Benign and Harmful Data Races Using Replay Analysis

Software Architecture and Engineering: Part II

Call Paths for Pin Tools

Professional Multicore Programming. Design and Implementation for C++ Developers

Robot localization method based on visual features and their geometric relationship

11/19/2013. Imperative programs

An Efficient Provable Data Possession Scheme based on Counting Bloom Filter for Dynamic Data in the Cloud Storage

Overview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions

CSE 4/521 Introduction to Operating Systems

Application parallelization for multi-core Android devices

Optimising Multicore JVMs. Khaled Alnowaiser

Chapter 4: Multi-Threaded Programming

Real-time processing for intelligent-surveillance applications

Lightweight Data Race Detection for Production Runs

Application of Fuzzy Logic Control to Dynamic Channel Allocation of WiMedia UWB Networks

Eliminate Threading Errors to Improve Program Stability

Online Version Only. Book made by this file is ILLEGAL. Design and Implementation of Binary File Similarity Evaluation System. 1.

Chapter 4: Threads. Chapter 4: Threads

Scheduler Activations. CS 5204 Operating Systems 1

RACE CONDITION DETECTION FOR DEBUGGING SHARED-MEMORY PARALLEL PROGRAMS

Eraser: Dynamic Data Race Detection

Process Description and Control

Analysis of an OpenMP Program for Race Detection

Parallel Programming. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Processes, Threads, SMP, and Microkernels

Chapter 4: Threads. Operating System Concepts with Java 8 th Edition

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University

Context-Switch-Directed Verification in DIVINE

Chapter 14 HARD: Host-Level Address Remapping Driver for Solid-State Disk

Threads. Threads (continued)

Typed Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts

Parallel Programming. Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops

OPERATING SYSTEM. Chapter 4: Threads

Software Analysis Techniques for Detecting Data Race

Chapter 4: Threads. Operating System Concepts 9 th Edition

Operating Systems: Internals and Design Principles. Chapter 4 Threads Seventh Edition By William Stallings

Real-Time Automatic Detection of Stuck Transactions

Time Stamp based Multiple Snapshot Management Method for Storage System

Warm-up question (CS 261 review) What is the primary difference between processes and threads from a developer s perspective?

Concurrent Programming. Alexandre David

TERN: Stable Deterministic Multithreading through Schedule Memoization

Eraser : A dynamic data race detector for multithreaded programs By Stefan Savage et al. Krish 9/1/2011

Processes The Process Model. Chapter 2 Processes and Threads. Process Termination. Process States (1) Process Hierarchies

Chapter 4: Multithreaded Programming. Operating System Concepts 8 th Edition,

Chapter 4: Threads. Operating System Concepts 9 th Edit9on

HEAT: An Integrated Static and Dynamic Approach for Thread Escape Analysis

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

Atomicity for Reliable Concurrent Software. Race Conditions. Motivations for Atomicity. Race Conditions. Race Conditions. 1. Beyond Race Conditions

Transcription:

, pp.45-49 http://dx.doi.org/10.14257/astl.2014.76.12 Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs Hyun-Ji Kim 1, Byoung-Kwi Lee 2, Ok-Kyoon Ha 3, and Yong-Kee Jun 1 1 Department of Informatics, Gyeongsang National University, 501, Jinjudea-ro, Jinju, Republic of Korea 2 POSCO ICT Company Ltd., 68, Hodong-ro, Nam-gu, Pohang, Republic of Korea 3 Engineering Research Institute, Gyeongsang National University, 501, Jinjdea-ro, Jinju, Republic of Korea hecho3927@gmail.com, bk.lee@poscoict.com, {jassmin, jun}@gnu.ac.kr Abstract. Monitoring program execution to collect the information of program executions, such as the occurrence of concurrent threads, memory accesses, and thread synchronization, are important to locate data races for debugging multithread programs. This paper presents an efficient and practical monitoring tool, called VcTrace that analyzes partial ordering of concurrent threads during an execution of the program based on vector clock system, and VcTrace is applied to detect data races for debugging real applications using multithreads. Empirical results on C/C++ benchmarks using multithreads show that VcTrace is a sound and practical tool for on-the-fly data race detection as well as analyzing multithread programs. Keywords: Multithreads, data races, debugging, dynamic monitoring. 1 Introduction It is well known that data races [1,2] are the hardest defect to handle in multithread programs because of their non-deterministic interleaving of concurrent threads. The analysis techniques for debugging data races generally use one of static and dynamic techniques [3,4]. Static techniques analyze the program information to report data races from source codes without any execution, while dynamic techniques locate data races from execution information of the program. Static techniques are sound, but imprecise since they report too many false positives. Dynamic techniques are precise and provide higher reliability of analysis than static techniques, but these techniques are unsound and inefficient because they require additional runtime overhead to monitor the information of program executions, such as the occurrence of concurrent threads, the accesses to shared memory locations, and thread synchronization. Dynamic techniques employ trace based post-mortem methods or on-the-fly methods, which report data races occurred in an execution of a programs. Post-mortem methods analyze the traced information or re-execute the program after an execution. On-the- ISSN: 2287-1233 ASTL Copyright 2014 SERSC

fly methods are based on three different analysis methods: lockset analysis, happensbefore analysis, and hybrid analysis. 2 Background For detecting data races during an execution of multithread programs, it is important to identify the partial order relation of two accesses (threads) by analyzing the Happens-before Relation [5]. The partial order relation for two accesses, e t and e u on thread t and u respectively, can be identified as following: 1. If e t must happen at an earlier time than e u, e t happens before e u, denoted by e t e u. 2. If e t e u or e u e t is not satisfied, we say that e t is concurrent with e u, denoted by e t e u. Vector clock (VC) [6] is a representation of the Happens-before relation to identify the partial order relation of concurrent threads considering thread operations and synchronization operations, such as fork-join operation. A VC is a set of clock value c for each thread while the program is executing. Thus, a VC C t for thread t represents <c 1,, c n >, where n designates the maximum number of simultaneously active threads during an execution. Using the VCs of each thread, we simply analyze the partial order relation between any two threads. If the clock value for a thread t, denoted by C t [t], is less or equal to C t [u] which is the corresponding clock value of another thread u, ( iff C t [t] C t [u]), we can analyze t u. Otherwise, C t [t] > C u [t] or C t [u] < C u [u], t u. Fig. 1 represents a multithread execution using a direct acyclic graph (DAG), and the VCs are allocated for each thread by the above partial order relation. In Fig. 1, we can easily analyze that a read r 1 on a thread t 1 happens before a read r 3 on t 0, r 1 r 3, because their VCs are satisfied (C t1 [t 0 ] = 0) (C t0 [t 0 ] = 2) by a synchronization between t 1 and t 0. On the other hand, we know that r 4 on t 1 is concurrent with r 5 on t 2, r 4 r 5, due to the fact that their VCs are satisfied (C t1 [t 1 ] = 3) > (C t2 [t 1 ] = 2). init( ) t2 r5 <0,2,1, > init( ) acq( ) rel( ) fork( ) join( ) t1 r1 r4 <0,1,0, > <0,2,0, > <0,3,0, > <0,4,1, > init( ) acq( ) rel( ) t0 w2 r3 <1,0,0, > <1,1,0, > <2,1,0, > Fig. 1. An example of DAG for an execution of multiple threads 46 Copyright 2014 SERSC

3 Design of Dynamic Monitoring Tool To locate data races during an execution of multithread programs, we must monitor important operations, such as thread operations and synchronization operations, and provide thread identification, such as VCs, which maintains logical concurrency information of multiple threads for the partial order relation. This paper presents a dynamic monitoring tool, called VcTrace that analyzes partial ordering of concurrent threads and their accesses to the shared memory locations during an execution of the program based on vector clock system, and applies it to detect data races for debugging applications. The tool considers following thread operations: init( ) for a thread initialization, fork( ) for the creation of a child thread, and join( ) which terminates forked child threads and spawns a single thread. It also collects the information of thread synchronization considering acq_lock( ) and rel_lock( ) used to provide the mutual exclusion of concurrent threads, wait( ) and signal( ) for the internal interleaving of concurrent threads, and barrier( ) for the waiting of multiple threads until the threads satisfy a waiting condition. Additionally, the tool monitors the thread accesses (read/write) to the shared memory locations. We design VcTrace to monitor these thread and synchronization operations during programs executions based on a dynamic binary instrumentation framework, called PIN [7]. The Tool requires no source code annotation to monitor an execution of the programs, because it uses a just-in-time (JIT) compiler to recompile target program binaries for dynamic instrumentation. Fig. 2 depicts the overall architecture of VcTrace. Monitoring Framework Instrumentor - JIT Compiler Monitor Multithread Programs Thread Operations Sync. Operations Thread Identification Routines Execution Report Memory Accesses Fig. 2. The overall architecture of VcTrace The tool consists of three monitoring modules which monitor thread operations, synchronization operations, and the memory accesses of threads, and the thread identification routines which create and maintain VCs to analyze the partial order relation of concurrent threads. To offer the correct identification of concurrent threads, VcTrace uses System Thread Id, Pthread Id, and PIN Thread Id for each thread. The System Thread Id is the thread id allocated by the operating system, and the Pthread Id is a identifier allocated by the pthread functions such as pthread_create( ). The PIN Thread Id is the logical identifier created whenever the Monitor catches a thread start operation, such as init( ) and fork( ). Finally, VcTrace provides comprehensive Copyright 2014 SERSC 47

information which is possible to analyze dynamically data races in an execution only using the binary codes of multithread programs. 4 Experimentation We implemented VcTrace using C/C++ on top of PIN software framework and evaluated the efficiency of the tool for data race. The implementation and experimentation was carried on a system with Intel Xeon Quard-core 2 CPUs and 48GB main memory under Linux operating system (kernel 2.6). The FastTrack algorithm was connected to VcTrace to detect data races. We employed three benchmarks using multithreads, MySql (an open source DBMS), Cherokee (an open source Web Server application), and X.264 (an open library for encoding video streams), and these applications were executed five times to measure the runtime and memory overheads. Table 1 shows measured results of average runtime and memory consumption for the three benchmarks under VcTrace. For the experiments, X.264 used 64 multiple threads during an execution, and 70 threads was used for MySql and Cherokee. In the table, Origin means the original execution without VcTrace, and Monitor means measured results on monitoring phase only by VcTrace. Detect column indicates the overheads on VcTrace connected with data race detection, and Races column indicates the number of located data races under the detection phase. Table 1. Measured runtime and memory overhead results for three benchmarks Benchmarks Time (Sec) Memory (MB) Origin Monitor Detect Origin Monitor Detect Races X.264 (64) 0.4 2 12.4 30 228 1,134 3 MySql (70) 60 60 60 337 686 1,244 13 Cherokee (70) 60 60 60 912 1,349 1,453 6 In case of X.264, Monitoring phase incurred an average runtime overhead of 5X and an average memory overhead of 7.6X, and the tool required average runtime overhead of 31X and memory overhead of 37.8X for detection phase. For MySql and Cherokee, VcTrace incurred average memory overheads of 2X and 1.4X, respectively, in monitoring phase, and incurred average memory overheads of 3.7X and 1.6X, respectively, in detection phase. Additionally, the tool precisely located data races for three benchmarks, because these data races are well known that they exist in these applications by a prior work [8]. 48 Copyright 2014 SERSC

5 Conclusion It is important to monitor the aspect of thread execution and the accesses to the shared memory location for debugging data races. This paper presented a dynamic monitoring tool, called VcTrace that analyzes partial ordering of concurrent threads and their accesses to the shared memory locations during an execution of the program based on vector clock system. Empirical results on C/C++ benchmarks using multithreads show that VcTrace is a sound and practical for on-the-fly data race detection as well as analyzing multithread programs. In the future, VcTrace might be developed as a practical software testing tool for detecting concurrency bugs, such as data races and deadlocks. Acknowledgments. This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2013R1A1A2011389) and also was supported by the Korea Evaluation Institute of Industrial Technology (KEIT) under the Development of Verification System for Mid-sized IMA Project (10043591) funded by the Ministry of Trade, Industry & Energy. References 1. Netzer, R.H.B., Miller, B.P.: What are race conditions?: Some issues and formalizations. ACM Lett. Program. Lang. Syst. 1, 74-88 (1992) 2. Banerjee, U., Bliss, B., Ma, Z., Petersen, P.: A theory of data race detection. In: Proceedings of the 2006 Workshop on Parallel and Distributed Systems: Testing and Debugging, PADTAD 2006, pp. 69 78. ACM, New York (2006) 3. Flanagan, C., Freund, S.N.: FastTrack: Efficient and precise dynamic race detection. In: Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2009, pp. 121 133. ACM, New York (2009) 4. Ha, O.-K., Kuh, I.-B., Tchamgoue, G. M., Jun, Y.-K.: On-the-fly detection of data races in OpenMP programs. In: Proceedings of the 2012 Workshop on Parallel and Distributed Systems: Testing and Debugging. PADTAD 2012, pp. 1-10. ACM, New York (2012) 5. Lamport, L.: Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 558 565 (1978) 6. Baldoni, R., Raynal, M.: Fundamentals of distributed computing: A practical tour of vector clock systems. IEEE Distributed Systems Online 3 (February 2002) 7. Bach, M., Charney, M., Cohn, R., Demikhovsky, E., Devor, T., Hazelwood, K., Jaleel, A., Luk, C.K., Lyons, G., Patil, H., Tal, A.: Analyzing parallel programs with pin. Computer 43(3), 34-41 (2010) 8. Yu, J., Narayanasamy, S.: A case for an interleaving constrained shared-memory multiprocessor. SIGARCH Comput. Archit. News 37(3), 325-336 (2009) Copyright 2014 SERSC 49