Accelerating Dynamic Data Race Detection Using Static Thread Interference Analysis

Size: px
Start display at page:

Download "Accelerating Dynamic Data Race Detection Using Static Thread Interference Analysis"

Transcription

1 Accelerating Dynamic Data Race Detection Using Static Thread Interference Peng Di and Yulei Sui School of Computer Science and Engineering The University of New South Wales 2052 Sydney Australia March 12-13, / 18

2 Outline Motivation phases Evalution 1 / 18

3 Dynamic Data Race Detector: ThreadSanitizer ThreadSanitizer (TSan) finds data races in multithreaded programs by inserting instrumentations at compile-time to perform runtime checks for all memory accesses. void foo(int *p) { *p = 42; } Orignial C code void foo(int *p) { tsan_func_entry( builtin_return_address(0)); tsan_write4(p); *p = 42; tsan_func_exit(); } Code after instrumentation 2 / 18

4 TSan performance slowdown over native code for SPLASH2 benchmarks (under compiler option -O0) 4 Threads 16 Threads 40 x 20 x 0 x barnes fft lu cb lu ncb ocean cp ocean ncp radiosity radix raytrace water nsquared water spatial average Machine: Ubuntu Linux generic Intel Xeon Quad Core HT, 3.7GHZ, 64GB 3 / 18

5 Outline Motivation phases Evalution 3 / 18

6 Framework Source Clang Front-End bc file Analyses s bc file Memory Pairs Collection Call Graph Construction Reachability Memory pairs Reachable pairs Interleaving Interleaving MHP pairs Pointer Alias Aliased pairs Thread-Local Thread-Local Lockset Lockset Escaped pairs Unlocked pairs Guided Instrumentation Instrumented bc file Code Generation Binary 4 / 18

7 Framework Source Clang Front-End bc file Analyses s bc file Memory Pairs Collection Call Graph Construction Reachability Memory pairs Reachable pairs Interleaving Interleaving MHP pairs Pointer Alias Aliased pairs Thread-Local Thread-Local Lockset Lockset Escaped pairs Unlocked pairs Guided Instrumentation Instrumented bc file Code Generation Binary 4 / 18

8 Framework Source Clang Front-End bc file Analyses s bc file Memory Pairs Collection Call Graph Construction Reachability Memory pairs Reachable pairs Interleaving Interleaving MHP pairs Pointer Alias Aliased pairs Thread-Local Thread-Local Lockset Lockset Escaped pairs Unlocked pairs Guided Instrumentation Instrumented bc file Code Generation Binary 4 / 18

9 Framework Source Clang Front-End bc file Analyses s bc file Memory Pairs Collection Call Graph Construction Reachability Memory pairs Reachable pairs Interleaving Interleaving MHP pairs Pointer Alias Aliased pairs Thread-Local Thread-Local Lockset Lockset Escaped pairs Unlocked pairs Guided Instrumentation Instrumented bc file Code Generation Binary 4 / 18

10 Framework Source Clang Front-End bc file Analyses s bc file Memory Pairs Collection Call Graph Construction Reachability Memory pairs Reachable pairs Interleaving Interleaving MHP pairs Pointer Alias Aliased pairs Thread-Local Thread-Local Lockset Lockset Escaped pairs Unlocked pairs Guided Instrumentation Instrumented bc file Code Generation Binary 4 / 18

11 Refining Pairs Only statements in the paris that are not filtered are instrumented for runtime check. bc file Memory Pairs Collection Memory Pairs Reachable Pairs MHP Pairs Alised Pairs Escaped Pairs Unlocked Pairs Guided Instrumentation Instrumented bc file 5 / 18

12 Context-Sensitive Abstract Threads An abstract thread t refers to a call of pthread create() at a context-sensitive fork site during the analysis. void main(){ cs1: cs2: } foo(); foo(); t1 refers to fork site under context [1,3] void foo(){ cs3: fork(t1, bar); } t1' refers to fork site under context [2,3] void main(){ } for(i=0;i<10;i++){ fork(t[i], foo) } t1 and t1' are context-sensitive threads t is multi-forked thread 6 / 18

13 Context-Sensitive Abstract Threads An abstract thread t refers to a call of pthread create() at a context-sensitive fork site during the analysis. void main(){ cs1: cs2: } foo(); foo(); t1 refers to fork site under context [1,3] void foo(){ cs3: fork(t1, bar); } t1' refers to fork site under context [2,3] void main(){ } for(i=0;i<10;i++){ fork(t[i], foo) } t1 and t1' are context-sensitive threads t is multi-forked thread A thread t always refers to a context-sensitive fork site, i.e., a unique runtime thread unless t M is multi-forked, in which case, t may represent more than one runtime thread. 6 / 18

14 Outline Motivation phases Thread Interleaving Alias Thread Local Lock Evalution 6 / 18

15 Context-sensitive Thread Interleaving (t 1, c 1, s 1 ) (t 2, c 2, s 2 ) holds if: { t 2 I(t 1, c 1, s 1 ) t 1 I(t 2, c 2, s 2 ) if t 1 t 2 t 1 M otherwise where I(t, c, s): denotes a set of interleaved threads may run in parallel with s in thread t under calling context c, M is the set of multi-forked threads. 7 / 18

16 Interleaving Computing I(t, c, s) is formalized as a forward data-flow problem (V,, F). V : the set of all thread interleaving facts. : meet operator ( ). F: V V transfer functions associated with each node in an ICFG. 8 / 18

17 Interleaving Rule [I-DESCENDANT] [I-SIBLING] (c,fk i ) t t (t, c, fk i ) (t, c, l) (c, l ) = Entry(S t ) {t } I(t, c, l) {t} I(t, c, l ) t t (c, l) = Entry(S t) (c, l ) = Entry(S t ) t t t t {t} I(t, c, l ) {t } I(t, c, l) [I-JOIN] (c,jn i ) t t I(t, c, jn i ) = I(t, c, jn i )\{t } [I-CALL] (t, c, l) call i (t, c, l ) c = c.push(i) I(t, c, l) I(t, c, l ) [I-INTRA] (t, c, l) (t, c, l ) I(t, c, l) I(t, c, l ) [I-RET] (t, c, l) ret i (t, c, l ) i = c.peek() c = c.pop() I(t, c, l) I(t, c, l ) 9 / 18

18 Interleaving Rule [I-DESCENDANT] t (c,fk i ) t (t, c, fk i ) (t, c, l) (c, l ) = Entry(S t ) {t } I(t, c, l) {t} I(t, c, l ) t t' I(t,c,s) = {t'} fork s s' I(t',c',s')={t} (t,c,s) (t',c',s') 10 / 18

19 Interleaving Rule [I-JOIN] (c,jn i ) t t I(t, c, jn i ) = I(t, c, jn i )\{t } t t' fork I(t,c,s) = {t'} join I(t,c,s1) = { } (t,c,s) (t,c,s) s s1 s' I(t',c',s')={t} (t',c',s') (t,c,s1) 10 / 18

20 Interleaving Rule [I-SIBLING] t t (c, l) = Entry(S t ) (c, l ) = Entry(S t ) t t t t {t} I(t, c, l ) {t } I(t, c, l) t' t0 t I(t',c',s')={} s' fork join s fork join I(t,c,s)={} (t,c,s) (t',c',s') 10 / 18

21 Interleaving Rule [I-SIBLING] t t (c, l) = Entry(S t ) (c, l ) = Entry(S t ) t t t t {t} I(t, c, l ) {t } I(t, c, l) t' t0 t I(t',c',s')={t} s' fork join s fork join I(t,c,s)={t'} (t,c,s) (t',c',s') 10 / 18

22 Interleaving Rule [I-DESCENDANT] [I-SIBLING] (c,fk i ) t t (t, c, fk i ) (t, c, l) (c, l ) = Entry(S t ) {t } I(t, c, l) {t} I(t, c, l ) t t (c, l) = Entry(S t) (c, l ) = Entry(S t ) t t t t {t} I(t, c, l ) {t } I(t, c, l) [I-JOIN] (c,jn i ) t t I(t, c, jn i ) = I(t, c, jn i )\{t } [I-CALL] (t, c, l) call i (t, c, l ) c = c.push(i) I(t, c, l) I(t, c, l ) [I-INTRA] (t, c, l) (t, c, l ) I(t, c, l) I(t, c, l ) [I-RET] (t, c, l) ret i (t, c, l ) i = c.peek() c = c.pop() I(t, c, l) I(t, c, l ) 11 / 18

23 Outline Motivation phases Thread Interleaving Alias Thread Local Lock Evalution 11 / 18

24 Alias Obtain aliasing pairs by refining MHP store-load and store-store pairs [(t, c, s), (t, c, s )], where Alias( p, q) is the set of objects pointed to by both p and q. s : p = s : = q or q = (t, c, s) (t, c, s ) o Alias( p, q) o s s int x,y; int *p,*q,*r; p=&x; q=&x; r=&y; void main(){ fork(t, foo); s1: *p=...; s2: *r=...; } void foo(){ s3:...=*q; } 12 / 18

25 Outline Motivation phases Thread Interleaving Alias Thread Local Lock Evalution 12 / 18

26 Thread-Local An object is not thread-local, i.e. escaping, if it escapes via arguments at a forksite global pointers void main(){ int x,y; fork(t,foo,&x); s1: x=...; join(t); s3: y=...; } void foo(int* p){ s2:...=*p; } 13 / 18

27 Outline Motivation phases Thread Interleaving Alias Thread Local Lock Evalution 13 / 18

28 Lockset Statements from different lockunlock spans, are interferencefree if these spans are protected by a common lock. Our framework does this by performing a flow- and contextsensitive analysis for lock/unlock operations int x; mutex m; void main(){ fork(t,foo); s1: x=...; lock(m); s3: x=...; unlock(m); join(t); } void foo(){ lock(m); s2:...=x; unlock(m); } 14 / 18

29 Outline Motivation phases Evalution 14 / 18

30 Evaluation Implementation: On top of our previous open-source tool SVF ( Based on our previous papers CGO 16 and ICPP 15 Benchmarks: 11 SPLASH2 Pthread benchmarks Machine setup: Ubuntu Linux generic Intel Xeon Quad Core HT, 3.7GHZ, 64GB 15 / 18

31 Instrumentation Statistics (under Option -O0) Pthread API TSan Our Approach Fork Join Lock Unlock Read Write Read Write barnes fft lu cb lu ncb ocean cp ocean ncp radiosity radix raytrace water nsquared water spatial / 18

32 Speedups over Original TSan (under Option -O0) 4 Threads 16 Threads 4 x 2 x 0 x barnes fft lu cb lu ncb ocean cp ocean ncp radiosity radix raytrace water nsquared water spatial average 17 / 18

33 Thanks! Q & A 18 / 18

SVF: Static Value-Flow Analysis in LLVM

SVF: Static Value-Flow Analysis in LLVM SVF: Static Value-Flow Analysis in LLVM Yulei Sui, Peng Di, Ding Ye, Hua Yan and Jingling Xue School of Computer Science and Engineering The University of New South Wales 2052 Sydney Australia March 18,

More information

Accelerating Dynamic Data Race Detection Using Static Thread Interference Analysis

Accelerating Dynamic Data Race Detection Using Static Thread Interference Analysis Accelerating Dynamic Data Race Detection Using Static Thread Interference Analysis ABSTRACT Peng Di School of Computer Science and Engineering University of New South Wales, Australia pengd@cse.unsw.edu.au

More information

Chimera: Hybrid Program Analysis for Determinism

Chimera: Hybrid Program Analysis for Determinism Chimera: Hybrid Program Analysis for Determinism Dongyoon Lee, Peter Chen, Jason Flinn, Satish Narayanasamy University of Michigan, Ann Arbor - 1 - * Chimera image from http://superpunch.blogspot.com/2009/02/chimera-sketch.html

More information

Siloed Reference Analysis

Siloed Reference Analysis Siloed Reference Analysis Xing Zhou 1. Objectives: Traditional compiler optimizations must be conservative for multithreaded programs in order to ensure correctness, since the global variables or memory

More information

Identifying Ad-hoc Synchronization for Enhanced Race Detection

Identifying Ad-hoc Synchronization for Enhanced Race Detection Identifying Ad-hoc Synchronization for Enhanced Race Detection IPD Tichy Lehrstuhl für Programmiersysteme IPDPS 20 April, 2010 Ali Jannesari / Walter F. Tichy KIT die Kooperation von Forschungszentrum

More information

Predicting Data Races from Program Traces

Predicting Data Races from Program Traces Predicting Data Races from Program Traces Luis M. Carril IPD Tichy Lehrstuhl für Programmiersysteme KIT die Kooperation von Forschungszentrum Karlsruhe GmbH und Universität Karlsruhe (TH) Concurrency &

More information

Loop-Oriented Array- and Field-Sensitive Pointer Analysis for Automatic SIMD Vectorization

Loop-Oriented Array- and Field-Sensitive Pointer Analysis for Automatic SIMD Vectorization Loop-Oriented Array- and Field-Sensitive Pointer Analysis for Automatic SIMD Vectorization Yulei Sui, Xiaokang Fan, Hao Zhou and Jingling Xue School of Computer Science and Engineering The University of

More information

This is the published version of a paper presented at CGO 2017, February 4 8, Austin, TX.

This is the published version of a paper presented at CGO 2017, February 4 8, Austin, TX. http://www.diva-portal.org This is the published version of a paper presented at CGO 2017, February 4 8, Austin, TX. Citation for the original published paper: Jimborean, A., Waern, J., Ekemark, P., Kaxiras,

More information

Accelerating Multi-core Processor Design Space Evaluation Using Automatic Multi-threaded Workload Synthesis

Accelerating Multi-core Processor Design Space Evaluation Using Automatic Multi-threaded Workload Synthesis Accelerating Multi-core Processor Design Space Evaluation Using Automatic Multi-threaded Workload Synthesis Clay Hughes & Tao Li Department of Electrical and Computer Engineering University of Florida

More information

Dynamic Race Detection with LLVM Compiler

Dynamic Race Detection with LLVM Compiler Dynamic Race Detection with LLVM Compiler Compile-time instrumentation for ThreadSanitizer Konstantin Serebryany, Alexander Potapenko, Timur Iskhodzhanov, and Dmitriy Vyukov OOO Google, 7 Balchug st.,

More information

Can We Make It Faster? Efficient May-Happen-in-Parallel Analysis Revisited

Can We Make It Faster? Efficient May-Happen-in-Parallel Analysis Revisited Can We Make It Faster? Efficient May-Happen-in-Parallel Analysis Revisited Congming Chen, Wei Huo, Lung Li Xiaobing Feng, Kai Xing State Key Laboratory of Computer Architecture Institute of Computing Technology,

More information

LDetector: A low overhead data race detector for GPU programs

LDetector: A low overhead data race detector for GPU programs LDetector: A low overhead data race detector for GPU programs 1 PENGCHENG LI CHEN DING XIAOYU HU TOLGA SOYATA UNIVERSITY OF ROCHESTER 1 Data races in GPU Introduction & Contribution Impact correctness

More information

Efficient Data Race Detection for Unified Parallel C

Efficient Data Race Detection for Unified Parallel C P A R A L L E L C O M P U T I N G L A B O R A T O R Y Efficient Data Race Detection for Unified Parallel C ParLab Winter Retreat 1/14/2011" Costin Iancu, LBL" Nick Jalbert, UC Berkeley" Chang-Seo Park,

More information

Light64: Ligh support for data ra. Darko Marinov, Josep Torrellas. a.cs.uiuc.edu

Light64: Ligh support for data ra. Darko Marinov, Josep Torrellas.   a.cs.uiuc.edu : Ligh htweight hardware support for data ra ce detection ec during systematic testing Adrian Nistor, Darko Marinov, Josep Torrellas University of Illinois, Urbana Champaign http://iacoma a.cs.uiuc.edu

More information

Typed Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts

Typed Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts Typed Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts Toshiyuki Maeda and Akinori Yonezawa University of Tokyo Quiz [Environment] CPU: Intel Xeon X5570 (2.93GHz)

More information

Deterministic Replay and Data Race Detection for Multithreaded Programs

Deterministic Replay and Data Race Detection for Multithreaded Programs Deterministic Replay and Data Race Detection for Multithreaded Programs Dongyoon Lee Computer Science Department - 1 - The Shift to Multicore Systems 100+ cores Desktop/Server 8+ cores Smartphones 2+ cores

More information

Fast dynamic program analysis Race detection. Konstantin Serebryany May

Fast dynamic program analysis Race detection. Konstantin Serebryany May Fast dynamic program analysis Race detection Konstantin Serebryany May 20 2011 Agenda Dynamic program analysis Race detection: theory ThreadSanitizer: race detector Making ThreadSanitizer

More information

Snatch: Opportunistically Reassigning Power Allocation between Processor and Memory in 3D Stacks

Snatch: Opportunistically Reassigning Power Allocation between Processor and Memory in 3D Stacks Snatch: Opportunistically Reassigning Power Allocation between and in 3D Stacks Dimitrios Skarlatos, Renji Thomas, Aditya Agrawal, Shibin Qin, Robert Pilawa, Ulya Karpuzcu, Radu Teodorescu, Nam Sung Kim,

More information

Static Versioning of Global State for Race Condition Detection

Static Versioning of Global State for Race Condition Detection Static Versioning of Global State for Race Condition Detection Steffen Keul Dept. of Programming Languages and Compilers Institute of Software Technology University of Stuttgart 15 th International Conference

More information

CS 153 Lab4 and 5. Kishore Kumar Pusukuri. Kishore Kumar Pusukuri CS 153 Lab4 and 5

CS 153 Lab4 and 5. Kishore Kumar Pusukuri. Kishore Kumar Pusukuri CS 153 Lab4 and 5 CS 153 Lab4 and 5 Kishore Kumar Pusukuri Outline Introduction A thread is a straightforward concept : a single sequential flow of control. In traditional operating systems, each process has an address

More information

Intel Threading Tools

Intel Threading Tools Intel Threading Tools Paul Petersen, Intel -1- INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS,

More information

Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui

Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui Projects 1 Information flow analysis for mobile applications 2 2 Machine-learning-guide typestate analysis for UAF vulnerabilities 3 3 Preventing

More information

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University CS 333 Introduction to Operating Systems Class 3 Threads & Concurrency Jonathan Walpole Computer Science Portland State University 1 Process creation in UNIX All processes have a unique process id getpid(),

More information

MELD: Merging Execution- and Language-level Determinism. Joe Devietti, Dan Grossman, Luis Ceze University of Washington

MELD: Merging Execution- and Language-level Determinism. Joe Devietti, Dan Grossman, Luis Ceze University of Washington MELD: Merging Execution- and Language-level Determinism Joe Devietti, Dan Grossman, Luis Ceze University of Washington CoreDet results overhead normalized to nondet 800% 600% 400% 200% 0% 2 4 8 2 4 8 2

More information

Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically

Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically Abdullah Muzahid, Shanxiang Qi, and University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu MICRO December

More information

Introduction Contech s Task Graph Representation Parallel Program Instrumentation (Break) Analysis and Usage of a Contech Task Graph Hands-on

Introduction Contech s Task Graph Representation Parallel Program Instrumentation (Break) Analysis and Usage of a Contech Task Graph Hands-on Introduction Contech s Task Graph Representation Parallel Program Instrumentation (Break) Analysis and Usage of a Contech Task Graph Hands-on Exercises 2 Contech is An LLVM compiler pass to instrument

More information

Detecting and Tolerating Asymmetric Races

Detecting and Tolerating Asymmetric Races Detecting and Tolerating Asymmetric Races Paruj Ratanaworabhan 1, Martin Burtscher 2, Darko Kirovski 3, Rahul Nagpal 4, Karthik Pattabiraman 5, and Benjamin Zorn 3 1 Cornell University, 2 The University

More information

Dynamic Deadlock Avoidance Using Statically Inferred Effects

Dynamic Deadlock Avoidance Using Statically Inferred Effects Dynamic Deadlock Avoidance Using Statically Inferred Effects Kostis Sagonas1,2 joint work with P. Gerakios1 1 N. Papaspyrou1 P. Vekris1,3 School of ECE, National Technical University of Athens, Greece

More information

POSIX PTHREADS PROGRAMMING

POSIX PTHREADS PROGRAMMING POSIX PTHREADS PROGRAMMING Download the exercise code at http://www-micrel.deis.unibo.it/~capotondi/pthreads.zip Alessandro Capotondi alessandro.capotondi(@)unibo.it Hardware Software Design of Embedded

More information

COMP6771 Advanced C++ Programming

COMP6771 Advanced C++ Programming 1.... COMP6771 Advanced C++ Programming Week 9 Multithreading 2016 www.cse.unsw.edu.au/ cs6771 .... Single Threaded Programs All programs so far this semester have been single threaded They have a single

More information

Applying Multi-Core Model Checking to Hardware-Software Partitioning in Embedded Systems

Applying Multi-Core Model Checking to Hardware-Software Partitioning in Embedded Systems V Brazilian Symposium on Computing Systems Engineering Applying Multi-Core Model Checking to Hardware-Software Partitioning in Embedded Systems Alessandro Trindade, Hussama Ismail, and Lucas Cordeiro Foz

More information

New features in AddressSanitizer. LLVM developer meeting Nov 7, 2013 Alexey Samsonov, Kostya Serebryany

New features in AddressSanitizer. LLVM developer meeting Nov 7, 2013 Alexey Samsonov, Kostya Serebryany New features in AddressSanitizer LLVM developer meeting Nov 7, 2013 Alexey Samsonov, Kostya Serebryany Agenda AddressSanitizer (ASan): a quick reminder New features: Initialization-order-fiasco Stack-use-after-scope

More information

Fractal: A Software Toolchain for Mapping Applications to Diverse, Heterogeneous Architecures

Fractal: A Software Toolchain for Mapping Applications to Diverse, Heterogeneous Architecures Fractal: A Software Toolchain for Mapping Applications to Diverse, Heterogeneous Architecures University of Virginia Dept. of Computer Science Technical Report #CS-2011-09 Jeremy W. Sheaffer and Kevin

More information

David R. Mackay, Ph.D. Libraries play an important role in threading software to run faster on Intel multi-core platforms.

David R. Mackay, Ph.D. Libraries play an important role in threading software to run faster on Intel multi-core platforms. Whitepaper Introduction A Library Based Approach to Threading for Performance David R. Mackay, Ph.D. Libraries play an important role in threading software to run faster on Intel multi-core platforms.

More information

ffwd: delegation is (much) faster than you think Sepideh Roghanchi, Jakob Eriksson, Nilanjana Basu

ffwd: delegation is (much) faster than you think Sepideh Roghanchi, Jakob Eriksson, Nilanjana Basu ffwd: delegation is (much) faster than you think Sepideh Roghanchi, Jakob Eriksson, Nilanjana Basu int get_seqno() { } return ++seqno; // ~1 Billion ops/s // single-threaded int threadsafe_get_seqno()

More information

Multicore programming in CilkPlus

Multicore programming in CilkPlus Multicore programming in CilkPlus Marc Moreno Maza University of Western Ontario, Canada CS3350 March 16, 2015 CilkPlus From Cilk to Cilk++ and Cilk Plus Cilk has been developed since 1994 at the MIT Laboratory

More information

Shared Memory Parallel Programming with Pthreads An overview

Shared Memory Parallel Programming with Pthreads An overview Shared Memory Parallel Programming with Pthreads An overview Part II Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from ECE459: Programming for Performance course at University of Waterloo

More information

Taming Parallelism in a Multi-Variant Execution Environment

Taming Parallelism in a Multi-Variant Execution Environment Taming Parallelism in a Multi-Variant Execution Environment Stijn Volckaert University of California, Irvine stijnv@uci.edu Bart Coppens Ghent University bart.coppens@ugent.be Bjorn De Sutter Ghent University

More information

PROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec

PROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec PROGRAMOVÁNÍ V C++ CVIČENÍ Michal Brabec PARALLELISM CATEGORIES CPU? SSE Multiprocessor SIMT - GPU 2 / 17 PARALLELISM V C++ Weak support in the language itself, powerful libraries Many different parallelization

More information

CS 5523 Operating Systems: Midterm II - reivew Instructor: Dr. Tongping Liu Department Computer Science The University of Texas at San Antonio

CS 5523 Operating Systems: Midterm II - reivew Instructor: Dr. Tongping Liu Department Computer Science The University of Texas at San Antonio CS 5523 Operating Systems: Midterm II - reivew Instructor: Dr. Tongping Liu Department Computer Science The University of Texas at San Antonio Fall 2017 1 Outline Inter-Process Communication (20) Threads

More information

CS 3305 Intro to Threads. Lecture 6

CS 3305 Intro to Threads. Lecture 6 CS 3305 Intro to Threads Lecture 6 Introduction Multiple applications run concurrently! This means that there are multiple processes running on a computer Introduction Applications often need to perform

More information

ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Parallel Programming

ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Parallel Programming ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Parallel Programming Copyright 2010 Daniel J. Sorin Duke University Slides are derived from work by Sarita Adve (Illinois),

More information

CS 475. Process = Address space + one thread of control Concurrent program = multiple threads of control

CS 475. Process = Address space + one thread of control Concurrent program = multiple threads of control Processes & Threads Concurrent Programs Process = Address space + one thread of control Concurrent program = multiple threads of control Multiple single-threaded processes Multi-threaded process 2 1 Concurrent

More information

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University CS 333 Introduction to Operating Systems Class 3 Threads & Concurrency Jonathan Walpole Computer Science Portland State University 1 The Process Concept 2 The Process Concept Process a program in execution

More information

W4118: concurrency error. Instructor: Junfeng Yang

W4118: concurrency error. Instructor: Junfeng Yang W4118: concurrency error Instructor: Junfeng Yang Goals Identify patterns of concurrency errors (so you can avoid them in your code) Learn techniques to detect concurrency errors (so you can apply these

More information

Deterministic Process Groups in

Deterministic Process Groups in Deterministic Process Groups in Tom Bergan Nicholas Hunt, Luis Ceze, Steven D. Gribble University of Washington A Nondeterministic Program global x=0 Thread 1 Thread 2 t := x x := t + 1 t := x x := t +

More information

Plan. Introduction to Multicore Programming. Plan. University of Western Ontario, London, Ontario (Canada) Multi-core processor CPU Coherence

Plan. Introduction to Multicore Programming. Plan. University of Western Ontario, London, Ontario (Canada) Multi-core processor CPU Coherence Plan Introduction to Multicore Programming Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 1 Multi-core Architecture 2 Race Conditions and Cilkscreen (Moreno Maza) Introduction

More information

Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs

Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs , pp.45-49 http://dx.doi.org/10.14257/astl.2014.76.12 Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs Hyun-Ji Kim 1, Byoung-Kwi Lee 2, Ok-Kyoon Ha 3, and Yong-Kee Jun 1 1 Department

More information

Heuristics for Profile-driven Method- level Speculative Parallelization

Heuristics for Profile-driven Method- level Speculative Parallelization Heuristics for Profile-driven Method- level John Whaley and Christos Kozyrakis Stanford University Speculative Multithreading Speculatively parallelize an application Uses speculation to overcome ambiguous

More information

Advanced Threads & Monitor-Style Programming 1/24

Advanced Threads & Monitor-Style Programming 1/24 Advanced Threads & Monitor-Style Programming 1/24 First: Much of What You Know About Threads Is Wrong! // Initially x == 0 and y == 0 // Thread 1 Thread 2 x = 1; if (y == 1 && x == 0) exit(); y = 1; Can

More information

Overview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions

Overview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions CMSC 330: Organization of Programming Languages Multithreaded Programming Patterns in Java CMSC 330 2 Multiprocessors Description Multiple processing units (multiprocessor) From single microprocessor to

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Multithreading Multiprocessors Description Multiple processing units (multiprocessor) From single microprocessor to large compute clusters Can perform multiple

More information

Introduction to OpenMP.

Introduction to OpenMP. Introduction to OpenMP www.openmp.org Motivation Parallelize the following code using threads: for (i=0; i

More information

Data-Centric Locality in Chapel

Data-Centric Locality in Chapel Data-Centric Locality in Chapel Ben Harshbarger Cray Inc. CHIUW 2015 1 Safe Harbor Statement This presentation may contain forward-looking statements that are based on our current expectations. Forward

More information

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors Milos Prvulovic, Zheng Zhang*, Josep Torrellas University of Illinois at Urbana-Champaign *Hewlett-Packard

More information

Uni-Address Threads: Scalable Thread Management for RDMA-based Work Stealing

Uni-Address Threads: Scalable Thread Management for RDMA-based Work Stealing Uni-Address Threads: Scalable Thread Management for RDMA-based Work Stealing Shigeki Akiyama, Kenjiro Taura The University of Tokyo June 17, 2015 HPDC 15 Lightweight Threads Lightweight threads enable

More information

Call Paths for Pin Tools

Call Paths for Pin Tools , Xu Liu, and John Mellor-Crummey Department of Computer Science Rice University CGO'14, Orlando, FL February 17, 2014 What is a Call Path? main() A() B() Foo() { x = *ptr;} Chain of function calls that

More information

Verifying Concurrent Programs

Verifying Concurrent Programs Verifying Concurrent Programs Daniel Kroening 8 May 1 June 01 Outline Shared-Variable Concurrency Predicate Abstraction for Concurrent Programs Boolean Programs with Bounded Replication Boolean Programs

More information

CS 261 Fall Mike Lam, Professor. Threads

CS 261 Fall Mike Lam, Professor. Threads CS 261 Fall 2017 Mike Lam, Professor Threads Parallel computing Goal: concurrent or parallel computing Take advantage of multiple hardware units to solve multiple problems simultaneously Motivations: Maintain

More information

EECE.4810/EECE.5730: Operating Systems Spring 2017 Homework 2 Solution

EECE.4810/EECE.5730: Operating Systems Spring 2017 Homework 2 Solution 1. (15 points) A system with two dual-core processors has four processors available for scheduling. A CPU-intensive application (e.g., a program that spends most of its time on computation, not I/O or

More information

MetaFork: A Compilation Framework for Concurrency Platforms Targeting Multicores

MetaFork: A Compilation Framework for Concurrency Platforms Targeting Multicores MetaFork: A Compilation Framework for Concurrency Platforms Targeting Multicores Presented by Xiaohui Chen Joint work with Marc Moreno Maza, Sushek Shekar & Priya Unnikrishnan University of Western Ontario,

More information

Introduction to Multicore Programming

Introduction to Multicore Programming Introduction to Multicore Programming Minsoo Ryu Department of Computer Science and Engineering 2 1 Multithreaded Programming 2 Synchronization 3 Automatic Parallelization and OpenMP 4 GPGPU 5 Q& A 2 Multithreaded

More information

Efficient, Deterministic, and Deadlock-free Concurrency

Efficient, Deterministic, and Deadlock-free Concurrency Efficient, Deterministic Concurrency p. 1/31 Efficient, Deterministic, and Deadlock-free Concurrency Nalini Vasudevan Columbia University Efficient, Deterministic Concurrency p. 2/31 Data Races int x;

More information

Explicitly Parallel Programming with Shared Memory is Insane: At Least Make it Deterministic!

Explicitly Parallel Programming with Shared Memory is Insane: At Least Make it Deterministic! Explicitly Parallel Programming with Shared Memory is Insane: At Least Make it Deterministic! Joe Devietti, Brandon Lucia, Luis Ceze and Mark Oskin University of Washington Parallel Programming is Hard

More information

Making Context-sensitive Points-to Analysis with Heap Cloning Practical For The Real World

Making Context-sensitive Points-to Analysis with Heap Cloning Practical For The Real World Making Context-sensitive Points-to Analysis with Heap Cloning Practical For The Real World Chris Lattner Apple Andrew Lenharth UIUC Vikram Adve UIUC What is Heap Cloning? Distinguish objects by acyclic

More information

Dealing with Concurrency Problems

Dealing with Concurrency Problems Dealing with Concurrency Problems Nalini Vasudevan Columbia University Dealing with Concurrency Problems, Nalini Vasudevan p. 1 What is the output? int x; foo(){ int m; m = qux(); x = x + m; bar(){ int

More information

The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.

The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism. Cilk Plus The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.) Developed originally by Cilk Arts, an MIT spinoff,

More information

Putting the Checks into Checked C. Archibald Samuel Elliott Quals - 31st Oct 2017

Putting the Checks into Checked C. Archibald Samuel Elliott Quals - 31st Oct 2017 Putting the Checks into Checked C Archibald Samuel Elliott Quals - 31st Oct 2017 Added Runtime Bounds Checks to the Checked C Compiler 2 C Extension for Spatial Memory Safety Added Runtime Bounds Checks

More information

Lecture 14 Pointer Analysis

Lecture 14 Pointer Analysis Lecture 14 Pointer Analysis Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis [ALSU 12.4, 12.6-12.7] Phillip B. Gibbons 15-745: Pointer Analysis

More information

Deterministic Concurrency

Deterministic Concurrency Candidacy Exam p. 1/35 Deterministic Concurrency Candidacy Exam Nalini Vasudevan Columbia University Motivation Candidacy Exam p. 2/35 Candidacy Exam p. 3/35 Why Parallelism? Past Vs. Future Power wall:

More information

CSCI4430 Data Communication and Computer Networks. Pthread Programming. ZHANG, Mi Jan. 26, 2017

CSCI4430 Data Communication and Computer Networks. Pthread Programming. ZHANG, Mi Jan. 26, 2017 CSCI4430 Data Communication and Computer Networks Pthread Programming ZHANG, Mi Jan. 26, 2017 Outline Introduction What is Multi-thread Programming Why to use Multi-thread Programming Basic Pthread Programming

More information

Ad Hoc Synchroniza/on Considered Harmful

Ad Hoc Synchroniza/on Considered Harmful Ad Hoc Synchroniza/on Considered Harmful Weiwei Xiong, Soyoen Park, Jiaqi Zhang, Yuanyuan Zhou and Zhiqiang Ma UC San Diego University of Illinois Intel Synchroniza/on is Important Concurrent programs

More information

ASaP: Annotations for Safe Parallelism in Clang. Alexandros Tzannes, Vikram Adve, Michael Han, Richard Latham

ASaP: Annotations for Safe Parallelism in Clang. Alexandros Tzannes, Vikram Adve, Michael Han, Richard Latham ASaP: Annotations for Safe Parallelism in Clang Alexandros Tzannes, Vikram Adve, Michael Han, Richard Latham Motivation Debugging parallel code is hard!! Many bugs are hard to reason about & reproduce

More information

Software Speculative Multithreading for Java

Software Speculative Multithreading for Java Software Speculative Multithreading for Java Christopher J.F. Pickett and Clark Verbrugge School of Computer Science, McGill University {cpicke,clump}@sable.mcgill.ca Allan Kielstra IBM Toronto Lab kielstra@ca.ibm.com

More information

Thread Tailor Dynamically Weaving Threads Together for Efficient, Adaptive Parallel Applications

Thread Tailor Dynamically Weaving Threads Together for Efficient, Adaptive Parallel Applications Thread Tailor Dynamically Weaving Threads Together for Efficient, Adaptive Parallel Applications Janghaeng Lee, Haicheng Wu, Madhumitha Ravichandran, Nathan Clark Motivation Hardware Trends Put more cores

More information

Motivation & examples Threads, shared memory, & synchronization

Motivation & examples Threads, shared memory, & synchronization 1 Motivation & examples Threads, shared memory, & synchronization How do locks work? Data races (a lower level property) How do data race detectors work? Atomicity (a higher level property) Concurrency

More information

7/6/2015. Motivation & examples Threads, shared memory, & synchronization. Imperative programs

7/6/2015. Motivation & examples Threads, shared memory, & synchronization. Imperative programs Motivation & examples Threads, shared memory, & synchronization How do locks work? Data races (a lower level property) How do data race detectors work? Atomicity (a higher level property) Concurrency exceptions

More information

pthreads CS449 Fall 2017

pthreads CS449 Fall 2017 pthreads CS449 Fall 2017 POSIX Portable Operating System Interface Standard interface between OS and program UNIX-derived OSes mostly follow POSIX Linux, macos, Android, etc. Windows requires separate

More information

CS Lecture 3! Threads! George Mason University! Spring 2010!

CS Lecture 3! Threads! George Mason University! Spring 2010! CS 571 - Lecture 3! Threads! George Mason University! Spring 2010! Threads! Overview! Multithreading! Example Applications! User-level Threads! Kernel-level Threads! Hybrid Implementation! Observing Threads!

More information

Compiler Techniques For Memory Consistency Models

Compiler Techniques For Memory Consistency Models Compiler Techniques For Memory Consistency Models Students: Xing Fang (Purdue), Jaejin Lee (Seoul National University), Kyungwoo Lee (Purdue), Zehra Sura (IBM T.J. Watson Research Center), David Wong (Intel

More information

Warm-up question (CS 261 review) What is the primary difference between processes and threads from a developer s perspective?

Warm-up question (CS 261 review) What is the primary difference between processes and threads from a developer s perspective? Warm-up question (CS 261 review) What is the primary difference between processes and threads from a developer s perspective? CS 470 Spring 2018 POSIX Mike Lam, Professor Multithreading & Pthreads MIMD

More information

RAMP-White / FAST-MP

RAMP-White / FAST-MP RAMP-White / FAST-MP Hari Angepat and Derek Chiou Electrical and Computer Engineering University of Texas at Austin Supported in part by DOE, NSF, SRC,Bluespec, Intel, Xilinx, IBM, and Freescale RAMP-White

More information

Formal Methods: Model Checking and Other Applications. Orna Grumberg Technion, Israel. Marktoberdorf 2017

Formal Methods: Model Checking and Other Applications. Orna Grumberg Technion, Israel. Marktoberdorf 2017 Formal Methods: Model Checking and Other Applications Orna Grumberg Technion, Israel Marktoberdorf 2017 1 Outline Model checking of finite-state systems Assisting in program development Program repair

More information

Calvin Lin The University of Texas at Austin

Calvin Lin The University of Texas at Austin Interprocedural Analysis Last time Introduction to alias analysis Today Interprocedural analysis March 4, 2015 Interprocedural Analysis 1 Motivation Procedural abstraction Cornerstone of programming Introduces

More information

Understanding the Interleaving Space Overlap across Inputs and So7ware Versions

Understanding the Interleaving Space Overlap across Inputs and So7ware Versions Understanding the Interleaving Space Overlap across Inputs and So7ware Versions Dongdong Deng, Wei Zhang, Borui Wang, Peisen Zhao, Shan Lu University of Wisconsin, Madison 1 Concurrency bug detec3on is

More information

PCERE: Fine-grained Parallel Benchmark Decomposition for Scalability Prediction

PCERE: Fine-grained Parallel Benchmark Decomposition for Scalability Prediction PCERE: Fine-grained Parallel Benchmark Decomposition for Scalability Prediction Mihail Popov, Chadi kel, Florent Conti, William Jalby, Pablo de Oliveira Castro UVSQ - PRiSM - ECR Mai 28, 2015 Introduction

More information

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems 1 Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems Ronald Dreslinski, Korey Sewell, Thomas Manville, Sudhir Satpathy, Nathaniel Pinckney, Geoff Blake, Michael Cieslak, Reetuparna

More information

Universidad Carlos III de Madrid Computer Science and Engineering Department Operating Systems Course

Universidad Carlos III de Madrid Computer Science and Engineering Department Operating Systems Course Exercise 1 (20 points). Autotest. Answer the quiz questions in the following table. Write the correct answer with its corresponding letter. For each 3 wrong answer, one correct answer will be subtracted

More information

11/19/2013. Imperative programs

11/19/2013. Imperative programs if (flag) 1 2 From my perspective, parallelism is the biggest challenge since high level programming languages. It s the biggest thing in 50 years because industry is betting its future that parallel programming

More information

Runtime Monitoring on Multicores via OASES

Runtime Monitoring on Multicores via OASES Runtime Monitoring on Multicores via OASES Vijay Nagarajan and Rajiv Gupta University of California, CSE Department, Riverside, CA 92521 {vijay,gupta}@cs.ucr.edu Abstract Runtime monitoring support serves

More information

Lecture 27. Pros and Cons of Pointers. Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis

Lecture 27. Pros and Cons of Pointers. Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis Pros and Cons of Pointers Lecture 27 Pointer Analysis Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis Many procedural languages have pointers

More information

Lecture 20 Pointer Analysis

Lecture 20 Pointer Analysis Lecture 20 Pointer Analysis Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis (Slide content courtesy of Greg Steffan, U. of Toronto) 15-745:

More information

Context-Switch-Directed Verification in DIVINE

Context-Switch-Directed Verification in DIVINE Context-Switch-Directed Verification in DIVINE MEMICS 2014 Vladimír Štill Petr Ročkai Jiří Barnat Faculty of Informatics Masaryk University, Brno October 18, 2014 Vladimír Štill et al. Context-Switch-Directed

More information

NVthreads: Practical Persistence for Multi-threaded Applications

NVthreads: Practical Persistence for Multi-threaded Applications NVthreads: Practical Persistence for Multi-threaded Applications Terry Hsu*, Purdue University Helge Brügner*, TU München Indrajit Roy*, Google Inc. Kimberly Keeton, Hewlett Packard Labs Patrick Eugster,

More information

CSE 374 Programming Concepts & Tools

CSE 374 Programming Concepts & Tools CSE 374 Programming Concepts & Tools Hal Perkins Fall 2017 Lecture 22 Shared-Memory Concurrency 1 Administrivia HW7 due Thursday night, 11 pm (+ late days if you still have any & want to use them) Course

More information

CIRM - Dynamic Error Detection

CIRM - Dynamic Error Detection CIRM - Dynamic Error Detection Peter Pirkelbauer Center for Applied Scientific Computing (CASC) Lawrence Livermore National Laboratory This work was funded by the Department of Defense and used elements

More information

Detecting Data Races in Multi-Threaded Programs

Detecting Data Races in Multi-Threaded Programs Slide 1 / 22 Detecting Data Races in Multi-Threaded Programs Eraser A Dynamic Data-Race Detector for Multi-Threaded Programs John C. Linford Slide 2 / 22 Key Points 1. Data races are easy to cause and

More information

LogTM: Log-Based Transactional Memory

LogTM: Log-Based Transactional Memory LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood 12th International Symposium on High Performance Computer Architecture () 26 Mulitfacet

More information

OpenACC. Part 2. Ned Nedialkov. McMaster University Canada. CS/SE 4F03 March 2016

OpenACC. Part 2. Ned Nedialkov. McMaster University Canada. CS/SE 4F03 March 2016 OpenACC. Part 2 Ned Nedialkov McMaster University Canada CS/SE 4F03 March 2016 Outline parallel construct Gang loop Worker loop Vector loop kernels construct kernels vs. parallel Data directives c 2013

More information

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects 1 EECS 598: Integrating Emerging Technologies with Computer Architecture Lecture 12: On-Chip Interconnects Instructor: Ron Dreslinski Winter 216 1 1 Announcements Upcoming lecture schedule Today: On-chip

More information