PThreads in a Nutshell

Similar documents
CSCI 4061: Threads in a Nutshell

CSE 333 SECTION 9. Threads

CS 153 Lab4 and 5. Kishore Kumar Pusukuri. Kishore Kumar Pusukuri CS 153 Lab4 and 5

Operating systems fundamentals - B06

OpenMP: Open Multi-Processing

pthreads CS449 Fall 2017

CSCI4430 Data Communication and Computer Networks. Pthread Programming. ZHANG, Mi Jan. 26, 2017

Outline. CS4254 Computer Network Architecture and Programming. Introduction 2/4. Introduction 1/4. Dr. Ayman A. Abdel-Hamid.

Introduction to PThreads and Basic Synchronization

Lecture 4. Threads vs. Processes. fork() Threads. Pthreads. Threads in C. Thread Programming January 21, 2005

LSN 13 Linux Concurrency Mechanisms

Threads. Threads (continued)

EPL372 Lab Exercise 2: Threads and pthreads. Εργαστήριο 2. Πέτρος Παναγή

Introduction to pthreads

POSIX PTHREADS PROGRAMMING

POSIX Threads. HUJI Spring 2011

CS 3305 Intro to Threads. Lecture 6

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

HPCSE - I. «Introduction to multithreading» Panos Hadjidoukas

Concurrency, Thread. Dongkun Shin, SKKU

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

CS 261 Fall Mike Lam, Professor. Threads

Thread. Disclaimer: some slides are adopted from the book authors slides with permission 1

Parallel Programming with Threads

CSci 4061 Introduction to Operating Systems. (Threads-POSIX)

CS370 Operating Systems

CS370 Operating Systems

TCSS 422: OPERATING SYSTEMS

POSIX threads CS 241. February 17, Copyright University of Illinois CS 241 Staff

ENCM 501 Winter 2019 Assignment 9

Shared Memory Parallel Programming

CS510 Operating System Foundations. Jonathan Walpole

Section 4: Threads and Context Switching

Multicore and Multiprocessor Systems: Part I

Section 4: Threads CS162. September 15, Warmup Hello World Vocabulary 2

CMPSC 311- Introduction to Systems Programming Module: Concurrency

High Performance Computing Course Notes Shared Memory Parallel Programming

CS444 1/28/05. Lab 03

CMPSC 311- Introduction to Systems Programming Module: Concurrency

Preview. What are Pthreads? The Thread ID. The Thread ID. The thread Creation. The thread Creation 10/25/2017

Shared Memory Programming: Threads and OpenMP. Lecture 6

Exercise (could be a quiz) Solution. Concurrent Programming. Roadmap. Tevfik Koşar. CSE 421/521 - Operating Systems Fall Lecture - IV Threads

Agenda. Process vs Thread. ! POSIX Threads Programming. Picture source:

Lecture 19: Shared Memory & Synchronization

CS333 Intro to Operating Systems. Jonathan Walpole

Shared Memory Programming. Parallel Programming Overview

COSC 6374 Parallel Computation. Shared memory programming with POSIX Threads. Edgar Gabriel. Fall References

1. Introduction. 2. Project Submission and Deliverables

CMPSC 311- Introduction to Systems Programming Module: Concurrency

ANSI/IEEE POSIX Standard Thread management

Chapter 4 Concurrent Programming

Programming with Shared Memory. Nguyễn Quang Hùng

Concurrency and Synchronization. ECE 650 Systems Programming & Engineering Duke University, Spring 2018

real time operating systems course

A Thread is an independent stream of instructions that can be schedule to run as such by the OS. Think of a thread as a procedure that runs

CS-345 Operating Systems. Tutorial 2: Grocer-Client Threads, Shared Memory, Synchronization

CS510 Operating System Foundations. Jonathan Walpole

Concurrency. Johan Montelius KTH

CS370 Operating Systems

Threads Tuesday, September 28, :37 AM

POSIX Threads. Paolo Burgio

Threads. What is a thread? Motivation. Single and Multithreaded Processes. Benefits

Multiprocessors 2007/2008

Concurrent Server Design Multiple- vs. Single-Thread

Shared Memory Architectures

4.8 Summary. Practice Exercises

Thread and Synchronization

Threads. studykorner.org

Pre-lab #2 tutorial. ECE 254 Operating Systems and Systems Programming. May 24, 2012

What is concurrency? Concurrency. What is parallelism? concurrency vs parallelism. Concurrency: (the illusion of) happening at the same time.

Pthreads. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Multithreaded Programming

Shared Memory Programming with Pthreads. T. Yang. UCSB CS240A.

Threads. Jo, Heeseung

Processes and Threads

Chapter 4: Threads. Operating System Concepts with Java 8 th Edition

Ricardo Rocha. Department of Computer Science Faculty of Sciences University of Porto

Chapter 4 Threads. Images from Silberschatz 03/12/18. CS460 Pacific University 1

Threaded Programming. Lecture 9: Alternatives to OpenMP

Threads need to synchronize their activities to effectively interact. This includes:

High Performance Computing Lecture 21. Matthew Jacob Indian Institute of Science

CS533 Concepts of Operating Systems. Jonathan Walpole

CSE 306/506 Operating Systems Threads. YoungMin Kwon

Warm-up question (CS 261 review) What is the primary difference between processes and threads from a developer s perspective?

Process Synchronization

CS 326: Operating Systems. Process Execution. Lecture 5

Operating Systems. Threads and Signals. Amir Ghavam Winter Winter Amir Ghavam

Creating Threads. Programming Details. COMP750 Distributed Systems

Shared Memory: Virtual Shared Memory, Threads & OpenMP

Threads. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Introduction to Embedded Systems

Thread Concept. Thread. No. 3. Multiple single-threaded Process. One single-threaded Process. Process vs. Thread. One multi-threaded Process

CS370 Operating Systems

C Grundlagen - Threads

Threads. CS3026 Operating Systems Lecture 06

POSIX Threads and OpenMP tasks

KING FAHD UNIVERSITY OF PETROLEUM & MINERALS. Information and Computer Science Department. ICS 431 Operating Systems. Lab # 9.

Questions from last time

THREADS. Jo, Heeseung

Concurrency and Threads

Transcription:

PThreads in a Nutshell Chris Kauffman CS 499: Spring 2016 GMU

Logistics Today POSIX Threads Briefly Reading Grama 7.1-9 (PThreads) POSIX Threads Programming Tutorial HW4 Upcoming Post over the weekend soon Due in last week of class OpenMP Password Cracking PThreads Version Exploration of alternative programming models Maybe a sorting routine...

Threaded Programming OpenMP provided threads via directives (#pragma omp) Thread creation, execution, and cleanup all automated PThreads is lower-level, similar to fork() / waitpid() of IPC programming

Threads vs IPC You can mix IPC/Threads if you hate yourself enough. Threads in PThreads Fast startup Memory shared by default Little protection between threads pthread_create() /join() Queues, Semaphores, Mutexes, CondVars Process in IPC Longer startup Must share memory explicitly Good protection between processes fork() / waitpid() Queues, Semaphores, Shared Mem Source Source

Thread Creation #include <pthread.h> int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg); int pthread_join(pthread_t thread, void **retval); Start a thread running function start_routine attr may be NULL for default attributes Pass arguments arg to the function Wait for thread to finish, put return in retval

Minimal Example // Minimal example of starting a pthread, passing a // parameter to the thread function, then waiting for it to // finish #include <pthread.h> #include <stdio.h> void *fx(void *param){ int p=(int) param; p = p*2; return (void *) p; } int main(){ pthread_t thread_1; pthread_create(&thread_1, NULL, fx, (void *) 42); int xres; pthread_join(thread_1, (void **) &xres); printf("result is: %d\n",xres); return 0; }

Compilation > gcc pthreads_minimal_example.c -lpthread pthreads_minimal_example.c: In function fx : pthreads_minimal_example.c:7:9: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] int p=(int) param; ^ pthreads_minimal_example.c:9:10: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] return (void *) p; ^ > a.out result is: 84

Things to Ask How much compiler support do you get with pthreads? How does one pass multiple arguments to a function? What does the parent thread do on creating a child thread? If multiple children are spawned, which execute?

Exercise: A Slice of the Pi Recall Monte-Carlo estimation of π Serial code: http://cs.gmu.edu/~kauffman/cs499/picalc.c Convert serial version to use PThreads How to determine # of threads, thread id What info to communicate threads How to accumulate results main(){ unsigned int rstate = 123456789; int npoints = atoi(argv[1]); int total_hits=0; for (int i = 0; i < npoints; i++) { double x = ((double) rand_r(&rstate)) / ((double) RAND_MAX); double y = ((double) rand_r(&rstate)) / ((double) RAND_MAX); if (x*x + y*y <= 1.0){ total_hits++; } } double pi_est = ((double)total_hits) / npoints * 4.0; }

Speedup! Recall that this problem is almost embarassingly parallel Very little communication/coordination required Speedup follows 4-proc desktop > gcc picalc.c > time a.out 100000000 npoints: 100000000 hits: 78541355 pi_est: 3.141654 real 2.35 user 2.27 sys 0.00 > gcc pthreads_picalc.c > time -p a.out 100000000 2 npoints: 100000000 hits: 78539689 pi_est: 3.141588 real 1.36 user 2.53 sys 0.02

Timings on Zeus zeus-1 [cs499]% gcc pthreads_picalc.c -lpthread -std=c99 zeus-1 [cs499]% time a.out 100000000 1... Time (s) #threads real user 1 3.405 3.340 2 1.728 3.342 3 1.180 3.341 4 0.901 3.340 5 0.736 3.339 6 0.622 3.339 7 0.543 3.340 8 0.483 3.339 9 0.596 3.342 10 0.562 3.341 What kind of speedup are we getting here?

get_thread_id()??? As noted in other answers, pthreads does not define a platform-independent way to retrieve an integral thread ID. This answer http: // stackoverflow. com/ a/ 21206357/ 316487 gives a non-portable way which works on many BSD-based platforms. Bleater on Stack Overflow // Standard opaque object, non-printable?? pthread_t opaque = pthread_self(); // Non-portable, non-linux pthread_id_np_t tid = pthread_getthreadid_np(); // Linux only pid_t tid = syscall( NR_gettid ); printf("thread %d reporting for duty\n",tid);

Mutual Exclusion POSIX provides mutual exclusion via mutexes (mutices?), commonly referred to as locks. Good for thread synchronization, can also be used in IPC sync rather than semaphores/message queues Basic pattern Crate Lock variable Initialize Lock... Obtain Lock Execute critical section Release Lock... Destroy Lock Posix Calls pthread_mutex_t lock; pthread_mutex_init(&lock, NULL);... pthread_mutex_lock(&lock); total_hits++; pthread_mutex_unlock(&lock);... pthread_mutex_destroy(&lock);

Picalc with Locks Adjust code to use global total_hits, protect updates using locking https://cs.gmu.edu/~kauffman/cs499/pthreads_picalc.c

Timing with Locks zeus-1 [cs499]% gcc pthreads_picalc_locking.c -lpthread -std zeus-1 [cs499]% time -p a.out 100000000 1 Locks Lock-Free #threads real user sys real user 1 7.05 6.98 0.00 3.405 3.340 2 27.62 22.15 28.73 1.728 3.342 3 25.65 22.60 40.07 1.180 3.341 4 34.32 29.41 94.90 0.901 3.340 5 43.65 33.26 162.76 0.736 3.339 6 41.32 29.15 194.13 0.622 3.339 7 34.66 23.72 197.91 0.543 3.340 8 30.68 20.00 200.10 0.483 3.339 9 29.11 19.21 197.84 0.596 3.342 10 28.49 18.51 197.57 0.562 3.341 Why are these numbers so much worse than the lock-free version?

Locks versus Condition Variables POSIX Mutexes use busy waiting - occupy CPU time while repeatedly trying to acquire the lock: Spin Lock or Polling Condition variables allow non-busy waiting Recall the Semaphore Check an integer value atomically Increment / decrement that value If decrementing would drop below 0, block, wait to be notified of non-zero value Built-in wait queue to notify blocked processes of changes Blocking does not use CPU CondVar Wait Queue Only the queue part of a semaphore // Wait for signals pthread_cond_wait(cv,mtx); // Wake up waiting thread pthread_cond_signal(cv); Required: External variable/variables to indicate state Required: Mutex to control access to those variables

Picalc with Condvars int critical_occupied = 0; pthread_mutex_t critical_mtx; pthread_cond_t critical_cv;...; double x = ((double) rand_r(&rstate)) / ((double) RAND_MAX); double y = ((double) rand_r(&rstate)) / ((double) RAND_MAX); if (x*x + y*y <= 1.0){ // Enter the critical section pthread_mutex_lock(&critical_mtx); while(critical_occupied){ pthread_cond_wait(&critical_cv, &critical_mtx); } critical_occupied = 1; pthread_mutex_unlock(&critical_mtx); // Update the state total_hits++; // Exit the critical section critical_occupied = 0; pthread_cond_signal(&critical_cv); }

Timings Condvar Mutex Redux #Th real user sys real user sys real user 1 9.59 9.53 0.00 7.05 6.98 0.00 3.405 3.340 2 29.73 21.62 29.09 27.62 22.15 28.73 1.728 3.342 3 85.88 62.54 151.82 25.65 22.60 40.07 1.180 3.341 4 96.39 64.24 226.77 34.32 29.41 94.90 0.901 3.340 5 126.69 72.78 368.33 43.65 33.26 162.76 0.736 3.339 6 161.57 79.77 531.09 41.32 29.15 194.13 0.622 3.339 7 178.40 80.26 646.49 34.66 23.72 197.91 0.543 3.340 8 192.38 78.64 759.83 30.68 20.00 200.10 0.483 3.339 9 202.28 78.20 820.12 29.11 19.21 197.84 0.596 3.342 10 211.73 79.41 834.23 28.49 18.51 197.57 0.562 3.341

More Canonical Examples of Condition Variables Picalc is ill-suited for either Mutexes or Condition Variables to control access to the critical section of code. More canonical example of condvar is producer/consumer Examine pthreads_producer_consumer.c

John s Solution to Mutex Problems JohnMillerCards += 5; Tried several variants of lock(), trylock() schemes Found the following to be the most scalable if (x*x + y*y > 1.0){ if(pthread_mutex_trylock(&lock) == 0){ total_miss++; pthread_mutex_unlock(&lock); } else{ pthread_yield(); pthread_mutex_lock(&lock); total_miss++; pthread_mutex_unlock(&lock); } } Beats the OpenMP critical version for scaling

Scaling of John s Solutions

Take-Home PThreads provide threaded execution within a single program, shared memory Primary capability: spawn threads starting different functions Provide basic coordination mechanisms for mutual exclusion Did not cover large swaths of other facilities (message queues, thread priority and cancellation, etc.) but these exist and should be investigated should the need arise