Data Handling in OpenMP
Guidelines for choosing how threads manipulate data:
- private: use when a thread initializes and uses a variable alone.
  - Keeps a local copy per thread, such as loop indices.
- firstprivate: use when a thread repeatedly reads a variable that has been initialized earlier in the program.
  - Each thread makes a copy and inherits the value the variable had at the time of thread creation.
  - When a thread is scheduled on a processor, the data can reside at the same processor (in its cache), so no interprocessor communication is needed.
- reduction: use when multiple threads manipulate a single piece of data.
  - Break the manipulation into local operations followed by a global operation, e.g. counting or summation.
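The three clauses above can be combined on a single loop. The following is a minimal sketch, not taken from the slides: the function name `scaled_sum_of_squares` and its parameters are hypothetical, chosen only to give each clause something to act on.

```c
#include <stdio.h>

/* Hypothetical helper: returns scale * (0^2 + 1^2 + ... + (n-1)^2). */
long scaled_sum_of_squares(int n, long scale)
{
    long total = 0;  /* combined across threads by reduction(+) */
    long sq;         /* scratch value: every thread gets its own uninitialized copy */
    int i;           /* the loop index of a parallel for is private automatically */

    /* scale is only read inside the loop, so firstprivate gives each thread
       a copy initialized with the value it had before the region. */
    #pragma omp parallel for private(sq) firstprivate(scale) reduction(+:total)
    for (i = 0; i < n; i++) {
        sq = (long)i * i;       /* local operation on private data */
        total += scale * sq;    /* accumulated locally, summed globally at the end */
    }
    return total;
}
```

With GCC this builds with `gcc -fopenmp`; without that flag the pragma is ignored and the loop runs serially with the same result.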
Data Handling in OpenMP (cont'd)
- Localization: if multiple threads manipulate different parts of a large data structure, the programmer should break it into smaller data structures and make them private to the threads.
- shared: use only after all of the above techniques have been explored.
- The threadprivate directive
  - For objects that persist through parallel and serial blocks, provided the number of threads remains the same.
  - Avoids copying into the master thread's data space and reinitializing at the next parallel block.
  - threadprivate variables are initialized once, before they are accessed in a parallel region.
- The copyin(variable_list) clause
  - Assigns the same value (the master thread's) to threadprivate variables across all threads in a parallel region.
Data Handling in OpenMP (cont'd)
- Data in threadprivate objects is guaranteed to persist only if the dynamic threads mechanism is turned off and the number of threads stays constant across parallel regions.
- The default setting of dynamic threads is implementation-defined, so disable it explicitly when relying on this persistence.
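A minimal sketch of threadprivate together with copyin follows. The variable `thread_base` and the function `all_threads_see_seed` are hypothetical names introduced for illustration; the sketch only demonstrates the copyin broadcast, not persistence across regions.

```c
#include <stdio.h>

/* File-scope variable with one persistent copy per thread. */
static int thread_base = 0;
#pragma omp threadprivate(thread_base)

/* Hypothetical helper: returns 1 if every thread's copy of thread_base
   equals seed after the copyin broadcast, 0 otherwise. */
int all_threads_see_seed(int seed)
{
    int mismatches = 0;

    thread_base = seed;  /* sets only the calling (master) thread's copy */

    /* copyin copies the master's value into each thread's copy on entry. */
    #pragma omp parallel copyin(thread_base) reduction(+:mismatches)
    {
        if (thread_base != seed)
            mismatches += 1;
    }
    return mismatches == 0;
}
```

Without the copyin clause, the other threads would still hold the initial value 0 in their copies the first time the region runs.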
Clauses/Directives Summary
- The following OpenMP directives do NOT accept clauses: master, critical, barrier, atomic, flush, ordered, and threadprivate.
Controlling Number of Threads and Processors
- omp_set_num_threads: sets the default number of threads for subsequent parallel regions.
  - Called outside the scope of a parallel region.
  - If dynamic adjustment of threads is enabled (by OMP_DYNAMIC or omp_set_dynamic()), the value is an upper bound and the runtime may use fewer threads.
- omp_get_num_threads: returns the number of threads in the current team.
  - Binds to the closest enclosing parallel directive.
- omp_get_max_threads: returns the maximum number of threads that could be created by a parallel directive.
- omp_get_thread_num: returns a unique thread ID, from 0 to omp_get_num_threads()-1.
- omp_get_num_procs: returns the number of processors that are available to execute the threaded program.
- omp_in_parallel: returns non-zero if called inside a parallel region, zero otherwise.
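The query functions above can be exercised as in the sketch below. The function `check_thread_ids` is a hypothetical helper, and the serial fallback definitions under `#else` are an assumption of this sketch so that it also builds without OpenMP support; they are not part of the OpenMP library.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for builds without OpenMP. */
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: requests a team size, then checks that the thread IDs
   are exactly 0 .. team-1. Returns the team size, or -1 if the check fails. */
int check_thread_ids(int requested)
{
    int team = 0, id_sum = 0, bad = 0;

    omp_set_num_threads(requested);  /* called outside any parallel region */

    #pragma omp parallel reduction(+:id_sum, bad)
    {
        int id = omp_get_thread_num();
        int n = omp_get_num_threads();  /* binds to this parallel region */

        if (id < 0 || id >= n)
            bad += 1;
        id_sum += id;

        #pragma omp master
        team = n;  /* shared variable, written by the master thread only */
    }
    /* Unique IDs 0 .. team-1 must sum to team * (team - 1) / 2. */
    if (bad != 0 || id_sum != team * (team - 1) / 2)
        return -1;
    return team;
}
```

The returned team size may be smaller than the requested one if the runtime adjusts or limits the number of threads.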
Controlling and Monitoring Thread Creation
- omp_set_dynamic: enables or disables dynamic adjustment of the number of threads.
  - A non-zero dynamic_threads argument enables it; a value of 0 disables it.
  - Called outside parallel regions.
- omp_get_dynamic: returns non-zero if dynamic adjustment is enabled, zero otherwise.
- omp_set_nested: enables nested parallelism if its nested argument is non-zero.
  - If disabled, any nested parallel regions are serialized.
- omp_get_nested: returns the state of nested parallelism.
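As a rough illustration of why one might disable dynamic adjustment, the sketch below turns it off and compares the team sizes of two consecutive parallel regions. The function `same_team_size_twice` is hypothetical, and the serial fallbacks under `#else` are an assumption of this sketch for builds without OpenMP.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for builds without OpenMP. */
static void omp_set_dynamic(int flag) { (void)flag; }
static int omp_get_dynamic(void) { return 0; }
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: returns the common team size of two consecutive
   parallel regions, or -1 if dynamic adjustment is still on or the sizes differ. */
int same_team_size_twice(int requested)
{
    int first = 0, second = 0;

    omp_set_dynamic(0);              /* disable dynamic adjustment */
    omp_set_num_threads(requested);  /* both calls made outside parallel regions */

    #pragma omp parallel
    {
        #pragma omp master
        first = omp_get_num_threads();
    }

    #pragma omp parallel
    {
        #pragma omp master
        second = omp_get_num_threads();
    }

    if (omp_get_dynamic() != 0 || first != second)
        return -1;
    return first;
}
```

A constant team size with dynamic adjustment off is the condition under which threadprivate data is expected to persist between regions.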
Mutual Exclusion
- omp_init_lock: initializes a lock; must be called before the lock is used.
- omp_destroy_lock: discards a lock.
- omp_set_lock: acquires a lock, blocking until it is available.
- omp_unset_lock: releases the lock.
  - The result of a thread attempting to unlock a lock owned by another thread is undefined.
- omp_test_lock: attempts to acquire the lock without blocking.
  - A non-zero return value means the lock was successfully set.
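The lock lifecycle listed above (initialize, set, unset, destroy) can be sketched as follows. The function `locked_count` is a hypothetical example, and the serial fallbacks under `#else` are an assumption of this sketch for builds without OpenMP. A counter like this would normally use a reduction; the lock is used here only to show the calls.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for builds without OpenMP. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l) { *l = 0; }
static void omp_set_lock(omp_lock_t *l) { *l = 1; }
static void omp_unset_lock(omp_lock_t *l) { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Hypothetical helper: increments a shared counter n times under a lock. */
long locked_count(int n)
{
    long count = 0;
    omp_lock_t lock;
    int i;

    omp_init_lock(&lock);          /* initialize before first use */

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        omp_set_lock(&lock);       /* blocks until the lock is acquired */
        count += 1;                /* critical update of shared data */
        omp_unset_lock(&lock);     /* released by the thread that owns it */
    }

    omp_destroy_lock(&lock);       /* discard the lock once no thread uses it */
    return count;
}
```

Replacing `omp_set_lock` with a loop around `omp_test_lock` would let a thread do other work while the lock is held elsewhere.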
Mutual Exclusion (cont'd)
- Nestable locks: can be locked multiple times by the same thread.
  - Similar to recursive mutexes in Pthreads.
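A nestable lock is useful when a function that takes the lock calls another function that takes the same lock. The sketch below assumes hypothetical `deposit` and `deposit_twice` helpers; the serial fallbacks under `#else` are an assumption of this sketch for builds without OpenMP.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for builds without OpenMP. */
typedef int omp_nest_lock_t;
static void omp_init_nest_lock(omp_nest_lock_t *l) { *l = 0; }
static void omp_set_nest_lock(omp_nest_lock_t *l) { *l += 1; }
static void omp_unset_nest_lock(omp_nest_lock_t *l) { *l -= 1; }
static void omp_destroy_nest_lock(omp_nest_lock_t *l) { (void)l; }
#endif

static omp_nest_lock_t nlock;
static long balance;

/* Hypothetical helper: adds to the shared balance under the nestable lock. */
static void deposit(long amount)
{
    omp_set_nest_lock(&nlock);    /* may already be held by this thread */
    balance += amount;
    omp_unset_nest_lock(&nlock);
}

/* Hypothetical helper: holds the lock while calling deposit, which locks again. */
static void deposit_twice(long amount)
{
    omp_set_nest_lock(&nlock);    /* outer acquisition */
    deposit(amount);              /* inner acquisition by the same thread */
    deposit(amount);
    omp_unset_nest_lock(&nlock);  /* lock is free once every set is matched */
}

/* Runs n double deposits of 1 in parallel and returns the final balance. */
long run_deposits(int n)
{
    int i;

    balance = 0;
    omp_init_nest_lock(&nlock);

    #pragma omp parallel for
    for (i = 0; i < n; i++)
        deposit_twice(1);

    omp_destroy_nest_lock(&nlock);
    return balance;
}
```

With a simple (non-nestable) lock, the inner `deposit` call would try to acquire a lock its own thread already holds.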
Environment Variables in OpenMP
- OMP_NUM_THREADS: the default number of threads.
  - Can be overridden by the omp_set_num_threads function or the num_threads clause.
  - The number of threads can be adjusted dynamically only if the variable OMP_DYNAMIC is set to TRUE or the function omp_set_dynamic has been called with a non-zero argument.
  - Example (on bash): export OMP_NUM_THREADS=8
- OMP_DYNAMIC: when set to TRUE, allows the number of threads to be controlled at runtime.
  - To disable: call the omp_set_dynamic function with a zero argument.
- OMP_NESTED: enables or disables nested parallelism.
Environment Variables in OpenMP (cont'd)
- OMP_SCHEDULE: controls the assignment of iteration spaces for for directives that use runtime scheduling, i.e. schedule(runtime).
  - Supports static, dynamic, and guided, each with an optional chunk size.
  - Examples (on bash):
    - export OMP_SCHEDULE=static,4
    - export OMP_SCHEDULE=dynamic (the default chunk size is 1)
    - export OMP_SCHEDULE=guided (the default chunk size is 1)
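OMP_SCHEDULE only affects loops that request runtime scheduling. A minimal sketch of such a loop is shown below; the function name `triangular` is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 + 2 + ... + n. The way iterations are
   distributed among threads is taken from OMP_SCHEDULE when the program runs. */
long triangular(int n)
{
    long total = 0;
    int i;

    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (i = 1; i <= n; i++)
        total += i;

    return total;
}
```

Running the same binary after `export OMP_SCHEDULE=static,4` or `export OMP_SCHEDULE=dynamic` changes the iteration assignment without recompiling; the result is the same in every case.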
Explicit Threads versus OpenMP-Based Programming
- OpenMP: a layer on top of native threads.
  - Spares the programmer the tasks of initializing attribute objects, setting up arguments to threads, partitioning iteration spaces, etc.
  - Convenient for static and regular problems.
  - The overhead is minimal.
- Explicit threads:
  - Data exchange is more apparent, which helps alleviate overheads from data movement, false sharing, and contention.
  - Richer APIs in the form of condition waits, locks of different types, and increased flexibility for building composite synchronization operations.
  - Better tools and support.
  - Used more widely than OpenMP.