What else is available besides OpenMP?
Christian Terboven (terboven@rz.rwth-aachen.de)
Center for Computing and Communication, RWTH Aachen University
Parallel Programming, June 6, RWTH Aachen University
Other Shared Memory paradigms
o OpenMP: de facto standard, supported by all compilers.
o POSIX / Win32 Threads: OpenMP is based on these; they can be programmed by hand as well.
o Intel Threading Building Blocks (TBB): C++ library, fits nicely to an STL-like programming style. Provides parallel containers and operations: tasks, parallel_for, parallel_reduce, parallel_scan, parallel_sort, ... Commercial + Open Source variants; Windows + Linux + Solaris.
o C++0x: the next release of C++. Will provide a memory model for multi-threading. Will provide basic thread management (currently in Boost).
Pi with OpenMP

double f(double x)
{
    return (double)4.0 / ((double)1.0 + (x*x));
}

void computepi()
{
    double h = (double)1.0 / (double)inumintervals;
    double sum = 0, x;
#pragma omp parallel for private(x) reduction(+:sum)
    for (int i = 1; i <= inumintervals; i++)
    {
        x = h * ((double)i - (double)0.5);
        sum += f(x);
    }
    mypi = h * sum;
}
Pi with TBB (1/2)

// initialization of the runtime
tbb::task_scheduler_init init;
CPI CalcPi(1.0 / (double)n);
// parallel reduction
tbb::parallel_reduce(tbb::blocked_range<int>(1, n+1), CalcPi);
pi = (1.0 / (double)n) * CalcPi.sum;
Advanced Topics on OpenMP
Pi with TBB (2/2)

class CPI {
public:
    double sum;

    // functor: do the actual work here
    void operator()(const tbb::blocked_range<int>& r) {
        double x = 0.0;
        for (int i = r.begin(); i != r.end(); ++i) {
            x = h * ((double)i - 0.5);
            sum += f(x);
        }
    }

    // constructor
    CPI(double) { ... }
    // splitting constructor
    CPI(CPI& other, tbb::split) { ... }
    // reduction helper
    void join(CPI& other) { sum += other.sum; }
};

// initialization of the runtime
tbb::task_scheduler_init init;
CPI CalcPi(h /*= 1.0 / (double)n*/);
// parallel reduction
tbb::parallel_reduce(tbb::blocked_range<int>(1, n+1), CalcPi);
pi = h * CalcPi.sum;
Pi with C++0x Threads [currently: Boost] (1/2)

// spawn and join threads
thread_group t;
for (int i = 0; i < g_inumthreads; i++)
    t.create_thread( CalcPi(i) );
t.join_all();
Pi with C++0x Threads [currently: Boost] (2/2)

struct CalcPi {
    int m_in;
    CalcPi(int _in) : m_in(_in) { }

    // functor: do the actual work here
    void operator()() {
        double dpi = 0.0, dh = 1.0 / (double)g_intervals, dsum = 0;
        for (int i = m_in + 1; i <= g_intervals; i += g_inumthreads) {
            double dx = dh * ((double)i - 0.5);
            dsum += f(dx);
        }
        dpi = dh * dsum;
        {   // mutual exclusion
            mutex::scoped_lock lock(g_mred);
            g_dpi += dpi;
        }
    }
};

// spawn and join threads
thread_group t;
for (int i = 0; i < g_inumthreads; i++)
    t.create_thread( CalcPi(i) );
t.join_all();
Pi with POSIX Threads (1/2)

pthread_t *tid;
pthread_mutex_t reduction_mutex;

// management overhead
tid = (pthread_t*) calloc(g_inumthreads, sizeof(pthread_t));
// spawn and join threads
for (i = 0; i < g_inumthreads; i++)
    pthread_create(&tid[i], NULL, PIworker, NULL);
for (i = 0; i < g_inumthreads; i++)
    pthread_join(tid[i], NULL);
Pi with POSIX Threads (2/2)

// outlining, the manual way
void *PIworker(void *arg) {
    int myid = pthread_self() - tid[0];
    for (int i = myid + 1; i <= g_intervals; i += g_inumthreads) {
        /* computation of dpi */
    }
    // critical region, the manual way
    pthread_mutex_lock(&reduction_mutex);
    g_dpi += dpi;
    pthread_mutex_unlock(&reduction_mutex);
    return NULL;
}
The End. Thank you for your attention!