CS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread Level Parallelism: OpenMP
|
|
- Candice Young
- 6 years ago
- Views:
Transcription
1 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread Level Parallelism: OpenMP Instructors: Randy H. Katz David A. PaGerson hgp://inst.eecs.berkeley.edu/~cs61c/sp11 3/15/11 1 Spring Lecture #16 2 Parallel Requests Assigned to computer e.g., Search Katz Parallel Threads Assigned to core e.g., Lookup, Ads So6ware Parallel InstrucYons >1 one Yme e.g., 5 pipelined instrucyons Parallel Data >1 data one Yme e.g., Add of 4 pairs of words Hardware descripyons All gates funcyoning in parallel at same Yme You Are Here! Harness Parallelism & Achieve High Performance Hardware Warehouse Scale Computer Core Memory Input/Output InstrucYon Unit(s) Main Memory Core Smart Phone Logic Gates 3 Computer (Cache) Core FuncYonal Unit(s) A 0 +B 0 A 1 +B 1 A 2 +B 2 A 3 +B 3 Project 3 Review: SynchronizaYon in MIPS Load linked: ll rt,offset(rs) Store condiyonal: sc rt,offset(rs) Succeeds if locayon not changed since the ll Returns 1 in rt (clobbers register value being stored) Fails if locayon has changed Returns 0 in rt (clobbers register value being stored) Example: atomic swap (to test/set lock variable) Exchange contents of reg and mem: $s4 ($s1) try: add $t0,$zero,$s4 ;copy exchange value ll $t1,0($s1) ;load linked sc $t0,0($s1) ;store conditional beq $t0,$zero,try ;branch store fails add $s4,$zero,$t1 ;put load value in $s4 4 OpenMP IntroducYon Administrivia OpenMP Examples Technology Break Common Pijalls Summary Agenda OpenMP IntroducYon Administrivia OpenMP Examples Technology Break Common Pijalls Summary Agenda 5 6 1
2 Machines in 61C Lab /usr/sbin/sysctl -a grep hw\. hw.model = MacPro4,1 hw.cachelinesize = 64 hw.l1icachesize: 32,768 hw.physicalcpu: 8 hw.l1dcachesize: 32,768 hw.logicalcpu: 16 hw.l2cachesize: 262,144 hw.l3cachesize: 8,388,608 hw.cpufrequency = 2,260,000,000 Try up to 16 threads to hw.physmem = see if performance gain 2,147,483,648 even though only 8 cores hw.model = MacBookAir3,1 hw.physicalcpu: 2 hw.logicalcpu: 2 hw.cpufrequency: 1,600,000,000 hw.physmem = 2,147,483,648 Randy s Laptop hw.cachelinesize = 64 hw.l1icachesize = hw.l1dcachesize = hw.l2cachesize = 3,145,728 No l3 cache Dual core One hw context per core 7 8 What is OpenMP? OpenMP SpecificaYon API used for muly- threaded, shared memory parallelism Compiler DirecYves RunYme Library RouYnes Environment Variables Portable Standardized See hgp:// hgp://compuyng.llnl.gov/tutorials/openmp/ 9 10 Shared Memory Model with Explicit Thread- based Parallelism Shared memory process consists of mulyple threads, explicit programming model with full programmer control over parallelizayon Pros: Takes advantage of shared memory, programmer need not worry (that much) about data placement Programming model is serial- like and thus conceptually simpler than alternayves (e.g., message passing/mpi) Compiler direcyves are generally simple and easy to use Legacy serial code does not need to be rewrigen Cons: Codes can only be run in shared memory environments! Compiler must support OpenMP (e.g., gcc 4.2) 11 OpenMP Uses the C Extension Pragmas Mechanism Pragmas are a preprocessor mechanism that C provides for language extensions Many different uses: structure packing, symbol aliasing, floayng point excepyon modes, Good for OpenMP because compilers that don't recognize a pragma are supposed to ignore them Runs on sequenyal computer even with embedded pragmas 12 2
3 OpenMP Programming Model OpenMP DirecYves Fork - Join Model: OpenMP programs begin as single process: master thread; Executes sequenyally unyl the first parallel region construct is encountered FORK: the master thread then creates a team of parallel threads Statements in program that are enclosed by the parallel region construct are executed in parallel among the various team threads JOIN: When the team threads complete the statements in the parallel region construct, they synchronize and terminate, leaving only the master thread shares iterayons of a loop across the team each secyon executed by a separate thread serializes the execuyon of a thread Building Block: C for loop for (i=0; i<max; i++) zero[i] = 0; Break for loop into chunks, and allocate each to a separate thread E.g., if max = 100, with two threads, assign 0-49 to thread 0, to thread 1 Must have relayvely simple shape for an OpenMP- aware compiler to be able to parallelize it Necessary for the run- Yme system to be able to determine how many of the loop iterayons to assign to each thread No premature exits from the loop allowed i.e., No break, return, exit, goto statements 15 OpenMP: Parallel for pragma #pragma omp parallel for for (i=0; i<max; i++) zero[i] = 0; Master thread creates addiyonal threads, each with a separate execuyon context All variables declared outside for loop are shared by default, except for loop index which is private per thread (Why?) Implicit synchronizayon at end of for loop Divide index regions sequenyally per thread Thread 0 gets 0, 1,, (max/n)- 1; Thread 1 gets max/n, max/n+1,, 2*(max/n)- 1 Why? 16 Student RouleGe? Thread CreaYon How many threads will OpenMP create? Defined by OMP_NUM_THREADS environment variable (or in code procedure call) Set this variable to the maximum number of threads you want OpenMP to use Usually equals the number of cores in the underlying HW on which the program is run OMP_NUM_THREADS Shell command to set number threads: export OMP_NUM_THREADS=x Shell command check number threads: echo $OMP_NUM_THREADS OpenMP intrinsic to set number of threads: omp_num_threads(x); OpenMP intrinsic to get number of threads: num_th = omp_get_num_threads(); OpenMP intrinsic to get Thread ID number: th_id = omp_get_thread_num();
4 Parallel Threads and Scope Agenda Each thread executes a copy of the code within the structured block #pragma omp parallel ID = omp_get_thread_num(); foo(id); OpenMP default is shared variables To make private, need to declare with pragma OpenMP IntroducYon Administrivia OpenMP Examples Technology Break Common Pijalls Summary #pragma omp parallel private (x) 19 Administrivia 20 Administrivia Regrade Policy Next Lab and Project Rubric on- line QuesYons covered in Discussion SecYon this week Wri@en appeal process Lab #9: Thread Level Parallelism, this week HW #5: due March 27, Make and Threads Project #3: Matrix MulYply Performance Improvement Explain rayonale for regrade request AGach rayonale to exam Submit to your TA in this week s laboratory Work in groups of two! SYck around a er class for shotgun marriage Part 1: March 27 (end of Spring Break) Part 2: April Agenda 23 hgp://en.wikipedia.org/wiki/daimler_dingo OpenMP IntroducYon Administrivia OpenMP Examples Technology Break Common Pijalls Summary 24 4
5 OpenMP Examples Hello World OMP Get Environment (hidden slides) For Loop Workshare SecYon Workshare SequenYal and Parallel Numerical CalculaYon of Pi Hello World int main (void) int nthreads, tid; /* Fork team of threads with each having a private tid variable */ #pragma omp parallel private(tid) /* Obtain and print thread id */ tid = omp_get_thread_num(); printf("hello World from thread = %d\n", tid); /* Only master thread does this */ if (tid == 0) nthreads = omp_get_num_threads(); printf("number of threads = %d\n", nthreads); /* All threads join master thread and terminate */ Hello World localhost:openmp randykatz$./omp_hello Hello World from thread = 0 Hello World from thread = 1 27 Get Environment InformaYon #include <stdlib.h> int main (int argc, char *argv[]) int nthreads, tid, procs, maxt, inpar, dynamic, nested; #pragma omp parallel private(nthreads, tid) tid = omp_get_thread_num(); /* Obtain thread number */ if (tid == 0) /* Only master thread does this */ printf("thread %d getting environment info...\n", tid); /* Get environment information */ procs = omp_get_num_procs(); nthreads = omp_get_num_threads(); maxt = omp_get_max_threads(); inpar = omp_in_parallel(); dynamic = omp_get_dynamic(); nested = omp_get_nested(); 3/14/11 Spring Lecture #16 28 Get Environment InformaYon /* Print environment information */ printf("number of processors = %d\n", procs); printf("number of threads = %d\n", nthreads); printf("max threads = %d\n", maxt); printf("in parallel? = %d\n", inpar); printf("dynamic threads enabled? = %d\n", dynamic); printf("nested parallelism supported? = %d\n", nested); /* Parallel Done */ Get Environment InformaYon Thread 0 getting environment info... Number of processors = 2 Max threads = 1 In parallel? = 1 Dynamic threads enabled? = 0 Nested parallelism supported? = 0 Executes two threads Prints informadon from thread 0 only At this point just one thread available, hence max threads = 1 3/14/11 Spring Lecture # /14/11 Spring Lecture #
6 Workshare for loop Example #1 #include <stdlib.h> #define CHUNKSIZE 10 #define N 100 int main (int argc, char *argv[]) int nthreads, tid, i, chunk; float a[n], b[n], c[n]; /* Some initializations */ for (i=0; i < N; i++) a[i] = b[i] = i * 1.0; chunk = CHUNKSIZE; 31 Workshare for loop Example #1 #pragma omp parallel shared(a,b,c,nthreads,chunk) private(i,tid) tid = omp_get_thread_num(); if (tid == 0) nthreads = omp_get_num_threads(); printf("number of threads = %d\n", nthreads); printf("thread %d starting...\n",tid); #pragma omp for schedule(dynamic,chunk) for (i=0; i<n; i++) c[i] = a[i] + b[i]; printf("thread %d: c[%d]= %f\n",tid,i,c[i]); /* end of parallel section */ The iteradons of a loop are scheduled dynamically across the team of threads. A thread will perform CHUNK iteradons at a Dme before being scheduled for the next CHUNK of work. 32 Workshare for loop Example #1... Thread 0 staryng... Thread 1: c[82]= Thread 0: c[0]= Thread 1: c[83]= Thread 0: c[1]= Thread 1: c[84]= Thread 0: c[2]= Thread 1: c[85]= Thread 0: c[3]= Thread 0: c[70]= Thread 0: c[4]= Thread 1: c[86]= Thread 0: c[5]= Thread 0: c[71]= Thread 0: c[6]= Thread 1: c[87]= Thread 0: c[7]= Thread 0: c[72]= Thread 0: c[8]= Thread 1: c[88]= Thread 0: c[9]= Thread 0: c[73]= Thread 0: c[10]= Thread 1: c[89]= Thread 1 staryng... Thread 0: c[74]= Thread 0: c[11]= Thread 1: c[90]= Thread 1: c[20]= Thread 0: c[75]= Thread 0: c[12]= Thread 1: c[91]= Thread 1: c[21]= Thread 0: c[76]= Thread 0: c[13]= Thread 0: c[77]= Thread 1: c[22]= Thread 0: c[78]= Thread 0: c[14]= Thread 0: c[79]= Thread 1: c[23]= Thread 1: c[92]= Thread 0: c[15]= Thread 1: c[93]= Thread 1: c[24]= Thread 1: c[94]= Thread 0: c[16]= Thread 1: c[95]= Thread 1: c[25]= Thread 1: c[96]= Thread 1: c[26]= Thread 1: c[97]= Thread 1: c[27]= Thread 1: c[98]= Thread 1: c[28]= Thread 1: c[99]= Workshare for loop Example #1... Thread 0 staryng... Thread 0: c[78]= Thread 1 staryng... Thread 1: c[25]= Thread 0: c[0]= Thread 0: c[79]= Thread 1: c[10]= Thread 1: c[26]= Thread 0: c[1]= Thread 0: c[80]= Thread 1: c[11]= Thread 1: c[27]= Thread 0: c[2]= Thread 0: c[81]= Thread 1: c[12]= Thread 1: c[28]= Thread 0: c[3]= Thread 0: c[82]= Thread 1: c[13]= Thread 1: c[29]= Thread 0: c[4]= Thread 0: c[83]= Thread 1: c[14]= Thread 1: c[90]= Thread 0: c[5]= Thread 0: c[84]= Thread 1: c[15]= Thread 1: c[91]= Thread 0: c[6]= Thread 0: c[85]= Thread 1: c[16]= Thread 1: c[92]= Thread 1: c[17]= Thread 0: c[86]= Thread 1: c[18]= Thread 1: c[93]= Thread 0: c[7]= Thread 0: c[87]= Thread 1: c[19]= Thread 1: c[94]= Thread 0: c[8]= Thread 0: c[88]= Thread 1: c[20]= Thread 1: c[95]= Thread 0: c[9]= Thread 0: c[89]= Thread 1: c[21]= Thread 1: c[96]= Thread 0: c[30]= Thread 1: c[97]= Thread 1: c[22]= Thread 1: c[98]= Thread 0: c[31]= Thread 1: c[99]= Thread 1: c[23]= A er export OMP_IN_PARALLEL=2 Workshare sections Example #2 #include <stdlib.h> #define N 50 int main (int argc, char *argv[]) int i, nthreads, tid; float a[n], b[n], c[n], d[n]; /* Some initializations */ for (i=0; i<n; i++) a[i] = i * 1.5; b[i] = i ; c[i] = d[i] = 0.0; Workshare sections Example #2 #pragma omp parallel shared(a,b,c,d,nthreads) private(i,tid) tid = omp_get_thread_num(); if (tid == 0) nthreads = omp_get_num_threads(); printf("number of threads = %d\n", nthreads); printf("thread %d starting...\n",tid); #pragma omp sections nowait #pragma omp section printf("thread %d doing section 1\n",tid); for (i=0; i<n; i++) c[i] = a[i] + b[i]; printf("thread %d: c[%d]= %f\n",tid,i,c[i]); SecDons
7 Workshare sections Example #2 #pragma omp section printf("thread %d doing section 2\n",tid); for (i=0; i<n; i++) d[i] = a[i] * b[i]; printf("thread %d: d[%d]= %f\n",tid,i,d[i]); /* end of sections */ printf("thread %d done.\n",tid); /* end of parallel section */ SecDons 37 Workshare sections Example #2... Thread 0 staryng... Thread 0: d[26]= Thread 0 doing secyon 1 Thread 0: d[27]= Thread 0: c[0]= Thread 0: d[28]= Thread 0: c[1]= Thread 0: d[29]= Thread 0: c[2]= Thread 0: d[30]= Thread 0: c[3]= Thread 0: d[31]= Thread 0: c[4]= Thread 0: d[32]= Thread 0: c[5]= Thread 0: d[33]= Thread 0: c[6]= Thread 0: d[34]= Thread 0: c[7]= Thread 0: d[35]= Thread 0: c[8]= Thread 0: d[36]= Thread 0: c[9]= Thread 0: d[37]= Thread 0: c[10]= Thread 0: d[38]= Thread 0: c[11]= Thread 0: d[39]= Thread 0: c[12]= Thread 0: d[40]= Thread 0: c[13]= Thread 0: d[41]= Thread 0: c[14]= Thread 0: d[42]= Thread 0: c[15]= Thread 0: d[43]= Thread 0: c[16]= Thread 0: d[44]= Thread 0: c[17]= Thread 0: d[45]= Thread 0: c[18]= Thread 0: d[46]= Thread 0: c[19]= Thread 0: d[47]= Thread 0: c[20]= Thread 0: d[48]= Thread 0: c[21]= Thread 0: d[49]= Thread 0: c[22]= Thread 0 done. Thread 0: c[23]= Thread 1 staryng... Thread 0: c[24]= Thread 1 done. 38 Workshare sections Example #2... Thread 1 staryng... Thread 1: c[33]= Thread 0 staryng... Thread 0: d[41]= Thread 1 doing secyon 1 Thread 1: c[34]= Thread 0 doing secyon 2 Thread 0: d[42]= Thread 1: c[0]= Thread 1: c[35]= Thread 0: d[0]= Thread 0: d[43]= Thread 1: c[1]= Thread 1: c[36]= Thread 0: d[1]= Thread 0: d[44]= Thread 1: c[2]= Thread 1: c[37]= Thread 0: d[2]= Thread 1: c[38]= Thread 1: c[3]= Thread 1: c[39]= Thread 0: d[3]= Thread 1: c[40]= Thread 1: c[4]= Thread 1: c[41]= Thread 0: d[4]= Thread 1: c[42]= Thread 1: c[5]= Thread 1: c[43]= Thread 0: d[5]= Thread 1: c[44]= Thread 1: c[6]= Thread 0: d[45]= Thread 0: d[6]= Thread 1: c[45]= Thread 1: c[7]= Thread 0: d[46]= Thread 0: d[7]= Thread 1: c[46]= Thread 0: d[8]= Thread 0: d[47]= Thread 0: d[9]= Thread 1: c[47]= Thread 0: d[10]= Thread 0: d[48]= Thread 0: d[11]= Thread 0: d[49]= Thread 0: d[12]= Thread 0 done. Nowait in secyons Thread 0: d[13]= Thread 1: c[48]= Thread 0: d[14]= Thread 1: c[49]= Thread 0: d[15]= Thread 1 done. 39 A er export OMP_IN_PARALLEL=2 Riemann Sum ApproximaYon: CalculaYng π using Numerical IntegraYon P is a paryyon of [a, b] into n subintervals, [x 0, x 1 ], [x 1, x 2 ],..., [x n - 1, x n ] each of width x. x i * is in the interval [x i - 1, x i ] for each i. As P 0, n infinity and x 0. x i * to be the midpoint of each subinterval, i.e. x i * = ½(x i- 1 + x i ), for each i from 1 to n. hgp:// bjw 40 SequenYal CalculaYon of π in C 41 /* Serial Code */ static long num_steps = ; double step; int main (int argc, char *argv[]) int i; double x, pi, sum = 0.0; for (i=0; i < num_steps; i++) x = (i+0.5)*step; sum = sum + 4.0/(1.0+x*x); pi = step/num_steps; printf ("pi = %6.12f\n", pi); 42 pi =
8 OpenMP Version #1 static long num_steps = ; double step; #define NUM_THREADS 2 int main (int argc, char *argv[]) int i, id; double x, pi, sum[num_threads]; #pragma omp parallel private(x) id = omp_get_thread_num(); for (i = id, sum[id] = 0.0; i < num_steps; i = i + NUM_THREADS) x = (i + 0.5)*step; sum[id] += 4.0/(1.0+x*x); for(i = 0, pi = 0.0; i < NUM_THREADS; i++) pi += sum[i] ; printf ("pi = %6.12f\n", pi / num_steps); 43 OpenMP Version #1 (with bug) pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = OpenMP CriYcal SecYon int main(void) int x; x = 0; #pragma omp parallel shared(x) #pragma omp critical x = x + 1; /* end of parallel section */ Only one thread executes the following code block at a Yme Compiler generates necessary lock/unlock code around the increment of x 45 Student RouleGe? OpenMP Version #2 static long num_steps = ; double step; #define NUM_THREADS 2 int main (int argc, char *argv[]) int i, id; double x, sum, pi=0.0; #pragma omp parallel private (x, sum) id = omp_get_thread_num(); for (i = id, sum = 0.0; i < num_steps; i = i + NUM_THREADS) x = (i + 0.5)*step; sum += 4.0/(1.0+x*x); #pragma omp critical pi += sum; printf ("pi = %6.12f,\n", pi/num_steps); 46 OpenMP Version #2 (with bug) pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = OpenMP ReducYon ReducDon: specifies that one or more variables that are private to each thread are subject of reducyon operayon at end of parallel region: reduction(operation:var) where OperaDon: operator to perform on the variables (var) at the end of the parallel region Var: One or more variables on which to perform scalar reducyon #pragma omp for reduction(+:nsum) for (i = START ; i <= END ; i++) nsum += i; 48 8
9 OpenMP Version #3 /static long num_steps = ; double step; int main (int argc, char *argv[]) int i; double x, pi, sum = 0.0; Note: Don t have to declare for loop index variable i private, since that is default #pragma omp parallel for private(x) reduction(+:sum) for (i = 0; i < num_steps; i++) x = (i+0.5)*step; sum = sum + 4.0/(1.0+x*x); pi = sum / num_steps; printf ( pi = %6.8f\n, pi); 49 OpenMP Timing omp_get_wyme Elapsed wall clock Yme double omp_get_wtime(void); // to get function Elapsed wall clock Yme in seconds. Time is measured per thread, no guarantee can be made that two disynct threads measure the same Yme. Time is measured from some "Yme in the past". On POSIX compliant systems the seconds since the Epoch (00:00:00 UTC, January 1, 1970) are returned. 50 OpenMP Version #3 Agenda pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds pi = in seconds OpenMP IntroducYon Administrivia OpenMP Examples Technology Break Common Pijalls Summary OpenMP Version #4 static long num_steps = ; double step; int main (int argc, char *argv[]) int i, id; double x, pi, sum; sum = 0.0; #pragma omp parallel #pragma omp for for (i = 0; i < num_steps; i++) x = (i + 0.5)*step; sum += 4.0/(1.0+x*x); printf ("pi = %6.12f\n", step * sum); 53 3/14/11 Spring Lecture #
10 OpenMP Version #4 (with bug) pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = pi = OpenMP Version #5 static long num_steps = ; double step; int main (int argc, char *argv[]) int i, id; double x, pi, sum, sum0; sum = 0.0; #pragma omp parallel private(x,sum0) shared (step,sum) sum0 = 0.0; #pragma omp for for (i = 0; i < num_steps; i++) x = (i + 0.5)*step; sum0 += 4.0/(1.0+x*x); sum = sum + sum0; printf ("pi = %6.12f\n", step * sum); 3/14/11 Spring Lecture # /14/11 Spring Lecture #16 56 OpenMP Version #6 static long num_steps = ; double step; int main (int argc, char *argv[]) int i, id; double x, pi, sum, sum0; sum = 0.0; #pragma omp parallel private(x,sum0) shared (step,sum) sum0 = 0.0; #pragma omp for for (i = 0; i < num_steps; i++) x = (i + 0.5)*step; sum0 += 4.0/(1.0+x*x); #pragma omp critical sum = sum + sum0; printf ("pi = %6.12f\n", step * sum); 3/14/11 Spring Lecture #16 57 OpenMP Version #6 3/14/11 Spring Lecture #16 58 DescripYon of 32 Core System Intel Nehalem Xeon 7550 HW MulYthreading: 2 Threads / core 8 cores / chip 4 chips / board 64 Threads / system 2.00 GHz 256 KB L2 cache/ core 18 MB (!) shared L3 cache / chip How more than 1 thread per core? 59 Student RouleGe? Matrix MulYply start_time = omp_get_wtime(); #pragma omp parallel for private(tmp, i, j, k) for (i=0; i<ndim; i++) for (j=0; j<mdim; j++) tmp = 0.0; for( k=0; k<pdim; k++) Note: Outer loop spread across N threads; inner loops inside a thread /* C(i,j) = sum(over k) A(i,k) * B(k,j) */ tmp += *(A+(i*Ndim+k)) * *(B+(k*Pdim+j)); *(C+(i*Ndim+j)) = tmp; run_time = omp_get_wtime() - start_time; 60 10
11 Notes on Matrix MulYply Example More performance opymizayons available Higher compiler opymizayon (- O2, - O3) to reduce number of instrucyons executed Cache blocking to improve memory performance Using SIMD SSE3 InstrucYons to raise floayng point computayon rate OpenMP Pijall #1: Data Dependencies Consider the following code: a[0] = 1; for(i=1; i<5; i++) a[i] = i + a[i-1]; There are dependencies between loop iterayons SecYons of loops split between threads will not necessarily execute in order Out of order loop execuyon will result in undefined behavior Open MP Pijall #2: Avoiding Dependencies by Using Private Variables Consider the following loop: #pragma omp parallel for for(i=0; i<n; i++) temp = 2.0*a[i]; a[i] = temp; b[i] = c[i]/temp; Threads share common address space: will be modifying temp simultaneously; soluyon: #pragma omp parallel for private(temp) for(i=0; i<n; i++) temp = 2.0*a[i]; a[i] = temp; b[i] = c[i]/temp; 63 OpenMP Pijall #3: UpdaYng Shared Variables Simultaneously Now consider a global sum: for(i=0; i<n; i++) sum = sum + a[i]; This can be done by surrounding the summayon by a criycal secyon, but for convenience, OpenMP also provides the reducyon clause: #pragma omp parallel for reduction(+:sum) for(i=0; i<n; i++) sum = sum + a[i]; Compiler can generate highly efficient code for reducyon 64 OpenMP Pijall #3: Parallel Overhead Spawning and releasing threads results in significant overhead Therefore, you want to make your parallel regions as large as possible Parallelize over the largest loop that you can (even though it will involve more work to declare all of the private variables and eliminate dependencies) Coarse granularity is your friend! And in Conclusion, SequenYal so ware is slow so ware SIMD and MIMD only path to higher performance MulYprocessor/MulYcore uses Shared Memory Cache coherency implements shared memory even with mulyple copies in mulyple caches False sharing a concern; watch block size! Data races lead to subtle parallel bugs SynchronizaYon via atomic operayons: MIPS does it with Load Linked + Store CondiYonal OpenMP as simple parallel extension to C Threads, Parallel for, private, criycal secyons,
You Are Here! Peer Instruction: Synchronization. Synchronization in MIPS. Lock and Unlock Synchronization 7/19/2011
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread Level Parallelism: OpenMP Instructor: Michael Greenbaum 1 Parallel Requests Assigned to computer e.g., Search Katz Parallel Threads
More informationCS 61C: Great Ideas in Computer Architecture. OpenMP, Transistors
CS 61C: Great Ideas in Computer Architecture OpenMP, Transistors Instructor: Justin Hsia 7/17/2012 Summer 2012 Lecture #17 1 Review of Last Lecture Amdahl s Law limits benefits of parallelization Multiprocessor
More informationCS 61C: Great Ideas in Computer Architecture. Synchronization, OpenMP
CS 61C: Great Ideas in Computer Architecture Synchronization, OpenMP Guest Lecturer: Justin Hsia 3/15/2013 Spring 2013 Lecture #22 1 Review of Last Lecture Multiprocessor systems uses shared memory (single
More informationCS 61C: Great Ideas in Computer Architecture. OpenMP, Transistors
CS 61C: Great Ideas in Computer Architecture OpenMP, Transistors Instructor: Justin Hsia 7/23/2013 Summer 2013 Lecture #17 1 Review of Last Lecture Amdahl s Law limits benefits of parallelization Multiprocessor
More informationCS 61C: Great Ideas in Computer Architecture TLP and Cache Coherency
CS 61C: Great Ideas in Computer Architecture TLP and Cache Coherency Instructor: Bernhard Boser and Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/fa16 11/10/16 Fall 2016 -- Lecture #20 1 New-School
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread Level Parallelism
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread Level Parallelism Instructors: Randy H. Katz David A. PaFerson hfp://inst.eecs.berkeley.edu/~cs61c/sp11 3/10/11 1 Spring 2011 -
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread-Level Parallelism (TLP) and OpenMP
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread-Level Parallelism (TLP) and OpenMP Instructors: John Wawrzynek & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/ Review
More informationReview!of!Last!Lecture! CS!61C:!Great!Ideas!in!! Computer!Architecture! Synchroniza+on,- OpenMP- Great!Idea!#4:!Parallelism! Agenda!
CS61C:GreatIdeasin ComputerArchitecture Miki$Lus(g ReviewofLastLecture Mul?processorsystemsusessharedmemory (singleaddressspace) Cachecoherenceimplementssharedmemory evenwithmul?plecopiesinmul?plecaches
More informationParallel Programming with OpenMP. CS240A, T. Yang
Parallel Programming with OpenMP CS240A, T. Yang 1 A Programmer s View of OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for defining multi-threaded shared-memory programs
More informationCS 61C: Great Ideas in Computer Architecture. Synchroniza+on, OpenMP. Senior Lecturer SOE Dan Garcia
CS 61C: Great Ideas in Computer Architecture Synchroniza+on, OpenMP Senior Lecturer SOE Dan Garcia 1 Review of Last Lecture Mul@processor systems uses shared memory (single address space) Cache coherence
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread-Level Parallelism (TLP) and OpenMP
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread-Level Parallelism (TLP) and OpenMP Instructors: Nicholas Weaver & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/ Review
More informationTieing the Threads Together
Tieing the Threads Together 1 Review Sequential software is slow software SIMD and MIMD are paths to higher performance MIMD thru: multithreading processor cores (increases utilization), Multicore processors
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread- Level Parallelism (TLP) and OpenMP. Serializes the execunon of a thread
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread- Level Parallelism (TLP) and OpenMP Instructors: Krste Asanovic & Vladimir Stojanovic hep://inst.eecs.berkeley.edu/~cs61c/ Review
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread- Level Parallelism (TLP) and OpenMP
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread- Level Parallelism (TLP) and OpenMP Instructors: Krste Asanovic & Vladimir Stojanovic hap://inst.eecs.berkeley.edu/~cs61c/ Review
More informationCS 61C: Great Ideas in Computer Architecture TLP and Cache Coherency
CS 61C: Great Ideas in Computer Architecture TLP and Cache Coherency Instructors: Krste Asanović and Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 11/8/17 Fall 2017 -- Lecture #20 1 Parallel
More informationInstructor: Randy H. Katz hbp://inst.eecs.berkeley.edu/~cs61c/fa13. Fall Lecture #16. Warehouse Scale Computer
CS 61C: Great Ideas in Computer Architecture OpenMP Instructor: Randy H. Katz hbp://inst.eecs.berkeley.edu/~cs61c/fa13 10/23/13 Fall 2013 - - Lecture #16 1 New- School Machine Structures (It s a bit more
More informationAssignment 1 OpenMP Tutorial Assignment
Assignment 1 OpenMP Tutorial Assignment B. Wilkinson and C Ferner: Modification date Aug 5, 2014 Overview In this assignment, you will write and execute run some simple OpenMP programs as a tutorial. First
More informationCS 61C: Great Ideas in Computer Architecture Lecture 20: Thread- Level Parallelism (TLP) and OpenMP Part 2
CS 61C: Great Ideas in Computer Architecture Lecture 20: Thread- Level Parallelism (TLP) and OpenMP Part 2 Instructor: Sagar Karandikar sagark@eecs.berkeley.edu hcp://inst.eecs.berkeley.edu/~cs61c 1 Review:
More informationCS 61C: Great Ideas in Computer Architecture Lecture 20: Thread- Level Parallelism (TLP) and OpenMP Part 2
CS 61C: Great Ideas in Computer Architecture Lecture 20: Thread- Level Parallelism (TLP) and OpenMP Part 2 Instructor: Sagar Karandikar sagark@eecs.berkeley.edu hgp://inst.eecs.berkeley.edu/~cs61c 1 Review:
More informationIntroduction to OpenMP
Introduction to OpenMP Ricardo Fonseca https://sites.google.com/view/rafonseca2017/ Outline Shared Memory Programming OpenMP Fork-Join Model Compiler Directives / Run time library routines Compiling and
More informationECE 574 Cluster Computing Lecture 10
ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular
More informationby system default usually a thread per CPU or core using the environment variable OMP_NUM_THREADS from within the program by using function call
OpenMP Syntax The OpenMP Programming Model Number of threads are determined by system default usually a thread per CPU or core using the environment variable OMP_NUM_THREADS from within the program by
More informationCOSC 6374 Parallel Computation. Introduction to OpenMP. Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel)
COSC 6374 Parallel Computation Introduction to OpenMP Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2015 OpenMP Provides thread programming model at a
More informationData Environment: Default storage attributes
COSC 6374 Parallel Computation Introduction to OpenMP(II) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2014 Data Environment: Default storage attributes
More informationOpenMP, Part 2. EAS 520 High Performance Scientific Computing. University of Massachusetts Dartmouth. Spring 2015
OpenMP, Part 2 EAS 520 High Performance Scientific Computing University of Massachusetts Dartmouth Spring 2015 References This presentation is almost an exact copy of Dartmouth College's openmp tutorial.
More informationOpenMP. António Abreu. Instituto Politécnico de Setúbal. 1 de Março de 2013
OpenMP António Abreu Instituto Politécnico de Setúbal 1 de Março de 2013 António Abreu (Instituto Politécnico de Setúbal) OpenMP 1 de Março de 2013 1 / 37 openmp what? It s an Application Program Interface
More informationOpenMP examples. Sergeev Efim. Singularis Lab, Ltd. Senior software engineer
OpenMP examples Sergeev Efim Senior software engineer Singularis Lab, Ltd. OpenMP Is: An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism.
More informationCS 110 Computer Architecture
CS 110 Computer Architecture Thread-Level Parallelism (TLP) and OpenMP Intro Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech
More informationECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications
ECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications Variable Sharing in OpenMP OpenMP synchronization issues OpenMP performance issues November 4, 2015 Lecture 22 Dan Negrut, 2015
More informationLecture 4: OpenMP Open Multi-Processing
CS 4230: Parallel Programming Lecture 4: OpenMP Open Multi-Processing January 23, 2017 01/23/2017 CS4230 1 Outline OpenMP another approach for thread parallel programming Fork-Join execution model OpenMP
More informationShared Memory Parallelism using OpenMP
Indian Institute of Science Bangalore, India भ रत य व ज ञ न स स थ न ब गल र, भ रत SE 292: High Performance Computing [3:0][Aug:2014] Shared Memory Parallelism using OpenMP Yogesh Simmhan Adapted from: o
More information/Users/engelen/Sites/HPC folder/hpc/openmpexamples.c
/* Subset of these examples adapted from: 1. http://www.llnl.gov/computing/tutorials/openmp/exercise.html 2. NAS benchmarks */ #include #include #ifdef _OPENMP #include #endif
More informationEPL372 Lab Exercise 5: Introduction to OpenMP
EPL372 Lab Exercise 5: Introduction to OpenMP References: https://computing.llnl.gov/tutorials/openmp/ http://openmp.org/wp/openmp-specifications/ http://openmp.org/mp-documents/openmp-4.0-c.pdf http://openmp.org/mp-documents/openmp4.0.0.examples.pdf
More informationShared Memory Parallelism - OpenMP
Shared Memory Parallelism - OpenMP Sathish Vadhiyar Credits/Sources: OpenMP C/C++ standard (openmp.org) OpenMP tutorial (http://www.llnl.gov/computing/tutorials/openmp/#introduction) OpenMP sc99 tutorial
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Introduc)on to Machine Language
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Introduc)on to Machine Language TA: Sco? Beamer h?p://inst.eecs.berkeley.edu/~cs61c/sp12 1 New- School Machine Structures (It s a bit more
More informationIntroduction to OpenMP.
Introduction to OpenMP www.openmp.org Motivation Parallelize the following code using threads: for (i=0; i
More informationCS 61C: Great Ideas in Computer Architecture Thread Level Parallelism (TLP)
CS 61C: Great Ideas in Computer Architecture Thread Level Parallelism (TLP) Instructor: David A. Pa@erson h@p://inst.eecs.berkeley.edu/~cs61c/sp12 Spring 2012 - - Lecture #16 1 New- School Machine Structures
More informationIntroduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines
Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi
More informationOpenMP 4. CSCI 4850/5850 High-Performance Computing Spring 2018
OpenMP 4 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationCS 470 Spring Mike Lam, Professor. OpenMP
CS 470 Spring 2017 Mike Lam, Professor OpenMP OpenMP Programming language extension Compiler support required "Open Multi-Processing" (open standard; latest version is 4.5) Automatic thread-level parallelism
More informationME759 High Performance Computing for Engineering Applications
ME759 High Performance Computing for Engineering Applications Parallel Computing on Multicore CPUs October 25, 2013 Dan Negrut, 2013 ME964 UW-Madison A programming language is low level when its programs
More informationOpenMP 2. CSCI 4850/5850 High-Performance Computing Spring 2018
OpenMP 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationOpenMP Algoritmi e Calcolo Parallelo. Daniele Loiacono
OpenMP Algoritmi e Calcolo Parallelo References Useful references Using OpenMP: Portable Shared Memory Parallel Programming, Barbara Chapman, Gabriele Jost and Ruud van der Pas OpenMP.org http://openmp.org/
More informationCS 470 Spring Mike Lam, Professor. OpenMP
CS 470 Spring 2018 Mike Lam, Professor OpenMP OpenMP Programming language extension Compiler support required "Open Multi-Processing" (open standard; latest version is 4.5) Automatic thread-level parallelism
More informationHigh Performance Computing: Tools and Applications
High Performance Computing: Tools and Applications Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Lecture 2 OpenMP Shared address space programming High-level
More informationIntroduction to OpenMP
Introduction to OpenMP Ekpe Okorafor School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014 A little about me! PhD Computer Engineering Texas A&M University Computer Science
More informationOpenMP Overview. in 30 Minutes. Christian Terboven / Aachen, Germany Stand: Version 2.
OpenMP Overview in 30 Minutes Christian Terboven 06.12.2010 / Aachen, Germany Stand: 03.12.2010 Version 2.3 Rechen- und Kommunikationszentrum (RZ) Agenda OpenMP: Parallel Regions,
More informationCS 61C: Great Ideas in Computer Architecture Lecture 19: Thread- Level Parallelism and OpenMP Intro
CS 61C: Great Ideas in Computer Architecture Lecture 19: Thread- Level Parallelism and OpenMP Intro Instructor: Sagar Karandikar sagark@eecs.berkeley.edu hbp://inst.eecs.berkeley.edu/~cs61c 1 Review Amdahl
More informationShared Memory Programming Model
Shared Memory Programming Model Ahmed El-Mahdy and Waleed Lotfy What is a shared memory system? Activity! Consider the board as a shared memory Consider a sheet of paper in front of you as a local cache
More informationMultithreading in C with OpenMP
Multithreading in C with OpenMP ICS432 - Spring 2017 Concurrent and High-Performance Programming Henri Casanova (henric@hawaii.edu) Pthreads are good and bad! Multi-threaded programming in C with Pthreads
More informationOpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means
High Performance Computing: Concepts, Methods & Means OpenMP Programming Prof. Thomas Sterling Department of Computer Science Louisiana State University February 8 th, 2007 Topics Introduction Overview
More informationHPCSE - I. «OpenMP Programming Model - Part I» Panos Hadjidoukas
HPCSE - I «OpenMP Programming Model - Part I» Panos Hadjidoukas 1 Schedule and Goals 13.10.2017: OpenMP - part 1 study the basic features of OpenMP able to understand and write OpenMP programs 20.10.2017:
More informationIntroduction to OpenMP
Introduction to OpenMP Christian Terboven 10.04.2013 / Darmstadt, Germany Stand: 06.03.2013 Version 2.3 Rechen- und Kommunikationszentrum (RZ) History De-facto standard for
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationIntroduction to. Slides prepared by : Farzana Rahman 1
Introduction to OpenMP Slides prepared by : Farzana Rahman 1 Definition of OpenMP Application Program Interface (API) for Shared Memory Parallel Programming Directive based approach with library support
More informationParallel Programming
Parallel Programming OpenMP Nils Moschüring PhD Student (LMU) Nils Moschüring PhD Student (LMU), OpenMP 1 1 Overview What is parallel software development Why do we need parallel computation? Problems
More informationMango DSP Top manufacturer of multiprocessing video & imaging solutions.
1 of 11 3/3/2005 10:50 AM Linux Magazine February 2004 C++ Parallel Increase application performance without changing your source code. Mango DSP Top manufacturer of multiprocessing video & imaging solutions.
More informationOpenMP I. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS16/17. HPAC, RWTH Aachen
OpenMP I Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS16/17 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT Press,
More informationOpenMP. A parallel language standard that support both data and functional Parallelism on a shared memory system
OpenMP A parallel language standard that support both data and functional Parallelism on a shared memory system Use by system programmers more than application programmers Considered a low level primitives
More informationOpenMP. OpenMP. Portable programming of shared memory systems. It is a quasi-standard. OpenMP-Forum API for Fortran and C/C++
OpenMP OpenMP Portable programming of shared memory systems. It is a quasi-standard. OpenMP-Forum 1997-2002 API for Fortran and C/C++ directives runtime routines environment variables www.openmp.org 1
More informationIntroduction to OpenMP
Christian Terboven, Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group terboven,schmidl@itc.rwth-aachen.de IT Center der RWTH Aachen University History De-facto standard for Shared-Memory
More informationhttps://www.youtube.com/playlist?list=pllx- Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG
https://www.youtube.com/playlist?list=pllx- Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG OpenMP Basic Defs: Solution Stack HW System layer Prog. User layer Layer Directives, Compiler End User Application OpenMP library
More informationA brief introduction to OpenMP
A brief introduction to OpenMP Alejandro Duran Barcelona Supercomputing Center Outline 1 Introduction 2 Writing OpenMP programs 3 Data-sharing attributes 4 Synchronization 5 Worksharings 6 Task parallelism
More informationLecture 2 A Hand-on Introduction to OpenMP
CS075 1896 1920 1987 2006 Lecture 2 A Hand-on Introduction to OpenMP, 2,1 01 1 2 Outline Introduction to OpenMP Creating Threads Synchronization between variables Parallel Loops Synchronize single masters
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationDPHPC: Introduction to OpenMP Recitation session
SALVATORE DI GIROLAMO DPHPC: Introduction to OpenMP Recitation session Based on http://openmp.org/mp-documents/intro_to_openmp_mattson.pdf OpenMP An Introduction What is it? A set of compiler directives
More informationTopics. Introduction. Shared Memory Parallelization. Example. Lecture 11. OpenMP Execution Model Fork-Join model 5/15/2012. Introduction OpenMP
Topics Lecture 11 Introduction OpenMP Some Examples Library functions Environment variables 1 2 Introduction Shared Memory Parallelization OpenMP is: a standard for parallel programming in C, C++, and
More informationOpenMP Tutorial. Rudi Eigenmann of Purdue, Sanjiv Shah of Intel and others too numerous to name have contributed content for this tutorial.
OpenMP * in Action Tim Mattson Intel Corporation Barbara Chapman University of Houston Acknowledgements: Rudi Eigenmann of Purdue, Sanjiv Shah of Intel and others too numerous to name have contributed
More informationOpenMP - II. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen
OpenMP - II Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT
More informationHPC Practical Course Part 3.1 Open Multi-Processing (OpenMP)
HPC Practical Course Part 3.1 Open Multi-Processing (OpenMP) V. Akishina, I. Kisel, G. Kozlov, I. Kulakov, M. Pugach, M. Zyzak Goethe University of Frankfurt am Main 2015 Task Parallelism Parallelization
More informationParallel programming using OpenMP
Parallel programming using OpenMP Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department
More informationCOSC 6374 Parallel Computation. Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel)
COSC 6374 Parallel Computation Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2014 Introduction Threads vs. processes Recap of
More informationProgramming with OpenMP*
Objectives At the completion of this module you will be able to Thread serial code with basic OpenMP pragmas Use OpenMP synchronization pragmas to coordinate thread execution and memory access 2 Agenda
More informationAdvanced C Programming Winter Term 2008/09. Guest Lecture by Markus Thiele
Advanced C Programming Winter Term 2008/09 Guest Lecture by Markus Thiele Lecture 14: Parallel Programming with OpenMP Motivation: Why parallelize? The free lunch is over. Herb
More informationShared Memory Programming with OpenMP
Shared Memory Programming with OpenMP Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna moreno.marzolla@unibo.it Copyright 2013, 2014, 2017 2019 Moreno Marzolla, Università
More informationParallel Processing/Programming
Parallel Processing/Programming with the applications to image processing Lectures: 1. Parallel Processing & Programming from high performance mono cores to multi- and many-cores 2. Programming Interfaces
More information1 of 6 Lecture 7: March 4. CISC 879 Software Support for Multicore Architectures Spring Lecture 7: March 4, 2008
1 of 6 Lecture 7: March 4 CISC 879 Software Support for Multicore Architectures Spring 2008 Lecture 7: March 4, 2008 Lecturer: Lori Pollock Scribe: Navreet Virk Open MP Programming Topics covered 1. Introduction
More informationIntroduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah
Introduction to OpenMP Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Quick introduction. Parallel loops. Parallel loop directives. Parallel sections.
More informationIntroduction to OpenMP
Introduction to OpenMP Xiaoxu Guan High Performance Computing, LSU April 6, 2016 LSU HPC Training Series, Spring 2016 p. 1/44 Overview Overview of Parallel Computing LSU HPC Training Series, Spring 2016
More informationOpenMP: Open Multi-Processing
OpenMP: Open Multi-Processing Chris Kauffman CS 499: Spring 2016 GMU Logistics Today OpenMP Wrap-up Mini-Exam 3 Reading Grama 7.10 (OpenMP) OpenMP Tutorial at Laurence Livermore Grama 7.1-9 (PThreads)
More informationCSL 860: Modern Parallel
CSL 860: Modern Parallel Computation Hello OpenMP #pragma omp parallel { // I am now thread iof n switch(omp_get_thread_num()) { case 0 : blah1.. case 1: blah2.. // Back to normal Parallel Construct Extremely
More informationIntroduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah
Introduction to OpenMP Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Quick introduction. Parallel loops. Parallel loop directives. Parallel sections.
More informationIntroduction to OpenMP
Introduction to OpenMP Le Yan Scientific computing consultant User services group High Performance Computing @ LSU Goals Acquaint users with the concept of shared memory parallelism Acquaint users with
More informationDistributed Systems + Middleware Concurrent Programming with OpenMP
Distributed Systems + Middleware Concurrent Programming with OpenMP Gianpaolo Cugola Dipartimento di Elettronica e Informazione Politecnico, Italy cugola@elet.polimi.it http://home.dei.polimi.it/cugola
More informationOpenMP: The "Easy" Path to Shared Memory Computing
OpenMP: The "Easy" Path to Shared Memory Computing Tim Mattson Intel Corp. timothy.g.mattson@intel.com 1 * The name OpenMP is the property of the OpenMP Architecture Review Board. Copyright 2012 Intel
More informationOpenMP. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS16/17. HPAC, RWTH Aachen
OpenMP Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS16/17 Worksharing constructs To date: #pragma omp parallel created a team of threads We distributed
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming Outline OpenMP Shared-memory model Parallel for loops Declaring private variables Critical sections Reductions
More informationOPENMP OPEN MULTI-PROCESSING
OPENMP OPEN MULTI-PROCESSING OpenMP OpenMP is a portable directive-based API that can be used with FORTRAN, C, and C++ for programming shared address space machines. OpenMP provides the programmer with
More informationAlfio Lazzaro: Introduction to OpenMP
First INFN International School on Architectures, tools and methodologies for developing efficient large scale scientific computing applications Ce.U.B. Bertinoro Italy, 12 17 October 2009 Alfio Lazzaro:
More informationCOMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP
COMP4510 Introduction to Parallel Computation Shared Memory and OpenMP Thanks to Jon Aronsson (UofM HPC consultant) for some of the material in these notes. Outline (cont d) Shared Memory and OpenMP Including
More informationDPHPC: Introduction to OpenMP Recitation session
SALVATORE DI GIROLAMO DPHPC: Introduction to OpenMP Recitation session Based on http://openmp.org/mp-documents/intro_to_openmp_mattson.pdf OpenMP An Introduction What is it? A set
More informationShared Memory Programming Paradigm!
Shared Memory Programming Paradigm! Ivan Girotto igirotto@ictp.it Information & Communication Technology Section (ICTS) International Centre for Theoretical Physics (ICTP) 1 Multi-CPUs & Multi-cores NUMA
More informationSession 4: Parallel Programming with OpenMP
Session 4: Parallel Programming with OpenMP Xavier Martorell Barcelona Supercomputing Center Agenda Agenda 10:00-11:00 OpenMP fundamentals, parallel regions 11:00-11:30 Worksharing constructs 11:30-12:00
More informationModule 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program
The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program Amdahl's Law About Data What is Data Race? Overview to OpenMP Components of OpenMP OpenMP Programming Model OpenMP Directives
More informationParallel and Distributed Programming. OpenMP
Parallel and Distributed Programming OpenMP OpenMP Portability of software SPMD model Detailed versions (bindings) for different programming languages Components: directives for compiler library functions
More informationMulti-core Architecture and Programming
Multi-core Architecture and Programming Yang Quansheng( 杨全胜 ) http://www.njyangqs.com School of Computer Science & Engineering 1 http://www.njyangqs.com Programming with OpenMP Content What is PpenMP Parallel
More informationShared memory parallel computing
Shared memory parallel computing OpenMP Sean Stijven Przemyslaw Klosiewicz Shared-mem. programming API for SMP machines Introduced in 1997 by the OpenMP Architecture Review Board! More high-level than
More informationParallel Processing Top manufacturer of multiprocessing video & imaging solutions.
1 of 10 3/3/2005 10:51 AM Linux Magazine March 2004 C++ Parallel Increase application performance without changing your source code. Parallel Processing Top manufacturer of multiprocessing video & imaging
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class
More information[Potentially] Your first parallel application
[Potentially] Your first parallel application Compute the smallest element in an array as fast as possible small = array[0]; for( i = 0; i < N; i++) if( array[i] < small ) ) small = array[i] 64-bit Intel
More information