Programming with Shared Memory PART II. HPC Spring 2017 Prof. Robert van Engelen
|
|
- Teresa Tyler
- 5 years ago
- Views:
Transcription
1 Programmig with Shared Memory PART II HPC Sprig 2017 Prof. Robert va Egele
2 Overview Sequetial cosistecy Parallel programmig costructs Depedece aalysis OpeMP Autoparallelizatio Further readig HPC Sprig
3 Sequetial Cosistecy Sequetial cosistecy: the result of a parallel program is always the same as the sequetial program, irrespective of the statemet iterleavig that is a result of parallel executio Parallel program (a, b, x, y, z are shared) time a=5; x=1; y=x+3; z=x+y; a=5; x=1; y=x+3; z=x+y; Ay order of permitted statemet iterleavigs x=1; y=x+3; a=5; z=x+y; Output a, b, x, y, z HPC Sprig
4 Data Flow: Implicitly Parallel x=1; a=5; y=x+3; z=x+y; output a, b, x, y, z Flow depedeces determie the parallel executio schedule: each operatio waits util operads are produced HPC Sprig
5 Explicit Parallel Programmig Costructs Declarig shared data, whe private is implicit shared it x; shared it *p; A shared variable Private poiter to a shared value Declarig private data, whe private is explicit private it x; private it *p; Would this make ay sese? HPC Sprig
6 Explicit Parallel Programmig Costructs The par costruct par { S1; S2; S; Statemets i the body are executed cocurretly Oe thread threads S1 S2 S Oe thread HPC Sprig
7 Explicit Parallel Programmig Costructs The forall costruct (also called parfor) forall (i=0; i<; i++) { S1; S2; Sm; Oe thread threads Statemets i the body are executed i serial order by threads i=0..-1 i parallel i=0: S1; S2; Sm; i=1: S1; S2; Sm; i=-1:s1; S2; Sm; Oe thread HPC Sprig
8 Explicit Parallel: May Choices, Which is Safe? par { a=5; x=1; y=x+3; z=x+y; x=1; par { a=5; y=x+3; z=x+y; x=1; par { a=5; y=x+3; z=x+y; Thik about data flow: each operatio requires completio of operads first Data depedeces preserved sequetial cosistecy guarateed HPC Sprig
9 Berstei s Coditios I i is the set of memory locatios read by process P i O j is the set of memory locatios altered by process P j Processes P 1 ad P 2 ca be executed cocurretly if all of the followig coditios are met I 1 Ç O 2 = Æ I 2 Ç O 1 = Æ O 1 Ç O 2 = Æ HPC Sprig
10 Berstei s Coditios Verified by Depedece Aalysis Depedece aalysis performed by a compiler determies that Berstei s coditios are ot violated whe optimizig ad/or parallelizig a program idepedet P 1 : A = x + y; P 2 : B = x + z; I 1 Ç O 2 = Æ I 2 Ç O 1 = Æ O 1 Ç O 2 = Æ RAW P 1 : A = x + y; P 2 : B = x + A; I 1 Ç O 2 = Æ I 2 Ç O 1 = {A O 1 Ç O 2 = Æ WAR P 1 : A = x + B; P 2 : B = x + z; I 1 Ç O 2 = {B I 2 Ç O 1 = Æ O 1 Ç O 2 = Æ WAW P 1 : A = x + y; P 2 : A = x + z; I 1 Ç O 2 = Æ I 2 Ç O 1 = Æ O 1 Ç O 2 = {A HPC Sprig
11 Berstei s Coditios Verified by Depedece Aalysis Depedece aalysis performed by a compiler determies that Berstei s coditios are ot violated whe optimizig ad/or parallelizig a program idepedet P 1 : A = x + y; P 2 : B = x + z; I 1 Ç O 2 = Æ I 2 Ç O 1 = Æ O 1 Ç O 2 = Æ par { P 1 : A = x + y; P 2 : B = x + z; Recall: istructio schedulig for istructio-level parallelism (ILP) HPC Sprig
12 Berstei s Coditios i Loops A parallel loop is valid whe ay orderig of its parallel body yields the same result forall (I=4;I<7;I++) S1: A[I] = A[I-3]+B[I]; S1(4): S1(5): S1(6): A[4] = A[1]+B[4]; A[5] = A[2]+B[5]; A[6] = A[3]+B[6]; S1(4): S1(6): S1(5): A[4] = A[1]+B[4]; A[6] = A[3]+B[6]; A[5] = A[2]+B[5]; S1(5): S1(4): S1(6): A[5] = A[2]+B[5]; A[4] = A[1]+B[4]; A[6] = A[3]+B[6]; S1(6): S1(5): S1(4): A[6] = A[3]+B[6]; A[5] = A[2]+B[5]; A[4] = A[1]+B[4]; S1(5): S1(6): S1(4): A[5] = A[2]+B[5]; A[6] = A[3]+B[6]; A[4] = A[1]+B[4]; S1(6): S1(4): S1(5): A[6] = A[3]+B[6]; A[4] = A[1]+B[4]; A[5] = A[2]+B[5]; HPC Sprig
13 OpeMP OpeMP is a portable implemetatio of commo parallel costructs for shared memory machies OpeMP i C #pragma omp directive_ame statemet_block OpeMP i Fortra!$OMP directive_ame statemet_block!$omp ed directive ame HPC Sprig
14 HPC Sprig
15 OpeMP Parallel The parallel costruct (i OpeMP is ot the same as par) #pragma omp parallel { S1; S2; Sm; A team of threads all execute the body statemets ad joi whe doe threads Oe thread (master thread) Parallel regio S1; S2; Sm; S1; S2; Sm; S1; S2; Sm; Oe thread (master thread) HPC Sprig
16 OpeMP Parallel The parallel costruct #pragma omp parallel default(oe) shared(vars) { S1; S2; Sm; This specifies that variables should ot be assumed to be shared by default threads Oe thread (master thread) Parallel regio S1; S2; Sm; S1; S2; Sm; S1; S2; Sm; Oe thread (master thread) HPC Sprig
17 OpeMP Parallel The parallel costruct #pragma omp parallel private(, i) { = omp_get_um_threads(); i = omp_get_thread_um(); Use private to declare private data What happes if ad/or i are ot declared private? Do Berstei s coditios still hold? omp_get_um_threads() returs the umber of threads that are curretly beig used omp_get_thread_um() returs the thread id (0 to -1) HPC Sprig
18 OpeMP Parallel with Reductio The parallel costruct with reductio clause operatio: +, *, -, &, ^,, &&, #pragma omp parallel reductio(+:var) { var = expr; = var; Performs a global reductio operatio over privatized variable(s) ad assigs fial value to master s private variable(s) or to the shared variable(s) whe shared Oe thread var = expr; var = expr; var = expr; S var HPC Sprig
19 OpeMP Parallel The parallel costruct #pragma omp parallel um_threads() { S1; S2; Sm; Number of threads we wat Alteratively, use omp_set_um_threads() or set eviromet variable OMP_NUM_THREADS Oe thread threads S1; S2; Sm; S1; S2; Sm; S1; S2; Sm; Oe thread HPC Sprig
20 OpeMP Parallel Sectios The sectios costruct is for work-sharig, where a curret team of threads is used to execute differet statemets cocurretly, similar to par #pragma omp parallel #pragma omp sectios { #pragma omp sectio S1; #pragma omp sectio S2; #pragma omp sectio Sm; Statemets i the sectios are executed cocurretly threads executig m sectios S1 S2 Sm barrier HPC Sprig
21 OpeMP Parallel Sectios The sectios costruct is for work-sharig, where a curret team of threads is used to execute differet statemets cocurretly, similar to par #pragma omp parallel #pragma omp sectios owait { #pragma omp sectio S1; #pragma omp sectio S2; #pragma omp sectio Sm; Use owait to remove the implicit barrier threads executig m sectios S1 S2 Sm HPC Sprig
22 OpeMP Parallel Sectios The sectios costruct is for work-sharig, where a curret team of threads is used to execute differet statemets cocurretly, similar to par #pragma omp parallel sectios { #pragma omp sectio S1; #pragma omp sectio S2; #pragma omp sectio Sm; Oe thread Use parallel sectios to combie parallel with sectios threads executig m sectios S1 S2 Sm HPC Sprig
23 OpeMP For/Do The for costruct (do i Fortra) is for work-sharig, where a curret team of threads is used to execute a loop cocurretly #pragma omp parallel #pragma omp for for (i=0; i<k; i++) { S1; S2; Sm; Loop iteratios are executed cocurretly by threads Use owait to remove the implicit barrier threads executig k iteratios i=0: S1; S2; Sm; i=1: S1; S2; Sm; i=k-1:s1; S2; Sm; barrier HPC Sprig
24 OpeMP For/Do The for costruct is for work-sharig, where a curret team of threads is used to execute a loop cocurretly #pragma omp parallel #pragma omp for schedule(dyamic) for (i=0; i<k; i++) { S1; S2; Sm; Whe k>, threads execute radomly chose loop iteratios util all iteratios are completed threads executig k iteratios i=0: S1; S2; Sm; i=1: S1; S2; Sm; i=k-1:s1; S2; Sm; barrier HPC Sprig
25 OpeMP For/Do The for costruct is for work-sharig, where a curret team of threads is used to execute a loop cocurretly #pragma omp parallel #pragma omp for schedule(static) for (i=0; i<4; i++) { S1; S2; Sm; Whe k>, threads are assiged to ék/ù chuks of the iteratio space 2 threads executig 4 iteratios i=0; S1; S2; Sm; i=1; S1; S2; Sm; i=2; S1; S2; Sm; i=3; S1; S2; Sm; barrier HPC Sprig
26 OpeMP For/Do The for costruct is for work-sharig, where a curret team of threads is used to execute a loop cocurretly #pragma omp parallel #pragma omp for schedule(static, 2) for (i=0; i<8; i++) { S1; S2; Sm; 2 threads executig 8 iteratios usig chuk size 2 i a roud-robi fashio Chuk size i=0; S1; S2; Sm; i=1; S1; S2; Sm; i=2; S1; S2; Sm; i=3; S1; S2; Sm; i=4; S1; S2; Sm; i=5; S1; S2; Sm; i=6; S1; S2; Sm; i=7; S1; S2; Sm; barrier HPC Sprig
27 OpeMP For/Do The for costruct is for work-sharig, where a curret team of threads is used to execute a loop cocurretly #pragma omp parallel #pragma omp for schedule(guided, 4) for (i=0; i<k; i++) { S1; S2; Sm; Chuk size Expoetially decreasig chuk size, i this example: 4, 2, 1 HPC Sprig
28 OpeMP For/Do Schedulig Compariso 0 Loop iteratio idex Loop iteratio idex 27 HPC Sprig
29 OpeMP For/Do Schedulig with Load Imbalaces Cost per iteratio 0 Loop iteratio idex 27 HPC Sprig
30 OpeMP For/Do The for costruct is for work-sharig, where a curret team of threads is used to execute a loop cocurretly #pragma omp parallel #pragma omp for schedule(ru_time) for (i=0; i<k; i++) { S1; S2; Sm; Cotrolled by eviromet variable OMP_SCHEDULE: setev OMP_SCHEDULE dyamic setev OMP_SCHEDULE static setev OMP_SCHEDULE static,2 HPC Sprig
31 OpeMP For/Do The for costruct is for work-sharig, where a curret team of threads is used to execute a loop cocurretly operatio: +, *, -, &, ^,, &&, #pragma omp parallel #pragma omp for reductio(+:s) for (i=0; i<k; i++) { s += a[i]; Performs a global reductio operatio over privatized variables ad assigs fial value to master s private variable(s) or to the shared variable(s) whe shared i=0: s += a[0]; i=1: s += a[1]; i=k-1:s += a[k-1]; S s HPC Sprig
32 OpeMP For/Do The for costruct is for work-sharig, where a curret team of threads is used to execute a loop cocurretly #pragma omp parallel for for (i=0; i<k; i++) { S1; S2; Sm; Oe thread Use parallel for to combie parallel with for threads executig k iteratios i=0: S1; S2; Sm; i=1: S1; S2; Sm; i=k-1:s1; S2; Sm; HPC Sprig
33 OpeMP Firstprivate ad Lastprivate The parallel costruct with firstprivate ad/or lastprivate clause x = ; #pragma omp parallel firstprivate(x) lastprivate(y) { x = x + ; #pragma omp for for (i=0; i<k; i++) { y = i; = y; Use firstprivate to declare private variables that are iitialized with the mai thread s value of the variables Likewise, use lastprivate to declare private variables whose values are copied back out to mai thread s variables by the thread that executes the last iteratio of a parallel for loop, or the thread that executes the last parallel sectio HPC Sprig
34 OpeMP Sigle The sigle costruct selects oe thread of the curret team of threads to execute the body #pragma omp parallel #pragma omp sigle { S1; S2; Sm; Oe thread executes the body barrier S1; S2; Sm; HPC Sprig
35 OpeMP Master The master costruct selects the master thread of the curret team of threads to execute the body #pragma omp parallel #pragma omp master { S1; S2; Sm; The master thread executes the body, o barrier is iserted S1; S2; Sm; HPC Sprig
36 OpeMP Critical The critical costruct defies a critical sectio #pragma omp parallel #pragma omp critical ame { S1; S2; Sm; acquire S1; S2; Sm; wait Mutual exclusio is eforced o the body usig a amed lock release S1; S2; Sm; acquire release HPC Sprig
37 OpeMP Critical The critical costruct defies a critical sectio #pragma omp parallel #pragma omp critical qlock { equeue(job); #pragma omp critical qlock { dequeue(job); acquire release Oe thread is here Aother thread is here equeue(job); wait dequeue(job) acquire release HPC Sprig
38 OpeMP Critical The critical costruct defies a critical sectio #pragma omp parallel #pragma omp critical { S1; S2; Sm; acquire S1; S2; Sm; wait Mutual exclusio is eforced o the body usig a aoymous lock release S1; S2; Sm; acquire release HPC Sprig
39 OpeMP Barrier The barrier costruct sychroizes the curret team of threads #pragma omp parallel #pragma omp barrier barrier HPC Sprig
40 OpeMP Atomic The atomic costruct executes a expressio atomically (expressios are restricted to simple updates) #pragma omp parallel #pragma omp atomic expressio; idivisible expressio; expressio; idivisible HPC Sprig
41 OpeMP Atomic The atomic costruct executes a expressio atomically (expressios are restricted to simple updates) #pragma omp parallel #pragma omp atomic = +1; #pragma omp atomic = -1; Oe thread is here Aother thread is here idivisible = +1; = -1 idivisible HPC Sprig
42 OpeMP Flush The flush costruct flushes shared variables from local storage (registers, cache) to shared memory OpeMP adopts a relaxed cosistecy model of memory #pragma omp parallel #pragma omp flush(variables) flush b = 3, but there is o guaratee that a will be 3 = 3; a = b = HPC Sprig
43 OpeMP Relaxed Cosistecy Memory Model Relaxed cosistecy meas that memory updates made by oe CPU may ot be immediately visible to aother CPU Data ca be i registers Data ca be i cache (cache coherece protocol is slow or o-existet) Therefore, the updated value of a shared variable that was set by a thread may ot be available to aother A OpeMP flush is automatically performed at Etry ad exit of parallel ad critical Exit of for Exit of sectios Exit of sigle Barriers HPC Sprig
44 OpeMP Thread Schedulig Cotrolled by eviromet variable OMP_DYNAMIC Whe set to FALSE Same umber of threads used for every parallel regio Whe set to TRUE The umber of threads is adjusted for each parallel regio omp_get_um_threads() returs actual umber of threads omp_get_max_threads() returs OMP_NUM_THREADS Determie optimal umber of threads Determie optimal umber of threads Parallel regio Parallel regio HPC Sprig
45 OpeMP Threadprivate The threadprivate costruct declares variables i a global scope private to a thread across multiple parallel regios Must use whe variables should stay private, eve outside of the curret scope, e.g. across fuctio calls it couter; Global couter #pragma omp threadprivate(couter) #pragma omp parallel { couter = 0; #pragma omp parallel { couter++; Each thread has a local copy of couter Each thread has a local copy of couter HPC Sprig
46 OpeMP Locks Mutex locks, with additioal estable versios of locks omp_lock_t the lock type omp_lock_t lck; omp_iit_lock(&lck); omp_set_lock(&lck); critical sectio omp_uset_lock(&lck); omp_destroy_lock(&lck); omp_iit_lock() iitializatio omp_set_lock() blocks util lock is acquired omp_uset_lock() releases the lock omp_destroy_lock() deallocates the lock HPC Sprig
47 Compiler Optios for OpeMP GOMP project for GCC 4.2 (C ad Fortra) Use #iclude <omp.h> Note: the _OPENMP defie is set whe compilig with OpeMP Itel compiler: icc -opemp ifort -opemp Su compiler: succ -xopemp f95 -xopemp HPC Sprig
48 Autoparallelizatio Compiler applies depedece aalysis ad parallelizes a loop (or etire loop est) automatically whe possible Typically task-parallelizes outer loops (more parallel work), possibly after loop iterchage, fusio, etc. Similar to addig #pragma parallel for to loop(s), with appropriate private ad shared clauses Itel compiler: icc -parallel ifort -parallel Su compiler: succ -xautopar f95 -xautopar HPC Sprig
49 Further Readig [PP2] pages Optioal: OpeMP tutorial at Lawrece Livermore HPC Sprig
Programming with Shared Memory PART II. HPC Fall 2012 Prof. Robert van Engelen
Programming with Shared Memory PART II HPC Fall 2012 Prof. Robert van Engelen Overview Sequential consistency Parallel programming constructs Dependence analysis OpenMP Autoparallelization Further reading
More informationProgramming with Shared Memory PART II. HPC Fall 2007 Prof. Robert van Engelen
Programming with Shared Memory PART II HPC Fall 2007 Prof. Robert van Engelen Overview Parallel programming constructs Dependence analysis OpenMP Autoparallelization Further reading HPC Fall 2007 2 Parallel
More informationParallel Computing. Prof. Marco Bertini
Parallel Computing Prof. Marco Bertini Shared memory: OpenMP Implicit threads: motivations Implicit threading frameworks and libraries take care of much of the minutiae needed to create, manage, and (to
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 20 Itroductio to Trasactio Processig Cocepts ad Theory Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Trasactio Describes local
More informationChapter 9. Pointers and Dynamic Arrays. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 9 Poiters ad Dyamic Arrays Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 9.1 Poiters 9.2 Dyamic Arrays Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Slide 9-3
More informationCS 11 C track: lecture 1
CS 11 C track: lecture 1 Prelimiaries Need a CMS cluster accout http://acctreq.cms.caltech.edu/cgi-bi/request.cgi Need to kow UNIX IMSS tutorial liked from track home page Track home page: http://courses.cms.caltech.edu/courses/cs11/material
More informationShared memory programming model OpenMP TMA4280 Introduction to Supercomputing
Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started
More informationCS 470 Spring Mike Lam, Professor. Advanced OpenMP
CS 470 Spring 2018 Mike Lam, Professor Advanced OpenMP Atomics OpenMP provides access to highly-efficient hardware synchronization mechanisms Use the atomic pragma to annotate a single statement Statement
More informationLast class. n Scheme. n Equality testing. n eq? vs. equal? n Higher-order functions. n map, foldr, foldl. n Tail recursion
Aoucemets HW6 due today HW7 is out A team assigmet Submitty page will be up toight Fuctioal correctess: 75%, Commets : 25% Last class Equality testig eq? vs. equal? Higher-order fuctios map, foldr, foldl
More informationCOSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1
COSC 1P03 Ch 7 Recursio Itroductio to Data Structures 8.1 COSC 1P03 Recursio Recursio I Mathematics factorial Fiboacci umbers defie ifiite set with fiite defiitio I Computer Sciece sytax rules fiite defiitio,
More informationAnalysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis
Itro to Algorithm Aalysis Aalysis Metrics Slides. Table of Cotets. Aalysis Metrics 3. Exact Aalysis Rules 4. Simple Summatio 5. Summatio Formulas 6. Order of Magitude 7. Big-O otatio 8. Big-O Theorems
More informationLecture 4: OpenMP Open Multi-Processing
CS 4230: Parallel Programming Lecture 4: OpenMP Open Multi-Processing January 23, 2017 01/23/2017 CS4230 1 Outline OpenMP another approach for thread parallel programming Fork-Join execution model OpenMP
More informationParallel and Distributed Programming. OpenMP
Parallel and Distributed Programming OpenMP OpenMP Portability of software SPMD model Detailed versions (bindings) for different programming languages Components: directives for compiler library functions
More informationClasses and Objects. Again: Distance between points within the first quadrant. José Valente de Oliveira 4-1
Classes ad Objects jvo@ualg.pt José Valete de Oliveira 4-1 Agai: Distace betwee poits withi the first quadrat Sample iput Sample output 1 1 3 4 2 jvo@ualg.pt José Valete de Oliveira 4-2 1 The simplest
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor Advanced Issues
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Advaced Issues Review: Pipelie Hazards Structural hazards Desig pipelie to elimiate structural hazards.
More informationCOMP Parallel Computing. PRAM (1): The PRAM model and complexity measures
COMP 633 - Parallel Computig Lecture 2 August 24, 2017 : The PRAM model ad complexity measures 1 First class summary This course is about parallel computig to achieve high-er performace o idividual problems
More informationData Handling in OpenMP
Data Handling in OpenMP Manipulate data by threads By private: a thread initializes and uses a variable alone Keep local copies, such as loop indices By firstprivate: a thread repeatedly reads a variable
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 18 Strategies for Query Processig Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio DBMS techiques to process a query Scaer idetifies
More informationMulti-Threading. Hyper-, Multi-, and Simultaneous Thread Execution
Multi-Threadig Hyper-, Multi-, ad Simultaeous Thread Executio 1 Performace To Date Icreasig processor performace Pipeliig. Brach predictio. Super-scalar executio. Out-of-order executio. Caches. Hyper-Threadig
More informationCMSC Computer Architecture Lecture 15: Multi-Core. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 15: Multi-Core Prof. Yajig Li Uiversity of Chicago Course Evaluatio Very importat Please fill out! 2 Lab3 Brach Predictio Competitio 8 teams etered the competitio,
More informationElementary Educational Computer
Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified
More informationCSC165H1 Worksheet: Tutorial 8 Algorithm analysis (SOLUTIONS)
CSC165H1, Witer 018 Learig Objectives By the ed of this worksheet, you will: Aalyse the ruig time of fuctios cotaiig ested loops. 1. Nested loop variatios. Each of the followig fuctios takes as iput a
More informationUniprocessors. HPC Prof. Robert van Engelen
Uiprocessors HPC Prof. Robert va Egele Overview PART I: Uiprocessors PART II: Multiprocessors ad ad Compiler Optimizatios Parallel Programmig Models Uiprocessors Multiprocessors Processor architectures
More informationThreads and Concurrency in Java: Part 1
Cocurrecy Threads ad Cocurrecy i Java: Part 1 What every computer egieer eeds to kow about cocurrecy: Cocurrecy is to utraied programmers as matches are to small childre. It is all too easy to get bured.
More informationHPC Practical Course Part 3.1 Open Multi-Processing (OpenMP)
HPC Practical Course Part 3.1 Open Multi-Processing (OpenMP) V. Akishina, I. Kisel, G. Kozlov, I. Kulakov, M. Pugach, M. Zyzak Goethe University of Frankfurt am Main 2015 Task Parallelism Parallelization
More informationOpenMP C and C++ Application Program Interface Version 1.0 October Document Number
OpenMP C and C++ Application Program Interface Version 1.0 October 1998 Document Number 004 2229 001 Contents Page v Introduction [1] 1 Scope............................. 1 Definition of Terms.........................
More informationThreads and Concurrency in Java: Part 1
Threads ad Cocurrecy i Java: Part 1 1 Cocurrecy What every computer egieer eeds to kow about cocurrecy: Cocurrecy is to utraied programmers as matches are to small childre. It is all too easy to get bured.
More informationCOP4020 Programming Languages. Subroutines and Parameter Passing Prof. Robert van Engelen
COP4020 Programmig Laguages Subrouties ad Parameter Passig Prof. Robert va Egele Overview Parameter passig modes Subroutie closures as parameters Special-purpose parameters Fuctio returs COP4020 Fall 2016
More informationChapter 11. Friends, Overloaded Operators, and Arrays in Classes. Copyright 2014 Pearson Addison-Wesley. All rights reserved.
Chapter 11 Frieds, Overloaded Operators, ad Arrays i Classes Copyright 2014 Pearso Addiso-Wesley. All rights reserved. Overview 11.1 Fried Fuctios 11.2 Overloadig Operators 11.3 Arrays ad Classes 11.4
More informationChapter 5. Functions for All Subtasks. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 5 Fuctios for All Subtasks Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 5.1 void Fuctios 5.2 Call-By-Referece Parameters 5.3 Usig Procedural Abstractio 5.4 Testig ad Debuggig
More informationSolutions to Final COMS W4115 Programming Languages and Translators Monday, May 4, :10-5:25pm, 309 Havemeyer
Departmet of Computer ciece Columbia Uiversity olutios to Fial COM W45 Programmig Laguages ad Traslators Moday, May 4, 2009 4:0-5:25pm, 309 Havemeyer Closed book, o aids. Do questios 5. Each questio is
More informationCOP4020 Programming Languages. Compilers and Interpreters Prof. Robert van Engelen
COP4020 mig Laguages Compilers ad Iterpreters Prof. Robert va Egele Overview Commo compiler ad iterpreter cofiguratios Virtual machies Itegrated developmet eviromets Compiler phases Lexical aalysis Sytax
More informationCMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems
More informationRecursion. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Review: Method Frames
Uit 4, Part 3 Recursio Computer Sciece S-111 Harvard Uiversity David G. Sulliva, Ph.D. Review: Method Frames Whe you make a method call, the Java rutime sets aside a block of memory kow as the frame of
More informationCMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago
CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW Prof. Yajig Li Uiversity of Chicago Admiistrative Stuff Lab2 due toight Exam I: covers lectures 1-9 Ope book, ope otes, close device
More informationIntroduction to. Slides prepared by : Farzana Rahman 1
Introduction to OpenMP Slides prepared by : Farzana Rahman 1 Definition of OpenMP Application Program Interface (API) for Shared Memory Parallel Programming Directive based approach with library support
More informationCSL 860: Modern Parallel
CSL 860: Modern Parallel Computation Hello OpenMP #pragma omp parallel { // I am now thread iof n switch(omp_get_thread_num()) { case 0 : blah1.. case 1: blah2.. // Back to normal Parallel Construct Extremely
More informationMorgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5
Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:
More informationData diverse software fault tolerance techniques
Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the
More informationThreads and Concurrency in Java: Part 2
Threads ad Cocurrecy i Java: Part 2 1 Waitig Sychroized methods itroduce oe kid of coordiatio betwee threads. Sometimes we eed a thread to wait util a specific coditio has arise. 2003--09 T. S. Norvell
More informationMultiprocessors. HPC Prof. Robert van Engelen
Multiprocessors Prof. Robert va Egele Overview The PMS model Shared memory multiprocessors Basic shared memory systems SMP, Multicore, ad COMA Distributed memory multicomputers MPP systems Network topologies
More informationAdvanced C Programming Winter Term 2008/09. Guest Lecture by Markus Thiele
Advanced C Programming Winter Term 2008/09 Guest Lecture by Markus Thiele Lecture 14: Parallel Programming with OpenMP Motivation: Why parallelize? The free lunch is over. Herb
More informationOpenMP 2. CSCI 4850/5850 High-Performance Computing Spring 2018
OpenMP 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationby system default usually a thread per CPU or core using the environment variable OMP_NUM_THREADS from within the program by using function call
OpenMP Syntax The OpenMP Programming Model Number of threads are determined by system default usually a thread per CPU or core using the environment variable OMP_NUM_THREADS from within the program by
More informationLecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming
Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis
More informationToday s objectives. CSE401: Introduction to Compiler Construction. What is a compiler? Administrative Details. Why study compilers?
CSE401: Itroductio to Compiler Costructio Larry Ruzzo Sprig 2004 Today s objectives Admiistrative details Defie compilers ad why we study them Defie the high-level structure of compilers Associate specific
More informationOpenMP. OpenMP. Portable programming of shared memory systems. It is a quasi-standard. OpenMP-Forum API for Fortran and C/C++
OpenMP OpenMP Portable programming of shared memory systems. It is a quasi-standard. OpenMP-Forum 1997-2002 API for Fortran and C/C++ directives runtime routines environment variables www.openmp.org 1
More informationMorgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.
Morga Kaufma Publishers 26 February, 208 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Virtual Memory Review: The Memory Hierarchy Take advatage of the priciple
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:
More informationOpenMP Application Program Interface
OpenMP Application Program Interface DRAFT Version.1.0-00a THIS IS A DRAFT AND NOT FOR PUBLICATION Copyright 1-0 OpenMP Architecture Review Board. Permission to copy without fee all or part of this material
More informationChapter 4 Threads. Operating Systems: Internals and Design Principles. Ninth Edition By William Stallings
Operatig Systems: Iterals ad Desig Priciples Chapter 4 Threads Nith Editio By William Stalligs Processes ad Threads Resource Owership Process icludes a virtual address space to hold the process image The
More informationGraphs. Minimum Spanning Trees. Slides by Rose Hoberman (CMU)
Graphs Miimum Spaig Trees Slides by Rose Hoberma (CMU) Problem: Layig Telephoe Wire Cetral office 2 Wirig: Naïve Approach Cetral office Expesive! 3 Wirig: Better Approach Cetral office Miimize the total
More informationLecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein
068.670 Subliear Time Algorithms November, 0 Lecture 6 Lecturer: Roitt Rubifeld Scribes: Che Ziv, Eliav Buchik, Ophir Arie, Joatha Gradstei Lesso overview. Usig the oracle reductio framework for approximatig
More informationOutline and Reading. Analysis of Algorithms. Running Time. Experimental Studies. Limitations of Experiments. Theoretical Analysis
Outlie ad Readig Aalysis of Algorithms Iput Algorithm Output Ruig time ( 3.) Pseudo-code ( 3.2) Coutig primitive operatios ( 3.3-3.) Asymptotic otatio ( 3.6) Asymptotic aalysis ( 3.7) Case study Aalysis
More informationChapter 8. Strings and Vectors. Copyright 2014 Pearson Addison-Wesley. All rights reserved.
Chapter 8 Strigs ad Vectors Overview 8.1 A Array Type for Strigs 8.2 The Stadard strig Class 8.3 Vectors Slide 8-3 8.1 A Array Type for Strigs A Array Type for Strigs C-strigs ca be used to represet strigs
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 22 Database Recovery Techiques Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Recovery algorithms Recovery cocepts Write-ahead
More informationMaster Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1
Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts
More informationChapter 8. Strings and Vectors. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 8 Strigs ad Vectors Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 8.1 A Array Type for Strigs 8.2 The Stadard strig Class 8.3 Vectors Copyright 2015 Pearso Educatio, Ltd..
More informationOPENMP OPEN MULTI-PROCESSING
OPENMP OPEN MULTI-PROCESSING OpenMP OpenMP is a portable directive-based API that can be used with FORTRAN, C, and C++ for programming shared address space machines. OpenMP provides the programmer with
More information[Potentially] Your first parallel application
[Potentially] Your first parallel application Compute the smallest element in an array as fast as possible small = array[0]; for( i = 0; i < N; i++) if( array[i] < small ) ) small = array[i] 64-bit Intel
More informationChapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig
More informationRunning Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments
Ruig Time Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects. The
More informationChapter 2. C++ Basics. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 2 C++ Basics Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 2.1 Variables ad Assigmets 2.2 Iput ad Output 2.3 Data Types ad Expressios 2.4 Simple Flow of Cotrol 2.5 Program
More informationWhat are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs
What are we goig to lear? CSC316-003 Data Structures Aalysis of Algorithms Computer Sciece North Carolia State Uiversity Need to say that some algorithms are better tha others Criteria for evaluatio Structure
More informationOpenMP. António Abreu. Instituto Politécnico de Setúbal. 1 de Março de 2013
OpenMP António Abreu Instituto Politécnico de Setúbal 1 de Março de 2013 António Abreu (Instituto Politécnico de Setúbal) OpenMP 1 de Março de 2013 1 / 37 openmp what? It s an Application Program Interface
More informationRunning Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments
Ruig Time ( 3.1) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step- by- step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.
More informationAnalysis of Algorithms
Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Ruig Time Most algorithms trasform iput objects ito output objects. The
More informationGCC Developers Summit Ottawa, Canada, June 2006
OpenMP Implementation in GCC Diego Novillo dnovillo@redhat.com Red Hat Canada GCC Developers Summit Ottawa, Canada, June 2006 OpenMP Language extensions for shared memory concurrency (C, C++ and Fortran)
More informationMR-2010I %MktBSize Macro 989. %MktBSize Macro
MR-2010I %MktBSize Macro 989 %MktBSize Macro The %MktBSize autocall macro suggests sizes for balaced icomplete block desigs (BIBDs). The sizes that it reports are sizes that meet ecessary but ot sufficiet
More informationCSC 220: Computer Organization Unit 11 Basic Computer Organization and Design
College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:
More informationBasic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000.
5-23 The course that gives CM its Zip Memory Maagemet II: Dyamic Storage Allocatio Mar 6, 2000 Topics Segregated lists Buddy system Garbage collectio Mark ad Sweep Copyig eferece coutig Basic allocator
More informationEE University of Minnesota. Midterm Exam #1. Prof. Matthew O'Keefe TA: Eric Seppanen. Department of Electrical and Computer Engineering
EE 4363 1 Uiversity of Miesota Midterm Exam #1 Prof. Matthew O'Keefe TA: Eric Seppae Departmet of Electrical ad Computer Egieerig Uiversity of Miesota Twi Cities Campus EE 4363 Itroductio to Microprocessors
More informationSolution printed. Do not start the test until instructed to do so! CS 2604 Data Structures Midterm Spring, Instructions:
CS 604 Data Structures Midterm Sprig, 00 VIRG INIA POLYTECHNIC INSTITUTE AND STATE U T PROSI M UNI VERSI TY Istructios: Prit your ame i the space provided below. This examiatio is closed book ad closed
More informationEE/CSCI 451 Introduction to Parallel and Distributed Computation. Discussion #4 2/3/2017 University of Southern California
EE/CSCI 451 Introduction to Parallel and Distributed Computation Discussion #4 2/3/2017 University of Southern California 1 USC HPCC Access Compile Submit job OpenMP Today s topic What is OpenMP OpenMP
More informationParallel programming using OpenMP
Parallel programming using OpenMP Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department
More informationMapReduce and Hadoop. Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata. November 10, 2014
MapReduce ad Hadoop Debapriyo Majumdar Data Miig Fall 2014 Idia Statistical Istitute Kolkata November 10, 2014 Let s keep the itro short Moder data miig: process immese amout of data quickly Exploit parallelism
More informationChapter 4. Procedural Abstraction and Functions That Return a Value. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 4 Procedural Abstractio ad Fuctios That Retur a Value Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 4.1 Top-Dow Desig 4.2 Predefied Fuctios 4.3 Programmer-Defied Fuctios 4.4
More informationHow do we evaluate algorithms?
F2 Readig referece: chapter 2 + slides Algorithm complexity Big O ad big Ω To calculate ruig time Aalysis of recursive Algorithms Next time: Litterature: slides mostly The first Algorithm desig methods:
More informationLecture 1: Introduction and Strassen s Algorithm
5-750: Graduate Algorithms Jauary 7, 08 Lecture : Itroductio ad Strasse s Algorithm Lecturer: Gary Miller Scribe: Robert Parker Itroductio Machie models I this class, we will primarily use the Radom Access
More information6.854J / J Advanced Algorithms Fall 2008
MIT OpeCourseWare http://ocw.mit.edu 6.854J / 18.415J Advaced Algorithms Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 18.415/6.854 Advaced Algorithms
More informationCMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access
More informationPython Programming: An Introduction to Computer Science
Pytho Programmig: A Itroductio to Computer Sciece Chapter 1 Computers ad Programs 1 Objectives To uderstad the respective roles of hardware ad software i a computig system. To lear what computer scietists
More informationData Structures and Algorithms. Analysis of Algorithms
Data Structures ad Algorithms Aalysis of Algorithms Outlie Ruig time Pseudo-code Big-oh otatio Big-theta otatio Big-omega otatio Asymptotic algorithm aalysis Aalysis of Algorithms Iput Algorithm Output
More informationEE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control
EE 459/500 HDL Based Digital Desig with Programmable Logic Lecture 13 Cotrol ad Sequecig: Hardwired ad Microprogrammed Cotrol Refereces: Chapter s 4,5 from textbook Chapter 7 of M.M. Mao ad C.R. Kime,
More informationChapter 10. Defining Classes. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 10 Defiig Classes Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 10.1 Structures 10.2 Classes 10.3 Abstract Data Types 10.4 Itroductio to Iheritace Copyright 2015 Pearso Educatio,
More informationPseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance
Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Pseudocode ( 1.1) High-level descriptio of a algorithm More structured
More informationComputers and Scientific Thinking
Computers ad Scietific Thikig David Reed, Creighto Uiversity Chapter 15 JavaScript Strigs 1 Strigs as Objects so far, your iteractive Web pages have maipulated strigs i simple ways use text box to iput
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 26 Ehaced Data Models: Itroductio to Active, Temporal, Spatial, Multimedia, ad Deductive Databases Copyright 2016 Ramez Elmasri ad Shamkat B.
More informationCIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13
CIS Data Structures ad Algorithms with Java Sprig 08 Stacks ad Queues Moday, February / Tuesday, February Learig Goals Durig this lab, you will: Review stacks ad queues. Lear amortized ruig time aalysis
More informationHPCSE - I. «OpenMP Programming Model - Part I» Panos Hadjidoukas
HPCSE - I «OpenMP Programming Model - Part I» Panos Hadjidoukas 1 Schedule and Goals 13.10.2017: OpenMP - part 1 study the basic features of OpenMP able to understand and write OpenMP programs 20.10.2017:
More informationCluster Computing Spring 2004 Paul A. Farrell
Cluster Computig Sprig 004 3/18/004 Parallel Programmig Overview Task Parallelism OS support for task parallelism Parameter Studies Domai Decompositio Sequece Matchig Work Assigmet Static schedulig Divide
More informationCS : Programming for Non-Majors, Summer 2007 Programming Project #3: Two Little Calculations Due by 12:00pm (noon) Wednesday June
CS 1313 010: Programmig for No-Majors, Summer 2007 Programmig Project #3: Two Little Calculatios Due by 12:00pm (oo) Wedesday Jue 27 2007 This third assigmet will give you experiece writig programs that
More informationMicro-benchmarks for Cluster OpenMP Implementations: Memory Consistency Costs
Micro-bechmarks for Cluster OpeMP Implemetatios: Memory Cosistecy Costs H. J. Wog, J. Cai, A. P. Redell, ad P. Strazdis Australia Natioal Uiversity, Caberra ACT 0200, Australia {ji.wog,jie.cai,alistair.redell,peter.strazdis}@au.edu.au
More informationCIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19
CIS Data Structures ad Algorithms with Java Sprig 09 Stacks, Queues, ad Heaps Moday, February 8 / Tuesday, February 9 Stacks ad Queues Recall the stack ad queue ADTs (abstract data types from lecture.
More informationCOP4020 Programming Languages. Functional Programming Prof. Robert van Engelen
COP4020 Programmig Laguages Fuctioal Programmig Prof. Robert va Egele Overview What is fuctioal programmig? Historical origis of fuctioal programmig Fuctioal programmig today Cocepts of fuctioal programmig
More informationimplement language system
Outlie Priciples of programmig laguages Lecture 3 http://few.vu.l/~silvis/ppl/2007 Part I. Laguage systems Part II. Fuctioal programmig. First look at ML. Natalia Silvis-Cividjia e-mail: silvis@few.vu.l
More informationMore Advanced OpenMP. Saturday, January 30, 16
More Advanced OpenMP This is an abbreviated form of Tim Mattson s and Larry Meadow s (both at Intel) SC 08 tutorial located at http:// openmp.org/mp-documents/omp-hands-on-sc08.pdf All errors are my responsibility
More informationChapter 4 The Datapath
The Ageda Chapter 4 The Datapath Based o slides McGraw-Hill Additioal material 24/25/26 Lewis/Marti Additioal material 28 Roth Additioal material 2 Taylor Additioal material 2 Farmer Tae the elemets that
More informationIntroduction to SWARM Software and Algorithms for Running on Multicore Processors
Itroductio to SWARM Software ad Algorithms for Ruig o Multicore Processors David A. Bader Georgia Istitute of Techology http://www.cc.gatech.edu/~bader Tutorial compiled by Rucheek H. Sagai M.S. Studet,
More information