Practical in Numerical Astronomy, SS 2012, LECTURE 12
Parallelization II: Open Multiprocessing (OpenMP)
Lecturer: Eduard Vorobyov. Email: eduard.vorobiev@univie.ac.at, room 006.6

OpenMP is a shared-memory parallelism standard. It is designed for SMP (symmetric multiprocessing) machines. Wikipedia: symmetric multiprocessing involves a multiprocessor computer hardware architecture where two or more identical processors are connected to a single shared main memory and are controlled by a single OS instance.

MPI is a distributed-memory parallelism standard. It is designed for computer clusters with distributed memory. Wikipedia: distributed memory refers to multiple-processor computer systems in which each processor has its own private memory.

Basic idea: the fork-join programming model

[Diagram: master thread 0 runs alone in the serial regions; threads 0-4 run side by side in the parallel region.]

1) The code starts as serial (non-parallel) code and has only one master thread.
2) The master thread is forked into N threads when a parallel region is encountered (in this example four additional threads 1, 2, 3, 4 are created). Thread 0 remains the master of all five threads.
3) Each thread executes its part of the code in parallel with the other threads.
4) Upon completion of the parallel region, the threads are joined into one master thread, which continues execution in the serial region.
5) Calculations continue in serial mode until a new parallel region is reached.

A minimal code sketch of this model is given below.
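A minimal sketch (it assumes the compiler provides the standard omp_lib module; the program name is illustrative). It prints the thread number once in each serial region and once per thread in the parallel region:

   program fork_join_demo
      use omp_lib
      implicit none
      print *, 'serial region,   thread ', omp_get_thread_num()   ! only the master thread (0) runs here
   !$omp parallel
      print *, 'parallel region, thread ', omp_get_thread_num()   ! printed once by every thread
   !$omp end parallel
      print *, 'serial region,   thread ', omp_get_thread_num()   ! back to the master thread only
   end program fork_join_demo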

Parallelizing a serial code using OpenMP directives

The OpenMP standard offers the possibility of using the same source code with and without OpenMP parallelization (the MPI standard does not offer this!). This is achieved by hiding the OpenMP directives and commands in such a way that a normal compiler is unable to see them. For that purpose the following directive sentinel is introduced:

!$OMP

Since the first character is an exclamation mark (!), a normal compiler will interpret the line as a comment and will ignore its content. An OpenMP-compliant compiler, however, will identify the complete sequence and will execute the commands that follow, for example:

!$OMP PARALLEL DEFAULT(shared) PRIVATE(C, D) REDUCTION(+:a)
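OpenMP also defines the conditional-compilation sentinel !$, which hides whole Fortran statements from a normal compiler in the same way. A small sketch of how one source behaves in both serial and OpenMP builds (program and variable names are illustrative):

   program sentinel_demo
   !$ use omp_lib                          ! compiled only when OpenMP is enabled
      implicit none
      integer :: nthreads
      nthreads = 1                         ! value seen by a plain serial build
   !$ nthreads = omp_get_max_threads()     ! value seen by an OpenMP build
      print *, 'this build can use up to ', nthreads, ' thread(s)'
   end program sentinel_demo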

Making the FORTRAN compiler recognize OpenMP directives

For the FORTRAN compiler to recognize OpenMP directives, one needs to compile the source code with a specific flag, which is compiler-dependent and tells the compiler to link the OpenMP libraries:

GNU Fortran compiler:    gfortran -fopenmp
Intel Fortran compiler:  ifort -openmp
PGI Fortran compiler:    pgf90 -mp

Note that when using OpenMP all local arrays will be allocated on the stack. When porting existing code to OpenMP, this may lead to surprising results, in particular to segmentation faults if the stack size is limited.

Setting the number of threads in a parallel region

The number of threads can be set by environment variables.

In the BASH shell:
   export OMP_NUM_THREADS=8

In the TCSH shell:
   setenv OMP_NUM_THREADS 8

Environment variables affect all OpenMP codes that are run from the given terminal.

The number of threads can also be set by OpenMP library calls:

   subroutine OMPsetup
      integer omp_get_num_threads, omp_get_max_threads, omp_get_num_procs
      call OMP_SET_NUM_THREADS(8)                       ! sets the number of threads to 8
   !$omp parallel                                       ! parallel region starts here
   !$omp master                                         ! executed only by the master thread
      print *, 'num threads=', omp_get_num_threads()    ! number of executing threads
      print *, 'max threads=', omp_get_max_threads()    ! maximum possible number of threads
      print *, 'max cpus=', omp_get_num_procs()         ! available number of processors
   !$omp end master
   !$omp end parallel                                   ! end of the parallel region
   end subroutine OMPsetup

Note that OMP_SET_NUM_THREADS is called from a serial part of the code. The library call OMP_SET_NUM_THREADS supersedes the environment variable OMP_NUM_THREADS.

The PARALLEL construct

The most important directive in OpenMP is the one in charge of defining the so-called parallel regions. Such a region is a block of code that is going to be executed by multiple threads running in parallel. Since a parallel region needs to be created/opened and destroyed/closed, two directives are necessary, forming a so-called directive pair: !$OMP parallel ... !$OMP end parallel

   ... serial code ...
   !$omp parallel
      write(*,*) "Hello"        ! parallel code
   !$omp end parallel
   ... serial code ...

Since the code enclosed between the two directives is executed by each thread, the message Hello appears on the screen as many times as there are threads in the parallel region. Before and after the parallel region the code is executed by only one thread, which is the normal behavior of serial programs.

Parallelizing a DO loop. The PRIVATE clause

Serial DO loop:

   integer k
   do k = 1, 1000
      ...
   end do

The master thread does all the work (thread 0 executes k = 1, 1000).

Parallel DO loop:

   integer k
   call OMP_SET_NUM_THREADS(2)
   !$omp parallel do private(k)
   do k = 1, 1000
      ...
   end do
   !$omp end parallel do

Each thread computes part of the global DO loop (thread 0 executes k = 1, 500 and thread 1 executes k = 501, 1000).

Note that the same counter variable k has different values in each thread of the parallelized DO loop! To avoid memory conflicts, two copies of the variable k need to be created in memory. The clause PRIVATE(k) tells the compiler that each thread needs to have its own copy of the variable k. The PRIVATE clause can be very resource consuming, so variables should be declared private only if they are modified inside the DO loop. Upon entering and after leaving the parallel DO loop, the variable k is undefined (in the serial DO loop, by contrast, k = 1001 after leaving the loop).
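Put together, a complete runnable version of such a loop might look as follows (the array x and the loop body are illustrative additions):

   program do_private_demo
      use omp_lib
      implicit none
      integer, parameter :: n = 1000
      integer :: k
      real, dimension(n) :: x
      call omp_set_num_threads(2)
   !$omp parallel do private(k) shared(x)
      do k = 1, n
         x(k) = sqrt(real(k))          ! each thread fills its own part of x
      end do
   !$omp end parallel do
      print *, 'x(1) =', x(1), '  x(n) =', x(n)
   end program do_private_demo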

Shared variables. The SHARED clause

In contrast to the previous situation, sometimes there are variables which should be available to all threads inside the DO loop, either because their values are needed by all threads or because all threads have to update their values.

   program example
      implicit none
      integer, parameter :: n = 10
      integer i
      real b
      real, dimension(n) :: a
      call omp_set_num_threads(4)                  ! setting the number of threads to 4
   !$omp parallel do shared(a,n) private(i,b)      ! parallel DO loop begins here
      do i = 1, n
         b = i + 1
         a(i) = b
      end do
   !$omp end parallel do                           ! parallel DO loop ends here
   end

In this example, the array a, the variable b, and the counter i are all modified inside the DO loop. However, each iteration of the loop accesses a different element of the array a, so there is no need to create separate copies of a. Such variables are declared as SHARED.

Use SHARED when:
- a variable is not modified in the loop (as, e.g., n), or
- a variable is an array in which each iteration of the loop accesses a different element.

Other DO loop clauses

FIRSTPRIVATE(list)
LASTPRIVATE(list)
REDUCTION(operator:list)
SCHEDULE(type, chunk)
ORDERED
DEFAULT

The FIRSTPRIVATE clause

Private variables have an undefined value on entering the parallel do construct. Sometimes, however, it is of interest that these local copies start with the value of the original variable in the serial part of the code. This is achieved by including the variable in a FIRSTPRIVATE clause as follows:

   integer a, b
   a = 2
   b = 1
   !$omp parallel do private(a) firstprivate(b)
   ...
   !$omp end parallel do

In this example, the variable a has an undefined value at the beginning of the parallel region, while b has the value specified in the preceding serial region, namely b = 1.
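A complete sketch of this behaviour (the loop bounds and the loop body are illustrative additions):

   program firstprivate_demo
      implicit none
      integer :: i, a, b
      a = 2
      b = 1
   !$omp parallel do private(a) firstprivate(b)
      do i = 1, 4
         a = b + i          ! b enters every thread with the value 1; a enters undefined
      end do
   !$omp end parallel do
   end program firstprivate_demo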

The LASTPRIVATE clause

Private variables have an undefined value after leaving the parallel do construct. This is sometimes inconvenient. By including a variable in a LASTPRIVATE clause, the original variable is updated with the last value the private copy would get if the DO loop were executed in serial mode. For example:

   integer i, a
   !$omp parallel do private(i) lastprivate(a)
   do i = 1, 1000
      a = i
   end do
   !$omp end parallel do

After the parallel DO loop has finished, the variable a is equal to 1000, which is the value it would have if the OpenMP directives did not exist.

The REDUCTION clause

Serial code:

   integer i, a
   do i = 1, 1000
      a = a + i
   end do

Wrong OpenMP parallelization!

   !$omp parallel do private(i) shared(a)
   do i = 1, 1000
      a = a + i
   end do
   !$omp end parallel do

When a variable has been declared as SHARED because all threads need to modify its value, it is necessary to ensure that only one thread at a time writes/updates the memory location of that variable, otherwise unpredictable results will occur. The REDUCTION clause solves this problem: each thread accumulates into its own private copy of a, and the partial results are combined safely into the shared variable at the end, ensuring that the final result is the correct one.

   !$omp parallel do reduction(+:a)
   do i = 1, 1000
      a = a + i
   end do
   !$omp end parallel do

General syntax of the REDUCTION clause

REDUCTION(operator or intrinsic function : variable list)

Initialization rules for the variables in the variable list: a private copy of each variable in the variable list is created for each thread, as if the PRIVATE clause had been used. The resulting private copies are initialized following the rules shown in the table below. At the end of the REDUCTION, the shared variable is updated to reflect the result of combining the final value of each private copy using the specified operator.
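The initialization values defined by the OpenMP Fortran specification are:

Operator / intrinsic    Initial value of the private copy
+                       0
-                       0
*                       1
.AND.                   .TRUE.
.OR.                    .FALSE.
.EQV.                   .TRUE.
.NEQV.                  .FALSE.
MAX                     smallest representable number
MIN                     largest representable number
IAND                    all bits set
IOR                     0
IEOR                    0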

The SCHEDULE clause. Load balancing.

   call omp_set_num_threads(4)
   !$omp parallel do private(k) shared(n)
   do k = 1, n
      ...
   end do
   !$omp end parallel do

When a do-loop is parallelized and its iterations are distributed over the different threads, the simplest way of doing this is to give each thread the same number of iterations: n/4. This is not always the best choice, however, since the computational cost of the iterations may not be equal for all of them. Therefore, different ways of distributing the iterations exist. The SCHEDULE clause allows the programmer to specify the scheduling for each do-loop using the following syntax:

   call omp_set_num_threads(4)
   !$omp parallel do private(k) shared(n) schedule(type, chunk)
   do k = 1, n
      ...
   end do
   !$omp end parallel do

The SCHEDULE clause accepts two parameters. The first one, type, specifies the way in which the work is distributed over the threads. The second one, chunk, is an optional parameter specifying the size of the pieces of work given to each thread.

STATIC: when this option is specified, the pieces of work created from the iteration space of the do-loop are distributed over the threads in the team following the order of their thread identification number. This assignment of work is done at the beginning of the do-loop and stays fixed during its execution.

[Diagram: number of threads = 3 and DO-loop iteration space k = 1, 600; no value of chunk is specified, so each thread receives one contiguous block of 200 iterations.]

STATIC is the best choice in most cases.

When SCHEDULE(DYNAMIC, chunk) is specified, the iteration space is divided into pieces of work of size chunk. If this optional parameter is not given, a size of one iteration is used. Each thread then gets one of these pieces of work; when a thread has finished its piece, it is assigned a new one, until no pieces of work are left.

[Diagram: example of dynamic scheduling.]

See also the GUIDED and RUNTIME schedule types.
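A sketch of a situation where dynamic scheduling pays off (the loop body is an illustrative, deliberately unbalanced workload):

   program dynamic_schedule_demo
      use omp_lib
      implicit none
      integer, parameter :: n = 600
      integer :: k, j
      real, dimension(n) :: cost
      call omp_set_num_threads(4)
   !$omp parallel do private(k,j) shared(cost) schedule(dynamic, 10)
      do k = 1, n
         cost(k) = 0.0
         do j = 1, k                       ! the work grows with k, so iterations are unequal in cost
            cost(k) = cost(k) + sin(real(j))
         end do
      end do
   !$omp end parallel do
      print *, 'cost(n) = ', cost(n)
   end program dynamic_schedule_demo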

The ORDERED clause. Eliminating the race condition.

   program race_condition
      integer i
      integer, dimension(5) :: a, b
      a = 1
      b = 2
      call omp_set_num_threads(2)
   !$omp parallel do private(i) shared(a,b)
      do i = 1, 4
         a(i+1) = a(i) + b(i)
      end do
   !$omp end parallel do
   end

Thread 0: a(2) = a(1) + b(1),  a(3) = a(2) + b(2)
Thread 1: a(4) = a(3) + b(3),  a(5) = a(4) + b(4)

PROBLEM: there is a data dependency between the iterations, causing a so-called race condition: thread 1 may read a(3) before thread 0 has computed it.

A solution is to use the ORDERED clause, which tells the compiler that some statements in the DO-loop need to be executed sequentially.

   program no_race_condition
      integer i
      integer, dimension(5) :: a, b
      a = 1
      b = 2
      call omp_set_num_threads(2)
   !$omp parallel do private(i) shared(a,b) ordered
      do i = 1, 4
   !$omp ordered
         a(i+1) = a(i) + b(i)
   !$omp end ordered
      end do
   !$omp end parallel do
   end

In this case, however, the threads do not actually run in parallel.

The DEFAULT( PRIVATE | SHARED | NONE ) clause

When most of the variables used inside the DO-loop are going to be private/shared, it would be cumbersome to include all of them in one of the previous clauses. To avoid this, it is possible to specify what OpenMP has to do when nothing is said about a specific variable: it is possible to specify a default setting. For example:

   !$omp parallel do default(private) shared(a)

Parallelization of implicit DO-loops. The WORKSHARE construct.

FORTRAN 90 array operations include implicit DO-loops and can be parallelized by the WORKSHARE construct.

Serial code:

   real, dimension(10) :: a, b, c
   ...
   a = 5.0 * cos(a) + 4.0 * sin(a)
   ...

Parallelized code:

   real, dimension(10) :: a, b, c
   ...
   !$omp parallel workshare
   a = 5.0 * cos(a) + 4.0 * sin(a)
   !$omp end parallel workshare
   ...

Not all compilers support parallelization of FORTRAN 90 array operations!

Parallelization of nested DO-loops

When several nested do-loops are present, it is always convenient to parallelize the outermost one, since then the amount of work distributed over the different threads is maximal. Also, the number of times the !$OMP parallel do ... !$OMP end parallel do directive pair effectively acts is minimal, which implies minimal overhead from the OpenMP directives.

Innermost loop parallelized:

   do i = 1, 10
      do j = 1, 10
   !$omp parallel do private(k) shared(A,j,i)
         do k = 1, 10
            A(k,j,i) = i * j * k
         end do
   !$omp end parallel do
      end do
   end do

Outermost loop parallelized:

   !$omp parallel do private(i,j,k) shared(A)
   do i = 1, 10
      do j = 1, 10
         do k = 1, 10
            A(k,j,i) = i * j * k
         end do
      end do
   end do
   !$omp end parallel do

In the first case, the work to be computed in parallel is distributed i*j = 100 times, and each thread gets fewer than 10 iterations to compute, since only the innermost do-loop is parallelized. In the second case, the work is distributed only once, and the work given to each thread contains at least j*k = 100 iterations. Therefore, better performance of the parallelization is to be expected in the second case.

The SECTIONS construct

The SECTIONS construct allows one to assign a completely different task to each thread, leading to an MPMD model of execution (MPMD stands for Multiple Programs Multiple Data and refers to completely different programs/tasks which share or interchange information and which run simultaneously on different processors). Each section of code is executed once and only once by a thread in the team. The syntax of this construct is the following:

   !$omp parallel sections clause1 clause2 ...
   !$omp section
      ... code executed by one thread
   !$omp section
      ... code executed by another thread
   !$omp end parallel sections

Each block of code, to be executed by one of the threads, starts with an !$OMP section directive and extends until the same directive is found again or until the closing directive is found. Any number of sections can be defined inside this directive pair, but only the existing number of threads is used to distribute the different blocks of code. This means that if the number of sections is larger than the number of available threads, some threads will execute more than one section of code in a serial fashion.

Allowed clauses: PRIVATE, FIRSTPRIVATE, LASTPRIVATE, REDUCTION
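A small self-contained sketch with two independent tasks (the tasks themselves are illustrative):

   program sections_demo
      implicit none
      integer :: i
      real :: total_sin, total_cos
      call omp_set_num_threads(2)
   !$omp parallel sections private(i)
   !$omp section
      total_sin = 0.0
      do i = 1, 1000
         total_sin = total_sin + sin(real(i))   ! task executed by one thread
      end do
   !$omp section
      total_cos = 0.0
      do i = 1, 1000
         total_cos = total_cos + cos(real(i))   ! independent task executed by another thread
      end do
   !$omp end parallel sections
      print *, total_sin, total_cos
   end program sections_demo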

Calling serial subroutines inside a parallel region. The SINGLE construct.

   integer, dimension(0:3) :: a = 99
   integer :: i_am
   call omp_set_num_threads(4)
   !$omp parallel private(i_am) shared(a)
   i_am = omp_get_thread_num()
   call work(a, i_am)
   !$omp single
   print *, 'a = ', a
   !$omp end single
   !$omp end parallel

   subroutine work(a, i_am)
      integer, dimension(0:3) :: a      ! becomes shared
      integer :: i_am                   ! becomes private
      print *, 'work', i_am
      a(i_am) = i_am
   end subroutine work

Dummy arguments inherit the data-sharing attributes of the associated actual arguments.

The code enclosed in the SINGLE construct is executed by only one of the threads in the team, namely the first one to arrive at the opening directive !$OMP single. All the remaining threads wait at the implied synchronization at the closing directive !$OMP end single.

Result of execution:
   work 1
   work 3
   a = 99, 1, 99, 3
   work 2
   work 0

What went wrong? The SINGLE construct was executed by one of the threads (1 or 3) before threads 2 and 0 had completed execution of subroutine work.

   integer, dimension(0:3) :: a = 99
   integer :: i_am
   call omp_set_num_threads(4)
   !$omp parallel private(i_am) shared(a)
   i_am = omp_get_thread_num()
   call work(a, i_am)
   !$omp barrier                 ! all threads wait at the barrier
   !$omp single
   print *, 'a = ', a
   !$omp end single
   !$omp end parallel

   subroutine work(a, i_am)
      integer, dimension(0:3) :: a      ! becomes shared
      integer :: i_am                   ! becomes private
      print *, 'work', i_am
      a(i_am) = i_am
   end subroutine work

Result of execution:
   work 1
   work 3
   work 2
   work 0
   a = 0, 1, 2, 3

The BARRIER directive represents an explicit synchronization between the different threads in the team. When it is encountered, each thread waits until all the other threads have reached this point.

Calling parallel subroutines inside a parallel region

   call omp_set_num_threads(2)
   !$omp parallel shared(s) private(p)
   !$omp do private(j)
   do j = 1, 10
      ...
   end do
   !$omp end do
   call sub(s, p)
   !$omp end parallel
   ...
   end

   subroutine sub(s, p)
      integer :: s                 ! shared
      integer :: p                 ! private
      integer :: var, k            ! local variables are private
   !$omp do private(k)
      do k = 1, 10
         ...                       ! thread 0 will do the first 5 iterations
         ...                       ! thread 1 will do the last 5 iterations
      end do
   !$omp end do
      do k = 1, 10
         ...                       ! every thread executes all 10 iterations
      end do
   !$omp parallel do private(k)
      do k = 1, 10
         ...                       ! a PARALLEL directive inside another PARALLEL directive
      end do
   !$omp end parallel do
   end

A PARALLEL directive encountered dynamically inside another PARALLEL directive logically establishes a new team, which is composed of only the current thread, unless nested parallelism is enabled. We say that the loop is serialized: each of the two threads executes all 10 iterations of the last loop itself.

The MASTER and CRITICAL constructs

The code enclosed inside the MASTER construct is executed only by the master thread of the team. Meanwhile, all the other threads continue with their work. The syntax is as follows:

   !$omp master
   ...
   !$omp end master

In essence, this construct is similar to the !$OMP single ... !$OMP end single construct presented before, except that the thread executing the block of code is forced to be the master one instead of the first to arrive.

The CRITICAL construct restricts access to the enclosed code to only one thread at a time. Typical applications of this directive pair are reading input from the keyboard/a file or updating the value of a shared variable. The syntax is the following:

   !$omp critical
   ...
   !$omp end critical

When a thread reaches the beginning of a critical section, it waits there until no other thread is executing the code in the critical section.
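A minimal sketch of the CRITICAL construct protecting an update of a shared variable (the variable name is illustrative):

   program critical_demo
      use omp_lib
      implicit none
      integer :: nupdates
      nupdates = 0
      call omp_set_num_threads(4)
   !$omp parallel shared(nupdates)
   !$omp critical
      nupdates = nupdates + 1            ! only one thread at a time executes this update
   !$omp end critical
   !$omp end parallel
      print *, 'nupdates = ', nupdates   ! equals the number of threads (here 4)
   end program critical_demo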

The THREADPRIVATE construct

Sometimes it is of interest to have global variables whose values are specific to each thread. An example could be a variable called my_id, which stores the thread identification number of each thread: this number is different for each thread, but it is useful if its value is accessible from everywhere inside each thread and does not change from one parallel region to the next. When the program enters the first parallel region, a private copy of each variable marked as THREADPRIVATE is created for each thread.

   integer, save :: my_id            ! the variable must have the SAVE attribute
   !$omp threadprivate(my_id)

   !$omp parallel
   my_id = OMP_get_thread_num()      ! the thread number is assigned to my_id
   !$omp end parallel
   ...
   !$omp parallel
   ...
   !$omp end parallel

In this example, the variable my_id gets assigned the thread identification number of each thread during the first parallel region. In the second parallel region, my_id keeps the values assigned to it in the first parallel region, since it is THREADPRIVATE.

OpenMP runtime library overview

OpenMP Fortran library routines are external functions. Their names start with OMP_ and they usually have an integer or logical return type. These functions must be declared explicitly.

Name                    Functionality
omp_set_num_threads     Set the number of threads
omp_get_num_threads     Return the number of threads in the team
omp_get_max_threads     Return the maximum number of threads
omp_get_thread_num      Get the thread ID
omp_get_num_procs       Return the number of available processors
omp_in_parallel         Check whether inside a parallel region
omp_set_dynamic         Activate dynamic thread adjustment
omp_get_dynamic         Check for dynamic thread adjustment
omp_set_nested          Activate nested parallelism
omp_get_nested          Check for nested parallelism
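With most Fortran 90 compilers the simplest way to obtain these declarations is the omp_lib module, as in the sketch below (the commented-out external declaration is the explicit alternative; the subroutine name is illustrative):

   subroutine report_team
      use omp_lib                                  ! provides interfaces for the omp_ routines
      implicit none
   !  integer, external :: omp_get_num_threads    ! alternative: declare the function explicitly
   !$omp parallel
   !$omp master
      print *, 'threads in team: ', omp_get_num_threads()
   !$omp end master
   !$omp end parallel
   end subroutine report_team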

References

www.openmp.org
OpenMP Application Program Interface, Version 3.0, May 2008
Also various web resources and books.

Assignment 9 (five extra points)

Parallelize your version of the Sedov test problem (or the Sod shock tube problem) using OpenMP directives (see Nigel's lecture on hyperbolic equations, Assignment 6).
- Use a sufficiently high resolution so that the serial code runs for at least 1 minute.
- Use different numbers of threads (2, 4, max).
- Calculate the speedup for the different numbers of threads (2, 4, max) relative to the purely serial code (use time ./your_code in Linux to measure the run time of your code).

The report is due on 12.07.2012.