Parallel Computing Using OpenMP/MPI Presented by - Jyotsna 29/01/2008
Serial Computing Solving a problem one instruction at a time on a single processor
Parallel Computing Solving a problem by executing many parts of it simultaneously on multiple processors
Parallel Computer Memory Architecture Shared Memory
Parallel Computer Memory Architecture Shared Memory Multiple processors can operate independently but share the same memory resources. Changes in a memory location made by one processor are visible to all other processors. Shared memory machines can be divided into two main classes based upon memory access times: UMA and NUMA.
Parallel Computer Memory Architecture Shared Memory - Advantages: global address space; user-friendly programming perspective on memory; data sharing between tasks is fast and uniform. Disadvantages: lack of scalability, since adding CPUs increases traffic on the shared memory path; synchronization is the programmer's responsibility.
Parallel Computer Memory Architecture Distributed Memory
Parallel Computer Memory Architecture Distributed Memory Processors have their own local memory. Memory addresses in one processor do not map to another processor. No concept of global address space or cache coherency. Each processor can operate independently. Access to the data in another processor is done via communication.
Parallel Computer Memory Architecture Distributed Memory Advantages Memory scalability: total memory grows in proportion to the number of processors. Rapid access to own memory: no interference and no overhead to maintain cache coherency. Cost effectiveness: can use commodity, off-the-shelf processors and networking.
Parallel Computer Memory Architecture Distributed Memory Disadvantages The programmer is responsible for the details of data communication among processors. Difficult to map existing data structures, based on global memory, to this memory organization. Non-uniform memory access (NUMA) times.
Parallel Computer Memory Architecture Hybrid Distributed-Shared Memory
Parallel Programming Models Shared Memory: OpenMP. Threads: OpenMP, POSIX Threads (known as Pthreads). Message Passing: MPI. Data Parallel: F90, F95, HPF. Hybrid: OpenMP & MPI together.
What is OpenMP? An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism. Open specifications for Multi Processing. Portable: Unix & Windows NT. Standardized: may eventually become an ANSI standard.
Goals for OpenMP Standardization: provide a standard among a variety of shared memory architectures/platforms. Lean & mean: establish a simple and limited set of directives for programming shared memory machines; significant parallelism can be implemented by using just 3 or 4 directives. Ease of use: provide the capability to incrementally parallelize a serial program, and to implement both coarse-grain and fine-grain parallelism. Portability: supports Fortran (77, 90, and 95), C, and C++.
OpenMP Directives PARALLEL construct, Work-Sharing constructs, Combined Parallel Work-Sharing constructs, Synchronization constructs, THREADPRIVATE directive
OpenMP Directives PARALLEL Region construct A parallel region is a block of code that will be executed by multiple threads. This is the fundamental OpenMP parallel construct. #pragma omp parallel [clause ...] newline structured_block, where clause is one of: if (scalar_expression), private (list), shared (list), default (shared | none), firstprivate (list), reduction (operator: list), copyin (list), num_threads (integer-expression)
OpenMP Directives Work-Sharing Constructs Divides the execution of the enclosed code region among the members of the team that encounter it. Does not launch new threads. There is no implied barrier upon entry to a work-sharing construct; however, there is an implied barrier at the end of one.
OpenMP Directives Work-Sharing Constructs DO/for - divides loop iterations among the team (data parallelism). SECTIONS - assigns independent blocks of code to different threads (functional parallelism). SINGLE - serializes a section of code so it is executed by one thread only.
OpenMP Directives Synchronization constructs MASTER - only the master thread executes the block. CRITICAL - only one thread at a time may execute the block. BARRIER - each thread waits until all threads reach the barrier, then all proceed. ATOMIC - a specific memory location must be updated atomically. ORDERED - iterations of the loop are executed in the same order as in serial execution.
What is MPI? M P I = Message Passing Interface MPI is a specification, not a library: it defines what such a library should provide. Practical, portable, efficient, flexible. Defined for C/C++ and Fortran.
Reasons for using MPI Standardization - the only message passing library that can be considered a standard. Portability - no need to modify your source code when you port your application to a different platform that supports (and is compliant with) the MPI standard. Functionality - over 115 routines are defined in MPI-1 alone. Availability - a variety of implementations are available, both vendor and public domain.
General MPI program structure
Virtual Topology Describes a mapping/ordering of MPI processes into a geometric "shape". 1-D: line, ring. 2-D: mesh, torus. 3-D: 3-D mesh, hypercube.
MPI / OpenMP Pros
Pro MPI: portable to distributed and shared memory machines; possibly high performance; scales beyond one node; no data placement problem.
Pro OpenMP: easy to implement parallelism; implicit synchronization; low latency, high bandwidth; implicit communication; coarse and fine granularity; dynamic load balancing.
MPI / OpenMP Cons
Con MPI: difficult to develop and debug; high latency, low bandwidth; explicit communication; large granularity; dynamic load balancing is difficult; explicit synchronization.
Con OpenMP: only on shared memory machines; average performance; scales only within one node; possible data placement problem; no specific thread order.
Thank you.