Parallel Programming Models (3/24/2014)

There are several parallel programming models in common use: shared memory (without threads), threads, distributed memory / message passing, data parallel, hybrid, Single Program Multiple Data (SPMD), and Multiple Program Multiple Data (MPMD). Parallel programming models exist as an abstraction above hardware and memory architectures. These models are NOT specific to a particular type of machine or memory architecture; any of these models can be implemented on any underlying hardware.

Shared Memory Model (without threads)

In this programming model, tasks share a common address space, which they read and write to asynchronously. Various mechanisms such as locks may be used to control access to the shared memory. An advantage is that the notion of data "ownership" is lacking, so there is no need to specify explicitly the communication of data between tasks. An important disadvantage is that it becomes more difficult to understand and manage data locality. Keeping data local to the processor that works on it conserves the memory accesses, cache refreshes, and bus traffic that occur when multiple processors use the same data. Unfortunately, controlling data locality is hard to understand and may be beyond the control of the average user.

In terms of implementations, native compilers and/or hardware translate user program variables into actual memory addresses, which are global. On stand-alone shared memory machines this is straightforward. On distributed shared memory machines, such as the SGI Origin, memory is physically distributed across a network of machines but is made global through specialized hardware and software.
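The tutorial gives no code, but a minimal sketch may help make the model concrete. The sketch below is hypothetical and assumes a POSIX-like system: two processes (no threads) share a single mapped memory region asynchronously, and a process-shared lock controls access to it, exactly the kind of mechanism mentioned above.

    /* Hypothetical sketch of the shared memory model without threads on a
     * POSIX-like system: two processes share one mapped region and use a
     * process-shared lock.  Build with something like:  cc shm.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    typedef struct {
        pthread_mutex_t lock;   /* lock controlling access to the shared data */
        long counter;           /* the shared data itself */
    } shared_t;

    int main(void)
    {
        /* Create an anonymous memory region visible to both processes. */
        shared_t *sh = mmap(NULL, sizeof *sh, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&sh->lock, &attr);
        sh->counter = 0;

        pid_t child = fork();   /* two tasks, one shared address region */

        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&sh->lock);      /* no data "ownership": either */
            sh->counter++;                      /* task may update the counter, */
            pthread_mutex_unlock(&sh->lock);    /* but only one at a time */
        }

        if (child != 0) {                       /* parent waits, then reports */
            wait(NULL);
            printf("counter = %ld\n", sh->counter);   /* expect 200000 */
        }
        return 0;
    }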

Threads Model

The threads model is a type of shared memory programming in which a single process can have multiple, concurrent execution paths. An analogy that can be used to describe threads is the concept of a single program that includes a number of subroutines:

The main program a.out is scheduled to run by the native operating system. a.out loads and acquires all of the necessary system and user resources to run. a.out performs some serial work, and then creates a number of tasks (threads) that can be scheduled and run by the operating system concurrently. Each thread has local data, but also shares the entire resources of a.out. This saves the overhead associated with replicating a program's resources for each thread. Each thread also benefits from a global memory view because it shares the memory space of a.out. A thread's work may best be described as a subroutine within the main program; any thread can execute any subroutine at the same time as other threads. Threads communicate with each other through global memory (updating address locations). This requires synchronization constructs to ensure that no more than one thread is updating the same global address at any time. Threads can come and go, but a.out remains present to provide the necessary shared resources until the application has completed.

From a programming perspective, threads implementations commonly comprise either a library of subroutines that are called from within parallel source code, or a set of compiler directives embedded in either serial or parallel source code.
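The following hypothetical C sketch illustrates the a.out analogy using the library-based approach (POSIX Threads, described below): a.out does some serial work, creates threads that each run a "subroutine", and the threads communicate through the program's global memory under a synchronization construct.

    /* Hypothetical sketch of the a.out analogy using Pthreads.  Each thread
     * runs the "subroutine" do_work() and updates shared, global data owned
     * by the process.  Build with something like:  cc threads.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static double shared_sum = 0.0;                 /* global memory of a.out */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *do_work(void *arg)                 /* the per-thread subroutine */
    {
        long id = (long)arg;
        double partial = 0.0;
        for (int i = 0; i < 1000; i++)              /* thread-local work */
            partial += id + i * 0.001;

        pthread_mutex_lock(&lock);                  /* synchronize the update */
        shared_sum += partial;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)                                  /* "a.out" */
    {
        pthread_t tid[NTHREADS];

        /* ... some serial work ... */

        for (long t = 0; t < NTHREADS; t++)         /* create concurrent threads */
            pthread_create(&tid[t], NULL, do_work, (void *)t);
        for (int t = 0; t < NTHREADS; t++)          /* wait for them to finish */
            pthread_join(tid[t], NULL);

        printf("shared_sum = %f\n", shared_sum);
        return 0;
    }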

In both cases, whether library calls or compiler directives, the programmer is responsible for determining all parallelism.

Threaded implementations are not new in computing. Historically, hardware vendors have implemented their own proprietary versions of threads. These implementations differed substantially from each other, making it difficult for programmers to develop portable threaded applications. Unrelated standardization efforts have resulted in two very different implementations of threads: POSIX Threads and OpenMP.

POSIX Threads

POSIX Threads is library based and requires parallel coding. It is specified by the IEEE POSIX 1003.1c standard (1995), is C language only, and is commonly referred to as Pthreads. Most hardware vendors now offer Pthreads in addition to their proprietary threads implementations. Parallelism is very explicit and requires significant programmer attention to detail.

OpenMP

OpenMP is compiler directive based and can use serial code. It was jointly defined and endorsed by a group of major computer hardware and software vendors; the OpenMP Fortran API was released on October 28, 1997, and the C/C++ API in late 1998. OpenMP is portable and multi-platform, including Unix and Windows NT platforms, is available in C/C++ and Fortran implementations, and can be very easy and simple to use, providing for "incremental parallelism".

Microsoft has its own implementation for threads, which is not related to the UNIX POSIX standard or to OpenMP.
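By contrast, a hypothetical OpenMP sketch of the directive-based approach shows what "incremental parallelism" means: the serial loop below becomes parallel by adding a single compiler directive, and the rest of the code is unchanged.

    /* Hypothetical OpenMP sketch: directive-based, incremental parallelism.
     * Built with OpenMP support (e.g. a flag such as -fopenmp), the loop
     * runs in parallel; built without it, the directive is ignored and the
     * program is still valid serial C. */
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N];

        /* The only change to the serial version is this directive. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = a[i] + 4.0;      /* add 4 to every array element */

        printf("a[0] = %f  a[N-1] = %f\n", a[0], a[N - 1]);
        return 0;
    }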

Distributed Memory / Message Passing Model

This model demonstrates the following characteristics: a set of tasks use their own local memory during computation; multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines; and tasks exchange data through communications, by sending and receiving messages. Data transfer usually requires cooperative operations to be performed by each process. For example, a send operation must have a matching receive operation.

From a programming perspective, message passing implementations usually comprise a library of subroutines. Calls to these subroutines are embedded in source code. The programmer is responsible for determining all parallelism.

Historically, a variety of message passing libraries have been available since the 1980s. These implementations differed substantially from each other, making it difficult for programmers to develop portable applications. In 1992, the MPI Forum was formed with the primary goal of establishing a standard interface for message passing implementations. Part 1 of the Message Passing Interface (MPI) was released in 1994, and Part 2 (MPI-2) in 1996. MPI is now the "de facto" industry standard for message passing, replacing virtually all other message passing implementations used for production work.
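A minimal, hypothetical MPI sketch of the cooperative operations described above: task 0 performs a send, and task 1 posts the matching receive (the payload and run command are illustrative assumptions).

    /* Hypothetical MPI sketch: every send must have a matching receive.
     * Run with two tasks, e.g.:  mpirun -np 2 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double payload = 3.14159;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Task 0 sends one double to task 1, using message tag 0. */
            MPI_Send(&payload, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Task 1 posts the matching receive. */
            MPI_Recv(&payload, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("task 1 received %f from task 0\n", payload);
        }

        MPI_Finalize();
        return 0;
    }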

MPI implementations exist for virtually all popular parallel computing platforms, although not all implementations include everything in both MPI-1 and MPI-2.

Data Parallel Model

The data parallel model demonstrates the following characteristics: most of the parallel work focuses on performing operations on a data set; the data set is typically organized into a common structure, such as an array or cube; and a set of tasks work collectively on the same data structure, with each task working on a different partition of it. Tasks perform the same operation on their partition of work, for example, "add 4 to every array element". On shared memory architectures, all tasks may have access to the data structure through global memory. On distributed memory architectures, the data structure is split up and resides as "chunks" in the local memory of each task. Programming with the data parallel model is usually accomplished by writing a program with data parallel constructs. The constructs can be calls to a data parallel subroutine library or compiler directives recognized by a data parallel compiler.

Hybrid Model

A hybrid model combines more than one of the previously described programming models. Currently, a common example of a hybrid model is the combination of the message passing model (MPI) with the threads model (OpenMP): threads perform computationally intensive kernels using local, on-node data, while communication between processes on different nodes occurs over the network using MPI.
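A hypothetical sketch of that MPI + OpenMP combination: OpenMP threads handle the computationally intensive kernel on local, on-node data, and MPI handles the communication between processes (the array size and the reduction are illustrative assumptions).

    /* Hypothetical hybrid MPI + OpenMP sketch.  Build with an MPI compiler
     * wrapper plus OpenMP support, e.g.:  mpicc -fopenmp hybrid.c */
    #include <mpi.h>
    #include <stdio.h>

    #define LOCAL_N 100000

    int main(int argc, char **argv)
    {
        static double chunk[LOCAL_N];   /* this process's local, on-node data */
        double local_sum = 0.0, global_sum = 0.0;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Computationally intensive kernel on local data: OpenMP threads. */
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = 0; i < LOCAL_N; i++) {
            chunk[i] = rank + i * 1.0e-6;
            local_sum += chunk[i];
        }

        /* Communication between processes on different nodes: MPI. */
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum across all processes = %f\n", global_sum);

        MPI_Finalize();
        return 0;
    }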

The hybrid model lends itself well to the increasingly common hardware environment of clustered multi/many-core machines. Another similar and increasingly popular example of a hybrid model is using MPI with GPU (Graphics Processing Unit) programming: GPUs perform computationally intensive kernels using local, on-node data, while communication between processes on different nodes occurs over the network using MPI.

Single Program Multiple Data (SPMD)

SPMD is actually a "high level" programming model that can be built upon any combination of the previously mentioned parallel programming models. SINGLE PROGRAM: all tasks execute their copy of the same program simultaneously; this program can be threads, message passing, data parallel, or hybrid. MULTIPLE DATA: all tasks may use different data. SPMD programs usually have the necessary logic programmed into them to allow different tasks to branch or conditionally execute only those parts of the program they are designed to execute; that is, tasks do not necessarily have to execute the entire program, perhaps only a portion of it. The SPMD model, using message passing or hybrid programming, is probably the most commonly used parallel programming model for multi-node clusters.
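A hypothetical SPMD sketch using message passing: every task runs its own copy of the same program, and rank-based logic lets each task execute only the portion of the program it is designed to execute.

    /* Hypothetical SPMD sketch: one executable, many tasks.  Each task
     * branches on its rank and runs only part of the program. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* Only task 0 executes the coordination / I/O portion. */
            printf("coordinator: %d copies of the same program are running\n",
                   size);
        } else {
            /* Every other task executes only the worker portion, on its own data. */
            printf("worker %d: operating on data partition %d\n", rank, rank);
        }

        MPI_Finalize();
        return 0;
    }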

Multiple Program Multiple Data (MPMD)

Like SPMD, MPMD is actually a "high level" programming model that can be built upon any combination of the previously mentioned parallel programming models. MULTIPLE PROGRAM: tasks may execute different programs simultaneously; the programs can be threads, message passing, data parallel, or hybrid. MULTIPLE DATA: all tasks may use different data. MPMD applications are not as common as SPMD applications, but may be better suited for certain types of problems, particularly those that lend themselves better to functional decomposition than to domain decomposition.