Parallel Programming with MPI

Parallel Programming with MPI. Science and Technology Support, Ohio Supercomputer Center, 1224 Kinnear Road, Columbus, OH 43212, (614) 292-1800, oschelp@osc.edu, http://www.osc.edu/supercomputing/

Functions - Scope of Activity. Supercomputing: computation, software, storage, and support services empower Ohio's scientists, engineers, faculty, students, businesses and other clients. Networking: Ohio's universities, colleges, K-12 schools and state government connect to the network; OSC also provides engineering services, video conferencing, and support through a 24x7 service desk. Research: OSC leads science and engineering projects, assists researchers with custom needs, partners with regional, national, and international researchers in groundbreaking initiatives, and develops new tools. Education: the Ralph Regula School of Computational Science delivers computational science training to students and companies across Ohio.

Table of Contents Setting the Stage Brief History of MPI MPI Program Structure What's in a Message? Point-to-Point Communication Non-Blocking Communication Derived Datatypes Collective Communication Virtual Topologies

Acknowledgments: the Edinburgh Parallel Computing Centre at the University of Edinburgh, for material on which this course is based, and Dr. David Ennis (formerly of the Ohio Supercomputer Center), who initially developed this course.

Setting the Stage Overview of parallel computing Parallel architectures Parallel programming models Hardware Software

Overview of Parallel Computing. In parallel computing, a program uses concurrency to either decrease the runtime needed to solve a problem or increase the size of problem that can be solved. Mainly this is a price/performance issue: vector machines (e.g. Cray X1e, NEC SX6) are very expensive to engineer and run. Massively parallel systems went out of vogue for a few years but are making a bit of a comeback at the very large scale (e.g. Cray XT-3/4, IBM BlueGene L/P). Parallel clusters built from commodity hardware and software make up the lion's share of HPC hardware today.

Writing a parallel application. Decompose the problem into tasks; ideally, each task can be worked on independently of the others. Map tasks onto threads of execution; threads have shared and local data (shared data is used by more than one thread, local data is private to each thread). Develop source code using some parallel programming environment. The choice may depend on, among many things, the hardware platform, the level of performance needed, and the nature of the problem.
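
To illustrate shared versus local data on a shared memory system, here is a small hypothetical C/OpenMP sketch (not part of the original slides; message-passing examples follow later in the course). The array and the final sum are shared among threads, while each thread works with its own private loop index and temporary:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    static double a[N];              /* shared: visible to all threads */

    int main(void)
    {
        double sum = 0.0;            /* shared: combined via a reduction */

        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        /* Each iteration is an independent task; OpenMP maps the
           iterations onto threads.  i and tmp are local (private)
           to each thread. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            double tmp = 2.0 * a[i]; /* local: private to this thread */
            sum += tmp;              /* partial sums combined by the reduction */
        }

        printf("sum = %f\n", sum);
        return 0;
    }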

Parallel architectures. Distributed memory: each processor has local memory and cannot directly access the memory of other processors. Shared memory: processors can directly reference memory attached to other processors; the shared memory may be physically distributed (non-uniform memory architecture, NUMA), in which case the cost of accessing remote memory may be high; several processors may sit on one memory bus (SMP). Hybrids are now very common, e.g. the IBM e1350 Opteron cluster: 965 compute nodes, each with 4-8 processor cores sharing 8-16 GB of memory, and a high-speed InfiniBand interconnect between nodes.

Parallel programming models. Distributed memory systems: for processors to share data, the programmer must explicitly arrange for communication (message passing). Message passing libraries include MPI (Message Passing Interface), PVM (Parallel Virtual Machine), and Shmem and MPT (both Cray only). Shared memory systems: thread-based programming, typically via compiler directives (OpenMP; various proprietary systems); explicit message passing can, of course, also be used.

Parallel computing: Hardware. In very good shape! Processors are cheap and powerful (EM64T, Itanium, Opteron, POWER, PowerPC, Cell), with theoretical performance approaching 10 GFLOP/sec or more. SMP nodes with 8-32 processor cores are common, and multicore chips are becoming ubiquitous. Clusters with hundreds of nodes are common, and affordable, high-performance interconnect technology is available. Systems with a few hundred processors and good inter-processor communication are not hard to build.

Parallel computing: Software. Not as mature as the hardware, and the main obstacle to making use of all this power: the perceived difficulties of writing parallel codes often outweigh the benefits. The emergence of standards, notably MPI and OpenMP, is helping enormously. Programming in a shared memory environment is generally easier, but message passing often gives better performance, much like assembly language vs. C/Fortran.

Brief History of MPI What is MPI? MPI forum Goals and scope of MPI MPI on OSC parallel platforms

What is MPI? The Message Passing Interface. What are the messages? Data. MPI allows data to be passed between processes in a distributed memory environment.
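
As a concrete illustration of the messages being data, here is a minimal sketch in C (not from the original course materials) in which rank 0 sends an array of four doubles to rank 1:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank;
        double data[4] = { 1.0, 2.0, 3.0, 4.0 };
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* The message is the data: four doubles, tag 0, sent to rank 1 */
            MPI_Send(data, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(data, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
            printf("Rank 1 received %f %f %f %f\n",
                   data[0], data[1], data[2], data[3]);
        }

        MPI_Finalize();
        return 0;
    }

Run with at least two processes (e.g. mpiexec -n 2 ./a.out); the details of MPI_Send and MPI_Recv are covered in the Point-to-Point Communication section.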

MPI Forum: sixty people from forty different organizations, with international representation. The MPI 1.1 standard was developed from 1992-1994 and the MPI 2.0 standard from 1995-1997; the forum resumed activity in 2008. Standards documents: http://www.mcs.anl.gov/mpi and http://www.mpi-forum.org/docs/docs.html

Goals and scope of MPI. MPI's prime goals are to provide source-code portability and to allow efficient implementation. It also offers a great deal of functionality and support for heterogeneous parallel architectures.

MPI on OSC platforms. Itanium 2 cluster: MPICH/ch_gm for Myrinet from Myricom (default); MPICH/ch_p4 for Gigabit Ethernet from Argonne Nat'l Lab. SGI Altix: MPICH/ch_p4 for shared memory from Argonne Nat'l Lab (default); SGI MPT. Pentium 4 cluster: MVAPICH for InfiniBand from OSU CSE (default); MPICH/ch_p4 for Gigabit Ethernet from Argonne Nat'l Lab. BALE Opteron cluster: MVAPICH for InfiniBand from OSU CSE (default). IBM e1350 Opteron cluster: MVAPICH for InfiniBand from OSU CSE (default); MVAPICH2 for InfiniBand from OSU CSE (in progress).

Using MPI on the Systems at OSC. Compile with the MPI wrapper scripts (mpicc, mpiCC, mpif77, mpif90). Examples: $ mpicc myprog.c $ mpif90 myprog.f To run in batch: mpiexec ./a.out
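
For reference, a minimal myprog.c that the commands above could build might look like the following sketch (not part of the original course materials); each process prints its rank and the total number of processes:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* start up MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();                         /* shut down MPI */
        return 0;
    }

Compiled with mpicc myprog.c and launched in a batch job with mpiexec ./a.out, one line is printed per MPI process.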