Portable, MPI-Interoperable Coarray Fortran


1 Portable, MPI-Interoperable Coarray Fortran
Chaoran Yang (1), Wesley Bland (2), John Mellor-Crummey (1), Pavan Balaji (2)
(1) Department of Computer Science, Rice University, Houston, TX
(2) Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL

2 Parallel Programming Models
Shared memory (Pthreads, OpenMP, ...): easy to program, but hard to scale.
Message passing (MPI, ...): scalable, but hard to program.
[Figure: threads sharing one memory versus processes with private memories connected by an interconnect]

3 Partitioned Global Address Space Languages
PGAS languages combine the strengths of both models: they are easy to program, like shared-memory models, and scalable, like message-passing models.
Example PGAS languages: Unified Parallel C (C), Titanium (Java), Coarray Fortran (Fortran).
Related efforts: X10 (IBM), Chapel (Cray), Fortress (Oracle).
[Figure: each process/thread owns a private memory plus a partition of a language-level shared memory abstraction]

4 Current Status
The dominance of the Message Passing Interface (MPI):
most applications on clusters are written using MPI;
many high-level libraries on clusters are built with MPI.
Why don't people adopt PGAS languages?
Most PGAS languages are built on a different runtime system, e.g. GASNet.
It is hard to adopt new programming models incrementally in existing applications.

5 Problem 1: Deadlock
PROGRAM MAY_DEADLOCK
  USE MPI
  CALL MPI_INIT(IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, MYRANK, IERR)
  IF (MYRANK .EQ. 0) A(:)[1] = A(:)
  CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)
  CALL MPI_FINALIZE(IERR)
END PROGRAM
(Slides 5-8 build up the accompanying diagram: P0 issues a coarray Put while both processes head into the MPI barrier.)
If the coarray runtime and MPI do not interoperate, the Put may require the target image to make progress inside the coarray runtime; with the target already blocked in MPI_BARRIER, neither operation completes and the program can deadlock.

9 Problem 2: Duplicated Resources
[Chart: mapped memory size (MB) of the communication runtimes per process versus number of processes, for GASNet-only, MPI-only, and duplicate runtimes]
Memory usage per process increases as the number of processes increases.
At larger scale, the excessive memory use of duplicate runtimes will hurt scalability.

10 The Problem
PGAS languages do not play well with MPI:
programs may deadlock when mixing MPI with other runtimes;
programs unnecessarily duplicate runtime resources.

11 The Solution
Build PGAS language runtimes on top of MPI.

12 Why Hasn't This Been Done Before?
Previously, MPI was considered insufficient for this goal.* MPI-3 adds extensive support for Remote Memory Access (RMA).
[Figure: the separate window model (MPI-2) keeps distinct public and private copies of a window, with remote writes targeting the public copy and local writes the private copy; the unified window model (MPI-3) exposes a single copy to both remote and local writes]
*: D. Bonachea and J. Duell. Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations. Int. J. High Perform. Comput. Netw., 1(1-3):91-99, Aug. 2004.
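A runtime that wants to rely on the MPI-3 unified window semantics can check which model a window actually uses. The following Fortran sketch (my own minimal example, not code from the talk) queries the MPI_WIN_MODEL attribute of a freshly allocated window:

  PROGRAM CHECK_WIN_MODEL
    USE MPI
    USE, INTRINSIC :: ISO_C_BINDING
    IMPLICIT NONE
    INTEGER :: win, ierr
    INTEGER(KIND=MPI_ADDRESS_KIND) :: winsize, model
    TYPE(C_PTR) :: baseptr
    LOGICAL :: flag

    CALL MPI_Init(ierr)
    ! Allocate a small window so there is something to query.
    winsize = 1024_MPI_ADDRESS_KIND
    CALL MPI_Win_allocate(winsize, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &
                          baseptr, win, ierr)

    ! MPI_WIN_MODEL reports whether this window follows the MPI-3
    ! unified model or the MPI-2-style separate model.
    CALL MPI_Win_get_attr(win, MPI_WIN_MODEL, model, flag, ierr)
    IF (flag .AND. model == MPI_WIN_UNIFIED) THEN
      PRINT *, 'unified window: public and private copies coincide'
    ELSE
      PRINT *, 'separate window: MPI_WIN_SYNC must reconcile the copies'
    END IF

    CALL MPI_Win_free(win, ierr)
    CALL MPI_Finalize(ierr)
  END PROGRAM CHECK_WIN_MODEL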

13 Questions to Investigate
If we build a PGAS runtime with MPI-3:
does it provide full interoperability?
does it degrade performance?

14 Approach
Build a PGAS language runtime (Coarray Fortran) on top of MPI-3 and evaluate its interoperability and performance.

15 Coarrays in Fortran 2008 (CAF)
The Fortran 2008 standard contains features for parallel programming using an SPMD (Single Program, Multiple Data) model.
What is a coarray? It extends array syntax with codimensions, e.g. REAL :: X(10,10)[*].
How is a coarray accessed? A reference with [] means data on the specified image, e.g. X(1,:) = X(1,:)[p].
Coarrays may be allocatable, structure components, or dummy or actual arguments.
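As a concrete illustration (my own minimal example, not code from the talk), the following Fortran 2008 program declares a coarray, fills it locally, and then reads a row from its right neighbor; SYNC ALL provides the required synchronization:

  PROGRAM COARRAY_DEMO
    IMPLICIT NONE
    REAL :: X(10,10)[*]                  ! one copy of X on every image
    INTEGER :: me, np, right

    me = THIS_IMAGE()
    np = NUM_IMAGES()
    right = MERGE(1, me + 1, me == np)   ! wrap around to image 1

    X = REAL(me)                         ! fill the local coarray
    SYNC ALL                             ! every image has now written X

    X(1,:) = X(1,:)[right]               ! one-sided read from the neighbor
    SYNC ALL

    IF (me == 1) PRINT *, 'image 1 now holds row 1 of image', right
  END PROGRAM COARRAY_DEMO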

16 Coarray Fortran 2.0 (CAF 2.0)
CAF 2.0 is a rich extension to Coarray Fortran developed at Rice University:
teams (like MPI communicators) and collectives;
asynchronous operations: asynchronous copy, asynchronous collectives, and function shipping;
synchronization constructs: events, cofence, and finish.
Several of these features (shown in blue on the slide) have been adopted by the Fortran standards committee.

17 Coarrays and MPI-3 RMA
Mapping coarray features to MPI-3 calls:
initialization: INTEGER :: A(100,100)[*] maps to MPI_WIN_ALLOCATE followed by MPI_WIN_LOCK_ALL;
coarray write and read: A(:)[1] = A(:) and A(:) = A(:)[2] map to MPI_RPUT and MPI_RGET;
synchronization: MPI_WIN_SYNC (local) and MPI_WIN_FLUSH or MPI_WIN_FLUSH_ALL (global).
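The sketch below shows, in plain MPI-3 Fortran, roughly what this mapping looks like. It is illustrative only (array size, displacement, and variable names are my own choices, and a 4-byte default REAL is assumed); it is not the CAF runtime's actual code:

  PROGRAM RMA_SKETCH
    USE MPI
    USE, INTRINSIC :: ISO_C_BINDING
    IMPLICIT NONE
    INTEGER, PARAMETER :: N = 100
    INTEGER :: win, rank, nproc, req, ierr
    INTEGER(KIND=MPI_ADDRESS_KIND) :: winsize, disp
    TYPE(C_PTR) :: baseptr
    REAL, POINTER :: a(:)

    CALL MPI_Init(ierr)
    CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    CALL MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

    ! "Coarray allocation": expose N reals per process through a window.
    winsize = N * 4_MPI_ADDRESS_KIND
    CALL MPI_Win_allocate(winsize, 4, MPI_INFO_NULL, MPI_COMM_WORLD, &
                          baseptr, win, ierr)
    CALL C_F_POINTER(baseptr, a, [N])
    a = REAL(rank)

    ! Passive-target epoch covering the whole execution, as on the slide.
    CALL MPI_Win_lock_all(0, win, ierr)
    CALL MPI_Barrier(MPI_COMM_WORLD, ierr)   ! everyone has initialized a

    ! "A(:)[1] = A(:)" issued by rank 0: a request-based put to rank 1.
    IF (rank == 0 .AND. nproc > 1) THEN
      disp = 0
      CALL MPI_Rput(a, N, MPI_REAL, 1, disp, N, MPI_REAL, win, req, ierr)
      CALL MPI_Wait(req, MPI_STATUS_IGNORE, ierr)  ! local completion
      CALL MPI_Win_flush(1, win, ierr)             ! remote completion
    END IF

    CALL MPI_Win_unlock_all(win, ierr)
    CALL MPI_Win_free(win, ierr)
    CALL MPI_Finalize(ierr)
  END PROGRAM RMA_SKETCH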

18 Active Messages
Active Messages (AM) are lightweight, low-level, asynchronous remote procedure calls, and many CAF 2.0 features are built on top of them.
MPI does not provide an implementation of Active Messages, so they are emulated with MPI_Send and MPI_Recv:
this hurts performance, because communication cannot be overlapped with AM handlers;
this hurts interoperability, because it can cause deadlock (see the sketch after this slide).
[Figure, built up over slides 18-20: an AM spawn/wait interleaved with an MPI_Reduce across processes ends in deadlock]
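As a rough illustration of that emulation (my own sketch: AM_TAG, the payload format, and run_am_handler are hypothetical, not the CAF 2.0 runtime's protocol), an AM polling routine built on MPI_Iprobe and MPI_Recv might look like this; a request is serviced only when the target happens to poll, which is why a target stuck in a blocking MPI collective can leave the sender waiting forever:

  SUBROUTINE am_poll(handled)
    USE MPI
    IMPLICIT NONE
    LOGICAL, INTENT(OUT) :: handled
    INTEGER, PARAMETER :: AM_TAG = 999        ! illustrative tag choice
    INTEGER :: status(MPI_STATUS_SIZE), count, ierr
    INTEGER, ALLOCATABLE :: payload(:)
    LOGICAL :: flag

    handled = .FALSE.
    ! Check for an incoming AM request without blocking.
    CALL MPI_Iprobe(MPI_ANY_SOURCE, AM_TAG, MPI_COMM_WORLD, flag, status, ierr)
    IF (.NOT. flag) RETURN

    ! Receive the request and run the handler inline. The sender makes no
    ! progress until some rank calls am_poll, so blocking MPI collectives
    ! elsewhere can turn into deadlock.
    CALL MPI_Get_count(status, MPI_INTEGER, count, ierr)
    ALLOCATE(payload(count))
    CALL MPI_Recv(payload, count, MPI_INTEGER, status(MPI_SOURCE), AM_TAG, &
                  MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
    CALL run_am_handler(payload)              ! hypothetical handler dispatch
    DEALLOCATE(payload)
    handled = .TRUE.
  END SUBROUTINE am_poll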

21 CAF 2.0 Events
Events are similar to counting semaphores and map onto Active Messages.
CALL event_notify(event, n): must ensure that all previous asynchronous operations have completed before the notification is delivered.
CALL event_wait(event, n): also serves as a compiler barrier (it prevents the compiler from reordering operations upward past the wait).
Implementation sketch from the slide:
  event_notify:
    for each window: MPI_Win_sync(win)
    for each dirty window: MPI_Win_flush_all(win)
    AM_Request(...)          // MPI_Isend
  event_wait:
    while (count < n):
      for each window: MPI_Win_sync(win)
      AM_Poll(...)           // MPI_Iprobe

22 CAF 2.0 Asynchronous Operations (consolidating the incremental build of slides 22-40)
copy_async(dest, src, dest_event, src_event, pred_event)
pred_event: the copy does not start until this predicate event has been notified.
src_event: notified once the source buffer may be reused.
dest_event: notified once the data has been delivered to the destination.

Mapping copy_async to Active Messages: when the copy is posted, send an AM request; src_event fires after the AM request has been sent; dest_event fires after the AM handler completes at the destination. MPI, however, has no AM support.

Mapping copy_async to MPI_RPUT (or MPI_RGET): when the copy is posted, issue MPI_Rput; src_event fires after MPI_Rput returns; dest_event should fire after MPI_WIN_FLUSH, but MPI has no asynchronous synchronization operation, so there is no non-blocking way to learn when the data has reached the destination (see the sketch below).
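To make the MPI_RPUT mapping concrete, here is a rough Fortran sketch of servicing one copy_async over MPI-3 RMA (my own illustration, with the runtime's event handling reduced to comments; this is not the CAF 2.0 implementation). The final MPI_Win_flush is blocking, which is exactly the gap the slides point out for dest_event:

  ! Assumes the caller already opened a passive-target epoch with
  ! MPI_Win_lock_all on `win`.
  SUBROUTINE copy_async_rput(src, n, target_rank, disp, win)
    USE MPI
    IMPLICIT NONE
    REAL,    INTENT(IN) :: src(*)
    INTEGER, INTENT(IN) :: n, target_rank, win
    INTEGER(KIND=MPI_ADDRESS_KIND), INTENT(IN) :: disp
    INTEGER :: req, ierr

    ! (pred_event would be waited on here before posting the transfer.)

    CALL MPI_Rput(src, n, MPI_REAL, target_rank, disp, n, MPI_REAL, &
                  win, req, ierr)

    ! Local completion: src may be reused -> notify src_event here.
    CALL MPI_Wait(req, MPI_STATUS_IGNORE, ierr)

    ! Remote completion: data is visible at the target -> notify dest_event.
    ! MPI_Win_flush blocks, and MPI-3 offers no asynchronous counterpart
    ! (the talk's proposed MPI_WIN_RFLUSH), so this step cannot be overlapped.
    CALL MPI_Win_flush(target_rank, win, ierr)
  END SUBROUTINE copy_async_rput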

41 Evaluation
Two machines: a cluster with InfiniBand (Fusion) and a Cray XC30 (Edison).
Three benchmarks and one mini-app: RandomAccess, FFT, HPL, and CGPOP.
Two implementations: CAF-MPI and CAF-GASNet.

System               Nodes   Cores/Node   Memory/Node   Interconnect     MPI Version
Cluster (Fusion)     320     2x4          32 GB         InfiniBand QDR   MVAPICH2-1.9
Cray XC30 (Edison)   5,200   2x12         64 GB         Cray Aries       Cray MPI

42 RandomAccess
RandomAccess measures worst-case system throughput.
[Charts: GUPS versus number of processes on Edison (Cray XC30) and Fusion (InfiniBand) for CAF-MPI, CAF-GASNet, and ideal scaling based on CAF-MPI]

43 Performance Analysis of RandomAccess
[Chart: time breakdown of RandomAccess into computation, coarray_write, event_wait, and event_notify for CAF-GASNet and CAF-MPI]
The time spent in communication is about the same in both implementations.
event_notify is slower in CAF-MPI because of MPI_WIN_FLUSH_ALL, which performs MPI_WIN_FLUSH on each target one by one.

44 FFT
FFT stresses all-to-all communication.
[Charts: GFlops versus number of processes on Edison (Cray XC30) and Fusion (InfiniBand) for CAF-MPI, CAF-GASNet, and ideal scaling]

45 Performance Analysis of FFT
[Chart: time breakdown of FFT into computation and all-to-all for CAF-GASNet and CAF-MPI]
The CAF 2.0 version of FFT uses only all-to-all for communication; CAF-MPI performs better because of MPI's fast all-to-all implementation.

46 High Performance Linpack (HPL)
HPL is computation-intensive.
[Charts: TFlops versus number of processes on Edison (Cray XC30) and Fusion (InfiniBand) for CAF-MPI, CAF-GASNet, and ideal scaling]

47 CGPOP: a CAF+MPI Hybrid Application
CGPOP is the conjugate gradient solver from the LANL Parallel Ocean Program (POP) 2.0 and is the performance bottleneck of the full POP 2.0 application.
It performs linear algebra computation interspersed with two communication steps:
GlobalSum: a 3-word vector sum (MPI_Reduce);
UpdateHalo: boundary exchange between neighboring subdomains (CAF).
Reference: Andrew I. Stone, John M. Dennis, and Michelle Mills Strout, "Evaluating Coarray Fortran with the CGPOP Miniapp".

48 CGPOP
[Charts: execution time (seconds) versus number of processes on Edison (Cray XC30) and Fusion (InfiniBand) for CAF-MPI and CAF-GASNet]

49 Conclusions
The benefits of building runtime systems on top of MPI:
they interoperate with numerous MPI-based libraries (Fortran 2008);
they deliver performance comparable to runtimes built with GASNet;
MPI's rich interface is time-saving.
What current MPI RMA lacks:
MPI_WIN_RFLUSH, to overlap synchronization with computation;
Active Messages, for full interoperability.
