Portable, MPI-Interoperable Coarray Fortran
1 Portable, MPI-Interoperable Coarray Fortran. Chaoran Yang,1 Wesley Bland,2 John Mellor-Crummey,1 Pavan Balaji2. 1 Department of Computer Science, Rice University, Houston, TX; 2 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL.
2 Parallel Programming Models. Shared memory (Pthreads, OpenMP, ...): easy to program, hard to scale. Message passing (MPI, ...): scalable, hard to program. [Figure: threads sharing one memory vs. processes with private memories communicating over an interconnect]
3 Partitioned Global Address Space (PGAS) Languages. Combine the strengths of shared memory models (easy to program) and message passing models (scalable). Example PGAS languages: Unified Parallel C (C), Titanium (Java), Coarray Fortran (Fortran). Related efforts: X10 (IBM), Chapel (Cray), Fortress (Oracle). [Figure: a language-level shared-memory abstraction layered over per-process private memories]
4 Current status: the dominance of the Message Passing Interface (MPI). Most applications on clusters are written using MPI, and many high-level libraries on clusters are built with MPI. Why do people not adopt PGAS languages? Most PGAS languages are built on a different runtime system (e.g., GASNet), and it is hard to adopt a new programming model incrementally in existing applications.
5 Problem 1: deadlock.

    PROGRAM MAY_DEADLOCK
        USE MPI
        CALL MPI_INIT(IERR)
        CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR)
        IF (MY_RANK .EQ. 0) A(:)[1] = A(:)
        CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)
        CALL MPI_FINALIZE(IERR)
    END PROGRAM

P0 issues a coarray Put while P1 has already entered the MPI Barrier: if completing the Put requires the PGAS runtime on P1 to make progress, the two runtimes wait on each other and the program can deadlock.
9 Problem 2: duplicated resources. [Plot: mapped memory size (MB) of communication runtimes vs. number of processes, for GASNet-only, MPI-only, and duplicated runtimes] Memory usage per process increases as the number of processes increases; at larger scale, the excess memory used by duplicated runtimes hurts scalability.
10 The problem: PGAS languages do not play well with MPI. A program mixing MPI with another runtime may deadlock, and it unnecessarily duplicates runtime resources. The solution: build PGAS language runtimes on top of MPI.
12 Why has this not been done before? Previously, MPI was considered insufficient for this goal.* MPI-3 adds extensive support for Remote Memory Access (RMA): under the separate memory model (MPI-2), each window has distinct public and private copies, with remote writes targeting the public copy and local writes the private one; under the unified memory model (MPI-3), remote and local writes target a single unified window. *D. Bonachea and J. Duell. Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations. Int. J. High Perform. Comput. Netw., 1(1-3):91-99, Aug.
13 Questions to investigate. If we build PGAS runtimes with MPI-3, does it provide full interoperability? Does it degrade performance? Approach: build a PGAS language (Coarray Fortran) with MPI-3 and evaluate its interoperability and performance.
15 Coarrays in Fortran 2008 (CAF). The Fortran 2008 standard contains features for parallel programming using an SPMD (Single Program Multiple Data) model. What is a coarray? It extends array syntax with codimensions, e.g., REAL :: X(10,10)[*]. How is a coarray accessed? A reference with [] means data on the specified image, e.g., X(1,:) = X(1,:)[p]. Coarrays may be allocatable, structure components, or dummy or actual arguments.
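To make the syntax above concrete, here is a minimal self-contained sketch (not taken from the slides) of a coarray program in which image 1 reads a row from image 2:

```fortran
! Minimal coarray sketch: each image holds X(10,10); image 1 reads a row
! from image 2. SYNC ALL orders the definitions against the remote read.
PROGRAM COARRAY_DEMO
  IMPLICIT NONE
  REAL :: X(10,10)[*]            ! one copy of X per image (codimension [*])
  X = REAL(THIS_IMAGE())         ! each image fills its local copy
  SYNC ALL                       ! make all local definitions visible remotely
  IF (THIS_IMAGE() == 1 .AND. NUM_IMAGES() >= 2) THEN
    X(1,:) = X(1,:)[2]           ! remote read: row 1 of image 2's X
  END IF
END PROGRAM COARRAY_DEMO
```

With a CAF compiler (e.g., gfortran -fcoarray=lib) each image runs this same program; only the bracketed reference touches remote data.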
16 Coarray Fortran 2.0 (CAF 2.0): a rich extension to Coarray Fortran developed at Rice University. Teams (like MPI communicators) and collectives; asynchronous operations (asynchronous copy, asynchronous collectives, and function shipping); synchronization constructs (events, cofence, and finish). Some of these features (shown in blue on the original slide) have since been adopted by the Fortran standards committee.
17 Coarray and MPI-3 RMA: mapping coarray features to MPI-3 APIs.
    Initialization: INTEGER :: A(100,100)[*] -> MPI_WIN_ALLOCATE, then MPI_WIN_LOCK_ALL.
    Coarray read and write: A(:)[1] = A(:); A(:) = A(:)[2] -> MPI_RPUT; MPI_RGET.
    Synchronization: MPI_WIN_SYNC (local) and MPI_WIN_FLUSH(_ALL) (global).
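The initialization step of this mapping might be sketched as follows with the mpi_f08 bindings. This is a hedged illustration of the idea, not the paper's actual runtime code; the names win, base, and winsize are assumptions:

```fortran
! Sketch (assumed names) of allocating the window backing a coarray
! such as INTEGER :: A(100,100)[*], and opening a passive-target epoch
! on every process for the lifetime of the program.
USE mpi_f08
USE iso_c_binding, ONLY: C_PTR
TYPE(MPI_Win) :: win
TYPE(C_PTR)   :: base
INTEGER(KIND=MPI_ADDRESS_KIND) :: winsize
winsize = 100_MPI_ADDRESS_KIND * 100 * 4     ! bytes for A(100,100)
CALL MPI_Win_allocate(winsize, 4, MPI_INFO_NULL, MPI_COMM_WORLD, base, win)
CALL MPI_Win_lock_all(0, win)                ! all images may now be RMA targets
```

Keeping a lock_all epoch open means individual coarray reads and writes need no per-access epoch, only MPI_WIN_FLUSH/MPI_WIN_SYNC for completion and visibility.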
18 Active Messages (AM): lightweight, low-level, asynchronous remote procedure calls. Many CAF 2.0 features are built on top of Active Messages, but MPI does not provide an implementation of Active Messages. Emulating Active Messages with MPI_Send and MPI_Recv routines hurts performance (communication cannot be overlapped with AM handlers) and hurts interoperability (it can cause deadlock, e.g., when an AM spawn/wait interleaves with a collective such as MPI_Reduce).
21 CAF 2.0 Events: similar to counting semaphores, mapped to Active Messages.
CALL event_notify(event, n) must ensure all previous asynchronous operations have completed before the notification:
    for each window: MPI_Win_sync(win)
    for each dirty window: MPI_Win_flush_all(win)
    AM_Request(...)            // MPI_Isend
CALL event_wait(event, n) also serves as a compiler barrier (prevents the compiler from reordering operations upward):
    while (count < n):
        for each window: MPI_Win_sync(win)
        AM_Poll(...)           // MPI_Iprobe
22 CAF 2.0 Asynchronous Operations: copy_async(dest, src, dest_event, src_event, pred_event). The copy begins once pred_event has been notified; src_event is notified when the source buffer may be reused; dest_event is notified when the data has arrived at the destination.
Mapping copy_async to Active Messages: the AM request is sent when the operation is posted; src_event is notified after the AM request is sent, and dest_event after the AM handler completes. But MPI does not have AM support.
Mapping copy_async to MPI_RPUT (or MPI_RGET): the MPI_Rput is issued when the operation is posted; src_event is notified after MPI_Rput's local completion, but dest_event can only be notified after MPI_WIN_FLUSH, and MPI has no asynchronous synchronization operation.
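The MPI_RPUT mapping above can be sketched as follows (assumed variable names, not the authors' implementation); the sketch shows why dest_event has no non-blocking counterpart in MPI-3:

```fortran
! Sketch (assumed names) of mapping an asynchronous put onto MPI-3 RMA.
USE mpi_f08
TYPE(MPI_Request) :: req
CALL MPI_Rput(src, n, MPI_REAL, target_rank, disp, n, MPI_REAL, win, req)
CALL MPI_Wait(req, MPI_STATUS_IGNORE)  ! local completion: src reusable (src_event)
CALL MPI_Win_flush(target_rank, win)   ! remote completion (dest_event) - but this
                                       ! call blocks; there is no request-based flush
```

A request-based MPI_WIN_RFLUSH, as proposed in the conclusions, would let the runtime notify dest_event asynchronously instead of blocking here.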
41 Evaluation. Two machines: a cluster with InfiniBand (Fusion) and a Cray XC30 (Edison). Three benchmarks and one mini-app: RandomAccess, FFT, HPL, and CGPOP. Two implementations: CAF-MPI and CAF-GASNet.
    Fusion (cluster): 320 nodes, 2x4 cores/node, 32 GB memory/node, InfiniBand QDR, MVAPICH2-1.9.
    Edison (Cray XC30): 5,200 nodes, 2x12 cores/node, 64 GB memory/node, Cray Aries interconnect, Cray MPI.
42 RandomAccess: measures worst-case system throughput. [Plots: GUPS vs. number of processes on Edison (Cray XC30) and Fusion (InfiniBand), comparing CAF-MPI, CAF-GASNet, and ideal scaling based on CAF-MPI]
43 Performance analysis of RandomAccess. [Plot: time breakdown into event_notify, event_wait, coarray_write, and computation for CAF-GASNet vs. CAF-MPI] The time spent in communication is about the same in both; event_notify is slower in CAF-MPI because of MPI_WIN_FLUSH_ALL, which performs MPI_WIN_FLUSH on each process one by one.
44 FFT: measures all-to-all communication. [Plots: GFlops vs. number of processes on Edison (Cray XC30) and Fusion (InfiniBand), comparing CAF-MPI, CAF-GASNet, and ideal scaling]
45 Performance analysis of FFT. [Plot: time breakdown into computation and all-to-all for CAF-GASNet vs. CAF-MPI] The CAF 2.0 version of FFT uses only all-to-all for communication; CAF-MPI performs better because of MPI's fast all-to-all implementation.
46 High Performance Linpack (HPL): computation-intensive. [Plots: TFlops vs. number of processes on Edison (Cray XC30) and Fusion (InfiniBand), comparing CAF-MPI, CAF-GASNet, and ideal scaling]
47 CGPOP: a CAF+MPI hybrid application. The conjugate gradient solver from the LANL Parallel Ocean Program (POP) 2.0 is the performance bottleneck of the full POP 2.0 application. It performs linear algebra computation interspersed with two communication steps: GlobalSum, a 3-word vector sum (MPI_Reduce), and UpdateHalo, a boundary exchange between neighboring subdomains (CAF). [Andrew I. Stone, John M. Dennis, Michelle Mills Strout, "Evaluating Coarray Fortran with the CGPOP Miniapp"]
48 CGPOP results. [Plots: execution time in seconds vs. number of processes on Edison (Cray XC30) and Fusion (InfiniBand), comparing CAF-MPI and CAF-GASNet]
49 Conclusions. The benefits of building runtime systems on top of MPI: they interoperate with numerous MPI-based libraries (and with Fortran 2008), they deliver performance comparable to runtimes built with GASNet, and MPI's rich interface is time-saving. What current MPI RMA lacks: MPI_WIN_RFLUSH (to overlap synchronization with computation) and Active Messages (for full interoperability).
More informationNoise Injection Techniques to Expose Subtle and Unintended Message Races
Noise Injection Techniques to Expose Subtle and Unintended Message Races PPoPP2017 February 6th, 2017 Kento Sato, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, Martin Schulz and Christopher M. Chambreau
More informationMDHIM: A Parallel Key/Value Store Framework for HPC
MDHIM: A Parallel Key/Value Store Framework for HPC Hugh Greenberg 7/6/2015 LA-UR-15-25039 HPC Clusters Managed by a job scheduler (e.g., Slurm, Moab) Designed for running user jobs Difficult to run system
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming Section 5. Victor Gergel, Professor, D.Sc. Lobachevsky State University of Nizhni Novgorod (UNN) Contents (CAF) Approaches to parallel programs development Parallel
More informationUnderstanding MPI on Cray XC30
Understanding MPI on Cray XC30 MPICH3 and Cray MPT Cray MPI uses MPICH3 distribution from Argonne Provides a good, robust and feature rich MPI Cray provides enhancements on top of this: low level communication
More informationParallel Languages: Past, Present and Future
Parallel Languages: Past, Present and Future Katherine Yelick U.C. Berkeley and Lawrence Berkeley National Lab 1 Kathy Yelick Internal Outline Two components: control and data (communication/sharing) One
More informationCarlo Cavazzoni, HPC department, CINECA
Introduction to Shared memory architectures Carlo Cavazzoni, HPC department, CINECA Modern Parallel Architectures Two basic architectural scheme: Distributed Memory Shared Memory Now most computers have
More informationParallel Programming Models. Parallel Programming Models. Threads Model. Implementations 3/24/2014. Shared Memory Model (without threads)
Parallel Programming Models Parallel Programming Models Shared Memory (without threads) Threads Distributed Memory / Message Passing Data Parallel Hybrid Single Program Multiple Data (SPMD) Multiple Program
More informationMessage-Passing Programming with MPI. Message-Passing Concepts
Message-Passing Programming with MPI Message-Passing Concepts Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationUsing Lamport s Logical Clocks
Fast Classification of MPI Applications Using Lamport s Logical Clocks Zhou Tong, Scott Pakin, Michael Lang, Xin Yuan Florida State University Los Alamos National Laboratory 1 Motivation Conventional trace-based
More informationMore about MPI programming. More about MPI programming p. 1
More about MPI programming More about MPI programming p. 1 Some recaps (1) One way of categorizing parallel computers is by looking at the memory configuration: In shared-memory systems, the CPUs share
More informationHigh-Performance and Scalable Non-Blocking All-to-All with Collective Offload on InfiniBand Clusters: A study with Parallel 3DFFT
High-Performance and Scalable Non-Blocking All-to-All with Collective Offload on InfiniBand Clusters: A study with Parallel 3DFFT Krishna Kandalla (1), Hari Subramoni (1), Karen Tomko (2), Dmitry Pekurovsky
More informationJohn Mellor-Crummey Department of Computer Science Center for High Performance Software Research Rice University
Co-Array Fortran and High Performance Fortran John Mellor-Crummey Department of Computer Science Center for High Performance Software Research Rice University LACSI Symposium October 2006 The Problem Petascale
More informationCompilers and Compiler-based Tools for HPC
Compilers and Compiler-based Tools for HPC John Mellor-Crummey Department of Computer Science Rice University http://lacsi.rice.edu/review/2004/slides/compilers-tools.pdf High Performance Computing Algorithms
More informationSystems Software for Scalable Applications (or) Super Faster Stronger MPI (and friends) for Blue Waters Applications
Systems Software for Scalable Applications (or) Super Faster Stronger MPI (and friends) for Blue Waters Applications William Gropp University of Illinois, Urbana- Champaign Pavan Balaji, Rajeev Thakur
More informationChapel Introduction and
Lecture 24 Chapel Introduction and Overview of X10 and Fortress John Cavazos Dept of Computer & Information Sciences University of Delaware www.cis.udel.edu/~cavazos/cisc879 But before that Created a simple
More informationAccelerating HPL on Heterogeneous GPU Clusters
Accelerating HPL on Heterogeneous GPU Clusters Presentation at GTC 2014 by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda Outline
More informationParallel and High Performance Computing CSE 745
Parallel and High Performance Computing CSE 745 1 Outline Introduction to HPC computing Overview Parallel Computer Memory Architectures Parallel Programming Models Designing Parallel Programs Parallel
More informationOpenSHMEM as a Portable Communication Layer for PGAS Models: A Case Study with Coarray Fortran
OpenSHMEM as a Portable Communication Layer for PGAS Models: A Case Study with Coarray Fortran Naveen Namashivayam, Deepak Eachempati, Dounia Khaldi and Barbara Chapman Department of Computer Science University
More informationOPENSHMEM AS AN EFFECTIVE COMMUNICATION LAYER FOR PGAS MODELS
OPENSHMEM AS AN EFFECTIVE COMMUNICATION LAYER FOR PGAS MODELS A Thesis Presented to the Faculty of the Department of Computer Science University of Houston In Partial Fulfillment of the Requirements for
More informationAppendix D. Fortran quick reference
Appendix D Fortran quick reference D.1 Fortran syntax... 315 D.2 Coarrays... 318 D.3 Fortran intrisic functions... D.4 History... 322 323 D.5 Further information... 324 Fortran 1 is the oldest high-level
More informationComputer Architectures and Aspects of NWP models
Computer Architectures and Aspects of NWP models Deborah Salmond ECMWF Shinfield Park, Reading, UK Abstract: This paper describes the supercomputers used in operational NWP together with the computer aspects
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More informationCUDA GPGPU Workshop 2012
CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline
More informationIntroduction to OpenMP
Introduction to OpenMP Le Yan HPC Consultant User Services Goals Acquaint users with the concept of shared memory parallelism Acquaint users with the basics of programming with OpenMP Discuss briefly the
More informationAn evaluation of the Performance and Scalability of a Yellowstone Test-System in 5 Benchmarks
An evaluation of the Performance and Scalability of a Yellowstone Test-System in 5 Benchmarks WRF Model NASA Parallel Benchmark Intel MPI Bench My own personal benchmark HPC Challenge Benchmark Abstract
More informationEvaluation of PGAS Communication Paradigms With Geometric Multigrid
Lawrence Berkeley National Laboratory Evaluation of PGAS Communication Paradigms With Geometric Multigrid Hongzhang Shan, Amir Kamil, Samuel Williams, Yili Zheng, and Katherine Yelick Lawrence Berkeley
More informationWhatÕs New in the Message-Passing Toolkit
WhatÕs New in the Message-Passing Toolkit Karl Feind, Message-passing Toolkit Engineering Team, SGI ABSTRACT: SGI message-passing software has been enhanced in the past year to support larger Origin 2
More informationBarbara Chapman, Gabriele Jost, Ruud van der Pas
Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology
More information