Implementation and Evaluation of Coarray Fortran Translator Based on OMNI XcalableMP. October 29, 2015 Hidetoshi Iwashita, RIKEN AICS
2 Background: XMP Contains Coarray Features
XcalableMP (XMP) is a PGAS language, an extension of Fortran and C, with two programming models.
Global-view programming model:
- Abstraction of distribution: nodes & template directives
- Data distribution: distribute, align & shadow directives
- Work distribution: task, loop & array directives
- Communication/synchronization: reflect, gmove, reduction, bcast, barrier & wait_async directives
- Intrinsic procedures
Local-view programming model:
- Coarray features compatible with Coarray Fortran (CAF) 1.0
- Interoperability with the global view: coarray, image & local_alias directives
- Coarray/C extensions
LENS2015 WORKSHOP
3 Background: MPI, XMP and CAF Programming
An example of 2-dimensional stencil communication (halo width = 2) for the array a(m,n) held on image [k1,k2], exchanged with images [k1-1,k2], [k1+1,k2], [k1,k2-1] and [k1,k2+1].
MPI (~30 lines):
- call mpi_cart_create, mpi_cart_get and mpi_cart_shift
- call mpi_type_vector and mpi_type_commit
- call mpi_isend, mpi_irecv and mpi_waitall
XMP global view (1 line):
  !$xmp reflect (A)
CAF (6 lines):
  if (k1 > 1)   A(-1:0, 1:n)       = A(m-1:m, 1:n)[k1-1, k2]      ! (1)
  if (k1 < k1x) A(m+1:m+2, 1:n)    = A(1:2, 1:n)[k1+1, k2]        ! (2)
  sync all
  if (k2 > 1)   A(-1:m+2, -1:0)    = A(-1:m+2, n-1:n)[k1, k2-1]   ! (3)
  if (k2 < k2x) A(-1:m+2, n+1:n+2) = A(-1:m+2, 1:2)[k1, k2+1]     ! (4)
  sync all
Ease of programming: MPI < CAF < XMP
Expressiveness: MPI ≈ CAF > XMP
4 Contents
- Coarray Fortran and Other Implementations
- Issues in Our Implementation
- Evaluation, Comparing with Other Implementations
- Summary and Conclusion
5 Coarray Fortran: Language Specification
An extension of Fortran to describe parallel execution; adopted as a part of Fortran 2008.
Basic usage of coarrays:
Declaration:
  real A(10,10)[*]            ! coarray A(10,10) on each image
Reference (for "get") and definition (for "put"):
  ... A(i,j)[k] ...           ! reference to A(i,j) on image k
  A(i,j)[k] = ...             ! assignment to A(i,j) on image k
Useful in the context of array expressions/assignments:
  ... A[k] ...                ! reference to the whole array A on image k
  ... A(i1:i2, j1:j2)[k] ...  ! reference to a subarray of A on image k
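The declarations and references above can be combined into a minimal runnable sketch (the program name, the wrap-around exchange pattern and the gather on image 1 are illustrative, not from the slides; it needs a coarray-capable compiler, e.g. gfortran with -fcoarray=single for one image or with OpenCoarrays for many):

```fortran
program caf_basics
  implicit none
  real :: a(10,10)[*]              ! one 10x10 instance of A per image
  integer :: me, next

  me = this_image()
  a  = real(me)                    ! define the local instance
  sync all                         ! make definitions visible to all images

  ! "get": read from the neighboring image (wrapping at the last image)
  next = merge(1, me + 1, me == num_images())
  print *, 'image', me, 'reads a(1,1) =', a(1,1)[next], 'from image', next

  ! "put": every image defines one element on image 1
  if (me <= 10) a(1, me)[1] = real(me)   ! guard: A has only 10 columns
  sync all
  if (me == 1) print *, 'gathered on image 1:', a(1, 1:min(10, num_images()))
end program caf_basics
```

Note that the two `sync all` statements play the same role as in the stencil example on slide 3: they separate the local definitions from the remote references.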
6 Coarray Fortran: Existing Implementations
Compilers:
- Vendors: Cray Fortran; Intel Fortran
- Open source: OpenUH (U. of Houston), based on the Open64 compiler; OpenCoarrays, called by GCC or later
Translators (source-to-source compilers):
- Open source: Rice CAF (not compatible with F2008), based on the ROSE source-to-source compiler; Omni XMP CAF (preliminary version), based on the Omni XMP source-to-source compiler
7 Status of Our Implementation
Fortran 2008 coarray features: listed below by section of [1] (support was marked per feature on the original slide).
Fortran 2015 coarray features: partially supported (co_sum, co_max, co_min).
Interoperability with the XMP global view: not supported yet.
- Sec. 3: declaration of static coarrays; initialization of coarrays; declaration of allocatable coarrays
- Sec. 4: reference to a coindexed object; definition of a coindexed variable
- Sec. 5: dummy argument of static coarray; dummy argument of allocatable coarray
- Sec. 9: ALLOCATE statement for coarray; DEALLOCATE statement for coarray; implicit deallocation
- Sec. 10: derived-type coarray; allocatable component of derived-type coarray; pointer component of derived-type coarray; coarray component of a structure
- Sec. 12: SYNC ALL statement; SYNC IMAGES statement; LOCK/UNLOCK statements; CRITICAL section; SYNC MEMORY statement; stat= and errmsg= specifiers
- Sec. 13: normal termination; error termination, ERROR STOP statement
- Sec. 15: image_index, lcobound, ucobound; num_images, this_image([coarray[, dim]]); atomic_define, atomic_ref
[1] John Reid. Coarrays in the next Fortran Standard. ISO/IEC JTC1/SC22/WG5 N1824, April 21, 2010.
8 Our Implementation Based on the Omni XMP Compiler
The Omni XMP compiler was extended to support coarrays.
[Figure: compilation flow. An XMP & coarray program passes through the coarray translator and the XMP translator, producing a Fortran program; that program is compiled by a Fortran compiler (GNU, Fujitsu, ...) and linked with the coarray library, XMP library and Fortran library, on top of MPI, GASNet or FJ-RDMA, to produce the executable.]
Implementation of the coarray features:
- A part of the Omni XMP compiler, for interoperability with the XMP global view
- Translates CAF programs into Fortran 90 programs, for portability (not depending on Fortran compilers)
Advantage of the translator: any Fortran compiler can be chosen.
Issues we faced during implementation:
1. Memory allocation of the coarrays via the communication library
2. Knowing the Fortran data layout at runtime
9 Issue 1: Declaration of Static Coarrays
Issue: GASNet requires all coarrays to be allocated via the GASNet library, so allocating static coarrays at the entrance of every procedure would cause runtime overhead.
Solution: allocate all static coarrays just before the program starts executing.
(1) The translator generates an initializer corresponding to each procedure.
(2) A traverser generator generates a traverser that calls all the initializers.
(3) The built-in main routine calls the traverser before the user's main program.
[Figure: the translator turns program main and subroutines foo and bar into user_main, foo and bar plus initializers init_main, init_foo and init_bar; the generated traverser calls the initializers; the built-in main routine calls the traverser and then user_main; the linker combines everything into a.out.]
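Steps (1)-(3) can be sketched as hand-written Fortran. This is an illustrative stand-in, not Omni's generated code: the real translator uses Cray pointers and common blocks (see the appendix slides) and allocates through the runtime, whereas here a module variable and a dummy handle keep the sketch portable.

```fortran
! Illustrative sketch of the initializer/traverser pattern (names hypothetical).
module registry
  implicit none
  integer(8) :: cp_a = 0   ! stand-in for the handle of coarray A in foo
end module registry

subroutine init_foo()      ! (1) initializer generated for procedure foo
  use registry
  ! The real initializer calls the communication library's allocator
  ! (e.g. via GASNet); here we only record a dummy nonzero handle.
  cp_a = 1
end subroutine init_foo

subroutine traverser()     ! (2) generated traverser: calls every initializer
  call init_foo()
end subroutine traverser

program builtin_main       ! (3) built-in main: traverser first, then user code
  call traverser()
  call user_main()
contains
  subroutine user_main()
    use registry
    print *, 'coarrays already allocated, handle =', cp_a
  end subroutine user_main
end program builtin_main
```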
10 Issue 2: Reference to Coindexed Objects
Issue: the data layout of an array is decided by the back-end Fortran compiler.
Examples of array data layouts:
- the whole array of an explicit-shape array or an allocation: 2-dimensionally (fully) contiguous
- a subarray (a part of a whole array) or an assumed-shape array: 1-dimensionally contiguous, or non-contiguous when strided
For efficient communication, the runtime library should know how long and how periodic the contiguous pieces of data are.
11 Issue 2 (cont.): Reference to Coindexed Objects
Solution: an algorithm performed cooperatively by the translator and the runtime library.
(1) The translator generates a library call with the following arguments:
  Addresses:                          Sizes:
  P0 = address of A(ib, jb, ...)      L0 = size of an array element [bytes]
  P1 = address of A(ib+1, jb, ...)    L1 = size(A, 1)
  P2 = address of A(ib, jb+1, ...)    L2 = size(A, 2)
(2) The runtime library executes the following algorithm:
- If P0 + L0 /= P1, the data is contiguous in chunks of L0 bytes.
- Else if P0 + L0*L1 /= P2, the data is 1-dimensionally contiguous, in chunks of L0*L1 bytes.
- Else the data is 2-dimensionally (fully) contiguous, in chunks of L0*L1*L2 bytes.
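The test in step (2) can be sketched in plain Fortran, using C_LOC to obtain raw byte addresses (the program and the transfer()-based address extraction are illustrative, not Omni's runtime API; for a whole explicit-shape array it takes the fully contiguous branch):

```fortran
! Sketch of the runtime contiguity test from step (2) above.
program contiguity_demo
  use iso_c_binding, only: c_loc
  implicit none
  real(4), target :: a(8, 6)           ! explicit-shape: fully contiguous
  integer(8) :: p0, p1, p2, l0, l1, l2

  l0 = 4                               ! element size in bytes
  l1 = size(a, 1)
  l2 = size(a, 2)

  ! Addresses of A(ib,jb), A(ib+1,jb) and A(ib,jb+1), with ib = jb = 1
  p0 = transfer(c_loc(a(1, 1)), 0_8)
  p1 = transfer(c_loc(a(2, 1)), 0_8)
  p2 = transfer(c_loc(a(1, 2)), 0_8)

  if (p0 + l0 /= p1) then
    print *, 'contiguous in chunks of', l0, 'bytes'
  else if (p0 + l0*l1 /= p2) then
    print *, '1-dim contiguous, chunks of', l0*l1, 'bytes'
  else
    print *, '2-dim (fully) contiguous, chunks of', l0*l1*l2, 'bytes'
  end if
end program contiguity_demo
```

Pointing p0, p1 and p2 at a strided section instead (e.g. a(1:8:2, :)) would make the first or second test fail and select a smaller contiguous chunk size, which is exactly what the runtime needs for packing communication buffers.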
12 Evaluation
Application: Himeno benchmark
- The original MPI program: 610 lines (excluding comment lines)
- The ported CAF program: 415 lines (32% fewer)
Porting steps:
- Add declaration and allocation of communication buffers as coarrays.
- Replace mpi_allreduce with co_sum.
- Delete the code around mpi_cart_create, mpi_cart_get and mpi_cart_shift.
- Delete the code around mpi_type_vector and mpi_type_commit.
- Replace mpi_isend/mpi_irecv and mpi_waitall with coarray assignment statements.
Hardware: HA-PACS/TCA at the University of Tsukuba
- CPU: Intel Xeon E5-2680 v2, 2.8 GHz, 10 cores, 2 CPUs/node
- Memory: 128 GB
- GPU: NVIDIA Tesla K20X x 4 (not used in this evaluation)
- Nodes: 64 nodes/system
- Network: Mellanox InfiniBand QDR, 8 GB/s/node
13 Comparison of the Implementations: Coarray Fortran
[Chart: Himeno benchmark (M model), CAF version; GFLOPS vs. number of nodes (images) from 1x1x1 to 2x2x4, for:
- OpenUH 3.0.40 / mvapich 2.0 + GASNet
- OpenCoarrays 1.0.0 / gfortran 6.0.0 / mpich 3.1.4
- Omni XMP 0.9.1 / gfortran 4.4.7 / mvapich 2.0 + GASNet
- Omni XMP 0.9.1 / ifort 15.0.2 / Intel MPI 5.0.0 + GASNet
- ifort -coarray=distributed / Intel MPI 5.0.0]
(1) Omni XMP and OpenUH are comparable in performance when they use comparable Fortran compilers and the same GASNet.
(2) The performance of Omni XMP depends strongly on the Fortran compiler.
14 Comparison of the Implementations: Coarray Fortran and MPI
[Chart: the same CAF results as slide 13, together with MPI versions built with gfortran 6.0.0 / mpich 3.1.4, gfortran 4.4.7 / mvapich 2.0 and ifort 15.0.2 / Intel MPI 5.0.0.]
(1) Omni XMP and OpenUH are comparable in performance when they use comparable Fortran compilers and the same GASNet.
(2) The performance of Omni XMP depends strongly on the Fortran compiler.
(3) CAF programs under Omni XMP currently show 2% to 5% lower performance than MPI programs built with the same Fortran compiler.
15 Summary and Conclusion
Omni XMP CAF translator:
- Implemented the major features.
- Settled issues concerning efficient memory allocation of coarrays and knowing the data layout at runtime.
Evaluation with the Himeno benchmark on HA-PACS:
- The CAF version is 32% shorter than the original MPI version, with 2% to 5% lower performance.
- Omni XMP achieves higher performance than the OpenUH, OpenCoarrays and Intel implementations when Intel Fortran is chosen.
- Omni XMP with Intel Fortran is 2 to 3 times faster than Omni XMP with gfortran.
Advantage of the translator: any back-end Fortran compiler can be chosen to get the best performance on a given environment.
16 Appendix
17 Goals of Omni XMP CAF
- Interoperability with the XcalableMP (XMP) global-view programming model
- Portability across different Fortran compilers and platforms
- Compatibility with the coarray features of Fortran 2008
- And, of course, high performance
18 Declaration of an Allocatable Coarray
Original code:
  subroutine FOO
    real(4), allocatable :: a(:, :)[:]
    allocate ( a(lb1:ub1, lb2:ub2)[*] )
    ... a(i, j) ...
  end subroutine
Translated code:
  subroutine FOO
    real(4), pointer :: a(:, :)
    call xmpf_coarray_alloc2d_r4 ( desc_a, a, tag_foo, lb1, ub1, lb2, ub2 )
    ... a(i, j) ...
  end subroutine
Runtime library:
  subroutine xmpf_coarray_alloc2d_r4 &
       (descriptor, a, tag, lb1, ub1, lb2, ub2)
    real(4), pointer :: a(:, :)
    real(4) :: data(lb1:ub1, lb2:ub2)
    pointer (pdata, data)
    pdata = xxx_malloc( 4*(ub1-lb1+1)*(ub2-lb2+1) )
    call pointer_assign(a, data)
  contains
    subroutine pointer_assign(a, data)
      real(4), pointer :: a(:, :)
      real(4), target :: data(lb1:, lb2:)   ! set lower bounds
      a => data
    end subroutine
  end subroutine
19 Omni XMP CAF Translator: Declaration of a Static Coarray
XMP & CAF program:
  subroutine foo
    real(4), save :: a(10,20)[*]
    ...
  end subroutine
Translated Fortran 90 program:
  subroutine foo
    real(4) :: a(10,20)
    pointer (cp_a, a)
    common /xmpf_cp_foo/ cp_a
    ...
  end subroutine
Generated initializer:
  subroutine xmpf_init_foo
    integer(8) :: cp_a
    common /xmpf_cp_foo/ cp_a
    cp_a = xmpf_coarray_malloc(4*10*20)
  end subroutine
The translated program and the initializer are compiled by the Fortran 90 compiler and linked with the runtime library into a.out.
20 Measurement Configurations
The original MPI program was evaluated with (1), (4) and (8); cafwide was evaluated with (2), (3), (5), (6), (7) and (9).
(1) mvapich 2.0 and gcc 4.4.7; option -O2.
(2) omni/gnu: xmpf built with (1) and the GASNet ibv conduit (built with gnu); option -O2.
(3) UHCAF: OpenUH built with (1) and the GASNet above; options -mpi -static-libcaf -layer=gasnet-ibv.
(4) Intel MPI and icc/ifort; option -O2.
(5) omni/intel-O2: xmpf built with (4) and the GASNet above; option -O2.
(6) omni/intel-O1: same as (5); option -O1.
(7) ifort/intelmpi: same as (4); options -O2 -coarray=distributed -mt_mpi.
(8) mpich and hydra built with gcc 6.0.0; option -O2.
(9) OpenCoarrays 1.0.0, called from (8) with options -O2 -fcoarray=lib -lcaf_mpi.
ABSTRACT Masahiro Nakao RIKEN Advanced Institute for Computational Science Hyogo, Japan masahiro.nakao@riken.jp Taisuke Boku Center for Computational Sciences University of Tsukuba Ibaraki, Japan To reduce
More informationLatest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand
Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand Presentation at GTC 2014 by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda
More informationCESM (Community Earth System Model) Performance Benchmark and Profiling. August 2011
CESM (Community Earth System Model) Performance Benchmark and Profiling August 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell,
More informationAdvanced Fortran Programming
Sami Ilvonen Pekka Manninen Advanced Fortran Programming March 20-22, 2017 PRACE Advanced Training Centre CSC IT Center for Science Ltd, Finland type revector(rk) integer, kind :: rk real(kind=rk), allocatable
More informationICON Performance Benchmark and Profiling. March 2012
ICON Performance Benchmark and Profiling March 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute resource - HPC
More informationSupport for GPUs with GPUDirect RDMA in MVAPICH2 SC 13 NVIDIA Booth
Support for GPUs with GPUDirect RDMA in MVAPICH2 SC 13 NVIDIA Booth by D.K. Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda Outline Overview of MVAPICH2-GPU
More informationChapter 3. Fortran Statements
Chapter 3 Fortran Statements This chapter describes each of the Fortran statements supported by the PGI Fortran compilers Each description includes a brief summary of the statement, a syntax description,
More informationHimeno Performance Benchmark and Profiling. December 2010
Himeno Performance Benchmark and Profiling December 2010 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource
More informationIndex. classes, 47, 228 coarray examples, 163, 168 copystring, 122 csam, 125 csaxpy, 119 csaxpyval, 120 csyscall, 127 dfetrf,14 dfetrs, 14
Index accessor-mutator routine example in a module, 7 PUBLIC or PRIVATE components, 6 ACM, ix editors of CALGO, ix Adams, Brainerd et al., see books, Fortran reference Airy s equation boundary value problem,
More informationX10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management
X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management Hideyuki Shamoto, Tatsuhiro Chiba, Mikio Takeuchi Tokyo Institute of Technology IBM Research Tokyo Programming for large
More informationOPEN MPI WITH RDMA SUPPORT AND CUDA. Rolf vandevaart, NVIDIA
OPEN MPI WITH RDMA SUPPORT AND CUDA Rolf vandevaart, NVIDIA OVERVIEW What is CUDA-aware History of CUDA-aware support in Open MPI GPU Direct RDMA support Tuning parameters Application example Future work
More informationComparing One-Sided Communication with MPI, UPC and SHMEM
Comparing One-Sided Communication with MPI, UPC and SHMEM EPCC University of Edinburgh Dr Chris Maynard Application Consultant, EPCC c.maynard@ed.ac.uk +44 131 650 5077 The Future ain t what it used to
More informationCode Parallelization
Code Parallelization a guided walk-through m.cestari@cineca.it f.salvadore@cineca.it Summer School ed. 2015 Code Parallelization two stages to write a parallel code problem domain algorithm program domain
More informationBits, Bytes, and Precision
Bits, Bytes, and Precision Bit: Smallest amount of information in a computer. Binary: A bit holds either a 0 or 1. Series of bits make up a number. Byte: 8 bits. Single precision variable: 4 bytes (32
More informationOCTOPUS Performance Benchmark and Profiling. June 2015
OCTOPUS Performance Benchmark and Profiling June 2015 2 Note The following research was performed under the HPC Advisory Council activities Special thanks for: HP, Mellanox For more information on the
More informationProgramming techniques for heterogeneous architectures. Pietro Bonfa SuperComputing Applications and Innovation Department
Programming techniques for heterogeneous architectures Pietro Bonfa p.bonfa@cineca.it SuperComputing Applications and Innovation Department Heterogeneous computing Gain performance or energy efficiency
More informationLiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster
LiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster H. W. Jin, S. Sur, L. Chai, and D. K. Panda Network-Based Computing Laboratory Department of Computer Science and Engineering
More informationAMBER 11 Performance Benchmark and Profiling. July 2011
AMBER 11 Performance Benchmark and Profiling July 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource -
More informationThe Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer
The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer William Killian Tom Scogland, Adam Kunen John Cavazos Millersville University of Pennsylvania
More informationAdvanced Fortran Topics - Hands On Sessions
Advanced Fortran Topics - Hands On Sessions Getting started & general remarks Log in to the LRZ Linux Cluster The windows command line (Start --> Enter command) should be used to execute the following
More informationPedraforca: a First ARM + GPU Cluster for HPC
www.bsc.es Pedraforca: a First ARM + GPU Cluster for HPC Nikola Puzovic, Alex Ramirez We ve hit the power wall ALL computers are limited by power consumption Energy-efficient approaches Multi-core Fujitsu
More information