Implementation and Evaluation of Coarray Fortran Translator Based on OMNI XcalableMP. October 29, 2015 Hidetoshi Iwashita, RIKEN AICS

2 Background: XMP Contains Coarray Features
XcalableMP (XMP)
- A PGAS language, an extension of Fortran and C
- Has two programming models:
  Global-view programming model
  - Abstraction of distribution: nodes & template directives
  - Data distribution: distribute, align & shadow directives
  - Work distribution: task, loop & array directives
  - Communication/synchronization: reflect, gmove, reduction, bcast, barrier & wait_async directives
  - Intrinsic procedures
  Local-view programming model
  - Coarray features compatible with Coarray Fortran (CAF) 1.0
  - Interoperability with global-view: coarray, image & local_alias directives
  - Coarray/C extensions
LENS2015 WORKSHOP

3 Background: MPI, XMP and CAF Programming
An example of 2-dimensional stencil communication (width=2)

[Figure: array a(m, n) on image [k1, k2] exchanges halo regions with four neighbours: (1) image [k1-1, k2], (2) image [k1+1, k2], (3) image [k1, k2-1], (4) image [k1, k2+1].]

MPI ( 30 lines)
    call mpi_cart_create, mpi_cart_get and mpi_cart_shift
    call mpi_type_vector and mpi_type_commit
    call mpi_isend, mpi_irecv and mpi_waitall

XMP global-view (1 line)
    !$xmp reflect (A)

CAF (6 lines)
    if (k1>1)   A(-1:0,1:n)        = A(m-1:m,1:n)[k1-1,k2]      ! (1)
    if (k1<k1x) A(m+1:m+2,1:n)     = A(1:2,1:n)[k1+1,k2]        ! (2)
    sync all
    if (k2>1)   A(-1:m+2,-1:0)     = A(-1:m+2,n-1:n)[k1,k2-1]   ! (3)
    if (k2<k2x) A(-1:m+2,n+1:n+2)  = A(-1:m+2,1:2)[k1,k2+1]     ! (4)
    sync all

Ease of programming: MPI < CAF < XMP
Expressiveness: MPI CAF > XMP

4 Contents
- Coarray Fortran and Other Implementations
- Issues in Our Implementation
- Evaluation, Comparing with Other Implementations
- Summary and Conclusion

5 Coarray Fortran: Language Specification
- An extension of Fortran to describe parallel execution.
- Adopted as a part of Fortran 2008.

Basic usage of coarrays
- Declaration:
    real A(10,10)[*]                // coarray A(10,10) on each image
- Reference (for "get") and definition (for "put"):
    ... A(i,j)[k] ...               // reference to A(i,j) on image k
    A(i,j)[k] = ...                 // assignment to A(i,j) on image k
- Useful in the context of array expression/assignment:
    ... A[k] ...                    // reference to the whole array A on k
    ... A(i1:i2, j1:j2)[k] ...      // reference to a subarray of A on k
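The declaration, "get" and "put" forms listed on this slide can be put together in a small, self-checking program. This is only an illustrative sketch: the program name, the ring-neighbour pattern and the variable names are assumptions, not taken from the talk. It should build with any Fortran 2008 coarray compiler (for example gfortran with -fcoarray=single for a one-image test run).

```fortran
! Illustrative sketch only: names and the ring pattern are assumptions.
program coarray_basics
  implicit none
  real    :: a(10,10)[*]      ! one copy of a(10,10) on every image
  real    :: first
  integer :: me, n, right, left

  me = this_image()
  n  = num_images()
  right = merge(1, me + 1, me == n)   ! ring neighbours (wrap around)
  left  = merge(n, me - 1, me == 1)

  a = real(me)
  sync all                    ! every image has initialised its copy of a

  first = a(1,1)[right]       ! "get": reference to a coindexed object
  a(2,2)[right] = real(me)    ! "put": definition of a coindexed variable
  sync all                    ! all puts are complete and visible

  if (first /= real(right)) error stop "get returned an unexpected value"
  if (a(2,2) /= real(left)) error stop "put did not arrive"
  if (me == 1) print *, "checks passed on", n, "image(s)"
end program coarray_basics
```

The get and the put touch different elements, so no image reads a location that another image writes within the same segment between the two sync all statements.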

6 Coarray Fortran: Existing Implementations
COMPILERS
- Vendors
  - Cray Fortran
  - Intel Fortran
- Open source
  - OpenUH (U. of Houston): based on Open64 compiler
  - OpenCoarrays: called by GCC or later
TRANSLATORS (source-to-source compilers)
- Open source
  - Rice CAF (not compatible with F2008): based on ROSE source-to-source compiler
  - OmniXMP CAF (preliminary version): based on OmniXMP source-to-source compiler

7 Status of Our Implementation
- Fortran 2008 coarray features: see the table
- Fortran 2015 coarray features: partially supported (co_sum, co_max, co_min)
- Interoperability with XMP global-view: not supported yet

Section in [1]   Feature
3                declaration of static coarrays
                 initialization of coarrays
                 declaration of allocatable coarrays
4                reference to coindexed object
                 definition to coindexed variable
5                dummy argument of static coarray
                 dummy argument of allocatable coarray
9                ALLOCATE statement for coarray
                 DEALLOCATE statement for coarray
                 implicit deallocation
10               derived type coarray
                 allocatable component of derived type coarray
                 pointer component of derived type coarray
                 coarray component of structure
12               SYNC ALL statement
                 SYNC IMAGES statement
                 LOCK/UNLOCK statements
                 CRITICAL section
                 SYNC MEMORY statement
                 stat= and errmsg= specifiers
13               normal termination
                 error termination, ERROR STOP statement
15               image_index, lcobound, ucobound
                 num_images, this_image([coarray [,dim]])
                 atomic_define, atomic_ref
[Support column: per-feature marks not preserved in the transcription]

[1] John Reid. Coarrays in the next Fortran Standard. ISO/IEC JTC1/SC22/WG5 N1824, April 21, 2010

8 Our Implementation Based on OmniXMP Compiler
[Diagram: the Omni XMP compiler, extended to support coarrays. An XMP & Coarray program passes through the Coarray translator and the XMP translator and becomes a Fortran program. A Fortran compiler (GNU, Fujitsu, ...) produces an object, and a linker (GNU, Fujitsu, ...) combines it with the Coarray library, the XMP library, the Fortran library and the communication layers (MPI, GASNet, FJRDMA) into an executable.]

Implementation of coarray features
- A part of the OmniXMP compiler: for interoperability with XMP global-view
- Translates CAF programs into F90 programs: for portability, not depending on Fortran compilers
Advantage of the translator
- Any Fortran compiler can be chosen.
Issues we faced during implementation
1. Memory allocation of the coarrays via the communication library
2. Knowing the Fortran data layout at runtime

9 Issue 1: Declaration of Static Coarrays
[Diagram: the translator turns the main program into subroutine user_main plus initializer init_main, subroutine foo into foo plus init_foo, and subroutine bar into bar plus init_bar (1). The traverser generator produces a traverser that calls init_main, init_foo and init_bar (2). The built-in main routine calls the traverser and then user_main (3). The linker combines everything into a.out.]

Issue
- GASNet requires all coarrays to be allocated via the GASNet library.
  -> Allocation of static coarrays causes runtime overhead at the entrance of every procedure.
Solution: allocation just before executing the program
(1) The translator generates an initializer for each procedure.
(2) The traverser generator generates a traverser, which calls the initializers.
(3) The traverser is called before the user's main program.

10 Issue 2: Reference to Coindexed Objects
Issue
- The data layout of an array is decided by the backend Fortran compiler.

[Figure: examples of array variable data layout. Cases shown: whole array of an explicit-shape array; allocation; subarray (a part of a whole array) and assumed-shape array. Labels: stride, data object, array element. Layout classes: 2-dimensionally (fully) contiguous, 1-dimensionally contiguous, non-contiguous.]

- For efficient communication, the runtime library must know how long the contiguous runs of data are and at what period they repeat.

11 Issue 2 (cont.): Reference to Coindexed Objects
Solution: an algorithm in which the translator and the runtime library cooperate
(1) The translator generates a library call with arguments:
    Addresses:                           Sizes:
    P0 = address of A(ib, jb, ...)       L0 = size of an array element [bytes]
    P1 = address of A(ib+1, jb, ...)     L1 = size(A, 1)
    P2 = address of A(ib, jb+1, ...)     L2 = size(A, 2)
(2) The runtime library executes the following algorithm:
    P0 + L0 == P1 ?
      no:  L0 bytes contiguous
      yes: P0 + L0*L1 == P2 ?
        no:  L0*L1 bytes contiguous (1-dim. contiguous)
        yes: L0*L1*L2 bytes contiguous (2-dim. contiguous)
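The two-step address test on this slide can be sketched as a small pure function. This is a hypothetical illustration, not the OmniXMP runtime code: the module and function names are invented, and addresses are simulated as plain 64-bit integers for a column-major real(4) array A(10,20) so that the sketch runs without any communication library. For the strided case, P1 is taken to be the address of the next element of the section, which is an interpretation of the slide.

```fortran
! Hypothetical sketch of the contiguity test; names are illustrative.
module contig_sketch
  use iso_fortran_env, only: int64
  implicit none
contains
  ! Length in bytes of the contiguous run that starts at address p0.
  pure function contiguous_bytes(p0, p1, p2, l0, l1, l2) result(nbytes)
    integer(int64), intent(in) :: p0, p1, p2  ! addresses P0, P1, P2
    integer(int64), intent(in) :: l0          ! element size in bytes
    integer(int64), intent(in) :: l1, l2      ! extents in dim 1 and dim 2
    integer(int64) :: nbytes
    if (p0 + l0 /= p1) then
      nbytes = l0               ! non-contiguous: one element at a time
    else if (p0 + l0*l1 /= p2) then
      nbytes = l0*l1            ! 1-dimensionally contiguous
    else
      nbytes = l0*l1*l2         ! 2-dimensionally (fully) contiguous
    end if
  end function contiguous_bytes
end module contig_sketch

program contig_demo
  use contig_sketch
  implicit none
  ! Simulated layout of real(4) A(10,20); base address is arbitrary.
  integer(int64), parameter :: base = 1000, esz = 4, n1 = 10
  integer(int64) :: p0

  ! Whole array A(1:10, 1:20): fully contiguous, 4*10*20 bytes.
  p0 = addr(1, 1)
  if (contiguous_bytes(p0, addr(2,1), addr(1,2), esz, 10_int64, 20_int64) /= 800) error stop 1

  ! Subarray A(3:6, 2:5): each column segment of 4 elements is contiguous.
  p0 = addr(3, 2)
  if (contiguous_bytes(p0, addr(4,2), addr(3,3), esz, 4_int64, 4_int64) /= 16) error stop 2

  ! Strided section A(1:9:2, 1:20): the next section element is A(3,1).
  p0 = addr(1, 1)
  if (contiguous_bytes(p0, addr(3,1), addr(1,2), esz, 5_int64, 20_int64) /= 4) error stop 3

  print *, "all three layouts classified as expected"
contains
  ! Simulated byte address of A(i,j) in column-major order.
  pure function addr(i, j) result(a)
    integer, intent(in) :: i, j
    integer(int64) :: a
    a = base + esz * (int(i - 1, int64) + n1 * int(j - 1, int64))
  end function addr
end program contig_demo
```

Because the test only compares addresses, it needs no knowledge of how the backend compiler represents array descriptors, which is the point of splitting the work between translator and runtime.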

12 Evaluation
Application: Himeno benchmark
- The original MPI program: 610 lines (excl. comment lines)
- Ported CAF program: 415 lines (32% fewer)
  - Add declaration and allocation of communication buffers as coarrays.
  - Replace mpi_allreduce with co_sum.
  - Delete code around mpi_cart_create, mpi_cart_get, mpi_cart_shift.
  - Delete code around mpi_type_vector, mpi_type_commit.
  - Replace mpi_isend/irecv and mpi_waitall with coarray assignment statements.

Hardware: HA-PACS/TCA at Univ. of Tsukuba
CPU       Intel Xeon E5-2680 v2, 2.8 GHz, 10 cores, 2 CPU/node
Memory    128 GB
GPU       NVIDIA Tesla K20X x 4 (not used in this evaluation)
Node      64 node/system
Network   Mellanox InfiniBand QDR, 8 GB/s/node
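As an illustration of the "replace mpi_allreduce with co_sum" step, the sketch below performs a global sum with the co_sum collective. It is not taken from the ported Himeno code: the program and variable names are assumptions, and an integer is summed instead of the benchmark's floating-point value so that the result can be checked exactly. co_sum requires a compiler that supports the Fortran 2015 collectives.

```fortran
! Illustrative sketch only: not the ported Himeno source.
program co_sum_sketch
  implicit none
  integer :: total, n

  n = num_images()
  total = this_image()        ! stand-in for a per-image partial result

  ! Without result_image, every image receives the global sum,
  ! which corresponds to an all-reduce with a sum operation in MPI.
  call co_sum(total)

  if (total /= n*(n + 1)/2) error stop "unexpected global sum"
  if (this_image() == 1) print *, "global sum over", n, "image(s):", total
end program co_sum_sketch
```

The collective replaces the send buffer, receive buffer, count, datatype, operation and communicator arguments of the MPI call with a single in-place argument.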

13 Comparison in the Implementations: Coarray Fortran
[Figure: Himeno benchmark, M model, CAF version. Horizontal axis: number of nodes (images), from 1x1x1 upward (1x1x2, 1x2x2, 2x2x2, 2x2x4, ...). Vertical axis: performance in GFLOPS (higher is better). Series:
- OpenUH3.0.40/mvapich2.0+GASNet
- OpenCoarrays1.0.0/gfortran6.0.0/mpich3.1.4
- OmniXMP0.9.1/gfortran4.4.7/mvapich2.0+GASNet
- OmniXMP0.9.1/ifort15.0.2/IntelMPI5.0.0+GASNet
- ifort coarray=distributed/intelmpi5.0.0]

(1) OmniXMP and OpenUH are comparable in performance if they use comparable Fortran compilers and the same GASNet.
(2) The performance of OmniXMP depends heavily on the Fortran compiler.

14 Comparison in the Implementations: Coarray Fortran and MPI
[Figure: Himeno benchmark, M model, CAF version and MPI version. Horizontal axis: number of nodes (images), from 1x1x1 upward (1x1x2, 1x2x2, 2x2x2, 2x2x4, ...). Vertical axis: performance in GFLOPS (higher is better).
Coarray Fortran series:
- OpenUH3.0.40/mvapich2.0+GASNet
- OpenCoarrays1.0.0/gfortran6.0.0/mpich3.1.4
- OmniXMP0.9.1/gfortran4.4.7/mvapich2.0+GASNet
- OmniXMP0.9.1/ifort15.0.2/IntelMPI5.0.0+GASNet
- ifort coarray=distributed/intelmpi5.0.0
MPI series:
- gfortran6.0.0/mpich3.1.4
- gfortran4.4.7/mvapich2.0
- ifort15.0.2/intelmpi5.0.0]

(1) OmniXMP and OpenUH are comparable in performance if they use comparable Fortran compilers and the same GASNet.
(2) The performance of OmniXMP depends heavily on the Fortran compiler.
(3) CAF programs on OmniXMP currently perform 2% to 5% below MPI programs built with the same Fortran compiler.

15 Summary and Conclusion
OmniXMP CAF translator
- Implemented major features
- Settled some issues about:
  - Efficient memory allocation of coarrays
  - Knowing data layout at runtime
Evaluation with the Himeno benchmark on HA-PACS
- The CAF version of the program is 32% shorter than the original MPI version and delivers 2% to 5% lower performance.
- OmniXMP outperforms OpenUH, OpenCoarrays and Intel's implementation when Intel Fortran is chosen.
- OmniXMP with Intel Fortran delivers 2 to 3 times the performance of OmniXMP with gfortran.
The advantage of the translator
- Any backend (Fortran) compiler can be chosen to get the best performance on the environment.

16 Appendix

17 Goals of OmniXMP CAF
- Interoperability with the XcalableMP (XMP) global-view programming model
- Portability across different Fortran compilers and platforms
- Compatibility with coarray features in Fortran 2008
- And, of course, high performance

18 Declaration of Allocatable Coarray
Original CAF source:
    subroutine FOO
      real(4), allocatable :: a(:, :)[:]
      allocate ( a(lb1:ub1, lb2:ub2)[*] )
      ... a(i, j) ...
    end subroutine

Translator output:
    subroutine FOO
      real(4), pointer :: a(:, :)
      call xmpf_coarray_alloc2d_r4 ( desc_a, a, tag_foo, lb1, ub1, lb2, ub2 )
      ... a(i, j) ...
    end subroutine

Runtime library:
    subroutine xmpf_coarray_alloc2d_r4 &
        (descriptor, a, tag, lb1, ub1, lb2, ub2 )
      real(4), pointer :: a(:, :)
      real(4) :: data(lb1:ub1, lb2:ub2)
      pointer (pdata, data)
      pdata = xxx_malloc( 4*(ub1-lb1+1)*(ub2-lb2+1) )
      call pointer_assign(a, data)
    contains
      subroutine pointer_assign(a, data)
        real(4), pointer :: a(:, :)
        real(4), target :: data(lb1:, lb2:)   ! set lbounds
        a => data
      end subroutine
    end subroutine
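The pointer_assign helper on this slide relies on a standard Fortran property: a pointer assigned to a whole assumed-shape dummy array takes over that dummy's declared lower bounds. The sketch below isolates that trick in standard Fortran, without the Cray pointer and without any communication library. All names are hypothetical, and an ordinary target array stands in for the memory that the runtime would obtain through its allocator.

```fortran
! Illustrative sketch of the lower-bound trick; names are assumptions.
program lbound_sketch
  implicit none
  real, target  :: storage(4,3)   ! stands in for library-allocated memory
  real, pointer :: a(:,:)

  storage = 0.0
  storage(1,1) = 42.0
  call assign_with_bounds(a, storage, -1, 0)

  ! a now views storage as a(-1:2, 0:2).
  if (any(lbound(a) /= [-1, 0])) error stop "lower bounds not set"
  if (any(ubound(a) /= [ 2, 2])) error stop "upper bounds not set"
  if (a(-1,0) /= 42.0)           error stop "pointer does not alias storage"
  print *, "bounds:", lbound(a), ubound(a)
contains
  subroutine assign_with_bounds(p, tgt, lb1, lb2)
    real, pointer, intent(out) :: p(:,:)
    integer, intent(in)        :: lb1, lb2
    ! The assumed-shape dummy re-bases the actual argument at lb1, lb2.
    real, target, intent(inout) :: tgt(lb1:, lb2:)
    p => tgt                      ! p inherits the lower bounds of tgt
  end subroutine assign_with_bounds
end program lbound_sketch
```

Fortran 2003 can set the bounds directly in a pointer assignment, but routing it through a dummy argument keeps the generated code within Fortran 90, which fits the slide's stated goal of translating CAF programs into F90 programs.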

19 OmniXMP CAF Translator: Declaration of a Static Coarray
XMP & CAF program:
    subroutine foo
      real(4), save :: a(10,20)[*]
      ...
    end subroutine

Translator output (Fortran90 program):
    subroutine foo
      real(4) :: a(10,20)
      pointer (cp_a, a)
      common /xmpf_cp_foo/cp_a
      ...
    end subroutine

    subroutine xmpf_init_foo
      integer(8) :: cp_a
      common /xmpf_cp_foo/cp_a
      cp_a = xmpf_coarray_malloc(8*10*20)
    end subroutine

[Diagram: the Fortran90 compiler turns the translated program into objects, and the Fortran linker combines them with the runtime library into a.out.]

20 On (1), (4) and (8), the original MPI program was evaluated. On (2), (3), (5), (6), (7) and (9), cafwide was evaluated.
(1) mvapich2 2.0 and gcc 4.4.7; option -O2.
(2) omni/gnu: xmpf built w/ (1) and GASNet ibv-conduit (built w/ gnu); option -O2.
(3) UHCAF: OpenUH built w/ (1) and GASNet above; options mpi staticlibcaf layer=gasnetibv.
(4) Intel MPI and icc/ifort; option -O2.
(5) omni/intelo2: xmpf built w/ (4) and GASNet above; option -O2.
(6) omni/intelo1: same as (5); option -O1.
(7) ifort/intelmpi: same as (4); options -O2 -coarray=distributed -mt_mpi.
(8) mpich and hydra built w/ gcc 6.0.0; option -O2.
(9) OpenCoarray 1.0.0; called from (8) w/ options -O2 -fcoarray=lib -lcaf_mpi.


Addressing Heterogeneity in Manycore Applications Addressing Heterogeneity in Manycore Applications RTM Simulation Use Case stephane.bihan@caps-entreprise.com Oil&Gas HPC Workshop Rice University, Houston, March 2008 www.caps-entreprise.com Introduction

More information

Unified Runtime for PGAS and MPI over OFED

Unified Runtime for PGAS and MPI over OFED Unified Runtime for PGAS and MPI over OFED D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University, USA Outline Introduction

More information

Proceedings of the GCC Developers Summit

Proceedings of the GCC Developers Summit Reprinted from the Proceedings of the GCC Developers Summit June 17th 19th, 2008 Ottawa, Ontario Canada Conference Organizers Andrew J. Hutton, Steamballoon, Inc., Linux Symposium, Thin Lines Mountaineering

More information

Introduction to Fortran95 Programming Part II. By Deniz Savas, CiCS, Shef. Univ., 2018

Introduction to Fortran95 Programming Part II. By Deniz Savas, CiCS, Shef. Univ., 2018 Introduction to Fortran95 Programming Part II By Deniz Savas, CiCS, Shef. Univ., 2018 Summary of topics covered Logical Expressions, IF and CASE statements Data Declarations and Specifications ARRAYS and

More information

MPI Runtime Error Detection with MUST

MPI Runtime Error Detection with MUST MPI Runtime Error Detection with MUST At the 27th VI-HPS Tuning Workshop Joachim Protze IT Center RWTH Aachen University April 2018 How many issues can you spot in this tiny example? #include #include

More information

Exploiting Full Potential of GPU Clusters with InfiniBand using MVAPICH2-GDR

Exploiting Full Potential of GPU Clusters with InfiniBand using MVAPICH2-GDR Exploiting Full Potential of GPU Clusters with InfiniBand using MVAPICH2-GDR Presentation at Mellanox Theater () Dhabaleswar K. (DK) Panda - The Ohio State University panda@cse.ohio-state.edu Outline Communication

More information

Runtime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism

Runtime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism 1 Runtime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism Takeshi Nanri (Kyushu Univ. and JST CREST, Japan) 16 Aug, 2016 4th Annual MVAPICH Users Group Meeting 2 Background

More information

IPSJ SIG Technical Report Vol.2014-HPC-145 No /7/29 XcalableMP FFT 1 1 1,2 HPC PGAS XcalableMP XcalableMP G-FFT 90.6% 186.6TFLOPS XMP MPI

IPSJ SIG Technical Report Vol.2014-HPC-145 No /7/29 XcalableMP FFT 1 1 1,2 HPC PGAS XcalableMP XcalableMP G-FFT 90.6% 186.6TFLOPS XMP MPI XcalableMP FFT, HPC PGAS XcalableMP XcalableMP 89 G-FFT 9.6% 86.6TFLOPS XMP MPI. Fourier (FFT) MPI [] Partitioned Global Address Space (PGAS) FFT PGAS PGAS XcalableMP(XMP)[] C Fortran XMP HPC [] Global-FFT

More information

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): / Shterenlikht, A., Margetts, L., Cebamanos, L., & Henty, D. (2015). Fortran 2008 coarrays. ACM SIGPLAN Fortran Forum, 34(1), 10-30. https://doi.org/10.1145/2754942.2754944 Peer reviewed version Link to

More information

Analyzing the Performance of IWAVE on a Cluster using HPCToolkit

Analyzing the Performance of IWAVE on a Cluster using HPCToolkit Analyzing the Performance of IWAVE on a Cluster using HPCToolkit John Mellor-Crummey and Laksono Adhianto Department of Computer Science Rice University {johnmc,laksono}@rice.edu TRIP Meeting March 30,

More information

An evaluation of the Performance and Scalability of a Yellowstone Test-System in 5 Benchmarks

An evaluation of the Performance and Scalability of a Yellowstone Test-System in 5 Benchmarks An evaluation of the Performance and Scalability of a Yellowstone Test-System in 5 Benchmarks WRF Model NASA Parallel Benchmark Intel MPI Bench My own personal benchmark HPC Challenge Benchmark Abstract

More information

MILC Performance Benchmark and Profiling. April 2013

MILC Performance Benchmark and Profiling. April 2013 MILC Performance Benchmark and Profiling April 2013 Note The following research was performed under the HPC Advisory Council activities Special thanks for: HP, Mellanox For more information on the supporting

More information

CLAW FORTRAN Compiler source-to-source translation for performance portability

CLAW FORTRAN Compiler source-to-source translation for performance portability CLAW FORTRAN Compiler source-to-source translation for performance portability XcalableMP Workshop, Akihabara, Tokyo, Japan October 31, 2017 Valentin Clement valentin.clement@env.ethz.ch Image: NASA Summary

More information

Verification of Fortran Codes

Verification of Fortran Codes Verification of Fortran Codes Wadud Miah (wadud.miah@nag.co.uk) Numerical Algorithms Group http://www.nag.co.uk/content/fortran-modernization-workshop Fortran Compilers Compilers seem to be either high

More information

Understanding Communication and MPI on Cray XC40 C O M P U T E S T O R E A N A L Y Z E

Understanding Communication and MPI on Cray XC40 C O M P U T E S T O R E A N A L Y Z E Understanding Communication and MPI on Cray XC40 Features of the Cray MPI library Cray MPI uses MPICH3 distribution from Argonne Provides a good, robust and feature rich MPI Well tested code for high level

More information

XcalableMP Implementation and

XcalableMP Implementation and XcalableMP Implementation and Performance of NAS Parallel Benchmarks Mitsuhisa Sato Masahiro Nakao, Jinpil Lee and Taisuke Boku University of Tsukuba, Japan What s XcalableMP? XcalableMP (XMP for short)

More information

Performance Evaluation for Omni XcalableMP Compiler on Many-core Cluster System based on Knights Landing

Performance Evaluation for Omni XcalableMP Compiler on Many-core Cluster System based on Knights Landing ABSTRACT Masahiro Nakao RIKEN Advanced Institute for Computational Science Hyogo, Japan masahiro.nakao@riken.jp Taisuke Boku Center for Computational Sciences University of Tsukuba Ibaraki, Japan To reduce

More information

Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand

Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand Presentation at GTC 2014 by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda

More information

CESM (Community Earth System Model) Performance Benchmark and Profiling. August 2011

CESM (Community Earth System Model) Performance Benchmark and Profiling. August 2011 CESM (Community Earth System Model) Performance Benchmark and Profiling August 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell,

More information

Advanced Fortran Programming

Advanced Fortran Programming Sami Ilvonen Pekka Manninen Advanced Fortran Programming March 20-22, 2017 PRACE Advanced Training Centre CSC IT Center for Science Ltd, Finland type revector(rk) integer, kind :: rk real(kind=rk), allocatable

More information

ICON Performance Benchmark and Profiling. March 2012

ICON Performance Benchmark and Profiling. March 2012 ICON Performance Benchmark and Profiling March 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute resource - HPC

More information

Support for GPUs with GPUDirect RDMA in MVAPICH2 SC 13 NVIDIA Booth

Support for GPUs with GPUDirect RDMA in MVAPICH2 SC 13 NVIDIA Booth Support for GPUs with GPUDirect RDMA in MVAPICH2 SC 13 NVIDIA Booth by D.K. Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda Outline Overview of MVAPICH2-GPU

More information

Chapter 3. Fortran Statements

Chapter 3. Fortran Statements Chapter 3 Fortran Statements This chapter describes each of the Fortran statements supported by the PGI Fortran compilers Each description includes a brief summary of the statement, a syntax description,

More information

Himeno Performance Benchmark and Profiling. December 2010

Himeno Performance Benchmark and Profiling. December 2010 Himeno Performance Benchmark and Profiling December 2010 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource

More information

Index. classes, 47, 228 coarray examples, 163, 168 copystring, 122 csam, 125 csaxpy, 119 csaxpyval, 120 csyscall, 127 dfetrf,14 dfetrs, 14

Index. classes, 47, 228 coarray examples, 163, 168 copystring, 122 csam, 125 csaxpy, 119 csaxpyval, 120 csyscall, 127 dfetrf,14 dfetrs, 14 Index accessor-mutator routine example in a module, 7 PUBLIC or PRIVATE components, 6 ACM, ix editors of CALGO, ix Adams, Brainerd et al., see books, Fortran reference Airy s equation boundary value problem,

More information

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management Hideyuki Shamoto, Tatsuhiro Chiba, Mikio Takeuchi Tokyo Institute of Technology IBM Research Tokyo Programming for large

More information

OPEN MPI WITH RDMA SUPPORT AND CUDA. Rolf vandevaart, NVIDIA

OPEN MPI WITH RDMA SUPPORT AND CUDA. Rolf vandevaart, NVIDIA OPEN MPI WITH RDMA SUPPORT AND CUDA Rolf vandevaart, NVIDIA OVERVIEW What is CUDA-aware History of CUDA-aware support in Open MPI GPU Direct RDMA support Tuning parameters Application example Future work

More information

Comparing One-Sided Communication with MPI, UPC and SHMEM

Comparing One-Sided Communication with MPI, UPC and SHMEM Comparing One-Sided Communication with MPI, UPC and SHMEM EPCC University of Edinburgh Dr Chris Maynard Application Consultant, EPCC c.maynard@ed.ac.uk +44 131 650 5077 The Future ain t what it used to

More information

Code Parallelization

Code Parallelization Code Parallelization a guided walk-through m.cestari@cineca.it f.salvadore@cineca.it Summer School ed. 2015 Code Parallelization two stages to write a parallel code problem domain algorithm program domain

More information

Bits, Bytes, and Precision

Bits, Bytes, and Precision Bits, Bytes, and Precision Bit: Smallest amount of information in a computer. Binary: A bit holds either a 0 or 1. Series of bits make up a number. Byte: 8 bits. Single precision variable: 4 bytes (32

More information

OCTOPUS Performance Benchmark and Profiling. June 2015

OCTOPUS Performance Benchmark and Profiling. June 2015 OCTOPUS Performance Benchmark and Profiling June 2015 2 Note The following research was performed under the HPC Advisory Council activities Special thanks for: HP, Mellanox For more information on the

More information

Programming techniques for heterogeneous architectures. Pietro Bonfa SuperComputing Applications and Innovation Department

Programming techniques for heterogeneous architectures. Pietro Bonfa SuperComputing Applications and Innovation Department Programming techniques for heterogeneous architectures Pietro Bonfa p.bonfa@cineca.it SuperComputing Applications and Innovation Department Heterogeneous computing Gain performance or energy efficiency

More information

LiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster

LiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster LiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster H. W. Jin, S. Sur, L. Chai, and D. K. Panda Network-Based Computing Laboratory Department of Computer Science and Engineering

More information

AMBER 11 Performance Benchmark and Profiling. July 2011

AMBER 11 Performance Benchmark and Profiling. July 2011 AMBER 11 Performance Benchmark and Profiling July 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource -

More information

The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer

The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer William Killian Tom Scogland, Adam Kunen John Cavazos Millersville University of Pennsylvania

More information

Advanced Fortran Topics - Hands On Sessions

Advanced Fortran Topics - Hands On Sessions Advanced Fortran Topics - Hands On Sessions Getting started & general remarks Log in to the LRZ Linux Cluster The windows command line (Start --> Enter command) should be used to execute the following

More information

Pedraforca: a First ARM + GPU Cluster for HPC

Pedraforca: a First ARM + GPU Cluster for HPC www.bsc.es Pedraforca: a First ARM + GPU Cluster for HPC Nikola Puzovic, Alex Ramirez We ve hit the power wall ALL computers are limited by power consumption Energy-efficient approaches Multi-core Fujitsu

More information