A PSyclone perspective of the big picture. Rupert Ford, STFC Hartree Centre

1 A PSyclone perspective of the big picture
Rupert Ford, STFC Hartree Centre

2 Requirement I
Maintainable software
maintain codes in a way that subject matter experts can still modify the code
Leslie Hart from NOAA presenting at NCAR MultiCore 2016
Want to use a single source code for all architectures
Ulrich Schättler from DWD presenting at NCAR MultiCore 2015
Key aim: Single source science code

3 Requirement II
High performance
Efficient and scalable software on current and future HPC architectures
We want Performance on multiple architectures and maintainable code
Matthew Norman from ORNL presenting at NCAR MultiCore 2015
Key aim: Performance portability

4 What is the problem? I
Complex and evolving science
It is hard to restructure codes
Codes
Software has a long life (20 years +) so outlives hardware architectures
Software
Compilers complex and evolving
Standards evolving (OpenMP, OpenACC, MPI, ...)

5 What is the problem? II
Hardware
Multiple levels of parallelism: inter-node (MPI), intra-node (OpenMP/OpenACC/...), SIMD vectorisation, oversubscription
Heterogeneity is coming
Very different hardware solutions (many-core vs. GPU)
Hardware solutions change rapidly
Memory bandwidth is increasing but so is memory latency
Memory directives coming?

6 Where are we now?
Current best practice
Code for hierarchies of parallelism (loops, blocks, partition)
Use standard directives (OpenMP/OpenACC)
Optimise separately for many-core/GPU
Try to minimise code differences
In practice
Large legacy (MPI) code, difficult to modify
OpenMP for oversubscription only (or none)
Few usable GPU implementations
HPC experts optimise for the latest architecture

7 Continue as we are?

8 Is there an alternative?
Libraries
General: MPI, NetCDF, BLAS
Domain-specific: PIO, MCT, OASIS3, ESMF (infrastructure)
Threading abstraction (MPI + X)
OCCA (targets OpenMP, CUDA, OpenCL, OpenACC)
Kokkos (C++, targets OpenMP, CUDA)
Performance portable for the given parallelisation strategy

9 Optimised code
Complex parallel code + Complex parallel architectures + Complex compilers = A complex optimisation space
Do we really expect there to be a single minimum?

10 Simple compiler example

11 Optimised code
Code changes to get good performance were invasive, with likely impacts to the CPU, MIC performance
Mark Govett from NOAA presenting at NCAR MultiCore 2016 on FV3 for GPUs
Exact same code performing on all architectures is a pipe dream
Matthew Norman from ORNL presenting at NCAR MultiCore 2015
Single source optimised code is not attainable

12 Is there a solution?
Separation of concerns
Separate science code from parallelisation and optimisation
Single science source
Targeted parallelisation and optimisation for portable performance
Achievable using domain specific knowledge

13 Domain-specific knowledge I
Finite element/volume/difference-specific
Operations over a mesh
Typically same operation at each element/volume/point
Data parallel (typically independent operations)
Nearest neighbour communications for stencils
Global reduction(s) for convergence and/or conservation
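
The properties on this slide can be illustrated with a toy example. The sketch below is not taken from any real model: it assumes a 1D field, a 3-point averaging stencil and a plain sum as the reduction, and all function names are invented for illustration. Each interior point reads only its nearest neighbours, which is why a distributed run would need only halo (nearest-neighbour) communication at partition edges, followed by a global reduction.

```python
# Toy 1D illustration; all names are hypothetical.

def apply_stencil(field):
    """Apply the same 3-point average at every interior point.

    Each output value depends only on nearest neighbours of the input,
    so the points can be processed independently and in any order.
    """
    result = list(field)  # boundary values are copied unchanged
    for i in range(1, len(field) - 1):
        result[i] = (field[i - 1] + field[i] + field[i + 1]) / 3.0
    return result


def global_sum(field):
    """Global reduction, e.g. for a conservation or convergence check."""
    return sum(field)


field = [0.0, 3.0, 6.0, 3.0, 0.0]
smoothed = apply_stencil(field)
total = global_sum(smoothed)
```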

14 Domain-specific knowledge II
Weather/climate-specific
Fixed mesh topology
Structured or semi-structured (quasi-uniform) mesh
Vertical resolution << horizontal resolution
2D + 1D mesh (extrusion)
Structured or unstructured data in horizontal and structured in vertical
Dynamics mostly independent in vertical
Physics mostly independent in horizontal
Parallelise in horizontal

15 Existing DSLs I
Two main DSLs being used/developed by major centres for use in this domain
Stella/GridTools
Designed to support FE/FV/FD ESM dynamical core stencils
PSyclone/LFRic
Designed to allow support for FE/FV/FD ESM models
Also Firedrake
A more general purpose DSL for FEs

16 Existing DSLs II
PSyclone and Firedrake add any required communication (halo communication and global reductions)
Stella?
PSyclone will be investigated for use with Physics
PSyclone designed to support multiple APIs: lfric and gocean (2d nemo-esque)
Could PSyclone use Stella?
Could PSyclone/Stella use OCCA/Kokkos?

17 DSL approach
Logically global view at Algorithm level
No iteration over the mesh
No reference to parallelism
Operations on full fields
Unit of work at the Kernel level
Mesh point/element or column
Can be run in any order
Domain specific compiler takes the Algorithm and Kernel specifications and generates architecture-specific optimised parallel code
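
The split between the Algorithm level, the Kernel level and the generated code can be sketched in a few lines. This is only an illustration in Python, not PSyclone or Stella code: the function names, the list-of-columns field representation and the hand-written `invoke` stand-in for generated code are all assumptions made for this example.

```python
# Toy sketch of the three layers; all names are hypothetical.

def scale_column_kernel(column, factor):
    """Kernel level: the unit of work is a single vertical column.

    It knows nothing about the rest of the mesh or about parallelism.
    """
    return [factor * value for value in column]


def invoke(kernel, field, *args):
    """Stand-in for generated code: iterates over the mesh columns.

    Columns are independent, so a real code generator would be free to
    reorder this loop or run it in parallel. Here it is a serial loop.
    """
    return [kernel(column, *args) for column in field]


def algorithm(field):
    """Algorithm level: operates on the full field, with no iteration
    over the mesh and no reference to parallelism."""
    return invoke(scale_column_kernel, field, 2.0)


# A field with 3 horizontal columns, each holding 2 vertical levels.
field = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
doubled = algorithm(field)
```

Because the kernel can be run in any order, visiting the columns in reverse gives the same field, which is the freedom a domain-specific compiler would exploit when choosing a loop structure for a given architecture.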

18 DSL Languages I
Both PSyclone and Stella are DSELs
PSyclone is embedded in Fortran
Algorithm and Kernel code written in Fortran
Stella is embedded in C++
Algorithm and Kernel code written in C++

19 DSL Languages II
PSyclone rationale for Fortran
Expertise: Scientists have existing expertise in Fortran
Familiarity: Scientists (say they) want to continue with Fortran
Development: unsupported features can be written in Fortran
Adoption: Don't move too far as scientific adoption is key
Legacy: It should be easier to integrate existing Fortran code
Performance: Fortran is a reasonable language for HPC
keep code simple, portable, readable and similar to Fortran
Ulrich Schättler from DWD presenting at NCAR MultiCore 2015
Ninja DSL! Fortran code is just a specification
Algorithm/Kernel separation allows for different languages

20 Coded kernels
PSyclone kernels written in Fortran so any science can be written
PSyclone supports built-ins where no kernel code is required
Stella kernels written in C++. Limited to stencil descriptions?
Firedrake: no coded kernels. Closer to FE description. However, trap door access to manually written C kernels.
Coded Fortran kernels require a data model: IJK, IKJ, KIJ, KL
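
Why a coded kernel ties the code to a data model can be seen from how index order maps to memory. The sketch below is an assumption-laden illustration, not part of PSyclone: it takes a layout string such as "IJK" to mean that the first letter is the fastest-varying index (Fortran's column-major convention), and the helper `flat_offset` and the extents are invented for this example.

```python
def flat_offset(layout, index, extent):
    """Return the 0-based memory offset of a point for a given index order.

    `layout` is a string such as "IJK"; following Fortran's column-major
    convention, the first letter is the fastest-varying (contiguous) index.
    `index` and `extent` map each letter to a 0-based index and a size.
    """
    offset = 0
    stride = 1
    for axis in layout:
        offset += index[axis] * stride
        stride *= extent[axis]
    return offset


extent = {"I": 4, "J": 3, "K": 2}
point = {"I": 1, "J": 2, "K": 1}
below = {"I": 1, "J": 2, "K": 0}  # same column, one level down

# Distance in memory between vertically adjacent points.
k_stride_ijk = flat_offset("IJK", point, extent) - flat_offset("IJK", below, extent)
k_stride_kij = flat_offset("KIJ", point, extent) - flat_offset("KIJ", below, extent)
```

With these extents a step in the vertical moves 12 elements under "IJK" but only 1 under "KIJ", so a kernel that loops over a column is contiguous in one layout and strided in the other.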

21 DSL optimisations I
Delaying optimisations

22 DSL optimisations II
Large optimisation space
Can optimisations be automated?
Perhaps for reasonable performance
Interesting to search the optimisation space
PSyclone takes a user-specified optimisation approach to support the expert
HPC expert provides an optimisation recipe
Compile time optimisation (static analysis)
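
The idea of an optimisation recipe supplied by the HPC expert, kept apart from the science code, might look roughly like the following. This is a hypothetical sketch and does not use the PSyclone API: the `Loop` class, the `add_openmp` transformation, the `recipe` function and the kernel names are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Loop:
    """One generated loop over the mesh (hypothetical representation)."""
    kernel: str
    directives: list = field(default_factory=list)


def add_openmp(loop):
    """Transformation: return a copy of the loop marked for OpenMP threading."""
    return Loop(loop.kernel, loop.directives + ["omp parallel do"])


def recipe(schedule):
    """The expert's recipe: which transformation to apply to which loop.

    Here every loop gets the same treatment; a different recipe could
    choose differently per loop or per target architecture.
    """
    return [add_openmp(loop) for loop in schedule]


# The schedule is derived from the science code and is never edited by hand.
schedule = [Loop("advection"), Loop("diffusion")]
optimised = recipe(schedule)
```

The recipe runs before compilation and leaves the original schedule untouched, so the same science source can be paired with a different recipe for each architecture.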

23 Maintenance benefits
Often overlooked
Not only single source but
Simplified science code
Higher level problem specification
Lots of code generated
NEMO loop bounds!

24 Have we missed anything?
Functional parallelism
Currently primarily limited to coupling
Investigate finer grain (task hierarchy)
Smaller units of work and compose these
A finer grain coupling approach
Suited to architecture heterogeneity (Jetson TX1)
EuroExa EU project proposal, Peter Bauer, Graham Riley, Hartree Centre, ...
Potential for flexible precision (FPGAs)

25 Summary
To cross the chasm we should aim for single source science code, higher level specifications and performance portability
DSELs are a potential way forward
Stella/GridTools and PSyclone/LFRic provide demonstrators
Mostly complementary work
Perhaps PSyclone could use Stella?
Potential for collaboration with optimisations?
Convince other groups to try these out
Threading abstraction libraries are interesting. These are complementary to DSLs
DSLs can use them.
Functional parallelism and heterogeneous architectures are emerging issues
Data layout is still a potential issue

26 Thank you for listening

27 The Abyss
What is PSyclone? (Wikipedia)
"When I made Psyclone, I was at the height of my alcoholism and addiction, I was literally staring into the abyss"

28 Bring people with you

29 Hartree Centre Clients
