PREPARING AN AMR LIBRARY FOR SUMMIT. Max Katz March 29, 2018

1 PREPARING AN AMR LIBRARY FOR SUMMIT Max Katz March 29, 2018

2 CORAL: SIERRA AND SUMMIT
NVIDIA Volta fueling supercomputers
IBM Power 9 + NVIDIA Volta V100
Sierra (LLNL): 4 GPUs/node, ~4300 nodes
Summit (ORNL): 6 GPUs/node, ~4600 nodes
Up to 300 GB/s combined NVLink bandwidth
P9 and V100 share coherent memory access

3 AMR-ENABLED SCIENCE
Astrophysics and cosmology
Combustion
Accelerators
Multiphase flow

4 TRYING TO MAKE STARS EXPLODE

5 AND SOMETIMES SUCCEEDING

6 AMREX
Block-structured AMR
Solution state defined on hierarchy of levels
Levels are unions of (logically) rectangular grids
Grids are dynamically adjusted
Data is in the form of:
  Zone center/edge/corner data
  Particles

7 TILES WITHIN BOXES EXPOSE PARALLELISM
Logical tiling used to improve serial, parallel performance
Data layout unchanged; different unit of work
Tiling is hidden in iterator over mesh patches
Leads to more efficient memory access
Data provided by Weiqun Zhang
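The idea of logical tiling (same data layout, smaller unit of work) can be sketched as follows. This is a minimal illustration and not AMReX's implementation: the `Box` struct, `make_tiles`, and `num_zones` are hypothetical names introduced here, and the tile size used in the note below is an arbitrary choice.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Hypothetical index-space box with inclusive lower and upper corners.
struct Box {
    std::array<int, 3> lo;
    std::array<int, 3> hi;
};

// Number of zones covered by a box.
long num_zones(const Box& b) {
    long n = 1;
    for (int d = 0; d < 3; ++d) {
        n *= static_cast<long>(b.hi[d] - b.lo[d] + 1);
    }
    return n;
}

// Split `box` into logical tiles of at most `tile_size` zones per dimension.
// Only index ranges are produced; no data is copied or rearranged.
std::vector<Box> make_tiles(const Box& box, const std::array<int, 3>& tile_size) {
    assert(tile_size[0] > 0 && tile_size[1] > 0 && tile_size[2] > 0);
    std::vector<Box> tiles;
    for (int k = box.lo[2]; k <= box.hi[2]; k += tile_size[2]) {
        for (int j = box.lo[1]; j <= box.hi[1]; j += tile_size[1]) {
            for (int i = box.lo[0]; i <= box.hi[0]; i += tile_size[0]) {
                Box t;
                t.lo = {i, j, k};
                // Clip the last tile in each dimension to the box boundary.
                t.hi = {std::min(i + tile_size[0] - 1, box.hi[0]),
                        std::min(j + tile_size[1] - 1, box.hi[1]),
                        std::min(k + tile_size[2] - 1, box.hi[2])};
                tiles.push_back(t);
            }
        }
    }
    return tiles;
}
```

For a 32x32x32 box and a tile size of {32, 8, 8}, this yields 16 tiles that together cover every zone exactly once. Leaving the first (fastest-varying) dimension untiled keeps the innermost loop long and contiguous.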

8 AMREX GRID ITERATOR APPROACH
C++ driver launches work within each tile

    for (grid_iterator it(state); it.isvalid(); ++it) {
        box = it.tilebox();
        local_state = state[it]; // contains fluid characteristics
        evolve(box.lo(), box.hi(), local_state);
    }

This iterator is configured to loop over grids or tiles, and could do particles instead

9 AMREX GRID ITERATOR APPROACH
Fortran subroutine does the numerical work

    subroutine evolve(lo, hi, state)
      do k = lo(3), hi(3)
        do j = lo(2), hi(2)
          do i = lo(1), hi(1)
            state(i,j,k,:) = ...
          enddo
        enddo
      enddo
    end subroutine evolve
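For readers more at home in C++, the same lo/hi-bounded kernel can be sketched without Fortran. This is an illustrative analogue under stated assumptions, not code from the talk: the `State` class, its column-major indexing, and the placeholder update (adding 1.0 to every component) are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-grid state stored in Fortran (column-major) order:
// i varies fastest, then j, then k, then the component index n.
struct State {
    std::array<int, 3> lo;
    std::array<int, 3> hi;
    std::array<int, 3> len;
    int ncomp;
    std::vector<double> data;

    State(const std::array<int, 3>& lo_, const std::array<int, 3>& hi_, int ncomp_)
        : lo(lo_), hi(hi_), ncomp(ncomp_) {
        assert(ncomp > 0);
        for (int d = 0; d < 3; ++d) {
            len[d] = hi[d] - lo[d] + 1;
            assert(len[d] > 0);
        }
        data.assign(static_cast<std::size_t>(len[0]) * len[1] * len[2] * ncomp, 0.0);
    }

    double& operator()(int i, int j, int k, int n) {
        const std::size_t idx =
            static_cast<std::size_t>(i - lo[0]) +
            static_cast<std::size_t>(len[0]) *
                (static_cast<std::size_t>(j - lo[1]) +
                 static_cast<std::size_t>(len[1]) *
                     (static_cast<std::size_t>(k - lo[2]) +
                      static_cast<std::size_t>(len[2]) * static_cast<std::size_t>(n)));
        return data[idx];
    }
};

// Kernel: touches only the zones in [lo, hi], which may be a tile of the grid.
void evolve(const std::array<int, 3>& lo, const std::array<int, 3>& hi, State& state) {
    for (int k = lo[2]; k <= hi[2]; ++k) {
        for (int j = lo[1]; j <= hi[1]; ++j) {
            for (int i = lo[0]; i <= hi[0]; ++i) {
                for (int n = 0; n < state.ncomp; ++n) {
                    state(i, j, k, n) += 1.0;  // placeholder for the real update
                }
            }
        }
    }
}
```

Because the kernel receives its bounds as arguments, the driver decides whether one call covers a whole grid or a single tile; the kernel itself does not change.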

10 AMREX ASTROPHYSICS CODES
Nyx: cosmological hydrodynamics
Castro: compressible hydrodynamics, self-gravity, nuclear reactions
Maestro: low-Mach number hydrodynamics, self-gravity, nuclear reactions

11 PERFORMANCE ANALYSIS TEST APPLICATION
StarLord: mini-app version of Castro hydrodynamics
Sedov blast wave with an astrophysical twist
13 fluid species (atomic isotopes)
Realistic astrophysical equation of state
Performance measured with standard Figure of Merit (FOM)
FOM = zones evolved per microsecond
Typically measured with O(10^6) zones per GPU
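A figure of merit of this kind can be computed as in the sketch below. It assumes that "zones evolved" means the zone count multiplied by the number of timesteps taken; the slide does not spell this out, and the function name and parameters are hypothetical.

```cpp
#include <cassert>

// Figure of merit in zones evolved per microsecond.
// Assumption: every zone is advanced once per timestep, so the total work
// is zones * steps. `wall_seconds` is the measured wall-clock time.
double figure_of_merit(long long zones, long long steps, double wall_seconds) {
    assert(zones >= 0 && steps >= 0);
    assert(wall_seconds > 0.0);
    const double zone_updates = static_cast<double>(zones) * static_cast<double>(steps);
    const double microseconds = wall_seconds * 1.0e6;
    return zone_updates / microseconds;
}
```

For example, 10^6 zones advanced for 10 steps in 5 seconds gives 10^7 zone updates in 5 x 10^6 microseconds, or 2 zones per microsecond.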

12 MINI-APP TEST RESULTS ON KNL

13 PARTICLE SCALING RESULTS ON KNL
WarpX electromagnetic PIC code is currently being ported to AMReX
Particles are sorted into local tiles for cache-friendliness
Reductions done on cell-by-cell basis for multithreading effectiveness
Data provided by Andrew Myers

14 EVALUATING A GPU STRATEGY
Use Unified Memory and C++ iterators to hide complexities of data motion
Replace malloc/new with cudaMallocManaged
CUDA Fortran used to offload compute onto the device with minimal code markup
We will also examine OpenMP and OpenACC for offloading
CUDA-aware MPI for transfers between processes

15 GRID LOOPING STRATEGY FOR THE GPU

    for (grid_iterator it(state); it.isvalid(); ++it) {
        box = it.box();
        local_state = state[it];
        #pragma gpu
        evolve(box.lo(), box.hi(), local_state);
    }

Grid loop is not tiled
++it: asynchronously perform operations for each grid (e.g. reductions)
state[it]: prefetch the data from this grid to the device
Function call: launch device kernel
~grid_iterator: perform device synchronize, optionally prefetch data back to host
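The ordering of these iterator hooks can be illustrated with a host-only mock. This sketch makes no CUDA calls and is not the AMReX iterator: `GridIterator`, its method names, and the logged strings are hypothetical stand-ins that only record when each hook would fire, with the synchronize step placed in the destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Mock iterator: each hook appends a message instead of touching a device.
class GridIterator {
public:
    GridIterator(std::size_t num_grids, std::vector<std::string>& log)
        : num_grids_(num_grids), current_(0), log_(log) {}

    // Runs once, when the loop ends: stands in for a device synchronize.
    ~GridIterator() { log_.push_back("synchronize device"); }

    GridIterator(const GridIterator&) = delete;
    GridIterator& operator=(const GridIterator&) = delete;

    bool is_valid() const { return current_ < num_grids_; }

    // Stands in for state[it]: prefetch this grid's data to the device.
    void prefetch() { log_.push_back("prefetch grid " + std::to_string(current_)); }

    // Stands in for the kernel launch triggered by the function call.
    void launch() { log_.push_back("launch kernel " + std::to_string(current_)); }

    // Stands in for the asynchronous per-grid work done on ++it.
    GridIterator& operator++() {
        log_.push_back("async per-grid work " + std::to_string(current_));
        ++current_;
        return *this;
    }

private:
    std::size_t num_grids_;
    std::size_t current_;
    std::vector<std::string>& log_;
};

// Drive the loop and return the recorded sequence of hook calls.
std::vector<std::string> run_grid_loop(std::size_t num_grids) {
    std::vector<std::string> log;
    for (GridIterator it(num_grids, log); it.is_valid(); ++it) {
        it.prefetch();
        it.launch();
    }  // `it` is destroyed here, so the synchronize is logged last
    return log;
}
```

With two grids the log shows prefetch, launch, and per-grid work for grid 0, the same three entries for grid 1, and a single synchronize at the end. The loop body never waits; the only blocking point is the destructor.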

16 MINI-APP TESTING ON SUMMITDEV
IBM Minsky nodes with 20 POWER 8 cores, 4 NVIDIA P100 GPUs at Oak Ridge
20 POWER 8 cores: 0.6 zones/μsec
1x P100: 2.4 zones/μsec
4x P100: 10.3 zones/μsec
This problem only feasible on Pascal+
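The ratios implied by these numbers can be worked out with a small helper. The function names below are hypothetical, and the only inputs are the three figures of merit quoted on the slide.

```cpp
#include <cassert>
#include <cmath>

// Speedup of one configuration over a baseline, as a ratio of FOMs.
double speedup(double fom, double baseline_fom) {
    assert(baseline_fom > 0.0);
    return fom / baseline_fom;
}

// Scaling efficiency of n devices relative to n times one device.
double scaling_efficiency(double fom_n, double fom_1, int n) {
    assert(fom_1 > 0.0 && n > 0);
    return fom_n / (static_cast<double>(n) * fom_1);
}
```

With the slide's values, `speedup(2.4, 0.6)` is 4, so one P100 is about four times faster than the 20 POWER 8 cores; `speedup(10.3, 0.6)` is about 17.2 for four P100s; and `scaling_efficiency(10.3, 2.4, 4)` is about 1.07, meaning the four-GPU run is slightly better than four times the single-GPU rate.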

17 MINI-APP TESTING ON SUMMIT

18 BENEFITS OF SUMMIT FOR AMR
Memory oversubscription is possible with Unified Memory
Performance suboptimal when oversubscribing within a level, even with NVLink
Streaming between levels recovers most of the performance
Example: HPGMG

19 AMREX COLLABORATORS
LBL: Ann Almgren, Vince Beckner, John Bell, Marc Day, Andrew Myers, Brian Friesen, Andy Nonaka, Weiqun Zhang
Stony Brook: Maria Barrios-Sazo, Don Willcox, Mike Zingale
MSU: Adam Jacobs
LANL: Chris Malone
University of Washington: Xinsheng Qin
