GPU Developments in Atmospheric Sciences

Size: px

Start display at page:

Download "GPU Developments in Atmospheric Sciences"

Laureen Garrett
5 years ago
Views:

1 GPU Developments in Atmospheric Sciences Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA David Hall, Ph.D., Sr. Solutions Architect, NVIDIA, Boulder, CO, USA

2 NVIDIA HPC UPDATE TOPICS OF DISCUSSION GPU RESULTS FOR NUMERICAL MODELS: IFS, WRF GPU BENEFITS FOR AI MODELS 2

NVIDIA Company Update HQ in Santa Clara,

3 NVIDIA Company Update HQ in Santa Clara, CA, USA FY18 Rev ~$10B USD GTC 2019 Mar x in 2 years New NVIDIA HQ Endeavor NVIDIA Core Markets GPU (HPC) Computing Computer Graphics Artificial Intelligence 3

4 NVIDIA GPU Introduction GPU Introduction #1 ORNL Summit: 4,600 nodes; 27,600 GPUs CPU PCIe or NVLink Tesla P100 GPU 1x 3x 10x Unified Memory Co-processor to the CPU Threaded Parallel (SIMT) CPUs: x86 Power HPC Motivation: o Performance o Efficiency o Cost Savings 4

NVIDIA GPUs in the Top 10 of Current Top500 It s somewhat ironic that training for deep learning probably has more similarity to the HPL benchmark than many of the simulations that are run today - -

5 NVIDIA GPUs in the Top 10 of Current Top500 It s somewhat ironic that training for deep learning probably has more similarity to the HPL benchmark than many of the simulations that are run today - - Kathy Yelick, LBNL Top500 #1: ORNL Summit RMAX (HPL) = 122 PF IBM Power + GPU system Top #1 s in 3 regions with GPUs: USA ORNL Summit Europe CSCS Piz Daint Japan AIST ABCI Total 5 of Top7 with GPUs #1 Summit vs. #7 Titan (2012) ~7x more performance ~Same power consumption 5

6 Feature Progression in NVIDIA GPU Architectures V100 (2017) P100 (2016) K40 (2014) Double Precision TFlop/s x 5.4x x 1.4 Single Precision TFlop/s x 3.5x x 4.3 Half Precision TFlop/s 120 (DL) ~6x 21.2 n/a Memory Bandwidth (GB/s) x 3.1x x 288 Memory Size 16 or 32GB 2.00x 16GB 1.33x 12GB Interconnect NVLink: Up to 300 GB/s PCIe: 32 GB/s NVLink: 160 GB/s PCIe: 32 GB/s PCIe: 16 GB/s Power 250W - 300W 300W 235W V100 Availability 6

7 NVIDIA GPU Server Node System for Data Centers Tesla HGX-2 Building Block for Large-scale Systems 16 Tesla V100 GPUs 512 GB Memory 14.4 TB/s BW Multi-precision Computing 2 PFLOPS AI 250 TFLOPS FP TFLOPS FP64 7

GPU Collaborations for Atmospheric Models Model Organizations Funding

KISTI, IBM WACA II FV3/UFS NOAA SENA NUMA/NEPTUNE NPS, US Naval Res

DWD, MPI-M, CSCS, MCH PASC ENIAC KIM KIAPS KMA COSMO MCH, CSCS, DWD

8 GPU Collaborations for Atmospheric Models Model Organizations Funding Source E3SM-Atm, SAM DOE: ORNL, SNL E3SM, ECP MPAS-A NCAR, UWyo, KISTI, IBM WACA II FV3/UFS NOAA SENA NUMA/NEPTUNE NPS, US Naval Res Lab NPS IFS ECMWF ESCAPE GungHo/LFRic MetOffice, STFC PSyclone ICON DWD, MPI-M, CSCS, MCH PASC ENIAC KIM KIAPS KMA COSMO MCH, CSCS, DWD PASC GridTools WRFg NVIDIA, NCAR NVIDIA AceCAST-WRF TempoQuest Venture backed 8

9 NVIDIA HPC UPDATE TOPICS OF DISCUSSION GPU RESULTS FOR NUMERICAL MODELS: IFS, WRF GPU BENEFITS FOR AI MODELS 9

variable length GPU-based Spectral Transform Approach: o o o For all vertical Layers 1D FFT

10 ESCAPE Development of Weather & Climate Dwarfs NVIDIA-Developed Dwarf: Spectral Transform - Spherical Harmonics (SH) Batched 1D FFT variable length Batched Legendre Transform (GEMM) variable length GPU-based Spectral Transform Approach: o o o For all vertical Layers 1D FFT along all latitudes GEMM along all longitudes NVIDIA-based libraries for FFT and GEMM OpenACC directives 10

11 GPU Performance of SH Dwarf in IFS Single-GPU From ECMWF Scalability Programme Dr. Peter Bauer, UM User Workshop, MetOffice, Exeter, UK 15 June 2018 Single V100 GPU improved SH dwarf by 14x Single V100 GPU improved MPDATA dwarf by 57x 11

12 GPU Performance of SH Dwarf in IFS Multi-GPU From ECMWF Scalability Programme Dr. Peter Bauer, UM User Workshop, MetOffice, Exeter, UK 15 June 2018 Results of Spherical Harmonics Dwarf on NVIDIA DGX System Additional 2.4x gain from DGX-2 NVSwitch for 16 GPU systems 12

13 WRF GPU Status WRFg Sep 2018 WRFg Based on ARW release Limited physics options: Enough for full model on GPU Further development ongoing Community release: Freely available objects/exe Restricted source availability Availability during Q1 2019: Benchmark tests NOW Support and roadmap TBD WRFg Physics Options (10) Microphysics Radiation Option WSM5 4 WSM6 6 Thompson 8 RRTM (lw), Dudhia (sw) 1 RRTMG_fast (lw,sw) 24 Planetary boundary layer YSU 1 Surface layer Revised MM5 1 Land surface Cumulus 5-layer TDS 1 13

Average Timestep (s) Speedup Node to Node Elapsed Walltime (s) WRFg Full Model Performance Smaller is better ConUS 2018 2.5km, Node Scaling ConUS 2018 2.

14 Average Timestep (s) Speedup Node to Node Elapsed Walltime (s) WRFg Full Model Performance Smaller is better ConUS km, Node Scaling ConUS km, Single Node Nodes x CPU GPU Speedup 0 1xE5-2698V4 1xV100 2xV100 4xV100 8xV100 1x CPU: 10hr 42m, 1x GPU: 2hr 45m, 4x GPU: 41m, 8x GPU: < 21 m ConUS 2.5km 2018* Model, 2xE5-2698V4 CPU, 8xVolta V100 GPU *Modifications to standard ConUS 2.5: WSM6, RRTMG, radt=3, 5-layer TDS 14

15 NVIDIA HPC UPDATE TOPICS OF DISCUSSION GPU RESULTS FOR NUMERICAL MODELS: IFS, WRF GPU BENEFITS FOR AI MODELS David Hall, Ph.D., Sr. Solutions Architect, NVIDIA, Boulder, CO, USA 15

GPU Developments for the NEMO Model. Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA

GPU Developments for the NEMO Model Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA NVIDIA HPC AND ESM UPDATE TOPICS OF DISCUSSION GPU PROGRESS ON NEMO MODEL 2 NVIDIA GPU