The challenges of new, efficient computer architectures, and how they can be met with a scalable software development strategy
Thomas C. Schulthess
ENES HPC Workshop, Hamburg, March 17, 2014
Piz Daint, presently one of Europe's most powerful petascale supercomputers: a Cray XC30 with 5272 hybrid nodes (Intel Sandy Bridge CPU + NVIDIA K20x GPU).
source: David Leutwyler
Comparison of the two simulations (source: David Leutwyler):
- Domain larger by ~10x: small 500 x 500 x 60 grid points vs. large 1536 x 1536 x 60 (1536^2 / 500^2 ≈ 9.4)
- Same integration speed: 1:80
- About 1.5x more nodes: small on 95 nodes with 32 (AMD) cores each; large on 144 hybrid (GPU+CPU) nodes
- Different implementations: small runs the regular COSMO code (MPI); large runs the new MPI+OpenMP/CUDA code
[Bar chart] Speedup of the full COSMO-2 production problem (apples to apples with the 33 h forecast of MeteoSwiss), comparing the current production code against the new HP2C-funded code on four systems: Cray XE6 (Nov. 2011), Cray XK7 (Nov. 2012), Cray XC30 (Nov. 2012), and Cray XC30 hybrid (GPU) (Nov. 2013).
[Bar chart] Energy to solution (kWh per ensemble member), comparing the current production code against the new HP2C-funded code on the same four systems: Cray XE6 (Nov. 2011), Cray XK7 (Nov. 2012), Cray XC30 (Nov. 2012), and Cray XC30 hybrid (GPU) (Nov. 2013).
The bottom line: improving the implementation and introducing a new architecture (GPUs) results in a 2.5x speedup and a 4x improvement in energy to solution.
Refactoring COSMO, based on the runtime of the 2 km production model of MeteoSwiss. [Charts comparing % code lines (F90) against % runtime, for the original code (with OpenACC for GPU) and the rewrite in C++ (with CUDA backend for GPU).]
From physical model to machine, the traditional path:

Physical model (velocities, pressure, temperature, water, turbulence)
→ Mathematical description
→ Discretization / algorithm
  [domain science, incl. applied mathematics]
→ Code / implementation, e.g. a Laplacian stencil:
  lap(i,j,k) = -4.0 * data(i,j,k) + data(i+1,j,k) + data(i-1,j,k) + data(i,j+1,k) + data(i,j-1,k);
→ Code compilation
→ A given supercomputer
  [computer engineering & computer science]

Port serial code to supercomputers > vectorize > parallelize > petascaling > exascaling > ...
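The stencil on this slide can be sketched as runnable code. A minimal Python/NumPy version of the 5-point Laplacian, illustrative only — the production implementations are Fortran and C++/CUDA:

```python
import numpy as np

def laplacian(data):
    """5-point Laplacian stencil in the horizontal (i, j) plane,
    applied independently at every vertical level k."""
    lap = np.zeros_like(data)
    lap[1:-1, 1:-1, :] = (-4.0 * data[1:-1, 1:-1, :]
                          + data[2:, 1:-1, :] + data[:-2, 1:-1, :]
                          + data[1:-1, 2:, :] + data[1:-1, :-2, :])
    return lap

# Sanity check: a field linear in i has zero Laplacian in the interior.
i = np.arange(6.0).reshape(6, 1, 1)
field = np.broadcast_to(i, (6, 6, 3)).copy()
print(laplacian(field)[2, 2, 0])  # → 0.0
```

The slicing form is what a vectorizing compiler (or GPU backend) effectively generates from the pointwise `lap(i,j,k) = ...` expression.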
From physical model to machine, the proposed path:

Physical model (velocities, pressure, temperature, water, turbulence)
→ Mathematical description
→ Discretization / algorithm
  [domain science, incl. applied mathematics]
→ Code / implementation (the same Laplacian stencil)
→ Code compilation
→ Architectural options / design
  supported by: optimal algorithm, auto tuning, domain-specific libraries & tools
  [computer engineering & computer science]
COSMO: current and new, HP2C-developed code. The software stack, top to bottom:

- Prototyping code / interactive data analysis
- Application code: main (current) with dynamics and physics; main (new) with physics (OpenMP / OpenACC) and dynamics built on a stencil library with x86 and GPU backends, plus boundary conditions
- Domain-Specific Libraries & Tools (DSL&T): stencil library, GCL
- Basic libraries (incl. BLAS, LAPACK, FFT, ...): MPI
- system

Gory detail will be given in Xavier's talk tomorrow.
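A toy sketch of the backend idea behind such a stencil library (hypothetical names; the real library, covered in Xavier's talk, is C++ with a CUDA backend): the stencil is written once and a backend is selected at build or run time.

```python
import numpy as np

def laplacian_vectorized(data, out):
    # "x86" backend stand-in: vectorized NumPy implementation
    out[1:-1, 1:-1] = (-4.0 * data[1:-1, 1:-1]
                       + data[2:, 1:-1] + data[:-2, 1:-1]
                       + data[1:-1, 2:] + data[1:-1, :-2])

def laplacian_loops(data, out):
    # reference backend: explicit loops, same result
    ni, nj = data.shape
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            out[i, j] = (-4.0 * data[i, j] + data[i + 1, j]
                         + data[i - 1, j] + data[i, j + 1] + data[i, j - 1])

BACKENDS = {"x86": laplacian_vectorized, "reference": laplacian_loops}

def apply_stencil(backend, data):
    """User code stays backend-agnostic: same call, swappable kernel."""
    out = np.zeros_like(data)
    BACKENDS[backend](data, out)
    return out

field = np.arange(64.0).reshape(8, 8)
assert np.allclose(apply_stencil("x86", field),
                   apply_stencil("reference", field))
```

The point is the separation the slide draws: the stencil definition lives in the application, while the DSL&T layer owns how it is executed on each architecture.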
The same path with a dynamic developer environment (PASC co-design projects):

Physical model (velocities, pressure, temperature, water, turbulence)
→ Mathematical description
→ Discretization / algorithm
  [domain science]
→ Code / implementation (the same Laplacian stencil)
→ Code compilation
→ Architectural options / design
  supported by: optimal algorithm, auto tuning, domain-specific libraries & tools
  [computer engineering & computer science]

The developer environment becomes dynamic, i.e. not Fortran/C/C++ but based on Python or an equivalent dynamic language.
COSMO & other models: how things could develop with a dynamic developer environment. Three generations of the stack:

- main (current): dynamics and physics directly on MPI and the system
- main (new): physics with OpenMP / OpenACC; dynamics on the stencil library (x86 and GPU backends) with boundary conditions, GCL, and MPI
- scripts (future): a Python environment with physics tools, numerical tools, dynamics, and grid tools, running on interchangeable backends on the system

The main advantage: model development is scalable!
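One way to picture the "scripts (future)" layer — a purely illustrative sketch with hypothetical names, not the real PASC tools: a Python driver composes the model while a swappable backend supplies the compute kernels.

```python
import numpy as np

def laplacian(field):
    """Backend kernel stand-in: 5-point Laplacian on the interior."""
    lap = np.zeros_like(field)
    lap[1:-1, 1:-1] = (-4.0 * field[1:-1, 1:-1] + field[2:, 1:-1]
                       + field[:-2, 1:-1] + field[1:-1, 2:] + field[1:-1, :-2])
    return lap

def run_model(field, steps, nu=0.1, dt=0.1):
    """Scripting layer: explicit time stepping of the diffusion
    equation d(field)/dt = nu * Laplacian(field)."""
    for _ in range(steps):
        field = field + nu * dt * laplacian(field)
    return field

state = np.zeros((16, 16))
state[8, 8] = 1.0                   # point perturbation
final = run_model(state, steps=50)  # perturbation spreads and flattens
```

This is the scalability argument of the slide in miniature: the orchestration above never changes when the kernel moves from NumPy to a compiled x86 or GPU backend.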
THANK YOU!