The waLBerla Framework: Multi-physics Simulations on Heterogeneous Parallel Platforms

1 The waLBerla Framework: Multi-physics Simulations on Heterogeneous Parallel Platforms
Harald Köstler, Uli Rüde (LSS Erlangen, Lehrstuhl für Simulation, Universität Erlangen-Nürnberg), www10.informatik.uni-erlangen.de
SIAM PP 14, Portland, February

2 Outline
3D printing process as motivating example
waLBerla
Performance-driven co-design
Scalability
GPU acceleration
Performance engineering
Conclusions

3 waLBerla: an HPC Multiphysics Framework
Focus on the lattice Boltzmann method
Written in C++
Hybrid parallelization (MPI + OpenMP)
Painstakingly optimized machine-specific kernels for maximum performance
Generic, easily adaptable kernels for prototyping
All data structures exa-scalable
From desktop to multi-petascale machines (and beyond)
Portable (compiler/OS)
Will go open source soon

4 Motivating Example: Simulation of Electron Beam Melting Process (Additive Manufacturing)
EU project Fast-EBM: ARCAM (Sweden), TWI (Cambridge), WTM (FAU), ZISC (FAU)
Generation of powder bed
Energy transfer by electron beam; modeling penetration depth; heat transfer
Flow dynamics; melting/solidification phase transition; surface tension; fluid flow; wetting, capillary forces
Joint work with C. Körner, M. Markl, R. Ammer

5 Simulation of Electron Beam Melting

6 Lattice Boltzmann Method
Lattice Boltzmann equation (single relaxation time)
Macroscopic quantities
Equilibrium distribution function
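The three ingredients named on this slide can be sketched for a single cell as follows. This is a minimal illustration, not waLBerla code: it assumes the standard D2Q9 lattice with its textbook weights and the usual second-order equilibrium, and the function names (`macroscopic`, `equilibrium`, `collide_srt`) are hypothetical. The slide's own formulas were not preserved in this transcript, so the forms in the comments are the common textbook ones.

```python
# Assumed lattice: D2Q9 with standard velocities c_i and weights w_i.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
CS2 = 1 / 3  # squared lattice speed of sound


def macroscopic(f):
    """Macroscopic quantities: rho = sum_i f_i, rho*u = sum_i c_i f_i."""
    rho = sum(f)
    ux = sum(fi * c[0] for fi, c in zip(f, C)) / rho
    uy = sum(fi * c[1] for fi, c in zip(f, C)) / rho
    return rho, (ux, uy)


def equilibrium(rho, u):
    """f_i^eq = w_i rho [1 + c.u/cs^2 + (c.u)^2/(2 cs^4) - u.u/(2 cs^2)]."""
    uu = u[0] ** 2 + u[1] ** 2
    feq = []
    for w, c in zip(W, C):
        cu = c[0] * u[0] + c[1] * u[1]
        feq.append(w * rho * (1 + cu / CS2
                              + cu ** 2 / (2 * CS2 ** 2)
                              - uu / (2 * CS2)))
    return feq


def collide_srt(f, tau):
    """Single-relaxation-time collision: f_i <- f_i - (f_i - f_i^eq) / tau."""
    rho, u = macroscopic(f)
    feq = equilibrium(rho, u)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]
```

A full time step would follow the collision with a streaming step that moves each post-collision value to the neighbouring cell in direction c_i; that step is omitted here.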

7 Geometry Initialization
Complex geometry given by surface
Add regular block partitioning
Load balancing
Discard empty blocks
Allocate block data
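The initialization steps listed on this slide can be illustrated with a small sketch. It is not the waLBerla implementation: the geometry is assumed to be available as a hypothetical predicate `is_fluid(cell)`, the load balancer is a simple greedy heuristic chosen for brevity, and empty blocks are discarded before balancing to keep the example short.

```python
from itertools import product


def partition(domain, block):
    """Lower corners of all blocks in a regular partitioning of the domain."""
    ranges = [range(0, n, b) for n, b in zip(domain, block)]
    return list(product(*ranges))


def fluid_cells(corner, block, domain, is_fluid):
    """Count fluid cells in one block, clipped at the domain boundary."""
    spans = [range(c, min(c + b, n))
             for c, b, n in zip(corner, block, domain)]
    return sum(1 for cell in product(*spans) if is_fluid(cell))


def setup_blocks(domain, block, is_fluid, num_procs):
    """Discard empty blocks, then assign the rest to processes greedily."""
    loads = {c: fluid_cells(c, block, domain, is_fluid)
             for c in partition(domain, block)}
    kept = {c: n for c, n in loads.items() if n > 0}  # drop empty blocks
    owner, work = {}, [0] * num_procs
    # Heaviest block first, always onto the currently least-loaded process.
    for corner, n in sorted(kept.items(), key=lambda kv: -kv[1]):
        p = work.index(min(work))
        owner[corner] = p
        work[p] += n
    return owner, work
```

For example, `setup_blocks((8, 8, 8), (4, 4, 4), lambda c: c[0] < 4, 2)` keeps only the blocks in the fluid half of the domain and splits them between two processes. Block data would then be allocated only for the blocks listed in `owner`.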

8 Two Multi-PetaFlops Supercomputers
JUQUEEN: Blue Gene/Q architecture; 458,752 PowerPC A2 cores; 16 cores (1.6 GHz) per node; 16 GiB RAM per node; 5D torus interconnect; Europe's fastest supercomputer
SuperMUC: Intel Xeon architecture; 147,456 cores; 16 cores (2.7 GHz) per node; 32 GiB RAM per node; pruned tree interconnect; world's fastest x86-based supercomputer
SIAM PP 14: Ulrich Ruede

9 Single Node Performance: JUQUEEN, SuperMUC

10 Weak scaling (Lid Driven Cavity), TRT
JUQUEEN: 16 processes per node; 4 threads per process; 1.93 trillion cell updates per second (TLups)
SuperMUC: 4 processes per node; 4 threads per process; 837 billion cell updates per second (GLups)

11 Summary of Performance Evaluation on Coronary Geometry
Weak scaling on JUQUEEN with over a trillion (10^12) fluid lattice cells
Cell sizes of 1.27 µm (diameter of red blood cells about 7 µm)
Strong scaling at cell sizes of 0.1 and 0.05 mm
In excess of 2000 time steps per second
Project co-financed by Siemens Health Care Division
Paper at Supercomputing 13 with C. Godenschwager, M. Bauer, F. Schornbaum
See also: talk by Florian Schornbaum in MS 23, Wed.

12 waLBerla on Tsubame 2.0 at Tokyo Tech
Compute nodes: 1442
Processor: Intel Xeon X5670
GPU: 3 x Nvidia Tesla M2050
LINPACK performance: 1.2 Petaflops
Power consumption: 1.4 MW
Interconnect: QDR Infiniband
With C. Feichtinger, J. Habich, G. Wellein, T. Aoki (Tokyo Tech)

13 waLBerla with GPU acceleration

14 Overlapping computation and communication

15 Performance Model II
Single node performance on Tsubame
Machine balance $B_m$
Code balance $B_c$
Lightspeed estimate: $l = \min\left(1, \frac{B_m}{B_c}\right)$
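The balance model on this slide can be turned into a short calculation. The sketch below assumes the usual definitions (machine balance as memory bandwidth divided by peak floating-point rate, code balance as bytes moved per floating-point operation); the function names and every number passed to them are hypothetical placeholders, not measurements from Tsubame.

```python
def machine_balance(bandwidth_bytes_per_s, peak_flops_per_s):
    """B_m: bytes the machine can deliver per floating-point operation."""
    return bandwidth_bytes_per_s / peak_flops_per_s


def code_balance(bytes_per_update, flops_per_update):
    """B_c: bytes the kernel needs per floating-point operation."""
    return bytes_per_update / flops_per_update


def lightspeed(b_m, b_c):
    """l = min(1, B_m / B_c): attainable fraction of peak performance."""
    return min(1.0, b_m / b_c)


def max_updates_per_s(bandwidth, peak_flops, bytes_per_update,
                      flops_per_update):
    """Upper bound on cell updates per second implied by the model."""
    l = lightspeed(machine_balance(bandwidth, peak_flops),
                   code_balance(bytes_per_update, flops_per_update))
    return l * peak_flops / flops_per_update


# Placeholder inputs (assumptions): 100 GB/s bandwidth, 500 GFlop/s peak,
# 304 bytes per update (19 loads + 19 stores of 8-byte values, as for a
# D3Q19 kernel in double precision) and 200 flops per update.
estimate = max_updates_per_s(100e9, 500e9, 304, 200)
```

When the kernel is memory bound (B_m < B_c), the estimate reduces to bandwidth divided by bytes per update, which is why lattice Boltzmann performance is commonly compared against a bandwidth limit rather than against peak flops.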

16 Single Compute Node Performance

17 Single Compute Node Performance II

18 Performance Model Driven Single Compute Node Optimization

19 Weak scaling, 3 GPUs per node

20 Heterogeneous CPU-GPU Simulation
With C. Feichtinger, H. Köstler, J. Habich, G. Wellein, T. Aoki (Tokyo Tech)
Fluidized beds: direct numerical simulation; fully resolved particles; fluid-structure interaction; 4-way coupling
Particles: 31250, Domain: 400x400x200, Timesteps: Devices: 2 x M Intel Westmere, Runtime: 17.5 h

21 Fluid-Structure Interaction: direct simulation of particle-laden flows (4-way coupling)

22 Tumbling Fibers
With D. Bartuschat and K. Gustavsson (KTH Stockholm): validation against integral equation/slender body approximation in Stokes flow

23 Thank you for your attention! Questions?
Animation by S. Bogner.
Slides, reports, thesis, animations available for download at: www10.informatik.uni-erlangen.de
