Lattice Boltzmann Methods on the way to exascale

Lattice Boltzmann Methods on the way to exascale
Ulrich Rüde (LSS Erlangen, ulrich.ruede@fau.de)
Lehrstuhl für Simulation, Universität Erlangen-Nürnberg
www10.informatik.uni-erlangen.de
HIGH PERFORMANCE COMPUTING: From Clouds and Big Data to Exascale and Beyond
An International Advanced Workshop, Cetraro, Italy, June 27 to July 1, 2016

Outline
Goals:
- drive algorithms towards their performance limits (scalability is necessary but not sufficient)
- sustainable software: reproducibility & flexibility
- coupled multi-physics
Three software packages:
1. Many-body problems: rigid body dynamics, 2.8 × 10^10 non-spherical particles
2. Kinetic methods: lattice Boltzmann fluid flow, >10^12 cells, adaptive, load balancing
3. Continuum methods: finite element multigrid, fully implicit solves with >10^13 DoF
Real-life applications

The work horses
JUQUEEN: Blue Gene/Q architecture, 458,752 PowerPC A2 cores, 16 cores (1.6 GHz) per node, 16 GiB RAM per node, 5D torus interconnect, 5.8 PFlops peak, TOP 500: #13
SuperMUC: Intel Xeon architecture, 147,456 cores, 16 cores (2.7 GHz) per node, 32 GiB RAM per node, pruned tree interconnect, 3.2 PFlops peak, TOP 500: #27

Building block I: The Lagrangian view: granular media simulations
1,250,000 spherical particles, 256 processors, 300,300 time steps, runtime: 48 h (including data output); visualization with texture mapping and ray tracing, simulation with the physics engine.
Pöschel, T., & Schwager, T. (2005). Computational granular dynamics: models and algorithms. Springer Science & Business Media.

Nonlinear Complementarity and Time Stepping
Non-penetration conditions (Signorini condition and impact law):
  continuous forces:  0 ≤ ξ ⊥ λ_n ≥ 0
  impulses:           0 ≤ ξ⁺ ⊥ Λ_n ≥ 0
Coulomb friction conditions (friction cone; the frictional reaction opposes slip):
  ‖λ_to‖₂ ≤ μ λ_n,  with  λ_to = −μ λ_n δv⁺_to / ‖δv⁺_to‖₂  whenever ‖δv⁺_to‖₂ ≠ 0  (impulsive form with Λ_to, Λ_n analogously)
Discrete time stepping:
  0 ≤ ξ/δt + δv_n(λ) ⊥ λ_n ≥ 0
  ‖λ_to‖₂ ≤ μ λ_n,  λ_to = −μ λ_n δv_to(λ) / ‖δv_to(λ)‖₂
Moreau, J., Panagiotopoulos, P. (1988). Nonsmooth mechanics and applications, vol. 302. Springer, Wien-New York.
Popa, C., Preclik, T., & UR (2014). Regularized solution of LCP problems with application to rigid body dynamics. Numerical Algorithms, 1-12.
Preclik, T., & UR (2015). Ultrascale simulations of non-smooth granular dynamics. Computational Particle Mechanics, DOI: 10.1007/s40571-015-0047-6.
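To make the discrete conditions concrete, the following is a minimal, self-contained sketch (not the pe engine's actual solver; all parameter values are illustrative) of time stepping a single sphere that slides on a plane: the normal impulse enforces the discrete Signorini condition, and the tangential impulse is projected back onto the Coulomb friction cone so that the reaction opposes slip.

    import numpy as np

    m, g, mu, dt, r = 1.0, 9.81, 0.3, 1e-3, 0.1   # mass, gravity, friction coeff., step, radius
    x = np.array([0.0, 0.0, 0.5])                  # position, z is the height above the plane
    v = np.array([1.0, 0.0, 0.0])                  # initial horizontal sliding velocity

    for step in range(2000):
        v[2] -= dt * g                             # external (gravity) contribution
        xi = x[2] - r                              # signed gap to the plane z = 0
        if xi / dt + v[2] < 0.0:                   # contact becomes active
            lam_n = -m * (xi / dt + v[2])          # normal impulse: 0 <= xi/dt + v_n ⊥ lam_n >= 0
            lam_t = -m * v[:2]                     # impulse that would stop all tangential slip
            slip = np.linalg.norm(lam_t)
            if slip > mu * lam_n:                  # outside the friction cone: sliding contact
                lam_t *= mu * lam_n / slip         # project back; reaction opposes slip
            v[2] += lam_n / m
            v[:2] += lam_t / m
        x += dt * v

    print(x, v)                                    # sphere rests on the plane, friction stopped the slide

In the actual large-scale simulations many such contacts are coupled and have to be solved iteratively and in parallel, which is the subject of the references above.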

Dense granular channel flow with crystallization

Scaling Results
The solver is algorithmically not optimal for dense systems and hence cannot scale unconditionally, but it is highly efficient in many cases of practical importance.
Strong and weak scaling results for a constant number of iterations, performed on SuperMUC and JUQUEEN.
Largest ensembles computed: 2.8 × 10^10 non-spherical particles, 1.1 × 10^10 contacts.
Granular gas: scaling results; breakdown of compute times on the Erlangen RRZE cluster Emmy.
Figures: weak-scaling graph on the JUQUEEN supercomputer; (a) time-step profile of the granular gas executed with 5 × 2 × 2 = 20 processes on a single node; (b) time-step profile executed with 8 × 8 × 5 = 320 processes on 16 nodes.

Building Block III: Scalable Flow Simulations with the Lattice Boltzmann Method
Succi, S. (2001). The lattice Boltzmann equation: for fluid dynamics and beyond. Oxford University Press.
Feichtinger, C., Donath, S., Köstler, H., Götz, J., & Rüde, U. (2011). WaLBerla: HPC software design for computational engineering simulations. Journal of Computational Science, 2(2), 105-112.

Partitioning and Parallelization
- static block-level refinement (forest of octrees)
- static load balancing
- compact (KiB/MiB) binary MPI IO
- allocation of block data (grids)
- separation of domain partitioning from simulation (optional)

Parallel AMR load balancing: different views on the domain partitioning
- 2:1 balanced grid (used for the LBM)
- distributed graph: nodes = blocks, edges explicitly stored as <block ID, process rank> pairs
- forest of octrees: the octrees are not stored explicitly, but are implicitly defined via the block IDs (see the sketch below)
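As an illustration of how such implicit octrees can work (a sketch of the general idea, not waLBerla's actual block ID encoding), a block ID can simply record the path from the root: every refinement appends three bits selecting one of the eight children, so parents, children, and the refinement level are all derived arithmetically and no tree data structure has to be stored.

    def child_id(block_id: int, octant: int) -> int:
        # refine: append 3 bits that select one of the 8 children
        return (block_id << 3) | octant

    def parent_id(block_id: int) -> int:
        # coarsen: drop the last octant triple
        return block_id >> 3

    def level(block_id: int) -> int:
        # number of octant triples below the root marker bit
        return (block_id.bit_length() - 1) // 3

    root = 0b1                                   # marker bit for one root block of the forest
    b = child_id(child_id(root, 5), 2)           # descend into octant 5, then octant 2
    assert parent_id(parent_id(b)) == root
    assert level(b) == 2

Combined with the explicitly stored <block ID, process rank> pairs of the distributed graph, this is in principle enough to locate the owner of any neighboring, parent, or child block.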

AMR and Load Balancing with waLBerla
Isaac, T., Burstedde, C., Wilcox, L. C., & Ghattas, O. (2015). Recursive algorithms for distributed forests of octrees. SIAM Journal on Scientific Computing, 37(5), C497-C531.
Meyerhenke, H., Monien, B., & Sauerwald, T. (2009). A new diffusion-based multilevel algorithm for computing graph partitions. Journal of Parallel and Distributed Computing, 69(9), 750-761.
Schornbaum, F., & Rüde, U. (2016). Massively parallel algorithms for the lattice Boltzmann method on nonuniform grids. SIAM Journal on Scientific Computing, 38(2), C96-C126.

AMR Performance

Performance on Coronary Arteries
Figures: artery geometry; color-coded process assignment.
Godenschwager, C., Schornbaum, F., Bauer, M., Köstler, H., & UR (2013). A framework for hybrid parallel flow simulations with a trillion cells in complex geometries. In Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis (p. 35). ACM.
Weak scaling: 458,752 cores of JUQUEEN, over a trillion (10^12) fluid lattice cells, cell size 1.27 μm (diameter of red blood cells: 7 μm), 2.1 × 10^12 cell updates per second, 0.41 PFlops.
Strong scaling: 32,768 cores of SuperMUC, cell size of 0.1 mm, 2.1 million fluid cells, 6000+ time steps per second.
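A quick back-of-the-envelope check, using only the rounded figures quoted on this slide, gives the per-core throughput and the arithmetic work per cell update:

    cores     = 458_752        # JUQUEEN cores used in the weak-scaling run
    updates_s = 2.1e12         # fluid lattice cell updates per second
    flops_s   = 0.41e15        # sustained floating point rate (0.41 PFlops)

    print(updates_s / cores / 1e6)   # ~4.6 million cell updates per second and core
    print(flops_s / updates_s)       # ~195 flops per cell update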

Single Node Performance
Figures: kernel performance on JUQUEEN and SuperMUC for standard, optimized, and vectorized LBM implementations.
Pohl, T., Deserno, F., Thürey, N., UR, Lammers, P., Wellein, G., & Zeiser, T. (2004). Performance evaluation of parallel large-scale lattice Boltzmann applications on three supercomputing architectures. In Proceedings of the 2004 ACM/IEEE Conference on Supercomputing (p. 21). IEEE Computer Society.
Donath, S., Iglberger, K., Wellein, G., Zeiser, T., Nitsure, A., & UR (2008). Performance comparison of different parallel lattice Boltzmann implementations on multi-core multi-socket systems. International Journal of Computational Science and Engineering, 4(1), 3-11.
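As a reference point for what such a kernel computes, here is a minimal D2Q9 BGK stream-and-collide loop in numpy. This is purely illustrative and deliberately naive; the kernels benchmarked on JUQUEEN and SuperMUC are fused, vectorized compiled implementations (typically in 3D, e.g. D3Q19).

    import numpy as np

    nx, ny, tau = 64, 64, 0.8                        # grid size, BGK relaxation time
    c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],
                  [1,1],[-1,1],[-1,-1],[1,-1]])      # D2Q9 lattice velocities
    w = np.array([4/9] + [1/9]*4 + [1/36]*4)         # D2Q9 lattice weights

    def equilibrium(rho, ux, uy):
        cu  = c[:, 0, None, None]*ux + c[:, 1, None, None]*uy
        usq = ux**2 + uy**2
        return w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

    # initialize with a periodic shear wave that viscosity will damp
    Y = np.tile(np.arange(ny), (nx, 1))
    f = equilibrium(np.ones((nx, ny)), 0.05*np.sin(2*np.pi*Y/ny), np.zeros((nx, ny)))

    for _ in range(200):
        for i in range(9):                           # streaming step (periodic domain)
            f[i] = np.roll(f[i], shift=tuple(c[i]), axis=(0, 1))
        rho = f.sum(axis=0)                          # macroscopic density
        ux  = (f * c[:, 0, None, None]).sum(axis=0) / rho
        uy  = (f * c[:, 1, None, None]).sum(axis=0) / rho
        f  += (equilibrium(rho, ux, uy) - f) / tau   # collision: relax towards equilibrium

    print(abs(ux).max())                             # shear wave amplitude has decayed

The gap between "standard", "optimized", and "vectorized" variants typically comes from exactly such implementation choices: fusing streaming and collision into a single sweep over the grid, the memory layout of the populations, and SIMD vectorization.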

Flow through a structure of thin crystals (filter)
Joint work with Jose Pedro Galache and Antonio Gil, CMT-Motores Termicos, Universitat Politecnica de Valencia

Building Block IV (electrostatics): Direct numerical simulation of charged particles in flow
Positively and negatively charged particles in a flow subjected to a transversal electric field.
Masilamani, K., Ganguly, S., Feichtinger, C., & UR (2011). Hybrid lattice-Boltzmann and finite-difference simulation of electroosmotic flow in a microchannel. Fluid Dynamics Research, 43(2), 025501.
Bartuschat, D., Ritter, D., & UR (2012). Parallel multigrid for electrokinetic simulation in particle-fluid flows. In High Performance Computing and Simulation (HPCS), 2012 International Conference on (pp. 374-380). IEEE.
Bartuschat, D., & UR (2015). Parallel multiphysics simulations of charged particles in microfluidic flows. Journal of Computational Science, 8, 1-19.

6-way coupling
- charge distribution → finite-volume multigrid (treat BCs, V-cycle iterations) → electrostatic force
- velocity BCs → LBM (treat BCs, stream-collide step) → hydrodynamic force
- object distance → lubrication correction → correction force
- electrostatic, hydrodynamic, and correction forces → Newtonian mechanics with collision response → object motion (see the sketch below)
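Read as an algorithm, the coupling diagram amounts to the following per-time-step sequence. The sketch below is illustrative pseudocode with no-op stand-ins (none of these names are waLBerla or pe API calls); it only shows the order in which the electrostatic, hydrodynamic, and lubrication forces are computed and handed to the rigid body integrator, whose new object motion feeds back into the boundary conditions of the next step.

    def solve_electrostatics(particles):             # finite-volume multigrid V-cycles
        return [0.0 for _ in particles]              # electrostatic force per particle (stub)

    def lbm_step(particles):                         # treat BCs, stream-collide, momentum exchange
        return [0.0 for _ in particles]              # hydrodynamic force per particle (stub)

    def lubrication_correction(particles):           # correction for unresolved small gaps
        return [0.0 for _ in particles]              # correction force per particle (stub)

    def coupled_time_step(particles, dt):
        f_el  = solve_electrostatics(particles)
        f_hyd = lbm_step(particles)
        f_lub = lubrication_correction(particles)
        for p, f in zip(particles, map(sum, zip(f_el, f_hyd, f_lub))):
            p["v"] += dt * f / p["m"]                # Newtonian mechanics (collision response omitted)
            p["x"] += dt * p["v"]                    # object motion for the next step's BCs

    particles = [{"x": 0.0, "v": 0.0, "m": 1.0}]
    coupled_time_step(particles, dt=1e-3)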

Separation experiment
240 time steps of the fully 6-way coupled simulation in 400 sec on SuperMUC; weak scaling up to 32,768 cores; 7.1 million particles.
Plots: LBM performance (10^3 MFLUPS) and multigrid performance (10^3 MLUPS) versus number of nodes; total runtimes broken down into LBM, Map, Lubr, HydrF, pe, MG, SetRHS, PtCm, and ElectF.

Building Block V: Volume-of-Fluid Method for Free Surface Flows
Joint work with Regina Ammer, Simon Bogner, Martin Bauer, Daniela Anderl, Nils Thürey, Stefan Donath, Thomas Pohl, C. Körner, A. Delgado.
Körner, C., Thies, M., Hofmann, T., Thürey, N., & UR (2005). Lattice Boltzmann model for free surface flow for modeling foaming. Journal of Statistical Physics, 121(1-2), 179-196.
Donath, S., Feichtinger, C., Pohl, T., Götz, J., & UR (2010). A parallel free surface lattice Boltzmann method for large-scale applications. Parallel Computational Fluid Dynamics: Recent Advances and Future Directions, 318.
Anderl, D., Bauer, M., Rauh, C., UR, & Delgado, A. (2014). Numerical simulation of adsorption and bubble interaction in protein foams using a lattice Boltzmann method. Food & Function, 5(4), 755-763.

Free Surface Flows
- volume-of-fluid-like approach
- flag field: compute only in fluid cells
- special free surface conditions in interface cells
- reconstruction of the curvature for surface tension

Free Surface Bubble Model
Data of a bubble: initial volume (density = 1), current volume; density/pressure = initial volume / current volume.
Update management: each process logs the change of volume due to cell conversions (interface → gas / gas → interface) and mass variations in interface cells. All volume changes are added to the volume of the bubble at the end of the time step, which also requires communication.
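The bookkeeping described above maps naturally onto a local accumulator plus one global reduction per time step. The sketch below shows the idea with mpi4py (illustrative only; waLBerla implements this in C++, and the class and method names here are made up):

    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    class Bubble:
        def __init__(self, initial_volume):
            self.initial_volume = initial_volume     # density = 1 at initialization
            self.volume = initial_volume
            self.local_delta = 0.0                   # volume change logged on this process

        def log_cell_conversion(self, dV):
            # called during the local sweep for interface->gas / gas->interface
            # conversions and for mass variations in interface cells
            self.local_delta += dV

        def end_of_time_step(self):
            # sum the volume changes of all processes and update the bubble
            self.volume += comm.allreduce(self.local_delta, op=MPI.SUM)
            self.local_delta = 0.0
            return self.initial_volume / self.volume # current density / pressure

    bubble = Bubble(initial_volume=1000.0)
    bubble.log_cell_conversion(+2.5)                 # e.g. an interface cell turned into gas
    print(bubble.end_of_time_step())                 # same pressure value on every rank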

Simulation for hygiene products (for Procter & Gamble): capillary pressure, surface tension, inclination, contact angle

Additive Manufacturing: Fast Electron Beam Melting
Bikas, H., Stavropoulos, P., & Chryssolouris, G. (2015). Additive manufacturing methods and modelling approaches: a critical review. The International Journal of Advanced Manufacturing Technology, 1-17.
Klassen, A., Scharowsky, T., & Körner, C. (2014). Evaporation model for beam based additive manufacturing using free surface lattice Boltzmann methods. Journal of Physics D: Applied Physics, 47(27), 275303.

Electron Beam Melting Process (3D printing), EU project FastEBM: ARCAM (Gothenburg), TWI (Cambridge), FAU Erlangen
- generation of the powder bed
- energy transfer by the electron beam: penetration depth, heat transfer
- flow dynamics: melting, melt flow, surface tension, wetting, capillary forces, contact angles, solidification
Ammer, R., Markl, M., Ljungblad, U., Körner, C., & UR (2014). Simulating fast electron beam melting with a parallel thermal free surface lattice Boltzmann method. Computers & Mathematics with Applications, 67(2), 318-330.
Ammer, R., UR, Markl, M., Jüchter, V., & Körner, C. (2014). Validation experiments for LBM simulations of electron beam melting. International Journal of Modern Physics C.

Simulation of Electron Beam Melting
Powder bed generation is simulated using the pe framework. A high-speed camera shows the melting step for manufacturing a hollow cylinder, alongside the corresponding waLBerla simulation.

Conclusions

CSE research is done by teams: Harald Köstler, Christian Godenschwager, Kristina Pickl, Regina Ammer, Simon Bogner, Florian Schornbaum, Sebastian Kuckuk, Christoph Rettinger, Dominik Bartuschat, Martin Bauer

Thank you for your attention! Videos, preprints, slides at https://www10.informatik.uni-erlangen.de