A Python extension for the massively parallel framework walberla


A Python extension for the massively parallel framework walberla. PyHPC at SC'14, November 17th, 2014. Martin Bauer, Florian Schornbaum, Christian Godenschwager, Matthias Markl, Daniela Anderl, Harald Köstler and Ulrich Rüde. Chair for System Simulation, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

Outline: walberla Framework (data structures, communication patterns, applications). Python interface: motivation, simulation setup, data structure export for simulation evaluation/control, challenges of writing a C++/Python interface.

The walberla Framework

walberla Framework: widely applicable Lattice-Boltzmann from Erlangen. A general HPC software framework, originally developed for CFD simulations with the Lattice Boltzmann Method (LBM), but other algorithms working on structured grids are also possible: phase field models, molecular dynamics with a linked-cell algorithm, and coupling with the in-house rigid body physics engine pe. Example applications: turbulent flow (Re = 11000) around a sphere; microstructure simulation of a ternary, eutectic solidification process (phase field method).

walberla Framework: written in C++ with Python extensions; hybridly parallelized (MPI + OpenMP); no data structures that grow with the number of processes involved; scales from a laptop to recent petascale machines; parallel I/O; portable across compilers and operating systems (e.g. llvm/clang); automated tests / CI servers; open source release planned.

Block Structured Grids: the structured grid domain is decomposed into blocks; blocks are the container data structure for the simulation data (lattice) and the basic unit of load balancing.

Block Structured Grids: setup for a complex geometry given by a surface: add a regular block partitioning, perform load balancing, discard empty blocks, allocate block data. A minimal sketch of the empty-block elimination step follows below.
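As a hedged illustration of the "discard empty blocks" step (the names Block/covers_geometry are invented for this sketch and are not walberla API), a regular partitioning can simply drop every block whose bounding box does not intersect the geometry:

# Hedged sketch: illustrates the "discard empty blocks" idea from the slide.
# covers_geometry and the sphere geometry are stand-ins, not walberla code.
import itertools

def covers_geometry(block_min, block_max, center=(50, 50, 50), radius=30):
    """Rough test: does the block's bounding box intersect a sphere?"""
    closest = [min(max(c, lo), hi) for c, lo, hi in zip(center, block_min, block_max)]
    dist2 = sum((c - p) ** 2 for c, p in zip(center, closest))
    return dist2 <= radius ** 2

def partition_into_blocks(domain_size=(100, 100, 100), block_size=(25, 25, 25)):
    """Regular partitioning; keep only blocks that overlap the geometry."""
    blocks = []
    ranges = [range(0, d, b) for d, b in zip(domain_size, block_size)]
    for origin in itertools.product(*ranges):
        block_min = origin
        block_max = tuple(o + b for o, b in zip(origin, block_size))
        if covers_geometry(block_min, block_max):   # discard empty blocks
            blocks.append((block_min, block_max))
    return blocks

print(len(partition_into_blocks()), "of 64 blocks kept")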

Block Structured Grids: domain partitioning of a coronary tree dataset, one block per process, shown for 512 processes and for 458,752 processes.

Weak scaling, TRT kernel (3.34 million cells per core): MLUP/s per core versus number of cores (32 up to 131,072 cores, crossing 2 islands) for the 16P 1T, 4P 4T and 2P 8T process/thread configurations, together with the communication share in percent.

Modular Software Design: the walberla library consists of modules such as field, mpi, domain_decomposition, cuda, geometry, communication, lattice_boltzmann, phasefield and pde. Main goal: modularity. Decoupling: heavy use of template concepts.

Python Extension Motivation: Simulation Setup

Simulation Setup. Example application: free surface LBM simulation, bubbly flow through a channel.

Simulation Workflow

Motivation: Configuration files. walberla provides a hierarchic key-value pair configuration, specified in a JSON-like syntax:

DomainSetup {
   dx    0.01;
   cells < 200, 200, 1000 >;
}
Physics {
   LBM_Model {
      type             SRT;
      relaxation_time  1.9;
   }
}
Geometry {
   free_surface {
      sphere {
         position < 100, 100, 500 >;
         radius   10;
      }
      // ...
   }
   at_boundary {
      position west;
      type     inflow;
      vel      0.01;
   }
}

Slide annotations next to these values: compute from dx, dt, viscosity? use previous values? cells/2? check valid range? physical units? A hedged sketch of such a derived-parameter computation follows below.
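As a hedged illustration of the kind of derived-parameter computation hinted at by the annotations (not walberla code), for an SRT LBM model the time step can be computed from dx, the physical viscosity and the relaxation time via the standard lattice relation nu_lattice = (tau - 0.5) / 3:

# Hedged sketch: derive dt for an SRT LBM model from dx, physical viscosity
# and relaxation time, using nu_lattice = (tau - 0.5) / 3 and
# nu_physical = nu_lattice * dx**2 / dt. Not part of the walberla config API.
def derive_dt(dx, viscosity, relaxation_time):
    nu_lattice = (relaxation_time - 0.5) / 3.0
    return nu_lattice * dx ** 2 / viscosity

dx = 0.01           # m, as in the DomainSetup block above
viscosity = 1e-6    # m^2/s, water-like
tau = 1.9           # relaxation_time from the Physics block
print("dt =", derive_dt(dx, viscosity, tau), "s")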

Motivation: Configuration files. Special C++ modules were developed to enhance the config file: unit conversion, automatic calculation of missing parameters, checks for valid parameter combinations and ranges, support for functions/macros? Essentially, we started to develop a custom scripting language. Python was chosen instead because it is popular in the scientific community and easy to use together with C/C++ (boost::python).

Library Stack: the walberla python_coupling module (an interface to expose walberla datatypes and call Python functions) builds on boost::python (C++ interface), which builds on libpython (C interface).

Callbacks: the simulation is (still) driven by C++ code, with callbacks to Python functions; the configuration is provided in a Python dict.

Python side:

import walberla

@walberla.callback( "config" )
def make_config( **kwargs ):
    return { 'DomainSetup' : { 'cells' : (200, 200, 1000), },
             'Geometry'    : { } }

C++ side:

python_coupling::Callback cb( "config" );
cb();  // run python function
auto config = convertDictToConfig( cb.returnValue() );
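As a hedged aside, a decorator such as @walberla.callback can be implemented as a simple name-to-function registry on the Python side; the sketch below is illustrative only and is not walberla's implementation:

# Hedged sketch of how a callback registry like walberla.callback could work
# on the pure-Python side; run_callback mimics what the C++ driver does.
_callbacks = {}

def callback(name):
    """Decorator that registers a function under a callback name."""
    def register(func):
        _callbacks[name] = func
        return func
    return register

def run_callback(name, **kwargs):
    """Conceptually what the C++ side does when it invokes a callback."""
    return _callbacks[name](**kwargs)

@callback("config")
def make_config(**kwargs):
    return {'DomainSetup': {'cells': (200, 200, 1000)}}

print(run_callback("config"))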

Simulation Setup, using a units library for physical quantities plus computation and optimization of derived parameters:

@walberla.callback( "config" )
def config():
    c = { 'Physical' : { 'viscosity'       : 1e-6*m*m/s,
                         'surface_tension' : 0.072*N/m,
                         'dx'              : 0.01*m,
                         # ...
                       },
          'Control'  : { 'timesteps'           : 10000,
                         'vtk_output_interval' : 100,
                         # ...
                       } }
    compute_derived_parameters(c)
    c['Physical']['dt'] = find_optimal_dt(c)
    nondimensionalize(c)
    return c
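The helpers compute_derived_parameters, find_optimal_dt and nondimensionalize are named on the slide but not shown; as a hedged sketch, nondimensionalization can convert physical quantities into lattice units using dx and dt (the 'Lattice' key below is invented for this illustration):

# Hedged sketch of a nondimensionalization step; the function name mirrors
# the slide, the implementation and the 'Lattice' key are illustrative only.
def nondimensionalize(c):
    phys = c['Physical']
    dx, dt = phys['dx'], phys['dt']
    c['Lattice'] = {
        # kinematic viscosity [m^2/s] -> dimensionless lattice viscosity
        'viscosity': phys['viscosity'] * dt / dx ** 2,
    }
    return c

c = {'Physical': {'viscosity': 1e-6, 'dx': 0.01, 'dt': 0.0467}}
print(nondimensionalize(c)['Lattice'])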

Simulation Setup: further callbacks, for example for boundary setup:

gas_bubbles = dense_sphere_packing(300, 100, 100)

@walberla.callback( "domain_init" )
def geometry_and_boundary_setup( cell ):
    p_w = c['Physics']['pressure_west']
    if is_at_border( cell, 'W' ):
        boundary = [ 'pressure', p_w ]
    elif is_at_border( cell, 'E' ):
        boundary = [ 'pressure', 1.0 ]
    elif is_at_border( cell, 'NSTB' ):
        boundary = [ 'noslip' ]
    else:
        boundary = []
    return { 'fill_level' : 1 - gas_bubbles.overlap(cell),
             'boundary'   : boundary }
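The helpers dense_sphere_packing and overlap are only named on the slide; as a hedged stand-in, a cell's fill level against a sphere can be estimated by subsampling the cell and counting how many sample points fall inside:

# Hedged sketch: estimate what fraction of a unit cell is covered by a sphere
# by subsampling the cell. Illustrative stand-in, not the talk's helpers.
import itertools

def sphere_cell_overlap(cell_origin, sphere_center, radius, samples=4):
    """Fraction of the unit cell at cell_origin that lies inside the sphere."""
    inside = 0
    offsets = [(i + 0.5) / samples for i in range(samples)]
    for ox, oy, oz in itertools.product(offsets, repeat=3):
        p = (cell_origin[0] + ox, cell_origin[1] + oy, cell_origin[2] + oz)
        d2 = sum((pc - sc) ** 2 for pc, sc in zip(p, sphere_center))
        inside += d2 <= radius ** 2
    return inside / samples ** 3

# a cell sitting right at the sphere surface is roughly half covered
print(sphere_cell_overlap((10, 0, 0), (0, 0, 0), radius=10.5))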

Python Extension Simulation Evaluation

Simulation Workflow

Callbacks, next step: make C++ data structures available in Python functions. Most of them are straightforward to export using boost::python wrappers.

Python side:

import walberla

@walberla.callback( "at_end_of_timestep" )
def my_callback( blockstorage, **kwargs ):
    for block in blockstorage:
        # access and analyse simulation data
        velocity_field = block['velocity']

C++ side:

Callback cb( "at_end_of_timestep" );
cb.exposePtr( "blockstorage", blockstorage );
cb();  // run python function

Field access in Python: Field is the central data structure in walberla, a four-dimensional array that stores all simulation data, with easy switching between AoS and SoA memory layout; it is distributed, i.e. a field only represents the locally stored part of the lattice. User friendly and efficient access from Python is essential: no copying, and it should behave like a numpy.array (indexing, slicing, etc.). A small layout illustration follows below.
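As a hedged aside using plain numpy (not the walberla Field class), the AoS/SoA distinction is just a question of memory strides for the same logical four-dimensional array:

# Hedged illustration with plain numpy: the same logical (x, y, z, f) data
# in AoS and SoA layouts differs only in the strides of the array.
import numpy as np

shape = (4, 4, 4, 19)                      # e.g. a D3Q19 PDF field
aos = np.zeros(shape, order='C')           # AoS: a cell's 19 values are contiguous
soa = np.zeros(shape[::-1], order='C').transpose(3, 2, 1, 0)
                                           # SoA view: one component contiguous over all cells
print("AoS strides:", aos.strides)
print("SoA strides:", soa.strides)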

Field access in Python. Solution: the buffer protocol. Raw memory is the interface: a pointer to the beginning of the data together with strides for each dimension. The buffer protocol is not wrapped in boost::python, so the C interface is used directly.

import numpy as np
import walberla

def to_numpy( f ):
    return np.asarray( f.buffer() )

@walberla.callback( "at_end_of_timestep" )
def my_callback( blockstorage, **kwargs ):
    for block in blockstorage:
        # access and analyse simulation data
        velocity_field = to_numpy( block['velocity'] )
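The point of the buffer protocol is that np.asarray wraps the existing memory instead of copying it. A hedged standalone illustration with a plain Python buffer (not a walberla field):

# Hedged illustration: np.asarray on an object exposing the buffer protocol
# wraps the memory without copying, so writes are visible on both sides.
import numpy as np

raw = bytearray(8 * 10)                        # some externally owned memory
view = np.asarray(memoryview(raw).cast('d'))   # wrap as float64 array, no copy

view[0] = 42.0
print(np.frombuffer(raw, dtype=np.float64)[0])  # 42.0: same memory
print(view.base is not None)                    # True: view does not own its data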

Exporting template code: walberla makes heavy use of template code for performance reasons, so all templates have to be instantiated; but which combinations should be instantiated?

namespace field {            // field module
    template<typename T> class Field;
    template<typename T> class FlagField;
}
namespace lbm {              // lbm module
    template<typename FieldType> class VelocityAdaptor;
}
namespace vtk {              // vtk module
    template<typename FieldType> void write( const FieldType & f );
}
namespace postprocessing {   // postprocessing module
    template<typename FieldType> void generateIsosurface( const FieldType & f );
}

Exporting template code: each module declares the field types it provides, the application joins these lists and instantiates the export functions for the joint list.

walberla library:

using boost::mpl::vector;

namespace field {   // field module
    template<typename T> class Field;
    template<typename T> class FlagField;
    typedef vector< Field<double>, Field<int>, FlagField<unsigned char> > FieldTypes;
}
namespace lbm {     // lbm module
    template<typename FieldType> class VelocityAdaptor;
    typedef vector< VelocityAdaptor<double> > FieldTypes;
}

Application:

typedef boost::mpl::joint_view< field::FieldTypes, lbm::FieldTypes > MyFieldTypes;

vtk::python::fieldFunctionExport<MyFieldTypes>();
postprocessing::python::fieldFunctionExport<MyFieldTypes>();

Evaluation example: determine the maximal velocity in the channel. This is a two-stage process due to the distributed storage of the lattice:

import numpy as np

@walberla.callback( "at_end_of_timestep" )
def evaluation( blockstorage, bubbles ):
    # Distributed evaluation
    x_vel_max = 0
    for block in blockstorage:
        vel_field = to_numpy( block['velocity'] )
        x_vel_max = max( vel_field[:,:,:,0].max(), x_vel_max )

    x_vel_max = mpi.reduce( x_vel_max, mpi.max )
    if x_vel_max:   # valid on root only
        log.result( "max X Vel", x_vel_max )
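walberla exposes its own mpi wrapper (mpi.reduce above); as a hedged standalone sketch, the same local-maximum-then-global-reduce pattern looks like this with plain mpi4py and numpy:

# Hedged standalone sketch of the two-stage reduction with mpi4py
# (not the walberla mpi module). Run with e.g.: mpiexec -n 4 python <script>
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# stand-in for the locally stored part of the velocity field
local_velocity = np.random.rand(16, 16, 16, 3)

local_max = local_velocity[:, :, :, 0].max()      # stage 1: local maximum
global_max = comm.reduce(local_max, op=MPI.MAX)   # stage 2: global reduce to root

if comm.rank == 0:                                # valid on root only
    print("max X velocity:", global_max)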

Evaluation: for smaller simulations, a simpler (non-distributed) version:

import numpy as np

@walberla.callback( "at_end_of_timestep" )
def evaluation( blockstorage, bubbles ):
    # Gather and evaluate locally
    size = blockstorage.numberOfCells()
    vel_profile_z = gather_slice( x=size[0]/2, y=size[1]/2, coarsen=4 )
    if vel_profile_z:   # valid on root only
        eval_vel_profile( vel_profile_z )

Evaluation: the Python ecosystem can be used for evaluation: scipy (FFT to get the vortex shedding frequency), sqlite3 (store results of parameter studies), matplotlib (instantly plot results for single-node, debugging runs). Simulation control: use the results of the evaluation during the simulation. Example: when the velocity in the channel no longer changes, a steady flow has developed and the simulation can be stopped (a sketch of such a stopping criterion follows below). All scenario related code (setup, analysis, control) lives in a single file.
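A hedged sketch of such a steady-state stopping criterion (the class and its names are invented for this illustration and are not part of walberla):

# Hedged sketch of a steady-state stopping criterion: stop once the monitored
# quantity changes by less than a tolerance between successive evaluations.
class SteadyStateCriterion:
    def __init__(self, tolerance=1e-8):
        self.tolerance = tolerance
        self.previous = None

    def reached(self, value):
        converged = (self.previous is not None
                     and abs(value - self.previous) < self.tolerance)
        self.previous = value
        return converged

criterion = SteadyStateCriterion()
for step, x_vel_max in enumerate([0.1, 0.12, 0.1201, 0.1201]):
    if criterion.reached(x_vel_max):
        print("steady flow reached at evaluation", step)
        break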

Summary

Summary / Outlook. Summary: Python simplifies all stages of the simulation workflow: setup (nondimensionalization, geometry setup), evaluation (simulation data accessible as numpy arrays, storage to a database, plotting) and control (definition of stopping criteria, output control); all scenario dependent code lives in one file, so no custom C++ application is necessary anymore. Outlook: add the possibility to drive the simulation from Python (export all required data structures).

Thank you!