Performance Optimization of a Massively Parallel Phase-Field Method Using the HPC Framework waLBerla
1 Performance Optimization of a Massively Parallel Phase-Field Method Using the HPC Framework waLBerla
SIAM PP 2016, April 13th, 2016
Martin Bauer, Florian Schornbaum, Christian Godenschwager, Johannes Hötzer, Harald Köstler and Ulrich Rüde
Chair for System Simulation, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
2 Outline
- Motivation
- waLBerla Framework
- Phase-Field Method in waLBerla
- Single-Core Optimizations
- Asynchronous Communication
- I/O and Post-processing
- In-Situ Processing with Python
- Summary and Outlook
3 Motivation
- large domains are required to reduce the influence of the boundaries
- some physical patterns (spirals) only occur in highly resolved simulations
- goal: simulate big domains in 3D
- a phase-field code from KIT is available, but it is unoptimized, general-purpose code
- goal: write an optimized parallel version for a specific model
4 The waLBerla Framework
5 waLBerla Framework
- waLBerla: widely applicable Lattice-Boltzmann from Erlangen
- HPC software framework, originally developed for CFD simulations with the Lattice Boltzmann Method (LBM)
- evolved into a general framework for algorithms on block-structured grids
- example applications (figures): Vocal Fold Study (Florian Schornbaum), Free Surface Flow, Fluid Structure Interaction (Simon Bogner)
6 waLBerla Framework
- written in C++ with Python extensions
- hybridly parallelized (MPI + OpenMP)
- no data structures growing with the number of processes involved
- scales from laptop to recent petascale machines
- parallel I/O
- portable (compiler/OS)
- automated tests / CI servers
- open source release planned
- llvm/clang
7 Block-structured Grids
Domain Decomposition & Distribution to Processes:
- regular decomposition into blocks containing uniform grids
- grid refinement: octree-like decomposition
- in most cases, if a regular decomposition of a uniform grid is used, exactly one block is assigned to each process
- forest of octrees: each block contains a uniform grid of the same size
- 2:1 balance between neighboring cells on level transitions
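The regular decomposition on this slide can be illustrated with a small sketch. It assumes a uniform grid that divides evenly into blocks and assigns exactly one block to each process rank. The function name `decompose` and the returned layout dictionary are hypothetical and are not part of the waLBerla API.

```python
from itertools import product

def decompose(domain, blocks_per_axis):
    """Split a uniform grid into equally sized blocks, one block per rank.

    Hypothetical helper for illustration only; not a waLBerla function.
    Returns {rank: (origin, block_size)}.
    """
    for cells, blocks in zip(domain, blocks_per_axis):
        if cells % blocks != 0:
            raise ValueError("domain must divide evenly into blocks")
    block_size = tuple(c // b for c, b in zip(domain, blocks_per_axis))
    layout = {}
    # Enumerate block indices; the enumeration order defines the rank.
    for rank, idx in enumerate(product(*(range(b) for b in blocks_per_axis))):
        origin = tuple(i * s for i, s in zip(idx, block_size))
        layout[rank] = (origin, block_size)
    return layout

layout = decompose(domain=(8, 8, 4), blocks_per_axis=(2, 2, 1))
for rank, (origin, size) in layout.items():
    print(rank, origin, size)
```

Every block has the same size, so each process holds the same number of cells, which is what makes the one-block-per-process assignment balanced in the uniform case.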
8 Hybrid Parallelization
Distributed Memory Parallelization: MPI
- data exchange on borders between blocks via ghost layers (figure: sender process, receiver process)
- slightly more complicated for non-uniform domain decompositions, but the same general ideas still apply
- support for overlapping communication and computation
- some advanced models require more complex communication patterns (e.g. free-surface and fluid-structure interaction)
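A minimal sketch of the ghost layer idea, assuming two neighboring 1D blocks that each carry one ghost cell per side. In an MPI code the two assignments would be a send/receive pair between processes; here both blocks live in one process, and the function name is hypothetical.

```python
def exchange_ghost_layers(left, right):
    """Copy border cells into the neighbor's ghost cells (1D, one ghost layer).

    Each block is a list laid out as [ghost, interior..., ghost].
    Illustration only: a plain copy stands in for an MPI message.
    """
    right[0] = left[-2]   # last interior cell of the left block -> right block's ghost
    left[-1] = right[1]   # first interior cell of the right block -> left block's ghost

left = [None, 1.0, 2.0, 3.0, None]
right = [None, 4.0, 5.0, 6.0, None]
exchange_ghost_layers(left, right)
print(left, right)
```

After the exchange, a stencil that reads one neighbor on each side can update every interior cell of a block without looking outside the block's own memory.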
9 Phase field in waLBerla
10 Phase field algorithm
- two lattices (fields):
- phase field φ with 4 components
- chemical potential μ with 2 components
- storing two time steps in src and dst fields
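The src/dst scheme can be sketched as follows. The update rule is a simple 1D diffusion stencil chosen purely as a stand-in (an assumption; it is not the phase-field model of the talk). The point is only that each step reads from `src`, writes to `dst`, and then swaps the two instead of copying.

```python
def step(src, dst, alpha=0.1):
    """One explicit update: read only from src, write only to dst.

    The diffusion stencil is a placeholder, not the actual phi or mu kernel.
    """
    dst[0], dst[-1] = src[0], src[-1]          # keep the end values fixed
    for i in range(1, len(src) - 1):
        dst[i] = src[i] + alpha * (src[i - 1] - 2.0 * src[i] + src[i + 1])

src = [0.0, 0.0, 1.0, 0.0, 0.0]
dst = [0.0] * len(src)
for _ in range(2):
    step(src, dst)
    src, dst = dst, src                        # swap roles, no data is copied

print(src)
```

Because the kernel never writes into the field it reads from, the update of one cell cannot see partially updated neighbors, which also makes the loop safe to vectorize or run in parallel.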
11 Optimizations of Phase Field algorithm
12 Optimization
Model Layer:
- moving window technique
- simplifications due to special setup (e.g. analytical temperature gradient)
- shortcuts: some terms can be neglected in certain cell types
Algorithm Layer:
- access patterns / stencils
- overlapping computation and communication
- eliminate common subexpressions
Hardware Layer:
- SIMDification
- memory layout (AoS vs. SoA)
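The AoS vs. SoA item can be made concrete with the index arithmetic of the two layouts. This is a generic sketch; the slide does not say which layout each field uses, and the function names are hypothetical.

```python
def aos_index(cell, component, num_components):
    """Array of structures: all components of one cell are adjacent in memory."""
    return cell * num_components + component

def soa_index(cell, component, num_cells):
    """Structure of arrays: one component of all cells is adjacent in memory."""
    return component * num_cells + cell

num_cells, num_components = 8, 4   # 4 components, as for the phase field

# Distance in memory between the same component of two neighboring cells.
aos_stride = aos_index(1, 0, num_components) - aos_index(0, 0, num_components)
soa_stride = soa_index(1, 0, num_cells) - soa_index(0, 0, num_cells)
print(aos_stride, soa_stride)
```

With SoA the same component of consecutive cells sits at unit stride, which suits SIMD loads across cells. With AoS all components of one cell share a cache line, which suits kernels that need every component of a cell at once.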
13 Single Core Optimization Results
[Figure: bar chart of performance in MLUP/s (for φ-kernel only), with separate bars for interface, solid and liquid cells. Stage labels in extraction order: general purpose C code; basic waLBerla implementation; with SIMD intrinsics; single cell; with T(z) optimization; with staggered buffer; with shortcuts (cellwise branching). Speedup labels in extraction order: 6, 28, 59, 67; a fifth speedup value is missing from the transcription.]
- test system: one SuperMUC core (Intel Xeon E C)
- for both kernels: systematic performance engineering leads to 80x faster code
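For readers unfamiliar with the unit, MLUP/s means million lattice (cell) updates per second. The sketch below shows how the rate and a speedup factor are computed. The numbers passed in are hypothetical placeholders chosen for easy arithmetic; they are not measurements from the talk.

```python
def mlups(cells, time_steps, seconds):
    """Million lattice updates per second for one kernel run."""
    return cells * time_steps / seconds / 1e6

def speedup(optimized_mlups, baseline_mlups):
    """Ratio of two update rates measured on the same problem."""
    return optimized_mlups / baseline_mlups

# Hypothetical run: a 60^3 block, 100 time steps, 10.8 s wall-clock time.
rate = mlups(cells=60**3, time_steps=100, seconds=10.8)
print(rate)

# Hypothetical rates, only to show how a speedup factor is formed.
print(speedup(optimized_mlups=8.0, baseline_mlups=0.1))
```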
14 Optimization Algorithm Layer Optimization
15 Communication Overlap (Algorithm Layer Optimization)
- communication of μ can be overlapped without kernel adaptations
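The overlap pattern can be sketched in a single process. A worker thread stands in for a non-blocking ghost layer exchange (an assumption; the talk uses MPI), the averaging stencil is a placeholder kernel, and all names are hypothetical. Inner cells are updated while the exchange is in flight, and only the border cells wait for the fresh ghost values.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_ghosts(left_value, right_value):
    """Stand-in for a non-blocking ghost layer exchange with two neighbors."""
    return left_value, right_value

def update(src, dst, i):
    """Placeholder three-point stencil; not the actual mu kernel."""
    dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0

def overlapped_step(src, left_value, right_value):
    """src is laid out as [ghost, interior..., ghost]; returns the new block."""
    n = len(src)
    dst = [0.0] * n
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_ghosts, left_value, right_value)  # start exchange
        for i in range(2, n - 2):            # inner cells never read ghost cells
            update(src, dst, i)
        src[0], src[-1] = pending.result()   # wait until the ghost values arrived
    dst[0], dst[-1] = src[0], src[-1]
    for i in (1, n - 2):                     # border cells read the fresh ghosts
        update(src, dst, i)
    return dst

block = [0.0, 1.0, 2.0, 3.0, 4.0, 0.0]
print(overlapped_step(block, left_value=0.0, right_value=5.0))
```

The kernel itself is unchanged; only the iteration range is split into an inner part and a border part, which matches the slide's remark that no kernel adaptations are needed.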
16 Communication Overlap (Algorithm Layer Optimization)
17 Optimization Algorithm Layer Optimization
18 Optimization Algorithm Layer Optimization
19 Scaling Results: SuperMUC, JUQUEEN
20 I/O and Postprocessing
21 Managing I/O
- I/O is necessary to store results (frequently) and for checkpointing (seldom)
- for highly parallel simulations, the output of results quickly becomes a bottleneck
- example: storing one time step of a (2420 x 2420 x 1474) domain as a voxel file: 386 GB
- solution: generate a surface mesh from the voxel data during the simulation, locally on each process, using a marching cubes algorithm
- one mesh for each phase boundary
- mesh size: < 10 MB
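The quoted file size can be checked with a short calculation. It assumes that all 6 values per cell (4 φ components and 2 μ components) are stored in double precision; the slide does not state this, but under that assumption the result matches the quoted figure when GB is read as GiB.

```python
cells = 2420 * 2420 * 1474
values_per_cell = 4 + 2      # assumption: 4 phi components + 2 mu components
bytes_per_value = 8          # assumption: double precision

size_bytes = cells * values_per_cell * bytes_per_value
size_gib = size_bytes / 2**30
print(cells, round(size_gib, 1))

# A surface mesh below 10 MB is smaller by at least this factor.
mesh_bytes = 10 * 10**6
reduction = size_bytes / mesh_bytes
print(int(reduction))
```

The voxel output grows with the volume of the domain, while the surface mesh grows only with the area of the phase boundaries, which is why the reduction factor is so large.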
22 Managing I/O
- surface meshes are still unnecessarily finely resolved: one triangle per interface cell
23 Managing I/O
- quadric edge reduce algorithm (cglib)
- crucial: mesh reduction step preserves boundary vertices
- hierarchical mesh coarsening and reduction during simulation
- result: one coarse mesh with a size on the order of several MB
- figure labels: local fine meshes generated by marching cubes; coarse mesh on root
24 Python Coupling
- extracting relevant data while the simulation is running
- direct, efficient array access via the Python numpy package
- data is shared, not copied
- using the boost::python library to connect C++ code with Python
- further applications:
- flexible configuration
- model development: Matlab-like functionality available
- diagram components: waLBerla python_coupling module, boost::python, libpython
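The "shared, not copied" point can be demonstrated with Python's buffer protocol, the same mechanism that lets an array object wrap memory owned by other code. To stay within the standard library, this sketch uses `array` and `memoryview` as stand-ins for a simulation field and its numpy view; it does not use the waLBerla Python module.

```python
from array import array

field = array("d", [0.0, 0.25, 0.5, 0.75, 1.0])  # stand-in for simulation data
view = memoryview(field)       # shares the buffer, no copy is made
interior = view[1:-1]          # slicing a memoryview is also zero-copy

field[2] = 42.0                # the "simulation" writes a new value
seen_by_view = interior[1]     # the view observes it immediately

interior[0] = -1.0             # writes through the view reach the field

copied = field.tolist()        # an explicit copy, for contrast
field[3] = 7.0                 # later changes do not reach the copy
print(seen_by_view, field[1], copied[3], interior[2])
```

Sharing avoids duplicating large fields in memory for in-situ analysis, at the price that the script sees (and can modify) live simulation data.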
25 Simplify Workflow
26 Python Coupling
- Method 1: using Python from C++ (host language: C++)
- Method 2: using C++ from Python (host language: Python)
- diagram components: waLBerla python_coupling module, Python interpreter, boost::python, walberla.so, libpython
27 Python Coupling
- Method 1: using Python from C++ (host language: C++)
- Method 2: using C++ from Python (host language: Python)
- diagram components: waLBerla python_coupling module, Python interpreter, boost::python, walberla.so, libpython
- Demo
28 Summary
29 Summary / Outlook
Summary:
- an efficient phase-field algorithm is necessary to simulate certain physical effects (spirals)
- systematic performance engineering on several levels
- speedup by a factor of 80 compared to the original version
- parallel output data processing during simulation to reduce result file size
- coupling to the Python scripting language for in-situ processing
Outlook:
- GPU implementation
- coupling to the Lattice Boltzmann Method
- improve discretization scheme (implicit method)
30 Thank you! Questions?