Handling Parallelisation in OpenFOAM
1 Handling Parallelisation in OpenFOAM
Hrvoje Jasak
Faculty of Mechanical Engineering and Naval Architecture, University of Zagreb, Croatia
Handling Parallelisation in OpenFOAM p. 1
2 Parallelisation in OpenFOAM
Objective
- Present the approach and implementation of massively parallel discretisation and solution method support in OpenFOAM
Topics
- Background on architecture and massive parallelism
- Components
  - Parallel communications library: Pstream
  - Domain decomposition and reconstruction
  - Zero halo layer discretisation support: processor boundary type
- Discretisation and algorithmic issues
  - Cell- and point-based algorithms: FVM and FEM
  - Linear solvers and preconditioners
- Parallelisation of Lagrangian particle tracking
- Running OpenFOAM in parallel
- Summary
3 Background
Massively Parallel Computing
- Today, most large-scale CFD solvers rely on distributed-memory parallel computer architectures for all medium-sized and large simulations
- Parallelisation of commercial solvers is complete: if an algorithm does not parallelise well, it is not used
- Current development work is aimed at bottlenecks, e.g. parallel mesh generation
Parallel Computer Architecture
- Parallel CCM software operates almost exclusively in domain decomposition mode: a large loop (e.g. cell-face loop in the FVM solver) is split into parts and each part is given to a separate CPU. Data dependency is handled explicitly by the software
- Similar to high-performance architecture, parallel computers differ in how each node (CPU) can see and access data (memory) on other nodes:
  - Shared memory machines: a single node sees the complete memory
  - Distributed memory machines: each node is a self-contained unit. Node-to-node communication involves considerable overhead
- For distributed memory machines, a node can be an off-the-shelf PC or a server node: cheap, but limited by the speed of (network) communication
4 Components
Parallel Components and Functionality
1. Parallel communication wrapper
   - Basic information about the run-time environment: serial or parallel execution, number of processors, process IDs etc.
   - Passing information in a transparent and protocol-independent manner
   - Global gather-scatter communication operations
2. Mesh-related operations
   - Mesh and data decomposition and reconstruction
   - Global mesh information, e.g. global mesh size
   - Handling patched pairwise communications
   - Processor topology and communication scheduling data
3. Discretisation support
   - Processor data updates across processors: data consistency
   - Matrix assembly: executing discretisation operations in parallel
4. Linear equation solver support
5. Auxiliary operations, e.g. messaging or algorithmic communications, non-field algorithms (e.g. particle tracking), data input-output
5 Parallel Communications
Stream-Based Parallel Communication Library
- In C++, functional communication is replaced by a stream-based system

    int a = 7;
    // Some C programming, using ellipsis arguments
    printf("hello World, for the %ith time", a);
    // Strongly checked, object-oriented C++ way
    cout << "Hello, World " << a << endl;

- A stream-based system separates the class-dependent part from communications
- A stream is responsible for handling the data in a stream-dependent way. Example: graph in a file or on a screen
- In OpenFOAM, all relevant classes contain an Istream constructor (read operation) and operator<<() (write operation)
- The parallel communication protocol is-a stream: Pstream

    OPstream toNeighb(procPatch.neighbProcNo());
    toNeighb << patchInfo;
    IPstream fromNeighb(procPatch.neighbProcNo());
    fromNeighb >> nbrPatchInfo;
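The separation between the class-dependent part and the transport can be illustrated without any parallel library. The following is a minimal sketch in standard C++ only: `PatchInfo` and `sendAndReceive` are hypothetical names invented for this illustration, and a `std::stringstream` stands in for the message buffer that a parallel stream would hand to the neighbouring processor. It is not the Pstream implementation.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical payload exchanged between two neighbouring sub-domains.
struct PatchInfo
{
    int nFaces;
    double area;
    std::string name;   // assumed to contain no whitespace
};

// Write operation: the class knows its own layout, not the destination.
std::ostream& operator<<(std::ostream& os, const PatchInfo& p)
{
    return os << p.name << ' ' << p.nFaces << ' ' << p.area;
}

// Read operation: mirrors the write order exactly.
std::istream& operator>>(std::istream& is, PatchInfo& p)
{
    return is >> p.name >> p.nFaces >> p.area;
}

// Stand-in for a send followed by a receive: the string buffer plays the
// role of the message that would travel to the neighbouring processor.
PatchInfo sendAndReceive(const PatchInfo& sent)
{
    std::stringstream buffer;
    buffer << sent;            // "send"
    PatchInfo received{};
    buffer >> received;        // "receive"
    assert(!buffer.fail());
    return received;
}
```

Because `PatchInfo` only talks to the generic stream interface, the same two operators would serve a file, the terminal or a communication buffer; only the stream object changes.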
6 Parallel Communications
Stream-Based Parallel Communication Library
- The parallel communications library holds basic run-time information
- Since Pstream is derived from IOstream, no object-level changes are required: sending a floating point array and a mesh object is of the same complexity
- Buffered and compressed transfer are specified as Pstream settings
Implementation
- Most Pstream data and behaviour is generic (does not depend on the underlying communications algorithm): create and destroy, manage buffers, record processor IDs etc.
- The part which is communication-dependent is limited to a few functions: initialise, exit, send data and receive data
- OpenFOAM implements a single Pstream class, but the protocol-dependent library is implemented separately: run-time linkage changes the communication protocol!
- Easy porting to a new communication platform: re-implement the calls in a library... and no changes are required anywhere else in the code
- The communication protocol is chosen by picking the shared library: libpstream
- Optional use of compression on send/receive to speed up communication
7 Domain Decomposition Approach
Parallel FVM Simulation
- The computational domain is split up into meshes, each associated with a single processor:
  - Allocation of cells to processors
  - Physical decomposition of the mesh in the native solver format
  This step is termed domain decomposition
- A mechanism for data transfer between processors needs to be devised: a standard interface to a wrapped communications package
- The solution algorithm is analysed to establish the inter-dependence and points of synchronisation
Mesh partitioning constraints
- Load balance: all processing units should have approximately the same amount of work between communication and synchronisation points
- Minimum communication, relative to local work. Performing local computations is orders of magnitude faster than communicating the data
Data Reconstruction
- Operate in the opposite direction: reconstruct decomposed mesh and data
- Longer term, this should be avoided: parallelisation of the complete simulation
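The two partitioning constraints can be made concrete with a small sketch. It assumes a one-dimensional chain of cells and a contiguous block allocation, chosen only because it is easy to check by hand; the helper names `blockDecompose`, `cellsPerProc` and `nInterProcFaces` are hypothetical and do not correspond to any decomposition method shipped with OpenFOAM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Allocate cells to processors in contiguous blocks of near-equal size.
// Returns cellToProc[celli] = processor index.
std::vector<int> blockDecompose(int nCells, int nProcs)
{
    assert(nCells >= 0 && nProcs > 0);
    std::vector<int> cellToProc(nCells);
    const int base = nCells / nProcs;    // minimum number of cells per processor
    const int extra = nCells % nProcs;   // the first 'extra' processors get one more
    int celli = 0;
    for (int proci = 0; proci < nProcs; ++proci)
    {
        const int n = base + (proci < extra ? 1 : 0);
        for (int i = 0; i < n; ++i)
        {
            cellToProc[celli++] = proci;
        }
    }
    return cellToProc;
}

// Load balance measure: number of cells held by each processor.
std::vector<int> cellsPerProc(const std::vector<int>& cellToProc, int nProcs)
{
    std::vector<int> count(nProcs, 0);
    for (int proci : cellToProc)
    {
        ++count[proci];
    }
    return count;
}

// Communication measure for a 1-D chain of cells: number of faces whose
// two neighbouring cells live on different processors.
int nInterProcFaces(const std::vector<int>& cellToProc)
{
    int n = 0;
    for (std::size_t i = 1; i < cellToProc.size(); ++i)
    {
        if (cellToProc[i] != cellToProc[i - 1])
        {
            ++n;
        }
    }
    return n;
}
```

For 10 cells on 4 processors this allocation gives block sizes of 3, 3, 2 and 2 cells with 3 inter-processor faces, so the work per processor differs by at most one cell while the communication stays small relative to local work.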
8 Parallel Algorithms
Mesh Support
- For purposes of algorithmic analysis, we shall recognise that each cell belongs to one and only one processor: no inherent overlap for computational points
- In FVM, mesh faces can be grouped as follows:
  - Internal faces, within a single processor mesh
  - Boundary faces
  - Inter-processor boundary faces: faces that used to be internal but are now separate and represented on 2 CPUs. No face may belong to more than 2 sub-domains
- FEM (and cell-to-point interpolation) operates on vertices in a similar manner, but a vertex may be multiply shared between processors: overlap is unavoidable
- Algorithmically, there is no change for objects internal to the mesh and on the processor boundary: this is the source of parallel speed-up
- The challenge is to repeat the operations for objects on inter-processor boundaries
9 Parallel Algorithms
Halo Layer Approach
- Traditionally, FVM parallelisation uses the halo layer approach: data for cells next to a processor boundary is duplicated
- The halo layer covers all processor boundaries and is explicitly updated through parallel communications calls
- Major impact on code design: all cell and face loops need to recognise and handle the presence of the halo layer
- The communications pattern is prescribed: only halo information is exchanged
[Figure: decomposition of the global domain into Subdomains 1 to 4]
10 Parallel Algorithms
Zero Halo Layer Approach
- Algorithmically, halo is an inconsistent approach: coupled boundaries with out-of-core addressing exist in other places in the code
- Example: cyclic and periodic boundary conditions pose the same problem
[Figure: cells P and N on either side of a coupled boundary, x-y axes]
- Implementation principle: introduce implicitly updated boundaries with out-of-core addressing
- Specific types: cyclic, periodic, processor (= cyclic with communication), GGI
- The processor boundary update encapsulates communication to do the evaluation. The virtual function mechanism does the rest: no impact on the rest of the code!
- The difference between parallel and serial execution is the presence of the processor boundary type and serial or parallel deployment (mpirun)
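The idea that a processor boundary is "a cyclic with communication" can be sketched with a small class hierarchy. Everything below is an illustrative assumption: the class names `CoupledPatch`, `CyclicPatch` and `ProcessorPatch` are invented for this sketch and are not the OpenFOAM classes, and the "received" buffer is filled by hand instead of by a real message exchange.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Common interface: a coupled boundary supplies the cell values found on
// the other side of each of its faces.
class CoupledPatch
{
public:
    virtual ~CoupledPatch() = default;
    virtual std::vector<double> neighbourValues
    (
        const std::vector<double>& field
    ) const = 0;
};

// Cyclic boundary: the other side lives in the same local field and is
// reached through stored cell addressing.
class CyclicPatch : public CoupledPatch
{
    std::vector<int> nbrCells_;
public:
    explicit CyclicPatch(std::vector<int> nbrCells)
    :
        nbrCells_(std::move(nbrCells))
    {}

    std::vector<double> neighbourValues
    (
        const std::vector<double>& field
    ) const override
    {
        std::vector<double> v;
        for (int celli : nbrCells_)
        {
            v.push_back(field.at(celli));
        }
        return v;
    }
};

// Processor boundary: the other side lives on another processor. The
// stored buffer stands in for data obtained through communication.
class ProcessorPatch : public CoupledPatch
{
    std::vector<double> received_;
public:
    explicit ProcessorPatch(std::vector<double> received)
    :
        received_(std::move(received))
    {}

    std::vector<double> neighbourValues
    (
        const std::vector<double>&
    ) const override
    {
        return received_;
    }
};

// Calling code is identical for both patch types: it never asks whether
// the data came from local memory or from another processor.
double sumNeighbourValues
(
    const CoupledPatch& patch,
    const std::vector<double>& field
)
{
    double s = 0.0;
    for (double v : patch.neighbourValues(field))
    {
        s += v;
    }
    return s;
}
```

The function `sumNeighbourValues` plays the role of "the rest of the code": it is written once against the base class and needs no halo-aware loops.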
11 FVM Discretisation
Explicit FVM Operators: Gradient Calculation
- Using the Gauss theorem, we need to evaluate face values of the variable. For internal faces, this is done through interpolation:
  $\phi_f = f_x \phi_P + (1 - f_x)\phi_N$
- Once calculated, the face value may be re-used until the cell-centred $\phi$ changes
- In parallel, $\phi_P$ and $\phi_N$ live on different processors. Assuming $\phi_P$ is local, $\phi_N$ can be fetched through communication: this is a once-per-solution cost and is obtained by pairwise communication
- Note that all processors perform identical duties: thus, for a processor boundary between domains A and B, evaluation of face values can be done in 3 steps:
  1. Collect a subset of internal cell values from the local domain and send the values to the neighbouring processor
  2. Receive neighbour values from the neighbouring processor
  3. Evaluate the local processor face value using interpolation
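The three steps can be sketched in plain C++ as follows. This is a minimal illustration under stated assumptions: the function names `collectPatchInternalValues` and `interpolateFaces` and the `faceCells` addressing array are hypothetical, and step 2 (the actual exchange) is represented simply by passing one domain's collected values to the other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Step 1: collect the internal cell values next to the processor boundary.
// faceCells[facei] is the local cell adjacent to boundary face facei.
std::vector<double> collectPatchInternalValues
(
    const std::vector<double>& phi,
    const std::vector<int>& faceCells
)
{
    std::vector<double> out;
    out.reserve(faceCells.size());
    for (int celli : faceCells)
    {
        out.push_back(phi.at(celli));
    }
    return out;
}

// Step 3: phi_f = f_x*phi_P + (1 - f_x)*phi_N for every boundary face,
// where phiP is local and phiN was received from the neighbour (step 2).
std::vector<double> interpolateFaces
(
    const std::vector<double>& phiP,
    const std::vector<double>& phiN,
    const std::vector<double>& fx
)
{
    assert(phiP.size() == phiN.size() && phiP.size() == fx.size());
    std::vector<double> phiF(phiP.size());
    for (std::size_t i = 0; i < phiP.size(); ++i)
    {
        phiF[i] = fx[i]*phiP[i] + (1.0 - fx[i])*phiN[i];
    }
    return phiF;
}
```

If domain A uses the weight $f_x$ on a shared face and domain B uses $1 - f_x$, both sides evaluate the same face value, which is what keeps the decomposed result consistent with the serial one.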
12 FVM Discretisation
Finite Volume Matrix Assembly
- Similar to the gradient calculation above, assembly of matrix coefficients on processor boundaries can be done using simple pairwise communication
- In order to assemble the coefficient, we need geometrical information and some interpolated data: all readily available, maybe with some communication
- Example: off-diagonal coefficient of a Laplace operator
  $a_N = \gamma_f \frac{|s_f|}{|d_f|}$
  where $\gamma_f$ is the interpolated diffusion coefficient and the rest are geometry-related properties. In the actual implementation, geometry is calculated locally and interpolation factors are cached to minimise communication
- Discretisation of a convection term is similarly simple
- Sources, sinks and temporal schemes all remain unchanged: each cell belongs to only one processor
13 FEM Discretisation
Supporting FEM Discretisation
- Unlike the FVM, FEM computational points are present on multiple processors
- Since discretisation is element-based, matrix assembly is completed by combining contributions from various processors
- The parallel communication pattern may now be more complex: n processors sharing the same point... but for consistency FEM must operate on the same domain decomposition
- In order to avoid the communications overhead of many small messages in a complex connectivity graph, a 2-tier update is used:
  - Pairwise patch-based communications: long messages, similar to FVM
  - Collapse all globally shared communication into a single gather-scatter operation: avoid small messages and a complex comms pattern
14 Linear Solver Algorithms
Linear Equation Solvers
- The major impact of parallelism in linear equation solvers is in the choice of algorithm. Only algorithms that can operate on a fixed local matrix slice created by local discretisation will give acceptable performance
- In terms of code organisation, each sub-domain creates its own numbering space: locally, equation numbering always starts with zero and one cannot rely on global numbering: it breaks parallel efficiency
- Coefficients related to processor interfaces are kept separate and multiplied through in a separate matrix update
- The impact of processor boundaries will be seen in:
  - Every matrix-vector multiplication operation
  - Every Gauss-Seidel or similar smoothing sweep
  ... but nowhere else!
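A matrix-vector product with separately stored interface coefficients can be sketched as below. The sketch assumes a tridiagonal local matrix (a one-dimensional problem) so that the result is easy to verify by hand; `LocalMatrix`, `Interface`, `localAmul` and `addInterfaceContribution` are hypothetical names for this illustration and do not reproduce OpenFOAM's matrix classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local slice of a tridiagonal matrix, numbered from zero on each sub-domain.
struct LocalMatrix
{
    std::vector<double> diag;    // size n
    std::vector<double> lower;   // size n-1: contribution of cell i to row i+1
    std::vector<double> upper;   // size n-1: contribution of cell i+1 to row i
};

// Coefficients on a processor interface, kept apart from the local matrix.
struct Interface
{
    std::vector<int> faceCells;    // local cell touched by each interface face
    std::vector<double> coeffs;    // off-diagonal coefficient per interface face
};

// y = A_local * x using local addressing only.
std::vector<double> localAmul
(
    const LocalMatrix& A,
    const std::vector<double>& x
)
{
    const std::size_t n = A.diag.size();
    assert(x.size() == n);
    std::vector<double> y(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        y[i] = A.diag[i]*x[i];
    }
    for (std::size_t i = 0; i + 1 < n; ++i)
    {
        y[i] += A.upper[i]*x[i + 1];
        y[i + 1] += A.lower[i]*x[i];
    }
    return y;
}

// Separate interface update: xNbr holds the neighbour values received
// for each interface face (the communication itself is not shown).
void addInterfaceContribution
(
    const Interface& itf,
    const std::vector<double>& xNbr,
    std::vector<double>& y
)
{
    assert(xNbr.size() == itf.faceCells.size());
    for (std::size_t f = 0; f < itf.faceCells.size(); ++f)
    {
        y[itf.faceCells[f]] += itf.coeffs[f]*xNbr[f];
    }
}
```

Splitting a 6-cell matrix with diagonal 2 and off-diagonals -1 into two 3-cell slices and applying both functions on each slice reproduces the product of the undivided matrix, although neither slice ever refers to global cell numbers.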
15 Linear Solver Algorithms
Processor Interface Updates
- Data dependency in out-of-core vector-matrix multiplication is identical to explicit evaluation of shared data during discretisation
- This appears for all implicitly coupled boundary conditions: a virtual base class interface is needed
- lduCoupledInterface handles all out-of-core updates. It is updated after every vector-matrix operation or smoothing sweep
- processorLduCoupledInterface is a derived class, using Pstream for communications and processorFvPatch for addressing
Parallel Algebraic Multigrid (AMG)
- As a rule, Krylov space solvers parallelise naturally: global updates on scaling and residual combined with local vector-matrix operations
- In Algebraic Multigrid, care needs to be given to coarsening algorithms:
  - Aggregative AMG (AAMG) works naturally on matrices without overlap (FVM)
  - For cases with overlap (FEM), Selective AMG works better
- Currently, all algorithms assume uniform communications performance across the machine. For very large clusters, AMG suffers due to lack of scaling: the coarse level serialises the work and boosts communications latency issues
16 Parallel Algorithms
Synchronisation
- Parallel domain decomposition solvers operate such that all processors follow an identical execution path in the code. In order to achieve this, some decisions and control parameters need to be synchronised across all processors
- Example: convergence tolerance. If one of the processors decides convergence is reached and the others do not, they will attempt to continue with iterations and the simulation will lock up waiting for communication
- Global reduce operations synchronise decision-making and appear throughout high-level code. This is built into reduction operators: gSum, gMax, gMin etc.
- Communication in a global reduce is of gather-scatter type: all CPUs send their data to CPU 0, which combines the data and broadcasts it back
- The actual implementation is more clever: using native gather-scatter functionality optimised for the type of inter-connect, or hierarchical communications

    vector force =
        sum
        (
            mesh.Sf().boundaryField()[patchi]
           *p.boundaryField()[patchi]
        );
    reduce(force, sumOp<vector>());
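The gather-scatter pattern and the synchronised convergence decision can be imitated in a single process. In this sketch each vector entry represents the value held by one processor; the function names `gatherScatterSum` and `globallyConverged` are hypothetical and no real communication library is involved.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Gather-scatter sum: values[proci] is the local contribution held by
// processor proci. On return every entry holds the global sum.
void gatherScatterSum(std::vector<double>& values)
{
    assert(!values.empty());
    double total = 0.0;
    for (double v : values)
    {
        total += v;                                   // gather on "CPU 0"
    }
    std::fill(values.begin(), values.end(), total);   // broadcast back
}

// Synchronised decision: every processor tests the same global maximum
// residual, so all of them stop (or continue) together.
bool globallyConverged
(
    const std::vector<double>& localResiduals,
    double tolerance
)
{
    assert(!localResiduals.empty());
    const double globalMax =
        *std::max_element(localResiduals.begin(), localResiduals.end());
    return globalMax < tolerance;
}
```

With local residuals of 1e-7, 1e-3 and 1e-8 and a tolerance of 1e-6, two processors would stop if each tested only its own residual; testing the global maximum makes all three continue, which avoids the lock-up described above.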
17 Lagrangian Particle Tracking
Parallelisation of Particle Tracking
- The particle tracking algorithm operates on each segment of the decomposed mesh by tracking local particles
- The tracking class implements boundary interaction: what happens when a particle hits a boundary face is handled through virtual functions
- Processor boundary interaction: answering the virtual function interface, a particle is migrated to the connecting processor
- Issues with load balancing: loss of performance for cases where particles are not uniformly distributed in the domain
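Particle migration across a processor boundary can be sketched on a one-dimensional mesh. The sketch rests on several simplifying assumptions: sub-domains are consecutive intervals, a particle crosses at most one boundary per step, particles leaving the global domain are dropped, and the hand-over between sub-domains is a plain copy rather than a message. `Particle`, `SubDomain` and `trackAndMigrate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle
{
    double x;   // position
    double u;   // velocity
};

// One sub-domain of a 1-D mesh covering the interval [xMin, xMax).
struct SubDomain
{
    double xMin;
    double xMax;
    std::vector<Particle> particles;
};

// Move all local particles by u*dt. A particle that leaves through a
// processor boundary is handed to the adjacent sub-domain; a particle
// that leaves the global domain is dropped (assumed open ends).
void trackAndMigrate(std::vector<SubDomain>& domains, double dt)
{
    std::vector<std::vector<Particle>> incoming(domains.size());

    for (std::size_t d = 0; d < domains.size(); ++d)
    {
        std::vector<Particle> kept;
        for (Particle p : domains[d].particles)
        {
            p.x += p.u*dt;
            if (p.x >= domains[d].xMax)
            {
                if (d + 1 < domains.size())
                {
                    incoming[d + 1].push_back(p);   // "send" to the right
                }
            }
            else if (p.x < domains[d].xMin)
            {
                if (d > 0)
                {
                    incoming[d - 1].push_back(p);   // "send" to the left
                }
            }
            else
            {
                kept.push_back(p);
            }
        }
        domains[d].particles = kept;
    }

    // "Receive" step: append migrated particles to their new owners.
    for (std::size_t d = 0; d < domains.size(); ++d)
    {
        domains[d].particles.insert
        (
            domains[d].particles.end(),
            incoming[d].begin(),
            incoming[d].end()
        );
    }
}
```

The load-balancing issue is visible even here: the cost of one step on a sub-domain is proportional to the number of particles it currently holds, which is unrelated to its number of cells.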
18 Running OpenFOAM in Parallel
Case Preparation
- Typically, the case will be prepared in one piece: serial mesh generation
- decomposePar: parallel decomposition tool, controlled by decomposeParDict
- Options in the dictionary allow choice of decomposition and auxiliary data
- Upon decomposition, processorNN directories are created with the decomposed mesh and fields; solution controls, model choice and discretisation parameters are shared. Each CPU may use local disk space
- decomposePar may output the cell-to-processor decomposition
- Manual decomposition (debugging): provide a cell-to-processor file
Parallel Execution
- Top-level code does not change between serial and parallel execution: operations related to parallel support are embedded in the library
- Launch the executable using mpirun (or equivalent) with the -parallel option
- Data in time directories is created on a per-processor basis
- It is possible to visualise data from a single CPU (but we do not do it often): there may be problems with processor boundaries
- Field initialisation may also be run in parallel: trivial parallelisation
19 Running OpenFOAM in Parallel
On-the-Fly Data Analysis
- Sampling, graphing and post-processing tools will execute correctly in parallel: the location of probes is reduced over CPUs
- In parallel decomposition, all patches are present on all CPUs (even with zero size). This allows global reduction of patch-based data
Graphical Post-Processing
- Used most often: reconstruct the data to a single CPU with reconstructPar
- Reconstructed data can be re-decomposed to a different number of CPUs
- Dynamic load balancing and some mesh handling features are work-in-progress
- For large data sets, sampling tools may be executed in parallel: extract and merge iso-surfaces for visualisation
20 Summary
Parallelisation in OpenFOAM
- Parallel communications are wrapped in the Pstream library to isolate communication details from library use
- Discretisation uses domain decomposition with the zero halo layer approach
- Parallel updates are a special case of coupled discretisation and linear algebra functionality: the code transparently implements parallel coupling
  - processorFvPatch and the equivalent FEM/FAM class for coupled discretisation updates
  - processorLduInterface and field for linear algebra updates
Conclusion
- OpenFOAM is a mature and validated object-oriented implementation of domain decomposition parallelism
- There are no specific top-level parallelisation requirements: identical code operates in serial and parallel execution
- Porting to new communication protocols is easy
- Future work involves chasing performance with non-trivial communication setup and dynamic load balancing
More informationAdaptive-Mesh-Refinement Pattern
Adaptive-Mesh-Refinement Pattern I. Problem Data-parallelism is exposed on a geometric mesh structure (either irregular or regular), where each point iteratively communicates with nearby neighboring points
More information2 T. x + 2 T. , T( x, y = 0) = T 1
LAB 2: Conduction with Finite Difference Method Objective: The objective of this laboratory is to introduce the basic steps needed to numerically solve a steady state two-dimensional conduction problem
More informationHigh Performance Computing Programming Paradigms and Scalability Part 6: Examples of Parallel Algorithms
High Performance Computing Programming Paradigms and Scalability Part 6: Examples of Parallel Algorithms PD Dr. rer. nat. habil. Ralf-Peter Mundani Computation in Engineering (CiE) Scientific Computing
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 20: Sparse Linear Systems; Direct Methods vs. Iterative Methods Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 26
More informationMultigrid solvers M. M. Sussman sussmanm@math.pitt.edu Office Hours: 11:10AM-12:10PM, Thack 622 May 12 June 19, 2014 1 / 43 Multigrid Geometrical multigrid Introduction Details of GMG Summary Algebraic
More informationWhat is Multigrid? They have been extended to solve a wide variety of other problems, linear and nonlinear.
AMSC 600/CMSC 760 Fall 2007 Solution of Sparse Linear Systems Multigrid, Part 1 Dianne P. O Leary c 2006, 2007 What is Multigrid? Originally, multigrid algorithms were proposed as an iterative method to
More informationData parallel algorithms, algorithmic building blocks, precision vs. accuracy
Data parallel algorithms, algorithmic building blocks, precision vs. accuracy Robert Strzodka Architecture of Computing Systems GPGPU and CUDA Tutorials Dresden, Germany, February 25 2008 2 Overview Parallel
More informationLecture 2 Unstructured Mesh Generation
Lecture 2 Unstructured Mesh Generation MIT 16.930 Advanced Topics in Numerical Methods for Partial Differential Equations Per-Olof Persson (persson@mit.edu) February 13, 2006 1 Mesh Generation Given a
More informationADAPTIVE FINITE ELEMENT
Finite Element Methods In Linear Structural Mechanics Univ. Prof. Dr. Techn. G. MESCHKE SHORT PRESENTATION IN ADAPTIVE FINITE ELEMENT Abdullah ALSAHLY By Shorash MIRO Computational Engineering Ruhr Universität
More informationLecture 4: Locality and parallelism in simulation I
Lecture 4: Locality and parallelism in simulation I David Bindel 6 Sep 2011 Logistics Distributed memory machines Each node has local memory... and no direct access to memory on other nodes Nodes communicate
More informationSimulation of Freak Wave Impact Using the Higher Order Spectrum
Simulation of Freak Wave Impact Using the Higher Order Spectrum The Naval Hydro Pack Hrvoje Jasak and Vuko Vukčević Faculty of Mechanical Engineering and Naval Architecture, Uni Zagreb, Croatia Wikki Ltd.
More informationLarge-scale Gas Turbine Simulations on GPU clusters
Large-scale Gas Turbine Simulations on GPU clusters Tobias Brandvik and Graham Pullan Whittle Laboratory University of Cambridge A large-scale simulation Overview PART I: Turbomachinery PART II: Stencil-based
More informationSpace Filling Curves and Hierarchical Basis. Klaus Speer
Space Filling Curves and Hierarchical Basis Klaus Speer Abstract Real world phenomena can be best described using differential equations. After linearisation we have to deal with huge linear systems of
More informationParallelization Principles. Sathish Vadhiyar
Parallelization Principles Sathish Vadhiyar Parallel Programming and Challenges Recall the advantages and motivation of parallelism But parallel programs incur overheads not seen in sequential programs
More informationT6: Position-Based Simulation Methods in Computer Graphics. Jan Bender Miles Macklin Matthias Müller
T6: Position-Based Simulation Methods in Computer Graphics Jan Bender Miles Macklin Matthias Müller Jan Bender Organizer Professor at the Visual Computing Institute at Aachen University Research topics
More informationAdvanced Databases: Parallel Databases A.Poulovassilis
1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger
More informationMultigrid Methods for Markov Chains
Multigrid Methods for Markov Chains Hans De Sterck Department of Applied Mathematics, University of Waterloo collaborators Killian Miller Department of Applied Mathematics, University of Waterloo, Canada
More informationAutomatic Hex-Dominant Mesh Generation for CFD Analysis of Formula One Car with cfmeshpro
Automatic Hex-Dominant Mesh Generation for CFD Analysis of Formula One Car with cfmeshpro Alen Cukrov and Franjo Juretić Creative Fields Ltd, X Vrbik 4, 10000 Zagreb, Croatia 1 Introduction This report
More informationLecture 17: Array Algorithms
Lecture 17: Array Algorithms CS178: Programming Parallel and Distributed Systems April 4, 2001 Steven P. Reiss I. Overview A. We talking about constructing parallel programs 1. Last time we discussed sorting
More informationConstruction and application of hierarchical matrix preconditioners
University of Iowa Iowa Research Online Theses and Dissertations 2008 Construction and application of hierarchical matrix preconditioners Fang Yang University of Iowa Copyright 2008 Fang Yang This dissertation
More informationCFD exercise. Regular domain decomposition
CFD exercise Regular domain decomposition Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationParallel High-Order Geometric Multigrid Methods on Adaptive Meshes for Highly Heterogeneous Nonlinear Stokes Flow Simulations of Earth s Mantle
ICES Student Forum The University of Texas at Austin, USA November 4, 204 Parallel High-Order Geometric Multigrid Methods on Adaptive Meshes for Highly Heterogeneous Nonlinear Stokes Flow Simulations of
More informationD036 Accelerating Reservoir Simulation with GPUs
D036 Accelerating Reservoir Simulation with GPUs K.P. Esler* (Stone Ridge Technology), S. Atan (Marathon Oil Corp.), B. Ramirez (Marathon Oil Corp.) & V. Natoli (Stone Ridge Technology) SUMMARY Over the
More informationAlgebraic Multigrid (AMG) for Ground Water Flow and Oil Reservoir Simulation
lgebraic Multigrid (MG) for Ground Water Flow and Oil Reservoir Simulation Klaus Stüben, Patrick Delaney 2, Serguei Chmakov 3 Fraunhofer Institute SCI, Klaus.Stueben@scai.fhg.de, St. ugustin, Germany 2
More informationFinal drive lubrication modeling
Final drive lubrication modeling E. Avdeev a,b 1, V. Ovchinnikov b a Samara University, b Laduga Automotive Engineering Abstract. In this paper we describe the method, which is the composition of finite
More informationAn introduction to mesh generation Part IV : elliptic meshing
Elliptic An introduction to mesh generation Part IV : elliptic meshing Department of Civil Engineering, Université catholique de Louvain, Belgium Elliptic Curvilinear Meshes Basic concept A curvilinear
More informationData mining with sparse grids
Data mining with sparse grids Jochen Garcke and Michael Griebel Institut für Angewandte Mathematik Universität Bonn Data mining with sparse grids p.1/40 Overview What is Data mining? Regularization networks
More informationRobot Mapping. Least Squares Approach to SLAM. Cyrill Stachniss
Robot Mapping Least Squares Approach to SLAM Cyrill Stachniss 1 Three Main SLAM Paradigms Kalman filter Particle filter Graphbased least squares approach to SLAM 2 Least Squares in General Approach for
More informationGraphbased. Kalman filter. Particle filter. Three Main SLAM Paradigms. Robot Mapping. Least Squares Approach to SLAM. Least Squares in General
Robot Mapping Three Main SLAM Paradigms Least Squares Approach to SLAM Kalman filter Particle filter Graphbased Cyrill Stachniss least squares approach to SLAM 1 2 Least Squares in General! Approach for
More informationLecture 4: Principles of Parallel Algorithm Design (part 4)
Lecture 4: Principles of Parallel Algorithm Design (part 4) 1 Mapping Technique for Load Balancing Minimize execution time Reduce overheads of execution Sources of overheads: Inter-process interaction
More informationParallel Mesh Partitioning in Alya
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Parallel Mesh Partitioning in Alya A. Artigues a *** and G. Houzeaux a* a Barcelona Supercomputing Center ***antoni.artigues@bsc.es
More informationMultiphysics simulations of nuclear reactors and more
Multiphysics simulations of nuclear reactors and more Gothenburg Region OpenFOAM User Group Meeting Klas Jareteg klasjareteg@chalmersse Division of Nuclear Engineering Department of Applied Physics Chalmers
More informationFluent User Services Center
Solver Settings 5-1 Using the Solver Setting Solver Parameters Convergence Definition Monitoring Stability Accelerating Convergence Accuracy Grid Independence Adaption Appendix: Background Finite Volume
More informationIOS: A Middleware for Decentralized Distributed Computing
IOS: A Middleware for Decentralized Distributed Computing Boleslaw Szymanski Kaoutar El Maghraoui, Carlos Varela Department of Computer Science Rensselaer Polytechnic Institute http://www.cs.rpi.edu/wwc
More informationReal-Time Shape Editing using Radial Basis Functions
Real-Time Shape Editing using Radial Basis Functions, Leif Kobbelt RWTH Aachen Boundary Constraint Modeling Prescribe irregular constraints Vertex positions Constrained energy minimization Optimal fairness
More informationImmersed Boundary Method in FOAM
Immersed Boundary Method in FOAM Theory, Implementation and Use Hrvoje Jasak and Željko Tuković Chalmers University, Gothenburg Faculty of Mechanical Engineering and Naval Architecture, Zagreb Immersed
More informationsmooth coefficients H. Köstler, U. Rüde
A robust multigrid solver for the optical flow problem with non- smooth coefficients H. Köstler, U. Rüde Overview Optical Flow Problem Data term and various regularizers A Robust Multigrid Solver Galerkin
More informationBDDCML. solver library based on Multi-Level Balancing Domain Decomposition by Constraints copyright (C) Jakub Šístek version 1.
BDDCML solver library based on Multi-Level Balancing Domain Decomposition by Constraints copyright (C) 2010-2012 Jakub Šístek version 1.3 Jakub Šístek i Table of Contents 1 Introduction.....................................
More informationOptimising MPI Applications for Heterogeneous Coupled Clusters with MetaMPICH
Optimising MPI Applications for Heterogeneous Coupled Clusters with MetaMPICH Carsten Clauss, Martin Pöppe, Thomas Bemmerl carsten@lfbs.rwth-aachen.de http://www.mp-mpich.de Lehrstuhl für Betriebssysteme
More informationVolume Illumination, Contouring
Volume Illumination, Contouring Computer Animation and Visualisation Lecture 0 tkomura@inf.ed.ac.uk Institute for Perception, Action & Behaviour School of Informatics Contouring Scaler Data Overview -
More information