Automated performance modeling of the UG4 simulation framework


1 Automated performance modeling of the UG4 simulation framework Andreas Vogel 1, Alexandru Calotoiu 2, Arne Nägel 1, Sebastian Reiter 1, Alexandre Strube 3, Gabriel Wittum 1, Felix Wolf 2. 1 Goethe-Universität Frankfurt, Frankfurt am Main, Germany; 2 Technische Universität Darmstadt, Darmstadt, Germany; 3 Forschungszentrum Jülich, Jülich, Germany. Munich, January 2016. Sebastian Reiter, G-CSC Frankfurt: Performance modeling of the UG4 simulation framework

2 Introduction Goal: find scaling issues in large parallel codes. Target code: UG4 (Unstructured Grids 4), a simulation framework for solving partial differential equations on unstructured grids. Idea: create performance models with fine granularity. Usage: measure the scaling behavior of different code kernels and create a scaling model for each such kernel. Doing this for thousands of kernels requires automation! Automated performance analysis identifies current bottlenecks, predicts possible bottlenecks for even larger parallel runs, and allows predicting resource consumption for larger core counts.

3 Outline UG4: overview and parallelization concepts. Application: drug diffusion through human skin. Performance modeling: ideas and results.

4 UG4 overview UG4 1 is a simulation framework for the solution of PDEs: fast, cross-platform C++ code with a modular, plugin-based structure. Full scripting support through Lua and Java. Flexible distributed unstructured grids. FE/FV discretizations. Various applications such as groundwater flow, elasticity, neuroscience, biology, and many more. About 100k lines of code. Developed with massively parallel applications in mind: multigrid solvers for optimal complexity (important!), efficient distribution of grid hierarchies, and very good scalability shown at large core counts 2. 1) Vogel, A., Reiter, S., Rupp, M., Nägel, A., Wittum, G.: UG 4: A novel flexible software system for simulating PDE based models on high performance computers. Comput. Vis. Sci. 16(4) (2013) 2) Reiter, S., Vogel, A., Heppner, I., Rupp, M., Wittum, G.: A massively parallel geometric multigrid solver on hierarchically distributed grids. Comput. Vis. Sci. 16(4) (2013)

5 UG4 parallelization concepts Interfaces allow for data exchange between distributed grid objects. [Figure: grid parts A, B, C coupled through interfaces on level l; I^MH_{l,a,b} denotes a master interface and I^SH_{l,b,a} the matching slave interface.]

6 UG4 parallelization concepts Interfaces allow for data exchange between distributed grid objects. Hierarchical distribution guarantees a good computation/communication ratio. [Figure: grid hierarchy of levels l distributed over processes P0 to P3, with ghost copies and v-interfaces connecting the levels.]

7 UG4 parallelization concepts Interfaces allow for data exchange between distributed grid objects. Hierarchical distribution guarantees a good computation/communication ratio. Horizontal and vertical interfaces are used for communication.

8 Application: Drug diffusion through human skin Model of human skin 1: 3d geometry with the layers stratum corneum, stratum granulosum, stratum spinosum, and stratum basale. Application of medicine leads to diffusion of substances into the skin. Questions: pathway of the substances, temporal behavior, ... Diffusion equation: ∂_t c_s(t, x) = ∇ · (D_s ∇ c_s(t, x)), s ∈ {cor, lip}. FV simulation of drug diffusion through the stratum corneum: hex grid with corneocytes (skin cells, red) and lipid channels (green). Difficulties: jumping coefficients and anisotropic elements. 1) Nägel, A., Heisig, M., Wittum, G.: A comparison of two- and three-dimensional models for the simulation of the permeability of human stratum corneum. European Journal of Pharmaceutics and Biopharmaceutics 72(2) (2009)
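The difficulty of jumping coefficients can be seen in a minimal 1d steady-state analogue. The sketch below is illustrative only (values and discretization are assumptions, not UG4 code): a finite-volume discretization with harmonic averaging of the diffusion coefficient at cell faces keeps the discrete flux continuous across a coefficient jump.

```python
import numpy as np

# 1d steady-state diffusion -d/dx(D dc/dx) = 0 on [0,1], c(0)=1, c(1)=0,
# with a jumping coefficient (values illustrative, not from the talk)
n = 100
x = (np.arange(n) + 0.5) / n          # cell centers
D = np.where(x < 0.5, 1e-3, 1.0)      # coefficient jump at x = 0.5
h = 1.0 / n

# harmonic average of D at interior faces -> continuous flux across the jump
Df = 2 * D[:-1] * D[1:] / (D[:-1] + D[1:])

A = np.zeros((n, n))
b = np.zeros(n)
for i in range(n):
    wl = Df[i-1] if i > 0 else 2*D[0]      # half-cell face coefficients
    wr = Df[i] if i < n-1 else 2*D[-1]     # for the Dirichlet boundaries
    A[i, i] = (wl + wr) / h**2
    if i > 0:   A[i, i-1] = -wl / h**2
    if i < n-1: A[i, i+1] = -wr / h**2
b[0] = 2*D[0] / h**2 * 1.0                 # boundary value c(0) = 1
c = np.linalg.solve(A, b)

# at steady state the flux must be (nearly) constant across the jump
flux = -Df * (c[1:] - c[:-1]) / h
print(flux.std() / abs(flux.mean()))       # tiny relative variation
```

An arithmetic average at the faces would smear the jump; the harmonic average is the standard choice for discontinuous coefficients.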

9 Skin problem: Solution process Generation of the MG hierarchy through parallel anisotropic refinement: the coarse grid has bad element quality, the refined grids good element quality. Element quality improves with each refinement, leading to fewer solver steps.
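The idea behind anisotropic refinement can be sketched with a toy model (illustrative numbers and rules, not UG4's refinement code): an element is split only along its long axes, so the worst aspect ratio halves with each level until the mesh is isotropic enough for regular refinement.

```python
# toy anisotropic refinement: an element is (width, height); split only
# the long axis while the aspect ratio is bad, otherwise split both axes
def refine(elems, threshold=2.0):
    out = []
    for w, h in elems:
        if w > threshold * h:
            out += [(w / 2, h)] * 2        # anisotropic: split width only
        elif h > threshold * w:
            out += [(w, h / 2)] * 2        # anisotropic: split height only
        else:
            out += [(w / 2, h / 2)] * 4    # regular refinement
    return out

aspect = lambda e: max(e) / min(e)

elems = [(32.0, 1.0)]                      # badly anisotropic coarse element
for level in range(6):
    elems = refine(elems)
    print(level, max(aspect(e) for e in elems))
```

After a few anisotropic levels the worst aspect ratio drops from 32 to 2, which is exactly why the solver needs fewer steps on the finer grids.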

10 Skin problem: Solution process Geometry: brick-and-mortar model. The outermost part of the epidermis (stratum corneum) consists of flattened, dead cells (corneocytes) that are surrounded by inter-cellular lipid. The stratum corneum is a natural barrier that protects the underlying tissue but still allows a certain throughput of substances (e.g., drugs, medicine). The latter process can be modeled as a diffusion process in which the diffusion coefficient within the corneocytes differs from the one in the lipid: ∂_t c_s(t, x) = ∇ · (D_s ∇ c_s(t, x)). The diffusion coefficient D_s is assumed to be constant within each subdomain s ∈ {cor, lip}, but may differ between subdomains. We compute the steady state concentration distribution and the physically interesting fluxes at the bottom of the domain. Solver: a geometric multigrid method accelerated by an outer conjugate gradient method. The multigrid uses a damped Jacobi smoother with two (resp. three) smoothing steps in 2d (resp. 3d), a V-cycle, and an LU base solver. Iterations are completed once a prescribed absolute reduction of the residuum is achieved. The main difficulty of this problem is the bad aspect ratio of the computational domain (lipid channels of 0.1 µm vs. 30 µm); this is resolved through anisotropic refinement. Timings and iteration counts for the solver are collected using the JUBE benchmarking environment: preparation, compilation, execution, and analysis steps are performed in sequence, with automatic pre- and post-processing scripts that extract the desired information. JUBE can easily create combinatorial runs over multiple parameters (e.g., process counts, solver setups, physical parameters), generate one experiment per combination, submit them all to the resource manager, and collect and display the results together. 1) Nägel, A., Heisig, M., Wittum, G.: The state of the art in computational modelling of skin permeation. Advanced Drug Delivery Reviews (2012)
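JUBE's combinatorial expansion of a parameter study can be mimicked in a few lines (a sketch of the idea, not JUBE itself; the parameter names are illustrative): the Cartesian product of all parameter lists yields one run configuration per combination.

```python
from itertools import product

# parameter space of a hypothetical weak-scaling study (names illustrative)
params = {
    "processes": [16, 64, 256, 1024],
    "solver":    ["gmg", "cg"],
    "D_cor":     [1e-3, 1e-2],
}

# one run configuration per combination, as JUBE would generate and submit
runs = [dict(zip(params, combo)) for combo in product(*params.values())]
for run in runs[:3]:
    print(run)
print(len(runs), "runs in total")
```

Each of the 16 configurations would then be prepared, submitted to the resource manager, and its results collected and tabulated.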

11 Performance modeling for UG4: Setting Setting: UG4's code contains PROFILE calls at many crucial points. Different profiler backends exist; here Score-P is used. Weak scaling study for the steady state drug diffusion problem. Solver: geometric multigrid, Jacobi smoother, outer CG. 1) A. Vogel, A. Calotoiu, A. Strube, S. Reiter, A. Nägel, F. Wolf, G. Wittum: 10,000 performance models per minute: scalability of the UG4 simulation framework. In: J. L. Träff, S. Hunold, F. Versaci (eds.), Euro-Par 2015: Parallel Processing, Theoretical Computer Science and General Issues, Springer International Publishing, 2015

12 Performance modeling for UG4 Performance modeling: run simulations at different process numbers and record timings. Generate a performance model for each code kernel and each metric (5-10) by finding the best fit in PMNF (*) 1,2,3: f(p) = Σ_{k=1}^{n} c_k · p^{i_k} · log₂^{j_k}(p). Sort and analyze the models by asymptotic behavior. A complexity of O(log p) is considered fine. (*): Performance Model Normal Form 1) Calotoiu, A., Hoefler, T., Poke, M., Wolf, F.: Using automated performance modeling to find scalability bugs in complex codes. In: Proc. ACM/IEEE Conference on Supercomputing (SC13), Denver, CO, USA. ACM (November 2013) 2) Calotoiu, A., Hoefler, T., Wolf, F.: Mass-producing insightful performance models. In: Workshop on Modeling & Simulation of Systems and Applications, Seattle, Washington (Aug 2014) 3) Picard, R.R., Cook, R.D.: Cross-validation of regression models. Journal of the American Statistical Association 79(387) (1984)
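A minimal version of this model search can be sketched as follows (an illustration of the PMNF idea, not the actual tool): for a single-term model f(p) = c₀ + c₁ · p^i · log₂^j(p), try all candidate exponents (i, j), fit the coefficients by least squares, and keep the hypothesis with the smallest residual.

```python
import numpy as np
from itertools import product

def fit_pmnf(p, t, I=(0, 0.25, 0.5, 1, 2), J=(0, 1, 2)):
    """Fit t ~ c0 + c1 * p**i * log2(p)**j over candidate exponents (i, j)."""
    best = None
    for i, j in product(I, J):
        X = np.column_stack([np.ones_like(p), p**i * np.log2(p)**j])
        coef, *_ = np.linalg.lstsq(X, t, rcond=None)
        err = np.sum((X @ coef - t) ** 2)          # residual of this hypothesis
        if best is None or err < best[0]:
            best = (err, i, j)
    return best[1], best[2]

# synthetic kernel timings that behave like c * log2(p)**2
p = np.array([64., 128., 256., 512., 1024.])
t = 3.0 + 0.5 * np.log2(p) ** 2
print(fit_pmnf(p, t))   # expect exponents (0, 2), i.e. O(log^2 p)
```

The real tool fits multi-term models and guards against overfitting with cross-validation (reference 3 above); the exhaustive search over a small exponent set is the same idea.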

13 Results I: Identifying bottlenecks Models (weak-scaling analysis of the 3d skin model): the kernels LoadUGScript → MPI_Allreduce, init levels → MPI_Allreduce, and init top surface → MPI_Allreduce show models for time and bytes sent with a p · O(MPI_Allreduce) dependency, which will lead to scalability issues at large process counts. By contrast, GMG kernels such as GMG → PreSmooth → jacobi and GMG → prolongate scale with O(log p). Where: InitSolver. Processes have to signal whether they want to take part in communication on certain levels; an MPI communicator group is created for each level of the hierarchy, excluding processes that do not own a grid part on that level. Found issue: to inform every process of these memberships, an MPI_Allreduce over an array of length p was used. Now: MPI_Allreduce is replaced by MPI_Comm_split, which also eliminates the linearly growing input. Algorithms for MPI_Comm_split are known to scale with O(log p) 1. First tests do not show a significant improvement in runtime, however the dependency is now O(MPI_Comm_split), whose scaling properties have been analyzed. Fast issue location is possible thanks to fine-grained performance models. 1) Siebert, C., Wolf, F.: Parallel sorting with minimal data. In: Recent Advances in the Message Passing Interface, Springer, 2011
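The gist of the fix can be illustrated without MPI (a sketch of the semantics, not UG4's code): instead of every process all-reducing a membership array of length p, each process contributes a single color per level, and processes with equal color form a group, which is what MPI_Comm_split provides.

```python
from collections import defaultdict

def comm_split(colors):
    """Group ranks by color, as MPI_Comm_split would; -1 means non-member
    (the analogue of passing MPI_UNDEFINED as the color)."""
    groups = defaultdict(list)
    for rank, color in enumerate(colors):
        if color != -1:
            groups[color].append(rank)
    return dict(groups)

# 8 ranks; only the ranks owning grid parts on the fine level take part
colors = [0, 0, 0, 0, -1, -1, -1, -1]
print(comm_split(colors))   # {0: [0, 1, 2, 3]}

# old scheme: an Allreduce over an array of length p per level;
# new scheme: one color value per process instead
```

The per-process input is now constant-size, so the cost no longer grows linearly with p.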

14 Results II: Validating the optimal complexity of the GMG Known properties 1: the iteration count is independent of the mesh size, and weak scaling is good if one iteration step scales. Weak scaling study for the 3d skin problem: when the number of processes grows by a factor of 8, refine once more; the number of elements per process is the same in all runs (weak scaling). Known: the number of MG iterations being independent of the mesh size is crucial for good weak scalability! Observed: slight increase in iteration numbers (anisotropy in the lipid layers). No GMG code kernel scales worse than O(log p)! Very good overall scaling behavior. Models for time [s]: Solve ~ log₂ p, Init ~ log₂² p, Assemble ~ constant. [Figure: measured wallclock times (marks) and models (lines) for assembly, solver initialization, and solution of the 3d skin problem, with DoF counts beyond 10⁹; number of grid refinements (L), degrees of freedom (DoF), and number of solver iterations (n_gmg) per process count; performance models for the kernels.] 1) Hackbusch, W.: Multi-grid methods and applications, vol. 4. Springer (1985)

15 Results III: Comparison of GMG and CG scalability The geometric multigrid solver is just one of many possible choices of solver for a sparse matrix problem. Therefore, a weak scaling comparison of the GMG method used as a linear solver and the unpreconditioned conjugate gradient (CG) method applied to the same problem is performed. To allow a theoretical analysis, a well-known test problem is chosen: weak scaling study for a 2d Laplace problem, where the number of processes grows by a factor of 4 with each grid refinement. Observed: GMG iteration numbers stay constant, while CG iteration numbers increase by a factor of 2 with each refinement (expected from theory 1). Resulting scaling: GMG: O(log² p); CG: O(√p). [Figure: measured times (marks) and models (lines) for assembling and solver execution for the CG and GMG methods; number of grid refinements (L), degrees of freedom (DoF), and solver iterations (n_cg, n_gmg); performance models for the kernels.] 1) Hackbusch, W.: Iterative solution of large sparse systems of equations (1994)
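The CG iteration growth is easy to reproduce on a tiny 1d analogue (an illustration of the theory, not the talk's 2d experiment): the condition number of the discrete Laplacian grows with the grid size, so unpreconditioned CG needs more iterations on finer grids, whereas a multigrid solver's iteration count would stay flat.

```python
import numpy as np

def cg_iterations(n, tol=1e-8):
    """Unpreconditioned CG on the 1d Laplace system; returns iteration count."""
    A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # -1, 2, -1 stencil
    b = np.ones(n)
    x = np.zeros(n)
    r = b - A @ x
    d = r.copy()
    rr = r @ r
    for k in range(1, 10*n):
        Ad = A @ d
        alpha = rr / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rr_new = r @ r
        if np.sqrt(rr_new) < tol * np.linalg.norm(b):
            return k
        d = r + (rr_new / rr) * d
        rr = rr_new
    return 10*n

for n in (32, 64, 128):
    print(n, cg_iterations(n))
# iteration counts grow roughly linearly with n here, since the condition
# number of the 1d Laplacian grows like n**2 and CG needs ~sqrt(cond) steps
```

Under weak scaling this iteration growth multiplies the per-iteration cost, which is how the O(√p) model for CG arises while GMG stays at O(log² p).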

16 Conclusions and outlook Automated performance modeling: helps to find scalability issues in existing codes. Fine granularity helps to easily identify the responsible code kernels. Suited for large code bases and massive parallelism. Plus: automated unit tests, resource consumption analysis, etc. Results for UG4: UG4 features solvers with nearly optimal weak scalability and efficient massively parallel runs on flexible unstructured grids. Next steps: detailed analysis of massively parallel adaptive runs, with special focus on domain partitioning, redistribution, ...

17 End Thank you! Financial support from the DFG Priority Program 1648 "Software for Exascale Computing" (SPPEXA) is gratefully acknowledged. The Gauss Centre for Supercomputing (GCS) is gratefully acknowledged for providing computing time on the GCS share of the supercomputer JUQUEEN at Jülich Supercomputing Centre (JSC).


More information

Accelerated ANSYS Fluent: Algebraic Multigrid on a GPU. Robert Strzodka NVAMG Project Lead

Accelerated ANSYS Fluent: Algebraic Multigrid on a GPU. Robert Strzodka NVAMG Project Lead Accelerated ANSYS Fluent: Algebraic Multigrid on a GPU Robert Strzodka NVAMG Project Lead A Parallel Success Story in Five Steps 2 Step 1: Understand Application ANSYS Fluent Computational Fluid Dynamics

More information

Smoothers. < interactive example > Partial Differential Equations Numerical Methods for PDEs Sparse Linear Systems

Smoothers. < interactive example > Partial Differential Equations Numerical Methods for PDEs Sparse Linear Systems Smoothers Partial Differential Equations Disappointing convergence rates observed for stationary iterative methods are asymptotic Much better progress may be made initially before eventually settling into

More information

TAU mesh deformation. Thomas Gerhold

TAU mesh deformation. Thomas Gerhold TAU mesh deformation Thomas Gerhold The parallel mesh deformation of the DLR TAU-Code Introduction Mesh deformation method & Parallelization Results & Applications Conclusion & Outlook Introduction CFD

More information

Reconstruction of Trees from Laser Scan Data and further Simulation Topics

Reconstruction of Trees from Laser Scan Data and further Simulation Topics Reconstruction of Trees from Laser Scan Data and further Simulation Topics Helmholtz-Research Center, Munich Daniel Ritter http://www10.informatik.uni-erlangen.de Overview 1. Introduction of the Chair

More information

GPU Cluster Computing for FEM

GPU Cluster Computing for FEM GPU Cluster Computing for FEM Dominik Göddeke Sven H.M. Buijssen, Hilmar Wobker and Stefan Turek Angewandte Mathematik und Numerik TU Dortmund, Germany dominik.goeddeke@math.tu-dortmund.de GPU Computing

More information

Scalability of Elliptic Solvers in NWP. Weather and Climate- Prediction

Scalability of Elliptic Solvers in NWP. Weather and Climate- Prediction Background Scaling results Tensor product geometric multigrid Summary and Outlook 1/21 Scalability of Elliptic Solvers in Numerical Weather and Climate- Prediction Eike Hermann Müller, Robert Scheichl

More information

Shallow Water Equation simulation with Sparse Grid Combination Technique

Shallow Water Equation simulation with Sparse Grid Combination Technique GUIDED RESEARCH Shallow Water Equation simulation with Sparse Grid Combination Technique Author: Jeeta Ann Chacko Examiner: Prof. Dr. Hans-Joachim Bungartz Advisors: Ao Mo-Hellenbrand April 21, 2017 Fakultät

More information

An Efficient, Geometric Multigrid Solver for the Anisotropic Diffusion Equation in Two and Three Dimensions

An Efficient, Geometric Multigrid Solver for the Anisotropic Diffusion Equation in Two and Three Dimensions 1 n Efficient, Geometric Multigrid Solver for the nisotropic Diffusion Equation in Two and Three Dimensions Tolga Tasdizen, Ross Whitaker UUSCI-2004-002 Scientific Computing and Imaging Institute University

More information

FOR P3: A monolithic multigrid FEM solver for fluid structure interaction

FOR P3: A monolithic multigrid FEM solver for fluid structure interaction FOR 493 - P3: A monolithic multigrid FEM solver for fluid structure interaction Stefan Turek 1 Jaroslav Hron 1,2 Hilmar Wobker 1 Mudassar Razzaq 1 1 Institute of Applied Mathematics, TU Dortmund, Germany

More information

ABOUT THE GENERATION OF UNSTRUCTURED MESH FAMILIES FOR GRID CONVERGENCE ASSESSMENT BY MIXED MESHES

ABOUT THE GENERATION OF UNSTRUCTURED MESH FAMILIES FOR GRID CONVERGENCE ASSESSMENT BY MIXED MESHES VI International Conference on Adaptive Modeling and Simulation ADMOS 2013 J. P. Moitinho de Almeida, P. Díez, C. Tiago and N. Parés (Eds) ABOUT THE GENERATION OF UNSTRUCTURED MESH FAMILIES FOR GRID CONVERGENCE

More information

Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations

Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations Hartwig Anzt 1, Marc Baboulin 2, Jack Dongarra 1, Yvan Fournier 3, Frank Hulsemann 3, Amal Khabou 2, and Yushan Wang 2 1 University

More information

Developing the TELEMAC system for HECToR (phase 2b & beyond) Zhi Shang

Developing the TELEMAC system for HECToR (phase 2b & beyond) Zhi Shang Developing the TELEMAC system for HECToR (phase 2b & beyond) Zhi Shang Outline of the Talk Introduction to the TELEMAC System and to TELEMAC-2D Code Developments Data Reordering Strategy Results Conclusions

More information

Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC

Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC Fourth Workshop on Accelerator Programming Using Directives (WACCPD), Nov. 13, 2017 Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC Takuma

More information

Resilient geometric finite-element multigrid algorithms using minimised checkpointing

Resilient geometric finite-element multigrid algorithms using minimised checkpointing Resilient geometric finite-element multigrid algorithms using minimised checkpointing Dominik Göddeke, Mirco Altenbernd, Dirk Ribbrock Institut für Angewandte Mathematik (LS3) Fakultät für Mathematik TU

More information

Software and Performance Engineering for numerical codes on GPU clusters

Software and Performance Engineering for numerical codes on GPU clusters Software and Performance Engineering for numerical codes on GPU clusters H. Köstler International Workshop of GPU Solutions to Multiscale Problems in Science and Engineering Harbin, China 28.7.2010 2 3

More information

3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs

3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs 3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs H. Knibbe, C. W. Oosterlee, C. Vuik Abstract We are focusing on an iterative solver for the three-dimensional

More information

D036 Accelerating Reservoir Simulation with GPUs

D036 Accelerating Reservoir Simulation with GPUs D036 Accelerating Reservoir Simulation with GPUs K.P. Esler* (Stone Ridge Technology), S. Atan (Marathon Oil Corp.), B. Ramirez (Marathon Oil Corp.) & V. Natoli (Stone Ridge Technology) SUMMARY Over the

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 20: Sparse Linear Systems; Direct Methods vs. Iterative Methods Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 26

More information

Finite Element Multigrid Solvers for PDE Problems on GPUs and GPU Clusters

Finite Element Multigrid Solvers for PDE Problems on GPUs and GPU Clusters Finite Element Multigrid Solvers for PDE Problems on GPUs and GPU Clusters Robert Strzodka Integrative Scientific Computing Max Planck Institut Informatik www.mpi-inf.mpg.de/ ~strzodka Dominik Göddeke

More information

HPX. High Performance ParalleX CCT Tech Talk Series. Hartmut Kaiser

HPX. High Performance ParalleX CCT Tech Talk Series. Hartmut Kaiser HPX High Performance CCT Tech Talk Hartmut Kaiser (hkaiser@cct.lsu.edu) 2 What s HPX? Exemplar runtime system implementation Targeting conventional architectures (Linux based SMPs and clusters) Currently,

More information

COMSOL Model Report. 1. Table of Contents. 2. Model Properties. 3. Constants

COMSOL Model Report. 1. Table of Contents. 2. Model Properties. 3. Constants COMSOL Model Report 1. Table of Contents Title - COMSOL Model Report Table of Contents Model Properties Constants Global Expressions Geometry Geom1 Integration Coupling Variables Solver Settings Postprocessing

More information

Adaptive-Mesh-Refinement Pattern

Adaptive-Mesh-Refinement Pattern Adaptive-Mesh-Refinement Pattern I. Problem Data-parallelism is exposed on a geometric mesh structure (either irregular or regular), where each point iteratively communicates with nearby neighboring points

More information

Forest-of-octrees AMR: algorithms and interfaces

Forest-of-octrees AMR: algorithms and interfaces Forest-of-octrees AMR: algorithms and interfaces Carsten Burstedde joint work with Omar Ghattas, Tobin Isaac, Georg Stadler, Lucas C. Wilcox Institut für Numerische Simulation (INS) Rheinische Friedrich-Wilhelms-Universität

More information

Recent developments for the multigrid scheme of the DLR TAU-Code

Recent developments for the multigrid scheme of the DLR TAU-Code www.dlr.de Chart 1 > 21st NIA CFD Seminar > Axel Schwöppe Recent development s for the multigrid scheme of the DLR TAU-Code > Apr 11, 2013 Recent developments for the multigrid scheme of the DLR TAU-Code

More information

course outline basic principles of numerical analysis, intro FEM

course outline basic principles of numerical analysis, intro FEM idealization, equilibrium, solutions, interpretation of results types of numerical engineering problems continuous vs discrete systems direct stiffness approach differential & variational formulation introduction

More information

ETNA Kent State University

ETNA Kent State University Electronic Transactions on Numerical Analysis. Volume, 2, pp. 92. Copyright 2,. ISSN 68-963. ETNA BEHAVIOR OF PLANE RELAXATION METHODS AS MULTIGRID SMOOTHERS IGNACIO M. LLORENTE AND N. DUANE MELSON Abstract.

More information

Product Engineering Optimizer

Product Engineering Optimizer CATIA V5 Training Foils Product Engineering Optimizer Version 5 Release 19 January 2009 EDU_CAT_EN_PEO_FI_V5R19 1 About this course Objectives of the course Upon completion of this course, you will learn

More information

Model Reduction for High Dimensional Micro-FE Models

Model Reduction for High Dimensional Micro-FE Models Model Reduction for High Dimensional Micro-FE Models Rudnyi E. B., Korvink J. G. University of Freiburg, IMTEK-Department of Microsystems Engineering, {rudnyi,korvink}@imtek.uni-freiburg.de van Rietbergen

More information

Efficient Imaging Algorithms on Many-Core Platforms

Efficient Imaging Algorithms on Many-Core Platforms Efficient Imaging Algorithms on Many-Core Platforms H. Köstler Dagstuhl, 22.11.2011 Contents Imaging Applications HDR Compression performance of PDE-based models Image Denoising performance of patch-based

More information

Fast Dynamic Load Balancing for Extreme Scale Systems

Fast Dynamic Load Balancing for Extreme Scale Systems Fast Dynamic Load Balancing for Extreme Scale Systems Cameron W. Smith, Gerrett Diamond, M.S. Shephard Computation Research Center (SCOREC) Rensselaer Polytechnic Institute Outline: n Some comments on

More information

A High-Order Accurate Unstructured GMRES Solver for Poisson s Equation

A High-Order Accurate Unstructured GMRES Solver for Poisson s Equation A High-Order Accurate Unstructured GMRES Solver for Poisson s Equation Amir Nejat * and Carl Ollivier-Gooch Department of Mechanical Engineering, The University of British Columbia, BC V6T 1Z4, Canada

More information

Accelerating GPU computation through mixed-precision methods. Michael Clark Harvard-Smithsonian Center for Astrophysics Harvard University

Accelerating GPU computation through mixed-precision methods. Michael Clark Harvard-Smithsonian Center for Astrophysics Harvard University Accelerating GPU computation through mixed-precision methods Michael Clark Harvard-Smithsonian Center for Astrophysics Harvard University Outline Motivation Truncated Precision using CUDA Solving Linear

More information

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet Contents 2 F10: Parallel Sparse Matrix Computations Figures mainly from Kumar et. al. Introduction to Parallel Computing, 1st ed Chap. 11 Bo Kågström et al (RG, EE, MR) 2011-05-10 Sparse matrices and storage

More information

Shape Optimizing Load Balancing for Parallel Adaptive Numerical Simulations Using MPI

Shape Optimizing Load Balancing for Parallel Adaptive Numerical Simulations Using MPI Parallel Adaptive Institute of Theoretical Informatics Karlsruhe Institute of Technology (KIT) 10th DIMACS Challenge Workshop, Feb 13-14, 2012, Atlanta 1 Load Balancing by Repartitioning Application: Large

More information

A Hybrid Geometric+Algebraic Multigrid Method with Semi-Iterative Smoothers

A Hybrid Geometric+Algebraic Multigrid Method with Semi-Iterative Smoothers NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 013; 00:1 18 Published online in Wiley InterScience www.interscience.wiley.com). A Hybrid Geometric+Algebraic Multigrid Method with

More information

Load Balancing for Problems with Good Bisectors, and Applications in Finite Element Simulations

Load Balancing for Problems with Good Bisectors, and Applications in Finite Element Simulations Load Balancing for Problems with Good Bisectors, and Applications in Finite Element Simulations Stefan Bischof, Ralf Ebner, and Thomas Erlebach Institut für Informatik Technische Universität München D-80290

More information

10th August Part One: Introduction to Parallel Computing

10th August Part One: Introduction to Parallel Computing Part One: Introduction to Parallel Computing 10th August 2007 Part 1 - Contents Reasons for parallel computing Goals and limitations Criteria for High Performance Computing Overview of parallel computer

More information

Towards a Unified Framework for Scientific Computing

Towards a Unified Framework for Scientific Computing Towards a Unified Framework for Scientific Computing Peter Bastian 1, Marc Droske 3, Christian Engwer 1, Robert Klöfkorn 2, Thimo Neubauer 1, Mario Ohlberger 2, Martin Rumpf 3 1 Interdisziplinäres Zentrum

More information

lecture 8 Groundwater Modelling -1

lecture 8 Groundwater Modelling -1 The Islamic University of Gaza Faculty of Engineering Civil Engineering Department Water Resources Msc. Groundwater Hydrology- ENGC 6301 lecture 8 Groundwater Modelling -1 Instructor: Dr. Yunes Mogheir

More information

Auto Injector Syringe. A Fluent Dynamic Mesh 1DOF Tutorial

Auto Injector Syringe. A Fluent Dynamic Mesh 1DOF Tutorial Auto Injector Syringe A Fluent Dynamic Mesh 1DOF Tutorial 1 2015 ANSYS, Inc. June 26, 2015 Prerequisites This tutorial is written with the assumption that You have attended the Introduction to ANSYS Fluent

More information

Scalable, Hybrid-Parallel Multiscale Methods using DUNE

Scalable, Hybrid-Parallel Multiscale Methods using DUNE MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE R. Milk S. Kaulmann M. Ohlberger December 1st 2014 Outline MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 2 /28 Abstraction

More information

Workloads Programmierung Paralleler und Verteilter Systeme (PPV)

Workloads Programmierung Paralleler und Verteilter Systeme (PPV) Workloads Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Workloads 2 Hardware / software execution environment

More information

Massively Parallel Finite Element Simulations with deal.ii

Massively Parallel Finite Element Simulations with deal.ii Massively Parallel Finite Element Simulations with deal.ii Timo Heister, Texas A&M University 2012-02-16 SIAM PP2012 joint work with: Wolfgang Bangerth, Carsten Burstedde, Thomas Geenen, Martin Kronbichler

More information

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC

More information

GEOMETRY MODELING & GRID GENERATION

GEOMETRY MODELING & GRID GENERATION GEOMETRY MODELING & GRID GENERATION Dr.D.Prakash Senior Assistant Professor School of Mechanical Engineering SASTRA University, Thanjavur OBJECTIVE The objectives of this discussion are to relate experiences

More information

Partial Differential Equations

Partial Differential Equations Simulation in Computer Graphics Partial Differential Equations Matthias Teschner Computer Science Department University of Freiburg Motivation various dynamic effects and physical processes are described

More information

Accelerating Finite Element Analysis in MATLAB with Parallel Computing

Accelerating Finite Element Analysis in MATLAB with Parallel Computing MATLAB Digest Accelerating Finite Element Analysis in MATLAB with Parallel Computing By Vaishali Hosagrahara, Krishna Tamminana, and Gaurav Sharma The Finite Element Method is a powerful numerical technique

More information

Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards

Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards By Allan P. Engsig-Karup, Morten Gorm Madsen and Stefan L. Glimberg DTU Informatics Workshop

More information

Parallel Sorting with Minimal Data

Parallel Sorting with Minimal Data Parallel Sorting with Minimal Data Christian Siebert 1,2 and Felix Wolf 1,2,3 1 German Research School for Simulation Sciences, 52062 Aachen, Germany 2 RWTH Aachen University, Computer Science Department,

More information

LINUX. Benchmark problems have been calculated with dierent cluster con- gurations. The results obtained from these experiments are compared to those

LINUX. Benchmark problems have been calculated with dierent cluster con- gurations. The results obtained from these experiments are compared to those Parallel Computing on PC Clusters - An Alternative to Supercomputers for Industrial Applications Michael Eberl 1, Wolfgang Karl 1, Carsten Trinitis 1 and Andreas Blaszczyk 2 1 Technische Universitat Munchen

More information

High Performance Computing for PDE Some numerical aspects of Petascale Computing

High Performance Computing for PDE Some numerical aspects of Petascale Computing High Performance Computing for PDE Some numerical aspects of Petascale Computing S. Turek, D. Göddeke with support by: Chr. Becker, S. Buijssen, M. Grajewski, H. Wobker Institut für Angewandte Mathematik,

More information

Parallel Mesh Partitioning in Alya

Parallel Mesh Partitioning in Alya Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Parallel Mesh Partitioning in Alya A. Artigues a *** and G. Houzeaux a* a Barcelona Supercomputing Center ***antoni.artigues@bsc.es

More information

Handling Parallelisation in OpenFOAM

Handling Parallelisation in OpenFOAM Handling Parallelisation in OpenFOAM Hrvoje Jasak hrvoje.jasak@fsb.hr Faculty of Mechanical Engineering and Naval Architecture University of Zagreb, Croatia Handling Parallelisation in OpenFOAM p. 1 Parallelisation

More information

Center for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop

Center for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop Center for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop http://cscads.rice.edu/ Discussion and Feedback CScADS Autotuning 07 Top Priority Questions for Discussion

More information

Generic finite element capabilities for forest-of-octrees AMR

Generic finite element capabilities for forest-of-octrees AMR Generic finite element capabilities for forest-of-octrees AMR Carsten Burstedde joint work with Omar Ghattas, Tobin Isaac Institut für Numerische Simulation (INS) Rheinische Friedrich-Wilhelms-Universität

More information

Higher Order Multigrid Algorithms for a 2D and 3D RANS-kω DG-Solver

Higher Order Multigrid Algorithms for a 2D and 3D RANS-kω DG-Solver www.dlr.de Folie 1 > HONOM 2013 > Marcel Wallraff, Tobias Leicht 21. 03. 2013 Higher Order Multigrid Algorithms for a 2D and 3D RANS-kω DG-Solver Marcel Wallraff, Tobias Leicht DLR Braunschweig (AS - C

More information

Mathematical Modelling of Chemical Diffusion through Skin using Grid-based PSEs

Mathematical Modelling of Chemical Diffusion through Skin using Grid-based PSEs Mathematical Modelling of Chemical Diffusion through Skin using Grid-based PSEs Christopher Goodyer 1, Jason Wood 1, and Martin Berzins 2 1 School of Computing, University of Leeds, Leeds, UK ceg@comp.leeds.ac.uk,

More information

Domain Decomposition and hp-adaptive Finite Elements

Domain Decomposition and hp-adaptive Finite Elements Domain Decomposition and hp-adaptive Finite Elements Randolph E. Bank 1 and Hieu Nguyen 1 Department of Mathematics, University of California, San Diego, La Jolla, CA 9093-011, USA, rbank@ucsd.edu. Department

More information

Shape optimisation using breakthrough technologies

Shape optimisation using breakthrough technologies Shape optimisation using breakthrough technologies Compiled by Mike Slack Ansys Technical Services 2010 ANSYS, Inc. All rights reserved. 1 ANSYS, Inc. Proprietary Introduction Shape optimisation technologies

More information

Multigrid Algorithms for Three-Dimensional RANS Calculations - The SUmb Solver

Multigrid Algorithms for Three-Dimensional RANS Calculations - The SUmb Solver Multigrid Algorithms for Three-Dimensional RANS Calculations - The SUmb Solver Juan J. Alonso Department of Aeronautics & Astronautics Stanford University CME342 Lecture 14 May 26, 2014 Outline Non-linear

More information

Parallel algorithms for Scientific Computing May 28, Hands-on and assignment solving numerical PDEs: experience with PETSc, deal.

Parallel algorithms for Scientific Computing May 28, Hands-on and assignment solving numerical PDEs: experience with PETSc, deal. Division of Scientific Computing Department of Information Technology Uppsala University Parallel algorithms for Scientific Computing May 28, 2013 Hands-on and assignment solving numerical PDEs: experience

More information

Multi-GPU Acceleration of Algebraic Multigrid Preconditioners

Multi-GPU Acceleration of Algebraic Multigrid Preconditioners Multi-GPU Acceleration of Algebraic Multigrid Preconditioners Christian Richter 1, Sebastian Schöps 2, and Markus Clemens 1 Abstract A multi-gpu implementation of Krylov subspace methods with an algebraic

More information