Efficient Algorithmic Approaches for Flow Simulations on Cartesian Grids
1 Efficient Algorithmic Approaches for Flow Simulations on Cartesian Grids
M. Bader, H.-J. Bungartz, B. Gatzhammer, M. Mehl, T. Neckel, T. Weinzierl
TUM, Department of Informatics, Chair of Scientific Computing, Munich, Germany
DEISA PRACE Symposium 2009: HPC Infrastructures for Petascale Applications, Amsterdam, May 11-13, 2009
2 Prologue or a First Challenge
"Some people from science and engineering solve great problems using somewhat strange methods, whereas some people from scientific computing develop great methods to solve somewhat strange problems."
[Think of the world-famous stationary Laplacian on the 1D unit cube.]
In principle, we belong to the second group, so be prepared to see the application only at the end of the talk!
3 Computational Challenges
It's getting tougher:
- From simulation to optimisation
- From one-way batch jobs to user interaction
- From parameter assumptions to identification & estimation
- From forward problems to inverse problems
- From single-physics problems to multi-physics scenarios
- From island fun to embedding & integration
- From hacker's delight to complex workflows
4 Multi-this and Multi-that
- Domains & Education: multi-disciplinary
- Models: multi-physics
- Models: multi-scale
- Systems: multi-core
- Numerics: multi-dimensional
- Numerics: multi-level
5 Hence the Motivation for Computational Algorithms
Tackling the "memory wall":
- cache-awareness via sophisticated traversal strategies
- cache-oblivious vs. cache-conscious
Tackling on-chip parallelism (multi-core):
- multi-threading, fine-grain parallelism: no more sequential kernel
- non-standard hardware: accelerators such as GPGPU, Cell, FPGA
Tackling scalability:
- hybrid concepts, sophisticated & cheap load balancing
- heterogeneous scenarios (non-standard geometry, multi-level schemes, ...) require dynamic load balancing
- intra- and inter-system (from hybrid systems to the Grid)
A promising paradigm: space-filling curves [SFC: continuous and surjective mapping from the unit interval onto the unit square/cube]
- Lebesgue: the classical one (Morton, octree)
- Hilbert: the most famous one
- Peano: our favourite for Cartesian grids
- Sierpinski: the newcomer for triangles & Co.
Annual gain in recent years (avg.):
- CPU performance: 60%
- memory bandwidth: 23%
- memory latency: 5%
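To make the "memory wall" argument concrete, the following minimal sketch compounds the three annual gains quoted on the slide. It assumes the averages can be read as constant compound growth rates, which the slide does not state; the names `ANNUAL_GAIN`, `growth` and `gap` are illustrative only.

```python
# Annual average gains quoted on the slide.
# Assumption: they are treated here as constant compound growth rates.
ANNUAL_GAIN = {"cpu": 0.60, "bandwidth": 0.23, "latency": 0.05}


def growth(rate: float, years: int) -> float:
    """Compound improvement factor after `years` at a constant annual rate."""
    return (1.0 + rate) ** years


def gap(years: int, resource: str) -> float:
    """Factor by which CPU performance outgrows the given memory resource."""
    return growth(ANNUAL_GAIN["cpu"], years) / growth(ANNUAL_GAIN[resource], years)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(years, round(gap(years, "bandwidth"), 1), round(gap(years, "latency"), 1))
```

Under this reading the gap widens every year for both bandwidth and latency, which is the motivation for cache-aware traversal strategies.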
6 Contents
- The Scope of Space-Filling Curves
- The Peano Project
- Proof of Concept
- Application: The Drift Ratchet
7 SFC #1 Lebesgue: Hierarchical Spatial Organisation
Lebesgue's space-filling curve, known as Morton ordering or quad-/octrees
Applications in a CSE & HPC context:
- Test of geometric consistency of building models
- Decomposition and meshing of domains
- Spatial organisation of FEM: identify range of modifications
- Spatial organisation of particle methods (fast multipole)
- Integration of location-aware simulation tasks
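The Morton ordering mentioned above can be sketched by interleaving the bits of the cell coordinates. This is a generic textbook illustration in 2D, not code from the talk; `morton_index` and `morton_order` are hypothetical names.

```python
def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y: bit i of x lands at bit 2i, bit i of y at bit 2i+1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_order(n: int) -> list:
    """All cells of an n x n grid (n a power of two), sorted along the Z-shaped curve."""
    cells = [(x, y) for y in range(n) for x in range(n)]
    return sorted(cells, key=lambda c: morton_index(c[0], c[1]))


if __name__ == "__main__":
    print(morton_order(4))
```

Because the most significant interleaved bits select the quadrant, each quadrant of the grid is visited completely before the next one, which is exactly the quadtree hierarchy.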
8 SFC #2 Peano: Numerical Linear Algebra
TifaMMy: cache-efficient matrix multiplication
- Peano-based traversal with high locality (dense or sparse matrices)
- block-structured data structure and algorithm
- parallel on multicore: hardware-conscious kernel, OpenMP
- parallel on clusters: distributed caches, MPI
- application: quantum control (states via matrices)
[Figure panels: AMD (2 x quad), Xeon (4 x quad)]
9 SFC #3 Sierpinski: Tsunami Simulations
- Sierpinski space-filling curves
- FEM with strong adaptive refinement & coarsening
- structured, but triangular / tetrahedral
- high locality and HW-/cache-efficiency
- Sierpinski-based traversal, newest vertex bisection
- discontinuous Galerkin discretization
- application: Tsunami simulation (shallow water eqs.)
Cooperation with Jörn [...], AWI
11 Objectives
- General PDE framework, with focus on CFD/FSI
- Discretization: FE (strictly conservative)
- Cartesian grids (at least logically)
- Straightforward grid generation & adaptation
- Direct support of multi-level solvers and parallelisation
- General dimensionality
- High efficiency
12 Grid Organisation: Adaptive Spacetree
- Cartesian grid: cells are squares/cubes
- recursive refinement: tri-partitioning
- tree structure
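The recursive tri-partitioning can be illustrated with a small tree whose cells split into 3^d children. This is a minimal sketch under stated assumptions, not the Peano framework's data structure: `Cell`, `refine`, `leaves` and the refinement criterion `crosses_circle` (refine wherever a circle boundary cuts a cell) are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Cell:
    origin: tuple          # lower-left corner of the square/cube
    size: float            # edge length
    level: int             # depth in the tree, root = 0
    children: list = field(default_factory=list)


def refine(cell, needs_refinement, max_level):
    """Recursively split a cell into 3^d children (three parts per axis)."""
    if cell.level >= max_level or not needs_refinement(cell):
        return
    d = len(cell.origin)
    h = cell.size / 3.0
    for offset in product(range(3), repeat=d):
        origin = tuple(o + k * h for o, k in zip(cell.origin, offset))
        child = Cell(origin, h, cell.level + 1)
        refine(child, needs_refinement, max_level)
        cell.children.append(child)


def leaves(cell):
    """Yield the unrefined cells, i.e. the actual adaptive grid."""
    if not cell.children:
        yield cell
    else:
        for child in cell.children:
            yield from leaves(child)


def crosses_circle(cell, centre=(0.5, 0.5), radius=0.3):
    """Hypothetical 2D criterion: True if the circle boundary passes through the cell."""
    near2 = far2 = 0.0
    for o, c in zip(cell.origin, centre):
        lo, hi = o, o + cell.size
        nearest = min(max(c, lo), hi)
        farthest = lo if abs(c - lo) > abs(c - hi) else hi
        near2 += (nearest - c) ** 2
        far2 += (farthest - c) ** 2
    return near2 <= radius ** 2 <= far2


if __name__ == "__main__":
    root = Cell((0.0, 0.0), 1.0, 0)
    refine(root, crosses_circle, max_level=3)
    print(len(list(leaves(root))), "leaf cells")
```

Cells that lie entirely inside or outside the circle stay coarse, so the leaf count is far below that of a regular grid at the finest level.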
13 Approximation
- geometric adaptivity, grid hierarchy
- Eulerian approach (marker-and-cell)
[Figure: sphere, d = 2, 3, 4]
14 Traversal for Iterations: Stack Concept
- cell-oriented operator evaluation
- ordering of cells along a Peano curve
- stacks as non-persistent data structure
- adaptivity & generating systems: multi-level
- high spatial and temporal locality of data access
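The ordering of cells along a Peano curve can be sketched for a regular 3^level x 3^level grid using Peano's classical base-3 digit construction. This is an illustration of the cell ordering only, assuming a regular 2D grid; it does not reproduce the stack-based data handling of the Peano framework, and `peano_xy` is a hypothetical name.

```python
def peano_xy(index: int, level: int) -> tuple:
    """Coordinates of the index-th cell along a 2D Peano curve on a 3^level x 3^level grid."""
    # Base-3 digits of the index, most significant first; they alternate x, y, x, y, ...
    digits = []
    for _ in range(2 * level):
        digits.append(index % 3)
        index //= 3
    digits.reverse()

    x = y = 0
    y_digit_sum = 0   # parity decides whether the next x digit is mirrored
    x_digit_sum = 0   # parity decides whether the next y digit is mirrored
    for j in range(level):
        tx = digits[2 * j]
        xd = 2 - tx if y_digit_sum % 2 else tx
        x_digit_sum += tx
        ty = digits[2 * j + 1]
        yd = 2 - ty if x_digit_sum % 2 else ty
        y_digit_sum += ty
        x = 3 * x + xd
        y = 3 * y + yd
    return x, y


if __name__ == "__main__":
    print([peano_xy(i, 1) for i in range(9)])
```

Consecutive cells along the curve always share an edge, which is the source of the spatial locality the slide refers to.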
15 Traversal for Iterations: Stack Concept
2d+2 stacks in d dimensions
16 Fast Linear Solvers: Multigrid
- dehierarchisation
- compute residual
- smooth
- restrict residual
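The residual, smoothing and restriction steps named on this slide can be illustrated with a textbook V-cycle for the 1D Poisson problem. This is a sketch under simplifying assumptions and not the solver of the talk: it uses nodal values throughout (so no dehierarchisation step is needed), coarsening by a factor of two rather than three, damped Jacobi smoothing and zero Dirichlet boundaries. All function names are illustrative.

```python
def residual(u, f, h):
    """r = f - A u for A = -d^2/dx^2 (second-order finite differences), zero Dirichlet boundaries."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Damped Jacobi sweeps: u <- u + omega * D^-1 * r with D = 2 / h^2."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * 0.5 * h * h * ri for ui, ri in zip(u, r)]
    return u


def restrict(r):
    """Full-weighting restriction to the grid with twice the mesh width."""
    nc = (len(r) - 1) // 2 + 1
    rc = [0.0] * nc
    for i in range(1, nc - 1):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc


def prolong(ec, n):
    """Linear interpolation of a coarse-grid correction to a fine grid with n points."""
    e = [0.0] * n
    for i, value in enumerate(ec):
        e[2 * i] = value
    for i in range(1, n, 2):
        e[i] = 0.5 * (e[i - 1] + e[i + 1])
    return e


def v_cycle(u, f, h):
    """One V-cycle; len(u) must be 2^k + 1."""
    n = len(u)
    if n <= 3:
        # Single interior unknown: solve 2 * u1 / h^2 = f1 exactly.
        return [0.0, 0.5 * h * h * f[1], 0.0]
    u = smooth(u, f, h)                                  # pre-smoothing
    rc = restrict(residual(u, f, h))                     # compute and restrict residual
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)           # coarse-grid correction
    u = [ui + ei for ui, ei in zip(u, prolong(ec, n))]
    return smooth(u, f, h)                               # post-smoothing


if __name__ == "__main__":
    n = 65
    h = 1.0 / (n - 1)
    f = [1.0] * n
    u = [0.0] * n
    for cycle in range(5):
        u = v_cycle(u, f, h)
        print(cycle, max(abs(v) for v in residual(u, f, h)))
```

The printed residual norm should shrink by a roughly constant factor per cycle, independent of the grid size, which is the property that makes multigrid attractive for the pressure Poisson equation.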
17 Parallel Grid Traversal, Dynamic Load Distribution
18 FSI Coupling Environments FSI*ce and preCICE
Partitioned Approach to FSI
Simulation Program:
    FSI_Init ()
    while (FSI_Is_running())
        if (FSI_Is_new_interface_values())
            Read coupling data from com. mesh
        Set time step length
        Compute values of next time step
        Write coupling data to com. mesh
        FSI_Data_exchange ()
        if (FSI_Is_implicit_converged())
            Store values of next time step
    end while
    FSI_Finalize ()
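The control flow of the coupled simulation program can be exercised with a toy stand-in for the coupling library. Everything below is hypothetical: `MockCoupling` merely mimics the sequence of calls named on the slide and is not the FSI*ce or preCICE API, and the "solver" is a one-line scalar update chosen only so the loop has something to compute.

```python
class MockCoupling:
    """Hypothetical stand-in for the coupling library (not a real API)."""

    def __init__(self, steps, iterations_per_step=2):
        self.steps_left = steps
        self.iterations_per_step = iterations_per_step
        self.iteration = 0
        self.interface = 0.0   # data received from the other solver
        self.outgoing = 0.0    # data written to the communication mesh

    def is_running(self):
        return self.steps_left > 0

    def is_new_interface_values(self):
        return True

    def read(self):
        return self.interface

    def write(self, value):
        self.outgoing = value

    def data_exchange(self):
        # Toy assumption: the other solver answers with half of what we sent.
        self.interface = 0.5 * self.outgoing
        self.iteration += 1

    def is_implicit_converged(self):
        # Toy assumption: every time step converges after a fixed number of iterations.
        if self.iteration >= self.iterations_per_step:
            self.iteration = 0
            self.steps_left -= 1
            return True
        return False


def run_solver(coupling, dt=0.1):
    """Mirror the loop on the slide: read, compute, write, exchange, accept if converged."""
    state = 1.0
    load = 0.0
    history = []
    while coupling.is_running():
        if coupling.is_new_interface_values():
            load = coupling.read()                 # read coupling data
        candidate = state + dt * (load - state)    # compute values of next time step
        coupling.write(candidate)                  # write coupling data
        coupling.data_exchange()
        if coupling.is_implicit_converged():
            state = candidate                      # store values of next time step
            history.append(state)
    return history


if __name__ == "__main__":
    print(run_solver(MockCoupling(steps=3)))
```

The point of the structure is that the solver only commits a time step once the coupling reports convergence; until then it recomputes the same step with updated interface values.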
20 Parallel Grid Traversal
21 Parallelisation: Memory Overhead
[Figure: parallel/serial vertex number ratio vs. number of nodes]
[...],944 vertices, successive subdivision, data duplication at subdomain boundaries, worst case, JUGENE
22 Cache Efficiency
[Table: columns Scenario, Vertices, L2 refs, L2 misses, Bus data cycles, Bus load [%]; rows cube (regular, adaptive), L-shape (regular, adaptive), sphere (regular, adaptive); numerical entries not preserved]
Example scenario: 2D Poisson on cube, L-shaped domain, sphere
Itanium2, 2x dual-core, 1.3 GHz, 256 KB L2, 3 MB L3 (shared), 8 GB; single-thread application
Messages of the measurements:
- L2 hit rate > 99.9%
- low bus traffic (hence well-suited for many-core systems, Cell, ...)
23 Memory Requirements per DoF
[Table: bytes/cell and bytes/vertex in 2D and 3D for grid only; Poisson solver, sequential; Poisson solver, parallel; (multigrid) flow solver. Surviving entries, presumably for "grid only": 2D 6 and 2, 3D 10 and 2; other entries not preserved]
[Table: columns Threshold, Vertices, Flop/Cycle, L2 hit rate, t/dof for d=2 and d=3; Poisson, cube, adaptive, F-cycle. Entries largely not preserved; the intact trailing values are 4.26 * 10^-4 (d=2) and 9.52 * 10^-4 (d=3)]
24 Memory & Runtime
Sequential code with hard-disc streaming
Pressure Poisson equation, V-(1/0)-cycle
- laptop: 1.8 GHz Intel Centrino, 1 GB RAM
- atsccs: 3.4 GHz Intel Pentium 4, 2 GB RAM
26 Application: Drift Ratchet Scenario
[Matthias and Müller, Asymmetric pores in a silicon membrane acting as massively parallel Brownian ratchets, Letters to Nature, 424, 2003]; the application scenario is a cooperation with the physics department of the University of Augsburg.
Ratchets or Brownian motors are used for sorting macromolecules or other particles (think of a sieve). Due to the pore geometry, (symmetric) periodic pressure boundary conditions may induce a size-dependent drift.
27 Drift Ratchet: Starting Point
- CFD scenario involving complex geometries, FSI
- Need for longer time intervals
- Physics not yet completely understood
- Simplified models to start with
- High technological relevance: need for microdevices [sorting macromolecules such as proteins or DNA]
28 Simulation Scenario Snapshots
Peano & preCICE, 2D and 3D
29 Results
[Panels: one chamber (Re = 0.1, f = 7 kHz); two chambers, transit (Re = 0.1, f = 10 kHz)]
- FSI: partitioned approach (fluid: Cartesian grid; particle(s): triangulated surface)
- Explicit coupling with divergence correction
- Model still incomplete: no Brownian motion, no collisions, no thermodynamic effects
30 First Results
- Simulations of several cycles
- Simplified analytical solution vs. simulation (one cycle)
31 First Results
- Re = 0.1, f = [...] kHz
- One pore with two chambers
- 30x30x126 = 113,400 cells
- Oscillating pressure b.c. (grey), particle position (blue) and velocity (red)
[Figure: boundary pressure, particle position and velocity vs. time [s]]
32 Acknowledgements
- DFG
- DEISA project "Drift Ratchet": computations & support by LRZ, München (D); JSC, Jülich (D); EPCC, Edinburgh (UK)
- Theoretical physics, Universität Augsburg (Peter Hänggi): physics again, but in an engineering-driven code development
- All people contributing to Peano: core components, CFD & FSI applications
33 Communication: Optimizing Packet Sizes
[Figure: time [s] vs. number of messages per message exchange, 2d and 3d, on Infinicluster, HLRB II and Jugene; O(1M) dof]
8. Hardware-Aware Numerics. Approaching supercomputing...
Approaching supercomputing... Numerisches Programmieren, Hans-Joachim Bungartz page 1 of 48 8.1. Hardware-Awareness Introduction Since numerical algorithms are ubiquitous, they have to run on a broad spectrum
More information8. Hardware-Aware Numerics. Approaching supercomputing...
Approaching supercomputing... Numerisches Programmieren, Hans-Joachim Bungartz page 1 of 22 8.1. Hardware-Awareness Introduction Since numerical algorithms are ubiquitous, they have to run on a broad spectrum
More informationHPC Algorithms and Applications
HPC Algorithms and Applications Dwarf #5 Structured Grids Michael Bader Winter 2012/2013 Dwarf #5 Structured Grids, Winter 2012/2013 1 Dwarf #5 Structured Grids 1. dense linear algebra 2. sparse linear
More informationParallel Adaptive Tsunami Modelling with Triangular Discontinuous Galerkin Schemes
Parallel Adaptive Tsunami Modelling with Triangular Discontinuous Galerkin Schemes Stefan Vater 1 Kaveh Rahnema 2 Jörn Behrens 1 Michael Bader 2 1 Universität Hamburg 2014 PDES Workshop 2 TU München Partial
More informationsimulation framework for piecewise regular grids
WALBERLA, an ultra-scalable multiphysics simulation framework for piecewise regular grids ParCo 2015, Edinburgh September 3rd, 2015 Christian Godenschwager, Florian Schornbaum, Martin Bauer, Harald Köstler
More informationJoint Advanced Student School 2007 Martin Dummer
Sierpiński-Curves Joint Advanced Student School 2007 Martin Dummer Statement of the Problem What is the best way to store a triangle mesh efficiently in memory? The following points are desired : Easy
More informationEfficient Storage and Processing of Adaptive Triangular Grids using Sierpinski Curves
Efficient Storage and Processing of Adaptive Triangular Grids using Sierpinski Curves Csaba Attila Vigh, Dr. Michael Bader Department of Informatics, TU München JASS 2006, course 2: Numerical Simulation:
More informationParallelizing Adaptive Triangular Grids with Refinement Trees and Space Filling Curves
Parallelizing Adaptive Triangular Grids with Refinement Trees and Space Filling Curves Daniel Butnaru butnaru@in.tum.de Advisor: Michael Bader bader@in.tum.de JASS 08 Computational Science and Engineering
More informationEVALUATION OF AN EFFICIENT STACK-RLE CLUSTERING CONCEPT FOR DYNAMICALLY ADAPTIVE GRIDS
SIAM J. SCI. COMPUT. Vol. 38, No. 6, pp. C678 C712 c 2016 Society for Industrial and Applied Mathematics EVALUATION OF AN EFFICIENT STACK-RLE CLUSTERING CONCEPT FOR DYNAMICALLY ADAPTIVE GRIDS MARTIN SCHREIBER,
More informationIntroducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method
Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method G. Wellein, T. Zeiser, G. Hager HPC Services Regional Computing Center A. Nitsure, K. Iglberger, U. Rüde Chair for System
More informationKlima-Exzellenz in Hamburg
Klima-Exzellenz in Hamburg Adaptive triangular meshes for inundation modeling 19.10.2010, University of Maryland, College Park Jörn Behrens KlimaCampus, Universität Hamburg Acknowledging: Widodo Pranowo,
More informationNumerical Algorithms on Multi-GPU Architectures
Numerical Algorithms on Multi-GPU Architectures Dr.-Ing. Harald Köstler 2 nd International Workshops on Advances in Computational Mechanics Yokohama, Japan 30.3.2010 2 3 Contents Motivation: Applications
More informationPeta-Scale Simulations with the HPC Software Framework walberla:
Peta-Scale Simulations with the HPC Software Framework walberla: Massively Parallel AMR for the Lattice Boltzmann Method SIAM PP 2016, Paris April 15, 2016 Florian Schornbaum, Christian Godenschwager,
More informationsmooth coefficients H. Köstler, U. Rüde
A robust multigrid solver for the optical flow problem with non- smooth coefficients H. Köstler, U. Rüde Overview Optical Flow Problem Data term and various regularizers A Robust Multigrid Solver Galerkin
More informationOn the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters
1 On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters N. P. Karunadasa & D. N. Ranasinghe University of Colombo School of Computing, Sri Lanka nishantha@opensource.lk, dnr@ucsc.cmb.ac.lk
More informationContents. I The Basic Framework for Stationary Problems 1
page v Preface xiii I The Basic Framework for Stationary Problems 1 1 Some model PDEs 3 1.1 Laplace s equation; elliptic BVPs... 3 1.1.1 Physical experiments modeled by Laplace s equation... 5 1.2 Other
More informationGeneric Topology Mapping Strategies for Large-scale Parallel Architectures
Generic Topology Mapping Strategies for Large-scale Parallel Architectures Torsten Hoefler and Marc Snir Scientific talk at ICS 11, Tucson, AZ, USA, June 1 st 2011, Hierarchical Sparse Networks are Ubiquitous
More informationMassively Parallel Finite Element Simulations with deal.ii
Massively Parallel Finite Element Simulations with deal.ii Timo Heister, Texas A&M University 2012-02-16 SIAM PP2012 joint work with: Wolfgang Bangerth, Carsten Burstedde, Thomas Geenen, Martin Kronbichler
More informationIntroduction to Multigrid and its Parallelization
Introduction to Multigrid and its Parallelization! Thomas D. Economon Lecture 14a May 28, 2014 Announcements 2 HW 1 & 2 have been returned. Any questions? Final projects are due June 11, 5 pm. If you are
More informationSoftware and Performance Engineering for numerical codes on GPU clusters
Software and Performance Engineering for numerical codes on GPU clusters H. Köstler International Workshop of GPU Solutions to Multiscale Problems in Science and Engineering Harbin, China 28.7.2010 2 3
More informationDuksu Kim. Professional Experience Senior researcher, KISTI High performance visualization
Duksu Kim Assistant professor, KORATEHC Education Ph.D. Computer Science, KAIST Parallel Proximity Computation on Heterogeneous Computing Systems for Graphics Applications Professional Experience Senior
More informationPerformance Optimization of a Massively Parallel Phase-Field Method Using the HPC Framework walberla
Performance Optimization of a Massively Parallel Phase-Field Method Using the HPC Framework walberla SIAM PP 2016, April 13 th 2016 Martin Bauer, Florian Schornbaum, Christian Godenschwager, Johannes Hötzer,
More informationHARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA
HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES Cliff Woolley, NVIDIA PREFACE This talk presents a case study of extracting parallelism in the UMT2013 benchmark for 3D unstructured-mesh
More informationAccelerating image registration on GPUs
Accelerating image registration on GPUs Harald Köstler, Sunil Ramgopal Tatavarty SIAM Conference on Imaging Science (IS10) 13.4.2010 Contents Motivation: Image registration with FAIR GPU Programming Combining
More informationSpace-Filling Curves An Introduction
Department of Informatics Technical University Munich Space-Filling Curves An Introduction Paper accompanying the presentation held on April nd 005 for the Joint Advanced Student School (JASS) in St. Petersburg
More informationMatrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs
Iterative Solvers Numerical Results Conclusion and outlook 1/18 Matrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs Eike Hermann Müller, Robert Scheichl, Eero Vainikko
More informationAccelerated Earthquake Simulations
Accelerated Earthquake Simulations Alex Breuer Technische Universität München Germany 1 Acknowledgements Volkswagen Stiftung Project ASCETE: Advanced Simulation of Coupled Earthquake-Tsunami Events Bavarian
More informationEfficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs
Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,
More informationLarge scale Imaging on Current Many- Core Platforms
Large scale Imaging on Current Many- Core Platforms SIAM Conf. on Imaging Science 2012 May 20, 2012 Dr. Harald Köstler Chair for System Simulation Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen,
More informationEfficient Global Element Indexing for Parallel Adaptive Flow Solvers
Procedia Computer Science Volume 29, 2014, Pages 246 255 ICCS 2014. 14th International Conference on Computational Science Efficient Global Element Indexing for Parallel Adaptive Flow Solvers Michael Lieb,
More informationEfficient Imaging Algorithms on Many-Core Platforms
Efficient Imaging Algorithms on Many-Core Platforms H. Köstler Dagstuhl, 22.11.2011 Contents Imaging Applications HDR Compression performance of PDE-based models Image Denoising performance of patch-based
More informationA Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids
A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids Patrice Castonguay and Antony Jameson Aerospace Computing Lab, Stanford University GTC Asia, Beijing, China December 15 th, 2011
More informationMemory Efficient Adaptive Mesh Generation and Implementation of Multigrid Algorithms Using Sierpinski Curves
Memory Efficient Adaptive Mesh Generation and Implementation of Multigrid Algorithms Using Sierpinski Curves Michael Bader TU München Stefanie Schraufstetter TU München Jörn Behrens AWI Bremerhaven Abstract
More informationPhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea.
Abdulrahman Manea PhD Student Hamdi Tchelepi Associate Professor, Co-Director, Center for Computational Earth and Environmental Science Energy Resources Engineering Department School of Earth Sciences
More informationReconstruction of Trees from Laser Scan Data and further Simulation Topics
Reconstruction of Trees from Laser Scan Data and further Simulation Topics Helmholtz-Research Center, Munich Daniel Ritter http://www10.informatik.uni-erlangen.de Overview 1. Introduction of the Chair
More informationIntroduction to parallel Computing
Introduction to parallel Computing VI-SEEM Training Paschalis Paschalis Korosoglou Korosoglou (pkoro@.gr) (pkoro@.gr) Outline Serial vs Parallel programming Hardware trends Why HPC matters HPC Concepts
More information1.2 Numerical Solutions of Flow Problems
1.2 Numerical Solutions of Flow Problems DIFFERENTIAL EQUATIONS OF MOTION FOR A SIMPLIFIED FLOW PROBLEM Continuity equation for incompressible flow: 0 Momentum (Navier-Stokes) equations for a Newtonian
More informationTowards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers
Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,
More informationGeneration of Multigrid-based Numerical Solvers for FPGA Accelerators
Generation of Multigrid-based Numerical Solvers for FPGA Accelerators Christian Schmitt, Moritz Schmid, Frank Hannig, Jürgen Teich, Sebastian Kuckuk, Harald Köstler Hardware/Software Co-Design, System
More informationParallel High-Order Geometric Multigrid Methods on Adaptive Meshes for Highly Heterogeneous Nonlinear Stokes Flow Simulations of Earth s Mantle
ICES Student Forum The University of Texas at Austin, USA November 4, 204 Parallel High-Order Geometric Multigrid Methods on Adaptive Meshes for Highly Heterogeneous Nonlinear Stokes Flow Simulations of
More informationEfficient multigrid solvers for strongly anisotropic PDEs in atmospheric modelling
Iterative Solvers Numerical Results Conclusion and outlook 1/22 Efficient multigrid solvers for strongly anisotropic PDEs in atmospheric modelling Part II: GPU Implementation and Scaling on Titan Eike
More informationAdaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics
Adaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics H. Y. Schive ( 薛熙于 ) Graduate Institute of Physics, National Taiwan University Leung Center for Cosmology and Particle Astrophysics
More informationIntegrating GPUs as fast co-processors into the existing parallel FE package FEAST
Integrating GPUs as fast co-processors into the existing parallel FE package FEAST Dipl.-Inform. Dominik Göddeke (dominik.goeddeke@math.uni-dortmund.de) Mathematics III: Applied Mathematics and Numerics
More informationParallel FEM Computation and Multilevel Graph Partitioning Xing Cai
Parallel FEM Computation and Multilevel Graph Partitioning Xing Cai Simula Research Laboratory Overview Parallel FEM computation how? Graph partitioning why? The multilevel approach to GP A numerical example
More informationRadial Basis Function-Generated Finite Differences (RBF-FD): New Opportunities for Applications in Scientific Computing
Radial Basis Function-Generated Finite Differences (RBF-FD): New Opportunities for Applications in Scientific Computing Natasha Flyer National Center for Atmospheric Research Boulder, CO Meshes vs. Mesh-free
More informationFOR P3: A monolithic multigrid FEM solver for fluid structure interaction
FOR 493 - P3: A monolithic multigrid FEM solver for fluid structure interaction Stefan Turek 1 Jaroslav Hron 1,2 Hilmar Wobker 1 Mudassar Razzaq 1 1 Institute of Applied Mathematics, TU Dortmund, Germany
More informationHandling Parallelisation in OpenFOAM
Handling Parallelisation in OpenFOAM Hrvoje Jasak hrvoje.jasak@fsb.hr Faculty of Mechanical Engineering and Naval Architecture University of Zagreb, Croatia Handling Parallelisation in OpenFOAM p. 1 Parallelisation
More informationFinite Element Integration and Assembly on Modern Multi and Many-core Processors
Finite Element Integration and Assembly on Modern Multi and Many-core Processors Krzysztof Banaś, Jan Bielański, Kazimierz Chłoń AGH University of Science and Technology, Mickiewicza 30, 30-059 Kraków,
More informationSome aspects of parallel program design. R. Bader (LRZ) G. Hager (RRZE)
Some aspects of parallel program design R. Bader (LRZ) G. Hager (RRZE) Finding exploitable concurrency Problem analysis 1. Decompose into subproblems perhaps even hierarchy of subproblems that can simultaneously
More informationKartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18
Accelerating PageRank using Partition-Centric Processing Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Outline Introduction Partition-centric Processing Methodology Analytical Evaluation
More informationGraph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen
Graph Partitioning for High-Performance Scientific Simulations Advanced Topics Spring 2008 Prof. Robert van Engelen Overview Challenges for irregular meshes Modeling mesh-based computations as graphs Static
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationEfficient AMG on Hybrid GPU Clusters. ScicomP Jiri Kraus, Malte Förster, Thomas Brandes, Thomas Soddemann. Fraunhofer SCAI
Efficient AMG on Hybrid GPU Clusters ScicomP 2012 Jiri Kraus, Malte Förster, Thomas Brandes, Thomas Soddemann Fraunhofer SCAI Illustration: Darin McInnis Motivation Sparse iterative solvers benefit from
More informationSpace Filling Curves and Hierarchical Basis. Klaus Speer
Space Filling Curves and Hierarchical Basis Klaus Speer Abstract Real world phenomena can be best described using differential equations. After linearisation we have to deal with huge linear systems of
More informationGPU Cluster Computing for FEM
GPU Cluster Computing for FEM Dominik Göddeke Sven H.M. Buijssen, Hilmar Wobker and Stefan Turek Angewandte Mathematik und Numerik TU Dortmund, Germany dominik.goeddeke@math.tu-dortmund.de GPU Computing
More informationMassively Parallel Phase Field Simulations using HPC Framework walberla
Massively Parallel Phase Field Simulations using HPC Framework walberla SIAM CSE 2015, March 15 th 2015 Martin Bauer, Florian Schornbaum, Christian Godenschwager, Johannes Hötzer, Harald Köstler and Ulrich
More informationGeneric finite element capabilities for forest-of-octrees AMR
Generic finite element capabilities for forest-of-octrees AMR Carsten Burstedde joint work with Omar Ghattas, Tobin Isaac Institut für Numerische Simulation (INS) Rheinische Friedrich-Wilhelms-Universität
More informationPlacement de processus (MPI) sur architecture multi-cœur NUMA
Placement de processus (MPI) sur architecture multi-cœur NUMA Emmanuel Jeannot, Guillaume Mercier LaBRI/INRIA Bordeaux Sud-Ouest/ENSEIRB Runtime Team Lyon, journées groupe de calcul, november 2010 Emmanuel.Jeannot@inria.fr
More informationParallel Mesh Partitioning in Alya
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Parallel Mesh Partitioning in Alya A. Artigues a *** and G. Houzeaux a* a Barcelona Supercomputing Center ***antoni.artigues@bsc.es
More informationAmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015
AmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015 Agenda Introduction to AmgX Current Capabilities Scaling V2.0 Roadmap for the future 2 AmgX Fast, scalable linear solvers, emphasis on iterative
More informationComputing on GPU Clusters
Computing on GPU Clusters Robert Strzodka (MPII), Dominik Göddeke G (TUDo( TUDo), Dominik Behr (AMD) Conference on Parallel Processing and Applied Mathematics Wroclaw, Poland, September 13-16, 16, 2009
More information"On the Capability and Achievable Performance of FPGAs for HPC Applications"
"On the Capability and Achievable Performance of FPGAs for HPC Applications" Wim Vanderbauwhede School of Computing Science, University of Glasgow, UK Or in other words "How Fast Can Those FPGA Thingies
More informationTwo-Phase flows on massively parallel multi-gpu clusters
Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous
More informationAdvances of parallel computing. Kirill Bogachev May 2016
Advances of parallel computing Kirill Bogachev May 2016 Demands in Simulations Field development relies more and more on static and dynamic modeling of the reservoirs that has come a long way from being
More informationACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS
ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS Ferdinando Alessi Annalisa Massini Roberto Basili INGV Introduction The simulation of wave propagation
More informationOn Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators
On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC
More informationHPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)
HPC and IT Issues Session Agenda Deployment of Simulation (Trends and Issues Impacting IT) Discussion Mapping HPC to Performance (Scaling, Technology Advances) Discussion Optimizing IT for Remote Access
More informationEfficiency of adaptive mesh algorithms
Efficiency of adaptive mesh algorithms 23.11.2012 Jörn Behrens KlimaCampus, Universität Hamburg http://www.katrina.noaa.gov/satellite/images/katrina-08-28-2005-1545z.jpg Model for adaptive efficiency 10
More informationTowards real-time prediction of Tsunami impact effects on nearshore infrastructure
Towards real-time prediction of Tsunami impact effects on nearshore infrastructure Manfred Krafczyk & Jonas Tölke Inst. for Computational Modeling in Civil Engineering http://www.cab.bau.tu-bs.de 24.04.2007
More informationMesh reordering in Fluidity using Hilbert space-filling curves
Mesh reordering in Fluidity using Hilbert space-filling curves Mark Filipiak EPCC, University of Edinburgh March 2013 Abstract Fluidity is open-source, multi-scale, general purpose CFD model. It is a finite
More informationFast Dynamic Load Balancing for Extreme Scale Systems
Fast Dynamic Load Balancing for Extreme Scale Systems Cameron W. Smith, Gerrett Diamond, M.S. Shephard Computation Research Center (SCOREC) Rensselaer Polytechnic Institute Outline: n Some comments on
More informationComputational Fluid Dynamics and Interactive Visualisation
Computational Fluid Dynamics and Interactive Visualisation Ralf-Peter Mundani 1, Jérôme Frisch 2 1 Computation in Engineering, TUM 2 E3D, RWTH Aachen University Interdisciplinary Cluster Workshop on Visualization
More informationEfficient O(N log N) algorithms for scattered data interpolation
Efficient O(N log N) algorithms for scattered data interpolation Nail Gumerov University of Maryland Institute for Advanced Computer Studies Joint work with Ramani Duraiswami February Fourier Talks 2007
More informationWorkshop on Efficient Solvers in Biomedical Applications, Graz, July 2-5, 2012
Workshop on Efficient Solvers in Biomedical Applications, Graz, July 2-5, 2012 This work was performed under the auspices of the U.S. Department of Energy by under contract DE-AC52-07NA27344. Lawrence
More informationEffect of memory latency
CACHE AWARENESS Effect of memory latency Consider a processor operating at 1 GHz (1 ns clock) connected to a DRAM with a latency of 100 ns. Assume that the processor has two ALU units and it is capable
Multi-Physics Multi-Code Coupling On Supercomputers J.C. Cajas 1, G. Houzeaux 1, M. Zavala 1, M. Vázquez 1, B. Uekermann 2, B. Gatzhammer 2, M. Mehl 2, Y. Fournier 3, C. Moulinec 4 1) er, Edificio NEXUS
Computational Fluid Dynamics with the Lattice Boltzmann Method KTH SCI, Stockholm March 17 March 21, 2014 Florian Schornbaum, Martin Bauer, Simon Bogner Chair for System Simulation Friedrich-Alexander-Universität
ACCELERATING CFD AND RESERVOIR SIMULATIONS WITH ALGEBRAIC MULTI GRID Chris Gottbrath, Nov 2016 Challenges What is Algebraic Multi-Grid (AMG)? AGENDA Why use AMG? When to use AMG? NVIDIA AmgX Results 2
ESPRESO ExaScale PaRallel FETI Solver Hybrid FETI Solver Report Lubomir Riha, Tomas Brzobohaty IT4Innovations Outline HFETI theory from FETI to HFETI communication hiding and avoiding techniques our new
Center for Computational Science Toward GPU-accelerated meshfree fluids simulation using the fast multipole method Lorena A Barba Boston University Department of Mechanical Engineering with: Felipe Cruz,
GPU-Accelerated Algebraic Multigrid for Commercial Applications Joe Eaton, Ph.D. Manager, NVAMG CUDA Library NVIDIA ANSYS Fluent 2 Fluent control flow Accelerate this first Non-linear iterations Assemble
Efficient Multi-GPU CUDA Linear Solvers for OpenFOAM Alexander Monakov, amonakov@ispras.ru Institute for System Programming of Russian Academy of Sciences March 20, 2013 1 / 17 Problem Statement In OpenFOAM,
Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair
Turbostream: A CFD solver for manycore processors Tobias Brandvik Whittle Laboratory University of Cambridge Aim To produce an order of magnitude reduction in the run-time of CFD solvers for the same hardware
Multilevel optimization by space-filling curves in adaptive atmospheric modeling Jörn Behrens behrens@ma.tum.de http://www.joernbehrens.de/ TU München Zentrum Mathematik (M3) Botzmannstr. 3 85747 Garching
2015-09-14 Algorithms, System and Data Centre Optimisation for Energy Efficient HPC Vincent Heuveline URZ Computing Centre of Heidelberg University EMCL Engineering Mathematics and Computing Lab 1 Energy
Exploiting the Potential of European HPC Stakeholders in Extreme-Scale Demonstrators Top-Down System Design Approach Hans-Christian Hoppe, Intel Deutschland GmbH Motivation & Introduction Computer system
ICON for HD(CP)² High Definition Clouds and Precipitation for Advancing Climate Prediction ICON 2 years ago Parameterize shallow
Bandwidth Avoiding Stencil Computations By Kaushik Datta, Sam Williams, Kathy Yelick, and Jim Demmel, and others Berkeley Benchmarking and Optimization Group UC Berkeley March 13, 2008 http://bebop.cs.berkeley.edu
State of the art distributed parallel computational techniques in industrial finite element analysis Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering Ajaccio, France
CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline
ANSYS HPC Technology Leadership 1 ANSYS, Inc. November 14, Why ANSYS Users Need HPC Insight you can't get any other way It's all about getting better insight into product behavior quicker! HPC enables
Tools and Primitives for High Performance Graph Computation John R. Gilbert University of California, Santa Barbara Aydin Buluç (LBNL) Adam Lugowski (UCSB) SIAM Minisymposium on Analyzing Massive Real-World
Cluster Computing Benchmarking CPU Performance Many benchmarks available MHz (cycle speed of processor) MIPS (million instructions per second) Peak FLOPS Whetstone Stresses unoptimized scalar performance,
Functional Description of the Architecture of a Special Purpose Processor for Orders of Magnitude Reduction in Run Time in Computational Electromagnetics Tayfun Özdemir Virtual EM Inc. Ann Arbor, Michigan,
Center Extreme Scale CS Research Center for Compressible Multiphase Turbulence University of Florida Sanjay Ranka Herman Lam Outline 10 6 10 7 10 8 10 9 cores Parallelization and UQ of Rocfun and CMT-Nek
Thread and Data parallelism in CPUs - will GPUs become obsolete? USP, Sao Paulo 25/03/11 Carsten Trinitis Carsten.Trinitis@tum.de Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR) Institut für
High Performance Computing (HPC) in der Verfahrenstechnik Hans Hasse 1), Jadran Vrabec 2), Hans-Joachim Bungartz 3) 1) Lehrstuhl für Thermodynamik, TU Kaiserslautern 2) Lehrstuhl für Thermodynamik und
A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE Abu Asaduzzaman, Deepthi Gummadi, and Chok M. Yip Department of Electrical Engineering and Computer Science Wichita State University Wichita, Kansas, USA
Topology and affinity aware hierarchical and distributed load-balancing in Charm++ Emmanuel Jeannot, Guillaume Mercier, François Tessier Inria - IPB - LaBRI - University of Bordeaux - Argonne National