ScaFaCoS and P³M. Olaf Lenz. Recent Developments. June 3, 2013
1 ScaFaCoS and P³M: Recent Developments. Olaf Lenz, June 3, 2013
2 Outline
- ScaFaCoS
- ScaFaCoS Methods
- Performance Comparison
- Recent P³M developments
3 ScaFaCoS
- Scalable Fast Coulomb Solver
- Highly scalable, MPI-parallelized library of different Coulomb solvers
- Common interface for all methods
- Developed by groups from Jülich, Wuppertal, Chemnitz, Bonn... and Stuttgart
- BMBF project, officially ended 2011
- Source code has been on GitHub for two months (yay!)
- First publication "will be submitted soon" (for 6 months now)
4 Interface

#include <fcs.h>

FCS handle = NULL;

/* Initialize P3M */
fcs_init(&handle, "p3m", MPI_COMM_WORLD);

/* Set common parameters */
fcs_set_common(handle, near_field, box_a, box_b, box_c,
               offset, periodicity, total_particles);

/* Set method-specific parameters */
fcs_p3m_set_r_cut(handle, r_cut);

/* Tune the method (optional) */
fcs_tune(handle, N, max_particles, positions, charges);

/* Run the method */
fcs_run(handle, N, max_particles, positions, charges, fields, potentials);

/* Finally destroy the handle */
fcs_destroy(handle);
5 Methods
ScaFaCoS currently provides 11 methods: DIRECT, EWALD, P3M, P2NFFT, VMG, PP3MG, PEPC, FMM, MEMD, MMM1D, MMM2D.
In the following comparison, only the bold methods are considered.
We distinguish splitting methods, hierarchical methods, and local methods (i.e. MEMD).
The other methods are included for reference purposes only (DIRECT, EWALD), are meant for different periodicities (MMM1D/MMM2D; here, only fully periodic systems are considered), or performed too badly (PEPC).
6 Splitting Methods
Problems of the electrostatic potential:
- slow decay: bad for direct summation
- singularity: bad for convergence-accelerating methods
Idea of splitting methods: split the potential into a fast-decaying near field and a non-singular far field.
The near field can be computed directly in O(N); for the far field, other methods can be used.
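As an illustration of such a split (the explicit form is not shown on the slide; this is the standard error-function splitting used by Ewald-type methods):

\[
\frac{1}{r} \;=\; \underbrace{\frac{\operatorname{erfc}(\alpha r)}{r}}_{\text{near field}} \;+\; \underbrace{\frac{\operatorname{erf}(\alpha r)}{r}}_{\text{far field}}
\]

The first term decays essentially exponentially and can be summed directly within a cutoff; the second term is smooth at r = 0 (it tends to 2α/√π) and is handled by the far-field solver.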
7 Splitting Methods: Ewald and Particle-Mesh Ewald
Ewald's idea: compute the far field in Fourier space.
- Ewald summation: O(N^(3/2))
- Particle-mesh Ewald: O(N log N)
  - discretize the far-field charge distribution onto a mesh
  - use the FFT to Fourier transform it
  - solve the Poisson equation in Fourier space
  - back-FFT to obtain the potential on the mesh
  - compute potentials or fields by interpolating the mesh potential
In ScaFaCoS: P3M (ICP), P2NFFT (Chemnitz; uses a non-equidistant FFT)
[Portrait: P. P. Ewald]
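The step "solve the Poisson equation in Fourier space" is just a mode-by-mode multiplication; in Gaussian units (a textbook form, not spelled out on the slide):

\[
\nabla^2 \Phi(\vec{r}) = -4\pi \rho(\vec{r})
\quad\Longrightarrow\quad
\hat{\Phi}(\vec{k}) = \frac{4\pi}{k^2}\,\hat{\rho}(\vec{k}), \qquad \vec{k} \neq 0 .
\]

After the forward FFT of the mesh charge density, each Fourier mode is therefore scaled by 4π/k²; P³M replaces the bare 4π/k² by an optimized influence function to reduce the discretization error.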
8 Splitting Methods: Multigrid
Solve the Poisson equation in the far field with a multigrid PDE solver:
- use different levels of successively coarser meshes
- solve the Poisson equation on these meshes by recursively improving the solution of the coarser mesh (a minimal sketch follows below)
Complexity O(N); can be extended to handle periodic BC.
In ScaFaCoS: PP3MG (Wuppertal), VMG (Bonn)
[Figure: V-cycle over levels l = 4, 3, 2, 1 with restriction, prolongation and smoothing/solving]
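To make the recursion concrete, here is a minimal, self-contained 1D Poisson V-cycle in C (my own illustrative sketch, not the PP3MG or VMG code; the real solvers work on 3D meshes with periodic boundary conditions):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Solve -u'' = f on (0,1) with u(0) = u(1) = 0, n interior points, spacing h.
   Build e.g. with: cc vcycle.c -lm */

/* 'sweeps' damped-Jacobi sweeps (omega = 2/3). */
static void smooth(double *u, const double *f, int n, double h, int sweeps) {
    double *tmp = malloc(n * sizeof *tmp);
    for (int s = 0; s < sweeps; ++s) {
        for (int i = 0; i < n; ++i) {
            double left  = (i > 0)     ? u[i-1] : 0.0;
            double right = (i < n - 1) ? u[i+1] : 0.0;
            tmp[i] = u[i]/3.0 + (2.0/3.0) * 0.5 * (h*h*f[i] + left + right);
        }
        for (int i = 0; i < n; ++i) u[i] = tmp[i];
    }
    free(tmp);
}

/* Residual r = f - A u for the 1D Laplacian. */
static void residual(double *r, const double *u, const double *f, int n, double h) {
    for (int i = 0; i < n; ++i) {
        double left  = (i > 0)     ? u[i-1] : 0.0;
        double right = (i < n - 1) ? u[i+1] : 0.0;
        r[i] = f[i] - (2.0*u[i] - left - right) / (h*h);
    }
}

/* One V-cycle on a grid with n = 2^l - 1 interior points. */
static void v_cycle(double *u, const double *f, int n, double h) {
    if (n == 1) { u[0] = 0.5 * h*h * f[0]; return; }   /* coarsest grid: direct solve */
    smooth(u, f, n, h, 3);                             /* pre-smoothing */

    int m = (n - 1) / 2;                               /* coarser grid size */
    double *r  = calloc(n, sizeof *r);
    double *rc = calloc(m, sizeof *rc);
    double *ec = calloc(m, sizeof *ec);

    residual(r, u, f, n, h);
    for (int j = 0; j < m; ++j)                        /* full-weighting restriction */
        rc[j] = 0.25 * (r[2*j] + 2.0*r[2*j+1] + r[2*j+2]);

    v_cycle(ec, rc, m, 2.0*h);                         /* recurse: coarse-grid correction */

    for (int j = 0; j < m; ++j)                        /* linear prolongation, add correction */
        u[2*j+1] += ec[j];
    for (int j = 0; j <= m; ++j) {
        double left  = (j > 0) ? ec[j-1] : 0.0;
        double right = (j < m) ? ec[j]   : 0.0;
        u[2*j] += 0.5 * (left + right);
    }
    smooth(u, f, n, h, 3);                             /* post-smoothing */
    free(r); free(rc); free(ec);
}

int main(void) {
    const double pi = 3.14159265358979323846;
    int n = 127;                                       /* 2^7 - 1 interior points */
    double h = 1.0 / (n + 1);
    double *u = calloc(n, sizeof *u);
    double *f = malloc(n * sizeof *f);
    for (int i = 0; i < n; ++i) {                      /* f = pi^2 sin(pi x), exact u = sin(pi x) */
        double x = (i + 1) * h;
        f[i] = pi * pi * sin(pi * x);
    }
    for (int cycle = 0; cycle < 10; ++cycle)
        v_cycle(u, f, n, h);
    printf("u(0.5) = %g (exact value 1)\n", u[n/2]);
    free(u); free(f);
    return 0;
}

Each cycle reduces the error by a roughly mesh-independent factor, which is what gives multigrid its O(N) complexity.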
9 Hierarchical Methods: Barnes-Hut Tree Code
- Multipole-expand successively larger clusters of particles
- Compute the interaction with far-away clusters instead of with single particles
- Complexity O(N log N)
- Can be extended to handle periodic BC
In ScaFaCoS: PEPC (Jülich)
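The slide does not state when a cluster counts as "far away"; a common choice (an assumption here, not taken from the talk) is the opening-angle test: a cluster of size s at distance d from the target particle is replaced by its multipole expansion whenever

\[
\frac{s}{d} < \theta ,
\]

with θ typically between 0.5 and 1; otherwise the cluster is opened and its children are tested recursively.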
10 Hierarchical Methods: Fast Multipole Method
- Extends Barnes-Hut: let clusters interact with each other
- Put everything on a grid
- Complexity O(N)
- Can be extended to handle periodic BC
In ScaFaCoS: FMM (Jülich)
11 Local Methods: MEMD
- See the talk of Florian F.
- Purely local: should show very nice parallel scaling
- Complexity O(N)
12 Benchmark Systems
- Cloud-wall system (ESPResSo test system): 300 charges
- Silica melt: … charges
13 Benchmark Systems 2
- When larger systems were needed, the systems were replicated
- PEPC was removed (performed too badly)
- Periodic systems, relatively homogeneous density, charge-neutral
- JUROPA (Linux cluster) for small to intermediate numbers of cores
- JUGENE (BlueGene/P HPC machine) for intermediate to large numbers of cores
- Accuracies are given by the relative RMS potential error

\[
\varepsilon_{\mathrm{pot}} :=
\left(
\frac{\sum_{j=1}^{N} \left| \Phi_{\mathrm{ref}}(\vec{x}_j) - \Phi_{\mathrm{method}}(\vec{x}_j) \right|^2}
     {\sum_{j=1}^{N} \left| \Phi_{\mathrm{ref}}(\vec{x}_j) \right|^2}
\right)^{1/2}
\]
14 Complexity
- P2NFFT, P³M and the FMM are fastest
- MEMD and the multigrid methods are roughly 10× slower
- All algorithms show (close-to-)linear behavior; the log N term of P2NFFT and P³M is invisible
- No cross-over with the FMM
[Plot: time per charge t/#charges vs. #charges; curves for MEMD, P2NFFT, P3M, VMG, PP3MG]
Silica melt, ε_pot ≈ 10⁻³, P = 1 (JUROPA)
15 Accuracy
- The FMM and P2NFFT scale very well and can achieve very high accuracy
- P³M cannot (due to tuning)
- The multigrid methods suffer from the steep potential (or from bad tuning)
- MEMD cannot influence its accuracy to any great extent
[Plot: time per charge t/#charges vs. relative RMS potential error ε_pot; curves for MEMD, P2NFFT, P3M, VMG, PP3MG]
N = … (cloud-wall), P = 1 (JUROPA)
16 Scaling: Timing
Plotting the execution time t vs. the number of cores P is often used to display parallel scaling:
- shows actual execution times
- hides the actual scaling
- hides differences between the algorithms
[Plot: time t vs. #cores P; curves for MEMD, P2NFFT, P3M, VMG, PP3MG]
N = … (cloud-wall), ε_pot ≈ 10⁻³, JUROPA
17 Scaling: Relative Parallel Efficiency
Parallel efficiency can be used to show scaling:

\[
e(P) = \frac{t_1}{t_P \, P}
\]

e(P) ∈ [0, 1], with e(P) = 1 for optimal scaling; it can be thought of as the effective fraction of the CPUs that is used in parallel.
Relative parallel efficiency to compare algorithms:

\[
e(P) = \frac{t_{\mathrm{best}} \, P_{\mathrm{best}}}{t_P \, P}
\]

[Plot: relative parallel efficiency e(P) vs. #cores P; curves for MEMD, P2NFFT, P3M, VMG, PP3MG]
N = … (cloud-wall), ε_pot ≈ 10⁻³, JUROPA
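As a quick sanity check with hypothetical numbers (not taken from the benchmark data): if the best run of an algorithm takes t_best = 10 s on P_best = 64 cores, and a run on P = 1024 cores takes t_P = 1 s, then

\[
e(1024) = \frac{10 \cdot 64}{1 \cdot 1024} = 0.625 ,
\]

i.e. about 63% of the cores are used effectively relative to the best run.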
18 Scaling: Comparing Methods
- Again: P2NFFT, the FMM and P³M are within 2× of each other
- The scaling of P³M is better than that of P2NFFT and the FMM
- Issue of P³M: tuning
- MEMD performs OK
- The scaling of the multigrid methods is very smooth and flat... but e(P) < 10%!
[Plot: relative parallel efficiency e(P) vs. #cores P; curves for MEMD, P2NFFT, P3M, VMG, PP3MG]
N = … (cloud-wall), ε_pot ≈ 10⁻³, JUROPA
19 Scaling: Small Systems
[Left plot: relative parallel efficiency e(P) vs. #cores P; curves for MEMD, P2NFFT, P3M, VMG, PP3MG; N = 8100 (cloud-wall), ε_pot ≈ 10⁻³, JUROPA]
[Right plot: same quantities for a larger system; N = … (cloud-wall), ε_pot ≈ 10⁻³, JUROPA]
20 Scaling: HPC Machine
[Left plot: relative parallel efficiency e(P) vs. #cores P; N = … (cloud-wall), ε_pot ≈ 10⁻³, JUROPA]
[Right plot: relative parallel efficiency e(P) vs. #cores P; N = … (cloud-wall), ε_pot ≈ 10⁻³, JUGENE]
- Older versions of both P³M and P2NFFT were used (JUGENE is dead)
- All algorithms show better scaling on JUGENE
- JUGENE has slower cores but better interconnects
21 Scaling: Large Systems
[Left plot: relative parallel efficiency e(P) vs. #cores P; curves for MEMD, P2NFFT, VMG, PP3MG; N = … (cloud-wall), ε_pot ≈ 10⁻³, JUGENE]
- Many algorithms can't handle large systems
[Right plot: relative parallel efficiency e(P) vs. #cores P; curves for P2NFFT, VMG, PP3MG; N = … (cloud-wall), ε_pot ≈ 10⁻³, JUGENE]
- For really large systems, the FMM seems to be good
- The FMM has done time steps with 3 trillion charges!... whatever that's good for
22 ScaFaCoS: Conclusions
- Performance depends heavily on architecture, compiler and implementation... and on tuning!
- 2× differences between algorithms are normal
- Within these limits, the FMM, P³M and P2NFFT perform equally well
- MEMD is slightly worse (≈4×), but performs better for larger systems
- The multigrid methods seem to be worse (≈10×)... apparently due to the large variation in the potential
23 P³M: Recent Developments
- Determined the optimal P³M components, gained a factor of ≈4 (Florian W.)
- Improved tuning (Florian W.)
- CUDA P³M: coming to ESPResSo really soon (Florian W.)
- First interface to ScaFaCoS (with problems) (Andreas M.)
- Improved P³M code (Florian W., Olaf)
- In progress: improved code organization, with a common code base for ScaFaCoS and ESPResSo (Florian W., Olaf)
- In progress: further improvements in tuning (April, Olaf)