A Scalable Adaptive Mesh Refinement Framework For Parallel Astrophysics Applications
James Bordner, Michael L. Norman
San Diego Supercomputer Center, University of California, San Diego
15th SIAM Conference on Parallel Processing for Scientific Computing, 16 February 2012
James Bordner (U.C. San Diego), Enzo-P / Cello, SIAM PP12
Enzo-P / Cello: Outline
- Introduction
  - project overview
  - motivation
- Cello AMR
  - SAMR review
  - patch merging
  - dual-decomposition
  - message-driven execution
- Conclusions
Enzo-P / Cello: Introduction
- Cello began as a project to provide Enzo with highly scalable AMR
- [Diagram: Enzo, Enzo-P, Cello]
- Enzo: astrophysics / cosmology application
  - patch-based SAMR
  - MPI or MPI / OpenMP
  - 18 years development; 150K SLOC
- Enzo-P / Cello: petascale fork of Enzo code
  - modified tree-based SAMR
  - MPI or CHARM++
  - 2 years development; 25K SLOC
  - Work in progress!
- SLOC counts generated using David A. Wheeler's SLOCCount
Motivation: Enzo's Strengths
- Multiple application domains
  - astrophysical fluid dynamics
  - hydrodynamic cosmology
- Rich multi-physics capabilities
  - fluid, particle, gravity, radiation
- Extreme resolution range
  - 3 levels of refinement by 2!
- Hybrid MPI / OpenMP
- Active global development community
  - 25 developers
- [Image credit: John Wise]
Motivation: Enzo's Struggles
- Grid patch meta-data is large
  - 1.5KB/patch (MPI/OpenMP helps)
- Memory fragmentation
- Mesh quality
  - 2-to-1 constraint can be violated
  - asymmetric mesh for symmetric problem
- Load balancing
  - difficulty maintaining parent-child locality
- Parallel scaling
  - AMR overhead dominates computation
- [Image credit: Tom Abel, John Wise, Ralf Kaehler]
Patch-based or Tree-based SAMR? Some advantages of patch-based AMR
- Flexible patch size and shape
  - improved refinement efficiency
- Larger patches
  - smaller surface/volume ratio
  - reduced communication
  - amortized loop overhead
- Fewer patches
  - smaller trees: reduced meta-data
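The surface/volume argument can be made concrete with a little arithmetic. The sketch below assumes cubic patches and a ghost-zone layer of fixed depth on every face; the function name and the default depth of one cell are illustrative assumptions, not Enzo or Cello parameters. It reports how many ghost cells must be communicated per interior cell as the patch grows.

```python
def ghost_to_interior_ratio(n: int, ghost_depth: int = 1) -> float:
    """Ghost cells per interior cell for a cubic patch of n**3 cells.

    Assumes a ghost layer `ghost_depth` cells deep on every face
    (an illustrative choice, not a value taken from the slides).
    """
    interior = n ** 3
    padded = (n + 2 * ghost_depth) ** 3
    return (padded - interior) / interior


if __name__ == "__main__":
    # Larger patches spend proportionally less storage and traffic on ghosts.
    for n in (8, 16, 32, 64):
        ratio = ghost_to_interior_ratio(n)
        print(f"{n:3d}^3 cells: {ratio:.3f} ghost cells per interior cell")
```

The ratio falls roughly as 1/n, which is the sense in which larger patches reduce communication.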
Patch-based or Tree-based SAMR? Some advantages of tree-based AMR
- Fixed block size and shape
  - simplified load balancing
  - dynamic memory reuse
- More blocks
  - more parallelism available
- Smaller nodes: less meta-data
- Compute only on leaf nodes
  - less communication
Cello AMR: Overview
- Cello uses a modified tree-based SAMR approach
- Modifications primarily to address large tree sizes
  - Patch merging to reduce node count
  - Dual-decomposition to maintain parallelism
  - Targeted refinement for deep hierarchies
- Message-driven execution to address many issues
  - dynamic scheduling
  - latency tolerant
  - overlaps communication / computation
  - automatic load balancing
  - ...
Patch Merging: The Basic Idea
- [Figure panels: 25 leaf nodes; 13 leaf nodes; 25 leaf nodes]
1. Merge patches into larger ones when possible
2. Split patches into smaller ones when necessary
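One way to picture the merge and split steps is on a small quadtree. The sketch below is only an illustration under stated assumptions: the `Node` class, the `patch_levels` counter, and the example tree are hypothetical, the merge rule (collapse four sibling leaves of equal refinement into one patch) is a guess at the idea rather than Cello's algorithm, and `split` here simply undoes every merge instead of splitting selectively. The example tree was chosen so that the leaf counts match the 25 and 13 quoted on the slide; it is not the tree drawn in the figure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Quadtree node; an empty `children` list means a leaf.

    patch_levels > 0 marks a merged patch that stores that many levels of
    uniform refinement as one array instead of a subtree (hypothetical).
    """
    children: List["Node"] = field(default_factory=list)
    patch_levels: int = 0


def count_leaves(node: Node) -> int:
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children)


def merge(node: Node) -> Node:
    """Collapse every uniformly refined group of four siblings into one patch."""
    if not node.children:
        return Node(patch_levels=node.patch_levels)
    kids = [merge(c) for c in node.children]
    all_leaves = all(not k.children for k in kids)
    same_depth = len({k.patch_levels for k in kids}) == 1
    if all_leaves and same_depth:
        return Node(patch_levels=kids[0].patch_levels + 1)
    return Node(children=kids)


def split(node: Node) -> Node:
    """Expand every merged patch back into an explicit subtree."""
    if node.children:
        return Node(children=[split(c) for c in node.children])
    if node.patch_levels == 0:
        return Node()
    finer = node.patch_levels - 1
    return Node(children=[split(Node(patch_levels=finer)) for _ in range(4)])


def refined() -> Node:
    return Node(children=[Node() for _ in range(4)])


def corner_refined() -> Node:
    return Node(children=[refined(), Node(), Node(), Node()])


def example_tree() -> Node:
    # One fully refined quadrant plus three quadrants refined in one corner.
    return Node(children=[refined(), corner_refined(),
                          corner_refined(), corner_refined()])


if __name__ == "__main__":
    tree = example_tree()
    merged = merge(tree)
    print(count_leaves(tree), count_leaves(merged), count_leaves(split(merged)))
```

Each merge replaces four leaves with one, so four merges account for the drop of twelve leaves in this example.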
Patch Merging: A More Realistic Example
- Cosmological structure formation
  - L = 38.Mpc
  - 10 2 m 10 7 range in mass density
- Octree refined to L = 8 has nodes
- Balanced tree has nodes
- Patch merging tree has 1057 nodes
Patch Merging: Summary
- Could reduce AMR meta-data by a factor of 2 to 3
  - (including 25% to 50% increase in node size)
- However, there are disadvantages
  - fewer patches: less available parallelism
  - variable sizes: difficult to load-balance
- How can we regain advantages lost?
  - decompose large Patches into smaller Blocks...
Dual-Decomposition: The Basic Idea
- [Diagram: Hierarchy, Patch, Block]
- Hierarchy: octree-like container of distributed Patches
- Patch: distributed array of Blocks
- Block: local arrays of data (fields, particles, etc.)
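The three-level containment can be sketched as plain data structures. Everything below is a hypothetical illustration in two dimensions: the class and method names are invented for this example and are not Cello's API, the `Hierarchy` is reduced to a flat list rather than an octree-like container, and nothing is actually distributed across processes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Index2 = Tuple[int, int]


@dataclass
class Block:
    """Local arrays of data for one fixed-size piece of a Patch."""
    index: Index2
    shape: Index2
    fields: Dict[str, List[List[float]]] = field(default_factory=dict)

    def add_field(self, name: str, value: float = 0.0) -> None:
        nx, ny = self.shape
        self.fields[name] = [[value] * ny for _ in range(nx)]


@dataclass
class Patch:
    """A uniform region of the mesh stored as an array of Blocks."""
    cells: Index2
    block_shape: Index2
    blocks: Dict[Index2, Block] = field(default_factory=dict)

    def __post_init__(self) -> None:
        (cx, cy), (bx, by) = self.cells, self.block_shape
        if cx % bx or cy % by:
            raise ValueError("patch size must be a multiple of the block size")
        for i in range(cx // bx):
            for j in range(cy // by):
                self.blocks[(i, j)] = Block(index=(i, j), shape=self.block_shape)

    def locate(self, ix: int, iy: int) -> Tuple[Index2, Index2]:
        """Map a patch-wide cell index to (block index, index within block)."""
        bx, by = self.block_shape
        return (ix // bx, iy // by), (ix % bx, iy % by)


@dataclass
class Hierarchy:
    """Container of Patches (a flat list here, for brevity)."""
    patches: List[Patch] = field(default_factory=list)

    def block_count(self) -> int:
        return sum(len(p.blocks) for p in self.patches)


if __name__ == "__main__":
    big = Patch(cells=(32, 32), block_shape=(8, 8))
    small = Patch(cells=(16, 16), block_shape=(8, 8))
    print(Hierarchy([big, small]).block_count(), big.locate(19, 5))
```

Because every Block has the same shape, the number of schedulable units grows with patch size even though the tree holds only one node per Patch.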
Dual-Decomposition: Communication Patterns
- [Figure panels: Intra-patch block update; Inter-patch block update]
- Intra-patch block update
  - neighbor blocks in same patch
  - distributed uniform mesh problem
  - regular communication patterns
  - efficient and scalable
- Inter-patch block update
  - neighbor blocks in neighbor patches
  - standard coarse/fine interface update
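The intra-patch case is the familiar ghost-zone exchange of a distributed uniform mesh. The sketch below shows it in one dimension under simplifying assumptions: blocks are plain Python lists held in one process, the ghost depth `GHOST` and both function names are illustrative, and ghost cells on the outer edge of the patch are left untouched because filling them would be the inter-patch update.

```python
from typing import List

GHOST = 1  # ghost-zone depth; an illustrative assumption


def make_blocks(values: List[float], block_size: int) -> List[List[float]]:
    """Split a 1-D uniform mesh into blocks padded with GHOST cells per side."""
    if len(values) % block_size:
        raise ValueError("mesh length must be a multiple of the block size")
    blocks = []
    for start in range(0, len(values), block_size):
        interior = values[start:start + block_size]
        blocks.append([0.0] * GHOST + interior + [0.0] * GHOST)
    return blocks


def refresh_intra_patch(blocks: List[List[float]]) -> None:
    """Fill ghost cells that face a neighbouring block in the same patch."""
    for i, block in enumerate(blocks):
        if i > 0:
            left = blocks[i - 1]
            # Last GHOST interior cells of the left neighbour.
            block[:GHOST] = left[-2 * GHOST:-GHOST]
        if i < len(blocks) - 1:
            right = blocks[i + 1]
            # First GHOST interior cells of the right neighbour.
            block[-GHOST:] = right[GHOST:2 * GHOST]


if __name__ == "__main__":
    blocks = make_blocks([float(v) for v in range(1, 9)], block_size=4)
    refresh_intra_patch(blocks)
    print(blocks)
```

Every block performs the same two copies with fixed neighbours, which is why the pattern is regular and scales well.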
Dual-Decomposition: Summary
- Regains parallelism lost in patch merging
- Maintains same underlying computational mesh
- Replaces some subtrees with arrays
- Embedded unigrid-efficiency in uniformly-refined subregions
- [Diagram: Hierarchy, Patch, Block]
Message-driven Execution: What is CHARM++?
- CHARM++ is a parallel language and runtime system
- Provides processor virtualization
  - multiple objects per physical processor
  - runtime system schedules object methods
- Important advantages for AMR
  - asynchronous: latency-tolerance
  - well-suited for complex, dynamic applications
- Also provides
  - fault tolerance
  - dynamic load balancing
  - checkpoint/restart
Message-driven Execution: What is CHARM++?
- Programmer sees a collection of objects
  - CHARM++ objects are called chares
  - chares send messages to each other
  - remote function calls: entry methods
- CHARM++ runtime system
  - maps chares to physical processors
  - schedules entry methods for execution
  - migrates chares to load balance
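The chare and entry-method vocabulary can be illustrated with a toy, single-process scheduler. This is not the CHARM++ API: `Scheduler`, `RingChare`, `pass_token`, and `run_ring` are names made up for the example, there is no real parallelism or migration, and the only point is that sending a message returns immediately while the runtime decides when each entry method runs.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Tuple


class Scheduler:
    """Toy stand-in for a message-driven runtime (hypothetical, single process)."""

    def __init__(self) -> None:
        self.chares: Dict[int, Any] = {}
        self.queue: Deque[Tuple[int, str, tuple]] = deque()
        self.log: List[Tuple[int, int]] = []

    def register(self, cid: int, chare: Any) -> None:
        self.chares[cid] = chare

    def send(self, target: int, entry: str, *args: Any) -> None:
        """Asynchronous 'remote call': enqueue the message and return at once."""
        self.queue.append((target, entry, args))

    def run(self) -> None:
        """Deliver queued messages by invoking the named entry method."""
        while self.queue:
            target, entry, args = self.queue.popleft()
            getattr(self.chares[target], entry)(*args)


class RingChare:
    """An object whose methods are only ever invoked by the scheduler."""

    def __init__(self, cid: int, ring_size: int, runtime: Scheduler) -> None:
        self.cid = cid
        self.ring_size = ring_size
        self.runtime = runtime
        runtime.register(cid, self)

    def pass_token(self, hops_left: int) -> None:
        # Entry method: record the visit, then message the next chare.
        self.runtime.log.append((self.cid, hops_left))
        if hops_left > 0:
            nxt = (self.cid + 1) % self.ring_size
            self.runtime.send(nxt, "pass_token", hops_left - 1)


def run_ring(ring_size: int, hops: int) -> List[Tuple[int, int]]:
    runtime = Scheduler()
    for cid in range(ring_size):
        RingChare(cid, ring_size, runtime)
    runtime.send(0, "pass_token", hops)
    runtime.run()
    return runtime.log


if __name__ == "__main__":
    for cid, hops_left in run_ring(ring_size=3, hops=4):
        print(f"chare {cid} received the token with {hops_left} hops left")
```

No chare ever blocks waiting for a reply, which is the property the slides describe as latency tolerance.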
Message-driven Execution: CHARM++ Entities
- CHARM++ supports collections of chares
- Chare Arrays
  - distributed array of chares
  - migratable elements
- Chare Groups
  - one chare per processor (non-migratable)
- Chare Nodegroups
  - one chare per node (non-migratable)
Message-driven Execution: CHARM++ Entities in Cello
- [Diagram: Main, Simulation, Patch, Block; Process P 1]
- The mainchare creates a Simulation chare group
- Each Simulation contains some Patch chares
- Each Patch contains a chare array of Blocks
Cello CHARM++: Control flow in Enzo-P / Cello
- Current (unigrid) Enzo-P control flow
1. Initialize
   - create chares and chare arrays
   - set initial conditions
2. Refresh ghost zones
3. Calculate timestep
   - currently involves global reduction
   - this should be avoidable
4. Compute!
- [Diagram: Main, Simulation, Patch, Block]
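The timestep step is where the global reduction enters. The sketch below assumes a CFL-style stability limit, which is standard for explicit hydrodynamics but is not spelled out on the slide; the function names, the `cfl` safety factor, and the sample numbers are all illustrative. Each block computes a local limit, and the simulation advances with the minimum over all blocks.

```python
from typing import List


def block_timestep(dx: float, velocities: List[float],
                   sound_speed: float, cfl: float = 0.5) -> float:
    """Block-local limit: cfl * dx / max(|v| + c) (assumed CFL condition)."""
    fastest = max(abs(v) + sound_speed for v in velocities)
    return cfl * dx / fastest


def global_timestep(local_dts: List[float]) -> float:
    """Global min reduction: every block must advance with the same dt."""
    return min(local_dts)


if __name__ == "__main__":
    # Two hypothetical blocks with the same cell width but different flows.
    local = [
        block_timestep(dx=0.1, velocities=[1.0, -3.0], sound_speed=1.0),
        block_timestep(dx=0.1, velocities=[0.5, 0.2], sound_speed=1.0),
    ]
    print(local, global_timestep(local))
```

The `min` over all blocks is a synchronization point, so no block can start computing until every block has reported its local limit.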
Conclusions
- A beta version of Enzo-P / Cello is available at cello-project.org
  - Uniform Cartesian mesh (AMR ETA 3-6 months)
  - CHARM++ or MPI
  - Blocks contain arrays of field variables
    - controllable precision, ordering, padding, alignment, etc.
  - PPM hydrodynamics, PPML MHD
  - HDF5 I/O on every 1 k P processes
Enzo-P / Cello
- Enzo-P: Petascale astrophysics and cosmology application
- Cello: Scalable tree-based AMR framework
- Patch merging + dual-decomposition
  - suitable for embedded uniformly refined areas
  - estimated 3x fewer nodes
- Message-driven execution using CHARM++
  - especially suitable for huge complex dynamic problems
  - latency tolerant, auto load-balancing, checkpointing, etc.
- Targeted refinement for deep AMR
- Website Listserv jobordner@ucsd.edu
Dynamic load balancing in OSIRIS R. A. Fonseca 1,2 1 GoLP/IPFN, Instituto Superior Técnico, Lisboa, Portugal 2 DCTI, ISCTE-Instituto Universitário de Lisboa, Portugal Maintaining parallel load balance
More informationA RESTful catalog for simulations
Mem. S.A.It. Vol. 80, 365 c SAIt 2009 Memorie della A RESTful catalog for simulations R. Wagner Center for Astrophysics and Space Sciences, University of California at San Diego, La Jolla, CA 92093, e-mail:
More informationChapter 3. Design of Grid Scheduler. 3.1 Introduction
Chapter 3 Design of Grid Scheduler The scheduler component of the grid is responsible to prepare the job ques for grid resources. The research in design of grid schedulers has given various topologies
More informationPHYSICALLY BASED ANIMATION
PHYSICALLY BASED ANIMATION CS148 Introduction to Computer Graphics and Imaging David Hyde August 2 nd, 2016 WHAT IS PHYSICS? the study of everything? WHAT IS COMPUTATION? the study of everything? OUTLINE
More informationOptimization of PIERNIK for the Multiscale Simulations of High-Redshift Disk Galaxies
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Optimization of PIERNIK for the Multiscale Simulations of High-Redshift Disk Galaxies Kacper Kowalik a, Artur Gawryszczak
More informationMultiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448. The Greed for Speed
Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448 1 The Greed for Speed Two general approaches to making computers faster Faster uniprocessor All the techniques we ve been looking
More informationCS 350 : Data Structures B-Trees
CS 350 : Data Structures B-Trees David Babcock (courtesy of James Moscola) Department of Physical Sciences York College of Pennsylvania James Moscola Introduction All of the data structures that we ve
More informationCommunication and Topology-aware Load Balancing in Charm++ with TreeMatch
Communication and Topology-aware Load Balancing in Charm++ with TreeMatch Joint lab 10th workshop (IEEE Cluster 2013, Indianapolis, IN) Emmanuel Jeannot Esteban Meneses-Rojas Guillaume Mercier François
More informationCPSC / Sonny Chan - University of Calgary. Collision Detection II
CPSC 599.86 / 601.86 Sonny Chan - University of Calgary Collision Detection II Outline Broad phase collision detection: - Problem definition and motivation - Bounding volume hierarchies - Spatial partitioning
More informationAn Efficient CUDA Implementation of a Tree-Based N-Body Algorithm. Martin Burtscher Department of Computer Science Texas State University-San Marcos
An Efficient CUDA Implementation of a Tree-Based N-Body Algorithm Martin Burtscher Department of Computer Science Texas State University-San Marcos Mapping Regular Code to GPUs Regular codes Operate on
More informationOperating System Virtualization: Practice and Experience
Operating System Virtualization: Practice and Experience Oren Laadan and Jason Nieh Columbia University {orenl,nieh}@cs.columbia.edu SYSTOR 2010, Haifa, Israel 1 orenl@cs.columbia.edu SYSTOR 2010, Haifa,
More informationInteractive Analysis of Large Distributed Systems with Scalable Topology-based Visualization
Interactive Analysis of Large Distributed Systems with Scalable Topology-based Visualization Lucas M. Schnorr, Arnaud Legrand, and Jean-Marc Vincent e-mail : Firstname.Lastname@imag.fr Laboratoire d Informatique
More informationAsynchronous OpenCL/MPI numerical simulations of conservation laws
Asynchronous OpenCL/MPI numerical simulations of conservation laws Philippe HELLUY 1,3, Thomas STRUB 2. 1 IRMA, Université de Strasbourg, 2 AxesSim, 3 Inria Tonus, France IWOCL 2015, Stanford Conservation
More informationComputational Fluid Dynamics with the Lattice Boltzmann Method KTH SCI, Stockholm
Computational Fluid Dynamics with the Lattice Boltzmann Method KTH SCI, Stockholm March 17 March 21, 2014 Florian Schornbaum, Martin Bauer, Simon Bogner Chair for System Simulation Friedrich-Alexander-Universität
More information