Framework Middleware Bridging Large-apps and Hardware
1 Framework Middleware Bridging Large-apps and Hardware Zeyao Mo Institute of Applied Physics and Computational Mathematics CAEP Software Center for Numerical Simulation CoDesign 2014, Guangzhou
2 CoDesign: Bottleneck - Programming. [Chart: machine peak performance, from 1.0 TF (XX-VI) and 20 TF (XX-V) through 1.0 PF toward 100 PF (TH-1, TH-2), vs. application software.] Apps. Soft. Cooperation Wall: for real applications, #researchers, #groups and #LOC seriously increase. Apps. Soft. Efficiency Wall: performance, programming, power, reliability.
3 CoDesign: Bottleneck - Programming. Cooperation Wall, Efficiency Wall. Are general parallel programming models sufficient? Not enough: now and in the future.
4 General Models: Nested/Hybrid Programming Stacks. Heterogeneous models: GPU-CUDA, MIC-offload, OpenCL, etc.
5 General Models: Parallel Programming Challenges. Reconstruction of 18 years of MPI legacy codes; new-generation codes for multi-physics complex systems. Big and increasing gaps for realization. Nested/hybrid parallel programming models: MPI O(10K), NUMA-aware O(10), OpenMP O(100), ILP O(1); heterogeneous models: GPU-CUDA, MIC-offload, etc.
6 Big and increasing gaps for realization. Nested/hybrid parallel programming models and their evolution; complex management of data structures and irregular communication; multilevel/hybrid load imbalance arising from physics, stencils, communication, run-time, etc.; parallel implementation of fast numerical algorithms; component-based extensibility required by complex applications where LOC scales up to 100K; large-scale data management and visualization.
7 Great Challenges: Nested/Hybrid Parallel Programming. Load imbalance, fast numerical algorithms, code complexity, preprocessing/visualization, API. Good solution: scientific computing and engineering applications share common domain-specific frameworks. Application A: parallel code A; ... Application N: parallel code N.
8 Framework accelerates application software: encapsulates and separates parallel computing from applications; significantly simplifies the introduction of emerging parallel programming models; applies component-based software engineering to high-performance numerical simulation; preserves the rewriting effort for multiple years (e.g. years) even as the machine rapidly changes.
9 Make applications Think Parallel, Write Sequential. Once the framework is in place, the ratio of science experts to parallel experts is 10:1, and physics is added as serial code, now and in the future. -- M. Heroux, LANL, July 2011, DOE Workshop on Exascale Programming Challenges
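The "Think Parallel, Write Sequential" division of labor can be sketched as a strategy-class pattern. This is an illustrative serial analogue, not JASMIN's real API (all names below are invented for the example): the framework owns the patch loop and, in a real run, the distribution and communication, while the user writes only serial per-patch physics.

```cpp
#include <vector>

// Hypothetical sketch: framework side owns the (parallel) patch loop;
// the user supplies only the serial, per-patch numerical kernel.
struct Patch {
    std::vector<double> u;   // cell-centered data, ghost cells included
    int lo, hi;              // interior index range [lo, hi)
};

class PatchStrategy {        // user part extends this
public:
    virtual void computeOnPatch(Patch& p) = 0;
    virtual ~PatchStrategy() = default;
};

class Framework {            // framework part hides parallelism
public:
    // In a real run this loop is distributed over MPI ranks and preceded
    // by a halo exchange; here it is the serial analogue of that loop.
    void advance(std::vector<Patch>& patches, PatchStrategy& s) {
        for (auto& p : patches) s.computeOnPatch(p);
    }
};

// User part: plain serial physics, e.g. a 3-point averaging step.
class JacobiStrategy : public PatchStrategy {
public:
    void computeOnPatch(Patch& p) override {
        std::vector<double> out = p.u;
        for (int i = p.lo; i < p.hi; ++i)
            out[i] = (p.u[i-1] + p.u[i] + p.u[i+1]) / 3.0;
        p.u = out;
    }
};
```

A physics developer extends only the strategy class; changing how `Framework::advance` is parallelized never touches user code, which is the sense in which the framework "separates" parallel computing from applications.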
10 2. Framework-based Parallel Programming Models for Numerical Simulation Think Parallel, Write Sequential
11 Numerical Simulation Application Programming: Think Parallel, Write Sequential. Framework: what are the fundamentals? Nested/hybrid emerging parallel programming models: MPI O(10K), NUMA-aware O(10), OpenMP O(100), ILP O(1).
12 2.1 Framework: what are the fundamentals? Physical models, discrete stencils, numerical algorithms, application codes: extract data dependency; form data structures, computational patterns, communications; support load balancing; promote computers.
13 2.1 Framework: what are the fundamentals? Special models, stencils, algorithms separated from a library of common models, stencils, algorithms: extract data dependency; form data structures, computational patterns, communications; support load balancing; promote computers.
14 2.1 Framework: what are the fundamentals? Special models, stencils, algorithms in application codes are separated (automatic parallelism) from a library of common models, stencils, algorithms, which extracts parallel programming interfaces: data dependency, data structures, computational patterns, communications, load balancing; promote computers. Framework: middleware for large-scale scientific and engineering computing on meshes.
15 2.2 Framework: Separate Parallel Programming. Computational patterns: graph-based (halo exchanges, collectives); digraph-based (data driven, task scheduling).
16 Halo Exchange Pattern: Separate Parallel Programming
17 Halo Exchange Pattern: Separate Parallel Programming
18 Halo Exchange Pattern: Separate Parallel Programming. Do MPI message passing to fill ghost cells for uo;
19 Halo Exchange Pattern: Separate Parallel Programming User Part
20 Halo Exchange Pattern: Separate Parallel Programming User Part
21 Halo Exchange Pattern: Separate Parallel Programming. Framework part: component-based C++ strategy class implementation of the user part.
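As a sketch of what the framework part hides, here is a serial analogue of the ghost-cell fill between two adjacent patches (hypothetical names; in the framework this copy would be an MPI message driven by the distributed communication graph, and the user never writes it):

```cpp
#include <vector>

// Serial analogue of the halo-exchange pattern (illustrative only).
struct Patch1D {
    std::vector<double> u;  // layout: [ghost | interior cells... | ghost]
    int n;                  // number of interior cells
};

// Fill the facing ghost cells of two adjacent patches: each patch's
// ghost cell receives its neighbor's nearest interior value.
inline void fillGhosts(Patch1D& left, Patch1D& right) {
    left.u[left.n + 1] = right.u[1];   // right neighbor's first interior cell
    right.u[0] = left.u[left.n];       // left neighbor's last interior cell
}
```

After `fillGhosts` runs, each patch can apply its stencil to interior cells as if it owned the neighboring data, which is exactly what lets the user kernel stay serial.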
22 Halo Exchange Pattern: Separate Parallel Programming. Serial performance is also significantly improved: molecular simulation: 3.5% to 32.3% of peak; radiation and neutron transport: doubled.
23 3. JASMIN framework and its application Think Parallel, Write Sequential
24 3.1 JASMIN. Applications: Inertial Confinement Fusion, Global Climate Modeling; structured grid, unstructured grid, particle simulation. JASMIN: J parallel Adaptive Structured Mesh INfrastructure. cn/jasmin, 2010SR
25 3.1 JASMIN Motivations: Supports the development of parallel codes for large scale numerical simulation on personal computers. Hides parallel programming using millions of CPU cores; Integrates the efficient implementations of parallel fast algorithms; Provides efficient data structures and solver libraries; Supports the component-based software engineering for code extensibility.
26 3.2 Key factors: data structures, fast algorithms, parallelization, component interfaces (JASMIN).
27 3.2.1 Data Structures. Minimize data movement: mesh partitioning (domain specific). MPI, NUMA-aware, OpenMP, SIMD. 2 domains, 5 regions, 16 patches; points/patch.
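The hierarchy above (domains, regions, patches) rests on splitting cells as evenly as possible across patches. A minimal sketch of that even-split step, assuming uniform per-cell cost (the function name is illustrative, not part of the framework):

```cpp
#include <vector>

// Distribute nCells over nPatches so that patch sizes differ by at most
// one cell -- the basis for MPI-level load balance under the assumption
// of uniform cost per cell.
inline std::vector<int> splitCells(int nCells, int nPatches) {
    std::vector<int> sizes(nPatches, nCells / nPatches);
    for (int i = 0; i < nCells % nPatches; ++i) ++sizes[i];  // spread remainder
    return sizes;
}
```

Real mesh partitioning must also respect geometry and stencil locality, but the at-most-one-cell imbalance guarantee of this split is the starting point.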
28 Mesh supported Data Structures
29 3.2.1 Data Structures. Multiblock physical variables, sliding particles, SAMR, 3-D contact. Various applications: fusion, CFD, climate, electromagnetism, material science, etc.
30 3.2.2 Communications. Distributed undirected graph (2 processors). Graph-based patterns: halo exchanges, collectives.
31 3.2.2 Communications. Distributed directed graph (digraph): 8 angles, parallel sweeping for Sn transport. Digraph-based patterns: data driven, task scheduling.
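The digraph-based, data-driven pattern can be sketched as ready-set scheduling over a task graph (Kahn's topological ordering). This is a generic illustration, not JASMIN's scheduler: in an Sn sweep, a task such as (patch, angle) becomes ready once all upstream tasks have completed.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Execute tasks of a dependency digraph in data-driven order.
// deps holds edges (from, to): task `to` waits on task `from`.
inline std::vector<int> schedule(int nTasks,
                                 const std::vector<std::pair<int,int>>& deps) {
    std::vector<std::vector<int>> succ(nTasks);
    std::vector<int> indeg(nTasks, 0);
    for (auto [from, to] : deps) { succ[from].push_back(to); ++indeg[to]; }

    std::queue<int> ready;                 // tasks with no pending inputs
    for (int t = 0; t < nTasks; ++t) if (indeg[t] == 0) ready.push(t);

    std::vector<int> order;
    while (!ready.empty()) {
        int t = ready.front(); ready.pop();
        order.push_back(t);                // "execute" task t
        for (int s : succ[t])              // completing t may release successors
            if (--indeg[s] == 0) ready.push(s);
    }
    return order;
}
```

In a distributed setting the "completion" events arrive as messages, so the ready set grows dynamically; the scheduling logic, however, is this same release-on-last-input rule.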
32 3.2.2 Communications (Hierarchical scheduling)
33 3.2.3 Dynamic load balancing. Global atmosphere model: scale up to CPU cores; regional rainstorm model: extreme load imbalance; radiation and neutron transport.
34 3.2.4 Solvers for linear systems. Radiation hydrodynamics: usual algorithms O(N ); PAMG+DDM O(N log N). Data structures, solver interfaces: JXP, Kinsol, Hypre.
35 3.2.5 SAMR h-adaptivity: advance-estimate-tag-refine-synchronize. Coarser level integrates (user provides stencil); error estimation (user provides); tag cells for refinement; cluster tagged cells into boxes; synchronization; finer level is initialized and integrates; generate patches for finer level.
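The estimate-tag-cluster steps of the cycle above can be sketched in 1-D. This is illustrative only (real SAMR clustering, e.g. Berger-Rigoutsos, is more sophisticated): cells whose jump to the next cell exceeds a tolerance are tagged, and contiguous runs of tagged cells become boxes that the finer level will cover.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// 1-D sketch of "error estimate -> tag -> cluster into boxes".
// Returns half-open index ranges [lo, hi) to be refined.
inline std::vector<std::pair<int,int>>
tagAndCluster(const std::vector<double>& u, double tol) {
    int n = static_cast<int>(u.size());
    std::vector<bool> tag(n, false);
    for (int i = 0; i + 1 < n; ++i)          // error estimate: first difference
        if (std::fabs(u[i+1] - u[i]) > tol) tag[i] = tag[i+1] = true;

    std::vector<std::pair<int,int>> boxes;   // cluster contiguous tagged runs
    int i = 0;
    while (i < n) {
        if (!tag[i]) { ++i; continue; }
        int lo = i;
        while (i < n && tag[i]) ++i;
        boxes.emplace_back(lo, i);
    }
    return boxes;
}
```

The framework then generates finer-level patches from these boxes, initializes them from the coarse solution, and synchronizes the levels after each fine-level integration.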
36 3.3 Current Version. User provides: physics, parameters, numerical methods, expert experience, special algorithms, etc. User interfaces: component-based parallel programming models (C++ classes). JASMIN V3.0. Numerical algorithms: geometry, fast solvers, mature numerical methods, time integrators, etc. HPC implementations (tens of thousands of CPUs): data structures, parallelization, load balancing, adaptivity, visualization, restart, memory, etc. Architecture: multilayered, modularized, object-oriented. Codes: C++/C/F90/F77, MPI+OpenMP, 660K LOC. Installation: personal computers, clusters, MPP.
37 Three Frameworks for Scientific and Engineering Computing: JASMIN (structured grid), JAUMIN (unstructured grid), JCOGIN (composite geometry).
38 3.3 Current: 40 application codes, 1,500 KLOC. HEDP, nuclear/energy materials, laser fusion, electromagnetism, hydrodynamics, climate forecasting, earth environment mechanism, fission energy.
39 3.4 Some Applications on the PetaFlops System TianHe-1A. Codes and #CPU cores: LARED-S 32,768; LARED-P 72,000; LAP3D 16,384; MEPH3D 38,400; MD3D 80,000; RT3D 1,000; NEPTUNE 1,024; RH2D 1,024; HIME3D 3,600; PDD3D 4,096; LARED-R 512; LARED Integration 128; JMES-FDTD 60,000. Simulation: several hours to tens of hours.
40 Applications-1: Inertial Confinement Fusion. Codes and researchers concurrently develop ICF application codes from different combinations of numerical methods, physical parameters, and expert experience across the simulation cycle. Hides parallel computing and adaptive implementations using tens of thousands of CPU cores; provides efficient data structures, algorithms and solvers; supports software engineering for code extensibility.
41 Applications-1: Inertial Confinement Fusion codes, Year 2004 vs. Year 2010. LARED-H (2-D radiation hydrodynamics Lagrange code): serial, single block, without capsule; now parallel, multiblock, NIF ignition target. LARED-R (2-D radiation transport code): serial, single level, 2-D single group; now parallel (2,048 cores), SAMR, multi-group diffusion. LARED-S (3-D radiation hydrodynamics Euler code): serial, 3-D without radiation; now parallel (32,768 cores), 3-D radiation multigroup diffusion. LARED-P (3-D laser plasma interaction code): serial; now parallel (36,000 cores), terascale of particles. Scaled up by a factor of 1000 within 6 years.
42 ICF-1: Integrated simulations for ignition targets. Three-temperature conduction modeling, radiation thermal conduction modeling, transport modeling, multigroup diffusion modeling, laser transfer modeling: multi-physics modeling for radiation hydrodynamics simulations.
43 ICF-1: Integrated simulations for ignition targets. Radiation temperature, electron temperature: radiation temperatures in the hohlraum. Lagrange meshes. E/R swapped engr. LLNL NIF base target.
44 ICF-2: 3-D simulations for laser plasma interactions. LARED-P: super-strong lasers transfer and focus over the plasma cone and generate high-energy electrons. #mesh: 768M; #particles: 40G; CPU cores: 72,000; parallel efficiency: 44%; physical time: 320 fs; simulation time: 4.5 hours; #steps: ; output data: 640 GB/20.
45 ICF-3: 3-D simulations for laser plasma hydrodynamics. LAP3D: lasers filament and self-focus while transferring over long distances in low-density plasma environments. #mesh: 2.1G; CPU cores: 16,384; parallel efficiency: 50%; physical time: 56.5 ps; simulation time: 12.8 hours; #steps: 41,200; output data: 1.74 TB/104. Parallel rendering using 72 cores. Solves the hydrodynamics equations coupled with laser paraxial transfer equations. Simulation results coincide with those of the ALPS code.
46 ICF-4: 3-D simulations for radiation hydrodynamic instabilities. LARED-S: radiation hydrodynamic instabilities occur over the capsule interfaces while the capsule shell rapidly slows down. #mesh: 160M (256x256x256, 3-D disturbance); #cores: 32,768; parallel efficiency: 52%; physical time: 0.1 ps; simulation time: 39 hours; #steps: 778,580; output data: 4.4 TB/670. Gray: shell materials; color: ion temperature of hot spots. Ideal compression.
47 ICF-4: 3-D simulations for radiation hydrodynamic instabilities. Three-level AMR: 1300M cells, resolution 2000x2000x2000 (8000M cells); 1600M cells, 2560x256x , cores: 6505 steps, 9 hours. SAMR improves resolution by 50x and scale by cores; 77K steps, 39 hours.
48 Applications-2: Material simulations. PDD3D: 3-D discrete dislocation simulations of the local plastic deformation of metal materials under shock. #dislocations: 3M. The details of dislocation structures are discovered. CPU cores: 4,096; parallel efficiency: 47%; physical time: 5.5 ns; simulation time: 12 hours; #steps: 1,800; output data: 196 GB/196. The stretch simulations of Cu crystal (0.12 mm³, r=10⁷/s). Left: dislocation density; right: local structures.
49 Applications-2: Material simulations. MD3D: 3-D molecular dynamics simulations of the structural and dynamic shock responses of metal materials with nano-scale defects. #molecules: 50G. Dislocation holes release and collapse. #cores: 80,000; parallel efficiency: 62%; physical time: 5.5 ps; simulation time: 3.0 hours; #steps: 30,000; output data: 162 GB/150. Shock, shock front, hot spot. Dislocation holes collapse and interact with each other; hot spots are formed, generating various dynamic responses. Left: local spots; right: LLNL's results in 2005.
50 Applications-3: Electromagnetic simulations. JMES3D: 3-D FDTD simulations of the damage mechanisms of electromagnetic waves with different frequencies and directions. #mesh: 1.2G; #cores: 60,000; parallel efficiency: 70%; physical time: 250 ns; simulation time: 8.1 hours; #steps: 254,000; output data: 736 GB/172. Deposited energy; electric field snapshot.
51 Applications-3: Electromagnetic simulations. JMES3D, JAS93: L-wave RCS analysis, 3,000 CPU cores, 7.5 billion grid cells.
52 Applications-4: Climate Forecasting. Atmosphere: GAMIL-JASMIN, 1.0 degree/100 kilometers resolution. Land: CLM-JASMIN. Coupler: JASMIN, 5,800 cores, PE=33%. Ocean: LICOM-JASMIN, 0.25 degree/ kilometers resolution. Ocean ice: CSIM4-JASMIN.
53 Applications-5: CFD. Automatic block joining and patch-based decomposition for large-scale parallelism. Mesh blocks distribution. #cells: 9,574,784; #blocks: 194; #patches: 9,600; #CPU cores: 2,048; 10K steps: 800 sec; speedup: 580.
54 4. Comments. Application software/simulation; proxy apps; stencil templates; numerical library; domain-specific framework; general parallel programming stack; hardware.
55 JASMIN / JAUMIN / JCOGIN roadmap: JASMIN (TFLOPS, product release) -> JASMIN 3.0, JAUMIN 1.0, JCOGIN (PFLOPS) -> exa-scale computing.
56 Application codes are inheritable: 40 codes, 1,500 KLOC; domain-specific APIs have no changes; frameworks are improved toward the machines. Two months: JASMIN (Teraflops) -> JAUMIN 1.0, JCOGIN 1.0, JASMIN 3.0 (Petaflops). Three months: -> JAUMIN 2.0, JCOGIN 2.0, JASMIN (Petaflops).
57 Thank you for your attention.
FLASH Code Tutorial part IV radiation modules Robi Banerjee Hamburger Sternwarte banerjee@hs.uni-hamburg.de The Radiation transfer unit idea: get solution of the radiation transfer equation I(x,Ω,ν,t)
More informationDynamic load balancing in OSIRIS
Dynamic load balancing in OSIRIS R. A. Fonseca 1,2 1 GoLP/IPFN, Instituto Superior Técnico, Lisboa, Portugal 2 DCTI, ISCTE-Instituto Universitário de Lisboa, Portugal Maintaining parallel load balance
More informationTransport Simulations beyond Petascale. Jing Fu (ANL)
Transport Simulations beyond Petascale Jing Fu (ANL) A) Project Overview The project: Peta- and exascale algorithms and software development (petascalable codes: Nek5000, NekCEM, NekLBM) Science goals:
More informationIntroduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29
Introduction CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Introduction Spring 2018 1 / 29 Outline 1 Preface Course Details Course Requirements 2 Background Definitions
More informationI/O at JSC. I/O Infrastructure Workloads, Use Case I/O System Usage and Performance SIONlib: Task-Local I/O. Wolfgang Frings
Mitglied der Helmholtz-Gemeinschaft I/O at JSC I/O Infrastructure Workloads, Use Case I/O System Usage and Performance SIONlib: Task-Local I/O Wolfgang Frings W.Frings@fz-juelich.de Jülich Supercomputing
More informationAcceleration of a Python-based Tsunami Modelling Application via CUDA and OpenHMPP
Acceleration of a Python-based Tsunami Modelling Application via CUDA and OpenHMPP Zhe Weng and Peter Strazdins*, Computer Systems Group, Research School of Computer Science, The Australian National University
More informationCustomer Success Story Los Alamos National Laboratory
Customer Success Story Los Alamos National Laboratory Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory Case Study June 2010 Highlights First Petaflop
More informationCCSM Performance with the New Coupler, cpl6
CCSM Performance with the New Coupler, cpl6 Tony Craig Brian Kauffman Tom Bettge National Center for Atmospheric Research Jay Larson Rob Jacob Everest Ong Argonne National Laboratory Chris Ding Helen He
More informationImplementation of an integrated efficient parallel multiblock Flow solver
Implementation of an integrated efficient parallel multiblock Flow solver Thomas Bönisch, Panagiotis Adamidis and Roland Rühle adamidis@hlrs.de Outline Introduction to URANUS Why using Multiblock meshes
More informationIntroduction to parallel Computing
Introduction to parallel Computing VI-SEEM Training Paschalis Paschalis Korosoglou Korosoglou (pkoro@.gr) (pkoro@.gr) Outline Serial vs Parallel programming Hardware trends Why HPC matters HPC Concepts
More informationA Simulation of Global Atmosphere Model NICAM on TSUBAME 2.5 Using OpenACC
A Simulation of Global Atmosphere Model NICAM on TSUBAME 2.5 Using OpenACC Hisashi YASHIRO RIKEN Advanced Institute of Computational Science Kobe, Japan My topic The study for Cloud computing My topic
More information1401 HETEROGENEOUS HPC How Fusion Designs Can Advance Science
1401 HETEROGENEOUS HPC How Fusion Designs Can Advance Science Ben Bergen Los Alamos National Laboratory Research Scientist Marcus Daniels Los Alamos National Laboratory Research Scientist VPIC Team Brian
More informationLecture Topic Projects
Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4 Data reduction, similarity & distance, data augmentation
More informationParallel Computing. November 20, W.Homberg
Mitglied der Helmholtz-Gemeinschaft Parallel Computing November 20, 2017 W.Homberg Why go parallel? Problem too large for single node Job requires more memory Shorter time to solution essential Better
More informationThe ECMWF forecast model, quo vadis?
The forecast model, quo vadis? by Nils Wedi European Centre for Medium-Range Weather Forecasts wedi@ecmwf.int contributors: Piotr Smolarkiewicz, Mats Hamrud, George Mozdzynski, Sylvie Malardel, Christian
More informationThe Stampede is Coming Welcome to Stampede Introductory Training. Dan Stanzione Texas Advanced Computing Center
The Stampede is Coming Welcome to Stampede Introductory Training Dan Stanzione Texas Advanced Computing Center dan@tacc.utexas.edu Thanks for Coming! Stampede is an exciting new system of incredible power.
More informationPERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015
PERFORMANCE PORTABILITY WITH OPENACC Jeff Larkin, NVIDIA, November 2015 TWO TYPES OF PORTABILITY FUNCTIONAL PORTABILITY PERFORMANCE PORTABILITY The ability for a single code to run anywhere. The ability
More informationANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation
ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation Ray Browell nvidia Technology Theater SC12 1 2012 ANSYS, Inc. nvidia Technology Theater SC12 HPC Revolution Recent
More informationThe MOSIX Scalable Cluster Computing for Linux. mosix.org
The MOSIX Scalable Cluster Computing for Linux Prof. Amnon Barak Computer Science Hebrew University http://www. mosix.org 1 Presentation overview Part I : Why computing clusters (slide 3-7) Part II : What
More informationPerformance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA
Performance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA Kazuhiko Komatsu, S. Momose, Y. Isobe, O. Watanabe, A. Musa, M. Yokokawa, T. Aoyama, M. Sato, H. Kobayashi Tohoku University 14 November,
More informationMulti-Domain Pattern. I. Problem. II. Driving Forces. III. Solution
Multi-Domain Pattern I. Problem The problem represents computations characterized by an underlying system of mathematical equations, often simulating behaviors of physical objects through discrete time
More informationAn Introduction to OpenACC
An Introduction to OpenACC Alistair Hart Cray Exascale Research Initiative Europe 3 Timetable Day 1: Wednesday 29th August 2012 13:00 Welcome and overview 13:15 Session 1: An Introduction to OpenACC 13:15
More informationPrinciples of Parallel Algorithm Design: Concurrency and Mapping
Principles of Parallel Algorithm Design: Concurrency and Mapping John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 3 17 January 2017 Last Thursday
More informationAlgorithms, System and Data Centre Optimisation for Energy Efficient HPC
2015-09-14 Algorithms, System and Data Centre Optimisation for Energy Efficient HPC Vincent Heuveline URZ Computing Centre of Heidelberg University EMCL Engineering Mathematics and Computing Lab 1 Energy
More informationScalable Software Components for Ultrascale Visualization Applications
Scalable Software Components for Ultrascale Visualization Applications Wes Kendall, Tom Peterka, Jian Huang SC Ultrascale Visualization Workshop 2010 11-15-2010 Primary Collaborators Jian Huang Tom Peterka
More informationCSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller
Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,
More informationOPTIMIZATION OF THE CODE OF THE NUMERICAL MAGNETOSHEATH-MAGNETOSPHERE MODEL
Journal of Theoretical and Applied Mechanics, Sofia, 2013, vol. 43, No. 2, pp. 77 82 OPTIMIZATION OF THE CODE OF THE NUMERICAL MAGNETOSHEATH-MAGNETOSPHERE MODEL P. Dobreva Institute of Mechanics, Bulgarian
More informationCenter for Computational Science
Center for Computational Science Toward GPU-accelerated meshfree fluids simulation using the fast multipole method Lorena A Barba Boston University Department of Mechanical Engineering with: Felipe Cruz,
More informationAutomatic Generation of Algorithms and Data Structures for Geometric Multigrid. Harald Köstler, Sebastian Kuckuk Siam Parallel Processing 02/21/2014
Automatic Generation of Algorithms and Data Structures for Geometric Multigrid Harald Köstler, Sebastian Kuckuk Siam Parallel Processing 02/21/2014 Introduction Multigrid Goal: Solve a partial differential
More informationIan Foster, An Overview of Distributed Systems
The advent of computation can be compared, in terms of the breadth and depth of its impact on research and scholarship, to the invention of writing and the development of modern mathematics. Ian Foster,
More information