Improving the Energy- and Time-to-solution of COSMO-ART


1 Joseph Charles, William Sawyer (ETH Zurich - CSCS)
  Heike Vogel (KIT), Bernhard Vogel (KIT), Teresa Beck (KIT/UHEI)
  COSMO User Workshop, MeteoSwiss, January 18, 2016

2 Summary

3 Main Objectives
- Utilise project methodologies to attain a 5x ETS improvement for COSMO-ART:
  - Code optimisations / refactoring on CPUs
  - System software (other compilers, optimised libraries)
  - New algorithms
  - New architectures (GPUs, emerging CPUs, ARM)
- Technical challenges with a code under constant development:
  - Run configuration must be recreated in all subsequent versions
  - Results must be reproducible within an expected variance
  - Target application: COSMO-HAM (ETH Zurich) or COSMO-ART (KIT, EMPA)?
  - Redefinition of the baseline to reflect oversights and a newer version of ART
  - Management of different branches, validation and incorporation of versions, e.g. COSMO-4.28, COSMO-4.30, COSMO-5.0, COSMO-5.1_beta, OPCODE COSMO-5.1_beta
  - Incongruities / incompatibilities between versions, e.g. OPCODE COSMO was based on 5.0 and was not upgraded to 5.1 by the end of the project

4 Main Results
WP5 Roadmap (Mar. 2014):
- Energy profiling of the COSMO-ART baseline (ETH Zurich - CSCS / UHAM / UJI)
- Optimal setup for discretisation parameters and compilers (ETH Zurich - CSCS)
- Refactoring for CPUs (ETH Zurich - CSCS / IBM Research - Zurich)
- ODE solver algorithmic changes (KIT / ETH Zurich - CSCS)
- Mixed-precision COSMO-ART (ETH Zurich - CSCS / KIT)
- Port COSMO-ART components to accelerators (ETH Zurich - CSCS)
- Feasibility study of a reduced model for gas-phase chemistry (KIT)
- Investigation of possibilities of ART on ARM (UHEI)
Milestones:
- MS10 (M30): Refactored COSMO-ART code prototype for CPUs and multi-core architectures
- MS11 (M36): Performance model for ARM and other emerging hardware
Deliverables:
- D5.1 (M24): Benchmarking report on energy requirements of the current COSMO-ART
- D5.2 (M30): Refactored COSMO-ART code prototype for CPUs and multi-core architectures
- D5.3 (M36): Final delivery of software prototypes, documentation, and summary report

5 Exploitable Results
COSMO-ART version optimised with respect to energy-to-solution
- Intellectual Property Rights (IPR): OPCODE COSMO: open source with proprietary background IP. ART: open source with proprietary background IP, available for scientific use after signing an agreement
- Usage scenario: run COSMO-ART in a more cost-effective and energy-efficient manner on applicable hardware platforms
- Sector of application: Atmospheric Chemistry research
One-moment graupel microphysics standalone C++ code using STELLA
- Intellectual Property Rights (IPR): open source with proprietary components from the COSMO Consortium
- Usage scenario: assess the potential performance improvement of a COSMO component on multi-core CPU and GPU architectures from a single source code utilising the STELLA framework
- Sector of application: Computational Science
Box Model Test Framework for the Kinetics PreProcessor (KPP)
- Intellectual Property Rights (IPR): open source with proprietary background IP; an additional licence for KPPA is needed
- Usage scenario: compare an existing KPP implementation in a given application with the same solvers generated by the KPPA proprietary software
- Sector of application: Computational Chemistry

6 Results Overview

7 COSMO-ART: Atmospheric Chemistry as Showcase
[Figure: run-time breakdown of the reference baseline (GNU compiler, 240 PEs) comparing COSMO TTS and COSMO-ART TTS; components shown are Dynamics, Physics, MPI Comm./Sync. (Dyn. and Phy.), Input, Output, Other and, for COSMO-ART, the ART part]
- COSMO: a ubiquitous weather forecast model in Europe, in widespread use by the federal weather services of Germany, Switzerland, Italy, Greece, Poland, Romania and Russia, and by a large number of agencies including military and research institutions
- COSMO-ART: COSMO extended for Aerosols and Reactive Trace gases, e.g. for air quality prediction
- Massive increase in computational expense due to atmospheric chemistry and the additional tracers to advect (only relatively short simulation times are currently viable)

8 Strategy Overview
- Aerosol Reactive Transport (ART) for atmospheric chemistry
- Optimisations of time-stepping in the solvers generated by the Kinetics PreProcessor (KPP)
- Proprietary KPP version (KPPA) generating multithreaded CPU and CUDA (GPU) code
- CPU/GPU-optimised version of the COSMO NWP model

9 COSMO-ART: Baseline Energy-to-Solution Benchmark at Cabinet Level
MONCH (CSCS, ETH Zurich):
- 1,040 cores using 20 MPI tasks per node (realistic for production)
- 52 compute nodes were used, each comprising two ten-core Intel Xeon Ivy Bridge EP (E5 v2) processors at 2.2 GHz and 32 GB of DDR3-1600 RAM, connected via Mellanox SX6036 InfiniBand FDR switches
- This CPU architecture was considered state-of-the-art at the beginning of the Exa2Green project
Power measurement system:
- Model: Chauvin Arnoux PEL103
- Clamp model: MiniFlex MA193
- Precision: ± 0.5%
Result: TTS = 1,681.6 s, ETS = 21,182,799 J
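To make the energy-to-solution metric concrete, here is a minimal sketch (not the project's measurement chain) that integrates externally sampled power over the run's wall-clock window with the trapezoidal rule. The CSV layout, column names and file name are illustrative assumptions.

```python
import csv

def energy_to_solution(power_log_csv, t_start, t_end):
    """Integrate sampled power (W) over [t_start, t_end] (s) -> energy in J.

    Assumes a CSV with columns 'timestamp' (s) and 'power_w', e.g. exported
    from a cabinet-level power meter such as the Chauvin Arnoux PEL103.
    """
    samples = []
    with open(power_log_csv) as fh:
        for row in csv.DictReader(fh):
            t = float(row["timestamp"])
            if t_start <= t <= t_end:
                samples.append((t, float(row["power_w"])))
    samples.sort()
    # Trapezoidal rule: E = sum 0.5 * (P_i + P_{i+1}) * (t_{i+1} - t_i)
    return sum(0.5 * (p0 + p1) * (t1 - t0)
               for (t0, p0), (t1, p1) in zip(samples, samples[1:]))

# Example (hypothetical log file): ETS for a run lasting 1,681.6 s
# ets = energy_to_solution("pel103_log.csv", t_start=0.0, t_end=1681.6)
```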

10 COSMO-ART: Standalone KPP Test Framework
D5.2 (M30): Refactored COSMO-ART code prototype for CPUs and multi-core architectures
Two versions for an exclusive benchmarking of the gas-phase chemistry:
- 0-dim box model: identical calculation in all cells of the 3D domain
- extended box model: reads temperature and chemical concentrations from a real run in NetCDF format
Single-node evaluation on a 66x56x31 test domain (114,576 grid cells):
- Piz Daint, Cray XC30 (8-core Intel Xeon Sandy Bridge CPU at 2.6 GHz and a Tesla K20X per node)
- Cray Power Management Database (PMDB) + pm_counters sysfs files (updated at 10 Hz)

TTS / ETS reduction factors (each cell: TTS / ETS, relative to the reference KPP column):

                       KPP           KPP (ref.)    KPPA serial    KPPA OpenMP     KPPA CUDA
  0-dim box model      1.3x / 1.4x   1.0x / 1.0x   3.5x / 5.4x    25.5x / 23.5x   33.3x / 18.8x
  extended box model   1.4x / 1.4x   1.0x / 1.0x   3.4x / 5.3x    22.3x / 23.3x   23.2x / -
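For orientation, a minimal 0-D box-model sketch, purely illustrative and not the project's Fortran/KPP code: every grid cell carries its own stiff chemical ODE system that is integrated independently over one model time step, which is why the problem parallelises so well across cells and threads. The toy three-species mechanism, rate constants and function names are assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy stiff mechanism (illustrative only): A -> B (fast), B + B -> C (slow)
K_FAST, K_SLOW = 1.0e4, 3.0e-2

def rhs(t, c):
    a, b, _ = c
    r1 = K_FAST * a          # A -> B
    r2 = K_SLOW * b * b      # B + B -> C
    return [-r1, r1 - 2.0 * r2, r2]

def advance_cell(conc, dt):
    """Integrate one cell's concentrations over dt with an implicit stiff solver
    (Radau here; KPP-generated code typically uses Rosenbrock methods)."""
    sol = solve_ivp(rhs, (0.0, dt), conc, method="Radau", rtol=1e-4, atol=1e-9)
    return sol.y[:, -1]

def advance_domain(conc_field, dt):
    """0-dim box model: the same independent integration in every grid cell."""
    nx, ny, nz, _ = conc_field.shape
    out = np.empty_like(conc_field)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                out[i, j, k] = advance_cell(conc_field[i, j, k], dt)
    return out

if __name__ == "__main__":
    # Tiny domain instead of the 66x56x31 benchmark domain
    field = np.tile([1.0, 0.0, 0.0], (4, 4, 2, 1))
    field = advance_domain(field, dt=60.0)
    print(field[0, 0, 0])
```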

11 COSMO-ART: Gas Phase Chemistry Optimisations
Starting point:
- COSMO-ART (ref): initial reference baseline based on COSMO_4.30
Mixed- and single-precision:
- COSMO-ART (sp-dp): mixed-precision version based on COSMO_5.1 (beta) and ART_3.0
- COSMO-ART (sp): single-precision version based on COSMO_5.1 (beta) and ART_3.0
PRACE 2IP WP8 integrator (G. Fanourgakis, J. Lelieveld and D. Taraborelli):
- Time-step control as proposed by Söderlind (see the sketch below)
- Positivity: artificial preservation of positivity to improve stability
- COSMO-ART (sp, PRACE): based on COSMO_5.1 (beta) and ART_3.0 with positivity preservation and the new time-step control
- COSMO-ART (sp, PRACE, KPPA): same as above but based on KPPA
Replace COSMO with OPCODE COSMO (HP2C project, with CPU and GPU support):
- Slightly different configuration:
  - requires the revised shallow convection scheme
  - the semi-Lagrangian advection scheme (SL3_SC) is slightly different from the original SL3_SFD
  - requires the adapted radiation scheme (roughly the same run-time)
- Results now scientifically validated by H. Vogel (KIT) and J. Charles (CSCS)
- COSMO-ART (sp, PRACE, OPCODE): limited to the Cray compiler (because of the GPU components)
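A minimal sketch of a Söderlind-style smooth (PI) step-size controller together with a simple positivity clip, assuming scaled error norms from an embedded error estimator; the gains, limits and function names are illustrative assumptions, not the coefficients of the PRACE integrator.

```python
import numpy as np

def soderlind_pi_step(h, err_new, err_old, order, kp=0.7, ki=0.4, fmin=0.2, fmax=5.0):
    """PI (proportional-integral) step-size controller in the spirit of Soderlind.

    err_new, err_old are scaled error norms (target <= 1) of the current and
    previous steps; 'order' is the order of the embedded error estimator.
    """
    err_new = max(err_new, 1e-10)
    err_old = max(err_old, 1e-10)
    factor = (1.0 / err_new) ** (ki / order) * (err_old / err_new) ** (kp / order)
    return h * min(fmax, max(fmin, factor))

def clip_positive(conc, floor=0.0):
    """Crude positivity preservation: clip negative concentrations produced
    by the solver back to a non-negative floor."""
    return np.maximum(conc, floor)

# Schematic use inside an embedded-error stepping loop:
# h = soderlind_pi_step(h, err_new, err_old, order=3)
# conc = clip_positive(conc)
```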

12 COSMO-ART: Preliminary Benchmarking
Proof-of-concept benchmarking on two computing platforms at ETH Zurich - CSCS:
- Piz Daint: Cray XC30, one 8-core Intel Xeon Sandy Bridge CPU (2.6 GHz) per compute node
- Piz Dora: Cray XC40, two 12-core Intel Xeon Haswell v3 CPUs (2.6 GHz) per compute node
- For both: Cray Power Management Database (PMDB) + pm_counters sysfs files (updated at 10 Hz)
Remarks:
- 24 h simulation using 288 PEs and the GNU compiler (but Cray -O2 for OPCODE)
- COSMO_5.1 (beta), provided by O. Fuhrer and X. Lapillonne (MeteoSwiss), supports a generic tracer transport mechanism for prognostic variables and allows a flexible definition of new tracers
- ART_3.0 provided by H. & B. Vogel (KIT), with extensive support from them for debugging

13 COSMO-ART: OPCODE, OpenMP, P.I., Piz Dora (intermediate result)
Constant MPI decomposition (192 processes), variable number of nodes and threads.
[Figure: energy-to-solution (J) and time-to-solution (s) for N=8 nodes (#MPI=24 per node, 1/2 threads), N=16 (#MPI=12, 1/2/4 threads), N=24 (#MPI=8, 1/2/6 threads), N=48 (#MPI=4, 1/2/4/6/12 threads) and N=96 (#MPI=2, 1/2/4/6/12/24 threads)]
Bottom line: the optimal ETS is obtained on the minimal number of nodes, with each core running one MPI process.

14 Baseline vs. Final Code Version: OPCODE COSMO-ART, SP, P.I.
- Comparison with 1,040 cores (= MPI processes) in both cases
- Energy accounted for: CPU + interconnect + blowers + AC/DC conversion

15 Crosscutting Activities with Other Teams

16 Performance/Energy-Efficiency Analysis (UHAM/UJI)
D5.1 (M24): Benchmarking report on energy requirements of the current COSMO-ART
TINTORRUM (UJI):
- 16 nodes with 2x Intel Westmere E5645 hex-core CPUs (2.4 GHz) => 192 MPI processes
Power measurement system (UHAM):
- ACP8653 Power Distribution Units (PDUs) with a 1 S/s sampling rate and ± 3% accuracy
- High-resolution power-performance tracing framework: Extrae instrumentation library + pmlib tracing server + Paraver
- Visualise and correlate task traces with the power profile
Software environment:
- COSMO-ART baseline (initial model setup)
- OpenMPI 1.6.5: 192 cores using 12 MPI processes per node
Two MPI wait policies (UJI), contrasted in the sketch below:
- Aggressive: the CPU busy-waits for the incoming message
- Degraded: repeated calls to sched_yield(), letting the OS scheduler pick other work
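A schematic contrast of the two wait policies, not the actual Open MPI progress engine (whose behaviour is selected through runtime parameters): busy-waiting keeps the core fully loaded while waiting, whereas yielding lets the OS deschedule the waiting rank and lowers power at the cost of some latency. The message_arrived() helper is a hypothetical stand-in for an MPI probe.

```python
import os
import time

def wait_aggressive(message_arrived):
    """Busy-wait: spin on the progress poll, keeping the core 100% busy."""
    while not message_arrived():
        pass  # tight loop; lowest latency, highest power draw

def wait_degraded(message_arrived):
    """Degraded mode: poll, then call sched_yield() so the OS scheduler may
    run something else (or idle the core), reducing power while waiting."""
    while not message_arrived():
        os.sched_yield()

if __name__ == "__main__":
    deadline = time.monotonic() + 0.01

    def message_arrived():
        # Hypothetical stand-in for an MPI probe / progress call
        return time.monotonic() >= deadline

    wait_degraded(message_arrived)
```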

17 Impact: WP2 (IBM) Results on the Showcase - KPP on POWER8
Key aspect: the problem is decoupled in space; each point has its own set of ODEs solved with KPP.
Considered optimisations:
- High thread parallelism, software optimisation (left, baseline)
- Loop merging (center)
- Fast exponential, logarithm and power evaluation for the coefficients of the chemical reactions (right, IBM-specific)
- Transactional memory (was not applicable)
- Iterative refinement for the linear system (e.g. LU factorisation in low precision, residual in high precision) may be pursued in the future; see the sketch below
Measured effects:
- Time reduction: 1 thread per core -39%, 8 threads per core -68%
- Power increase: 1 thread per core +1%, 8 threads per core +15%
- Energy reduction: 1 thread per core -30%, 8 threads per core -58%
Unfortunately, even the IBM-nonspecific optimisations did not yield a performance improvement on the target Piz Dora Intel Haswell platform.
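A minimal sketch of mixed-precision iterative refinement as a standard technique, not IBM's implementation: factorise and solve in single precision, compute the residual in double precision, and iterate on the correction. The test matrix and function name are illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def solve_mixed_precision(A, b, iters=3):
    """Solve Ax = b: LU factorisation in float32, residual/update in float64."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    lu32 = lu_factor(A64.astype(np.float32))            # cheap low-precision factorisation
    x = lu_solve(lu32, b64.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b64 - A64 @ x                               # high-precision residual
        dx = lu_solve(lu32, r.astype(np.float32))       # low-precision correction solve
        x += dx.astype(np.float64)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 50)) + 50 * np.eye(50)  # well-conditioned test matrix
    xref = rng.standard_normal(50)
    x = solve_mixed_precision(A, A @ xref)
    print(np.max(np.abs(x - xref)))
```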

18 Model Reduction of Atmospheric Chemistry Kinetics (UHEI/KIT)
Roadmap point #7: feasibility study
Investigation of popular approaches:
- Removal of species
- Lumping into pseudo-species
- Time-scale separation
- Repro-modelling and functional representation
Assessment of the feasibility within COSMO-ART: focus on repro-modelling via High-Dimensional Model Representation (HDMR); a first-order sketch follows below
Implementation and testing of HDMR:
- 0-D box model: atmospheric chemistry test problem (Kuhn et al., 1998)
Results:
- HDMR models can be tailored to meet any accuracy requirement, at the price of higher computing demands for their (a priori) construction and evaluation
- HDMR predictions with acceptable accuracy save up to 99% of computing time vs. the Rosenbrock solver
Conclusions:
- HDMR offers a promising approach to reduce the time/energy demands of the ART chemical kinetics
- Further investigation is needed to construct optimal HDMR expansions, requiring expert knowledge
Other crosscutting results:
- Investigated the suitability of asynchronous iteration and multigrid methods
- COSMO-ART's mathematical properties and problem size were not suitable for these techniques
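To make the repro-modelling idea concrete, here is a first-order cut-HDMR sketch, an illustrative assumption rather than the expansion constructed in the study: the expensive model is sampled once along each input axis around a reference point, and later evaluations are replaced by cheap 1-D interpolations.

```python
import numpy as np

def build_hdmr1(f, x_ref, bounds, n_samples=17):
    """First-order cut-HDMR: f(x) ~ f0 + sum_i fi(x_i), with each fi tabulated
    on a 1-D grid while the other inputs are held at the reference point."""
    x_ref = np.asarray(x_ref, dtype=float)
    f0 = f(x_ref)
    grids, tables = [], []
    for i, (lo, hi) in enumerate(bounds):
        grid = np.linspace(lo, hi, n_samples)
        vals = np.empty(n_samples)
        for j, xi in enumerate(grid):
            x = x_ref.copy()
            x[i] = xi
            vals[j] = f(x) - f0          # component function fi(x_i)
        grids.append(grid)
        tables.append(vals)

    def surrogate(x):
        x = np.asarray(x, dtype=float)
        return f0 + sum(np.interp(x[i], grids[i], tables[i]) for i in range(len(bounds)))

    return surrogate

if __name__ == "__main__":
    # Toy "expensive" model standing in for an integrated chemistry step
    expensive = lambda x: np.exp(-x[0]) + 0.5 * x[1] ** 2 + 0.1 * x[0] * x[1]
    approx = build_hdmr1(expensive, x_ref=[0.5, 0.5], bounds=[(0, 1), (0, 1)])
    print(expensive(np.array([0.3, 0.8])), approx([0.3, 0.8]))
```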

19 GPU Results

20 GPU Proofs of Concept
1) Replacement of COSMO by OPCODE COSMO (CPU/GPU-enabled), on 4 nodes
   [Figure: COSMO-ART TTS breakdown - Dynamics, Physics, MPI Comm. (Dyn.), MPI Sync. (Dyn.)]
2) Extended box model: utilisation of the (CPU/GPU-enabled) KPPA solvers on a single node; TTS/ETS reduction factors as in the table of slide 10 (up to 33.3x TTS / 18.8x ETS with CUDA for the 0-dim box model)
3) Utilisation of CPU/GPU-enabled STELLA for the graupel microphysics

21 Bottom Line Summary
Planned: 5x ETS improvement on the full COSMO-ART benchmark.
Achieved: 3.3x with OPCODE COSMO and algorithmic improvements, on a typical configuration (1,040 cores on the Piz Dora platform, 44 dual-socket Intel Haswell CPU nodes) - a valuable contribution to the atmospheric chemistry community.
- For GPU platforms, component benchmarks indicate an additional factor of >1.6x is possible
- GPU implementation of end-to-end COSMO-ART not completed (unfortunately for CSCS): KPPA had unresolved issues when run in the COSMO-ART context, and software management issues in the merge were more time-consuming than expected
Results:
- The COSMO-ART community has immediate benefits from the new code on CPU platforms
- Three exploitable results delivered: STELLA microphysics (CPU/GPU), box model test framework (CPU/GPU), OPCODE COSMO-ART SP with PRACE Integrator (CPU-only)
- The ARM platform was tested with the box model (T5.3); result: GPU architectures are more promising
- Enriching collaborations with Exa2Green partners, e.g. ART development (KIT), power monitoring (UHAM/UJI), box model optimisations (IBM), model reduction (UHEI/KIT)
The full documentation is available on:
