Opportunities & Challenges for Piz Daint s Cray XC50 with ~5000 P100 GPUs. Thomas C. Schulthess

Size: px
Start display at page:

Download "Opportunities & Challenges for Piz Daint s Cray XC50 with ~5000 P100 GPUs. Thomas C. Schulthess"

Transcription

1 Opportunities & Challenges for Piz Daint s Cray XC50 with ~5000 P100 GPUs Thomas C. Schulthess 1

2 Piz Daint 2017 fact sheet ~5 000 NVIDIA P100 GPU accelerated nodes ~1 400 Dual multi-core socket nodes Model Cray XC40/Cray XC50 Number of Hybrid Compute Nodes Number of Multicore Compute Nodes Theoretical Peak Floataing-point Performance per Hybrid Node Theoretical Peak Floating-point Performance per Multicore Node Theoretical Hybrid Peak Performance Theoretical Muliticore Peak Performance Hybrid Memory Capacity per Node Multicore Memory Capacity per Node Total System Memory System Interconnect Sonexion 3000 Storage Capacity Teraflops Intel Xeon E v3/nvidia Tesla P Teraflops Intel Xeon E v Petaflops Petaflops 64 GB; 16 GB CoWoS HBM2 64 GB, 128 GB TB; 83.1 TB Cray Aries routing and communications ASIC, and Dragonfly network topology 6.2 PB Sonexion 3000 Parallel File System Theoretical Peak Performance 112 GB/s Sonexion 1600 Storage Capacity Sonexion 1600 Parallel File System Theoretcal Peak Performance 2.5 PB 138 GB/s 2

3 Euclid Flagship Simulation 2016 Full sky map of the dark matter structure at ½ the age of the Universe. This structure will distort the shapes of more distant galaxies due to weak gravitational lensing. 2 trillion particles using all of available memory on Piz Daint and observing about 25 billion virtual galaxies (*) (*) this catalogue is being used to calibrate the experiments on board the Euclid satellite that will be launched in 2020 with the objective of investigating the nature of dark matter and dark energy Source: Joachim Stadel & Doug Potter (see: Potter et al. Comp. Astro. & Cosmol. (2017) DOI /s

4 Imaging the earth General concept: Collect recordings from large number of earthquakes. Simulate recordings for a simple model of the Earth. Compare observed and simulated recordings. Improve the Earth model to match observations and simulations. data coverage 5.0 min. period [s] 55.0 Source: Andreas Fichtner (andreas.fichtner@erdw.ethz.ch) 4

5 The Collaborative Earth Model First community effort to successively evolve a model of the Earth. Harness distributed resources and man power of many researchers. Overview of current subregions collaborators Source: Andreas Fichtner (andreas.fichtner@erdw.ethz.ch) 5

6 source: Towards Green Aviation with Python at Petascale. P. E. Vincent et al. Supercomputing

7 Website: Github: Paper: Withered et al. Comp. Phys. Comm. (2014) Governing Equations Spatial Discretisation Temporal Discretisation Precision Input Output Platforms Compressible Euler and Navier Stokes Arbitrary order Flux Reconstruction on mixed unstructured grids (tris, quads, hexes, tets, prisms ) Explicit Runge-Kutta schemes single, double.pyfrm.msh.cgns.pyfrs.vtu.pvtu CPU clusters (via C/OpenMP-MPI) MIC clusters (via C/OpenMP-MPI) Nvidia GPU clusters (via CUDA-MPI) AMD GPU clusters (via OpenCL-MPI) source: Peter Vincent 7

8 PyFR ~9k lines of python code source: Peter Vincent 8

9 Science driven exascale computing 9

10 Leadership in weather and climate European world leadership but far away from sufficient accuracy and reliability! Peter Bauer, ECMWF 10

11 The impact of resolution: simulated tropical cyclones 130 km 60 km 25 km Observations HADGEM3 PRACE UPSCALE, P.L. Vidale (NCAS) and M. Roberts (MO/HC) 11

12 What resolution is needed? Bjorn Stevens, MPI-M There are threshold scales in the atmosphere and ocean: going from 100 km to 10 km is incremental, 10 km to 1 km is a leap. At 1km it is no longer necessary to parametrise precipitating convection, ocean eddies, or orographic wave drag and its effect on extratropical storms; ocean bathymetry, overflows and mixing, as well as regional orographic circulation in the atmosphere become resolved; the connection between the remaining parametrisation are now on a physical footing. We spend the last five decades in a paradigm of incremental advances. Here we incrementally improved the resolution of models from 200 to 20km Exascale allows us to make the leap to 1 km. This fundamentally changes the structure of our models. We move from crude parametric presentations to an explicit, physics based, description of essential processes. The last such step change was fifty years ago. This was when, in the late 1960s, climate scientists first introduced global climate models, which were distinguished by their ability to explicitly represent extra-tropical storms, ocean gyres and boundary current. 12

13 The importance of ensembles Peter Bauer, ECMWF 13

14 The relevant metric: Simulate Years Per Day (SPYD) NWP Climate in production Climate spinup Simulation 10 d 100 y y Desired wall clock time 0.1 d 0.1 y 0.5 y ratio 100 1'000 10'000 SYPD

15 Running COSMO 5.0 at global scale on Piz Daint Scaling to full system size: ~5300 GPU accelerate nodes available Running a near-global (±80º covering 97% of Earths surface) COSMO 5.0 simulation > Either on the hosts processors: Intel Xeon E5 2690v3 (Haswell 12c). > Or on the GPU accelerator: PCIe version ofnvidia GP100 (Pascal) GPU 15

16 Near-global climate simulation at 1km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0 Fuhrer et al., Geosci. Model Dev. Discuss., in review, 2017 Metric: simulated years per wall-clock day x = 19 km, P100 x = 19 km, Haswell x = 3.7 km, P100 x = 3.7 km, Haswell x = 1.9 km, P100 x = 930 m, P100 SYPD #nodes h x i #nodes t [s] SYPD MWh/SY gridpoints 930 m 4, km 4, km (c) Time compression (SYPD) and energy cost (MWh/SY) for three moist simulations. At 930 m grid spacing obtained with a full 10d simulation, at 1.9 km from 1,000 steps, and at 47 km from 100 steps compression achieved in terms of SYPD. 16

17 x = 19 km, P100 x = 19 km, Haswell x = 3.7 km, P100 x = 3.7 km, Haswell x = 1.9 km, P100 x = 930 m, P100 SYPD 1 100x #nodes And reduce he footprint of the calculation by at least 10x 4888 Fuhrer et al., Geosci. Model Dev. Discuss., in review,

18 Deep Learning toolkits on Cray XC CSCS DL Toolkit C++ & GPU backend Installing a DL toolkit on Cray XC is similar to installing any HPC application few extra libraries are needed to satisfy dependencies Staging a toolkit can be done with SLURM (our resource manager at CSCS) some toolkits (like Spark) require SSH to be available on compute nodes MPI Working on Cray XC Fully CSCS TensorFlow yes no yes yes TensorFlow+MPI yes yes yes in progress MXNet yes no, ext. to use MPI yes in progress Caffe-MPI yes yes yes in progress CNTK yes yes yes in progress Spark no (Java + ext. to use GPUs) no yes yes Theano yes no yes yes 18

19 Moving Tensorflow to Pinz Daint Test-case setting simple neural network learning Standard model: LevNet-5-like convolutional MNIST model Written with Tensorflow/Python Testbed environment Standard desktop with Intel Broadwell (4c) Piz Daint multi-core node with Intel Broadwell (2x18c) Piz Daint hybrid node with Intel Haswell (12c) and NVIDIA Pascal (P100) Remark: this is a simple standard example, with complex models even more speedup expected Desktop Time to solution in sec. 3x speedup 18x speedup Daint MC node Daint hybrid node Source: Marcel Schöngens (schoengens@cscs.ch) 19

20 XC50 supercomputer plus Microsoft s Cognitive Toolkit was used to scale up training 20

21 21

22 22

23 Scaling CNTK with MPI rank i-1 rank i rank i+1 rank i+2 rank i+3 Samples i-1 Samples i Samples i+1 Samples i+2 Samples i+3 Update Gradient Update Gradient Update Gradient Update Gradient Update Gradient Gradient i-1 Gradient i Gradient i+1 Gradient i+2 Gradient +3 Sum gradients using MPI_Iallreduce Gradient Gradient Gradient Gradient Gradient Update Weights Update Weights Update Weights Update Weights Update Weights 23

24 We develop algorithms, we don t have time to deal with C/C++ or MPI a well-known computer science colleague working in machine learning 24

25 echoed by many scientists working with data Nishant Shukla (2017) 25

26 Architectural Developments Traditional Architecture Research Community CSCS User Data Flow CSCS External Login Access (ELA) Piz Daint Login & Mgmt /store Piz Daint Compute 26

27 Architectural Developments Improved Architecture Based Research Community on External Portal CSCS User Data Flow CSCS Domain Specific Portal Repository access Workflow Manager Does Not Scale External Login Access (ELA) Piz Daint Login & Mgmt /store Piz Daint 27

28 Architectural developments Service Oriented Architecture (SOA) Research Community Domain Specific Portal CSCS User Repository access Workflow Manager CSCS Infrastructure Services Authentication & authorization User Management Data Management Workflow Automation Capacity Management IT Infrastructure DWH Networking & security OpenStack Services Archival Storage Active Storage HPC Services [Confidential - For CSCS internal use only] Kick-off Meeting 28

29 and the service should be up most of the time (like 99+ %) 29

30 Supporting Federation using SOA Research Community Domain Specific Portal CSCS User Repository access Workflow Manager Research Community Domain Specific Portal Software services Repository access Workflow Manager Platform services Infrastructure provider Infrastructure services Infrastructure provider Infrastructure services 30

31 Fenix Sites 31

32 Thank you to engineers at Cray, CSCS and NVIDIA for incredibly efficient development/upgrade of Piz Daint! Tim Palmer (U. of Oxford) Peter Bauer (ECMWF) Christoph Schar (ETH Zurich) Bjorn Stevens (MPI-M) Oliver Fuhrer (MeteoSwiss) Sadaf Alam (CSCS) Dirk Pleiter (FZ Jülich) Colin McMurtrie (CSCS) Torsten Hoefler (ETH Zurich) 32

NVIDIA Update and Directions on GPU Acceleration for Earth System Models

NVIDIA Update and Directions on GPU Acceleration for Earth System Models NVIDIA Update and Directions on GPU Acceleration for Earth System Models Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA Carl Ponder, PhD, Applications Software Engineer, NVIDIA,

More information

PLAN-E Workshop Switzerland. Welcome! September 8, 2016

PLAN-E Workshop Switzerland. Welcome! September 8, 2016 PLAN-E Workshop Switzerland Welcome! September 8, 2016 The Swiss National Supercomputing Centre Driving innovation in computational research in Switzerland Michele De Lorenzi (CSCS) PLAN-E September 8,

More information

A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers

A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Maxime Martinasso, Grzegorz Kwasniewski, Sadaf R. Alam, Thomas C. Schulthess, Torsten Hoefler Swiss National Supercomputing

More information

Deutscher Wetterdienst

Deutscher Wetterdienst Porting Operational Models to Multi- and Many-Core Architectures Ulrich Schättler Deutscher Wetterdienst Oliver Fuhrer MeteoSchweiz Xavier Lapillonne MeteoSchweiz Contents Strong Scalability of the Operational

More information

Piz Daint: Application driven co-design of a supercomputer based on Cray s adaptive system design

Piz Daint: Application driven co-design of a supercomputer based on Cray s adaptive system design Piz Daint: Application driven co-design of a supercomputer based on Cray s adaptive system design Sadaf Alam & Thomas Schulthess CSCS & ETHzürich CUG 2014 * Timelines & releases are not precise Top 500

More information

CSCS Site Update. HPC Advisory Council Workshop Colin McMurtrie, Associate Director and Head of HPC Operations.

CSCS Site Update. HPC Advisory Council Workshop Colin McMurtrie, Associate Director and Head of HPC Operations. CSCS Site Update HPC Advisory Council Workshop 2018 Colin McMurtrie, Associate Director and Head of HPC Operations. 9th April 2018 The Swiss National Supercomputing Centre Driving innovation in computational

More information

The challenges of new, efficient computer architectures, and how they can be met with a scalable software development strategy.! Thomas C.

The challenges of new, efficient computer architectures, and how they can be met with a scalable software development strategy.! Thomas C. The challenges of new, efficient computer architectures, and how they can be met with a scalable software development strategy! Thomas C. Schulthess ENES HPC Workshop, Hamburg, March 17, 2014 T. Schulthess!1

More information

RAMSES on the GPU: An OpenACC-Based Approach

RAMSES on the GPU: An OpenACC-Based Approach RAMSES on the GPU: An OpenACC-Based Approach Claudio Gheller (ETHZ-CSCS) Giacomo Rosilho de Souza (EPFL Lausanne) Romain Teyssier (University of Zurich) Markus Wetzstein (ETHZ-CSCS) PRACE-2IP project EU

More information

Shifter: Fast and consistent HPC workflows using containers

Shifter: Fast and consistent HPC workflows using containers Shifter: Fast and consistent HPC workflows using containers CUG 2017, Redmond, Washington Lucas Benedicic, Felipe A. Cruz, Thomas C. Schulthess - CSCS May 11, 2017 Outline 1. Overview 2. Docker 3. Shifter

More information

ICON for HD(CP) 2. High Definition Clouds and Precipitation for Advancing Climate Prediction

ICON for HD(CP) 2. High Definition Clouds and Precipitation for Advancing Climate Prediction ICON for HD(CP) 2 High Definition Clouds and Precipitation for Advancing Climate Prediction High Definition Clouds and Precipitation for Advancing Climate Prediction ICON 2 years ago Parameterize shallow

More information

PyFR: Heterogeneous Computing on Mixed Unstructured Grids with Python. F.D. Witherden, M. Klemm, P.E. Vincent

PyFR: Heterogeneous Computing on Mixed Unstructured Grids with Python. F.D. Witherden, M. Klemm, P.E. Vincent PyFR: Heterogeneous Computing on Mixed Unstructured Grids with Python F.D. Witherden, M. Klemm, P.E. Vincent 1 Overview Motivation. Accelerators and Modern Hardware Python and PyFR. Summary. Motivation

More information

Cray XC Scalability and the Aries Network Tony Ford

Cray XC Scalability and the Aries Network Tony Ford Cray XC Scalability and the Aries Network Tony Ford June 29, 2017 Exascale Scalability Which scalability metrics are important for Exascale? Performance (obviously!) What are the contributing factors?

More information

GPU Consideration for Next Generation Weather (and Climate) Simulations

GPU Consideration for Next Generation Weather (and Climate) Simulations GPU Consideration for Next Generation Weather (and Climate) Simulations Oliver Fuhrer 1, Tobias Gisy 2, Xavier Lapillonne 3, Will Sawyer 4, Ugo Varetto 4, Mauro Bianco 4, David Müller 2, and Thomas C.

More information

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA STATE OF THE ART 2012 18,688 Tesla K20X GPUs 27 PetaFLOPS FLAGSHIP SCIENTIFIC APPLICATIONS

More information

NERSC Site Update. National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory. Richard Gerber

NERSC Site Update. National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory. Richard Gerber NERSC Site Update National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory Richard Gerber NERSC Senior Science Advisor High Performance Computing Department Head Cori

More information

PRACE Project Access Technical Guidelines - 19 th Call for Proposals

PRACE Project Access Technical Guidelines - 19 th Call for Proposals PRACE Project Access Technical Guidelines - 19 th Call for Proposals Peer-Review Office Version 5 06/03/2019 The contributing sites and the corresponding computer systems for this call are: System Architecture

More information

Timothy Lanfear, NVIDIA HPC

Timothy Lanfear, NVIDIA HPC GPU COMPUTING AND THE Timothy Lanfear, NVIDIA FUTURE OF HPC Exascale Computing will Enable Transformational Science Results First-principles simulation of combustion for new high-efficiency, lowemision

More information

AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management

AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management @SC Asia 2018 Rangan Sukumar, PhD Office of the CTO, Cray Inc. Safe Harbor Statement This presentation

More information

Fast Multipole, GPUs and Memory Crushing: The 2 Trillion Particle Euclid Flagship Simulation. Joachim Stadel Doug Potter See [Potter+ 17]

Fast Multipole, GPUs and Memory Crushing: The 2 Trillion Particle Euclid Flagship Simulation. Joachim Stadel Doug Potter See [Potter+ 17] Fast Multipole, GPUs and Memory Crushing: The 2 Trillion Particle Euclid Flagship Simulation Joachim Stadel Doug Potter See [Potter+ 17] Outline Predictability The Euclid Flagship Light-cone and Mock Pkdgrav3:

More information

Using EasyBuild and Continuous Integration for Deploying Scientific Applications on Large Scale Production Systems

Using EasyBuild and Continuous Integration for Deploying Scientific Applications on Large Scale Production Systems Using EasyBuild and Continuous Integration for Deploying Scientific Applications on Large HPC Advisory Council Swiss Conference Guilherme Peretti-Pezzi, CSCS April 11, 2017 Table of Contents 1. Introduction:

More information

The next generation supercomputer. Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency

The next generation supercomputer. Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency The next generation supercomputer and NWP system of JMA Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency Contents JMA supercomputer systems Current system (Mar

More information

Status of the COSMO GPU version

Status of the COSMO GPU version Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss Status of the COSMO GPU version Xavier Lapillonne Contributors in 2015 (Thanks!) Alon Shtivelman Andre Walser

More information

DGX UPDATE. Customer Presentation Deck May 8, 2017

DGX UPDATE. Customer Presentation Deck May 8, 2017 DGX UPDATE Customer Presentation Deck May 8, 2017 NVIDIA DGX-1: The World s Fastest AI Supercomputer FASTEST PATH TO DEEP LEARNING EFFORTLESS PRODUCTIVITY REVOLUTIONARY AI PERFORMANCE Fully-integrated

More information

PP POMPA (WG6) News and Highlights. Oliver Fuhrer (MeteoSwiss) and the whole POMPA project team. COSMO GM13, Sibiu

PP POMPA (WG6) News and Highlights. Oliver Fuhrer (MeteoSwiss) and the whole POMPA project team. COSMO GM13, Sibiu PP POMPA (WG6) News and Highlights Oliver Fuhrer (MeteoSwiss) and the whole POMPA project team COSMO GM13, Sibiu Task Overview Task 1 Performance analysis and documentation Task 2 Redesign memory layout

More information

HPC Algorithms and Applications

HPC Algorithms and Applications HPC Algorithms and Applications Intro Michael Bader Winter 2015/2016 Intro, Winter 2015/2016 1 Part I Scientific Computing and Numerical Simulation Intro, Winter 2015/2016 2 The Simulation Pipeline phenomenon,

More information

World s most advanced data center accelerator for PCIe-based servers

World s most advanced data center accelerator for PCIe-based servers NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying

More information

RECENT TRENDS IN GPU ARCHITECTURES. Perspectives of GPU computing in Science, 26 th Sept 2016

RECENT TRENDS IN GPU ARCHITECTURES. Perspectives of GPU computing in Science, 26 th Sept 2016 RECENT TRENDS IN GPU ARCHITECTURES Perspectives of GPU computing in Science, 26 th Sept 2016 NVIDIA THE AI COMPUTING COMPANY GPU Computing Computer Graphics Artificial Intelligence 2 NVIDIA POWERS WORLD

More information

Pedraforca: a First ARM + GPU Cluster for HPC

Pedraforca: a First ARM + GPU Cluster for HPC www.bsc.es Pedraforca: a First ARM + GPU Cluster for HPC Nikola Puzovic, Alex Ramirez We ve hit the power wall ALL computers are limited by power consumption Energy-efficient approaches Multi-core Fujitsu

More information

Introduction to ARSC. David Newman (from Tom Logan slides), September Monday, September 14, 15

Introduction to ARSC. David Newman (from Tom Logan slides), September Monday, September 14, 15 Introduction to ARSC David Newman (from Tom Logan slides), September 3 2015 What we do: High performance computing, university owned and operated center Provide HPC resources and support Conduct research

More information

CS500 SMARTER CLUSTER SUPERCOMPUTERS

CS500 SMARTER CLUSTER SUPERCOMPUTERS CS500 SMARTER CLUSTER SUPERCOMPUTERS OVERVIEW Extending the boundaries of what you can achieve takes reliable computing tools matched to your workloads. That s why we tailor the Cray CS500 cluster supercomputer

More information

Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory

Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Quinn Mitchell HPC UNIX/LINUX Storage Systems ORNL is managed by UT-Battelle for the US Department of Energy U.S. Department

More information

DGX SYSTEMS: DEEP LEARNING FROM DESK TO DATA CENTER. Markus Weber and Haiduong Vo

DGX SYSTEMS: DEEP LEARNING FROM DESK TO DATA CENTER. Markus Weber and Haiduong Vo DGX SYSTEMS: DEEP LEARNING FROM DESK TO DATA CENTER Markus Weber and Haiduong Vo NVIDIA DGX SYSTEMS Agenda NVIDIA DGX-1 NVIDIA DGX STATION 2 ONE YEAR LATER NVIDIA DGX-1 Barriers Toppled, the Unsolvable

More information

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D.

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D. Resources Current and Future Systems Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Most likely talk to be out of date History of Top 500 Issues with building bigger machines Current and near future academic

More information

INTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER. Adrian

INTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER. Adrian INTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER Adrian Jackson adrianj@epcc.ed.ac.uk @adrianjhpc Processors The power used by a CPU core is proportional to Clock Frequency x Voltage 2 In the past, computers

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware

More information

IBM Deep Learning Solutions

IBM Deep Learning Solutions IBM Deep Learning Solutions Reference Architecture for Deep Learning on POWER8, P100, and NVLink October, 2016 How do you teach a computer to Perceive? 2 Deep Learning: teaching Siri to recognize a bicycle

More information

ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015

ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 COMMODITY DISRUPTS CUSTOM SOURCE: Top500 ACCELERATED COMPUTING: THE PATH FORWARD It s time to start

More information

CLAW FORTRAN Compiler source-to-source translation for performance portability

CLAW FORTRAN Compiler source-to-source translation for performance portability CLAW FORTRAN Compiler source-to-source translation for performance portability XcalableMP Workshop, Akihabara, Tokyo, Japan October 31, 2017 Valentin Clement valentin.clement@env.ethz.ch Image: NASA Summary

More information

A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids

A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids Patrice Castonguay and Antony Jameson Aerospace Computing Lab, Stanford University GTC Asia, Beijing, China December 15 th, 2011

More information

Porting COSMO to Hybrid Architectures

Porting COSMO to Hybrid Architectures Porting COSMO to Hybrid Architectures T. Gysi 1, O. Fuhrer 2, C. Osuna 3, X. Lapillonne 3, T. Diamanti 3, B. Cumming 4, T. Schroeder 5, P. Messmer 5, T. Schulthess 4,6,7 [1] Supercomputing Systems AG,

More information

Introduction to High Performance Computing. Shaohao Chen Research Computing Services (RCS) Boston University

Introduction to High Performance Computing. Shaohao Chen Research Computing Services (RCS) Boston University Introduction to High Performance Computing Shaohao Chen Research Computing Services (RCS) Boston University Outline What is HPC? Why computer cluster? Basic structure of a computer cluster Computer performance

More information

INTRODUCTION TO OPENACC. Analyzing and Parallelizing with OpenACC, Feb 22, 2017

INTRODUCTION TO OPENACC. Analyzing and Parallelizing with OpenACC, Feb 22, 2017 INTRODUCTION TO OPENACC Analyzing and Parallelizing with OpenACC, Feb 22, 2017 Objective: Enable you to to accelerate your applications with OpenACC. 2 Today s Objectives Understand what OpenACC is and

More information

Exascale Challenges and Applications Initiatives for Earth System Modeling

Exascale Challenges and Applications Initiatives for Earth System Modeling Exascale Challenges and Applications Initiatives for Earth System Modeling Workshop on Weather and Climate Prediction on Next Generation Supercomputers 22-25 October 2012 Tom Edwards tedwards@cray.com

More information

SUPERCHARGE DEEP LEARNING WITH DGX-1. Markus Weber SC16 - November 2016

SUPERCHARGE DEEP LEARNING WITH DGX-1. Markus Weber SC16 - November 2016 SUPERCHARGE DEEP LEARNING WITH DGX-1 Markus Weber SC16 - November 2016 NVIDIA Pioneered GPU Computing Founded 1993 $7B 9,500 Employees 100M NVIDIA GeForce Gamers The world s largest gaming platform Pioneering

More information

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Waiting for Moore s Law to save your serial code start getting bleak in 2004 Source: published SPECInt data Moore s Law is not at all

More information

Introduction to High-Performance Computing

Introduction to High-Performance Computing Introduction to High-Performance Computing Dr. Axel Kohlmeyer Associate Dean for Scientific Computing, CST Associate Director, Institute for Computational Science Assistant Vice President for High-Performance

More information

Software and Performance Engineering for numerical codes on GPU clusters

Software and Performance Engineering for numerical codes on GPU clusters Software and Performance Engineering for numerical codes on GPU clusters H. Köstler International Workshop of GPU Solutions to Multiscale Problems in Science and Engineering Harbin, China 28.7.2010 2 3

More information

High Performance Computing. What is it used for and why?

High Performance Computing. What is it used for and why? High Performance Computing What is it used for and why? Overview What is it used for? Drivers for HPC Examples of usage Why do you need to learn the basics? Hardware layout and structure matters Serial

More information

Developments in Computing Technology: GPUs

Developments in Computing Technology: GPUs Developments in Computing Technology: GPUs Mark Richardson, Technical Head, CEMAC Mark Richardson, CEMAC, GPU showcase 29 th November 2017 1 Welcome to first CEMAC seminar Will try to hold one every 3

More information

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 16 th CALL (T ier-0)

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 16 th CALL (T ier-0) PRACE 16th Call Technical Guidelines for Applicants V1: published on 26/09/17 TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 16 th CALL (T ier-0) The contributing sites and the corresponding computer systems

More information

JÜLICH SUPERCOMPUTING CENTRE Site Introduction Michael Stephan Forschungszentrum Jülich

JÜLICH SUPERCOMPUTING CENTRE Site Introduction Michael Stephan Forschungszentrum Jülich JÜLICH SUPERCOMPUTING CENTRE Site Introduction 09.04.2018 Michael Stephan JSC @ Forschungszentrum Jülich FORSCHUNGSZENTRUM JÜLICH Research Centre Jülich One of the 15 Helmholtz Research Centers in Germany

More information

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications TESLA V100 PERFORMANCE GUIDE Life Sciences Applications NOVEMBER 2017 TESLA V100 PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D.

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D. Resources Current and Future Systems Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Most likely talk to be out of date History of Top 500 Issues with building bigger machines Current and near future academic

More information

NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG

NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial

More information

INTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER. Adrian

INTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER. Adrian INTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER Adrian Jackson a.jackson@epcc.ed.ac.uk @adrianjhpc Processors The power used by a CPU core is proportional to Clock Frequency x Voltage 2 In the past,

More information

Finite Element Integration and Assembly on Modern Multi and Many-core Processors

Finite Element Integration and Assembly on Modern Multi and Many-core Processors Finite Element Integration and Assembly on Modern Multi and Many-core Processors Krzysztof Banaś, Jan Bielański, Kazimierz Chłoń AGH University of Science and Technology, Mickiewicza 30, 30-059 Kraków,

More information

Machine Learning on VMware vsphere with NVIDIA GPUs

Machine Learning on VMware vsphere with NVIDIA GPUs Machine Learning on VMware vsphere with NVIDIA GPUs Uday Kurkure, Hari Sivaraman, Lan Vu GPU Technology Conference 2017 2016 VMware Inc. All rights reserved. Gartner Hype Cycle for Emerging Technology

More information

General Plasma Physics

General Plasma Physics Present and Future Computational Requirements General Plasma Physics Center for Integrated Computation and Analysis of Reconnection and Turbulence () Kai Germaschewski, Homa Karimabadi Amitava Bhattacharjee,

More information

High-Performance Distributed RMA Locks

High-Performance Distributed RMA Locks TORSTEN HOEFLER High-Performance Distributed RMA Locks with support of Patrick Schmid, Maciej Besta @ SPCL presented at Wuxi, China, Sept. 2016 ETH, CS, Systems Group, SPCL ETH Zurich top university in

More information

GPU COMPUTING AND THE FUTURE OF HPC. Timothy Lanfear, NVIDIA

GPU COMPUTING AND THE FUTURE OF HPC. Timothy Lanfear, NVIDIA GPU COMPUTING AND THE FUTURE OF HPC Timothy Lanfear, NVIDIA ~1 W ~3 W ~100 W ~30 W 1 kw 100 kw 20 MW Power-constrained Computers 2 EXASCALE COMPUTING WILL ENABLE TRANSFORMATIONAL SCIENCE RESULTS First-principles

More information

Update on Cray Activities in the Earth Sciences

Update on Cray Activities in the Earth Sciences Update on Cray Activities in the Earth Sciences Presented to the 13 th ECMWF Workshop on the Use of HPC in Meteorology 3-7 November 2008 Per Nyberg nyberg@cray.com Director, Marketing and Business Development

More information

Python based Data Science on Cray Platforms Rob Vesse, Alex Heye, Mike Ringenburg - Cray Inc C O M P U T E S T O R E A N A L Y Z E

Python based Data Science on Cray Platforms Rob Vesse, Alex Heye, Mike Ringenburg - Cray Inc C O M P U T E S T O R E A N A L Y Z E Python based Data Science on Cray Platforms Rob Vesse, Alex Heye, Mike Ringenburg - Cray Inc Overview Supported Technologies Cray PE Python Support Shifter Urika-XC Anaconda Python Spark Intel BigDL machine

More information

April 2 nd, Bob Burroughs Director, HPC Solution Sales

April 2 nd, Bob Burroughs Director, HPC Solution Sales April 2 nd, 2019 Bob Burroughs Director, HPC Solution Sales Today - Introducing 2 nd Generation Intel Xeon Scalable Processors how Intel Speeds HPC performance Work Time System Peak Efficiency Software

More information

ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation

ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation Ray Browell nvidia Technology Theater SC12 1 2012 ANSYS, Inc. nvidia Technology Theater SC12 HPC Revolution Recent

More information

Large scale Imaging on Current Many- Core Platforms

Large scale Imaging on Current Many- Core Platforms Large scale Imaging on Current Many- Core Platforms SIAM Conf. on Imaging Science 2012 May 20, 2012 Dr. Harald Köstler Chair for System Simulation Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen,

More information

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Waiting for Moore s Law to save your serial code started getting bleak in 2004 Source: published SPECInt

More information

Numerical Algorithms on Multi-GPU Architectures

Numerical Algorithms on Multi-GPU Architectures Numerical Algorithms on Multi-GPU Architectures Dr.-Ing. Harald Köstler 2 nd International Workshops on Advances in Computational Mechanics Yokohama, Japan 30.3.2010 2 3 Contents Motivation: Applications

More information

High Performance Computing. What is it used for and why?

High Performance Computing. What is it used for and why? High Performance Computing What is it used for and why? Overview What is it used for? Drivers for HPC Examples of usage Why do you need to learn the basics? Hardware layout and structure matters Serial

More information

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Waiting for Moore s Law to save your serial code started getting bleak in 2004 Source: published SPECInt

More information

Trends of Network Topology on Supercomputers. Michihiro Koibuchi National Institute of Informatics, Japan 2018/11/27

Trends of Network Topology on Supercomputers. Michihiro Koibuchi National Institute of Informatics, Japan 2018/11/27 Trends of Network Topology on Supercomputers Michihiro Koibuchi National Institute of Informatics, Japan 2018/11/27 From Graph Golf to Real Interconnection Networks Case 1: On-chip Networks Case 2: Supercomputer

More information

GPU Architecture. Alan Gray EPCC The University of Edinburgh

GPU Architecture. Alan Gray EPCC The University of Edinburgh GPU Architecture Alan Gray EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? Architectural reasons for accelerator performance advantages Latest GPU Products From

More information

OP2 FOR MANY-CORE ARCHITECTURES

OP2 FOR MANY-CORE ARCHITECTURES OP2 FOR MANY-CORE ARCHITECTURES G.R. Mudalige, M.B. Giles, Oxford e-research Centre, University of Oxford gihan.mudalige@oerc.ox.ac.uk 27 th Jan 2012 1 AGENDA OP2 Current Progress Future work for OP2 EPSRC

More information

Deep Learning mit PowerAI - Ein Überblick

Deep Learning mit PowerAI - Ein Überblick Stephen Lutz Deep Learning mit PowerAI - Open Group Master Certified IT Specialist Technical Sales IBM Cognitive Infrastructure IBM Germany Ein Überblick Stephen.Lutz@de.ibm.com What s that? and what s

More information

S8765 Performance Optimization for Deep- Learning on the Latest POWER Systems

S8765 Performance Optimization for Deep- Learning on the Latest POWER Systems S8765 Performance Optimization for Deep- Learning on the Latest POWER Systems Khoa Huynh Senior Technical Staff Member (STSM), IBM Jonathan Samn Software Engineer, IBM Evolving from compute systems to

More information

Porting Scientific Applications to OpenPOWER

Porting Scientific Applications to OpenPOWER Porting Scientific Applications to OpenPOWER Dirk Pleiter Forschungszentrum Jülich / JSC #OpenPOWERSummit Join the conversation at #OpenPOWERSummit 1 JSC s HPC Strategy IBM Power 6 JUMP, 9 TFlop/s Intel

More information

CSCS Proposal writing webinar Technical review. 12th April 2015 CSCS

CSCS Proposal writing webinar Technical review. 12th April 2015 CSCS CSCS Proposal writing webinar Technical review 12th April 2015 CSCS Agenda Tips for new applicants CSCS overview Allocation process Guidelines Basic concepts Performance tools Demo Q&A open discussion

More information

Improving the Energy- and Time-to-solution of COSMO-ART

Improving the Energy- and Time-to-solution of COSMO-ART Joseph Charles, William Sawyer (ETH Zurich - CSCS) Heike Vogel (KIT), Bernhard Vogel (KIT), Teresa Beck (KIT/UHEI) COSMO User Workshop, MeteoSwiss January 18, 2016 Summary 2 Main Objectives Utilise project

More information

Accelerating Implicit LS-DYNA with GPU

Accelerating Implicit LS-DYNA with GPU Accelerating Implicit LS-DYNA with GPU Yih-Yih Lin Hewlett-Packard Company Abstract A major hindrance to the widespread use of Implicit LS-DYNA is its high compute cost. This paper will show modern GPU,

More information

HPC future trends from a science perspective

HPC future trends from a science perspective HPC future trends from a science perspective Simon McIntosh-Smith University of Bristol HPC Research Group simonm@cs.bris.ac.uk 1 Business as usual? We've all got used to new machines being relatively

More information

HPCF Cray Phase 2. User Test period. Cristian Simarro User Support. ECMWF April 18, 2016

HPCF Cray Phase 2. User Test period. Cristian Simarro User Support. ECMWF April 18, 2016 HPCF Cray Phase 2 User Test period Cristian Simarro User Support advisory@ecmwf.int ECMWF April 18, 2016 Content Introduction Upgrade timeline Changes Hardware Software Steps for the testing on CCB Possible

More information

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Office of Science Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Buddy Bland Project Director Oak Ridge Leadership Computing Facility November 13, 2012 ORNL s Titan Hybrid

More information

Lecture 1. Introduction Course Overview

Lecture 1. Introduction Course Overview Lecture 1 Introduction Course Overview Welcome to CSE 260! Your instructor is Scott Baden baden@ucsd.edu Office: room 3244 in EBU3B Office hours Week 1: Today (after class), Tuesday (after class) Remainder

More information

The Tesla Accelerated Computing Platform

The Tesla Accelerated Computing Platform The Tesla Accelerated Computing Platform Axel Koehler, Principal Solution Architect HPC Advisory Council Meeting Lugano 22 March 2016 Introduction TESLA Platform for HPC Agenda TESLA Platform for HYPERSCALE

More information

Cuda C Programming Guide Appendix C Table C-

Cuda C Programming Guide Appendix C Table C- Cuda C Programming Guide Appendix C Table C-4 Professional CUDA C Programming (1118739329) cover image into the powerful world of parallel GPU programming with this down-to-earth, practical guide Table

More information

Execution Models for the Exascale Era

Execution Models for the Exascale Era Execution Models for the Exascale Era Nicholas J. Wright Advanced Technology Group, NERSC/LBNL njwright@lbl.gov Programming weather, climate, and earth- system models on heterogeneous muli- core plajorms

More information

Stan Posey, NVIDIA, Santa Clara, CA, USA

Stan Posey, NVIDIA, Santa Clara, CA, USA Stan Posey, sposey@nvidia.com NVIDIA, Santa Clara, CA, USA NVIDIA Strategy for CWO Modeling (Since 2010) Initial focus: CUDA applied to climate models and NWP research Opportunities to refactor code with

More information

GPU Developments for the NEMO Model. Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA

GPU Developments for the NEMO Model. Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA GPU Developments for the NEMO Model Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA NVIDIA HPC AND ESM UPDATE TOPICS OF DISCUSSION GPU PROGRESS ON NEMO MODEL 2 NVIDIA GPU

More information

Unified Model Performance on the NEC SX-6

Unified Model Performance on the NEC SX-6 Unified Model Performance on the NEC SX-6 Paul Selwood Crown copyright 2004 Page 1 Introduction The Met Office National Weather Service Global and Local Area Climate Prediction (Hadley Centre) Operational

More information

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Preparing GPU-Accelerated Applications for the Summit Supercomputer Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership

More information

Some Reflections on Advanced Geocomputations and the Data Deluge

Some Reflections on Advanced Geocomputations and the Data Deluge Some Reflections on Advanced Geocomputations and the Data Deluge J. A. Rod Blais Dept. of Geomatics Engineering Pacific Institute for the Mathematical Sciences University of Calgary, Calgary, AB www.ucalgary.ca/~blais

More information

Hybrid KAUST Many Cores and OpenACC. Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS

Hybrid KAUST Many Cores and OpenACC. Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS + Hybrid Computing @ KAUST Many Cores and OpenACC Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS + Agenda Hybrid Computing n Hybrid Computing n From Multi-Physics

More information

Making Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010

Making Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010 Making Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010 Windows HPC Server 2008 R2 Windows HPC Server 2008 R2 makes supercomputing

More information

EARLY EVALUATION OF THE CRAY XC40 SYSTEM THETA

EARLY EVALUATION OF THE CRAY XC40 SYSTEM THETA EARLY EVALUATION OF THE CRAY XC40 SYSTEM THETA SUDHEER CHUNDURI, SCOTT PARKER, KEVIN HARMS, VITALI MOROZOV, CHRIS KNIGHT, KALYAN KUMARAN Performance Engineering Group Argonne Leadership Computing Facility

More information

Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK

Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK 17 May 2016, Melbourne 24 May 2016, Sydney Werner Scholz, CTO and Head of R&D, XENON Systems Mike Wang, Solutions Architect,

More information

Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss. PP POMPA status.

Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss. PP POMPA status. Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss PP POMPA status Xavier Lapillonne Performance On Massively Parallel Architectures Last year of the project

More information

NIA CFD Futures Conference Hampton, VA; August 2012

NIA CFD Futures Conference Hampton, VA; August 2012 Petascale Computing and Similarity Scaling in Turbulence P. K. Yeung Schools of AE, CSE, ME Georgia Tech pk.yeung@ae.gatech.edu NIA CFD Futures Conference Hampton, VA; August 2012 10 2 10 1 10 4 10 5 Supported

More information

Our Workshop Environment

Our Workshop Environment Our Workshop Environment John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center Copyright 2018 Our Environment Today Your laptops or workstations: only used for portal access Bridges

More information

HPC-CINECA infrastructure: The New Marconi System. HPC methods for Computational Fluid Dynamics and Astrophysics Giorgio Amati,

HPC-CINECA infrastructure: The New Marconi System. HPC methods for Computational Fluid Dynamics and Astrophysics Giorgio Amati, HPC-CINECA infrastructure: The New Marconi System HPC methods for Computational Fluid Dynamics and Astrophysics Giorgio Amati, g.amati@cineca.it Agenda 1. New Marconi system Roadmap Some performance info

More information

Deutscher Wetterdienst

Deutscher Wetterdienst Accelerating Work at DWD Ulrich Schättler Deutscher Wetterdienst Roadmap Porting operational models: revisited Preparations for enabling practical work at DWD My first steps with the COSMO on a GPU First

More information

Performance and Energy Usage of Workloads on KNL and Haswell Architectures

Performance and Energy Usage of Workloads on KNL and Haswell Architectures Performance and Energy Usage of Workloads on KNL and Haswell Architectures Tyler Allen 1 Christopher Daley 2 Doug Doerfler 2 Brian Austin 2 Nicholas Wright 2 1 Clemson University 2 National Energy Research

More information