Opportunities & Challenges for Piz Daint's Cray XC50 with ~5000 P100 GPUs. Thomas C. Schulthess
Transcription
1 Opportunities & Challenges for Piz Daint's Cray XC50 with ~5000 P100 GPUs. Thomas C. Schulthess
2 Piz Daint 2017 fact sheet: ~5 000 NVIDIA P100 GPU-accelerated nodes, ~1 400 dual-socket multicore nodes
Model: Cray XC40/Cray XC50
Number of Hybrid Compute Nodes: ~5 000
Number of Multicore Compute Nodes: ~1 400
Theoretical Peak Floating-point Performance per Hybrid Node: … Teraflops, Intel Xeon E v3/NVIDIA Tesla P
Theoretical Peak Floating-point Performance per Multicore Node: … Teraflops, Intel Xeon E v
Theoretical Hybrid Peak Performance: … Petaflops
Theoretical Multicore Peak Performance: … Petaflops
Hybrid Memory Capacity per Node: 64 GB; 16 GB CoWoS HBM2
Multicore Memory Capacity per Node: 64 GB, 128 GB
Total System Memory: … TB; 83.1 TB
System Interconnect: Cray Aries routing and communications ASIC, and Dragonfly network topology
Sonexion 3000 Storage Capacity: 6.2 PB
Sonexion 3000 Parallel File System Theoretical Peak Performance: 112 GB/s
Sonexion 1600 Storage Capacity: 2.5 PB
Sonexion 1600 Parallel File System Theoretical Peak Performance: 138 GB/s
3 Euclid Flagship Simulation 2016. Full-sky map of the dark matter structure at ½ the age of the Universe. This structure will distort the shapes of more distant galaxies through weak gravitational lensing. The simulation followed 2 trillion particles, using all of the available memory on Piz Daint, and observed about 25 billion virtual galaxies (*). (*) This catalogue is being used to calibrate the experiments on board the Euclid satellite, which will be launched in 2020 with the objective of investigating the nature of dark matter and dark energy. Source: Joachim Stadel & Doug Potter (see: Potter et al. Comp. Astro. & Cosmol. (2017) DOI /s
4 Imaging the Earth. General concept: Collect recordings from a large number of earthquakes. Simulate recordings for a simple model of the Earth. Compare observed and simulated recordings. Improve the Earth model to match observations and simulations. [Figure: data coverage; min. period [s], 5.0 to 55.0] Source: Andreas Fichtner (andreas.fichtner@erdw.ethz.ch)
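The simulate / compare / improve loop on this slide can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the slide does not describe the forward solver or the optimisation method, so a tiny linear "forward model" (the hypothetical matrix `G`) and plain gradient descent stand in for the real wave-propagation simulation and inversion scheme.

```python
# Toy illustration of the simulate / compare / improve loop.
# The linear forward model and all names are hypothetical stand-ins.

def simulate(model, sensitivities):
    """Predict one recording per sensitivity row: d_i = sum_j G_ij * m_j."""
    return [sum(g * m for g, m in zip(row, model)) for row in sensitivities]

def misfit(observed, simulated):
    """Least-squares misfit between observed and simulated recordings."""
    return 0.5 * sum((s - o) ** 2 for o, s in zip(observed, simulated))

def improve(model, sensitivities, observed, step=0.1):
    """One gradient-descent update of the model parameters."""
    simulated = simulate(model, sensitivities)
    residual = [s - o for o, s in zip(observed, simulated)]
    gradient = [
        sum(row[j] * r for row, r in zip(sensitivities, residual))
        for j in range(len(model))
    ]
    return [m - step * g for m, g in zip(model, gradient)]

# Synthetic "observations" generated from a known target model.
G = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
true_model = [2.0, -1.0]
observed = simulate(true_model, G)

model = [0.0, 0.0]  # simple starting model
initial_misfit = misfit(observed, simulate(model, G))
for _ in range(200):
    model = improve(model, G, observed)
final_misfit = misfit(observed, simulate(model, G))
```

In this toy setting the misfit shrinks at every iteration because the step size is small relative to the curvature of the misfit function; a real seismic inversion is nonlinear and far more expensive per iteration.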
5 The Collaborative Earth Model. First community effort to successively evolve a model of the Earth. Harness the distributed resources and manpower of many researchers. [Figure: overview of current subregions and collaborators] Source: Andreas Fichtner (andreas.fichtner@erdw.ethz.ch)
6 source: Towards Green Aviation with Python at Petascale. P. E. Vincent et al. Supercomputing
7 Website: Github: Paper: Witherden et al. Comp. Phys. Comm. (2014)
Governing Equations: Compressible Euler and Navier-Stokes
Spatial Discretisation: Arbitrary-order Flux Reconstruction on mixed unstructured grids (tris, quads, hexes, tets, prisms)
Temporal Discretisation: Explicit Runge-Kutta schemes
Precision: single, double
Input: .pyfrm, .msh, .cgns
Output: .pyfrs, .vtu, .pvtu
Platforms: CPU clusters (via C/OpenMP-MPI), MIC clusters (via C/OpenMP-MPI), NVIDIA GPU clusters (via CUDA-MPI), AMD GPU clusters (via OpenCL-MPI)
source: Peter Vincent
8 PyFR: ~9k lines of Python code. source: Peter Vincent
9 Science driven exascale computing
10 Leadership in weather and climate: European world leadership, but far away from sufficient accuracy and reliability! (Peter Bauer, ECMWF)
11 The impact of resolution: simulated tropical cyclones. [Figure panels: 130 km, 60 km, 25 km, Observations] HADGEM3 PRACE UPSCALE, P.L. Vidale (NCAS) and M. Roberts (MO/HC)
12 What resolution is needed? (Bjorn Stevens, MPI-M) There are threshold scales in the atmosphere and ocean: going from 100 km to 10 km is incremental, 10 km to 1 km is a leap. At 1 km it is no longer necessary to parametrise precipitating convection, ocean eddies, or orographic wave drag and its effect on extratropical storms; ocean bathymetry, overflows and mixing, as well as regional orographic circulation in the atmosphere, become resolved; the connections between the remaining parametrisations are now on a physical footing. We spent the last five decades in a paradigm of incremental advances, in which we incrementally improved the resolution of models from 200 to 20 km. Exascale allows us to make the leap to 1 km. This fundamentally changes the structure of our models: we move from crude parametric representations to an explicit, physics-based description of essential processes. The last such step change was fifty years ago, in the late 1960s, when climate scientists first introduced global climate models, which were distinguished by their ability to explicitly represent extra-tropical storms, ocean gyres and boundary currents.
13 The importance of ensembles (Peter Bauer, ECMWF)
14 The relevant metric: Simulated Years Per Day (SYPD)
Columns: NWP | Climate in production | Climate spinup
Simulation: 10 d | 100 y | … y
Desired wall clock time: 0.1 d | 0.1 y | 0.5 y
Ratio: 100 | 1'000 | 10'000
SYPD: … | … | …
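The relation between the ratio row and SYPD is plain arithmetic, sketched below. The ratio is simulated time divided by wall-clock time, so dividing it by the number of days in a year gives simulated years per wall-clock day. The function names and the 365.25-day year are assumptions made for this sketch; the values it produces are derived from the ratio row and are not the SYPD figures from the slide.

```python
DAYS_PER_YEAR = 365.25  # assumption: Julian year

def sypd(simulated_days, wallclock_days):
    """Simulated years per wall-clock day."""
    return (simulated_days / DAYS_PER_YEAR) / wallclock_days

def ratio_to_sypd(ratio):
    """Convert a (simulated time / wall-clock time) ratio to SYPD."""
    return ratio / DAYS_PER_YEAR

# NWP column: 10 simulated days in 0.1 wall-clock days (ratio 100).
nwp = sypd(simulated_days=10.0, wallclock_days=0.1)

# Climate-in-production column: 100 years in 0.1 years (ratio 1'000).
production = sypd(100 * DAYS_PER_YEAR, 0.1 * DAYS_PER_YEAR)

# Climate-spinup column, using only the ratio given on the slide.
spinup = ratio_to_sypd(10_000)

for name, value in (("NWP", nwp), ("production", production), ("spinup", spinup)):
    print(f"{name}: {value:.2f} SYPD")
```

Under these assumptions, each factor of ten in the ratio is a factor of ten in SYPD, and a ratio of about 365 corresponds to one simulated year per day.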
15 Running COSMO 5.0 at global scale on Piz Daint. Scaling to full system size: ~5300 GPU-accelerated nodes available. Running a near-global (±80°, covering 97% of Earth's surface) COSMO 5.0 simulation > Either on the host processors: Intel Xeon E5 2690v3 (Haswell 12c). > Or on the GPU accelerator: PCIe version of NVIDIA GP100 (Pascal) GPU
16 Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0. Fuhrer et al., Geosci. Model Dev. Discuss., in review, 2017. Metric: simulated years per wall-clock day. [Figure: SYPD versus #nodes for Δx = 19 km (P100, Haswell), Δx = 3.7 km (P100, Haswell), Δx = 1.9 km (P100) and Δx = 930 m (P100)] [Table columns: ⟨Δx⟩, #nodes, Δt [s], SYPD, MWh/SY, gridpoints] (c) Time compression (SYPD) and energy cost (MWh/SY) for three moist simulations. At 930 m grid spacing obtained with a full 10d simulation, at 1.9 km from 1,000 steps, and at 47 km from 100 steps compression achieved in terms of SYPD.
17 [Figure: SYPD versus #nodes, up to 4888 nodes, for Δx = 19 km (P100, Haswell), Δx = 3.7 km (P100, Haswell), Δx = 1.9 km (P100) and Δx = 930 m (P100); annotations: SYPD 1, 100x] And reduce the footprint of the calculation by at least 10x. Fuhrer et al., Geosci. Model Dev. Discuss., in review,
18 Deep Learning toolkits on Cray XC. Installing a DL toolkit on Cray XC is similar to installing any HPC application; a few extra libraries are needed to satisfy dependencies. Staging a toolkit can be done with SLURM (our resource manager at CSCS); some toolkits (like Spark) require SSH to be available on compute nodes.
DL Toolkit | C++ & GPU backend | MPI | Working on Cray XC | Fully CSCS
TensorFlow | yes | no | yes | yes
TensorFlow+MPI | yes | yes | yes | in progress
MXNet | yes | no, ext. to use MPI | yes | in progress
Caffe-MPI | yes | yes | yes | in progress
CNTK | yes | yes | yes | in progress
Spark | no (Java + ext. to use GPUs) | no | yes | yes
Theano | yes | no | yes | yes
19 Moving TensorFlow to Piz Daint. Test-case setting: simple neural network learning. Standard model: LeNet-5-like convolutional MNIST model, written with TensorFlow/Python. Testbed environment: standard desktop with Intel Broadwell (4c); Piz Daint multi-core node with Intel Broadwell (2x18c); Piz Daint hybrid node with Intel Haswell (12c) and NVIDIA Pascal (P100). Remark: this is a simple standard example; with complex models even more speedup is expected. [Figure: time to solution in sec. for Desktop, Daint MC node and Daint hybrid node; annotations: 3x speedup, 18x speedup] Source: Marcel Schöngens (schoengens@cscs.ch)
20 XC50 supercomputer plus Microsoft's Cognitive Toolkit was used to scale up training
23 Scaling CNTK with MPI. [Diagram: ranks i-1, i, i+1, i+2, i+3] Each rank processes its own samples (Samples i-1 to Samples i+3) and computes its own gradient update (Gradient i-1 to Gradient i+3). Sum gradients using MPI_Iallreduce, so that every rank holds the same summed gradient. Each rank then updates its weights.
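The data-parallel pattern in this diagram can be sketched in a single process. This is not CNTK code and it does not call MPI: the hypothetical `allreduce_sum` function below only mimics what a summing allreduce delivers to each rank (and it is blocking, whereas `MPI_Iallreduce` is nonblocking). The one-parameter model `y = w * x`, the sample shards and the learning rate are likewise assumptions chosen to keep the example small.

```python
def local_gradient(weights, samples):
    """Gradient of the mean squared error of y = w * x on one rank's samples."""
    w = weights[0]
    n = len(samples)
    return [sum(2.0 * (w * x - y) * x for x, y in samples) / n]

def allreduce_sum(per_rank_values):
    """Stand-in for a summing allreduce: every rank receives the elementwise total."""
    total = [sum(column) for column in zip(*per_rank_values)]
    return [list(total) for _ in per_rank_values]

def training_step(weights_per_rank, shards, lr):
    """Local gradients, summed across ranks, then an identical update on every rank."""
    grads = [local_gradient(w, s) for w, s in zip(weights_per_rank, shards)]
    summed = allreduce_sum(grads)
    n_ranks = len(shards)
    return [
        [wi - lr * gi / n_ranks for wi, gi in zip(w, g)]
        for w, g in zip(weights_per_rank, summed)
    ]

# Four simulated ranks, each holding its own samples of y = 3x.
shards = [[(float(x), 3.0 * x) for x in range(r + 1, r + 3)] for r in range(4)]
weights = [[0.0] for _ in shards]
for _ in range(100):
    weights = training_step(weights, shards, lr=0.01)
```

Because every rank starts from the same weights and applies the same summed gradient, the replicas stay identical without ever exchanging the weights themselves; only gradients cross rank boundaries.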
24 "We develop algorithms, we don't have time to deal with C/C++ or MPI" (a well-known computer science colleague working in machine learning)
25 echoed by many scientists working with data. Nishant Shukla (2017)
26 Architectural Developments: Traditional Architecture. [Diagram labels: Research Community; CSCS User; Data Flow; CSCS; External Login Access (ELA); Piz Daint Login & Mgmt; /store; Piz Daint Compute]
27 Architectural Developments: Improved Architecture Based on External Portal. [Diagram labels: Research Community; CSCS User; Data Flow; CSCS; Domain Specific Portal; Repository access; Workflow Manager; Does Not Scale; External Login Access (ELA); Piz Daint Login & Mgmt; /store; Piz Daint]
28 Architectural Developments: Service Oriented Architecture (SOA). [Diagram labels: Research Community; Domain Specific Portal; CSCS User; Repository access; Workflow Manager; CSCS Infrastructure Services; Authentication & authorization; User Management; Data Management; Workflow Automation; Capacity Management; IT Infrastructure; DWH; Networking & security; OpenStack Services; Archival Storage; Active Storage; HPC Services]
29 and the service should be up most of the time (like 99+ %)
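To make the "99+ %" target concrete, the sketch below converts an availability percentage into the downtime it permits per year. The function name and the 365-day year are assumptions for illustration; the slide gives no further detail on how availability is measured at CSCS.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year

def allowed_downtime_hours(availability_percent):
    """Hours per year a service may be down at a given availability."""
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)

for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% availability allows {hours:.2f} h of downtime per year")
```

At 99 % the budget is about 87.6 hours, or roughly three and a half days per year, which is why each additional "nine" is a much stricter operational commitment.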
30 Supporting Federation using SOA. [Diagram labels: Research Community; Domain Specific Portal; CSCS User; Repository access; Workflow Manager; Software services; Platform services; Infrastructure provider; Infrastructure services]
31 Fenix Sites
32 Thank you to engineers at Cray, CSCS and NVIDIA for incredibly efficient development/upgrade of Piz Daint! Tim Palmer (U. of Oxford), Peter Bauer (ECMWF), Christoph Schär (ETH Zurich), Bjorn Stevens (MPI-M), Oliver Fuhrer (MeteoSwiss), Sadaf Alam (CSCS), Dirk Pleiter (FZ Jülich), Colin McMurtrie (CSCS), Torsten Hoefler (ETH Zurich)
More informationStan Posey, NVIDIA, Santa Clara, CA, USA
Stan Posey, sposey@nvidia.com NVIDIA, Santa Clara, CA, USA NVIDIA Strategy for CWO Modeling (Since 2010) Initial focus: CUDA applied to climate models and NWP research Opportunities to refactor code with
More informationGPU Developments for the NEMO Model. Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA
GPU Developments for the NEMO Model Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA NVIDIA HPC AND ESM UPDATE TOPICS OF DISCUSSION GPU PROGRESS ON NEMO MODEL 2 NVIDIA GPU
More informationUnified Model Performance on the NEC SX-6
Unified Model Performance on the NEC SX-6 Paul Selwood Crown copyright 2004 Page 1 Introduction The Met Office National Weather Service Global and Local Area Climate Prediction (Hadley Centre) Operational
More informationPreparing GPU-Accelerated Applications for the Summit Supercomputer
Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership
More informationSome Reflections on Advanced Geocomputations and the Data Deluge
Some Reflections on Advanced Geocomputations and the Data Deluge J. A. Rod Blais Dept. of Geomatics Engineering Pacific Institute for the Mathematical Sciences University of Calgary, Calgary, AB www.ucalgary.ca/~blais
More informationHybrid KAUST Many Cores and OpenACC. Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS
+ Hybrid Computing @ KAUST Many Cores and OpenACC Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS + Agenda Hybrid Computing n Hybrid Computing n From Multi-Physics
More informationMaking Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010
Making Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010 Windows HPC Server 2008 R2 Windows HPC Server 2008 R2 makes supercomputing
More informationEARLY EVALUATION OF THE CRAY XC40 SYSTEM THETA
EARLY EVALUATION OF THE CRAY XC40 SYSTEM THETA SUDHEER CHUNDURI, SCOTT PARKER, KEVIN HARMS, VITALI MOROZOV, CHRIS KNIGHT, KALYAN KUMARAN Performance Engineering Group Argonne Leadership Computing Facility
More informationObject recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK
Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK 17 May 2016, Melbourne 24 May 2016, Sydney Werner Scholz, CTO and Head of R&D, XENON Systems Mike Wang, Solutions Architect,
More informationFederal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss. PP POMPA status.
Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss PP POMPA status Xavier Lapillonne Performance On Massively Parallel Architectures Last year of the project
More informationNIA CFD Futures Conference Hampton, VA; August 2012
Petascale Computing and Similarity Scaling in Turbulence P. K. Yeung Schools of AE, CSE, ME Georgia Tech pk.yeung@ae.gatech.edu NIA CFD Futures Conference Hampton, VA; August 2012 10 2 10 1 10 4 10 5 Supported
More informationOur Workshop Environment
Our Workshop Environment John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center Copyright 2018 Our Environment Today Your laptops or workstations: only used for portal access Bridges
More informationHPC-CINECA infrastructure: The New Marconi System. HPC methods for Computational Fluid Dynamics and Astrophysics Giorgio Amati,
HPC-CINECA infrastructure: The New Marconi System HPC methods for Computational Fluid Dynamics and Astrophysics Giorgio Amati, g.amati@cineca.it Agenda 1. New Marconi system Roadmap Some performance info
More informationDeutscher Wetterdienst
Accelerating Work at DWD Ulrich Schättler Deutscher Wetterdienst Roadmap Porting operational models: revisited Preparations for enabling practical work at DWD My first steps with the COSMO on a GPU First
More informationPerformance and Energy Usage of Workloads on KNL and Haswell Architectures
Performance and Energy Usage of Workloads on KNL and Haswell Architectures Tyler Allen 1 Christopher Daley 2 Doug Doerfler 2 Brian Austin 2 Nicholas Wright 2 1 Clemson University 2 National Energy Research
More information