NVIDIA Update and Directions on GPU Acceleration for Earth System Models
- Harvey Baker
Transcription
1 NVIDIA Update and Directions on GPU Acceleration for Earth System Models
Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA
Carl Ponder, PhD, Applications Software Engineer, NVIDIA, Austin, TX, USA
2 TOPICS OF DISCUSSION
o NVIDIA GPU UPDATE
o ESM GPU PROGRESS
o WRF DEVELOPMENTS
3 NVIDIA GPU: Introduction and Hardware Features
GPU Introduction
o Co-processor to the CPU
o Threaded Parallel (SIMT)
o Unified Memory
o CPUs: x86, Power, ARM
o ORNL Titan: #3 on Top500.org, 18,688 GPUs
[Diagram: CPU linked to Tesla P100 by PCIe or NVLink; labels 1x, 3x, 10x]
HPC Motivation:
o Performance
o Efficiency
o Cost Savings
4 NVIDIA GPU: Introduction and Hardware Features
GPU Introduction
o Co-processor to the CPU
o Threaded Parallel (SIMT)
o Unified Memory
o CPUs: x86, Power, ARM
[Diagram: CPU linked to Tesla P100 by PCIe or NVLink; labels 1x, 3x, 10x]
HPC Motivation:
o Performance
o Efficiency
o Cost Savings
Next GPU (Q4 2016): Tesla P100. Current GPUs Since 2014: Tesla K80, Tesla K40.
GPU Feature             Tesla P100               Tesla K80                 Tesla K40
Stream Processors       [values not preserved in the transcription]
Core Clock              1328MHz                  562MHz                    745MHz
Boost Clock(s)          1480MHz                  875MHz                    810MHz, 875MHz
Memory Clock            1.4Gbps HBM2             5Gbps GDDR5               6Gbps GDDR5
Memory Bus Width        4096-bit                 2 x 384-bit               384-bit
Memory Bandwidth        720GB/sec                2 x 240GB/sec             288GB/sec
VRAM                    16GB                     2 x 12GB                  12GB
Half Precision          21.2 TFLOPS              8.74 TFLOPS               4.29 TFLOPS
Single Precision        10.6 TFLOPS              8.74 TFLOPS               4.29 TFLOPS
Double Precision        5.3 TFLOPS (1/2 rate)    2.91 TFLOPS (1/3 rate)    1.43 TFLOPS (1/3 rate)
GPU                     GP100 (610mm2)           GK210                     GK110B
Transistor Count        15.3B                    2 x 7.1B(?)               7.1B
Power Rating            300W                     300W                      235W
Cooling                 N/A                      Passive                   Active/Passive
Manufacturing Process   TSMC 16nm FinFET         TSMC 28nm                 TSMC 28nm
Architecture            Pascal                   Kepler                    Kepler
NOTE: P100 nodes available for community remote access on NVIDIA PSG cluster
Slide annotations: 2.5x, 3.7x
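The ratios implied by the table can be checked with a few lines of arithmetic. The sketch below uses only the peak figures listed on the slide. Reading the slide's "2.5x" and "3.7x" annotations as P100-vs-K40 ratios for memory bandwidth and double precision is an assumption, as is treating the K80 entry "2 x 240GB/sec" as 240 GB/s per GPU. The names `ratio`, `P100`, and `K40` are illustrative only.

```python
# Peak figures as listed in the slide's comparison table.
P100 = {"bandwidth_gb_s": 720.0, "fp64_tflops": 5.3}
K40 = {"bandwidth_gb_s": 288.0, "fp64_tflops": 1.43}
# Assumption: the K80 row "2 x 240GB/sec" means 240 GB/s for each of its two GPUs.
K80_PER_GPU_BANDWIDTH_GB_S = 240.0


def ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


bandwidth_vs_k40 = ratio(P100["bandwidth_gb_s"], K40["bandwidth_gb_s"])
fp64_vs_k40 = ratio(P100["fp64_tflops"], K40["fp64_tflops"])
bandwidth_vs_k80_gpu = ratio(P100["bandwidth_gb_s"], K80_PER_GPU_BANDWIDTH_GB_S)

print(f"P100 vs K40 memory bandwidth:      {bandwidth_vs_k40:.1f}x")
print(f"P100 vs K40 double precision:      {fp64_vs_k40:.1f}x")
print(f"P100 vs one K80 GPU mem bandwidth: {bandwidth_vs_k80_gpu:.1f}x")
```

Under these assumptions the first two ratios round to 2.5 and 3.7, and the per-GPU K80 comparison gives 3.0.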
5 COSMO Dycore Speedups on P100 GPU
MeteoSwiss GPU Branch of COSMO Model, Dycore Only
[Bar chart: speed-up vs dual-socket Haswell, axis 0x to 50x. Baseline: COSMO on 2x HSW CPU. Configurations: 2x K80 (4 x GPU), 2x P100, 4x P100, 8x P100. Bar labels: 27x, 14x, 7x, 3x]
Socket-to-socket: P100 vs. HSW = 3.5x
Results from NVIDIA Internal Cluster (US) (Preliminary Mar 2016)
o COSMO 5.3 MCH branch
o 128x128, 80xVertical
o Time steps 10
o CPU: x86 Xeon Haswell, [...] GHz
o GPU: Tesla P100
o Use of 8-GPU single node
o CUDA 8
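A rough way to read the chart is to convert the bar labels into per-GPU speedup and multi-GPU scaling efficiency. The sketch below assumes the labels pair with the P100 configurations in increasing order (2x P100 = 7x, 4x P100 = 14x, 8x P100 = 27x); the slide text does not state this pairing explicitly. The function names are hypothetical.

```python
# Assumed pairing of bar labels with configurations: P100 count -> speedup
# relative to the dual-socket Haswell baseline.
speedup_vs_dual_haswell = {2: 7.0, 4: 14.0, 8: 27.0}


def per_gpu_speedup(n_gpus: int, speedup: float) -> float:
    """Speedup over the dual-socket baseline divided by the number of GPUs."""
    return speedup / n_gpus


def scaling_efficiency(n_gpus: int, speedup: float,
                       base_gpus: int = 2, base_speedup: float = 7.0) -> float:
    """Measured speedup divided by ideal linear scaling from the 2-GPU point."""
    ideal = base_speedup * n_gpus / base_gpus
    return speedup / ideal


for n_gpus, speedup in speedup_vs_dual_haswell.items():
    print(f"{n_gpus}x P100: {per_gpu_speedup(n_gpus, speedup):.3f}x per GPU, "
          f"{scaling_efficiency(n_gpus, speedup):.1%} of linear scaling")
```

With this pairing, the 2-GPU and 4-GPU points both give 3.5 per GPU, which matches the slide's 3.5x figure numerically; whether the slide derived its number this way is not stated.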
6 Select NVIDIA ESM Highlights Since MultiCore 5
Growth in GPU funded-development; Large GPU system deployments
o GPUs deployed for operational NWP by MeteoSwiss with COSMO model
o OpenACC (PGI) developments for ACME Atmosphere in production release
o New NCAR collaboration launched with OpenACC Hackathon Workshop
o KISTI 2-week GPU and OpenACC Workshop focus on MPAS and WRF
o NVIDIA selected for ECMWF ESCAPE program; ESCAPE GPU Workshop
o US DOE ORNL-led GPU Hackathons included several ES model teams: ACME, COAMPS, ECHAM6, FVCOM, NOAA GFDL models
7 MeteoSwiss and Operational COSMO NWP on GPUs
Before GPUs: MeteoSwiss COSMO NWP Configurations Since 2008
o IFS from ECMWF: 2 per day, 10 day forecast
o COSMO 7 (6.6 km): 3 per day, 3 day forecast
o COSMO 2 (2.2 km): 8 per day, 24 hr forecast
With GPUs: MeteoSwiss COSMO NWP Configurations During 2016
o IFS from ECMWF: 2 per day, 10 day forecast
o COSMO E (2.2 km): 2 per day, 5 day forecast
o COSMO 1 (1.1 km): 8 per day, 24 hr forecast
New configurations of higher resolution and ensemble predictions possible owing to the performance-per-energy gains from GPUs
X. Lapillonne, MeteoSwiss; EGU Assembly, Apr
8 MeteoSwiss Weather Prediction Based on GPUs
World's First GPU-Accelerated NWP
o Piz Kesch (Cray CS Storm)
o Installed at CSCS July
o x Racks with 48 Total CPUs
o 192 Tesla K80 Total GPUs
o High GPU Density Nodes: 2 x CPU + 8 x GPU
o > 90% of FLOPS from GPUs
o Operational NWP Mar 16
Image by NVIDIA/MeteoSwiss
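The node counts on this slide are mutually consistent, which a short calculation makes explicit. This sketch uses only the figures quoted above (48 CPUs, 2 CPUs and 8 GPUs per node, 192 GPUs) and assumes every node has the same configuration; the variable names are illustrative.

```python
# Figures quoted on the slide.
TOTAL_CPUS = 48
CPUS_PER_NODE = 2
GPUS_PER_NODE = 8
QUOTED_TOTAL_GPUS = 192

# Assumption: all nodes share the "2 x CPU + 8 x GPU" layout.
nodes = TOTAL_CPUS // CPUS_PER_NODE
total_gpus = nodes * GPUS_PER_NODE

print(f"Nodes implied by the CPU count: {nodes}")
print(f"GPUs implied by the node count: {total_gpus}")
print(f"Matches the quoted GPU total:   {total_gpus == QUOTED_TOTAL_GPUS}")
```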
9 MeteoSwiss Operational COSMO-E Benchmark
[Chart: Speedup Vs. Original]
o Cray XC40, Original Code, Node = 2 x HSW
o Cray CS Storm, Refactored Code, Node = 2 x HSW + 8 x K80
10 MeteoSwiss Operational COSMO-E Benchmark
[Chart: Speedup Vs. Original; Speedup Vs. Refactored]
o Cray XC40, Original Code, Node = 2 x HSW
o Cray XC40, Refactored Code, Node = 2 x HSW
o Cray CS Storm, Refactored Code, Node = 2 x HSW + 8 x K80
11 COSMO Dycore Speedups on P100 GPU
MeteoSwiss GPU Branch of COSMO Model, Dycore Only
[Bar chart: speed-up vs dual-socket Haswell, axis 0x to 50x. Baseline: COSMO on 2x HSW CPU. Configurations: 2x K80 (4 x GPU), 2x P100, 4x P100, 8x P100. Bar labels: 27x, 14x, 7x, 3x]
Socket-to-socket: P100 vs. HSW = 3.5x
Results from NVIDIA Internal Cluster (US) (Preliminary Mar 2016)
o COSMO 5.3 MCH branch
o 128x128, 80xVertical
o Time steps 10
o CPU: x86 Xeon Haswell, [...] GHz
o GPU: Tesla P100
o Use of 8-GPU single node
o CUDA 8
12 Update on DOE Pre-Exascale CORAL Systems
US DOE CORAL Systems
o ORNL Summit at 200 PF, Early 2018
o LLNL Sierra at 150 PF, Mid-2018
o Nodes of POWER 9 + Tesla Volta GPUs
o NVLink Interconnect for CPUs + GPUs
ORNL Summit System (based on original 150 PF plan)
o Approximately 3,400 total nodes
o Each node 40+ TF peak performance
o About 1/5 of total #2 Titan nodes (18K+)
o Same energy used as #2 Titan (27 PF)
CORAL Summit System: 5-10x Faster than Titan; 1/5th the Nodes, Same Energy Use as Titan (Based on original 150 PF)
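The Summit-versus-Titan claims follow from the numbers on the slide, as the sketch below shows. It treats "40+ TF" as a lower bound of 40 TF per node, and it takes Titan's node count as 18,688, assuming one GPU per node based on the "18,688 GPUs" figure on slide 3; the slide itself only says "18K+". The names are illustrative.

```python
# Figures from the slide.
SUMMIT_NODES = 3400
SUMMIT_NODE_PEAK_TF = 40.0       # slide says "40+", so this is a lower bound
SUMMIT_PLAN_PF = {"original plan": 150.0, "updated": 200.0}
TITAN_PEAK_PF = 27.0
# Assumption: one GPU per Titan node, using the 18,688 GPUs quoted on slide 3.
TITAN_NODES = 18688

summit_peak_pf = SUMMIT_NODES * SUMMIT_NODE_PEAK_TF / 1000.0
node_ratio = SUMMIT_NODES / TITAN_NODES
speedups = {label: pf / TITAN_PEAK_PF for label, pf in SUMMIT_PLAN_PF.items()}

print(f"Lower-bound Summit peak from node figures: {summit_peak_pf:.0f} PF")
print(f"Summit nodes as a fraction of Titan nodes: {node_ratio:.2f}")
for label, factor in speedups.items():
    print(f"Summit ({label}) vs Titan peak: {factor:.1f}x")
```

The lower-bound product of 136 PF is in line with the original 150 PF plan once the "+" in "40+ TF" is taken into account, the node ratio comes out close to 1/5, and both peak ratios fall inside the quoted 5-10x range.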
13 Programming Strategies for GPU Acceleration
Applications (strategies listed in order of increasing development effort)
o GPU Libraries: Provides Fast Drop-In Acceleration
o OpenACC Directives: GPU-acceleration in Standard Language (Fortran, C, C++)
o Programming in CUDA: Maximum Flexibility with GPU Architecture and Software Features
NOTE: Many application developments include a combination of these strategies
14 Index: Scalable Rendering for Volume Visualization
o Leverages GPU clusters for large-scale (volume) data visualization and interactive visual computing
o 1.8 billion cells, [...] time steps
o Commercial software solution available and deployed for in-situ visualization of large-scale data
o Plugin for ParaView under development and available soon
Dataset courtesy of Prof. Leigh Orf, UW-Madison and Rob Sisneros, NCSA
15 GPUs at Convergence of Data and HPC in ESM
Fusion of Observations from Machine Learning with the Model
o Yandex developments of ML + Model for Hyperlocal NWP with WRF: "Yandex Introduces Hyperlocal Weather Forecasting Service Based on Machine Learning Technology"
o DL dominant topic at NCAR workshop Climate Informatics 2015
o IBM acquisition of The Weather Company applied data analytics
Data Assimilation Next Phase Following Model Development
o 4DVAR GPU development success with MeteoSwiss and others
o RIKEN study of 10,240 member ensemble with NICAM (Miyoshi, et al.): Largest ensemble simulation of global weather using real-world data
16 WRF GPU UPDATE
CUDA: TQI/SSEC Commercial WRF
o TempoQuest Plans for CUDA WRF-based software product
o NVIDIA providing standard engineering guidance
OpenACC: NVIDIA Open WRF Project
o Migrating routines to 3.8
o Initial projections of P100 GPU very good
o Working towards unified memory capability
o PGI compiler continues to improve/mature
Potential for Full model WRF on GPUs
o Several months away, hybrid in near term
o P100 GPU will improve hybrid approach
o UM + NVLink will improve data transfer times
o P100 memory bandwidth 3x vs. Kepler-series
17 Questions? Stan Posey, Carl Ponder,
Building the Most Efficient Machine Learning System Mellanox The Artificial Intelligence Interconnect Company June 2017 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide
More informationInteractive Supercomputing for State-of-the-art Biomolecular Simulation
Interactive Supercomputing for State-of-the-art Biomolecular Simulation John E. Stone Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University of
More informationProgress on GPU Parallelization of the NIM Prototype Numerical Weather Prediction Dynamical Core
Progress on GPU Parallelization of the NIM Prototype Numerical Weather Prediction Dynamical Core Tom Henderson NOAA/OAR/ESRL/GSD/ACE Thomas.B.Henderson@noaa.gov Mark Govett, Jacques Middlecoff Paul Madden,
More informationIntroduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes
Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Motivation: Multi-Cores where and why Introduction: Moore s law Intel
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Moore's Law abandoned serial programming around 2004 Courtesy Liberty Computer Architecture Research Group
More information