RECENT UPDATES ON ACCELERATING COMPUTING PLATFORM
PRADEEP GUPTA, SENIOR SOLUTIONS ARCHITECT, NVIDIA
2 THE WORLD LEADER IN VISUAL COMPUTING
Gaming | Auto | Enterprise | HPC & Cloud | OEM & IP
3 GPU Accelerator Redefined Parallel Computing in HPC
[Chart: # of GPU developers]
- Summit & Sierra: U.S. Announces Two Pre-Exascale Supercomputers Powered by GPU & NVLink
- Breakthrough in HIV Research: World's Largest Simulation of Virus Uncovers New Discovery
- Deep Learning: Univ. of Toronto Team Uses GPUs to Win ImageNet Competition, Google Acquires Team
- Oak Ridge TITAN: World's Fastest Supercomputer
- Top500: 3 of Top 5 Supercomputers with Tesla GPUs
- Tsubame: World's First GPU Supercomputer
- NVIDIA Launches CUDA
4 Vision: Mainstream Parallel Programming
- Enable more programmers to write portable parallel software in their language of choice
- Embrace and evolve standards in key languages
- CUDA continues to evolve as the target low-level platform for GPU acceleration
5 Three Ways to Accelerate Your Application
- Libraries: drop-in acceleration
- Directives: annotate code with compiler hints
- Languages: modern language features (unified memory, for_each, lambda)
6 OpenACC: Simple, Powerful, Portable
Fueling the Next Wave of Scientific Discoveries in HPC

    int main() {
        <serial code>
        #pragma acc kernels   // automatically runs on GPU
        {
            <parallel code>
        }
    }

- RIKEN Japan, NICAM climate modeling: 7-8x speed-up, 5% of code modified
- University of Illinois, PowerGrid MRI reconstruction: 70x speed-up, 2 days of effort
[Chart: developers using OpenACC]
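A minimal runnable variant of the pattern above, using SAXPY as the parallel code. It is a sketch, not code from the deck: it uses `parallel loop`, which asserts that the loop is safe to parallelize, instead of `kernels`, which leaves that analysis to the compiler. The `copyin`/`copy` clauses state which arrays move to the GPU and which come back. With an OpenACC compiler (for example PGI with `-acc`) the loop is offloaded; a compiler without OpenACC enabled ignores the unknown pragma and runs the loop serially, so the same source works in both cases.

```cpp
#include <cstddef>

// SAXPY (y = a*x + y) annotated with a single OpenACC directive.
// x is only read on the device (copyin); y is read and written (copy).
void saxpy(std::size_t n, float a, const float* x, float* y) {
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Example build with the PGI compiler: `pgc++ -acc saxpy.cpp`. Only the directive line differs from the serial loop, which is the point the slide makes about modifying a small fraction of the code.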
7 Introducing the NVIDIA OpenACC Toolkit
Free Toolkit Offers Simple & Powerful Path to Accelerated Computing
- PGI Compiler: free OpenACC compiler for academia
- NVProf Profiler: easily find where to add compiler directives
- Code Samples: learn from examples of real-world algorithms
- Documentation: quick start guide, best practices, forums
Download at
9 Unified Memory: Simpler & Faster with NVLink
- Traditional developer view: separate system memory and GPU memory
- Developer view with Unified Memory: one unified memory
- Developer view with Pascal & NVLink: unified memory over NVLink at 80 GB/s
  - Share data structures at CPU memory speeds, not PCIe speeds
  - Oversubscribe GPU memory
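10 FAMILIAR CODING STYLE, SINGLE CODE PATH
Build parallel algorithms with C++

CPU: sequential std::for_each()

    #include <algorithm>
    #include <numeric>
    #include <vector>

    void saxpy(int N, float a, float *x, float *y) {
        using namespace std;
        vector<int> r(N);              // index range 0 .. N-1
        iota(begin(r), end(r), 0);
        for_each(begin(r), end(r), [=](int i) {
            y[i] = a * x[i] + y[i];
        });
    }

CPU/GPU: Thrust parallel for_each()*

    #include <thrust/execution_policy.h>
    #include <thrust/for_each.h>
    #include <thrust/iterator/counting_iterator.h>

    void saxpy(int N, float a, float *x, float *y) {
        using namespace thrust;
        auto r = counting_iterator<int>(0);   // yields 0, 1, 2, ... on demand
        // x and y must be GPU-accessible (e.g., allocated with unified memory)
        for_each(device, r, r + N, [=] __device__ (int i) {
            y[i] = a * x[i] + y[i];
        });
    }

* Available today as an experimental feature in CUDA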
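The Thrust version gets its index range from `counting_iterator`, which produces successive integers without storing them. The same idea works with standard algorithms on the CPU. A minimal sketch follows; `CountingIterator` is a hypothetical class written for illustration, not a standard library or Thrust type, and it implements only what `std::for_each` needs.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Hypothetical minimal counting iterator: dereferencing yields the current
// index, so the range [0, N) needs no backing array.
class CountingIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = int;

    explicit CountingIterator(int value) : value_(value) {}
    int operator*() const { return value_; }
    CountingIterator& operator++() { ++value_; return *this; }
    CountingIterator operator++(int) { CountingIterator old = *this; ++value_; return old; }
    bool operator==(const CountingIterator& other) const { return value_ == other.value_; }
    bool operator!=(const CountingIterator& other) const { return value_ != other.value_; }

private:
    int value_;
};

// Sequential SAXPY over indices 0 .. N-1: y[i] = a * x[i] + y[i].
void saxpy(int N, float a, const float* x, float* y) {
    std::for_each(CountingIterator(0), CountingIterator(N),
                  [=](int i) { y[i] = a * x[i] + y[i]; });
}
```

Because `std::for_each` only needs an input iterator, the index range costs no memory. Thrust's `counting_iterator` is additionally random-access, which is what lets a parallel backend split the range across GPU threads.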
11 Portable, High-Level Parallel Code TODAY
- The Thrust library allows the same C++ code to target both NVIDIA GPUs and x86, ARM and POWER CPUs
- Thrust was the inspiration for a proposal to the ISO C++ Committee
- The Committee voted unanimously to accept it as an official technical specification working draft
Technical Specification Working Draft: N3960
Prototype:
12 Tesla Platform
Tesla GPU | NVLink | IBM Power | ARM
13 Tesla Accelerated Computing Platform
Data center infrastructure:
- System solutions (GPU accelerators): GPU Boost
- Communication (interconnect): GPUDirect, NVLink
- Infrastructure management (system management): NVML / nvidia-smi
Development:
- Programming languages (compiler solutions): LLVM
- Development tools (profile and debug): CUDA Debugging API
- Software solutions (libraries): cuBLAS
14 TESLA K80: WORLD'S FASTEST ACCELERATOR FOR DATA ANALYTICS AND SCIENTIFIC COMPUTING
- Dual-GPU accelerator for max throughput: 2x faster; 2.9 TF, 4992 cores, 480 GB/s
- Double the memory, designed for big data apps: 24 GB (K40: 12 GB)
- Maximum performance: dynamically maximize performance for every application
[Chart: speedup vs. CPU (0x to 25x) for CPU, Tesla K40 and Tesla K80 on Deep Learning (Caffe), Oil & Gas, HPC Viz and Data Analytics]
Caffe benchmark: AlexNet training throughput based on 20 iterations. CPU: 2.70 GHz, 64 GB system memory, CentOS 6.2. Peak performance with GPU Boost on.
15 330+ GPU-Accelerated Applications
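16 NVLink High-Speed GPU Interconnect
[Diagram: in 2014, Kepler GPUs attach to x86, ARM64 and POWER CPUs over PCIe; with Pascal, GPUs link to each other over NVLink and to POWER CPUs over NVLink, while x86 and ARM64 CPUs connect over PCIe]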
17 Major Data Center OEMs Support NVLink
18 US to Build Two Flagship Supercomputers Powered by the Tesla Platform
- 10x in scientific application performance
- IBM POWER9 CPU + NVIDIA Volta GPU
- NVLink high-speed interconnect
- Peak: 40 TFLOPS per node x >3,400 nodes, i.e., more than 136 PFLOPS
- 2017
Major Step Forward on the Path to Exascale
19 Accelerated Computing: 5x Higher Energy Efficiency
- IBM POWER CPU: most powerful serial processor
- NVIDIA NVLink: fastest CPU-GPU interconnect
- NVIDIA Volta GPU: most powerful parallel processor
20 IBM HPC Application Update
- 184 total applications planned for port
- 108 total apps ported to POWER
- 34 POWER + GPU ports complete
- 24 POWER + GPU ports in process
- 13 libraries and benchmarks complete
- 330+ GPU-accelerated applications
21 NAMD on POWER + GPU
NAMD relative performance (performance ratio, 1-Haswell = 100%):

    Configuration               STMV    apoa01
    1 Haswell                   100%    100%
    2 Power8 42L                132%    141%
    3 Power8 42A                151%    169%
    4 Haswell & 2x K40          371%    414%
    5 Power8 42L & 2x K40       395%    441%
22 PGI FOR OPENPOWER + TESLA
- Feature parity with PGI compilers on Linux/x86 + Tesla
- CUDA Fortran, OpenACC, OpenMP, CUDA C/C++ host compiler
- Integrated with IBM's optimized LLVM / POWER code generator
- Limited access in 2015, beta 1H 2016, production in 2016
- x86 code: recompile
23 Enterprise Services for Premier Tesla Support
Maximize uptime & efficiency for GPU deployments in the data center
- Rapid response & timely issue resolution
- Long-term support & maintenance
- Direct communication with technical experts
- On-site consultation, training and more for subscribers
[Chart: rapid response to critical issues, average days to public release, maintenance release and hot-fix release]
24 CUDA 7.5
- 16-bit floating-point storage: 2x larger datasets in GPU memory; great for deep learning
- cuSPARSE dense matrix * sparse vector: speeds up natural language processing
- Instruction-level profiling: pinpoint performance bottlenecks; easier to apply advanced optimizations
- *Experimental* GPU lambdas
Release schedule: 7/6 release candidate; ~Sept production release
NVIDIA Confidential. For use under NDA
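The 2x figure follows from storage size: a 16-bit value takes 2 bytes versus 4 for a 32-bit float, and stored values are typically widened to float before arithmetic. The sketch below decodes an IEEE 754 binary16 bit pattern to `float` in portable C++. `half_bits_to_float` is a hypothetical helper written for illustration, not part of the CUDA API; CUDA's `cuda_fp16.h` provides its own conversions such as `__half2float`.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// 16-bit storage halves the bytes per value relative to 32-bit float.
static_assert(sizeof(std::uint16_t) * 2 == sizeof(float),
              "expects 16-bit storage and 32-bit float");

// Decode an IEEE 754 binary16 bit pattern (1 sign, 5 exponent, 10 mantissa bits).
float half_bits_to_float(std::uint16_t h) {
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;

    float value;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24.
        value = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 31) {
        // All-ones exponent: infinity if the mantissa is zero, otherwise NaN.
        value = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                                : std::numeric_limits<float>::quiet_NaN();
    } else {
        // Normal: (1 + mantissa/1024) * 2^(exponent - 15)
        //       = (1024 + mantissa) * 2^(exponent - 25).
        value = std::ldexp(static_cast<float>(1024 + mantissa), exponent - 25);
    }
    return sign ? -value : value;
}
```

The largest finite binary16 value is 65504 (bit pattern 0x7BFF), so 16-bit storage suits data with a bounded range, such as neural-network weights and activations, better than general scientific data.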
25 AN AWESOME DEVELOPER PLATFORM
GeForce GTX TITAN, TITAN Black, TITAN Z and TITAN X
GeForce TITAN series GPUs now support:
- TCC mode
- Multi-Process Service (MPS)
- CUDA stream priorities
- All relevant nvidia-smi commands
* Most of these features will work across the entire GeForce product line
26 WINDOWS REMOTE DESKTOP
- CUDA will work with Remote Desktop starting with the r352 driver
- CUDA apps will be able to run as a service on Windows
- Will work across all GPU products supported on Windows
27 ADDITIONAL IMPROVEMENTS
See release notes and documentation for more details
- 64-bit API for cuFFT
- n-dimensional Euclidean norm floating-point math functions
- Bayer CFA to RGB conversion functions in NPP
- Faster double-precision square roots (sqrt)
- CUDA Samples for the cuSOLVER library
- Nsight Eclipse Edition supported on POWER platform
- Nsight Eclipse Edition supports multiple CUDA Toolkit versions
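For the n-dimensional Euclidean norm, a naive sum of squares can overflow (or underflow) even when the final result is representable. A minimal portable sketch of one common remedy, scaling every element by the largest magnitude before squaring. `euclidean_norm` is a hypothetical helper, and this is not necessarily how CUDA's math library implements its norm functions.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: sqrt(p[0]^2 + ... + p[n-1]^2), computed as
// scale * sqrt(sum((p[i]/scale)^2)) with scale = max |p[i]|, so no
// intermediate square exceeds 1.
double euclidean_norm(const std::vector<double>& p) {
    double scale = 0.0;
    for (double v : p) scale = std::fmax(scale, std::fabs(v));
    // Empty input, all zeros, or an infinite element: the answer is `scale`.
    if (scale == 0.0 || std::isinf(scale)) return scale;

    double sum = 0.0;
    for (double v : p) {
        const double r = v / scale;  // |r| <= 1
        sum += r * r;
    }
    return scale * std::sqrt(sum);
}
```

With `{1e200, 1e200}` the naive form squares to infinity, while the scaled form returns about 1.414e200. The price is an extra pass and a division per element, which is the kind of cost a tuned library routine tries to avoid.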
28 x86_64 Platform Support
- Linux: RHEL & CentOS 6, 7; Fedora 21 Workstation; SLES 11 SP3, 12; OpenSUSE 13.2; Ubuntu LTS
- Windows 7, 8.1, 10, Server 2008 R2, 2012 R2; Visual Studio 2010, 2012, 2013 [CE]
- Mac OSX 10.9, 10.10, (~Sept)
- Alternative Linux host compilers: Clang 3.5, 3.6; Intel icc; PGI pgc++
CUDA 7.5 drops support for:
- Ubuntu LTS on x86
- cuda-gdb native debugging on Mac
CUDA 7.5 announces deprecation of:
- Legacy profiler: use nvprof instead
- Microsoft Visual Studio 2010 support
These will be dropped in a future release.
29 POWER8 Platform Support
- Linux: RHEL 7.2, Ubuntu
- Alternative Linux POWER8 compilers: IBM xlc/xlC 13.1.x
30 THANK YOU
31 Backup
32 Industry Momentum
33 Widespread Use of GPUs in Climate & Weather

    Model                   | GPU approach             | Collaboration
    Climate:
    NICAM                   | OpenACC                  | RIKEN, TiTech
    CAM-SE (ACME, CESM)     | OpenACC, CUDA            | DOE (ORNL, SNL), PGI
    Weather:
    WRF                     | OpenACC (1), CUDA (2)    | (1) NCAR-MMM, (2) SSEC
    COSMO                   | OpenACC, CUDA            | CSCS, MeteoSwiss, PGI
    NIM                     | OpenACC, F2C-ACC         | NOAA-ESRL, PGI
    ICON                    | OpenACC                  | CSCS, MPI-M, PGI
    IFS                     | OpenACC                  | ECMWF, CSC-FI
    MPAS-A                  | OpenACC                  | NCAR, NOAA-ESRL
    JMA-GSM, 4DVAR, ASUCA   | OpenACC, CUDA, H-F       | JMA, Hitachi, TiTech
    Ocean:
    NEMO                    | OpenACC                  | STFC

Additional evaluations:
- USA: GEOS-5, HiRAM, HYCOM, MOM6, COAMPS, MPAS-O, CICE, ROMS, OLAM
- Europe: DYNAMICO, HARMONIE, UM/GungHo/GOcean, ECHAM6
- Asia-Pacific: GRAPES, KWRF, CFSv2 (IN)
34 MeteoSwiss GPU-Driven Weather Prediction
MeteoSwiss COSMO NWP configurations since 2008 (before GPUs):
- IFS from ECMWF: 2 per day, 10-day forecast
- COSMO-7 (6.6 km): 3 per day, 3-day forecast
- COSMO-2 (2.2 km): 8 per day, 24-hour forecast
MeteoSwiss COSMO NWP configurations during 2016 (with GPUs):
- IFS from ECMWF: 2 per day, 10-day forecast
- COSMO-E (2.2 km): 2 per day, 5-day forecast
- COSMO-1 (1.1 km): 8 per day, 24-hour forecast
New configurations of higher resolution and ensemble predictions are possible owing to the performance-per-energy gains from GPUs.
(X. Lapillonne, MeteoSwiss; EGU Assembly, Apr 2015)
35 "The hardware needed to emulate the human brain may be ready even sooner than he predicted, in around 2020, using technologies such as graphics processing units (GPUs), which are ideal for brain-software algorithms."
Interview with Ray Kurzweil, Director of Engineering at Google & renowned futurist, Washington Post, April 23
36 GPUs Turbocharging Data Science
- Jeff Dean, Google: "We love GPU cards. We just use a lot of them."
- Andrew Ng, Baidu: "In five years, we think 50% of queries will be speech or images."
37 Competitive Update
38 Phi Struggles with Real-World Apps (relative to Sandy Bridge)
- Multigrid solver: ~66% slower on Phi
- BerkeleyGW MS code: Phi less than 20% faster
- FLASH code: 65% slower on Phi
39 EXISTING WORKLOADS DON'T USE PHI
TACC Stampede utilization (6,400 nodes):
- Phi provides >75% of FLOPs
- Less than 3% of node hours executed in Phi queues
42 5X-10X SPEEDUP USING NVIDIA LIBRARIES
BLAS | LAPACK | Sparse | FFT | Math | Deep Learning | Graphs | Image & Signal Processing
43 NVLink Unleashes Multi-GPU Performance
- Over 2x application performance speedup when next-gen GPUs connect via NVLink versus PCIe
- NVLink: 5x faster than PCIe Gen3
[Chart: speedup vs. a PCIe-based server (1.25x to 2.25x) for ANSYS Fluent, Multi-GPU Sort, LQCD QUDA, AMBER and 3D FFT; diagram compares Tesla GPUs interconnected with NVLink against Tesla GPUs behind a PCIe switch]
3D FFT, ANSYS: 2-GPU configuration; all other apps compare 4-GPU configurations. AMBER Cellulose (256x128x128); FFT problem size 256^3.
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More informationOverview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
More informationOpenACC/CUDA/OpenMP... 1 Languages and Libraries... 3 Multi-GPU support... 4 How OpenACC Works... 4
OpenACC Course Class #1 Q&A Contents OpenACC/CUDA/OpenMP... 1 Languages and Libraries... 3 Multi-GPU support... 4 How OpenACC Works... 4 OpenACC/CUDA/OpenMP Q: Is OpenACC an NVIDIA standard or is it accepted
More informationGPUS FOR NGVLA. M Clark, April 2015
S FOR NGVLA M Clark, April 2015 GAMING DESIGN ENTERPRISE VIRTUALIZATION HPC & CLOUD SERVICE PROVIDERS AUTONOMOUS MACHINES PC DATA CENTER MOBILE The World Leader in Visual Computing 2 What is a? Tesla K40
More informationOpenPOWER Performance
OpenPOWER Performance Alex Mericas Chief Engineer, OpenPOWER Performance IBM Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Delivering the Linux ecosystem for Power SOLUTIONS OpenPOWER
More informationProductive Performance on the Cray XK System Using OpenACC Compilers and Tools
Productive Performance on the Cray XK System Using OpenACC Compilers and Tools Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc. 1 The New Generation of Supercomputers Hybrid
More informationDeutscher Wetterdienst
Porting Operational Models to Multi- and Many-Core Architectures Ulrich Schättler Deutscher Wetterdienst Oliver Fuhrer MeteoSchweiz Xavier Lapillonne MeteoSchweiz Contents Strong Scalability of the Operational
More informationENDURING DIFFERENTIATION. Timothy Lanfear
ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 Transistors (thousands) 1.1X per year 10 3 10 2 Single-threaded
More informationENDURING DIFFERENTIATION Timothy Lanfear
ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING GPU-ACCELERATED PERFORMANCE 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 10 3 10 2 Single-threaded perf
More informationAn Introduction to OpenACC
An Introduction to OpenACC Alistair Hart Cray Exascale Research Initiative Europe 3 Timetable Day 1: Wednesday 29th August 2012 13:00 Welcome and overview 13:15 Session 1: An Introduction to OpenACC 13:15
More informationDeep Learning mit PowerAI - Ein Überblick
Stephen Lutz Deep Learning mit PowerAI - Open Group Master Certified IT Specialist Technical Sales IBM Cognitive Infrastructure IBM Germany Ein Überblick Stephen.Lutz@de.ibm.com What s that? and what s
More informationWorld s most advanced data center accelerator for PCIe-based servers
NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying
More informationNVIDIA S VISION FOR EXASCALE. Cyril Zeller, Director, Developer Technology
NVIDIA S VISION FOR EXASCALE Cyril Zeller, Director, Developer Technology EXASCALE COMPUTING An industry target of 1 ExaFlops within 20 MW by 2020 1 ExaFlops: a necessity to advance science and technology
More informationMaximize automotive simulation productivity with ANSYS HPC and NVIDIA GPUs
Presented at the 2014 ANSYS Regional Conference- Detroit, June 5, 2014 Maximize automotive simulation productivity with ANSYS HPC and NVIDIA GPUs Bhushan Desam, Ph.D. NVIDIA Corporation 1 NVIDIA Enterprise
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Moore's Law abandoned serial programming around 2004 Courtesy Liberty Computer Architecture Research Group
More informationTESLA V100 PERFORMANCE GUIDE. Life Sciences Applications
TESLA V100 PERFORMANCE GUIDE Life Sciences Applications NOVEMBER 2017 TESLA V100 PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important
More informationTesla GPU Computing A Revolution in High Performance Computing
Tesla GPU Computing A Revolution in High Performance Computing Mark Harris, NVIDIA Agenda Tesla GPU Computing CUDA Fermi What is GPU Computing? Introduction to Tesla CUDA Architecture Programming & Memory
More informationThe Arm Technology Ecosystem: Current Products and Future Outlook
The Arm Technology Ecosystem: Current Products and Future Outlook Dan Ernst, PhD Advanced Technology Cray, Inc. Why is an Ecosystem Important? An Ecosystem is a collection of common material Developed
More informationAccelerating Insights In the Technical Computing Transformation
Accelerating Insights In the Technical Computing Transformation Dr. Rajeeb Hazra Vice President, Data Center Group General Manager, Technical Computing Group June 2014 TOP500 Highlights Intel Xeon Phi
More informationNERSC Site Update. National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory. Richard Gerber
NERSC Site Update National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory Richard Gerber NERSC Senior Science Advisor High Performance Computing Department Head Cori
More information