ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015
COMMODITY DISRUPTS CUSTOM SOURCE: Top500
ACCELERATED COMPUTING: THE PATH FORWARD It s time to start planning for the end of Moore s Law, and it s worth pondering how it will end, not just when. Robert Colwell Director, Microsystems Technology Office, DARPA
NVIDIA ACCELERATES COMPUTING TFLOPS Fast GPU Engineered for High Throughput NVIDIA GPU x86 CPU Productive Programming Model & Tools Expert Co-Design Accessibility 3.0 2.5 K80 APPLICATION 2.0 1.5 1.0 0.5 0.0 M1060 M2090 K20 K40 Fast GPU + Strong CPU 2008 2009 2010 2011 2012 2013 2014 MIDDLEWARE SYS SW LARGE SYSTEMS PROCESSOR
125 ACCELERATORS SURGE IN WORLD S TOP SUPERCOMPUTERS 100 75 50 25 Top500: # of Accelerated Supercomputers 100+ accelerated systems now on Top500 list 1/3 of total FLOPS powered by accelerators NVIDIA Tesla GPUs sweep 23 of 24 new accelerated supercomputers Tesla supercomputers growing at 50% CAGR over past five years 0 2013 2014 2015
MACHINE LEARNING HPC S 1 ST CONSUMER KILLER-APP NADELLA: SMART AGENTS LIKE CORTANA WILL REPLACE THE WEB BROWSER -BI GOOGLE OPEN-SOURCES TENSORFLOW FACEBOOK MESSENGER ADDS FACIAL RECOGNITION YOUTUBE: CLICK-TO-BUY ADS MICROSOFT OPEN-SOURCES DMTK GOOGLE PHOTOS: ML-POWERED FEATURES
TESLA FOR MACHINE LEARNING 10M Users 40 years of video/day HYPERSCALE SUITE GPU REST Engine GPU Accelerated FFmpeg Image Compute Engine 270M Items sold/day TESLA M40 43% on mobile devices POWERFUL: Fastest Deep Learning Performance TESLA M4 LOW POWER: Highest Hyperscale Throughput
MACHINE LEARNING REVOLUTIONIZING TRANSPORTATION Toyota Invests $1 Billion in Artificial Intelligence in U.S. U.S. News & World Report
END-TO-END MACHINE LEARNING PLATFORM FOR AUTONOMOUS CARS DRIVE PX 100% 90% NVDRIVENET on KITTI Object Detection BAIDU NVIDIA 80% 70% 72% 75% 79% 83% 66% DIGITS DevBox 60% 50% 62% 55% 45% 40% 39% 30% 7/8 7/22 8/5 8/19 9/2 9/16 9/30 10/14 10/28 11/10
MACHINE LEARNING REVOLUTIONIZING AUTONOMOUS MACHINES
Images / Sec / Watt 50 10x Energy Efficiency Alexnet 40 JETSON TX1 Supercomputer on a Module 30 20 10 0 Intel Core i7-6700k (Skylake) Jetson TX1 GPU CPU Memory Power 1 TFLOPS 256-core Maxwell 64-bit ARM A57s 4GB LPDDR4 26 GB/s Under 10W Under 10W for typical use cases
SUPERCOMPUTING EVERYWHERE PC GAMING Titan X for PC Tesla in the Cloud DRIVE PX for Auto Jetson TX1 for Robots
ACCELERATED COMPUTING: THE PATH FORWARD Ian Buck, VP of Accelerated Computing, NVIDIA SC15 Nov. 16, 2015
TESLA ACCELERATES DISCOVERY AND INSIGHT SIMULATION MACHINE LEARNING VISUALIZATION 270M Items sold/day 43% on mobile devices TESLA ACCELERATED COMPUTING
Approximately a third of HPC systems operating today are equipped with accelerators and nearly half of all newly deployed systems have them. ACCELERATED COMPUTING: A TIPPING POINT FOR HPC, Intersect360 Nov 2015
70% OF TOP HPC APPS NOW ACCELERATED INTERSECT360 SURVEY OF TOP APPS VASP NOW ACCELERATED Typically Consumes 10-25% of HPC System Top 10 HPC Apps 90% Accelerated Top 50 HPC Apps 70% Accelerated 1 Dual K80 Server 1.3x 4 Dual CPU Servers 1.0x Intersect360, Nov 2015 HPC Application Support for GPU Computing Dual-socket Xeon E5-2690 v2 3GHz, Dual Tesla K80, FDR InfiniBand Dataset: NiAl-MD Blocked Davidson
370 GPU-Accelerated Applications www.nvidia.com/appscatalog
TESLA FOR SIMLUATION LIBRARIES DIRECTIVES LANGUAGES ACCELERATED COMPUTING TOOLKIT TESLA ACCELERATED COMPUTING
TESLA K80 World s Fastest Accelerator for HPC Dual CPU Server Tesla K80 Server 5x Faster AMBER Performance Simulation Time from 1 Month to 1 Week 0 5 10 15 20 25 30 # of Days CUDA Cores 2496 Peak DP Peak DP w/ Boost GDDR5 Memory Bandwidth Power 1.9 TFLOPS 2.9 TFLOPS 24 GB 480 GB/s 300 W AMBER Benchmark: PME-JAC-NVE Simulation for 1 microsecond CPU: E5-2698v3 @ 2.3GHz. 64GB System Memory, CentOS 6.2
APPLICATION PERFORMANCE BOOSTS DATA CENTER THROUGHPUT TESLA K80: 5X FASTER 1/3 OF NODES ACCELERATED, 2X SYSTEM THROUGHPUT 15x Speed-up vs Dual CPU K80 CPU CPU-only System Accelerated System 10x 5x 0x QMCPACK LAMMPS CHROMA NAMD AMBER 100 Jobs Per Day 220 Jobs Per Day CPU: Dual E5-2698 v3@2.3ghz 3.6GHz, 64GB System Memory, CentOS 6.2 GPU: Single Tesla K80, Boost enabled
Speedup vs Single CPU Core OPENACC DELIVERS TRUE PERF PORTABILITY Paving the Path Forward: Single Code for All HPC Processors 35x Application Performance Benchmark CPU: MPI + OpenMP CPU: MPI + OpenACC CPU + GPU: MPI + OpenACC 30x 25x 20x 15x 30.3x 10x 5x 0x 11.9x 7.6x 7.1x 7.1x 4.1x 4.3x 5.2x 5.3x 359.MINIGHOST (MANTEVO) NEMO (CLIMATE & OCEAN) CLOVERLEAF (PHYSICS) 359.miniGhost: CPU: Intel Xeon E5-2698 v3, 2 sockets, 32-cores total, GPU: Tesla K80- single GPU NEMO: Each socket CPU: Intel Xeon E5-2698 v3, 16 cores; GPU: NVIDIA K80 both GPUs CLOVERLEAF: CPU: Dual socket Intel Xeon CPU E5-2690 v2, 20 cores total, GPU: Tesla K80 both GPUs
TESLA HYPERSCALE FOR MACHINE LEARNING 10M Users 40 years of video/day HYPERSCALE SUITE GPU REST Engine GPU Accelerated FFmpeg Image Compute Engine 270M Items sold/day TESLA M40 43% on mobile devices POWERFUL: Fastest Deep Learning Performance TESLA M4 LOW POWER: Highest Hyperscale Throughput
8x Faster Caffe Performance TESLA M40 World s Fastest Accelerator for Deep Learning CPU Tesla M40 Reduce Training Time from 8 Days to 1 Day 0 1 2 3 4 5 6 7 8 9 # of Days CUDA Cores 3072 Peak SP 7 TFLOPS GDDR5 Memory Bandwidth Power 12 GB 288 GB/s 250W Caffe Benchmark: AlexNet training throughput based on 20 iterations, CPU: E5-2697v2 @ 2.70GHz. 64GB System Memory, CentOS 6.2
Video Processing Stabilization and Enhancements Image Processing Resize, Filter, Search, Auto-Enhance 4x 5x TESLA M4 Highest Throughput Hyperscale Workload Acceleration Video Transcode 2x H.264 & H.265, SD & HD Machine Learning Inference 2x CUDA Cores 1024 Peak SP 2.2 TFLOPS GDDR5 Memory Bandwidth Form Factor Power 4 GB 88 GB/s PCIe Low Profile 50 75 W Preliminary specifications. Subject to change.
TESLA FOR VISUALIZATION IRAY OPTIX INDEX VISUALIZATION TOOLS FOR HPC TESLA ACCELERATED COMPUTING
GROWING ADOPTION IN CLIMATE & WEATHER MeteoSwiss Deploys World s First Accelerated Weather Supercomputer 2x higher resolution for daily forecasts 14x more simulation with ensemble approach for medium range forecasts NOAA Chooses Tesla To Improve Weather Forecast Research Develop global model with 3km resolution, five-fold increase from today s resolution Improved resolution requires 40x higher in computational complexity
NEXT-GEN SUPERCOMPUTERS ARE GPU-ACCELERATED SIMULATION MACHINE LEARNING VISUALIZATION SUMMIT SIERRA U.S. Dept. of Energy Pre-Exascale Supercomputers for Science IBM Watson Breakthrough Natural Language Processing for Cognitive Computing NOAA New Supercomputer for Next-Gen Weather Forecasting TESLA ACCELERATED COMPUTING
World s first in-situ weather simulation, running on Meteoswiss supercomputer Analyze quantum effects in nanowire using 10,800 GPUs on TITAN ENTERPRISE supercomputer Simulation of deadly tornado that hit El Reno, Oklahoma on May 24, 2011 Predicting drug reactions for personalized medicine with GPU-powered IBM Spark ACCELERATED SCIENCE AND DATA ANALYTICS ON DISPLAY AT SC 15