ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015

Similar documents
GPU ACCELERATED COMPUTING. 1 st AlsaCalcul GPU Challenge, 14-Jun-2016, Strasbourg Frédéric Parienté, Tesla Accelerated Computing, NVIDIA Corporation

The Tesla Accelerated Computing Platform

RECENT TRENDS IN GPU ARCHITECTURES. Perspectives of GPU computing in Science, 26 th Sept 2016

Building the Most Efficient Machine Learning System

NVIDIA Update and Directions on GPU Acceleration for Earth System Models

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

ACCELERATED COMPUTING: THE PATH FORWARD. Jensen Huang, Founder & CEO SC17 Nov. 13, 2017

World s most advanced data center accelerator for PCIe-based servers

Building the Most Efficient Machine Learning System

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications

Deep Learning mit PowerAI - Ein Überblick

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications

TESLA P100 PERFORMANCE GUIDE. Deep Learning and HPC Applications

HPC with the NVIDIA Accelerated Computing Toolkit Mark Harris, November 16, 2015

Fast Hardware For AI

GPU COMPUTING AND THE FUTURE OF HPC. Timothy Lanfear, NVIDIA

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder

The Future of High Performance Interconnects

IBM CORAL HPC System Solution

PERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015

SUPERCHARGE DEEP LEARNING WITH DGX-1. Markus Weber SC16 - November 2016

NVIDIA GPU TECHNOLOGY UPDATE

n N c CIni.o ewsrg.au

MACHINE LEARNING WITH NVIDIA AND IBM POWER AI

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

IBM Deep Learning Solutions

Accelerating High Performance Computing.

Timothy Lanfear, NVIDIA HPC

ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation

POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017

TESLA ACCELERATED COMPUTING. Mike Wang Solutions Architect NVIDIA Australia & NZ

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI

Interconnect Your Future

Stan Posey, NVIDIA, Santa Clara, CA, USA

DGX UPDATE. Customer Presentation Deck May 8, 2017

System Design of Kepler Based HPC Solutions. Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering.

GPUs and the Future of Accelerated Computing Emerging Technology Conference 2014 University of Manchester

TESLA V100 PERFORMANCE GUIDE May 2018

SYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA GPUS

Interconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, November 2017

DGX SYSTEMS: DEEP LEARNING FROM DESK TO DATA CENTER. Markus Weber and Haiduong Vo

IBM Power AC922 Server

Introduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29

WHAT S NEW IN CUDA 8. Siddharth Sharma, Oct 2016

April 2 nd, Bob Burroughs Director, HPC Solution Sales

ENDURING DIFFERENTIATION Timothy Lanfear

ENDURING DIFFERENTIATION. Timothy Lanfear

Deep Learning: Transforming Engineering and Science The MathWorks, Inc.

RECENT UPDATES ON ACCELERATING COMPUTING PLATFORM PRADEEP GUPTA SENIOR SOLUTIONS ARCHITECT, NVIDIA

Looking ahead with IBM i. 10+ year roadmap

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager

TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC

GPUS FOR NGVLA. M Clark, April 2015

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Accelerator programming with OpenACC

GPU Computing fuer rechenintensive Anwendungen. Axel Koehler NVIDIA

A NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA

Hybrid KAUST Many Cores and OpenACC. Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS

NVIDIA PLATFORM FOR AI

Unlock business value with HPC & Artificial Intelligence. FORUM TERATEC June 19, 2018 José RODRIGUES HPC Sales Manager

Maximize automotive simulation productivity with ANSYS HPC and NVIDIA GPUs

S8765 Performance Optimization for Deep- Learning on the Latest POWER Systems

Deploying remote GPU virtualization with rcuda. Federico Silla Technical University of Valencia Spain

HETEROGENEOUS COMPUTE INFRASTRUCTURE FOR SINGAPORE

S8901 Quadro for AI, VR and Simulation

Interconnect Your Future

Optimising the Mantevo benchmark suite for multi- and many-core architectures

Trends in HPC (hardware complexity and software challenges)

FUJITSU Server PRIMERGY CX400 M4 Workload-specific power in a modular form factor. 0 Copyright 2018 FUJITSU LIMITED

Building NVLink for Developers

Fra superdatamaskiner til grafikkprosessorer og

DEEP LEARNING ALISON B LOWNDES. Deep Learning Solutions Architect & Community Manager EMEA

Pedraforca: a First ARM + GPU Cluster for HPC

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

INTRODUCTION TO OPENACC. Analyzing and Parallelizing with OpenACC, Feb 22, 2017

OpenPOWER Performance

NVIDIA Accelerators Models HPE NVIDIA GV100 Nvlink Bridge Kit HPE NVIDIA Tesla V100 FHHL 16GB Computational Accelerator

A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017

High Performance Computing with Accelerators

Exascale: challenges and opportunities in a power constrained world

NAMD GPU Performance Benchmark. March 2011

Embedded GPGPU and Deep Learning for Industrial Market

Machine Learning on VMware vsphere with NVIDIA GPUs

Carlos Reaño, Javier Prades and Federico Silla Technical University of Valencia (Spain)

Corporate Update. Enabling The Use of Data January Mellanox Technologies

Making Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010

NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU

Small is the New Big: Data Analytics on the Edge

S THE MAKING OF DGX SATURNV: BREAKING THE BARRIERS TO AI SCALE. Presenter: Louis Capps, Solution Architect, NVIDIA,

April 4-7, 2016 Silicon Valley INSIDE PASCAL. Mark Harris, October 27,

Deep learning in MATLAB From Concept to CUDA Code

Embedded Computing without Compromise. Evolution of the Rugged GPGPU Computer Session: SIL7127 Dan Mor PLM -Aitech Systems GTC Israel 2017

Game-changing Extreme GPU computing with The Dell PowerEdge C4130

Intel High-Performance Computing. Technologies for Engineering

GPU Computing with NVIDIA s new Kepler Architecture

VSC Users Day 2018 Start to GPU Ehsan Moravveji

Is remote GPU virtualization useful? Federico Silla Technical University of Valencia Spain

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

Transcription:

ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015

COMMODITY DISRUPTS CUSTOM SOURCE: Top500

ACCELERATED COMPUTING: THE PATH FORWARD It s time to start planning for the end of Moore s Law, and it s worth pondering how it will end, not just when. Robert Colwell Director, Microsystems Technology Office, DARPA

NVIDIA ACCELERATES COMPUTING TFLOPS Fast GPU Engineered for High Throughput NVIDIA GPU x86 CPU Productive Programming Model & Tools Expert Co-Design Accessibility 3.0 2.5 K80 APPLICATION 2.0 1.5 1.0 0.5 0.0 M1060 M2090 K20 K40 Fast GPU + Strong CPU 2008 2009 2010 2011 2012 2013 2014 MIDDLEWARE SYS SW LARGE SYSTEMS PROCESSOR

125 ACCELERATORS SURGE IN WORLD S TOP SUPERCOMPUTERS 100 75 50 25 Top500: # of Accelerated Supercomputers 100+ accelerated systems now on Top500 list 1/3 of total FLOPS powered by accelerators NVIDIA Tesla GPUs sweep 23 of 24 new accelerated supercomputers Tesla supercomputers growing at 50% CAGR over past five years 0 2013 2014 2015

MACHINE LEARNING HPC S 1 ST CONSUMER KILLER-APP NADELLA: SMART AGENTS LIKE CORTANA WILL REPLACE THE WEB BROWSER -BI GOOGLE OPEN-SOURCES TENSORFLOW FACEBOOK MESSENGER ADDS FACIAL RECOGNITION YOUTUBE: CLICK-TO-BUY ADS MICROSOFT OPEN-SOURCES DMTK GOOGLE PHOTOS: ML-POWERED FEATURES

TESLA FOR MACHINE LEARNING 10M Users 40 years of video/day HYPERSCALE SUITE GPU REST Engine GPU Accelerated FFmpeg Image Compute Engine 270M Items sold/day TESLA M40 43% on mobile devices POWERFUL: Fastest Deep Learning Performance TESLA M4 LOW POWER: Highest Hyperscale Throughput

MACHINE LEARNING REVOLUTIONIZING TRANSPORTATION Toyota Invests $1 Billion in Artificial Intelligence in U.S. U.S. News & World Report

END-TO-END MACHINE LEARNING PLATFORM FOR AUTONOMOUS CARS DRIVE PX 100% 90% NVDRIVENET on KITTI Object Detection BAIDU NVIDIA 80% 70% 72% 75% 79% 83% 66% DIGITS DevBox 60% 50% 62% 55% 45% 40% 39% 30% 7/8 7/22 8/5 8/19 9/2 9/16 9/30 10/14 10/28 11/10

MACHINE LEARNING REVOLUTIONIZING AUTONOMOUS MACHINES

Images / Sec / Watt 50 10x Energy Efficiency Alexnet 40 JETSON TX1 Supercomputer on a Module 30 20 10 0 Intel Core i7-6700k (Skylake) Jetson TX1 GPU CPU Memory Power 1 TFLOPS 256-core Maxwell 64-bit ARM A57s 4GB LPDDR4 26 GB/s Under 10W Under 10W for typical use cases

SUPERCOMPUTING EVERYWHERE PC GAMING Titan X for PC Tesla in the Cloud DRIVE PX for Auto Jetson TX1 for Robots

ACCELERATED COMPUTING: THE PATH FORWARD Ian Buck, VP of Accelerated Computing, NVIDIA SC15 Nov. 16, 2015

TESLA ACCELERATES DISCOVERY AND INSIGHT SIMULATION MACHINE LEARNING VISUALIZATION 270M Items sold/day 43% on mobile devices TESLA ACCELERATED COMPUTING

Approximately a third of HPC systems operating today are equipped with accelerators and nearly half of all newly deployed systems have them. ACCELERATED COMPUTING: A TIPPING POINT FOR HPC, Intersect360 Nov 2015

70% OF TOP HPC APPS NOW ACCELERATED INTERSECT360 SURVEY OF TOP APPS VASP NOW ACCELERATED Typically Consumes 10-25% of HPC System Top 10 HPC Apps 90% Accelerated Top 50 HPC Apps 70% Accelerated 1 Dual K80 Server 1.3x 4 Dual CPU Servers 1.0x Intersect360, Nov 2015 HPC Application Support for GPU Computing Dual-socket Xeon E5-2690 v2 3GHz, Dual Tesla K80, FDR InfiniBand Dataset: NiAl-MD Blocked Davidson

370 GPU-Accelerated Applications www.nvidia.com/appscatalog

TESLA FOR SIMLUATION LIBRARIES DIRECTIVES LANGUAGES ACCELERATED COMPUTING TOOLKIT TESLA ACCELERATED COMPUTING

TESLA K80 World s Fastest Accelerator for HPC Dual CPU Server Tesla K80 Server 5x Faster AMBER Performance Simulation Time from 1 Month to 1 Week 0 5 10 15 20 25 30 # of Days CUDA Cores 2496 Peak DP Peak DP w/ Boost GDDR5 Memory Bandwidth Power 1.9 TFLOPS 2.9 TFLOPS 24 GB 480 GB/s 300 W AMBER Benchmark: PME-JAC-NVE Simulation for 1 microsecond CPU: E5-2698v3 @ 2.3GHz. 64GB System Memory, CentOS 6.2

APPLICATION PERFORMANCE BOOSTS DATA CENTER THROUGHPUT TESLA K80: 5X FASTER 1/3 OF NODES ACCELERATED, 2X SYSTEM THROUGHPUT 15x Speed-up vs Dual CPU K80 CPU CPU-only System Accelerated System 10x 5x 0x QMCPACK LAMMPS CHROMA NAMD AMBER 100 Jobs Per Day 220 Jobs Per Day CPU: Dual E5-2698 v3@2.3ghz 3.6GHz, 64GB System Memory, CentOS 6.2 GPU: Single Tesla K80, Boost enabled

Speedup vs Single CPU Core OPENACC DELIVERS TRUE PERF PORTABILITY Paving the Path Forward: Single Code for All HPC Processors 35x Application Performance Benchmark CPU: MPI + OpenMP CPU: MPI + OpenACC CPU + GPU: MPI + OpenACC 30x 25x 20x 15x 30.3x 10x 5x 0x 11.9x 7.6x 7.1x 7.1x 4.1x 4.3x 5.2x 5.3x 359.MINIGHOST (MANTEVO) NEMO (CLIMATE & OCEAN) CLOVERLEAF (PHYSICS) 359.miniGhost: CPU: Intel Xeon E5-2698 v3, 2 sockets, 32-cores total, GPU: Tesla K80- single GPU NEMO: Each socket CPU: Intel Xeon E5-2698 v3, 16 cores; GPU: NVIDIA K80 both GPUs CLOVERLEAF: CPU: Dual socket Intel Xeon CPU E5-2690 v2, 20 cores total, GPU: Tesla K80 both GPUs

TESLA HYPERSCALE FOR MACHINE LEARNING 10M Users 40 years of video/day HYPERSCALE SUITE GPU REST Engine GPU Accelerated FFmpeg Image Compute Engine 270M Items sold/day TESLA M40 43% on mobile devices POWERFUL: Fastest Deep Learning Performance TESLA M4 LOW POWER: Highest Hyperscale Throughput

8x Faster Caffe Performance TESLA M40 World s Fastest Accelerator for Deep Learning CPU Tesla M40 Reduce Training Time from 8 Days to 1 Day 0 1 2 3 4 5 6 7 8 9 # of Days CUDA Cores 3072 Peak SP 7 TFLOPS GDDR5 Memory Bandwidth Power 12 GB 288 GB/s 250W Caffe Benchmark: AlexNet training throughput based on 20 iterations, CPU: E5-2697v2 @ 2.70GHz. 64GB System Memory, CentOS 6.2

Video Processing Stabilization and Enhancements Image Processing Resize, Filter, Search, Auto-Enhance 4x 5x TESLA M4 Highest Throughput Hyperscale Workload Acceleration Video Transcode 2x H.264 & H.265, SD & HD Machine Learning Inference 2x CUDA Cores 1024 Peak SP 2.2 TFLOPS GDDR5 Memory Bandwidth Form Factor Power 4 GB 88 GB/s PCIe Low Profile 50 75 W Preliminary specifications. Subject to change.

TESLA FOR VISUALIZATION IRAY OPTIX INDEX VISUALIZATION TOOLS FOR HPC TESLA ACCELERATED COMPUTING

GROWING ADOPTION IN CLIMATE & WEATHER MeteoSwiss Deploys World s First Accelerated Weather Supercomputer 2x higher resolution for daily forecasts 14x more simulation with ensemble approach for medium range forecasts NOAA Chooses Tesla To Improve Weather Forecast Research Develop global model with 3km resolution, five-fold increase from today s resolution Improved resolution requires 40x higher in computational complexity

NEXT-GEN SUPERCOMPUTERS ARE GPU-ACCELERATED SIMULATION MACHINE LEARNING VISUALIZATION SUMMIT SIERRA U.S. Dept. of Energy Pre-Exascale Supercomputers for Science IBM Watson Breakthrough Natural Language Processing for Cognitive Computing NOAA New Supercomputer for Next-Gen Weather Forecasting TESLA ACCELERATED COMPUTING

World s first in-situ weather simulation, running on Meteoswiss supercomputer Analyze quantum effects in nanowire using 10,800 GPUs on TITAN ENTERPRISE supercomputer Simulation of deadly tornado that hit El Reno, Oklahoma on May 24, 2011 Predicting drug reactions for personalized medicine with GPU-powered IBM Spark ACCELERATED SCIENCE AND DATA ANALYTICS ON DISPLAY AT SC 15