TESLA PLATFORM. Jan 2018

1 TESLA PLATFORM Jan 2018

2 A NEW ERA OF COMPUTING
AI & IOT: Deep Learning, GPU. 100s of billions of devices
MOBILE-CLOUD: iPhone, Amazon AWS. 2.5 billion mobile users
PC INTERNET: WinTel, Yahoo! 1 billion PC users

3 NVIDIA: THE AI COMPUTING COMPANY
GPU Computing, Computer Graphics, Artificial Intelligence

4 RISE OF GPU COMPUTING
GPU-Computing perf: 1.5X per year, 1000X by 2025
Stack: APPLICATIONS, ALGORITHMS, SYSTEMS, CUDA, ARCHITECTURE
[Figure: log-scale performance trend chart. Series: GPU-Computing perf, Single-threaded perf. Two "X per year" growth-rate labels on the single-threaded curve.]
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for by K. Rupp
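
The "1.5X per year, 1000X by 2025" claim is compound growth, and the implied time span can be checked with a few lines. The sketch below is only an arithmetic illustration: the function names are hypothetical, and the slide does not state the baseline year, so the code derives the number of years rather than assuming a start date.

```python
import math


def years_to_reach(total_gain, yearly_gain):
    """Years of compound growth needed to reach total_gain at yearly_gain per year."""
    return math.log(total_gain) / math.log(yearly_gain)


def gain_after(years, yearly_gain):
    """Cumulative gain after a number of years of compound growth."""
    return yearly_gain ** years


if __name__ == "__main__":
    # Figures taken from the slide: 1.5X per year, 1000X overall.
    print(f"1000X at 1.5X per year takes {years_to_reach(1000, 1.5):.1f} years")
    print(f"17 years at 1.5X per year gives {gain_after(17, 1.5):.0f}X")
```

At 1.5X per year, roughly 17 years of growth are needed for a 1000X gain, which would place the starting point of the projection about 17 years before 2025.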

5 ELEVEN YEARS OF GPU COMPUTING
World's First Atomic Model of HIV Capsid
GPU-Trained AI Machine Beats World Champion in Go
Oak Ridge Deploys World's Fastest Supercomputer w/ GPUs
Fermi: World's First HPC GPU
AlexNet beats expert code by huge margin using GPUs
Stanford Builds AI Machine using GPUs
Google Outperforms Humans in ImageNet
Top 13 Greenest Supercomputers Powered by NVIDIA GPUs
CUDA Launched
World's First GPU Top500 System
Discovered How H1N1 Mutates to Resist Drugs
World's First 3-D Mapping of Human Genome

6 TESLA PLATFORM
World's Leading Data Center Platform for Accelerating HPC and AI
APPLICATIONS: INTERNET SERVICES; ENTERPRISE APPLICATIONS; HPC (+450 Applications)
INDUSTRY: Automotive, Retail, Healthcare, Manufacturing, Finance, Defense
FRAMEWORKS & TOOLS: FRAMEWORKS, ECOSYSTEM TOOLS
NVIDIA SDK (DEEP LEARNING SDK, COMPUTEWORKS): cuDNN, TensorRT, NCCL, cuBLAS, cuSPARSE, DeepStream SDK, CUDA C/C++, FORTRAN
TESLA GPU & SYSTEMS: TESLA GPU, NVIDIA DGX-1, NVIDIA HGX-1, SYSTEM OEM, CLOUD

7 MOST ADOPTED PLATFORM FOR ACCELERATING HPC
14X GPU DEVELOPERS
500+ GPU-ACCELERATED APPLICATIONS
All Top 15 HPC Apps Accelerated: VASP, AMBER, NAMD, GROMACS, Gaussian, Simulia Abaqus, WRF, OpenFOAM, ANSYS, LS-DYNA, BLAST, LAMMPS, ANSYS Fluent, Quantum Espresso, GAMESS
DEFINING THE NEXT GIANT WAVE IN HPC
OAK RIDGE SUMMIT: US's next fastest supercomputer. 200+ Petaflop HPC; 3+ Exaflop of AI
ABCI Supercomputer (AIST): Japan's fastest AI supercomputer
Piz Daint: Europe's fastest supercomputer

8 MOST ADOPTED PLATFORM FOR ACCELERATING AI
25X COMPANIES ENGAGED (,637)
EVERY DEEP LEARNING FRAMEWORK ACCELERATED
AVAILABLE EVERYWHERE: Cloud Services, Systems, Desktops

9 TESLA PLATFORM FOR HPC

10 BIG INEFFICIENCIES WITH CPU NODES
ARCHITECTING MODERN DATACENTERS
Single GPU Server 3.5x Faster than the Largest CPU Data Center
AMBER Simulation of CRISPR, Nature's Tool for Genome Editing
[Chart: ns/day versus # of CPUs. Node with 4x V100 GPUs compared with 48 CPU Nodes of the Comet Supercomputer.]
AMBER 16 Pre-release, CRISPR based on PDB ID 5f9r, 336,898 atoms. CPU: Dual Socket Intel E5-2680v3 12 cores, 128 GB DDR4 per node, FDR IB

11 WEAK NODES: Lots of Nodes Interconnected with Vast Network Overhead
STRONG NODES: Few Lightning-Fast Nodes with Performance of Hundreds of Weak Nodes
[Diagram labels: Network Fabric, Server Racks]

12 ARCHITECTING MODERN DATACENTERS
Strong Core CPU for Sequential code
Volta: 5,120 CUDA Cores
NVLink for Strong Scaling
125 TFLOPS Tensor Core

13 70% OF THE WORLD'S SUPERCOMPUTING WORKLOAD ACCELERATED
Top 15 HPC Applications: VASP, AMBER, NAMD, GROMACS, Gaussian, Simulia Abaqus, WRF, OpenFOAM, ANSYS, LS-DYNA, BLAST, LAMMPS, ANSYS Fluent, Quantum Espresso, GAMESS
500+ Accelerated Applications
Source: Intersect360 Research, Nov 2017, HPC Application Support for GPU Computing

14 GPU-ACCELERATED HPC APPLICATIONS
500+ APPLICATIONS
LIFE SCIENCES: 50+ apps, including Gaussian, VASP, AMBER, HOOMD-Blue, GAMESS
MFG, CAD, & CAE: 111 apps, including Ansys Fluent, Abaqus SIMULIA, AutoCAD, CST Studio Suite
PHYSICS: 20 apps, including QUDA, MILC, GTC-P
OIL & GAS: 17 apps, including RTM, SPECFEM 3D
CLIMATE & WEATHER: 4 apps, including Cosmos, Gales, WRF
DEEP LEARNING: 32 apps, including Caffe2, MXNet, TensorFlow
MEDIA & ENT.: 142 apps, including DaVinci Resolve, Premiere Pro CC, Redshift Renderer
FEDERAL & DEFENSE: 13 apps, including ArcGIS Pro, ENVI, SocetGXP
DATA SCI. & ANALYTICS: 23 apps, including MapD, Kinetica, Graphistry
SAFETY & SECURITY: 15 apps, including Cylance, FaceControl, Syndex Pro
COMP. FINANCE: 16 apps, including O-Quant Options Pricing, MUREX, MISYS
TOOLS & MGMT.: 15 apps, including Bright Cluster Manager, HPCtoolkit, Vampir

15 DEEP LEARNING COMES TO HPC
Stages: SIMULATION (FP64/FP32), TRAINING (FP32/FP16), REGRESSION TESTING (FP16/INT8), INFERENCE (FP16/INT8)
Data passed between stages: NEW DATA, TRAINING SET, REGRESSION SET, ERRORS

16 AI ACCELERATES SCIENCE
AI ACCELERATES SCIENTIFIC DISCOVERY
UIUC & NCSA, ASTROPHYSICS: 5,000X LIGO Signal Processing
U. FLORIDA & UNC, DRUG DISCOVERY: 300,000X Molecular Energetics Prediction
SLAC, ASTROPHYSICS: Gravitational Lensing, From Weeks to 10ms
PRINCETON & ITER, CLEAN ENERGY: 50% Higher Accuracy for Fusion Sustainment
U.S. DoE, PARTICLE PHYSICS: 33% More Accurate Neutrino Detection
U. PITT, DRUG DISCOVERY: 35% Higher Accuracy for Protein Scoring

17 ONE PLATFORM BUILT FOR BOTH DATA SCIENCE & COMPUTATIONAL SCIENCE
Tesla Platform, CUDA: Accelerating AI, Accelerating HPC

18 DRAMATICALLY MORE FOR YOUR MONEY
Save Up To $8M With Each GPU-Accelerated Rack
EQUAL THROUGHPUT WITH FEWER RACKS (# of Racks, ~30 kW Per Rack)
GPU-accelerated: 1 RACK ($0.8M), 36 CPUs + 72 V100s
RTM: 5 RACKS ($2.0M), 360 CPUs
VASP: 14 RACKS ($6.0M), 1152 CPUs
ResNet-50 (DL Training): 22 RACKS ($9.2M), 1764 CPUs
BUDGET: SMALLER, EFFICIENT
GPU-accelerated rack: Compute Servers 85%, Non-Compute 15%
Traditional data center: Compute Servers 39%, Non-compute 61% (Rack, Cabling, Infrastructure, Networking)
Source: Traditional Data Centers Cost model by Microsoft Research on Datacenter Costs

19 DATA CENTER SAVINGS FOR MIXED WORKLOADS
5X Better HPC TCO for Same Throughput
SAME THROUGHPUT, 1/3 THE COST, 1/4 THE SPACE, 1/5 THE POWER
MIXED WORKLOAD: Materials Science (VASP), Life Sciences (AMBER), Physics (MILC), Deep Learning (ResNet-50)
12 Accelerated Servers w/ 4 V100 GPUs: 20 kW
160 Self-hosted Servers: 96 kW

20 TESLA V100
The Fastest and Most Productive GPU for AI and HPC
Volta Architecture: Most Productive GPU
Tensor Core: 125 Programmable TFLOPS Deep Learning
Improved NVLink & HBM2: Efficient Bandwidth
Volta MPS: Inference Utilization
Improved SIMT Model: New Algorithms

21 VOLTA TO FUEL SUMMIT
Next Milestone In AI Supercomputing
AI Exascale Today: 3+ EFLOPS Tensor Ops
Performance Leadership: 200 PF versus 20 PF, 10X Perf Over Titan
Accelerated Science: 5-10X Application Perf Over Titan
Applications: ACME, DIRAC, FLASH, GTC, HACC, LSDALTON, NAMD, NUCCOR, NWCHEM, QMCPACK, RAPTOR, SPECFEM, XGC

22 BREAKTHROUGH EFFICIENCY ON THE PATH TO EXASCALE
13/13 Greenest Supercomputers Powered by Tesla P100: TSUBAME 3.0, Kukai, AIST AI Cloud, RAIDEN GPU subsystem, Piz Daint, Wilkes-2, GOSAT-2 (RCF2), DGX Saturn V, Reedbush-H, JADE, Facebook Cluster, Cedar, DAVIDE
[Chart: "Ahead Of The Curve", GFLOPS per Watt with the Exascale Goal marked. Plotted systems include Eurotech Aurora, Tsubame-KFC, Saturn V and Tsubame 3. Top GPU Systems in Green500 List with measured performance and NVIDIA Projections for V100.]

23 POWER OF GPU COMPUTING PLATFORM
Delivered Value Grows Over Time
[Chart: AMBER Performance (ns/day) for AMBER 12, AMBER 14 and AMBER 16 with successive CUDA releases on K20 (2013), K40 (2014), K80 (2015), P100 (2016), V100 (2017).]
[Chart: GoogLeNet Performance (i/s) for successive cuDNN, CUDA and NCCL releases, up to cuDNN 7, CUDA 9 and NCCL 2, on 8X K80 (2014), 8X MAXWELL (2015), DGX-1 (2016), DGX-1V (2017).]
Amber dataset: Cellulose NVE; GoogLeNet dataset: ImageNet

24 TESLA PLATFORM FOR AI

25 AI REVOLUTIONIZING OUR WORLD
Search, Assistants, Translation, Recommendations, Shopping, Photos
Detect, Diagnose and Treat Diseases
Powering Breakthroughs in Agriculture, Manufacturing, EDA

26 NEURAL NETWORK COMPLEXITY IS EXPLODING
Bigger and More Compute Intensive
Image (GOP * Bandwidth): 350X. AlexNet, GoogleNet, ResNet-50, Inception-v2, Inception-v4
Speech (GOP * Bandwidth): 30X. DeepSpeech, DeepSpeech 2, DeepSpeech 3
Translation (GOP * Bandwidth): 10X. OpenNMT, GNMT, MoE

27 PLATFORM BUILT FOR AI
Delivering 125 TFLOPS of DL Performance with Volta Tensor Core
TENSOR CORE MATRIX DATA OPTIMIZATION: Dense Matrix of Tensor Compute
TENSOR-OP CONVERSION: FP32 to Tensor Op Data for Frameworks
VOLTA-OPTIMIZED cuDNN
VOLTA TENSOR CORE: 4x4 matrix processing array, D[FP32] = A[FP16] * B[FP16] + C[FP32], Optimized For Deep Learning
ALL MAJOR FRAMEWORKS
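
The Tensor Core operation D[FP32] = A[FP16] * B[FP16] + C[FP32] on a 4x4 tile can be emulated on a CPU to show what "mixed precision" means numerically. This is a sketch under stated assumptions, not NVIDIA's implementation: the helper names `to_fp16`, `to_fp32` and `tensor_core_mma` are hypothetical, rounding is emulated with Python's `struct` module, and the accumulation order of real hardware is not specified by the slide.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tensor_core_mma(a, b, c):
    """Emulate D = A * B + C on 4x4 tiles: FP16 inputs, FP32 accumulation."""
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])  # the accumulator starts from C in FP32
            for k in range(n):
                # The product of two FP16 values fits exactly in FP32.
                prod = to_fp16(a[i][k]) * to_fp16(b[k][j])
                acc = to_fp32(acc + prod)
            d[i][j] = acc
    return d


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    b = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    ones = [[1.0] * 4 for _ in range(4)]
    for row in tensor_core_mma(identity, b, ones):
        print(row)
    print("0.1 stored as FP16:", to_fp16(0.1))
```

The inputs lose precision when stored as FP16 (0.1 is no longer exactly 0.1), while the running sum is kept in FP32, which is the property the slide's formula expresses.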

28 GPU DEEP LEARNING IS A NEW COMPUTING MODEL
TRAINING: Billions of Trillions of Operations
GPU: train larger models, accelerate time to market
[Diagram labels: Training, Datacenter, Device]

29 REVOLUTIONARY AI PERFORMANCE
3X Faster DL Training Performance
[Chart: Exponential Performance over time (GoogleNet), Speedup vs K80. 1x K80 cuDNN2 (Q1 15), 4x M40 cuDNN3 (Q3 15), 8x P100 cuDNN6 (Q2 16), 8x V100 cuDNN7 (Q2 17).]
Over 80X DL Training Performance in 3 Years. GoogleNet Training Performance on versions of cuDNN vs 1x K80 cuDNN2
[Chart: Relative Time to Train Improvements (LSTM). 2X CPU: 15 Days; 1X P Hours; 1X V100: 6 Hours.]
3X Reduction in Time to Train Over P100. Neural Machine Translation Training for 13 Epochs, German -> English, WMT15 subset. CPU = 2x Xeon E V4

30 NVIDIA GPUS POWER WORLD'S FASTEST DEEP LEARNING PERFORMANCE
Time to Train ResNet-50 on Tesla P100 GPUs
Facebook, June '17: 60 Mins
IBM, Aug '17: 48 Mins
Preferred Networks, Nov ': 15 Mins
ResNet-50. Dataset: ImageNet. Trained for 90 Epochs

31 GPU DEEP LEARNING IS A NEW COMPUTING MODEL
DATACENTER INFERENCING: 10s of billions of image, voice, video queries per day
GPU inference for fast response, maximize datacenter throughput
[Diagram labels: Training, Datacenter, Device]

32 NVIDIA TENSORRT: PROGRAMMABLE INFERENCE ACCELERATOR
TensorRT targets: TESLA P4, JETSON TX2, DRIVE PX 2, NVIDIA DLA, TESLA V100

33 NVIDIA TENSORRT 3
World's Fastest Inference Platform
[Chart, IMAGES: ResNet-50 Throughput in Images/Sec (Target 7ms latency) for CPU + TensorFlow, V100 + TensorFlow, V100 + TensorRT.]
[Chart, TRANSLATION: OpenNMT Throughput in Sentences/Sec (Target 200ms latency) for CPU + Torch, V100 + Torch, V100 + TensorRT.]
Latency labels on the bars include 14ms, 7ms and 117ms.

34 NVIDIA PLATFORM SAVES DATA CENTER COSTS
Game Changing Inference Performance
SAME THROUGHPUT, 1/4 THE SPACE, 1/22 THE POWER
INFERENCE WORKLOAD: Image recognition using ResNet-50
1 HGX Server: 45,000 images/sec, 3 kW
160 CPU Servers: 45,000 images/sec, 65 kW
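
The "1/22 THE POWER" headline follows from the two power figures on the slide, and the same numbers give throughput per kilowatt. A short sketch of that arithmetic, with hypothetical function names and only the slide's own figures as inputs:

```python
def power_ratio(baseline_kw, accelerated_kw):
    """How many times more power the baseline draws for the same throughput."""
    return baseline_kw / accelerated_kw


def images_per_sec_per_kw(images_per_sec, kilowatts):
    """Inference throughput delivered per kilowatt."""
    return images_per_sec / kilowatts


if __name__ == "__main__":
    # Slide figures: 160 CPU servers at 65 kW, 1 HGX server at 3 kW,
    # both delivering 45,000 images/sec on ResNet-50.
    print(f"Power ratio: {power_ratio(65, 3):.1f}x")
    print(f"HGX: {images_per_sec_per_kw(45_000, 3):.0f} images/sec per kW")
    print(f"CPU: {images_per_sec_per_kw(45_000, 65):.0f} images/sec per kW")
```

65 kW divided by 3 kW is about 21.7, which the slide rounds to 22.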

35 GPU-ACCELERATED INFERENCE
iFLYTEK: SPEECH RECOGNITION
VALOSSA: VIDEO INTELLIGENCE
MICROSOFT BING: VISUAL SEARCH

36 TESLA PRODUCT FAMILY

37 END-TO-END PRODUCT FAMILY
HYPERSCALE HPC: Tesla V100 (Training & Inference) and Tesla P4 (Most Efficient Inference & Transcoding). Deep learning training & inference
STRONG-SCALE HPC: Tesla V100 with NVLink. HPC and DL workloads scaling to multiple GPUs
MIXED-APPS HPC: Tesla V100 with PCI-E. HPC workloads with mix of CPU and GPU workloads
FULLY INTEGRATED SUPERCOMPUTER: DGX Station, DGX-1 Server. Fully integrated deep learning solution

38 OPTIMIZED FOR DATACENTER EFFICIENCY
30% More Performance in a Rack
Max Efficiency: 75% Perf at Half the Power
[Chart: DL Perf / Watt, with DL Perf plotted against Watts and the Max Efficiency and Max Performance points marked.]
ResNet-50 Training, Computer Vision, 13 kW Rack
MAXP: 4 Nodes of 8xV100, 1X ResNet-50 Rack Throughput
MAXQ: 7 Nodes of 8xV, X ResNet-50 Rack Throughput
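
The "30% More Performance in a Rack" figure can be reproduced from the node counts on the slide if each max-efficiency node is taken to deliver 75% of a max-performance node, as the "75% Perf at Half the Power" line suggests. That per-node factor is an assumption read from the slide, and the function name is hypothetical.

```python
def rack_throughput(nodes, relative_node_perf):
    """Rack throughput in units of one max-performance node."""
    return nodes * relative_node_perf


def rack_gain(maxq_nodes=7, maxq_perf=0.75, maxp_nodes=4, maxp_perf=1.0):
    """Ratio of max-efficiency rack throughput to max-performance rack throughput."""
    return rack_throughput(maxq_nodes, maxq_perf) / rack_throughput(maxp_nodes, maxp_perf)


if __name__ == "__main__":
    print(f"MAXQ rack vs MAXP rack: {rack_gain():.2f}x")
```

Seven nodes at 0.75 each give 5.25 node-equivalents against 4, a ratio of about 1.31, consistent with the 30% headline.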

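39 TESLA V100
Core. For NVLink Servers: 5120 CUDA cores, 640 Tensor cores. For PCIe Servers: 5120 CUDA cores, 640 Tensor cores
Compute. For NVLink Servers: 7.8 TF DP, 15.7 TF SP, 125 TF DL. For PCIe Servers: 7 TF DP, 14 TF SP, 112 TF DL
Memory. For NVLink Servers: HBM2, 900 GB/s, 16 GB. For PCIe Servers: HBM2, 900 GB/s, 16 GB
Interconnect. For NVLink Servers: NVLink (up to 300 GB/s) + PCIe Gen3 (up to 32 GB/s). For PCIe Servers: PCIe Gen3 (up to 32 GB/s)
Power. For NVLink Servers: 300W. For PCIe Servers: 250W
Available. For NVLink Servers: Now. For PCIe Servers: Now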
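
The peak figures for the NVLink part can be related to the core counts with simple arithmetic. The sketch below makes several assumptions that are not on the slide: a boost clock of 1.53 GHz, one fused multiply-add (2 FLOPs) per CUDA core per clock, FP64 at half the FP32 rate, and 64 fused multiply-adds per Tensor Core per clock (a 4x4x4 matrix operation). The function name is hypothetical.

```python
CUDA_CORES = 5120          # from the slide
TENSOR_CORES = 640         # from the slide
BOOST_CLOCK_GHZ = 1.53     # assumption, not stated on the slide


def peak_tflops(units, flops_per_unit_per_clock, clock_ghz):
    """Peak rate in TFLOPS: units * FLOPs per clock * clock (GHz) / 1000."""
    return units * flops_per_unit_per_clock * clock_ghz / 1000.0


def v100_nvlink_peaks(clock_ghz=BOOST_CLOCK_GHZ):
    """Estimated peak single-precision, double-precision and Tensor Core TFLOPS."""
    sp = peak_tflops(CUDA_CORES, 2, clock_ghz)         # 1 FMA = 2 FLOPs
    dp = peak_tflops(CUDA_CORES // 2, 2, clock_ghz)    # assumed half-rate FP64
    dl = peak_tflops(TENSOR_CORES, 64 * 2, clock_ghz)  # 64 FMAs per clock
    return sp, dp, dl


if __name__ == "__main__":
    sp, dp, dl = v100_nvlink_peaks()
    print(f"SP: {sp:.1f} TF, DP: {dp:.1f} TF, DL: {dl:.0f} TF")
```

Under these assumptions the estimates round to the 15.7 TF SP, 7.8 TF DP and 125 TF DL listed for NVLink servers; the lower PCIe figures would correspond to a lower clock.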

40 TESLA PLATFORM FOR CLOUD PROVIDERS

41 CLOUD GPU DEMAND OUTSTRIPS SUPPLY
AWS Launches P2 Instance: "P2 instance is one of the fastest growing instance in AWS history." - Andrew Jassy, AWS CEO, re:Invent 2016
Azure Launches N-Series Preview: "We've had thousands of customers participate in the N-Series preview since we launched it back in August." - Corey Sanders, Director of Compute, Azure

42 GLOBAL CSP OFFERINGS
Compute
AWS P3: up to 8X V100 SXM2. Available only in N. Virginia, Oregon, Ireland, Tokyo (ec2/instance-types/p3/)
AWS P2: up to 8X K80, Physical cards (/ec2/instance-types/p2/)
GPU Server: up to 4X K80. GPU Server: up to 4X P100 PCIe, Public Beta available (/gpu/)
GPU Server: up to 2X K80, 1X P100 PCIe (In Bare-metal) (oudcomputing/bluemix/gpucomputing)
NC series: up to 2X K80. NC v2 & ND series: up to 4X P100 PCIe / 4X P40. Available only in US West 2 Region (en-us/pricing/details/virtualmachines/series/#n-series)
X7 shape: up to 2X P100 (In Bare-metal and VM). Available only in Ashburn region. Frankfurt to come in Jan (/infrastructure/compute)
Virtual W/S
AWS G3: M60
GPU Server: P100 PCIe vWS, private alpha available
GPU Server: P100 PCIe vWS, public beta Jan 18
GPU Server: up to 2X M60, 2X M10
GPU Server: M (en-us/pricing/details/virtualmachines/series/#n-series)
GPU Server: M60
Virtual PC
GPU Server: up to 4X K520, Physical cards
GPU Server: M10
VMware Horizon Air: vPC launch Jan

43 NVIDIA GPU CLOUD
AI and HPC Everywhere, For Everyone
Innovate in minutes, not weeks: Removes all the DIY complexity of DL and HPC software integration
Cross platform: Containers run locally on DGX Systems and TITAN PCs, or on cloud service provider GPU instances
Always up to date: Monthly updates by NVIDIA to ensure maximum performance
NVIDIA GPU Cloud integrates GPU-optimized deep learning frameworks, HPC apps, runtimes, libraries, and OS into a ready-to-run container, available at no charge

44 NVIDIA GPU CLOUD: SIMPLIFYING AI & HPC
DEEP LEARNING, HPC APPS, HPC VIZ

45 NGC GPU-OPTIMIZED DEEP LEARNING CONTAINERS
A Comprehensive Catalog of Deep Learning Software: NVCaffe, Caffe2, Microsoft Cognitive Toolkit (CNTK), DIGITS, MXNet, PyTorch, TensorFlow, Theano, Torch, CUDA (base level container for developers)
NEW! NVIDIA TensorRT inference accelerator with ONNX support

46 HPC APPS COMING TO NVIDIA GPU CLOUD

47 NVIDIA GPU CLOUD FOR HPC VISUALIZATION
UNIFIED VISUALIZATION FOR LARGE DATA SETS
Large-scale Volumetric Rendering, Physically Accurate Ray Tracing, Production-quality Images, Seamless integration with ParaView
ParaView with NVIDIA IndeX, ParaView with NVIDIA OptiX, ParaView with NVIDIA Holodeck
Early Access NOW. Sign up now at nvidia.com/gpu-cloud

48 TESLA PLATFORM FOR DEVELOPERS

50 HOW GPU ACCELERATION WORKS
Application Code = Compute-Intensive Functions (5% of Code, run on the GPU) + Rest of Sequential CPU Code (run on the CPU)

51 GPU ACCELERATED LIBRARIES
Drop-in Acceleration for Your Applications
DEEP LEARNING: cuDNN, TensorRT, DeepStream SDK
SIGNAL, IMAGE & VIDEO: cuFFT, NVIDIA NPP, CODEC SDK
LINEAR ALGEBRA and PARALLEL ALGORITHMS: cuBLAS, cuSPARSE, CUDA Math library, cuSOLVER, cuRAND, nvGRAPH, NCCL

52 CUDA TOOLKIT 9 UNLEASHES POWER OF VOLTA
Optimized for Volta: Tensor Cores, Second-Generation NVLink, HBM2 Stacked Memory
FASTER LIBRARIES: GEMM Optimizations for RNNs (cuBLAS); >20x Faster Image Processing (NPP); FFT Optimizations Across Various Sizes (cuFFT)
COOPERATIVE THREAD GROUPS: Flexible Thread Groups; Efficient Parallel Algorithms; Synchronize Across Thread Blocks in a Single GPU or Multi-GPUs
DEVELOPER TOOLS & PLATFORM UPDATES: 1.3x Faster Compiling; New OS and Compiler Support; Unified Memory Profiling; NVLink Visualization

53 WHAT IS OPENACC
OpenACC is a directives-based programming approach to parallel computing designed for performance and portability on CPUs and accelerators for HPC (OpenPOWER, Sunway, x86 CPU & Xeon Phi, NVIDIA GPU, PEZY-SC)
Add Simple Compiler Directive:
    main() {
        <serial code>
        #pragma acc kernels
        {
            <parallel code>
        }
    }
Read more at

54 OPENACC: EASY ONBOARD TO GPU COMPUTING
A Widely Adopted Directives Model for Parallel Programming
SIMPLE. POWERFUL. PORTABLE.
Platforms: POWER, Sunway, x86 CPU, x86 Xeon Phi, NVIDIA GPU, AMD, PEZY-SC
[Chart: AWE Hydrodynamics CloverLeaf mini-app (bm32 data set), Speedup vs Single Haswell Core, PGI OpenACC versus Intel/IBM OpenMP on Multicore Broadwell, Multicore POWER8, and 1x, 2x, 4x Volta V100. Bar labels include 10x, 11x, 11x and 77x.]
ADOPTED BY KEY HPC CODES
3 of Top 5 HPC Apps: ANSYS Fluent, VASP, Gaussian
5 CAAR Codes: GTC, XGC, ACME, FLASH, LSDalton
2017 Gordon Bell Finalist: CAM-SE on TaihuLight

55 OpenACC application results
LSDalton (Quantum Chemistry): 12X speedup in 1 week
Numeca (CFD): 10X faster kernels, 2X faster app
PowerGrid (Medical Imaging): 40 days to 2 hours
INCOMP3D (CFD): 3X speedup
NekCEM (Computational Electromagnetics): 2.5X speedup, 60% less energy
COSMO (Climate Weather): 40X speedup, 3X energy efficiency
CloverLeaf (CFD): 4X speedup, Single CPU/GPU code
MAESTRO, CASTRO (Astrophysics): 4.4X speedup, 4 weeks effort

56 OPENACC RESOURCES
Guides, Talks, Tutorials, Videos, Books, Spec, Code Samples, Teaching Materials, Events, Success Stories, Courses, Slack, Stack Overflow
Sections: Resources, Success Stories, FREE Compilers, Compilers and Tools, Events

57 NVIDIA DEEP LEARNING SDK
High performance GPU-acceleration for deep learning
Powerful tools and libraries for designing and deploying GPU-accelerated deep learning applications
High performance building blocks for training and deploying deep neural networks on NVIDIA GPUs
Industry vetted deep learning algorithms and linear algebra subroutines for developing novel deep neural networks
Multi-GPU and multi-node scaling that accelerates training on up to eight GPUs
developer.nvidia.com/deep-learning-software
"We are amazed by the steady stream of improvements made to the NVIDIA Deep Learning SDK and the speedups that they deliver." - Frédéric Bastien, Team Lead (Theano), MILA

58 NVIDIA COLLECTIVE COMMUNICATIONS LIBRARY (NCCL)
Multi-GPU and multi-node collective communication primitives
High-performance multi-GPU and multi-node collective communication primitives optimized for NVIDIA GPUs
Fast routines for multi-GPU multi-node acceleration that maximizes inter-GPU bandwidth utilization
Easy to integrate and MPI compatible. Uses automatic topology detection to scale HPC and deep learning applications over PCIe and NVLink
Accelerates leading deep learning frameworks such as Caffe2, Microsoft Cognitive Toolkit, MXNet, PyTorch and more
Multi-GPU: NVLink, PCIe. Multi-Node: InfiniBand verbs, IP Sockets. Automatic Topology Detection
[Chart: Near-Linear Multi-Node Scaling, Images/Second. Microsoft Cognitive Toolkit multi-node scaling performance (images/sec), NVIDIA DGX-1 + cuDNN 6 (FP32), ResNet50, Batch size: 64.]
developer.nvidia.com/nccl
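
A typical collective communication primitive in multi-GPU training is all-reduce: every GPU contributes a buffer and every GPU ends up with the element-wise sum. The sketch below simulates that result in pure Python using a ring schedule (reduce-scatter followed by all-gather), one well-known way to implement all-reduce. It does not call NCCL, the function name `ring_all_reduce` is hypothetical, and the slide does not say which algorithm NCCL selects for a given topology.

```python
def ring_all_reduce(buffers):
    """Simulate a ring all-reduce (sum) over equally sized per-rank buffers.

    Returns a new list of buffers; every rank ends with the element-wise sum.
    """
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers) or length % n != 0:
        raise ValueError("buffers must be equal length and divisible by rank count")
    chunk = length // n
    data = [list(b) for b in buffers]  # work on copies, inputs stay untouched

    def span(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Reduce-scatter: after n-1 steps rank r holds the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r - step) % n, data[r][span((r - step) % n)])
                 for r in range(n)]  # snapshot first: all sends happen at once
        for dst, c, payload in sends:
            data[dst][span(c)] = [x + y for x, y in zip(data[dst][span(c)], payload)]

    # All-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r + 1 - step) % n, data[r][span((r + 1 - step) % n)])
                 for r in range(n)]
        for dst, c, payload in sends:
            data[dst][span(c)] = payload
    return data


if __name__ == "__main__":
    gradients = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two simulated ranks
    print(ring_all_reduce(gradients))
```

Each rank sends and receives only one chunk per step, so the per-rank traffic stays roughly constant as ranks are added, which is why ring schedules are a common choice for gradient averaging.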

59 NVIDIA DIGITS
Interactive Deep Learning GPU Training System
Interactive deep learning training application for engineers and data scientists
Simplify deep neural network training with an interactive interface to train, validate, and visualize results
Built-in workflows for image classification, object detection and image segmentation
Improve model accuracy with pre-trained models from the DIGITS Model Store
Faster time to solution with multi-GPU acceleration
developer.nvidia.com/digits

60 NVIDIA cuDNN
Deep Learning Primitives
High performance building blocks for deep learning frameworks
Drop-in acceleration for widely used deep learning frameworks such as Caffe2, Microsoft Cognitive Toolkit, PyTorch, TensorFlow, Theano and others
Accelerates industry vetted deep learning algorithms, such as convolutions, LSTM RNNs, fully connected, and pooling layers
Fast deep learning training performance tuned for NVIDIA GPUs
developer.nvidia.com/cudnn
[Chart: Deep Learning Training Performance, Images/Second, for cuDNN 2 on 8x K80, cuDNN 4 on 8x Maxwell, cuDNN 6 + NCCL 1.6 on DGX-1, and cuDNN 7 + NCCL 2 on DGX-1V.]
"NVIDIA has improved the speed of cuDNN with each release while extending the interface to more operations and devices at the same time." - Evan Shelhamer, Lead Caffe Developer, UC Berkeley

61 NVIDIA TensorRT 3
Programmable Inference Accelerator
TensorRT: Compiler for Optimized Neural Networks
Input: Trained Neural Network. Output: Compiled & Optimized Neural Network
Optimizations: Weight & Activation Precision Calibration, Layer & Tensor Fusion, Kernel Auto-Tuning, Dynamic Tensor Memory, Multi-Stream Execution
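
"Weight & Activation Precision Calibration" refers to choosing scale factors so that values can be stored in lower precision such as INT8. The sketch below shows the general idea with the simplest possible rule, a symmetric scale taken from the largest absolute value in a calibration sample. The slide does not describe how TensorRT picks its scales, so this max-abs rule and the function names are illustrative assumptions, not TensorRT's method or API.

```python
def calibrate_scale(samples):
    """Pick a symmetric INT8 scale so the largest sample magnitude maps to 127."""
    max_abs = max(abs(x) for x in samples)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(x, scale):
    """Map a real value to an integer in [-127, 127], clipping out-of-range values."""
    q = round(x / scale)
    return max(-127, min(127, q))


def dequantize(q, scale):
    """Recover the approximate real value represented by a quantized integer."""
    return q * scale


if __name__ == "__main__":
    activations = [-63.5, 10.0, 0.25]  # made-up calibration sample
    scale = calibrate_scale(activations)
    for x in activations + [100.0]:
        q = quantize(x, scale)
        print(f"{x:>7} -> {q:>4} -> {dequantize(q, scale)}")
```

Values inside the calibrated range survive with a small rounding error, while values outside it (100.0 here) are clipped, which is why the choice of calibration data affects accuracy.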

62

ACCELERATED COMPUTING: THE PATH FORWARD. Jensen Huang, Founder & CEO SC17 Nov. 13, 2017

ACCELERATED COMPUTING: THE PATH FORWARD. Jensen Huang, Founder & CEO SC17 Nov. 13, 2017 ACCELERATED COMPUTING: THE PATH FORWARD Jensen Huang, Founder & CEO SC17 Nov. 13, 2017 COMPUTING AFTER MOORE S LAW Tech Walker 40 Years of CPU Trend Data 10 7 GPU-Accelerated Computing 10 5 1.1X per year

More information

ENDURING DIFFERENTIATION. Timothy Lanfear

ENDURING DIFFERENTIATION. Timothy Lanfear ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 Transistors (thousands) 1.1X per year 10 3 10 2 Single-threaded

More information

ENDURING DIFFERENTIATION Timothy Lanfear

ENDURING DIFFERENTIATION Timothy Lanfear ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING GPU-ACCELERATED PERFORMANCE 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 10 3 10 2 Single-threaded perf

More information

GPU ACCELERATED COMPUTING. 1 st AlsaCalcul GPU Challenge, 14-Jun-2016, Strasbourg Frédéric Parienté, Tesla Accelerated Computing, NVIDIA Corporation

GPU ACCELERATED COMPUTING. 1 st AlsaCalcul GPU Challenge, 14-Jun-2016, Strasbourg Frédéric Parienté, Tesla Accelerated Computing, NVIDIA Corporation GPU ACCELERATED COMPUTING 1 st AlsaCalcul GPU Challenge, 14-Jun-2016, Strasbourg Frédéric Parienté, Tesla Accelerated Computing, NVIDIA Corporation GAMING PRO ENTERPRISE VISUALIZATION DATA CENTER AUTO

More information

RECENT TRENDS IN GPU ARCHITECTURES. Perspectives of GPU computing in Science, 26 th Sept 2016

RECENT TRENDS IN GPU ARCHITECTURES. Perspectives of GPU computing in Science, 26 th Sept 2016 RECENT TRENDS IN GPU ARCHITECTURES Perspectives of GPU computing in Science, 26 th Sept 2016 NVIDIA THE AI COMPUTING COMPANY GPU Computing Computer Graphics Artificial Intelligence 2 NVIDIA POWERS WORLD

More information

MACHINE LEARNING WITH NVIDIA AND IBM POWER AI

MACHINE LEARNING WITH NVIDIA AND IBM POWER AI MACHINE LEARNING WITH NVIDIA AND IBM POWER AI July 2017 Joerg Krall Sr. Business Ddevelopment Manager MFG EMEA jkrall@nvidia.com A NEW ERA OF COMPUTING AI & IOT Deep Learning, GPU 100s of billions of devices

More information

A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017

A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017 A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 6 10 5 1.1X per year 10 4 10 3 10 2 1.5X per year Single-threaded

More information

SYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA GPUS

SYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA GPUS SYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA S Axel Koehler, Principal Solution Architect HPCN%Workshop%Goettingen,%14.%Mai%2018 NVIDIA - AI COMPUTING COMPANY Computer Graphics Computing Artificial Intelligence

More information

TESLA V100 PERFORMANCE GUIDE May 2018

TESLA V100 PERFORMANCE GUIDE May 2018 TESLA V100 PERFORMANCE GUIDE May 2018 TESLA V100 The Fastest and Most Productive GPU for AI and HPC Volta Architecture Tensor Core Improved NVLink & HBM2 Volta MPS Improved SIMT Model Most Productive GPU

More information

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder EFFICIENT INFERENCE WITH TENSORRT Han Vanholder AI INFERENCING IS EXPLODING 2 Trillion Messages Per Day On LinkedIn 500M Daily active users of iflytek 140 Billion Words Per Day Translated by Google 60

More information

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications TESLA V100 PERFORMANCE GUIDE Life Sciences Applications NOVEMBER 2017 TESLA V100 PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017

POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017 POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017 LIFE AFTER MOORE S LAW 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 Transistors (thousands) 1.1X per year 10 4 10 3 1.5X per year

More information

ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015

ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 COMMODITY DISRUPTS CUSTOM SOURCE: Top500 ACCELERATED COMPUTING: THE PATH FORWARD It s time to start

More information

A NEW COMPUTING ERA. DAVID B. KIRK, FELLOW NVIDIA AI Conference Singapore 2017

A NEW COMPUTING ERA. DAVID B. KIRK, FELLOW NVIDIA AI Conference Singapore 2017 A NEW COMPUTING ERA DAVID B. KIRK, FELLOW NVIDIA AI Conference Singapore 2017 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 5 1.1X per year 10 3 1.5X per year Single-threaded

More information

A NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA

A NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA A NEW COMPUTING ERA Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA THE ERA OF AI AI CLOUD MOBILE PC 2 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 5 1.1X

More information

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications TESLA P PERFORMANCE GUIDE HPC and Deep Learning Applications MAY 217 TESLA P PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

TESLA P100 PERFORMANCE GUIDE. Deep Learning and HPC Applications

TESLA P100 PERFORMANCE GUIDE. Deep Learning and HPC Applications TESLA P PERFORMANCE GUIDE Deep Learning and HPC Applications SEPTEMBER 217 TESLA P PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

World s most advanced data center accelerator for PCIe-based servers

World s most advanced data center accelerator for PCIe-based servers NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying

More information

IBM CORAL HPC System Solution

IBM CORAL HPC System Solution IBM CORAL HPC System Solution HPC and HPDA towards Cognitive, AI and Deep Learning Deep Learning AI / Deep Learning Strategy for Power Power AI Platform High Performance Data Analytics Big Data Strategy

More information

NVIDIA PLATFORM FOR AI

NVIDIA PLATFORM FOR AI NVIDIA PLATFORM FOR AI João Paulo Navarro, Solutions Architect - Linkedin i am ai HTTPS://WWW.YOUTUBE.COM/WATCH?V=GIZ7KYRWZGQ 2 NVIDIA Gaming VR AI & HPC Self-Driving Cars GPU Computing 3 GPU COMPUTING

More information

Accelerating High Performance Computing.

Accelerating High Performance Computing. Accelerating High Performance Computing http://www.nvidia.com/tesla Computing The 3 rd Pillar of Science Drug Design Molecular Dynamics Seismic Imaging Reverse Time Migration Automotive Design Computational

More information

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA STATE OF THE ART 2012 18,688 Tesla K20X GPUs 27 PetaFLOPS FLAGSHIP SCIENTIFIC APPLICATIONS

More information

GPU FOR DEEP LEARNING. 周国峰 Wuhan University 2017/10/13

GPU FOR DEEP LEARNING. 周国峰 Wuhan University 2017/10/13 GPU FOR DEEP LEARNING chandlerz@nvidia.com 周国峰 Wuhan University 2017/10/13 Why Deep Learning Boost Today? Nvidia SDK for Deep Learning? Agenda CUDA 8.0 cudnn TensorRT (GIE) NCCL DIGITS 2 Why Deep Learning

More information

TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC

TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC TERATECH Juin 2017 Gunter Roth, François Courteille DRAMATIC

More information

NVIDIA GPU TECHNOLOGY UPDATE

NVIDIA GPU TECHNOLOGY UPDATE NVIDIA GPU TECHNOLOGY UPDATE May 2015 Axel Koehler Senior Solutions Architect, NVIDIA NVIDIA: The VISUAL Computing Company GAMING DESIGN ENTERPRISE VIRTUALIZATION HPC & CLOUD SERVICE PROVIDERS AUTONOMOUS

More information

DGX UPDATE. Customer Presentation Deck May 8, 2017

DGX UPDATE. Customer Presentation Deck May 8, 2017 DGX UPDATE Customer Presentation Deck May 8, 2017 NVIDIA DGX-1: The World s Fastest AI Supercomputer FASTEST PATH TO DEEP LEARNING EFFORTLESS PRODUCTIVITY REVOLUTIONARY AI PERFORMANCE Fully-integrated

More information

Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK

Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK 17 May 2016, Melbourne 24 May 2016, Sydney Werner Scholz, CTO and Head of R&D, XENON Systems Mike Wang, Solutions Architect,

More information

DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017

DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017 DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE Dennis Lui August 2017 THE RISE OF GPU COMPUTING APPLICATIONS 10 7 10 6 GPU-Computing perf 1.5X per year 1000X by 2025 ALGORITHMS 10 5 1.1X

More information

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA Inference Optimization Using TensorRT with Use Cases Jack Han / 한재근 Solutions Architect NVIDIA Search Image NLP Maps TensorRT 4 Adoption Use Cases Speech Video AI Inference is exploding 1 Billion Videos

More information

NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS

NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS TECHNICAL OVERVIEW NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS A Guide to the Optimized Framework Containers on NVIDIA GPU Cloud Introduction Artificial intelligence is helping to solve some of the most

More information

NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI

NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI Overview Unparalleled Value Product Portfolio Software Platform From Desk to Data Center to Cloud Summary AI researchers depend on computing performance to gain

More information

NEW NVIDIA PLATFORM FOR AI

NEW NVIDIA PLATFORM FOR AI NEW NVIDIA PLATFORM FOR AI Pedro Mario Cruz e Silva (pcruzesilva@nvidia.com) LinkedIn Solution Architect Manager Enterprise Latin America Global Oil & Gas Team "GTC 2017: 'I AM AI' OPENING IN KEYNOTE"

More information

19. prosince 2018 CIIRC Praha. Milan Král, IBM Radek Špimr

19. prosince 2018 CIIRC Praha. Milan Král, IBM Radek Špimr 19. prosince 2018 CIIRC Praha Milan Král, IBM Radek Špimr CORAL CORAL 2 CORAL Installation at ORNL CORAL Installation at LLNL Order of Magnitude Leap in Computational Power Real, Accelerated Science ACME

More information

GPUs and the Future of Accelerated Computing Emerging Technology Conference 2014 University of Manchester

NVIDIA GPU Computing: A Revolution in High Performance Computing. John Ashley, Senior Solutions Architect

DEEP LEARNING ALISON B LOWNDES. Deep Learning Solutions Architect & Community Manager EMEA

THE GPU-ACCELERATED WORLD: HPC, DEEP LEARNING, PC, VIRTUALIZATION, CLOUD, GAMING, RENDERING. Why is Deep Learning

Building the Most Efficient Machine Learning System

Mellanox, The Artificial Intelligence Interconnect Company. June 2017. Mellanox Overview. Company Headquarters: Yokneam, Israel; Sunnyvale, California. Worldwide

IBM Deep Learning Solutions

Reference Architecture for Deep Learning on POWER8, P100, and NVLink. October 2016. How do you teach a computer to perceive? Deep Learning: teaching Siri to recognize a bicycle

The Exascale Era Has Arrived

Technology Spotlight. Sponsored by NVIDIA. Steve Conway, Earl Joseph, Bob Sorensen, and Alex Norton. November 2018. EXECUTIVE SUMMARY: Earlier this year, scientists broke the exascale

S8497 - INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS

Chris Lamb, CUDA and NGC Engineering, NVIDIA; John Barco, NGC Product Management, NVIDIA. AGENDA: NVIDIA GPU Cloud (NGC) overview, Using NGC

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

Accelerated computing is revolutionizing the economics of the data center. HPC and hyperscale customers deploy accelerated

S8765 Performance Optimization for Deep Learning on the Latest POWER Systems

Khoa Huynh, Senior Technical Staff Member (STSM), IBM; Jonathan Samn, Software Engineer, IBM. Evolving from compute systems to

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

Accelerated computing is revolutionizing the economics of the data center. HPC, enterprise and hyperscale customers deploy

Deep Learning with PowerAI - An Overview

Stephen Lutz, Open Group Master Certified IT Specialist, Technical Sales, IBM Cognitive Infrastructure, IBM Germany. Stephen.Lutz@de.ibm.com. What's that? and what's

DGX SYSTEMS: DEEP LEARNING FROM DESK TO DATA CENTER. Markus Weber and Haiduong Vo

NVIDIA DGX SYSTEMS. Agenda: NVIDIA DGX-1, NVIDIA DGX STATION. ONE YEAR LATER: NVIDIA DGX-1. Barriers Toppled, the Unsolvable

GTC 2018. Jensen Huang, Founder & CEO

SCREEN-SPACE AMBIENT OCCLUSION, BAKED LIGHTING, GLOBAL ILLUMINATION, SCREEN-SPACE REFLECTIONS, ENVIRONMENT MAPS, RAY TRACED REFLECTIONS, SCREEN-SPACE REFRACTION

HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov

Deep learning ecosystem today: Software, Hardware. HPE's portfolio for deep learning: Government,

GPU-Accelerated Deep Learning

July 6th, 2016. Greg Heinrich. Credits: Alison B. Lowndes, Julie Bernauer, Leo K. Tam. PRACTICAL DEEP LEARNING EXAMPLES: Image Classification, Object Detection, Localization,

NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG

Valid Through July 31, 2018. INTRODUCTION: The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial

SUPERCHARGE DEEP LEARNING WITH DGX-1. Markus Weber SC16 - November 2016

NVIDIA Pioneered GPU Computing. Founded 1993, $7B, 9,500 Employees. 100M NVIDIA GeForce Gamers: The world's largest gaming platform. Pioneering

WHAT'S NEW IN CUDA 8. Siddharth Sharma, Oct 2016

Why Should You Care: >2X Run Computations Faster*, Solve Larger Problems**, Critical Path Analysis. * HOOMD Blue v1.3.3 Lennard-Jones liquid

April 4-7, 2016, Silicon Valley. INSIDE PASCAL. Mark Harris, October 27, 2016

@harrism. INTRODUCING TESLA P100: New GPU Architecture to Enable the World's Fastest Compute Node. CPU to CPU, PCIe Switch, PCIe Switch

Deep Learning: Transforming Engineering and Science The MathWorks, Inc.

2015 The MathWorks, Inc. A NEW ERA OF; THE RISE OF GPU COMPUTING. NVIDIA IS THE WORLD'S; A NEW ERA

Hybrid Computing @ KAUST: Many Cores and OpenACC. Alain Clo - KAUST Research Computing, Saber Feki - KAUST Supercomputing Lab, Florent Lebeau - CAPS

Agenda. Hybrid Computing: Hybrid Computing; From Multi-Physics

S8901 Quadro for AI, VR and Simulation

Carl Flygare, PNY Quadro Product Marketing Manager; Allen Bourgoyne, NVIDIA Senior Product Marketing Manager. The question of whether a computer can think is no more

NVIDIA DEEP LEARNING INSTITUTE

TRAINING CATALOG. Valid Through July 31, 2018. INTRODUCTION: The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial

NVIDIA Update and Directions on GPU Acceleration for Earth System Models

Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA; Carl Ponder, PhD, Applications Software Engineer, NVIDIA,

GPU COMPUTING AND THE FUTURE OF HPC. Timothy Lanfear, NVIDIA

Exascale Computing will Enable Transformational Science Results. First-principles simulation of combustion for new high-efficiency, low-emission

STRATEGIES TO ACCELERATE VASP WITH GPUS USING OPENACC. Stefan Maintz, Dr. Markus Wetzstein

smaintz@nvidia.com; mwetzstein@nvidia.com. VASP USERS AND USAGE: Companies, Academia. 12-25% of CPU cycles @ supercomputing

CUDA: NEW AND UPCOMING FEATURES

May 8-11, 2017, Silicon Valley. Stephen Jones, GTC 2018. CUDA ECOSYSTEM 2018. CUDA DOWNLOADS IN 2017: 3,500,000. CUDA REGISTERED DEVELOPERS: 800,000. GTC ATTENDEES: 8,000+. CUDA

HPC and AI Solution Overview. Garima Kochhar HPC and AI Innovation Lab

Dell EMC HPC and DL team charter: Design, develop and integrate HPC and DL systems

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

Table of Contents: The Accelerated Data Center; Optimizing Data Center Productivity; Same Throughput with Fewer Server Nodes

Accelerated Platforms: The Future of Computing. Marc Hamilton, VP Solutions Architecture & Engineering, NVIDIA Korea AI Conference 2018

Forces Shaping Computing (chart: GPU PERFORMANCE vs. CPU PERFORMANCE)

Interconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, November 2017

InfiniBand Accelerates Majority of New Systems on TOP500. InfiniBand connects 77% of new HPC

GPU COMPUTING AND THE FUTURE OF HPC. Timothy Lanfear, NVIDIA

Power-constrained Computers: ~1 W, ~3 W, ~100 W, ~30 W, 1 kW, 100 kW, 20 MW. EXASCALE COMPUTING WILL ENABLE TRANSFORMATIONAL SCIENCE RESULTS: First-principles

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Fernanda Foertter, HPC User Assistance Group, Training Lead, foertterfs@ornl.gov. This research used resources of the Oak Ridge Leadership

Steve Scott, Tesla CTO SC 11 November 15, 2011

What goal do these products have in common? Performance / W. Exaflop Expectations: First Exaflop Computer; K Computer ~10 MW; CM5 ~200 kW. Not constant size, cost

Stan Posey, NVIDIA, Santa Clara, CA, USA

Stan Posey, sposey@nvidia.com. NVIDIA Strategy for CWO Modeling (Since 2010). Initial focus: CUDA applied to climate models and NWP research. Opportunities to refactor code with

PERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015

TWO TYPES OF PORTABILITY. FUNCTIONAL PORTABILITY: The ability for a single code to run anywhere. PERFORMANCE PORTABILITY: The ability

CME 213 Spring 2017. Eric Darve

Summary of previous lectures. Pthreads: low-level multi-threaded programming. OpenMP: simplified interface based on #pragma, adapted to scientific computing. OpenMP for and

The Tesla Accelerated Computing Platform

Axel Koehler, Principal Solution Architect. HPC Advisory Council Meeting, Lugano, 22 March 2016. Agenda: Introduction, TESLA Platform for HPC, TESLA Platform for HYPERSCALE

S7750 - THE MAKING OF DGX SATURNV: BREAKING THE BARRIERS TO AI SCALE. Presenter: Louis Capps, Solution Architect, NVIDIA

lcapps@nvidia.com. A TALE OF ENLIGHTENMENT. Basic: OK List 10 for x = 1 to 3 20 print

NVIDIA FOR DEEP LEARNING. Bill Veenhuis

bveenhuis@nvidia.com. NVIDIA is the world's leading AI platform. ONE ARCHITECTURE: CUDA. GPU: Perfect Companion for Accelerating Apps & A.I. (CPU, GPU). AGENDA: Intro to AI

Cisco UCS C480 ML M5 Rack Server Performance Characterization

White Paper. The Cisco UCS C480 ML M5 Rack Server platform is designed for artificial intelligence and machine-learning workloads. 2018 Cisco

Fast Hardware For AI

Karl Freund, karl@moorinsightsstrategy.com, Sr. Analyst, AI and HPC, Moor Insights & Strategy. Follow my blogs covering Machine Learning Hardware on Forbes: http://www.forbes.com/sites/moorinsights

HPC with the NVIDIA Accelerated Computing Toolkit Mark Harris, November 16, 2015

Accelerators Surge in World's Top Supercomputers. Chart: Top500, # of Accelerated Supercomputers. 100+ accelerated systems

INTRODUCTION TO OPENACC. Analyzing and Parallelizing with OpenACC, Feb 22, 2017

Objective: Enable you to accelerate your applications with OpenACC. Today's Objectives: Understand what OpenACC is and

INTRODUCING THE DGX FAMILY. Marc Domenech May 8, 2017

NVIDIA Pioneered GPU Computing. Founded 1993, $7B, 9,500 Employees. 100M NVIDIA GeForce Gamers: The world's largest gaming platform. Pioneering AI computing

TESLA ACCELERATED COMPUTING. Mike Wang Solutions Architect NVIDIA Australia & NZ

mikewang@nvidia.com. GAMING, DESIGN, ENTERPRISE VIRTUALIZATION, HPC & CLOUD SERVICE PROVIDERS, AUTONOMOUS MACHINES. PC, DATA CENTER

DEEP NEURAL NETWORKS AND GPUS. Julie Bernauer

GPU Computing: Run Computations on GPUs. x86. CUDA: Framework to Program NVIDIA GPUs. A simple sum of two vectors (arrays) in C: void vector_add(int

VSC Users Day 2018 Start to GPU Ehsan Moravveji

Outline: A brief intro; Available GPUs at VSC; GPU architecture; Benchmarking tests; General Purpose GPU Programming Models. Image courtesy of Nvidia.com. Generally

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager

Deep learning applications: Vision, Speech, Text, Other. Search & information extraction, Security/Video surveillance

S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer

Over 100x faster, is it really possible? DOUGLAS ADAMS BABEL FISH: Neural Machine Translation Unit. OVER 100X FASTER, IS IT REALLY POSSIBLE?

Scaling in a Heterogeneous Environment with GPUs: GPU Architecture, Concepts, and Strategies

John E. Stone, Theoretical and Computational Biophysics Group, Beckman Institute for Advanced Science and Technology

Building NVLink for Developers

Unleashing programmatic, architectural and performance capabilities for accelerated computing. Why NVLink™? Simpler, Better and Faster. Simplified Programming: No specialized

NVIDIA TESLA V100 GPU ARCHITECTURE: THE WORLD'S MOST ADVANCED DATA CENTER GPU

WP-08608-001_v1.1, August 2017. TABLE OF CONTENTS: Introduction to the NVIDIA Tesla V100 GPU Architecture...

NOVEL GPU FEATURES: PERFORMANCE AND PRODUCTIVITY. Peter Messmer

pmessmer@nvidia.com. COMPUTATIONAL CHALLENGES IN HEP: Low-Level Trigger High-Level Trigger Monte Carlo Analysis Lattice QCD. COMPUTATIONAL

GPU Developments for the NEMO Model. Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA

NVIDIA HPC AND ESM UPDATE. TOPICS OF DISCUSSION. GPU PROGRESS ON NEMO MODEL. NVIDIA GPU

IBM Power AC922 Server

The Best Server for Enterprise AI. Highlights: More accuracy - GPUs access system RAM for larger models. Faster insights - significant deep learning speedups. Rapid deployment - integrated

CafeGPI. Single-Sided Communication for Scalable Deep Learning

Janis Keuper, itwm.fraunhofer.de/ml, Competence Center High Performance Computing, Fraunhofer ITWM, Kaiserslautern, Germany. Deep Neural Networks

GPUS FOR NGVLA. M Clark, April 2015

GAMING, DESIGN, ENTERPRISE VIRTUALIZATION, HPC & CLOUD SERVICE PROVIDERS, AUTONOMOUS MACHINES. PC, DATA CENTER, MOBILE. The World Leader in Visual Computing. What is a GPU? Tesla K40

Turing Architecture and CUDA 10 New Features. Minseok Lee, Developer Technology Engineer, NVIDIA

Turing Architecture: New SM Architecture, Multi-Precision Tensor Core, RT Core. Turing MPS. Inference Accelerated,

Interconnect Your Future

Paving the Path to Exascale. November 2017. Mellanox Accelerates Leading HPC and AI Systems: Summit CORAL System, Sierra CORAL System, Fastest Supercomputer in Japan, Fastest Supercomputer

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

Office of Science. Buddy Bland, Project Director, Oak Ridge Leadership Computing Facility. November 13, 2012. ORNL's Titan Hybrid

Unified Deep Learning with CPU, GPU, and FPGA Technologies

Allen Rush (1), Ashish Sirasao (2), Mike Ignatowski (1). 1: Advanced Micro Devices, Inc., 2: Xilinx, Inc. Abstract: Deep learning and complex machine

NVIDIA DEEP LEARNING PLATFORM

TECHNICAL OVERVIEW. Giant Leaps in Performance and Efficiency for AI Services, From the Data Center to the Network's Edge. Introduction: Artificial intelligence (AI), the dream

Introduction to High Performance Computing. Shaohao Chen Research Computing Services (RCS) Boston University

Outline: What is HPC? Why computer cluster? Basic structure of a computer cluster. Computer performance

Introduction to GPU Computing. 周国峰 Wuhan University 2017/10/13

chandlerz@nvidia.com. GPU and Its Application. 3 Ways to Develop Your GPU APP. An Example to Show the Developments. Add GPUs: Accelerate Science

Accelerating GPU inferencing with DirectML and DirectX 12. Shrinath Shanbhag, Senior Software Engineer, Microsoft Corporation

Machine Learning: Machine learning has become immensely popular over the last decade

INVESTOR UPDATE. September 2018

SAFE HARBOR. Forward-Looking Statements: Except for the historical information contained herein, certain matters in this presentation including, but not limited to, statements
