TESLA PLATFORM. Jan 2018
- Shanon Green
Transcription
1 TESLA PLATFORM Jan 2018
2 A NEW ERA OF COMPUTING. AI & IoT: deep learning, GPU; 100s of billions of devices. MOBILE-CLOUD: iPhone, Amazon AWS; 2.5 billion mobile users. PC INTERNET: WinTel, Yahoo!; 1 billion PC users.
3 NVIDIA: THE AI COMPUTING COMPANY. GPU Computing, Computer Graphics, Artificial Intelligence.
4 RISE OF GPU COMPUTING. GPU-computing perf: 1.5X per year, 1000X by 2025, driven by APPLICATIONS, ALGORITHMS, SYSTEMS, and the CUDA ARCHITECTURE. [Plot: GPU-computing performance against single-threaded performance over time; axis ticks and the single-threaded growth rates were not preserved.] Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected by K. Rupp.
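The two figures on this slide, 1.5X per year and 1000X by 2025, are linked by compound growth. A minimal sketch of that arithmetic follows; the function names are illustrative and not from the deck, and the slide does not state which year the curve starts from.

```python
import math


def years_to_reach(target_multiple: float, yearly_factor: float) -> float:
    """Years of compound growth at `yearly_factor` needed to reach `target_multiple`."""
    return math.log(target_multiple) / math.log(yearly_factor)


def growth_over(years: float, yearly_factor: float) -> float:
    """Total growth multiple after `years` of compounding at `yearly_factor`."""
    return yearly_factor ** years


# Figures taken from the slide: 1.5X per year, 1000X overall.
gpu_years = years_to_reach(1000, 1.5)
print(f"1.5X per year reaches 1000X after about {gpu_years:.1f} years")
```

Under these figures, a 1000X gain takes roughly 17 years of sustained 1.5X annual improvement.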
5 ELEVEN YEARS OF GPU COMPUTING. World's First Atomic Model of HIV Capsid; GPU-Trained AI Machine Beats World Champion in Go; Oak Ridge Deploys World's Fastest Supercomputer w/ GPUs; Fermi: World's First HPC GPU; AlexNet Beats Expert Code by Huge Margin Using GPUs; Stanford Builds AI Machine Using GPUs; Google Outperforms Humans in ImageNet; Top 13 Greenest Supercomputers Powered by NVIDIA GPUs; CUDA Launched; World's First GPU Top500 System; Discovered How H1N1 Mutates to Resist Drugs; World's First 3-D Mapping of Human Genome.
6 TESLA PLATFORM. World's Leading Data Center Platform for Accelerating HPC and AI. APPLICATIONS: Internet services, enterprise applications, HPC (+450 applications); industries shown: automotive, retail, healthcare, manufacturing, finance, defense. INDUSTRY FRAMEWORKS & TOOLS: frameworks, ecosystem tools. NVIDIA SDK (Deep Learning SDK, ComputeWorks): cuDNN, TensorRT, NCCL, cuBLAS, cuSPARSE, DeepStream SDK, CUDA C/C++, Fortran. TESLA GPU & SYSTEMS: Tesla GPU, NVIDIA DGX-1, NVIDIA HGX-1, system OEM, cloud.
7 MOST ADOPTED PLATFORM FOR ACCELERATING HPC. All Top 15 HPC Apps Accelerated: VASP, AMBER, NAMD, GROMACS, Gaussian, Simulia Abaqus, WRF, OpenFOAM, ANSYS, LS-DYNA, BLAST, LAMMPS, ANSYS Fluent, Quantum Espresso, GAMESS. 14X GPU developers; 500+ GPU-accelerated applications. DEFINING THE NEXT GIANT WAVE IN HPC: Oak Ridge Summit, the US's next fastest supercomputer (200+ petaflops HPC; 3+ exaflops of AI); ABCI Supercomputer (AIST), Japan's fastest AI supercomputer; Piz Daint, Europe's fastest supercomputer.
8 MOST ADOPTED PLATFORM FOR ACCELERATING AI. 25X companies engaged. Every deep learning framework accelerated. Available everywhere: cloud services, systems, desktops.
9 TESLA PLATFORM FOR HPC
10 ARCHITECTING MODERN DATACENTERS: BIG INEFFICIENCIES WITH CPU NODES. Single GPU Server 3.5x Faster than the Largest CPU Data Center. AMBER Simulation of CRISPR, Nature's Tool for Genome Editing. [Chart: ns/day against # of CPUs; a node with 4x V100 GPUs compared with 48 CPU nodes of the Comet Supercomputer.] AMBER 16 Pre-release, CRISPR based on PDB ID 5f9r, 336,898 atoms. CPU: Dual Socket Intel E5-2680v3 12 cores, 128 GB DDR4 per node, FDR IB.
11 WEAK NODES: Lots of Nodes Interconnected with Vast Network Overhead. STRONG NODES: Few Lightning-Fast Nodes with Performance of Hundreds of Weak Nodes. [Diagram labels: Network Fabric, Server Racks.]
12 ARCHITECTING MODERN DATACENTERS. Strong Core CPU for Sequential Code. Volta: 5,120 CUDA Cores; 125 TFLOPS Tensor Core. NVLink for Strong Scaling.
13 70% OF THE WORLD'S SUPERCOMPUTING WORKLOAD ACCELERATED. Top 15 HPC Applications: VASP, AMBER, NAMD, GROMACS, Gaussian, Simulia Abaqus, WRF, OpenFOAM, ANSYS, LS-DYNA, BLAST, LAMMPS, ANSYS Fluent, Quantum Espresso, GAMESS. 500+ Accelerated Applications. Source: Intersect360 Research, Nov 2017, HPC Application Support for GPU Computing.
14 GPU-ACCELERATED HPC APPLICATIONS. 500+ APPLICATIONS.
  LIFE SCIENCES: 50+ apps, including Gaussian, VASP, AMBER, HOOMD-Blue, GAMESS
  MFG, CAD, & CAE: 111 apps, including Ansys Fluent, Abaqus SIMULIA, AutoCAD, CST Studio Suite
  PHYSICS: 20 apps, including QUDA, MILC, GTC-P
  OIL & GAS: 17 apps, including RTM, SPECFEM 3D
  CLIMATE & WEATHER: 4 apps, including Cosmos, Gales, WRF
  DEEP LEARNING: 32 apps, including Caffe2, MXNet, TensorFlow
  MEDIA & ENT.: 142 apps, including DaVinci Resolve, Premiere Pro CC, Redshift Renderer
  FEDERAL & DEFENSE: 13 apps, including ArcGIS Pro, EVNI, SocetGXP
  DATA SCI. & ANALYTICS: 23 apps, including MapD, Kinetica, Graphistry
  SAFETY & SECURITY: 15 apps, including Cyllance, FaceControl, Syndex Pro
  COMP. FINANCE: 16 apps, including O-Quant Options Pricing, MUREX, MISYS
  TOOLS & MGMT.: 15 apps, including Bright Cluster Manager, HPCtoolkit, Vampir
15 DEEP LEARNING COMES TO HPC. Workflow stages: SIMULATION (FP64/FP32) -> TRAINING (FP32/FP16) -> REGRESSION TESTING (FP16/INT8) -> INFERENCE (FP16/INT8). [Diagram labels: new data, training set, regression set, errors.]
16 AI ACCELERATES SCIENTIFIC DISCOVERY. UIUC & NCSA, astrophysics: 5,000X LIGO signal processing. U. Florida & UNC, drug discovery: 300,000X molecular energetics prediction. SLAC, astrophysics: gravitational lensing, from weeks to 10 ms. Princeton & ITER, clean energy: 50% higher accuracy for fusion sustainment. U.S. DoE, particle physics: 33% more accurate neutrino detection. U. Pitt, drug discovery: 35% higher accuracy for protein scoring.
17 ONE PLATFORM BUILT FOR BOTH DATA SCIENCE & COMPUTATIONAL SCIENCE. Tesla Platform and CUDA: accelerating AI, accelerating HPC.
18 DRAMATICALLY MORE FOR YOUR MONEY. Save Up To $8M With Each GPU-Accelerated Rack. EQUAL THROUGHPUT WITH FEWER RACKS (# of racks, ~30 kW per rack): 1 rack ($0.8M) with 36 CPUs + 72 V100s matches 5 racks ($2.0M, 360 CPUs) on RTM, 14 racks ($6.0M, 1152 CPUs) on VASP, and 22 racks ($9.2M, 1764 CPUs) on ResNet-50 (DL training). BUDGET: SMALLER, EFFICIENT: GPU-accelerated rack: compute servers 85%, non-compute 15%; traditional racks: compute servers 39%, non-compute 61% (rack, cabling, infrastructure, networking). Source: Traditional Data Centers Cost model by Microsoft Research on Datacenter Costs.
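The headline saving on this slide can be checked from the rack prices it lists. The sketch below is only arithmetic on the slide's own dollar figures; the pairing of each workload with a rack count is read from the order of the labels in the transcript and should be treated as an assumption, as should the names used in the code.

```python
# Rack prices in millions of USD, as listed on the slide.
GPU_RACK_COST_MUSD = 0.8

# workload -> (CPU racks for equal throughput, total cost in $M); pairing assumed from label order.
CPU_EQUIVALENTS = {
    "RTM": (5, 2.0),
    "VASP": (14, 6.0),
    "ResNet-50 (DL training)": (22, 9.2),
}


def rack_savings(cpu_cost_musd: float, gpu_cost_musd: float = GPU_RACK_COST_MUSD) -> float:
    """Savings in $M when one GPU rack replaces the CPU racks, rounded to one decimal."""
    return round(cpu_cost_musd - gpu_cost_musd, 1)


for workload, (racks, cost) in CPU_EQUIVALENTS.items():
    print(f"{workload}: {racks} CPU racks (${cost}M) vs 1 GPU rack, saves ${rack_savings(cost)}M")
```

The largest case, ResNet-50 training, gives $8.4M, which matches the "up to $8M" claim.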
19 DATA CENTER SAVINGS FOR MIXED WORKLOADS. 5X Better HPC TCO for Same Throughput. SAME THROUGHPUT: 1/3 the cost, 1/4 the space, 1/5 the power. 12 accelerated servers w/ 4 V100 GPUs (20 kW) vs. 160 self-hosted servers (96 kW). Mixed workload: Materials Science (VASP), Life Sciences (AMBER), Physics (MILC), Deep Learning (ResNet-50).
20 TESLA V100. The Fastest and Most Productive GPU for AI and HPC. Volta Architecture: most productive GPU. Tensor Core: 125 programmable TFLOPS for deep learning. Improved NVLink & HBM2: efficient bandwidth. Volta MPS: inference utilization. Improved SIMT Model: new algorithms.
21 VOLTA TO FUEL SUMMIT. Next Milestone in AI Supercomputing. AI Exascale Today: 3+ EFLOPS tensor ops. Performance Leadership: 10X perf over Titan (chart labels: 200 PF, 20 PF). Accelerated Science: 5-10X application perf over Titan; ACME, DIRAC, FLASH, GTC, HACC, LSDALTON, NAMD, NUCCOR, NWCHEM, QMCPACK, RAPTOR, SPECFEM, XGC.
22 BREAKTHROUGH EFFICIENCY ON THE PATH TO EXASCALE. 13/13 Greenest Supercomputers Powered by Tesla P100: TSUBAME 3.0, Kukai, AIST AI Cloud, RAIDEN GPU subsystem, Piz Daint, Wilkes-2, GOSAT-2 (RCF2), DGX Saturn V, Reedbush-H, JADE, Facebook Cluster, Cedar, DAVIDE. [Chart "Ahead of the Curve": GFLOPS per watt for top GPU systems in the Green500 list with measured performance and NVIDIA projections for V100, plotted against the exascale goal; data points not preserved.]
23 POWER OF GPU COMPUTING PLATFORM. Delivered Value Grows Over Time. [Charts: AMBER performance (ns/day) on K20 (2013), K40 (2014), K80 (2015), P100 (2016), and V100 (2017) across AMBER 12, 14, and 16 and successive CUDA releases; GoogLeNet performance (images/s) on 8X K80 (2014), 8X Maxwell (2015), DGX-1 (2016), and DGX-1V (2017) across cuDNN 2 through cuDNN 7, CUDA 6 through CUDA 9, and NCCL releases; bar values not preserved.] Amber dataset: Cellulose NVE; GoogLeNet dataset: ImageNet.
24 TESLA PLATFORM FOR AI
25 AI REVOLUTIONIZING OUR WORLD. Search, Assistants, Translation, Recommendations, Shopping, Photos. Detect, Diagnose and Treat Diseases. Powering Breakthroughs in Agriculture, Manufacturing, EDA.
26 NEURAL NETWORK COMPLEXITY IS EXPLODING. Bigger and More Compute Intensive. Image (GOP * Bandwidth), 350X: AlexNet, GoogleNet, ResNet-50, Inception-v2, Inception-v4. Speech (GOP * Bandwidth), 30X: DeepSpeech, DeepSpeech 2, DeepSpeech 3. Translation (GOP * Bandwidth), 10X: OpenNMT, GNMT, MoE.
27 PLATFORM BUILT FOR AI. Delivering 125 TFLOPS of DL Performance with Volta. TENSOR CORE MATRIX DATA OPTIMIZATION: dense matrix of Tensor compute. TENSOR-OP CONVERSION: FP32 to Tensor Op data for frameworks. VOLTA-OPTIMIZED cuDNN. VOLTA TENSOR CORE: 4x4 matrix processing array, D[FP32] = A[FP16] * B[FP16] + C[FP32], optimized for deep learning. ALL MAJOR FRAMEWORKS.
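The Tensor Core operation on this slide, D[FP32] = A[FP16] * B[FP16] + C[FP32] on 4x4 tiles, can be emulated on a CPU to show where the rounding happens. This is a rough sketch using only the standard library; the helper names are made up for illustration, and the accumulation order and per-step rounding are assumptions rather than a description of the hardware.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def tensor_core_mma(a, b, c):
    """Emulate D = A * B + C on 4x4 tiles: FP16 inputs, FP32 accumulation (assumed order)."""
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                # Inputs are rounded to FP16; the running sum is kept in FP32.
                acc = to_fp32(acc + to_fp16(a[i][k]) * to_fp16(b[k][j]))
            d[i][j] = acc
    return d


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b_tile = [[i * 4 + j + 0.5 for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]
d_tile = tensor_core_mma(identity, b_tile, ones)
print(d_tile[0])
print(to_fp16(0.1))
```

The second print shows the cost of the FP16 inputs: 0.1 is stored as 0.0999755859375, while the FP32 accumulator keeps the sum from losing further precision at each step.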
28 GPU DEEP LEARNING IS A NEW COMPUTING MODEL. TRAINING: billions of trillions of operations; GPUs train larger models and accelerate time to market. [Diagram stages: Training, Datacenter, Device.]
29 REVOLUTIONARY AI PERFORMANCE. 3X Faster DL Training Performance. [Chart: Exponential Performance over time (GoogleNet), speedup vs K80 on a 0x to 100x axis: 1x K80 cuDNN2, 4x M40 cuDNN3, 8x P100 cuDNN6, 8x V100 cuDNN7; quarters shown: Q1 15, Q3 15, Q2 16, Q2 17.] Over 80X DL Training Performance in 3 Years: GoogleNet training performance on versions of cuDNN vs 1x K80 cuDNN2. [Chart: Relative Time to Train Improvements (LSTM): 2X CPU, 15 days; 1X P100, hours value not preserved; 1X V100, 6 hours.] 3X Reduction in Time to Train Over P100: Neural Machine Translation training for 13 epochs, German -> English, WMT15 subset. CPU = 2x Xeon E V4.
30 NVIDIA GPUS POWER WORLD'S FASTEST DEEP LEARNING PERFORMANCE. Time to train ResNet-50: 60 mins (Facebook, June '17), 48 mins (IBM, Aug '17), 15 mins (Preferred Networks, Nov), all on Tesla P100 GPUs; GPU counts not preserved. ResNet-50, dataset: ImageNet, trained for 90 epochs.
31 GPU DEEP LEARNING IS A NEW COMPUTING MODEL. DATACENTER INFERENCING: 10s of billions of image, voice, video queries per day; GPU inference for fast response, maximize datacenter throughput. [Diagram stages: Training, Datacenter, Device.]
32 NVIDIA TENSORRT: PROGRAMMABLE INFERENCE ACCELERATOR. TensorRT targets: Tesla P4, Jetson TX2, DRIVE PX 2, NVIDIA DLA, Tesla V100.
33 NVIDIA TENSORRT 3. World's Fastest Inference Platform. [Chart, IMAGES: ResNet-50 throughput in images/sec (target 7 ms latency) for CPU + TensorFlow, V100 + TensorFlow, and V100 + TensorRT. Chart, TRANSLATION: OpenNMT throughput in sentences/sec (target 200 ms latency) for CPU + Torch, V100 + Torch, and V100 + TensorRT. Bar values and latency labels not fully preserved.]
34 NVIDIA PLATFORM SAVES DATA CENTER COSTS. Game Changing Inference Performance. SAME THROUGHPUT: 1/4 the space, 1/22 the power. 1 HGX server (45,000 images/sec, 3 kW) vs. 160 CPU servers (45,000 images/sec, 65 kW). Inference workload: image recognition using ResNet-50.
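The "1/22 the power" figure follows from the two power numbers on the slide at equal throughput. A small sketch of that calculation, with energy efficiency expressed as images per joule; the function and variable names are illustrative only.

```python
def images_per_joule(images_per_sec: float, kilowatts: float) -> float:
    """Inference energy efficiency: images processed per joule of electrical energy."""
    return images_per_sec / (kilowatts * 1000.0)


# Figures from the slide: both configurations deliver 45,000 images/sec.
hgx = images_per_joule(45_000, 3)    # 1 HGX server at 3 kW
cpu = images_per_joule(45_000, 65)   # 160 CPU servers at 65 kW
power_ratio = 65 / 3

print(f"HGX: {hgx:.1f} images/J, CPU servers: {cpu:.2f} images/J")
print(f"Power ratio at equal throughput: {power_ratio:.1f}x")
```

The ratio comes out at about 21.7, which the slide rounds to 1/22 the power.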
35 GPU-ACCELERATED INFERENCE. iFLYTEK: speech recognition. Valossa: video intelligence. Microsoft Bing: visual search.
36 TESLA PRODUCT FAMILY
37 END-TO-END PRODUCT FAMILY.
  HYPERSCALE HPC: Training & Inference - Tesla V100; Most Efficient Inference & Transcoding - Tesla P4. Deep learning training & inference.
  STRONG-SCALE HPC: Tesla V100 with NVLink. HPC and DL workloads scaling to multiple GPUs.
  MIXED-APPS HPC: Tesla V100 with PCI-E. HPC workloads with mix of CPU and GPU workloads.
  FULLY INTEGRATED SUPERCOMPUTER: DGX Station, DGX-1 Server. Fully integrated deep learning solution.
38 OPTIMIZED FOR DATACENTER EFFICIENCY. 30% More Performance in a Rack. [Chart: DL perf and DL perf/watt against watts, marking Max Efficiency (75% perf at half the power) and Max Performance.] MAXP (max performance), computer vision, ResNet-50 training: 13 kW rack, 4 nodes of 8xV100, 1X ResNet-50 rack throughput. MAXQ (max efficiency), computer vision: 13 kW rack, 7 nodes of 8xV100, higher ResNet-50 rack throughput.
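The "30% more performance in a rack" claim can be reproduced from the node counts and the "75% perf at half the power" operating point on this slide. A minimal sketch, assuming every node in the efficiency configuration runs at exactly 75% of full throughput; the names are illustrative.

```python
def rack_throughput(nodes: int, relative_perf_per_node: float) -> float:
    """Rack throughput in units of one node running at maximum performance."""
    return nodes * relative_perf_per_node


# Figures from the slide: a 13 kW rack holds 4 nodes at max performance,
# or 7 nodes at the max-efficiency point (75% of the performance per node).
max_performance = rack_throughput(4, 1.0)
max_efficiency = rack_throughput(7, 0.75)
gain = max_efficiency / max_performance

print(f"Max-efficiency rack delivers {gain:.2f}x the throughput of the max-performance rack")
```

Seven nodes at 75% give 5.25 node-equivalents against 4, a gain of about 1.31X, in line with the 30% headline.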
39 TESLA V100
  Field: For NVLink Servers | For PCIe Servers
  Core: 5120 CUDA cores, 640 Tensor cores | 5120 CUDA cores, 640 Tensor cores
  Compute: 7.8 TF DP, 15.7 TF SP, 125 TF DL | 7 TF DP, 14 TF SP, 112 TF DL
  Memory: HBM2, 900 GB/s, 16 GB | HBM2, 900 GB/s, 16 GB
  Interconnect: NVLink (up to 300 GB/s) + PCIe Gen3 (up to 32 GB/s) | PCIe Gen3 (up to 32 GB/s)
  Power: 300W | 250W
  Available: Now | Now
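The compute figures on this slide follow from the unit counts multiplied by operations per clock and clock rate. The sketch below assumes boost clocks of about 1.53 GHz (NVLink version) and 1.38 GHz (PCIe version), 2560 FP64 units, and 64 fused multiply-adds per Tensor Core per clock; none of these values appear in the deck, so treat them as assumptions and the results as approximate.

```python
def peak_tflops(units: int, flops_per_unit_per_clock: int, clock_ghz: float) -> float:
    """Peak throughput in TFLOPS: units x FLOPs per unit per clock x clock (GHz) / 1000."""
    return units * flops_per_unit_per_clock * clock_ghz / 1000.0


# Assumed boost clocks in GHz (not stated on the slide).
NVLINK_CLOCK_GHZ = 1.53
PCIE_CLOCK_GHZ = 1.38

for name, clock in (("NVLink", NVLINK_CLOCK_GHZ), ("PCIe", PCIE_CLOCK_GHZ)):
    sp = peak_tflops(5120, 2, clock)      # one FMA = 2 FLOPs per CUDA core per clock
    dp = peak_tflops(2560, 2, clock)      # assumed 2560 FP64 units
    dl = peak_tflops(640, 64 * 2, clock)  # assumed 64 FMAs per Tensor Core per clock
    print(f"{name}: {dp:.1f} TF DP, {sp:.1f} TF SP, {dl:.0f} TF DL")
```

With these assumed clocks the NVLink figures land on 7.8, 15.7, and 125 TF, and the PCIe figures within about 1% of the 7, 14, and 112 TF on the slide.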
40 TESLA PLATFORM FOR CLOUD PROVIDERS
41 CLOUD GPU DEMAND OUTSTRIPS SUPPLY. AWS Launches P2 Instance: "P2 instance is one of the fastest growing instance in AWS history." Andrew Jassy, AWS CEO, re:Invent 2016. Azure Launches N-Series Preview: "We've had thousands of customers participate in the N-Series preview since we launched it back in August." Corey Sanders, Director of Compute, Azure.
42 GLOBAL CSP OFFERINGS.
  Compute:
  AWS P3: up to 8X V100 SXM2; available only in N. Virginia, Oregon, Ireland, Tokyo (ec2/instance-types/p3/). AWS P2: up to 8X K80 physical cards (/ec2/instance-types/p2/).
  GPU Server: up to 4X K80. GPU Server: up to 4X P100 PCIe, public beta available (/gpu/).
  GPU Server: up to 2X K80, 1X P100 PCIe, in bare-metal (oudcomputing/bluemix/gpucomputing).
  NC series: up to 2X K80. NC v2 & ND series: up to 4X P100 PCIe / 4X P40; available only in US West 2 region (en-us/pricing/details/virtualmachines/series/#n-series).
  X7 shape: up to 2X P100, in bare-metal and VM; available only in Ashburn region, Frankfurt to come in Jan (/infrastructure/compute).
  Virtual W/S:
  AWS G3: M60. GPU Server: P100 PCIe, vWS private alpha available. GPU Server: P100 PCIe, vWS public beta Jan 18. GPU Server: up to 2X M60, 2X M10. GPU Server: M (en-us/pricing/details/virtualmachines/series/#n-series). GPU Server: M60.
  Virtual PC:
  GPU Server: up to 4X K520 physical cards. GPU Server: M10. VMware Horizon Air: vPC launch Jan.
43 NVIDIA GPU CLOUD. AI and HPC Everywhere, For Everyone. Innovate in minutes, not weeks: removes all the DIY complexity of DL and HPC software integration. Cross platform: containers run locally on DGX Systems and TITAN PCs, or on cloud service provider GPU instances. Always up to date: monthly updates by NVIDIA to ensure maximum performance. NVIDIA GPU Cloud integrates GPU-optimized deep learning frameworks, HPC apps, runtimes, libraries, and OS into a ready-to-run container, available at no charge.
44 NVIDIA GPU CLOUD: SIMPLIFYING AI & HPC. Deep Learning, HPC Apps, HPC Viz.
45 NGC GPU-OPTIMIZED DEEP LEARNING CONTAINERS. A Comprehensive Catalog of Deep Learning Software: NVCaffe, Caffe2, Microsoft Cognitive Toolkit (CNTK), DIGITS, MXNet, PyTorch, TensorFlow, Theano, Torch, CUDA (base level container for developers). NEW! NVIDIA TensorRT inference accelerator with ONNX support.
46 HPC APPS COMING TO NVIDIA GPU CLOUD
47 NVIDIA GPU CLOUD FOR HPC VISUALIZATION. UNIFIED VISUALIZATION FOR LARGE DATA SETS: large-scale volumetric rendering, physically accurate ray tracing, production-quality images, seamless integration with ParaView. ParaView with NVIDIA IndeX, ParaView with NVIDIA OptiX, ParaView with NVIDIA Holodeck. Early access now: sign up at nvidia.com/gpu-cloud.
48 TESLA PLATFORM FOR DEVELOPERS
50 HOW GPU ACCELERATION WORKS. Application code is split between the two processors: the compute-intensive functions (5% of code) run on the GPU, and the rest of the sequential code stays on the CPU.
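How much an application gains from offloading its compute-intensive functions depends on the share of runtime those functions account for, which Amdahl's law captures. The slide gives only the share of code (5%), not the share of runtime, so the fractions and kernel speedup below are hypothetical values chosen for illustration.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `accelerated_fraction` of runtime runs `kernel_speedup` times faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup)


# Hypothetical case: offloaded functions take 90% of runtime and run 20x faster on the GPU.
overall = amdahl_speedup(0.9, 20.0)
print(f"Overall application speedup: {overall:.2f}x")

# The sequential CPU portion caps the gain, however fast the GPU kernels become.
ceiling = 1.0 / (1.0 - 0.9)
print(f"Upper bound with 10% sequential runtime: {ceiling:.0f}x")
```

This is why the sequential portion left on the CPU still matters even when the offloaded kernels speed up dramatically.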
51 GPU ACCELERATED LIBRARIES. Drop-in Acceleration for Your Applications. DEEP LEARNING: cuDNN, TensorRT, DeepStream SDK. SIGNAL, IMAGE & VIDEO: cuFFT, NVIDIA NPP, CODEC SDK. LINEAR ALGEBRA and PARALLEL ALGORITHMS: cuBLAS, cuSPARSE, CUDA Math library, cuSOLVER, cuRAND, nvGRAPH, NCCL.
52 CUDA TOOLKIT 9 UNLEASHES POWER OF VOLTA. OPTIMIZED FOR VOLTA: Tensor Cores, second-generation NVLink, HBM2 stacked memory. FASTER LIBRARIES: GEMM optimizations for RNNs (cuBLAS), >20x faster image processing (NPP), FFT optimizations across various sizes (cuFFT). COOPERATIVE THREAD GROUPS: flexible thread groups, efficient parallel algorithms, synchronize across thread blocks in a single GPU or multi-GPUs. DEVELOPER TOOLS & PLATFORM UPDATES: 1.3x faster compiling, new OS and compiler support, Unified Memory profiling, NVLink visualization.
53 WHAT IS OPENACC? OpenACC is a directives-based programming approach to parallel computing designed for performance and portability on CPUs and accelerators for HPC (OpenPOWER, Sunway, x86 CPU & Xeon Phi, NVIDIA GPU, PEZY-SC). Add Simple Compiler Directive: main() { <serial code> #pragma acc kernels { <parallel code> } }
54 OPENACC: EASY ONBOARD TO GPU COMPUTING. A Widely Adopted Directives Model for Parallel Programming. SIMPLE. POWERFUL. PORTABLE: POWER, Sunway, x86 CPU, x86 Xeon Phi, NVIDIA GPU, AMD, PEZY-SC. [Chart: AWE Hydrodynamics CloverLeaf mini-app (bm32 data set), speedup vs single Haswell core, PGI OpenACC vs Intel/IBM OpenMP on multicore Broadwell, multicore POWER8, and 1x, 2x, 4x Volta V100; bar values not fully preserved.] ADOPTED BY KEY HPC CODES: 3 of Top 5 HPC Apps (ANSYS Fluent, VASP, Gaussian); 5 CAAR Codes (GTC, XGC, ACME, FLASH, LSDalton); 2017 Gordon Bell Finalist (CAM-SE on TaihuLight).
55 OpenACC results. LSDalton (quantum chemistry): 12X speedup in 1 week. Numeca (CFD): 10X faster kernels, 2X faster app. PowerGrid (medical imaging): 40 days to 2 hours. INCOMP3D (CFD): 3X speedup. NekCEM (computational electromagnetics): 2.5X speedup, 60% less energy. COSMO (climate, weather): 40X speedup, 3X energy efficiency. CloverLeaf (CFD): 4X speedup, single CPU/GPU code. MAESTRO CASTRO (astrophysics): 4.4X speedup, 4 weeks effort.
56 OPENACC RESOURCES. Guides, Talks, Tutorials, Videos, Books, Spec, Code Samples, Teaching Materials, Events, Success Stories, Courses, Slack, Stack Overflow. Sections: Resources, Success Stories, FREE Compilers (Compilers and Tools), Events.
57 NVIDIA DEEP LEARNING SDK. High performance GPU-acceleration for deep learning. Powerful tools and libraries for designing and deploying GPU-accelerated deep learning applications. High performance building blocks for training and deploying deep neural networks on NVIDIA GPUs. Industry vetted deep learning algorithms and linear algebra subroutines for developing novel deep neural networks. Multi-GPU and multi-node scaling that accelerates training on up to eight GPUs. developer.nvidia.com/deep-learning-software. "We are amazed by the steady stream of improvements made to the NVIDIA Deep Learning SDK and the speedups that they deliver." Frédéric Bastien, Team Lead (Theano), MILA.
58 NVIDIA COLLECTIVE COMMUNICATIONS LIBRARY (NCCL). Multi-GPU and multi-node collective communication primitives. High-performance multi-GPU and multi-node collective communication primitives optimized for NVIDIA GPUs. Fast routines for multi-GPU multi-node acceleration that maximize inter-GPU bandwidth utilization. Easy to integrate and MPI compatible. Uses automatic topology detection to scale HPC and deep learning applications over PCIe and NVLink. Accelerates leading deep learning frameworks such as Caffe2, Microsoft Cognitive Toolkit, MXNet, PyTorch and more. Multi-GPU: NVLink, PCIe. Multi-Node: InfiniBand verbs, IP Sockets. Automatic Topology Detection. developer.nvidia.com/nccl. [Chart: Near-Linear Multi-Node Scaling with NCCL, images/second; Microsoft Cognitive Toolkit multi-node scaling performance (images/sec), NVIDIA DGX-1 + cuDNN 6 (FP32), ResNet50, batch size: 64.]
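A collective communication primitive such as all-reduce leaves every participant holding the element-wise sum of all participants' buffers, which is what data-parallel training needs for gradients. The sketch below simulates a ring all-reduce in plain Python to show the idea; the slide does not describe NCCL's internal algorithms, so this is a well-known textbook scheme swapped in for illustration, not NCCL's API or implementation, and all names are hypothetical.

```python
def ring_allreduce(buffers):
    """Simulate a ring all-reduce: every rank ends with the element-wise sum of all buffers."""
    n = len(buffers)
    length = len(buffers[0])
    assert length % n == 0, "buffer length must divide evenly into one chunk per rank"
    size = length // n
    data = [list(b) for b in buffers]

    def chunk(idx):
        return slice(idx * size, (idx + 1) * size)

    # Phase 1, reduce-scatter: after n-1 steps rank r holds the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, data[r][chunk((r - step) % n)]) for r in range(n)]
        for r, c, payload in sends:
            dst = (r + 1) % n
            data[dst][chunk(c)] = [x + y for x, y in zip(data[dst][chunk(c)], payload)]

    # Phase 2, all-gather: the completed chunks circulate around the ring.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, data[r][chunk((r + 1 - step) % n)]) for r in range(n)]
        for r, c, payload in sends:
            data[(r + 1) % n][chunk(c)] = payload

    return data


gradients = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [100, 200, 300, 400, 500, 600],
]
reduced = ring_allreduce(gradients)
print(reduced[0])
```

Each rank sends and receives only one chunk per step, so per-rank traffic stays roughly constant as ranks are added, which is one reason ring schemes are popular for bandwidth-bound reductions.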
59 NVIDIA DIGITS. Interactive Deep Learning GPU Training System. Interactive deep learning training application for engineers and data scientists. Simplify deep neural network training with an interactive interface to train, validate, and visualize results. Built-in workflows for image classification, object detection, and image segmentation. Improve model accuracy with pre-trained models from the DIGITS Model Store. Faster time to solution with multi-GPU acceleration. developer.nvidia.com/digits
60 NVIDIA cuDNN. Deep Learning Primitives. High performance building blocks for deep learning frameworks. Drop-in acceleration for widely used deep learning frameworks such as Caffe2, Microsoft Cognitive Toolkit, PyTorch, TensorFlow, Theano and others. Accelerates industry vetted deep learning algorithms, such as convolutions, LSTM RNNs, fully connected, and pooling layers. Fast deep learning training performance tuned for NVIDIA GPUs. developer.nvidia.com/cudnn. [Chart: Deep Learning Training Performance, images/second, for 8x K80, 8x Maxwell, DGX-1, and DGX-1V across cuDNN 2, cuDNN 4, cuDNN 6 with NCCL 1.6, and cuDNN 7 with NCCL 2; bar values not preserved.] "NVIDIA has improved the speed of cuDNN with each release while extending the interface to more operations and devices at the same time." Evan Shelhamer, Lead Caffe Developer, UC Berkeley.
61 NVIDIA TensorRT 3. Programmable Inference Accelerator. Compiler for Optimized Neural Networks: Trained Neural Network -> TensorRT -> Compiled & Optimized Neural Network. Optimizations: Weight & Activation Precision Calibration, Layer & Tensor Fusion, Kernel Auto-Tuning, Dynamic Tensor Memory, Multi-Stream Execution.
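Weight and activation precision calibration means choosing how a range of floating-point values maps onto a lower-precision format such as INT8. The sketch below uses a simple symmetric max-abs scale to show the general idea; the slide does not describe TensorRT's calibration method, so this simplified scheme and its function names are assumptions for illustration only.

```python
def calibrate_scale(values):
    """Pick a symmetric INT8 scale from the largest magnitude seen (assumes a non-zero input)."""
    return max(abs(v) for v in values) / 127.0


def quantize(values, scale):
    """Map floats to integers in [-127, 127] using the calibrated scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in quantized]


activations = [0.02, -1.27, 0.5, 0.9]
scale = calibrate_scale(activations)
codes = quantize(activations, scale)
restored = dequantize(codes, scale)

print(codes)
print(max(abs(a - r) for a, r in zip(activations, restored)))
```

The round-trip error is bounded by half the scale, so a calibration step that picks a tighter range trades clipping of rare outliers for finer resolution on typical values.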
SUPERCHARGE DEEP LEARNING WITH DGX-1 Markus Weber SC16 - November 2016 NVIDIA Pioneered GPU Computing Founded 1993 $7B 9,500 Employees 100M NVIDIA GeForce Gamers The world s largest gaming platform Pioneering
More informationWHAT S NEW IN CUDA 8. Siddharth Sharma, Oct 2016
WHAT S NEW IN CUDA 8 Siddharth Sharma, Oct 2016 WHAT S NEW IN CUDA 8 Why Should You Care >2X Run Computations Faster* Solve Larger Problems** Critical Path Analysis * HOOMD Blue v1.3.3 Lennard-Jones liquid
More informationApril 4-7, 2016 Silicon Valley INSIDE PASCAL. Mark Harris, October 27,
April 4-7, 2016 Silicon Valley INSIDE PASCAL Mark Harris, October 27, 2016 @harrism INTRODUCING TESLA P100 New GPU Architecture CPU to CPUEnable the World s Fastest Compute Node PCIe Switch PCIe Switch
More informationDeep Learning: Transforming Engineering and Science The MathWorks, Inc.
Deep Learning: Transforming Engineering and Science 1 2015 The MathWorks, Inc. DEEP LEARNING: TRANSFORMING ENGINEERING AND SCIENCE A THE NEW RISE ERA OF OF GPU COMPUTING 3 NVIDIA A IS NEW THE WORLD S ERA
More informationHybrid KAUST Many Cores and OpenACC. Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS
+ Hybrid Computing @ KAUST Many Cores and OpenACC Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS + Agenda Hybrid Computing n Hybrid Computing n From Multi-Physics
More informationS8901 Quadro for AI, VR and Simulation
S8901 Quadro for AI, VR and Simulation Carl Flygare, PNY Quadro Product Marketing Manager Allen Bourgoyne, NVIDIA Senior Product Marketing Manager The question of whether a computer can think is no more
More informationNVIDIA DEEP LEARNING INSTITUTE
NVIDIA DEEP LEARNING INSTITUTE TRAINING CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial
More informationNVIDIA Update and Directions on GPU Acceleration for Earth System Models
NVIDIA Update and Directions on GPU Acceleration for Earth System Models Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA Carl Ponder, PhD, Applications Software Engineer, NVIDIA,
More informationTimothy Lanfear, NVIDIA HPC
GPU COMPUTING AND THE Timothy Lanfear, NVIDIA FUTURE OF HPC Exascale Computing will Enable Transformational Science Results First-principles simulation of combustion for new high-efficiency, lowemision
More informationSTRATEGIES TO ACCELERATE VASP WITH GPUS USING OPENACC. Stefan Maintz, Dr. Markus Wetzstein
STRATEGIES TO ACCELERATE VASP WITH GPUS USING OPENACC Stefan Maintz, Dr. Markus Wetzstein smaintz@nvidia.com; mwetzstein@nvidia.com Companies Academia VASP USERS AND USAGE 12-25% of CPU cycles @ supercomputing
More informationCUDA: NEW AND UPCOMING FEATURES
May 8-11, 2017 Silicon Valley CUDA: NEW AND UPCOMING FEATURES Stephen Jones, GTC 2018 CUDA ECOSYSTEM 2018 CUDA DOWNLOADS IN 2017 3,500,000 CUDA REGISTERED DEVELOPERS 800,000 GTC ATTENDEES 8,000+ 2 CUDA
More informationHPC and AI Solution Overview. Garima Kochhar HPC and AI Innovation Lab
HPC and AI Solution Overview Garima Kochhar HPC and AI Innovation Lab 1 Dell EMC HPC and DL team charter Design, develop and integrate HPC and DL Heading systems Lorem ipsum dolor sit amet, consectetur
More informationTECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING
TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING Table of Contents: The Accelerated Data Center Optimizing Data Center Productivity Same Throughput with Fewer Server Nodes
More informationAccelerated Platforms: The Future of Computing. Marc Hamilton, VP Solutions Architecture & Engineering, NVIDIA Korea AI Conference 2018
Accelerated Platforms: The Future of Computing Marc Hamilton, VP Solutions Architecture & Engineering, NVIDIA Korea AI Conference 2018 Forces Shaping Computing 10 7 10 6 10 5 GPU PERFORMANCE CPU PERFORMANCE
More informationInterconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, November 2017
Interconnect Your Future Enabling the Best Datacenter Return on Investment TOP500 Supercomputers, November 2017 InfiniBand Accelerates Majority of New Systems on TOP500 InfiniBand connects 77% of new HPC
More informationGPU COMPUTING AND THE FUTURE OF HPC. Timothy Lanfear, NVIDIA
GPU COMPUTING AND THE FUTURE OF HPC Timothy Lanfear, NVIDIA ~1 W ~3 W ~100 W ~30 W 1 kw 100 kw 20 MW Power-constrained Computers 2 EXASCALE COMPUTING WILL ENABLE TRANSFORMATIONAL SCIENCE RESULTS First-principles
More informationPreparing GPU-Accelerated Applications for the Summit Supercomputer
Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership
More informationSteve Scott, Tesla CTO SC 11 November 15, 2011
Steve Scott, Tesla CTO SC 11 November 15, 2011 What goal do these products have in common? Performance / W Exaflop Expectations First Exaflop Computer K Computer ~10 MW CM5 ~200 KW Not constant size, cost
More informationStan Posey, NVIDIA, Santa Clara, CA, USA
Stan Posey, sposey@nvidia.com NVIDIA, Santa Clara, CA, USA NVIDIA Strategy for CWO Modeling (Since 2010) Initial focus: CUDA applied to climate models and NWP research Opportunities to refactor code with
More informationPERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015
PERFORMANCE PORTABILITY WITH OPENACC Jeff Larkin, NVIDIA, November 2015 TWO TYPES OF PORTABILITY FUNCTIONAL PORTABILITY PERFORMANCE PORTABILITY The ability for a single code to run anywhere. The ability
More informationCME 213 S PRING Eric Darve
CME 213 S PRING 2017 Eric Darve Summary of previous lectures Pthreads: low-level multi-threaded programming OpenMP: simplified interface based on #pragma, adapted to scientific computing OpenMP for and
More informationThe Tesla Accelerated Computing Platform
The Tesla Accelerated Computing Platform Axel Koehler, Principal Solution Architect HPC Advisory Council Meeting Lugano 22 March 2016 Introduction TESLA Platform for HPC Agenda TESLA Platform for HYPERSCALE
More informationS THE MAKING OF DGX SATURNV: BREAKING THE BARRIERS TO AI SCALE. Presenter: Louis Capps, Solution Architect, NVIDIA,
S7750 - THE MAKING OF DGX SATURNV: BREAKING THE BARRIERS TO AI SCALE Presenter: Louis Capps, Solution Architect, NVIDIA, lcapps@nvidia.com A TALE OF ENLIGHTENMENT Basic OK List 10 for x = 1 to 3 20 print
More informationNVIDIA FOR DEEP LEARNING. Bill Veenhuis
NVIDIA FOR DEEP LEARNING Bill Veenhuis bveenhuis@nvidia.com Nvidia is the world s leading ai platform ONE ARCHITECTURE CUDA 2 GPU: Perfect Companion for Accelerating Apps & A.I. CPU GPU 3 Intro to AI AGENDA
More informationCisco UCS C480 ML M5 Rack Server Performance Characterization
White Paper Cisco UCS C480 ML M5 Rack Server Performance Characterization The Cisco UCS C480 ML M5 Rack Server platform is designed for artificial intelligence and machine-learning workloads. 2018 Cisco
More informationFast Hardware For AI
Fast Hardware For AI Karl Freund karl@moorinsightsstrategy.com Sr. Analyst, AI and HPC Moor Insights & Strategy Follow my blogs covering Machine Learning Hardware on Forbes: http://www.forbes.com/sites/moorinsights
More informationHPC with the NVIDIA Accelerated Computing Toolkit Mark Harris, November 16, 2015
HPC with the NVIDIA Accelerated Computing Toolkit Mark Harris, November 16, 2015 Accelerators Surge in World s Top Supercomputers 125 100 75 Top500: # of Accelerated Supercomputers 100+ accelerated systems
More informationINTRODUCTION TO OPENACC. Analyzing and Parallelizing with OpenACC, Feb 22, 2017
INTRODUCTION TO OPENACC Analyzing and Parallelizing with OpenACC, Feb 22, 2017 Objective: Enable you to to accelerate your applications with OpenACC. 2 Today s Objectives Understand what OpenACC is and
More informationINTRODUCING THE DGX FAMILY. Marc Domenech May 8, 2017
INTRODUCING THE DGX FAMILY Marc Domenech May 8, 2017 NVIDIA Pioneered GPU Computing Founded 1993 $7B 9,500 Employees 100M NVIDIA GeForce Gamers The world s largest gaming platform Pioneering AI computing
More informationTESLA ACCELERATED COMPUTING. Mike Wang Solutions Architect NVIDIA Australia & NZ
TESLA ACCELERATED COMPUTING Mike Wang Solutions Architect NVIDIA Australia & NZ mikewang@nvidia.com GAMING DESIGN ENTERPRISE VIRTUALIZATION HPC & CLOUD SERVICE PROVIDERS AUTONOMOUS MACHINES PC DATA CENTER
More informationDEEP NEURAL NETWORKS AND GPUS. Julie Bernauer
DEEP NEURAL NETWORKS AND GPUS Julie Bernauer GPU Computing GPU Computing Run Computations on GPUs x86 CUDA Framework to Program NVIDIA GPUs A simple sum of two vectors (arrays) in C void vector_add(int
More informationVSC Users Day 2018 Start to GPU Ehsan Moravveji
Outline A brief intro Available GPUs at VSC GPU architecture Benchmarking tests General Purpose GPU Programming Models VSC Users Day 2018 Start to GPU Ehsan Moravveji Image courtesy of Nvidia.com Generally
More informationCharacterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager
Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance
More informationS8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer
S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer 2 100 倍以上速く 本当に可能ですか? 2 DOUGLAS ADAMS BABEL FISH Neural Machine Translation Unit 3 4 OVER 100X FASTER, IS IT REALLY POSSIBLE?
More informationScaling in a Heterogeneous Environment with GPUs: GPU Architecture, Concepts, and Strategies
Scaling in a Heterogeneous Environment with GPUs: GPU Architecture, Concepts, and Strategies John E. Stone Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology
More informationBuilding NVLink for Developers
Building NVLink for Developers Unleashing programmatic, architectural and performance capabilities for accelerated computing Why NVLink TM? Simpler, Better and Faster Simplified Programming No specialized
More informationNVIDIA TESLA V100 GPU ARCHITECTURE THE WORLD S MOST ADVANCED DATA CENTER GPU
NVIDIA TESLA V100 GPU ARCHITECTURE THE WORLD S MOST ADVANCED DATA CENTER GPU WP-08608-001_v1.1 August 2017 WP-08608-001_v1.1 TABLE OF CONTENTS Introduction to the NVIDIA Tesla V100 GPU Architecture...
More informationNOVEL GPU FEATURES: PERFORMANCE AND PRODUCTIVITY. Peter Messmer
NOVEL GPU FEATURES: PERFORMANCE AND PRODUCTIVITY Peter Messmer pmessmer@nvidia.com COMPUTATIONAL CHALLENGES IN HEP Low-Level Trigger High-Level Trigger Monte Carlo Analysis Lattice QCD 2 COMPUTATIONAL
More informationGPU Developments for the NEMO Model. Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA
GPU Developments for the NEMO Model Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA NVIDIA HPC AND ESM UPDATE TOPICS OF DISCUSSION GPU PROGRESS ON NEMO MODEL 2 NVIDIA GPU
More informationIBM Power AC922 Server
IBM Power AC922 Server The Best Server for Enterprise AI Highlights More accuracy - GPUs access system RAM for larger models Faster insights - significant deep learning speedups Rapid deployment - integrated
More informationCafeGPI. Single-Sided Communication for Scalable Deep Learning
CafeGPI Single-Sided Communication for Scalable Deep Learning Janis Keuper itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Deep Neural Networks
More informationGPUS FOR NGVLA. M Clark, April 2015
S FOR NGVLA M Clark, April 2015 GAMING DESIGN ENTERPRISE VIRTUALIZATION HPC & CLOUD SERVICE PROVIDERS AUTONOMOUS MACHINES PC DATA CENTER MOBILE The World Leader in Visual Computing 2 What is a? Tesla K40
More informationTuring Architecture and CUDA 10 New Features. Minseok Lee, Developer Technology Engineer, NVIDIA
Turing Architecture and CUDA 10 New Features Minseok Lee, Developer Technology Engineer, NVIDIA Turing Architecture New SM Architecture Multi-Precision Tensor Core RT Core Turing MPS Inference Accelerated,
More informationInterconnect Your Future
Interconnect Your Future Paving the Path to Exascale November 2017 Mellanox Accelerates Leading HPC and AI Systems Summit CORAL System Sierra CORAL System Fastest Supercomputer in Japan Fastest Supercomputer
More informationTitan - Early Experience with the Titan System at Oak Ridge National Laboratory
Office of Science Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Buddy Bland Project Director Oak Ridge Leadership Computing Facility November 13, 2012 ORNL s Titan Hybrid
More informationUnified Deep Learning with CPU, GPU, and FPGA Technologies
Unified Deep Learning with CPU, GPU, and FPGA Technologies Allen Rush 1, Ashish Sirasao 2, Mike Ignatowski 1 1: Advanced Micro Devices, Inc., 2: Xilinx, Inc. Abstract Deep learning and complex machine
More informationNVIDIA DEEP LEARNING PLATFORM
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance and Efficiency for AI Services, From the Data Center to the Network s Edge Introduction Artificial intelligence (AI), the dream
More informationIntroduction to High Performance Computing. Shaohao Chen Research Computing Services (RCS) Boston University
Introduction to High Performance Computing Shaohao Chen Research Computing Services (RCS) Boston University Outline What is HPC? Why computer cluster? Basic structure of a computer cluster Computer performance
More informationIntroduction to GPU Computing. 周国峰 Wuhan University 2017/10/13
Introduction to GPU Computing chandlerz@nvidia.com 周国峰 Wuhan University 2017/10/13 GPU and Its Application 3 Ways to Develop Your GPU APP An Example to Show the Developments Add GPUs: Accelerate Science
More informationShrinath Shanbhag Senior Software Engineer Microsoft Corporation
Accelerating GPU inferencing with DirectML and DirectX 12 Shrinath Shanbhag Senior Software Engineer Microsoft Corporation Machine Learning Machine learning has become immensely popular over the last decade
More informationINVESTOR UPDATE. September 2018
INVESTOR UPDATE September 2018 SAFE HARBOR Forward-Looking Statements Except for the historical information contained herein, certain matters in this presentation including, but not limited to, statements
More information