GTC Jensen Huang Founder & CEO

1 GTC 2018 Jensen Huang Founder & CEO

5 SCREEN-SPACE AMBIENT OCCLUSION, BAKED LIGHTING

6 GLOBAL ILLUMINATION

7 SCREEN-SPACE REFLECTIONS, ENVIRONMENT MAPS

8 RAY TRACED REFLECTIONS

9 SCREEN-SPACE REFRACTION, DEPTH SORTING

10 CAUSTICS

11 SUBSURFACE SHADING APPROXIMATION

12 SUBSURFACE SCATTERING

14 ANNOUNCING NVIDIA RTX TECHNOLOGY

15 ANNOUNCING QUADRO GV100 WITH NVIDIA RTX TECHNOLOGY
GIANT LEAP FOR REAL-TIME COMPUTER GRAPHICS
2 GV100s Connected by NVLink2
64GB HBM2 Memory
10,240 CUDA Cores
236 TFLOPS Tensor Cores
NVIDIA OptiX, Vulkan, Microsoft DXR
NVIDIA RTX Technology
NVIDIA Volta GPU

16 ONE BILLION IMAGES RENDERED EVERY YEAR
GAMING: 400 Games
MEDIA & ENTERTAINMENT: 500 Movies
PRODUCT DESIGN: 12M Designers
ARCHITECTURE: 150,000 Architects

17 TRADITIONAL RENDER FARM
280 Dual-CPU Servers, 168 kW

18 NVIDIA RTX QUADRO GV100: BIG SAVINGS FOR RENDERING
14 Quad-GPU Servers, 24 kW
1/5 the Cost, 1/7 the Space, 1/7 the Power
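
The power claim on the two render-farm slides above can be checked with simple arithmetic. The sketch below is illustrative only: the variable names are assumptions, and only the server counts and kW figures come from the slides. The cost and space ratios cannot be derived from the figures shown.

```python
# Figures from slides 17 and 18; variable names are illustrative, not NVIDIA's.
cpu_servers, cpu_power_kw = 280, 168   # traditional render farm
gpu_servers, gpu_power_kw = 14, 24     # quad-GPU Quadro GV100 servers

power_ratio = cpu_power_kw / gpu_power_kw        # times less power
server_ratio = cpu_servers / gpu_servers         # times fewer servers
kw_per_cpu_server = cpu_power_kw / cpu_servers   # average draw per CPU server
kw_per_gpu_server = gpu_power_kw / gpu_servers   # average draw per GPU server

print(f"power: 1/{power_ratio:.0f} of the CPU farm")
print(f"servers: 1/{server_ratio:.0f} of the CPU farm")
print(f"kW per server: CPU {kw_per_cpu_server:.2f}, GPU {kw_per_gpu_server:.2f}")
```

The power ratio reproduces the slide's 1/7. The 20-to-1 server count differs from the quoted 1/7 space figure, so that figure must rest on information not shown on the slide.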

19 NVIDIA RTX EXCITEMENT
TOOLS, ENGINES, GAMING, MEDIA & ENTERTAINMENT, PRODUCT DESIGN, ARCHITECTURE
"With RTX we can now do ray tracing renders interactively. It's just fantastic!" Sébastien Guichou, CTO, Isotropix
"NVIDIA RTX opens the door to make ray tracing a reality!" Kim Libreri, CTO, Epic Games

20 RISE OF GPU COMPUTING
820,000 GPU Developers: 10X in 5 Yrs
8M CUDA Downloads: 5X in 5 Yrs
GTC Registrations: 4X in 5 Yrs
Total GPU FLOPS of Top 50 Systems: 15X in 5 Yrs
[Chart: GPU-Accelerated Computing against single-threaded perf (1.5X per year, slowing to 1.1X per year)]
40 Years of CPU Trend Data. Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected by K. Rupp.
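
The "NX in 5 Yrs" figures on the slide above can be put on the same footing as its "per year" rates by converting each to a compound annual factor. This is a minimal sketch: the helper names `annual_factor` and `compounded` are hypothetical, and the conversion assumes growth was constant over the five years, which the slide does not state.

```python
def annual_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)


def compounded(yearly_factor: float, years: float) -> float:
    """Total growth after `years` at a constant yearly multiplier."""
    return yearly_factor ** years


# Five-year multiples as quoted on the slide.
five_year_claims = {
    "GPU developers": 10,
    "CUDA downloads": 5,
    "GTC registrations": 4,
    "GPU FLOPS of Top 50 systems": 15,
}

for name, total in five_year_claims.items():
    print(f"{name}: {total}X in 5 yrs is about {annual_factor(total, 5):.2f}X per year")

# For comparison: five years at the two single-threaded CPU rates on the chart.
print(f"1.5X per year for 5 years: {compounded(1.5, 5):.2f}X")
print(f"1.1X per year for 5 years: {compounded(1.1, 5):.2f}X")
```

Under that assumption, 10X in five years works out to roughly 1.58X per year, while five years at 1.1X per year compounds to only about 1.6X in total.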

21 SCIENCE NEEDS SUPERCHARGED COMPUTERS
Reinventing the Lithium-Ion Battery: 7 Days on Titan
Mapping the Earth's Core: 17 Days on Titan
Cloud-Resolving Climate Simulation: 840 Days on Piz Daint
Understanding HIV's Structure: 16 Days on Blue Waters

22 SUPERCHARGED COMPUTING
Fermi GPU Server, 2013
HPC Applications: Amber 12, NAMD 2.9
GPU Acceleration Stack: cublas 5.0, cufft 5.0, NPP 5.0, CUDA 5.0, curand 5.0, cusparse 5.0
Res Mgr: R304
BaseOS: CentOS
Volta GPU Server, 2018
HPC Applications: Amber 16, CHROMA 2018, Gyrokinetic TC 2017, LAMMPS 2018, MILC 2018, NAMD 2.13, Quantum Esp. 6.1, SPECFEM3D 2018
GPU Acceleration Stack: cublas 9.0, cufft 9.0, NPP 9.0, CUDA 9.0, curand 9.0, cusparse 9.0
Res Mgr: R384
BaseOS: Ubuntu
[Chart: GPU Accelerated Computing, Moore's Law, CPU. Measured performance of Amber, CHROMA, GTC, LAMMPS, MILC, NAMD, Quantum Espresso, SPECFEM3D]

23 TRADITIONAL HPC CLUSTER
600 Dual-CPU Servers, 360 kW

24 NVIDIA TESLA V100: BIG SAVINGS FOR HPC
30 Quad-GPU Servers, 48 kW
1/5 the Cost, 1/7 the Space, 1/7 the Power

25 CLARA MEDICAL IMAGING SUPERCOMPUTER
IMAGING & VISUALIZATION APPS
CUDA, CUDNN, TENSORRT, OGL, RTX
GPU CONTAINERS
VGPU
NVIDIA GPU SERVER

26 CLARA MEDICAL IMAGING SUPERCOMPUTER
DL-BASED IMAGE RECONSTRUCTION, DL-BASED BRAIN SEGMENTATION, CINEMATIC RENDERING

28 IMAGING DEVELOPMENT PARTNERS
ULTRASOUND, MRI, CT, X-RAY, MAMMO, PET
HEALTHCARE PROVIDERS, STARTUPS, IMAGING COMPANIES

29 NVIDIA AI PLATFORM
Announcing NEW 32GB (2X): Tesla V100, DGX-1 and DGX Station
Every Cloud, Every Computer Maker
NVIDIA GPU Cloud
NVIDIA AI Inference
TITAN V

30 AlexNet

31 CAMBRIAN EXPLOSION
Convolutional Networks: Encoder/Decoder, ReLu, BatchNorm, Concat, Dropout, Pooling
Recurrent Networks: LSTM, GRU, Beam Search, WaveNet, CTC, Attention
Generative Adversarial Networks: 3D-GAN, MedGAN, Conditional GAN, Coupled GAN, Speech Enhancement GAN
Reinforcement Learning: DQN, Simulation, DDPG
New Species: Capsule Nets, Mixture of Experts, Neural Collaborative Filtering, Block Sparse LSTM

33 THE WORLD WANTS A GIGANTIC GPU

34 THE WORLD'S LARGEST GPU
16 Tesla V100 32GB, Connected by NVSwitch
On-chip Memory Fabric Semantic Extended Across All GPUs
512GB HBM2 and 14.4TB/sec Aggregate
81,920 CUDA Cores
2,000 TFLOPS Tensor Cores

35 THE WORLD'S LARGEST GPU (adds to slide 34): 2B Transistors, TSMC 12FFN

36 THE WORLD'S LARGEST GPU (adds to slide 34): 18 Links, 25Gbps * 8, Bi-directional

37 THE WORLD'S LARGEST GPU (adds to slide 34): 7.2 Terabits/sec or 900 GB/sec

38 THE WORLD'S LARGEST GPU (adds to slide 34): Every GPU-to-GPU at 300 GB/sec
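
The bandwidth figures on the NVSwitch slides above are consistent with one another, which a few lines of arithmetic can show. The sketch assumes that "25Gbps * 8 Bi-directional" means 8 lanes of 25 Gbit/s in each of two directions per link; that reading is an interpretation of the slide text, and all variable names are illustrative.

```python
# NVSwitch port arithmetic from slides 36 to 38.
# Assumption: each link has 8 lanes of 25 Gbit/s per direction, two directions.
links_per_switch = 18
lanes_per_direction = 8
gbit_per_lane = 25
directions = 2

link_gbit = lanes_per_direction * gbit_per_lane * directions  # Gbit/s per link
switch_gbit = links_per_switch * link_gbit                    # Gbit/s per switch
switch_tbit = switch_gbit / 1000                              # Tbit/s per switch
switch_gbyte = switch_gbit / 8                                # GB/s per switch
link_gbyte = link_gbit / 8                                    # GB/s per link

# Number of such links needed to reach the quoted 300 GB/s per GPU.
links_per_gpu = 300 / link_gbyte

# Per-GPU share of the system totals on slide 34.
gpus = 16
hbm2_gb_per_gpu = 512 / gpus
cuda_cores_per_gpu = 81_920 / gpus
tensor_tflops_per_gpu = 2_000 / gpus

print(f"switch: {switch_tbit} Tbit/s = {switch_gbyte:.0f} GB/s")
print(f"links per GPU for 300 GB/s: {links_per_gpu:.0f}")
print(f"per GPU: {hbm2_gb_per_gpu:.0f} GB HBM2, "
      f"{cuda_cores_per_gpu:.0f} CUDA cores, "
      f"{tensor_tflops_per_gpu:.0f} Tensor TFLOPS")
```

Under that assumption the 18 links give 7.2 Tbit/s, which is 900 GB/s, matching slide 37, and the 300 GB/s GPU-to-GPU figure corresponds to six links per GPU.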

39 ANNOUNCING NVIDIA DGX-2
THE LARGEST GPU EVER CREATED
2 PFLOPS, 512GB HBM2, 10 kW, 350 lbs

40 10X IN 6 MONTHS
[Chart: Time to Train FAIRSEQ, in days, DGX-1 against DGX-2. DGX-2: 1.5 days]
Fairseq is a neural machine translation network, published by Facebook in May '17. Fairseq is trained with the WMT '14 English-French dataset in 55 epochs.
DGX-1, V100 16GB, SEPT '17
Framework: pytorch 0.2, TensorFlow 1.3, MXNet 0.11, Caffe2 0.8.1, CNTK 2.0, Python 2.7
System Software Stack: NCCL 2.0.2, cudnn 7.0.2, cublas 9.0, cufft 9.0, NPP 9.0, CUDA 9.0, Res Mgr R384, BaseOS
DGX-2, V100 32GB, MAR '18
Framework: pytorch 0.3, TensorFlow 1.7, MXNet 1.0, Caffe2, CNTK 2.3, Python 2.7 or 3.6
System Software Stack: NCCL 2.2, cudnn 7.1, cublas 9.2, cufft 9.2, NPP 9.2, CUDA 9.2, Res Mgr R396, BaseOS

41 ANNOUNCING NVIDIA DGX-2
$399K, Available in Q3

42 TRADITIONAL HYPERSCALE CLUSTER
300 Dual-CPU Servers, $3M, 180 kW

43 NVIDIA DGX-2 FOR DEEP LEARNING: Big Savings for Deep Learning
1 DGX-2, 10 kW
1/8 the Cost, 1/60 the Space, 1/18 the Power
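
The cost and power ratios on the DGX-2 comparison above follow from the prices and kW figures on slides 41 to 43. A minimal sketch, with illustrative variable names; the space ratio is left out because no rack or floor-space figures appear on the slides.

```python
# Figures from slides 41 to 43; variable names are illustrative.
cluster_cost_usd, cluster_power_kw = 3_000_000, 180   # 300 dual-CPU servers
dgx2_cost_usd, dgx2_power_kw = 399_000, 10            # one DGX-2

cost_ratio = cluster_cost_usd / dgx2_cost_usd     # times cheaper
power_ratio = cluster_power_kw / dgx2_power_kw    # times less power

print(f"cost: 1/{cost_ratio:.1f} of the cluster")
print(f"power: 1/{power_ratio:.0f} of the cluster")
```

The power ratio matches the slide's 1/18 exactly. The cost ratio comes out at about 1/7.5, which the slide rounds to 1/8.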

44 500X IN 5 YEARS
[Chart: Time to Train AlexNet, 2 GTX 580s (days) against DGX-2 (min)]
2 GTX 580s, DEC '12
Framework: cuda-convnet
System Software Stack: NCCL N/A, cudnn N/A, cublas 5.0, cufft 5.0, NPP 5.0, CUDA 5.0, Res Mgr R
DGX-2, MAR '18
Framework: NV Caffe
System Software Stack: NCCL 2.2, cudnn 7.1, cublas 9.2, cufft 9.2, NPP 9.2, CUDA 9.2, Res Mgr R

45 NVIDIA GPU CLOUD
Optimized Stacks for Every Cloud
20,000+ Registered Organizations
30 Containers
NOW on AWS, GCP, AliCloud, Oracle Cloud, DGX

46 PLASTER

47 NVIDIA AI INFERENCE
30M HYPERSCALE SERVERS
Workloads: ASR (RNN++), SPEECH SYNTH (DGN, S2S), RECOMMENDER (MLP-NCF), NLP (RNN), IMAGE / VIDEO (CNN)
TensorRT: CNNs
TensorRT 2: INT8
TensorRT 3: Tensor Core
TensorRT 4: TensorFlow Integration, Kaldi Optimization, ONNX, WinML
Timeline: Sept '16, Apr '17, Sept '17, Apr '18
190X IMAGE / VIDEO: ResNet-50 with TensorFlow Integration
50X NLP: GNMT
45X RECOMMENDER: Neural Collaborative Filtering
36X SPEECH SYNTH: WaveNet
60X ASR: DeepSpeech 2 DNN
All speed-ups are chip-to-chip CPU to GV

48 ANNOUNCING KUBERNETES ON NVIDIA GPUS
Scale-up Thousands of GPUs Instantly
Multi-region, Self-healing Cluster Orchestration
GPU Optimized Out-of-the-Box
KUBERNETES GPU ACCELERATED
NVIDIA GPU CLOUD APPLICATIONS
NVIDIA GPU CONTAINERS
DOCKER
NVIDIA GPUs
AWS, GCP, AZURE, NVIDIA GPU SERVERS

50 NVIDIA AI INFERENCE
CSPs, VIDEO ANALYTICS, SPEECH, RECOMMENDATION SYSTEMS, MAPPING, AUTOMOTIVE, ROBOTICS, SMART CITIES, ETAIL, HEALTHCARE, MANUFACTURING
"NVIDIA's inference platform made it possible to derive real-time understanding of live videos." Nicolas Koumchatzky, Head of Cortex, Twitter
"We believe TensorRT could dramatically improve productivity for our enterprise customers." Markus Noga, Head of Machine Learning, SAP

51 NVIDIA AI PLATFORM
Tesla V100: NEW 32GB
DGX Systems: NEW with V100 32GB, NEW DGX-2
Every Cloud: NGC Now on AWS, GCP, AliCloud, Oracle
NVIDIA GPU Cloud: 30 GPU-Optimized Containers
NVIDIA AI Inference: NEW TensorRT 4, TensorFlow, Kaldi, ONNX, WinML
TITAN V: Out of stock!

52 NVIDIA RESEARCH
200 RESEARCHERS
Bill Dally, NVIDIA Chief Scientist
Seattle, Redmond, Santa Clara, Salt Lake City, St. Louis, Austin, Westford, Charlottesville, Durham, Lund, Berlin, Helsinki
Graphics, Deep Learning, Robotics, Computer Vision, Parallel Architectures, Programming Systems, Circuits, VLSI, Networks
RECENT WORK: RTX, CNN Image Inpainting, NVSwitch, Noise-to-Noise Denoising, CuDNN, Progressive GAN

53 NVIDIA RESEARCH
CONDITIONAL GAN

54 EVERYTHING THAT MOVES WILL BE AUTONOMOUS
Cars, Robotaxis, Trucks, Delivery Vans, Buses, Tractors

55 NVIDIA DRIVE END-TO-END PLATFORM
COLLECT DATA, TRAIN MODELS, SIMULATE, DRIVE
Cars, Pedestrians, Path, Lanes, Signs, Lights

56 NVIDIA PERCEPTION INFRASTRUCTURE
LARGE-SCALE DEEP LEARNING MODEL DEVELOPMENT
Data is the new source code
Workflow, Tools, Supercomputing Infrastructure
Data Ingest, Labeling, Training, Validation, Adaptation
Automation, Best Model Discovery, Traceability, Reproducibility
Purpose-built for Safety Standards of Automotive
Data Factory, Train on NVIDIA DGX, Library of Labeled Data, DRIVE Pegasus, Validate/Verify, Test Data

59 NVIDIA DRIVE ROADMAP: ONE ARCHITECTURE
DRIVE Pegasus, Orin
Auto-Grade, Super Energy-Efficient, ASIL-D Functional Safety
DRIVE PX 2, DRIVE Xavier, DRIVE PX Parker

60 SIMULATION: THE PATH TO BILLIONS OF MILES
The world drives trillions of miles each year.
The U.S. has 770 accidents per billion miles.
A fleet of 20 test cars covers 1 million miles per year.

61 ANNOUNCING NVIDIA DRIVE SIM AND CONSTELLATION
AV VALIDATION SYSTEM
Virtual Reality AV Simulator
Same Architecture as DRIVE Computer
Simulate Rare and Difficult Conditions, Recreate Scenarios, Run Regression Tests, Drive Billions of Virtual Miles
10,000 Constellations Drive 3B Miles per Year
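
The argument for simulation on slides 60 and 61 rests on a handful of ratios, sketched below. Only the quoted figures (770 accidents per billion miles, 20 cars covering 1 million miles per year, 10,000 Constellations driving 3B miles per year) come from the slides; the variable names and the derived comparisons are illustrative assumptions.

```python
# Figures quoted on slides 60 and 61.
accidents_per_billion_miles = 770
fleet_cars, fleet_miles_per_year = 20, 1_000_000
constellations, sim_miles_per_year = 10_000, 3_000_000_000

BILLION = 1_000_000_000

# Real-world fleet: miles per car and time needed to log a billion miles.
miles_per_car = fleet_miles_per_year / fleet_cars
years_per_billion_real = BILLION / fleet_miles_per_year

# Accidents a fleet-year of driving would contain at the quoted U.S. rate.
accidents_per_fleet_year = accidents_per_billion_miles * fleet_miles_per_year / BILLION

# Simulated fleet: miles per Constellation and equivalents in real test cars.
miles_per_constellation = sim_miles_per_year / constellations
cars_per_constellation = miles_per_constellation / miles_per_car
fleets_equivalent = sim_miles_per_year / fleet_miles_per_year

print(f"20-car fleet needs {years_per_billion_real:.0f} years per billion miles")
print(f"accidents expected per fleet-year at the U.S. rate: {accidents_per_fleet_year:.2f}")
print(f"one Constellation covers {miles_per_constellation:,.0f} miles per year, "
      f"about {cars_per_constellation:.0f} test cars")
print(f"10,000 Constellations equal {fleets_equivalent:,.0f} such fleets")
```

On these numbers a 20-car fleet would need 1,000 years to reach a billion miles, while each Constellation is credited with the yearly mileage of about six test cars.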

66 370 PARTNERS DEVELOPING ON NVIDIA DRIVE
CARS, TRUCKS, MOBILITY SERVICES, SUPPLIERS, MAPPING, LIDAR, CAMERA / RADAR, STARTUPS

67 ROBOTICS BOOSTS EVERY INDUSTRY
Delivery, Consumer, Healthcare, Agriculture, Retail, Logistics, Manufacturing

68 NVIDIA ISAAC ROBOTICS PLATFORM
SIMULATION, TRAINING, DEPLOYMENT, SDK

71 THE GPU COMPUTING REVOLUTION CONTINUES
GRAPHICS: NVIDIA RTX, QUADRO GV100
AI: NEW TESLA V100 32GB; NEW DGX-2, 1ST 2PF COMPUTER, 300 SERVERS IN A BOX; NEW TENSORRT 4 AND MORE; Kubernetes On NVIDIA GPUs
AUTO: DRIVE SIM & CONSTELLATION; ONE ARCHITECTURE: XAVIER, PEGASUS, ORIN
NEW PLATFORMS: ISAAC, CLARA
