NVIDIA PLATFORM FOR AI

1 NVIDIA PLATFORM FOR AI João Paulo Navarro, Solutions Architect - Linkedin

2 I AM AI

3 NVIDIA: Gaming, VR, AI & HPC, Self-Driving Cars, GPU Computing

4 GPU COMPUTING AT THE HEART OF AI. Performance Beyond Moore's Law. Big Bang of Modern AI. [Chart: 40 Years of CPU Trend Data. Labels: GPU-Computing perf; Single-threaded perf; 1.5X per year; 1.1X per year; 1000X by (year missing in source).] Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for (years missing in source) by K. Rupp.
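The two annual rates quoted on slide 4, 1.5X and 1.1X per year, diverge quickly once compounded. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, the 10-year horizon is an arbitrary choice, and the 1000X target is taken from the slide without assuming which year it refers to.

```python
import math


def growth_after(years: float, annual_rate: float) -> float:
    """Cumulative growth factor after compounding annual_rate for `years` years."""
    return annual_rate ** years


def years_to_reach(target_factor: float, annual_rate: float) -> float:
    """Years of compounding at annual_rate needed to grow by target_factor."""
    return math.log(target_factor) / math.log(annual_rate)


if __name__ == "__main__":
    # Rates as quoted on the slide; the horizon and target are illustrative.
    for rate in (1.5, 1.1):
        print(
            f"{rate}X per year: {growth_after(10, rate):.1f}X after 10 years, "
            f"1000X after {years_to_reach(1000, rate):.1f} years"
        )
```

At 1.5X per year, ten years compound to roughly 58X, against about 2.6X at 1.1X per year. Reaching 1000X takes about 17 years at the faster rate and more than 70 at the slower one.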

5 AlexNet

6 CAMBRIAN EXPLOSION: Convolutional Networks, Recurrent Networks, Generative Adversarial Networks, Reinforcement Learning

7 Convolutional Networks, Recurrent Networks, Generative Adversarial Networks, Reinforcement Learning, New Species. There is a Cambrian explosion of neural networks. Since AlexNet, thousands of new models have emerged. With hundreds of layers and billions of parameters, their complexity has soared by 500X in just 5 years. The hyperscale datacenters that host them serve billions of people, cost billions to operate, and are among the most complex computers the world has ever made. Maintaining great quality of service while minimizing cost is incredibly difficult. Jensen helps us remember with PLASTER: PROGRAMMABILITY, LATENCY, ACCURACY, SIZE, THROUGHPUT, ENERGY EFFICIENCY, RATE OF LEARNING.

8 REVOLUTIONARY AI PERFORMANCE. Volta is the Most Advanced Data Center GPU Ever Built. Performance of up to 100 CPUs; 21 billion transistors; 5120 CUDA cores; new Tensor Core architecture inspired by the demands of deep learning.

9 MAXIMIZING PERFORMANCE ON VOLTA. Greater Than 10x Performance, K80 vs. V. [Chart: GPU Generational Training Scaling, K80 and V100 Tensor Core.] ResNet-152 training, 8x K80 (16 GPUs total) compared with 8x V100 NVLink GPUs, using NVIDIA containers.

10 DEEP LEARNING

11 AI AND DEEP LEARNING

12 NVIDIA AI PLATFORM. Announcing NEW 32GB (2X): Tesla V100, DGX-1 and DGX Station. Every Cloud, Every Computer Maker, NVIDIA GPU Cloud, NVIDIA AI Inference, TITAN V.

13 DEEP LEARNING SOFTWARE: developer.nvidia.com/deep-learning

14 WHAT IS THE BEST DEEP LEARNING FRAMEWORK?

15 DL FRAMEWORKS: How to choose? Jeff Dean and Francois Chollet from Google have shared adoption statistics for DL frameworks.

16 DL FRAMEWORKS: How to choose?

17 DL FRAMEWORKS: How to choose?

18 INFERENCE

19 AI INFERENCING AT THE SPEED OF LIGHT

20 THE BRAIN OF AI CARS. NVIDIA DRIVE: scalable AI platform for the entire range of autonomous driving. 320+ companies have adopted DRIVE, for data centers and in vehicles. Includes automakers and suppliers, mapping and sensor companies, startups and research orgs.

21 NVIDIA DRIVE AUTOMOTIVE PERCEPTION

22 NVIDIA TENSORRT: PROGRAMMABLE INFERENCE ACCELERATOR. [Diagram: Frameworks, TensorRT, Platforms (TESLA P4, JETSON TX2, DRIVE PX 2, NVIDIA DLA, TESLA V100).]

23 TENSORRT

24 NVIDIA TENSORRT: 10X BETTER DATA CENTER TCO. 160 CPU Servers: 45,000 Images / Second, 65 kW.

25 NVIDIA TENSORRT: 10X BETTER DATA CENTER TCO. 1 NVIDIA HGX with 8 Tesla V100 GPUs: 45,000 Images / Second, 3 kW. 1/6 the Cost, 1/20 the Power, 4 Racks in a Box.
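The power claim on slides 24 and 25 can be checked from the quoted figures alone. The following is a minimal sketch of that arithmetic, not an NVIDIA tool: the `Deployment` class and its field names are hypothetical, and only the throughput and power numbers come from the slides. The cost ratio is not derivable from the slides and is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical container for the figures quoted on one slide."""
    name: str
    images_per_second: float
    power_kw: float

    @property
    def images_per_second_per_kw(self) -> float:
        """Inference throughput delivered per kilowatt drawn."""
        return self.images_per_second / self.power_kw


# Figures as quoted on slides 24 and 25.
cpu = Deployment("160 CPU servers", 45_000, 65.0)
gpu = Deployment("1 NVIDIA HGX with 8 Tesla V100 GPUs", 45_000, 3.0)

# Same throughput on both sides, so the power ratio equals the efficiency gain.
power_ratio = cpu.power_kw / gpu.power_kw
efficiency_gain = gpu.images_per_second_per_kw / cpu.images_per_second_per_kw

if __name__ == "__main__":
    for d in (cpu, gpu):
        print(f"{d.name}: {d.images_per_second_per_kw:.0f} images/s per kW")
    print(f"Power ratio (CPU / GPU): {power_ratio:.1f}")
```

The ratio comes out near 21.7, which is consistent with the slide's rounded "1/20 the Power".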

26 TENSORRT - NVIDIA AI INFERENCE. 30M HYPERSCALE SERVERS.
Workloads and network types: IMAGE / VIDEO (CNN); NLP (RNN); RECOMMENDER (MLP-NCF); SPEECH SYNTH (DGN, S2S); ASR (RNN++).
Releases: TensorRT (CNNs); TensorRT 2 (INT8); TensorRT 3 (Tensor Core); TensorRT 4 (TensorFlow Integration, Kaldi Optimization, ONNX, WinML). Timeline labels: Sept 16, Apr 17, Sept 17, Apr 18.
Speed-ups: 190X IMAGE / VIDEO (ResNet-50 with TensorFlow Integration); 50X NLP (GNMT); 45X RECOMMENDER (Neural Collaborative Filtering); 36X SPEECH SYNTH (WaveNet); 60X ASR (DeepSpeech 2 DNN). All speed-ups are chip-to-chip, CPU to GV100.

27 BIG DATA & ANALYTICS

28 DATA DELUGE TO DATA HUNGRY AI. Scale: GIGABYTES, TERABYTES, PETABYTES, EXABYTES, ZETTABYTES. INCREASING DATA VARIETY across BUSINESS PROCESS, WEB, and DIGITAL sources: Offer Details, Purchase Detail, Purchase Record, Payment Record, Offer History, Support Contacts, Segmentation, A/B Testing, Dynamic Pricing, Dynamic Funnels, Behavioral Targeting, Search Marketing, Sentiment, Web Logs, User Click Stream, User Generated Content, Social Network, Mobile Web, Business Data Feeds, Natural Language Processing, Speech To Text, HD Video, Streaming Video, SMS/MMS, Product/Service Logs, Cyber Security Logs, Machine Data, IoT Data, Sensors, Wearable Devices, Connected Vehicles, Infotainment Systems.

29 WORKAROUNDS ARE NOT THE ANSWERS. Sampling misses the whole picture: EXPLORE THE OUTLIERS AND LONG-TAIL EVENTS. Pre-aggregation struggles at scale: RELY ON ACCURATE DATA. Scaling out on CPU infrastructure has tremendous hidden costs: SCALE WITH AN ROI.

30 NVIDIA ACCELERATED ANALYTICS: GPUs in the Data Center. ANALYZE, VISUALIZE, AI-ACCELERATE.

31 GPU FOR ANALYTICS: SOLUTIONS + ARCHITECTURES. [Diagram: TRADITIONAL DATA CENTER vs. GPU-ACCELERATED DATA CENTER. Layers: DEEP LEARNING; VISUALIZATION / ACCELERATED VISUALIZATION; DATABASES / ACCELERATED DATABASES; CORE TECHNOLOGIES (Spark, Scheduler, Mesos). Hardware: NVIDIA Tesla GPUs, NVIDIA DGX Products, Cloud.]

32 GPU-ACCELERATION HAS NO LIMITS. MapD, Kinetica: leading in-memory DB > 50x slower; NoSQL DBs > 100x slower. SQream: aggregate of queries, time (s), less is better. BlazeGraph: GPUs 700X-800X faster than graphs in all cases. [Chart: speed-up over baseline Spark CPU configuration (higher is faster), Spark CPU baseline = 1. Unlabeled bar values in source: 1843, 1403. Configurations: 700M Edges, Single Node Xeon 2650 vs 2 K80; B Edges, 16 EC2 r3.xlarge vs 16 K40s; B Edges, 16 EC2 r3.4xlarge vs 16 K40s; 1.98B Edges.]

33 GPU-ACCELERATION HAS NO LIMITS: MapD

34 MAPD: GPU Accelerated Database

35 ML ACROSS INDUSTRIES: Finance, Healthcare, Telco

36 GPU ACCELERATED ML AND BIG DATA gpuopenanalytics.com

37 H2O4GPU PERFORMANCE: 5x GLM, 10x XGBoost, 40x K-Means

38 NVIDIA VOLTA IN EVERY CLOUD, EVERY DATACENTER

39 NVIDIA GPU CLOUD: Optimized Stacks for Every Cloud. 20,000+ Registered Organizations. 30 Containers. NOW on AWS, GCP, AliCloud, Oracle Cloud, DGX.

40 HOW TO START? Develop on GeForce, Deploy on Tesla. GeForce: start development using GeForce. Cloud: scale out on cloud. Data Center: deploy on data center.

41 developer.nvidia.com

42 INCEPTION PROGRAM

43 NVIDIA PLATFORM FOR AI João Paulo Navarro, Solutions Architect - Linkedin
