Accelerated Platforms: The Future of Computing. Marc Hamilton, VP Solutions Architecture & Engineering, NVIDIA Korea AI Conference 2018

Transcription

1 Accelerated Platforms: The Future of Computing Marc Hamilton, VP Solutions Architecture & Engineering, NVIDIA Korea AI Conference 2018

2 Forces Shaping Computing GPU PERFORMANCE CPU PERFORMANCE Beyond Moore's Law

3 Forces Shaping Computing GPU PERFORMANCE CPU PERFORMANCE Beyond Moore's Law 1000x Every 10 Years Accelerated Computing

4 Forces Shaping Computing GPU PERFORMANCE CPU PERFORMANCE DATA DEEP NEURAL NETWORK PROGRAM Beyond Moore's Law 1000x Every 10 Years Accelerated Computing The Deep Learning Revolution

5 NVIDIA is Accelerators for Demanding Applications Platforms That Provide Complete Solutions Graphics AI Healthcare Autonomous Vehicles Data Science Robotics

6 Accelerating Deep Learning Accelerating Graphics Accelerating Science Accelerating Data Science Scaling Accelerating Autonomous Vehicles Accelerating Robotics Conclusion

7 Deep Learning Everywhere Internet & Cloud Image Classification Speech Recognition Language Translation Language Processing Sentiment Analysis Recommendations Medicine & Biology Cancer Cell Detection Diabetic Grading Drug Discovery Media & Entertainment Video Captioning Video Search Real Time Translation Intelligent Video Analytics Traffic Analysis Retail Analytics Access Control Transportation Pedestrian Detection Lane Tracking Traffic Sign Recognition

8 AI IMPROVES SEMICON INSPECTION ACCURACY In the semiconductor industry, inaccurate or false positive fault inspections can lead to huge product losses. SK Hynix deployed an AI fault detection solution with NVIDIA Tesla GPUs, DGX Station, Jetson TX2, CUDA and TensorRT. With its new deep learning-based tool, SK Hynix achieved over 90% inspection accuracy.

9 Deep Learning Was Enabled by Hardware

10 Deep Learning is Gated by Hardware. Growth in network complexity (GOPS * Bandwidth): Image 350X (AlexNet, GoogLeNet, ResNet-50, Inception-v2, Inception-v4); Speech 30X (DeepSpeech, DeepSpeech 2, DeepSpeech 3); Translation 10X (OpenNMT, GNMT, MoE)

11 Tesla V100 Tensor Core GPU 21B Transistors TSMC 12nm FFN 815 mm² 5,120 CUDA Cores 7.5 FP64 TFLOPS 15 FP32 TFLOPS 125 Tensor TFLOPS 20 MB SM RF 16 MB Cache 32 GB 900 GB/s 300 GB/s NVLink

12 Tensor Core Mixed Precision Matrix Math: D = AB + C on 4x4 Matrices, where A = (A i,j), B = (B i,j), C = (C i,j) for i, j = 0..3; FP16 or FP32
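The D = AB + C operation on slide 12 can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `to_fp16` and `tensor_core_mma` are hypothetical, inputs A and B are rounded to half precision, and the Python float accumulator (FP64) stands in for the FP32 accumulator; it does not model the real hardware.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def tensor_core_mma(a, b, c):
    """Return D = A*B + C for 4x4 matrices.

    Assumption: A and B are rounded to FP16 before multiplying, and the
    products are accumulated in a wider type (a Python float here).
    """
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = c[i][j]  # start from the accumulator matrix C
            for k in range(n):
                acc += to_fp16(a[i][k]) * to_fp16(b[k][j])
            d[i][j] = acc
    return d


# Illustrative inputs: B is the identity, so D = fp16(A) + C.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
a = [[0.1 * (i + j) for j in range(4)] for i in range(4)]
c = [[1.0] * 4 for _ in range(4)]
d = tensor_core_mma(a, identity, c)
```

Multiplying by the identity makes the rounding visible: each entry of `d` is the half-precision rounding of the matching entry of `a`, plus 1.0 from `c`.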

13 Turing Accelerates Inference Quadro RTX ,608 CUDA Cores 576 Tensor Cores 48 GB GDDR6 Memory TFLOPS FP TOPS INT8 522 TOPS INT4 672 GB/s DRAM BW 250 GB/s (Bidirectional) NVLink Channels 295 W TENSOR CORES SHADER COMPUTE RT CORES Tesla T4 2,560 CUDA Cores 320 Tensor Cores 16 GB GDDR6 Memory 65 TFLOPS FP TOPS INT8 260 TOPS INT4 320 GB/s DRAM BW 70 W

14 Turing Tensor Core - Optimized for Inference Multi-Precision for AI Inference T4: 65 TFLOPS FP TOPS INT8 260 TOPS INT4 RTX 8000: 130 TFLOPS FP TOPS INT8 520 TOPS INT4

15 World's Most Performant Inference Platform: Up to 36X Faster Than CPUs, Accelerates All AI Workloads. Peak Performance (TFLOPS / TOPS, Tesla P4 vs. Tesla T4): 6X Faster INT8 Ops vs. P4. Speech Inference (speedup vs. CPU server): 21X Faster DeepSpeech 2. Video Inference (speedup vs. CPU server): 27X Faster ResNet-50 (7 ms Latency Limit). Natural Language Processing Inference (speedup vs. CPU server): 36X Faster GNMT

16 TESLA P4/T4 TensorRT JETSON AGX NVIDIA TensorRT 5 Optimizer Runtime DRIVE AGX Multi-Precision Acceleration of All Frameworks Containerized Inference Serving Engine Docker and Kubernetes Integration TESLA V100 NVIDIA DLA Platforms Layer & Tensor Fusion Precision Calibration Kernel Auto-Tuning Dynamic Tensor Memory

17 TensorRT Inference Server DNN Models NV DL SDK NV Docker TensorRT Inference Server Kubernetes

18 Space and Power Reduction: Game-Changing Inference Performance. Inference Workload (Speech, NLP and Video): 200 CPU Servers at 60 kW = 1 T4 Accelerated Server at 2 kW

19 Accelerating Deep Learning Accelerating Graphics Accelerating Science Accelerating Data Science Scaling Accelerating Autonomous Vehicles Accelerating Robotics Conclusion

20 Turing Revolutionizes Graphics Quadro RTX ,608 CUDA Cores 576 Tensor Cores 72 RT Cores 48 GB GDDR6 Memory 130 TFLOPS FP TOPS INT8 520 TOPS INT4 336 GB/s DRAM BW 250 GB/s NVLINK Channels 295W Turing SM 14 TFLOPS + 14 TIPS Concurrent FP & INT Execution Variable Rate Shading RT Core 10 Giga Rays/sec Ray Triangle Intersection BVH Traversal Tensor Core 114 TFLOPS FP TOPS INT8 455 TOPS INT4

21 Deep Learning for Imaging Colorizing UC Berkeley In-Painting NVIDIA FP32 INT32 TC FP32 INT32 TC PASCAL TITAN Xp TURING 2080 Ti Turing 9X Peak FLOPS Denoising Disney Research, Pixar, UCSB SuperRez NVIDIA

22 Turing Ray Tracing Performance: >10 Giga Rays/s (Primary). GTX 1080 Ti: 11.3 TFLOPS, 1.1 Giga Rays (10 TFLOPS / Giga Ray). RTX 2080 Ti: 68 RT Cores, 10+ Giga Rays, ~10X faster than 1080 Ti. Chart categories: Mustang, Dragon, Veyron-NG, Blade, Buddha, GeoMean

24 Turing: A Giant Leap. Gaming Reinvented, World's First Ray Tracing GPU, Universal Deep Learning Accelerator

25 Accelerating Deep Learning Accelerating Graphics Accelerating Science Accelerating Data Science Scaling Accelerating Autonomous Vehicles Accelerating Robotics Conclusion

26 Accelerating Science VASP AMBER NAMD GROMACS Gaussian Simulia Abaqus WRF OpenFOAM ANSYS LS-DYNA BLAST LAMMPS ANSYS Fluent Quantum Espresso GAMESS Top 15 HPC Applications Intersect360 Research, Nov 2017 HPC Application Support for GPU Computing 600 Accelerated Applications

27 NVIDIA Powers World's Fastest Supercomputer: Summit Becomes First System to Scale the 100 PetaFLOPS Milestone. 122 PF HPC, 3 EF AI, 27,648 Volta V100 Tensor Core GPUs

28 NVIDIA Powers Fastest Supercomputers in US, Europe, Japan, Industry; 17 of World's 20 Most Energy-Efficient Supercomputers. ORNL Summit (World's Fastest): 27,648 GPUs, 122 PF. LLNL Sierra (US 2nd Fastest): 17,280 GPUs, 72 PF. ABCI (Japan's Fastest): 4,352 GPUs, 20 PF. Piz Daint (Europe's Fastest): 5,320 GPUs, 20 PF. ENI HPC4 (Fastest Industrial): 3,200 GPUs, 12 PF

29 HPC: Algorithms Based on First Principles Theory, Proven Models for Accurate Results. AI: Neural Networks That Learn Patterns From Large Data Sets, Improve Predictive Accuracy and Faster Response Time. AI, A New Instrument for Science: Dramatically Improves Accuracy and Time-to-Solution. Commercially viable fusion energy; understanding cosmological dark energy and matter; clinically viable precision medicine; improvement and validation of the Standard Model of Physics; climate/weather forecasts with ultra-high fidelity

30 AI for Science: Transformative Tool to Accelerate the Pace of Scientific Innovation. 90% Accuracy: Fusion Sustainment (Clean Energy); 33% Faster: Track Neutrinos (Particle Physics); 5,000X Faster: Process LIGO Signal (Understanding Universe); 300,000X Faster: Predict Molecular Energetics (Drug Discovery); 70% Accuracy: Score Protein Ligand (Drug Discovery); 11% Higher Accuracy: Monitor Earth's Vital (Climate); Weeks to 10 milliseconds: Analyze Gravitational Lensing (Astrophysics); 14X Faster: Generate Bose-Einstein Condensate (Physics). Improves Accuracy: Enabling Realization of Full Scientific Potential. Accelerates Time to Solution: Unlocking Science in Exciting New Ways

31 AI TURNS SATELLITE IMAGES INTO VALUABLE INSIGHT Satellite imagery has many uses, including disaster recovery, crop yield prediction, urban planning, and national defense. The Satrec Initiative and SI Analytics (SIA) apply GPU-powered AI to turn satellite images into valuable data for their customers. Using the NVIDIA DGX Station to improve speed and efficiency, Satrec and SIA now analyze 30K satellite images in 3 minutes, vs. 40 minutes with previous methods.

32 Tensor Core GPU Fuses HPC & AI Computing HPC (Simulation) FP64, FP32 AI (Deep Learning) FP16, INT8 HPC AI Volta Tensor Core GPU Multi-Precision Computing Fusion of HPC & AI

33 Tensor Core GPU Delivering Breakthrough Performance Multi-Precision Computing for Advancing Science Unlocking the Power of Superconductivity Finding Genes-to-Disease Connection 150x 50x 1x 1x Titan Node Summit Node Titan Node Summit Node Volta Tensor Core GPU Materials APP QMCPack (FP64, FP32) Genomics APP CoMet (FP16)

34 AI IMPROVES QUALITY AND PRODUCTION YIELD Delivering products of impeccable quality is a great opportunity for manufacturers to differentiate, but it raises the bar for detecting the smallest product defects. LG Consulting and Solutions (LG CNS) is using AI to identify product defects for the entire LG Electronics and LG Display production lines. With NVIDIA Tesla P4, Jetson TX2, DGX-1 and TensorRT to speed training and inference, LG CNS achieved 1.5% higher yield, 65% quality improvement, and reduced inspection-induced employee fatigue.

35 Reduced Cost, Space, Power: 5X Better HPC TCO for Same Throughput. Mixed HPC Workload (Amber, CHROMA, GTC, LAMMPS, MILC, NAMD, Quantum Espresso, SPECFEM3D): 160 Self-hosted Skylake CPU Servers at 96 kW = 8 Accelerated Servers with 4 V100 GPUs at 13 kW. 1/5 the Cost, 1/7 the Space, 1/7 the Power
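The power claim on slide 35 can be checked with simple arithmetic from the figures on the slide. The sketch below uses only those numbers; the variable names are illustrative, and the cost and space ratios are not recomputed because the slide gives no per-server cost or rack-space figures.

```python
# Figures as stated on slide 35.
cpu_servers, cpu_kw = 160, 96   # self-hosted Skylake CPU servers
gpu_servers, gpu_kw = 8, 13     # accelerated servers with 4 V100 GPUs each

# Ratio of power draw for the same mixed HPC workload.
power_ratio = cpu_kw / gpu_kw

# Ratio of server counts; this is not the same as rack space, since the
# slide does not say how much space each kind of server occupies.
server_ratio = cpu_servers / gpu_servers

print(f"power: about 1/{power_ratio:.1f} of the CPU cluster")
print(f"servers: 1/{server_ratio:.0f} of the CPU cluster")
```

The power ratio comes out at roughly 7.4, consistent with the slide's rounded "1/7 the Power".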

36 Accelerating Deep Learning Accelerating Graphics Accelerating Science Accelerating Data Science Scaling Accelerating Autonomous Vehicles Accelerating Robotics Conclusion

37 INTERNET DEEP LEARNING $36B RETAIL HEALTHCARE FINANCIAL SERVICES LOGISTICS TELECOM AD TECH The New HPC Market $9B SCIENTIFIC COMPUTING HADOOP NUMPY SKL PANDAS SCIENTIFIC COMPUTING MACHINE LEARNING

38 The De Facto Data Science Platform. PYTHON (1991, Guido van Rossum): interpreted language emphasizing readability. NUMPY (2006, Travis Oliphant): multi-dimensional arrays, math functions. PANDAS (2008, Wes McKinney): data manipulation and analysis. SKLEARN (2010, Inria): machine learning library

39 The De Facto Data Science Platform: PYTHON, NUMPY, PANDAS, SKLEARN, DASK. DASK (Matthew Rocklin): parallel processing in Python data analytics; dynamic task scheduling; collection of parallel arrays, data frames, lists
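The task scheduling mentioned on slide 39 rests on a task graph: a dictionary whose values are either data or tuples of a function followed by its arguments. The sketch below is a toy, sequential evaluator for such a graph, written in plain Python and not using the Dask library itself; the function name `get` and the graph contents are illustrative assumptions. A real scheduler would run independent tasks in parallel.

```python
from operator import add, mul


def get(graph, key, cache=None):
    """Evaluate `key` in a dict-based task graph, computing dependencies on demand."""
    cache = {} if cache is None else cache
    if key in cache:
        return cache[key]
    node = graph[key]
    if isinstance(node, tuple) and node and callable(node[0]):
        # A task: (function, arg1, arg2, ...); string args naming other
        # keys in the graph are dependencies and are resolved first.
        func, *args = node
        resolved = [
            get(graph, a, cache) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        value = func(*resolved)
    else:
        value = node  # plain data
    cache[key] = value
    return value


graph = {
    "x": 3,
    "y": 4,
    "x_sq": (mul, "x", "x"),   # independent of y_sq, so could run in parallel
    "y_sq": (mul, "y", "y"),
    "total": (add, "x_sq", "y_sq"),
}
result = get(graph, "total")
```

Because `x_sq` and `y_sq` share no dependencies, a parallel scheduler is free to run them at the same time; the toy evaluator simply visits them in order.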

40 RAPIDS Accelerated Data Science: PYTHON, DASK, CUDF (PANDAS-LIKE), CUML (SKLEARN-LIKE), CUDA, alongside PANDAS, SKLEARN, NUMPY. ARROW (2016, Wes McKinney): cross-language platform for in-memory data; columnar memory format; vectorized execution engine; zero-copy IPC; designed with GPU in mind
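Two of the Arrow properties listed on slide 40, the columnar memory format and zero-copy sharing, can be illustrated with the Python standard library alone. This is a conceptual sketch, not the Arrow API: the table contents and variable names are made up, and `array` plus `memoryview` stand in for Arrow buffers.

```python
from array import array

# Row-oriented records: each tuple mixes fields of different types.
rows = [(1, 2.5), (2, 4.0), (3, 5.5), (4, 7.0)]

# Columnar layout: one contiguous, typed buffer per field.
ids = array("q", (r[0] for r in rows))      # 64-bit signed integers
values = array("d", (r[1] for r in rows))   # 64-bit floats

# Zero-copy: a memoryview slice shares the column's memory instead of
# copying it, so a write to the column is visible through the view.
view = memoryview(values)
tail = view[2:]
values[3] = 10.0

# A whole-column operation scans one contiguous buffer.
total = sum(values)
```

After the write, `tail` reflects the new value without any copy having been made, which is the property that lets processes or libraries share a column cheaply.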

41 RAPIDS Accelerated Data Science PYTHON PYTHON PYTHON PANDAS SKLEARN DASK PANDAS SKLEARN DASK CUDF RAPIDS CUML CUGRAPH DEEP LEARNING FRAMEWORKS CUDNN CUDA NUMPY NUMPY ARROW

42 RAPIDS: Dramatic ML Acceleration. Charts: time in seconds for ETL, ML, and End-to-End on 20, 50, and 100 CPU Nodes vs. DGX-2, with axis markers at 1 Hour, 2 Hours, and 3 Hours

44 DGX GB DGX GB Enterprise-Scale Data Science DGX STATION RTX GB 96 GB TESLA V GB

45 Accelerating Deep Learning Accelerating Graphics Accelerating Science Accelerating Data Science Scaling Accelerating Autonomous Vehicles Accelerating Robotics Conclusion

46 New NVIDIA DGX-2: The Largest GPU Ever Created. 2 PFLOPS, 512 GB HBM2, 16 TB/sec Memory Bandwidth, 10 kW, 160 kg

47 The World's Largest GPU: 16 Tesla V100 32GB Connected by NVSwitch. On-Chip Memory Fabric Semantic Extended Across All GPUs. 512 GB HBM2 and 14.4 TB/sec Aggregate, 81,920 CUDA Cores, 2,000 TFLOPS Tensor Cores
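The aggregate figures on slide 47 follow from multiplying the per-GPU Tesla V100 figures on slide 11 by 16. A short check, with illustrative variable names:

```python
GPUS = 16  # Tesla V100 32GB GPUs in one DGX-2 (slide 47)

# Per-GPU figures as stated on slide 11.
cuda_cores_per_gpu = 5120
hbm2_gb_per_gpu = 32
tensor_tflops_per_gpu = 125
mem_bw_gbs_per_gpu = 900

cuda_cores = GPUS * cuda_cores_per_gpu          # total CUDA cores
hbm2_gb = GPUS * hbm2_gb_per_gpu                # total HBM2 capacity in GB
tensor_tflops = GPUS * tensor_tflops_per_gpu    # total Tensor TFLOPS
bandwidth_tbs = GPUS * mem_bw_gbs_per_gpu / 1000  # aggregate memory bandwidth in TB/s

print(cuda_cores, hbm2_gb, tensor_tflops, bandwidth_tbs)
```

The products match the slide: 81,920 CUDA cores, 512 GB, 2,000 TFLOPS, and 14.4 TB/sec. Slide 46 quotes 16 TB/sec for memory bandwidth, which this per-GPU arithmetic does not reproduce.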

48 NVSwitch
Parameter | Spec
Bidirectional Bandwidth per NVLink | 51.5 GB/s
NRZ Lane Rate (x8 per NVLink) | Gbps
Transistors | 2 Billion
Process | TSMC 12FFN
Die Size | 106 mm^2
Bidirectional Aggregate Bandwidth | 928 GB/s
NVLink Ports | 18
Mgmt Port (Config, Maintenance, Errors) | PCIe
LD/ST BW Efficiency (128B pkts) | 80.0%
Copy Engine BW Efficiency (256B pkts) | 88.9%
Block diagram: NVLINK PHYS, PORT LOGIC, XBAR, MANAGEMENT
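The two bandwidth-efficiency figures on slide 48 are consistent with a simple payload / (payload + overhead) model. The 32-byte per-packet overhead below is an assumption chosen because it reproduces both percentages; the slide does not state the overhead, and the function name is illustrative.

```python
ASSUMED_OVERHEAD_BYTES = 32  # assumption, not stated on the slide


def efficiency(payload_bytes: int, overhead_bytes: int = ASSUMED_OVERHEAD_BYTES) -> float:
    """Fraction of link bandwidth that carries payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)


ld_st = efficiency(128)        # load/store traffic, 128B packets
copy_engine = efficiency(256)  # copy engine traffic, 256B packets

# Aggregate bandwidth from the per-link figure on the slide.
ports, per_link_gbs = 18, 51.5
aggregate_gbs = ports * per_link_gbs

print(f"{ld_st:.1%} {copy_engine:.1%} {aggregate_gbs} GB/s")
```

Under this assumption the model gives 80.0% and 88.9%. The per-link product is 927 GB/s, slightly under the stated 928 GB/s, presumably because 51.5 GB/s is itself a rounded figure.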

49 Traditional Machine Learning Cluster: 300 Servers, $3M, 180 kW

50 GPU-Accelerated Machine Learning Cluster: DGX-2 and RAPIDS for Predictive Analytics. 1 DGX-2, 10 kW. 1/8 the Cost, 1/15 the Space, 1/18 the Power

51 NVIDIA Accelerated HPC Platform SCIENCE CUDA DL TRAINING cuDNN DL INFERENCE New TensorRT Hyperscale Inference Platform MACHINE LEARNING New RAPIDS Dense HPC NVIDIA HPC Acceleration Stacks Hyperscale HPC

52 Accelerating Deep Learning Accelerating Graphics Accelerating Science Accelerating Data Science Scaling Accelerating Autonomous Vehicles Accelerating Robotics Conclusion

53 Trunk Opening NVIDIA DRIVE Software-Defined Car Powerful and Efficient for AI, CV, AR, HPC Rich Software Development Platform 370+ Partners Developing on DRIVE Eye Gaze Detect RADAR Distracted Driver Drowsy Driver Track Cyclist Alert CG Lidar Localization LIDAR LIDAR Localization Path Perception Camera Localization Path Planning Surround Perception Lanes Signs Lights Egomotion DRIVE AGX Xavier DRIVE AGX Pegasus

54 NVIDIA DRIVE TRAINING SIMULATING DRIVING Cars Pedestrians Lanes Path Signs Lights

56 16 Lane CSI 109 Gbps CPHY 1.1 1Gb Ethernet DLA 5.7 TFLOPS FP TOPS INT8 Xavier World's First Autonomous Machine Processor Multimedia Engines 1.2 GPIX/s Encode 1.8 GPIX/s Decode 4 GPIX/s Video Image Compositor Vision Accelerator 1.7 TOPS Stereo & Optical Flow Engine 2x 3.1 TOPS Industry Standard High-Speed IO PCIe Gen4 Root and Endpoint USB 3.1 Gen2 Host and Device UFS 2.1 Embedded Storage ISP 2.4 GPIX/s Native Full-Range HDR Tile-Based Processing Most Complex SOC Ever Made 9 Billion Transistors, 350 mm², 12nm FFN ~8,000 Engineering Years Volta Tensor Core GPU FP32 / FP16 / INT8 Multi-Precision 512 CUDA Tensor Cores 2.8 CUDA TFLOPS (FP16) 22.6 Tensor Core DL TOPS Carmel ARM64 CPU 8 Cores 10-wide Superscalar 21 SpecInt2K6 256-Bit LPDDR4X 137 GB/s

57 NVIDIA DRIVE World's First Autonomous Vehicle Platform DRIVE IX Available Now DRIVE AGX Xavier Developer Kit Available Now DRIVE AV Available Now

58 Accelerating Deep Learning Accelerating Graphics Accelerating Science Accelerating Data Science Scaling Accelerating Autonomous Vehicles Accelerating Robotics Conclusion

59 NVIDIA Isaac SENSOR PROCESSING MAPPING & LOCALIZATION PERCEPTION PATH & TASK PLANNING SITUATION UNDERSTANDING DIVERSITY & REDUNDANCY

60 Efficient Learning of Robust Policies Randomize Physical Parameters to Match Real World Rollouts [Chebotar-Handa-Makoviychuk-Macklin-Ratliff-Fox: 18]

61 BETTER HUMAN-ROBOT INTERACTION, BETTER SERVICE ROBOTS From assisting the elderly to helping with everyday chores, service robots hold the promise of making lives easier. But complex, crowded environments could mean collisions between robots, humans and objects. Naver Labs designs autonomous robots with better human interaction. With AROUND G and its obstacle avoidance, built on NVIDIA Jetson Xavier for the real-time preprocessing and neural network inference critical to navigating real-world environments, Naver Labs aims to popularize service robots in the near future.

62 New NVIDIA AGX Embedded AI HPC High-Speed SerDes 109 Gbps Gbps I/O Up to 320 TOPS Tensor Ops Up to 25 TFLOPS FP32 Up to 16 GIGA Rays Starting from 15W

63 NVIDIA Jetson AGX World's First Edge AI Computer Isaac Gems Jetson AGX Xavier Developer Kit Available Now Isaac Sim

64 AI DELIVERS BUSINESS VALUE Realizing the harmonization of AI and robot technology Enhancing embedded deep learning with NVIDIA Jetson

65 New NVIDIA Platforms PYTHON SCIENCE CUDA DL TRAINING cuDNN DL INFERENCE New TensorRT Hyperscale Inference Platform MACHINE LEARNING New RAPIDS DASK CUDF RAPIDS CUML CUDA CUGRAPH DEEP LEARNING FRAMEWORKS CUDNN ARROW CUDA GPUs NVIDIA HPC Acceleration Stacks Ecosystem

66

ACCELERATED COMPUTING: THE PATH FORWARD. Jensen Huang, Founder & CEO SC17 Nov. 13, 2017

ACCELERATED COMPUTING: THE PATH FORWARD. Jensen Huang, Founder & CEO SC17 Nov. 13, 2017 ACCELERATED COMPUTING: THE PATH FORWARD Jensen Huang, Founder & CEO SC17 Nov. 13, 2017 COMPUTING AFTER MOORE S LAW Tech Walker 40 Years of CPU Trend Data 10 7 GPU-Accelerated Computing 10 5 1.1X per year

More information

A NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA

A NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA A NEW COMPUTING ERA Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA THE ERA OF AI AI CLOUD MOBILE PC 2 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 5 1.1X

More information

RECENT TRENDS IN GPU ARCHITECTURES. Perspectives of GPU computing in Science, 26 th Sept 2016

RECENT TRENDS IN GPU ARCHITECTURES. Perspectives of GPU computing in Science, 26 th Sept 2016 RECENT TRENDS IN GPU ARCHITECTURES Perspectives of GPU computing in Science, 26 th Sept 2016 NVIDIA THE AI COMPUTING COMPANY GPU Computing Computer Graphics Artificial Intelligence 2 NVIDIA POWERS WORLD

More information

GPU ACCELERATED COMPUTING. 1 st AlsaCalcul GPU Challenge, 14-Jun-2016, Strasbourg Frédéric Parienté, Tesla Accelerated Computing, NVIDIA Corporation

GPU ACCELERATED COMPUTING. 1 st AlsaCalcul GPU Challenge, 14-Jun-2016, Strasbourg Frédéric Parienté, Tesla Accelerated Computing, NVIDIA Corporation GPU ACCELERATED COMPUTING 1 st AlsaCalcul GPU Challenge, 14-Jun-2016, Strasbourg Frédéric Parienté, Tesla Accelerated Computing, NVIDIA Corporation GAMING PRO ENTERPRISE VISUALIZATION DATA CENTER AUTO

More information

A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017

A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017 A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 6 10 5 1.1X per year 10 4 10 3 10 2 1.5X per year Single-threaded

More information

A NEW COMPUTING ERA. DAVID B. KIRK, FELLOW NVIDIA AI Conference Singapore 2017

A NEW COMPUTING ERA. DAVID B. KIRK, FELLOW NVIDIA AI Conference Singapore 2017 A NEW COMPUTING ERA DAVID B. KIRK, FELLOW NVIDIA AI Conference Singapore 2017 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 5 1.1X per year 10 3 1.5X per year Single-threaded

More information

ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015

ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 COMMODITY DISRUPTS CUSTOM SOURCE: Top500 ACCELERATED COMPUTING: THE PATH FORWARD It s time to start

More information

INVESTOR UPDATE. September 2018

INVESTOR UPDATE. September 2018 INVESTOR UPDATE September 2018 SAFE HARBOR Forward-Looking Statements Except for the historical information contained herein, certain matters in this presentation including, but not limited to, statements

More information

POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017

POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017 POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017 LIFE AFTER MOORE S LAW 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 Transistors (thousands) 1.1X per year 10 4 10 3 1.5X per year

More information

Accelerating High Performance Computing.

Accelerating High Performance Computing. Accelerating High Performance Computing http://www.nvidia.com/tesla Computing The 3 rd Pillar of Science Drug Design Molecular Dynamics Seismic Imaging Reverse Time Migration Automotive Design Computational

More information

SYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA GPUS

SYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA GPUS SYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA S Axel Koehler, Principal Solution Architect HPCN%Workshop%Goettingen,%14.%Mai%2018 NVIDIA - AI COMPUTING COMPANY Computer Graphics Computing Artificial Intelligence

More information

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications TESLA V100 PERFORMANCE GUIDE Life Sciences Applications NOVEMBER 2017 TESLA V100 PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

NVIDIA PLATFORM FOR AI

NVIDIA PLATFORM FOR AI NVIDIA PLATFORM FOR AI João Paulo Navarro, Solutions Architect - Linkedin i am ai HTTPS://WWW.YOUTUBE.COM/WATCH?V=GIZ7KYRWZGQ 2 NVIDIA Gaming VR AI & HPC Self-Driving Cars GPU Computing 3 GPU COMPUTING

More information

Building the Most Efficient Machine Learning System

Building the Most Efficient Machine Learning System Building the Most Efficient Machine Learning System Mellanox The Artificial Intelligence Interconnect Company June 2017 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide

More information

SUPERCHARGE DEEP LEARNING WITH DGX-1. Markus Weber SC16 - November 2016

SUPERCHARGE DEEP LEARNING WITH DGX-1. Markus Weber SC16 - November 2016 SUPERCHARGE DEEP LEARNING WITH DGX-1 Markus Weber SC16 - November 2016 NVIDIA Pioneered GPU Computing Founded 1993 $7B 9,500 Employees 100M NVIDIA GeForce Gamers The world s largest gaming platform Pioneering

More information

GPUs and the Future of Accelerated Computing Emerging Technology Conference 2014 University of Manchester

GPUs and the Future of Accelerated Computing Emerging Technology Conference 2014 University of Manchester NVIDIA GPU Computing A Revolution in High Performance Computing GPUs and the Future of Accelerated Computing Emerging Technology Conference 2014 University of Manchester John Ashley Senior Solutions Architect

More information

Autonomous Driving Solutions

Autonomous Driving Solutions Autonomous Driving Solutions Oct, 2017 DrivePX2 & DriveWorks Marcus Oh (moh@nvidia.com) Sr. Solution Architect, NVIDIA This work is licensed under a Creative Commons Attribution-Share Alike 4.0 (CC BY-SA

More information

ENDURING DIFFERENTIATION. Timothy Lanfear

ENDURING DIFFERENTIATION. Timothy Lanfear ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 Transistors (thousands) 1.1X per year 10 3 10 2 Single-threaded

More information

ENDURING DIFFERENTIATION Timothy Lanfear

ENDURING DIFFERENTIATION Timothy Lanfear ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING GPU-ACCELERATED PERFORMANCE 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 10 3 10 2 Single-threaded perf

More information

GTC Jensen Huang Founder & CEO

GTC Jensen Huang Founder & CEO GTC 2018 Jensen Huang Founder & CEO 2 3 4 SCREEN-SPACE AMBIENT OCCLUSION BAKED LIGHTING 5 GLOBAL ILLUMINATION 6 SCREEN-SPACE REFLECTIONS ENVIRONMENT MAPS 7 RAY TRACED REFLECTIONS 8 SCREEN-SPACE REFRACTION

More information

DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017

DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017 DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE Dennis Lui August 2017 THE RISE OF GPU COMPUTING APPLICATIONS 10 7 10 6 GPU-Computing perf 1.5X per year 1000X by 2025 ALGORITHMS 10 5 1.1X

More information

Deep Learning: Transforming Engineering and Science The MathWorks, Inc.

Deep Learning: Transforming Engineering and Science The MathWorks, Inc. Deep Learning: Transforming Engineering and Science 1 2015 The MathWorks, Inc. DEEP LEARNING: TRANSFORMING ENGINEERING AND SCIENCE A THE NEW RISE ERA OF OF GPU COMPUTING 3 NVIDIA A IS NEW THE WORLD S ERA

More information

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications TESLA P PERFORMANCE GUIDE HPC and Deep Learning Applications MAY 217 TESLA P PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

GTC was the introduction to the future of AI, a protector, a healer, a helper, a guardian, a visionary, and just a little slice of amazing.

GTC was the introduction to the future of AI, a protector, a healer, a helper, a guardian, a visionary, and just a little slice of amazing. GTC 20I8 GTC was the introduction to the future of AI, a protector, a healer, a helper, a guardian, a visionary, and just a little slice of amazing. IT Business Edge Press quote. Publication Clearly the

More information

World s most advanced data center accelerator for PCIe-based servers

World s most advanced data center accelerator for PCIe-based servers NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying

More information

TESLA V100 PERFORMANCE GUIDE May 2018

TESLA V100 PERFORMANCE GUIDE May 2018 TESLA V100 PERFORMANCE GUIDE May 2018 TESLA V100 The Fastest and Most Productive GPU for AI and HPC Volta Architecture Tensor Core Improved NVLink & HBM2 Volta MPS Improved SIMT Model Most Productive GPU

More information

NVIDIA GPU TECHNOLOGY UPDATE

NVIDIA GPU TECHNOLOGY UPDATE NVIDIA GPU TECHNOLOGY UPDATE May 2015 Axel Koehler Senior Solutions Architect, NVIDIA NVIDIA: The VISUAL Computing Company GAMING DESIGN ENTERPRISE VIRTUALIZATION HPC & CLOUD SERVICE PROVIDERS AUTONOMOUS

More information

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA Inference Optimization Using TensorRT with Use Cases Jack Han / 한재근 Solutions Architect NVIDIA Search Image NLP Maps TensorRT 4 Adoption Use Cases Speech Video AI Inference is exploding 1 Billion Videos

More information

TESLA P100 PERFORMANCE GUIDE. Deep Learning and HPC Applications

TESLA P100 PERFORMANCE GUIDE. Deep Learning and HPC Applications TESLA P PERFORMANCE GUIDE Deep Learning and HPC Applications SEPTEMBER 217 TESLA P PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

MACHINE LEARNING WITH NVIDIA AND IBM POWER AI

MACHINE LEARNING WITH NVIDIA AND IBM POWER AI MACHINE LEARNING WITH NVIDIA AND IBM POWER AI July 2017 Joerg Krall Sr. Business Ddevelopment Manager MFG EMEA jkrall@nvidia.com A NEW ERA OF COMPUTING AI & IOT Deep Learning, GPU 100s of billions of devices

More information

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA STATE OF THE ART 2012 18,688 Tesla K20X GPUs 27 PetaFLOPS FLAGSHIP SCIENTIFIC APPLICATIONS

More information

NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI

NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI Overview Unparalleled Value Product Portfolio Software Platform From Desk to Data Center to Cloud Summary AI researchers depend on computing performance to gain

More information

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder EFFICIENT INFERENCE WITH TENSORRT Han Vanholder AI INFERENCING IS EXPLODING 2 Trillion Messages Per Day On LinkedIn 500M Daily active users of iflytek 140 Billion Words Per Day Translated by Google 60

More information

Building the Most Efficient Machine Learning System

Building the Most Efficient Machine Learning System Building the Most Efficient Machine Learning System Mellanox The Artificial Intelligence Interconnect Company June 2017 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide

More information

TACKLING THE CHALLENGES OF NEXT GENERATION HEALTHCARE

TACKLING THE CHALLENGES OF NEXT GENERATION HEALTHCARE TACKLING THE CHALLENGES OF NEXT GENERATION HEALTHCARE Nicola Rieke, Senior Deep Learning Solution Architect Healthcare EMEA Fausto Milletari, Senior Deep Learning Solution Architect Healthcare NALA INTRODUCTION

More information

TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC

TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC TERATECH Juin 2017 Gunter Roth, François Courteille DRAMATIC

More information

The Exascale Era Has Arrived

The Exascale Era Has Arrived Technology Spotlight The Exascale Era Has Arrived Sponsored by NVIDIA Steve Conway, Earl Joseph, Bob Sorensen, and Alex Norton November 2018 EXECUTIVE SUMMARY Earlier this year, scientists broke the exascale

More information

Deep Learning mit PowerAI - Ein Überblick

Deep Learning mit PowerAI - Ein Überblick Stephen Lutz Deep Learning mit PowerAI - Open Group Master Certified IT Specialist Technical Sales IBM Cognitive Infrastructure IBM Germany Ein Überblick Stephen.Lutz@de.ibm.com What s that? and what s

More information

Fast Hardware For AI
