Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA
|
|
- Osborne McDaniel
- 5 years ago
- Views:
Transcription
1 Inference Optimization Using TensorRT with Use Cases Jack Han / 한재근 Solutions Architect NVIDIA
2
3 Search Image NLP Maps TensorRT 4 Adoption Use Cases Speech Video
4 AI Inference is exploding 1 Billion Videos Watched Per Day Facebook LIVE VIDEO 1 Billion Voice Searches Per Day Google, Bing, etc. SPEECH 1 Trillion Ads/Rankings Per Day Impressions RECOMMENDATIONS
5 Bigger and More Compute Intensive in quest for accuracy 350X Inception-v4 30X DeepSpeech 3 10X MoE GNMT AlexNet GoogleNet ResNet-50 Inception-v2 DeepSpeech DeepSpeech 2 OpenNMT Image (GOP * Bandwidth) Speech (GOP * Bandwidth) Translation (GOP * Bandwidth)
6 Cambrian Explosion Convolutional Networks Recurrent Networks Generative Adversarial Networks Reinforcement Learning New Species Capsule Nets Encoder/Decoder ReLu BatchNorm LSTM GRU Beam Search 3D-GAN MedGAN Conditional GAN DQN Simulation Mixture of Experts Neural Collaborative Filtering Concat Dropout Pooling WaveNet CTC Attention Coupled GAN Speech Enhancement GAN DDPG Block Sparse LSTM
7 Inefficiency Limits Innovation Single Model Only! Single Framework Only Custom Development ASR NLP Some systems are overused while others are underutilized Recommender Solutions can only support models from one framework Developers need to reinvent the plumbing for every application
8 Speed/accuracy trade-offs for modern convolutional object detectors, Google Research More Accuracy = Time
9 Inference Performance Latency Throughput Efficiency Easy Deploy
10 Inference Optimization Lightweight Model Deep Compression Small Computation Suitable Lightweight Hardware No Network Modification Continuous the Enovation E.g; MobileNet SqueezeNet SEP-Nets E.g; Quantization and Binarization Network Pruning and Sharing Distillation / Factorization
11 TensorRT From framework to target FRAMEWORKS GPU PLATFORMS TESLA T4 TensorRT Optimizer Runtime JETSON TX2 DRIVE PX 2 NVIDIA DLA TESLA V100
12 Throughput (images/sec) Speed-up ResNet-50 V Batch Response time (ms) TensorRT INT8 TensorRT FP16 TensorRT FP32 GPU Native FP32 CPU Native FP32
13 TensorRT 5 More Layers / Plugin / APIs Inference Server Integration
14 TensorRT Workflow Step 1. Optimize trained model Model Optimize Plan
15 TensorRT Workflow Step 2. Deploy optimized plan Embedded Data center Plan Runtime Engine Automotive
16 Step-by-Step TensorRT Plan Build Model Parser C++/Python API Network TensorRT Builder Engine Network Definitions tensorrt-developer-guide/index.html#support_op
17 7 Steps to Deployment with TensorRT Step 1: Convert trained model into TensorRT format Step 2: Create a model parser Step 3: Register inputs and outputs Step 4: Optimize model and create a runtime engine Step 5: Serialize optimized engine Step 6: De-serialize engine Step 7: Perform inference
18 Custom Layer Support using Plugin Layer Model Parser C++/Python API Network TensorRT Builder Engine Network Definitions Plugin Factory Plugin A Plugin B
19 Plugin Layer Procedure icudaengine Plugin Factory Plugin Layer Phase 1. Initialize Layer? (missing) Initialize Phase 2. Inference enqueue Phase 3. Store/Load Serialize Deserialize
20 Supporting Features Feature C++ Python (x86 only) NvCaffeParser NvUffParser CNNs Yes Yes Yes Yes RNNs Yes Yes No No INT8 Calibration Yes Yes N/A N/A Asymmetric Padding Yes Yes No No
21 Define your network using C/C++ or Python TensorRT API INetworkDefinition* network = builder->createnetwork(); convertrnnweights, convertrnnbias setweightsforgate, setbiasforgate
22 TensorRT RNN Programming /usr/src/tensorrt/samples/ common/dumptfwts.py Dump Weights Load Weights Convert Weights Set Weights ckpt wts TF format TRT TF format format TensorRT RNN Layer Get started with samplecharrnn in documentation
23 NMT with Transformer Inference Deploy highly-optimized language translation apps in production environments Running 3 binded engines (Encoder, Generator, Suffle Engine) Modular Network Merge Support NMT layers such as Gather, Softmax, Batch GEMM and Top K Get started with NMT sample in documentation Encoder Input Setup Encoder RNN Runs on CPU GPU-Accelerated Decoder RNN Attention Model Projection TopK Decoder Input Output Beam Search Batch Reduction Beam Scoring Beam Shuffle * CPU-Only: TensorFlow on SKL core, FP32 GPU: V100, TensorRT 5, FP16; Sorted data, Batch=128, English to German
24 Concat Recommendation Inference Deploy multi-layer perceptron (MLP) based recommendation apps in production Predict events (click, purchase, interest) accurately based on input items (user, query, observed activity) Get started with examples on MNIST character recognition and movie recommender system in the documentation 1 2 N User Context Item Item Item... Embedding Embedding Embedding Embedding Embedding xn Fused MLP Kernel Fully Connected Bias Activation Fully Connected Bias Activation Fully Connected Bias Runs on CPU GPU-Accelerated * CPU-Only: TensorFlow on Intel Xeon E v4 CPU at 2.20 GHz; GPU: V100 with custom networks Activation TopK Output Top Scores
25 TFLOPS / TOPS Inferencing with Low precision Peak Performance Float INT8 P4 65 Float INT8 INT4 T4
26 Post Training Quantization Minimize information loss between FP32 and INT8 inference on calibration dataset 100 mini-batches are sufficient to determine quantization parameter (~500 samples)
27 Maximize throughput at low latency with mixed precision compute in production New INT8 APIs and Optimizations FP32 Training Auto Calibration FP32 weights Optimize to INT8 Apply INT8 quantization aware training or custom calibration algorithms with new APIs Control precision per-layer with new APIs Optimizations for depth wise convolution operation FP32 Training FP32 weights Custom Calibration Custom Calibration Calibration Data O(1000) Images FP32 or INT8 weights Custom Activation ranges Optimize to INT8 Getting Start with INT8 sample in the documentation Quantization Aware Training in FP32 Quantization Aware Training Custom Activation Ranges FP32 or accurate INT8 weights Optimize to INT8
28 Inference Performance Throughput Performance/WATT Higher is better!! P4 - INT8 V100 - Mixed P4 - INT8 V100 - Mixed
29 WIDELY ADOPTED 30
30 NVIDIA INFERENCE MOMENTUM Image Tagging Video Analysis Advertising Impact Video Captioning Cybersecurity Visual Search Finding Music Sports Performance Customer Service Visual Search Industrial Inspection Voice Recognition 31
31 Using GPUs made it possible to enable media understanding of the Twitter platform, by drastically reducing media deep learning model training time and by allowing us to derive real-time understanding of live videos at inference time. Nicolas Koumchatzky, Head of Twitter Cortex 32
32 At Microsoft, we are working hard to deliver the most advanced AI-powered services to our customers. Using NVIDIA GPUs in real-time inference workloads has improved Bing s advanced search offerings, enabling us to reduce object detection latency for images. We look forward to working with NVIDIA s next generation inference hardware and software to expand the way people benefit from AI products and services. -Jordi Ribas CVP, Bing and AI Products, Microsoft AI is becoming increasingly pervasive, and inference is a critical capability customers need to successfully deploy their AI models, so we re excited to support NVIDIA s Turing Tesla T4 GPUs on Google Cloud Platform soon. -Chris Kleban, product manager at Google Cloud 33
33 SEOUL NOVEMBER 7-8,2018
EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder
EFFICIENT INFERENCE WITH TENSORRT Han Vanholder AI INFERENCING IS EXPLODING 2 Trillion Messages Per Day On LinkedIn 500M Daily active users of iflytek 140 Billion Words Per Day Translated by Google 60
More informationA NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017
A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 6 10 5 1.1X per year 10 4 10 3 10 2 1.5X per year Single-threaded
More informationS8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer
S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer 2 100 倍以上速く 本当に可能ですか? 2 DOUGLAS ADAMS BABEL FISH Neural Machine Translation Unit 3 4 OVER 100X FASTER, IS IT REALLY POSSIBLE?
More informationNVIDIA PLATFORM FOR AI
NVIDIA PLATFORM FOR AI João Paulo Navarro, Solutions Architect - Linkedin i am ai HTTPS://WWW.YOUTUBE.COM/WATCH?V=GIZ7KYRWZGQ 2 NVIDIA Gaming VR AI & HPC Self-Driving Cars GPU Computing 3 GPU COMPUTING
More informationXilinx ML Suite Overview
Xilinx ML Suite Overview Yao Fu System Architect Data Center Acceleration Xilinx Accelerated Computing Workloads Machine Learning Inference Image classification and object detection Video Streaming Frame
More informationNVIDIA FOR DEEP LEARNING. Bill Veenhuis
NVIDIA FOR DEEP LEARNING Bill Veenhuis bveenhuis@nvidia.com Nvidia is the world s leading ai platform ONE ARCHITECTURE CUDA 2 GPU: Perfect Companion for Accelerating Apps & A.I. CPU GPU 3 Intro to AI AGENDA
More informationA NEW COMPUTING ERA. DAVID B. KIRK, FELLOW NVIDIA AI Conference Singapore 2017
A NEW COMPUTING ERA DAVID B. KIRK, FELLOW NVIDIA AI Conference Singapore 2017 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 5 1.1X per year 10 3 1.5X per year Single-threaded
More informationDEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017
DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE Dennis Lui August 2017 THE RISE OF GPU COMPUTING APPLICATIONS 10 7 10 6 GPU-Computing perf 1.5X per year 1000X by 2025 ALGORITHMS 10 5 1.1X
More informationHow to Build Optimized ML Applications with Arm Software
How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 ML Group Overview Today we will talk about applied machine learning (ML) on Arm. My aim for today is to show you just
More informationHow to Build Optimized ML Applications with Arm Software
How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 Arm K.K. Senior FAE Ryuji Tanaka Overview Today we will talk about applied machine learning (ML) on Arm. My aim for
More informationGPU FOR DEEP LEARNING. 周国峰 Wuhan University 2017/10/13
GPU FOR DEEP LEARNING chandlerz@nvidia.com 周国峰 Wuhan University 2017/10/13 Why Deep Learning Boost Today? Nvidia SDK for Deep Learning? Agenda CUDA 8.0 cudnn TensorRT (GIE) NCCL DIGITS 2 Why Deep Learning
More informationNVIDIA AI INFERENCE PLATFORM
TECHNICAL OVERVIEW NVIDIA AI INFERENCE PLATFORM Giant Leaps in Performance and Efficiency for AI Services, from the Data Center to the Network s Edge Introduction The artificial intelligence revolution
More informationMIXED PRECISION TRAINING: THEORY AND PRACTICE Paulius Micikevicius
MIXED PRECISION TRAINING: THEORY AND PRACTICE Paulius Micikevicius What is Mixed Precision Training? Reduced precision tensor math with FP32 accumulation, FP16 storage Successfully used to train a variety
More informationNVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS
TECHNICAL OVERVIEW NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS A Guide to the Optimized Framework Containers on NVIDIA GPU Cloud Introduction Artificial intelligence is helping to solve some of the most
More informationGTC Jensen Huang Founder & CEO
GTC 2018 Jensen Huang Founder & CEO 2 3 4 SCREEN-SPACE AMBIENT OCCLUSION BAKED LIGHTING 5 GLOBAL ILLUMINATION 6 SCREEN-SPACE REFLECTIONS ENVIRONMENT MAPS 7 RAY TRACED REFLECTIONS 8 SCREEN-SPACE REFRACTION
More informationDeep Learning Inferencing on IBM Cloud with NVIDIA TensorRT
Deep Learning Inferencing on IBM Cloud with NVIDIA TensorRT Khoa Huynh Senior Technical Staff Member (STSM), IBM Larry Brown Senior Software Engineer, IBM Agenda Introduction Inferencing with PyCaffe TensorRT
More informationA NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA
A NEW COMPUTING ERA Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA THE ERA OF AI AI CLOUD MOBILE PC 2 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 5 1.1X
More informationPOINT CLOUD DEEP LEARNING
POINT CLOUD DEEP LEARNING Innfarn Yoo, 3/29/28 / 57 Introduction AGENDA Previous Work Method Result Conclusion 2 / 57 INTRODUCTION 3 / 57 2D OBJECT CLASSIFICATION Deep Learning for 2D Object Classification
More informationDeploying Deep Learning Networks to Embedded GPUs and CPUs
Deploying Deep Learning Networks to Embedded GPUs and CPUs Rishu Gupta, PhD Senior Application Engineer, Computer Vision 2015 The MathWorks, Inc. 1 MATLAB Deep Learning Framework Access Data Design + Train
More informationDeep Learning Accelerators
Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction
More informationDeep Learning: Transforming Engineering and Science The MathWorks, Inc.
Deep Learning: Transforming Engineering and Science 1 2015 The MathWorks, Inc. DEEP LEARNING: TRANSFORMING ENGINEERING AND SCIENCE A THE NEW RISE ERA OF OF GPU COMPUTING 3 NVIDIA A IS NEW THE WORLD S ERA
More informationPOWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017
POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017 LIFE AFTER MOORE S LAW 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 Transistors (thousands) 1.1X per year 10 4 10 3 1.5X per year
More informationTENSORRT. RN _v01 June Release Notes
TENSORRT RN-08624-030_v01 June 2018 Release Notes TABLE OF CONTENTS Chapter Chapter Chapter Chapter Chapter Chapter 1. 2. 3. 4. 5. 6. Overview...1 Release 4.0.1... 2 Release 3.0.4... 6 Release 3.0.2...
More informationDeep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations, and Hardware Implications
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations, and Hardware Implications Jongsoo Park Facebook AI System SW/HW Co-design Team Sep-21 2018 Team Introduction
More informationDeep Learning Inference on Openshift with GPUs
Deep Learning Inference on Openshift with GPUs OpenShift Commons, Seattle, Dec 10 2018 Tripti Singhal Product Manager, NVIDIA Deep Learning Software Tushar Katarki Product Manager, AI on OpenShift AGENDA
More informationTENSORRT. SWE-SWDOCTRT-001-RELN_vTensorRT October Release Notes
TENSORRT SWE-SWDOCTRT-001-RELN_v 5.0.3 October 2018 Release Notes TABLE OF CONTENTS Chapter 1. Overview...1 Chapter 2. Release 5.x.x... 2 2.1. Release 5.0.3... 2 2.2. Release 5.0.2... 3 2.3. Release 5.0.1
More informationWorld s most advanced data center accelerator for PCIe-based servers
NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying
More informationCNN optimization. Rassadin A
CNN optimization Rassadin A. 01.2017-02.2017 What to optimize? Training stage time consumption (CPU / GPU) Inference stage time consumption (CPU / GPU) Training stage memory consumption Inference stage
More informationMIXED PRECISION TRAINING OF NEURAL NETWORKS. Carl Case, Senior Architect, NVIDIA
MIXED PRECISION TRAINING OF NEURAL NETWORKS Carl Case, Senior Architect, NVIDIA OUTLINE 1. What is mixed precision training with FP16? 2. Considerations and methodology for mixed precision training 3.
More informationCharacterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager
Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance
More informationWu Zhiwen.
Wu Zhiwen zhiwen.wu@intel.com Agenda Background information OpenCV DNN module OpenCL acceleration Vulkan backend Sample 2 What is OpenCV? Open Source Compute Vision (OpenCV) library 2500+ Optimized algorithms
More informationNVIDIA DEEP LEARNING PLATFORM
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance and Efficiency for AI Services, From the Data Center to the Network s Edge Introduction Artificial intelligence (AI), the dream
More informationDeep learning prevalence. first neuroscience department. Spiking Neuron Operant conditioning First 1 Billion transistor processor
WELCOME TO Operant conditioning 1938 Spiking Neuron 1952 first neuroscience department 1964 Deep learning prevalence mid 2000s The Turing Machine 1936 Transistor 1947 First computer science department
More informationIntroduction to Deep Learning in Signal Processing & Communications with MATLAB
Introduction to Deep Learning in Signal Processing & Communications with MATLAB Dr. Amod Anandkumar Pallavi Kar Application Engineering Group, Mathworks India 2019 The MathWorks, Inc. 1 Different Types
More informationSpeculations about Computer Architecture in Next Three Years. Jan. 20, 2018
Speculations about Computer Architecture in Next Three Years shuchang.zhou@gmail.com Jan. 20, 2018 About me https://zsc.github.io/ Source-to-source transformation Cache simulation Compiler Optimization
More informationTEXAS INSTRUMENTS DEEP LEARNING (TIDL) GOES HERE FOR SITARA PROCESSORS GOES HERE
YOUR TEXAS INSTRUMENTS VIDEO TITLE DEEP LEARNING (TIDL) GOES HERE FOR SITARA PROCESSORS OVERVIEW THE SUBTITLE GOES HERE Texas Instruments Deep Learning (TIDL) for Sitara Processors Overview Texas Instruments
More informationHPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov
HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads Natalia Vassilieva, Sergey Serebryakov Deep learning ecosystem today Software Hardware 2 HPE s portfolio for deep learning Government,
More informationMoonRiver: Deep Neural Network in C++
MoonRiver: Deep Neural Network in C++ Chung-Yi Weng Computer Science & Engineering University of Washington chungyi@cs.washington.edu Abstract Artificial intelligence resurges with its dramatic improvement
More informationTENSORRT. RN _v01 January Release Notes
TENSORRT RN-08624-030_v01 January 2018 Release Notes TABLE OF CONTENTS Chapter Chapter Chapter Chapter 1. 2. 3. 4. Overview...1 Release 3.0.2... 2 Release 3.0.1... 4 Release 2.1... 10 RN-08624-030_v01
More informationAccelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs
Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Ritchie Zhao 1, Weinan Song 2, Wentao Zhang 2, Tianwei Xing 3, Jeng-Hau Lin 4, Mani Srivastava 3, Rajesh Gupta 4, Zhiru
More informationXilinx ML Suite Overview
Xilinx ML Suite Overview Jim Heaton Sr. FAE Deep Learning explores the study of algorithms that can learn from and make predictions on data Deep Learning is Re-defining Many Applications Cloud Acceleration
More informationTutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY
Tutorial on Keras CAP 6412 - ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY Deep learning packages TensorFlow Google PyTorch Facebook AI research Keras Francois Chollet (now at Google) Chainer Company
More informationResearch Faculty Summit Systems Fueling future disruptions
Research Faculty Summit 2018 Systems Fueling future disruptions Wolong: A Back-end Optimizer for Deep Learning Computation Jilong Xue Researcher, Microsoft Research Asia System Challenge in Deep Learning
More informationDeep learning in MATLAB From Concept to CUDA Code
Deep learning in MATLAB From Concept to CUDA Code Roy Fahn Applications Engineer Systematics royf@systematics.co.il 03-7660111 Ram Kokku Principal Engineer MathWorks ram.kokku@mathworks.com 2017 The MathWorks,
More informationOPENSEQ2SEQ: A DEEP LEARNING TOOLKIT FOR SPEECH RECOGNITION, SPEECH SYNTHESIS, AND NLP
OPENSEQ2SEQ: A DEEP LEARNING TOOLKIT FOR SPEECH RECOGNITION, SPEECH SYNTHESIS, AND NLP Boris Ginsburg, Vitaly Lavrukhin, Igor Gitman, Oleksii Kuchaiev, Jason Li, Vahid Noroozi, Ravi Teja Gadde, Chip Nguyen
More informationConvolutional Sequence to Sequence Learning. Denis Yarats with Jonas Gehring, Michael Auli, David Grangier, Yann Dauphin Facebook AI Research
Convolutional Sequence to Sequence Learning Denis Yarats with Jonas Gehring, Michael Auli, David Grangier, Yann Dauphin Facebook AI Research Sequence generation Need to model a conditional distribution
More informationNVIDIA DLI HANDS-ON TRAINING COURSE CATALOG
NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial
More informationOpen Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc May 2018
Copyright Khronos Group 2018 - Page 1 Open Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc peter.mcguinness@gobrach.com May 2018 Khronos Mission E.g. OpenGL ES provides 3D
More informationDevice Placement Optimization with Reinforcement Learning
Device Placement Optimization with Reinforcement Learning Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Mohammad Norouzi, Naveen Kumar, Rasmus Larsen, Yuefeng Zhou, Quoc Le, Samy Bengio, and
More informationUnified Deep Learning with CPU, GPU, and FPGA Technologies
Unified Deep Learning with CPU, GPU, and FPGA Technologies Allen Rush 1, Ashish Sirasao 2, Mike Ignatowski 1 1: Advanced Micro Devices, Inc., 2: Xilinx, Inc. Abstract Deep learning and complex machine
More informationRecurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks
Deep neural networks have enabled major advances in machine learning and AI Computer vision Language translation Speech recognition Question answering And more Problem: DNNs are challenging to serve and
More informationNEW NVIDIA PLATFORM FOR AI
NEW NVIDIA PLATFORM FOR AI Pedro Mario Cruz e Silva (pcruzesilva@nvidia.com) LinkedIn Solution Architect Manager Enterprise Latin America Global Oil & Gas Team "GTC 2017: 'I AM AI' OPENING IN KEYNOTE"
More informationTuring Architecture and CUDA 10 New Features. Minseok Lee, Developer Technology Engineer, NVIDIA
Turing Architecture and CUDA 10 New Features Minseok Lee, Developer Technology Engineer, NVIDIA Turing Architecture New SM Architecture Multi-Precision Tensor Core RT Core Turing MPS Inference Accelerated,
More informationShrinath Shanbhag Senior Software Engineer Microsoft Corporation
Accelerating GPU inferencing with DirectML and DirectX 12 Shrinath Shanbhag Senior Software Engineer Microsoft Corporation Machine Learning Machine learning has become immensely popular over the last decade
More informationNVIDIA DEEP LEARNING INSTITUTE
NVIDIA DEEP LEARNING INSTITUTE TRAINING CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial
More informationInference Engine compiler and SDK FWDNXT 2018
Inference Engine compiler and SDK 1 Deep Learning processor Best performance per power Best utilization Efficient use of memory bandwidth Low latency Scalability: IoT to cloud 2 Deep Learning processor
More informationTESLA V100 PERFORMANCE GUIDE. Life Sciences Applications
TESLA V100 PERFORMANCE GUIDE Life Sciences Applications NOVEMBER 2017 TESLA V100 PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important
More informationMACHINE LEARNING WITH NVIDIA AND IBM POWER AI
MACHINE LEARNING WITH NVIDIA AND IBM POWER AI July 2017 Joerg Krall Sr. Business Ddevelopment Manager MFG EMEA jkrall@nvidia.com A NEW ERA OF COMPUTING AI & IOT Deep Learning, GPU 100s of billions of devices
More informationCode Mania Artificial Intelligence: a. Module - 1: Introduction to Artificial intelligence and Python:
Code Mania 2019 Artificial Intelligence: a. Module - 1: Introduction to Artificial intelligence and Python: 1. Introduction to Artificial Intelligence 2. Introduction to python programming and Environment
More informationNvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018
Nvidia Jetson TX2 and its Software Toolset João Fernandes 2017/2018 In this presentation Nvidia Jetson TX2: Hardware Nvidia Jetson TX2: Software Machine Learning: Neural Networks Convolutional Neural Networks
More informationGPU Coder: Automatic CUDA and TensorRT code generation from MATLAB
GPU Coder: Automatic CUDA and TensorRT code generation from MATLAB Ram Kokku 2018 The MathWorks, Inc. 1 GPUs and CUDA programming faster Performance CUDA OpenCL C/C++ GPU Coder MATLAB Python Ease of programming
More informationImplementing Deep Learning for Video Analytics on Tegra X1.
Implementing Deep Learning for Video Analytics on Tegra X1 research@hertasecurity.com Index Who we are, what we do Video analytics pipeline Video decoding Facial detection and preprocessing DNN: learning
More informationSequence Modeling: Recurrent and Recursive Nets. By Pyry Takala 14 Oct 2015
Sequence Modeling: Recurrent and Recursive Nets By Pyry Takala 14 Oct 2015 Agenda Why Recurrent neural networks? Anatomy and basic training of an RNN (10.2, 10.2.1) Properties of RNNs (10.2.2, 8.2.6) Using
More informationNVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPING. September 13, 2016
NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPING September 13, 2016 AI FOR AUTONOMOUS DRIVING MAPPING KALDI LOCALIZATION DRIVENET Training on DGX-1 NVIDIA DGX-1 NVIDIA DRIVE PX 2 Driving with DriveWorks
More informationDEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA
DEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA TOPICS COVERED Convolutional Networks Deep Learning Use Cases GPUs cudnn 2 MACHINE LEARNING! Training! Train the model from supervised
More informationDeep Learning Compiler
Deep Learning Compiler AWS AI Acknowledgement Amazon Sagemaker Neo Enables developers to train machine learning models once and run them anywhere in the cloud and at the edge Hardware targets Intel CPU,
More informationScaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research
Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Nick Fraser (Xilinx & USydney) Yaman Umuroglu (Xilinx & NTNU) Giulio Gambardella (Xilinx)
More informationAn introduction to Machine Learning silicon
An introduction to Machine Learning silicon November 28 2017 Insight for Technology Investors AI/ML terminology Artificial Intelligence Machine Learning Deep Learning Algorithms: CNNs, RNNs, etc. Additional
More informationEncoding RNNs, 48 End of sentence (EOS) token, 207 Exploding gradient, 131 Exponential function, 42 Exponential Linear Unit (ELU), 44
A Activation potential, 40 Annotated corpus add padding, 162 check versions, 158 create checkpoints, 164, 166 create input, 160 create train and validation datasets, 163 dropout, 163 DRUG-AE.rel file,
More informationIndex. Springer Nature Switzerland AG 2019 B. Moons et al., Embedded Deep Learning,
Index A Algorithmic noise tolerance (ANT), 93 94 Application specific instruction set processors (ASIPs), 115 116 Approximate computing application level, 95 circuits-levels, 93 94 DAS and DVAS, 107 110
More informationDeep Learning on Arm Cortex-M Microcontrollers. Rod Crawford Director Software Technologies, Arm
Deep Learning on Arm Cortex-M Microcontrollers Rod Crawford Director Software Technologies, Arm What is Machine Learning (ML)? Artificial Intelligence Machine Learning Deep Learning Neural Networks Additional
More informationDeep Learning on Modern Architectures. Keren Zhou 4/17/2017
Deep Learning on Modern Architectures Keren Zhou 4/17/2017 HPC Software Stack Application Algorithm Data Layout CPU GPU MIC Others HPC Software Stack Deep Learning Algorithm Data Layout CPU GPU MIC Others
More informationNeural Network Exchange Format
Copyright Khronos Group 2017 - Page 1 Neural Network Exchange Format Deploying Trained Networks to Inference Engines Viktor Gyenes, specification editor Copyright Khronos Group 2017 - Page 2 Outlook The
More informationS INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS
S8497 - INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS Chris Lamb CUDA and NGC Engineering, NVIDIA John Barco NGC Product Management, NVIDIA NVIDIA GPU Cloud (NGC) overview AGENDA Using NGC
More informationIn partnership with. VelocityAI REFERENCE ARCHITECTURE WHITE PAPER
In partnership with VelocityAI REFERENCE JULY // 2018 Contents Introduction 01 Challenges with Existing AI/ML/DL Solutions 01 Accelerate AI/ML/DL Workloads with Vexata VelocityAI 02 VelocityAI Reference
More informationBinary Convolutional Neural Network on RRAM
Binary Convolutional Neural Network on RRAM Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E, Tsinghua National Laboratory for Information Science and Technology (TNList) Tsinghua
More informationBrainchip OCTOBER
Brainchip OCTOBER 2017 1 Agenda Neuromorphic computing background Akida Neuromorphic System-on-Chip (NSoC) Brainchip OCTOBER 2017 2 Neuromorphic Computing Background Brainchip OCTOBER 2017 3 A Brief History
More informationWhy data science is the new frontier in software development
Why data science is the new frontier in software development And why every developer should care Jeff Prosise jeffpro@wintellect.com @jprosise Assertion #1 Being a programmer is like being the god of your
More informationThe OpenVX Computer Vision and Neural Network Inference
The OpenVX Computer and Neural Network Inference Standard for Portable, Efficient Code Radhakrishna Giduthuri Editor, OpenVX Khronos Group radha.giduthuri@amd.com @RadhaGiduthuri Copyright 2018 Khronos
More informationAutonomous Driving Solutions
Autonomous Driving Solutions Oct, 2017 DrivePX2 & DriveWorks Marcus Oh (moh@nvidia.com) Sr. Solution Architect, NVIDIA This work is licensed under a Creative Commons Attribution-Share Alike 4.0 (CC BY-SA
More informationOnto Petaflops with Kubernetes
Onto Petaflops with Kubernetes Vishnu Kannan Google Inc. vishh@google.com Key Takeaways Kubernetes can manage hardware accelerators at Scale Kubernetes provides a playground for ML ML journey with Kubernetes
More informationAccelerate Machine Learning on macos with Intel Integrated Graphics. Hisham Chowdhury May 23, 2018
Accelerate Machine Learning on macos with Intel Integrated Graphics Hisham Chowdhury May 23, 2018 Apple Machine Learning Stack Machine Learning Application 1 Machine Learning Application 2 Vision Natural
More informationGTC was the introduction to the future of AI, a protector, a healer, a helper, a guardian, a visionary, and just a little slice of amazing.
GTC 20I8 GTC was the introduction to the future of AI, a protector, a healer, a helper, a guardian, a visionary, and just a little slice of amazing. IT Business Edge Press quote. Publication Clearly the
More informationHello Edge: Keyword Spotting on Microcontrollers
Hello Edge: Keyword Spotting on Microcontrollers Yundong Zhang, Naveen Suda, Liangzhen Lai and Vikas Chandra ARM Research, Stanford University arxiv.org, 2017 Presented by Mohammad Mofrad University of
More informationAccelerating your Embedded Vision / Machine Learning design with the revision Stack. Giles Peckham, Xilinx
Accelerating your Embedded Vision / Machine Learning design with the revision Stack Giles Peckham, Xilinx Xilinx Foundation at the Edge Vision Customers Using Xilinx >80 ADAS Models From 23 Makers >80
More informationTOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC
TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC TERATECH Juin 2017 Gunter Roth, François Courteille DRAMATIC
More informationHow GPUs Power Comcast's X1 Voice Remote and Smart Video Analytics. Jan Neumann Comcast Labs DC May 10th, 2017
How GPUs Power Comcast's X1 Voice Remote and Smart Video Analytics Jan Neumann Comcast Labs DC May 10th, 2017 Comcast Applied Artificial Intelligence Lab Media & Video Analytics Smart TV Deep Learning
More informationDeep learning at Microsoft
Deep learning at Services Skype Translator Cortana Bing HoloLens Research Services ImageNet: 2015 ResNet 28.2 25.8 ImageNet Classification top-5 error (%) 16.4 11.7 7.3 6.7 3.5 ILSVRC 2010 NEC America
More informationTHE NVIDIA DEEP LEARNING ACCELERATOR
THE NVIDIA DEEP LEARNING ACCELERATOR INTRODUCTION NVDLA NVIDIA Deep Learning Accelerator Developed as part of Xavier NVIDIA s SOC for autonomous driving applications Optimized for Convolutional Neural
More informationNew Research Trends in Cybersecurity Privacy Attacks on Decentralized Deep Learning
New Research Trends in Cybersecurity Privacy Attacks on Decentralized Deep Learning Prof. Luigi V. Mancini Tel. +39 06 4991 8421 E-mail: mancini@di.uniroma1.it http://www.di.uniroma1.it/mancini Machine
More informationConvolutional Neural Networks
NPFL114, Lecture 4 Convolutional Neural Networks Milan Straka March 25, 2019 Charles University in Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics unless otherwise
More informationBandwidth-Centric Deep Learning Processing through Software-Hardware Co-Design
Bandwidth-Centric Deep Learning Processing through Software-Hardware Co-Design Song Yao 姚颂 Founder & CEO DeePhi Tech 深鉴科技 song.yao@deephi.tech Outline - About DeePhi Tech - Background - Bandwidth Matters
More informationPersistent RNNs. (stashing recurrent weights on-chip) Gregory Diamos. April 7, Baidu SVAIL
(stashing recurrent weights on-chip) Baidu SVAIL April 7, 2016 SVAIL Think hard AI. Goal Develop hard AI technologies that impact 100 million users. Deep Learning at SVAIL 100 GFLOP/s 1 laptop 6 TFLOP/s
More informationSnapdragon NPE Overview
March 2018 Linaro Connect Hong Kong Snapdragon NPE Overview Mark Charlebois Director, Engineering Qualcomm Technologies, Inc. Caffe2 Snapdragon Neural Processing Engine Efficient execution on Snapdragon
More informationRevolutionizing the Datacenter
Power-Efficient Machine Learning using FPGAs on POWER Systems Ralph Wittig, Distinguished Engineer Office of the CTO, Xilinx Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Top-5
More informationSony's deep learning software "Neural Network Libraries/Console and its use cases in Sony
GTC 2018 Sony's deep learning software "Neural Network Libraries/Console and its use cases in Sony Yoshiyuki Kobayashi Senior Machine Learning Researcher Sony Network Communications Inc. / Sony Corporation
More informationOPTIMIZED GPU KERNELS FOR DEEP LEARNING. Amir Khosrowshahi
OPTIMIZED GPU KERNELS FOR DEEP LEARNING Amir Khosrowshahi GTC 17 Mar 2015 Outline About nervana Optimizing deep learning at assembler level Limited precision for deep learning neon benchmarks 2 About nervana
More informationPluto A Distributed Heterogeneous Deep Learning Framework. Jun Yang, Yan Chen Large Scale Learning, Alibaba Cloud
Pluto A Distributed Heterogeneous Deep Learning Framework Jun Yang, Yan Chen Large Scale Learning, Alibaba Cloud Outline PAI(Platform of Artificial Intelligence) PAI Overview Deep Learning with PAI Pluto
More informationGPU-Accelerated Deep Learning
GPU-Accelerated Deep Learning July 6 th, 2016. Greg Heinrich. Credits: Alison B. Lowndes, Julie Bernauer, Leo K. Tam. PRACTICAL DEEP LEARNING EXAMPLES Image Classification, Object Detection, Localization,
More informationConvolutional Neural Networks: Applications and a short timeline. 7th Deep Learning Meetup Kornel Kis Vienna,
Convolutional Neural Networks: Applications and a short timeline 7th Deep Learning Meetup Kornel Kis Vienna, 1.12.2016. Introduction Currently a master student Master thesis at BME SmartLab Started deep
More information