Deploying Deep Learning Networks to Embedded GPUs and CPUs
|
|
- Rudolf Montgomery
- 5 years ago
- Views:
Transcription
1 Deploying Deep Learning Networks to Embedded GPUs and CPUs Rishu Gupta, PhD Senior Application Engineer, Computer Vision 2015 The MathWorks, Inc. 1
2 MATLAB Deep Learning Framework Access Data Design + Train Deploy Manage large image sets Automate image labeling Easy access to models Acceleration with GPU s Scale to clusters 2
3 Multi-Platform Deep Learning Deployment Desktop Data-center Nvidia TX1, TX2, TK1 Raspberry pi Mobile Beagle bone Embedded 3
4 Multi-Platform Deep Learning Deployment Need code that takes advantage of: NVIDIA CUDA libraries, including cudnn and TensorRT Intel Math Kernel Library for Deep Neural Networks (MKL-DNN) for Intel processors ARM Compute libraries for ARM processors 4
5 Multi-Platform Deep Learning Deployment Need code that takes advantage of: NVIDIA CUDA libraries, including cudnn and TensorRT NVIDIA Jetson TX1 board Intel Math Kernel Library for Deep Neural Networks (MKL-DNN) for Intel processors Android Phone ARM Compute libraries for ARM processors Intel Xeon Desktop PC Raspberry Pi Board 5
6 Algorithm Design to Embedded Deployment Workflow Conventional Approach Desktop GPU C++ Challenges Integrating multiple libraries and packages Verifying and maintaining multiple implementations Algorithm & vendor lock-in 1 High-level language Deep learning framework Large, complex software stack 2 C/C++ Low-level APIs Application-specific libraries Embedded GPU C++ 3 C/C++ Target-optimized libraries Optimize for memory & speed 6
7 Solution- GPU Coder for Deep Learning Deployment Target Libraries Application logic Intel MKL-DNN Library GPU Coder NVIDIA TensorRT & cudnn Libraries ARM Compute Library 7
8 Deep Learning Deployment Workflows INFERENCE ENGINE DEPLOYMENT INTEGRATED APPLICATION DEPLOYMENT Trained DNN Preprocessing Postprocessing cnncodegen codegen Portable target code Portable target code 8
9 Workflow for Inference Engine Deployment INFERENCE ENGINE DEPLOYMENT Trained DNN Steps for inference engine deployment 1. Generate the code for trained model >> cnncodegen(net, 'targetlib', 'cudnn ) 2. Copy the generated code onto target board cnncodegen Portable target code 3. Build the code for the inference engine >> make C./codegen f mk 4. Use hand written main function to call inference engine 5. Generate the exe and test the executable >> make C./ 9
10 How to get a Trained DNN into MATLAB? Model importer Train in MATLAB Trained DNN Reference model Transfer learning 10
11 Deep Learning Inference Deployment Target Libraries Model importer Intel MKL-DNN Library Train in MATLAB Trained DNN NVIDIA TensorRT & cudnn Libraries Reference model Transfer learning ARM Compute Library 11
12 Building DNN from Scratch Load Training Data Build Layer Architecture Set Training Options Train Network %% Create a datastore imds = imagedatastore( Data,... 'IncludeSubfolders',true,'LabelSource','foldernames'); num_classes = numel(unique(imds.labels)); %% Build layer architecture layers = [imageinputlayer([ ]) convolution2dlayer(5,20) relulayer() maxpooling2dlayer(2,'stride',2) fullyconnectedlayer(512) fullyconnectedlayer(2) softmaxlayer() classificationlayer()]; %% Set Training Options trainopts = trainingoptions( 'sgdm',... 'MiniBatchSize', minibatchsize,... 'Plots', 'training-progress ); %% Train Network net = trainnetwork(imds, layers, trainopts); 12
13 Pedestrian Detection DNN Deployment on ARM Processor layers = [imageinputlayer([ ]) convolution2dlayer(5,20) relulayer() maxpooling2dlayer(2,'stride',2) CrossChannelNormalizationLayer(5,'K',1); convolution2dlayer(5,20) relulayer() maxpooling2dlayer(2,'stride',2) fullyconnectedlayer(512) fullyconnectedlayer(2) softmaxlayer() classificationlayer()]; 13
14 Pedestrian Detection DNN Deployment on ARM Processor ARM Neon instruction set architecture Example: ARM Cortex A ARM Compute Library Low-level Software functions Computer vision, machine learning etc Pedestrian detection on Raspberry pi 14
15 Deep Learning Inference Deployment Model importer Target Libraries NVIDIA TensorRT & cudnn Libraries Train in MATLAB Trained DNN Intel MKL-DNN Library Reference model Transfer learning ARM Compute Library 15
16 Importing DNN from Open Source Framework Caffe Model Importer (including Caffe Model Zoo) importcaffelayers importcaffenetwork network = importcaffenetwork(protofile, 'yolo.caffemodel'); TensorFlow-Keras Model Importer importkeraslayers importkerasnetwork 16
17 Deep Learning Inference Deployment Model importer Target Libraries NVIDIA TensorRT & cudnn Libraries Object Detection Train in MATLAB Trained DNN Intel MKL-DNN Library Reference model Transfer learning ARM Compute Library 17
18 Deep Learning Inference Deployment Model importer Target Libraries NVIDIA TensorRT & cudnn Libraries Train in MATLAB Trained DNN Intel MKL-DNN Library Reference model Transfer learning ARM Compute Library 18
19 Deep Learning Inference Deployment Model importer Target Libraries NVIDIA TensorRT & cudnn Libraries Train in MATLAB Trained DNN Intel MKL-DNN Library Reference model Transfer learning ARM Compute Library 19
20 Layered Architecture for Segnet- Semantic Segmentation DAG Network Total number of layers: 91 20
21 NVIDIA TensorRT PROGRAMMABLE INFERENCE ACCELERATOR TESLA P4 TensorRT JETSON TX2 DRIVE PX 2 NVIDIA DLA TESLA V100 21
22 Performance Summary (VGG-16) on TitanXP GPU Coder (TensorRT int8) GPU Coder (TensorRT fp32) GPU Coder (cudnn fp32) MATLAB (cudnn fp32)
23 How Good is Generated Code Performance? Performance of CNN inference (Alexnet) on Titan XP GPU Performance of CNN inference (Alexnet) on Jetson (Tegra) TX2 23
24 Frames per second Alexnet Inference on NVIDIA Titan Xp GPU Coder + TensorRT (3.0.1, int8) GPU Coder + TensorRT (3.0.1) GPU Coder + cudnn MXNet (1.1.0) TensorFlow (1.6.0) Batch Size CPU Intel(R) Xeon(R) CPU E GHz Testing platform GPU cudnn Pascal Titan Xp v7 24
25 Frames per second VGG-16 Inference on NVIDIA Titan Xp GPU Coder + TensorRT (3.0.1, int8) GPU Coder + TensorRT (3.0.1) GPU Coder + cudnn MXNet (1.1.0) TensorFlow (1.6.0) Batch Size CPU Intel(R) Xeon(R) CPU E GHz Testing platform GPU cudnn Pascal Titan Xp v7 25
26 Frames per second Alexnet Inference on Jetson TX2: Frame-Rate Performance GPU Coder + TensorRT GPU Coder + cudnn C++ Caffe (1.0.0-rc5) Batch Size 26
27 Brief Summary DNN libraries are great for inference, GPU coder generates code that takes advantage of: NVIDIA CUDA libraries, including cudnn, and TensorRT Intel Math Kernel Library for Deep Neural Networks (MKL-DNN) ARM Compute libraries for mobile platforms 27
28 Brief Summary DNN libraries are great for inference, GPU coder generates code that takes advantage of: NVIDIA CUDA libraries, including cudnn, and TensorRT but, applications require more than just inference Intel Math Kernel Library for Deep Neural Networks (MKL-DNN) ARM Compute libraries for mobile platforms 28
29 Deep learning Workflows- Integrated Application Deployment INTEGRATED APPLICATION DEPLOYMENT Preprocessing Postprocessing codegen Portable target code 29
30 Traffic sign detection and recognition YOLO Recognition net Object detection DNN Strongest Bounding Box Classifier DNN 30
31 Traffic sign detection and recognition 31
32 Traffic sign detection and recognition 32
33 GPU Coder Helps You Deploy Applications to GPUs Faster GPU Coder CUDA Kernel creation Memory allocation Data transfer minimization Library function mapping Loop optimizations Dependence analysis Data locality analysis GPU memory allocation Data-dependence analysis Dynamic memcpy reduction 33
34 CUDA Code Generation from GPU Coder app Integrated editor and simplified workflow for code generation 34
35 Summary- GPU Coder MATLAB algorithm (functional reference) Build type Call CUDA from MATLAB directly Desktop GPU Call CUDA from (C++) handcoded main() Desktop GPU Embedded GPU Call CUDA from (C++) hand-coded main()..mex.lib Cross-compiled.lib C++ C++ 1 Functional test 2 Deployment unit-test 3 Deployment integration-test 4 Real-time test 35
36 MATLAB Deep Learning Framework DEPLOYMENT Intel MKL-DNN Library Access Data Manage large image sets Automate image labeling Easy access to models Design + Train Acceleration with GPU s Scale to clusters NVIDIA TensorRT & cudnn Libraries ARM Compute Library 36
37 Speaker Details LinkedIn: Contact MathWorks India Products/Training Enquiry Booth Call: Share your experience with MATLAB & Simulink on Social Media Use #MATLABEXPO I use #MATLAB because Attending #MATLABEXPO Examples I use #MATLAB because it helps me be a data scientist! Attending #MATLABEXPO Learning new capabilities in #MATLAB and #Simulink at #MATLABEXPO. Share your session feedback: Please fill in your feedback for this session in the feedback form 37
Deep learning in MATLAB From Concept to CUDA Code
Deep learning in MATLAB From Concept to CUDA Code Roy Fahn Applications Engineer Systematics royf@systematics.co.il 03-7660111 Ram Kokku Principal Engineer MathWorks ram.kokku@mathworks.com 2017 The MathWorks,
More informationDeep Learning: Transforming Engineering and Science The MathWorks, Inc.
Deep Learning: Transforming Engineering and Science 1 2015 The MathWorks, Inc. DEEP LEARNING: TRANSFORMING ENGINEERING AND SCIENCE A THE NEW RISE ERA OF OF GPU COMPUTING 3 NVIDIA A IS NEW THE WORLD S ERA
More informationIntroduction to Deep Learning in Signal Processing & Communications with MATLAB
Introduction to Deep Learning in Signal Processing & Communications with MATLAB Dr. Amod Anandkumar Pallavi Kar Application Engineering Group, Mathworks India 2019 The MathWorks, Inc. 1 Different Types
More informationEmbarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA
Embarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA Pierre Nowodzienski Engineer pierre.nowodzienski@mathworks.fr 2018 The MathWorks, Inc. 1 From Data to Business value Make decisions Get
More informationGPU Coder: Automatic CUDA and TensorRT code generation from MATLAB
GPU Coder: Automatic CUDA and TensorRT code generation from MATLAB Ram Kokku 2018 The MathWorks, Inc. 1 GPUs and CUDA programming faster Performance CUDA OpenCL C/C++ GPU Coder MATLAB Python Ease of programming
More informationEFFICIENT INFERENCE WITH TENSORRT. Han Vanholder
EFFICIENT INFERENCE WITH TENSORRT Han Vanholder AI INFERENCING IS EXPLODING 2 Trillion Messages Per Day On LinkedIn 500M Daily active users of iflytek 140 Billion Words Per Day Translated by Google 60
More informationDemystifying Deep Learning
Demystifying Deep Learning Let the computers do the hard work Jérémy Huard 2015 The MathWorks, Inc. 1 2 Why MATLAB for Deep Learning? MATLAB is Productive MATLAB is Fast MATLAB Integrates with Open Source
More informationNVIDIA FOR DEEP LEARNING. Bill Veenhuis
NVIDIA FOR DEEP LEARNING Bill Veenhuis bveenhuis@nvidia.com Nvidia is the world s leading ai platform ONE ARCHITECTURE CUDA 2 GPU: Perfect Companion for Accelerating Apps & A.I. CPU GPU 3 Intro to AI AGENDA
More informationDEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017
DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE Dennis Lui August 2017 THE RISE OF GPU COMPUTING APPLICATIONS 10 7 10 6 GPU-Computing perf 1.5X per year 1000X by 2025 ALGORITHMS 10 5 1.1X
More informationDemystifying Deep Learning
Demystifying Deep Learning Mandar Gujrathi Mandar.Gujrathi@mathworks.com.au 2015 The MathWorks, Inc. 1 2 Deep Learning Applications Voice assistants (speech to text) Teaching character to beat video game
More informationNvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018
Nvidia Jetson TX2 and its Software Toolset João Fernandes 2017/2018 In this presentation Nvidia Jetson TX2: Hardware Nvidia Jetson TX2: Software Machine Learning: Neural Networks Convolutional Neural Networks
More information2015 The MathWorks, Inc. 1
2015 The MathWorks, Inc. 1 개발에서구현까지 MATLAB 환경에서의딥러닝 김종남 Application Engineer 2015 The MathWorks, Inc. 2 3 Why MATLAB for Deep Learning? MATLAB is Productive MATLAB is Fast MATLAB Integrates with Open Source
More informationGPU-Accelerated Deep Learning
GPU-Accelerated Deep Learning July 6 th, 2016. Greg Heinrich. Credits: Alison B. Lowndes, Julie Bernauer, Leo K. Tam. PRACTICAL DEEP LEARNING EXAMPLES Image Classification, Object Detection, Localization,
More informationRealtime Object Detection and Segmentation for HD Mapping
Realtime Object Detection and Segmentation for HD Mapping William Raveane Lead AI Engineer Bahram Yoosefizonooz Technical Director NavInfo Europe Advanced Research Lab Presented at GTC Europe 2018 AI in
More informationInference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA
Inference Optimization Using TensorRT with Use Cases Jack Han / 한재근 Solutions Architect NVIDIA Search Image NLP Maps TensorRT 4 Adoption Use Cases Speech Video AI Inference is exploding 1 Billion Videos
More informationNVIDIA DLI HANDS-ON TRAINING COURSE CATALOG
NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial
More informationMIOVISION DEEP LEARNING TRAFFIC ANALYTICS SYSTEM FOR REAL-WORLD DEPLOYMENT. Kurtis McBride CEO, Miovision
MIOVISION DEEP LEARNING TRAFFIC ANALYTICS SYSTEM FOR REAL-WORLD DEPLOYMENT Kurtis McBride CEO, Miovision ABOUT MIOVISION COMPANY Founded in 2005 40% growth, year over year Offices in Kitchener, Canada
More informationImplementing Deep Learning for Video Analytics on Tegra X1.
Implementing Deep Learning for Video Analytics on Tegra X1 research@hertasecurity.com Index Who we are, what we do Video analytics pipeline Video decoding Facial detection and preprocessing DNN: learning
More informationWorld s most advanced data center accelerator for PCIe-based servers
NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying
More informationA performance comparison of Deep Learning frameworks on KNL
A performance comparison of Deep Learning frameworks on KNL R. Zanella, G. Fiameni, M. Rorro Middleware, Data Management - SCAI - CINECA IXPUG Bologna, March 5, 2018 Table of Contents 1. Problem description
More informationAutonomous Driving Solutions
Autonomous Driving Solutions Oct, 2017 DrivePX2 & DriveWorks Marcus Oh (moh@nvidia.com) Sr. Solution Architect, NVIDIA This work is licensed under a Creative Commons Attribution-Share Alike 4.0 (CC BY-SA
More informationNVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS
TECHNICAL OVERVIEW NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS A Guide to the Optimized Framework Containers on NVIDIA GPU Cloud Introduction Artificial intelligence is helping to solve some of the most
More informationNVIDIA DEEP LEARNING INSTITUTE
NVIDIA DEEP LEARNING INSTITUTE TRAINING CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial
More informationDEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM
DEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM AGENDA 1 Introduction to Deep Learning 2 What is DIGITS 3 How to use DIGITS Practical DEEP LEARNING Examples Image Classification, Object Detection,
More informationTESLA V100 PERFORMANCE GUIDE. Life Sciences Applications
TESLA V100 PERFORMANCE GUIDE Life Sciences Applications NOVEMBER 2017 TESLA V100 PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important
More informationTENSORRT. RN _v01 January Release Notes
TENSORRT RN-08624-030_v01 January 2018 Release Notes TABLE OF CONTENTS Chapter Chapter Chapter Chapter 1. 2. 3. 4. Overview...1 Release 3.0.2... 2 Release 3.0.1... 4 Release 2.1... 10 RN-08624-030_v01
More informationWhat s New in MATLAB and Simulink
What s New in MATLAB Simulink Fabrizio Sara 2015 The MathWorks, Inc. 1 Engineers scientists 2 Engineers scientists Develop algorithms Analyze data write MATLAB code. 3 Engineers scientists deploy algorithms
More informationEnd to End Optimization Stack for Deep Learning
End to End Optimization Stack for Deep Learning Presenter: Tianqi Chen Paul G. Allen School of Computer Science & Engineering University of Washington Collaborators University of Washington AWS AI Team
More informationWhat s New in MATLAB and Simulink The MathWorks, Inc. 1
What s New in MATLAB Simulink 2015 The MathWorks, Inc. 1 Engineers scientists 2 Engineers scientists Develop algorithms Analyze data write MATLAB code. 3 Engineers scientists deploy algorithms applications
More informationS8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer
S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer 2 100 倍以上速く 本当に可能ですか? 2 DOUGLAS ADAMS BABEL FISH Neural Machine Translation Unit 3 4 OVER 100X FASTER, IS IT REALLY POSSIBLE?
More informationTOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC
TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC TERATECH Juin 2017 Gunter Roth, François Courteille DRAMATIC
More informationTackling Big Data Using MATLAB
Tackling Big Data Using MATLAB Alka Nair Application Engineer 2015 The MathWorks, Inc. 1 Building Machine Learning Models with Big Data Access Preprocess, Exploration & Model Development Scale up & Integrate
More informationNVIDIA DATA LOADING LIBRARY (DALI)
NVIDIA DATA LOADING LIBRARY (DALI) RN-09096-001 _v01 September 2018 Release Notes TABLE OF CONTENTS Chapter Chapter Chapter Chapter Chapter 1. 2. 3. 4. 5. DALI DALI DALI DALI DALI Overview...1 Release
More informationFast Hardware For AI
Fast Hardware For AI Karl Freund karl@moorinsightsstrategy.com Sr. Analyst, AI and HPC Moor Insights & Strategy Follow my blogs covering Machine Learning Hardware on Forbes: http://www.forbes.com/sites/moorinsights
More informationNVIDIA PLATFORM FOR AI
NVIDIA PLATFORM FOR AI João Paulo Navarro, Solutions Architect - Linkedin i am ai HTTPS://WWW.YOUTUBE.COM/WATCH?V=GIZ7KYRWZGQ 2 NVIDIA Gaming VR AI & HPC Self-Driving Cars GPU Computing 3 GPU COMPUTING
More informationDEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA
DEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA TOPICS COVERED Convolutional Networks Deep Learning Use Cases GPUs cudnn 2 MACHINE LEARNING! Training! Train the model from supervised
More informationObject recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK
Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK 17 May 2016, Melbourne 24 May 2016, Sydney Werner Scholz, CTO and Head of R&D, XENON Systems Mike Wang, Solutions Architect,
More informationAutomated Driving Development
Automated Driving Development with MATLAB and Simulink MANOHAR REDDY M 2015 The MathWorks, Inc. 1 Using Model-Based Design to develop high quality and reliable Active Safety & Automated Driving Systems
More informationDEEP NEURAL NETWORKS AND GPUS. Julie Bernauer
DEEP NEURAL NETWORKS AND GPUS Julie Bernauer GPU Computing GPU Computing Run Computations on GPUs x86 CUDA Framework to Program NVIDIA GPUs A simple sum of two vectors (arrays) in C void vector_add(int
More informationDesigning GPU-accelerated applications with RTMaps (Real-Time Multisensor Applications) Framework and NVIDIA DriveWorks
MUNICH OCT 10-12, 2017 Designing GPU-accelerated applications with RTMaps (Real-Time Multisensor Applications) Framework and NVIDIA DriveWorks Xavier Rouah Lead Software Engineer Brief introduction about
More informationHardware and Software Co-Design for Motor Control Applications
Hardware and Software Co-Design for Motor Control Applications Gaurav Dubey Durvesh Kulkarni 2015 The MathWorks, Inc. 1 Key trend: Increasing demands from motor drives Advanced algorithms require faster
More informationA NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA
A NEW COMPUTING ERA Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA THE ERA OF AI AI CLOUD MOBILE PC 2 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 5 1.1X
More informationIBM Deep Learning Solutions
IBM Deep Learning Solutions Reference Architecture for Deep Learning on POWER8, P100, and NVLink October, 2016 How do you teach a computer to Perceive? 2 Deep Learning: teaching Siri to recognize a bicycle
More informationAutoTVM & Device Fleet
AutoTVM & Device Fleet ` Learning to Optimize Tensor Programs Frameworks High-level data flow graph and optimizations Hardware Learning to Optimize Tensor Programs Frameworks High-level data flow graph
More informationWu Zhiwen.
Wu Zhiwen zhiwen.wu@intel.com Agenda Background information OpenCV DNN module OpenCL acceleration Vulkan backend Sample 2 What is OpenCV? Open Source Compute Vision (OpenCV) library 2500+ Optimized algorithms
More informationHigh Performance Computing
High Performance Computing 9th Lecture 2016/10/28 YUKI ITO 1 Selected Paper: vdnn: Virtualized Deep Neural Networks for Scalable, MemoryEfficient Neural Network Design Minsoo Rhu, Natalia Gimelshein, Jason
More informationOpen Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc May 2018
Copyright Khronos Group 2018 - Page 1 Open Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc peter.mcguinness@gobrach.com May 2018 Khronos Mission E.g. OpenGL ES provides 3D
More informationA NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017
A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 6 10 5 1.1X per year 10 4 10 3 10 2 1.5X per year Single-threaded
More informationNVIDIA DEEP LEARNING PLATFORM
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance and Efficiency for AI Services, From the Data Center to the Network s Edge Introduction Artificial intelligence (AI), the dream
More informationHPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville COSC 594 Lecture Notes March 22, 2017 1/20 Outline Introduction - Hardware
More informationDEEP LEARNING ALISON B LOWNDES. Deep Learning Solutions Architect & Community Manager EMEA
DEEP LEARNING ALISON B LOWNDES Deep Learning Solutions Architect & Community Manager EMEA 1 THE GPU-ACCELERATED WORLD HPC DEEP LEARNING PC VIRTUALIZATION CLOUD GAMING RENDERING 2 3 Why is Deep Learning
More informationDeep Learning Inferencing on IBM Cloud with NVIDIA TensorRT
Deep Learning Inferencing on IBM Cloud with NVIDIA TensorRT Khoa Huynh Senior Technical Staff Member (STSM), IBM Larry Brown Senior Software Engineer, IBM Agenda Introduction Inferencing with PyCaffe TensorRT
More informationCisco UCS C480 ML M5 Rack Server Performance Characterization
White Paper Cisco UCS C480 ML M5 Rack Server Performance Characterization The Cisco UCS C480 ML M5 Rack Server platform is designed for artificial intelligence and machine-learning workloads. 2018 Cisco
More informationAccelerating your Embedded Vision / Machine Learning design with the revision Stack. Giles Peckham, Xilinx
Accelerating your Embedded Vision / Machine Learning design with the revision Stack Giles Peckham, Xilinx Xilinx Foundation at the Edge Vision Customers Using Xilinx >80 ADAS Models From 23 Makers >80
More informationTENSORRT. RN _v01 June Release Notes
TENSORRT RN-08624-030_v01 June 2018 Release Notes TABLE OF CONTENTS Chapter Chapter Chapter Chapter Chapter Chapter 1. 2. 3. 4. 5. 6. Overview...1 Release 4.0.1... 2 Release 3.0.4... 6 Release 3.0.2...
More informationAccelerate Deep Learning Inference with openvino toolkit
Accelerate Deep Learning Inference with openvino toolkit Priyanka Bagade, IoT Developer Evangelist, Intel Core and Visual Computing Group Optimization Notice Intel s compilers may or may not optimize to
More informationS INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS
S8497 - INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS Chris Lamb CUDA and NGC Engineering, NVIDIA John Barco NGC Product Management, NVIDIA NVIDIA GPU Cloud (NGC) overview AGENDA Using NGC
More informationStandards for Vision Processing and Neural Networks
Copyright Khronos Group 2017 - Page 1 Standards for Vision Processing and Neural Networks Radhakrishna Giduthuri, AMD radha.giduthuri@ieee.org Agenda Why we need a standard? Khronos NNEF Khronos OpenVX
More informationNVIDIA AI INFERENCE PLATFORM
TECHNICAL OVERVIEW NVIDIA AI INFERENCE PLATFORM Giant Leaps in Performance and Efficiency for AI Services, from the Data Center to the Network s Edge Introduction The artificial intelligence revolution
More informationWhat s inside: What is deep learning Why is deep learning taking off now? Multiple applications How to implement a system.
Point Grey White Paper Series What s inside: What is deep learning Why is deep learning taking off now? Multiple applications How to implement a system More and more, machine vision systems are expected
More informationWhat s New in MATLAB May 16, 2017
What s New in MATLAB May 16, 2017 2017 The MathWorks, Inc. 1 Agenda MATLAB Foundation Working with Data Building & Sharing MATLAB Applications Application Specific Enhancements Summary and Wrap-up 2 Agenda
More informationHow to Build Optimized ML Applications with Arm Software
How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 Arm K.K. Senior FAE Ryuji Tanaka Overview Today we will talk about applied machine learning (ML) on Arm. My aim for
More informationDeveloping Optimization Algorithms for Real-World Applications
Developing Optimization Algorithms for Real-World Applications Gautam Ponnappa PC Training Engineer Viju Ravichandran, PhD Education Technical Evangelist 2015 The MathWorks, Inc. 1 2 For a given system,
More informationCafeGPI. Single-Sided Communication for Scalable Deep Learning
CafeGPI Single-Sided Communication for Scalable Deep Learning Janis Keuper itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Deep Neural Networks
More informationCharacterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager
Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance
More informationCaffe2C: A Framework for Easy Implementation of CNN-based Mobile Applications
Caffe2C: A Framework for Easy Implementation of CNN-based Mobile Applications Ryosuke Tanno and Keiji Yanai Department of Informatics, The University of Electro-Communications, Tokyo 1. INTRODUCTION Deep
More informationDeep Learning Frameworks with Spark and GPUs
Deep Learning Frameworks with Spark and GPUs Abstract Spark is a powerful, scalable, real-time data analytics engine that is fast becoming the de facto hub for data science and big data. However, in parallel,
More informationDeep Learning on Modern Architectures. Keren Zhou 4/17/2017
Deep Learning on Modern Architectures Keren Zhou 4/17/2017 HPC Software Stack Application Algorithm Data Layout CPU GPU MIC Others HPC Software Stack Deep Learning Algorithm Data Layout CPU GPU MIC Others
More informationSVM multiclass classification in 10 steps 17/32
SVM multiclass classification in 10 steps import numpy as np # load digits dataset from sklearn import datasets digits = datasets. load_digits () # define training set size n_samples = len ( digits. images
More informationHow to Build Optimized ML Applications with Arm Software
How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 ML Group Overview Today we will talk about applied machine learning (ML) on Arm. My aim for today is to show you just
More informationSmall is the New Big: Data Analytics on the Edge
Small is the New Big: Data Analytics on the Edge An overview of processors and algorithms for deep learning techniques on the edge Dr. Abhay Samant VP Engineering, Hiller Measurements Adjunct Faculty,
More informationEmbedded GPGPU and Deep Learning for Industrial Market
Embedded GPGPU and Deep Learning for Industrial Market Author: Dan Mor GPGPU and HPEC Product Line Manager September 2018 Table of Contents 1. INTRODUCTION... 3 2. DIFFICULTIES IN CURRENT EMBEDDED INDUSTRIAL
More informationPouya Kousha Fall 2018 CSE 5194 Prof. DK Panda
Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Motivation And Intro Programming Model Spark Data Transformation Model Construction Model Training Model Inference Execution Model Data Parallel Training
More informationWhat s New in MATLAB and Simulink
What s New in MATLAB Simulink Selmane Sekkai - Cynthia Cudicini Application Engineering selmane.sekkai@mathworks.fr - cynthia.cudicini@mathworks.fr 1 Analysis Visualization Modeling Simulation Testing
More informationNeural Network Exchange Format
Copyright Khronos Group 2017 - Page 1 Neural Network Exchange Format Deploying Trained Networks to Inference Engines Viktor Gyenes, specification editor Copyright Khronos Group 2017 - Page 2 Outlook The
More informationDeep Learning Inference on Openshift with GPUs
Deep Learning Inference on Openshift with GPUs OpenShift Commons, Seattle, Dec 10 2018 Tripti Singhal Product Manager, NVIDIA Deep Learning Software Tushar Katarki Product Manager, AI on OpenShift AGENDA
More informationWhat s New for MATLAB David Willingham
What s New for MATLAB David Willingham 2015 The MathWorks, Inc. 1 MATLAB Execution Engine Redesigned execution engine runs MATLAB code faster All MATLAB code is now JIT compiled A platform for future improvements
More informationPOWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017
POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017 LIFE AFTER MOORE S LAW 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 Transistors (thousands) 1.1X per year 10 4 10 3 1.5X per year
More informationS CUDA on Xavier
S8868 - CUDA on Xavier Anshuman Bhat CUDA Product Manager Saikat Dasadhikari CUDA Engineering 29 th March 2018 1 CUDA ECOSYSTEM 2018 CUDA DOWNLOADS IN 2017 3,500,000 CUDA REGISTERED DEVELOPERS 800,000
More informationDIGITS DEEP LEARNING GPU TRAINING SYSTEM
DIGITS DEEP LEARNING GPU TRAINING SYSTEM AGENDA 1 Introduction to Deep Learning 2 What is DIGITS 3 How to use DIGITS Practical DEEP LEARNING Examples Image Classification, Object Detection, Localization,
More informationWhat s New in MATLAB and Simulink Young Joon Lee Principal Application Engineer
What s New in MATLAB Simulink Young Joon Lee Principal Application Engineer 2016 The MathWorks, Inc. 1 Engineers scientists 2 Engineers scientists Develop algorithms Analyze data write MATLAB code. 3 Engineers
More informationHPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov
HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads Natalia Vassilieva, Sergey Serebryakov Deep learning ecosystem today Software Hardware 2 HPE s portfolio for deep learning Government,
More informationAccelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs
Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Ritchie Zhao 1, Weinan Song 2, Wentao Zhang 2, Tianwei Xing 3, Jeng-Hau Lin 4, Mani Srivastava 3, Rajesh Gupta 4, Zhiru
More informationHigh-Performance Data Loading and Augmentation for Deep Neural Network Training
High-Performance Data Loading and Augmentation for Deep Neural Network Training Trevor Gale tgale@ece.neu.edu Steven Eliuk steven.eliuk@gmail.com Cameron Upright c.upright@samsung.com Roadmap 1. The General-Purpose
More informationDesign your autonomous vehicle applications with NVIDIA DriveWorks components on RTMaps
SAN JOSE MAY 8-11, 2017 Design your autonomous vehicle applications with NVIDIA DriveWorks components on RTMaps Nicolas du Lac CEO, Intempora Brief introduction about Intempora Intempora Software editor
More informationVoice, Image, Video : AI in action with AWS. 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Voice, Image, Video : AI in action with AWS A long heritage of machine learning at Amazon Personalized recommendations Fulfillment automation and inventory management Drones Voice driven interactions Inventing
More informationTENSORRT. SWE-SWDOCTRT-001-RELN_vTensorRT October Release Notes
TENSORRT SWE-SWDOCTRT-001-RELN_v 5.0.3 October 2018 Release Notes TABLE OF CONTENTS Chapter 1. Overview...1 Chapter 2. Release 5.x.x... 2 2.1. Release 5.0.3... 2 2.2. Release 5.0.2... 3 2.3. Release 5.0.1
More informationResearch Faculty Summit Systems Fueling future disruptions
Research Faculty Summit 2018 Systems Fueling future disruptions Wolong: A Back-end Optimizer for Deep Learning Computation Jilong Xue Researcher, Microsoft Research Asia System Challenge in Deep Learning
More informationTuring Architecture and CUDA 10 New Features. Minseok Lee, Developer Technology Engineer, NVIDIA
Turing Architecture and CUDA 10 New Features Minseok Lee, Developer Technology Engineer, NVIDIA Turing Architecture New SM Architecture Multi-Precision Tensor Core RT Core Turing MPS Inference Accelerated,
More information2015 The MathWorks, Inc. 1
2015 The MathWorks, Inc. 1 MATLAB 의 C 코드생성 워크플로우및최적화요령 정승혁과장 2015 The MathWorks, Inc. 2 MATLAB Coder User Story Using MATLAB Try a new idea quickly Evaluation of the system by testing and analysis High
More informationBandwidth-Centric Deep Learning Processing through Software-Hardware Co-Design
Bandwidth-Centric Deep Learning Processing through Software-Hardware Co-Design Song Yao 姚颂 Founder & CEO DeePhi Tech 深鉴科技 song.yao@deephi.tech Outline - About DeePhi Tech - Background - Bandwidth Matters
More informationCNN optimization. Rassadin A
CNN optimization Rassadin A. 01.2017-02.2017 What to optimize? Training stage time consumption (CPU / GPU) Inference stage time consumption (CPU / GPU) Training stage memory consumption Inference stage
More informationEXTENDING THE REACH OF PARALLEL COMPUTING WITH CUDA
EXTENDING THE REACH OF PARALLEL COMPUTING WITH CUDA Mark Harris, NVIDIA @harrism #NVSC14 EXTENDING THE REACH OF CUDA 1 Machine Learning 2 Higher Performance 3 New Platforms 4 New Languages 2 GPUS: THE
More informationGetting started with Caffe. Jon Barker, Solutions Architect
Getting started with Caffe Jon Barker, Solutions Architect Caffe tour Overview Agenda Example applications Setup Performance Hands-on lab preview 2 A tour of Caffe 3 What is Caffe? An open framework for
More informationAI, ML, DL. Artificial Intelligence. Machine Learning
Agenda Recap: AI and QuAI What's new with QuAI? JupyterHub A new addition to QuAI portfolio Intel OpenVINO on QNAP NAS Training to inference, QNAP leads the way Case studies & demo - from training to inference
More informationConvolutional Neural Network Layer Reordering for Acceleration
R1-15 SASIMI 2016 Proceedings Convolutional Neural Network Layer Reordering for Acceleration Vijay Daultani Subhajit Chaudhury Kazuhisa Ishizaka System Platform Labs Value Co-creation Center System Platform
More informationACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015
ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 COMMODITY DISRUPTS CUSTOM SOURCE: Top500 ACCELERATED COMPUTING: THE PATH FORWARD It s time to start
More informationLecture 1 Introduction. I. Why Study Compilers? II.Example: Machine Learning III. Course Syllabus
Lecture 1 Introduction I. Why Study Compilers? II.Example: Machine Learning III. Course Syllabus Chapters 1.1-1.5, 8.4, 8.5, 9.1 CS243: Introduction 1 Why Study Compilers? CS243: Introduction 2 1 Reasons
More informationOpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision. Kamran Khan, Product Manager, Software Acceleration and Libraries July 2017
OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision Kamran Khan, Product Manager, Software Acceleration and Libraries July 2017 Agenda Why Zynq SoCs for Traditional Computer Vision Automated
More informationRecurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks
Deep neural networks have enabled major advances in machine learning and AI Computer vision Language translation Speech recognition Question answering And more Problem: DNNs are challenging to serve and
More information