Deploying Deep Learning Networks to Embedded GPUs and CPUs

Size: px
Start display at page:

Download "Deploying Deep Learning Networks to Embedded GPUs and CPUs"

Transcription

1 Deploying Deep Learning Networks to Embedded GPUs and CPUs Rishu Gupta, PhD Senior Application Engineer, Computer Vision 2015 The MathWorks, Inc. 1

2 MATLAB Deep Learning Framework Access Data Design + Train Deploy Manage large image sets Automate image labeling Easy access to models Acceleration with GPU s Scale to clusters 2

3 Multi-Platform Deep Learning Deployment Desktop Data-center Nvidia TX1, TX2, TK1 Raspberry pi Mobile Beagle bone Embedded 3

4 Multi-Platform Deep Learning Deployment Need code that takes advantage of: NVIDIA CUDA libraries, including cudnn and TensorRT Intel Math Kernel Library for Deep Neural Networks (MKL-DNN) for Intel processors ARM Compute libraries for ARM processors 4

5 Multi-Platform Deep Learning Deployment Need code that takes advantage of: NVIDIA CUDA libraries, including cudnn and TensorRT NVIDIA Jetson TX1 board Intel Math Kernel Library for Deep Neural Networks (MKL-DNN) for Intel processors Android Phone ARM Compute libraries for ARM processors Intel Xeon Desktop PC Raspberry Pi Board 5

6 Algorithm Design to Embedded Deployment Workflow Conventional Approach Desktop GPU C++ Challenges Integrating multiple libraries and packages Verifying and maintaining multiple implementations Algorithm & vendor lock-in 1 High-level language Deep learning framework Large, complex software stack 2 C/C++ Low-level APIs Application-specific libraries Embedded GPU C++ 3 C/C++ Target-optimized libraries Optimize for memory & speed 6

7 Solution- GPU Coder for Deep Learning Deployment Target Libraries Application logic Intel MKL-DNN Library GPU Coder NVIDIA TensorRT & cudnn Libraries ARM Compute Library 7

8 Deep Learning Deployment Workflows INFERENCE ENGINE DEPLOYMENT INTEGRATED APPLICATION DEPLOYMENT Trained DNN Preprocessing Postprocessing cnncodegen codegen Portable target code Portable target code 8

9 Workflow for Inference Engine Deployment INFERENCE ENGINE DEPLOYMENT Trained DNN Steps for inference engine deployment 1. Generate the code for trained model >> cnncodegen(net, 'targetlib', 'cudnn ) 2. Copy the generated code onto target board cnncodegen Portable target code 3. Build the code for the inference engine >> make C./codegen f mk 4. Use hand written main function to call inference engine 5. Generate the exe and test the executable >> make C./ 9

10 How to get a Trained DNN into MATLAB? Model importer Train in MATLAB Trained DNN Reference model Transfer learning 10

11 Deep Learning Inference Deployment Target Libraries Model importer Intel MKL-DNN Library Train in MATLAB Trained DNN NVIDIA TensorRT & cudnn Libraries Reference model Transfer learning ARM Compute Library 11

12 Building DNN from Scratch Load Training Data Build Layer Architecture Set Training Options Train Network %% Create a datastore imds = imagedatastore( Data,... 'IncludeSubfolders',true,'LabelSource','foldernames'); num_classes = numel(unique(imds.labels)); %% Build layer architecture layers = [imageinputlayer([ ]) convolution2dlayer(5,20) relulayer() maxpooling2dlayer(2,'stride',2) fullyconnectedlayer(512) fullyconnectedlayer(2) softmaxlayer() classificationlayer()]; %% Set Training Options trainopts = trainingoptions( 'sgdm',... 'MiniBatchSize', minibatchsize,... 'Plots', 'training-progress ); %% Train Network net = trainnetwork(imds, layers, trainopts); 12

13 Pedestrian Detection DNN Deployment on ARM Processor layers = [imageinputlayer([ ]) convolution2dlayer(5,20) relulayer() maxpooling2dlayer(2,'stride',2) CrossChannelNormalizationLayer(5,'K',1); convolution2dlayer(5,20) relulayer() maxpooling2dlayer(2,'stride',2) fullyconnectedlayer(512) fullyconnectedlayer(2) softmaxlayer() classificationlayer()]; 13

14 Pedestrian Detection DNN Deployment on ARM Processor ARM Neon instruction set architecture Example: ARM Cortex A ARM Compute Library Low-level Software functions Computer vision, machine learning etc Pedestrian detection on Raspberry pi 14

15 Deep Learning Inference Deployment Model importer Target Libraries NVIDIA TensorRT & cudnn Libraries Train in MATLAB Trained DNN Intel MKL-DNN Library Reference model Transfer learning ARM Compute Library 15

16 Importing DNN from Open Source Framework Caffe Model Importer (including Caffe Model Zoo) importcaffelayers importcaffenetwork network = importcaffenetwork(protofile, 'yolo.caffemodel'); TensorFlow-Keras Model Importer importkeraslayers importkerasnetwork 16

17 Deep Learning Inference Deployment Model importer Target Libraries NVIDIA TensorRT & cudnn Libraries Object Detection Train in MATLAB Trained DNN Intel MKL-DNN Library Reference model Transfer learning ARM Compute Library 17

18 Deep Learning Inference Deployment Model importer Target Libraries NVIDIA TensorRT & cudnn Libraries Train in MATLAB Trained DNN Intel MKL-DNN Library Reference model Transfer learning ARM Compute Library 18

19 Deep Learning Inference Deployment Model importer Target Libraries NVIDIA TensorRT & cudnn Libraries Train in MATLAB Trained DNN Intel MKL-DNN Library Reference model Transfer learning ARM Compute Library 19

20 Layered Architecture for Segnet- Semantic Segmentation DAG Network Total number of layers: 91 20

21 NVIDIA TensorRT PROGRAMMABLE INFERENCE ACCELERATOR TESLA P4 TensorRT JETSON TX2 DRIVE PX 2 NVIDIA DLA TESLA V100 21

22 Performance Summary (VGG-16) on TitanXP GPU Coder (TensorRT int8) GPU Coder (TensorRT fp32) GPU Coder (cudnn fp32) MATLAB (cudnn fp32)

23 How Good is Generated Code Performance? Performance of CNN inference (Alexnet) on Titan XP GPU Performance of CNN inference (Alexnet) on Jetson (Tegra) TX2 23

24 Frames per second Alexnet Inference on NVIDIA Titan Xp GPU Coder + TensorRT (3.0.1, int8) GPU Coder + TensorRT (3.0.1) GPU Coder + cudnn MXNet (1.1.0) TensorFlow (1.6.0) Batch Size CPU Intel(R) Xeon(R) CPU E GHz Testing platform GPU cudnn Pascal Titan Xp v7 24

25 Frames per second VGG-16 Inference on NVIDIA Titan Xp GPU Coder + TensorRT (3.0.1, int8) GPU Coder + TensorRT (3.0.1) GPU Coder + cudnn MXNet (1.1.0) TensorFlow (1.6.0) Batch Size CPU Intel(R) Xeon(R) CPU E GHz Testing platform GPU cudnn Pascal Titan Xp v7 25

26 Frames per second Alexnet Inference on Jetson TX2: Frame-Rate Performance GPU Coder + TensorRT GPU Coder + cudnn C++ Caffe (1.0.0-rc5) Batch Size 26

27 Brief Summary DNN libraries are great for inference, GPU coder generates code that takes advantage of: NVIDIA CUDA libraries, including cudnn, and TensorRT Intel Math Kernel Library for Deep Neural Networks (MKL-DNN) ARM Compute libraries for mobile platforms 27

28 Brief Summary DNN libraries are great for inference, GPU coder generates code that takes advantage of: NVIDIA CUDA libraries, including cudnn, and TensorRT but, applications require more than just inference Intel Math Kernel Library for Deep Neural Networks (MKL-DNN) ARM Compute libraries for mobile platforms 28

29 Deep learning Workflows- Integrated Application Deployment INTEGRATED APPLICATION DEPLOYMENT Preprocessing Postprocessing codegen Portable target code 29

30 Traffic sign detection and recognition YOLO Recognition net Object detection DNN Strongest Bounding Box Classifier DNN 30

31 Traffic sign detection and recognition 31

32 Traffic sign detection and recognition 32

33 GPU Coder Helps You Deploy Applications to GPUs Faster GPU Coder CUDA Kernel creation Memory allocation Data transfer minimization Library function mapping Loop optimizations Dependence analysis Data locality analysis GPU memory allocation Data-dependence analysis Dynamic memcpy reduction 33

34 CUDA Code Generation from GPU Coder app Integrated editor and simplified workflow for code generation 34

35 Summary- GPU Coder MATLAB algorithm (functional reference) Build type Call CUDA from MATLAB directly Desktop GPU Call CUDA from (C++) handcoded main() Desktop GPU Embedded GPU Call CUDA from (C++) hand-coded main()..mex.lib Cross-compiled.lib C++ C++ 1 Functional test 2 Deployment unit-test 3 Deployment integration-test 4 Real-time test 35

36 MATLAB Deep Learning Framework DEPLOYMENT Intel MKL-DNN Library Access Data Manage large image sets Automate image labeling Easy access to models Design + Train Acceleration with GPU s Scale to clusters NVIDIA TensorRT & cudnn Libraries ARM Compute Library 36

37 Speaker Details LinkedIn: Contact MathWorks India Products/Training Enquiry Booth Call: Share your experience with MATLAB & Simulink on Social Media Use #MATLABEXPO I use #MATLAB because Attending #MATLABEXPO Examples I use #MATLAB because it helps me be a data scientist! Attending #MATLABEXPO Learning new capabilities in #MATLAB and #Simulink at #MATLABEXPO. Share your session feedback: Please fill in your feedback for this session in the feedback form 37

Deep learning in MATLAB From Concept to CUDA Code

Deep learning in MATLAB From Concept to CUDA Code Deep learning in MATLAB From Concept to CUDA Code Roy Fahn Applications Engineer Systematics royf@systematics.co.il 03-7660111 Ram Kokku Principal Engineer MathWorks ram.kokku@mathworks.com 2017 The MathWorks,

More information

Deep Learning: Transforming Engineering and Science The MathWorks, Inc.

Deep Learning: Transforming Engineering and Science The MathWorks, Inc. Deep Learning: Transforming Engineering and Science 1 2015 The MathWorks, Inc. DEEP LEARNING: TRANSFORMING ENGINEERING AND SCIENCE A THE NEW RISE ERA OF OF GPU COMPUTING 3 NVIDIA A IS NEW THE WORLD S ERA

More information

Introduction to Deep Learning in Signal Processing & Communications with MATLAB

Introduction to Deep Learning in Signal Processing & Communications with MATLAB Introduction to Deep Learning in Signal Processing & Communications with MATLAB Dr. Amod Anandkumar Pallavi Kar Application Engineering Group, Mathworks India 2019 The MathWorks, Inc. 1 Different Types

More information

Embarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA

Embarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA Embarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA Pierre Nowodzienski Engineer pierre.nowodzienski@mathworks.fr 2018 The MathWorks, Inc. 1 From Data to Business value Make decisions Get

More information

GPU Coder: Automatic CUDA and TensorRT code generation from MATLAB

GPU Coder: Automatic CUDA and TensorRT code generation from MATLAB GPU Coder: Automatic CUDA and TensorRT code generation from MATLAB Ram Kokku 2018 The MathWorks, Inc. 1 GPUs and CUDA programming faster Performance CUDA OpenCL C/C++ GPU Coder MATLAB Python Ease of programming

More information

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder EFFICIENT INFERENCE WITH TENSORRT Han Vanholder AI INFERENCING IS EXPLODING 2 Trillion Messages Per Day On LinkedIn 500M Daily active users of iflytek 140 Billion Words Per Day Translated by Google 60

More information

Demystifying Deep Learning

Demystifying Deep Learning Demystifying Deep Learning Let the computers do the hard work Jérémy Huard 2015 The MathWorks, Inc. 1 2 Why MATLAB for Deep Learning? MATLAB is Productive MATLAB is Fast MATLAB Integrates with Open Source

More information

NVIDIA FOR DEEP LEARNING. Bill Veenhuis

NVIDIA FOR DEEP LEARNING. Bill Veenhuis NVIDIA FOR DEEP LEARNING Bill Veenhuis bveenhuis@nvidia.com Nvidia is the world s leading ai platform ONE ARCHITECTURE CUDA 2 GPU: Perfect Companion for Accelerating Apps & A.I. CPU GPU 3 Intro to AI AGENDA

More information

DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017

DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017 DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE Dennis Lui August 2017 THE RISE OF GPU COMPUTING APPLICATIONS 10 7 10 6 GPU-Computing perf 1.5X per year 1000X by 2025 ALGORITHMS 10 5 1.1X

More information

Demystifying Deep Learning

Demystifying Deep Learning Demystifying Deep Learning Mandar Gujrathi Mandar.Gujrathi@mathworks.com.au 2015 The MathWorks, Inc. 1 2 Deep Learning Applications Voice assistants (speech to text) Teaching character to beat video game

More information

Nvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018

Nvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018 Nvidia Jetson TX2 and its Software Toolset João Fernandes 2017/2018 In this presentation Nvidia Jetson TX2: Hardware Nvidia Jetson TX2: Software Machine Learning: Neural Networks Convolutional Neural Networks

More information

2015 The MathWorks, Inc. 1

2015 The MathWorks, Inc. 1 2015 The MathWorks, Inc. 1 개발에서구현까지 MATLAB 환경에서의딥러닝 김종남 Application Engineer 2015 The MathWorks, Inc. 2 3 Why MATLAB for Deep Learning? MATLAB is Productive MATLAB is Fast MATLAB Integrates with Open Source

More information

GPU-Accelerated Deep Learning

GPU-Accelerated Deep Learning GPU-Accelerated Deep Learning July 6 th, 2016. Greg Heinrich. Credits: Alison B. Lowndes, Julie Bernauer, Leo K. Tam. PRACTICAL DEEP LEARNING EXAMPLES Image Classification, Object Detection, Localization,

More information

Realtime Object Detection and Segmentation for HD Mapping

Realtime Object Detection and Segmentation for HD Mapping Realtime Object Detection and Segmentation for HD Mapping William Raveane Lead AI Engineer Bahram Yoosefizonooz Technical Director NavInfo Europe Advanced Research Lab Presented at GTC Europe 2018 AI in

More information

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA Inference Optimization Using TensorRT with Use Cases Jack Han / 한재근 Solutions Architect NVIDIA Search Image NLP Maps TensorRT 4 Adoption Use Cases Speech Video AI Inference is exploding 1 Billion Videos

More information

NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG

NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial

More information

MIOVISION DEEP LEARNING TRAFFIC ANALYTICS SYSTEM FOR REAL-WORLD DEPLOYMENT. Kurtis McBride CEO, Miovision

MIOVISION DEEP LEARNING TRAFFIC ANALYTICS SYSTEM FOR REAL-WORLD DEPLOYMENT. Kurtis McBride CEO, Miovision MIOVISION DEEP LEARNING TRAFFIC ANALYTICS SYSTEM FOR REAL-WORLD DEPLOYMENT Kurtis McBride CEO, Miovision ABOUT MIOVISION COMPANY Founded in 2005 40% growth, year over year Offices in Kitchener, Canada

More information

Implementing Deep Learning for Video Analytics on Tegra X1.

Implementing Deep Learning for Video Analytics on Tegra X1. Implementing Deep Learning for Video Analytics on Tegra X1 research@hertasecurity.com Index Who we are, what we do Video analytics pipeline Video decoding Facial detection and preprocessing DNN: learning

More information

World s most advanced data center accelerator for PCIe-based servers

World s most advanced data center accelerator for PCIe-based servers NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying

More information

A performance comparison of Deep Learning frameworks on KNL

A performance comparison of Deep Learning frameworks on KNL A performance comparison of Deep Learning frameworks on KNL R. Zanella, G. Fiameni, M. Rorro Middleware, Data Management - SCAI - CINECA IXPUG Bologna, March 5, 2018 Table of Contents 1. Problem description

More information

Autonomous Driving Solutions

Autonomous Driving Solutions Autonomous Driving Solutions Oct, 2017 DrivePX2 & DriveWorks Marcus Oh (moh@nvidia.com) Sr. Solution Architect, NVIDIA This work is licensed under a Creative Commons Attribution-Share Alike 4.0 (CC BY-SA

More information

NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS

NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS TECHNICAL OVERVIEW NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS A Guide to the Optimized Framework Containers on NVIDIA GPU Cloud Introduction Artificial intelligence is helping to solve some of the most

More information

NVIDIA DEEP LEARNING INSTITUTE

NVIDIA DEEP LEARNING INSTITUTE NVIDIA DEEP LEARNING INSTITUTE TRAINING CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial

More information

DEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM

DEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM DEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM AGENDA 1 Introduction to Deep Learning 2 What is DIGITS 3 How to use DIGITS Practical DEEP LEARNING Examples Image Classification, Object Detection,

More information

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications TESLA V100 PERFORMANCE GUIDE Life Sciences Applications NOVEMBER 2017 TESLA V100 PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

TENSORRT. RN _v01 January Release Notes

TENSORRT. RN _v01 January Release Notes TENSORRT RN-08624-030_v01 January 2018 Release Notes TABLE OF CONTENTS Chapter Chapter Chapter Chapter 1. 2. 3. 4. Overview...1 Release 3.0.2... 2 Release 3.0.1... 4 Release 2.1... 10 RN-08624-030_v01

More information

What s New in MATLAB and Simulink

What s New in MATLAB and Simulink What s New in MATLAB Simulink Fabrizio Sara 2015 The MathWorks, Inc. 1 Engineers scientists 2 Engineers scientists Develop algorithms Analyze data write MATLAB code. 3 Engineers scientists deploy algorithms

More information

End to End Optimization Stack for Deep Learning

End to End Optimization Stack for Deep Learning End to End Optimization Stack for Deep Learning Presenter: Tianqi Chen Paul G. Allen School of Computer Science & Engineering University of Washington Collaborators University of Washington AWS AI Team

More information

What s New in MATLAB and Simulink The MathWorks, Inc. 1

What s New in MATLAB and Simulink The MathWorks, Inc. 1 What s New in MATLAB Simulink 2015 The MathWorks, Inc. 1 Engineers scientists 2 Engineers scientists Develop algorithms Analyze data write MATLAB code. 3 Engineers scientists deploy algorithms applications

More information

S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer

S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer 2 100 倍以上速く 本当に可能ですか? 2 DOUGLAS ADAMS BABEL FISH Neural Machine Translation Unit 3 4 OVER 100X FASTER, IS IT REALLY POSSIBLE?

More information

TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC

TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC TERATECH Juin 2017 Gunter Roth, François Courteille DRAMATIC

More information

Tackling Big Data Using MATLAB

Tackling Big Data Using MATLAB Tackling Big Data Using MATLAB Alka Nair Application Engineer 2015 The MathWorks, Inc. 1 Building Machine Learning Models with Big Data Access Preprocess, Exploration & Model Development Scale up & Integrate

More information

NVIDIA DATA LOADING LIBRARY (DALI)

NVIDIA DATA LOADING LIBRARY (DALI) NVIDIA DATA LOADING LIBRARY (DALI) RN-09096-001 _v01 September 2018 Release Notes TABLE OF CONTENTS Chapter Chapter Chapter Chapter Chapter 1. 2. 3. 4. 5. DALI DALI DALI DALI DALI Overview...1 Release

More information

Fast Hardware For AI

Fast Hardware For AI Fast Hardware For AI Karl Freund karl@moorinsightsstrategy.com Sr. Analyst, AI and HPC Moor Insights & Strategy Follow my blogs covering Machine Learning Hardware on Forbes: http://www.forbes.com/sites/moorinsights

More information

NVIDIA PLATFORM FOR AI

NVIDIA PLATFORM FOR AI NVIDIA PLATFORM FOR AI João Paulo Navarro, Solutions Architect - Linkedin i am ai HTTPS://WWW.YOUTUBE.COM/WATCH?V=GIZ7KYRWZGQ 2 NVIDIA Gaming VR AI & HPC Self-Driving Cars GPU Computing 3 GPU COMPUTING

More information

DEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA

DEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA DEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA TOPICS COVERED Convolutional Networks Deep Learning Use Cases GPUs cudnn 2 MACHINE LEARNING! Training! Train the model from supervised

More information

Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK

Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK 17 May 2016, Melbourne 24 May 2016, Sydney Werner Scholz, CTO and Head of R&D, XENON Systems Mike Wang, Solutions Architect,

More information

Automated Driving Development

Automated Driving Development Automated Driving Development with MATLAB and Simulink MANOHAR REDDY M 2015 The MathWorks, Inc. 1 Using Model-Based Design to develop high quality and reliable Active Safety & Automated Driving Systems

More information

DEEP NEURAL NETWORKS AND GPUS. Julie Bernauer

DEEP NEURAL NETWORKS AND GPUS. Julie Bernauer DEEP NEURAL NETWORKS AND GPUS Julie Bernauer GPU Computing GPU Computing Run Computations on GPUs x86 CUDA Framework to Program NVIDIA GPUs A simple sum of two vectors (arrays) in C void vector_add(int

More information

Designing GPU-accelerated applications with RTMaps (Real-Time Multisensor Applications) Framework and NVIDIA DriveWorks

Designing GPU-accelerated applications with RTMaps (Real-Time Multisensor Applications) Framework and NVIDIA DriveWorks MUNICH OCT 10-12, 2017 Designing GPU-accelerated applications with RTMaps (Real-Time Multisensor Applications) Framework and NVIDIA DriveWorks Xavier Rouah Lead Software Engineer Brief introduction about

More information

Hardware and Software Co-Design for Motor Control Applications

Hardware and Software Co-Design for Motor Control Applications Hardware and Software Co-Design for Motor Control Applications Gaurav Dubey Durvesh Kulkarni 2015 The MathWorks, Inc. 1 Key trend: Increasing demands from motor drives Advanced algorithms require faster

More information

A NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA

A NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA A NEW COMPUTING ERA Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA THE ERA OF AI AI CLOUD MOBILE PC 2 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 5 1.1X

More information

IBM Deep Learning Solutions

IBM Deep Learning Solutions IBM Deep Learning Solutions Reference Architecture for Deep Learning on POWER8, P100, and NVLink October, 2016 How do you teach a computer to Perceive? 2 Deep Learning: teaching Siri to recognize a bicycle

More information

AutoTVM & Device Fleet

AutoTVM & Device Fleet AutoTVM & Device Fleet ` Learning to Optimize Tensor Programs Frameworks High-level data flow graph and optimizations Hardware Learning to Optimize Tensor Programs Frameworks High-level data flow graph

More information

Wu Zhiwen.

Wu Zhiwen. Wu Zhiwen zhiwen.wu@intel.com Agenda Background information OpenCV DNN module OpenCL acceleration Vulkan backend Sample 2 What is OpenCV? Open Source Compute Vision (OpenCV) library 2500+ Optimized algorithms

More information

High Performance Computing

High Performance Computing High Performance Computing 9th Lecture 2016/10/28 YUKI ITO 1 Selected Paper: vdnn: Virtualized Deep Neural Networks for Scalable, MemoryEfficient Neural Network Design Minsoo Rhu, Natalia Gimelshein, Jason

More information

Open Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc May 2018

Open Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc May 2018 Copyright Khronos Group 2018 - Page 1 Open Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc peter.mcguinness@gobrach.com May 2018 Khronos Mission E.g. OpenGL ES provides 3D

More information

A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017

A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017 A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 6 10 5 1.1X per year 10 4 10 3 10 2 1.5X per year Single-threaded

More information

NVIDIA DEEP LEARNING PLATFORM

NVIDIA DEEP LEARNING PLATFORM TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance and Efficiency for AI Services, From the Data Center to the Network s Edge Introduction Artificial intelligence (AI), the dream

More information

HPC with Multicore and GPUs

HPC with Multicore and GPUs HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville COSC 594 Lecture Notes March 22, 2017 1/20 Outline Introduction - Hardware

More information

DEEP LEARNING ALISON B LOWNDES. Deep Learning Solutions Architect & Community Manager EMEA

DEEP LEARNING ALISON B LOWNDES. Deep Learning Solutions Architect & Community Manager EMEA DEEP LEARNING ALISON B LOWNDES Deep Learning Solutions Architect & Community Manager EMEA 1 THE GPU-ACCELERATED WORLD HPC DEEP LEARNING PC VIRTUALIZATION CLOUD GAMING RENDERING 2 3 Why is Deep Learning

More information

Deep Learning Inferencing on IBM Cloud with NVIDIA TensorRT

Deep Learning Inferencing on IBM Cloud with NVIDIA TensorRT Deep Learning Inferencing on IBM Cloud with NVIDIA TensorRT Khoa Huynh Senior Technical Staff Member (STSM), IBM Larry Brown Senior Software Engineer, IBM Agenda Introduction Inferencing with PyCaffe TensorRT

More information

Cisco UCS C480 ML M5 Rack Server Performance Characterization

Cisco UCS C480 ML M5 Rack Server Performance Characterization White Paper Cisco UCS C480 ML M5 Rack Server Performance Characterization The Cisco UCS C480 ML M5 Rack Server platform is designed for artificial intelligence and machine-learning workloads. 2018 Cisco

More information

Accelerating your Embedded Vision / Machine Learning design with the revision Stack. Giles Peckham, Xilinx

Accelerating your Embedded Vision / Machine Learning design with the revision Stack. Giles Peckham, Xilinx Accelerating your Embedded Vision / Machine Learning design with the revision Stack Giles Peckham, Xilinx Xilinx Foundation at the Edge Vision Customers Using Xilinx >80 ADAS Models From 23 Makers >80

More information

TENSORRT. RN _v01 June Release Notes

TENSORRT. RN _v01 June Release Notes TENSORRT RN-08624-030_v01 June 2018 Release Notes TABLE OF CONTENTS Chapter Chapter Chapter Chapter Chapter Chapter 1. 2. 3. 4. 5. 6. Overview...1 Release 4.0.1... 2 Release 3.0.4... 6 Release 3.0.2...

More information

Accelerate Deep Learning Inference with openvino toolkit

Accelerate Deep Learning Inference with openvino toolkit Accelerate Deep Learning Inference with openvino toolkit Priyanka Bagade, IoT Developer Evangelist, Intel Core and Visual Computing Group Optimization Notice Intel s compilers may or may not optimize to

More information

S INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS

S INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS S8497 - INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS Chris Lamb CUDA and NGC Engineering, NVIDIA John Barco NGC Product Management, NVIDIA NVIDIA GPU Cloud (NGC) overview AGENDA Using NGC

More information

Standards for Vision Processing and Neural Networks

Standards for Vision Processing and Neural Networks Copyright Khronos Group 2017 - Page 1 Standards for Vision Processing and Neural Networks Radhakrishna Giduthuri, AMD radha.giduthuri@ieee.org Agenda Why we need a standard? Khronos NNEF Khronos OpenVX

More information

NVIDIA AI INFERENCE PLATFORM

NVIDIA AI INFERENCE PLATFORM TECHNICAL OVERVIEW NVIDIA AI INFERENCE PLATFORM Giant Leaps in Performance and Efficiency for AI Services, from the Data Center to the Network s Edge Introduction The artificial intelligence revolution

More information

What s inside: What is deep learning Why is deep learning taking off now? Multiple applications How to implement a system.

What s inside: What is deep learning Why is deep learning taking off now? Multiple applications How to implement a system. Point Grey White Paper Series What s inside: What is deep learning Why is deep learning taking off now? Multiple applications How to implement a system More and more, machine vision systems are expected

More information

What s New in MATLAB May 16, 2017

What s New in MATLAB May 16, 2017 What s New in MATLAB May 16, 2017 2017 The MathWorks, Inc. 1 Agenda MATLAB Foundation Working with Data Building & Sharing MATLAB Applications Application Specific Enhancements Summary and Wrap-up 2 Agenda

More information

How to Build Optimized ML Applications with Arm Software

How to Build Optimized ML Applications with Arm Software How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 Arm K.K. Senior FAE Ryuji Tanaka Overview Today we will talk about applied machine learning (ML) on Arm. My aim for

More information

Developing Optimization Algorithms for Real-World Applications

Developing Optimization Algorithms for Real-World Applications Developing Optimization Algorithms for Real-World Applications Gautam Ponnappa PC Training Engineer Viju Ravichandran, PhD Education Technical Evangelist 2015 The MathWorks, Inc. 1 2 For a given system,

More information

CafeGPI. Single-Sided Communication for Scalable Deep Learning

CafeGPI. Single-Sided Communication for Scalable Deep Learning CafeGPI Single-Sided Communication for Scalable Deep Learning Janis Keuper itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Deep Neural Networks

More information

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance

More information

Caffe2C: A Framework for Easy Implementation of CNN-based Mobile Applications

Caffe2C: A Framework for Easy Implementation of CNN-based Mobile Applications Caffe2C: A Framework for Easy Implementation of CNN-based Mobile Applications Ryosuke Tanno and Keiji Yanai Department of Informatics, The University of Electro-Communications, Tokyo 1. INTRODUCTION Deep

More information

Deep Learning Frameworks with Spark and GPUs

Deep Learning Frameworks with Spark and GPUs Deep Learning Frameworks with Spark and GPUs Abstract Spark is a powerful, scalable, real-time data analytics engine that is fast becoming the de facto hub for data science and big data. However, in parallel,

More information

Deep Learning on Modern Architectures. Keren Zhou 4/17/2017

Deep Learning on Modern Architectures. Keren Zhou 4/17/2017 Deep Learning on Modern Architectures Keren Zhou 4/17/2017 HPC Software Stack Application Algorithm Data Layout CPU GPU MIC Others HPC Software Stack Deep Learning Algorithm Data Layout CPU GPU MIC Others

More information

SVM multiclass classification in 10 steps 17/32

SVM multiclass classification in 10 steps 17/32 SVM multiclass classification in 10 steps import numpy as np # load digits dataset from sklearn import datasets digits = datasets. load_digits () # define training set size n_samples = len ( digits. images

More information

How to Build Optimized ML Applications with Arm Software

How to Build Optimized ML Applications with Arm Software How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 ML Group Overview Today we will talk about applied machine learning (ML) on Arm. My aim for today is to show you just

More information

Small is the New Big: Data Analytics on the Edge

Small is the New Big: Data Analytics on the Edge Small is the New Big: Data Analytics on the Edge An overview of processors and algorithms for deep learning techniques on the edge Dr. Abhay Samant VP Engineering, Hiller Measurements Adjunct Faculty,

More information

Embedded GPGPU and Deep Learning for Industrial Market

Embedded GPGPU and Deep Learning for Industrial Market Embedded GPGPU and Deep Learning for Industrial Market Author: Dan Mor GPGPU and HPEC Product Line Manager September 2018 Table of Contents 1. INTRODUCTION... 3 2. DIFFICULTIES IN CURRENT EMBEDDED INDUSTRIAL

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Motivation And Intro Programming Model Spark Data Transformation Model Construction Model Training Model Inference Execution Model Data Parallel Training

More information

What s New in MATLAB and Simulink

What s New in MATLAB and Simulink What s New in MATLAB Simulink Selmane Sekkai - Cynthia Cudicini Application Engineering selmane.sekkai@mathworks.fr - cynthia.cudicini@mathworks.fr 1 Analysis Visualization Modeling Simulation Testing

More information

Neural Network Exchange Format

Neural Network Exchange Format Copyright Khronos Group 2017 - Page 1 Neural Network Exchange Format Deploying Trained Networks to Inference Engines Viktor Gyenes, specification editor Copyright Khronos Group 2017 - Page 2 Outlook The

More information

Deep Learning Inference on Openshift with GPUs

Deep Learning Inference on Openshift with GPUs Deep Learning Inference on Openshift with GPUs OpenShift Commons, Seattle, Dec 10 2018 Tripti Singhal Product Manager, NVIDIA Deep Learning Software Tushar Katarki Product Manager, AI on OpenShift AGENDA

More information

What s New for MATLAB David Willingham

What s New for MATLAB David Willingham What s New for MATLAB David Willingham 2015 The MathWorks, Inc. 1 MATLAB Execution Engine Redesigned execution engine runs MATLAB code faster All MATLAB code is now JIT compiled A platform for future improvements

More information

POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017

POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017 POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017 LIFE AFTER MOORE S LAW 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 Transistors (thousands) 1.1X per year 10 4 10 3 1.5X per year

More information

S CUDA on Xavier

S CUDA on Xavier S8868 - CUDA on Xavier Anshuman Bhat CUDA Product Manager Saikat Dasadhikari CUDA Engineering 29 th March 2018 1 CUDA ECOSYSTEM 2018 CUDA DOWNLOADS IN 2017 3,500,000 CUDA REGISTERED DEVELOPERS 800,000

More information

DIGITS DEEP LEARNING GPU TRAINING SYSTEM

DIGITS DEEP LEARNING GPU TRAINING SYSTEM DIGITS DEEP LEARNING GPU TRAINING SYSTEM AGENDA 1 Introduction to Deep Learning 2 What is DIGITS 3 How to use DIGITS Practical DEEP LEARNING Examples Image Classification, Object Detection, Localization,

More information

What s New in MATLAB and Simulink Young Joon Lee Principal Application Engineer

What s New in MATLAB and Simulink Young Joon Lee Principal Application Engineer What s New in MATLAB Simulink Young Joon Lee Principal Application Engineer 2016 The MathWorks, Inc. 1 Engineers scientists 2 Engineers scientists Develop algorithms Analyze data write MATLAB code. 3 Engineers

More information

HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov

HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads Natalia Vassilieva, Sergey Serebryakov Deep learning ecosystem today Software Hardware 2 HPE s portfolio for deep learning Government,

More information

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Ritchie Zhao 1, Weinan Song 2, Wentao Zhang 2, Tianwei Xing 3, Jeng-Hau Lin 4, Mani Srivastava 3, Rajesh Gupta 4, Zhiru

More information

High-Performance Data Loading and Augmentation for Deep Neural Network Training

High-Performance Data Loading and Augmentation for Deep Neural Network Training High-Performance Data Loading and Augmentation for Deep Neural Network Training Trevor Gale tgale@ece.neu.edu Steven Eliuk steven.eliuk@gmail.com Cameron Upright c.upright@samsung.com Roadmap 1. The General-Purpose

More information

Design your autonomous vehicle applications with NVIDIA DriveWorks components on RTMaps

Design your autonomous vehicle applications with NVIDIA DriveWorks components on RTMaps SAN JOSE MAY 8-11, 2017 Design your autonomous vehicle applications with NVIDIA DriveWorks components on RTMaps Nicolas du Lac CEO, Intempora Brief introduction about Intempora Intempora Software editor

More information

Voice, Image, Video : AI in action with AWS. 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Voice, Image, Video : AI in action with AWS. 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Voice, Image, Video : AI in action with AWS A long heritage of machine learning at Amazon Personalized recommendations Fulfillment automation and inventory management Drones Voice driven interactions Inventing

More information

TENSORRT. SWE-SWDOCTRT-001-RELN_vTensorRT October Release Notes

TENSORRT. SWE-SWDOCTRT-001-RELN_vTensorRT October Release Notes TENSORRT SWE-SWDOCTRT-001-RELN_v 5.0.3 October 2018 Release Notes TABLE OF CONTENTS Chapter 1. Overview...1 Chapter 2. Release 5.x.x... 2 2.1. Release 5.0.3... 2 2.2. Release 5.0.2... 3 2.3. Release 5.0.1

More information

Research Faculty Summit Systems Fueling future disruptions

Research Faculty Summit Systems Fueling future disruptions Research Faculty Summit 2018 Systems Fueling future disruptions Wolong: A Back-end Optimizer for Deep Learning Computation Jilong Xue Researcher, Microsoft Research Asia System Challenge in Deep Learning

More information

Turing Architecture and CUDA 10 New Features. Minseok Lee, Developer Technology Engineer, NVIDIA

Turing Architecture and CUDA 10 New Features. Minseok Lee, Developer Technology Engineer, NVIDIA Turing Architecture and CUDA 10 New Features Minseok Lee, Developer Technology Engineer, NVIDIA Turing Architecture New SM Architecture Multi-Precision Tensor Core RT Core Turing MPS Inference Accelerated,

More information

2015 The MathWorks, Inc. 1

2015 The MathWorks, Inc. 1 2015 The MathWorks, Inc. 1 MATLAB 의 C 코드생성 워크플로우및최적화요령 정승혁과장 2015 The MathWorks, Inc. 2 MATLAB Coder User Story Using MATLAB Try a new idea quickly Evaluation of the system by testing and analysis High

More information

Bandwidth-Centric Deep Learning Processing through Software-Hardware Co-Design

Bandwidth-Centric Deep Learning Processing through Software-Hardware Co-Design Bandwidth-Centric Deep Learning Processing through Software-Hardware Co-Design Song Yao 姚颂 Founder & CEO DeePhi Tech 深鉴科技 song.yao@deephi.tech Outline - About DeePhi Tech - Background - Bandwidth Matters

More information

CNN optimization. Rassadin A

CNN optimization. Rassadin A CNN optimization Rassadin A. 01.2017-02.2017 What to optimize? Training stage time consumption (CPU / GPU) Inference stage time consumption (CPU / GPU) Training stage memory consumption Inference stage

More information

EXTENDING THE REACH OF PARALLEL COMPUTING WITH CUDA

EXTENDING THE REACH OF PARALLEL COMPUTING WITH CUDA EXTENDING THE REACH OF PARALLEL COMPUTING WITH CUDA Mark Harris, NVIDIA @harrism #NVSC14 EXTENDING THE REACH OF CUDA 1 Machine Learning 2 Higher Performance 3 New Platforms 4 New Languages 2 GPUS: THE

More information

Getting started with Caffe. Jon Barker, Solutions Architect

Getting started with Caffe. Jon Barker, Solutions Architect Getting started with Caffe Jon Barker, Solutions Architect Caffe tour Overview Agenda Example applications Setup Performance Hands-on lab preview 2 A tour of Caffe 3 What is Caffe? An open framework for

More information

AI, ML, DL. Artificial Intelligence. Machine Learning

AI, ML, DL. Artificial Intelligence. Machine Learning Agenda Recap: AI and QuAI What's new with QuAI? JupyterHub A new addition to QuAI portfolio Intel OpenVINO on QNAP NAS Training to inference, QNAP leads the way Case studies & demo - from training to inference

More information

Convolutional Neural Network Layer Reordering for Acceleration

Convolutional Neural Network Layer Reordering for Acceleration R1-15 SASIMI 2016 Proceedings Convolutional Neural Network Layer Reordering for Acceleration Vijay Daultani Subhajit Chaudhury Kazuhisa Ishizaka System Platform Labs Value Co-creation Center System Platform

More information

ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015

ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 COMMODITY DISRUPTS CUSTOM SOURCE: Top500 ACCELERATED COMPUTING: THE PATH FORWARD It s time to start

More information

Lecture 1 Introduction. I. Why Study Compilers? II.Example: Machine Learning III. Course Syllabus

Lecture 1 Introduction. I. Why Study Compilers? II.Example: Machine Learning III. Course Syllabus Lecture 1 Introduction I. Why Study Compilers? II.Example: Machine Learning III. Course Syllabus Chapters 1.1-1.5, 8.4, 8.5, 9.1 CS243: Introduction 1 Why Study Compilers? CS243: Introduction 2 1 Reasons

More information

OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision. Kamran Khan, Product Manager, Software Acceleration and Libraries July 2017

OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision. Kamran Khan, Product Manager, Software Acceleration and Libraries July 2017 OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision Kamran Khan, Product Manager, Software Acceleration and Libraries July 2017 Agenda Why Zynq SoCs for Traditional Computer Vision Automated

More information

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks Deep neural networks have enabled major advances in machine learning and AI Computer vision Language translation Speech recognition Question answering And more Problem: DNNs are challenging to serve and

More information