Deep learning in MATLAB From Concept to CUDA Code
|
|
- Ella Cole
- 6 years ago
- Views:
Transcription
1 Deep learning in MATLAB From Concept to CUDA Code Roy Fahn Applications Engineer Systematics Ram Kokku Principal Engineer MathWorks 2017 The MathWorks, Inc. 1
2 Talk Outline Design Deep Learning & Vision Algorithms Manage large image sets Automate image labeling Easy access to models Pre-built training frameworks Accelerate and Scale Training Acceleration with GPU s Scale to clusters High Performance Deployment Automate compilation with GPU Coder On TitanXP: 7x faster than TensorFlow 5x faster than pycaffe2 On Jetson: On par with TensorRT 2x faster than C++-Caffe 2
3 Example: Transfer Learning Workflow Transfer Learning Images Labels Load Reference Network Modify Network Structure Learn New Weights New Classifier Training Data Labels: Cars, Trucks, BigTrucks, SUVs, Vans 3
4 Example: Transfer Learning in MATLAB Set up training dataset Split, shuffle, re-arrange images Read image, Data augmentation (clip, rotate, resize, etc) Easily manage large sets of images Single line of code to access images Operates on disk, database, big-data file system 4
5 Example: Transfer Learning in MATLAB Set up training dataset Load Reference Network Create DNNs in MATLAB 1. Easy access to research models 2. Caffe Model importer 3. Build from scratch 5
6 Example: Transfer Learning in MATLAB Set up training dataset Load Reference Network Modify Network Structure 6
7 Example: Transfer Learning in MATLAB Set up training dataset Load Reference Network Modify Network Structure 7
8 Example: Transfer Learning in MATLAB Set up training dataset Load Reference Network Modify Network Structure Learn New Weights Many more training options 8
9 Deep learning on CPU, GPU, multi-gpu and clusters More GPUs 9
10 More CPUs Deep learning on CPU, GPU, multi-gpu and clusters More GPUs 10
11 More CPUs Deep learning on CPU, GPU, multi-gpu and clusters More GPUs 11
12 Visualizing and Debugging Intermediate Results Training Accuracy Visualization Deep Dream Filters Many options for visualizations and debugging Examples to get started Layer Activations Feature Visualization Deep Dream Activations 13
13 GPU Coder for Deployment: New Product in R2017b GPU Coder Accelerated implementation of parallel algorithms on GPUs Neural Networks Deep Learning, machine learning Image Processing and Computer Vision Image filtering, feature detection/extraction Signal Processing and Communications FFT, filtering, cross correlation, 7x faster than state-of-art 700x faster than CPUs for feature extraction 20x faster than CPUs for FFTs 14
14 GPU Coder Compilation Flow GPU Coder CUDA Kernel creation Memory allocation Data transfer minimization Library function mapping Loop optimizations Dependence analysis Data locality analysis GPU memory allocation Data-dependence analysis Dynamic memcpy reduction 15
15 GPU Coder Generates CUDA from MATLAB: saxpy Scalarized MATLAB CUDA Vectorized MATLAB CUDA kernel for GPU parallelization Loops and matrix operations are directly compiled in to kernels 16
16 Generated CUDA Optimized for Memory Performance Kernel data allocation is automatically optimized CUDA kernel for GPU parallelization CUDA Mandelbrot space 17
17 Algorithm Design to Embedded Deployment Workflow MATLAB algorithm (functional reference) Build type Call CUDA from MATLAB directly Desktop GPU Call CUDA from (C++) handcoded main() Desktop GPU Embedded GPU Call CUDA from (C++) hand-coded main()..mex.lib Cross-compiled.lib C++ C++ 1 Functional test 2 Deployment unit-test 3 Deployment integration-test 4 Real-time test 20
18 Demo: Alexnet Deployment with mex Code Generation 21
19 Algorithm Design to Embedded Deployment on Tegra GPU MATLAB algorithm (functional reference) Build type Call CUDA from MATLAB directly Tesla GPU Call CUDA from (C++) handcoded main() Tesla GPU Tegra GPU Call CUDA from (C++) hand-coded main(). Cross-compiled on host with Linaro toolchain.mex.lib Cross-compiled.lib C++ C++ 1 Functional test 2 Deployment unit-test 3 Deployment integration-test 4 Real-time test (Test in MATLAB on host) (Test generated code in MATLAB on host + GPU) (Test generated code within C/C++ app on host + GPU) (Test generated code within C/C++ app on Tegra target) 22
20 Alexnet Deployment to Tegra: Cross-Compiled with lib Two small changes 1. Change build-type to lib 2. Select cross-compile toolchain 23
21 End-to-End Application: Lane Detection Alexnet Transfer Learning Output of CNN is lane parabola co-efficients according to: y = ax^2 + bx + c Image Lane detection CNN Left lane co-efficients Right lane co-efficients Post-processing (find left/right lane points) Image with marked lanes GPU coder generates code for whole application 24
22 How Good is Generated Code Performance Performance of image processing and computer vision Performance of CNN inference (Alexnet) on Titan XP GPU Performance of CNN inference (Alexnet) on Jetson (Tegra) TX2 25
23 GPU Coder for Image Processing and Computer Vision Fog removal Stereo disparity Distance transform SURF feature extraction Ray tracing Orders magnitude speedup over CPU 26
24 Frames per second Alexnet Inference on NVIDIA Titan XP MATLAB GPU Coder (R2017b) 2x 5x 7x mxnet (0.10) MATLAB (R2017b) Caffe2 (0.8.1) TensorFlow (1.2.0) Batch Size CPU Intel(R) Xeon(R) CPU E GHz Testing platform GPU cudnn Pascal Titan Xp v5 27
25 Frames per second Alexnet Inference on Jetson TX2: Frame-Rate Performance x TensorRT (2.1) MATLAB GPU Coder (R2017b) 200 2x C++ Caffe (1.0.0-rc5) Batch Size 28
26 Peak Memory (MB) Alexnet Inference on Jetson TX2: Memory Performance C++ Caffe (1.0.0-rc5) MATLAB GPU Coder (R2017b) TensorRT 2.1 (using giexec wrapper) Batch Size 30
27 Design Your DNNs in MATLAB, Deploy with GPU Coder Design Deep Learning & Vision Algorithms Manage large image sets Automate image labeling Easy access to models Pre-built training frameworks Accelerate and Scale Training Acceleration with GPU s Scale to clusters High Performance Deployment Automate compilation with GPU Coder On TitanXP: 7x faster than TensorFlow 5x faster than pycaffe2 On Jetson TX2: On par with TensorRT 2x faster than C++-Caffe 31
28 Check Out Deep Learning in MATLAB and GPU Coder Deep learning in MATLAB GPU Coder systematics.co.il\mwevents 32
Deep Learning: Transforming Engineering and Science The MathWorks, Inc.
Deep Learning: Transforming Engineering and Science 1 2015 The MathWorks, Inc. DEEP LEARNING: TRANSFORMING ENGINEERING AND SCIENCE A THE NEW RISE ERA OF OF GPU COMPUTING 3 NVIDIA A IS NEW THE WORLD S ERA
More informationDeploying Deep Learning Networks to Embedded GPUs and CPUs
Deploying Deep Learning Networks to Embedded GPUs and CPUs Rishu Gupta, PhD Senior Application Engineer, Computer Vision 2015 The MathWorks, Inc. 1 MATLAB Deep Learning Framework Access Data Design + Train
More informationEmbarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA
Embarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA Pierre Nowodzienski Engineer pierre.nowodzienski@mathworks.fr 2018 The MathWorks, Inc. 1 From Data to Business value Make decisions Get
More informationGPU Coder: Automatic CUDA and TensorRT code generation from MATLAB
GPU Coder: Automatic CUDA and TensorRT code generation from MATLAB Ram Kokku 2018 The MathWorks, Inc. 1 GPUs and CUDA programming faster Performance CUDA OpenCL C/C++ GPU Coder MATLAB Python Ease of programming
More informationIntroduction to Deep Learning in Signal Processing & Communications with MATLAB
Introduction to Deep Learning in Signal Processing & Communications with MATLAB Dr. Amod Anandkumar Pallavi Kar Application Engineering Group, Mathworks India 2019 The MathWorks, Inc. 1 Different Types
More informationDemystifying Deep Learning
Demystifying Deep Learning Let the computers do the hard work Jérémy Huard 2015 The MathWorks, Inc. 1 2 Why MATLAB for Deep Learning? MATLAB is Productive MATLAB is Fast MATLAB Integrates with Open Source
More informationDemystifying Deep Learning
Demystifying Deep Learning Mandar Gujrathi Mandar.Gujrathi@mathworks.com.au 2015 The MathWorks, Inc. 1 2 Deep Learning Applications Voice assistants (speech to text) Teaching character to beat video game
More informationNVIDIA DLI HANDS-ON TRAINING COURSE CATALOG
NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial
More informationNVIDIA FOR DEEP LEARNING. Bill Veenhuis
NVIDIA FOR DEEP LEARNING Bill Veenhuis bveenhuis@nvidia.com Nvidia is the world s leading ai platform ONE ARCHITECTURE CUDA 2 GPU: Perfect Companion for Accelerating Apps & A.I. CPU GPU 3 Intro to AI AGENDA
More informationNvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018
Nvidia Jetson TX2 and its Software Toolset João Fernandes 2017/2018 In this presentation Nvidia Jetson TX2: Hardware Nvidia Jetson TX2: Software Machine Learning: Neural Networks Convolutional Neural Networks
More informationDEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017
DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE Dennis Lui August 2017 THE RISE OF GPU COMPUTING APPLICATIONS 10 7 10 6 GPU-Computing perf 1.5X per year 1000X by 2025 ALGORITHMS 10 5 1.1X
More informationDEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM
DEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM AGENDA 1 Introduction to Deep Learning 2 What is DIGITS 3 How to use DIGITS Practical DEEP LEARNING Examples Image Classification, Object Detection,
More informationGPU-Accelerated Deep Learning
GPU-Accelerated Deep Learning July 6 th, 2016. Greg Heinrich. Credits: Alison B. Lowndes, Julie Bernauer, Leo K. Tam. PRACTICAL DEEP LEARNING EXAMPLES Image Classification, Object Detection, Localization,
More information2015 The MathWorks, Inc. 1
2015 The MathWorks, Inc. 1 개발에서구현까지 MATLAB 환경에서의딥러닝 김종남 Application Engineer 2015 The MathWorks, Inc. 2 3 Why MATLAB for Deep Learning? MATLAB is Productive MATLAB is Fast MATLAB Integrates with Open Source
More informationNVIDIA DEEP LEARNING INSTITUTE
NVIDIA DEEP LEARNING INSTITUTE TRAINING CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial
More informationEFFICIENT INFERENCE WITH TENSORRT. Han Vanholder
EFFICIENT INFERENCE WITH TENSORRT Han Vanholder AI INFERENCING IS EXPLODING 2 Trillion Messages Per Day On LinkedIn 500M Daily active users of iflytek 140 Billion Words Per Day Translated by Google 60
More informationNVIDIA DATA LOADING LIBRARY (DALI)
NVIDIA DATA LOADING LIBRARY (DALI) RN-09096-001 _v01 September 2018 Release Notes TABLE OF CONTENTS Chapter Chapter Chapter Chapter Chapter 1. 2. 3. 4. 5. DALI DALI DALI DALI DALI Overview...1 Release
More informationAutonomous Driving Solutions
Autonomous Driving Solutions Oct, 2017 DrivePX2 & DriveWorks Marcus Oh (moh@nvidia.com) Sr. Solution Architect, NVIDIA This work is licensed under a Creative Commons Attribution-Share Alike 4.0 (CC BY-SA
More informationMIOVISION DEEP LEARNING TRAFFIC ANALYTICS SYSTEM FOR REAL-WORLD DEPLOYMENT. Kurtis McBride CEO, Miovision
MIOVISION DEEP LEARNING TRAFFIC ANALYTICS SYSTEM FOR REAL-WORLD DEPLOYMENT Kurtis McBride CEO, Miovision ABOUT MIOVISION COMPANY Founded in 2005 40% growth, year over year Offices in Kitchener, Canada
More information컴퓨터비전의최신기술 : Deep Learning, 3D Vision and Embedded Vision
1 컴퓨터비전의최신기술 : Deep Learning, 3D Vision and Embedded Vision 김종남 Application Engineer 2017 The MathWorks, Inc. 2 Three Main Topics New capabilities for computer vision system design: Deep Learning 3-D Vision
More informationCafeGPI. Single-Sided Communication for Scalable Deep Learning
CafeGPI Single-Sided Communication for Scalable Deep Learning Janis Keuper itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Deep Neural Networks
More informationNVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS
TECHNICAL OVERVIEW NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS A Guide to the Optimized Framework Containers on NVIDIA GPU Cloud Introduction Artificial intelligence is helping to solve some of the most
More informationParallel and Distributed Computing with MATLAB The MathWorks, Inc. 1
Parallel and Distributed Computing with MATLAB 2018 The MathWorks, Inc. 1 Practical Application of Parallel Computing Why parallel computing? Need faster insight on more complex problems with larger datasets
More informationParallel and Distributed Computing with MATLAB Gerardo Hernández Manager, Application Engineer
Parallel and Distributed Computing with MATLAB Gerardo Hernández Manager, Application Engineer 2018 The MathWorks, Inc. 1 Practical Application of Parallel Computing Why parallel computing? Need faster
More informationRealtime Object Detection and Segmentation for HD Mapping
Realtime Object Detection and Segmentation for HD Mapping William Raveane Lead AI Engineer Bahram Yoosefizonooz Technical Director NavInfo Europe Advanced Research Lab Presented at GTC Europe 2018 AI in
More informationFast Hardware For AI
Fast Hardware For AI Karl Freund karl@moorinsightsstrategy.com Sr. Analyst, AI and HPC Moor Insights & Strategy Follow my blogs covering Machine Learning Hardware on Forbes: http://www.forbes.com/sites/moorinsights
More informationDEEP NEURAL NETWORKS AND GPUS. Julie Bernauer
DEEP NEURAL NETWORKS AND GPUS Julie Bernauer GPU Computing GPU Computing Run Computations on GPUs x86 CUDA Framework to Program NVIDIA GPUs A simple sum of two vectors (arrays) in C void vector_add(int
More informationTENSORRT. RN _v01 January Release Notes
TENSORRT RN-08624-030_v01 January 2018 Release Notes TABLE OF CONTENTS Chapter Chapter Chapter Chapter 1. 2. 3. 4. Overview...1 Release 3.0.2... 2 Release 3.0.1... 4 Release 2.1... 10 RN-08624-030_v01
More informationDIGITS DEEP LEARNING GPU TRAINING SYSTEM
DIGITS DEEP LEARNING GPU TRAINING SYSTEM AGENDA 1 Introduction to Deep Learning 2 What is DIGITS 3 How to use DIGITS Practical DEEP LEARNING Examples Image Classification, Object Detection, Localization,
More informationA NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017
A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 6 10 5 1.1X per year 10 4 10 3 10 2 1.5X per year Single-threaded
More informationInference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA
Inference Optimization Using TensorRT with Use Cases Jack Han / 한재근 Solutions Architect NVIDIA Search Image NLP Maps TensorRT 4 Adoption Use Cases Speech Video AI Inference is exploding 1 Billion Videos
More informationSpeeding up MATLAB Applications Sean de Wolski Application Engineer
Speeding up MATLAB Applications Sean de Wolski Application Engineer 2014 The MathWorks, Inc. 1 Non-rigid Displacement Vector Fields 2 Agenda Leveraging the power of vector and matrix operations Addressing
More informationResearch Faculty Summit Systems Fueling future disruptions
Research Faculty Summit 2018 Systems Fueling future disruptions Wolong: A Back-end Optimizer for Deep Learning Computation Jilong Xue Researcher, Microsoft Research Asia System Challenge in Deep Learning
More informationIBM Deep Learning Solutions
IBM Deep Learning Solutions Reference Architecture for Deep Learning on POWER8, P100, and NVLink October, 2016 How do you teach a computer to Perceive? 2 Deep Learning: teaching Siri to recognize a bicycle
More informationImplementing Deep Learning for Video Analytics on Tegra X1.
Implementing Deep Learning for Video Analytics on Tegra X1 research@hertasecurity.com Index Who we are, what we do Video analytics pipeline Video decoding Facial detection and preprocessing DNN: learning
More informationOptimizing and Accelerating Your MATLAB Code
Optimizing and Accelerating Your MATLAB Code Sofia Mosesson Senior Application Engineer 2016 The MathWorks, Inc. 1 Agenda Optimizing for loops and using vector and matrix operations Indexing in different
More informationHigh Performance Computing
High Performance Computing 9th Lecture 2016/10/28 YUKI ITO 1 Selected Paper: vdnn: Virtualized Deep Neural Networks for Scalable, MemoryEfficient Neural Network Design Minsoo Rhu, Natalia Gimelshein, Jason
More informationS CUDA on Xavier
S8868 - CUDA on Xavier Anshuman Bhat CUDA Product Manager Saikat Dasadhikari CUDA Engineering 29 th March 2018 1 CUDA ECOSYSTEM 2018 CUDA DOWNLOADS IN 2017 3,500,000 CUDA REGISTERED DEVELOPERS 800,000
More informationTESLA V100 PERFORMANCE GUIDE. Life Sciences Applications
TESLA V100 PERFORMANCE GUIDE Life Sciences Applications NOVEMBER 2017 TESLA V100 PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important
More informationTENSORRT 4.0 RELEASE CANDIDATE (RC)
TENSORRT 4.0 RELEASE CANDIDATE (RC) DU-08731-001_v4.0 RC March 2018 Installation Guide TABLE OF CONTENTS Chapter 1. Overview... 1 Chapter 2. Getting Started... 2 Chapter 3. Downloading TensorRT...3 Chapter
More informationWu Zhiwen.
Wu Zhiwen zhiwen.wu@intel.com Agenda Background information OpenCV DNN module OpenCL acceleration Vulkan backend Sample 2 What is OpenCV? Open Source Compute Vision (OpenCV) library 2500+ Optimized algorithms
More informationTENSORRT. RN _v01 June Release Notes
TENSORRT RN-08624-030_v01 June 2018 Release Notes TABLE OF CONTENTS Chapter Chapter Chapter Chapter Chapter Chapter 1. 2. 3. 4. 5. 6. Overview...1 Release 4.0.1... 2 Release 3.0.4... 6 Release 3.0.2...
More informationDeep Learning Frameworks. COSC 7336: Advanced Natural Language Processing Fall 2017
Deep Learning Frameworks COSC 7336: Advanced Natural Language Processing Fall 2017 Today s lecture Deep learning software overview TensorFlow Keras Practical Graphical Processing Unit (GPU) From graphical
More informationWhat s New in MATLAB and Simulink The MathWorks, Inc. 1
What s New in MATLAB Simulink 2015 The MathWorks, Inc. 1 Engineers scientists 2 Engineers scientists Develop algorithms Analyze data write MATLAB code. 3 Engineers scientists deploy algorithms applications
More informationObject recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK
Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK 17 May 2016, Melbourne 24 May 2016, Sydney Werner Scholz, CTO and Head of R&D, XENON Systems Mike Wang, Solutions Architect,
More informationDesigning GPU-accelerated applications with RTMaps (Real-Time Multisensor Applications) Framework and NVIDIA DriveWorks
MUNICH OCT 10-12, 2017 Designing GPU-accelerated applications with RTMaps (Real-Time Multisensor Applications) Framework and NVIDIA DriveWorks Xavier Rouah Lead Software Engineer Brief introduction about
More informationAll You Want To Know About CNNs. Yukun Zhu
All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from http://imgur.com/ Deep Learning Image from http://imgur.com/ Deep Learning Image from http://imgur.com/ Deep Learning Image
More informationWhat s New for MATLAB David Willingham
What s New for MATLAB David Willingham 2015 The MathWorks, Inc. 1 MATLAB Execution Engine Redesigned execution engine runs MATLAB code faster All MATLAB code is now JIT compiled A platform for future improvements
More informationEXTENDING THE REACH OF PARALLEL COMPUTING WITH CUDA
EXTENDING THE REACH OF PARALLEL COMPUTING WITH CUDA Mark Harris, NVIDIA @harrism #NVSC14 EXTENDING THE REACH OF CUDA 1 Machine Learning 2 Higher Performance 3 New Platforms 4 New Languages 2 GPUS: THE
More informationS INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS
S8497 - INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS Chris Lamb CUDA and NGC Engineering, NVIDIA John Barco NGC Product Management, NVIDIA NVIDIA GPU Cloud (NGC) overview AGENDA Using NGC
More informationCaffe2C: A Framework for Easy Implementation of CNN-based Mobile Applications
Caffe2C: A Framework for Easy Implementation of CNN-based Mobile Applications Ryosuke Tanno and Keiji Yanai Department of Informatics, The University of Electro-Communications, Tokyo 1. INTRODUCTION Deep
More informationTENSORRT 3.0. DU _v3.0 February Installation Guide
TENSORRT 3.0 DU-08731-001_v3.0 February 2018 Installation Guide TABLE OF CONTENTS Chapter 1. Overview... 1 Chapter 2. Getting Started... 2 Chapter 3. Downloading TensorRT...4 Chapter 4. Installing TensorRT...
More informationHigh-Performance Data Loading and Augmentation for Deep Neural Network Training
High-Performance Data Loading and Augmentation for Deep Neural Network Training Trevor Gale tgale@ece.neu.edu Steven Eliuk steven.eliuk@gmail.com Cameron Upright c.upright@samsung.com Roadmap 1. The General-Purpose
More informationIntegrate MATLAB Analytics into Enterprise Applications
Integrate Analytics into Enterprise Applications Aurélie Urbain MathWorks Consulting Services 2015 The MathWorks, Inc. 1 Data Analytics Workflow Data Acquisition Data Analytics Analytics Integration Business
More information2015 The MathWorks, Inc. 1
2015 The MathWorks, Inc. 1 C/C++ 사용자를위한 MATLAB 활용 : 알고리즘개발및검증 이웅재부장 2015 The MathWorks, Inc. 2 Signal Processing Algorithm Design with C/C++ Specification Algorithm Development C/C++ Testing & Debugging
More informationPouya Kousha Fall 2018 CSE 5194 Prof. DK Panda
Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Motivation And Intro Programming Model Spark Data Transformation Model Construction Model Training Model Inference Execution Model Data Parallel Training
More information개발과정에서의 MATLAB 과 C 의연동 ( 영상처리분야 )
개발과정에서의 MATLAB 과 C 의연동 ( 영상처리분야 ) Application Engineer Caleb Kim 2016 The MathWorks, Inc. 1 Algorithm Development with MATLAB for C/C++ Programmers Objectives Use MATLAB throughout algorithm development
More informationCNN optimization. Rassadin A
CNN optimization Rassadin A. 01.2017-02.2017 What to optimize? Training stage time consumption (CPU / GPU) Inference stage time consumption (CPU / GPU) Training stage memory consumption Inference stage
More informationAccelerating your Embedded Vision / Machine Learning design with the revision Stack. Giles Peckham, Xilinx
Accelerating your Embedded Vision / Machine Learning design with the revision Stack Giles Peckham, Xilinx Xilinx Foundation at the Edge Vision Customers Using Xilinx >80 ADAS Models From 23 Makers >80
More informationA performance comparison of Deep Learning frameworks on KNL
A performance comparison of Deep Learning frameworks on KNL R. Zanella, G. Fiameni, M. Rorro Middleware, Data Management - SCAI - CINECA IXPUG Bologna, March 5, 2018 Table of Contents 1. Problem description
More informationCUDNN. DU _v07 December Installation Guide
CUDNN DU-08670-001_v07 December 2017 Installation Guide TABLE OF CONTENTS Chapter Overview... 1 Chapter Installing on Linux... 2 Prerequisites... 2 Installing NVIDIA Graphics Drivers... 2 Installing CUDA...
More informationWhat s New in MATLAB and Simulink
What s New in MATLAB Simulink Fabrizio Sara 2015 The MathWorks, Inc. 1 Engineers scientists 2 Engineers scientists Develop algorithms Analyze data write MATLAB code. 3 Engineers scientists deploy algorithms
More informationGetting started with Caffe. Jon Barker, Solutions Architect
Getting started with Caffe Jon Barker, Solutions Architect Caffe tour Overview Agenda Example applications Setup Performance Hands-on lab preview 2 A tour of Caffe 3 What is Caffe? An open framework for
More informationEmbedded GPGPU and Deep Learning for Industrial Market
Embedded GPGPU and Deep Learning for Industrial Market Author: Dan Mor GPGPU and HPEC Product Line Manager September 2018 Table of Contents 1. INTRODUCTION... 3 2. DIFFICULTIES IN CURRENT EMBEDDED INDUSTRIAL
More informationTuring Architecture and CUDA 10 New Features. Minseok Lee, Developer Technology Engineer, NVIDIA
Turing Architecture and CUDA 10 New Features Minseok Lee, Developer Technology Engineer, NVIDIA Turing Architecture New SM Architecture Multi-Precision Tensor Core RT Core Turing MPS Inference Accelerated,
More informationACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015
ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 COMMODITY DISRUPTS CUSTOM SOURCE: Top500 ACCELERATED COMPUTING: THE PATH FORWARD It s time to start
More informationTwo FPGA-DNN Projects: 1. Low Latency Multi-Layer Perceptrons using FPGAs 2. Acceleration of CNN Training on FPGA-based Clusters
Two FPGA-DNN Projects: 1. Low Latency Multi-Layer Perceptrons using FPGAs 2. Acceleration of CNN Training on FPGA-based Clusters *Argonne National Lab +BU & USTC Presented by Martin Herbordt Work by Ahmed
More informationNVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPING. September 13, 2016
NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPING September 13, 2016 AI FOR AUTONOMOUS DRIVING MAPPING KALDI LOCALIZATION DRIVENET Training on DGX-1 NVIDIA DGX-1 NVIDIA DRIVE PX 2 Driving with DriveWorks
More informationIntegrate MATLAB Analytics into Enterprise Applications
Integrate Analytics into Enterprise Applications Lyamine Hedjazi 2015 The MathWorks, Inc. 1 Data Analytics Workflow Preprocessing Data Business Systems Build Algorithms Smart Connected Systems Take Decisions
More informationNVIDIA PLATFORM FOR AI
NVIDIA PLATFORM FOR AI João Paulo Navarro, Solutions Architect - Linkedin i am ai HTTPS://WWW.YOUTUBE.COM/WATCH?V=GIZ7KYRWZGQ 2 NVIDIA Gaming VR AI & HPC Self-Driving Cars GPU Computing 3 GPU COMPUTING
More informationThe OpenVX Computer Vision and Neural Network Inference
The OpenVX Computer and Neural Network Inference Standard for Portable, Efficient Code Radhakrishna Giduthuri Editor, OpenVX Khronos Group radha.giduthuri@amd.com @RadhaGiduthuri Copyright 2018 Khronos
More informationManycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT
Manycore and GPU Channelisers Seth Hall High Performance Computing Lab, AUT GPU Accelerated Computing GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate
More informationA NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA
A NEW COMPUTING ERA Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA THE ERA OF AI AI CLOUD MOBILE PC 2 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 5 1.1X
More informationDeep Learning Accelerators
Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction
More informationDEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA
DEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA TOPICS COVERED Convolutional Networks Deep Learning Use Cases GPUs cudnn 2 MACHINE LEARNING! Training! Train the model from supervised
More informationSmall is the New Big: Data Analytics on the Edge
Small is the New Big: Data Analytics on the Edge An overview of processors and algorithms for deep learning techniques on the edge Dr. Abhay Samant VP Engineering, Hiller Measurements Adjunct Faculty,
More informationWhat s inside: What is deep learning Why is deep learning taking off now? Multiple applications How to implement a system.
Point Grey White Paper Series What s inside: What is deep learning Why is deep learning taking off now? Multiple applications How to implement a system More and more, machine vision systems are expected
More informationTENSORFLOW. DU _v1.8.0 June User Guide
TENSORFLOW DU-08601-001_v1.8.0 June 2018 User Guide TABLE OF CONTENTS Chapter 1. Overview Of... 1 1.1. Contents Of The NVIDIA Container... 1 Chapter 2. Pulling The Container... 3 Chapter 3. Running A Container...4
More informationLarge Data in MATLAB: A Seismic Data Processing Case Study U. M. Sundar Senior Application Engineer
Large Data in MATLAB: A Seismic Data Processing Case Study U. M. Sundar Senior Application Engineer 2013 MathWorks, Inc. 1 Problem Statement: Scaling Up Seismic Analysis Challenge: Developing a seismic
More informationGPU FOR DEEP LEARNING. 周国峰 Wuhan University 2017/10/13
GPU FOR DEEP LEARNING chandlerz@nvidia.com 周国峰 Wuhan University 2017/10/13 Why Deep Learning Boost Today? Nvidia SDK for Deep Learning? Agenda CUDA 8.0 cudnn TensorRT (GIE) NCCL DIGITS 2 Why Deep Learning
More informationX10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management
X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management Hideyuki Shamoto, Tatsuhiro Chiba, Mikio Takeuchi Tokyo Institute of Technology IBM Research Tokyo Programming for large
More informationCan FPGAs beat GPUs in accelerating next-generation Deep Neural Networks? Discussion of the FPGA 17 paper by Intel Corp. (Nurvitadhi et al.
Can FPGAs beat GPUs in accelerating next-generation Deep Neural Networks? Discussion of the FPGA 17 paper by Intel Corp. (Nurvitadhi et al.) Andreas Kurth 2017-12-05 1 In short: The situation Image credit:
More informationGetting Started with MATLAB Francesca Perino
Getting Started with MATLAB Francesca Perino francesca.perino@mathworks.it 2014 The MathWorks, Inc. 1 Agenda MATLAB Intro Importazione ed esportazione Programmazione in MATLAB Tecniche per la velocizzazione
More informationSemantic Segmentation
Semantic Segmentation UCLA:https://goo.gl/images/I0VTi2 OUTLINE Semantic Segmentation Why? Paper to talk about: Fully Convolutional Networks for Semantic Segmentation. J. Long, E. Shelhamer, and T. Darrell,
More informationS8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer
S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer 2 100 倍以上速く 本当に可能ですか? 2 DOUGLAS ADAMS BABEL FISH Neural Machine Translation Unit 3 4 OVER 100X FASTER, IS IT REALLY POSSIBLE?
More informationRecurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks
Deep neural networks have enabled major advances in machine learning and AI Computer vision Language translation Speech recognition Question answering And more Problem: DNNs are challenging to serve and
More informationSlalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware
Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware Florian Tramèr (joint work with Dan Boneh) Intel, Santa Clara August 30 th 2018 Trusted execution of ML: 3 motivating
More informationTutorial on Machine Learning Tools
Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow
More informationScaling MATLAB. for Your Organisation and Beyond. Rory Adams The MathWorks, Inc. 1
Scaling MATLAB for Your Organisation and Beyond Rory Adams 2015 The MathWorks, Inc. 1 MATLAB at Scale Front-end scaling Scale with increasing access requests Back-end scaling Scale with increasing computational
More informationThe Tesla Accelerated Computing Platform
The Tesla Accelerated Computing Platform Axel Koehler, Principal Solution Architect HPC Advisory Council Meeting Lugano 22 March 2016 Introduction TESLA Platform for HPC Agenda TESLA Platform for HYPERSCALE
More informationMit MATLAB auf der Überholspur Methoden zur Beschleunigung von MATLAB Anwendungen
Mit MATLAB auf der Überholspur Methoden zur Beschleunigung von MATLAB Anwendungen Michael Glaßer Application Engineering MathWorks Germany 2014 The MathWorks, Inc. 1 Key Takeaways 1. Speed up your serial
More informationDefense Data Generation in Distributed Deep Learning System Se-Yoon Oh / ADD-IDAR
Defense Data Generation in Distributed Deep Learning System Se-Yoon Oh / 2017. 10. 31 syoh@add.re.kr Page 1/36 Overview 1. Introduction 2. Data Generation Synthesis 3. Distributed Deep Learning 4. Conclusions
More informationMit MATLAB auf der Überholspur Methoden zur Beschleunigung von MATLAB Anwendungen
Mit MATLAB auf der Überholspur Methoden zur Beschleunigung von MATLAB Anwendungen Frank Graeber Application Engineering MathWorks Germany 2013 The MathWorks, Inc. 1 Speed up the serial code within core
More informationNVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI
NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI Overview Unparalleled Value Product Portfolio Software Platform From Desk to Data Center to Cloud Summary AI researchers depend on computing performance to gain
More informationHPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville COSC 594 Lecture Notes March 22, 2017 1/20 Outline Introduction - Hardware
More informationNeural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks Charles Eckert Xiaowei Wang Jingcheng Wang Arun Subramaniyan Ravi Iyer Dennis Sylvester David Blaauw Reetuparna Das M-Bits Research
More informationAccelerated Machine Learning Algorithms in Python
Accelerated Machine Learning Algorithms in Python Patrick Reilly, Leiming Yu, David Kaeli reilly.pa@husky.neu.edu Northeastern University Computer Architecture Research Lab Outline Motivation and Goals
More informationIntegrate MATLAB Analytics into Enterprise Applications
Integrate Analytics into Enterprise Applications Dr. Roland Michaely 2015 The MathWorks, Inc. 1 Data Analytics Workflow Access and Explore Data Preprocess Data Develop Predictive Models Integrate Analytics
More informationImplementation of Deep Convolutional Neural Net on a Digital Signal Processor
Implementation of Deep Convolutional Neural Net on a Digital Signal Processor Elaina Chai December 12, 2014 1. Abstract In this paper I will discuss the feasibility of an implementation of an algorithm
More informationPOWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017
POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017 LIFE AFTER MOORE S LAW 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 Transistors (thousands) 1.1X per year 10 4 10 3 1.5X per year
More information