Standards for Neural Networks Acceleration and Deployment

Size: px
Start display at page:

Download "Standards for Neural Networks Acceleration and Deployment"

Transcription

1 Copyright Khronos Group Page 1 Standards for Neural Networks Acceleration and Deployment Radhakrishna Giduthuri Advanced Micro Devices radha.giduthuri@ieee.org

2 Agenda Why we need a standard? Khronos NNEF Khronos OpenVX AMD open-source project: bit.ly/openvx-amd dog cat Network Architecture Pre-trained Network Model (weights, ) Copyright Khronos Group Page 2

3 Copyright Khronos Group Page 3 Neural Network End-to-End Workflow input data vector or output data Lots of Labeled Data Problem Definition Data Gathering Training Dataset Validation Dataset Feedback Deploy (Application) Neural Network Model Training

4 Neural Network End-to-End Workflow Neural Network Training Frameworks Third Party Tools Vision/AI Applications Trained Network Model Vision and Neural Network Inferencing Runtime Datasets Neural Network Model Architecture Desktop and Cloud Hardware cudnn MIOpen MKL-DNN Deployment Embedded/Mobile Embedded/Mobile Embedded/Mobile Vision/Inferencing Hardware Embedded/Mobile/Desktop/Cloud Vision/Inferencing Hardware Vision/Inferencing Hardware Vision/Inferencing Hardware GPU DSP Custom CPU FPGA Copyright Khronos Group Page 4

5 Copyright Khronos Group Page 5 Problem: Neural Network Fragmentation Neural Network Training and Inferencing Fragmentation NN Training Framework 1 NN Training Framework 2 NN Training Framework 3 Every Tool Needs an Exporter to Every Accelerator Inference Engine 1 Inference Engine 2 Inference Engine 3 Neural Network Inferencing Fragmentation toll on Applications Vision/AI Application Every Application Needs know about Every Accelerator API Inference Engine 1 Inference Engine 2 Inference Engine 3 Hardware 1 Hardware 2 Hardware 3

6 Khronos APIs Connect Software to Silicon Software Silicon Khronos is an International Industry Consortium of over 100 companies creating royalty-free, open standard APIs to enable software to access hardware acceleration for 3D graphics, Virtual and Augmented Reality, Parallel Computing, Vision Processing and Neural Networks Copyright Khronos Group Page 6

7 Khronos Open Standards Tracking and Positioning Vision sensor(s) Download 3D augmentation object and scene data Geometric scene reconstruction Semantic scene understanding (Neural Networks) Generate Low Latency 3D Content and Augmentations VR/AR Application Interact with sensor, haptic and display devices Copyright Khronos Group Page 7

8 Copyright Khronos Group Page 8 NNEF - Solving Neural Network Fragmentation Before NNEF NN Training and Inferencing Fragmentation NN Training Framework 1 NN Training Framework 2 NN Training Framework 3 Every Tool Needs an Exporter to Every Accelerator Inference Engine 1 Inference Engine 2 Inference Engine 3 With NNEF- NN Training and Inferencing Interoperability NN Training Framework 1 NN Training Framework 2 Inference Engine 1 Inference Engine 2 NN Training Framework 3 Optimization and processing tools NNEF is a Cross-vendor Neural Net file format Encapsulates network formal semantics, structure, data formats, commonly-used operations (such as convolution, pooling, normalization, etc.) Inference Engine 3

9 Copyright Khronos Group Page 9 OpenVX - Solving Inferencing Fragmentation Before OpenVX Vision and NN Inferencing Fragmentation Vision/AI Application Every Application Needs know about Every Accelerator API Inference Engine 1 Inference Engine 2 Inference Engine 3 Hardware 1 Hardware 2 Hardware 3 With OpenVX Vision and NN Inferencing Interoperability Vision/AI Application Inference Engine 1 Inference Engine 2 Inference Engine 3 Hardware 1 Hardware 2 Hardware 3 OpenVX is an open, royalty-free standard for cross platform acceleration of computer vision and neural network applications.

10 Copyright Khronos Group Page 10 Neural Net Workflow with Khronos Standards Neural Network Training Frameworks Third Party Tools Vision/AI Applications Datasets Neural Network Model Architecture Desktop and Cloud Hardware cudnn MIOpen MKL-DNN Trained Network Model Vision and Neural Network Inferencing Runtime Deployment Embedded/Mobile Vision/Inferencing Embedded/Mobile Embedded/Mobile Embedded/Mobile/Desktop/Cloud Vision/Inferencing Hardware Hardware Vision/Inferencing Hardware Hardware GPU DSP CPU Custom FPGA

11 CIFAR-10 Example: NNEF Structure Input: 32x32 RGB image Output: 10 classes airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck version 1.0 graph cifar10( data ) -> ( prob ) { border = ignore data = external(shape = [0,3,32,32]); conv1w = variable(shape = [32,3,5,5], label = conv1/weights ); conv1b = variable(shape = [1,32], label = conv1/bias ); conv1 = conv(data, conv1w, conv1b, padding = [(2,2),(2,2)]); pool1 = max_pool(conv1, size=[1,1,3,3], stride=[1,1,2,2], no_border); relu1 = relu(pool1); conv2w = variable(shape = [32,32,5,5], label = conv2/weights ); conv2b = variable(shape = [1,32], label = conv2/bias ); conv2 = conv(relu1, conv2w, conv2b, padding = [(2,2),(2,2)]); relu2 = relu(conv2); pool2 = avg_pool(relu2, size=[1,1,3,3], stride=[1,1,2,2], no_border); conv3w = variable(shape = [64,32,5,5], label = conv3/weights ); conv3b = variable(shape = [1,64], label = conv3/bias ); conv3 = conv(pool2, conv3w, conv3b, padding = [(2,2),(2,2)]); relu3 = relu(conv3); pool3 = avg_pool(relu3, size=[1,1,3,3], stride=[1,1,2,2], no_border); ip1w = variable(shape = [64,64,4,4], label = ip1/weights ); ip1b = variable(shape = [1,64], label = ip1/bias ); ip1 = conv(pool3, ip1w, ip1b, padding = [(0,0),(0,0)]); ip2w = variable(shape = [10,64,1,1], label = ip2/weights ); ip2b = variable(shape = [1,10], label = ip2/bias ); ip2 = conv(ip1, ip2w, ip2b, padding = [(0,0),(0,0)]); prob = softmax(ip2); } Copyright Khronos Group Page 11

12 Copyright Khronos Group Page 12 NNEF Tensor File Format 0x4E 0xEF 0x01 0x00 Offset to actual data from the beginning of file (e.g., 32 for typical convolution weights and 24 for convolution biases) Number of tensor dimensions (e.g., 4 for typical convolution weights and 2 for convolution biases) 1 st dimension of tensor Last dimension of tensor Data type of tensor: 0x00 float 0x01 quantized Bit-width of each item in tensor Optional: variable length quantization algorithm string Actual tensor data (multiple bytes) Length of quantization algorithm string

13 Copyright Khronos Group Page 13 CIFAR-10 Example: NNEF Container Filename Type Description Size in bytes graph.nnef Text file Neural network structure 1,195 conv1/weights.dat Tensor file Weights of conv1 9,632 conv1/bias.dat Tensor file Biases of conv1 148 conv2/weights.dat Tensor file Weights of conv2 102,432 conv2/bias.dat Tensor file Biases of conv2 148 conv3/weights.dat Tensor file Weights of conv3 204,832 conv3/bias.dat Tensor file Biases of conv3 276 ip1/weights.dat Tensor file Weights of ip1 262,176 ip1/bias.dat Tensor file Biases of ip1 276 ip2/weights.dat Tensor file Weights of ip2 2,592 ip2/bias.dat Tensor file Biases of ip2 60 * Recommended: IEEE tar archive with optional compression

14 Application Development Workflow Vision/AI Applications Trained Format Structure plus weights (quantized or not). Can be augmented with private data Graph Compiler Inference Engine Structure only OR Structure plus weights (quantized or not) Training Framework Third Party Tools Vendor Format Khronos open-source projects: - NNEF Parser: base parser for use by NNEF validator tool and NNEF Importers - NNEF Exporters: generate NNEF from deep learning frameworks Copyright Khronos Group Page 14

15 Copyright Khronos Group Page 15 NNEF Status and Roadmap V1.0 provisional release - late December NNEF has formed an advisory panel, you are invited today to participate First version will focus on interface between framework and embedded inference engines - Khronos open-source projects to promote NNEF exporters and importers Support First cut range of network types - Field is moving very fast but we aim to keep up with developments NNEF Roadmap - Track development of new network types - Allow authoring and retraining (3rd party tools) - Address a wider range of applications (outside vision apps) - Increase the expressive power of the format

16 Copyright Khronos Group Page 16 Khronos Advisory Panels Specification drafts and invitations for requirements and feedback Requirements and feedback on specification drafts Working Group Khronos Members Any company can join Membership Fee Sign NDA and IP Framework Shared list and Repository Hosted by Khronos. Under Khronos NDA Advisory Panel Khronos Advisors Invited industry experts $0 Cost Sign NDA and IP Framework Advisory Panels Active for NNEF, Vulkan, and OpenCL/SYCL

17 An open, royalty-free standard for cross platform acceleration of computer vision and neural network applications. Copyright Khronos Group Page 17

18 Copyright Khronos Group Page 18 OpenVX Power Efficiency Wide range of vision hardware architectures OpenVX provides a high-level Graph-based abstraction -> Enables Graph-level optimizations! Can be implemented on almost any hardware or processor! -> Portable, Efficient Vision Processing! X100 X10 X1 Dedicated Hardware Vision DSPs GPU Compute Multi-core CPU Computation Flexibility Vision Node Vision Engines GPU Vision Node CNN Nodes Middleware DSP Vision Processing Graph Applications Hardware Vision Node Software Portability

19 Copyright Khronos Group Page 19 OpenVX - Graph-Level Abstraction OpenVX developers express a graph of image operations ( Nodes ) - Using a C API Nodes can be executed on any hardware or processor coded in any language - Implementers can optimize under the high-level graph abstraction Graphs are the key to run-time power and performance optimizations - E.g. Node fusion, tiled graph processing for cache efficiency etc. Camera Input OpenVX Nodes Pyr t Rendering Output RGB Frame Color Conversion YUV Frame Channel Extract Gray Frame Image Pyramid Optical Flow Array of Keypoints Harris Track OpenVX Graph Feature Extraction Example Graph Ftr t-1 Array of Features

20 Copyright Khronos Group Page 20 OpenVX Efficiency through Graphs.. Graph Scheduling Memory Management Kernel Fusion Data Tiling Split the graph execution across the whole system: CPU / GPU / dedicated HW Reuse pre-allocated memory for multiple intermediate data Replace a subgraph with a single faster node Execute a subgraph at tile granularity instead of image granularity Faster execution or lower power consumption Less allocation overhead, more memory for other applications Better memory locality, less kernel launch overhead Better use of data cache and local memory

21 Simple Edge Detector in OpenVX vx_graph g = vxcreategraph(); vx_image input = vxcreateimage(1920, 1080); vx_image output = vxcreateimage(1920, 1080); vx_image horiz = vxcreatevirtualimage(g); vx_image vert = vxcreatevirtualimage(g); vx_image mag = vxcreatevirtualimage(g); Declare Input and Output Images Declare Intermediate Images vxsobel3x3node(g, input, horiz, vert); vxmagnitudenode(g, horiz, vert, mag); vxthresholdnode(g, mag, THRESH, output); Construct the Graph topology Compile the Graph Execute the Graph status = vxverifygraph(g); status = vxprocessgraph(g); inp Sobel h v Mag m Thresh out Copyright Khronos Group Page 21

22 Copyright Khronos Group Page 22 OpenVX Evolution Conformant Implementations AMD OpenVX Tools - Open source, highly optimized for x86 CPU and OpenCL for GPU - Graph Optimizer looks at entire processing pipeline and removes, replaces, merges functions to improve performance and bandwidth - - Scripting for rapid prototyping - - live 360º camera stitching - - neural networks (develop) bit.ly/openvx-amd OpenVX 1.0 and OpenVX 1.1 New Functionality Conditional node execution Feature detection Classification operators Expanded imaging operations Extensions Neural Network Acceleration Graph Save and Restore 16-bit image operation Safety Critical OpenVX 1.1 SC for safety-certifiable systems OpenVX 1.2 Spec released May 2017 New Functionality Under Discussion NNEF Import Programmable user kernels with accelerator offload Streaming/pipelining OpenVX Roadmap

23 Copyright Khronos Group Page 23 Basic OpenVX Components Context Data Objects Image, Tensor, Pyramid, Array, LUT, Remap, Scalar, Threshold, Distribution, Matrix, Convolution, Delay, ObjectArray Graphs Nodes Kernel instances, parameters, attributes Kernels Built-in vision functions, Vendor extensions, User-defined Virtual Data Objects Intermediate data without host access, enables several optimizations Extensions NN, Import/Export, ICD, Miscellaneous Directives, Hints, Logging, Performance Measurements

24 Copyright Khronos Group Page 24 Basic OpenVX Vision Functions Kernels Element-wise Functions Add, Subtract, Multiply, AbsDiff, And, Or, Xor, Not, Min, Max, Magnitude, Phase, Threshold, TableLookup, ColorDepth, ChannelExtract, ChannelCombine, ColorConvert, Copy, AccumulateImage [Squared/Weighted], Tensor Add/Subtract/Multiply/LUT/ Reduction Functions Histogram, MeanStdDev, MinMaxLoc Control Flow Scalar Operations, Select Filtering Functions Box3x3, Convolve, Dilate3x3, Erode3x3, Gaussian3x3, Median3x3, Sobel3x3, GaussianPyramid, NonLinearFilter, LaplacianPyramid/Reconstruct, NonMaxSupression, Bilateral, LBP, HOG, Geometric Functions Remap, ScaleImage, WarpAffine, WarpPerspective, HalfScaleGaussian Complex Functions CannyEdgeDetector, EqualizeHist, FastCorners, HarrisCorners, IntegralImage, OpticalFlowPyrLK, HoughLinesP, MatrixMult, See VX/vx_nodes.h for functions to create kernel instances (nodes) in a graph

25 Copyright Khronos Group Page 25 New OpenVX 1.2 Functions Feature detection: find features useful for object detection and recognition - Histogram of gradients HOG Template matching - Local binary patterns LBP Line finding Classification: detect and recognize objects in an image based on a set of features - Import a classifier model trained offline - Classify objects based on a set of input features Image Processing: transform an image - Generalized nonlinear filter: Dilate, erode, median with arbitrary kernel shapes - Non maximum suppression: Find local maximum values in an image - Bilateral filter: edge-preserving noise reduction Conditional execution & node predication - Selectively execute portions of a graph based on a true/false predicate Many, many minor improvements New Extensions - Import/export: compile a graph; save and run later - 16-bit support: signed 16-bit image data - Neural networks: s are represented as OpenVX nodes B A S Condition C If A then S B else S C

26 Copyright Khronos Group Page 26 Feature Detectors Histogram of gradients HOG Technique which counts occurrences of gradient orientation in localized portions of an image Local binary patterns LBP A binary pattern that describe that describe the image texture. Hough Lines Find lines in a picture. Used in line departure warning algorithms.

27 Copyright Khronos Group Page 27 Template Matching a technique in digital image processing for finding small parts of an image which match a template image Allow very strong denoising Tracking Object detection

28 Copyright Khronos Group Page 28 Classifier Extension (Detection and classification) Classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs. Feature detection refers to methods that aim at computing abstractions of image information and making local decisions at every image point whether there is an image feature of a given type at that point or not This extension do both. Possible methods: SVM Cascade (Decision Tree)

29 Copyright Khronos Group Page 29 OpenVX 1.2 and Neural Net Extension Convolution Neural Network topologies can be represented as OpenVX graphs - s are represented as OpenVX nodes - s connected by multi-dimensional tensors objects - types include convolution, activation, pooling, fully-connected, soft-max - CNN nodes can be mixed with traditional vision nodes Import/Export Extension - Efficient handling of network Weights/Biases or complete networks OpenVX will be able to import NNEF files into OpenVX Neural Nets Native Camera Control Vision Node Vision Node CNN Nodes Vision Node Downstream Application Processing An OpenVX graph mixing CNN nodes with traditional vision nodes

30 Copyright Khronos Group Page 30 Supported networks can be Overfeat Alexnet GoogLeNet versions (Inception) ResNet LSTM RNN/BiRNN Faster-RCNN FCN

31 Copyright Khronos Group Page 31 OpenVX Neural Network Extension Two main parts: (1) a tensor object and (2) a set of CNN layer nodes A vx_tensor is a multi-dimensional array that supports at least 4 dimensions Tensor creation and deletion functions Simple math for tensors - Element-wise Add, Subtract, Multiply, TableLookup, and Bit-depth conversion - Transposition of dimensions and generalized matrix multiplication - vxcopytensorpatch, vxquerytensor (#dims, dims, element type, Q)

32 Copyright Khronos Group Page 32 Importance of Quantization OpenVX support 8-bit by default and 16-bit by extension Quantization is very attractive in low power embedded system Less bandwidth than float (Faster and less power) Need less internal memory (Significant reduction in silicon area) HW is smaller (Int8 and Int16 HW units are smaller than float) Quantization drawbacks Less accurate Degradation in performance quality (Some applications prefer the degradation if significant better performance is achieved)

33 Copyright Khronos Group Page 33 Neural Network Quantization Q78 was the first trivial quantization UINT8 quantization was introduced by google with GEMMLOWP. Use by TensorFlow and Google s TPU accelerator. min x max x INT8 with DFP (Dynamic Fixed Point) see Ristretto (a tool for CNN-approximation) min x max x

34 Copyright Khronos Group Page 34 Other Quantizations in OpenVX 1.2 HOG Features are 16-bit. Tensors are 8-bit and 16-bit only. Classification extension input is Tensor and therefore accept only 8-bits or 16 bits. All Image processing operations are 8-bit or as 16-bit extension. Template Matching input is 8-bit and output is 16-bit.

35 Copyright Khronos Group Page 35 OpenVX Neural Network Extension Tensor types of INT16, INT7.8, INT8, and UINT8 are supported - Other types may be supported by a vendor Conformance tests will be up to some tolerance in precision - To allow for optimizations, e.g., weight compression Eight neural network layer nodes: vxactivation vxconvolution vxdeconvolution vxfullyconnected vxnormalization vxpooling vxsoftmax vxroipooling

36 Copyright Khronos Group Page 36 An Example of Convolution Neural Network Optimizations Conv Activation Pooling Conv Activation Pooling Conv Activation Conv Activation Conv Conv Activation Activation Pooling Pooling Fully Connected Activ. Fully Connected Activ. Fully Connected Activation Conv Activation Pooling Conv Activation Pooling Conv Activation Conv Activation

37 Copyright Khronos Group Page 37 Safety Critical APIs OpenGL SC Fixed function graphics subset Experience and Guidelines OpenGL SC April 2016 Shader programmable pipeline subset New Generation APIs for safety certifiable vision, graphics and compute e.g. ISO and DO-178B/C OpenGL ES Fixed function graphics OpenGL ES Shader programmable pipeline OpenVX SC 1.1 Released 1st May 2017 Restricted deployment implementation executes on the target hardware by reading the binary format and executing the pre-compiled graphs Khronos SCAP Safety Critical Advisory Panel Guidelines for designing APIs that ease system certification. Open to Khronos member AND industry experts OpenCL SC TSG Formed Working on OpenCL SC 1.2 Eliminate Undefined Behavior Eliminate Callback Functions Static Pool of Event Objects

38 Copyright Khronos Group Page 38 OpenVX SC - Safety Critical Vision Processing OpenVX based on OpenVX 1.1 main specification - Enhanced determinism - Specification identifies and numbers requirements MISRA C clean per KlocWorks v10 Divides functionality into development and deployment feature sets - Adds requirement to support import/export extension OpenVX SC Development Feature Set (Create Graph) Verify Export Binary format Import OpenVX SC Deployment Feature Set (Execute Graph) Entire graph creation API Implementation dependent format No graph creation API

39 Copyright Khronos Group Page 39 OpenVX Application Deployment Workflow Development Stage Deployed Application Create and verify OpenVX graph Create context Export all the objects that needs access during deployment OpenVX Binary in Vendor Specific Format Import objects from binary Graph execution Release all objects Release all objects Vendor Tools

40 Copyright Khronos Group Page 40 How OpenVX Compares to Alternatives Governance Open standard API designed to be implemented and shipped by IHVs Community-driven, open source library Open standard API designed to be implemented and shipped by IHVs Programming Model Graph defined with C API and then compiled for run-time execution Immediate runtime function calls reading to and from memory Explicit kernels are compiled and executed via run-time API Built-in Vision Functionality Small but growing set of popular functions Vast. Mainly on PC/CPU None. User programs their own or call vision library over OpenCL Target Hardware Any combination of processors or non-programmable hardware Mainly PCs and GPUs Any heterogeneous combination of IEEE FP-capable processors Optimization Opportunities Pre-declared graph enables significant optimizations Each function reads/writes memory. Power performance inefficient Any execution topology can be explicitly programmed Conformance Implementations must pass conformance to use trademark Extensive Test Suite but no formal Adopters program Implementations must pass conformance to use trademark Consistency All core functions must be available in conformant implementations Available functions vary depending on implementation / platform All core functions must be available in all conformant implementations

41 Copyright Khronos Group Page 41 ed Vision/ Neural Net Ecosystem Implementers may use OpenCL or Vulkan to implement OpenVX nodes on programmable processors OpenVX enables the graph to be extended to include hardware architectures that don t support programmable APIs Vulkan Roadmap Enhanced compute especially useful for vision and inferencing on mobile and Android platforms Application OpenCL Roadmap Flexible precision for widespread deployment on lowcost embedded processors Programmable Vision Processors Dedicated Vision Hardware And then implementors can use OpenVX to enable a developer to easily connect those nodes into a graph The OpenVX graph abstraction enables implementers to optimize execution across diverse hardware architectures for optimal power and performance

42 AMD OpenVX open-source project with NN bit.ly/openvx-amd All OpenVX projects on GitHub Neural network acceleration for inference Inference Application Development Workflow Pre-trained CAFFE model Application openvx and vx_nn libraries MIOpenGEMM, MIOpen, and OpenCL libraries Select/upload pre-trained NN Models Inference Result Viewer (GUI) Sample Inference Application anninferenceserver GPUs dog Browse Images Submit for Inference scheduler annmodule library anninferenceapp embedded IoT devices... P47 Server Node inference_generator Workstation LOOM: 360º live video stitching Network Server (tcp/ip) JPEG decode & scaling on CPUs Inference Generator Server Config Pre-compiled Models Uploaded Models (Temporary Storage) Copyright Khronos Group Page 42

43 Copyright Khronos Group Page 43 OpenVX Benefits and Resources Faster development of efficient and portable vision applications - Developers are protected from hardware complexities - No platform-specific performance optimizations needed Graph description enables significant automatic optimizations - Scheduling, memory management, kernel fusion, and tiling Performance portability to diverse hardware - Hardware agility for different use case requirements - Application software investment is protected as hardware evolves OpenVX Resources - OpenVX Overview OpenVX Specifications: current, previous, and extensions OpenVX Resources: implementations, tutorials, reference guides, etc. -

44 Copyright Khronos Group Page 44 Any Questions? Khronos is working on a comprehensive set of solutions for vision and inferencing - ed ecosystem that include multiple developer and deployment options Please let us know if we can help you participate in any Khronos activities! A quick hands-on OpenVX Tutorial - Please contact us with any comments or questions! - Radhakrishna Giduthuri radha.giduthuri@ieee.org These slides and further information on all these standards at Khronos -

Standards for Vision Processing and Neural Networks

Standards for Vision Processing and Neural Networks Copyright Khronos Group 2017 - Page 1 Standards for Vision Processing and Neural Networks Radhakrishna Giduthuri, AMD radha.giduthuri@ieee.org Agenda Why we need a standard? Khronos NNEF Khronos OpenVX

More information

The OpenVX Computer Vision and Neural Network Inference

The OpenVX Computer Vision and Neural Network Inference The OpenVX Computer and Neural Network Inference Standard for Portable, Efficient Code Radhakrishna Giduthuri Editor, OpenVX Khronos Group radha.giduthuri@amd.com @RadhaGiduthuri Copyright 2018 Khronos

More information

Open Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc May 2018

Open Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc May 2018 Copyright Khronos Group 2018 - Page 1 Open Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc peter.mcguinness@gobrach.com May 2018 Khronos Mission E.g. OpenGL ES provides 3D

More information

Open Standards for AR and VR Neil Trevett Khronos President NVIDIA VP Developer January 2018

Open Standards for AR and VR Neil Trevett Khronos President NVIDIA VP Developer January 2018 Copyright Khronos Group 2018 - Page 1 Open Standards for AR and Neil Trevett Khronos President NVIDIA VP Developer Ecosystem ntrevett@nvidia.com @neilt3d January 2018 Khronos Mission E.g. OpenGL ES provides

More information

Kari Pulli Intel. Radhakrishna Giduthuri, AMD. Frank Brill NVIDIA. OpenVX Webinar. June 16, 2016

Kari Pulli Intel. Radhakrishna Giduthuri, AMD. Frank Brill NVIDIA. OpenVX Webinar. June 16, 2016 Kari Pulli Intel Radhakrishna Giduthuri, AMD Frank Brill NVIDIA OpenVX Webinar June 16, 2016 Copyright Khronos Group 2016 - Page 1 Vision Acceleration Kari Pulli Sr. Principal Engineer Intel Copyright

More information

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1 Update on Khronos Standards for Vision and Machine Learning December 2017 Neil Trevett Khronos President NVIDIA VP Developer Ecosystem ntrevett@nvidia.com @neilt3d www.khronos.org Copyright Khronos Group

More information

Neural Network Exchange Format

Neural Network Exchange Format Copyright Khronos Group 2017 - Page 1 Neural Network Exchange Format Deploying Trained Networks to Inference Engines Viktor Gyenes, specification editor Copyright Khronos Group 2017 - Page 2 Outlook The

More information

Accelerating Vision Processing

Accelerating Vision Processing Accelerating Vision Processing Neil Trevett Vice President Mobile Ecosystem at NVIDIA President of Khronos and Chair of the OpenCL Working Group SIGGRAPH, July 2016 Copyright Khronos Group 2016 - Page

More information

Silicon Acceleration APIs

Silicon Acceleration APIs Copyright Khronos Group 2016 - Page 1 Silicon Acceleration APIs Embedded Technology 2016, Yokohama Neil Trevett Vice President Developer Ecosystem, NVIDIA President, Khronos ntrevett@nvidia.com @neilt3d

More information

Copyright Khronos Group Page 1. Vulkan Overview. June 2015

Copyright Khronos Group Page 1. Vulkan Overview. June 2015 Copyright Khronos Group 2015 - Page 1 Vulkan Overview June 2015 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon Open Consortium creating OPEN STANDARD APIs for hardware acceleration

More information

Overview and AR/VR Roadmap

Overview and AR/VR Roadmap Khronos Group Inc. 2018 - Page 1 Overview and AR/ Roadmap Neil Trevett Khronos President NVIDIA VP Developer Ecosystems ntrevett@nvidia.com @neilt3d Khronos Group Inc. 2018 - Page 2 Khronos Connects Software

More information

Standards Update. Copyright Khronos Group Page 1

Standards Update. Copyright Khronos Group Page 1 Standards Update VR/AR, 3D, Web, Vision and Deep Learning Neil Trevett Khronos President NVIDIA VP Developer Ecosystem ntrevett@nvidia.com @neilt3d www.khronos.org Copyright Khronos Group 2017 - Page 1

More information

Update on Khronos Open Standard APIs for Vision Processing Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem

Update on Khronos Open Standard APIs for Vision Processing Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem Update on Khronos Open Standard APIs for Vision Processing Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem Copyright Khronos Group 2015 - Page 1 Copyright Khronos Group 2015 - Page

More information

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1 Open Standards and Open Source Together How Khronos APIs Accelerate Fast and Cool Applications Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem Copyright Khronos Group 2015 - Page

More information

Wu Zhiwen.

Wu Zhiwen. Wu Zhiwen zhiwen.wu@intel.com Agenda Background information OpenCV DNN module OpenCL acceleration Vulkan backend Sample 2 What is OpenCV? Open Source Compute Vision (OpenCV) library 2500+ Optimized algorithms

More information

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1 OpenCL State of the Nation Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem OpenCL Working Group Chair ntrevett@nvidia.com @neilt3d Toronto, May 2017 Copyright Khronos Group 2017

More information

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1 OpenCL State of the Nation Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem OpenCL Working Group Chair ntrevett@nvidia.com @neilt3d Toronto, May 2017 Copyright Khronos Group 2017

More information

Open Standards for Building Virtual and Augmented Realities. Neil Trevett Khronos President NVIDIA VP Developer Ecosystems

Open Standards for Building Virtual and Augmented Realities. Neil Trevett Khronos President NVIDIA VP Developer Ecosystems Open Standards for Building Virtual and Augmented Realities Neil Trevett Khronos President NVIDIA VP Developer Ecosystems Khronos Mission Asian Members Software Silicon Khronos is an International Industry

More information

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1 Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Ecosystem @neilt3d Copyright Khronos Group 2015 - Page 1 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon

More information

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1 Gaming Market Briefing Overview of APIs GDC March 2016 Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem ntrevett@nvidia.com @neilt3d Copyright Khronos Group 2016 - Page 1 Copyright

More information

Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem. Copyright Khronos Group Page 1

Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem. Copyright Khronos Group Page 1 Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem Copyright Khronos Group 2015 - Page 1 Khronos Connects Software to Silicon Open Consortium creating ROYALTY-FREE,

More information

Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem. Copyright Khronos Group Page 1

Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem. Copyright Khronos Group Page 1 Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem Copyright Khronos Group 2015 - Page 1 Khronos Connects Software to Silicon Open Consortium creating ROYALTY-FREE,

More information

OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision. Kamran Khan, Product Manager, Software Acceleration and Libraries July 2017

OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision. Kamran Khan, Product Manager, Software Acceleration and Libraries July 2017 OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision Kamran Khan, Product Manager, Software Acceleration and Libraries July 2017 Agenda Why Zynq SoCs for Traditional Computer Vision Automated

More information

Xilinx ML Suite Overview

Xilinx ML Suite Overview Xilinx ML Suite Overview Yao Fu System Architect Data Center Acceleration Xilinx Accelerated Computing Workloads Machine Learning Inference Image classification and object detection Video Streaming Frame

More information

Khronos Connects Software to Silicon

Khronos Connects Software to Silicon Press Pre-Briefing GDC 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem All Materials Embargoed Until Tuesday 3 rd March, 12:01AM Pacific Time Copyright Khronos Group 2015 - Page

More information

Enabling a Richer Multimedia Experience with GPU Compute. Roberto Mijat Visual Computing Marketing Manager

Enabling a Richer Multimedia Experience with GPU Compute. Roberto Mijat Visual Computing Marketing Manager Enabling a Richer Multimedia Experience with GPU Compute Roberto Mijat Visual Computing Marketing Manager 1 What is GPU Compute Operating System and most application processing continue to reside on the

More information

How to Build Optimized ML Applications with Arm Software

How to Build Optimized ML Applications with Arm Software How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 ML Group Overview Today we will talk about applied machine learning (ML) on Arm. My aim for today is to show you just

More information

Shrinath Shanbhag Senior Software Engineer Microsoft Corporation

Shrinath Shanbhag Senior Software Engineer Microsoft Corporation Accelerating GPU inferencing with DirectML and DirectX 12 Shrinath Shanbhag Senior Software Engineer Microsoft Corporation Machine Learning Machine learning has become immensely popular over the last decade

More information

Open Standard APIs for Augmented Reality

Open Standard APIs for Augmented Reality Copyright Khronos Group 2014 - Page 1 Open Standard APIs for Augmented Reality Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group Copyright Khronos Group 2014 - Page 2 Khronos

More information

OpenCL Overview. Shanghai March Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group

OpenCL Overview. Shanghai March Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group Copyright Khronos Group, 2012 - Page 1 OpenCL Overview Shanghai March 2012 Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group Copyright Khronos Group, 2012 - Page 2 Processor

More information

How to Build Optimized ML Applications with Arm Software

How to Build Optimized ML Applications with Arm Software How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 Arm K.K. Senior FAE Ryuji Tanaka Overview Today we will talk about applied machine learning (ML) on Arm. My aim for

More information

AR Standards Update Austin, March 2012

AR Standards Update Austin, March 2012 AR Standards Update Austin, March 2012 Neil Trevett President, The Khronos Group Vice President Mobile Content, NVIDIA Copyright Khronos Group, 2012 - Page 1 Topics Very brief overview of Khronos Update

More information

Deep Learning Requirements for Autonomous Vehicles

Deep Learning Requirements for Autonomous Vehicles Deep Learning Requirements for Autonomous Vehicles Pierre Paulin, Director of R&D Synopsys Inc. Chipex, 1 May 2018 1 Agenda Deep Learning and Convolutional Neural Networks for Embedded Vision Automotive

More information

Comprehensive Arm Solutions for Innovative Machine Learning (ML) and Computer Vision (CV) Applications

Comprehensive Arm Solutions for Innovative Machine Learning (ML) and Computer Vision (CV) Applications Comprehensive Arm Solutions for Innovative Machine Learning (ML) and Computer Vision (CV) Applications Helena Zheng ML Group, Arm Arm Technical Symposia 2017, Taipei Machine Learning is a Subset of Artificial

More information

The Path to Embedded Vision & AI using a Low Power Vision DSP. Yair Siegel, Director of Segment Marketing Hotchips August 2016

The Path to Embedded Vision & AI using a Low Power Vision DSP. Yair Siegel, Director of Segment Marketing Hotchips August 2016 The Path to Embedded Vision & AI using a Low Power Vision DSP Yair Siegel, Director of Segment Marketing Hotchips August 2016 Presentation Outline Introduction The Need for Embedded Vision & AI Vision

More information

Deep Learning on Arm Cortex-M Microcontrollers. Rod Crawford Director Software Technologies, Arm

Deep Learning on Arm Cortex-M Microcontrollers. Rod Crawford Director Software Technologies, Arm Deep Learning on Arm Cortex-M Microcontrollers Rod Crawford Director Software Technologies, Arm What is Machine Learning (ML)? Artificial Intelligence Machine Learning Deep Learning Neural Networks Additional

More information

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1 OpenCL and Ecosystem State of the Nation Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem OpenCL Working Group Chair ntrevett@nvidia.com @neilt3d Oxford, May 2018 Copyright Khronos

More information

WebGL Meetup GDC Copyright Khronos Group, Page 1

WebGL Meetup GDC Copyright Khronos Group, Page 1 WebGL Meetup GDC 2012 Copyright Khronos Group, 2012 - Page 1 Copyright Khronos Group, 2012 - Page 2 Khronos API Ecosystem Trends Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos

More information

NVIDIA FOR DEEP LEARNING. Bill Veenhuis

NVIDIA FOR DEEP LEARNING. Bill Veenhuis NVIDIA FOR DEEP LEARNING Bill Veenhuis bveenhuis@nvidia.com Nvidia is the world s leading ai platform ONE ARCHITECTURE CUDA 2 GPU: Perfect Companion for Accelerating Apps & A.I. CPU GPU 3 Intro to AI AGENDA

More information

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks Deep neural networks have enabled major advances in machine learning and AI Computer vision Language translation Speech recognition Question answering And more Problem: DNNs are challenging to serve and

More information

Index. Springer Nature Switzerland AG 2019 B. Moons et al., Embedded Deep Learning,

Index. Springer Nature Switzerland AG 2019 B. Moons et al., Embedded Deep Learning, Index A Algorithmic noise tolerance (ANT), 93 94 Application specific instruction set processors (ASIPs), 115 116 Approximate computing application level, 95 circuits-levels, 93 94 DAS and DVAS, 107 110

More information

Copyright Khronos Group 2012 Page 1. OpenCL 1.2. August 2012

Copyright Khronos Group 2012 Page 1. OpenCL 1.2. August 2012 Copyright Khronos Group 2012 Page 1 OpenCL 1.2 August 2012 Copyright Khronos Group 2012 Page 2 Khronos - Connecting Software to Silicon Khronos defines open, royalty-free standards to access graphics,

More information

SIGGRAPH Briefing August 2014

SIGGRAPH Briefing August 2014 Copyright Khronos Group 2014 - Page 1 SIGGRAPH Briefing August 2014 Neil Trevett VP Mobile Ecosystem, NVIDIA President, Khronos Copyright Khronos Group 2014 - Page 2 Significant Khronos API Ecosystem Advances

More information

Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015

Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015 Copyright Khronos Group 2015 - Page 1 Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem

More information

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA Inference Optimization Using TensorRT with Use Cases Jack Han / 한재근 Solutions Architect NVIDIA Search Image NLP Maps TensorRT 4 Adoption Use Cases Speech Video AI Inference is exploding 1 Billion Videos

More information

April 4-7, 2016 Silicon Valley VISIONWORKS A CUDA ACCELERATED COMPUTER VISION LIBRARY S6783. Elif Albuz, April 4, 2016

April 4-7, 2016 Silicon Valley VISIONWORKS A CUDA ACCELERATED COMPUTER VISION LIBRARY S6783. Elif Albuz, April 4, 2016 April 4-7, 2016 Silicon Valley VISIONWORKS A CUDA ACCELERATED COMPUTER VISION LIBRARY S6783 Elif Albuz, April 4, 2016 Motivation Introduction to VisionWorks AGENDA VisionWorks Software Stack VisionWorks

More information

Khronos and the Mobile Ecosystem

Khronos and the Mobile Ecosystem Copyright Khronos Group, 2011 - Page 1 Khronos and the Mobile Ecosystem Neil Trevett VP Mobile Content, NVIDIA President, Khronos Copyright Khronos Group, 2011 - Page 2 Topics It s not just about individual

More information

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1 OpenCL A State of the Union Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem OpenCL Working Group Chair ntrevett@nvidia.com @neilt3d Vienna, April 2016 Copyright Khronos Group 2016

More information

Ecosystem Overview Neil Trevett Khronos President NVIDIA Vice President Developer

Ecosystem Overview Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem Overview Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem ntrevett@nvidia.com @neilt3d Copyright Khronos Group 2016 - Page 1 Khronos Mission Software Silicon Khronos is

More information

An introduction to Machine Learning silicon

An introduction to Machine Learning silicon An introduction to Machine Learning silicon November 28 2017 Insight for Technology Investors AI/ML terminology Artificial Intelligence Machine Learning Deep Learning Algorithms: CNNs, RNNs, etc. Additional

More information

Vulkan 1.1 March Copyright Khronos Group Page 1

Vulkan 1.1 March Copyright Khronos Group Page 1 Vulkan 1.1 March 2018 Copyright Khronos Group 2018 - Page 1 Vulkan 1.1 Launch and Ongoing Momentum Strengthening the Ecosystem Improved developer tools (SDK, validation/debug layers) More rigorous conformance

More information

Open API Standards for Mobile Graphics, Compute and Vision Processing GTC, March 2014

Open API Standards for Mobile Graphics, Compute and Vision Processing GTC, March 2014 Open API Standards for Mobile Graphics, Compute and Vision Processing GTC, March 2014 Neil Trevett Vice President Mobile Ecosystem, NVIDIA President Khronos Copyright Khronos Group 2014 - Page 1 Khronos

More information

NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS

NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS TECHNICAL OVERVIEW NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS A Guide to the Optimized Framework Containers on NVIDIA GPU Cloud Introduction Artificial intelligence is helping to solve some of the most

More information

The OpenVX Classifier Extension. The Khronos OpenVX Working Group, Editors: Tomer Schwartz, Radhakrishna Giduthuri

The OpenVX Classifier Extension. The Khronos OpenVX Working Group, Editors: Tomer Schwartz, Radhakrishna Giduthuri The OpenVX Classifier Extension The Khronos OpenVX Working Group, Editors: Tomer Schwartz, Radhakrishna Giduthuri Version 1.2.1 (provisional), Wed, 15 Aug 2018 06:03:09 +0000 Table of Contents 1. Classifiers

More information

Unified Deep Learning with CPU, GPU, and FPGA Technologies

Unified Deep Learning with CPU, GPU, and FPGA Technologies Unified Deep Learning with CPU, GPU, and FPGA Technologies Allen Rush 1, Ashish Sirasao 2, Mike Ignatowski 1 1: Advanced Micro Devices, Inc., 2: Xilinx, Inc. Abstract Deep learning and complex machine

More information

Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research

Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Nick Fraser (Xilinx & USydney) Yaman Umuroglu (Xilinx & NTNU) Giulio Gambardella (Xilinx)

More information

Vision Acceleration. Launch Briefing October Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group

Vision Acceleration. Launch Briefing October Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group Copyright Khronos Group 2014 - Page 1 Vision Acceleration Launch Briefing October 2014 Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group Copyright Khronos Group 2014 - Page

More information

Enable AI on Mobile Devices

Enable AI on Mobile Devices Enable AI on Mobile Devices Scott Wang 王舒翀 Senior Segment Manager Mobile, BSG ARM Tech Forum 2017 14 th June 2017, Shenzhen AI is moving from core to edge Ubiquitous AI Safe and autonomous Mixed reality

More information

Mobile AR Hardware Futures

Mobile AR Hardware Futures Copyright Khronos Group, 2010 - Page 1 Mobile AR Hardware Futures Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group Two Perspectives NVIDIA - Tegra 2 mobile processor Khronos

More information

Deep Learning Accelerators

Deep Learning Accelerators Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction

More information

OpenCL Press Conference

OpenCL Press Conference Copyright Khronos Group, 2011 - Page 1 OpenCL Press Conference Tokyo, November 2011 Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group Copyright Khronos Group, 2011 - Page

More information

High Performance Computing

High Performance Computing High Performance Computing 9th Lecture 2016/10/28 YUKI ITO 1 Selected Paper: vdnn: Virtualized Deep Neural Networks for Scalable, MemoryEfficient Neural Network Design Minsoo Rhu, Natalia Gimelshein, Jason

More information

Research Faculty Summit Systems Fueling future disruptions

Research Faculty Summit Systems Fueling future disruptions Research Faculty Summit 2018 Systems Fueling future disruptions Wolong: A Back-end Optimizer for Deep Learning Computation Jilong Xue Researcher, Microsoft Research Asia System Challenge in Deep Learning

More information

TIOVX TI s OpenVX Implementation

TIOVX TI s OpenVX Implementation TIOVX TI s OpenVX Implementation Aish Dubey Product Marketing, Automotive Processors Embedded Vision Summit, 3 May 2017 1 TI SOC platform heterogeneous cores High level processing Object detection and

More information

Use cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games

Use cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games Viewdle Inc. 1 Use cases Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games 2 Why OpenCL matter? OpenCL is going to bring such

More information

Movidius Neural Compute Stick

Movidius Neural Compute Stick Movidius Neural Compute Stick You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Next Generation Visual Computing

Next Generation Visual Computing Next Generation Visual Computing (Making GPU Computing a Reality with Mali ) Taipei, 18 June 2013 Roberto Mijat ARM Addressing Computational Challenges Trends Growing display sizes and resolutions Increasing

More information

Copyright Khronos Group, Page 1. OpenCL. GDC, March 2010

Copyright Khronos Group, Page 1. OpenCL. GDC, March 2010 Copyright Khronos Group, 2011 - Page 1 OpenCL GDC, March 2010 Authoring and accessibility Application Acceleration System Integration Copyright Khronos Group, 2011 - Page 2 Khronos Family of Standards

More information

Renderscript Accelerated Advanced Image and Video Processing on ARM Mali T-600 GPUs. Lihua Zhang, Ph.D. MulticoreWare Inc.

Renderscript Accelerated Advanced Image and Video Processing on ARM Mali T-600 GPUs. Lihua Zhang, Ph.D. MulticoreWare Inc. Renderscript Accelerated Advanced Image and Video Processing on ARM Mali T-600 GPUs Lihua Zhang, Ph.D. MulticoreWare Inc. lihua@multicorewareinc.com Overview More & more mobile apps are beginning to require

More information

OpenVX 1.2 Quick Reference Guide Page 1

OpenVX 1.2 Quick Reference Guide Page 1 OpenVX 1.2 Quick Reference Guide Page 1 OpenVX (Open Computer Vision Acceleration API) is a low-level programming framework domain to access computer vision hardware acceleration with both functional and

More information

THE NVIDIA DEEP LEARNING ACCELERATOR

THE NVIDIA DEEP LEARNING ACCELERATOR THE NVIDIA DEEP LEARNING ACCELERATOR INTRODUCTION NVDLA NVIDIA Deep Learning Accelerator Developed as part of Xavier NVIDIA s SOC for autonomous driving applications Optimized for Convolutional Neural

More information

Vulkan Launch Webinar 18 th February Copyright Khronos Group Page 1

Vulkan Launch Webinar 18 th February Copyright Khronos Group Page 1 Vulkan Launch Webinar 18 th February 2016 Copyright Khronos Group 2016 - Page 1 Copyright Khronos Group 2016 - Page 2 The Vulkan Launch Webinar Is About to Start! Kathleen Mattson - Webinar MC, Khronos

More information

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU, Machine Learning 10-701, Fall 2015 Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October 6, 2015 Eric Xing @ CMU, 2015 1 A perennial challenge in computer vision: feature engineering SIFT Spin image

More information

Revolutionizing the Datacenter

Revolutionizing the Datacenter Power-Efficient Machine Learning using FPGAs on POWER Systems Ralph Wittig, Distinguished Engineer Office of the CTO, Xilinx Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Top-5

More information

Accelerating Implementation of Low Power Artificial Intelligence at the Edge

Accelerating Implementation of Low Power Artificial Intelligence at the Edge Accelerating Implementation of Low Power Artificial Intelligence at the Edge A Lattice Semiconductor White Paper November 2018 The emergence of smart factories, cities, homes and mobile are driving shifts

More information

HSA Foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017!

HSA Foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017! Advanced Topics on Heterogeneous System Architectures HSA Foundation! Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2

More information

Deep learning in MATLAB From Concept to CUDA Code

Deep learning in MATLAB From Concept to CUDA Code Deep learning in MATLAB From Concept to CUDA Code Roy Fahn Applications Engineer Systematics royf@systematics.co.il 03-7660111 Ram Kokku Principal Engineer MathWorks ram.kokku@mathworks.com 2017 The MathWorks,

More information

Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group. Copyright Khronos Group Page 1

Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group. Copyright Khronos Group Page 1 Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group Copyright Khronos Group 2014 - Page 1 Khronos Standards 3D Asset Handling - 3D authoring asset interchange - 3D asset transmission

More information

Deploying Deep Learning Networks to Embedded GPUs and CPUs

Deploying Deep Learning Networks to Embedded GPUs and CPUs Deploying Deep Learning Networks to Embedded GPUs and CPUs Rishu Gupta, PhD Senior Application Engineer, Computer Vision 2015 The MathWorks, Inc. 1 MATLAB Deep Learning Framework Access Data Design + Train

More information

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder EFFICIENT INFERENCE WITH TENSORRT Han Vanholder AI INFERENCING IS EXPLODING 2 Trillion Messages Per Day On LinkedIn 500M Daily active users of iflytek 140 Billion Words Per Day Translated by Google 60

More information

Tutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY

Tutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY Tutorial on Keras CAP 6412 - ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY Deep learning packages TensorFlow Google PyTorch Facebook AI research Keras Francois Chollet (now at Google) Chainer Company

More information

Classification of objects from Video Data (Group 30)

Classification of objects from Video Data (Group 30) Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time

More information

Object Detection. Part1. Presenter: Dae-Yong

Object Detection. Part1. Presenter: Dae-Yong Object Part1 Presenter: Dae-Yong Contents 1. What is an Object? 2. Traditional Object Detector 3. Deep Learning-based Object Detector What is an Object? Subset of Object Recognition What is an Object?

More information

Graph Streaming Processor

Graph Streaming Processor Graph Streaming Processor A Next-Generation Computing Architecture Val G. Cook Chief Software Architect Satyaki Koneru Chief Technology Officer Ke Yin Chief Scientist Dinakar Munagala Chief Executive Officer

More information

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky

More information

Tutorial Practice Session

Tutorial Practice Session Copyright Khronos Group 2016 - Page 1 Tutorial Practice Session Step 2: Graphs Copyright Khronos Group 2016 - Page 2 Why graphs? Most APIs (e.g., OpenCV) are based on function calls - a function abstracts

More information

Mali Developer Resources. Kevin Ho ARM Taiwan FAE

Mali Developer Resources. Kevin Ho ARM Taiwan FAE Mali Developer Resources Kevin Ho ARM Taiwan FAE ARM Mali Developer Tools Software Development SDKs for OpenGL ES & OpenCL OpenGL ES Emulators Shader Development Studio Shader Library Asset Creation Texture

More information

Copyright Khronos Group, Page 1. Khronos Overview. Taiwan, February 2012

Copyright Khronos Group, Page 1. Khronos Overview. Taiwan, February 2012 Copyright Khronos Group, 2012 - Page 1 Khronos Overview Taiwan, February 2012 Copyright Khronos Group, 2012 - Page 2 Khronos - Connecting Software to Silicon Creating open, royalty-free API standards -

More information

All You Want To Know About CNNs. Yukun Zhu

All You Want To Know About CNNs. Yukun Zhu All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from http://imgur.com/ Deep Learning Image from http://imgur.com/ Deep Learning Image from http://imgur.com/ Deep Learning Image

More information

High-Performance Data Loading and Augmentation for Deep Neural Network Training

High-Performance Data Loading and Augmentation for Deep Neural Network Training High-Performance Data Loading and Augmentation for Deep Neural Network Training Trevor Gale tgale@ece.neu.edu Steven Eliuk steven.eliuk@gmail.com Cameron Upright c.upright@samsung.com Roadmap 1. The General-Purpose

More information

gltf Briefing September 2016 Copyright Khronos Group Page 1

gltf Briefing September 2016 Copyright Khronos Group Page 1 gltf Briefing September 2016 Copyright Khronos Group 2016 - Page 1 Copyright Khronos Group 2016 - Page 2 Background and Motivation OpenGL ES and WebGL have led to a proliferation of Web 3D but no standard

More information

Brainchip OCTOBER

Brainchip OCTOBER Brainchip OCTOBER 2017 1 Agenda Neuromorphic computing background Akida Neuromorphic System-on-Chip (NSoC) Brainchip OCTOBER 2017 2 Neuromorphic Computing Background Brainchip OCTOBER 2017 3 A Brief History

More information

Fast Hardware For AI

Fast Hardware For AI Fast Hardware For AI Karl Freund karl@moorinsightsstrategy.com Sr. Analyst, AI and HPC Moor Insights & Strategy Follow my blogs covering Machine Learning Hardware on Forbes: http://www.forbes.com/sites/moorinsights

More information

Tutorial on Machine Learning Tools

Tutorial on Machine Learning Tools Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow

More information

NVIDIA DEEP LEARNING INSTITUTE

NVIDIA DEEP LEARNING INSTITUTE NVIDIA DEEP LEARNING INSTITUTE TRAINING CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial

More information

DEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM

DEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM DEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM AGENDA 1 Introduction to Deep Learning 2 What is DIGITS 3 How to use DIGITS Practical DEEP LEARNING Examples Image Classification, Object Detection,

More information

Higher Level Programming Abstractions for FPGAs using OpenCL

Higher Level Programming Abstractions for FPGAs using OpenCL Higher Level Programming Abstractions for FPGAs using OpenCL Desh Singh Supervising Principal Engineer Altera Corporation Toronto Technology Center ! Technology scaling favors programmability CPUs."#/0$*12'$-*

More information

Design guidelines for embedded real time face detection application

Design guidelines for embedded real time face detection application Design guidelines for embedded real time face detection application White paper for Embedded Vision Alliance By Eldad Melamed Much like the human visual system, embedded computer vision systems perform

More information

The Benefits of GPU Compute on ARM Mali GPUs

The Benefits of GPU Compute on ARM Mali GPUs The Benefits of GPU Compute on ARM Mali GPUs Tim Hartley 1 SEMICON Europa 2014 ARM Introduction World leading semiconductor IP Founded in 1990 1060 processor licenses sold to more than 350 companies >

More information