Standards for Neural Networks Acceleration and Deployment

Size: px

Start display at page:

Download "Standards for Neural Networks Acceleration and Deployment"

Elijah Hilary Cunningham
5 years ago
Views:

1 Copyright Khronos Group Page 1 Standards for Neural Networks Acceleration and Deployment Radhakrishna Giduthuri Advanced Micro Devices radha.giduthuri@ieee.org

2 Agenda Why we need a standard? Khronos NNEF Khronos OpenVX AMD open-source project: bit.ly/openvx-amd dog cat Network Architecture Pre-trained Network Model (weights, ) Copyright Khronos Group Page 2

3 Copyright Khronos Group Page 3 Neural Network End-to-End Workflow input data vector or output data Lots of Labeled Data Problem Definition Data Gathering Training Dataset Validation Dataset Feedback Deploy (Application) Neural Network Model Training

MKL-DNN Deployment Embedded/Mobile Embedded/Mobile Embedded/Mobile Vision/Inferencing Hardware Embedded/Mobile/Desktop/Cloud

4 Neural Network End-to-End Workflow Neural Network Training Frameworks Third Party Tools Vision/AI Applications Trained Network Model Vision and Neural Network Inferencing Runtime Datasets Neural Network Model Architecture Desktop and Cloud Hardware cudnn MIOpen MKL-DNN Deployment Embedded/Mobile Embedded/Mobile Embedded/Mobile Vision/Inferencing Hardware Embedded/Mobile/Desktop/Cloud Vision/Inferencing Hardware Vision/Inferencing Hardware Vision/Inferencing Hardware GPU DSP Custom CPU FPGA Copyright Khronos Group Page 4

5 Copyright Khronos Group Page 5 Problem: Neural Network Fragmentation Neural Network Training and Inferencing Fragmentation NN Training Framework 1 NN Training Framework 2 NN Training Framework 3 Every Tool Needs an Exporter to Every Accelerator Inference Engine 1 Inference Engine 2 Inference Engine 3 Neural Network Inferencing Fragmentation toll on Applications Vision/AI Application Every Application Needs know about Every Accelerator API Inference Engine 1 Inference Engine 2 Inference Engine 3 Hardware 1 Hardware 2 Hardware 3

6 Khronos APIs Connect Software to Silicon Software Silicon Khronos is an International Industry Consortium of over 100 companies creating royalty-free, open standard APIs to enable software to access hardware acceleration for 3D graphics, Virtual and Augmented Reality, Parallel Computing, Vision Processing and Neural Networks Copyright Khronos Group Page 6

7 Khronos Open Standards Tracking and Positioning Vision sensor(s) Download 3D augmentation object and scene data Geometric scene reconstruction Semantic scene understanding (Neural Networks) Generate Low Latency 3D Content and Augmentations VR/AR Application Interact with sensor, haptic and display devices Copyright Khronos Group Page 7

Copyright Khronos Group 2017 - Page 8 NNEF - Solving Neural Network Fragmentation Before NNEF NN Training and Inferencing Fragmentation NN Training Framework 1 NN Training Framework 2 NN Training

8 Copyright Khronos Group Page 8 NNEF - Solving Neural Network Fragmentation Before NNEF NN Training and Inferencing Fragmentation NN Training Framework 1 NN Training Framework 2 NN Training Framework 3 Every Tool Needs an Exporter to Every Accelerator Inference Engine 1 Inference Engine 2 Inference Engine 3 With NNEF- NN Training and Inferencing Interoperability NN Training Framework 1 NN Training Framework 2 Inference Engine 1 Inference Engine 2 NN Training Framework 3 Optimization and processing tools NNEF is a Cross-vendor Neural Net file format Encapsulates network formal semantics, structure, data formats, commonly-used operations (such as convolution, pooling, normalization, etc.) Inference Engine 3

Copyright Khronos Group 2017 - Page 9 OpenVX - Solving Inferencing Fragmentation Before OpenVX Vision and NN Inferencing Fragmentation Vision/AI Application Every Application Needs know about Every

9 Copyright Khronos Group Page 9 OpenVX - Solving Inferencing Fragmentation Before OpenVX Vision and NN Inferencing Fragmentation Vision/AI Application Every Application Needs know about Every Accelerator API Inference Engine 1 Inference Engine 2 Inference Engine 3 Hardware 1 Hardware 2 Hardware 3 With OpenVX Vision and NN Inferencing Interoperability Vision/AI Application Inference Engine 1 Inference Engine 2 Inference Engine 3 Hardware 1 Hardware 2 Hardware 3 OpenVX is an open, royalty-free standard for cross platform acceleration of computer vision and neural network applications.

10 Copyright Khronos Group Page 10 Neural Net Workflow with Khronos Standards Neural Network Training Frameworks Third Party Tools Vision/AI Applications Datasets Neural Network Model Architecture Desktop and Cloud Hardware cudnn MIOpen MKL-DNN Trained Network Model Vision and Neural Network Inferencing Runtime Deployment Embedded/Mobile Vision/Inferencing Embedded/Mobile Embedded/Mobile Embedded/Mobile/Desktop/Cloud Vision/Inferencing Hardware Hardware Vision/Inferencing Hardware Hardware GPU DSP CPU Custom FPGA

11 CIFAR-10 Example: NNEF Structure Input: 32x32 RGB image Output: 10 classes airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck version 1.0 graph cifar10( data ) -> ( prob ) { border = ignore data = external(shape = [0,3,32,32]); conv1w = variable(shape = [32,3,5,5], label = conv1/weights ); conv1b = variable(shape = [1,32], label = conv1/bias ); conv1 = conv(data, conv1w, conv1b, padding = [(2,2),(2,2)]); pool1 = max_pool(conv1, size=[1,1,3,3], stride=[1,1,2,2], no_border); relu1 = relu(pool1); conv2w = variable(shape = [32,32,5,5], label = conv2/weights ); conv2b = variable(shape = [1,32], label = conv2/bias ); conv2 = conv(relu1, conv2w, conv2b, padding = [(2,2),(2,2)]); relu2 = relu(conv2); pool2 = avg_pool(relu2, size=[1,1,3,3], stride=[1,1,2,2], no_border); conv3w = variable(shape = [64,32,5,5], label = conv3/weights ); conv3b = variable(shape = [1,64], label = conv3/bias ); conv3 = conv(pool2, conv3w, conv3b, padding = [(2,2),(2,2)]); relu3 = relu(conv3); pool3 = avg_pool(relu3, size=[1,1,3,3], stride=[1,1,2,2], no_border); ip1w = variable(shape = [64,64,4,4], label = ip1/weights ); ip1b = variable(shape = [1,64], label = ip1/bias ); ip1 = conv(pool3, ip1w, ip1b, padding = [(0,0),(0,0)]); ip2w = variable(shape = [10,64,1,1], label = ip2/weights ); ip2b = variable(shape = [1,10], label = ip2/bias ); ip2 = conv(ip1, ip2w, ip2b, padding = [(0,0),(0,0)]); prob = softmax(ip2); } Copyright Khronos Group Page 11

12 Copyright Khronos Group Page 12 NNEF Tensor File Format 0x4E 0xEF 0x01 0x00 Offset to actual data from the beginning of file (e.g., 32 for typical convolution weights and 24 for convolution biases) Number of tensor dimensions (e.g., 4 for typical convolution weights and 2 for convolution biases) 1 st dimension of tensor Last dimension of tensor Data type of tensor: 0x00 float 0x01 quantized Bit-width of each item in tensor Optional: variable length quantization algorithm string Actual tensor data (multiple bytes) Length of quantization algorithm string

13 Copyright Khronos Group Page 13 CIFAR-10 Example: NNEF Container Filename Type Description Size in bytes graph.nnef Text file Neural network structure 1,195 conv1/weights.dat Tensor file Weights of conv1 9,632 conv1/bias.dat Tensor file Biases of conv1 148 conv2/weights.dat Tensor file Weights of conv2 102,432 conv2/bias.dat Tensor file Biases of conv2 148 conv3/weights.dat Tensor file Weights of conv3 204,832 conv3/bias.dat Tensor file Biases of conv3 276 ip1/weights.dat Tensor file Weights of ip1 262,176 ip1/bias.dat Tensor file Biases of ip1 276 ip2/weights.dat Tensor file Weights of ip2 2,592 ip2/bias.dat Tensor file Biases of ip2 60 * Recommended: IEEE tar archive with optional compression

14 Application Development Workflow Vision/AI Applications Trained Format Structure plus weights (quantized or not). Can be augmented with private data Graph Compiler Inference Engine Structure only OR Structure plus weights (quantized or not) Training Framework Third Party Tools Vendor Format Khronos open-source projects: - NNEF Parser: base parser for use by NNEF validator tool and NNEF Importers - NNEF Exporters: generate NNEF from deep learning frameworks Copyright Khronos Group Page 14

Copyright Khronos Group 2017 - Page 15 NNEF Status and Roadmap V1.

participate First version will focus on interface between framework and embedded inference engines - Khronos

is moving very fast but we aim to keep up with developments NNEF Roadmap - Track development of new network

15 Copyright Khronos Group Page 15 NNEF Status and Roadmap V1.0 provisional release - late December NNEF has formed an advisory panel, you are invited today to participate First version will focus on interface between framework and embedded inference engines - Khronos open-source projects to promote NNEF exporters and importers Support First cut range of network types - Field is moving very fast but we aim to keep up with developments NNEF Roadmap - Track development of new network types - Allow authoring and retraining (3rd party tools) - Address a wider range of applications (outside vision apps) - Increase the expressive power of the format

16 Copyright Khronos Group Page 16 Khronos Advisory Panels Specification drafts and invitations for requirements and feedback Requirements and feedback on specification drafts Working Group Khronos Members Any company can join Membership Fee Sign NDA and IP Framework Shared list and Repository Hosted by Khronos. Under Khronos NDA Advisory Panel Khronos Advisors Invited industry experts $0 Cost Sign NDA and IP Framework Advisory Panels Active for NNEF, Vulkan, and OpenCL/SYCL

17 An open, royalty-free standard for cross platform acceleration of computer vision and neural network applications. Copyright Khronos Group Page 17

18 Copyright Khronos Group Page 18 OpenVX Power Efficiency Wide range of vision hardware architectures OpenVX provides a high-level Graph-based abstraction -> Enables Graph-level optimizations! Can be implemented on almost any hardware or processor! -> Portable, Efficient Vision Processing! X100 X10 X1 Dedicated Hardware Vision DSPs GPU Compute Multi-core CPU Computation Flexibility Vision Node Vision Engines GPU Vision Node CNN Nodes Middleware DSP Vision Processing Graph Applications Hardware Vision Node Software Portability

19 Copyright Khronos Group Page 19 OpenVX - Graph-Level Abstraction OpenVX developers express a graph of image operations ( Nodes ) - Using a C API Nodes can be executed on any hardware or processor coded in any language - Implementers can optimize under the high-level graph abstraction Graphs are the key to run-time power and performance optimizations - E.g. Node fusion, tiled graph processing for cache efficiency etc. Camera Input OpenVX Nodes Pyr t Rendering Output RGB Frame Color Conversion YUV Frame Channel Extract Gray Frame Image Pyramid Optical Flow Array of Keypoints Harris Track OpenVX Graph Feature Extraction Example Graph Ftr t-1 Array of Features

20 Copyright Khronos Group Page 20 OpenVX Efficiency through Graphs.. Graph Scheduling Memory Management Kernel Fusion Data Tiling Split the graph execution across the whole system: CPU / GPU / dedicated HW Reuse pre-allocated memory for multiple intermediate data Replace a subgraph with a single faster node Execute a subgraph at tile granularity instead of image granularity Faster execution or lower power consumption Less allocation overhead, more memory for other applications Better memory locality, less kernel launch overhead Better use of data cache and local memory

21 Simple Edge Detector in OpenVX vx_graph g = vxcreategraph(); vx_image input = vxcreateimage(1920, 1080); vx_image output = vxcreateimage(1920, 1080); vx_image horiz = vxcreatevirtualimage(g); vx_image vert = vxcreatevirtualimage(g); vx_image mag = vxcreatevirtualimage(g); Declare Input and Output Images Declare Intermediate Images vxsobel3x3node(g, input, horiz, vert); vxmagnitudenode(g, horiz, vert, mag); vxthresholdnode(g, mag, THRESH, output); Construct the Graph topology Compile the Graph Execute the Graph status = vxverifygraph(g); status = vxprocessgraph(g); inp Sobel h v Mag m Thresh out Copyright Khronos Group Page 21

Copyright Khronos Group 2017 - Page 22 OpenVX Evolution Conformant Implementations AMD OpenVX Tools - Open source, highly optimized for x86 CPU

bandwidth - - Scripting for rapid prototyping - - live 360º camera stitching - - neural networks (develop) bit.ly/openvx-amd OpenVX 1.

1 New Functionality Conditional node execution Feature detection Classification operators Expanded imaging operations Extensions Neural Network

22 Copyright Khronos Group Page 22 OpenVX Evolution Conformant Implementations AMD OpenVX Tools - Open source, highly optimized for x86 CPU and OpenCL for GPU - Graph Optimizer looks at entire processing pipeline and removes, replaces, merges functions to improve performance and bandwidth - - Scripting for rapid prototyping - - live 360º camera stitching - - neural networks (develop) bit.ly/openvx-amd OpenVX 1.0 and OpenVX 1.1 New Functionality Conditional node execution Feature detection Classification operators Expanded imaging operations Extensions Neural Network Acceleration Graph Save and Restore 16-bit image operation Safety Critical OpenVX 1.1 SC for safety-certifiable systems OpenVX 1.2 Spec released May 2017 New Functionality Under Discussion NNEF Import Programmable user kernels with accelerator offload Streaming/pipelining OpenVX Roadmap

23 Copyright Khronos Group Page 23 Basic OpenVX Components Context Data Objects Image, Tensor, Pyramid, Array, LUT, Remap, Scalar, Threshold, Distribution, Matrix, Convolution, Delay, ObjectArray Graphs Nodes Kernel instances, parameters, attributes Kernels Built-in vision functions, Vendor extensions, User-defined Virtual Data Objects Intermediate data without host access, enables several optimizations Extensions NN, Import/Export, ICD, Miscellaneous Directives, Hints, Logging, Performance Measurements

24 Copyright Khronos Group Page 24 Basic OpenVX Vision Functions Kernels Element-wise Functions Add, Subtract, Multiply, AbsDiff, And, Or, Xor, Not, Min, Max, Magnitude, Phase, Threshold, TableLookup, ColorDepth, ChannelExtract, ChannelCombine, ColorConvert, Copy, AccumulateImage [Squared/Weighted], Tensor Add/Subtract/Multiply/LUT/ Reduction Functions Histogram, MeanStdDev, MinMaxLoc Control Flow Scalar Operations, Select Filtering Functions Box3x3, Convolve, Dilate3x3, Erode3x3, Gaussian3x3, Median3x3, Sobel3x3, GaussianPyramid, NonLinearFilter, LaplacianPyramid/Reconstruct, NonMaxSupression, Bilateral, LBP, HOG, Geometric Functions Remap, ScaleImage, WarpAffine, WarpPerspective, HalfScaleGaussian Complex Functions CannyEdgeDetector, EqualizeHist, FastCorners, HarrisCorners, IntegralImage, OpticalFlowPyrLK, HoughLinesP, MatrixMult, See VX/vx_nodes.h for functions to create kernel instances (nodes) in a graph

25 Copyright Khronos Group Page 25 New OpenVX 1.2 Functions Feature detection: find features useful for object detection and recognition - Histogram of gradients HOG Template matching - Local binary patterns LBP Line finding Classification: detect and recognize objects in an image based on a set of features - Import a classifier model trained offline - Classify objects based on a set of input features Image Processing: transform an image - Generalized nonlinear filter: Dilate, erode, median with arbitrary kernel shapes - Non maximum suppression: Find local maximum values in an image - Bilateral filter: edge-preserving noise reduction Conditional execution & node predication - Selectively execute portions of a graph based on a true/false predicate Many, many minor improvements New Extensions - Import/export: compile a graph; save and run later - 16-bit support: signed 16-bit image data - Neural networks: s are represented as OpenVX nodes B A S Condition C If A then S B else S C

Copyright Khronos Group 2017 - Page 26 Feature Detectors

occurrences of gradient orientation in localized portions

that describe that describe the image texture.

26 Copyright Khronos Group Page 26 Feature Detectors Histogram of gradients HOG Technique which counts occurrences of gradient orientation in localized portions of an image Local binary patterns LBP A binary pattern that describe that describe the image texture. Hough Lines Find lines in a picture. Used in line departure warning algorithms.

Copyright Khronos Group 2017 - Page 27 Template Matching a technique in digital image processing for

27 Copyright Khronos Group Page 27 Template Matching a technique in digital image processing for finding small parts of an image which match a template image Allow very strong denoising Tracking Object detection

Copyright Khronos Group 2017 - Page 28 Classifier Extension (Detection and classification) Classification is the problem of identifying to which of a set of categories (sub-populations) a new

28 Copyright Khronos Group Page 28 Classifier Extension (Detection and classification) Classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs. Feature detection refers to methods that aim at computing abstractions of image information and making local decisions at every image point whether there is an image feature of a given type at that point or not This extension do both. Possible methods: SVM Cascade (Decision Tree)

29 Copyright Khronos Group Page 29 OpenVX 1.2 and Neural Net Extension Convolution Neural Network topologies can be represented as OpenVX graphs - s are represented as OpenVX nodes - s connected by multi-dimensional tensors objects - types include convolution, activation, pooling, fully-connected, soft-max - CNN nodes can be mixed with traditional vision nodes Import/Export Extension - Efficient handling of network Weights/Biases or complete networks OpenVX will be able to import NNEF files into OpenVX Neural Nets Native Camera Control Vision Node Vision Node CNN Nodes Vision Node Downstream Application Processing An OpenVX graph mixing CNN nodes with traditional vision nodes

30 Copyright Khronos Group Page 30 Supported networks can be Overfeat Alexnet GoogLeNet versions (Inception) ResNet LSTM RNN/BiRNN Faster-RCNN FCN

Copyright Khronos Group 2017 - Page 31 OpenVX Neural Network Extension Two main parts: (1) a tensor object and (2) a set of CNN layer nodes A vx_tensor is a multi-dimensional array that supports at

31 Copyright Khronos Group Page 31 OpenVX Neural Network Extension Two main parts: (1) a tensor object and (2) a set of CNN layer nodes A vx_tensor is a multi-dimensional array that supports at least 4 dimensions Tensor creation and deletion functions Simple math for tensors - Element-wise Add, Subtract, Multiply, TableLookup, and Bit-depth conversion - Transposition of dimensions and generalized matrix multiplication - vxcopytensorpatch, vxquerytensor (#dims, dims, element type, Q)

32 Copyright Khronos Group Page 32 Importance of Quantization OpenVX support 8-bit by default and 16-bit by extension Quantization is very attractive in low power embedded system Less bandwidth than float (Faster and less power) Need less internal memory (Significant reduction in silicon area) HW is smaller (Int8 and Int16 HW units are smaller than float) Quantization drawbacks Less accurate Degradation in performance quality (Some applications prefer the degradation if significant better performance is achieved)

33 Copyright Khronos Group Page 33 Neural Network Quantization Q78 was the first trivial quantization UINT8 quantization was introduced by google with GEMMLOWP. Use by TensorFlow and Google s TPU accelerator. min x max x INT8 with DFP (Dynamic Fixed Point) see Ristretto (a tool for CNN-approximation) min x max x

34 Copyright Khronos Group Page 34 Other Quantizations in OpenVX 1.2 HOG Features are 16-bit. Tensors are 8-bit and 16-bit only. Classification extension input is Tensor and therefore accept only 8-bits or 16 bits. All Image processing operations are 8-bit or as 16-bit extension. Template Matching input is 8-bit and output is 16-bit.

35 Copyright Khronos Group Page 35 OpenVX Neural Network Extension Tensor types of INT16, INT7.8, INT8, and UINT8 are supported - Other types may be supported by a vendor Conformance tests will be up to some tolerance in precision - To allow for optimizations, e.g., weight compression Eight neural network layer nodes: vxactivation vxconvolution vxdeconvolution vxfullyconnected vxnormalization vxpooling vxsoftmax vxroipooling

Copyright Khronos Group 2017 - Page 36 An Example of Convolution Neural Network Optimizations Conv Activation Pooling Conv Activation Pooling Conv Activation Conv Activation Conv Conv

36 Copyright Khronos Group Page 36 An Example of Convolution Neural Network Optimizations Conv Activation Pooling Conv Activation Pooling Conv Activation Conv Activation Conv Conv Activation Activation Pooling Pooling Fully Connected Activ. Fully Connected Activ. Fully Connected Activation Conv Activation Pooling Conv Activation Pooling Conv Activation Conv Activation

Copyright Khronos Group 2017 - Page 37 Safety

0 - April 2016 Shader programmable pipeline

0-2007 Shader programmable pipeline OpenVX SC

deployment implementation executes on the

and executing the pre-compiled graphs Khronos

37 Copyright Khronos Group Page 37 Safety Critical APIs OpenGL SC Fixed function graphics subset Experience and Guidelines OpenGL SC April 2016 Shader programmable pipeline subset New Generation APIs for safety certifiable vision, graphics and compute e.g. ISO and DO-178B/C OpenGL ES Fixed function graphics OpenGL ES Shader programmable pipeline OpenVX SC 1.1 Released 1st May 2017 Restricted deployment implementation executes on the target hardware by reading the binary format and executing the pre-compiled graphs Khronos SCAP Safety Critical Advisory Panel Guidelines for designing APIs that ease system certification. Open to Khronos member AND industry experts OpenCL SC TSG Formed Working on OpenCL SC 1.2 Eliminate Undefined Behavior Eliminate Callback Functions Static Pool of Event Objects

38 Copyright Khronos Group Page 38 OpenVX SC - Safety Critical Vision Processing OpenVX based on OpenVX 1.1 main specification - Enhanced determinism - Specification identifies and numbers requirements MISRA C clean per KlocWorks v10 Divides functionality into development and deployment feature sets - Adds requirement to support import/export extension OpenVX SC Development Feature Set (Create Graph) Verify Export Binary format Import OpenVX SC Deployment Feature Set (Execute Graph) Entire graph creation API Implementation dependent format No graph creation API

Copyright Khronos Group 2017 - Page 39 OpenVX Application

Create and verify OpenVX graph Create context Export all

Binary in Vendor Specific Format Import objects from binary

39 Copyright Khronos Group Page 39 OpenVX Application Deployment Workflow Development Stage Deployed Application Create and verify OpenVX graph Create context Export all the objects that needs access during deployment OpenVX Binary in Vendor Specific Format Import objects from binary Graph execution Release all objects Release all objects Vendor Tools

Copyright Khronos Group 2017 - Page 40 How OpenVX Compares to Alternatives Governance Open standard API designed to be implemented and shipped by IHVs Community-driven, open source library Open

40 Copyright Khronos Group Page 40 How OpenVX Compares to Alternatives Governance Open standard API designed to be implemented and shipped by IHVs Community-driven, open source library Open standard API designed to be implemented and shipped by IHVs Programming Model Graph defined with C API and then compiled for run-time execution Immediate runtime function calls reading to and from memory Explicit kernels are compiled and executed via run-time API Built-in Vision Functionality Small but growing set of popular functions Vast. Mainly on PC/CPU None. User programs their own or call vision library over OpenCL Target Hardware Any combination of processors or non-programmable hardware Mainly PCs and GPUs Any heterogeneous combination of IEEE FP-capable processors Optimization Opportunities Pre-declared graph enables significant optimizations Each function reads/writes memory. Power performance inefficient Any execution topology can be explicitly programmed Conformance Implementations must pass conformance to use trademark Extensive Test Suite but no formal Adopters program Implementations must pass conformance to use trademark Consistency All core functions must be available in conformant implementations Available functions vary depending on implementation / platform All core functions must be available in all conformant implementations

Copyright Khronos Group 2017 - Page 41 ed Vision/ Neural Net Ecosystem Implementers may use OpenCL or Vulkan to implement OpenVX nodes on programmable processors OpenVX enables the graph to be

41 Copyright Khronos Group Page 41 ed Vision/ Neural Net Ecosystem Implementers may use OpenCL or Vulkan to implement OpenVX nodes on programmable processors OpenVX enables the graph to be extended to include hardware architectures that don t support programmable APIs Vulkan Roadmap Enhanced compute especially useful for vision and inferencing on mobile and Android platforms Application OpenCL Roadmap Flexible precision for widespread deployment on lowcost embedded processors Programmable Vision Processors Dedicated Vision Hardware And then implementors can use OpenVX to enable a developer to easily connect those nodes into a graph The OpenVX graph abstraction enables implementers to optimize execution across diverse hardware architectures for optimal power and performance

AMD OpenVX open-source project with NN bit.

Neural network acceleration for inference

Pre-trained CAFFE model Application openvx and

libraries Select/upload pre-trained NN Models

Application anninferenceserver GPUs dog Browse

library anninferenceapp embedded IoT devices.

Workstation LOOM: 360º live video stitching

on CPUs Inference Generator Server Config

42 AMD OpenVX open-source project with NN bit.ly/openvx-amd All OpenVX projects on GitHub Neural network acceleration for inference Inference Application Development Workflow Pre-trained CAFFE model Application openvx and vx_nn libraries MIOpenGEMM, MIOpen, and OpenCL libraries Select/upload pre-trained NN Models Inference Result Viewer (GUI) Sample Inference Application anninferenceserver GPUs dog Browse Images Submit for Inference scheduler annmodule library anninferenceapp embedded IoT devices... P47 Server Node inference_generator Workstation LOOM: 360º live video stitching Network Server (tcp/ip) JPEG decode & scaling on CPUs Inference Generator Server Config Pre-compiled Models Uploaded Models (Temporary Storage) Copyright Khronos Group Page 42

43 Copyright Khronos Group Page 43 OpenVX Benefits and Resources Faster development of efficient and portable vision applications - Developers are protected from hardware complexities - No platform-specific performance optimizations needed Graph description enables significant automatic optimizations - Scheduling, memory management, kernel fusion, and tiling Performance portability to diverse hardware - Hardware agility for different use case requirements - Application software investment is protected as hardware evolves OpenVX Resources - OpenVX Overview OpenVX Specifications: current, previous, and extensions OpenVX Resources: implementations, tutorials, reference guides, etc. -

Copyright Khronos Group 2017 - Page 44 Any Questions?

44 Copyright Khronos Group Page 44 Any Questions? Khronos is working on a comprehensive set of solutions for vision and inferencing - ed ecosystem that include multiple developer and deployment options Please let us know if we can help you participate in any Khronos activities! A quick hands-on OpenVX Tutorial - Please contact us with any comments or questions! - Radhakrishna Giduthuri radha.giduthuri@ieee.org These slides and further information on all these standards at Khronos -

Standards for Vision Processing and Neural Networks

Copyright Khronos Group 2017 - Page 1 Standards for Vision Processing and Neural Networks Radhakrishna Giduthuri, AMD radha.giduthuri@ieee.org Agenda Why we need a standard? Khronos NNEF Khronos OpenVX