Standards for Vision Processing and Neural Networks

Radhakrishna Giduthuri, AMD
radha.giduthuri@ieee.org
Copyright Khronos Group 2017

Agenda
Why do we need a standard?
Khronos NNEF
Khronos OpenVX
[Figure labels: dog; Network Architecture; Pre-trained Network Model (weights, )]

Neural Network End-to-End Workflow
[Figure labels: Datasets; Neural Network Training Frameworks; Network Architecture; Trained Network Model; Third Party Tools; Vision/AI Applications; Vision and Neural Network Inferencing Runtime; Desktop and Cloud Hardware (cuDNN, MIOpen, MKL-DNN); Embedded/Mobile/Desktop/Cloud Vision/Inferencing Hardware (GPU, DSP, CPU, Custom, FPGA)]

Problem: Neural Network Fragmentation
Neural network training and inferencing fragmentation: every tool needs an exporter to every accelerator.
[Figure: NN Authoring Frameworks 1-3, each connected to Inference Engines 1-3]
Neural network inferencing fragmentation takes its toll on applications: every application needs to know about every accelerator API.
[Figure: a Vision/AI Application connected to Inference Engines 1-3, running on Hardware 1-3]

Khronos APIs Connect Software to Silicon
Khronos is an international industry consortium of over 100 companies creating royalty-free, open standard APIs to enable software to access hardware acceleration for 3D graphics, virtual and augmented reality, parallel computing, vision processing, and neural networks.

NNEF - Solving Neural Network Fragmentation
Before NNEF, NN training and inferencing fragmentation: every tool needs an exporter to every accelerator.
[Figure: NN Authoring Frameworks 1-3, each connected to Inference Engines 1-3]
With NNEF, NN training and inferencing interoperability.
[Figure: NN Authoring Frameworks 1-3 and optimization and processing tools connect to Inference Engines 1-3 through NNEF]
NNEF is a cross-vendor neural net file format.
- Encapsulates network formal semantics, structure, data formats, and commonly used operations (such as convolution, pooling, normalization, etc.)

OpenVX - Solving Inferencing Fragmentation
Before OpenVX, vision and NN inferencing fragmentation: every application needs to know about every accelerator API.
[Figure: a Vision/AI Application connected to Inference Engines 1-3, running on Hardware 1-3]
With OpenVX, vision and NN inferencing interoperability.
[Figure: a Vision/AI Application reaches Inference Engines 1-3 on Hardware 1-3 through OpenVX]
OpenVX is an open, royalty-free standard for cross-platform acceleration of computer vision and neural network applications.

Neural Net Workflow with Khronos Standards
[Figure: same labels as the Neural Network End-to-End Workflow figure]

Application Development Workflow
[Figure labels: Training Framework; Third Party Tools; Trained Format; Graph Compiler; Vendor Format; Inference Engine; Vision/AI Applications]
- Structure plus weights (quantized or not). Can be augmented with private data.
- Structure only, OR structure plus weights (quantized or not).

NNEF Status and Roadmap
V1.0 is under development; industry comments are being sought now
- NNEF has formed an advisory panel; you are invited to participate today
The first version will focus on the interface between frameworks and embedded inference engines
- But will allow training as a secondary goal
Support a first-cut range of network types
- The field is moving very fast, but we aim to keep up with developments
NNEF Roadmap
- Track development of new network types
- Allow authoring and retraining (3rd party tools)
- Address a wider range of applications (outside vision apps)
- Increase the expressive power of the format

Khronos Advisory Panels
Working Group: Khronos members. Any company can join. Membership fee. Sign NDA and IP Framework.
Advisory Panel: Khronos advisors. Invited industry experts. $0 cost. Sign NDA and IP Framework.
Shared email list and repository, hosted by Khronos under the Khronos NDA.
- Working group to advisory panel: specification drafts and invitations for requirements and feedback
- Advisory panel to working group: requirements and feedback on specification drafts
Advisory panels are active for NNEF, Vulkan, and OpenCL/SYCL.

OpenVX: an open, royalty-free standard for cross-platform acceleration of computer vision and neural network applications.

OpenVX
Wide range of vision hardware architectures
OpenVX provides a high-level graph-based abstraction
-> Enables graph-level optimizations!
Can be implemented on almost any hardware or processor!
-> Portable, efficient vision processing!
[Figure: power efficiency (X1, X10, X100) against computation flexibility for multi-core CPU, GPU compute, vision DSPs, and dedicated hardware]
[Figure labels, software portability: Applications; Middleware; Vision Processing Graph (Vision Nodes, CNN Nodes); Hardware (Vision Engines, GPU, DSP)]

OpenVX - Graph-Level Abstraction
OpenVX developers express a graph of image operations ("Nodes")
- Using a C API
Nodes can be executed on any hardware or processor coded in any language
- Implementers can optimize under the high-level graph abstraction
Graphs are the key to run-time power and performance optimizations
- E.g. node fusion, tiled graph processing for cache efficiency, etc.
[Figure: OpenVX graph, feature extraction example. Labels: Camera Input; RGB Frame; Color Conversion; YUV Frame; Channel Extract; Gray Frame; Image Pyramid; Pyr t; Optical Flow; Harris Track; Array of Keypoints; Ftr t-1; Array of Features; Rendering Output]

OpenVX Efficiency through Graphs
Graph scheduling: split the graph execution across the whole system (CPU / GPU / dedicated HW).
- Benefit: faster execution or lower power consumption
Memory management: reuse pre-allocated memory for multiple intermediate data.
- Benefit: less allocation overhead, more memory for other applications
Kernel fusion: replace a subgraph with a single faster node.
- Benefit: better memory locality, less kernel launch overhead
Data tiling: execute a subgraph at tile granularity instead of image granularity.
- Benefit: better use of data cache and local memory
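To make the kernel fusion idea concrete, here is a minimal plain-C sketch that uses no OpenVX calls. It assumes a magnitude stage followed by a threshold stage, as in a simple edge detector. The function names, the L1 magnitude, and the 8-bit saturation are illustrative assumptions, not the behavior of any OpenVX implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* L1 gradient magnitude |gx| + |gy|, saturated to 8 bits.
   Assumption: chosen for brevity; an OpenVX node may use another norm. */
static uint8_t magnitude_l1(int16_t gx, int16_t gy) {
    int32_t m = abs((int)gx) + abs((int)gy);
    return (uint8_t)(m > 255 ? 255 : m);
}

/* Unfused: two passes. The intermediate image mag_tmp is written to
   memory by the first loop and read back by the second. */
void edges_two_pass(const int16_t *gx, const int16_t *gy, uint8_t *mag_tmp,
                    uint8_t *out, size_t n, uint8_t thresh) {
    assert(gx && gy && mag_tmp && out);
    for (size_t i = 0; i < n; ++i)
        mag_tmp[i] = magnitude_l1(gx[i], gy[i]);
    for (size_t i = 0; i < n; ++i)
        out[i] = mag_tmp[i] > thresh ? 255 : 0;
}

/* Fused: one pass. The magnitude never leaves a register, so no
   intermediate buffer is allocated, written, or re-read. */
void edges_fused(const int16_t *gx, const int16_t *gy,
                 uint8_t *out, size_t n, uint8_t thresh) {
    assert(gx && gy && out);
    for (size_t i = 0; i < n; ++i)
        out[i] = magnitude_l1(gx[i], gy[i]) > thresh ? 255 : 0;
}
```

Both functions produce the same output for the same inputs. The fused version drops the intermediate buffer, which is the memory-locality benefit listed for kernel fusion; a graph compiler can apply this kind of rewrite because it sees the whole subgraph before execution.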

Simple Edge Detector in OpenVX

    /* Calls are abbreviated as on the slide: the full OpenVX API also
       takes a vx_context and image-format arguments. */

    /* Create the graph */
    vx_graph g = vxCreateGraph();

    /* Declare input and output images */
    vx_image input = vxCreateImage(1920, 1080);
    vx_image output = vxCreateImage(1920, 1080);

    /* Declare intermediate (virtual) images */
    vx_image horiz = vxCreateVirtualImage(g);
    vx_image vert = vxCreateVirtualImage(g);
    vx_image mag = vxCreateVirtualImage(g);

    /* Construct the graph topology */
    vxSobel3x3Node(g, input, horiz, vert);
    vxMagnitudeNode(g, horiz, vert, mag);
    vxThresholdNode(g, mag, THRESH, output);

    /* Compile the graph */
    status = vxVerifyGraph(g);

    /* Execute the graph */
    status = vxProcessGraph(g);

[Figure: input (i) -> Sobel (S) -> horiz (h), vert (v) -> Magnitude (M) -> mag (m) -> Threshold (T) -> output (o)]

OpenVX Evolution
OpenVX 1.0: spec released October 2014
- Conformant implementations
OpenVX 1.1: spec released May 2016
- New functionality: expanded nodes functionality, enhanced graph framework
- Conformant implementations
- Safety critical: OpenVX 1.1 SC for safety-certifiable systems
OpenVX 1.2: spec released May 2017
- New functionality: conditional node execution, feature detection, classification operators, expanded imaging operations
- Extensions: neural network acceleration, graph save and restore, 16-bit image operation
OpenVX Roadmap: new functionality under discussion
- NNEF import
- Programmable user kernels with accelerator offload
- Streaming/pipelining
AMD OpenVX Tools
- Open source, highly optimized for x86 CPU and OpenCL for GPU
- Graph Optimizer looks at the entire processing pipeline and removes, replaces, and merges functions to improve performance and bandwidth
- Scripting for rapid prototyping, without re-compiling, at production performance levels
- http://gpuopen.com/compute-product/amd-openvx/

AMD's open-source implementation
Highly optimized for x86 CPU and OpenCL GPU
Available on github.com
- http://github.com/gpuopen-professionalcompute-libraries/amdovx-modules
Additional modules
- LOOM: highly optimized library for real-time 360-degree video stitching
- NN: neural network module built on top of MIOpen (OpenCL-based machine learning primitives)
- Includes a tool to import pre-trained Caffe models into NN (develop branch)

New OpenVX 1.2 Functions
Feature detection: find features useful for object detection and recognition
- Histogram of gradients (HOG)
- Template matching
- Local binary patterns (LBP)
- Line finding
Classification: detect and recognize objects in an image based on a set of features
- Import a classifier model trained offline
- Classify objects based on a set of input features
Image processing: transform an image
- Generalized nonlinear filter: dilate, erode, median with arbitrary kernel shapes
- Non-maximum suppression: find local maximum values in an image
- Edge-preserving noise reduction
Conditional execution and node predication
- Selectively execute portions of a graph based on a true/false predicate
Many, many minor improvements
New extensions
- Import/export: compile a graph; save and run later
- 16-bit support: signed 16-bit image data
- Neural networks: layers are represented as OpenVX nodes
[Figure: a select node with condition A and inputs B and C. If A then S <- B, else S <- C]
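As an illustration of what "find local maximum values in an image" means, the following plain-C sketch applies non-maximum suppression over a 3x3 neighbourhood. It does not call OpenVX. The function name, the fixed 3x3 window, and the strict "greater than every neighbour" rule are assumptions made for this sketch and may differ from the OpenVX 1.2 node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keep a pixel only if it is strictly greater than all of its
   in-bounds 3x3 neighbours; write 0 everywhere else.
   Images are row-major, w pixels wide and h pixels high.
   Note: with a strict comparison, flat plateaus are fully suppressed. */
void nms3x3(const uint8_t *in, uint8_t *out, int w, int h) {
    assert(in && out && in != out && w > 0 && h > 0);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            uint8_t c = in[y * w + x];
            int is_max = 1;
            for (int dy = -1; dy <= 1 && is_max; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int ny = y + dy, nx = x + dx;
                    /* Skip the centre pixel and anything outside the image. */
                    if ((dx == 0 && dy == 0) || nx < 0 || ny < 0 ||
                        nx >= w || ny >= h)
                        continue;
                    if (in[ny * w + nx] >= c) {
                        is_max = 0;
                        break;
                    }
                }
            }
            out[y * w + x] = is_max ? c : 0;
        }
    }
}
```

Feature detectors typically run a step like this on a response map (for example, corner strength) so that each feature is reported once rather than as a cluster of adjacent pixels.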

OpenVX 1.2 and Neural Net Extension
Convolutional neural network topologies can be represented as OpenVX graphs
- Layers are represented as OpenVX nodes
- Layers are connected by multi-dimensional tensor objects
- Layer types include convolution, activation, pooling, fully-connected, soft-max
- CNN nodes can be mixed with traditional vision nodes
Import/Export Extension
- Efficient handling of network weights/biases or complete networks
OpenVX will be able to import NNEF files into OpenVX Neural Nets
[Figure: an OpenVX graph mixing CNN nodes with traditional vision nodes. Labels: Native Camera Control; Vision Node; Vision Node; CNN Nodes; Vision Node; Downstream Application Processing]

OpenVX Neural Network Extension
Two main parts: (1) a tensor object and (2) a set of CNN layer nodes
A vx_tensor is a multi-dimensional array that supports at least 4 dimensions
Tensor creation and deletion functions
Simple math for tensors
- Element-wise Add, Subtract, Multiply, TableLookup, and Bit-depth conversion
- Transposition of dimensions and generalized matrix multiplication
- vxCopyTensorPatch, vxQueryTensor (#dims, dims, element type, Q)
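A multi-dimensional array such as the tensor described here is usually addressed through per-dimension strides. The plain-C sketch below shows one way to compute byte strides and an element offset for a dense 4-D tensor. It does not use vx_tensor; the helper names are hypothetical, and the layout (dimension 0 varying fastest, no padding) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define NDIMS 4

/* Byte strides for a dense tensor with no padding.
   Assumption: dimension 0 is the fastest-varying dimension. */
void tensor_strides(const size_t dims[NDIMS], size_t elem_size,
                    size_t strides[NDIMS]) {
    assert(dims && strides && elem_size > 0);
    strides[0] = elem_size;
    for (int d = 1; d < NDIMS; ++d)
        strides[d] = strides[d - 1] * dims[d - 1];
}

/* Byte offset of the element at index idx, given the strides above. */
size_t tensor_offset(const size_t idx[NDIMS], const size_t strides[NDIMS]) {
    assert(idx && strides);
    size_t off = 0;
    for (int d = 0; d < NDIMS; ++d)
        off += idx[d] * strides[d];
    return off;
}
```

For a tensor with dimensions {4, 3, 2, 1} and 2-byte elements, the strides come out as {2, 8, 24, 48} bytes, so the element at index {1, 2, 1, 0} sits 42 bytes from the start of the buffer.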

OpenVX Neural Network Extension
Tensor types of INT16, INT7.8, INT8, and U8 are supported
- Other types may be supported by a vendor
Conformance tests will be up to some tolerance in precision
- To allow for optimizations, e.g., weight compression
Eight neural network layer nodes:
- vxActivationLayer
- vxConvolutionLayer
- vxDeconvolutionLayer
- vxFullyConnectedLayer
- vxNormalizationLayer
- vxPoolingLayer
- vxSoftmaxLayer
- vxROIPoolingLayer
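The INT7.8 type is read here as signed 16-bit fixed point with 8 fractional bits (Q7.8), so a stored value q represents q / 256. The plain-C sketch below shows the conversions and a saturating multiply under that reading. The function names, the round-half-away-from-zero rule, and the saturation behavior are assumptions of this sketch, not requirements quoted from the extension.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a float to Q7.8: scale by 2^8, round half away from zero,
   and saturate to the int16_t range (about -128.0 to +127.996). */
int16_t float_to_q78(float v) {
    assert(v == v); /* NaN has no fixed-point representation */
    float scaled = v * 256.0f;
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;
    if (scaled >= 32767.0f) return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (int16_t)scaled; /* the cast truncates toward zero */
}

/* Convert Q7.8 back to float; exact, since the resolution is 1/256. */
float q78_to_float(int16_t q) {
    return (float)q / 256.0f;
}

/* Multiply two Q7.8 values. The 32-bit product carries 16 fractional
   bits, so divide by 2^8 to return to Q7.8, then saturate. */
int16_t q78_mul(int16_t a, int16_t b) {
    int32_t p = ((int32_t)a * (int32_t)b) / 256;
    if (p > INT16_MAX) return INT16_MAX;
    if (p < INT16_MIN) return INT16_MIN;
    return (int16_t)p;
}
```

Rounding and saturation choices like these are one reason conformance is checked to a tolerance rather than bit-exactly: two implementations can legitimately differ in the last fractional bit.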

Safety Critical APIs
OpenGL ES 1.0 - 2003: fixed function graphics
OpenGL SC 1.0 - 2005: fixed function graphics subset
OpenGL ES 2.0 - 2007: shader programmable pipeline
OpenGL SC 2.0 - April 2016: shader programmable pipeline subset
Experience and guidelines lead to new generation APIs for safety-certifiable vision, graphics and compute, e.g. ISO 26262 and DO-178B/C
OpenVX SC 1.1: released 1st May 2017
- Restricted "deployment" implementation executes on the target hardware by reading the binary format and executing the pre-compiled graphs
Khronos SCAP (Safety Critical Advisory Panel)
- Guidelines for designing APIs that ease system certification
- Open to Khronos members AND industry experts
OpenCL SC TSG formed, working on OpenCL SC 1.2
- Eliminate undefined behavior
- Eliminate callback functions
- Static pool of event objects

OpenVX SC - Safety Critical Vision Processing
OpenVX SC 1.1: based on the OpenVX 1.1 main specification
- Enhanced determinism
- Specification identifies and numbers requirements
MISRA C clean per Klocwork v10
Divides functionality into "development" and "deployment" feature sets
- Adds requirement to support the import/export extension
OpenVX SC Development Feature Set (create graph): entire graph creation API
OpenVX SC Deployment Feature Set (execute graph): no graph creation API
[Figure: Development Feature Set -> Verify -> Export -> Binary format (implementation-dependent format) -> Import -> Deployment Feature Set]

How OpenVX Compares to Alternatives
Governance
- OpenVX: open standard API designed to be implemented and shipped by IHVs
- OpenCV: community-driven, open source library
- OpenCL: open standard API designed to be implemented and shipped by IHVs
Programming model
- OpenVX: graph defined with C API and then compiled for run-time execution
- OpenCV: immediate runtime function calls, reading to and from memory
- OpenCL: explicit kernels are compiled and executed via run-time API
Built-in vision functionality
- OpenVX: small but growing set of popular functions
- OpenCV: vast, mainly on PC/CPU
- OpenCL: none; users program their own or call a vision library over OpenCL
Target hardware
- OpenVX: any combination of processors or non-programmable hardware
- OpenCV: mainly PCs and GPUs
- OpenCL: any heterogeneous combination of IEEE FP-capable processors
Optimization opportunities
- OpenVX: pre-declared graph enables significant optimizations
- OpenCV: each function reads/writes memory; power and performance inefficient
- OpenCL: any execution topology can be explicitly programmed
Conformance
- OpenVX: implementations must pass conformance to use the trademark
- OpenCV: extensive test suite but no formal Adopters program
- OpenCL: implementations must pass conformance to use the trademark
Consistency
- OpenVX: all core functions must be available in conformant implementations
- OpenCV: available functions vary depending on implementation / platform
- OpenCL: all core functions must be available in all conformant implementations

OpenVX Benefits and Resources
Faster development of efficient and portable vision applications
- Developers are protected from hardware complexities
- No platform-specific performance optimizations needed
Graph description enables significant automatic optimizations
- Scheduling, memory management, kernel fusion, and tiling
Performance portability to diverse hardware
- Hardware agility for different use case requirements
- Application software investment is protected as hardware evolves
OpenVX Resources
- OpenVX overview: https://www.khronos.org/openvx
- OpenVX specifications (current, previous, and extensions): https://www.khronos.org/registry/openvx
- OpenVX resources (implementations, tutorials, reference guides, etc.): https://www.khronos.org/openvx/resources

Layered Vision/Neural Net Ecosystem
Implementers may use OpenCL or Vulkan to implement OpenVX nodes on programmable processors
And then implementers can use OpenVX to enable a developer to easily connect those nodes into a graph
OpenVX enables the graph to be extended to include hardware architectures that don't support programmable APIs
The OpenVX graph abstraction enables implementers to optimize execution across diverse hardware architectures for optimal power and performance
Vulkan roadmap: enhanced compute, especially useful for vision and inferencing on mobile and Android platforms
OpenCL roadmap: flexible precision for widespread deployment on low-cost embedded processors
[Figure labels: Application; Programmable Vision Processors; Dedicated Vision Hardware]

Any Questions?
Khronos is working on a comprehensive set of solutions for vision and inferencing
- Layered ecosystem that includes multiple developer and deployment options
These slides and further information on all these standards are available at Khronos
- www.khronos.org
Please let us know if we can help you participate in any Khronos activities!
- You are very welcome and we appreciate your input!
Please contact us with any comments or questions!
- Radhakrishna Giduthuri, radha.giduthuri@ieee.org