Neural Network Exchange Format

Neural Network Exchange Format
Deploying Trained Networks to Inference Engines
Viktor Gyenes, specification editor

Outline
- The NN deployment problem and the proposed solution
- About Khronos, the NNEF group, and AImotive
- NNEF design philosophy
- NNEF components
- NNEF usage and impact
- Release schedule and feedback
- Future directions

NNEF in a Nutshell - The Problem
There is a wide range of open-source deep learning frameworks available
- Caffe, Torch, Theano, TensorFlow, Chainer, CNTK, MXNet, Caffe2, PyTorch
- Each framework has its own model format to store trained networks
Various chip vendors have released or are planning to release deep learning inference kits/engines
- NVIDIA cuDNN/GIE, Apple DLKit, Qualcomm, ...
Inference engines need to be compatible with many deep learning frameworks
Network descriptions have no clear semantics (ambiguities)
[Diagram: Caffe, Torch, and TensorFlow each feeding Vendor A, Vendor B, and Vendor C directly]

NNEF in a Nutshell - The Solution
Create a unified network description format to facilitate deployment of networks from frameworks to inference engines
- Describe network structure with clear semantics
- Describe network data
Let frameworks convert their representation to the exchange format
Let inference engines import only the exchange format
- No need to worry about where the network was trained
[Diagram: Caffe, Torch, and TensorFlow feeding NNEF, which feeds Vendor A, Vendor B, and Vendor C]

About the Khronos Group
The Khronos Group is an international organization, an open consortium of leading hardware and software companies creating open standards
- Everyone is welcome to join; various membership levels
Has authored various royalty-free specifications
- Parallel computing, graphics and vision, sensor processing
- OpenGL, OpenCL, OpenVX, Vulkan, ...
- Independent but cooperating working groups for each standard
Focus is on hardware acceleration
- Embedded devices, edge devices, mobile
- CPUs, GPUs, FPGAs, SoCs, ...
Conformance tests and an adopters program for specification integrity
- Cross-vendor compatibility

About the NNEF Group
The Neural Network Exchange Format working group was founded in September 2016
- Initiated by AImotive
- After an exploration that began in early 2016 to investigate industry requirements
- The standardization idea was also circulated among DL framework developers
The NNEF group is collaborating with the OpenVX group
- OpenVX is an image processing API with a neural network extension
- It provides an execution model for running neural networks on embedded devices

About AImotive
AImotive is a software company delivering an artificial-intelligence-based software stack for self-driving cars
- Software components for recognition, localization, and control
- Relying primarily on camera inputs
- Hardware IP for custom chips to run neural networks on a low power budget with high efficiency
Builds heavily on neural network solutions
- We use various deep learning frameworks to train networks
- We use GPUs and FPGAs for prototyping, custom chips for production
- We experience the NN exchange problem in-house and in our work with partners

Deep Learning Frameworks - Similarities and Differences
We examined and worked with various frameworks
- Torch, Caffe, TensorFlow (examined Theano, Chainer, Caffe2, PyTorch)
They vary in the way they build networks, but the underlying operations are very similar
- Most of the core ops are powered by the same implementation (cuDNN)
- They build a computational graph that is similar at the lower level
- The high-level interfaces differ and often use scripting languages
However, there are critical differences in the operations
- Differences in parameterizations of computations (mathematical formulas)
- Differences in output shape computations (asymmetric padding; see the sketch below)
- Differences in output value computations (border handling, image resizing)
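
To make one of these differences concrete, here is a small Python sketch (an illustration added for this write-up, not part of the presentation or of NNEF) contrasting Caffe-style symmetric convolution padding with TensorFlow-style "SAME" padding, which fixes the output size first and may split the resulting total padding asymmetrically:

    import math

    def caffe_conv_out(in_size, kernel, stride, pad):
        # Caffe-style: the same pad is applied on both sides of the dimension
        return (in_size + 2 * pad - kernel) // stride + 1

    def tf_same_conv_out(in_size, kernel, stride):
        # TensorFlow "SAME": output size is fixed first, then the total padding
        # is derived from it and split as evenly as possible (before, after)
        out = math.ceil(in_size / stride)
        total = max((out - 1) * stride + kernel - in_size, 0)
        return out, (total // 2, total - total // 2)

    print(caffe_conv_out(13, kernel=3, stride=2, pad=1))  # 7, symmetric padding (1, 1)
    print(tf_same_conv_out(13, kernel=3, stride=2))       # (7, (1, 1)) - happens to match
    print(tf_same_conv_out(14, kernel=3, stride=2))       # (7, (0, 1)) - asymmetric padding

An exchange format therefore has to pin down both the shape rule and the padding split explicitly rather than inherit one framework's convention.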

NNEF Design Philosophy
Convey all relevant information from DL frameworks to inference engines
Platform independence
- No hardware specification, no hardware-specific data formats, etc.
Flexible, extensible description (rapidly changing field)
- By vendor-specific operations
- By future use cases and operations
Easy to consume by engines/libraries/drivers written in low-level languages
- Scripting languages are often not available in embedded environments
Implementable and optimizable on various hardware platforms
- Hierarchical description, multiple levels of granularity
Support for quantization techniques

NNEF Design Philosophy - Supported Network Architectures
Support at least the following network architectures
- Fully connected networks (MLPs, auto-encoders)
- Convolutional networks (feedforward, encoder-decoder)
- Recurrent networks (LSTMs, GRUs)
Support the following learning tasks
- Image classification
- Semantic segmentation
- Object detection, instance segmentation
- Language processing (syntactic analysis, sentiment analysis)
- Audio processing
- Video processing (action classification)

NNEF Design Philosophy - Validation of Network Description
Ensure that a network description can be easily validated
- Syntactic/semantic validity of a document
- Validity of the resulting graph
- Implementation-independent aspects
- For example, well-defined tensor shapes and proper initialization
Possibility to check that an inference engine can execute a network
- Without loading the whole network
- Whether all operations/parameterizations are supported

What Is Included in the Standard
NNEF aims to abstract the network description out of the frameworks
- Only the network structure and data (no data feeding or training logic)
A distilled set of frequently used operations
- Well-defined input-output mapping (output shape and value)
- Well-defined parameterization (math formulas)
A simple syntax for describing networks at a medium-low level
- Very high-level scripting is not a priority: too many options, hard to standardize
A data format for storing network parameters (weights)
Support for describing quantized networks

NNEF Components - Structure Description
Devised a simple language to describe network structure
- Python-like syntax, limited set of features
- But strictly typed, easier to validate
Supports the hierarchical description of graph fragments
- Similar to Python functions for graph building
- Define larger fragments (compound ops) from smaller ones (primitives)
- Instantly extensible with new compounds that can be built from primitives
- Vendors don't need to implement all primitives; they can optimize compounds (see the sketch below)
A predefined set of primitive and compound operations for building networks
- Element-wise, activation, linear, pooling, normalization
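
As a rough Python analogy (the real NNEF structure description has its own strictly typed, Python-like syntax; the operation names below are illustrative assumptions only), a compound is simply a graph fragment composed out of primitives, so an engine can either execute the compound as one optimized kernel or expand it into the primitive nodes:

    graph = []                                  # recorded (op, inputs) nodes

    def primitive(op, *inputs):
        # every primitive just appends a node to the graph and returns a handle
        node = (op, inputs)
        graph.append(node)
        return node

    def conv(x, filt):  return primitive("conv", x, filt)
    def relu(x):        return primitive("relu", x)
    def max_pool(x):    return primitive("max_pool", x)

    def conv_block(x, filt):
        # compound op: defined purely in terms of primitives
        return max_pool(relu(conv(x, filt)))

    y = conv_block("input", "conv1/filter")
    print(graph)   # three primitive nodes; a vendor could instead match the whole compound

Because the compound carries its own decomposition, adding a new compound does not require every vendor to implement new kernels.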

NNEF Components - Data Storage
The structure description has data parameters (network weights)
- Typically named in a hierarchical fashion according to the network structure
- For example, variable scopes
Parameter tensors are stored in a separate format
- A simple data format to store tensor data in a floating-point or quantized representation
- Organized into a hierarchy according to scopes
All the data and the structure description are wrapped in a container (see the sketch below)
- Results in a single data stream
- May provide compression or encryption
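
For illustration only (the zip layout, file names, and "graph.txt" below are assumptions made for this sketch, not the actual NNEF container or tensor file format), the idea is that hierarchically named parameter tensors and the structure description end up together in one container stream:

    import io
    import zipfile
    import numpy as np

    params = {                                    # names follow the variable scopes
        "conv1/filter": np.zeros((32, 3, 5, 5), dtype=np.float32),
        "conv1/bias":   np.zeros((32,), dtype=np.float32),
    }

    stream = io.BytesIO()                         # a single data stream
    with zipfile.ZipFile(stream, "w", zipfile.ZIP_DEFLATED) as container:
        container.writestr("graph.txt", "# structure description would go here\n")
        for scope, tensor in params.items():
            container.writestr(scope + ".dat", tensor.tobytes())   # one entry per tensor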

NNEF Components - Quantization Info
Quantization is a crucial element of executing networks efficiently on embedded hardware
Quantization information needs to be stored in the network description
- In a platform-independent manner
- No reference to underlying data representations, such as bit widths, arithmetic precision, etc.
- Approach: pseudo quantization, conceptually on real-valued data
Quantization algorithms vary
- Describe them as compounds built from primitives
- Rounding and clamping operations (see the sketch below)
Quantization algorithms for activations and for stored parameters
- The data itself may be stored in the quantized format
- Along with the quantization algorithm
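
A minimal sketch of the pseudo-quantization idea, assuming a simple linear scheme (the function name and parameters are illustrative, not NNEF's actual quantization compounds): the compound is built only from clamping and rounding primitives and conceptually maps real values to real values, with no reference to bit widths or storage formats:

    import numpy as np

    def linear_quantize(x, lo, hi, levels):
        # clamping primitive followed by a rounding primitive; the result is
        # still real-valued, independent of any hardware data representation
        x = np.clip(x, lo, hi)
        step = (hi - lo) / (levels - 1)
        return lo + np.round((x - lo) / step) * step

    activations = np.array([-1.3, 0.02, 0.7, 2.5])
    print(linear_quantize(activations, 0.0, 1.0, 256))   # e.g. 256 levels, still floats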

NNEF for Deep Learning Frameworks
It is possible to write third-party converters/exporters
- We have done that for Caffe and TensorFlow
- Need to map the structure description and parameter data to NNEF (see the sketch below)
It would be good to have built-in support for NNEF export in deep learning frameworks
- Needs to be maintained when frameworks evolve (which may happen frequently) or when the standard is updated (which happens rarely)
Importing NNEF into DL frameworks would be possible when all features are supported by the framework
- Reverse conversion tools
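
A hypothetical sketch of what such a third-party exporter boils down to (the layer dictionaries and the emitted text are illustrative assumptions, not a real framework API and not guaranteed to be valid NNEF syntax): walk the framework's graph, emit a structure description, and collect the parameter tensors separately:

    import numpy as np

    layers = [                                   # stand-in for a framework's graph
        {"op": "conv", "name": "conv1", "weights": np.zeros((32, 3, 5, 5), dtype=np.float32)},
        {"op": "relu", "name": "relu1"},
    ]

    def export(layers):
        lines, tensors, prev = [], {}, "input"
        for layer in layers:
            if layer["op"] == "conv":
                label = layer["name"] + "/filter"
                tensors[label] = layer["weights"]    # parameter data, stored separately
                lines.append('{} = conv({}, variable(label = "{}"));'.format(layer["name"], prev, label))
            elif layer["op"] == "relu":
                lines.append("{} = relu({});".format(layer["name"], prev))
            prev = layer["name"]
        return "\n".join(lines), tensors

    structure, weights = export(layers)
    print(structure)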

NNEF for Inference Engines
APIs may choose to implement a subset of ops
- Or even a subset of parameterizations for a given op
APIs may choose which ops to treat as atomic and which as compound
- Provide optimized compounds
APIs may choose to compile NNEF offline into a vendor-specific description
- Or consume NNEF directly
NNEF does not define conformance tests
- It specifies only the ideal case of infinite-precision arithmetic
- When is an implementation accurate enough?
- That can't be defined without reference to the data representation (platform dependent)
- We want to leave room for fast approximations at the cost of accuracy
- OpenVX will define an execution model and conformance tests using NNEF

Usage Scenarios, Impact
The most important target use case is the deployment of trained networks to inference engines
Further use cases may include
- A common format for network conversion tools
- High-level graph transformations and optimizations
- Quantization
- It would also facilitate transferring networks among DL frameworks
Further impact may include
- Research results exported into NNEF would become immediately executable on various hardware (portability) and possible to integrate into various applications
- It may drive DL frameworks to become even more compatible with each other

Planned Release Schedule
Start with a provisional release
- An opportunity to take community feedback into account before finalizing the release
- Scheduled to be released before the end of 2017
The final release would probably arrive after a period of public feedback
- Around mid-2018
Subsequent releases would contain improvements as necessary
- Syntactic features
- New operations
- Support for training

NNEF Advisory Panel
Anyone who wishes to review the NNEF specification draft can join the Advisory Panel
- After signing an NDA with the Khronos Group
Provides early access to specification drafts
Share feedback on the mailing list or in teleconferences organized on demand

Future Directions - Training
At its core, the training process is just another recurrent computation
- It has many complicated aspects: support at least the computational aspect
- Ignore the data feeding, parameter tuning, validation, etc. aspects
The primitive ops are very similar
- Backward ops are often used in feed-forward networks as well (deconv)
- Solvers and regularizers can be described from primitives
- Initializer ops can be introduced
Generate the training graph from the inference graph with automatic differentiation (see the sketch below)
- Either in the DL frameworks before export (e.g. Caffe2)
- Or in NNEF by third-party graph conversion tools
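
A tiny sketch of that last point (a generic reverse-mode example in Python, not NNEF's actual training extension): each forward primitive gets a corresponding backward op, and the training graph is obtained by replaying the inference graph in reverse:

    import numpy as np

    def linear(x, w):            return x @ w
    def linear_grad(x, w, dy):   return dy @ w.T, x.T @ dy   # grads w.r.t. input and weight

    def relu(x):                 return np.maximum(x, 0.0)
    def relu_grad(x, dy):        return dy * (x > 0.0)        # backward op for relu

    # forward (inference) graph: y = relu(x @ w)
    x, w = np.random.randn(4, 8), np.random.randn(8, 2)
    h = linear(x, w)
    y = relu(h)

    # backward (training) graph, generated by walking the forward graph in reverse
    dy = np.ones_like(y)                                      # seed gradient
    dh = relu_grad(h, dy)
    dx, dw = linear_grad(x, w, dh)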

Thank You! Contact Info
Viktor Gyenes, PhD
Lead AI Research Engineer
AImotive
Budapest, Hungary / Mountain View, CA
viktor.gyenes@aimotive.com
www.aimotive.com
Khronos Group: www.khronos.org