Copyright Khronos Group 2014 - Page 1 Open Standard APIs for Embedded Vision Processing Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group
Speakers This Morning
Neil Trevett
- Vice President Mobile Ecosystem, NVIDIA
- President, Khronos
- Chair, OpenCL Working Group
Mikael Sevenier
- Chair, Camera working group
Jim Steele
- CTO, Sensor Platforms
- Chair, StreamInput

Khronos Connects Software to Silicon
Open Consortium creating ROYALTY-FREE, OPEN STANDARD APIs for hardware acceleration
Defining the roadmap for low-level silicon interfaces needed on every platform
Graphics, compute, rich media, vision, sensor and camera processing
Rigorous specifications AND conformance tests for cross-vendor portability
Acceleration APIs BY the Industry FOR the Industry
Well over a BILLION people use Khronos APIs Every Day

Khronos Standards
Visual Computing
- 3D Graphics
- Heterogeneous Parallel Computing
3D Asset Handling
- 3D authoring asset interchange
- 3D asset transmission format with compression
Sensor Processing
- Vision Acceleration
- Camera Control (Camera Control API)
- Sensor Fusion
Acceleration in HTML5
- 3D in browser, no plug-in
- Heterogeneous computing for JavaScript
Over 100 companies defining royalty-free APIs to connect software to silicon

Sensors & Vision Driving Key Mobile Use Cases
Computational Photography and Videography
Natural UI with Face, Body and Gesture Tracking
3D Scene and Object Reconstruction
Augmented Reality
Diagram axis label: Time

Vision Pipeline Challenges and Opportunities
Growing Camera Diversity: capturing color, range and lightfields
- Camera sensors >20MPix
- Novel sensor configurations: Stereo pairs, Active Structured Light, Active TOF, Plenoptic Arrays
- Opportunity: flexible sensor and camera control to generate required image stream (Camera Control API)
Diverse Vision Processors: driving for high performance and low power
- Camera ISPs
- Dedicated vision IP blocks
- DSPs and DSP arrays
- Programmable GPUs
- Multi-core CPUs
- Opportunity: use best processing available for image stream processing with code portability
Sensor Proliferation: diverse sensor awareness of the user and surroundings
- Light / Proximity
- 2 cameras
- 3 microphones
- Touch
- Position: GPS, WiFi (fingerprint), Cellular trilateration, NFC/Bluetooth Beacons
- Accelerometer, Magnetometer, Gyroscope
- Pressure / Temp / Humidity
- Opportunity: control/fuse vision data by/with all other sensor data on device

OpenVX Power Efficient Vision Acceleration
Acceleration API for real-time vision
- Focus on mobile and embedded systems
Enable diverse efficient implementations
- From CPUs, through GPUs and DSPs to dedicated hardware
Foundational API for vision acceleration
- Can be used by middleware libraries or by applications directly
Complementary to OpenCV
- Which is great for prototyping
Khronos open source sample implementation
- To be released with final specification
- Sample, not reference: the spec remains the definitive definition of OpenVX operation
Diagram labels: Application; OpenCV open source library; Other higher-level CV libraries; Open source sample implementation; Hardware vendor implementations

OpenVX Graphs: The Key to Efficiency
Vision processing directed graphs for power and performance efficiency
- Each Node can be implemented in software or accelerated hardware
- Nodes may be fused by the implementation to eliminate memory transfers
- Processing can be tiled to keep data entirely in local memory/cache
EGLStreams can provide data and event interop with other Khronos APIs
- BUT use of other Khronos APIs is not mandated
VXU Utility Library for access to single nodes
- Easy way to start using OpenVX by calling each node independently
Diagram labels: Example OpenVX Graph; Native Camera Control; OpenVX Node (x4); Heterogeneous Processing

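The efficiency argument for graphs (node fusion plus tiling) can be illustrated with a toy sketch. This is not the OpenVX API: `run_node_by_node`, `run_fused_tiled`, `brighten` and `threshold` are hypothetical names, and the sketch assumes purely per-pixel nodes, which is the simplest case in which fusion is trivially valid. The first function writes a full intermediate image after every node; the second applies the whole chain to one tile at a time, so no full-size intermediate buffer is produced.

```python
"""Toy illustration of fused, tiled graph execution (not the OpenVX API)."""
from typing import Callable, List

Image = List[List[int]]          # 2D grid of 8-bit pixel values
PixelOp = Callable[[int], int]   # hypothetical per-pixel node function


def run_node_by_node(image: Image, ops: List[PixelOp]) -> Image:
    """Memory-based execution: each node reads and writes a full image."""
    for op in ops:
        image = [[op(p) for p in row] for row in image]  # full intermediate buffer
    return image


def run_fused_tiled(image: Image, ops: List[PixelOp], tile_rows: int) -> Image:
    """Graph-style execution: the node chain is fused and run tile by tile."""
    out: Image = []
    for start in range(0, len(image), tile_rows):
        tile = image[start:start + tile_rows]  # stands in for local memory
        for row in tile:
            fused_row = []
            for p in row:
                for op in ops:  # every node runs before the pixel is stored
                    p = op(p)
                fused_row.append(p)
            out.append(fused_row)
    return out


def brighten(p: int) -> int:
    return min(p + 10, 255)


def threshold(p: int) -> int:
    return 255 if p >= 128 else 0


IMAGE = [[100, 120, 130], [117, 118, 200], [0, 255, 127]]
OPS = [brighten, threshold]

if __name__ == "__main__":
    print(run_fused_tiled(IMAGE, OPS, tile_rows=2))
```

Both functions return the same pixels; only the memory traffic differs. Nodes with spatial support (filters, warps) would need tiles with overlapping borders, which this sketch does not attempt.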
OpenVX 1.0 Function Overview
Core data structures
- Images and Image Pyramids
- Processing Graphs, Kernels, Parameters
Image Processing
- Arithmetic, Logical, and statistical operations
- Multichannel Color and BitDepth Extraction and Conversion
- 2D Filtering and Morphological operations
- Image Resizing and Warping
Core Computer Vision
- Pyramid computation
- Integral Image computation
Feature Extraction and Tracking
- Histogram Computation and Equalization
- Canny Edge Detection
- Harris and FAST Corner detection
- Sparse Optical Flow
OpenVX Specification Evolution
- OpenVX 1.0 defines framework for creating, managing and executing graphs
- Focused set of widely used functions that are readily accelerated
- Implementers can add functions as extensions
- Widely used extensions adopted into future versions of the core

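As a concrete instance of one listed function, the following is a minimal sketch of the textbook integral image (summed-area table). It is not taken from the OpenVX specification: the function names and the padded zero row and column are assumptions made for illustration, and a conformant OpenVX implementation may lay out its output differently.

```python
"""Textbook integral image (summed-area table); illustrative only."""
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Return a (h+1) x (w+1) table where ii[y][x] sums img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: List[List[int]], top: int, left: int, bottom: int, right: int) -> int:
    """Sum of img[top:bottom][left:right] using four table lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


if __name__ == "__main__":
    table = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(box_sum(table, 1, 1, 3, 3))
```

Once the table is built, any rectangular sum costs four lookups regardless of the rectangle size.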
Example Graph - Stereo Machine Vision
OpenVX Graph (diagram)
- Camera 1 -> Stereo Rectify with Remap
- Camera 2 -> Stereo Rectify with Remap
- Compute Depth Map (User Node)
- Image Pyramid, Delay, Compute Optical Flow
- Detect and track objects (User Node) -> Object coordinates
Tiling extension enables user nodes (extensions) to also optimally run in local memory

OpenVX and OpenCV are Complementary
Governance
- OpenCV: Community driven open source with no formal specification
- OpenVX: Formal specification defined and implemented by hardware vendors
Conformance
- OpenCV: No conformance tests for consistency and every vendor implements different subset
- OpenVX: Full conformance test suite / process creates a reliable acceleration platform
Portability
- OpenCV: APIs can vary depending on processor
- OpenVX: Hardware abstracted for portability
Scope
- OpenCV: Very wide; 1000s of imaging and vision functions; multiple camera APIs/interfaces
- OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API
Efficiency
- OpenCV: Memory-based architecture; each operation reads and writes memory
- OpenVX: Graph-based execution; optimizable computation, data transfer
Use Case
- OpenCV: Rapid experimentation
- OpenVX: Production development & deployment

OpenVX Participants and Timeline
Provisional 1.0 specification released November 2013 for industry feedback
Aiming for specification finalization and conformance tests in 3Q14
Itseez is working group chair (the convener of OpenCV)
Qualcomm and TI are specification editors

OpenCL Portable Heterogeneous Computing
Portable Heterogeneous programming of diverse compute resources
- Targeting supercomputers -> embedded systems -> mobile devices
One code tree can be executed on CPUs, GPUs, DSPs and hardware
- Dynamically interrogate system load and balance work across available processors
OpenCL = Two APIs and C-based Kernel language
- Platform Layer API to query, select and initialize compute devices
- Kernel language: subset of ISO C99 + language extensions
- C Runtime API to build and execute kernels
Diagram labels: OpenCL across multiple devices; OpenCL Kernel Code (x4); GPU; DSP; HW; CPU; CPU

OpenCL as Foundation for Parallel Compute
100+ tool chains and languages leveraging OpenCL
- Heterogeneous solutions emerging for the most popular programming languages
Languages and tools shown
- C++ AMP (Shevlin Park): uses Clang and LLVM
- SYCL: C++ syntax compiler extensions
- WebCL: JavaScript binding for initiation of OpenCL C kernels
- Language for image processing and computational photography
- Aparapi: Java language extensions for parallelism
- River Trail: language extensions to JavaScript
- PyOpenCL: Python wrapper around OpenCL
- Harlan: high level language for GPU programming
- Compiler directives for Fortran, C and C++
SPIR: Standard Portable Intermediate Representation (extending LLVM for parallel computation)
- SPIR 1.2 released in January 2014
OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources

OpenVX and OpenCL are Complementary
Use Case
- OpenCL: General heterogeneous programming
- OpenVX: Domain targeted vision processing
Architecture
- OpenCL: Language-based; needs online compilation
- OpenVX: Library-based; no online compiler required
Target Hardware
- OpenCL: Exposed architected memory model; can impact performance portability
- OpenVX: Abstracted node and memory model; diverse implementations can be optimized for power and performance
Precision
- OpenCL: Full IEEE floating point mandated
- OpenVX: Minimal floating point requirements, optimized for vision operators
Ease of Use
- OpenCL: General-purpose math libraries with no built-in vision functions
- OpenVX: Fully implemented vision operators and framework out of the box
It is possible to use OpenCL to build OpenVX Nodes

Need for Camera Control API
We have a choice of APIs for image and vision processing
- BUT no open standard API for camera control to FEED these APIs!
Need advanced control of ISP and camera subsystem
- Generate sophisticated image stream for advanced imaging & vision apps
No system API fulfills all developer requirements
- Advanced, high-frequency burst control of camera and sensor operation
- Portable support for diversity of sensors: e.g. depth sensors and sensor arrays
- Tight system integration: e.g. synch of camera and MEMS sensors
Scope of Camera Control API (diagram labels)
- Lens, Flash, Focus, Aperture
- Sensor, Color Filter Array
- Image Signal Processor (ISP): Bayer Pre-processing; RGB/YUV Post-processing
- 3A: Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
- Lens, sensor, aperture control
- Image/Vision Applications

Advanced Camera Control Use Cases
High-dynamic range (HDR) and computational flash photography
- High-speed burst with individual frame control over exposure and flash
Subject isolation and depth detection
- High-speed burst with individual frame control over focus
Rolling shutter elimination
- High-precision intra-frame synchronization between camera and motion sensor
Augmented Reality
- 60Hz, low-latency capture with motion sensor synchronization
- Multiple Region of Interest (ROI) capture
- Synchronized stereo sensors for scene scaling
- Detailed feedback on camera operation per frame
Time-of-flight or structured light depth camera processing
- Aligned stacking of data from multiple sensors

Camera API Architecture will be FCAM-based
No global state
- State travels with image requests
- Every stage in the pipeline may have different state
- Enables fast, deterministic state changes
Synchronize devices
- Lens, flash, sound capture, gyro
- Devices can schedule Actions
- E.g. to be triggered on exposure change

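The "state travels with image requests" idea can be sketched as follows. This is neither the FCAM API nor the Khronos Camera API: `Shot`, `Frame`, `ToySensor` and `on_exposure_change` are hypothetical names chosen for illustration. Each request carries its complete settings, each returned frame is tagged with the settings that produced it, and a registered Action fires when the exposure differs from the previous request.

```python
"""Toy model of per-request camera state (not the FCAM or Khronos API)."""
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Shot:
    """Complete capture settings for one frame; immutable once requested."""
    exposure_us: int
    gain: float = 1.0
    flash: bool = False


@dataclass(frozen=True)
class Frame:
    """A captured frame tagged with the exact Shot that produced it."""
    index: int
    shot: Shot


class ToySensor:
    """Hypothetical sensor: settings come only from each incoming Shot."""

    def __init__(self) -> None:
        self._actions: List[Callable[[Shot, Shot], None]] = []
        self._previous: Optional[Shot] = None  # kept only to detect changes
        self._count = 0

    def on_exposure_change(self, action: Callable[[Shot, Shot], None]) -> None:
        """Register an Action to run when the exposure changes between shots."""
        self._actions.append(action)

    def capture(self, shot: Shot) -> Frame:
        prev = self._previous
        if prev is not None and prev.exposure_us != shot.exposure_us:
            for action in self._actions:
                action(prev, shot)
        self._previous = shot
        frame = Frame(index=self._count, shot=shot)
        self._count += 1
        return frame


def hdr_burst(sensor: ToySensor, exposures_us: List[int]) -> List[Frame]:
    """Capture one frame per exposure, as in an HDR bracket."""
    return [sensor.capture(Shot(exposure_us=e)) for e in exposures_us]


if __name__ == "__main__":
    log = []
    sensor = ToySensor()
    sensor.on_exposure_change(lambda old, new: log.append((old.exposure_us, new.exposure_us)))
    frames = hdr_burst(sensor, [1000, 4000, 16000])
    print([f.shot.exposure_us for f in frames], log)
```

Because every frame records its own `Shot`, an application can interleave differently configured requests in a burst and still know exactly how each frame was captured.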
Khronos Camera API Requirements
Application control over ISP processing (including 3A)
- Including multiple, re-entrant ISPs
Control multiple sensors with synch and alignment
- E.g. Stereo pairs, Plenoptic arrays, TOF or structured light depth cameras
Enhanced per frame detailed control
- Format flexibility, Region of Interest (ROI) selection
Global timing & synchronization
- E.g. Between cameras and MEMS sensors
Flexible processing/streaming
- Multiple input and output streams with RAW, Bayer or YUV Processing
- Streaming of rows (not just frames)
Enable new camera functionality not available on current platforms and align with future platform directions for easy adoption

Camera API Design Milestones and Philosophy
C-language API starting from proven designs
- e.g. FCAM
Design alignment with widely used hardware standards
- e.g. MIPI CSI
Focus on mobile, power-limited devices
- But do not preclude other use cases such as automotive, surveillance, DSLR
Minimize overlap and maximize interoperability with other Khronos APIs
- But other Khronos APIs are not mandated
Support vendor-specific extensions
Milestones
- Apr13: Working group proposed
- Jul13: Group charter approved
- 4Q13: Architectural Design
- 1Q14: First draft specification
- 2Q14: Sample implementation and tests
- 3Q14: Specification ratification

Potential Adoption on Android
Android exposes Java camera APIs to developers
- Controls underlying Camera HAL
Camera HAL v1 API simplified basic point and shoot apps
- Difficult or impossible to do much else
Camera HAL v3 API is a fundamentally different API
- Streams-based to enable more sophisticated camera applications
Khronos Camera API builds on FCAM with a goal of being forward compatible with Android architecture
Khronos Camera API may be used to IMPLEMENT Android Camera HAL and provide an advanced native camera API in NDK
Diagram labels: Camera API; Open source project developed by Nokia and Stanford; HAL V3 adopts many FCAM ideas and can use EGL in its implementation

StreamInput
Jim Steele
- CTO, Sensor Platforms
- Chair, StreamInput Working Group

Sensor Industry Fragmentation

Low-level Sensor Abstraction API
Apps Need Sophisticated Access to Sensor Data
- Without coding to specific sensor hardware
Apps request semantic sensor information
- StreamInput defines possible requests, e.g.
- Read Physical or Virtual Sensors, e.g. "Game Quaternion"
- Context detection, e.g. "Am I in an elevator?"
Advanced Sensors Everywhere
- Multi-axis motion/position, quaternions, context-awareness, gestures, activity monitoring, health and environmental sensors
Sensor Discoverability
Sensor Code Portability
StreamInput processing graph provides optimized sensor data stream
High-value, smart sensor fusion middleware can connect to apps in a portable way
Apps can gain "magical" situational awareness

Sensor Types
Basic sensor data
- Acceleration, Magnetic Field, Angular Rates
- Pressure, Ambient Light, Proximity, Temperature, Humidity, RGB light, UV light
- Heart rate, Blood Oxygen Level, Skin Hydration, Breathalyzer
Sensor fusion
- Orientation (Quaternion or Euler Angles), Gravity, Linear Acceleration
- Position
Context awareness
- Device Motion: general movement of the device: still, free-fall, ...
- Carry: how the device is being held by a user: in pocket, in hand, ...
- Posture: how the body holding the device is positioned: standing, sitting, step, ...
- Transport: about the environment around the device: in elevator, in car, ...

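The slide lists orientation as either a quaternion or Euler angles. The relationship between the two can be sketched with the standard conversion below. It assumes a unit quaternion in (w, x, y, z) order and the common yaw-pitch-roll (Z-Y-X) convention; StreamInput's actual conventions are not stated in the slides, so both the function name and the convention are assumptions.

```python
"""Unit quaternion to Euler angles (Z-Y-X convention); illustrative only."""
import math
from typing import Tuple


def quaternion_to_euler(w: float, x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Return (roll, pitch, yaw) in radians for a unit quaternion."""
    # Roll: rotation about the x axis.
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Pitch: rotation about the y axis; clamp to guard against rounding error.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    # Yaw: rotation about the z axis.
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw


if __name__ == "__main__":
    half = math.pi / 4.0  # half of a 90 degree rotation about z
    print(quaternion_to_euler(math.cos(half), 0.0, 0.0, math.sin(half)))
```

Quaternions avoid the gimbal-lock singularity that Euler angles hit at a pitch of plus or minus 90 degrees, which is one reason fused orientation is often delivered in quaternion form.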
StreamInput: Potential Sensor Fusion Stack
Stack layers (diagram labels): Applications; OS Sensor APIs (e.g. Android SensorManager or iOS CoreMotion); Middleware (e.g. context-awareness engines, gaming engines); Sensor Hub (x2); Sensor (x2)
Platforms can provide increased access to improved sensor data stream, driving faster, deeper sensor usage by applications
Middleware engines need platform-portable access to native, low-level sensor data streams
Low-level native API defines access to fused sensor data stream and context-awareness
StreamInput implementations compete on sensor stream quality, reduced power consumption, environment triggering and context detection, enabling sensor subsystem vendors to deliver increased ADDED VALUE
Hardware transport interfaces are defined by each system, e.g. IIO or HID sensor
Mobile or embedded platforms without sensor fusion APIs can provide direct application access to StreamInput
Embedded processors or peripheral hardware implementing StreamInput provide a standard interface to other system processors

Khronos APIs for Augmented Reality
AR needs not just advanced sensor processing, vision acceleration, computation and rendering - but also for all these subsystems to work efficiently together
Diagram labels
- MEMS Sensors
- Sensor Fusion
- Precision timestamps on all sensor samples
- Advanced Camera Control and stream generation (Camera Control API)
- Vision Processing
- Application on CPUs, GPUs and DSPs
- EGLStream: stream data between APIs
- 3D Rendering and Video Composition on GPU
- Audio Rendering

Summary
Khronos is building a family of interoperating APIs for portable and power-efficient vision processing
OpenVX 1.0 has been provisionally released and non-members are invited to provide feedback on the forums
- http://www.khronos.org/message_boards/forumdisplay.php/110-openvx-general
Khronos camera and sensor fusion APIs are currently in design and complement and integrate with OpenVX
Any company is welcome to join Khronos to influence the direction of mobile and embedded vision processing!
- $15K annual membership fee for access to all Khronos API working groups
- Well-defined IP framework protects your IP and conformant implementations
www.khronos.org - ntrevett@nvidia.com