Acceleration Standards for Mobile Augmented Reality

Similar documents
WebGL Meetup GDC Copyright Khronos Group, Page 1

SC24/WG9 Liaison Meeting

AR Standards Update Austin, March 2012

Khronos and the Mobile Ecosystem

Copyright Khronos Group, Page 1. Khronos Overview. Taiwan, February 2012

Mobile AR Hardware Futures

Open Standard APIs for Augmented Reality

Open API Standards for Mobile Graphics, Compute and Vision Processing GTC, March 2014

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

Open Standards for Today s Gaming Industry

The State of Gaming APIs

WebGL, WebCL and Beyond!

Copyright Khronos Group Page 1

Vision Acceleration. Launch Briefing October Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group

SIGGRAPH Briefing August 2014

Open Standard APIs for Embedded Vision Processing

Completing the Multimedia Architecture

OpenMAX AL, OpenSL ES

OpenCL Press Conference

Open Standards for AR and VR Neil Trevett Khronos President NVIDIA VP Developer January 2018

Khronos Overview The State of the Art in Open Standards for Visual Computing

Graphics Technology Update

Overview and AR/VR Roadmap

Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group. Copyright Khronos Group Page 1

WebGL, WebCL and OpenCL

Open Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc May 2018

Enabling a Richer Multimedia Experience with GPU Compute. Roberto Mijat Visual Computing Marketing Manager

Copyright Khronos Group Page 1

The Benefits of GPU Compute on ARM Mali GPUs

Standards for WebVR. Neil Trevett. Khronos President Vice President Mobile Content,

Streaming Media. Advanced Audio. Erik Noreke Standardization Consultant Chair, OpenSL ES. Copyright Khronos Group, Page 1

Copyright Khronos Group 2012 Page 1. OpenCL 1.2. August 2012

Bringing the Power of the GPU to the Web

Copyright Khronos Group, Page 1. OpenCL. GDC, March 2010

Open Standards for Building Virtual and Augmented Realities. Neil Trevett Khronos President NVIDIA VP Developer Ecosystems

Update on Khronos Open Standard APIs for Vision Processing Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem

OpenCL Overview. Shanghai March Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group

Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem. Copyright Khronos Group Page 1

Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem. Copyright Khronos Group Page 1

Copyright Khronos Group Page 1. Vulkan Overview. June 2015

Khronos Connects Software to Silicon

The OpenVX Computer Vision and Neural Network Inference

Accelerating Vision Processing

Introduction to OpenGL ES 3.0

INTEGRATING COMPUTER VISION SENSOR INNOVATIONS INTO MOBILE DEVICES. Eli Savransky Principal Architect - CTO Office Mobile BU NVIDIA corp.

Next Generation Visual Computing

Our Technology Expertise for Software Engineering Services. AceThought Services Your Partner in Innovation

GPGPU on Mobile Devices

Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases. Steve Steele, ARM

Multimedia in Mobile Phones. Architectures and Trends Lund

gltf Briefing September 2016 Copyright Khronos Group Page 1

Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015

Visual HTML5. Human Information Interaction for Knowledge Extraction, Interaction, Utilization, Decision making HI-I-KEIUD

Developing a Reference Model for Augmented Reality. 5th International AR Standards Community Meeting 19 March 2012

GTC 2013 March San Jose, CA The Smartest People. The Best Ideas. The Biggest Opportunities. Opportunities for Participation:

Adding Advanced Shader Features and Handling Fragmentation

Copyright Khronos Group, Page 1

Standards for Vision Processing and Neural Networks

Vulkan 1.1 March Copyright Khronos Group Page 1

Hardware Accelerated Graphics for High Performance JavaFX Mobile Applications

Copyright Khronos Group, Page 1

Dave Shreiner, ARM March 2009

Khronos Updates GDC 2017 Neil Trevett Vice President Developer Ecosystem, NVIDIA President,

3D Graphics in Future Mobile Devices. Steve Steele, ARM

Copyright Khronos Group, Page 1

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1

POWERVR MBX & SGX OpenVG Support and Resources

Creating the Embedded Media Processing Ecosystem

Silicon Acceleration APIs

Neural Network Exchange Format

GTC Interaction Simplified. Gesture Recognition Everywhere: Gesture Solutions on Tegra

Copyright Khronos Group Page 1

GPGPU Applications. for Hydrological and Atmospheric Simulations. and Visualizations on the Web. Ibrahim Demir

OpenGL ES 2.0 : Start Developing Now. Dan Ginsburg Advanced Micro Devices, Inc.

Ecosystem Overview Neil Trevett Khronos President NVIDIA Vice President Developer

The Mobile Advantage. Erik Noreke Independent Standardization Consultant Chair, OpenSL ES. Copyright Khronos Group, Page 1

Bringing it all together: The challenge in delivering a complete graphics system architecture. Chris Porthouse

OpenGL BOF Siggraph 2011

Profiling and Debugging Games on Mobile Platforms

Embedded Media Processing Ecosystem

WebGL. Announcements. WebGL for Graphics Developers. WebGL for Web Developers. Homework 5 due Monday, 04/16. Final on Tuesday, 05/01

HTML5 Evolution and Development. Matt Spencer UI & Browser Marketing Manager

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

More performance options

ARM Multimedia IP: working together to drive down system power and bandwidth

Windowing System on a 3D Pipeline. February 2005

HotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla.

Graphics and Imaging Architectures

Module Introduction. Content 15 pages 2 questions. Learning Time 25 minutes

Connecting with Tizen : An Overview & Roadmap. Mohan Rao

Take GPU Processing Power Beyond Graphics with Mali GPU Computing

OpenCL The Open Standard for Heterogeneous Parallel Programming

Streaming Media Portability

Neil Trevett Vice President, NVIDIA OpenCL Chair Khronos President. Copyright Khronos Group, Page 1

Emerging Vision Technologies: Enabling a New Era of Intelligent Devices

THE LEADER IN VISUAL COMPUTING

KHRONOS STANDARDS UPDATE. Neil Trevett, GTC, 26 th March 2018

GPGPU on ARM. Tom Gall, Gil Pitney, 30 th Oct 2013

Transcription:

Acceleration Standards for Mobile Augmented Reality Neil Trevett Khronos President Vice President Mobile Content, NVIDIA November 2012 Copyright Khronos Group 2012 Page 1

Copyright Khronos Group 2012 Page 2 Accelerating AR to Meet User Expectations Mobile is an enabling platform for Augmented Reality - Mobile SOC, vision and sensor capabilities are expanding quickly But consumer expectations are not being met -.. Vision-based AR must be sub-pixel accurate, 60Hz smooth AND low power Power is now the main challenge for a quality AR user experience - Delivering the needed computation acceleration in a mobile form factor Acceleration APIs can be used to optimize AR performance AND power - OPEN STANDARD APIs can do that with minimized fragmentation and reduced developer costs SOC = System On Chip Complete compute system minus memory and some peripherals

Mobile SOC Performance Increases CPU/GPU AGGREGATE PERFORMANCE 100 10 1 2010 Tegra 2 Google Nexus 7 HTC One X 5x 2011 Tegra 3 10x WAYNE 50x LOGAN 75x 2012 2013 2014 Android Tablets STARK 25x perf increase in next three years Core i5 Core 2 Duo Copyright Khronos Group 2012 Page 3

BUT!! Mobile Thermal Design Point Leakage current in latest silicon geometries means that Moore s Law no longer delivers more performance without increasing power 4-5 Screen takes 250-500mW 2-4W 10 Screen takes 1-2W Resolution makes a difference! The ipad3 screen takes up to 8W 7 Screen takes 1W 4-7W 6-10W Typical max system power levels before thermal failure Even as battery technology improves - these thermal limits remain 30-90W Copyright Khronos Group 2012 Page 4

How to Save Power? Much more expensive to MOVE data than COMPUTE data Energy efficiency must now be key metric during silicon AND software design - Awareness of where data lives, where computation happens, how is it scheduled Need to use hardware acceleration - Lots of processing in parallel - Efficient caching and memory usage - Reduces data movement 32-bit Float Operation 7pJ Send 32-bits Off-chip 50pJ Send 32-bits 2mm 24pJ Write 32-bits to Memory 600pJ For 40nm, 1V process 32-bit Register Write 0.5pJ 32-bit Integer Add 1pJ Copyright Khronos Group 2012 Page 5

Copyright Khronos Group 2012 Page 6 Example - Typical Camera ISP Camera ISP (Image Signal Processor) has little or no programmability - Scan-line-based, Data flows through compact hardware pipe - No global memory used to minimize power BUT computational photography apps now want to mix nonprogrammable ISP processing with more flexible GPU or CPU processing - ISP pipelines will provide tap/insertion points to/from CPU/GPU at critical points ~760 math Ops ~42K vals = 670Kb ~250Gops @ 300MHz

Copyright Khronos Group 2012 Page 7 Engineering for Power Dark Silicon - specialized hardware - turned on only when needed - Lots of space for transistors just can t turn them all on at same time - Accelerator blocks increase locality and parallelism to save power compared to programmable processors Be power smart when using programmable processors - Instrumentation for energy-aware compilers and profilers - Dynamic and feedback-driven software power optimization - Power optimizing compiler back-end compilers / installers Smart, holistic use of sensors and peripherals - Wireless modems and networks - Motion sensors, cameras, networking, GPS Apple A5 Significant potion of die is already not CPU/GPU or IO = dedicated hardware

Copyright Khronos Group 2012 Page 8 Programmers View of Typical SOC c. 2012 Unified memory is critical departure from PC architecture Common virtual address space for all units but typically not coherent CPU Complex 1-5 Cortex A9/A15 Cores with NEON L1 Cache L2 Cache Automatic scaling of frequency/ voltage and # cores to meet current software load HD Audio Engine / IO Software configurability 2D/3D GPU Complex Significant programmable acceleration. Fully C-programmable soon. 10-50+ cores Graphics Cache Unified Memory Controller for 0.5-2GB DDR3L/LP-DDR3 Peripheral Busses 1-4 Display Controllers Sensor Array Image Signal Processor Video Encoder Decoder Vision Processor (Future) Typically NO or very limited programmable functionality 1-3 Cameras Software configurability Programmable? Or dedicated hardware for power efficiency?

Copyright Khronos Group 2012 Page 9 Khronos Connects Software to Silicon ROYALTY-FREE, OPEN STANDARD APIs for advanced hardware acceleration Graphics, video, audio, compute, visual and sensor processing Low level silicon to software interface needed on every platform Defines the forward looking roadmap for the silicon community Shipping on billions of devices across multiple operating systems Rigorous conformance tests for crossvendor consistency Khronos is OPEN for any company to join and participate Acceleration APIs BY the Industry FOR the Industry

Mobile Innovation Hot Spots New platform capabilities being driven by SILICON and APIs Vision - Camera as sensor Console-Class 3D Performance, Quality, Controllers and TV connectivity Computational Photography Gesture Processing Augmented Reality Media and Image Streaming Heterogeneous Parallel Processing Sensor Fusion Smart use of sensors to make devices magically context aware location, usage, position HTML5 Bringing acceleration to Web Apps that can be discovered on the Net and run on any platform Copyright Khronos Group 2012 Page 10

Example use of Khronos APIs in AR Audio Rendering Positional Sensors Positional and GPS Sensor Data Computer Vision and Tracking Camera Synchronization and sensor fusion Position and Tracking Semantics Synchronize processed camera frame to output displayed frame? Video TAP to Vision Subsystem Camera Processing Application on CPUs Video stream to GPU 3D Rendering and Video Composition On GPU Control Camera, Preprocess and generate video streams Copyright Khronos Group 2012 Page 11

3D API Family Tree ES3 is backward compatible so new features can be added incrementally Programmable vertex and fragment shaders Fixed function 3D Pipeline OpenGL ES 2.0 Content OpenGL ES 1.1 Content OpenGL ES 3.0 Content Mobile 3D WebGL 1.0 OpenGL ES 1.1 WebGL-Next OpenGL ES 2.0 OpenGL ES 3.0 ES-Next OpenGL ES 1.0 OpenGL 1.3 OpenGL 1.5 OpenGL 2.0 OpenGL 2.1 OpenGL 3.1 OpenGL 3.3 OpenGL 4.2 OpenGL 4.0 OpenGL 3.0 OpenGL 3.2 OpenGL 4.1 2003 2004 GL-Next OpenGL 4.3 is a superset of DX11 Desktop 3D 2002 OpenGL 4.3 2005 2006 2007 2008 2009 2010 2011 2012 2013 Copyright Khronos Group 2012 Page 12

OpenGL ES 3.0 Highlights Better looking, faster performing games and apps at lower power - Incorporates proven features from OpenGL 3.3 / 4.x - 32-bit integers and floats in shader programs - NPOT, 3D textures, depth textures, texture arrays - Multiple Render Targets for deferred rendering, Occlusion Queries - Instanced Rendering, Transform Feedback Make life better for the programmer - Tighter requirements for supported features to reduce implementation variability Backward compatible with OpenGL ES 2.0 - OpenGL ES 2.0 apps continue to run unmodified Standardized Texture Compression - #1 developer request! Copyright Khronos Group 2012 Page 13

OpenCL Heterogeneous Computing OpenCL Kernel OpenCL Code Kernel OpenCL Code Kernel OpenCL Code Kernel Code A low-level, cross-platform, cross-vendor standard for harnessing all system compute resources C Platform Layer API - Query, select and initialize compute devices CPU Kernel Language Specification - Subset of ISO C99 with language extensions - Well-defined numerical accuracy - IEEE 754 rounding with specified max error - Rich built-in functions: cross, dot, sin, pow, log C Runtime API - Runtime or build-time compilation of kernels - Execute compute kernels across multiple devices CPU OpenCL Kernel OpenCL Code Kernel OpenCL Code Kernel OpenCL Code Kernel Code GPU GPU CPU One code tree can be executed on CPUs or GPUs Copyright Khronos Group 2012 Page 14

Copyright Khronos Group 2012 Page 15 OpenMAX AL Streaming Media Framework Enables key video, image stream and camera use cases - Enables optimal hardware acceleration with app portability Create Media Objects to play and process images and video with AV sync - Connect to variety of input and output objects to PLAY and RECORD media Full range of video effects and controls - Including playback rate, post processing, and image manipulation OpenMAX AL includes sophisticated camera controls Sources Analog Radio Camera Audio Input URI Efficient data routing for CPU processing DSrc Data Tap to CPU OpenMAX AL Media Object Processing DSnk Sinks EGLStream to GPU Audio Mix Display Window URI EGLStreams extension enables efficient transfer of image stream to GPU texture memory Memory Memory

Copyright Khronos Group 2012 Page 16 Developer Requested Camera Extensions Query camera information - Focal length (fx, fy), principal point (cx, cy), skew (s), image resolution (h, w) - Spatial information of how cameras and sensors are placed on device - Calibration and lens distortion FCAM++ for extensive exposure parameters in single or burst mode - Shutter, aperture, ISO, white balance, frame rate, focus modes, resolution - Preload parameters for each shot in a burst ROI extraction - From wide angle and fish-eye lenses Rolling shutter compensation - Using accelerometer Data output format control - Grayscale, RGB(A), YUV - Access to the raw data e.g. Bayer pattern

Copyright Khronos Group 2012 Page 17 OpenVX Vision Hardware Acceleration Layer - Enable hardware vendors to implement accelerated imaging and vision algorithms - For use by high-level libraries or apps Focus on enabling real-time vision - On mobile and embedded systems Diversity of efficient implementations - From programmable processors to dedicated hardware pipelines Dedicated hardware can help make vision processing performant and low-power enough for pervasive always-on use OpenCV open source library Application Open source sample implementation? Other higher-level CV libraries Hardware vendor implementations

Copyright Khronos Group 2012 Page 18 Current OpenVX Participants Aiming for specification first draft in early 2013 Itseez is working group chair

Possible Implementations of Vision Stack StreamInput combines camera vision with other sensors on device OpenVX does not need the camera, GUI or compute functionality in OpenCV use Khronos API interop instead OpenCV community can progressively port over OpenVX acceleration provided by silicon community Camera input from OpenMAX AL or other camera subsystems Data and event interop with OpenCL and OpenGL ES for display and compute processing Use OpenCL to implement OpenCV OR OpenVX with parallel execution on CPUs/GPUs OpenVX does not duplicate OpenCV functionality JUST provides essential acceleration Dedicated hardware OpenVX can be implemented using dedicated hardware Copyright Khronos Group 2012 Page 19

Copyright Khronos Group 2012 Page 20 Portable Access to Sensor Fusion Apps request semantic sensor information StreamInput defines possible requests, e.g. Provide Skeleton Position Am I in an elevator? Standardized Node Intercommunication Universal Timestamps E.g. align samples from camera and other sensors Input Device Input Device Input Device Advanced Sensors Everywhere RGB and depth cameras, multi-axis motion/position, touch and gestures, microphones, wireless controllers, haptics keyboards, mice, track pads Filter Node Filter Node Processing graph provides sensor data stream Utilizes optimized, smart, sensor middleware Apps can gain magical situational awareness Filter Node App Apps Need Sophisticated Access to Sensor Data Without coding to specific sensor hardware

Copyright Khronos Group 2012 Page 21 OpenSL ES Advanced Audio OpenSL ES does for audio what OpenGL ES does for graphics - Advanced audio functionality from simple playback to full 3D positional audio Object-based native audio API for simplicity and high performance - Same object framework as OpenMAX AL - Reduces development time Attractive alternative to open source frameworks - Tightly defined specification with full conformance tests - Robust application portability across platforms and OS

Copyright Khronos Group 2012 Page 22 EGLStream Streaming Images EGL originally embedded version of WGL - Abstraction layer to window systems and memory buffers Role has expanded to provide API interop data and events - EGLImages single buffers that can be passed between APIs - EGLStreams provides stream of images with AV sync - Cross process EGLStreams Producer and Consumer can be in different processes for performance or security e.g. browser compositing process Android SurfaceTexture is a Java wrapper around EGLStreams - Captures video decode or camera preview to OpenGL ES texture OpenMAX AL Media Player is the EGLStream Producer and controls production of frames. EGLStreams enables and hides details of video frame transport. Enables multiple buffering modes for different uses cases e.g.: FIFO and explicit latch/release OpenGL ES GL_TEXTURE_EXTERNAL is the EGLStream Consumer and converts video format into RGB OpenGL ES texture Camera File URL Etc. MEDIA PLAYER Object EGLStream? GL_TEXTURE_EXTERNAL

OS API Adoption Mobile Operating Systems OpenGL ES 2.0 Shipping - Android 2.2 OpenSL ES 1.0 (subset) Shipping Android 2.3 OpenMAX AL 1.0 (subset) Shipping - Android 4.0 EGL 1.4 Shipping under SDK -> NDK Opera and Firefox WebGL now Chrome soon OpenGL 3.2 on MacOS OpenCL 1.1 on MacOS OpenGL ES 2.0 on ios Can enable on MacOS Safari ios5 enables WebGL for iads Microsoft Windows RT: - Only Microsoft native APIs - HTML5 but not yet WebGL Copyright Khronos Group 2012 Page 23

Using Android Java for AR Native APIs can supplement Android capabilities now and be absorbed into platform C and Java APIs by Google over time Positional Sensors No advanced camera controls: ROI, burst parameter control, focus modes, format choices MediaSDK Camera SensorManager Positional and GPS Sensor Data Computer Vision and Tracking Camera Processing Java Callback Preview frames Limited fusion and cross sensor synch. No virtual sensors Synchronization and sensor fusion OpenCV through JNI 30FPS but 2-3 frames latency Video stream to GPU SurfaceTexture Position and Tracking Semantics Need synch between input and composited frames JetPlayer Audio Rendering Application on CPUs 3D Rendering and Video Composition On GPU GLSurfaceView Java provides access to EGL-type and OpenGL ES functionality Copyright Khronos Group 2012 Page 24

Leveraging Proven Native APIs into HTML5 Leverage native API investments into the Web - Faster API development and deployment - Familiar foundation reduces developer learning curve Khronos and W3C exploring liaison - Multiple potential joint projects WebMAX? Camera control and video processing WebAudio Advanced JavaScript Audio Canvas WebVX? Vision Processing Device and Sensor APIs Device Orientation Working Groups JavaScript Native Native APIs shipping or working group underway JavaScript API shipping or working group underway Possible future JavaScript APIs Copyright Khronos Group 2012 Page 25

Expanding Platform Reach for Graphics and Computation Desktop Mobile Production Browsers Shipping with WebGL: Desktop - Chrome, Firefox, Opera, Safari Mobile - Opera and Firefox Apple ios Safari uses WebGL for iads Web Graphics Interop Interop Interop WebGL on majority of production desktops now. WebGL pervasively available on mobile in next 12 months Typed Arrays WebCL will start deploying in next 12-18 months Compute Full Profile Full Profile and Embedded Profile OpenCL pervasively available on mobile in next 18-24 months Copyright Khronos Group 2012 Page 26

Copyright Khronos Group 2012 Page 27 Khronos and MPEG Standards COLLADA 3D Authoring Format ARAF - AR Application format 3DG 3D Graphics Coding Media Framework OpenMAX Rich Media Framework MPEG-4-2D/3D graphics MPEG-21 - Media Management/Protection MPEG--V - Sensors and actuators MPEG--U - Advanced User interaction MPEG--M - Media oriented middleware Codecs Low level Acceleration APIs for Graphics, Compute, Vision, Sensors Silicon

Copyright Khronos Group 2012 Page 28 Need 3D Transmission Standard? Efficient codecs for compressed binary 3D asset blobs - Geometry, textures, materials, animations, physics - Separate form scene graph description e.g. WebGL could use JSON - Perhaps use RESTful APIs and LOD management web services approach? Many initiatives under way time for communication and collaboration? - MPEG 3D Mesh Progressive Streaming (3DMC), Bones Based Animation (BBA) - Google Body compression - Delta and ZigZag encoding - COLLADA2JSON at COLLADA working group - Multiple papers presented at Web3D 2012 An effective and widely adopted codec ignites previously unimagined opportunities for a media type Audio Video Images 3D MP3 AVC PNG/JPEG??

Copyright Khronos Group 2012 Page 29 3D Codec Concepts - Transmission Format Specification enables reliable encoding and decoding of 3D Assets - Encoding/Decoding can be independent of server/client - Standardized, widely-used codec catalyzes optimized implementations 3D Asset Server Encode 3D Assets Decode 3D Assets 3D Client Challenges - Complexity and variety of 3D asset types to be processed - Efficiency of general codec versus application-specific encoding schema WebGL browser community suggests to try multiple techniques in open source and JavaScript to inform standardization

Copyright Khronos Group 2012 Page 30 Transmission Format Collaboration Exploring multi-sdo collaboration - To accelerate transmission format exploration and development Khronos has drafted liaison MOU to encapsulate this model - Feedback welcome! 4. Each SDO ratifies specification to be royalty free and publicly distributes 1. Members of Participating SDOs can attend liaison working group Liaison Working Group Problem is use of key term specification 2. Liaison Working Group drafts specification + Other SDOs! 3. Liaison Working sends draft spec to each SDO for ratification - using SDO s own ratification process for royalty-free specifications

Copyright Khronos Group 2012 Page 31 Summary Advances in SOC silicon processing and associated APIs are enabling significant new mobile use cases including Augmented Reality Holistic cooperation between hardware and software needed to deliver increasing computational loads in a fixed power budget Cooperative API standards enabling mobile OS and apps to tap into the potential of mobile SOC acceleration Sensor and vision processing now just as important as graphics acceleration Industry consensus needed on need and method to create 3D streaming transmission format www.khronos.org