Next Generation Visual Computing


Next Generation Visual Computing
(Making GPU Computing a Reality with Mali)
Taipei, 18 June 2013
Roberto Mijat, ARM

Addressing Computational Challenges
Trends
- Growing display sizes and resolutions
- Increasing computational power and novel applications
- Persistent users' expectation of improved experience
Limitations
- Limited and restricted energy and thermal budgets
- In mobile, processing power greatly outgrowing battery capacity
- Traditional scaling solutions not sustainable
Necessities
- Increase computational efficiency of processing platforms
- Make use of heterogeneous and parallel computing
- Leverage new technologies such as GPU Compute

Complementary Compute Architectures
[Figure: comparison of CPU and GPU characteristics]
Note: characteristics of generic CPUs and GPUs

Heterogeneous Computing
CPU
- Operating system
- Most application processing
GPU
- Programmable through C-like languages and APIs
- Cost effective, efficient, great floating point performance
[Diagram labels: Control, ALU, ALU, ALU, ALU, Caches, RAM]
GPU used as computational accelerator or companion processor
- 2D/3D graphics
- Advanced image processing
- Accelerate/complement ISP functionality
- Offload video codec blocks
- Accelerate physics computation

Benefits of GPU Computing
Performance
- Faster computation
- Offload and acceleration of non-graphical applications
Energy efficiency
- Free up CPU resources by offloading to the GPU
- Better load balance across system resources
- Increased system efficiency using the best processor for the job
Cost reduction
- Reduced cost through hardware consolidation and software flexibility
- Simpler interface to parallel programming through modern APIs
Improved user experience
- Remove computational barriers
- Enable new use cases and applications

Adoption of Mobile GPU Compute
Timeline: 2012, 2013, 2014, 2015+
- OpenCL Full Profile Khronos conformant GPUs in mobile SoCs
- GPU Compute capable devices start shipping
- OEMs and SiPs evaluating leading GPU Compute solutions
- Gradual roll-out of GPU Compute APIs in mobile/embedded platforms
- Android RenderScript computation first enabled on GPU

Adoption of Mobile GPU Compute
Timeline: 2012, 2013, 2014, 2015+
- First public demonstrations of GPU Compute
- Mobile benchmarks
- ISVs and OEMs start porting/optimizing libraries and key use-case functionality using GPU Compute
- Computational Photography and Advanced Imaging GPU acceleration
- Codec vendors develop GPU Compute enabled HEVC decoders
- Exploration by mainstream developers

Adoption of Mobile GPU Compute
Timeline: 2012, 2013, 2014, 2015+
- Mainstream support for GPU Computing in Mobile and Embedded
- GPU Compute widely available and utilized by developers/libraries
- Introduction of GPUs implementing HSA features, full system coherency
- Hardware consolidation and software cost reduction through migration of selected ISP/DSP functionality to GPU
- New use cases, innovation

OPENCL

OpenCL Overview
OpenCL is
- A framework to enable general purpose parallel computing
- A computing language portable across heterogeneous processing platforms
- An API to define and control the platforms
- A royalty-free open standard, interoperable with existing APIs
OpenCL enables easier, better programming of heterogeneous parallel compute systems, and unleashes the general purpose computational power of GPUs needed by emerging workloads.
OpenCL and the OpenCL logo are trademarks of Apple Inc.

OpenCL Programming Model
Application
- Optimize performance critical code
Program
- Kernel: OpenCL kernel or native kernel
- Runtime compiler: can use static compilation; binaries are cached
- Kernel object: can be built to target any supported device
Execute command
- The kernel is executed over each element of the N-dimensional index space (NDRange)
- Work-item: instance of a kernel executing on a point in the index space
- Work-group: collection of work-items
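The execution model on this slide can be illustrated with a small CPU-only emulation. This is a sketch, not OpenCL host code: `enqueue_nd_range` and `brighten` are hypothetical names invented for the illustration, and a real implementation runs the work-items of a group in parallel rather than in nested loops.

```python
from itertools import product


def enqueue_nd_range(kernel, global_size, local_size, *args):
    """Run `kernel` once per point of the index space, one work-group at a time."""
    # As in OpenCL 1.x, each global dimension must be a multiple of the local one.
    assert len(global_size) == len(local_size)
    assert all(g % loc == 0 for g, loc in zip(global_size, local_size))
    groups = [g // loc for g, loc in zip(global_size, local_size)]
    for group_id in product(*(range(n) for n in groups)):
        for local_id in product(*(range(loc) for loc in local_size)):
            # Global ID = group ID * work-group size + local ID, per dimension.
            global_id = tuple(
                g * loc + i for g, loc, i in zip(group_id, local_size, local_id)
            )
            kernel(global_id, local_id, group_id, *args)


def brighten(global_id, local_id, group_id, src, dst, gain):
    """Kernel body: one work-item scales one pixel and clamps it to 8 bits."""
    x, y = global_id
    dst[y][x] = min(255, int(src[y][x] * gain))


width, height = 8, 4
src = [[x * 10 + y for x in range(width)] for y in range(height)]
dst = [[0] * width for _ in range(height)]

# 8x4 index space split into 4x2 work-groups: 4 groups of 8 work-items each.
enqueue_nd_range(brighten, (width, height), (4, 2), src, dst, 1.5)
```

Each call to `brighten` corresponds to one work-item; the `(4, 2)` tuple plays the role of the work-group size passed to the execute command.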

The ARM OpenCL Implementation
- Implements the latest version of the standard
- Implements Full Profile, supports 64-bit
- Optimized for interoperability with existing Mali software stack
- Optimized for interoperability between CPU and GPU
- Architected for Cache Coherent Interconnect support
- Extensible design

With Full Profile you know what you get
- Full Profile defines the baseline set of features for OpenCL
- Embedded Profile defines a subset of the specification
  - Designed to enable OpenCL on less capable devices
  - Makes a large set of features optional, restricting developers
  - Reduces the precision of floating point maths

Key feature                                       Embedded   Full
FP32 precision                                    Relaxed    IEEE-754
Built-in atomic operations                        Optional   Supported
64-bit integer                                    Optional   Supported
Online compiler                                   Optional   Supported
3D image writes                                   Optional   Supported
Linear interpolation for floating point images    Optional   Supported
Size of buffers and memory                        Limited    Supported
Image data type requirements                      Reduced    Supported

RENDERSCRIPT

Introduction to RenderScript
Compute framework and API for Android
- Officially introduced in Honeycomb
- Cross-platform control-slave architecture, with runtime compilation
- A graphics engine component has been deprecated since Jelly Bean
Complements existing APIs by adding:
- A compute API for parallel processing similar to OpenCL
- A scripting language based on C99 supporting vector data types
Designed for portability, performance, usability
- On-device JIT compilation and dynamic thread launch
- Native code optimization to maximize the performance of critical algorithms
Mali-T604 is the first GPU to support RenderScript

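How RenderScript works
[Diagram: a RenderScript script is compiled by llvm-rs-cc into portable bitcode and a reflected layer used by the Java app. Online compilation then takes place on the device: the Java app goes through the Dalvik JIT, and the bitcode goes through libbcc to produce an executable. The resulting machine code runs through librs on the ARM compute system (Cortex CPU + Mali GPU + AMBA 4).]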

DESIGNED FOR GPU COMPUTE

Mali-T600: Designed for GPU Compute
Comprehensive support for general purpose data types
- 8/16/32/64-bit signed/unsigned integer
- FP16, FP32, FP64
- 2, 3, 4, 8, 16 wide vectors
- 2D/3D images
Floating point precision and performance
- Full IEEE 754-2008 compliance
- 100s of GFLOPS performance for non-graphical workloads
- Sustainable and proven performance for real life workloads
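To give a feel for what the three floating point widths mean in practice, the sketch below round-trips one value through the IEEE 754 binary16, binary32 and binary64 encodings using Python's standard `struct` module. It runs on the host CPU and says nothing about Mali hardware behaviour; the test value 0.1 and the helper name `round_trip` are arbitrary choices for the illustration.

```python
import struct


def round_trip(value, fmt):
    """Store `value` in the given IEEE 754 binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# struct format codes: "e" = binary16, "f" = binary32, "d" = binary64.
FORMATS = {"FP16": "<e", "FP32": "<f", "FP64": "<d"}

value = 0.1  # not exactly representable in any binary floating point format
errors = {
    name: abs(round_trip(value, fmt) - value) for name, fmt in FORMATS.items()
}

for name, err in errors.items():
    print(f"{name}: absolute rounding error {err:.3e}")
```

Because a Python float is already FP64, the FP64 error is zero here; the FP16 and FP32 errors show how much is lost when a narrower type is chosen to save bandwidth or energy.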

Mali-T600: Designed for GPU Compute
Hardware acceleration
- Most common mathematical functions implemented in hardware
- >70% coverage within newest industry APIs
- Most operations compute in one cycle
Optimal memory throughput and latency
- Optimized for stream and generic load/store operations
- Tight integration with system using latest AMBA interfaces
- Leverages new Cache Coherent Interconnect technologies
Task management implemented in hardware
- Optimal automatic distribution of compute workloads
- Optimal dynamic power management
- Efficient use of processing resources

GPU Compute on Mali: here today!
Passed Khronos Conformance
- Only OpenCL 1.1 Full Profile on Linux and Android outside of console and desktop space
Proven in silicon
- Samsung Exynos 5 Dual, implements Full Profile OpenCL and RenderScript
- DDKs available now
Mali-T600 shipping in real products
- Google Chromebook
- Google Nexus 10
- InSignal Arndale Community Board
API exposed for developers
- RenderScript on Android for Nexus 10

USE CASES
Examples of the benefits of GPU Compute from the real world

Example use cases for GPU Computing
Mobile
- Computational Photography
- Physics in games
- Moving and still image real-time stabilization
- Information extraction: object detection, classification and tracking
- Imaging: correction, improvement, consolidation
- Content and context understanding
- HDR
- Augmented Reality
DTV/STB
- 2D to 3D conversion
- Super resolution
- Pre and post processing
- Camera based UI
- Trans-coding
- Information extraction and superimposition
Automotive
- Lane Detection
- Smart Head-Light
- Road Sign Recognition
- Night Vision
- Object Classification
- Pedestrian, Vehicle and Collision Detection
- Vehicle Detection
- Dynamic cruise control
100s of GFLOPS of efficient processing power: improve existing use-cases, enable next generation use-cases

Advanced Image Processing
- RenderScript is the official Heterogeneous Compute Android API
- Since Android 4.2 (Jelly Bean) it has been enabled to target the GPU
- Complex image filters can be greatly accelerated by GPU Compute

Filter             Speed-up [1]
MotionBlur         3.5x
Cloud              4.2x
Labyrinth          3.8x
TitleReflection    7.3x
WhirlPinch         3.6x
Wave               7.0x
Bicubic            15.4x

Image size: 2560x1920
[1] Acceleration compares RenderScript compiled on device (LLVM) on dual-core Cortex-A15 and Mali-T604 on a stock Google Nexus 10

Video Processing APK
- Proprietary transcoding/processing pipeline
- Image filters implemented using RenderScript
- Optimized for ARM + Mali-T600 GPU Compute

Filter                       FPS (GPU+CPU vs CPU only)   Speed-up
Deshake (720p)               28 / 8                      3.5x
Upscaling (720p to 1080p)    20 / 3                      6.7x
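The speed-up column follows directly from the two frame rates quoted for each filter. The short sketch below reproduces that arithmetic; the function and variable names are hypothetical, and the only inputs are the FPS figures from the slide.

```python
def speed_up(accelerated_fps, baseline_fps):
    """Speed-up is the ratio of the accelerated to the baseline frame rate."""
    return accelerated_fps / baseline_fps


# (GPU+CPU FPS, CPU-only FPS) as quoted on the slide.
measurements = {
    "Deshake (720p)": (28, 8),
    "Upscaling (720p to 1080p)": (20, 3),
}

speed_ups = {
    name: round(speed_up(gpu_cpu, cpu_only), 1)
    for name, (gpu_cpu, cpu_only) in measurements.items()
}

for name, factor in speed_ups.items():
    print(f"{name}: {factor}x")
```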

GPU Compute accelerated superscaling
- Accelerated using RenderScript
- On Google Nexus 10 (Mali-T604)

Next Generation Multimedia Codecs
High Efficiency Video Coding (HEVC)
- Latest video compression standard, ratified by ITU in Jan 2013
- Improved video quality and double the data compression of H.264
- Can support up to 8K UHD
ARM is collaborating with multiple codec vendors
- Ensuring widest availability of HEVC across multiple ARM platforms
- Enabling HEVC early, in software, through NEON and GPU Compute
- Flexibility of software solutions is critical as HEVC rolls out

Why GPU Compute for HEVC
- High resolution HEVC decoding maximises CPU load
- GPUs are traditionally idle during video playback
- GPU architecture suits acceleration of parallel codec blocks
- Offloading computation to the GPU frees up the CPU to perform other (system) tasks
- Combining CPU (NEON) and GPU Compute enables the most efficient HEVC decode
"Mali GPUs are well suited for Video Acceleration with significant power/performance benefits" (Ittiam Systems)

Physics (Cloth Simulation)

ISP Pipeline Offload to GPU (OpenCL)
- Entire ISP pipeline offloaded to the GPU using OpenCL
- More flexibility
  - Sensor and camera module vendors can invest in optimized portable software libraries instead of hardware ISP
  - SoC implementers can reduce BoM by offloading ISP blocks to the GPU
- Mali-T604 demo was previewed at MWC13
[Diagram labels: raw data from HDR sensor; OpenCL stages: noise reduction, HDR reconstruction, tone mapping, colour conversion, de-noising, gamma correction; rendering with OpenGL ES]
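Several of the stages named on this slide are per-pixel operations, which is what makes them a natural fit for a GPU: each pixel can be handled by an independent work-item. The sketch below chains two such stages on the CPU in plain Python. It is an illustration only, not the demo's pipeline: the Reinhard-style tone-mapping operator, the gamma value of 2.2 and all function names are assumptions, and the noise reduction and HDR reconstruction stages are left out.

```python
def tone_map(v):
    """Reinhard-style operator (an assumption): maps [0, inf) into [0, 1)."""
    return v / (1.0 + v)


def gamma_correct(v, gamma=2.2):
    """Encode a linear value in [0, 1] with a power-law gamma (2.2 assumed)."""
    return v ** (1.0 / gamma)


def to_8bit(v):
    """Quantize a value in [0, 1] to an 8-bit code, clamping out-of-range input."""
    return min(255, max(0, round(v * 255)))


# Stage order is an assumption; every stage reads and writes a single pixel.
PIPELINE = (tone_map, gamma_correct, to_8bit)


def process_pixel(v, stages=PIPELINE):
    """Apply every stage to one pixel; this is the work a single work-item does."""
    for stage in stages:
        v = stage(v)
    return v


def process_frame(frame, stages=PIPELINE):
    """Apply the pipeline to every pixel of a frame (a list of rows)."""
    return [[process_pixel(v, stages) for v in row] for row in frame]


# Linear HDR luminance samples, including values above 1.0.
hdr_frame = [[0.0, 0.5, 1.0], [4.0, 16.0, 64.0]]
ldr_frame = process_frame(hdr_frame)
```

Because no stage depends on neighbouring pixels, the frame loop can be replaced by one kernel launch over the image dimensions; stages that do use a neighbourhood, such as noise reduction, need extra care with memory access.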

Gesture User Interfaces
eyeSight's gesture recognition technology using GPU Compute on ARM's Mali-T600 offers unique capabilities
- Reduction of overall power consumption
- Reduction of load from the CPU
- Robust recognition in challenging lighting conditions
- Enhanced user experience
- Higher FPS for more gesture capabilities and features

Computer Vision Based Applications
Computer Vision entails the acquisition, processing, analysis and understanding of sensor data (images), in order to derive information that enables decisions to be made.
In this example (face detection study on Mali-T604 based silicon):
- Consistent 6x speed-up
- ~5x better energy efficiency
[Chart: energy used per unit of work (lower is better)]
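The two figures can be related through energy = power x time. The sketch below works out what they would imply about average power draw, assuming that "energy efficiency" means energy per unit of work and that both figures refer to the same workload and the same measurement boundary. The slide does not state either point, so the result is an inference, not a measurement.

```python
def relative_power(speed_up, energy_gain):
    """Average power of the accelerated run relative to the baseline.

    energy = power * time, so
    P_new / P_old = (E_new / E_old) / (t_new / t_old)
                  = (1 / energy_gain) / (1 / speed_up)
                  = speed_up / energy_gain
    """
    return speed_up / energy_gain


# Figures quoted on the slide: 6x faster, about 5x less energy per unit of work.
ratio = relative_power(speed_up=6.0, energy_gain=5.0)
print(f"Average power during the accelerated run: {ratio:.2f}x the baseline")
```

Under those assumptions the accelerated run draws somewhat more power while it is active, but finishes so much sooner that the energy spent per unit of work still falls by about a factor of five.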

Conclusions
Improve energy efficiency through heterogeneous computing
- Use the best processor for the task
- Balance workload across system resources
- Offload heavy parallel computation to the GPU
Bring the benefits of GPU Compute to key use cases
- Computational Photography and Advanced Imaging
- Next generation of multimedia codecs
- Computer Vision applications
The Mali Ecosystem is making GPU Compute a reality