The Benefits of GPU Compute on ARM Mali GPUs
Tim Hartley
SEMICON Europa 2014
ARM Introduction
World-leading semiconductor IP
- Founded in 1990
- 1060 processor licenses sold to more than 350 companies
- > 10bn ARM-based chips in 2013
- > 50bn ARM-based chips to date
Products
- CPUs
- Multimedia processors
- Interconnect
- Physical IP
Business model
- Designing and licensing of IP
- Not manufacturing or selling chips
The Evolution of Mobile GPU Compute
OpenGL ES 1.1: Fixed pipeline
- ARM Mali-55 GPU
OpenGL ES 2.0: Programmable pipeline
- Mali-200, Mali-300, Mali-400 MP, Mali-450 MP
OpenCL Full Profile / RenderScript: Portable Heterogeneous Parallel Computation
- Mali-T600 Series, Mali-T700 Series
OpenGL ES 3.1 Compute Shaders: GPU Compute within OpenGL ES API
[Timeline axis: 2007, 2009, 2010, 2012, 2013]
GPU Compute: Improve Existing and Enable New Solutions
Increased system-level energy efficiency
- Better load-balance across system resources
- Complement CPU processing
- Enable choice of best processor for the job
- Use heterogeneous compute APIs designed for concurrency
Free up CPU resource
- Offload non-graphical computational tasks to GPU
Flexibility, portability and programmability
- Software solution leveraging CPU+GPU subsystem
- Industry standard portable APIs
Improve user experience
- Remove computational barrier to improve visual quality, responsiveness, accuracy within existing compute & energy budgets
Reduce cost, risk and TTM
- Enable new applications using existing silicon design
(Pre)history of the Camera Phone
- 1H 2000: 0.35 Mpix; 20 photos max; needs downloading
- 2H 2000: 0.11 Mpix; browse and send supported
- 2H 2002: 0.3 Mpix (640x480); self-timer; digital zoom; white balance control; filters
- 2004: 1.3 Mpix; 8-photo burst mode; wireless download; near printing quality
- 2005: 2 Mpix; LED flash; professional optics; autofocus
- 2006: 3.2 Mpix; Xenon flash; autofocus; image stabilization
- 2007: 5 Mpix; video record 30fps
- 2012: 8 Mpix; autofocus; flash; zero shutter lag
- 2013: 8 Mpix; video stabilization; face detection; panorama; take photos while shooting videos
- 2013: 41 Mpix; professional grade optics; optical image stabilization; auto/manual focus; two flashes: Xenon + LED; great user control (exposure, ISO, more)
The Spring of Computational Photography
Multi image processing
- Picture-in-picture dual shot
- Beautification
- Best face photo composition
- HDR & 3D Panorama
Real time processing
- Stabilization, de-noising
- Face and features detection
- Motion capture
Post production
- Advanced image filters
- Occlusion removal
- 2D to 3D conversion
And many more.
Typical Imaging Pipeline (Simplified)
Capture
- Lens
- CMOS Sensor
Control (drive sensor, aperture, etc.)
- Auto Focus (AF)
- Auto Exposure (AE)
- Auto White Balance (AWB)
Processing stages
- De-mosaicing
- Bad pixel correction
- Lens correction (Shading, Distortion, Vignetting)
- White balance
- Noise reduction
- Colour correction
- Tone mapping
- Sharpening
- Gamma correction
- Edge enhancement
- HDR re-composition
- Segmentation
- Feature detection
- Depth extraction
- Up-scaling
- Stitching
- JPEG encoding
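Two of the stages listed above, white balance and gamma correction, are simple per-pixel operations, which is part of why they map well onto a GPU. The following is only a minimal sketch in plain Python, not the pipeline from the talk: the function names, the channel gains and the gamma value of 2.2 are all assumptions chosen for illustration.

```python
# Illustrative sketch only: per-pixel white balance and gamma correction
# on a tiny RGB image stored as nested lists of floats in [0.0, 1.0].
# The gains and gamma value below are assumed, not taken from the talk.

def white_balance(pixel, gains):
    """Scale each channel by its gain and clamp the result to [0, 1]."""
    return tuple(min(1.0, max(0.0, c * g)) for c, g in zip(pixel, gains))

def gamma_correct(pixel, gamma=2.2):
    """Encode linear values with a power-law curve (exponent 1/gamma)."""
    return tuple(c ** (1.0 / gamma) for c in pixel)

def process(image, gains=(1.8, 1.0, 1.5), gamma=2.2):
    """Apply white balance, then gamma correction, to every pixel."""
    return [[gamma_correct(white_balance(p, gains), gamma) for p in row]
            for row in image]

if __name__ == "__main__":
    tiny = [[(0.10, 0.20, 0.15), (0.50, 0.50, 0.50)]]
    print(process(tiny))
```

Each output pixel depends only on the matching input pixel, so in an OpenCL port every work-item could handle one pixel independently.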
The Platform
InSignal Arndale Development Board
- Based on Samsung Exynos 5250 Processor
- Dual core ARM Cortex-A15 CPU clocked at 1.7 GHz
- Quad core ARM Mali-T604 GPU
- OpenCL 1.1 Full Profile
- Linux
[Figure: Mali-T604 high level architecture]
GPU Compute HDR Prototyping
- Heterogeneous Compute Processing: CPU + GPU + ISP
- Targeted interoperation optimizations
Pipeline stages (from diagram)
- Native HDR Sensor
- De-Bayer
- Gamma Correction
- Colour Conversion
- Blur/Reduce
- HDR Reconstruction
- Rendering
- Tone Mapping
- Noise Reduction
APIs: OpenCL, OpenGL ES
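To make the HDR Reconstruction and Tone Mapping stages concrete, here is a minimal sketch in plain Python. The slide does not say which algorithms the prototype uses, so both choices are assumptions: reconstruction is shown as a simple two-exposure merge (hypothetical function `merge_exposures`), and tone mapping uses Reinhard's global operator L / (1 + L), swapped in purely for illustration.

```python
# Illustrative sketch only. Values are linear luminance samples.
# Assumption: two exposures of the same scene, where the long exposure
# collected `ratio` times more light than the short one.

def merge_exposures(short, long_, ratio, saturation=0.95):
    """Keep the long exposure where it is not saturated; otherwise
    fall back to the short exposure scaled up by the exposure ratio."""
    return [l if l < saturation else s * ratio
            for s, l in zip(short, long_)]

def reinhard_tone_map(luminance):
    """Compress HDR luminance into [0, 1) with L / (1 + L)."""
    return [l / (1.0 + l) for l in luminance]

if __name__ == "__main__":
    short = [0.01, 0.20]   # dark pixel, bright pixel (short exposure)
    long_ = [0.08, 1.00]   # same pixels; the bright one is clipped
    hdr = merge_exposures(short, long_, ratio=8.0)
    print(hdr)
    print(reinhard_tone_map(hdr))
```

Both steps are again independent per pixel, which suits the one-work-item-per-pixel style of an OpenCL kernel.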
Interoperation Optimizations
[Figure: ratio of time spent processing data on the GPU vs the total host application time (per frame), for "No Interop" and "With Interop". Segments: Processing time on GPU, Copy frame for rendering, Other]
- Reduction of overheads
- Improvement in efficiency
- >50% reduction in total execution time
[Figure: relative comparison of execution time per frame (lower is better)]
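The two quantities on this slide, the GPU's share of per-frame time and the reduction in total time, can be written as simple arithmetic. The sketch below uses invented per-frame timings purely to show the calculation. They are not measurements from the talk, and the function names are hypothetical.

```python
# Illustrative arithmetic only. All timings are HYPOTHETICAL values in
# milliseconds per frame, not data from the presentation.

def frame_time(gpu_ms, copy_ms, other_ms):
    """Total host application time for one frame."""
    return gpu_ms + copy_ms + other_ms

def gpu_share(gpu_ms, copy_ms, other_ms):
    """Fraction of the frame time spent processing data on the GPU."""
    return gpu_ms / frame_time(gpu_ms, copy_ms, other_ms)

def reduction(before_ms, after_ms):
    """Fractional reduction in execution time (0.5 means halved)."""
    return 1.0 - after_ms / before_ms

if __name__ == "__main__":
    # Assumed example: interop removes the copy of the frame for
    # rendering and trims some of the other overhead.
    no_interop = dict(gpu_ms=10.0, copy_ms=12.0, other_ms=6.0)
    with_interop = dict(gpu_ms=10.0, copy_ms=0.0, other_ms=3.0)
    before = frame_time(**no_interop)
    after = frame_time(**with_interop)
    print(f"GPU share without interop: {gpu_share(**no_interop):.2f}")
    print(f"GPU share with interop:    {gpu_share(**with_interop):.2f}")
    print(f"Reduction in frame time:   {reduction(before, after):.2f}")
```

With the copy removed, the GPU processing time is unchanged but makes up a larger share of a shorter frame, which is the sense in which efficiency improves.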
Partner Use-case: Apical
Optimizing for GPU Compute
- Bringing a smartphone camera closer to a DSLR requires more dynamic range and better low-light performance
- Requires burst capture with per-pixel motion compensation
- Potential to do better than OIS for low light
- Apical is presenting its latest algorithms at SPIE
- Apical is already porting to OpenCL on ARM Mali GPUs
- Complex algorithms, impossible on CPU alone
[Figure: CPU / NEON vs. with GPU acceleration. Stages: Block-level motion compensation, Texture matching, Image reconstruction]
- Expect around 20x reduction in processing time
Image Processing Proven Benefits
[Figure labels: MULTICOREWARE [1]; OpenCL JPEG; ARCSOFT; NEON; MULTICOREWARE [2]]
- 5x reduction in energy consumed
[1] Acceleration compares RenderScript compiled on device (LLVM) on dual-core ARM Cortex-A15 and ARM Mali-T604 on a stock Google Nexus 10 device
[2] Battery drain test measured on Google Nexus 10; 30 iterations of de-shake transcoding
Computer Vision (Face and Gesture Detection)
- Increased robustness and detection accuracy in poor lighting conditions
[Figure: Face Detection Relative Comparison]
- Tested on an instrumented InSignal Arndale Community Board
- Algorithm based on OpenCV Face Detection example
- OpenCL kernels re-written and optimized for Mali-T604
- Average results represented, permuting CPU and GPU operational frequencies
Face Detection Using OpenCL
HEVC and VP9 Decode Using OpenCL
- ARM is collaborating with several codec vendors
- Ensuring widest availability of HEVC across multiple ARM-based platforms
- Enabling HEVC early, in software, through ARM NEON and GPU Compute
- Multiple partners developing OpenCL-enabled HEVC codecs for ARM Mali-T600 GPU
Resources
Hardware
- Samsung Chromebook (binary drivers available from ARM website)
- InSignal Arndale, available from www.arndaleboard.org (Linux BSP available from ARM)
- Google Nexus 10
Developer tools and other resources
- http://www.malideveloper.com/
- Forum: http://community.arm.com/groups/arm-mali-graphics
- SDK (example code)
- Mali-T600 Series OpenCL Developer Guide
- ARM DS-5 Streamline performance analyzer
Thank You
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners.