Mali GPU acceleration of HEVC and VP9 Decoder

Similar documents
MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015

Multimedia Decoder Using the Nios II Processor

High Efficiency Video Coding: The Next Gen Codec. Matthew Goldman Senior Vice President TV Compression Technology Ericsson

Advanced Video Coding: The new H.264 video compression standard

Upcoming Video Standards. Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc.

High Efficiency Video Coding. Li Li 2016/10/18

MPEG-4: Simple Profile (SP)

HEVC The Next Generation Video Coding. 1 ELEG5502 Video Coding Technology

Video Codecs. National Chiao Tung University Chun-Jen Tsai 1/5/2015

Parallel H.264/AVC Motion Compensation for GPUs using OpenCL

Overview. Videos are everywhere. But can take up large amounts of resources. Exploit redundancy to reduce file size

White paper: Video Coding A Timeline

Introduction to Video Encoding

Introduction to Video Encoding

Transcoding from H.264/AVC to High Efficiency Video Coding (HEVC)

Mark Kogan CTO Video Delivery Technologies Bluebird TV

AVS VIDEO DECODING ACCELERATION ON ARM CORTEX-A WITH NEON

Parallelism In Video Streaming

PREFACE...XIII ACKNOWLEDGEMENTS...XV

Building an Area-optimized Multi-format Video Encoder IP. Tomi Jalonen VP Sales

High Efficiency Video Coding (HEVC) test model HM vs. HM- 16.6: objective and subjective performance analysis

Scalable Extension of HEVC 한종기

OVERVIEW OF IEEE 1857 VIDEO CODING STANDARD

H.264/AVC und MPEG-4 SVC - die nächsten Generationen der Videokompression

IMPLEMENTATION OF H.264 DECODER ON SANDBLASTER DSP Vaidyanathan Ramadurai, Sanjay Jinturkar, Mayan Moudgill, John Glossner

The Basics of Video Compression

Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier Montpellier Cedex 5 France

Transcoding from H.264/AVC to High Efficiency Video Coding (HEVC)

Lec 10 Video Coding Standard and System - HEVC

The VC-1 and H.264 Video Compression Standards for Broadband Video Services

The Scope of Picture and Video Coding Standardization

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain

The Benefits of GPU Compute on ARM Mali GPUs

Image and Video Coding I: Fundamentals

Intel Media Server Studio 2018 R1 - HEVC Decoder and Encoder Release Notes (Version )

Reducing/eliminating visual artifacts in HEVC by the deblocking filter.

Video Quality Analysis for H.264 Based on Human Visual System

High Efficiency Video Decoding on Multicore Processor

Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor. NikolaosBellas, IoannisKatsavounidis, Maria Koziri, Dimitris Zacharis

Video Encoding with. Multicore Processors. March 29, 2007 REAL TIME HD

H.264 Parallel Optimization on Graphics Processors

FAST: A Framework to Accelerate Super- Resolution Processing on Compressed Videos

Enabling a Richer Multimedia Experience with GPU Compute. Roberto Mijat Visual Computing Marketing Manager

Introduction of Video Codec

Using animation to motivate motion

Video compression Beyond H.264, HEVC

Intel Stress Bitstreams and Encoder (Intel SBE) HEVC Getting Started

Interframe coding A video scene captured as a sequence of frames can be efficiently coded by estimating and compensating for motion between frames pri

MediaTek High Efficiency Video Coding

Next Generation Visual Computing

H.264 AVC 4k Decoder V.1.0, 2014

Introduction to Video Compression

International Journal of Emerging Technology and Advanced Engineering Website: (ISSN , Volume 2, Issue 4, April 2012)

Lecture 13 Video Coding H.264 / MPEG4 AVC

Co-Design of Many-Accelerator Heterogeneous Systems Exploiting Virtual Platforms. SAMOS XIV July 14-17,

Development of Low Power ISDB-T One-Segment Decoder by Mobile Multi-Media Engine SoC (S1G)

THE H.264 ADVANCED VIDEO COMPRESSION STANDARD

EE 5359 H.264 to VC 1 Transcoding

2014 Summer School on MPEG/VCEG Video. Video Coding Concept

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames

Anatomy of a Video Codec

Video Compression Standards (II) A/Prof. Jian Zhang

A macroblock-level analysis on the dynamic behaviour of an H.264 decoder

Image and Video Coding I: Fundamentals

ENCODER COMPLEXITY REDUCTION WITH SELECTIVE MOTION MERGE IN HEVC ABHISHEK HASSAN THUNGARAJ. Presented to the Faculty of the Graduate School of

Objective: Introduction: To: Dr. K. R. Rao. From: Kaustubh V. Dhonsale (UTA id: ) Date: 04/24/2012

Video Coding Using Spatially Varying Transform

CMPT 365 Multimedia Systems. Media Compression - Video

Performance Analysis of H.264 Encoder on TMS320C64x+ and ARM 9E. Nikshep Patil

10.2 Video Compression with Motion Compensation 10.4 H H.263

Professor, CSE Department, Nirma University, Ahmedabad, India

COMPARISON OF HIGH EFFICIENCY VIDEO CODING (HEVC) PERFORMANCE WITH H.264 ADVANCED VIDEO CODING (AVC)

Integrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM

Week 14. Video Compression. Ref: Fundamentals of Multimedia

A Novel Deblocking Filter Algorithm In H.264 for Real Time Implementation

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS

EFFICIENT DEISGN OF LOW AREA BASED H.264 COMPRESSOR AND DECOMPRESSOR WITH H.264 INTEGER TRANSFORM

Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks Amirali Boroumand

IMPLEMENTATION OF DEBLOCKING FILTER ALGORITHM USING RECONFIGURABLE ARCHITECTURE

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers

Smoooth Streaming over wireless Networks Sreya Chakraborty Final Report EE-5359 under the guidance of Dr. K.R.Rao

Testing HEVC model HM on objective and subjective way

ESE532 Spring University of Pennsylvania Department of Electrical and System Engineering System-on-a-Chip Architecture

Parallel Scalability of Video Decoders

Overview, implementation and comparison of Audio Video Standard (AVS) China and H.264/MPEG -4 part 10 or Advanced Video Coding Standard

Module Introduction. Content 15 pages 2 questions. Learning Time 25 minutes

LIST OF TABLES. Table 5.1 Specification of mapping of idx to cij for zig-zag scan 46. Table 5.2 Macroblock types 46

High Efficiency Video Coding (HEVC)

A Parallel Transaction-Level Model of H.264 Video Decoder

MPEG-2. ISO/IEC (or ITU-T H.262)

Updates in MPEG-4 AVC (H.264) Standard to Improve Picture Quality and Usability

ECE 417 Guest Lecture Video Compression in MPEG-1/2/4. Min-Hsuan Tsai Apr 02, 2013

PERFORMANCE ANALYSIS OF AVS-M AND ITS APPLICATION IN MOBILE ENVIRONMENT

Architecture Considerations for Multi-Format Programmable Video Processors

"Block Artifacts Reduction Using Two HEVC Encoder Methods" Dr.K.R.RAO

Performance analysis of VP8 Video Decoder on TMS Platform:

CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC

FRAME-RATE UP-CONVERSION USING TRANSMITTED TRUE MOTION VECTORS

How an MPEG-1 Codec Works

Transcription:

Mali GPU acceleration of HEVC and VP9 Decoder

2 Web Video continues to grow!!! Video accounted for 50% of the mobile traffic in 2012 - Citrix ByteMobile's 4Q 2012 Analytics Report. Globally, IP video traffic will be 79 percent of all consumer Internet traffic in 2018, up from 66 percent in 2013 - Cisco Visual Networking Index: Forecast and Methodology, 2013-2018, Cisco Systems, Inc

3 Web Video continues to grow!!! Slide Source : Google I/O 2013

Video compression is the solution!!! The existing videos have to be compressed more.. Offers upto 50% more compression HEVC and VP9 are two new video coding standards

HEVC standard HEVC aka H.265 is a video compression standard, jointly developed by ISO/IEC MPEG and ITU-T VCEG Licensing cost to patent holders ~ $0.20 per unit as Royalty Courtesy : Qualcomm Inc

VP9 standard VP9 is a video compression standard, developed by Google No patent licensing cost Royalty FREE Courtesy : Google I/O 2013

HEVC and VP9 computational complexity Higher compression ratio comes with a higher computational complexity Multiple factors makes HEVC and VP9 more complex Recursive Coding block structure Non-square inter prediction block sizes Higher size of Transforms Higher number of intra prediction angles Higher taps in filters Additional tools such as SAO etc., Implementing HEVC and VP9 decoders on mobile offers multiple choices and hence multiple challenges

HEVC and VP9 on Mobile processors Key Vectors CPU Only DSP Only H/W based CPU + GPU / Open CL Power Consumption High Moderate Low Moderate Silicon Cost Low High High Low Development Lead Time Low High High Low Upgradability High High Low High Portability High Low None High CPU + GPU offers a good trade-off!!!

HEVC/VP9 decoder GPU acceleration GPUs are generally idle during video playout GPUs are massively multithreaded devices. i.e., GPUs will require hundreds or thousands of threads to be executed in parallel at any given time So only highly data parallel algorithms inside a video codec can be efficiently offloaded to the GPU Offloading the video decode task to GPU, would enable the CPU to perform other tasks

HEVC/VP9 Decoder with GPU Acceleration Parsing & Entropy Decode Motion Compensa tion Inverse Quant Intra Prediction Inverse Transform Reconstru ction Loop filtering Not suitable for GPU execution Data parallel execution possible, suitable for GPU execution

Motion compensation The current picture/frame pixels is predicted from the reference frame s pixels The reference picture can be from past or future The prediction happens on a block-by-block basis And there can be multiple reference frames for each block

Motion compensation The most compute intensive part of Motion compensation is sub-pixel interpolation Luma 8 filter Chroma 4 tap filter in HEVC and 8 tap filter in VP9 Sub pixel interpolation is data parallel, i.e., interpolation of each block within a frame can happen in parallel and hence suited for GPU computing

Inverse quantization and transforms The residue value need to be Inverse quantized 2-D Inverse DCT transformations should be performed over the inverse quantized data Both these process are data parallel and 2-D Inverse DCT in particular is highly compute intensive and hence suitable for GPU computing Data from Bitstream parse Inverse Quant Inverse Transform Residue data output to Reconstruction

Reconstruction and In-loop filters Reconstruction : The output from the Motion compensation and intra prediction should be added with the output from Inverse transform In loop filtering such as Deblocking and SAO(only HEVC) filters are applied over reconstructed samples Again these are data parallel and compute intensive jobs that can be accelerated with GPU Parsing & Entropy Decode Motion Compensa tion Inverse Quant Intra Prediction Inverse Transform Recon Deblocking & SAO

Benefits of Mali GPUs for Video Mali GPUs are highly suited for Video acceleration, with a easyto-optimize architecture The 128-bit vector processing Suits DSP algorithms like Video processing Presence of GPU cache instead of Local memory No requirement for data transfers from/to global memory. Can be understood just like a CPU Flexible OpenCL workgroup size Works optimizally for a large range of OpenCL workgroup sizes. Multiple block sizes in a Video frame can be handled efficiently No divergent threads Similar to CPU code, conditional code can be used in OpenCL kernels as well. Different kinds of filter types, filter lengths etc., in video decode can be handled efficiently CCI/CCN support Mali GPUs supports ARM CCI/CCN technologies for cache coherent access of data between CPU and GPU. This will reduce the DDR memory stalls and cache synchronization overheads significantly

Challenges Efficient Partitioning of work between CPU and GPU The effective FPS of decoder will be the minimum of the FPS achieved by the CPU and GPU for their respective work So the partitioning needs to be efficient so that both of them perform their respective work at almost the same speed(fps) Efficient pipelining data between CPU and GPU The algorithms running on CPU will depend on the output of algorithms from GPU and/or vice versa A good design should make sure neither the CPU nor the GPU spend any time waiting for the output of the other Cache coherency Cache coherency between CPU and GPU data need to ensured.

Summary Online video traffic increases rapidly HEVC and VP9 answers the video traffic problem, by offering upto 50% more compression on video HEVC and VP9 decoding on mobile is challenging CPU + GPU based solutions offers a good trade off GPU accelerated video decoding is feasible, but with certain challenges The power and performance results of GPU accelerated video decoder is very impressive THANK YOU