MULTIMEDIA PROCESSING Real-time H.264 video enhancement by using AMD APP SDK

Similar documents
OPENCL TM APPLICATION ANALYSIS AND OPTIMIZATION MADE EASY WITH AMD APP PROFILER AND KERNELANALYZER

INTRODUCTION TO OPENCL TM A Beginner s Tutorial. Udeepta Bordoloi AMD

Use cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games

ADVANCED RENDERING EFFECTS USING OPENCL TM AND APU Session Olivier Zegdoun AMD Sr. Software Engineer

viewdle! - machine vision experts

SIMULATOR AMD RESEARCH JUNE 14, 2015

Automatic Intra-Application Load Balancing for Heterogeneous Systems

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

AMD CORPORATE TEMPLATE AMD Radeon Open Compute Platform Felix Kuehling

Panel Discussion: The Future of I/O From a CPU Architecture Perspective

THE PROGRAMMER S GUIDE TO THE APU GALAXY. Phil Rogers, Corporate Fellow AMD

AMD Graphics Team Last Updated February 11, 2013 APPROVED FOR PUBLIC DISTRIBUTION. 1 3DMark Overview February 2013 Approved for public distribution

INTERFERENCE FROM GPU SYSTEM SERVICE REQUESTS

The Rise of Open Programming Frameworks. JC BARATAULT IWOCL May 2015

AMD APU and Processor Comparisons. AMD Client Desktop Feb 2013 AMD

STREAMING VIDEO DATA INTO 3D APPLICATIONS Session Christopher Mayer AMD Sr. Software Engineer

AMD Graphics Team Last Updated April 29, 2013 APPROVED FOR PUBLIC DISTRIBUTION. 1 3DMark Overview April 2013 Approved for public distribution

Run Anywhere. The Hardware Platform Perspective. Ben Pollan, AMD Java Labs October 28, 2008

GPGPU COMPUTE ON AMD. Udeepta Bordoloi April 6, 2011

Heterogeneous Computing

clarmor: A DYNAMIC BUFFER OVERFLOW DETECTOR FOR OPENCL KERNELS CHRIS ERB, JOE GREATHOUSE, MAY 16, 2018

AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING FELLOW 3 OCTOBER 2016


Solid State Graphics (SSG) SDK Setup and Raw Video Player Guide

AMD IOMMU VERSION 2 How KVM will use it. Jörg Rödel August 16th, 2011

KVM CPU MODEL IN SYSCALL EMULATION MODE ALEXANDRU DUTU, JOHN SLICE JUNE 14, 2015

AMD 780G. Niles Burbank AMD. an x86 chipset with advanced integrated GPU. Hot Chips 2008

Desktop Telepresence Arrived! Sudha Valluru ViVu CEO

ROCm: An open platform for GPU computing exploration

FUSION PROCESSORS AND HPC

BIOMEDICAL DATA ANALYSIS ON HETEROGENEOUS PLATFORM. Dong Ping Zhang Heterogeneous System Architecture AMD

EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON GPUS USING THE CSR STORAGE FORMAT

EXPLOITING ACCELERATOR-BASED HPC FOR ARMY APPLICATIONS

CAUTIONARY STATEMENT 1 AMD NEXT HORIZON NOVEMBER 6, 2018

MEASURING AND MODELING ON-CHIP INTERCONNECT POWER ON REAL HARDWARE

SOLUTION TO SHADER RECOMPILES IN RADEONSI SEPTEMBER 2015

Fusion Enabled Image Processing

AMD RYZEN PROCESSOR WITH RADEON VEGA GRAPHICS CORPORATE BRAND GUIDELINES

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to

Vulkan (including Vulkan Fast Paths)

Gestural and Cinematic Interfaces - DX11. David Brebner Unlimited Realities CTO

AMD Radeon ProRender plug-in for Unreal Engine. Installation Guide

Understanding GPGPU Vector Register File Usage

Generic System Calls for GPUs

MIGRATION OF LEGACY APPLICATIONS TO HETEROGENEOUS ARCHITECTURES Francois Bodin, CTO, CAPS Entreprise. June 2011

HPG 2011 HIGH PERFORMANCE GRAPHICS HOT 3D

SCALING DGEMM TO MULTIPLE CAYMAN GPUS AND INTERLAGOS MANY-CORE CPUS FOR HPL

NEXT-GENERATION MATRIX 3D IMMERSIVE USER INTERFACE [ M3D-IUI ] H Raghavendra Swamy AMD Senior Software Engineer

The Road to the AMD. Fiji GPU. Featuring Die Stacking and HBM Technology 1 THE ROAD TO THE AMD FIJI GPU ECTC 2016 MAY 2015

AMD HD3D Technology. Setup Guide. 1 AMD HD3D TECHNOLOGY: Setup Guide

ACCELERATING MATRIX PROCESSING WITH GPUs. Nicholas Malaya, Shuai Che, Joseph Greathouse, Rene van Oostrum, and Michael Schulte AMD Research

HyperTransport Technology

RegMutex: Inter-Warp GPU Register Time-Sharing

Anatomy of AMD s TeraScale Graphics Engine

Designing Natural Interfaces

1 HiPEAC January, 2012 Public TASKS, FUTURES AND ASYNCHRONOUS PROGRAMMING

LIQUIDVR TODAY AND TOMORROW GUENNADI RIGUER, SOFTWARE ARCHITECT

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management

FLASH MEMORY SUMMIT Adoption of Caching & Hybrid Solutions

HIGHLY PARALLEL COMPUTING IN PHYSICS-BASED RENDERING OpenCL Raytracing Based. Thibaut PRADOS OPTIS Real-Time & Virtual Reality Manager

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to

3D Numerical Analysis of Two-Phase Immersion Cooling for Electronic Components

Accelerating Applications. the art of maximum performance computing James Spooner Maxeler VP of Acceleration

Optimizing Film, Media with OpenCL & Intel Quick Sync Video

OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data

Sequential Consistency for Heterogeneous-Race-Free

Maximizing Six-Core AMD Opteron Processor Performance with RHEL

Graphics Hardware 2008

D3D12 & Vulkan: Lessons learned. Dr. Matthäus G. Chajdas Developer Technology Engineer, AMD

More performance options

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing

MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015

OpenCL API. OpenCL Tutorial, PPAM Dominik Behr September 13 th, 2009

Driver Options in AMD Radeon Pro Settings. User Guide

1 Presentation Title Month ##, 2012

AMD AIB Partner Guidelines. Version February, 2015

AMD SEV Update Linux Security Summit David Kaplan, Security Architect

Multi-core processors are here, but how do you resolve data bottlenecks in native code?

PROTECTING VM REGISTER STATE WITH AMD SEV-ES DAVID KAPLAN LSS 2017

AMD S X86 OPEN64 COMPILER. Michael Lai AMD

Intel SDK for OpenCL* - Sample for OpenCL* and Intel Media SDK Interoperability

Cilk Plus: Multicore extensions for C and C++

DR. LISA SU

High Performance Graphics 2010

Pattern-based analytics to estimate and track yield risk of designs down to 7nm

Changing your Driver Options with Radeon Pro Settings. Quick Start User Guide v2.1

AMD Radeon ProRender plug-in for Universal Scene Description. Installation Guide

ASYNCHRONOUS SHADERS WHITE PAPER 0

Changing your Driver Options with Radeon Pro Settings. Quick Start User Guide v3.0

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing

AMD EPYC CORPORATE BRAND GUIDELINES

AMD RYZEN CORPORATE BRAND GUIDELINES

INTRODUCING RYZEN MARCH

Fan Control in AMD Radeon Pro Settings. User Guide. This document is a quick user guide on how to configure GPU fan speed in AMD Radeon Pro Settings.

Release Notes Compute Abstraction Layer (CAL) Stream Computing SDK New Features. 2 Resolved Issues. 3 Known Issues. 3.

Microsoft Windows 2016 Mellanox 100GbE NIC Tuning Guide

AUDIO BENEFITS AND CHALLENGES OF HETEROGENEOUS COMPUTING. Carl Wakeland AMD Principal Member of Technical Staff

AMD Security and Server innovation

ISim Hardware Co-Simulation Tutorial: Accelerating Floating Point Fast Fourier Transform Simulation

JavaFX. JavaFX 2.2 System Requirements Release 2.2 E

Transcription:

MULTIMEDIA PROCESSING Real-time H.264 video enhancement by using AMD APP SDK Wei-Lien Hsu AMD SMTS Gongyuan Zhuang AMD MTS

OUTLINE Motivation OpenDecode Video deblurring algorithms Acceleration by clamdfft Experimental results Conclusion and Remarks 3 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

MOTIVATION In order to support high-quality video playback with very low power consumption, AMD Unified Video Decoder (UVD) technology has completely offloaded H.264/AVC, VC-1, MPEG2 and MPEG4 video codec tasks from CPU. UVD engine can be accessed by using DXVA API on Windows or XVBA API on Linux. For OpenCL developers to use the hardware video decoder, AMD APP SDK introduces OpenDecode API. (OpenDecode can be directly used my application, does not require plug-in) OpenDecode is fully interoperable with the OpenCL, post-processing of the decompressed video can be efficiently performed directly inside the GPU memory without the need to copy data back and forth between CPU and GPU. This presentation demonstrates a working example for using OpenDecode to decode H.264 video content OpenCL FFT kernel library routines to enhance the visual quality in shader processors. 4 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

OPENDECODE OpenDecode is part of AMD OpenVideo implementation that provides a way for OpenCL video applications to leverage the powerful UVD unit inside AMD Radeon graphics cards. AMD OpenVideo Decode API design goals: Be interoperable with the OpenCL API, which allows for shared surface memory between the two domains Provide platform-independent video codec functions Support multiple video codec standards Support full decoding acceleration Extendable to support hardware encoding functions in the future 5 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

OPENDECODE Decoding framework Ineroperability of OpenCL The AMD extensions to OpenCL provide functions for obtaining the platform context handle for creating: OpenDecode sessions Output surface buffers Command buffer queues Kernel objects OpenDecode API is a thin layer, it designed is to passes parameters to multimedia driver (in CL- UVDLib) for executing decoding tasks. Open GL API Layer CAL OpenCL API Layer OpenPlatform Intertop Layer CL-UVDLib Applications OpenDecode API Layer OpenDecode -CAL I/F 6 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

OPENDECODE Decoding pipeline Decompression takes place on the GPU, it operates concurrently with CPU, GPU compute and GPU graphics. Decode Post Processing Render App OpenDecode OpenCL OpenGL API UVD Compute Shaders Display HW 7 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

OPENDECODE The essential steps in performing a decode by using the Open Video Decode API can be summarized as follows: Initialization Context creation Session creation Decode execution Process Decode Output Session and context destruction 8 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

OPENDECODE Step 1: Initialization Use the OpenDecode functions to get the Device Info. and Capabilities: OVDecodeGetDeviceInfo Obtains the information about the device (including number of devices, device ID, Cap size of each device) OVDecodeGetDeviceCaps Obtains the supported output format and the compression profile. The application then can verify that these values support the requested decode OVDecodeGetDeviceInfo(&numDevices, deviceinfo); for(unsigned int i = 0; i < numdevices; i++){ ovdecode_cap *caps = new ovdecode_cap[deviceinfo[i].decode_cap_size]; status = OVDecodeGetDeviceCap(deviceInfo[i].device_id,deviceInfo[i].decode_cap_size,caps); for(unsigned int j = 0; j < deviceinfo[i].decode_cap_size; j++){ if(caps[j].output_format == OVD_NV12_INTERLEAVED_AMD && caps[j].profile == OVD_H264_HIGH_41){ ovdeviceid = deviceinfo[i].device_id; break;} } } 9 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

OPENDECODE Step 2: Creating the Context Create a context for the decode by using clcreatecontext: Initializes the OpenDecode function pointers Creates callbacks functions Allocates buffers // Use clgetplatformid() to obtain a platform_id provided by Advanced Micro Devices, Inc. intptr_t properties[] = {CL_CONTEXT_PLATFORM,(cl_context_properties)platform_id, CL_WGL_HDC_KHR, (inptr_t)hdc, CL_GL_CONTEXT_KHT, (inptr_t)hrc, 0}; ovdcontext = clcreatecontext( properties, 1, &cldeviceid, 0, 0, &err); 10 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

OPENDECODE Step 3: Creating the Session Create a decode session using OVDecodeCreateSession based on input/output format and video resolution: Call UVDLib to creates a decode driver session (initializes parameters based on input data) Performs the internal resource allocation (creates comand buffers and allocates surface memory) Initializes UVD configuration, register UVD client and send Start Decoder Session message to firmware. ovdecode_profile profile = OVD_H264_HIGH_41; ovdecode_format oformat = OVD_NV12_INTERLEAVED_AMD; owidth = video_width; oheight = video_height; session = OVDecodeCreateSession( ovdcontext, ovdeviceid, profile, oformat, owidth, oheight); 11 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

OPENDECODE Step 4: Decoding Decode execution goes through OpenCL and starts the UVD decode function output surface = clcreatebuffer((cl_context)ovdcontext, CL_MEM_READ_WRITE, host_ptr_size, NULL, &err); OVresult res = OVDecodePicture(session, &picture_parameter,//profile, level, resolution &pic_parameter_2, pic_parameter_2_size,// codec dependent &bitstream_data, bitstream_data_read_size, slice_data_control, slice_data_control_size,//multi-slice output_surface, num_event_in_wait_list, NULL, &eventrunvideoprogram, 0); clwaitforevents(1, (cl_event *)&(eventrunvideoprogram)); OVDecodePicture Latency: ~5ms for 720x480; ~12ms for 1920x1080 12 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

OPENDECODE Step 5: Process Decode Output The decompressed video data can be read to CPU or post-processed by using the GPU compute shader units, then displayed by OpenGL rendering pipeline Read nv12 data from video buffer to host buffer clenqueuereadbuffer(cl_cmd_queue, Post-process nv12_buffer, CL_TRUE, 0, clenqueuendrangekernel(cl_cmd_queue, owidth*oheight*1.5, g_decoded_frame, 0, NULL, 0); nv12_to_rgb_kernel, 2, 0, // YUV to RGB conversion kernel globalthreads, localthreads,0, 0, 0); 13 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

OPENDECODE Step 6: Destroying Session and Context The final step is to release the resources and close the decode session OVDecodeDestroySession Frees allocation of resources needed for the decode session and set the UVD clock to idle state clreleasecontext Frees driver session and all internal resources clreleasememobject((cl_mem)output_surface); OVDecodeDestroySession(session); clreleasecontext((cl_context)ovdcontext); 14 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

VIDEO DEBLURRING ALGORITHMS Video/Image enhancement by using Deblurring Algorithms Aims to remove undesired artifacts in images, including blurring introduced by an optical system, a motion blur, and a noise from electronic sources Removing blur leads to deconvolution techniques The complex deconvolution processing methods can be very time consuming and their computation can be challenging even for recent computer hardware This presentation demonstrates a detailed performance evaluation of deconvolution algorithms with respect to their OpenCL acceleration 15 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

VIDEO DEBLURRING ALGORITHMS Wiener Filter http://blogs.mathworks.com/steve/2007/11/02/image-deblurring-wiener-filter/ A simple frequency domain deconvolution method Assume a shift-invariant blur model with independent additive noise, which is given by y(m,n) = h(m,n)*x(m,n) +u(m,n) For known blurring and noise models, Wiener Filter only requires 2 FFTs for convolving each color channel with the inverse filter. However, its quality is usually much worse than sophisticated deconvolution methods 16 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

VIDEO DEBLURRING ALGORITHMS Krishnan and Fergus s algorithm Recently Dilip Krishnan and Rob Fergus have provided a fast deconvolution algorithm that yields high quality results This method use a Hyper-Laplacian image prior to regularizing the deconvolution problem The resulting optimization problem is solved by using alternating minimization in conjunction with a half-quadratic penalty While one sub-problem is to calculate the deblurred image by FFT, the other sub-problem is to find a optimal auxiliary variable W This method has been shown to be efficiently accelerated by GPU programming 17 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

VIDEO DEBLURRING ALGORITHMS The complete deconvolution algorithm using Hyper-Laplacian Priors refer to Fast Image Deconvolution using Hyper-Laplacian Priors by Dilip Krishnan and Rob Fergus http://cs.nyu.edu/~dilip/wordpress/papers/fid_nips09.pdf 18 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

VIDEO DEBLURRING ALGORITHMS W-sub problem For a fixed value of α, w in (5) only depends on two variables, β and v, hence can easily be tabulated to form a lookup table X-sub problem Computational requirement: Krishnan and Fergus s method requires 3 FFTs for each iteration and a total of 18 FFTs for the entire deconvolution (for T=1 and 6 values of β) for each color channel. 19 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

ACCELERATION BY CLAMDFFT clamdfft is an OpenCL library implementation of discrete Fast Fourier Transforms clamdfft 1.2 features: Provides a fast and accurate platform for calculating discrete FFTs Single precision floating point formats Works on CPU, GPU, or APU 1D, 2D, and 3D transforms with batch support Planar and interleaved data formats Power-of-2 dimensions (other radices in progress) 20 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

ACCELERATION BY CLAMDFFT Client program provided Initialization (create OpenCL context and set up devices) clamdfftsetup( setupdata.get( )) Data Initialization clamdfftcreatedefaultplan(&plhandle, context, dim, cllengths) clamdfftsetlayout(plhandle, inlayout, outlayout );//set 2D, I/O buffers, Planar complex Bake Plan clamdfftbakeplan( plhandle, 1, &queue, NULL, NULL ) Set Up Temp Buffer as needed clamdfftgettmpbufsize(plhandle, &buffer size ) clcreatebuffer ( context, CL_MEM_READ_WRITE, buffer size, 0, &medstatus); 21 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

ACCELERATION BY CLAMDFFT, CONT'D Execution clamdfftenqueuetransform( plhandle, CLFFT_FORWARD, 1, &queue, 0, NULL, &outevent, &clmembuffersin[ 0 ], BuffersOut, clmedbuffer ) clfinish(queue) Read Output Buffer clenqueuereadbuffer(queue,clmembuffersin[0],cl_true,0,buffsizebytesin,&output[0],0,null,null) Destroy and Clean-up clamdfftdestroyplan( &plhandle ) clamdfftteardown( ) 22 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

EXPERIMENTAL RESULTS Run times on an AMD Phenom II X4 Processor (2.6 GHz, 3.25GB RAM) with 32-bit Windows 7 system and AMD Radeon HD 6950 graphics card: 1920x1080 H.264 decode and 1024x1024 FFT for Wiener Filter 26.5ms per frame 1920x1080 H.264 decode and 1024x1024 FFT for Krishnan and Fergus s method 106ms per frame 720x480 H.264 decode and 512x512 FFT for Wiener Filter 8.4ms per frame 720x480 H.264 decode and 512x512 FFT for Krishnan and Fergus s method 28ms per frame 23 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

CONCLUSION AND REMARKS We have presented the use of OpenCL interoperability, OpenDecode and clamdfft for performing hardware decode and video quality enhancement by GPU Runtime performance indicates that we can achieve real-time H.264 decoding and deblurring for HD video by using Wiener filter and for 512x512 video by using Krishnan and Fergus s method for high quality Optimization opportunities: The hardware decoding latency can hidden by interleaving OVDecodePicture with post-processing The efficiency of clamdfft can be improved by utilizing transforms in batches, which can minimize the penalty of transfer overhead This presentation is based on the OpenDecode API Tutorial and the OpenCL Fast Fourier Transforms document OpenDecode sample code can be downloaded at http://developer.amd.com/zones/openclzone/pages/openclappexamples.aspx clamdfft 1.2 binary packages are available for download at http://developer.amd.com/gpu/appmathlibs/pages/default.aspx 24 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011

QUESTIONS

Disclaimer & Attribution The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes. NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, AMD Phenom, AMD Radeon, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. OpenCL is a trademark of Apple Inc. used with permission by Khronos. Windows is a registered trademark of Microsoft Corporation. 2011 Advanced Micro Devices, Inc. All rights reserved. 26 Real-time H.264 Video Enhancement by Using AMD APP SDK June 2011