STREAMING VIDEO DATA INTO 3D APPLICATIONS Session Christopher Mayer AMD Sr. Software Engineer


STREAMING VIDEO DATA INTO 3D APPLICATIONS
Session 2116
Christopher Mayer, Sr. Software Engineer, AMD

CONTENT
- Introduction
- Pinned Memory
- Streaming Video Data
- How does the APU change the game

3 Streaming Video Data Into 3D Applications June 2011

INTRODUCTION - Use Cases
- Why stream video content to the GPU?
  - Integrate video in a 3D scene
  - Process video on the GPU
  - Render additional information on the video
- Broadcast applications
  - Moderate number of video streams
  - Complex rendering
- Surveillance systems
  - Usually a huge number of video streams
  - Simple rendering only

INTRODUCTION - AMD Ventuz Demo
- Shown at ISE 2011

REQUIREMENTS
- Fast data transfer
  - Low latency
  - High bandwidth
  - Small setup time for transfer
  - Reduced number of memory copies
- Constant frame rates
  - No frame drops
- Easy access to the data buffer

REQUIREMENTS - Data Size

Resolution    Number of pixels    Size of one frame (RGB)    Bandwidth at 60 Hz
720x525       378 000             1.08 MB                    64.88 MB/sec
1280x720      921 600             2.64 MB                    158 MB/sec
1920x1080     2 073 600           5.93 MB                    356 MB/sec
2048x1536     3 145 728           9 MB                       540 MB/sec
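The table's figures can be reproduced with a little arithmetic. The sketch below assumes 3 bytes per pixel (RGB) and 1 MB = 1024 * 1024 bytes, which is what the slide's numbers appear to use. The function names are illustrative and do not come from the presentation.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one uncompressed frame.
// bytesPerPixel = 3 is an assumption matching the table's RGB column.
constexpr std::uint64_t frameBytes(std::uint32_t width, std::uint32_t height,
                                   std::uint32_t bytesPerPixel = 3) {
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// Convert bytes to MB, assuming 1 MB = 1024 * 1024 bytes.
constexpr double toMB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Sustained bandwidth in MB/sec for one stream at the given frame rate.
constexpr double streamBandwidthMB(std::uint32_t width, std::uint32_t height,
                                   double framesPerSecond,
                                   std::uint32_t bytesPerPixel = 3) {
    return toMB(frameBytes(width, height, bytesPerPixel)) * framesPerSecond;
}
```

For example, `streamBandwidthMB(2048, 1536, 60.0)` gives the 540 MB/sec of the last table row. Multiplying by the number of simultaneous streams gives a rough lower bound on the transfer rate the system has to sustain.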

DATA PATH
[Diagram: data path between the capture device, system memory, and graphics]

AMD PINNED MEMORY

PINNED MEMORY ON AMD FIREPRO™
- Pinned memory is non-swappable system memory
- The memory can be accessed directly by the GPU
- Memory needs to be allocated by the application
- The memory needs to be aligned to the page size (usually 4K)
- The driver will pin the memory
- On AMD FirePro™, the extension AMD_pinned_memory can be used to create buffers
- AMD_EXTERNAL_VIRTUAL_MEMORY is available as a target for glBufferData
- Access to the memory is not synchronized by the driver. The application needs to control access to the buffers.
- GLsync objects can be used to verify whether a transfer into or from a buffer has finished
- Pinned memory buffers can be used in the same way as other OpenGL buffer objects, e.g., they can be bound as GL_PIXEL_UNPACK_BUFFER

PINNED MEMORY - Buffer Creation

    // Allocate system memory and add 4K for alignment
    m_pBufferMemory[i].pBasePointer = new char[m_uiBufferSize + 4096];
    ZeroMemory(m_pBufferMemory[i].pBasePointer, m_uiBufferSize + 4096);

    // Align memory to 4K boundaries.
    // uintptr_t (from <cstdint>) holds a full pointer; a long is only 32 bits on 64-bit Windows.
    uintptr_t addr = (uintptr_t) m_pBufferMemory[i].pBasePointer;
    m_pBufferMemory[i].pAlignedPointer = (char*)((addr + 4095) & ~(uintptr_t)0xfff);

    // Create buffer to downstream data and pin the memory
    glBindBuffer(GL_EXTERNAL_VIRTUAL_MEMORY_AMD, m_pBuffer[i]);
    glBufferData(GL_EXTERNAL_VIRTUAL_MEMORY_AMD, m_uiBufferSize,
                 m_pBufferMemory[i].pAlignedPointer, GL_STREAM_DRAW);
    glBindBuffer(GL_EXTERNAL_VIRTUAL_MEMORY_AMD, 0);

- The application can update the buffer at any time by writing to m_pBufferMemory[i].pAlignedPointer
- The application can read the buffer content at any time by accessing m_pBufferMemory[i].pAlignedPointer
- No map / unmap calls needed
- Make sure the buffer is not currently accessed by the GPU
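The over-allocate-and-round-up step can be isolated and checked without any OpenGL context. The following is a minimal sketch, assuming a power-of-two alignment. `alignAddress` and `alignPointer` are hypothetical helper names and are not part of the presentation or of any AMD API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round an address up to the next multiple of `alignment`.
// `alignment` must be a power of two (4096 for the usual 4K page).
constexpr std::uintptr_t alignAddress(std::uintptr_t addr, std::uintptr_t alignment) {
    return (addr + alignment - 1) & ~(alignment - 1);
}

// Pointer version: returns the first aligned address at or after `p`.
// The caller must have allocated at least `alignment` spare bytes.
inline char* alignPointer(char* p, std::size_t alignment) {
    return reinterpret_cast<char*>(
        alignAddress(reinterpret_cast<std::uintptr_t>(p),
                     static_cast<std::uintptr_t>(alignment)));
}
```

Because the result can lie up to `alignment - 1` bytes past the base pointer, the allocation has to be padded by the alignment, which is why the slide allocates `m_uiBufferSize + 4096` bytes. The base pointer, not the aligned one, is the pointer that must later be freed.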

PINNED MEMORY - Buffer Access

Copy data from a buffer into a texture

    // Bind buffer as unpack buffer to copy data into a texture object
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, m_pBuffer[m_uiBufferIdx]);
    // Copy pinned memory to texture
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_uiTexWidth, m_uiTexHeight,
                    m_nExtFormat, m_nType, NULL);
    // Insert sync object to check for completion
    m_UnpackFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

Copy data from framebuffer into pinned memory buffer

    // Bind buffer as pack buffer so that glReadPixels writes into it
    glBindBuffer(GL_PIXEL_PACK_BUFFER, m_pPackBuffer[m_uiBufferIdx]);
    // Copy FB into pinned mem buffer
    glReadPixels(0, 0, m_uiBufferWidth, m_uiBufferHeight, m_nExtFormat, m_nType, NULL);
    m_PackFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

Synchronizing the buffer access

    if (glIsSync(fence))
    {
        // Make sure that buffer memory is no longer accessed by drawing
        glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, OneSecond);
        glDeleteSync(fence);
    }

PINNED MEMORY - PERFORMANCE
[Chart: PBO vs. Pinned Memory. Speedup (axis from 0.00 to 3.00) by frame size: 256x256, 720x525, 720x625, 1280x720, 1920x1080, 2048x1536]

PINNED MEMORY - Summary
- Easy access since memory is always present
  - No mapping/unmapping is required
- Reduced overhead for data transfer
  - Lower latency
- Best choice to download constantly changing data
- Buffer access needs to be synchronized by the application

STREAMING DATA

STREAMING DATA - Goals
- Continuous data acquisition at a constant rate, e.g., a DVD player at 59.94 Hz
  - No input frames should be dropped
- Rendering needs to happen at a constant frame rate
  - No tearing on video data
  - No stuttering while displaying video data as a texture
[Diagram: Data acquisition (capture, transfer to memory) followed by Rendering (transfer to GPU, render)]

STREAMING DATA
[Timeline diagram: Capture and Render threads processing frames N through N+6]

STREAMING DATA - Buffer Access
[Diagram: two threads sharing Buffer 1 and Buffer 2]
- Data Acquisition: WaitForVBlank, GetBuffer (wait for an empty buffer, grant write access), CopyToBuffer, ReleaseBuffer
- Rendering: GetBuffer (wait for a full buffer, grant read access), Copy to Texture, ReleaseBuffer, Image Processing, Draw

STREAMING DATA - Synchronizing the Buffer

    // Get a buffer for writing. Produce new data
    unsigned int SyncedBuffer::getBufferForWriting(char*& pBuffer)
    {
        // Wait until an empty slot is available
        WaitForSingleObject(m_hNumEmpty, INFINITE);
        // Enter critical section
        WaitForSingleObject(m_pBuffer[m_uiHead].hMutex, INFINITE);
        pBuffer = m_pBuffer[m_uiHead].pData;
        return m_uiHead;
    }

    // Get a buffer for reading. Consume data
    unsigned int SyncedBuffer::getBufferForReading(char*& pBuffer)
    {
        // Wait until a full buffer is available
        WaitForSingleObject(m_hNumFull, INFINITE);
        // Block buffer
        WaitForSingleObject(m_pBuffer[m_uiTail].hMutex, INFINITE);
        pBuffer = m_pBuffer[m_uiTail].pData;
        return m_uiTail;
    }

    void SyncedBuffer::releaseWriteBuffer()
    {
        // Leave critical section
        ReleaseSemaphore(m_pBuffer[m_uiHead].hMutex, 1, NULL);
        // Increment the number of full buffers. ReleaseSemaphore stores the
        // previous count, so add one to get the current count.
        ReleaseSemaphore(m_hNumFull, 1, &m_lNumFullElements);
        ++m_lNumFullElements;
        // Switch to next buffer
        m_uiHead = (m_uiHead + 1) % m_uiSize;
    }

    void SyncedBuffer::releaseReadBuffer()
    {
        // Release buffer
        ReleaseSemaphore(m_pBuffer[m_uiTail].hMutex, 1, NULL);
        // Increase the number of empty buffers
        ReleaseSemaphore(m_hNumEmpty, 1, NULL);
        // Switch to next buffer
        m_uiTail = (m_uiTail + 1) % m_uiSize;
    }
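The same producer/consumer hand-off can be sketched with the C++ standard library instead of Win32 semaphores. This is an illustrative counterpart, not the presenter's implementation: `SyncedRing` and its members are hypothetical names, and it assumes exactly one producer (capture) thread and one consumer (render) thread. Under that assumption a single mutex and two condition variables stand in for the per-buffer semaphores.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical portable counterpart of the SyncedBuffer code above.
// Intended for one producer thread and one consumer thread.
class SyncedRing {
public:
    SyncedRing(std::size_t slotCount, std::size_t slotBytes)
        : m_slots(slotCount, std::vector<char>(slotBytes)) {}

    // Producer: block until a slot is empty, then hand it out for writing.
    std::size_t acquireWrite(char*& data) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notFull.wait(lock, [this] { return m_numFull < m_slots.size(); });
        data = m_slots[m_head].data();
        return m_head;
    }

    // Producer: mark the slot as full and advance to the next one.
    void releaseWrite() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            ++m_numFull;
            m_head = (m_head + 1) % m_slots.size();
        }
        m_notEmpty.notify_one();
    }

    // Consumer: block until a slot is full, then hand it out for reading.
    std::size_t acquireRead(const char*& data) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notEmpty.wait(lock, [this] { return m_numFull > 0; });
        data = m_slots[m_tail].data();
        return m_tail;
    }

    // Consumer: mark the slot as empty and advance to the next one.
    void releaseRead() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            --m_numFull;
            m_tail = (m_tail + 1) % m_slots.size();
        }
        m_notFull.notify_one();
    }

    // Number of slots currently holding unread data.
    std::size_t fullCount() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_numFull;
    }

private:
    std::vector<std::vector<char>> m_slots;
    std::size_t m_head = 0;     // next slot to write
    std::size_t m_tail = 0;     // next slot to read
    std::size_t m_numFull = 0;  // slots written but not yet read
    std::mutex m_mutex;
    std::condition_variable m_notEmpty;
    std::condition_variable m_notFull;
};
```

The capture thread would call `acquireWrite`, fill the slot, and call `releaseWrite`. The render thread would call `acquireRead`, copy the slot to a texture, and call `releaseRead`. With pinned memory, the consumer should release a slot only after the GPU fence for that copy has signalled, since the driver does not synchronize access.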

HOW DOES THE APU CHANGE THE GAME

HOW DOES THE APU CHANGE THE GAME
- Having an APU and discrete graphics in a system allows work to be distributed across two GPUs
- Additional computing steps that can be implemented efficiently on a GPU can be handled by the APU in parallel with the rendering on the discrete GPU
- More time for rendering is available on the discrete GPU

HOW DOES THE APU CHANGE THE GAME
- Usually there is time left in the capture thread
- The remaining time can be used to improve quality
  - De-interlacing
  - Color space conversion
  - Post-processing of image data
- Those tasks can benefit greatly from running on a SIMD engine
- Running those tasks on the APU frees time in the render thread to increase the complexity of the 3D content
[Timeline diagram: Capture thread (frames N, N+1) and Render thread (frame N)]
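To illustrate why such tasks suit a SIMD engine, here is a minimal per-pixel color space conversion written for the CPU. The slides do not say which conversion is used. This sketch assumes full-range BT.601 YCbCr to RGB, and the type and function names are illustrative only. Each output pixel depends only on the matching input pixel, so the same arithmetic can run independently in every SIMD lane or GPU work item.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rgb {
    std::uint8_t r, g, b;
};

// Clamp to [0, 255] and round to the nearest integer.
inline std::uint8_t toByte(double v) {
    return static_cast<std::uint8_t>(
        std::lround(std::min(255.0, std::max(0.0, v))));
}

// Convert one pixel from YCbCr to RGB.
// Assumption: full-range BT.601 coefficients; broadcast video often uses
// limited-range variants, which need different offsets and scaling.
inline Rgb ycbcrToRgb(std::uint8_t y, std::uint8_t cb, std::uint8_t cr) {
    const double Y  = y;
    const double Cb = cb - 128.0;
    const double Cr = cr - 128.0;
    return Rgb{ toByte(Y + 1.402 * Cr),
                toByte(Y - 0.344136 * Cb - 0.714136 * Cr),
                toByte(Y + 1.772 * Cb) };
}
```

A GPU version would apply the same three expressions once per work item, reading the source frame from the shared buffer and writing the RGB result to the buffer handed to the render thread.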

HOW DOES THE APU CHANGE THE GAME
[Diagram: two threads sharing Buffer 1 and Buffer 2]
- Data Acquisition and Processing Using the APU: WaitForVBlank, GetBuffer (wait for an empty buffer, grant write access), Image processing, CopyToBuffer, ReleaseBuffer
- Rendering Using the Discrete GPU: GetBuffer (wait for a full buffer, grant read access), Copy to Texture, ReleaseBuffer, Draw

HOW DOES THE APU CHANGE THE GAME
- Pinned memory can be used for data exchange between the APU and the discrete GPU
- Since the data needs to be loaded into memory anyway, the additional cost of data transfer on the APU remains small
- The SIMD engine offers great benefit for image processing algorithms
- For video streaming, the APU is a great additional resource for offloading tasks from the discrete GPU


QUESTIONS

Disclaimer & Attribution

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, AMD FirePro, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.

© 2011 Advanced Micro Devices, Inc. All rights reserved.