STREAMING VIDEO DATA INTO 3D APPLICATIONS
Session 2116
Christopher Mayer
AMD Sr. Software Engineer
CONTENT
- Introduction
- Pinned Memory
- Streaming Video Data
- How does the APU change the game
3 Streaming Video Data Into 3D Applications June 2011
INTRODUCTION
Use Cases
- Why stream video content to the GPU?
  - Integrate video into a 3D scene
  - Process video on the GPU
  - Render additional information on top of the video
- Broadcast applications
  - Moderate number of video streams
  - Complex rendering
- Surveillance systems
  - Usually a huge number of video streams
  - Simple rendering only
INTRODUCTION
AMD Ventuz Demo
Shown at ISE 2011
REQUIREMENTS
- Fast data transfer
  - Low latency
  - High bandwidth
  - Short setup time for transfers
  - Fewer memory copies
- Constant frame rates
  - No frame drops
- Easy access to the data buffer
REQUIREMENTS
Data Size

| Resolution | 720x525 | 1280x720 | 1920x1080 | 2048x1536 |
|---|---|---|---|---|
| Number of pixels | 378 000 | 921 600 | 2 073 600 | 3 145 728 |
| Size of one frame (RGB, 3 bytes/pixel) | 1.08 MB | 2.64 MB | 5.93 MB | 9 MB |
| Bandwidth at 60 Hz | 64.89 MB/s | 158 MB/s | 356 MB/s | 540 MB/s |

(1 MB = 2^20 bytes)
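The table values follow from width × height × 3 bytes per pixel (RGB), multiplied by 60 frames per second for the bandwidth, with 1 MB taken as 2^20 bytes. A small sketch of that arithmetic; `frameBytes` and `bandwidthMBps` are hypothetical helper names, not part of the session code:

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one frame: width x height pixels, bytesPerPixel bytes each (3 for RGB).
constexpr std::uint64_t frameBytes(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t bytesPerPixel = 3) {
    return width * height * bytesPerPixel;
}

// Sustained transfer rate in MB/s, where 1 MB = 2^20 bytes as in the table.
inline double bandwidthMBps(std::uint64_t width, std::uint64_t height,
                            double frameRateHz, std::uint64_t bytesPerPixel = 3) {
    assert(frameRateHz > 0.0);
    return static_cast<double>(frameBytes(width, height, bytesPerPixel)) * frameRateHz
           / (1024.0 * 1024.0);
}
```

For example, `bandwidthMBps(1920, 1080, 60.0)` yields about 355.96, which the table rounds to 356 MB/s.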
DATA PATH
[Figure: data path connecting Capture, System Memory, and Graphics]
AMD PINNED MEMORY
PINNED MEMORY ON AMD FIREPRO™
- Pinned memory is non-swappable system memory.
- The memory can be accessed directly by the GPU.
- The memory must be allocated by the application.
- The memory must be aligned to the page size (usually 4 KB).
- The driver pins the memory.
- On AMD FirePro™, the extension AMD_pinned_memory can be used to create such buffers.
  - GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD is available as a target for glBufferData.
- Access to the memory is not synchronized by the driver; the application must control access to the buffers.
  - GLsync objects can be used to check whether a transfer into or out of a buffer has finished.
- Pinned memory buffers can be used like any other OpenGL buffer object; e.g., they can be bound as GL_PIXEL_UNPACK_BUFFER.
PINNED MEMORY
Buffer Creation

```cpp
#include <cstdint>  // std::uintptr_t
// Also assumes <windows.h> (ZeroMemory) and OpenGL headers exposing AMD_pinned_memory.

// Allocate system memory and add 4 KB for alignment
m_pBufferMemory[i].pBasePointer = new char[m_uiBufferSize + 4096];
ZeroMemory(m_pBufferMemory[i].pBasePointer, m_uiBufferSize + 4096);

// Align memory to a 4 KB boundary (uintptr_t, not long: long is 32-bit on 64-bit Windows)
std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(m_pBufferMemory[i].pBasePointer);
m_pBufferMemory[i].pAlignedPointer = reinterpret_cast<char*>((addr + 4095) & ~std::uintptr_t(0xFFF));

// Create a buffer for streaming the data and pin the memory
// (m_pBuffer[i] is a buffer name created earlier with glGenBuffers)
glBindBuffer(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, m_pBuffer[i]);
glBufferData(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, m_uiBufferSize,
             m_pBufferMemory[i].pAlignedPointer, GL_STREAM_DRAW);
glBindBuffer(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, 0);

// Keep pBasePointer alive while the buffer exists; delete[] it only after glDeleteBuffers
```

- The application can update the buffer directly by writing to m_pBufferMemory[i].pAlignedPointer.
- The application can read the buffer contents directly from m_pBufferMemory[i].pAlignedPointer.
- No map/unmap calls are needed.
- Make sure the buffer is not currently being accessed by the GPU.
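The alignment step can be isolated and checked without any OpenGL context. This is a minimal, hypothetical sketch in portable C++: `alignUp`, `AlignedBlock`, and `allocateAligned` are illustrative names, not part of the session code or of the AMD_pinned_memory extension. It assumes a 4 KB page size and computes the aligned pointer by offsetting into the original allocation, so the pointer never leaves the allocated block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Round an address up to the next multiple of `alignment` (a power of two).
inline std::uintptr_t alignUp(std::uintptr_t addr, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

// Hypothetical holder: owns the raw allocation and exposes a page-aligned
// pointer inside it (the pointer that would be passed to glBufferData).
struct AlignedBlock {
    std::unique_ptr<char[]> base;  // original allocation, freed automatically
    char* aligned = nullptr;       // page-aligned start of the usable region
    std::size_t size = 0;          // usable bytes starting at `aligned`
};

inline AlignedBlock allocateAligned(std::size_t size, std::size_t pageSize = 4096) {
    AlignedBlock block;
    // Over-allocate by pageSize - 1 bytes so an aligned start always fits;
    // the trailing () zero-initializes the memory, like ZeroMemory above.
    block.base.reset(new char[size + pageSize - 1]());
    auto addr = reinterpret_cast<std::uintptr_t>(block.base.get());
    block.aligned = block.base.get() + (alignUp(addr, pageSize) - addr);
    block.size = size;
    return block;
}
```

Because `block.base` owns the memory, it must outlive any GL buffer that was created from `block.aligned`.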
PINNED MEMORY
Buffer Access

Copy data from a buffer into a texture:

```cpp
// Bind the pinned buffer as unpack buffer so the texture upload sources from it
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, m_pBuffer[m_uiBufferIdx]);
// Copy pinned memory into the bound texture (NULL = offset 0 into the buffer)
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_uiTexWidth, m_uiTexHeight, m_nExtFormat, m_nType, NULL);
// Insert a sync object to check for completion
m_UnpackFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
```

Copy data from the framebuffer into a pinned memory buffer:

```cpp
// Bind the pinned buffer as pack buffer so glReadPixels writes into it
glBindBuffer(GL_PIXEL_PACK_BUFFER, m_pPackBuffer[m_uiBufferIdx]);
// Copy the framebuffer into the pinned memory buffer (NULL = offset 0 into the buffer)
glReadPixels(0, 0, m_uiBufferWidth, m_uiBufferHeight, m_nExtFormat, m_nType, NULL);
// Insert a sync object to check for completion
m_PackFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
```

Synchronizing the buffer access:

```cpp
const GLuint64 OneSecond = 1000000000; // glClientWaitSync timeouts are in nanoseconds
if (glIsSync(fence))
{
    // Block until the GPU no longer accesses the buffer memory (or the timeout expires)
    GLenum result = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, OneSecond);
    // GL_ALREADY_SIGNALED or GL_CONDITION_SATISFIED means the memory is safe to touch;
    // GL_TIMEOUT_EXPIRED or GL_WAIT_FAILED means it is not
    glDeleteSync(fence);
}
```
PINNED MEMORY - PERFORMANCE
[Figure: PBO vs. Pinned Memory, speedup (y-axis, 0.00 to 3.00) for frame sizes 256x256, 720x525, 720x625, 1280x720, 1920x1080, and 2048x1536]
PINNED MEMORY
Summary
- Easy access, since the memory is always present
- No mapping/unmapping required
- Reduced overhead for data transfers
- Lower latency
- Best choice for downloading constantly changing data
- Buffer access must be synchronized by the application
STREAMING DATA
STREAMING DATA
Goals
- Continuous data acquisition at a constant rate, e.g., a DVD player at 59.94 Hz
- No input frames should be dropped
- Rendering needs to happen at a constant frame rate
- No tearing in the video data
- No stuttering while displaying the video data as a texture

[Figure: pipeline stages. Data acquisition: capture, transfer to memory. Rendering: transfer to GPU, render.]
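Even two "constant" rates rarely match exactly. Assuming the display refreshes at exactly 60 Hz (the slide only names the source rate) and taking 59.94 Hz as the NTSC-derived 60000/1001 Hz, the render loop runs slightly faster than capture, so every 1/(60 − 59.94) ≈ 16.7 s it finds no new frame and has to show one twice. A small sketch of that arithmetic; `periodMs` and `slipIntervalSeconds` are hypothetical names:

```cpp
#include <cassert>
#include <cmath>

// Frame period in milliseconds for a rate given in Hz.
inline double periodMs(double rateHz) {
    assert(rateHz > 0.0);
    return 1000.0 / rateHz;
}

// Seconds between repeated (or dropped) frames when a source running at
// sourceHz is shown on a display refreshing at displayHz: the reciprocal
// of the rate difference.
inline double slipIntervalSeconds(double sourceHz, double displayHz) {
    double diff = std::fabs(displayHz - sourceHz);
    assert(diff > 0.0);
    return 1.0 / diff;
}
```

With `sourceHz = 60000.0 / 1001.0` and `displayHz = 60.0`, the capture period is about 16.683 ms versus 16.667 ms per refresh, and one frame slips roughly every 16.68 s.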
STREAMING DATA
[Figure: capture and render timelines. Capture delivers frames N to N+6; one render sequence shows N, N+1, N+2, N+3, N+5 (frame N+4 is skipped), another shows N to N+4.]
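The skipped frame in the timeline can be reproduced with a toy model. This is a hypothetical sketch, not the session's code: it assumes capture frame k completes at `captureOffset + k * capturePeriod`, refresh i happens at `i * displayPeriod`, and each refresh shows the newest completed frame. Times are integer ticks so that exact boundaries do not suffer from floating-point rounding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each refresh, the index of the newest completed capture frame (-1 if none yet).
std::vector<long long> framesShown(long long capturePeriod, long long displayPeriod,
                                   long long captureOffset, int refreshes) {
    assert(capturePeriod > 0 && displayPeriod > 0 && captureOffset >= 0);
    std::vector<long long> shown;
    for (int i = 0; i < refreshes; ++i) {
        long long t = i * displayPeriod;
        shown.push_back(t < captureOffset ? -1 : (t - captureOffset) / capturePeriod);
    }
    return shown;
}

// Captured frames that were never displayed (gaps in the shown sequence).
int droppedFrames(const std::vector<long long>& shown) {
    int dropped = 0;
    for (std::size_t i = 1; i < shown.size(); ++i)
        if (shown[i - 1] >= 0 && shown[i] - shown[i - 1] > 1)
            dropped += static_cast<int>(shown[i] - shown[i - 1] - 1);
    return dropped;
}

// Refreshes that had to show the previous frame again.
int repeatedFrames(const std::vector<long long>& shown) {
    int repeated = 0;
    for (std::size_t i = 1; i < shown.size(); ++i)
        if (shown[i] >= 0 && shown[i] == shown[i - 1]) ++repeated;
    return repeated;
}
```

With a capture period of 5 ticks and a refresh period of 6 ticks, `framesShown(5, 6, 0, 6)` gives 0, 1, 2, 3, 4, 6: frame 5 is never displayed, just like N+4 in the timeline. Swapping the periods produces a repeated frame instead, which is the stutter the goals slide asks to avoid.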
STREAMING DATA
Buffer Access
[Figure: Data Acquisition thread (GetBuffer: wait for an empty buffer, grant write access; CopyToBuffer; ReleaseBuffer) and Rendering thread (WaitForVBlank; GetBuffer: wait for a full buffer, grant read access; Copy to Texture; ReleaseBuffer; Image Processing; Draw), sharing Buffer 1 and Buffer 2]
STREAMING DATA
Synchronizing the Buffer

```cpp
// Assumes m_hNumEmpty starts at m_uiSize and m_hNumFull at 0. Each slot's hMutex is
// released with ReleaseSemaphore, so it must be a binary semaphore
// (CreateSemaphore(NULL, 1, 1, NULL)); a Win32 mutex would need ReleaseMutex.

// Get a buffer for writing (producer side: produce new data)
unsigned int SyncedBuffer::getBufferForWriting(char* &pBuffer)
{
    // Wait until an empty slot is available
    WaitForSingleObject(m_hNumEmpty, INFINITE);
    // Enter the critical section for the head slot
    WaitForSingleObject(m_pBuffer[m_uiHead].hMutex, INFINITE);
    pBuffer = m_pBuffer[m_uiHead].pData;
    return m_uiHead;
}

// Get a buffer for reading (consumer side: consume data)
unsigned int SyncedBuffer::getBufferForReading(char* &pBuffer)
{
    // Wait until a full slot is available
    WaitForSingleObject(m_hNumFull, INFINITE);
    // Block the tail slot
    WaitForSingleObject(m_pBuffer[m_uiTail].hMutex, INFINITE);
    pBuffer = m_pBuffer[m_uiTail].pData;
    return m_uiTail;
}

void SyncedBuffer::releaseWriteBuffer()
{
    // Leave the critical section
    ReleaseSemaphore(m_pBuffer[m_uiHead].hMutex, 1, NULL);
    // Increment the number of full buffers; lpPreviousCount receives the count before the release
    ReleaseSemaphore(m_hNumFull, 1, &m_lNumFullElements);
    ++m_lNumFullElements;
    // Switch to the next buffer
    m_uiHead = (m_uiHead + 1) % m_uiSize;
}

void SyncedBuffer::releaseReadBuffer()
{
    // Release the buffer
    ReleaseSemaphore(m_pBuffer[m_uiTail].hMutex, 1, NULL);
    // Increase the number of empty buffers
    ReleaseSemaphore(m_hNumEmpty, 1, NULL);
    // Switch to the next buffer
    m_uiTail = (m_uiTail + 1) % m_uiSize;
}
```
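The same producer/consumer ring can be sketched in portable standard C++, which is handy for testing the logic outside Windows. This is a hypothetical sketch, not the session's implementation: `FrameRing`, `acquireWrite`, `releaseWrite`, `acquireRead`, `releaseRead`, and `streamFrames` are illustrative names, the Win32 semaphores are replaced by one `std::mutex` plus a `std::condition_variable`, and exactly one capture thread and one render thread are assumed.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <thread>
#include <vector>

class FrameRing {
public:
    FrameRing(std::size_t slots, std::size_t slotBytes)
        : m_slots(slots, std::vector<char>(slotBytes)) { assert(slots > 0); }

    // Producer: wait for an empty slot (like m_hNumEmpty) and return it.
    char* acquireWrite() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_full < m_slots.size(); });
        return m_slots[m_head].data();
    }
    // Producer: publish the slot and advance the head.
    void releaseWrite() {
        { std::lock_guard<std::mutex> lock(m_mutex);
          m_head = (m_head + 1) % m_slots.size(); ++m_full; }
        m_cv.notify_all();
    }
    // Consumer: wait for a full slot (like m_hNumFull) and return it.
    const char* acquireRead() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_full > 0; });
        return m_slots[m_tail].data();
    }
    // Consumer: hand the slot back and advance the tail.
    void releaseRead() {
        { std::lock_guard<std::mutex> lock(m_mutex);
          m_tail = (m_tail + 1) % m_slots.size(); --m_full; }
        m_cv.notify_all();
    }

private:
    std::vector<std::vector<char>> m_slots;
    std::size_t m_head = 0;  // next slot the producer writes
    std::size_t m_tail = 0;  // next slot the consumer reads
    std::size_t m_full = 0;  // slots holding unread frames
    std::mutex m_mutex;
    std::condition_variable m_cv;
};

// Usage: push frameCount frames (each tagged with its index) through a
// two-slot ring; returns true if the consumer saw every frame in order.
bool streamFrames(int frameCount) {
    FrameRing ring(2, sizeof(int));
    std::thread capture([&] {
        for (int i = 0; i < frameCount; ++i) {
            std::memcpy(ring.acquireWrite(), &i, sizeof(int));  // stands in for CopyToBuffer
            ring.releaseWrite();
        }
    });
    bool inOrder = true;
    for (int i = 0; i < frameCount; ++i) {
        int frame = -1;
        std::memcpy(&frame, ring.acquireRead(), sizeof(int));  // stands in for Copy to Texture
        ring.releaseRead();
        inOrder = inOrder && (frame == i);
    }
    capture.join();
    return inOrder;
}
```

Slot contents are copied outside the lock, so a long copy on one side never blocks the other; this is safe because the slot at the head is never counted as full while the producer writes it, and the slot at the tail stays counted as full until the consumer releases it.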
HOW DOES THE APU CHANGE THE GAME
HOW DOES THE APU CHANGE THE GAME
- Having an APU and discrete graphics in one system allows work to be distributed across two GPUs.
- Additional computing steps that can be implemented efficiently on a GPU can be handled by the APU in parallel with rendering on the discrete GPU.
- This leaves more time for rendering on the discrete GPU.
HOW DOES THE APU CHANGE THE GAME
- Usually, the capture thread has time left over.

[Figure: capture and render timelines for frames N and N+1, showing idle time in the capture thread]

- The remaining time can be used to improve quality:
  - De-interlacing
  - Color space conversion
  - Post-processing of image data
- These tasks benefit greatly from running on a SIMD engine.
- Running them on the APU frees time in the render thread for more complex 3D content.
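Color space conversion shows why these tasks suit a SIMD engine: every output pixel depends only on the matching input pixel. The slide does not say which conversion is used; as an assumption, this sketch converts packed 8-bit YCbCr 4:4:4 to RGB with the full-range BT.601 (JFIF) coefficients. Limited-range broadcast video (Y in 16..235) would need different constants. `toByte` and `ycbcrToRgb` are hypothetical names, and on an APU the loop body would become a GPU kernel run once per pixel; this scalar version only shows the arithmetic.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round and clamp a value into the 0..255 range of an 8-bit channel.
inline std::uint8_t toByte(double v) {
    long r = std::lround(v);
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, r)));
}

// Convert packed (Y, Cb, Cr) triplets to packed (R, G, B) triplets.
// Each pixel is independent, so the loop is trivially data-parallel.
std::vector<std::uint8_t> ycbcrToRgb(const std::vector<std::uint8_t>& ycbcr) {
    assert(ycbcr.size() % 3 == 0);
    std::vector<std::uint8_t> rgb(ycbcr.size());
    for (std::size_t i = 0; i < ycbcr.size(); i += 3) {
        double y  = ycbcr[i];
        double cb = ycbcr[i + 1] - 128.0;
        double cr = ycbcr[i + 2] - 128.0;
        rgb[i]     = toByte(y + 1.402 * cr);
        rgb[i + 1] = toByte(y - 0.344136 * cb - 0.714136 * cr);
        rgb[i + 2] = toByte(y + 1.772 * cb);
    }
    return rgb;
}
```

Neutral chroma (Cb = Cr = 128) maps straight to grey, so `ycbcrToRgb({128, 128, 128})` returns 128, 128, 128.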
HOW DOES THE APU CHANGE THE GAME
[Figure: Data Acquisition and Processing Using the APU (GetBuffer: wait for an empty buffer, grant write access; Image processing; CopyToBuffer; ReleaseBuffer) and Rendering Using the Discrete GPU (WaitForVBlank; GetBuffer: wait for a full buffer, grant read access; Copy to Texture; ReleaseBuffer; Draw), sharing Buffer 1 and Buffer 2]
HOW DOES THE APU CHANGE THE GAME
- Pinned memory can be used to exchange data between the APU and the discrete GPU.
- Since the captured data has to be loaded into system memory anyway, the additional cost of transferring it to the APU remains small.
- The SIMD engines offer significant benefits for image-processing algorithms.
- For video streaming, the APU is a valuable additional resource for offloading tasks from the discrete GPU.
QUESTIONS
Disclaimer & Attribution

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, AMD FirePro, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.

© 2011 Advanced Micro Devices, Inc. All rights reserved.