A GPU-Enabled Rapid Image Processing Framework


1 A GPU-Enabled Rapid Image Processing Framework
Mark Davey, Lead HPC Engineer

2 The Foundry
We develop, market and sell:
- NUKE: compositing
- MARI: 3D texture painting
- HIERO: shot management
- KATANA: look development and lighting
- MODO: 3D modelling and rendering

3 Who uses our software?
A well-established global client base in:
- Film
- Animation
- Commercial advertising
- Broadcast
Clients include Aardman Animations, BaseFX Studios, BlueBolt, Blue Sky Studios, Brainstorm Digital, CoSA VFX, Digital Domain, Digital Fusion, Double Negative, Dreamworks, The Embassy, Fido, Framestore, Igloo VFX, Industrial Light and Magic, Jellyfish, Look Effects, Lucasfilm Entertainment, March Entertainment, Method Studios, Mikros Image, The Moving Picture Company, Mr. X, Passion Pictures, Pixomondo, Prime Focus, Rushes, Smoke and Mirrors, Sony Pictures Imageworks, Sony Pictures Animation, The Mill, Township, Unexpected, Union VFX, Walt Disney Animation, Warner Bros. Animation, Weta Digital and ZOIC Studios.

4 At the heart: 2D image processing
A fundamental component of our products, used in effects such as:
- Noise reduction
- Keying
- Motion and disparity estimation
- Colour correction
- 3D texture creation
We need to make it as fast as possible!

5 Moving to GPUs
- We have traditionally used the CPU for image processing, so there is a lot of legacy code
- GPUs are great at image processing
- Our customers often have GPUs, but not always (e.g. on render farms), so we need a CPU path
- We do not want to write the same code multiple times (debugging, maintenance, new hardware, etc.)

6 Solution: A Rapid Image Processing Framework (RIP)
- Image processing algorithms are expressed as kernels
- Kernels are written in a C++-like, domain-specific language
- Kernels run over an iteration space
- Metadata expresses access patterns, image formats, boundary conditions, etc.
- Kernels are converted to an abstract syntax tree (AST)
- The AST is translated into different languages
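The kernel-to-AST-to-language pipeline can be made concrete with a minimal C++ sketch. It assumes a toy AST that only knows image reads, float parameters and multiplication. The names `Expr`, `KernelAST`, `Target` and `translate` are hypothetical and do not describe RIP's real internals. The point is only that one tree can be printed either as a scalar C++ loop or as an OpenCL kernel.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression AST (hypothetical; not RIP's real representation).
struct Expr {
  enum Kind { ImageRead, Param, Mul } kind;
  std::string name;                // image or parameter name (leaf nodes)
  std::shared_ptr<Expr> lhs, rhs;  // operands (Mul nodes only)
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr imageRead(std::string img) {
  return std::make_shared<Expr>(Expr{Expr::ImageRead, std::move(img), nullptr, nullptr});
}
ExprPtr paramRef(std::string p) {
  return std::make_shared<Expr>(Expr{Expr::Param, std::move(p), nullptr, nullptr});
}
ExprPtr mul(ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{Expr::Mul, "", std::move(a), std::move(b)});
}

// A point-access kernel "dst() = body" with one float parameter.
struct KernelAST {
  std::string name, src, dst, param;
  ExprPtr body;
};

enum class Target { CppScalar, OpenCL };

// Expression printing is shared: both targets are C-like, and point access
// src() becomes src[i] at the current iteration index i.
std::string emitExpr(const ExprPtr& e) {
  switch (e->kind) {
    case Expr::ImageRead: return e->name + "[i]";
    case Expr::Param:     return e->name;
    case Expr::Mul:       return "(" + emitExpr(e->lhs) + " * " + emitExpr(e->rhs) + ")";
  }
  return "";
}

// Only the kernel signature and the iteration scheme differ per target.
std::string translate(const KernelAST& k, Target t) {
  assert(k.body != nullptr);
  const std::string stmt = k.dst + "[i] = " + emitExpr(k.body) + ";";
  if (t == Target::CppScalar) {
    // CPU: an explicit loop over the iteration space.
    return "void " + k.name + "(const float* " + k.src + ", float* " + k.dst +
           ", float " + k.param + ", int n) {\n"
           "  for (int i = 0; i < n; ++i) " + stmt + "\n}\n";
  }
  // OpenCL: one work-item per point; the runtime supplies the index.
  return "__kernel void " + k.name + "(__global const float* " + k.src +
         ", __global float* " + k.dst + ", float " + k.param + ") {\n"
         "  int i = get_global_id(0);\n  " + stmt + "\n}\n";
}
```

For the tree of `dst() = src() * gain`, both outputs contain the same statement, `dst[i] = (src[i] * gain);`. The C++ version wraps it in a `for` loop. The OpenCL version wraps it in a `__kernel` indexed by `get_global_id(0)`.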

7 Example Kernel
```
class GainImage : Kernel<eComponentWise> {
  param:
    Image<eRead, epoint> src;
    Image<eWrite, epoint> dst;
    float gain;
  local:
    void define() {
      defineparam(gain, "gain", 1.0f);
    }
    void kernel() {
      dst() = src() * gain;
    }
};
```

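As a rough illustration of what a scalar C++ target might produce for the `GainImage` kernel, the sketch below assumes an interleaved 32-bit float image. It spells out the iteration space (rows, columns, components) that `eComponentWise` implies. `ImageF` and `gainImageScalar` are hypothetical names, not generated RIP output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical image type: interleaved components, 32-bit float each.
struct ImageF {
  int width, height, channels;
  std::vector<float> data;  // width * height * channels values
  ImageF(int w, int h, int c)
      : width(w), height(h), channels(c),
        data(static_cast<std::size_t>(w) * h * c, 0.0f) {}
  float& at(int x, int y, int c) {
    return data[(static_cast<std::size_t>(y) * width + x) * channels + c];
  }
  float at(int x, int y, int c) const {
    return data[(static_cast<std::size_t>(y) * width + x) * channels + c];
  }
};

// Sketch of a scalar CPU translation of GainImage's kernel(). eComponentWise
// means the body runs once per component; point access means each call
// touches only the current (x, y, c). gain defaults to 1.0f, as in define().
void gainImageScalar(const ImageF& src, ImageF& dst, float gain = 1.0f) {
  assert(src.width == dst.width && src.height == dst.height &&
         src.channels == dst.channels);
  for (int y = 0; y < src.height; ++y)
    for (int x = 0; x < src.width; ++x)
      for (int c = 0; c < src.channels; ++c)
        dst.at(x, y, c) = src.at(x, y, c) * gain;  // dst() = src() * gain;
}
```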

10 Kernel Types
- Iteration: runs independently at every point in an iteration space. Example: gain
- Rolling: runs over the iteration space in order, with access to the result from the previous point along an axis. Example: box blur
- Reduction: image data is reduced to a single value. Example: maximum pixel value
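The rolling and reduction types can be sketched in plain C++. This sketch assumes a single-channel row for the box blur and clamps reads at the row ends, which is an illustrative choice. The rolling version updates a running window sum carried over from the previous point, so each output costs O(1) whatever the radius. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Rolling kernel sketch: box blur of radius r along one row. Each output
// reuses the previous point's window sum (add the entering pixel, subtract
// the leaving one), so the cost per pixel does not depend on r.
// Reads past either end are clamped to the edge pixel.
std::vector<float> boxBlurRow(const std::vector<float>& row, int r) {
  assert(r >= 0);
  const int n = static_cast<int>(row.size());
  std::vector<float> out(row.size());
  if (n == 0) return out;
  auto at = [&](int i) { return row[std::min(std::max(i, 0), n - 1)]; };
  float sum = 0.0f;
  for (int i = -r; i <= r; ++i) sum += at(i);  // window centred on x = 0
  for (int x = 0; x < n; ++x) {
    out[x] = sum / static_cast<float>(2 * r + 1);
    sum += at(x + r + 1) - at(x - r);          // roll the window to x + 1
  }
  return out;
}

// Reduction kernel sketch: reduce all pixel values to a single maximum.
float maxPixel(const std::vector<float>& pixels) {
  float m = -std::numeric_limits<float>::infinity();
  for (float p : pixels) m = std::max(m, p);
  return m;
}
```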

11 Kernel Metadata
- Granularity: Component, Pixel
- Image access: Read, Write
- Memory access: Point, Ranged, Random
- Edge methods: None, Clamped, Constant
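The edge method decides what a ranged or random read returns outside the image. This is a minimal sketch assuming a single-channel, row-major buffer. `EdgeMethod` and `sample` are hypothetical names. Treating `None` as a promise that the kernel never reads out of bounds (checked here with `assert`) is an assumption about its meaning.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical enum mirroring the slide's edge-method metadata.
enum class EdgeMethod { None, Clamped, Constant };

// Reads pixel (x, y) of a single-channel, row-major w*h image, resolving
// out-of-bounds coordinates according to the edge method.
float sample(const std::vector<float>& img, int w, int h, int x, int y,
             EdgeMethod edge, float constant = 0.0f) {
  assert(static_cast<std::size_t>(w) * h == img.size());
  const bool inside = x >= 0 && x < w && y >= 0 && y < h;
  switch (edge) {
    case EdgeMethod::None:
      // Assumed meaning: the kernel guarantees in-bounds reads, so no edge
      // handling is generated; an out-of-bounds read is a bug.
      assert(inside);
      break;
    case EdgeMethod::Clamped:
      // Repeat the nearest edge pixel.
      x = x < 0 ? 0 : (x >= w ? w - 1 : x);
      y = y < 0 ? 0 : (y >= h ? h - 1 : y);
      break;
    case EdgeMethod::Constant:
      // Anything outside the image reads as a fixed value.
      if (!inside) return constant;
      break;
  }
  return img[static_cast<std::size_t>(y) * w + x];
}
```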

12 Current Language Support
- CUDA
- OpenCL
- C++ scalar
- C++ SIMD (SSE2, SSE4.1, AVX)
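To show why a separate C++ SIMD target pays off, here is a sketch of the gain kernel in two forms. The first is scalar. The second uses SSE intrinsics, available on any SSE2-capable x86 CPU, to process four floats per instruction with a scalar tail for the remainder. The function names are hypothetical. On builds without `__SSE2__`, the SIMD version simply falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE/SSE2 intrinsics
#endif

// C++ scalar target: one component per loop iteration.
void gainScalar(const float* src, float* dst, std::size_t n, float gain) {
  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i] * gain;
}

// C++ SIMD target sketch: four floats per iteration in 128-bit registers,
// then the scalar loop for the remaining 0-3 components. Without SSE2
// (e.g. non-x86 builds) it degenerates to the scalar loop.
void gainSimd(const float* src, float* dst, std::size_t n, float gain) {
  assert(n == 0 || (src != nullptr && dst != nullptr));
  std::size_t i = 0;
#if defined(__SSE2__)
  const __m128 g = _mm_set1_ps(gain);          // broadcast gain to 4 lanes
  for (; i + 4 <= n; i += 4) {
    const __m128 v = _mm_loadu_ps(src + i);    // unaligned load of 4 floats
    _mm_storeu_ps(dst + i, _mm_mul_ps(v, g));  // multiply lane-wise, store
  }
#endif
  gainScalar(src + i, dst + i, n - i, gain);   // remainder
}
```

Both paths perform the same single-precision multiply per component, so their results match exactly. The SIMD path just issues a quarter as many multiply instructions in the main loop.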

13 Current RIP-Based Effects
- Denoise
- Video Retimer
- Convolution
- Depth-Based Blur
- Motion Blur
- Vector Generator

14 Run-time RIP
- Kernels are generated at run time (JIT) for specific image formats and data types
- The machine is profiled to determine the best language options to use
- CPU kernels are compiled using LLVM
- GPU kernels are currently translated to OpenCL
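One way to profile the machine is to time a representative workload on each available backend and keep the fastest. The sketch below assumes each backend is wrapped in a callable. `Backend` and `pickFastestBackend` are hypothetical names. Scoring by best-of-N wall-clock time is one reasonable heuristic, not RIP's documented strategy.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Hypothetical wrapper: a backend name plus a callable that runs one
// representative workload (e.g. a kernel on a test image) on that backend.
struct Backend {
  std::string name;
  std::function<void()> run;
};

// Times every candidate `reps` times and returns the name of the fastest,
// scoring each by its best run so one-off costs (first-run compilation,
// OS scheduling hiccups) do not dominate.
std::string pickFastestBackend(const std::vector<Backend>& candidates, int reps = 3) {
  assert(!candidates.empty() && reps > 0);
  using Clock = std::chrono::steady_clock;
  std::string best;
  double bestSeconds = std::numeric_limits<double>::infinity();
  for (const Backend& b : candidates) {
    double fastest = std::numeric_limits<double>::infinity();
    for (int r = 0; r < reps; ++r) {
      const auto t0 = Clock::now();
      b.run();
      const std::chrono::duration<double> dt = Clock::now() - t0;
      fastest = std::min(fastest, dt.count());
    }
    if (fastest < bestSeconds) {
      bestSeconds = fastest;
      best = b.name;
    }
  }
  return best;
}
```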

15 Example: Denoise
- Proprietary wavelet-based algorithm
- Requires 20+ kernels
- Tunable parameters for best results
- Must run at interactive speeds
- The legacy CPU plug-in is too slow

16 Denoise: Tests
Image: 1920x1080 (1080p) RGB, 32-bit float

GPUs:
- Quadro FX 3800M (4 SM)
- Quadro K600 (1 SMX)
- Quadro … (… SM)
- Quadro … (… SM)
- Quadro K… (… SMX)
- GeForce GTX … (… SM)
- Tesla K20 (13 SMX)

CPUs:
- Core i7-…M (2 cores + HT)
- Core i7-3667U (2 cores + HT)
- Xeon E… (… cores)
- Xeon X… (… cores + HT)
- Xeon E… (… cores + HT)
- Xeon E… (… cores + HT)

19 The RIP Node: Fast R&D
- Develop kernels at run time within our software, using the RIP language
- No other development tools are required
- Parameter sliders are created automatically via kernel introspection
- Use a graph of nodes to create complex algorithms
- Great for rapid research and development
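Slider generation via introspection can be imitated in ordinary C++ by having `define()` record each parameter as it is declared. This is a sketch under stated assumptions. `KernelBase`, `ParamInfo`, `defineParam` and `describeSliders` are hypothetical stand-ins for the slide's `define()`/`defineparam` mechanism, and a real host would build UI widgets rather than text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of one float parameter found by introspection.
struct ParamInfo {
  std::string name;
  float defaultValue;
  float* storage;  // where the host writes the slider's value
};

// Hypothetical base class: a kernel's define() calls defineParam(), which
// both sets the default and records the parameter for the host to list.
class KernelBase {
 public:
  KernelBase() = default;
  KernelBase(const KernelBase&) = delete;  // storage points into *this
  KernelBase& operator=(const KernelBase&) = delete;
  virtual ~KernelBase() = default;

  const std::vector<ParamInfo>& params() {
    if (!defined_) {  // introspect once, on first request
      define();
      defined_ = true;
    }
    return params_;
  }

 protected:
  virtual void define() = 0;
  void defineParam(float& p, const std::string& name, float def) {
    p = def;
    params_.push_back({name, def, &p});
  }

 private:
  bool defined_ = false;
  std::vector<ParamInfo> params_;
};

// C++ stand-in for the parameter part of the GainImage kernel.
class GainImage : public KernelBase {
 public:
  float gain = 0.0f;

 protected:
  void define() override { defineParam(gain, "gain", 1.0f); }
};

// A host builds one slider per discovered parameter; text stands in for UI.
std::string describeSliders(KernelBase& k) {
  std::string ui;
  for (const ParamInfo& p : k.params())
    ui += "slider " + p.name + " (default " + std::to_string(p.defaultValue) + ")\n";
  return ui;
}
```

Because the host writes through `ParamInfo::storage`, moving a slider updates the kernel's `gain` directly. No per-kernel UI code is needed.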

20 Example: Directional Blur
CPU: Xeon X5550; GPU: Quadro 6000
- Legacy CPU: 227 s
- GPU RIP: 5.6 s (40 times faster)
- GPU RIP Pixel: 3.0 s (75 times faster)
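The quoted speedups are the legacy time divided by each GPU time. The slide rounds both results down:

```latex
S = \frac{T_{\text{legacy}}}{T_{\text{GPU}}}, \qquad
S_{\text{GPU RIP}} = \frac{227\ \text{s}}{5.6\ \text{s}} \approx 40.5, \qquad
S_{\text{GPU RIP Pixel}} = \frac{227\ \text{s}}{3.0\ \text{s}} \approx 75.7
```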

21 GPU Image Processing: Issues
- Memory is finite and limited
  - Our software supports very large images, so it is not always possible to process a whole image on a GPU
  - Point and ranged access processing is therefore tiled
- Transfer times are relatively long
  - We try to keep intermediate data on the GPU as long as possible
  - We are working on better caching
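Tiling a ranged-access kernel requires each tile to carry a halo of neighbouring pixels, so that results inside the tile match a whole-image run. This is a minimal sketch assuming a single-channel, row-major image, a clamped horizontal box blur, and tiling along x only. Copying a tile plus its halo into a separate buffer stands in for a transfer to GPU memory. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Whole-image reference: horizontal box blur of radius r with clamped edges
// over a single-channel, row-major w*h image (a ranged-access kernel).
std::vector<float> blurX(const std::vector<float>& img, int w, int h, int r) {
  assert(static_cast<std::size_t>(w) * h == img.size() && r >= 0);
  std::vector<float> out(img.size());
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x) {
      float sum = 0.0f;
      for (int i = -r; i <= r; ++i)
        sum += img[y * w + std::min(std::max(x + i, 0), w - 1)];
      out[y * w + x] = sum / static_cast<float>(2 * r + 1);
    }
  return out;
}

// Tiled version: each tile of up to tileW columns is copied out together
// with an r-pixel halo on each side (standing in for an upload to GPU
// memory), blurred on its own, and only the tile's interior is copied back.
std::vector<float> blurXTiled(const std::vector<float>& img, int w, int h,
                              int r, int tileW) {
  assert(tileW > 0);
  std::vector<float> out(img.size());
  for (int x0 = 0; x0 < w; x0 += tileW) {
    const int x1 = std::min(x0 + tileW, w);  // tile interior: [x0, x1)
    const int hx0 = std::max(x0 - r, 0);     // interior plus halo: [hx0, hx1)
    const int hx1 = std::min(x1 + r, w);
    const int tw = hx1 - hx0;
    std::vector<float> tile(static_cast<std::size_t>(tw) * h);
    for (int y = 0; y < h; ++y)
      for (int x = hx0; x < hx1; ++x) tile[y * tw + (x - hx0)] = img[y * w + x];
    const std::vector<float> res = blurX(tile, tw, h, r);
    for (int y = 0; y < h; ++y)
      for (int x = x0; x < x1; ++x) out[y * w + x] = res[y * tw + (x - hx0)];
  }
  return out;
}
```

With an `r`-pixel halo on each side, `blurXTiled` produces exactly the same output as `blurX` for any tile width. Every interior pixel sees the same window, summed in the same order, in both versions. Clamping inside a tile only affects halo pixels, which are discarded.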

22 The Future: Highlights
- Beyond 2D image processing: 3D data, deep data, arrays of structures
- Heterogeneous computing: use all available devices, efficient scheduling, minimised data transfers, unified kernel results
- Greater GPU optimisation: better caching, better use of the Kepler architecture

23 Questions?
