1 Real-Time Ray Tracing Using Nvidia OptiX
Holger Ludvigsen & Anne C. Elster, 2010
Presentation by Henrik H. Knutsen for TDT24, fall 2012
2 Previous work
- Significant difference between pre- and post-CUDA work
- Much work was done on CPUs and customized hardware, but without satisfactory results compared to rasterization
- A highly optimized CPU implementation has been developed; it takes advantage of caching, SIMD instructions and frame-to-frame coherence
- A programmable ray processing unit (RPU) chip specialized for ray tracing
- All of these techniques gave low frame rates; true real-time performance has been obtained using GPUs
3 Introduction to ray tracing
- Ray tracing is used to render realistic shadows, reflections, scattering and dispersion
- High degree of visual realism, as it approximates the behaviour of light
- The drawback is high computational complexity
- Traditionally done on CPUs, where it could take hours to render a single scene
- If the high number of cores on GPUs is taken advantage of, ray tracing is a viable option for real-time visualization
- Ray tracing is an embarrassingly parallel problem: each ray is traced independently, typically one ray per pixel
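To make the "one independent ray per pixel" point concrete, here is a minimal sketch in plain Python. The pinhole camera, the single sphere and all names (`ray_sphere_t`, `trace_pixel`, `render`) are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

# Hypothetical scene: one sphere in front of a pinhole camera at the origin.
SPHERE_CENTER = (0.0, 0.0, -3.0)
SPHERE_RADIUS = 1.0


def ray_sphere_t(origin, direction, center, radius):
    """Nearest positive hit distance along a normalized ray, or None on a miss."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    t = -b - root
    if t <= 0.0:
        t = -b + root  # origin is inside the sphere
    return t if t > 0.0 else None


def trace_pixel(x, y, width, height):
    """Trace the primary ray of one pixel; needs no data from any other pixel."""
    u = (x + 0.5) / width * 2.0 - 1.0
    v = 1.0 - (y + 0.5) / height * 2.0
    length = math.sqrt(u * u + v * v + 1.0)
    direction = (u / length, v / length, -1.0 / length)
    hit = ray_sphere_t((0.0, 0.0, 0.0), direction, SPHERE_CENTER, SPHERE_RADIUS)
    return "#" if hit is not None else "."


def render(width, height):
    # Every call to trace_pixel is independent, so this double loop is the
    # part a GPU can spread over thousands of threads.
    return ["".join(trace_pixel(x, y, width, height) for x in range(width))
            for y in range(height)]
```

Calling `render(8, 4)` returns four strings of eight characters, with `#` where the primary ray hits the sphere.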
4
5 Optimization techniques
- 75% to 95% of the rendering time is spent on intersection computations
- Large numbers of objects can be skipped; a simple example: for a ray cast into the lower half of the scene, objects in the upper half can be disregarded
- Uniform grid: the scene is divided into equally sized cubes, each storing data on the objects it contains; the cubes a ray intersects are trivially computed
- k-d tree: recursively split the scene into two parts separated by a plane
- Bounding volume hierarchy: a bounding volume is a geometric form that entirely covers one or more objects; the volumes are organized in a tree where each node covers its children
- Early ray termination and volume skipping (ray casting)
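The "lower half / upper half" example can be sketched with an axis-aligned bounding box and the standard slab test. This is an illustrative sketch only; the function name `ray_hits_box` and the two boxes are assumptions, not code from the paper.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: does the ray (t >= 0) intersect the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False  # the per-axis intervals do not overlap
    return True


# Hypothetical scene split into an upper and a lower bounding box.
UPPER_BOX = ((-1.0, 1.0, -1.0), (1.0, 2.0, 1.0))
LOWER_BOX = ((-1.0, -2.0, -1.0), (1.0, -1.0, 1.0))

# A ray cast straight down from the middle of the scene.
DOWN_RAY = ((0.0, 0.0, 0.0), (0.0, -1.0, 0.0))
```

If a box is missed, every object inside it can be skipped without any per-object intersection test; in a bounding volume hierarchy the same test is applied recursively from the root node down.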
6 Nvidia CUDA
- Architecture for general-purpose computing on GPUs
- Extension of C
- Freely available
- Host code is executed on the computer (CPU), device code is executed on the GPU
- Device code is commonly called kernels; kernel code is executed in parallel on the device
- General program flow:
  1. Data to be processed is copied from the host to the GPU
  2. The CPU instructs the GPU to perform the calculations
  3. The result is copied back to the host
- CUDA 5.0 released October 15th
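The three-step program flow can be mimicked in plain Python. Note that this is only a simulation of the pattern: it uses no CUDA API at all, the "device memory" is an ordinary list, and the names `scale_kernel` and `run_on_device` are made up for this sketch.

```python
def scale_kernel(thread_index, device_in, device_out, factor):
    """Stand-in for device code: one invocation per thread, one element each."""
    device_out[thread_index] = device_in[thread_index] * factor


def run_on_device(host_data, factor):
    # 1. Copy the input from host memory to (simulated) device memory.
    device_in = list(host_data)
    device_out = [0.0] * len(device_in)

    # 2. The host launches the kernel. On a real GPU these invocations would
    #    run in parallel; here they run one after another.
    for thread_index in range(len(device_in)):
        scale_kernel(thread_index, device_in, device_out, factor)

    # 3. Copy the result back to host memory.
    return list(device_out)
```

For example, `run_on_device([1.0, 2.0, 3.0], 2.0)` returns `[2.0, 4.0, 6.0]`.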
7 Nvidia OptiX
- Programmable ray tracing engine released September 21, 2009
- Runs on top of Nvidia CUDA
- Set of library functions for graphics rendering and other applications using ray tracing
- The programmer writes programs (kernels) that handle the various events of ray tracing, e.g. ray generation, ray hit/miss, bounding box intersection etc.
- The programs are executed during different stages of the ray tracing; this, together with the freedom to implement the desired behaviour in each program, is the core of the flexibility
- In host code, the API is set up through a context structure containing the configuration and components of the ray tracing: the programs, the geometry and the surface properties of materials
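The programming model (user-supplied programs for each ray tracing event, collected in a context) can be sketched as follows. This is a conceptual toy in plain Python, not the OptiX API: `Context`, `launch` and all program names are hypothetical, and the "geometry" is just a disc in normalized image coordinates.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Context:
    """Hypothetical stand-in for a context holding the user programs."""
    ray_generation: Callable
    intersect: Callable
    closest_hit: Callable
    miss: Callable


def launch(context, width, height):
    """The engine decides when each program runs; the user decides what it does."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            ray = context.ray_generation(x, y, width, height)
            hit = context.intersect(ray)
            if hit is not None:
                row.append(context.closest_hit(ray, hit))
            else:
                row.append(context.miss(ray))
        image.append(row)
    return image


# Toy user programs (assumptions for illustration only).
def generate_ray(x, y, width, height):
    return ((x + 0.5) / width * 2.0 - 1.0, (y + 0.5) / height * 2.0 - 1.0)


def intersect_disc(ray):
    u, v = ray
    d2 = u * u + v * v
    return d2 if d2 <= 0.25 else None  # disc of radius 0.5 around the centre


def shade_hit(ray, hit):
    return "hit"


def shade_miss(ray):
    return "miss"


toy_context = Context(generate_ray, intersect_disc, shade_hit, shade_miss)
```

Swapping in a different `closest_hit` or `miss` function changes the rendering behaviour without touching `launch`, which is the kind of flexibility the slide refers to.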
8 Presented implementations
- Snow crystal scene implemented from scratch
- Centerpiece scene with the camera rotating around a center object, with different modes and objects
- Test suite with 14 parameters for finding optimized settings for ray tracing snow particles
9 Snow crystal scene
- Polygon models of snow crystals with a transparent material
- The crystals rotate while slowly falling down
- Each crystal has a region at the top where it respawns when it reaches the bottom
- Spawning scheme where some crystals spawn close to the camera and some spawn at the back of the scene
- Snow crystal models with 516 and 1040 triangles
- Complex glass shader from the OptiX SDK
10 Testing system and results
- GPU: Nvidia Quadro FX 5800
- CPU: Intel Core 2 Quad Q9550 2.83 GHz
- Memory: 4x Corsair 2 GB DDR3 1333 MHz
- OS: Microsoft Windows XP 64 bit
- Compiler: Microsoft Visual Studio 2008
- The main issue is refraction and reflection when rays hit snow crystals
- This shows how ray branching is the major challenge for ray tracing performance; GPUs do not handle this well
- Camera position and surface type also greatly affect performance
- Nevertheless, OptiX is capable of rendering high-definition scenes with complex models in real time
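A rough way to see why branching is costly: if every hit on a transparent surface spawns both a reflected and a refracted ray, the worst-case number of rays per pixel grows exponentially with the recursion depth. The sketch below only counts rays under that assumption; `max_rays_per_pixel` is a name made up for this illustration and the depths are not settings from the paper.

```python
def max_rays_per_pixel(max_depth, branches_per_hit=2):
    """Worst-case ray count when every ray up to max_depth spawns new rays.

    Depth 0 is the primary ray; each hit spawns `branches_per_hit` secondary
    rays (2 = one reflected plus one refracted ray).
    """
    return sum(branches_per_hit ** level for level in range(max_depth + 1))


for depth in (0, 1, 3, 5):
    print(depth, max_rays_per_pixel(depth))
```

With two branches per hit this gives 1, 3, 15 and 63 rays for depths 0, 1, 3 and 5, whereas a purely reflective surface (`branches_per_hit=1`) needs only `max_depth + 1` rays.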
11 Performance vs optimized ray tracers
- The OptiX results are 3-4 times slower; both sets of results are from GT200-generation GPUs
- The reasons are that OptiX is more flexible, and that the implementations in AL09 were hand-optimized at the assembly level for a specific scene
12 Speedup with multiple GPUs
- As expected, there was a slowdown on the simplest scenes
- Almost perfect speedup for some complex scenes
- Scenes that speed up well have a lot of GPU computation compared to tasks like image display, data transfer and CPU computation
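Both observations fit a simple frame-time model in the spirit of Amdahl's law. The model and every number below are assumptions chosen for illustration, not measurements from the paper: a fixed per-frame cost (display, transfer, CPU work), GPU work that divides evenly across GPUs, and a small coordination cost per extra GPU.

```python
def frame_time_ms(fixed_ms, gpu_ms, sync_ms, num_gpus):
    """Hypothetical frame time: fixed cost + divided GPU work + sync overhead."""
    return fixed_ms + gpu_ms / num_gpus + sync_ms * (num_gpus - 1)


def speedup(fixed_ms, gpu_ms, sync_ms, num_gpus):
    """Speedup of num_gpus GPUs relative to a single GPU under this model."""
    one = frame_time_ms(fixed_ms, gpu_ms, sync_ms, 1)
    many = frame_time_ms(fixed_ms, gpu_ms, sync_ms, num_gpus)
    return one / many


# Made-up example scenes (milliseconds per frame).
simple_scene = speedup(fixed_ms=4.0, gpu_ms=1.0, sync_ms=1.0, num_gpus=2)
complex_scene = speedup(fixed_ms=1.0, gpu_ms=100.0, sync_ms=1.0, num_gpus=2)
```

Under these assumed numbers the simple scene gets slower with two GPUs (speedup of about 0.91), while the GPU-bound scene comes close to the ideal factor of 2 (about 1.94).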
13 Conclusions and future work
- The OptiX engine gives control over the power of modern GPUs without having to create a ray tracer from scratch
- True real-time ray tracing performance was achieved, with frame rates ranging from 20.63 to 67.51 fps on 1 GPU, though still slower than hand-optimized ray tracers
- Near-perfect speedup on dual GPUs for scenes with high computational complexity
- GPUs cannot do branching efficiently; efficient branching is important because the directions of reflected rays are not known in advance
- Future work: incorporate ideas from recent work into full-scale applications for use in medical imaging etc.
- Future work: optimize for OptiX