Realtime and Interactive Ray Tracing Basics and Latest Developments


Thomas Wöllert (Dipl.-Inf. (FH))
Semester group IG2
Advanced Seminar, Semester Thesis, Summer 2006
Master of Science Computer Graphics and Image Processing
Department of Computer Science and Mathematics
Munich University of Applied Sciences

No pessimist ever discovered the secret of the stars or sailed an uncharted land, or opened a new doorway for the human spirit. Helen Keller ( ) [1]

Abstract

Type: Advanced Seminar Thesis
Author: Wöllert, Thomas (Dipl.-Inf. (FH))
Title: Realtime and Interactive Ray Tracing, Basics and Latest Developments
Date: 22nd June 2006
Number of Pages: 66
Field of Study: Master of Science - Computer Graphics and Image Processing
University: Munich University of Applied Sciences, Germany
Advisor: Prof. Dr. A. Nischwitz

Ray tracing was first described by Arthur Appel in 1968. Due to its high rendering time it is used almost exclusively for pre-rendered motion pictures featuring realistic illumination, reflection and refraction. In recent years big strides have been made to prepare ray tracing for realtime applications such as computer games and dynamic virtual environments.

This semester thesis starts with an explanation of the principles of ray tracing, presenting the terms and the different rendering styles (e.g. recursive and distribution ray tracing). Afterwards it discusses why ray tracing is not yet used on today's graphics cards, along with its pros and cons compared to common rasterization. Chapter 3 focuses on different methods to speed up the ray tracing process. This can be accomplished by reducing the number of rays or by using acceleration structures (e.g. uniform grid, kd-tree) to speed up the intersection tests, which take up most of the rendering time. The latest developments regarding acceleration structures designed especially for dynamic scenes can be found in Chapter 4.

With the described tools it is possible to realize different approaches aiming at realtime ray tracing, which are presented in Chapter 5 (CPU, hybrid CPU-GPU, GPU, special-purpose hardware). Chapter 6 puts a special focus on GPU ray tracers, explaining different approaches and their problems. A benchmark visualizing the results concludes that chapter.

Vital for the mass marketing of ray tracing is an easy-to-use programming interface. One approach with that potential is described in Chapter 7: the OpenRT Realtime Ray Tracing API, featuring an OpenGL-like programming interface. Specific applications of ray tracing are described in Chapter 8, presenting graphical (e.g. massively complex models) and non-graphical (e.g. collision detection, artificial intelligence) examples. Chapter 9 concludes this document.

Keywords: ray tracing, interactive rendering, programmable graphics hardware, GPU, acceleration structures

Contents

Abstract
List of Figures
List of Tables
1 Introduction
  Task
  Motivation
  Overview
2 Basic Principles of Ray Tracing
  Terms
  The Ray Tracing Rendering Algorithm
  The First Approach
  Recursive Ray Tracing
  Distribution Ray Tracing
  Shader Ray Tracing
  Ray Tracing vs. Rasterization
3 Acceleration Methods
  Computing Less Samples in the Image Plane
  Reducing the Number of Secondary Rays
  Accelerated Intersection Tests
  Primitive-specific Tests
  Bounding Volumes
  Spatial Subdivision Schemes
  Bounding Volume Hierarchies
4 Acceleration Methods for Dynamic Scenes
  Dynamic Bounding Volume Hierarchies
  Coherent Grid Traversal
  Distributed Interactive Ray Tracing
  Benchmarking Animated Ray Tracing
5 Approaches to realize Realtime Ray Tracing
  Software Approaches
  Using Programmable GPUs
  Special purpose Hardware Architectures
6 Using the GPU for Ray Tracing
  Stream Computation
  Choosing the Acceleration Structure
  Benchmarks
7 Realtime Ray Tracing API (RTRT/OpenRT)
8 Applications
  Computer Graphic Applications
  Computer Games
  Visualizing Massively Complex Models
  Free-Form Surfaces and Volume Ray Tracing
  Mixed Reality Rendering
  A.I.-Vision and Collision Detection
9 Conclusions
  Summary
  Final Thoughts
Bibliography

List of Figures

1.1 Rendered Image from Final Fantasy - The Spirits Within (Courtesy of Columbia Pictures)
Rendered Image from Ice Age II - Meltdown (Courtesy of Twentieth Century Fox)
Basic Ray Tracing: Rays being cast into a scene, filled with primitives (Courtesy of Wald [8])
Recursive Ray Tracing: Generating secondary rays at t_hit of the primary ray (Courtesy of Wald [8])
Spatial Subdivision: Triangle B belongs to multiple voxels. Stopping the ray traversal early would mean to miss the intersection with triangle A (Courtesy of Thrane [20])
Uniform grid: Traversal of a uniform grid (Courtesy of Havran [9])
KD-Tree: Three steps of the tree construction (Courtesy of Havran [9])
BVH: Tree for cow example (Courtesy of Somchaipeng [23])
BVH: A solid cow and the levels in its bounding volume hierarchy (Courtesy of Somchaipeng [23])
Dynamic BVH: Two child nodes of a BVH tree. Bounding volumes move over time (left to right) (Courtesy of Wald [24])
Dynamic BVH: Rendered scene using triangles, rendering at 1024x1024 pixels (Courtesy of Wald [24])
Coherent Grid Traversal: Computation steps (Courtesy of Wald [25])
Coherent Grid Traversal: Adaption to the scene geometry of kd-tree (a) and grid (b) (Courtesy of Wald [25])
Distributed Ray Tracing: Robots (left) with color-coded objects (right). Triangles of the same object belong to the same color (Courtesy of Wald [26])
Distributed Ray Tracing: Two-level hierarchy with a top level BSP-tree containing references to instances. Objects consist of geometry and a local BSP tree (Courtesy of Wald [26])
4.7 BART: Images from kitchen (left) and robots (right) (Courtesy of Lext [28])
BART: Images from museum (Courtesy of Lext [28])
VIZARD: A rendered volume dataset of a lobster (Courtesy of Meißner [41])
VolumePro: Rendered medical volume data (Courtesy of Mitsubishi Electric [42])
SaarCOR: Scenes rendered at 4.5 and 7.5 fps on the 66 MHz prototype (Courtesy of Woop [45])
The stream programming model (Courtesy of Purcell [39])
The programmable graphics pipeline (Courtesy of Purcell [39])
The kernels used in the streaming ray tracer (Courtesy of Purcell [39])
Uniform grid stored in several textures (Courtesy of Purcell [39])
Example for traversal of a BVH tree in a texture (Courtesy of Thrane [20])
Benchmark: Purcell GPU Test Scenes: Cornell box, teapotahedron, Quake 3 (Courtesy of Purcell [39])
Benchmark: Thrane & Simonsen: Cows, Robots, Kitchen and Bunny (from left to right) (Courtesy of Thrane [20])
Quake 3 Raytraced: Multiple reflections
Quake 3 Raytraced: Look-through portal
Quake 3 Raytraced: High number of polygons
Quake 3 Raytraced: Area crowded with characters
UNC power plant: Highly detailed building (Courtesy of Wald [55])
UNC power plant: Rendering with shadows (Courtesy of Wald [55])
Boeing 777: Overview (Courtesy of Boeing Corp.)
Boeing 777: Engine interior (Courtesy of Boeing Corp.)
Boeing 777: Dynamic loading (at startup) (Courtesy of Boeing Corp.)
Boeing 777: Dynamic loading (after startup) (Courtesy of Boeing Corp.)
Free-Form Surfaces: Round face (Courtesy of Benthin [57])
Free-Form Surfaces: Chess game (Courtesy of Benthin [57])
Volume Ray Tracing: Bonsai (Courtesy of Marmitt [58])
Volume Ray Tracing: Skull (Courtesy of Marmitt [58])
Mixed Reality: Shadows and reflections (Courtesy of Pomi [59] [60])
Mixed Reality: TV with reflections (Courtesy of Pomi [59] [60])
Mixed Reality: TV as red light source (Courtesy of Pomi [59] [60])
Mixed Reality: TV as green light source (Courtesy of Pomi [59] [60])
A.I. Vision - Visible player (Courtesy of Pohl [52])
A.I. Vision - Hidden player (Courtesy of Pohl [52])

List of Tables

6.1 Benchmark: CPU-GPU Hybrid: Speedups if using the GPU to render the teapot scene
Benchmark: Purcell GPU: Scene complexity and results, with eye rays (E), shadow rays (S), reflection rays (R)
Benchmark: Thrane & Simonsen: Scene complexity
Benchmark: Thrane & Simonsen: Average rendering times in milliseconds per frame, including shadow and reflection rays where applicable

Chapter 1 Introduction

This semester thesis was created in the context of the Master of Science advanced seminar held at the Munich University of Applied Sciences [2]. The main focus lies on using the GPU (Graphics Processing Unit) of modern graphics cards for acceleration purposes.

1.1 Task

Staying up to date on the latest developments in a certain area is often a difficult task. Development and science take rapid strides forward every day, making it almost impossible to stay informed without a 48-hour day. The task of this advanced seminar was to study and collect information on a specific topic and to present it adequately in this written document and in a presentation for all participants.

Ray tracing has the potential to determine how computer graphics will evolve over the next decade. This document is designed as a starting point for getting an overview not only of the basics of ray tracing, but also of its latest developments (up to the time of this writing in June 2006).

1.2 Motivation

People have always tried to break out of their normal world. Reading books has been and still is a popular way to do that, with the drawback that the experience only exists within the reader's mind. It is impossible to see this other world apart from the paintings on the book's cover.

Since the beginning of the 20th century movies have tried to remedy this disadvantage by presenting fictional worlds to their viewers. To tell the story, models and puppets were used to act as starships or monsters. Today these first steps look crude and not very convincing (sometimes the ropes attached to the wings of the starfighter were visible).


Computer games were born around the middle of the last century, but found no mass market until the first cheap 8-bit home computers (e.g. Apple, Commodore VIC-20) became available [3]. Due to the hardware's limited graphical abilities, most games of that time relied solely on text (e.g. Zork by Infocom [4]).

Evolving over the years, movies and computers merged. Today hardly any movie is created without extensive use of computer graphics to make the story more convincing for the viewer. As one of the first movies to be rendered completely within a computer, Final Fantasy - The Spirits Within (2001) (see Figure 1.1) [5] showed that human actors could be realistically replaced by computer-generated characters. Afterwards many motion pictures followed in these footsteps, e.g. Ice Age (2002), Ice Age II (2006) (see Figure 1.2) [6] and others.

Figure 1.1: Rendered Image from Final Fantasy - The Spirits Within (Courtesy of Columbia Pictures).

Figure 1.2: Rendered Image from Ice Age II - Meltdown (Courtesy of Twentieth Century Fox).

Among other new graphics technologies, ray tracing played a vital role in creating these virtual worlds. It enabled the director to simulate realistic illumination and other effects. The major disadvantage of ray tracing is that rendering a single image takes a long time, which makes it unfeasible for computer games. Games cannot rely on pre-rendered imagery, because their world needs to be dynamic and respond immediately to the user's actions.

Ray tracing was first described by Arthur Appel in 1968 [7]. While it was used solely for pre-rendered images for some time, ray tracing has become more and more interesting for realtime scenes during the past years. The processing power of today's CPUs (Central Processing Units) and graphics cards has evolved far enough to make realtime ray tracing possible.

1.3 Overview

The purpose of this document is to provide an overview of the field of ray tracing, covering not only the basics but also the latest developments in algorithms and hardware.

This semester thesis starts with an explanation of the principles of ray tracing, presenting the terms and the different rendering styles (e.g. recursive and distribution ray tracing). Afterwards it discusses why ray tracing is not yet used on today's graphics cards, also mentioning its pros and cons compared to common rasterization.

As explained in Chapter 2, the intersection tests in the ray tracing algorithm need to be accelerated. Chapter 3 therefore explains methods to speed up these tests by reducing the number of rays as well as by creating support structures such as uniform grids, kd-trees and others. However, these methods and support structures have been developed for static scenes and often have problems adapting to a dynamic environment. The latest developments mainly focus on creating new acceleration methods for dynamic scenes, which are explained in Chapter 4.

With all the tools ready, Chapter 5 presents different approaches to realizing realtime ray tracing. CPU, GPU, hybrid CPU-GPU, and special-purpose hardware techniques are discussed along with their advantages and drawbacks. Chapter 6 puts a special focus on GPU ray tracers, explaining different approaches and their problems. A benchmark visualizing the results concludes that chapter.

Vital for the mass marketing of ray tracing is an easy-to-use programming interface. As one such approach the OpenRT Realtime Ray Tracing API, featuring an OpenGL-like programming interface, is explained in Chapter 7.

Specific graphical applications of ray tracing are described in Chapter 8, featuring computer games, rendering of massively complex geometry and others. Some applications outside the graphics area are described afterwards, showing how ray tracing can be useful for other fields such as collision detection and artificial intelligence. Chapter 9 concludes this document, offering a summary as well as a perspective on future developments and how they will affect the field of computer graphics.

Chapter 2 Basic Principles of Ray Tracing

In contrast to the polygonal rendering methods used on today's graphics cards, ray tracing tries to simulate the natural way light rays travel through a scene. This chapter describes the basic elements and terms of ray tracing as well as different algorithmic approaches.

2.1 Terms

The core task of all ray tracing algorithms is to cast a ray of light into a scene containing geometry. The ray might or might not intersect objects in the scene. The problem is to determine which object, if any, has been hit by the ray and where. In order to do that, the light ray is defined by its origin o and its direction d. Using the time t as a parameter leads to r(t) = o + t d. Additionally a time t_max is defined, telling the algorithm when to discard the ray in case it did not hit any geometry (the ray then left the scene).

There are two different types of rays: primary rays and secondary rays. Primary rays are all rays whose origin o is the eye point. In contrast, secondary rays are only generated if a primary ray hits an object in the scene. Such secondary rays include shadow, reflection, and refraction rays. A more detailed description is given in Section 2.2.

The ray tracer itself covers three different problems as described by Wald in his PhD thesis [8]: finding the closest intersection to the origin, finding any intersection along the ray, and finding all intersections along the ray's path.

Finding the closest intersection to the origin is the basic task of any ray tracer. It involves determining the time t_hit at which the ray intersects a primitive P in the scene. When P is hit, most algorithms also store additional information like the surface material or the normal vector in order to correctly shade the ray's pixel or to generate secondary rays if needed.

The second problem is to find any intersection along the ray. This visibility test is applied to the ray between its origin o and its end o + t_max d. Most ray tracers include very sophisticated algorithms for such tests, which are especially helpful when shooting shadow rays. Normal primary rays might require more intersection tests to determine the primitive which has been hit, but in case of a shadow ray it is only of interest whether any geometry is hit at all.

Finding all intersections along the path of a ray is only necessary for some advanced lighting algorithms as described by Wald [8], which are not very common (e.g. global Monte Carlo techniques to compute radiosity).

2.2 The Ray Tracing Rendering Algorithm

Since ray tracing was first described in 1968, several different approaches have been invented, each with the goal of including more effects (e.g. reflections) than the previous algorithm.

2.2.1 The First Approach

The first approach was created by Arthur Appel for the rendering of solid objects in 1968 [7]. A three-dimensional scene needs to be rendered into a two-dimensional image, which is then displayed on the screen. For each pixel in the image one primary ray is generated, originating at the eye point. In case an object is hit by a ray, the object's material properties and color determine the color of the ray and therefore the color of the pixel in the image (see Figure 2.1).

Figure 2.1: Basic Ray Tracing: Rays being cast into a scene, filled with primitives (Courtesy of Wald [8]).

At this stage the pixel of the image has received a certain color, based on the primitive which has been hit by the associated ray, but lights and shadows are not supported. A simple way to implement shadows is to generate a shadow ray every time a primary ray hits an object. The origin o of the shadow ray is the intersection point determined for the primary ray: r(t_hit) = o_eye + t_hit d. The direction d of the shadow ray points from this hit point towards a light source; one shadow ray is generated per light source in the scene. In case there is no intersection along the shadow ray's path, the light from the specific light source reaches the hit point.
If an intersection is found, another object is located between the light source and the hit point. Both cases change the pixel color, which becomes either lighter (no obstacle between the hit point and the light source) or darker (an obstacle casting a shadow). Shadow rays do not generate any further rays.

This approach is unable to display any kind of reflection or refraction, because no secondary rays other than shadow rays are generated. In addition, the rendered shadows are not soft, resulting in sharp borders between the shadowed and the lit area.

2.2.2 Recursive Ray Tracing

Today's most common approach to ray tracing, which involves recursive calls, was first described by Turner Whitted [10] in 1980. In addition to the ray casting described by Arthur Appel, it was now possible to generate secondary rays accounting for reflections and refractions. If a primary ray intersects an object, not only shadow rays but also reflection and refraction rays are generated. An example is shown in Figure 2.2, where a primary ray hits a glass that is partly reflective and partly refractive.

Figure 2.2: Recursive Ray Tracing: Generating secondary rays at t_hit of the primary ray (Courtesy of Wald [8]).

In the left image of Figure 2.2 a reflection ray is generated. The direction d of this ray is determined by the normal vector at the hit point of the primary ray and the reflection property of the glass material. The origin o of the secondary ray is the position of the primary ray at t_hit. The same happens with the refraction ray, for which d is based on the material's refraction property.

Both generated secondary rays act like new primary rays, which means that they can also generate shadow rays as well as new secondary rays when hitting another object. The only difference is that the color generated by all secondary rays also affects the color of the primary ray and therefore the pixel in the image.

2.2.3 Distribution Ray Tracing

Recursive ray tracing already made it possible to generate reflections, refractions, and shadows. A drawback was that neither smooth shadows nor blur or similar effects were supported.

These restrictions were removed by Cook et al. [11]. They modeled all these effects with probability distributions, which allows effects such as smooth shadows to be computed via stochastic sampling. Glossy reflections, for example, can be computed by stochastically sampling the corresponding distribution and recursively shooting rays into the sampled directions. However, in order to achieve sufficient quality, this technique requires a relatively large number of samples and is usually quite costly.

2.2.4 Shader Ray Tracing

All algorithms described so far generate the color of a ray based on the color and material properties of the objects themselves or on the hits of any generated secondary rays. Programmable shaders are already an important tool for creating more realistic materials and other effects in traditional rasterization techniques. It therefore seemed straightforward to use the same approach in ray tracing, with the extension that a shader is also able to generate new secondary rays.

This made it possible to separate the shading process from the actual ray tracing process. Additionally, one shader does not need information from any other shader, so shaders can easily be combined (e.g. a piece of wood seen through a liquid, using a wood surface shader combined with a water environment shader). Using this approach several different shader classes can be identified: camera, surface, light, environment, volume, and pixel shaders as described by Wald [8].

Camera Shaders

Camera shaders are responsible for generating and casting primary rays. This enables the programmer to generate different kinds of camera views, e.g. using a shader to simulate a fish-eye lens. All special effects that affect the whole image (e.g. motion blur, depth of field) can also be placed within a camera shader.

Surface Shaders

Each object in the scene has its own shader, which determines what happens if a ray hits the object. The surface shader also takes care of generating shadow rays and adjusting its pixel color value depending on the results. Additional reflection and refraction rays might also be generated by this type of shader.

Light Shaders

A light shader takes care of any shadow ray that reaches the specific light source. This enables the ray tracer to support a wider range of light sources (e.g. different shapes and colors).

Environment Shaders

All rays leaving the scene without hitting any object are handled by the environment shader, which can be used, for example, to simulate a cloudy sky.

Volume Shaders, Pixel Shaders etc.

Volume shaders are used to compute the attenuation of a ray travelling between two objects. That way different environments like water can be simulated, for example by changing the ray's direction while it traverses the water. Pixel shaders can take care of image post-processing (e.g. tone mapping).

2.3 Ray Tracing vs. Rasterization

From the previous pages it seems that ray tracing has all the advantages on its side: simple algorithms, realistic reflections and refractions, programmable shaders, and much more.

Ray tracing indeed has significant advantages compared to rasterization. Looking at the rasterization used by today's graphics cards, it is obvious that most effects (e.g. shadows, reflections) can only be computed by using programming tricks. For instance, to compute the shadow of an object, a scene needs to be rendered at least twice, because during the first rendering pass the pixels and vertices do not know whether they are within a shadow or not. Due to that, programming new shader effects becomes more and more complicated and costly for developers. Another drawback is that such shaders are often only approximations of the real effects, which means that the generated images are not physically correct (e.g. reflections). Different shaders also cannot simply be combined in rasterization.

Still, rasterization has its advantages. The technique made graphics cards so cheap that a new mass market has been created. All computer games released today use the advantages of fast graphics boards as well as the simple programming model to bring more realistic virtual worlds to life.

So the question is not whether ray tracing is going to replace rasterization, but when, and whether there might be some intermediate stage with rasterization and ray tracing working together, each doing what it does best. GPUs are getting faster every year, but more importantly the restrictions in programming them are also diminishing. So GPUs might turn out to be the perfect ray tracing processors in a few years. More information regarding this topic can be found in Chapters 5 and 6 of this document.

Some interesting insights into the contest between rasterization and ray tracing can be found in a script based on a panel held at SIGGRAPH, in which several panelists from NVidia, ATI, Silicon Graphics and Saarland University discussed the question of when and if ray tracing is going to replace rasterization [12]. Note, however, that the statements made there are personal opinions.
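
As an illustration of the terms from Section 2.1 and the shadow rays from Section 2.2.1, the following is a minimal Python sketch and not code from any of the cited ray tracers. The sphere primitive, the helper names (`Ray`, `Sphere`, `closest_hit`, `any_hit`, `is_lit`) and the small offset `t_min` that keeps a ray from hitting the surface it starts on are assumptions made for this example. It shows the ray r(t) = o + t d with its limit t_max, the search for the closest intersection, and the cheaper "any intersection" test used for shadow rays.

```python
import math
from dataclasses import dataclass


def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass
class Ray:
    o: tuple                 # origin
    d: tuple                 # direction, assumed to have unit length
    t_max: float = math.inf  # the ray is discarded beyond this time

    def at(self, t):
        """Point r(t) = o + t d."""
        return add(self.o, scale(self.d, t))


@dataclass
class Sphere:                # hypothetical primitive used only in this sketch
    center: tuple
    radius: float

    def hit(self, ray, t_min=1e-4):
        """Smallest t in (t_min, ray.t_max) where the ray hits, else None."""
        oc = sub(ray.o, self.center)
        b = dot(oc, ray.d)
        c = dot(oc, oc) - self.radius ** 2
        disc = b * b - c
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        for t in (-b - root, -b + root):
            if t_min < t < ray.t_max:
                return t
        return None


def closest_hit(ray, scene):
    """Problem 1: the closest intersection (t_hit and the primitive hit)."""
    best_t, best_obj = None, None
    for obj in scene:
        t = obj.hit(ray)
        if t is not None and (best_t is None or t < best_t):
            best_t, best_obj = t, obj
    return best_t, best_obj


def any_hit(ray, scene):
    """Problem 2: is there any intersection at all? Stops at the first hit."""
    return any(obj.hit(ray) is not None for obj in scene)


def is_lit(point, light_pos, scene):
    """Shoot one shadow ray from a hit point towards one light source."""
    to_light = sub(light_pos, point)
    dist = math.sqrt(dot(to_light, to_light))
    shadow_ray = Ray(point, scale(to_light, 1.0 / dist), t_max=dist)
    return not any_hit(shadow_ray, scene)
```

A primary ray would call `closest_hit` once per pixel and then `is_lit` once per light source at `ray.at(t_hit)`; setting `t_max` to the distance of the light keeps objects behind the light from casting shadows.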

Chapter 3 Acceleration Methods

The last chapter showed that ray tracing algorithms are fairly simple copies of what happens in a real room at the moment the lights are turned on. One of the main drawbacks is still the required rendering time. In 1980 Whitted already found that about 95% of his computation time was taken up by intersection tests [10]. Making ray tracing faster is therefore mainly a matter of speeding up these tests. Some of the proposed improvements are described on the following pages, starting with ways to reduce the number of rays sent into the scene. Graph theory and trees make up a large proportion of the further improvements, as shown later in this chapter.

3.1 Computing Less Samples in the Image Plane

The most obvious way to reduce the number of intersection tests is to reduce the number of primary rays sent into the scene. The following list is far from complete, considering the large number of people working on the acceleration problem.

One method described by Glassner [13] reduces the number of primary rays by sampling the image plane adaptively. Instead of tracing a ray through each pixel of the plane, a fixed spacing is used to subsample the image. The colors of the pixels between the samples are determined via a given heuristic based on the colors of the adjacent sampled pixels. However, this works best if the geometry in the scene is quite large. Highly detailed objects and high-frequency features (e.g. textures) suffer from this method, which might result in artifacts, especially in animations, as described by Wald [8].

Another method named Vertex Tracing was first described by Ullmann et al. [14]. They also give up tracing rays through all pixels in the image plane, and instead only send rays into the scene that target the vertices of visible geometry. The colors between these vertices are then interpolated using standard rasterization on the graphics card. This can significantly reduce the number of rays in scenes with simple geometry, but breaks down for highly detailed objects with a lot of triangles. Additionally, this technique has problems similar to the one described by Glassner in the previous paragraph: due to the interpolation, fine details or high-frequency features might be lost, resulting in poor image quality.
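
The fixed-spacing subsampling idea from this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, `trace(x, y)` stands in for tracing one primary ray and returns a gray value, and plain bilinear interpolation is used in place of the heuristic of the original method, which is not specified here.

```python
import bisect


def sample_positions(size, spacing):
    """Pixel coordinates that are actually traced; always includes the last one."""
    pos = list(range(0, size, spacing))
    if pos[-1] != size - 1:
        pos.append(size - 1)
    return pos


def bracket(pos, v):
    """Neighbouring traced coordinates (lo, hi) with lo <= v <= hi."""
    i = bisect.bisect_right(pos, v) - 1
    lo = pos[i]
    hi = lo if lo == v else pos[i + 1]
    return lo, hi


def lerp(a, b, w):
    return a + (b - a) * w


def weight(v, lo, hi):
    return 0.0 if hi == lo else (v - lo) / (hi - lo)


def render_subsampled(trace, width, height, spacing):
    """Trace only every `spacing`-th pixel and interpolate the pixels in between.

    Returns the image (list of rows) and the number of primary rays traced.
    """
    xs = sample_positions(width, spacing)
    ys = sample_positions(height, spacing)
    samples = {(x, y): trace(x, y) for y in ys for x in xs}

    image = []
    for y in range(height):
        y0, y1 = bracket(ys, y)
        wy = weight(y, y0, y1)
        row = []
        for x in range(width):
            x0, x1 = bracket(xs, x)
            wx = weight(x, x0, x1)
            top = lerp(samples[(x0, y0)], samples[(x1, y0)], wx)
            bottom = lerp(samples[(x0, y1)], samples[(x1, y1)], wx)
            row.append(lerp(top, bottom, wy))
        image.append(row)
    return image, len(samples)
```

For a 9x9 image and a spacing of 4 only 9 of the 81 primary rays are traced. A smooth gradient such as `trace = lambda x, y: float(x)` is reproduced well, whereas a one-pixel-wide line that falls between the samples disappears completely, which is the loss of high-frequency detail mentioned above.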

3.2 Reducing the Number of Secondary Rays

So far only the number of primary rays has been reduced, but these make up just a small portion of the rays actually traced for an image, namely one for each pixel in the image plane. Computing secondary rays to handle shadows, reflections and refractions poses a bigger problem, possibly resulting in dozens of secondary rays for each primary ray, depending on the number of lights and of reflecting and refracting objects in the scene.

One approach is to reduce the cost of shadow rays. These only need to check whether there is an object between their origin and the targeted light source. A cache technique called Shadow Cache was proposed by Haines and Greenberg [15]. In case a shadow ray is not able to reach a certain light source due to an object in the ray's path, the targeted light source remembers this object in a cache. The next ray shot at the same light source is first tested for intersection with the cached object, because neighbouring rays are often occluded by the same object. However, this algorithm quickly breaks down in scenes with highly detailed geometry, because a single small triangle is less likely to occlude more than one shadow ray.

In 2002 Fernandez et al. [16] proposed Local Illumination Environments (LIE), which subdivide the whole scene into a set of voxels. Each voxel stores information on how the different light sources influence this region of space. The LIE voxels and their information need to be precomputed before the actual ray tracing starts, but once this is done the number of shadow rays can be reduced significantly (e.g. by skipping light sources which do not have any influence in the respective LIE voxel).

Another technique described by Wald in his PhD thesis [8] focuses on smooth shadows, which can require a fairly large number of shadow rays to be approximated (see Section 2.2.3). Single Sample Soft Shadows regards every light source in the scene as a point light, which reduces the number of shadow rays cast per light to one. The contribution of this point light is then attenuated depending on how narrowly the ray misses potential occluders. Although the algorithm only approximates the problem, it creates convincing shadows. More possibilities to reduce shadow rays are described by Wald in his PhD thesis [8].

3.3 Accelerated Intersection Tests

The introduction to this chapter referred to the fact that 95% of the ray tracing computation time is spent in intersection tests. The previous sections reduced the number of rays but had no impact on the time needed for the tests themselves. The following sections describe common approaches to speed up the intersection tests by using certain acceleration structures.

3.3.1 Primitive-specific Tests

There are many known algorithms aiming at fast primitive intersection tests (e.g. triangle-triangle, line-line). As these algorithms are commonly available and not a problem specific to ray tracing, they are not described in this document.
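
To make the shadow cache from Section 3.2 more concrete, here is a small Python sketch of the caching logic only. It is an illustration under assumptions, not the implementation from [15]: the names `Light`, `cached_occluder` and `occluded` are hypothetical, `blocks(obj, shadow_ray)` stands in for whatever ray/object intersection test the ray tracer provides, and keeping the cached object after an unoccluded ray is a choice made for this example.

```python
class Light:
    """A light source that remembers the last object that blocked a shadow ray."""

    def __init__(self, position):
        self.position = position
        self.cached_occluder = None


def occluded(shadow_ray, light, scene, blocks):
    """Return True if any object blocks `shadow_ray` on its way to `light`.

    `blocks(obj, shadow_ray)` is the (assumed) intersection test. The cached
    occluder is tested first; only if it fails is the rest of the scene
    searched, and a newly found occluder replaces the cached one.
    """
    cached = light.cached_occluder
    if cached is not None and blocks(cached, shadow_ray):
        return True  # cache hit: one intersection test was enough

    for obj in scene:
        if obj is cached:
            continue  # already tested above
        if blocks(obj, shadow_ray):
            light.cached_occluder = obj
            return True

    return False  # the light reaches the hit point
```

When many neighbouring shadow rays are blocked by the same large object, each of them after the first costs a single test. In a finely tessellated scene the cached triangle rarely blocks the next ray, so the extra test is mostly wasted, which matches the limitation described in Section 3.2.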

25 3.3. ACCELERATED INTERSECTION TESTS Bounding Volumes Bounding volumes are an easy approach to speed up intersection tests. Every object in the scene is surrounded by a simple bounding volume (i.e. a rectangle). The first intersection test of a ray is computed against the bounding volumes of the objects in the scene. If the ray does not intersect the volume, the whole object in the volume can be skipped. Problem is the fact that simple bounding volumes might fail to approximate the shape of certain geometry they enclose (i.e. spheres). The ray might still miss the sphere, but intersects the bounding volume creating unneeded intersection tests against the sphere s triangles. Therefore simple bounding volumes are not used in ray tracers Spatial Subdivision Schemes Spatial subdivision techniques divide the three-dimensional space of the scene into a finite number of voxels, which do not overlap each other. Each voxel keeps information on which primitives (i.e. triangles) it contains. The subdivision is performed by taking the scene space and dividing it into two areas. This division can then be called recursively until a certain division depth is reached, or if the subareas only contain a given minimum number of triangles. If a ray is shot into such a scene the spatial acceleration structure sequentially iterates through all encountered voxels. All primitives within an encountered voxel need to be intersected with the ray. As soon as an intersection with a primitive is found, the ray tracing algorithm can be terminated, skipping all further voxels. This works just fine as long as a triangle only belongs to one spatial region in the grid. Due to the fact, that spatial subdivision divides the space and not the geometry, it is common, that a triangle is part of two adjacent grid regions. In that case early ray termination might result in errors, because certain intersections with geometry after the current voxel might be missed (see Figure 3.1). 
In the example shown, voxel 1 holds a reference to triangle B, because at least a part of B lies in this voxel. When the ray is intersected with the geometry in voxel 1, it is tested against the complete triangle B, resulting in a hit. If the traversal is stopped at this point, the intersection with triangle A in voxel 2 is lost, although it would have occurred before the hit on triangle B.

One solution to this problem would be to avoid overlapping geometry. A triangle reaching into both voxels could simply be divided into two independent triangles, each fully contained in one voxel. However, this possibility is rarely used, because it may generate many more triangles and raise the memory requirements.

Solving the problem in a different way is easy. If a ray hits a primitive that is contained in two voxels, the ray also has to be intersected with all primitives contained in both voxels. However, this may result in computing the same intersection twice (e.g. intersecting triangle B two times). To avoid this, a technique called mailboxing was introduced by Amanatides and Woo in 1987 [17]. A unique id number is assigned to each ray. When a triangle is tested for intersection with a ray, it remembers the ray's id number. Before the next intersection test with the same triangle, the stored id number is compared with the id of the current ray. If they match, the test can be skipped because it has already been computed.

Figure 3.1: Spatial Subdivision: Triangle B belongs to multiple voxels. Stopping the ray traversal early would miss the intersection with triangle A (Courtesy of Thrane [20]).

However, mailboxing creates problems in multithreaded implementations. If rays are traversed through the spatial structure in multiple threads, remembering the last ray id in the triangle may be pointless, because another ray can be tested against the same triangle in between, thrashing this kind of caching, as described by Wald [8]. A solution to this problem is hashed mailboxing, which is less efficient but preferable if many threads are used or memory is scarce [18].

Only two spatial subdivision approaches, uniform grids and kd-trees, are presented on the next pages, as these are the most commonly used.

Uniform Grid

The uniform grid was first described by Fujimoto et al. in 1986 [19] and follows the idea of spatial subdivision schemes described before. Before the grid is constructed, a resolution has to be determined for each of its three axes. The best parameters for this resolution depend on the scene geometry. More voxels mean fewer triangles to intersect per voxel, but a longer grid traversal. Fewer voxels result in more intersection tests, because each voxel contains more triangles. Several heuristics exist, described by Thrane et al. [20], but it is still necessary to tweak the resolution by hand depending on the scene.

To traverse the grid, a 3D version of the 2D line drawing algorithm is used, known as the 3D digital differential analyzer (DDA) algorithm (see Figure 3.2). Examples are described in detail by Fujimoto et al. [19] and Thrane et al. [20]. Christen [21] also included several code examples in his Diploma thesis.

KD-Trees

As the name suggests, kd-trees are spatial subdivision structures arranged in a tree, more precisely a binary tree. The primitives (e.g. triangles) of the scene's geometry are stored in the tree's leaves. This specialization of binary trees was first described by Bentley in 1975 [22].
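As a sketch of the grid traversal described for the uniform grid above, the following Python function enumerates the voxels a ray visits with a 3D DDA. It is an illustration under assumptions rather than the code of any cited work: the function name and parameters are hypothetical, the cells are cubes of edge length `cell_size`, and the ray origin is assumed to lie inside the grid.

```python
import math


def grid_cells_along_ray(origin, direction, grid_min, cell_size, resolution):
    """Return the (i, j, k) indices of all cells the ray visits, in order."""
    cell = [int(math.floor((origin[a] - grid_min[a]) / cell_size))
            for a in range(3)]
    step, t_max, t_delta = [], [], []
    for a in range(3):
        d = direction[a]
        if d > 0:
            boundary = grid_min[a] + (cell[a] + 1) * cell_size
            step.append(1)
            t_max.append((boundary - origin[a]) / d)   # ray parameter at next plane
            t_delta.append(cell_size / d)              # parameter step per cell
        elif d < 0:
            boundary = grid_min[a] + cell[a] * cell_size
            step.append(-1)
            t_max.append((boundary - origin[a]) / d)
            t_delta.append(-cell_size / d)
        else:
            step.append(0)                             # ray is parallel to this axis
            t_max.append(math.inf)
            t_delta.append(math.inf)

    cells = []
    while all(0 <= cell[a] < resolution[a] for a in range(3)):
        cells.append(tuple(cell))
        axis = t_max.index(min(t_max))                 # nearest cell boundary
        if t_max[axis] == math.inf:
            break                                      # zero direction vector
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return cells
```

For example, a ray starting at (0.5, 0.5, 0.5) with direction (1, 0.25, 0) in a 4x4x4 grid of unit cells visits the cells (0,0,0), (1,0,0), (2,0,0), (2,1,0) and (3,1,0). Each step costs only one comparison and one addition per axis, which is what makes the traversal cheap.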

Figure 3.2: Uniform grid: Traversal of a uniform grid (Courtesy of Havran [9]).

Constructing a kd-tree begins with a bounding box surrounding the whole scene and its triangles. First a splitting plane is chosen that divides the bounding box in two, which creates two child nodes for the tree. Each primitive of the original box is assigned to the child node whose box contains it. Primitives that overlap both new boxes are referenced in both child nodes. This procedure is repeated recursively until one of two criteria is met: either a set maximum tree depth has been reached, or the number of primitives in a node has dropped below a certain threshold.

Choosing the position of the splitting plane is the main problem during the tree's construction. Several approaches have been taken to calculate the optimal division of the bounding box. Some are described by Thrane et al. [20]. Extensive research into this problem has also been done by Havran in his PhD thesis [9]. An example construction is shown in Figure 3.3.

Figure 3.3: KD-Tree: Three steps of the tree construction (Courtesy of Havran [9]).
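The construction steps can be sketched as follows. This is a simplified illustration, not one of the cited methods: it assumes that each primitive is represented by its axis-aligned bounding box, and it places the splitting plane at the spatial median of the longest axis, which is only the simplest of the possible choices. The names `KdNode` and `build_kd_tree` and the default limits are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]            # (minimum corner, maximum corner)


@dataclass
class KdNode:
    bounds: Box
    axis: int = -1                   # splitting axis, -1 marks a leaf
    split: float = 0.0               # position of the splitting plane
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None
    primitives: List[int] = field(default_factory=list)  # leaf contents


def build_kd_tree(prim_boxes, indices, bounds, depth=0, max_depth=8, min_prims=2):
    # Stop criteria from the text: maximum depth or few enough primitives.
    if depth >= max_depth or len(indices) <= min_prims:
        return KdNode(bounds=bounds, primitives=list(indices))
    lo, hi = bounds
    axis = max(range(3), key=lambda a: hi[a] - lo[a])    # longest axis
    split = 0.5 * (lo[axis] + hi[axis])                  # spatial median
    # A primitive that overlaps the plane is referenced in both children.
    left_ids = [i for i in indices if prim_boxes[i][0][axis] <= split]
    right_ids = [i for i in indices if prim_boxes[i][1][axis] >= split]
    left_hi = tuple(split if a == axis else hi[a] for a in range(3))
    right_lo = tuple(split if a == axis else lo[a] for a in range(3))
    node = KdNode(bounds=bounds, axis=axis, split=split)
    node.left = build_kd_tree(prim_boxes, left_ids, (lo, left_hi),
                              depth + 1, max_depth, min_prims)
    node.right = build_kd_tree(prim_boxes, right_ids, (right_lo, hi),
                               depth + 1, max_depth, min_prims)
    return node
```

The maximum depth also guarantees termination when many primitives straddle every candidate plane, a case in which the primitive count alone would never drop below the threshold.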

Traversing the tree starts at a given node N (e.g. the root node). If N is a leaf node, all primitives referenced in the leaf are tested for intersection with the ray. If N is an internal node, the traversal first descends recursively into the child node that the ray enters first. If an intersection is found there, it can be returned as the nearest one. If no intersection is detected in that sub-tree, the recursion returns and descends into the other child node.

Bounding Volume Hierarchies

Bounding volume hierarchies (BVH) differ from spatial subdivision techniques by dividing the objects, not the scene space. As described above in the section on bounding volumes, a simple bounding volume encloses a certain geometry. Intersection testing against such a bounding volume is easier and faster than testing against the enclosed primitives. If a ray does not intersect the bounding volume, it cannot intersect any of the enclosed primitives either. The main advantage of BVHs is that the bounding boxes are faster to test for intersection than the enclosed geometry, whereas uniform grids and kd-trees always need to test the triangles in the region the ray currently passes through.

As Thrane et al. point out [20], in practice the most widely used bounding volume for BVHs is the axis-aligned bounding box (AABB). The AABB makes up for its loose fit by allowing fast ray intersection. It is also a good structure in terms of simplicity of implementation. Glassner gives an overview of studies done on more complex forms [13].

The construction of a BVH depends on the contents of the scene (similar to kd-trees). A good overview and a pseudo code listing are given by Thrane et al. [20], using the traversal quality of a BVH as the quality criterion. If traversal is cheap, many intersection tests can be skipped very fast. The traversal itself is done using recursive descent, similar to kd-trees, with one difference: all child nodes need to be investigated.
This is because the bounding volumes can overlap each other. By default, the child nodes in the tree are also not sorted. Whether sorting really improves the intersection time is still not clear; both sides have their supporters.

An example tree for the rendering of a solid cow is displayed in Figure 3.4. The bounding volumes created (spheres in this case), which divide the cow, are shown in Figure 3.5. Each image corresponds to a level in the BVH tree.

Figure 3.4: BVH: Tree for cow example (Courtesy of Somchaipeng [23]).
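The recursive descent can be sketched as follows. The sketch makes several assumptions that are not taken from the cited works: the bounding volumes are AABBs tested with the common slab method, the primitives are spheres given as (center, radius) pairs to keep the example short, the ray direction has unit length, and the names `BvhNode`, `ray_hits_box`, `hit_sphere` and `nearest_hit` are hypothetical.

```python
import math


class BvhNode:
    def __init__(self, bounds, children=(), spheres=()):
        self.bounds = bounds              # ((min x, y, z), (max x, y, z))
        self.children = list(children)    # child nodes, their boxes may overlap
        self.spheres = list(spheres)      # primitives stored in this node


def ray_hits_box(origin, direction, box, t_max=math.inf):
    """Slab test: does the ray enter the box before the distance t_max?"""
    lo, hi = box
    t_near, t_far = 0.0, t_max
    for a in range(3):
        if direction[a] == 0.0:           # ray parallel to this pair of planes
            if not lo[a] <= origin[a] <= hi[a]:
                return False
            continue
        t0 = (lo[a] - origin[a]) / direction[a]
        t1 = (hi[a] - origin[a]) / direction[a]
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def hit_sphere(origin, direction, center, radius):
    """Distance to the nearest hit in front of the origin, or None."""
    oc = [origin[a] - center[a] for a in range(3)]
    b = sum(oc[a] * direction[a] for a in range(3))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t >= 0.0:
            return t
    return None


def nearest_hit(node, origin, direction, best=math.inf):
    """Recursive descent; every child is visited because boxes can overlap."""
    if not ray_hits_box(origin, direction, node.bounds, best):
        return best                       # the whole sub-tree is skipped
    for center, radius in node.spheres:
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and t < best:
            best = t
    for child in node.children:
        best = nearest_hit(child, origin, direction, best)
    return best
```

Passing the best distance found so far into the box test lets the traversal skip sub-trees that lie entirely behind an already known hit, regardless of the order in which the children are stored.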

Figure 3.5: BVH: A solid cow and the levels in its bounding volume hierarchy (Courtesy of Somchaipeng [23]).

Chapter 4 Acceleration Methods for Dynamic Scenes

The acceleration methods described in Chapter 3 were developed in the early years of ray tracing and designed specifically for rendering static scenes. This restricts interactive ray tracing, as such scenes are only suitable for static walkthroughs. All described acceleration structures need to be re-created every time the scene changes (e.g. due to unpredictable user interaction), in the worst case for every frame, which makes them infeasible due to their high construction costs. This is a major disadvantage for applications like interactive simulations or games, which need to react to user interactions.

Recent research has concentrated on this aspect of ray tracing acceleration. Different approaches are presented on the next pages, as well as an independently developed benchmark for stress-testing available ray tracers, which creates a common basis for performance comparisons.

4.1 Dynamic Bounding Volume Hierarchies

The first approach was presented by Wald et al. [24]. They used a bounding volume hierarchy (BVH) as described in Section 3.3.4 to render a deformable scene (see Fig. 4.2). Deformable scenes include moving triangles, but no triangles are split, created, or destroyed over time. The entire scene is ray traced using a single BVH, whose topology is constant for the whole animation.

Figure 4.1 shows two child nodes of the BVH tree. When the objects move, a BVH can keep the same hierarchy and only needs to update the bounding volumes. Though the new hierarchy may not be as good as the old one, it will always be correct. In contrast to spatial subdivision structures (e.g. uniform grids, kd-trees), the BVH subdivides the object hierarchy, which is more robust over time than a given subdivision of space. As a result, a BVH can be quickly updated between frames, avoiding a complete per-frame rebuild.

Before going into more detail on how a BVH can be used for a dynamic scene, Wald et al. first described how BVHs can be made faster for static scenes. Until now, kd-tree implementations have been faster than BVHs.
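The update step described above, keeping the topology and recomputing only the bounding volumes, can be sketched as a bottom-up refit. The sketch assumes AABBs and leaves that store indices into a shared vertex list; the names `BvhNode` and `refit` are hypothetical and the code is not taken from Wald et al. [24].

```python
class BvhNode:
    def __init__(self, children=(), vertex_ids=()):
        self.children = list(children)      # inner node: child nodes
        self.vertex_ids = list(vertex_ids)  # leaf: indices into the vertex list
        self.bounds = None                  # ((min x, y, z), (max x, y, z))


def refit(node, vertices):
    """Recompute all bounding boxes bottom-up; the tree topology is untouched."""
    if node.children:
        boxes = [refit(child, vertices) for child in node.children]
        lows = [box[0] for box in boxes]
        highs = [box[1] for box in boxes]
    else:
        lows = highs = [vertices[i] for i in node.vertex_ids]
    node.bounds = (tuple(min(p[a] for p in lows) for a in range(3)),
                   tuple(max(p[a] for p in highs) for a in range(3)))
    return node.bounds


# Two triangles; between the frames the second one moves up along z.
left = BvhNode(vertex_ids=[0, 1, 2])
right = BvhNode(vertex_ids=[3, 4, 5])
root = BvhNode(children=[left, right])
frame_0 = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (2, 0, 0), (3, 0, 0), (2, 1, 0)]
frame_1 = frame_0[:3] + [(2, 0, 2), (3, 0, 2), (2, 1, 2)]
bounds_0 = refit(root, frame_0)
bounds_1 = refit(root, frame_1)
```

Each node is visited once per frame, so the cost of the update grows linearly with the size of the tree, while the boxes stay correct even if they fit less tightly than those of a freshly built hierarchy.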

Figure 4.1: Dynamic BVH: Two child nodes of a BVH tree. Bounding volumes move over time (left to right) (Courtesy of Wald [24]).

In order to keep their algorithm open for user-based interactions, they did not base their approach on knowledge of all possible deformations of a model. Times of up to about 30 seconds were measured for a complex scene when the BVH was rebuilt for every frame. By applying the dynamic update explained above, these times could be reduced to less than one second per frame. The scene shown was ray traced at 3.7 frames per second on a dual 2.6 GHz Opteron desktop PC, including shadows and texturing.

Figure 4.2: Dynamic BVH: Rendered scene using triangles, rendering at 1024x1024 pixels (Courtesy of Wald [24]).

4.2 Coherent Grid Traversal

Another way to accelerate ray tracing was also presented by Wald et al. [25]. They used a uniform grid together with ray packets, frustum testing and SIMD extensions (see footnote 1).

Footnote 1: SIMD (Single Instruction, Multiple Data) is a set of operations for efficiently handling large quantities of data in parallel, as in a vector processor or array processor. First popularized in large-scale supercomputers (as opposed to MIMD parallelization), smaller-scale SIMD operations have now become widespread in personal computer hardware. Today the term is associated almost entirely with these smaller units.

A key feature of this algorithm is that it exploits the nature of a packet of rays with almost the same direction. These coherent rays traverse the grid as one single packet, because the grid cells they visit and intersect are most likely the same. The algorithm first computes the packet's bounding frustum (see image a in Figure 4.3), which is then traversed through the grid one slice at a time (see image b). For each slice (blue), the frustum's overlap with the slice (yellow) is computed, which determines the actual cells (red) overlapped by the frustum. Image c shows that each frustum traversal step requires only one four-float SIMD addition to incrementally compute the minimum and maximum coordinates of the frustum-slice overlap, plus one SIMD float-to-int truncation to compute the overlapped grid cells. Viewed down the major traversal axis (see image d), each ray packet (green) has corner rays that define the frustum boundaries (dashed). At each slice, this frustum covers all of the cells covered by the rays.

Figure 4.3: Coherent Grid Traversal: Computation steps (Courtesy of Wald [25]).

The computed frustum is then used to improve the triangle intersection. As shown in Figure 4.4, a grid (see b) does not adapt as well to the scene geometry as a kd-tree (see a). This causes the grid to often intersect triangles (red) that a kd-tree would have avoided. However, these triangles usually lie far outside the view frustum and can be discarded inexpensively by inverse frustum culling during frustum-triangle intersection. Beyond these basics, the paper gives more detailed information on re-creating the grid for every frame, which is required by the dynamic nature of the supported scenes.
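The incremental slice computation can be illustrated with a scalar Python sketch. It is a simplification under explicit assumptions and not the SIMD code of Wald et al. [25]: all rays of the packet share one origin, the major traversal axis is +x, the grid is anchored at the coordinate origin, the ray origin lies on the front plane of the first slice, and `math.floor` stands in for the float-to-int truncation. The function name is hypothetical.

```python
import math


def frustum_slice_cells(origin, directions, cell_size, num_slices):
    """Per slice along +x, return the inclusive cell range (y0, y1, z0, z1)."""
    slopes_y = [d[1] / d[0] for d in directions]   # dy/dx of every ray
    slopes_z = [d[2] / d[0] for d in directions]   # dz/dx of every ray
    # Frustum bounds: minimum and maximum slope along y and along z.
    slopes = [min(slopes_y), max(slopes_y), min(slopes_z), max(slopes_z)]
    starts = [origin[1], origin[1], origin[2], origin[2]]
    wants_min = [True, False, True, False]

    bounds = []
    for slope, start, want_min in zip(slopes, starts, wants_min):
        # Within one slice a bound is extreme either at the front or at the
        # back plane of the slice, depending on the sign of its slope.
        at_back = (slope < 0) == want_min
        bounds.append(start + (slope * cell_size if at_back else 0.0))
    deltas = [slope * cell_size for slope in slopes]

    cells = []
    for _ in range(num_slices):
        cells.append(tuple(math.floor(b / cell_size) for b in bounds))
        # One addition of four floats advances all bounds to the next slice.
        bounds = [b + d for b, d in zip(bounds, deltas)]
    return cells
```

The per-slice work is the four-element addition and the conversion to integer cell indices; the resulting ranges are conservative, so a bound that falls exactly on a cell boundary includes the neighbouring cell as well.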

Figure 4.4: Coherent Grid Traversal: Adaptation to the scene geometry of kd-tree (a) and grid (b) (Courtesy of Wald [25]).

4.3 Distributed Interactive Ray Tracing

Another approach by Wald et al. [26] involved the creation of a scene graph similar to the ones used in OpenGL implementations. This method separates the scene into independent objects with common properties concerning dynamic updates. Three classes of objects were identified: static objects are treated as usual, objects undergoing affine transformations are handled by transforming rays, and objects with unstructured motion are rebuilt whenever necessary.

The approach is based on the observations made by Lext et al. [28] of how dynamic scenes behave: large parts of a scene often remain static over long periods of time. Other parts undergo well-structured transformations like affine transforms. Yet other parts are changed in a totally unstructured way. This common structure within scenes can be exploited by maintaining geometry in separate objects according to their dynamic properties, and by handling the various kinds of motion with different, specialized algorithms that are then combined into a common architecture.

Each object can consist of an arbitrary number of triangles. It has its own acceleration structure and can be updated independently of the rest of the scene. An additional top-level acceleration structure must then be maintained, which accelerates ray traversal between the objects in a scene. Each ray first traverses this top-level structure. As soon as a leaf is found, the ray is intersected with the objects in the leaf by simply traversing the respective objects' local acceleration structures. An example can be seen in Figure 4.5, where the robots are divided into different parts, which are coded in different colors.
Each part is represented by its own acceleration structure, integrated into a top-level structure (see Figure 4.6). In the given example (see Figure 4.6) a close relative of the kd-tree, the binary space partitioning (BSP) tree, is used. A top-level BSP tree contains references to the instances of the objects. In a second level, each sub-object is again represented by its own BSP tree. Another advantage of this structure is that equal objects only need to be loaded once, as the top-level BSP tree only works with references (e.g. a forest of hundreds of identical trees is represented by a single instance of the tree), which reduces memory consumption. The paper focuses especially on the problem of ray tracing in a distributed environment, i.e. one master server sending ray tracing data to all connected client PCs for computation.

Figure 4.5: Distributed Ray Tracing: Robots (left) with color-coded objects (right). Triangles of the same object share the same color (Courtesy of Wald [26]).

Figure 4.6: Distributed Ray Tracing: Two-level hierarchy with a top-level BSP tree containing references to instances. Objects consist of geometry and a local BSP tree (Courtesy of Wald [26]).

Therefore the bottleneck was mainly located in the communication between the clients and the master server. However, combining different acceleration structures into a kind of scene graph showed some improvements and possibilities.

4.4 Benchmarking Animated Ray Tracing

When reading papers on new approaches to accelerating ray tracing, especially for dynamic scenes, it is a challenge to compare the results they present. To test their approaches, researchers create their own, often unique, test scenes and measure the frame rate. It is then hard to judge the significance of these results against the next paper, which offers different test scenes that may reveal drawbacks that did not show up in the first ones.

To remedy this problem, a benchmark for animated ray tracing (BART) was proposed by Lext et al. [28] to measure and compare the performance and quality of animated ray traced scenes. BART is a suite of test scenes, placed in the public domain and designed to stress ray tracing algorithms, in which both the camera and the objects are animated parametrically. It also describes rules on how to measure performance and, if approximating algorithms are used, the error in the rendered images.

Previously there was only one recognized benchmark, the Standard Procedural Database (SPD), created by Haines [29] in 1987. With ray tracing focused on static scenes in those days, that benchmark primarily targeted single static images and walkthroughs.

To construct a widely usable benchmark, Lext et al. first identified what stresses existing ray tracing algorithms and thus decreases performance. The goal was to build each of these potential stresses into the benchmark, resulting in the scenarios in the following list.

- Hierarchical animation using translation, rotation, and scaling
- Unorganized animation (i.e. not just combinations of translations, rotations, scalings)
- Teapot in the stadium problem
- Low frame-to-frame coherency
- Large working-set sizes
- Overlap of bounding volumes or overlap of their projections
- Changing object distribution

Figure 4.7: BART: Images from kitchen (left) and robots (right) (Courtesy of Lext [28])

Figure 4.8: BART: Images from museum (Courtesy of Lext [28])

Lext et al. also give reasons why they consider these scenarios best suited to stress-test many different ray tracing algorithms. Test scenes aiming at these problems have also been created, called kitchen, robots, and museum (see Figures 4.7, 4.8). All the needed data, as well as sample parsers and additional source code, is available for download at the BART homepage [30].

Chapter 5 Approaches to realize Realtime Ray Tracing

With the basics and acceleration structures described in Chapters 2, 3 and 4, the next pages shed some light on existing approaches to realtime ray tracing. These implementations differ not only in the algorithms used (e.g. acceleration structures), but most importantly in the hardware they need. Approaches relying solely on CPU computing power are described, as well as CPU-GPU hybrid and pure GPU implementations. The chapter concludes with information on some special purpose hardware architectures created solely for tracing rays.

5.1 Software Approaches

Realtime ray tracing systems running on the CPU are common. In order to run these at an interactive frame rate, two different problems have to be solved. First, the best ray tracing algorithm and acceleration structure need to be chosen, paying careful attention to implementing them optimally on the given hardware. Second, even the best CPU or algorithm cannot deliver frames at interactive rates today. To achieve that, the parallel nature of the algorithm must be exploited by using a shared-memory architecture, a cluster of PCs, or multi-CPU computers working together. The ray tracing algorithm can trivially be extended to support parallelization; the problems start with communication and synchronization.

As shared-memory systems support fast inter-processor communication and synchronization with little programming effort, the first ray tracing approach was implemented on such systems. The race to interactive frame rates was won in 1995 by Muuss [31]. A full-featured ray tracer with shadows, reflections, refractions and shaders was developed in 1999 at the University of Utah by Parker et al. [32]. However, the shared-memory systems used are quite costly and therefore only in limited use, e.g. at universities. Small companies cannot afford them and have to rely on standard PCs and PC clusters, which are readily available and cheap. Compared to the described supercomputers, such systems have serious drawbacks when it comes to inter-processor communication.

Additionally they have less memory, less communication bandwidth, and a higher latency. The first implementation using PC clusters was completed at Saarland University by Wald et al. in 2001 [33]. Wald used a client-server approach to overcome the small bandwidths offered by the cluster's PCs. The server does not compute any data itself, but solely handles the distribution of image parts and geometry to its clients. The limited client memory posed its own problems when it came to rendering massively complex models consisting of several gigabytes of data. To avoid copying the complete geometry to all clients, a transparent software caching layer was implemented, which loads data from the server when required (see Section 8.1.2). More information can be found in Wald's PhD thesis [8].

5.2 Using Programmable GPUs

Graphics cards gained support for programmable shaders a few years ago, in an effort to increase the realism of the rendered images. This led to a great amount of flexibility, transforming the fast GPU into a co-processor working in parallel. Since then, programmers have tried to exploit the computing power of the GPU for purposes other than those its designers originally intended (e.g. fast Fourier transforms). Many examples of such implementations can be found at the GPGPU homepage [34]. With dual and quad GPU solutions entering the market (NVidia SLI [35], ATI Crossfire [36]), the available computing power can be roughly doubled or quadrupled. However, programming the GPU is still severely limited compared to the freedom of working on the CPU, although these constraints diminish with each new shader model and graphics card generation.

In 2002 Carr et al. [37] pursued the idea of using the GPU for ray-triangle intersections. Because GPU programming at that time offered only limited flow control, they used a CPU-GPU hybrid implementation. To feed the GPU with the necessary intersection data, the CPU is used to reorganize the rays into efficient structures, because the ray tracing algorithm performs best if intersection testing is done for groups of coherent rays. In the end the frame rates rivaled single-CPU ray tracing implementations of that time. However, a lot of performance was lost because the approach required too much communication between the CPU and the GPU in both directions, which often does not pay off due to the high communication cost. For example, the cost of sending the data for a ray over the PCI bus is rather high compared to just performing the intersection on the CPU itself. Even worse, the traversal algorithm on the CPU depends on the results of the intersection computations, requiring a read-back from the GPU, which is both rather slow and has very high latencies [8].

Purcell et al. [38] [39] managed to map the complete ray tracing algorithm to the GPU, without the need to run a part of the code on the CPU. Looking at the GPU as a stream processor, they subdivided the ray tracing process into streams and kernels. Streams are regarded as flows of data that one can read from or write to. The processing work is done by the kernels, each of which has an input and an output stream. Several kernels are used in a row to perform the ray tracing algorithm. Still, their conclusion was that ray tracing on the GPU is not much faster than equivalent implementations on a CPU. Ray tracing on the GPU, especially the approach used by Purcell et al. [39], is described in more detail in Chapter 6 of this thesis.

5.3 Special Purpose Hardware Architectures

Similar to today's graphics cards produced by NVidia or ATI, accelerator cards designed specifically to run the ray tracing algorithm have also been considered. Several approaches developed over the years are presented on the following pages.

One of the first accelerator cards is VIZARD (Visualization Accelerator for Real-Time Display), developed in 1998 at the University of Tübingen in Germany. The main aim of Meißner et al. [41] was to accelerate real-time volume rendering, which is often used in medical and scientific applications. Rendering such sampled data is still a challenging task for the CPU. The ray tracing algorithm was simplified to ray casting. In ray casting only primary rays are generated, without any secondary rays and therefore without reflections and refractions. Shading was not implemented either. However, it was possible to define cut planes to look inside the volume data, which is especially useful for medical applications. An example rendering of a lobster can be seen in Figure 5.1. A rate of 10 frames per second could be reached for datasets containing ( ) voxels, casting (65.536) rays.

Figure 5.1: VIZARD: A rendered volume dataset of a lobster (Courtesy of Meißner [41]).

A year later, in 1999, Mitsubishi Electric developed the VolumePro Real-Time Ray-Casting System [42]. Interactive rendering of volume datasets was also a main goal of this accelerator card. In addition to visual improvements, the main advantage over VIZARD was VolumePro's higher rendering speed, reaching up to 30 frames per second (see Figure 5.2).

A more general approach was taken in 2002 with the 3DCGiRAM architecture. Created by IBM Japan in conjunction with several local universities, 3DCGiRAM featured interactive ray tracing of a 3D scene, including reflections and refractions [43]. In simulations at a clock rate of 333 MHz, frame rates of about 18 frames per second were measured.

A promising approach, called SaarCOR, has also been created at Saarland University in Germany. With the first paper presented in 2004 [44], Schmittler et al. showed that realtime ray tracing was possible using an acceleration card running at 90 MHz (later at 66 MHz).

Figure 5.2: VolumePro: Rendered medical volume data (Courtesy of Mitsubishi Electric [42]).

The advantage over the previously described approaches is SaarCOR's general usability. Supporting all kinds of rays (primary, secondary, shadow etc.) as well as texturing and programmable shaders, it offers all features needed in modern computer graphics. Because it was created in conjunction with OpenRT (see Chapter 7), SaarCOR can easily be programmed in a common shader language style. More information can be found in a paper by Woop et al. [45] and in the PhD thesis of J. Schmittler, which describes the hardware architecture in great detail [46].

Figure 5.3: SaarCOR: Scenes rendered at 4.5 and 7.5 fps on the 66 MHz prototype (Courtesy of Woop [45]).

Chapter 6 Using the GPU for Ray Tracing

A short introduction describing several possibilities to use the GPU for ray tracing has already been given in Section 5.2. On the following pages the approach developed by Purcell et al. [38] [39] is described in more detail. So far it has been the most successful attempt to map the entire ray tracing algorithm onto the graphics card. After the description of the basics, the next step is to choose a fitting acceleration structure (for an overview of these structures see Chapter 3). The criteria differ somewhat from those for CPU implementations, because several limitations in programming the GPU make some structures more or less useful. At the end of this chapter the CPU and GPU approaches are compared using some benchmarks.

6.1 Stream Computation

Stream computation as described by Purcell et al. [38] [39] is a specific way to abstract the GPU in order to program it. The stream programming model constrains the way software is written, such that locality and parallelism are explicit within a program. The model consists of programs called kernels and separate data streams (see Figure 6.1). Computation is carried out by arranging the input data in a stream and feeding this stream to a number of processors, each of which executes a kernel on the stream elements one by one. The results of each kernel invocation are placed in an output stream.

Figure 6.1: The stream programming model (Courtesy of Purcell [39]).
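The relation between kernels and streams can be illustrated with a few lines of Python. This is only a conceptual sketch and not GPU or BrookGPU code: streams are plain lists, kernels are ordinary functions, and the names `run_kernel`, `scale_kernel` and `offset_kernel` are hypothetical.

```python
def run_kernel(kernel, input_stream):
    """Apply one kernel to every element of the input stream."""
    # Each element is processed independently of all others, which is what
    # would allow many processors to work on the stream in parallel.
    return [kernel(element) for element in input_stream]


def scale_kernel(x):
    return 2.0 * x


def offset_kernel(x):
    return x + 1.0


# Two kernels in a row: the output stream of the first feeds the second.
input_stream = [0.0, 1.0, 2.0, 3.0]
intermediate_stream = run_kernel(scale_kernel, input_stream)
output_stream = run_kernel(offset_kernel, intermediate_stream)
```

Because a kernel only sees one stream element at a time and has no access to its neighbours, locality and parallelism are explicit in the structure of the program.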

42 30 CHAPTER 6. USING THE GPU FOR RAY TRACING An easy to use stream programming language for the GPU is called BrookGPU [47]. The main advantage is that any implementation written in Brook can be compiled either for normal CPUs or for GPUs, without the need of reprogramming. Purcell pointed out, that his ray tracing approach was recently reimplemented in BrookGPU within only a couple of days. In order to apply the stream programming model to the GPU, a closer look must be taken at the processing pipeline of a graphics card (see Figure 6.2). The grey boxes in the image show, where the programmable vertex and fragment engines are located. Also the input, respective output stream data types at each stage are shown. Figure 6.2: The programmable graphics pipeline (Courtesy of Purcell [39]). The vertex processor of a graphics card is a programmable unit, that operates on incoming vertex attributes, such as position, color, texture coordinates and so on. The vertex program stage is generally used to transform the vertices from model coordinates to screen coordinates using matrix multiplication. After the rasterization is complete the fragment processor is called. In addition to the possibilities of the vertex processor, the fragment processor offers texture operations to access images. This stage is generally used to modify the color of each fragment with texture mapping or other mathematical operations. A current state of the art graphics card from NVidia, GeForce 7800 GTX, features 24 fragment and 8 vertex pipelines. As the numbers indicate, the main processing power of the GPU lies in the fragment processors. Therefore the kernels of the stream programming model are implemented as fragment programs, with input and output streams realized as textures. According to the stream programming model, the ray tracing algorithm has been broken down by Purcell into four different kernels (see Figure 6.3). The eye ray generator kernel produces

a stream of viewing rays. Each viewing ray is a single ray corresponding to a pixel in the image. The traversal kernel reads the stream of rays produced by the eye ray generator. It then steps a ray through the grid until the ray encounters a voxel containing triangles. The ray and voxel address are placed in the output stream and passed to the intersection kernel. This kernel is responsible for testing a ray against all the triangles contained in a voxel. The intersector has two types of output. If a ray-triangle intersection (hit) occurs in that voxel, the ray and the triangle that was hit are sent to the output for shading. If no hit occurs, the ray is passed back to the traversal kernel and the search for voxels containing triangles continues. The shading kernel computes a color. If a ray terminates at this hit, the color is written to the accumulated image. Additionally, the shading kernel may generate shadow or secondary rays. In this case, the new rays are passed back to the traversal stage.

Figure 6.3: The kernels used in the streaming ray tracer (Courtesy of Purcell [39]).

Eye Ray Generator

In order to start tracing rays, the primary rays pointing from the eye into the scene have to be generated. On a GPU, the interpolation capability of the rasterizer can be used to generate all the primary rays in a single kernel invocation, as described by Thrane et al. [20]. Given the four corners of the viewing rectangle and the eye point, the four rays lining the corners of the view frustum can be computed. If the rasterizer of the graphics card now interpolates the direction of these four rays across a certain pixel region, the result is the full set of primary rays needed for an image of that size. The information about these rays can then be stored in two textures, one holding the direction of each ray, the other holding its origin.
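The interpolation step can be sketched on the CPU as follows. This is only an illustration of the idea, not code from Purcell or Thrane et al.: the function `generate_eye_rays`, the sampling at pixel centers, and the nested lists standing in for the two textures are all assumptions made for this example.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    """Linear interpolation between a and b, as a rasterizer would do it."""
    return (a[0] + (b[0] - a[0]) * t,
            a[1] + (b[1] - a[1]) * t,
            a[2] + (b[2] - a[2]) * t)


def generate_eye_rays(eye: Vec3, top_left: Vec3, top_right: Vec3,
                      bottom_left: Vec3, bottom_right: Vec3,
                      width: int, height: int
                      ) -> Tuple[List[List[Vec3]], List[List[Vec3]]]:
    """Return an origin "texture" and a direction "texture" (rows of pixels)."""
    # The four rays lining the corners of the view frustum.
    d_tl, d_tr = sub(top_left, eye), sub(top_right, eye)
    d_bl, d_br = sub(bottom_left, eye), sub(bottom_right, eye)
    origins: List[List[Vec3]] = []
    directions: List[List[Vec3]] = []
    for y in range(height):
        v = (y + 0.5) / height           # sample at the pixel center
        left, right = lerp(d_tl, d_bl, v), lerp(d_tr, d_br, v)
        origin_row, direction_row = [], []
        for x in range(width):
            u = (x + 0.5) / width
            origin_row.append(eye)       # all primary rays start at the eye
            # Directions are left unnormalized here; a later kernel could
            # normalize them if required.
            direction_row.append(lerp(left, right, u))
        origins.append(origin_row)
        directions.append(direction_row)
    return origins, directions


# Example: a 2 x 2 image looking down the negative z axis.
origin_texture, direction_texture = generate_eye_rays(
    eye=(0.0, 0.0, 0.0),
    top_left=(-1.0, 1.0, -1.0), top_right=(1.0, 1.0, -1.0),
    bottom_left=(-1.0, -1.0, -1.0), bottom_right=(1.0, -1.0, -1.0),
    width=2, height=2)
```

On the graphics card the two loops disappear: the four corner directions are attached to the corners of a screen-sized rectangle and the rasterizer performs the interpolation for every pixel.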

Traversal

The traversal kernel depends on the acceleration structure in use. Purcell focused on a uniform grid (see Section 3.3.3), because it was the easiest structure to implement on the GPU at that time. Other approaches, such as Thrane et al. [20], provide a separate traversal kernel for each acceleration structure, because they want to test different structures for their GPU applicability and performance. In any case, the traversal kernel forwards data to the intersection kernel as soon as the ray reaches a region of the scene where triangles might be intersected.

Intersection

The intersection kernel takes a stream of ray-voxel pairs, which are determined by the acceleration structure in the traversal kernel. The ray-triangle intersection tests are then computed for all triangles in the voxel. If a hit occurs, a ray-triangle pair is passed to the shading stage. Because triangles can overlap multiple grid cells (see Figure 3.1), it is possible for an intersection point to lie outside the current voxel. The intersection kernel checks for this case and treats it as a miss. Note that rejecting intersections in this way may cause a ray to be tested against the same triangle multiple times in different voxels. Although this can be avoided by mailboxing, which Amanatides and Woo introduced [17] (see Section 3.3.3), a multi-threaded implementation of this technique on a GPU is very difficult.

Shading

The shading kernel evaluates the color of the triangle at the point where the ray hit the surface. Shading data consists of vertex normals and colors for each triangle. The hit information passed to the shader includes the triangle's ID number, which makes it possible to access the proper shading information by a simple lookup. Depending on the material the ray hit, the shading kernel optionally generates shadow, reflection, refraction or randomly generated rays. These secondary rays are placed in the stream of rays processed by the traverser. Each invocation of the shading kernel returns both a color and a new ray for each pixel. The shading kernel also takes the color buffer output by previous shading passes as input. This makes it possible to combine the colors of successive specular surfaces as successive rays are traced.

6.2 Choosing the Acceleration Structure

Choosing the acceleration structure might seem trivial: simply take the fastest. So far the best results have been achieved with tree-like structures (e.g. kd-trees, see Section 3.3.3). As Thrane et al. [20] have shown, however, tree traversal is a non-trivial problem on the GPU. They tried to traverse a binary tree under the premise that no assumptions can be made about the structure of the tree or the traversal order. In general, such a traversal requires a stack. This is problematic, because fragment programs have no indexable writable memory available. The only places where data can be written at runtime are temporary registers, local to the fragment

program, and the output buffer, where the result of the fragment program computation is stored. Thrane et al. [20] tested several stack implementations before concluding that efficient general purpose tree traversal is not feasible on current graphics hardware. However, they found an efficient strategy to traverse bounding volume hierarchies. Thrane also describes some traversal strategies for kd-trees running on the GPU, but the previous example with the binary tree shows that kd-trees on the GPU might be a lot slower than on the CPU.

Since the acceleration structures have already been described in Chapter 3, the following pages focus on the GPU implications of using a specific structure, based on research done by Thrane et al. [20].

Uniform Grid

The idea behind uniform grids has already been presented in Section 3.3.3. Purcell selected the uniform grid as the acceleration structure of his choice [39]. He argued that no single acceleration structure is the best across different scenes. Additionally, the uniform grid can easily be implemented on the GPU. As Thrane points out, the computation times (e.g. for accessing a voxel) are mostly constant, and no stack is needed during traversal. Detailed code listings can be found in Thrane's Master's thesis [20].

The only kind of data that can efficiently be shared with the graphics card is a texture. Consequently, the information stored in the acceleration structure (e.g. the uniform grid) must be packed into textures. Purcell [39] explains how a uniform grid can be stored in such a way (see Figure 6.4). Each grid cell contains a pointer to the start of a list of triangles contained in that cell, or a null pointer if the cell is empty. The triangle lists are stored in another texture. Each entry in a triangle list is a pointer to a set of vertex data for the indicated triangle. Triangle vertices are stored in a set of three separate textures.

Figure 6.4: Uniform grid stored in several textures (Courtesy of Purcell [39]).
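The texture layout described above can be sketched with flat Python lists standing in for textures. This is an illustration under stated assumptions rather than Purcell's actual encoding: the value -1 as null pointer and as list terminator, the row-major cell ordering, and the names `pack_grid`, `triangles_in_cell` and `fetch_triangle` are all chosen for this example.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Cell = Tuple[int, int, int]
NULL = -1  # assumed encoding for "empty cell" and "end of triangle list"


def cell_index(dims: Cell, cell: Cell) -> int:
    """Map a 3D cell coordinate to its position in the flat grid texture."""
    nx, ny, _ = dims
    x, y, z = cell
    return x + nx * (y + ny * z)


def pack_grid(dims: Cell, cell_triangles: Dict[Cell, List[int]]
              ) -> Tuple[List[int], List[int]]:
    """Build the grid texture and the triangle list texture."""
    grid_texture = [NULL] * (dims[0] * dims[1] * dims[2])
    triangle_list: List[int] = []
    for cell, triangle_ids in sorted(cell_triangles.items()):
        if not triangle_ids:
            continue                      # empty cells keep the null pointer
        grid_texture[cell_index(dims, cell)] = len(triangle_list)
        triangle_list.extend(triangle_ids)
        triangle_list.append(NULL)        # terminate this cell's list
    return grid_texture, triangle_list


def triangles_in_cell(grid_texture: List[int], triangle_list: List[int],
                      dims: Cell, cell: Cell) -> List[int]:
    """Follow the pointer of a cell and read its triangle IDs."""
    pointer = grid_texture[cell_index(dims, cell)]
    result: List[int] = []
    while pointer != NULL and triangle_list[pointer] != NULL:
        result.append(triangle_list[pointer])
        pointer += 1
    return result


def fetch_triangle(v0: List[Vec3], v1: List[Vec3], v2: List[Vec3],
                   triangle_id: int) -> Tuple[Vec3, Vec3, Vec3]:
    """Read the three vertices from the three separate vertex textures."""
    return v0[triangle_id], v1[triangle_id], v2[triangle_id]


# Example: a 2 x 1 x 1 grid with two triangles in the first cell only.
dims = (2, 1, 1)
grid_texture, triangle_list = pack_grid(dims, {(0, 0, 0): [0, 1], (1, 0, 0): []})
v0 = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
v1 = [(1.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
v2 = [(0.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
```

Every lookup is a chain of dependent reads (cell, then list entry, then vertices), which on the GPU translates into dependent texture fetches. No write access is needed during traversal, which is why this layout fits the restrictions of fragment programs.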

KD-Trees

As described above, CPU and GPU implementations of the same acceleration structure might differ a lot. On the CPU, kd-trees are one of the best structures to accelerate ray tracing, as determined by Havran in his PhD thesis [9]. The problem of the missing stack could be solved by using an older approach. Before recursive descent traversal was developed, kd-trees were traversed in a sequential manner. One such approach has been described by Foley and Sugerman [48]. Implementation details are given by Thrane et al. [20].

Bounding Volume Hierarchies

Bounding volume hierarchies are also represented as a tree structure. Therefore they face the same problems on the GPU as kd-trees, most importantly the missing stack. The solution lies in the representation of the tree in the texture. Thrane et al. [20] describe a way to build a tree texture which can be traversed in a sequential manner (see Figure 6.5).

Figure 6.5: Example for traversal of a BVH tree in a texture (Courtesy of Thrane [20]).

6.3 Benchmarks

As mentioned in Section 4.4, benchmarking ray tracing systems requires agreement on certain parameters. The frame rate achieved by a renderer depends on the image size, the number of triangles in the scene's geometry, and much more. Many authors create their own test scenes, and only a few choose scenes from the BART benchmark. Nevertheless, the results presented on the following pages should give a fairly good picture of where GPU-supported ray tracing stands compared to CPU implementations.
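Before turning to the individual results, the sequential BVH traversal from Section 6.2 can be made concrete. The sketch below shows one common way to traverse a tree without a stack: the nodes are stored in depth-first order, and each node carries an "escape index" naming the node to continue with when its bounding box is missed. Whether this matches the exact texture layout of Thrane et al. [20] is an assumption; the `Node` fields and function names are hypothetical, and the actual ray-triangle test is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    box_min: Vec3
    box_max: Vec3
    escape: int      # index of the node to visit when this box is missed
    triangle: int    # triangle ID for leaves, -1 for inner nodes


def ray_hits_box(origin: Vec3, direction: Vec3, node: Node) -> bool:
    """Slab test of a ray (t >= 0) against an axis-aligned bounding box."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, node.box_min, node.box_max):
        if d == 0.0:
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(nodes: List[Node], origin: Vec3, direction: Vec3) -> List[int]:
    """Return the IDs of all triangles whose leaf box is hit by the ray."""
    candidates: List[int] = []
    index = 0
    while index < len(nodes):             # len(nodes) marks "traversal done"
        node = nodes[index]
        if ray_hits_box(origin, direction, node):
            if node.triangle >= 0:
                candidates.append(node.triangle)
            index += 1                    # depth-first successor
        else:
            index = node.escape           # skip the whole subtree
    return candidates


# Depth-first layout: root, left leaf, right inner node, its two leaves.
nodes = [
    Node((0.0, 0.0, 0.0), (4.0, 1.0, 1.0), escape=5, triangle=-1),
    Node((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), escape=2, triangle=0),
    Node((2.0, 0.0, 0.0), (4.0, 1.0, 1.0), escape=5, triangle=-1),
    Node((2.0, 0.0, 0.0), (3.0, 1.0, 1.0), escape=4, triangle=1),
    Node((3.0, 0.0, 0.0), (4.0, 1.0, 1.0), escape=5, triangle=2),
]
```

The only traversal state is a single index, which fits into a temporary register of a fragment program. The price is a fixed visiting order: the nearer child cannot be chosen first per ray, as a stack-based CPU traversal would do.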

N. Carr et al. - The Ray Engine

The following results have been reported by Carr et al. [37] when benchmarking their CPU-GPU hybrid approach. They used an ATI Radeon 8500 graphics card, but did not state which CPU they used. Unfortunately Carr described only a few results, some even based on assumptions regarding future graphics cards. The only comparable results were obtained by rendering the well-known teapot (see Table 6.1). He compared a CPU-only approach with the CPU-GPU implementation and reported a speedup of about 22%. Graphics cards at that time did not support the fast read-back of data to the CPU that his implementation required. Therefore he also estimated the effect of an asynchronous read-back and reached a theoretical speedup of 34% compared to the CPU-only implementation. Finally he estimated the theoretical speedup for an infinitely fast GPU, resulting in a 73% increase.

System                Rays per sec.   Speedup
CPU only              135,812
plus GPU              165,098         22%
Asynch. readback      183,273         34%
Infinitely fast GPU   234,102         73%

Table 6.1: Benchmark: CPU-GPU Hybrid: Speedups when using the GPU to render the teapot scene.

He concluded that his ray tracer performed at speeds comparable to the fastest CPU ray tracer of that time. He combined the best features offered by the CPU and GPU (CPU for traversal of the acceleration structure and for ray coherence, GPU for ray-triangle intersection) at the expense of a slow read-back of data to the CPU. The AGP graphics bus (see footnote 1) supports high-bandwidth transmission from the CPU to the GPU, but less bandwidth for recovery of the results. Carr et al. were very enthusiastic about their hybrid approach, but judging by the research done in recent years, it seems that no one shared their enthusiasm.
Mapping the full ray tracing process to the GPU became the approach most researchers focused on, because the read-back delay to the CPU could not be overcome until the newest PCI Express graphics cards became available. No new benchmark results involving such a graphics card were available at the time of this writing.

Footnote 1: AGP, Accelerated Graphics Port (also called Advanced Graphics Port), is a high-speed point-to-point channel for attaching a graphics card to a computer's motherboard, primarily to assist in the acceleration of 3D computer graphics. Some motherboards have been built with multiple independent AGP slots. AGP is slowly being phased out in favour of PCI Express.

T. Purcell - Ray Tracing on a Stream Processor

Purcell made the first step to map the whole ray tracing process to the GPU [39], regarding the graphics card as a stream processor (see Section 6.1). He tested his implementation on an ATI Radeon 9700 Pro, running DirectX 9 under Windows XP with Catalyst 2.3 drivers, on a dual Pentium III 800 MHz machine with 1 GB RAM. Three different scenes, none of them part of the BART benchmark, were used (see Figure 6.6 and Table 6.2). All images have been rendered at 256 × 256 pixels.

Figure 6.6: Benchmark: Purcell GPU Test Scenes: Cornell box, teapotahedron, Quake 3 (Courtesy of Purcell [39]).

Scene           Triangles   Rays      Framerate
Cornell Box     32          S         10.5
Teapotahedron   840         E, S, R   1.0
Teapotahedron   840         E, R      1.5
Quake 3                     S         1.8

Table 6.2: Benchmark: Purcell GPU: Scene complexity and results, with eye rays (E), shadow rays (S), reflection rays (R).

He concludes that his system achieves roughly the same number of ray-triangle intersection tests per second as the CPU-GPU hybrid developed by Carr et al. [37]. Given the results Purcell reports, this seems a bit odd, especially when comparing Carr's ATI Radeon 8500 with Purcell's Radeon 9700 Pro. Such a large gap between these two graphics cards should result in a high performance boost compared to the CPU-GPU hybrid.

Purcell also discovered some bottlenecks, which are mainly caused by GPU limitations. The main restriction arose from the 24-bit floating point numbers on his graphics card. Textures could only be addressed through integer calculations carried out in this format, which caps the number of triangles a scene can hold at a time. Additionally he used several OpenGL extensions not available in the normal OpenGL API at that time, mainly the creation of light-weight buffers which a fragment program can render into. These are an alternative to full screen frame buffers (p-buffers) if a full rendering context is not needed.

He also concluded that the GPU hardware does not support the stream processor abstraction as well as it could. The stream processor he implemented is far less general than it should be. Additionally he points out that programming the GPU is not easy, because high level languages, debuggers, and code profilers were missing at that time. Nevertheless, Purcell put much hope into the GPU becoming a full high-performance parallel co-processor.


N. Thrane et al. - A Comparison of Acceleration Structures for GPU Assisted Ray Tracing

Both Carr and Purcell focused on one acceleration structure (e.g. the uniform grid in Purcell's implementation). The only comparison of different structures found during the work on this document has been given by Thrane and Simonsen in their Master's thesis [20]. They implemented a GPU ray tracer able to use a uniform grid, a kd-tree or a bounding volume hierarchy as acceleration structure. They used a 3.2 GHz Pentium 4 with 1 GB RAM and an NVidia GeForce 6800 Ultra at 400 MHz with 256 MB RAM. Windows XP Professional, service pack 2, was used as operating system, with the graphics card running NVidia drivers. Unlike Carr and Purcell, they used highly complex scenes as found in the BART benchmark (see Figure 6.7 and Table 6.3). More interestingly, two of the scenes they tested were animated (Cows and Bunny), which might give an idea of how the tested acceleration structures compete in interactive ray tracing.

Figure 6.7: Benchmark: Thrane & Simonsen: Cows, Robots, Kitchen and Bunny (from left to right) (Courtesy of Thrane [20])

Table 6.3: Benchmark: Thrane & Simonsen: Scene complexity. Column: Triangles. Rows: Cows, Robots (BART), Kitchen (BART), Bunny, Cornell Box (from Purcell): 32.

Table 6.4: Benchmark: Thrane & Simonsen: Average rendering times in milliseconds per frame, including shadow and reflection rays where applicable. Columns: Cows, Robots, Kitchen, Bunny, Cornell Box. Rows: Uniform grid, Kd-tree (Restart), Kd-tree (Backtrack), BVH (Kay/Kajiya), BVH (Goldsmith/Salmon).

The results (see Table 6.4) show that Thrane & Simonsen implemented two different versions each of kd-trees and bounding volume hierarchies. They conclude that uniform grids were the first structure to be implemented on the GPU, by Purcell, but have turned out to be the slowest, except for single-object type scenes. The uniform grid is not suited for scenes with high variance in geometric density, because it is incapable of adapting to such changes. Additionally, traversal on the GPU suffers from the relatively large amount of data required to represent the current state of the traversal. Both variants of the kd-tree clearly outperform the uniform grid, but lose against the bounding volume hierarchies. As Thrane et al. point out, that might be the result of the simple implementation on the GPU. Kd-trees suffer from complicated traversal strategies. Both bounding volume hierarchies are easier to construct and traverse than any other structure on the GPU.

Chapter 7 Realtime Ray Tracing API (RTRT/OpenRT)

Rasterization graphics engines often make use of standardized APIs (e.g. DirectX or OpenGL) to simplify their usage. That way a programmer can implement an application at a higher level without needing to know the hardware running in the computer. The basic idea behind OpenRT [49] is to create an API similar to OpenGL, so that programmers can easily use the ray tracer without needing to know how it works. The name OpenRT covers three different things. First, it is the realtime ray tracing project started at Saarland University. Second, it is a ray tracer at its core. Third, it is the API that simplifies the usage of that ray tracer.

The lack of a common API severely hampers the wide-spread use of ray tracing. Dozens of ray tracing implementations can be found on the internet, but these applications are not compatible with each other. Standardized APIs are crucial for new technologies. They make it possible to build a common user base, where users can abstract their programming work from the underlying hardware. Also, if programmers build their applications on top of the API, changes and optimizations are possible in the underlying layers without changing the API itself. All previously written programs can therefore make immediate use of the upgrades by installing the new API version. The following list presents some key goals during the development of OpenRT:

- Create a highly optimized ray tracer.
- Use acceleration structures to make interactive dynamic scenes possible.
- Abstract from the underlying hardware (run it on single desktop PCs, clusters, special purpose hardware etc.).
- Offer all features of ray tracing (e.g. shaders, complex geometry etc.).
- Be as similar to OpenGL as possible.

One of the first rendering engines designed for industrial use was RenderMan [50], created by Pixar. It supports shaders and other features and has been used to create several motion pictures, some of them completely rendered. OpenRT also offers the flexibility to use different shaders in a plug-and-play manner, as well as complex geometry. The shader environment is based on C++, allowing easy implementation and conversion of existing shaders.

Another key feature of a renderer is the ability to run on different hardware. Exchanging the lower layers to run on a single desktop PC, a cluster of computers, or special purpose hardware should not affect the application. Also, due to the nature of interactive ray tracing, support and optimizations for dynamic scenes are a must-have.

To simplify the usage, only triangles are used as geometric primitives. This makes handling very similar to OpenGL, although a ray tracer can support free-form surfaces (see Section 8.1.3). The similarity to OpenGL should also help programmers get accustomed to the new rendering engine by keeping the learning time and effort minimal. Additionally, porting existing OpenGL-based applications becomes possible without investing too much time.

As of SIGGRAPH 2005 a non-commercial version of OpenRT is available on the Internet [49]. It is limited to a lower resolution and a single desktop PC (it does not run on clusters), but already shows how easy such an API is to install and use. Most of the examples described in Chapter 8 are implemented in OpenRT (e.g. Quake 3 Raytraced [52], visualization of massively complex models [55], mixed reality rendering [59] [60]).

Chapter 8 Applications

Ray tracing is at its core a concept developed to produce realistic images, supporting features like reflection, refraction, shadows and more. Therefore most of the examples and applications described in this chapter focus on the graphical usage of ray tracing. However, ray tracing is at heart still only a fast intersection test, which makes byproducts feasible, e.g. for artificial intelligence in games and simulations. These are also mentioned in this chapter.

8.1 Computer Graphic Applications

With ray tracing designed to create images, the number of applications surpasses the space available for this chapter. The following pages present some very different examples, each with its unique problems and possibilities in the field of ray tracing.

8.1.1 Computer Games

With more and cheaper computers available to everyone, the market for computer games has grown extensively over the past years. The large amount of money people are spending on games laid the base for the development of fast CPUs and graphics cards. If ray tracing could be introduced in computer games, a lot more research and money would be available for this field of technology.

First steps to incorporate ray tracing in games have already been taken by Daniel Pohl et al. at Saarland University [52]. He based his project on the popular first-person shooter Quake 3 by id Software [53]. Removing the rasterization engine of the game, he and a few other students built a new ray tracing 3D engine from scratch in about six months, using OpenRT as a base. Compared to the years of development that companies invest in the creation of a traditional game engine, half a year seems negligible. As already described, much time can be saved by using ray tracing, because the rendering algorithm automatically takes care of many effects (e.g. shadows, reflections, refractions etc.). Programmers of rasterization engines need to use a lot of tricks to bring such effects to life in their games. As already described in Section 2.3, rasterization operates on a stream


of independent triangles, and therefore cannot efficiently render the mentioned effects in one rendering pass. Every effect has to be split into several rendering passes by the application, mostly relying on approximations which are inaccurate and break down in many situations (e.g. multiple reflections, see Figure 8.1) [51].

The game engine developed by Daniel Pohl et al. supports player and bot movement, including shooting and jumping, collision detection and many special effects like jump pads and teleporters. Effects more specific to ray tracing, like portals (see Figure 8.2) and surveillance cameras, are rendered correctly by default, even if they recursively see each other. Also, a level-of-detail mechanism is not necessary to reduce scene complexity (see Figure 8.3), which allows for highly crowded scenes with many characters (see Figure 8.4). All screenshots referenced here are courtesy of Daniel Pohl [52].

Figure 8.1: Quake 3 Raytraced: Multiple reflections
Figure 8.2: Quake 3 Raytraced: Lookthrough portal
Figure 8.3: Quake 3 Raytraced: High number of polygons
Figure 8.4: Quake 3 Raytraced: Area crowded with characters

Other games and technology demos have already been developed, with screenshots and videos available at [54].

8.1.2 Visualizing Massively Complex Models

Rendering a model via either rasterization or ray tracing is not problematic as long as the needed data (e.g. polygons, textures etc.) fits within the memory of the graphics card or host computer. With CAD (Computer Aided Design) becoming more and more important in industry, displaying highly complex models at interactive frame rates on the engineer's desktop computer becomes a necessity. Both projects described here were developed at Saarland University.

The first involved rendering a power plant reference model from UNC (University of North Carolina at Chapel Hill, USA), featuring fifty million triangles [55] (see Figure 8.5). Previous work on this power plant model achieved frame rates of 5 to 15 frames per second on an early SGI (Silicon Graphics Inc.) Onyx with four R4400 processors and an InfiniteReality graphics subsystem. A reduction of rendered polygons by 96% was achieved by using several tricks, like replacing distant geometry with textured meshes. Some pre-compilation time was also invested in creating several level-of-detail versions of the model. The main drawback of the UNC approach is the tremendous preprocessing time, estimated to be three weeks for a single copy of the power plant model. Most of these reduction strategies are implicit when using ray tracing, rendering the pre-processing time obsolete.

The data management of such large models was also a major field of work. The problems involved mapping the large amount of data into the memory of a single client or a cluster. In the latter case the data had to be distributed across the network in a way that optimizes performance. The final version consisted of two servers, one to display the image and one to store the model data and distribute it where needed. The rendering itself was performed by seven dual Pentium III 800 MHz clients, all connected by Gigabit Ethernet switches. Using this configuration, interactive frame rates of 6-12 frames per second could be reached, including shaders and shadows (see Figure 8.6).

Figure 8.5: UNC power plant: Highly detailed building (Courtesy of Wald [55])
Figure 8.6: UNC power plant: Rendering with shadows (Courtesy of Wald [55])


The second example involved the rendering of a highly detailed Boeing 777 model featuring about 350 million triangles (see Figures 8.7, 8.8). A special feature of the Boeing 777 model is its low degree of occlusion. Because of the model's skeleton-like appearance, many rays penetrate deep into the model before they hit geometry.

Figure 8.7: Boeing 777: Overview (Courtesy of Boeing Corp.)
Figure 8.8: Boeing 777: Engine interior (Courtesy of Boeing Corp.)

Memory management was also a great problem, with the model being even more detailed than the previously described UNC power plant. To improve loading times and interactive navigation, geometry proxies have been used, resulting in a crude model at startup (see Figure 8.9). As more data is processed over time, finer details become visible (see Figure 8.10).

Figure 8.9: Boeing 777: Dynamic loading (at startup) (Courtesy of Boeing Corp.)
Figure 8.10: Boeing 777: Dynamic loading (after startup) (Courtesy of Boeing Corp.)

With shadows and shading enabled, the final application achieved frame rates of 3-7 frames per second on a single dual-CPU desktop PC running at 1.8 GHz.

8.1.3 Free-Form Surfaces and Volume Ray Tracing

Up to now, most ray tracers use triangles to represent a scene's geometry, much like standard rasterization approaches. While triangles are a must-have in a normal rasterization engine, in a ray tracer this constraint is only adopted for the sake of simplicity. Certain algorithms already support perfect shapes (e.g. perfectly round spheres) described via mathematical formulas rather than triangles. Free-form objects, described by splines instead of triangles, offer many advantages like reduced memory requirements and higher precision results (see Figures 8.11, 8.12, by Benthin et al. [57]). However, their usage poses special problems regarding the intersection tests and spatial index structures.

Figure 8.11: Free-Form Surfaces: Round face (Courtesy of Benthin [57])
Figure 8.12: Free-Form Surfaces: Chess game (Courtesy of Benthin [57])

While perfect shapes are normally described mathematically, volumetric data sets are much more difficult to handle. They consist solely of independent points within the scene, which together form an object. Such data sets can be generated by taking a three-dimensional laser scan of real geometry. One task in volume ray tracing is to find the correct intersection of a ray with the interpolated implicit surface defined by the data values. Depending on the intersection algorithm, the results are either accurate but slow, or fast but only an approximation of the solution. New optimized algorithms have been used to achieve frame rates of 2 frames per second on a dual Pentium IV running at 2.2 GHz (see Figures 8.13, 8.14, by Marmitt et al. [58]).

8.1.4 Mixed Reality Rendering

The fundamental problem of all science-fiction movies is how real actors can be inserted into a computer generated scene which does not really exist. After solving this problem, the next step is to generate certain effects involving the actors (e.g. an actor casting a shadow on a virtual object in the scene, or the actor being reflected in a virtual mirror). Such effects can normally only be generated with a lot of tricks and processing time. Andreas Pomi et al. [59] [60] presented a way to stream videos directly into the ray tracer. These videos act as textures that are directly integrated into the scene.

Figure 8.13: Volume Ray Tracing: Bonsai (Courtesy of Marmitt [58])
Figure 8.14: Volume Ray Tracing: Skull (Courtesy of Marmitt [58])

In Figure 8.15 two billboards have been generated in the scene, acting as planes for the streamed video textures. The ray tracer treats these images like textures, which causes the actors to cast realistic shadows and reflections on the floor. Realistic television effects are also possible. In Figure 8.16 the streamed video texture is used as a television image. Similar to the actors, the image is reflected by the table. Additionally, certain color effects are possible if a shader is added to the television, making it act like a light source. Two examples with a red and a green light effect are shown in Figures 8.17 and 8.18.

Figure 8.15: Mixed Reality: Shadows and reflections (Courtesy of Pomi [59] [60])
Figure 8.16: Mixed Reality: TV with reflections (Courtesy of Pomi [59] [60])
Figure 8.17: Mixed Reality: TV as red light source (Courtesy of Pomi [59] [60])
Figure 8.18: Mixed Reality: TV as green light source (Courtesy of Pomi [59] [60])

A.I.-Vision and Collision Detection

Although graphical applications clearly dominate the use of ray tracing, it also yields byproducts for other problems, such as artificial intelligence. Programming an artificial intelligence algorithm that needs to react to other characters or human players in a game or simulation is always challenging. One part of this involves actually seeing other characters or humans. Such problems become especially interesting when the computer-controlled character is working against the player.

Modern computer games give the player a wide range of possibilities to play against the computer. One genre specializes in hide-and-seek games, in which the player acts as a thief who needs to get past the guards without being seen. To do that, the player hides in the shadows and sneaks around corners behind the backs of the computer-controlled characters. For a programmer this creates serious problems, because the A.I. algorithms of the guards need to check whether they are actually seeing the player. This is easy if there is no direct line of sight between the guard and the thief, but problematic if the player hides in a dark corner the guard is looking at directly, or if the player is allowed to use camouflage (see Figures 8.19 and 8.20).

Figure 8.19: A.I. Vision - Visible player (Courtesy of Pohl [52])
Figure 8.20: A.I. Vision - Hidden player (Courtesy of Pohl [52])

In order to detect the player, the computer-controlled character could use the ray tracer to render a few pixels of the player. A heuristic based on the contrast or the color of the rendered pixels could then be used to check whether the guard sees the player or whether the camouflage is successful. Apart from the heuristic, all the code needed already exists within the ray tracer.

A similar solution is possible if collision detection is needed. Ray tracing a few samples is enough to perform collision detection against the real geometry of the scene. No bounding boxes or other ways of simplifying the problem need to be used.
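The visibility heuristic described above could look roughly like the following Python sketch. It is an assumption-laden illustration, not code from the thesis or from Pohl [52]: the pixel samples are assumed to come from the ray tracer as RGB triples in the range 0 to 1, the luminance weights are the Rec. 709 coefficients, and the function names and both thresholds are hypothetical values chosen for the example.

```python
import math


def luminance(rgb):
    """Relative luminance of an RGB colour in [0, 1], using Rec. 709 weights."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def mean_colour(samples):
    """Component-wise average of a non-empty list of RGB triples."""
    n = len(samples)
    return (sum(c[0] for c in samples) / n,
            sum(c[1] for c in samples) / n,
            sum(c[2] for c in samples) / n)


def guard_sees_player(player_samples, background_samples,
                      min_contrast=0.15, min_colour_distance=0.25):
    """Decide whether the player stands out from the surroundings.

    player_samples:     colours of the few pixels rendered on the player,
                        as seen from the guard's eye position.
    background_samples: colours of pixels rendered just next to the player.
    Both thresholds are illustrative assumptions and would need tuning.
    """
    if not player_samples or not background_samples:
        return False  # no ray reached the player, so there is no line of sight
    player = mean_colour(player_samples)
    background = mean_colour(background_samples)
    # Contrast test: a lit player in front of a dark wall, or a dark
    # silhouette in front of a bright wall, differs strongly in luminance.
    contrast = abs(luminance(player) - luminance(background))
    # Colour test: catches cases with similar brightness but different hue.
    colour_distance = math.dist(player, background)
    return contrast >= min_contrast or colour_distance >= min_colour_distance
```

A player hiding in a dark corner or wearing matching camouflage yields samples close to the background, so both tests fail and the guard does not react. The same handful of rays already tells the guard whether a line of sight exists at all.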
