Bandwidth and Memory Efficiency in Real-Time Ray Tracing

Size: px
Start display at page:

Download "Bandwidth and Memory Efficiency in Real-Time Ray Tracing"

Transcription

1 Bandwidth and Memory Efficiency in Real-Time Ray Tracing Pedro Filipe Vitorino Castro Lousada Thesis to obtain the Master of Science Degree in Information Systems and Computer Engineering Supervisors: Prof. João António Madeiras Pereira Dr. Vasco Alexandre da Silva Costa Examination Committee Chairperson: Prof. Nuno João Neves Mamede Supervisor: Prof. João António Madeiras Pereira Member of the Committee: Prof. Fernando Birra October 2016

2 ii

3 Acknowledgments I would like to start by thanking my family, especially for all the support, encouragement and motivation they have provided throughout my academic years. I also want to thank prof. João Madeiras and Dr. Vasco Costa for providing expertise, guidance and insight during the development of this thesis. Lastly but not least I also want to thank all my friends and colleagues for the words of encouragement provided during the past months. iii

4 Abstract Real time ray tracing has been given a lot of attention in recent years in the academic and research community. Several novel algorithms have appeared that parallelize different aspects of the ray tracing algorithm through the use of a GPU. Among theses, the creation of BVHs. We believe that recent approaches have failed to consider the performance impact of memory accesses in GPU and how their cost affects the overall performance of the application. In this work we show that by reducing memory bandwidth and footprint we are able to achieve significant improvements in BVH traversal times. We do this by compressing the BVH and the triangle mesh in parallel manner after its creation in each frame and then decompressing them as needed while traversing the BVH. Keywords: Ray Tracing, Bounding Volume Hierarchy, Parallelization, GPU, Quantization, Memory iv

5 Resumo Nos últimos anos ray tracing em tempo real tem sido alvo de bastante atenção na comunidade académica e de investigação. Surgiram novos algoritmos que paralelizam diferentes aspectos do algoritmo de ray tracing através do uso de um GPU. Entre estes aspectos destaca-se a criação de BVHs. A nosso ver estas abordagens não têm considerado a importância e o impacto na performance que os acessos à memória global do GPU têm. Com este trabalho pretendemos demonstrar que através da redução da utilização da largura de banda de memória conseguimos reduções significativas no tempo de travessia da BVH. Exploramos técnicas de compressão da BVH e da malha poligonal mediante a sua construção a cada frame e descomprimimos as mesmas à medida que são necessárias para efectuar testes de intersecção. Palavras-chave: Ray Tracing, Bounding Volume Hierarchy, Paralelização, GPU, Quantização, Memória v

6 Contents Acknowledgments iv Abstract v Resumo vi Contents viii List of Figures x List of Tables xi 1 Introduction Motivation Problem Description Goals Contributions Work Outline Background and State of the Art Background Ray Tracing GPU Computing Acceleration Structures Kd-Trees Uniform Grids Bounding Volume Hierarchies State of the art Parallel Construction of Bounding Volume Hierarchies Random-Accessible Compressed Bounding Volume Hierarchies Approach and Architecture Construction Of The BVH Binary Radix Tree Properties Morton Codes Parallel Construction of BVH Calculating Bounding Volumes vi

7 3.2 Compression BVH Compression Mesh Compression Rendering Evaluation Evaluation Methodology Experimental Scenes Hypothesis Experimental Results and Analysis Number of Ray-Box Intersection Tests Number of Ray-Triangle Intersection Tests Global Memory Reads Bounding Volume Hierarchy Total Size Kernel Execution Time Frames Per Second Conclusions Future Work Bibliography 48 A Data Structures 49 B BVH Traversal Kernel Code 50 vii

8 List of Figures 2.1 Ray Tracing Diagram Difference between rasterization and ray tracing (source: Intel). This figure shows how Ray-Tracing can better simulate soft shadows and reflections Example of a typical CUDA processing flow (source: NVIDIA) Memory hierarchy in CUDA (source: NVIDIA) A visual representation of a kd-tree. P1 through P4 represent planes of space division Five rays traversing a grid A visual representation of a Bounding Volume Hierarchy Example 2-D Morton code ordering of triangles with the first two levels of the hierarchy. Blue and red bits indicate x and y axes, respectively. Source: [1] A visual representation of an ordered radix tree. The numbers 0-7 act as keys (and such as leaf nodes) of the tree. Each internal node represents a common range prefix of the binary value of all the leaf nodes bellow it. Notice we have N 1 internal nodes for N keys Centroid of a triangle Interweaving of bits in order to create a Morton coder Bar representation of the keys covered by each internal node Pseudocode for the parallel construction of a BVH (source:[2] ) Resulting BVH created in parallel around Stanford s Bunny Ray-box intersection errors caused by not using a quantized parent bounding box Mesh deformation caused by quantization of triangles vertices. Notice the small black dots between triangles. Image is a closeup of the tip of the ear of Stanford s Bunny CORNELL test scene STANFORD BUNNY test scene ASIAN DRAGON test scene Number of ray-box intersection tests in CORNELL Number of ray-box intersection tests in STANFORD BUNNY Number of ray-box intersection tests in ASIAN DRAGON Number of ray-triangle intersection tests in CORNELL viii

9 4.8 Number of ray-triangle intersection tests in STANFORD BUNNY Number of ray-triangle intersection tests in ASIAN DRAGON Estimated memory read during intersection tests in CORNELL in each frame Estimated memory read during intersection tests in STANFORD BUNNY in each frame Estimated memory read during intersection tests in ASIAN DRAGON in each frame Frames per seconds in each test scene ix

10 List of Tables 4.1 Size of internal node, leaf node and triangle in each algorithm Kernel execution time for CORNELL frame Kernel execution time for STANDFORD BUNNY frame Kernel execution time for ASIAN DRAGON frame x

11 Chapter 1 Introduction 1.1 Motivation Real-time rendering concerns itself with the generation of synthetic images at a rate fast enough so that the viewer can interact with a virtual environment. An image appears on screen, the viewer acts or reacts, and this feedback affects what is generated next. Two of the more popular approaches to synthetic image generation are Rasterization and Ray-Tracing. Both have been used in computer graphics for the past decades. Each method allow us to generate 2D images from 3D scenes composed of virtual objects. Rasterization quickly became the dominant approach for real time applications. Its initially low computational requirements, its ability to incrementally move from software to hardware, and the ever increasing performance of dedicated graphics hardware made it the most attractive option in generating images at high frame rates [3]. Ray-Tracing then received little attention outside the academic world. Its high computation costs made it a much more expensive and slow approach compared to rasterization. Ray tracing offers a fairly long list of advantages over rasterization. Ray-tracing can easily simulate nonlocal effects such as shadows, reflections and refraction. In Rasterization reflections and shadows are hard to compute; refractions being very hard. Due to its mathematical correctness, ray tracing can make generated images look more realistic. Real time applications such as games, simulations and even medical software can greatly benefit from this. It s only a matter of being able to generate them at a rate fast enough required by the application. 1.2 Problem Description At its core, the ray tracing algorithm follows the following logic: for each pixel of the display, cast a ray that propagates in a straight line until it intersects an element of the scene being rendered. The intersection is then used to determine the color of the pixel as a function of the intersected element s surface. For a simple scene with no secondary rays (i.e. reflections, shadows, refractions, etc) this 1

12 means having at least N xm intersection tests (N being the number of rays and M the number of polygons in the scene). For its nature, ray tracing inherently leads to a high number of expensive floating point operations. Put this together with an irregular and high bandwidth memory access pattern and performance will be an issue when trying to achieve real time rendering. One way to optimize the standard ray tracing algorithm is through the use of acceleration structures. Acceleration structures allow us to lower the number of intersection tests needed to render an image. Currently Bounding Volume Hierarchies show the most promise. The state of the art in this kind of structure has been targeted towards generating the structure at a rate fast enough to be usable in a real time application, typically creating a new BVH or adapting an existing one at each frame. Most algorithms use a GPU to achieve the performance they need. Despite their capability of performing a high number of floating point operations per second, GPUs tend to suffer from slow memory accesses. Most algorithms that focus on constructing a BVH in a parallel manner tend to overlook this and be careless with memory accesses and bandwidth, resulting in overall less than optimal performance. 1.3 Goals Our goal with this work is to explore the lack of concern given to memory accesses in the recent state of the art. We believe that by concerning ourselves with GPU memory footprint and bandwidth efficiency we are able to improve performances. In this work we aim to develop a ray tracing application supporting one of the latest state of the art algorithms in parallel BVH construction and improve it further through memory compression techniques. 1.4 Contributions Our contribution is a novel algorithm based in existing techniques for real time BVH construction with focus and improvements in BVH and triangle mesh compression. We are able to reduce the total size of memory used to store both the BVH and the triangle mesh as well as reduce the memory bandwidth of the application. We see improvements of up to 40% in occupied memory and reductions of up to 20% in memory bandwidth. These improvements then lead to a decrease in the time it takes to generate each frame, namely through the acceleration of the traversal of the BVH. 1.5 Work Outline This document is divided into 5 different chapters. In Chapter 2 we briefly talk about the basis of the ray-tracing algorithm. We mention the advantages of GPU computing and provide some insight on how it works. We also give a brief overview on the usage of acceleration structures in ray-tracing and mention their advantages and disadvantages. Still in this chapter we take a look into the state of the art in parallel construction of BVH s and BVH compression. In Chapter 3 we describe our algorithm and 2

13 techniques in detail. Chapter 4 provides the evaluation methods we use and showcases our results. The final Chapter 5 is where we make our concluding remarks and discuss potential ideas for future work. 3

14 Chapter 2 Background and State of the Art In order to better contextualize our work, this section delves into the topics that this work focuses on. We first look into the standard ray tracing algorithm as presented by T. Whitted in We describe the general idea behind the algorithm and provide pseudocode of its implementation. We then proceed to look into GPU Computing. We explore the architectural model and the advantages we are able to gather through the use of a GPU. After this we go over the most commonly used acceleration structures, analyze their advantages and disadvantages and justify our reasoning in choosing the one we think is more suitable. Finally, we look into the state of the art in the construction of Bounding Volume Hierarchies in a parallel manner in GPUs and also their compression. 4

15 2.1 Background Ray Tracing First introduced by T. Whitted [4], ray tracing is a technique based in global illumination shading. It employs the use of ray casting [5] to intersect eye rays with objects in a computer simulated scene. Figure 2.1: Ray Tracing Diagram As shown in figure 2.1 ray tracing works by creating a series of Primary Rays with their origin in the Spectator and targeted at a virtual Window X pixels wide and Y pixels high, depending on the resolution of the screen or the ray tracing application window. Primary Rays traverse the scene until they encounter a Scene Object. When an object is hit by a Primary Ray 2 things happen: Shadow Rays are generated from the intersection point to check if the area is shaded, and, if the material of the object is reflective or refractive Secondary Rays are created to traverse the scene once more in order to create secondary visual effects such as reflection and refractions. Pseudocode for this algorithm can be found in Algorithm 1. 5

16 Algorithm 1 Whitted Ray Tracing Algorithm 1: while true do 2: for each pixel p do 3: ray Ray(camera.origin, camera.direction, p) 4: pixelcolor CastRay(ray) 5: end for 6: end while 7: 8: function CASTRAY(ray) 9: if ray.depth > MAX DEP T H then 10: return BLACK 11: end if 12: intersection IntersectRayW ithp rimitives(ray) 13: if intersection = null then 14: return backgroundcolor 15: end if 16: outputcolor Color(intersection) 17: if ObjectIsRef lective(intersection) then 18: ref lectionray Ref lectray(ray, intersection) 19: ref lectionray.depth ray.depth : outputcolor outputcolor + CastRay(ref lectionray) intersection.ref lective coef 21: end if 22: if ObjectIsRef ractive(intersection) then 23: ref ractionray Ref ractray(ray, intersection) 24: ref ractionray.depth ray.depth : outputcolor outputcolor + CastRay(ref ractionray) intersection.ref ractive coef 26: end if 27: return outputcolor 28: end function 29: 30: function COLOR(intersection) 31: color black 32: for each light l do 33: shadowray ShadowRay(intersection.intersectionP oint, l) 34: if IntersectRayW ithp rimitives(shadowray) = null then 35: color color + dif f usecomponent 36: color color + specularcomponent 37: end if 38: end for 39: return color 40: end function 6

17 The main difference between rasterization and ray tracing is the ability to calculate effects such as shadows, reflections and refractions. Whereas rasterization engines only try to simulate these effects through the use of texturing, ray tracing takes a more mathematical approach. Since these effects are highly geometry dependent, simulated methods look rather unrealistic when changing object position or viewer position. The difference in ability to simulate secondary visual effects between rasterization and ray tracing can be seen in Figure 2.2. Figure 2.2: Difference between rasterization and ray tracing (source: Intel). Ray-Tracing can better simulate soft shadows and reflections. This figure shows how 7

18 2.1.2 GPU Computing There are 2 options when it comes to GPU computing: CUDA[6] and OpenCL, we chose to work with CUDA since it provides a vast array of libraries to work with, has the best integration with modern IDE s and also provides the best profiling and debugging tools. Modern GPUs are what we call SIMD (Single Instruction Multiple Data) devices, meaning that they can perform the same operation on multiple data points simultaneously. Even though a modern CPU core can dispatch more operations per second than a GPU core, the GPU as a whole can vastly outperform a CPU by having thousands of cores running the same operations versus the 4-8 cores present in the CPU. The idea behind GPU accelerated computing is to use a CPU and GPU together in a heterogeneous co-processing computing model. The sequential part of the application runs on the CPU and the computationally intensive part is run by the GPU [7]. The massive parallelism of programmable GPUs naturally lends itself to inherently parallel problems such as ray tracing. A GPU accelerated application consists of 2 main components: a host and a device. The host refers to the CPU and the system s memory, while the device refers to the GPU and its memory. Code run on the host can manage memory on both the host and device, and also launches kernel functions which are executed in the device. A typical sequence of operations for a CUDA program consist of (Figure 2.3): 1. Declare and allocate host and initialize the host s data 2. Transfer data from host memory to device memory 3. Execute kernel functions 4. Transfer results from device memory to host memory A given kernel executes a single program across many parallel threads. Typically, each kernel completes execution before the next kernel begins, with an implicit barrier synchronization between kernels. GPU s have support for multiple, independent kernels to execute simultaneously, but many kernels are large enough to fill the entire device. Threads are decomposed into thread blocks; threads within a given block may efficiently synchronize with each other and have shared access to per-block on-chip memory[1]. Each thread also has access to private memory visible only to that thread. And, obviously, all threads have access to the same global memory (figure 2.4). 8

19 Figure 2.3: Example of a typical CUDA processing flow (source: NVIDIA) Figure 2.4: Memory hierarchy in CUDA (source: NVIDIA) 9

20 2.1.3 Acceleration Structures One of the major problems of ray tracing is the sheer number of intersection tests one must perform in order to check if a certain ray intersections an object of the scene. Acceleration structures help us reduce the number of intersections to test by organizing the geometry of the scene into a data structure that can easily be explored. The organization of acceleration structure is typically hierarchical, loosely meaning that the topmost level encloses the levels bellow it, and so on. Typically an acceleration structure is a preprocess of the rendering step. This is often because their construction is rather expensive and, for static scenes, it can be reused without any modifications. Despite their weight, applications tend to benefit through the use of acceleration structures as queries typically go from O(n) to O(log n). Historically, ray tracing research has concentrated mostly on the effectiveness of acceleration structures (i.e., reducing the number of primitive operations), and on the efficiency of the traversal and intersection operations). The time for building these data structures was then typically ignored, since it is usually insignificant in a non real-time application. Consequently, as ray tracing started to reach interactive rendering performance, build times could no longer be ignored. This situation opened up an entirely new research challenge for consideration in ray tracing: how to create build algorithms that were fast enough and flexible enough to be used at interactive frame rates [8]. The choice of data structure is greatly correlated with the kind of scene and animations we ll have in the application. According to Ingo Wald et al[8]. we can refer to each scene s animation as follows: Static - A scene whose geometry does not change at all from frame to frame. Partially static - A scene in which a certain amount of the primitives are moving, while other parts remain static, such as a few characters moving through an otherwise static game environment. Hierarchical - The scene s primitives can be partitioned into groups of primitives such that all of the primitives of a given group are subject to the same linear or rigid-body deformation. Semi-hierarchical - A scene that can be partitioned into objects whose motion is primarily hierarchical, plus some small amount of incoherent motion within each object. Deformable - Changes can occur to the vertices of a primitive but its connectivity is unchanged. Arbitrary - Arbitrary changes to the scene topology may occur. Despite this categorization, applications often use different kind of animations. In games, for example, most objects will have hierarchical animation. The motion of the objects themselfs may itself be an example of incoherent motion (e.g. characters appearing and disappearing). In addition, characters often have skeleton animations, providing a good example of semi-hierarchical motion. It is likely that no one technique will handle all kinds of motions equally well. As one would expect the choice of an acceleration structure strongly affects the traversal performance, and also can facilitate (or inhibit) the choice of certain algorithms for updating or rebuilding the acceleration structure. At the current state-of-the-art, grids[9], kd-trees[10][11][12], and BVHs[13][1][2] all feature fast construction and traversal algorithms that support animated scenes. All of them make suitable candidates and for this reason we will briefly look into each one and discuss both their advantages and disadvantages in order to justify our choice in using a BVH. 10

21 2.1.4 Kd-Trees In its basis, a kd-tree is a binary tree that partitions space along a series of planes. Each non-leaf node of the tree corresponds to a plane, the points on the left of the plane are sent to the left subtree and the points to the right of the plane are sent to the right subtree. A visual representation of this can be seen in Figure 2.5. Kd-trees are considered by many to be the best known method to accelerate ray tracing of static scenes as they have the fastest reported times for primary and shadow rays. If we wish to find the first intersection point along a ray, the problem is the simplest of all space partitioning data structures. Each volume of space is represented just once, so the traversal algorithm can traverse these voxels in strict front-to-back order, and can perform an early exit as soon as any intersection is found. Their primary disadvantage for dynamic scenes is that rebuilding or updating the structure each frame is a very costly process. Work has been done towards developing and researching methods that allow a quick rebuild of kd-trees in a parallelized manner[10][11][12]. However, since this is a relatively recent problem these methods are still somewhat rudimentary and lacking performance. If memory usage by the acceleration structure is not a problem, then a kd-tree is often the most effective solution for ray tracing of large scenes. The problem for us is that GPU memory still isn t an abundant resource and so kd-trees based ray tracers suffer by consuming large amounts of memory. During the construction of a kd-tree each plane is placed on top of a median point in the scene, most often than not splitting objects and primitives. This splitting causes an increase in memory usage since now each primitive must be referenced in 2 or more different cells. And the more finely a scene is subdivided more memory the kd-tree will use. Figure 2.5: A visual representation of a kd-tree. P1 through P4 represent planes of space division. 11

22 2.1.5 Uniform Grids Grids are the simpler form of acceleration structures, the concept behind a grid is that space is divided into evenly sized blocks. While both kd-trees and BVHs are hierarchical structures, grids fall into the category of uniform spatial subdivision. Grids are very easy to build as it s simply a matter of directly calculating in which cell a point will fall into. This makes them very suitable to animated scenes as they can be quickly rebuilt every frame despite the kinds of animations present in the scene. The construction process can also be easily parallelized since there are no hierarchical constrains when it comes to filling in the cells. Memory bandwidth proves to be a major bottleneck in parallel grid rebuilds as shown by Ize et al.. The matter of fact is that when using grids we have a low amount of computation performed and a high amount of data that needs to be processed[9]. By having a high bandwidth usage and low computation we are forfeiting all the advantages we gain by making use of a GPU. And, much like kd-trees, primitives end up falling into multiple cells as their vertices are divided in the construction process. A visual representation of an uniform grid can be seen in figure 2.6. Figure 2.6: Five rays traversing a grid Bounding Volume Hierarchies Bounding Volume Hierarchies are a tree-like structure that subdivide a scene into smaller segments. While in both kd-trees and grids spatially divides the scene, a Bounding Volume Hierarchy divides a scene around its objects. All objects are wrapped into individual Bounding Volumes that form the leaf nodes of the tree. Bounding Volumes are then recursively enclosed together by more embracing Bounding Volumes until we are left with a single Bounding Volume that wraps around the entire scene. Bounding Volume Hierarchies have long been ignored in real-time ray tracing as they were once thought to be unable to compete with kd-trees and grids in respect to render time, mainly due to slower traversal speeds. However, given continuous research[13][1][2] and their advantages with respect to dynamic scenes they now make a viable alternative to other acceleration structures. In a typical object hierarchy data structure, it s easy to update the data structure as an object moves because the object lives in just one node and the bounds for that node can be updated with relatively simple and localized update operations. For deformable scenes, just re-fitting a Bounding Volume Hierarchy i.e., recomputing the hierarchy node s bounding volumes, but not changing the hierarchy itself is sufficient to produce a valid Bounding Volume Hierarchy for the new frame. Bounding Volume Hierarchies also allow 12

23 for incremental changes to the hierarchy[8]. In space subdivision each primitive can be referenced from multiple cells, resulting in higher memory usage than one would ideally like. But in Bounding Volume Hierarchies, primitives are only referenced exactly once, allowing us to save GPU memory and bandwidth. Spatial subdivision often leads to visiting the same object several times along the ray, which cannot happen with an object hierarchy. The same is true for empty cells that frequently occur in spatial subdivision, but simply do not exist in object hierarchies. The effectiveness of a Bounding Volume Hierarchy depends on what actual hierarchy the build algorithm produces. Like for kd-trees, best results seem to be achieved using top-down, surface area heuristics-based builds. 1 Figure 2.7: A visual representation of a Bounding Volume Hierarchy. 1 The SAH provides a method for estimating the cost of a structure for ray tracing. It assumes an infinite uniformly distributed rays, in which the probability of a ray hitting a (convex) volume V is proportional to the surface area of that volume 13

24 2.2 State of the art Parallel Construction of Bounding Volume Hierarchies Generating an acceleration structure is a necessary step when trying to achieve real-time performances. One can take two approaches: To construct a new structure every frame or to reuse the same structure and adjust it at each frame. The latter option has been recently explored [14] but still seems to be lacking performance when processing large trees or when large modifications are needed to the data structure. Most approaches [15] that explore a construction of new hierarchy at each frame tended to rely on serial algorithms running on the CPU to construct the necessary hierarchical acceleration structures. While once a necessary restriction due to architectural limitations, modern GPUs provide all the facilities necessary to implement hierarchy construction directly. Doing so should provide a strong benefit, as building structures directly on the GPU avoids the need for the relatively expensive latency introduced by frequently copying data structures between CPU and GPU memory spaces. The first to explore such method was Lauterbach et al.[1]. Lauterbach introduced a novel algorithm using spatial Morton codes[16] to reduce the construction of BVH to a simple sorting problem. Morton codes are used to determine a primitive s order along a space filling curve. They can be computed directly from a primitive s geometric coordinates. Lauterbach expects each input primitive to be represented by an AABB 2 and that the enclosing AABB of the entire input geometry is known. By taking the barycenter of each primitive s AABB as its representative point and by quantizing each of the 3 coordinates of the each representative points into k-bit integers, a 3k-bit Morton code is constructed by interleaving the successive bits of these quantized coordinates. Figure 2.8 shows a 2D representation of this. Sorting the Morton codes will automatically lay the associated points in order along a Morton curve. It will also order the corresponding primitives in a spatially coherent way. Because of this, sorting geometric primitives according to their Morton codes has been used to improve cache coherence as a ray that hits a certain primitive will likely hit the primitive adjacent to it. The main problem of Lauterbach s work and others[15][17] is that in order to build the resulting BVH a series of sequential steps must be taken. Lauterbach et al create their BVH by sequentially observing each bit of each Morton code and grouping the primitives according to the value of the bit. At each level if said bit has value 0 then the primitive it is placed in a group, if the bit has value 1 then it is placed in the opposite group. Garanzha et al. takes a different approach by generate one level of nodes at a time, starting from the root. They process the nodes of the BVH on a given level in parallel. For this they use binary search to partition the primitives contained within each node. The resulting child nodes are then enumerated using an atomic counter, and subsequently process them on the next round. Karras[2] further developed this line of though by proving a way to build a BVH in a total parallel manner. Karras introduces an in-place algorithm for constructing a binary radix tree (also called a Patricia tree) which can directly be converted into a BVH. 2 AABB - Axis-aligned minimum bounding box 14

25 He bases his approach on the fact that for a scene with N primitives we know we can make a Patricia tree with N 1 internal nodes to represent it. He analyzes the similarities between each Morton code and its neighbours determine the position of each internal node in the Patricia Tree. The child-parent association is then calculated based on the range of same-value-bits with the rest of the Morton codes. This will explained in further detail in Chapter 3. Figure 2.8: Example 2-D Morton code ordering of triangles with the first two levels of the hierarchy. Blue and red bits indicate x and y axes, respectively. Source: [1] Random-Accessible Compressed Bounding Volume Hierarchies An aggravating trend that s been appearing is that the growth rate of the data access speed is significantly slower than that of processing speed. This often makes an application to have to wait a considerable amount of time for data to arrive. Several approaches have been developed to address this problem. Unfortunately most of them make it so that once a BVH has been compressed it can only be accessed/traversed by first decompressing it in its entirety. Kim et al.[18] developed a novel compact representation of a BVH that makes it possible to randomly access a BVH. The authors create a cluster based layout-preserving BVH compression and decompression method. During the compression, consecutive BVs are decomposed into a set of clusters, each of which will serve as an access point for random access at runtime. Despite being able to achieve good compression results (compression factors up to 12:1) we felt this approach wasn t suitable for a real time ray tracing application as the compression mechanism is a very 15

26 heavy and time consuming process. Adding this compression time on top of the construction of the BVH and the intersection tests on each frame would result in a very low frame rate. 16

27 17

28 Chapter 3 Approach and Architecture In order to achieve a high frame rate one must reduce the time it takes to generate an image at each frame. For this we make use of a GPU to help us accelerate all the floating point operations required by the intersection process of a ray-tracer. We aim to reduce the memory bandwidth and memory footprint of our application. We make use of research made in the area of parallel BVH construction in GPU to naturally reduce the number of intersection tests performed (thus, reducing memory bandwidth as well). We then explore BVH and triangle meshes compression techniques. With a compressed BVH and triangle mesh we can expect our rendering kernels to be able to achieve lesser rendering times as the data transfered between the GPU s global memory and the kernel s local memory will be smaller. All of this, which define our proposed solution model, will be addressed in further detail in the following sections. Before we delve into the contents of the sections below we must note that we describe in detail only part of the application s algorithm, more specifically what happens inside the GPU s kernels. As referenced in section our application consists of both Host and Device code. The Host code was written in C++ for increased performance. It deals with OpenGL setup and is responsible at reach frame for displaying the image generated by the ray tracing kernels. In order to achieve a higher performance level we had our kernels write the resulting color of the ray tracing process of each pixel into a texture located in the GPU s Texture Memory. This texture is then mapped into a simple quad for rendering. In order to create an interactive application the camera controls are treated on the Host code. A new position is calculated and sent to the rendering kernels at each frame. The Host code is also responsible for loading the scene s primitives into the GPU s global memory at startup. The Device code is therefore only responsible for building a new BVH, perform the necessary intersection tests, and, generate a new image at each frame. 18

29 3.1 Construction Of The BVH Binary Radix Tree Properties A radix tree is a space-optimized trie often used for indexing string data, although it can also be used to index any data divisible in smaller comparable chunks such as characters, binary numbers, etc. For simplicity, assume that from now on our radix tree only contains binary values and that they are in a lexicographical order as in our application each key will correspond to a sorted Morton code. Given a set of keys k 0,..., k n 1 represented as bits, a radix tree can be seen as a hierarchical representation of the common bits of each key. The keys are represented in the leaf nodes of the tree, and each internal node corresponds to the longest common prefix shared between the keys in that subtree. Assume the example referenced in Figure 3.1. As one would expect the root of the tree covers the full range of keys. At each level the keys are partitioned according to their first differing bit. The first difference occurs between keys k 3 and k 4, thus, the left child of the root node contains keys k 0 through k 3 and the right child contains k 4 through k 7. We continue this process to essentially get a hierarchical representation of the common prefixes between each key. At the bottom level of the tree we will find that each child references a key. Figure 3.1: A visual representation of an ordered radix tree. The numbers 0-7 act as keys (and such as leaf nodes) of the tree. Each internal node represents a common range prefix of the binary value of all the leaf nodes bellow it. Notice we have N 1 internal nodes for N keys. 19

30 A radix tree is considered a compact data structure as it omits nodes which only have one child, thus removing redundant information and decreasing the overall size of the tree in memory. One property of a binary radix tree is that any given tree with N keys will have n 1 internal nodes. This allows us to know even before we construct the tree how many nodes, and thus how much memory, we will require. Assuming a lexicographically ordered tree allows us to express [i, j] as the range of keys covered by any given internal node. We use δ(i, j) to denote the length of the longest common prefix between the keys k i and k j. The ordering of the keys automatically implies that δ(i, j ) δ(i, j ) for any i, j [i, j] [2]. We can then determine the common prefix shared between key under a given node by comparing the first and last key it covers - all the other keys in between are guaranteed to share the same or a larger prefix. In practice each internal node partitions the keys under it on their first differing bit, the one following δ(i, j). We can safely assume that in the range [k i, k j ] part of the keys will have said bit set to 0 and others will have it set to 1. Since we are working with an ordered tree all of the keys with the bit set to 0 will be presented before the keys with the bit set to 1. We call the position of the last key where this bit has value 0 the split position, denoted by γ [i, j 1]. Since the split position is where the first bit differs between the keys in the range we can say that δ(γ, γ + 1) = δ(i, j). The ranges [k i, k γ ] and [k γ+1, k j ] will be subdivided at the next differing bit. We can thus say for sure is that δ(i, γ) > δ(i, j) and δ(γ + 1, j) > δ(i, j). Taking Figure 3.1 once more as reference, we can see that the first differing bit happens between keys k 3 and k 4, the range is then split at γ = 3 resulting in the subranges [k 0, k 3 ] and [k 4, k 7 ]. The left child then splits its range [k 0, k 3 ] at the third bit, γ = 1. The right child splits the range [k 4, k 7 ] at γ = 4, at the second bit, and so on Morton Codes We start our process with a series of primitives represented by 3D points. In order to generate our BVH we first start by deciding in which order each leaf node will be represented in the tree. A good approach is to sort the leafs according to their position, generally we will want primitives close to each other in 3D space to appear close to each other on the tree. In order to do this we sort them via a spacefilling curve, more specifically a Z-order curve. For this we take the centroid of each triangle (Figure 3.2) and express it relative to the bounding volume of the entire scene, known after loading the scene s data in the Host code. Let bvh min be defined as the minimum point of the scene s bounding volume and bvh max as its maximum. If we define c as the centroid point of a given triangle then we can express q, the same point but now in coordinates relative to the bounding volume of the scene through the following formula: c.x bvh min.x q = ( bvh min.x bvh max.x, c.y bvh min.y bvh min.y bvh max.y, c.z bvh min.z bvh min.z bvh max.z ) (3.1) q s coordinate values now vary between 0 and 1. We can think of it as a point inside the 3D space delimited by the scene s bounding volume. The closer each coordinate is to 0 the closer the point will be to the minimum point of the bounding volume, and, the closer it is to 1 the closer it will be to its maximum 20

31 Figure 3.2: Centroid of a triangle. point. We now want to express this 3D point as a Morton code. The first step in this process is to transform our point from a continuous space into a discrete one. We achieve this by quantizing each floating point coordinate into a range created by the difference between the scene s bounding volume maximum and minimum extremes. Let 2 p represent the precision of our quantization, where p stands for the number of bits we have available to represent each integer coordinate: q = ( c.x bvh min.x bvh min.x bvh max.x, 2 p c.y bvh min.y bvh min.y bvh max.y, 2 p c.z bvh min.z bvh min.z bvh max.z ) (3.2) 2 p Morton codes are best expressed as a single integer so to represent a 3D point in a single integer value we will have to make some precision sacrifices. We assume an architecture of 32 bit sized integers, meaning that in order to represent 3 distinct values in 32 bits we will have 10 bits for each of the 3 Cartesian coordinates. equation: Fixing the value of p at 10, we are then left with the following quantization q = ( (c.x bvh min.x) 1024 bvh min.x bvh max.x, (c.y bvh min.y) 1024 bvh min.y bvh max.y, (c.z bvh min.z) 1024 bvh min.z bvh max.z ) (3.3) Where q is now a 3D point who s coordinates are composed of 10 bit integers. To correctly represent this point along a Z-order curve we interleave the bits of all three coordinates together to form a single binary number. We take the value of each coordinate, expand the bits by inserting 2 bit gaps after each bit and then criss-cross them as shown in Figure 3.3. Beyond this point it s simply a matter of calculation the Morton codes for each primitive and sort them via a parallel sorting algorithm, we used radix sort for this Parallel Construction of BVH Now that we have a set of ordered keys we can begin the process of building the hierarchy. This section follow the same line of thought described by Karras in [2]. A naive way of building the BVH is to start at the root, find the first differing bit, create the child nodes, and repeat the process for each child recursively. This approach lack by being very focused and inherently sequential, we would be forced to stand idle as a single thread first processed the root node, then 2 threads processed each of its children 21

32 Figure 3.3: Interweaving of bits in order to create a Morton coder. and so on. This would be a waste of time and GPU computing power. Ideally we should make use of as much information as we possibly can. We know for a fact (see section 3.1.1) that for any tree with n keys we will have n 1 internal nodes, its just a matter of finding out how these nodes connect amongst themselves and to to the leafs. The idea is to assign indexes for the internal nodes in a way that enables finding their children without depending on earlier results, this way we can fully parallelize the construction of the BVH. We create 2 separate arrays to store our nodes, L for the leaf nodes and I for the internal nodes. We define the layout of I as having the root node at the index 0, denoted by I 0. The indexes of its children are assigned according to its respective split position. In fact, the position of each internal node will be defined by its split position. For any given node the left child will be defined in I γ if it covers more than one key and at L γ if it doesn t. Similarly the right child will be located either at I γ+1 or at L γ+1. Assuming this layout has an important property, the index of every internal node will either coincide to the first or the last key it covers. Take for example the root node, it covers the entire range of keys [0, n 1] and is located at position I 0. Let γ 0 be the first split position, since the left child covers the range [0, γ 0 ] it will automatically have to be located at I γ0 since I 0 is already taken by the root. More generally speaking if a node covers the range [i, j], then its left child will be located at the end of the range [i, γ] and its right child located at the beginning of the range [γ + 1, j]. In Figure 3.4 we present a visual example of this property. Node 0 covers the entire range [k 0, k 7 ] and is therefore located at I 0. Its children, node 3 and 4 cover the ranges [k 0, k 3 ] and [k 4, k 7 ] and are placed at I 3 and I 4, respectively. Node s 3 children in turn cover the keys [k 0, k 1 ] and [k 2, k 3 ] so we place them at I 1 and I 2. Node 4 is a little bit different, while its right child covers the range [k 5, k 7 ] and is placed at I 5, its left child would only cover a single key, k 4, we avoid creating an internal node for this and instead reference directly a leaf node. The same happens for node 5 right child. Nodes 1, 2 and 6 all reference 2 leaf nodes. Interestingly, this process will never result in gaps or duplicates when populating the internal node array. An advantage of using this scheme is that each internal node is conveniently placed next to a sibling node in memory. And since the internal nodes run from 0 to n 1 we can use them to directly address the nodes in memory. 22

33 Figure 3.4: Bar representation of the keys covered by each internal node. In order to construct the BVH, we need to know more than simply what nodes cover which keys, we need to know how the nodes connect amongst themselves, i.e. the parent-child relations. We start by analyzing each internal node I i in parallel. Based in our numbering we know for sure that its range will cover at least the key k i and either the keys to the left, [k 0, k i 1 ], or the keys to the right, [k i+1, k n 1 ]. We also know that since each node covers at least 2 keys either key k i 1 or key k i+1 will belong to the same node.the other key will belong to a neighboring node. Lets say we want to determine the direction in which the range of keys covered by node I i extends in. We call this direction d. In order to determine d we compare the length of the common prefix between the keys k i 1, k i, and k i+1. If key k i 1 shares a larger common bit prefix with key k i then we say that the range extends from k i to the left and that d = 1. If on the other hand k i and k i+1 both share a larger common prefix then we say that the range extends from k i to the right and that d = +1. In other words, if δ(i 1, i) > δ(i, i + 1) then d = 1. And if δ(i 1, i) < δ(i, i + 1), then d = +1. Knowing this we can say that k i and k i+d both belong to node I i, and that k i d belongs to node I i d. At this point we know the starting point of our range, k i, and we know in which direction it extends in, d. We do not know yet when it stops, i.e., how far it extents. Since k i d does not belong to the range of keys covered by node I i, we can safely assume that the keys which are share amongst themselves a larger common prefix than k i and k i d do. We call this common prefix δ min so that δ min = δ(i, d 1). δ min is in fact by definition the lower bound value under which all keys contained in the range covered by I i have a higher common prefix length amongst themselves. In other words δ(i, j) > δ min for any k i belonging to I i. This being said all we need to do to find the other end of the range is to search for the 23

34 largest l that satisfies the equation δ(i, i + ld) > δ min. The fastest way to do this is to start at k i and in powers of 2 increase the value of l until it no longer satisfies δ(i, i + ld) > δ min. Once this happens we know that we have gone too far and we have left the range of keys covered by I i. Lets call this upper bound l max. We know for sure the correct value of l is somewhere in the range [l max /2, l max 1]. Now it s only a matter of using a binary search to find the value of l under which δ(i, d(l + 1) + i) δ min. After finding the value of l we can use j = i + ld to specify the other end of the range. Lets call δ node the length of the common prefix shared between k i and k j, given by δ(i, j). We use δ node to search the split position γ that partitions the keys covered by I i. What we do now is perform a search for the largest s [0, l 1] that satisfies δ(i, i + sd) > δ node. I.e., we need to find the furthest key which shares a larger prefix with k i than k j does. Discovering γ allows us to determine the ranges covered by each children. The left child will have a range covering [min(i, j]), γ] and the right child will cover [γ + 1, max(i, j)]. For the final step we analyze the values of i, j and γ. If i = γ we know I i s left child is the leaf node L γ, otherwise it s the internal node I γ. Correspondingly if j = γ + 1 we say I i s right child is the leaf node L γ+1, otherwise it s internal node I γ+1. Pseudocode for this algorithm can be found in Figure 3.5. Figure 3.5: Pseudocode for the parallel construction of a BVH (source:[2] ). 24

35 3.1.4 Calculating Bounding Volumes Traditional methods for BVHs calculate the bounding boxes sequentially in a bottom-up fashion They rely on the fact that the bounding boxes of the level below are known a priori. This method can be partially parallelized by processing the nodes of each level at the same time but this doesn t erase the fact that a synchronization layer must exist between each level. An algorithm of this sort is expect to have a time complexity of O(n log n). Karras dictates there is a better way. We can start a thread from each leaf node and walk up the tree using parent pointers recorded during the radix tree construction. Each internal node maintains an atomic counter to track how many threads it has been visited by. The first thread to reach the node terminates itself while the second thread continues up the tree. This way we are guaranteed we always have the bounding box of both children before calculating the bounding box of the current node. Using this method each node is processed exactly once which leads to O(n) time complexity. The result is now a fully functioning BVH we can use to perform intersection tests. See Figure 3.6 for a visualization of a BVH constructed around Stanford s Bunny. Figure 3.6: Resulting BVH created in parallel around Stanford s Bunny. 25

36 3.2 Compression Reducing memory footprint and bandwidth has been our goal with this application. A certain amount of bandwidth is automatically reduced by the simple use of a BVH. Using a brute-force approach in our application would mean to test every single ray against every single triangle, resulting in fetching the entire scene from global memory at each frame. Through the use of a BVH we are on one hand reducing the amount of information we fetch from global memory by only fetching small segments of the BVH at each time until we reach a small number of primitives. On the other hand, a BVH is an acceleration structure, and they take up memory space too. By reducing memory bandwidth we have increased memory footprint. We intend to lessen this effect by doing post-processing on the BVH to reduce its overall size. It should also be noted that through the use of a BVH we are not simply reducing memory bandwidth. We are also drastically reducing the number ray-triangle intersection tests, replacing them with ray-box intersection tests which are much simpler. Pseudocode for our algorithm can be found in Algorithms 2 and 3. We will explain BVH compression and mesh compression in detail in the following sections. Algorithm 2 Our Ray Tracing Algorithm: Part 1 1: while true do 2: mortoncodes GenerateAndSortM ortoncodes() 3: bvh CreateBV H(mortonCodes) 4: bvh CompressBV H(bvh) 5: for each pixel p do 6: ray Ray() 7: ray.origin camera.origin 8: ray.direction camera.direction 9: ray.depth 0 10: pixelcolor CastRay(bvh, ray) 11: end for 12: end while 13: 14: function COLOR(bvh, intersection) 15: color black 16: for each light l do 17: shadowray ShadowRay(intersection.intersectionP oint, l) 18: if IntersectRayW ithbv H(bvh, shadowray) = null then 19: color color + dif f usecomponent 20: color color + specularcomponent 21: end if 22: end for 23: return color 24: end function 26

37 Algorithm 3 Our Ray Tracing Algorithm: Part 2 1: function CASTRAY(bvh, ray) 2: if ray.depth > MAX DEP T H then 3: return backgroundcolor 4: end if 5: intersection IntersectRayW ithbv H(bvh, ray) 6: if intersection = null then 7: return backgroundcolor 8: end if 9: outputcolor Color(bvh, intersection) 10: if ObjectIsRef lective(intersection) then 11: ref lectionray Ref lectray(ray, intersection) 12: ref lectionray.depth ray.depth : outputcolor outputcolor + CastRay(bvh, ref lectionray) 14: end if 15: if ObjectIsRef ractive(intersection) then 16: ref ractionray Ref ractray(ray, intersection) 17: ref ractionray.depth ray.depth : outputcolor outputcolor + CastRay(bvh, ref ractionray) 19: end if 20: return outputcolor 21: end function 22: 23: function INTERSECTRAYWITHBVH(bvh, ray) 24: intersections null 25: if IntersectRayW ithbox(ray, node.boundingbox) then 26: if IsLeaf N ode(node.lef tchild) then 27: intersections intersections+intersectrayw ithp rimitives(ray, node.lef tchild.primitives) 28: else 29: intersections intersections + IntersectRayW ithbv H(node.lef tchild, ray) 30: end if 31: if IsLeaf N ode(node.rightchild) then 32: intersections intersections+intersectrayw ithp rimitives(ray, node.rightchild.primitives) 33: else 34: intersections intersections + IntersectRayW ithbv H(node.rightChild, ray) 35: end if 36: end if 37: return intersections 38: end function 27

38 3.2.1 BVH Compression Since we rebuild our BVH at every frame each instance only lives for a few milliseconds. It doesn t make sense for us to make use of heavy compression algorithm, mainly target at compressing persistent BVH s. We thus chose to make a simple use of quantization in order to compress our BVH. We take a fully parallelized approach in this process. We start a thread at each internal node and make our way up to the top of the tree. Once we reach the top of the tree we start making the same path in the opposite direction, that is, from the root node to the internal node in question. As we descend each level we take the bounding box of the previous parent and subdivide each dimension in 1024 segments. We treat this 1024x1024x1024 grid as a voxel space and map the minimum and maximum points of the current level s node bounding box into the nearest corresponding voxels. It becomes clear we lose some precision as we descend each level since we are going from floating point coordinates into integer indexes. Each index will be contained in the range [0, 1024[, this is so we can store 3 of these indexes in a single 32 bit integer. Each bounding box can then be stored as a single 2 integer data structure instead of a 6 float data structure, reducing its size down from 24 bytes to 8 bytes. It is obvious from this method that out BVH will become more loosely coupled as we need to perform roundings when going from a continuous space (world coordinates) into a discrete one (voxel indexes). We round up when mapping the maximum point of a bounding volume to a voxel and round down when mapping the minimum so out bounding volumes remain coherent. We predict this loss of precision won t have a big impact in our intersection tests, we will most likely intersect a few more bounding volumes as they will be slightly larger in size but we don t expect a significant increase in the number of tests. One thing to take into account is that in each thread we must quantize the entire path from the root to each internal node. If we simply quantize a node relative to the world coordinates of the parent s bounding box we ll find mismatches on the true size of the bounding volume. This effect will have an impact on the rendered image as it can make us dismiss ray-triangle intersection tests. It can be seen in Figure 3.7. This happens because at each level of the tree we slightly change the position of the maximum and minimum points that define the bounding volume of that node. These changes in position accumulate over the levels so we have to take into account when quantizing the bounding volume of a node that the parent s bounding volume doesn t correspond to the original world position. When traversing the BVH we will need to make inverse quantization of each node s bounding volume to perform the ray-box intersection tests Mesh Compression Karras[2] uses each leaf node as a redirect to a primitive. We can remove an indirection level by having each leaf node envelop a single triangle and store said triangle in the node itself. We can also try and diminish the number of memory necessary to store a triangle by following the same line of thought as in section By quantizing the position of each vertex of each triangle into the bounding box that encapsulates it, we can go from having to store 9 floats to just 3 integers. This will of course have an impact on the model itself as we will be losing the original coordinates of each vertex. When 28

39 Figure 3.7: Ray-box intersection errors caused by not using a quantized parent bounding box. recalculating each vertex back to world coordinates we will be changing the final world positions of each triangle, resulting in slight mesh deformation. Interestingly enough, this effect isn t noticeable unless examining models very up close (see Figure 3.8). In a practical application like a game or simulation where the camera and the objects are constantly moving this effect could pass up unnoticed. This compression step can easily be done in the same kernel as the quantization the internal nodes of the tree, thus having little to no effect on the post-processing time of the BVH. 3.3 Rendering Rendering is simply a process of creating primary rays for the camera and check for intersections with the BVH. We used the algorithm proposed by Williams et al. in [19] to perform our ray-box intersections. Once we found we intersected a leaf node we used the method described by Moller et al. in [20] to perform ray-triangle intersections with the triangle contained in the leaf node. Secondary rays that resulted from intersections with objects are also checked for intersections with the BVH instead of the primitives directly. Phong shading and Lambertian reflectance were the methods used in shading. 29

40 Figure 3.8: Mesh deformation caused by quantization of triangles vertices. Notice the small black dots between triangles. Image is a closeup of the tip of the ear of Stanford s Bunny. Concluding Remarks In this chapter we presented the algorithm we used in our ray tracing application. We use a novel algorithm to build a new BVH in a parallel manner at each frame every frame. We then performed a series of optimizations and post-processing steps in hopes of increasing the performance of our ray tracing algorithm. We hope to achieve this gain in performance by reducing the number of global memory accesses to the GPU. In the next chapter we perform a series of benchmarks in order to measure the success of our approach against a vanilla version of Karras work[2]. 30

41 31

42 Chapter 4 Evaluation 4.1 Evaluation Methodology We implemented our algorithm in OpenGL/C++ and CUDA/C++. We also implemented the algorithm described by Karras [2] and benchmarked it against our own. Our tests then compare: No compression (Karras), internal node compression, and internal node compression together with mesh compression. Testes were made using an NVIDIA GeForce GTX 970 with 4 GB of RAM. Since our algorithm runs mainly in the GPU the CPU and system memory have no impact on the test results. We chose to render our images in a 1280x720p as this is one of the more commonly used resolutions for multimedia content. We measure the amount of intersection tests performed, namely ray-box and ray-triangle, in order to determine how the loss of precision of the bounding volumes impacts the number of intersection tests needed. We also measure the amount of time spent in each section of the algorithm. We then proceed to measure what the impact of our algorithm is in the usage of GPU memory and bandwidth consumption. 4.2 Experimental Scenes We tested our algorithm with three different scenes, ASIAN DRAGON, CORNELL and STANFORD BUNNY. CORNELL (Figure 4.1, 152 triangles) is a scene representative of highly reflective and refractive scenes. It also represents a low polygon scene. It consists of an icosphere surrounded by five mirrors and a glass prism. This scene focuses on testing performance gains in scenes with a high number of secondary rays. The STANFORD BUNNY scene (Figure 4.2, 70K triangles) is what we would call a medium sized scenes. It serves as a midterm between the very low polygon scenes (CORNELL) and the very high polygon scenes (ASIAN DRAGON). This scene also only features eye and shadow rays. ASIAN DRAGON (Figure 4.3, 7.2 M triangles ) is representative of complex objects with a high number of triangles. It s a dense model with a great number of triangles in a small space. Our intent with this 32

43 scene is to measure the performance gains of BVH compression for very dense BVHs. This scene only features eye and shadow rays. 4.3 Hypothesis We hypothesized our compression of the BVH will have an impact on many different levels. On one side we expect to have a greater number of ray-box intersections as our bounding volumes will be slightly larger. We also expect ray-triangle intersection to increase, although to a lesser extend. Memory footprint should be significantly reduced, as well as the amount of data fetched from memory. A direct result of the compression. Altogether we expect to have more, but faster, intersection tests as fewer bytes will be fetched from global memory in each test. Overall we also expect to see an increase in the number of frames per seconds Figure 4.1: CORNELL test scene. 33

44 Figure 4.2: STANFORD BUNNY test scene. Figure 4.3: ASIAN DRAGON test scene. 34

45 4.4 Experimental Results and Analysis Number of Ray-Box Intersection Tests In this section we measure the number of ray-box intersection test in each frame. As expected both the node compression and the node+mesh compression variations of our algorithm present the same number of intersections. Cornell Number of Ray- Box Intersec5ons Millions Karras Node Compression Node + Mesh Compression Figure 4.4: Number of ray-box intersection tests in CORNELL. Despite being a small, enclosed scene we notice that lack of precision of the bounding volume areas causes an increase of 9% in intersection tests. We assume this is likely caused by reflected and refracted rays that would normally pass tangent to the icosphere but are instead tested and categorized as a hit. Stanford Bunny Number of Ray- Box Intersec5ons Millions Karras Node Compression Node + Mesh Compression Figure 4.5: Number of ray-box intersection tests in STANFORD BUNNY. In STANDFORD BUNNY we notice an increase of 7.1% in the number of ray-box intersection tests, making it the less affected of all the 3 scenes. 35

46 Asian Dragon Number of Ray- Box Intersec5ons Millions Karras Node Compression Node + Mesh Compression Figure 4.6: Number of ray-box intersection tests in ASIAN DRAGON. In ASIAN DRAGON we notice an increase of 12.38% in the number of ray-box intersection tests. Since this scene has a high tree we expect rounding errors to accumulate over the levels and lead to this increase in the number of intersection tests Number of Ray-Triangle Intersection Tests In this section we measure the number of ray-triangle intersections. This number is a direct impact of the loss of precision of the bounding volumes. Since the more leaf bounding volumes are intersected, the more ray-triangle intersections we have to perform. As expected both variations of our algorithm will present the same number of tests. Cornell Number of Ray- Triangle Intersec7ons Millions Karras Node Compression Node + Mesh Compression Figure 4.7: Number of ray-triangle intersection tests in CORNELL. CORNELL has an increase of 11.1% tests performed. We believe this number is greater in comparison to the other test scenes because in this scene almost every ray will hit an object and thus will have to traverse the entire BVH, making it more prone to the effects of loss of precision of the bounding volumes. 36

47 Stanford Bunny Number of Ray- Triangle Intersec7ons Millions Karras Node Compression Node + Mesh Compression Figure 4.8: Number of ray-triangle intersection tests in STANFORD BUNNY. In STANFORD BUNNY we notice an increase of 7.4% in the number of tests performed. This increase in the number of ray-triangle test seems to be similar to the increase of ray-box intersection tests. Asian Dragon Number of Ray- Triangle Intersec7ons Millions Karras Node Compression Node + Mesh Compression Figure 4.9: Number of ray-triangle intersection tests in ASIAN DRAGON. ASIAN DRAGON has an increase of 10.75% tests performed. As in STANFORD BUNNY this number seems to go in hand with the increase of ray-box intersectiont tests, although slighty lower. 37

48 4.4.3 Global Memory Reads In this section we measure the total number of bytes read from global memory at each frame from intersection tests alone. We base our calculation in the number of intersection tests measured and the following formula: T otalm emoryread = numrb size(internaln ode)+numrt (size(leaf N ode)+size(triangle)) (4.1) The following results are based in the number of intersection tests performed in each case and the size of each internal node, leaf node and primitive. Reference to Appendix A for a list of relevant data structures. numrb stands for the number of ray-box intersection tests. numrt stands for the number of ray-triangle intersection tests. Cache hits are not taken into account. Our algorithm benefits best from them as each entity takes up less memory space compared to Karra s. Cornell MB Karras Node Compression Node + Mesh Compression Figure 4.10: Estimated memory read during intersection tests in CORNELL in each frame. In CORNELL we notice we are able to achieve a significant impact in the amount of memory we read when we implement node compression (14.5%), but we see little to no gains when implementing mesh compression. This is because in this scene the number of ray-box intersection dwarfs the number of ray-triangle intersection, hence optimizations made to the execution time of ray-triangle tests will have little impact. 38

49 Stanford Bunny MB Karras Node Compression Node + Mesh Compression Figure 4.11: Estimated memory read during intersection tests in STANFORD BUNNY in each frame. STANFORD BUNNY reduces its memory accesses by 14% when compressing internal nodes alone and 22% when compressing the mesh alike. These results go in hand with the results obtained in ASIAN DRAGON providing some insight that every model, be it high or low poly will be positively affected by our improvements. Asian Dragon MB Karras Node Compression Node + Mesh Compression Figure 4.12: Estimated memory read during intersection tests in ASIAN DRAGON in each frame. In ASIAN DRAGON we see a steady decrease of accessed memory as we implement our changes. We reduce memory accesses by 11.9% when using node compression and, by 20.3% when adding mesh compression 39

50 4.4.4 Bounding Volume Hierarchy Total Size In this section we measure the total memory occupied by both the BVH and the scene s primitives. The following table displays the total memory each structure occupies: STRUCTURE SIZE Karras Node Compression Node + Mesh Compression Data type bytes bytes bytes Internal Node Leaf Node Triangle (encoded in leaf node) 0 Table 4.1: Size of internal node, leaf node and triangle in each algorithm. Remember that for N triangles we will have a BVH with N 1 internal nodes and N leaf nodes. Cornell In CORNELL we calculate the BVH and primitives occupy Kilobytes in Karra s approach. In our node compression variation we calculate 9.71 Kilobytes, and in our node and mesh compression variation 6.06 Kilobytes. Stanford Bunny In Karra s algorithm we calculate STANFORD BUNNY to have a total of 5.57 Megabytes. Our approach with node compression comes in at 4.46 Megabytes. And finally our approach with node and mesh compression has a total size of 2.79 Megabytes. Asian Dragon Karra s approach to ASIAN DRAGON uses an estimated Megabytes. Our approaches, node compression and node plus mesh compression both use Megabytes and Megabytes, respectively. We are thus capable of reducing memory footprint to 50% of its original value. 40

51 4.4.5 Kernel Execution Time In this section we measure the execution time for each of the most relevant kernels. We evaluate: BVH Creation - Step that encloses the creation of Morton codes and their sorting, the creation of the hierarchy and the calculation of the bounding volumes. Node Compression - Compression of the internal nodes of the BVH. Mesh Compression - Encoding of the triangles into the enveloping bounding box. Rendering - Rendering kernel that includes the traversal of the BVH, intersection tests with primitive and shading. BVH traversal code can be found in Appenddix B. Cornell CORNELL Karras Node Compression Node + Mesh Compression Kernel Time (ms) Time (ms) Time (ms) BVH Creation Node compression Mesh Compression Rendering Table 4.2: Kernel execution time for CORNELL frame. CORNELL doesn t benefit much from our modifications. We notice a reduction in rendering time of 6.7% when compressing internal nodes and 10.7% when compressing the mesh alike. Both BVH construction time and compression are minimal. We believe our modifications do not have a great impact on this scene as this a very small scene that easily fits in L2 Cache, thus mitigating any improvements gained from memory optimization. 41

52 Stanford Bunny STANDFORD BUNNY Karras Node Compression Node + Mesh Compression Kernel Time (ms) Time (ms) Time (ms) BVH Creation Node compression Mesh Compression Rendering Table 4.3: Kernel execution time for STANDFORD BUNNY frame. In this scene we notice most of the time is spent executing the rendering kernel as in CORNELL. The creation of the BVH has a slight impact on the time required to render each frame and remains constant across all 3 variations. Compression times have little to no weight but greatly impact the time it takes to render the scene. Here we can see a reduction in rendering time of 20.55% when compressing just the BVH s internal nodes and a reduction of 41.01% when compressing both internal nodes and triangles. Overall we are only able to achieve a 37.33% increase in performance when considering the rest of the kernels. Asian Dragon ASIAN DRAGON Karras Node Compression Node + Mesh Compression Kernel Time (ms) Time (ms) Time (ms) BVH Creation Node compression Mesh Compression Rendering Table 4.4: Kernel execution time for ASIAN DRAGON frame. In ASIAN DRAGON we see a time reduction 21.08% of the duration rendering kernel when compressing just the internal nodes and 34.67% when compressing both internal nodes and triangles. Here we can see our algorithm has a significat impact in high poly models as we are fetching a vast number of BVH nodes in this scene. Unfortunately the BVH Creation kernel takes up most of the time in each frame when dealing with such a high number of polygons. Overall we are only able to achieve a 6.6% increase in performance when we consider the rest of the kernels that make up the frame. 42

53 4.4.6 Frames Per Second FPS Karras Node Compression Node + Mesh Compression Asian Dragon Stanford Bunny Cornell Figure 4.13: Frames per seconds in each test scene. In this section we make an overall comparison of the frame per seconds we are able to generate in each scene. STANDFORD BUNNY is the scene that shows the most benefit from our approach. ASIAN DRAGON has no significant changes in performance as it manages a meager 2 frames per second, mainly caused by the high cost of constructing the BVH. CORNELL shows slight improvements in performance. 43

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016 Ray Tracing Computer Graphics CMU 15-462/15-662, Fall 2016 Primitive-partitioning vs. space-partitioning acceleration structures Primitive partitioning (bounding volume hierarchy): partitions node s primitives

More information

Fast BVH Construction on GPUs

Fast BVH Construction on GPUs Fast BVH Construction on GPUs Published in EUROGRAGHICS, (2009) C. Lauterbach, M. Garland, S. Sengupta, D. Luebke, D. Manocha University of North Carolina at Chapel Hill NVIDIA University of California

More information

Bandwidth and Memory Efficiency in Real-Time Ray Tracing

Bandwidth and Memory Efficiency in Real-Time Ray Tracing Bandwidth and Memory Efficiency in Real-Time Ray Tracing Pedro Lousada INESC-ID/Instituto Superior Técnico, University of Lisbon Rua Alves Redol, 9 1000-029 Lisboa, Portugal pedro.lousada@ist.utl.pt Vasco

More information

Spatial Data Structures

Spatial Data Structures CSCI 420 Computer Graphics Lecture 17 Spatial Data Structures Jernej Barbic University of Southern California Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees [Angel Ch. 8] 1 Ray Tracing Acceleration

More information

Spatial Data Structures

Spatial Data Structures CSCI 480 Computer Graphics Lecture 7 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids BSP Trees [Ch. 0.] March 8, 0 Jernej Barbic University of Southern California http://www-bcf.usc.edu/~jbarbic/cs480-s/

More information

Acceleration Data Structures

Acceleration Data Structures CT4510: Computer Graphics Acceleration Data Structures BOCHANG MOON Ray Tracing Procedure for Ray Tracing: For each pixel Generate a primary ray (with depth 0) While (depth < d) { Find the closest intersection

More information

Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees

Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees High Performance Graphics (2012) C. Dachsbacher, J. Munkberg, and J. Pantaleoni (Editors) Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees Tero Karras NVIDIA Research Abstract

More information

Spatial Data Structures and Speed-Up Techniques. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology

Spatial Data Structures and Speed-Up Techniques. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology Spatial Data Structures and Speed-Up Techniques Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology Spatial data structures What is it? Data structure that organizes

More information

Accelerating Ray-Tracing

Accelerating Ray-Tracing Lecture 9: Accelerating Ray-Tracing Computer Graphics and Imaging UC Berkeley CS184/284A, Spring 2016 Course Roadmap Rasterization Pipeline Core Concepts Sampling Antialiasing Transforms Geometric Modeling

More information

Spatial Data Structures

Spatial Data Structures Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) [Angel 9.10] Outline Ray tracing review what rays matter? Ray tracing speedup faster

More information

Real Time Ray Tracing

Real Time Ray Tracing Real Time Ray Tracing Programação 3D para Simulação de Jogos Vasco Costa Ray tracing? Why? How? P3DSJ Real Time Ray Tracing Vasco Costa 2 Real time ray tracing : example Source: NVIDIA P3DSJ Real Time

More information

Ray Tracing III. Wen-Chieh (Steve) Lin National Chiao-Tung University

Ray Tracing III. Wen-Chieh (Steve) Lin National Chiao-Tung University Ray Tracing III Wen-Chieh (Steve) Lin National Chiao-Tung University Shirley, Fundamentals of Computer Graphics, Chap 10 Doug James CG slides, I-Chen Lin s CG slides Ray-tracing Review For each pixel,

More information

Spatial Data Structures for Computer Graphics

Spatial Data Structures for Computer Graphics Spatial Data Structures for Computer Graphics Page 1 of 65 http://www.cse.iitb.ac.in/ sharat November 2008 Spatial Data Structures for Computer Graphics Page 1 of 65 http://www.cse.iitb.ac.in/ sharat November

More information

Accelerated Raytracing

Accelerated Raytracing Accelerated Raytracing Why is Acceleration Important? Vanilla ray tracing is really slow! mxm pixels, kxk supersampling, n primitives, average ray path length of d, l lights, 2 recursive ray casts per

More information

Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm

Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm Department of Computer Science and Engineering Sogang University, Korea Improving Memory

More information

Spatial Data Structures. Steve Rotenberg CSE168: Rendering Algorithms UCSD, Spring 2017

Spatial Data Structures. Steve Rotenberg CSE168: Rendering Algorithms UCSD, Spring 2017 Spatial Data Structures Steve Rotenberg CSE168: Rendering Algorithms UCSD, Spring 2017 Ray Intersections We can roughly estimate the time to render an image as being proportional to the number of ray-triangle

More information

Spatial Data Structures

Spatial Data Structures 15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) April 1, 2003 [Angel 9.10] Frank Pfenning Carnegie

More information

Spatial Data Structures

Spatial Data Structures 15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) March 28, 2002 [Angel 8.9] Frank Pfenning Carnegie

More information

Last Time: Acceleration Data Structures for Ray Tracing. Schedule. Today. Shadows & Light Sources. Shadows

Last Time: Acceleration Data Structures for Ray Tracing. Schedule. Today. Shadows & Light Sources. Shadows Last Time: Acceleration Data Structures for Ray Tracing Modeling Transformations Illumination (Shading) Viewing Transformation (Perspective / Orthographic) Clipping Projection (to Screen Space) Scan Conversion

More information

Accelerating Geometric Queries. Computer Graphics CMU /15-662, Fall 2016

Accelerating Geometric Queries. Computer Graphics CMU /15-662, Fall 2016 Accelerating Geometric Queries Computer Graphics CMU 15-462/15-662, Fall 2016 Geometric modeling and geometric queries p What point on the mesh is closest to p? What point on the mesh is closest to p?

More information

Ray Tracing Acceleration Data Structures

Ray Tracing Acceleration Data Structures Ray Tracing Acceleration Data Structures Sumair Ahmed October 29, 2009 Ray Tracing is very time-consuming because of the ray-object intersection calculations. With the brute force method, each ray has

More information

Ray Tracing with Spatial Hierarchies. Jeff Mahovsky & Brian Wyvill CSC 305

Ray Tracing with Spatial Hierarchies. Jeff Mahovsky & Brian Wyvill CSC 305 Ray Tracing with Spatial Hierarchies Jeff Mahovsky & Brian Wyvill CSC 305 Ray Tracing Flexible, accurate, high-quality rendering Slow Simplest ray tracer: Test every ray against every object in the scene

More information

Advanced Ray Tracing

Advanced Ray Tracing Advanced Ray Tracing Thanks to Fredo Durand and Barb Cutler The Ray Tree Ni surface normal Ri reflected ray Li shadow ray Ti transmitted (refracted) ray 51 MIT EECS 6.837, Cutler and Durand 1 Ray Tree

More information

Intersection Acceleration

Intersection Acceleration Advanced Computer Graphics Intersection Acceleration Matthias Teschner Computer Science Department University of Freiburg Outline introduction bounding volume hierarchies uniform grids kd-trees octrees

More information

INFOMAGR Advanced Graphics. Jacco Bikker - February April Welcome!

INFOMAGR Advanced Graphics. Jacco Bikker - February April Welcome! INFOMAGR Advanced Graphics Jacco Bikker - February April 2016 Welcome! I x, x = g(x, x ) ε x, x + S ρ x, x, x I x, x dx Today s Agenda: Introduction Ray Distributions The Top-level BVH Real-time Ray Tracing

More information

RACBVHs: Random Accessible Compressed Bounding Volume Hierarchies

RACBVHs: Random Accessible Compressed Bounding Volume Hierarchies RACBVHs: Random Accessible Compressed Bounding Volume Hierarchies Published at IEEE Transactions on Visualization and Computer Graphics, 2010, Vol. 16, Num. 2, pp. 273 286 Tae Joon Kim joint work with

More information

Lecture 4 - Real-time Ray Tracing

Lecture 4 - Real-time Ray Tracing INFOMAGR Advanced Graphics Jacco Bikker - November 2017 - February 2018 Lecture 4 - Real-time Ray Tracing Welcome! I x, x = g(x, x ) ε x, x + න S ρ x, x, x I x, x dx Today s Agenda: Introduction Ray Distributions

More information

Shadows for Many Lights sounds like it might mean something, but In fact it can mean very different things, that require very different solutions.

Shadows for Many Lights sounds like it might mean something, but In fact it can mean very different things, that require very different solutions. 1 2 Shadows for Many Lights sounds like it might mean something, but In fact it can mean very different things, that require very different solutions. 3 We aim for something like the numbers of lights

More information

INFOGR Computer Graphics. J. Bikker - April-July Lecture 11: Acceleration. Welcome!

INFOGR Computer Graphics. J. Bikker - April-July Lecture 11: Acceleration. Welcome! INFOGR Computer Graphics J. Bikker - April-July 2015 - Lecture 11: Acceleration Welcome! Today s Agenda: High-speed Ray Tracing Acceleration Structures The Bounding Volume Hierarchy BVH Construction BVH

More information

CS580: Ray Tracing. Sung-Eui Yoon ( 윤성의 ) Course URL:

CS580: Ray Tracing. Sung-Eui Yoon ( 윤성의 ) Course URL: CS580: Ray Tracing Sung-Eui Yoon ( 윤성의 ) Course URL: http://sglab.kaist.ac.kr/~sungeui/gcg/ Recursive Ray Casting Gained popularity in when Turner Whitted (1980) recognized that recursive ray casting could

More information

Photorealistic 3D Rendering for VW in Mobile Devices

Photorealistic 3D Rendering for VW in Mobile Devices Abstract University of Arkansas CSCE Department Advanced Virtual Worlds Spring 2013 Photorealistic 3D Rendering for VW in Mobile Devices Rafael Aroxa In the past few years, the demand for high performance

More information

Ray Tracing. Cornell CS4620/5620 Fall 2012 Lecture Kavita Bala 1 (with previous instructors James/Marschner)

Ray Tracing. Cornell CS4620/5620 Fall 2012 Lecture Kavita Bala 1 (with previous instructors James/Marschner) CS4620/5620: Lecture 37 Ray Tracing 1 Announcements Review session Tuesday 7-9, Phillips 101 Posted notes on slerp and perspective-correct texturing Prelim on Thu in B17 at 7:30pm 2 Basic ray tracing Basic

More information

Sung-Eui Yoon ( 윤성의 )

Sung-Eui Yoon ( 윤성의 ) CS380: Computer Graphics Ray Tracing Sung-Eui Yoon ( 윤성의 ) Course URL: http://sglab.kaist.ac.kr/~sungeui/cg/ Class Objectives Understand overall algorithm of recursive ray tracing Ray generations Intersection

More information

HLBVH: Hierarchical LBVH Construction for Real Time Ray Tracing of Dynamic Geometry. Jacopo Pantaleoni and David Luebke NVIDIA Research

HLBVH: Hierarchical LBVH Construction for Real Time Ray Tracing of Dynamic Geometry. Jacopo Pantaleoni and David Luebke NVIDIA Research HLBVH: Hierarchical LBVH Construction for Real Time Ray Tracing of Dynamic Geometry Jacopo Pantaleoni and David Luebke NVIDIA Research Some Background Real Time Ray Tracing is almost there* [Garanzha and

More information

Parallel Physically Based Path-tracing and Shading Part 3 of 2. CIS565 Fall 2012 University of Pennsylvania by Yining Karl Li

Parallel Physically Based Path-tracing and Shading Part 3 of 2. CIS565 Fall 2012 University of Pennsylvania by Yining Karl Li Parallel Physically Based Path-tracing and Shading Part 3 of 2 CIS565 Fall 202 University of Pennsylvania by Yining Karl Li Jim Scott 2009 Spatial cceleration Structures: KD-Trees *Some portions of these

More information

CPSC / Sonny Chan - University of Calgary. Collision Detection II

CPSC / Sonny Chan - University of Calgary. Collision Detection II CPSC 599.86 / 601.86 Sonny Chan - University of Calgary Collision Detection II Outline Broad phase collision detection: - Problem definition and motivation - Bounding volume hierarchies - Spatial partitioning

More information

Today. Acceleration Data Structures for Ray Tracing. Cool results from Assignment 2. Last Week: Questions? Schedule

Today. Acceleration Data Structures for Ray Tracing. Cool results from Assignment 2. Last Week: Questions? Schedule Today Acceleration Data Structures for Ray Tracing Cool results from Assignment 2 Last Week: koi seantek Ray Tracing Shadows Reflection Refraction Local Illumination Bidirectional Reflectance Distribution

More information

6.837 Introduction to Computer Graphics Final Exam Tuesday, December 20, :05-12pm Two hand-written sheet of notes (4 pages) allowed 1 SSD [ /17]

6.837 Introduction to Computer Graphics Final Exam Tuesday, December 20, :05-12pm Two hand-written sheet of notes (4 pages) allowed 1 SSD [ /17] 6.837 Introduction to Computer Graphics Final Exam Tuesday, December 20, 2011 9:05-12pm Two hand-written sheet of notes (4 pages) allowed NAME: 1 / 17 2 / 12 3 / 35 4 / 8 5 / 18 Total / 90 1 SSD [ /17]

More information

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y.

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y. COMP 4801 Final Year Project Ray Tracing for Computer Graphics Final Project Report FYP 15014 by Runjing Liu Advised by Dr. L.Y. Wei 1 Abstract The goal of this project was to use ray tracing in a rendering

More information

Scene Management. Video Game Technologies 11498: MSc in Computer Science and Engineering 11156: MSc in Game Design and Development

Scene Management. Video Game Technologies 11498: MSc in Computer Science and Engineering 11156: MSc in Game Design and Development Video Game Technologies 11498: MSc in Computer Science and Engineering 11156: MSc in Game Design and Development Chap. 5 Scene Management Overview Scene Management vs Rendering This chapter is about rendering

More information

Ray-tracing Acceleration. Acceleration Data Structures for Ray Tracing. Shadows. Shadows & Light Sources. Antialiasing Supersampling.

Ray-tracing Acceleration. Acceleration Data Structures for Ray Tracing. Shadows. Shadows & Light Sources. Antialiasing Supersampling. Ray-tracing Acceleration Acceleration Data Structures for Ray Tracing Thanks to Fredo Durand and Barb Cutler Soft shadows Antialiasing (getting rid of jaggies) Glossy reflection Motion blur Depth of field

More information

Computer Graphics (CS 543) Lecture 13b Ray Tracing (Part 1) Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI)

Computer Graphics (CS 543) Lecture 13b Ray Tracing (Part 1) Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI) Computer Graphics (CS 543) Lecture 13b Ray Tracing (Part 1) Prof Emmanuel Agu Computer Science Dept. Worcester Polytechnic Institute (WPI) Raytracing Global illumination-based rendering method Simulates

More information

Universiteit Leiden Computer Science

Universiteit Leiden Computer Science Universiteit Leiden Computer Science Optimizing octree updates for visibility determination on dynamic scenes Name: Hans Wortel Student-no: 0607940 Date: 28/07/2011 1st supervisor: Dr. Michael Lew 2nd

More information

S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T

S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T Copyright 2018 Sung-eui Yoon, KAIST freely available on the internet http://sglab.kaist.ac.kr/~sungeui/render

More information

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology Point Cloud Filtering using Ray Casting by Eric Jensen 01 The Basic Methodology Ray tracing in standard graphics study is a method of following the path of a photon from the light source to the camera,

More information

Computer Graphics. - Spatial Index Structures - Philipp Slusallek

Computer Graphics. - Spatial Index Structures - Philipp Slusallek Computer Graphics - Spatial Index Structures - Philipp Slusallek Overview Last lecture Overview of ray tracing Ray-primitive intersections Today Acceleration structures Bounding Volume Hierarchies (BVH)

More information

Announcements. Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday

Announcements. Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday Announcements Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday 1 Spatial Data Structures Hierarchical Bounding Volumes Grids Octrees BSP Trees 11/7/02 Speeding Up Computations

More information

COMP 175: Computer Graphics April 11, 2018

COMP 175: Computer Graphics April 11, 2018 Lecture n+1: Recursive Ray Tracer2: Advanced Techniques and Data Structures COMP 175: Computer Graphics April 11, 2018 1/49 Review } Ray Intersect (Assignment 4): questions / comments? } Review of Recursive

More information

Building a Fast Ray Tracer

Building a Fast Ray Tracer Abstract Ray tracing is often used in renderers, as it can create very high quality images at the expense of run time. It is useful because of its ability to solve many different problems in image rendering.

More information

Lighting. To do. Course Outline. This Lecture. Continue to work on ray programming assignment Start thinking about final project

Lighting. To do. Course Outline. This Lecture. Continue to work on ray programming assignment Start thinking about final project To do Continue to work on ray programming assignment Start thinking about final project Lighting Course Outline 3D Graphics Pipeline Modeling (Creating 3D Geometry) Mesh; modeling; sampling; Interaction

More information

Chapter 11 Global Illumination. Part 1 Ray Tracing. Reading: Angel s Interactive Computer Graphics (6 th ed.) Sections 11.1, 11.2, 11.

Chapter 11 Global Illumination. Part 1 Ray Tracing. Reading: Angel s Interactive Computer Graphics (6 th ed.) Sections 11.1, 11.2, 11. Chapter 11 Global Illumination Part 1 Ray Tracing Reading: Angel s Interactive Computer Graphics (6 th ed.) Sections 11.1, 11.2, 11.3 CG(U), Chap.11 Part 1:Ray Tracing 1 Can pipeline graphics renders images

More information

Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010

Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010 1 Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010 Presentation by Henrik H. Knutsen for TDT24, fall 2012 Om du ønsker, kan du sette inn navn, tittel på foredraget, o.l.

More information

Lecture 2 - Acceleration Structures

Lecture 2 - Acceleration Structures INFOMAGR Advanced Graphics Jacco Bikker - November 2017 - February 2018 Lecture 2 - Acceleration Structures Welcome! I x, x = g(x, x ) ε x, x + න S ρ x, x, x I x, x dx Today s Agenda: Problem Analysis

More information

CPSC GLOBAL ILLUMINATION

CPSC GLOBAL ILLUMINATION CPSC 314 21 GLOBAL ILLUMINATION Textbook: 20 UGRAD.CS.UBC.CA/~CS314 Mikhail Bessmeltsev ILLUMINATION MODELS/ALGORITHMS Local illumination - Fast Ignore real physics, approximate the look Interaction of

More information

MSBVH: An Efficient Acceleration Data Structure for Ray Traced Motion Blur

MSBVH: An Efficient Acceleration Data Structure for Ray Traced Motion Blur MSBVH: An Efficient Acceleration Data Structure for Ray Traced Motion Blur Leonhard Grünschloß Martin Stich Sehera Nawaz Alexander Keller August 6, 2011 Principles of Accelerated Ray Tracing Hierarchical

More information

Cross-Hardware GPGPU implementation of Acceleration Structures for Ray Tracing using OpenCL

Cross-Hardware GPGPU implementation of Acceleration Structures for Ray Tracing using OpenCL The Open University of Israel Department of Mathematics and Computer Science Cross-Hardware GPGPU implementation of Acceleration Structures for Ray Tracing using OpenCL Advanced Project in Computer Science

More information

In the previous presentation, Erik Sintorn presented methods for practically constructing a DAG structure from a voxel data set.

In the previous presentation, Erik Sintorn presented methods for practically constructing a DAG structure from a voxel data set. 1 In the previous presentation, Erik Sintorn presented methods for practically constructing a DAG structure from a voxel data set. This presentation presents how such a DAG structure can be accessed immediately

More information

Anti-aliased and accelerated ray tracing. University of Texas at Austin CS384G - Computer Graphics Fall 2010 Don Fussell

Anti-aliased and accelerated ray tracing. University of Texas at Austin CS384G - Computer Graphics Fall 2010 Don Fussell Anti-aliased and accelerated ray tracing University of Texas at Austin CS384G - Computer Graphics Fall 2010 Don Fussell Reading Required: Watt, sections 12.5.3 12.5.4, 14.7 Further reading: A. Glassner.

More information

Specialized Acceleration Structures for Ray-Tracing. Warren Hunt

Specialized Acceleration Structures for Ray-Tracing. Warren Hunt Specialized Acceleration Structures for Ray-Tracing Warren Hunt Bill Mark Forward: Flavor of Research Build is cheap (especially with scan, lazy and build from hierarchy) Grid build and BVH refit are really

More information

Speeding Up Ray Tracing. Optimisations. Ray Tracing Acceleration

Speeding Up Ray Tracing. Optimisations. Ray Tracing Acceleration Speeding Up Ray Tracing nthony Steed 1999, eline Loscos 2005, Jan Kautz 2007-2009 Optimisations Limit the number of rays Make the ray test faster for shadow rays the main drain on resources if there are

More information

Interactive Ray Tracing: Higher Memory Coherence

Interactive Ray Tracing: Higher Memory Coherence Interactive Ray Tracing: Higher Memory Coherence http://gamma.cs.unc.edu/rt Dinesh Manocha (UNC Chapel Hill) Sung-Eui Yoon (Lawrence Livermore Labs) Interactive Ray Tracing Ray tracing is naturally sub-linear

More information

improving raytracing speed

improving raytracing speed ray tracing II computer graphics ray tracing II 2006 fabio pellacini 1 improving raytracing speed computer graphics ray tracing II 2006 fabio pellacini 2 raytracing computational complexity ray-scene intersection

More information

Effects needed for Realism. Computer Graphics (Fall 2008) Ray Tracing. Ray Tracing: History. Outline

Effects needed for Realism. Computer Graphics (Fall 2008) Ray Tracing. Ray Tracing: History. Outline Computer Graphics (Fall 2008) COMS 4160, Lecture 15: Ray Tracing http://www.cs.columbia.edu/~cs4160 Effects needed for Realism (Soft) Shadows Reflections (Mirrors and Glossy) Transparency (Water, Glass)

More information

Topics. Ray Tracing II. Intersecting transformed objects. Transforming objects

Topics. Ray Tracing II. Intersecting transformed objects. Transforming objects Topics Ray Tracing II CS 4620 Lecture 16 Transformations in ray tracing Transforming objects Transformation hierarchies Ray tracing acceleration structures Bounding volumes Bounding volume hierarchies

More information

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes B-KD rees for Hardware Accelerated Ray racing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek Saarland University, Germany Outline Previous Work B-KD ree as new Spatial Index Structure DynR

More information

EDAN30 Photorealistic Computer Graphics. Seminar 2, Bounding Volume Hierarchy. Magnus Andersson, PhD student

EDAN30 Photorealistic Computer Graphics. Seminar 2, Bounding Volume Hierarchy. Magnus Andersson, PhD student EDAN30 Photorealistic Computer Graphics Seminar 2, 2012 Bounding Volume Hierarchy Magnus Andersson, PhD student (magnusa@cs.lth.se) This seminar We want to go from hundreds of triangles to thousands (or

More information

Computer Graphics. Bing-Yu Chen National Taiwan University The University of Tokyo

Computer Graphics. Bing-Yu Chen National Taiwan University The University of Tokyo Computer Graphics Bing-Yu Chen National Taiwan University The University of Tokyo Hidden-Surface Removal Back-Face Culling The Depth-Sort Algorithm Binary Space-Partitioning Trees The z-buffer Algorithm

More information

Anti-aliased and accelerated ray tracing. University of Texas at Austin CS384G - Computer Graphics

Anti-aliased and accelerated ray tracing. University of Texas at Austin CS384G - Computer Graphics Anti-aliased and accelerated ray tracing University of Texas at Austin CS384G - Computer Graphics Fall 2010 Don Fussell eading! equired:! Watt, sections 12.5.3 12.5.4, 14.7! Further reading:! A. Glassner.

More information

Benchmark 1.a Investigate and Understand Designated Lab Techniques The student will investigate and understand designated lab techniques.

Benchmark 1.a Investigate and Understand Designated Lab Techniques The student will investigate and understand designated lab techniques. I. Course Title Parallel Computing 2 II. Course Description Students study parallel programming and visualization in a variety of contexts with an emphasis on underlying and experimental technologies.

More information

Ray Tracing Acceleration. CS 4620 Lecture 20

Ray Tracing Acceleration. CS 4620 Lecture 20 Ray Tracing Acceleration CS 4620 Lecture 20 2013 Steve Marschner 1 Will this be on the exam? or, Prelim 2 syllabus You can expect emphasis on topics related to the assignment (Shaders 1&2) and homework

More information

CEng 477 Introduction to Computer Graphics Fall 2007

CEng 477 Introduction to Computer Graphics Fall 2007 Visible Surface Detection CEng 477 Introduction to Computer Graphics Fall 2007 Visible Surface Detection Visible surface detection or hidden surface removal. Realistic scenes: closer objects occludes the

More information

Physically-Based Laser Simulation

Physically-Based Laser Simulation Physically-Based Laser Simulation Greg Reshko Carnegie Mellon University reshko@cs.cmu.edu Dave Mowatt Carnegie Mellon University dmowatt@andrew.cmu.edu Abstract In this paper, we describe our work on

More information

Topics. Ray Tracing II. Transforming objects. Intersecting transformed objects

Topics. Ray Tracing II. Transforming objects. Intersecting transformed objects Topics Ray Tracing II CS 4620 ations in ray tracing ing objects ation hierarchies Ray tracing acceleration structures Bounding volumes Bounding volume hierarchies Uniform spatial subdivision Adaptive spatial

More information

Comparison of hierarchies for occlusion culling based on occlusion queries

Comparison of hierarchies for occlusion culling based on occlusion queries Comparison of hierarchies for occlusion culling based on occlusion queries V.I. Gonakhchyan pusheax@ispras.ru Ivannikov Institute for System Programming of the RAS, Moscow, Russia Efficient interactive

More information

Fast Agglomerative Clustering for Rendering

Fast Agglomerative Clustering for Rendering Fast Agglomerative Clustering for Rendering Bruce Walter, Kavita Bala, Cornell University Milind Kulkarni, Keshav Pingali University of Texas, Austin Clustering Tree Hierarchical data representation Each

More information

RACBVHs: Random-Accessible Compressed Bounding Volume Hierarchies. Tae-Joon Kim, Bochang Moon, Duksu Kim, Sung-Eui Yoon, Member, IEEE

RACBVHs: Random-Accessible Compressed Bounding Volume Hierarchies. Tae-Joon Kim, Bochang Moon, Duksu Kim, Sung-Eui Yoon, Member, IEEE 1 RACBVHs: Random-Accessible Compressed Bounding Volume Hierarchies Tae-Joon Kim, Bochang Moon, Duksu Kim, Sung-Eui Yoon, Member, IEEE Abstract We present a novel compressed bounding volume hierarchy (BVH)

More information

Efficient Clustered BVH Update Algorithm for Highly-Dynamic Models. Kirill Garanzha

Efficient Clustered BVH Update Algorithm for Highly-Dynamic Models. Kirill Garanzha Symposium on Interactive Ray Tracing 2008 Los Angeles, California Efficient Clustered BVH Update Algorithm for Highly-Dynamic Models Kirill Garanzha Department of Software for Computers Bauman Moscow State

More information

Hello, Thanks for the introduction

Hello, Thanks for the introduction Hello, Thanks for the introduction 1 In this paper we suggest an efficient data-structure for precomputed shadows from point light or directional light-sources. Because, in fact, after more than four decades

More information

Real-Time Voxelization for Global Illumination

Real-Time Voxelization for Global Illumination Lecture 26: Real-Time Voxelization for Global Illumination Visual Computing Systems Voxelization to regular grid Input: scene triangles Output: surface information at each voxel in 3D grid - Simple case:

More information

Ray Intersection Acceleration

Ray Intersection Acceleration Ray Intersection Acceleration CMPT 461/761 Image Synthesis Torsten Möller Reading Chapter 2, 3, 4 of Physically Based Rendering by Pharr&Humphreys An Introduction to Ray tracing by Glassner Topics today

More information

Questions from Last Week? Extra rays needed for these effects. Shadows Motivation

Questions from Last Week? Extra rays needed for these effects. Shadows Motivation CS 431/636 Advanced Rendering Techniques Dr. David Breen University Crossings 149 Tuesday 6PM 8:50PM Presentation 4 4/22/08 Questions from Last Week? Color models Light models Phong shading model Assignment

More information

CS 431/636 Advanced Rendering Techniques

CS 431/636 Advanced Rendering Techniques CS 431/636 Advanced Rendering Techniques Dr. David Breen University Crossings 149 Tuesday 6PM 8:50PM Presentation 4 4/22/08 Questions from Last Week? Color models Light models Phong shading model Assignment

More information

1. Mesh Coloring a.) Assign unique color to each polygon based on the polygon id.

1. Mesh Coloring a.) Assign unique color to each polygon based on the polygon id. 1. Mesh Coloring a.) Assign unique color to each polygon based on the polygon id. Figure 1: The dragon model is shown rendered using a coloring scheme based on coloring each triangle face according to

More information

CS 563 Advanced Topics in Computer Graphics Culling and Acceleration Techniques Part 1 by Mark Vessella

CS 563 Advanced Topics in Computer Graphics Culling and Acceleration Techniques Part 1 by Mark Vessella CS 563 Advanced Topics in Computer Graphics Culling and Acceleration Techniques Part 1 by Mark Vessella Introduction Acceleration Techniques Spatial Data Structures Culling Outline for the Night Bounding

More information

Computer Graphics Ray Casting. Matthias Teschner

Computer Graphics Ray Casting. Matthias Teschner Computer Graphics Ray Casting Matthias Teschner Outline Context Implicit surfaces Parametric surfaces Combined objects Triangles Axis-aligned boxes Iso-surfaces in grids Summary University of Freiburg

More information

Computer Graphics. - Ray-Tracing II - Hendrik Lensch. Computer Graphics WS07/08 Ray Tracing II

Computer Graphics. - Ray-Tracing II - Hendrik Lensch. Computer Graphics WS07/08 Ray Tracing II Computer Graphics - Ray-Tracing II - Hendrik Lensch Overview Last lecture Ray tracing I Basic ray tracing What is possible? Recursive ray tracing algorithm Intersection computations Today Advanced acceleration

More information

Acceleration Data Structures for Ray Tracing

Acceleration Data Structures for Ray Tracing Acceleration Data Structures for Ray Tracing Travis Fischer and Nong Li (2007) Andries van Dam November 10, 2009 Acceleration Data Structures 1/35 Outline Introduction/Motivation Bounding Volume Hierarchy

More information

Effects needed for Realism. Ray Tracing. Ray Tracing: History. Outline. Foundations of Computer Graphics (Spring 2012)

Effects needed for Realism. Ray Tracing. Ray Tracing: History. Outline. Foundations of Computer Graphics (Spring 2012) Foundations of omputer Graphics (Spring 202) S 84, Lecture 5: Ray Tracing http://inst.eecs.berkeley.edu/~cs84 Effects needed for Realism (Soft) Shadows Reflections (Mirrors and Glossy) Transparency (Water,

More information

Ray Tracing Acceleration. CS 4620 Lecture 22

Ray Tracing Acceleration. CS 4620 Lecture 22 Ray Tracing Acceleration CS 4620 Lecture 22 2014 Steve Marschner 1 Topics Transformations in ray tracing Transforming objects Transformation hierarchies Ray tracing acceleration structures Bounding volumes

More information

CS 465 Program 5: Ray II

CS 465 Program 5: Ray II CS 465 Program 5: Ray II out: Friday 2 November 2007 due: Saturday 1 December 2007 Sunday 2 December 2007 midnight 1 Introduction In the first ray tracing assignment you built a simple ray tracer that

More information

Computer Graphics. Bing-Yu Chen National Taiwan University

Computer Graphics. Bing-Yu Chen National Taiwan University Computer Graphics Bing-Yu Chen National Taiwan University Visible-Surface Determination Back-Face Culling The Depth-Sort Algorithm Binary Space-Partitioning Trees The z-buffer Algorithm Scan-Line Algorithm

More information

Holger Dammertz August 10, The Edge Volume Heuristic Robust Triangle Subdivision for Improved BVH Performance Holger Dammertz, Alexander Keller

Holger Dammertz August 10, The Edge Volume Heuristic Robust Triangle Subdivision for Improved BVH Performance Holger Dammertz, Alexander Keller Holger Dammertz August 10, 2008 The Edge Volume Heuristic Robust Triangle Subdivision for Improved BVH Performance Holger Dammertz, Alexander Keller Page 2 The Edge Volume Heuristic Robust Triangle Subdivision

More information

Applications of Explicit Early-Z Culling

Applications of Explicit Early-Z Culling Applications of Explicit Early-Z Culling Jason L. Mitchell ATI Research Pedro V. Sander ATI Research Introduction In past years, in the SIGGRAPH Real-Time Shading course, we have covered the details of

More information

School of Computer and Information Science

School of Computer and Information Science School of Computer and Information Science CIS Research Placement Report Multiple threads in floating-point sort operations Name: Quang Do Date: 8/6/2012 Supervisor: Grant Wigley Abstract Despite the vast

More information

Logistics. CS 586/480 Computer Graphics II. Questions from Last Week? Slide Credits

Logistics. CS 586/480 Computer Graphics II. Questions from Last Week? Slide Credits CS 586/480 Computer Graphics II Dr. David Breen Matheson 408 Thursday 6PM Æ 8:50PM Presentation 4 10/28/04 Logistics Read research paper and prepare summary and question P. Hanrahan, "Ray Tracing Algebraic

More information

Hidden Surface Elimination: BSP trees

Hidden Surface Elimination: BSP trees Hidden Surface Elimination: BSP trees Outline Binary space partition (BSP) trees Polygon-aligned 1 BSP Trees Basic idea: Preprocess geometric primitives in scene to build a spatial data structure such

More information

Computing Visibility. Backface Culling for General Visibility. One More Trick with Planes. BSP Trees Ray Casting Depth Buffering Quiz

Computing Visibility. Backface Culling for General Visibility. One More Trick with Planes. BSP Trees Ray Casting Depth Buffering Quiz Computing Visibility BSP Trees Ray Casting Depth Buffering Quiz Power of Plane Equations We ve gotten a lot of mileage out of one simple equation. Basis for D outcode-clipping Basis for plane-at-a-time

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Lecture 25 of 41. Spatial Sorting: Binary Space Partitioning Quadtrees & Octrees

Lecture 25 of 41. Spatial Sorting: Binary Space Partitioning Quadtrees & Octrees Spatial Sorting: Binary Space Partitioning Quadtrees & Octrees William H. Hsu Department of Computing and Information Sciences, KSU KSOL course pages: http://bit.ly/hgvxlh / http://bit.ly/evizre Public

More information

The Traditional Graphics Pipeline

The Traditional Graphics Pipeline Last Time? The Traditional Graphics Pipeline Participating Media Measuring BRDFs 3D Digitizing & Scattering BSSRDFs Monte Carlo Simulation Dipole Approximation Today Ray Casting / Tracing Advantages? Ray

More information