COMP 4801 Final Year Project
Ray Tracing for Computer Graphics
Final Project Report
FYP 15014
by Runjing Liu
Advised by Dr. L.Y. Wei
Abstract
The goal of this project was to use ray tracing in a rendering engine to enhance real-time graphics. For many years, real-time rendering has been realized with techniques based on rasterization. The ease of implementing a rasterized rendering engine is guaranteed by highly optimized graphics hardware. However, alternative rendering methods exist. This paper introduces an approach that combines rasterization and ray tracing techniques to approximate reflection and refraction effects in real time.
Table of Contents
Table of Contents
List of Figures
1. Introduction
2. Implementation
2.1 Whitted ray tracing on CPU
2.2 Vertex shader for ray casting
2.3 Compute shader for ray-triangle intersection test
Summary
References
List of Figures
Figure 1 Images rendered by the ray tracer
Figure 2 Unity's default rendering pipeline
Figure 3 Vertex shader applied to the floor
Figure 4 Vertex shader applied to the sphere
Figure 5 Timeline of CPU and GPU usage
Figure 6 Flow chart of the program
Figure 7 Thread allocation for a 1024x768 image
1. Introduction
The benefit of ray tracing is that it can accurately simulate the transport of light and produce photorealistic images with very simple algorithms. The integration of ray tracing into a real-time game engine would be beneficial for adding specific image effects that would be difficult to produce with a rasterization renderer. This project was implemented on the Unity platform, which has its own scene editor and rendering pipeline. The original rendering pipeline is based fully on rasterization. The scene editor manages 3D models with a BSP tree for spatial partitioning. For ray tracing, however, spatial partitioning alone is not enough to raise the frame rate to an interactive level. Therefore, this project also makes use of the rasterization pipeline and shader programming to unearth the power of the GPU in accelerating ray tracing.
2. Implementation
The implementation of this project was divided into three phases. In the first phase, I implemented a simple ray tracer that runs fully on the CPU. In the second phase, I utilized vertex shaders to replace ray casting with rasterization. In the third phase, I brought DirectCompute shaders into the program so that the ray-triangle intersection tests were performed on the GPU while the CPU remained responsible for ray casting and shading.
2.1 Whitted ray tracing on CPU
The simple ray tracer follows the Whitted ray tracing algorithm (Whitted, 1980). It casts primary rays, shadow rays, and reflective and refractive rays recursively. Local illumination is applied at the ray casting hit points. Phong, Blinn-Phong and simple diffuse shading models are implemented as options for surface shading. A pixel color is calculated for each primary ray hit point on an object according to the following equation:

Pixel color = (Diffuse + Specular) * Light intensity + Reflection/Refraction (1)

This approach is very slow but produces fairly good images. On my personal laptop with an average Intel Core i7 processor, it takes about five minutes to render a frame at a resolution of 1024 x 768. The scene has 73 meshes with 10,590 triangles. The same scene is also used for testing in phase three.
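The recursion described above can be sketched in a few dozen lines. The following is a minimal illustrative Python version, not the project's code: it assumes a scene made only of spheres, a single point light, a Blinn-Phong specular term with a hard-coded shininess exponent, and a fixed recursion depth, all of which are my own choices for the sketch.

```python
import math

def dot(a, b): return sum(x * y for x, y in zip(a, b))
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(v, s): return tuple(x * s for x in v)
def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)

def hit_sphere(origin, direction, center, radius):
    """Return the nearest positive ray parameter t, or None on a miss."""
    oc = sub(origin, center)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c          # direction is assumed normalized (a == 1)
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-4 else None

def trace(origin, direction, spheres, light_pos, light_intensity, depth=3):
    """Whitted recursion: local term per equation (1) plus a reflective term."""
    best = None
    for center, radius, color, reflectivity in spheres:
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, center, color, reflectivity)
    if best is None:
        return (0.0, 0.0, 0.0)                     # background color
    t, center, color, reflectivity = best
    point = add(origin, scale(direction, t))
    normal = normalize(sub(point, center))
    to_light = normalize(sub(light_pos, point))
    diffuse = max(dot(normal, to_light), 0.0)
    half = normalize(sub(to_light, direction))     # Blinn-Phong half vector
    specular = max(dot(normal, half), 0.0) ** 32
    # Equation (1): (Diffuse + Specular) * Light intensity + Reflection
    local = scale(color, (diffuse + specular) * light_intensity)
    if depth > 0 and reflectivity > 0.0:
        refl_dir = normalize(sub(direction, scale(normal, 2.0 * dot(direction, normal))))
        refl = trace(point, refl_dir, spheres, light_pos, light_intensity, depth - 1)
        local = add(local, scale(refl, reflectivity))
    return tuple(min(c, 1.0) for c in local)
```

A shadow-ray test before the diffuse term would complete the Whitted model; it is omitted here to keep the sketch short.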
Figure 1 Images rendered by the ray tracer
2.2 Vertex shader for ray casting
In the ray tracing program implemented in phase one, although a BSP tree is used by default in Unity to accelerate ray-object hit detection, the intersection tests still take a lot of time to run. This became a serious bottleneck of the program. One way to work around the expensive intersection tests was to cast fewer rays into the scene. Therefore, in phase two the ray casting procedure was replaced with vertex shaders that return the texture colors of visible objects and construct secondary rays. By this means, the recursion depth of the ray tracing process is decreased by one. This methodology makes use of Unity's deferred rendering pipeline, as illustrated in figure 2. The program replaces the G-buffer contents in the deferred shading path with data to be used in the later ray tracing. Since a vertex shader is only applied when the vertex is visible to the camera, the visibility problem is solved easily by the vertex shader. Next, the G-buffer contents are passed to the ray tracer script to cast secondary rays and determine the shading of the vertex. The implementation of the vertex shader follows the deferred rendering pipeline. It reads the vertex data once and produces intermediate results in multiple render targets (MRT). These render targets are typically frame buffers that can be read as 2D textures. The render buffers store diffuse color, world position and reflection direction for the reflective surfaces in RGBA channels. The ray tracer script then decodes the colors into 32-bit float vectors and performs ray tracing. Figure 3 and figure 4 show some of the intermediate frame buffers and their final images.

Figure 2 Unity's default rendering pipeline
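The encoding step can be illustrated with a small sketch. A unit reflection direction has components in [-1, 1], so it must be remapped into the [0, 1] color range before being written to an RGBA render target, and the ray tracer script reverses the mapping when it reads the texture back. This is a hypothetical Python sketch of that idea only; the project's actual packing scheme in the shader may differ.

```python
def encode_direction(direction):
    """Remap a unit vector's [-1, 1] components into [0, 1] color channels."""
    return tuple(0.5 * c + 0.5 for c in direction)

def decode_direction(rgb):
    """Recover the [-1, 1] direction vector from the stored color channels."""
    return tuple(2.0 * c - 1.0 for c in rgb)
```

For example, the straight-up direction (0, 1, 0) is stored as the color (0.5, 1.0, 0.5). World positions need an additional scheme (for example a known scene bounding box, or one float packed across the four 8-bit channels) because they are not bounded to [-1, 1].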
Figure 3 Vertex shader applied to the floor
Figure 4 Vertex shader applied to the sphere
2.3 Compute shader for ray-triangle intersection test
The idea is a humble version of what Carr (2002) described in his paper as a ray engine. Since the intersection tests are too slow on the CPU, they are moved to the GPU. The obstacle in implementation is that GPUs are not built for recursion, whereas the ray tracing algorithm makes a fair number of recursive calls. The good news is that parallelism is easily supported by the GPU, and it comes naturally here because the tracing of each ray is independent of the others. With shared memory containing the geometry data, the program can allocate one thread per ray to perform the intersection calculation. However, there is a limitation on memory. The data shared by the threads is stored as textures and structured buffers. The memory limit is easily reached as the scene complexity grows, which can cause the program to halt. Another limitation is the I/O speed between the CPU and the GPU, which now becomes the bottleneck that slows down the frame rate. Figure 5 shows the timeline of CPU and GPU usage when rendering a simple scene, where the peak appears during the data transfer. The flow of the program is illustrated in figure 6. The compute shader is executed on the GPU on a fixed update every few frames. It receives mesh data and rays from the main program and returns the hit information for each pixel.
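The per-thread intersection test itself is compact. The sketch below is an illustrative Python version of the standard Moller-Trumbore ray-triangle test; the project's compute shader is HLSL, so this shows only the algorithm, with variable names of my own choosing.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-7):
    """Moller-Trumbore test: return the ray parameter t of the hit, or None."""
    edge1 = tuple(b - a for a, b in zip(v0, v1))
    edge2 = tuple(b - a for a, b in zip(v0, v2))
    h = cross(direction, edge2)
    det = dot(edge1, h)
    if abs(det) < eps:              # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = tuple(b - a for a, b in zip(v0, origin))
    u = inv * dot(s, h)
    if u < 0.0 or u > 1.0:          # outside the first barycentric bound
        return None
    q = cross(s, edge1)
    v = inv * dot(direction, q)
    if v < 0.0 or u + v > 1.0:      # outside the triangle
        return None
    t = inv * dot(edge2, q)
    return t if t > eps else None   # hit only in front of the origin
```

Each GPU thread would run this test for its ray against the triangles in the structured buffer and keep the smallest positive t as the hit.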
The DX11 compute shader used in this project supports a maximum of 1024 threads per group, and the X and Y dimensions of a thread group can each be up to 1024. For a render target of 1024x768 pixels, the threads are allocated as in figure 7 so that each thread is responsible for the calculation of one pixel. Compared to the straightforward approach in phase one, this method on average halves the rendering time for the same scene, without any loss of image quality. Potential improvements to this method include constructing a spatial partitioning data structure such as a BSP tree or BVH for the meshes passed to the GPU, as well as compressing the data buffers so that fewer bytes are needed for the I/O between the CPU and the GPU.
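The group count for such a dispatch follows from a ceiling division of the image size by the group dimensions. A small sketch, assuming for illustration a 32x32-thread group, which is one layout that reaches the 1024-thread limit (the project's actual group shape is the one shown in figure 7):

```python
import math

def dispatch_groups(width, height, group_x=32, group_y=32):
    """Number of thread groups needed so that every pixel gets one thread."""
    return (math.ceil(width / group_x), math.ceil(height / group_y), 1)

# For the 1024x768 render target, a 32x32 group layout needs 32x24 groups.
print(dispatch_groups(1024, 768))  # (32, 24, 1)
```

The ceiling division matters when the image size is not a multiple of the group size: the extra threads in edge groups simply test whether their pixel coordinate is inside the image and return early.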
Figure 5 Timeline of CPU and GPU usage
Figure 6 Flow chart of the program
Figure 7 Thread allocation for a 1024x768 image
Summary
In this project, ray tracing is used to produce image effects such as reflection, refraction, and simple shadows and illumination. Apart from a simple ray tracer that follows the standard ray tracing algorithm, two approaches to accelerating the rendering are implemented and presented in this report. These two approaches can be combined or used separately, depending on the scene to be rendered. The vertex shader approach is more suitable for local ray traced reflections than for the whole scene. The compute shader approach is more flexible, so it can be adapted to various kinds of ray tracing. One thing that requires care is the GPU memory limit supported on different platforms.
References
Carr, N. A. (2002). The ray engine. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware (pp. 37-46). Eurographics Association.
Glassner, A. (1989). An introduction to ray tracing. Morgan Kaufmann.
Pharr, M., & Humphreys, G. (2004). Physically based rendering: From theory to implementation. Morgan Kaufmann.
Whitted, T. (1980). An improved illumination model for shaded display. Communications of the ACM, 23(6), 343-349.