Enhancing Visual Rendering on Multicore Accelerators with Explicitly Managed Memories *


JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, (2012)

Enhancing Visual Rendering on Multicore Accelerators with Explicitly Managed Memories *

KYUNGHEE CHO 1, SEONGGUN KIM 2 AND HWANSOO HAN 2,+
1 S-Core Corporation, Seongnam, Korea
2 School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea

Recent electronic devices are equipped with processors extended with multicore accelerators to take advantage of the high performance of acceleration co-processors. Such high-end electronic products are expected to run graphics-rich applications. Scalable acceleration co-processors are frequently designed as multicores with explicitly managed memories. Such multicore architectures require sophisticated data management between the main memory and the local memories to fully exploit their potential performance. Ray tracing is a high quality rendering algorithm in computer graphics and offers abundant parallelism to exploit. On explicitly managed memory hierarchies, however, ray tracing with complex data structures tends to suffer from irregular memory accesses and inefficient data management. Compared to other acceleration structures for ray tracing, the grid structure is simple to manage but is commonly regarded as producing slow algorithms. However, recent improvements on the grid structure with SIMD optimizations show performance comparable to the kd-tree structure, which is one of the fastest acceleration structures. We introduce a grid-based parallel ray tracer on a processor with a multicore accelerator. We adopt SIMD optimizations and double buffering to enhance the performance of the grid-based ray tracer and propose a macrocell structure over the grid to fully exploit the memory bandwidth. In our experiments, our ray tracing scheme shows performance comparable to a BVH-based ray tracer.

Keywords: ray tracing, multicore accelerator, grid structure, DMA latency hiding, explicitly managed memory

1.
INTRODUCTION

Received May 31, 2011; accepted March 31, Communicated by Jiman Hong, Junyoung Heo and Tei-Wei Kuo.
* This work was supported by the Ministry of Education, Science, and Technology, Korea under NRF Grant No. NRF and by the Ministry of Knowledge Economy, Korea under NIPA ITRC program No. NIPA-2012-H
+ Corresponding author.

Recent advances in microprocessors allow off-the-shelf processors to integrate powerful accelerators on the same chip. For example, a multicore processor is designed to have nine processing cores: one regular processing core and eight specialized cores. While applications run on the regular processing core, the parts of the applications that demand high performance are assigned to the specialized cores to accelerate their execution. Processors of this kind can provide a total processing power of up to hundreds of Gflops, which is competitive with powerful GPUs. In fact, the processor with such a multicore accelerator was originally developed for accelerating multimedia and vector processing applications. Thus, its main target applications include digital media, image

processing, compression, encryption, DSP, ray tracing, high performance computing, pattern matching, network security, etc. [1-3]. As electronics companies take an increasing interest in high quality applications on their devices, some high-end electronic devices are equipped with this kind of multicore processor. Game consoles that come with such multicore processors provide realistic game play and high quality graphics. High-end HDTVs that are equipped with such multicore processors are capable of decoding multiple video streams in software.

Ray tracing is one of the most representative rendering algorithms for three-dimensional scenes. Its image quality is far better than that of traditional rasterization schemes. Since we can calculate the color of each pixel without any dependence on other pixels, parallelism in ray tracing is abundant. Moreover, many fast traversal and intersection algorithms for ray tracing have been developed by using SIMD optimizations. Ray tracing is a promising way to distinguish the quality of future electronic products [4, 5].

The ray tracing algorithm shoots a ray from the eye through the screen into the 3D space, finds the nearest hit point on an object, and calculates the color of the pixel from the information of objects and lights. After the first hit, we can generate more rays and traverse them recursively for reflection, refraction, and shadow. The object traversal step of ray tracing could be implemented by testing all the triangles for intersection and picking the nearest one, but it is inefficient to traverse all the triangles within a scene. If we can skip the triangles that the ray can never hit, the performance improves considerably. Acceleration structures for ray tracing have been proposed to implement this idea.
Bounding volume hierarchy (BVH), grid, octree, binary space partition (BSP), and kd-tree are such examples [6]. Fig. 1 shows representative acceleration structures for ray tracing.

Fig. 1. Acceleration structures: (a) Grid; (b) Octree; (c) Bounding volume hierarchy (BVH); (d) kd-tree.

The acceleration structures can be classified by how they are built. Spatial subdivision divides triangles according to their location in space. Grid, octree, and kd-tree belong to this category. The advantage of spatial subdivision is that we can exit early, without checking the rest of the triangles, once we find the triangle hit by the ray. The downside of these structures is that intersection tests can be duplicated many times when triangles stretch over two or more sub-spaces. Meanwhile, hierarchical object grouping collects the triangles grouped by objects and hierarchically encloses them within bounding shapes. BVH, skd-tree, and bkd-tree belong to this category. Traversal algorithms for these hierarchical objects start from the root node and proceed to the leaf nodes, checking

the intersection between a ray and a bounding shape. If a bounding shape does not intersect with a ray, we can skip all the triangles that belong to the bounding shape.

Complex acceleration structures, which are built adaptively depending on the distribution of objects, require a large amount of time to initialize for ray tracing. For static scenes, the initial build time can be amortized by rendering the same scene many times while navigating the scene. As ray tracers get faster, rendering dynamic scenes becomes an important feature [6-11]. One implication for dynamic rendering is that the structure build time must be included in the rendering time. Most acceleration structures are adaptive to geometry and incur heavy build costs. Meanwhile, grid structures are generally very fast to build. Since the grid structure just projects triangles onto uniformly divided cells, its build time is far lower than that of any other acceleration structure.

In addition, we investigate an appropriate acceleration structure for explicitly managed memories. The uniformity of the grid structure is an advantage for managing data between the main memory and the local stores of the specialized cores. The grid structure is generally regarded as a slower acceleration structure than adaptive acceleration structures, but recent techniques that exploit SIMD instructions favor the uniform and regular shape of the grid structure. In this paper, we additionally investigate techniques for grid-based structures on multicore accelerators with explicitly managed memories. The main contributions of our paper are as follows.
We propose a grid-based ray tracer on multicore accelerators with explicitly managed memories.
We propose a parallelization technique for ray tracing which can hide the DMA latency.
We experimentally show that our grid-based ray tracer is comparable to other hierarchical traversals on multicore accelerators with explicitly managed memories.

2. OVERVIEW OF GRID-BASED RAY TRACING

To implement an efficient ray tracer, several of its components should be taken into account: structure building, structure traversal, and intersection test. The acceleration structure often decides the efficiency of these components. Build time is often ignored because most rendering algorithms focus on static scenes. Due to fast processors and advanced ray tracing algorithms, some ray tracers achieve real-time rendering, which makes them capable of handling dynamic scenes. Once we deal with dynamic scenes, building acceleration structures becomes an important issue in ray tracing.

Well-known acceleration structures such as BVH and kd-tree are classified as hierarchical, adaptive structures, whereas grid structures are uniform spatial subdivisions. A brief comparison is shown in Table 1. In terms of traversal time, BVH and kd-tree are faster than grid, since BVH and kd-tree build trees for fast traversal. Traversal algorithms for grid structures are considered to be slow, since they march each ray cell by cell by using the 3D-DDA algorithm. To improve the performance, we use the coherent grid traversal [8]. Instead of computing each ray separately, the coherent grid traversal processes a packet

of multiple rays together by using SIMD instructions. The coherent grid traversal improves the performance of the 3D-DDA by 10 times and is comparable with kd-tree traversals. Grid-based traversals often suffer from multiple intersection tests for the triangles that overlap two or more grid cells. We use mail-boxing [12] and frustum culling [13] to reduce the overhead of multiple intersection tests.

As for the build time of acceleration structures, BVH and kd-tree take more time than simple spatial subdivision structures such as grid structures. Special cases such as deformable motion, refitting, and incremental updates can be handled fast by BVH or kd-tree, but these structures are slow to build in general. Grid-based acceleration structures, on the other hand, have plenty of potential for real-time ray tracers on modern processor architectures.

Table 1. Characteristics of acceleration structures.

Acceleration Structure   Partition Method   Hierarchy   Traversal Time *   Build Time *
Grid                     Uniform            No          O(n^(1/3))         O(n)
BVH                      Object             Yes         O(log n)           O(n log n)
kd-tree                  Adaptive           Yes         O(log n)           O(n log n)
* n is the number of triangles.

Fig. 2. Parallel programming model for the multicore accelerator.

To parallelize ray tracing on multicore accelerators, we use the single program multiple data (SPMD) programming model. Each core within the accelerator processes different rays to find their intersections with triangles in grid cells. Since each ray passes through different grid cells, the processing time of each core is different, which may cause load imbalance across multiple cores. To avoid the load imbalance, we use dynamic scheduling. We divide the screen into tiles of n × n pixels and distribute them to the cores. When a core finishes its work, the master processor gives it another tile. The workload among cores is balanced in this way. As described in Fig.
2, the master processor constructs the grid-based data structures for ray tracing, initializes the cores within the accelerator, and schedules pixel tiles to render. Each core renders the pixels within the given tiles and returns the resulting pixel colors to the main memory. Since rendering the pixels of disjoint tiles is independent work, each core runs the steps of ray tracing in parallel: ray generation, ray traversal, intersection test, shading, and writing to the frame buffer. Once all the tiles are processed by the multiple cores, the master processor displays the result on the screen.
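The effect of handing out tiles on demand can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the scheduler used in the paper: the function names `make_tiles`, `static_schedule`, and `dynamic_schedule` are hypothetical, tile costs are made-up numbers standing in for per-tile rendering time, and the round-robin baseline is chosen here purely for comparison.

```python
import heapq


def make_tiles(width, height, n):
    """Split a width x height screen into n x n pixel tiles (edge tiles may be smaller)."""
    return [(x, y, min(n, width - x), min(n, height - y))
            for y in range(0, height, n)
            for x in range(0, width, n)]


def static_schedule(tile_costs, num_cores):
    """Baseline (assumption): tiles are pre-assigned round-robin, regardless of cost."""
    busy = [0.0] * num_cores
    for i, cost in enumerate(tile_costs):
        busy[i % num_cores] += cost
    return busy


def dynamic_schedule(tile_costs, num_cores):
    """The master hands the next tile to whichever core becomes idle first."""
    # Heap entries are (time at which the core becomes idle, core id).
    idle = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(idle)
    busy = [0.0] * num_cores
    for cost in tile_costs:
        now, core = heapq.heappop(idle)
        busy[core] += cost
        heapq.heappush(idle, (now + cost, core))
    return busy


costs = [8, 1, 1, 1, 1, 1, 1, 1]  # one expensive tile followed by cheap ones
print(max(static_schedule(costs, 2)))   # frame time with static assignment
print(max(dynamic_schedule(costs, 2)))  # frame time with dynamic scheduling
```

With these illustrative costs, the static assignment finishes after 11 time units while the dynamic one finishes after 8, because the core that is not stuck on the expensive tile keeps receiving new tiles.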

The memory system on multicore accelerators is composed of multiple local memories, which need explicit management of data movement between the main memory and the local memories. The performance on such a memory system highly depends on how much of the memory latency of the DMA transfers between the main memory and the local memory of a core we can hide. Software cache is a typical solution for BVH or kd-tree, since we cannot determine which triangles will be tested until we traverse down to a leaf node. The performance, however, may suffer from a high miss rate in the software cache. Meanwhile, grid-based structures divide space and map each grid cell to a different data node without overlap. If we know the traversal direction of the ray, we can find the grid cells to be processed with a simple calculation and determine which triangles will be tested.

Fig. 3 briefly describes double buffering on a grid-based structure. Assume that we have a scene with four objects, each consisting of many triangles, as in Fig. 3 (a). We can find which grid cells are passed by a ray with a simple calculation before the actual traversal of each grid cell. Those grid cells are shown in grey in Fig. 3 (b). Among the grey cells, we can figure out which grid cells contain triangles. We request the first DMA transfer of the triangles for the dark trapezoid object as in Fig. 3 (c). Once we get the first set of triangles, we perform the intersection test for that set. At the same time, we request the DMA transfer for the white circle object in Fig. 3 (c). We keep alternating the intersection test and the DMA transfer until we find the hit point as in Fig. 3 (d). In this manner, we overlap the computation of the intersection tests with the DMA transfer for the next object to test and hide the DMA latency.

Fig. 3.
Double buffering for grid-based ray tracer: (a) Ray tracing on the scene with 4 objects; (b) Find the cells passed by a ray (those cells are indicated by grey color); (c) Perform double buffering by overlapping the computation for the dark trapezoid with the DMA transfer for the white circle; (d) Continue to process grid cells one by one until we find the hit point.

3. GRID STRUCTURE

Grid structures are regarded as having a slower traversal time than hierarchical structures, but a faster build time. In terms of data structure, hierarchical structures traverse multiple nodes across non-contiguous memory locations, even when those nodes are spatially close. In grid structures, on the other hand, adjacent grid cells are located contiguously in memory, which allows us to process spatially close grid cells with better locality. In addition, grid structures are not hierarchically organized. Thus, we can easily predict the location of the next grid cell while we traverse. These characteristics make it easy for grid-based acceleration structures to employ double buffering to hide the latency of the DMA

transfer. Grid structures were once regarded as slow acceleration structures, but they show traversal performance comparable to other, more complex acceleration structures. Moreover, their fast build times add value, particularly in real-time rendering of dynamic scenes.

3.1 Data Structure of Grid

We use the polygon file format (PLY file format), which includes vertices and faces, for the test set. Each vertex consists of x, y, and z coordinates, and each face refers to some of the vertices; we use only triangles (a face with 3 vertices). For the triangle intersection test with barycentric coordinates [9], we make a pre-computed acceleration structure. Triangles in the acceleration structure are grouped and sorted by grid cell, and the triangles crossing the boundaries of grid cells are duplicated for each grid cell they overlap. Each grid cell holds two pieces of information: the index of its triangle data in the acceleration structure and the number of triangles in the grid cell. By using these, we can easily fetch the data via DMA.

The size of a grid cell is important for the grid-based acceleration structure. If the grid cell size is too small, we perform fewer intersection tests per grid cell, but we have to traverse more grid cells. On the other hand, if the grid cell size is too large, we traverse fewer grid cells, but perform more intersection tests per grid cell. In this paper, we use the following Eq. (1) for the grid resolution [14].

\[ N_x = L_x \sqrt[3]{\frac{\lambda N_{tri}}{V}}, \quad N_y = L_y \sqrt[3]{\frac{\lambda N_{tri}}{V}}, \quad N_z = L_z \sqrt[3]{\frac{\lambda N_{tri}}{V}} \qquad (1) \]

V is the volume of the bounding box, and L_x, L_y, and L_z are the lengths of the three sides of the bounding box (i.e., V = L_x L_y L_z). The total number of triangles is N_tri, and the parameter that determines the size of a grid cell is λ. Since the total number of grid cells is N_grid = N_x N_y N_z, Eq. (1) gives N_grid = λ N_tri, so the parameter can be calculated as λ = N_grid / N_tri.
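As a worked illustration of Eq. (1), the following minimal sketch computes the grid resolution from the bounding box and the triangle count. The function name `grid_resolution` is hypothetical, and rounding to the nearest integer with a minimum of one cell per axis is an assumption made here; the text does not state how non-integer values are handled.

```python
def grid_resolution(lx, ly, lz, n_tri, lam=1.0):
    """Grid resolution (N_x, N_y, N_z) following Eq. (1).

    lx, ly, lz: side lengths of the scene bounding box
    n_tri:      total number of triangles (N_tri)
    lam:        density parameter (lambda), roughly cells per triangle
    """
    volume = lx * ly * lz
    # Common factor: cube root of (lambda * N_tri / V).
    scale = (lam * n_tri / volume) ** (1.0 / 3.0)
    # Assumption: round to the nearest integer, at least one cell per axis.
    return tuple(max(1, round(side * scale)) for side in (lx, ly, lz))


nx, ny, nz = grid_resolution(2.0, 1.0, 1.0, n_tri=2000, lam=4.0)
print(nx, ny, nz, nx * ny * nz / 2000)  # the last value is close to lambda
```

For a 2 × 1 × 1 bounding box with 2,000 triangles and λ = 4, this gives a 32 × 16 × 16 grid, i.e. 8,192 cells, so N_grid / N_tri is about 4.1 after rounding.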
If we assume that all the triangles are uniformly distributed, each grid cell contains one triangle when λ = 1. In general, we can assign an appropriate number of triangles to a grid cell by adjusting the parameter λ. If we increase λ, grid cells tend to include fewer triangles. If we decrease λ, grid cells are likely to include more triangles.

3.2 Macrocell

When we build a uniform grid structure, some grid cells may contain too many triangles, while others may have none. This imbalance often hurts the performance of the grid traversal, since we need to traverse many grid cells which have no triangles to test. To reduce such overhead, we use the idea of hierarchical grids by constructing macrocells over grid cells [8, 15]. A macrocell covers a block of m × m × m grid cells. As a result, it introduces one level of hierarchy over the grid cells. By using the macrocell structure, we can traverse cells faster when the triangles are sparsely distributed over the grid cells.

Moreover, the macrocell structure provides an advantage on multicore accelerators over other general purpose multicore processors. We can hold the information of many

more grid cells within the local memory. The data structure of the grid cells holds 3-dimensional information, the size of which can be very large. The local memory of an accelerator core is only a few hundred kilobytes. Since the accelerator core cannot access the data structures of the grid cells directly, it needs to bring them into its local memory before it can process them. A few hundred kilobytes are too small for a fairly large number of grid cells. In general, the accelerator core prefers to fetch the information for a large number of grid cells at once, as the overhead of DMA transfers is reduced by issuing fewer DMA requests. By using the macrocell structure, we can fetch the information of a macrocell into the local memory and traverse each of its grid cells without multiple DMA transfers.

For empty grid cells that have no triangles inside, we can skip fetching the detailed information of the grid cells. Without the macrocell, we do not know whether a grid cell includes triangles before bringing the data for that grid cell into the local memory. If the fetched grid cell has no triangles, the DMA request was useless. In addition, empty cells make it difficult to apply double buffering: since there are no triangles on which to compute intersection tests, the next DMA transfer cannot be overlapped with computation, which causes a performance drop. By using the macrocell, we can avoid fetching empty grid cells. We also reorder the grid cells so that the grid cells within the same macrocell are placed adjacently. When the information of the grid cells within the same macrocell is requested, a single DMA request can serve it.

4. GRID-BASED TRAVERSAL AND INTERSECTION

Ray tracing consists of five steps: ray generation, ray traversal, intersection test, shading, and writing to the frame buffer. In this section, we will present our traversal and intersection algorithms.
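Before turning to the traversal details, the macrocell bookkeeping of Section 3.2 can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary-based representation of per-cell triangle counts, and the particular sort key are assumptions made here, not the data layout used in the paper.

```python
def occupied_macrocells(tri_counts, m):
    """Return the coordinates of macrocells that contain at least one triangle.

    tri_counts: dict mapping grid cell (x, y, z) -> number of triangles
    m:          macrocell edge length in grid cells (m x m x m cells each)
    """
    return {(x // m, y // m, z // m)
            for (x, y, z), count in tri_counts.items() if count > 0}


def macrocell_order(dims, m):
    """List all grid cells so that cells of the same macrocell are contiguous.

    With this ordering, the cell records of one macrocell occupy one
    contiguous range and could be fetched with a single transfer.
    """
    nx, ny, nz = dims
    cells = [(x, y, z) for z in range(nz) for y in range(ny) for x in range(nx)]
    return sorted(cells, key=lambda c: (c[2] // m, c[1] // m, c[0] // m,
                                        c[2], c[1], c[0]))


def macrocells_to_fetch(path, occupied):
    """Drop empty macrocells from the sequence a ray packet passes through."""
    return [mc for mc in path if mc in occupied]


counts = {(0, 0, 0): 3, (5, 1, 0): 2, (2, 2, 2): 0}
occupied = occupied_macrocells(counts, m=2)
print(macrocells_to_fetch([(0, 0, 0), (1, 0, 0), (2, 0, 0)], occupied))
```

In this toy example the middle macrocell on the path is empty, so no transfer would be requested for its grid cells.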
To speed up our algorithms, we adopt the coherent grid traversal, which takes advantage of SIMD instructions for intersection tests [8]. We also apply mail-boxing [8, 12] and frustum culling [8, 13] to overcome the shortcomings of grid structures.

4.1 Coherent Grid Traversal and SIMD Intersection Test

For fast grid traversal, we use the coherent grid traversal algorithm [8]. This algorithm is about 10 times faster than the conventional 3D-DDA algorithm. First, it finds the axis of the ray packet that is aligned with the traversal direction and computes the bounding frustum of the packet. Then, it traverses the grid along the traversal axis one slice at a time. As it proceeds to the next slice, the overlap of the frustum with the next slice is computed incrementally from its overlap with the current slice.

To make full use of the accelerator cores, we employ SIMD instructions in our intersection test [9]. First, we construct a ray packet of n × n coherent rays. By using SIMD instructions, four rays are tested together at a time. The intersection test with a triangle consists of four individual tests. First, it tests the distance to the embedding plane of the triangle. Then, it tests the three barycentric coordinates of the point where the ray pierces the plane.

4.2 Mail-Boxing and Frustum Culling

Mail-boxing and frustum culling are both very effective in reducing the number of redundant intersection tests, which are a major disadvantage of uniform grid traversals. In

grid structures, a large number of triangles may overlap multiple grid cells. Since the overlapped grid cells are neighbors of one another, it is highly probable that the intersection test for the same triangle is performed multiple times. Repeatedly testing the same triangle can be avoided by mail-boxing [8, 12]. A unique identification number is assigned to each triangle, and accelerator cores record the numbers of the triangles which have already been tested. Before performing the intersection test, we check whether the identification number of the triangle is in the recorded list. If it is, the triangle has already been tested and we skip its intersection test.

Since triangles do not fit within the boundaries of grid cells as tightly as in a kd-tree, the intersection test on a grid structure visits some extra triangles which a kd-tree would avoid. If a triangle lies completely outside the frustum of the ray packet, frustum culling with barycentric coordinates lets us skip its intersection tests for the rays of the packet. Before performing the intersection test, we perform the culling test for the four corner rays of the ray packet [8, 13]. If all four corner rays are on the outside of the triangle, we do not have to perform the intersection tests for the rest of the rays.

5. DMA LATENCY HIDING

Multicore accelerators with local memories typically form a distributed memory system. On such a memory system, accelerator cores cannot access the main memory directly. Instead, we need to move the data from the main memory to the local memories of the accelerator cores by direct memory access (DMA). To reduce the overhead of the DMA latency, one technique is software cache [16, 17], which keeps the triangle data for future use.
Another technique is double buffering, which is widely used to overlap communication with computation [18]. Software cache is useful when the memory access pattern is irregular, but it can impose quite a large overhead on misses. When the access pattern is regular and we can predict the next data to access, double buffering is more effective and has less overhead. Predicting the index of the next grid cell during the traversal of grid structures is relatively easy, since grid-based structures show a regular access pattern during traversal. Thus, we adopt double buffering instead of software cache to hide the DMA latency.

We apply the double buffering scheme to three levels of data transfers via asynchronous DMA requests. Fig. 4 represents these three levels of double buffering: tile level, macrocell level, and triangle level. Each accelerator core runs the same ray tracing code, but on different tiles. The color of each pixel within a tile is calculated by the rendering algorithm, and the resulting colored tiles are sent to the main memory through DMA transfers. At this tile level, we prepare two buffers, one for the rendering computation and the other for the DMA transfer. The double buffering scheme renders the current tile while transferring the previously rendered tile. Within the rendering algorithm for a pixel, we generate a packet of coherent rays and traverse the macrocells that intersect with the ray packet. At the macrocell level, we request the DMA transfer for the grid cells that are contained within the intersected non-empty macrocells. While we transfer the next grid cells, we traverse the current grid cells, which were transferred by the previous request. At the triangle level, we traverse the grid cells to find the triangles that intersect with the ray packet. In a similar fashion,

we request the DMA transfer for the next triangles, while we perform the intersection test with the current triangles, which have been transferred previously. Once we find the first hit points for the rays in the packet, we perform the shading algorithm to calculate the colors.

Fig. 4. Double buffering for DMA latency hiding: three levels of double buffering are employed for rendering. From the top to the bottom, each represents tile level, macrocell level, and triangle level double buffering, respectively.

6. EXPERIMENTAL RESULTS

To experimentally evaluate our grid-based ray tracing on multicore accelerators with explicitly managed memories, we used a game console which contains a multicore processor. The multicore processor has one general processor and six special cores as an accelerator, which runs at 3.2 GHz. Each accelerator core has 256 KB of local memory. The main memory of the game console is 256 MB.

Table 2 shows the rendered scenes and the characteristics of the polygon models we used for our experiments. Four different models contain 36, ,000 vertexes and 70, ,000 triangles. The same conference model is rendered from two different viewpoints to generate scenes similar to those used in other ray tracers. The polygon models were downloaded from the Stanford 3D scanning repository [19].

Table 2. Scenes used in experiments.

Name         Bunny     Horse    Armadillo
#Vertexes    35,947    48,      ,974
#Triangles   69,451    96,      ,944

Name         Conference 1    Conference 2
#Vertexes    166,            ,867
#Triangles   282,            ,755

6.1 Performance of Grid-Based Ray Tracer

Table 3 shows the performance of the rendered scenes with our grid-based ray tracer on multicore accelerators with two shading variants. When we perform the ray casting without shading, the measured FPS values range from on a 6 core accelerator. Complex models such as armadillo and conference show lower FPS than bunny and horse, but the results are still competitive. When we add simple shading, the measured FPS drops by 10-20%, but remains high.

Table 3. Performance of grid-based ray tracer (fps).

                                                     Bunny   Horse   Armadillo   Conf. 1   Conf. 2
Ray casting + no shading on multicore 3.2 GHz
  1 core
  6 cores
Ray casting + simple shading on multicore 3.2 GHz
  1 core
  6 cores

Fig. 5. Scalability on multicore accelerators: FPS increases almost linearly as the number of cores increases. The dotted lines represent projected linear speedups based on 1 core performance.

The graphs in Fig. 5 show how scalable our grid-based ray tracer is. The thin dotted lines in the graph show the FPS values projected linearly from the results of the 1 core accelerator. The thick lines represent the FPS values measured by varying the number

of cores used in the accelerator. For all the scenes, the measured FPS is only slightly lower than the projected FPS. Thus, our grid-based ray tracer shows almost linear scalability up to 6 accelerator cores.

6.2 DMA Latency Hiding

Fig. 6 shows the effect of double buffering. The two bars for each scene represent the result without double buffering and the result with double buffering, respectively. The execution time, measured in seconds, is broken down into initialization, computation, and DMA latency. The initialization part includes the time spent in ray generation and parameter reset. The computation part is the time for traversal and intersection tests. The DMA latency part is the time spent waiting for the DMA transfers to complete on the three levels of DMA requests. The only difference between the two bars for each scene is whether double buffering is applied. Thus, only the DMA latency part (the top component of each bar) is reduced, as shown in Fig. 6.

The rows labeled DMA hide in Table 4 show what percentage of the DMA latency is removed by the double buffering scheme. For all experiments, 60-76% of the DMA latency is hidden by overlapping the DMA transfers with the computation. As a result, the performance of our ray tracer increases by 10-22%. The rows labeled speedup in Table 4 show these performance improvements. The two conference scenes spend a relatively high portion of their time on DMA latency, as these models have many objects inside. With double buffering, however, the performance of these two conference scenes improves more.

Fig. 6. Execution time breakdown in seconds: initialization (INIT), computation (COMP), and DMA latency (DMA). Two bars for a scene represent the results without double buffering and the results with double buffering on 1 core and 6 core accelerators.

Table 4. DMA latency hiding and speedup.
                      Bunny    Horse    Armadillo   Conf. 1   Conf. 2
DMA hide   1 core     71.7%    75.2%    75.7%       71.6%     73.5%
           6 cores    71.2%    73.8%    75.4%       62.1%     60.1%
speedup    1 core     15.4%    14.1%    10.3%       18.8%     21.8%
(fps)      6 cores    15.6%    14.7%    10.6%       16.5%     18.5%
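The two kinds of rows in Table 4 can be related by simple arithmetic. The sketch below is a back-of-the-envelope reading, not a result reported in the paper: it assumes that double buffering shortens only the DMA wait, that the speedup rows are relative FPS gains, and it uses the hypothetical function names `fps_gain` and `implied_dma_share`. Under those assumptions, if a fraction d of the baseline frame time is DMA wait and a fraction h of that wait is hidden, the new frame time is (1 - h·d) times the old one.

```python
def fps_gain(dma_share, hidden_fraction):
    """Relative FPS gain when hidden_fraction of the DMA wait is removed.

    dma_share: fraction of the baseline frame time spent waiting on DMA (d)
    hidden_fraction: fraction of that wait hidden by double buffering (h)
    """
    return 1.0 / (1.0 - hidden_fraction * dma_share) - 1.0


def implied_dma_share(hidden_fraction, gain):
    """Invert fps_gain: the baseline DMA share implied by a Table 4 column."""
    return (1.0 - 1.0 / (1.0 + gain)) / hidden_fraction


# 1-core columns of Table 4: (DMA hide, speedup)
for name, hide, gain in [("Bunny", 0.717, 0.154), ("Conf. 2", 0.735, 0.218)]:
    print(name, round(implied_dma_share(hide, gain), 3))
```

Under these assumptions the Bunny column implies that roughly 19% of the baseline frame time was DMA wait, and the Conference 2 column roughly 24%, which agrees with the observation that the conference scenes spend a larger portion of their time on DMA latency.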

6.3 Comparison with Other Ray Tracers

Fig. 7 compares the performance of our grid-based ray tracer on a 6 core accelerator with other ray tracers on a similar architecture and on several general purpose processors. We used the two conference scenes to measure the performance of our grid-based ray tracer on a 6 core accelerator. The performance figures of the other ray tracers are taken from previously published literature.

The graph in Fig. 7 (a) compares the performance of ray tracing on multicore accelerators with two different acceleration structures: BVH and grid. The performance for the BVH structure is measured on an 8 core accelerator running at 2.4 GHz [16]. Meanwhile, the performance of our grid-based ray tracing is measured on a 6 core accelerator running at 3.2 GHz. Since the theoretical peak performances of the two platforms are the same (8 cores × 2.4 GHz = 6 cores × 3.2 GHz), a direct comparison is meaningful. Our grid-based ray tracer is about half as fast, but still comparable. If we include the build time, our grid-based ray tracer can be more competitive.

The graph in Fig. 7 (b) compares the performance of our ray tracer with other ray tracers on general purpose processors from [8]. The performance of our grid-based ray tracer on a 6 core accelerator is quite impressive. In particular, our grid-based traversal performs almost four times faster than the coherent grid traversal on a general purpose CPU, from which we mainly take the ideas for our grid-based structure on multicore accelerators.

Fig. 7. Performance comparison with conference 1 for (a) and conference 2 for (b); (a) BVH vs. grid on multicore accelerators; (b) Grid on multicore accelerator vs. various ray tracers on general CPUs with/without HW multithreading (MT).

7. RELATED WORK

Multicore accelerators are appropriate architectures for ray tracing. There has been a considerable volume of ray tracing studies on multicore accelerator architectures.
The terrain rendering engine (TRE) was developed as a client-server ray casting system [20]. A client sends the user parameters for rendering, and a server performs the rendering. The rendered images are delivered in compressed form between the server and the client. The rendering engine is pipelined and optimized to use SIMD instructions.

Ray tracing with the BVH has been investigated on dual multicore accelerators, exploring the software cache technique for explicitly managed memories [16, 17]. To reduce the cache miss delay, software hyper-threading has also been studied to hide the latency of the DMA transfers for missed data [17]. To exploit the SIMD architecture of multicore accelerators, efficient SIMD intersection algorithms have been investigated for BVH traversal with packets of rays [9, 21].

The interactive ray tracer (irt) [22] for multicore accelerators was implemented using techniques introduced in previous work [16, 17]. In addition, reflection, transparency, shadows, BRDF lighting, and cubic environment-mapped textures were added to the features of the irt. The ray packet technique is applied to ambient occlusion rays as well. The irt was able to render complex scenes with over one million polygons on a cluster of eight accelerators, each of which contains eight special accelerator cores.

8. CONCLUSION

In this paper, we presented a grid-based ray tracer on multicore accelerators. We proposed a parallelization scheme for multicore accelerators with explicitly managed memories. We also introduced double buffering with macrocells over the grid to hide the DMA latency. We showed experimentally that our grid-based ray tracer scales close to linearly on a multicore accelerator. We also showed that our double buffering scheme can hide 60-76% of the DMA latency, which in turn results in a 10-22% speedup in frames per second (FPS). Compared to ray tracers on various architectures, our ray tracer on a multicore accelerator shows competitive or better performance.
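The double buffering idea summarized above can be illustrated with a minimal C sketch. It is an illustration under stated assumptions, not the implementation used in this paper: `dma_get_start` and `dma_wait` are hypothetical stand-ins (here a plain `memcpy` and a no-op) for the asynchronous DMA primitives of a real accelerator, `CELLS_PER_MACROCELL` is an arbitrary size, and `process_macrocell` merely sums values in place of the actual intersection tests. The point is only the ordering: the fetch of macrocell i+1 is started before the work on macrocell i begins, so that on real hardware the transfer would overlap with computation.

```c
#include <stddef.h>
#include <string.h>

#define CELLS_PER_MACROCELL 8   /* hypothetical macrocell size */

/* Hypothetical stand-in for an asynchronous DMA "get". A real accelerator
   would only start the transfer here and return at once; this sketch
   copies synchronously with memcpy. */
static void dma_get_start(float *local, const float *main_mem, size_t n) {
    memcpy(local, main_mem, n * sizeof *local);
}

/* Hypothetical stand-in for blocking until the transfer into `local`
   has completed. Nothing to wait for in this synchronous sketch. */
static void dma_wait(const float *local) {
    (void)local;
}

/* Placeholder for the per-macrocell work (e.g. ray-triangle tests). */
static float process_macrocell(const float *cells, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += cells[i];
    return sum;
}

/* Visit `count` macrocells stored contiguously in main memory, using two
   local buffers so that macrocell i+1 is fetched while i is processed. */
float traverse_double_buffered(const float *main_mem, size_t count) {
    float buf[2][CELLS_PER_MACROCELL];
    float total = 0.0f;

    if (count == 0)
        return total;

    /* Prime the pipeline with the first macrocell. */
    dma_get_start(buf[0], main_mem, CELLS_PER_MACROCELL);

    for (size_t i = 0; i < count; ++i) {
        size_t cur = i & 1;     /* buffer holding macrocell i   */
        size_t nxt = cur ^ 1;   /* buffer to fill with cell i+1 */

        /* Start fetching the next macrocell before working on this one. */
        if (i + 1 < count)
            dma_get_start(buf[nxt],
                          main_mem + (i + 1) * CELLS_PER_MACROCELL,
                          CELLS_PER_MACROCELL);

        /* Make sure the current buffer is ready, then process it. */
        dma_wait(buf[cur]);
        total += process_macrocell(buf[cur], CELLS_PER_MACROCELL);
    }
    return total;
}
```

With a truly asynchronous transfer, the fraction of DMA latency that is hidden depends on how the per-macrocell work compares with the transfer time, which is consistent with only part of the latency being hidden in the measurements reported above.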
Compared to the BVH-based ray tracer on a similar multicore accelerator, our grid-based ray tracer is about two times slower, but the result is still promising because grid structures can be built much faster than BVH structures. For real-time ray tracing of dynamic scenes, grid-based acceleration structures can therefore be a preferred choice. In addition, since more DMA overhead is expected for secondary rays, double buffering and the SIMD intersection test could be extended to handle secondary rays in grid-based structures. In summary, grid-based structures have much potential on modern processor architectures that embed multicore accelerators with explicitly managed memories.

REFERENCES

1. K. Asanovic, R. Bodik, B. Catanzaro, J. Gebis, P. Husbands, K. Keutzer, D. Patterson, W. Plishker, J. Shalf, S. Williams, and K. Yelick, "The landscape of parallel computing research: a view from Berkeley," Technical Report No. UCB/EECS, Department of Electrical Engineering and Computer Sciences, University of California at Berkeley.
2. H. P. Hofstee, "Power efficient processor architecture and the cell processor," in Proceedings of International Symposium on High-Performance Computer Architecture, 2005.
3. S. Williams, J. Shalf, L. Oliker, S. Kamil, P. Husbands, and K. Yelick, "The potential of the cell processor for scientific computing," in Proceedings of International Conference on Computing Frontiers, 2006.
4. T. J. Purcell, I. Buck, W. R. Mark, and P. Hanrahan, "Ray tracing on programmable graphics hardware," ACM Transactions on Graphics, Vol. 21, 2002.
5. J. Madruga, "Passive head tracking using cell processor," in International Conference and Exhibition on Computer Graphics and Interactive Techniques, youtube.com/watch?v=ryntiyyijbq.
6. T. Ize, I. Wald, and S. Parker, "Asynchronous BVH construction for ray tracing dynamic scenes on parallel multi-core architectures," in Proceedings of Eurographics Symposium on Parallel Graphics and Visualization, 2007.
7. S. Parker, W. Martin, P. P. Sloan, P. Shirley, B. Smits, and C. Hansen, "Interactive ray tracing," Interactive 3D Graphics, 1999.
8. I. Wald, T. Ize, A. Kensler, A. Knoll, and S. G. Parker, "Ray tracing animated scenes using coherent grid traversal," ACM Transactions on Graphics, Vol. 25, 2006.
9. I. Wald, "Realtime Ray Tracing and Interactive Global Illumination," Ph.D. Thesis, Department of Computer Science, Saarland University.
10. I. Wald, W. R. Mark, J. Günther, S. Boulos, T. Ize, W. A. Hunt, S. G. Parker, and P. Shirley, "State of the art in ray tracing animated scenes," Computer Graphics Forum, Vol. 28, 2009.
11. T. Akenine-Möller, E. Haines, and N. Hoffman, Real-Time Rendering, 3rd ed., A. K. Peters Ltd.
12. D. Kirk and J. Arvo, "Improved ray tagging for voxel-based ray tracing," Graphics Gems II, 1991.
13. K. Dmitriev, V. Havran, and H. P. Seidel, "Faster ray tracing with SIMD shaft culling," Research Report No. MPI-I, Max-Planck-Institut für Informatik.
14. J. Cleary, B. Wyvill, G. Birtwistle, and R. Vatti, "Design and analysis of a parallel ray tracing computer," in Proceedings of Simula Users Conference, 1984.
15. S. Parker, M. Parker, Y. Livnat, P. P. Sloan, C. Hansen, and P. Shirley, "Interactive ray tracing for volume visualization," IEEE Transactions on Visualization and Computer Graphics, Vol. 5, 1999.
16. C. Benthin, I. Wald, M. Scherbaum, and H. Friedrich, "Ray tracing on the cell processor," in Proceedings of IEEE Symposium on Interactive Ray Tracing, 2006.
17. J. Sugerman, T. Foley, S. Yoshioka, and P. Hanrahan, "Ray tracing on a cell processor with software caching," in IEEE Symposium on Interactive Ray Tracing, 2006.
18. T. Chen, Z. Sura, and K. O'Brien, "Optimizing the use of static buffers for DMA on a Cell chip," in Proceedings of International Workshop on Languages and Compilers for Parallel Computing, 2006.
19. Stanford Computer Graphics Laboratory, "The Stanford Models," The Stanford 3D Scanning Repository.
20. B. Minor, G. Fossum, and V. To, "Terrain rendering engine (TRE): Cell broadband engine optimized real-time ray-caster," IBM White Paper.
21. I. Wald, S. Boulos, and P. Shirley, "Ray tracing deformable scenes using dynamic bounding volume hierarchies," ACM Transactions on Graphics, Vol. 26, 2007.
22. B. Minor, M. Nutter, and J. Madruga, "irt: An interactive ray tracer for the Cell BE processor," IBM White Paper.

Kyunghee Cho received the B.S. degree in Electrical Engineering from Hanyang University in 2007 and the M.S. degree in Computer Science from Korea Advanced Institute of Science and Technology (KAIST). After graduation, he joined S-Core, Korea as an engineering staff member. His research interests are in the field of compiler optimizations for graphics applications. Currently, he investigates optimization opportunities in the OpenGL runtime libraries on embedded systems.

Seonggun Kim received the B.S. degree in Electrical Engineering and the Ph.D. degree in Computer Science from Korea Advanced Institute of Science and Technology (KAIST) in 2004 and 2010, respectively. He is currently a post-doctoral research associate at Sungkyunkwan University. His research interests are in the field of compiler techniques to automatically generate SIMD code and improve the memory locality for a broad range of applications.

Hwansoo Han received the B.S. and the M.S. degrees in Computer Engineering from Seoul National University, Korea in 1993 and 1995, and the Ph.D. degree in Computer Science from the University of Maryland at College Park. He is currently an Associate Professor at Sungkyunkwan University. Previously, he was with Korea Advanced Institute of Science and Technology (KAIST) and Intel. His research interests include compiler technology for high-performance computing and embedded computing.

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016 Ray Tracing Computer Graphics CMU 15-462/15-662, Fall 2016 Primitive-partitioning vs. space-partitioning acceleration structures Primitive partitioning (bounding volume hierarchy): partitions node s primitives

More information

Announcements. Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday

Announcements. Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday Announcements Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday 1 Spatial Data Structures Hierarchical Bounding Volumes Grids Octrees BSP Trees 11/7/02 Speeding Up Computations

More information

Accelerating Ray Tracing

Accelerating Ray Tracing Accelerating Ray Tracing Ray Tracing Acceleration Techniques Faster Intersections Fewer Rays Generalized Rays Faster Ray-Object Intersections Object bounding volumes Efficient intersection routines Fewer

More information

Lecture 2 - Acceleration Structures

Lecture 2 - Acceleration Structures INFOMAGR Advanced Graphics Jacco Bikker - November 2017 - February 2018 Lecture 2 - Acceleration Structures Welcome! I x, x = g(x, x ) ε x, x + න S ρ x, x, x I x, x dx Today s Agenda: Problem Analysis

More information

Scene Management. Video Game Technologies 11498: MSc in Computer Science and Engineering 11156: MSc in Game Design and Development

Scene Management. Video Game Technologies 11498: MSc in Computer Science and Engineering 11156: MSc in Game Design and Development Video Game Technologies 11498: MSc in Computer Science and Engineering 11156: MSc in Game Design and Development Chap. 5 Scene Management Overview Scene Management vs Rendering This chapter is about rendering

More information

Ray-Box Culling for Tree Structures

Ray-Box Culling for Tree Structures JOURNAL OF INFORMATION SCIENCE AND ENGINEERING XX, XXX-XXX (2012) Ray-Box Culling for Tree Structures JAE-HO NAH 1, WOO-CHAN PARK 2, YOON-SIG KANG 1, AND TACK-DON HAN 1 1 Department of Computer Science

More information

Row Tracing with Hierarchical Occlusion Maps

Row Tracing with Hierarchical Occlusion Maps Row Tracing with Hierarchical Occlusion Maps Ravi P. Kammaje, Benjamin Mora August 9, 2008 Page 2 Row Tracing with Hierarchical Occlusion Maps Outline August 9, 2008 Introduction Related Work Row Tracing

More information

Real Time Ray Tracing

Real Time Ray Tracing Real Time Ray Tracing Programação 3D para Simulação de Jogos Vasco Costa Ray tracing? Why? How? P3DSJ Real Time Ray Tracing Vasco Costa 2 Real time ray tracing : example Source: NVIDIA P3DSJ Real Time

More information

Spatial Data Structures

Spatial Data Structures CSCI 420 Computer Graphics Lecture 17 Spatial Data Structures Jernej Barbic University of Southern California Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees [Angel Ch. 8] 1 Ray Tracing Acceleration

More information

Computer Graphics (CS 543) Lecture 13b Ray Tracing (Part 1) Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI)

Computer Graphics (CS 543) Lecture 13b Ray Tracing (Part 1) Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI) Computer Graphics (CS 543) Lecture 13b Ray Tracing (Part 1) Prof Emmanuel Agu Computer Science Dept. Worcester Polytechnic Institute (WPI) Raytracing Global illumination-based rendering method Simulates

More information

Spatial Data Structures

Spatial Data Structures CSCI 480 Computer Graphics Lecture 7 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids BSP Trees [Ch. 0.] March 8, 0 Jernej Barbic University of Southern California http://www-bcf.usc.edu/~jbarbic/cs480-s/

More information

Ray Tracing Acceleration Data Structures

Ray Tracing Acceleration Data Structures Ray Tracing Acceleration Data Structures Sumair Ahmed October 29, 2009 Ray Tracing is very time-consuming because of the ray-object intersection calculations. With the brute force method, each ray has

More information

Ray Tracing with Spatial Hierarchies. Jeff Mahovsky & Brian Wyvill CSC 305

Ray Tracing with Spatial Hierarchies. Jeff Mahovsky & Brian Wyvill CSC 305 Ray Tracing with Spatial Hierarchies Jeff Mahovsky & Brian Wyvill CSC 305 Ray Tracing Flexible, accurate, high-quality rendering Slow Simplest ray tracer: Test every ray against every object in the scene

More information

Spatial Data Structures and Speed-Up Techniques. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology

Spatial Data Structures and Speed-Up Techniques. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology Spatial Data Structures and Speed-Up Techniques Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology Spatial data structures What is it? Data structure that organizes

More information

Interactive Ray Tracing: Higher Memory Coherence

Interactive Ray Tracing: Higher Memory Coherence Interactive Ray Tracing: Higher Memory Coherence http://gamma.cs.unc.edu/rt Dinesh Manocha (UNC Chapel Hill) Sung-Eui Yoon (Lawrence Livermore Labs) Interactive Ray Tracing Ray tracing is naturally sub-linear

More information

INFOGR Computer Graphics. J. Bikker - April-July Lecture 11: Acceleration. Welcome!

INFOGR Computer Graphics. J. Bikker - April-July Lecture 11: Acceleration. Welcome! INFOGR Computer Graphics J. Bikker - April-July 2015 - Lecture 11: Acceleration Welcome! Today s Agenda: High-speed Ray Tracing Acceleration Structures The Bounding Volume Hierarchy BVH Construction BVH

More information

Accelerating Ray-Tracing

Accelerating Ray-Tracing Lecture 9: Accelerating Ray-Tracing Computer Graphics and Imaging UC Berkeley CS184/284A, Spring 2016 Course Roadmap Rasterization Pipeline Core Concepts Sampling Antialiasing Transforms Geometric Modeling

More information

Accelerated Entry Point Search Algorithm for Real-Time Ray-Tracing

Accelerated Entry Point Search Algorithm for Real-Time Ray-Tracing Accelerated Entry Point Search Algorithm for Real-Time Ray-Tracing Figure 1: Four of the scenes used for testing purposes. From the left: Fairy Forest from the Utah 3D Animation Repository, Legocar from

More information

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization Duksu Kim Assistant professor, KORATEHC Education Ph.D. Computer Science, KAIST Parallel Proximity Computation on Heterogeneous Computing Systems for Graphics Applications Professional Experience Senior

More information

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y.

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y. COMP 4801 Final Year Project Ray Tracing for Computer Graphics Final Project Report FYP 15014 by Runjing Liu Advised by Dr. L.Y. Wei 1 Abstract The goal of this project was to use ray tracing in a rendering

More information

Ray Tracing with Multi-Core/Shared Memory Systems. Abe Stephens

Ray Tracing with Multi-Core/Shared Memory Systems. Abe Stephens Ray Tracing with Multi-Core/Shared Memory Systems Abe Stephens Real-time Interactive Massive Model Visualization Tutorial EuroGraphics 2006. Vienna Austria. Monday September 4, 2006 http://www.sci.utah.edu/~abe/massive06/

More information

S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T

S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T Copyright 2018 Sung-eui Yoon, KAIST freely available on the internet http://sglab.kaist.ac.kr/~sungeui/render

More information

Real-Time Rendering (Echtzeitgraphik) Dr. Michael Wimmer

Real-Time Rendering (Echtzeitgraphik) Dr. Michael Wimmer Real-Time Rendering (Echtzeitgraphik) Dr. Michael Wimmer wimmer@cg.tuwien.ac.at Visibility Overview Basics about visibility Basics about occlusion culling View-frustum culling / backface culling Occlusion

More information

Spatial Data Structures

Spatial Data Structures 15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) March 28, 2002 [Angel 8.9] Frank Pfenning Carnegie

More information

The Traditional Graphics Pipeline

The Traditional Graphics Pipeline Last Time? The Traditional Graphics Pipeline Participating Media Measuring BRDFs 3D Digitizing & Scattering BSSRDFs Monte Carlo Simulation Dipole Approximation Today Ray Casting / Tracing Advantages? Ray

More information

Spatial Data Structures

Spatial Data Structures 15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) April 1, 2003 [Angel 9.10] Frank Pfenning Carnegie

More information

Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm

Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm Department of Computer Science and Engineering Sogang University, Korea Improving Memory

More information

Ray Tracing III. Wen-Chieh (Steve) Lin National Chiao-Tung University

Ray Tracing III. Wen-Chieh (Steve) Lin National Chiao-Tung University Ray Tracing III Wen-Chieh (Steve) Lin National Chiao-Tung University Shirley, Fundamentals of Computer Graphics, Chap 10 Doug James CG slides, I-Chen Lin s CG slides Ray-tracing Review For each pixel,

More information

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes B-KD rees for Hardware Accelerated Ray racing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek Saarland University, Germany Outline Previous Work B-KD ree as new Spatial Index Structure DynR

More information

Intersection Acceleration

Intersection Acceleration Advanced Computer Graphics Intersection Acceleration Matthias Teschner Computer Science Department University of Freiburg Outline introduction bounding volume hierarchies uniform grids kd-trees octrees

More information

Sung-Eui Yoon ( 윤성의 )

Sung-Eui Yoon ( 윤성의 ) CS380: Computer Graphics Ray Tracing Sung-Eui Yoon ( 윤성의 ) Course URL: http://sglab.kaist.ac.kr/~sungeui/cg/ Class Objectives Understand overall algorithm of recursive ray tracing Ray generations Intersection

More information

Spatial Data Structures

Spatial Data Structures Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) [Angel 9.10] Outline Ray tracing review what rays matter? Ray tracing speedup faster

More information

Ray Casting Deformable Models on the GPU

Ray Casting Deformable Models on the GPU Ray Casting Deformable Models on the GPU Suryakant Patidar and P. J. Narayanan Center for Visual Information Technology, IIIT Hyderabad. {skp@research., pjn@}iiit.ac.in Abstract The GPUs pack high computation

More information

Part IV. Review of hardware-trends for real-time ray tracing

Part IV. Review of hardware-trends for real-time ray tracing Part IV Review of hardware-trends for real-time ray tracing Hardware Trends For Real-time Ray Tracing Philipp Slusallek Saarland University, Germany Large Model Visualization at Boeing CATIA Model of Boeing

More information

INFOMAGR Advanced Graphics. Jacco Bikker - February April Welcome!

INFOMAGR Advanced Graphics. Jacco Bikker - February April Welcome! INFOMAGR Advanced Graphics Jacco Bikker - February April 2016 Welcome! I x, x = g(x, x ) ε x, x + S ρ x, x, x I x, x dx Today s Agenda: Introduction Ray Distributions The Top-level BVH Real-time Ray Tracing

More information

Comparison of hierarchies for occlusion culling based on occlusion queries

Comparison of hierarchies for occlusion culling based on occlusion queries Comparison of hierarchies for occlusion culling based on occlusion queries V.I. Gonakhchyan pusheax@ispras.ru Ivannikov Institute for System Programming of the RAS, Moscow, Russia Efficient interactive

More information

Accelerated Ambient Occlusion Using Spatial Subdivision Structures

Accelerated Ambient Occlusion Using Spatial Subdivision Structures Abstract Ambient Occlusion is a relatively new method that gives global illumination like results. This paper presents a method to accelerate ambient occlusion using the form factor method in Bunnel [2005]

More information

Lecture 4 - Real-time Ray Tracing

Lecture 4 - Real-time Ray Tracing INFOMAGR Advanced Graphics Jacco Bikker - November 2017 - February 2018 Lecture 4 - Real-time Ray Tracing Welcome! I x, x = g(x, x ) ε x, x + න S ρ x, x, x I x, x dx Today s Agenda: Introduction Ray Distributions

More information

Massive Model Visualization using Real-time Ray Tracing

Massive Model Visualization using Real-time Ray Tracing Massive Model Visualization using Real-time Ray Tracing Eurographics 2006 Tutorial: Real-time Interactive Massive Model Visualization Andreas Dietrich Philipp Slusallek Saarland University & intrace GmbH

More information

The Traditional Graphics Pipeline

The Traditional Graphics Pipeline Last Time? The Traditional Graphics Pipeline Reading for Today A Practical Model for Subsurface Light Transport, Jensen, Marschner, Levoy, & Hanrahan, SIGGRAPH 2001 Participating Media Measuring BRDFs

More information

Realtime Ray Tracing and its use for Interactive Global Illumination

Realtime Ray Tracing and its use for Interactive Global Illumination EUROGRAPHICS 2003 STAR State of The Art Report Realtime Ray Tracing and its use for Interactive Global Illumination Ingo Wald Timothy J.Purcell Jörg Schmittler {wald,schmittler,benthin,slusallek}@graphics.cs.uni-sb.de

More information

Evaluation and Improvement of GPU Ray Tracing with a Thread Migration Technique

Evaluation and Improvement of GPU Ray Tracing with a Thread Migration Technique Evaluation and Improvement of GPU Ray Tracing with a Thread Migration Technique Xingxing Zhu and Yangdong Deng Institute of Microelectronics, Tsinghua University, Beijing, China Email: zhuxingxing0107@163.com,

More information

The Traditional Graphics Pipeline

The Traditional Graphics Pipeline Final Projects Proposals due Thursday 4/8 Proposed project summary At least 3 related papers (read & summarized) Description of series of test cases Timeline & initial task assignment The Traditional Graphics

More information

High Definition Interactive Animated Ray Tracing on CELL Processor using Coherent Grid Traversal

High Definition Interactive Animated Ray Tracing on CELL Processor using Coherent Grid Traversal High Definition Interactive Animated Ray Tracing on CELL Processor using Coherent Grid Traversal David R. Chapman University of Maryland Baltimore County Abstract The IBM/Toshiba/Sony CELL processor exhibited

More information

Computer Graphics. - Ray-Tracing II - Hendrik Lensch. Computer Graphics WS07/08 Ray Tracing II

Computer Graphics. - Ray-Tracing II - Hendrik Lensch. Computer Graphics WS07/08 Ray Tracing II Computer Graphics - Ray-Tracing II - Hendrik Lensch Overview Last lecture Ray tracing I Basic ray tracing What is possible? Recursive ray tracing algorithm Intersection computations Today Advanced acceleration

More information

CS 563 Advanced Topics in Computer Graphics Culling and Acceleration Techniques Part 1 by Mark Vessella

CS 563 Advanced Topics in Computer Graphics Culling and Acceleration Techniques Part 1 by Mark Vessella CS 563 Advanced Topics in Computer Graphics Culling and Acceleration Techniques Part 1 by Mark Vessella Introduction Acceleration Techniques Spatial Data Structures Culling Outline for the Night Bounding

More information

COMP 175: Computer Graphics April 11, 2018

COMP 175: Computer Graphics April 11, 2018 Lecture n+1: Recursive Ray Tracer2: Advanced Techniques and Data Structures COMP 175: Computer Graphics April 11, 2018 1/49 Review } Ray Intersect (Assignment 4): questions / comments? } Review of Recursive

More information

Software Occlusion Culling

Software Occlusion Culling Software Occlusion Culling Abstract This article details an algorithm and associated sample code for software occlusion culling which is available for download. The technique divides scene objects into

More information

Computer Graphics. - Rasterization - Philipp Slusallek

Computer Graphics. - Rasterization - Philipp Slusallek Computer Graphics - Rasterization - Philipp Slusallek Rasterization Definition Given some geometry (point, 2D line, circle, triangle, polygon, ), specify which pixels of a raster display each primitive

More information

Fast kd-tree Construction for 3D-Rendering Algorithms Like Ray Tracing

Fast kd-tree Construction for 3D-Rendering Algorithms Like Ray Tracing Fast kd-tree Construction for 3D-Rendering Algorithms Like Ray Tracing Sajid Hussain and Håkan Grahn Blekinge Institute of Technology SE-371 79 Karlskrona, Sweden {sajid.hussain,hakan.grahn}@bth.se http://www.bth.se/tek/paarts

More information

INFOMAGR Advanced Graphics. Jacco Bikker - February April Welcome!

INFOMAGR Advanced Graphics. Jacco Bikker - February April Welcome! INFOMAGR Advanced Graphics Jacco Bikker - February April 2016 Welcome! I x, x = g(x, x ) ε x, x + S ρ x, x, x I x, x dx Today s Agenda: Introduction : GPU Ray Tracing Practical Perspective Advanced Graphics

More information

Stackless Ray Traversal for kd-trees with Sparse Boxes

Stackless Ray Traversal for kd-trees with Sparse Boxes Stackless Ray Traversal for kd-trees with Sparse Boxes Vlastimil Havran Czech Technical University e-mail: havranat f el.cvut.cz Jiri Bittner Czech Technical University e-mail: bittnerat f el.cvut.cz November

More information

Chapter 11 Global Illumination. Part 1 Ray Tracing. Reading: Angel s Interactive Computer Graphics (6 th ed.) Sections 11.1, 11.2, 11.

Chapter 11 Global Illumination. Part 1 Ray Tracing. Reading: Angel s Interactive Computer Graphics (6 th ed.) Sections 11.1, 11.2, 11. Chapter 11 Global Illumination Part 1 Ray Tracing Reading: Angel s Interactive Computer Graphics (6 th ed.) Sections 11.1, 11.2, 11.3 CG(U), Chap.11 Part 1:Ray Tracing 1 Can pipeline graphics renders images

More information

Accelerated Raytracing

Accelerated Raytracing Accelerated Raytracing Why is Acceleration Important? Vanilla ray tracing is really slow! mxm pixels, kxk supersampling, n primitives, average ray path length of d, l lights, 2 recursive ray casts per

More information

Visible-Surface Detection Methods. Chapter? Intro. to Computer Graphics Spring 2008, Y. G. Shin

Visible-Surface Detection Methods. Chapter? Intro. to Computer Graphics Spring 2008, Y. G. Shin Visible-Surface Detection Methods Chapter? Intro. to Computer Graphics Spring 2008, Y. G. Shin The Visibility Problem [Problem Statement] GIVEN: a set of 3-D surfaces, a projection from 3-D to 2-D screen,

More information

Acceleration Structure for Animated Scenes. Copyright 2010 by Yong Cao

Acceleration Structure for Animated Scenes. Copyright 2010 by Yong Cao t min X X Y 1 B C Y 1 Y 2 A Y 2 D A B C D t max t min X X Y 1 B C Y 2 Y 1 Y 2 A Y 2 D A B C D t max t min X X Y 1 B C Y 1 Y 2 A Y 2 D A B C D t max t min A large tree structure change. A totally new tree!

More information

Computer Graphics. Bing-Yu Chen National Taiwan University

Computer Graphics. Bing-Yu Chen National Taiwan University Computer Graphics Bing-Yu Chen National Taiwan University Visible-Surface Determination Back-Face Culling The Depth-Sort Algorithm Binary Space-Partitioning Trees The z-buffer Algorithm Scan-Line Algorithm

More information

Interactive Isosurface Ray Tracing of Large Octree Volumes

Interactive Isosurface Ray Tracing of Large Octree Volumes Interactive Isosurface Ray Tracing of Large Octree Volumes Aaron Knoll, Ingo Wald, Steven Parker, and Charles Hansen Scientific Computing and Imaging Institute University of Utah 2006 IEEE Symposium on

More information

Computer Graphics. Bing-Yu Chen National Taiwan University The University of Tokyo

Computer Graphics. Bing-Yu Chen National Taiwan University The University of Tokyo Computer Graphics Bing-Yu Chen National Taiwan University The University of Tokyo Hidden-Surface Removal Back-Face Culling The Depth-Sort Algorithm Binary Space-Partitioning Trees The z-buffer Algorithm

More information

Applications of Explicit Early-Z Culling

Applications of Explicit Early-Z Culling Applications of Explicit Early-Z Culling Jason L. Mitchell ATI Research Pedro V. Sander ATI Research Introduction In past years, in the SIGGRAPH Real-Time Shading course, we have covered the details of

More information

Effects needed for Realism. Ray Tracing. Ray Tracing: History. Outline. Foundations of Computer Graphics (Spring 2012)

Effects needed for Realism. Ray Tracing. Ray Tracing: History. Outline. Foundations of Computer Graphics (Spring 2012) Foundations of omputer Graphics (Spring 202) S 84, Lecture 5: Ray Tracing http://inst.eecs.berkeley.edu/~cs84 Effects needed for Realism (Soft) Shadows Reflections (Mirrors and Glossy) Transparency (Water,

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

CS580: Ray Tracing. Sung-Eui Yoon ( 윤성의 ) Course URL:

CS580: Ray Tracing. Sung-Eui Yoon ( 윤성의 ) Course URL: CS580: Ray Tracing Sung-Eui Yoon ( 윤성의 ) Course URL: http://sglab.kaist.ac.kr/~sungeui/gcg/ Recursive Ray Casting Gained popularity in when Turner Whitted (1980) recognized that recursive ray casting could

More information

SUMMARY. CS380: Introduction to Computer Graphics Ray tracing Chapter 20. Min H. Kim KAIST School of Computing 18/05/29. Modeling

SUMMARY. CS380: Introduction to Computer Graphics Ray tracing Chapter 20. Min H. Kim KAIST School of Computing 18/05/29. Modeling CS380: Introduction to Computer Graphics Ray tracing Chapter 20 Min H. Kim KAIST School of Computing Modeling SUMMARY 2 1 Types of coordinate function Explicit function: Line example: Implicit function:

More information

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015 Enhancing Traditional Rasterization Graphics with Ray Tracing October 2015 James Rumble Developer Technology Engineer, PowerVR Graphics Overview Ray Tracing Fundamentals PowerVR Ray Tracing Pipeline Using

More information

FRUSTUM-TRACED RASTER SHADOWS: REVISITING IRREGULAR Z-BUFFERS

FRUSTUM-TRACED RASTER SHADOWS: REVISITING IRREGULAR Z-BUFFERS FRUSTUM-TRACED RASTER SHADOWS: REVISITING IRREGULAR Z-BUFFERS Chris Wyman, Rama Hoetzlein, Aaron Lefohn 2015 Symposium on Interactive 3D Graphics & Games CONTRIBUTIONS Full scene, fully dynamic alias-free

More information

Speeding up your game

Speeding up your game Speeding up your game The scene graph Culling techniques Level-of-detail rendering (LODs) Collision detection Resources and pointers (adapted by Marc Levoy from a lecture by Tomas Möller, using material

More information

MULTI-LEVEL GRID STRATEGIES FOR RAY TRACING Improving Render Time Performance for Row Displacement Compressed Grids

Vasco Costa, João Madeiras Pereira. INESC-ID / IST, Rua Alves Redol 9, Apartado 1369,

Spatial Data Structures. Steve Rotenberg CSE168: Rendering Algorithms UCSD, Spring 2017

Ray Intersections: We can roughly estimate the time to render an image as being proportional to the number of ray-triangle

Computer Graphics. - Spatial Index Structures - Philipp Slusallek

Overview. Last lecture: Overview of ray tracing, Ray-primitive intersections. Today: Acceleration structures, Bounding Volume Hierarchies (BVH)

Acceleration Data Structures

CT4510: Computer Graphics. BOCHANG MOON. Procedure for Ray Tracing: For each pixel, Generate a primary ray (with depth 0); While (depth < d) { Find the closest intersection

Ray Tracing with Sparse Boxes

Vlastimil Havran, Czech Technical University. Jiří Bittner, Czech Technical University, Vienna University of Technology. Figure : (left) A ray casted view of interior of a larger

RACBVHs: Random Accessible Compressed Bounding Volume Hierarchies

Published at IEEE Transactions on Visualization and Computer Graphics, 2010, Vol. 16, Num. 2, pp. 273-286. Tae Joon Kim, joint work with

Design and Evaluation of a Hardware Accelerated Ray Tracing Data Structure

EG UK Theory and Practice of Computer Graphics (2009), Wen Tang, John Collomosse (Editors). Michael Steffen and Joseph Zambreno, Department

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology

Ray tracing in standard graphics study is a method of following the path of a photon from the light source to the camera,

Real-time Ray Tracing on Programmable Graphics Hardware

Timothy J. Purcell, Ian Buck, William R. Mark, Pat Hanrahan. Stanford University (Bill Mark is currently at NVIDIA). Abstract: Recently a breakthrough

Universiteit Leiden Computer Science

Optimizing octree updates for visibility determination on dynamic scenes. Name: Hans Wortel. Student-no: 0607940. Date: 28/07/2011. 1st supervisor: Dr. Michael Lew. 2nd

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for

Comparison of Two Image-Space Subdivision Algorithms for Direct Volume Rendering on Distributed-Memory Multicomputers. Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Dept. of Computer Eng. and

A Hardware Pipeline for Accelerating Ray Traversal Algorithms on Streaming Processors

Michael Steffen, Electrical and Computer Engineering, Iowa State University, steffma@iastate.edu. Joseph Zambreno, Electrical

PowerVR Hardware. Architecture Overview for Developers

Imagination Technologies. Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

Advanced Ray Tracing

Thanks to Fredo Durand and Barb Cutler. The Ray Tree: Ni surface normal, Ri reflected ray, Li shadow ray, Ti transmitted (refracted) ray. MIT EECS 6.837, Cutler and Durand. Ray Tree

Acceleration Structures. CS 6965 Fall 2011

Run Program 1 in simhwrt. Lab time? Program 2: Also run Program 2 and include that output. Inheritance probably doesn't work. Boxes: Axis aligned boxes, Parallelepiped, 12 triangles?

Ray Tracing on the Cell Processor

Carsten Benthin, Ingo Wald, Michael Scherbaum, Heiko Friedrich. intrace Realtime Ray Tracing GmbH, SCI Institute, University of Utah, Saarland University. {benthin, scherbaum}@intrace.com,

Fast BVH Construction on GPUs

Published in EUROGRAPHICS (2009). C. Lauterbach, M. Garland, S. Sengupta, D. Luebke, D. Manocha. University of North Carolina at Chapel Hill, NVIDIA, University of California

REDUCING RENDER TIME IN RAY TRACING

REDUCING RENDER TIME IN RAY TRACING BY PIXEL AVERAGING. Ali Asghar Behmanesh 1, Shahin Pourbahrami 2, Behrouz Gholizadeh 3. 1 Computer Department, Avecina University, Hamedan-Iran, aa.behmanesh@gmail.com 2

Anti-aliased and accelerated ray tracing. University of Texas at Austin CS384G - Computer Graphics Fall 2010 Don Fussell

Reading. Required: Watt, sections 12.5.3-12.5.4, 14.7. Further reading: A. Glassner.

A Parallel Algorithm for Construction of Uniform Grids

Javor Kalojanov, Saarland University. Philipp Slusallek, Saarland University, DFKI Saarbrücken. Abstract: We present a fast, parallel GPU algorithm for

Deformable and Fracturing Objects

Interactive Collision Detection for Deformable and Fracturing Objects. Sung-Eui Yoon (윤성의), IWON associate professor, KAIST. http://sglab.kaist.ac.kr/~sungeui/ Acknowledgements: Research collaborators

Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010

Presentation by Henrik H. Knutsen for TDT24, fall 2012.

Interactive High Resolution Isosurface Ray Tracing on Multi-Core Processors

Qin Wang, Joseph JaJa. Institute for Advanced Computer Studies, Department of Electrical and Computer Engineering, University

Effects needed for Realism. Computer Graphics (Fall 2008) Ray Tracing. Ray Tracing: History. Outline

Computer Graphics (Fall 2008), COMS 4160, Lecture 15: Ray Tracing. http://www.cs.columbia.edu/~cs4160 Effects needed for Realism: (Soft) Shadows, Reflections (Mirrors and Glossy), Transparency (Water, Glass)

Computer Graphics. - Ray Tracing I - Marcus Magnor Philipp Slusallek. Computer Graphics WS05/06 Ray Tracing I

Overview. Last Lecture: Introduction. Today: Ray tracing I, Background, Basic ray tracing, What is possible?, Recursive ray tracing algorithm

New Reliable Algorithm of Ray Tracing. through Hexahedral Mesh

Applied Mathematical Sciences, Vol. 8, 2014, no. 24, 1171-1176. HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.4159 R. P.

Speeding Up Ray Tracing. Optimisations. Ray Tracing Acceleration

Anthony Steed 1999, Celine Loscos 2005, Jan Kautz 2007-2009. Optimisations: Limit the number of rays. Make the ray test faster for shadow rays the main drain on resources if there are

Realtime Ray Tracing

Meinrad Recheis, Vienna University of Technology. Figure 1: Images rendered in realtime with OpenRT on PC clusters at resolution 640×480. a) A Mercedes C-Class model consisting of 320.000

Ray Tracing. Cornell CS4620/5620 Fall 2012 Lecture Kavita Bala 1 (with previous instructors James/Marschner)

CS4620/5620: Lecture 37, Ray Tracing. Announcements: Review session Tuesday 7-9, Phillips 101. Posted notes on slerp and perspective-correct texturing. Prelim on Thu in B17 at 7:30pm. Basic ray tracing: Basic

Hardware-driven visibility culling

I. Introduction. 20073114 김정현. The goal of 3D graphics is to generate a realistic and accurate 3D image. To achieve this, it needs to process not only large amount

Identifying those parts of a scene that are visible from a chosen viewing position, and only process (scan convert) those parts

Visible Surface Detection. Two approaches: 1. Object space methods 2. Image

Ray Intersection Acceleration

Image Synthesis, Torsten Möller. Reading: Physically Based Rendering by Pharr & Humphreys. Chapter 2 - rays and transformations, Chapter 3 - shapes, Chapter 4 - intersections and