Enhancing Visual Rendering on Multicore Accelerators with Explicitly Managed Memories *
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, (2012)

KYUNGHEE CHO¹, SEONGGUN KIM² AND HWANSOO HAN²,⁺
¹S-Core Corporation, Seongnam, Korea
²School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea

Recent electronic devices are equipped with processors extended with multicore accelerators, which exploit the performance of acceleration co-processors. Such high-end electronic products must be able to run graphics-rich applications. Scalable acceleration co-processors are frequently designed as multicores with explicitly managed memories. These architectures require sophisticated data management between the main memory and the local memories to fully exploit their potential performance. Ray tracing is a high-quality rendering algorithm in computer graphics with abundant parallelism to exploit. On explicitly managed memory hierarchies, however, ray tracing with complex data structures tends to suffer from irregular memory accesses and inefficient data management. Compared to other acceleration structures for ray tracing, the grid structure is simple to manage but is commonly regarded as too slow. However, recent improvements to the grid structure with SIMD optimizations show performance comparable to the kd-tree, one of the fastest acceleration structures. We introduce a grid-based parallel ray tracer on a processor with a multicore accelerator. We adopt SIMD optimizations and double buffering to enhance the performance of the grid-based ray tracer, and we propose a macrocell structure over the grid to fully exploit the memory bandwidth. In our experiments, our ray tracing scheme shows performance comparable to a BVH-based ray tracer.

Keywords: ray tracing, multicore accelerator, grid structure, DMA latency hiding, explicitly managed memory

Received May 31, 2011; accepted March 31, …; communicated by Jiman Hong, Junyoung Heo and Tei-Wei Kuo.
* This work was supported by the Ministry of Education, Science, and Technology, Korea under NRF Grant No. NRF… and by the Ministry of Knowledge Economy, Korea under NIPA ITRC program No. NIPA-2012-H…
⁺ Corresponding author.

1. INTRODUCTION

Recent advances in microprocessors allow off-the-shelf processors to integrate powerful accelerators on the same chip. For example, one such multicore processor has nine processing cores: one general-purpose core and eight specialized cores. While applications run on the general-purpose core, the parts that demand high performance are offloaded to the specialized cores. Processors of this kind provide up to hundreds of Gflops of total processing power, which is competitive with powerful GPUs. Such a processor with a multicore accelerator was originally developed to accelerate multimedia and vector processing applications. Thus, its main target applications include digital media, image processing, compression, encryption, DSP, ray tracing, high performance computing, pattern matching, network security, etc. [1-3]. As electronics companies take increasing interest in high-quality applications on their devices, some high-end electronic devices are equipped with this kind of multicore processor. Game consoles built on such processors provide realistic game play and high-quality graphics. High-end HDTVs equipped with such processors can decode multiple video streams in software.

Ray tracing is one of the most representative rendering algorithms for three-dimensional scenes, and its image quality is far better than that of traditional rasterization. Since the color of each pixel can be computed independently of other pixels, ray tracing offers abundant parallelism. Moreover, many fast traversal and intersection algorithms for ray tracing have been developed using SIMD optimizations. Ray tracing is therefore a promising way to distinguish the graphics quality of future electronic products [4, 5].

The ray tracing algorithm shoots a ray from the eye through the screen into 3D space, finds the nearest point where the ray hits an object, and calculates the pixel color from the information about objects and lights. After the first hit, more rays can be generated and traced recursively for reflection, refraction, and shadows. A naive traversal step would test every triangle in the scene for intersection and keep the nearest hit, but testing all the triangles for every ray is inefficient. Skipping the triangles that the ray can never hit improves performance substantially, and acceleration structures for ray tracing were proposed to implement this idea.
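To make the cost of the naive approach concrete, the following Python sketch tests every triangle for a single ray and keeps the nearest hit, so each ray costs O(n) triangle tests. The per-triangle test follows the structure described in Section 4.1 (distance to the triangle's embedding plane, then its barycentric coordinates), but it is written here for one ray instead of four SIMD lanes. The function names and epsilon thresholds are our own assumptions, and the code is only an illustration.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect(orig: Vec, d: Vec, tri, t_max: float):
    """Return the hit distance t, or None if the ray misses the triangle."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    n = cross(e1, e2)                        # normal of the embedding plane
    denom = dot(n, d)
    if abs(denom) < 1e-12:                   # ray parallel to the plane
        return None
    t = dot(n, sub(v0, orig)) / denom
    if not (1e-9 < t < t_max):               # test 1: distance to the plane
        return None
    p = (orig[0] + t * d[0], orig[1] + t * d[1], orig[2] + t * d[2])
    nn = dot(n, n)
    u = dot(n, cross(sub(p, v0), e2)) / nn   # p = v0 + u*e1 + v*e2
    v = dot(n, cross(e1, sub(p, v0))) / nn
    if u < 0.0 or v < 0.0 or u + v > 1.0:    # tests 2-4: barycentric coordinates
        return None
    return t

def nearest_hit(orig: Vec, d: Vec, triangles):
    """Brute force: test every triangle and keep the closest hit."""
    best_t, best_i = math.inf, -1
    for i, tri in enumerate(triangles):
        t = intersect(orig, d, tri, best_t)  # only hits closer than best_t count
        if t is not None:
            best_t, best_i = t, i
    return best_i, best_t
```

Passing the current best distance as `t_max` lets farther triangles be rejected after the plane test. However, every triangle still has to be visited, and acceleration structures exist to avoid exactly that.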
Bounding volume hierarchy (BVH), grid, octree, binary space partition (BSP), and kd-tree are examples of such structures [6]. Fig. 1 shows representative acceleration structures for ray tracing.

Fig. 1. Acceleration structures: (a) Grid; (b) Octree; (c) Bounding volume hierarchy (BVH); (d) kd-tree.

Acceleration structures can be classified by how they are built. Spatial subdivision divides triangles according to their location in space; grid, octree, and kd-tree belong to this category. The advantage of spatial subdivision is that traversal can exit early once it finds the triangle hit by the ray, without checking the remaining triangles. The downside is that intersection tests may be repeated many times when a triangle stretches over more than one sub-space. Hierarchical object grouping, in contrast, collects triangles grouped by object and hierarchically encloses them within bounding shapes; BVH, skd-tree, and bkd-tree belong to this category. Traversal algorithms for these hierarchies start from the root node and proceed toward the leaf nodes, checking the intersection between the ray and each bounding shape. If a bounding shape does not intersect the ray, all the triangles enclosed by that shape can be skipped.

Complex acceleration structures, which are built adaptively according to the distribution of objects, take a long time to construct before ray tracing can start. For static scenes, the initial build time can be amortized by rendering the same scene many times while navigating it. As ray tracers become faster, rendering dynamic scenes becomes an important feature [6-11]. One implication of dynamic rendering is that the structure build time must be included in the rendering time. Most acceleration structures adapt to the geometry and incur heavy build costs. Grid structures, in contrast, are generally very fast to build: since a grid simply projects triangles into uniformly divided cells, its build time is far lower than that of any other acceleration structure.

We also investigate which acceleration structure suits explicitly managed memories. The uniformity of the grid structure is an advantage when managing data between the main memory and the local stores of the specialized cores. The grid structure is generally regarded as slower than adaptive acceleration structures, but recent techniques that exploit SIMD instructions favor its uniform and regular shape. In this paper, we further investigate techniques for grid-based structures on multicore accelerators with explicitly managed memories.

The main contributions of our paper are as follows. We propose a grid-based ray tracer on multicore accelerators with explicitly managed memories. We propose a parallelization technique for ray tracing that can hide the DMA latency.
We experimentally show that our grid-based ray tracer is comparable to hierarchical traversals on multicore accelerators with explicitly managed memories.

2. OVERVIEW OF GRID-BASED RAY TRACING

To implement an efficient ray tracer, several components must be taken into account: structure building, structure traversal, and intersection testing. The choice of acceleration structure often determines how efficient these components are. Building time is often ignored because most rendering algorithms focus on static scenes. Thanks to fast processors and advanced ray tracing algorithms, some ray tracers achieve real-time rendering and can therefore handle dynamic scenes. Once dynamic scenes are considered, building the acceleration structure becomes an important issue in ray tracing.

Well-known acceleration structures such as BVH and kd-tree are hierarchical and adaptive, whereas grid structures are a uniform spatial subdivision. Table 1 gives a brief comparison. In terms of traversal time, BVH and kd-tree are faster than the grid, since they build trees for fast traversal. Traversal algorithms for grid structures are considered slow, since each ray traverses the grid cell by cell using the 3D-DDA algorithm. To improve performance, we use the coherent grid traversal [8]. Instead of processing each ray individually, the coherent grid traversal processes a packet of multiple rays together using SIMD instructions. It improves on the performance of 3D-DDA by a factor of about 10 and is comparable with kd-tree traversals. Grid-based traversals often suffer from repeated intersection tests for triangles that overlap two or more grid cells. We use mail-boxing [12] and frustum culling [13] to reduce this overhead.

As for build time, BVH and kd-tree take longer to build than simple spatial subdivision structures such as grids. Special cases such as deformable motion, refitting, and incremental updates can be handled quickly by BVH or kd-tree, but these structures are slow to build in general. Grid-based acceleration structures, on the other hand, have plenty of potential for real-time ray tracers on modern processor architectures.

Table 1. Characteristics of acceleration structures.
Acceleration structure | Partition method | Hierarchy | Traversal time* | Build time*
Grid | Uniform | No | O(∛n) | O(n)
BVH | Object | Yes | O(log n) | O(n log n)
kd-tree | Adaptive | Yes | O(log n) | O(n log n)
* n is the number of triangles.

Fig. 2. Parallel programming model for the multicore accelerator.

To parallelize ray tracing on multicore accelerators, we use the single program multiple data (SPMD) programming model. Each core within the accelerator processes different rays to find their intersections with the triangles in the grid cells. Since each ray passes through different grid cells, the processing time differs from core to core, which may cause load imbalance. To avoid load imbalance, we use dynamic scheduling: we divide the screen into tiles of n × n pixels and distribute them to the cores. When a core finishes its tile, the master processor gives it another one, which keeps the workload balanced among the cores. As described in Fig. 2, the master processor constructs the grid-based data structures for ray tracing, initializes the cores within the accelerator, and schedules the pixel tiles to render. Each core renders the pixels within its assigned tiles and returns the resulting pixel colors to the main memory. Since rendering disjoint tiles is independent work, the cores run the steps of ray tracing in parallel: ray generation, ray traversal, intersection test, shading, and writing to the frame buffer. Once all the tiles have been processed, the master processor displays the result on the screen.
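The dynamic tile scheduling above can be sketched in Python as follows. This is only a model of the scheduling logic, under stated assumptions: threads stand in for accelerator cores, and a shared queue stands in for the master handing out tiles on request. Because of Python's global interpreter lock, the threads demonstrate the work distribution but not a real parallel speedup. `shade` is a hypothetical placeholder for the per-pixel ray tracing pipeline.

```python
import queue
import threading

def make_tiles(width, height, n):
    """Split the screen into n x n pixel tiles (edge tiles may be smaller)."""
    return [(x, y, min(n, width - x), min(n, height - y))
            for y in range(0, height, n)
            for x in range(0, width, n)]

def shade(x, y):
    """Hypothetical stand-in for ray generation, traversal, intersection and shading."""
    return (x * 31 + y * 17) % 256

def core_loop(core_id, tiles, framebuffer, width, done_by):
    """One accelerator core: keep requesting tiles until none are left."""
    while True:
        try:
            x0, y0, w, h = tiles.get_nowait()  # dynamic scheduling: next free tile
        except queue.Empty:
            return
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                framebuffer[y * width + x] = shade(x, y)  # tiles never overlap
        done_by[core_id].append((x0, y0, w, h))

def render(width, height, n, cores):
    """Master: build the tile queue, start the cores, wait for all tiles."""
    tiles = queue.Queue()
    for tile in make_tiles(width, height, n):
        tiles.put(tile)
    framebuffer = [None] * (width * height)
    done_by = [[] for _ in range(cores)]
    workers = [threading.Thread(target=core_loop,
                                args=(c, tiles, framebuffer, width, done_by))
               for c in range(cores)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return framebuffer, done_by
```

A core that draws expensive tiles simply takes fewer of them, which is how the scheme balances load without knowing per-tile costs in advance.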
The memory system of multicore accelerators consists of multiple local memories, which require explicit management of data movement between the main memory and the local memories. Performance on such a memory system depends heavily on how well the latency of DMA transfers between the main memory and each core's local memory can be hidden. A software cache is the typical solution for BVH or kd-tree, since we cannot determine which triangles will be tested until we traverse down to a leaf node. Performance, however, may suffer from a high miss rate in the software cache. Grid-based structures, in contrast, divide space and map each grid cell to a distinct data node without overlap. If we know the direction of the ray, a simple calculation finds the grid cells to be processed and therefore determines which triangles will be tested.

Fig. 3 briefly describes double buffering on a grid-based structure. Assume a scene with four objects, each consisting of many triangles, as in Fig. 3 (a). Before actually traversing the grid, we can find which cells a ray passes through with a simple calculation; these cells are shaded grey in Fig. 3 (b). Among the grey cells, we can identify which ones contain triangles. We request the first DMA transfer, for the triangles of the dark trapezoid object, as in Fig. 3 (c). Once the first set of triangles arrives, we perform the intersection tests on it and, at the same time, request the DMA transfer for the white circle object in Fig. 3 (c). We keep alternating intersection tests and DMA transfers until we find the hit point, as in Fig. 3 (d). In this manner, the intersection-test computation overlaps with the DMA transfer for the next object to test, and the DMA latency is hidden.

Fig. 3. Double buffering for grid-based ray tracer; (a) Ray tracing on the scene with 4 objects; (b) Find the cells passed by a ray (those cells are indicated by grey color); (c) Perform double buffering by overlapping the computation for dark trapezoid with the DMA transfer for white circle; (d) Continue to process grid cells one by one until we find the hit point.

3. GRID STRUCTURE

Grid structures are regarded as having slower traversal but faster build times than hierarchical structures. In terms of data layout, hierarchical structures traverse multiple nodes at non-contiguous memory locations, even when those nodes are spatially close. In grid structures, on the other hand, adjacent grid cells are located contiguously in memory, so spatially close cells can be processed with better locality. In addition, grid structures are not hierarchically organized, so the location of the next grid cell is easy to predict during traversal. These characteristics let grid-based acceleration structures easily employ double buffering to hide the latency of DMA transfers. Grid structures were once regarded as slow acceleration structures, but they now show traversal performance comparable to more complex acceleration structures. Moreover, their fast build times add value, particularly for real-time rendering of dynamic scenes.

3.1 Data Structure of Grid

We use the polygon file format (PLY), which stores vertices and faces, for the test set. Each vertex has x, y, and z coordinates, and each face lists some of the vertices; we use only triangles (faces with three vertices). For the triangle intersection test with barycentric coordinates [9], we build a pre-computed acceleration structure. Triangles in the acceleration structure are grouped and sorted by grid cell, and triangles crossing the boundaries of grid cells are duplicated into every cell they overlap. Each grid cell stores two pieces of information: the index of its first triangle in the acceleration structure and the number of triangles in the cell. With these two values, a cell's triangles can easily be fetched via DMA.

The grid cell size is important for a grid-based acceleration structure. If the cells are too small, each cell requires fewer intersection tests, but more cells must be traversed. If the cells are too large, fewer cells are traversed, but more intersection tests are performed per cell. In this paper, we use Eq. (1) to determine the grid resolution [14]:

$$N_x = L_x \sqrt[3]{\frac{\lambda N_{tri}}{V}}, \qquad N_y = L_y \sqrt[3]{\frac{\lambda N_{tri}}{V}}, \qquad N_z = L_z \sqrt[3]{\frac{\lambda N_{tri}}{V}} \qquad (1)$$

where $V$ is the volume of the bounding box and $L_x$, $L_y$, and $L_z$ are the lengths of its three sides (i.e., $V = L_x L_y L_z$). $N_{tri}$ is the total number of triangles, and $\lambda$ is the parameter that determines the grid cell size. Since the total number of grid cells is $N_{grid} = N_x N_y N_z = \lambda N_{tri}$, the parameter is $\lambda = N_{grid} / N_{tri}$.
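Eq. (1) is straightforward to evaluate. The Python sketch below computes the per-axis cell counts and the resulting λ. Eq. (1) produces real numbers, and the rounding rule is not stated, so rounding to the nearest integer and keeping at least one cell per axis are our assumptions. The function names are hypothetical.

```python
def grid_resolution(lx, ly, lz, n_tri, lam=1.0):
    """Per-axis cell counts (Nx, Ny, Nz) from Eq. (1).

    Rounding to the nearest integer and clamping to >= 1 are assumptions.
    """
    v = lx * ly * lz                        # volume of the bounding box
    k = (lam * n_tri / v) ** (1.0 / 3.0)    # shared factor: cells per unit length
    return tuple(max(1, round(side * k)) for side in (lx, ly, lz))

def effective_lambda(res, n_tri):
    """lambda = N_grid / N_tri for the (rounded) resolution actually used."""
    nx, ny, nz = res
    return nx * ny * nz / n_tri
```

Because all three axes share the factor $\sqrt[3]{\lambda N_{tri}/V}$, the cells stay roughly cubic, and doubling λ roughly doubles the number of cells, halving the average number of triangles per cell.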
If the triangles are uniformly distributed and λ = 1, each grid cell holds one triangle on average. In general, the number of triangles per grid cell can be controlled by adjusting λ: increasing λ makes cells tend to contain fewer triangles, and decreasing λ makes them likely to contain more.

3.2 Macrocell

When we build a uniform grid structure, some grid cells may contain too many triangles while others contain none. This imbalance often hurts grid traversal performance, since we must traverse many cells that contain no triangles to test. To reduce this overhead, we use the idea of hierarchical grids by constructing macrocells over the grid cells [8, 15]. A macrocell covers m × m × m grid cells, which introduces one level of hierarchy above the grid cells. With macrocells, cells can be traversed faster when triangles are sparsely distributed over the grid.

Moreover, the macrocell structure provides an advantage on multicore accelerators over general-purpose multicore processors: the information of many more grid cells can be held within the local memory. The grid cell data is three-dimensional and can be very large, whereas the local memory of an accelerator core is only a couple of hundred kilobytes. Since an accelerator core cannot access the grid cell data directly, it must bring the data into its local memory before processing it, and a few hundred kilobytes are too small for a large number of grid cells. In general, an accelerator core prefers to fetch the information for many grid cells at once, since fewer DMA requests mean less DMA overhead. With the macrocell structure, we can fetch the information of a whole macrocell into the local memory and traverse each of its grid cells without multiple DMA transfers. For empty grid cells that contain no triangles, we can skip fetching the detailed information altogether. Without macrocells, we cannot know whether a grid cell contains triangles until its data has been brought into the local memory; if the fetched cell turns out to be empty, the DMA request is wasted. Empty cells also make double buffering difficult: since there are no triangles to test, the next DMA transfer cannot be overlapped with computation, which causes a performance drop. Using macrocells, we avoid fetching empty grid cells. We also reorder the grid cells so that cells within the same macrocell are placed adjacently in memory; a request for the information of the grid cells within one macrocell can then be served by a single DMA transfer.

4. GRID-BASED TRAVERSAL AND INTERSECTION

Ray tracing consists of five steps: ray generation, ray traversal, intersection test, shading, and frame buffer writing. In this section, we present our traversal and intersection algorithms.
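The macrocell layout from Section 3.2 can be sketched in Python as follows: per-macrocell occupancy flags, plus a grid-cell order in which all cells of one macrocell are contiguous, so that one DMA request can fetch a macrocell's cell records. The sketch assumes row-major flat cell indices with x varying fastest and lets edge macrocells be smaller than m × m × m. The function names and this exact layout are our own assumptions, not the authors' data structure.

```python
def ceil_div(a, b):
    return -(-a // b)

def build_macrocells(counts, dims, m):
    """Group an nx * ny * nz grid into m * m * m macrocells.

    counts[c] is the number of triangles in grid cell c = (z * ny + y) * nx + x.
    Returns:
      order    -- grid-cell indices reordered so each macrocell's cells are contiguous
      starts   -- macrocell k owns order[starts[k]:starts[k + 1]] (one DMA request)
      occupied -- False for macrocells whose cells are all empty (skip the DMA)
    """
    nx, ny, nz = dims
    mx, my, mz = ceil_div(nx, m), ceil_div(ny, m), ceil_div(nz, m)
    order, starts, occupied = [], [0], []
    for bz in range(mz):
        for by in range(my):
            for bx in range(mx):
                nonempty = False
                for z in range(bz * m, min(bz * m + m, nz)):
                    for y in range(by * m, min(by * m + m, ny)):
                        for x in range(bx * m, min(bx * m + m, nx)):
                            c = (z * ny + y) * nx + x
                            order.append(c)
                            nonempty = nonempty or counts[c] > 0
                starts.append(len(order))
                occupied.append(nonempty)
    return order, starts, occupied
```

If the per-cell (first-triangle index, count) records are stored in `order` sequence, each occupied macrocell maps to one contiguous slice of the cell table. A traversal can then consult `occupied` and skip empty macrocells without any DMA traffic.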
To speed up our algorithms, we adopt the coherent grid traversal, which takes advantage of SIMD instructions for intersection tests [8]. We also apply mail-boxing [8, 12] and vertex culling [8, 13] to overcome the shortcomings of grid structures.

4.1 Coherent Grid Traversal and SIMD Intersection Test

For fast grid traversal, we use the coherent grid traversal algorithm [8], which is about 10 times faster than the conventional 3D-DDA algorithm. First, it finds the axis of the ray packet that is aligned with the traversal direction and computes the bounding frustum of the packet. Then, it traverses the grid along that axis one slice at a time. When it proceeds to the next slice, the region where the frustum overlaps the next slice is computed incrementally from its overlap with the current slice.

To make full use of the accelerator cores, we employ SIMD instructions in our intersection test [9]. First, we construct a packet of n × n coherent rays. With SIMD instructions, four rays are tested together at a time. The intersection test with a triangle consists of four individual tests: first, the distance to the embedding plane of the triangle is tested; then, the three barycentric coordinates of the point where the ray pierces the plane are tested.

4.2 Mail-Boxing and Frustum Culling

Mail-boxing and frustum culling are both very effective at reducing the number of redundant intersection tests, which are a major disadvantage of uniform grid traversal. In grid structures, many triangles overlap multiple grid cells. Since these overlapped cells neighbor one another, the intersection test for the same triangle is likely to be performed multiple times. Such repeated tests can be avoided by mail-boxing [8, 12]. A unique identification number is assigned to each triangle, and each accelerator core records the numbers of the triangles it has already tested. Before performing an intersection test, the core checks whether the triangle's identification number is in the recorded list; if it is, the test is skipped.

Since triangles do not fit as tightly within the boundaries of grid cells as they do in a kd-tree, a grid traversal tests some extra triangles that a kd-tree would avoid. If a triangle lies completely outside the frustum of the ray packet, frustum culling with barycentric coordinates lets us skip the intersection tests for the rays that lie outside the triangle. Before performing the intersection test, we perform the culling test for the four corner rays of the ray packet [8, 13]. If all four corner rays are outside the triangle, the intersection tests for the rest of the rays are unnecessary.

5. DMA LATENCY HIDING

Multicore accelerators with local memories typically form a distributed memory. On such a memory system, the accelerator cores cannot access the main memory directly; instead, data must be moved from the main memory to the local memories by direct memory access (DMA). One technique to reduce the overhead of the DMA latency is a software cache [16, 17], which keeps triangle data for future use.
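To illustrate the trade-off discussed next, here is a minimal Python sketch of a software cache: a generic LRU cache placed in front of a blocking fetch. The `dma_fetch` callback is a hypothetical stand-in for a DMA transfer from main memory. This is not the specific cache design of [16, 17]; it only shows why every miss costs a full, non-overlapped round trip.

```python
from collections import OrderedDict

class SoftwareCache:
    """Tiny LRU software cache in front of a (simulated) blocking DMA fetch."""

    def __init__(self, capacity, dma_fetch):
        self.capacity = capacity
        self.dma_fetch = dma_fetch   # hypothetical: copies one block from main memory
        self.lines = OrderedDict()   # key -> cached block, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.lines:
            self.lines.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.lines[key]
        self.misses += 1                     # miss: the core stalls for the transfer
        value = self.dma_fetch(key)
        self.lines[key] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used block
        return value
```

The cache handles irregular access patterns, but it never starts a transfer before it is needed. When the next block is predictable, as it is in grid traversal, that stall can be overlapped instead.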
Another technique is double buffering, which is widely used to overlap communication with computation [18]. A software cache is useful when the memory access pattern is irregular, but it can impose a large overhead on cache misses. When the access pattern is regular and the next data to access can be predicted, double buffering is much more effective and has less overhead. Predicting the index of the next grid cell during grid traversal is relatively easy, since grid-based structures show a regular access pattern. Thus, we adopt double buffering instead of a software cache to hide the DMA latency.

We apply double buffering to three levels of data transfers via asynchronous DMA requests. Fig. 4 shows these three levels: tile level, macrocell level, and triangle level. Each accelerator core runs the same ray tracing code, but on different tiles. The color of each pixel within a tile is calculated by the rendering algorithm, and the resulting colored tiles are sent to the main memory through DMA transfers. At the tile level, we prepare two buffers, one for the rendering computation and the other for the DMA transfer, so the current tile is rendered while the previously rendered tile is transferred. Within the rendering algorithm, we generate a packet of coherent rays and traverse the macrocells that intersect the ray packet. At the macrocell level, we request the DMA transfer for the grid cells contained in the next intersected non-empty macrocell, while we traverse the current grid cells, which were transferred by the previous request. At the triangle level, we traverse the grid cells to find the triangles that may intersect the ray packet. In a similar fashion, we request the DMA transfer for the next triangles while we perform the intersection tests on the current triangles, which were transferred previously. Once we find the first hit points for the rays in the packet, we perform shading to calculate the colors.

Fig. 4. Double buffering for DMA latency hiding: three levels of double buffering are employed for rendering. From top to bottom, the rows represent tile-level, macrocell-level, and triangle-level double buffering, respectively.

6. EXPERIMENTAL RESULTS

To evaluate our grid-based ray tracing on multicore accelerators with explicitly managed memories, we used a game console that contains a multicore processor. The processor has one general-purpose core and six specialized cores as an accelerator, running at 3.2 GHz. Each accelerator core has 256KB of local memory, and the main memory of the game console is 256MB. Table 2 shows the characteristics of the polygon models used in our experiments. The four models contain from about 36,000 to …,000 vertices and from about 70,000 to …,000 triangles. The same conference model is rendered from two different viewpoints to reproduce scenes similar to those used by other ray tracers. The polygon models were downloaded from the Stanford 3D scanning repository [19].

Table 2. Scenes used in experiments.
Name | #Vertices | #Triangles
Bunny | 35,947 | 69,451
Horse | 48,… | 96,…
Armadillo | …,974 | …,944
Conference 1 | 166,… | 282,…
Conference 2 | …,867 | …,755
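Section 5 applies the same double-buffering pattern at the tile, macrocell, and triangle levels. The Python sketch below shows that pattern generically. A single-worker thread pool stands in for the asynchronous DMA engine, which is an assumption made for illustration; real code on the accelerator would issue the platform's own DMA commands, which are not reproduced here. `double_buffered`, `fetch`, `compute`, and `done` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def double_buffered(items, fetch, compute, done=lambda result: False):
    """Process items in order while the next item's transfer is in flight.

    fetch(item)   -- stand-in for an asynchronous DMA get into a local buffer
    compute(data) -- work on a buffer whose transfer has completed
    done(result)  -- optional early exit, e.g. once the nearest hit is found
    At most two buffers are live: the one being computed on and the one being filled.
    """
    results = []
    if not items:
        return results
    with ThreadPoolExecutor(max_workers=1) as dma:   # one outstanding transfer
        pending = dma.submit(fetch, items[0])         # prime the first buffer
        for i in range(len(items)):
            current = pending.result()                 # wait for this buffer's transfer
            if i + 1 < len(items):
                pending = dma.submit(fetch, items[i + 1])  # start filling the other buffer
            result = compute(current)                  # overlaps with that transfer
            results.append(result)
            if done(result):
                break
    return results
```

The `done` hook mirrors the triangle level, where traversal stops once a hit is found. At that point the prefetched buffer is simply discarded. The tile level is the mirror image: a finished tile is written back while the next one is computed.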
6.1 Performance of Grid-Based Ray Tracer

Table 3 shows the performance of our grid-based ray tracer on the rendered scenes with two shading variants. With ray casting and no shading, the measured FPS values range from … to … on a 6-core accelerator. Complex models such as Armadillo and Conference show lower FPS than Bunny and Horse, but the results are still competitive. With simple shading added, the measured FPS drops by 10-20% but remains high.

Table 3. Performance of grid-based ray tracer (fps): ray casting with no shading and with simple shading on the 3.2 GHz multicore processor, using 1 core and 6 cores, for Bunny, Horse, Armadillo, Conf. 1, and Conf. 2.

Fig. 5. Scalability on multicore accelerators: FPS increases almost linearly as the number of cores increases. The dotted lines represent projected linear speedups based on 1 core performance.

The graphs in Fig. 5 show how well our grid-based ray tracer scales. The thin dotted lines show the FPS values linearly projected from the 1-core results, and the thick lines show the FPS values measured while varying the number of cores used in the accelerator. For all scenes, the measured FPS values are only slightly lower than the projected ones. Thus, our grid-based ray tracer scales almost linearly up to 6 cores.

6.2 DMA Latency Hiding

Fig. 6 shows the effect of double buffering. The two bars for each scene represent the results without and with double buffering, respectively. Execution times are given in seconds and broken down into initialization, computation, and DMA latency. The initialization part includes the time spent in ray generation and parameter reset. The computation part is the time for traversal and intersection tests. The DMA latency part is the time spent waiting for DMA transfers to complete at all three levels of DMA requests. Since the two bars for each scene differ only in whether double buffering is applied, only the DMA latency part (the top component of each bar) is reduced, as shown in Fig. 6. The rows labeled "DMA hide" in Table 4 show what percentage of the DMA latency is hidden by double buffering. For all experiments, 60-76% of the DMA latency is hidden by overlapping the DMA transfers with computation. As a result, the performance of our ray tracer increases by 10-22%, as shown in the rows labeled "speedup" in Table 4. The two conference scenes have a relatively high portion of DMA latency, as these models contain many objects; with double buffering, however, the performance of these two scenes improves the most.

Fig. 6. Execution time breakdown in seconds: initialization (INIT), computation (COMP), and DMA latency (DMA). The two bars for each scene represent the results without and with double buffering on 1-core and 6-core accelerators.

Table 4. DMA latency hiding and speedup.
Metric | Cores | Bunny | Horse | Armadillo | Conf. 1 | Conf. 2
DMA hide | 1 core | 71.7% | 75.2% | 75.7% | 71.6% | 73.5%
DMA hide | 6 cores | 71.2% | 73.8% | 75.4% | 62.1% | 60.1%
Speedup (fps) | 1 core | 15.4% | 14.1% | 10.3% | 18.8% | 21.8%
Speedup (fps) | 6 cores | 15.6% | 14.7% | 10.6% | 16.5% | 18.5%
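The "DMA hide" and "speedup" rows are linked by a simple relation. Assume, as the breakdown in Fig. 6 indicates, that only the DMA wait changes. If the DMA wait is a fraction d of the original execution time and a fraction h of it is hidden, the FPS speedup is 1/(1 − h·d) − 1. The Python sketch below inverts this relation to estimate the DMA share implied by the 1-core rows of Table 4. It is a back-of-the-envelope model under that assumption, not a measurement, and the function names are hypothetical.

```python
def fps_speedup(dma_share, hidden):
    """FPS speedup when a fraction `hidden` of the DMA wait is overlapped.

    dma_share: DMA wait / total time without double buffering (0..1).
    Assumes the INIT and COMP parts are unchanged by double buffering.
    """
    return 1.0 / (1.0 - hidden * dma_share) - 1.0

def implied_dma_share(speedup, hidden):
    """Invert fps_speedup: the DMA share implied by a measured speedup."""
    return (1.0 - 1.0 / (1.0 + speedup)) / hidden

# (DMA hide, speedup) for 1 core, copied from Table 4.
TABLE4_1CORE = {
    "Bunny": (0.717, 0.154), "Horse": (0.752, 0.141), "Armadillo": (0.757, 0.103),
    "Conf. 1": (0.716, 0.188), "Conf. 2": (0.735, 0.218),
}

if __name__ == "__main__":
    for scene, (hide, speedup) in TABLE4_1CORE.items():
        print(f"{scene}: implied DMA share {implied_dma_share(speedup, hide):.1%}")
```

On the 1-core numbers, this gives roughly 12% for Armadillo and about 22% and 24% for the two conference scenes. This agrees with the observation that the conference scenes spend a larger share of their time waiting on DMA.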
6.3 Comparison with Other Ray Tracers

Fig. 7 compares the performance of our grid-based ray tracer on a 6-core accelerator with other ray tracers on a similar architecture and on several general-purpose processors. We used the two conference scenes to measure the performance of our grid-based ray tracer on a 6-core accelerator; the results for the other ray tracers are taken from previously published literature. Fig. 7 (a) compares ray tracing on multicore accelerators with two different acceleration structures, BVH and grid. The BVH performance was measured on an 8-core accelerator running at 2.4 GHz [16], whereas our grid-based ray tracer was measured on a 6-core accelerator running at 3.2 GHz. Since the theoretical peak performance of the two platforms is the same (8 cores × 2.4 GHz = 6 cores × 3.2 GHz = 19.2 GHz aggregate), a direct comparison is meaningful. Our grid-based ray tracer is about half as fast but still comparable, and it would be more competitive if build time were included. Fig. 7 (b) compares our ray tracer with the ray tracers on general-purpose processors from [8]. The performance of our grid-based ray tracer on a 6-core accelerator is quite impressive. In particular, our grid-based traversal is almost four times faster than the coherent grid traversal on a general-purpose CPU, from which we mainly take the ideas for our grid-based structure on multicore accelerators.

Fig. 7. Performance comparison with conference 1 for (a) and conference 2 for (b); (a) BVH vs. grid on multicore accelerators; (b) Grid on multicore accelerator vs. various ray tracers on general CPUs with/without HW multithreading (MT).

7. RELATED WORK

Multicore accelerators are well suited to ray tracing, and there has been a considerable body of ray tracing research on multicore accelerator architectures.
The terrain rendering engine (TRE) was developed as a client-server ray casting system [20]. A client sends the user's rendering parameters, a server performs the rendering, and the rendered images are transferred in compressed form between the server and the client. The rendering engine is pipelined and optimized to use SIMD instructions.

Ray tracing with BVHs has been investigated on dual multicore accelerators, including a software caching technique for explicitly managed memories [16, 17]. To reduce the cache miss penalty, software hyper-threading has also been studied to hide the latency of the DMA transfers for missed data [17]. To exploit the SIMD architecture of multicore accelerators, efficient SIMD intersection algorithms have been investigated for BVH traversal with packets of rays [9, 21].

The interactive ray tracer (irt) [22] for multicore accelerators was implemented using techniques introduced in previous work [16, 17]. It adds reflection, transparency, shadows, BRDF lighting, and cube environment-mapped textures, and it also applies the ray packet technique to ambient occlusion rays. The irt was able to render complex scenes with over one million polygons on a cluster of eight accelerators, each containing eight specialized accelerator cores.

8. CONCLUSION

In this paper, we present a grid-based ray tracer for multicore accelerators. We propose a parallelization scheme for multicore accelerators with explicitly managed memories, and we introduce double buffering with macrocells over the grid to hide the DMA latency. We show experimentally that our grid-based ray tracer scales close to linearly on a multicore accelerator, and that our double buffering scheme hides 60-76% of the DMA latency, which in turn yields a 10-22% speedup in frames per second (FPS). Compared with ray tracers on various architectures, our ray tracer on a multicore accelerator shows competitive or better performance.
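The double buffering idea summarized above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: `fetch` stands in for a DMA transfer of one unit of grid data (for instance, one macrocell) into local memory, `compute` stands in for the intersection work on it, and a single-worker thread pool plays the role of the DMA engine. The two timing functions form an idealized model that assumes fixed per-block costs and no contention.

```python
from concurrent.futures import ThreadPoolExecutor


def double_buffered_map(block_ids, fetch, compute):
    """Process blocks in order, overlapping the fetch of block i+1 with compute on block i.

    fetch and compute are hypothetical stand-ins for a DMA transfer and the
    intersection work. At most two blocks are resident at once: the one being
    computed on and the one in flight, which is what makes this double buffering.
    """
    results = []
    if not block_ids:
        return results
    with ThreadPoolExecutor(max_workers=1) as dma_engine:  # one outstanding transfer
        in_flight = dma_engine.submit(fetch, block_ids[0])  # prime the first buffer
        for i in range(len(block_ids)):
            current = in_flight.result()  # wait until buffer i has arrived
            if i + 1 < len(block_ids):
                # Start filling the other buffer before computing on this one.
                in_flight = dma_engine.submit(fetch, block_ids[i + 1])
            results.append(compute(current))
    return results


def sequential_time(n_blocks, fetch_time, compute_time):
    """Idealized completion time when each fetch waits for the previous compute."""
    return n_blocks * (fetch_time + compute_time)


def double_buffered_time(n_blocks, fetch_time, compute_time):
    """Idealized completion time with double buffering.

    The first fetch is always exposed; each later step costs the slower of the
    overlapped fetch and compute; the last compute runs alone.
    """
    if n_blocks == 0:
        return 0.0
    return fetch_time + (n_blocks - 1) * max(fetch_time, compute_time) + compute_time
```

Under this model, 8 blocks with a fetch cost of 1.0 and a compute cost of 2.0 take 24.0 time units sequentially but 1.0 + 7 × 2.0 + 2.0 = 17.0 with double buffering, because every transfer except the first overlaps with computation. When a transfer takes longer than the computation it overlaps, only part of it can be hidden; the model ignores real DMA setup and synchronization costs.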
Compared with the BVH-based ray tracer on a similar multicore accelerator, our grid-based ray tracer is about two times slower, but the result is still promising because grid structures are much faster to build than BVH structures. For real-time ray tracing of dynamic scenes, grid-based acceleration structures can be a favorable choice. In addition, since more DMA overhead is expected for secondary rays, double buffering and SIMD intersection tests could be extended to handle secondary rays in grid-based structures. In summary, grid-based structures have considerable potential on modern processor architectures that embed multicore accelerators with explicitly managed memories.

REFERENCES

1. K. Asanovic, R. Bodik, B. Catanzaro, J. Gebis, P. Husbands, K. Keutzer, D. Patterson, W. Plishker, J. Shalf, S. Williams, and K. Yelick, The landscape of parallel computing research: a view from Berkeley, Technical Report No. UCB/EECS, Department of Electrical Engineering and Computer Sciences, University of California at Berkeley.
2. H. P. Hofstee, Power efficient processor architecture and the cell processor, in Proceedings of International Symposium on High-Performance Computer Architecture, 2005.
3. S. Williams, J. Shalf, L. Oliker, S. Kamil, P. Husbands, and K. Yelick, The potential of the cell processor for scientific computing, in Proceedings of International Conference on Computing Frontiers, 2006.
4. T. J. Purcell, I. Buck, W. R. Mark, and P. Hanrahan, Ray tracing on programmable graphics hardware, ACM Transactions on Graphics, Vol. 21, 2002.
5. J. Madruga, Passive head tracking using cell processor, in International Conference and Exhibition on Computer Graphics and Interactive Techniques, youtube.com/watch?v=ryntiyyijbq.
6. T. Ize, I. Wald, and S. Parker, Asynchronous BVH construction for ray tracing dynamic scenes on parallel multi-core architectures, in Proceedings of Eurographics Symposium on Parallel Graphics and Visualization, 2007.
7. S. Parker, W. Martin, P. P. Sloan, P. Shirley, B. Smits, and C. Hansen, Interactive ray tracing, Interactive 3D Graphics, 1999.
8. I. Wald, T. Ize, A. Kensler, A. Knoll, and S. G. Parker, Ray tracing animated scenes using coherent grid traversal, ACM Transactions on Graphics, Vol. 25, 2006.
9. I. Wald, Realtime Ray Tracing and Interactive Global Illumination, Ph.D. Thesis, Department of Computer Science, Saarland University.
10. I. Wald, W. R. Mark, J. Günther, S. Boulos, T. Ize, W. A. Hunt, S. G. Parker, and P. Shirley, State of the art in ray tracing animated scenes, Computer Graphics Forum, Vol. 28, 2009.
11. T. Akenine-Möller, E. Haines, and N. Hoffman, Real-Time Rendering, 3rd ed., A. K. Peters Ltd.
12. D. Kirk and J. Arvo, Improved ray tagging for voxel-based ray tracing, Graphics Gems II, 1991.
13. K. Dmitriev, V. Havran, and H. P. Seidel, Faster ray tracing with SIMD shaft culling, Research Report No. MPI-I, Max-Planck-Institut für Informatik.
14. J. Cleary, B. Wyvill, G. Birtwistle, and R. Vatti, Design and analysis of a parallel ray tracing computer, in Proceedings of Simula Users Conference, 1984.
15. S. Parker, M. Parker, Y. Livnat, P. P. Sloan, C. Hansen, and P. Shirley, Interactive ray tracing for volume visualization, IEEE Transactions on Visualization and Computer Graphics, Vol. 5, 1999.
16. C. Benthin, I. Wald, M. Scherbaum, and H. Friedrich, Ray tracing on the cell processor, in Proceedings of IEEE Symposium on Interactive Ray Tracing, 2006.
17. J. Sugerman, T. Foley, S. Yoshioka, and P. Hanrahan, Ray tracing on a cell processor with software caching, in IEEE Symposium on Interactive Ray Tracing, 2006.
18. T. Chen, Z. Sura, and K. O'Brien, Optimizing the use of static buffers for DMA on a Cell chip, in Proceedings of International Workshop on Languages and Compilers for Parallel Computing, 2006.
19. Stanford Computer Graphics Laboratory, The Stanford Models, The Stanford 3D Scanning Repository.
20. B. Minor, G. Fossum, and V. To, Terrain rendering engine (TRE): Cell broadband engine optimized real-time ray-caster, IBM White Paper.
21. I. Wald, S. Boulos, and P. Shirley, Ray tracing deformable scenes using dynamic bounding volume hierarchies, ACM Transactions on Graphics, Vol. 26, 2007.
22. B. Minor, M. Nutter, and J. Madruga, irt: An interactive ray tracer for the Cell BE processor, IBM White Paper.

Kyunghee Cho received the B.S. degree in Electrical Engineering from Hanyang University in 2007 and the M.S. degree in Computer Science from Korea Advanced Institute of Science and Technology (KAIST). After graduation, he joined S-Core, Korea, as a member of the engineering staff. His research interests are in compiler optimizations for graphics applications. Currently, he investigates optimization opportunities in OpenGL runtime libraries on embedded systems.

Seonggun Kim received the B.S. degree in Electrical Engineering and the Ph.D. degree in Computer Science from Korea Advanced Institute of Science and Technology (KAIST) in 2004 and 2010, respectively. He is currently a post-doctoral research associate at Sungkyunkwan University. His research interests are in compiler techniques that automatically generate SIMD code and improve memory locality for a broad range of applications.

Hwansoo Han received the B.S. and M.S. degrees in Computer Engineering from Seoul National University, Korea, in 1993 and 1995, respectively, and the Ph.D. degree in Computer Science from the University of Maryland at College Park. He is currently an Associate Professor at Sungkyunkwan University. Previously, he was with Korea Advanced Institute of Science and Technology (KAIST) and Intel. His research interests include compiler technology for high-performance computing and embedded computing.
More informationComputer Graphics. - Ray Tracing I - Marcus Magnor Philipp Slusallek. Computer Graphics WS05/06 Ray Tracing I
Computer Graphics - Ray Tracing I - Marcus Magnor Philipp Slusallek Overview Last Lecture Introduction Today Ray tracing I Background Basic ray tracing What is possible? Recursive ray tracing algorithm
More informationNew Reliable Algorithm of Ray Tracing. through Hexahedral Mesh
Applied Mathematical Sciences, Vol. 8, 2014, no. 24, 1171-1176 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.4159 New Reliable Algorithm of Ray Tracing through Hexahedral Mesh R. P.
More informationSpeeding Up Ray Tracing. Optimisations. Ray Tracing Acceleration
Speeding Up Ray Tracing nthony Steed 1999, eline Loscos 2005, Jan Kautz 2007-2009 Optimisations Limit the number of rays Make the ray test faster for shadow rays the main drain on resources if there are
More informationRealtime Ray Tracing
Realtime Ray Tracing Meinrad Recheis Vienna University of Technology Figure 1: Images rendered in realtime with OpenRT on PC clusters at resolution 640 480. a) A Mercedes C-Class model consisting of 320.000
More informationRay Tracing. Cornell CS4620/5620 Fall 2012 Lecture Kavita Bala 1 (with previous instructors James/Marschner)
CS4620/5620: Lecture 37 Ray Tracing 1 Announcements Review session Tuesday 7-9, Phillips 101 Posted notes on slerp and perspective-correct texturing Prelim on Thu in B17 at 7:30pm 2 Basic ray tracing Basic
More informationHardware-driven visibility culling
Hardware-driven visibility culling I. Introduction 20073114 김정현 The goal of the 3D graphics is to generate a realistic and accurate 3D image. To achieve this, it needs to process not only large amount
More informationIdentifying those parts of a scene that are visible from a chosen viewing position, and only process (scan convert) those parts
Visible Surface Detection Identifying those parts of a scene that are visible from a chosen viewing position, and only process (scan convert) those parts Two approaches: 1. Object space methods 2. Image
More informationRay Intersection Acceleration
Ray Intersection Acceleration Image Synthesis Torsten Möller Reading Physically Based Rendering by Pharr&Humphreys Chapter 2 - rays and transformations Chapter 3 - shapes Chapter 4 - intersections and
More information