Parallel Path Tracing using Incoherent Path-Atom Binning

Size: px
Start display at page:

Download "Parallel Path Tracing using Incoherent Path-Atom Binning"

Transcription

1 Parallel Path Tracing using Incoherent Path-Atom Binning David Coulthurst University of Bristol, Piotr Dubla Kurt Debattista Alan Chalmers Simon McIntosh-Smith ClearSpeed Technology, Abstract Current parallel graphics algorithms minimise memory access latency by tracing packets of coherent rays. This coherency, however, breaks down after several bounces, and is unsuited to acceleration techniques such as selective rendering. This paper presents an unbiased path tracing algorithm which is insensitive to the coherency of the rays traced, allowing it to run on diverse architectures including massively SIMD processors. Bins of path-atoms are created and processed to form a path tracing circular buffer. Latency is hidden by n-buffering the load/save operations between bins. We demonstrate our approach as an implementation on the massively parallel SIMD architecture, the ClearSpeed CSX600. Keywords: path tracing, global illumination, parallel graphics 1 Introduction The Rendering Equation as first presented in [Kajiya 1986] described all the illumination at any given point in a scene via an integral equation representing all the reflections in the scene. In the same paper the concept of path tracing was introduced, a stochastic method for solving the rendering equation. Non branching paths from the camera to light sources are traced through each intersection with the scene geometry, as compared to Whitted ray tracing [Whitted 1980] where at each intersection both specular and shadow rays are sampled. In [Cook et al. 1984] this is extended such that at each intersection, the entire hemisphere is sampled to gather an estimate. Path tracing requires many more samples per pixel than ray-tracing, but delivers an unbiased result. Bi-directional path tracing was independently developed by both [Lafortune and Willems 1993] and [Veach and Guibas 1994], after noting that not all significant light transport paths are easily found from starting at the camera. Both path tracing and ray tracing require a lot of computational power to achieve a high-fidelity result. Traditional methods to reduce computational time by parallel processing have exploited coherence. However, such techniques limit the use of other acceleration techniques, such as selective rendering, and are not well suited to modern wide-width SIMD architectures. In this paper we present a novel approach which is able to exploit the performance of modern wide-width SIMD processors and significantly reduces latency, not through coherence, but by data buffering. We reformulate the recursive nature of rendering into a set of basic atomic operations, which are inherently suited to parallelisation. The current way of formulating rendering does not map itself well to all architectures, and we show that if we break the computation into simple atomic instructions we can adapt and group such instructions according to the hardware requirments. We demonstrate our approach on the ClearSpeed CSX architecture, a massively-parallel SIMD machine. We show the latency can be reduced with careful grouping of atomic instructions into bins. This allows us to reduce latency and improve perfromance without relying on the coherence commonly associated with fast ray tracing methods, enabling our method to be used for multiple bounce global illumination (GI) solutions and selective rendering. 2 Related Work Previous parallel rendering work has tended to focus on ray tracing, and on the use of commodity CPUs or GPUs. In [Purcell 2004], significant speedup was achieved through adapting the ray tracing algorithm to a stream processor model. Purcell also showed in [Purcell et al. 2003] how the photon mapping algorithm could be adapted to run on comodity GPUs. However this is only an approximation of true GI methods. An alternative approach is to design hardware specifically for tracing rays. In [Sven Woop and Slusallek 2005] a hardware ray-tracer was implemented. Ray-scene intersection using k-d trees, a programmable shading unit and the ability to handle the recursion necessary for ray tracing each have dedicated hardware on the chip. Specific hardware has the disadvantage of needing chip redesign if a new algorimthm emerges, such as the new acceleration structures being developed which are optimised for dynamic scenes [Wald et al. ]. Fast ray-tracing has now become possible on commidity PCs. The concept of ray packets [Wald et al. 2001] has come to dominate attempts to harness the new SIMD instructions in modern CPUs. The original techniques of grouping rays into coherent packets to make use of the 4-SIMD CPU instructions has been extended to use much larger packets. These techniques form a conservative bound to the ray packets and use this to avoid unnecessary intersection tests and traversal steps. Dmitriev et al. [Dmitriev et al. 2004] first proposed this for triangle intersections, with Reshetov et al. [Reshetov et al. 2005] extending this to kd-tree traversal. This was then developed further to both grids [Wald et al. 2006] and BVHs [Wald et al. 2007a]. However, all these techniques fundamentally rely on using coherent ray packets. Perceptually adaptive rendering techniques have achieved significant performance improvements for global illumination algorithms by exploiting knowledge of the human visual system [Myszkowski et al. 2001]. In particular, selective rendering, computes those areas of a scene to which a viewer is attending in high quality. The remainder of the scene is rendered at a much lower quality, and thus at a much lower computational cost, without the viewer being aware of this quality difference [Yee et al. 2001; Debattista 2006; Chalmers et al. 2006]. Adaptive techniques in general are not naturally coherent [Dubla et al. 2008] as the viewer may be attending to non-coherent regions throughout the scene.

2 3 Incoherent Path-Atom Binning Theory and Framework Current parallel ray tracing techniques have become very focused on the concept of ray packets, as introduced in [Wald et al. 2001]. The original intent of ray packets was to minimise the latency of loading scene data before processing it. While this technique works well for narrow-width SIMD processors and primary rays, for large width SIMD global illumination it has severe drawbacks. Ray packets provide speed up where up to 8 8 and packets are used, usually however this breaks down for larger packets, with a consequential loss of performance. Also, the coherence breaks down quickly, and the current work focuses on a single bounce at most [Wald et al. 2007b]. In [Boulos et al. 2007], the authors explored whether packetised ray tracing techniques could be extended to several bounces and non specular effects. The result shows that current packetisation methods can be applied, however once again it breaks down for higher SIMD widths and, in addition, does not provide an unbiased result. Considering a processor architecture such as the CSX, the SIMD width is large, 96 wide for the CSX600 chip used for the experiments. This SIMD width consists of 96 processing elements or PEs. Each PE is similar to a VLIW processor, and it is the large number of PEs operating in parallel that gives the processor its computational power. With SIMD width this large, and likely to grow larger in the future, different ways for harnessing the increased computational power massively SIMD chips bring, need to be found. The original intent of ray packets is to minimise the latency of loading scene data before processing it. In our approach, we separate path generation from path tracing which allows a binning structure to be created that is inherently suited to parallelisation, by using several bins of path-atoms. This allows us to hide latency with n-buffering the load/save operations, while being insensitive to the coherence of the rays. The insensitivity to coherence means this approach is inherently more scalable as SIMD width increases. 3.1 Path tracing and Path Generation Traditionally path-tracing is recursive with respect to calculating the final radiance value for each path. However, the generation of the path from the camera to a light source can be separated from calculating the actual radiance value. Further to this, the shading of each vertex can be separated from the calculation of the radiance value. Indeed, the calculation of the radiance value is only a very small part of the total calculation, shown to be less than 1% (see results) of our computation in our experiments. Assuming pertinent data of each vertex of the path is stored during calculation, the radiance value can be calculated at any point after the path is generated. The path generation process can be viewed then as a loop, rather than a recursive process, following the pseudo code below. while path completed do intersect ray with scene generate shading details for intersection and new direction from brdf record new vertex in path end while Once the path has been generated, each vertex can be shaded in parallel. Finally the radiance value of the path is calculated using the normal path tracing recursive algorithm. However, as all the material contributions, BRDFs, etc are already known, this calculation is very small (less than 1%). 3.2 Binning Path-Atoms to form a Circular Buffer The circular nature of path generation becomes significant when attempting to parallelise path tracing, due to the variable length of paths. If we are dealing with a recursive algorithm, we must recurse down to the level of the longest of the paths we are considering, not calculating for the ones that have a shorter path length. This has the obvious disadvantage of leaving the PE s of the shorter paths idle while the longer paths are computed. If the radiance value calculation is postponed until after the paths are generated, we have a better option. Each time the end of a path is reached on a PE, it simply loads a new ray from the begining of a new path and starts tracing. This leads to a circular buffer model for generating the paths, whereby in each loop the rays are intersected with the scene. The intersection is subsequently processed and new direction calculated. The new vertex is recorded and then either the new direction formed to a ray, or a new ray added. This leads to three separate bins with operations to move between them, as shown in Figure 1. Each time a path is completed, the path is read out to another buffer to be processed and the ray corresponding to the next path is fed in. The algorithm loops around this buffer continuously, until all the paths required have been calculated. Next ray in path generated Active paths' details Completed paths radiance values calculated Rays from new paths for each completed path Rays to be traced Intersection materials fetched and new vertex calculated Figure 1: Circular Buffer Rays intersected with scene Intersection details One elegant feature of this algorithm is the that of replacing a ray which completes half way through, but the rest of the bin does not. Instead of having to close the gap this leaves to keep the PE s all with data to process, a new ray can simply be dropped into its place and the corresponding path space in the path bin started. This removes the need for book keeping of which bin position corresponds to which path. Another advantage is that none of the path needs to be loaded to add a new vertex. Instead the number of vertices each path has is kept, and the new vertex is saved into the following position in the path bin. 3.3 N-Buffering to hide latency The memory on the ClearSpeed card consists of two parts, mono memory and poly memory. Mono memory is a large block of shared memory, similar in size and use to the RAM in a PC used by a commodity CPU. The poly memory is a small piece of private memory on each PE, and can be viewed as similar to cache memory on a CPU. That is, data to be processed on the PEs must be loaded from mono memory to the poly memory of the PE it is to be processed on before it can be used. The path-atom bins are stored in their entirety in mono memory on the card. To process a SIMD width block of

3 Data from these bin positions is being saved. Data from this bin position is currently being processed Data from these bin positions is being loaded { - { m - n m 2 m 1 m m + 1 m + 2 m + n 96 Elements wide (SIMD width of chip) Figure 2: N-buffering allows the proceeding n results to be saved and subsequent n blocks of data to be loaded while the current bin position is processed path-atoms, they must be loaded from mono to poly memory, processed, and the desired results saved back from poly to mono memory. In this way, the algorithm cycles through all of the path-atoms in each bin, loading-processing-saving in SIMD width chunks of the bin. First the ray bin is processed thus, then the intersection bin, then the path bin. The data to be loaded may depend on the result of a previous bins operation. For example the material needed for shading depends on the object intersected. A small piece of data is kept in poly memory between each bin to facilitate this. In the case of loading the material to be shaded, the intersected object s number would be kept in poly memory, and the address of the material to be loaded calculated from it when the next bin is processed. The constant loading and saving of data during the processing of the bins introduces a large amount of latency, as the card has to wait for the I/O operations to complete. Part of the speedup achieved using ray packets is through having coherent rays that intersect the same objects as they are traversed. Thus less scene data needs to be loaded from RAM to the cache, as each ray can be intersected with the same set of objects that are already in the cache. Instead of using coherency as a way of minimising latency, the bin structure allows the loading and saving to be n-buffered to offset the latency. While bin position m is being processed, m+1,m+2,,m+n are being loaded from memory. Similarly for saving, while bin position m is being processed, m 1,m 2,,m n are being saved to memory. For n-buffering to work, the bins must be at least n (numbero f PEs) in length, so that a result being written to memory is not being read from memory. Figure 2 shows how it works in the general case. N-buffering can also be used to wrap loads/saves around between each type of bin. As the final n bin positions of a preceding bin are processed, the first n of the following bin are being loaded. By using n-buffering, the latency introduced by the load and save operations necessary to process each block of path-atoms is offset, being carried out concurrently with the processing of earlier or later blocks respectively. As the latency is offset this way, there is no need for the rays being traced to be coherent. This gives the primary advantage of our method over ray-packets. The length of the paths, and where the paths start is entirely arbritrary. When using unbiased methods such as path tracing, where the rays do not keep coherence, this allows us to harness the power of wide width SIMD architectures. Further more, arbritraty path segments can be traced as coherence is no longer an issue. In the case of bi-directional path tracing, arbritrary and incoherent visibility tests are needed to check path validity, and path-atom binning fits in with this too. As well as this, methods such as selective rendering that use arbritrary numbers of rays per pixel, often very low, are unsuited to coherent packet tracing, as the benefits of selective rendering are based on tracing fewer rays and interpolating the results. In these cases again the method of path-atom binning is advantageous. 4 Implementation Our implementation runs on the ClearSpeed CSX co-processor architecture. It is a multi-threaded array processor (MTAP), normally utilised for traditional High Performance Computing (HPC) topics such as scientific modelling. A MTAP co-processor shares some characteristics of a multi-core CPU, and some of a stream processor such as a GPU. It has a standard RISC control unit with instruction fetch, cache and I/O mechanisms. Additionally it has the main block of so called processing elements or PEs. Each of these PEs consists of a register file, 6Kbytes of SRAM,a high speed I/O channel to adjacent PEs, an integer ALU and a 64-bit FPU. The 64-bit FPU, which implements full IEEE double precision, is responsible for the high throughput of 50 GFlops double precision. Branching in code is handled via an enable state, in a method similar to predicated instructions in some RISC CPUs. Figure 3 shows an overview of the architecture. Figure 3: ClearSpeed CSX chip architecture overview From a parallel graphics perspective the CSX architecture offers an interesting mix of the advantages of a stream procesor and multicore CPU. The current GPU architecture for comparison is Nvidia s G80 achitecture. Although advertised as a 128 core mutli threaded scalar processor, in effect is a 32 width SIMD processor. This is because each warp of 32 threads is processed in SIMD. To achieve the high performance the hardware is capable of, all these threads must run the same instructions. So when considering high throughput applications such as parallel rendering, the G80 architecture should be considered a 32 width SIMD processor, not a 128 core scalar architecture. The CSX600 chip by comparison has a SIMD width of 96, which as described later is advantageous for the wide SIMD algoritms detailed in this paper. A second point is that the G80 architecture doesn t natively perform double precision, but supports it through multiple cycles of single precision, while the CSX architecture FPUs are double precision throughout. From an algorithmic view, the subtle differences between a stream processor and MTAP are significant. The stream processor model consists of a host computer offloading very specific fine grained chunks of work to the processor, processing it and returning it to shared memory. The MTAP model is much closer to how a normal

4 CPU runs - the RISC control unit with a set of PEs model allows a complete program to be run on the board. The onboard memory is around that of a desktop PC - 1 to 4 GB, so, except for the most complex scenes, the entire scene description can be held on the card rather than on the host PC. Dedicated high speed buses between the PEs is similar to those found in multi-core chips, and allows algorithmic subtlety that isn t possible on a stream processor such as the G80. For example in the case where only n of the PEs succeed in a task. It is often the case that the spread of the succesful n is random, and only those succesful are to be stored into a bin, for further use. The high speed PE to PE buses can be used within algorithms to assign ascending numbers to each succesful PEs. In turn these ascending numbers can be used in addressing for saving results back to main memory. Alternatively these results can be retained in PE memory and while the PEs with unsuccesful task can load more data. This is in contrast to a stream processor where the results would have to all be unloaded, and the succesful ones reloaded along with fresh data to fill the gaps. Level of Path Path Segment Full Buffering Segments & Direct Lighting Computation Table 1: Render times in seconds of differing scenes when varying n in n-buffered computation compared to not buffering, whereas buffering both gives just over 20% performance increase. Type of Path Path Segment Full Buffering Segments & Direct Lighting Computation None Load Save Both Table 2: How number of samples per pixel affects rendering times in seconds Finally we rendered the scene in Figure 4 to a size of pixels, with 1000 samples per pixel. Using double buffering for both load and save operations, the scene took seconds, or 50.6 minutes. This gives an average of just over 150,000 incoherent path segments a second, unshaded, or 750,000 path segments a second with each path segments with a single shadow ray per path segment. Figure 4: Cornell Box Test scene, rendered in pixels, 1000 samples per pixel 4.1 Results We used a simple Cornell Box scene (Figure 4) to test different aspects of the algorithm. The first thing we tested was the effect on rendering times of n-buffering the load and save operations. Table 1 shows our test results when using values of n from 0 to 3. 0-buffering simply means that each load or save operation was carried out when it occured in the code, with no attempt to offset it. To make comparison easier, the time recorded is the time taken for 1,000,000 path segments. Russian roulette was used to terminate paths. We ran our algorithm only tracing the path segments, tracing and direct lighting each path segment, and running the full rendering algorithm. As can be seen in the results, as the value of n rises, the render time decreases. The gain in render time stops increasing once you start to buffer more than once (double buffering). This is due to the load/save operations acting on small pieces of data, around 10 double precision floats per PE. Our results show that computing the path segments with direct lighting forms 99% of the time taken, with only 1% of the computation taken up with recursively calculating the radiance values. The second thing we tested was the effect of buffering different combinations of the load / save operations. To test whether it was either the load or save operations that slowed the algorithm more, we buffered only the loads, only the saves, both and neither. Table 2 shows our results, again showing the time taken to compute 1,000,000 path segments. As can be seen, buffering either the load or save operations gives a performance increase of around 10% 5 Conclusions and Future Work We presented a new approach to the classic ray-tracing problem of memory access latency. Instead of grouping together packets of coherent rays, with all the limitations this implies, we reworked the problem to set aside the recursive calculation portion from the main computation. This allows us to bin the operation as a circular buffer. This has several important benifits. Firstly it makes it simple to keep the hardware with enough work so that it doesn t stand idle. Secondly it allows us to hide the latency using n-buffering. Thirdly, as latency is not coming from coherency, the type of rays processed is not important. Thus multiple bounces suffer no penalties, so unbiased techniques are possible. Also techniques that naturally result in incoherent ray processing can be used, such as selective rendering, and time-constrained rendering [Debattista 2006]. An average of 150,000 incoherent path segments a second, traced and shaded, was achieved for a simple Cornell box scene. The code used to generate these figures is not optimised, instead this is proof of concept work, to test our alternative approach to latency reduction. Our method could be extended simply with the implementation of bi-directional path tracing [Lafortune and Willems 1993]. This could be achieved by separately tracing both a forward and light path though the scene. Indeed, both paths could be stored contiguously in memory, with the starting pint for the light and eye paths calculated at the same time. Then as soon as the light path is complete, the pointer to the path can be moved on to the start of the eye path, and processing continued. Once both paths are traced, each vertex from the forward path could be combined with every vertex in the light path. The required visibility test could be handled within the circular buffer framework. Our method currently has not addressed the issue of acceleration structures, in terms of both building and traversing. Current work

5 favours the use of bounding volume hierarchies (BVH) [Wald et al. 2007a] with recent work on their use on parallel architectures [Ize et al. 2007]. Adapting this work to run on the ClearSpeed architecture is a clear next step. The current method already loads the object data to be intersected, similar to having a single leaf node. To extend to using an acceleration stucture, the intersection operation would have to be extended. A circular buffer similar in principle to the overall method presented here could be used, with an additional temporary bin would be needed. Each SIMD width block of rays and current leaf to be intersected with would be loaded and intersected. Those intersecting would have their details saved as in the current methods. The non-intersecting rays would be traversed to find the next leaf to be intersected with, and this result saved to the the temporary bin. Once all of the rays had been tested once, the temporary bin would have all the non-intersected rays and the next leaf for each ray to be tested against. The process could be repeated again, with the original bin being used to store the next round of un-intersected results. This would continue, the temporary and original bin swapping each time and with the number of nonintersecting rays falling until all have been intersected fully with the scene. The bins would only contain rays that still needed intersecting, meaning a full block of rays and objects can be loaded each time, keeping the processor fully working. As well as the rays and objects to be intersected, enough of the acceleration structure to traverse the ray to find the next leaf to intersect is necessary. Object hierarchies such as BVH may be the best suited, as the internal nodes of a sub-tree could be loaded (without the object data in the leaves). Clearly the next step is to research how large a sub-tree could be loaded, what if any impact this has on the algorithm (for example if only half a sub-tree can be loaded, how could the cases where this is insufficient be handled best). References BOULOS, S., EDWARDS, D., LACEWELL, J. D., KNISS, J., KAUTZ, J., WALD, I., AND SHIRLEY, P Packet-based Whitted and Distribution Ray Tracing. In Proceedings of Graphics Interface CHALMERS, A., DEBATTISTA, K., AND DOS SANTOS, L Selective rendering: Computing only what you see. In Graphite 2006, ACM SIGGRAPH, COOK, R. L., PORTER, T., AND CARPENTER, L Distributed ray tracing. SIGGRAPH Comput. Graph. 18, 3, DEBATTISTA, K., SUNDSTEDT, V., PEREIRA, F., AND CHALMERS, A Selective parallel rendering for highfidelity graphics. In Proceedings of Theory and Practice of Computer Graphics 2005, Eurographics Association, DEBATTISTA, K Selective rendering for high-fidelity graphics. PhD thesis, University of Bristol. DMITRIEV, K., HAVRAN, V., AND SEIDEL, H.-P Faster ray tracing with simd shaft culling. Research Report MPI-I , Max-Planck-Institut für Informatik, Saarbrücken, Germany, December. DUBLA, P., CHALMERS, A., AND DEBATTISTA, K An analysis of cache awareness for interactive selective rendering. In Communications Papers proceedings, WSCG, V. Skala, Ed. IZE, T., WALD, I., AND PARKER, S. G Asynchronous BVH Construction for Ray Tracing Dynamic Scenes on Parallel Multi- Core Architectures. In Proceedings of the 2007 Eurographics Symposium on Parallel Graphics and Visualization. KAJIYA, J. T The rendering equation. SIGGRAPH Comput. Graph. 20, 4, LAFORTUNE, E. P., AND WILLEMS, Y. D Bi-directional Path Tracing. In Proceedings of Third International Conference on Computational Graphics and Visualization Techniques (Compugraphics 93), H. P. Santo, Ed., MYSZKOWSKI, K., TAWARA, T., AKAMINE, H., AND SEIDEL, H.-P Perception-guided global illumination solution for animation rendering. In SIGGRAPH 2001, Computer Graphics Proceedings, ACM Press / ACM SIGGRAPH, E. Fiume, Ed., OSULLIVAN, C., H. S. M. Y. M. R Perceptually adaptive graphics. Eurographics 2004, STAR, PURCELL, T. J., DONNER, C., CAMMARANO, M., JENSEN, H. W., AND HANRAHAN, P Photon mapping on programmable graphics hardware. In Proceedings of the ACM SIG- GRAPH/EUROGRAPHICS Conference on Graphics Hardware, Eurographics Association, PURCELL, T. J Ray tracing on a stream processor. PhD thesis, Stanford, CA, USA. Adviser-Patrick M. Hanrahan. RESHETOV, A., SOUPIKOV, A., AND HURLEY, J Multilevel ray tracing algorithm. In SIGGRAPH 05: ACM SIG- GRAPH 2005 Papers, ACM, New York, NY, USA, SVEN WOOP, J. S., AND SLUSALLEK, P Rpu: A programmable ray processing unit for realtime ray tracing. In Proceedings of ACM SIGGRAPH VEACH, E., AND GUIBAS, L. J Bidirectional Estimators for Light Transport WALD, I., MARK, W. R., GÜNTHER, J., BOULOS, S., IZE, T., HUNT, W., PARKER, S. G., AND SHIRLEY, P. State of the Art in Ray Tracing Animated Scenes. In Eurographics 2007 State of the Art Reports. WALD, I., BENTHIN, C., WAGNER, M., AND SLUSALLEK, P Interactive rendering with coherent ray tracing. In Computer Graphics Forum (Proceedings of EUROGRAPHICS 2001, Blackwell Publishers, Oxford, A. Chalmers and T.-M. Rhyne, Eds., vol. 20, available at wald/publications. WALD, I., IZE, T., KENSLER, A., KNOLL, A., AND PARKER, S. G Ray Tracing Animated Scenes using Coherent Grid Traversal. ACM Transactions on Graphics, (Proceedings of ACM SIGGRAPH 2006). WALD, I., BOULOS, S., AND SHIRLEY, P Ray Tracing Deformable Scenes using Dynamic Bounding Volume Hierarchies. ACM Transactions on Graphics 26, 1. WALD, I., GRIBBLE, C. P., BOULOS, S., AND KENSLER, A SIMD Ray Stream Tracing - SIMD Ray Traversal with Generalized Ray Packets and On-the-fly Re-Ordering. Tech. Rep. UUSCI WHITTED, T An improved illumination model for shaded display. Commun. ACM 23, 6, YEE, H., PATTANAIK, S., AND GREENBERG, D. P Spatiotemporal sensitivity and visual attention for efficient rendering of dynamic environments. ACM Trans. Graph. 20, 1,

Accelerated Entry Point Search Algorithm for Real-Time Ray-Tracing

Accelerated Entry Point Search Algorithm for Real-Time Ray-Tracing Accelerated Entry Point Search Algorithm for Real-Time Ray-Tracing Figure 1: Four of the scenes used for testing purposes. From the left: Fairy Forest from the Utah 3D Animation Repository, Legocar from

More information

Ray-Box Culling for Tree Structures

Ray-Box Culling for Tree Structures JOURNAL OF INFORMATION SCIENCE AND ENGINEERING XX, XXX-XXX (2012) Ray-Box Culling for Tree Structures JAE-HO NAH 1, WOO-CHAN PARK 2, YOON-SIG KANG 1, AND TACK-DON HAN 1 1 Department of Computer Science

More information

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016 Ray Tracing Computer Graphics CMU 15-462/15-662, Fall 2016 Primitive-partitioning vs. space-partitioning acceleration structures Primitive partitioning (bounding volume hierarchy): partitions node s primitives

More information

Recent Advances in Monte Carlo Offline Rendering

Recent Advances in Monte Carlo Offline Rendering CS294-13: Special Topics Lecture #6 Advanced Computer Graphics University of California, Berkeley Monday, 21 September 2009 Recent Advances in Monte Carlo Offline Rendering Lecture #6: Monday, 21 September

More information

Acceleration Structure for Animated Scenes. Copyright 2010 by Yong Cao

Acceleration Structure for Animated Scenes. Copyright 2010 by Yong Cao t min X X Y 1 B C Y 1 Y 2 A Y 2 D A B C D t max t min X X Y 1 B C Y 2 Y 1 Y 2 A Y 2 D A B C D t max t min X X Y 1 B C Y 1 Y 2 A Y 2 D A B C D t max t min A large tree structure change. A totally new tree!

More information

Interactive Ray Tracing: Higher Memory Coherence

Interactive Ray Tracing: Higher Memory Coherence Interactive Ray Tracing: Higher Memory Coherence http://gamma.cs.unc.edu/rt Dinesh Manocha (UNC Chapel Hill) Sung-Eui Yoon (Lawrence Livermore Labs) Interactive Ray Tracing Ray tracing is naturally sub-linear

More information

PantaRay: Fast Ray-traced Occlusion Caching of Massive Scenes J. Pantaleoni, L. Fascione, M. Hill, T. Aila

PantaRay: Fast Ray-traced Occlusion Caching of Massive Scenes J. Pantaleoni, L. Fascione, M. Hill, T. Aila PantaRay: Fast Ray-traced Occlusion Caching of Massive Scenes J. Pantaleoni, L. Fascione, M. Hill, T. Aila Agenda Introduction Motivation Basics PantaRay Accelerating structure generation Massively parallel

More information

Ray Casting Deformable Models on the GPU

Ray Casting Deformable Models on the GPU Ray Casting Deformable Models on the GPU Suryakant Patidar and P. J. Narayanan Center for Visual Information Technology, IIIT Hyderabad. {skp@research., pjn@}iiit.ac.in Abstract The GPUs pack high computation

More information

SIMD Ray Stream Tracing - SIMD Ray Traversal with Generalized Ray Packets and On-the-fly Re-Ordering -

SIMD Ray Stream Tracing - SIMD Ray Traversal with Generalized Ray Packets and On-the-fly Re-Ordering - 1 SIMD Ray Stream Tracing - SIMD Ray Traversal with Generalized Ray Packets and On-the-fly Re-Ordering - Ingo Wald, Christiaan Gribble, Solomon Boulos and Andrew Kensler SCI Institute, University of Utah

More information

Packet-based Whitted and Distribution Ray Tracing

Packet-based Whitted and Distribution Ray Tracing Packet-based Whitted and Distribution Ray Tracing Solomon Boulos Dave Edwards J. Dylan Lacewell University of Utah Joe Kniss University of New Mexico Jan Kautz Peter Shirley Ingo Wald University College

More information

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y.

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y. COMP 4801 Final Year Project Ray Tracing for Computer Graphics Final Project Report FYP 15014 by Runjing Liu Advised by Dr. L.Y. Wei 1 Abstract The goal of this project was to use ray tracing in a rendering

More information

Packet-based Whitted and Distribution Ray Tracing

Packet-based Whitted and Distribution Ray Tracing Packet-based Whitted and Distribution Ray Tracing Solomon Boulos Dave Edwards J. Dylan Lacewell Joe Kniss Jan Kautz Peter Shirley of Utah University of New Mexico University College London Ingo Wald University

More information

Real Time Ray Tracing

Real Time Ray Tracing Real Time Ray Tracing Programação 3D para Simulação de Jogos Vasco Costa Ray tracing? Why? How? P3DSJ Real Time Ray Tracing Vasco Costa 2 Real time ray tracing : example Source: NVIDIA P3DSJ Real Time

More information

Deep Coherent Ray Tracing

Deep Coherent Ray Tracing Deep Coherent Ray Tracing Erik Ma nsson Jacob Munkberg Tomas Akenine-Mo ller Lund University/TAT AB Lund University Lund University fairy sponza oldtree newtree dragon Figure 1: The example scenes used

More information

Realtime Ray Tracing

Realtime Ray Tracing Realtime Ray Tracing Meinrad Recheis Vienna University of Technology Figure 1: Images rendered in realtime with OpenRT on PC clusters at resolution 640 480. a) A Mercedes C-Class model consisting of 320.000

More information

Enhancing Visual Rendering on Multicore Accelerators with Explicitly Managed Memories *

Enhancing Visual Rendering on Multicore Accelerators with Explicitly Managed Memories * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, 895-909 (2012) Enhancing Visual Rendering on Multicore Accelerators with Explicitly Managed Memories * KYUNGHEE CHO 1, SEONGGUN KIM 2 AND HWANSOO HAN

More information

PowerVR Hardware. Architecture Overview for Developers

PowerVR Hardware. Architecture Overview for Developers Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

Packet-based Whitted and Distribution Ray Tracing

Packet-based Whitted and Distribution Ray Tracing Packet-based Whitted and Distribution Ray Tracing Solomon Boulos Dave Edwards J. Dylan Lacewell University of Utah Joe Kniss University of New Mexico Jan Kautz Peter Shirley Ingo Wald University College

More information

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes B-KD rees for Hardware Accelerated Ray racing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek Saarland University, Germany Outline Previous Work B-KD ree as new Spatial Index Structure DynR

More information

Reducing the Number of Shadow Rays. in Bidirectional Path Tracing. Department of Computer Science, Katholieke Universiteit Leuven

Reducing the Number of Shadow Rays. in Bidirectional Path Tracing. Department of Computer Science, Katholieke Universiteit Leuven Reducing the Number of Shadow Rays in Bidirectional Path Tracing Eric P. Lafortune Yves D. Willems Department of Computer Science, Katholieke Universiteit Leuven Celestijnenlaan 200A, 3001 Heverlee, Belgium

More information

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367

More information

A Hardware Pipeline for Accelerating Ray Traversal Algorithms on Streaming Processors

A Hardware Pipeline for Accelerating Ray Traversal Algorithms on Streaming Processors A Hardware Pipeline for Accelerating Ray Traversal Algorithms on Streaming Processors Michael Steffen Electrical and Computer Engineering Iowa State University steffma@iastate.edu Joseph Zambreno Electrical

More information

Design and Evaluation of a Hardware Accelerated Ray Tracing Data Structure

Design and Evaluation of a Hardware Accelerated Ray Tracing Data Structure EG UK Theory and Practice of Computer Graphics(2009) Wen Tang, John Collomosse(Editors) Design and Evaluation of a Hardware Accelerated Ray Tracing Data Structure MichaelSteffenandJosephZambreno Department

More information

Ray Tracing with Multi-Core/Shared Memory Systems. Abe Stephens

Ray Tracing with Multi-Core/Shared Memory Systems. Abe Stephens Ray Tracing with Multi-Core/Shared Memory Systems Abe Stephens Real-time Interactive Massive Model Visualization Tutorial EuroGraphics 2006. Vienna Austria. Monday September 4, 2006 http://www.sci.utah.edu/~abe/massive06/

More information

An Improved Multi-Level Raytracing Algorithm

An Improved Multi-Level Raytracing Algorithm An Improved Multi-Level Raytracing Algorithm Joshua Barczak Figure 1: Renderings of our test scenes. On average, our method eliminates 6%, 69%, 63%, and 56% of BVH traversals for these scenes. Abstract

More information

Ray Tracing: Whence and Whither?

Ray Tracing: Whence and Whither? Ray Tracing: Whence and Whither? Dave Edwards April 24, 2008 Introduction Rendering Input: description of a scene (geometry, materials) Ouput: image or images (i.e., movie) Two main components Computing

More information

Part IV. Review of hardware-trends for real-time ray tracing

Part IV. Review of hardware-trends for real-time ray tracing Part IV Review of hardware-trends for real-time ray tracing Hardware Trends For Real-time Ray Tracing Philipp Slusallek Saarland University, Germany Large Model Visualization at Boeing CATIA Model of Boeing

More information

Lecture 11 - GPU Ray Tracing (1)

Lecture 11 - GPU Ray Tracing (1) INFOMAGR Advanced Graphics Jacco Bikker - November 2017 - February 2018 Lecture 11 - GPU Ray Tracing (1) Welcome! I x, x = g(x, x ) ε x, x + න S ρ x, x, x I x, x dx Today s Agenda: Exam Questions: Sampler

More information

INFOMAGR Advanced Graphics. Jacco Bikker - February April Welcome!

INFOMAGR Advanced Graphics. Jacco Bikker - February April Welcome! INFOMAGR Advanced Graphics Jacco Bikker - February April 2016 Welcome! I x, x = g(x, x ) ε x, x + S ρ x, x, x I x, x dx Today s Agenda: Introduction : GPU Ray Tracing Practical Perspective Advanced Graphics

More information

Cache-Oblivious Ray Reordering

Cache-Oblivious Ray Reordering Cache-Oblivious Ray Reordering Bochang Moon Yongyoung Byun Tae-Joon Kim Pio Claudio Sung-Eui Yoon CS/TR-2009-314 May, 2009 KAIST Department of Computer Science Cache-Oblivious Ray Reordering Bochang Moon

More information

Real-time ray tracing

Real-time ray tracing Lecture 10: Real-time ray tracing (and opportunities for hardware acceleration) Visual Computing Systems Recent push towards real-time ray tracing Image credit: NVIDIA (this ray traced image can be rendered

More information

NVIDIA Case Studies:

NVIDIA Case Studies: NVIDIA Case Studies: OptiX & Image Space Photon Mapping David Luebke NVIDIA Research Beyond Programmable Shading 0 How Far Beyond? The continuum Beyond Programmable Shading Just programmable shading: DX,

More information

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. 1 2 Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. Crowd rendering in large environments presents a number of challenges,

More information

The Traditional Graphics Pipeline

The Traditional Graphics Pipeline Last Time? The Traditional Graphics Pipeline Reading for Today A Practical Model for Subsurface Light Transport, Jensen, Marschner, Levoy, & Hanrahan, SIGGRAPH 2001 Participating Media Measuring BRDFs

More information

The Traditional Graphics Pipeline

The Traditional Graphics Pipeline Last Time? The Traditional Graphics Pipeline Participating Media Measuring BRDFs 3D Digitizing & Scattering BSSRDFs Monte Carlo Simulation Dipole Approximation Today Ray Casting / Tracing Advantages? Ray

More information

Choosing the Right Algorithm & Guiding

Choosing the Right Algorithm & Guiding Choosing the Right Algorithm & Guiding PHILIPP SLUSALLEK & PASCAL GRITTMANN Topics for Today What does an implementation of a high-performance renderer look like? Review of algorithms which to choose for

More information

Variance reduction using interframe coherence for animated scenes

Variance reduction using interframe coherence for animated scenes Computational Visual Media DOI 10.1007/s41095-015-0026-0 Vol. 1, No. 4, December 2015, 343 349 Research Article Variance reduction using interframe coherence for animated scenes Peng Zhou 1 ( ), Yanyun

More information

Computing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany

Computing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany Computing on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Summary: The increasing power of GPUs has led to the intent to transfer computing load from CPUs to GPUs. A first example has been

More information

Philipp Slusallek Karol Myszkowski. Realistic Image Synthesis SS18 Instant Global Illumination

Philipp Slusallek Karol Myszkowski. Realistic Image Synthesis SS18 Instant Global Illumination Realistic Image Synthesis - Instant Global Illumination - Karol Myszkowski Overview of MC GI methods General idea Generate samples from lights and camera Connect them and transport illumination along paths

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

New Reliable Algorithm of Ray Tracing. through Hexahedral Mesh

New Reliable Algorithm of Ray Tracing. through Hexahedral Mesh Applied Mathematical Sciences, Vol. 8, 2014, no. 24, 1171-1176 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.4159 New Reliable Algorithm of Ray Tracing through Hexahedral Mesh R. P.

More information

Real-Time Realism will require...

Real-Time Realism will require... Real-Time Realism will require... Tomas Akenine-Möller Lund University and Intel Corporation Panel at High-Performance Graphics 2010 2010-06-27 1 Contents Talk about differences/similarities between ray

More information

Simpler and Faster HLBVH with Work Queues

Simpler and Faster HLBVH with Work Queues Simpler and Faster HLBVH with Work Queues Kirill Garanzha NVIDIA Keldysh Institute of Applied Mathematics Jacopo Pantaleoni NVIDIA Research David McAllister NVIDIA Figure 1: Some of our test scenes, from

More information

Parallel Programming on Larrabee. Tim Foley Intel Corp

Parallel Programming on Larrabee. Tim Foley Intel Corp Parallel Programming on Larrabee Tim Foley Intel Corp Motivation This morning we talked about abstractions A mental model for GPU architectures Parallel programming models Particular tools and APIs This

More information

Accelerated Ambient Occlusion Using Spatial Subdivision Structures

Accelerated Ambient Occlusion Using Spatial Subdivision Structures Abstract Ambient Occlusion is a relatively new method that gives global illumination like results. This paper presents a method to accelerate ambient occlusion using the form factor method in Bunnel [2005]

More information

GPU-BASED RAY TRACING ALGORITHM USING UNIFORM GRID STRUCTURE

GPU-BASED RAY TRACING ALGORITHM USING UNIFORM GRID STRUCTURE GPU-BASED RAY TRACING ALGORITHM USING UNIFORM GRID STRUCTURE Reza Fuad R 1) Mochamad Hariadi 2) Telematics Laboratory, Dept. Electrical Eng., Faculty of Industrial Technology Institut Teknologi Sepuluh

More information

Path Tracing part 2. Steve Rotenberg CSE168: Rendering Algorithms UCSD, Spring 2017

Path Tracing part 2. Steve Rotenberg CSE168: Rendering Algorithms UCSD, Spring 2017 Path Tracing part 2 Steve Rotenberg CSE168: Rendering Algorithms UCSD, Spring 2017 Monte Carlo Integration Monte Carlo Integration The rendering (& radiance) equation is an infinitely recursive integral

More information

Lecture 2 - Acceleration Structures

Lecture 2 - Acceleration Structures INFOMAGR Advanced Graphics Jacco Bikker - November 2017 - February 2018 Lecture 2 - Acceleration Structures Welcome! I x, x = g(x, x ) ε x, x + න S ρ x, x, x I x, x dx Today s Agenda: Problem Analysis

More information

Realistic Image Synthesis

Realistic Image Synthesis Realistic Image Synthesis Bidirectional Path Tracing & Reciprocity Karol Myszkowski Gurprit Singh Path Sampling Techniques Different techniques of sampling paths from both sides Numbers in parenthesis

More information

Massive Model Visualization using Real-time Ray Tracing

Massive Model Visualization using Real-time Ray Tracing Massive Model Visualization using Real-time Ray Tracing Eurographics 2006 Tutorial: Real-time Interactive Massive Model Visualization Andreas Dietrich Philipp Slusallek Saarland University & intrace GmbH

More information

Asynchronous BVH Construction for Ray Tracing Dynamic Scenes on Parallel Multi-Core Architectures

Asynchronous BVH Construction for Ray Tracing Dynamic Scenes on Parallel Multi-Core Architectures Eurographics Symposium on Parallel Graphics and Visualization (27) Jean M. Favre, Luis Paulo dos Santos, and Dirk Reiners (Editors) Asynchronous BVH Construction for Ray Tracing Dynamic Scenes on Parallel

More information

REAL-TIME GPU PHOTON MAPPING. 1. Introduction

REAL-TIME GPU PHOTON MAPPING. 1. Introduction REAL-TIME GPU PHOTON MAPPING SHERRY WU Abstract. Photon mapping, an algorithm developed by Henrik Wann Jensen [1], is a more realistic method of rendering a scene in computer graphics compared to ray and

More information

The Traditional Graphics Pipeline

The Traditional Graphics Pipeline Final Projects Proposals due Thursday 4/8 Proposed project summary At least 3 related papers (read & summarized) Description of series of test cases Timeline & initial task assignment The Traditional Graphics

More information

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan Course II Parallel Computer Architecture Week 2-3 by Dr. Putu Harry Gunawan www.phg-simulation-laboratory.com Review Review Review Review Review Review Review Review Review Review Review Review Processor

More information

High Definition Interactive Animated Ray Tracing on CELL Processor using Coherent Grid Traversal

High Definition Interactive Animated Ray Tracing on CELL Processor using Coherent Grid Traversal High Definition Interactive Animated Ray Tracing on CELL Processor using Coherent Grid Traversal David R. Chapman University of Maryland Baltimore County Abstract The IBM/Toshiba/Sony CELL processor exhibited

More information

Precomputed & Hybrid Variants of Lightcuts

Precomputed & Hybrid Variants of Lightcuts Precomputed & Hybrid Variants of Lightcuts Tim Condon Bruce Walter Kavita Bala Cornell University Abstract Our extensions to multidimensional lightcuts improve rendering performance using precomputation

More information

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015 Enhancing Traditional Rasterization Graphics with Ray Tracing October 2015 James Rumble Developer Technology Engineer, PowerVR Graphics Overview Ray Tracing Fundamentals PowerVR Ray Tracing Pipeline Using

More information

Building a Fast Ray Tracer

Building a Fast Ray Tracer Abstract Ray tracing is often used in renderers, as it can create very high quality images at the expense of run time. It is useful because of its ability to solve many different problems in image rendering.

More information

Realtime Ray Tracing and its use for Interactive Global Illumination

Realtime Ray Tracing and its use for Interactive Global Illumination EUROGRAPHICS 2003 STAR State of The Art Report Realtime Ray Tracing and its use for Interactive Global Illumination Ingo Wald Timothy J.Purcell Jörg Schmittler {wald,schmittler,benthin,slusallek}@graphics.cs.uni-sb.de

More information

Illumination Algorithms

Illumination Algorithms Global Illumination Illumination Algorithms Digital Lighting and Rendering CGT 340 The goal of global illumination is to model all possible paths of light to the camera. Global Illumination Global illumination

More information

ICS RESEARCH TECHNICAL TALK DRAKE TETREAULT, ICS H197 FALL 2013

ICS RESEARCH TECHNICAL TALK DRAKE TETREAULT, ICS H197 FALL 2013 ICS RESEARCH TECHNICAL TALK DRAKE TETREAULT, ICS H197 FALL 2013 TOPIC: RESEARCH PAPER Title: Data Management for SSDs for Large-Scale Interactive Graphics Applications Authors: M. Gopi, Behzad Sajadi,

More information

Cache-Oblivious Ray Reordering

Cache-Oblivious Ray Reordering Cache-Oblivious Ray Reordering Bochang Moon, Yongyoung Byun, Tae-Joon Kim, Pio Claudio KAIST Hye-sun Kim, Yun-ji Ban, Seung Woo Nam Electronics and Telecommunications Research Institute (ETRI) and Sung-eui

More information

RTfact. Concepts for Generic Ray Tracing. Iliyan Georgiev. Computer Graphics Group Saarland University Saarbrücken, Germany

RTfact. Concepts for Generic Ray Tracing. Iliyan Georgiev. Computer Graphics Group Saarland University Saarbrücken, Germany RTfact Concepts for Generic Ray Tracing Iliyan Georgiev Computer Graphics Group Saarland University 66123 Saarbrücken, Germany A thesis submitted in partial satisfaction of the requirements for the degree

More information

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization Duksu Kim Assistant professor, KORATEHC Education Ph.D. Computer Science, KAIST Parallel Proximity Computation on Heterogeneous Computing Systems for Graphics Applications Professional Experience Senior

More information

An Efficient Approach for Emphasizing Regions of Interest in Ray-Casting based Volume Rendering

An Efficient Approach for Emphasizing Regions of Interest in Ray-Casting based Volume Rendering An Efficient Approach for Emphasizing Regions of Interest in Ray-Casting based Volume Rendering T. Ropinski, F. Steinicke, K. Hinrichs Institut für Informatik, Westfälische Wilhelms-Universität Münster

More information

We present a method to accelerate global illumination computation in pre-rendered animations

We present a method to accelerate global illumination computation in pre-rendered animations Attention for Computer Graphics Rendering Hector Yee PDI / DreamWorks Sumanta Pattanaik University of Central Florida Corresponding Author: Hector Yee Research and Development PDI / DreamWorks 1800 Seaport

More information

Efficient use of Multiple Hardware Components for Image Synthesis

Efficient use of Multiple Hardware Components for Image Synthesis Efficient use of Multiple Hardware Components for Image Synthesis Francisco Pereira João Paulo Moura José Afonso Bulas-Cruz Luís Magalhães Universidade de Trás-os-Montes e Alto Douro Quinta de Prados,

More information

Real-Time Graphics Architecture. Kurt Akeley Pat Hanrahan. Ray Tracing.

Real-Time Graphics Architecture. Kurt Akeley Pat Hanrahan.  Ray Tracing. Real-Time Graphics Architecture Kurt Akeley Pat Hanrahan http://www.graphics.stanford.edu/courses/cs448a-01-fall Ray Tracing with Tim Purcell 1 Topics Why ray tracing? Interactive ray tracing on multicomputers

More information

MULTI-LEVEL GRID STRATEGIES FOR RAY TRACING Improving Render Time Performance for Row Displacement Compressed Grids

MULTI-LEVEL GRID STRATEGIES FOR RAY TRACING Improving Render Time Performance for Row Displacement Compressed Grids MULTI-LEVEL GRID STRATEGIES FOR RAY TRACING Improving Render Time Performance for Row Displacement Compressed Grids Vasco Costa, João Madeiras Pereira INESC-ID / IST, Rua Alves Redol 9, Apartado 1369,

More information

Advanced Graphics. Path Tracing and Photon Mapping Part 2. Path Tracing and Photon Mapping

Advanced Graphics. Path Tracing and Photon Mapping Part 2. Path Tracing and Photon Mapping Advanced Graphics Path Tracing and Photon Mapping Part 2 Path Tracing and Photon Mapping Importance Sampling Combine importance sampling techniques Reflectance function (diffuse + specular) Light source

More information

CS 563 Advanced Topics in Computer Graphics Irradiance Caching and Particle Tracing. by Stephen Kazmierczak

CS 563 Advanced Topics in Computer Graphics Irradiance Caching and Particle Tracing. by Stephen Kazmierczak CS 563 Advanced Topics in Computer Graphics Irradiance Caching and Particle Tracing by Stephen Kazmierczak Introduction Unbiased light transport algorithms can sometimes take a large number of rays to

More information

Shell: Accelerating Ray Tracing on GPU

Shell: Accelerating Ray Tracing on GPU Shell: Accelerating Ray Tracing on GPU Kai Xiao 1, Bo Zhou 2, X.Sharon Hu 1, and Danny Z. Chen 1 1 Department of Computer Science and Engineering, University of Notre Dame 2 Department of Radiation Oncology,

More information

Adaptive Assignment for Real-Time Raytracing

Adaptive Assignment for Real-Time Raytracing Adaptive Assignment for Real-Time Raytracing Paul Aluri [paluri] and Jacob Slone [jslone] Carnegie Mellon University 15-418/618 Spring 2015 Summary We implemented a CUDA raytracer accelerated by a non-recursive

More information

Evaluation and Improvement of GPU Ray Tracing with a Thread Migration Technique

Evaluation and Improvement of GPU Ray Tracing with a Thread Migration Technique Evaluation and Improvement of GPU Ray Tracing with a Thread Migration Technique Xingxing Zhu and Yangdong Deng Institute of Microelectronics, Tsinghua University, Beijing, China Email: zhuxingxing0107@163.com,

More information

Schedule. MIT Monte-Carlo Ray Tracing. Radiosity. Review of last week? Limitations of radiosity. Radiosity

Schedule. MIT Monte-Carlo Ray Tracing. Radiosity. Review of last week? Limitations of radiosity. Radiosity Schedule Review Session: Tuesday November 18 th, 7:30 pm, Room 2-136 bring lots of questions! MIT 6.837 Monte-Carlo Ray Tracing Quiz 2: Thursday November 20 th, in class (one weeks from today) MIT EECS

More information

To Do. Real-Time High Quality Rendering. Motivation for Lecture. Monte Carlo Path Tracing. Monte Carlo Path Tracing. Monte Carlo Path Tracing

To Do. Real-Time High Quality Rendering. Motivation for Lecture. Monte Carlo Path Tracing. Monte Carlo Path Tracing. Monte Carlo Path Tracing Real-Time High Quality Rendering CSE 274 [Fall 2015], Lecture 5 Tour of Modern Offline Rendering To Do Project milestone (1-2 pages), final project proposal Due on Oct 27 Please get in touch with me if

More information

Early Split Clipping for Bounding Volume Hierarchies

Early Split Clipping for Bounding Volume Hierarchies Early Split Clipping for Bounding Volume Hierarchies Manfred Ernst Günther Greiner Lehrstuhl für Graphische Datenverarbeitung Universität Erlangen-Nürnberg, Germany ABSTRACT Despite their algorithmic elegance

More information

GPU Programming Using NVIDIA CUDA

GPU Programming Using NVIDIA CUDA GPU Programming Using NVIDIA CUDA Siddhante Nangla 1, Professor Chetna Achar 2 1, 2 MET s Institute of Computer Science, Bandra Mumbai University Abstract: GPGPU or General-Purpose Computing on Graphics

More information

Distribution Ray Tracing

Distribution Ray Tracing Reading Required: Distribution Ray Tracing Brian Curless CSE 557 Fall 2015 Shirley, 13.11, 14.1-14.3 Further reading: A. Glassner. An Introduction to Ray Tracing. Academic Press, 1989. [In the lab.] Robert

More information

Sung-Eui Yoon ( 윤성의 )

Sung-Eui Yoon ( 윤성의 ) CS380: Computer Graphics Ray Tracing Sung-Eui Yoon ( 윤성의 ) Course URL: http://sglab.kaist.ac.kr/~sungeui/cg/ Class Objectives Understand overall algorithm of recursive ray tracing Ray generations Intersection

More information

Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm

Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm Department of Computer Science and Engineering Sogang University, Korea Improving Memory

More information

Game Technology. Lecture Physically Based Rendering. Dipl-Inform. Robert Konrad Polona Caserman, M.Sc.

Game Technology. Lecture Physically Based Rendering. Dipl-Inform. Robert Konrad Polona Caserman, M.Sc. Game Technology Lecture 7 4.12.2017 Physically Based Rendering Dipl-Inform. Robert Konrad Polona Caserman, M.Sc. Prof. Dr.-Ing. Ralf Steinmetz KOM - Multimedia Communications Lab PPT-for-all v.3.4_office2010

More information

Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010

Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010 1 Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010 Presentation by Henrik H. Knutsen for TDT24, fall 2012 Om du ønsker, kan du sette inn navn, tittel på foredraget, o.l.

More information

Selective Rendering: Computing only what you see

Selective Rendering: Computing only what you see Selective Rendering: Computing only what you see Alan Chalmers University of Bristol, UK Kurt Debattista University of Bristol, UK Luis Paulo dos Santos University of Minho, Portugal Abstract The computational

More information

Reusing Shading for Interactive Global Illumination GDC 2004

Reusing Shading for Interactive Global Illumination GDC 2004 Reusing Shading for Interactive Global Illumination Kavita Bala Cornell University Bruce Walter Introduction What is this course about? Schedule What is Global Illumination? Computing Global Illumination

More information

Lecture 4 - Real-time Ray Tracing

Lecture 4 - Real-time Ray Tracing INFOMAGR Advanced Graphics Jacco Bikker - November 2017 - February 2018 Lecture 4 - Real-time Ray Tracing Welcome! I x, x = g(x, x ) ε x, x + න S ρ x, x, x I x, x dx Today s Agenda: Introduction Ray Distributions

More information

Statistical Acceleration for Animated Global Illumination

Statistical Acceleration for Animated Global Illumination Statistical Acceleration for Animated Global Illumination Mark Meyer John Anderson Pixar Animation Studios Unfiltered Noisy Indirect Illumination Statistically Filtered Final Comped Frame Figure 1: An

More information

Packet-based Ray Tracing of Catmull-Clark Subdivision Surfaces

Packet-based Ray Tracing of Catmull-Clark Subdivision Surfaces 1 Packet-based Ray Tracing of Catmull-Clark Subdivision Surfaces Carsten Benthin, Solomon Boulos, Dylan Lacewell and Ingo Wald Intel Corporation School of Computing, University of Utah Walt Disney Animation

More information

Efficient MIMD Architectures for High-Performance Ray Tracing

Efficient MIMD Architectures for High-Performance Ray Tracing Efficient MIMD Architectures for High-Performance Ray Tracing D. Kopta, J. Spjut, E. Brunvand, and A. Davis University of Utah, School of Computing {dkopta,spjut,elb,ald}@cs.utah.edu Abstract Ray tracing

More information

Rendering Algorithms: Real-time indirect illumination. Spring 2010 Matthias Zwicker

Rendering Algorithms: Real-time indirect illumination. Spring 2010 Matthias Zwicker Rendering Algorithms: Real-time indirect illumination Spring 2010 Matthias Zwicker Today Real-time indirect illumination Ray tracing vs. Rasterization Screen space techniques Visibility & shadows Instant

More information

Photorealistic 3D Rendering for VW in Mobile Devices

Photorealistic 3D Rendering for VW in Mobile Devices Abstract University of Arkansas CSCE Department Advanced Virtual Worlds Spring 2013 Photorealistic 3D Rendering for VW in Mobile Devices Rafael Aroxa In the past few years, the demand for high performance

More information

Hybrid Ray Tracing and Path Tracing of Bezier Surfaces Using A Mixed Hierarchy

Hybrid Ray Tracing and Path Tracing of Bezier Surfaces Using A Mixed Hierarchy Hybrid Ray Tracing and Path Tracing of Bezier Surfaces Using A Mixed Hierarchy Rohit Nigam rohit_n@students.iiit.ac.in P. J. Narayanan pjn@iiit.ac.in Center for Visual Information Technology, IIIT-Hyderabad,

More information

AN EXPERIMENTAL COMPARISON OF ACCELERATION SCHEMES FOR DENSELY OCCLUDED ENVIRONMENTS

AN EXPERIMENTAL COMPARISON OF ACCELERATION SCHEMES FOR DENSELY OCCLUDED ENVIRONMENTS SIGNAL - IMAGE - COMMUNICATIONS FRE CNRS n 2731 AN EXPERIMENTAL COMPARISON OF ACCELERATION SCHEMES FOR DENSELY OCCLUDED ENVIRONMENTS D. FRADIN, D. MENEVEAUX RAPPORT DE RECHERCHE n 26 - Janvier 26 SIC,

More information

MIT Monte-Carlo Ray Tracing. MIT EECS 6.837, Cutler and Durand 1

MIT Monte-Carlo Ray Tracing. MIT EECS 6.837, Cutler and Durand 1 MIT 6.837 Monte-Carlo Ray Tracing MIT EECS 6.837, Cutler and Durand 1 Schedule Review Session: Tuesday November 18 th, 7:30 pm bring lots of questions! Quiz 2: Thursday November 20 th, in class (one weeks

More information

Final Project: Real-Time Global Illumination with Radiance Regression Functions

Final Project: Real-Time Global Illumination with Radiance Regression Functions Volume xx (200y), Number z, pp. 1 5 Final Project: Real-Time Global Illumination with Radiance Regression Functions Fu-Jun Luan Abstract This is a report for machine learning final project, which combines

More information

Bidirectional Instant Radiosity

Bidirectional Instant Radiosity Eurographics Symposium on Rendering (2006) Tomas Akenine-Möller and Wolfgang Heidrich (Editors) Bidirectional Instant Radiosity B. Segovia 1,2, J. C. Iehl 2, R. Mitanchey 1 and B. Péroche 2 1 LIRIS: Lyon

More information

Introduction to GPU computing

Introduction to GPU computing Introduction to GPU computing Nagasaki Advanced Computing Center Nagasaki, Japan The GPU evolution The Graphic Processing Unit (GPU) is a processor that was specialized for processing graphics. The GPU

More information

Fast BVH Construction on GPUs

Fast BVH Construction on GPUs Fast BVH Construction on GPUs Published in EUROGRAGHICS, (2009) C. Lauterbach, M. Garland, S. Sengupta, D. Luebke, D. Manocha University of North Carolina at Chapel Hill NVIDIA University of California

More information

11/2/2010. In the last lecture. Monte-Carlo Ray Tracing : Path Tracing. Today. Shadow ray towards the light at each vertex. Path Tracing : algorithm

11/2/2010. In the last lecture. Monte-Carlo Ray Tracing : Path Tracing. Today. Shadow ray towards the light at each vertex. Path Tracing : algorithm Comuter Grahics Global Illumination: Monte-Carlo Ray Tracing and Photon Maing Lecture 11 In the last lecture We did ray tracing and radiosity Ray tracing is good to render secular objects but cannot handle

More information

GPU Memory Management for Efficient Ray Tracing Applications

GPU Memory Management for Efficient Ray Tracing Applications Eurographics Symposium on Rendering 014 Wojciech Jarosz and Pieter Peers (Guest Editors) Volume 33 (014), Number 4 GPU Memory Management for Efficient Ray Tracing Applications Weilun Sun, Ling-Qi Yan,

More information