Acceleration of ray tracing method using predictive evaluation and GPGPU technology
Cent. Eur. J. Comp. Sci. 4(3)
DOI: /s
Central European Journal of Computer Science
Research Article

Branislav Sobota, Štefan Korečko, Csaba Szabó, František Hrozek
Department of Computers and Informatics, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, Košice, Slovakia

Received 28 February 2014; accepted 29 August 2014

Abstract: Ray tracing is one of the computer graphics methods that achieve the most realistic outputs. Its main disadvantage is its high computational demand. This disadvantage can be reduced by parallelization, because the ray tracing method is inherently parallel. The solution presented in this article uses GPGPU (general-purpose computing on graphics processing units) technology and a predictive evaluation to accelerate the ray tracing method. CUDA C was selected as the GPGPU language and was used to convert a raytracer core. The main reason for choosing this language was the use of the Tesla C1060 graphics card. The predictive evaluation of a scene is based on the fact that total computation time increases proportionally with resolution. This evaluation allows selection of the optimal scene division for parallel ray tracing. In tests, the proposed GPGPU solution reached accelerations of up to 28.3 compared to the CPU.

Keywords: ray tracing, parallel ray tracing, GPGPU, CUDA, NVIDIA

Versita sp. z o.o.

1. Introduction

Ray tracing is one of the computer graphics techniques used to produce accurate images of photorealistic quality from complex three-dimensional scenes described and stored in a computer-readable form. It is based on the simulation of real-world optical processes. One great disadvantage of such techniques is that they are computationally very expensive and require massive amounts of floating-point operations.
Parallel ray tracing takes advantage of parallel computing to speed up image rendering, since this technique is inherently parallel¹ [1].

E-mail: branislav.sobota@tuke.sk, stefan.korecko@tuke.sk, csaba.szabo@tuke.sk, frantisek.hrozek@tuke.sk (Corresponding author)
¹ OpenCL Reference pages, official website, https://www.khronos.org/opencl/ [cited May 2013]

In nature, light sources emit rays of light, which travel through space and interact with objects and the environment, by which they are absorbed, reflected or refracted. These rays are then received by our eyes and form a picture. Ray tracing produces images by simulating these processes, with one significant modification. Emitting rays from light sources and tracking them would be very time-consuming and inefficient, because only a small fraction of them ends up in the eye or camera; the rest is irrelevant. Instead, ray tracing casts rays from the camera through the image plane (one for each pixel of the final image) into the scene and tracks these rays. It computes the intersection of each ray with the first surface it hits, examines the material properties (casting additional rays for refraction or reflection if necessary) and the incoming light from the light sources in the scene (by casting additional rays from the intersection to each source), and then computes the colour of the pixel in the final image [2-4]. A survey of current ray tracing techniques can be found in [5].

Ray tracing belongs to a set of problems that utilize parallel computing very well, since it is computationally expensive and can be easily decomposed. The two main factors influencing the design and performance of parallel ray tracing systems are the computation model [3] and the load-balancing mechanism [6].

The idea of using graphics cards and their parallel capabilities for non-graphical computations has been developed in many ways in recent years, and the acceleration of applications by GPU utilisation is already common (for example in medical image reconstruction [7, 8], molecular dynamics [9, 10] or industry [11, 12]). This technology is also applicable to photorealistic rendering methods. Ray tracing belongs to these methods and, as mentioned earlier, its algorithm can be parallelized very well.

The main goal of this article is to propose a parallel ray tracing solution using GPGPU technology that incorporates a predictive evaluation of the scene.
This evaluation is executed before the scene is rendered in full resolution and allows selection of the optimal scene division for the parallelization on a GPGPU. The article also describes the results obtained by our solution, using a statistical comparison of rendering times.

2. Analysis

2.1. Parallel ray tracing

There are two principal methods of decomposing a ray tracing computation: demand-driven and data-driven (or data-parallel). There is also research focused on developing a hybrid model that tries to combine the best features of both methods² [4].

In demand-driven parallel ray tracing, the final product of the ray tracer is an image of m × n pixels. Since each pixel is computed independently, the most obvious way of decomposition is to divide the image into p parts, where p is the number of processors available. Each processor then computes m × n / p pixels and, ideally, the computation is p times faster. A number of jobs are created, each containing a different subset of image pixels, and these jobs are assigned to processors. The input scene is copied to the local memory of each processor. Processors render their parts, return the computed pixels and get another job if there is any; in the end the final image is composed from these parts. The main benefits of this approach are easy decomposition and implementation, simple job distribution and control, an unchanged general ray tracing algorithm, and good scaling. The main disadvantage is that the input scene has to be copied to the local memory of each processor, which poses a problem if the scene is very large.

Data-driven parallel ray tracing (also called data-parallel ray tracing) splits the input scene into a number of sections (rows, tiles or columns, Figure 1) and assigns these sections to processors. Each processor is responsible for all computations associated with objects in its section, no matter where the ray comes from.
Only rays passing through the processor's section are traced. If a ray spawned at one processor needs data from another processor, it is transferred to that processor. The way the scene is divided into sections determines the efficiency of the parallel computation. Determining the number of rays that will pass through a section of the scene, in order to estimate which sections require the most processing, is one of the hardest problems to overcome. Using a cost function can be helpful. The main benefit of this approach is that the input scene does not have to be copied entirely to each processor; it is split into sections, so even very large scenes can be processed relatively easily. The main disadvantage is that this approach does not scale very well with growing scene complexity and cluster size, because of the substantial task communication overhead and ray transfers [4].

² OpenCL Reference pages, official website, https://www.khronos.org/opencl/ [cited May 2013]

Figure 1. Tiling examples for the demand-driven parallel ray tracing.

2.2. Predictive evaluation

There are two methods of predictive evaluation of a scene [13]: prediction based on scene description and prediction based on scene simplification.

Prediction based on scene description uses parameters of objects (e.g. material, position or type) and is executed without rendering the scene. According to the combination of these parameters we distinguish:
- scene prediction based on material analysis;
- scene prediction based on material and position analysis;
- scene prediction based on material, position and type analysis.

Prediction based on scene simplification uses a simplified version of the original scene. The following can be used for this simplification:
- reduction of the scene resolution;
- limitation of the recursion depth used for scene rendering;
- a histogram of object "difficulty".

The disadvantage of predictive evaluation is that the prediction strongly depends on the scene used, and therefore the time needed for the computation varies.

3. Design and implementation

3.1. Parallel raytracer

The implementation was done in several steps: selection of a GPGPU language, selection of a raytracer, transformation of the raytracer core into a parallel raytracer core using the CUDA C language, optimization, and testing (this last step is covered in the section Experiments and results).

CUDA C³ was selected as the GPGPU language in our solution. The main reason for choosing this language was the use of the Tesla C1060 graphics card, which reaches its best results with this language [14]. The solution created by Grégory Massal [3] was used as the raytracer. Subsequently, the core of this raytracer was converted into CUDA C.

³ NVIDIA Corporation: NVIDIA CUDA C Programming guide, official website, http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html [cited April 2013]

The next step was an optimization of the CUDA application, which consisted of two sub-steps: memory usage optimization and stream processor occupancy optimization. The memory optimization was based on the following rules: use fast memories on the GPU whenever possible, minimize the usage of slow memories on the GPU, and minimize data copying between the GPU and the host system. The CUDA GPU Occupancy Calculator⁴ was used for the stream processor occupancy optimization. The optimal number of threads for the graphics cards used (NVIDIA Tesla C1060 and GeForce GTX 275) was 128 or 320.

The number of parts into which the scene was divided for parallel ray tracing was selected according to the optimal number of threads. These parts were divided according to the data-driven approach (into tiles, rows or columns):
- selected sizes of parts for 128 threads (points): 16 points on the x-axis and 8 points on the y-axis; 128 points on the x-axis; 128 points on the y-axis;
- selected sizes of parts for 320 threads (points): 20 points on the x-axis and 16 points on the y-axis; 320 points on the x-axis; 320 points on the y-axis.

3.2. Predictive evaluation

The predictive evaluation used in our solution is based on the fact that the total time of rendering a scene increases proportionally with the resolution. Using this fact, the algorithm for the predictive evaluation is as follows:
1. render the scene in a low resolution using different types of scene division;
2. select the division with the lowest rendering time;
3. render the scene in full resolution using this division.
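The scene division and the first two steps of the algorithm above can be sketched roughly as follows. This is a minimal illustration in Python, not the CUDA C implementation described in the article: the names `split_image`, `time_render`, `pick_division` and the `render_part` hook are hypothetical, and reading "128 points on the x-axis" as a 128 × 1 strip is an assumption of the sketch.

```python
import time
from typing import Callable, Dict, Iterator, Tuple

Part = Tuple[int, int, int, int]  # (x0, y0, width, height) in pixels

# Part sizes for 128 threads as listed in the text; treating the single-axis
# divisions as 128x1 and 1x128 strips is an assumption of this sketch.
DIVISIONS_128: Dict[str, Tuple[int, int]] = {
    "tile 16x8": (16, 8),
    "strip 128x1": (128, 1),
    "strip 1x128": (1, 128),
}


def split_image(width: int, height: int, part_w: int, part_h: int) -> Iterator[Part]:
    """Yield parts of at most part_w x part_h pixels that cover the image."""
    for y0 in range(0, height, part_h):
        for x0 in range(0, width, part_w):
            yield (x0, y0, min(part_w, width - x0), min(part_h, height - y0))


def time_render(render_part: Callable[[Part], None], width: int, height: int,
                part_size: Tuple[int, int]) -> float:
    """Step 1 for one division: render every part, return elapsed seconds."""
    start = time.perf_counter()
    for part in split_image(width, height, *part_size):
        # Hypothetical hook; a GPU version would launch a kernel per part here.
        render_part(part)
    return time.perf_counter() - start


def pick_division(measure: Callable[[Tuple[int, int]], float],
                  divisions: Dict[str, Tuple[int, int]]) -> str:
    """Step 2: return the name of the division with the lowest measured time."""
    timings = {name: measure(size) for name, size in divisions.items()}
    return min(timings, key=timings.get)
```

With some rendering routine `my_render_part` (hypothetical), a call such as `pick_division(lambda size: time_render(my_render_part, 64, 64, size), DIVISIONS_128)` would return the name of the division to use for the full-resolution render in step 3. On a sequential CPU loop the division hardly changes the timing; the selection only becomes meaningful when each part maps to a group of GPU threads.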
Outputs of this predictive evaluation for a sample scene, using three different low resolutions and three different divisions, are shown in the next section, Experiments and results.

4. Experiments and results

The sample scenes in Figure 2 were used for Experiments no. 1 and no. 2. The selected scenes allow easy identification of scene complexity in individual parts (tiles, rows or columns).

Figure 2. Sample scenes (from left to right): scene 1, scene 2, scene 3.

⁴ NVIDIA Corporation: NVIDIA CUDA C Programming guide, official website, http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html [cited April 2013]
4.1. Experiment no. 1 - testing the parallel raytracer

Experiment no. 1 focused on the comparison of final times for various divisions of the sample scenes. The hardware configuration used for the experiment was: Intel Dual Core E6300 overclocked to 3.8 GHz, NVIDIA GeForce GTX 275, 4 GB RAM and Windows 7 64-bit. The results of this comparison are shown in Table 1. A visual representation of this comparison is shown in Figure 3.

Table 1. Final times for sample scenes divided into 128 and 320 parts (resolution ).
Columns: Size, Scene 1, Scene 2, Scene 3.
Rows (size): 16 points on the x-axis and 8 points on the y-axis; 128 points on the x-axis; 128 points on the y-axis; 20 points on the x-axis and 16 points on the y-axis; 320 points on the x-axis; 320 points on the y-axis.

As can be seen, the optimal division for the first and the third scene is 128 points on the y-axis. For the second scene the optimal division is 16 points on the x-axis and 8 points on the y-axis. The worst results were obtained using the division of 128 points on the x-axis. The results also confirmed the assumption that the total computation time depends on the division used, while the optimal division depends on the scene used.

Figure 3. The comparison of final times for various divisions (axes: parts size (points) and time (s); series: scene 1, scene 2, scene 3).

4.2. Experiment no. 2 - testing acceleration, GPU vs. CPU

Experiment no. 2 focused on the comparison of computation times between graphics cards and processors. The following CPUs and GPUs were compared in the experiment: Intel Dual Core E6300 (overclocked to 3.8 GHz), Intel i5-2500k (overclocked to 4.5 GHz), NVIDIA GeForce GTX 275 and NVIDIA Tesla C1060. The division with 20 points on the x-axis and 16 points on the y-axis (320 threads) was used for the parallelization on the GPUs. The ray tracing computation on the CPUs was not parallel, so only one core was used for the computation. The results of this comparison are shown in Table 2. As can be seen, the GPGPU solution reaches much lower rendering times. A comparison of final times and accelerations between the best CPU (Intel i5-2500k) and GPU (NVIDIA Tesla C1060) is shown in Table 3. Average acceleration observed in this experiment was

Table 2. Computation times comparison (in seconds).
Columns: Scene, Resolution, Intel Core 2 Duo E6300, Intel i5-2500k, NVIDIA GeForce GTX 275, NVIDIA Tesla C1060.
Rows: Scene 1, Scene 2, Scene 3.

Table 3. Comparison of times and accelerations between the Intel i5-2500k and the NVIDIA Tesla C1060 (in seconds).
Columns: Scene, Resolution, Intel i5-2500k, NVIDIA Tesla C1060, Acceleration.
Rows: Scene 1, Scene 2, Scene 3.

4.3. Experiment no. 3 - predictive evaluation

The correctness of the predictive evaluation was tested on several scenes. For the tests, we used three different low resolutions (32 × 32, and pixels) and three different types of scene division (based on the data-driven approach: tiles, rows and columns). An example of a scene used and its divisions is shown in Figure 4. The division into 16 parts was used for this scene. Computation times for each part of the individual divisions (tiles, rows and columns) are shown in Figures 5, 6 and 7. The results obtained in this experiment also empirically verified the assumption used for the predictive evaluation: the total rendering time of the scene increases proportionally with the resolution (see the increase in rendering times for individual resolutions in Figures 5, 6 and 7).
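The proportionality assumption and the acceleration figure reported in the experiments can be expressed as two small formulas. The sketch below is only an illustration in Python: it assumes that "resolution" means the total pixel count, the function names are hypothetical, and any numbers passed to it are made-up inputs rather than measurements from the tables.

```python
from typing import Tuple


def estimate_full_time(low_time_s: float, low_res: Tuple[int, int],
                       full_res: Tuple[int, int]) -> float:
    """Scale a low-resolution render time by the ratio of pixel counts.

    Assumes rendering time grows proportionally with the number of pixels.
    """
    low_pixels = low_res[0] * low_res[1]
    full_pixels = full_res[0] * full_res[1]
    return low_time_s * full_pixels / low_pixels


def acceleration(cpu_time_s: float, gpu_time_s: float) -> float:
    """Acceleration of the GPU over the CPU: ratio of the two render times."""
    return cpu_time_s / gpu_time_s
```

Under this assumption, doubling both image dimensions quadruples the estimated time, so the ranking of divisions measured at low resolution is expected to carry over to the full-resolution render.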
Figure 4. One of the scenes used, with three types of division (from left): tiles, rows, columns.

Figure 5. Computation time (left) and number of rays per 1 ms (right) for each tile.

Figure 6. Computation time (left) and number of rays per 1 ms (right) for each row.

5. Conclusion

Parallel ray tracing using GPGPU and CUDA technology is a very popular research topic (for example [15-17]). However, many existing solutions select the best scene division for the parallelization according to tests that were performed manually. Our solution uses a predictive evaluation algorithm for this selection, which allows these tests to be automated according to the number of threads used.
Figure 7. Computation time (left) and number of rays per 1 ms (right) for each column.

During the design of the application we found that parallelizing ray tracing at the level of pixels (or groups of pixels) requires transferring a significant part of the raytracer core into the GPGPU language (CUDA C in our solution). The parallelization algorithm used in this solution is universal and can be used in other ray tracing applications as well.

The results of our solution were presented in the results of experiment no. 2 (computation times for sample scenes using the GPU and the CPU). During this experiment we observed an acceleration of up to 28.3 compared to the CPU. However, this acceleration strongly depends on the scene and rendering parameters, for example the rendering resolution, scene complexity or scene division for the parallelization. A parallel implementation of the ray tracing computation on the CPU could also affect this acceleration strongly.

An important part of our solution is the predictive evaluation, which allows semi-automatic selection of the optimal scene division for the parallelization on a GPGPU. The rendering times using this evaluation were shown in the results of experiment no. 3. There is still an open question about the predictive evaluation of scenes with lower resolutions (up to several hundreds of pixels). Evaluation of these scenes can take too much time compared to the time needed for their rendering. In this case it is better to render these scenes without evaluation.

A formal description of ray tracing and parallel ray tracing and its implementation is also an excellent basis for teaching formal methods [18]. The use of Petri nets is also a good basis for the evaluation of time gains.

Another question is the use of OpenCL as the GPGPU language. It would enable execution of the application on graphics cards of other manufacturers. However, the results would probably be worse than with CUDA.
Evaluation of this hypothesis is the goal of our future work, which will focus on two main areas: implementation of parallel ray tracing using OpenCL and comparison of the results obtained with both GPGPU technologies.

6. Acknowledgment

This work is supported by the project KEGA no. 050TUKE-4/2012: "Application of Virtual Reality Technologies in Teaching Formal Methods".

References

[1] B. Sobota, M. Straka, J. Perháč, A visualization in cluster environment, Grid Computing for Complex Problem 2007, Bratislava (Institute of Informatics SAV, Bratislava, 2007)
[2] M. Jelšina, B. Sobota, M. Straka, Parallel Hierarchical Model of Visualisation Computing in Virtual Reality System, In proceedings of: 7th Scientific Conference with International Participation, Engineering of Modern Electric Systems 2003 (EMES 03), University of Oradea, Faculty of Electrotechnics and Informatics, Department of Computer Science, Romania, Oradea, May (University of Oradea, Faculty of Electrotechnics and Informatics, Department of Computer Science, Oradea)
[3] G. Massal, A raytracer in C Introduction-What-is-ray-tracing.html [cited April 2013]
[4] I. Notkin, C. Gotsman, Parallel Progressive Ray-tracing, Comput. Graph. Forum 16(1), 43-55, 1997
[5] I. Wald, W.R. Mark, J. Günther, S. Boulos et al., State of the Art in Ray Tracing Animated Scenes, Comput. Graph. Forum 28(6), 2009
[6] A. Heirich, J. Arvo, A competitive analysis of load balancing strategies for parallel ray tracing, JoS 12, 57-68, 1998
[7] V. Archirapatkave, H. Sumilo, S.C.W. See et al., GPGPU Acceleration Algorithm for Medical Image Reconstruction, IEEE 9th International Symposium on Parallel and Distributed Processing with Applications (ISPA 2011), May 2011, 41-46
[8] B. Hu, X. Ma, M. Joyce et al., A GPGPU accelerated compressed sensing with tight wavelet frame transform technique for MR imaging reconstruction, IEEE International Conference on Imaging Systems and Techniques (IST 2012), July 2012
[9] G. Chen, G. Li, S. Pei, B. Wu, GPGPU supported cooperative acceleration in molecular dynamics, 13th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2009), April 2009
[10] W. Liu, B. Schmidt, G. Voss et al., Accelerating molecular dynamics simulations using Graphics Processing Units with CUDA, Comput. Phys. Comm. 179(9), 2008
[11] D. Hallmans, K. Sandstrom, M. Lindgren, T. Nolte, GPGPU for industrial control systems, IEEE 18th Conference on Emerging Technologies & Factory Automation (ETFA 2013), Sept. 2013, 1-4
[12] T. Messay, Chong Chen, R. Ordonez et al., GPGPU acceleration of a novel calibration method for industrial robots, In proceedings of: 2011 IEEE National Aerospace and Electronics Conference (NAECON 2011), July 2011
[13] E. Reinhard, A.J. Kok, P.W. Jansen, Cost prediction in ray tracing, Rendering Techniques '96 (Springer, Vienna, 1996)
[14] R. Šoltys, Raytracing method implementation using GPGPU technology, Diploma thesis, Technical University of Košice, FEEI, 2012
[15] R. Geist, J. Steele, A lighting model for fast rendering of forest ecosystems, IEEE Symposium on Interactive Ray Tracing 2008 (RT 2008), 9-10 Aug. 2008
[16] S. Guntury, P.J. Narayanan, Raytracing Dynamic Scenes on the GPU Using Grids, IEEE Trans. Visual. Comput. Graphics 18(1), 5-16, 2012
[17] A. Segovia, L. Xiaoming, G. Guang, Iterative layer-based raytracing on CUDA, 28th IEEE International Performance Computing and Communications Conference (IPCCC 2009), Dec. 2009
[18] Š. Korečko, B. Sobota, Using coloured Petri nets for design of parallel raytracing environment, Acta Univ. Sapientiae 2(1), 28-39
More informationParallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs
Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs C.-C. Su a, C.-W. Hsieh b, M. R. Smith b, M. C. Jermy c and J.-S. Wu a a Department of Mechanical Engineering, National Chiao Tung
More informationModern GPUs (Graphics Processing Units)
Modern GPUs (Graphics Processing Units) Powerful data parallel computation platform. High computation density, high memory bandwidth. Relatively low cost. NVIDIA GTX 580 512 cores 1.6 Tera FLOPs 1.5 GB
More informationFacial Recognition Using Neural Networks over GPGPU
Facial Recognition Using Neural Networks over GPGPU V Latin American Symposium on High Performance Computing Juan Pablo Balarini, Martín Rodríguez and Sergio Nesmachnow Centro de Cálculo, Facultad de Ingeniería
More informationAdaptive Assignment for Real-Time Raytracing
Adaptive Assignment for Real-Time Raytracing Paul Aluri [paluri] and Jacob Slone [jslone] Carnegie Mellon University 15-418/618 Spring 2015 Summary We implemented a CUDA raytracer accelerated by a non-recursive
More informationIntroduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware
More informationL10 Layered Depth Normal Images. Introduction Related Work Structured Point Representation Boolean Operations Conclusion
L10 Layered Depth Normal Images Introduction Related Work Structured Point Representation Boolean Operations Conclusion 1 Introduction Purpose: using the computational power on GPU to speed up solid modeling
More informationUsing GPUs to compute the multilevel summation of electrostatic forces
Using GPUs to compute the multilevel summation of electrostatic forces David J. Hardy Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University of
More informationReconstruction Improvements on Compressive Sensing
SCITECH Volume 6, Issue 2 RESEARCH ORGANISATION November 21, 2017 Journal of Information Sciences and Computing Technologies www.scitecresearch.com/journals Reconstruction Improvements on Compressive Sensing
More informationMost real programs operate somewhere between task and data parallelism. Our solution also lies in this set.
for Windows Azure and HPC Cluster 1. Introduction In parallel computing systems computations are executed simultaneously, wholly or in part. This approach is based on the partitioning of a big task into
More informationParallel FFT Program Optimizations on Heterogeneous Computers
Parallel FFT Program Optimizations on Heterogeneous Computers Shuo Chen, Xiaoming Li Department of Electrical and Computer Engineering University of Delaware, Newark, DE 19716 Outline Part I: A Hybrid
More informationV-Ray RT: A New Paradigm in Photorealistic Raytraced Rendering on NVIDIA GPUs. Vladimir Koylazov Chaos Software.
V-Ray RT: A New Paradigm in Photorealistic Raytraced Rendering on NVIDIA s Vladimir Koylazov Chaos Software V-Ray RT demonstration V-Ray RT demonstration V-Ray RT architecture overview Goals of V-Ray RT
More informationGPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS
GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS Agenda Forming a GPGPU WG 1 st meeting Future meetings Activities Forming a GPGPU WG To raise needs and enhance information sharing A platform for knowledge
More information3D Registration based on Normalized Mutual Information
3D Registration based on Normalized Mutual Information Performance of CPU vs. GPU Implementation Florian Jung, Stefan Wesarg Interactive Graphics Systems Group (GRIS), TU Darmstadt, Germany stefan.wesarg@gris.tu-darmstadt.de
More informationEnhancing Traditional Rasterization Graphics with Ray Tracing. October 2015
Enhancing Traditional Rasterization Graphics with Ray Tracing October 2015 James Rumble Developer Technology Engineer, PowerVR Graphics Overview Ray Tracing Fundamentals PowerVR Ray Tracing Pipeline Using
More informationAccelerating Implicit LS-DYNA with GPU
Accelerating Implicit LS-DYNA with GPU Yih-Yih Lin Hewlett-Packard Company Abstract A major hindrance to the widespread use of Implicit LS-DYNA is its high compute cost. This paper will show modern GPU,
More informationCross Teaching Parallelism and Ray Tracing: A Project based Approach to Teaching Applied Parallel Computing
and Ray Tracing: A Project based Approach to Teaching Applied Parallel Computing Chris Lupo Computer Science Cal Poly Session 0311 GTC 2012 Slide 1 The Meta Data Cal Poly is medium sized, public polytechnic
More informationVisual Analysis of Lagrangian Particle Data from Combustion Simulations
Visual Analysis of Lagrangian Particle Data from Combustion Simulations Hongfeng Yu Sandia National Laboratories, CA Ultrascale Visualization Workshop, SC11 Nov 13 2011, Seattle, WA Joint work with Jishang
More informationA distributed rendering architecture for ray tracing large scenes on commodity hardware. FlexRender. Bob Somers Zoe J.
FlexRender A distributed rendering architecture for ray tracing large scenes on commodity hardware. GRAPP 2013 Bob Somers Zoe J. Wood Increasing Geometric Complexity Normal Maps artifacts on silhouette
More informationRecursion and Data Structures in Computer Graphics. Ray Tracing
Recursion and Data Structures in Computer Graphics Ray Tracing 1 Forward Ray Tracing imagine that you take a picture of a room using a camera exactly what is the camera sensing? light reflected from the
More informationDIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka
USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods
More informationX10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management
X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management Hideyuki Shamoto, Tatsuhiro Chiba, Mikio Takeuchi Tokyo Institute of Technology IBM Research Tokyo Programming for large
More informationPerformance impact of dynamic parallelism on different clustering algorithms
Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu
More informationhigh performance medical reconstruction using stream programming paradigms
high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming
More informationGPU Programming Using NVIDIA CUDA
GPU Programming Using NVIDIA CUDA Siddhante Nangla 1, Professor Chetna Achar 2 1, 2 MET s Institute of Computer Science, Bandra Mumbai University Abstract: GPGPU or General-Purpose Computing on Graphics
More informationCS427 Multicore Architecture and Parallel Computing
CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:
More informationA Fast GPU-Based Approach to Branchless Distance-Driven Projection and Back-Projection in Cone Beam CT
A Fast GPU-Based Approach to Branchless Distance-Driven Projection and Back-Projection in Cone Beam CT Daniel Schlifske ab and Henry Medeiros a a Marquette University, 1250 W Wisconsin Ave, Milwaukee,
More informationPerformance potential for simulating spin models on GPU
Performance potential for simulating spin models on GPU Martin Weigel Institut für Physik, Johannes-Gutenberg-Universität Mainz, Germany 11th International NTZ-Workshop on New Developments in Computational
More informationEfficient Clustered BVH Update Algorithm for Highly-Dynamic Models. Kirill Garanzha
Symposium on Interactive Ray Tracing 2008 Los Angeles, California Efficient Clustered BVH Update Algorithm for Highly-Dynamic Models Kirill Garanzha Department of Software for Computers Bauman Moscow State
More informationConsider a partially transparent object that is illuminated with two lights, one visible from each side of the object. Start with a ray from the eye
Ray Tracing What was the rendering equation? Motivate & list the terms. Relate the rendering equation to forward ray tracing. Why is forward ray tracing not good for image formation? What is the difference
More informationN-Body Simulation using CUDA. CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo
N-Body Simulation using CUDA CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo Project plan Develop a program to simulate gravitational
More informationComparison of High-Speed Ray Casting on GPU
Comparison of High-Speed Ray Casting on GPU using CUDA and OpenGL November 8, 2008 NVIDIA 1,2, Andreas Weinlich 1, Holger Scherl 2, Markus Kowarschik 2 and Joachim Hornegger 1 1 Chair of Pattern Recognition
More informationHiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.
HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation
More informationimplementation using GPU architecture is implemented only from the viewpoint of frame level parallel encoding [6]. However, it is obvious that the mot
Parallel Implementation Algorithm of Motion Estimation for GPU Applications by Tian Song 1,2*, Masashi Koshino 2, Yuya Matsunohana 2 and Takashi Shimamoto 1,2 Abstract The video coding standard H.264/AVC
More informationRay Casting of Trimmed NURBS Surfaces on the GPU
Ray Casting of Trimmed NURBS Surfaces on the GPU Hans-Friedrich Pabst Jan P. Springer André Schollmeyer Robert Lenhardt Christian Lessig Bernd Fröhlich Bauhaus University Weimar Faculty of Media Virtual
More informationGPU Programming for Mathematical and Scientific Computing
GPU Programming for Mathematical and Scientific Computing Ethan Kerzner and Timothy Urness Department of Mathematics and Computer Science Drake University Des Moines, IA 50311 ethan.kerzner@gmail.com timothy.urness@drake.edu
More informationAccelerated Ambient Occlusion Using Spatial Subdivision Structures
Abstract Ambient Occlusion is a relatively new method that gives global illumination like results. This paper presents a method to accelerate ambient occlusion using the form factor method in Bunnel [2005]
More informationA Cross-Input Adaptive Framework for GPU Program Optimizations
A Cross-Input Adaptive Framework for GPU Program Optimizations Yixun Liu, Eddy Z. Zhang, Xipeng Shen Computer Science Department The College of William & Mary Outline GPU overview G-Adapt Framework Evaluation
More informationOpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data
OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data Andrew Miller Computer Vision Group Research Developer 3-D TERRAIN RECONSTRUCTION
More informationComputer Vision Systems. Dean, Faculty of Technology Professor, Department of Technology University of Pune, Pune
Improving Performance for Computer Vision Systems Dr. Aditya Abhyankar Dean, Faculty of Technology Professor, Department of Technology University of Pune, Pune Homography based Hybrid Mixture Model for
More informationSubset Sum Problem Parallel Solution
Subset Sum Problem Parallel Solution Project Report Harshit Shah hrs8207@rit.edu Rochester Institute of Technology, NY, USA 1. Overview Subset sum problem is NP-complete problem which can be solved in
More informationRT 3D FDTD Simulation of LF and MF Room Acoustics
RT 3D FDTD Simulation of LF and MF Room Acoustics ANDREA EMANUELE GRECO Id. 749612 andreaemanuele.greco@mail.polimi.it ADVANCED COMPUTER ARCHITECTURES (A.A. 2010/11) Prof.Ing. Cristina Silvano Dr.Ing.
More informationHigh Quality DXT Compression using OpenCL for CUDA. Ignacio Castaño
High Quality DXT Compression using OpenCL for CUDA Ignacio Castaño icastano@nvidia.com March 2009 Document Change History Version Date Responsible Reason for Change 0.1 02/01/2007 Ignacio Castaño First
More informationA GPU Implementation of Tiled Belief Propagation on Markov Random Fields. Hassan Eslami Theodoros Kasampalis Maria Kotsifakou
A GPU Implementation of Tiled Belief Propagation on Markov Random Fields Hassan Eslami Theodoros Kasampalis Maria Kotsifakou BP-M AND TILED-BP 2 BP-M 3 Tiled BP T 0 T 1 T 2 T 3 T 4 T 5 T 6 T 7 T 8 4 Tiled
More informationGPU Implementation of a Multiobjective Search Algorithm
Department Informatik Technical Reports / ISSN 29-58 Steffen Limmer, Dietmar Fey, Johannes Jahn GPU Implementation of a Multiobjective Search Algorithm Technical Report CS-2-3 April 2 Please cite as: Steffen
More informationReal-Time Graphics Architecture. Kurt Akeley Pat Hanrahan. Ray Tracing.
Real-Time Graphics Architecture Kurt Akeley Pat Hanrahan http://www.graphics.stanford.edu/courses/cs448a-01-fall Ray Tracing with Tim Purcell 1 Topics Why ray tracing? Interactive ray tracing on multicomputers
More informationGeneral Purpose GPU Computing in Partial Wave Analysis
JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data
More informationCurrent Trends in Computer Graphics Hardware
Current Trends in Computer Graphics Hardware Dirk Reiners University of Louisiana Lafayette, LA Quick Introduction Assistant Professor in Computer Science at University of Louisiana, Lafayette (since 2006)
More informationarxiv: v1 [physics.ins-det] 11 Jul 2015
GPGPU for track finding in High Energy Physics arxiv:7.374v [physics.ins-det] Jul 5 L Rinaldi, M Belgiovine, R Di Sipio, A Gabrielli, M Negrini, F Semeria, A Sidoti, S A Tupputi 3, M Villa Bologna University
More informationA Simulated Annealing algorithm for GPU clusters
A Simulated Annealing algorithm for GPU clusters Institute of Computer Science Warsaw University of Technology Parallel Processing and Applied Mathematics 2011 1 Introduction 2 3 The lower level The upper
More informationJournal of Universal Computer Science, vol. 14, no. 14 (2008), submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J.
Journal of Universal Computer Science, vol. 14, no. 14 (2008), 2416-2427 submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J.UCS Tabu Search on GPU Adam Janiak (Institute of Computer Engineering
More informationDeformable and Fracturing Objects
Interactive ti Collision i Detection ti for Deformable and Fracturing Objects Sung-Eui Yoon ( 윤성의 ) IWON associate professor KAIST http://sglab.kaist.ac.kr/~sungeui/ Acknowledgements Research collaborators
More informationCS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology
CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367
More informationGeoImaging Accelerator Pansharpen Test Results. Executive Summary
Executive Summary After demonstrating the exceptional performance improvement in the orthorectification module (approximately fourteen-fold see GXL Ortho Performance Whitepaper), the same approach has
More informationDense matching GPU implementation
Dense matching GPU implementation Author: Hailong Fu. Supervisor: Prof. Dr.-Ing. Norbert Haala, Dipl. -Ing. Mathias Rothermel. Universität Stuttgart 1. Introduction Correspondence problem is an important
More informationScalable Ambient Effects
Scalable Ambient Effects Introduction Imagine playing a video game where the player guides a character through a marsh in the pitch black dead of night; the only guiding light is a swarm of fireflies that
More informationRendering Computer Animations on a Network of Workstations
Rendering Computer Animations on a Network of Workstations Timothy A. Davis Edward W. Davis Department of Computer Science North Carolina State University Abstract Rendering high-quality computer animations
More informationRate-distortion Optimized Streaming of Compressed Light Fields with Multiple Representations
Rate-distortion Optimized Streaming of Compressed Light Fields with Multiple Representations Prashant Ramanathan and Bernd Girod Department of Electrical Engineering Stanford University Stanford CA 945
More informationTHE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT HARDWARE PLATFORMS
Computer Science 14 (4) 2013 http://dx.doi.org/10.7494/csci.2013.14.4.679 Dominik Żurek Marcin Pietroń Maciej Wielgosz Kazimierz Wiatr THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT
More informationAn Implementation of Ray Tracing in CUDA
An Implementation of Ray Tracing in CUDA CSE 260 Project Report Liang Chen Hirakendu Das Shengjun Pan December 4, 2009 Abstract In computer graphics, ray tracing is a popular technique for rendering images
More information