Acceleration of ray tracing method using predictive evaluation and GPGPU technology

Cent. Eur. J. Comp. Sci. 4(3)
Central European Journal of Computer Science
Research Article

Branislav Sobota, Štefan Korečko, Csaba Szabó, František Hrozek
Department of Computers and Informatics, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, Košice, Slovakia

Received 28 February 2014; accepted 29 August 2014

Abstract: Ray tracing is one of the computer graphics methods that produce the most realistic outputs. Its main disadvantage is its high computational demand. This disadvantage can be mitigated by parallelization, because the ray tracing method is inherently parallel. The solution presented in this article uses GPGPU (general-purpose computing on graphics processing units) technology and a predictive evaluation to accelerate the ray tracing method. CUDA C was selected as the GPGPU language and was used to convert a raytracer core. The main reason for choosing this language was the use of the Tesla C1060 graphics card. The predictive evaluation of a scene is based on the fact that the total computation time increases proportionally with resolution. This evaluation allows selection of the optimal scene division for parallel ray tracing. In tests, the proposed GPGPU solution reached accelerations of up to 28.3 compared to the CPU.

Keywords: ray tracing, parallel ray tracing, GPGPU, CUDA, NVIDIA

Versita sp. z o.o.

1. Introduction

Ray tracing is one of the computer graphics techniques used to produce accurate images of photorealistic quality from complex three-dimensional scenes described and stored in a computer-readable form. It is based on the simulation of real-world optical processes. One great disadvantage of such techniques is that they are computationally very expensive and require massive amounts of floating point operations.
Parallel ray tracing takes advantage of parallel computing to speed up image rendering, since this technique is inherently parallel¹ [1]. In nature, light sources emit rays of light, which travel through space and interact with objects and the environment, by which they are absorbed, reflected, or refracted. These rays are then received by our eyes and form a picture. Ray tracing produces images by simulating these processes, with one significant modification. Emitting rays from light sources and tracking them would be very time-consuming and inefficient, because only a small fraction of them ends up in the eye or camera; the rest are irrelevant. Instead, ray tracing casts rays from the camera through the image plane (one for each pixel of the final image) into the scene and tracks these rays. It computes the intersection of each ray with the first surface it hits, examines the material properties (casting additional rays for refraction or reflection if necessary) and the incoming light from the light sources in the scene (by casting additional rays from the intersection to each source), and then computes the colour of the pixel in the final image [2-4]. A survey of current ray tracing techniques can be found in [5].

E-mail: branislav.sobota@tuke.sk, stefan.korecko@tuke.sk, csaba.szabo@tuke.sk, frantisek.hrozek@tuke.sk (corresponding author)
¹ OpenCL Reference pages - official website https://www.khronos.org/opencl/ [cited May 2013]

Ray tracing belongs to a set of problems that utilize parallel computing very well, since it is computationally expensive and can be easily decomposed. The two main factors influencing the design and performance of parallel ray tracing systems are the computation model [3] and the load-balancing mechanism [6].

The idea of using graphics cards and their parallel capabilities for non-graphical computations has been developed in many ways in recent years, and the acceleration of applications based on GPU utilisation is already common (for example in medical image reconstruction [7, 8], molecular dynamics [9, 10] or industry [11, 12]). This technology is also applicable to photorealistic rendering methods. Ray tracing is one of these methods and, as mentioned earlier, its algorithm can be parallelized very well. The main goal of this article is to propose a parallel ray tracing solution using GPGPU technology, which includes a predictive evaluation of the scene.
This evaluation is executed before the scene is rendered in full resolution and allows selection of the optimal scene division for parallelization on a GPGPU. The article also describes the results obtained by our solution, using a statistical comparison of rendering times.

2. Analysis

2.1. Parallel ray tracing

There are two principal methods of decomposing a ray tracing computation: demand-driven and data-driven (or data-parallel). There is also research focused on developing a hybrid model that tries to combine the best features of both methods² [4].

The final product of a ray tracer is an image of m*n pixels. Since each pixel is computed independently, the most obvious decomposition is to divide the image into p parts, where p is the number of processors available; each processor then computes m*n/p pixels and, ideally, the computation is p times faster. This approach is called demand-driven parallel ray tracing. A number of jobs are created, each containing a different subset of image pixels, and these jobs are assigned to processors. The input scene is copied to the local memory of each processor. Processors render their parts, return the computed pixels and get another job if there is any; in the end, the final image is composed from these parts. The main benefits of this approach are easy decomposition and implementation, simple job distribution and control, an unchanged general ray tracing algorithm, and good scalability. The main disadvantage is that the input scene has to be copied to the local memory of each processor, which poses a problem if the scene is very large.

Data-driven parallel ray tracing (also called data-parallel ray tracing) splits the input scene into a number of sections (rows, tiles or columns, Figure 1) and assigns these sections to processors. Each processor is responsible for all computations associated with objects in its section, no matter where the ray comes from.
Only rays passing through the processor's section are traced. If a ray spawned at one processor needs data from another processor, it is transferred to that processor. The way the scene is divided into sections determines the efficiency of the parallel computation. One of the hardest problems is determining the number of rays that will pass through each section of the scene, in order to estimate which sections require the most processing. Using a cost function can be helpful. The main benefit of this approach is that the input scene does not have to be copied entirely to each processor; it is split into sections, so even very large scenes can be processed relatively easily. The main disadvantage is that this approach does not scale very well with growing scene complexity and cluster size, because of substantial task communication overhead and ray transfers [4].

² OpenCL Reference pages - official website https://www.khronos.org/opencl/ [cited May 2013]

Figure 1. Tiling examples for the demand-driven parallel ray tracing.

2.2. Predictive evaluation

There are two methods of predictive evaluation of a scene [13]: prediction based on scene description and prediction based on scene simplification.

Prediction based on scene description uses parameters of objects (e.g. material, position or type) and is executed without rendering the scene. According to the combination of these parameters, we distinguish:
- scene prediction based on material analysis;
- scene prediction based on material and position analysis;
- scene prediction based on material, position and type analysis.

Prediction based on scene simplification uses a simplified version of the original scene. The following can be used for this simplification:
- reduction of the scene resolution;
- limitation of the recursion depth used for scene rendering;
- a histogram of object "difficulty".

A disadvantage of predictive evaluation is that the prediction strongly depends on the scene used, and therefore the time needed for the computation varies.

3. Design and implementation

3.1. Parallel raytracer

The implementation was done in several steps: selection of a GPGPU language, selection of a raytracer, transformation of the raytracer core into a parallel raytracer core using the CUDA C language, optimization, and testing (the last step is covered in the section Experiments and results). CUDA C³ was selected as the GPGPU language in our solution. The main reason for choosing this language was the use of the Tesla C1060 graphics card, which achieves its best results with this language [14]. The solution created by Grégory Massal [3] was used as the raytracer. Subsequently, the core of this raytracer was converted into CUDA C.

³ NVIDIA Corporation: NVIDIA CUDA C Programming guide - official website http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html [cited April 2013]

The next step was the optimization of the CUDA application, which consisted of two sub-steps: memory usage optimization and stream processor occupancy optimization. The memory optimization was based on the following rules: use fast memories on the GPU whenever possible, minimize the use of slow memories on the GPU, and minimize data copying between the GPU and the host system. The CUDA GPU Occupancy Calculator⁴ was used for the stream processor occupancy optimization. The optimal number of threads for the graphics cards used (NVIDIA Tesla C1060 and GeForce GTX 275) was 128 or 320. The number of parts into which the scene was divided for parallel ray tracing was selected according to the optimal number of threads. These parts were divided according to the data-driven approach (into tiles, rows or columns):
- selected sizes of parts for 128 threads (points): 16 points on the x-axis and 8 points on the y-axis; 128 points on the x-axis; 128 points on the y-axis;
- selected sizes of parts for 320 threads (points): 20 points on the x-axis and 16 points on the y-axis; 320 points on the x-axis; 320 points on the y-axis.

3.2. Predictive evaluation

The predictive evaluation used in our solution is based on the fact that the total rendering time of a scene increases proportionally with the resolution. Using this fact, the algorithm for the predictive evaluation is as follows:
1. render the scene in a low resolution using different types of scene division;
2. select the division with the lowest rendering time;
3. render the scene in full resolution using this division.
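The three steps above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' CUDA C implementation: the names `divide`, `timed` and `select_division`, the `render_time` callback and the default of 16 parts are all assumptions made for the example.

```python
import math
import time
from typing import Callable, Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, width, height) in pixels


def divide(width: int, height: int, kind: str, n_parts: int) -> List[Rect]:
    """Split a width x height image into n_parts tiles, rows or columns."""
    if kind == "rows":
        nx, ny = 1, n_parts
    elif kind == "columns":
        nx, ny = n_parts, 1
    elif kind == "tiles":
        nx = ny = math.isqrt(n_parts)
        if nx * ny != n_parts:
            raise ValueError("this sketch only supports a square number of tiles")
    else:
        raise ValueError(f"unknown division kind: {kind}")
    rects = []
    for j in range(ny):
        for i in range(nx):
            x0, x1 = i * width // nx, (i + 1) * width // nx
            y0, y1 = j * height // ny, (j + 1) * height // ny
            rects.append((x0, y0, x1 - x0, y1 - y0))
    return rects


def timed(render: Callable[[Tuple[int, int], List[Rect]], object]):
    """Wrap a (hypothetical) render function so it returns wall-clock seconds."""
    def run(resolution: Tuple[int, int], parts: List[Rect]) -> float:
        start = time.perf_counter()
        render(resolution, parts)
        return time.perf_counter() - start
    return run


def select_division(
    render_time: Callable[[Tuple[int, int], List[Rect]], float],
    low_res: Tuple[int, int],
    n_parts: int = 16,
    kinds: Tuple[str, ...] = ("tiles", "rows", "columns"),
) -> Tuple[str, Dict[str, float]]:
    """Steps 1 and 2: time a low-resolution render per division, keep the fastest."""
    timings = {
        kind: render_time(low_res, divide(low_res[0], low_res[1], kind, n_parts))
        for kind in kinds
    }
    best = min(timings, key=timings.get)
    return best, timings
```

Step 3 is then a single full-resolution call with the selected division, for example `render(full_res, divide(full_res[0], full_res[1], best, n_parts))`, where `render` stands for whatever raytracer entry point is available.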
Outputs of this predictive evaluation for a sample scene, using three different low resolutions and three different divisions, are shown in the next section, Experiments and results.

⁴ NVIDIA Corporation: NVIDIA CUDA C Programming guide - official website http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html [cited April 2013]

4. Experiments and results

The sample scenes in Figure 2 were used for Experiments no. 1 and no. 2. The selected scenes allow easy identification of scene complexity in individual parts (tiles, rows or columns).

Figure 2. Sample scenes (from left to right): scene 1, scene 2, scene 3.

4.1. Experiment no.1 - testing parallel raytracer

Experiment no. 1 was focused on the comparison of final times for various divisions of the sample scenes. The hardware configuration used for the experiment was: Intel Dual Core E6300 overclocked to 3.8 GHz, NVIDIA GeForce GTX 275 896 MB, 4 GB RAM and Windows 7 64-bit. The results of this comparison are shown in Table 1. A visual representation of this comparison is shown in Figure 3.

Table 1. Final times for sample scenes divided into 128 and 320 parts (resolution ).
Columns: Size, Scene 1, Scene 2, Scene 3.
Rows (Size): 16 points on the x-axis and 8 points on the y-axis; 128 points on the x-axis; 128 points on the y-axis; 20 points on the x-axis and 16 points on the y-axis; 320 points on the x-axis; 320 points on the y-axis.

As can be seen, the optimal division for the first and the third scene is 128 points on the y-axis. For the second scene, the optimal division is 16 points on the x-axis and 8 points on the y-axis. The worst results were obtained using the division of 128 points on the x-axis. The results also confirmed the assumption that the total computation time depends on the division used, while the optimal division depends on the scene used.

Figure 3. The comparison of final times for various divisions (axes: parts size (points) and time (s); series: scene 1, scene 2, scene 3).

4.2. Experiment no.2 - testing acceleration GPU vs. CPU

Experiment no. 2 was focused on the comparison of computation times between graphics cards and processors. The following CPUs and GPUs were compared in the experiment: Intel Dual Core E6300 (overclocked to 3.8 GHz), Intel i5-2500k (overclocked to 4.5 GHz), NVIDIA GeForce GTX 275 896 MB and NVIDIA Tesla C1060. The division with 20 points on the x-axis and 16 points on the y-axis (320 threads) was used for the parallelization on GPUs. The ray tracing computation on CPUs was not parallel, so only one core was used for the computation. The results of this comparison are shown in Table 2. As can be seen, the GPGPU solution reaches much lower rendering times. A comparison of final times and accelerations between the best CPU (Intel i5-2500k) and the best GPU (NVIDIA Tesla C1060) is shown in Table 3. Average acceleration observed in this experiment was

Table 2. Computation times comparison (in seconds).
Columns: Scene, Resolution, Intel Core 2 Duo E6300, Intel i5-2500k, NVIDIA GeForce GTX 275, NVIDIA Tesla C1060.
Rows: Scene 1, Scene 2, Scene 3.

Table 3. Comparison of times and accelerations between the Intel i5-2500k and the NVIDIA Tesla C1060 (in seconds).
Columns: Scene, Resolution, Intel i5-2500k, NVIDIA Tesla C1060, Acceleration.
Rows: Scene 1, Scene 2, Scene 3.

4.3. Experiment no.3 - predictive evaluation

The correctness of the predictive evaluation was tested on several scenes. For the tests, we used three different low resolutions (32×32, and pixels) and three different types of scene division (based on the data-driven approach: tiles, rows and columns). An example of a scene and the divisions used is shown in Figure 4. The division into 16 parts was used for this scene. Computation times for each part of the individual divisions (tiles, rows and columns) are shown in Figures 5, 6 and 7. The results obtained in this experiment also empirically verified the assumption used for the predictive evaluation: the total rendering time of the scene increases proportionally with the resolution (see the increase in the rendering times for individual resolutions in Figures 5, 6 and 7).
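The proportionality assumption verified above also gives a rough way to extrapolate a low-resolution measurement to the full resolution. The sketch below is an illustration only: it assumes that "resolution" means the total pixel count, and the function name `estimate_full_time` and the sample numbers are hypothetical, not taken from the experiments.

```python
from typing import Tuple


def estimate_full_time(
    t_low: float, low_res: Tuple[int, int], full_res: Tuple[int, int]
) -> float:
    """Scale a low-resolution render time by the ratio of pixel counts."""
    low_pixels = low_res[0] * low_res[1]
    full_pixels = full_res[0] * full_res[1]
    if low_pixels <= 0:
        raise ValueError("low resolution must contain at least one pixel")
    return t_low * full_pixels / low_pixels


# Hypothetical numbers: 0.5 s at 32x32 pixels, extrapolated to 64x64 pixels.
print(estimate_full_time(0.5, (32, 32), (64, 64)))  # 4x the pixels, so 2.0
```

Under this assumption every candidate division is scaled by the same factor, so the ranking of divisions measured at low resolution carries over to the full resolution.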


Figure 4. One of the used scenes with three types of division (from left): tiles, rows, columns.

Figure 5. Computation time (left) and number of rays per 1 ms (right) for each tile.

Figure 6. Computation time (left) and number of rays per 1 ms (right) for each row.

5. Conclusion

Parallel ray tracing using GPGPU and CUDA technology is a very popular research topic (for example [15-17]). However, many existing solutions select the best scene division for parallelization according to tests that were performed manually. Our solution uses a predictive evaluation algorithm for this selection, which allows these tests to be automated according to the number of threads used.

Figure 7. Computation time (left) and number of rays per 1 ms (right) for each column.

During the design of the application we found that, to parallelize ray tracing at the level of pixels (or groups of pixels), a significant part of the raytracer core has to be converted into the GPGPU language (CUDA C in our solution). The parallelization algorithm used in this solution is universal and can be used in other ray tracing applications as well.

The results of our solution were presented in Experiment no. 2 (computation times for the sample scenes using GPU and CPU). During this experiment we observed an acceleration of up to 28.3 compared to the CPU. However, this acceleration strongly depends on the scene and the rendering parameters, for example the rendering resolution, the scene complexity or the scene division used for the parallelization. Using a parallel solution for the ray tracing computation on the CPU can also strongly affect this acceleration.

An important part of our solution is the predictive evaluation, which allows semi-automatic selection of the optimal scene division for parallelization on a GPGPU. The rendering times obtained using this evaluation were shown in the results of Experiment no. 3. There is still an open question about the predictive evaluation of scenes with lower resolutions (up to several hundred pixels). Evaluating these scenes can take too much time compared to the time needed to render them. In this case it is better to render these scenes without evaluation.

A formal description of ray tracing and parallel ray tracing and of its implementation is also an excellent basis for teaching formal methods [18]. The use of Petri nets is also a good basis for evaluating time gains.

Another question is the use of OpenCL as the GPGPU language. Its use would enable execution of the application on graphics cards from other manufacturers. However, the results would probably be worse than with CUDA.
Evaluating this hypothesis is the goal of our future work, which will focus on two main areas: implementation of parallel ray tracing using OpenCL and comparison of the results obtained by both GPGPU technologies.

6. Acknowledgment

This work is supported by the KEGA project no. 050TUKE-4/2012: "Application of Virtual Reality Technologies in Teaching Formal Methods".

References

[1] B. Sobota, M. Straka, J. Perháč, A visualization in cluster environment, Grid Computing for Complex Problem 2007, Bratislava, (Institute of Informatics SAV, Bratislava, 2007)
[2] M. Jelšina, B. Sobota, M. Straka, Parallel Hierarchical Model of Visualisation Computing in Virtual Reality System, In proceedings of: 7th Scientific Conference with International Participation, Engineering of Modern Electric Systems 2003 (EMES 03), University of Oradea Romania - Faculty of Electrotechnics and Informatics Department of Computer Science, Romania, Oradea, May (University of Oradea Romania Faculty of Electrotechnics and Informatics Department of Computer Science, Oradea)
[3] G. Massal, A raytracer in C Introduction-What-is-ray-tracing.html [cited April 2013]
[4] I. Notkin, C. Gotsman, Parallel Progressive Ray-tracing, Comput. Graph. Forum 16(1), 43-55, 1997
[5] I. Wald, W.R. Mark, J. Günther, S. Boulos et al., State of the Art in Ray Tracing Animated Scenes, Comput. Graph. Forum 28(6), 2009
[6] A. Heirich, J. Arvo, A competitive analysis of load balancing strategies for parallel ray tracing, JoS 12, 57-68, 1998
[7] V. Archirapatkave, H. Sumilo, S.C.W. See et al., GPGPU Acceleration Algorithm for Medical Image Reconstruction, IEEE 9th International Symposium on Parallel and Distributed Processing with Applications (ISPA 2011), May 2011, 41-46
[8] B. Hu, X. Ma, M. Joyce et al., A GPGPU accelerated compressed sensing with tight wavelet frame transform technique for MR imaging reconstruction, IEEE International Conference on Imaging Systems and Techniques (IST 2012), July 2012
[9] G. Chen, G. Li, S. Pei, B. Wu, GPGPU supported cooperative acceleration in molecular dynamics, 13th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2009), April 2009
[10] W. Liu, B. Schmidt, G. Voss et al., Accelerating molecular dynamics simulations using Graphics Processing Units with CUDA, Comput. Phys. Comm. 179(9), 2008
[11] D. Hallmans, K. Sandstrom, M. Lindgren, T. Nolte, GPGPU for industrial control systems, IEEE 18th Conference on Emerging Technologies & Factory Automation (ETFA 2013), Sept. 2013, 1-4
[12] T. Messay, Chong Chen, R. Ordonez et al., GPGPU acceleration of a novel calibration method for industrial robots, In proceedings of: 2011 IEEE National Aerospace and Electronics Conference (NAECON 2011), July 2011
[13] E. Reinhard, A.J. Kok, P.W. Jansen, Cost prediction in ray tracing, Rendering Techniques 96 (Springer, Vienna, 1996)
[14] R. Šoltys, Raytracing method implementation using GPGPU technology, Diploma thesis, Technical University of Košice, FEEI, 2012
[15] R. Geist, J. Steele, A lighting model for fast rendering of forest ecosystems, IEEE Symposium on Interactive Ray Tracing (RT 2008), 9-10 Aug. 2008
[16] S. Guntury, P.J. Narayanan, Raytracing Dynamic Scenes on the GPU Using Grids, IEEE Trans. Visual. Comput. Graphics 18(1), 5-16, 2012
[17] A. Segovia, L. Xiaoming, G. Guang, Iterative layer-based raytracing on CUDA, 28th IEEE International Performance Computing and Communications Conference (IPCCC 2009), Dec. 2009
[18] Š. Korečko, B. Sobota, Using coloured Petri nets for design of parallel raytracing environment, Acta Univ. Sapientiae 2(1), 28-39
