Acceleration of ray tracing method using predictive evaluation and GPGPU technology


Cent. Eur. J. Comp. Sci. 4(3) DOI: /s
Central European Journal of Computer Science

Research Article

Branislav Sobota, Štefan Korečko, Csaba Szabó, František Hrozek

Department of Computers and Informatics, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, Košice, Slovakia

Received 28 February 2014; accepted 29 August 2014

Abstract: Ray tracing is one of the computer graphics methods that achieve the most realistic outputs. Its main disadvantage is its high computational demand. This disadvantage can be mitigated by parallelization, because the ray tracing method is inherently parallel. The solution presented in this article uses GPGPU (general-purpose computing on graphics processing units) technology and a predictive evaluation to accelerate the ray tracing method. CUDA C was selected as the GPGPU language and was used to convert a raytracer core. The main reason for choosing this language was the use of the Tesla C1060 graphics card. The predictive evaluation of a scene is based on the fact that the total computation time increases proportionally with resolution. This evaluation allows selection of the optimal scene division for parallel ray tracing. In tests, the proposed GPGPU solution reached accelerations of up to 28.3 compared to the CPU.

Keywords: ray tracing, parallel ray tracing, GPGPU, CUDA, NVIDIA

Versita sp. z o.o.

1. Introduction

Ray tracing is one of the computer graphics techniques used to produce accurate images of photorealistic quality from complex three-dimensional scenes described and stored in a computer-readable form. It is based on the simulation of real-world optical processes. One great disadvantage of such techniques is that they are computationally very expensive and require massive amounts of floating point operations.
Parallel ray tracing takes advantage of parallel computing to speed up image rendering, since this technique is inherently parallel¹ [1].

E-mail: branislav.sobota@tuke.sk, stefan.korecko@tuke.sk, csaba.szabo@tuke.sk, frantisek.hrozek@tuke.sk (Corresponding author)
¹ OpenCL Reference pages - official website https://www.khronos.org/opencl/ [cited May 2013]

In nature, light sources emit rays of light, which travel through space and interact with objects and the environment, by which they are absorbed, reflected, or refracted. These rays are then received by our eyes and form a picture. Ray tracing produces images by simulating these processes, with one significant modification. Emitting rays from light sources and tracking them would be very time-consuming and inefficient, because only a small fraction of them ends up in the eye or camera; the rest are irrelevant. Instead, ray tracing casts rays from the camera through the image plane (one for each pixel of the final image) into the scene and tracks these rays. It computes the intersection of the ray with the first surface it collides with, examines the material properties (casting additional rays for refraction or reflection if necessary) and the incoming light from the light sources in the scene (by casting additional rays from the intersection to each source), and then computes the colour of the pixel in the final image [2-4]. A survey of the current techniques for ray tracing can be found in [5].

Ray tracing belongs to a set of problems that utilize parallel computing very well, since it is computationally expensive and can be easily decomposed. The two main factors influencing the design and performance of parallel ray tracing systems are the computation model [3] and the load-balancing mechanism [6].

The idea of using graphics cards and their parallel capabilities for non-graphical computations has been developed in many ways in recent years, and accelerating applications on the GPU is now common (for example in medical image reconstruction [7, 8], molecular dynamics [9, 10] or industry [11, 12]). This technology is also applicable to photorealistic displaying methods. Ray tracing belongs to these methods and, as mentioned earlier, its algorithm can be parallelized very well. The main goal of this article is to propose a parallel ray tracing solution using GPGPU technology, which incorporates a predictive evaluation of the scene.
This evaluation is executed before the scene is rendered in full resolution and it allows selection of the optimal scene division for the parallelization on GPGPU. The article also describes the results obtained by our solution using a statistical comparison of rendering times.

2. Analysis

2.1. Parallel ray tracing

There are two principal methods of decomposing a ray tracing computation: demand-driven and data-driven (or data-parallel). There is also research focused on developing a hybrid model that tries to combine the best features of both methods² [4].

The final product of a ray tracer is an image of m*n pixels. Since each pixel is computed independently, the most obvious decomposition is to divide the image into p parts, where p is the number of available processors. Each processor then computes m*n/p pixels and, ideally, the computation is p times faster. This approach is called demand-driven parallel ray tracing. A number of jobs are created, each containing a different subset of image pixels, and these jobs are assigned to processors. The input scene is copied to the local memory of each processor. Processors render their parts, return the computed pixels and get another job if there is any; in the end, the final image is composed from these parts. The main benefits of this approach are easy decomposition and implementation, simple job distribution and control, an unchanged general ray tracing algorithm, and good scalability. The main disadvantage is that the input scene has to be copied to the local memory of each processor, which poses a problem if the scene is very large.

Data-driven parallel ray tracing (also called data-parallel ray tracing) splits the input scene into a number of sections (rows, tiles or columns, Figure 1) and assigns these sections to processors. Each processor is responsible for all computations associated with objects in its particular section, no matter where the ray comes from.
Only rays passing through the processor's section are traced. If a ray spawned at one processor needs data from another processor, it is transferred to that processor. The way the scene is divided into sections determines the efficiency of the parallel computation. Determining the number of rays that will pass through a section of the scene, in order to estimate which sections require the most processing, is one of the hardest problems to overcome. Using a cost function can be helpful. The main benefit of this approach is that the input scene does not have to be copied entirely to each processor; it is split into sections, so even very large scenes can be processed relatively easily. The main disadvantage is that this approach does not scale very well with growing scene complexity and cluster size, because of substantial task communication overhead and ray transfers [4].

² OpenCL Reference pages - official website https://www.khronos.org/opencl/ [cited May 2013]

Figure 1. Tiling examples for the demand-driven parallel ray tracing.

2.2. Predictive evaluation

There are two methods of predictive evaluation of a scene [13]: prediction based on scene description and prediction based on scene simplification.

Prediction based on scene description uses parameters of objects (e.g. material, position or type) and is executed without rendering the scene. According to the combination of these parameters we distinguish:
- scene prediction based on material analysis;
- scene prediction based on material and position analysis;
- scene prediction based on material, position and type analysis.

Prediction based on scene simplification uses a simplified version of the original scene. The following can be used for this simplification:
- reduction of the scene resolution;
- limitation of the recursion depth used for scene rendering;
- histogram of object "difficulty".

The disadvantage of predictive evaluation is that the prediction strongly depends on the scene used, and therefore the time needed for its computation varies.

3. Design and implementation

3.1. Parallel raytracer

The implementation was done in several steps: selection of a GPGPU language, selection of a raytracer, transformation of the raytracer core into a parallel raytracer core using the CUDA C language, optimization, and testing (this last step is covered in the section Experiments and results).

CUDA C³ was selected as the GPGPU language in our solution. The main reason for choosing this language was the use of the Tesla C1060 graphics card, which achieves its best results with this language [14]. The solution created by Grégory Massal [3] was used as the raytracer. Subsequently, the core of this raytracer was converted into CUDA C.

³ NVIDIA Corporation: NVIDIA CUDA C Programming guide - official website http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html [cited April 2013]

The next step was the optimization of the CUDA application, which consisted of two sub-steps: memory usage optimization and stream processor occupancy optimization. The memory optimization was based on the following rules: use fast memories on the GPU whenever possible, minimize the usage of slow memories on the GPU, and minimize data copying between the GPU and the host system. The CUDA GPU Occupancy Calculator⁴ was used for the stream processor occupancy optimization. The optimal number of threads for the graphics cards used (NVIDIA Tesla C1060 and GeForce GTX 275) was 128 or 320.

The number of parts into which the scene was divided for parallel ray tracing was selected according to the optimal number of threads. These parts were divided according to the data-driven approach (into tiles, rows or columns):
- selected part sizes for 128 threads (points): 16 points on the x-axis and 8 points on the y-axis; 128 points on the x-axis; 128 points on the y-axis;
- selected part sizes for 320 threads (points): 20 points on the x-axis and 16 points on the y-axis; 320 points on the x-axis; 320 points on the y-axis.

3.2. Predictive evaluation

The predictive evaluation used in our solution is based on the fact that the total rendering time of a scene increases proportionally with the resolution. Using this fact, the algorithm for the predictive evaluation is as follows:
1. render the scene in a low resolution using different types of scene division;
2. select the division with the lowest rendering time;
3. render the scene in full resolution using this division.
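
The three steps above can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the authors' CUDA C implementation: `measure` is a hypothetical callback that renders the scene at a given resolution with a given part size and returns the elapsed time in seconds, and describing a division by its (x, y) part size in points is a simplification chosen for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

Size = Tuple[int, int]  # (points on the x-axis, points on the y-axis)


def select_division(
    measure: Callable[[Size, Size], float],  # hypothetical: (resolution, part size) -> seconds
    divisions: Sequence[Size],
    low_res: Size,
) -> Tuple[Size, Dict[Size, float]]:
    """Steps 1 and 2: time every candidate division at low resolution
    and return the fastest one together with all measured times."""
    timings = {part: measure(low_res, part) for part in divisions}
    best = min(timings, key=timings.get)
    return best, timings


def predictive_render(
    measure: Callable[[Size, Size], float],
    divisions: Sequence[Size],
    low_res: Size,
    full_res: Size,
) -> Tuple[Size, float]:
    """Step 3: render at full resolution using the selected division."""
    best, _ = select_division(measure, divisions, low_res)
    return best, measure(full_res, best)
```

In a real setup `measure` would wrap the kernel launch and a timer. Passing it in as a parameter keeps the selection logic independent of the renderer, so it can be exercised with a stand-in cost function.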
The outputs of this predictive evaluation for a sample scene, using three different low resolutions and three different divisions, are shown in the next section, Experiments and results.

4. Experiments and results

The sample scenes from Figure 2 were used for Experiments no. 1 and no. 2. The selected scenes allow easy identification of the scene complexity in individual parts (tiles, rows or columns).

Figure 2. Sample scenes (from left to right): scene 1, scene 2, scene 3.

⁴ NVIDIA Corporation: NVIDIA CUDA C Programming guide - official website http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html [cited April 2013]
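
The part sizes listed in Section 3.1 (for example 16 points on the x-axis and 8 points on the y-axis for 128 threads) determine how an image is cut into tiles, rows or columns. The following Python sketch illustrates one way such a division could be enumerated; the function name `split_image`, the rectangle layout and the clipping of edge parts are assumptions made for illustration and are not taken from the paper.

```python
from typing import Iterator, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, width, height) in points


def split_image(width: int, height: int, part_w: int, part_h: int) -> Iterator[Rect]:
    """Yield the parts of a width x height image, row by row.
    Parts on the right and bottom edges are clipped to the image."""
    for y0 in range(0, height, part_h):
        for x0 in range(0, width, part_w):
            yield (x0, y0, min(part_w, width - x0), min(part_h, height - y0))


# Two of the part shapes from Section 3.1, applied to small example images.
tiles = list(split_image(64, 32, 16, 8))    # tiles of 16 x 8 = 128 points
strips = list(split_image(256, 4, 128, 1))  # strips of 128 points along the x-axis
```

Each full part here covers 128 points, which matches one of the two thread counts reported as optimal. Mapping one thread to one point is an assumption of this sketch.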

4.1. Experiment no.1 - testing parallel raytracer

Experiment no. 1 focused on comparing the final times for various divisions of the sample scenes. The hardware configuration used for the experiment was: Intel Dual Core E6300 overclocked to 3.8 GHz, NVIDIA GeForce GTX MB, 4 GB RAM and Windows 7 64-bit. The results of this comparison are shown in Table 1 and visualized in Figure 3.

Table 1. Final times for sample scenes divided into 128 and 320 parts (resolution ).
Columns: Size, Scene 1, Scene 2, Scene 3.
Rows (part sizes): 16 points on the x-axis and 8 points on the y-axis; 128 points on the x-axis; 128 points on the y-axis; 20 points on the x-axis and 16 points on the y-axis; 320 points on the x-axis; 320 points on the y-axis.

As can be seen, the optimal division for the first and the third scene is 128 points on the y-axis. For the second scene, the optimal division is 16 points on the x-axis and 8 points on the y-axis. The worst results were obtained using the division of 128 points on the x-axis. The results also confirmed the assumption that the total computation time depends on the division used, while the optimal division depends on the scene used.

Figure 3. The comparison of final times for various divisions (axes: parts size in points, time in s; series: scene 1, scene 2, scene 3).

4.2. Experiment no.2 - testing acceleration GPU vs. CPU

Experiment no. 2 focused on comparing the computation times of graphics cards and processors. The following CPUs and GPUs were compared: Intel Dual Core E6300 (overclocked to 3.8 GHz), Intel i5-2500k (overclocked to 4.5 GHz), NVIDIA GeForce GTX MB and NVIDIA Tesla C1060. The division with 20 points on the x-axis and 16 points on the y-axis (320 threads) was used for the parallelization on GPUs. The ray tracing computation on CPUs was not parallel, so only one core was used. The results of this comparison are shown in Table 2. As can be seen, the GPGPU solution reaches much lower rendering times. A comparison of the final times and accelerations between the best CPU (Intel i5-2500k) and the best GPU (NVIDIA Tesla C1060) is shown in Table 3. Average acceleration observed in this experiment was

Table 2. Computation times comparison (in seconds).
Columns: Scene, Resolution, Intel Core 2 Duo E6300, Intel i5-2500k, NVIDIA GeForce GTX 275, NVIDIA Tesla C1060.
Rows: Scene 1, Scene 2, Scene 3.

Table 3. Comparison of times and accelerations between the Intel i5-2500k and the NVIDIA Tesla C1060 (in seconds).
Columns: Scene, Resolution, Intel i5-2500k, NVIDIA Tesla C1060, Acceleration.
Rows: Scene 1, Scene 2, Scene 3.

4.3. Experiment no.3 - predictive evaluation

The correctness of the predictive evaluation was tested on several scenes. For the tests, we used three different low resolutions (32 × 32, and pixels) and three different types of scene division (based on the data-driven approach: tiles, rows and columns). An example of a scene and the divisions used is shown in Figure 4. The division into 16 parts was used for this scene. The computation times for each part of the individual divisions (tiles, rows and columns) are shown in Figures 5, 6 and 7. The results obtained in this experiment also empirically verified the assumption used for the predictive evaluation: the total rendering time of the scene increases proportionally with the resolution (see the increase in the rendering times for individual resolutions in Figures 5, 6 and 7).
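
The proportionality assumption can be written as a one-line extrapolation. The sketch below is illustrative only: it assumes that "resolution" means the total pixel count and that the proportionality is exact, and the function name `estimate_full_time` is hypothetical.

```python
from typing import Tuple

Resolution = Tuple[int, int]  # (width, height) in pixels


def estimate_full_time(low_time_s: float, low_res: Resolution, full_res: Resolution) -> float:
    """Scale a low-resolution rendering time to full resolution,
    assuming time grows in proportion to the number of pixels."""
    low_pixels = low_res[0] * low_res[1]
    full_pixels = full_res[0] * full_res[1]
    return low_time_s * full_pixels / low_pixels
```

Under this assumption every candidate division is scaled by the same factor, so the division that is fastest at low resolution is also expected to be fastest at full resolution. This is what makes the low-resolution test a usable predictor.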

Figure 4. One of the used scenes with three types of division (from left): tiles, rows, columns.

Figure 5. Computation time (left) and number of rays per 1 ms (right) for each tile.

Figure 6. Computation time (left) and number of rays per 1 ms (right) for each row.

Figure 7. Computation time (left) and number of rays per 1 ms (right) for each column.

5. Conclusion

Parallel ray tracing using GPGPU and CUDA technology is a very popular research topic (for example [15-17]). However, many existing solutions select the best scene division for the parallelization according to tests that were performed manually. Our solution uses a predictive evaluation algorithm for this selection, which allows these tests to be automated according to the number of threads used.

During the design of the application we found that parallelizing ray tracing at the level of pixels (or groups of pixels) requires transferring a significant part of the raytracer core into a GPGPU language (CUDA C in our solution). The parallelization algorithm used in this solution is universal and can be used in other ray tracing applications as well.

The results of our solution were presented in Experiment no. 2 (computation times for sample scenes using GPU and CPU). During this experiment we observed an acceleration of up to 28.3 compared to the CPU. However, this acceleration strongly depends on the scene and the rendering parameters, for example the rendering resolution, the scene complexity or the scene division used for the parallelization. A parallel ray tracing implementation on the CPU could also strongly affect this acceleration.

An important part of our solution is the predictive evaluation, which allows semi-automatic selection of the optimal scene division for the parallelization on GPGPU. The rendering times using this evaluation were shown in the results of Experiment no. 3. There is still an open question about the predictive evaluation of scenes with lower resolutions (up to several hundreds of pixels). The evaluation of these scenes can take too much time compared to the time needed for their rendering. In this case it is better to render these scenes without evaluation.

A formal description of ray tracing and parallel ray tracing and of its implementation is also an excellent basis for teaching formal methods [18]. The use of Petri nets is also a good basis for evaluating time gains.

Another question is the use of OpenCL as the GPGPU language. It would enable execution of the application on graphics cards of other manufacturers. However, the results would probably be worse than with CUDA.
Evaluation of this hypothesis is the goal of our future work, which will focus on two main areas: the implementation of parallel ray tracing using OpenCL and the comparison of the results obtained by both GPGPU technologies.

6. Acknowledgment

This work is supported by the project KEGA no. 050TUKE-4/2012: "Application of Virtual Reality Technologies in Teaching Formal Methods".

References

[1] B. Sobota, M. Straka, J. Perháč, A visualization in cluster environment, Grid Computing for Complex Problem 2007, Bratislava (Institute of Informatics SAV, Bratislava, 2007)
[2] M. Jelšina, B. Sobota, M. Strak, Parallel Hierarchical Model of Visualisation Computing in Virtual Reality System, In proceedings of: 7th Scientific Conference with International Participation, Engineering of Modern Electric Systems 2003 (EMES 03), University of Oradea Romania - Faculty of Electrotechnics and Informatics Department of Computer Science, Romania, Oradea, May (University of Oradea Romania Faculty of Electrotechnics and Informatics Department of Computer Science, Oradea)
[3] G. Massal, A raytracer in C Introduction-What-is-ray-tracing.html [cited April 2013]
[4] I. Notkin, C. Gotsman, Parallel Progressive Ray-tracing, Comput. Graph. Forum 16(1), 43-55, 1997
[5] I. Wald, W.R. Mark, J. Günther, S. Boulos et al., State of the Art in Ray Tracing Animated Scenes, Comput. Graph. Forum 28(6), 2009
[6] A. Heirich, J. Arvo, A competitive analysis of load balancing strategies for parallel ray tracing, JoS 12, 57-68, 1998
[7] V. Archirapatkave, H. Sumilo, S.C.W. See et al., GPGPU Acceleration Algorithm for Medical Image Reconstruction, IEEE 9th International Symposium on Parallel and Distributed Processing with Applications (ISPA 2011), May 2011, 41-46
[8] B. Hu, X. Ma, M. Joyce et al., A GPGPU accelerated compressed sensing with tight wavelet frame transform technique for MR imaging reconstruction, IEEE International Conference on Imaging Systems and Techniques (IST 2012), July 2012
[9] G. Chen, G. Li, S. Pei, B. Wu, GPGPU supported cooperative acceleration in molecular dynamics, 13th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2009), April 2009
[10] W. Liu, B. Schmidt, G. Voss et al., Accelerating molecular dynamics simulations using Graphics Processing Units with CUDA, Comput. Phys. Comm. 179(9), 2008
[11] D. Hallmans, K. Sandstrom, M. Lindgren, T. Nolte, GPGPU for industrial control systems, IEEE 18th Conference on Emerging Technologies & Factory Automation (ETFA 2013), Sept. 2013, 1-4
[12] T. Messay, Chong Chen, R. Ordonez et al., GPGPU acceleration of a novel calibration method for industrial robots, In proceedings of: 2011 IEEE National Aerospace and Electronics Conference (NAECON 2011), July 2011
[13] E. Reinhard, A.J. Kok, P.W. Jansen, Cost prediction in ray tracing, Rendering Techniques '96 (Springer, Vienna, 1996)
[14] R. Šoltys, Raytracing method implementation using GPGPU technology, Diploma thesis, Technical university of Košice, FEEI, 2012
[15] R. Geist, J. Steele, A lighting model for fast rendering of forest ecosystems, IEEE Symposium on Interactive Ray Tracing, 2008 (RT 2008), 9-10 Aug. 2008
[16] S. Guntury, P.J. Narayanan, Raytracing Dynamic Scenes on the GPU Using Grids, IEEE Trans. Visual. Comput. Graphics 18(1), 5-16, 2012
[17] A. Segovia, L. Xiaoming, G. Guang, Iterative layer-based raytracing on CUDA, 28th IEEE International Performance Computing and Communications Conference (IPCCC 2009), Dec. 2009
[18] Š. Korečko, B. Sobota, Using coloured Petri nets for design of parallel raytracing environment, Acta Univ. Sapientiae 2(1), 28-39


More information

Lecture 1: Introduction and Computational Thinking

Lecture 1: Introduction and Computational Thinking PASI Summer School Advanced Algorithmic Techniques for GPUs Lecture 1: Introduction and Computational Thinking 1 Course Objective To master the most commonly used algorithm techniques and computational

More information

Parallel Computer Architecture and Programming Final Project

Parallel Computer Architecture and Programming Final Project Muhammad Hilman Beyri (mbeyri), Zixu Ding (zixud) Parallel Computer Architecture and Programming Final Project Summary We have developed a distributed interactive ray tracing application in OpenMP and

More information

Computing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany

Computing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany Computing on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Summary: The increasing power of GPUs has led to the intent to transfer computing load from CPUs to GPUs. A first example has been

More information

Computer Graphics. Lecture 10. Global Illumination 1: Ray Tracing and Radiosity. Taku Komura 12/03/15

Computer Graphics. Lecture 10. Global Illumination 1: Ray Tracing and Radiosity. Taku Komura 12/03/15 Computer Graphics Lecture 10 Global Illumination 1: Ray Tracing and Radiosity Taku Komura 1 Rendering techniques Can be classified as Local Illumination techniques Global Illumination techniques Local

More information

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Joined Advanced Student School (JASS) 2009 March 29 - April 7, 2009 St. Petersburg, Russia G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Dmitry Puzyrev St. Petersburg State University Faculty

More information

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function Chen-Ting Chang, Yu-Sheng Chen, I-Wei Wu, and Jyh-Jiun Shann Dept. of Computer Science, National Chiao

More information

Scalable multi-gpu cloud raytracing with OpenGL

Scalable multi-gpu cloud raytracing with OpenGL Scalable multi-gpu cloud raytracing with OpenGL University of Žilina Digital technologies 2014, Žilina, Slovakia Overview Goals Rendering distant details in visualizations Raytracing Multi-GPU programming

More information

REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS

REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS BeBeC-2014-08 REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS Steffen Schmidt GFaI ev Volmerstraße 3, 12489, Berlin, Germany ABSTRACT Beamforming algorithms make high demands on the

More information

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture

More information

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs C.-C. Su a, C.-W. Hsieh b, M. R. Smith b, M. C. Jermy c and J.-S. Wu a a Department of Mechanical Engineering, National Chiao Tung

More information

Modern GPUs (Graphics Processing Units)

Modern GPUs (Graphics Processing Units) Modern GPUs (Graphics Processing Units) Powerful data parallel computation platform. High computation density, high memory bandwidth. Relatively low cost. NVIDIA GTX 580 512 cores 1.6 Tera FLOPs 1.5 GB

More information

Facial Recognition Using Neural Networks over GPGPU

Facial Recognition Using Neural Networks over GPGPU Facial Recognition Using Neural Networks over GPGPU V Latin American Symposium on High Performance Computing Juan Pablo Balarini, Martín Rodríguez and Sergio Nesmachnow Centro de Cálculo, Facultad de Ingeniería

More information

Adaptive Assignment for Real-Time Raytracing

Adaptive Assignment for Real-Time Raytracing Adaptive Assignment for Real-Time Raytracing Paul Aluri [paluri] and Jacob Slone [jslone] Carnegie Mellon University 15-418/618 Spring 2015 Summary We implemented a CUDA raytracer accelerated by a non-recursive

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware

More information

L10 Layered Depth Normal Images. Introduction Related Work Structured Point Representation Boolean Operations Conclusion

L10 Layered Depth Normal Images. Introduction Related Work Structured Point Representation Boolean Operations Conclusion L10 Layered Depth Normal Images Introduction Related Work Structured Point Representation Boolean Operations Conclusion 1 Introduction Purpose: using the computational power on GPU to speed up solid modeling

More information

Using GPUs to compute the multilevel summation of electrostatic forces

Using GPUs to compute the multilevel summation of electrostatic forces Using GPUs to compute the multilevel summation of electrostatic forces David J. Hardy Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University of

More information

Reconstruction Improvements on Compressive Sensing

Reconstruction Improvements on Compressive Sensing SCITECH Volume 6, Issue 2 RESEARCH ORGANISATION November 21, 2017 Journal of Information Sciences and Computing Technologies www.scitecresearch.com/journals Reconstruction Improvements on Compressive Sensing

More information

Most real programs operate somewhere between task and data parallelism. Our solution also lies in this set.

Most real programs operate somewhere between task and data parallelism. Our solution also lies in this set. for Windows Azure and HPC Cluster 1. Introduction In parallel computing systems computations are executed simultaneously, wholly or in part. This approach is based on the partitioning of a big task into

More information

Parallel FFT Program Optimizations on Heterogeneous Computers

Parallel FFT Program Optimizations on Heterogeneous Computers Parallel FFT Program Optimizations on Heterogeneous Computers Shuo Chen, Xiaoming Li Department of Electrical and Computer Engineering University of Delaware, Newark, DE 19716 Outline Part I: A Hybrid

More information

V-Ray RT: A New Paradigm in Photorealistic Raytraced Rendering on NVIDIA GPUs. Vladimir Koylazov Chaos Software.

V-Ray RT: A New Paradigm in Photorealistic Raytraced Rendering on NVIDIA GPUs. Vladimir Koylazov Chaos Software. V-Ray RT: A New Paradigm in Photorealistic Raytraced Rendering on NVIDIA s Vladimir Koylazov Chaos Software V-Ray RT demonstration V-Ray RT demonstration V-Ray RT architecture overview Goals of V-Ray RT

More information

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS Agenda Forming a GPGPU WG 1 st meeting Future meetings Activities Forming a GPGPU WG To raise needs and enhance information sharing A platform for knowledge

More information

3D Registration based on Normalized Mutual Information

3D Registration based on Normalized Mutual Information 3D Registration based on Normalized Mutual Information Performance of CPU vs. GPU Implementation Florian Jung, Stefan Wesarg Interactive Graphics Systems Group (GRIS), TU Darmstadt, Germany stefan.wesarg@gris.tu-darmstadt.de

More information

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015 Enhancing Traditional Rasterization Graphics with Ray Tracing October 2015 James Rumble Developer Technology Engineer, PowerVR Graphics Overview Ray Tracing Fundamentals PowerVR Ray Tracing Pipeline Using

More information

Accelerating Implicit LS-DYNA with GPU

Accelerating Implicit LS-DYNA with GPU Accelerating Implicit LS-DYNA with GPU Yih-Yih Lin Hewlett-Packard Company Abstract A major hindrance to the widespread use of Implicit LS-DYNA is its high compute cost. This paper will show modern GPU,

More information

Cross Teaching Parallelism and Ray Tracing: A Project based Approach to Teaching Applied Parallel Computing

Cross Teaching Parallelism and Ray Tracing: A Project based Approach to Teaching Applied Parallel Computing and Ray Tracing: A Project based Approach to Teaching Applied Parallel Computing Chris Lupo Computer Science Cal Poly Session 0311 GTC 2012 Slide 1 The Meta Data Cal Poly is medium sized, public polytechnic

More information

Visual Analysis of Lagrangian Particle Data from Combustion Simulations

Visual Analysis of Lagrangian Particle Data from Combustion Simulations Visual Analysis of Lagrangian Particle Data from Combustion Simulations Hongfeng Yu Sandia National Laboratories, CA Ultrascale Visualization Workshop, SC11 Nov 13 2011, Seattle, WA Joint work with Jishang

More information

A distributed rendering architecture for ray tracing large scenes on commodity hardware. FlexRender. Bob Somers Zoe J.

A distributed rendering architecture for ray tracing large scenes on commodity hardware. FlexRender. Bob Somers Zoe J. FlexRender A distributed rendering architecture for ray tracing large scenes on commodity hardware. GRAPP 2013 Bob Somers Zoe J. Wood Increasing Geometric Complexity Normal Maps artifacts on silhouette

More information

Recursion and Data Structures in Computer Graphics. Ray Tracing

Recursion and Data Structures in Computer Graphics. Ray Tracing Recursion and Data Structures in Computer Graphics Ray Tracing 1 Forward Ray Tracing imagine that you take a picture of a room using a camera exactly what is the camera sensing? light reflected from the

More information

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods

More information

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management Hideyuki Shamoto, Tatsuhiro Chiba, Mikio Takeuchi Tokyo Institute of Technology IBM Research Tokyo Programming for large

More information

Performance impact of dynamic parallelism on different clustering algorithms

Performance impact of dynamic parallelism on different clustering algorithms Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu

More information

high performance medical reconstruction using stream programming paradigms

high performance medical reconstruction using stream programming paradigms high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming

More information

GPU Programming Using NVIDIA CUDA

GPU Programming Using NVIDIA CUDA GPU Programming Using NVIDIA CUDA Siddhante Nangla 1, Professor Chetna Achar 2 1, 2 MET s Institute of Computer Science, Bandra Mumbai University Abstract: GPGPU or General-Purpose Computing on Graphics

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

A Fast GPU-Based Approach to Branchless Distance-Driven Projection and Back-Projection in Cone Beam CT

A Fast GPU-Based Approach to Branchless Distance-Driven Projection and Back-Projection in Cone Beam CT A Fast GPU-Based Approach to Branchless Distance-Driven Projection and Back-Projection in Cone Beam CT Daniel Schlifske ab and Henry Medeiros a a Marquette University, 1250 W Wisconsin Ave, Milwaukee,

More information

Performance potential for simulating spin models on GPU

Performance potential for simulating spin models on GPU Performance potential for simulating spin models on GPU Martin Weigel Institut für Physik, Johannes-Gutenberg-Universität Mainz, Germany 11th International NTZ-Workshop on New Developments in Computational

More information

Efficient Clustered BVH Update Algorithm for Highly-Dynamic Models. Kirill Garanzha

Efficient Clustered BVH Update Algorithm for Highly-Dynamic Models. Kirill Garanzha Symposium on Interactive Ray Tracing 2008 Los Angeles, California Efficient Clustered BVH Update Algorithm for Highly-Dynamic Models Kirill Garanzha Department of Software for Computers Bauman Moscow State

More information

Consider a partially transparent object that is illuminated with two lights, one visible from each side of the object. Start with a ray from the eye

Consider a partially transparent object that is illuminated with two lights, one visible from each side of the object. Start with a ray from the eye Ray Tracing What was the rendering equation? Motivate & list the terms. Relate the rendering equation to forward ray tracing. Why is forward ray tracing not good for image formation? What is the difference

More information

N-Body Simulation using CUDA. CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo

N-Body Simulation using CUDA. CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo N-Body Simulation using CUDA CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo Project plan Develop a program to simulate gravitational

More information

Comparison of High-Speed Ray Casting on GPU

Comparison of High-Speed Ray Casting on GPU Comparison of High-Speed Ray Casting on GPU using CUDA and OpenGL November 8, 2008 NVIDIA 1,2, Andreas Weinlich 1, Holger Scherl 2, Markus Kowarschik 2 and Joachim Hornegger 1 1 Chair of Pattern Recognition

More information

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes. HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation

More information

implementation using GPU architecture is implemented only from the viewpoint of frame level parallel encoding [6]. However, it is obvious that the mot

implementation using GPU architecture is implemented only from the viewpoint of frame level parallel encoding [6]. However, it is obvious that the mot Parallel Implementation Algorithm of Motion Estimation for GPU Applications by Tian Song 1,2*, Masashi Koshino 2, Yuya Matsunohana 2 and Takashi Shimamoto 1,2 Abstract The video coding standard H.264/AVC

More information

Ray Casting of Trimmed NURBS Surfaces on the GPU

Ray Casting of Trimmed NURBS Surfaces on the GPU Ray Casting of Trimmed NURBS Surfaces on the GPU Hans-Friedrich Pabst Jan P. Springer André Schollmeyer Robert Lenhardt Christian Lessig Bernd Fröhlich Bauhaus University Weimar Faculty of Media Virtual

More information

GPU Programming for Mathematical and Scientific Computing

GPU Programming for Mathematical and Scientific Computing GPU Programming for Mathematical and Scientific Computing Ethan Kerzner and Timothy Urness Department of Mathematics and Computer Science Drake University Des Moines, IA 50311 ethan.kerzner@gmail.com timothy.urness@drake.edu

More information

Accelerated Ambient Occlusion Using Spatial Subdivision Structures

Accelerated Ambient Occlusion Using Spatial Subdivision Structures Abstract Ambient Occlusion is a relatively new method that gives global illumination like results. This paper presents a method to accelerate ambient occlusion using the form factor method in Bunnel [2005]

More information

A Cross-Input Adaptive Framework for GPU Program Optimizations

A Cross-Input Adaptive Framework for GPU Program Optimizations A Cross-Input Adaptive Framework for GPU Program Optimizations Yixun Liu, Eddy Z. Zhang, Xipeng Shen Computer Science Department The College of William & Mary Outline GPU overview G-Adapt Framework Evaluation

More information

OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data

OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data Andrew Miller Computer Vision Group Research Developer 3-D TERRAIN RECONSTRUCTION

More information

Computer Vision Systems. Dean, Faculty of Technology Professor, Department of Technology University of Pune, Pune

Computer Vision Systems. Dean, Faculty of Technology Professor, Department of Technology University of Pune, Pune Improving Performance for Computer Vision Systems Dr. Aditya Abhyankar Dean, Faculty of Technology Professor, Department of Technology University of Pune, Pune Homography based Hybrid Mixture Model for

More information

Subset Sum Problem Parallel Solution

Subset Sum Problem Parallel Solution Subset Sum Problem Parallel Solution Project Report Harshit Shah hrs8207@rit.edu Rochester Institute of Technology, NY, USA 1. Overview Subset sum problem is NP-complete problem which can be solved in

More information

RT 3D FDTD Simulation of LF and MF Room Acoustics

RT 3D FDTD Simulation of LF and MF Room Acoustics RT 3D FDTD Simulation of LF and MF Room Acoustics ANDREA EMANUELE GRECO Id. 749612 andreaemanuele.greco@mail.polimi.it ADVANCED COMPUTER ARCHITECTURES (A.A. 2010/11) Prof.Ing. Cristina Silvano Dr.Ing.

More information

High Quality DXT Compression using OpenCL for CUDA. Ignacio Castaño

High Quality DXT Compression using OpenCL for CUDA. Ignacio Castaño High Quality DXT Compression using OpenCL for CUDA Ignacio Castaño icastano@nvidia.com March 2009 Document Change History Version Date Responsible Reason for Change 0.1 02/01/2007 Ignacio Castaño First

More information

A GPU Implementation of Tiled Belief Propagation on Markov Random Fields. Hassan Eslami Theodoros Kasampalis Maria Kotsifakou

A GPU Implementation of Tiled Belief Propagation on Markov Random Fields. Hassan Eslami Theodoros Kasampalis Maria Kotsifakou A GPU Implementation of Tiled Belief Propagation on Markov Random Fields Hassan Eslami Theodoros Kasampalis Maria Kotsifakou BP-M AND TILED-BP 2 BP-M 3 Tiled BP T 0 T 1 T 2 T 3 T 4 T 5 T 6 T 7 T 8 4 Tiled

More information

GPU Implementation of a Multiobjective Search Algorithm

GPU Implementation of a Multiobjective Search Algorithm Department Informatik Technical Reports / ISSN 29-58 Steffen Limmer, Dietmar Fey, Johannes Jahn GPU Implementation of a Multiobjective Search Algorithm Technical Report CS-2-3 April 2 Please cite as: Steffen

More information

Real-Time Graphics Architecture. Kurt Akeley Pat Hanrahan. Ray Tracing.

Real-Time Graphics Architecture. Kurt Akeley Pat Hanrahan.  Ray Tracing. Real-Time Graphics Architecture Kurt Akeley Pat Hanrahan http://www.graphics.stanford.edu/courses/cs448a-01-fall Ray Tracing with Tim Purcell 1 Topics Why ray tracing? Interactive ray tracing on multicomputers

More information

General Purpose GPU Computing in Partial Wave Analysis

General Purpose GPU Computing in Partial Wave Analysis JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data

More information

Current Trends in Computer Graphics Hardware

Current Trends in Computer Graphics Hardware Current Trends in Computer Graphics Hardware Dirk Reiners University of Louisiana Lafayette, LA Quick Introduction Assistant Professor in Computer Science at University of Louisiana, Lafayette (since 2006)

More information

arxiv: v1 [physics.ins-det] 11 Jul 2015

arxiv: v1 [physics.ins-det] 11 Jul 2015 GPGPU for track finding in High Energy Physics arxiv:7.374v [physics.ins-det] Jul 5 L Rinaldi, M Belgiovine, R Di Sipio, A Gabrielli, M Negrini, F Semeria, A Sidoti, S A Tupputi 3, M Villa Bologna University

More information

A Simulated Annealing algorithm for GPU clusters

A Simulated Annealing algorithm for GPU clusters A Simulated Annealing algorithm for GPU clusters Institute of Computer Science Warsaw University of Technology Parallel Processing and Applied Mathematics 2011 1 Introduction 2 3 The lower level The upper

More information

Journal of Universal Computer Science, vol. 14, no. 14 (2008), submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J.

Journal of Universal Computer Science, vol. 14, no. 14 (2008), submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J. Journal of Universal Computer Science, vol. 14, no. 14 (2008), 2416-2427 submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J.UCS Tabu Search on GPU Adam Janiak (Institute of Computer Engineering

More information

Deformable and Fracturing Objects

Deformable and Fracturing Objects Interactive ti Collision i Detection ti for Deformable and Fracturing Objects Sung-Eui Yoon ( 윤성의 ) IWON associate professor KAIST http://sglab.kaist.ac.kr/~sungeui/ Acknowledgements Research collaborators

More information

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367

More information

GeoImaging Accelerator Pansharpen Test Results. Executive Summary

GeoImaging Accelerator Pansharpen Test Results. Executive Summary Executive Summary After demonstrating the exceptional performance improvement in the orthorectification module (approximately fourteen-fold see GXL Ortho Performance Whitepaper), the same approach has

More information

Dense matching GPU implementation

Dense matching GPU implementation Dense matching GPU implementation Author: Hailong Fu. Supervisor: Prof. Dr.-Ing. Norbert Haala, Dipl. -Ing. Mathias Rothermel. Universität Stuttgart 1. Introduction Correspondence problem is an important

More information

Scalable Ambient Effects

Scalable Ambient Effects Scalable Ambient Effects Introduction Imagine playing a video game where the player guides a character through a marsh in the pitch black dead of night; the only guiding light is a swarm of fireflies that

More information

Rendering Computer Animations on a Network of Workstations

Rendering Computer Animations on a Network of Workstations Rendering Computer Animations on a Network of Workstations Timothy A. Davis Edward W. Davis Department of Computer Science North Carolina State University Abstract Rendering high-quality computer animations

More information

Rate-distortion Optimized Streaming of Compressed Light Fields with Multiple Representations

Rate-distortion Optimized Streaming of Compressed Light Fields with Multiple Representations Rate-distortion Optimized Streaming of Compressed Light Fields with Multiple Representations Prashant Ramanathan and Bernd Girod Department of Electrical Engineering Stanford University Stanford CA 945

More information

THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT HARDWARE PLATFORMS

THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT HARDWARE PLATFORMS Computer Science 14 (4) 2013 http://dx.doi.org/10.7494/csci.2013.14.4.679 Dominik Żurek Marcin Pietroń Maciej Wielgosz Kazimierz Wiatr THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT

More information

An Implementation of Ray Tracing in CUDA

An Implementation of Ray Tracing in CUDA An Implementation of Ray Tracing in CUDA CSE 260 Project Report Liang Chen Hirakendu Das Shengjun Pan December 4, 2009 Abstract In computer graphics, ray tracing is a popular technique for rendering images

More information