A GPU based Real-Time Line Detector using a Cascaded 2D Line Space


Jochen Hunz (jhunz@uni-koblenz.de), Anna Katharina Hebborn (ahebborn@uni-koblenz.de), Stefan Müller (stefanm@uni-koblenz.de)
University of Koblenz-Landau, Universitätsstraße 1, 56070 Koblenz, Germany

ABSTRACT

We propose the Line Space as a novel parameterization for lines in 2D images. The approach has similarities to the well-known Hough Transform; however, we use a linear parameterization instead of an angular representation, leading to better quality and less redundancy. The Line Space is very well suited for GPU implementation since all potential lines in an image are captured through rasterization. In addition, we improve the efficiency by introducing the Cascaded Line Space, where the image is subdivided into smaller Line Spaces which are finally merged into the global Line Space. We implemented our approaches exploiting modern GPU facilities (i.e. compute shaders) and describe the details in this paper. Finally, we discuss the enormous potential of the Line Space for further extensions.

KEYWORDS

Image Processing, Visual Computing, Line Detection, GPGPU, Line Space

1 INTRODUCTION

Lines are typical image features of interest in computer vision and image analysis. Several use cases depend on capturing a set of lines in images accurately and efficiently. Since 1962, one established way to detect lines in images is the Hough Transform [1]. The work of Duda and Hart [2] contributed significantly to the algorithm in its present form. They represent a line in its normal form, parameterizing it by its algebraic distance d from the origin and the angle α of its normal. By restricting α to the interval [0, π), all normal parameters are unique. Therefore, every line in the input image corresponds to a unique point in the Hough accumulator. If the coordinate system's origin is the center of an image of size n x n, the maximum distance d to the origin is (√2/2)n, half the image diagonal. Thus, the resolution of the Hough accumulator depends on the number of discretizations of α and on the size n x n of the input image; the accumulator has one axis for the discretized angles and one for the discretized distances. Transforming a set of collinear points from the input image to the Hough accumulator results in sinusoidal curves with one common point of intersection. Finding this point of intersection yields a detected line. If the discretization of the angle α is not chosen properly, the lines are subsampled; thus, the accuracy of the detected lines is limited. In contrast to this, our approach detects lines in images using a linear parameterization of lines instead of an angular representation. In doing so, we consider all possible lines in an image.

In this paper, we propose a novel approach for detecting lines in images. Due to the computing power of modern GPUs and the versatility of compute shaders, all possible lines in an image can be scanned. In section 2, we describe the idea and theory behind the Line Space and introduce the Cascaded Line Space to optimize the algorithm. Section 3 illustrates and discusses implementation details for detecting lines efficiently on the GPU using OpenGL Compute Shaders. Afterwards, section 4 presents our evaluation against the Hough Transform, specifically the quality of the results and the algorithm's efficiency. Finally, we conclude the paper with a discussion of the results and an outlook on future possibilities of the Line Space.
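For reference, in the normal form of Duda and Hart [2] a line is written as

    d = x cos(α) + y sin(α),   with α ∈ [0, π).

A single edge pixel (x0, y0) therefore maps to the sinusoidal curve d(α) = x0 cos(α) + y0 sin(α) in the (α, d) accumulator, and collinear pixels produce curves that meet in one common point (α*, d*), which is exactly the intersection mentioned above.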

2 LINE SPACE

The term Line Space was introduced by Drettakis and Sillion [3] in the context of hierarchical radiosity simulation. In their paper, a line is considered as a link between two arbitrary surface elements, surrounded by a shaft covering all potential rays between both surface elements. For each shaft, a list of potential occluding candidates is computed during the radiosity simulation. The candidate list can be reused for further visibility computations (i.e. if a link needs to be refined) or for dynamic scene updates.

We use the term Line Space in a somewhat similar manner. However, instead of shafts between surfaces, we investigate lines between all corner points of an image, which is filtered by the Canny edge detector [4]. In figure 1 an edge image of size n x n is shown with an exemplary line. The border edges are numbered from 0 to 4n - 1. Now we consider all lines from every border edge to every other border edge, each identified by a tuple (start edge number, end edge number). As a result, we get a two-dimensional space (the Line Space) of size 4n x 4n, where each index tuple represents a line in the image. As a consequence, the algorithm rasterizes 4n · 4n = 16n² lines in total without any applied optimization. The blue line in figure 1a is represented by the index (1,8) or (8,1), respectively. The entry in the Line Space is the number of pixels along the rasterized line whose value in the Canny image is not equal to zero. As we can see, the Line Space has some interesting properties:

1. LS(s,e) = LS(e,s): it is symmetric and only the upper triangle is needed.
2. LS(s,s) = 0: the elements of the diagonal characterize degenerated lines with zero length and can be omitted.
3. Collinearity: lines between collinear border edges are also degenerated, leading to blocks around the diagonal in this example.

Figure 1: (a) An empty image with one exemplary line and (b) the corresponding Line Space entries.
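To make the tuple indexing concrete, the following GLSL sketch shows one possible mapping from a border edge number to a pixel coordinate of an n x n image. The clockwise numbering, the function name edgeToPixel and the uniform n are only illustrative choices; any consistent numbering of the 4n border edges works.

    // Hypothetical mapping of a border edge index e in [0, 4n-1] to a border pixel
    // of an n x n image. Edges are assumed to be numbered clockwise in image
    // coordinates: top row left to right, right column top to bottom, bottom row
    // right to left, left column bottom to top. Corner pixels are shared by two
    // indices in this simplified scheme.
    uniform int n;  // side length of the (square) input image

    ivec2 edgeToPixel(int e) {
        if (e < n)     return ivec2(e, 0);                  // top row
        if (e < 2 * n) return ivec2(n - 1, e - n);          // right column
        if (e < 3 * n) return ivec2(3 * n - 1 - e, n - 1);  // bottom row
        return ivec2(0, 4 * n - 1 - e);                     // left column
    }

With such a mapping, the tuple (s, e) of figure 1 corresponds to the segment from edgeToPixel(s) to edgeToPixel(e), which is then rasterized to count Canny edge pixels.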

2.1 Cascaded Line Space

As the computation of the Line Space should be as fast as possible to remain real-time capable, we developed a Cascaded Line Space (CLS) to speed up the algorithm. The idea is rather simple: instead of rasterizing lines between all corner points of the image, we divide the image into cascades of size k x k. In a similar way as described before, we number the border edges of each cascade from 0 to 4k - 1. Now we consider all lines within a cascade between these border edges, resulting in one Line Space per cascade, where each of these Line Spaces is 4k x 4k pixels in size. Consequently, we compute (n/k)² Line Spaces while rasterizing (n/k)² · 16k² = 16n² lines, which are as many lines as before. Because of that, we can store the CLS in an image of size 4n x 4n. The advantage of this approach is that we rasterize short lines instead of longer lines through the whole image, which results in a significant speed increase. Figure 2a shows an image of size 16 x 16 divided into 4 x 4 cascades. The result of our algorithm is a CLS as in figure 2b.

In order to obtain the same information from the CLS as from the Line Space, we need to merge the CLS together. We can compute the next higher CLS hierarchy based on the existing one by merging groups of four Line Spaces into one Line Space. For this, the cascade intersection points of each line in the Line Space are determined as in figure 3. Here, the line intersects the cascades at the red, green, and orange points. The red points give the index tuples (2,5) and (5,2) within the upper left cascade, while the green points give the index tuples (12,13) and (13,12) within the lower left cascade, for instance. Knowing this for every intersection pair, we can perform simple look-ups in the CLS to determine how many pixels are set along this line. The result is stored in the next higher CLS hierarchy at the proposed line entry. Doing this for all cascades leads to a new CLS which is a composition of a quarter as many, twice as large cascades. Repeating this procedure ld(n/k) times will result in the complete Line Space. Figure 4a shows the result of the first merge step of the CLS shown in figure 2b. The result is a new CLS with 2 x 2 cascades of size 8 x 8 each. The next and last merge step in this example is shown in figure 4b, which is the complete Line Space of the image in figure 2a: a CLS with only 1 cascade of size 16 x 16.

Figure 2: (a) An image of size 16 x 16 divided into 4 x 4 cascades and (b) the corresponding Cascaded Line Space.

Figure 3: (a) Determined intersection points for each cascade and (b) the corresponding CLS entries, exemplary for the upper left image section of figure 2.

Figure 4: (a) Cascaded Line Space with 2 x 2 cascades and (b) the complete Line Space as a result of the last merge step.
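As a rough illustration of one merge pass, the following GLSL sketch assumes that the cascade intersection points of every line have been precomputed and uploaded as per-line look-up jobs (as recommended in section 3). All names, bindings and the job layout are illustrative assumptions rather than a fixed interface.

    #version 430
    // One merge pass: for every line of the next (coarser) hierarchy level, sum up
    // the hit counts of its precomputed sub-lines in the current CLS level.
    layout(local_size_x = 64) in;

    struct MergeJob {
        ivec2 parentTexel;   // entry to write in the next CLS level
        int   segmentCount;  // number of child cascades the line crosses (at most 3 for a 2 x 2 merge)
        int   pad;           // keep std430 alignment explicit
        ivec2 childTexel[3]; // absolute texels of the sub-line entries in the current CLS level
    };

    layout(std430, binding = 0) readonly buffer Jobs { MergeJob jobs[]; };
    layout(binding = 1, r32ui) readonly  uniform uimage2D clsCurrent;
    layout(binding = 2, r32ui) writeonly uniform uimage2D clsNext;

    void main() {
        MergeJob job = jobs[gl_GlobalInvocationID.x];
        uint hits = 0u;
        for (int i = 0; i < job.segmentCount; ++i)
            hits += imageLoad(clsCurrent, job.childTexel[i]).r;  // look up precomputed sub-line
        imageStore(clsNext, job.parentTexel, uvec4(hits, 0u, 0u, 0u));
    }

Each sub-line's hit count was already computed at the finer level, so a merge step only performs a few image loads per line instead of re-rasterizing it.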

3 IMPLEMENTATION

As the Line Space is constructed through line rasterization, we implemented our approach using OpenGL 4 and the performance of modern GPUs. We obtained the best performance using Compute Shaders as they provide high-speed general-purpose computing. The dispatch of a Compute Shader can be fully configured: the shader is dispatched as a three-dimensional grid of local work groups, and each local work group itself forms a three-dimensional space of threads which is executed on one streaming multiprocessor of the GPU.

A Compute Shader invocation is dispatched for every border edge tuple (start edge, end edge) to determine whether a line appears between the edges or not. For this, a line from the start to the end edge is rasterized with the Digital Differential Analyzer (DDA) algorithm, and the number of pixels with a value not equal to zero is counted. The Line Space stores this value, with the tuple serving as the Line Space's index. We reduce the computational complexity of the Line Space by using a Buffer Object to store all relevant tuples, which at the same time allows us to omit equivalent and degenerated lines. The following listing shows the core of the entire algorithm in the OpenGL Shading Language:

    void main() {
        uint id = gl_GlobalInvocationID.x;
        ivec2 tuple = tuples[id];            // buffer of precomputed (start edge, end edge) tuples
        uint hits = DDA(tuple.x, tuple.y);   // count Canny pixels along the line
        imageStore(lineSpace, tuple, uvec4(hits));
    }

For CLS computation the core of the algorithm remains the same; only the input buffer and the compute shader dispatch vary. Now the buffer contains only the border edge tuples of one cascade and we dispatch a set of two-dimensional local work groups. Using this technique, we can determine the ID of the cascade inside the shader and, knowing this, compute the exact start and end point of the line in the image. After computing the number of line hits, the result is stored at its proposed cascade position. To merge the cascades as fast as possible, the intersection points for every cascade level can be precomputed and made available in the shader through a buffer. It is also recommended to omit equivalent and degenerated lines as before.
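The DDA routine called in the listing above can be sketched as follows. For clarity the sketch works on pixel endpoints (the edge indices of a tuple would first be converted to border pixels, e.g. with a mapping like the one sketched in section 2); the Canny image binding and all names are illustrative assumptions.

    layout(binding = 3, r8ui) readonly uniform uimage2D cannyImage;  // binary Canny edge image (assumed binding)

    // Count the edge pixels hit while rasterizing the segment from p0 to p1 with a
    // simple DDA. Both endpoints are border pixels of the image.
    uint countHits(ivec2 p0, ivec2 p1) {
        vec2 d = vec2(p1 - p0);
        int steps = int(max(abs(d.x), abs(d.y)));
        if (steps == 0) return 0u;            // degenerated line of zero length
        vec2 inc = d / float(steps);
        vec2 p = vec2(p0) + 0.5;              // sample at pixel centers
        uint hits = 0u;
        for (int i = 0; i <= steps; ++i) {
            if (imageLoad(cannyImage, ivec2(p)).r != 0u) hits++;
            p += inc;
        }
        return hits;
    }

Shorter segments need fewer samples, which is exactly why the cascades of section 2.1 pay off.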
3.1 Line Extraction

After computing the Line Space, it needs to be analyzed in a similar way as the Hough accumulator of the Hough Transform in order to extract the detected lines. As every entry in the Line Space stores the number of line hits, the simplest way to analyze the Line Space is to introduce a threshold: every Line Space entry whose value is greater than the threshold is regarded as a detected line. The value of the threshold should depend on the size of the input image as well as on the size of the lines which are expected. If the threshold is chosen too small, this approach detects too many lines. Conversely, an overly large threshold would detect too few or, in the worst case, no lines at all. The neighborhood of a strong Line Space entry is also densely occupied, and therefore it is advantageous to suppress that neighborhood. Thus, we avoid detecting similar and false lines.
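A minimal sketch of this thresholding pass is given below: one invocation per Line Space entry appends every entry above the threshold to a compact output list. The atomic counter, the buffer layout and all names are illustrative assumptions.

    #version 430
    layout(local_size_x = 8, local_size_y = 8) in;

    layout(binding = 0, r32ui) readonly uniform uimage2D lineSpace;
    layout(std430, binding = 1) writeonly buffer Detected { ivec2 detectedLines[]; };
    layout(binding = 2, offset = 0) uniform atomic_uint numDetected;

    uniform uint threshold;

    void main() {
        ivec2 tuple = ivec2(gl_GlobalInvocationID.xy);
        if (tuple.y <= tuple.x) return;       // symmetric Line Space: the upper triangle is sufficient
        uint hits = imageLoad(lineSpace, tuple).r;
        if (hits > threshold)
            detectedLines[atomicCounterIncrement(numDetected)] = tuple;  // store (start edge, end edge)
    }

Dispatching (4n/8, 4n/8, 1) work groups visits every texel of the 4n x 4n Line Space exactly once.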

As the number of lines detected by thresholding depends on the chosen threshold and on the input image, a stable method which detects exactly m lines is desirable. In general, one could search for the m maxima by going through the Line Space sequentially. Nevertheless, a preferable technique is to search for the m maxima on the GPU. For this, the Line Space is divided into a grid using a compute shader. Each compute shader invocation reduces the four entries of a grid cell to one entry so that only the largest entry remains (see figure 5). Repeating the image reduction ld(4n) times leads to the maximum value in the Line Space, and therefore to the most distinct line in the image. For this, each entry in the Line Space must additionally store its position. With this knowledge, the detected line can be deleted from the Line Space by setting its value to zero. The next maximum will then be another distinct line, different from the first one. Therefore, repeating this procedure m times delivers the m maxima of the Line Space. The results can be improved further if the neighborhoods of the maxima are deleted from the Line Space as well.

Figure 5: Image reduction on an n x n input image applied ld(n) = 3 times to find the global maximum. [5]

The computational performance of this approach depends on m. To detect any number of lines with almost constant performance, the Line Space can instead be sorted using a sorting algorithm which is appropriate for GPUs. There are two categories of sorting algorithms: data-driven ones and data-independent ones. Data-independent sorting algorithms are well suited for implementation on many processors and therefore for running on the GPU [6]. The most common algorithms in the literature are the bitonic merge sort [6] and the radix sort [7]. The first m texels of the sorted Line Space are the m most distinct lines in the input image. Consequently, if the unfiltered Line Space is sorted, the detected lines will be the same as those obtained by reducing the Line Space m · ld(4n) times without deleting the neighborhoods of the maxima. To improve the detected lines, the Line Space can be filtered in a preprocessing step. One approach is to use the image reduction technique as in figure 5, but with the Line Space reduced fewer than ld(4n) times. This shrinks the regions a line in the input image produces in the Line Space and suppresses false lines within such a region.
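One pass of the 2 x 2 reduction from figure 5 can be sketched as follows, where each level stores (hits, start edge, end edge) per texel so that the position of the maximum survives the reduction; the image formats, bindings and names are illustrative assumptions.

    #version 430
    layout(local_size_x = 8, local_size_y = 8) in;

    layout(binding = 0, rgba32ui) readonly  uniform uimage2D srcLevel;  // current reduction level
    layout(binding = 1, rgba32ui) writeonly uniform uimage2D dstLevel;  // next (half-sized) level

    void main() {
        ivec2 dst = ivec2(gl_GlobalInvocationID.xy);
        ivec2 src = 2 * dst;
        // Gather the 2 x 2 block and keep the entry with the largest hit count.
        uvec4 best = imageLoad(srcLevel, src);
        uvec4 c1 = imageLoad(srcLevel, src + ivec2(1, 0));
        uvec4 c2 = imageLoad(srcLevel, src + ivec2(0, 1));
        uvec4 c3 = imageLoad(srcLevel, src + ivec2(1, 1));
        if (c1.r > best.r) best = c1;
        if (c2.r > best.r) best = c2;
        if (c3.r > best.r) best = c3;
        imageStore(dstLevel, dst, best);  // best = (hits, start edge, end edge, unused)
    }

Level 0 would be initialized with (LS(s,e), s, e); after ld(4n) passes the remaining 1 x 1 level holds the most distinct line, which can then be cleared (together with its neighborhood) before the next of the m passes.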

4 RESULTS

Figure 6: Top row: input and Canny image of size 512 x 512. Bottom row: 8 most distinct lines detected by Hough (left) and the Line Space (right).

We compare the Line Space and the Cascaded Line Space to the Hough Transform. Both extract straight lines from an image, and in direct comparison both approaches find reasonable lines (see figure 6). To be as fair as possible, we use the well-tested and fast CPU and GPU implementations provided by OpenCV. As mentioned in the introduction, the Hough Transform's accuracy depends on the fineness of the discretization of the angle α and of the algebraic distance d to the image coordinate system's origin. Therefore, simply applying the Hough Transform to images of different sizes using the same discretization parameters (e.g. an angular step of one degree) is not a fair comparison; the accuracy of the results depends strongly on the parameters and the image size. Hence, the minimum angle between two discrete lines in the image serves as the angular discretization of the Hough Transform to make it comparable to the Line Space.

4.1 Performance

Our test system consists of an Intel Core i7 CPU with 4 cores at 2.66 GHz and 12 GB main memory. The GPU is an Nvidia GTX 770 with 1536 cores and 2 GB video memory. The input Canny images vary between 64 x 64 and 1024 x 1024 pixels in size. We only use square images as our current implementation of the CLS supports only those; however, this is merely an implementation detail and the algorithm is adaptable to arbitrarily sized images as well.

Table 1 shows the average computation time in milliseconds for the Hough Transform running on the CPU and the GPU as well as the average computation time for the Line Space and the CLS. While evaluating the performance of the CLS, we consider the initial cascade computation time and the time to merge the cascades into the global Line Space separately. Here we use an initial cascading of k = 8, which provides the best performance in our test scenario. Consequently, a Canny image of size n x n = 1024 x 1024 consists of 128 x 128 cascades and as many Line Spaces. The Hough Transform only considers pixels in the image with a value not equal to zero; its performance therefore strongly depends on the number of these pixels. To account for this, we use two test images with different amounts of structure, giving two test results for the Hough Transform per image size. For Canny images of n = 512, for instance, p = 6.7% of the pixels are not equal to zero in the first image, while the second image has twice as many non-zero pixels (p = 13.5%). Note that the second image corresponds to the Canny image of figure 6. The computation of the Line Space only depends on the image size and therefore does not require such a differentiation. For image sizes up to 128 x 128, both the Line Space and the CLS run significantly faster than the implementations of the Hough Transform.

Table 1: Average computation time in milliseconds for the Hough Transform on the CPU and the GPU, the Line Space, and the Cascaded Line Space for a Canny image of size n x n. The Cascaded Line Space is separated into the computation time for the initial cascades and for the merge phase. Two different test images are used; the minimum angle between two discrete lines in the image is given in the second column (in degrees), and the percentage of pixels with a value not equal to zero is given by p.

n     min. angle   p (%)   Hough CPU   Hough GPU   Line Space   CLS initial   CLS merge   CLS total
64    0.451        15.3    2.14        2.11        0.11         0.02          0.06        0.08
                   16.3    2.37        1.56
128   0.226        13.4    23.30       1.56        0.60         0.07          0.30        0.37
                   15.1    26.97       1.57
256   0.112        9.8     147.89      2.70        4.43         0.24          1.49        1.73
                   14.4    207.79      4.56
512   0.06         6.7     780.57      8.50        34.13        0.91          7.45        8.36
                   13.5    1576.35     13.23
1024  0.028        4.5     5040.72     30.62       247.53       3.31          36.46       39.77
                   10.6    12952.30    59.52

The Line Space is more than 19 times faster than the GPU Hough Transform for n = 64 and p = 15.3%, and the CLS is even faster than the Line Space. For n = 128 the Line Space is still more than twice as fast as the GPU Hough Transform. For an image resolution of n = 256 the Line Space is significantly slower than the GPU Hough Transform on the first image (p = 9.8%); nevertheless, the CLS is still more than 1.5 times faster than the GPU Hough Transform. On the second image (p = 14.4%), however, the Hough Transform requires 4.56 ms, which is slower than the Line Space and significantly slower than the CLS. Considering images of size n = 512 and p = 6.7%, the CLS is also faster than the GPU Hough Transform, although the difference is not that big anymore. Still, the initial computation of the cascades for k = 8 is very fast, requiring only 0.91 milliseconds; the merging of the initial cascades is clearly the main time factor. The Hough Transform runs significantly slower on the second image (p = 13.5%) than the CLS. The resulting lines of that image are shown in figure 6. For n = 1024, the GPU Hough Transform is 29% faster than the CLS on image 1, but for image 2 the CLS is almost 1.5 times faster than the GPU Hough Transform. Again, we can observe that the initial computation of the cascades is very fast, requiring only 3.31 milliseconds. As before, the expensive part is the merging. One must consider that for an image size of n = 1024 and an initial cascading of k = 8, a total of ld(n/k) = 7 merge steps are necessary. However, increasing k does not increase the performance. Therefore, speeding up the merging phase is desirable future work.

Overall, the CLS is faster than the Line Space for every image size in our scenario. The merging phase of the CLS is time-consuming and therefore critical, whereas the computation of the initial cascades is very fast. For images with only a few non-zero pixels, the Hough Transform runs faster than the CLS; in spite of this, the CLS would already be the better choice for an image with p = 6.7%. In general, the CLS runs at a nearly constant speed, whereas the Hough Transform's performance depends heavily on the input image, which can be a disadvantage. Furthermore, the Hough Transform is very slow when running on the CPU, so the GPU implementation should be used in general.

5 DISCUSSION AND FUTURE WORK

We introduced the Line Space as a new and efficient parameterization for lines in 2D images. The major benefit of the Line Space is that all potential lines in an image are captured without any redundancy. In addition, the Line Space is well suited for GPU implementation. As a first application, we presented a global line detection algorithm based on a Cascaded Line Space. The results of our brute-force implementation can already compete with the OpenCV GPU Hough Transform. We have several ideas to optimize the merge phase as the most time-consuming task. One idea is to use the group shared memory available to GPU cores, which might result in a significant speed-up. Another idea is to merge Line Spaces directly, since for each line exit point in one cascade the line starting point of the next cascade is well defined. Using this approach, we are also working on a line segment detector.
In conclusion, we are convinced that the Line Space has an enormous potential for line and line segment detection, since it provides an efficient basis for further optimizations and more complex algorithms in this area.

6 REFERENCES

[1] Hough, Paul V. C. "Method and means for recognizing complex patterns." U.S. Patent No. 3,069,654. 18 Dec. 1962.

[2] Duda, Richard O., and Peter E. Hart. "Use of the Hough transformation to detect lines and curves in pictures." Communications of the ACM 15.1 (1972): 11-15.

[3] Drettakis, George, and François X. Sillion. "Interactive update of global illumination using a line-space hierarchy." Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press/Addison-Wesley Publishing Co., 1997, pp. 57-64.

[4] Canny, John. "A computational approach to edge detection." IEEE Transactions on Pattern Analysis and Machine Intelligence 8.6 (1986): 679-698.

[5] Buck, Ian, and Tim Purcell. "A toolkit for computation on GPUs." GPU Gems (2004): 621-636.

[6] Kipfer, Peter, and Rüdiger Westermann. "Improved GPU sorting." GPU Gems 2 (2005): 733-746.

[7] Harris, Mark, Shubhabrata Sengupta, and John D. Owens. "Parallel prefix sum (scan) with CUDA." GPU Gems 3 (2007): 851-876.