GIST GPU Implementation
Prakhar Jain (200601066), Ejaz Ahmed (200601028)
3rd May, 2009
International Institute of Information Technology, Hyderabad
Table of Contents
1. Abstract
2. Introduction
3. Basic Algorithm
4. Parallelization
5. Getting the Descriptor
6. Graphs
7. Speed Up
8. Precision / Accuracy
9. Related Work and References
ABSTRACT

GIST is a computational model of the recognition of real-world scenes that bypasses the segmentation and processing of individual objects or regions. The procedure is based on a very low-dimensional representation of the scene called the Spatial Envelope. Torralba proposed a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene, and showed that these dimensions can be reliably estimated from spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not required for scene categorization, and that modeling a holistic representation of the scene informs about its probable semantic category. The reference implementation provided by the authors is in Matlab and runs on the CPU. The objective of this project is to parallelize it on the GPU using GPGPU (CUDA).
Introduction

Many computer vision algorithms are computationally expensive. Algorithms such as AdaBoost training, SIFT feature computation, SFM, etc. can require weeks to execute, but they are inherently parallelizable. In this project we have parallelized the GIST feature, exploiting the computational power of the GPU with the help of Nvidia's CUDA. The rest of the report describes the basic algorithm, the flowchart, and our approach to parallelizing it. We conclude with results, speed-ups, and graphs to support the performance claims.

BASIC ALGORITHM

Create the Gabor filter bank
 - Create a parameter matrix
 - Create transfer functions
Preprocess the image
 - Low contrast normalization
 - Local luminance variance normalization
Get the descriptor
 - For each filter, convolve the image with the filter
 - Divide the image into blocks
 - The mean of each block is the corresponding feature

PARALLELIZATION

Creating the Gabor filter bank
 - Pixel-level parallelization: one thread calculates the value of one element of a Gabor filter (a sketch of such a kernel is given after this list).
 - Number of threads = number_of_filters * size_of_image * size_of_image
Preprocessing the image
 - Fourier transform * Gaussian (element by element)
 - Pixel-level parallelization due to inter-pixel independence
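As an illustration of the pixel-level parallelization above, the sketch below shows a CUDA kernel in which each thread computes one element of one frequency-domain filter. The filter expression is a simplified oriented Gaussian, not the exact GIST transfer function, and the buffer names, parameter arrays and launch configuration are assumptions made for the example.

    // Minimal sketch: one thread per filter element.
    // d_filters holds Nf filters of size N x N, stored contiguously.
    // The oriented-Gaussian expression below is a simplified stand-in
    // for the actual GIST transfer functions.
    __global__ void buildFilterBank(float *d_filters, int N, int Nf,
                                    const float *d_theta,  // orientation of each filter
                                    const float *d_sigma)  // bandwidth of each filter
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= N * N * Nf) return;

        int f = idx / (N * N);          // which filter
        int p = idx % (N * N);          // which element of that filter
        int y = p / N;
        int x = p % N;

        // Centered frequency coordinates in [-0.5, 0.5)
        float fx = (x - N / 2) / (float)N;
        float fy = (y - N / 2) / (float)N;

        // Rotate into the filter's orientation and evaluate an anisotropic Gaussian
        float c = cosf(d_theta[f]), s = sinf(d_theta[f]);
        float u =  c * fx + s * fy;
        float v = -s * fx + c * fy;
        float su = d_sigma[f];
        float sv = 0.5f * su;           // fixed aspect ratio, illustrative only
        d_filters[idx] = expf(-(u * u) / (2.0f * su * su) - (v * v) / (2.0f * sv * sv));
    }

    // Launch sketch, matching "number_of_filters * size_of_image * size_of_image" threads:
    // int threads = 256, blocks = (N * N * Nf + threads - 1) / threads;
    // buildFilterBank<<<blocks, threads>>>(d_filters, N, Nf, d_theta, d_sigma);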
Getting the Descriptor

For calculating the actual descriptors, three things are required:
 - Preprocessing of the image to obtain a normalized image suitable for calculating the features.
 - Calculation of Gabor filters for every orientation and scale.
 - The number of blocks into which the image should be divided.

Steps for getting the descriptor:

1. Getting the Fourier transform of the image
The feature computation is done in the frequency domain, so the image needs to be converted from the spatial to the frequency domain.

[Figure: the image is split into its R, G and B components in the spatial domain, and each component is transformed to the frequency domain.]

The Fourier transform of each component of the image is calculated using the CUFFT library's functions, which are about 2x as fast as the CPU implementation of the FFT (a minimal sketch follows).
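The sketch below shows how a 2-D forward transform of one image channel can be taken with CUFFT. The function name, buffer name and the use of an in-place single-precision complex-to-complex transform are assumptions, since the report does not list the exact calls used.

    #include <cufft.h>

    // Minimal sketch: forward 2-D FFT of one image channel with CUFFT.
    // d_channel is a device buffer of N x N cufftComplex values
    // (the real pixel values with zero imaginary parts).
    void forwardFFT(cufftComplex *d_channel, int N)
    {
        cufftHandle plan;
        cufftPlan2d(&plan, N, N, CUFFT_C2C);      // N x N complex-to-complex plan
        cufftExecC2C(plan, d_channel, d_channel,  // in-place transform
                     CUFFT_FORWARD);
        cufftDestroy(plan);
    }

The inverse transform used in the next step is the same call with CUFFT_INVERSE as the direction.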
2. Applying the filters to the images

 - Number of scales: Ns
 - Number of orientations per scale: No
 - Total number of filters: Nf = Ns * No
 - Number of channels: 3

Each filter is applied to each channel of the image, giving Nf * 3 filtered images. This is done on the GPU with the help of the following kernels (simplified sketches are given after this step).

Element multiplication
 - Number of threads created = ImageSize * ImageSize * Nf
 - [Figure: the R, G and B image components in the frequency domain are multiplied element by element with each of the Nf filters, giving the results of the element multiplication in the frequency domain.]

Taking the inverse FFT of the Nf * 3 images
 - The inverse Fourier transform of the Nf * 3 images is calculated using the CUFFT library's functions, which are about 2x as fast as the CPU implementation of the IFFT.

Taking the absolute value of every image
 - An absolute-value kernel is used. Total number of threads created = ImageSize * ImageSize * Nf * 3.
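The element multiplication and absolute-value steps above are simple per-pixel operations. The sketch below shows kernels of this kind; the kernel and buffer names, the assumption that the filters are real-valued in the frequency domain, and the thread layout are ours, not taken from the report.

    #include <cufft.h>

    // Sketch: multiply one frequency-domain image channel (complex) by every
    // real-valued filter, element by element. One thread per (filter, pixel) pair,
    // i.e. ImageSize * ImageSize * Nf threads per channel.
    __global__ void elementMultiply(const cufftComplex *d_imageF, // N*N values
                                    const float *d_filters,       // Nf * N*N values
                                    cufftComplex *d_out,          // Nf * N*N values
                                    int N, int Nf)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= N * N * Nf) return;

        int p = idx % (N * N);            // pixel index within the image
        float g = d_filters[idx];         // filter value for this (filter, pixel)
        d_out[idx].x = d_imageF[p].x * g;
        d_out[idx].y = d_imageF[p].y * g;
    }

    // Sketch: magnitude of each complex sample after the inverse FFT.
    __global__ void absKernel(const cufftComplex *d_in, float *d_out, int total)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= total) return;
        d_out[idx] = sqrtf(d_in[idx].x * d_in[idx].x + d_in[idx].y * d_in[idx].y);
    }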
3. Getting the features

For every one of the Nf * 3 images (obtained from the previous step):
 - Divide the image into blocks.
 - The mean value of each block is one feature.

For calculating the means, CUDPP's segmented scan is used (a simplified alternative is sketched below).
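The project sums the pixels of each block with CUDPP's segmented scan. Since the exact CUDPP call sequence is not listed in the report, the sketch below computes the same per-block means with a plain CUDA kernel instead: one thread block per image block and a shared-memory reduction. The block width w, the 256-thread configuration and the buffer names are assumptions.

    // Sketch: mean of each w x w image block, one CUDA thread block per image
    // block (an illustrative alternative to the CUDPP segmented scan actually used).
    // Assumes the image side N is a multiple of w and a 256-thread launch.
    __global__ void blockMeans(const float *d_image, float *d_means, int N, int w)
    {
        __shared__ float s[256];                 // one partial sum per thread
        int blocksPerRow = N / w;
        int bx = blockIdx.x % blocksPerRow;      // block column
        int by = blockIdx.x / blocksPerRow;      // block row

        // Each thread accumulates a strided subset of the block's pixels.
        float sum = 0.0f;
        for (int p = threadIdx.x; p < w * w; p += blockDim.x) {
            int x = bx * w + (p % w);
            int y = by * w + (p / w);
            sum += d_image[y * N + x];
        }
        s[threadIdx.x] = sum;
        __syncthreads();

        // Tree reduction over the per-thread partial sums.
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (threadIdx.x < stride)
                s[threadIdx.x] += s[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0)
            d_means[blockIdx.x] = s[0] / (float)(w * w);
    }

    // Launch sketch:
    // blockMeans<<<(N / w) * (N / w), 256>>>(d_filteredImage, d_features, N, w);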
Graphs

[Graphs supporting the performance claims appear here.]
SPEED UP

Precision / Accuracy

The values are found to be accurate up to 4 decimal places when compared to the results of Torralba's Matlab code.

Related Work and References

Aude Oliva, Antonio Torralba. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision, Vol. 42(3): 145-175, 2001. http://cvcl.mit.edu/papers/ijcv01-oliva-torralba.pdf

CPU implementation: http://people.csail.mit.edu/torralba/code/spatialenvelope/gist.zip

Segmented scan (CUDPP): http://www.gpgpu.org/static/developer/cudpp/rel/cudpp_1.0a/html/grouppublic_interface.html