A Comparison and Matching Point Extraction of SIFT and ISIFT

A. Swapna, M.Tech Scholar, PVPSIT, Vijayawada (bswapna.naveen@gmail.com)
A. Geetha Devi, Associate Professor, PVPSIT, Vijayawada (geetha.agd@gmail.com)

Abstract

This paper presents a performance comparison of two feature matching algorithms: SIFT (scale invariant feature transform) and ISIFT (iterative scale invariant feature transform). In SIFT, invariant feature key points are extracted from images to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation. In ISIFT, the relative view and illumination between the images are estimated iteratively for accurate matching, so the matching performance is not affected by view and illumination changes within given threshold ranges. The threshold values reflect the maximum allowable change in view and illumination between the images. A comparison of the SIFT and ISIFT matching algorithms shows that ISIFT achieves a higher matching rate and is invariant to illumination changes.

Keywords: image matching, view threshold, illumination threshold, ISIFT (iterative SIFT).

1. Introduction

Image matching is the comparison of images in order to obtain a measure of their similarity. It is a fundamental aspect of many problems in computer vision, including object and scene recognition, solving for 3D structure from multiple images [1], stereo correspondence, and motion tracking. This paper describes image features that have many properties which make them suitable for matching different images of an object or scene. They are well localized in both the spatial and frequency domains. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition. The major stages of computation [2] used to generate the set of image features are:

Scale-space extrema detection: the first stage searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation.

Key point localization: scale-space extrema detection produces too many key point candidates. At this stage, scale and location are assigned to each candidate based on its stability, which is measured by its contrast level.

Orientation assignment: one or more orientations are assigned to each key point location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location of each feature, thereby providing invariance to these transformations.

Key point descriptor: the local image gradients are measured at the selected scale in the region around each key point. These are transformed into a representation that allows for significant levels of local shape distortion. Key point descriptors are used to match the features between the images.

This approach has been named the Scale Invariant Feature Transform (SIFT), as it transforms image data into scale-invariant coordinates relative to local features. An important aspect of this approach is that it generates large numbers of features that densely cover the image over the full range of scales and locations.
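
As a point of reference, the four stages above are available as a single operator in OpenCV. The following minimal sketch (assuming opencv-python 4.4 or later, where SIFT_create is part of the main module, and a hypothetical input file scene.jpg) extracts key points and their 128-dimensional descriptors from a grayscale image:

import cv2

# Load the image in grayscale; SIFT as described here operates on intensity.
img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# OpenCV's SIFT runs all four stages listed above: scale-space extrema
# detection, key point localization, orientation assignment, and
# descriptor computation.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each keypoint carries a location, scale, and orientation; each row of
# `descriptors` is one 128-dimensional SIFT descriptor.
print(len(keypoints), descriptors.shape)
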
The main contribution of this paper is to address the insufficient matching performance of SIFT and its variants. A key disadvantage of SIFT is that its computational complexity increases with the number of key points, especially at the matching step, due to the high dimensionality of the SIFT feature descriptor. SIFT also handles colour images poorly, having been designed mainly for grayscale images, and it is not invariant to illumination changes. These drawbacks are overcome by a modification of SIFT: ISIFT. Lowe [2] not only developed SIFT but also discussed key point matching, which requires finding the nearest neighbour. He gave an effective measure for choosing the neighbour, obtained by comparing the distance of the closest neighbour to that of the second-closest neighbour.

K. Mikolajczyk and C. Schmid [4] compared the performance of many local descriptors using recall and precision as the evaluation criteria, with experiments covering scale changes, rotation, blur, compression, and illumination changes. In [5] they showed how to compute the repeatability measurement of affine region detectors, and in [6] the image was characterized by a set of scale invariant points. Some research has focused on applications of these algorithms, such as automatic image mosaicing based on SIFT [7][8], stitching applications of SIFT [9][10][11], and traffic sign recognition based on SIFT [10]. Y. Ke [12] gave some comparisons of SIFT and PCA-SIFT. In this paper, sections 2 and 3 explain the SIFT algorithm and SIFT feature matching, and section 4 discusses the ISIFT feature matching algorithm. The comparison of SIFT and ISIFT and the experimental results are given in sections 5 and 6.

2. SIFT ALGORITHM

The scale invariant feature transform (SIFT) algorithm, developed by Lowe [2], generates image features which are invariant to image translation, scaling, and rotation, and partially invariant to illumination changes and affine projection. The block diagram of the SIFT algorithm is shown in figure 1. Calculation of SIFT image features is performed through four consecutive steps, briefly described in the following.

2.1 Scale-space local extrema detection - the feature locations are determined as the local extrema of a Difference of Gaussians (DoG) pyramid. To build the DoG pyramid, the input image is convolved iteratively with a Gaussian kernel of width σ. The last convolved image is down-sampled in each image direction by a factor, and the convolving process is repeated. This procedure is repeated as long as down-sampling is possible. Each collection of images of the same size is called an octave. All octaves together build the so-called Gaussian pyramid, which is represented by a 3D function L(x, y, σ). The DoG pyramid D(x, y, σ) is computed from the difference of each two nearby images in the Gaussian pyramid. The local extrema (maxima or minima) of the DoG function are detected by comparing each pixel with its 26 neighbours in scale-space: 18 neighbours in the scales above and below (9 in each), and 8 neighbours in the same scale. The detected local extrema are good candidates for key points. In the method presented in this paper, the search for extrema is performed over the whole octave, including the first and the last scales.

2.2 Key point localization - the detected local extrema need to be exactly localized by fitting a 3D quadratic function to the scale-space local sample point. The quadratic function is computed using a second order Taylor expansion with its origin at the sample point:

D(x) = D + (∂D/∂x)^T x + (1/2) x^T (∂²D/∂x²) x    (1)

The extreme points of this equation are found by differentiating and equating to zero, which yields sub-pixel key point locations. These sub-pixel values increase the matching probability and the stability of the algorithm.

Figure 1: block diagram of the SIFT algorithm (input image → key point detection → key point localization → orientation assignment → key point descriptor → key points).
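
To make step 2.1 concrete, the following sketch (Python with OpenCV and NumPy; the constants σ = 1.6 and k = √2 follow common SIFT practice, and the down-sampling between octaves and the Taylor refinement of equation (1) are omitted) builds one octave of the DoG pyramid and tests whether a pixel is a 26-neighbour extremum:

import cv2
import numpy as np

def dog_octave(gray, num_scales=5, sigma=1.6, k=2 ** 0.5):
    # One octave of the DoG pyramid (section 2.1): blur the image with
    # increasing sigma and subtract each pair of adjacent Gaussian images.
    gaussians = [cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma * k ** i)
                 for i in range(num_scales)]
    return [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]

def is_extremum(dog, s, y, x):
    # A candidate key point must be larger (or smaller) than all 26
    # neighbours: 8 in its own scale and 9 in each adjacent scale.
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dog[s - 1:s + 2]])
    centre = dog[s][y, x]
    return centre == cube.max() or centre == cube.min()
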
2.3 Orientation assignment - once the SIFT feature location is determined, a main orientation is assigned to each feature based on local image gradients. The gradient magnitudes are weighted by a Gaussian window whose size depends on the feature octave. The weighted gradient magnitudes are used to build an orientation histogram with 36 bins covering the 360 degree range of orientations. The highest histogram peak, and any peaks with amplitudes greater than 80% of the highest peak, are each used to create a key point with that orientation. There can therefore be multiple key points created at the same location but with different orientations.
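
The peak-selection rule of section 2.3 is simple to state in code. A minimal sketch (NumPy; mag and ang are assumed to be precomputed gradient magnitudes, already multiplied by the Gaussian window, and gradient angles in degrees for the patch around a key point):

import numpy as np

def dominant_orientations(mag, ang, num_bins=36, peak_ratio=0.8):
    # 36-bin histogram over 0-360 degrees, weighted by gradient magnitude.
    hist, edges = np.histogram(ang, bins=num_bins, range=(0, 360), weights=mag)
    # Keep the highest peak and every peak above 80% of it; each one
    # becomes a separate key point orientation at this location.
    centres = (edges[:-1] + edges[1:]) / 2
    return centres[hist >= peak_ratio * hist.max()]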

2.4 Key point descriptor - the region around a key point is divided into 4×4 boxes. The gradient magnitudes and orientations within each box are computed and weighted by an appropriate Gaussian window, and the coordinates of each pixel and its gradient orientation are rotated relative to the key point orientation. Then, for each box, an 8-bin orientation histogram is established. From the 16 resulting orientation histograms, a 128-dimensional vector (the SIFT descriptor) is built. This descriptor is orientation invariant because it is calculated relative to the main orientation. Finally, to achieve invariance against changes in illumination, the descriptor is normalized to unit length.

3. SIFT feature matching

From the algorithm description given in section 2 it is evident that, in general, the SIFT algorithm can be understood as a local image operator which takes an input image and transforms it into a collection of local features. To use the SIFT operator for object recognition, it is applied to two object images: a model image and a test image. The model image shows the object alone, taken under controlled conditions, while the test image shows the object together with its environment. To find corresponding features between the two images, different feature matching approaches can be used. In the nearest neighbourhood procedure, for each feature in the model image feature set the corresponding feature is looked for in the test image feature set; the corresponding feature is the one with the smallest Euclidean distance. To determine whether a pair of corresponding features is a positive or a negative match, a threshold can be used. Because the projection of the target object changes from scene to scene, a global threshold on the distance to the nearest feature is not useful. Lowe [2] proposed using the ratio between the Euclidean distances to the nearest and the second nearest neighbours as a threshold. Under the condition that the object does not contain repeating patterns, one suitable match is expected, and the Euclidean distance to the nearest neighbour is then significantly smaller than the Euclidean distance to the second nearest neighbour. If no match is correct, all distances have a similar, small difference from each other. A match is selected as positive only if the distance to the nearest neighbour is significantly smaller than the distance to the second nearest one. Among positive and negative matches, correct as well as false matches can be found. Lowe [2] claims that this threshold value maximizes the number of correct matches labelled positive and minimizes the number of false matches labelled negative. The total number of correct positive matches must be large enough to provide reliable feature matching. In the following, the feature matching robustness of the SIFT algorithm with respect to the number of correct matches is presented.

Figure 2: input image 1
Figure 3: input image 2
Figure 4: matching features between the images using SIFT.
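
Lowe's ratio test maps directly onto a k-nearest-neighbour match. A short sketch (OpenCV; model.jpg and test.jpg are hypothetical file names, and 0.8 is the commonly used ratio threshold associated with [2]):

import cv2

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(cv2.imread("model.jpg", cv2.IMREAD_GRAYSCALE), None)
kp2, des2 = sift.detectAndCompute(cv2.imread("test.jpg", cv2.IMREAD_GRAYSCALE), None)

# For each model descriptor, find its two nearest test descriptors
# by Euclidean (L2) distance.
matcher = cv2.BFMatcher(cv2.NORM_L2)
pairs = matcher.knnMatch(des1, des2, k=2)

# Ratio test: accept a match only when the nearest neighbour is
# significantly closer than the second nearest one.
good = [m for m, n in pairs if m.distance < 0.8 * n.distance]
print(len(good), "positive matches")
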
4. ISIFT feature matching

When large variations in view and illumination occur, the matching performance of SIFT is unstable and inaccurate. In the ISIFT algorithm, the view and illumination between the images are estimated iteratively; the estimated relative view and illumination transform one image towards the other within the allowable view and illumination threshold values. The process of finding key points continues iteratively without sequentially going through the whole scale space. In the iterative scale invariant feature transform, the scale space is normalized, and the first step is a random search of the scale space for feature key points. The key point descriptors are formed using a neighbourhood array of key points of equal scale and, to achieve invariance against changes in illumination, each descriptor is normalized to unit length. The features between the images are matched using the ISIFT algorithm, and the view and illumination are transferred between the images within the allowable threshold ranges: the view threshold and the illumination threshold. The ISIFT matching process is explained below.

Consider two images of the same scene as two points in the scale space P of the object (scene). Let G be the original appearance of an object, and G_v = E(F(G)) be the real appearance of the object shown in an image, where F is the illumination transformation and E is the view transformation of the image. We can then define the scale space of a given image G as P = {E, F}; a pair of transformations (E, F) is a point in this scale space. Thus the observed image corresponds to a point in the scale space spanned by the object, and the purpose of image matching is to find the transformation H between two such points; this transformation gives the coordinate differences between them. Here E is the homography transformation matrix, and F is the histogram matching function that transforms the histogram of one image to a specified one.

Consider the reference image and the test image to be matched, G_r and G_t, as shown in figure 5. Suppose that the true view transformation matrix from G_t to G_r is E_1 and the illumination change function is F_1. The relationship between G_r and G_t is

G_r(x) = H_1(G_t) = F_1(E_1(G_t)) = F_1(G_t(E_1 x))    (2)

where H_1 is the true transformation between G_t and G_r and x denotes homogeneous coordinates. If approximate estimates E and F of the view and illumination changes exist, G_t can be transformed into an estimated image G_e:

G_e(x) = H(G_t) = F(G_t(E x))    (3)

Even if H is only a rough estimate of the transformation between G_t and G_r, the estimated image G_e is more similar to G_r than G_t is, so matching G_e against G_r is easier. This motivates the iterative object matching process:

G_1(x) = H_1(G_0) = F_1(G_0(E_1 x)),  with G_0 = G_t
G_i(x) = H_i(G_{i-1}) = F_i(G_{i-1}(E_i x)),  for i > 1    (4)

4.1 ISIFT Algorithm

Step 1: initially, assume the estimated transformation H_0 = {E_0, F_0} = {D, 1}; set H = H_0 and choose the thresholds σ_E, σ_L;
Step 2: start the iteration: i = i + 1;
Step 3: iteratively estimate the transformation H_i = {E_i, F_i};
Step 4: accumulate H = H_i · H and E = E_i · E;
Step 5: transform G_{i-1} to G_i by (4);
until ||E_i − D|| < σ_E and |F_i − 1| < σ_L, where D is the unit matrix, E_i and F_i are the view and illumination factors, and σ_E and σ_L are the view and illumination threshold values. Return H, E.

Figure 5: block diagram of the iterative image matching algorithm (low matching between reference image G_r and target image G_t; high matching between G_r and the estimated image G_e obtained through E, F).

General image matching methods based on local features focus only on the first parameter, E, since the concern is spatial correspondence. Matching performance improves when the images are closer in the parameter space, which is the case when their illuminations are similar. One of the advantages of the proposed method is that it also estimates the illumination change, which makes matching much better when the illumination has changed [14].
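
The text does not prescribe a particular implementation of E_i and F_i; the sketch below is one plausible reading of the loop in section 4.1, under stated assumptions: E_i is estimated with OpenCV as a RANSAC homography from ratio-test SIFT matches, F_i as a grayscale histogram-matching lookup, and for simplicity only the view factor E_i is tested for convergence.

import cv2
import numpy as np

def match_histogram(src, ref):
    # Illumination factor F: map the grayscale histogram of src onto ref.
    s_vals, s_counts = np.unique(src, return_counts=True)
    r_vals, r_counts = np.unique(ref, return_counts=True)
    lut = np.interp(np.cumsum(s_counts) / src.size,
                    np.cumsum(r_counts) / ref.size, r_vals)
    return np.interp(src.ravel(), s_vals, lut).reshape(src.shape).astype(np.uint8)

def isift_match(g_ref, g_test, sigma_e=0.02, max_iter=5):
    # Iterative loop of section 4.1: estimate E_i (view) and F_i
    # (illumination), warp the test image closer to the reference, and
    # stop once E_i is near the identity matrix D.
    sift = cv2.SIFT_create()
    E = np.eye(3)
    g_i = g_test
    for _ in range(max_iter):
        kp_r, d_r = sift.detectAndCompute(g_ref, None)
        kp_i, d_i = sift.detectAndCompute(g_i, None)
        pairs = cv2.BFMatcher(cv2.NORM_L2).knnMatch(d_i, d_r, k=2)
        good = [m for m, n in pairs if m.distance < 0.8 * n.distance]
        if len(good) < 4:          # a homography needs at least 4 matches
            break
        src = np.float32([kp_i[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp_r[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        E_i, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        if E_i is None:
            break
        E = E_i @ E                                    # accumulate E = E_i * E
        h, w = g_ref.shape
        g_i = cv2.warpPerspective(g_i, E_i, (w, h))    # apply view factor E_i
        g_i = match_histogram(g_i, g_ref)              # apply illumination F_i
        if np.linalg.norm(E_i - np.eye(3)) < sigma_e:  # ||E_i - D|| < sigma_E
            break
    return E, g_i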

5. Comparison of ISIFT and SIFT

The SIFT approach finds a constant number of key points and requires a relatively high and constant computation time, especially at the matching step, due to the high dimensionality of the SIFT feature descriptor. It is important to note that iterative SIFT finds a good number of features in less time than SIFT would take. Table 1 compares the matching points of SIFT and ISIFT.

TABLE 1: MATCHING POINTS COMPARISON OF SIFT AND ISIFT

Data set no    SIFT    ISIFT
1              85      102
2              96      137
3              65      84

TABLE 2: COMPARISON OF SIFT AND ISIFT

Property                        SIFT    ISIFT
Simulation to reference image   YES     NO
Simulation to test image        YES     YES
Illumination simulation         NO      YES
Affine invariance               Full    Partial
Computational cost              High    Low
Real-time                       NO      YES

Table 2 summarizes the properties of the SIFT and ISIFT algorithms. A major characteristic of SIFT key points is that the luminance of the image does not enter their definition. This is the main disadvantage, because a slight variation in luminance will produce key point candidates similar to those produced by large variations. ISIFT is invariant to illumination change. SIFT is fully invariant to affine transformations, while ISIFT is partially invariant to affine change. The computational cost of SIFT is high compared to ISIFT, and real-time application is possible with ISIFT but not with SIFT.

6. ISIFT and SIFT results evaluation

Figures 6, 8, and 10 show ISIFT feature matching for images containing 3D objects with extensive background clutter, so that detection of the objects may not be immediate even for human vision. The image on the right shows the final correct identification superimposed on a reduced-contrast version of the image. Table 1 summarizes the matching point comparison of the SIFT and ISIFT algorithms: across all results, ISIFT provides more matching points between the images, and therefore higher matching accuracy, than the SIFT algorithm.

Figure 6: matching results of data set no 1 using ISIFT with pose and illumination.

Figure 7: matching results of data set no 1 using SIFT with pose.

Figure 8: matching results of data set no 2 using ISIFT with pose and illumination.

Figure 9: matching results of data set no 2 using SIFT with pose.

Figure 10: matching results of data set no 3 using ISIFT with pose and illumination.

Figure 11: matching results of data set no 3 using SIFT with pose.

The key points used for detection are shown as squares with an extra line to indicate orientation. The sizes of the squares correspond to the image regions used to construct the descriptor. An outer parallelogram is also drawn around each instance of recognition, with its sides corresponding to the boundaries of the training images projected under the final affine transformation determined during recognition. Figures 7, 9, and 11 show the results of the SIFT matching algorithm under view changes. Another potential application of the approach is place recognition, in which a mobile device or vehicle could identify its location by recognizing familiar locations.

7. Conclusion

This paper has evaluated two feature matching algorithms for matching between images. SIFT is slow and performs poorly under illumination changes, although it is invariant to rotation, scale changes, and affine transformations. ISIFT, an improved version of SIFT, overcomes these limitations of conventional SIFT. Comparing the results of both matching algorithms, the ISIFT-based matching algorithm not only increases the number of correct matches but also improves the matching accuracy. In all of the above scenarios, the ISIFT algorithm outperforms the SIFT algorithm.

References

[1] David G. Lowe, "Local feature view clustering for 3D object recognition," IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, December 2001, pp. 682-688.
[2] David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, 2004, pp. 91-110.
[3] David G. Lowe, "Object recognition from local scale-invariant features," International Conference on Computer Vision, Corfu, Greece, September 1999, pp. 1150-1157.
[4] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, October 2005, pp. 1615-1630.
[5] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, "A comparison of affine region detectors," International Journal of Computer Vision, vol. 65, no. 1/2, 2005, pp. 43-72.
[6] K. Mikolajczyk and C. Schmid, "Indexing based on scale invariant interest points," Proc. Eighth Int'l Conf. Computer Vision, 2001, pp. 525-531.
[7] Yang Zhan-long and Guo Bao-long, "Image mosaic based on SIFT," International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2008, pp. 1422-1425.
[8] A. S. Salgian, "Using multiple patches for 3D object recognition," Computer Vision and Pattern Recognition (CVPR '07), June 2007, pp. 1-6.
[9] M. Brown and D. Lowe, "Recognizing panoramas," Proc. Ninth Int'l Conf. Computer Vision, 2003, pp. 1218-1227.
[10] Y. Heo, K. Lee, and S. Lee, "Illumination and camera invariant stereo matching," CVPR, 2008, pp. 1-8.
[11] Cheng-Yuan Tang, Yi-Leh Wu, Maw-Kae Hor, and Wen-Hung Wang, "Modified SIFT descriptor for image matching under interference," International Conference on Machine Learning and Cybernetics, vol. 6, July 2008, pp. 3294-3300.
[12] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," Proc. Conf. Computer Vision and Pattern Recognition, 2004, pp. 511-517.
[13] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features," 9th European Conference on Computer Vision, 2006.
[14] Yinan Yu, Kaiqi Huang, and Wei Chen, "A novel algorithm for view and illumination invariant image matching," IEEE Transactions on Image Processing, vol. 21, no. 1, January 2012.
