An Effective Image Retrieval Mechanism Using Family-based Spatial Consistency Filtration with Object Region



International Journal of Automation and Computing 7(1), February 2010, DOI: /s

Jing Sun, Ying-Jie Xing
School of Mechanical Engineering, Dalian University of Technology, Dalian, PRC

Abstract: Constructing an appropriate spatial consistency measurement is the key to improving image retrieval performance. To address this problem, this paper introduces a novel image retrieval mechanism based on family filtration in the object region. First, we supply an object region by selecting a rectangle in a query image, so that the system returns a ranked list of images that contain the same object, retrieved from a corpus of 100 images, as the result of the first rank. To further improve retrieval performance, we add an efficient spatial consistency stage, named family-based spatial consistency filtration, to re-rank the results returned by the first rank. We evaluate the performance of the retrieval system through experiments on a dataset selected from the key frames of TREC Video Retrieval Evaluation 2005 (TRECVID2005). The experimental results show that the proposed retrieval mechanism substantially improves retrieval quality. The paper also verifies the stability of the retrieval mechanism by increasing the number of images from 100 to 2000, and realizes generalized retrieval with the object outside the dataset.

Keywords: Content-based image retrieval, object region, family-based spatial consistency filtration, local affine invariant feature, spatial relationship.

1 Introduction

Research in content-based image retrieval (CBIR) is converging towards building an efficient retrieval mechanism, including the detection of the feature, the selection of the query region, the construction of the spatial consistency, etc.
The objective of this paper is to retrieve the subset of images that contain a query target in the object region. The retrieval is based on local invariant features and uses the proposed mechanism, which considers the spatial relationship of the feature regions in the matching stage. The traditional, inefficient solutions of CBIR consist of the following steps: 1) feature detection by a detector, 2) feature description by a high-dimensional descriptor, 3) clustering of the descriptors, and 4) returning a ranked list of the image set by a ranking function. How to detect and measure the similarities among different objects becomes a core issue of CBIR.

Recent work in this domain can be divided into two categories: one is based on image segmentation, such as Blobworld and SIMPLIcity [1]; the other is the bag of words (BoW) method [2-6], which has become more and more attractive. In the BoW method, feature regions are detected in the images, and each region is described by a descriptor. Then, the descriptors are clustered into a visual vocabulary, and each region is mapped to its closest cluster; each cluster has a cluster centre. Finally, an image is represented as a bag of visual words together with their frequencies of occurrence.

Manuscript received March 3, 2009; revised May 25, 2009
This work was supported by National High Technology Research and Development Program of China (863 Program) (No. 2007AA01Z416) and National Natural Science Foundation of China (No ), Beijing New Star Project on Science and Technology (No. 2007B071), Natural Science Foundation of Liaoning Province of China (No )

The methods mentioned above in fact mimic a text retrieval system using the analogy of visual words, and amount to an initial filtering idea. This is very computationally expensive, although the initial filtering can greatly reduce the number of documents to be considered.
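The BoW representation described above can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `bag_of_words`, the toy two-dimensional descriptors, and the use of squared Euclidean distance for the nearest-centre assignment are all assumptions made for the example.

```python
import numpy as np

def bag_of_words(descriptors, centres):
    """Map each descriptor to its closest cluster centre and count occurrences.

    Assumption: closeness is measured by squared Euclidean distance.
    """
    # Pairwise differences, shape (num_descriptors, num_centres, dim).
    diff = descriptors[:, None, :] - centres[None, :, :]
    dist2 = np.sum(diff * diff, axis=2)
    # Index of the nearest centre (the "visual word") for every descriptor.
    words = np.argmin(dist2, axis=1)
    # Frequency of occurrence of every visual word in the image.
    return np.bincount(words, minlength=len(centres))

# Toy data: three cluster centres and four descriptors (hypothetical values).
centres = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
descriptors = np.array([[1.0, 0.5], [9.0, 9.5], [0.2, 0.1], [0.5, 9.0]])
histogram = bag_of_words(descriptors, centres)
print(histogram.tolist())  # [2, 1, 1]
```

The resulting histogram is the "bag" for one image; real descriptors would be high-dimensional, and the vocabulary would come from clustering the whole dataset.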
Typically, some existing image retrieval methods [2, 7] do not use the spatial structure of the BoW in the initial filtering stage. The biggest difference between image retrieval and text retrieval is that the visual words in the former carry more spatial structure than the words in the latter. For example, someone who issues a three-word text query, such as ABC, may in general search for documents containing those three words in any order, such as ACB, BAC, or CBA, and at any positions in the document. A visual query, however, is selected from a query image, so it includes visual words in a spatial configuration corresponding to some view of the object in this image; it is therefore reasonable to make use of this spatial information when searching the corpus for different views of the same object.

This paper constructs a retrieval mechanism consisting of several components, including the extraction of the local affine invariant feature regions, the selection of the query region based on the similarity score, and the spatial consistency measurement based on the family.

The rest of the paper is organized as follows. Section 2 describes the detection of the local affine invariant feature. Section 3 constructs an image retrieval mechanism, including the clustering of the scale invariant feature transform (SIFT) descriptors, the object region selected by the rectangle, and the family-based spatial consistency filtration. Section 4 presents experiments that verify the proposed retrieval mechanism. An image set is set up in Section 4.1; the evaluation standard of the retrieval results is given in Section 4.2; in Section 4.3, four schemes are designed to demonstrate the correctness of the proposed retrieval mechanism through retrieval examples; the stability with an increasing number of images is verified in Section 4.4; generalized retrieval with the object outside the dataset is realized in Section 4.5; the retrieval time is analyzed in Section 4.6. Section 5 summarizes the contributions of our study and highlights our future research directions.

2 Local affine invariant feature detection

CBIR is based neither on text nor on manual annotation, but on the recognition of invariant features in the image [2, 7-10]. Since Baumberg [11] proposed his innovation in image feature detection, many detection algorithms have emerged. The Harris-Affine detector [12] has attracted great attention because of its good performance. This paper uses this detector to extract the low-level local affine invariant features for the succeeding retrieval process.

2.1 Affine normalization theory based on shape-adapted

The scale-adapted second-moment matrix is often used for describing the gradient distribution in a local image neighborhood; it is also independent of the image resolution [12, 13]:

$$\mu(x, \sigma_I, \sigma_D) = \sigma_D^2 \, g(\sigma_I) * \begin{pmatrix} L_x^2(x, \sigma_D) & L_x L_y(x, \sigma_D) \\ L_x L_y(x, \sigma_D) & L_y^2(x, \sigma_D) \end{pmatrix}. \tag{1}$$

The scale-adapted Harris measurement consists of the trace and determinant of the second-moment matrix, and the local maximum of the Harris measurement determines the spatial location of the initial point:

$$\text{cornerness} = \det(\mu(x, \sigma_I, \sigma_D)) - \alpha \, \mathrm{tr}^2(\mu(x, \sigma_I, \sigma_D)). \tag{2}$$

The affine Gaussian scale space is constructed from a series of images obtained by convolving the image with non-uniform elliptical Gaussian functions:

$$L(X; \Sigma) = g(\Sigma) * I(X) = \frac{1}{2\pi\sqrt{\det \Sigma}} \, e^{-\frac{X^{T} \Sigma^{-1} X}{2}} * I(X). \tag{3}$$

The second-moment matrix $\mu_L$, based on the affine Gaussian scale space, is defined by

$$\mu_L(\cdot\,; \Sigma_I, \Sigma_D) = g(\cdot\,; \Sigma_D) * \left( (\nabla L)(\cdot\,; \Sigma_I) \, (\nabla L)(\cdot\,; \Sigma_I)^{T} \right). \tag{4}$$

Some variable substitutions are made for convenience of expression:

$$\mu_L(q_L; \Sigma_{I,L}, \Sigma_{D,L}) = M_L, \qquad \mu_R(q_R; \Sigma_{I,R}, \Sigma_{D,R}) = M_R. \tag{5}$$

Lindeberg [13] has proved that the relationship between two points related by an affine transformation is a pure rotation after the neighborhoods of the two points are affine normalized (see Fig. 1). This is the theoretical basis of the extraction of the local affine invariant feature.

Fig. 1 Affine normalization based on the shape-adapted matrices. (a) to (b): $q_L \to M_L^{1/2} q_L'$; (c) to (d): $q_R \to M_R^{1/2} q_R'$; (b) to (d): $q_L' \to R q_R'$

To modify the shape of the initial point, we make the local anisotropic region isotropic with the shape-adapted matrix [12]:

$$U = \prod_k \left(\mu^{-\frac{1}{2}}\right)^{(k)} U^{(0)}. \tag{6}$$

2.2 Synchronous iterative of the location, scale, and shape

For a given initial point $X^{(0)}$, the concrete iterative procedure is as follows:

Step 1. Initialize the shape-adapted matrix $U^{(0)}$ to the unit matrix $E$.

Step 2. Normalize the elliptical feature region to a circular feature region by the shape-adapted matrix $U^{(k-1)}$; the centre of the normalization window $W(X_w) = I(X)$ is located at $X_w^{(k-1)}$:
$$X_w^{(k-1)} = \left(U^{(k-1)}\right)^{-1} X^{(k-1)}.$$

Step 3. When the normalized Laplacian of Gaussian (LoG) function reaches its maximum, fix the integration scale $\sigma_I^{(k)}$ at $X_w^{(k-1)}$:
$$|\mathrm{LoG}(X, \sigma_I)| = \sigma_I^2 \, |L_{xx}(X, \sigma_I) + L_{yy}(X, \sigma_I)|. \tag{7}$$

Step 4. Select the differentiation scale $\sigma_D^{(k)} = s \sigma_I^{(k)}$. Put $\sigma_D^{(k)}$, $\sigma_I^{(k)}$, and $X_w^{(k-1)}$ into $\mu(X_w^{(k-1)}, \sigma_I^{(k)}, \sigma_D^{(k)})$. The scale is chosen so that $\lambda_{\min}(\mu)/\lambda_{\max}(\mu)$ attains its maximum.

Step 5. Compute the extremum $X_w^{(k)}$ of the Harris measurement in the normalized window whose centre is the point $X_w^{(k-1)}$.

Step 6. Compute $\mu_i^{(k)} = \mu^{-\frac{1}{2}}(X_w^{(k)}, \sigma_I^{(k)}, \sigma_D^{(k)})$.

Step 7. Update the shape-adapted matrix $U^{(k)} = \mu_i^{(k)} U^{(k-1)}$ with $\mu_i^{(k)}$ obtained in Step 6, and then normalize the matrix $U^{(k)}$ to $\lambda_{\max}(U^{(k)}) = 1$.

Step 8. Compute $X^{(k)}$ in the image domain by the formula
$$X^{(k)} = X^{(k-1)} + U^{(k-1)} \left( X_w^{(k)} - X_w^{(k-1)} \right).$$

Step 9. Compute the convergence ratio:
$$1 - \frac{\lambda_{\min}(\mu)}{\lambda_{\max}(\mu)} < \varepsilon_C. \tag{8}$$
If $\varepsilon_C \le 0.05$ or $\mu$ is approximately equal to a rotation matrix, the algorithm converges; otherwise, go to Step 2.

For an initial point, the above process iterates automatically and converges to an affine invariant feature point by modifying the shape, scale, and spatial location of the initial point. We extracted feature points and their neighborhoods on the skylight of a car model from two different views. Figs. 2 (a) and (c) show the iterative processes of the corresponding feature points in the two views, and Figs. 2 (b) and (d) show the normalized regions of (a) and (c), respectively. After five iterations, the location, scale, and shape of the invariant point no longer change.
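As a small, hedged illustration of the Harris measurement in Eq. (2) and the convergence test in Eq. (8), the sketch below evaluates both on a given 2x2 second-moment matrix. It does not reproduce the full detector: the function names, the example matrices, and the value alpha = 0.04 are assumptions chosen for the example (the text does not state alpha), while eps_c = 0.05 follows the threshold mentioned in Step 9.

```python
import numpy as np

def cornerness(mu, alpha=0.04):
    """Harris measurement of Eq. (2): det(mu) - alpha * tr(mu)^2.

    alpha = 0.04 is an assumed, commonly used value.
    """
    return float(np.linalg.det(mu) - alpha * np.trace(mu) ** 2)

def has_converged(mu, eps_c=0.05):
    """Convergence test of Eq. (8): 1 - lambda_min / lambda_max < eps_c."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(mu)
    return bool(1.0 - eigenvalues[0] / eigenvalues[-1] < eps_c)

# Hypothetical second-moment matrices: nearly isotropic vs. clearly anisotropic.
isotropic = np.array([[2.0, 0.0], [0.0, 1.95]])
anisotropic = np.array([[2.0, 0.5], [0.5, 1.0]])

print(has_converged(isotropic))    # True  (1 - 1.95/2.0 = 0.025)
print(has_converged(anisotropic))  # False
```

A nearly isotropic matrix passes the test, which corresponds to the region having been normalized to an almost circular shape.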

Fig. 2 Iterative detection of the affine invariant point region. (a) The iterative processes of the corresponding feature points in one view; (b) The normalized regions of (a); (c) The iterative processes of the corresponding feature points in another view; (d) The normalized regions of (c)

3 Image retrieval mechanisms

This paper designs two distinctive strategies to improve the performance of image retrieval: the object region selected by a rectangle and the family-based spatial consistency filtration.

3.1 The clustering of SIFT descriptors

The elliptical feature regions extracted in Section 2.2 are computed at twice the original size of the detected regions in order to make the appearance of the regions more discriminating after they are described. To obtain the vector representation of the feature regions, we first affine-normalize the elliptical feature regions into image patches of the same size and then describe these patches. Fig. 3 (a) shows all elliptical feature regions in the image (left) and the affine-normalized patches of the elliptical feature regions (right); the arrows indicate the correspondence between the feature regions and the patches.

This paper describes the feature regions using the SIFT descriptor proposed by Lowe [14]. Because of its rotation invariance and good performance [15], the SIFT descriptor is used in many fields of computer vision [16, 17]. Fig. 3 (b) shows the SIFT descriptors computed on the normalized patches in Fig. 3 (a); the length and the direction of the black arrow are the magnitude and the direction, respectively, of the major gradient of the feature point neighborhood at the given scale.

Fig. 3 Normalization of elliptical feature regions into the same size patches and description of the patches by SIFT. (a) The elliptical feature regions and their normalized patches; (b) The SIFT descriptors described in the normalization patches in (a)

The SIFT descriptors are quantized by K-means clustering, called vector quantization, with the Mahalanobis distance as the distance function. The clustering process of the visual words is presented in Fig. 4. Fig. 5 shows two visual words, which are clustered from the SIFT descriptors of the feature regions detected by the method of Section 2.2 with the dataset in Section 4.1. The clustered patches reflect the properties of SIFT descriptors: the clustering is based on the spatial distribution of the image gradient orientation, not merely on the intensity across the region.

Fig. 4 The flowchart of clustering process

Fig. 5 Two visual words clustering in Harris-Affine feature regions

3.2 The object region by the rectangle

To improve the precision of the retrieval results, we propose an efficient retrieval method based on invariant features in the object region. In a query image, the query object lies in a query region, which is defined as the object region. The containment relation among the query image, the object region, and the query object is: query image ⊃ object region ⊃ query object. The flow of image retrieval based on the object region is as follows:

Step 1. Pre-processing (off-line)
Step 1.1. Extract the feature regions. Select the image dataset, and then extract the affine covariant feature regions in all images of the dataset.
Step 1.2. Describe the feature regions. Represent the invariant feature regions with SIFT descriptors.
Step 1.3. Cluster the descriptors. Use the K-means method to cluster the descriptors; the cluster centers are the visual words, and all the words together form the visual vocabulary.
Step 1.4. Vectorize the images. Compute the term frequency - inverse document frequency (TF-IDF) of the visual words, and then construct the vector space for the words of the whole dataset.

Step 2. Retrieval processing (on-line)
Step 2.1. Select the object region. The query region containing the object, named the object region, is selected manually with a rectangle in the query image, and then the visual words in the object region are computed.
Step 2.2. First retrieval. According to the similarity score between the object region of the query image and each image in the dataset, the retrieval results are first ranked based on the frequency of the visual words in the object region.
Step 2.3. Second retrieval. The first retrieval results are re-ranked by using the family-based spatial consistency filtration in the object region.

With the standard TF-IDF weighting scheme, which is the vector-space model of information retrieval [18], we count the word frequency coefficient $t_i$ in every image. The query image and each image in the dataset are represented as vectors of visual word occurrences, and the similarity score between the query vector and each image vector is then calculated using the Mahalanobis distance.

The query region is selected by the rectangle in the query image, and the words and their weighted frequencies are counted; thus, the subset $R_{query}$ of visual words in the query region is obtained. We compute the intersection between $R_{query}$ and $R_i$ ($i \in [1, N]$), the set of visual words in image $i$, and then accumulate the minimum weighted word frequency over the intersection. The similarity score $S_i$ between image $i$ in the dataset and the query image with the object region is

$$S_i = \sum_j \min(t_{j,query}, t_{j,i}), \quad j \in R_{query} \cap R_i \tag{9}$$

where $t_{j,query}$ is the weighted frequency of word $j$ in the query region, and $t_{j,i}$ is the weighted frequency of word $j$ in image $i$.

Fig. 6 shows a comparison between retrieval results based on the whole image and those based on the object region. Because we only want to show the influence of the object region on the retrieval results, spatial consistency is not used here. Fig. 6 (a) shows the query image, and Fig. 6 (b) shows another image having the same background; Fig. 6 (c) shows the retrieval results based on the whole image, and Fig. 6 (d) shows the retrieval results based on the object region in the rectangle. In Fig. 6 (c), the query image itself is retrieved first, but because the whole image is used, the image with the same background as the query image, Fig. 6 (b), is also retrieved with a high score, since it shares content with the query image. Fig. 6 (d) shows the retrieval results using the object region in the rectangle; they do not contain Fig. 6 (b), because Fig. 6 (b) does not share the content of the object region in Fig. 6 (a).

Fig. 6 Comparison of retrieval results between the whole image and the object region. (a) The query image; (b) Another image having the same background with (a); (c) The retrieval results based on the whole image; (d) The retrieval results based on the object region in the rectangle

3.3 Family-based spatial consistency filtration

The spatial angles of the correct matching pairs in two images are very similar. False matching pairs, in contrast, have two apparent characteristics: 1) their number is small; and 2) their spatial angles are not consistent with the spatial angles of the correct matching pairs. In other words, a false matching pair is isolated. Based on these characteristics of the matching pairs, we propose a filtration method to remove false matches: family-based spatial consistency filtration. The definition and the forming process of the family are introduced below.

Family: The set of matching pairs whose spatial angles are very similar.
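The family idea can be illustrated with a short sketch that groups similar spatial angles and discards isolated ones. This is a minimal sketch under stated assumptions rather than the authors' procedure: the function name `find_families`, the parameter names `angle_threshold` and `min_family_size`, the choice to keep a group when its size is at least the minimum, and the sample angles are all hypothetical.

```python
def find_families(angles, angle_threshold, min_family_size):
    """Group sorted spatial angles into families and drop the small groups.

    Assumption: consecutive sorted angles belong to the same family when
    their difference is no more than angle_threshold.
    """
    families = []
    current = []
    for angle in sorted(angles):
        # A large gap to the previous angle closes the current group.
        if current and angle - current[-1] > angle_threshold:
            if len(current) >= min_family_size:
                families.append(current)
            current = []
        current.append(angle)
    # Close the last group.
    if len(current) >= min_family_size:
        families.append(current)
    return families

# Hypothetical spatial angles (degrees) of eight matching pairs.
angles = [10.2, 10.9, 11.5, 12.0, 47.0, 80.1, 80.6, 81.0]
families = find_families(angles, angle_threshold=1.0, min_family_size=3)
print(families)  # [[10.2, 10.9, 11.5, 12.0], [80.1, 80.6, 81.0]]
```

In this toy input the pair with angle 47.0 is isolated, so it is treated as a false match and removed, while the two groups of mutually consistent angles survive.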
Families sets: All of the families together form the families sets, whose matching pairs are the ones used for image retrieval.

The forming process of the family is as follows:

Step 1. Sort the spatial angles of the different matching pairs: SpaceAngle{ }: {$S_{\min}, \ldots, S_i, \ldots, S_{\max}$}.
Step 2. Give the thresholds of the spatial angle and the family volume:
1) Th_SA: the spatial angle threshold. It describes the relation between the spatial angles that belong to the same family.
2) Th_FamilyNUM: the family volume threshold. It is the minimum number of matching pairs that compose a family.
Step 3. Find the families in the set SpaceAngle{ }.
Step 4. Record the number of correct matching pairs: MatchedFamily[ ].

Make a family:

Procedure MakeFamily()
    If |S_{i-1} - S_i| > Th_SA and |S_{j+1} - S_j| > Th_SA then
        FamilyNUM ← j - i
        If FamilyNUM > Th_FamilyNUM then
            MatchedFamily[n++] ← FamilyNUM
        End if
    End if

If the difference between the spatial angles of matching pairs is no more than Th_SA, the matches belong to the same family; if the number of matching pairs in one family is less than Th_FamilyNUM, the family is ignored as a false match. Once a family is found, the search continues through the sorted set: SpaceAngle{ } ← SpaceAngle{ } + 1.

Fig. 7 (a) shows the original matching result without the spatial consistency, which contains many false matching pairs; Fig. 7 (b) shows the matching result based on the spatial consistency. The false matching pairs are removed effectively by the family-based spatial consistency filtration.

Fig. 7 The function of the spatial consistency on matching. (a) The original matching result without the spatial consistency; (b) The matching result based on the spatial consistency

Fig. 8 shows the influence of the spatial consistency on the retrieval results. Fig. 8 (a) shows the query image, and Fig. 8 (b) shows the retrieval results without the spatial consistency; among the first ten images, only two are of the same sort as the query image. Fig. 8 (c) shows the retrieval results with the spatial consistency on the whole image, where seven of the foremost images belong to the same sort as the query image. This verifies that the proposed spatial consistency filtration can significantly improve the retrieval results.

Fig. 8 The influence of the spatial consistency on retrieval. (a) The query image; (b) The retrieval results without the spatial consistency; (c) The retrieval results with the spatial consistency on the whole image

4 Experiments

4.1 The image set

To verify the correctness of the proposed retrieval mechanism, we use 100 images belonging to 10 subjects for the retrieval experiments. Every subject has 10 images, which contain the same object under changes of rotation, scale, viewpoint, brightness, and partial occlusion. The size of each image is . Using a 3.2 GHz Pentium 4 PC with 2 GB of main memory, the retrieval time of each image is 0.16 s on average. Fig. 9 shows the number of invariant feature regions in the 100 images using the Harris-Affine detector. The total number of regions in the dataset is , and the number of feature regions detected on each image is approximately 120. The numbers of SIFT descriptors and cluster centers are  and 487, respectively (see Table 1). Table 1 also shows the data of 2000 images, as will be discussed in Section 4.4.

Fig. 9 The number of the feature regions in 100 images by the Harris-Affine detector

Table 1 The clustering parameters of the two image sets (columns: Images, Feature regions, Descriptors, Cluster centers)

4.2 The evaluation standard of the retrieval results

The paper takes the average retrieval accurate ratio (AAR) as the evaluation standard of the retrieval results. First, the ground truth is given. Taking the subject of man as an example, the subject of man is the ground truth for man 1 and man 10, respectively, as the query image. In other words, if we take man 1 as the query image, the correct retrieval result is that the foremost ten images are all the images of the subject of man. The dataset has already been classified. Let G be the ground truth, including image I. If the retrieval result outputs T similar images, of which n images belong to the ground truth G, then the retrieval accurate ratio (AR) is defined as

$$AR = \frac{n}{T}. \tag{10}$$
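The ratio in Eq. (10) can be sketched in a few lines. This is only an illustrative sketch: the function name `accuracy_ratio`, the image identifiers, and the ranked list are hypothetical and do not come from the experiments.

```python
def accuracy_ratio(ranked_ids, ground_truth, top_t=10):
    """Retrieval accurate ratio of Eq. (10): AR = n / T.

    T is the number of returned images that are inspected (at most top_t),
    and n is how many of them belong to the ground truth.
    """
    top = ranked_ids[:top_t]
    if not top:
        return 0.0
    hits = sum(1 for image_id in top if image_id in ground_truth)
    return hits / len(top)

# Hypothetical ranked output for the query "man1" and its ground-truth subject.
ranked = ["man1", "man4", "car2", "man7", "dog3",
          "man2", "man9", "car5", "man3", "man10"]
ground_truth = {f"man{i}" for i in range(1, 11)}
ar = accuracy_ratio(ranked, ground_truth, top_t=10)
print(ar)  # 0.7
```

Seven of the ten returned images belong to the subject of the query, so the ratio is 7/10.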

For an objective evaluation, we further use the AAR over several retrieval results to measure the performance of our method. Retrieving with each image of a subject as the query and recording the first ten returned images (T = 10), we compute the AAR for every subject:

$$AAR = \frac{1}{j} \sum_{i=1}^{j} AR_i \tag{11}$$

where j is the number of images that every subject contains (i.e., j = 10). Then, we average the AAR over the subjects to obtain a mean AAR (mAAR) score. The mAAR score is used to evaluate the overall performance of our retrieval mechanism:

$$mAAR = \frac{1}{n} \sum_{k=1}^{n} \left( \frac{1}{j} \sum_{i=1}^{j} AR_{k,i} \right) \tag{12}$$

where n is the number of subjects and $AR_{k,i}$ is the AR of the i-th query image of subject k. We select 10 sorts of images, so n = 10.

4.3 The retrieval examples

Four schemes are designed to verify the correctness of the retrieval mechanisms:

1) Based on the whole image: feature extraction, description, matching, and image retrieval are all performed on the whole image. In fact, this is the typical BoW, which simulates simple text-based retrieval using the analogy of visual words.
2) Based on the object region: feature extraction, description, matching, and image retrieval are all performed in the object region, which is selected by a rectangle.
3) Based on the whole image with the spatial consistency: on the basis of 1), the spatial consistency is used to constrain the retrieval results.
4) Based on the object region with the spatial consistency: on the basis of 2), the spatial consistency is used to further constrain the retrieval.

Fig. 10 shows the AAR of the ten sorts of images. It shows the effects of the object region and of the family-based spatial consistency on the AAR, respectively.

Fig. 10 Average retrieval accurate ratio of ten sorts of images

4.4 Verifying the stability by increasing the number of images

To verify the stability of the proposed retrieval mechanism, we increase the number of images from 100 to 2000 and compare the results on the 100-image and 2000-image datasets, in order to stress-test the retrieval performance when the volume of the dataset is enlarged. The new images, some of the same sorts as in the 100-image dataset and some of new sorts, are all chosen from the key frames of TREC Video Retrieval Evaluation 2005 (TRECVID2005). In doing so, we ensure the variety and the rationality of the sorts of the 2000 images.

Fig. 11 shows the retrieval results on the 2000-image dataset based on the object region with the family-based spatial consistency filtration. There are 25 images of the sort Basketball Game in the 2000-image dataset. Fig. 11 (a) shows the query image, and the region in the rectangle is the object region. Figs. 11 (b) and (c) show that 18 of the first 25 images are correct, so the AAR of this retrieval is 72%.

Fig. 11 Retrieval results based on the object region and the family-based spatial consistency filtration. (a) The query image; (b) & (c) The retrieval results

Fig. 12 shows a comparison of the retrieval results based on the object region and the family-based spatial consistency filtration between the 2000-image and 100-image datasets. As the number of images increases, the ground truth changes. Fig. 12 (a) shows the decrease in AAR from 100 to 2000 images with the object region and the spatial consistency, and Fig. 12 (b) shows the mAAR of the four retrieval mechanisms under 100 and 2000 images, respectively. From Fig. 12, we can see that when the number of images increases from 100 to 2000, the retrieval performance decays to some extent. However, the decay is modest relative to the increase in the number of images. Here, we introduce the decrease rate of AAR to evaluate the change of AAR when the capacity of the dataset increases. The decrease rate under each of the four retrieval mechanisms is the difference between the values for 100 and 2000 images; when the capacity of the dataset increases 20-fold, the decrease rate of mAAR under the four retrieval mechanisms is 9.2%.

Fig. 12 The comparison of the 100 and the 2000 images. (a) AAR; (b) mAAR

4.5 Generalized retrieval

In practice, images containing the object often come from outside the image dataset, such as images of product logos or particular buildings from the internet, unlike the objects mentioned above, which appear in some image belonging to the dataset. Therefore, we should check whether the object is in the dataset or not. Retrieval that searches for objects from another dataset is named generalized retrieval. Through generalized retrieval, we further demonstrate the performance of the feature regions and the retrieval mechanisms constructed in this paper.

To evaluate the performance of the retrieval mechanisms on objects outside the dataset, we first set up a ground truth consisting of the objects either in the dataset or in the image from the internet. From the 2000-image dataset, we select the sort NBC as the ground truth, and we also search for one image containing the NBC logo using the word NBC with the Baidu Search Engine (see Fig. 13 (a)). The ellipses in Fig. 13 represent the feature regions; because of the image quality, we extract only some regions of the image. Figs. 13 (b) and (c) show the retrieval results that belong to the ground truth. Figs. 13 (d), (e), and (f) show the close-ups of the NBC logo in Figs. 13 (a), (b), and (c), respectively. The results show the correspondence of the feature regions between the ground truth and the image from the internet.

Fig. 13 Retrieval with the object outside the dataset. (a) One image having logo NBC using the word NBC with Baidu Search Engine; (b) & (c) The retrieval results that belong to the ground truth; (d) The close-up of logo NBC in (a); (e) The close-up of logo NBC in (b); (f) The close-up of logo NBC in (c)

4.6 Analysis on the retrieval time

Finally, we analyze the time consumption of the whole retrieval process. For the 100-image dataset, the whole retrieval process is very quick because the capacity of the dataset is small. Table 2 shows the time at different stages for the 2000-image dataset. We perform a sampling test to obtain an objective retrieval time: 100 images that cover all sorts of the 2000-image dataset are selected at random from the 2000 images, and the average time of the sampling test is taken as the average retrieval time of the 2000-image dataset. We also use the average time for feature extraction, description, and clustering. For example, the total time of feature extraction and feature description is 18 min, so the average time of feature extraction and description for each image is 18 × 60/2000 = 0.54 s.

Table 2 The time consumption of the 2000 images

Stages                                                     Time (s)
Feature detection and description                          < 0.5
Clustering                                                 < 0.8
Retrieval mechanism: Whole image                           < 0.2
Retrieval mechanism: Object region                         < 0.3
Retrieval mechanism: Whole image and family-based          < 0.7
Retrieval mechanism: Object region and family-based        <

5 Conclusions

CBIR is one of the most important research topics in image retrieval. Recently, one major development in this area is the use of the spatial information of the visual words of the query images. Our purpose is to construct an efficient image retrieval mechanism.
In this work, we deal with the extraction and description of features, the selection of the object region, and the spatial consistency measurement in a combined way to improve the performance of CBIR. Our main contributions can be outlined as follows:

1) Based on the idea of traditional text retrieval, we first detect the Harris-Affine regions as low-level features, then describe the features with SIFT descriptors and cluster them into visual words.
2) According to the standard weighting and similarity rule for measuring invariant features, the object region is selected using a rectangle in the query image; the first sorting of the retrieved images is based on the similarity score between each image in the dataset and the query image.
3) We propose the family-based spatial consistency filtration for the second sorting by means of the spatial angles between the matching pairs in two images. We also give the definition and the forming process of the family.
4) To demonstrate the correctness of the proposed retrieval mechanism, we design four retrieval schemes. Finally, we verify the stability by increasing the volume of the dataset. We have realized generalized retrieval with the object outside the dataset and verified the correctness of our retrieval mechanism.

The family-based spatial consistency is not suitable for large scale transformations. This is a problem to be tackled in our future work. We can use a spatial consistency based on a searching unit for images with large scale changes and try to incorporate the two configurations of the spatial consistency. Moreover, stress-testing the retrieval performance as the dataset is scaled up will be a future research direction.

Acknowledgement

We thank the Institute of Computing Technology (ICT), Beijing, the Chinese Academy of Sciences for providing the test platform; we are also very grateful for suggestions from and discussions with Dr. Y. D. Zhang and Dr. K. Gao of ICT.

References

[1] C. Carson, M. Thomas, S. Belongie, J. M. Hellerstein, J. Malik. Blobworld: A system for region-based image indexing and retrieval. In Proceedings of the 3rd International Conference on Visual Information Systems, IEEE Computer Society, Amsterdam, Holland, vol. 2.
[2] J. Sivic, A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the 9th IEEE International Conference on Computer Vision, IEEE, Nice, France, vol. 2.
[3] J. Philbin, O. Chum, M. Isard, J. Sivic, A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, IEEE, Minneapolis, USA, pp. 1-8.
[4] K. Gao, S. X. Lin, J. B. Guo, D. M. Zhang, Y. D. Zhang, Y. F. Wu. Object retrieval based on spatially frequent items with informative patches. In Proceedings of IEEE International Conference on Multimedia and Expo, IEEE, Hannover, Germany.
[5] S. Lazebnik, C. Schmid, J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, New York, USA, vol. 2.
[6] Q. F. Zheng, W. Q. Wang, W. Gao. Effective and efficient object-based image retrieval using visual phrases. In Proceedings of the 14th Annual ACM International Conference on Multimedia, ACM, Santa Barbara, USA.
[7] D. Nistér, H. Stewénius. Scalable recognition with a vocabulary tree. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, New York, USA, vol. 2.
[8] C. Schmid, R. Mohr. Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5.
[9] T. Tuytelaars, L. Van Gool. Content-based image retrieval based on local affinely invariant regions. Lecture Notes in Computer Science, Springer, vol. 1614.
[10] F. Schaffalitzky, A. Zisserman. Multi-view matching for unordered image sets. In Proceedings of the 7th European Conference on Computer Vision, Lecture Notes in Computer Science, Springer, Copenhagen, Denmark, vol. 2350.
[11] A. Baumberg. Reliable feature matching across widely separated views. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Hilton Head Island, USA, vol. 1.
[12] K. Mikolajczyk, C. Schmid. Scale and affine invariant interest point detectors. International Journal of Computer Vision, vol. 60, no. 1.
[13] T. Lindeberg. Feature detection with automatic scale selection. International Journal of Computer Vision, vol. 30, no. 2.
[14] D. G. Lowe. Distinctive image features from scale invariant keypoints. International Journal of Computer Vision, vol. 60, no. 2.
[15] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, vol. 65, no. 1-2.
[16] K. Yamamoto, R. Oi. Color correction for multi-view video using energy minimization of view networks. International Journal of Automation and Computing, vol. 5, no. 3.
[17] S. F. Liu, C. McMahon, M. Darlington, S. Culley, P. Wild. EDCMS: A content management system for engineering documents. International Journal of Automation and Computing, vol. 4, no. 1.
[18] R. Baeza-Yates, B. Ribeiro-Neto. Modern Information Retrieval, ACM Press.

Jing Sun received the B. A. degree from Yanshan University, PRC and the M. A. degree from Dalian University of Technology, PRC in 1997 and 2002, respectively. Since 2005, she has been a Ph. D. candidate at Dalian University of Technology. She is currently a lecturer at the School of Mechanical Engineering of Dalian University of Technology. Her research interests include feature extraction, image matching, and object retrieval. (Corresponding author)

Ying-Jie Xing received the B. A. degree and the M. A. degree from Harbin Institute of Technology, PRC in 1983 and 1986, respectively. He received the Ph. D. degree from University of Yamanashi, Japan. He is currently an associate professor at the School of Mechanical Engineering of Dalian University of Technology, PRC. His research interests include feature extraction, image processing, and pattern recognition.

Video Google: A Text Retrieval Approach to Object Matching in Videos

Video Google: A Text Retrieval Approach to Object Matching in Videos Video Google: A Text Retrieval Approach to Object Matching in Videos Josef Sivic, Frederik Schaffalitzky, Andrew Zisserman Visual Geometry Group University of Oxford The vision Enable video, e.g. a feature

More information


SEARCH BY MOBILE IMAGE BASED ON VISUAL AND SPATIAL CONSISTENCY. Xianglong Liu, Yihua Lou, Adams Wei Yu, Bo Lang SEARCH BY MOBILE IMAGE BASED ON VISUAL AND SPATIAL CONSISTENCY Xianglong Liu, Yihua Lou, Adams Wei Yu, Bo Lang State Key Laboratory of Software Development Environment Beihang University, Beijing 100191,

More information

Large Scale Image Retrieval

Large Scale Image Retrieval Large Scale Image Retrieval Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University in Prague Features Affine invariant features Efficient descriptors Corresponding regions

More information

Computer Vision for HCI. Topics of This Lecture

Computer Vision for HCI. Topics of This Lecture Computer Vision for HCI Interest Points Topics of This Lecture Local Invariant Features Motivation Requirements, Invariances Keypoint Localization Features from Accelerated Segment Test (FAST) Harris Shi-Tomasi

More information

A Novel Extreme Point Selection Algorithm in SIFT

A Novel Extreme Point Selection Algorithm in SIFT A Novel Extreme Point Selection Algorithm in SIFT Ding Zuchun School of Electronic and Communication, South China University of Technolog Guangzhou, China Abstract. This paper proposes

More information

Motion Estimation and Optical Flow Tracking

Motion Estimation and Optical Flow Tracking Image Matching Image Retrieval Object Recognition Motion Estimation and Optical Flow Tracking Example: Mosiacing (Panorama) M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003 Example 3D Reconstruction

More information

Evaluation and comparison of interest points/regions

Evaluation and comparison of interest points/regions Introduction Evaluation and comparison of interest points/regions Quantitative evaluation of interest point/region detectors points / regions at the same relative location and area Repeatability rate :

More information

Bundling Features for Large Scale Partial-Duplicate Web Image Search

Bundling Features for Large Scale Partial-Duplicate Web Image Search Bundling Features for Large Scale Partial-Duplicate Web Image Search Zhong Wu, Qifa Ke, Michael Isard, and Jian Sun Microsoft Research Abstract In state-of-the-art image retrieval systems, an image is

More information


A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM INTRODUCTION A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM Karthik Krish Stuart Heinrich Wesley E. Snyder Halil Cakir Siamak Khorram North Carolina State University Raleigh, 27695

More information

Feature Based Registration - Image Alignment

Feature Based Registration - Image Alignment Feature Based Registration - Image Alignment Image Registration Image registration is the process of estimating an optimal transformation between two or more images. Many slides from Alexei Efros

More information

Patch Descriptors. EE/CSE 576 Linda Shapiro

Patch Descriptors. EE/CSE 576 Linda Shapiro Patch Descriptors EE/CSE 576 Linda Shapiro 1 How can we find corresponding points? How can we find correspondences? How do we describe an image patch? How do we describe an image patch? Patches with similar

More information


SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract

More information

Implementing the Scale Invariant Feature Transform(SIFT) Method

Implementing the Scale Invariant Feature Transform(SIFT) Method Implementing the Scale Invariant Feature Transform(SIFT) Method YU MENG and Dr. Bernard Tiddeman(supervisor) Department of Computer Science University of St. Andrews Abstract The

More information

An Image Based 3D Reconstruction System for Large Indoor Scenes

An Image Based 3D Reconstruction System for Large Indoor Scenes 36 5 Vol. 36, No. 5 2010 5 ACTA AUTOMATICA SINICA May, 2010 1 1 2 1,,,..,,,,. : 1), ; 2), ; 3),.,,. DOI,,, 10.3724/SP.J.1004.2010.00625 An Image Based 3D Reconstruction System for Large Indoor Scenes ZHANG

More information

Specular 3D Object Tracking by View Generative Learning

Specular 3D Object Tracking by View Generative Learning Specular 3D Object Tracking by View Generative Learning Yukiko Shinozuka, Francois de Sorbier and Hideo Saito Keio University 3-14-1 Hiyoshi, Kohoku-ku 223-8522 Yokohama, Japan

More information



More information

Local invariant features

Local invariant features Local invariant features Tuesday, Oct 28 Kristen Grauman UT-Austin Today Some more Pset 2 results Pset 2 returned, pick up solutions Pset 3 is posted, due 11/11 Local invariant features Detection of interest

More information

Automatic Ranking of Images on the Web

Automatic Ranking of Images on the Web Automatic Ranking of Images on the Web HangHang Zhang Electrical Engineering Department Stanford University Zixuan Wang Electrical Engineering Department Stanford University

More information

Patch Descriptors. CSE 455 Linda Shapiro

Patch Descriptors. CSE 455 Linda Shapiro Patch Descriptors CSE 455 Linda Shapiro How can we find corresponding points? How can we find correspondences? How do we describe an image patch? How do we describe an image patch? Patches with similar

More information

Video Google: A Text Retrieval Approach to Object Matching in Videos

Video Google: A Text Retrieval Approach to Object Matching in Videos Video Google: A Text Retrieval Approach to Object Matching in Videos Josef Sivic and Andrew Zisserman Robotics Research Group, Department of Engineering Science University of Oxford, United Kingdom Abstract

More information

3D model search and pose estimation from single images using VIP features

3D model search and pose estimation from single images using VIP features 3D model search and pose estimation from single images using VIP features Changchang Wu 2, Friedrich Fraundorfer 1, 1 Department of Computer Science ETH Zurich, Switzerland {fraundorfer, marc.pollefeys}

More information

Fuzzy based Multiple Dictionary Bag of Words for Image Classification

Fuzzy based Multiple Dictionary Bag of Words for Image Classification Available online at Procedia Engineering 38 (2012 ) 2196 2206 International Conference on Modeling Optimisation and Computing Fuzzy based Multiple Dictionary Bag of Words for Image

More information

Video Google faces. Josef Sivic, Mark Everingham, Andrew Zisserman. Visual Geometry Group University of Oxford

Video Google faces. Josef Sivic, Mark Everingham, Andrew Zisserman. Visual Geometry Group University of Oxford Video Google faces Josef Sivic, Mark Everingham, Andrew Zisserman Visual Geometry Group University of Oxford The objective Retrieve all shots in a video, e.g. a feature length film, containing a particular

More information


III. VERVIEW OF THE METHODS An Analytical Study of SIFT and SURF in Image Registration Vivek Kumar Gupta, Kanchan Cecil Department of Electronics & Telecommunication, Jabalpur engineering college, Jabalpur, India comparing the distance

More information

Feature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1

Feature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1 Feature Detection Raul Queiroz Feitosa 3/30/2017 Feature Detection 1 Objetive This chapter discusses the correspondence problem and presents approaches to solve it. 3/30/2017 Feature Detection 2 Outline

More information

Matching Local Invariant Features with Contextual Information: An Experimental Evaluation.

Matching Local Invariant Features with Contextual Information: An Experimental Evaluation. Matching Local Invariant Features with Contextual Information: An Experimental Evaluation. Desire Sidibe, Philippe Montesinos, Stefan Janaqi LGI2P - Ecole des Mines Ales, Parc scientifique G. Besse, 30035

More information

Selection of Scale-Invariant Parts for Object Class Recognition

Selection of Scale-Invariant Parts for Object Class Recognition Selection of Scale-Invariant Parts for Object Class Recognition Gy. Dorkó and C. Schmid INRIA Rhône-Alpes, GRAVIR-CNRS 655, av. de l Europe, 3833 Montbonnot, France fdorko, Abstract

More information

Efficient Representation of Local Geometry for Large Scale Object Retrieval

Efficient Representation of Local Geometry for Large Scale Object Retrieval Efficient Representation of Local Geometry for Large Scale Object Retrieval Michal Perďoch Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University in Prague IEEE Computer Society

More information

Large-scale visual recognition The bag-of-words representation

Large-scale visual recognition The bag-of-words representation Large-scale visual recognition The bag-of-words representation Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline Bag-of-words Large or small vocabularies? Extensions for instance-level

More information

CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt.

CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt. CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt. Section 10 - Detectors part II Descriptors Mani Golparvar-Fard Department of Civil and Environmental Engineering 3129D, Newmark Civil Engineering

More information



More information

Prof. Feng Liu. Spring /26/2017

Prof. Feng Liu. Spring /26/2017 Prof. Feng Liu Spring 2017 04/26/2017 Last Time Re-lighting HDR 2 Today Panorama Overview Feature detection Mid-term project presentation Not real mid-term 6

More information

Performance Evaluation of Scale-Interpolated Hessian-Laplace and Haar Descriptors for Feature Matching

Performance Evaluation of Scale-Interpolated Hessian-Laplace and Haar Descriptors for Feature Matching Performance Evaluation of Scale-Interpolated Hessian-Laplace and Haar Descriptors for Feature Matching Akshay Bhatia, Robert Laganière School of Information Technology and Engineering University of Ottawa

More information

Motion illusion, rotating snakes

Motion illusion, rotating snakes Motion illusion, rotating snakes Local features: main components 1) Detection: Find a set of distinctive key points. 2) Description: Extract feature descriptor around each interest point as vector. x 1

More information

Local Feature Detectors

Local Feature Detectors Local Feature Detectors Selim Aksoy Department of Computer Engineering Bilkent University Slides adapted from Cordelia Schmid and David Lowe, CVPR 2003 Tutorial, Matthew Brown,

More information

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tan Vo 1 and Dat Tran 1 and Wanli Ma 1 1- Faculty of Education, Science, Technology and Mathematics University of Canberra, Australia

More information

Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study

Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study J. Zhang 1 M. Marszałek 1 S. Lazebnik 2 C. Schmid 1 1 INRIA Rhône-Alpes, LEAR - GRAVIR Montbonnot, France

More information

Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds

Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds 9 1th International Conference on Document Analysis and Recognition Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds Weihan Sun, Koichi Kise Graduate School

More information

Structure Guided Salient Region Detector

Structure Guided Salient Region Detector Structure Guided Salient Region Detector Shufei Fan, Frank Ferrie Center for Intelligent Machines McGill University Montréal H3A2A7, Canada Abstract This paper presents a novel method for detection of

More information

Outline 7/2/201011/6/

Outline 7/2/201011/6/ Outline Pattern recognition in computer vision Background on the development of SIFT SIFT algorithm and some of its variations Computational considerations (SURF) Potential improvement Summary 01 2 Pattern

More information

Local features: detection and description. Local invariant features

Local features: detection and description. Local invariant features Local features: detection and description Local invariant features Detection of interest points Harris corner detection Scale invariant blob detection: LoG Description of local patches SIFT : Histograms

More information

CS6670: Computer Vision

CS6670: Computer Vision CS6670: Computer Vision Noah Snavely Lecture 16: Bag-of-words models Object Bag of words Announcements Project 3: Eigenfaces due Wednesday, November 11 at 11:59pm solo project Final project presentations:

More information

Today. Main questions 10/30/2008. Bag of words models. Last time: Local invariant features. Harris corner detector: rotation invariant detection

Today. Main questions 10/30/2008. Bag of words models. Last time: Local invariant features. Harris corner detector: rotation invariant detection Today Indexing with local features, Bag of words models Matching local features Indexing features Bag of words model Thursday, Oct 30 Kristen Grauman UT-Austin Main questions Where will the interest points

More information

Key properties of local features

Key properties of local features Key properties of local features Locality, robust against occlusions Must be highly distinctive, a good feature should allow for correct object identification with low probability of mismatch Easy to etract

More information

CAP 5415 Computer Vision Fall 2012

CAP 5415 Computer Vision Fall 2012 CAP 5415 Computer Vision Fall 01 Dr. Mubarak Shah Univ. of Central Florida Office 47-F HEC Lecture-5 SIFT: David Lowe, UBC SIFT - Key Point Extraction Stands for scale invariant feature transform Patented

More information

CS229: Action Recognition in Tennis

CS229: Action Recognition in Tennis CS229: Action Recognition in Tennis Aman Sikka Stanford University Stanford, CA 94305 Rajbir Kataria Stanford University Stanford, CA 94305 1. Motivation As active

More information

Shape recognition with edge-based features

Shape recognition with edge-based features Shape recognition with edge-based features K. Mikolajczyk A. Zisserman C. Schmid Dept. of Engineering Science Dept. of Engineering Science INRIA Rhône-Alpes Oxford, OX1 3PJ Oxford, OX1 3PJ 38330 Montbonnot

More information

Image Retrieval with a Visual Thesaurus

Image Retrieval with a Visual Thesaurus 2010 Digital Image Computing: Techniques and Applications Image Retrieval with a Visual Thesaurus Yanzhi Chen, Anthony Dick and Anton van den Hengel School of Computer Science The University of Adelaide

More information

A Comparison of SIFT, PCA-SIFT and SURF

A Comparison of SIFT, PCA-SIFT and SURF A Comparison of SIFT, PCA-SIFT and SURF Luo Juan Computer Graphics Lab, Chonbuk National University, Jeonju 561-756, South Korea Oubong Gwun Computer Graphics Lab, Chonbuk National

More information

K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors

K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors Shao-Tzu Huang, Chen-Chien Hsu, Wei-Yen Wang International Science Index, Electrical and Computer Engineering

More information

Video annotation based on adaptive annular spatial partition scheme

Video annotation based on adaptive annular spatial partition scheme Video annotation based on adaptive annular spatial partition scheme Guiguang Ding a), Lu Zhang, and Xiaoxu Li Key Laboratory for Information System Security, Ministry of Education, Tsinghua National Laboratory

More information

Appearance-Based Place Recognition Using Whole-Image BRISK for Collaborative MultiRobot Localization

Appearance-Based Place Recognition Using Whole-Image BRISK for Collaborative MultiRobot Localization Appearance-Based Place Recognition Using Whole-Image BRISK for Collaborative MultiRobot Localization Jung H. Oh, Gyuho Eoh, and Beom H. Lee Electrical and Computer Engineering, Seoul National University,

More information

Efficient Representation of Local Geometry for Large Scale Object Retrieval

Efficient Representation of Local Geometry for Large Scale Object Retrieval Efficient Representation of Local Geometry for Large Scale Object Retrieval Michal Perd och, Ondřej Chum and Jiří Matas Center for Machine Perception, Department of Cybernetics Faculty of Electrical Engineering,

More information


ROBUST SCENE CLASSIFICATION BY GIST WITH ANGULAR RADIAL PARTITIONING. Wei Liu, Serkan Kiranyaz and Moncef Gabbouj Proceedings of the 5th International Symposium on Communications, Control and Signal Processing, ISCCSP 2012, Rome, Italy, 2-4 May 2012 ROBUST SCENE CLASSIFICATION BY GIST WITH ANGULAR RADIAL PARTITIONING

More information

Visual Word based Location Recognition in 3D models using Distance Augmented Weighting

Visual Word based Location Recognition in 3D models using Distance Augmented Weighting Visual Word based Location Recognition in 3D models using Distance Augmented Weighting Friedrich Fraundorfer 1, Changchang Wu 2, 1 Department of Computer Science ETH Zürich, Switzerland {fraundorfer, marc.pollefeys}

More information

Local Features: Detection, Description & Matching

Local Features: Detection, Description & Matching Local Features: Detection, Description & Matching Lecture 08 Computer Vision Material Citations Dr George Stockman Professor Emeritus, Michigan State University Dr David Lowe Professor, University of British

More information

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science. Professor William Hoff Dept of Electrical Engineering &Computer Science 1 Object Recognition in Large Databases Some material for these slides comes from

More information

A Novel Real-Time Feature Matching Scheme

A Novel Real-Time Feature Matching Scheme Sensors & Transducers, Vol. 165, Issue, February 01, pp. 17-11 Sensors & Transducers 01 by IFSA Publishing, S. L. A Novel Real-Time Feature Matching Scheme Ying Liu, * Hongbo

More information

Indexing local features and instance recognition May 14 th, 2015

Indexing local features and instance recognition May 14 th, 2015 Indexing local features and instance recognition May 14 th, 2015 Yong Jae Lee UC Davis Announcements PS2 due Saturday 11:59 am 2 We can approximate the Laplacian with a difference of Gaussians; more efficient

More information

Binary SIFT: Towards Efficient Feature Matching Verification for Image Search

Binary SIFT: Towards Efficient Feature Matching Verification for Image Search Binary SIFT: Towards Efficient Feature Matching Verification for Image Search Wengang Zhou 1, Houqiang Li 2, Meng Wang 3, Yijuan Lu 4, Qi Tian 1 Dept. of Computer Science, University of Texas at San Antonio

More information

The most cited papers in Computer Vision

The most cited papers in Computer Vision COMPUTER VISION, PUBLICATION The most cited papers in Computer Vision In Computer Vision, Paper Talk on February 10, 2012 at 11:10 pm by gooly (Li Yang Ku) Although it s not always the case that a paper

More information

Lecture 10 Detectors and descriptors

Lecture 10 Detectors and descriptors Lecture 10 Detectors and descriptors Properties of detectors Edge detectors Harris DoG Properties of detectors SIFT Shape context Silvio Savarese Lecture 10-26-Feb-14 From the 3D to 2D & vice versa P =

More information

Automatic Image Alignment

Automatic Image Alignment Automatic Image Alignment with a lot of slides stolen from Steve Seitz and Rick Szeliski Mike Nese CS194: Image Manipulation & Computational Photography Alexei Efros, UC Berkeley, Fall 2018 Live Homography

More information

Large-scale Image Search and location recognition Geometric Min-Hashing. Jonathan Bidwell

Large-scale Image Search and location recognition Geometric Min-Hashing. Jonathan Bidwell Large-scale Image Search and location recognition Geometric Min-Hashing Jonathan Bidwell Nov 3rd 2009 UNC Chapel Hill Large scale + Location Short story... Finding similar photos See: Microsoft PhotoSynth

More information

Determinant of homography-matrix-based multiple-object recognition

Determinant of homography-matrix-based multiple-object recognition Determinant of homography-matrix-based multiple-object recognition 1 Nagachetan Bangalore, Madhu Kiran, Anil Suryaprakash Visio Ingenii Limited F2-F3 Maxet House Liverpool Road Luton, LU1 1RS United Kingdom

More information

Object Recognition with Invariant Features

Object Recognition with Invariant Features Object Recognition with Invariant Features Definition: Identify objects or scenes and determine their pose and model parameters Applications Industrial automation and inspection Mobile robots, toys, user

More information

Simultaneous Recognition and Homography Extraction of Local Patches with a Simple Linear Classifier

Simultaneous Recognition and Homography Extraction of Local Patches with a Simple Linear Classifier Simultaneous Recognition and Homography Extraction of Local Patches with a Simple Linear Classifier Stefan Hinterstoisser 1, Selim Benhimane 1, Vincent Lepetit 2, Pascal Fua 2, Nassir Navab 1 1 Department

More information

NOWADAYS, the computer vision is one of the top

NOWADAYS, the computer vision is one of the top Evaluation of Interest Point Detectors for Scenes with Changing Lightening Conditions Martin Zukal, Petr Cika, Radim Burget Abstract The paper is aimed at the description of different image interest point

More information

Instance-level recognition part 2

Instance-level recognition part 2 Visual Recognition and Machine Learning Summer School Paris 2011 Instance-level recognition part 2 Josef Sivic INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique,

More information

Midterm Wed. Local features: detection and description. Today. Last time. Local features: main components. Goal: interest operator repeatability

Midterm Wed. Local features: detection and description. Today. Last time. Local features: main components. Goal: interest operator repeatability Midterm Wed. Local features: detection and description Monday March 7 Prof. UT Austin Covers material up until 3/1 Solutions to practice eam handed out today Bring a 8.5 11 sheet of notes if you want Review

More information

arxiv: v3 [] 3 Oct 2012

arxiv: v3 [] 3 Oct 2012 Combined Descriptors in Spatial Pyramid Domain for Image Classification Junlin Hu and Ping Guo arxiv:1210.0386v3 [] 3 Oct 2012 Image Processing and Pattern Recognition Laboratory Beijing Normal University,

More information

Feature Detection and Matching

Feature Detection and Matching and Matching CS4243 Computer Vision and Pattern Recognition Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (CS4243) Camera Models 1 /

More information

Keypoint-based Recognition and Object Search

Keypoint-based Recognition and Object Search 03/08/11 Keypoint-based Recognition and Object Search Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Notices I m having trouble connecting to the web server, so can t post lecture

More information

Robotics Programming Laboratory

Robotics Programming Laboratory Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception car

More information


CHAPTER 1.5 PATTERN RECOGNITION WITH LOCAL INVARIANT FEATURES CHAPTER 1.5 PATTERN RECOGNITION WITH LOCAL INVARIANT FEATURES C. Schmid 1, G. Dorkó 1, S. Lazebnik 2, K. Mikolajczyk 1 and J. Ponce 2 1 INRIA Rhône-Alpes, GRAVIR-CNRS 655, av. de l Europe, 38330 Montbonnot,

More information

A performance evaluation of local descriptors

A performance evaluation of local descriptors MIKOLAJCZYK AND SCHMID: A PERFORMANCE EVALUATION OF LOCAL DESCRIPTORS A performance evaluation of local descriptors Krystian Mikolajczyk and Cordelia Schmid Dept. of Engineering Science INRIA Rhône-Alpes

More information

Local features: detection and description May 12 th, 2015

Local features: detection and description May 12 th, 2015 Local features: detection and description May 12 th, 2015 Yong Jae Lee UC Davis Announcements PS1 grades up on SmartSite PS1 stats: Mean: 83.26 Standard Dev: 28.51 PS2 deadline extended to Saturday, 11:59

More information

By Suren Manvelyan,

By Suren Manvelyan, By Suren Manvelyan, By Suren Manvelyan, By Suren Manvelyan, By Suren Manvelyan,

More information

Recognition. Topics that we will try to cover:

Recognition. Topics that we will try to cover: Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) Object classification (we did this one already) Neural Networks Object class detection Hough-voting techniques

More information

Combining Appearance and Topology for Wide

Combining Appearance and Topology for Wide Combining Appearance and Topology for Wide Baseline Matching Dennis Tell and Stefan Carlsson Presented by: Josh Wills Image Point Correspondences Critical foundation for many vision applications 3-D reconstruction,

More information

Image matching. Announcements. Harder case. Even harder case. Project 1 Out today Help session at the end of class. by Diva Sian.

Image matching. Announcements. Harder case. Even harder case. Project 1 Out today Help session at the end of class. by Diva Sian. Announcements Project 1 Out today Help session at the end of class Image matching by Diva Sian by swashford Harder case Even harder case How the Afghan Girl was Identified by Her Iris Patterns Read the

More information

Indexing local features and instance recognition May 16 th, 2017

Indexing local features and instance recognition May 16 th, 2017 Indexing local features and instance recognition May 16 th, 2017 Yong Jae Lee UC Davis Announcements PS2 due next Monday 11:59 am 2 Recap: Features and filters Transforming and describing images; textures,

More information

The SIFT (Scale Invariant Feature

The SIFT (Scale Invariant Feature The SIFT (Scale Invariant Feature Transform) Detector and Descriptor developed by David Lowe University of British Columbia Initial paper ICCV 1999 Newer journal paper IJCV 2004 Review: Matt Brown s Canonical

More information

TRECVid 2013 Experiments at Dublin City University

TRECVid 2013 Experiments at Dublin City University TRECVid 2013 Experiments at Dublin City University Zhenxing Zhang, Rami Albatal, Cathal Gurrin, and Alan F. Smeaton INSIGHT Centre for Data Analytics Dublin City University Glasnevin, Dublin 9, Ireland

More information

Bag of Words Models. CS4670 / 5670: Computer Vision Noah Snavely. Bag-of-words models 11/26/2013

Bag of Words Models. CS4670 / 5670: Computer Vision Noah Snavely. Bag-of-words models 11/26/2013 CS4670 / 5670: Computer Vision Noah Snavely Bag-of-words models Object Bag of words Bag of Words Models Adapted from slides by Rob Fergus and Svetlana Lazebnik 1 Object Bag of words Origin 1: Texture Recognition

More information

Instance-level recognition II.

Instance-level recognition II. Reconnaissance d objets et vision artificielle 2010 Instance-level recognition II. Josef Sivic INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique, Ecole Normale

More information



More information

Local Features Tutorial: Nov. 8, 04

Local Features Tutorial: Nov. 8, 04 Local Features Tutorial: Nov. 8, 04 Local Features Tutorial References: Matlab SIFT tutorial (from course webpage) Lowe, David G. Distinctive Image Features from Scale Invariant Features, International

More information

Image Matching. AKA: Image registration, the correspondence problem, Tracking,

Image Matching. AKA: Image registration, the correspondence problem, Tracking, Image Matching AKA: Image registration, the correspondence problem, Tracking, What Corresponds to What? Daisy? Daisy From: Relevant for Analysis of Image Pairs (or more) Also Relevant for

More information

Wide Baseline Matching using Triplet Vector Descriptor

Wide Baseline Matching using Triplet Vector Descriptor 1 Wide Baseline Matching using Triplet Vector Descriptor Yasushi Kanazawa Koki Uemura Department of Knowledge-based Information Engineering Toyohashi University of Technology, Toyohashi 441-8580, JAPAN

More information

Introduction. Introduction. Related Research. SIFT method. SIFT method. Distinctive Image Features from Scale-Invariant. Scale.

Introduction. Introduction. Related Research. SIFT method. SIFT method. Distinctive Image Features from Scale-Invariant. Scale. Distinctive Image Features from Scale-Invariant Keypoints David G. Lowe presented by, Sudheendra Invariance Intensity Scale Rotation Affine View point Introduction Introduction SIFT (Scale Invariant Feature

More information

Scalable Recognition with a Vocabulary Tree

Scalable Recognition with a Vocabulary Tree Scalable Recognition with a Vocabulary Tree David Nistér and Henrik Stewénius Center for Visualization and Virtual Environments Department of Computer Science, University of Kentucky

More information
