Encoding Spatial Context for Large-Scale Partial-Duplicate Web Image Retrieval


Zhou WG, Li HQ, Lu Y et al. Encoding spatial context for large-scale partial-duplicate Web image retrieval. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 29(5): 837-848, Sept. 2014. DOI 10.1007/s…

Encoding Spatial Context for Large-Scale Partial-Duplicate Web Image Retrieval

Wen-Gang Zhou 1,2, Hou-Qiang Li 1,2, Yijuan Lu 3, Member, ACM, IEEE, and Qi Tian 4, Senior Member, IEEE

1 Chinese Academy of Sciences Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China
2 Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China
3 Department of Computer Science, Texas State University, San Marcos, TX 78666, U.S.A.
4 Department of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249, U.S.A.

{zhwg, lihq}@ustc.edu.cn; yl12@txstate.edu; qitian@cs.utsa.edu

Received March 15, 2014; revised July 15, 2014.

Abstract   Many recent state-of-the-art image retrieval approaches are based on the Bag-of-Visual-Words model and represent an image as a set of visual words by quantizing local SIFT (scale-invariant feature transform) features. Feature quantization reduces the discriminative power of local features and unavoidably causes many false local matches between images, which degrades the retrieval accuracy. To filter those false matches, geometric context among visual words has been popularly explored for the verification of geometric consistency. However, existing studies with global or local geometric verification are either computationally expensive or achieve limited accuracy. To address this issue, in this paper, we focus on partial-duplicate Web image retrieval and propose a scheme to encode the spatial context for visual matching verification. An efficient affine enhancement scheme is proposed to refine the verification results. Experiments on partial-duplicate Web image search, using a database of one million images, demonstrate the effectiveness and efficiency of the proposed approach. Evaluation on a 10-million image database further reveals the scalability of our approach.

Keywords   large-scale image retrieval, spatial context coding, spatial verification, affine estimation

1 Introduction

Recent years have witnessed the explosive growth of images accessible on the Web. Image retrieval lends itself as a straightforward tool to help users find images of interest in a large image corpus. Traditional image search systems commonly utilize textual information, such as the surrounding text, for indexing. Since textual information may be inconsistent with the visual content, content-based image retrieval (CBIR) is desired and has attracted much attention. In CBIR, a hot topic is partial-duplicate Web image search [1-4]. Given a query image, the target is to find its partial-duplicate versions in a large Web image database. There are many potential applications of such a system, for instance, finding out where an image is derived from and getting more information about it, tracking the appearance of an image online, detecting image copyright violation, and so on.

The partial-duplicate image retrieval task differs in some sense from image-based object retrieval. In image-based object retrieval, the main challenge is image variation due to 3D viewpoint change, illumination change, or object-class variability [1].
Partial-duplicate Web image retrieval differs in that the target images are usually obtained by Web users who edit the original image with changes in color, scale, rotation, partial occlusion, compression rate, etc.

Regular Paper
This work was supported in part to Dr. Wen-Gang Zhou by the Fundamental Research Funds for the Central Universities of China under Grant Nos. WK… and WK…, the Start-Up Funding from the University of Science and Technology of China under Grant No. KY…, the Open Project of the Beijing Multimedia and Intelligent Software Key Laboratory at Beijing University of Technology, and the sponsorship of the Intel ICRI MNC project; in part to Dr. Hou-Qiang Li by the National Natural Science Foundation of China (NSFC) under Grant Nos. …; in part to Dr. Yijuan Lu by the Army Research Office (ARO) of USA under Grant No. W911NF… and the National Science Foundation of USA under Grant No. CRI…; and in part to Dr. Qi Tian by ARO under Grant No. W911NF… and the Faculty Research Award by NEC Laboratories of America, respectively. This work was also supported in part by NSFC under Grant No. …. A preliminary version of the paper was published in the Proceedings of MM 2010.
©2014 Springer Science + Business Media, LLC & Science Press, China

In partial-duplicate Web images, different

parts are often cropped from the original image and pasted into the target image with modifications. The result is a partial-duplicate version of the original image with a different appearance but still sharing some duplicated patches. Some examples are shown in Fig.1.

Fig.1. Some examples of partial-duplicate Web images.

In large-scale image retrieval systems, the state-of-the-art approaches [5-18] leverage scalable textual retrieval techniques for image search. Similar to text words in information retrieval, local features, such as SIFT [19], are quantized to visual words [5,20]. An inverted file structure is then adopted to index images via the contained visual words [5,21]. However, compared with text words in information retrieval, the descriptive power of visual words is very limited, and with an increasing size of the image database to be indexed (e.g., greater than one million images), the discriminative power of visual words decreases sharply. Unlike text words in information retrieval, the 2-D geometric relationship among visual words in the image plane plays a key role in visual content identification. Geometric verification [1-2,10-11,14,19,22-24] has therefore become very popular recently as an important post-processing step to improve the retrieval precision. Due to its expensive computational cost, full geometric verification is usually only applied to some top-ranked candidate images, which may fail to identify relevant results with an initially low rank. Some locally spatial verification approaches have been developed to trade off precision and efficiency, including spatially nearest neighbors [5] and the bundled feature approach [1]. Above all, based on the Bag-of-Visual-Words model [5], image retrieval mainly relies on improving the discrimination of visual words by reducing feature quantization loss and embedding geometric consistency.

In this paper, we propose to address partial-duplicate Web image search with a spatial context coding strategy. Our approach is based on the Bag-of-Visual-Words model. Two local features from two images are considered a match if the two features are quantized to the same visual word. To verify the local matches between two images, we propose a novel scheme to encode the relative spatial context of local features in images into a binary map. Then, through efficient spatial verification based on the spatial maps, the false matches of local features can be removed effectively. The verification results are further refined with an affine enhancement scheme.

This work builds on our previous work [15] on efficient geometric verification for improving image retrieval results. The difference lies in three aspects. Firstly, the image plane division for spatial context encoding is more flexible in this work: unlike [15, 23], we do not require the image plane division to be quadrant-wise for the generation of multiple orthogonal separate coding maps; instead, we generalize the horizontal and vertical maps in [15, 23] into a uniform formulation. Secondly, a new affine enhancement scheme is proposed to address the SIFT drifting issue and thereby improve the geometric verification results. Thirdly, in this work, we demonstrate the scalability of our method on a 10-million-scale image database.

The rest of the paper is organized as follows. In Section 2, we introduce the related work. Then, our spatial context encoding scheme is discussed in Section 3.
In Section 4, we discuss the implementation details of the image search framework. In Section 5, experimental results are provided. Finally, we conclude in Section 6.

2 Related Work

In the past decade, large-scale image retrieval [1,5-12,14-15,25-27] has been significantly boosted by two pioneering studies. The first one is the introduction of local invariant features, such as SIFT [19] and Edge-SIFT [28], for image representation. The second one is the scalable image indexing scheme based on the Bag-of-Visual-Words model [5]. With local features quantized to visual words, image representation can be very compact. Scalability is achieved by the inverted-file index structure, and the number of candidate images is greatly reduced, since we only need to check those images sharing common visual words with the query image.

In image retrieval, the retrieval quality (accuracy) is usually measured by the criterion of average precision, which takes both precision performance and recall performance into consideration. To obtain a high average precision, both precision and recall are required to be good enough; a bias toward either one may lead to limited average precision and poor retrieval quality.

Based on the Bag-of-Visual-Words model in image search, generally, local feature descriptors quantized to the same visual word are considered to match each other [5]. Since the dimensionality of the SIFT descriptor space is as high as 128, the quantization error is not easy to control. As a result, some noisy local matches

are easily incurred, which degrades image retrieval accuracy. To reduce the quantization loss, two different types of approaches have been proposed recently.

The first category focuses on exploring individual features. Some approaches work on reducing the quantization error. One of the representative techniques is soft quantization [12,26,29-30], which quantizes a SIFT descriptor to multiple visual words. Hamming Embedding [10] enriches the visual word with a compact binary signature derived from the original local descriptor, and feature scale and orientation values are used to filter false matches. Some other approaches try to enrich the feature representation of images. For instance, Chum et al. [7,25,31] reissued the highly ranked images from the original query as expanded new queries to boost the recall performance. Jégou et al. [13] proposed a compact image representation scheme by aggregating local descriptors in images; it jointly optimizes the dimension reduction and indexing to preserve the quality of vector comparison. Zhang et al. [32] proposed a novel query-sensitive ranking algorithm that ranks PCA-based binary hash codes to address the neighbor search problem for image search. Above all, since the focus of our work is the post-processing stage, the above quantization methods can be combined with our spatial context coding scheme for better retrieval accuracy.

The approaches in the second category concentrate on utilizing the geometric context among local features to suppress false matches [2]. The geometric verification approaches can be categorized into global verification and local verification. Global verification approaches exploit the geometric relationships of all features in a whole image. Full geometric verification based on RANSAC [33] and its variants [10-11,19,34] is the representative global verification method. Although full geometric verification can greatly improve the retrieval precision, it usually incurs expensive computational cost; therefore, it is usually only applied to a subset of the top-ranked candidate images, resulting in limited retrieval accuracy. In [14], a novel geometry-preserving visual phrase (GVP) is proposed to explore visual word co-occurrences in neighborhood areas, and image search is realized by identifying the shared GVPs between images. In VisualSEEk [35], 2-D strings [36] are adopted to represent images with multiple color regions for comparison. The 2-D string approach represents an image as a symbolic projection along the horizontal and vertical directions [36]; the image retrieval problem is then converted to a problem of 2-D sequence matching. Our approach also makes use of the relative position between features. However, our approach adopts different representations from the 2-D string approach [36] and can flexibly control the strictness of the spatial context encoding. Besides, our approach can flexibly address rotation changes and enjoys much lower computational complexity in verification than the 2-D string approach [36].

On the other hand, some local verification approaches are proposed to check the spatial consistency of features in local areas instead of the entire image plane. In [5], loose spatial consistency from some spatially nearest neighbors is imposed to filter false visual-word matches. Such a loose scheme, although efficient, is sensitive to the image noise incurred by editing. Wu et al.
[1] proposed to weight local features with the comparison results of the corresponding bundled features. A bundled feature is a group of local features in an MSER region [37]. The MSER regions are defined by an extremal property of the intensity function in the region and on its outer boundary, and are detected as stable regions across a threshold range from watershed-based segmentation [37]. Bundled features are compared by the shared visual word amount and the relative ordering of matched visual words, and the matching score of bundled features is used to weight the visual word vote for image similarity. In [8, 38], min-hash is proposed to describe images by independently selecting a set of visual words as global descriptors and to define image similarity as the visual word set overlap. In [9], geometric min-hashing constructs repeatable hash keys with loosely local geometric information for a more discriminative description. The potential concern with min-hashing [8,38] and its variant [9] is that, although high retrieval precision can be achieved, the retrieval recall may be limited unless many more hashing tables are involved, which, however, imposes a heavier memory burden. Compared with global geometric verification, in general cases, local geometric verification is more computationally efficient for real-time retrieval systems, at the sacrifice of some retrieval accuracy.

Since spatial context plays a key role in visual content identification and understanding, it has also been widely explored for object classification and recognition in the computer vision community. Spatial Pyramid Matching (SPM) [39] represents rigid spatial information by quantizing the image space for scene recognition but cannot address transformation changes. In [40], shape context is proposed to match shapes for object recognition: at a reference point, shape context captures the distribution of the remaining points relative to it. In [41], correlograms of visual words are quantized to compact correlations to model object shape for classification. Yuan et al. [42] proposed to discover meaningful

spatially co-occurring patterns of visual words, called visual phrases. Zhang and Chen [43] proposed to identify high-order spatial features by hashing matched local features into an offset space. Visual-phrase-based approaches are promising for limited object classification and recognition, but in large-scale image retrieval with a large codebook of visual words, it is intractable to explore the visual phrases of all potential objects.

In this paper, we propose an efficient full geometric verification method for large-scale partial-duplicate Web image search. We explicitly encode the spatial context among the local features of an image into a coding map. Owing to the invariance of the SIFT feature [19], the coding map captures invariance to translation, rotation, and scaling. Based on the coding maps, we propose an efficient spatial verification scheme to effectively identify and remove spatially inconsistent local matches, which significantly boosts the retrieval accuracy with small computational overhead.

3 Our Method

The spatial relationship among visual words in an image is critical in identifying duplicate image patches. Assume we have obtained the local feature matching results between two images by feature quantization, and that the matching results are polluted by some false (spatially inconsistent) matches. To efficiently identify those false matches, we propose to encode the spatial context for spatial verification. We first introduce the spatial context encoding scheme in Subsection 3.1. After that, we discuss how to perform geometric verification in Subsection 3.2. In Subsection 3.3, we discuss how to recover mis-removed pairs with an enhancement based on affine estimation.

3.1 Spatial Context Encoding

We encode the relative positions between features in the image plane. A binary spatial map S is generated to describe the relative position between each feature pair along the horizontal direction. For instance, given an image with K features {v_i} (i = 1, …, K), its spatial map S is defined as follows:

$$S(i,j)=\begin{cases}0,&\text{if }v_j\text{ is left of }v_i,\\ 1,&\text{otherwise.}\end{cases}\qquad(1)$$

In the spatial map S, row i records the other features' spatial relationships with feature v_i in the image. We can also interpret the map as follows: in row i, feature v_i is selected as the origin, and the image plane is divided into two parts by the vertical line through it; the spatial map then records on which side each other feature is located. Therefore, one bit, either 0 or 1, suffices to encode the relative spatial position of one feature with respect to another.

The spatial context encoded by S in (1) is relatively loose. To describe the spatial relationship of local features more strictly, we generalize our encoding scheme. Before comparing the features' relative positions along the horizontal direction, we rotate the image plane counterclockwise by a series of r predefined angles θ_k = kπ/r (k = 0, …, r−1) and encode the new relative spatial relationship of the local features for each rotation into a new coding map. After that, all coding maps are concatenated into a 3-D array G as a general coding map. For instance, after rotating the image plane counterclockwise by θ_k about the image origin, the new location (x_i^{(k)}, y_i^{(k)}) of feature v_i can be derived from its original location (x_i, y_i):

$$\begin{pmatrix}x_i^{(k)}\\ y_i^{(k)}\end{pmatrix}=\begin{pmatrix}\cos\theta_k & \sin\theta_k\\ -\sin\theta_k & \cos\theta_k\end{pmatrix}\begin{pmatrix}x_i\\ y_i\end{pmatrix}.$$
Then, the generalized spatial map G can be defined as follows:

$$G(i,j,k)=\begin{cases}0,&\text{if }v_j^{(k)}\text{ is left of }v_i^{(k)},\\ 1,&\text{otherwise.}\end{cases}$$

The spatial map described above only captures invariance to translation and scaling changes. Invariance to rotation changes can be flexibly added by considering the SIFT characteristic orientation when constructing the spatial coding map. In other words, instead of checking whether a test feature v_j^{(k)} is on the left-hand side of the vertical line passing through a reference feature point v_i^{(k)}, we can check whether v_j^{(k)} is on the left-hand side of the line passing through v_i^{(k)} whose normal direction is exactly the orientation of the reference feature point. With this modification, the new coding map captures invariance to rotation as well as to translation and scaling changes. Our spatial context encoding scheme is a kind of quantization of the image plane; in this sense, it is related to the local histogram modeling in spatial pyramid matching [39].

From the discussion above, it can be seen that encoding the spatial context is computationally efficient. However, the full spatial maps of all features in an image would cost considerable memory. Fortunately, there is no need to store these maps in our image search scenario. In fact, for each feature, we only need to store the x- and y-coordinates and the characteristic orientation. When checking the feature matching of two images, only the coordinates and the characteristic orientations of the matched features are needed to generate the spatial maps for spatial verification in real time. The details of spatial verification are discussed in the next subsection.
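To make the encoding concrete, here is a minimal sketch in Python with NumPy (our own illustration, not the authors' code; the function name and array layout are ours). It builds the generalized coding map G from an N×2 array of keypoint coordinates, optionally offsetting each rotation by the reference feature's SIFT orientation for the rotation-invariant variant:

```python
import numpy as np

def coding_map(xy, r, orientations=None):
    """Generalized spatial coding map G of shape (N, N, r).

    G[i, j, k] is 0 iff feature j lies to the left of the line through
    feature i whose normal is the x-axis rotated by theta_k = k*pi/r,
    optionally offset by feature i's SIFT orientation (hypothetical
    `orientations` array) for rotation invariance.
    """
    n = len(xy)
    G = np.empty((n, n, r), dtype=np.uint8)
    for i in range(n):
        dx = xy[:, 0] - xy[i, 0]
        dy = xy[:, 1] - xy[i, 1]
        for k in range(r):
            phi = k * np.pi / r
            if orientations is not None:
                phi += orientations[i]   # rotation-invariant variant
            # The sign of the projection onto the rotated x-axis decides
            # "left" (0) versus "right or on the line" (1).
            G[i, :, k] = (np.cos(phi) * dx + np.sin(phi) * dy >= 0)
    return G
```

With r = 1 and no orientations, this reduces to the basic horizontal map S in (1).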

3.2 Local Matching Verification

In image search, after feature quantization, we can obtain the feature matching results between images. Suppose a query image I_q and a matched image I_m are found to share N pairs of matched SIFT features through vector quantization. Then the corresponding sub-spatial-maps of these matched features of both I_q and I_m can be generated, denoted as G_q and G_m. If the query image and the matched image are partial duplicates, it is likely that M out of the N matches correspond to the duplicated patch, and that the spatial configuration of those M matched features is very similar in the two images, up to changes in translation, scale, or rotation. Since our spatial maps can handle those changes, the corresponding spatial maps of the M matched features in the two images, which are sub-maps of G_q and G_m, are identical. Therefore, the spatial verification problem can be formulated as identifying, from the complete spatial maps, a sub-map corresponding to the spatially consistent matches.

In our spatial verification, instead of directly identifying spatially consistent matches, we proceed to determine and remove the spatially inconsistent ones. After all those false matches are discarded, the remaining matches are the expected spatially consistent results. In fact, it is difficult to discover all spatially inconsistent matches once and for all. However, it is possible to find the one match that is most likely to be false; intuitively, such a match incurs the most spatial inconsistency. Therefore, the key lies in how to formulate the spatial inconsistency. In the following, we discuss our spatial verification scheme in detail.

To investigate the spatial inconsistency of matches, we perform a logical exclusive-or (XOR) operation on G_q and G_m:

$$V=G_q\oplus G_m.$$

Ideally, if all N matches are geometrically consistent, V is zero for all entries. If some false matches exist, the entries of these false matches in G_q and G_m may be inconsistent, and those inconsistencies cause the corresponding exclusive-or results in V to be 1. In other words, the relative spatial consistency of feature j with respect to the reference feature i is encoded with an r-bit vector C = (V(i,j,1), …, V(i,j,r)), and if feature i and feature j are geometrically inconsistent in the two images, at least one bit in C will be 1. Therefore, the union (logical OR) operation $\bigvee_{k=1}^{r}V(i,j,k)$ expresses the geometric inconsistency status of the two features along all rotated directions. As a result, we define the inconsistency of feature i with all other features as

$$f(i)=\sum_{j=1}^{N}\bigvee_{k=1}^{r}V(i,j,k).\qquad(2)$$

If there exists i with f(i) > 0, we define i* = arg max_i f(i); then the i*-th pair is most likely to be a false match and should be removed. Consequently, we can iteratively remove such mismatching pairs until the maximal value of f is zero. The computational complexity of the spatial verification is O(r·N²). It should be noted that in the iteration loop, when updating f, we just need to subtract the contributions of the removed inconsistent match, instead of recomputing f by (2).

Fig.2 shows four real examples of the spatial verification results on two relevant and two irrelevant image pairs. All image pairs have many initial local matches after feature quantization. It can be observed that for the relevant partial-duplicate images, our verification scheme effectively identifies and removes the spatially inconsistent matches. On the other hand, for the irrelevant image pairs, only very few, say, one or two, local matches can pass our spatial verification scheme.

Fig.2. Real examples of spatial verification results with spatial context coding on (a) relevant and (b) irrelevant image pairs. Each line denotes a local match across two images, with the line endpoints at the key point locations of the corresponding SIFT features. The red lines denote local matches passing our spatial verification, while the blue lines denote those that fail to pass the verification.
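The verification loop can be transcribed directly from the definitions above. The following Python sketch (our own naming and data layout, assuming the coding maps from the earlier snippet; not the authors' implementation) XORs the two sub-maps, ORs over the r directions, and greedily removes the match with the largest inconsistency until f vanishes:

```python
import numpy as np

def verify_matches(Gq, Gm):
    """Remove spatially inconsistent matches between two images.

    Gq, Gm: (N, N, r) sub-coding-maps of the N matched features in the
    query and the matched image. Returns the indices of the matches
    that survive the verification.
    """
    V = np.logical_xor(Gq, Gm)            # exclusive-or of the two maps
    U = V.any(axis=2).astype(np.int64)    # logical OR over the r directions
    f = U.sum(axis=1)                     # inconsistency f(i) as in (2)
    alive = set(range(len(f)))
    while f.max() > 0:
        worst = int(np.argmax(f))         # match with maximal inconsistency
        # Subtract its contributions instead of recomputing (2).
        f -= U[:, worst]
        f[worst] = 0
        U[worst, :] = 0
        U[:, worst] = 0
        alive.discard(worst)
    return sorted(alive)
```

Each iteration removes one match and only subtracts one column of U, which keeps the whole loop within the stated O(r·N²) bound.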
3.3 Affine Estimation for Enhancement

Due to the unavoidable digital error in the detection of key points, some SIFT features exhibit some

small drifting. With such drifting error, the relative locations of features with nearly the same x- or y-coordinate might be inverted, causing some true matches to fail our spatial verification. This phenomenon is prevalent when two duplicate images share many matched feature pairs in a cluster. Besides, a small affine transformation of the images will exacerbate such error. Moreover, the error is also worsened by a large spatial coding factor r. As illustrated in Fig.3, although the duplicate image pair shares 12 true matches, five of them are declared false and fail to pass the spatial verification. In this subsection, we propose to address this SIFT drifting problem by affine estimation based on the matched pairs passing our spatial verification.

Fig.3. Instance of matching errors due to SIFT drifting. The red lines denote the true matches that pass our spatial verification, while the blue lines denote those that fail. r = 6.

Since many true matches still remain, the matched image will be assigned a comparatively high similarity score (defined as the number of local matches passing spatial verification), and consequently the effect of those false negatives on the retrieval performance is very small. In fact, we can also recover those false negatives by means of the discovered spatially consistent matches. In other words, we can estimate the affine transformation from the target image to the query image. Generally, an affine transformation has six parameters. Let (u, v) and (x, y) be the coordinates of the matched feature points in the matched image and the query image, respectively. Then the affine transformation between (u, v) and (x, y) can be expressed as:

$$\begin{pmatrix}x\\ y\end{pmatrix}=\begin{pmatrix}m_1 & m_2\\ m_3 & m_4\end{pmatrix}\begin{pmatrix}u\\ v\end{pmatrix}+\begin{pmatrix}t_1\\ t_2\end{pmatrix}.$$

The least-squares solution for the parameters p = (m_1, m_2, m_3, m_4, t_1, t_2) can be determined by solving the corresponding normal equations [19]. After estimating the parameters of the affine transformation, we can check the other matched pairs that fail to pass the spatial verification. In detail, assume (x', y') is the feature point in the query image estimated by the transformation model. Then the mapping error of the matched feature point pair is

$$e=\sqrt{(x-x')^2+(y-y')^2}.$$

To check the validity of a matching, we set a threshold T such that, if the mapping error of a matched pair that failed to pass spatial verification is less than T, we consider that pair a true match. Since the duplicated images may undergo scale changes, it is reasonable to weight the threshold with a factor reflecting the scale difference. Consequently, T is defined as T = s·τ, where τ is a weighting constant and s is the average scale ratio over all true positive matches, defined as

$$s=\frac{1}{M}\sum_{i=1}^{M}\frac{scl_i^q}{scl_i^t},$$

where scl_i^q and scl_i^t are the scale values of the i-th truly matched SIFT features from the query image and the matched image, respectively. In our experiment, we set τ = 5. Finally, false negative matching pairs due to SIFT drifting are identified and kept.

It should be noted that, unlike RANSAC, our affine model estimation is performed only once; therefore, it is efficient in implementation. With our affine estimation for enhancement, we can effectively and efficiently recover those false negative matches that fail to pass our spatial verification due to SIFT drifting.
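As a rough illustration of this enhancement (our own Python sketch with hypothetical argument names, not the authors' implementation), the six affine parameters can be fitted by least squares on the verified matches and then used to re-admit rejected pairs with a small mapping error:

```python
import numpy as np

def recover_drifted_matches(uv, xy, passed, scale_q, scale_t, tau=5.0):
    """Affine enhancement: re-admit rejected matches whose mapping error
    under the fitted affine model is below T = s * tau.

    uv, xy: (N, 2) keypoint coordinates in the matched and the query
    image; passed: boolean mask of matches surviving spatial
    verification; scale_q, scale_t: SIFT scales in the two images.
    """
    u, v = uv[passed, 0], uv[passed, 1]
    # Linear system A p = b for p = (m1, m2, m3, m4, t1, t2), where
    # x = m1*u + m2*v + t1 and y = m3*u + m4*v + t2.
    A = np.zeros((2 * u.size, 6))
    A[0::2, 0], A[0::2, 1], A[0::2, 4] = u, v, 1.0
    A[1::2, 2], A[1::2, 3], A[1::2, 5] = u, v, 1.0
    b = xy[passed].reshape(-1)
    p, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares fit, done once

    # Threshold weighted by the average scale ratio of the true matches.
    s = np.mean(scale_q[passed] / scale_t[passed])
    T = s * tau

    # Mapping error e of every pair under the estimated transformation.
    est_x = p[0] * uv[:, 0] + p[1] * uv[:, 1] + p[4]
    est_y = p[2] * uv[:, 0] + p[3] * uv[:, 1] + p[5]
    err = np.hypot(xy[:, 0] - est_x, xy[:, 1] - est_y)
    return passed | (err < T)   # keep verified matches, re-admit close ones
```

Unlike RANSAC, the model is fitted exactly once over all verified matches, which is what keeps the enhancement cheap.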
4 Retrieval Framework

In this section, we introduce our retrieval framework for large-scale partial-duplicate Web image search. In our approach, we adopt the DoG SIFT feature [19] for image representation. We use the SIFT descriptor for quantization, as discussed in Subsection 4.1. From feature quantization, the local matches between images can be identified for spatial verification with spatial coding, as discussed in Subsection 4.2. In Subsection 4.3, we discuss the inverted-file index structure for retrieval.

4.1 SIFT Feature Quantization

To build a large-scale image indexing and retrieval system, we need to quantize SIFT descriptors to visual words. Based on the quantization results, images can be readily indexed with an inverted file structure; moreover, local matches between images can be efficiently determined: if two local features from different images are quantized to the same visual word, they are considered a local match across the two images. From this perspective, quantization filters out irrelevant features of database images and focuses on only a very small set of features that are likely to match the query feature. In our feature quantization, the hierarchical visual vocabulary tree approach [6] is adopted as the quantizer. The leaf nodes of the tree are considered as visual words, and each SIFT descriptor is mapped to the scalar ID of the closest visual word.
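A minimal sketch of such a quantizer follows (our own illustrative structure; the actual tree in [6] is trained by hierarchical k-means, and the class layout here is an assumption):

```python
import numpy as np

class VocabTree:
    """Hierarchical quantizer sketch in the spirit of the vocabulary
    tree [6]: greedy descent from the root, comparing the descriptor
    against the child centroids at each level."""

    def __init__(self, centers, children):
        self.centers = centers      # node_id -> (k, 128) child centroids
        self.children = children    # node_id -> list of k child node ids
                                    # (empty list marks a leaf/visual word)

    def quantize(self, desc):
        """Map one 128-D SIFT descriptor to the scalar ID of the closest
        leaf; costs O(k * depth) distance computations instead of a flat
        scan over the whole vocabulary."""
        node = 0                    # root
        while self.children[node]:
            d = np.linalg.norm(self.centers[node] - desc, axis=1)
            node = self.children[node][int(np.argmin(d))]
        return node                 # leaf id serves as the visual word id
```

The greedy descent is what makes a 1 M-word vocabulary practical: with, say, a branch factor of 10 and depth 6, each descriptor needs only about 60 distance computations.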

4.2 Spatial Verification for Retrieval

Once the local matches between images are obtained, we can perform spatial verification with spatial maps to remove false matches and impose the affine enhancement to avoid mis-filtering true matches, as discussed in Section 3. As a result, a number of local matches are regarded as spatially consistent and kept. The similarity scores between images are defined by counting the number of such true matches.

We formulate image retrieval as a voting problem: each visual word in the query image votes on its matched images. Intuitively, the tf-idf weight [5] can be used to distinguish different matched features. In our experiments, we simply count the number of matched features, which yields similar results and incurs less computation. In other words, the similarity between images is defined as the cardinality of the set of matched visual words passing the verification.

4.3 Index Structure

An inverted-file index structure is used for scalable indexing and retrieval. Each visual word has an entry in the index that contains the list of images in which the visual word appears. For each indexed feature, we store its image ID, SIFT scale and orientation values, and its x- and y-coordinates. For each of the four geometric items, we re-scale its range to be between 0 and 255 and represent it with only one byte. In our implementation, we allocate 8 bytes to index a feature: 4 bytes for the image ID and 1 byte for each of the four geometric items.
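For illustration, such an 8-byte posting can be packed as follows (a Python sketch under our own assumptions about the normalization ranges and field order, which the paper does not specify):

```python
import struct

def pack_posting(image_id, x, y, scale, orientation):
    """Pack one indexed feature into 8 bytes: a 4-byte image ID plus one
    byte each for x, y, scale and orientation, each re-scaled to 0..255."""
    def quantize(value, lo, hi):
        # Clamp and map [lo, hi] linearly onto the byte range 0..255.
        return max(0, min(255, int(round(255 * (value - lo) / (hi - lo)))))

    return struct.pack(
        "<IBBBB",
        image_id,
        quantize(x, 0.0, 1.0),                  # x normalized by image width
        quantize(y, 0.0, 1.0),                  # y normalized by image height
        quantize(scale, 0.0, 30.0),             # assumed SIFT scale range
        quantize(orientation, -3.1416, 3.1416), # orientation in radians
    )

posting = pack_posting(42, 0.31, 0.77, 4.2, 1.05)
assert len(posting) == 8
```

At 8 bytes per feature, an inverted file over one million images with a few hundred features each stays within a few gigabytes of memory, which is what makes the single-server experiments below feasible.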

5 Experiments

We conduct experiments on large-scale image databases. In Subsection 5.1, we introduce the experiment setup. Then, in Subsection 5.2, we discuss the evaluation results on the one-million image database. After that, we provide the results on the 10.2-million image database.

5.1 Experiment Setup

We build our basic dataset by randomly crawling one million images from the Web; the basic dataset is used as distracting images. Two ground-truth datasets are used for the evaluation of partial-duplicate Web image retrieval. The first dataset is collected by ourselves: following the Tineye search demo results①, we collected and manually labeled partially duplicated Web images of 23 groups from both Tineye② and Google Image search. Some sample images are shown in Figs. 1, 2, and 4. We denote this dataset as DupData. The second dataset is the Copydays dataset③, which contains images generated from 157 original images with various kinds of editing④, such as cropping, scaling, rotation, JPEG compression, printing and scanning, blurring, and painting. Some sample images of this dataset are shown in Fig.1(b). To evaluate the performance with respect to the database size, three smaller databases (50 K, 200 K, and 500 K) are built by sampling the basic dataset.

Fig.4. Six groups of sample images from the DupData dataset. (a) Titanic. (b) Seven-Eleven logo. (c) Energy Star logo. (d) Uncle Sam. (e) Starbucks logo. (f) Apollo.

① …/searches, Aug. ② Tineye Image Search, …, Aug. ③ …/jegou/data.php/, Aug. ④ …/generator/, Aug.

For the DupData dataset, we randomly select query images from each group of our ground-truth dataset, with the number of query images from each group proportional to the group size; finally, 100 representative query images are obtained. For the Copydays dataset, we select 229 images of the challenging strong category as queries.

Mean average precision (mAP), defined in (3), is used as our evaluation metric:

$$\mathrm{mAP}=\frac{1}{Q}\sum_{i=1}^{Q}\frac{\sum_{k=1}^{n}P_i(k)\,\mathrm{rel}_i(k)}{R_i},\qquad(3)$$

where Q denotes the number of query images, R_i denotes the number of relevant results for the i-th query image, P_i(k) denotes the precision over the top k retrieval results of the i-th query image, rel_i(k) is an indicator function equaling 1 when the k-th retrieval result of the i-th query image is a relevant image and 0 otherwise, and n denotes the total number of returned results. For the DupData and the Copydays datasets, Q is set as 100 and 229, respectively. In the DupData dataset, R_i ranges between 30 and 70 for different categories, while in the Copydays dataset, R_i ranges from 20 to 24. n is fixed in our experiments, since P_i(n) is usually very small for large n and contributes little to the average precision.
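For concreteness, (3) can be computed with a few lines of Python (our own illustrative code; `ranked_lists` and `relevant` are hypothetical containers for the returned rankings and the ground-truth sets):

```python
def mean_average_precision(ranked_lists, relevant):
    """mAP as in (3): ranked_lists[q] is the ordered list of returned
    image IDs for query q; relevant[q] is the set of ground-truth
    relevant IDs, whose size plays the role of R_i."""
    ap_sum = 0.0
    for q, ranking in ranked_lists.items():
        hits, precision_sum = 0, 0.0
        for k, image_id in enumerate(ranking, start=1):
            if image_id in relevant[q]:      # rel_i(k) = 1
                hits += 1
                precision_sum += hits / k    # accumulate P_i(k)
        ap_sum += precision_sum / len(relevant[q])   # divide by R_i
    return ap_sum / len(ranked_lists)                # average over Q
```

For a single query with ranking ['a', 'b', 'c'] and relevant set {'a', 'c'}, it returns (1/1 + 2/3)/2 ≈ 0.83.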
In our approach, the key parameter is the coding map factor r. The selection of r determines how finely the image plane is divided when checking the relative spatial positions of features. We have tested different settings of r on the 1-million image database with the Copydays dataset as ground truth. As shown in Fig.5, when r increases, the mAP performance first grows sharply, then becomes relatively stable, and gradually declines after reaching the peak. This is because a larger r defines a stricter relative spatial relationship and increases the discriminative capability to identify false positive matches; however, when r is too large, say over 6, the SIFT drifting error is magnified. Since that error is suppressed by the affine enhancement discussed in Subsection 3.3, the drop in mAP is gentle. The best performance is achieved when r equals 6, which is selected in the following experiments.

Fig.5. Impact of r on the mAP performance on the Copydays ground-truth dataset mixed with the 1-million image database.

Our approach identifies the spatially inconsistent matches among the initial local feature matches between images. In the experimental study, it is observed that, over all images passing our geometric verification, only 6.1 feature matches on average are left after the spatial verification, out of a considerably larger number of initial matches.

5.2 Evaluation

Comparison Algorithms. We take the Bag-of-Visual-Words method with the visual vocabulary tree (VVT) [6] as the baseline approach. A visual vocabulary of 1 M visual words is adopted, trained with 30 million SIFT features sampled from 300 million features of an independent image database. In fact, we have experimented with different visual codebook sizes and found that the 1 M vocabulary yields the best overall performance for the VVT approach. The same visual vocabulary tree is also used for feature quantization in our approach.

Two other approaches are adopted to enhance the VVT method for comparison. The first one is re-ranking via full geometric verification [34], based on the estimation of an affine transformation with our implementation of [11]. As a post-processing step, the re-ranking is performed on only the top 300 initial results of the VVT approach; we call this method reranking. The second one is the Bundled Feature [1] approach, which assembles local SIFT features in MSER regions and compares the feature bundles by their shared visual word amount and the relative ordering consistency of matched features. We denote this approach as BF.

Accuracy. We evaluate all approaches on the DupData dataset and the Copydays dataset, respectively. The mAP comparison results on the DupData dataset are illustrated in Fig.6. It can be observed that our approach outperforms the other three methods in retrieval accuracy. With the increase of the image database size, the mAP performance of all approaches decreases. When the image database size is very small (50 K), the mAP of our approach is very close to that

of the re-ranking method. On the 1 M dataset, our approach raises the mAP of the VVT to 0.66, a relative improvement of 35%. Since the re-ranking based on full geometric verification is applied only to the top 300 initially returned results, its performance is highly determined by the recall of the initial top 300 results. As can be seen, re-ranking the results of the baseline achieves 0.61, still 0.05 lower than our approach. Besides, our approach also outperforms BF.

Fig.6. Performance comparison of different methods with different database sizes on the DupData ground-truth dataset.

The mAP comparison results on the Copydays dataset are illustrated in Fig.7. The mAP performance of all comparison algorithms on this dataset exhibits a trend similar to that on the DupData dataset. Our approach steadily achieves the best performance among the comparison methods across all image database sizes. The performance of BF is better than the VVT approach but poorer than the other two comparison algorithms. This is partially because, unlike the other three algorithms, the effectiveness of the BF approach is strongly dependent on the detection accuracy and the repeatability of the MSER detection; in partial-duplicate images, the MSER detection can be disturbed by various kinds of editing, which degrades the effectiveness of the BF approach.

Fig.7. Performance comparison of different methods with different database sizes on the Copydays ground-truth dataset.

Efficiency. We perform the experiments on a server with a 2.0 GHz CPU and 16 GB memory. Fig.8 shows the average time cost of all four approaches; the time for SIFT feature extraction is not included. It takes the VVT 0.90 seconds on average to perform one image query, while for our approach the average query time is 1.1 seconds. The Bundled Feature approach is much more time-consuming, taking about 3 seconds on average, more than twice as long as our method. Full geometric re-ranking is the most time-consuming, introducing an additional 4.6 seconds over the VVT, although it verifies only the top 300 candidate images.

Fig.8. Comparison of the average query time cost for different methods on the 1 M database.

Memory Cost. In large-scale image search, the memory cost is mainly spent on the index file, whose size is proportional to the number of indexed features and the memory cost per indexed feature. Since the number of indexed features is the same for all approaches, we only need to compare the memory cost per indexed feature. The comparison results are shown in Table 1. The VVT approach [6] costs 8 bytes to index one feature: 4 bytes to store the image ID and another 4 bytes to store the tf-idf weight. In BF [1], each indexed feature has to store the image ID (4 bytes), the number of bundled cells (1 byte), and a list of bundled-cell information (4 bytes each). The number of bundled cells differs across features; in our implementation, there are about 2.2 bundled cells on average per feature. Therefore, it takes about 14 bytes for BF [1] to index one feature.

Table 1. Memory Cost on Each Indexed Feature for the Four Approaches

Method         Memory Cost per Indexed Feature (Byte)
VVT            8
BF             14
Reranking      8
Our approach   8

Our approach costs the same memory as the reranking approach for indexing each feature: both take 4 bytes to store the image ID and another 4 bytes to store the x- and y-coordinates, the orientation, and the scale information.

5.3 Scalability on a 10-Million Image Database

To demonstrate the scalability of our spatial coding scheme, we move on to a 10-million image database. This database includes about 3 million images from ImageNet [44] and 7 million images crawled from the Web, 10.2 million images in total. Since the 10-million database contains additional partial duplicates of the DupData dataset, for evaluation purposes we identify these partially duplicated images by searching the database with every image from our ground-truth dataset as the query image. Then, we enrich our original DupData dataset with those images, generating a new, enlarged DupData ground-truth dataset. To evaluate the retrieval performance, we mix the enriched DupData dataset and the CopyDays dataset with the 10 million images, respectively, which are used as distractors.

Considering the high memory requirement, we evaluate the VVT and our approach on another server (Dell R710) with a 2.4 GHz CPU and 32 GB memory, which is much more powerful than the server mentioned in Subsection 5.2. Again, 100 images from the 23 categories are used as queries, and the 1 M visual codebook is selected for the VVT method as discussed in Subsection 5.2. In this evaluation, the reranking approach takes the top returned results for geometric verification. Due to the memory limit, the Bundled Feature [1] approach is not compared on this dataset: as discussed in the previous subsection, it costs about 14 bytes to index each feature, and with about 300 SIFT features per image on average in our implementation, indexing 10 million images would take about 42 GB of memory, which exceeds the capacity of our server.

Table 2 compares the average performance in mAP and the time cost of the VVT method, the reranking method, and our approach. On the enriched DupData dataset, our approach achieves a relative 30.1% improvement in mAP over the VVT approach, from 0.36 to 0.47. On the Copydays dataset, our approach pushes the mAP from 0.37 of the VVT to 0.53, a relative 43.2% improvement. Compared with the reranking method, our approach improves the mAP by 0.02 and 0.04 on the two datasets, respectively. In terms of efficiency, our approach is comparable to the baseline, with about 0.5 seconds of additional time cost per query. Compared with the reranking method, our approach is much more efficient, with about 5.9 seconds less time per query.

Table 2. Performance Comparison in mAP and Time Cost on the 10-Million Dataset

Method         mAP (DupData)   mAP (CopyDays)   Query Time (s)
VVT            0.36            0.37             –
Reranking      0.45            0.49             –
Our approach   0.47            0.53             –

6 Conclusions

In this paper, we proposed a scheme to encode the spatial context of local features for efficient geometric verification in large-scale partial-duplicate image search. The generated spatial map encodes the relative spatial locations among the features in an image and captures invariance to translation, rotation, and scaling changes. Based on the spatial context coding maps, spatial verification can be performed efficiently to effectively discover spatially inconsistent feature matches between images.
Further, an enhancement scheme based on affine estimation was presented to refine the geometric verification results. As demonstrated on large-scale partial-duplicate Web image retrieval, spatial coding achieves competitive performance compared with existing algorithms, and its additional computational overhead over the VVT approach is small. Our method aims to identify images sharing some duplicated patches, and the geometric context verification scheme was designed for partial-duplicate Web image retrieval; it may not work well on object images with severe affine distortion from various observation views.

References

[1] Wu Z, Ke Q, Isard M, Sun J. Bundling features for large scale partial-duplicate web image search. In Proc. CVPR, June 2009.
[2] Xie H, Gao K, Zhang Y et al. Efficient feature detection and effective post-verification for large scale near-duplicate image search. IEEE Trans. Multimedia, 2011, 13(6).
[3] Xie L, Tian Q, Zhou W et al. Fast and accurate near-duplicate image search with affinity propagation on the ImageWeb. Computer Vision and Image Understanding, 2014, 124.
[4] Chu L, Jiang S, Wang S, Zhang Y, Huang Q. Robust spatial consistency graph model for partial duplicate image retrieval. IEEE Trans. Multimedia, 2013, 15(8).
[5] Sivic J, Zisserman A. Video Google: A text retrieval approach to object matching in videos. In Proc. the 9th IEEE Int. Conf. Computer Vision, Oct. 2003.
[6] Nister D, Stewenius H. Scalable recognition with a vocabulary tree. In Proc. CVPR, June 2006.
[7] Chum O, Philbin J, Sivic J, Isard M, Zisserman A. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. the 11th IEEE Int. Conf. Computer Vision, Oct. 2007, pp.1-8.
[8] Chum O, Philbin J, Zisserman A. Near duplicate image detection: Min-Hash and tf-idf weighting. In Proc. the 19th BMVC, Sept. 2008.
[9] Chum O, Perdoch M, Matas J. Geometric min-hashing: Finding a (thick) needle in a haystack. In Proc. CVPR, June 2009.
[10] Jegou H, Douze M, Schmid C. Hamming embedding and weak geometric consistency for large scale image search. In Proc. the 10th ECCV, Oct. 2008.
[11] Philbin J, Chum O, Isard M, Sivic J, Zisserman A. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, June 2007, pp.1-8.
[12] Philbin J, Chum O, Isard M et al. Lost in quantization: Improving particular object retrieval in large scale image databases. In Proc. CVPR, June 2008, pp.1-8.
[13] Jégou H, Douze M, Schmid C, Pérez P. Aggregating local descriptors into a compact image representation. In Proc. CVPR, June 2010.
[14] Zhang Y, Jia Z, Chen T. Image retrieval with geometry-preserving visual phrases. In Proc. CVPR, June 2011.
[15] Zhou W, Lu Y, Li H, Song Y, Tian Q. Spatial coding for large scale partial-duplicate Web image search. In Proc. Int. Conf. Multimedia, Oct. 2010.
[16] Zheng L, Wang S, Liu Z, Tian Q. Lp-Norm IDF for large scale image search. In Proc. CVPR, June 2013.
[17] Xie H, Zhang Y, Tan J, Guo L, Li J. Contextual query expansion for image retrieval. IEEE Trans. Multimedia, 2014, 16(4).
[18] Liu Z, Li H, Zhou W, Zhao R, Tian Q. Contextual hashing for large-scale image search. IEEE Trans. Image Processing, 2014, 23(4).
[19] Lowe D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2).
[20] Zhou W, Lu Y, Li H, Tian Q. Scalar quantization for large scale image search. In Proc. the 20th ACM Multimedia, Oct. 2012.
[21] Babenko A, Lempitsky V. The inverted multi-index. In Proc. CVPR, June 2012.
[22] Shen X, Lin Z, Brandt J et al. Object retrieval and localization with spatially-constrained similarity measure and k-NN re-ranking. In Proc. CVPR, June 2012.
[23] Zhou W, Li H, Lu Y, Tian Q. SIFT match verification by geometric coding for large-scale partial-duplicate Web image search. ACM Trans. Multimedia Computing, Communications, and Applications, 2013, 9(1): Article No. 4.
[24] Wang W, Zhang D, Zhang Y, Li J, Gu X. Robust spatial matching for object retrieval and its parallel implementation on GPU. IEEE Trans. Multimedia, 2011, 13(6).
[25] Chum O, Mikulik A, Perdoch M, Matas J. Total recall II: Query expansion revisited. In Proc. CVPR, June 2011.
[26] Zhou W, Li H, Lu Y, Wang M, Tian Q. Visual word expansion and BSIFT verification for large-scale image search. Multimedia Systems, DOI 10.1007/s….
[27] Zhang S, Yang M, Wang X et al. Semantic-aware co-indexing for image retrieval. In Proc. ICCV, 2013.
[28] Zhang S, Tian Q, Lu K et al. Edge-SIFT: Discriminative binary descriptor for scalable partial-duplicate mobile search. IEEE Trans. Image Processing, 2013, 22(7).
[29] Jégou H, Harzallah H, Schmid C. A contextual dissimilarity measure for accurate and efficient image search. In Proc. CVPR, June 2007, pp.1-8.
[30] Zhou W, Yang M, Li H, Wang X, Lin Y, Tian Q. Towards codebook-free: Scalable cascaded hashing for mobile image search. IEEE Trans. Multimedia, 2014, 16(3).
[31] Arandjelovic R, Zisserman A. Three things everyone should know to improve object retrieval. In Proc. CVPR, June 2012.
[32] Zhang X, Zhang L, Shum H Y. QsRank: Query-sensitive hash code ranking for efficient ε-neighbor search. In Proc. CVPR, June 2012.
[33] Fischler M A, Bolles R C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 1981, 24(6).
[34] Chum O, Matas J. Matching with PROSAC: Progressive sample consensus. In Proc. CVPR, June 2005.
[35] Smith J R, Chang S F. VisualSEEk: A fully automated content-based image query system. In Proc. the 4th ACM Multimedia, Nov. 1996.
[36] Chang S, Shi Q, Yan C. Iconic indexing by 2-D strings. IEEE Trans. Pattern Analysis and Machine Intelligence, 1987, 9(3).
[37] Matas J, Chum O, Urban M, Pajdla T. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 2004, 22(10).
[38] Chum O, Philbin J, Isard M et al. Scalable near identical image and shot detection. In Proc. CIVR, July 2007.
[39] Lazebnik S, Schmid C, Ponce J. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. CVPR, June 2006.
[40] Belongie S, Malik J, Puzicha J. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Analysis and Machine Intelligence, 2002, 24(4).
[41] Savarese S, Winn J, Criminisi A. Discriminative object class models of appearance and shape by correlatons. In Proc. CVPR, June 2006.
[42] Yuan J, Wu Y, Yang M. Discovery of collocation patterns: From visual words to visual phrases. In Proc. CVPR, June 2007, pp.1-8.
[43] Zhang Y, Chen T. Efficient kernels for identifying unbounded-order spatial features. In Proc. CVPR, June 2009.
[44] Deng J, Dong W, Socher R et al. ImageNet: A large-scale hierarchical image database. In Proc. CVPR, June 2009.

Wen-Gang Zhou received his B.E. degree in electronic information engineering from Wuhan University in 2006, and his Ph.D. degree in electronic engineering and information science from the University of Science and Technology of China (USTC), Hefei, in 2011. He was a research intern in the Internet Media Group of Microsoft Research Asia from December 2008 to August …. From September 2011 to September 2013, he worked as a postdoctoral researcher in the Computer Science Department of the University of Texas at San Antonio. He is currently an associate professor in the Department of Electronic Engineering and Information Science, USTC. His research interests include multimedia information retrieval and computer vision.

Hou-Qiang Li received his B.S., M.E., and Ph.D. degrees from the University of Science and Technology of China (USTC), Hefei, in 1992, 1997, and 2000, respectively, all in electronic engineering. He is currently a professor in the Department of Electronic Engineering and Information Science, USTC. His research interests include video coding and communications, image/video analysis, and computer vision. His research has been supported by the National Natural Science Foundation of China (NSFC), the National High Technology Research and Development 863 Program of China, Microsoft, Nokia, and Huawei. He is an associate editor of IEEE Transactions on Circuits and Systems for Video Technology and is on the editorial board of the Journal of Multimedia. He has served on technical/program committees and organizing committees, and as program co-chair and track/session chair, for over 10 international conferences.

Yijuan Lu is an assistant professor in the Department of Computer Science, Texas State University. She received her Ph.D. degree in computer science from the University of Texas at San Antonio in …. During …, she was a summer intern researcher at the FXPAL Lab; the Web Search and Mining Group, Microsoft Research Asia (MSRA); and the National Resource for Biomedical Supercomputing (NRBSC) at the Pittsburgh Supercomputing Center (PSC), Pittsburgh. She was an intern researcher at the Media Technologies Lab, Hewlett-Packard Laboratories (HP), in 2008, and a research fellow of the Multimodal Information Access and Synthesis (MIAS) Center at the University of Illinois at Urbana-Champaign (UIUC). Her current research interests include multimedia information retrieval, computer vision, machine learning, data mining, and bioinformatics. She has published extensively and has served as a reviewer for top conferences and journals. She was a 2007 Best Paper Candidate in the Retrieval Track of the Pacific-Rim Conference on Multimedia (PCM) and the recipient of the 2007 Prestigious HEB Dissertation Fellowship and the 2007 Star of Tomorrow Internship Program of MSRA. She is a member of ACM and IEEE.

Qi Tian received his B.E. degree in electronic engineering from Tsinghua University, Beijing, in 1992, his M.S. degree in electrical and computer engineering from Drexel University in 1996, and his Ph.D. degree in electrical and computer engineering from the University of Illinois, Urbana-Champaign, in …. He is currently a professor in the Department of Computer Science at the University of Texas at San Antonio (UTSA). Dr. Tian's research interests include multimedia information retrieval and computer vision. He has published over 250 refereed journal and conference papers. His research projects have been funded by NSF, ARO, DHS, SALSI, CIAS, and UTSA, and he has received faculty research awards from Google, NEC Laboratories of America, FXPAL, Akiira Media Systems, and HP Labs. He took a one-year faculty leave at Microsoft Research Asia (MSRA). He was a co-author of an ACM ICIMCS 2012 Best Paper, an MMM 2013 Best Paper, a Top 10% Paper Award at MMSP 2011, a Best Student Paper at ICASSP 2006, and a Best Paper at PCM …. He received the 2010 ACM Service Award. Dr. Tian has been serving as program chair, organizing committee member, and TPC member for numerous IEEE and ACM conferences, including ACM Multimedia, SIGIR, ICCV, ICME, etc.
He has been a guest editor of IEEE Transactions on Multimedia, Computer Vision and Image Understanding, Pattern Recognition Letters, EURASIP Journal on Advances in Signal Processing, and Journal of Visual Communication and Image Representation, and is on the editorial boards of IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Multimedia (TMM), the Journal of Multimedia (JMM), and Machine Vision and Applications (MVA).


More information

Speeding up the Detection of Line Drawings Using a Hash Table

Speeding up the Detection of Line Drawings Using a Hash Table Speeding up the Detection of Line Drawings Using a Hash Table Weihan Sun, Koichi Kise 2 Graduate School of Engineering, Osaka Prefecture University, Japan sunweihan@m.cs.osakafu-u.ac.jp, 2 kise@cs.osakafu-u.ac.jp

More information

Large-scale Image Search and location recognition Geometric Min-Hashing. Jonathan Bidwell

Large-scale Image Search and location recognition Geometric Min-Hashing. Jonathan Bidwell Large-scale Image Search and location recognition Geometric Min-Hashing Jonathan Bidwell Nov 3rd 2009 UNC Chapel Hill Large scale + Location Short story... Finding similar photos See: Microsoft PhotoSynth

More information

Packing and Padding: Coupled Multi-index for Accurate Image Retrieval

Packing and Padding: Coupled Multi-index for Accurate Image Retrieval Packing and Padding: Coupled Multi-index for Accurate Image Retrieval Liang Zheng 1, Shengjin Wang 1, Ziqiong Liu 1, and Qi Tian 2 1 State Key Laboratory of Intelligent Technology and Systems; 1 Tsinghua

More information

Large Scale Image Retrieval

Large Scale Image Retrieval Large Scale Image Retrieval Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University in Prague Features Affine invariant features Efficient descriptors Corresponding regions

More information

IMAGE RETRIEVAL USING VLAD WITH MULTIPLE FEATURES

IMAGE RETRIEVAL USING VLAD WITH MULTIPLE FEATURES IMAGE RETRIEVAL USING VLAD WITH MULTIPLE FEATURES Pin-Syuan Huang, Jing-Yi Tsai, Yu-Fang Wang, and Chun-Yi Tsai Department of Computer Science and Information Engineering, National Taitung University,

More information

An Adaptive Threshold LBP Algorithm for Face Recognition

An Adaptive Threshold LBP Algorithm for Face Recognition An Adaptive Threshold LBP Algorithm for Face Recognition Xiaoping Jiang 1, Chuyu Guo 1,*, Hua Zhang 1, and Chenghua Li 1 1 College of Electronics and Information Engineering, Hubei Key Laboratory of Intelligent

More information

Efficient Representation of Local Geometry for Large Scale Object Retrieval

Efficient Representation of Local Geometry for Large Scale Object Retrieval Efficient Representation of Local Geometry for Large Scale Object Retrieval Michal Perďoch Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University in Prague IEEE Computer Society

More information

Video annotation based on adaptive annular spatial partition scheme

Video annotation based on adaptive annular spatial partition scheme Video annotation based on adaptive annular spatial partition scheme Guiguang Ding a), Lu Zhang, and Xiaoxu Li Key Laboratory for Information System Security, Ministry of Education, Tsinghua National Laboratory

More information

Instance-level recognition II.

Instance-level recognition II. Reconnaissance d objets et vision artificielle 2010 Instance-level recognition II. Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique, Ecole Normale

More information

Hamming embedding and weak geometric consistency for large scale image search

Hamming embedding and weak geometric consistency for large scale image search Hamming embedding and weak geometric consistency for large scale image search Herve Jegou, Matthijs Douze, and Cordelia Schmid INRIA Grenoble, LEAR, LJK firstname.lastname@inria.fr Abstract. This paper

More information

Specular 3D Object Tracking by View Generative Learning

Specular 3D Object Tracking by View Generative Learning Specular 3D Object Tracking by View Generative Learning Yukiko Shinozuka, Francois de Sorbier and Hideo Saito Keio University 3-14-1 Hiyoshi, Kohoku-ku 223-8522 Yokohama, Japan shinozuka@hvrl.ics.keio.ac.jp

More information

Aggregating Descriptors with Local Gaussian Metrics

Aggregating Descriptors with Local Gaussian Metrics Aggregating Descriptors with Local Gaussian Metrics Hideki Nakayama Grad. School of Information Science and Technology The University of Tokyo Tokyo, JAPAN nakayama@ci.i.u-tokyo.ac.jp Abstract Recently,

More information

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science. Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ 1 Object Recognition in Large Databases Some material for these slides comes from www.cs.utexas.edu/~grauman/courses/spring2011/slides/lecture18_index.pptx

More information

Three things everyone should know to improve object retrieval. Relja Arandjelović and Andrew Zisserman (CVPR 2012)

Three things everyone should know to improve object retrieval. Relja Arandjelović and Andrew Zisserman (CVPR 2012) Three things everyone should know to improve object retrieval Relja Arandjelović and Andrew Zisserman (CVPR 2012) University of Oxford 2 nd April 2012 Large scale object retrieval Find all instances of

More information

Volume 2, Issue 6, June 2014 International Journal of Advance Research in Computer Science and Management Studies

Volume 2, Issue 6, June 2014 International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 6, June 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Internet

More information

Compressed local descriptors for fast image and video search in large databases

Compressed local descriptors for fast image and video search in large databases Compressed local descriptors for fast image and video search in large databases Matthijs Douze2 joint work with Hervé Jégou1, Cordelia Schmid2 and Patrick Pérez3 1: INRIA Rennes, TEXMEX team, France 2:

More information

DeepIndex for Accurate and Efficient Image Retrieval

DeepIndex for Accurate and Efficient Image Retrieval DeepIndex for Accurate and Efficient Image Retrieval Yu Liu, Yanming Guo, Song Wu, Michael S. Lew Media Lab, Leiden Institute of Advance Computer Science Outline Motivation Proposed Approach Results Conclusions

More information

Instance Search with Weak Geometric Correlation Consistency

Instance Search with Weak Geometric Correlation Consistency Instance Search with Weak Geometric Correlation Consistency Zhenxing Zhang, Rami Albatal, Cathal Gurrin, and Alan F. Smeaton Insight Centre for Data Analytics School of Computing, Dublin City University

More information

Robot localization method based on visual features and their geometric relationship

Robot localization method based on visual features and their geometric relationship , pp.46-50 http://dx.doi.org/10.14257/astl.2015.85.11 Robot localization method based on visual features and their geometric relationship Sangyun Lee 1, Changkyung Eem 2, and Hyunki Hong 3 1 Department

More information

K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors

K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors Shao-Tzu Huang, Chen-Chien Hsu, Wei-Yen Wang International Science Index, Electrical and Computer Engineering waset.org/publication/0007607

More information

Efficient Re-ranking in Vocabulary Tree based Image Retrieval

Efficient Re-ranking in Vocabulary Tree based Image Retrieval Efficient Re-ranking in Vocabulary Tree based Image Retrieval Xiaoyu Wang 2, Ming Yang 1, Kai Yu 1 1 NEC Laboratories America, Inc. 2 Dept. of ECE, Univ. of Missouri Cupertino, CA 95014 Columbia, MO 65211

More information

FACULTY OF ENGINEERING AND INFORMATION TECHNOLOGY DEPARTMENT OF COMPUTER SCIENCE. Project Plan

FACULTY OF ENGINEERING AND INFORMATION TECHNOLOGY DEPARTMENT OF COMPUTER SCIENCE. Project Plan FACULTY OF ENGINEERING AND INFORMATION TECHNOLOGY DEPARTMENT OF COMPUTER SCIENCE Project Plan Structured Object Recognition for Content Based Image Retrieval Supervisors: Dr. Antonio Robles Kelly Dr. Jun

More information

Large-scale visual recognition Efficient matching

Large-scale visual recognition Efficient matching Large-scale visual recognition Efficient matching Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline!! Preliminary!! Locality Sensitive Hashing: the two modes!! Hashing!! Embedding!!

More information

Supplementary Material for Ensemble Diffusion for Retrieval

Supplementary Material for Ensemble Diffusion for Retrieval Supplementary Material for Ensemble Diffusion for Retrieval Song Bai 1, Zhichao Zhou 1, Jingdong Wang, Xiang Bai 1, Longin Jan Latecki 3, Qi Tian 4 1 Huazhong University of Science and Technology, Microsoft

More information

Image Retrieval with a Visual Thesaurus

Image Retrieval with a Visual Thesaurus 2010 Digital Image Computing: Techniques and Applications Image Retrieval with a Visual Thesaurus Yanzhi Chen, Anthony Dick and Anton van den Hengel School of Computer Science The University of Adelaide

More information

Automatic Ranking of Images on the Web

Automatic Ranking of Images on the Web Automatic Ranking of Images on the Web HangHang Zhang Electrical Engineering Department Stanford University hhzhang@stanford.edu Zixuan Wang Electrical Engineering Department Stanford University zxwang@stanford.edu

More information

Perception IV: Place Recognition, Line Extraction

Perception IV: Place Recognition, Line Extraction Perception IV: Place Recognition, Line Extraction Davide Scaramuzza University of Zurich Margarita Chli, Paul Furgale, Marco Hutter, Roland Siegwart 1 Outline of Today s lecture Place recognition using

More information

Video Google: A Text Retrieval Approach to Object Matching in Videos

Video Google: A Text Retrieval Approach to Object Matching in Videos Video Google: A Text Retrieval Approach to Object Matching in Videos Josef Sivic, Frederik Schaffalitzky, Andrew Zisserman Visual Geometry Group University of Oxford The vision Enable video, e.g. a feature

More information

Towards Visual Words to Words

Towards Visual Words to Words Towards Visual Words to Words Text Detection with a General Bag of Words Representation Rakesh Mehta Dept. of Signal Processing, Tampere Univ. of Technology in Tampere Ondřej Chum, Jiří Matas Centre for

More information

Randomized Visual Phrases for Object Search

Randomized Visual Phrases for Object Search Randomized Visual Phrases for Object Search Yuning Jiang Jingjing Meng Junsong Yuan School of EEE, Nanyang Technological University, Singapore {ynjiang, jingjing.meng, jsyuan}@ntu.edu.sg Abstract Accurate

More information

arxiv: v1 [cs.cv] 11 Feb 2014

arxiv: v1 [cs.cv] 11 Feb 2014 Packing and Padding: Coupled Multi-index for Accurate Image Retrieval Liang Zheng 1, Shengjin Wang 1, Ziqiong Liu 1, and Qi Tian 2 1 Tsinghua University, Beijing, China 2 University of Texas at San Antonio,

More information

Evaluation and comparison of interest points/regions

Evaluation and comparison of interest points/regions Introduction Evaluation and comparison of interest points/regions Quantitative evaluation of interest point/region detectors points / regions at the same relative location and area Repeatability rate :

More information

III. VERVIEW OF THE METHODS

III. VERVIEW OF THE METHODS An Analytical Study of SIFT and SURF in Image Registration Vivek Kumar Gupta, Kanchan Cecil Department of Electronics & Telecommunication, Jabalpur engineering college, Jabalpur, India comparing the distance

More information

By Suren Manvelyan,

By Suren Manvelyan, By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan,

More information

Instance-level recognition

Instance-level recognition Instance-level recognition 1) Local invariant features 2) Matching and recognition with local features 3) Efficient visual search 4) Very large scale indexing Matching of descriptors Matching and 3D reconstruction

More information

arxiv: v1 [cs.cv] 15 Mar 2014

arxiv: v1 [cs.cv] 15 Mar 2014 Geometric VLAD for Large Scale Image Search Zixuan Wang, Wei Di, Anurag Bhardwaj, Vignesh Jagadeesh, Robinson Piramuthu Dept.of Electrical Engineering, Stanford University, CA 94305 ebay Research Labs,

More information

IMPROVING VLAD: HIERARCHICAL CODING AND A REFINED LOCAL COORDINATE SYSTEM. Christian Eggert, Stefan Romberg, Rainer Lienhart

IMPROVING VLAD: HIERARCHICAL CODING AND A REFINED LOCAL COORDINATE SYSTEM. Christian Eggert, Stefan Romberg, Rainer Lienhart IMPROVING VLAD: HIERARCHICAL CODING AND A REFINED LOCAL COORDINATE SYSTEM Christian Eggert, Stefan Romberg, Rainer Lienhart Multimedia Computing and Computer Vision Lab University of Augsburg ABSTRACT

More information

A Novel Real-Time Feature Matching Scheme

A Novel Real-Time Feature Matching Scheme Sensors & Transducers, Vol. 165, Issue, February 01, pp. 17-11 Sensors & Transducers 01 by IFSA Publishing, S. L. http://www.sensorsportal.com A Novel Real-Time Feature Matching Scheme Ying Liu, * Hongbo

More information

Patch Descriptors. EE/CSE 576 Linda Shapiro

Patch Descriptors. EE/CSE 576 Linda Shapiro Patch Descriptors EE/CSE 576 Linda Shapiro 1 How can we find corresponding points? How can we find correspondences? How do we describe an image patch? How do we describe an image patch? Patches with similar

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

Visual Object Recognition

Visual Object Recognition Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial Visual Object Recognition Bastian Leibe Computer Vision Laboratory ETH Zurich Chicago, 14.07.2008 & Kristen Grauman Department

More information

Instance-level recognition

Instance-level recognition Instance-level recognition 1) Local invariant features 2) Matching and recognition with local features 3) Efficient visual search 4) Very large scale indexing Matching of descriptors Matching and 3D reconstruction

More information

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba Adding spatial information Forming vocabularies from pairs of nearby features doublets

More information

Nagoya University at TRECVID 2014: the Instance Search Task

Nagoya University at TRECVID 2014: the Instance Search Task Nagoya University at TRECVID 2014: the Instance Search Task Cai-Zhi Zhu 1 Yinqiang Zheng 2 Ichiro Ide 1 Shin ichi Satoh 2 Kazuya Takeda 1 1 Nagoya University, 1 Furo-Cho, Chikusa-ku, Nagoya, Aichi 464-8601,

More information

Visual Recognition and Search April 18, 2008 Joo Hyun Kim

Visual Recognition and Search April 18, 2008 Joo Hyun Kim Visual Recognition and Search April 18, 2008 Joo Hyun Kim Introduction Suppose a stranger in downtown with a tour guide book?? Austin, TX 2 Introduction Look at guide What s this? Found Name of place Where

More information

CS6670: Computer Vision

CS6670: Computer Vision CS6670: Computer Vision Noah Snavely Lecture 16: Bag-of-words models Object Bag of words Announcements Project 3: Eigenfaces due Wednesday, November 11 at 11:59pm solo project Final project presentations:

More information

A Comparison of SIFT, PCA-SIFT and SURF

A Comparison of SIFT, PCA-SIFT and SURF A Comparison of SIFT, PCA-SIFT and SURF Luo Juan Computer Graphics Lab, Chonbuk National University, Jeonju 561-756, South Korea qiuhehappy@hotmail.com Oubong Gwun Computer Graphics Lab, Chonbuk National

More information

Adaptive Binary Quantization for Fast Nearest Neighbor Search

Adaptive Binary Quantization for Fast Nearest Neighbor Search IBM Research Adaptive Binary Quantization for Fast Nearest Neighbor Search Zhujin Li 1, Xianglong Liu 1*, Junjie Wu 1, and Hao Su 2 1 Beihang University, Beijing, China 2 Stanford University, Stanford,

More information

Visual Word based Location Recognition in 3D models using Distance Augmented Weighting

Visual Word based Location Recognition in 3D models using Distance Augmented Weighting Visual Word based Location Recognition in 3D models using Distance Augmented Weighting Friedrich Fraundorfer 1, Changchang Wu 2, 1 Department of Computer Science ETH Zürich, Switzerland {fraundorfer, marc.pollefeys}@inf.ethz.ch

More information

Published in: MM '10: proceedings of the ACM Multimedia 2010 International Conference: October 25-29, 2010, Firenze, Italy

Published in: MM '10: proceedings of the ACM Multimedia 2010 International Conference: October 25-29, 2010, Firenze, Italy UvA-DARE (Digital Academic Repository) Landmark Image Retrieval Using Visual Synonyms Gavves, E.; Snoek, C.G.M. Published in: MM '10: proceedings of the ACM Multimedia 2010 International Conference: October

More information

Partial Copy Detection for Line Drawings by Local Feature Matching

Partial Copy Detection for Line Drawings by Local Feature Matching THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE. E-mail: 599-31 1-1 sunweihan@m.cs.osakafu-u.ac.jp, kise@cs.osakafu-u.ac.jp MSER HOG HOG MSER SIFT Partial

More information

Learning Affine Robust Binary Codes Based on Locality Preserving Hash

Learning Affine Robust Binary Codes Based on Locality Preserving Hash Learning Affine Robust Binary Codes Based on Locality Preserving Hash Wei Zhang 1,2, Ke Gao 1, Dongming Zhang 1, and Jintao Li 1 1 Advanced Computing Research Laboratory, Beijing Key Laboratory of Mobile

More information

Min-Hashing and Geometric min-hashing

Min-Hashing and Geometric min-hashing Min-Hashing and Geometric min-hashing Ondřej Chum, Michal Perdoch, and Jiří Matas Center for Machine Perception Czech Technical University Prague Outline 1. Looking for representation of images that: is

More information

CS 231A Computer Vision (Fall 2011) Problem Set 4

CS 231A Computer Vision (Fall 2011) Problem Set 4 CS 231A Computer Vision (Fall 2011) Problem Set 4 Due: Nov. 30 th, 2011 (9:30am) 1 Part-based models for Object Recognition (50 points) One approach to object recognition is to use a deformable part-based

More information

Local invariant features

Local invariant features Local invariant features Tuesday, Oct 28 Kristen Grauman UT-Austin Today Some more Pset 2 results Pset 2 returned, pick up solutions Pset 3 is posted, due 11/11 Local invariant features Detection of interest

More information

SHOT-BASED OBJECT RETRIEVAL FROM VIDEO WITH COMPRESSED FISHER VECTORS. Luca Bertinetto, Attilio Fiandrotti, Enrico Magli

SHOT-BASED OBJECT RETRIEVAL FROM VIDEO WITH COMPRESSED FISHER VECTORS. Luca Bertinetto, Attilio Fiandrotti, Enrico Magli SHOT-BASED OBJECT RETRIEVAL FROM VIDEO WITH COMPRESSED FISHER VECTORS Luca Bertinetto, Attilio Fiandrotti, Enrico Magli Dipartimento di Elettronica e Telecomunicazioni, Politecnico di Torino (Italy) ABSTRACT

More information

A Novel Extreme Point Selection Algorithm in SIFT

A Novel Extreme Point Selection Algorithm in SIFT A Novel Extreme Point Selection Algorithm in SIFT Ding Zuchun School of Electronic and Communication, South China University of Technolog Guangzhou, China zucding@gmail.com Abstract. This paper proposes

More information

Tag Based Image Search by Social Re-ranking

Tag Based Image Search by Social Re-ranking Tag Based Image Search by Social Re-ranking Vilas Dilip Mane, Prof.Nilesh P. Sable Student, Department of Computer Engineering, Imperial College of Engineering & Research, Wagholi, Pune, Savitribai Phule

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 2013 ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 2013 ISSN: Semi Automatic Annotation Exploitation Similarity of Pics in i Personal Photo Albums P. Subashree Kasi Thangam 1 and R. Rosy Angel 2 1 Assistant Professor, Department of Computer Science Engineering College,

More information

IMAGE MATCHING - ALOK TALEKAR - SAIRAM SUNDARESAN 11/23/2010 1

IMAGE MATCHING - ALOK TALEKAR - SAIRAM SUNDARESAN 11/23/2010 1 IMAGE MATCHING - ALOK TALEKAR - SAIRAM SUNDARESAN 11/23/2010 1 : Presentation structure : 1. Brief overview of talk 2. What does Object Recognition involve? 3. The Recognition Problem 4. Mathematical background:

More information

String distance for automatic image classification

String distance for automatic image classification String distance for automatic image classification Nguyen Hong Thinh*, Le Vu Ha*, Barat Cecile** and Ducottet Christophe** *University of Engineering and Technology, Vietnam National University of HaNoi,

More information

Mixtures of Gaussians and Advanced Feature Encoding

Mixtures of Gaussians and Advanced Feature Encoding Mixtures of Gaussians and Advanced Feature Encoding Computer Vision Ali Borji UWM Many slides from James Hayes, Derek Hoiem, Florent Perronnin, and Hervé Why do good recognition systems go bad? E.g. Why

More information

Beyond Bags of Features

Beyond Bags of Features : for Recognizing Natural Scene Categories Matching and Modeling Seminar Instructed by Prof. Haim J. Wolfson School of Computer Science Tel Aviv University December 9 th, 2015

More information

Patch Descriptors. CSE 455 Linda Shapiro

Patch Descriptors. CSE 455 Linda Shapiro Patch Descriptors CSE 455 Linda Shapiro How can we find corresponding points? How can we find correspondences? How do we describe an image patch? How do we describe an image patch? Patches with similar

More information

ImgSeek: Capturing User s Intent For Internet Image Search

ImgSeek: Capturing User s Intent For Internet Image Search ImgSeek: Capturing User s Intent For Internet Image Search Abstract - Internet image search engines (e.g. Bing Image Search) frequently lean on adjacent text features. It is difficult for them to illustrate

More information

Learning a Fine Vocabulary

Learning a Fine Vocabulary Learning a Fine Vocabulary Andrej Mikulík, Michal Perdoch, Ondřej Chum, and Jiří Matas CMP, Dept. of Cybernetics, Faculty of EE, Czech Technical University in Prague Abstract. We present a novel similarity

More information

Indexing local features and instance recognition May 16 th, 2017

Indexing local features and instance recognition May 16 th, 2017 Indexing local features and instance recognition May 16 th, 2017 Yong Jae Lee UC Davis Announcements PS2 due next Monday 11:59 am 2 Recap: Features and filters Transforming and describing images; textures,

More information

arxiv: v3 [cs.cv] 3 Oct 2012

arxiv: v3 [cs.cv] 3 Oct 2012 Combined Descriptors in Spatial Pyramid Domain for Image Classification Junlin Hu and Ping Guo arxiv:1210.0386v3 [cs.cv] 3 Oct 2012 Image Processing and Pattern Recognition Laboratory Beijing Normal University,

More information

Contour-Based Large Scale Image Retrieval

Contour-Based Large Scale Image Retrieval Contour-Based Large Scale Image Retrieval Rong Zhou, and Liqing Zhang MOE-Microsoft Key Laboratory for Intelligent Computing and Intelligent Systems, Department of Computer Science and Engineering, Shanghai

More information

Modern-era mobile phones and tablets

Modern-era mobile phones and tablets Anthony Vetro Mitsubishi Electric Research Labs Mobile Visual Search: Architectures, Technologies, and the Emerging MPEG Standard Bernd Girod and Vijay Chandrasekhar Stanford University Radek Grzeszczuk

More information

Query Adaptive Similarity for Large Scale Object Retrieval

Query Adaptive Similarity for Large Scale Object Retrieval Query Adaptive Similarity for Large Scale Object Retrieval Danfeng Qin Christian Wengert Luc van Gool ETH Zürich, Switzerland {qind,wengert,vangool}@vision.ee.ethz.ch Abstract Many recent object retrieval

More information

An efficient face recognition algorithm based on multi-kernel regularization learning

An efficient face recognition algorithm based on multi-kernel regularization learning Acta Technica 61, No. 4A/2016, 75 84 c 2017 Institute of Thermomechanics CAS, v.v.i. An efficient face recognition algorithm based on multi-kernel regularization learning Bi Rongrong 1 Abstract. A novel

More information

Query-Specific Visual Semantic Spaces for Web Image Re-ranking

Query-Specific Visual Semantic Spaces for Web Image Re-ranking Query-Specific Visual Semantic Spaces for Web Image Re-ranking Xiaogang Wang 1 1 Department of Electronic Engineering The Chinese University of Hong Kong xgwang@ee.cuhk.edu.hk Ke Liu 2 2 Department of

More information

Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis

Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis Sensors & Transducers 2014 by IFSA Publishing, S. L. http://www.sensorsportal.com Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis 1 Xulin LONG, 1,* Qiang CHEN, 2 Xiaoya

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

A Novel Method for Image Retrieving System With The Technique of ROI & SIFT

A Novel Method for Image Retrieving System With The Technique of ROI & SIFT A Novel Method for Image Retrieving System With The Technique of ROI & SIFT Mrs. Dipti U.Chavan¹, Prof. P.B.Ghewari² P.G. Student, Department of Electronics & Tele. Comm. Engineering, Ashokrao Mane Group

More information

Recognition of Multiple Characters in a Scene Image Using Arrangement of Local Features

Recognition of Multiple Characters in a Scene Image Using Arrangement of Local Features 2011 International Conference on Document Analysis and Recognition Recognition of Multiple Characters in a Scene Image Using Arrangement of Local Features Masakazu Iwamura, Takuya Kobayashi, and Koichi

More information

State-of-the-Art: Transformation Invariant Descriptors. Asha S, Sreeraj M

State-of-the-Art: Transformation Invariant Descriptors. Asha S, Sreeraj M International Journal of Scientific & Engineering Research, Volume 4, Issue ş, 2013 1994 State-of-the-Art: Transformation Invariant Descriptors Asha S, Sreeraj M Abstract As the popularity of digital videos

More information

Evaluation of GIST descriptors for web-scale image search

Evaluation of GIST descriptors for web-scale image search Evaluation of GIST descriptors for web-scale image search Matthijs Douze, Hervé Jégou, Sandhawalia Harsimrat, Laurent Amsaleg, Cordelia Schmid To cite this version: Matthijs Douze, Hervé Jégou, Sandhawalia

More information

A Novel Algorithm for Color Image matching using Wavelet-SIFT

A Novel Algorithm for Color Image matching using Wavelet-SIFT International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 A Novel Algorithm for Color Image matching using Wavelet-SIFT Mupuri Prasanth Babu *, P. Ravi Shankar **

More information

IMPROVED FACE RECOGNITION USING ICP TECHNIQUES INCAMERA SURVEILLANCE SYSTEMS. Kirthiga, M.E-Communication system, PREC, Thanjavur

IMPROVED FACE RECOGNITION USING ICP TECHNIQUES INCAMERA SURVEILLANCE SYSTEMS. Kirthiga, M.E-Communication system, PREC, Thanjavur IMPROVED FACE RECOGNITION USING ICP TECHNIQUES INCAMERA SURVEILLANCE SYSTEMS Kirthiga, M.E-Communication system, PREC, Thanjavur R.Kannan,Assistant professor,prec Abstract: Face Recognition is important

More information

Indexing local features and instance recognition May 14 th, 2015

Indexing local features and instance recognition May 14 th, 2015 Indexing local features and instance recognition May 14 th, 2015 Yong Jae Lee UC Davis Announcements PS2 due Saturday 11:59 am 2 We can approximate the Laplacian with a difference of Gaussians; more efficient

More information

arxiv: v1 [cs.cv] 26 Dec 2013

arxiv: v1 [cs.cv] 26 Dec 2013 Finding More Relevance: Propagating Similarity on Markov Random Field for Image Retrieval Peng Lu a, Xujun Peng b, Xinshan Zhu c, Xiaojie Wang a arxiv:1312.7085v1 [cs.cv] 26 Dec 2013 a Beijing University

More information

Computer Vision for HCI. Topics of This Lecture

Computer Vision for HCI. Topics of This Lecture Computer Vision for HCI Interest Points Topics of This Lecture Local Invariant Features Motivation Requirements, Invariances Keypoint Localization Features from Accelerated Segment Test (FAST) Harris Shi-Tomasi

More information

Improving Recognition through Object Sub-categorization

Improving Recognition through Object Sub-categorization Improving Recognition through Object Sub-categorization Al Mansur and Yoshinori Kuno Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Saitama 338-8570,

More information

Lecture 14: Indexing with local features. Thursday, Nov 1 Prof. Kristen Grauman. Outline

Lecture 14: Indexing with local features. Thursday, Nov 1 Prof. Kristen Grauman. Outline Lecture 14: Indexing with local features Thursday, Nov 1 Prof. Kristen Grauman Outline Last time: local invariant features, scale invariant detection Applications, including stereo Indexing with invariant

More information