Query Expansion for Hash-based Image Object Retrieval
Yin-Hsi Kuo 1, Kuan-Ting Chen 2, Chien-Hsing Chiang 1, Winston H. Hsu 1,2
1 Dept. of Computer Science and Information Engineering, 2 Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan

ABSTRACT
An efficient indexing method is essential for content-based image retrieval given the exponential growth of large-scale videos and photos. Recently, hash-based methods (e.g., locality sensitive hashing, LSH) have been shown to be efficient for similarity search. We extend such hash-based methods to retrieve images represented by bags of (high-dimensional) feature points. Though promising, hash-based image object search suffers from low recall rates. To boost the hash-based search quality, we propose two novel expansion strategies: intra-expansion and inter-expansion. The former expands more target feature points similar to those in the query, while the latter mines feature points that should co-occur with the search targets but are not present in the query. We further exploit variations of the proposed methods. Experimenting on two consumer-photo benchmarks, we show that the proposed expansion methods are complementary to each other and can collaboratively contribute up to 76.3% (average) relative improvement over the original hash-based method.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval models

General Terms
Algorithms, Experimentation, Performance

Keywords
Locality sensitive hashing (LSH), Query expansion

1. INTRODUCTION
The exponential growth of photos and videos, whether on media-sharing sites, in business stockings, or in personal collections, has created the need for efficient content-based image retrieval (CBIR), which helps locate similar images in large-scale collections. In recent years, researchers have focused on the more challenging problem of image object retrieval [21][23].
Image object retrieval aims to retrieve images that contain the visual object shown in the object query image, for example, searching for images that contain the Torre Pendente di Pisa or the Starbucks logo (cf. Figure 7(c)(d)). Such techniques also motivate many promising applications, such as exploring photo collections in 3D [25], photo-based question answering [31], video advertising by image matching [21], annotation by search [27], etc. Traditional solutions for CBIR employ global low-level features like color and texture. By selecting proper feature representations and distance metrics, the similarities between the query and database images can be calculated and a ranking list generated accordingly. Prominent CBIR systems that use these techniques include QBIC (Query By Image Content) [15] and VisualSEEK [24], etc. However, for image object retrieval, the query can be a full image or just part of the whole image, which we call an object-level image. As shown in Figure 2(a), the red rectangle represents an object-level query image. Similarly, the object may occupy only part of the target database images. Global color and texture features become limited under these conditions. To capture the local image information that is essential in object retrieval, Lowe proposed the scale-invariant feature transform (SIFT) [19]. The SIFT feature works by first detecting salient regions in an image and then describing each region with a 128-d feature vector.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM '09, October 19-24, 2009, Beijing, China. Copyright 2009 ACM /09/10...$
Its advantage is that both the spatial and appearance information of each local region is recorded, with built-in invariance to modest changes in object scale or camera viewpoint. As a result, an image can be viewed as a bag of feature points, and any object within the image is a subset of those points. With the bag-of-feature-points representation, image retrieval is carried out by comparing all feature points in the query image to those from all images in the database. Since an image typically contains hundreds to thousands of feature points, an image database can easily have millions or even billions of points. Image retrieval based on the bag-of-feature-points representation is therefore a large-scale many-to-many matching problem in a high-dimensional space, and indexing the data so that similarity search can be performed both effectively and efficiently is crucial. A hash-based method known as locality sensitive hashing (LSH) [16] has been shown to perform similarity search successfully on various types of data, including text [26], audio [8][6], images [18], and videos [13]. LSH has the important characteristic that, given a distance metric, hash functions are designed so that similar data have a much higher probability of hashing into the same bucket than dissimilar data. As a result, when we evaluate a query, only a small portion of the data points, those that hash into the same bucket as the query point, need to be examined. Also, multiple hash functions and hash tables are employed to improve robustness against false negatives. LSH runs in sublinear time and scales well with data dimensionality [16]; see Section 3 for more details. The combination of the SIFT feature and LSH has been shown effective in image duplicate detection [18]. Figure 1(a) illustrates searching by LSH over the bag-of-feature-points representation. The red rectangle is the object query image, which contains four query feature points (A, B, C, and D).
Each query point is used to retrieve database points that collide (hash into the same bucket) with it in any of the hash tables. After examining all query points, the retrieved database points are analyzed to identify the database images associated with them and the degree of relevance of these images to the query. For simplicity of illustration, relevance here is defined as the number of matching feature point pairs between the query and database images. Although this retrieval model succeeds at retrieving target images with high precision, it suffers from low recall rates. Figure 2(c) shows a real retrieval case with basic LSH. While most of the top-ranked images are correct, some false positives rank high in the list, too. The recall is low because even though LSH guarantees a high probability of hashing similar data into the same buckets, this probability is not 1. It is unavoidable that some similar features are hashed into neighboring buckets, and LSH fails to retrieve them at query time. Also, there may be features that strongly characterize the query object but are not present in the query image or simply appear very different from the query features. This can result from changes in lighting conditions, camera angle, occlusion effects, etc.

Figure 1: Query expansion for hash-based image retrieval. (a) An illustration of search by hashing over bags of feature points (i.e., {A, B, C, D}) and ranking by the number of matched pairs (e.g., 4, 2, 2, 1, 1); the query image is on the left. Since only the exact (same-bucket) feature points are considered, the original LSH suffers from a low recall rate. We tackle the problem by proposing two expansion strategies for hashing. (b) Intra-expansion expands more target feature points (e.g., A', A'', B', C', etc.) similar to those in the query but mis-hashed to other buckets. (c) Inter-expansion mines those feature points (e.g., {E, F, G}) that shall co-occur with the search targets by exploiting the related hash buckets from the initial search result; more diverse (and related) results can be retrieved. Note that the two expansion strategies can be combined iteratively to boost the search results, and the expansions are realized efficiently by merely looking up the hash buckets.

To tackle this problem, we introduce the query expansion 1 technique for hash-based image object retrieval. We propose two expansion strategies: intra-expansion and inter-expansion. Intra-expansion uses existing query features to obtain more similar features as matching targets. For example, in Figure 1(b), features A' and A'' are similar to feature A (their distance to A is smaller than a threshold) but are not found by basic LSH. Intra-expansion discovers A' and A'' by inspecting buckets neighboring A's, and once identified they can be used as target features. Inter-expansion obtains new query features that are not present in the query image but are mined from the initial search results. To do this, we choose some likely correct images (e.g., top-ranked images) and issue their relevant features as new queries. For example, in Figure 1(c), features E, F, and G are not present in the query image, but they appear alongside correctly matched target features in the initial search results. By inter-expansion, they are considered relevant to the query and issued as new queries. Their query results are factored into the final rank list returned to the user.

Intra- and inter-expansion expand two different types of new queries. Both can improve retrieval performance significantly, and they can be combined iteratively to obtain even better results. Note that such expansion methods are realized by searching over hash tables and buckets only and incur little extra time while obtaining new query features. Experimenting over photo search benchmarks, we show that the proposed methods can achieve up to 76.3% (average) relative improvement over the state-of-the-art hash-based image search method (i.e., [18]); the recall is greatly improved as well, as shown in Figure 2.

The primary contributions of this paper include:
- The first proposal of intra- and inter-expansion for hash-based image object retrieval.
- Investigating effective and efficient algorithms for implementing the proposed expansion methods (Section 4).
- Conducting extensive experiments on large-scale benchmarks (Sections 5 and 6) and exemplifying the significant gains of the proposed methods.

1 In the information retrieval community, query expansion involves evaluating a user's input (the words typed into the search query area and sometimes other types of data) and expanding the search query to match additional documents [28].

2. RELATED WORK
In most multimedia retrieval works, researchers face two important issues: the feature representation and the means to calculate similarity between data objects effectively and efficiently. Repeatedly solving the nearest neighbor (NN) problem between features is often required while addressing these issues. Researchers have argued and proved that by allowing a small error bound in solving NN, time efficiency can be greatly improved while performance degradation remains acceptable [4]. NN with the addition of an error bound is known as the approximate nearest neighbor (ANN) problem. Two popular methods for solving ANN are KD-trees [4] and locality sensitive hashing (LSH) [16].

A KD-tree is a search tree that splits according to the data distribution along a single dimension at a time. Data points are stored at leaf nodes, and internal nodes store splitting criteria that can be used to speed up the verification of pruning opportunities. However, when the number of data dimensions exceeds 20, KD-trees suffer from the curse of dimensionality and their pruning mechanism becomes ineffective [16]. On the other hand, LSH has gained popularity in recent years because of its ability to deal with data features in an even higher number of dimensions while keeping the running time satisfactory. The basic idea of LSH is to hash data points such that close points in the feature space have a higher chance of collision than far-apart points. LSH was first proposed in [17] along with a rigorous mathematical description. [16] implements LSH in Hamming space and evaluates its performance with low-level multimedia features in high dimensions. Ke et al. [18] apply LSH in Hamming space to a variant of the SIFT feature and
achieved very good performance in detecting near-duplicate images. However, the ground truth images in their experimental dataset were obtained by performing various image transformations on the query images. In consumer photos, rather than duplicates, the ground truth images are photographs of real-world objects taken under much more diversified capturing conditions (as depicted in Figure 7). Meanwhile, we are interested in retrieving such designated objects in consumer photos. This results in far greater differences in feature points for the same object. A direct LSH approach to finding similar points is not sufficient to deal with the vast variability present in consumer photos. We introduce query expansion techniques to mend this problem.

Figure 2: For query image (a), basic LSH returns result (c), which has high precision but low recall. With the proposed query expansion methods, more accurate and diverse results can be retrieved, as shown in (d), (e), and (f). The number below each image is its rank in the retrieval result; the number in parentheses is its rank with basic LSH. (b) compares the PR curves for these results. It is clear that the proposed expansion methods can greatly improve search performance.

LSH was later extended to work with Euclidean distance by adopting a class of p-stable distribution functions [12]. E2LSH [1] is a publicly available software package that implements the algorithms in Euclidean space. One disadvantage of LSH is that, by using multiple hash tables, the space and time requirements become a burden. The authors of [20] propose multi-probe LSH, where multiple buckets in a hash table are inspected to retrieve more diversified results; in exchange, fewer hash tables and less running time are needed to achieve the same performance. [13] employs LSH to project feature points to an auxiliary space and represents images as random histograms therein. Support vector machines (SVMs) are used to learn classifiers for object classes in this auxiliary space. The use of random histograms bypasses the many-to-many matching problem that arises when features are compared directly for similarity. However, the random histogram is a global summary of local features that does not preserve important spatial and appearance information, and spatial verification is not possible. While appropriate for classification, it is not suitable for image object retrieval, where the local distribution of features is the key rather than global compositions.

Another popular approach in object retrieval is to use the bag-of-visual-words representation and adopt traditional retrieval techniques for textual words [23]. In this approach, a set of training feature points is clustered and the centroid of each resulting cluster is defined as a visual word. Once a set of visual words is obtained, every feature can be represented by its most similar visual word. An image is thus a bag of visual words, described by a frequency histogram of the visual words, which in turn is used to calculate similarity scores. Due to the nature of clustering algorithms, the quantization of feature points into visual words can be a noisy process [11]. To attenuate the problem, [33][34] both consider quantization-related (soft-assignment) visual words when calculating the global visual word histogram. Object retrieval with the visual word model still suffers from low recall rates. The authors of [11] proposed multiple-image-resolution expansion, where correct entries in the initial retrieval result are analyzed to obtain latent images, which are estimated visual word histograms of the query object as if shot under a different resolution. The latent images are issued as new queries to retrieve more diversified results. This is similar to the inter-expansion proposed in our work, in that both expand by the characteristics of verified correct images in the initial result. The difference is that instead of using estimated visual word histograms as new queries, we use feature points that are verified to be important for the query object. Also, a visual word histogram is an aggregation of local features. If an image has multiple salient objects or a complex background with many interfering visual words, its histogram does not describe any object well. By working with the SIFT features directly, we avoid introducing extra noise from the quantization of SIFT features or the aggregation of visual words. Meanwhile, we propose novel expansion methods integrating both inter-expansion and intra-expansion strategies, which will be shown to be complementary to each other and to significantly improve hash-based image object retrieval (up to 76.3%).

3. SYSTEM OVERVIEW AND LSH
For image object retrieval, we adopt the bag-of-feature-points image representation and employ a hash-based indexing method. To improve search recall, we extend the hash method (i.e., LSH) with two novel expansion strategies. We first provide an LSH overview in Section 3.1 and explain the matching for bags of feature points in Section 3.2. Two ranking criteria for image object retrieval are introduced in Section 3.3. Based on them, we detail the hash-based query expansion methods in Section 4.

3.1 LSH Overview
LSH [16] has been shown effective and efficient for many multimedia (high-dimensional) retrieval applications [8][6][18][13]. The essence is the hash functions, which ensure that similar data have a much higher probability of hashing into the same bucket than dissimilar data. There are plenty of hash functions that can achieve this goal, such as transformation into Hamming space [17][16], L1 distance [3], min-wise hashing [5], random projection [10], or the stable distribution method [12], etc. Applicable to the L2 distance metric, the stable distribution hash function [12] is widely adopted. In this work, without loss of generality, we base our approach on the hash framework of [12] and its implementation [1].
Figure 3: An illustration of LSH with feature points (K=2, L=2, cf. Section 3.1); e.g., the triangle is hashed into bucket (2, 2) in hash table 1 and bucket (1, 2) in hash table 2. It also shows the need for intra-expansion. Assume feature points with the same shape (or color) are from the same image. Given the star as a query feature, we can retrieve the two features that collide with it. A third point is actually missed because it resides in a nearby bucket, though it is still within (Euclidean) distance R of the query. This example motivates intra-expansion by further checking neighboring buckets (cf. Section 4.1.2) or using co-occurrence across hash tables (cf. Section 4.1.1); for example, a point missed in hash table 2 can be retrieved through a point that collided with the query in hash table 1.

For LSH with the stable distribution method [12], the hash function h(v) is defined as follows:

    h(v) = ⌊ (a · v + b) / W ⌋        (1)

where v represents a feature point, a is a vector sampled from a Gaussian distribution, W is the window size, and b is a random offset ranging from 0 to W. h(v) computes the inner product a · v, projecting the input feature point v onto a, and then windows the result by W, as illustrated in Figure 3. Generally, K hash functions are concatenated to provide discriminability among high-dimensional data points. For a large K, two feature points are very likely close to each other (in the original feature space) if they are still hashed to the same bucket through all K hash functions. However, false negatives, i.e., true neighbors mis-hashed to other buckets, are commonly observed; for example, see the feature points in bucket (2, 2) and bucket (3, 2) of hash table 1 in Figure 3; the two feature points are actually close to each other but hashed to different buckets. To remedy this problem, multiple hash tables (parameterized by L) are used to improve robustness against such false negatives; for example, the two points collide in the same bucket in hash table 2.
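The stable-distribution scheme of Eq. (1), with K concatenated hash functions per table and L tables, can be sketched as follows. This is a minimal illustration, not the E2LSH implementation used in the paper; the class name, parameter defaults, and pure-Python data layout are all assumptions for clarity:

```python
import random
from collections import defaultdict

class StableLSH:
    """L hash tables, each keyed by K stable-distribution hashes
    h(v) = floor((a . v + b) / W), cf. Eq. (1). Illustrative sketch."""

    def __init__(self, dim, K=2, L=2, W=4.0, seed=0):
        rng = random.Random(seed)
        self.W = W
        # For each table: K pairs (a, b), a ~ Gaussian vector, b uniform in [0, W).
        self.funcs = [[([rng.gauss(0.0, 1.0) for _ in range(dim)],
                        rng.uniform(0.0, W)) for _ in range(K)]
                      for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, t, v):
        # Concatenate the K bucket indices into one composite bucket key.
        return tuple(int((sum(ai * vi for ai, vi in zip(a, v)) + b) // self.W)
                     for a, b in self.funcs[t])

    def insert(self, point_id, v):
        for t in range(len(self.tables)):
            self.tables[t][self._key(t, v)].append(point_id)

    def query(self, v):
        # Union of same-bucket collisions across all L tables.
        out = set()
        for t in range(len(self.tables)):
            out.update(self.tables[t].get(self._key(t, v), ()))
        return out
```

An identical point always lands in the same bucket of every table; nearby points collide with high probability in at least one table, which is exactly the false-negative trade-off governed by K and L.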
Aggregating multiple tables, a query point can treat the feature points residing in the same buckets across the hash tables as its (approximate) nearest neighbors; e.g., in Figure 3, the points in bucket (2, 2) of hash table 2 and bucket (2, 2) of hash table 1 are neighbors of the query. It is efficient, since retrieving these buckets generally requires constant time. Empirically, the effectiveness and efficiency of LSH are parameterized by the number of hash functions K and the number of hash tables L. A larger K leaves fewer feature points colliding in the same bucket; similarly, increasing L increases the number of candidate nearest neighbors, since more buckets are included across the tables. We will evaluate the impacts of K and L on image object retrieval in Section 6.

3.2 Feature Points Matching in LSH
Figure 4: Illustration of image search by hashing over bags of feature points. Each feature point of a database image is associated with (a) the image id it belongs to and (b) a unique feature id (i.e., A-H), and then hashed into (c) multiple hash tables (cf. Section 3.1). To retrieve the images similar to the query image with a single feature point, we simply hash the query feature point(s) into the buckets across the hash tables (e.g., the grey buckets in (c)). An intuitive way to score target image similarity is simply to count the number of its feature points that reside in the same buckets as the query feature points, as shown in (d).

Originally, hash-based methods support problems where either the query or a target in the database is represented by a single (global) feature. Retrieval is then intuitive, by one-to-many search in LSH, and the candidate targets are those that collide with the query in the buckets across hash tables. With the bag-of-feature-points representation, image retrieval is essentially a many-to-many matching problem in a high-dimensional space.
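The many-to-many matching via per-point LSH lookups and the naive collision-count ranking of Figure 4(d) can be sketched as follows. The `lsh` object, its `query` method, and the id mappings are illustrative assumptions, not the paper's actual interfaces:

```python
from collections import Counter

def retrieve_images(query_points, lsh, point_to_image):
    """Hash each query feature point independently and score database images
    by the number of colliding feature points (naive ranking of Fig. 4(d)).
    `lsh.query(v)` is assumed to return the ids of database points sharing a
    bucket with v in any hash table; `point_to_image` maps a database point
    id to the id of the image it belongs to."""
    scores = Counter()
    for v in query_points:
        for point_id in lsh.query(v):
            scores[point_to_image[point_id]] += 1
    # Rank images by their (noisy) number of candidate matches.
    return scores.most_common()
```

As the text notes, this count-based ranking is noisy; the filtering of Section 3.3 refines it.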
An image typically contains hundreds to thousands of feature points (see a few of them in Figure 6). The image database generally contains millions or even billions of feature points (i.e., 128-dimensional SIFT features). Feature points of the same image are associated with a unique image id. We then hash all feature points into the L hash tables. When querying an image, we issue an LSH query for each feature point independently. Each query point is used to retrieve database points that collide (hash into the same bucket) with it in any of the hash tables. After examining all query points, the retrieved database points are analyzed to identify the database images associated with them. For each retrieved image, we then know which of its feature points have initial matches to those in the query image. The naive way to rank these database images is by the number of possible matches between the query and database images. However, such a ranking criterion is poor, since feature point matching through bucket lookup alone is noisy. We will further inspect the candidate matches between the query and database images and then filter out the false positives (Section 3.3.1).

3.3 Filtering and Ranking Functions
For each database image retrieved, we have a set of initial (noisy) matches to the corresponding feature points in the query image (cf. Figure 4). To improve the matching quality, we employ two filtering methods: (1) inspecting the matches between feature pairs by feature distance and (2) enforcing spatial consistency between (true) matching pairs. We also introduce two image ranking measures. Note that the filtering methods are conducted one by one for each retrieved image, with reference to the query.

3.3.1 Nearest neighbor filtering and similarities
A naive way to filter out the (noisy) matches between the query and a retrieved image is simply to reject candidate points whose (Euclidean) distance is greater than a threshold.
However, such distance-thresholding filtering is not adaptive, and it is difficult to determine a proper threshold. It has been observed that a correct match needs to have its closest matching point significantly closer than the closest incorrect match, while false matches have a certain number of other close false matches, due to the high dimensionality of the feature space [19]. We therefore filter out matching pairs whose ratio of the distances to the first and second nearest neighbors is larger than a threshold (i.e., 0.8). For the matched feature pairs, we transform the feature distance into a similarity score. The ranking score R_S, the matching similarity of the query image Q to a database image I, is defined as:

    R_S(I, Q) = Σ_{v_i ∈ I, v_j ∈ Q} exp( -d(v_i, v_j) / σ )        (2)

where (v_i, v_j) are matched pairs between images I and Q, d(v_i, v_j) is the L2 distance of the match, and σ = 200 is empirically determined 2. Generally, a database image has a higher similarity score if it has more matches to the query image.

3.3.2 Spatial verification and matching inliers
Another important cue for image object retrieval is the spatial relationship between the matched image objects. Besides filtering by feature point similarities, we also investigate filtering by spatial verification, exploiting a geometry model between matching candidates to reject false positive matches. After spatial verification, we keep only the feature points that follow the spatial consistency (e.g., only 11 remain in Figure 6). The approach is closely related to the well-known RANdom SAmple Consensus (RANSAC) algorithm [14]. A basic assumption is that the data consists of inliers and outliers: the former are consistent with the estimated (spatial) model and can be explained by some set of model parameters; the latter are items that do not fit the model. The method builds on the intuition that if the set used for model estimation contains an outlier, the estimated geometry model will not gain much support. With the RANSAC algorithm, we can estimate the possible geometry model between the query Q and the inspected image I and use the model to determine the inliers and outliers among the candidate feature point matches.
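The nearest-neighbor-ratio filtering and the similarity score R_S of Eq. (2) in Section 3.3.1 can be sketched as follows. The per-image data layout (`candidate_matches` as a map from query point id to a distance-sorted candidate list) is an assumption made for illustration:

```python
import math

def filter_and_score(candidate_matches, sigma=200.0, ratio=0.8):
    """NN-ratio filtering (Sec. 3.3.1) and similarity score of Eq. (2).
    `candidate_matches` maps each query point id to a distance-sorted list of
    (database point id, L2 distance) candidates within one retrieved image.
    Returns the surviving matched pairs and the image's R_S score."""
    score = 0.0
    kept = []
    for q_id, cands in candidate_matches.items():
        if not cands:
            continue
        best_id, d1 = cands[0]
        # Reject ambiguous matches: the first NN must be clearly closer
        # than the second (ratio test with threshold 0.8).
        if len(cands) > 1 and d1 > ratio * cands[1][1]:
            continue
        kept.append((q_id, best_id))
        score += math.exp(-d1 / sigma)  # this pair's contribution to R_S
    return kept, score
```

A match surviving the ratio test contributes exp(-d/σ) to R_S, so closer matches, and more of them, yield a higher-ranked image.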
Meanwhile, we can also estimate the matched region in the target image for further applications; see [19] for more details. A matched region is illustrated in Figure 6. Hence, after spatial verification, another ranking score between the query image Q and the target image I is given by the number of inliers:

    R_L(I, Q) = |inliers(I, Q)|        (3)

For example, Figure 6 illustrates the 11 matches between the query object and a retrieved image after spatial verification; i.e., R_L(I, Q) = 11. Note that spatial verification is still very time-consuming. In our experiments, we mainly adopt the prior nearest-neighbor-ratio filtering for the baseline; for efficiency, we perform spatial verification only on the few top-ranked initial results, as in [23], where an efficient spatial verification method is also proposed.

2 Experiments show that σ is not sensitive within a reasonable range. We take σ = 200 for the following experiments.

4. QUERY EXPANSION
To boost hash-based image object retrieval, we propose two expansion strategies, intra-expansion and inter-expansion, explained in the following sections. We will show that combining both expansions can collaboratively boost the performance gains over conventional hash-based image object retrieval (Section 6.3).

4.1 Intra-expansion
LSH-like methods suffer from low recall rates [23][18][11], as similar feature points are likely to be mis-hashed to other buckets. Traditionally, a large number of hash tables is used to ease this problem. However, extra hash tables consume huge amounts of memory and are infeasible for large-scale image object retrieval over bags of feature points. Instead, we propose novel intra-expansion methods that aim to expand more target feature points similar to those in the query.
To optimize performance, we investigate variant methods for effective intra-expansion: (1) associating feature points by co-occurrence across hash tables, (2) probing buckets neighboring the initially hashed bucket, and (3) using meta-features to associate related feature points. The impacts of the proposed methods are evaluated in Section 6.1. Note that such expansion methods merely look up hash buckets and require no extra hashing overhead beyond the initial image object query. The expansion increases the number of candidate feature point matches; as before, we can reject the false positives effectively through the nearest neighbor ratio introduced in Section 3.3.1.

4.1.1 Matched point in LSH (MP)
In this method, we use matched feature points to locate buckets that might accommodate similar feature points mis-hashed elsewhere. Each query feature point is hashed into one bucket per table. The candidate feature points that collide with a query feature point can serve as seeds to associate the buckets in other hash tables into which candidate matches might have been hashed. The intuition is that for a matched feature point A, which collided with the query in one hash table, the feature points that collide with A in the other hash tables might also be candidate matches. See Figure 3 for an illustration: the red star is a feature point of the query image. We can find only two feature points colliding in the same buckets as the query point across the two hash tables. There is actually still a blue circle point similar to the query (i.e., within a certain Euclidean distance). With MP, we can associate the missing feature point by inspecting where the purple triangle, matched in hash table 1, has been hashed in hash table 2. As a result, we can retrieve the blue feature point, which might be associated with a candidate database image.

4.1.2 Multi-probe LSH (MPL)
The prior method associates candidate buckets by feature point co-occurrence.
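The MP co-occurrence lookup can be sketched as follows. The index accessors `bucket(t, key)` and `keys_of(pid)` are illustrative assumptions about the LSH index, not the paper's API:

```python
def mp_expand(query_keys, lsh):
    """Matched-point (MP) intra-expansion sketch (Sec. 4.1.1).
    `query_keys[t]` is the query point's bucket key in hash table t;
    `lsh.bucket(t, key)` is assumed to return the point ids stored in that
    bucket, and `lsh.keys_of(pid)` the per-table bucket keys of a stored
    database point. Points colliding with the query in any table become
    seeds, and the contents of the seeds' buckets in the other tables are
    added as extra candidate matches."""
    seeds = set()
    for t, key in enumerate(query_keys):
        seeds.update(lsh.bucket(t, key))
    expanded = set(seeds)
    for pid in seeds:
        for t, key in enumerate(lsh.keys_of(pid)):
            # Where has this matched point been hashed in the other tables?
            expanded.update(lsh.bucket(t, key))
    return expanded
```

In Figure 3's terms: the purple triangle (a seed matched in hash table 1) pulls in the blue circle from its bucket in hash table 2, even though the circle never collided with the query directly.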
It is less efficient, since with point-based association the time complexity grows linearly with the number of feature points in the same bucket. Another perspective is to locate the buckets neighboring the initial bucket into which the query point is hashed, as motivated by the multi-probe LSH proposed in [20] 3. This is intuitive, since the neighboring buckets are geometrically close to the query feature point and are very likely to accommodate mis-hashed candidate features. The probing sequence for neighboring buckets is not chosen randomly but considers the distance to the initially hashed bucket. See the example in Figure 3: the feature point star is hashed to bucket (2, 2) in hash table 1; its first bucket to probe in MPL is bucket (2, 3) (above), since it is the closest to the query feature point; the next is the bucket to the right, (3, 2), etc. See [20] for more details. Note that the number of probes for neighboring buckets is an important factor for performance and efficiency; we conduct a sensitivity test in Section 6.

3 Note that in [20] multi-probe LSH is designed to reduce the hash table size. Instead, in this work we use it to locate more of the likely buckets in which candidate matches might reside.

Figure 5: An illustration of the need for inter-expansion. Each rectangle represents an image with certain feature points; those of the same shape are assumed matched. The query image can retrieve image A (4 matched pairs) by baseline LSH but cannot find image B (0 matches). However, we can still retrieve image B through image A by including the feature points of image A as expanded feature points, i.e., inter-expansion. Note that intra-expansion is optional here.

4.1.3 Meta-feature (MF)
The prior intra-expansion methods, MP and MPL, both depend on cues provided by the hash structure. We investigate another intra-expansion method that associates feature points by discovering meta- (or representative) features in the original feature space (i.e., SIFT). A meta-feature matched to a query feature point is then used to bring in the set of feature points that the meta-feature represents; that is, one feature point represents a set of similar feature points in the original feature space. Clustering all the feature points of the database images is required to locate the meta-features. We mark each cluster center as a meta-feature and record the feature points belonging to it (i.e., those in the same cluster). The meta-features are then hashed into the hash tables. In the query phase, we retrieve all meta-features that collide with the query and use the matched meta-features as seeds to include the candidate feature points associated with them.
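The meta-feature lookup just described reduces to a table join between matched cluster centers and their members. A minimal sketch, where the offline `meta_to_members` map (built by the clustering step) is an assumed data structure:

```python
def mf_expand(matched_meta_ids, meta_to_members):
    """Meta-feature (MF) intra-expansion sketch (Sec. 4.1.3): each cluster
    center (meta-feature) retrieved from the hash tables brings in the
    member feature points of its cluster. `meta_to_members` maps a
    meta-feature id to the ids of the feature points it represents."""
    candidates = set()
    for m in matched_meta_ids:
        candidates.update(meta_to_members.get(m, ()))
    return candidates
```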
Note that since clustering a large number of high-dimensional data points is very time-consuming, we adopt hierarchical k-means (HKM) [22] as the clustering method; it has been shown to be much more efficient than conventional (flat) k-means. Empirically, we take 10K cluster centers in HKM as meta-features.

4.2 Inter-expansion

Due to changes in lighting conditions, camera angles, occlusions, etc., there are feature points that strongly characterize the query object but are not present in the query image, or that appear very different from the query features. Such issues cannot be solved by the intra-expansion methods mentioned above. We propose to seek the solution from the initial query results by inter-expansion.

Figure 6: Matches and regions in the target (database) image for inter-expansion. There are two candidate regions for inter-expansion: region A (red), the (projected) matched region, and region B, the whole image. Note that feature point matches are usually noisy and random; the 11 matches shown are retained as inliers after spatial verification.

Figure 5 illustrates the intuition for inter-expansion. For example, the query image can robustly retrieve image A through LSH methods (or enhanced by the intra-expansion methods of Section 4.1) with 4 matches, but fails to retrieve image B (no matches). However, we can associate image B by taking image A as a new query image. Such behaviors have been observed in pseudo-relevance feedback (PRF) for text-based retrieval [7][28] and video retrieval [29].
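The intuition of Figure 5, reaching image B only through image A, can be sketched with a toy retrieval function. Everything here is illustrative: images are sets of "feature ids" and similarity is the number of shared ids, a stand-in for the paper's feature point matching.

```python
# Toy inter-expansion: image B shares nothing with the query but overlaps
# with image A, so B becomes reachable only when A is re-issued as a query.
def search(database, query, k):
    # rank images by a toy similarity (number of shared feature ids),
    # keeping only images with at least one match
    scored = [(len(set(img) & set(query)), img) for img in database]
    ranked = [img for score, img in sorted(scored, key=lambda t: -t[0])
              if score > 0]
    return ranked[:k]

def inter_expand(database, query, k=2):
    initial = search(database, query, k)
    expanded = {tuple(img) for img in initial}
    for seed in initial:                      # each seed becomes a new query
        expanded.update(tuple(img) for img in search(database, seed, k))
    return expanded
```

The expanded pool contains images reachable in two hops, which is exactly what the baseline single-query search misses.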
For optimizing inter-expansion in hash-based image object retrieval, we investigate several factors parameterizing the retrieval results: (1) whether a filtering process is needed to determine the seeded images from the initial search result for expansion (more details in Section 3.3.1); (2) the proper region for expansion, i.e., the region of interest or the entire image (more details in Figure 6); (3) effective fusion methods and similarity measures for fusing the multiple ranking lists expanded from the images in the initial search result (cf. Figure 5). Note that such inter-expansion methods are mostly realized by searching for related feature points in the buckets; the most time-consuming part is spatial verification (cf. Section 3.3.2) for evaluating matching inliers.

4.2.1 Pseudo-relevance feedback (PRF)

PRF is the most intuitive method for inter-expansion. It was initially introduced in [7], where the top-ranking documents are used to rerank the retrieved documents, assuming that a significant fraction of the top-ranked documents will be relevant. This is in contrast to relevance feedback, where users explicitly provide feedback by labeling the top results as positive or negative. For inter-expansion, we automatically expand the retrieved images by issuing new queries with the top-ranking images from the initial search result, since some characteristic feature points relevant to the search targets might not exist in the query image but only in the search results. Figure 5 demonstrates the need and the process for PRF (or inter-expansion). PRF simply assumes that the top retrieved images are correct, which might hold for text retrieval but not for photo or video search [29]; the latter generally involves noisier high-dimensional data and often mistakenly includes false positives as the seeded images for expansion. We will therefore investigate whether image filtering (i.e., spatial verification) is needed.
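The filtering alternative to blindly trusting top ranks can be sketched in a few lines. This is a hedged sketch: the inlier counts are given as input, whereas in the paper they come from the spatial verification step (Section 3.3.2), and the threshold name `delta` mirrors the δ used later in the text.

```python
# Seed selection for inter-expansion: instead of taking the top-k results
# as-is (PRF), keep only results whose spatially verified inlier count
# exceeds a threshold delta.
def select_seeds(initial_results, delta):
    """initial_results: list of (image_id, inlier_count) in ranking order.
    Returns the image ids to re-issue as expansion queries."""
    return [img for img, inliers in initial_results if inliers > delta]
```

Images with few verified inliers (likely false positives) are thus excluded from seeding further expansion.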
Meanwhile, it is not intuitive to select the number of top-ranked images for expansion, and the choice is mostly query-dependent. The fusion of multiple expanded lists is discussed below.
Figure 7: Examples from the two photo datasets for evaluating image object retrieval. Oxford landmarks, (a) balliol and (b) ashmolean, in the Oxford Building dataset; (c) Torre Pendente Di Pisa and (d) Starbucks in the Flickr11K dataset, which includes multiple buildings and small logos and is more complicated and diverse. Rectangles highlight the query objects.

4.2.2 Full image by spatial verification (SVFI)

Rather than blindly taking the top images as seeds for inter-expansion as in PRF, we select those images from the initial search result whose matched inliers exceed a threshold δ, as shown in Figure 5. Naturally, images with more inliers to the query are more likely to be true targets. Here the number of inliers between the query image and a target image is determined by spatial verification, discussed in Section 3.3.2. See the example in Figure 6, where we have 11 inliers between the target and query images; region B, the entire target image, is used for further image expansion in SVFI. The threshold δ for determining correct images is experimented with in Section 6.4.

4.2.3 Matched region by spatial verification (SVMR)

Similar to SVFI, we conduct spatial verification and filter out incorrect images from the initial search result by the inlier threshold δ. However, the region for inter-expansion is the matched region corresponding to the query object, i.e., region A in Figure 6. As explained in Section 3.3.2, this region generally corresponds to the region of query interest and can be estimated during the spatial verification process.

4.2.4 Fusion methods

As shown in Figure 5, multiple images or image regions from the initial search result will be used for expansion. Generally, each image generates a ranking list through LSH, possibly further enhanced by the intra-expansion methods introduced in Section 4.1. To maximize the performance of fusing the multiple ranking lists from the seeded inter-expansion images, we consider several fusion methods and similarity measures as suggested in [32].

Average Score (AVG): the expanded image in each ranking list (cf. Figure 5) is scored by the matching similarity RS in Equation 2. For inter-expansion, the query image is now the seeded image from the initial search result (e.g., image A). The final ranking score for each expanded image is the average of its scores over the ranking lists. Note that since LSH-based retrieval returns only a subset of the database images, we assume a zero score if an image is not within an expanded ranking list; for example, image B exists only in ranking list 2 but not in ranking list 3, so it is assumed to score zero in ranking list 3. Note that, for AVG, spatial verification is not employed for the expanded ranking lists, which makes it efficient.

Maximum Score (MAX): same as AVG, except that the final ranking score for each expanded image is the maximum over the ranking lists.

Average Inliers (AVG_IL): similar to AVG, except that the inlier score RL in Equation 3 is adopted. This fusion method requires applying spatial verification to the expanded ranking lists and hence extra computation time.

Maximum Inliers (MAX_IL): same as AVG_IL, except that the final ranking score for each expanded image is the maximum over the ranking lists.

Borda Score (Borda): instead of the matching similarity score RS or the inlier score RL, the Borda score rates each expanded image by its ranking position in its list: for an image at rank i among N in total, its Borda score is 1 - i/N. The final ranking score for each expanded image is the average of its Borda scores across the expanded ranking lists.

4.3 Iteration for Expansion

The expanded photos can then iteratively serve as the seeded photos for another round of inter-expansion (optionally enhanced with intra-expansion). The iteration stops when no more new photos are considered relevant (thresholded by the inlier number δ) in the expansion. For example, in Figure 5, from expanded ranking lists 2 and 3 we choose the retrieved images with more than δ matching inliers as the new seeded images for the next inter- and intra-expansion. The expanded correct images are collected and finally ranked by their corresponding ranking measures when the iteration stops.

5. EXPERIMENTAL SETUPS

We experiment with the proposed methods on two photo retrieval benchmarks, Oxford Building [23] and Flickr11K, a subset of Flickr550 [30]. Some of the queries and their ground-truth images are exemplified in Figure 7. Note that we do not aim to optimize the retrieval performance on the benchmarks but to investigate the relative improvements of the proposed expansion methods over LSH.

5.1 Datasets

Oxford Building dataset: The Oxford Buildings dataset [23] consists of 5,062 images collected by issuing Oxford landmark names as search keywords on the Flickr website. The dataset includes 11 query categories with 5 queries each. We use the cropped image objects from the authors [23] for the 55 query photos, illustrated in Figure 7 (a) and (b) (query objects are in the red rectangles). The images are downsized to a quarter of the original size to match the image dimensions in our second dataset; the average image size is about 512x374 pixels. There are in total 7,162,122 feature points in the Oxford dataset, on average 1,415 per photo.

Flickr11K dataset: Flickr11K is a larger dataset consisting of 11,282 medium-resolution (500x360) images. It is a subset provided by the authors of the Flickr550 dataset [30], downloaded from Flickr in the European Travel group. We modify the queries and ground truth defined by the authors to suit the
objective of content-based photo search. The result is a total of 1,282 ground-truth images in 7 query categories, such as Torre Pendente Di Pisa and Starbucks in Figure 7 (c) and (d). We then add 10K images randomly sampled from Flickr550 to form our Flickr11K dataset.

5.2 Performance Metrics

To evaluate the retrieval performance, we use average precision (AP) as the major performance metric. Widely adopted in large-scale photo/video retrieval benchmarks such as TRECVID [2], Oxford Buildings [23], and Flickr550 [30], AP approximates the area under the non-interpolated precision-recall curve for a query. Since AP only shows the performance for a single image query, we compute the mean average precision (MAP) to represent the system performance over all queries: 55 query images (across 11 categories) in the Oxford dataset and 56 query images (across 7 categories) in the Flickr11K dataset.

Table 1: Comparisons of intra-expansion methods (columns: Baseline, MP, MPL, MF; rows: MAP and relative improvement %), which all improve the baseline hash-based image retrieval on the Oxford dataset. MPL (100 probes) is the most effective, as it probes more buckets neighboring the initially hashed bucket. MP and MF are limited due to loosely associating other candidate feature points by co-occurrence in other hash tables or by sparse meta-features. % stands for relative improvement over the hash baseline.

5.3 Baseline LSH Configurations

The experiments for LSH expansion are based on the E2LSH implementation [1]. For object-level image retrieval with bags of feature points, the default configurations need to be adjusted to optimize efficiency and effectiveness. Through a pilot experiment on a held-out dataset, we found that decreasing K or increasing L helps locate more matched feature point candidates, since a smaller K puts looser constraints and hashes more feature points into the same bucket, and more hash tables (larger L) help accumulate more collided feature points.
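The effect of K and L can be seen from the standard LSH collision identity: if each elementary hash agrees on a true match with probability p, the probability that a pair collides in at least one of L tables of K concatenated hashes is 1 - (1 - p^K)^L. A quick numerical check (the value of p here is illustrative, not measured):

```python
# Standard LSH collision probability for K concatenated hashes and L hash
# tables, assuming each elementary hash matches with probability p.
# Illustrates the text: a smaller K or a larger L retrieves more candidates.
def collision_prob(p, K, L):
    return 1.0 - (1.0 - p ** K) ** L
```

For example, with p = 0.9, halving K from 10 to 5 or doubling L from 10 to 20 both raise the collision probability, i.e., more candidate matches fall into probed buckets.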
For baseline LSH, we set K=10, L=10, and W=400 through a sensitivity test on the held-out set, balancing time efficiency and effectiveness. (According to our preliminary experiments, more hash tables can slightly improve the retrieval performance, but at the cost of huge memory and slow response time; we therefore retain a small number of hash tables.) Generally, each query feature point retrieves on average 0.01% of the total number of feature points as candidates. For the Oxford dataset, we have 7,162,122 feature points in LSH; on average, each table has around 1,192,240 buckets. We also set the threshold for the matched-pair filtering ratio to 0.8, as suggested in [19]. Note that if a query feature point has only one nearest neighbor, we treat the two points as matched. The MAPs of the hash-based baseline on the Oxford and Flickr11K datasets are listed in the second row of Table 3. With the configurations mentioned above, the average image query time on the two datasets is 0.8 and 1.2 seconds, respectively, on a regular Linux server with an Intel CPU. We also applied spatial verification to the hash-based baselines and found the improvement marginal, due to the very sparse matching between photo pairs, as commonly observed in the bag-of-visual-words paradigm as well [11]. We will show later that introducing inter- and intra-expansion can significantly boost the hash-based baseline by bringing in more target-related and context-related feature points.

Table 2: Comparisons of the inter-expansion variants and fusion methods on the Oxford dataset (columns: PRF, SVFI, SVMR, for inter-expansion only and for intra + inter expansion; rows: AVG, MAX, Borda, MAX_IL, AVG_IL); most of them outperform the hash-based baseline (in MAP). The best performer for inter-expansion only is SVMR with MAX_IL fusion, and SVMR with AVG fusion when further considering intra-expansion. See more explanations in Section 6.2. Note that MPL is used for intra-expansion.

6.
RESULTS AND DISCUSSIONS

6.1 Intra-expansion on LSH

We first experiment with the variant methods for intra-expansion, which aims to locate similar feature points mis-hashed to other buckets. According to Table 1, the expansion methods (i.e., MP, MPL, and MF, cf. Section 4.1) all improve the hash-based baseline for object-level image retrieval on the Oxford dataset: the expansion methods do include more related feature points that help the matching of bags of feature points. Note that the candidate feature points are later filtered by the nearest-neighbor ratio (cf. Section 3.3.1). Among them, multi-probe LSH (MPL) outperforms MP and MF because MPL can still locate neighboring hash buckets, where candidate feature points might reside, even if there are no matched points in the initially hashed bucket. MP, in contrast, associates candidate feature points by feature co-occurrence, and the association between feature points can be sparse. Similar deficiencies are observed for MF, which utilizes meta-features to associate candidate feature points; however, the total number of buckets is so huge that the meta-features might not be hashed into the same buckets as the query feature points.

6.2 Inter-expansion and Fusion Methods

The goal of inter-expansion is to retrieve more target-related photos given the initial search results, since there might be certain context-related feature points not observed in the given query. We compare the three inter-expansion methods (PRF, SVFI, and SVMR) under different fusion schemes, which are necessary since multiple photos in the initial search result are generally seeded for expansion and their respective expanded lists need to be fused effectively, as illustrated in Figure 5. We also examine the impact of further applying intra-expansion. For inter-expansion only (cf. Table 2) on the Oxford dataset, PRF is the worst and improves the hash baseline by at most 6.1% relatively (by MAX fusion).
This is natural since PRF simply assumes that the top returns are correct and issues them for expansion directly. Such a limitation of PRF-like methods has also been observed in video retrieval [9][29]. If the initial seeded photos are further filtered by spatial verification (cf. Section 3.3.2), the expanded results mostly show salient gains (e.g., up to 23%) over
the hash baseline (see SVFI in Table 2) across fusion methods. Apparently, spatial verification helps select effective photos for expansion, since the search targets are often spatially correlated for these object or landmark photos. Another factor is whether the photo area used for expansion (either the matched region only or the entire photo) affects the expansion quality. For inter-expansion only, SVFI and SVMR do not differ much; the former uses the entire image for expansion and the latter uses the matched region only (cf. Figure 6). As more matched feature points are brought in by intra-expansion, matching the region of interest (i.e., SVMR) with proper fusion methods (i.e., AVG, AVG_IL) saliently outperforms inter-expansion by the whole seeded photo, which is generally contaminated by background noise not relevant to the target photos. Meanwhile, as shown in Table 2, SVMR with AVG fusion and intra-expansion has the largest gain (MAP improved to 0.460, a 76.3% relative improvement) among all configurations on the Oxford dataset, and likewise on Flickr11K.

Table 3: The performance gain boosts saliently when combining intra- and inter-expansion on both the Oxford and Flickr11K datasets (rows: Baseline, Intra, Inter, Intra + Inter, Intra + Inter + Iteration; columns: MAP and relative improvement % for each dataset).

As for the fusion methods, AVG, AVG_IL, and MAX_IL are generally on par across the inter-expansion configurations in Table 2. However, AVG_IL and MAX_IL are time-consuming, as evaluating matched inliers requires extra spatial verification over the expanded photo lists, whereas AVG simply averages the feature point matching similarities across the expanded photo lists and is very efficient. Borda fusion, taking only the ranking order from the expanded photo lists, is less effective since it ignores the matched inliers, spatial information, and feature point similarities.
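The fusion rules compared above can be sketched directly. This is a hedged sketch: score dictionaries stand in for the expanded ranking lists, the similarity and inlier scores (RS, RL) are assumed precomputed, and images absent from a list contribute a zero score, as described for AVG.

```python
# Sketch of the AVG, MAX, and Borda fusion rules over expanded ranking
# lists. Each list maps image id -> score; absent images score zero.
def fuse_avg(lists):
    images = {img for lst in lists for img in lst}
    return {img: sum(lst.get(img, 0.0) for lst in lists) / len(lists)
            for img in images}

def fuse_max(lists):
    images = {img for lst in lists for img in lst}
    return {img: max(lst.get(img, 0.0) for lst in lists) for img in images}

def borda_scores(ranked_list, n_total):
    # an image at rank i (1-based) among N in total scores 1 - i/N
    return {img: 1.0 - i / n_total
            for i, img in enumerate(ranked_list, start=1)}
```

AVG_IL and MAX_IL follow the same AVG/MAX shapes with inlier counts as scores, which is why they incur the extra spatial verification cost noted above.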
6.3 Combining Intra- and Inter-Expansion

We have shown that both intra- and inter-expansion improve the hash-based baseline for image object retrieval. More interestingly, the performance boosts significantly when both expansion strategies are combined: more reliable matching pairs are retrieved through intra-expansion, and more context-related photos are yielded through inter-expansion from the retrieved image lists. Besides the comparisons by MAP, a sample query under the different expansion methods is illustrated in Figure 2, where some results ranked low by the hash-based baseline are boosted to the top by the expansion methods, as also demonstrated by the precision-recall (PR) curves in Figure 2(b). As shown in Table 3, combining intra-expansion (MPL with 100 probes) and inter-expansion (SVMR with AVG fusion) improves MAP by 76.3% relatively over the LSH baseline on the Oxford dataset.

Figure 8: Performance breakdown on the Oxford dataset, with 100-probe MPL for intra-expansion and SVMR with AVG fusion for inter-expansion. All expansion methods improve the hash-based baseline across query categories. Only a few query categories degrade slightly under inter-expansion because some incorrect images are seeded for expansion. Interestingly, "inter + intra" and "inter + intra + iteration" outperform all others.

Another interesting observation is that intra- and inter-expansion seem to work orthogonally, and the two expansion methods collaboratively boost the performance gains. For example, on the Oxford dataset we have a 52% relative improvement for intra and 13.1% for inter; ideally, the multiplied gain from the two methods would be around 72% (0.72 ≈ (1 + 0.52) × (1 + 0.131) − 1); in practice we obtain 76.3%. Similarly, for the Flickr11K dataset we have 67.3% for intra and 5.2% for inter; ideally, the multiplied gain would be 76% (0.76 ≈ (1 + 0.673) × (1 + 0.052) − 1), and empirically it is 75%.
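The independence estimate above is simple compounding of relative gains, and the two reported figures can be checked directly:

```python
# If intra- and inter-expansion acted independently, their relative MAP
# gains would compound multiplicatively: (1 + g_intra)(1 + g_inter) - 1.
def combined_gain(g_intra, g_inter):
    return (1 + g_intra) * (1 + g_inter) - 1

oxford_ideal = combined_gain(0.52, 0.131)    # ~0.72, vs. 0.763 measured
flickr_ideal = combined_gain(0.673, 0.052)   # ~0.76, vs. 0.75 measured
```

The measured gains landing close to (Oxford slightly above, Flickr11K slightly below) the multiplicative estimate supports the claim that the two strategies are largely orthogonal.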
Figure 8 shows the performance breakdown over the 11 query categories in the Oxford dataset for the major expansion methods. All expansion methods improve the hash-based baseline across query categories. Only a few query categories (e.g., Balliol) degrade marginally under inter-expansion, because some incorrect images from the hash baseline are seeded for expansion due to unreliable feature point matches. Interestingly, when we combine intra- and inter-expansion (either "inter + intra" or "inter + intra + iteration"), the former brings in more robust feature point matches, and the combination significantly outperforms the hash baseline across queries. Iteratively expanding the retrieval results with inter- and intra-expansion is slightly helpful (2%-5% relative improvement), as shown in the last row of Table 3; however, it takes quite some time to conduct the spatial verification for selecting the seeded photos for expansion.

6.4 Parameter Sensitivity

6.4.1 Number of probes for MPL intra-expansion

For intra-expansion, the most effective method is MPL, whose essential parameter is the number of probes to the neighboring buckets (cf. Section 4.1.2). Experimenting on the Oxford dataset, we test different numbers of probes and find that the performance (in MAP) saturates between 100 and 350 probes, as shown in Figure 9. Generally, the performance increases as more buckets are probed, which is natural since it improves the recall of mis-hashed feature points. However, probing more than 350 buckets degrades the performance due to including more noisy feature points. Note that the number of candidate feature points increases as more buckets are probed. Balancing efficiency and effectiveness, we choose 100 probes for MPL intra-expansion; even so, the total number of candidate feature points inspected for a feature point query remains small, on average about 0.1% of the total.
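The distance-ordered probing behind these numbers can be sketched as follows. This is a simplified illustration, not the full algorithm of [20]: only single-coordinate perturbations are scored, and the query-to-boundary distances are given as input rather than derived from the hash projections.

```python
# Simplified multi-probe ordering: neighboring buckets are visited in
# increasing order of the query's distance to their boundary, not randomly.
# Full multi-probe LSH also scores multi-coordinate perturbations [20].
def probe_sequence(bucket, boundary_dist, n_probes):
    """bucket: tuple of hash indices; boundary_dist: maps a (dim, +1 or -1)
    perturbation to the query's distance to that bucket boundary.
    Returns the first n_probes neighboring buckets to visit."""
    candidates = []
    for (dim, step), dist in boundary_dist.items():
        neighbor = list(bucket)
        neighbor[dim] += step
        candidates.append((dist, tuple(neighbor)))
    candidates.sort()                      # nearest boundary probed first
    return [b for _, b in candidates[:n_probes]]
```

In the Figure 3 example, a query hashed to bucket (2, 2) that lies nearest the upper boundary would probe (2, 3) first and (3, 2) next, matching the described probing order.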
More informationThe Comparative Study of Machine Learning Algorithms in Text Data Classification*
The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification
More informationA Miniature-Based Image Retrieval System
A Miniature-Based Image Retrieval System Md. Saiful Islam 1 and Md. Haider Ali 2 Institute of Information Technology 1, Dept. of Computer Science and Engineering 2, University of Dhaka 1, 2, Dhaka-1000,
More informationover Multi Label Images
IBM Research Compact Hashing for Mixed Image Keyword Query over Multi Label Images Xianglong Liu 1, Yadong Mu 2, Bo Lang 1 and Shih Fu Chang 2 1 Beihang University, Beijing, China 2 Columbia University,
More informationA Systems View of Large- Scale 3D Reconstruction
Lecture 23: A Systems View of Large- Scale 3D Reconstruction Visual Computing Systems Goals and motivation Construct a detailed 3D model of the world from unstructured photographs (e.g., Flickr, Facebook)
More informationTextural Features for Image Database Retrieval
Textural Features for Image Database Retrieval Selim Aksoy and Robert M. Haralick Intelligent Systems Laboratory Department of Electrical Engineering University of Washington Seattle, WA 98195-2500 {aksoy,haralick}@@isl.ee.washington.edu
More informationSpecular 3D Object Tracking by View Generative Learning
Specular 3D Object Tracking by View Generative Learning Yukiko Shinozuka, Francois de Sorbier and Hideo Saito Keio University 3-14-1 Hiyoshi, Kohoku-ku 223-8522 Yokohama, Japan shinozuka@hvrl.ics.keio.ac.jp
More informationDimension Reduction CS534
Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of
More information2. LITERATURE REVIEW
2. LITERATURE REVIEW CBIR has come long way before 1990 and very little papers have been published at that time, however the number of papers published since 1997 is increasing. There are many CBIR algorithms
More informationFace detection and recognition. Detection Recognition Sally
Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification
More informationClassifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao
Classifying Images with Visual/Textual Cues By Steven Kappes and Yan Cao Motivation Image search Building large sets of classified images Robotics Background Object recognition is unsolved Deformable shaped
More informationCS 223B Computer Vision Problem Set 3
CS 223B Computer Vision Problem Set 3 Due: Feb. 22 nd, 2011 1 Probabilistic Recursion for Tracking In this problem you will derive a method for tracking a point of interest through a sequence of images.
More informationVolume 2, Issue 6, June 2014 International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 6, June 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Internet
More informationSpeed-up Multi-modal Near Duplicate Image Detection
Open Journal of Applied Sciences, 2013, 3, 16-21 Published Online March 2013 (http://www.scirp.org/journal/ojapps) Speed-up Multi-modal Near Duplicate Image Detection Chunlei Yang 1,2, Jinye Peng 2, Jianping
More informationRanking Clustered Data with Pairwise Comparisons
Ranking Clustered Data with Pairwise Comparisons Alisa Maas ajmaas@cs.wisc.edu 1. INTRODUCTION 1.1 Background Machine learning often relies heavily on being able to rank the relative fitness of instances
More informationLarge Scale Nearest Neighbor Search Theories, Algorithms, and Applications. Junfeng He
Large Scale Nearest Neighbor Search Theories, Algorithms, and Applications Junfeng He Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School
More informationString distance for automatic image classification
String distance for automatic image classification Nguyen Hong Thinh*, Le Vu Ha*, Barat Cecile** and Ducottet Christophe** *University of Engineering and Technology, Vietnam National University of HaNoi,
More informationProblem 1: Complexity of Update Rules for Logistic Regression
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 16 th, 2014 1
More informationImgSeek: Capturing User s Intent For Internet Image Search
ImgSeek: Capturing User s Intent For Internet Image Search Abstract - Internet image search engines (e.g. Bing Image Search) frequently lean on adjacent text features. It is difficult for them to illustrate
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationNearest Neighbor with KD Trees
Case Study 2: Document Retrieval Finding Similar Documents Using Nearest Neighbors Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox January 22 nd, 2013 1 Nearest
More informationFast Efficient Clustering Algorithm for Balanced Data
Vol. 5, No. 6, 214 Fast Efficient Clustering Algorithm for Balanced Data Adel A. Sewisy Faculty of Computer and Information, Assiut University M. H. Marghny Faculty of Computer and Information, Assiut
More informationImage Analysis & Retrieval. CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W Lec 18.
Image Analysis & Retrieval CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W 4-5:15pm@Bloch 0012 Lec 18 Image Hashing Zhu Li Dept of CSEE, UMKC Office: FH560E, Email: lizhu@umkc.edu, Ph:
More informationRobot localization method based on visual features and their geometric relationship
, pp.46-50 http://dx.doi.org/10.14257/astl.2015.85.11 Robot localization method based on visual features and their geometric relationship Sangyun Lee 1, Changkyung Eem 2, and Hyunki Hong 3 1 Department
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu [Kumar et al. 99] 2/13/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu
More informationLearning independent, diverse binary hash functions: pruning and locality
Learning independent, diverse binary hash functions: pruning and locality Ramin Raziperchikolaei and Miguel Á. Carreira-Perpiñán Electrical Engineering and Computer Science University of California, Merced
More informationChapter 5: Outlier Detection
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 5: Outlier Detection Lecture: Prof. Dr.
More informationLeveraging Transitive Relations for Crowdsourced Joins*
Leveraging Transitive Relations for Crowdsourced Joins* Jiannan Wang #, Guoliang Li #, Tim Kraska, Michael J. Franklin, Jianhua Feng # # Department of Computer Science, Tsinghua University, Brown University,
More informationDSH: Data Sensitive Hashing for High-Dimensional k-nn Search
DSH: Data Sensitive Hashing for High-Dimensional k-nn Search Jinyang Gao, H. V. Jagadish, Wei Lu, Beng Chin Ooi School of Computing, National University of Singapore, Singapore Department of Electrical
More informationAdaptive Binary Quantization for Fast Nearest Neighbor Search
IBM Research Adaptive Binary Quantization for Fast Nearest Neighbor Search Zhujin Li 1, Xianglong Liu 1*, Junjie Wu 1, and Hao Su 2 1 Beihang University, Beijing, China 2 Stanford University, Stanford,
More informationLarge-scale visual recognition The bag-of-words representation
Large-scale visual recognition The bag-of-words representation Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline Bag-of-words Large or small vocabularies? Extensions for instance-level
More informationA Keypoint Descriptor Inspired by Retinal Computation
A Keypoint Descriptor Inspired by Retinal Computation Bongsoo Suh, Sungjoon Choi, Han Lee Stanford University {bssuh,sungjoonchoi,hanlee}@stanford.edu Abstract. The main goal of our project is to implement
More informationBeyond Mere Pixels: How Can Computers Interpret and Compare Digital Images? Nicholas R. Howe Cornell University
Beyond Mere Pixels: How Can Computers Interpret and Compare Digital Images? Nicholas R. Howe Cornell University Why Image Retrieval? World Wide Web: Millions of hosts Billions of images Growth of video
More informationTask Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval
Case Study 2: Document Retrieval Task Description: Finding Similar Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 11, 2017 Sham Kakade 2017 1 Document
More informationClassifiers and Detection. D.A. Forsyth
Classifiers and Detection D.A. Forsyth Classifiers Take a measurement x, predict a bit (yes/no; 1/-1; 1/0; etc) Detection with a classifier Search all windows at relevant scales Prepare features Classify
More informationArtificial Intelligence. Programming Styles
Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to
More informationMetric Learning for Large-Scale Image Classification:
Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Florent Perronnin 1 work published at ECCV 2012 with: Thomas Mensink 1,2 Jakob Verbeek 2 Gabriela Csurka
More informationModule 1 Lecture Notes 2. Optimization Problem and Model Formulation
Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization
More informationHand Posture Recognition Using Adaboost with SIFT for Human Robot Interaction
Hand Posture Recognition Using Adaboost with SIFT for Human Robot Interaction Chieh-Chih Wang and Ko-Chih Wang Department of Computer Science and Information Engineering Graduate Institute of Networking
More informationContent Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features
Content Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features 1 Kum Sharanamma, 2 Krishnapriya Sharma 1,2 SIR MVIT Abstract- To describe the image features the Local binary pattern (LBP)
More informationClustering Part 4 DBSCAN
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationShort Run length Descriptor for Image Retrieval
CHAPTER -6 Short Run length Descriptor for Image Retrieval 6.1 Introduction In the recent years, growth of multimedia information from various sources has increased many folds. This has created the demand
More informationCRF Based Point Cloud Segmentation Jonathan Nation
CRF Based Point Cloud Segmentation Jonathan Nation jsnation@stanford.edu 1. INTRODUCTION The goal of the project is to use the recently proposed fully connected conditional random field (CRF) model to
More informationCS 231A Computer Vision (Fall 2012) Problem Set 3
CS 231A Computer Vision (Fall 2012) Problem Set 3 Due: Nov. 13 th, 2012 (2:15pm) 1 Probabilistic Recursion for Tracking (20 points) In this problem you will derive a method for tracking a point of interest
More informationSYDE Winter 2011 Introduction to Pattern Recognition. Clustering
SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned
More informationMobile Human Detection Systems based on Sliding Windows Approach-A Review
Mobile Human Detection Systems based on Sliding Windows Approach-A Review Seminar: Mobile Human detection systems Njieutcheu Tassi cedrique Rovile Department of Computer Engineering University of Heidelberg
More informationInformation Retrieval and Organisation
Information Retrieval and Organisation Chapter 16 Flat Clustering Dell Zhang Birkbeck, University of London What Is Text Clustering? Text Clustering = Grouping a set of documents into classes of similar
More informationA Pivot-based Index Structure for Combination of Feature Vectors
A Pivot-based Index Structure for Combination of Feature Vectors Benjamin Bustos Daniel Keim Tobias Schreck Department of Computer and Information Science, University of Konstanz Universitätstr. 10 Box
More informationCombining Appearance and Topology for Wide
Combining Appearance and Topology for Wide Baseline Matching Dennis Tell and Stefan Carlsson Presented by: Josh Wills Image Point Correspondences Critical foundation for many vision applications 3-D reconstruction,
More informationImage Matching Using Run-Length Feature
Image Matching Using Run-Length Feature Yung-Kuan Chan and Chin-Chen Chang Department of Computer Science and Information Engineering National Chung Cheng University, Chiayi, Taiwan, 621, R.O.C. E-mail:{chan,
More informationFace Recognition using Eigenfaces SMAI Course Project
Face Recognition using Eigenfaces SMAI Course Project Satarupa Guha IIIT Hyderabad 201307566 satarupa.guha@research.iiit.ac.in Ayushi Dalmia IIIT Hyderabad 201307565 ayushi.dalmia@research.iiit.ac.in Abstract
More informationImproved Coding for Image Feature Location Information
Improved Coding for Image Feature Location Information Sam S. Tsai, David Chen, Gabriel Takacs, Vijay Chandrasekhar Mina Makar, Radek Grzeszczuk, and Bernd Girod Department of Electrical Engineering, Stanford
More informationTop-K Entity Resolution with Adaptive Locality-Sensitive Hashing
Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing Vasilis Verroios Stanford University verroios@stanford.edu Hector Garcia-Molina Stanford University hector@cs.stanford.edu ABSTRACT Given
More informationNearest neighbor classification DSE 220
Nearest neighbor classification DSE 220 Decision Trees Target variable Label Dependent variable Output space Person ID Age Gender Income Balance Mortgag e payment 123213 32 F 25000 32000 Y 17824 49 M 12000-3000
More information