Metric inverted - an efficient inverted indexing method for metric spaces
|
|
- Brian Campbell
- 6 years ago
- Views:
Transcription
1 Metric inverted - an efficient inverted indexing method for metric spaces Benjamin Sznajder, Jonathan Mamou, Yosi Mass, and Michal Shmueli-Scheuer IBM Haifa Research Laboratory, Mount Carmel Campus, Haifa, Israel {benjams,mamou,yosimass,shmueli}@il.ibm.com Abstract. The increasing amount of digital audio-visual content accessible today calls for scalable solutions for content based search. The state of the art solutions reveal linear scalability in respect to the collection size due to the large number of distance computations needed for comparing low level audio-visual features. As a result, search in large audio-visual collections is limited to associated metadata only done by text retrieval methods that are proven to be scalable. Search in audio-visual content can be generalized to search in metric spaces by assuming some distance function on low-level features. In this paper we propose a framework for efficient indexing and retrieval of metric spaces by extending classical techniques taken from the textual IR methods such as lexicon, posting lists and boolean constraints- thus enable scalable search in any metric space and in particular in audiovisual content. We show the efficiency and effectiveness of our method by experiments on a large image collection. 1 Introduction Recently, with the proliferation of Web 2.0 applications, we experience a new trend in multimedia production where users become mass producers of audiovisual content. Websites like Flickr, YouTube and Facebook 1 are just some examples of websites with large collections of audio-visual content. Still, search in such collections is limited to associated metadata that was added manually by users and thus search results are quite dependant on the quality of the metadata. The problem of audio-visual content based search can be generalized to a search in metric spaces by assuming some distance function on low-level features. For example, for CBIR (Content Based Image Retrieval) it is common to use low level features such as Scalable Color, Textures, Edge Histograms etc. For example, the Scalable Color feature which describes the image color histogram can be represented by a vector of 64 bytes. Distance between two Scalable Colors vectors can be computed by the L 1 norm thus inducing a metric space over the domain of all Scalable Colors. Similarly, we can have another metric space for other low level image features [9]. This work was partially funded by the European Commission Sixth Framework Programme project SAPIR - Search in Audiovisual content using P2P IR
2 State of the art solutions for search in metric spaces such as [2, 4, 14] reveal linear scalability in the collection size. This is mainly because of high number of distance computations needed for comparing low level features. Thus, some approximation is needed. On the other hand, text search is known to be scalable since text IR methods uses efficient data structures such as lexicon, posting lists and boolean operations that make it very efficient for searching in large collections. In this work, we try to bridge the performance gap between text IR and metric IR by introducing an approximation for top-k using MII (Metric Inverted Index) - an inverted index implementation for metric spaces. We show that by using text IR methods such as lexicon, posting lists and boolean operators we can achieve high scalability for search in metric spaces. Using the MII we can tradeoff between efficiency and effectiveness of our approximated top-k. We show an implementation of a system based on Lucene 2 and report experiments on a real data collection that shows the efficiency of our method compared to other methods. The paper is organized as follows. We review related work in Section 2. In Section 3, we define metric spaces and state the problem. In Section 4, we describe a text IR method for indexing metric spaces. Then, in Section 5, we show how to use them efficiently for top-k retrieval. Experimental setup and results on a large image collection are given in Section 6 and we conclude with future work in Section 7. 2 Related work The idea of using inverted index to represent audio-visual was already suggested by some works. The serious of papers by Müller et al. [13, 7, 12] is one of the early works that suggested the use of inverted index for visual features. The terms in the lexicon are fixed and it contains pre-defined set of around features which correspond to human perception features. In [12] they show how to improve the search by pruning. Specifically, they define lossy and lossless reductions, both use the knowledge of the score and score increasing. However, it is not clear if and how this approach is scalable (their experiments are on limited collection of about 2500 images). The work in [17] shares similar motivation to our work. They facilitate the use of inverted index to index visual features by using N-grams approach on the visual descriptors. Their approach is thus limited only to the vector space, whereas, our approach is more general in the sense that it can handle any type of objects with metric distance. Recently, [1, 8] showed how to generate tf idf for image visual descriptors using segmentation. Specifically, in [1], each image was segmented into several segments and like-tf is generated for the image. Our focus is different, we do not attempt to use tf idf scoring but rather efficient filtering method, which implies different indexing and querying methods. The focus in [20, 3] is how to combine low level features 2
3 with text, semantic keywords etc. In [11], they use the tf idf measure for index and search in videos. Their goal is to efficiently recognize scenes and objects in videos. For scenes search, they indexed continuous frames into regions preferring Shape Adapted and Maximally Stable regions, To locate on object in the frame, they mainly look on the spatial consistency (other objects in the frame) and look for the best match. Another line of related works discuss methodologies of indexing and accessing data in metric spaces and are covered in details in [9, 19]. These techniques are general and work for every metric space, however, there main disadvantage is that complexity increases linearly with the collection size, thus, they are not scalable and not appropriate to large-scale collections. Each of the techniques partition the metric space differently, specifically, the work in [18] considered the space as a ball and applied ball-partitioning. An orthogonal approach is the GHT by [14], where two reference objects (pivots) are arbitrarily chosen and assigned to two distinct objects subset. All objects are assigned to the subset containing the pivot which is nearest to the object itself. In contrast to ball partitioning the generalized hyperplane does not guarantee a balanced split. Finally, the works in [2, 4] present another way of partitioning the data having balanced structure. Merging techniques are also relevant to our problem since we want to do efficient merging of different index lists. Approaches like mergesort and more sophisticated, like TA [5] exists, our work, however, uses boolean query techniques. 3 Definitions and Problem Statement 3.1 Definitions Metric Spaces A metric space is an ordered pair (S, d), where S is a domain and d is a distance function d : S S R. d calculates the distance between two objects from the set S. Usually, the smaller the distance is, the more similar the objects are. The distance function d must satisfy non-negativity, reflexibility, symmetry and triangle inequality. Assume a metric space (S, d), a collection C S of objects, and a query Q S. The best-k results for the query are the k objects with the shortest distance to the query object. Top-k problem Assume m metric spaces {(S i, d i ))} m i=1, a query Q m i=1 S i, and an aggregate function f : R m R. The problem of top-k is to retrieve the best k objects that have the lower aggreagte value, namely those objects D with the lower f(d 1 (Q, D), d 2 (Q, D),..., d m (Q, D)) Measures effectiveness To measure effectiveness of a retrieval engine we use the standard Precision and MAP (Mean average Precision) [16] measures. The precision is used to measure the percentage of documents retrieved that are
4 relevant. In metric spaces each object has some distance to the query (and thus some level of relevancy). Thus, we define which is the same as MAP but we take the relevant results to be the real top-k objects. efficiency Let us assume that the cost of a single distance computation between any two objects in any of the metric spaces is c d. The efficiency of a retrieval engine on metric spaces is measured by the total computation cost (c d N d ), where N d is the total number of distance computations needed to answer a top-k query. 3.2 Problem Statement Assume {(S i, d i ))} m i=1 metric spaces and a cost c d for a single distance computation. Our goal is to return the best approximation (measured by effectiveness) to the top-ḱ query while minimizing the total computation cost (efficiency measure). 4 Indexing In this section, we describe the indexing stage. We assume m metric spaces denoted by {(S 1, d 1 ) (S m, d m )} and a collection of N objects each having m features F 1 F m such that i [1, m], F i (S i, d i ). An inverted text index contains two main parts: a lexicon of terms in the collection and posting lists containing for each term the list of documents in which the term appear. For metric spaces, a term is a feature names associated with its value. Indexing stage is done in two steps: first, we create a lexicon; then, we index each object in the corresponding entries of the lexicon. 4.1 Lexicon candidate selection In text IR, the size of the lexicon is substantially smaller than the size of the collection since terms tend to appear in many documents. This is not always the case for metric spaces. For example, low level image features such as e.g. a Scalable Color can be reprensented by a vector of 256 integers. Since the space of possible values for each feature is huge, the size of a lexicon will be in the order of the collection size. Consequently, it would be impossible to store all features in the lexicon as in text IR. Thus we would like to select candidates for the lexicon such that the max number of lexicon entries will be at most l. Note that we keep all the different features from all the metric spaces in the same lexicon. A naive approach would consist by ramdomly selecting l/m documents per feature. The m features extracted from these documents will constitute the lexicon terms. This approach can be widely improved by applying some clustering algorithms on the extracted features using the distance function such that selected candidates spans the metric space. Such algorithms are described in [10] and [15]. In the present work, we use K-Means clustering algorithm [6].
5 For each feature, we choose randomly M = l/m centroids. Note that generally M << N. K-Means algorithm is run iteratively by mapping each feature to the closest centroid using the distance function. Then new centroids are selected and the process continues until fixed clusters are reached, namely no feature was moved to another centroid in a full iteration. The lexicon elements are then the clusters centroids. K-Means is run for each of the m features and we get a lexicon of l terms. 4.2 Index creation Once the lexicon entries are selected we need to create the posting lists. Let us denote the terms of the lexicon by (F : v) where F is the feature name and v is its value. Assume an object D with features D = {(F 1 : v 1 ), (F 2 : v 2 ),..., (F m : v m )}. Most of the features (F i : v i ) are not part of the lexicon so we need to select from the lexicon terms that best represent our object. This is done by selecting for each feature (F i : v i ), the n nearest lexicon terms (n is a parameter) according to the distance function. In order to determine these nearest terms efficiently, the lexicon is stored in an M-tree [2]. After this stage, D can be represented as: D = {(F 1 : v 11 ), (F 1 : v 12 ),... (F 1 : v 1n ), (F 2 : v 21 ), (F 2 : v 22 ),... (F 2 : v 2n )..., (F m : v m1 ), (F m : v m2 ),..., (F m : v mn )}, where (F i : v ij ) are 3 lexicon terms. In other words, each document is transformed to a bag of n m lexicon terms and therefor it is indexed in n m posting lists. Note that the n nearest terms selection can be compared to the stemming step in text IR. 5 Retrieval stage Given a query Q = {F 1 : qv 1, F 2 : qv 2,..., F m : qv m } we describe now the query processing done to retrieve the best approximation of the top-k while minimizing the computation cost as defined in Section 3. Query processing is done in three steps: term selection, Boolean constraint filtering, and scoring. 5.1 Term selection: Similarly to the indexing process in section 4 above, we first find for each query feature, the n closest lexicon terms that can represent it. This results in Q = {F 1 : qv 11, F 1 : qv 12,... F 1 : v 1n,... F m : qv m1, F m : qv m2,..., F m : qv mn }. Q contains m n terms all from the lexicon. 3 BS: maybe add all
6 5.2 Boolean constraint Filtering: We look now at the posting lists of the m n in Q and we want to select the best approximation to the top-k out of the objects in those posting lists. We present two strategies to select the best results: Strict-Query mode - recall that we have m features in the query and for each feature we have n posting lists of the closest lexicon entries. In the strict mode we would like to have results only if they appear in all m features. To achieve that we create a conjunction of m clauses C i, where clause C i is a disjunction of the n terms found for query feature (F i, qv i ). Experiments demonstrate that this occurs very rarely in real configuration. Formally, m n j=1(f i : qv ij ) (1) i=1 Fuzzy-Query mode - in this mode, we consider all objects that appear in at least 2 features. To achive that we create ( ) m 2 conjunction clauses for each pair of features and then we create a disjunction between those clauses. Formally, m ( ( n j=1(f i : v ij ), n j=1(f k : v kj )), i k) (2) i=1 5.3 Scoring At the end of the filtering step we have a list of documents and we would like to rank them by relevance to the query. Given an aggregate function f, we compute for each returned document D, its aggregate score to the query Q given by f(d 1 (F 1 : qv 1, F 1 : v 1 ), d 2 (F 2 : qv 2, F 2 : v 2 ),..., d m (F m : qv m, F m : v m )), where F i : v i and F i : qv i are the original values of the document and the query. Our top-k result is the k documents with the lower aggregate score. 6 Experiments In this section, we show thorough analysis of the MII under various settings with focus on the effectiveness which is measured by the number of distance computations and the efficiency which we measured by MAP@k. Specifically, we varied the lexicon size and number of nearest terms. In addition, we compare the MII to the M-Tree data structure [2], which is one of the state of the art data structure for metric spaces 4. 4 The M-Tree implementation we used has been taken from
7 6.1 Collection description Our experiments were done on a collection of images crawled from Flickr 5, for each image, three MPEG7 features have been extracted, namely, ScalableColor, EdgeHistogram and ColorLayout. The crawling and extraction are described in [?]. The distance function that we used for ScalableColor and EdgeHistogram is the L 1 (Manhattan distance), and L 2 (Euclidean distance) for ColorLayout. The aggregate function f is simple sum. We randomly chosen 180 images from the collection to serve as the queries, for each we collected the MAP and the number of comparisons. In addition, we used the Fuzzy-Query mode. 6.2 Results Like all pruning algorithms, the experiments point on a tension between the precision of the results and the strictness of the pruning: more the pruning is strict, more the precision will be low and the efficiency of the computing high. Effectiveness We present the effectiveness of the MII approach using the MAP@k measure. Figure 1(a) shows the MAP@k values vs. the number of nearest terms for index with lexicon size of l = When the number of nearest terms increases, the MAP@k increases as well, for example, when the number of nearest terms is 20 we got MAP@30=0.75 and MAP@10=0.83. Increasing the number of nearest terms implies more accesses to entries in the lexicon and our filtering becomes lazier, eventually, more documents need to pass the full ranking. For 30 nearest terms the MAP value is around the 90%. Figure 1(b) shows the MAP@k values vs. the lexicons size for fixed number of nearest terms (30). Reducing the size of the lexicon enhanced the MAP@k. This result was expected, since the posting lists of each term became longer and the Boolean filtering need to validate more documents. (a) (b) Fig. 1. MAP value for different k s.(a) Lexicons size =16000, varied number of nearest terms(b) number of nearest term =30, varied size of lexicon Efficiency The efficiency is measured by the number of distance computations. We calculate the number of computations as follow; for each feature fet, we use M-Tree to found the n nearest terms in the lexicon. Let us designate this cost as lookup fet. After this step, we merge the different lists using Boolean filtering which does not imply additional computation costs. Finally, assume the number of objects that passed the filtering step to be m and the number of feature that 5
8 we used to be x, in order to get the top-k results, each object need additional m x comparisons. Formally, the total distance computations is: x (lookup i ) + x m i=1 Figure 2 shows the efficiency results for varying sizes of lexicon. Intuitively, reducing the size of the lexicon should increase the number of computations, however, we get that the cost for a size of lexicon l = is lower than for a size of lexicon equals to l = This phenomenon can be explained as follow, the lookup in the lexicon for selecting the Visual Vocabulary is cheap and compensates the manipulation of long returned results list. This result is interesting and promising, we can find an optimal size of lexicon that insures us a lazier pruning, and does not cost more in computations. Fig. 2. # comparisons vs. # nearest terms for varied lexicons size To validate the goodness of our efficiency results, we compare it with the number of computations involved in a lookup via M-Tree. Figure 3 shows the number of computations for MII and M-Tree for different values of ks. Fig. 3. # comparisons vs. k 7 Conclusion and future works References 1. Giuseppe Amato, Pasquale Savino, and Vanessa Magionami. Image indexing and retrieval using visual terms and text-like weighting. In DELOS Conference, pages 11 21, Paolo Ciaccia, Marco Patella, and Pavel Zezula. M-tree: An efficient access method for similarity search in metric spaces. In Matthias Jarke, Michael Carey, Klaus R. Dittrich, Fred Lochovsky, Pericles Loucopoulos, and Manfred A. Jeusfeld, editors, Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB 97), pages , Athens, Greece, August Morgan Kaufmann Publishers, Inc. 3. T. Deselaers, T. Weyand, D. Keysers, W. Macherey, and H. Ney. FIRE in ImageCLEF 2005: Combining content-based image retrieval with textual information retrieval. In Working notes of the CLEF 2005 Workshop, Vienna, Austria, September 2005.
9 4. Vlastislav Dohnal, Claudio Gennaro, Pasquale Savino, and Pavel Zezula. D-index: Distance searching index for metric data sets. Multimedia Tools Appl., 21(1):9 33, Ronald Fagin, Amnon Lotem, and Moni Naor. Optimal aggregation algorithms for middleware. In PODS 01: Proceedings of the twentieth ACM SIGMOD-SIGACT- SIGART symposium on Principles of database systems, pages , New York, NY, USA, ACM Press. 6. J. A. Hartigan and M. A. Wong. A K-means clustering algorithm. Applied Statistics, 28: , Henning Muller, David McG. Squire, Wolfgang Muller, and Thierry Pun. Efficient access methods for content-based image retrieval with inverted files. 8. Rute Frias Rui Jesus, Ricardo Dias and Nuno Correia. Multimedia information retrieval: new challenges in audio visual search. SIGIR Forum, 41(2):77 82, Hanan Samet. Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, Gholamhosein Sheikholeslami, Wendy Chang, and Aidong Zhang. Semantic clustering and querying on heterogeneous features for visual data. In MULTIMEDIA 98: Proceedings of the sixth ACM international conference on Multimedia, pages 3 12, New York, NY, USA, ACM. 11. Josef Sivic and Andrew Zisserman. Video google: A text retrieval approach to object matching in videos. In ICCV 03: Proceedings of the Ninth IEEE International Conference on Computer Vision, page 1470, Washington, DC, USA, IEEE Computer Society. 12. D. Squire, H. Muller, and W. Muller. Improving response time by search pruning in a content-based image retrieval system. 13. D. Squire, W. Muller, H. Muller, and J. Raki. Content-based query of image databases, inspirations from text retrieval: inverted files, frequency-based weights and relevance feedback. In Proceedings of the 10th Scandinavian Conference on Image Analysis (SCIA 99), June Jeffrey K. Ulmann. Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett., 40(4): , A. Vailaya, M. Figueiredo, A. Jain, and H. Zhang. Image classification for contentbased indexing, C. J. Van Rijsbergen. Information Retrieval, 2nd edition. Dept. of Computer Science, University of Glasgow, Peter Wilkins, Paul Ferguson, Alan F. Smeaton, and Cathal Gurrin. Text based approaches for content-based image retrieval on large image collections. In 2nd IEE European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies, P. Yianilos. Excluded middle vantage point forests for nearest neighbor search, Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal, and Michal Batko. Similarity Search: The Metric Space Approach (Advances in Database Systems). Springer- Verlag New York, Inc., Secaucus, NJ, USA, Xiang Sean Zhou and Thomas S. Huang. Unifying keywords and visual contents in image retrieval. IEEE MultiMedia, 9(2):23 33, 2002.
MESSIF: Metric Similarity Search Implementation Framework
MESSIF: Metric Similarity Search Implementation Framework Michal Batko David Novak Pavel Zezula {xbatko,xnovak8,zezula}@fi.muni.cz Masaryk University, Brno, Czech Republic Abstract The similarity search
More informationCombining Local and Global Visual Feature Similarity using a Text Search Engine
Combining Local and Global Visual Feature Similarity using a Text Search Engine Giuseppe Amato, Paolo Bolettieri, Fabrizio Falchi, Claudio Gennaro, Fausto Rabitti {giuseppe.amato, fabrizio.falchi, paolo.bolettieri,
More informationImage retrieval based on bag of images
University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2009 Image retrieval based on bag of images Jun Zhang University of Wollongong
More informationMultimodal Medical Image Retrieval based on Latent Topic Modeling
Multimodal Medical Image Retrieval based on Latent Topic Modeling Mandikal Vikram 15it217.vikram@nitk.edu.in Suhas BS 15it110.suhas@nitk.edu.in Aditya Anantharaman 15it201.aditya.a@nitk.edu.in Sowmya Kamath
More informationIPL at ImageCLEF 2017 Concept Detection Task
IPL at ImageCLEF 2017 Concept Detection Task Leonidas Valavanis and Spyridon Stathopoulos Information Processing Laboratory, Department of Informatics, Athens University of Economics and Business, 76 Patission
More informationAn approach to Content-Based Image Retrieval based on the Lucene search engine library (Extended Abstract)
An approach to Content-Based Image Retrieval based on the Lucene search engine library (Extended Abstract) Claudio Gennaro, Giuseppe Amato, Paolo Bolettieri, Pasquale Savino {claudio.gennaro, giuseppe.amato,
More informationNearest Neighbor Search by Branch and Bound
Nearest Neighbor Search by Branch and Bound Algorithmic Problems Around the Web #2 Yury Lifshits http://yury.name CalTech, Fall 07, CS101.2, http://yury.name/algoweb.html 1 / 30 Outline 1 Short Intro to
More informationApproximate Similarity Search in Metric Spaces using Inverted Files
Approximate Similarity Search in Metric Spaces using Inverted Files Giuseppe Amato ISTI-CNR Via G. Moruzzi, 56, Pisa, Italy giuseppe.amato@isti.cnr.it ABSTRACT We propose a new approach to perform approximate
More informationCLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL
STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LVII, Number 4, 2012 CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL IOAN BADARINZA AND ADRIAN STERCA Abstract. In this paper
More informationBranch and Bound. Algorithms for Nearest Neighbor Search: Lecture 1. Yury Lifshits
Branch and Bound Algorithms for Nearest Neighbor Search: Lecture 1 Yury Lifshits http://yury.name Steklov Institute of Mathematics at St.Petersburg California Institute of Technology 1 / 36 Outline 1 Welcome
More informationThe Effect of Word Sampling on Document Clustering
The Effect of Word Sampling on Document Clustering OMAR H. KARAM AHMED M. HAMAD SHERIN M. MOUSSA Department of Information Systems Faculty of Computer and Information Sciences University of Ain Shams,
More informationContent-Based Image Retrieval of Web Surface Defects with PicSOM
Content-Based Image Retrieval of Web Surface Defects with PicSOM Rami Rautkorpi and Jukka Iivarinen Helsinki University of Technology Laboratory of Computer and Information Science P.O. Box 54, FIN-25
More informationClassifying Documents by Distributed P2P Clustering
Classifying Documents by Distributed P2P Clustering Martin Eisenhardt Wolfgang Müller Andreas Henrich Chair of Applied Computer Science I University of Bayreuth, Germany {eisenhardt mueller2 henrich}@uni-bayreuth.de
More informationA Pivot-based Index Structure for Combination of Feature Vectors
A Pivot-based Index Structure for Combination of Feature Vectors Benjamin Bustos Daniel Keim Tobias Schreck Department of Computer and Information Science, University of Konstanz Universitätstr. 10 Box
More informationADAPTIVE TEXTURE IMAGE RETRIEVAL IN TRANSFORM DOMAIN
THE SEVENTH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV 2002), DEC. 2-5, 2002, SINGAPORE. ADAPTIVE TEXTURE IMAGE RETRIEVAL IN TRANSFORM DOMAIN Bin Zhang, Catalin I Tomai,
More informationQuery-Sensitive Similarity Measure for Content-Based Image Retrieval
Query-Sensitive Similarity Measure for Content-Based Image Retrieval Zhi-Hua Zhou Hong-Bin Dai National Laboratory for Novel Software Technology Nanjing University, Nanjing 2193, China {zhouzh, daihb}@lamda.nju.edu.cn
More informationConsistent Line Clusters for Building Recognition in CBIR
Consistent Line Clusters for Building Recognition in CBIR Yi Li and Linda G. Shapiro Department of Computer Science and Engineering University of Washington Seattle, WA 98195-250 shapiro,yi @cs.washington.edu
More informationPredictive Indexing for Fast Search
Predictive Indexing for Fast Search Sharad Goel, John Langford and Alex Strehl Yahoo! Research, New York Modern Massive Data Sets (MMDS) June 25, 2008 Goel, Langford & Strehl (Yahoo! Research) Predictive
More informationAn Enhanced Image Retrieval Using K-Mean Clustering Algorithm in Integrating Text and Visual Features
An Enhanced Image Retrieval Using K-Mean Clustering Algorithm in Integrating Text and Visual Features S.Najimun Nisha 1, Mrs.K.A.Mehar Ban 2, 1 PG Student, SVCET, Puliangudi. najimunnisha@yahoo.com 2 AP/CSE,
More informationmodern database systems lecture 5 : top-k retrieval
modern database systems lecture 5 : top-k retrieval Aristides Gionis Michael Mathioudakis spring 2016 announcements problem session on Monday, March 7, 2-4pm, at T2 solutions of the problems in homework
More informationCPPP/UFMS at ImageCLEF 2014: Robot Vision Task
CPPP/UFMS at ImageCLEF 2014: Robot Vision Task Rodrigo de Carvalho Gomes, Lucas Correia Ribas, Amaury Antônio de Castro Junior, Wesley Nunes Gonçalves Federal University of Mato Grosso do Sul - Ponta Porã
More informationContent-Based Image Retrieval with LIRe and SURF on a Smartphone-Based Product Image Database
Content-Based Image Retrieval with LIRe and SURF on a Smartphone-Based Product Image Database Kai Chen 1 and Jean Hennebert 2 1 University of Fribourg, DIVA-DIUF, Bd. de Pérolles 90, 1700 Fribourg, Switzerland
More informationK-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors
K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors Shao-Tzu Huang, Chen-Chien Hsu, Wei-Yen Wang International Science Index, Electrical and Computer Engineering waset.org/publication/0007607
More informationPivot Selection Strategies for Permutation-Based Similarity Search
Pivot Selection Strategies for Permutation-Based Similarity Search Giuseppe Amato, Andrea Esuli, and Fabrizio Falchi Istituto di Scienza e Tecnologie dell Informazione A. Faedo, via G. Moruzzi 1, Pisa
More informationCrawling, Indexing, and Similarity Searching Images on the Web
Crawling, Indexing, and Similarity Searching Images on the Web (Extended Abstract) M. Batko 1 and F. Falchi 2 and C. Lucchese 2 and D. Novak 1 and R. Perego 2 and F. Rabitti 2 and J. Sedmidubsky 1 and
More informationVK Multimedia Information Systems
VK Multimedia Information Systems Mathias Lux, mlux@itec.uni-klu.ac.at Dienstags, 16.oo Uhr This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Agenda Evaluations
More informationTRECVid 2013 Experiments at Dublin City University
TRECVid 2013 Experiments at Dublin City University Zhenxing Zhang, Rami Albatal, Cathal Gurrin, and Alan F. Smeaton INSIGHT Centre for Data Analytics Dublin City University Glasnevin, Dublin 9, Ireland
More informationA NOVEL APPROACH ON SPATIAL OBJECTS FOR OPTIMAL ROUTE SEARCH USING BEST KEYWORD COVER QUERY
A NOVEL APPROACH ON SPATIAL OBJECTS FOR OPTIMAL ROUTE SEARCH USING BEST KEYWORD COVER QUERY S.Shiva Reddy *1 P.Ajay Kumar *2 *12 Lecterur,Dept of CSE JNTUH-CEH Abstract Optimal route search using spatial
More informationProposal: k-d Tree Algorithm for k-point Matching
Proposal: k-d Tree Algorithm for k-point Matching John R Hott University of Virginia 1 Motivation Every drug seeking FDA approval must go through Phase II and III clinical trial periods (trials on human
More information[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AN EFFICIENT APPROACH FOR TEXT MINING USING SIDE INFORMATION Kiran V. Gaidhane*, Prof. L. H. Patil, Prof. C. U. Chouhan DOI: 10.5281/zenodo.58632
More informationSketch Based Image Retrieval Approach Using Gray Level Co-Occurrence Matrix
Sketch Based Image Retrieval Approach Using Gray Level Co-Occurrence Matrix K... Nagarjuna Reddy P. Prasanna Kumari JNT University, JNT University, LIET, Himayatsagar, Hyderabad-8, LIET, Himayatsagar,
More informationFrom Passages into Elements in XML Retrieval
From Passages into Elements in XML Retrieval Kelly Y. Itakura David R. Cheriton School of Computer Science, University of Waterloo 200 Univ. Ave. W. Waterloo, ON, Canada yitakura@cs.uwaterloo.ca Charles
More informationQuery Evaluation Strategies
Introduction to Search Engine Technology Term-at-a-Time and Document-at-a-Time Evaluation Ronny Lempel Yahoo! Labs (Many of the following slides are courtesy of Aya Soffer and David Carmel, IBM Haifa Research
More informationA Graph Theoretic Approach to Image Database Retrieval
A Graph Theoretic Approach to Image Database Retrieval Selim Aksoy and Robert M. Haralick Intelligent Systems Laboratory Department of Electrical Engineering University of Washington, Seattle, WA 98195-2500
More informationOutline. Other Use of Triangle Inequality Algorithms for Nearest Neighbor Search: Lecture 2. Orchard s Algorithm. Chapter VI
Other Use of Triangle Ineuality Algorithms for Nearest Neighbor Search: Lecture 2 Yury Lifshits http://yury.name Steklov Institute of Mathematics at St.Petersburg California Institute of Technology Outline
More informationOffering Access to Personalized Interactive Video
Offering Access to Personalized Interactive Video 1 Offering Access to Personalized Interactive Video Giorgos Andreou, Phivos Mylonas, Manolis Wallace and Stefanos Kollias Image, Video and Multimedia Systems
More informationdoc. RNDr. Tomáš Skopal, Ph.D. Department of Software Engineering, Faculty of Information Technology, Czech Technical University in Prague
Praha & EU: Investujeme do vaší budoucnosti Evropský sociální fond course: Searching the Web and Multimedia Databases (BI-VWM) Tomáš Skopal, 2011 SS2010/11 doc. RNDr. Tomáš Skopal, Ph.D. Department of
More informationLarge Scale Image Retrieval
Large Scale Image Retrieval Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University in Prague Features Affine invariant features Efficient descriptors Corresponding regions
More informationFuzzy based Multiple Dictionary Bag of Words for Image Classification
Available online at www.sciencedirect.com Procedia Engineering 38 (2012 ) 2196 2206 International Conference on Modeling Optimisation and Computing Fuzzy based Multiple Dictionary Bag of Words for Image
More informationОбработка и оптимизация запросов
Обработка и оптимизация запросов Б.А. Новиков Осенний семестр 2014 h=p://www.math.spbu.ru/user/boris novikov/courses/ 20 сентября 2014 1 Antonin Gu=man. 1984. R- trees: a dynamic index structure for spazal
More informationA Content Based Image Retrieval System Based on Color Features
A Content Based Image Retrieval System Based on Features Irena Valova, University of Rousse Angel Kanchev, Department of Computer Systems and Technologies, Rousse, Bulgaria, Irena@ecs.ru.acad.bg Boris
More informationDocument Clustering: Comparison of Similarity Measures
Document Clustering: Comparison of Similarity Measures Shouvik Sachdeva Bhupendra Kastore Indian Institute of Technology, Kanpur CS365 Project, 2014 Outline 1 Introduction The Problem and the Motivation
More informationNOVEL APPROACH FOR COMPARING SIMILARITY VECTORS IN IMAGE RETRIEVAL
NOVEL APPROACH FOR COMPARING SIMILARITY VECTORS IN IMAGE RETRIEVAL Abstract In this paper we will present new results in image database retrieval, which is a developing field with growing interest. In
More informationEnhanced and Efficient Image Retrieval via Saliency Feature and Visual Attention
Enhanced and Efficient Image Retrieval via Saliency Feature and Visual Attention Anand K. Hase, Baisa L. Gunjal Abstract In the real world applications such as landmark search, copy protection, fake image
More informationA Scalable Index Mechanism for High-Dimensional Data in Cluster File Systems
A Scalable Index Mechanism for High-Dimensional Data in Cluster File Systems Kyu-Woong Lee Hun-Soon Lee, Mi-Young Lee, Myung-Joon Kim Abstract We address the problem of designing index structures that
More informationEfficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points
Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points Dr. T. VELMURUGAN Associate professor, PG and Research Department of Computer Science, D.G.Vaishnav College, Chennai-600106,
More informationAn Efficient Semantic Image Retrieval based on Color and Texture Features and Data Mining Techniques
An Efficient Semantic Image Retrieval based on Color and Texture Features and Data Mining Techniques Doaa M. Alebiary Department of computer Science, Faculty of computers and informatics Benha University
More informationMUFIN Basics. MUFIN team Faculty of Informatics, Masaryk University Brno, Czech Republic SEMWEB 1
MUFIN Basics MUFIN team Faculty of Informatics, Masaryk University Brno, Czech Republic mufin@fi.muni.cz SEMWEB 1 Search problem SEARCH index structure data & queries infrastructure SEMWEB 2 The thesis
More informationMobile Visual Search with Word-HOG Descriptors
Mobile Visual Search with Word-HOG Descriptors Sam S. Tsai, Huizhong Chen, David M. Chen, and Bernd Girod Department of Electrical Engineering, Stanford University, Stanford, CA, 9435 sstsai@alumni.stanford.edu,
More informationText Based Approaches for Content-Based Image Retrieval on Large Image Collections
Text Based Approaches for Content-Based Image Retrieval on Large Image Collections Peter Wilkins, Paul Ferguson, Alan F. Smeaton and Cathal Gurrin Centre for Digital Video Processing Adaptive Information
More informationVideo Google: A Text Retrieval Approach to Object Matching in Videos
Video Google: A Text Retrieval Approach to Object Matching in Videos Josef Sivic, Frederik Schaffalitzky, Andrew Zisserman Visual Geometry Group University of Oxford The vision Enable video, e.g. a feature
More informationAN ENHANCED ATTRIBUTE RERANKING DESIGN FOR WEB IMAGE SEARCH
AN ENHANCED ATTRIBUTE RERANKING DESIGN FOR WEB IMAGE SEARCH Sai Tejaswi Dasari #1 and G K Kishore Babu *2 # Student,Cse, CIET, Lam,Guntur, India * Assistant Professort,Cse, CIET, Lam,Guntur, India Abstract-
More informationChapter 6: Information Retrieval and Web Search. An introduction
Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods
More informationBeyond Bags of Features
: for Recognizing Natural Scene Categories Matching and Modeling Seminar Instructed by Prof. Haim J. Wolfson School of Computer Science Tel Aviv University December 9 th, 2015
More informationPSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets
2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department
More informationA Simple, Efficient, Parallelizable Algorithm for Approximated Nearest Neighbors
A Simple, Efficient, Parallelizable Algorithm for Approximated Nearest Neighbors Sebastián Ferrada 1, Benjamin Bustos 1, and Nora Reyes 2 1 Institute for Foundational Research on Data Department of Computer
More informationLecture 12 Recognition
Institute of Informatics Institute of Neuroinformatics Lecture 12 Recognition Davide Scaramuzza 1 Lab exercise today replaced by Deep Learning Tutorial Room ETH HG E 1.1 from 13:15 to 15:00 Optional lab
More informationEfficient Content Based Image Retrieval System with Metadata Processing
IJIRST International Journal for Innovative Research in Science & Technology Volume 1 Issue 10 March 2015 ISSN (online): 2349-6010 Efficient Content Based Image Retrieval System with Metadata Processing
More informationAutomatic Group-Outlier Detection
Automatic Group-Outlier Detection Amine Chaibi and Mustapha Lebbah and Hanane Azzag LIPN-UMR 7030 Université Paris 13 - CNRS 99, av. J-B Clément - F-93430 Villetaneuse {firstname.secondname}@lipn.univ-paris13.fr
More informationEfficient Representation of Local Geometry for Large Scale Object Retrieval
Efficient Representation of Local Geometry for Large Scale Object Retrieval Michal Perďoch Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University in Prague IEEE Computer Society
More informationIssues in Using Knowledge to Perform Similarity Searching in Multimedia Databases without Redundant Data Objects
Issues in Using Knowledge to Perform Similarity Searching in Multimedia Databases without Redundant Data Objects Leonard Brown The University of Oklahoma School of Computer Science 200 Felgar St. EL #114
More informationA Document-centered Approach to a Natural Language Music Search Engine
A Document-centered Approach to a Natural Language Music Search Engine Peter Knees, Tim Pohle, Markus Schedl, Dominik Schnitzer, and Klaus Seyerlehner Dept. of Computational Perception, Johannes Kepler
More informationPivoting M-tree: A Metric Access Method for Efficient Similarity Search
Pivoting M-tree: A Metric Access Method for Efficient Similarity Search Tomáš Skopal Department of Computer Science, VŠB Technical University of Ostrava, tř. 17. listopadu 15, Ostrava, Czech Republic tomas.skopal@vsb.cz
More informationMultimodal Information Spaces for Content-based Image Retrieval
Research Proposal Multimodal Information Spaces for Content-based Image Retrieval Abstract Currently, image retrieval by content is a research problem of great interest in academia and the industry, due
More informationSpatial Index Keyword Search in Multi- Dimensional Database
Spatial Index Keyword Search in Multi- Dimensional Database Sushma Ahirrao M. E Student, Department of Computer Engineering, GHRIEM, Jalgaon, India ABSTRACT: Nearest neighbor search in multimedia databases
More informationHandling Multiple K-Nearest Neighbor Query Verifications on Road Networks under Multiple Data Owners
Handling Multiple K-Nearest Neighbor Query Verifications on Road Networks under Multiple Data Owners S.Susanna 1, Dr.S.Vasundra 2 1 Student, Department of CSE, JNTU Anantapur, Andhra Pradesh, India 2 Professor,
More informationInverted Index for Fast Nearest Neighbour
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationBoolean Model. Hongning Wang
Boolean Model Hongning Wang CS@UVa Abstraction of search engine architecture Indexed corpus Crawler Ranking procedure Doc Analyzer Doc Representation Query Rep Feedback (Query) Evaluation User Indexer
More informationEfficient Indexing and Searching Framework for Unstructured Data
Efficient Indexing and Searching Framework for Unstructured Data Kyar Nyo Aye, Ni Lar Thein University of Computer Studies, Yangon kyarnyoaye@gmail.com, nilarthein@gmail.com ABSTRACT The proliferation
More informationClustering-Based Similarity Search in Metric Spaces with Sparse Spatial Centers
Clustering-Based Similarity Search in Metric Spaces with Sparse Spatial Centers Nieves Brisaboa 1, Oscar Pedreira 1, Diego Seco 1, Roberto Solar 2,3, Roberto Uribe 2,3 1 Database Laboratory, University
More informationImproving Instance Search Performance in Video Collections
Improving Instance Search Performance in Video Collections Zhenxing Zhang School of Computing Dublin City University Supervisor: Dr. Cathal Gurrin and Prof. Alan Smeaton This dissertation is submitted
More informationDynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering
Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Abstract Mrs. C. Poongodi 1, Ms. R. Kalaivani 2 1 PG Student, 2 Assistant Professor, Department of
More informationStatic Pruning of Terms In Inverted Files
In Inverted Files Roi Blanco and Álvaro Barreiro IRLab University of A Corunna, Spain 29th European Conference on Information Retrieval, Rome, 2007 Motivation : to reduce inverted files size with lossy
More informationVK Multimedia Information Systems
VK Multimedia Information Systems Mathias Lux, mlux@itec.uni-klu.ac.at This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Results Exercise 01 Exercise 02 Retrieval
More informationLecture 12 Recognition. Davide Scaramuzza
Lecture 12 Recognition Davide Scaramuzza Oral exam dates UZH January 19-20 ETH 30.01 to 9.02 2017 (schedule handled by ETH) Exam location Davide Scaramuzza s office: Andreasstrasse 15, 2.10, 8050 Zurich
More informationAn Efficient Methodology for Image Rich Information Retrieval
An Efficient Methodology for Image Rich Information Retrieval 56 Ashwini Jaid, 2 Komal Savant, 3 Sonali Varma, 4 Pushpa Jat, 5 Prof. Sushama Shinde,2,3,4 Computer Department, Siddhant College of Engineering,
More informationComponent ranking and Automatic Query Refinement for XML Retrieval
Component ranking and Automatic uery Refinement for XML Retrieval Yosi Mass, Matan Mandelbrod IBM Research Lab Haifa 31905, Israel {yosimass, matan}@il.ibm.com Abstract ueries over XML documents challenge
More informationLarge-scale visual recognition The bag-of-words representation
Large-scale visual recognition The bag-of-words representation Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline Bag-of-words Large or small vocabularies? Extensions for instance-level
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 CS 347 Notes 12 5 Web Search Engine Crawling
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 Web Search Engine Crawling Indexing Computing
More informationString distance for automatic image classification
String distance for automatic image classification Nguyen Hong Thinh*, Le Vu Ha*, Barat Cecile** and Ducottet Christophe** *University of Engineering and Technology, Vietnam National University of HaNoi,
More informationVolume 2, Issue 6, June 2014 International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 6, June 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Internet
More informationInformation Retrieval. (M&S Ch 15)
Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion
More informationIMAGE RETRIEVAL USING VLAD WITH MULTIPLE FEATURES
IMAGE RETRIEVAL USING VLAD WITH MULTIPLE FEATURES Pin-Syuan Huang, Jing-Yi Tsai, Yu-Fang Wang, and Chun-Yi Tsai Department of Computer Science and Information Engineering, National Taitung University,
More informationAUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS
AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS Nilam B. Lonkar 1, Dinesh B. Hanchate 2 Student of Computer Engineering, Pune University VPKBIET, Baramati, India Computer Engineering, Pune University VPKBIET,
More informationIntroduction to Similarity Search in Multimedia Databases
Introduction to Similarity Search in Multimedia Databases Tomáš Skopal Charles University in Prague Faculty of Mathematics and Phycics SIRET research group http://siret.ms.mff.cuni.cz March 23 rd 2011,
More informationAutomatic Cluster Number Selection using a Split and Merge K-Means Approach
Automatic Cluster Number Selection using a Split and Merge K-Means Approach Markus Muhr and Michael Granitzer 31st August 2009 The Know-Center is partner of Austria's Competence Center Program COMET. Agenda
More informationImproved Query by Image Retrieval using Multi-feature Algorithms
International Journal of Scientific & Engineering Research, Volume 4, Issue 8, August 2013 Improved Query by Image using Multi-feature Algorithms Rani Saritha R, Varghese Paul, P. Ganesh Kumar Abstract
More informationBipartite Graph Partitioning and Content-based Image Clustering
Bipartite Graph Partitioning and Content-based Image Clustering Guoping Qiu School of Computer Science The University of Nottingham qiu @ cs.nott.ac.uk Abstract This paper presents a method to model the
More informationQuery Expansion using Wikipedia and DBpedia
Query Expansion using Wikipedia and DBpedia Nitish Aggarwal and Paul Buitelaar Unit for Natural Language Processing, Digital Enterprise Research Institute, National University of Ireland, Galway firstname.lastname@deri.org
More informationGrid Resources Search Engine based on Ontology
based on Ontology 12 E-mail: emiao_beyond@163.com Yang Li 3 E-mail: miipl606@163.com Weiguang Xu E-mail: miipl606@163.com Jiabao Wang E-mail: miipl606@163.com Lei Song E-mail: songlei@nudt.edu.cn Jiang
More informationBest Keyword Cover Search
Vennapusa Mahesh Kumar Reddy Dept of CSE, Benaiah Institute of Technology and Science. Best Keyword Cover Search Sudhakar Babu Pendhurthi Assistant Professor, Benaiah Institute of Technology and Science.
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation
More informationKeypoint-based Recognition and Object Search
03/08/11 Keypoint-based Recognition and Object Search Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Notices I m having trouble connecting to the web server, so can t post lecture
More informationFinding the k-closest pairs in metric spaces
Finding the k-closest pairs in metric spaces Hisashi Kurasawa The University of Tokyo 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, JAPAN kurasawa@nii.ac.jp Atsuhiro Takasu National Institute of Informatics 2-1-2
More informationImage Querying. Ilaria Bartolini DEIS - University of Bologna, Italy
Image Querying Ilaria Bartolini DEIS - University of Bologna, Italy i.bartolini@unibo.it http://www-db.deis.unibo.it/~ibartolini SYNONYMS Image query processing DEFINITION Image querying refers to the
More informationUsing XML Logical Structure to Retrieve (Multimedia) Objects
Using XML Logical Structure to Retrieve (Multimedia) Objects Zhigang Kong and Mounia Lalmas Queen Mary, University of London {cskzg,mounia}@dcs.qmul.ac.uk Abstract. This paper investigates the use of the
More informationLarge scale object/scene recognition
Large scale object/scene recognition Image dataset: > 1 million images query Image search system ranked image list Each image described by approximately 2000 descriptors 2 10 9 descriptors to index! Database
More informationHierarchical Document Clustering
Hierarchical Document Clustering Benjamin C. M. Fung, Ke Wang, and Martin Ester, Simon Fraser University, Canada INTRODUCTION Document clustering is an automatic grouping of text documents into clusters
More informationEvaluation of GIST descriptors for web scale image search
Evaluation of GIST descriptors for web scale image search Matthijs Douze Hervé Jégou, Harsimrat Sandhawalia, Laurent Amsaleg and Cordelia Schmid INRIA Grenoble, France July 9, 2009 Evaluation of GIST for
More information