Published in A R DIGITECH

IMAGE RETRIEVAL USING LATENT SEMANTIC INDEXING

Rachana C Patil*1, Imran R. Shaikh*2
*1 (M.E Student, S.N.D.C.O.E.R.C, Yeola) *2 (Professor, S.N.D.C.O.E.R.C, Yeola)
rachanap4@gmail.com*1, imran.shaikh22@gmail.com*2

Abstract

Traditional methods for image retrieval use metadata associated with images, commonly known as keywords. These methods power many World Wide Web search engines and achieve a reasonable degree of accuracy. With the growing number of digital images, efficient image retrieval has become an important research topic. Retrieval based on image metadata depends entirely on the quality and completeness of the annotation. This is a weakness of the traditional method: because humans annotate images manually by entering keywords or metadata into a large database, annotation is time consuming and may not capture the keywords desired to describe the image. The evaluation of the effectiveness of keyword based image search is subjective and has not been well defined. Content based image retrieval (CBIR) resolves this problem to an extent. CBIR uses the contents of an image, such as shape, color, texture or any other information that can be derived from the image itself. CBIR has problems of its own, however. Among them, the semantic gap associated with image features has received a lot of attention. Images are represented by low-level features, and it is important to reduce the semantic gap between the high-level and low-level features of images in order to retrieve visually similar images and re-rank them. This study proposes the latent semantic indexing (LSI) method to re-rank images retrieved using the CBIR method, so that image retrieval better matches the user's intention.

Keywords: Content based, Image re-ranking, Low-level features, Latent semantic indexing, Metadata

1. Introduction

Multiple research efforts have been made towards effective retrieval of images. The famous idiom "A picture paints a thousand words" tells us that a visual presentation is far more descriptive than words; retrieving pictorial information, however, is not so easy, and image retrieval has never been an easy task. Keyword based searching is used widely by most of the popular search engines, such as Google and Bing. This method relies on the adjacent text features associated with an image while searching for relevant images, which leads to ambiguous and noisy results. For example, using "Samsung" as the query keyword yields a result set that includes a number of images of different categories, such as Samsung TV, Samsung laptop and Samsung handset.

To solve this problem, I used the concept of content based image retrieval (CBIR) and online image re-ranking in the base project, which solved the problems encountered in keyword based search [1] [2] [3]. In this approach, the system uses the keyword entered by the user to retrieve an initial set of images by matching it with the high-level semantic content of images in a stored word-image index file. From the pool of retrieved images that are relevant to the query keyword, the user selects a query image that reflects the user's search intention. Based on this selection, the remaining images in the pool are re-ranked interactively according to their visual similarity to the query image (low-level features such as shape, color, texture and spatial relationship) rather than their textual annotations [4][5]. The concept of content based image retrieval is thus used to find the similarity between images.

Instead of constructing a universal concept dictionary, this framework learns different query keywords individually and automatically. Using the query keyword provided by the user, the semantic space related to the images to be re-ranked can be narrowed down significantly. For example, if the query keyword is Samsung, the semantic concepts of TV and MacBook are unlikely to be relevant and can be ignored [7]. By using the query-specific visual semantic space, which models the images to be re-ranked more accurately, I removed the otherwise unlimited number of irrelevant concepts, which serve only as noise and weaken the performance of re-ranking in terms of both accuracy and computational cost. Visual and textual features are projected into semantic spaces to obtain semantic signatures. In the online part, images are then re-ranked by comparing their semantic signatures with the semantic signatures obtained from the semantic space of the given query keyword.

Although the content based image retrieval approach is effective, it still has some limitations. The two most problematic constraints are:

[1] Synonymy: multiple words that have similar meanings
[2] Polysemy: words that have more than one meaning

These limitations are also referred to as the semantic gap: the high-level semantic content of images cannot easily be mapped to their low-level features, which makes it hard to automatically retrieve the images that interest the user in one pass using the CBIR re-ranking approach.

In this paper, I propose the use of the latent semantic indexing (LSI) method to overcome the above-mentioned problems in the context of an online image retrieval system. Assuming such a system, the user queries are used to construct a latent semantic index through which the relevance between keywords seen by the system is defined. The user queries are also used to automatically interpret the images. A stochastic space between the set of images is captured, based on their comments and the keyword relevance.
This is then used to generate a D-matrix, which leads to the formation of clusters and the re-ranking of images.

2. Latent Semantic Indexing

In the past, various techniques have been applied to image retrieval, each trying to return appropriate images in the search result. These methods are inefficient, however, for the following reasons:

[1] Direct keyword matching
[2] Multiple words that have similar meanings (synonymy)
[3] Words that have more than one meaning (polysemy)
[4] A human being thinks only about the persons or objects in an image and not about pixels; he extracts from the image the important features that define its semantics for him.

In this paper, I propose a system that uses the LSI approach for effective image search and that can handle the problem of sensitivity to changes in keywords.

2.1. LSI Concept

Latent semantic indexing (LSI) is an indexing and retrieval method that uses a mathematical technique to identify patterns in the relationships between the terms and concepts contained in an object. LSI is based on the principle that words used in similar contexts tend to have similar meanings. LSI uses matrix computation on the underlying information to find similar images. The LSI method was originally proposed for retrieving textual documents based on the user's input query. The approach improves the detection of relevant documents on the basis of the terms found in the queries by taking advantage of the implicit higher-order structure (semantic structure) in the association of terms with documents. The LSI approach has shown good results on text data [8]. Later this approach was extended to visual data [9][10][11].

LSI offers a different way to search for the images that are closest to a query image, based on the underlying semantics of the images. LSI uses a mathematical technique called Singular Value Decomposition (SVD) to perform a matrix computation that drops irrelevant images and places images with similar semantics near each other in a multidimensional space [8] [9][10][11][15].

Images can be stored either as vector images or as raster images; however, most real world images are captured and stored as raster images. Raster images are made up of a set of pixels. Hence each image can be viewed as a vector of pixels in which every pixel is assigned a color value. This vector of pixel information can be used as the keywords representing the image content. A human observer, however, extracts from an image the important features that define its semantics for him; he thinks only about the persons or objects in the image and not about pixels. So I needed a technique that can extract these features and that is resistant to minor changes in the images (e.g. the amount of light, the contrast and the movement of objects in the images). Direct use of keyword based systems leads to results that are sensitive to a small change in any keyword (a pixel in the query).

2.2. Numerical Aspect

Let the symbol A denote the $m \times n$ term-document matrix related to m dictionary terms in n documents. Recall that the (i, j) element of the term-document matrix A represents the number of occurrences of the i-th term in the j-th document. The matrix A is often sparse, because a document usually does not contain all dictionary terms. The LSI procedure involves the singular value decomposition (SVD) of the term-document matrix A. The aim of SVD is to compute the decomposition

$A = U S V^T$

where $S \in \mathbb{R}^{m \times n}$ is a diagonal matrix with nonnegative diagonal elements called the singular values, and $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices.
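The properties stated for this decomposition can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the small count matrix is hypothetical and chosen only for illustration, it is not data from this paper.

```python
import numpy as np

# Hypothetical 4-term x 3-document count matrix (illustration only).
A = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
])
m, n = A.shape

# full_matrices=True returns U (m x m) and V^T (n x n), matching the
# dimensions given in the text.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# NumPy returns the singular values as a 1-D array; place them on the
# diagonal of an m x n matrix to obtain S.
S = np.zeros((m, n))
S[:len(s), :len(s)] = np.diag(s)

reconstruction_error = np.linalg.norm(A - U @ S @ Vt)
print("singular values:", s)
print("reconstruction error:", reconstruction_error)
```

The reconstruction error should be zero up to floating-point rounding, and `s` holds the nonnegative diagonal elements of S.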
The columns of the matrices U and V are called the left singular vectors and the right singular vectors, respectively. The decomposition can be computed so that the singular values are sorted in decreasing order. The full SVD is a memory and time consuming operation, especially for large problems. Although the term-document matrix A is often sparse, the matrices U and V are dense. For this reason, only a few of the largest singular values of A and the corresponding left and right singular vectors are computed and stored in memory. The number of singular values and vectors that are computed and kept in memory can be chosen as a compromise between the speed and the precision of the LSI procedure.

2.3. System Architecture

This section illustrates the architecture of a system that implements the LSI technique. The block diagram of the system, shown in Figure 1, reveals the details of the system components.

Figure 1: System Architecture

The base system architecture includes two components: the offline part and the online part of the system.

Offline part of system

The reference classes of the given query keywords are discovered automatically: for a given query keyword, the set of most appropriate or relevant keyword expansions is selected automatically by considering both textual and visual information. For each query keyword, the reference classes are defined by this set of query keyword expansions. A multi-class classifier is trained on the training sets of the reference classes and stored offline for each query keyword. If there are N types of visual features of images, such as color, texture and shape, I can combine them into a single classifier.

Online part of system

Once a query keyword is given to the search engine, a number of images are retrieved according to that keyword. When the user selects a sample query image from the retrieved images, all the images are re-ranked by comparing the similarities of their semantic signatures. This paper describes the enhancements made to the base project work by including the LSI methodology to improve image retrieval efficiency and accuracy.

2.4. Algorithm with Example

A collection consists of the following documents:

d1: Shipment of gold damaged in a fire.
d2: Delivery of silver arrived in a silver truck.
d3: Shipment of gold arrived in a truck.

Raw term frequencies are used as term weights and query weights. The following document indexing rules are also used:

Stop words were not ignored
Text was tokenized and lowercased
No stemming was used
Terms were sorted alphabetically

I wish to use this example to illustrate how LSI works.

Problem: Use Latent Semantic Indexing (LSI) to rank these documents for the query "gold silver truck".

Step 1: Set the term weights and construct the term-document matrix A and the query matrix q.

Step 2: Decompose matrix A and find the U, S and V matrices, where

$A = U S V^T$
U = eigenvectors of $A A^T$
V = eigenvectors of $A^T A$
S = matrix of singular values, the square roots of the eigenvalues of $A A^T$ or $A^T A$

Step 3: Implement a rank 2 approximation (k = 2) by keeping the first two columns of U and V and the first two columns and rows of S.
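The example can be reproduced numerically. The sketch below assumes NumPy and uses illustrative helper names (`tokenize`, `count_vector`, `cosine`) that are not part of the paper. It builds the raw-count term-document matrix under the indexing rules listed above, truncates the SVD to k = 2, and then applies the fold-in formula $q^T U_k S_k^{-1}$ and the cosine ranking that the remaining steps of the example describe. The signs of individual singular vectors may differ between SVD implementations; the cosine similarities are unaffected, because document and query coordinates flip together.

```python
import numpy as np

docs = [
    "Shipment of gold damaged in a fire.",
    "Delivery of silver arrived in a silver truck.",
    "Shipment of gold arrived in a truck.",
]
query = "gold silver truck"

def tokenize(text):
    # Lowercase, drop the full stop, split on whitespace; no stop-word
    # removal and no stemming, following the stated indexing rules.
    return text.lower().replace(".", "").split()

# Dictionary terms sorted alphabetically.
terms = sorted({tok for d in docs for tok in tokenize(d)})
index = {t: i for i, t in enumerate(terms)}

def count_vector(text):
    # Raw term counts over the dictionary; unknown tokens are skipped.
    v = np.zeros(len(terms))
    for tok in tokenize(text):
        if tok in index:
            v[index[tok]] += 1
    return v

# Step 1: term-document matrix A (terms x documents) and query vector q.
A = np.column_stack([count_vector(d) for d in docs])
q = count_vector(query)

# Step 2: singular value decomposition A = U S V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 3: rank-2 approximation.
k = 2
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

# Document coordinates are the rows of V_k; the query is folded in as
# q^T U_k S_k^{-1} (dividing by s_k applies the inverse diagonal matrix).
doc_coords = V_k
q_coords = (q @ U_k) / s_k

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sims = np.array([cosine(q_coords, d) for d in doc_coords])
ranking = [f"d{i + 1}" for i in np.argsort(-sims)]

for name, sim in zip(["d1", "d2", "d3"], sims):
    print(name, round(sim, 4))
print("ranking:", ranking)
```

Rounded to four decimals, the printed similarities should agree with the coefficients reported for this example (0.9910 for d2, 0.4478 for d3, -0.0541 for d1).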

Step 4: Find the new document vector coordinates in this reduced 2-dimensional space.

$d_i = d_i^T U_k S_k^{-1}$

Here $d_i$ on the right-hand side is the i-th column of A. Rows of V hold eigenvector values. These are the coordinates of the individual document vectors, hence

d1 (-0.4945, 0.6492)
d2 (-0.6458, -0.7194)
d3 (-0.5817, 0.2469)

Step 5: Find the new query vector coordinates in the reduced 2-dimensional space.

$q = q^T U_k S_k^{-1}$

Note: These are the new coordinates of the query vector in two dimensions. Note how this matrix is now different from the original query matrix q given in Step 1.

Step 6: Rank the documents in decreasing order of query-document cosine similarity.

3. Experimental Results

Thus LSI returns to the user the vector of similarity coefficients sim. The i-th element of the vector sim contains a value that measures the semantic similarity between the i-th document and the query document. A larger similarity coefficient indicates greater semantic similarity.

Documents   Similarity
D2           0.9910
D3           0.4478
D1          -0.0541

Table 1: Similarity coefficients of the query

4. Conclusion

With LSI, the efficiency of image retrieval and its correlation with human observations are better than with the plain CBIR re-ranking approach. LSI is basically an extension of the vector space model. It tries to overcome the problem of keyword matching by using conceptual matching. From this study, I can conclude that LSI is also a very effective method for retrieving visually similar images. LSI does have a few limitations; detailed solutions for those are outside the scope of this paper, which mainly focuses on overcoming the limitations of the CBIR re-ranking method by implementing the LSI method. A few disadvantages of LSI can be overcome by the probabilistic latent semantic indexing (PLSI) and Markov Chain semantic indexing (MSI) methods. Detailed descriptions of PLSI and MSI have not been included in this paper; they are left as a future study.

5. Acknowledgement

This paper is based on research work conducted by myself, Miss. Rachana C. Patil. Any opinions, findings and conclusions or recommendations expressed in this paper are those of the author.

6. References

[1] Xiaogang Wang, Ke Liu, and Xiaoou Tang, Web Image Re-Ranking Using Query-Specific Semantic Signatures, IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 36, Issue: 4), April 2014, DOI:10.1109/TPAMI.201
[2] J. Cui, F. Wen, and X. Tang, Real Time Google and Live Image Search Re-Ranking, Proc. 16th ACM Intl Conf. Multimedia, 2008.
[3] B. Luo, X. Wang, and X. Tang, A world wide web based image search engine using text and image content features, In Proceedings of the SPIE Electronic Imaging, 2003.
[4] Y. Chen and J. Z. Wang, A Region-Based Fuzzy Feature Matching Approach to Content-Based Image Retrieval, IEEE Trans. Pattern Analysis and Machine Intelligence
[5] Sayali Baxi and S. V. Dabhade, Re-ranking of Images using Semantic Signatures with Duplicate Images Removal K-means clustering, IJECS, ISSN: 2319-7242, Volume 3, Issue 5, May 2014
[6] J. Cui, F. Wen, and X. Tang, Intentsearch: Interactive on-line image search re-ranking, In Proc. ACM Multimedia, ACM, 2008.
[7] Ravirajk Kasture and Dr. A. M. Dixit, Internet Image Search Based On User Intention, IJARCSMS, Volume 2, Issue 6, June 2014.
[8] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, R. Harshman, Indexing by latent semantic analysis, Journal of the American Society for Information Science, 41(6):391-407
[9] Wengang Zhou, Qi Tian, Yijuan Lu, Linjun Yang, and H. Li, Latent visual context learning for web image applications, Pattern Recognition, Semi Supervised Learning for Visual Content Analysis and Understanding, Vol. 44, No. 10-11, pages 2263-2273, October-November 2011
[10] Ana Benitez and Shih-Fu Chang, Semantic knowledge construction from annotated image collections, In Proceedings IEEE ICME, Lausanne, July 2002.
[11] Pavel Praks, Jiri Dvorsky, Vaclav Snasel, Latent Semantic Indexing for Image Retrieval
[12] Norbert Fuhr, Model for Retrieval with Probabilistic Indexing, 1989.
[13] Kontostathis, A., Essential dimensions of latent semantic indexing (LSI), In Proceedings of the 40th Hawaii International Conference on Systems Science (HICSS), pages 7380. IEEE Computer Society, 2007.