ENHANCEMENT OF METICULOUS IMAGE SEARCH BY MARKOVIAN SEMANTIC INDEXING MODEL

Shwetha S P¹ and Alok Ranjan²
Visvesvaraya Technological University, Belgaum, Dept. of Computer Science and Engineering, Canara Engineering College, Mangalore-574219

Abstract: This paper presents Markovian Semantic Indexing (MSI), a method for automatic annotation, indexing and annotation-based retrieval of images. We construct an Aggregate Markov Chain (AMC) from the queries issued by users, and use it to calculate the relevance between keywords. The same queries are used to develop annotations for the images. To improve the user experience in online image retrieval systems, repeated images are avoided and their related annotations are combined. The proposed system has advantages over the Latent Semantic Indexing (LSI) method in Annotation-Based Image Retrieval (ABIR) tasks.

Keywords: Aggregate Markov Chain (AMC), Annotation-Based Image Retrieval (ABIR), Markovian Semantic Indexing (MSI).

1 Introduction

The number of digital images is increasing rapidly with the spread of digital devices such as cameras and mobile phones. Large image databases are therefore needed to store and retrieve these images, and the collections themselves keep growing as the underlying technologies advance. Many image retrieval systems have been developed to browse, search and retrieve images from large databases. Manual image annotation is time-consuming, laborious and expensive, so a large body of research addresses automatic image annotation. Humans tend to associate images with high-level concepts, whereas existing computer vision techniques describe images mostly through low-level features, so the connection between the high-level semantics and the low-level features of the image content is lost.
No single low-level feature, nor any combination of low-level features, has an exact external meaning. The similarity measures computed for a given query therefore do not match human expectation, and the retrieval results remain at the low level. This mismatch is called the semantic gap, and it makes such an approach unsatisfactory and unpredictable. The difference between the pre-stored information and the real-world object is called the sensory gap. In addition to the semantic gap, the retrieval process may also fail because of the sensory gap. The former arises between the interpretation of the user query and the visual content, whereas the latter concerns the recognition of the image content.

At present, only a small percentage of the image files available online carry professional annotation. As a result, image search engines can deliver only low precision. Over the past few decades, Content-Based Image Retrieval (CBIR) has been an important area of research in the field of image retrieval. Research shows that the semantic gap is the main challenge in CBIR systems. Image annotation is an effective way to bridge the semantic gap between low-level features and high-level semantics. Search engines such as Google and Yahoo! currently use the Annotation-Based Image Retrieval (ABIR) method, which is more efficient for text-based queries and image captions.

This paper deals with two modules, the Latent Semantic Indexing (LSI) method and the Markovian Semantic Indexing (MSI) method. LSI is a theory and method for analyzing the relationship between a set of documents and the terms they contain. LSI-based approaches were initially applied to document indexing, and were then incorporated into ABIR systems for information retrieval.
However, the keyword annotation available for a single image is sparse compared with the number of keywords usually assigned to a document. The main idea behind LSI is that the aggregate of all the documents in which a given word does or does not appear provides a set of mutual constraints that determines which words are similar. Words are compared with each other by calculating the cosine of the angle between the two vectors formed by their two rows. The greater the cosine value, the higher the similarity, and vice versa.

Canara Engineering College Mangalore NJCIET 2015 142

In this paper we discuss another method, called Markovian Semantic Indexing (MSI). It is a new method for mining user queries with the help of pre-built keywords. We also deal with automatic annotation, indexing and annotation-based retrieval of images, and with avoiding repeated or redundant images for a better user experience. MSI is efficient for large data sets, whereas LSI is used for information retrieval over small documents.

2 Related Works

Image annotation, the task of associating text with the semantic content of images, is a good way to reduce the semantic gap and can be used as an intermediate step towards image retrieval.

V. Vijayarajan, M. Khalid and P. V. S. S. R. Chandra Mouli [6] state that text-based image retrieval (TBIR) is currently used in almost all general-purpose web image retrieval systems. TBIR uses the text associated with an image to determine what the image contains. The Google and Yahoo image search engines are examples of systems using this type of approach. Although these search engines are fast and robust, they sometimes fail to retrieve relevant images.

R. Datta, D. Joshi, J. Li and J. Z. Wang [8] survey the CBIR technique, which has been used as an alternative to text-based image retrieval. IBM was the first to take the initiative, proposing Query By Image Content (QBIC). The features employed by such image retrieval systems, including color, texture, shape and spatial layout, are extracted automatically. However, the approach is weak in providing compact storage for large image databases. CBIR is one of the most important research topics in image retrieval. In the last decade more than 200 CBIR systems have been studied and explored [8], and several surveys of CBIR research can be found in the literature.
Annotation-based image retrieval [4] is based on the theory of text retrieval systems. Many document retrieval and indexing techniques have been incorporated into ABIR systems, and several types of document retrieval techniques exist.

Latent Semantic Indexing [5] was first introduced by S. Deerwester et al. as a document retrieval technique. LSI deals with the problems of synonymy and polysemy. It uses a mathematical technique called Singular Value Decomposition (SVD). However, it is computationally expensive, and its performance and speed degrade when it is applied to large-scale collections.

Next, Hofmann presented the probabilistic LSI (PLSI) model [10] as an alternative to LSI. The roots of PLSI go back to LSI. Like LSI, PLSI deals with synonymous as well as polysemous words. PLSI is an automated document indexing technique in which each document is represented by its word frequencies. PLSI is based on the Expectation Maximization (EM) algorithm and has a better statistical foundation than LSI. However, it is not clear how to assign a probability to a document outside the training data.

Blei et al. proposed an unsupervised, generative model called Latent Dirichlet Allocation (LDA) [9], which is closely related to PLSI. In LDA each document is a mixture of a small number of latent topics, and each topic is characterized by a distribution over words. The main problem here is that performance is poor when information is retrieved in the form of a query.

3. Proposed Architecture

3.1 Latent Semantic Indexing

Latent Semantic Indexing (LSI) is a theory and a method used to analyze the relationship between a set of documents and the terms they contain. The idea behind LSI is to aggregate all the documents in which a given word does or does not appear. This provides a set of mutual constraints that determines the similarity of meaning of words and sets of words to each other. Documents are represented as bags of words.
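As a minimal sketch of the bag-of-words representation, the following Python code builds a term-document matrix with one row per term and one column per document, and compares two terms by the cosine of the angle between their rows. The function names and the three toy annotations are illustrative assumptions, not part of the system described in this paper.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix): one row per term, one column per document."""
    # Each document becomes a bag of words (word -> occurrence count).
    bags = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*bags))
    # Counter returns 0 for words that do not occur in a document.
    matrix = [[bag[term] for bag in bags] for term in vocab]
    return vocab, matrix


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical image annotations, used here in place of documents.
annotations = ["beach sunset sea", "sea beach sand", "city night lights"]
vocab, matrix = term_document_matrix(annotations)
row = dict(zip(vocab, matrix))

# "beach" and "sea" occur in the same annotations, "city" in a different one.
print(cosine(row["beach"], row["sea"]))
print(cosine(row["beach"], row["city"]))
```

Terms that occur in the same annotations have rows pointing in the same direction and therefore a cosine close to 1, while terms that never co-occur have a cosine of 0.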
To perform LSI on a group of documents, each document must first be converted into a vector of word occurrences, and the vectors are assembled into a matrix. A document may contain the full set of unique words, or its vector may be mostly empty. It is recommended that common words such as "the", "this", "him" and "that" be removed. Each row of the matrix represents a term and each column represents a document. The matrix is usually very large and very sparse. Local and global weighting functions are therefore applied to it: the local function considers how many times each word appears in a document (term frequency), and the global function accounts for the inverse document frequency.

Next, LSI applies a mathematical technique called Singular Value Decomposition (SVD). SVD helps to reduce the number of rows while preserving the similarity structure among the columns. SVD decomposes the rectangular matrix into three other matrices, which are used to represent the relationship between terms and documents. Dimensionality reduction can be done simply by deleting coefficients in the diagonal matrix, ordinarily starting with the smallest. Words are then compared by taking the cosine of the angle between the two vectors formed by two rows. The greater the cosine value, the greater the similarity.

[Figure 1: Architecture for Latent Semantic Indexing. Blocks: DB, Annotation DAO, Create Term-Document Matrix, Stop Word Elimination, Image Search Query, TF-IDF, Perform SVD, Creating Query Vector, Similarity Check, Output.]

3.2 Markovian Semantic Indexing

The other method discussed in this paper is the Markovian Semantic Indexing (MSI) model. MSI is a new method for mining user queries by defining keyword relevance as a connectivity measure between Markovian states modelled after the user queries. This project deals with automatic annotation, indexing and annotation-based retrieval of images, and with avoiding repeated or redundant images for a better user experience. As the user issues a query and picks images from the database, the images are automatically annotated behind the scenes and indexing is done for every transaction. This can be done online or offline.

A Markov chain is memoryless: the next state depends only on the current state and not on the earlier history, so any move can follow. Keyword preparation in this project is similar, since we cannot predict which query a user will issue for images. The queries issued by users are used to construct an Aggregate Markov Chain (AMC). The AMC is drawn from the existing data together with the latest query. The automatic annotations for the images can be developed using the same user queries.
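The construction of an aggregate chain from user queries can be sketched as follows. This is only an illustration under stated assumptions: the paper does not specify how transitions are counted, so the sketch assumes that each keyword is a state and that consecutive keywords in a query form one transition. The names `update_amc`, `build_amc` and `transition_probabilities`, and the sample queries, are hypothetical.

```python
from collections import defaultdict


def update_amc(counts, query):
    """Add the keyword-to-keyword transitions of one query to the counts."""
    keywords = query.lower().split()
    # Assumption: consecutive keywords in a query form one transition.
    # A single-keyword query therefore contributes no transition.
    for current, following in zip(keywords, keywords[1:]):
        counts[current][following] += 1
    return counts


def build_amc(queries):
    """Aggregate the transition counts over all past queries."""
    counts = defaultdict(lambda: defaultdict(int))
    for query in queries:
        update_amc(counts, query)
    return counts


def transition_probabilities(counts):
    """Row-normalize the counts so outgoing probabilities sum to 1."""
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Hypothetical query log.
queries = ["sunset beach", "sunset beach sea", "sunset sky"]
counts = build_amc(queries)
probs = transition_probabilities(counts)
print(probs["sunset"])
```

Because `update_amc` only increments counts, the chain can be refreshed with the latest query without rebuilding it from the full log, which matches the idea of combining existing data with the latest query.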
To improve the user experience in online image retrieval systems, elimination of duplicate or redundant images can be included as part of MSI; that is, repeated images are avoided and their related annotations are combined. This method can also be used to improve annotations, and image titles can be highlighted based on user queries. Likewise, based on the latest Markov chain and its probability calculation, some existing keywords can be eliminated and new keywords added: if users do not select a particular image for a particular keyword over a period of time, that keyword can be eliminated, and if users repeatedly use the same keyword for an image, it can be added.

The AMC uses a clustering technique. For annotation building and optimization a threshold value must be set, so that execution and retrieval of images from the database improve. Stop words are also eliminated at the start, so that the clustering in the next step runs faster. In this project, images are modelled as points in a vector space and their similarity is measured with MSI. The probability decision is made only after the user selects an image. The resulting method has advantages over the Latent Semantic Indexing (LSI) method in Annotation-Based Image Retrieval (ABIR) tasks.

[Figure 2: Architecture for Markovian Semantic Indexing. Blocks: Image Search, AMC Constructor, Keyword Improviser, Response/Result, Annotation Builder, Result Generator, Duplicate Eliminator, DB Images.]

4. Experimental Results

The experiment is a comparison between LSI and MSI. Because only a limited number of images is used, the experiment does not permit a reliable comparison of LSI with MSI. The full features of MSI are nevertheless demonstrated, since the aggregate Markov chain is built during the automatic annotation of the images and is available afterwards. First, the distance of the images from the query is calculated and ranked for both methods, and the results are examined.

LSI achieves a reduction of dimensionality based on the creation of matrices by means of Singular Value Decomposition (SVD). LSI treats the automatic indexing and the query-based retrieval tasks by means of standard cosine matching. While the cosine distance is widely used and generally accepted, it has no direct interpretation. MSI, the method proposed in this work, on the other hand incorporates both the automatic indexing and the query matching tasks. The MSI approach relies on the clustering of the state space, since this clustering arranges the states into groups by relevance score.
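For the LSI side of this comparison, the SVD-based dimensionality reduction and cosine matching can be sketched with NumPy as follows. This is a minimal illustration, not the implementation used in the experiment: it uses raw term counts instead of TF-IDF weights, and the function name `lsi_scores`, the toy matrix and the choice `k = 2` are assumptions made for the example.

```python
import numpy as np


def lsi_scores(term_doc, query_vec, k):
    """Cosine score of each document (column) against a query in rank-k space."""
    # SVD factors the term-document matrix into U, diag(s) and Vt.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; the smallest are deleted.
    docs_k = s[:k, None] * Vt[:k, :]      # shape (k, n_docs)
    query_k = U[:, :k].T @ query_vec      # query projected into the same space
    norms = np.linalg.norm(docs_k, axis=0) * np.linalg.norm(query_k)
    safe = np.where(norms > 0, norms, 1.0)  # avoid division by zero
    return (query_k @ docs_k) / safe


# Rows are terms, columns are image annotations (hypothetical data).
terms = ["beach", "sea", "sand", "city", "night"]
A = np.array([
    [1, 1, 0],   # beach
    [1, 0, 0],   # sea
    [0, 1, 0],   # sand
    [0, 0, 1],   # city
    [0, 0, 1],   # night
], dtype=float)

query = np.zeros(len(terms))
query[terms.index("sea")] = 1.0           # one-keyword query: "sea"

scores = lsi_scores(A, query, k=2)
ranking = np.argsort(-scores)             # best-matching annotation first
print(scores, ranking)
```

In this toy example the second annotation ("beach sand") does not contain the query keyword "sea", yet it scores as high as the first one in the reduced space because both share "beach", while the unrelated third annotation scores close to zero.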
Conclusion and Future Work

This paper introduced a technique for retrieving images with the help of user queries: Markovian Semantic Indexing (MSI), a new method for automatic annotation and annotation-based image retrieval. The properties of MSI make it particularly suitable for ABIR tasks when the per-image annotation data is limited. The characteristics of the method also make it particularly applicable in the context of online image retrieval systems. Elimination of duplicate or repeated images, combined with merging of their existing annotations, has also been introduced; this adds value by improving user satisfaction during search. The proposed work overcomes drawbacks of older retrieval techniques, and a comparison of LSI and MSI has been carried out. As future work, we suggest real-time deployment of video search using the same technique.

References

[1] Konstantinos A. Raftopoulos, Klimis S. Ntalianis, Dionyssios D. Sourlas and Stefanos D. Kollias, "Mining User Queries with Markov Chains: Application to Online Image Retrieval," IEEE Transactions on Knowledge and Data Engineering, Vol. 25, No. 2, February 2013.
[2] K. V. Shriram, P. L. K. Priyadarsini, Kaushik Velusamy and A. Balachandran, "An Efficient/Enhanced Content Based Image Retrieval for a Computational Engine," Journal of Computer Science 10 (2): 272-284, 2014.
[3] Meenakshi Shruti Pal and Dr. Sushil Kumar Garg, "Image Retrieval: A Literature Review," International Journal of Advanced Research in Computer Engineering and Technology (IJARCET), Volume 2, Issue 6, June 2013.
[4] Himali Chaudhari and Prof. D. D. Patil, "A Survey on Automatic Annotation and Annotation Based Image Retrieval," International Journal of Computer Science and Information Technologies, Vol. 5 (2), 2014, 1368-1371.
[5] S. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas and R. A. Harshman, "Indexing by Latent Semantic Analysis," J. Amer. Soc. Inform. Sci. 41, 6 (1990), 391-407.
[6] V. Vijayarajan, M. Khalid and P. V. S. S. R. Chandra Mouli, "A Review: From Keyword Based Image Retrieval to Ontology Based Image Retrieval," International Journal of Reviews in Computing, Vol. 12, 31st December 2012.
[7] Hui Hui Wang, Dzulkifli Mohamad and N. A. Ismail, "Approaches, Challenges and Future Direction of Image Retrieval," Journal of Computing, Volume 2, Issue 6, June 2010, ISSN 2151-9617.
[8] R. Datta, D. Joshi, J. Li and J. Z. Wang, "Image Retrieval: Ideas, Influences and Trends of the New Age," ACM Computing Surveys 40(2), April 2008.
[9] D. M. Blei, A. Y. Ng and M. I. Jordan, "Latent Dirichlet Allocation," J. Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[10] T. Hofmann, "Probabilistic Latent Semantic Indexing," Proc. 22nd Int'l Conf. Research and Development in Information Retrieval (SIGIR '99), 1999.