CHAPTER 3 INFORMATION RETRIEVAL BASED ON QUERY EXPANSION AND LATENT SEMANTIC INDEXING

3.1 INTRODUCTION

This chapter presents an information retrieval approach based on Query Expansion (QE) and Latent Semantic Indexing (LSI). First, the user's query is semantically expanded using the WordNet and MeSH (Medical Subject Headings) ontologies. Query expansion reformulates the original query so that the information the user wants can be retrieved. After reformulation, a term-document matrix is constructed as part of the LSI process. Once the matrix is constructed, local and global weighting functions can be applied to it to condition the data. A rank-reduced singular value decomposition (SVD) is then performed on the matrix to determine patterns in the relationships between the terms and concepts contained in the text. The SVD yields new document vector coordinates and new query vector coordinates. Finally, the documents are ranked using cosine similarity.

Query Expansion

The Semantic Web is a machine-understandable web in which information is given well-defined meaning, enabling systems and people to better understand information from different sources and to work with it effectively (Berners-Lee et al 2001). The introduction of the Semantic Web is a great leap from the existing Web 2.0: the user not only interacts with the web but is also able to generate more meaningful information. This information is represented with the help of ontologies. Due to the decentralized nature of the Semantic Web, it is inevitable that different communities of users or software developers will use their own ontologies to describe semantic data sources (Yingjie Li et al 2010).

Furthermore, the increasing popularity of the Internet and digital libraries has made information retrieval techniques crucial for finding relevant documents. Document retrieval (also known as Information Retrieval) is a computerized process that retrieves a collection of documents related to the user's request by comparing the request with an automatically generated index of the textual content of the documents in the system. The objective of retrieval is to return highly useful documents, not a large number of documents.

In this work, an efficient ontology-mapping query expansion model is proposed for providing a dynamic information retrieval service. The ontologies adapt to the document space within multi-disciplinary domains where different terminologies are used. The objective is to enhance the user experience by improving search result quality for large-scale search systems (Stein Tomassen 2006).

One application of ontologies in information retrieval is query expansion, which searches the ontologies for new terms that are related to the original query terms and adds them to the query. Ontologies are also useful for disambiguation in natural language. Well-designed ontologies provide the basis for knowledge representation in either a common-sense or a domain-specific sense. Ontology-based knowledge representation has two applications in information retrieval. Domain-specific ontologies help identify the semantic categories involved in understanding discourse in a domain; for this purpose, the ontology works as a concept dictionary.
A domain-independent ontology is a general-purpose ontology and has been used for language understanding (Chandrasekaran et al 1999).

Query expansion is one of the methods used to improve the performance of a retrieval system. The basic process is to select new terms based on the initial query and then combine them with the original terms to form a new query. Query expansion aims to express an information need through multiple terms. Ontology-based query refinement suggests the newly selected terms from conceptualized knowledge. Several retrieval methods exist for IR systems. This work focuses on ontology-based query expansion for effective document retrieval, expanding the original query terms using WordNet and MeSH. An ontology allows knowledge to be represented as a set of concepts, properties and the relations between them (Uschold & Gruninger 1996).

In most cases, users do not search with the exact terms that appear in the documents. Keyword-based information retrieval therefore fails to fetch some relevant documents, whereas the Semantic Web makes information retrieval user driven rather than keyword driven and so helps to retrieve more relevant documents. Many researchers have used natural language processing (NLP) to understand the meaning of the user's query in ontology-based information retrieval. The Semantic Web makes use of various types of ontologies for this purpose. Linguistic ontologies such as WordNet, VerbNet, FrameNet and ConceptNet are widely used to understand the concepts in a user query. Other ontologies, such as application-specific or domain-specific ontologies, contextual ontologies and generative ontologies based on user history, are also used in information retrieval, depending on the user's requirements.

The major problem of current IR systems is that their performance is affected by language ambiguity. When several terms express the same concept, an IR system may retrieve only the documents that express the concept with the same term used in the query. When a term expresses multiple concepts, it may lead the retrieval result to non-relevant documents.

WordNet is a domain-independent ontology. It is applied here to expand the common vocabulary in the query terms, so that the query matches documents that express the same concept with different terms. It is an online lexical reference system in which English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. MeSH is a domain-specific ontology, used to expand biomedical terms and to identify term variants. It is the National Library of Medicine's controlled vocabulary thesaurus, which contains a collection of words representing descriptors in a hierarchical structure. The method of expanding the query terms and the weights assigned to them are the most important factors affecting retrieval performance.

Latent Semantic Indexing

Latent Semantic Indexing (LSI) provides the ability to find a broader range of relevant documents, because it looks at semantic relationships between groups of keywords and uses a high-dimensional representation. The LSI algorithm also represents both terms and text objects in the same space, allowing all relevant information to be processed together as a collection rather than as unrelated documents. This representational feature also allows objects to be retrieved directly from the query terms. LSI is generally regarded as an improvement over keyword-matching search engines. However, there are a variety of other applications.

A standard search engine only requires users to enter a small number of keywords to create a query. The larger this supplied list of keywords, the more irrelevant documents are returned. LSI contrasts with this approach by making use of a larger set of related keywords to improve recall. Relevance feedback improves the query supplied by the user by making use of the terms within relevant documents. This is achieved without increasing the computational requirements of the query, and allows the recall and precision of the returned results to be improved.

Because LSI is able to correlate related keywords, one possible use is information filtering, where certain types of words are removed, or documents or text containing certain words are removed from the retrieved documents. With an appropriate implementation of LSI, information filtering could be used to remove all documents that follow a generic structure. A similar system could also be used to manage spam, content expressed within chat rooms, news groups and bulletin boards, and family-suitable search engine results. LSI could be used to determine the semantic relationship between parts of a document. LSI could also aid in fully automating academic integrity checking: with a broad range of documents in its collection, a system could check a submitted document to determine whether any of the content is copied directly from other sources.

Singular Value Decomposition

The LSI methodology first uses the input query and the target set of documents to construct a term-document matrix. The latent semantic structure model is then obtained by analyzing this matrix with Singular Value Decomposition (SVD). SVD is closely related to a number of mathematical and statistical techniques in other fields, including eigenvector decomposition, spectral analysis and factor analysis. The terminology of factor analysis is used here, since that approach has some precedent in the information retrieval literature.

The model starts with a matrix of associations between all pairs of one type of object, e.g., documents. This matrix is decomposed into the product of two matrices of a very special form by a process called eigen analysis. The decomposed matrices consist of eigenvectors and eigenvalues. These special matrices show a decomposition of the original data into linearly independent components or factors. In general, many of these components are very small and may be ignored, resulting in an approximate model that contains far fewer factors. The similarity in behavior of each of the original documents is then approximated by its values on this smaller number of factors. Finally, the documents are ranked according to their estimated similarity using cosine similarity.

For information retrieval, SVD is therefore a suitable technique for finding a set of uncorrelated indexing variables or factors. In this methodology, each term and each document is represented by its vector of factor values. A desirable consequence of the dimension reduction is that documents with fairly different profiles of term usage can be mapped onto the same vector of factor values. This property helps to compensate for unreliable data.
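The ranking step described above, in which documents represented by factor vectors are ordered by their cosine similarity to the query, can be sketched in Python as follows. This is a minimal illustration rather than the system's implementation: the function names and the two-factor coordinates are invented for the example, and treating a zero vector as having similarity 0 is an assumption.

```python
import math


def cosine_similarity(u, v):
    """Normalized dot product of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, similarity) pairs sorted by decreasing similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical two-factor coordinates, invented for illustration only.
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.6, 0.8], "d3": [0.0, 1.0]}
query_vec = [0.0, 2.0]
ranking = rank_documents(query_vec, doc_vecs)
```

Because cosine similarity depends only on the angle between vectors, the length of the query vector does not affect the ordering; only its direction in the factor space does.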

3.2 PROPOSED SYSTEM ARCHITECTURE

The proposed system architecture comprises the four major processes listed below.

1. Query expansion using the MeSH and WordNet ontologies
2. Term-document matrix construction using Term Frequency (TF) and Inverse Document Frequency (IDF)
3. Indexing of documents by Latent Semantic Indexing (LSI) and Singular Value Decomposition (SVD)
4. Ranking of documents by calculating cosine similarity

Figure 3.1 Proposed System Architecture

Finally, the relevant documents are retrieved from the document repository using the similarity value. The query expansion process, the TF-IDF calculation, the LSI and SVD procedure and the cosine similarity method are explained in detail in the following sections. The proposed system architecture for the retrieval of documents is shown in Figure 3.1.

Ontology-based Query Expansion

Query expansion is one of the applications of ontologies in information retrieval. It expands each term in the original query with synonyms or related terms found by searching the ontologies. These new terms are related to the original query terms and are added to the query. Well-designed ontologies provide the basis for knowledge representation in either a common-sense or a domain-specific sense. Two kinds of ontology are applied in information retrieval: domain-specific and domain-independent. Domain-specific ontologies are mainly used to discover the semantic categories of a particular domain. A domain-independent ontology is a general-purpose ontology that is mostly used for language understanding.

Term-based similarity calculation using MeSH ontology

MeSH is a biomedical controlled vocabulary produced by the U.S. National Library of Medicine (NLM). MeSH consists of descriptors, qualifiers and supplementary concepts. Descriptors are the core elements of the vocabulary. Qualifiers are assigned to descriptors inside the MeSH fields to express a special aspect of a concept. Both descriptors and qualifiers are arranged in several hierarchies. In the proposed approach, the Jaccard similarity measure is used to map input query terms to MeSH terms.

Jaccard similarity is the cardinality of the intersection of two sets divided by the cardinality of their union. The value of J(A, B) is 1 if A and B are exactly the same, and decreases as A and B become increasingly different. Here it is used to measure the similarity between two terms, and is calculated using Equation (3.1):

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{3.1}$$

where A ∩ B is the intersection of the two sets A and B, and A ∪ B is their union. If the similarity value is greater than or equal to the threshold value 0.5, the corresponding MeSH terms are added to the query.

Synonyms-based similarity calculation using WordNet ontology

WordNet is an electronic lexical database of English, developed and maintained by the Cognitive Science Laboratory of Princeton University. In WordNet, a concept represents a meaning of a term. Terms that have the same meaning are grouped in a synset. Each synset has a gloss (definition) and is linked to other synsets higher or lower in the hierarchy by different types of semantic relations. The synonyms-based similarity calculation is carried out using WordNet. Starting from the expanded query produced by the previous module, it generates words equivalent to the input query words by substituting the head noun of each word with its synonyms. The expanded query from the previous stage is thus expanded once more by adding keywords from the synsets. Introducing synonyms into the input query addresses the ambiguity problem.
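The two expansion steps above can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the system's implementation: terms are compared as sets of lower-cased word tokens (the source does not say how the sets A and B are formed), `mesh_terms` and `synonyms` are small hand-made stand-ins rather than real MeSH or WordNet lookups, and the synonym step is a plain dictionary lookup rather than head-noun substitution. All function names are hypothetical.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets; taken as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tokens(term):
    """Lower-cased word tokens of a term (an assumed set representation)."""
    return set(term.lower().split())


def expand_query(query_terms, mesh_terms, synonyms, threshold=0.5):
    """Add MeSH terms whose Jaccard score reaches the threshold, then synonyms."""
    expanded = [q.lower() for q in query_terms]
    # Step 1: term-based expansion against the (toy) MeSH vocabulary.
    for q in list(expanded):
        for m in mesh_terms:
            m = m.lower()
            if m not in expanded and jaccard(tokens(q), tokens(m)) >= threshold:
                expanded.append(m)
    # Step 2: synonym-based expansion of every term collected so far.
    for term in list(expanded):
        for s in synonyms.get(term, []):
            if s.lower() not in expanded:
                expanded.append(s.lower())
    return expanded


# Toy stand-ins for MeSH and WordNet, invented for illustration only.
mesh_terms = ["Asthma", "Bronchial Asthma", "Status Asthmaticus", "Anemia"]
synonyms = {"asthma": ["bronchial asthma", "asthma attack"]}
result = expand_query(["asthma"], mesh_terms, synonyms)
```

With token sets, "asthma" and "bronchial asthma" share one of two distinct tokens, so their score is exactly 0.5 and the term is admitted by the "greater than or equal to" threshold.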

Term Extraction using TF-IDF

The expanded query is fed into the TF-IDF module. Using the expanded query words, the Term Frequency and Inverse Document Frequency are calculated, and the term matrix is then constructed for further processing. The Term Frequency (TF_t) is given by Equation (3.2):

$$TF_t = \frac{n_t}{N} \tag{3.2}$$

where n_t is the number of occurrences of the considered term t in a document and N is the number of occurrences of all terms in that document. The Inverse Document Frequency (IDF_t) is given by Equation (3.3):

$$IDF_t = \log_2 \frac{D}{d_t} \tag{3.3}$$

where D is the total number of documents and d_t is the number of documents that contain term t.

Terms that have a very low discrimination value, i.e., terms that are not useful for differentiating among documents, are replaced: low-frequency terms by thesaurus terms and high-frequency terms by phrases. If the documents move closer together after a term is assigned, that term is a poor discriminator. The highest IDF value indicates that the term occurs in only one document, and the lowest value indicates that the term occurs in many documents. In this way the correlation between the term and the document is identified.
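Equations (3.2) and (3.3) can be illustrated with a short Python sketch. The three documents are the sentences used in the chapter's worked LSI example, tokenized by simple whitespace splitting and lower-casing, which is an assumption since the source does not describe its tokenizer. The function names are hypothetical, and returning 0.0 when no document contains the term is a convention chosen here to avoid division by zero.

```python
import math
from collections import Counter


def term_frequency(term, doc_tokens):
    """TF_t = n_t / N for one document (Equation 3.2)."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def inverse_document_frequency(term, docs):
    """IDF_t = log2(D / d_t) (Equation 3.3); 0.0 if no document has the term."""
    d_t = sum(1 for doc in docs if term in doc)
    return math.log2(len(docs) / d_t) if d_t else 0.0


docs = [
    "asthma is characterized by airway inflammation".split(),
    "wood dust is a common cause of asthma".split(),
    "bronchial asthma is a respiratory disease".split(),
]

tf_dust = term_frequency("dust", docs[1])                  # 1 of 8 tokens
idf_dust = inverse_document_frequency("dust", docs)        # log2(3 / 1)
idf_asthma = inverse_document_frequency("asthma", docs)    # log2(3 / 3)
```

Here "asthma" occurs in every document, so its IDF is zero and it cannot help to tell the documents apart, whereas "dust" occurs in a single document and receives the highest IDF available in this collection.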

With both TF and IDF defined, they can be combined to produce the score of a feature in a document. The TF-IDF score is calculated using Equation (3.4):

$$\text{TF-IDF score} = TF_t \times IDF_t \tag{3.4}$$

A term achieves a high TF-IDF score when it has a high TF in the document and a low document frequency (DF) in the collection. The scores therefore help to filter out unwanted common terms. This completes the extraction of terms and the construction of the term matrix using TF-IDF.

Latent Semantic Indexing

Latent semantic indexing (LSI) (Scott Deerwester et al) tries to resolve the problems of lexical matching. LSI uses conceptual indices instead of individual words for retrieval. It is an extension of vector space modelling, developed to retrieve results related to a specified keyword even if the keyword is not present in the document. Using the term-document matrix constructed in the previous stage, LSI is implemented through the application of singular value decomposition (SVD). SVD is an orthogonal decomposition and is used to reduce the number of dimensions used to represent the documents. It splits any rectangular matrix S of size t × d into three components X, Y and Z, as given in Equation (3.5):

$$S = X Y Z^{T} \tag{3.5}$$

where S is the t × d term-document matrix, X is a t × d matrix with orthonormal columns, Y is a d × d diagonal matrix and Z is a d × d orthogonal matrix. The LSI algorithm is depicted in Figure 3.2. The document vector coordinates and the query vector coordinates are derived by applying the LSI algorithm.

Figure 3.2 Latent Semantic Indexing Algorithm

Cosine similarity

This metric is frequently used to determine the similarity between two documents. Because two documents typically have many words in common, the other methods of calculating similarity (namely the Euclidean distance and the Pearson correlation coefficient) are not considered viable here. In this metric, the attributes (the words, in the case of documents) are used as a vector, and the similarity is the normalized dot product of the two vectors. The cosine similarity measure is calculated using Equation (3.6):

$$\mathrm{Sim}(q, d) = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert} \tag{3.6}$$

where q and d are vectors of attributes and q · d denotes the dot product of q and d. The similarity value is found by substituting the document coordinate values and the query vector coordinate values into Equation (3.6). If the similarity value is greater than or equal to the threshold value 0.60, the document is retrieved from the document repository.

LSI & SVD implementation: an example

The Latent Semantic Indexing (LSI) pseudo code is explained below using a Bronchial Asthma respiratory disease dataset. The target document collection consists of the following documents.

d1: Asthma is characterized by airway inflammation.
d2: Wood dust is a common cause of Asthma.
d3: Bronchial Asthma is a respiratory disease.

Step 1: The term weights are calculated. The term-document matrix A and the query matrix q are constructed using the term weights. The columns are d1, d2, d3 and q; the rows correspond to the following terms:

a, airway, asthma, bronchial, by, cause, common, characterized, dust, inflammation, is, of, respiratory_disease

A = [entries not preserved in this transcription]
q = [entries not preserved in this transcription]

Step 2: The term-document matrix A is decomposed as

$$A = U S V^{T}$$

where U is an orthogonal matrix, S is a diagonal matrix and V is an orthogonal matrix. In this example A, U, S and V play the roles of S, X, Y and Z in Equation (3.5).

U, S, V, V^T = [entries not preserved in this transcription]

Step 3: A rank-2 approximation is obtained by keeping the first two columns of U and V and the first two rows and columns of S (k = 2).

U → U_k, S → S_k: [entries not preserved in this transcription]

V → V_k, V^T → V_k^T: [entries not preserved in this transcription]

Step 4: The new document vector coordinates are calculated. The rows of V_k hold the coordinates of the individual document vectors d1, d2 and d3 [coordinate values not preserved in this transcription].

Step 5: The new query vector coordinates are calculated using Equation (3.7):

$$\hat{q} = q^{T} U_k S_k^{-1} \tag{3.7}$$

In Equation (3.7), q is the original query matrix, U_k is the truncated orthogonal matrix and S_k is the truncated diagonal matrix, with k = 2. The result gives the new coordinates of the query vector in two dimensions; note that it differs from the original query matrix q given in Step 1.

$\hat{q}$ = [values not preserved in this transcription]

Step 6: The documents are ranked in decreasing order of cosine similarity.

$$\mathrm{Sim}(q, d) = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert}$$

sim(q, d1) = 0.51, sim(q, d2) = 0.55, sim(q, d3) = 0.80

The ranking of the documents in descending order is therefore d3 > d2 > d1.

The pseudo code for latent semantic indexing is given in Figure 3.3.

Figure 3.3 Latent Semantic Indexing - Pseudo Code

3.3 EXPERIMENTAL ENVIRONMENT AND DATASET DESCRIPTION

The proposed LSI algorithm combined with the query expansion strategy is tested using documents obtained from the PubMed database. PubMed is the National Library of Medicine's search service that provides access to over 11 million citations in MEDLINE. MEDLINE is the premier bibliographic database covering the fields of medicine, nursing, dentistry, veterinary medicine, the health care system and the preclinical sciences. It contains more than 11 million references and abstracts from over 4000 biomedical journals. From that database, 180 medical journals for 6 different medical keywords have been chosen. The chosen sample keywords are shown in Table 3.1. The proposed information retrieval system is implemented using Java (JDK 1.7) and NetBeans.

Table 3.1 Input Query Words

Sl.No   Input Query
1.      Rotavirus
2.      Anemia
3.      Asthma
4.      Neoplasm
5.      Hyperthyroidism
6.      Diabetes

3.4 RESULTS AND DISCUSSION

The experimental results of the proposed method are presented and analyzed in this section. The evaluation metrics used in the performance analysis are also discussed.

Evaluation Metrics

The following evaluation metrics are used to evaluate the effectiveness of document retrieval systems and to justify theoretical and practical developments of these systems. They form a set of measures that follow a common underlying evaluation methodology. The metrics chosen for this evaluation are Precision, Recall and the F-measure, and the corresponding Equations (3.8), (3.9) and (3.10) are as follows.

$$\text{Precision, } P = \frac{|\{\text{related documents}\} \cap \{\text{extracted documents}\}|}{|\{\text{extracted documents}\}|} \tag{3.8}$$

$$\text{Recall, } R = \frac{|\{\text{related documents}\} \cap \{\text{extracted documents}\}|}{|\{\text{related documents}\}|} \tag{3.9}$$

$$\text{F-Measure, } F = \frac{2PR}{P + R} \tag{3.10}$$

As these equations indicate, Precision is the fraction of retrieved documents that are relevant to the search, Recall is the fraction of the documents relevant to the query that are successfully retrieved, and the F-measure is the harmonic mean of Precision and Recall.

Performance Analysis

The information retrieval metrics Precision, Recall and F-measure are calculated and tabulated in Table 3.2, together with their average values. The performance of the proposed retrieval system is compared with the existing system of Kogilavani & Balasubramanie (2009). The Precision, Recall and F-Measure of the proposed system increase by 19%, 7% and 13% respectively when compared to the existing system. The comparison of the performance metrics (Precision, Recall and F-Measure) is given in Table 3.3. The comparison graph for the average Precision, Recall and F-measure of the existing and proposed systems is depicted in Figure 3.4. Although the document clustering technique in the existing system is improved by the concept hierarchy knowledge of an ontology, the proposed LSI-based retrieval system outperforms it. The existing system uses only the domain-specific ontology MeSH, whereas the proposed system also uses the general-purpose ontology WordNet for query expansion.
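Equations (3.8) to (3.10) can be checked on a small case with the Python sketch below. The document identifiers are invented for illustration, the function name is hypothetical, and returning 0.0 for empty sets is a convention chosen here, not something the source specifies.

```python
def precision_recall_f(relevant, retrieved):
    """Return (precision, recall, F-measure) for two collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Invented example: 4 relevant documents, 3 retrieved, 2 in common.
relevant = {"doc1", "doc2", "doc3", "doc4"}
retrieved = {"doc2", "doc3", "doc5"}
p, r, f = precision_recall_f(relevant, retrieved)
```

In this example the precision is 2/3, the recall is 1/2 and the F-measure is 4/7, which lies between the two and closer to the smaller value, as expected of a harmonic mean.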

Table 3.2 Performance Metrics of the Proposed System

Sl.No   Input Query          Precision   Recall   F-Measure
1.      Rotavirus
2.      Anemia
3.      Asthma
4.      Neoplasm
5.      Hyperthyroidism
6.      Diabetes Mellitus
        Average

Table 3.3 Performance Metrics Comparison (Existing Vs Proposed)

Methods     Precision   Recall   F-Measure
Existing
Proposed

[The numeric entries of Tables 3.2 and 3.3 are not preserved in this transcription.]

Figure 3.4 Comparison Graph for the Precision, Recall and F-measure Values with the Existing Work

The performance of the proposed system is also compared with the existing system of Aswani Kumar et al (2006), as shown in Table 3.4 and Table 3.5. The precision of the proposed method is higher by 29 and 15 percentage points than that of VSM with the traditional approach and LSI with the traditional approach respectively (0.74 versus 0.45 and 0.59). It is higher by 13 and 14 percentage points than that of LSI with the intelligent approach and VSM with the intelligent approach respectively (0.74 versus 0.61 and 0.60). The average precision of the proposed method increases because of the ontology-based (MeSH and WordNet) query expansion module. The basic process is to select new terms based on the initial query and then combine them with the original terms to form a new query. Query expansion aims to express an information need through multiple terms. Ontology-based query refinement suggests the newly selected terms from conceptualized knowledge.

Table 3.4 Precision Value Comparison (Traditional Approach Vs Proposed)

Methods                                       Precision
VSM with Traditional Approach (Existing)      0.45
LSI with Traditional Approach (Existing)      0.59
Proposed Method                               0.74

Table 3.5 Precision Value Comparison (Intelligent Approach Vs Proposed)

Methods                                                       Precision
VSM with Intelligent Approach, Aswani Kumar et al (2006)      0.60
LSI with Intelligent Approach, Aswani Kumar et al (2006)      0.61
Proposed Method                                               0.74
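The precision gains quoted above can be re-derived from the values in Tables 3.4 and 3.5 with the short Python sketch below. It shows that the quoted figures correspond to absolute differences in precision; the relative improvement computed alongside is an additional quantity that the source does not report. The dictionary keys are shortened labels chosen for the example.

```python
proposed = 0.74

# Precision values copied from Tables 3.4 and 3.5.
baselines = {
    "VSM (traditional)": 0.45,
    "LSI (traditional)": 0.59,
    "VSM (intelligent)": 0.60,
    "LSI (intelligent)": 0.61,
}

gains = {}
for name, value in baselines.items():
    absolute = proposed - value    # difference in precision
    relative = absolute / value    # improvement relative to the baseline
    gains[name] = (round(absolute, 2), round(relative, 3))

for name, (absolute, relative) in gains.items():
    print(f"{name}: +{absolute:.2f} absolute, +{relative:.1%} relative")
```

The absolute differences come out as 0.29, 0.15, 0.14 and 0.13, matching the figures quoted in the text.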

The comparison of the traditional approach with the proposed system is presented in Table 3.4, and the comparison of the intelligent approach with the proposed system is presented in Table 3.5. Both comparisons are also plotted as charts in Figure 3.5 and Figure 3.6.

Figure 3.5 Precision Value Comparison (Traditional Approach Vs Proposed) [series: Proposed Approach, LSI (Traditional Approach), VSM (Traditional Approach); axis: Precision]

Figure 3.6 Precision Value Comparison (Intelligent Approach Vs Proposed) [series: Proposed Approach, LSI (Intelligent Approach), VSM (Intelligent Approach); axis: Precision]

3.5 SUMMARY

The performance of the proposed system is improved by combining the Query Expansion (QE) approach with the Latent Semantic Indexing (LSI) methodology. The proposed method is tested on the PubMed database. In traditional information retrieval systems, user queries consist of a few keywords, and term mismatch is a serious issue that leads to poor retrieval performance. When a query contains multiple domain-specific keywords that express the user's information need precisely, the retrieval process returns relevant information. This is achieved by the ontology-based query expansion method, which is implemented dynamically with the WordNet and MeSH ontologies.

Since most search engines do not expand the query with synonyms, recall is substantially reduced by the synonymy problem. Moreover, automatic query expansion may add terms whose meaning differs from that intended by the user, which leads to the polysemy problem. LSI addresses this problem by using word usage patterns that are built on word co-occurrences. The terms and documents are represented within the semantic space created by LSI. SVD is used in the proposed approach to implement dimensionality reduction. The patterns of word usage are analyzed, and the similarities between the terms and documents are shown within the reduced space. Finally, the retrieved documents are ordered by the cosine similarity measure.

The proposed system is evaluated with the standard evaluation metrics Precision, Recall and F-measure. The experimental results show that the retrieval performance of the proposed system increases when compared to the performance of the existing systems.

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation

More information

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

More information

Web Information Retrieval using WordNet

Web Information Retrieval using WordNet Web Information Retrieval using WordNet Jyotsna Gharat Asst. Professor, Xavier Institute of Engineering, Mumbai, India Jayant Gadge Asst. Professor, Thadomal Shahani Engineering College Mumbai, India ABSTRACT

More information

Information Retrieval: Retrieval Models

Information Retrieval: Retrieval Models CS473: Web Information Retrieval & Management CS-473 Web Information Retrieval & Management Information Retrieval: Retrieval Models Luo Si Department of Computer Science Purdue University Retrieval Models

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search Relevance Feedback. Query Expansion Instructor: Rada Mihalcea Intelligent Information Retrieval 1. Relevance feedback - Direct feedback - Pseudo feedback 2. Query expansion

More information

Towards Understanding Latent Semantic Indexing. Second Reader: Dr. Mario Nascimento

Towards Understanding Latent Semantic Indexing. Second Reader: Dr. Mario Nascimento Towards Understanding Latent Semantic Indexing Bin Cheng Supervisor: Dr. Eleni Stroulia Second Reader: Dr. Mario Nascimento 0 TABLE OF CONTENTS ABSTRACT...3 1 INTRODUCTION...4 2 RELATED WORKS...6 2.1 TRADITIONAL

More information
