International Journal of Video& Image Processing and Network Security IJVIPNS-IJENS Vol:10 No:02 7


A Hybrid Method for Extracting Key Terms of Text Documents

Ahmad Ali Al-Zubi
Computer Science Department, King Saud University, Saudi Arabia
aalzubi@ksu.edu.sa, dralzubi@gmail.com

Abstract-- Key terms are the important terms of a document; they give the reader a high-level description of its contents. Extracting key terms is a basic step for many natural language processing problems, such as document classification, document clustering, text summarization and determining the general subject of a document. This article proposes a new method for extracting key terms from text documents. An important feature of the method is that its result is a set of groups of key terms, where the terms in each group are semantically related to one of the main subjects of the document. The method combines two techniques: a measure of the semantic proximity of terms, computed from the Wikipedia knowledge base, and an algorithm for detecting communities in networks. One advantage of the method is that it needs no preliminary training, because it works with the Wikipedia knowledge base. An experimental evaluation showed that the method extracts key terms with high accuracy and completeness.

Index Terms-- Extraction Method, Key Term, Semantic Graph, Text Document

I. INTRODUCTION

Key terms (keywords or key phrases) are the important terms of a document; they give the reader a high-level description of its contents. Extracting key terms is a basic step for many natural language processing problems, such as document classification, document clustering, text summarization and determining the general subject of a document (Manning and Schütze, 1999).
In this article we propose a method for extracting the key terms of a document that uses Wikipedia as a rich source of information about the semantic proximity of terms. Wikipedia is a freely available encyclopaedia and is now the largest encyclopaedia in the world. It contains millions of articles in several languages, together with redirect pages that map synonyms to the main title of an article. With its vast network of links between articles and its large number of categories, redirect pages and disambiguation pages, Wikipedia is an extremely powerful resource for our work and for many other natural language processing and information retrieval applications.

Our method is based on two techniques: a measure of semantic proximity computed from Wikipedia, and a network analysis algorithm, namely the Girvan-Newman algorithm for detecting communities in networks. These techniques are briefly described below.

Establishing the semantic proximity of concepts in Wikipedia is a natural step towards a tool that is useful for natural language processing and information retrieval problems. In recent years a number of articles have been published on computing the semantic proximity of concepts using different approaches [7, 8, 3, 12]; [7] gives a detailed overview of many existing methods for computing the semantic proximity of concepts using Wikipedia. Although the method described in this article imposes no requirements on how semantic proximity is determined, its efficiency depends on the quality of the chosen semantic proximity measure. For the experiments described in this article, we used the method for semantic proximity calculation. Knowing the semantic proximity of terms, we can construct a semantic graph over all terms of the processed document.
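Because the method accepts any proximity measure, a concrete example helps fix ideas. The sketch below shows one link-based measure in the style of Milne and Witten [8], which compares the sets of Wikipedia articles that link to two articles using a normalized-Google-distance formula, plus a plain cosine similarity of the kind used by text-vector approaches such as [3]. It is an illustration under stated assumptions, not necessarily the measure used in the experiments reported here; the function names and toy link sets are hypothetical.

```python
import math


def link_relatedness(in_links_a, in_links_b, total_articles):
    """Link-based relatedness of two Wikipedia articles, in the style of Milne and Witten [8].

    in_links_a, in_links_b: sets of titles of articles that link to article a and to article b.
    total_articles: number of articles in the Wikipedia snapshot (assumed larger than
    either in-link set, otherwise the denominator below is zero).
    Returns a score in [0, 1]; 0.0 when the articles share no incoming links.
    """
    common = in_links_a & in_links_b
    if not common:
        return 0.0
    larger = max(len(in_links_a), len(in_links_b))
    smaller = min(len(in_links_a), len(in_links_b))
    # Normalized-Google-distance-style formula computed over incoming links.
    distance = (math.log(larger) - math.log(len(common))) / (
        math.log(total_articles) - math.log(smaller)
    )
    # The distance can exceed 1 for weakly related pairs, so clamp the score at 0.
    return max(0.0, 1.0 - distance)


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse {feature: weight} vectors (text-based approach, cf. [3])."""
    dot = sum(w * vec_b.get(k, 0.0) for k, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Identical in-link sets give a score of 1.0, disjoint ones give 0.0; anything in between depends on how much the shared links stand out against the size of the whole encyclopaedia.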
The semantic graph is a weighted graph in which the nodes are the document terms, an edge between a pair of terms means that the two terms are semantically similar, and the weight of the edge is the numerical value of the semantic proximity of the two terms. We noticed that a graph constructed in this way has an important property: semantically similar terms cluster into dense subgraphs, so-called communities, and the largest and most densely connected subgraphs tend to correlate with the main subjects of the document; the terms included in such subgraphs are considered the key terms of the document. The novelty of our approach lies in applying a community-detection algorithm to this graph, which allows us to identify thematic groups of terms and then choose the densest among them. These densest groups of terms are the result of the method: thematically grouped key terms.

The analysis of network structure and the detection of communities in networks are well studied. Many algorithms have been proposed and successfully used to analyze social networks, scientific citation networks, the shopping networks of large online retailers such as Amazon [1], biochemical networks and many others. At the same time, there are no examples of applying such algorithms to networks built from Wikipedia. In our method we use the algorithm proposed in [10, 9]. Several articles show the high performance of this algorithm in the analysis of both synthetic and real-world networks.

IJVIPNS-IJENS April 2010 IJENS
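The community-detection idea described in the introduction can be sketched in pure Python. The sketch follows the Girvan-Newman procedure [10]: repeatedly remove the edge with the highest betweenness, recompute betweenness after each removal, and keep the partition with the highest modularity. It then orders the communities by density times mean tf.idf, following Section III.6. Assumptions: edge weights are ignored when computing betweenness and modularity (weighted variants exist), the graph is passed as a hypothetical `{frozenset({u, v}): weight}` dict, and the terms and weights in the usage example are invented, not data from the paper. For large graphs a library routine such as NetworkX's `girvan_newman` and `modularity` in `networkx.algorithms.community` would be the practical choice.

```python
from collections import deque


def adjacency(weights):
    """Build node -> set(neighbours) from a {frozenset({u, v}): weight} dict."""
    adj = {}
    for edge in weights:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_betweenness(adj):
    """Brandes-style edge betweenness on the unweighted graph."""
    scores = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        dist, sigma, preds = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:  # BFS: count shortest paths and record predecessors
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0.0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):  # accumulate dependencies back towards the source
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                scores[frozenset((u, w))] += share
                delta[u] += share
    return {edge: s / 2.0 for edge, s in scores.items()}  # each pair was counted twice


def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start not in seen:
            part, stack = set(), [start]
            while stack:
                node = stack.pop()
                if node not in part:
                    part.add(node)
                    stack.extend(adj[node] - part)
            seen |= part
            parts.append(part)
    return parts


def modularity(adj, parts):
    """Newman-Girvan modularity of a partition, measured on the original (unweighted) graph."""
    m = sum(len(n) for n in adj.values()) / 2.0
    if m == 0:
        return 0.0
    q = 0.0
    for part in parts:
        inside = sum(1 for u in part for v in adj[u] if v in part) / 2.0
        degrees = sum(len(adj[u]) for u in part)
        q += inside / m - (degrees / (2.0 * m)) ** 2
    return q


def girvan_newman(adj):
    """Cut highest-betweenness edges one at a time; return the partition with maximal modularity."""
    work = {u: set(n) for u, n in adj.items()}
    best = components(work)
    best_q = modularity(adj, best)
    while any(work.values()):
        betweenness = edge_betweenness(work)
        u, v = tuple(max(betweenness, key=betweenness.get))
        work[u].discard(v)
        work[v].discard(u)
        parts = components(work)
        q = modularity(adj, parts)
        if q > best_q:
            best, best_q = parts, q
    return best, best_q


def rank_communities(parts, weights, tfidf):
    """Rank = density (sum of internal edge weights) x mean tf.idf of the community's terms."""
    def rank(part):
        density = sum(w for edge, w in weights.items() if edge <= part)
        info = sum(tfidf.get(t, 0.0) for t in part) / len(part)
        return density * info
    return sorted(parts, key=rank, reverse=True)
```

On two dense triangles of terms joined by a single bridge edge, the bridge carries every cross-group shortest path, so it is cut first and the two triangles come out as the maximum-modularity partition.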

II. PREVIOUS SIMILAR WORKS

In statistical natural language processing there are two classical approaches to key term extraction: tf.idf and collocation analysis [5]. tf.idf (term frequency-inverse document frequency) is a popular metric for information retrieval and text analysis problems [10]. It is a statistical measure of how important a term is to a document that belongs to a collection of documents: the importance of the term is proportional to the number of its occurrences in the document and inversely related to the number of documents in the collection that contain it. While tf.idf is used to extract single-word key terms, collocation analysis (CA) is used to detect phrases; tf.idf supplemented by CA is used to extract key phrases. Both approaches require a collection of documents for gathering statistics, known as the training set (TS), and their quality depends on the quality of the TS. Their advantages are simplicity of implementation and satisfactory quality when the TS is well chosen, which is why they are widespread in practice. Interestingly, the studies [7, 2, 6] that used Wikipedia as a TS showed that Wikipedia is a good TS for many practical applications.

An alternative class of approaches to natural language processing problems (keyword extraction being one of them), to which this article belongs, is based on knowledge about the semantic proximity of terms. The semantic proximity of terms can be obtained from a dictionary or thesaurus, but we are interested in articles that use semantic proximity obtained from Wikipedia.
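The tf.idf weighting described at the start of this section can be illustrated with a short sketch. It uses raw term frequency multiplied by the logarithmic inverse document frequency log(N/df), one of several common variants; the function name and the toy collection are illustrative assumptions, and other smoothing schemes are equally common.

```python
import math
from collections import Counter


def tf_idf(document, collection):
    """tf.idf weight of every term in `document`.

    document: list of terms of the document being analysed.
    collection: list of documents (lists of terms) used for statistics (the training set).
    Uses raw term frequency times log(N / df).
    """
    doc_sets = [set(doc) for doc in collection]
    n_docs = len(doc_sets)
    weights = {}
    for term, tf in Counter(document).items():
        df = sum(1 for doc in doc_sets if term in doc)  # number of documents containing the term
        # max(df, 1) guards against terms that never occur in the collection.
        weights[term] = tf * math.log(n_docs / max(df, 1))
    return weights
```

A term that is frequent in the document but rare in the collection (such as a named entity) receives a high weight, while a term present in every document receives zero, which is also why the quality of the result depends on how well the collection is chosen.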
The semantic proximity of terms can be computed from Wikipedia in one of two ways: using the hypertext links between the Wikipedia articles that correspond to the terms [8, 10, 5], or measuring the cosine of the angle between vectors constructed from the texts of the corresponding Wikipedia articles [3]. Semantic proximity derived from Wikipedia has been used in many articles to solve the following natural language processing and information retrieval problems: resolution of lexical polysemy of terms [6, 11], derivation of the overall theme of a document [13], categorization [4] and co-reference resolution [12].

We are not aware of articles in which the semantic proximity of terms is used to extract the key terms of a document; however, article [4] is the closest to ours. In [4] the problem of text categorization is solved by constructing a semantic graph from the terms of the text, similar to the one we propose in this article. Graph analysis appears there in a simple form: the most central terms of the graph are chosen with the Betweenness Centrality Algorithm (BCA), and these terms are then used to categorize the document.

We distinguish the following advantages of our method. Our method does not require training, in contrast to the traditional approaches described above. Because Wikipedia is a large encyclopaedia that is continually updated by millions of people, it stays current and covers many specific areas of knowledge; thus practically any document whose terms are mostly described in Wikipedia can be processed by our method. Key terms are grouped by subject, and the method extracts as many different thematic groups of terms as there are different topics covered in the document.
Thematically grouped key terms can significantly improve the determination of the general subject of the document (using, for example, the spreading activation method on the Wikipedia category graph, as described in [13]) and document categorization [4]. Our method is also highly effective with respect to the quality of the extracted key terms: the experimental evaluation discussed later in this article showed that it extracts key terms with high accuracy and completeness.

III. MATERIALS AND METHODS

III.1. Key Terms Extracting Method.
Our proposed method for key term extraction consists of five steps: (1) extraction of candidate terms, (2) resolution of lexical polysemy of terms, (3) building a semantic graph, (4) detection of communities in the semantic graph, and (5) selection of suitable communities.

III.2. Extraction of Candidate Terms.
The purpose of this step is to extract all the terms of the document and to prepare, for each term, a set of Wikipedia articles that could potentially describe its meaning. We split the source document into tokens and extract all possible N-grams. For each N-gram we construct its morphological variations, and for each variation we search all Wikipedia article titles. Thus, for each N-gram we obtain a set of Wikipedia articles that may describe its meaning. Constructing the various morphological forms of words widens the search over Wikipedia article titles and so finds relevant articles for a larger portion of terms. For example, the words reads, reading and read may be linked to two Wikipedia articles: Read and Reading.

III.3. Resolution of Lexical Polysemy of Terms.
In this step, for each N-gram, we choose the most appropriate Wikipedia article from the set built for it in the previous step.
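The candidate-extraction step of III.2, whose output is the per-N-gram set of articles that the polysemy-resolution step chooses from, can be sketched as follows. Everything here is a hypothetical stand-in: the crude English suffix rules replace a real morphological analyser, the title list replaces a full index of Wikipedia titles, and redirect and disambiguation pages are not modelled.

```python
import re

# Hypothetical, English-only suffix rules standing in for a real morphological analyser.
SUFFIXES = ("ing", "es", "s", "ed")


def ngrams(tokens, max_n=3):
    """All contiguous N-grams of up to max_n tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def variants(phrase):
    """Lower-cased morphological variants of a phrase, inflecting only its last word."""
    *head, last = phrase.lower().split()
    bases = {last}
    for suffix in SUFFIXES:
        if last.endswith(suffix) and len(last) - len(suffix) >= 3:
            bases.add(last[: -len(suffix)])
    forms = set()
    for base in bases:
        forms.update({base, base + "s", base + "ing"})
    return {" ".join(head + [form]) for form in forms}


def candidate_articles(text, titles, max_n=3):
    """Map each N-gram of `text` to the Wikipedia titles matched by one of its variants."""
    index = {title.lower(): title for title in titles}
    tokens = re.findall(r"\w+", text)
    found = {}
    for gram in ngrams(tokens, max_n):
        matches = {index[v] for v in variants(gram) if v in index}
        if matches:
            found[gram] = matches
    return found
```

With the titles Read, Reading, Braille and Screen reader, the sentence "She reads Braille with a screen reader" maps reads to both Read and Reading, which is exactly the kind of ambiguity the next step has to resolve.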

Polysemy is a widespread phenomenon of natural language. For example, the word platform can mean a railway platform, a software platform or a platform as part of a shoe. The correct meaning of an ambiguous word can be determined from the context in which the word is mentioned. The task of resolving lexical polysemy is the automatic selection of the most appropriate meaning of a word (in our case, the most appropriate Wikipedia article) when it is mentioned in some context. Several articles address resolving the lexical polysemy of terms using Wikipedia [6, 7, 11]. For the experiments discussed in this study we implemented a method that uses Wikipedia disambiguation pages and redirect pages. These pages are used to construct the set of possible meanings of a term. The most appropriate meaning is then selected using knowledge of the semantic proximity of terms: for every possible meaning of the term, its degree of semantic proximity to the context is calculated, and the meaning with the largest degree of semantic proximity to the context is chosen as the meaning of the term.

A common problem of traditional key term extraction methods is the presence of meaningless phrases in the result, for example "using" or "electric cars are". Using Wikipedia as a controlled thesaurus avoids this problem: all key terms extracted by our method are meaningful phrases. The result of this step is a list of terms in which each term is associated with the Wikipedia article that describes its meaning.

III.4. Building the Semantic Graph.
In this step, we construct a semantic graph from the list of terms obtained in the previous step.
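Both the sense selection of III.3 and the graph construction of this step need nothing more than a pairwise proximity function, so they can be sketched together. In the sketch, `relatedness` stands for whichever measure is plugged in. Two choices are assumptions, since the text does not specify them: proximity to the context is aggregated by summing over the context articles, and an edge is added only when proximity exceeds a hypothetical `threshold`. The toy scores are invented for illustration.

```python
from itertools import combinations


def choose_sense(candidates, context, relatedness):
    """III.3: pick the candidate article with the highest total proximity to the context articles."""
    return max(candidates,
               key=lambda article: sum(relatedness(article, other)
                                       for other in context if other != article))


def build_semantic_graph(articles, relatedness, threshold=0.0):
    """III.4: weighted graph as {node: {neighbour: weight}}.

    An edge is added only when the proximity exceeds `threshold` (an assumed parameter).
    """
    graph = {article: {} for article in articles}
    for a, b in combinations(articles, 2):
        weight = relatedness(a, b)
        if weight > threshold:
            graph[a][b] = weight
            graph[b][a] = weight
    return graph
```

A term whose chosen article is unrelated to the rest of the document ends up with few or no edges, which matches the observation below that polysemy errors become peripheral or isolated nodes.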
The semantic graph is a weighted graph whose nodes are the terms of the document; an edge between two nodes means that the terms are semantically related, and the weight of the edge is the numerical value of the semantic proximity of the two connected terms. Fig. 1 shows an example of a semantic graph constructed from the news article «Apple to Make ITunes More Accessible for the Blind». The article reports that the chief prosecutor of Massachusetts and the National Federation of the Blind reached an agreement with Apple Inc., under which Apple will make its online music service ITunes accessible to blind users through screen-reading technology. In Fig. 1 we can see that the relevant terms Apple Inc. and Blindness form the two dominant communities, while the terms student, retailing and year lie on the periphery and are poorly connected with the rest of the graph. It is important to note that erroneous terms produced by the resolution of lexical polysemy in the second step are peripheral or even isolated nodes of the graph and do not adjoin the dominant communities.

Fig. 1. An example of the semantic graph constructed from the news article «Apple to Make ITunes More Accessible for the Blind»

III.5. Detecting Communities in the Semantic Graph.

The purpose of this step is the automatic detection of communities in the constructed semantic graph. To solve this problem we apply the Girvan-Newman algorithm. As a result of the algorithm, the original graph is divided into subgraphs, each representing a thematic community of terms. To evaluate the quality of a partition of a graph into communities, the authors of [9] suggested using the modularity measure. Modularity is a property of a graph together with a particular partition of it into subgraphs. It measures how good the partition is, in the sense that many edges lie within communities and few edges lie outside them (connecting communities with each other). In practice, a modularity value in the range of … indicates that the network has a clearly discernible community structure, so applying a community-detection algorithm makes sense. We noted that semantic graphs built from text documents (for example, a news article or a scientific article) have modularity values from …

III.6. Choosing the Right Community.
In this step we select, among all communities, those that contain key terms. We rank the communities so that high-ranked communities hold important terms (key terms) and low-ranked communities hold unimportant terms, as well as errors of lexical polysemy resolution that may occur in the second step of our method. The ranking uses the density and the informational content of each community. Community density is the sum of the weights of the edges connecting the nodes of the community. Experimenting with traditional approaches, we found that using the tf.idf measure of terms helps to improve the ranking of communities: tf.idf gives high coefficients to terms corresponding to named entities (e.g., Apple Inc., Steve Jobs, Braille) and lower coefficients to terms corresponding to general concepts (e.g., Consumer, Year, Student). We compute tf.idf for terms using Wikipedia as the document collection, as described in [6]. By the informational content of a community we mean the sum of the tf.idf values of the terms in the community divided by the number of terms in it. The rank of a community is its density multiplied by its informational content, and the communities are sorted in descending order of rank. An application that uses our method for keyword extraction can take any number of the highest-ranked communities, but in practice it makes sense to use the 1-3 highest-ranked communities.

IV. RESULTS

IV.1. Experimental Evaluation.
In this section we discuss the experimental evaluation of the proposed method. Since there is no standard benchmark for measuring the quality of key terms extracted from texts, we conducted experiments involving manual effort, that is, the completeness and accuracy of the extracted keywords were evaluated by human participants. We collected 30 blog posts. Five people took part in the experiment. Each participant had to read each blog post and select 5-10 key terms for it. Every key term had to occur in the blog post, and a relevant Wikipedia article had to exist for it. Participants were also instructed to choose the key terms so that they covered all the main topics of the blog post. For each blog post we then kept the key terms selected by at least two participants. The titles of Wikipedia redirect articles and the titles of the articles they redirect to are in fact synonyms, so in our experiment we considered them as one term.

The method presented in this article was implemented according to the following architectural principles. To achieve the best performance, we did not precompute the semantic proximity of all pairs of Wikipedia terms. Instead, the data needed to calculate semantic proximity on the fly (the titles of Wikipedia articles, information about links between articles, and statistical information about terms) are loaded into memory. Client applications work with this knowledge base through remote procedure calls.

IV.2. Completeness Evaluation of Selected Key Terms.
By completeness we mean the proportion of manually assigned key terms that were also identified automatically by our method:

$$\text{Completeness} = \frac{|ME \cap AE|}{|ME|} \qquad (1)$$

where ME is the set of manually extracted key terms and AE is the set of automatically extracted key terms. For the 30 blog posts, the participants selected 190 key terms and our method assigned 303 automatically; 133 of the hand-selected key terms were also identified automatically. Thus, the completeness is 68%.

IV.3. Accuracy Evaluation of Selected Key Terms.
We evaluate accuracy with the same methodology used for completeness. By accuracy we mean the proportion of the key terms identified automatically by our method that were also selected by the participants:

$$\text{Accuracy} = \frac{|ME \cap AE|}{|AE|} \qquad (2)$$
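Equations (1) and (2) are the familiar recall and precision of the automatically extracted key terms measured against the manual ones. The sketch below computes both for a single document after merging redirect titles with their targets, as the experiment treats them as one term. The term lists reuse the Apple Inc. example discussed in IV.4; the redirect mapping and the function names are hypothetical.

```python
def normalise(terms, redirects):
    """Map redirect titles to their target article so that synonyms count as one term."""
    return {redirects.get(term, term) for term in terms}


def completeness(manual, automatic):
    """Equation (1): |ME ∩ AE| / |ME|, i.e. recall."""
    return len(manual & automatic) / len(manual) if manual else 0.0


def accuracy(manual, automatic):
    """Equation (2): |ME ∩ AE| / |AE|, i.e. precision."""
    return len(manual & automatic) / len(automatic) if automatic else 0.0
```

For that example every manually chosen term is also found automatically (completeness 1.0), but only 4 of the 7 automatic terms were chosen by people (accuracy 4/7), which illustrates why extracting more terms than people do depresses the accuracy figure.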

Following the indicators of our test collection, the accuracy is 41%.

IV.4. Revision of Completeness and Accuracy Evaluation.
To evaluate the performance of the method better, we also revised the completeness and accuracy evaluation. An important feature of our method is that it usually extracts more key terms than people do, including more key terms relevant to any one topic. For example, consider Fig. 1. For the topic related to Apple Inc., our method retrieved the following terms: Internet, Information access, Music download, Apple Inc., ITunes, Apple Keyboard and Steve Jobs, whereas a person usually retrieved fewer key terms, tending to choose terms and names such as Music download, Apple Inc., ITunes and Steve Jobs. This means that our method sometimes extracts key terms that cover the subject of the article better than people do, which led us to re-evaluate the completeness and accuracy of our method.

Each participant was instructed to review the key terms he had identified in the experiment as follows. For each blog post he examined the key terms selected automatically and, where possible, extended his own key term list with the terms that he considered related to the main subject of the document but had not included in the first stage. After this review, the participants had selected 213 key terms instead of 190, that is, they added 23 new key terms, which means that our assumption is meaningful and that this revision is important for a full evaluation of the method. As a result, Completeness = 77% and Accuracy = 61%.

V. CONCLUSION

We have proposed a new method for extracting key terms from text documents. One of its advantages is that it needs no preliminary training, because it works on a knowledge base built from Wikipedia. Another important feature of the method is the form of its result: the key terms of the document, grouped by the document's subjects. Key terms grouped by subject can greatly facilitate the subsequent categorization of the document and the determination of its general subject. Experiments with manual evaluation have shown that our method extracts key terms with high accuracy and completeness. We also noted that our method can be used successfully to strip unimportant information from complex composite documents and to determine their general subject. This means that it would be very useful to apply the method to key term extraction from Web pages, which are usually loaded with secondary information such as menus, navigation elements and ads.

VI. REFERENCES

[1] Clauset, A., M.E.J. Newman and C. Moore, Finding community structure in very large networks. Phys. Rev. E, 70.
[2] Dakka, W. and P.G. Ipeirotis, Automatic extraction of useful facet hierarchies from text databases. Proceedings of the ICDE, IEEE.
[3] Gabrilovich, E. and S. Markovitch, Computing semantic relatedness using Wikipedia-based explicit semantic analysis. Proceedings of the 20th International Joint Conference on Artificial Intelligence, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
[4] Janik, M. and K.J. Kochut, Wikipedia in action: Ontological knowledge in text categorization. Proceedings of the International Conference on Semantic Computing, August.
[5] Manning, C.D. and H. Schütze, Foundations of Statistical Natural Language Processing. The MIT Press.
[6] Medelyan, O., I.H. Witten and D. Milne, Topic indexing with Wikipedia. Proceedings of the Wikipedia and AI Workshop at the AAAI-08 Conference (WikiAI 08), Chicago, IL.
[7] Mihalcea, R. and A. Csomai, Wikify!: Linking documents to encyclopaedic knowledge. Proceedings of the 16th ACM Conference on Information and Knowledge Management, ACM Press, New York, USA.
[8] Milne, D. and I. Witten, An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. Proceedings of the Wikipedia and AI Workshop at the AAAI-08 Conference (WikiAI 08).
[9] Milne, D., Computing semantic relatedness using Wikipedia link structure. Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRSC).
[10] Newman, M.E.J. and M. Girvan, Finding and evaluating community structure in networks. Phys. Rev. E, 69.
[11] Salton, G. and C. Buckley, Term-weighting approaches in automatic text retrieval. Inform. Process. Manage., 24.
[12] Sinha, R. and R. Mihalcea, Unsupervised graph-based word sense disambiguation using measures of word semantic similarity. Proceedings of the International Conference on Semantic Computing, IEEE Computer Society, Washington DC, USA. DOI: /ICSC
[13] Strube, M. and S. Ponzetto, WikiRelate! Computing semantic relatedness using Wikipedia. Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 06), July 16-20, 2006, Boston, Massachusetts.


More information

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK Qing Guo 1, 2 1 Nanyang Technological University, Singapore 2 SAP Innovation Center Network,Singapore ABSTRACT Literature review is part of scientific

More information

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,

More information

Outline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity

Outline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity Outline Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity Lecture 10 CS 410/510 Information Retrieval on the Internet Query reformulation Sources of relevance for feedback Using

More information

Patent Classification Using Ontology-Based Patent Network Analysis

Patent Classification Using Ontology-Based Patent Network Analysis Association for Information Systems AIS Electronic Library (AISeL) PACIS 2010 Proceedings Pacific Asia Conference on Information Systems (PACIS) 2010 Patent Classification Using Ontology-Based Patent Network

More information

Document Clustering using Feature Selection Based on Multiviewpoint and Link Similarity Measure

Document Clustering using Feature Selection Based on Multiviewpoint and Link Similarity Measure Document Clustering using Feature Selection Based on Multiviewpoint and Link Similarity Measure Neelam Singh neelamjain.jain@gmail.com Neha Garg nehagarg.february@gmail.com Janmejay Pant geujay2010@gmail.com

More information

Theme Identification in RDF Graphs

Theme Identification in RDF Graphs Theme Identification in RDF Graphs Hanane Ouksili PRiSM, Univ. Versailles St Quentin, UMR CNRS 8144, Versailles France hanane.ouksili@prism.uvsq.fr Abstract. An increasing number of RDF datasets is published

More information

Salford Systems Predictive Modeler Unsupervised Learning. Salford Systems

Salford Systems Predictive Modeler Unsupervised Learning. Salford Systems Salford Systems Predictive Modeler Unsupervised Learning Salford Systems http://www.salford-systems.com Unsupervised Learning In mainstream statistics this is typically known as cluster analysis The term

More information

Clustering for Ontology Evolution

Clustering for Ontology Evolution Clustering for Ontology Evolution George Tsatsaronis, Reetta Pitkänen, and Michalis Vazirgiannis Department of Informatics, Athens University of Economics and Business, 76, Patission street, Athens 104-34,

More information

Data deduplication for Similar Files

Data deduplication for Similar Files Int'l Conf. Scientific Computing CSC'17 37 Data deduplication for Similar Files Mohamad Zaini Nurshafiqah, Nozomi Miyamoto, Hikari Yoshii, Riichi Kodama, Itaru Koike, Toshiyuki Kinoshita School of Computer

More information

Ontology-Based Web Query Classification for Research Paper Searching

Ontology-Based Web Query Classification for Research Paper Searching Ontology-Based Web Query Classification for Research Paper Searching MyoMyo ThanNaing University of Technology(Yatanarpon Cyber City) Mandalay,Myanmar Abstract- In web search engines, the retrieval of

More information

Web Page Recommender System based on Folksonomy Mining for ITNG 06 Submissions

Web Page Recommender System based on Folksonomy Mining for ITNG 06 Submissions Web Page Recommender System based on Folksonomy Mining for ITNG 06 Submissions Satoshi Niwa University of Tokyo niwa@nii.ac.jp Takuo Doi University of Tokyo Shinichi Honiden University of Tokyo National

More information

Extracting Summary from Documents Using K-Mean Clustering Algorithm

Extracting Summary from Documents Using K-Mean Clustering Algorithm Extracting Summary from Documents Using K-Mean Clustering Algorithm Manjula.K.S 1, Sarvar Begum 2, D. Venkata Swetha Ramana 3 Student, CSE, RYMEC, Bellary, India 1 Student, CSE, RYMEC, Bellary, India 2

More information

Text Mining. Munawar, PhD. Text Mining - Munawar, PhD

Text Mining. Munawar, PhD. Text Mining - Munawar, PhD 10 Text Mining Munawar, PhD Definition Text mining also is known as Text Data Mining (TDM) and Knowledge Discovery in Textual Database (KDT).[1] A process of identifying novel information from a collection

More information

On-Lib: An Application and Analysis of Fuzzy-Fast Query Searching and Clustering on Library Database

On-Lib: An Application and Analysis of Fuzzy-Fast Query Searching and Clustering on Library Database On-Lib: An Application and Analysis of Fuzzy-Fast Query Searching and Clustering on Library Database Ashritha K.P, Sudheer Shetty 4 th Sem M.Tech, Dept. of CS&E, Sahyadri College of Engineering and Management,

More information

Graph-Based Concept Clustering for Web Search Results

Graph-Based Concept Clustering for Web Search Results International Journal of Electrical and Computer Engineering (IJECE) Vol. 5, No. 6, December 2015, pp. 1536~1544 ISSN: 2088-8708 1536 Graph-Based Concept Clustering for Web Search Results Supakpong Jinarat*,

More information

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method.

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method. IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY IMPROVED ROUGH FUZZY POSSIBILISTIC C-MEANS (RFPCM) CLUSTERING ALGORITHM FOR MARKET DATA T.Buvana*, Dr.P.krishnakumari *Research

More information

CS 6320 Natural Language Processing

CS 6320 Natural Language Processing CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic

More information

A hybrid method to categorize HTML documents

A hybrid method to categorize HTML documents Data Mining VI 331 A hybrid method to categorize HTML documents M. Khordad, M. Shamsfard & F. Kazemeyni Electrical & Computer Engineering Department, Shahid Beheshti University, Iran Abstract In this paper

More information

Taccumulation of the social network data has raised

Taccumulation of the social network data has raised International Journal of Advanced Research in Social Sciences, Environmental Studies & Technology Hard Print: 2536-6505 Online: 2536-6513 September, 2016 Vol. 2, No. 1 Review Social Network Analysis and

More information

Influence of Word Normalization on Text Classification

Influence of Word Normalization on Text Classification Influence of Word Normalization on Text Classification Michal Toman a, Roman Tesar a and Karel Jezek a a University of West Bohemia, Faculty of Applied Sciences, Plzen, Czech Republic In this paper we

More information

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS 1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,

More information

Assigning Vocation-Related Information to Person Clusters for Web People Search Results

Assigning Vocation-Related Information to Person Clusters for Web People Search Results Global Congress on Intelligent Systems Assigning Vocation-Related Information to Person Clusters for Web People Search Results Hiroshi Ueda 1) Harumi Murakami 2) Shoji Tatsumi 1) 1) Graduate School of

More information

Graph Sampling Approach for Reducing. Computational Complexity of. Large-Scale Social Network

Graph Sampling Approach for Reducing. Computational Complexity of. Large-Scale Social Network Journal of Innovative Technology and Education, Vol. 3, 216, no. 1, 131-137 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/1.12988/jite.216.6828 Graph Sampling Approach for Reducing Computational Complexity

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm

Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm Rekha Jain 1, Sulochana Nathawat 2, Dr. G.N. Purohit 3 1 Department of Computer Science, Banasthali University, Jaipur, Rajasthan ABSTRACT

More information

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the

More information

Citation for published version (APA): He, J. (2011). Exploring topic structure: Coherence, diversity and relatedness

Citation for published version (APA): He, J. (2011). Exploring topic structure: Coherence, diversity and relatedness UvA-DARE (Digital Academic Repository) Exploring topic structure: Coherence, diversity and relatedness He, J. Link to publication Citation for published version (APA): He, J. (211). Exploring topic structure:

More information

Wikipedia Mining for an Association Web Thesaurus Construction

Wikipedia Mining for an Association Web Thesaurus Construction Wikipedia Mining for an Association Web Thesaurus Construction Kotaro Nakayama, Takahiro Hara, and Shojiro Nishio Dept. of Multimedia Eng., Graduate School of Information Science and Technology Osaka University,

More information

Plan for today. CS276B Text Retrieval and Mining Winter Vector spaces and XML. Text-centric XML retrieval. Vector spaces and XML

Plan for today. CS276B Text Retrieval and Mining Winter Vector spaces and XML. Text-centric XML retrieval. Vector spaces and XML CS276B Text Retrieval and Mining Winter 2005 Plan for today Vector space approaches to XML retrieval Evaluating text-centric retrieval Lecture 15 Text-centric XML retrieval Documents marked up as XML E.g.,

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

Glossary Construction of Scientific and Technical Term by using Wikipedia

Glossary Construction of Scientific and Technical Term by using Wikipedia Indian Journal of Science and Technology, Vol 8(27), DOI:10.17485/ijst/2015/v8i27/81057, October 2015 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Glossary Construction of Scientific and Technical

More information

What is this Song About?: Identification of Keywords in Bollywood Lyrics

What is this Song About?: Identification of Keywords in Bollywood Lyrics What is this Song About?: Identification of Keywords in Bollywood Lyrics by Drushti Apoorva G, Kritik Mathur, Priyansh Agrawal, Radhika Mamidi in 19th International Conference on Computational Linguistics

More information

The Goal of this Document. Where to Start?

The Goal of this Document. Where to Start? A QUICK INTRODUCTION TO THE SEMILAR APPLICATION Mihai Lintean, Rajendra Banjade, and Vasile Rus vrus@memphis.edu linteam@gmail.com rbanjade@memphis.edu The Goal of this Document This document introduce

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab M. Duwairi* and Khaleifah Al.jada'** * Department of Computer Information Systems, Jordan University of Science and Technology, Irbid 22110,

More information

10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues

10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modifcation mechanisms Model of data organization

More information

TECHNIQUES FOR COMPONENT REUSABLE APPROACH

TECHNIQUES FOR COMPONENT REUSABLE APPROACH TECHNIQUES FOR COMPONENT REUSABLE APPROACH Sukanay.M 1, Biruntha.S 2, Dr.Karthik.S 3, Kalaikumaran.T 4 1 II year M.E SE, Department of Computer Science & Engineering (PG) sukanmukesh@gmail.com 2 II year

More information

IMPLEMENTATION OF CLASSIFICATION ALGORITHMS USING WEKA NAÏVE BAYES CLASSIFIER

IMPLEMENTATION OF CLASSIFICATION ALGORITHMS USING WEKA NAÏVE BAYES CLASSIFIER IMPLEMENTATION OF CLASSIFICATION ALGORITHMS USING WEKA NAÏVE BAYES CLASSIFIER N. Suresh Kumar, Dr. M. Thangamani 1 Assistant Professor, Sri Ramakrishna Engineering College, Coimbatore, India 2 Assistant

More information

Interpreting Document Collections with Topic Models. Nikolaos Aletras University College London

Interpreting Document Collections with Topic Models. Nikolaos Aletras University College London Interpreting Document Collections with Topic Models Nikolaos Aletras University College London Acknowledgements Mark Stevenson, Sheffield Tim Baldwin, Melbourne Jey Han Lau, IBM Research Talk Outline Introduction

More information

Keywords: clustering algorithms, unsupervised learning, cluster validity

Keywords: clustering algorithms, unsupervised learning, cluster validity Volume 6, Issue 1, January 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Clustering Based

More information

Lily: Ontology Alignment Results for OAEI 2009

Lily: Ontology Alignment Results for OAEI 2009 Lily: Ontology Alignment Results for OAEI 2009 Peng Wang 1, Baowen Xu 2,3 1 College of Software Engineering, Southeast University, China 2 State Key Laboratory for Novel Software Technology, Nanjing University,

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System

A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System Takashi Yukawa Nagaoka University of Technology 1603-1 Kamitomioka-cho, Nagaoka-shi Niigata, 940-2188 JAPAN

More information

ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators

ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators Pablo Ruiz, Thierry Poibeau and Frédérique Mélanie Laboratoire LATTICE CNRS, École Normale Supérieure, U Paris 3 Sorbonne Nouvelle

More information

Oleksandr Kuzomin, Bohdan Tkachenko

Oleksandr Kuzomin, Bohdan Tkachenko International Journal "Information Technologies Knowledge" Volume 9, Number 2, 2015 131 INTELLECTUAL SEARCH ENGINE OF ADEQUATE INFORMATION IN INTERNET FOR CREATING DATABASES AND KNOWLEDGE BASES Oleksandr

More information

Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points

Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points Dr. T. VELMURUGAN Associate professor, PG and Research Department of Computer Science, D.G.Vaishnav College, Chennai-600106,

More information

Recommendation on the Web Search by Using Co-Occurrence

Recommendation on the Web Search by Using Co-Occurrence Recommendation on the Web Search by Using Co-Occurrence S.Jayabalaji 1, G.Thilagavathy 2, P.Kubendiran 3, V.D.Srihari 4. UG Scholar, Department of Computer science & Engineering, Sree Shakthi Engineering

More information

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search 1 / 33 Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search Bernd Wittefeld Supervisor Markus Löckelt 20. July 2012 2 / 33 Teaser - Google Web History http://www.google.com/history

More information

The use of frequent itemsets extracted from textual documents for the classification task

The use of frequent itemsets extracted from textual documents for the classification task The use of frequent itemsets extracted from textual documents for the classification task Rafael G. Rossi and Solange O. Rezende Mathematical and Computer Sciences Institute - ICMC University of São Paulo

More information

The Relationship between Slices and Module Cohesion

The Relationship between Slices and Module Cohesion The Relationship between Slices and Module Cohesion Linda M. Ott Jeffrey J. Thuss Department of Computer Science Michigan Technological University Houghton, MI 49931 Abstract High module cohesion is often

More information

An Empirical Performance Study of Connection Oriented Time Warp Parallel Simulation

An Empirical Performance Study of Connection Oriented Time Warp Parallel Simulation 230 The International Arab Journal of Information Technology, Vol. 6, No. 3, July 2009 An Empirical Performance Study of Connection Oriented Time Warp Parallel Simulation Ali Al-Humaimidi and Hussam Ramadan

More information

Study on A Recommendation Algorithm of Crossing Ranking in E- commerce

Study on A Recommendation Algorithm of Crossing Ranking in E- commerce International Journal of u-and e-service, Science and Technology, pp.53-62 http://dx.doi.org/10.14257/ijunnesst2014.7.4.6 Study on A Recommendation Algorithm of Crossing Ranking in E- commerce Duan Xueying

More information

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline

More information

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AN EFFICIENT APPROACH FOR TEXT MINING USING SIDE INFORMATION Kiran V. Gaidhane*, Prof. L. H. Patil, Prof. C. U. Chouhan DOI: 10.5281/zenodo.58632

More information

Taxonomies and controlled vocabularies best practices for metadata

Taxonomies and controlled vocabularies best practices for metadata Original Article Taxonomies and controlled vocabularies best practices for metadata Heather Hedden is the taxonomy manager at First Wind Energy LLC. Previously, she was a taxonomy consultant with Earley

More information

Analytic Knowledge Discovery Models for Information Retrieval and Text Summarization

Analytic Knowledge Discovery Models for Information Retrieval and Text Summarization Analytic Knowledge Discovery Models for Information Retrieval and Text Summarization Pawan Goyal Team Sanskrit INRIA Paris Rocquencourt Pawan Goyal (http://www.inria.fr/) Analytic Knowledge Discovery Models

More information

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Research on Community Structure in Bus Transport Networks

Research on Community Structure in Bus Transport Networks Commun. Theor. Phys. (Beijing, China) 52 (2009) pp. 1025 1030 c Chinese Physical Society and IOP Publishing Ltd Vol. 52, No. 6, December 15, 2009 Research on Community Structure in Bus Transport Networks

More information

IJMIE Volume 2, Issue 9 ISSN:

IJMIE Volume 2, Issue 9 ISSN: WEB USAGE MINING: LEARNER CENTRIC APPROACH FOR E-BUSINESS APPLICATIONS B. NAVEENA DEVI* Abstract Emerging of web has put forward a great deal of challenges to web researchers for web based information

More information

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. Knowledge Retrieval Franz J. Kurfess Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. 1 Acknowledgements This lecture series has been sponsored by the European

More information

Community structures evaluation in complex networks: A descriptive approach

Community structures evaluation in complex networks: A descriptive approach Community structures evaluation in complex networks: A descriptive approach Vinh-Loc Dao, Cécile Bothorel, and Philippe Lenca Abstract Evaluating a network partition just only via conventional quality

More information

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 349 WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY Mohammed M. Sakre Mohammed M. Kouta Ali M. N. Allam Al Shorouk

More information

Text Document Clustering Using DPM with Concept and Feature Analysis

Text Document Clustering Using DPM with Concept and Feature Analysis Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,

More information

A Composite Graph Model for Web Document and the MCS Technique

A Composite Graph Model for Web Document and the MCS Technique A Composite Graph Model for Web Document and the MCS Technique Kaushik K. Phukon Department of Computer Science, Gauhati University, Guwahati-14,Assam, India kaushikphukon@gmail.com Abstract It has been

More information

Count based K-Means Clustering Algorithm

Count based K-Means Clustering Algorithm International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Count

More information

Document Retrieval using Predication Similarity

Document Retrieval using Predication Similarity Document Retrieval using Predication Similarity Kalpa Gunaratna 1 Kno.e.sis Center, Wright State University, Dayton, OH 45435 USA kalpa@knoesis.org Abstract. Document retrieval has been an important research

More information

Basics of Network Analysis

Basics of Network Analysis Basics of Network Analysis Hiroki Sayama sayama@binghamton.edu Graph = Network G(V, E): graph (network) V: vertices (nodes), E: edges (links) 1 Nodes = 1, 2, 3, 4, 5 2 3 Links = 12, 13, 15, 23,

More information

A Comparative Study Weighting Schemes for Double Scoring Technique

A Comparative Study Weighting Schemes for Double Scoring Technique , October 19-21, 2011, San Francisco, USA A Comparative Study Weighting Schemes for Double Scoring Technique Tanakorn Wichaiwong Member, IAENG and Chuleerat Jaruskulchai Abstract In XML-IR systems, the

More information

More Efficient Classification of Web Content Using Graph Sampling

More Efficient Classification of Web Content Using Graph Sampling More Efficient Classification of Web Content Using Graph Sampling Chris Bennett Department of Computer Science University of Georgia Athens, Georgia, USA 30602 bennett@cs.uga.edu Abstract In mining information

More information

KeaKAT An Online Automatic Keyphrase Assignment Tool

KeaKAT An Online Automatic Keyphrase Assignment Tool 2012 10th International Conference on Frontiers of Information Technology KeaKAT An Online Automatic Keyphrase Assignment Tool Rabia Irfan, Sharifullah Khan, Irfan Ali Khan, Muhammad Asif Ali School of

More information

CHAPTER-26 Mining Text Databases

CHAPTER-26 Mining Text Databases CHAPTER-26 Mining Text Databases 26.1 Introduction 26.2 Text Data Analysis and Information Retrieval 26.3 Basle Measures for Text Retrieval 26.4 Keyword-Based and Similarity-Based Retrieval 26.5 Other

More information

Correlation Based Feature Selection with Irrelevant Feature Removal

Correlation Based Feature Selection with Irrelevant Feature Removal Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Entity Linking. David Soares Batista. November 11, Disciplina de Recuperação de Informação, Instituto Superior Técnico

Entity Linking. David Soares Batista. November 11, Disciplina de Recuperação de Informação, Instituto Superior Técnico David Soares Batista Disciplina de Recuperação de Informação, Instituto Superior Técnico November 11, 2011 Motivation Entity-Linking is the process of associating an entity mentioned in a text to an entry,

More information

Similarities in Source Codes

Similarities in Source Codes Similarities in Source Codes Marek ROŠTÁR* Slovak University of Technology in Bratislava Faculty of Informatics and Information Technologies Ilkovičova 2, 842 16 Bratislava, Slovakia rostarmarek@gmail.com

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information