Concept Based Text Document Summarization Using Domain Ontology

Dr. S. Logeswari, ASP/CSE, Bannari Amman Institute of Technology, Sathyamangalam
Dr. R. Gomathi, AP (Sr.G)/CSE, Bannari Amman Institute of Technology, Sathyamangalam
Dr. B. Gomathy, AP (Sl.G)/CSE, Bannari Amman Institute of Technology, Sathyamangalam

ABSTRACT

Web search engines often return thousands of pages for a query, which makes it difficult for users to browse the results or to find relevant information. Clustering methods help to group the retrieved documents automatically into a list of meaningful sections. Summarization is the process of extracting and retaining the most important points of the original document. The important sentences are extracted from the source documents using a domain ontology and are used to generate the final summary. In this paper we propose a system that generates summaries of medical documents using the MeSH ontology.

KEYWORDS: Summarization, MeSH Ontology, Clustering, K-Means

1. INTRODUCTION

In recent years, with the massive expansion of the information society, the web has become a valuable source of information for almost every domain of knowledge. This has led many researchers to consider the web a legitimate repository for information retrieval and knowledge acquisition tasks. Because the web holds a massive amount of information for each domain, with high redundancy, it can also serve as a valid knowledge source for similarity computation. A text mining system therefore faces a large number of attributes. Knowledge discovery in databases (KDD) techniques require input texts to be represented as a set of attributes before they can be processed. This text-to-representation step is known as text or document indexing, and the attributes are called indexes. Indexing is a critical task in text mining because it has to represent the information in the text with minimal loss of semantics for future use.
1.1 DATA MINING

Data mining is the computational process of discovering patterns in large data sets using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall objective of the data mining process is to extract information from a database and transform it into an understandable structure for further use. Besides the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

1.2 TEXT MINING

Text mining, also referred to as text data mining or text analytics, is the technique of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and through statistical pattern learning. Text mining generally involves structuring the input text (usually by parsing, adding some derived linguistic features, removing others, and inserting the result into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output.

1.3 CLUSTERING

Clustering is an automatic learning approach that groups a set of objects into subsets, or clusters, according to their characteristics, so that similar objects are aggregated together. It is a form of unsupervised classification. A web search engine often returns thousands of pages in response to a broad query, making it difficult for users to browse the results or to identify relevant information. Clustering methods can be used to group the retrieved documents automatically into a list of meaningful categories. The quality of a clustering method can be judged as follows:

(a) the intra-class similarity of the clusters is high;
(b) the inter-class similarity of the clusters is low;
(c) the quality of a clustering result depends on both the similarity measure used by the method and its implementation; and
(d) the quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.

1.4 DOCUMENT CLUSTERING

Document clustering supports automatic document organization, topic extraction, and fast information retrieval or filtering. It is a fundamental operation in applications such as document organization, corpus summarization, information retrieval and filtering, and automatic topic extraction, and it has been widely applied to information retrieval systems to enhance their performance. The goal of a document clustering scheme is to minimize intra-cluster distances between documents while maximizing inter-cluster distances. Document clustering requires descriptors and descriptor extraction.
Descriptors are sets of words that describe the contents of a cluster. Document clustering is typically considered a centralized process. In traditional document clustering techniques, the terms of the documents are treated as features. One example of document clustering is the clustering of web documents for search users.

1.5 SUMMARIZATION

Summarization is the process of extracting and retaining the most important points of the original document. The key step in summarization is to identify the most informative parts of the document. The flow of information varies from document to document, and the importance of a given piece of information is hard to predict. As the problem of information overload has grown and the quantity of data has increased, interest in automatic summarization has also increased. Text summarization, or abstraction, has always been a key activity in the information access context. Document summaries give readers condensed versions of the most relevant information found in documents. They help readers assess the value of a document without having to read it, and they can be used as content repositories for extracting valuable facts or information.

1.6 SEMANTIC ANALYSIS

Semantic analysis determines the meaning of linguistic input. It processes language to produce common-sense knowledge about the world, to extract data, and to construct models of the world. Lexical semantics describes the meaning of component words and covers word sense disambiguation. Semantic analysis can begin with the relationships between individual words.

1.7 ONTOLOGY

An ontology is an engineering artifact that describes what exists in a particular domain, and it belongs to a specific domain of knowledge. An ontology can be used as background knowledge that helps in finding the related meanings of the terms occurring in documents.
The scope of an ontology is limited to the definitions of a certain domain, although the domain can sometimes be very broad. The domain can be an industry, an enterprise, a research field, or any other restricted set of knowledge, whether abstract, concrete or even imagined. An ontology is usually constructed with a certain task in mind. Present-day ontologies can be categorized into two general levels: those that form meta-language dictionaries and those that are derived from knowledge bases built for inference engines and expert systems. Text clustering and classification are two promising approaches to help users organize and contextualize textual information.

Medical Subject Headings (MeSH), published by the National Library of Medicine, is hierarchically arranged from most general to most specific. It consists mainly of a controlled vocabulary and the MeSH Tree. The controlled vocabulary contains several kinds of terms, such as descriptors, qualifiers, publication types, geographic terms and entry terms. Descriptor terms are the main concepts, or main headings, of the ontology. Entry terms are synonyms of, or terms related to, descriptors. MeSH descriptors are organized in the MeSH Tree, which can be seen as the MeSH concept hierarchy. The MeSH Tree has 15 categories (e.g. category A for anatomic terms), and each category is further divided into subcategories. Within each subcategory, the descriptors are arranged hierarchically from most general to most specific. Because descriptors often appear in more than one place in the tree, the structure is a graph rather than a strict tree.

2. EXISTING SYSTEM

There are two common approaches to automatic summarization: extraction and abstraction. The extraction method selects a subset of existing words, phrases, or sentences in the source document to form the summary. The abstraction method builds an internal semantic representation and uses natural language generation techniques to create a summary that is closer to a human-generated summary. A key-phrase extraction method is also used, which extracts content from the source document using semantic relations. Lexical chains are used to represent these semantic relations.

2.1 SUMMARIZATION PROCESS

The steps involved in the general summarization system are as follows. The input documents are analysed and the sentences are extracted one by one. Parse trees are generated for each sentence, and the parser also generates typed dependencies.
Subject, predicate and object are extracted from the typed dependencies of each sentence and are represented as Resource Description Framework (RDF) triples. The semantic distance between each pair of triples is calculated with the Wu and Palmer metric, using values obtained from WordNet, and a semantic distance matrix is generated. Once the matrix is generated, the mean of the semantic values is calculated for each pair of triples and provided as input to the clustering algorithm. The K-means clustering algorithm is applied to these mean values to group the triples that are semantically similar. After clustering, the points of each cluster are analysed for sentence extraction to generate the summary of the input document.

2.3 ISSUES IN EXISTING SYSTEM

The huge number of possible documents implies that many standard semantic models, such as the simple language model, the topic signature translation model and the context-sensitive semantic model, cannot be easily adapted. Whenever a parse tree is constructed, it occupies a lot of memory. Parse tree generation requires the tokens to be completely context free. When a parsing error occurs, it is hard to determine where in the source file the failure occurred.

3. PROPOSED SYSTEM

The major objective of the proposed work is to improve the quality of searching by means of a summarization model for MeSH labels. The experiment is based on a concept-based weighting scheme that is used to index the words in the source document. The system comprises sentence extraction, Parts-Of-Speech (POS) tagging, concept mapping and summarization.
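The first stages of the proposed pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary below is a hypothetical toy stand-in for MeSH entry terms and descriptors, the function names `split_sentences`, `map_concepts` and `hypernym_of` are illustrative, and POS tagging is omitted (terms are matched directly against the toy vocabulary instead of first selecting nouns).

```python
import re

# Hypothetical toy vocabulary (NOT real MeSH data): entry term -> descriptor.
ENTRY_TERMS = {
    "dengue": "dengue",
    "breakbone fever": "dengue",
    "typhoid": "typhoid fever",
    "typhoid fever": "typhoid fever",
    "enteric fever": "typhoid fever",
}
# Hypothetical parent links: descriptor -> broader descriptor (hypernym).
PARENT = {"dengue": "virus diseases", "typhoid fever": "bacterial infections"}


def split_sentences(text):
    """Split a paragraph into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def map_concepts(sentence):
    """Return (surface term, descriptor, relation) triples found in a sentence."""
    lowered = sentence.lower()
    found = []
    # Try longer terms first so "typhoid fever" wins over "typhoid".
    for term in sorted(ENTRY_TERMS, key=len, reverse=True):
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            descriptor = ENTRY_TERMS[term]
            relation = "identity" if term == descriptor else "synonym"
            found.append((term, descriptor, relation))
            # Blank out the matched term so shorter terms do not match again.
            lowered = lowered.replace(term, " ")
    return found


def hypernym_of(descriptor):
    """Return the broader descriptor one level up, or None if unknown."""
    return PARENT.get(descriptor)


if __name__ == "__main__":
    text = "Breakbone fever is spread by mosquitoes. Typhoid fever needs antibiotics."
    for sentence in split_sentences(text):
        print(sentence, map_concepts(sentence))
```

In a fuller system the dictionary lookups would be replaced by queries against the actual MeSH descriptors and tree, and a POS tagger would restrict the candidates to nouns before mapping.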

POS tagging tags the source text automatically. Concept mapping uses semantic relations such as identity, synonymy, hypernymy and meronymy, which are identified from the MeSH ontology. The significance of the concepts in each document is represented as a concept weight, which is computed from the semantic relations using the frequency of each word and its importance.

3.1 MODULE DESCRIPTION

In this system, the MeSH ontology is used as the domain reference for concept extraction. A concept-based weighting scheme is used to index the words in the source document. The semantic weight of an individual concept is computed from the semantic relationships identity, synonymy, hypernymy and meronymy. The system includes four modules: sentence extraction, POS tagging, concept mapping and summarization.

3.1.1 SENTENCE EXTRACTION

Sentence extraction separates the paragraphs of the source document into individual sentences. It is a technique used for automatic summarization of a text and works as a filter that allows only important sentences to pass. Sentences are extracted from a set of documents with similar content.

3.1.2 PARTS-OF-SPEECH TAGGING

The process of assigning one of the parts of speech to a given word is called POS tagging. The tags include nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-categories.

3.1.3 CONCEPT MAPPING

Concept mapping is a procedure used to find the concept behind a sentence using semantic relations such as identity, synonymy, hypernymy and meronymy. A hypernym is a word that denotes a general or parent category, whereas a meronym is a word that denotes a constituent part or a member of something.

3.1.4 SUMMARIZATION

Individual relations are assigned initial weights. In the proposed method, the concept weight is introduced to prioritize each sentence.
Identity and synonym relations are given the highest weight, whereas hypernym and meronym relations receive lower weights. The term representing the concept is known as the root word and is assigned the weight 1. Identity and synonym relations are likewise assigned the weight 1. Hypernyms are assigned a weight that is reduced by 0.1 per level, moving in the backward direction from the root. Meronyms are assigned a weight that is reduced by 0.01 per level from the weight of the root, moving in the forward direction to the end of the tree.

The procedure is as follows:

(a) Extract the sentences from the input document.
(b) Apply POS tagging.
(c) Extract the concept mapping from the ontology.
(d) Assign a weight to each noun based on its semantic relation.
(e) Compute the concept weight for each sentence using Equation (3.1).

$$ w(c) = \frac{\sum_{i} freq_i \times weight_i}{N} \qquad (3.1) $$

where
$w(c)$ - weight of the particular concept
$freq_i$ - frequency of the particular relation
$weight_i$ - semantic weight assumed for a particular relation
$N$ - number of occurrences of all concepts in the document
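The weighting scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes Equation (3.1) has the form $w(c) = \sum_i freq_i \times weight_i / N$ implied by the symbol definitions above, the function names `relation_weight` and `concept_weight` are hypothetical, and clamping weights at zero for very deep levels is an added assumption.

```python
def relation_weight(relation, level=0):
    """Semantic weight of a relation at a given distance (in levels) from the root word.

    Identity and synonym relations keep the root weight of 1.
    Hypernyms lose 0.1 per level and meronyms lose 0.01 per level.
    Clamping at 0.0 is an assumption for levels deeper than the text covers.
    """
    if relation in ("identity", "synonym"):
        return 1.0
    if relation == "hypernym":
        return max(0.0, 1.0 - 0.1 * level)
    if relation == "meronym":
        return max(0.0, 1.0 - 0.01 * level)
    raise ValueError(f"unknown relation: {relation!r}")


def concept_weight(occurrences, total_concept_occurrences):
    """Assumed form of Equation (3.1): sum(freq_i * weight_i) / N.

    occurrences: iterable of (relation, level, freq) tuples for one concept.
    total_concept_occurrences: N, the number of occurrences of all concepts
    in the document.
    """
    if total_concept_occurrences <= 0:
        raise ValueError("N must be positive")
    weighted = sum(
        freq * relation_weight(relation, level)
        for relation, level, freq in occurrences
    )
    return weighted / total_concept_occurrences


if __name__ == "__main__":
    # Hypothetical counts for one concept in a document with N = 10.
    occurrences = [
        ("identity", 0, 3),
        ("synonym", 0, 2),
        ("hypernym", 1, 1),  # one level from the root: weight 0.9
        ("meronym", 2, 1),   # two levels from the root: weight 0.98
    ]
    print(concept_weight(occurrences, 10))
```

With the hypothetical counts shown, the weighted sum is 3 + 2 + 0.9 + 0.98 = 6.88, so the concept weight is 6.88 / 10 = 0.688 up to floating-point rounding.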

3.1.5 IMPLEMENTATION

For the experiments, documents on the diseases Dengue, Typhoid and Jaundice are collected from the PubMed repository. The documents are preprocessed by sentence extraction and POS tagging, and the concept weights are computed for all the sentences in the input documents. The efficiency of this concept-based summarization is assessed through clustering with the K-Means, Hierarchical-single and Hierarchical-complete algorithms. Similarity measures such as Euclidean distance and Pearson correlation are used to find the optimal number of clusters. The quality of the clusters is compared with that of the traditional tf-idf method, which is computed using Equation (3.2). For the computation of the tf-idf weights of the documents, tokenization and stop-word removal are performed during the preprocessing stage.

$$ w_{ij} = tf_{ij} \times \log\left(\frac{N}{df_i}\right) \qquad (3.2) $$

where $tf_{ij}$ is the number of occurrences of term $i$ in document $j$, $df_i$ is the frequency of term $i$ in the collection of documents, and $N$ is the total number of documents in the collection.

PERFORMANCE EVALUATION

SILHOUETTE CLUSTERING: Silhouette refers to a method for interpreting and validating the consistency within clusters of data. The silhouette value measures how similar an object is to its own cluster compared with other clusters. It ranges from -1 to 1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. If many objects have a high value, the clustering configuration is appropriate. Objects with a high silhouette value are considered well grouped, and objects with a low value may be outliers. This index works well with k-means clustering and is also used to determine the optimal number of clusters.

Davies-Bouldin Criterion

The Davies-Bouldin (DB) index is a metric for assessing clustering algorithms.
This is an internal evaluation scheme, in which the quality of the clustering is validated using quantities and features inherent to the dataset. The DB index criterion depends on the ratio of within-cluster to between-cluster distances. The maximum value denotes the worst-case within-to-between cluster ratio for cluster i. The optimal clustering solution has the minimum DB index value.

Table 3.1: Performance Measures Based on Euclidean distance (columns: clustering algorithm, number of clusters, and the FM, Silhouette and DB index performance measures; rows: K-means, Hierarchical single, Hierarchical complete)
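Both validity indices can be computed directly from their standard definitions. The sketch below is a minimal pure-Python illustration with Euclidean distance, not the tooling used in the experiments. The function names are hypothetical, `math.dist` requires Python 3.8 or later, and the code assumes at least two clusters with distinct centroids.

```python
import math
from collections import defaultdict


def group(points, labels):
    """Collect points into a dict: cluster label -> list of points."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def silhouette(points, labels):
    """Mean silhouette value over all points (ranges from -1 to 1)."""
    clusters = group(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean worst-case within-to-between ratio (lower is better)."""
    clusters = group(points, labels)
    centroids = {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in clusters.items()
    }
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    worst = []
    for i in clusters:
        worst.append(max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        ))
    return sum(worst) / len(worst)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette(pts, [0, 0, 1, 1]), davies_bouldin(pts, [0, 0, 1, 1]))
```

On the four toy points, the natural grouping gives a silhouette close to 0.9 and a DB index of 0.1, while the mismatched labelling `[0, 1, 0, 1]` gives a negative silhouette and a DB index of 10. This matches the reading that a higher silhouette and a lower DB index indicate better clusters.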

In K-Means, the optimal clustering for the FM index is obtained when k=2. The highest value of the Silhouette index is attained at k=3, whereas the lowest value of the DB index is at k=3. For the Hierarchical single and Hierarchical complete algorithms, the best clustering for the FM index is obtained when k=10, while the DB index and the Silhouette index both attain their optimum at k=2. Here, k denotes the number of clusters. In Hierarchical Single, the optimal clusters are obtained at k=2. Hence, the Hierarchical Single algorithm outperforms the K-Means and Hierarchical Complete algorithms.

Figure 3.1: Euclidean distance based FM index

When k=3, the Silhouette index reaches its optimal value.

Figure 3.2: Euclidean distance based Silhouette index

When k=7, the DB index reaches its optimal value.

Figure 3.3: Euclidean distance based DB index

The optimal clustering solution has the lowest FM index and Davies-Bouldin index values, whereas the Silhouette index has the maximum value.

CONCLUSION

Document clustering is mainly used to retrieve better documents and to support text mining. From the literature, it is observed that summarization can be applied when clustering documents so that the dimensionality of the documents is reduced and the clustering quality is improved. Hence, a concept-based text document summarization method is proposed in this work. The proposed method uses a concept-based weighting scheme that computes the importance of the underlying text by converting the documents into a bag of concepts. The summarization condenses the text document using a statistical approach. The ontology is used as background knowledge that helps in finding the related meanings of the terms occurring in the source documents. The method is designed to improve the quality of searching by means of a summarization model for MeSH (Medical Subject Headings) labels.


More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

A Hierarchical Document Clustering Approach with Frequent Itemsets

A Hierarchical Document Clustering Approach with Frequent Itemsets A Hierarchical Document Clustering Approach with Frequent Itemsets Cheng-Jhe Lee, Chiun-Chieh Hsu, and Da-Ren Chen Abstract In order to effectively retrieve required information from the large amount of

More information

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & 6367(Print), ISSN 0976 6375(Online) Volume 3, Issue 1, January- June (2012), TECHNOLOGY (IJCET) IAEME ISSN 0976 6367(Print) ISSN 0976 6375(Online) Volume

More information
