A Combined Method of Text Summarization via Sentence Extraction
|
|
- Jane Gardner
- 5 years ago
- Views:
Transcription
1 Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, A Combined Method of Text Summarization via Sentence Extraction CHENGHUA DANG Hebei University of Engineering Handan, Hebei, PEOPLE S REPUBLIC OF CHINA XINJUN LUO Hebei University of Engineering Handan, Hebei, PEOPLE S REPUBLIC OF CHINA Abstract: - In this paper, we propose a practical approach for extracting the most relevant sentences from the original document to form a summary. We present this summarization procedure based upon statistical selection and WordNet. Experimental results show that our approach compares favourably to a commercial text summarizer. Key-Words: - Text Summarization, Key Sentence, Keyword, WordNet, Synset 1 Introduction Text summarization is the process of condensing a source text while preserving its information content and maintaining readability. The main (and large) difference between automatic and human-based text summarization is that humans can capture and convey subtle themes that permeate documents, whereas automatic approaches have a large difficulty to do the same. Nonetheless, as the amount of information available in electronic format continues to grow, research into automatic text summarization has taken on renewed interest. A summary can be employed in an indicative way as a pointer to some parts of the original document, or in an informative way to cover all relevant information of the text [1]. In both cases the most important advantage of using a summary is its reduced reading time. Summary generation by an automatic procedure has also other advantages: (i) the size of the summary can be controlled; (ii) its content is determinist; and (iii) the link between a text element in the summary and its position in the original text can be easily established. Technology of automatic summarization of text is maturing and may provide a solution to this problem [2, 3]. Automatic text summarization produces a concise summary by abstraction or extraction of important text using statistical approaches [4], linguistic approaches [5] or combination of the two [3, 6, 7]. In this paper, we propose a practical approach for extracting the most relevant sentences from the original document to form a summary. The idea of our approach is to exploit sentences from both the Keywords extraction based on statistics and Synsets extraction using WordNet. These two properties can be combined and tuned for ranking and extracting sentences. We provide experimental evidence that our approach achieves reasonable performance compared with a commercial text summarizer (Microsoft Word summarizer). This paper is structured as follows. In Section 2 we review previous research related to the problem of text summarization and summary evaluation. Section 3 presents our combined method of key-sentence extraction. Section 4 provides
2 Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, experiments comparing our method to ten other summarization approaches. Finally, Section 5 concludes the paper. 2 Related Work 2.1 Summarization Techniques Text summarization by extraction can employ various levels of granularity, e.g., keyword, sentence, or paragraph. MEAD [8], a state of the art sentenceextractor and a top performer at DUC, aims to extracts sentences central to the overall topic of a document. The system employs (1) a centroid score representing the centrality of a sentence to the overall document, (2) a position score which is inversely proportional to the position of a sentence in the document, and (3) an overlap-with-first score which is the inner product of the tf * idf with the first sentence of the document. MEAD attempts to reduce summary redundancy by eliminating sentences above a similarity threshold parameter. Other approaches for sentence extraction include NLP methods [9, 10] and machinelearning techniques [11, 12]. These approaches tend to be computationally expensive and genre-dependent even though they are typically based on the more general tf * idf framework. Work on generative algorithms includes sentence compression [13], sentence fusion [14], and sentence modification [15]. 2.2 Keywords Extraction Techniques Traditionally, keywords are extracted from the documents in order to generate a summary. In this work, single keywords are extracted via statistical measures. Based on such keywords, the most significant sentences, which best describe the document, are retrieved. Keyword extraction from a body of text relies on an evaluation of the importance of each candidate keyword [16]. A candidate keyword is considered a true keyword if and only if it occurs frequently in the document, i.e., the total frequency of occurrence is high. Of course, stop words like the, a etc are excluded. 2.3 WordNet in Text Classification WordNet [17] is an online lexical reference system in which English nouns, verbs, adjectives and adverbs are grouped organized into synonym sets or synsets, each representing one underlying lexical concept. A synset is a set of synonyms (word forms that relate to the same word meaning) and two words are said to be synonyms if their mutual substitution does not alter the truth value of a given sentence in which they occur, in a given context. Noun synsets are related to each other through hypernymy (generalization), hyponymy (speciali-zation), holonymy (whole of) and meronymy (part of) relations. Of these, (hypernymy, hyponymy) and (meronymy, holonymy) are complementary pairs. The verb and adjective synsets are very sparsely connected with each other. No relation is available between noun and verb synsets. However, 4500 adjective synsets are related to noun synsets with pertainyms (pertaining to) and attra (attributed with) relations. Scott and Matwin [18] propose to deal with text classification within a mixed model where WordNet and machine learning are the main ingredients. This proposal explores the hypothesis that the incorporation of structured linguistic knowledge can aid (and guide) statistical inference in order to classify corpora. Other proposals have the same hybrid spirit in related areas: Rodriguez, Buenaga, Gómez- Hidalgo, Agudo [19] and Vorhees [20] use the WordNet ontology for Information Retrieval; Resnik [21] proposes another methodology that index corpora to WordNet with the goal of increasing the reliability of Information Retrieval results. Scott and Matwin [18], however, use a machine learning algorithm elaborated for WordNet (more specifically, over the relations of synonymy and hyperonymy). This aims to alter the text representation from a non-ordered set of words (bag-ofwords) to a hyperonymy density structure. 3 Our Algorithms 3.1 Preprocessing of the text 1) Break the text into sentences. 2) Stop-word elimination common words with no semantics and which
3 Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, do not aggregate relevant information to the task (e.g., the, a ) are eliminated; 3) Case folding: consists of converting all the characters to the same kind of letter case - either upper case or lower case; 4) Stemming: syntactically similar words, such as plurals, verbal variations, etc. are considered similar; the purpose of this procedure is to obtain the stem or radix of each word, which emphasize its semantics. 3.2 Synsets Ranking The basic motivation of this step is to rank the synsets based on their relevance to the text. So, if lots of words in the text correspond to the same synset, that synset or meaning is more relevant to the text, and thus, it must get a higher rank. This idea has been borrowed from [22], which details the use of WordNet Synsets as a mode of text representation. 3.3 Refinement of Keywords The collection of Keywords are refined as compared with Synsets obtained above. The comparison is conducted by calculating the similarity between Keywords and Synsets. According to the vectorial model, this feature is obtained by using the Synsets of the document as a query against all the Keywords of the document; then the similarity of the document s Synsets and each Keyword is computed by the cosine similarity measure [23]. Then we retain those Keywords which have the closest similarity to the Synsets. 3.4 Key-Sentence Selection Once the keywords are identified, the most significant sentences for summary generation can be retrieved from all narrative paragraphs based on the presence of keywords [24]. The significance of a sentence is measured by calculating a weight value, which is the maximum of the weights for word clusters within the sentence. A word cluster is defined as a list of words which starts and ends with a keyword and less than 2 non-keywords must separate any two neighboring keywords [16]. The weight of a word cluster is computed by adding the weights of all keywords within the word cluster, and dividing this sum by the total number of keywords within the word cluster. The weights of all sentences in all narrative text paragraphs are computed and the top five sentences (ranked according to sentence weight) are the key sentences to be included in the summary. The overall summary is formed by the top 25 keywords and the top 5 key sentences. These numbers are determined based on the fact that key sentences are more informative than keywords, and the whole summary should fit in a single page. 4 Experiments Summaries can be evaluated using intrinsic or extrinsic measures [25]. While intrinsic methods attempt to measure summary quality using human evaluation thereof, extrinsic methods measure the same through a task-based performance measure such the information retrieval-oriented task. Intrinsic approach was utilized in our experiments. However, it is a timeconsuming process to identify important units in documents by humans, therefore, we chose the Microsoft Word summarizer of MS Office 2000 to output summary baselines. The comparison between our algorithm and the summarization algorithm for MS Word 2000 demonstrates that our experimental results give the best summarization at around 35% summary of a document. 5 Conclusion We have presented a combined technique for the extraction of key-sentences from a document, and use such sentences as a summary of the same document. Refining Keywords against WordNet Synsets comprehensively improve the correctness of automatic summary. References: [1] Mani, I. Automatic Summarization. J.Benjamins Publ. Co. Amsterdam Philadelphia (2001).
4 Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, [2] I. Mani and M. Maybury. Advances in Automatic Text Summarization. MIT Press, ISBN , [3] I. Mani. Recent developments in text summarization. In ACM Conference on Information and Knowledge Management, CIKM 01, pages , [4] O. Buyukkokten, H. Garcia-Molina, and A. Paepcke. Seeing the whole in parts: Text summarization for web browsing on handheld devices. In Proceedings of 10 th International World-Wide Web Conference, [5] C. Aone, M.E. Okurowski, J. Gorlinsky, and B. Larsen. A scalable summarization system using robust NLP. In Proceedings of the ACL 97/EACL 97 Workshop on Intelligent Scalable Text Summarization, pages 66 73, [6] R. Barzilay and M. Elhadad. Using lexical chains for text summarization. In Proceedings of the Intelligent Scalable Text Summarization Workshop (ISTS 97), ACL, Madrid, Spain, [7] J. Goldstein, M. Kantrowitz, V. Mittal, and J. Carbonell. Summarizing text documents: Sentence selection and evaluation metrics. In Proceedings of SIGIR, pages , [8] D. Radev, H. Jing, and M. Budzikowska. Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation and user studies. In ANLP/NAACL Workshop on Automatic Summarization, pages 21 29, [9] C. Aone, M. E. Okurowski, J. Gorlinsky, and B. Larsen. A Scalable Summarization System Using Robust NLP. In Proceedings of the Intelligent Scalable Text Summarization Workshop, pages 66 73, [10] R. Barzilay and M. Elhadad. Using lexical chains for text summarization. In Proceedings of the Intelligent Scalable Text Summarization Workshop, pages 10 17, [11] C. Nobata and S. Sekine. Results of CRL/NYU System at DUC-2003 and an Experiment on Division of Document Sets. In 2003 Document Understanding Conference Draft Papers, pages 79 85, [12] S. Teufel and M. Moens. Sentence extraction as a classification task. In ACL/EACL Workshop on Intelligent and Scalable Text Summarization, [13] C.-Y. Lin. Improving Summarization Performance by Sentence Compression - A Pilot Study. In Proceedings of the International Workshop on Information Retrieval with Asian Language, pages 1 8, [14] K. Han, Y. Song, and H. Rim. KU Text Summarization System for DUC In Document Understanding Conference Draft Papers, pages , [15] A. Nenkova, B. Schiffman, A. Schlaiker, S. Blair-Goldensohn, R. Barzilay, S. Sigelman, V. Hatzivassiloglou, and K. McKeown. Columbia at the Document Understanding Conference In 2003 Document Understanding Conference Draft Papers, pages 71 78, [16] O. Buyukkokten, H. Garcia-Molina, and A. Paepcke. Seeing the Whole in Parts: Text Summarization for Web Browsing on Handheld Devices. In Proceedings of Tenth International World Wide Web Conference, , [17] Fellbaum, Christiane (Ed.), Wordnet : An Electronic Lexical Database (Language, Speech and Communication). MIT Press, [18] S. Scott and S. Matwin, Text classification using WordNet hypernyms. In Proceedings of the COLING/ACL Workshop on Usage of WordNet in Natural Language Processing Systems, Montreal, [19] Buenaga M. Rodríguez., J. M. Gómez- Hidalgo, B. Díaz Agudo, Using WordNet to complement training information in text categorization. In Proceedings of the International Conference on Recent Advances in Natural Language Processing, Tzigov Chark, [20] M. Vorhees, Ellen, Using WordNet for text retrieval. In Fellbaum C. (ed.) WordNet: An Electronic Lexical Database, MIT Press, 1998.
5 Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, [21] Resnik Philip, Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14 th International Joint Conference on Artificial Intelligence, Montreal, [22] Ramakrishnan and Bhattacharya, Text representation with wordnet synsets. Eight International Conference on Applications of Natural Language to Information Systems (NLDB2003), [23] Salton, G.; Buckley, C. Termweighting approaches in automatic text retrieval. Information Processing and Management 24, [24] Chuang, W., and Yang, J. Extracting Sentence Segments for Text Summarization: A Machine Learning Approach. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, , [25] D. R. Radev, E. Hovy, and K. McKeown. Introduction to the special issue on summarization. Computational Linguistics, 28(4): , 2002.
Web Information Retrieval using WordNet
Web Information Retrieval using WordNet Jyotsna Gharat Asst. Professor, Xavier Institute of Engineering, Mumbai, India Jayant Gadge Asst. Professor, Thadomal Shahani Engineering College Mumbai, India ABSTRACT
More informationWEIGHTING QUERY TERMS USING WORDNET ONTOLOGY
IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 349 WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY Mohammed M. Sakre Mohammed M. Kouta Ali M. N. Allam Al Shorouk
More informationWhat is this Song About?: Identification of Keywords in Bollywood Lyrics
What is this Song About?: Identification of Keywords in Bollywood Lyrics by Drushti Apoorva G, Kritik Mathur, Priyansh Agrawal, Radhika Mamidi in 19th International Conference on Computational Linguistics
More informationWordNet-based User Profiles for Semantic Personalization
PIA 2005 Workshop on New Technologies for Personalized Information Access WordNet-based User Profiles for Semantic Personalization Giovanni Semeraro, Marco Degemmis, Pasquale Lops, Ignazio Palmisano LACAM
More informationTERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES
TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.
More informationText Mining. Munawar, PhD. Text Mining - Munawar, PhD
10 Text Mining Munawar, PhD Definition Text mining also is known as Text Data Mining (TDM) and Knowledge Discovery in Textual Database (KDT).[1] A process of identifying novel information from a collection
More informationQuestion Answering Approach Using a WordNet-based Answer Type Taxonomy
Question Answering Approach Using a WordNet-based Answer Type Taxonomy Seung-Hoon Na, In-Su Kang, Sang-Yool Lee, Jong-Hyeok Lee Department of Computer Science and Engineering, Electrical and Computer Engineering
More informationMulti-Document Summarization Using Distortion-Rate Ratio
Multi-Document Summarization Using Distortion-Rate Ratio Ulukbek Attokurov Department of Computer Engineering Istanbul Technical University attokurov@itu.edu.tr Ulug Bayazit Department of Computer Engineering
More informationEmpirical Analysis of Single and Multi Document Summarization using Clustering Algorithms
Engineering, Technology & Applied Science Research Vol. 8, No. 1, 2018, 2562-2567 2562 Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms Mrunal S. Bewoor Department
More informationShrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent
More informationImproving Retrieval Experience Exploiting Semantic Representation of Documents
Improving Retrieval Experience Exploiting Semantic Representation of Documents Pierpaolo Basile 1 and Annalina Caputo 1 and Anna Lisa Gentile 1 and Marco de Gemmis 1 and Pasquale Lops 1 and Giovanni Semeraro
More informationWordNet and Automated Text Summarization
WordNet and Automated Text Summarization Rui Pedro Chaves Computation of Lexical and Grammatical Knowledge Research Group Centro de Linguística da Universidade de Lisboa Avenida 5 de Outubro, 85-5º 1050-050
More informationAutomatic Text Summarization System Using Extraction Based Technique
Automatic Text Summarization System Using Extraction Based Technique 1 Priyanka Gonnade, 2 Disha Gupta 1,2 Assistant Professor 1 Department of Computer Science and Engineering, 2 Department of Computer
More informationReducing Over-generation Errors for Automatic Keyphrase Extraction using Integer Linear Programming
Reducing Over-generation Errors for Automatic Keyphrase Extraction using Integer Linear Programming Florian Boudin LINA - UMR CNRS 6241, Université de Nantes, France Keyphrase 2015 1 / 22 Errors made by
More informationCHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS
82 CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS In recent years, everybody is in thirst of getting information from the internet. Search engines are used to fulfill the need of them. Even though the
More informationString Vector based KNN for Text Categorization
458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research
More informationAutomatic Wordnet Mapping: from CoreNet to Princeton WordNet
Automatic Wordnet Mapping: from CoreNet to Princeton WordNet Jiseong Kim, Younggyun Hahm, Sunggoo Kwon, Key-Sun Choi Semantic Web Research Center, School of Computing, KAIST 291 Daehak-ro, Yuseong-gu,
More informationA Framework for Summarization of Multi-topic Web Sites
A Framework for Summarization of Multi-topic Web Sites Yongzheng Zhang Nur Zincir-Heywood Evangelos Milios Technical Report CS-2008-02 March 19, 2008 Faculty of Computer Science 6050 University Ave., Halifax,
More informationA Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet
A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet Joerg-Uwe Kietz, Alexander Maedche, Raphael Volz Swisslife Information Systems Research Lab, Zuerich, Switzerland fkietz, volzg@swisslife.ch
More informationMulti-Document Summarizer for Earthquake News Written in Myanmar Language
Multi-Document Summarizer for Earthquake News Written in Myanmar Language Myat Myitzu Kyaw, and Nyein Nyein Myo Abstract Nowadays, there are a large number of online media written in Myanmar language.
More informationSummarizing Public Opinion on a Topic
Summarizing Public Opinion on a Topic 1 Abstract We present SPOT (Summarizing Public Opinion on a Topic), a new blog browsing web application that combines clustering with summarization to present an organized,
More informationA Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2
A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,
More informationSemantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman
Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Abstract We intend to show that leveraging semantic features can improve precision and recall of query results in information
More informationNATURAL LANGUAGE PROCESSING
NATURAL LANGUAGE PROCESSING LESSON 9 : SEMANTIC SIMILARITY OUTLINE Semantic Relations Semantic Similarity Levels Sense Level Word Level Text Level WordNet-based Similarity Methods Hybrid Methods Similarity
More informationHebei University of Technology A Text-Mining-based Patent Analysis in Product Innovative Process
A Text-Mining-based Patent Analysis in Product Innovative Process Liang Yanhong, Tan Runhua Abstract Hebei University of Technology Patent documents contain important technical knowledge and research results.
More informationMaking Sense Out of the Web
Making Sense Out of the Web Rada Mihalcea University of North Texas Department of Computer Science rada@cs.unt.edu Abstract. In the past few years, we have witnessed a tremendous growth of the World Wide
More informationAn Adaptive Agent for Web Exploration Based on Concept Hierarchies
An Adaptive Agent for Web Exploration Based on Concept Hierarchies Scott Parent, Bamshad Mobasher, Steve Lytinen School of Computer Science, Telecommunication and Information Systems DePaul University
More informationQuery-based Text Normalization Selection Models for Enhanced Retrieval Accuracy
Query-based Text Normalization Selection Models for Enhanced Retrieval Accuracy Si-Chi Chin Rhonda DeCook W. Nick Street David Eichmann The University of Iowa Iowa City, USA. {si-chi-chin, rhonda-decook,
More informationA Study of Global Inference Algorithms in Multi-Document Summarization
A Study of Global Inference Algorithms in Multi-Document Summarization Ryan McDonald Google Research 76 Ninth Avenue, New York, NY 10011 ryanmcd@google.com Abstract. In this work we study the theoretical
More informationA hybrid method to categorize HTML documents
Data Mining VI 331 A hybrid method to categorize HTML documents M. Khordad, M. Shamsfard & F. Kazemeyni Electrical & Computer Engineering Department, Shahid Beheshti University, Iran Abstract In this paper
More informationINTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & 6367(Print), ISSN 0976 6375(Online) Volume 3, Issue 1, January- June (2012), TECHNOLOGY (IJCET) IAEME ISSN 0976 6367(Print) ISSN 0976 6375(Online) Volume
More informationSAACO: Semantic Analysis based Ant Colony Optimization Algorithm for Efficient Text Document Clustering
SAACO: Semantic Analysis based Ant Colony Optimization Algorithm for Efficient Text Document Clustering 1 G. Loshma, 2 Nagaratna P Hedge 1 Jawaharlal Nehru Technological University, Hyderabad 2 Vasavi
More informationAn ontology-based approach for semantics ranking of the web search engines results
An ontology-based approach for semantics ranking of the web search engines results Abdelkrim Bouramoul*, Mohamed-Khireddine Kholladi Computer Science Department, Misc Laboratory, University of Mentouri
More informationQUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL
QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL David Parapar, Álvaro Barreiro AILab, Department of Computer Science, University of A Coruña, Spain dparapar@udc.es, barreiro@udc.es
More informationDocument Summarization on Handheld Device:
Document Summarization on Handheld Device: An Information Visualization Tool for Mobile Commerce Christopher C. Yang Dept. of Systems Eng. and Eng. Management The Chinese University of Hong Kong Hong Kong
More informationTo search and summarize on Internet with Human Language Technology
To search and summarize on Internet with Human Language Technology Hercules DALIANIS Department of Computer and System Sciences KTH and Stockholm University, Forum 100, 164 40 Kista, Sweden Email:hercules@kth.se
More informationA Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition
A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es
More informationPKUSUMSUM : A Java Platform for Multilingual Document Summarization
PKUSUMSUM : A Java Platform for Multilingual Document Summarization Jianmin Zhang, Tianming Wang and Xiaojun Wan Institute of Computer Science and Technology, Peking University The MOE Key Laboratory of
More informationSummarizing Search Results using PLSI
Summarizing Search Results using PLSI Jun Harashima and Sadao Kurohashi Graduate School of Informatics Kyoto University Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan {harashima,kuro}@nlp.kuee.kyoto-u.ac.jp
More informationUsing Question-Answer Pairs in Extractive Summarization of Conversations
Using Question-Answer Pairs in Extractive Summarization of Email Conversations Lokesh Shrestha, Kathleen McKeown and Owen Rambow Columbia University New York, NY, USA lokesh, kathy, rambow @cs.columbia.edu
More informationAutomatic Summarization
Automatic Summarization CS 769 Guest Lecture Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of Wisconsin, Madison February 22, 2008 Andrew B. Goldberg (CS Dept) Summarization
More informationSense-based Information Retrieval System by using Jaccard Coefficient Based WSD Algorithm
ISBN 978-93-84468-0-0 Proceedings of 015 International Conference on Future Computational Technologies (ICFCT'015 Singapore, March 9-30, 015, pp. 197-03 Sense-based Information Retrieval System by using
More informationMeasuring Semantic Similarity between Words Using Page Counts and Snippets
Measuring Semantic Similarity between Words Using Page Counts and Snippets Manasa.Ch Computer Science & Engineering, SR Engineering College Warangal, Andhra Pradesh, India Email: chandupatla.manasa@gmail.com
More informationThe Goal of this Document. Where to Start?
A QUICK INTRODUCTION TO THE SEMILAR APPLICATION Mihai Lintean, Rajendra Banjade, and Vasile Rus vrus@memphis.edu linteam@gmail.com rbanjade@memphis.edu The Goal of this Document This document introduce
More informationPunjabi WordNet Relations and Categorization of Synsets
Punjabi WordNet Relations and Categorization of Synsets Rupinderdeep Kaur Computer Science Engineering Department, Thapar University, rupinderdeep@thapar.edu Suman Preet Department of Linguistics and Punjabi
More informationChapter 6: Information Retrieval and Web Search. An introduction
Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods
More informationFeature selection. LING 572 Fei Xia
Feature selection LING 572 Fei Xia 1 Creating attribute-value table x 1 x 2 f 1 f 2 f K y Choose features: Define feature templates Instantiate the feature templates Dimensionality reduction: feature selection
More informationIntegrating Spanish Linguistic Resources in a Web Site Assistant
Integrating Spanish Linguistic Resources in a Web Site Assistant Paloma Martínez*, Ana García-Serrano, Alberto Ruiz-Cristina * Universidad Carlos III de Madrid Avd. Universidad 30, 28911 Leganés, Madrid,
More informationA Machine Learning Approach for Displaying Query Results in Search Engines
A Machine Learning Approach for Displaying Query Results in Search Engines Tunga Güngör 1,2 1 Boğaziçi University, Computer Engineering Department, Bebek, 34342 İstanbul, Turkey 2 Visiting Professor at
More informationAn Axiomatic Approach to IR UIUC TREC 2005 Robust Track Experiments
An Axiomatic Approach to IR UIUC TREC 2005 Robust Track Experiments Hui Fang ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign Abstract In this paper, we report
More informationA Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition
A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es
More informationUser Profiling for Interest-focused Browsing History
User Profiling for Interest-focused Browsing History Miha Grčar, Dunja Mladenič, Marko Grobelnik Jozef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia {Miha.Grcar, Dunja.Mladenic, Marko.Grobelnik}@ijs.si
More informationA Textual Entailment System using Web based Machine Translation System
A Textual Entailment System using Web based Machine Translation System Partha Pakray 1, Snehasis Neogi 1, Sivaji Bandyopadhyay 1, Alexander Gelbukh 2 1 Computer Science and Engineering Department, Jadavpur
More informationExtracting Summary from Documents Using K-Mean Clustering Algorithm
Extracting Summary from Documents Using K-Mean Clustering Algorithm Manjula.K.S 1, Sarvar Begum 2, D. Venkata Swetha Ramana 3 Student, CSE, RYMEC, Bellary, India 1 Student, CSE, RYMEC, Bellary, India 2
More informationNLP Final Project Fall 2015, Due Friday, December 18
NLP Final Project Fall 2015, Due Friday, December 18 For the final project, everyone is required to do some sentiment classification and then choose one of the other three types of projects: annotation,
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationClassification and Comparative Analysis of Passage Retrieval Methods
Classification and Comparative Analysis of Passage Retrieval Methods Dr. K. Saruladha, C. Siva Sankar, G. Chezhian, L. Lebon Iniyavan, N. Kadiresan Computer Science Department, Pondicherry Engineering
More informationImproving the Reverse Dictionary Using Scalable Dataset
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 5 Issue 11 Nov. 2016, Page No. 18827-18831 ABSTRACT Improving the Reverse Dictionary Using Scalable Dataset
More informationOntology Research Group Overview
Ontology Research Group Overview ORG Dr. Valerie Cross Sriram Ramakrishnan Ramanathan Somasundaram En Yu Yi Sun Miami University OCWIC 2007 February 17, Deer Creek Resort OCWIC 2007 1 Outline Motivation
More informationA Comprehensive Analysis of using Semantic Information in Text Categorization
A Comprehensive Analysis of using Semantic Information in Text Categorization Kerem Çelik Department of Computer Engineering Boğaziçi University Istanbul, Turkey celikerem@gmail.com Tunga Güngör Department
More informationEnhancing Automatic Wordnet Construction Using Word Embeddings
Enhancing Automatic Wordnet Construction Using Word Embeddings Feras Al Tarouti University of Colorado Colorado Springs 1420 Austin Bluffs Pkwy Colorado Springs, CO 80918, USA faltarou@uccs.edu Jugal Kalita
More informationCS 6320 Natural Language Processing
CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic
More informationA probabilistic description-oriented approach for categorising Web documents
A probabilistic description-oriented approach for categorising Web documents Norbert Gövert Mounia Lalmas Norbert Fuhr University of Dortmund {goevert,mounia,fuhr}@ls6.cs.uni-dortmund.de Abstract The automatic
More informationA Deep Relevance Matching Model for Ad-hoc Retrieval
A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese
More informationLinguistic Graph Similarity for News Sentence Searching
Lingutic Graph Similarity for News Sentence Searching Kim Schouten & Flavius Frasincar schouten@ese.eur.nl frasincar@ese.eur.nl Web News Sentence Searching Using Lingutic Graph Similarity, Kim Schouten
More informationGernEdiT: A Graphical Tool for GermaNet Development
GernEdiT: A Graphical Tool for GermaNet Development Verena Henrich University of Tübingen Tübingen, Germany. verena.henrich@unituebingen.de Erhard Hinrichs University of Tübingen Tübingen, Germany. erhard.hinrichs@unituebingen.de
More informationOptimal Query. Assume that the relevant set of documents C r. 1 N C r d j. d j. Where N is the total number of documents.
Optimal Query Assume that the relevant set of documents C r are known. Then the best query is: q opt 1 C r d j C r d j 1 N C r d j C r d j Where N is the total number of documents. Note that even this
More informationInternational Journal of Video& Image Processing and Network Security IJVIPNS-IJENS Vol:10 No:02 7
International Journal of Video& Image Processing and Network Security IJVIPNS-IJENS Vol:10 No:02 7 A Hybrid Method for Extracting Key Terms of Text Documents Ahmad Ali Al-Zubi Computer Science Department
More informationInformation Retrieval. (M&S Ch 15)
Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion
More informationLexiRes: A Tool for Exploring and Restructuring EuroWordNet for Information Retrieval
LexiRes: A Tool for Exploring and Restructuring EuroWordNet for Information Retrieval Ernesto William De Luca and Andreas Nürnberger 1 Abstract. The problem of word sense disambiguation in lexical resources
More informationCharles University at CLEF 2007 CL-SR Track
Charles University at CLEF 2007 CL-SR Track Pavel Češka and Pavel Pecina Institute of Formal and Applied Linguistics Charles University, 118 00 Praha 1, Czech Republic {ceska,pecina}@ufal.mff.cuni.cz Abstract
More informationHeading-Based Sectional Hierarchy Identification for HTML Documents
Heading-Based Sectional Hierarchy Identification for HTML Documents 1 Dept. of Computer Engineering, Boğaziçi University, Bebek, İstanbul, 34342, Turkey F. Canan Pembe 1,2 and Tunga Güngör 1 2 Dept. of
More informationEnriching Ontology Concepts Based on Texts from WWW and Corpus
Journal of Universal Computer Science, vol. 18, no. 16 (2012), 2234-2251 submitted: 18/2/11, accepted: 26/8/12, appeared: 28/8/12 J.UCS Enriching Ontology Concepts Based on Texts from WWW and Corpus Tarek
More informationSEMANTIC ANALYSIS BASED TEXT CLUSTERING BY THE FUSION OF BISECTING K-MEANS AND UPGMA ALGORITHM
SEMANTIC ANALYSIS BASED TEXT CLUSTERING BY THE FUSION OF BISECTING K-MEANS AND UPGMA ALGORITHM G. Loshma 1 and Nagaratna P. Hedge 2 1 Jawaharlal Nehru Technological University, Hyderabad, India 2 Vasavi
More informationEvaluating a Conceptual Indexing Method by Utilizing WordNet
Evaluating a Conceptual Indexing Method by Utilizing WordNet Mustapha Baziz, Mohand Boughanem, Nathalie Aussenac-Gilles IRIT/SIG Campus Univ. Toulouse III 118 Route de Narbonne F-31062 Toulouse Cedex 4
More informationMEASUREMENT OF SEMANTIC SIMILARITY BETWEEN WORDS: A SURVEY
MEASUREMENT OF SEMANTIC SIMILARITY BETWEEN WORDS: A SURVEY Ankush Maind 1, Prof. Anil Deorankar 2 and Dr. Prashant Chatur 3 1 M.Tech. Scholar, Department of Computer Science and Engineering, Government
More informationTag Semantics for the Retrieval of XML Documents
Tag Semantics for the Retrieval of XML Documents Davide Buscaldi 1, Giovanna Guerrini 2, Marco Mesiti 3, Paolo Rosso 4 1 Dip. di Informatica e Scienze dell Informazione, Università di Genova, Italy buscaldi@disi.unige.it,
More informationRandom Walks for Knowledge-Based Word Sense Disambiguation. Qiuyu Li
Random Walks for Knowledge-Based Word Sense Disambiguation Qiuyu Li Word Sense Disambiguation 1 Supervised - using labeled training sets (features and proper sense label) 2 Unsupervised - only use unlabeled
More informationWikiOnto: A System For Semi-automatic Extraction And Modeling Of Ontologies Using Wikipedia XML Corpus
2009 IEEE International Conference on Semantic Computing WikiOnto: A System For Semi-automatic Extraction And Modeling Of Ontologies Using Wikipedia XML Corpus Lalindra De Silva University of Colombo School
More informationIRCE at the NTCIR-12 IMine-2 Task
IRCE at the NTCIR-12 IMine-2 Task Ximei Song University of Tsukuba songximei@slis.tsukuba.ac.jp Yuka Egusa National Institute for Educational Policy Research yuka@nier.go.jp Masao Takaku University of
More informationCADIAL Search Engine at INEX
CADIAL Search Engine at INEX Jure Mijić 1, Marie-Francine Moens 2, and Bojana Dalbelo Bašić 1 1 Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia {jure.mijic,bojana.dalbelo}@fer.hr
More informationEffect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching
Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Wolfgang Tannebaum, Parvaz Madabi and Andreas Rauber Institute of Software Technology and Interactive Systems, Vienna
More informationCHAPTER 3 INFORMATION RETRIEVAL BASED ON QUERY EXPANSION AND LATENT SEMANTIC INDEXING
43 CHAPTER 3 INFORMATION RETRIEVAL BASED ON QUERY EXPANSION AND LATENT SEMANTIC INDEXING 3.1 INTRODUCTION This chapter emphasizes the Information Retrieval based on Query Expansion (QE) and Latent Semantic
More informationOntology Based Search Engine
Ontology Based Search Engine K.Suriya Prakash / P.Saravana kumar Lecturer / HOD / Assistant Professor Hindustan Institute of Engineering Technology Polytechnic College, Padappai, Chennai, TamilNadu, India
More informationContext based Re-ranking of Web Documents (CReWD)
Context based Re-ranking of Web Documents (CReWD) Arijit Banerjee, Jagadish Venkatraman Graduate Students, Department of Computer Science, Stanford University arijitb@stanford.edu, jagadish@stanford.edu}
More informationMaking Retrieval Faster Through Document Clustering
R E S E A R C H R E P O R T I D I A P Making Retrieval Faster Through Document Clustering David Grangier 1 Alessandro Vinciarelli 2 IDIAP RR 04-02 January 23, 2004 D a l l e M o l l e I n s t i t u t e
More informationText Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering
Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering A. Anil Kumar Dept of CSE Sri Sivani College of Engineering Srikakulam, India S.Chandrasekhar Dept of CSE Sri Sivani
More informationAdaptive Model of Personalized Searches using Query Expansion and Ant Colony Optimization in the Digital Library
International Conference on Information Systems for Business Competitiveness (ICISBC 2013) 90 Adaptive Model of Personalized Searches using and Ant Colony Optimization in the Digital Library Wahyu Sulistiyo
More informationDesigning and Building an Automatic Information Retrieval System for Handling the Arabic Data
American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far
More informationAutomatic Construction of WordNets by Using Machine Translation and Language Modeling
Automatic Construction of WordNets by Using Machine Translation and Language Modeling Martin Saveski, Igor Trajkovski Information Society Language Technologies Ljubljana 2010 1 Outline WordNet Motivation
More informationDomain-specific Concept-based Information Retrieval System
Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical
More informationOntology Matching with CIDER: Evaluation Report for the OAEI 2008
Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Jorge Gracia, Eduardo Mena IIS Department, University of Zaragoza, Spain {jogracia,emena}@unizar.es Abstract. Ontology matching, the task
More informationExam in course TDT4215 Web Intelligence - Solutions and guidelines - Wednesday June 4, 2008 Time:
English Student no:... Page 1 of 14 Contact during the exam: Geir Solskinnsbakk Phone: 735 94218/ 93607988 Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Wednesday June 4, 2008 Time:
More informationOntology Based Prediction of Difficult Keyword Queries
Ontology Based Prediction of Difficult Keyword Queries Lubna.C*, Kasim K Pursuing M.Tech (CSE)*, Associate Professor (CSE) MEA Engineering College, Perinthalmanna Kerala, India lubna9990@gmail.com, kasim_mlp@gmail.com
More informationA Semantic Model for Concept Based Clustering
A Semantic Model for Concept Based Clustering S.Saranya 1, S.Logeswari 2 PG Scholar, Dept. of CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India 1 Associate Professor, Dept. of
More informationUSING SEMANTIC REPRESENTATIONS IN QUESTION ANSWERING. Sameer S. Pradhan, Valerie Krugler, Wayne Ward, Dan Jurafsky and James H.
USING SEMANTIC REPRESENTATIONS IN QUESTION ANSWERING Sameer S. Pradhan, Valerie Krugler, Wayne Ward, Dan Jurafsky and James H. Martin Center for Spoken Language Research University of Colorado Boulder,
More informationText Similarity. Semantic Similarity: Synonymy and other Semantic Relations
NLP Text Similarity Semantic Similarity: Synonymy and other Semantic Relations Synonyms and Paraphrases Example: post-close market announcements The S&P 500 climbed 6.93, or 0.56 percent, to 1,243.72,
More informationKEYWORD EXTRACTION FROM DESKTOP USING TEXT MINING TECHNIQUES
KEYWORD EXTRACTION FROM DESKTOP USING TEXT MINING TECHNIQUES Dr. S.Vijayarani R.Janani S.Saranya Assistant Professor Ph.D.Research Scholar, P.G Student Department of CSE, Department of CSE, Department
More informationInfluence of Word Normalization on Text Classification
Influence of Word Normalization on Text Classification Michal Toman a, Roman Tesar a and Karel Jezek a a University of West Bohemia, Faculty of Applied Sciences, Plzen, Czech Republic In this paper we
More informationA Novel PAT-Tree Approach to Chinese Document Clustering
A Novel PAT-Tree Approach to Chinese Document Clustering Kenny Kwok, Michael R. Lyu, Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, N.T., Hong Kong
More information