Scientific Literature Retrieval based on Terminological Paraphrases using Predicate Argument Tuple


Sung-Pil Choi 1, Sa-kwang Song 1, Hanmin Jung 1, Michaela Geierhos 2, Sung Hyon Myaeng 3

1 Korea Institute of Science and Technology Information, Daejeon, Korea
2 Munich University, Munich, Germany
3 Korea Advanced Institute of Science and Technology, Daejeon, Korea
{spchoi, esmallj, jhm}@kisti.re.kr, michaela.geierhos@cis.unimuenchen.de, myaeng@kaist.ac.kr

Abstract. The conceptual condensability of technical terms permits us to use them as effective queries to search scientific databases. However, authors often employ alternative expressions to represent the meanings of specific terms, in other words, Terminological Paraphrases (TPs), in the literature for certain reasons. In this paper, we propose an effective way to retrieve de facto relevant documents which contain only those TPs and cannot be found by conventional models in an environment with only controlled vocabularies, by adapting the Predicate Argument Tuple (PAT). The experiment confirms that PAT-based document retrieval is an effective and promising method to search for those kinds of documents and to improve terminology-based scientific information access models.

1 Introduction

Terminology is defined as a set of linguistic elements, each of which represents, designates, and defines a technical concept in a particular scientific field. InfoTerm [1], an international information center for terminology, specifies two important roles of terminology: to conceptually represent the expertise of a particular domain, and to serve as a tool to access domain-specific information and knowledge.

Although much effort has been devoted to inventing effective ways of query formulation and processing thus far, most of the world's major scientific databases adopt simple keyword-based strategies rather than more enhanced but complicated approaches¹. One reason is that scientific documents such as articles and patents include many technical terms that are discriminative and therefore highly informative. Accordingly, given that users and contents can share these technical terms, simple term-based methods can still achieve high levels of satisfaction.

¹ Google Scholar, PubMed, Microsoft Academic Search

Springer-Verlag Berlin Heidelberg 2012

The conceptual condensability of technical terms permits us to use them as effective queries to search scientific databases. However, authors often employ alternative expressions to represent the meanings of specific terms in the literature for certain reasons. Normal keyword matching models can only find documents that contain the input query terms. Consequently, with a single technical term, it is nontrivial to access documents that include only alternative expressions of the term, in other words, terminological paraphrases (TPs).

In this paper, we propose an effective way to retrieve documents that contain the alternative expressions which denote the concepts of terminologies in the literature by adapting the Predicate Argument Tuple (PAT). A PAT consists of multiple arguments and a predicate which represents the semantic relation between them, and it therefore expresses both syntactic and semantic interrelations between words in a sentence. We exploit PATs as indices for searching various textual segments (TPs) similar to an input sentence that defines a particular terminology. To achieve this, we construct a novel document retrieval system based on PATs to investigate the retrieval of the de facto relevant documents which contain only those TPs and cannot be found by conventional models in an environment with only controlled vocabularies (namely, single terms).

2 Related Work

To enhance the search functions of PubMed, the largest biomedical literature database in the world, Lu et al. (2009) introduced the Automatic Term Mapping (ATM) method, which automatically maps user queries into MeSH descriptors and enables query expansion (QE) with various types of thesaurus information [2]. There have been many studies applying QE to improve the performance of biomedical information retrieval with controlled vocabularies such as MeSH and UMLS [3-7].

3 PAT-based Scientific Literature Retrieval System

This section explains a newly devised retrieval system that identifies the TPs of input query terms in scientific literature based on the definitions of the terms and can therefore retrieve de facto relevant documents efficiently. We start by introducing the detailed architecture of our proposed system.

Fig. 1. System Architecture and Process of PAT-based Retrieval System

Fig. 1 shows the architecture and procedure of our system. Given an input query term, the term definition finder obtains the definition of the term from various sources. Definitional PATs, which compose a term definition, are extracted from the definition by applying syntactic parsing, PAT extraction, and preprocessing. With a PAT query consisting of definitional PATs, the system searches for and ranks relevant documents that contain sentences similar to the definition of the input term.

To build the search database, our system extracts all the PATs, rather than words, from the original target texts as indices and constructs an inverted file based on them, as shown in Fig. 2.

Fig. 2. PAT-based Inverted File

Fig. 2 shows a small portion of the PAT-based inverted file. Although conventional information retrieval systems have very complex indexing structures, we construct a simple inverted file structure that contains only sentence identifiers as posting information.
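The inverted file described above can be pictured with a minimal Python sketch. It is not the authors' implementation: the representation of a PAT as a tuple `(predicate, argument, ...)`, the sentence identifiers such as `"d1.s1"`, and the helper names `build_pat_index` and `candidate_sentences` are all assumptions made for illustration. The only property taken from the text is that each PAT maps to a posting list holding sentence identifiers and nothing else.

```python
from collections import defaultdict


def build_pat_index(sentence_pats):
    """Map each PAT to the sorted list of sentence identifiers containing it.

    `sentence_pats` is a dict: sentence identifier -> iterable of PATs.
    A PAT is modelled as a hashable tuple (predicate, arg1, ...); this
    layout is an assumption, not the format used in the paper.
    """
    postings = defaultdict(set)
    for sentence_id, pats in sentence_pats.items():
        for pat in pats:
            postings[pat].add(sentence_id)
    # Postings carry only sentence identifiers, as in the described inverted file.
    return {pat: sorted(ids) for pat, ids in postings.items()}


def candidate_sentences(index, query_pats):
    """Return identifiers of sentences sharing at least one PAT with the query."""
    hits = set()
    for pat in query_pats:
        hits.update(index.get(pat, []))
    return sorted(hits)


# Hypothetical toy data, invented for this sketch.
toy_sentences = {
    "d1.s1": [("cause", "virus", "disease"), ("chronic", "disease")],
    "d1.s2": [("cover", "structure", "portion")],
    "d2.s1": [("chronic", "disease")],
}
toy_index = build_pat_index(toy_sentences)
```

Looking up the definitional PATs of a query in `toy_index` yields the candidate sentences that a ranking step can then score.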

3.1 Predicate Argument Tuple (PAT)

Predicate Argument Structure (PAS) is a graph structure that collectively denotes the syntactic and semantic relations between words in a sentence [8]. Figure 3 shows an example of the PAS generated from the results of the Enju Parser [8].

Fig. 3. Predicate Argument Structure and Predicate Argument Tuples in a Sentence

On the left side of the figure, the gray boxes represent predicates, the white boxes denote arguments, and the arrows express the syntactic relations between them. For example, although the predicate "covering" in the sentence has two arguments, "structure" and "portion", "sperm" carries only a single noun argument, "head". We can extract Predicate-Argument Tuples (PATs) from the PAS of a sentence as in Fig. 4. A PAT is an element of a PAS and can be classified into one of four types: connective, verbal, adjectival, and nominal.

3.2 Ranking by PAT

To compute the similarity between an input PAT query and a document and then rank the search results, we use a simple ranking scheme which measures how many PATs in a PAT query exist in a document.

$$\mathrm{PMR}(Q, S) = \frac{\left|\{\, p \mid p \in Q \wedge p \in S \,\}\right|}{\left|\{\, p \mid p \in S \,\}\right|} \qquad (1)$$

where Q is a PAT query, p is a single PAT, and S is a set of PATs in a sentence. Although we use the PMR (PAT Match Ratio) as our main ranking scheme in this fundamental research, many additional schemes could be devised that may be more effective in retrieving documents containing TPs.
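As a rough illustration of Eq. (1), the sketch below computes the PAT Match Ratio as the number of PATs shared by the query and a sentence, divided by the number of PATs in the sentence. The tuple encoding of PATs, the function names, and the toy data are assumptions. How sentence scores are combined into a document score is not specified in the text, so the sketch ranks sentences only.

```python
def pat_match_ratio(query_pats, sentence_pats):
    """PMR(Q, S) = |Q ∩ S| / |S|, treating both inputs as sets of PATs."""
    q = set(query_pats)
    s = set(sentence_pats)
    if not s:
        return 0.0  # Assumed convention for a sentence with no PATs.
    return len(q & s) / len(s)


def rank_sentences(query_pats, sentences):
    """Rank sentences by descending PMR, dropping those with no shared PAT.

    `sentences` is a dict: sentence identifier -> iterable of PATs.
    Ties are broken by identifier so the order is deterministic.
    """
    scored = [(sid, pat_match_ratio(query_pats, pats)) for sid, pats in sentences.items()]
    matched = [(sid, score) for sid, score in scored if score > 0]
    return sorted(matched, key=lambda item: (-item[1], item[0]))


# Hypothetical definitional PATs and target sentences, invented for this sketch.
toy_query = [("necrosis", "head"), ("femoral", "head"), ("aseptic", "necrosis")]
toy_targets = {
    "s1": [("necrosis", "head"), ("femoral", "head")],
    "s2": [("femoral", "head"), ("report", "we", "case"), ("rare", "case"), ("new", "case")],
    "s3": [("cover", "structure", "portion")],
}
toy_ranking = rank_sentences(toy_query, toy_targets)
```

Because the denominator counts the PATs of the sentence, a short sentence whose PATs all occur in the query scores higher than a long sentence with the same number of matches.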

4 Experiments

In this section, we investigate the retrieval of these de facto relevant documents in an environment with only controlled vocabularies (namely, single terms) to retrieve TPs from scientific literature.

4.1 Experimental Settings

We use a set of abstracts in the biomedical domain selected from the NDSL (National Discovery for Science Leaders) database. Table 1 shows its statistics.

Table 1. Target Database used in the Experiment

Items                          Size
# of documents                 615,125
# of sentences                 6,061,366
# of PAT indices extracted     20,608,631

As for the experimental queries, the experiment uses 43 terms randomly selected from the MeSH thesaurus which frequently appear in the target database, as shown in Table 2.

Table 2. Sample Queries from 43 Terms

ID   MeSH Term                  Term Definition
D    Bronchitis, Chronic        A subcategory of chronic obstructive pulmonary disease.
D    Monilethrix                Rare autosomal dominant disorder of the hair shaft.
D    Femur Head Necrosis        Aseptic or avascular necrosis of the femoral head.
D    Kidney Failure, Chronic    The end-stage of chronic renal insufficiency.
D    Dermatitis, Seborrheic     A chronic inflammatory disease of the skin with unknown etiology.
D    Nervous System Disease     Diseases of the central and peripheral nervous system.
D    Hyperargininemia           A rare autosomal recessive disorder of the urea cycle.

We compare three different retrieval models in this experiment: (1) the Pseudo-Relevance Feedback model (PRF), (2) the relevance model with term definitions (DEF), and (3) PAT-based document retrieval (PAT). For (1) and (2), we used the Indri system, which produces a ranking model based on a combination of language models [9] and an inference network [10]. In addition, its relevance feedback uses Lavrenko's relevance model [11].

Two experts performed the relevance judgment manually on the top 10 documents retrieved by each system for the 43 query terms. We measured the agreement ratio for all judged documents. The results are shown in Table 3.

Table 3. Agreement Ratio in Relevance Judgments

Systems    Kappa Score [12]    Evaluation³
PRF                            Substantial Agreement
PAT                            Almost Perfect Agreement
DEF                            Substantial Agreement
Average                        Substantial Agreement

The two raters agreed almost perfectly on the results of the PAT-based search. As for the others, the scores were not significantly different. We selected and analyzed one of the two judgment results without resolving the conflicts.

4.2 Experimental Results and Discussion

Table 4 shows the comprehensive results of the experiment with the three document retrieval systems.

Table 4. Evaluation Results of the Three Retrieval Models (Top 10)

Items                                          PRF           PAT            DEF
Number of total query terms (S)                43
# of terms searching more than 1 document      29 (67.4%)    43 (100%)      43 (100%)
# of terms searching more than 10 documents    16 (37.2%)    28 (65.12%)    43 (100%)
Total # of retrieved documents (A)
Total # of relevant documents (B)
# of retrieved documents per term (A/S)
# of relevant documents per term (B/S)
Average precision over terms
Total precision

First, we counted the number of input query terms that retrieved more than one document. Whereas PAT and DEF could retrieve documents with all queries, only 29 queries retrieved more than one document with PRF. The numbers of queries retrieving more than 10 documents were 16 with PRF, 28 with PAT, and 43 with DEF. This shows the difficulty of retrieving documents without the query terms.

PAT retrieved the largest number of relevant documents (226) and showed the highest average precision over terms (0.59). Total precision, which refers to the ratio of relevant documents to the total retrieved documents, was also highest in PAT. Although PRF showed low precision, its total precision was relatively competitive (0.57), given that this model used only statistical information to expand the initial query terms.

³ Fair (0.2 < κ ≤ 0.4), Moderate (0.4 < κ ≤ 0.6), Substantial (0.6 < κ ≤ 0.8), and Almost perfect (κ > 0.8)
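To make the agreement measure concrete, here is a small sketch of Cohen's kappa for two raters, together with the interpretation bands from the footnote to Table 3. It assumes the plain (unweighted) form of kappa and binary relevant/non-relevant labels; the text does not say which variant was computed. The label lists are invented and the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used one identical label throughout.
    return (observed - expected) / (1.0 - expected)


def agreement_band(kappa):
    """Map a kappa value to the bands listed in the Table 3 footnote."""
    if kappa > 0.8:
        return "Almost perfect"
    if kappa > 0.6:
        return "Substantial"
    if kappa > 0.4:
        return "Moderate"
    if kappa > 0.2:
        return "Fair"
    return "Below fair"  # Not named in the footnote; label chosen for this sketch.


# Invented judgments (1 = relevant, 0 = not relevant) for ten retrieved documents.
judge_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
judge_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
toy_kappa = cohen_kappa(judge_1, judge_2)
```

In this toy case the raters agree on nine of ten documents and chance agreement is 0.5, so kappa comes out at about 0.8, on the boundary between the Substantial and Almost perfect bands.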

5 Conclusion and Future Work

In this paper, we confirmed that PAT-based document retrieval is an effective and promising method to retrieve relevant documents that contain no explicit query terms as well as to improve terminology-based scientific information access models. Moreover, we found that PAT-based retrieval could find hidden relevant documents that could not be retrieved by the PRF model. Therefore, our proposed model can be used as a supplementary model by combining it with other conventional retrieval models to improve search performance.

The most pressing issue for future studies will be to expand the PAT retrieval model to find more TPs in the literature. It is possible to generate synonymous PATs such as cause(virus, disease), cause(virus, disorder) and develop(host, disease) without much lexical ambiguity owing to the richness of their contextual information.

6 References

1. InfoTerm. Terminology Standardization. 2010; Available from:
2. Lu, Z., W. Kim, and W.J. Wilbur, Evaluation of query expansion using MeSH in PubMed. Inf. Retr., (1): p.
3. Abdou, S., P. Ruck, and J. Savoy, Evaluation of stemming, query expansion and manual indexing approaches for the genomic task. cell. 501: p.
4. Aronson, A.R., The effect of textual variation on concept based information retrieval, in Proceedings a conference of the American Medical Informatics Association. p.
5. Srinivasan, P., Query expansion and MEDLINE. Inf. Process. Manage., (4): p.
6. Choi, S.-P., S.-K. Song, and S.-H. Myaeng, Analysis of Sentential Paraphrase Patterns and Errors through Predicate-Argument Tuple-based Approximate Alignment. KIPS Journal, B(2).
7. Choi, S.-P. and S.-H. Myaeng, Simplicity is better: revisiting single kernel PPI extraction, in COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics.
8. Miyao, Y. and J.i. Tsujii, Feature Forest Models for Probabilistic HPSG Parsing. Computational Linguistics, (1): p.
9. Ponte, J.M. and W.B. Croft, A language modeling approach to information retrieval, in Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. 1998, ACM: Melbourne, Australia. p.
10. Turtle, H. and W.B. Croft, Evaluation of an inference network-based retrieval model. ACM Trans. Inf. Syst., (3): p.
11. Lavrenko, V. and W.B. Croft, Relevance based language models, in Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. 2001, ACM: New Orleans, Louisiana, United States. p.
12. Cohen, J., Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, (4): p.


CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS 82 CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS In recent years, everybody is in thirst of getting information from the internet. Search engines are used to fulfill the need of them. Even though the

More information

Query Phrase Expansion using Wikipedia for Patent Class Search

Query Phrase Expansion using Wikipedia for Patent Class Search Query Phrase Expansion using Wikipedia for Patent Class Search 1 Bashar Al-Shboul, Sung-Hyon Myaeng Korea Advanced Institute of Science and Technology (KAIST) December 19 th, 2011 AIRS 11, Dubai, UAE OUTLINE

More information

Charles University at CLEF 2007 CL-SR Track

Charles University at CLEF 2007 CL-SR Track Charles University at CLEF 2007 CL-SR Track Pavel Češka and Pavel Pecina Institute of Formal and Applied Linguistics Charles University, 118 00 Praha 1, Czech Republic {ceska,pecina}@ufal.mff.cuni.cz Abstract

More information

QUT IElab at CLEF 2018 Consumer Health Search Task: Knowledge Base Retrieval for Consumer Health Search

QUT IElab at CLEF 2018 Consumer Health Search Task: Knowledge Base Retrieval for Consumer Health Search QUT Ilab at CLF 2018 Consumer Health Search Task: Knowledge Base Retrieval for Consumer Health Search Jimmy 1,3, Guido Zuccon 1, Bevan Koopman 2 1 Queensland University of Technology, Brisbane, Australia

More information

Tilburg University. Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval

Tilburg University. Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval Tilburg University Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval Publication date: 2006 Link to publication Citation for published

More information

DESIGN AND IMPLEMENTATION OF AN INTERACTIVE QUERY EXPANSION METHODOLOGY FOR INFORMATION RETRIEVAL

DESIGN AND IMPLEMENTATION OF AN INTERACTIVE QUERY EXPANSION METHODOLOGY FOR INFORMATION RETRIEVAL DESIGN AND IMPLEMENTATION OF AN INTERACTIVE QUERY EXPANSION METHODOLOGY FOR INFORMATION RETRIEVAL S. Ruban 1, Vanitha T 2. and S. Behin Sam 3 1 Department of Computer Science, Bharathiar University, Coimbatore,

More information

An Efficient Approach for Color Pattern Matching Using Image Mining

An Efficient Approach for Color Pattern Matching Using Image Mining An Efficient Approach for Color Pattern Matching Using Image Mining * Manjot Kaur Navjot Kaur Master of Technology in Computer Science & Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib,

More information

Author: Yunqing Xia, Zhongda Xie, Qiuge Zhang, Huiyuan Zhao, Huan Zhao Presenter: Zhongda Xie

Author: Yunqing Xia, Zhongda Xie, Qiuge Zhang, Huiyuan Zhao, Huan Zhao Presenter: Zhongda Xie Author: Yunqing Xia, Zhongda Xie, Qiuge Zhang, Huiyuan Zhao, Huan Zhao Presenter: Zhongda Xie Outline 1.Introduction 2.Motivation 3.Methodology 4.Experiments 5.Conclusion 6.Future Work 2 1.Introduction(1/3)

More information

Geosemantically-enhanced PubMed Queries Using the Geonames Ontology and Web Services

Geosemantically-enhanced PubMed Queries Using the Geonames Ontology and Web Services Geosemantically-enhanced PubMed Queries Using the Geonames Ontology and Web Services Maged N. Kamel Boulos, PhD, MSc, MBBCh Plymouth University, UK mnkboulos@ieee.org Agenda About PubMed and MeSH The Problem

More information

Improving Recognition through Object Sub-categorization

Improving Recognition through Object Sub-categorization Improving Recognition through Object Sub-categorization Al Mansur and Yoshinori Kuno Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Saitama 338-8570,

More information

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009 Maximizing the Value of STM Content through Semantic Enrichment Frank Stumpf December 1, 2009 What is Semantics and Semantic Processing? Content Knowledge Framework Technology Framework Search Text Images

More information

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A thesis Submitted to the faculty of the graduate school of the University of Minnesota by Vamshi Krishna Thotempudi In partial fulfillment of the requirements

More information

Improving the Precision of Web Search for Medical Domain using Automatic Query Expansion

Improving the Precision of Web Search for Medical Domain using Automatic Query Expansion Improving the Precision of Web Search for Medical Domain using Automatic Query Expansion Vinay Kakade vkakade@cs.stanford.edu Madhura Sharangpani smadhura@cs.stanford.edu Department of Computer Science

More information

BioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data

BioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data BioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data María-Esther Vidal 1, Louiqa Raschid 2, Natalia Márquez 1, Jean Carlo Rivera 1, and Edna Ruckhaus 1 1 Universidad

More information

Query Disambiguation from Web Search Logs

Query Disambiguation from Web Search Logs Vol.133 (Information Technology and Computer Science 2016), pp.90-94 http://dx.doi.org/10.14257/astl.2016. Query Disambiguation from Web Search Logs Christian Højgaard 1, Joachim Sejr 2, and Yun-Gyung

More information

Biomedical literature mining for knowledge discovery

Biomedical literature mining for knowledge discovery Biomedical literature mining for knowledge discovery REZARTA ISLAMAJ DOĞAN National Center for Biotechnology Information National Library of Medicine Outline Biomedical Literature Access Challenges in

More information

Lecture 7: Relevance Feedback and Query Expansion

Lecture 7: Relevance Feedback and Query Expansion Lecture 7: Relevance Feedback and Query Expansion Information Retrieval Computer Science Tripos Part II Ronan Cummins Natural Language and Information Processing (NLIP) Group ronan.cummins@cl.cam.ac.uk

More information

CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval

CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval DCU @ CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval Walid Magdy, Johannes Leveling, Gareth J.F. Jones Centre for Next Generation Localization School of Computing Dublin City University,

More information

Ontology Based Prediction of Difficult Keyword Queries

Ontology Based Prediction of Difficult Keyword Queries Ontology Based Prediction of Difficult Keyword Queries Lubna.C*, Kasim K Pursuing M.Tech (CSE)*, Associate Professor (CSE) MEA Engineering College, Perinthalmanna Kerala, India lubna9990@gmail.com, kasim_mlp@gmail.com

More information

Query Difficulty Prediction for Contextual Image Retrieval

Query Difficulty Prediction for Contextual Image Retrieval Query Difficulty Prediction for Contextual Image Retrieval Xing Xing 1, Yi Zhang 1, and Mei Han 2 1 School of Engineering, UC Santa Cruz, Santa Cruz, CA 95064 2 Google Inc., Mountain View, CA 94043 Abstract.

More information

An Evaluation of Information Retrieval Accuracy. with Simulated OCR Output. K. Taghva z, and J. Borsack z. University of Massachusetts, Amherst

An Evaluation of Information Retrieval Accuracy. with Simulated OCR Output. K. Taghva z, and J. Borsack z. University of Massachusetts, Amherst An Evaluation of Information Retrieval Accuracy with Simulated OCR Output W.B. Croft y, S.M. Harding y, K. Taghva z, and J. Borsack z y Computer Science Department University of Massachusetts, Amherst

More information

MeSH : A Thesaurus for PubMed

MeSH : A Thesaurus for PubMed Resources and tools for bibliographic research MeSH : A Thesaurus for PubMed What is MeSH? Who uses MeSH? Why use MeSH? Searching by using the MeSH Database What is MeSH? http://www.ncbi.nlm.nih.gov/mesh

More information

String Vector based KNN for Text Categorization

String Vector based KNN for Text Categorization 458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research

More information

Enriching Knowledge Domain Visualizations: Analysis of a Record Linkage and Information Fusion Approach to Citation Data

Enriching Knowledge Domain Visualizations: Analysis of a Record Linkage and Information Fusion Approach to Citation Data Enriching Knowledge Domain Visualizations: Analysis of a Record Linkage and Information Fusion Approach to Citation Data Marie B. Synnestvedt, MSEd 1, 2 1 Drexel University College of Information Science

More information

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. Knowledge Retrieval Franz J. Kurfess Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. 1 Acknowledgements This lecture series has been sponsored by the European

More information

A Session-based Ontology Alignment Approach for Aligning Large Ontologies

A Session-based Ontology Alignment Approach for Aligning Large Ontologies Undefined 1 (2009) 1 5 1 IOS Press A Session-based Ontology Alignment Approach for Aligning Large Ontologies Editor(s): Name Surname, University, Country Solicited review(s): Name Surname, University,

More information

Lecture 14: Annotation

Lecture 14: Annotation Lecture 14: Annotation Nathan Schneider (with material from Henry Thompson, Alex Lascarides) ENLP 23 October 2016 1/14 Annotation Why gold 6= perfect Quality Control 2/14 Factors in Annotation Suppose

More information

External Query Reformulation for Text-based Image Retrieval

External Query Reformulation for Text-based Image Retrieval External Query Reformulation for Text-based Image Retrieval Jinming Min and Gareth J. F. Jones Centre for Next Generation Localisation School of Computing, Dublin City University Dublin 9, Ireland {jmin,gjones}@computing.dcu.ie

More information

Using a Medical Thesaurus to Predict Query Difficulty

Using a Medical Thesaurus to Predict Query Difficulty Using a Medical Thesaurus to Predict Query Difficulty Florian Boudin, Jian-Yun Nie, Martin Dawes To cite this version: Florian Boudin, Jian-Yun Nie, Martin Dawes. Using a Medical Thesaurus to Predict Query

More information

Evaluating Relevance Ranking Strategies for MEDLINE Retrieval

Evaluating Relevance Ranking Strategies for MEDLINE Retrieval 32 Lu et al., Evaluating Relevance Ranking Strategies Application of Information Technology Evaluating Relevance Ranking Strategies for MEDLINE Retrieval ZHIYONG LU, PHD, WON KIM, PHD, W. JOHN WILBUR,

More information

A Relevance Feedback-Based System For Quickly Narrowing Biomedical Literature Search Result

A Relevance Feedback-Based System For Quickly Narrowing Biomedical Literature Search Result Wayne State University Wayne State University Dissertations 1-1-2013 A Relevance Feedback-Based System For Quickly Narrowing Biomedical Literature Search Result Massuod Hassan Alatrash Wayne State University,

More information

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Abstract We intend to show that leveraging semantic features can improve precision and recall of query results in information

More information

TREC-10 Web Track Experiments at MSRA

TREC-10 Web Track Experiments at MSRA TREC-10 Web Track Experiments at MSRA Jianfeng Gao*, Guihong Cao #, Hongzhao He #, Min Zhang ##, Jian-Yun Nie**, Stephen Walker*, Stephen Robertson* * Microsoft Research, {jfgao,sw,ser}@microsoft.com **

More information

Approach Research of Keyword Extraction Based on Web Pages Document

Approach Research of Keyword Extraction Based on Web Pages Document 2017 3rd International Conference on Electronic Information Technology and Intellectualization (ICEITI 2017) ISBN: 978-1-60595-512-4 Approach Research Keyword Extraction Based on Web Pages Document Yangxin

More information

SEARCH TECHNIQUES: BASIC AND ADVANCED

SEARCH TECHNIQUES: BASIC AND ADVANCED 17 SEARCH TECHNIQUES: BASIC AND ADVANCED 17.1 INTRODUCTION Searching is the activity of looking thoroughly in order to find something. In library and information science, searching refers to looking through

More information

AN ENHANCED ATTRIBUTE RERANKING DESIGN FOR WEB IMAGE SEARCH

AN ENHANCED ATTRIBUTE RERANKING DESIGN FOR WEB IMAGE SEARCH AN ENHANCED ATTRIBUTE RERANKING DESIGN FOR WEB IMAGE SEARCH Sai Tejaswi Dasari #1 and G K Kishore Babu *2 # Student,Cse, CIET, Lam,Guntur, India * Assistant Professort,Cse, CIET, Lam,Guntur, India Abstract-

More information

Extensible Dynamic Form Approach for Supplier Discovery

Extensible Dynamic Form Approach for Supplier Discovery Extensible Dynamic Form Approach for Supplier Discovery Yan Kang, Jaewook Kim, and Yun Peng Department of Computer Science and Electrical Engineering University of Maryland, Baltimore County {kangyan1,

More information

POMap results for OAEI 2017

POMap results for OAEI 2017 POMap results for OAEI 2017 Amir Laadhar 1, Faiza Ghozzi 2, Imen Megdiche 1, Franck Ravat 1, Olivier Teste 1, and Faiez Gargouri 2 1 Paul Sabatier University, IRIT (CNRS/UMR 5505) 118 Route de Narbonne

More information

TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION

TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION Ms. Nikita P.Katariya 1, Prof. M. S. Chaudhari 2 1 Dept. of Computer Science & Engg, P.B.C.E., Nagpur, India, nikitakatariya@yahoo.com 2 Dept.

More information

A Language Independent Author Verifier Using Fuzzy C-Means Clustering

A Language Independent Author Verifier Using Fuzzy C-Means Clustering A Language Independent Author Verifier Using Fuzzy C-Means Clustering Notebook for PAN at CLEF 2014 Pashutan Modaresi 1,2 and Philipp Gross 1 1 pressrelations GmbH, Düsseldorf, Germany {pashutan.modaresi,

More information

Open Research Online The Open University s repository of research publications and other research outputs

Open Research Online The Open University s repository of research publications and other research outputs Open Research Online The Open University s repository of research publications and other research outputs A Study of Document Weight Smoothness in Pseudo Relevance Feedback Conference or Workshop Item

More information

Multimedia Information Systems

Multimedia Information Systems Multimedia Information Systems Samson Cheung EE 639, Fall 2004 Lecture 6: Text Information Retrieval 1 Digital Video Library Meta-Data Meta-Data Similarity Similarity Search Search Analog Video Archive

More information

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015 University of Virginia Department of Computer Science CS 4501: Information Retrieval Fall 2015 5:00pm-6:15pm, Monday, October 26th Name: ComputingID: This is a closed book and closed notes exam. No electronic

More information

Dynamic Visualization of Hubs and Authorities during Web Search

Dynamic Visualization of Hubs and Authorities during Web Search Dynamic Visualization of Hubs and Authorities during Web Search Richard H. Fowler 1, David Navarro, Wendy A. Lawrence-Fowler, Xusheng Wang Department of Computer Science University of Texas Pan American

More information