Authors: Yunqing Xia, Zhongda Xie, Qiuge Zhang, Huiyuan Zhao, Huan Zhao
Presenter: Zhongda Xie

Outline
1. Introduction
2. Motivation
3. Methodology
4. Experiments
5. Conclusion
6. Future Work

1. Introduction (1/3)
Traditional IR systems treat text as a set of independent keywords (bag-of-words models). Ignoring the connections between words can lead to mistakes.
Query: Cannabis and Cancer
- (Sentence one) He is in bad condition, he suffers from cancer, and he's addicted to cannabis.
- (Sentence two) Studies prove that cannabis can be an effective treatment for cancer. [treats]
- (Sentence three) The report indicates that long-term cannabis use may cause lung cancer. [causes]
All three sentences contain both query terms, but only the last two relate them, and they do so in opposite ways.

1. Introduction (2/3)
Previous work supports the assumption that relations at various linguistic levels help improve document ranking:
- Statistical term dependency: Gao [4], linkage dependency language model; Hou [8], higher-order word association relations.
- Coarse-grained relations: Park [7], quasi-synchronous dependence model; Lu [11], structural representation of texts; Khoo [12], cause-effect relations; Li [13], semantic relations (but these are too general).
Coarse-grained relations are too general (e.g. is_a, cooccurs_with). Fine-grained relations carry real meaning (e.g. treats, is_symptom_of, diagnoses).

1. Introduction (3/3)
In specific domains such as medical information, some ontologies provide fine-grained relations, for example SemMedDB [18].
Vintar [3] achieved positive results, but used the relations only to filter cross-lingual web pages in a Boolean manner.

2. Motivation
- Use fine-grained ontological relations (SemMedDB) for document retrieval in a specific domain (medical), and find out whether they improve retrieval results.
- Propose an algorithm that evaluates the query-document relevance score at the relation level.
- Determine a better way of combining the relation-level and the traditional word-level relevance scores.

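3. Methodology: Framework
[Figure: framework diagram. The raw text of the query and the document follows two paths. BM25 produces the word-level relevance score. Relation discovery produces a query relation vector and a document relation vector, from which the scoring algorithm computes the relation-level relevance score. A combination step merges the two scores into the final relevance score.]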

3. Methodology: Ontological Relation Detection
SemMedDB has 57 kinds of ontological relations, of which we choose 18: PROCESS OF, METHOD OF, LOCATION OF, PART OF, OCCURS IN, STIMULATES, MANIFESTATION OF, CONVERT TO, AUGMENTS, ASSOCIATED WITH, PREVENTS, USES, TREATS, PREDISPOSES, PRODUCES, DISRUPTS, CAUSES and INHIBITS.
Relations are detected using the predicate instances of each relation. Example sentences:
- Studies prove that cannabis can be an effective treatment for cancer.
- The report shows that his cancer may be treated by the right amount of cannabis.
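The detection step can be sketched as a phrase lookup. This is only an illustration under stated assumptions: the `PREDICATE_INSTANCES` table and the `detect_relations` helper are hypothetical, and the trigger phrases are hand-picked for the example sentences rather than taken from SemMedDB.

```python
import re

# Hypothetical trigger phrases per relation. The real predicate instances
# come from SemMedDB and are not listed on the slides.
PREDICATE_INSTANCES = {
    "TREATS": ["treatment for", "treated by", "treats"],
    "CAUSES": ["cause", "causes", "caused by"],
}

def detect_relations(text, concept_a, concept_b):
    """Return the relations whose trigger phrase occurs in a text span
    that also mentions both concepts (case-insensitive matching)."""
    lowered = text.lower()
    if concept_a.lower() not in lowered or concept_b.lower() not in lowered:
        return set()
    found = set()
    for relation, phrases in PREDICATE_INSTANCES.items():
        for phrase in phrases:
            # Word boundaries keep "cause" from matching inside "because".
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                found.add(relation)
                break
    return found

treats_example = detect_relations(
    "Studies prove that cannabis can be an effective treatment for cancer.",
    "cannabis", "cancer")
```

A plain phrase lookup like this ignores negation and argument order, so it should be read as a rough stand-in for whatever matching the authors actually used.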

3. Methodology: Representation of Query and Document Using Ontological Relations
Query
Queries are often too short to contain any relation keyword.
Example: "Cannabis and Cancer" is represented as (0,0,0,0,0.5,0,0,0,0.5,0,0,0,0,0,0,0,0,0) [TREATS, CAUSES]
Document
(1) He is in bad condition, he suffers from cancer, and he's addicted to cannabis.
(2) His cough is treated by using Aspirin.
(3) The report indicates that long-term cannabis use may cause lung cancer.
(4) Studies prove that cannabis can be an effective treatment for cancer.
(5) The report shows that his cancer may be treated by the right amount of cannabis.
Sentences (3), (4) and (5) relate the two query concepts: one CAUSES and two TREATS. The document is therefore represented as (0,0,0,0,2/3,0,0,0,1/3,0,0,0,0,0,0,0,0,0).
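As a rough sketch, the document vector in this example can be obtained by counting the detected relations and normalising the counts to sum to 1. The `relation_vector` helper is hypothetical, and the order of `RELATIONS` is an assumption: the slides do not say which vector position belongs to which relation.

```python
from collections import Counter

# Assumed ordering, for illustration only; the slides do not state which
# position of the 18-dimensional vector corresponds to which relation.
RELATIONS = [
    "PROCESS OF", "METHOD OF", "LOCATION OF", "PART OF", "OCCURS IN",
    "STIMULATES", "MANIFESTATION OF", "CONVERT TO", "AUGMENTS",
    "ASSOCIATED WITH", "PREVENTS", "USES", "TREATS", "PREDISPOSES",
    "PRODUCES", "DISRUPTS", "CAUSES", "INHIBITS",
]

def relation_vector(detected):
    """Turn a list of detected relation labels into a frequency vector
    normalised to sum to 1 (all zeros if nothing was detected)."""
    counts = Counter(detected)
    total = sum(counts[r] for r in RELATIONS)
    if total == 0:
        return [0.0] * len(RELATIONS)
    return [counts[r] / total for r in RELATIONS]

# Sentences (3), (4) and (5) of the example document.
doc_vec = relation_vector(["CAUSES", "TREATS", "TREATS"])
```

With this input, the TREATS component is 2/3 and the CAUSES component is 1/3, matching the non-zero values on the slide.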

3. Methodology: Relation Relevance Score and Combination
Relation relevance score: cosine distance between the query and document relation vectors.
Combination method (r: word-level relevance score; l: relation-level relevance score):
I. Summation
II. Multiplication
III. Amplification
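A minimal sketch of the relation-level score for the example vectors from the previous slide. It assumes that "cosine distance" is used in the sense of cosine similarity (higher means more relevant); the `cosine_similarity` helper is illustrative. The three combination formulas are not reproduced in this transcript, so they are not sketched here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if
    either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Example vectors from the slides (18 dimensions each).
query_vec = [0, 0, 0, 0, 0.5, 0, 0, 0, 0.5] + [0] * 9
doc_vec = [0, 0, 0, 0, 2 / 3, 0, 0, 0, 1 / 3] + [0] * 9

relation_score = cosine_similarity(query_vec, doc_vec)  # about 0.949
```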

4. Experiments: Data Set
- CLEF: CLEF 2013 eHealth Lab Medical IR task [17].
- CLEF+: CLEF extended with judgments for documents that had no annotation (three medical students as assessors, Kappa coefficient 0.82).
- Queries: 14 of the 50 queries contain two concepts.
Evaluation metrics:
1. P@10: precision at top 10.
2. nDCG@10: normalized Discounted Cumulative Gain at top 10.
3. MAP: mean average precision at top 10.
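The three metrics can be sketched for a single query as follows. The slides do not give the exact variants, so the choices here are assumptions: nDCG uses the raw relevance grade as gain and builds the ideal ranking from the judged list only, and average precision is normalised by the number of relevant documents found in the top k. MAP is the mean of the average precision over all queries.

```python
import math

def precision_at_k(relevance, k=10):
    """relevance: judgments (0 = not relevant) in ranked order."""
    return sum(1 for r in relevance[:k] if r > 0) / k

def dcg_at_k(relevance, k=10):
    # Rank i (0-based) is discounted by log2(i + 2).
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevance[:k]))

def ndcg_at_k(relevance, k=10):
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0

def average_precision_at_k(relevance, k=10):
    hits, total = 0, 0.0
    for i, r in enumerate(relevance[:k]):
        if r > 0:
            hits += 1
            total += hits / (i + 1)
    return total / hits if hits else 0.0

# Relevant documents at ranks 1 and 3.
ranking = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
p10 = precision_at_k(ranking)           # 0.2
ap10 = average_precision_at_k(ranking)  # (1/1 + 2/3) / 2
ndcg10 = ndcg_at_k(ranking)
```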

4. Experiments: Relation Detection Window
1. CURS: the current sentence
2. CURSP: the current and the preceding sentence
3. CURSPF: the current, preceding and following sentences
4. CURP: the current paragraph
5. CURD: the current web document
6. HTML: the text in the current HTML tag pair
Conclusion: HTML is the best.

4. Experiments: Combination Method
1. SUUM: the summation method, with α empirically set to 0.7.
2. MULT: the multiplication method.
3. AMPL: the amplification method.
AMPL outperforms the other two on all metrics.

4. Experiments: Different Methods
1. BM25: Okapi BM25 with the default setting.
2. BMB: the method in [3], which filters web pages in a Boolean manner.
3. BMR: our method, combining the word-level and relation-level scores.
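For reference, the word-level BM25 baseline can be sketched as below. This is not the authors' implementation: `bm25_scores` is a hypothetical helper, the parameter values k1 = 1.2 and b = 0.75 are common defaults assumed here because the slides only say "default setting", and the IDF uses the non-negative variant log(1 + (N - df + 0.5) / (df + 0.5)).

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document in docs (a non-empty list of token
    lists) against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    ["cannabis", "treats", "cancer"],
    ["aspirin", "treats", "cough"],
]
word_scores = bm25_scores(["cannabis", "cancer"], docs)
```

Only the first document contains the query terms, so it receives a positive score and the second scores zero.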

5. Conclusion
1. We propose a novel medical document ranking method that incorporates fine-grained ontological relations into relevance scoring.
2. We propose a way to evaluate the relation-level relevance of a query and a document.
3. We explore the influence of the combination model and the relation detection window.
4. We compare the results with related work, and our method performs better.

6. Future Work
1. The 18 relations are compiled by human experts; we hope to extend them to cover all possible relations.
2. Propose a better relation detection algorithm.
3. Apply the ontological relation method in the general domain.
4. Conduct more experiments comparing with other methods.

References
1. Song, F., Croft, W.B.: A general language model for information retrieval. In: Proc. of CIKM 1999, pp. 316-321. ACM, New York (1999)
2. Matsumura, A., Takasu, A., Adachi, J.: The effect of information retrieval method using dependency relationship between words. In: Proceedings of RIAO 2000, pp. 1043-1058 (2000)
3. Vintar, S., Buitelaar, P., Volk, M.: Semantic relations in concept-based cross-language medical information retrieval. In: Proceedings of ECML/PKDD workshop on Adaptive Text Extraction and Mining (ATEM) (2003)
4. Gao, J., Nie, J.Y., Wu, G., Cao, G.: Dependence language model for information retrieval. In: Proc. of SIGIR 2004, pp. 170-177. ACM, New York (2004)
5. Morton, T.: Using semantic relations to improve information retrieval. PhD thesis, University of Pennsylvania (2004)
6. Maisonnasse, L., Gaussier, E., Chevallet, J.P.: Revisiting the dependence language model for information retrieval. In: Proc. of SIGIR 2007, pp. 695-696. ACM, New York (2007)
7. Park, J.H., Croft, W.B., Smith, D.A.: A quasi-synchronous dependence model for information retrieval. In: Proc. of CIKM 2011, pp. 17-26. ACM, New York (2011)
8. Hou, Y., Zhao, X., Song, D., Li, W.: Mining pure high-order word associations via information geometry for information retrieval. ACM Trans. Inf. Syst. 31(3), 12:1-12:32 (2013)
9. Zhao, J., Huang, J.X., Ye, Z.: Modeling term associations for probabilistic information retrieval. ACM Trans. Inf. Syst. 32(2), 7:1-7:47 (2014)
10. Giger, H.P.: Concept based retrieval in classical IR systems. In: Proc. of SIGIR 1988, pp. 275-289. ACM, New York (1988)
11. Lu, X.: Document retrieval: A structural approach. Inf. Process. Manage. 26(2), 209-218 (1990)
12. Khoo, C.S.G., Myaeng, S.H., Oddy, R.N.: Using cause-effect relations in text to improve information retrieval precision. Inf. Process. Manage. 37(1), 119-145 (2001)
13. Li, Y., Wang, Y., Huang, X.: A relation-based search engine in semantic web. IEEE Trans. on Knowl. and Data Eng. 19(2), 273-282 (2007)
14. Lee, J., Min, J.K., Oh, A., Chung, C.W.: Effective ranking and search techniques for web resources considering semantic relationships. Inf. Process. Manage. 50(1), 132-155 (2014)
15. Bilotti, M.W., Elsas, J., Carbonell, J., Nyberg, E.: Rank learning for factoid question answering with linguistic and semantic constraints. In: Proc. of CIKM 2010, pp. 459-468. ACM, New York (2010)
16. Voorhees, E.M., Hersh, W.: Overview of the TREC 2012 medical records track. In: Proc. of TREC 2012 (2012)
17. Goeuriot, L., Jones, G.J.F., Kelly, L., Leveling, J., Hanbury, A., Müller, H., Salanterä, S., Suominen, H., Zuccon, G.: ShARe/CLEF eHealth evaluation lab 2013, task 3: Information retrieval to address patients' questions when reading clinical reports. In: CLEF Online Working Notes (2013)
18. Kilicoglu, H., Shin, D., Fiszman, M., Rosemblat, G., Rindflesch, T.C.: SemMedDB: a PubMed-scale repository of biomedical semantic predications. Bioinformatics 28(23), 3158-3160 (2012)
