Biomedical literature mining for knowledge discovery


1 Biomedical literature mining for knowledge discovery REZARTA ISLAMAJ DOĞAN National Center for Biotechnology Information National Library of Medicine

2 Outline: Biomedical Literature Access; Challenges in Literature Search; User interactions with PubMed; Author name disambiguation; Click-words identification; Disease name recognition; Relationship extraction

3 Biomedical Literature Access. Welcome to PubMed: PubMed comprises more than 20 million citations for biomedical articles from MEDLINE and life science journals. Citations may include links to full-text articles from PubMed Central or publisher web sites. 20 million citations; 5,200 journals. Diverse topics: microbiology, delivery of health care, nutrition, pharmacology, environmental health and more. Categories: anatomy, organisms, diseases, psychiatry, psychology, physical sciences and more.

4 [Figure: Word Weight (PubMed abstracts); Blast (nucleotide sequences, protein sequences)]

5

6 PubMed: the busiest NCBI database

7 Daily PubMed Usage. Average: PubMed queries, 2.2 million; abstract retrievals, 2.7 million; full-text retrievals, 900 thousand. Average/User: [values missing]. Islamaj Dogan R, et al. (2009) Understanding PubMed user search behavior through log analysis. Database. Vol. 2009: bap018.

8 [Figure: cumulative number of publications related to breast cancer in PubMed, by publication year]

9 Challenges in literature search: rapid growth of biomedical literature; breakdown of disciplinary boundaries. Garg, et al., Lost in publication: Half of all renal practice evidence is published in non-renal journals, Kidney International, 2006

10 Goal: improving access to literature. PubMed oriented projects: log analysis of user search needs and habits; author name disambiguation; concept recognition for linking related data. Methods development in text mining: natural language processing techniques; statistical and machine learning methods.

11 Outline: Biomedical Literature Access; Challenges in Literature Search; User interactions with PubMed; Author name disambiguation; Click-words identification; Disease name recognition; Relationship extraction

12 General user interactions with PubMed

13 PubMed log analysis. First large-scale investigation of its kind. Data: one month of data including 23M user sessions. Analyzed millions of queries and clicks. Identified search topics in 10,000 queries. Islamaj Dogan R, et al. (2009) Understanding PubMed user search behavior through log analysis. Database. Vol. 2009: bap018.

14 User behavior characteristics: time on the site; number of searches; number of clicks on PubMed citations; number of full text requests; number of words in a query; types of queries; position of clicks; number of returned results

15 The first ranked search result is the most clicked. [March 1, 2011] Islamaj Dogan R, et al. (2009) Understanding PubMed user search behavior through log analysis. Database. Vol. 2009: bap018.

16 The first ranked search result is the most clicked (true for every page). Islamaj Dogan R, et al. (2009) Understanding PubMed user search behavior through log analysis. Database. Vol. 2009: bap018.

17 User click trend Most (>80%) clicks happened in top 20 positions.

18 [Figure: PubMed search result size, share of queries by number of results returned]

19 Users are less likely to click if the number of search results is large. Islamaj Dogan R, et al. (2009) Understanding PubMed user search behavior through log analysis. Database. Vol. 2009: bap018.

20 Users are more likely to reformulate their search if the number of search results is large. Islamaj Dogan R, et al. (2009) Understanding PubMed user search behavior through log analysis. Database. Vol. 2009: bap018.

21 Highlights of PubMed users' search behavior. Results of query analysis: average number of queries issued by a user per day, 4.05; average number of words in a PubMed query, 3.54; median number of citations returned per query, 44. Results of click-through analysis: queries that do not retrieve any results, 15%; queries that were followed by another query, 47%; abstract views followed by full text of the same article, 29%; average number of abstract or full text articles requested (clicked) by a user per day, 3.57. Islamaj Dogan R, et al. (2009) Understanding PubMed user search behavior through log analysis. Database. Vol. 2009: bap018.
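As an illustration of how two of the statistics on this slide could be derived, here is a minimal Python sketch. The log format (user, action, payload) and the sample records are hypothetical, since the slides do not describe the actual PubMed log schema, and the definition of "followed by another query" used here (the user's next event is a query) is an assumption.

```python
from collections import defaultdict

# Hypothetical log records: (user_id, action, payload). Not the real PubMed log format.
LOG = [
    ("u1", "query", "brca1 breast cancer"),
    ("u1", "query", "brca1 breast cancer review"),
    ("u1", "abstract", "PMID-A"),
    ("u2", "query", "nfkb activation"),
    ("u2", "abstract", "PMID-B"),
    ("u2", "fulltext", "PMID-B"),
]

def summarize(log):
    """Return (average words per query, share of queries followed by another query)."""
    by_user = defaultdict(list)
    for user, action, payload in log:
        by_user[user].append((action, payload))
    n_queries = n_words = n_followed = 0
    for events in by_user.values():
        for i, (action, payload) in enumerate(events):
            if action != "query":
                continue
            n_queries += 1
            n_words += len(payload.split())
            # Assumption: "followed by another query" means the user's next event is a query.
            if i + 1 < len(events) and events[i + 1][0] == "query":
                n_followed += 1
    return n_words / n_queries, n_followed / n_queries

avg_words, followed = summarize(LOG)
```

On real logs the events would also need to be grouped into sessions and days before averaging per user.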

22 Query annotation results (10,000 queries). Islamaj Dogan R, et al. (2009) Understanding PubMed user search behavior through log analysis. Database. Vol. 2009: bap018.

23 Most common associations (category: example). Author name + Citation: Rezarta Islamaj Dogan, 2009. Disorder + Medical Procedure: breast cancer mammography. Disorder + Gene/Protein: brca1 breast cancer. Gene/Protein + Biological process: nfkb activation. Disorder + Chemical/Drug: cold aspirin.

24 Understanding User's Query. Crucial for a search engine to address the information need of a user. Example: AUTHOR NAME DISAMBIGUATION

25 36% of PubMed queries contain an author name

26 Author Identification Articles sharing the same author name Individual authors

27 [Figure: average number of articles per author, by number of articles associated with an author name] 3.3 million author names have multiple articles.

28 [Figure: Average Number of Articles per Author; number of authors by number of articles associated with an author name]

29 Identifying articles penned by the same author. We designed a machine learning process to learn the difference between same-author papers and different-author papers in PubMed. [Figure label: Author Name]

30 Feature Set (MEDLINE citation fields). For each pair of articles: title of the article; abstract of the article; co-authors; MeSH terms; first author affiliation; journal name; publication date; chemical compounds; grant information
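A minimal sketch of how a pair of citations could be turned into comparison features over some of the fields listed on this slide. The use of Jaccard overlap, the dictionary layout, and the sample citations are assumptions for illustration; the slides do not say how each field was compared.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def pair_features(x, y):
    """Compare two citation records field by field (hypothetical record layout)."""
    return {
        "title": jaccard(x["title"].lower().split(), y["title"].lower().split()),
        "coauthors": jaccard(x["coauthors"], y["coauthors"]),
        "mesh": jaccard(x["mesh"], y["mesh"]),
        "same_journal": 1.0 if x["journal"] == y["journal"] else 0.0,
        "year_gap": abs(x["year"] - y["year"]),
    }

# Made-up citations sharing the same author name.
ART_A = {"title": "disease name recognition in pubmed", "coauthors": ["Coauthor X"],
         "mesh": ["Data Mining", "PubMed"], "journal": "J1", "year": 2009}
ART_B = {"title": "author name disambiguation in pubmed", "coauthors": ["Coauthor X", "Coauthor Y"],
         "mesh": ["PubMed"], "journal": "J2", "year": 2011}

FEATS = pair_features(ART_A, ART_B)
```

A classifier trained on such vectors, labeled same-author or different-author, would then supply the pair-wise scores used on the following slides.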

31 Feature Analysis [Figure: classifier feature weights]

32 Author Identification: compute pair-wise score [Figure: articles A, B, C, D, E]

33 Author Identification: compute pair-wise probability [Figure: articles A, B, C, D, E]

34 Hierarchical clustering: {A} {B} {C} {D} {E}, then {A} {B,C} {D} {E}, then {A,B,C} {D} {E}
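The merge sequence on this slide can be reproduced with a small agglomerative clustering sketch. Average linkage, the 0.5 stopping threshold, and the pair-wise probabilities below are all assumptions chosen for illustration; the slides do not state the linkage or the stopping rule that was actually used.

```python
from itertools import combinations

def cluster(items, prob, threshold=0.5):
    """Greedy agglomerative clustering on pair-wise same-author probabilities."""
    clusters = [frozenset([i]) for i in items]

    def link(c1, c2):
        # Average linkage: mean probability over all cross-cluster pairs.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(prob[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: link(*p))
        if link(c1, c2) < threshold:
            break  # no remaining pair of clusters is similar enough
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

# Made-up probabilities: B and C are most alike, A is close to both, D and E stand apart.
PROB = {frozenset(p): 0.1 for p in combinations("ABCDE", 2)}
PROB.update({frozenset("BC"): 0.9, frozenset("AB"): 0.8, frozenset("AC"): 0.7})

CLUSTERS = cluster("ABCDE", PROB)
```

With these values B and C merge first, A joins them next, and D and E remain singletons, matching the three steps shown on the slide.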

35 [Figure: average number of articles per author by number of articles associated with an author name, original namespace vs. estimated individual authors space]

36 [Figure: number of authors by number of articles associated with an author name, original namespace vs. estimated individual authors space]

37 Understanding User's Query. Crucial for a search engine to address the information need of a user. Example: CLICK-WORDS IDENTIFICATION

38 56% of the queries in PubMed contain only content words

39 [Figure: query words] Islamaj Dogan R and Lu Z. (2010) Click-words: learning to predict document keywords from a user perspective. Bioinformatics 2010

40 Users generate/build signals for the papers they access. [Figure: query words] Islamaj Dogan R and Lu Z. (2010) Click-words: learning to predict document keywords from a user perspective. Bioinformatics 2010

41 Author keywords: Arp2/3 complex; cofilin; FRAP; lamellipodium; migration. MeSH terms: Actin Capping; Actin-Related Protein 2-3 Complex/metabolism*; Actins/metabolism*; Adaptor Proteins, Signal; Cell Line, Tumor; Cortactin/metabolism; Mice. PMID: arp cortactin actin cofilin. Click words: actin, arp2, cofilin, cortactin. Top five TF-IDF words: arp2, actin, cofilin, lamellipodium, complex.

42 TF-IDF words. TF-IDF is a statistical measure often used in information retrieval to identify the most relevant terms of a document compared to the whole collection. We can use the TF-IDF weight to select the most important words for an article.
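A minimal sketch of selecting the top-weighted words of one document. It uses one common TF-IDF variant (raw term count times log(N / document frequency)); the exact weighting used in the talk is not stated, and the tiny tokenized collection is made up.

```python
import math
from collections import Counter

def tfidf_top_words(docs, index, k=5):
    """Return the k highest-weighted words of docs[index] under tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for d in docs for w in set(d))
    tf = Counter(docs[index])
    weights = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]

# Made-up tokenized abstracts.
DOCS = [
    "actin cofilin actin migration cell".split(),
    "cell cancer migration".split(),
    "cell gene expression".split(),
]

TOP = tfidf_top_words(DOCS, 0, k=2)
```

Words that occur in every document, such as "cell" here, get weight zero and drop to the bottom of the ranking.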

43 Data Set (training data set / evaluation dataset). PubMed log data: two months / six months. User queries: 100 million / 333 million. Abstract clicks: 130 million / 329 million. User sessions: 51 million / 144 million. PubMed articles: 47,609 / 11,237. Total click-words: 101,377 / 22,663. Top five TF-IDF words: 237,155 / 62,310. Click-words/article: [values missing]. Highly accessed article: received on average one click per user per day.

44 Data Set (training data set / evaluation dataset). PubMed log data: two months / six months. User queries: 100 million / 333 million. Abstract clicks: 130 million / 329 million. User sessions: 51 million / 144 million. PubMed articles: 47,609 / 11,237. Total click-words: 101,377 / 22,663. Top five TF-IDF words: 237,155 / 62,310. Click-words/article: [values missing]. Random baseline: break-even precision-recall point 0.429.

45 Learning Method. We built a machine learning model to identify users' click-words from the top weighted TF-IDF words. Wide margin classifier with Huber loss function; 5-fold cross validation. Evaluation: break-even precision recall; ranking analysis for each article.
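The break-even evaluation named on this slide can be sketched as follows. This uses one common way to compute the break-even point: rank candidates by classifier score and cut the ranking at the number of true positives, where precision and recall coincide. The scores and labels are made up, and the talk may have computed the measure differently.

```python
def break_even_point(scores, labels):
    """Precision (= recall) at the cutoff equal to the number of positive labels."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank candidates from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits = sum(label for _, label in ranked[:n_pos])
    return hits / n_pos

# Made-up scores for five candidate words; label 1 marks a true click-word.
SCORES = [0.9, 0.8, 0.7, 0.4, 0.2]
LABELS = [1, 0, 1, 1, 0]

BEP = break_even_point(SCORES, LABELS)
```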

46 Word itself is not enough. We could build a learning model using only the words, but this would not be sufficient: new words in new articles would be a problem; the set of words that appears as a click for one article but not for another would be confusing; we would not be able to rank articles based solely on this feature. We need context.

47 Click-word features: word and its neighbors; part of speech tag (MEDPOST); MetaMap semantic type; location in the abstract; part of phrase; abbreviation; TF-IDF rank
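A minimal sketch of extracting a few of these features for one candidate word. The three-word window on each side follows the L3 to R3 layout of the neighbor-words slide; the feature names, the padding symbols, and the example sentence are hypothetical. Part-of-speech tags and MetaMap semantic types are omitted because they require external tools.

```python
def word_features(tokens, i, tfidf_rank, window=3):
    """Context features for tokens[i]: neighbors, relative position, TF-IDF rank."""
    feats = {"word": tokens[i].lower()}
    for k in range(1, window + 1):
        # Pad with boundary symbols when the window runs past the text.
        feats[f"L{k}"] = tokens[i - k].lower() if i - k >= 0 else "<s>"
        feats[f"R{k}"] = tokens[i + k].lower() if i + k < len(tokens) else "</s>"
    feats["position"] = i / len(tokens)  # relative location in the abstract
    feats["tfidf_rank"] = tfidf_rank
    return feats

# Made-up title fragment; the candidate word is "actin" at index 4.
TOKENS = "Role of cortactin in actin assembly".split()
FEATS = word_features(TOKENS, 4, tfidf_rank=1)
```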

48 [Table: Mean Average Precision, Break-Even Precision Recall, ROC and Prec@1 for random selection, TF-IDF weight and the click-word model, on the training dataset (5-fold cross validation) and on the evaluation dataset (top 5 TF-IDF words)]

49 Click word: a content word in a query that results in a click for the article. Automatic identification is important for effective document retrieval. User choice: the word that most users prefer to access a particular article. Increases the chances that an article receives better visibility.

50 Click-word characteristics: have high TF-IDF weight; occur several times; appear in the title; are nouns and verbs; are part of a phrase; are meaningful concepts; are abbreviated terms

51 Neighbor words (positions L3, L2, L1, WORD, R1, R2, R3): Management, Background, Treatment, Diagnosis, Patients, Chronic, Background, Role, Management, Stem, Breast, Acute, Human, Factor, Virus, Syndrome, Cancer, Receptor, Family, Infection, Related, Signaling, Deficient, Regulates, Cells, Virus, Syndrome, Cancer, Receptor, Family, Infection, Review, Clinical, Nf, Micrornas, Including, Agents

52 Outline: Biomedical Literature Access; Challenges in Literature Search; User interactions with PubMed; Author name disambiguation; Click-words identification; Disease name recognition; Relationship extraction

53 Understanding User's Query. Crucial for a search engine to address the information need of a user. Example: DISEASE NAME RECOGNITION

54 20% of the queries in PubMed contain a disease name

55

56 Text retrieval in clinical data: a very challenging problem. Training data: 349 patient records. Testing data: 477 patient records. Data comes from the i2b2 challenge, annotated for concept and relationship, from three different hospitals.

57 Extracting medical concepts from patient records. Medical concepts: Problem, Treatment, Test. Example sentences: On admission, the patient was found to have a mild fever, myalgias, and arthralgias that were relieved by Tylenol. Infectious Disease was consulted and recommended doxycycline to cover both organisms. Pending labs included wound, bacterial, and fungal cultures and serologies for Bartonella, Francisella, Yersinia, EBV,

58 Example: There was some concern that the patient may have a partial biliary obstruction and the patient was sent for a magnetic resonance cholangiopancreatography to further evaluate the biliary system. Relationship: TEST conducted for PROBLEM.

59

60 Concept recognition [Table: precision, recall and F-measure under exact span and inexact span evaluation for Problem, Treatment, Test and Overall, compared with the best i2b2 system]
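The difference between exact and inexact span evaluation can be sketched as below. The definition used here is an assumption: an exact match requires identical boundaries and type, while an inexact match requires any token overlap with the same type. The official challenge scorer may differ in details, and the spans are made up.

```python
def prf(gold, pred, exact=True):
    """Precision, recall, F-measure over (start, end, type) spans, end exclusive."""
    def match(g, p):
        if g[2] != p[2]:
            return False  # concept types must agree
        if exact:
            return g[:2] == p[:2]
        return g[0] < p[1] and p[0] < g[1]  # spans overlap

    tp_pred = sum(any(match(g, p) for g in gold) for p in pred)
    tp_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Made-up annotations: the second prediction overlaps the gold treatment span
# without matching its boundaries; the third prediction is spurious.
GOLD = [(0, 2, "problem"), (5, 6, "treatment")]
PRED = [(0, 2, "problem"), (4, 6, "treatment"), (8, 9, "test")]

EXACT = prf(GOLD, PRED, exact=True)
INEXACT = prf(GOLD, PRED, exact=False)
```

Inexact evaluation gives credit for the partially overlapping treatment span, so it scores higher than exact evaluation on the same predictions.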

61 Relationship identification model. Representation: five, not necessarily consecutive, context-blocks, with separate bag-of-words models. Context-blocks SVM features: assertion, UMLS concept identifiers, UMLS semantic types. Baseline: naïve bag-of-words SVM model using the same features.
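A minimal sketch of a context-block representation for one candidate concept pair. The slide does not say which five blocks were used; the split assumed here (text before the first concept, the first concept, text between, the second concept, text after) is a guess for illustration, and the sentence and spans are made up. Prefixing each word with its block name keeps a separate bag of words per block.

```python
from collections import Counter

def context_block_features(tokens, span1, span2):
    """Bag-of-words counts kept separately for five assumed context blocks."""
    (s1, e1), (s2, e2) = sorted([span1, span2])
    blocks = {
        "before": tokens[:s1],
        "concept1": tokens[s1:e1],
        "between": tokens[e1:s2],
        "concept2": tokens[s2:e2],
        "after": tokens[e2:],
    }
    # The block-name prefix keeps the same word in different blocks distinct.
    return Counter(
        f"{name}={tok.lower()}" for name, toks in blocks.items() for tok in toks
    )

# Made-up sentence with a test concept at (0, 1) and a problem concept at (5, 7).
TOKENS = "MRI was done to evaluate biliary obstruction".split()
FEATS = context_block_features(TOKENS, (0, 1), (5, 7))
```

Feature vectors of this kind, optionally extended with assertion, concept identifier and semantic type features, could then be passed to an SVM.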

62 Relationship identification [Table: results per relationship (Relates, Conducted, Reveals, Given, Not Given, Improves, Causes, Worsens) for the naïve bag-of-words SVM model, the context-blocks SVM model, and context-blocks + feature selection. Features in the best model (Words +), in the order listed: CUI; Assertion; -; CUI, Assertion, SemType; CUI; SemType; CUI, Assertion, SemType; Assertion]

63 End-to-end model: results of relationship identification [Table: annotated concepts vs. predicted concepts, prior to and after feature selection, for Improves, Worsens, Causes, Given, Not Given, Reveals, Conducted, Relates and the overall weighted average]

64 Text retrieval in clinical data Why was the patient admitted to the hospital? What tests were done? Which ones were effective? Which medication worsened his condition? What problems were created as a result? What happens next?

65 Text retrieval in clinical data Who will follow the patients? Which medication is to be taken regularly? What dosage/duration? How can we use the knowledge in medical summaries to help doctors and nurses when they evaluate/treat new patients?

66 Question Answering Knowledge Discovery INFORMATION EXTRACTION SYNTHESIS

67 Summary. PubMed log analysis is currently used to improve retrieval quality and direct future development of the site. Click-words summarize the wisdom of the crowds: they are specific to every article and can be predicted using the learned characteristics. A disease sensor will be useful for linking related disease resources in PubMed.

68 Research interests: knowledge discovery on biomedical literature and other text resources; turning knowledge into health care (text mining on systematic reviews); identifying weak links in genomic sequences that affect function

69 Thank you

70 ACKNOWLEDGEMENTS. This research was supported by the Intramural Research Program of the NIH, National Library of Medicine. W. John Wilbur, Zhiyong Lu, Lana Yeganova, Won Kim, Wanli Liu, Don Comeau, Natalie Xie, Larry Smith, Aurelie Neveol, Craig Murray, Sun Kim, Vahan Grigoryan


More information

EVIDENCE SEARCHING IN EBM. By: Masoud Mohammadi

EVIDENCE SEARCHING IN EBM. By: Masoud Mohammadi EVIDENCE SEARCHING IN EBM By: Masoud Mohammadi Steps in EBM Auditing the outcome Defining the question or problem Applying the results Searching for the evidence Critically appraising the literature Clinical

More information

FIGURE 1. The updated PubMed format displays the Features bar as file tabs. A default Review limit is applied to all searches of PubMed. Select Englis

FIGURE 1. The updated PubMed format displays the Features bar as file tabs. A default Review limit is applied to all searches of PubMed. Select Englis CONCISE NEW TOOLS AND REVIEW FEATURES OF FOR PUBMED CLINICIANS Clinicians Guide to New Tools and Features of PubMed DENISE M. DUPRAS, MD, PHD, AND JON O. EBBERT, MD, MSC Practicing clinicians need to have

More information

A TEXT MINER ANALYSIS TO COMPARE INTERNET AND MEDLINE INFORMATION ABOUT ALLERGY MEDICATIONS Chakib Battioui, University of Louisville, Louisville, KY

A TEXT MINER ANALYSIS TO COMPARE INTERNET AND MEDLINE INFORMATION ABOUT ALLERGY MEDICATIONS Chakib Battioui, University of Louisville, Louisville, KY Paper # DM08 A TEXT MINER ANALYSIS TO COMPARE INTERNET AND MEDLINE INFORMATION ABOUT ALLERGY MEDICATIONS Chakib Battioui, University of Louisville, Louisville, KY ABSTRACT Recently, the internet has become

More information

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,

More information

Analyzer of Bio-resource Citations. World Data Center of Microorganisms(WDCM)

Analyzer of Bio-resource Citations. World Data Center of Microorganisms(WDCM) Analyzer of Bio-resource Citations World Data Center of Microorganisms(WDCM) http://abc.wdcm.org/ Outlines Introduction of ABC Homepage and function of ABC Text mining for microorganism : classification,

More information

Questions? Find citations on the therapy of earache with antibiotics written in English and published since 2000.

Questions? Find citations on the therapy of earache with antibiotics written in English and published since 2000. Questions? Find an article studying on the clinical application of benazepri published in NEJM, and written by Prof. Hou Fanfan who works in Nanfang Hospital. 1 Questions? Find citations on the therapy

More information
