A Technical Introduction to the Semantic Search Engine SeMedico


Slide 1: A Technical Introduction to the Semantic Search Engine SeMedico
Talk in the semester project "Development of a Search Engine for Alternative Methods to Animal Testing" (Entwicklung einer Suchmaschine für Alternativmethoden zu Tierversuchen), Humboldt-Universität zu Berlin, January 12, 2018
Erik Fäßler
Jena University Language & Information Engineering (JULIE) Lab, Friedrich Schiller University Jena, Jena, Germany
Erik Fäßler | Technical Introduction to Semedico

Slide 2: SeMedico Front Page [screenshot]

Slide 3: SeMedico Auto Completion [screenshot]

Slide 4: SeMedico Result View I [screenshot]

Slide 5: SeMedico Result View II [screenshot]

Slide 6: SeMedico System Overview
[Architecture diagram. Components: MEDLINE documents; PostgreSQL on the JULIE Lab server; a UIMA pipeline consisting of a collection reader (CR), several analysis engines (AE) and a CAS consumer (CO); NCBI Gene; the concept database; ElasticSearch; the SeMedico web application (Java servlet) with a Tapestry/JavaScript frontend.]

Slide 7: MEDLINE Document Storage I
- MEDLINE is distributed as gzipped XML files.
- Each file contains about 30K documents:

    <PubmedArticleSet>
      <PubmedArticle>
        <MedlineCitation>
          <PMID>...</PMID>
          <Article>
            <Journal>...</Journal>
            <ArticleTitle>...</ArticleTitle>
            <Abstract>...</Abstract>
            <AuthorList>...</AuthorList>
          </Article>
          <MeshHeadingList>...</MeshHeadingList>
        </MedlineCitation>
      </PubmedArticle>
      <PubmedArticle>
        <MedlineCitation>
          <PMID>...</PMID>
          ...
        </MedlineCitation>
      </PubmedArticle>
    </PubmedArticleSet>
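Reading such a file can be sketched with Python's standard library. This is a minimal sketch, not the JCoRe importer: the function name `iter_citations` is hypothetical. It assumes the file layout above, streams each gzipped file with `iterparse` instead of loading it as a whole, and yields the PMID together with the serialized `<MedlineCitation>` element.

```python
import gzip
import xml.etree.ElementTree as ET


def iter_citations(source):
    """Yield (pmid, citation_xml) for every MedlineCitation in a gzipped MEDLINE file.

    Hypothetical helper. `source` may be a file path or a binary file object,
    since gzip.open accepts both.
    """
    with gzip.open(source, "rb") as stream:
        # iterparse streams the document and reports each element when it is complete
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == "MedlineCitation":
                pmid = elem.findtext("PMID")
                elem.tail = None  # serialize only the element, not the whitespace after it
                yield pmid, ET.tostring(elem, encoding="unicode")
            elif elem.tag == "PubmedArticle":
                elem.clear()  # free the subtree once the whole article has been handled
```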

Slide 8: MEDLINE Document Storage II
- MEDLINE citations are imported into a database table.
- Size of MEDLINE: 27M abstracts.

    pmid | xml
    ...  | <MedlineCitation><PMID>...</PMID>...</MedlineCitation>
    ...  | <MedlineCitation><PMID>...</PMID>...</MedlineCitation>
    ...  | <MedlineCitation><PMID>...</PMID>...</MedlineCitation>
    ...  | <MedlineCitation><PMID>...</PMID>...</MedlineCitation>
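A minimal sketch of the import step, assuming a two-column table as on this slide. It uses SQLite from Python's standard library as a stand-in for PostgreSQL, and the table name `medline_citations` is hypothetical. Keying the table on the PMID means that a citation delivered again (for example, a revised version in a later update file) replaces the stored one; in PostgreSQL the same effect would come from `INSERT ... ON CONFLICT (pmid) DO UPDATE`.

```python
import sqlite3


def create_table(conn):
    # One row per citation: the PMID as key, the raw <MedlineCitation> XML as value
    conn.execute(
        "CREATE TABLE IF NOT EXISTS medline_citations ("
        "pmid TEXT PRIMARY KEY, xml TEXT NOT NULL)"
    )


def import_citations(conn, citations):
    """Insert (pmid, xml) pairs; a row with an existing PMID is replaced."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO medline_citations (pmid, xml) VALUES (?, ?)",
            citations,
        )
```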

Slide 9: From the Database into the Pipeline I
[The pmid/xml database table, shown next to the system overview diagram.]

Slide 10: From the Database into the Pipeline II
PostgreSQL (JULIE Lab server) → UIMA Medline DB Reader → CAS (Common Analysis System) → text analysis components
The UIMA Medline DB Reader handles
- concurrent database access,
- parsing of the XML,
- populating a UIMA CAS instance with title/abstract, authors, journal information, etc.
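The parsing and CAS-population part of the reader can be sketched as follows. The `CitationCas` dataclass is a hypothetical, much simplified stand-in for a UIMA CAS with a MEDLINE type system; the real reader populates JCoRe types and additionally handles concurrent database access, which this sketch omits. The element paths (`Article/ArticleTitle`, `Abstract/AbstractText`, `Journal/Title`, `AuthorList/Author`) follow the MEDLINE XML format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class CitationCas:
    """Hypothetical stand-in for a UIMA CAS; not the JCoRe type system."""
    pmid: str
    title: str = ""
    abstract: str = ""
    journal: str = ""
    authors: list = field(default_factory=list)


def _text(elem):
    # itertext() also collects text inside inline markup such as <i> or <sup>
    return "".join(elem.itertext()) if elem is not None else ""


def populate_cas(citation_xml):
    """Parse one <MedlineCitation> (e.g. the xml column of a table row)."""
    citation = ET.fromstring(citation_xml)
    cas = CitationCas(pmid=citation.findtext("PMID", default=""))
    article = citation.find("Article")
    if article is not None:
        cas.title = _text(article.find("ArticleTitle"))
        # Structured abstracts consist of several AbstractText sections
        cas.abstract = " ".join(_text(t) for t in article.iterfind("Abstract/AbstractText"))
        cas.journal = article.findtext("Journal/Title", default="")
        for author in article.iterfind("AuthorList/Author"):
            fore = author.findtext("ForeName", default="")
            last = author.findtext("LastName", default="")
            cas.authors.append(f"{fore} {last}".strip())
    return cas
```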

Slide 11: SeMedico System Overview
[System overview diagram, repeated from slide 6.]

Slide 12: SeMedico UIMA JCoRe Pipeline I
(from the reader)
- Sentences
- Tokens
- Abbreviations
- Parts of speech
- Semantic layer:
  - Species (LINNAEUS)
  - Genes/proteins: recognition and normalization to NCBI Gene (GeNo)
  - Molecular event extraction (BioSem)
  - Event certainty assessment on a scale from 1 (negation) to 6 (no doubt)
  - MeSH terms (dictionary)
  - Ontology classes (GO, GRO; dictionary)
(to the consumer)
Hahn & Matthies et al., LREC 2016
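The UIMA principle behind this pipeline, where each component reads the annotations of its predecessors from a shared CAS and adds its own, can be illustrated with a toy sketch. Everything here is hypothetical: `Cas` and `Annotation` are simplified stand-ins for UIMA classes, the regex-based sentence splitter and tokenizer replace the JCoRe components, and the two dictionary taggers stand in for LINNAEUS and GeNo, which are considerably more sophisticated. The IDs are the examples used in this talk.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type: str
    begin: int
    end: int
    value: str = ""


@dataclass
class Cas:
    """Hypothetical, much simplified stand-in for a UIMA CAS: text plus typed annotations."""
    text: str
    annotations: list = field(default_factory=list)

    def select(self, type_name):
        return [a for a in self.annotations if a.type == type_name]

    def covered_text(self, annotation):
        return self.text[annotation.begin:annotation.end]


def sentence_splitter(cas):
    # Naive: a sentence runs up to the next '.', '!' or '?' (breaks on "e.g." etc.)
    for m in re.finditer(r"\S[^.!?]*[.!?]?", cas.text):
        cas.annotations.append(Annotation("Sentence", m.start(), m.end()))


def tokenizer(cas):
    # Works inside sentences, i.e. depends on the previous component's annotations
    for sentence in cas.select("Sentence"):
        for m in re.finditer(r"\w+(?:-\w+)*|[^\w\s]", cas.covered_text(sentence)):
            cas.annotations.append(
                Annotation("Token", sentence.begin + m.start(), sentence.begin + m.end()))


def dictionary_tagger(type_name, dictionary):
    """Build a component that tags tokens found in a lower-cased dictionary."""
    def tag(cas):
        for token in cas.select("Token"):
            concept_id = dictionary.get(cas.covered_text(token).lower())
            if concept_id is not None:
                cas.annotations.append(Annotation(type_name, token.begin, token.end, concept_id))
    return tag


def run_pipeline(cas, components):
    for component in components:  # every component reads and extends the same CAS
        component(cas)
    return cas
```

A typical run chains the components in the order of the slide: `run_pipeline(Cas(text), [sentence_splitter, tokenizer, dictionary_tagger("Gene", {"mtor": "ncbigene:2475"})])`.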

Slide 13: SeMedico UIMA JCoRe Pipeline II
(from the analysis pipeline)
ElasticSearch CAS consumer:
- transforms the CAS into a preanalyzed JSON document,
- the transformation is configurable via an API,
- requires the JULIE Lab ES plugin.
[Diagram: CAS (title, abstract, species, genes, events) → transformation API → preanalyzed JSON {"title": {...}, "abstract": {...}, "authors": {...}, ...} → HTTP → ElasticSearch]
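The HTTP step of the consumer could, for example, use ElasticSearch's `_bulk` endpoint, which takes newline-delimited JSON: one action line followed by one document line per document. Whether the JULIE Lab consumer uses this endpoint is an assumption; the index name `semedico` and the document fields are hypothetical, and older ElasticSearch versions additionally expect a `_type` in the action line.

```python
import json
import urllib.request


def bulk_body(index, documents):
    """Build a _bulk request body from (doc_id, source_dict) pairs."""
    lines = []
    for doc_id, source in documents:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def send_bulk(es_url, body):
    """POST the body to <es_url>/_bulk, e.g. es_url = "http://localhost:9200" (assumed)."""
    request = urllib.request.Request(
        es_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```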

Slide 14: Full Texts from PubMed Central
- SeMedico integrates the open access subset of PMC.
- It uses a dedicated JCoRe reader: jcore-pmc-reader.
- The rest of the analysis is basically the same.
- But see: Matthies, Franz, & Hahn, Udo (2017). Scholarly information extraction is going to make a quantum leap with PubMed Central (PMC): But moving from abstracts to full texts seems harder than expected. In: MedInfo 2017: Precision Healthcare through Informatics. Proceedings of the 16th World Congress on Medical and Health Informatics. Hangzhou, China, August 2017.

Slide 15: SeMedico System Overview
[System overview diagram, repeated from slide 6.]

Slide 16: Concept Database I

    Name                            | Description                                          | Number of concepts
    Medical Subject Headings (MeSH) | Biomedical vocabulary, multihierarchy                | 26K
    MeSH Supplementary Concepts     | Chemicals, proteins etc. connected to MeSH           | 150K
    NCBI Gene                       | Gene database                                        | 650K (in SeMedico)
    NCBI Taxonomy                   | Taxonomic classification of species                  | 1.1M
    Gene Ontology (GO)              | Ontology about gene products and related processes | 50K
    Gene Regulation Ontology (GRO)  | Ontology about gene regulation processes             | 507

Slide 17: Concept Database II
- Concepts are arranged taxonomically, e.g. Squamous Cell Carcinoma IS-A Carcinoma.
- Neo4j is a graph database:
  - terminologies and arbitrary relations between concepts can be modeled explicitly;
  - its query language (Cypher) is well suited to tasks such as retrieving the descendants of a concept or computing the shortest path between two concepts.
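The two example operations can be illustrated on a small in-memory IS-A graph using breadth-first search. This is a toy sketch, not SeMedico's Neo4j code: the concept names and edges are hypothetical and do not reproduce the MeSH tree. In Neo4j the same questions would be answered by Cypher queries, e.g. with a variable-length relationship pattern for the descendants and the `shortestPath` function for the path.

```python
from collections import defaultdict, deque

# Toy IS-A edges as (child, parent); hypothetical, not the actual MeSH tree
IS_A = [
    ("Carcinoma", "Neoplasms"),
    ("Squamous Cell Carcinoma", "Carcinoma"),
    ("Adenocarcinoma", "Carcinoma"),
    ("Sarcoma", "Neoplasms"),
]

children = defaultdict(set)    # parent -> direct children (IS-A edges followed downwards)
neighbours = defaultdict(set)  # undirected view of the graph, used for path finding
for child, parent in IS_A:
    children[parent].add(child)
    neighbours[child].add(parent)
    neighbours[parent].add(child)


def descendants(concept):
    """All concepts below `concept` in the IS-A hierarchy (breadth-first)."""
    found, queue = set(), deque([concept])
    while queue:
        for child in children[queue.popleft()]:
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def shortest_path(start, goal):
    """Shortest path between two concepts, ignoring edge direction; None if unconnected."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in neighbours[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None
```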

Slide 18: Neo4j Example Graph
[Screenshot: Neo4j graph around the concept Tauopathies, with the labels type 1, type 2, type 3 and type 4.]

Slide 19: Neo4j Concept Node Properties [screenshot]

Slide 20: Zooming Out [screenshot]

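Slide 21: Concept IDs
- Every concept in the concept database has an internal ID (e.g. tid2341, tid42, tid914).
- Indexing: the CAS of an abstract contains, for example, a species annotation (human, ncbitax:9606) and a gene annotation (mtor, ncbigene:2475). The transformation API maps these source IDs to concept IDs and adds them to the index terms, so the JSON sent to ElasticSearch contains abstract: [human, tid914, mtor, tid42].
- Searching: the SeMedico web application (Java servlet) maps the query to concept IDs (e.g. match: tid914) and shows facets with concept information from the database, e.g. tid42: {name: mtor, synonym: FRAP, description: ...}.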
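The ID mapping on this slide can be sketched in a few lines. The lookup tables are hypothetical and contain only the example IDs from the slide; a real system would fetch them from the concept database. Index terms interleave each word with the concept ID of an annotation starting at that word, which reproduces the slide's [human, tid914, mtor, tid42], and query words are normalized with the same IDs, so a query for the synonym FRAP matches documents that mention mtor.

```python
import re

# Hypothetical lookup tables; the IDs are the examples shown on the slide
SOURCE_TO_TID = {"ncbitax:9606": "tid914", "ncbigene:2475": "tid42"}
NAME_TO_TID = {"human": "tid914", "mtor": "tid42", "frap": "tid42"}


def index_terms(text, annotations):
    """Each lower-cased word, followed by the concept ID of an annotation starting there.

    annotations: iterable of (begin, end, source_id) character spans.
    """
    tid_at = {begin: SOURCE_TO_TID[source_id]
              for begin, _end, source_id in annotations
              if source_id in SOURCE_TO_TID}
    terms = []
    for match in re.finditer(r"\w+", text):
        terms.append(match.group().lower())
        if match.start() in tid_at:
            terms.append(tid_at[match.start()])
    return terms


def normalize_query(query):
    """Replace every query word that names a known concept by its concept ID."""
    return [NAME_TO_TID.get(word, word) for word in query.lower().split()]
```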

Slide 22: ElasticSearch I
- Manages the Lucene index
- Seamless index updates, no downtime
- Easy-to-use index distribution model
- Full-text search
- Faceting
- Highlighting
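Faceting and highlighting are both requested in the search body. A minimal sketch, assuming a keyword field named `facet_genes` that holds concept IDs (the field name and function name are hypothetical; `abstracttext` is taken from the query examples in this talk): the `terms` aggregation returns per-value document counts, which can serve as facet counts, and `highlight` returns text fragments with the matching terms marked.

```python
import json


def faceted_search_body(query_text, facet_field, highlight_fields, facet_size=10):
    """Search request body with a terms aggregation (facet counts) and highlighting."""
    return {
        "query": {"match": {"abstracttext": query_text}},
        # Counts the most frequent values of facet_field among all matching documents
        "aggs": {"facet": {"terms": {"field": facet_field, "size": facet_size}}},
        # Asks for fragments of the listed fields with the query terms marked
        "highlight": {"fields": {name: {} for name in highlight_fields}},
    }
```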

Slide 23: ElasticSearch II
- Lucene generates index terms via text analysis: tokenization, case folding, synonym enrichment, stemming.
- ElasticSearch applies the same analysis to the document text it receives.
- How to integrate UIMA?
- First idea: create a Lucene UIMA analyzer. But this
  - moves (a lot of!) processing load into the ElasticSearch cluster,
  - requires loading dictionaries and machine learning models,
  - takes up memory that is then lost to Lucene and ElasticSearch.
- Overall: it diminishes search performance.
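The four analysis steps listed here can be illustrated with a toy analyzer. This is not Lucene's implementation: the suffix stripper is a deliberately crude stand-in for a real stemmer such as Porter's, and the synonym table is hypothetical. Each token carries its character offsets and a position increment; a synonym gets increment 0 so that it shares the position of the word it was derived from.

```python
import re

SYNONYMS = {"mtor": ["frap"]}             # hypothetical synonym table
SUFFIXES = ("ing", "ed", "es", "s", "e")  # toy suffix stripping, NOT the Porter stemmer


def stem(term):
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def analyze(text):
    """Return (term, start, end, position_increment) tuples."""
    tokens = []
    for match in re.finditer(r"\w+", text):            # tokenization
        folded = match.group().lower()                  # case folding
        tokens.append((stem(folded), match.start(), match.end(), 1))  # stemming
        for synonym in SYNONYMS.get(folded, []):        # synonym enrichment
            # increment 0: the synonym occupies the same position as the original word
            tokens.append((synonym, match.start(), match.end(), 0))
    return tokens
```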

Slide 24: ElasticSearch III
- A JULIE Lab ElasticSearch plugin makes it possible to specify the index terms exactly, bypassing ES-internal analysis.
- It employs the JSON format created for Solr's JsonPreAnalyzedParser.
- The JSON is created by a (currently JULIE Lab internal) CAS consumer.

Slide 25: ElasticSearch IV: Preanalyzed Format

    {"v": "1",
     "str": "immunohistochemistry performed to evaluate the expression of phosphorylated mtor (p-mtor), phosphorylated p70s6k (p-p70s6k), phosphorylated 4E-binding protein 1 (p-4e-bp1), and Ki-67 using 105 surgically resected ESCC correlated with treatment outcome.",
     "tokens": [
       {"t": "immunohistochemistry", "s": 0, "e": 20, "i": 1},
       {"t": "tid94702", "s": 0, "e": 20, "i": 0},
       {"t": "perform", "s": 21, "e": 30, "i": 1},
       {"t": "evaluat", "s": 34, "e": 42, "i": 1},
       {"t": "event", "s": 34, "e": 42, "i": 0},
       ...
     ]}

v: format version; str: the stored text; t: index term; s/e: start/end character offsets; i: position increment (0 places the term at the same position as the previous one).
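Producing this format from tokens that already carry offsets and position increments is straightforward. A minimal sketch with a hypothetical function name, covering only the keys used on this slide; the validity checks on offsets and increments are an addition of this sketch.

```python
import json


def preanalyzed_json(text, tokens):
    """Serialize tokens into the JSON format of Solr's JsonPreAnalyzedParser.

    tokens: list of (term, start, end, position_increment); offsets index into text.
    """
    entries = []
    for term, start, end, increment in tokens:
        if not 0 <= start <= end <= len(text):
            raise ValueError(f"offsets {start}-{end} of {term!r} lie outside the text")
        if increment < 0:
            raise ValueError(f"negative position increment for {term!r}")
        entries.append({"t": term, "s": start, "e": end, "i": increment})
    return json.dumps({"v": "1", "str": text, "tokens": entries})
```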

Slide 26: ElasticSearch V: Simple Query

    {
      "query": {
        "bool": {
          "must": [
            { "match": { "abstracttext": { "query": "cancer" } } },
            {
              "nested": {
                "path": "events",
                "inner_hits": {},
                "query": {
                  "bool": {
                    "must": [ { "match": { "events.allarguments": "mtor" } } ],
                    "filter": { "range": { "events.likelihood": { "lte": 5 } } }
                  }
                }
              }
            }
          ]
        }
      },
      "fields": [ "abstracttext", "title" ]
    }

Slide 27: ElasticSearch VI: Concept Query

    {
      "query": {
        "bool": {
          "must": [
            { "match": { "abstracttext": { "query": "tid52310" } } },
            {
              "nested": {
                "path": "events",
                "inner_hits": {},
                "query": {
                  "bool": {
                    "must": [ { "match": { "events.allarguments": "tid42" } } ],
                    "filter": { "range": { "events.likelihood": { "lte": 5 } } }
                  }
                }
              }
            }
          ]
        }
      },
      "fields": [ "abstracttext", "title" ]
    }
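A sketch of how such a concept query might be assembled from user input: known terms are replaced by their concept IDs before the query body is built. The lookup table and function names are hypothetical; the table mirrors the two example queries, where tid52310 takes the place of cancer and tid42 that of mtor, whereas the real system resolves terms against the concept database. Depending on the ElasticSearch version, the `fields` key may have to be replaced by `stored_fields` or by `_source` filtering.

```python
# Hypothetical term-to-concept table mirroring the example queries
TERM_TO_TID = {"cancer": "tid52310", "mtor": "tid42", "frap": "tid42"}


def to_concept(term):
    """Return the concept ID for a known term, otherwise the term itself."""
    return TERM_TO_TID.get(term.lower(), term)


def concept_query(text_term, event_argument, max_likelihood=5):
    """Documents mentioning text_term that contain an event involving event_argument
    with a likelihood of at most max_likelihood (certainty scale 1 to 6)."""
    return {
        "query": {"bool": {"must": [
            {"match": {"abstracttext": {"query": to_concept(text_term)}}},
            {"nested": {
                "path": "events",
                "inner_hits": {},  # also return the matching events themselves
                "query": {"bool": {
                    "must": [{"match": {"events.allarguments": to_concept(event_argument)}}],
                    "filter": {"range": {"events.likelihood": {"lte": max_likelihood}}},
                }},
            }},
        ]}},
        "fields": ["abstracttext", "title"],
    }
```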

Slide 28: ElasticSearch VII: Highlighting [screenshot]

Slide 29: References
- SeMedico: Faessler, Erik, & Hahn, Udo (2017). SEMEDICO: A comprehensive semantic search engine for the life sciences. In: ACL 2017: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Vancouver, British Columbia, Canada, August 1, 2017.
- GeNo: Wermter, Joachim, & Tomanek, Katrin, & Hahn, Udo (2009). High-performance gene name normalization with GeNo. In: Bioinformatics, 25.
- BioSem: Bui, Q., van Mulligen, E., Campos, D., & Kors, J. (2013). A fast rule-based approach for biomedical event extraction. In: Proceedings of the BioNLP 2013 Shared Task Workshop (pp. ...). Sofia, Bulgaria: Association for Computational Linguistics.
- Certainty assessment: Engelmann, Christine, & Hahn, Udo (2014). An empirically grounded approach to extend the linguistic coverage and lexical diversity of verbal probabilities. In: CogSci 2014: Proceedings of the 36th Annual Cognitive Science Conference. Cognitive Science Meets Artificial Intelligence: Human and Artificial Agents in Interactive Contexts. Québec City, Québec, Canada, July 23-26, 2014.
- JCoRe: Hahn, Udo, & Matthies, Franz, & Faessler, Erik, & Hellrich, Johannes (2016). UIMA-based JCoRe 2.0 goes GitHub and Maven Central: State-of-the-art software resource engineering and distribution of NLP pipelines. In: LREC 2016: Proceedings of the 10th International Conference on Language Resources and Evaluation. Portorož, Slovenia, May 2016.

Slide 30: Conclusion
[System overview diagram, repeated from slide 6.]


More information

Chemical name recognition with harmonized feature-rich conditional random fields

Chemical name recognition with harmonized feature-rich conditional random fields Chemical name recognition with harmonized feature-rich conditional random fields David Campos, Sérgio Matos, and José Luís Oliveira IEETA/DETI, University of Aveiro, Campus Universitrio de Santiago, 3810-193

More information

C. The system is equally reliable for classifying any one of the eight logo types 78% of the time.

C. The system is equally reliable for classifying any one of the eight logo types 78% of the time. Volume: 63 Questions Question No: 1 A system with a set of classifiers is trained to recognize eight different company logos from images. It is 78% accurate. Without further information, which statement

More information

Biomedical literature mining for knowledge discovery

Biomedical literature mining for knowledge discovery Biomedical literature mining for knowledge discovery REZARTA ISLAMAJ DOĞAN National Center for Biotechnology Information National Library of Medicine Outline Biomedical Literature Access Challenges in

More information

Re-designing Online Terminology Resources for German Grammar

Re-designing Online Terminology Resources for German Grammar Re-designing Online Terminology Resources for German Grammar Project Report Karolina Suchowolec, Christian Lang, and Roman Schneider Institut für Deutsche Sprache (IDS), Mannheim, Germany {suchowolec,

More information

Using open access literature to guide full-text query formulation. Heather A. Piwowar and Wendy W. Chapman. Background

Using open access literature to guide full-text query formulation. Heather A. Piwowar and Wendy W. Chapman. Background Using open access literature to guide full-text query formulation Heather A. Piwowar and Wendy W. Chapman Background Much scientific knowledge is contained in the details of the full-text biomedical literature.

More information

PubMed Assistant: A Biologist-Friendly Interface for Enhanced PubMed Search

PubMed Assistant: A Biologist-Friendly Interface for Enhanced PubMed Search Bioinformatics (2006), accepted. PubMed Assistant: A Biologist-Friendly Interface for Enhanced PubMed Search Jing Ding Department of Electrical and Computer Engineering, Iowa State University, Ames, IA

More information

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E Powering Knowledge Discovery Insights from big data with Linguamatics I2E Gain actionable insights from unstructured data The world now generates an overwhelming amount of data, most of it written in natural

More information

TSRI, 400-S PubMed / MyNCBI

TSRI, 400-S PubMed / MyNCBI TSRI, 400-S helplib@scripps.edu 858-784-8705 PubMed / MyNCBI My NCBI is a free service available in PubMed (and all other NCBI databases) that allows you to save searches, set up email alerts for search

More information

Projects Tools BLAH proposal Conclusion. OntoGene/BioMeXT

Projects Tools BLAH proposal Conclusion. OntoGene/BioMeXT OntoGene/BioMeXT The Bio Term Hub and OGER Lenz Furrer, Nico Colic, Fabio Rinaldi University of Zurich and Swiss Institute of Bioinformatics January 10, 2018 Outline Projects Tools BLAH proposal Conclusion

More information

Large-Scale Semantic Indexing and Question Answering in Biomedicine

Large-Scale Semantic Indexing and Question Answering in Biomedicine Large-Scale Semantic Indexing and Question Answering in Biomedicine E. Papagiannopoulou *, Y. Papanikolaou *, D. Dimitriadis *, S. Lagopoulos *, G. Tsoumakas *, M. Laliotis **, N. Markantonatos ** and

More information

Open Research Online The Open University s repository of research publications and other research outputs

Open Research Online The Open University s repository of research publications and other research outputs Open Research Online The Open University s repository of research publications and other research outputs The Smart Book Recommender: An Ontology-Driven Application for Recommending Editorial Products

More information

Measuring inter-annotator agreement in GO annotations

Measuring inter-annotator agreement in GO annotations Measuring inter-annotator agreement in GO annotations Camon EB, Barrell DG, Dimmer EC, Lee V, Magrane M, Maslen J, Binns ns D, Apweiler R. An evaluation of GO annotation retrieval for BioCreAtIvE and GOA.

More information

Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger. Mahmoud El-Haj Paul Rayson Scott Piao Jo Knight

Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger. Mahmoud El-Haj Paul Rayson Scott Piao Jo Knight Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger Mahmoud El-Haj Paul Rayson Scott Piao Jo Knight Origin and Outcomes Currently funded through a Wellcome Trust Seed award Collaboration

More information

Combining commercial and open access citation databases to delimit highly interdisciplinary research fields for citation analysis

Combining commercial and open access citation databases to delimit highly interdisciplinary research fields for citation analysis Combining commercial and open access citation databases to delimit highly interdisciplinary research fields for citation analysis Andreas Strotmann a*, Dangzhi Zhao b a School of Public Health, University

More information

BioC: a minimalist approach to interoperability for biomedical text processing. Don Comeau

BioC: a minimalist approach to interoperability for biomedical text processing. Don Comeau BioC: a minimalist approach to interoperability for biomedical text processing Don Comeau Outline Background and origin of BioC What is BioC? Available Tools and Corpora 2 BioCreative Critical Assessment

More information

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper. Semantic Web Company PoolParty - Server PoolParty - Technical White Paper http://www.poolparty.biz Table of Contents Introduction... 3 PoolParty Technical Overview... 3 PoolParty Components Overview...

More information

Overview of BioCreative VI Precision Medicine Track

Overview of BioCreative VI Precision Medicine Track Overview of BioCreative VI Precision Medicine Track Mining scientific literature for protein interactions affected by mutations Organizers: Rezarta Islamaj Dogan (NCBI) Andrew Chatr-aryamontri (BioGrid)

More information

Searching the Evidence in PubMed

Searching the Evidence in PubMed CAMBRIDGE UNIVERSITY LIBRARY MEDICAL LIBRARY Supporting Literature Searching Searching the Evidence in PubMed July 2017 Supporting Literature Searching Searching the Evidence in PubMed How to access PubMed

More information

Query Reformulation for Clinical Decision Support Search

Query Reformulation for Clinical Decision Support Search Query Reformulation for Clinical Decision Support Search Luca Soldaini, Arman Cohan, Andrew Yates, Nazli Goharian, Ophir Frieder Information Retrieval Lab Computer Science Department Georgetown University

More information

CD 485 Computer Applications in Communication Disorders and Sciences MODULE 3

CD 485 Computer Applications in Communication Disorders and Sciences MODULE 3 CD 485 Computer Applications in Communication Disorders and Sciences MODULE 3 SECTION VII IDENTIFYING THE APPROPRIATE DATABASES JOURNAL ARTICLES THROUGH PUBMED, MEDLINE AND COMMUNICATION DISORDERS MULTISEARCH

More information

PubMed Basics. Stephanie Friree Outreach and Technology Coordinator NN/LM New England Region (800)

PubMed Basics. Stephanie Friree Outreach and Technology Coordinator NN/LM New England Region (800) PubMed Basics 1 PubMed Basics 2 PubMed Basics 3 PubMed Basics Stephanie Friree Outreach and Technology Coordinator stephanie.friree@umassmed.edu NN/LM New England Region (800) 338-7657 Overview! Introductions!

More information

Introduction to Text Mining. Hongning Wang

Introduction to Text Mining. Hongning Wang Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:

More information

Use of graphs and taxonomic classifications to analyze content relationships among courseware

Use of graphs and taxonomic classifications to analyze content relationships among courseware Institute of Computing UNICAMP Use of graphs and taxonomic classifications to analyze content relationships among courseware Márcio de Carvalho Saraiva and Claudia Bauzer Medeiros Background and Motivation

More information

Taming Text. How to Find, Organize, and Manipulate It MANNING GRANT S. INGERSOLL THOMAS S. MORTON ANDREW L. KARRIS. Shelter Island

Taming Text. How to Find, Organize, and Manipulate It MANNING GRANT S. INGERSOLL THOMAS S. MORTON ANDREW L. KARRIS. Shelter Island Taming Text How to Find, Organize, and Manipulate It GRANT S. INGERSOLL THOMAS S. MORTON ANDREW L. KARRIS 11 MANNING Shelter Island contents foreword xiii preface xiv acknowledgments xvii about this book

More information

DOCUMENT RETRIEVAL USING A PROBABILISTIC KNOWLEDGE MODEL

DOCUMENT RETRIEVAL USING A PROBABILISTIC KNOWLEDGE MODEL DOCUMENT RETRIEVAL USING A PROBABILISTIC KNOWLEDGE MODEL Shuguang Wang Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA swang@cs.pitt.edu Shyam Visweswaran Department of Biomedical

More information

Text Mining for Software Engineering

Text Mining for Software Engineering Text Mining for Software Engineering Faculty of Informatics Institute for Program Structures and Data Organization (IPD) Universität Karlsruhe (TH), Germany Department of Computer Science and Software

More information

Searching for Literature Using HDAS (Healthcare Databases Advanced Search)

Searching for Literature Using HDAS (Healthcare Databases Advanced Search) Searching for Literature Using HDAS (Healthcare Databases Advanced Search) 1. What is HDAS?... page 2 2. How do I access HDAS?... page 3 3. Questions and concepts (PICO) page 4 4. Selecting a database.

More information

Languages and tools for building and using ontologies. Simon Jupp, James Malone

Languages and tools for building and using ontologies. Simon Jupp, James Malone An overview of ontology technology Languages and tools for building and using ontologies Simon Jupp, James Malone jupp@ebi.ac.uk, malone@ebi.ac.uk Outline Languages OWL and OBO classes, individuals, relations,

More information

Optimization of the PubMed Automatic Term Mapping

Optimization of the PubMed Automatic Term Mapping 238 Medical Informatics in a United and Healthy Europe K.-P. Adlassnig et al. (Eds.) IOS Press, 2009 2009 European Federation for Medical Informatics. All rights reserved. doi:10.3233/978-1-60750-044-5-238

More information

Apache UIMA ConceptMapper Annotator Documentation

Apache UIMA ConceptMapper Annotator Documentation Apache UIMA ConceptMapper Annotator Documentation Written and maintained by the Apache UIMA Development Community Version 2.3.1 Copyright 2006, 2011 The Apache Software Foundation License and Disclaimer.

More information

A Semantic Search Component for BExIS 2

A Semantic Search Component for BExIS 2 A Semantic Search Component for BExIS 2 Friederike Klan, Alsayed Algergawy, Erik Fäßler, Udo Hahn, Birgitta König Ries BExIS DevConf 2017 Keyword-Based Search Keyword-Based Search supports search queries

More information

Network analysis. Martina Kutmon Department of Bioinformatics Maastricht University

Network analysis. Martina Kutmon Department of Bioinformatics Maastricht University Network analysis Martina Kutmon Department of Bioinformatics Maastricht University What's gonna happen today? Network Analysis Introduction Quiz Hands-on session ConsensusPathDB interaction database Outline

More information

An Overview of JCORE, the JULIE Lab UIMA Component Repository

An Overview of JCORE, the JULIE Lab UIMA Component Repository An Overview of JCORE, the JULIE Lab UIMA Component Repository U. Hahn, E. Buyko, R. Landefeld, M. Mühlhausen, M. Poprat, K. Tomanek, J. Wermter Jena University Language & Information Engineering (JULIE)

More information

Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research

Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research Ian Fore, D.Phil. Associate Director, Biorepository and Pathology Informatics Senior Program

More information