Developing DataMed: the current status


1 Developing DataMed: the current status. Hua Xu, Core Development Team (CDT), bioCADDIE AHM, 8/8/17. Supported by the NIH grant 1U24 AI to the University of California, San Diego

2 Outline: CDT roles and roadmap; DataMed general architecture; search engine; new features; next steps

3 Roles of CDT: develop a functional prototype of the DDI (DataMed); implement guidelines and suggestions from the WGs; integrate systems and modules developed by pilot projects and supplements; engage end users in DataMed development and evaluation

4 CDT Roadmap: V0.1: DDI architecture, data ingestion, search function, feedback collection, data identifier, data indexing, dataset result display, terminology server, search engine, interface design, usability needs analysis, ranking algorithm, search function, architecture, wrap-up of pilot projects, RFA for pilot on harvester for DDI schema; V0.2: usability study phase I, ranking, metadata management; V0.5: personalized search, link datasets to external resources, search algorithm, metadata ingestion, import repositories, repository submission form, map metadata to DATS model 2.x, NLP-based indexing/searching; V1.0: documentation, usability study phases II & III, data duplication issue, generation of benchmark datasets, terminology server indexing, visualization, personalized search, improve the tracking system, search/ranking algorithms, similar datasets to be expanded, display of results, sort datasets, additional filters; V2.0; V3.0: pilot project integration, usability study, Web API

5 Accomplishments (to present): DataMed V1.5 (Nov 2016), V2.0 (Feb 2017), V3.0 (Jul 2017). Data ingestion: ingestion of 74 data repositories; implementation of the DATS 2.2 metadata model; enhancement modules for the indexing pipeline. Search engine: NLP service; terminology service; ElasticSearch optimization; other functionalities. Evaluation: benchmark data set / Dataset Retrieval Challenge; user survey

6 DataMed Architecture: data sources (repositories, online datasets); funding agencies, publishers, data producers; ingestion (metadata ingestion) → indexing (ElasticSearch) → user interface; terminology & NLP server

7 Search engine architecture (model-view-controller). Model: responsible for constructing the ES query and retrieving search results. View: user interaction; responsible for rendering the model. Controller: responsible for responding to user input, e.g., generating the corresponding search keywords, search fields, facet fields, etc.
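As a rough illustration of the model's job, the sketch below turns controller output (keywords, search fields, facet fields) into an Elasticsearch request body. It is a minimal sketch assuming the standard Elasticsearch query DSL; the helper `build_es_query` and the field names `title`, `description` and `repository` are hypothetical, not DataMed's actual code or index mapping.

```python
from typing import Dict, List, Optional


def build_es_query(keywords: List[str],
                   search_fields: List[str],
                   facet_fields: List[str],
                   filters: Optional[Dict[str, str]] = None,
                   size: int = 20) -> Dict:
    """Model role: turn controller output into an Elasticsearch request body.

    All field names are hypothetical placeholders, not DataMed's mapping.
    """
    # Full-text match of the keywords across the chosen search fields.
    must = [{"multi_match": {"query": " ".join(keywords),
                             "fields": search_fields}}]
    # Exact-value filters, e.g. a facet value the user clicked.
    filter_clauses = [{"term": {field: value}}
                      for field, value in (filters or {}).items()]
    # One terms aggregation per facet field so the view can show counts.
    aggs = {field: {"terms": {"field": field}} for field in facet_fields}
    return {"size": size,
            "query": {"bool": {"must": must, "filter": filter_clauses}},
            "aggs": aggs}


body = build_es_query(["breast", "cancer"],
                      ["title", "description"],
                      ["repository"],
                      filters={"repository": "GEO"})
```

Keeping query construction in one function mirrors the MVC split: the view only renders `hits` and `aggregations`, and the controller only decides which keywords, fields and facets to pass in.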

8 Search engine workflow: query → concept extraction (NLP server) → synonym expansion (terminology server) → ElasticSearch → ranked results and facets; search again with the refined query
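The workflow can be sketched end to end with stand-ins for the two services. `extract_concepts` and `expand` below are hypothetical stubs (a substring lexicon and a hard-coded synonym table), not calls to DataMed's NLP or terminology servers; the result is an Elasticsearch `bool` query in which either the original text or any expanded phrase may match.

```python
from typing import Dict, List

# Hypothetical stand-ins for the NLP server and the terminology server.
CONCEPT_LEXICON = ["brca1", "breast cancer"]
SYNONYMS: Dict[str, List[str]] = {
    "breast cancer": ["breast neoplasms", "mammary carcinoma"],
}


def extract_concepts(query: str) -> List[str]:
    """Stub 'NLP server': lexicon phrases that occur in the query text."""
    q = query.lower()
    return [c for c in CONCEPT_LEXICON if c in q]


def expand(concepts: List[str]) -> List[str]:
    """Stub 'terminology server': each concept followed by its synonyms."""
    variants: List[str] = []
    for c in concepts:
        variants.append(c)
        variants.extend(SYNONYMS.get(c, []))
    return variants


def to_es_query(query: str, field: str = "description") -> Dict:
    """Refined query: the original text or any expanded phrase may match."""
    should = [{"match": {field: query}}]
    should += [{"match_phrase": {field: v}}
               for v in expand(extract_concepts(query))]
    return {"bool": {"should": should, "minimum_should_match": 1}}


refined = to_es_query("BRCA1 in breast cancer")
```

Using `should` with `minimum_should_match: 1` lets a dataset that mentions only "mammary carcinoma" still be retrieved, while documents matching several variants score higher.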

9 What's new: NLP server. Goal: leverage NLP approaches to extract biomedical concepts: general biomedical concepts → MeSH; five specific entity types: disease, chemical, gene, biological process, and cell line. Uses: processing user-entered queries; indexing metadata text fields (e.g., description). Implementation: a mixture of existing tools and locally developed systems: MetaMap Lite; machine-learning-based NER systems (e.g., based on CRF); rule-based systems (e.g., dictionary lookup)
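The rule-based branch (dictionary lookup) can be sketched as greedy longest-match over tokens. This is an assumption about how such a lookup typically works, not DataMed's implementation; the lexicon entries are illustrative, and punctuation handling is deliberately omitted.

```python
from typing import Dict, List, Tuple


def dictionary_ner(text: str, lexicon: Dict[str, str],
                   max_len: int = 5) -> List[Tuple[str, str]]:
    """Greedy longest-match dictionary lookup over whitespace tokens.

    `lexicon` maps a lower-cased phrase to an entity type.
    Returns (phrase, entity_type) pairs in text order.
    """
    tokens = text.lower().split()
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest window first, so a longer entry wins over a
        # shorter one it contains (e.g. "non-small cell lung cancer").
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                match = (phrase, lexicon[phrase], n)
                break
        if match:
            found.append((match[0], match[1]))
            i += match[2]  # skip the tokens consumed by the match
        else:
            i += 1
    return found


# Illustrative lexicon using entity types named on the slide.
LEXICON = {
    "non-small cell lung cancer": "disease",
    "lung cancer": "disease",
    "hela": "cell_line",
    "cisplatin": "chemical",
}
hits = dictionary_ner("Expression profiles of non-small cell lung cancer and HeLa cells",
                      LEXICON)
```

Because matched tokens are consumed, "lung cancer" is not reported a second time inside the longer disease mention.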

10 NLP evaluation
Category | Method | Precision / Recall | Note
MeSH term | MetaMap Lite | |
Gene | CRF | 89.95% / 50.75% | Trained on PubMed corpus, tested on dataset descriptions
Disease | CRF | 92.54% / 88.89% | Trained on PubMed corpus, tested on dataset descriptions
Drug | CRF | 91.08% / 69.06% | Trained on PubMed corpus, tested on dataset descriptions
Cell line | CRF | Insufficient entities for evaluation |
Biological process | Dictionary lookup | 93.96% / 76.8% | Trained on corpus provided in QuickGO, tested on dataset descriptions
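Combining each precision/recall pair into an F1 score (their harmonic mean) makes the entity types easier to compare. The F1 values are derived here from the reported numbers; they do not appear on the slide.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs from the table above, percentages as fractions.
reported = {
    "Gene (CRF)": (0.8995, 0.5075),
    "Disease (CRF)": (0.9254, 0.8889),
    "Drug (CRF)": (0.9108, 0.6906),
    "Biological process (dictionary lookup)": (0.9396, 0.768),
}
for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
```

This gives F1 of about 0.649 (gene), 0.907 (disease), 0.786 (drug) and 0.845 (biological process): precision is uniformly around 90%, so recall is what separates the gene and drug models from the disease model.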

11 NLP's effect on search [table: No NLP vs. With NLP expansion; metrics: infAP, infNDCG, P@]
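For reference, a minimal sketch of two of the metric families named above: precision at a cutoff k and nDCG over graded relevance judgments. The slide's infAP and infNDCG are inferred variants that estimate these measures from sampled, incomplete judgments; that estimation is not implemented here, and k is left as a parameter because the P@ cutoff is not recoverable. The judgments and run are hypothetical.

```python
import math
from typing import Dict, List


def precision_at_k(ranked: List[str], relevant: Dict[str, int], k: int) -> float:
    """Fraction of the top-k results judged relevant (grade > 0)."""
    return sum(1 for d in ranked[:k] if relevant.get(d, 0) > 0) / k


def ndcg_at_k(ranked: List[str], relevant: Dict[str, int], k: int) -> float:
    """nDCG@k with graded gains and a log2(position + 1) discount."""
    def dcg(grades: List[int]) -> float:
        # enumerate starts at 0, so position i has discount log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(grades))

    gains = [relevant.get(d, 0) for d in ranked[:k]]
    ideal = sorted(relevant.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0


judgments = {"d1": 2, "d3": 1, "d7": 1}   # hypothetical graded judgments
run = ["d3", "d2", "d1", "d5"]            # hypothetical ranked result list
p4 = precision_at_k(run, judgments, 4)
n4 = ndcg_at_k(run, judgments, 4)
```

Here two of the top four results are relevant (P@4 = 0.5), and placing the grade-2 dataset third rather than first pulls nDCG@4 down to roughly 0.64.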

12 What's new: Terminology server. Goal: leverage existing biomedical terminologies to facilitate search. Uses: term expansion during indexing by the ingestion pipeline; query expansion, auto-completion, and spelling correction for the search engine
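Two of the listed uses, auto-completion and spelling correction, can be approximated over a flat term list with the Python standard library: binary search over sorted terms for prefixes, and `difflib.get_close_matches` for near-misses. This is a toy stand-in assuming an in-memory term list; the actual terminology server is graph-backed, and `TermSuggester` is a hypothetical name.

```python
import bisect
import difflib
from typing import List


class TermSuggester:
    """Toy auto-completion and spelling correction over a term list."""

    def __init__(self, terms: List[str]):
        # Sorted, lower-cased terms so prefix matches are contiguous.
        self.terms = sorted(t.lower() for t in terms)

    def complete(self, prefix: str, limit: int = 5) -> List[str]:
        """Return up to `limit` terms starting with `prefix`."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self.terms, prefix)
        out: List[str] = []
        for term in self.terms[start:]:
            if not term.startswith(prefix) or len(out) == limit:
                break
            out.append(term)
        return out

    def correct(self, word: str) -> List[str]:
        """Return close spellings ranked by difflib's similarity ratio."""
        return difflib.get_close_matches(word.lower(), self.terms,
                                         n=3, cutoff=0.8)


s = TermSuggester(["asthma", "astrocytoma", "atherosclerosis", "breast neoplasms"])
```

For example, `s.complete("ast")` yields `["asthma", "astrocytoma"]` and `s.correct("asthmma")` yields `["asthma"]`. A production service would rank completions by usage or concept popularity rather than alphabetically.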

13 Terminology server implementation. Sources (covering condition, chemical, gene, procedure, lab, anatomy, organism):
Terminology source | Concepts | Relationships
NCBI taxonomy | 906,782 | 1,813,962
SNOMEDCT_US | 315,904 | 5,414,108
MeSH | 254,883 | 2,947,524
FMA | 83,282 | 1,105,781
GO | 39,269 | 2,341,646
HGNC | 38,… | …,996
TOTAL | 1,725,407 | 7,578,017
Techniques: Neo4j graph database; SciGraph interface (RESTful API) for Neo4j; response time < 0.1 s
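Term expansion against a concept graph amounts to a bounded traversal that collects a concept's descendants and their synonyms. The sketch below runs a breadth-first search over a small in-memory hierarchy as a stand-in for the Neo4j graph; it does not call SciGraph, and the concept names, edges and synonym table are illustrative only.

```python
from collections import deque
from typing import Dict, List, Set

# Toy is-a hierarchy (parent -> children), standing in for the Neo4j graph.
CHILDREN: Dict[str, List[str]] = {
    "neoplasms": ["breast neoplasms", "lung neoplasms"],
    "lung neoplasms": ["small cell lung carcinoma"],
}
# Toy synonym table, standing in for synonym edges/properties.
SYNONYMS: Dict[str, List[str]] = {
    "breast neoplasms": ["breast cancer", "breast tumor"],
}


def expand_term(term: str, max_depth: int = 2) -> Set[str]:
    """Return the term, its descendants up to max_depth, and their synonyms."""
    seen = {term}
    queue = deque([(term, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend past the depth limit
        for child in CHILDREN.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    expanded = set(seen)
    for concept in seen:
        expanded.update(SYNONYMS.get(concept, []))
    return expanded
```

The depth limit matters in practice: with hierarchies as large as the ones listed above, unbounded expansion of a general concept would flood the query with loosely related terms.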

14 What's new: Ranking algorithm. Current ranking algorithms: ElasticSearch's implementation of the probabilistic relevance model Okapi BM25; ranking by citation counts (e.g., GEO) and by publication time. Future implementation: new relevance ranking algorithms from the Dataset Retrieval Challenge, namely a blind relevance feedback algorithm and a learning-to-rank algorithm
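For readers unfamiliar with Okapi BM25, the scoring function fits in a few lines. This is the textbook formulation with Elasticsearch's default parameters (k1 = 1.2, b = 0.75) and a Lucene-style idf; Elasticsearch's internal implementation may differ in constant factors, so treat it as an illustration of the model rather than a reproduction of DataMed's scores.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query.

    idf uses log(1 + (N - n + 0.5) / (n + 0.5)), which is never negative.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency, normalized by document length.
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


docs = [["asthma", "children", "cohort"],
        ["asthma", "asthma", "mouse"],
        ["lung", "imaging"]]
scores = bm25_scores(["asthma"], docs)
```

The second document outranks the first because it repeats the query term, but the k1 saturation keeps two occurrences from counting twice as much as one.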

15 Highlights of DataMed 3.0: more repositories (74); Web API; fine-grained user tracking system; reporting of broken links; duplicated datasets; visualization of data statistics (Diploid project); schema.org markup

16 Dataset statistics

17 Web API: API for DataMed: REQUEST → API → DataMed → RESULT
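The REQUEST → RESULT round trip can be sketched as building a GET URL and parsing a JSON reply. The base URL, the parameter names (`q`, `from`, `size`) and the Elasticsearch-style response shape are hypothetical assumptions, since the slide does not document the DataMed API; the sketch avoids network access so it runs offline.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical endpoint; not the documented DataMed API.
BASE_URL = "https://example.org/datamed/api/search"


def build_request(query: str, start: int = 0, size: int = 10) -> str:
    """REQUEST step: encode the search as a GET URL."""
    params = {"q": query, "from": start, "size": size}
    return f"{BASE_URL}?{urlencode(params)}"


def parse_result(body: str) -> List[Dict[str, str]]:
    """RESULT step: pull dataset IDs and titles from a JSON response.

    Assumes a hypothetical {"hits": {"hits": [{"_id": ..., "_source": {...}}]}} shape.
    """
    payload = json.loads(body)
    return [{"id": h["_id"], "title": h["_source"].get("title", "")}
            for h in payload["hits"]["hits"]]


url = build_request("hdl cholesterol")
sample = '{"hits": {"hits": [{"_id": "1", "_source": {"title": "Lipid panel study"}}]}}'
results = parse_result(sample)
```

To issue a live request, one would pass the URL to `urllib.request.urlopen` and feed the decoded body to `parse_result`.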

18 User tracking system: search term

19 User tracking system: list and rank order of results returned in the page

20 User tracking system: results and position in page clicked by user; scroll position

21 User tracking system: uses. Query logs for analyses; usability studies; information on the most accessed data; information for ranking by user feedback
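As an example of the "most accessed data" and "ranking by user feedback" uses, click logs can be aggregated into per-dataset access counts and an average clicked rank per query. The `ClickEvent` fields mirror what the tracking slides say is captured (search term, clicked result, its position); the record layout and function names are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ClickEvent:
    """One logged click (hypothetical layout)."""
    query: str        # search term entered by the user
    dataset_id: str   # dataset whose result was clicked
    rank: int         # 1-based position of that result on the page


def most_accessed(events: List[ClickEvent], n: int = 3) -> List[Tuple[str, int]]:
    """Datasets ordered by number of clicks."""
    return Counter(e.dataset_id for e in events).most_common(n)


def mean_click_rank(events: List[ClickEvent]) -> Dict[str, float]:
    """Average clicked position per query; lower suggests better ranking."""
    ranks: Dict[str, List[int]] = defaultdict(list)
    for e in events:
        ranks[e.query].append(e.rank)
    return {q: sum(r) / len(r) for q, r in ranks.items()}


log = [ClickEvent("asthma", "ds1", 1),
       ClickEvent("asthma", "ds2", 3),
       ClickEvent("hdl", "ds1", 2)]
```

Mean clicked rank is a crude signal on its own (it is biased toward top positions), which is one reason learning-to-rank approaches model position bias explicitly.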

22 Broken links

23 Duplicated datasets

24 User statistics

25 Publications (10/2016-present): Papers
Dong X, Zhang Y, Xu H. Search Datasets in Literature: A Case Study of GWAS. AMIA CRI/TBI Symposium, San Francisco, 2017.
Cohen T, Roberts K, Gururaj AE, Chen X, Pornejati S, Hersh WR, Demner-Fushman D, Ohno-Machado L, Xu H. A Publicly Available Benchmark for Biomedical Dataset Retrieval: The Reference Standard for the 2016 bioCADDIE Dataset Retrieval Challenge. Database, 2017 (Accepted).
Roberts K, Gururaj AE, Chen X, Pornejati S, Hersh WR, Demner-Fushman D, Ohno-Machado L, Cohen T, Xu H. Information Retrieval for Biomedical Datasets: The 2016 bioCADDIE Dataset Retrieval Challenge. Database, 2017 (Under Review).
Chen X, Liu R, Ozyurt B, Gururaj AE, Soysal E, Cohen T, Tiryaki F, Li Y, Zong N, Jiang M, Rogith D, Salimi M, Kim H, Rocca-Serra P, Gonzalez-Beltran A, Farcas C, Johnson T, Margolis R, Alter G, Sansone S-A, Fore I, Ohno-Machado L, Grethe J, Xu H, Bell E. Building a search engine for finding biomedical datasets across repositories: the DataMed system. JAMIA, 2017 (Under Review).
Dixit R, Rogith D, Narayana V, Salimi M, Gururaj AE, Ohno-Machado L, Xu H, Johnson T. User Needs Analysis and Usability Assessment of DataMed: a Biomedical Data Discovery Index. JAMIA, 2017 (Under Review).

26 Publications (10/2016-present): Abstracts/Presentations. Systems demo, AMIA Annual Symposium, 2016; ingestion & indexing pipeline abstract, AMIA Annual Symposium, 2016; DataMed abstract, ICBO 2016; DataMed abstract, BD2K AHM; DataMed NLP pipeline abstract, AMIA Annual Symposium (accepted); bioCADDIE Dataset Retrieval Challenge abstract, AMIA Annual Symposium (accepted)

27 Next Steps. Data ingestion: expand to 100+ data repositories with DATS 2.2; develop automated tools for data ingestion. Search engine: ranking algorithms (implementation of pilot projects, Emory and UIUC); deep search; common data elements (CDE). Other new functionalities: updated advanced search; display of most accessed datasets; word cloud (search results visualization); alerts for new results from saved searches. Evaluation and promotion: user and performance evaluation

28 CDT members. UCSD: Claudiu Farcas, Jeffrey Grethe, Hyeoneui Kim, Nansu Zong, Stephen Trac, Yueling Li, Larry Lui, Burak Ozyurt. Oxford: Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Susanna-Assunta Sansone. NIH: Ian Fore. UTHealth: Xiaoling Chen, Firat Tiryaki, Trevor Cohen, Anupama Gururaj, Todd Johnson, Deevakar Rogith, Nina Salimi, Ergin Soysal, Cui Tao, Hua Xu, Ram Dixit. Pilot project team members. Previous members: Pratik Chaudhary, Ruiling Liu, Ron Margolis, Saeid Pournejati, Stephanie Ngyuen

29 Thank you! For questions, please contact CDT at:

30 Preliminary work on integrating CDE with DataMed. Hua Xu, PhD

31 Dimensions of DATS. Dimension: meant to report what the data points in a dataset are about, their nature, and their units. Dimensions must be typed. The bioCADDIE DATS Dimension was discussed with the Google schema.org team, who added the variableMeasured property to schema.org to cover the notion. Application: list all measurement types performed in a clinical trial. Purpose: allow indexing on measurement types / variables / dimensions
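A DATS dimension surfaces in schema.org as the `variableMeasured` property of a `Dataset`. The sketch below emits such JSON-LD, assuming each variable is described as a `PropertyValue` with a name and a unit; the dataset name and variables are made up, and this mapping from DATS fields is an assumption rather than the official crosswalk.

```python
import json
from typing import List, Tuple


def dataset_jsonld(name: str, variables: List[Tuple[str, str]]) -> str:
    """schema.org Dataset markup listing measured variables (DATS dimensions).

    `variables` holds (variable name, unit) pairs; both are illustrative.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "variableMeasured": [
            {"@type": "PropertyValue", "name": var, "unitText": unit}
            for var, unit in variables
        ],
    }
    return json.dumps(doc, indent=2)


markup = dataset_jsonld("Example clinical trial measurements",
                        [("hdl cholesterol", "mg/dL"), ("body weight", "kg")])
```

Embedded in a dataset landing page, markup like this lets general-purpose crawlers index the same measurement types that DataMed indexes from DATS.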

32 Dimensions in DataMed. As of May 2017, DataMed contained 1,375,801 datasets from 66 repositories. Dimension information was collected for 4 repositories: ImmPort (205/222), LINCS (286/287), MPD (376/376), NeuroMorpho (50,356/50,356)
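Reading each pair as (datasets with dimension information, datasets in the repository), which is an assumption about the slide's notation, the coverage works out as follows.

```python
# (datasets with dimension info, datasets in repository), from the slide.
coverage = {
    "ImmPort": (205, 222),
    "LINCS": (286, 287),
    "MPD": (376, 376),
    "NeuroMorpho": (50356, 50356),
}
TOTAL_DATASETS = 1375801  # all DataMed datasets as of May 2017

for repo, (with_dims, total) in coverage.items():
    print(f"{repo}: {with_dims / total:.1%}")

covered = sum(with_dims for with_dims, _ in coverage.values())
share = covered / TOTAL_DATASETS
print(f"Datasets with dimensions: {covered} of {TOTAL_DATASETS} ({share:.2%})")
```

ImmPort comes out at about 92.3% and LINCS at about 99.7%, while the 51,223 covered datasets are roughly 3.7% of DataMed, nearly all of them from NeuroMorpho.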

33 Mapping between Dimension and CDE. NIH CDE Repository: 20,189 CDEs across all initiatives, 19,985 unique CDEs. Examples: "Smoking status"; "Polysomnography blood oxygen saturation distribution percent value"; "In the past 7 days I felt comfortable with others my age"; "Thinking about your child's life, my child thinks his/her life has purpose." Dimensions in DataMed: 3,083 unique dimension names from 4 repositories. Examples: "Blood Cell Count with Differential"; "Protein measurement"; "Skin lesion involved and skin lesion duration"; "total horizontal distance traveled (immediately to 15 min post-injection), baseline". Overlap (exact matching): only 6: Smoking status, Medical history, Body weight, Age, Heart rate, Vital capacity
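Exact matching depends heavily on string normalization. The sketch below lower-cases names, replaces punctuation with spaces and collapses whitespace before intersecting the two sets; whether the reported overlap of 6 used any normalization is not stated, so this is an illustrative assumption. The example names come from slides 33 and 34.

```python
import re
from typing import Iterable, Set


def normalize(name: str) -> str:
    """Lower-case, turn punctuation into spaces, collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(name.split())


def exact_overlap(dimensions: Iterable[str], cdes: Iterable[str]) -> Set[str]:
    """Names that match exactly after normalization."""
    return {normalize(d) for d in dimensions} & {normalize(c) for c in cdes}


dims = ["Smoking status", "Body weight", "Protein measurement",
        "axial length, right eye"]
cdes = ["Smoking Status", "Body weight", "axial length right eye",
        "hdl cholesterol value"]
overlap = exact_overlap(dims, cdes)
```

Even this light normalization recovers "axial length, right eye", which a strict string comparison would miss; mappings such as "cardiac output" to "cardiac output measurement" still need fuzzier matching.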

34 Mapping Dimension to CDE. Examples (dimension name → CDE):
axial length, right eye → axial length right eye
cardiac output → cardiac output measurement
hdl cholesterol → hdl cholesterol value
left ventricle ejection fraction → echocardiogram left ventricle ejection fraction measurement
serum vitamin d concentration → person serum vitamin d level number
whole body bone mineral density (bmd) → dual x-ray absorptiometry whole body bone mineral density value
Observations: many datasets do not use CDEs; some CDEs are represented as a combination of several concepts, expressed as sentences; dimensions and CDEs may be related concepts at different levels of granularity; synonyms and abbreviations may be involved.

35 System architecture. Backend: metadata dimensions → NLP-based CDE encoding service → metadata enrichment with CDE identifiers. Query: user query → NLP-based CDE encoding service → has CDE identifier? No: DataMed default search algorithm → search results. Yes: boosts applied to search for the CDE identifier in the CDE enrichment field → search results
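The query-side routing on this slide can be sketched as a query builder with two branches. The CDE identifier would come from the NLP-based encoding service, which is not implemented here; the field names (`title`, `description`, `dimension`, `cde_enrichment`), the boost of 5.0 and the identifier `CDE-0001` are hypothetical.

```python
from typing import Dict, Optional


def build_query(user_query: str, cde_id: Optional[str]) -> Dict:
    """Route a query following the slide's decision.

    No CDE identifier: default full-text query only.
    CDE identifier found: the same query plus a boosted exact match on the
    (hypothetical) CDE enrichment field.
    """
    default = {"multi_match": {"query": user_query,
                               "fields": ["title", "description", "dimension"]}}
    if cde_id is None:
        return {"query": default}
    return {"query": {"bool": {
        "should": [
            default,
            {"term": {"cde_enrichment": {"value": cde_id, "boost": 5.0}}},
        ],
        "minimum_should_match": 1,
    }}}


plain = build_query("hdl cholesterol", None)
boosted = build_query("hdl cholesterol", "CDE-0001")
```

Keeping the default query inside the boosted branch means CDE enrichment only reorders results: datasets without CDE identifiers remain retrievable.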

36 Preliminary results. Initial similarity algorithm based on Elasticsearch. Examples (dimension name → mapped CDE):
axial length, left eye → axial length left eye
total cholesterol → total cholesterol value
hdl cholesterol → hdl cholesterol value
brain weight → brain weight measurement
pulse rate → pulse rate measurement value
left ventricle diastolic volume → echocardiogram left ventricle end diastolic volume measurement
non-hdl cholesterol → hdl cholesterol value
body weight at dissection → body weight at birth
A lot of room for improvement. The NLP CDE encoding algorithm can be applied to other applications where CDE recognition is needed.
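The slide's similarity algorithm is based on Elasticsearch. As a self-contained stand-in, the sketch below ranks CDE names by Jaccard similarity of token sets, with a small hypothetical abbreviation table. It is a swapped-in technique, not the authors' algorithm, but it reproduces the kind of error shown above: "non-hdl cholesterol" shares more tokens with "hdl cholesterol value" than with any other candidate.

```python
import re
from typing import List, Set, Tuple

ABBREVIATIONS = {"bmd": "bone mineral density"}  # hypothetical expansion table


def tokens(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens with abbreviations expanded."""
    out: List[str] = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        out.extend(ABBREVIATIONS.get(word, word).split())
    return set(out)


def rank_cdes(dimension: str, cdes: List[str],
              top: int = 3) -> List[Tuple[str, float]]:
    """Rank CDE names by Jaccard similarity of token sets (highest first)."""
    d = tokens(dimension)
    scored = []
    for cde in cdes:
        c = tokens(cde)
        union = d | c
        scored.append((cde, len(d & c) / len(union) if union else 0.0))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


CDES = ["hdl cholesterol value", "total cholesterol value",
        "brain weight measurement", "pulse rate measurement value"]
best_non_hdl = rank_cdes("non-hdl cholesterol", CDES, top=1)
```

Token overlap cannot see that "non-" negates the concept, which is exactly where synonym, abbreviation and concept-level (NLP) handling has to take over.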

37 Demo: CDE enrichment using the algorithm for 4 repositories (ex.php). "hdl cholesterol" will be mapped to "hdl cholesterol value", and a phrase search in the dimension field will be suggested (anced.php).


More information

CDASH MODEL 1.0 AND CDASHIG 2.0. Kathleen Mellars Special Thanks to the CDASH Model and CDASHIG Teams

CDASH MODEL 1.0 AND CDASHIG 2.0. Kathleen Mellars Special Thanks to the CDASH Model and CDASHIG Teams CDASH MODEL 1.0 AND CDASHIG 2.0 Kathleen Mellars Special Thanks to the CDASH Model and CDASHIG Teams 1 What is CDASH? Clinical Data Acquisition Standards Harmonization (CDASH) Standards for the collection

More information

IBM InfoSphere MDM Enterprise Viewer User's Guide

IBM InfoSphere MDM Enterprise Viewer User's Guide IBM InfoSphere Master Data Management Version 11 IBM InfoSphere MDM Enterprise Viewer User's Guide GI13-2661-00 IBM InfoSphere Master Data Management Version 11 IBM InfoSphere MDM Enterprise Viewer User's

More information

IBM Unica Distributed Marketing Version 8 Release 6 May 25, Field Marketer's Guide

IBM Unica Distributed Marketing Version 8 Release 6 May 25, Field Marketer's Guide IBM Unica Distributed Marketing Version 8 Release 6 May 25, 2012 Field Marketer's Guide Note Before using this information and the product it supports, read the information in Notices on page 83. This

More information

Document Retrieval using Predication Similarity

Document Retrieval using Predication Similarity Document Retrieval using Predication Similarity Kalpa Gunaratna 1 Kno.e.sis Center, Wright State University, Dayton, OH 45435 USA kalpa@knoesis.org Abstract. Document retrieval has been an important research

More information

SEEK User Manual. Introduction

SEEK User Manual. Introduction SEEK User Manual Introduction SEEK is a computational gene co-expression search engine. It utilizes a vast human gene expression compendium to deliver fast, integrative, cross-platform co-expression analyses.

More information

A System for Ontology-Based Annotation of Biomedical Data

A System for Ontology-Based Annotation of Biomedical Data A System for Ontology-Based Annotation of Biomedical Data Clement Jonquet, Mark A. Musen, and Nigam Shah Stanford Center for Biomedical Informatics Research Stanford University School of Medicine Medical

More information

Taking a view on bio-ontologies. Simon Jupp Functional Genomics Production Team ICBO, 2012 Graz, Austria

Taking a view on bio-ontologies. Simon Jupp Functional Genomics Production Team ICBO, 2012 Graz, Austria Taking a view on bio-ontologies Simon Jupp Functional Genomics Production Team ICBO, 2012 Graz, Austria Who we are European Bioinformatics Institute one of world s largest bio data and service providers

More information

Content Enrichment. An essential strategic capability for every publisher. Enriched content. Delivered.

Content Enrichment. An essential strategic capability for every publisher. Enriched content. Delivered. Content Enrichment An essential strategic capability for every publisher Enriched content. Delivered. An essential strategic capability for every publisher Overview Content is at the centre of everything

More information

Enriching Knowledge Domain Visualizations: Analysis of a Record Linkage and Information Fusion Approach to Citation Data

Enriching Knowledge Domain Visualizations: Analysis of a Record Linkage and Information Fusion Approach to Citation Data Enriching Knowledge Domain Visualizations: Analysis of a Record Linkage and Information Fusion Approach to Citation Data Marie B. Synnestvedt, MSEd 1, 2 1 Drexel University College of Information Science

More information

TEXT MINING: THE NEXT DATA FRONTIER

TEXT MINING: THE NEXT DATA FRONTIER TEXT MINING: THE NEXT DATA FRONTIER An Infrastructural Approach Dr. Petr Knoth CORE (core.ac.uk) Knowledge Media institute, The Open University United Kingdom 2 OpenMinTeD Establish an open and sustainable

More information

SciMiner User s Manual

SciMiner User s Manual SciMiner User s Manual Copyright 2008 Junguk Hur. All rights reserved. Bioinformatics Program University of Michigan Ann Arbor, MI 48109, USA Email: juhur@umich.edu Homepage: http://jdrf.neurology.med.umich.edu/sciminer/

More information

Internet Information Server User s Guide

Internet Information Server User s Guide IBM Tioli Monitoring for Web Infrastructure Internet Information Serer User s Guide Version 5.1.0 SH19-4573-00 IBM Tioli Monitoring for Web Infrastructure Internet Information Serer User s Guide Version

More information

NCBI News, November 2009

NCBI News, November 2009 Peter Cooper, Ph.D. NCBI cooper@ncbi.nlm.nh.gov Dawn Lipshultz, M.S. NCBI lipshult@ncbi.nlm.nih.gov Featured Resource: New Discovery-oriented PubMed and NCBI Homepage The NCBI Site Guide A new and improved

More information

Exploring the Query Expansion Methods for Concept Based Representation

Exploring the Query Expansion Methods for Concept Based Representation Exploring the Query Expansion Methods for Concept Based Representation Yue Wang and Hui Fang Department of Electrical and Computer Engineering University of Delaware 140 Evans Hall, Newark, Delaware, 19716,

More information

A Technical Introduction to the Semantic Search Engine SeMedico

A Technical Introduction to the Semantic Search Engine SeMedico Talk in the Semesterprojekt Entwicklung einer Suchmaschine für Alternativmethoden zu Tierversuchen January 12, 2018 Humboldt-Universität zu Berlin A Technical Introduction to the Semantic Search Engine

More information

Renae Barger, Executive Director NN/LM Middle Atlantic Region

Renae Barger, Executive Director NN/LM Middle Atlantic Region Renae Barger, Executive Director NN/LM Middle Atlantic Region rbarger@pitt.edu http://nnlm.gov/mar/ DANJ Meeting, November 4, 2011 Advanced PubMed (20 min) General Information PubMed Citation Types Automatic

More information

What is Text Mining? Sophia Ananiadou National Centre for Text Mining University of Manchester

What is Text Mining? Sophia Ananiadou National Centre for Text Mining   University of Manchester National Centre for Text Mining www.nactem.ac.uk University of Manchester Outline Aims of text mining Text Mining steps Text Mining uses Applications 2 Aims Extract and discover knowledge hidden in text

More information

CTSA Program Common Metric for Informatics Solutions

CTSA Program Common Metric for Informatics Solutions CTSA Program Common Metric for Informatics Solutions KRISTI HOLMES, PHD DIRECTOR OF EVALUATION, NUCATS DIRECTOR, GALTER HEALTH SCIENCES LIBRARY & LEARNING CENTER NORTHWESTERN UNIVERSITY CTSA PROGRAM STEERING

More information

Guide to Managing Common Metadata

Guide to Managing Common Metadata IBM InfoSphere Information Serer Version 11 Release 3 Guide to Managing Common Metadata SC19-4297-01 IBM InfoSphere Information Serer Version 11 Release 3 Guide to Managing Common Metadata SC19-4297-01

More information

IBM Agent Builder Version User's Guide IBM SC

IBM Agent Builder Version User's Guide IBM SC IBM Agent Builder Version 6.3.5 User's Guide IBM SC32-1921-17 IBM Agent Builder Version 6.3.5 User's Guide IBM SC32-1921-17 Note Before you use this information and the product it supports, read the information

More information

Exploring and Exploiting the Biological Maze. Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix

Exploring and Exploiting the Biological Maze. Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix Exploring and Exploiting the Biological Maze Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix Motivation An abundance of biological data sources contain data about scientific entities, such as

More information

Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms

Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Yikun Guo, Henk Harkema, Rob Gaizauskas University of Sheffield, UK {guo, harkema, gaizauskas}@dcs.shef.ac.uk

More information

January 16, Re: Request for Comment: Data Access and Data Sharing Policy. Dear Dr. Selby:

January 16, Re: Request for Comment: Data Access and Data Sharing Policy. Dear Dr. Selby: Dr. Joe V. Selby, MD, MPH Executive Director Patient-Centered Outcomes Research Institute 1828 L Street, NW, Suite 900 Washington, DC 20036 Submitted electronically at: http://www.pcori.org/webform/data-access-and-data-sharing-policypublic-comment

More information

Making data publication a first class research output

Making data publication a first class research output Making data publication a first class research output Andrew L. Hufton Managing Editor, Scientific Data https://www.nature.com/sdata/ Helping Researchers Publish, University of Cambridge, Oct 2017 Launched

More information

Multi-Backpropagation Network In Medical Diagnosis

Multi-Backpropagation Network In Medical Diagnosis Multi-Bacpropagation Netor In Medical Diagnosis Wan Hussain Wan Isha, Fadilah Sira, Abu Talib Othman School of Information Technology, Uniersiti Utara Malaysia, 0600 Sinto, Kedah, MALAYSIA Email: {hussain;

More information

Quick Reference Guide. Biomedical Answers

Quick Reference Guide. Biomedical Answers Quick Reference Guide Biomedical Answers www.embase.com .... 3 - Homepage... 4.... 5 - Search Forms... 6 - Refine... 8 - Using Emtree... 9 3.... - Reviewing Records... - Preview Abstracts and Index Terms...

More information

Mass Spec Data Post-Processing Software. ClinProTools. Wayne Xu, Ph.D. Supercomputing Institute Phone: Help:

Mass Spec Data Post-Processing Software. ClinProTools. Wayne Xu, Ph.D. Supercomputing Institute   Phone: Help: Mass Spec Data Post-Processing Software ClinProTools Presenter: Wayne Xu, Ph.D Supercomputing Institute Email: Phone: Help: wxu@msi.umn.edu (612) 624-1447 help@msi.umn.edu (612) 626-0802 Aug. 24,Thur.

More information

Managing your metadata efficiently - a structured way to organise and frontload your analysis and submission data

Managing your metadata efficiently - a structured way to organise and frontload your analysis and submission data Paper TS06 Managing your metadata efficiently - a structured way to organise and frontload your analysis and submission data Kirsten Walther Langendorf, Novo Nordisk A/S, Copenhagen, Denmark Mikkel Traun,

More information

Web of Science. Platform Release Nina Chang Product Release Date: December 10, 2017 EXTERNAL RELEASE DOCUMENTATION

Web of Science. Platform Release Nina Chang Product Release Date: December 10, 2017 EXTERNAL RELEASE DOCUMENTATION Web of Science EXTERNAL RELEASE DOCUMENTATION Platform Release 5.27 Nina Chang Product Release Date: December 10, 2017 Document Version: 1.0 Date of issue: December 7, 2017 RELEASE OVERVIEW The following

More information

Tivoli IBM Tivoli Advanced Catalog Management for z/os

Tivoli IBM Tivoli Advanced Catalog Management for z/os Tioli IBM Tioli Adanced Catalog Management for z/os Version 2.2.0 Monitoring Agent User s Guide SC23-9818-00 Tioli IBM Tioli Adanced Catalog Management for z/os Version 2.2.0 Monitoring Agent User s Guide

More information

Funding from the Robert Wood Johnson Foundation s Public Health Services & Systems Research Program (grant ID #71597 to Martin and Birkhead)

Funding from the Robert Wood Johnson Foundation s Public Health Services & Systems Research Program (grant ID #71597 to Martin and Birkhead) 1 Funding from the Robert Wood Johnson Foundation s Public Health Services & Systems Research Program (grant ID #71597 to Martin and Birkhead) Coauthors: Gus Birkhead, Natalie Helbig, Jennie Law, Weijia

More information

Using Relations for Identification and Normalization of Disorders: Team CLEAR in the ShARe/CLEF 2013 ehealth Evaluation Lab

Using Relations for Identification and Normalization of Disorders: Team CLEAR in the ShARe/CLEF 2013 ehealth Evaluation Lab Using Relations for Identification and Normalization of Disorders: Team CLEAR in the ShARe/CLEF 2013 ehealth Evaluation Lab James Gung University of Colorado, Department of Computer Science Boulder, CO

More information

IBM Tivoli Storage Manager Version Optimizing Performance IBM

IBM Tivoli Storage Manager Version Optimizing Performance IBM IBM Tioli Storage Manager Version 7.1.6 Optimizing Performance IBM IBM Tioli Storage Manager Version 7.1.6 Optimizing Performance IBM Note: Before you use this information and the product it supports,

More information

IBM Netcool Operations Insight Version 1 Release 4.1. Integration Guide IBM SC

IBM Netcool Operations Insight Version 1 Release 4.1. Integration Guide IBM SC IBM Netcool Operations Insight Version 1 Release 4.1 Integration Guide IBM SC27-8601-08 Note Before using this information and the product it supports, read the information in Notices on page 403. This

More information

Biomedical literature mining for knowledge discovery

Biomedical literature mining for knowledge discovery Biomedical literature mining for knowledge discovery REZARTA ISLAMAJ DOĞAN National Center for Biotechnology Information National Library of Medicine Outline Biomedical Literature Access Challenges in

More information

DOCUMENT RETRIEVAL USING A PROBABILISTIC KNOWLEDGE MODEL

DOCUMENT RETRIEVAL USING A PROBABILISTIC KNOWLEDGE MODEL DOCUMENT RETRIEVAL USING A PROBABILISTIC KNOWLEDGE MODEL Shuguang Wang Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA swang@cs.pitt.edu Shyam Visweswaran Department of Biomedical

More information

Tools and Infrastructure for Supporting Enterprise Knowledge Graphs

Tools and Infrastructure for Supporting Enterprise Knowledge Graphs Tools and Infrastructure for Supporting Enterprise Knowledge Graphs Sumit Bhatia, Nidhi Rajshree, Anshu Jain, and Nitish Aggarwal IBM Research sumitbhatia@in.ibm.com, {nidhi.rajshree,anshu.n.jain}@us.ibm.com,nitish.aggarwal@ibm.com

More information

ClinVar. Jennifer Lee, PhD, NCBI/NLM/NIH ClinVar

ClinVar. Jennifer Lee, PhD, NCBI/NLM/NIH ClinVar ClinVar What is ClinVar ClinVar is a freely available, central archive for associating observed variation with supporting clinical and experimental evidence for a wide range of disorders. The database

More information

Retrieval of Highly Related Documents Containing Gene-Disease Association

Retrieval of Highly Related Documents Containing Gene-Disease Association Retrieval of Highly Related Documents Containing Gene-Disease Association K. Santhosh kumar 1, P. Sudhakar 2 Department of Computer Science & Engineering Annamalai University Annamalai Nagar, India. santhosh09539@gmail.com,

More information

DB2 Universal Database for z/os

DB2 Universal Database for z/os DB2 Uniersal Database for z/os Version 8 What s New? GC18-7428-02 DB2 Uniersal Database for z/os Version 8 What s New? GC18-7428-02 Note Before using this information and the product it supports, be sure

More information

iplanetwebserveruser sguide

iplanetwebserveruser sguide IBM Tioli Monitoring for Web Infrastructure iplanetwebsereruser sguide Version 5.1.0 SH19-4574-00 IBM Tioli Monitoring for Web Infrastructure iplanetwebsereruser sguide Version 5.1.0 SH19-4574-00 Note

More information