Core Technology Development Team Meeting

1 Core Technology Development Team Meeting
To hear the meeting, you must call in.
Toll-free phone number:
Access Code:
For international call-in numbers, please visit:

2 Agenda
- Updates on action items
- DataMed evaluation
- LinkOut update
- Inclusion of more repositories into DataMed: plan and course of action
- DataMed v1.5 release before the BD2K AHM
- Updates from all team members
Supported by the NIH grant 1U24 AI to the University of California, San Diego

3 Updates: Action Items
- Generate a HELP and FAQ page: please start adding material here ASAP
- Robust server to host bioCADDIE to be set up: ongoing work by Jeff and Claudiu
- Feedback on video
- Pilot project integration

4 Evaluation on Benchmark Datasets
Xiaoling Chen

5 Benchmark Datasets
- Datasets: (before v0.5)
- Repositories: 20
- Test queries: 15
- Index of benchmark dataset

6 Example of Query
Query 4: Find all data types related to inflammation during oxidative stress in human hepatic cells across all databases
Keyword query: inflammation oxidative stress human hepatic cells
Expanded query: inflammation oxidative stress human hepatic cells Chronic inflammatory reaction morphologic abnormality cell infiltration arthritis disorder Oxidative Stresses Human Homo sapiens organism Man Tympanic cells set Cellulae tympanicae Mother Cell Stem cell Unit Colony Forming Progenitor
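The expanded query above appends synonyms and related ontology terms to the keyword query. A minimal sketch of that step, assuming a hypothetical hard-coded synonym table (`SYNONYMS`) in place of whatever terminology or NLP service actually supplied the terms, and a hypothetical helper `expand_query`:

```python
import re

# Hypothetical synonym table; the real expansion terms came from a
# terminology/NLP service, not from a hard-coded dictionary.
SYNONYMS: dict[str, list[str]] = {
    "oxidative stress": ["Oxidative Stresses"],
    "human": ["Homo sapiens", "Man"],
}


def expand_query(keywords: str, synonyms: dict[str, list[str]]) -> str:
    """Append the synonyms of every phrase that occurs in the keyword query."""
    lowered = keywords.lower()
    extra: list[str] = []
    for phrase, alternatives in synonyms.items():
        # Match whole words only, so "man" would not match inside "human".
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            for alt in alternatives:
                if alt not in extra:
                    extra.append(alt)
    return " ".join([keywords] + extra)


print(expand_query("inflammation oxidative stress human hepatic cells", SYNONYMS))
# inflammation oxidative stress human hepatic cells Oxidative Stresses Homo sapiens Man
```

Appending rather than replacing keeps the original keywords in the query, so the expansion can only add candidate matches.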

7 Queries (natural query, followed by its keyword version)
1. Find protein sequencing data related to bacterial chemotaxis across all databases
   Keywords: Protein sequencing bacterial chemotaxis
2. Search for data of all types related to MIP-2 gene related to biliary atresia across all databases
   Keywords: Mip-2 biliary atresia
3. Search for all data types related to gene TP53INP1 in relation to p53 activation across all databases
   Keywords: TP53INP1 p53 activation
4. Find all data types related to inflammation during oxidative stress in human hepatic cells across all databases
   Keywords: inflammation oxidative stress human hepatic cells
5. Search for gene expression and genetic deletion data that mention CD69 in memory augmentation studies across all databases
   Keywords: CD69 memory augmentation
6. Search for data of all types related to the LDLR gene related to cardiovascular disease across all databases
   Keywords: LDLR gene cardiovascular disease
7. Search for gene expression datasets on photo transduction and regulation of calcium in blind D. melanogaster
   Keywords: photo transduction regulation calcium blind D melanogaster
8. Search for proteomic data related to regulation of calcium in blind D. melanogaster
   Keywords: proteomic data regulation calcium blind Drosophila melanogaster

8 Queries (continued)
9. Search for data of all types related to the ob gene in obese M. musculus across all databases
   Keywords: ob gene obese Mus musculus
10. Search for data of all types related to energy metabolism in obese M. musculus
   Keywords: energy metabolism obese Mus musculus
11. Search for all data for the HTT gene related to Huntington's disease across all databases
   Keywords: HTT gene Huntington disease
12. Search for data on neural brain tissue in transgenic mice related to Huntington's disease
   Keywords: neural brain tissue Huntington disease transgenic mice
13. Search for all data on the SNCA gene related to Parkinson's disease across all databases
   Keywords: SNCA gene Parkinson Disease
14. Search for data on nerve cells in the substantia nigra in mice across all databases
   Keywords: nerve cells substantia nigra mice

9 Which Records Are Annotated?
- Each query is run in two versions (the keyword version and the expanded query) against four search engines (Lucene, Indri, Terrier, SemanticVectors).
- The first 300 results from each search engine are combined, and duplicates are removed.
- For each query, at most 2,400 (300 × 4 × 2) records are therefore annotated.
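The pooling procedure can be sketched as follows. `build_pool`, `max_pool_size` and the toy run data are hypothetical; the sketch assumes each run is an ordered list of record ids and keeps the first occurrence of each id:

```python
# Hypothetical labels for the 4 engines x 2 query versions described above.
ENGINES = ["Lucene", "Indri", "Terrier", "SemanticVectors"]
VERSIONS = ["keywords", "expanded"]


def build_pool(runs: dict[tuple[str, str], list[str]], depth: int = 300) -> list[str]:
    """Merge the top `depth` record ids of every run, dropping duplicates.

    `runs` maps (engine, query_version) to a ranked list of record ids.
    """
    seen: set[str] = set()
    pool: list[str] = []
    for ranking in runs.values():
        for record_id in ranking[:depth]:
            if record_id not in seen:
                seen.add(record_id)
                pool.append(record_id)
    return pool


def max_pool_size(depth: int = 300) -> int:
    """Upper bound on records to annotate per query (reached only if runs never overlap)."""
    return depth * len(ENGINES) * len(VERSIONS)


toy_runs = {
    ("Lucene", "keywords"): ["r1", "r2", "r3"],
    ("Indri", "expanded"): ["r2", "r4"],
}
print(build_pool(toy_runs, depth=2))  # ['r1', 'r2', 'r4']
print(max_pool_size())                # 2400
```

Because overlapping results are annotated only once, the actual number of judged records per query is usually below the 2,400 bound.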

10 Gold Standard (Annotated Results)
Table columns: id, annotated, relevant, partially relevant, not relevant

11 trec_eval Tool
trec_eval is the standard tool used by the TREC community to evaluate an ad hoc retrieval run, given the results file and a standard set of judged results.
Metrics:
- infAP: inferred Average Precision. A measure commonly used by the information retrieval community, estimated from a random sample of judgments.
- infNDCG: inferred normalized discounted cumulated gain. A commonly used measure that incorporates graded relevance judgments.
- P@n: precision after n documents retrieved.
- Prec@rec (11 points): interpolated recall-precision averages at recall levels n = [0.00, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00].
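The non-inferred counterparts of these metrics can be sketched on a single fully judged ranking. This is not trec_eval's code: infAP and infNDCG additionally correct for sampled judgments, which this sketch does not do. The grade mapping (relevant = 2, partially relevant = 1, not relevant = 0) is an assumption based on the gold-standard labels:

```python
import math

# Assumed grade mapping from the gold-standard labels:
# relevant = 2, partially relevant = 1, not relevant = 0.


def precision_at(ranked: list[str], qrels: dict[str, int], n: int) -> float:
    """P@n: fraction of the top n retrieved ids with a grade above 0."""
    return sum(1 for d in ranked[:n] if qrels.get(d, 0) > 0) / n


def average_precision(ranked: list[str], qrels: dict[str, int]) -> float:
    """Plain (non-inferred) AP over a fully judged ranking."""
    num_rel = sum(1 for g in qrels.values() if g > 0)
    if num_rel == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if qrels.get(d, 0) > 0:
            hits += 1
            total += hits / rank
    return total / num_rel


def ndcg(ranked: list[str], qrels: dict[str, int]) -> float:
    """nDCG with linear gains and a log2(rank + 1) discount."""
    def dcg(grades: list[int]) -> float:
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(grades, start=1))

    ideal = sorted(qrels.values(), reverse=True)[: len(ranked)]
    idcg = dcg(ideal)
    return dcg([qrels.get(d, 0) for d in ranked]) / idcg if idcg > 0 else 0.0


def eleven_point_precision(ranked: list[str], qrels: dict[str, int]) -> list[float]:
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    num_rel = sum(1 for g in qrels.values() if g > 0)
    points = []  # (recall, precision) at each relevant hit
    hits = 0
    for rank, d in enumerate(ranked, start=1):
        if qrels.get(d, 0) > 0:
            hits += 1
            points.append((hits / num_rel, hits / rank))
    levels = [i / 10 for i in range(11)]
    # Interpolated precision: best precision at any recall >= the level.
    return [max((p for r, p in points if r >= lvl), default=0.0) for lvl in levels]


ranked = ["d1", "d2", "d3", "d4"]
qrels = {"d1": 2, "d2": 0, "d3": 1, "d5": 2}
print(precision_at(ranked, qrels, 2))             # 0.5
print(round(average_precision(ranked, qrels), 4))  # 0.5556
```

In the toy example, `d5` is relevant but never retrieved, which is why AP and the high-recall interpolated points stay below 1.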

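12 Search in the _all Field
Search in the _all field using the default TF-IDF algorithm in Elasticsearch (ES), retrieving the first 300 records.
Results table columns: query, infAP, infNDCG (all queries).
For comparison: TREC 2014 CDS track, 30 topics, top-ranked runs (infAP, infNDCG, P@n).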
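A sketch of the corresponding Elasticsearch search body, assuming an ES version that still supports the `_all` field (it was deprecated in 6.0); `all_field_query` is a hypothetical helper. Since ES 5.0 and later use BM25 as the default similarity, a TF-IDF setup implies an older cluster or an explicitly configured `classic` similarity:

```python
import json


def all_field_query(text: str, size: int = 300) -> dict:
    """Hypothetical helper: match `text` against _all and return the top `size` hits."""
    return {"size": size, "query": {"match": {"_all": text}}}


body = all_field_query("inflammation oxidative stress human hepatic cells")
# The body would be POSTed to an index's _search endpoint; the index name is not given here.
print(json.dumps(body, indent=2))
```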

13 Different Search Engines
Basic query: infAP and infNDCG for Elasticsearch, Lucene, Terrier, Indri and Semantic (SemanticVectors)
Expanded query: infAP and infNDCG for Elasticsearch, Lucene, Terrier, Indri and Semantic (SemanticVectors)

14 Fields
There are 187 unique fields in the benchmark dataset. Running a search on each field gives:
- 11 fields are in integer or date format and cannot be searched using terms (METADATA.FemaleNum, METADATA.dataItem.releaseDate, METADATA.dataItem.depositionDate, etc.)
- 68 fields do not return results (METADATA.organization.homePage, METADATA.dataset.dateAccession, METADATA.internal.rank, METADATA.datastandard.license, etc.)
- 108 fields return results
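A sketch of how the unique field paths might be enumerated and sorted by value type before running per-field searches. The record is hypothetical apart from the `METADATA.FemaleNum` and `METADATA.dataItem.releaseDate` paths named above (`METADATA.dataItem.title` is invented for illustration), and treating ISO-formatted strings as dates is an assumption that only loosely mirrors Elasticsearch's date detection:

```python
from datetime import date
from typing import Any, Iterator


def flatten(record: Any, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (dotted_path, leaf_value) pairs for a nested JSON-like record."""
    if isinstance(record, dict):
        for key, value in record.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(record, list):
        for item in record:  # list items share their parent's path
            yield from flatten(item, prefix)
    else:
        yield prefix, record


def value_kind(value: Any) -> str:
    """Rough type of a leaf value; ISO date strings count as dates (assumption)."""
    if isinstance(value, bool) or value is None:
        return "other"
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, str):
        try:
            date.fromisoformat(value)
            return "date"
        except ValueError:
            return "text"
    return "other"


def field_kinds(records: list[dict]) -> dict[str, set[str]]:
    """Collect the set of value kinds seen under each unique field path."""
    kinds: dict[str, set[str]] = {}
    for record in records:
        for path, value in flatten(record):
            kinds.setdefault(path, set()).add(value_kind(value))
    return kinds


records = [  # hypothetical record
    {"METADATA": {"FemaleNum": 12,
                  "dataItem": {"releaseDate": "2016-05-01", "title": "Hepatic cells"}}},
]
kinds = field_kinds(records)
text_fields = sorted(p for p, k in kinds.items() if "text" in k)
print(len(kinds), text_fields)  # 3 ['METADATA.dataItem.title']
```

This only separates text-searchable fields from numeric and date ones; whether a text field actually returns results still has to be checked by running the queries.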

15 Search on Important Fields
Results table columns: search field, infAP, infNDCG. Fields compared:
- TITLE
- description
- Multi_fields (title and description), multi_match, cross_fields
- Multi_fields (title and description), multi_match, cross_fields, title^
- _all
- _all (concatenating only title and description)
- _all (concatenating the 108 fields that returned results)
Special _all field: combines the original values from each field into one string. The distinction between field lengths disappears in the _all field, whereas in per-field scoring a match in a shorter field counts for more.
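The cross_fields variants could be expressed as an Elasticsearch `multi_match` query roughly as follows. The field names `title` and `description`, the helper `cross_fields_query`, and the example boost of 2 are assumptions, since the slide gives neither the actual mapping names nor the title boost value:

```python
from typing import Optional


def cross_fields_query(text: str, title_boost: Optional[float] = None, size: int = 300) -> dict:
    """Hypothetical helper: multi_match over title + description as one combined field.

    `title` and `description` are assumed field names; `title_boost` mirrors the
    "title^" variant, whose actual boost value is not given on the slide.
    """
    title = "title" if title_boost is None else f"title^{title_boost}"
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "fields": [title, "description"],
            }
        },
    }


plain = cross_fields_query("SNCA gene Parkinson Disease")
boosted = cross_fields_query("SNCA gene Parkinson Disease", title_boost=2)
```

The `cross_fields` type scores each query term against the listed fields as if they were one combined field, which is why it behaves differently from searching TITLE and description separately.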

16 Next Steps
- Try different similarity algorithms in ES (BM25, LMD, DFR, etc.)
- Explore different parameters in the query
- Try boosting the query components and query fields
- Expand synonyms using the NLP server
- Try different relationships between synonyms
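As a reference point for the BM25 experiment, a self-contained sketch of Okapi BM25 scoring over tokenized documents. It uses the `k1 = 1.2`, `b = 0.75` defaults that Elasticsearch also uses and a Lucene-style non-negative idf; `bm25_scores` and the toy documents are hypothetical, and this illustrates the formula rather than Elasticsearch's implementation:

```python
import math
from collections import Counter


def bm25_scores(query_terms: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized document for the query terms."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Lucene-style idf, which never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [["oxidative", "stress", "liver"], ["stress", "test"], ["liver", "cells", "liver"]]
print(bm25_scores(["liver"], docs))
```

Unlike classic TF-IDF, the contribution of repeated terms saturates as `tf` grows, and `b` controls how strongly long documents are penalized.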

17 LinkOut Update - Databases available for linking

18 LinkOut Update
- Meeting with Kathy Kwan
- PubMed to be the first try
- Kathy sent us the provider ID and FTP site
- We will provide sample data

19 Inclusion of More Repositories into DataMed: Plan and Course of Action
DK3ParoPWewow5lmqlGJKQc0/edit#gid=0
0BsVdI_tddAXZrXS5eQc8K8/edit#gid=
- Each site should get at least 2 repositories mapped per team per week
- Progress to be reported at every CDT meeting, with plans for the next set of repositories to map

20 DATS 2.1 Mapping
Teams: UCSD-DBMI, UTHealth, UCSD CRBS
Recently completed: LSDB, NDAR, HMP, American Gut Project (in EBI), IRD-JCVI; GeneNetwork Retina, EMDB, BMRB, TCGA, NBIA, Epigenomics, RGD, ClinVar, VectorBase, IntAct (10); YPED, UniProt-Swiss-Prot, CIL, NURSA, ICPSR, NeuroMorpho, OpenfMRI, NIDDK CR, PhysioBank, CIA-DataCite, ICPSR, PeptideAtlas, CVRG, Gemma, GEO, ArrayExpress, CTN, LINCS, PDB, NeuroVault Results, NeuroVault Atlas, NeuroVault Collections, MPD, ProteomeXchange, NITRC, ASCB Cell Image Library (25)
Currently working on: Nature Scientific; SRA, Dryad, Clinical Trials, dbGaP, Phenogen Informatics, Human Proteinpedia (6); UniProt TrEMBL, BioProject, DataCite Biomedical Repositories (30 selected based on content) (33)
Up next: Waiting for new assignment; EuPathDB, Diabetic Retinopathy Clinical Research Network, Diabetes Research in Children Network, Candida Genome Database (4); EU Clinical Trials Network, OmicsDI (4)
Waiting on repository response: SimTK, EuPathDB; Mendeley Data (OAuth connection), IMEx (waiting for source data feed), ImmPort (contacted source for feed info), Cancer Nanotechnology Laboratory portal (contacted source)

21 DataMed Ingestion
Teams: UCSD-DBMI, UTHealth, UCSD CRBS
Recently completed: LSDB, GeneNetwork Retina, EMDB; Epigenomics, ClinVar, BMRB, TCGA (4); YPED, UniProt-Swiss-Prot, CIL, NURSA, ICPSR, NeuroMorpho, OpenfMRI, NIDDK CR, PhysioBank, CIA-DataCite, ICPSR, PeptideAtlas, CVRG, Gemma, GEO, ArrayExpress, CTN, LINCS, PDB, NeuroVault Results, NeuroVault Atlas, NeuroVault Collections, MPD, ProteomeXchange, NITRC, ASCB Cell Image Library (25)
Currently working on: American Gut (EBI), HMP, NDAR; NSRR, RGD, VectorBase, IntAct (4); UniProt TrEMBL, BioProject, DataCite Biomedical Repositories (30 selected based on content) (33)
Up next: IRD (JCVI), Nature Scientific; Phenogen Informatics, Human Proteinpedia, Diabetic Retinopathy Clinical Research Network, Diabetes Research in Children Network, Candida Genome Database (5); EU Clinical Trials Network, OmicsDI (2)
Waiting on repository response: SimTK, EuPathDB; Mendeley Data (OAuth connection), IMEx (waiting for source data feed), ImmPort (contacted source for feed info), Cancer Nanotechnology Laboratory portal (contacted source)

22 DataMed Release
DataMed v1.5 release before the BD2K AHM:
- Increased number of repositories mapped to DATS 2.1: target ~40 repositories mapped to DATS 2.1
- Additional functions: sorting, visualization, user activity tracking, NLP at backend
- UI functionality needs testing before release: November 17th
- Release on November 22nd

23 GitHub Issues
- Total issues: 158 (58 open, 100 closed)
- Associated with v1.0: 12 open, 8 closed
- Usability issues: 23 open, 10 closed
- Associated with v0.5: 23 open, 63 closed
- Bugs: 5 open, 12 closed
- Enhancements: 21 open, 28 closed
- Questions: 9 open, 11 closed
- Help wanted: 3 open, 0 closed

24 Ongoing Work (Task: Status)
1 Metadata ingestion
1.1 Import repositories expansion: Ongoing
1.2 Data repository suggestion form at DataMed (George/Xiaoling/Sanda): Ongoing
1.3 Metadata mapping review/reconciliation between curators: Ongoing
1.4 Metadata management: Ongoing
1.5 Indexing: Ongoing
1.6 NLP-based indexing (gene/protein, disease, drug/chemical, biological process, organism, format, access, cell types): Implemented at backend
1.7 Bulk download of indices: Not started
2 Terminology server
2.3 Integrate terminology server (indexing): Ongoing
4 Interface design
4.2 Design interface usability issues: Ongoing
4.5 Display most accessed datasets: Not started

25 Ongoing Work (Task: Status)
5 Personalized search
5.1 Improve the tracking system: Ongoing
6 Searching/ranking algorithms
6.1 Similar datasets to be expanded: Ongoing
7 Display of results
7.1 Sort datasets by author, published date, repository, title: Ongoing
7.2 What fields should be displayed?: Ongoing
Additional filters: file type; data restrictions (data use agreement, restricted, unrestricted); data level (participant/aggregate)
7.3 Population (mouse, human, etc.)
8 Link to external resources
1. PubMed: click through to PubMed records of citing publications; copy citation to clipboard
2. Scholix Framework for Linking Data and Literature
3. LinkOut
Status: Not started; Not started

26 Ongoing Work (Task: Status)
10 Documentation
10.1 Source code: Ongoing
10.2 Tutorials: Not started
10.3 Help menu: Ongoing
10.4 Video: Ongoing
11 Usability studies
11.2 User studies: Ongoing
Data duplication issue: create a plan for how best to display/represent duplicates in the metadata records, and set up a meeting to discuss the workflow for displaying them (Jeff/Anu)
Additional field in index
Generation of benchmark for the dataset: Completed
14 Relationship network graph
15 Collaborative research support

27 Other Issues
Please deposit code in GitHub. Please contact me at if you need access. hp
Any other issues?
Thank you

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Executive Committee Meeting

Executive Committee Meeting Executive Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Executive Committee Meeting

Executive Committee Meeting Executive Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Metadata Ingestion and Processinng

Metadata Ingestion and Processinng biomedical and healthcare Data Discovery Index Ecosystem Ingestion and Processinng Jeffrey S. Grethe, Ph.D. 2017 BioCADDIE All Hands Meeting prototype Ingestion Indexing Repositories Ingestion ElasticSearch

More information

Executive Committee Meeting

Executive Committee Meeting Executive Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Executive Committee Meeting

Executive Committee Meeting Executive Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Steering Committee Meeting

Steering Committee Meeting Steering Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Steering Committee Meeting

Steering Committee Meeting Steering Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

Steering Committee Meeting

Steering Committee Meeting Steering Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please isit: https://www.readytalk.com/account-administration/international-numbers

More information

Multi-field query expansion is effective for biomedical dataset retrieval

Multi-field query expansion is effective for biomedical dataset retrieval Database, 2017, 1 20 doi: 10.1093/database/bax062 Original article Original article Multi-field query expansion is effective for biomedical dataset retrieval Mohamed Reda Bouadjenek* and Karin Verspoor

More information

Agenda. Clarification of issues Quarter definition Steering and Executive Committee composition Dissemination and community outreach activities

Agenda. Clarification of issues Quarter definition Steering and Executive Committee composition Dissemination and community outreach activities Agenda Clarification of issues Quarter definition Steering and Executive Committee composition Dissemination and community outreach activities Progress and updates Y1Q3 and plans for Y1Q4 Plan for the

More information

Core Technology Development Team Meeting

Core Technology Development Team Meeting Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

The Final Updates. Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University of Oxford, UK

The Final Updates. Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University of Oxford, UK The Final Updates Supported by the NIH grant 1U24 AI117966-01 to UCSD PI, Co-Investigators at: Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University

More information

Minutes. Date: Location: UCSD BRF2 5A03. Attendees Present

Minutes. Date: Location: UCSD BRF2 5A03. Attendees Present Executive Committee Meeting Location: UCSD BRF2 5A03 Date: 8-16-16 Start time: 10:00 am PDT End time: 11:30 am PDT Meeting Objective Attendees Present Minute Taker Executive Committee Meeting UCSD: Lucila

More information

Steering Committee Meeting

Steering Committee Meeting Steering Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

More information

eveloping DataMed the current status

eveloping DataMed the current status eeloping DataMed the current status Hua Xu Core Deelopment Team (CDT) biocaddie AHM 2017 8/8/17 Supported by the NIH grant 1U24 AI117966-01 to the Uniersity of California, San Diego 1 Outline CDT Roles

More information

Exercises. Biological Data Analysis Using InterMine workshop exercises with answers

Exercises. Biological Data Analysis Using InterMine workshop exercises with answers Exercises Biological Data Analysis Using InterMine workshop exercises with answers Exercise1: Faceted Search Use HumanMine for this exercise 1. Search for one or more of the following using the keyword

More information

Metadata Discovery and Integration to Support Repurposing of Heterogeneous Data using the OpenFurther Platform

Metadata Discovery and Integration to Support Repurposing of Heterogeneous Data using the OpenFurther Platform Metadata Discovery and Integration to Support Repurposing of Heterogeneous Data using the OpenFurther Platform biocaddie All Hands Meeting September 11 th, 2016 Ram Gouripeddi & Julio Facelli Department

More information

Exploring and Exploiting the Biological Maze. Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix

Exploring and Exploiting the Biological Maze. Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix Exploring and Exploiting the Biological Maze Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix Motivation An abundance of biological data sources contain data about scientific entities, such as

More information

NCBI News, November 2009

NCBI News, November 2009 Peter Cooper, Ph.D. NCBI cooper@ncbi.nlm.nh.gov Dawn Lipshultz, M.S. NCBI lipshult@ncbi.nlm.nih.gov Featured Resource: New Discovery-oriented PubMed and NCBI Homepage The NCBI Site Guide A new and improved

More information

Tutorial:OverRepresentation - OpenTutorials

Tutorial:OverRepresentation - OpenTutorials Tutorial:OverRepresentation From OpenTutorials Slideshow OverRepresentation (about 12 minutes) (http://opentutorials.rbvi.ucsf.edu/index.php?title=tutorial:overrepresentation& ce_slide=true&ce_style=cytoscape)

More information

Alternative Tools for Mining The Biomedical Literature

Alternative Tools for Mining The Biomedical Literature Yale University From the SelectedWorks of Rolando Garcia-Milian May 14, 2014 Alternative Tools for Mining The Biomedical Literature Rolando Garcia-Milian, Yale University Available at: https://works.bepress.com/rolando_garciamilian/1/

More information

Harmonizing biocaddie Metadata Schemas for Indexing Clinical Research Datasets Using Semantic Web Technologies

Harmonizing biocaddie Metadata Schemas for Indexing Clinical Research Datasets Using Semantic Web Technologies Harmonizing biocaddie Metadata Schemas for Indexing Clinical Research Datasets Using Semantic Web Technologies Harold R. Solbrig 1, Guoqian Jiang 1 1 Mayo Clinic College of Medicine, Rochester, MN [solbrig.harold,

More information

Mouse BIRN Data Integration. Maryann Martone Mouse All Hands Meeting

Mouse BIRN Data Integration. Maryann Martone Mouse All Hands Meeting Mouse BIRN Data Integration Maryann Martone 2005 Mouse All Hands Meeting Specific Aims Specific Aim 1: Data Access and Management Continue development of multi-scale databases along existing lines extending

More information

Presenter: Payam Karisani

Presenter: Payam Karisani Presenter: Payam Karisani Team members: Payam Karisani, CS Ph.D. Student (Team lead) Eugene Agichtein, Associate Professor/Advisor Intelligent Information Access Laboratory (IR Lab) Computer Science &

More information

Prototyping a Biomedical Ontology Recommender Service

Prototyping a Biomedical Ontology Recommender Service Prototyping a Biomedical Ontology Recommender Service Clement Jonquet Nigam H. Shah Mark A. Musen jonquet@stanford.edu 1 Ontologies & data & annota@ons (1/2) Hard for biomedical researchers to find the

More information

Ontology-based annotation of multiscale imaging data: Utilizing and building the Neuroscience Information Framework. Maryann E.

Ontology-based annotation of multiscale imaging data: Utilizing and building the Neuroscience Information Framework. Maryann E. Ontology-based annotation of multiscale imaging data: Utilizing and building the Neuroscience Information Framework Maryann E. Martone University of California, San Diego What does this mean? 3D Volumes

More information

Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms

Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Yikun Guo, Henk Harkema, Rob Gaizauskas University of Sheffield, UK {guo, harkema, gaizauskas}@dcs.shef.ac.uk

More information

Minimal Metadata Standards and MIIDI Reports

Minimal Metadata Standards and MIIDI Reports Dryad-UK Workshop Wolfson College, Oxford 12 September 2011 Minimal Metadata Standards and MIIDI Reports David Shotton, Silvio Peroni and Tanya Gray Image BioInformatics Research Group Department of Zoology

More information

Biomedical literature mining for knowledge discovery

Biomedical literature mining for knowledge discovery Biomedical literature mining for knowledge discovery REZARTA ISLAMAJ DOĞAN National Center for Biotechnology Information National Library of Medicine Outline Biomedical Literature Access Challenges in

More information

Tools for Researchers

Tools for Researchers University of Miami Scholarly Repository Faculty Research, Publications, and Presentations Department of Health Informatics 1-1-2015 Tools for Researchers Carmen Bou-Crick M.S.L.S. University of Miami,

More information

A Data Citation Roadmap for Scholarly Data Repositories

A Data Citation Roadmap for Scholarly Data Repositories A Data Citation Roadmap for Scholarly Data Repositories Tim Clark (Harvard Medical School & Massachusetts General Hospital) Martin Fenner (DataCite) Mercè Crosas (Institute for Quantiative Social Science,

More information

SNUMedinfo at TREC CDS track 2014: Medical case-based retrieval task
