Core Technology Development Team Meeting


1 Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: Access Code: For international call in numbers, please visit:

2 Agenda
- Updates on action items
- Re-indexing run summary
- Workflow outline
- Tests and verifications procedure
- DataMed Evaluation
- Updates from all team members
Supported by the NIH grant 1U24 AI to the University of California, San Diego

3 Updates - action items
- Generate a HELP and FAQ page
  - Please start adding material here ASAP; FAQ from review is being generated: https://docs.google.com/document/d/1sczm976jc8wn hyfvysokvwuucjojdg914lqaftraiqg/edit
- Complete re-indexing before with updated NLP pipeline
- Testing to begin on February 16th

4 Updates - Visualization Supplement
- The Chrome plugin is complete
- They will present at the CDT meeting next week

5 Overview of Re-Indexing January 2017

6 Overview of Process
- All current sources that were already indexed were included in the re-indexing run
  - 58 Repositories
- All sources run through in sequential order
  - Utilized 1 development node for the processing (AWS)
- Re-indexing process added new data
  - Did not fully transform data that was already indexed
  - Test of updating functionality

7 Sources

8 Overall Statistics
- Number of Sources: 58
- Total Number of Metadata Records: 77,085,123
  - Approximately 3X size of PubMed
- Total Size of Metadata: GB
- Average Record Size
  - Average: 0.89 KB
  - Median: 2.89 KB
  - Min: 0.57 KB

9 Source Statistics (cont.)
- Minimum number of records: 1
- Maximum number of records: 74,809,080
  - UniProt TrEMBL
- Connectors utilized:
  - Rsync
  - CSV
  - FTP
  - XML
  - OAI
  - WEB
  - ASPERA
  - Community Aggregator

10 Timing Statistics
- Total Ingestion Time: 23 Hours, 35 Minutes
  - 77+ Million Records
  - Did not include data download time
- Ingestion rate: 224 documents per second
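
The timing figures can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are not from the slides, and the inputs are the record count and timing values as transcribed here. At 224 documents per second, 77,085,123 records take roughly 95.6 hours, so the transcribed total time and the stated rate do not agree with each other and one of the two figures may be incomplete.

```python
# Figures quoted on the Overall Statistics and Timing Statistics slides.
TOTAL_RECORDS = 77_085_123
STATED_RATE = 224  # documents per second


def docs_per_second(records: int, hours: int, minutes: int = 0) -> float:
    """Average throughput for a run of the given wall-clock length."""
    seconds = hours * 3600 + minutes * 60
    return records / seconds


def hours_at_rate(records: int, rate: float) -> float:
    """Hours needed to ingest `records` documents at `rate` documents/second."""
    return records / rate / 3600


if __name__ == "__main__":
    print(f"{docs_per_second(TOTAL_RECORDS, 23, 35):.0f} docs/s over 23 h 35 min")
    print(f"{hours_at_rate(TOTAL_RECORDS, STATED_RATE):.1f} h at {STATED_RATE} docs/s")
```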

11 Source Timing Trembl

12 Lessons Learned
- For some sources the actual document retrieval takes as long as processing
  - Documents begin processing once downloaded, so a document can finish processing by the time the next document is downloaded
- Current test ran with 1 ingestion and 1 processing node
  - Can enhance performance by running sources in parallel
- Improvements to management functions
  - Need to enhance and improve counting of records being processed (as they flow through the message queues)
  - Additional time checks can be added with improved counting
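
The overlap between download and processing, and the per-stage record counting, can be sketched with a bounded queue between two threads. This is a minimal illustration and not the project's pipeline: the real system runs message queues across separate ingestion and processing nodes, and `run_pipeline`, `fetch` and `process` are hypothetical names. Error handling in the processing stage is left out.

```python
import queue
import threading
from typing import Any, Callable, Dict, Iterable, List, Tuple

_DONE = object()  # sentinel marking the end of the document stream


def run_pipeline(
    doc_ids: Iterable[Any],
    fetch: Callable[[Any], Any],
    process: Callable[[Any], Any],
    max_pending: int = 8,
) -> Tuple[List[Any], Dict[str, int]]:
    """Download and process documents concurrently, counting each stage."""
    pending: "queue.Queue[Any]" = queue.Queue(maxsize=max_pending)
    counts = {"downloaded": 0, "processed": 0}  # each key has a single writer
    results: List[Any] = []

    def downloader() -> None:
        try:
            for doc_id in doc_ids:
                pending.put(fetch(doc_id))  # blocks while the queue is full
                counts["downloaded"] += 1
        finally:
            pending.put(_DONE)  # always let the processor terminate

    def processor() -> None:
        while True:
            doc = pending.get()
            if doc is _DONE:
                return
            results.append(process(doc))
            counts["processed"] += 1

    workers = [threading.Thread(target=downloader), threading.Thread(target=processor)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results, counts


if __name__ == "__main__":
    docs, stage_counts = run_pipeline([1, 2, 3], lambda i: f"doc{i}", str.upper)
    print(docs, stage_counts)
```

Comparing the two counters while a run is in progress shows how far processing lags behind download, which is where additional time checks could be attached.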

13 Next Tests
1) Full update of all records (simulating update to transformations or update to ontology)
2) Full update with new NLP module

14 Workflow of adding/updating a repository
Elasticsearch endpoints:
- Jeff's team (biocaddie.scicrunch.io)
- UTHealth (129.***.**.121)
- UCSD (192.***.***.107)
Steps:
- Manually dump index to ES endpoint at UTHealth
- At UTHealth:
  1. Manually update the mapping
     - Add not_analyzed field
     - Add format field to date
  2. Update source code of the UI
- Internal biocaddie website: UTHealth (datamedbeta.biocaddie.org)
- Manually dump updated index to ES endpoint at UCSD
- Update source code of the UI
- Public biocaddie website: UCSD (datamed.org)

15 Workflow of Re-Indexing (i.e. no change to Transformation)
Elasticsearch endpoints:
- Curation Team (biocaddie.scicrunch.io)
- UCSD (192.***.***.107)
- UCSD (192.***.***.107)
Steps:
- Transfer date-stamped index to ES endpoint at UCSD with same ES mappings
- Testing Computer:
  1. Run automated tests of index to verify index and record results
  2. Dashboard provides reviewer with results and provides links for easy review
  3. Reviewer makes go / no-go decision
- Go: do ES alias swap to make new index the default
- No-Go
- Public biocaddie website: UCSD (datamed.org)
JFIuQAiq7nM/edit
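
The go / no-go gate and the alias swap can be sketched as follows. This is an illustration under stated assumptions: the index names, the alias name and the check names are hypothetical, and on the slide the decision is made by a human reviewer, so `go_no_go` only stands in for a summary of the automated test results. The returned dictionary follows the request body of Elasticsearch's `POST /_aliases` endpoint, where a `remove` and an `add` action sent together move the alias in one atomic step.

```python
from typing import Dict, List


def go_no_go(test_results: Dict[str, bool]) -> bool:
    """Return True (go) only when there are results and every check passed."""
    return bool(test_results) and all(test_results.values())


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> Dict[str, List[dict]]:
    """Build the body for `POST /_aliases` that repoints `alias` to `new_index`."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: the slides give no index, alias or check names.
    results = {"record_count_matches": True, "sample_queries_return_hits": True}
    if go_no_go(results):
        print(alias_swap_actions("datamed", "datamed_20170101", "datamed_20170201"))
    else:
        print("No-go: the current index stays the default")
```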

16 Update the mapping

17 Python code to update the mapping New mapping
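
The code and mapping shown on this slide were not captured in the transcript. As a stand-in, here is a minimal sketch of a mapping body with the two changes named in the workflow: a `not_analyzed` field and a `format` on date fields. It assumes Elasticsearch 1.x/2.x syntax, where `"index": "not_analyzed"` on a `string` field is valid (later versions use the `keyword` type instead). The field names, the `raw` sub-field name and the date format are hypothetical, and the function is not the team's code.

```python
import json
from typing import Dict, Iterable


def build_mapping(
    text_fields: Iterable[str],
    date_fields: Iterable[str],
    date_format: str = "yyyy-MM-dd",
) -> Dict[str, dict]:
    """Build a mapping body in Elasticsearch 1.x/2.x syntax."""
    properties: Dict[str, dict] = {}
    for name in text_fields:
        properties[name] = {
            "type": "string",
            # Unanalyzed sub-field for exact matching, sorting and facets.
            "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
        }
    for name in date_fields:
        properties[name] = {"type": "date", "format": date_format}
    return {"properties": properties}


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    print(json.dumps(build_mapping(["title"], ["dateReleased"]), indent=2, sort_keys=True))
```

A body like this would be sent to the put-mapping API of the target index; that call is omitted here so the example runs without a cluster.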

18 DataMed Search Evaluation Xiaoling Chen

19 Search results from 43 repositories in DataMed V1.5 were evaluated between 1/17/17 and 1/31/17. One query was picked for each repository.
What was compared:
- Number of total datasets (DataMed vs primary website)
- Number of results returned for the query (DataMed vs primary website)
- Top 10 annotation (relevant vs not relevant)
- Number of overlapping results in the top 10 of DataMed and the primary website

20 Results Overview
- Repositories which return similar numbers (19)
- Repositories which return different numbers (9)
- Repositories which have granularity problem (7)
- Repositories which we cannot compare (8)

21 Repositories which return similar numbers
Columns: # | Repo | Query | DataMed total included | Primary website total included | DataMed results returned | Primary results returned | DataMed relevant in top 10 | Primary website relevant in top 10 | Overlap in top 10
Rows:
- 1 NIDDKCR: TRANSPLANTATION, NA
- 2 CIL: MCF10A, NA
- 3 Peptideatlas: Jurkat, 76, NA
- 4 RGD: p, NA
- 5 CTN: pain
- 6 TCIA: dcis, 85, NA
- 7 YPED: alzheimer
- 8 OpenFMRI: Brain injury
- 9 NURSA: Breast cancer
- 10 ICPSR: voting

22 Repositories which return similar numbers (continued)
Columns: No | Repo | Query | DataMed total included | Primary website total included | DataMed results returned | Primary website results returned | DataMed relevant in top 10 | Primary website relevant in top 10 | Overlap in top 10 | Note
Rows:
- 11 PDB: lactamase
- 12 Bioproject: COPD, NA
- 13 LINCS: Sk-br-3
- 14 Neuromorpho: amygdala
- 15 CVRG: Heart failure
- 16 Proteomexchange: Kinase AND proteomic
- 17 GEMMA: Multiple sclerosis
- 18 Dryad: epilepsy
- 19 Neurovault:cols: brain, NA
Notes:
- We may only index part of their dataset
- We ingest at data package level and data file level
- We index some users' temporary collections

23 Repositories which return different numbers
Columns: No | Repo | Query | DataMed total included | Primary website total included | DataMed results returned | Primary results returned | DataMed relevant in top 10 | Primary website relevant in top 10 | Overlap in top 10 | Note
Rows:
- 1 ClinicalTrials: melanoma. Note: Search algorithm is different. DataMed uses synonyms to search but ClinicalTrials maybe not
- 2 Physiobank: ECG, 78, NA. Note: They also return results from physiotools, etc.
- 3 BMRB: Troponin c. Note: Their search function is confusing and shows duplicate results
- 4 Dataverse: Blood pressure. Note: Search algorithm is different
- 5 Arrayexpress: mda. Note: Search algorithm is different. ES treats mda-231 as two words; a search for mda-231 returns 18 results and most of them overlap
- 6 Clinvar: Familial Hemophagocytic Lymphohistiocytosis, NA. Note: The primary website has richer information
- 7 Zenodo: Cell proliferation. Note: Search algorithm is different
- 8 Morphobank: lemur. Note: We index projects and matrix data; the results returned by the primary website contain "lemurs", but DataMed does not match "lemur" to "lemurs". Need to expand via terminology server
- 9 Swissprot: Estrogen, 1490, NA. Note: The primary website has richer information. We do not include protein, gene information
Unplaced cell text: ", and 958 in process"
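
Two of the notes above, a hyphenated term being split into two words and "lemur" not matching "lemurs", can be reproduced with a toy matcher. This is only an approximation written for illustration: it is neither Elasticsearch's analyzer nor the project's terminology server, and the `variants` lookup table is a hypothetical stand-in for whatever expansion such a server would supply.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not an ASCII letter or digit."""
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", text.lower()) if tok]


def expand(term: str, variants: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with any variants from the lookup table."""
    term = term.lower()
    return {term} | variants.get(term, set())


def matches(query: str, document: str, variants: Dict[str, Set[str]]) -> bool:
    """True when every query token, or one of its variants, occurs in the document."""
    doc_tokens = set(tokenize(document))
    return all(expand(tok, variants) & doc_tokens for tok in tokenize(query))


if __name__ == "__main__":
    print(tokenize("MDA-231"))  # the hyphen splits the term into two tokens
    title = "Ring-tailed lemurs of Madagascar"
    print(matches("lemur", title, {}))                     # no expansion
    print(matches("lemur", title, {"lemur": {"lemurs"}}))  # with expansion
```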

24 Repositories which have granularity problem
Columns: No | Repository | Primary website | DataMed
- 1 Vectorbase. Primary website: including multiple domains (genome, expression, ontology, proteome, etc). DataMed: 1445 datafiles
- 2 Peerj. Primary website: can only search publications. DataMed: we just index supporting data sets
- 3 NITRCIR. Primary website: 14 Projects, 6845 Subjects, and 8285 Imaging Sessions. DataMed: 1391 sessions
- 4 RETINA. Primary website: search retinas for cells or strata, no keyword search function. Contact repo for meta, repo has mysql database available for download. DataMed: 384 data
- 5 MPD. Primary website: only return QTL, granularity is different. DataMed: we are ingesting in individual mouse level (235)
- 6 GEO. Primary website: they index in dataset level and sample level. DataMed: we are ingesting in dataset level
- 7 GND. Primary website: no keyword search for projects, can only search public datafiles. DataMed: index in project level

25 Repositories which we cannot compare
Columns: No | Repository | Primary website | DataMed
- 1 EMDB. Primary website: no keyword search function
- 2 Epigenomics. Primary website: no keyword search function
- 3 GDC. Primary website: no keyword search function
- 4 Neurovault:atlases. Primary website: no keyword search function
- 5 Neurovault:nidm. Primary website: no keyword search function
- 6 BGI. Primary website: It is in Chinese
- 7 DBGAP. DataMed: Our index is quite old and not in DATS2.1
- 8 SRA. DataMed: Our index is quite old and not in DATS2.1

26 Summary
- We evaluated search results from 43 individual repositories in DataMed and on their primary websites.
- Search results for 28 repositories were compared and the top 10 were annotated.
  - DataMed: 188 relevant / 193 = 97.41%
  - Primary website: 201 relevant / 218 = 92.20%
  - Reasons for different numbers:
    - Index scope is different (5 repositories)
    - Search algorithm is different (4 repositories)
    - Metadata in primary website is richer (2 repositories)
- 7 repositories have granularity problem.
- 8 repositories cannot be compared.
- What we can do to improve:
  - Improve terminology server (e.g. lemur to lemurs)
  - Contact repositories for the granularity problem.
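
The two measures used in the evaluation, share of relevant results among the annotated top 10 and overlap between the two top-10 lists, reduce to a few lines. The sketch below is illustrative: the function names and the sample identifiers are hypothetical, while the counts passed to `relevant_percentage` are the ones reported on this slide.

```python
from typing import Sequence


def relevant_percentage(relevant: int, annotated: int) -> float:
    """Share of annotated results judged relevant, as a percentage."""
    return 100.0 * relevant / annotated


def overlap_in_top_k(first: Sequence[str], second: Sequence[str], k: int = 10) -> int:
    """Number of identifiers that appear in the top k of both result lists."""
    return len(set(first[:k]) & set(second[:k]))


if __name__ == "__main__":
    print(f"DataMed: {relevant_percentage(188, 193):.2f}%")
    print(f"Primary website: {relevant_percentage(201, 218):.2f}%")
    # Hypothetical dataset identifiers.
    print(overlap_in_top_k(["a", "b", "c"], ["c", "d", "a"]))
```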

27 GitHub Issues
- Total Issues: 239
- Number Open: 112
- Number Closed: 127

28 Ongoing work
Task: Status
1 Metadata Ingestion
  1.1 Import repositories expansion: Ongoing
  1.2 Data repository suggestion form at DataMed (George/Xiaoling/Sanda): Ongoing
  1.3 Metadata mapping review/reconciliation between curators: Ongoing
  1.4 Metadata management: Ongoing
  1.5 Indexing: Ongoing
  1.6 NLP-based/terminology server indexing (Gene/protein, Disease, Drug/chemical, Biological process, Organism, Format, Access, Cell types): Implemented at backend
  1.7 Bulk download of indices: Not Started
2 Interface Design
  2.1 Design interface usability issues: Ongoing
  2.2 Display most accessed datasets: Not Started

29 Ongoing work
Task: Status
3 Personalized search
  3.1 Improve the tracking system: Ongoing
4 Searching/Ranking algorithms
  4.1 Similar datasets to be expanded: Ongoing
5 Display of results
  5.1 Sort datasets by author, published date, repository, title: Ongoing
  5.2 What fields should be displayed?: Ongoing
    Additional filters:
    - File type
    - Data Restrictions (data use agreement, restricted, unrestricted)
    - Data Level (participant/aggregate)
  5.3 Population (mouse, human, etc)
6 Link to external resources
  1. PubMed: click through to PubMed records of citing publications; copy citation to clipboard
  2. Scholix Framework for Linking Data and Literature
  3. Linkout
  Status: Not started; Not Started; Implemented

30 Ongoing work
Task: Status
7 Documentation
  7.1 Source code: Ongoing
  7.2 Tutorials: Not Started
  7.3 Help menu: Ongoing
  7.4 Video: Ongoing
8 Usability studies
  8.1 User studies: Ongoing
9 Data Duplication issue: best display/representation of the duplicate in the metadata records and workflow for displaying the duplicates in the metadata records (Jeff/Anu)
  - Additional field in index tagging the dataset as duplicate
  - Display the dataset and list repos underneath
10 Relationship Network Graph
11 Collaborative research support

31 Other issues
- Please deposit code in GitHub. Please contact me at if you need access hp
- Any other issues?
Thank You


Interviewee 2 I work on various bioinformatics projects, mostly looking at database integration. Interview Transcript Interview reference: Biochemistry 10 Role: Post doc researchers from the same lab group Interview length: 1hr 13mins Format: Face to face Number of interviewees: 2 Questionnaire respondent?

More information

What is Text Mining? Sophia Ananiadou National Centre for Text Mining University of Manchester

What is Text Mining? Sophia Ananiadou National Centre for Text Mining   University of Manchester National Centre for Text Mining www.nactem.ac.uk University of Manchester Outline Aims of text mining Text Mining steps Text Mining uses Applications 2 Aims Extract and discover knowledge hidden in text

More information

Introduction to Systems Biology II: Lab

Introduction to Systems Biology II: Lab Introduction to Systems Biology II: Lab Amin Emad NIH BD2K KnowEnG Center of Excellence in Big Data Computing Carl R. Woese Institute for Genomic Biology Department of Computer Science University of Illinois

More information

User guide for GEM-TREND

User guide for GEM-TREND User guide for GEM-TREND 1. Requirements for Using GEM-TREND GEM-TREND is implemented as a java applet which can be run in most common browsers and has been test with Internet Explorer 7.0, Internet Explorer

More information

Reproducible & Transparent Computational Science with Galaxy. Jeremy Goecks The Galaxy Team

Reproducible & Transparent Computational Science with Galaxy. Jeremy Goecks The Galaxy Team Reproducible & Transparent Computational Science with Galaxy Jeremy Goecks The Galaxy Team 1 Doing Good Science Previous talks: performing an analysis setting up and scaling Galaxy adding tools libraries

More information

Network Analysis, Visualization, & Graphing TORonto (NAViGaTOR) User Documentation

Network Analysis, Visualization, & Graphing TORonto (NAViGaTOR) User Documentation Network Analysis, Visualization, & Graphing TORonto (NAViGaTOR) User Documentation Jurisica Lab, Ontario Cancer Institute http://ophid.utoronto.ca/navigator/ November 10, 2006 Contents 1 Introduction 2

More information

SEEK User Manual. Introduction

SEEK User Manual. Introduction SEEK User Manual Introduction SEEK is a computational gene co-expression search engine. It utilizes a vast human gene expression compendium to deliver fast, integrative, cross-platform co-expression analyses.

More information

NCBO Technology: Powering semantically aware applications

NCBO Technology: Powering semantically aware applications JOURNAL OF BIOMEDICAL SEMANTICS PROCEEDINGS Open Access NCBO Technology: Powering semantically aware applications Patricia L Whetzel 1*, NCBO Team 1,2,3,4 From Bio-Ontologies 2012 Long Beach, CA, USA.

More information

Supplementary Note-- Williams et al The Image Data Resource: A Bioimage Data Integration and Publication Platform

Supplementary Note-- Williams et al The Image Data Resource: A Bioimage Data Integration and Publication Platform Supplementary Note-- Williams et al The Image Data Resource: A Bioimage Data Integration and Publication Platform 1. Exploring the IDR This current IDR web user interface (WUI) is based on the open source

More information

Relational Retrieval Using a Combination of Path-Constrained Random Walks

Relational Retrieval Using a Combination of Path-Constrained Random Walks Relational Retrieval Using a Combination of Path-Constrained Random Walks Ni Lao, William W. Cohen University 2010.9.22 Outline Relational Retrieval Problems Path-constrained random walks The need for

More information

Introduction to The Storage Resource Broker

Introduction to The Storage Resource Broker http://www.nesc.ac.uk/training http://www.ngs.ac.uk Introduction to The Storage Resource Broker http://www.pparc.ac.uk/ http://www.eu-egee.org/ Policy for re-use This presentation can be re-used for academic

More information

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond Alessia Bardi and Paolo Manghi, Institute of Information Science and Technologies CNR Katerina Iatropoulou, ATHENA, Iryna Kuchma and Gwen Franck, EIFL Pedro Príncipe, University of Minho OpenAIRE Fostering

More information

Semantic MediaWiki (SMW) for Scientific Literature Management

Semantic MediaWiki (SMW) for Scientific Literature Management Semantic MediaWiki (SMW) for Scientific Literature Management Bahar Sateli, René Witte Semantic Software Lab Department of Computer Science and Software Engineering Concordia University, Montréal SMWCon

More information

Heiðrun. Building DPLA s New Metadata Ingestion System. Mark A. Matienzo Digital Public Library of America

Heiðrun. Building DPLA s New Metadata Ingestion System. Mark A. Matienzo Digital Public Library of America Heiðrun Building DPLA s New Metadata Ingestion System Mark A. Matienzo Digital Public Library of America Metropolitan New York Library Council Annual Conference January 15, 2015 Outline 1.

More information

ProQuest Dissertations and Theses Overview. Austin McLean and Marlene Coles CGS Summer Workshop, July 2017

ProQuest Dissertations and Theses Overview. Austin McLean and Marlene Coles CGS Summer Workshop, July 2017 ProQuest Dissertations and Theses Overview Austin McLean and Marlene Coles CGS Summer Workshop, July 2017 Agenda Dissertations and ProQuest Short form video Pilot Project 2 A mission that aligns with universities

More information

Curation of Large Scale EHR Data for Use with Biobank Samples

Curation of Large Scale EHR Data for Use with Biobank Samples Curation of Large Scale EHR Data for Use with Biobank Samples Global Biobank Week 14.9.2017 Session 6B: Biobanks and Electronic Health Records Henrik Edgren, CSO Conflicts of interest Employee of MediSapiens

More information

Managing CDISC version changes: how & when to implement? Presented by Lauren Shinaberry, Project Manager Business & Decision Life Sciences

Managing CDISC version changes: how & when to implement? Presented by Lauren Shinaberry, Project Manager Business & Decision Life Sciences 1 Managing CDISC version changes: how & when to implement? Presented by Lauren Shinaberry, Project Manager Business & Decision Life Sciences 2 Content Standards Technical Standards SDTM v1.1 SDTM IG v3.1.1

More information

Harmonizing biocaddie Metadata Schemas for Indexing Clinical Research Datasets Using Semantic Web Technologies

Harmonizing biocaddie Metadata Schemas for Indexing Clinical Research Datasets Using Semantic Web Technologies Harmonizing biocaddie Metadata Schemas for Indexing Clinical Research Datasets Using Semantic Web Technologies Harold R. Solbrig 1, Guoqian Jiang 1 1 Mayo Clinic College of Medicine, Rochester, MN [solbrig.harold,

More information

CODE AND DATA MANAGEMENT. Toni Rosati Lynn Yarmey

CODE AND DATA MANAGEMENT. Toni Rosati Lynn Yarmey CODE AND DATA MANAGEMENT Toni Rosati Lynn Yarmey Data Management is Important! Because Reproducibility is the foundation of science Journals are starting to require data deposit You want to get credit

More information

Customisable Curation Workflows in Argo

Customisable Curation Workflows in Argo Customisable Curation Workflows in Argo Rafal Rak*, Riza Batista-Navarro, Andrew Rowley, Jacob Carter and Sophia Ananiadou National Centre for Text Mining, University of Manchester, UK *Corresponding author:

More information

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal Heinrich Widmann, DKRZ DI4R 2016, Krakow, 28 September 2016 www.eudat.eu EUDAT receives funding from the European Union's Horizon 2020

More information

Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger. Mahmoud El-Haj Paul Rayson Scott Piao Jo Knight

Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger. Mahmoud El-Haj Paul Rayson Scott Piao Jo Knight Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger Mahmoud El-Haj Paul Rayson Scott Piao Jo Knight Origin and Outcomes Currently funded through a Wellcome Trust Seed award Collaboration

More information

Building Software to Translate

Building Software to Translate Bridging Archival Standards: Building Software to Translate Metadata Between PDS3 & PDS4 Planetary Science Informatics and Data Analytics Conference St. Louis, MO -- April 25, 2018 Cristina M. De Cesare

More information

Master Project. Various Aspects of Recommender Systems. Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue Ayala

Master Project. Various Aspects of Recommender Systems. Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue Ayala Master Project Various Aspects of Recommender Systems May 2nd, 2017 Master project SS17 Albert-Ludwigs-Universität Freiburg Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue

More information

ELIXIR Human Data Use Case

ELIXIR Human Data Use Case ELIXIR Human Data Use Case Mikael Borg, ELIXIR Sweden ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559.

More information

Software review. Biomolecular Interaction Network Database

Software review. Biomolecular Interaction Network Database Biomolecular Interaction Network Database Keywords: protein interactions, visualisation, biology data integration, web access Abstract This software review looks at the utility of the Biomolecular Interaction

More information

Paolo Missier, Khalid Belhajjame, Jun Zhao, Carole Goble School of Computer Science The University of Manchester, UK

Paolo Missier, Khalid Belhajjame, Jun Zhao, Carole Goble School of Computer Science The University of Manchester, UK Data lineage model for Taverna workflows with lightweight annotation requirements Paolo Missier, Khalid Belhajjame, Jun Zhao, Carole Goble School of Computer Science The University of Manchester, UK Context

More information

funricegenes Comprehensive understanding and application of rice functional genes

funricegenes Comprehensive understanding and application of rice functional genes funricegenes Comprehensive understanding and application of rice functional genes Part I Display of information in this database as static web pages https://funricegenes.github.io/ At the homepage of our

More information

HymenopteraMine Documentation

HymenopteraMine Documentation HymenopteraMine Documentation Release 1.0 Aditi Tayal, Deepak Unni, Colin Diesh, Chris Elsik, Darren Hagen Apr 06, 2017 Contents 1 Welcome to HymenopteraMine 3 1.1 Overview of HymenopteraMine.....................................

More information

Analyzer of Bio-resource Citations. World Data Center of Microorganisms(WDCM)

Analyzer of Bio-resource Citations. World Data Center of Microorganisms(WDCM) Analyzer of Bio-resource Citations World Data Center of Microorganisms(WDCM) http://abc.wdcm.org/ Outlines Introduction of ABC Homepage and function of ABC Text mining for microorganism : classification,

More information

TSRI, 400-S PubMed / MyNCBI

TSRI, 400-S PubMed / MyNCBI TSRI, 400-S helplib@scripps.edu 858-784-8705 PubMed / MyNCBI My NCBI is a free service available in PubMed (and all other NCBI databases) that allows you to save searches, set up email alerts for search

More information

IUNI Web of Science Data Enclave 102

IUNI Web of Science Data Enclave 102 Enclave 102 Katy Börner and Robert Light Cyberinfrastructure for Network Science Center School of Informatics and Computing and IUNI Indiana University, USA Val Pentchev, Matt Hutchinson, and Benjamin

More information

Update on Dataverse Dryad-Dataverse Community Meeting. Mercè Crosas, Elizabeth Quigley & Eleni Castro. Data Science > IQSS > Harvard University

Update on Dataverse Dryad-Dataverse Community Meeting. Mercè Crosas, Elizabeth Quigley & Eleni Castro. Data Science > IQSS > Harvard University Update on Dataverse Image credit: David Bygott (CC-BY-NC-SA) 2014 Dryad-Dataverse Community Meeting Mercè Crosas, Elizabeth Quigley & Eleni Castro Data Science > IQSS > Harvard University Introduction

More information

Automatic annotation in UniProtKB using UniRule, and Complete Proteomes. Wei Mun Chan

Automatic annotation in UniProtKB using UniRule, and Complete Proteomes. Wei Mun Chan Automatic annotation in UniProtKB using UniRule, and Complete Proteomes Wei Mun Chan Talk outline Introduction to UniProt UniProtKB annotation and propagation Data increase and the need for Automatic Annotation

More information

The Data Curation Profiles Toolkit: Interview Worksheet

The Data Curation Profiles Toolkit: Interview Worksheet Purdue University Purdue e-pubs Data Curation Profiles Toolkit 11-29-2010 The Data Curation Profiles Toolkit: Interview Worksheet Jake Carlson Purdue University, jakecar@umich.edu Follow this and additional

More information

Chris Moffatt Director of Technology, Ed-Fi Alliance

Chris Moffatt Director of Technology, Ed-Fi Alliance Chris Moffatt Director of Technology, Ed-Fi Alliance Review Background and Context Temporal ODS Project Project Overview Design and Architecture Demo Temporal Snapshot & Query Proof of Concept Discussion

More information

Enabling Open Science: Data Discoverability, Access and Use. Jo McEntyre Head of Literature Services

Enabling Open Science: Data Discoverability, Access and Use. Jo McEntyre Head of Literature Services Enabling Open Science: Data Discoverability, Access and Use Jo McEntyre Head of Literature Services www.ebi.ac.uk About EMBL-EBI Part of the European Molecular Biology Laboratory International, non-profit

More information

THE GREAT CONSOLIDATION: ENTERTAINMENT WEEKLY MIGRATION CASE STUDY JON PECK, MATT GRILL, PRESTON SO

THE GREAT CONSOLIDATION: ENTERTAINMENT WEEKLY MIGRATION CASE STUDY JON PECK, MATT GRILL, PRESTON SO THE GREAT CONSOLIDATION: ENTERTAINMENT WEEKLY MIGRATION CASE STUDY JON PECK, MATT GRILL, PRESTON SO Slides: http://goo.gl/qji8kl WHO ARE WE? Jon Peck - drupal.org/u/fluxsauce Matt Grill - drupal.org/u/drpal

More information