Core Technology Development Team Meeting
- Eileen Dickerson
Transcription
1 Core Technology Development Team Meeting
To hear the meeting, you must call in.
- Toll-free phone number:
- Access Code:
- For international call-in numbers, please visit:
2 Agenda
- Updates on action items
- Re-indexing run summary
- Workflow outline
- Tests and verifications procedure
- DataMed Evaluation
- Updates from all team members
Supported by the NIH grant 1U24 AI to the University of California, San Diego
3 Updates - action items
- Generate a HELP and FAQ page: please start adding material here ASAP; FAQ from review is being generated: https://docs.google.com/document/d/1sczm976jc8wn hyfvysokvwuucjojdg914lqaftraiqg/edit
- Complete re-indexing before with updated NLP pipeline
- Testing to begin on February 16th
4 Updates - Visualization Supplement
- The Chrome plugin is complete
- They will present at the CDT meeting next week
5 Overview of Re-Indexing
January 2017
6 Overview of Process
- All current sources that were already indexed included in re-indexing run
  - 58 Repositories
- All sources run through in sequential order
  - Utilized 1 development node for the processing (AWS)
- Re-indexing process added new data
  - Did not fully transform data that was already indexed
  - Test of updating functionality
7 Sources
8 Overall Statistics
- Number of Sources: 58
- Total Number of Metadata Records: 77,085,123
  - Approximately 3X size of PubMed
- Total Size of Metadata: GB
- Average Record Size
  - Average: 0.89 KB
  - Median: 2.89 KB
  - Min: 0.57 KB
9 Source Statistics (cont.)
- Minimum number of records: 1
- Maximum number of records: 74,809,080
  - UniProt TrEMBL
- Connectors utilized:
  - Rsync
  - CSV
  - FTP
  - XML
  - OAI
  - WEB
  - ASPERA
  - Community Aggregator
10 Timing Statistics
- Total Ingestion Time: 23 Hours, 35 Minutes
  - 77+ Million Records
  - Did not include data download time
- Ingestion rate: 224 documents per second
11 Source Timing
(Chart of per-source timing; labeled source: Trembl)
12 Lessons Learned
- For some sources the actual document retrieval takes as long as processing
  - Documents begin processing once downloaded, so a document can finish processing by the time the next document is downloaded
- Current test ran with 1 ingestion and 1 processing node
  - Can enhance performance by running sources in parallel
- Improvements to management functions
  - Need to enhance and improve counting of records being processed (as they flow through the message queues)
  - Additional time checks can be added with improved counting
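The two suggestions above, running sources in parallel and counting records as they flow through, could look roughly like the following. This is only a sketch under stated assumptions: `ingest_source` is a hypothetical stand-in for a real connector plus processing step, the source names are made up, and threads stand in for whatever nodes or message queues the actual pipeline uses.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def ingest_source(name, records):
    """Hypothetical per-source step: 'process' each record and count it."""
    processed = 0
    for _record in records:
        # A real pipeline would transform and index the record here.
        processed += 1
    return name, processed


def ingest_all(sources, max_workers=4):
    """Run all sources concurrently and return per-source record counts."""
    counts = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(ingest_source, name, recs)
                   for name, recs in sources.items()]
        for future in futures:
            name, processed = future.result()
            counts[name] = processed
    return counts


if __name__ == "__main__":
    demo_sources = {"source_a": range(3), "source_b": range(5)}  # made-up data
    counts = ingest_all(demo_sources)
    print(dict(counts), "total:", sum(counts.values()))
```

Keeping a per-source counter like this is also what would make the additional time checks mentioned on the slide possible, since a rate needs both a count and a duration.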
13 Next Tests
1) Full update of all records (simulating update to transformations or update to ontology)
2) Full update with new NLP module
14 Workflow of adding/updating a repository
- Elasticsearch endpoint: Jeff's team (biocaddie.scicrunch.io)
  - Manually dump index to ES endpoint at UTHealth
- Elasticsearch endpoint: UTHealth (129.***.**.121)
  1. Manually update the mapping
     - Add not_analyzed field
     - Add format field to date
  2. Update source code of the UI
  - Manually dump updated index to ES endpoint at UCSD
- Elasticsearch endpoint: UCSD (192.***.***.107)
  - Update source code of the UI
- Internal biocaddie website: UTHealth (datamedbeta.biocaddie.org)
- Public biocaddie website: UCSD (datamed.org)
15 Workflow of Re-Indexing (i.e. no change to Transformation)
- Elasticsearch endpoint: Curation Team (biocaddie.scicrunch.io)
  - Transfer date-stamped index to ES endpoint at UCSD with same ES mappings
- Elasticsearch endpoint: UCSD (192.***.***.107), Testing Computer
  1. Run automated tests of index to verify index and record results
  2. Dashboard provides reviewer with results and provides links for easy review
  3. Reviewer makes go / no-go decision
  - No-Go
  - Go: do ES alias swap to make new index the default
- Elasticsearch endpoint: UCSD (192.***.***.107)
- Public biocaddie website: UCSD (datamed.org)
JFIuQAiq7nM/edit
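The go / no-go decision and the alias swap can be illustrated with a small sketch. It only builds the JSON body for Elasticsearch's `POST /_aliases` endpoint, which applies the listed `remove` and `add` actions together; it does not contact a cluster. The alias name, the date-stamped index names, and the test names are hypothetical, since the slides do not give them.

```python
import json


def decide(test_results):
    """Return 'go' only if every automated check passed (assumed rule)."""
    return "go" if test_results and all(test_results.values()) else "no-go"


def alias_swap_body(alias, old_index, new_index):
    """Build the request body for POST /_aliases that repoints an alias."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical check names and index names, for illustration only.
    results = {"record_count_matches": True, "sample_queries_return_hits": True}
    if decide(results) == "go":
        body = alias_swap_body("datamed", "datamed_20170101", "datamed_20170201")
        print(json.dumps(body, indent=2))
```

Because the public site would query the alias rather than a concrete index, a no-go decision leaves the old index serving traffic with nothing to roll back.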
16 Update the mapping
17 Python code to update the mapping New mapping
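The code and mapping shown on these two slides did not survive transcription. As a stand-in, the following sketch shows what the two mapping changes named in the workflow (a not_analyzed field and a date format) might look like as plain Python dictionaries. It assumes a pre-5.x Elasticsearch, where string fields accept `"index": "not_analyzed"`; the field names `title` and `releaseDate`, the `raw` sub-field, and the date pattern are all hypothetical.

```python
import json


def add_not_analyzed_subfield(field_mapping, subfield="raw"):
    """Return a copy of a string field mapping with an exact-match sub-field."""
    updated = dict(field_mapping)
    fields = dict(updated.get("fields", {}))
    fields[subfield] = {"type": "string", "index": "not_analyzed"}
    updated["fields"] = fields
    return updated


def add_date_format(field_mapping, fmt):
    """Return a copy of a field mapping typed as a date with an explicit format."""
    updated = dict(field_mapping)
    updated["type"] = "date"
    updated["format"] = fmt
    return updated


if __name__ == "__main__":
    # Hypothetical starting mapping; the real DataMed mapping is not shown.
    properties = {
        "title": {"type": "string"},
        "releaseDate": {"type": "date"},
    }
    properties["title"] = add_not_analyzed_subfield(properties["title"])
    properties["releaseDate"] = add_date_format(properties["releaseDate"], "yyyy-MM-dd")
    print(json.dumps({"properties": properties}, indent=2))
```

The helper functions return copies rather than mutating their input, so the old and new mappings can be diffed before anything is sent to an endpoint.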
18 DataMed Search Evaluation
Xiaoling Chen
19 Search results from 43 repositories in DataMed V1.5 were evaluated between 1/17/17 and 1/31/17.
One query was picked for each repository.
What was compared:
- Number of total datasets (DataMed vs primary website)
- Number of returned results for the query (DataMed vs primary website)
- Top 10 annotation (relevant vs not relevant)
- Number of overlapping results in the top 10 of DataMed and the primary website
20 Results Overview
- Repositories which return similar numbers (19)
- Repositories which return different numbers (9)
- Repositories which have granularity problem (7)
- Repositories which we cannot compare (8)
21 Repositories which return similar numbers
Columns: # | Repo | Query | DataMed total included | Primary website total included | DataMed results returned | Primary results returned | DataMed relevant in top 10 | Primary website relevant in top 10 | Overlap in top 10
1 | NIDDKCR | TRANSPLANTATION
2 | CIL | MCF10A
3 | Peptideatlas | Jurkat
4 | RGD | p
5 | CTN | pain
6 | TCIA | dcis
7 | YPED | alzheimer
8 | Open FMRI | Brain injury
9 | NURSA | Breast cancer
10 | ICPSR | voting
Surviving cell values, in slide order: NA; NA (after MCF10A); 76, NA (after Jurkat); NA (after p); 85, NA (after dcis)
22 Repositories which return similar numbers (continued)
Columns: No | Repo | Query | DataMed total included | Primary website total included | DataMed results returned | Primary website results returned | DataMed relevant in top 10 | Primary website relevant in top 10 | Overlap in top 10 | Note
11 | PDB | lactamase
12 | Bioproject | COPD
13 | LINCS | Sk-br-3
14 | Neuromorpho | amygdala
15 | CVRG | Heart failure
16 | Proteomexchange | Kinase AND proteomic
17 | GEMMA | Multiple sclerosis
18 | Dryad | epilepsy
19 | Neurovault:cols | brain
Notes (in slide order):
- We may only index part of their dataset
- We ingest in data package level and data file level
- We index some users' temporary collections
Surviving cell values, in slide order: NA (after COPD); NA (after brain)
23 Repositories which return different numbers
Columns: No | Repo | Query | DataMed total included | Primary website total included | DataMed results returned | Primary results returned | DataMed relevant in top 10 | Primary website relevant in top 10 | Overlap in top 10 | Note
1 | ClinicalTrials | melanoma | Search algorithm is different. DataMed uses synonyms to search, but ClinicalTrials may not.
2 | Physiobank | ECG | They also return results from physiotools, etc.
3 | BMRB | Troponin c | Their search function is confusing and shows duplicate results.
4 | Dataverse | Blood pressure | Search algorithm is different.
5 | Arrayexpress | mda | Search algorithm is different. ES treats mda-231 as two words; a search for mda-231 returns 18, and most of them overlap.
6 | Clinvar | Familial Hemophagocytic Lymphohistiocytosis | The primary website has richer information.
7 | Zenodo | Cell proliferation | Search algorithm is different.
8 | Morphobank | lemur | We index projects and matrix data; the results returned by the primary website contain "lemurs", but DataMed does not match "lemur" to "lemurs". Need to expand via terminology server.
9 | Swissprot | Estrogen | The primary website has richer information. We do not include protein and gene information.
Surviving cell values, in slide order: 78, NA (after ECG); 1490, NA (after Estrogen); NA (before the Clinvar note); ", and 958 in process" (before the Morphobank note)
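Two of the notes above (a hyphenated query being split into two words, and "lemur" not matching "lemurs") can be illustrated with a toy example. This is a crude approximation written for illustration, not Elasticsearch's analyzer or the project's terminology server: `tokenize` simply splits on anything that is not a letter or digit, and `expand_with_plurals` is a hypothetical, naive singular/plural expansion.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (rough approximation)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_with_plurals(term):
    """Naively add the singular or plural variant of a single query term."""
    term = term.lower()
    variants = {term}
    if term.endswith("s"):
        variants.add(term[:-1])
    else:
        variants.add(term + "s")
    return sorted(variants)


if __name__ == "__main__":
    print(tokenize("mda-231"))
    print(expand_with_plurals("lemur"))
```

A real terminology server would expand through curated synonyms rather than string suffixes; the point here is only that matching happens on tokens, so both the split and the missing variant change what is returned.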
24 Repositories which have granularity problem
Columns: No | Repository | Primary website | DataMed
1 | Vectorbase | Including multiple domains (genome, expression, ontology, proteome, etc) | 1445 datafiles
2 | Peerj | Can only search publications | We just index supporting data sets
3 | NITRC-IR | 14 Projects, 6845 Subjects, and 8285 Imaging Sessions | 1391 sessions
4 | RETINA | Search retinas for cells or strata, no keyword search function. Contact repo for meta; repo has MySQL database available for download | 384 data
5 | MPD | Only returns QTL, granularity is different | We are ingesting in individual mouse level (235)
6 | GEO | They index in dataset level and sample level | We are ingesting in dataset level
7 | GND | No keyword search for projects, can only search public datafiles | Index in project level
25 Repositories which we cannot compare
Columns: No | Repository | Primary website | DataMed
1 | EMDB | No keyword search function |
2 | Epigenomics | No keyword search function |
3 | GDC | No keyword search function |
4 | Neurovault:atlases | No keyword search function |
5 | Neurovault:nidm | No keyword search function |
6 | BGI | It is in Chinese |
7 | DBGAP | | Our index is quite old and not in DATS2.1
8 | SRA | | Our index is quite old and not in DATS2.1
26 Summary
- We evaluated search results from 43 individual repositories in DataMed and on their primary websites.
- Search results for 28 repositories were compared, and the top 10 results were annotated.
  - DataMed: 188 relevant / 193 = 97.41%
  - Primary website: 201 relevant / 218 = 92.20%
  - Reasons for different numbers:
    - Index scope is different (5 repositories)
    - Search algorithm is different (4 repositories)
    - Metadata in primary website is richer (2 repositories)
- 7 repositories have granularity problem.
- 8 repositories cannot be compared.
- What we can do to improve:
  - Improve terminology server (e.g. lemur to lemurs)
  - Contact repositories for the granularity problem.
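The two percentages in the summary are the share of annotated top-10 results judged relevant. A minimal sketch of that arithmetic, using the counts from the slide (the function name is made up):

```python
def relevant_share(relevant, annotated):
    """Percentage of annotated top-10 results that were judged relevant."""
    if annotated <= 0:
        raise ValueError("annotated must be positive")
    return 100.0 * relevant / annotated


if __name__ == "__main__":
    print(f"DataMed: {relevant_share(188, 193):.2f}%")          # 97.41%
    print(f"Primary website: {relevant_share(201, 218):.2f}%")  # 92.20%
```

The denominators differ (193 vs 218), presumably because some queries returned fewer than 10 results on one side, so the two shares are not computed over the same number of annotations.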
27 GitHub Issues
- Total Issues: 239
- Number Open: 112
- Number Closed: 127
28 Ongoing work
Columns: Task | Status
1 Metadata Ingestion
1.1 Import repositories expansion | Ongoing
1.2 Data repository suggestion form at DataMed (George / Xiaoling / Sanda) | Ongoing
1.3 Metadata mapping review / reconciliation between curators | Ongoing
1.4 Metadata management | Ongoing
1.5 Indexing | Ongoing
1.6 NLP-based / terminology server indexing: Gene/protein, Disease, Drug/chemical, Biological process, Organism, Format, Access, Cell types | Implemented at backend
1.7 Bulk download of indices | Not Started
2 Interface Design
2.1 Design interface usability issues | Ongoing
2.2 Display most Accessed Datasets | Not Started
29 Ongoing work
Columns: Task | Status
3 Personalized search
3.1 Improve the tracking system | Ongoing
4 Searching/Ranking algorithms
4.1 Similar datasets to be expanded | Ongoing
5 Display of results
5.1 Sort datasets: author, published date, repository, title | Ongoing
5.2 What fields should be displayed? | Ongoing
5.3 Additional filters:
- File type
- Data Restrictions (data use agreement, restricted, unrestricted)
- Data Level (participant/aggregate)
- Population (mouse, human, etc)
6 Link to external resources
1. Pubmed: click through to pubmed records of citing publications: copy citation to clipboard
2. Scholix Framework for Linking Data and Literature
3. Linkout
Statuses listed for the remaining rows, in slide order: Not started, Not Started, Implemented
30 Ongoing work
Columns: Task | Status
7 Documentation
7.1 Source code | Ongoing
7.2 Tutorials | Not Started
7.3 Help menu | Ongoing
7.4 Video | Ongoing
8 Usability studies
8.1 User studies | Ongoing
9 Data Duplication issue: best display/representation of the duplicate in the metadata records and workflow for displaying the duplicates in the metadata records (Jeff/Anu)
- Additional field in index tagging the dataset as duplicate
- Display the dataset and list repos underneath
10 Relationship Network Graph
11 Collaborative research support
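The duplication proposal under item 9 (tag the dataset as a duplicate in the index, then display it once with the repositories listed underneath) could be sketched as follows. The matching rule is an assumption made for illustration: records count as duplicates when their normalized titles are equal and they come from different repositories. The field names `title`, `repository`, `is_duplicate`, and `also_in` are hypothetical.

```python
from collections import defaultdict


def tag_duplicates(records):
    """Add 'is_duplicate' and 'also_in' fields based on matching titles."""
    by_title = defaultdict(list)
    for rec in records:
        by_title[rec["title"].strip().lower()].append(rec)

    tagged = []
    for group in by_title.values():
        repos = sorted({rec["repository"] for rec in group})
        for rec in group:
            others = [r for r in repos if r != rec["repository"]]
            tagged.append({**rec, "is_duplicate": bool(others), "also_in": others})
    return tagged


if __name__ == "__main__":
    demo = [  # made-up records
        {"title": "Example dataset", "repository": "RepoA"},
        {"title": "example dataset ", "repository": "RepoB"},
        {"title": "Another dataset", "repository": "RepoA"},
    ]
    for rec in tag_duplicates(demo):
        print(rec["title"].strip(), "|", rec["repository"], "| also in:", rec["also_in"])
```

In practice a persistent identifier would be a safer matching key than a title, where the metadata provides one.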
31 Other issues
- Please deposit codes in GitHub. Please contact me at if you need access
- Any other issues?
Thank You
Heiðrun Building DPLA s New Metadata Ingestion System Mark A. Matienzo Digital Public Library of America Metropolitan New York Library Council Annual Conference January 15, 2015 Outline 1.
More informationProQuest Dissertations and Theses Overview. Austin McLean and Marlene Coles CGS Summer Workshop, July 2017
ProQuest Dissertations and Theses Overview Austin McLean and Marlene Coles CGS Summer Workshop, July 2017 Agenda Dissertations and ProQuest Short form video Pilot Project 2 A mission that aligns with universities
More informationCuration of Large Scale EHR Data for Use with Biobank Samples
Curation of Large Scale EHR Data for Use with Biobank Samples Global Biobank Week 14.9.2017 Session 6B: Biobanks and Electronic Health Records Henrik Edgren, CSO Conflicts of interest Employee of MediSapiens
More informationManaging CDISC version changes: how & when to implement? Presented by Lauren Shinaberry, Project Manager Business & Decision Life Sciences
1 Managing CDISC version changes: how & when to implement? Presented by Lauren Shinaberry, Project Manager Business & Decision Life Sciences 2 Content Standards Technical Standards SDTM v1.1 SDTM IG v3.1.1
More informationHarmonizing biocaddie Metadata Schemas for Indexing Clinical Research Datasets Using Semantic Web Technologies
Harmonizing biocaddie Metadata Schemas for Indexing Clinical Research Datasets Using Semantic Web Technologies Harold R. Solbrig 1, Guoqian Jiang 1 1 Mayo Clinic College of Medicine, Rochester, MN [solbrig.harold,
More informationCODE AND DATA MANAGEMENT. Toni Rosati Lynn Yarmey
CODE AND DATA MANAGEMENT Toni Rosati Lynn Yarmey Data Management is Important! Because Reproducibility is the foundation of science Journals are starting to require data deposit You want to get credit
More informationCustomisable Curation Workflows in Argo
Customisable Curation Workflows in Argo Rafal Rak*, Riza Batista-Navarro, Andrew Rowley, Jacob Carter and Sophia Ananiadou National Centre for Text Mining, University of Manchester, UK *Corresponding author:
More informationEUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal
EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal Heinrich Widmann, DKRZ DI4R 2016, Krakow, 28 September 2016 www.eudat.eu EUDAT receives funding from the European Union's Horizon 2020
More informationProfiling Medical Journal Articles Using a Gene Ontology Semantic Tagger. Mahmoud El-Haj Paul Rayson Scott Piao Jo Knight
Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger Mahmoud El-Haj Paul Rayson Scott Piao Jo Knight Origin and Outcomes Currently funded through a Wellcome Trust Seed award Collaboration
More informationBuilding Software to Translate
Bridging Archival Standards: Building Software to Translate Metadata Between PDS3 & PDS4 Planetary Science Informatics and Data Analytics Conference St. Louis, MO -- April 25, 2018 Cristina M. De Cesare
More informationMaster Project. Various Aspects of Recommender Systems. Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue Ayala
Master Project Various Aspects of Recommender Systems May 2nd, 2017 Master project SS17 Albert-Ludwigs-Universität Freiburg Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue
More informationELIXIR Human Data Use Case
ELIXIR Human Data Use Case Mikael Borg, ELIXIR Sweden ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559.
More informationSoftware review. Biomolecular Interaction Network Database
Biomolecular Interaction Network Database Keywords: protein interactions, visualisation, biology data integration, web access Abstract This software review looks at the utility of the Biomolecular Interaction
More informationPaolo Missier, Khalid Belhajjame, Jun Zhao, Carole Goble School of Computer Science The University of Manchester, UK
Data lineage model for Taverna workflows with lightweight annotation requirements Paolo Missier, Khalid Belhajjame, Jun Zhao, Carole Goble School of Computer Science The University of Manchester, UK Context
More informationfunricegenes Comprehensive understanding and application of rice functional genes
funricegenes Comprehensive understanding and application of rice functional genes Part I Display of information in this database as static web pages https://funricegenes.github.io/ At the homepage of our
More informationHymenopteraMine Documentation
HymenopteraMine Documentation Release 1.0 Aditi Tayal, Deepak Unni, Colin Diesh, Chris Elsik, Darren Hagen Apr 06, 2017 Contents 1 Welcome to HymenopteraMine 3 1.1 Overview of HymenopteraMine.....................................
More informationAnalyzer of Bio-resource Citations. World Data Center of Microorganisms(WDCM)
Analyzer of Bio-resource Citations World Data Center of Microorganisms(WDCM) http://abc.wdcm.org/ Outlines Introduction of ABC Homepage and function of ABC Text mining for microorganism : classification,
More informationTSRI, 400-S PubMed / MyNCBI
TSRI, 400-S helplib@scripps.edu 858-784-8705 PubMed / MyNCBI My NCBI is a free service available in PubMed (and all other NCBI databases) that allows you to save searches, set up email alerts for search
More informationIUNI Web of Science Data Enclave 102
Enclave 102 Katy Börner and Robert Light Cyberinfrastructure for Network Science Center School of Informatics and Computing and IUNI Indiana University, USA Val Pentchev, Matt Hutchinson, and Benjamin
More informationUpdate on Dataverse Dryad-Dataverse Community Meeting. Mercè Crosas, Elizabeth Quigley & Eleni Castro. Data Science > IQSS > Harvard University
Update on Dataverse Image credit: David Bygott (CC-BY-NC-SA) 2014 Dryad-Dataverse Community Meeting Mercè Crosas, Elizabeth Quigley & Eleni Castro Data Science > IQSS > Harvard University Introduction
More informationAutomatic annotation in UniProtKB using UniRule, and Complete Proteomes. Wei Mun Chan
Automatic annotation in UniProtKB using UniRule, and Complete Proteomes Wei Mun Chan Talk outline Introduction to UniProt UniProtKB annotation and propagation Data increase and the need for Automatic Annotation
More informationThe Data Curation Profiles Toolkit: Interview Worksheet
Purdue University Purdue e-pubs Data Curation Profiles Toolkit 11-29-2010 The Data Curation Profiles Toolkit: Interview Worksheet Jake Carlson Purdue University, jakecar@umich.edu Follow this and additional
More informationChris Moffatt Director of Technology, Ed-Fi Alliance
Chris Moffatt Director of Technology, Ed-Fi Alliance Review Background and Context Temporal ODS Project Project Overview Design and Architecture Demo Temporal Snapshot & Query Proof of Concept Discussion
More informationEnabling Open Science: Data Discoverability, Access and Use. Jo McEntyre Head of Literature Services
Enabling Open Science: Data Discoverability, Access and Use Jo McEntyre Head of Literature Services www.ebi.ac.uk About EMBL-EBI Part of the European Molecular Biology Laboratory International, non-profit
More informationTHE GREAT CONSOLIDATION: ENTERTAINMENT WEEKLY MIGRATION CASE STUDY JON PECK, MATT GRILL, PRESTON SO
THE GREAT CONSOLIDATION: ENTERTAINMENT WEEKLY MIGRATION CASE STUDY JON PECK, MATT GRILL, PRESTON SO Slides: http://goo.gl/qji8kl WHO ARE WE? Jon Peck - drupal.org/u/fluxsauce Matt Grill - drupal.org/u/drpal
More information