Core Technology Development Team Meeting

To hear the meeting, you must call in.
Toll-free phone number: 1-866-740-1260
Access code: 2201876
For international call-in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers

Agenda
- Updates on action items
- Suggestions for bioCADDIE webinars
- Search workflow diagram
- Review of repository submission form / response to submissions
- LinkOut review
- Visualization plans
- Updates from all team members

Supported by the NIH grant 1U24 AI117966-01 to the University of California, San Diego

Updates: action items
- Other repositories should be sent to UTHealth team for integration - Jeff. To be mapped to DATS 2.0:
  - ArrayExpress
  - Dataverse
  - dbGaP
  - LINCS
  - PDB
  - SRA
- Video - Jeff
- Monthly testing of DataMed - Anu/CDT - Thursday, 08/04

Updates: action items
- Search workflow overview - Ruiling
- Repository submission form SOP - George
- Presentation of visualization/user activity tracking system - Deevakar/Ruiling
- Record the timestamps for queries and results retrieved to determine the response time - Ruiling
- PubMed link investigation - Jeff
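The response-time action item could be approached along these lines. This is only a minimal sketch: `timed_search`, `fake_search`, and the record fields are hypothetical names chosen for illustration, not part of the DataMed code base.

```python
import time
from datetime import datetime, timezone


def timed_search(search_fn, query):
    """Run search_fn(query) and return (results, timing record)."""
    submitted_at = datetime.now(timezone.utc)
    start = time.perf_counter()  # monotonic clock, suited to measuring durations
    results = search_fn(query)
    elapsed = time.perf_counter() - start
    record = {
        "query": query,
        "submitted_at": submitted_at.isoformat(),
        "returned_at": datetime.now(timezone.utc).isoformat(),
        "response_seconds": elapsed,
        "result_count": len(results),
    }
    return results, record


def fake_search(query):
    # Hypothetical stand-in for the real search backend.
    time.sleep(0.01)
    return [f"{query}-dataset-{i}" for i in range(3)]


results, record = timed_search(fake_search, "asthma")
print(record["query"], record["result_count"], round(record["response_seconds"], 3))
```

Wall-clock timestamps are kept for the log, while the duration comes from a monotonic counter so that system clock adjustments do not distort the measured response time.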

bioCADDIE webinars
- Suggestions?
- Please send to me (Anupama.E.Gururaj@uth.tmc.edu) or Elizabeth Bell (eabell@ucsd.edu)

Search Workflow Diagram: Search for Datasets on DataMed Website
Lanes: User, DataMed Website, Terminology Server
- Submit a query term
- Receive and process query term
- Data types: selected / none selected
- Access types: selected / none selected
- Authorization types: selected / none selected
- Repositories: selected / none selected
- Get a list of selected repositories
- Simple search: receive and process query term, get synonyms of the query term, add synonyms to query term
- Advanced search: parse advanced search
- Generate Elasticsearch query
- Generate search results and facets
- Display search results and facets
- View search results
- Correct misspelled words
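The simple-search path of the workflow (expand the term with synonyms, apply the selected repositories and data types, generate an Elasticsearch query with facets) might look roughly like the sketch below. The field names (`title`, `description`, `repository`, `data_type`), the in-memory synonym table, and both function names are assumptions for illustration; the actual DataMed index mapping and terminology server interface are not described in these slides.

```python
def expand_with_synonyms(term, synonym_lookup):
    """Return the query term followed by its synonyms, without duplicates."""
    terms = [term]
    for synonym in synonym_lookup.get(term.lower(), []):
        if synonym not in terms:
            terms.append(synonym)
    return terms


def build_search_body(term, synonym_lookup, repositories=None, data_types=None):
    """Build an Elasticsearch request body (query plus facet aggregations)."""
    terms = expand_with_synonyms(term, synonym_lookup)
    filters = []
    if repositories:  # only filter when the user selected repositories
        filters.append({"terms": {"repository": repositories}})
    if data_types:  # only filter when the user selected data types
        filters.append({"terms": {"data_type": data_types}})
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": " ".join(terms),
                            "fields": ["title", "description"],
                            "operator": "or",
                        }
                    }
                ],
                "filter": filters,
            }
        },
        # Facets shown next to the results (hypothetical field names).
        "aggs": {
            "repositories": {"terms": {"field": "repository"}},
            "data_types": {"terms": {"field": "data_type"}},
        },
    }


# Hypothetical synonym table standing in for the terminology server.
SYNONYMS = {"heart attack": ["myocardial infarction"]}
body = build_search_body("heart attack", SYNONYMS, repositories=["PDB", "SRA"])
print(body["query"]["bool"]["must"][0]["multi_match"]["query"])
```

Putting the selections in the `filter` clause rather than `must` keeps them from affecting relevance scores, which seems a natural fit for facet-style restrictions.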

Repository Submission form
- http://datamedbeta.biocaddie.org/submit_repository.php
- https://datamed.org/manage_submit_repository.php?show=all - Have these requests been responded to?
- Reminder email to be sent if no response seen for a week?
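If the one-week reminder is adopted, the check itself is small. The sketch below assumes each submission records when it was received and when (if ever) it was answered; the record layout, the sample repository names, and the `needs_reminder` helper are hypothetical.

```python
from datetime import datetime, timedelta

REMINDER_AFTER = timedelta(days=7)  # "no response seen for a week"


def needs_reminder(submitted_at, responded_at, now):
    """True when a submission is still unanswered a week or more after receipt."""
    return responded_at is None and now - submitted_at >= REMINDER_AFTER


# Hypothetical submission records for illustration only.
submissions = [
    {"repository": "Repo A", "submitted_at": datetime(2016, 7, 20), "responded_at": None},
    {"repository": "Repo B", "submitted_at": datetime(2016, 7, 20), "responded_at": datetime(2016, 7, 22)},
    {"repository": "Repo C", "submitted_at": datetime(2016, 8, 1), "responded_at": None},
]

now = datetime(2016, 8, 4)
pending = [
    s["repository"]
    for s in submissions
    if needs_reminder(s["submitted_at"], s["responded_at"], now)
]
print(pending)
```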

LinkOut Xiaoling Chen

What is LinkOut
- A service that allows you to link from PubMed and other NCBI databases to other resources beyond the NCBI systems.
- Aims to facilitate access to relevant online resources to extend, clarify, and supplement information found in NCBI databases.
- Online resources that may be valuable to users of PubMed and other NCBI databases are encouraged to participate in LinkOut.

Examples of LinkOut
- http://www.ncbi.nlm.nih.gov/pubmed/16465590
- http://www.ncbi.nlm.nih.gov/gene/3882292
- http://www.ncbi.nlm.nih.gov/pubmed/23335479

Databases available for linking

Type of providers
- As of July 21, 2016, there are 4083 LinkOut providers.
  - Full-text publication
  - Biological databases
  - Consumer health information
  - Research tools

Prerequisites for Participation
- Resources submitted for inclusion in LinkOut will be evaluated individually to determine whether they meet certain inclusion criteria.
- Quality: LinkOut resources and the information therein must be of sufficiently high quality that NCBI database users will not be hindered, interrupted, or unnecessarily frustrated in their research.
- Relevance: resources must be directly relevant to the specific subjects of the NCBI database records and useful to users' study and research.

Apply for Inclusion in LinkOut
- To apply for inclusion in LinkOut, send an email to linkout@ncbi.nlm.nih.gov.
- Include the following information:
  - Name, email address, and phone number of a contact person in your organization.
  - The scope of your resource, including the URL of the resource. Also, please describe the type of NCBI database records to which you would like to apply links.
  - Describe any restrictions on access to the resource.
- A LinkOut team member will email the contact person within 1 week regarding your request.

Files for inclusion
- LinkOut requires two types of files to describe online resources:
  - Identity files (XML)
  - Resource files (XML, CSV, Simple Text)
- These files are specified in the LinkOut DTD. These files include the necessary elements for the NCBI retrieval system to construct an appropriate URL to access specific resources.
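To make the idea of "constructing an appropriate URL" concrete, the sketch below combines a provider base URL with a rule template and a record identifier. It is a generic illustration only: the `${id}` placeholder syntax, the example base URL, and `build_resource_url` are assumptions, and the actual element names and rule syntax are those defined in the LinkOut DTD.

```python
from string import Template


def build_resource_url(base, rule, record):
    """Join a provider base URL with a rule template filled from record fields."""
    return base + Template(rule).substitute(record)


# Hypothetical provider base URL and rule; the PubMed ID is one of the
# LinkOut examples listed in this deck.
url = build_resource_url(
    "https://repository.example.org/",
    "dataset?pmid=${id}",
    {"id": "16465590"},
)
print(url)
```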

File Transfer
- When you receive your account information, validate the files using the LinkOut File Validation Utility and transfer all files via FTP to the host FTPprivate.ncbi.nlm.nih.gov.
- Inform the LinkOut team at linkout@ncbi.nlm.nih.gov.
- Your files will be given a final evaluation before being placed in the production queue. From this point on, files will be processed automatically every day.

Provider Responsibilities
- Maintaining their LinkOut files
- Transferring any additions, changes or deletions of their links to NCBI
- Updating files and informing NCBI when access rights are changed
- Correcting broken or incorrect links in a timely manner
- Participation in LinkOut is free and voluntary, and so may be discontinued at any time. Submission of links is at the provider's discretion; participants may choose not to submit links to certain portions of their resource.

Provider Statistics
- LinkOut collects statistics on the number of clicks on each provider's links in the LinkOut display.
- Statistics can be emailed to the LinkOut contact monthly.

Links
- Homepage:
  - http://www.ncbi.nlm.nih.gov/projects/linkout/doc/linkout.html
- Identity file:
  - http://www.ncbi.nlm.nih.gov/books/NBK3802/#nonbib.file_preparation_identity_file
- Source file:
  - http://www.ncbi.nlm.nih.gov/books/NBK3802/#nonbib.file_preparation_resource_file_xm

Visualizations for DataMed Website

Visualization 1

Book Stack
- Bins are the data types
- Thicker book = more results
- Taller book = high ranking
- Bin labels shown: Population, Organism, System, Organ, Tissue, Cell, Molecule, Genome, Gene, Nucleotide, Chemistry
- Mouseover: repository name and number of results, e.g. PDB (23)
- On click of bar: keyword cloud of results
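One possible reading of the Book Stack encoding is a linear mapping from result count to thickness and from rank to height. The sketch below assumes pixel bounds, a linear scale, and rank 1 as the highest ranking; none of these are specified in the slides. Only the PDB count of 23 comes from the figure, and the second repository's count is made up for illustration.

```python
def book_dimensions(result_count, rank, max_count, repo_total, min_px=10, max_px=120):
    """Return (thickness, height) in pixels for one repository 'book'.

    Thickness grows with the number of results; height grows as the rank
    approaches 1 (assumed to be the highest ranking).
    """
    span = max_px - min_px
    thickness = min_px + span * result_count / max_count
    if repo_total > 1:
        height = min_px + span * (repo_total - rank) / (repo_total - 1)
    else:
        height = max_px  # a single book gets full height
    return thickness, height


# (name, result count, rank); the SRA count is an illustrative value.
repos = [("PDB", 23, 1), ("SRA", 46, 2)]
max_count = max(count for _, count, _ in repos)
for name, count, rank in repos:
    print(name, book_dimensions(count, rank, max_count, len(repos)))
```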

Book Stack
- Pros:
  - Scalable: stack more repositories in bin, adjust width and spaces
  - Visible index: data types and repositories contained within bin
- Cons:
  - Multiple datatype relationships
  - Thickness and height may be confusing / not evident

Visualization 2

Spoke and Wheel
- Wheel is divided into datatypes
- Spokes are the repositories
- Word cloud of a repository
- Repositories: color = importance of the repository

Spoke and Wheel
- Pros:
  - Scalable, by adjusting the density and the bins
  - Can fit in smaller screen space
- Cons:
  - Multiple datatype relationships
  - Not easy to click / pick a repository
  - Limited space to display word cloud

GitHub Issues
Total issues: 138 (open: 66, closed: 72)

Category | Open | Closed
Associated with v1.0 | 9 | 1
Usability issues | 29 | 0
Associated with v0.5 | 38 | 46
Bugs | 7 | 10
Enhancements | 27 | 20
Questions | 8 | 9

Ongoing work
Task | Status
1 Metadata Ingestion |
1.1 Import repositories expansion | Ongoing
1.2 Data repository suggestion form at DataMed - George |
1.3 Metadata mapping review/reconciliation between curators | Ongoing
1.4 Metadata management | Ongoing
1.5 Indexing | Ongoing
1.6 NLP-based indexing: Gene/protein, Disease, Drug/chemical, Biological process, Organism, Format, Access, Cell types | Evaluation phase
1.7 Bulk download of indices |
2 Terminology server |
2.3 Integrate terminology server (Indexing) | Ongoing
4 Interface Design |
4.2 Design interface usability issues | Ongoing
4.5 Display most accessed datasets | Not Started

Ongoing work
Task | Status
5 Personalized search |
5.1 Improve the tracking system | Ongoing
6 Searching/Ranking algorithms |
6.1 Similar datasets to be expanded | Ongoing
7 Display of results |
7.1 Sort datasets: author, published date, repository, title | Ongoing
7.2 What fields should be displayed? | Ongoing
7.3 Additional filters: File type; Data Restrictions (data use agreement, restricted, unrestricted); Data Level (participant/aggregate); Population (mouse, human, etc.) | Not Started
8 Link to external resources |
8.1 (1) PubMed: click through to PubMed records of citing publications; copy citation to clipboard. (2) Scholix Framework for Linking Data and Literature | Not Started

Ongoing work
Task | Status
10 Documentation |
10.1 Source code | Ongoing
10.2 Tutorials | Not Started
10.3 Help menu | Not Started
10.4 Video | Ongoing
11 Usability studies |
11.2 User studies | Ongoing
Data duplication issue: create a plan for how best to display/represent the duplicates in the metadata records, and set up a meeting to discuss the workflow for displaying them - Jeff/Anu |
12 Additional field in index |
13 Generation of benchmark for the dataset |
13.4 Execute designed queries and annotate results | Ongoing
14 Relationship Network Graph |
15 Collaborative research support |

Other issues
- Please deposit code in GitHub. Please contact me at Anupama.E.Gururaj@uth.tmc.edu if you need access.
- Any other issues?
- Thank you