Core Technology Development Team Meeting
To hear the meeting, you must call in.
Toll-free phone number: 1-866-740-1260
Access Code: 2201876
For international call-in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
Agenda
- Updates on action items
- Suggestions for bioCADDIE webinars
- Search workflow diagram
- Review of repository submission form / response to submissions
- LinkOut review
- Visualization plans
- Updates from all team members
Supported by the NIH grant 1U24 AI117966-01 to the University of California, San Diego
Updates - action items
- Other repositories should be sent to UTHealth team for integration, to be mapped to DATS 2.0 (Jeff)
  - ArrayExpress
  - Dataverse
  - dbGaP
  - LINCS
  - PDB
  - SRA
- Video (Jeff)
- Monthly testing of DataMed (Anu/CDT): Thursday, 08/04
Updates - action items
- Search workflow overview - Ruiling
- Repository submission form SOP - George
- Presentation of visualization/user activity tracking system - Deevakar/Ruiling
- Record the timestamps for queries and results retrieved to determine the response time - Ruiling
- PubMed link investigation - Jeff
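The response-time item above could be approached along the following lines. This is a minimal sketch, assuming the search backend can be called as a Python function; `timed_search`, `search_fn`, and the record field names are hypothetical and not taken from the DataMed code base.

```python
import time
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Tuple


def timed_search(
    search_fn: Callable[[str], List[Any]], query: str
) -> Tuple[List[Any], Dict[str, Any]]:
    """Run a search and record when the query was sent and when results came back."""
    submitted_at = datetime.now(timezone.utc)   # wall-clock timestamp for the log
    start = time.perf_counter()                 # monotonic clock for the duration
    results = search_fn(query)
    elapsed = time.perf_counter() - start
    retrieved_at = datetime.now(timezone.utc)

    record = {
        "query": query,
        "submitted_at": submitted_at.isoformat(),
        "retrieved_at": retrieved_at.isoformat(),
        "response_seconds": elapsed,
        "result_count": len(results),
    }
    return results, record
```

The wall-clock timestamps are kept for the log, while the duration comes from a monotonic clock so that system clock adjustments do not distort the measured response time.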
bioCADDIE webinars
- Suggestions?
- Please send to me (Anupama.E.Gururaj@uth.tmc.edu) or Elizabeth Bell (eabell@ucsd.edu)
Search Workflow Diagram
Search for Datasets on DataMed Website (flowchart)
- Lanes: User, DataMed Website, Terminology Server
- Steps: Submit a query term; Receive and process query term; Simple search or Advanced search (Parse advanced search); Correct misspelled words; Get synonyms of the query term; Add synonyms to query term; Generate Elasticsearch query; Generate search results and facets; Display search results and facets; View search results
- Filter decisions: data types, access types, authorization types, and repositories (each either selected or not selected); Get a list of selected repositories
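The "add synonyms, then generate Elasticsearch query" steps of the workflow can be illustrated roughly as follows. This is a sketch under stated assumptions, not the DataMed implementation: the function name `build_search_query` and the index field names (`title`, `description`, `repository`) are hypothetical, and the synonyms are assumed to have already been fetched from the terminology server.

```python
from typing import Any, Dict, List, Optional


def build_search_query(
    term: str,
    synonyms: List[str],
    filters: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, Any]:
    """Build an Elasticsearch query body that matches the term or any synonym."""
    # One multi_match clause per term; a hit on any of them is enough.
    should = [
        {"multi_match": {"query": text, "fields": ["title", "description"]}}
        for text in [term, *synonyms]
    ]
    # Only facets the user actually selected become filter clauses.
    filter_clauses = [
        {"terms": {field: values}}
        for field, values in (filters or {}).items()
        if values
    ]
    return {
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": 1,
                "filter": filter_clauses,
            }
        }
    }
```

For example, `build_search_query("cancer", ["neoplasm"], {"repository": ["PDB"]})` yields a `bool` query with two `should` clauses and one repository filter. Unselected facets (empty lists) add no filter, which mirrors the "no selected repositories" branches in the diagram.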
Repository Submission form
- http://datamedbeta.biocaddie.org/submit_repository.php
- https://datamed.org/manage_submit_repository.php?show=all - Have these requests been responded to?
- Reminder email to be sent if no response seen for a week?
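If the one-week reminder is adopted, the check itself is small. The sketch below assumes each submission stores a submission time and an optional response time; `needs_reminder` and its parameters are hypothetical names, and sending the email is left out.

```python
from datetime import datetime, timedelta
from typing import Optional

REMINDER_AFTER = timedelta(days=7)  # "no response seen for a week"


def needs_reminder(
    submitted_at: datetime,
    responded_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True when a submission has gone a week or more without a response."""
    if responded_at is not None:
        return False
    return now - submitted_at >= REMINDER_AFTER
```

A scheduled job could run this over all open submissions once a day and email the responsible team member for each one that returns True.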
LinkOut
Xiaoling Chen
What is LinkOut
- A service that allows you to link from PubMed and other NCBI databases to resources beyond the NCBI systems.
- Aims to facilitate access to relevant online resources to extend, clarify, and supplement information found in NCBI databases.
- Online resources that may be valuable to users of PubMed and other NCBI databases are encouraged to participate in LinkOut.
Examples of LinkOut
- http://www.ncbi.nlm.nih.gov/pubmed/16465590
- http://www.ncbi.nlm.nih.gov/gene/3882292
- http://www.ncbi.nlm.nih.gov/pubmed/23335479
Databases available for linking
Types of providers
- As of July 21, 2016, there are 4083 LinkOut providers.
  - Full-text publication
  - Biological databases
  - Consumer health information
  - Research tools
Prerequisites for Participation
- Resources submitted for inclusion in LinkOut will be evaluated individually to determine whether they meet certain inclusion criteria.
- Quality: LinkOut resources and the information therein must be of sufficiently high quality that NCBI database users will not be hindered, interrupted, or unnecessarily frustrated in their research.
- Relevance: resources must be directly relevant to the specific subjects of the NCBI database records and useful to users' study and research.
Apply for Inclusion in LinkOut
- To apply for inclusion in LinkOut, send an email to linkout@ncbi.nlm.nih.gov.
- Include the following information:
  - Name, email address, and phone number of a contact person in your organization.
  - The scope of your resource, including the URL of the resource. Also, please describe the type of NCBI database records to which you would like to apply links.
  - Describe any restrictions on access to the resource.
- A LinkOut team member will email the contact person within 1 week regarding your request.
Files for inclusion
- LinkOut requires two types of files to describe online resources:
  - Identity files (XML)
  - Resource files (XML, CSV, Simple Text)
- These files are specified in the LinkOut DTD. They include the necessary elements for the NCBI retrieval system to construct an appropriate URL to access specific resources.
File Transfer
- When you receive your account information, validate the files using the LinkOut File Validation Utility and transfer all files via FTP to the host FTPprivate.ncbi.nlm.nih.gov.
- Inform the LinkOut team at linkout@ncbi.nlm.nih.gov.
- Your files will be given a final evaluation before being placed in the production queue. From this point on, files will be processed automatically every day.
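The FTP transfer step could be scripted with Python's standard `ftplib`, roughly as below. This is only a sketch: the function name, the `ftp_factory` parameter (there so the upload can be exercised without a network connection), and the choice to upload into the login directory are assumptions, and the credentials are whatever NCBI issues with the account. Validation with the LinkOut File Validation Utility is not covered and should happen first.

```python
import ftplib
from pathlib import Path
from typing import Callable, Iterable, List

LINKOUT_FTP_HOST = "FTPprivate.ncbi.nlm.nih.gov"  # host named on the slide


def upload_linkout_files(
    paths: Iterable[str],
    user: str,
    password: str,
    ftp_factory: Callable[[str], ftplib.FTP] = ftplib.FTP,
) -> List[str]:
    """Upload already-validated LinkOut files and return the uploaded file names."""
    uploaded = []
    # ftplib.FTP(host) connects on construction; the context manager
    # closes the connection when the block exits.
    with ftp_factory(LINKOUT_FTP_HOST) as ftp:
        ftp.login(user=user, passwd=password)
        for path in map(Path, paths):
            with path.open("rb") as handle:
                # Assumption: files go into the directory the account lands in.
                ftp.storbinary(f"STOR {path.name}", handle)
            uploaded.append(path.name)
    return uploaded
```

After a successful upload, the remaining step from the slide still applies: email the LinkOut team so the files receive their final evaluation.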
Provider Responsibilities
- Maintaining their LinkOut files
- Transferring any additions, changes, or deletions of their links to NCBI
- Updating files and informing NCBI when access rights are changed
- Correcting broken or incorrect links in a timely manner
- Participation in LinkOut is free and voluntary, and so may be discontinued at any time. Submission of links is at the provider's discretion; participants may choose not to submit links to certain portions of their resource.
Provider Statistics
- LinkOut collects statistics on the number of clicks on each provider's links in the LinkOut display.
- Statistics can be emailed to the LinkOut contact monthly.
Links
- Homepage:
  - http://www.ncbi.nlm.nih.gov/projects/linkout/doc/linkout.html
- Identity file:
  - http://www.ncbi.nlm.nih.gov/books/NBK3802/#nonbib.file_preparation_identity_file
- Resource file:
  - http://www.ncbi.nlm.nih.gov/books/NBK3802/#nonbib.file_preparation_resource_file_xm
Visualizations for DataMed Website
Visualization 1
Book Stack
- Bins are the data types
- Thicker book = more results
- Taller book = high ranking
Figure: bins labeled Population, Organism, System, Organ, Tissue, Cell, Molecule, Genome, Gene, Nucleotide, Chemistry; example book "PDB (23)". On click of a bar: keyword cloud of results. On mouseover: repository name and number of results.
Book Stack
- Pros:
  - Scalable: stack more repositories in bin, adjust width and spaces
  - Visible index: data types and repositories contained within bin
- Cons:
  - Multiple datatype relationships
  - Thickness and height may be confusing / not evident
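To make the book-stack encoding concrete (thicker book = more results, taller book = higher ranking), one possible mapping from counts and ranks to drawing sizes is sketched below. The function name, the linear scaling, and the pixel limits are all assumptions for illustration; the slides do not specify how sizes would be computed.

```python
from typing import Dict, List, Tuple


def book_dimensions(
    repos: List[Tuple[str, int, int]],
    max_width: float = 40.0,
    max_height: float = 120.0,
    min_size: float = 4.0,
) -> Dict[str, Tuple[float, float]]:
    """Map (name, result_count, rank) to (width, height); rank 1 is the top rank."""
    if not repos:
        return {}
    max_count = max(count for _, count, _ in repos) or 1
    worst_rank = max(rank for _, _, rank in repos)
    dims = {}
    for name, count, rank in repos:
        # Thickness grows linearly with the number of results.
        width = max(min_size, max_width * count / max_count)
        # Height shrinks linearly as the rank gets worse.
        height = max(min_size, max_height * (worst_rank - rank + 1) / worst_rank)
        dims[name] = (width, height)
    return dims
```

The `min_size` floor keeps repositories with very few results clickable, which speaks to the concern that thickness and height may not be evident for small values.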
Visualization 2
Spoke and Wheel
- Wheel is divided into datatypes
- Spokes are the repositories
- Word cloud of a repository
- Repositories: color = importance of the repository
Spoke and Wheel
- Pros:
  - Scalable, by adjusting the density and the bins
  - Can fit in smaller screen space
- Cons:
  - Multiple datatype relationships
  - Not easy to click / pick a repository
  - Limited space to display word cloud
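The spoke-and-wheel layout (wheel divided into datatypes, one spoke per repository) amounts to assigning each repository an angle. The sketch below assumes equal-sized sectors and evenly spaced spokes within each sector; `spoke_angles` and that spacing rule are illustrative choices, not something the slides prescribe.

```python
from typing import Dict, List, Tuple


def spoke_angles(
    repos_by_datatype: Dict[str, List[str]]
) -> Dict[Tuple[str, str], float]:
    """Return the angle in degrees for each (datatype, repository) spoke."""
    angles: Dict[Tuple[str, str], float] = {}
    sector_count = len(repos_by_datatype)
    if sector_count == 0:
        return angles
    sector = 360.0 / sector_count  # each datatype gets an equal slice
    for index, (datatype, repos) in enumerate(repos_by_datatype.items()):
        start = index * sector
        # Leave a gap at both sector edges so neighbouring sectors do not touch.
        step = sector / (len(repos) + 1)
        for position, repo in enumerate(repos, start=1):
            angles[(datatype, repo)] = start + position * step
    return angles
```

With many repositories in one datatype the step between spokes becomes small, which is the "not easy to click / pick a repository" drawback noted above.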
GitHub Issues
Total Issues: 138 (Number Open: 66, Number Closed: 72)
Categories: Associated with v1.0; Usability Issues; Associated with v0.5; Number of Bugs; Number of Enhancements; Number of Questions
Number Open / Number Closed: 9 / 1; 29 / 0; 38 / 46; 7 / 10; 27 / 20; 8 / 9
Ongoing work
Task [Status]
1 Metadata Ingestion
  1.1 Import repositories expansion [Ongoing]
  1.2 Data repository suggestion form at DataMed - George
  1.3 Metadata mapping review/reconciliation between curators [Ongoing]
  1.4 Metadata management [Ongoing]
  1.5 Indexing [Ongoing]
  1.6 NLP-based indexing: Gene/protein, Disease, Drug/chemical, Biological process, Organism, Format, Access, Cell types [Evaluation phase]
  1.7 Bulk download of indices
2 Terminology server
  2.3 Integrate terminology server (Indexing) [Ongoing]
4 Interface Design
  4.2 Design interface usability issues [Ongoing]
  4.5 Display most accessed datasets [Not Started]
Ongoing work
Task [Status]
5 Personalized search
  5.1 Improve the tracking system [Ongoing]
6 Searching/Ranking algorithms
  6.1 Similar datasets to be expanded [Ongoing]
7 Display of results
  7.1 Sort datasets: author, published date, repository, title [Ongoing]
  7.2 What fields should be displayed? [Ongoing]
  7.3 Additional filters: File type; Data Restrictions (data use agreement, restricted, unrestricted); Data Level (participant/aggregate); Population (mouse, human, etc.) [Not Started]
8 Link to external resources
  8.1 (1) PubMed: click through to PubMed records of citing publications; copy citation to clipboard. (2) Scholix Framework for Linking Data and Literature [Not Started]
Ongoing work
Task [Status]
10 Documentation
  10.1 Source code [Ongoing]
  10.2 Tutorials [Not Started]
  10.3 Help menu [Not Started]
  10.4 Video [Ongoing]
11 Usability studies
  11.2 User studies [Ongoing]
  Data duplication issue: create a plan for how best to display/represent duplicates in the metadata records, and set up a meeting to discuss the workflow for displaying them - Jeff/Anu
12 Additional field in index
13 Generation of benchmark for the dataset
  13.4 Execute designed queries and annotate results [Ongoing]
14 Relationship Network Graph
15 Collaborative research support
Other issues
- Please deposit code in GitHub. Please contact me at Anupama.E.Gururaj@uth.tmc.edu if you need access.
- Any other issues?
- Thank you