Core Technology Development Team Meeting
- Emmeline Shaw
- 5 years ago
Transcription
1 Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: Access Code: For international call in numbers, please visit:
2 Agenda
- Updates regarding last meeting action items
- bioCADDIE AHM slides
- Indexing and metadata mapping plans
- Brief progress reports from all
Supported by the NIH grant 1U24 AI to the University of California, San Diego
3 Updates: Action Items
- Visit datamed.biocaddie.org and provide feedback on the new version, v0.2, via the prototype_issues repository in GitHub
- Add user feedback from the BD2K AHM demo to the prototype_issues repository in GitHub
- Prepare the bioCADDIE AHM slides for discussion during the next CDT meeting
- Have the bioCADDIE posters from BD2K AHM available as flyers/handouts for the attendees
4 DDI Core Technology Development Team
5 Introduction
- Help users find accessible data
  - By disease, data modality, demographics, etc.
  - Engage these users and the data community
- Assist data producers on how to publish data for maximal discoverability
  - Tools to index, submit, visualize
- Build a prototype that docks targeted pilot products
  - RFAs for others to help build DataMed
6 Core Development Roadmap
Milestones: September 2015 (Version 0.1); November 2015 (Version 0.2, "We are here"); February 2016 (Version 0.5); June 2016 (Version 1.0)
- DDI architecture: Set up website for searching for datasets; Set up infrastructure for web portal
- Data ingestion: Determine datasets; Decide on scalable data/metadata input routes; Metadata mapping
- Search function: Implement the function for 3 repositories
- Feedback collection: GitHub
- RFA for pilot on Harvester for DDI schema: RFA announced; Review, selection and award
- Data identifier: Implement data identifier into the DDI
- Data indexing: Set up indexing using metadata from WG 3.0
- Dataset result display: Sort datasets; Group metadata
- Terminology server: Import ontology; Integrate to Scigraph API; Integrate autocomplete feature to prototype
- Interface design: New interface for prototype v0.2; Global statistics
- Wrap up of pilot projects:
  - PP 1.1 literature/dataset link: Advanced search for domain specific repository
  - PP 2.1 Recommender System: Ranking results
  - PP 2.2 isee/delve: Innovative visualization
  - PP 3.2 PDB citation pipeline
- Usability study: UI Analysis
- Ranking algorithm: Results from PP 2.1
- Search function: Expand the function to 7 repositories; Find similar datasets; Search history
- Architecture: Code refactoring
- Usability study: User study; Track user's actions
- Ranking algorithm: Refine search results based on user's selection; Report from WG 8
- Data duplication problem: Metadata management
- Personalized search: Share/save search results; User account
- Link dataset to external resources: PubMed; Grants
- Search algorithm: Boolean/advanced search; Data repository search function
7 User Needs Analysis
- Focused semi-structured interviews: 13 participants, about 30 minutes each, in person or by phone
- Proper guidance on data standardization and dataset linkage is required
- Visualization-tool-compatible, unified, agreeable formats that provide granular access to data decrease data format issues
- Robust visualization tools and a dashboard view of data enhance the data visualization experience
- Metadata challenges can be mitigated via:
  - standard ontologies and controlled vocabularies
  - text mining
  - user-friendly tools to create metadata
  - sufficient training
8 bioCADDIE prototype architecture
[Architecture diagram. Labels: Ingestion, Indexing, Repositories, Metadata Ingestion, ElasticSearch, Data Sources, Online datasets, User Interface, Funding Agencies, Publishers, Data producers, Terminology server]
9 Data Indexing Pipeline (12/1/15)
Data source:
1. Configuration file developed by curator
2. Extraction of metadata/data from data resource or dataset via ingestion module; cache information for further processing
3. Process metadata/data via a set of processing modules, e.g. ID conversion, keyword extraction, data normalization
4. Mapping of metadata/data to metadata model(s)
5. Export to target endpoint(s) via export modules
6. Search via ElasticSearch APIs
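The pipeline steps on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the configuration keys, and the in-memory list standing in for the index endpoint are all assumptions made for the example, not the bioCADDIE implementation. Step 6 (searching) is left out.

```python
import json

# Step 1: hypothetical curator configuration; the keys are illustrative only.
CONFIG = {
    "source": "example-repo",
    "field_map": {"title": "dataset.title", "kw": "dataitem.keywords"},
}


def ingest(records):
    """Step 2: extract raw records and keep a cached copy for later stages."""
    return [dict(record) for record in records]


def process(record):
    """Step 3: example processing modules (whitespace and keyword normalization)."""
    out = dict(record)
    out["title"] = out.get("title", "").strip()
    out["kw"] = sorted({k.strip().lower() for k in out.get("kw", []) if k.strip()})
    return out


def map_to_model(record, field_map):
    """Step 4: map source fields onto dotted paths of a metadata model."""
    doc = {}
    for src, target in field_map.items():
        if src not in record:
            continue
        node = doc
        *parents, leaf = target.split(".")
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = record[src]
    return doc


def export(docs, endpoint):
    """Step 5: export JSON documents; a plain list stands in for the index."""
    for doc in docs:
        endpoint.append(json.dumps(doc, sort_keys=True))


def run_pipeline(records, config, endpoint):
    """Run steps 2 to 5 in order and return the mapped documents."""
    cached = ingest(records)
    mapped = [map_to_model(process(r), config["field_map"]) for r in cached]
    export(mapped, endpoint)
    return mapped


index = []
docs = run_pipeline(
    [{"title": " GPCR structures ", "kw": ["GPCR", " Receptor "]}], CONFIG, index
)
```

Keeping each stage as a separate function mirrors the modular design the slide describes, where ingestion, processing, mapping, and export modules can be swapped independently.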
10 Data Indexing Pipeline: Current Technologies
1. JSON-based documents and services
2. MongoDB (Apache 2 license) being used to manage cached dataset description documents
3. Processing pipeline components can take advantage of cloud deployment for scalability
4. Document processing coordinated via messaging queue (Apache ActiveMQ, Apache 2 license)
5. ElasticSearch (Apache 2 license) being used as index endpoint
   - Simple cloud deployment and management
   - Sophisticated RESTful API
   - Advanced index customization
   - Full power of Lucene and plug-ins
11 Data Transformation
12 Data Mapping
13 ElasticSearch Basic Search
14 ElasticSearch via CURL
curl -XGET ' -d '{ "query" : { "match" : { "dataitem.keywords" : "GPCR" } },"size":100 }' > ES.out
JSON data visualized via JSON Editor for the Mac wa/link?path=mac%2fjsoneditor
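The curl call on this slide sends a JSON match query to an ElasticSearch search endpoint. A rough Python equivalent using only the standard library might look like the sketch below. The function names are hypothetical, and the search URL is left as a caller-supplied parameter because the slide does not show it. The sketch sends the body with POST, which ElasticSearch search endpoints also accept, rather than the GET used by the curl command.

```python
import json
import urllib.request


def build_match_query(field, term, size=100):
    """Build the same request body as the curl example: a match query on one field."""
    return {"query": {"match": {field: term}}, "size": size}


def search(search_url, body):
    """POST a query body to a search URL and return the parsed JSON response.

    search_url must be supplied by the caller; it is not given in the slide.
    """
    request = urllib.request.Request(
        search_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build (but do not send) the query from the slide.
body = build_match_query("dataitem.keywords", "GPCR")
print(json.dumps(body, indent=2))
```

Separating query construction from the network call makes the query easy to inspect or log before it is sent.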
15 Current Work with Data Ingest
- Addition of more data repositories
- Working with WG3 to ensure consistency with metadata model
  - Revisions of metadata model based on data currently being ingested
- Addition of enhancement modules
  - Semantic expansion
  - Metadata enhancement
16 Ontology service roles
- Indexing
  - Process metadata
- Searching
  - Search expansion, e.g. Brain tumor: glioblastoma, astrocytoma, frontal lobe tumor, ...
  - Suggestions (autocompletion)
  - Fix typo (did you mean?)
  - Facet management
  - Search entity type: gene, disease, ...
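Two of the search-side roles listed on this slide, suggestions (autocompletion) and typo fixing ("did you mean?"), can be illustrated with Python's standard `difflib` module. This is a minimal sketch over a tiny hypothetical vocabulary; the function names and the similarity cutoff are assumptions, and a production terminology server would more likely use an index-backed suggester.

```python
import difflib

# Hypothetical vocabulary; a real service would load terms from its ontologies.
VOCABULARY = [
    "astrocytoma",
    "brain tumor",
    "frontal lobe tumor",
    "glioblastoma",
    "glioma",
]


def suggest(prefix, vocabulary, limit=5):
    """Autocompletion: vocabulary terms starting with the typed prefix."""
    p = prefix.lower()
    return sorted(t for t in vocabulary if t.lower().startswith(p))[:limit]


def did_you_mean(term, vocabulary, cutoff=0.8):
    """Typo fixing: the closest vocabulary term, or None if nothing is close."""
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


completions = suggest("gli", VOCABULARY)
correction = did_you_mean("glioblastma", VOCABULARY)
```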
17 Terminology services
- Term/phrase analyzer/normalizer
  - Search and indexing
  - Synonyms/hypo-hypernyms (term mapper)
    - cancer → {carcinoma, sarcoma, ...}
    - But not carcinoma → cancer
  - Build metadata structure
- Term mapper
  - Semantic mappings
  - Other terminologies
- Term processor
  - Suggestions (autocompletion)
  - Fix typo (did you mean?)
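The one-directional expansion on this slide (a search for "cancer" also matches "carcinoma" and "sarcoma", but a search for "carcinoma" does not widen to "cancer") can be sketched as a downward walk over narrower terms. The `NARROWER` map and the `expand` function below are hypothetical stand-ins for an ontology lookup, not the terminology server's API.

```python
# Hypothetical narrower-term (hyponym) map; a real server would read an ontology.
NARROWER = {
    "cancer": ["carcinoma", "sarcoma"],
    "brain tumor": ["glioblastoma", "astrocytoma"],
}


def expand(term, narrower=NARROWER):
    """Return the term followed by all of its narrower terms (downward only)."""
    seen = []
    stack = [term.lower()]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(narrower.get(current, [])))
    return seen


broad = expand("cancer")
narrow = expand("carcinoma")
```

Because the map only stores parent-to-child links, a narrow term never expands to its parent, which matches the "but not carcinoma → cancer" rule.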
18 Terminology
- Primary
  - Disease (Condition)
  - Drug (Chemical, Substance, Compound)
  - Omic: Gene, Protein
- Auxiliary
  - Body Structure
  - Procedure
  - Laboratory
  - Organism
- Maybe
  - Physical object
19 Sources
- Terminologies: MeSH, SNOMED CT, NCBI, GO, HGNC, FMA, RXNORM, UBERON
- Categories covered: Condition, Chemical, Omic, Procedure, Body Loc/Sys, Organism
20 Relationships
- Is a, part of
- Synonym
- Sibling
- MeSH → 10^6
- SNOMED CT → 10^6
- GO → 2x10^6
- NCI → 10^5
- HGNC → ~10^5 (synonyms)
21 Infrastructure
- SciGraph
  - Neo4j
  - Graph database
- ElasticSearch
  - Term search
22 User Interface workflow
[Workflow diagram. Labels: Query Entry, Entity Identification, Expansion, Query Execution, bioCADDIE backend, Advanced filters, Terminology server, ElasticSearch, Presentation, Facets, Organize results, Visualization]
23 MVC Structure
- Model: responsible for constructing the ES query and retrieving search results.
- View: user interaction; responsible for rendering of the model.
- Controller: responsible for responding to user input and instructing the Model to respond to the user input. Generates the corresponding ES index, type, search keyword, search fields, facet fields, and filter fields. Integrates with the terminology server.
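As an illustration of how search keyword, search fields, facet fields, and filter fields might be turned into a single ElasticSearch request, here is a small sketch. It assumes the `bool` query with a `filter` clause and `terms` aggregations as found in ElasticSearch 2.x and later, which may differ from the version and syntax the prototype used. The function name and the field names in the usage lines are hypothetical.

```python
def build_search_request(keyword, search_fields, facet_fields, filters=None):
    """Assemble a request body: keyword search, exact-match filters, facet counts."""
    query = {
        "bool": {
            # Full-text match of the keyword across the chosen search fields.
            "must": [
                {"multi_match": {"query": keyword, "fields": list(search_fields)}}
            ],
            # Exact-match filters chosen by the user (for example, a facet click).
            "filter": [
                {"term": {field: value}}
                for field, value in sorted((filters or {}).items())
            ],
        }
    }
    # One terms aggregation per facet field gives the counts shown in the UI.
    aggs = {field: {"terms": {"field": field}} for field in facet_fields}
    return {"query": query, "aggs": aggs}


request_body = build_search_request(
    "GPCR",
    search_fields=["dataset.title", "dataitem.keywords"],
    facet_fields=["dataset.source"],
    filters={"dataset.source": "PDB"},
)
```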
24 User Interface
- UI webpage address: datamed.biocaddie.org
- User name: biocaddie
- Password: biocaddie
25 Pilot project integration
- Linking publications and underlying datasets using natural language processing
  - Integrated as: Specialized advanced search for GWAS datasets
- Data Recommendation Using Machine Learning and Crowdsourcing
  - Integrated as: Ranking function based on citation metrics for GEO series data
- Intelligent Search Expansion and Visualization of Datasets (isee-delve)
  - Integrated as: a) isee similarity metric in ElasticSearch; b) DELVE implementation as exploratory search and visualization option, (i) for PDB, (ii) for gene expression data
- Development of Citation and Data Access Metrics applied to RCSB Protein Data Bank and related Resources
  - Integrated as: Ranking function based on citation metrics (dataset mentions) for PDB data
26 Refinement & Evaluation
- Agile approach:
  - Iterative testing by users
  - Feedback from users based on user requirements to inform development
  - User participation facilitated by issue tracker software, GitHub: Biocaddie/prototype_issues
- System-centric evaluation (WG4):
  - Evaluated in the context of pre-defined use cases
  - Queries and benchmark datasets being developed
  - Evaluation using standard relevance metrics, such as precision, recall, mean average precision (MAP) and the F-measure
  - Supports the evaluation of improved search algorithms in future iterations
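The relevance metrics named on this slide have standard definitions, shown here as a short Python sketch. The function names and the toy document identifiers are illustrative; the F-measure is computed as the balanced F1 score, and each ranked list is assumed to contain no duplicate results.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and balanced F-measure (F1) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over a list of (ranked_results, relevant_items) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"])
ap = average_precision(["a", "b", "c", "d"], ["a", "c", "e"])
```

In this toy query, two of four retrieved items are relevant (precision 1/2), two of three relevant items are found (recall 2/3), and the average precision is (1/1 + 2/3) / 3 = 5/9.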
27 Evaluation & User-testing
- User-centric evaluation: collect user behavioral data to improve user interface design
- Four key components of usability:
  1. User Analysis
  2. Function Analysis
  3. Representation Analysis
  4. Task Analysis
- Study a representative sample of users interacting with the system while performing specific tasks.
- Analysis of verbal think-aloud protocols to capture and characterize the interaction between user and system in the context of a relevant task.
- Identify issues related to system usability, and enhance understanding of dataset search and of the knowledge and priorities brought to bear on it by domain experts.
28 Timeline (Y2 Q1: Sep. to Nov., 2015; Y2 Q2: Dec. to Feb., 2016)
- Interface design: Global statistics; New interface for prototype v0.2; Improving interface based on feedback
- Searching algorithms: Find similar datasets; Add Boolean search; Add advanced search; Add data repositories search
- Ranking algorithms: Refine search results based on user's selection; Report from WG 8
- Dataset result display: Sort datasets; Group metadata; Accessibility of dataset; Summarize all returned results (isee-DELVE); Allow users to select multiple repositories; Improve faceted browsing
- Personalized search: Search history; Share search results; User account; Save search results
- Link dataset to external resources: PubMed; Grants (via PubMed?)
29 Timeline (Y2 Q1: Sep. to Nov., 2015; Y2 Q2: Dec. to Feb., 2016)
- Data ingestion: Data ingestion and indexing; Metadata mapping
- Terminology server: Import ontology; Create UI browser; Integrate to Scigraph API; Integrate autocomplete feature to core UI
- Integration of pilot projects: Integrate PP 1.1 GWAS Finder; Integrate PP 2.2 isee-DELVE for PDB; Integrate PP 2.1 DataRank for GEO; Explore gene expression data using PP 2.2; Integrate PP 3.2 for PDB; Integrate PP 2.1 for other dataset
- Feedback collection: GitHub; Feedback form
- Documentation: Source codes on GitHub; Tutorials
- Usability study: UI Analysis; User Study; Track user's actions
- Data duplication problem: Metadata management
- Architecture/Scalability: Code refactoring; Back up
30 Team members
- UCSD: Claudiu Farcas, Jeffrey Grethe, Yueling Li, Larry Lui, Burak Ozyurt
- UTHealth: Muhammad Amith, Xiaoling Chen, Trevor Cohen, Xiao Dong, Anupama Gururaj, Min Jiang, Todd Johnson, Ruiling Liu, Vidya Narayana, Deevakar Rogith, Ergin Soysal, Cui Tao, Hua Xu, Yaoyun Zhang
- Oxford: Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Susanna-Assunta Sansone
- NIH: Ian Fore, Ron Margolis
- Pilot projects team members
31 Demo of prototype
32 Indexing and metadata mapping plans
- Next set of repositories to be indexed
  - Timeline
- Metadata mapping tools for repositories
- Documentation for the workflow for mapping by the repositories
  - ICPSR as a use case
33 Ongoing work (Task: Status)
1 Metadata Ingestion
  Import repositories: 1. PDB, GEO; 2. LINCS; 3. BioProject, ArrayExpress, GEMMA, dbGaP; 4. ICPSR
    Status: Stable; API details; Ongoing; Sample files
  1.2 Metadata mapping: Ongoing
  1.3 Metadata management: Ongoing
  1.4 Indexing: Ongoing
2 Terminology server
  2.1 Develop terminology server
    1) Imported terminologies (6) and validated them
    2) Created UI-Browser for TS
    3) Integration to Scigraph API
    4) Create auto complete feature
    Status: 09/01; 10/09; Ongoing; 10/
  Integrate terminology server: Ongoing
34 Pilot project integration (Task 3)
- Presented to CDT: / / / /01
- Integrated as:
  - Specialized advanced search for GWAS datasets
  - Ranking function based on citation metrics for GEO series data
  - a) isee similarity metric in ElasticSearch; b) DELVE implementation as exploratory search and visualization option, (i) for PDB, (ii) for gene expression data
  - Ranking function based on citation metrics (dataset mentions) for PDB data
- Completed on: 09/22; 10/21; 9/01; Ongoing (12/31); Ongoing (11/30)
35 Ongoing work (Task: Status)
4 Interface Design
  4.1 Global statistics: Implemented
  4.2 Design interface: Ongoing
  4.3 Implement new design: Ongoing
  4.4 Breadcrumb for website navigation: Not started
  4.5 Display most accessed datasets: Not started
5 Personalized search
  5.1 Search history: Implemented
  5.2 Save search results: Not started
  5.3 Share search results: Not started
  5.4 User account (discussion): Not started
6 Searching/Ranking algorithms
  6.1 Similar datasets: Implemented
  6.2 Data repositories search: Ongoing
  6.3 Boolean/advanced search: Not started
  6.4 Refine search results based on user's selection: Not started
36 Ongoing work (Task: Status)
7 Display of results
  7.1 Sort datasets: Ongoing
  7.2 What fields should be displayed?: Discussion
  7.3 Browsing (grouping facets/metadata): Not started
  7.4 Accessibility information: Not started
8 Link to external resources
  8.1 PubMed: Ongoing
  8.2 Grants: Ongoing
9 Feedback
  9.1 GitHub: Implemented
  9.2 Feedback form: Not started
10 Documentation
  10.1 Source code: Not started
  10.2 Tutorials: Not started
11 Usability studies
  11.1 UI Analysis: Completed
  11.2 User studies: Not started
12 Data duplication issue
39 Other issues
- Please deposit code in GitHub. Please contact me at Anupama.E.Gururaj@uth.tmc.edu if you need access
- Any other issues?
- Thank You
Core Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationAgenda. Clarification of issues Quarter definition Steering and Executive Committee composition Dissemination and community outreach activities
Agenda Clarification of issues Quarter definition Steering and Executive Committee composition Dissemination and community outreach activities Progress and updates Y1Q3 and plans for Y1Q4 Plan for the
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationExecutive Committee Meeting
Executive Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationSteering Committee Meeting
Steering Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationExecutive Committee Meeting
Executive Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationeveloping DataMed the current status
eeloping DataMed the current status Hua Xu Core Deelopment Team (CDT) biocaddie AHM 2017 8/8/17 Supported by the NIH grant 1U24 AI117966-01 to the Uniersity of California, San Diego 1 Outline CDT Roles
More informationSteering Committee Meeting
Steering Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationMetadata Ingestion and Processinng
biomedical and healthcare Data Discovery Index Ecosystem Ingestion and Processinng Jeffrey S. Grethe, Ph.D. 2017 BioCADDIE All Hands Meeting prototype Ingestion Indexing Repositories Ingestion ElasticSearch
More informationExecutive Committee Meeting
Executive Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting Agenda v Updates regarding last meeting action items v Presentation by Ergin about Ontology Services v Brief updates from others Supported by the NIH grant 1U24
More informationThe Final Updates. Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University of Oxford, UK
The Final Updates Supported by the NIH grant 1U24 AI117966-01 to UCSD PI, Co-Investigators at: Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationMinutes. Date: Location: UCSD BRF2 5A03. Attendees Present
Executive Committee Meeting Location: UCSD BRF2 5A03 Date: 8-16-16 Start time: 10:00 am PDT End time: 11:30 am PDT Meeting Objective Attendees Present Minute Taker Executive Committee Meeting UCSD: Lucila
More informationSteering Committee Meeting
Steering Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please isit: https://www.readytalk.com/account-administration/international-numbers
More informationExecutive Committee Meeting
Executive Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationSteering Committee Meeting
Steering Committee Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationSusanna-Assunta Sansone, PhD. Metadata WG3 chair.
Susanna-Assunta Sansone, PhD Metadata WG3 chair 3-workgroup@biocaddie.org WG3 Metadata v v Full description: goals, synergies, phases, members & files Joint effort with BD2K Center for Expanded Data Annotation
More informationCore Technology Development Team Meeting
Core Technology Development Team Meeting To hear the meeting, you must call in Toll-free phone number: 1-866-740-1260 Access Code: 2201876 For international call in numbers, please visit: https://www.readytalk.com/account-administration/international-numbers
More informationMetadata Discovery and Integration to Support Repurposing of Heterogeneous Data using the OpenFurther Platform
Metadata Discovery and Integration to Support Repurposing of Heterogeneous Data using the OpenFurther Platform biocaddie All Hands Meeting September 11 th, 2016 Ram Gouripeddi & Julio Facelli Department
More informationA Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data Repositories Tim Clark (Harvard Medical School & Massachusetts General Hospital) Martin Fenner (DataCite) Mercè Crosas (Institute for Quantiative Social Science,
More informationPowering Knowledge Discovery. Insights from big data with Linguamatics I2E
Powering Knowledge Discovery Insights from big data with Linguamatics I2E Gain actionable insights from unstructured data The world now generates an overwhelming amount of data, most of it written in natural
More informationHarmonizing biocaddie Metadata Schemas for Indexing Clinical Research Datasets Using Semantic Web Technologies
Harmonizing biocaddie Metadata Schemas for Indexing Clinical Research Datasets Using Semantic Web Technologies Harold R. Solbrig 1, Guoqian Jiang 1 1 Mayo Clinic College of Medicine, Rochester, MN [solbrig.harold,
More informationPrototyping a Biomedical Ontology Recommender Service
Prototyping a Biomedical Ontology Recommender Service Clement Jonquet Nigam H. Shah Mark A. Musen jonquet@stanford.edu 1 Ontologies & data & annota@ons (1/2) Hard for biomedical researchers to find the
More informationData publication and discovery with Globus
Data publication and discovery with Globus Questions and comments to outreach@globus.org The Globus data publication and discovery services make it easy for institutions and projects to establish collections,
More informationSimile Tools Workshop Summary MacKenzie Smith, MIT Libraries
Simile Tools Workshop Summary MacKenzie Smith, MIT Libraries Intro On June 10 th and 11 th, 2010 a group of Simile Exhibit users, software developers and architects met in Washington D.C. to discuss the
More informationwarwick.ac.uk/lib-publications
Original citation: Zhao, Lei, Lim Choi Keung, Sarah Niukyun and Arvanitis, Theodoros N. (2016) A BioPortalbased terminology service for health data interoperability. In: Unifying the Applications and Foundations
More informationJisc Research Data Discovery Service Project Workshop Christopher Brown
18 Feb 2016 Jisc Research Data Discovery Service Project Workshop Christopher Brown Agenda» 10:30 10:40 Welcome and Introduction - Catherine Grout» 10:40 10:45 Project status and introduction to workshop/exercise
More informationWebinar Annotate data in the EUDAT CDI
Webinar Annotate data in the EUDAT CDI Yann Le Franc - e-science Data Factory, Paris, France March 16, 2017 This work is licensed under the Creative Commons CC-BY 4.0 licence. Attribution: Y. Le Franc
More informationEmbracing Semantic Technology for Better Metadata Authoring in Biomedicine
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine Attila L. Egyedi, Martin J. O Connor, Marcos Martínez-Romero, Debra Willrett, Josef Hardi, John Graybeal, and Mark A. Musen Stanford
More informationLIBER Webinar: A Data Citation Roadmap for Scholarly Data Repositories
LIBER Webinar: A Data Citation Roadmap for Scholarly Data Repositories Martin Fenner (DataCite) Mercè Crosas (Institute for Quantiative Social Science, Harvard University) May 15, 2017 2014 Joint Declaration
More informationNCBI News, November 2009
Peter Cooper, Ph.D. NCBI cooper@ncbi.nlm.nh.gov Dawn Lipshultz, M.S. NCBI lipshult@ncbi.nlm.nih.gov Featured Resource: New Discovery-oriented PubMed and NCBI Homepage The NCBI Site Guide A new and improved
More informationLinking data and publications the past, present, and future. Dr. Hylke Koers, Head of Content Innovation, Elsevier
Linking data and publications the past, present, and future Dr. Hylke Koers, Head of Content Innovation, Elsevier BioCADDIE webinar January 8, 2015 Ease of access Open Access 2 The issue: data is important,
More informationInteroperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research
Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research Ian Fore, D.Phil. Associate Director, Biorepository and Pathology Informatics Senior Program
More informationenanomapper database, search tools and templates Nina Jeliazkova, Nikolay Kochev IdeaConsult Ltd. Sofia, Bulgaria
enanomapper database, search tools and templates Nina Jeliazkova, Nikolay Kochev IdeaConsult Ltd. Sofia, Bulgaria www.ideaconsult.net Ø enanomapper database: data model, technology; NANoREG data transfer
More informationWhat is Text Mining? Sophia Ananiadou National Centre for Text Mining University of Manchester
National Centre for Text Mining www.nactem.ac.uk University of Manchester Outline Aims of text mining Text Mining steps Text Mining uses Applications 2 Aims Extract and discover knowledge hidden in text
More informationThe Materials Data Facility
The Materials Data Facility Ben Blaiszik (blaiszik@uchicago.edu), Kyle Chard (chard@uchicago.edu) Ian Foster (foster@uchicago.edu) materialsdatafacility.org What is MDF? We aim to make it simple for materials
Digital repositories as research infrastructure: a UK perspective
Digital repositories as research infrastructure: a UK perspective Dr Liz Lyon Director This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0 UKOLN is supported by: Presentation
IHS Engineering Workbench V1.2 Release Notes
IHS Markit is pleased to announce the release of Version 1.2 of IHS Engineering Workbench, the next major release that delivers Standards management capabilities, along with multiple enhancements to existing
Introduction to Systems Biology II: Lab
Introduction to Systems Biology II: Lab Amin Emad NIH BD2K KnowEnG Center of Excellence in Big Data Computing Carl R. Woese Institute for Genomic Biology Department of Computer Science University of Illinois
Markus Kaindl Senior Manager Semantic Data Business Owner SN SciGraph
Analytics Building business tools for the scholarly publishing domain using LOD and the ELK stack SEMANTiCS Vienna 2018 Markus Kaindl Senior Manager Semantic Data Business Owner SN SciGraph 1 Agenda (25
Koha Integrations: EDS, Publication Finder and OpenAthens
Koha Integrations: EDS, Publication Finder and OpenAthens Alvet Manager, Library Services Engineering (South West Asia, Oceania, Africa) 1 Agenda Holdings Management and Publication Finder 2 1 Search Depth
Informatica Enterprise Information Catalog
Data Sheet Informatica Enterprise Information Catalog Benefits Automatically catalog and classify all types of data across the enterprise using an AI-powered catalog Identify domains and entities with
Multimedia Quarterly Review
Multimedia Quarterly Review April - September 2014 Wikimedia Foundation mediawiki.org/wiki/multimedia 10/23/2014 Vincent van Gogh - Self-Portrait by Vincent Van Gogh, from Google Art Project. Public domain,
Call for Participation in AIP-6
Call for Participation in AIP-6 GEOSS Architecture Implementation Pilot (AIP) Issue Date of CFP: 9 February 2013 Due Date for CFP Responses: 15 March 2013 Introduction GEOSS Architecture Implementation
Blink Project: Linked Open Data for Countway Library Final report for Phase 1 (June-Nov. 2012) Prepared by Sophia Cheng
Blink Project: Linked Open Data for Countway Library Final report for Phase 1 (June-Nov. 2012) Prepared by Sophia Cheng Summary We propose to improve the usefulness and discoverability of Countway Library
Exploring and Exploiting the Biological Maze. Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix
Exploring and Exploiting the Biological Maze Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix Motivation An abundance of biological data sources contain data about scientific entities, such as
Open Research Online The Open University's repository of research publications and other research outputs
Open Research Online The Open University's repository of research publications and other research outputs The Smart Book Recommender: An Ontology-Driven Application for Recommending Editorial Products
EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT
EUDAT A European Collaborative Data Infrastructure Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT OpenAire Interoperability Workshop Braga, Feb. 8, 2013 EUDAT Key facts
A Semantic Web-Based Approach for Harvesting Multilingual Textual Definitions from Wikipedia to Support ICD-11 Revision
A Semantic Web-Based Approach for Harvesting Multilingual Textual Definitions from Wikipedia to Support ICD-11 Revision Guoqian Jiang 1,* Harold R. Solbrig 1 and Christopher G. Chute 1 1 Department of
NCI Thesaurus, managing towards an ontology
NCI Thesaurus, managing towards an ontology CENDI/NKOS Workshop October 22, 2009 Gilberto Fragoso Outline Background on EVS The NCI Thesaurus BiomedGT Editing Plug-in for Protege Semantic Media Wiki supports
Oracle APEX 18.1 New Features
Oracle APEX 18.1 New Features May, 2018 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated
Europeana Core Service Platform
Europeana Core Service Platform DELIVERABLE D7.1: Strategic Development Plan, Architectural Planning Revision Final Date of submission 30 October 2015 Author(s) Marcin Werla, PSNC Pavel Kats, Europeana
Deliverable 8.2. Project ID Project Title. Project Acronym. Start Date of the Project. Duration of the Project. Work Package Number 8
Deliverable 8.2 Project ID 654241 Project Title Project Acronym Start Date of the Project Duration of the Project A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype
Q2 2017/2018 Oct Nov Dec
Q2 2017/2018 (Oct - Dec) Platform dev Multimedia Community Research Programs Design Search Q2 2017/2018 Oct Nov Dec Multi-Content Revisions (MCR) Goal: Have MCR sufficiently ready so that the Multimedia
The NIH Big Data to Knowledge Initiative: Raising the Prominence of Data
The NIH Big Data to Knowledge Initiative: Raising the Prominence of Data Michael F. Huerta, Ph.D. Associate Director, National Library of Medicine Director, Office of Health Information Programs Development
Making data publication a first class research output
Making data publication a first class research output Andrew L. Hufton Managing Editor, Scientific Data https://www.nature.com/sdata/ Helping Researchers Publish, University of Cambridge, Oct 2017 Launched
verapdf Industry supported PDF/A validation
verapdf Industry supported PDF/A validation About this webinar What we'll be showing you: our current development status; the Consortium's development plans for 2016; how we've been testing the software
WHO ICD11 Wiki LexWiki, Semantic MediaWiki and the International Classification of Diseases
WHO ICD11 Wiki LexWiki, Semantic MediaWiki and the International Classification of Diseases Guoqian Jiang, PhD Harold Solbrig Division of Biomedical Statistics and Informatics Mayo Clinic College of Medicine
CDIS Biomedical Data Commons
CDIS Biomedical Data Commons Computational Life Science Seminar Series October 18, 2017 Michael Fitzsimons Center for Data Intensive Science Agenda What is a Data Commons? Data Commons at CDIS NCI GDC
Kaltura Video Package for Moodle 2.x Quick Start Guide. Version: 3.1 for Moodle
Kaltura Video Package for Moodle 2.x Quick Start Guide Version: 3.1 for Moodle 2.0-2.4 Kaltura Business Headquarters 5 Union Square West, Suite 602, New York, NY, 10003, USA Tel.: +1 800 871 5224 Copyright
Alternative Tools for Mining The Biomedical Literature
Yale University From the SelectedWorks of Rolando Garcia-Milian May 14, 2014 Alternative Tools for Mining The Biomedical Literature Rolando Garcia-Milian, Yale University Available at: https://works.bepress.com/rolando_garciamilian/1/
Elements of a Practical Roadmap for Implementation: On-Line Platform Technology Facilitation Mechanism
Elements of a Practical Roadmap for Implementation: On-Line Platform Technology Facilitation Mechanism Clovis Freire, UNDESA Jorge Martinez Navarrete, UN-OICT Workshop on Science, Technology and Innovation
Design and Implementation of Agricultural Information Resources Vertical Search Engine Based on Nutch
619 A publication of CHEMICAL ENGINEERING TRANSACTIONS VOL. 51, 2016 Guest Editors: Tichun Wang, Hongyang Zhang, Lei Tian Copyright 2016, AIDIC Servizi S.r.l., ISBN 978-88-95608-43-3; ISSN 2283-9216 The
Building a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch
Nick Pentreath Nov / 14 / 16 Building a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch About @MLnick Principal Engineer, IBM Apache Spark PMC Focused on machine learning
MULTI-GROUP AUDITS, THE CENTRAL MONITORING PORTAL, AND OTHER CTSU UPDATES. Agenda MULTI-GROUP AUDITS 10/5/2017 OISHI SYMPOSIUM. Multi-Group Audits
MULTI-GROUP AUDITS, THE CENTRAL MONITORING PORTAL, AND OTHER CTSU UPDATES OISHI SYMPOSIUM 1 Agenda Multi-Group Audits Central Monitoring Portal Website and Administrative Updates 2 MULTI-GROUP AUDITS 3
Exploring the Nuxeo REST API
Exploring the Nuxeo REST API Enabling Rapid Content Application Craftsmanship Copyright 2018 Nuxeo. All rights reserved. Copyright 2017 Nuxeo. All rights reserved. Chapter 1 The Nuxeo REST API What do
CoG: The NEW ESGF WEB USER INTERFACE
CoG: The NEW ESGF WEB USER INTERFACE ESGF F2F Workshop, Livermore, CA, December 2014 Luca Cinquini [1], Cecelia DeLuca [2], Sylvia Murphy [2] [1] California Institute of Technology & NASA Jet Propulsion
A USER'S GUIDE TO REGISTERING AND MAINTAINING DATA SERVICES IN HIS CENTRAL 2.0
A USER'S GUIDE TO REGISTERING AND MAINTAINING DATA SERVICES IN HIS CENTRAL 2.0 Prepared by Jon Pollak, CUAHSI Water Data Center User Support Specialist September 2014 1 DISCLAIMERS The HIS Central application
Finding and Exporting Data. BioMart
September 2017 Finding and Exporting Data Not sure what tool to use to find and export data? BioMart is used to retrieve data for complex queries, involving a few or many genes or even complete genomes.
Phenotype Discovery in NHLBI Genomic Studies
Phenotype Discovery in NHLBI Genomic Studies Final Report Hyeoneui Kim, RN, PhD Son Doan, PhD Ko-Wei Lin, DVM, PhD Michael Conway, PhD Alexander Hsieh Asher Garland Seena Farzaneh Neda Alipanah Stephanie
What's Out There and Where Do I find it: Enterprise Metacard Builder Resource Portal
What's Out There and Where Do I find it: Enterprise Metacard Builder Resource Portal Gary W. Allen, PhD Project Manager Joint Training Integration and Evaluation Center Orlando, FL William C. Riggs Senior
Things to consider when using Semantics in your Information Management strategy. Toby Conrad Smartlogic
Things to consider when using Semantics in your Information Management strategy Toby Conrad Smartlogic toby.conrad@smartlogic.com +1 773 251 0824 Some of Smartlogic's 250+ Customers Awards Trend Setting
Data Exchange and Conversion Utilities and Tools (DExT)
Data Exchange and Conversion Utilities and Tools (DExT) Louise Corti, Angad Bhat, Herve L'Hours UK Data Archive CAQDAS Conference, April 2007 An exchange format for qualitative data Data exchange models
Development of an Ontology-Based Portal for Digital Archive Services
Development of an Ontology-Based Portal for Digital Archive Services Ching-Long Yeh Department of Computer Science and Engineering Tatung University 40 Chungshan N. Rd. 3rd Sec. Taipei, 104, Taiwan chingyeh@cse.ttu.edu.tw
Update on Dataverse Dryad-Dataverse Community Meeting. Mercè Crosas, Elizabeth Quigley & Eleni Castro. Data Science > IQSS > Harvard University
Update on Dataverse Image credit: David Bygott (CC-BY-NC-SA) 2014 Dryad-Dataverse Community Meeting Mercè Crosas, Elizabeth Quigley & Eleni Castro Data Science > IQSS > Harvard University Introduction
Creating a Recommender System. An Elasticsearch & Apache Spark approach
Creating a Recommender System An Elasticsearch & Apache Spark approach My Profile SKILLS Álvaro Santos Andrés Big Data & Analytics Solution Architect in Ericsson with more than 12 years of experience focused
ProQuest Dissertations and Theses Overview. Austin McLean and Marlene Coles CGS Summer Workshop, July 2017
ProQuest Dissertations and Theses Overview Austin McLean and Marlene Coles CGS Summer Workshop, July 2017 Agenda Dissertations and ProQuest Short form video Pilot Project 2 A mission that aligns with universities
RESTful API Design APIs your consumers will love
RESTful API Design APIs your consumers will love Matthias Biehl RESTful API Design Copyright 2016 by Matthias Biehl All rights reserved, including the right to reproduce this book or portions thereof in
DBpedia Data Processing and Integration Tasks in UnifiedViews
1 DBpedia Data Processing and Integration Tasks in Tomas Knap Semantic Web Company Markus Freudenberg Leipzig University Kay Müller Leipzig University 2 Introduction Agenda, Team 3 Agenda Team & Goal An
Indiana University Research Technology and the Research Data Alliance
Indiana University Research Technology and the Research Data Alliance Rob Quick Manager High Throughput Computing Operations Officer - OSG and SWAMP Board Member - RDA Organizational Assembly RDA Mission
Big Data Integrator Platform Platform Architecture and Features Dr. Hajira Jabeen Technical Team Leader-BDE University of Bonn
Big Data Integrator Platform Platform Architecture and Features Dr. Hajira Jabeen Technical Team Leader-BDE University of Bonn BDE Presentation, EBDVF, 17 BigDataEurope 2 Making Big Data Accessible 3 How
Reducing Consumer Uncertainty
Spatial Analytics Reducing Consumer Uncertainty Towards an Ontology for Geospatial User-centric Metadata Introduction Cooperative Research Centre for Spatial Information (CRCSI) in Australia Communicate
What's New in CTPAT. Logo and Abbreviation Current Membership Trusted Trader Best Practices Minimum Security Criteria Outreach/Training
What's New Our Mission Detect and prevent terrorists and terrorist weapons from entering the United States, while facilitating the orderly and efficient flow of legitimate trade and people at and through
UMLS-Query: A Perl Module for Querying the UMLS
UMLS-Query: A Perl Module for Querying the UMLS Nigam H. Shah, MBBS, PhD, Mark A. Musen, MD, PhD Center for Biomedical Informatics Research, Stanford University, Stanford, CA Abstract The Metathesaurus