Introduction to FAIRDOM (fair-dom.org): Findable, Accessible, Interoperable, Reusable Data, Operations, Models. Jon Olav Vik, Centre for Integrative Genetics (CIGENE), IHA, BIOVIT, NMBU. www.nmbu.no/prosjekter/digisal
Research Council of Norway (Norges Forskningsråd), biotechnology programme. Life, technology and value creation. 12 researcher projects + the network project "Digitalt Liv Norge" (Digital Life Norway). A total of 360 million kroner over 5 years.
The Digital Salmon: a library (Storyboard from an animation made in collaboration with NMBU's communications department and animator Tor Martin Austad, visuallab.no.)
Menu. FAIRDOM: Findable, Accessible, Interoperable, Reusable Data, Operations, Models. ISA: Investigation, Study, Assay. Providing context for data. Metadata = data about data. Making data easy to navigate and use. Your own data management plan.
A menagerie of biological data: electrofished trout; heart deformation; reindeer tracking (obsolete file format); magnetic resonance imaging; RNAseq gene expression; satellite vegetation index; heart electrophysiology; liver metabolomics. Data sizes shown: 5 GB/sample, 5 MB, 500 kb.
Now you. Team up with a friend. Tell them about your project data. Then we will review.
Responses 8-)
fair-dom.org manages to be useful for...
Data generation and management in the Digital Salmon. Diagram labels: Fabian, Jacob, Knowledge manager, Jon Olav. SPARQL/RDF with GBOL (Genome Biology Ontology Language).
fairdomhub.org
API (application programming interface)
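To make the idea of an API concrete, here is a minimal sketch of how a script, rather than a person clicking in a browser, might ask fairdomhub.org for a list of projects. The resource path `projects`, the `Accept: application/json` header and the layout of the response (`data` items with `attributes.title`) are assumptions for illustration; check the FAIRDOMHub API documentation before relying on them. The request is only prepared, not sent, and `SAMPLE` is a made-up response body.

```python
import json
from urllib.request import Request

BASE_URL = "https://fairdomhub.org"  # site named on the slides


def build_request(resource: str) -> Request:
    """Prepare (but do not send) a GET request that asks for JSON."""
    return Request(f"{BASE_URL}/{resource}", headers={"Accept": "application/json"})


def titles(payload: str) -> list[str]:
    """Pull titles out of a listing; the JSON layout is an assumption."""
    document = json.loads(payload)
    return [item["attributes"]["title"] for item in document["data"]]


# Hypothetical response body, made up for this example.
SAMPLE = json.dumps({"data": [
    {"id": "1", "type": "projects", "attributes": {"title": "DigiSal"}},
    {"id": "2", "type": "projects", "attributes": {"title": "GenoSysFat"}},
]})

request = build_request("projects")
project_titles = titles(SAMPLE)
```

Sending the request would be a matter of passing `request` to `urllib.request.urlopen` and feeding the decoded body to `titles`.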
Investigation Study Assay: Scientists actually agreeing on something
The Investigation-Study-Assay (ISA) structure. Programme: overarching research theme (The Digital Salmon). Project: research grant (DigiSal, GenoSysFat). Investigation: a particular biological process, phenomenon or thing (typically corresponds to [plans for] one or more closely related papers). Study: experiment whose design reflects a specific biological research question. Assay: standardized measurement or diagnostic experiment using a specific protocol (applied to material from a study).
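The nesting described on this slide can be sketched as a small data structure. This is only an illustration of the hierarchy, not how FAIRDOM stores it; the class and field names are assumptions, the Programme level is left out for brevity, and the investigation, study and assay titles are made up (only the project name DigiSal comes from the slides).

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    title: str
    data_files: list[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: list[Study] = field(default_factory=list)


@dataclass
class Project:
    title: str
    investigations: list[Investigation] = field(default_factory=list)


def paths(project: Project) -> list[str]:
    """Return one 'project / investigation / study / assay' path per assay."""
    return [
        " / ".join([project.title, inv.title, study.title, assay.title])
        for inv in project.investigations
        for study in inv.studies
        for assay in study.assays
    ]


# Made-up content for illustration.
digisal = Project("DigiSal", [
    Investigation("Feed utilisation", [
        Study("Feeding trial", [Assay("RNAseq of liver"), Assay("Liver metabolomics")]),
    ]),
])
```

Calling `paths(digisal)` lists every assay together with the context it belongs to, which is the point of ISA: no data file floats around without its study and investigation.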
Now you. Programme: overarching research theme (if applicable). Project: research grant. Investigation: a particular biological process, phenomenon or thing (typically corresponds to [plans for] one or more closely related papers). Study: experiment whose design reflects a specific biological research question. Assay: standardized measurement or diagnostic experiment using a specific protocol (applied to material from a study). An assay makes a Data File and may have an SOP (standard operating procedure).
Standards and metadata will save your sanity. "The analyst who had to work with people who didn't annotate their data" (Théodore Géricault, oil on canvas, 1822).
Open formats? Reindeer tracking (obsolete file format).
Data about data. Descriptive metadata (title, abstract, author, keywords). Structural metadata (linking related pieces of information). Administrative metadata (file format, access control, version history).
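As a rough sketch of the three kinds of metadata, the record below groups fields under the same headings and checks which ones are still empty. The field names, the `REQUIRED` list and the example values are assumptions chosen for illustration, not a FAIRDOM or minimum-information standard.

```python
# Fields we choose to require in this example (an assumption, not a standard).
REQUIRED = {
    "descriptive": ["title", "abstract", "author", "keywords"],
    "structural": ["related_to"],
    "administrative": ["file_format", "access_control", "version"],
}


def missing_fields(record: dict) -> list[str]:
    """List required fields that are absent or empty, as 'section.field'."""
    missing = []
    for section, names in REQUIRED.items():
        present = record.get(section, {})
        missing += [f"{section}.{name}" for name in names if not present.get(name)]
    return missing


# Made-up record for a data file; the abstract has not been written yet.
record = {
    "descriptive": {
        "title": "Liver metabolomics, feeding trial",
        "author": "Jon Olav Vik",
        "keywords": ["salmon", "liver", "metabolomics"],
    },
    "structural": {"related_to": ["RNAseq of liver"]},
    "administrative": {
        "file_format": "text/csv",
        "access_control": "project members",
        "version": "1",
    },
}

todo = missing_fields(record)
```

Here `todo` contains only `descriptive.abstract`, the one field the example record leaves out.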
Minimum information standards
Encoding knowledge for humans and computers: HTML (HyperText Markup Language) is rendered by a browser into human-readable pages.
Encoding knowledge for humans and computers: SBML (Systems Biology Markup Language) is converted automatically into human-readable form (PDF) or executable code (C++, ...).
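Because SBML is XML, a program can read a model without human help. The sketch below parses a tiny hand-written SBML fragment with Python's standard library and lists its species. The toy model (`calcium_toy`, one compartment, one species) is made up for illustration, and in practice a dedicated SBML library would normally be used instead of raw XML parsing.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 3 Version 1 core documents.
SBML_NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}

# Made-up toy model, just large enough to show the structure.
DOCUMENT = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="calcium_toy">
    <listOfCompartments>
      <compartment id="cell" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="Ca" name="calcium ion" compartment="cell"
               hasOnlySubstanceUnits="false" boundaryCondition="false"
               constant="false"/>
    </listOfSpecies>
  </model>
</sbml>
"""


def species_names(text: str) -> dict[str, str]:
    """Map each species id in an SBML document to its human-readable name."""
    root = ET.fromstring(text)
    return {
        species.get("id"): species.get("name")
        for species in root.findall(".//sbml:species", SBML_NS)
    }


names = species_names(DOCUMENT)
```

The same file that a tool could turn into a PDF report or simulation code is here read as plain structured data.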
Standards related to dynamical systems modelling: co.mbine.org. Fig.: Mosaic of standards, adapted from (Chelliah et al., 2009, DILS).
Self-assembling jigsaw puzzles. MODEL (described in Norwegian on the slide) of calcium ions in red blood cells inside the main artery. DATA on Ca²⁺ in erythrocytes in aorta. d[Ca]/dt = ...
Self-assembling jigsaw puzzles: Ontological annotation. MODEL (described in Norwegian on the slide) of calcium ions (CHEBI:29108) in red blood cells (FMA:62885) inside the main artery (FMA:3734). DATA on Ca²⁺ (CHEBI:29108) in erythrocytes (FMA:62885) in aorta (FMA:3734). d[Ca]/dt = ... New knowledge automatically connects to that which already exists.
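The matching step on this slide can be sketched in a few lines: once model and data carry the same ontology identifiers, a computer can pair them up even though the human-readable descriptions use different words or languages. The three identifiers are the ones on the slide; the dictionary layout, the function name and the second, non-matching dataset (with a placeholder identifier `EXAMPLE:0001`) are assumptions for illustration.

```python
MODEL = {
    "label": "MODELL om kalsium-ioner i røde blodceller inni hovedpulsåren",
    "annotations": {"CHEBI:29108", "FMA:62885", "FMA:3734"},
}

DATASETS = [
    {
        "label": "DATA on Ca2+ in erythrocytes in aorta",
        "annotations": {"CHEBI:29108", "FMA:62885", "FMA:3734"},
    },
    {
        # Placeholder identifier, not a real ontology term.
        "label": "Some unrelated dataset",
        "annotations": {"EXAMPLE:0001"},
    },
]


def matching_datasets(model: dict, datasets: list[dict]) -> list[str]:
    """Return labels of datasets annotated with every term the model uses."""
    return [
        dataset["label"]
        for dataset in datasets
        if model["annotations"] <= dataset["annotations"]  # subset test
    ]


matches = matching_datasets(MODEL, DATASETS)
```

The Norwegian model description and the English data description share no words, yet the subset test on identifiers links them.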
Now you. Three minutes. Choose one aspect of your work and list: Data formats that go into and out of it. Any databases (input and output). Relevant ontologies (lists of concepts and relationships between them), e.g. a namespace of chemical identifiers, gene names, a bacterial taxonomy. Your primary users: who will take your results further?
The data manager job. Talk to each partner in the project, from lab techs to researchers and professors. Identify what data they take in and give out. Do they have lab protocols, bioinformatics pipelines, ...? What file formats and databases do they relate to? Where do they publish? Who uses their data? Guide them gently towards being conscious of standards and formats. Help them help themselves. When to tidy things up? When you hand it off to someone else, if not before.
So, you want research data to be shareable and reusable? Data management plans must be made concrete in each project. You need to allocate people's time to coordinate and assist.
Wrapping up. You now have a flying start for your data management plan. Later today: FAIRDOM data management checklist, 80 minutes in pairs/groups, then 60 minutes to consolidate for your own project. Goal: a strategy for gradually fleshing out the details with your partners. Take note of any questions that arise; we're here to help!