The Euro-BioImaging Image Data Repository: Update & Future Directions Jason Swedlow University of Dundee Open Microscopy Environment openmicroscopy.org eurobioimaging.eu @jrswedlow
Mission & Scope
Europe's Research Infrastructure for Imaging Technologies Services to be provided Access to cutting-edge biological & medical imaging technologies with high-level user support by expert technical staff Image data repositories and analysis tools Advanced training for users and providers of imaging technologies by Hub and Nodes Euro-BioImaging HUB Coordination & support of access, data, training European infrastructure management Flagship Technology NODES Access to unique imaging technology in Europe Multimodal Technology NODES Integrated access to multiple imaging technologies
NODES USER TRAINING FLAGSHIP NODE FLAGSHIP NODE FLAGSHIP NODE FLAGSHIP NODE HUB European life scientists as users Physical Open User Access Web-access portal MULTIMODAL TECHNOLOGY NODE MULTIMODAL TECHNOLOGY NODE STAFF TRAINING Data storage and analysis infrastructure User goes home with results for publication
Progress & Roadmap Euro-BioImaging will become a pan-European research infrastructure, which provides: Open access and services to imaging technologies with high-level user support by expert technical staff. Timelines towards launch of operation: 2010, May 2014, 2015, 2016, 2017; Preparatory Phase, Interim Phase, Operation
Interim Board Governs EuBI Interim Phase (2014-2016): The EuBI Memorandum of Understanding formally joins 14 European countries and EMBL to implement Euro-BioImaging. Interim Board: Belgium, Bulgaria, Czech Republic, Finland, France, Israel, Italy, Norway, Portugal, Poland, Slovakia, Spain, The Netherlands, United Kingdom & EMBL. Observers: Austria, Germany (DFG), Hungary, Sweden. Additional countries can join continuously.
Strong Pan-European Support 250 Euro-BioImaging Partners from 29 countries National BioImaging Chapters in 23 countries Prioritized, on the roadmap of 9 countries (plus ongoing evaluations in BE, ES, HU, GR, PT, AT) 38 Legal Partners, 250 Associated Partners from 29 Countries, > 270 Letters of Intent, > 2500 Stakeholders
The Euro-BioImaging Infrastructure Model is Needed by Researchers and Works 228 User applications in four weeks 110 User visits in 14 countries in 6 months > 35 publications, more manuscripts submitted
Image Data Storage and Analysis HUB Data services European life scientists as users Web-access portal provides virtual access IMAGE TOOL REPOSITORY IMAGE DATA REPOSITORY Repository of software tools (user friendly and interoperable) Euro-BioImaging CLOUD Repository for reference image data sets (for sharing and re-use)
Why Share Data? [S]cientific journals should progressively enforce requirements for traceable and usable data available through an article Materials should be uploaded to a repository before publication of the article (Science as an Open Enterprise, Royal Society)
Why Share Data? The Wellcome Trust expects all of its funded researchers to maximise the availability of research data with as few restrictions as possible (Wellcome Trust, Policy on Data Management and Sharing)
But What About Images? Large: Single dataset 10 GB to 10 TB. Diverse: LM, EM, HCS, DP, MR, CT, PET, FMRI and many subdomains. Data acquired as part of specific experiments, with perturbations: Genetic, RNAi, Chemical, Geographic and others. Poorly standardised metadata: describing a complete imaging study is challenging. Experimental (genes, chemistry, etc.); Imaging (modality, light sources, detectors, lenses, etc.); Analytic (measurements, regions, text, etc.). Experience shows that study metadata are complex, poorly annotated, and need curation. Which image datasets should be published?
Scaled Data Storage Requirements Estimated storage required in TBs per year. Data based on Euro-BioImaging Proof of Concept 2012/2013; >50 Nodes, >200 projects More Info: Document Repository on http://eurobioimaging.eu
Progress Towards Metadata Standards Experimental Tab-delimited data formats, molecular and chemical IDs and resources Use and exercise ELIXIR resources!! Imaging OME Data Model Bio-Formats converts >145 proprietary image formats into OME model OMERO provides common, platform-independent API for images, metadata openmicroscopy.org Analytic Metadata Tab-delimited data formats HDF5 files
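As an illustration of why tab-delimited formats are a practical choice for experimental and analytic metadata, the sketch below parses a small annotation table with Python's standard library. The column names (Plate, Well, Gene Identifier, Phenotype) and the sample rows are hypothetical, chosen only for this example; they are not the repository's actual schema.

```python
import csv
import io

# Hypothetical tab-delimited annotation table. Column names and values are
# illustrative assumptions, not the schema used by the repository.
SAMPLE_TSV = (
    "Plate\tWell\tGene Identifier\tPhenotype\n"
    "plate1\tA1\tGENE_A\telongated cells\n"
    "plate1\tA2\tGENE_B\t\n"
    "plate1\tA3\tGENE_A\tround cells\n"
)


def load_annotations(text):
    """Parse tab-delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def wells_by_gene(rows):
    """Group well names by gene identifier, preserving row order."""
    index = {}
    for row in rows:
        index.setdefault(row["Gene Identifier"], []).append(row["Well"])
    return index


rows = load_annotations(SAMPLE_TSV)
index = wells_by_gene(rows)
```

Keeping gene and chemical identifiers in their own columns is what allows annotations like these to be joined against external resources.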
Which Image Datasets? Require a definition of reference images, i.e., datasets that should be published Euro-BioImaging PCS show that <10% of images recorded at Nodes will be candidates Proposal: Euro-BioImaging ELIXIR Image Data Strategy
The Science: Similarity in Image Data Feature level similarity Phenotypic similarity Dataset 1 Dataset 2 Gene level similarity
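Feature-level similarity between two datasets can be sketched as a comparison of numeric feature vectors. The example below uses cosine similarity, which is one common measure and is an assumption here; the slides do not state which measure is used. The function name and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length feature vectors.

    Returns 0.0 when either vector has zero length, since the angle is
    undefined in that case.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors extracted from images in two datasets.
features_dataset_1 = [1.0, 2.0, 3.0]
features_dataset_2 = [2.0, 4.0, 6.0]
similarity = cosine_similarity(features_dataset_1, features_dataset_2)
```

A value near 1 means the two feature profiles point in the same direction regardless of scale; a value near 0 means they are unrelated.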
PoC: Euro-BioImaging Image Data Repository - 12 studies, 16 screens - Human, S. pombe, S. cerevisiae, D. melanogaster - Nuclei, cytoskeleton, membranes, organelles - 37 TB data - >29 million images - Experimental, analytic & phenotypic metadata In progress: - Deep tissue imaging - Single molecule imaging - Light sheet microscopy - Super-resolution microscopy Alvis Brazma/EBI, Rafael Carazo-Salas/Cambridge, Jason Swedlow/OME/Dundee; Funding: BBSRC
PoC: Euro-BioImaging Image Data Repository idr-demo.openmicroscopy.org: 37 TB images, 29M images, 12 studies, 16 screens Alvis Brazma/EBI, Rafael Carazo-Salas/Cambridge, Jason Swedlow/OME/Dundee; Funding: BBSRC
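As a back-of-envelope check on the figures above, the average size per image follows from dividing total storage by image count. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and treats "29M" as exactly 29 million; both are assumptions made for illustration.

```python
# Figures quoted on the slide; unit conventions are assumptions.
TOTAL_BYTES = 37 * 10**12   # 37 TB, assuming decimal terabytes
IMAGE_COUNT = 29 * 10**6    # 29M images, taken as exactly 29 million


def average_image_size_mb(total_bytes, image_count):
    """Return the mean size per image in decimal megabytes."""
    return total_bytes / image_count / 10**6


avg_mb = average_image_size_mb(TOTAL_BYTES, IMAGE_COUNT)
print(f"{avg_mb:.2f} MB per image on average")
```

Under these assumptions the average works out to roughly 1.3 MB per image.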
PoC: Euro-BioImaging Image Data Repository idr-demo.openmicroscopy.org: 37 TB images, 29M images, 12 studies, 16 screens Tara Oceans Data from Luis Pedro Coelho and Eric Karsenti, EMBL
Some Useful URLs Views into IDR: RNAi in Human cells-- http://goo.gl/1zoiik Drosophila-- http://goo.gl/jpfm3j Fungi-- http://goo.gl/yfpqcw; http://goo.gl/n3ix5v Mitocheck-- http://goo.gl/2ffbwd Breinig Chemical Screen-- http://goo.gl/2uwwnj (DOI:10.17867/10000101)
Feedback on the IDR Data Re-Use...We are taking raw HCI images from small-molecule chemical library experiments, re-analyzing them to extract as many numeric features as we can, and then feeding them into machine learning algorithms for compound activity prediction The nature of the ML algorithms requires fairly large data sets to generate enough predictive power in the range 25,000+ to 500,000 compounds (or more) external data could be highly useful it would be good to process publicly available image data sets and compound collections (Emphasis added) Principal Scientist, Top 5 Pharma
Enabling IDR Data Re-Use (1) Multi-Resolution Feature Extraction e.g., WND-CHRM... Feature calculations running Feb 2016
Enabling IDR Data Re-Use (2) +... EMBL EBI Embassy Prototype Q2/2016 Benchmarks Algorithm Development Data Integration
Image Data Storage and Analysis HUB Data Services Vision Reality European life scientists as users Web-access portal provides virtual access IMAGE TOOL DIRECTORY IMAGE DATA REPOSITORY Repository of software tools (user friendly and interoperable) Euro-BioImaging CLOUD Repository for reference image data sets (for sharing and re-use)
Euro-BioImaging: Jan Ellenberg, Ante Keppler, Tanja Ninkovic, Federica Paina, Interim Board, and the Stakeholders! The OME Consortium: Josh Moore, Simon Li, Eleanor Williams, Gabriella Rustici. EMBL-EBI: Alvis Brazma, Ugis Sarkans, Steven Newhouse, Helen Parkinson. Cambridge: Rafael Carazo y Salas, Bálint Antal. BBSRC: Rowan Mckibbin, John Hancock. Data: Rainer Pepperkok, Jan Ellenberg, Jean-Karim Heriche, Eric Karsenti, Luis Pedro Coelho, Jeremy Simpson, Peter Thorpe, Jennifer Rohn, Buzz Baum, Rodney Rothstein, Laurence Pelletier. Thank you!!!!