A framework for image analysis in plant breeding. Paul van Schayck

Similar documents
Michal Kuneš

The walkthrough is available at /

You need to use the URL provided by your institute s OMERO administrator to access the OMERO.web client.

1 Download the latest version of ImageJ for your platform from the website:

2. RELATED WORK 3.DISEASE IDENTIFICATION SYSTEM IN VEGETABLES

Jozef Cernak, Marek Kocan, Eva Cernakova (P. J. Safarik University in Kosice, Kosice, Slovak Republic)

TOPIM-2018 schedule. Session 1: Introduction to OMERO & CellProfiler

Kobe Open Workshop 25/10/2018. Software versions used for the workshop: Workshop content. Resources:

Image Processing with KNIME

Visualization on BioHPC

DURING THIS TWO-DAY CLASS, PARTICIPANTS WILL:

CMPUT 695 Fall 2004 Assignment 2 Xelopes

Cloud Computing 3. CSCI 4850/5850 High-Performance Computing Spring 2018

HMS Image Management Core

Prof. Konstantinos Krampis Office: Rm. 467F Belfer Research Building Phone: (212) Fax: (212)

Global Non-Rigid Alignment. Benedict J. Brown Katholieke Universiteit Leuven

An Introduction to Apache Spark

Next Generation Storage for The Software-Defned World

Designing an institutional research data management infrastructure for the life sciences

Metadata Models for Experimental Science Data Management

Choosing an Interface

Introduction. Loading Images

CloudMan cloud clusters for everyone

Continuous Integration and Deployment (CI/CD)

Processing Genomics Data: High Performance Computing meets Big Data. Jan Fostier

CSinParallel Workshop. OnRamp: An Interactive Learning Portal for Parallel Computing Environments

Multiple Usage of KNIME in a Screening Laboratory Environment

CO 2 DAS Data Ingestion System Design Version 1.0 Charles J Antonelli 1 April 2011

70-414: Implementing an Advanced Server Infrastructure Course 01 - Creating the Virtualization Infrastructure

Alaris S2000 Series Scanner Firmware Release Notes

Description. Speaker Patrizia Monteduro (International Consultant, FAO) TRAINING GEONETWORK OPENSOURCE Islamabad, Pakistan, Jan 29-31, 2014

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Fault-Tolerant Parallel Analysis of Millisecond-scale Molecular Dynamics Trajectories. Tiankai Tu D. E. Shaw Research

Based on Big Data: Hype or Hallelujah? by Elena Baralis

AutoTx. (and more ) Smart automatic background data transfer for Windows. Niko Ehrenfeuchter - Imaging Core Facility - University of Basel

Groovy in Jenkins. Ioannis K. Moutsatsos. Repurposing Jenkins for Life Sciences Data Pipelining

Virtualization with VMware ESX and VirtualCenter SMB to Enterprise

Automatic Dependency Management for Scientific Applications on Clusters. Ben Tovar*, Nicholas Hazekamp, Nathaniel Kremer-Herman, Douglas Thain

FuncX: A Function Serving Platform for HPC. Ryan Chard 28 Jan 2019

Data Analysis in ATLAS. Graeme Stewart with thanks to Attila Krasznahorkay and Johannes Elmsheuser

Building an Exotic HPC Ecosystem at The University of Tulsa

Handling and Processing Big Data for Biomedical Discovery with MATLAB

ReFrame: A Regression Testing Framework Enabling Continuous Integration of Large HPC Systems

DiskSavvy Disk Space Analyzer. DiskSavvy DISK SPACE ANALYZER. User Manual. Version Dec Flexense Ltd.

Data Protection Guide

VxRail: Level Up with New Capabilities and Powers GLOBAL SPONSORS

Task-based distributed processing for radio-interferometric imaging with CASA

Managing CAE Simulation Workloads in Cluster Environments

Dealing with Large Datasets. or, So I have 40TB of data.. Jonathan Dursi, SciNet/CITA, University of Toronto

KNIME What s new?! Bernd Wiswedel KNIME.com AG, Zurich, Switzerland

NVIDIA DATA LOADING LIBRARY (DALI)

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Dac-Man: Data Change Management for Scientific Datasets on HPC systems

StarWind Virtual SAN Automating Management with SMI-S in System Center Virtual Machine Manager 2016

THE EMC ISILON STORY. Big Data In The Enterprise. Deya Bassiouni Isilon Regional Sales Manager Emerging Africa, Egypt & Lebanon.

Manual. HTPheno. A Novel Image Analysis Pipeline for High-Throughput Plant Phenotyping

DiskBoss DATA MANAGEMENT

Belnet and BrENIAC. High Performance Network for High Performance Computing. Jan Ooghe, Herman Moons KU Leuven BNC Oktober

Chuck Cartledge, PhD. 12 October 2018

Cloud Bursting Jim Thompson Avere Systems

The CEDA Archive: Data, Services and Infrastructure

How Scalable is your SMB?

A COMMON SOFTWARE FRAMEWORK FOR FEL DATA ACQUISITION AND EXPERIMENT MANAGEMENT AT FERMI

Windows Server 2012 Top Ten

BUSINESS DATA LAKE FADI FAKHOURI, SR. SYSTEMS ENGINEER, ISILON SPECIALIST. Copyright 2016 EMC Corporation. All rights reserved.

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab

Characterizing Private Clouds: A Large-Scale Empirical Analysis of Enterprise Clusters

HortiCube: A Platform for Transparent, Trusted Data Sharing in the Food Supply Chain

On-the-Fly Data Race Detection in MPI One-Sided Communication

Harp-DAAL for High Performance Big Data Computing

Atrium Webinar- What's new in ADDM Version 10

Kerberos & HPC Batch systems. Matthieu Hautreux (CEA/DAM/DIF)

Community edition(open-source) Enterprise edition

S i m p l i f y i n g A d m i n i s t r a t i o n a n d M a n a g e m e n t P r o c e s s e s i n t h e P o l i s h N a t i o n a l C l u s t e r

Parallel & Scalable Machine Learning Introduction to Machine Learning Algorithms

Recommender System for Personalization in. Daniel Mican Nicolae Tomai

VIDAEXPERT: DATA ANALYSIS Here is the Statistics button.

ImageJ2 Directions & Goals

EMC DATA PROTECTION, FAILOVER AND FAILBACK, AND RESOURCE REPURPOSING IN A PHYSICAL SECURITY ENVIRONMENT

Insight: Measurement Tool. User Guide

INTRODUCTION TO NEXTFLOW

escience in the Cloud Dan Fay Director Earth, Energy and Environment

Introduction to BioHPC New User Training

Hyper-V Innovations for the SMB. IT Pro Camp, Northwest Florida State College, Niceville, FL, October 5, 2013

Parallel learning of content recommendations using map- reduce

ISSN: International Journal of Science, Engineering and Technology Research (IJSETR) Volume 6, Issue 1, January 2017

Veritas Storage Foundation in a VMware ESX Environment

LOCI, Fiji & ImageJ2

SnapCenter Software 4.0 Concepts Guide

Software Defined Storage. Reality or BS?

Feed the Future Innovation Lab for Peanut (Peanut Innovation Lab) Data Management Plan Version:

Virtualization with VMware ESX and VirtualCenter SMB to Enterprise

Collaborative data-driven science. Collaborative data-driven science. Mike Rippin

REFERENCE ARCHITECTURE Quantum StorNext and Cloudian HyperStore

The Materials Data Facility

Chuck Cartledge, PhD. 19 January 2018

Glas robotprojecten + (PicknPack, Phenobot, Trimbot & Sweeper) Erik Pekkeriet

An introduction to checkpointing. for scientifc applications

The VGG Face Finder (VFF) Engine

Transcription:

A framework for image analysis in plant breeding Paul van Schayck

Paul van Schayck Wageningen University MSc Biotechnology (molecular life science) p.vanschayck@maastrichtuniversity.nl Nanoscopy Data management

Image analysis in plant breeding Marker assisted breeding Markers: Molecular Biochemical Morphological Shape, color and quantitative Tool for breeding From melon to cucumber

Examples of image analysis used in plant breeding An ImageJ plugin for plant variety testing Polder, G., Blokker, G., Heijden, G.W.A.M. van der (2012) In: Proceedings of the ImageJ User and Developer Conference 2012, 24-26 October 2012, Mondorf-les-Bains, Luxembourg. - : Centre de Recherche Public Henri Tudor, 2012 - p. 168-173.

Existing practice of image analysis Use of popular open source tools

Current work flow of image analysis Control flow Images and results Windows Server Via remote desktop Pipeline Macro End users: Breeders Researchers Network disks Portable disks SMB Local disks Storage

Current challenges Increasing size of datasets Data storage decentralized Little structure in metadata: Identification Results Quality assurance User interface

Current challenges Difficult end-user experience

Aim of project Improve end-user experience Store metadata of results and images Improve data storage capabilities Improve performance scalability Improve quality assurance

Discovering OMERO

Pipeline of image analysis Pipeline common library OMERO.Web: OMERO. Processor: Control script (Python) Macro Plugin OMERO.Insight: High Performance Computing cluster Control flow Data flow (images and annotations) Validation image Results

Progress monitoring Making use of jobhandle.setmessage()

Infrastructure of the framework Big-data storage VMware node NFS Image data OMERO.Web: Client NFS slurm DRMAA Slurm HPC nodes HPC cluster

Acquisition of images into OMERO OMERO. Importer Identification metadata Barcode scanner Image data NAS storage

Quality Assurance Version control (SVN) Doc generation (Sphinx) Automated testing Functional and acceptance (Python nose) Nightly tests. Reporting to monitoring system (PRTG). Testing environment Monitoring system availability and regressions

Test case: Shape parameters of carrot Determine different shape parameters Crop selection

Step 1: Acquisition and calibration Batch-wise import Color and light control script Image data After Before NAS storage HPC Cluster

Step 2: Classification and segmentation Crucial step Time-consuming calculations control script HPC cluster

Step 3: Measurement of parameters Results in CSV format FileAnnotation control script Validation image Results HPC cluster

Conclusions of the carrot test case Further implementation: Validation Integration Web Client User Results

Conclusions Improved end-user experience Improved end-user experience

Conclusions Before Improved end user experience Centralized storage After Network disks Image data Portable disks Local disks Storage Centralized NAS storage Centralized storage

Conclusions Improved end user experience Centralized storage Storage of metadata Storage of metadata

Conclusions Improved end user experience Centralized storage Storage of metadata Performance scalability Image data HPC nodes NAS storage HPC cluster Performance scalability

Conclusions Improved end user experience Centralized storage Storage of metadata Performance scalability Complexity for developers Software dependencies At the cost of More steps required by developers Software dependencies

Conclusions Improved end user experience Centralized storage Storage of metadata Performance scalability Complexity for developers Software dependencies Quality assurance Quality assurance

Acknowledgements Raimond Ravelli Peter Peters Jan-Willem Borst