SETTING UP AN HCS DATA ANALYSIS SYSTEM


A WHITE PAPER FROM GENEDATA, JANUARY 2010

WHY YOU NEED ONE. HOW TO CREATE ONE. HOW IT WILL HELP.

HCS MARKET AND DATA ANALYSIS CHALLENGES

High Content Screening (HCS) has climbed out of the "Trough of Disillusionment," as defined by Gartner, Inc., and is on its way to becoming a standard technology in modern drug discovery [1,2]. This is especially true in the area of hit identification and confirmation, where HCS is an integral part of the screening infrastructure. With the increasing sophistication and throughput of high content systems from the major vendors, the need for an effective HCS data analysis solution is becoming increasingly important. While a number of generic data analysis systems and HTS-specific systems are available today, the options become extremely limited as soon as HCS labs carefully consider both the breadth and scale of their needs. With many current systems already taxed by perhaps a million data points per assay, HCS labs need systems that can scale to smoothly process tens to hundreds of millions of data points while leveraging the complex information contained in HCS data. Many current HTS data management systems were designed around a one-readout-per-well model; HCS researchers now require an analytical framework that can capture, process and utilize this rich content, and make it accessible and understandable to others in the organization.

During 2009, Genedata worked with several major pharmaceutical and biotechnology companies, helping them to establish a complete HCS Data Analysis (HCA) infrastructure in alignment with their existing screening data analysis solutions [3]. These customers run or plan to run HCS assays in different formats and throughputs, from lower-throughput toxicity, validation or confirmation screens up to full-deck library screens with more than 1 million compounds.
In setting up specific HCS infrastructures, these customers identified a gap in their software landscape: while the HCS instrument vendor-specific packages for data analysis are adapted specifically to the requirements of High Content Screening data analysis, they often do not scale and often completely lack functionality for typical screening data processing tasks (e.g., normalization, correction, masking, hit-list creation). Conversely, the existing screening data analysis infrastructure does not handle the complexity of typical HCS data with multiple features (readouts) and does not provide a link back to the underlying HCS images. This white paper describes typical HCS infrastructures and the goals, concept, and results of setting up an integrated HCA platform using Genedata Screener High Content Analyzer.

References:
1. Fenn, Jackie (2008-06-27). "Understanding Hype Cycles." Gartner, Inc. http://www.gartner.com/pages/story.php.id.8795.s.8.jsp
2. Peppard, Jane (2009-12-10). "Screening for Small Molecules Promoting Oligodendrocyte Differentiation from Glial Precursor Cells." Act 2010 Cellular & Systems Models for Human Disease Biology conference, organized by IBC Life Sciences.
3. Leven, Oliver (2009-09-21). "Scalable Data Management and Analysis in High-Content Screens: Leveraging Rich Biological Outcomes with Extreme Efficiency." 2009 High-Content Analysis East conference, organized by Cambridge Healthtech Institute.

TECHNICAL INFRASTRUCTURE AND BASIC WORKFLOW

The basic HCS infrastructure consists of one or several HCS instruments, accompanied by one or more HCS feature extraction software packages. This basic infrastructure allows users to conduct their HCS experiments, take one or several images per well, and perform the image analysis that quantifies the changes in cellular phenotypes. With low-throughput experiments (e.g., one or two 96-well plates), this workflow can be performed easily without any specific integration or automation. For example:

1. The HCS instrument is run manually for a couple of plates.
2. The resulting images are stored on a shared network drive.
3. The image analysis software is started manually and analyzes the respective images.
4. The results of the feature extraction are stored as files on the user's desktop.
5. The features of relevance are extracted from the result files with standard desktop software.
6. Inspection and basic quality control of the numerical well results are done in spreadsheet software.
7. A file browser is used to find the images from which the well results were generated, and the images are displayed in a specific image viewer (e.g., ImageJ).

While this process works well with a small number of wells during protocol development, it becomes very cumbersome when the throughput increases even to moderate sizes, e.g., to test a library of 1,000 compounds.
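The spreadsheet-based processing and QC in steps 5 and 6, like the normalization and hit-list creation mentioned earlier, are standard screening calculations that are easy to automate once scripted. A minimal, generic sketch (not any vendor's implementation; the control values, the 50% threshold and the well naming are illustrative assumptions):

```python
# Generic plate normalization and hit selection, for illustration only.

def percent_of_control(raw, neutral_controls, inhibitor_controls):
    """Normalize raw well values to percent activity: 0% at the
    neutral-control mean, 100% at the inhibitor-control mean
    (a common screening convention)."""
    neutral = sum(neutral_controls) / len(neutral_controls)
    inhibitor = sum(inhibitor_controls) / len(inhibitor_controls)
    span = inhibitor - neutral
    return [100.0 * (v - neutral) / span for v in raw]

def select_hits(normalized, well_ids, threshold=50.0, masked=()):
    """Return wells whose normalized activity reaches the threshold,
    skipping wells masked out during quality control."""
    return [w for w, v in zip(well_ids, normalized)
            if w not in masked and v >= threshold]

# Example with made-up numbers:
raw = [105.0, 820.0, 460.0, 980.0]
wells = ["A01", "A02", "A03", "A04"]
norm = percent_of_control(raw, neutral_controls=[100.0, 110.0],
                          inhibitor_controls=[1000.0, 990.0])
hits = select_hits(norm, wells, threshold=50.0, masked={"A04"})
```

In a real screening system, calculations of this kind run automatically per plate rather than by hand in a spreadsheet.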
The main drawback is the manual operation, which lacks automation support. For example, it is straightforward to identify and display 3 images out of 2x96 image files using a file browser. This is no longer true if the user needs to inspect the relevant images for the 30 most active compounds from a total set of 3,000 images.

Most HCS labs have multiple imaging instruments, often bundled with different feature extraction software packages. This variety imposes challenges on users:

- Each instrument creates images in its own result format.
- Each feature extraction software creates results in different formats.
- Feature extraction software packages vary greatly in terms of image display, feature semantics and analysis functionality.

Larger groups now often establish a dedicated enterprise image storage system and shared media for HCS screening results, e.g. a fast and centrally-accessible shared drive. However, such dedicated image management solutions usually lack essential workflow functionality.
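The 30-out-of-3,000 look-up above is exactly the kind of task that becomes trivial to script once well identifiers appear in the image file names. A sketch, assuming a hypothetical naming convention (nothing here reflects a specific vendor's format):

```python
def top_hit_images(activities, image_files, n=30):
    """Return the image files belonging to the n most active wells.

    activities: {well_id: activity value}
    image_files: file names embedding the well id, under an assumed,
    hypothetical convention such as 'img_<well>_ch1.tif'.
    """
    top = sorted(activities, key=activities.get, reverse=True)[:n]
    return [f for f in image_files if any(w in f for w in top)]

# A viewer could then be launched on exactly these files, instead of
# hunting for them one by one in a file browser.
files = ["img_A01_ch1.tif", "img_B05_ch1.tif", "img_C02_ch1.tif"]
acts = {"A01": 12.0, "B05": 97.5, "C02": 55.1}
picked = top_hit_images(acts, files, n=2)
```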

GOALS FOR AN HCS ANALYSIS SYSTEM

There are two broad categories of HCS data analysis, assay development and screening, and the goals of these categories are quite different. During assay development, the scientist's main goal is to identify an optimal assay protocol (cell handling, experimental conditions, imaging parameters, feature extraction, etc.). In contrast, the primary goal of screening is to perform the experiment and data analysis as quickly and efficiently as possible, while limiting process artifacts and unforeseen compound effects. This is true for primary, secondary, confirmation and other follow-up screens, and these use cases require better automation, infrastructure integration and the elimination of manual tasks. An efficient informatics solution for this category of HCS data analysis thus includes:

1. Easy import of multi-layered well data from each instrument and feature extraction software, via local or centralized storage
2. Fast, automated access to the HCS images throughout the analysis workflow
3. Support for centralized image storage systems, be they commercial or home-grown
4. Strict business rules and quality control criteria aligned with standard screening methodologies
5. Limited effort to adapt the existing workflow infrastructure to new assays or infrastructure components
6. Integration of the HCS data analysis system with corporate databases, inventories and workflow systems

CONCEPT

The HCS data analysis system presented here is built on Genedata Screener as the core component. Genedata Screener is a software platform dedicated to High Throughput and High Content data analysis and management. The Screener High Content Analyzer module focuses on facilitating primary, secondary, confirmation and other follow-up high content screens by addressing the critical issues listed above. Screener High Content Analyzer thus becomes the central point of interaction for users doing routine analysis of HCS data (Figure 1).
By virtue of its public data import APIs and its file parser infrastructure, High Content Analyzer is easily configured to match the output of any instrument and feature extraction software that writes its data in a publicly documented format. It also copes well with ongoing changes in instrumentation or image analysis. Automated access to the corresponding HCS images from a screen is provided by linking its public image API to the respective image storage system. This link allows the scientist to go back to and view the original HCS images at any point of the analysis workflow. With High Content Analyzer as part of the Screener platform, business rules and quality control criteria (using standard screening procedures) are provided out of the box. Building on Screener technology also allows a straightforward fit of HCS data into a standard infrastructure, making it just another data stream in Screener. Previous integrations with corporate databases and inventory systems are preserved with minimal adaptations; new integrations are created with little effort.
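The parser configuration itself is Screener-specific, but the underlying idea (mapping each instrument's documented output format onto one common plate/well/feature structure) can be illustrated with a small, hypothetical CSV parser; the column names are invented for the example:

```python
import csv
import io

def parse_well_features(csv_text):
    """Parse a hypothetical per-well result file into a uniform
    {(plate, well): {feature_name: value}} structure, the kind of
    normalized representation a downstream analysis tool can consume
    regardless of which instrument produced the file."""
    data = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["Plate"], row["Well"])
        data[key] = {name: float(value) for name, value in row.items()
                     if name not in ("Plate", "Well")}
    return data

# Illustrative input in the assumed column layout:
sample = """Plate,Well,CellCount,MeanIntensity
PLT42,A01,312,1543.2
PLT42,A02,287,1821.7
"""
wells = parse_well_features(sample)
```

One such adapter per output format is what makes instrument changes a configuration task rather than a re-engineering effort.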

Figure 1: Screener High Content Analyzer embedded in a typical HCS environment. It imports well data plate by plate (1), either from local files or from a central server resource (e.g., a shared file system or a database such as the Cellomics Store), depending on the available infrastructure and integration requirements. Centralized storage of and access to images (2) makes it possible to systematically search for and retrieve well results (3). The automated propagation of results to downstream databases (4) completes the integration.

Integrated in this way, High Content Analyzer provides scientists with automated processing, fast diagnostics and review capabilities for High Content Screens on the scale of a few plates to full-deck screens. Interactive reprocessing, result generation and automated documentation become a matter of seconds to a few minutes while handling the full complexity of the underlying HCS data.

EXAMPLE INTEGRATIONS

STAND-ALONE IMAGING HARDWARE

While Screener High Content Analyzer provides the tools for data import and analysis in a wide variety of scenarios, one common set-up makes use of the hardware and basic software shipped with high content instrumentation. These installations may also make use of data and image storage on a shared network drive, without additional software to manage the data.

STORAGE OF IMAGE FILES

In such an environment, data and image management are based on a directory hierarchy. In a typical example, a three-level hierarchy is used to locate the appropriate images: Experiment -> Run -> Plate, where Experiment denotes the overarching folder for all plates measured under the same experimental conditions, Run specifies the set of plates screened on a given day, and Plate holds the final images. In such a scenario, it is mandatory that individual image file names contain the well identifier so that images can be associated with wells automatically.
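Under such a hierarchy, locating a well's image reduces to assembling a path. A sketch, where the '<well>_c<channel>.tif' file-name pattern is an illustrative assumption; the mandatory part, per the text, is only that the well identifier appears in the file name:

```python
import os

def well_image_path(root, experiment, run, plate, well, channel=1):
    """Build the expected image location in an
    Experiment -> Run -> Plate directory hierarchy."""
    return os.path.join(root, experiment, run, plate,
                        f"{well}_c{channel}.tif")

# Example with invented folder names:
p = well_image_path("/hcs", "EXP007", "2009-11-03", "PLATE_0042", "B07")
```

Because the convention lives in one function, a change of instrument or storage layout only touches this adapter, not the analysis code.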
With images stored in this way, feature extraction software can place the numeric data derived from the well images within the same directory structure.

DATA IMPORT INTO SCREENER HIGH CONTENT ANALYZER

To import the raw well data, the user of Screener High Content Analyzer selects the plates to analyze by simply browsing to the experiment folder on the centralized share. The only requirement is that the shared drive is accessible from the user's client computer. This centralized selection of plates requires a directory structure specification to identify the individual plates, as outlined above. The directories and the files therein are loaded, and the user identifies the plates of interest by experiment, run, plate barcode, plate timestamp or any other available metadata. This access method requires the configuration of a standard data parser matching the numeric well data generated by the feature extraction software.

IMAGE LOOK-UP

With the images stored in the same directory structure as the well data, the information needed to automatically identify the images is captured when the numeric well data are imported. The user then simply selects any wells of interest, and the corresponding images are displayed instantaneously for visual inspection. This requires a server-side implementation of the Screener HCS Image API, which defines the semantics of the available images (channels, fields) and how the images are stored in the directory structure in relation to the raw well data. This work can be done as a small service project by Genedata or using in-house resources, as the API is well documented and comes with several example implementations.

DEDICATED IMAGE STORAGE SOLUTION: CELLOMICS STORE

More and more facilities have started to use more sophisticated solutions for image storage, as the number of files and the raw size of the cumulative data can become very large and complex.
Screener High Content Analyzer has public and open APIs for importing the well data (Plate Data Import API) and the image data (HCS Image API), which can be used with a wide range of proprietary and custom dedicated image storage solutions. For brevity, only the integration of Screener High Content Analyzer with the Thermo Fisher Cellomics Store is described below. In this integration, Cellomics Store provides the following functionality:

a) It manages the relationship between raw well data and images.
b) It specifies additional information on the experiments useful for image display purposes, such as composites, overlays and their combination with the channel images.
c) It manages the use of additional enterprise storage hardware.

STORAGE OF IMAGES AND RAW WELL DATA

When operated in conjunction with the Cellomics ArrayScan instrument, HCS images can be spooled automatically into the Cellomics Store. From the Cellomics Store, the images automatically feed into the Cellomics BioApplications. The results are written back to the Cellomics Store, which maintains the relation between the original images and the derived numerical results.

DATA IMPORT INTO SCREENER HIGH CONTENT ANALYZER

In a standard integration of Cellomics Store with Screener High Content Analyzer, the selection of plates to be imported is done via the plate selection dialog; it features both hierarchical navigation (1) and selection of individual plates (2). In the next step, the users select the features to be imported (3): all features defined within the Cellomics protocol are offered, and the user can import all of them or just the features of interest for the selected plates. Import of results typically happens in a matter of seconds to a few minutes.

IMAGE LOOK-UP

While analyzing the data with Screener High Content Analyzer, the raw images used to generate the well data are available via a simple right-click at any point in the analysis. Screener has access to all of the available images in the Cellomics Store. These include base images as well as combinations of images from different channels, composites and overlays (highlighted cellular objects) as defined within the Cellomics protocol. In addition, the user can switch between different images of the same well (fields, sites) with a simple mouse click. A default implementation of the HCS Image API is readily available for interested customers; it requires only the installation and proper configuration of the Cellomics Store and the Cellomics HCS Connect API Runtime.

REALIZED BENEFITS

According to several major pharmaceutical and biotechnology companies where Genedata Screener High Content Analyzer was introduced and integrated in 2009, this work has overcome major issues that had previously prevented a scale-up of HCS experiments. Realized benefits include:

1) Fast and convenient access to the HCS well data, regardless of the specific HCS instrumentation.
2) Ad-hoc access to the HCS images at any stage of the plate QC or hit-list creation process.
3) A powerful and intuitive user interface for rapid plate QC in complex HCS experiments.
4) Quick set-up of the analysis procedure for a new experiment (within a few minutes).
5) Integration of HCS data analysis with other screening activities (classical HTS or MTS) and the corresponding corporate databases, allowing easy result sharing and ensuring comparability of results.

Now, these companies can process their HCS data in a standardized fashion, including HCS image look-up and review, for screens ranging from a few plates to more than 1,000,000 compounds. Smooth integration relieves the scientist of manually locating, re-formatting and importing HCS data from different sources, and of manually searching for HCS images in file systems. Interactive visualization, annotation and re-processing options enable review and optimization of the analysis for complete campaigns within minutes (Figure 2). Results are reported in a standardized fashion, compatible with other screening assays, to corporate systems for access and retrieval by the wider lead discovery community.

Figure 2: This screenshot illustrates the interactivity of Screener High Content Analyzer for hit-list creation and plate QC.

Genedata Phylosopher, Genedata Screener, Genedata Expressionist and the Genedata logo are registered trademarks of Genedata AG. All other product and service names mentioned are the trademarks of their respective companies. © 2010 Genedata AG. The information contained herein is subject to change without notice.