A WHITE PAPER FROM GENEDATA
JANUARY 2010

SETTING UP AN HCS DATA ANALYSIS SYSTEM
WHY YOU NEED ONE - HOW TO CREATE ONE - HOW IT WILL HELP

HCS MARKET AND DATA ANALYSIS CHALLENGES

High Content Screening (HCS) has climbed out of the Trough of Disillusionment, as defined by Gartner, Inc., and is on its way to becoming a standard technology in modern drug discovery 1,2. This is especially true in the area of hit identification and confirmation, where HCS is an integral part of the screening infrastructure. With the increasing sophistication and throughput of high content systems from the major vendors, an effective HCS data analysis solution is becoming increasingly important.

While a number of generic data analysis systems and HTS-specific systems are available today, the options become extremely limited as soon as HCS labs carefully consider both the breadth and scale of their needs. With many current systems already taxed by perhaps a million data points per assay, HCS labs need systems that scale to smoothly process tens to hundreds of millions of data points while leveraging the complex information contained in HCS data. Whereas many current HTS data management systems were designed around a one-readout-per-well framework, HCS researchers now require an analytical framework that can capture, process and utilize this rich content, and make it accessible and understandable to others in the organization.

During 2009, Genedata worked with several major pharmaceutical and biotechnology companies, helping them to establish a complete HCS Data Analysis (HCA) infrastructure in alignment with their existing screening data analysis solutions 3. These customers run, or plan to run, HCS assays in different formats and throughputs, from lower-throughput toxicity, validation or confirmation screens up to full-deck library screens with more than 1 million compounds.
In setting up specific HCS infrastructures, these customers identified a gap in their software landscape: while the HCS instrument vendor-specific packages for data analysis are adapted specifically to the requirements of High Content Screening data analysis, they often do not scale and frequently lack functionalities for typical screening data processing tasks (e.g. normalization, correction, masking, hit list creation). Conversely, the existing screening data analysis infrastructure does not handle the complexity of typical HCS data with multiple features (readouts) and does not provide a link back to the underlying HCS images. This White Paper describes typical HCS infrastructures as well as the goals, concept, and results of setting up an integrated HCA platform using Genedata Screener High Content Analyzer.

1 Fenn, Jackie (2008-06-27). "Understanding Hype Cycles." Gartner, Inc. http://www.gartner.com/pages/story.php.id.8795.s.8.jsp.
2 Peppard, Jane (2009-12-10). "Screening for Small Molecules Promoting Oligodendrocyte Differentiation from Glial Precursor Cells." Presented at the 2010 Cellular & Systems Models for Human Disease Biology conference, organized by IBC Life Sciences.
3 Leven, Oliver (2009-09-21). "Scalable Data Management and Analysis in High-Content Screens: Leveraging Rich Biological Outcomes with Extreme Efficiency." 2009 High-Content Analysis East conference, organized by Cambridge Healthtech Institute.

Setting Up an HCS Data Analysis System Page 1/7

TECHNICAL INFRASTRUCTURE AND BASIC WORKFLOW

The basic HCS infrastructure consists of one or several HCS instruments, accompanied by one of multiple HCS feature extraction software packages. This basic infrastructure allows users to conduct their HCS experiments, take one or several images per well, and perform the image analysis to quantify changes in cellular phenotypes. For low-throughput experiments (e.g. one or two 96-well plates), this workflow can be performed easily without any specific integration or automation. For example:

1. The HCS instrument is run manually for a couple of plates;
2. The resulting images are stored on a shared network drive;
3. The image analysis software is started manually, analyzing the respective images;
4. The results of the feature extraction are stored on the user's desktop in the form of files;
5. Features of relevance are extracted from the result files with standard desktop software;
6. The numerical well results are inspected and quality-controlled in spreadsheet software;
7. A file browser is used to find the images from which the well results were generated, and the images are displayed with a specific image viewer (e.g. ImageJ).

While this process works well with a small number of wells during protocol development, it becomes very cumbersome when increasing the throughput even to moderate sizes, e.g. to test a library of 1,000 compounds.
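Step 5 in the workflow above, pulling the features of relevance out of vendor result files, is typically scripted ad hoc in each lab. A minimal sketch of such a script; the file layout, well-identifier column and feature names are invented for illustration only:

```python
import csv
import io

# Hypothetical per-plate result file as written by a feature extraction
# package: one row per well, many feature columns. All names are assumptions.
RESULT_CSV = """Well,CellCount,NucSize,SpotIntensity
A01,512,10.2,0.91
A02,498,9.8,0.15
B01,23,4.1,0.05
"""

def extract_features(csv_text, features):
    """Pull only the features of interest, keyed by well identifier."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Well"]: {f: float(row[f]) for f in features} for row in reader}

# Keep two of the three extracted features for downstream analysis.
wells = extract_features(RESULT_CSV, ["CellCount", "SpotIntensity"])
```

Even this trivial step must be re-implemented whenever the feature extraction software, and hence the result format, changes, which is one reason the manual workflow does not scale.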
The main drawback is the manual operation, which lacks automation support. For example, it is straightforward to identify and display 3 images out of 2x96 image files using a file browser. This is no longer true if the user needs to inspect the relevant images for the 30 most active compounds from a total set of 3,000 images. Most HCS labs have multiple imaging instruments, often bundled with different feature extraction software packages. These variables impose user challenges such as:

- Each instrument creates images in its own result format
- Each feature extraction software creates results in different formats
- Feature extraction software packages vary greatly in terms of image display, feature semantics and analysis functionalities

Larger groups now often establish a dedicated enterprise image storage system and shared media for HCS screening results, e.g. a fast and centrally-accessible shared drive. However, such dedicated image management solutions usually lack essential workflow functionalities.
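The image-inspection bottleneck is easy to see in code: once activity scores and image locations are machine-readable, selecting the images for the 30 most active wells is a few lines, whereas by hand it is hopeless. In this sketch the activity scores and the flat file-naming scheme are pure assumptions:

```python
# Synthetic activity scores for 8 plates x 96 wells; in reality these would
# come from the feature extraction results. Well IDs like "P01-A01" and the
# image directory and naming scheme are invented for illustration.
activity = {f"P{p:02d}-{r}{c:02d}": (p * 7 + ord(r) + c) % 100
            for p in range(1, 9) for r in "ABCDEFGH" for c in range(1, 13)}

def top_active_images(scores, n, image_dir="/share/run42"):
    """Return hypothetical image paths for the n highest-scoring wells."""
    top = sorted(scores, key=scores.get, reverse=True)[:n]
    return [f"{image_dir}/{well}.tif" for well in top]

paths = top_active_images(activity, 30)
```

This is exactly the kind of automated look-up that a file-browser-based workflow cannot provide.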
GOALS FOR AN HCS ANALYSIS SYSTEM

There are two broad categories of HCS Data Analysis, assay development and screening, and the goals for these categories are quite different. During assay development, the scientist's main goal is to identify an optimal assay protocol (cell handling, experimental conditions, imaging parameters, feature extraction, etc.). In contrast, the primary goal of screening is to perform the experiment and data analysis as quickly and efficiently as possible, while limiting process artifacts and unforeseen compound effects. This is true for primary, secondary, confirmation and other follow-up screens, and these use cases require better automation, infrastructure integration and the elimination of manual tasks. An efficient informatics solution for this category of HCS Data Analysis thus includes:

1. Easy import of multi-layered well data from each instrument and feature extraction software, via local or centralized storage
2. Fast, automated access to the HCS images throughout the analysis workflow
3. Support for centralized image storage systems, be they commercial or home-grown
4. Strict business rules and quality control criteria aligned with standard screening methodologies
5. Limited effort to adapt the existing workflow infrastructure to new assays or infrastructure components
6. Integration of the HCS data analysis system with corporate databases, inventories and workflow systems

CONCEPT

The HCS Data Analysis system presented here is built on Genedata Screener as the core component. Genedata Screener is a software platform dedicated to High Throughput and High Content data analysis and management. The Screener High Content Analyzer module focuses on facilitating primary, secondary, confirmation and other follow-up high content screens by addressing the critical issues listed above. Screener High Content Analyzer thus becomes the central point of interaction for users doing routine analysis of HCS data (Figure 1).
By virtue of its public data import APIs and a file parser infrastructure, High Content Analyzer is easily configured to match the output of any instrument and feature extraction software that writes its data in a publicly documented format. It also copes well with ongoing changes in instrumentation or image analysis. Automated access to the corresponding HCS images from a screen is provided by linking its public image API to the respective image storage system. This link allows the scientist to go back to and view the original HCS images at any point of the analysis workflow. With High Content Analyzer as part of the Screener platform, business rules and quality control criteria (using standard screening procedures) are provided out of the box. Building on Screener technology also allows a straightforward fit of HCS data into a standard infrastructure, making them just another data stream in Screener. Previous integrations with corporate databases and inventory systems are preserved with minimal adaptations; new integrations are created with little effort.
Figure 1: Screener High Content Analyzer embedded in a typical HCS environment. It imports well data plate-by-plate (1), either from local files or from a central server resource (e.g. a shared file system or a database such as the Cellomics Store), depending on the available infrastructure and integration requirements. Centralized storage of and access to images (2) makes it possible to systematically search for and retrieve well results (3). The automated propagation of results to downstream databases (4) completes the integration.

Integrated in this way, High Content Analyzer provides scientists with automated processing, fast diagnostics and review capabilities for High Content Screens on the scale of a few plates up to full-deck screens. Interactive reprocessing, result generation and automated documentation become a matter of seconds to a few minutes, while handling the full complexity of the underlying HCS data.

EXAMPLE INTEGRATIONS

STAND-ALONE IMAGING HARDWARE

While Screener High Content Analyzer provides the tools for data import and analysis in a wide variety of scenarios, one common set-up makes use of the hardware and basic software shipped with high content instrumentation. These installations may also make use of data and image storage on a shared network drive, without additional software to manage the data.

STORAGE OF IMAGE FILES

In such an environment, data and image management are based on a directory hierarchy. In a typical example, a three-level hierarchy is used to locate the appropriate images: Experiment -> Run -> Plate, where Experiment denotes the overarching folder for all plates measured under the same experimental conditions, Run specifies the set of plates screened on a given day, and Plate holds the final images. In such a scenario, it is mandatory that individual image file names contain the well identifier to allow their automated association.
With images stored in this way, the feature extraction software can place the numeric data derived from the well images within the same directory structure.
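The automated association described above boils down to a small path-resolution routine. The Experiment -> Run -> Plate layout follows the text; the concrete image file-naming pattern (`img_<well>.tif`) is an assumption made for illustration:

```python
import os
import re
import tempfile

# Assumed naming convention: the well identifier (e.g. "B05") is embedded in
# each image file name, as required for automated association.
WELL_RE = re.compile(r"_([A-H]\d{2})\.tif$")

def find_well_image(root, experiment, run, plate, well):
    """Locate the image file for one well inside the 3-level hierarchy."""
    plate_dir = os.path.join(root, experiment, run, plate)
    for name in sorted(os.listdir(plate_dir)):
        m = WELL_RE.search(name)
        if m and m.group(1) == well:
            return os.path.join(plate_dir, name)
    return None

# Build a tiny fake Experiment -> Run -> Plate hierarchy to demonstrate.
root = tempfile.mkdtemp()
plate_dir = os.path.join(root, "EXP01", "RUN_2010-01-15", "PLATE_0001")
os.makedirs(plate_dir)
for w in ("A01", "A02", "B05"):
    open(os.path.join(plate_dir, f"img_{w}.tif"), "w").close()

hit = find_well_image(root, "EXP01", "RUN_2010-01-15", "PLATE_0001", "B05")
```

The same look-up logic is what an integration layer encodes once, instead of every scientist repeating it by hand in a file browser.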
DATA IMPORT INTO SCREENER HIGH CONTENT ANALYZER

To import the raw well data, the user of Screener High Content Analyzer selects the plates to analyze on the client by simply browsing to the experiment folder on the centralized share. The only requirement is that the shared drive is accessible from the user's client computer. This centralized selection of plates requires a directory structure specification to identify the individual plates, as outlined above. These directories and the files therein are loaded, and the user identifies the plates of interest by experiment, run, plate barcode, plate timestamp or any other available metadata. This access method requires the configuration of a standard data parser matching the numeric well data generated by the feature extraction software.

IMAGE LOOK-UP

With images stored in the same directory structure as the well data, the information needed to automatically identify the images is stored within the software when the numeric well data are imported. The user then simply selects any wells of interest in the software, and the corresponding images are displayed instantly for visual inspection. This requires a server-side implementation of the Screener HCS Image API, which defines the semantics of the available images (channels, fields) and how the images are stored in the directory structure in relation to the raw well data. This work can be done as a small service project by Genedata or with in-house resources, as the API is well documented and comes with several example implementations.

DEDICATED IMAGE STORAGE SOLUTION: CELLOMICS STORE

More and more facilities have started to make use of more sophisticated solutions for image storage, as the number of files and the raw size of the cumulative data can become very large and complex.
Screener High Content Analyzer has public and open APIs for importing the well data (Plate Data Import API) and the image data (HCS Image API), which can be used with a wide range of proprietary and custom dedicated image storage solutions. For brevity, only the integration of Screener High Content Analyzer with the Thermo Fisher Cellomics Store is described below. In this setup, Cellomics Store provides the following functionalities: a) it manages the relationship between raw well data and images; b) it specifies additional information on the experiments useful for image display purposes, such as composites, overlays and their combination with the channel images; and c) it manages the use of additional enterprise storage hardware.

STORAGE OF IMAGES AND RAW WELL DATA

When operated in conjunction with the Cellomics ArrayScan instrument, HCS images can be spooled automatically into the Cellomics Store. From the Cellomics Store, the images automatically feed into the Cellomics BioApplications. The results are written back to the Cellomics Store, which maintains the relation between the original images and the derived numerical results.
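At its core, the relation a store like Cellomics Store maintains between original images and derived numerical results is a foreign-key relationship. The following in-memory sketch illustrates the idea only; the table and column names are invented and do not reflect the actual Cellomics schema:

```python
import sqlite3

# Hypothetical minimal schema: each numerical result points back to the
# image it was derived from. All identifiers here are assumptions.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE image  (image_id INTEGER PRIMARY KEY, plate TEXT, well TEXT,
                     channel INTEGER, path TEXT);
CREATE TABLE result (well TEXT, feature TEXT, value REAL,
                     image_id INTEGER REFERENCES image(image_id));
""")
db.execute("INSERT INTO image VALUES (1, 'P1', 'A01', 1, '/store/p1_a01_c1.tif')")
db.execute("INSERT INTO result VALUES ('A01', 'CellCount', 512.0, 1)")

def image_for_result(conn, well, feature):
    """Follow a numerical result back to the image it was derived from."""
    row = conn.execute(
        "SELECT i.path FROM result r JOIN image i ON r.image_id = i.image_id "
        "WHERE r.well = ? AND r.feature = ?", (well, feature)).fetchone()
    return row[0] if row else None

path = image_for_result(db, "A01", "CellCount")
```

Because the store maintains this mapping, a client such as High Content Analyzer can resolve a well result to its source image with a single query instead of a file-system search.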
DATA IMPORT INTO SCREENER HIGH CONTENT ANALYZER

In a standard integration of Cellomics Store with Screener High Content Analyzer, the selection of plates to be imported is done via the plate selection dialog; it features both hierarchical navigation (1) and selection of individual plates (2). In the next step, the users select the features to be imported (3): all features defined within the Cellomics protocol are offered, and the user can import all of them or just the features of interest for the selected plates. Import of results typically happens in a matter of seconds to a few minutes.

IMAGE LOOK-UP

While analyzing the data with Screener High Content Analyzer, the raw images used to generate the well data are available via a simple right-click at any point in the analysis. Screener has access to all of the images available in the Cellomics Store. These include base images as well as combinations of images from different channels, composites and overlays (highlighted cellular objects) as defined within the Cellomics protocol. In addition, the user can switch between different images of the same well (fields, sites) with a simple mouse click. A default implementation of the HCS Image API is readily available for interested customers. It requires only the installation and proper configuration of the Cellomics Store and the Cellomics HCS Connect API Runtime.

REALIZED BENEFITS

According to several major pharmaceutical and biotechnology companies where Genedata Screener High Content Analyzer was introduced and integrated in 2009, this work has overcome major issues that had previously prevented a scale-up of HCS experiments. Realized benefits include:

1) Fast and convenient access to the HCS well data, regardless of the specific HCS instrumentation.
2) Ad-hoc access to the HCS images at any stage of the plate QC or hit list creation process.
3) A powerful and intuitive user interface for rapid plate QC in complex HCS experiments.
4) Quick set-up of the analysis procedure for a new experiment (within a few minutes).
5) Integration of HCS data analysis with other screening activities (classical HTS or MTS) and the corresponding corporate databases, allowing easy result sharing and ensuring comparability of results.
These companies can now process their HCS data in a standardized fashion, including HCS image look-up and review, for screens ranging from a few plates to more than 1,000,000 compounds. Smooth integration has relieved scientists of the burden of manually locating, re-formatting and importing HCS data from different sources, and of manually searching for HCS images in file systems. Interactive visualization, annotation, and re-processing options enable review and optimization of the analysis for complete campaigns within minutes (Figure 2). Results are reported in a standardized fashion, compatible with other screening assays, to corporate systems for access and retrieval by the wider lead discovery community.

Figure 2: This screenshot illustrates the interactivity of Screener High Content Analyzer for hit list creation and plate QC.

Genedata Phylosopher, Genedata Screener, Genedata Expressionist and the Genedata logo are registered trademarks of Genedata AG. All other product and service names mentioned are the trademarks of their respective companies. (c) 2010 Genedata AG. The information contained herein is subject to change without notice.