Vector Xpression 3. Speed Tutorial: III. Creating a Script for Automating Normalization of Data


Table of Contents

Important: Please Read
Opening Data in Raw Data Viewer
Creating a Normalization Script Using the Recorder
Split Spots into Blocks
M vs A Plot
Scatter Plot
Normalize by Lowess
Histogram
Creating an Expression Run from Raw Data
Applying the Script to Automatically Process and Save Data

Important: Please Read

STOP: The first tutorial in the suite, Part I: Importing Two-Channel Raw Data, must be completed before proceeding with this tutorial. The raw data used in this tutorial must be loaded using the import procedures outlined in the first tutorial. Completing Part II: Adding Annotations is optional before proceeding with this section, Part III: Creating a Script for Automating Normalization of Data.

GO: If you have completed Part I: Importing Two-Channel Raw Data, proceed with this tutorial.

Introduction

Purpose

This segment of the Speed tutorial teaches you to create a script for automating normalization of the expression data you imported in the first tutorial. You may have added annotations to the data in Part II of the tutorial series. This is the third in a series of three tutorials that teach you how to use Vector Xpression:

I. Importing Two-Channel Expression Raw Data
II. Adding Annotations
III. Creating a Script for Automating Data Normalization

Other tutorials are available from InforMax to teach you other methods of using Vector Xpression. Refer to the InforMax website for more information: http://www.informaxinc.com/content.cfm?pageid=25

IMPORTANT: You may want to use this tutorial in conjunction with the Vector Xpression 3 User's Manual for clarification of all functionality.

NOTE: This section can begin where either the Part I: Importing Two-Channel Raw Data or the Part II: Adding Annotations tutorial ended. If you are currently logged into Vector Xpression 3 and are continuing the series directly from the Part I or Part II tutorial, begin with Step 2 in the following section. If you are starting a new Vector Xpression session, begin with Step 1.

Opening Data in Raw Data Viewer

Overview of Raw Data Viewer

Expression Raw Data Viewer is the user interface in Vector Xpression designed to display raw data. It allows you to normalize and consolidate raw data into Expression Runs, assess data quality, and edit the data when necessary. Note that the column heading names you assigned to your raw data in the Finalize Import dialog box when you imported the data (see Part I: Importing Two-Channel Raw Data) are displayed on the Raw Data Viewer spreadsheet. Also note that each channel displays Signal and Background columns of data.

Action

1. Launch Vector Xpression 3 Explorer. From the Windows Start button, select Start > Programs > InforMax 2003 > Xpression Explorer.
2. In Vector Xpression Database Explorer, ensure the Raw Data table is displayed in the Vector Xpression Database viewer.
   a. Select the sample.c1 raw data object, one of the objects you imported in Part I of the tutorials (Figure 1).
   b. Open it by selecting Data > Open from the main menu.

Figure 1. Vector Xpression Database Explorer - Raw Data

Result

The sample.c1 raw data object opens in the Raw Data Viewer.

Creating a Normalization Script Using the Recorder

Overview

Xpression Scripts automate repeated tasks in the Raw Data Viewer. A script must be created in Raw Data Viewer, but once it exists it can be launched on raw data objects selected in the Xpression Database Explorer. When a script is started from the Explorer, every raw data object currently selected in the Explorer opens in the Raw Data Viewer, the script is run against it, and the results are saved. Once the script has finished, you can reopen the associated raw data objects in Raw Data Viewer to review the results.

The script that you will record in this section consists of the following steps, performed as a consecutive workflow:

1. Split Spots into Blocks
2. M vs A Transformation
3. Scatter Plot
4. Fit Lowess line
5. Normalize by Lowess
6. Histogram
7. Create Ratio Expression Run from normalized data

In Vector Xpression, either raw data or Expression Runs can be normalized. The goal of this section of the Speed tutorial is to normalize the raw data you imported in Part I: Importing Two-Channel Raw Data. Normalization is often advisable in order to make valid comparisons between different chip reads or different channels. By rescaling the data, it corrects for differences in labeling and detection efficiency, differences in intensity, and other minor variations such as systematic measurement bias.

Action

1. Begin recording the script by clicking the Record Script button on the lower toolbar.
2. The first step in the script is to identify the block coordinates. Select Calculations > Split Spots into Blocks from the main menu.

Split Spots into Blocks

Overview of Split Spots into Blocks

In Raw Data Viewer, the Grid and Spot columns on the spreadsheet display coordinates for each spot, based on its physical location on the chip. The spots on chips are often arranged in a series of blocks, with each block assigned a specific number. Each block contains individual spots arranged in rows and columns. The layout typically depends on the print-head used to fabricate the chips. These grid/spot locations in the raw data refer back to a chip design document, which defines the link to your gene name.

The Split Spots into Blocks function organizes spots into larger logical groupings, or blocks, based on their physical position on a chip. When blocks are created, the spots belonging to each block are tagged with that block's number.

3. In the Split Spots into Blocks dialog box, select the check boxes adjacent to BlockY and BlockX, and click OK (Figure 2).

Figure 2 Split Spots into Blocks Dialog Box

4. The results are displayed in new columns on the spreadsheet labeled BlockX and BlockY.
5. Next, calculate an M vs A plot by selecting Calculations > M vs. A Transform.

M vs A Plot

Overview of M vs A Plot

The M vs A plot shows, for each individual spot, the log ratio of the signal from the two channels (M) as a function of the average log intensity of that spot (A). Ideally, the ratios should be independent of the average intensity, so that all data points scatter along a horizontal straight line at zero. The M vs A plot generally reveals systematic biases in the data set, providing a visual justification of why normalization might be needed.

6. In the Select Columns for M vs A Plot dialog box, select options from the drop-down menus as shown in Figure 3.

Figure 3 Select Columns for M vs A Plot Dialog Box

7. Click OK. The calculations are performed, and the M and A data are added as new columns entitled Results: A and M (Figure 4). The A value represents the log geometric mean of the signal. The M value is the log2 ratio of the experimental sample (Cy5) over the reference sample (Cy3).

Figure 4 New columns in the Raw Data Viewer display M and A Plot calculations

8. Now, check the results of the M and A transform by creating an M vs. A Scatter Plot. Select View > Make Scatter Plot.

Scatter Plot

Overview of Scatter Plot of M vs A Results

The Scatter Plot in a Raw Data Viewer displaying raw data is a graphical representation that allows direct comparison of spots or genes as represented by values from two different scans of the same chip. The position of each spot or gene symbol (+) on the plot corresponds to the values for that gene in each of the scans represented on the graph.

9. In the Select Columns dialog box, select the A check box in the X Axis column and the M check box in the Y Axis column, and click OK (Figure 5).
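
The M and A definitions given in step 7 can be written out as a short calculation. The sketch below is only an illustration in Python, not Vector Xpression's own code: the function name `ma_transform` and the signal values are made up, and base 2 for the A value is an assumption (the tutorial says only "log geometric mean").

```python
import math


def ma_transform(cy5, cy3):
    """Return (M, A) for one spot from its two channel signals.

    M = log2(cy5 / cy3), the log2 ratio of experimental over reference.
    A = log2(sqrt(cy5 * cy3)), the log of the geometric mean.
    Base 2 for A is an assumption made for this sketch.
    """
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("signals must be positive to take logarithms")
    log_cy5 = math.log2(cy5)
    log_cy3 = math.log2(cy3)
    m = log_cy5 - log_cy3          # log2 ratio
    a = 0.5 * (log_cy5 + log_cy3)  # average log intensity
    return m, a


# Made-up (Cy5, Cy3) signal pairs, for illustration only.
spots = [(1600.0, 400.0), (500.0, 500.0), (250.0, 1000.0)]
for cy5, cy3 in spots:
    m, a = ma_transform(cy5, cy3)
    print(f"Cy5={cy5:7.1f}  Cy3={cy3:7.1f}  M={m:+.3f}  A={a:.3f}")
```

A spot with equal signal in both channels gives M = 0, which is why unbiased data is expected to scatter around the horizontal line at zero.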

Figure 5 Selecting columns for a Scatter Plot

10. Select the Scatter Plot tab to view the plot (Figure 6). On the plot, A values appear on the X axis and M values on the Y axis. Note that the scatter plot is visibly non-linear, strongly suggesting systematic biases in the data.

Figure 6 Scatter Plot prepared from M and A values; data is not normalized

To verify the systematic bias in the data, you will fit a Lowess line to the M vs A plot of each block. A Lowess line is a smooth non-parametric line representative of the data in the scatter plot. It is robust to outliers. In this case, Lowess lines will be applied individually to all 16 blocks of data.

11. Right-click on the plot to open the shortcut menu and click Fit Lowess.
12. In the Fit Lowess dialog box, select the Block radio button and click OK.

Figure 7 Lowess lines for 16 blocks of data are superimposed on the M vs A scatter plot

Because the Lowess lines are not horizontal at zero, they demonstrate bias in the data (Figure 7). In addition, the Lowess lines reveal print-block-specific differences. Because of these observations, you will now normalize the data.

Normalize by Lowess

Overview of Normalizing by Lowess

This algorithm normalizes the M values (the log2 ratios) to the A values (the log geometric mean of the signal), which leaves the A values unchanged. In other words, the M values are treated as dependent values, while the A values are treated as independent values.

13. Perform a Lowess normalization by selecting Calculations > Normalize > Lowess from the main menu.
14. In the Lowess Normalization dialog box, select options from the drop-down menus as shown in Figure 8. Be sure to select the Process Each Block Separately check box. Click OK.
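
As a rough illustration of what normalizing M against A block by block involves, the sketch below fits a smooth curve of M on A within each block and subtracts it from M, leaving A untouched. It is a simplified stand-in and not Vector Xpression's algorithm: it uses a plain tricube-weighted local linear fit without the robustness iterations of full Lowess, and the function names, the `frac` parameter, and the data are all hypothetical.

```python
import math
from collections import defaultdict


def local_linear_fit(x, y, frac=0.5):
    """Fit a tricube-weighted local line at every x; return the fitted y values."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours used per local fit
    fitted = []
    for xi in x:
        dist = [abs(xj - xi) for xj in x]
        h = sorted(dist)[k - 1] or 1e-12  # bandwidth: distance to k-th nearest point
        w = [(1.0 - min(d / h, 1.0) ** 3) ** 3 for d in dist]
        sw = sum(w)
        mx = sum(wi * xj for wi, xj in zip(w, x)) / sw
        my = sum(wi * yj for wi, yj in zip(w, y)) / sw
        sxx = sum(wi * (xj - mx) ** 2 for wi, xj in zip(w, x))
        sxy = sum(wi * (xj - mx) * (yj - my) for wi, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def normalize_by_block(a, m, blocks, frac=0.5):
    """Subtract each block's fitted curve from M; A is the independent variable."""
    groups = defaultdict(list)
    for i, block in enumerate(blocks):
        groups[block].append(i)
    normalized = [0.0] * len(m)
    for indices in groups.values():
        fit = local_linear_fit([a[i] for i in indices], [m[i] for i in indices], frac)
        for i, f in zip(indices, fit):
            normalized[i] = m[i] - f
    return normalized


# Made-up data: two blocks whose M values drift with A in different ways.
a_values = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0] * 2
block_ids = [1] * 6 + [2] * 6
m_values = [0.4 * (v - 10.0) for v in a_values[:6]] + [0.3 - 0.1 * v for v in a_values[6:]]
for block, m_norm in zip(block_ids, normalize_by_block(a_values, m_values, block_ids)):
    print(f"block {block}: normalized M = {m_norm:+.6f}")
```

Fitting each block separately is what lets the correction absorb print-block-specific trends; a single fit across all spots would only remove the trend shared by every block.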

Figure 8 Lowess Normalization dialog box

The calculations are performed and the new data are entered into the Raw Data Viewer spreadsheet as Lowess normalized Results M and Lowess normalized Results A columns under the Results header (Figure 9).

Figure 9 Lowess normalized calculations display in new spreadsheet columns

15. Check the results of the normalization by creating a Scatter Plot of the normalized data. Select View > Make Scatter Plot.
16. In the Select Columns dialog box that opens:
   a. Check the Lowess normalized Results A check box in the X Axis column.
   b. Check the Lowess normalized Results M check box in the Y Axis column.
   c. Click OK.
18. In Raw Data Viewer, select the Scatter Plot tab again. The new plot is displayed as a second plot.
19. To check the results of the normalization, fit Lowess lines to the new M vs. A scatter plot of each block again. Right-click on the lower plot and select Fit Lowess from the shortcut menu.

20. In the Fit Lowess dialog box, select the Block radio button. Accept the default setting and click OK. Lowess lines are again applied to all 16 print blocks in the lower Scatter Plot.
21. Right-click on the lower scatter plot to open its shortcut menu. Select Show Line > Y=0.

Following the Lowess normalization, the normalized M values are now uniformly distributed along the line y = 0 (lower panel of Figure 10), indicating compensation for any systematic biases in the data. This contrasts with the systematic curvature in the data points formed by the original data (upper panel of Figure 10).

Figure 10 Lowess lines applied to normalized data (lower plot) demonstrate the uniform distribution of values, in contrast to Lowess lines applied to unnormalized data (upper plot)

22. Return to the spreadsheet view using the Spreadsheet tab. The results reveal that the normalization procedure accounted for most of the print-block-specific differences.
23. As an additional check on the quality of the data, create a histogram of the normalized M values (the log2 ratios) by selecting View > Make Histogram from the main menu.
24. In the Histogram dialog box:
   a. Select Results Lowess M from the drop-down list in the text box.
   b. Ensure that the Use two data columns check box is not checked.
   c. Click OK.
25. Select the Histogram tab to review the histogram (Figure 11).

Histogram

Overview of Histogram display

A Histogram displays a general profile of the raw data points representing all spots on a chip.

Result

Figure 11 Histogram representing normalized raw data

You have proceeded through six steps of a raw data analysis workflow. You have normalized the raw data from a selected file, and have reviewed the results of each step on the raw data spreadsheet and/or graphics tabs. All of these components will also be part of the script you have been recording. The seventh and final step of the script follows: creating an Expression Run from the normalized data.

Creating an Expression Run from Raw Data

Overview of Creating Expression Runs from Raw Data

The Save Column as Expression Run tool is used for creating an Expression Run from two-channel raw data displayed in the Raw Data Viewer. The tool can also be used for saving M values (log2 ratios of the two channels) as an Expression Run.

Action

26. Create a new Expression Run by selecting Tools > Save Column as Expression Run.
27. In the Save Column as Expression Run dialog box that opens (Figure 12):
   a. In the Column: drop-down list, select Results Lowess normalized Results M, the column to be saved as an Expression Run.
   b. Select the Ratio radio button and select Log2 in the corresponding drop-down list.
   c. For the Target option, select Channel 1.
   d. Click OK.

Figure 12 Selecting the raw data column to save as an Expression Run

28. In the Save Expression Run as dialog box that opens:
   a. Accept the default name for the Expression Run, sample.c1 Lowess normalized Results M, and click Save.
   b. When prompted to open the Expression Run, select No.
29. To save the sample.c1 raw data object as you have modified it, select File > Save from the main menu.
30. Stop recording the script by selecting Tools > Xpression Scripts > Record script.
31. In the Enter script name dialog box, name the new script Speed_1 and click OK.
32. Choose File > Close from the main menu, and choose No when prompted to save the results.

Result

You have converted the raw data you normalized to an Expression Run, and saved the analyzed data as part of the raw data file. Additionally, all of the operations performed since you started the Record Script tool are now recorded in a script. You can use this script in the future to automate the same sequence of steps at any time.

Applying the Script to Automatically Process and Save Data

Overview of Applying an Expression Script

Now you will apply the Speed_1 script to the remainder of the data you imported. As the script is executed, each raw data object is sequentially opened in the Raw Data Viewer, normalized, converted to an Expression Run, and saved to the database.
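
The record-then-replay idea behind a script can be pictured as an ordered list of steps that is applied, unchanged, to every selected object. The sketch below is a conceptual illustration only and does not reflect how Vector Xpression stores or runs scripts: the class `RecordedScript`, the step functions, and the signal values are hypothetical, and a simple median centering of M stands in for the Lowess normalization step.

```python
import math
from statistics import median


def add_ma_columns(data):
    """Step: add M (log2 ratio) and A (mean log2 intensity) columns."""
    out = dict(data)
    pairs = list(zip(data["cy5"], data["cy3"]))
    out["M"] = [math.log2(r) - math.log2(g) for r, g in pairs]
    out["A"] = [0.5 * (math.log2(r) + math.log2(g)) for r, g in pairs]
    return out


def center_m(data):
    """Step: median-center M (a stand-in here for Lowess normalization)."""
    out = dict(data)
    mid = median(data["M"])
    out["M_norm"] = [m - mid for m in data["M"]]
    return out


class RecordedScript:
    """An ordered list of steps that can be replayed on any raw data object."""

    def __init__(self, name):
        self.name = name
        self.steps = []

    def record(self, step):
        self.steps.append(step)
        return self

    def run(self, raw_object):
        data = raw_object
        for step in self.steps:  # replay in the order recorded
            data = step(data)    # each step returns a new dict
        return data


# "Record" the workflow once, then replay it on each remaining object.
script = RecordedScript("Speed_1").record(add_ma_columns).record(center_m)

# Made-up signals for two raw data objects.
database = {
    "sample.c2": {"cy5": [8.0, 4.0, 2.0], "cy3": [2.0, 2.0, 2.0]},
    "sample.t1": {"cy5": [16.0, 4.0, 1.0], "cy3": [4.0, 4.0, 4.0]},
}
expression_runs = {name: script.run(obj)["M_norm"] for name, obj in database.items()}
for name, values in expression_runs.items():
    print(name, values)
```

Because every step returns a new dictionary, the stored raw data objects are left unmodified and the same script can be replayed on them again.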

Action

1. In the Vector Xpression Database Explorer, select the five Raw Data objects that you have not processed. Use CTRL + CLICK on each to select it (Figure 13).

Figure 13 Selecting raw data objects to be the target of script execution

2. To run the script on the selected objects, select Tools > Xpression Scripts > Speed_1 (Figure 83). The script is executed, and a monitor follows its progress. The newly formed Expression Runs are saved to the database.
3. In the Xpression Database Explorer, select the Expression Runs table from the drop-down Tables list in the upper left corner (Figure 14).

Figure 14 New Expression Runs created from the script execution display in the Xpression Database Explorer

The Expression Runs displayed in the Database Objects Pane consist of the one Expression Run you created while recording the script and the five Expression Runs created when you ran the script against the other five newly imported raw data objects. Note how the names of the six Expression Runs correspond to the data you imported and the column names of the normalized data.

4. Use CTRL + CLICK to select each of the Expression Runs, and select Runs > Open (Figure 84). The six Expression Runs open in an Expression Run Viewer.

Overview of Expression Run Viewer

The Expression Run Viewer displays a textual representation of Expression Run data on a spreadsheet, allowing you to view the exact numerical data imported in all fields associated with a given file format. Additionally, the Expression Run Viewer provides the setting for the following operations on Expression Runs: generating histograms and scatter plots; finding and merging Expression Runs; normalizing Expression Runs; converting absolute data to ratios; performing various statistical analyses such as Latin Squares calculations and t-tests; generating reports; and exporting user-selected numerical values to Microsoft Excel. Multiple Expression Runs can be saved as Run Projects, which also open in Expression Run Viewer.

5. Note the columns with the Lowess M results in the spreadsheet (Figure 15).

Figure 15 New Expression Runs created from the Lowess M results column display the corresponding results in the Expression Run Viewer

7. Compare the results for all the genes from the two groups using a t-test. Begin by selecting Tools > Compare Groups > Compare Two Groups.
8. In the Select Group Comparison Methods dialog box that opens, select the T Test check box (Figure 16). The T test (or any test you select) in this dialog box is described in the panel to the right of the test options. Click OK.

Figure 16 Selecting a statistical test for Expression Run data

9. In the Compare Groups dialog box that opens, identify the two groups of Expression Runs for comparison by selecting the appropriate check boxes.
   a. In the left-most pane, check sample.c1, sample.c2, and sample.c3 Lowess normalized; in the right-most pane, check sample.t1, sample.t2, and sample.t3 Lowess normalized (Figure 17).
   b. Click OK.

Figure 17 Selecting Expression Runs for the two-group comparison

The t values and probabilities for each gene are computed and displayed in the Expression Run Viewer spreadsheet in new columns under the T Test label (Figure 18).
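
For readers who want to see what a per-gene two-group comparison computes, the sketch below calculates a two-sample t statistic for each gene from three control and three treated log ratios. It assumes the pooled-variance (Student) form of the test; the tutorial does not say which variant Vector Xpression uses. The function name, gene names, and values are made up, and the probability column is omitted because it requires the t distribution, which is not in the Python standard library.

```python
import math
from statistics import mean, variance


def two_sample_t(group1, group2):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("t is undefined when both groups have zero variance")
    return (mean(group1) - mean(group2)) / se, df


# Made-up normalized log2 ratios: (control runs c1..c3, treated runs t1..t3).
genes = {
    "gene_A": ([0.10, -0.05, 0.02], [1.20, 0.95, 1.10]),
    "gene_B": ([0.30, -0.20, 0.05], [0.10, 0.25, -0.15]),
}
for name, (control, treated) in genes.items():
    t, df = two_sample_t(control, treated)
    print(f"{name}: t = {t:+.3f} with {df} degrees of freedom")
```

A large absolute t value, as for the first made-up gene, indicates that the difference between group means is large relative to the variation within the groups.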

Figure 18 T-test values calculated for Expression Runs display as additional columns on the spreadsheet

Result

You have opened Expression Runs that you converted from normalized raw data using a script you recorded in this tutorial. You have used one statistical tool available in the Expression Run Viewer as an example of the many tools that can be used in this setting. Other tools are available in the Expression Run Viewer, but you have completed the main purpose of this tutorial: creating a script for automating the normalization of raw data.

This ends Part III of the Speed tutorial.