Nature Methods: doi: /nmeth Supplementary Figure 1

Supplementary Figure 1. Schematic representation of the Workflow window in Perseus

All data matrices uploaded in the running Perseus session and all processing steps are displayed in order of execution. The workflow lets users keep track of every step of the analysis and navigate between data matrices and visualization components by clicking the respective node in the diagram. Nodes can be annotated with a description and additional information for clarity. When a data matrix node is selected, the number of samples and data points is displayed in the rightmost panel of Perseus; when an analysis node is selected, all parameters used in that step can be reviewed. Each data matrix and each visualization window can be exported in publication-ready formats. The workflow scheme can be saved as a PDF file and used to document all steps of the analysis.

Supplementary Figure 2. Plug-in architecture of Perseus

Perseus is built around a data matrix type, together with various functions for accessing and transforming the matrix. The base code implementing these operations is open source and can be downloaded from GitHub (github.com/jurgencox/perseus-plugins). The rest of the functionality is organized in two main interfaces, Processing and Analysis, and the resulting modules are added to the software core as plug-ins. Developers wishing to extend the software can build upon the main source code and contribute new plug-ins to our online plug-in store.

Supplementary Figure 3. Missing value imputation

Perseus offers several imputation techniques, including a method that draws random values from a distribution meant to simulate expression below the detection limit. The width and the down-shift of this distribution can be set so that it closely represents the missing population. When missing values occur randomly, a distribution similar to that of the measured data is normally used for imputation. In proteomics experiments, by contrast, a frequent assumption is that missing values arise from low-expression proteins; a Gaussian distribution whose median is shifted from the median of the measured data towards low expression should then impute such values accurately. The mode parameter defines which measured data distribution is used to calculate the random distribution. When the samples do not differ greatly in their overall distribution, use of the complete dataset is recommended. The measured distribution is shown in blue and the imputed values in orange.
(a) No down-shift and a distribution width of 0.5 do not simulate low-abundance missing values.
(b) A down-shift of 1.8 and a distribution width of 0.5 simulate the assumption that low-abundance proteins give rise to missing values.
(c) A down-shift of 3.6 and a width of 0.5 result in an undesirable bimodal distribution.
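The down-shifted imputation described in this caption can be illustrated with a minimal Python sketch. It is not the Perseus implementation. It assumes that the width and the down-shift are expressed in units of the standard deviation of the measured values, which the caption does not state, and the function name `impute_downshifted_normal` and the simulated data are hypothetical.

```python
import numpy as np

def impute_downshifted_normal(values, width=0.5, down_shift=1.8, seed=0):
    """Fill NaNs with draws from a narrowed, down-shifted normal distribution.

    Assumption (not from the caption): width and down_shift are given in
    units of the standard deviation of the measured values.
    """
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    measured = values[~missing]

    sd = measured.std()
    # Centre of the imputation distribution: measured median moved towards low values.
    center = np.median(measured) - down_shift * sd

    rng = np.random.default_rng(seed)
    imputed = values.copy()
    imputed[missing] = rng.normal(loc=center, scale=width * sd,
                                  size=int(missing.sum()))
    # The boolean mask records which entries were imputed.
    return imputed, missing

# Simulated log-scale intensities with 100 missing entries (illustrative data only).
rng = np.random.default_rng(1)
data = rng.normal(25.0, 2.0, size=(200, 5))
data.flat[:100] = np.nan

imputed, was_imputed = impute_downshifted_normal(data, width=0.5, down_shift=1.8)
```

Passing the whole matrix, as here, corresponds to using the complete dataset as the reference distribution; calling the function once per column would instead base each imputation on that sample alone.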

Supplementary Figure 4. Density-enhanced scatterplots between proteome, transcriptome and translatome levels produced by the upload plug-in

Short-read NGS data, as produced for instance by the Illumina platform, can be imported for further analysis in the Perseus workflow. In this example we calculate RPKM values for each gene (Ingolia N. T. et al., Science, 2009) and compare them with iBAQ values calculated by MaxQuant from yeast proteomics data (Kulak N. A. et al., Nature Methods, 2014).
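For orientation, RPKM (reads per kilobase of transcript per million mapped reads) can be computed as in the sketch below. It uses the standard definition rather than the plug-in's code. The function name `rpkm` and the example counts are hypothetical, and the total number of mapped reads is assumed to be the sum of the counts passed in.

```python
import numpy as np

def rpkm(read_counts, gene_lengths_bp):
    """Reads per kilobase per million mapped reads for each gene.

    Assumption: the total number of mapped reads is taken as the sum of
    read_counts; a real pipeline may use the library-wide total instead.
    """
    read_counts = np.asarray(read_counts, dtype=float)
    gene_lengths_bp = np.asarray(gene_lengths_bp, dtype=float)
    total_mapped = read_counts.sum()
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_counts * 1e9 / (gene_lengths_bp * total_mapped)

# Illustrative values only: three genes with different lengths and read counts.
counts = [100, 200, 700]
lengths = [1000, 2000, 3500]
values = rpkm(counts, lengths)
```

The first two genes receive the same RPKM because the second has twice the reads but also twice the length.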

Supplementary Figure 5. Augmented data matrix

In addition to the main data matrix, Perseus can make use of background information complementary to the expression columns.
(a) Filtering for a minimum number of valid values is often one of the first processing steps in data analysis. Because some statistical methods (e.g. PCA) require all values to be present, data imputation may be necessary. Upon imputation, a second matrix is created in the background that records which values were measured and which were imputed; it can later be used to highlight or remove the imputed values.
(b) In a more advanced filtering option, a Quality matrix is created first. It contains additional information about each expression value in the main matrix and is used for filtering. For example, the number of peptides used for protein quantification can be used to filter out proteins that were identified with fewer than two peptides.
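The two filtering ideas in this caption can be sketched with plain NumPy arrays. This is an illustration under stated assumptions, not Perseus code: the function names, the example values and the choice to set low-quality entries to NaN before counting valid values are all hypothetical.

```python
import numpy as np

def mask_low_quality(matrix, quality, min_quality=2):
    """Set expression values to NaN where the quality matrix is below a threshold."""
    masked = np.asarray(matrix, dtype=float).copy()
    masked[np.asarray(quality) < min_quality] = np.nan
    return masked

def filter_min_valid(matrix, min_valid):
    """Keep rows that contain at least min_valid non-missing values."""
    matrix = np.asarray(matrix, dtype=float)
    keep = np.sum(~np.isnan(matrix), axis=1) >= min_valid
    return matrix[keep], keep

# Main matrix: rows are proteins, columns are samples (illustrative values).
expression = np.array([[20.1, np.nan, 21.0],
                       [np.nan, np.nan, 18.5],
                       [22.3, 22.8, 22.1]])
# Quality matrix of the same shape: peptides used for each quantification.
peptides = np.array([[3, 0, 1],
                     [0, 0, 2],
                     [5, 4, 6]])

masked = mask_low_quality(expression, peptides, min_quality=2)
filtered, kept_rows = filter_min_valid(masked, min_valid=2)
# A mask like this one is what a background matrix would record before imputation.
missing_before_imputation = np.isnan(filtered)
```

Only the third protein survives here: the first loses a value quantified from a single peptide, which leaves it with one valid value.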

Supplementary Table 1. A list of the main functionalities in Perseus.

LOAD
Generic matrix upload; Raw upload; Create gene list; Binary upload; Create random matrix; Next generation sequencing data upload

MULTI-PROC.
Basic: Match rows by name; Match columns by name; Replace strings

EXPORT
Generic matrix export

ANALYSIS
Visualization: Scatter plot; Profile plot; Histogram; Multi-scatter plot; 3D plot
Clustering/PCA: Hierarchical clustering; Principal component analysis
Misc.: Volcano plot; Select rows manually; Sequence logos; Numeric venn diagram

PROCESSING
Basic: Transform; Combine main columns; Column correlation; Row correlation; Summary statistics (columns); Summary statistics (rows); Quantiles; Density estimation; Performance curves; Combine rows by identifiers; Clone; Add noise
Rearrange: Change column type; Rename columns; Rename columns [reg. ex.]; Reorder/remove columns; Reorder/remove annotation rows; Duplicate columns; Combine annotations; Remove empty columns; Transpose; Sort by column; Fill categorical columns; De-hyphenate ids; Expand multi-numeric and text columns; Unique values; Convert multi-numeric column; Combine categorical columns; Process text column; Search text column
Normalization: Z-score; Rank; Unit vectors; Scale to interval; Width adjustment; Subtract; Divide; Modify by column; Subtract row cluster; Un-Z-score
Filter rows: Filter rows based on categorical column; Filter rows based on numerical/main column; Filter rows based on text column; Filter rows based on valid values; Filter rows based on random sampling
Filter columns: Filter columns based on categorical row; Filter columns based on valid values
Quality: Create quality matrix; Filter quality; Convert to NaN
Annot. columns: Add annotation; To base identifiers; Fisher exact test; Average categories; Category counting; 1D annotation enrichment; 2D annotation enrichment
Annot. rows: Categorical annotation rows; Numerical annotation rows; Average groups; Join terms in categorical row
Tests: One-sample tests; Two-sample tests; Multiple-sample tests; Two-way ANOVA; Three-way ANOVA
Imputation: Replace missing val. from normal distrib.; Replace missing values by constant; Replace imputed values by NaN
Modifications: Expand site table; Add linear motifs; Add known sites; Add modification counts; Kinase-substrate relations; Add sequence features; Add regulatory sites; Shorten motif length
Time series: Cyclic annotation enrichment; Periodicity analysis; Periodogram; Time series ordering
Outliers: Significance A; Significance B
Learning: Classification; Classification feature optimization; Classification parameter optimization
Clustering: Generic clustering