Package ffpe. October 1, 2018

Similar documents
Course on Microarray Gene Expression Analysis

Package frma. R topics documented: March 8, Version Date Title Frozen RMA and Barcode

affyqcreport: A Package to Generate QC Reports for Affymetrix Array Data

Package mimager. March 7, 2019

Package STROMA4. October 22, 2018

DAY 52 BOX-AND-WHISKER

Package yaqcaffy. January 19, 2019

Frozen Robust Multi-Array Analysis and the Gene Expression Barcode

Package AffyExpress. October 3, 2013

Package ChIPXpress. October 4, 2013

Package OrderedList. December 31, 2017

Package pcagopromoter

Package affyplm. January 6, 2019

Package makecdfenv. March 13, 2019

Package cghmcr. March 6, 2019

Package macorrplot. R topics documented: October 2, Title Visualize artificial correlation in microarray data. Version 1.50.

Package dupradar. R topics documented: July 12, Type Package

Package snm. July 20, 2018

Package TilePlot. April 8, 2011

Package TilePlot. February 15, 2013

Package cycle. March 30, 2019

Package matchbox. December 31, 2018

Package affypdnn. January 10, 2019

Package sscore. R topics documented: June 27, Version Date

Package SC3. September 29, 2018

Package POST. R topics documented: November 8, Type Package

Package M3Drop. August 23, 2018

Package EMDomics. November 11, 2018

Package dualks. April 3, 2019

Package scmap. March 29, 2019

Package clusterseq. R topics documented: June 13, Type Package

Package GSRI. March 31, 2019

Package biosigner. March 6, 2019

Package MiRaGE. October 16, 2018

Package CONOR. August 29, 2013

Package RLMM. March 7, 2019

Package starank. August 3, 2013

/ Computational Genomics. Normalization

Package cgh. R topics documented: February 19, 2015

Package ConsensusClusterPlus

Package ABSSeq. September 6, 2018

Towards an Optimized Illumina Microarray Data Analysis Pipeline

Package flowdensity. June 28, 2018

Microarray Data Analysis (V) Preprocessing (i): two-color spotted arrays

NCSS Statistical Software

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd

Chapter 2 Describing, Exploring, and Comparing Data

Package crossmeta. September 5, 2018

Package starank. May 11, 2018

Package affylmgui. July 21, 2018

Package LMGene. R topics documented: December 23, Version Date

Package ruv. March 12, 2018

Microarray Data Analysis (VI) Preprocessing (ii): High-density Oligonucleotide Arrays

Package spikeli. August 3, 2013

Package RmiR. R topics documented: September 26, 2018

MATH3880 Introduction to Statistics and DNA MATH5880 Statistics and DNA Practical Session Monday, 16 November pm BRAGG Cluster

Package M3D. March 21, 2019

Package DFP. February 2, 2018

Package BgeeDB. January 5, 2019

Introduction: microarray quality assessment with arrayqualitymetrics

Package clstutils. R topics documented: December 30, 2018

Package CNTools. November 6, 2018

Package RobustRankAggreg

Package anota2seq. January 30, 2018

Package virtualarray

Microarray Excel Hands-on Workshop Handout

Package geecc. R topics documented: December 7, Type Package

Package TMixClust. July 13, 2018

Automated Bioinformatics Analysis System on Chip ABASOC. version 1.1

The analysis of rtpcr data

ROTS: Reproducibility Optimized Test Statistic

srap: Simplified RNA-Seq Analysis Pipeline

No. of blue jelly beans No. of bags

Package GSVA. R topics documented: January 22, Version

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

Package RTNduals. R topics documented: March 7, Type Package

Package twilight. August 3, 2013

Chapter 3 - Displaying and Summarizing Quantitative Data

Package RobLoxBioC. August 3, 2018

Package BAC. R topics documented: November 30, 2018

Package qboxplot. November 12, qboxplot... 1 qboxplot.stats Index 5. Quantile-Based Boxplots

Numerical Summaries of Data Section 14.3

Package affyio. January 18, Index 6. CDF file format function

Package ALDEx2. April 4, 2019

Package les. October 4, 2013

Package omicade4. June 29, 2018

Package SC3. November 27, 2017

Package splinetimer. December 22, 2016

Package simulatorz. March 7, 2019

Preprocessing Affymetrix Data. Educational Materials 2005 R. Irizarry and R. Gentleman Modified 21 November, 2009, M. Morgan

Package arrayqualitymetrics

Evaluation of VST algorithm in lumi package

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures

Package Binarize. February 6, 2017

Package dmrseq. September 14, 2018

Package HTSFilter. November 30, 2017

Description of gcrma package

Quality control of array genotyping data with argyle Andrew P Morgan

Package anidom. July 25, 2017

Transcription:

Type Package Package ffpe October 1, 2018 Title Quality assessment and control for FFPE microarray expression data Version 1.24.0 Author Levi Waldron Maintainer Levi Waldron <lwaldron.research@gmail.com> Description Identify low-quality data using metrics developed for expression data derived from Formalin-Fixed, Paraffin-Embedded (FFPE) data. Also a function for making Concordance at the Top plots (CAT-plots). Collate sampleqc.r sortediqrplot.r CATplot.R Depends R (>= 2.10.0), TTR, methods Imports Biobase, BiocGenerics, affy, lumi, methylumi, sfsmisc Suggests genefilter, ffpeexampledata License GPL (>2) LazyLoad yes biocviews Microarray, GeneExpression, QualityControl git_url https://git.bioconductor.org/packages/ffpe git_branch RELEASE_3_7 git_last_commit 8047032 git_last_commit_date 2018-04-30 Date/Publication 2018-09-30 R topics documented: ffpe-package......................................... 2 CATplot........................................... 3 sampleqc.......................................... 4 sortediqrplot........................................ 7 Index 9 1

2 ffpe-package ffpe-package Quality assessment and control for FFPE microarray expression data Description Details Identify low-quality data using metrics developed for expression data derived from Formalin-Fixed, Paraffin-Embedded (FFPE) data. Also a function for making Concordance at the Top plots (CATplots). Package: ffpe Type: Package Version: 1.0.0 Date: 2011-11-17 License: GPL (>=2) LazyLoad: yes biocviews: Microarray, GeneExpression, QualityControl, Bioinformatics Quality control of FFPE expression data for Illumina and Affymetrix microarrays. The function sampleqc identifies low-quality expression data, using IQR or any other surrogate quality measure for expression data. sortediqrplot provides a simplified, sorted boxplot of raw expression intensities as a quality summary for the experiment, suitable for large sample sizes and multiple batches. Author(s) Levi Waldron Maintainer: Levi Waldron <lwaldron@hsph.harvard.edu> References under review Examples library(ffpeexampledata) data(lumibatch.gse17565) QC <- sampleqc(lumibatch.gse17565,xaxis="index",cor.to="pseudochip",qcmeasure="iqr") ##sort samples QCvsRNA <- data.frame(inputrna.ng=lumibatch.gse17565$inputrna.ng,rejectqc=qc$rejectqc) QCvsRNA <- QCvsRNA[order(QCvsRNA$rejectQC,-QCvsRNA$inputRNA.ng),] ##QC rejects samples with lowest input RNA concentration\n par(mgp=c(4,2,0)) dotchart(log10(qcvsrna$inputrna.ng), QCvsRNA$rejectQC, xlab="log10(rna conc. in ng)",

CATplot 3 ylab="rejected?", col=ifelse(qcvsrna$rejectqc,"red","black")) CATplot Make a Concordance at the Top plot. Description For the i top-ranked members of each list, concordance is defined as length(intersect(vec1[1:i],vec2[1:i]))/i. This concordance is plotted as a function of i. Usage CATplot(vec1, vec2, maxrank = min(length(vec1),length(vec2)), make.plot = TRUE,...) Arguments vec1, vec2 maxrank make.plot Two numeric vectors, for computing concordance. If these are numeric vectors with names, the numeric values will be used for sorting and the names will be used for calculating concordance. Otherwise, they are assumed to be already-ranked vectors, and the values themselves will be used for calculating concordance. Optionally specify the maximum size of top-ranked items that you want to plot. If TRUE, the plot will be made. Set to FALSE if you just want the concordance calculations.... Optional arguments passed onto plot() Value Returns a dataframe with two columns: i concordance length of top lists fraction in common that the two provided lists have in the top i items Author(s) Levi Waldron <lwaldron@hsph.harvard.edu> References The CAT-plot was suggested by Irizarry et al.: Irizarry, R.A. et al. Multiple-laboratory comparison of microarray platforms. Nat Meth 2, 345-350 (2005). The CAT-boxplot for multiple splits of a single dataset was suggested by Waldron et al. (under review).

4 sampleqc Examples library(ffpeexampledata) data(lumibatch.gse17565) ##preprocessing, individually for rep1 and rep2 lumibatch.rep1 <- lumibatch.gse17565[,lumibatch.gse17565$replicate==1] lumbiatch.rep1 <- lumit(lumibatch.rep1,"log2") lumbiatch.rep1 <- lumin(lumibatch.rep1,"quantile") probe.var <- apply(exprs(lumibatch.rep1),1,var) lumibatch.rep1 <- lumibatch.rep1[probe.var > median(probe.var),] lumibatch.rep2 <- lumibatch.gse17565[,lumibatch.gse17565$replicate==2] lumibatch.rep2 <- lumit(lumibatch.rep2,"log2") lumibatch.rep2 <- lumin(lumibatch.rep2,"quantile") lumibatch.rep2 <- lumibatch.rep2[featurenames(lumibatch.rep1),] ##row t-tests for differential expression library(genefilter) ttests.rep1 <- rowttests(exprs(lumibatch.rep1),fac=factor(lumibatch.rep1$cell.type)) ttests.rep2 <- rowttests(exprs(lumibatch.rep2),fac=factor(lumibatch.rep2$cell.type)) pvals.rep1 <- ttests.rep1$p.value;names(pvals.rep1) <- rownames(ttests.rep1) pvals.rep2 <- ttests.rep2$p.value;names(pvals.rep2) <- rownames(ttests.rep2) ## Very high concordance between top differentially expressed gene lists ## identified by different replicates x <- CATplot(pvals.rep1,pvals.rep2,maxrank=1000,xlab="Size of top-ranked gene lists",ylab="concordance") legend("topleft",lty=1:2,legend=c("actual concordance","concordance expected by chance"), bty="n") sampleqc Sample quality control for FFPE expression data Description Usage Expression data from FFPE tissues may contain a much larger range of quality than data from freshfrozen tissues. This function sorts samples by some specified measure of array quality, Interquartile Range (IQR) by default, and plots the correlation of each sample s expression values against a typical sample for the study, as a function of the quality measure. A typical sample can either be the median pseudochip (default), or a specified number of samples with quality measure most similar to that sample. An attempt is made to automatically select a threshold for rejection of lowquality samples, at the point of largest negative inflection of a Loess smoothing curve. for this plot. sampleqc(data.obj,logtransform = TRUE, goby = 3, xaxis = "notindex", QCmeasure = "IQR", cor.to = "p ## S4 method for signature 'LumiBatch' sampleqc(data.obj,logtransform = TRUE, goby = 3, xaxis = "notindex", QCmeasure = "IQR", cor.to = "p ## S4 method for signature 'AffyBatch' sampleqc(data.obj,logtransform = TRUE, goby = 3, xaxis = "notindex", QCmeasure = "IQR", cor.to = "p

sampleqc 5 ## S4 method for signature 'matrix' sampleqc(data.obj,logtransform = TRUE, goby = 3, xaxis = "notindex", QCmeasure = "IQR", cor.to = "p Arguments data.obj logtransform goby xaxis QCmeasure A data object of class LumiBatch, AffyBatch, or matrix. If matrix, the columns should contain samples and the rows probes. If using QCmeasure="IQR", it is critical that the data not be normalized. QCmeasure="ndetectedprobes" currently works only for LumiBatch objects. If TRUE, data will be log2-transformed before calculating IQR and correlation. This number of samples above and below each sample will be used to for calculating correlation. Used only if cor.to="similar". If "index", the QC measure will be converted to ranks. This can be useful for very discontinuous values of the QC measure, which interfere with generation of a smoothing line. If "notindex", the QC measure is used as-is. Automated options include "IQR" and "ndetectedprobes". These are the Interquartile Range and number of probes called present, respectively. QCmeasure can also be a numeric vector of length equal to the number of samples, to manually specify some other quality metric. cor.to "similar" to calculate correlation of each chip to neighbors within a sliding window of size 2*goby+1, or "pseudochip" to calculate correlation to a study-wide pseudochip. The former can be more sensitive, or more appropriate with a large number of failed chips, but does not work well with small sample size (<20). The default is "pseudochip". pseudochip.samples An integer vector, specifying the column numbers of samples to use in calculation of the median pseudochip. Default is to use all samples. detectionth manualcutoff mincor Nnominal detection p-value to consider a probe as detected or not (0.01 by default). Used only if QCmeasure="ndetectedprobes" and class(data.obj)=="lumibatch" Optional manual specification of a cutoff for good and bad samples. If xaxis="index", manualcutoff specifies the number of samples that will be rejected. If xaxis="notindex", it is the value of the QC measure plotted on the x-axis, below which samples will be rejected. Optional specification of a minimum correlation to the sliding window samples or median pseudochip, below which all samples will be rejected for QC. This is drawn as a horizontal line on the output plot. maxcor Optional specification of an upper limit of correlation to sliding window samples or median pseudochip, above which samples will not be considered in determining the QC cutoff. This can be useful if some structure in the high-quality end of the plot has the maximum downward inflection, causing most samples to be incorrectly rejected. In such case, specifying maxcor can force the otherwise automatically-determined threshold into a more reasonable region. Only values above at least 0.25, and probably below 0.8, make sense. below.smoothed.threshold Samples falling more than below.smoothed.threshold times the IQR of the residuals will be rejected for low QC. Large negative residuals from the Loess best-fit line may indicate outlier samples even if that sample has a high IQR or other quality measure.

6 sampleqc lowess.f labelnote pch lw Details Value linecol make.legend main.title Degree of smoothing of the Loess best-fit line (see?loess) An optional label for the plot, used only if QCmeasure is a numeric vector. Plotting character to be used for points (see?par). Line width for Loess curve. Line color for Loess curve. If TRUE, an automatic legend will be added to the plot. If specified, this over-rides the automatically-generated title.... Other arguments passed on to plot(). These methods aid in the identification of low-quality samples from FFPE expression data, when technical replication is not available. If only one method is specified (one value each for xaxis, QCmeasure, and cor.to, the output is a dataframe with the following columns: i spearman movingaverage interpolate.i smoothed ddy rejectqc index or QC measure of each sample spearman correlation to sliding window samples or to median pseudochip moving average smoothing of spearman correlation evenly spaced QC measure (index or actual QC measure) used for plotting Loess curve values of the Loess curve second derivative of the Loess curve was this sample rejected in the QC process? logical TRUE or FALSE If more than one method is specified, the output is a list, where each element contains a dataframe of the above description. Author(s) Levi Waldron <lwaldron@hsph.harvard.edu> References Under review. Examples library(ffpeexampledata) data(lumibatch.gse17565) QC <- sampleqc(lumibatch.gse17565,xaxis="index",cor.to="pseudochip",qcmeasure="iqr") ##sort samples QCvsRNA <- data.frame(inputrna.ng=lumibatch.gse17565$inputrna.ng,rejectqc=qc$rejectqc) QCvsRNA <- QCvsRNA[order(QCvsRNA$rejectQC,-QCvsRNA$inputRNA.ng),]

sortediqrplot 7 ##QC rejects samples with lowest input RNA concentration\n par(mgp=c(4,2,0)) dotchart(log10(qcvsrna$inputrna.ng), QCvsRNA$rejectQC, xlab="log10(rna conc. in ng)", ylab="rejected?", col=ifelse(qcvsrna$rejectqc,"red","black")) sortediqrplot Minimal box plot of samples, which get ordered optionally by batch, then by IQR. Description Usage This plots only the 25th to 75th percentile of expression intensities (Interquartile Range), sorted from smallest to largest IQR. This modification is more readable than a regular boxplot for large sample sizes. An optional batch variable may be specified, so that the sorting is done within each batch. sortediqrplot(data.obj, batchvar = rep(1, ncol(data.obj)), dolog2=false,...) ## S4 method for signature 'LumiBatch' sortediqrplot(data.obj, batchvar = rep(1, ncol(data.obj)), dolog2=false,...) ## S4 method for signature 'AffyBatch' sortediqrplot(data.obj, batchvar = rep(1, ncol(data.obj)), dolog2=false,...) ## S4 method for signature 'matrix' sortediqrplot(data.obj, batchvar = rep(1, ncol(data.obj)), dolog2=false,...) Arguments Details Value data.obj An object of class LumiBatch, AffyBatch, AffyBatch, or matrix. This should contain raw, unnormalized, expression intensities. batchvar Optional integer batch variable. If specified, samples will be sorted within batches only. Default is to assume a single batch for all data. dolog2 If TRUE, data will be log2-transformed before plotting. Default is FALSE.... Optional arguments passed on to the plot() function. Function will be generally called for the side-effect of producing a plot of sorted IQRs, but if the output is redirected it also produces the IQRs. If the output is redirected, e.g.: output <- sortediqrplot() the function will return the IQR of each sample.

8 sortediqrplot Author(s) Levi Waldron <lwaldron@hsph.harvard.edu> References Under review. See Also sampleqc Examples library(ffpeexampledata) data(lumibatch.gse17565) sortediqrplot(lumibatch.gse17565,main="gse17565")

Index Topic hplot CATplot, 3 sampleqc, 4 sortediqrplot, 7 Topic package ffpe-package, 2 CATplot, 3 ffpe (ffpe-package), 2 ffpe-package, 2 sampleqc, 4 sampleqc,affybatch-method (sampleqc), 4 sampleqc,lumibatch-method (sampleqc), 4 sampleqc,matrix-method (sampleqc), 4 sampleqc-method (sampleqc), 4 sortediqrplot, 7 sortediqrplot,affybatch-method (sortediqrplot), 7 sortediqrplot,lumibatch-method (sortediqrplot), 7 sortediqrplot,matrix-method (sortediqrplot), 7 9