Shrinkage of logarithmic fold changes

Similar documents
Count outlier detection using Cook s distance

How to Use pkgdeptools

Image Analysis with beadarray

Advanced analysis using bayseq; generic distribution definitions

survsnp: Power and Sample Size Calculations for SNP Association Studies with Censored Time to Event Outcomes

Analysis of two-way cell-based assays

CQN (Conditional Quantile Normalization)

segmentseq: methods for detecting methylation loci and differential methylation

An Introduction to the methylumi package

How To Use GOstats and Category to do Hypergeometric testing with unsupported model organisms

MIGSA: Getting pbcmc datasets

riboseqr Introduction Getting Data Workflow Example Thomas J. Hardcastle, Betty Y.W. Chung October 30, 2017

Practical: exploring RNA-Seq counts Hugo Varet, Julie Aubert and Jacques van Helden

How to use CNTools. Overview. Algorithms. Jianhua Zhang. April 14, 2011

segmentseq: methods for detecting methylation loci and differential methylation

Introduction to the Codelink package

The bumphunter user s guide

RNAseq Differential Expression Analysis Jana Schor and Jörg Hackermüller November, 2017

Creating a New Annotation Package using SQLForge

MLSeq package: Machine Learning Interface to RNA-Seq Data

The analysis of rtpcr data

MMDiff2: Statistical Testing for ChIP-Seq Data Sets

INSPEcT - INference of Synthesis, Processing and degradation rates in Time-course analysis

Analysis of screens with enhancer and suppressor controls

How to use cghmcr. October 30, 2017

A complete analysis of peptide microarray binding data using the pepstat framework

Creating a New Annotation Package using SQLForge

Analysis of Pairwise Interaction Screens

PROPER: PROspective Power Evaluation for RNAseq

Visualisation, transformations and arithmetic operations for grouped genomic intervals

bayseq: Empirical Bayesian analysis of patterns of differential expression in count data

Simulation of molecular regulatory networks with graphical models

An Overview of the S4Vectors package

Bioconductor: Annotation Package Overview

mocluster: Integrative clustering using multiple

Working with aligned nucleotides (WORK- IN-PROGRESS!)

HowTo: Querying online Data

Introduction: microarray quality assessment with arrayqualitymetrics

Creating IGV HTML reports with tracktables

An Introduction to the methylumi package

INSPEcT - INference of Synthesis, Processing and degradation rates in Time-course analysis

NacoStringQCPro. Dorothee Nickles, Thomas Sandmann, Robert Ziman, Richard Bourgon. Modified: April 10, Compiled: April 24, 2017.

Classification of Breast Cancer Clinical Stage with Gene Expression Data

RMassBank for XCMS. Erik Müller. January 4, Introduction 2. 2 Input files LC/MS data Additional Workflow-Methods 2

Contents. Introduction 2

Getting started with flowstats

Overview of ensemblvep

Evaluation of VST algorithm in lumi package

ISA internals. October 18, Speeding up the ISA iteration 2

Analyse RT PCR data with ddct

SimBindProfiles: Similar Binding Profiles, identifies common and unique regions in array genome tiling array data

Introduction: microarray quality assessment with arrayqualitymetrics

GenoGAM: Genome-wide generalized additive

crlmm to downstream data analysis

Calibration of Quinine Fluorescence Emission Vignette for the Data Set flu of the R package hyperspec

Analyzing RNA-seq data for differential exon usage with the DEXSeq package

Differential Expression with DESeq2

Raman Spectra of Chondrocytes in Cartilage: hyperspec s chondro data set

An Introduction to ShortRead

Overview of ensemblvep

The rtracklayer package

ssviz: A small RNA-seq visualizer and analysis toolkit

An Introduction to GenomeInfoDb

Preliminary Figures for Renormalizing Illumina SNP Cell Line Data

SCnorm: robust normalization of single-cell RNA-seq data

The REDseq user s guide

Cardinal design and development

DupChecker: a bioconductor package for checking high-throughput genomic data redundancy in meta-analysis

GxE.scan. October 30, 2018

HMMcopy: A package for bias-free copy number estimation and robust CNA detection in tumour samples from WGS HTS data

SIBER User Manual. Pan Tong and Kevin R Coombes. May 27, Introduction 1

Wave correction for arrays

Fitting Baselines to Spectra

The Command Pattern in R

Introduction to GenomicFiles

RDAVIDWebService: a versatile R interface to DAVID

Using crlmm for copy number estimation and genotype calling with Illumina platforms

Analysing RNA-Seq data with the DESeq package

genefu: a package for breast cancer gene expression analysis

M 3 D: Statistical Testing for RRBS Data Sets

RAPIDR. Kitty Lo. November 20, Intended use of RAPIDR 1. 2 Create binned counts file from BAMs Masking... 1

Package DESeq2. April 9, 2015

highscreen: High-Throughput Screening for Plate Based Assays

qplexanalyzer Matthew Eldridge, Kamal Kishore, Ashley Sawle January 4, Overview 1 2 Import quantitative dataset 2 3 Quality control 2

MethylAid: Visual and Interactive quality control of Illumina Human DNA Methylation array data

MultipleAlignment Objects

Analysing RNA-Seq data with the DESeq package

Graphics in R STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley

Introduction to BatchtoolsParam

Basic4Cseq: an R/Bioconductor package for the analysis of 4C-seq data

Triform: peak finding in ChIP-Seq enrichment profiles for transcription factors

Comparing methods for differential expression analysis of RNAseq data with the compcoder

rhdf5 - HDF5 interface for R

Preprocessing and Genotyping Illumina Arrays for Copy Number Analysis

Analyze Illumina Infinium methylation microarray data

ExomeDepth. Vincent Plagnol. May 15, What is new? 1. 4 Load an example dataset 4. 6 CNV calling 5. 9 Visual display 9

Sparse Model Matrices

Plotting Segment Calls From SNP Assay

Graphics - Part III: Basic Graphics Continued

Basic4Cseq: an R/Bioconductor package for the analysis of 4C-seq data

Transcription:

Shrinkage of logarithmic fold changes Michael Love August 9, 2014 1 Comparing the posterior distribution for two genes First, we run a DE analysis on the Bottomly et al. dataset, once with shrunken LFCs and once with unshrunken LFCs. library("deseq2") library("deseq2paper") data("bottomly_sumexp") dds <- DESeqDataSetFromMatrix(assay(bottomly), DataFrame(colData(bottomly)), ~strain) ddsprior <- DESeq(dds, minreplicatesforreplace = Inf) ## estimating size factors ## estimating dispersions ## gene-wise dispersion estimates ## mean-dispersion relationship ## final dispersion estimates ## fitting model and testing ddsnoprior <- nbinomwaldtest(ddsprior, betaprior = FALSE) ## you had results columns, replacing these The following code, not run, was used to find two genes with similar average expression strength and unshrunken LFC, but with disparate dispersion estimates (and therefore Wald statistic in the unshrunken case). with(results(ddsnoprior), plot(basemean, log2foldchange, log = "x", ylim = c(0, 4), xlim = c(10, 1000), col = ifelse(padj < 0.1, "red", "black"), cex = log(abs(stat)))) genes <- rownames(ddsnoprior)[with(results(ddsnoprior), identify(basemean, log2foldchange))] The two genes chosen are: genes <- c("ensmusg00000081504", "ENSMUSG00000092968") We now extract the results from each run. Note that the second gene (with large dispersion) has a high adjusted p-value with or without the shrinkage of LFCs. betapriorvar <- attr(ddsprior, "betapriorvar") resnoprior <- results(ddsnoprior, cookscutoff = FALSE) resnoprior[genes, "padj"] ## [1] 4.123e-24 1.957e-01 resprior <- results(ddsprior, cookscutoff = FALSE) resprior[genes, "padj"] ## [1] 3.647e-24 2.077e-01 1

2 Plot cols <- c("green3", "mediumpurple3") line <- 0.1 adj <- -0.4 cex <- 1.2 par(mar = c(4.5, 4.5, 1.3, 0.5), mfrow = c(2, 2)) # two MA plots plotma(resnoprior, ylim = c(-4, 4), ylab = expression(mle ~ log[2] ~ fold ~ change), colnonsig = rgb(0, 0, 0, 0.3), colsig = rgb(1, 0, 0, 0.3), colline = NULL) abline(h = 0, col = "dodgerblue", lwd = 2) with(mcols(ddsnoprior[genes, ]), points(basemean, strain_dba.2j_vs_c57bl.6j, cex = 1.7, lwd = 2, col = cols)) legend("bottomright", "adj. p <.1", pch = 16, col = "red", cex = 0.9, bg = "white") mtext("a", side = 3, line = line, adj = adj, cex = cex) plotma(resprior, ylim = c(-4, 4), ylab = expression(map ~ log[2] ~ fold ~ change), colnonsig = rgb(0, 0, 0, 0.3), colsig = rgb(1, 0, 0, 0.3), colline = NULL) abline(h = 0, col = "dodgerblue", lwd = 2) with(mcols(ddsprior[genes, ]), points(basemean, straindba.2j - strainc57bl.6j, cex = 1.7, lwd = 2, col = cols)) legend("bottomright", "adj. p <.1", pch = 16, col = "red", cex = 0.9, bg = "white") mtext("b", side = 3, line = line, adj = adj, cex = cex) # data plot k1 <- counts(ddsnoprior)[genes[1], ] k2 <- counts(ddsnoprior)[genes[2], ] mu1 <- assays(ddsnoprior)[["mu"]][genes[1], ] mu2 <- assays(ddsnoprior)[["mu"]][genes[2], ] cond <- as.numeric(coldata(dds)$strain) - 1 sf <- sizefactors(ddsnoprior) plot(c(cond, cond + 2) + runif(2 * 21, -0.1, 0.1), c(k1/sf, k2/sf), xaxt = "n", xlab = "", ylab = "normalized counts", col = rep(cols, each = 21), cex = 0.5) abline(v = 1.5) axis(1, at = 0:3, label = rep(levels(coldata(ddsnoprior)$strain), times = 2), las = 2, cex.axis = 0.8) mtext("c", side = 3, line = line, adj = adj, cex = cex) # prior/likelihood/posterior curves plot betas <- seq(from = -1, to = 3, length = 500) disps <- dispersions(ddsnoprior[genes, ]) a1 <- disps[1] a2 <- disps[2] # the prior is on the expanded fold changes (effects for both groups) so to # translate to a univariate case, multiply the prior standard deviation by # sqrt(2) why? sigma_expanded^2 = (.5 beta)^2 + (.5 beta)^2 = 0.5 beta^2 # sigma_standard^2 = (.5 beta +.5 beta)^2 = beta^2 priorsigma <- sqrt(betapriorvar[2]) * sqrt(2) likelihood <- function(k, alpha, intercept) { 2

z <- sapply(betas, function(b) { prod(dnbinom(k, mu = sf * 2^(intercept + b * cond), size = 1/alpha)) ) z/sum(z * diff(betas[1:2])) posterior <- function(k, alpha, intercept) { z <- likelihood(k, alpha, intercept) * dnorm(betas, 0, priorsigma) z/sum(z * diff(betas[1:2])) points2 <- function(x, y, col = "black", lty) { points(x, y, col = col, lty = lty, type = "l") points(x[which.max(y)], max(y), col = col, pch = 20) ints <- mcols(ddsnoprior[genes, ])$Intercept intsprior <- mcols(ddsprior[genes, ])$Intercept + mcols(ddsprior[genes, ])$strainc57bl.6j # estimate the LFC using DESeq and test equality deseqnoprior <- results(ddsnoprior[genes, ])$log2foldchange deseqprior <- results(ddsprior[genes, ])$log2foldchange mlefromplot <- c(betas[which.max(likelihood(k1, a1, ints[1]))], betas[which.max(likelihood(k2, a2, ints[2]))]) mapfromplot <- c(betas[which.max(posterior(k1, a1, intsprior[1]))], betas[which.max(posterior(k2, a2, intsprior[2]))]) stopifnot(all(abs(deseqnoprior - mlefromplot) < 0.01)) stopifnot(all(abs(deseqprior - mapfromplot) < 0.01)) # also want SE after prior for plotting deseqpriorse <- results(ddsprior[genes, ])$lfcse plot(betas, likelihood(k1, 0.5, 4), type = "n", ylim = c(0, 4.1), xlab = expression(log[2] ~ fold ~ change), ylab = "density") lines(betas, dnorm(betas, 0, priorsigma), col = "black") points2(betas, likelihood(k1, a1, ints[1]), lty = 1, col = cols[1]) points2(betas, likelihood(k2, a2, ints[2]), lty = 1, col = cols[2]) points2(betas, posterior(k1, a1, intsprior[1]), lty = 2, col = cols[1]) points2(betas, posterior(k2, a2, intsprior[2]), lty = 2, col = cols[2]) hghts <- c(4, 1.2) points(deseqprior, hghts, col = cols, pch = 16) arrows(deseqprior, hghts, deseqprior + deseqpriorse, hghts, col = cols, length = 0.05, angle = 90) arrows(deseqprior, hghts, deseqprior - deseqpriorse, hghts, col = cols, length = 0.05, angle = 90) mtext("d", side = 3, line = line, adj = adj, cex = cex) 3

Figure 1: Comparison of logarithmic fold changes without and with zero-centered prior, using the Bottomly et al dataset. 4

3 Session information R version 3.1.0 (2014-04-10), x86_64-unknown-linux-gnu Locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=C, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8, LC_IDENTIFICATION=C Base packages: base, datasets, grdevices, graphics, methods, parallel, stats, utils Other packages: BiocGenerics 0.10.0, DESeq2 1.4.0, DESeq2paper 1.2, GenomeInfoDb 1.0.0, GenomicRanges 1.16.0, IRanges 1.21.45, LSD 2.5, MASS 7.3-31, RColorBrewer 1.0-5, Rcpp 0.11.1, RcppArmadillo 0.4.200.0, colorramps 2.3, ellipse 0.3-8, gtools 3.3.1, schoolmath 0.4, xtable 1.7-3 Loaded via a namespace (and not attached): AnnotationDbi 1.26.0, Biobase 2.24.0, DBI 0.2-7, RSQLite 0.11.4, XML 3.98-1.1, XVector 0.4.0, annotate 1.42.0, codetools 0.2-8, digest 0.6.4, evaluate 0.5.5, formatr 0.10, genefilter 1.46.0, geneplotter 1.42.0, grid 3.1.0, highr 0.3, knitr 1.5, lattice 0.20-29, locfit 1.5-9.1, splines 3.1.0, stats4 3.1.0, stringr 0.6.2, survival 2.37-7, tools 3.1.0 5