HT Expression Data Analysis


Li-Yu Liu (劉力瑜), Department of Agronomy, National Taiwan University, lyliu@ntu.edu.tw, 08/03/2018

HT Transcriptomic Data: Microarray and RNA-seq

Workflow
1. Data import
2. Preprocessing*
3. Visualization
4. DE analysis*
5. Adjust p-values for multiple comparisons
6. Cluster analysis
* Different methods are used for microarray and RNA-seq data.

R / Bioconductor for HT Transcriptomic Data
"affylmGUI" for Affymetrix microarrays
"limma" for microarrays in general
"DESeq" (and its successor "DESeq2") for RNA-seq data

Affymetrix Microarrays
Example data (GSE59533): Expression data from Zea mays cultivars Tietar and DKC 6575
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse59533

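Get GEO Data using R
> # Install the "GEOquery" package from Bioconductor
> # (biocLite() is retired; BiocManager is now the Bioconductor installer)
> if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
> BiocManager::install("GEOquery")
> library(GEOquery)
> # Get the CEL files of GSE59533:
> gse0 = getGEOSuppFiles("GSE59533")
> gse0   # shows where the downloaded files are stored; decompress them before use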

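> # Install and launch affylmGUI (BiocManager replaces the retired biocLite())
> if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
> BiocManager::install("affylmGUI")
> library(affylmGUI)
> affylmGUI()   # mac OS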

In affylmGUI, choose File -> New to start a new analysis.

Target File Format
The file shown at the right is known as the "RNA Targets" file in affylmGUI. It describes the experimental conditions for each of the 12 arrays. The file must:
1. be in tab-delimited text format;
2. have 3 columns;
3. use exactly these column headings:
   Name: a unique name for each chip
   FileName: the Affymetrix .CEL file name for each chip
   Target: used by affylmGUI to group the arrays into classes for downstream differential expression analysis
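As a minimal sketch, a targets file in this format can also be written from R with base write.table(). The chip names, CEL file names and Target groups below are hypothetical placeholders, not the real GSE59533 sample annotations; replace them with your own.

```r
# Hypothetical example: write an affylmGUI-style "RNA Targets" file for 12 arrays.
# Chip names, CEL file names and Target groups are placeholders, NOT the
# actual GSE59533 annotations.
targets <- data.frame(
  Name     = sprintf("chip%02d", 1:12),
  FileName = sprintf("chip%02d.CEL", 1:12),
  Target   = rep(c("GroupA", "GroupB"), each = 6),
  stringsAsFactors = FALSE
)

# Tab-delimited, with the three column headings exactly as required
targets_file <- file.path(tempdir(), "Targets.txt")
write.table(targets, file = targets_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to check its structure
check <- read.delim(targets_file)
print(head(check, 3))
```

quote = FALSE and row.names = FALSE keep the file to exactly three unquoted columns, matching the required headings.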

Normalization for RNAseq
There are two main sources of systematic variability that require normalization:
(1) RNA fragmentation during library construction causes longer transcripts to generate more reads than shorter transcripts present at the same abundance in the sample (3&4).
(2) Variability in the number of reads produced by each run causes the number of fragments mapped to fluctuate across samples (1&2).

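Normalization for RNAseq
Single-end reads: use the reads per kilobase of transcript per million mapped reads (RPKM) metric,
$\mathrm{RPKM} = \dfrac{10^9 \times R}{N \times L}$,
where R is the number of reads mapped to the transcript, N is the total number of mapped reads in the sample, and L is the transcript length in bases.
Paired-end reads: use the analogous fragments per kilobase of transcript per million mapped reads (FPKM) metric, which counts fragments (read pairs) instead of individual reads.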
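The RPKM formula translates directly into R. This is a minimal sketch: the function name rpkm() is our own, and using sum(counts) as the default for N is an assumption that only holds when every mapped read is assigned to some transcript. Passing fragment counts instead of read counts gives FPKM.

```r
# RPKM = 1e9 * R / (N * L)
#   counts       : reads (or fragments, for FPKM) mapped to each transcript (R)
#   lengths      : transcript lengths in bases (L)
#   total_mapped : total mapped reads in the sample (N); the default sum(counts)
#                  assumes every mapped read was assigned to a transcript
rpkm <- function(counts, lengths, total_mapped = sum(counts)) {
  1e9 * counts / (total_mapped * lengths)
}

# Hypothetical toy numbers: tx1 and tx2 get the same reads, but tx2 is twice as long
counts  <- c(tx1 = 500, tx2 = 500, tx3 = 1000)
lengths <- c(1000, 2000, 1000)
rpkm(counts, lengths)

# Doubling sequencing depth (all counts x2) leaves RPKM unchanged
rpkm(2 * counts, lengths)
```

Dividing by N removes the run-to-run difference in read numbers, and dividing by L removes the length bias, which are the two sources of variability described above.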

Scaling Method in DESeq
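DESeq scales each sample by a size factor estimated with the median-of-ratios method: each gene's count is divided by that gene's geometric mean across samples, and the sample's size factor is the median of these ratios over genes. A base-R sketch of the idea follows; the function name is hypothetical, and in practice DESeq2 computes the factors with estimateSizeFactors() and reports them with sizeFactors(). Genes with a zero count in any sample are skipped, since their geometric mean is zero.

```r
# Median-of-ratios size factors (the scaling idea behind DESeq's size factors).
# counts: genes x samples matrix of raw read counts
median_ratio_size_factors <- function(counts) {
  log_counts <- log(counts)
  # log geometric mean of each gene across samples (the pseudo-reference sample)
  log_geo_means <- rowMeans(log_counts)
  # genes with a zero in any sample have log geometric mean -Inf: skip them
  keep <- is.finite(log_geo_means)
  # size factor = median over genes of (count / geometric mean)
  apply(log_counts[keep, , drop = FALSE], 2,
        function(s) exp(median(s - log_geo_means[keep])))
}

# Hypothetical toy data: sample "b" was sequenced twice as deeply as sample "a"
toy <- cbind(a = c(10, 20, 30, 0), b = c(20, 40, 60, 5))
sf <- median_ratio_size_factors(toy)
sf

# Normalized counts: divide each column by its size factor
normalized <- sweep(toy, 2, sf, "/")
normalized
```

Using the median rather than the total count makes the factor robust to a few very highly expressed or strongly differentially expressed genes.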

DE Analysis for RNAseq
DESeq (DESeq2) is a Bioconductor package that assumes read counts follow a negative binomial (NB) distribution. The analysis has two steps:
1. Estimate the variance (dispersion) of the NB distribution for each gene.
2. Test hypotheses under the NB distribution.
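To make step 1 concrete: with mean μ and dispersion α, an NB count has variance μ + αμ², and R's rnbinom() takes the dispersion as size = 1/α. The simulation below is a sketch with hypothetical values μ = 100 and α = 0.1 that recovers α with a simple method-of-moments estimate. This is not DESeq2's estimator: DESeq2 estimates gene-wise dispersions by maximum likelihood and shrinks them toward a fitted mean-dispersion trend.

```r
# Negative binomial with mean mu and dispersion alpha: Var = mu + alpha * mu^2.
# In R's rnbinom(), the dispersion enters as size = 1 / alpha.
set.seed(1)
mu    <- 100   # hypothetical mean count
alpha <- 0.1   # hypothetical dispersion
x <- rnbinom(1e5, size = 1 / alpha, mu = mu)

# Method-of-moments dispersion estimate from the simulated counts
# (for illustration only; DESeq2 uses a different, likelihood-based estimator)
alpha_hat <- (var(x) - mean(x)) / mean(x)^2

c(mean = mean(x), var = var(x),
  expected_var = mu + alpha * mu^2, alpha_hat = alpha_hat)
```

The extra αμ² term is why a Poisson model (variance = mean) underestimates the variability between biological replicates.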

DESeq2 Input from count matrix: ctdata.tab (tab-delimited)
gene        T1a  T1b   T2   T3   N1   N2
Gene_00001    0    0    2    0    0    1
Gene_00002   20    8   12    5   19   26
Gene_00003    3    0    2    0    0    0
Gene_00004   75   84  241  149  271  257
Gene_00005   10   16    4    0    4   10
Gene_00006  129  126  451  223  243  149
Gene_00007   13    4   21   19   31    4
Gene_00008    0    3    0    0    0    0
Gene_00009  202  122  256   43  287  357
Gene_00010   10    8   56  145   14   15
Gene_00011    2    3    5    0    3    0
Gene_00012  104   60  218  213  111  121
Gene_00013    6    6   22   13   15    6
(13 of 18761 genes shown; 6 samples)

DESeq2
> library(DESeq2)
> # gene IDs are in the first column of ctdata.tab; use them as row names
> samplecountdata = read.delim("data/ctdata.tab", row.names = 1)
> samplecoldata = DataFrame(
+   condition = as.factor(c("treated", "treated", "treated", "treated", "control", "control")),
+   row.names = colnames(samplecountdata))
> dds = DESeqDataSetFromMatrix(
+   countData = samplecountdata,
+   colData = samplecoldata,
+   design = ~ condition)

DESeq2
> dds = DESeq(dds)
> res = results(dds)             # treated vs. control ("control" is the reference level)
> res = res[order(res$padj), ]   # sort genes by adjusted p-value
> plotMA(dds)
> write.csv(as.data.frame(res), file = "condition_treated_results.csv")
> # save normalized read counts
> norm.cts = counts(dds, normalized = TRUE)
> write.csv(norm.cts, file = "normalizedcounts.csv")
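The padj column of res holds Benjamini-Hochberg (BH) adjusted p-values, DESeq2's default (pAdjustMethod = "BH"). As a sketch, the BH step-up adjustment can be written by hand and checked against base R's p.adjust(); the example p-values are hypothetical. The hand-written version assumes no missing values, and DESeq2 applies BH only to genes that pass independent filtering (the others get padj = NA), so re-adjusting res$pvalue directly will not reproduce res$padj exactly.

```r
# Benjamini-Hochberg step-up adjustment written out by hand
# (assumes p contains no NA values)
bh_adjust <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)             # indices from largest to smallest p
  ranks <- n:1                                 # rank of each sorted p-value (1 = smallest)
  adj <- pmin(1, cummin(p[o] * n / ranks))     # p * n / rank, forced to be monotone
  adj[order(o)]                                # back to the original order
}

# Hypothetical p-values for five genes
p <- c(0.01, 0.04, 0.03, 0.20, 0.50)
bh_adjust(p)
p.adjust(p, method = "BH")   # base R gives the same values
```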

DESeq2
> # LRT for multiple levels
> colData(dds)$condition = as.factor(c("t1", "t1", "t2", "t2", "ctrl", "ctrl"))
> colData(dds)$condition = relevel(colData(dds)$condition, "ctrl")
> ddsLRT = DESeq(dds, test = "LRT", reduced = ~ 1)
> resLRT = results(ddsLRT)
> mcols(ddsLRT, use.names = TRUE)[1:3, ]
> # when there is no replicate
> # (recent DESeq2 releases stop with an error at DESeq() here, because
> #  dispersions cannot be estimated without replicates)
> trt = c("T1a", "T1b")
> dds.short = DESeqDataSetFromMatrix(countData = samplecountdata[, 1:2],
+   colData = DataFrame(condition = as.factor(trt), row.names = trt),
+   design = ~ condition)
> dds.short = DESeq(dds.short)
> plotMA(dds.short)
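The LRT compares the full model (~ condition, here with three levels) against the reduced model (~ 1) and refers twice the log-likelihood difference to a chi-squared distribution whose degrees of freedom equal the difference in the number of parameters (here 2). The sketch below does this by hand for Gene_00010 from the count table, under simplifying assumptions that DESeq2 does not make: all size factors are 1, and the dispersion is fixed at a hypothetical α = 0.1 instead of being estimated.

```r
# Simplified NB likelihood ratio test for one gene with a 3-level factor.
# Assumptions (NOT DESeq2's procedure): size factors = 1, dispersion fixed
# at a hypothetical alpha = 0.1.
counts <- c(T1a = 10, T1b = 8, T2 = 56, T3 = 145, N1 = 14, N2 = 15)  # Gene_00010
group  <- factor(c("t1", "t1", "t2", "t2", "ctrl", "ctrl"))
alpha  <- 0.1

nb_loglik <- function(x, mu) sum(dnbinom(x, size = 1 / alpha, mu = mu, log = TRUE))

# With a fixed dispersion, the NB maximum-likelihood mean is the sample mean
mu_full    <- ave(counts, group)                       # one mean per group (~ condition)
mu_reduced <- rep(mean(counts), length(counts))        # one common mean (~ 1)

lrt_stat <- 2 * (nb_loglik(counts, mu_full) - nb_loglik(counts, mu_reduced))
lrt_df   <- nlevels(group) - 1                         # extra parameters in the full model
p_value  <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

c(statistic = lrt_stat, df = lrt_df, p = p_value)
```

Unlike pairwise Wald tests, one LRT per gene asks whether the condition factor matters at all, across all of its levels at once.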