Preprocessing -- examples in microarrays

1 Preprocessing -- examples in microarrays

2 I: cDNA arrays
Image processing: addressing (gridding); segmentation (classifying each pixel as foreground or background); intensity extraction (summary statistic).
Normalization: plate to plate; by grid; by print tip.

3 FYI: Segmentation

4 FYI: Information extraction
Calculate and summarize background and foreground intensity for each channel.
Foreground: mean, median, or another quantile of the spot area.
Local background: often the median intensity of the background pixels, which can be chosen as all pixels within a square but not in the spot; the area between two concentric circles; or the valleys (the pixels furthest from real spots).

5 FYI: Background adjusting

6 Data table header you might see in online data
Data table header descriptions:
ID_REF: (no description listed)
VALUE: Normalized log2 ratio of background-subtracted medians, defined by CH2/CH1 (LPS over Mock)
CH1_Median: CH1 (F635) median fluorescence intensity
CH1_MEAN: CH1 (F635) mean fluorescence intensity
CH1_SD: CH1 (F635) standard deviation of fluorescence intensity
CH1_BG_Median: CH1 (F635) background median fluorescence intensity
CH1_BG_Mean: CH1 (F635) background mean fluorescence intensity
CH1_BG_SD: CH1 (F635) background standard deviation of fluorescence intensity
CH2_Median: CH2 (F532) median fluorescence intensity
CH2_MEAN: CH2 (F532) mean fluorescence intensity
CH2_SD: CH2 (F532) standard deviation of fluorescence intensity
CH2_BG_Median: CH2 (F532) background median fluorescence intensity
CH2_BG_Mean: CH2 (F532) background mean fluorescence intensity
CH2_BG_SD: CH2 (F532) background standard deviation of fluorescence intensity
AREA: Number of feature pixels
AREA_BG: Number of feature background pixels
FLAGS: 0 found feature, -50 not found feature, -75 absent feature, -100 manually flagged bad feature

7 acc=gsm25330&id=8662&db=geodb_blob02blob02

8 Analysis of cDNA chip data. From each array, there are four vectors of length J.

9

10 Normalization
Identify and remove the effects of systematic variation in the measured fluorescence intensities, other than differential expression: for example, different labelling efficiencies of the dyes; different scanning parameters; print-tip, spatial, or plate effects; etc.
The need for normalization can be seen most clearly in self-self hybridizations, where the same mRNA sample is labeled with the Cy3 and Cy5 dyes.
Q: In sequencing, how do we check whether there is a need for normalization?

11 Intensity log-ratio, M: boxplots by print-tip group

12 Self-self hybridization There appears to be a non-linear dependence on intensity

13 Self-self hybridization Two-color platform

14 MA-plot by print-tip group. The smooth curves were created using loess. Splines work just as well.
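A minimal sketch of intensity-dependent normalization with loess, written in R to match the lecture code. It simulates a self-self hybridization, computes M and A, and subtracts a loess fit of M on A. The simulated intensities, the shape of the dye bias, and the span value are assumptions made for illustration; they are not taken from the lecture data.

```r
set.seed(1)
n <- 2000
# Simulated self-self hybridization: both channels share the same true signal
true_signal <- 2^runif(n, 6, 14)
A_true <- log2(true_signal)
# Hypothetical intensity-dependent dye bias on the log2 scale
bias <- 0.1 * (A_true - 10)^2
R <- true_signal * 2^( bias / 2 + rnorm(n, sd = 0.15))
G <- true_signal * 2^(-bias / 2 + rnorm(n, sd = 0.15))

M <- log2(R / G)               # intensity log-ratio
A <- (log2(R) + log2(G)) / 2   # average log-intensity

# Global median normalization removes only a constant shift
M_median <- M - median(M)

# Global loess normalization removes the intensity-dependent trend in M
fit <- loess(M ~ A, span = 0.4, degree = 1)
M_loess <- M - fitted(fit)

c(median_norm = mean(M_median^2), loess_norm = mean(M_loess^2))
```

In a self-self hybridization every true log-ratio is 0, so the mean squared M after normalization is a direct measure of what the method failed to remove. Fitting the same loess curve separately within each print-tip group would give a within-print-tip-group version.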

15 Nonlinear normalization
$\text{normalized } \log_2(R/G) = \log_2(R/G) - L(\text{intensity}, \text{sector}, \ldots)$
$L$ depends on a number of predictor variables, such as spot intensity $A$, sector, and plate origin.
Intensity-dependent normalization.
Intensity- and sector-dependent normalization.
2D spatial normalization.
Other variables: time of printing, plate, etc.
Composite normalization: a weighted average of several normalization functions.

16 2D images of L values. Panels: global median normalization; global loess normalization; within-print-tip-group loess normalization; 2D spatial normalization.

17 2D images of normalized M-L. Panels: global median normalization; global loess normalization; within-print-tip-group loess normalization; 2D spatial normalization.

18 Boxplots of normalized M-L. Panels: global median normalization; global loess normalization; within-print-tip-group loess normalization; 2D spatial normalization.

19 MA-plots of normalized M-L. Panels: global median normalization; global loess normalization; within-print-tip-group loess normalization; 2D spatial normalization.

20 Assumptions we made
M does not depend on spot location.
M does not depend on A.
E[M]=0 for most genes, or differential expression is symmetric.
What if we expect most genes to be up- and/or down-regulated? (to be continued)

21 Background subtraction: yes, no, when?
Motivation: the measured foreground intensity includes contributions other than target hybridization.
Yes: to remove bias.
No: to avoid variance. $\mathrm{var}[\log\{(r - r_b)/(g - g_b)\}] \gg \mathrm{var}[\log(r/g)]$ for many background measurements.
What's the trade-off?
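The variance side of this trade-off can be illustrated with a small simulation. This is a sketch under assumed values: the signal, background, and noise levels below are hypothetical, and noise is taken to be additive and Gaussian on every measured intensity.

```r
set.seed(2)
n <- 10000
signal     <- 50    # true hybridization signal, equal in both channels
background <- 100   # true local background
noise_sd   <- 10    # measurement noise on every intensity

r  <- signal + background + rnorm(n, sd = noise_sd)  # red foreground
g  <- signal + background + rnorm(n, sd = noise_sd)  # green foreground
rb <- background + rnorm(n, sd = noise_sd)           # red background estimate
gb <- background + rnorm(n, sd = noise_sd)           # green background estimate

# Keep spots where both subtracted intensities are positive so the logs are defined
ok <- (r > rb) & (g > gb)

var_subtracted <- var(log2((r[ok] - rb[ok]) / (g[ok] - gb[ok])))
var_raw        <- var(log2(r[ok] / g[ok]))
c(subtracted = var_subtracted, raw = var_raw)
```

Both channels carry the same signal here, so the true log-ratio is 0 and any spread is pure noise. Subtracting a noisy background estimate from a weak signal inflates that spread considerably.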

22

23

24 References
Lixuan Qin and Kathleen Kerr. Empirical Evaluation of Methodologies for Microarray Data Analysis.
Yee Hwa Yang, Michael J Buckley and Terence P Speed. Analysis of cDNA microarray images. Briefings in Bioinformatics, 2001.
YH Yang, MJ Buckley, S Dudoit, TP Speed. Comparison of methods for image analysis on cDNA microarray data. Journal of Computational and Graphical Statistics, 2002.
C Kooperberg et al. Improved Background Correction for Spotted DNA Microarrays. Journal of Computational Biology.

25 II. Preprocessing oligonucleotide arrays
Background adjustment; normalization; summarization.
Why and how?

26 GeneChip feature-level data
Each gene is represented by a set of probes, referred to as a probeset.
More than 10, probesets.
Each probeset includes features (probes).
Exon arrays may include on average 4 features for each exon.

27 EDA-I
Specific binding: linear with concentration.
Background: additive.
$Y = B + aC$

28 Effect of background: revisited
$\dfrac{B + aC_1}{B + aC_2} \neq \dfrac{C_1}{C_2}$
$\log(B + aC_1) - \log(B + aC_2) \neq \log(C_1) - \log(C_2)$
When $a$ or $C$ is small, $\log(B + aC) \approx \log(B)$. This is the case when $C$ (concentration) is low.
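A short numerical illustration of this effect under the additive model $Y = B + aC$. The values of B, a, and the concentrations are hypothetical; the point is only that a fixed two-fold change in concentration appears smaller when the concentration is low relative to the background.

```r
B  <- 100                   # additive background (hypothetical value)
a  <- 1                     # proportionality constant (hypothetical value)
C1 <- c(1, 10, 100, 1000)   # concentrations in sample 1
C2 <- 2 * C1                # sample 2 always has twice the concentration

true_lfc     <- log2(C2 / C1)                      # always 1
observed_lfc <- log2((B + a * C2) / (B + a * C1))  # shrinks toward 0 when C is low
data.frame(C1, C2, true_lfc, observed_lfc)
```

The observed log fold change approaches the true value of 1 only once $aC$ dominates $B$.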

29 Why background correct

30 Distribution of Optical and NSB Noise

31 Can we do Y-B? Different probes have different background levels. This is the motivation for MM measurements.

32 Direct measurement strategy
Affymetrix tries to measure non-specific binding directly.
Deterministic model: $PM = O + N + S$, $MM = O + N$, so $PM - MM = S$.
PM-MM is supposed to give us a good estimate of specific binding. But it is not so simple.

33 Sometimes MM larger than PM

34 Deterministic model is wrong
Do MM measure non-specific binding?
Look at yeast DNA hybridized to a human chip.
Look at the PM, MM log-scale scatter-plot: $R^2$ is only 0.5.

35 Stochastic model (additive background / multiplicative error)
$PM = O_{PM} + N_{PM} + S$, $MM = O_{MM} + N_{MM}$
$E[PM - MM] = S$, but $\mathrm{Var}[\log(PM - MM)] \sim 1/S^2$ (can be very large).
Note: technically, $\mathrm{Var}[\log(PM - MM)]$ is not defined. We need to make sure we avoid taking logs of negative numbers. Affymetrix's current default does this.

36 Simulation
We create some feature-level data for two replicate arrays.
Then we compute Y=log(PM-kMM) for each array.
We make an MA-plot using the Ys for each array.
We make an observed concentration versus known concentration plot.
We do this for various values of k. The following movie shows k moving from 0 to 1.
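The simulation described above can be sketched roughly as follows. This is not the lecture's own simulation code: the lognormal background, the multiplicative error size, and the independence of the PM and MM backgrounds are all assumptions chosen for illustration, and `simulate_array` and `m_spread` are hypothetical helper names.

```r
set.seed(3)
n <- 2000
conc <- 2^runif(n, 0, 10)   # known concentrations (arbitrary units)

# One array: PM = background + signal with multiplicative error, MM = background only
simulate_array <- function(conc) {
  n <- length(conc)
  list(
    PM = rlnorm(n, meanlog = log(100), sdlog = 0.3) + conc * exp(rnorm(n, sd = 0.1)),
    MM = rlnorm(n, meanlog = log(100), sdlog = 0.3)
  )
}
array1 <- simulate_array(conc)
array2 <- simulate_array(conc)

# Spread of M = Y1 - Y2 between replicates for a given k, among low-concentration features
m_spread <- function(k, low = conc < 16) {
  y1 <- suppressWarnings(log2(array1$PM - k * array1$MM))  # NaN where PM - k*MM < 0
  y2 <- suppressWarnings(log2(array2$PM - k * array2$MM))
  m <- (y1 - y2)[low]
  c(k = k, sd_M = sd(m[is.finite(m)]), lost = mean(!is.finite(m)))
}
results <- t(sapply(c(0, 0.25, 0.5, 0.75, 1), m_spread))
results
```

Each row reports, for one value of k, the replicate-to-replicate spread of M among low-concentration features and the fraction of those features lost because the log was undefined. Plotting `y1` against `log2(conc)` for each k would give the observed versus known concentration view.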

37

38 Does this happen in practice? MAS 5.0

39 Introducing the Empirical Bayes concept
The distribution assumed for the unknown parameter is referred to as the prior distribution, and the parameters describing the prior distribution are hyperparameters.

40 Toy example for Bayesian estimate
Data: 1,1,1,1,0. Likelihood: $\pi^4 (1-\pi)$

41 Toy example
Data: c(rep(1,80),rep(0,20)). Likelihood: $\pi^{80} (1-\pi)^{20}$

42 As the sample size increases, the estimate moves from the prior location towards the MLE.
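For the toy example this shift can be checked numerically. With a Beta(a, b) prior and k successes in n Bernoulli trials, the posterior is Beta(a + k, b + n - k), so its mean is (a + k)/(a + b + n). The sketch below evaluates that mean for the two priors and the two data sets used in the toy example; `posterior_mean` is a helper name introduced here for illustration.

```r
# Posterior mean of pi under a Beta(a, b) prior after k successes in n trials
posterior_mean <- function(a, b, k, n) (a + k) / (a + b + n)

priors    <- list(strong = c(50, 50), weak = c(5, 5))
data_sets <- list(small = c(k = 4, n = 5), large = c(k = 80, n = 100))

post <- sapply(priors, function(p)
  sapply(data_sets, function(d) posterior_mean(p[[1]], p[[2]], d[["k"]], d[["n"]])))
post   # rows: data sets, columns: priors; prior mean is 0.5 and the MLE is 0.8 in both
```

The weak prior gives way to the data quickly, while the strong prior still holds the estimate well below the MLE even after 100 observations.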

43 R code for making the previous figures
pi0 <- seq(0, 1, .01)  # grid of values for pi; "pi" itself is a built-in constant in R
## prior distributions: beta(50,50) and beta(5,5)
plot(pi0, dbeta(pi0, 50, 50), type = "l", col = 1, lwd = 3,
     xlab = expression(pi), ylab = "density",
     main = expression(paste("prior distribution of ", pi)))
lines(pi0, dbeta(pi0, 5, 5), col = 2, lty = 2, lwd = 3)
legend(0.7, 6, legend = c("beta(50,50)", "beta(5,5)"), lty = 1:2, col = 1:2, lwd = 3)
abline(v = .5)
## posterior distributions (unnormalized: prior times likelihood)
## priors: beta(50,50) and beta(5,5); data: 4 "1"s and 1 "0"
plot(pi0, dbeta(pi0, 50, 50) * pi0^4 * (1 - pi0), type = "l", col = 1, lwd = 3,
     xlab = expression(pi), ylab = "density",
     main = expression(paste("posterior distribution of ", pi)))
lines(pi0, dbeta(pi0, 5, 5) * pi0^4 * (1 - pi0), col = 2, lty = 2, lwd = 3)
legend(0.7, 0.2, legend = c("prior", "beta(50,50)", "beta(5,5)"), lty = 0:2, col = 0:2, lwd = 3)
text(.8, .12, "data=1,1,1,1,0")
abline(v = c(.5, .8))  # prior mean and MLE

44 R code for making the previous figures (continued)
pi0 <- seq(0, 1, .01)  # same grid of values for pi as above
## posterior distributions (unnormalized: prior times likelihood)
## priors: beta(50,50) and beta(5,5); data: 80 "1"s and 20 "0"s
par(mfrow = c(2, 1))
plot(pi0, dbeta(pi0, 50, 50) * pi0^80 * (1 - pi0)^20, type = "l", col = 1, lwd = 3,
     xlab = expression(pi), ylab = "density",
     main = expression(paste("posterior distribution of ", pi)))
abline(v = c(.5, .8))  # prior mean and MLE
text(.2, 2e-26, "data=80(1) and 20(0)")
text(.2, 3e-26, "strong prior beta(50,50)")
plot(pi0, dbeta(pi0, 5, 5) * pi0^80 * (1 - pi0)^20, type = "l", col = 2, lty = 2, lwd = 3,
     xlab = expression(pi), ylab = "density")
abline(v = c(.5, .8))
text(.2, 2e-43, "data=80(1) and 20(0)")
text(.2, 3e-43, "weak prior beta(5,5)")

45 How do you set the prior?
If the hyperparameters are estimated from data, it is an Empirical Bayes method.
This only works well when you have a large amount of data, such as in genomics.
"Borrow information": data on a particular feature may not be directly affected by other features, but data on other features help in setting the prior distribution.

46 Borrow information from all other probes on the same array: Robust Multiarray Analysis (RMA)
$Y = B + S$
$B \sim \mathrm{Normal}(\mu, \sigma^2)$. All probes exchangeable.
$S \sim \mathrm{Exp}(a)$
Estimate $a$, $\mu$, $\sigma^2$ using all probes ("borrow strength across all probes").
Estimated signal: $E[S \mid Y]$
Irizarry et al. (2003)
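To see how $E[S \mid Y]$ behaves without working out the closed form, it can be evaluated by numerical integration. This sketch assumes that $a$ is the rate of the exponential distribution and uses hypothetical parameter values; `posterior_mean_signal` is a helper name introduced here, not a function from any package.

```r
# Numerical E[S | Y = y] for Y = B + S, B ~ Normal(mu, sigma^2), S ~ Exp(rate = a)
posterior_mean_signal <- function(y, mu, sigma, a) {
  upper <- max(y - mu, 0) + 10 * sigma   # beyond this the integrand is negligible
  joint <- function(s) dexp(s, rate = a) * dnorm(y - s, mean = mu, sd = sigma)
  numerator   <- integrate(function(s) s * joint(s), lower = 0, upper = upper,
                           rel.tol = 1e-8)$value
  denominator <- integrate(joint, lower = 0, upper = upper, rel.tol = 1e-8)$value
  numerator / denominator
}

mu <- 100; sigma <- 20; a <- 0.01   # hypothetical parameter values
y <- c(80, 100, 150, 300)
signal_hat <- sapply(y, posterior_mean_signal, mu = mu, sigma = sigma, a = a)
cbind(y, naive = y - mu, signal_hat)
```

Unlike the naive estimate `y - mu`, the conditional expectation is always positive, so its logarithm is always defined. For intensities far above the background it is close to `y - mu - a * sigma^2`.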

47 Does it help? MAS 5.0 RMA

48 A close-up

49 II: Borrow information from other similar probes (GCRMA)
Rationale: the identity of a probe is determined by its sequence.
Predict affinity to non-specific binding based on sequence.
Probes sharing similar affinity are expected to have similar background levels.

50 Which are the similar probes
[Figure: base-specific affinity effects for A, T, G, and C by position in the probe]
$\text{affinity} = \sum_{k} \sum_{j \in \{A, T, G, C\}} \mu_{j,k} \, 1_{b_k = j}$, where $b_k$ is the base at position $k$ and $\mu_{j,k}$ is a smooth function of $k$.

51

52 Normalization
Normalization is needed to ensure that differences in intensities are indeed due to differential expression, and not to some printing, hybridization, or scanning artifact.
Normalization is different in spotted/two-color and high-density oligonucleotide (Affy) technologies. For the latter there is no dye bias (hence no within-array normalization to remove the dye effect) and no print-tip issue.

53 Normalization
[Figure: density of log(PM)]
These are from replicate RNA. Left are empirical density estimates for 5 GeneChip arrays.

54 Non-linear dependency on intensity again. These are two technical replicates. Only the red points are differentially expressed.

55 Nonlinear normalization: proposed solutions
Force distributions (not just medians) to be the same: Amaratunga and Cabrera (2001); Bolstad et al. (2003).
Use curve estimators such as splines to adjust for the effect: Li and Wong (2001) (note: they also use a rank-invariant set); Colantuoni et al. (2002); Dudoit et al. (2002).
Use adjustments based on an additive/multiplicative model: Rocke and Durbin (2003); Huber et al. (2002); Cui et al. (2003).

56 Quantile normalization
All these non-linear methods perform similarly. Quantile normalization is easy and fast.
Basic idea: order the values in each array; take the average across arrays at each rank; substitute each probe intensity with the average for its rank; put the values back in the original order.
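The four steps can be written in a few lines of base R. This is a minimal sketch: `quantile_normalize` is a helper name introduced here, the small matrix is made up, and ties are broken by order of appearance, which is a simplification.

```r
# Rows are probes, columns are arrays
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each probe within its array
  sorted <- apply(x, 2, sort)                         # order the values in each array
  ref    <- rowMeans(sorted)                          # average across arrays at each rank
  out    <- apply(ranks, 2, function(r) ref[r])       # substitute, keeping original order
  dimnames(out) <- dimnames(x)
  out
}

x <- cbind(a = c(5, 2, 3, 4), b = c(4, 1, 6, 2), c = c(3, 4, 6, 8))
qn <- quantile_normalize(x)
qn
```

After normalization every column contains exactly the same set of values, so all arrays share one empirical distribution, while the ordering of probes within each array is unchanged.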

57 Example of quantile normalization Original Ordered Averaged Re-ordered

58 How does it work? [Figure panels: before, after] But does it force everything to be the same and remove real differences?

59 After quantile normalization. We say more later.

60 Assumption: either the majority of genes are not differentially expressed, or the differential expression is symmetric.

61 Dilution experiment
cRNA hybridized to human chip (HGU95) in a range of proportions and dilutions.
The dilution series begins at 1.25 µg cRNA per GeneChip array, and rises through 2.5, 5.0, 7.5, 10.0, to 20.0 µg per array. 5 replicate chips were used at each dilution.
A few control genes were spiked in at the same concentration across all arrays.
Notice we can only make the usual assumption within groups of 5.

62 Raw Data

63 If you can't normalize, your conclusions will likely be wrong!
[Figure labels: 1.25 µg, 2.5 µg; 1st scanner, 5th scanner]

64 Normalize with control genes Looks almost as bad!

65 We cannot median normalize: we wash out real biological effects.

66 Or quantile normalize

67 But we can quantile normalize and then use controls

68 Look at the improvement

69 Look at the improvement

70 Summarization
Robust average.
Median polish (rma, gcrma): for a matrix $X$, assumes $X_{ij} = X_0 + R_i + C_j$ and estimates row effects (probe) and column effects (sample). Tukey, J. W. (1977). Exploratory Data Analysis. R function medpolish (package: stats).
Tukey biweight (MAS 5.0): R function tukey.biweight (package: affy).
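A small illustration of median polish with `medpolish` from the stats package. The probe-by-sample matrix is made up: it is exactly additive except for one outlying cell, which makes it easy to compare the robust summary with a plain column mean. Taking the overall effect plus the column effect as the per-sample summary follows the spirit of RMA; treat that choice as an assumption of this sketch.

```r
# 5 probes x 3 samples, built as overall + probe effect + sample effect
probe_effect  <- c(-2, -1, 0, 1, 2)
sample_effect <- c(0, 1, 3)
x <- 8 + outer(probe_effect, sample_effect, "+")
x[1, 3] <- x[1, 3] + 10   # one outlying probe on the third array

fit <- medpolish(x, trace.iter = FALSE)

# Per-sample summary: overall effect plus column (sample) effect
mp_summary   <- fit$overall + fit$col
mean_summary <- colMeans(x)
rbind(median_polish = mp_summary, mean = mean_summary)
fit$residuals   # the outlier is isolated in the residual for probe 1, sample 3
```

The median polish summary recovers 8, 9, 11 for the three samples, while the column mean for the third sample is pulled up to 13 by the single outlier.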

71 Influence function
How one observation changes the statistic.
The sample mean can be sensitive to a change in any one data point: change $x_i$ to $x^*$, and the new sample mean changes by $(x^* - x_i)/n$. The influence function is a straight line.
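The straight-line behavior is easy to verify on a toy sample. The data and replacement values below are arbitrary; the median is included only as a contrast with a robust statistic.

```r
x <- c(1, 2, 3, 4, 5)
n <- length(x)
x_star <- c(5, 10, 50, 100)   # replacement values for the last observation

shift <- t(sapply(x_star, function(v) {
  x_new <- replace(x, n, v)
  c(x_star = v,
    mean_shift   = mean(x_new) - mean(x),       # equals (v - x[n]) / n, linear in v
    median_shift = median(x_new) - median(x))   # stays 0 for these replacements
}))
shift
```

The change in the mean grows without bound as the replaced value moves away, while the median does not move at all for these replacements.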

72 FYI

73 Summary
Preprocessing is a common problem for many technologies.
Always keep in mind what is expected in the data. This is useful for quality control and normalization.
Know where to look for pseudo-controls when real controls are not available.
Robustness is important.
Empirical Bayes is particularly useful in bioinformatics.

74 Exercise
For those in Biostat PhD: consider the model $Y = B + S$, where $B \sim \mathrm{Normal}(\mu, \sigma^2)$ and $S \sim \mathrm{Exp}(a)$. Suppose $\mu$, $\sigma^2$ and $a$ are known parameters. Derive the conditional expectation $E[S \mid Y]$.
The code for plotting the prior and posterior distributions is provided in the lecture notes. For each of the two priors:
Plot the posterior distribution of pi in the toy example, supposing the data are now 8 successes out of 10 trials.
Plot the posterior distribution of pi in the toy example, supposing the data are now 80 successes out of 100 trials. (For better visual effect you may have to plot the two posterior distributions separately rather than in the same figure as shown in the lecture notes.)
How would you explain to a non-statistician the role of the prior and the likelihood in Bayesian inference?


More information

Parallelized preprocessing algorithms for high-density oligonucleotide arrays

Parallelized preprocessing algorithms for high-density oligonucleotide arrays Parallelized preprocessing algorithms for high-density oligonucleotide arrays Markus Schmidberger, Ulrich Mansmann Chair of Biometrics and Bioinformatics, IBE University of Munich 81377 Munich, Germany

More information

Package spikeli. August 3, 2013

Package spikeli. August 3, 2013 Package spikeli August 3, 2013 Type Package Title Affymetrix Spike-in Langmuir Isotherm Data Analysis Tool Version 2.21.0 Date 2009-04-03 Author Delphine Baillon, Paul Leclercq ,

More information

Automated Bioinformatics Analysis System on Chip ABASOC. version 1.1

Automated Bioinformatics Analysis System on Chip ABASOC. version 1.1 Automated Bioinformatics Analysis System on Chip ABASOC version 1.1 Phillip Winston Miller, Priyam Patel, Daniel L. Johnson, PhD. University of Tennessee Health Science Center Office of Research Molecular

More information

AgiMicroRna. Pedro Lopez-Romero. April 30, 2018

AgiMicroRna. Pedro Lopez-Romero. April 30, 2018 AgiMicroRna Pedro Lopez-Romero April 30, 2018 1 Package Overview AgiMicroRna provides useful functionality for the processing, quality assessment and differential expression analysis of Agilent microrna

More information

How do microarrays work

How do microarrays work Lecture 3 (continued) Alvis Brazma European Bioinformatics Institute How do microarrays work condition mrna cdna hybridise to microarray condition Sample RNA extract labelled acid acid acid nucleic acid

More information

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency Math 1 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency lowest value + highest value midrange The word average: is very ambiguous and can actually refer to the mean,

More information

Package LMGene. R topics documented: December 23, Version Date

Package LMGene. R topics documented: December 23, Version Date Version 2.38.0 Date 2013-07-24 Package LMGene December 23, 2018 Title LMGene Software for Data Transformation and Identification of Differentially Expressed Genes in Gene Expression Arrays Author David

More information

EECS 730 Introduction to Bioinformatics Microarray. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Microarray. Luke Huan Electrical Engineering and Computer Science EECS 730 Introduction to Bioinformatics Microarray Luke Huan Electrical Engineering and Computer Science http://people.eecs.ku.edu/~jhuan/ GeneChip 2011/11/29 EECS 730 2 Hybridization to the Chip 2011/11/29

More information

aroma.affymetrix: A generic framework in R for analyzing small to very large Affymetrix data sets in bounded memory

aroma.affymetrix: A generic framework in R for analyzing small to very large Affymetrix data sets in bounded memory aroma.affymetrix: A generic framework in R for analyzing small to very large Affymetrix data sets in bounded memory Henrik Bengtsson, Ken Simpson, James Bullard, Kasper Hansen February 18, 2008 Abstract

More information

DNA microarrays [1] are used to measure the expression

DNA microarrays [1] are used to measure the expression IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 24, NO. 7, JULY 2005 901 Mixture Model Analysis of DNA Microarray Images K. Blekas*, Member, IEEE, N. P. Galatsanos, Senior Member, IEEE, A. Likas, Senior Member,

More information

CLUSTERING IN BIOINFORMATICS

CLUSTERING IN BIOINFORMATICS CLUSTERING IN BIOINFORMATICS CSE/BIMM/BENG 8 MAY 4, 0 OVERVIEW Define the clustering problem Motivation: gene expression and microarrays Types of clustering Clustering algorithms Other applications of

More information

Clustering Techniques

Clustering Techniques Clustering Techniques Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 16 Lopresti Fall 2007 Lecture 16-1 - Administrative notes Your final project / paper proposal is due on Friday,

More information

Package sscore. R topics documented: June 27, Version Date

Package sscore. R topics documented: June 27, Version Date Version 1.52.0 Date 2009-04-11 Package sscore June 27, 2018 Title S-Score Algorithm for Affymetrix Oligonucleotide Microarrays Author Richard Kennedy , based on C++ code from

More information

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Florian Hahne, Wolfgang Huber. June 17, 2005

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Florian Hahne, Wolfgang Huber. June 17, 2005 Exploring cdna Data Achim Tresch, Andreas Buness, Tim Beißbarth, Florian Hahne, Wolfgang Huber June 7, 00 The following exercise will guide you through the first steps of a spotted cdna microarray analysis.

More information

QAnalyzer 1.0c. User s Manual. Features

QAnalyzer 1.0c. User s Manual. Features QAnalyzer 1.0c User s Manual Packard BioScience QuantArray(QA) microarray analysis software is a powerful microarray analysis software that enables researchers to easily and accurately visualize and quantitate

More information

Expander Online Documentation

Expander Online Documentation Expander Online Documentation Table of Contents Introduction...1 Starting EXPANDER...2 Input Data...4 Preprocessing GE Data...8 Viewing Data Plots...12 Clustering GE Data...14 Biclustering GE Data...17

More information

Exploring cdna Data. Achim Tresch, Andreas Buness, Wolfgang Huber, Tim Beißbarth

Exploring cdna Data. Achim Tresch, Andreas Buness, Wolfgang Huber, Tim Beißbarth Exploring cdna Data Achim Tresch, Andreas Buness, Wolfgang Huber, Tim Beißbarth Practical DNA Microarray Analysis http://compdiag.molgen.mpg.de/ngfn/pma0nov.shtml The following exercise will guide you

More information

Computer Exercise - Microarray Analysis using Bioconductor

Computer Exercise - Microarray Analysis using Bioconductor Computer Exercise - Microarray Analysis using Bioconductor Introduction The SWIRL dataset The SWIRL dataset comes from an experiment using zebrafish to study early development in vertebrates. SWIRL is

More information

Anaquin - Vignette Ted Wong January 05, 2019

Anaquin - Vignette Ted Wong January 05, 2019 Anaquin - Vignette Ted Wong (t.wong@garvan.org.au) January 5, 219 Citation [1] Representing genetic variation with synthetic DNA standards. Nature Methods, 217 [2] Spliced synthetic genes as internal controls

More information

Package frma. R topics documented: March 8, Version Date Title Frozen RMA and Barcode

Package frma. R topics documented: March 8, Version Date Title Frozen RMA and Barcode Version 1.34.0 Date 2017-11-08 Title Frozen RMA and Barcode Package frma March 8, 2019 Preprocessing and analysis for single microarrays and microarray batches. Author Matthew N. McCall ,

More information

Package ffpe. October 1, 2018

Package ffpe. October 1, 2018 Type Package Package ffpe October 1, 2018 Title Quality assessment and control for FFPE microarray expression data Version 1.24.0 Author Levi Waldron Maintainer Levi Waldron

More information

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber Exploring cdna Data Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber Practical DNA Microarray Analysis http://compdiag.molgen.mpg.de/ngfn/pma0nov.shtml The following exercise will guide you

More information

Computer Vision I - Basics of Image Processing Part 1

Computer Vision I - Basics of Image Processing Part 1 Computer Vision I - Basics of Image Processing Part 1 Carsten Rother 28/10/2014 Computer Vision I: Basics of Image Processing Link to lectures Computer Vision I: Basics of Image Processing 28/10/2014 2

More information

Schedule for Rest of Semester

Schedule for Rest of Semester Schedule for Rest of Semester Date Lecture Topic 11/20 24 Texture 11/27 25 Review of Statistics & Linear Algebra, Eigenvectors 11/29 26 Eigenvector expansions, Pattern Recognition 12/4 27 Cameras & calibration

More information

Correction for pixel censoring in cdna microarrays

Correction for pixel censoring in cdna microarrays Correction for pixel censoring in cdna microarrays Chris Glasbey 1 and Mizanur Khondoker 1 1 Biomathematics and Statistics Scotland, King s Buildings, Edinburgh EH9 3JZ, UK Abstract: cdna microarrays are

More information

Stat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution

Stat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution Stat 528 (Autumn 2008) Density Curves and the Normal Distribution Reading: Section 1.3 Density curves An example: GRE scores Measures of center and spread The normal distribution Features of the normal

More information

2 varying degree of distortion, or loss. The degree of loss can be chosen by the user, or be based on parameters such as local image intensities. We c

2 varying degree of distortion, or loss. The degree of loss can be chosen by the user, or be based on parameters such as local image intensities. We c On the Bitplane Compression of Microarray Images. Rebecka Jornsten, Yehuda Vardi, Cun-Hui Zhang. Abstract. The microarray image technology is a new and powerful tool for studying the expression of thousands

More information

Package yaqcaffy. January 19, 2019

Package yaqcaffy. January 19, 2019 Package yaqcaffy January 19, 2019 Title Affymetrix expression data quality control and reproducibility analysis Version 1.42.0 Author Quality control of Affymetrix GeneChip expression data and reproducibility

More information

Pattern recognition (4)

Pattern recognition (4) Pattern recognition (4) 1 Things we have discussed until now Statistical pattern recognition Building simple classifiers Supervised classification Minimum distance classifier Bayesian classifier (1D and

More information

CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA. By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr.

CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA. By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr. CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr. Michael Nechyba 1. Abstract The objective of this project is to apply well known

More information

Statistical Methods for Preprocessing Microarray Gene Expression Data. Md. Mizanur Rahman Khondoker

Statistical Methods for Preprocessing Microarray Gene Expression Data. Md. Mizanur Rahman Khondoker Statistical Methods for Preprocessing Microarray Gene Expression Data Md. Mizanur Rahman Khondoker Doctor of Philosophy University of Edinburgh 2006 Abstract DNA microarrays facilitate the simultaneous

More information

RJaCGH, a package for analysis of

RJaCGH, a package for analysis of RJaCGH, a package for analysis of CGH arrays with Reversible Jump MCMC 1. CGH Arrays: Biological problem: Changes in number of DNA copies are associated to cancer activity. Microarray technology: Oscar

More information

Stat 602X Exam 2 Spring 2011

Stat 602X Exam 2 Spring 2011 Stat 60X Exam Spring 0 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed . Below is a small p classification training set (for classes) displayed in

More information

Expander 7.2 Online Documentation

Expander 7.2 Online Documentation Expander 7.2 Online Documentation Introduction... 2 Starting EXPANDER... 2 Input Data... 3 Tabular Data File... 4 CEL Files... 6 Working on similarity data no associated expression data... 9 Working on

More information

Package AffyExpress. October 3, 2013

Package AffyExpress. October 3, 2013 Version 1.26.0 Date 2009-07-22 Package AffyExpress October 3, 2013 Title Affymetrix Quality Assessment and Analysis Tool Author Maintainer Xuejun Arthur Li Depends R (>= 2.10), affy (>=

More information

Chapter 2: The Normal Distribution

Chapter 2: The Normal Distribution Chapter 2: The Normal Distribution 2.1 Density Curves and the Normal Distributions 2.2 Standard Normal Calculations 1 2 Histogram for Strength of Yarn Bobbins 15.60 16.10 16.60 17.10 17.60 18.10 18.60

More information

Chapter 2 Modeling Distributions of Data

Chapter 2 Modeling Distributions of Data Chapter 2 Modeling Distributions of Data Section 2.1 Describing Location in a Distribution Describing Location in a Distribution Learning Objectives After this section, you should be able to: FIND and

More information

Louis Fourrier Fabien Gaie Thomas Rolf

Louis Fourrier Fabien Gaie Thomas Rolf CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted

More information

Automatic Techniques for Gridding cdna Microarray Images

Automatic Techniques for Gridding cdna Microarray Images Automatic Techniques for Gridding cda Microarray Images aima Kaabouch, Member, IEEE, and Hamid Shahbazkia Department of Electrical Engineering, University of orth Dakota Grand Forks, D 58202-765 2 University

More information

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data. Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss

More information

Smoothing Dissimilarities for Cluster Analysis: Binary Data and Functional Data

Smoothing Dissimilarities for Cluster Analysis: Binary Data and Functional Data Smoothing Dissimilarities for Cluster Analysis: Binary Data and unctional Data David B. University of South Carolina Department of Statistics Joint work with Zhimin Chen University of South Carolina Current

More information

Bioconductor exercises 1. Exploring cdna data. June Wolfgang Huber and Andreas Buness

Bioconductor exercises 1. Exploring cdna data. June Wolfgang Huber and Andreas Buness Bioconductor exercises Exploring cdna data June 2004 Wolfgang Huber and Andreas Buness The following exercise will show you some possibilities to load data from spotted cdna microarrays into R, and to

More information

A Cloud Computing System to Quickly Implement New Microarray Data Pre-processing Methods

A Cloud Computing System to Quickly Implement New Microarray Data Pre-processing Methods The Open Bioinformatics Journal, 2012, 6, 37-42 37 Open Access A Cloud Computing System to Quickly Implement New Microarray Data Pre-processing Methods Dajie Luo 1,#, Prithish Banerjee 1,#, E. James Harner

More information

Using oligonucleotide microarray reporter sequence information for preprocessing and quality assessment

Using oligonucleotide microarray reporter sequence information for preprocessing and quality assessment Using oligonucleotide microarray reporter sequence information for preprocessing and quality assessment Wolfgang Huber and Robert Gentleman April 30, 2018 Contents 1 Overview 1 2 Using probe packages 1

More information

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea Chapter 3 Bootstrap 3.1 Introduction The estimation of parameters in probability distributions is a basic problem in statistics that one tends to encounter already during the very first course on the subject.

More information