Preprocessing -- examples in microarrays
|
|
- Morgan Craig
- 5 years ago
- Views:
Transcription
1 Preprocessing -- examples in microarrays
2 I: cdna arrays Image processing Addressing (gridding) Segmentation (classify a pixel as foreground or background) Intensity extraction (summary statistic) Normalization Plate to plate By grid By print tip
3 FYI: Segmentation
4 FYI: Information extracting Calculating and summarizing background and foreground intensity for each channel Foreground Mean, median, or other quantile of spot area Local Background Often median intensity from background pixels All pixels within a square, but not in spot Area between two concentric circles Valleys - furthest pixels from real spots
5 FYI: Background adjusting
6 Data table header you might see in online data Data table header descriptions ID_REF VALUE CH1_Median CH1_MEAN CH1_SD CH1_BG_Median CH1_BG_Mean CH1_BG_SD CH2_Median CH2_MEAN CH2_SD CH2_BG_Median CH2_BG_Mean CH2_BG_SD AREA AREA_BG FLAGS Normalized log2 ratio of background subtracted medians defined by CH2/CH1 (LPS over Mock) CH1 (F635) median fluorescence intensity CH1 (F635) mean fluorescence intensity CH1 (F635) standard deviation of fluorescence intenisty CH1 (F635) background median fluorescence intensity CH1 (F635) background mean fluorescence intensity CH1 (F635) background standard deviation of fluorescence intensity CH2 (F532) median fluorescence intensity CH2 (F532) mean fluorescence intensity CH2 (F532) standard deviation of fluorescence intenisty CH2 (F532) background median fluorescence intensity CH2 (F532) background mean fluorescence intensity CH2 (F532) background standard deviation of fluorescence intensity Number of feature pixels Number of feature background pixels 0 found feature, -50 not found feature, -75 absent feature, -100 manually flagged bad feature.
7 acc=gsm25330&id=8662&db=geodb_blob02blob02
8 Analysis of cdna chip Data From each array, there are four vectors of length J
9
10 Normalization Identify and remove the effects of systematic variation in the measured fluorescence intensities, other than differential expression, for example different labelling lli efficiencies i i of the dyes; different scanning parameters; print-tip, spatial, or plate effects, etc. The need for normalization can be seen most clearly in self-self hybridizations, where the same mrna sample is labeled with the Cy3 and Cy5 dyes. Q: In sequencing, how do we check if there is need for normalization?
11 Intensity log-ratio, M Boxplots by print-tip-group tip
12 Self-self hybridization There appears to be a non-linear dependence on intensity
13 Self-self hybridization Two-color platform
14 MA-plot by print-tip-group The smooth curves were created using loess. Splines work just as well
15 Nonlinear Normalization normalized log 2 R/G log 2 R/G L(intensity, sector, ) L depends on a number of predictor variables, such as spot intensity A, sector, plate origin. Intensity-dependent normalization. Intensity and sector-dependent normalization. 2D spatial normalization. Other variables: time of printing, plate, etc. Composite normalization. Weighted average of several normalization functions.
16 2D images of L values Global l median normalization Global loess normalization Within-print-tipgroup loess normalization 2D spatial normalization
17 2D images of normalized M-L Global l median normalization Global loess normalization Within-print-tipgroup loess normalization 2D spatial normalization
18 Boxplots of normalized M-L Global median normalization Global loess normalization Within-print-tip- p group loess normalization 2D spatial normalization
19 MA-plots of normalized M-L Global l median normalization Global loess normalization Within-print-tipgroup loess normalization 2D spatial normalization
20 Assumptions we made M does not depend on spot location M does not depend on A E[M]=0 for most genes, or symmetric differential expression What if we expect most genes are up and/or p g p down regulated? (to be continued)
21 Background subtraction: yes, no, when? Motivation: measured foreground intensity includes contribution other than target hybridization Yes: to remove bias. B x x B y y No: to avoid variance var[log{(r-rb)/(g-gb)}] >> var[log(r/g)] for many background measurements. What s the trade-off?
22
23
24 References Empirical Evaluation of Methodologies for Microarray Data Analysis Lixuan Qinand Kathleen Kerr Analysis of cdna microarray images Yee Hwa Yang, Michael J Burckley and Terence P Speed Briefings in Bioinformatics, 2001 Comparison of methods for image analysis on cdna microarray data YH Yang, MJ Buckley, S Dudoit, TP Speed - Journal of Computational and Graphical Statistics, 2002 Improved Background Correction for Spotted DNA Microarrays C KOOPERBERG et al, Journal of Computational Biology
25 II. Preprocessing Oligonucleotide arrays Background adjustment Normalization Summarization Why and how?
26 GeneChip Feature Level Data Each gene represented by a set of probes, referred to as probesets More than 10, probesets Each probeset includes features (probes) Exon arrays may include on average 4 features for each exon
27 EDA-I Specific Binding: Linear with concentration Background Additive Y=B+aC
28 Effect of Background: revisited B + ac B ac 1 C 1 B + ac 1 1 B +ac 2 B ac 2 C 2 B + ac 2 log(b ac 1 ) log(b ac 2 ) log(c 1 ) log(c 2 ) When a or C is small log(b ac) log(b) When C (concentration) is low
29 Why background correct
30 Distribution of Optical and NSB Noise
31 Can we do Y-B? Different probes have different background level Motivation for MM measurements
32 Direct Measurement Strategy Affymetrix s tries to measure non-specific binding directly Deterministic Model PM = O + N + S MM = O + N PM MM = S PM-MM is supposed to give us a good estimate of specific binding But it is not so simple
33 Sometimes MM larger than PM
34 Deterministic model is wrong Do MM measure non- specific binding? Look at Yeast DNA hybridized to Human Chip Look at PM, MM logscale scatter-plot R 2 is only 0.5
35 Stochastic Model (Additive background/multiplicative error) PM = O PM + N PM + S, MM = O MM + N MM E[ PM MM ] = S, but Var[ log( PM MM ) ] ~ 1/S 2 (can be very large) Note: Technically, var[ log(pm-mm) ] is not defined. We need to make sure we avoid taking logs of negative. Affymetrix s current default does this.
36 Simulation We create some feature level data for two replicate arrays Then compute Y=log(PM-kMM) for each array We make an MA using the Ys for each array We make an observed concentration versus known concentration plot We do this for various values of k. The following movie shows k moving from 0 to 1.
37
38 Does this happen in practice? MAS 5.0
39 Introducing the Empirical Bayes Concept Is referred to as the prior distribution and the parameters describing the prior distribution ib ti are hyperparameters.
40 Toy example for Bayesian estimate Data:11110: 1,1,1,1,0: likelihood:
41 Toy example Data: c(rep(1,80),rep(0,20)):, )) likelihood: 80 (1- ) 20
42 When sample size increases the estimates moves from prior location towards MLE
43 R code for making the previous figures pi0=seq(0,1,.01) ### note "pi" is a reserved name for the pi= ################## prior distribution beta(50,50) and beta(5,5) ########## plot(pi0,dbeta(pi0,50,50),type="l",col=1,lwd=3, xlab=expression(pi),ylab="density", main=expression(paste("prior distribution of ",pi))) lines(pi0,dbeta(pi0,5,5),col=2,lty=2,lwd=3) legend(0.7,6,legend=c("beta(50,50)","beta(5,5)"),lty=1:2,col=1:2,lwd=3) abline(v=.5) ################## posterior distribution ########## ################## prior: beta(50 50) and beta(5 5) ########## ################## prior: beta(50,50) and beta(5,5) ########## ################## data: 4 "1"s and 1 "0" ########## plot(pi0,dbeta(pi0,50,50)*pi0^4*(1-pi0),type="l",col=1,lwd=3, xlab=expression(pi),ylab="density", main=expression(paste("posterior distribution of ",pi))) lines(pi0,dbeta(pi0,5,5)*pi0^4*(1-pi0),,col=2,lty=2,lwd=3) legend(0.7,0.2,legend=c("prior","beta(50,50)","beta(5,5)"),lty=0:2,col=0:2,lwd=3) text(.8,.12,"data=1,1,1,1,0") abline(v=c(.5,.8))
44 ################## posterior distribution ib ti ########## ################## prior: beta(50,50) and beta(5,5) ########## ################## data: 80 "1"s and 20 "0" par(mfrow=c(2,1)) plot(pi0,dbeta(pi0,50,50)*pi0^80*(1-pi0)^20,type="l",col=1,lwd=3,,, p ( p ),,,, xlab=expression(pi),ylab="density", main=expression(paste("posterior distribution of ",pi))) abline(v=c(.5,.8)) text(.2,2e-26,"data=80(1) and 20(0)") text(.2,3e-26,"strong t( 26 t prior beta(50,50)") 50)") plot(pi0,dbeta(pi0,5,5)*pi0^80*(1-pi0)^20,,col=2,lty=2,lwd=3) legend(0.7,0.2,legend=c("prior","beta(50,50)","beta(5,5)"),lty=0:2,col=0:2,lwd=3) text(.8,.12,"data=80(1) and 20(0)") text(.8,.12, data 80(1) and 20(0) ) abline(v=c(.5,.8)) text(.2,2e-43,"data=80(1) and 20(0)") text(.2,3e-43,"week prior beta(5,5)")
45 How do you set prior? If the hyper parameters are estimated from data, it is an Empirical Bayes method. This only works well when you have a large amount of data, such as in genomics borrow information : data on a particular feature may not be directly affected by other features, but data on other features help setting the prior distribution
46 Borrow information from all other probes on the same array Robust Multiarray Analysis (RMA) Y=B+S B ~ Normal (, 2 ) All probes exchangeable S ~ Exp(a) Estimate a,, 2 using all probes ( borrow strength across all probes ) Estimated Signal: E [ S Y ] Irizarry et al (2003)
47 Does it help? MAS 5.0 RMA
48 Aclose-up
49 II: Borrow information from other Rational: similar probes (GCRMA) identity of probes are determined by their sequences. predict affinity to non-specific binding based on sequence probes sharing similar affinity are expected to have similar background level
50 Which are the similar probes 0.6 C C C C C C C C C C C C 0.4 C C C C C C 0.2 C C C C C 0.0 GC A G G G G G G G T G C T G T T G T T T T T T T GT GT GT GT T T T T T T T T G G G G A G G G G A A A A A A A G A T G A T A A A A A A A A A A A A A A j, k1 1 k 1 j { A, T, G, C} b k j µ j,k ~ smooth function of k
51
52 Normalization Normalization is needed to ensure that differences in intensities are indeed due to differential expression, and not some printing, hybridization, or scanning artifact. Normalization is different in spotted/two-color and highdensity-oligonucleotides (Affy) technologies No dye bias (hence no within-array normalization to remove dye effect) No print-tip issue
53 Normalization density log (PM) These are from replicate RNA Left are empirical density These are from replicate RNA. Left are empirical density estimates for 5 GeneChip arrays.
54 Non-linear dependency on intensity again These are two technical replicates. Only the red are differentially expressed
55 Nonlinear normalization Proposed solutions Force distributions (not just medians) to be the same: Amaratunga and Cabrera (2001) Bolstad et al. (2003) Use curve estimators such as splines to adjust for the effect: Li and Wong (2001) Note: they also use a rank invariant set Colantuoni et al (2002) Dudoit et al (2002) Use adjustments based on additive/multiplicative model: Rocke and Durbin (2003) Huber et al (2002) Cui et al (2003)
56 Quantile normalization All these non-linear methods perform similarly Quantile normalization is easy and fast Basic idea: order value in each array take average across probes Substitute probe intensity with average Put in original order
57 Example of quantile normalization Original Ordered Averaged Re-ordered
58 How does it work Before After But does it force everything to be the same and remove real differences?
59 After quantile normalization We say more later..
60 Assumption Either the majority of genes are not differentially expressed Or the differentially expression is symmetric
61 Dilution Experiment crna hybridized to human chip (HGU95) in range of proportions and dilutions Dilution series begins at 1.25 g crna per GeneChip array, and rises through 2.5, 5.0, 7.5, 10.0, to 20.0 g per array. 5 replicate chips were used at each dilution A few control genes were spiked in at same concentration across all arrays. Notice we can only make usual assumption within groups of 5
62 Raw Data
63 If you can t normalize your conclusions will likely l be wrong! 1.25 g, g 2.5 g,1g st 5 th scanner scanner
64 Normalize with control genes Looks almost as bad!
65 We can not median normalize We wash out real biological effects
66 Or quantile normalize
67 But we can quantile normalize and then use controls
68 Look at the improvement
69 Look at the improvement
70 Summarization Robust average Median polish (rma, gcrma) For a matrix X, assumes X ij =X 0 +R i +C j Estimates row effects (probe) and column effects (sample). Tukey, J. W. (1977). Exploratory Data Analysis R function medpolish (package: stat) Tukey biweight (MAS 5.0) R function tukey.biweight (package: affy)
71 Influence function How one observation changes the statistic Sample Mean can be sensitive to the change of any one data point Change x i to x *, the new sample mean changes by (x * /n-x i /n) The influence function is a straight line
72 FYI
73 summary Preprocessing is a common problem for many technologies Always keep in mind what is expected in the data. This is useful for quality control and normalization Where to look for pseudo-control when real controls are not available Robustness is important Empirical Bayes is particularly useful in bioinformatics
74 Exercise For those in Biostat PhD: Consider the model Y=B+S, where B ~ Normal (, 2 ) and S~ Exp(a). Suppose, 2 and a are known parameters. Derive the conditional expectation E[S Y] The code for plotting the prior and posterior distributions is provided in the lecture notes. For each of the two priors, Plot the posterior distributions of pi in the toy example, suppose the data is now 8 success out of 10 trials. Plot the posterior distributions of pi in the toy example, suppose the data is now 80 success out of 100 trials. (For better visual effect you may have to plot the two posterior distributions separately rather than in the same figure as shown in the lecture notes.) how would you explain to a non-statistician the role of prior and likelihood in Bayesian inference?
/ Computational Genomics. Normalization
10-810 /02-710 Computational Genomics Normalization Genes and Gene Expression Technology Display of Expression Information Yeast cell cycle expression Experiments (over time) baseline expression program
More informationCourse on Microarray Gene Expression Analysis
Course on Microarray Gene Expression Analysis ::: Normalization methods and data preprocessing Madrid, April 27th, 2011. Gonzalo Gómez ggomez@cnio.es Bioinformatics Unit CNIO ::: Introduction. The probe-level
More informationMicroarray Data Analysis (VI) Preprocessing (ii): High-density Oligonucleotide Arrays
Microarray Data Analysis (VI) Preprocessing (ii): High-density Oligonucleotide Arrays High-density Oligonucleotide Array High-density Oligonucleotide Array PM (Perfect Match): The perfect match probe has
More informationMicroarray Data Analysis (V) Preprocessing (i): two-color spotted arrays
Microarray Data Analysis (V) Preprocessing (i): two-color spotted arrays Preprocessing Probe-level data: the intensities read for each of the components. Genomic-level data: the measures being used in
More informationUsing FARMS for summarization Using I/NI-calls for gene filtering. Djork-Arné Clevert. Institute of Bioinformatics, Johannes Kepler University Linz
Software Manual Institute of Bioinformatics, Johannes Kepler University Linz Using FARMS for summarization Using I/NI-calls for gene filtering Djork-Arné Clevert Institute of Bioinformatics, Johannes Kepler
More informationGene Expression an Overview of Problems & Solutions: 1&2. Utah State University Bioinformatics: Problems and Solutions Summer 2006
Gene Expression an Overview of Problems & Solutions: 1&2 Utah State University Bioinformatics: Problems and Solutions Summer 2006 Review DNA mrna Proteins action! mrna transcript abundance ~ expression
More informationNormalization: Bioconductor s marray package
Normalization: Bioconductor s marray package Yee Hwa Yang 1 and Sandrine Dudoit 2 October 30, 2017 1. Department of edicine, University of California, San Francisco, jean@biostat.berkeley.edu 2. Division
More informationDescription of gcrma package
Description of gcrma package Zhijin(Jean) Wu, Rafael Irizarry October 30, 2018 Contents 1 Introduction 1 2 What s new in version 2.0.0 3 3 Options for gcrma 3 4 Getting started: the simplest example 4
More informationIntroduction to GE Microarray data analysis Practical Course MolBio 2012
Introduction to GE Microarray data analysis Practical Course MolBio 2012 Claudia Pommerenke Nov-2012 Transkriptomanalyselabor TAL Microarray and Deep Sequencing Core Facility Göttingen University Medical
More informationAnalysis of (cdna) Microarray Data: Part I. Sources of Bias and Normalisation
Analysis of (cdna) Microarray Data: Part I. Sources of Bias and Normalisation MICROARRAY ANALYSIS My (Educated?) View 1. Data included in GEXEX a. Whole data stored and securely available b. GP3xCLI on
More informationBIOINFORMATICS. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias
BIOINFORMATICS Vol. 19 no. 2 2003 Pages 185 193 A comparison of normalization methods for high density oligonucleotide array data based on variance and bias B. M. Bolstad 1,,R.A.Irizarry 2,M.Åstrand 3
More informationMATH3880 Introduction to Statistics and DNA MATH5880 Statistics and DNA Practical Session Monday, 16 November pm BRAGG Cluster
MATH3880 Introduction to Statistics and DNA MATH5880 Statistics and DNA Practical Session Monday, 6 November 2009 3.00 pm BRAGG Cluster This document contains the tasks need to be done and completed by
More informationCARMAweb users guide version Johannes Rainer
CARMAweb users guide version 1.0.8 Johannes Rainer July 4, 2006 Contents 1 Introduction 1 2 Preprocessing 5 2.1 Preprocessing of Affymetrix GeneChip data............................. 5 2.2 Preprocessing
More informationOrganizing, cleaning, and normalizing (smoothing) cdna microarray data
Organizing, cleaning, and normalizing (smoothing) cdna microarray data All product names are given as examples only and they are not endorsed by the USDA or the University of Illinois. INTRODUCTION The
More informationFrozen Robust Multi-Array Analysis and the Gene Expression Barcode
Frozen Robust Multi-Array Analysis and the Gene Expression Barcode Matthew N. McCall November 9, 2017 Contents 1 Frozen Robust Multiarray Analysis (frma) 2 1.1 From CEL files to expression estimates...................
More informationBioconductor s stepnorm package
Bioconductor s stepnorm package Yuanyuan Xiao 1 and Yee Hwa Yang 2 October 18, 2004 Departments of 1 Biopharmaceutical Sciences and 2 edicine University of California, San Francisco yxiao@itsa.ucsf.edu
More informationPROCEDURE HELP PREPARED BY RYAN MURPHY
Module on Microarray Statistics for Biochemistry: Metabolomics & Regulation Part 2: Normalization of Microarray Data By Johanna Hardin and Laura Hoopes Instructions and worksheet to be handed in NAME Lecture/Discussion
More informationGene signature selection to predict survival benefits from adjuvant chemotherapy in NSCLC patients
1 Gene signature selection to predict survival benefits from adjuvant chemotherapy in NSCLC patients 1,2 Keyue Ding, Ph.D. Nov. 8, 2014 1 NCIC Clinical Trials Group, Kingston, Ontario, Canada 2 Dept. Public
More informationMiChip. Jonathon Blake. October 30, Introduction 1. 5 Plotting Functions 3. 6 Normalization 3. 7 Writing Output Files 3
MiChip Jonathon Blake October 30, 2018 Contents 1 Introduction 1 2 Reading the Hybridization Files 1 3 Removing Unwanted Rows and Correcting for Flags 2 4 Summarizing Intensities 3 5 Plotting Functions
More informationaffypara: Parallelized preprocessing algorithms for high-density oligonucleotide array data
affypara: Parallelized preprocessing algorithms for high-density oligonucleotide array data Markus Schmidberger Ulrich Mansmann IBE August 12-14, Technische Universität Dortmund, Germany Preprocessing
More informationApplying Data-Driven Normalization Strategies for qpcr Data Using Bioconductor
Applying Data-Driven Normalization Strategies for qpcr Data Using Bioconductor Jessica Mar April 30, 2018 1 Introduction High-throughput real-time quantitative reverse transcriptase polymerase chain reaction
More informationaffyqcreport: A Package to Generate QC Reports for Affymetrix Array Data
affyqcreport: A Package to Generate QC Reports for Affymetrix Array Data Craig Parman and Conrad Halling April 30, 2018 Contents 1 Introduction 1 2 Getting Started 2 3 Figure Details 3 3.1 Report page
More informationIntroduction to the xps Package: Comparison to Affymetrix Power Tools (APT)
Introduction to the xps Package: Comparison to Affymetrix Power Tools (APT) Christian Stratowa April, 2010 Contents 1 Introduction 1 2 Comparison of results for the Human Genome U133 Plus 2.0 Array 2 2.1
More informationPreprocessing Affymetrix Data. Educational Materials 2005 R. Irizarry and R. Gentleman Modified 21 November, 2009, M. Morgan
Preprocessing Affymetrix Data Educational Materials 2005 R. Irizarry and R. Gentleman Modified 21 November, 2009, M. Morgan 1 Input, Quality Assessment, and Pre-processing Fast Track Identify CEL files.
More informationMethodology for spot quality evaluation
Methodology for spot quality evaluation Semi-automatic pipeline in MAIA The general workflow of the semi-automatic pipeline analysis in MAIA is shown in Figure 1A, Manuscript. In Block 1 raw data, i.e..tif
More informationPackage OLIN. September 30, 2018
Version 1.58.0 Date 2016-02-19 Package OLIN September 30, 2018 Title Optimized local intensity-dependent normalisation of two-color microarrays Author Matthias Futschik Maintainer Matthias
More informationHow to use the DEGseq Package
How to use the DEGseq Package Likun Wang 1,2 and Xi Wang 1. October 30, 2018 1 MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST /Department of Automation, Tsinghua University. 2
More informationCodelink Legacy: the old Codelink class
Codelink Legacy: the old Codelink class Diego Diez October 30, 2018 1 Introduction Codelink is a platform for the analysis of gene expression on biological samples property of Applied Microarrays, Inc.
More informationPackage stepnorm. R topics documented: April 10, Version Date
Version 1.38.0 Date 2008-10-08 Package stepnorm April 10, 2015 Title Stepwise normalization functions for cdna microarrays Author Yuanyuan Xiao , Yee Hwa (Jean) Yang
More informationMicro-array Image Analysis using Clustering Methods
Micro-array Image Analysis using Clustering Methods Mrs Rekha A Kulkarni PICT PUNE kulkarni_rekha@hotmail.com Abstract Micro-array imaging is an emerging technology and several experimental procedures
More informationBayesian Analysis of Differential Gene Expression
Bayesian Analysis of Differential Gene Expression Biostat Journal Club Chuan Zhou chuan.zhou@vanderbilt.edu Department of Biostatistics Vanderbilt University Bayesian Modeling p. 1/1 Lewin et al., 2006
More informationFMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu
FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)
More informationMICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS
Mathematical and Computational Applications, Vol. 5, No. 2, pp. 240-247, 200. Association for Scientific Research MICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS Volkan Uslan and Đhsan Ömür Bucak
More informationPackage affyplm. January 6, 2019
Version 1.58.0 Title Methods for fitting probe-level models Author Ben Bolstad Maintainer Ben Bolstad Package affyplm January 6, 2019 Depends R (>= 2.6.0), BiocGenerics
More informationNature Publishing Group
Figure S I II III 6 7 8 IV ratio ssdna (S/G) WT hr hr hr 6 7 8 9 V 6 6 7 7 8 8 9 9 VII 6 7 8 9 X VI XI VIII IX ratio ssdna (S/G) rad hr hr hr 6 7 Chromosome Coordinate (kb) 6 6 Nature Publishing Group
More informationTowards an Optimized Illumina Microarray Data Analysis Pipeline
Towards an Optimized Illumina Microarray Data Analysis Pipeline Pan Du, Simon Lin Robert H. Lurie Comprehensive Cancer Center, Northwestern University August 06, 2007 Outline Introduction of Illumina Beadarray
More informationA short reference to FSPMA definition files
A short reference to FSPMA definition files P. Sykacek Department of Genetics & Department of Pathology University of Cambridge peter@sykacek.net June 22, 2005 Abstract This report provides a brief reference
More informationEBSeqHMM: An R package for identifying gene-expression changes in ordered RNA-seq experiments
EBSeqHMM: An R package for identifying gene-expression changes in ordered RNA-seq experiments Ning Leng and Christina Kendziorski April 30, 2018 Contents 1 Introduction 1 2 The model 2 2.1 EBSeqHMM model..........................................
More informationAffymetrix GeneChip DNA Analysis Software
Affymetrix GeneChip DNA Analysis Software User s Guide Version 3.0 For Research Use Only. Not for use in diagnostic procedures. P/N 701454 Rev. 3 Trademarks Affymetrix, GeneChip, EASI,,,, HuSNP, GenFlex,
More informationExploratory data analysis for microarrays
Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA
More informationGiri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748
CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 3/3/08 CAP5510 1 Gene g Probe 1 Probe 2 Probe N 3/3/08 CAP5510
More informationIntroduction to the Bioconductor marray package : Input component
Introduction to the Bioconductor marray package : Input component Yee Hwa Yang 1 and Sandrine Dudoit 2 April 30, 2018 Contents 1. Department of Medicine, University of California, San Francisco, jean@biostat.berkeley.edu
More informationFuzzy C-means with Bi-dimensional Empirical Mode Decomposition for Segmentation of Microarray Image
www.ijcsi.org 316 Fuzzy C-means with Bi-dimensional Empirical Mode Decomposition for Segmentation of Microarray Image J.Harikiran 1, D.RamaKrishna 2, M.L.Phanendra 3, Dr.P.V.Lakshmi 4, Dr.R.Kiran Kumar
More informationEBSeq: An R package for differential expression analysis using RNA-seq data
EBSeq: An R package for differential expression analysis using RNA-seq data Ning Leng, John A. Dawson, Christina Kendziorski October 9, 2012 Contents 1 Introduction 2 2 The Model 3 2.1 Two conditions............................
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationClass Discovery and Prediction of Tumor with Microarray Data
Minnesota State University, Mankato Cornerstone: A Collection of Scholarly and Creative Works for Minnesota State University, Mankato Theses, Dissertations, and Other Capstone Projects 2011 Class Discovery
More informationAnalyzing ICAT Data. Analyzing ICAT Data
Analyzing ICAT Data Gary Van Domselaar University of Alberta Analyzing ICAT Data ICAT: Isotope Coded Affinity Tag Introduced in 1999 by Ruedi Aebersold as a method for quantitative analysis of complex
More informationmultiscan: Combining multiple laser scans of microarrays
multiscan: Combining multiple laser scans of microarrays Mizanur R. Khondoker Chris A. Glasbey Bruce J. Worton April 4, 2013 Contents 1 Introduction 1 2 Package documentation 2 2.1 Help files..........................................
More informationA Two-Way Semi-Linear Model for Normalization and Significant Analysis of cdna Microarray Data
A Two-Way Semi-Linear Model for Normalization and Significant Analysis of cdna Microarray Data Jian Huang, Deli Wang, and Cun-Hui Zhang 1:Dpeartment of Statistics and Actuarial Science, and Program in
More informationCHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers HW 34. Sketch
More informationThe analysis of acgh data: Overview
The analysis of acgh data: Overview JC Marioni, ML Smith, NP Thorne January 13, 2006 Overview i snapcgh (Segmentation, Normalisation and Processing of arraycgh data) is a package for the analysis of array
More informationBayesian Robust Inference of Differential Gene Expression The bridge package
Bayesian Robust Inference of Differential Gene Expression The bridge package Raphael Gottardo October 30, 2017 Contents Department Statistics, University of Washington http://www.rglab.org raph@stat.washington.edu
More informationsscore July 13, 2010 vector of data tuning constant (see details) fuzz value to avoid division by zero (see details)
sscore July 13, 2010 OneStepBiweightAlgorithm One-step Tukey s biweight Computes one-step Tukey s biweight on a vector. that this implementation follows the Affymetrix code, which is different from the
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationCHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves
More informationParallelized preprocessing algorithms for high-density oligonucleotide arrays
Parallelized preprocessing algorithms for high-density oligonucleotide arrays Markus Schmidberger, Ulrich Mansmann Chair of Biometrics and Bioinformatics, IBE University of Munich 81377 Munich, Germany
More informationPackage spikeli. August 3, 2013
Package spikeli August 3, 2013 Type Package Title Affymetrix Spike-in Langmuir Isotherm Data Analysis Tool Version 2.21.0 Date 2009-04-03 Author Delphine Baillon, Paul Leclercq ,
More informationAutomated Bioinformatics Analysis System on Chip ABASOC. version 1.1
Automated Bioinformatics Analysis System on Chip ABASOC version 1.1 Phillip Winston Miller, Priyam Patel, Daniel L. Johnson, PhD. University of Tennessee Health Science Center Office of Research Molecular
More informationAgiMicroRna. Pedro Lopez-Romero. April 30, 2018
AgiMicroRna Pedro Lopez-Romero April 30, 2018 1 Package Overview AgiMicroRna provides useful functionality for the processing, quality assessment and differential expression analysis of Agilent microrna
More informationHow do microarrays work
Lecture 3 (continued) Alvis Brazma European Bioinformatics Institute How do microarrays work condition mrna cdna hybridise to microarray condition Sample RNA extract labelled acid acid acid nucleic acid
More informationMath 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency
Math 1 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency lowest value + highest value midrange The word average: is very ambiguous and can actually refer to the mean,
More informationPackage LMGene. R topics documented: December 23, Version Date
Version 2.38.0 Date 2013-07-24 Package LMGene December 23, 2018 Title LMGene Software for Data Transformation and Identification of Differentially Expressed Genes in Gene Expression Arrays Author David
More informationEECS 730 Introduction to Bioinformatics Microarray. Luke Huan Electrical Engineering and Computer Science
EECS 730 Introduction to Bioinformatics Microarray Luke Huan Electrical Engineering and Computer Science http://people.eecs.ku.edu/~jhuan/ GeneChip 2011/11/29 EECS 730 2 Hybridization to the Chip 2011/11/29
More informationaroma.affymetrix: A generic framework in R for analyzing small to very large Affymetrix data sets in bounded memory
aroma.affymetrix: A generic framework in R for analyzing small to very large Affymetrix data sets in bounded memory Henrik Bengtsson, Ken Simpson, James Bullard, Kasper Hansen February 18, 2008 Abstract
More informationDNA microarrays [1] are used to measure the expression
IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 24, NO. 7, JULY 2005 901 Mixture Model Analysis of DNA Microarray Images K. Blekas*, Member, IEEE, N. P. Galatsanos, Senior Member, IEEE, A. Likas, Senior Member,
More informationCLUSTERING IN BIOINFORMATICS
CLUSTERING IN BIOINFORMATICS CSE/BIMM/BENG 8 MAY 4, 0 OVERVIEW Define the clustering problem Motivation: gene expression and microarrays Types of clustering Clustering algorithms Other applications of
More informationClustering Techniques
Clustering Techniques Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 16 Lopresti Fall 2007 Lecture 16-1 - Administrative notes Your final project / paper proposal is due on Friday,
More informationPackage sscore. R topics documented: June 27, Version Date
Version 1.52.0 Date 2009-04-11 Package sscore June 27, 2018 Title S-Score Algorithm for Affymetrix Oligonucleotide Microarrays Author Richard Kennedy , based on C++ code from
More informationExploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Florian Hahne, Wolfgang Huber. June 17, 2005
Exploring cdna Data Achim Tresch, Andreas Buness, Tim Beißbarth, Florian Hahne, Wolfgang Huber June 7, 00 The following exercise will guide you through the first steps of a spotted cdna microarray analysis.
More informationQAnalyzer 1.0c. User s Manual. Features
QAnalyzer 1.0c User s Manual Packard BioScience QuantArray(QA) microarray analysis software is a powerful microarray analysis software that enables researchers to easily and accurately visualize and quantitate
More informationExpander Online Documentation
Expander Online Documentation Table of Contents Introduction...1 Starting EXPANDER...2 Input Data...4 Preprocessing GE Data...8 Viewing Data Plots...12 Clustering GE Data...14 Biclustering GE Data...17
More informationExploring cdna Data. Achim Tresch, Andreas Buness, Wolfgang Huber, Tim Beißbarth
Exploring cdna Data Achim Tresch, Andreas Buness, Wolfgang Huber, Tim Beißbarth Practical DNA Microarray Analysis http://compdiag.molgen.mpg.de/ngfn/pma0nov.shtml The following exercise will guide you
More informationComputer Exercise - Microarray Analysis using Bioconductor
Computer Exercise - Microarray Analysis using Bioconductor Introduction The SWIRL dataset The SWIRL dataset comes from an experiment using zebrafish to study early development in vertebrates. SWIRL is
More informationAnaquin - Vignette Ted Wong January 05, 2019
Anaquin - Vignette Ted Wong (t.wong@garvan.org.au) January 5, 219 Citation [1] Representing genetic variation with synthetic DNA standards. Nature Methods, 217 [2] Spliced synthetic genes as internal controls
More informationPackage frma. R topics documented: March 8, Version Date Title Frozen RMA and Barcode
Version 1.34.0 Date 2017-11-08 Title Frozen RMA and Barcode Package frma March 8, 2019 Preprocessing and analysis for single microarrays and microarray batches. Author Matthew N. McCall ,
More informationPackage ffpe. October 1, 2018
Type Package Package ffpe October 1, 2018 Title Quality assessment and control for FFPE microarray expression data Version 1.24.0 Author Levi Waldron Maintainer Levi Waldron
More informationExploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber
Exploring cdna Data Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber Practical DNA Microarray Analysis http://compdiag.molgen.mpg.de/ngfn/pma0nov.shtml The following exercise will guide you
More informationComputer Vision I - Basics of Image Processing Part 1
Computer Vision I - Basics of Image Processing Part 1 Carsten Rother 28/10/2014 Computer Vision I: Basics of Image Processing Link to lectures Computer Vision I: Basics of Image Processing 28/10/2014 2
More informationSchedule for Rest of Semester
Schedule for Rest of Semester Date Lecture Topic 11/20 24 Texture 11/27 25 Review of Statistics & Linear Algebra, Eigenvectors 11/29 26 Eigenvector expansions, Pattern Recognition 12/4 27 Cameras & calibration
More informationCorrection for pixel censoring in cdna microarrays
Correction for pixel censoring in cdna microarrays Chris Glasbey 1 and Mizanur Khondoker 1 1 Biomathematics and Statistics Scotland, King s Buildings, Edinburgh EH9 3JZ, UK Abstract: cdna microarrays are
More informationStat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution
Stat 528 (Autumn 2008) Density Curves and the Normal Distribution Reading: Section 1.3 Density curves An example: GRE scores Measures of center and spread The normal distribution Features of the normal
More information2 varying degree of distortion, or loss. The degree of loss can be chosen by the user, or be based on parameters such as local image intensities. We c
On the Bitplane Compression of Microarray Images. Rebecka Jornsten, Yehuda Vardi, Cun-Hui Zhang. Abstract. The microarray image technology is a new and powerful tool for studying the expression of thousands
More informationPackage yaqcaffy. January 19, 2019
Package yaqcaffy January 19, 2019 Title Affymetrix expression data quality control and reproducibility analysis Version 1.42.0 Author Quality control of Affymetrix GeneChip expression data and reproducibility
More informationPattern recognition (4)
Pattern recognition (4) 1 Things we have discussed until now Statistical pattern recognition Building simple classifiers Supervised classification Minimum distance classifier Bayesian classifier (1D and
More informationCANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA. By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr.
CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr. Michael Nechyba 1. Abstract The objective of this project is to apply well known
More informationStatistical Methods for Preprocessing Microarray Gene Expression Data. Md. Mizanur Rahman Khondoker
Statistical Methods for Preprocessing Microarray Gene Expression Data Md. Mizanur Rahman Khondoker Doctor of Philosophy University of Edinburgh 2006 Abstract DNA microarrays facilitate the simultaneous
More informationRJaCGH, a package for analysis of
RJaCGH, a package for analysis of CGH arrays with Reversible Jump MCMC 1. CGH Arrays: Biological problem: Changes in number of DNA copies are associated to cancer activity. Microarray technology: Oscar
More informationStat 602X Exam 2 Spring 2011
Stat 60X Exam Spring 0 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed . Below is a small p classification training set (for classes) displayed in
More informationExpander 7.2 Online Documentation
Expander 7.2 Online Documentation Introduction... 2 Starting EXPANDER... 2 Input Data... 3 Tabular Data File... 4 CEL Files... 6 Working on similarity data no associated expression data... 9 Working on
More informationPackage AffyExpress. October 3, 2013
Version 1.26.0 Date 2009-07-22 Package AffyExpress October 3, 2013 Title Affymetrix Quality Assessment and Analysis Tool Author Maintainer Xuejun Arthur Li Depends R (>= 2.10), affy (>=
More informationChapter 2: The Normal Distribution
Chapter 2: The Normal Distribution 2.1 Density Curves and the Normal Distributions 2.2 Standard Normal Calculations 1 2 Histogram for Strength of Yarn Bobbins 15.60 16.10 16.60 17.10 17.60 18.10 18.60
More informationChapter 2 Modeling Distributions of Data
Chapter 2 Modeling Distributions of Data Section 2.1 Describing Location in a Distribution Describing Location in a Distribution Learning Objectives After this section, you should be able to: FIND and
More informationLouis Fourrier Fabien Gaie Thomas Rolf
CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted
More informationAutomatic Techniques for Gridding cdna Microarray Images
Automatic Techniques for Gridding cda Microarray Images aima Kaabouch, Member, IEEE, and Hamid Shahbazkia Department of Electrical Engineering, University of orth Dakota Grand Forks, D 58202-765 2 University
More information2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.
Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss
More informationSmoothing Dissimilarities for Cluster Analysis: Binary Data and Functional Data
Smoothing Dissimilarities for Cluster Analysis: Binary Data and unctional Data David B. University of South Carolina Department of Statistics Joint work with Zhimin Chen University of South Carolina Current
More informationBioconductor exercises 1. Exploring cdna data. June Wolfgang Huber and Andreas Buness
Bioconductor exercises Exploring cdna data June 2004 Wolfgang Huber and Andreas Buness The following exercise will show you some possibilities to load data from spotted cdna microarrays into R, and to
More informationA Cloud Computing System to Quickly Implement New Microarray Data Pre-processing Methods
The Open Bioinformatics Journal, 2012, 6, 37-42 37 Open Access A Cloud Computing System to Quickly Implement New Microarray Data Pre-processing Methods Dajie Luo 1,#, Prithish Banerjee 1,#, E. James Harner
More informationUsing oligonucleotide microarray reporter sequence information for preprocessing and quality assessment
Using oligonucleotide microarray reporter sequence information for preprocessing and quality assessment Wolfgang Huber and Robert Gentleman April 30, 2018 Contents 1 Overview 1 2 Using probe packages 1
More informationChapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea
Chapter 3 Bootstrap 3.1 Introduction The estimation of parameters in probability distributions is a basic problem in statistics that one tends to encounter already during the very first course on the subject.
More information