Simulation studies of module preservation: Simulation study of weak module preservation
Peter Langfelder and Steve Horvath
October 25, 2010

Contents
1 Overview
  1.a Setting up the R session
2 Data simulation
3 Module identification
4 Calculation of module preservation
5 Analysis of results

1 Overview

This tutorial presents a simulation study of module preservation in which we simulate a reference set with 20 modules of around 200 profiles ("genes") each, and a test set in which 10 of the 20 reference modules are preserved while genes in the other 10 modules are simulated with independent random profiles (in the language of WGCNA, these genes are simulated "grey"). Unlike in our other simulation studies, here genes in the preserved modules are simulated to be only very weakly co-expressed. In fact, we set up the parameters such that the standard module identification method in WGCNA does not find any modules in the test set; hence, cross-tabulation methods would by definition conclude that none of the modules are preserved. To give cross-tabulation methods a chance, we also employ Partitioning Around Medoids (PAM) with a fixed number of clusters to partition the test set into 20 clusters. We find that PAM is moderately successful in identifying the preserved modules. Lastly, we apply the function clusterRepro to the simulated data and find that the observed IGP is not very good at distinguishing the preserved from the non-preserved modules.

We encourage readers unfamiliar with any of the functions used in this tutorial to type, in the active R session, help(functionName) (replace functionName with the actual name of the function) to get a detailed description of what the function does, what the input arguments mean, and what the output is.
1.a Setting up the R session

After starting R we execute a few commands to set the working directory and load the requisite packages:

    # Display the current working directory
    getwd();
    # If necessary, change the path below to the directory where the data files are stored.
    # "." means current directory. On Windows use a forward slash / instead of the usual \.
    workingDir = ".";
    setwd(workingDir);
    # Load the packages WGCNA and cluster
    library(WGCNA);
    library(cluster);
    # The following setting is important, do not omit.
    options(stringsAsFactors = FALSE);

2 Data simulation

We simulate two data sets, each with 100 samples. First we set up simulation parameters such as the module sizes. We choose the parameters such that in the reference data set the genes in each module are tightly co-expressed, but in the test set the genes in each preserved module are only weakly co-expressed.

    nSamples = 100;
    nGenes = 5000;
    nModules = c(20,20);
    prop = seq(from = 0.044, to = 0.037, length.out = nModules[1]+1);
    modProps = list(prop, prop);
    nSets = 2;
    # Here we set how tightly co-expressed the modules should be.
    minCor = c(0.3, 0.05);
    maxCor = c(1, 0.35);
    eigengenes = list();
    expr = list();
    simLabels = list();
    # The second value (cut height for the test set) is missing from this transcription.
    cutHeight = c(0.999, );

Next we simulate the data using the WGCNA function simulateDatExpr. We define the list leaveOut, which tells the simulation function which modules should be left out in each of the data sets. In this case, we leave out half of the modules (every second one) in the second data set. The seed eigengenes are simulated as independent random vectors. The modules in the test (second) data set are simulated to be very loose.

    set.seed(1);
    leaveOut = list(rep(FALSE, nModules[1]), rep(FALSE, nModules[1]));
    leaveOut[[2]][c(1:(nModules[1]/2))*2] = TRUE
    simOrder = list();
    for (set in 1:nSets)
    {
      # Independent random seed eigengenes, one column per module
      eigengenes[[set]] = matrix(rnorm(nSamples * nModules[set]), nSamples, nModules[set])
      x = simulateDatExpr(eigengenes[[set]], nGenes, modProps[[set]],
                          minCor = minCor[set], maxCor = maxCor[set],
                          signed = TRUE, backgroundNoise = 1.0,
                          leaveOut = leaveOut[[set]]);
      simLabels[[set]] = x$allLabels
      simOrder[[set]] = x$labelOrder
      expr[[set]] = list(data = x$datExpr);
      colnames(expr[[set]]$data) = spaste("gene.", c(1:nGenes));
    }

3 Module identification

We now identify modules in each of the simulated data sets using the WGCNA function blockwiseModules.
    mods = list();
    # Soft thresholding powers for network definition.
    power = c(6, 4);
    collectGarbage();
    labels = list();
    nn = if (interactive()) nSets else 1;
    for (set in 1:nn)
    {
      mods[[set]] = blockwiseModules(expr[[set]]$data, networkType = "signed hybrid",
                                     deepSplit = 1, detectCutHeight = cutHeight[set],
                                     TOMType = "none", power = power[set],
                                     numericLabels = TRUE, verbose = 4);
      # Relabel the identified modules so that they match the simulated labels
      labels[[set]] = matchLabels(mods[[set]]$colors, simLabels[[set]]);
      collectGarbage();
    }

We also run Partitioning Around Medoids (PAM) on the data.

    PAMlabels = matrix(0, nGenes, nSets)
    for (set in 1:nSets)
    {
      # Signed hybrid adjacency: negative correlations are set to 0
      cr = cor(expr[[set]]$data);
      cr[cr<0] = 0;
      adj = cr^power[set];
      dist = as.dist(1-adj);
      PAMlabels[, set] = pam(dist, nModules[set], cluster.only = TRUE);
      PAMlabels[, set] = matchLabels(PAMlabels[, set], simLabels[[set]]);
      collectGarbage();
    }

How did module identification do? We plot the gene dendrograms with the simulated and identified module colors.

    sizeGrWindow(10,7);
    #pdf(file = "Plots/preserved-moduleDetectionFailed-dendrograms.pdf", width = 10, height = 7)
    layout(matrix(c(1:5), 5, 1), heights = c(rep(c(0.8, 0.2), 2), 0.3));
    setNames = c("Reference data set", "Test data set");
    for (set in 1:nSets)
    {
      if (set==1)
      {
        colors = labels2colors(cbind(labels[[1]], simLabels[[set]]))
        names = c("Inferred", "Simulated");
      } else {
        colors = labels2colors(cbind(PAMlabels[, set], simLabels[[set]]))
        names = c("PAM", "Simulated");
      }
      plotDendroAndColors(mods[[set]]$dendrograms[[1]], colors, names,
                          dendroLabels = FALSE, hang = 0.03,
                          main = spaste(LETTERS[set], ". ", setNames[set],
                                        ": gene clustering tree and module colors"),
                          setLayout = FALSE, abHeight = cutHeight[set],
                          cex.colorLabels = 1.2, cex.main = 1.5,
                          cex.lab = 1.2, cex.axis = 1.2);
    }

The result is shown in Figure 1. In the test data set, hierarchical clustering did not identify any modules. That is because we have simulated the modules with very weak correlations.
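To get a feel for why the test-set modules are so hard to detect, the following minimal sketch applies the same signed hybrid transformation used in the PAM code above (negative correlations set to 0, positive ones raised to the soft-thresholding power) to the extreme values of minCor and maxCor. Treating these values as if they were pairwise gene-gene correlations is a simplifying assumption made only for illustration, and the helper name signedHybridAdjacency is hypothetical (it is not a WGCNA function).

```r
# Signed hybrid adjacency: positive correlations raised to a power,
# negative correlations set to 0 (hypothetical helper, base R only).
signedHybridAdjacency <- function(r, power) ifelse(r > 0, r^power, 0)

# Extreme correlation values used in the simulation settings above.
refCor  <- c(0.3, 1)      # reference set, soft-thresholding power 6
testCor <- c(0.05, 0.35)  # test set, soft-thresholding power 4

refAdj  <- signedHybridAdjacency(refCor, power = 6)
testAdj <- signedHybridAdjacency(testCor, power = 4)

print(refAdj)   # adjacencies at the lower and upper end in the reference set
print(testAdj)  # adjacencies at the lower and upper end in the test set
```

Under this simplification, even the strongest test-set value (0.35) yields an adjacency of about 0.015, so the corresponding dissimilarities 1 - adj are all close to 1 and a clustering tree has little structure to work with.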
[Figure 1 panels: A. Reference data set: gene clustering tree and module colors (color rows: Inferred, Simulated). B. Test data set: gene clustering tree and module colors (color rows: PAM, Simulated). C. PAM vs. simulated module colors.]

Figure 1: Module identification in the simulated data sets. In the reference set the hierarchical clustering (panel A) easily identifies the 20 modules as distinct branches. Simulated and identified module colors, shown below the dendrogram, show excellent agreement. In the test set (panel B) the hierarchical clustering did not identify any recognizable branches. The simulated and PAM colors, shown below the clustering tree, also do not show any apparent relationship to the dendrogram. Panel C shows a comparison of simulated module colors and PAM cluster labels. It is very difficult to argue that any of the modules in the test set are preserved.
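The comparison of simulated labels with PAM cluster labels is a cross-tabulation. As a rough illustration of the idea, the sketch below cross-tabulates two toy label vectors and computes a one-sided Fisher exact test p-value for the overlap of one module with one cluster. The label vectors are made up for this example, and the use of a one-sided test is an assumption; this is not the exact calculation performed by WGCNA.

```r
# Toy labels for 12 genes: simulated module membership (0 = unassigned)
# and a clustering of the same genes into 2 clusters (made-up values).
simulated <- c(1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0)
clustered <- c(1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 1, 2)

# Cross-tabulate the two labelings (rows: simulated, columns: clustered).
counts <- table(simulated = simulated, clustered = clustered)
print(counts)

# Overlap of simulated module 1 with cluster 1: 2 x 2 membership table
# tested with a one-sided Fisher exact test (enrichment only).
inModule  <- simulated == 1
inCluster <- clustered == 1
overlapP <- fisher.test(table(inModule, inCluster),
                        alternative = "greater")$p.value
print(overlapP)
```

A small overlap p-value indicates that the module and the cluster share more genes than expected by chance; with labels as noisy as those in the test set, such p-values are often unremarkable.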
4 Calculation of module preservation

Here we run the main module preservation function modulePreservation. After the calculation we save the results; if previously calculated results are re-analyzed, one can simply read the results from disk, thus saving a lot of time.

    names(expr) = c("Set1", "Set2");
    # Reference labels come from blockwiseModules, test labels from PAM
    labelList = list(labels[[1]], PAMlabels[, 2]);
    names(labelList) = names(expr);
    mp = modulePreservation(expr, labelList, networkType = "signed",
                            nPermutations = 200, verbose = 3,
                            maxGoldModuleSize = 1000);
    # Save the module preservation results as well as the PAM cluster labels
    save(mp, PAMlabels, file = "preserved-moduledetectionfailed-20modules.rdata");

If the module preservation results have been calculated previously, load the results from the disk:

    load(file = "preserved-moduledetectionfailed-20modules.rdata");

Calculation of IGP in clusterRepro

Here we apply clusterRepro to the test set. We calculate the eigengenes of the reference modules in the test set and use them as the centroids in the IGP calculation.

    # Need centroids for the new data set. Calculate module eigengenes.
    MEs = moduleEigengenes(expr[[2]]$data, labels[[1]])$eigengenes
    # Get rid of the grey eigengene
    MEs = MEs[, -1]
    doClusterRepro = TRUE
    if (doClusterRepro)
    {
      library(clusterRepro)
      # Row names of the centroids and of the data must match
      rownames(MEs) = spaste("sample.", c(1:nSamples));
      rownames(expr[[2]]$data) = spaste("sample.", c(1:nSamples));
      set.seed(40);
      print(system.time( {
        cr = clusterRepro(as.matrix(MEs), expr[[2]]$data, 1000);
      } ));
      save(cr, file = "preserved-moduledetectionfailed-20modules-cr.rdata");
    }

If the clusterRepro results have been calculated previously, load the results from the disk:

    load(file = "preserved-moduledetectionfailed-20modules-cr.rdata");

5 Analysis of results

Here we look at how well each method did at identifying the 10 preserved modules in the hopelessly noisy test data.
Since the modules all have very similar sizes, we do not plot results as a function of module size; rather, in each plot we simply order the modules by their corresponding preservation statistic and look for a clean separation of preserved and non-preserved modules.

    # How well can one distinguish preserved from non-preserved modules?
    sizeGrWindow(10,8)
    #pdf(file = "Plots/preserved-moduleDetectionFailed-20Modules-preservationSuccess.pdf", w= 10, h = 8);
    # Red = preserved, black = non-preserved (left out of the test set)
    presColor = c("red", "black")[as.numeric(leaveOut[[2]])+1];
    # Set graphical parameters
    par(mfrow = c(3,2));
    par(mar = c(3.8, 3.8, 2, 0.5));
    par(mgp = c(2.3, 0.7, 0));
    cex.lab = 1.3; cex.axis = 1.3; cex.main = 1.4

    # Module preservation: Zsummary scores
    Zs = mp$preservation$Z[[1]][[2]]$Zsummary[
             order(as.numeric(rownames(mp$preservation$Z[[1]][[2]])))][-c(1:2)];
    order = order(-Zs);
    plot(Zs[order], col = presColor[order], cex.main = cex.main, xlab = "",
         ylab = "Preservation Zsummary", cex.lab = cex.lab, cex.axis = cex.axis,
         main = "A. Network-based preservation indices: Zsummary")

    # Module preservation: psummary statistics
    Zs = -mp$preservation$log.p[[1]][[2]]$log.psummary[
             order(as.numeric(rownames(mp$preservation$Z[[1]][[2]])))][-c(1:2)];
    order = order(-Zs);
    plot(Zs[order], col = presColor[order], xlab = "", ylab = "-log10(psummary)",
         cex.lab = cex.lab, cex.axis = cex.axis, cex.main = cex.main,
         main = "B. Network-based preservation indices: psummary")
    abline(h=-log10(0.05), col = "blue");
    abline(h=-log10(0.05/nModules[1]), col = "green");

    # Co-clustering
    cc = mp$accuracy$observed[[1]][[2]][-1, "coClustering"];
    order = order(-cc)
    plot(cc[order], col = presColor[order], cex.main = cex.main, xlab = "",
         ylab = "coclustering", cex.lab = cex.lab, cex.axis = cex.axis,
         main = "D. Cross-tabulation with results of PAM: Co-clustering")

    # Cross-tabulation: Fisher p-value
    # Note: `tab` is not defined anywhere in this transcription. It is presumably the
    # overlap table of the reference labels and the PAM labels in the test set,
    # for example tab = overlapTable(labels[[1]], PAMlabels[, 2]).
    bestp = apply(tab$pTable[-1, ], 1, min);
    order = order(bestp)
    plot(-log10(pmin(rep(1, nModules[1]), bestp[order])), col = presColor[order],
         cex.main = cex.main, xlab = "", ylab = "-log10(overlap p-value)",
         cex.lab = cex.lab, cex.axis = cex.axis,
         main = "C. Cross-tabulation with results of PAM: overlap p-value")
    abline(h=-log10(0.05), col = "blue");
    abline(h=-log10(0.05/nModules[1]), col = "green");

    # clusterRepro: observed IGP
    p = cr$Actual.IGP;
    order = order(-p);
    plot(p[order], col = presColor[order], xlab = "", ylab = "IGP",
         cex.lab = cex.lab, cex.axis = cex.axis, cex.main = cex.main,
         main = "E. clusterRepro: IGP")

    # clusterRepro: permutation p-value
    p = cr$p;
    order = order(p);
    plot(-log10(p+1e-4)[order], col = presColor[order], cex.main = cex.main,
         xlab = "", ylab = "-log10(clusterRepro p-value)",
         cex.lab = cex.lab, cex.axis = cex.axis,
         main = "F. clusterRepro: permutation p-value")
    abline(h=-log10(0.05), col = "blue");
    abline(h=-log10(0.05/nModules[1]), col = "green");

    # If plotting into a pdf file, close it
    dev.off();

The resulting plots are shown in Figure 2. The figure shows that network preservation statistics are in this case successful in reliably separating preserved and non-preserved modules. On the other hand, cross-tabulation and clusterRepro have only limited success; based on Bonferroni-corrected p-values, most of the preserved modules are called non-preserved.

[Figure 2 panels: A. Network-based preservation indices: Zsummary. B. Network-based preservation indices: psummary. C. Cross-tabulation with results of PAM: overlap p-value. D. Cross-tabulation with results of PAM: Co-clustering. E. clusterRepro: IGP. F. clusterRepro: permutation p-value. Legend in each panel: non-preserved module, preserved module.]

Figure 2: Success of several module preservation measures at distinguishing weakly preserved from non-preserved modules. In each plot, modules are ordered by the preservation statistic shown in the plot. Red color denotes preserved and black non-preserved modules. In the p-value plots (the right column), the blue line denotes the threshold p = 0.05, and the green line denotes the Bonferroni-corrected threshold p = 0.05/20 = 0.0025. In the clusterRepro p-value plot, we added 10^-4 to all p-values so that zero p-values become 10^-4 and fit into the plot.
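The two horizontal reference lines in the p-value plots follow directly from the number of modules tested. A minimal sketch of the arithmetic, using only base R and the variable names alpha and nTests chosen here for illustration:

```r
# Nominal significance level and number of modules tested.
alpha  <- 0.05
nTests <- 20

# Bonferroni-corrected threshold: divide the nominal level by the number of tests.
bonferroni <- alpha / nTests

# Heights of the two reference lines on the -log10 scale used in the plots.
lineHeights <- c(nominal = -log10(alpha), corrected = -log10(bonferroni))
print(bonferroni)
print(lineHeights)
```

A module is called preserved at the Bonferroni-corrected level only if its point lies above the higher (corrected) line.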
Hierarchical and Ensemble Clustering Ke Chen Reading: [7.8-7., EA], [25.5, KPM], [Fred & Jain, 25] COMP24 Machine Learning Outline Introduction Cluster Distance Measures Agglomerative Algorithm Example
More informationDidacticiel Études de cas
1 Subject Two step clustering approach on large dataset. The aim of the clustering is to identify homogenous subgroups of instance in a population 1. In this tutorial, we implement a two step clustering
More informationLD vignette Measures of linkage disequilibrium
LD vignette Measures of linkage disequilibrium David Clayton June 13, 2018 Calculating linkage disequilibrium statistics We shall first load some illustrative data. > data(ld.example) The data are drawn
More informationPackage cgh. R topics documented: February 19, 2015
Package cgh February 19, 2015 Version 1.0-7.1 Date 2009-11-20 Title Microarray CGH analysis using the Smith-Waterman algorithm Author Tom Price Maintainer Tom Price
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for
More informationjava -jar picard.jar CollectInsertSizeMetrics I=aln_sorted.bam O=out.metrics HISTOGRAM_FILE=chartoutput.pdf VALIDATION_STRINGENCY=LENIENT
Supplementary Note 1 Pre-Processing and Alignment Commands Trimmomatic Command java -jar trimmomatic-0.32.jar PE -threads 15 -phred33 /Volumes/Drobo_Storage/Raw_Data_and_Trimmomatic_Files/First_Six_Samples_Raw_Data/Ra
More informationECS 234: Data Analysis: Clustering ECS 234
: Data Analysis: Clustering What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed
More informationMultivariate analyses in ecology. Cluster (part 2) Ordination (part 1 & 2)
Multivariate analyses in ecology Cluster (part 2) Ordination (part 1 & 2) 1 Exercise 9B - solut 2 Exercise 9B - solut 3 Exercise 9B - solut 4 Exercise 9B - solut 5 Multivariate analyses in ecology Cluster
More informationPackage RTNduals. R topics documented: March 7, Type Package
Type Package Package RTNduals March 7, 2019 Title Analysis of co-regulation and inference of 'dual regulons' Version 1.7.0 Author Vinicius S. Chagas, Clarice S. Groeneveld, Gordon Robertson, Kerstin B.
More informationLecture 25: Review I
Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,
More informationPackage EnQuireR. R topics documented: February 19, Type Package Title A package dedicated to questionnaires Version 0.
Type Package Title A package dedicated to questionnaires Version 0.10 Date 2009-06-10 Package EnQuireR February 19, 2015 Author Fournier Gwenaelle, Cadoret Marine, Fournier Olivier, Le Poder Francois,
More informationRunning Minitab for the first time on your PC
Running Minitab for the first time on your PC Screen Appearance When you select the MINITAB option from the MINITAB 14 program group, or click on MINITAB 14 under RAS you will see the following screen.
More informationPackage clusterseq. R topics documented: June 13, Type Package
Type Package Package clusterseq June 13, 2018 Title Clustering of high-throughput sequencing data by identifying co-expression patterns Version 1.4.0 Depends R (>= 3.0.0), methods, BiocParallel, bayseq,
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Bradley Broom Department of Bioinformatics and Computational Biology UT MD Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org 19
More informationStatistics 202: Data Mining. c Jonathan Taylor. Clustering Based in part on slides from textbook, slides of Susan Holmes.
Clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group will be similar (or
More information5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction
Computational Methods for Data Analysis Massimo Poesio UNSUPERVISED LEARNING Clustering Unsupervised learning introduction 1 Supervised learning Training set: Unsupervised learning Training set: 2 Clustering
More informationMATH5745 Multivariate Methods Lecture 13
MATH5745 Multivariate Methods Lecture 13 April 24, 2018 MATH5745 Multivariate Methods Lecture 13 April 24, 2018 1 / 33 Cluster analysis. Example: Fisher iris data Fisher (1936) 1 iris data consists of
More informationPackage DPBBM. September 29, 2016
Type Package Title Dirichlet Process Beta-Binomial Mixture Version 0.2.5 Date 2016-09-21 Author Lin Zhang Package DPBBM September 29, 2016 Maintainer Lin Zhang Depends R (>= 3.1.0)
More informationCOmbined Mapping of Multiple clustering ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number K: R Package Vignette
COmbined Mapping of Multiple clustering ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number K: R Package Vignette Timothy E Sweeney Stanford University Albert Chen Stanford University
More informationPackage ClustGeo. R topics documented: July 14, Type Package
Type Package Package ClustGeo July 14, 2017 Title Hierarchical Clustering with Spatial Constraints Version 2.0 Author Marie Chavent [aut, cre], Vanessa Kuentz [aut], Amaury Labenne [aut], Jerome Saracco
More informationMultivariate Analysis (slides 9)
Multivariate Analysis (slides 9) Today we consider k-means clustering. We will address the question of selecting the appropriate number of clusters. Properties and limitations of the algorithm will be
More informationHierarchical clustering
Aprendizagem Automática Hierarchical clustering Ludwig Krippahl Hierarchical clustering Summary Hierarchical Clustering Agglomerative Clustering Divisive Clustering Clustering Features 1 Aprendizagem Automática
More informationData Term. Michael Bleyer LVA Stereo Vision
Data Term Michael Bleyer LVA Stereo Vision What happened last time? We have looked at our energy function: E ( D) = m( p, dp) + p I < p, q > N s( p, q) We have learned about an optimization algorithm that
More informationClustering. Lecture 6, 1/24/03 ECS289A
Clustering Lecture 6, 1/24/03 What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed
More informationVIDAEXPERT: DATA ANALYSIS Here is the Statistics button.
Here is the Statistics button. After creating dataset you can analyze it in different ways. First, you can calculate statistics. Open Statistics dialog, Common tabsheet, click Calculate. Min, Max: minimal
More informationPackage HMRFBayesHiC
Package HMRFBayesHiC February 3, 2015 Type Package Title HMRFBayesHiC conduct Hidden Markov Random Field (HMRF) Bayes Peak Calling Method on HiC Data Version 1.0 Date 2015-01-30 Author Zheng Xu Maintainer
More information