Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice. 1. Data input and cleaning

Size: px
Start display at page:

Download "Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice. 1. Data input and cleaning"

Transcription

1 Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice 1. Data input and cleaning Peter Langfelder and Steve Horvath February 13, 2016 Contents 1 Data input, cleaning and pre-processing 1 1.a Loading expression data b Rudimentary data cleaning and outlier removal c Loading clinical trait data Data input, cleaning and pre-processing 1.a Loading expression data The expression data is contained in two files that come with this tutorial, LiverFemale3600.csv and LiverMale3600.csv. After starting an R session, we check that the current directory is appropriately set, and load the requisite packages and the data. In addition to expression data, the data files contain extra information about the surveyed probes we do not need. # Display the current working directory getwd(); # If necessary, change the path below to the directory where the data files are stored. # "." means current directory. On Windows use a forward slash / instead of the usual \. workingdir = "."; setwd(workingdir); # Load the package library(wgcna); # The following setting is important, do not omit. options(stringsasfactors = FALSE); #Read in the female liver data set femdata = read.csv("liverfemale3600.csv"); # Read in the male liver data set maledata = read.csv("livermale3600.csv"); # Take a quick look at what is in the data sets (caution, longish output): dim(femdata) names(femdata) dim(maledata) names(maledata)

2 The data sets contain roughly 130 samples each. Note that each row corresponds to a gene and column to a sample or auxiliary information. We now remove the auxiliary data and put the expression data into a multi-set format suitable for consensus analysis. # We work with two sets: nsets = 2; # For easier labeling of plots, create a vector holding descriptive names of the two sets. setlabels = c("female liver", "Male liver") shortlabels = c("female", "Male") # Form multi-set expression data: columns starting from 9 contain actual expression data. multiexpr = vector(mode = "list", length = nsets) multiexpr[[1]] = list(data = as.data.frame(t(femdata[-c(1:8)]))); names(multiexpr[[1]]$data) = femdata$substancebxh; rownames(multiexpr[[1]]$data) = names(femdata)[-c(1:8)]; multiexpr[[2]] = list(data = as.data.frame(t(maledata[-c(1:8)]))); names(multiexpr[[2]]$data) = maledata$substancebxh; rownames(multiexpr[[2]]$data) = names(maledata)[-c(1:8)]; # Check that the data has the correct format for many functions operating on multiple sets: The variable exprsize contains useful information about the sizes of all of the data sets: >exprsize $nsets [1] 2 $ngenes [1] 3600 $nsamples [1] $structureok [1] TRUE 1.b Rudimentary data cleaning and outlier removal We first identify genes and samples with excessive numbers of missing samples. These can be identified using the function goodsamplesgenesms. # Check that all genes and samples have sufficiently low numbers of missing values. gsg = goodsamplesgenesms(multiexpr, verbose = 3); gsg$allok If the last statement returns TRUE, all genes and samples have passed the cuts. If it returns FALSE, the following code removes the offending samples and genes: if (!gsg$allok) # Print information about the removed genes: if (sum(!gsg$goodgenes) > 0) printflush(paste("removing genes:", paste(names(multiexpr[[1]]$data)[!gsg$goodgenes], collapse = ", "))) for (set in 1:exprSize$nSets)

3 if (sum(!gsg$goodsamples[[set]])) printflush(paste("in set", setlabels[set], "removing samples", paste(rownames(multiexpr[[set]]$data)[!gsg$goodsamples[[set]]], collapse = ", "))) # Remove the offending genes and samples multiexpr[[set]]$data = multiexpr[[set]]$data[gsg$goodsamples[[set]], gsg$goodgenes]; # Update exprsize We now cluster the samples on their Euclidean distance, separately in each set. sampletrees = list() sampletrees[[set]] = hclust(dist(multiexpr[[set]]$data), method = "average") The easiest way to see the two dendrograms at the same time is to plot both into a pdf file that can be viewed using standard pdf viewers. pdf(file = "Plots/SampleClustering.pdf", width = 12, height = 12); par(mfrow=c(2,1)) par(mar = c(0, 4, 2, 0)) plot(sampletrees[[set]], main = paste("sample clustering on all genes in", setlabels[set]), xlab="", sub="", cex = 0.7); dev.off(); By inspection, there seems to be one outlier in the female data set, and no obvious outliers in the male set. We now remove the female outlier using a semi-automatic code that only requires a choice of a height cut. We first re-plot the two sample trees with the cut lines included (see Fig. 1): # Choose the "base" cut height for the female data set baseheight = 16 # Adjust the cut height for the male data set for the number of samples cutheights = c(16, 16*exprSize$nSamples[2]/exprSize$nSamples[1]); # Re-plot the dendrograms including the cut lines pdf(file = "Plots/SampleClustering.pdf", width = 12, height = 12); par(mfrow=c(2,1)) par(mar = c(0, 4, 2, 0)) plot(sampletrees[[set]], main = paste("sample clustering on all genes in", setlabels[set]), xlab="", sub="", cex = 0.7); abline(h=cutheights[set], col = "red"); dev.off(); Now comes the actual outlier removal: # Find clusters cut by the line labels = cutreestatic(sampletrees[[set]], cutheight = cutheights[set]) # Keep the largest one (labeled by the number 1) keep = (labels==1) multiexpr[[set]]$data = multiexpr[[set]]$data[keep, ]

4 collectgarbage(); # Check the size of the leftover data exprsize 1.c Loading clinical trait data We now read in the trait data and match the samples for which they were measured to the expression samples. traitdata = read.csv("clinicaltraits.csv"); dim(traitdata) names(traitdata) # remove columns that hold information we do not need. alltraits = traitdata[, -c(31, 16)]; alltraits = alltraits[, c(2, 11:36) ]; # See how big the traits are and what are the trait and sample names dim(alltraits) names(alltraits) alltraits$mice # Form a multi-set structure that will hold the clinical traits. Traits = vector(mode="list", length = nsets); setsamples = rownames(multiexpr[[set]]$data); traitrows = match(setsamples, alltraits$mice); Traits[[set]] = list(data = alltraits[traitrows, -1]); rownames(traits[[set]]$data) = alltraits[traitrows, 1]; collectgarbage(); # Define data set dimensions ngenes = exprsize$ngenes; nsamples = exprsize$nsamples; We now have the expression data for both sets in the variable multiexpr, and the corresponding clinical traits in the variable Traits. The last step is to save the relevant data for use in the subsequent analysis: save(multiexpr, Traits, ngenes, nsamples, setlabels, shortlabels, exprsize, file = "Consensus-dataInput.RData");

5 Sample clustering on all genes in Female liver F2_238 F2_235 F2_279 F2_148 F2_219 F2_152 F2_275 F2_172 F2_176 F2_317 F2_9 F2_93 F2_251 F2_186 F2_281 F2_120 F2_232 F2_179 F2_159 F2_178 F2_123 F2_185 F2_174 F2_149 F2_153 F2_197 F2_5 F2_59 F2_343 F2_49 F2_50 F2_220 F2_233 F2_207 F2_234 F2_105 F2_160 F2_183 F2_353 F2_18 F2_91 F2_17 F2_40 F2_265 F2_282 F2_74 F2_236 F2_315 F2_199 F2_28 F2_4 F2_92 F2_22 F2_33 F2_198 F2_285 F2_29 F2_237 F2_30 F2_147 F2_171 F2_121 F2_286 F2_254 F2_55 F2_76 F2_35 F2_314 F2_84 F2_276 F2_318 F2_57 F2_266 F2_60 F2_116 F2_217 F2_239 F2_7 F2_216 F2_316 F2_56 F2_73 F2_39 F2_184 F2_313 F2_104 F2_252 F2_13 F2_158 F2_173 F2_151 F2_124 F2_146 F2_16 F2_294 F2_268 F2_170 F2_249 F2_41 F2_75 F2_257 F2_209 F2_115 F2_292 F2_94 F2_34 F2_27 F2_114 F2_218 F2_250 F2_230 F2_8 F2_85 F2_284 F2_256 F2_280 F2_10 F2_295 F2_231 F2_122 F2_210 F2_6 F2_208 F2_274 Height F2_182 F2_332 F2_303 F2_304 F2_110 F2_155 F2_302 F2_245 F2_107 F2_228 F2_201 F2_80 F2_81 F2_108 F2_243 F2_263 F2_291 F2_87 F2_330 F2_69 F2_181 F2_157 F2_169 F2_112 F2_195 F2_355 F2_43 F2_20 F2_119 F2_329 F2_165 F2_300 F2_154 F2_167 F2_247 F2_298 F2_72 F2_324 F2_42 F2_145 F2_156 F2_290 F2_111 F2_189 F2_327 F2_309 F2_323 F2_223 F2_127 F2_162 F2_194 F2_222 F2_47 F2_326 F2_320 F2_51 F2_144 F2_289 F2_2 F2_89 F2_141 F2_188 F2_142 F2_26 F2_311 F2_192 F2_296 F2_270 F2_310 F2_312 F2_357 F2_125 F2_126 F2_328 F2_163 F2_190 F2_200 F2_166 F2_191 F2_215 F2_271 F2_261 F2_109 F2_306 F2_272 F2_213 F2_86 F2_299 F2_88 F2_15 F2_66 F2_143 F2_278 F2_321 F2_187 F2_180 F2_78 F2_305 F2_287 F2_308 F2_37 F2_71 F2_68 F2_83 F2_46 F2_225 F2_241 F2_214 F2_224 F2_70 F2_48 F2_117 F2_79 F2_226 F2_264 F2_19 F2_227 F2_212 F2_3 F2_52 F2_54 F2_45 F2_63 F2_65 F2_242 F2_23 F2_244 F2_14 F2_164 F2_248 F2_24 F2_288 F2_307 F2_325 F2_221 Sample clustering on all genes in Male liver Height Figure 1: Clustering of samples in the two input sets, female and male liver expression data. The red line in the female dendrogram is the cut line for outlier detection; samples cut by this line are considered outliers. The corresponding cut line for the male samples lies outside of the dendrogram, indicating that no samples from the male set are considered outliers.

Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice

Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice 2.b Step-by-step network construction and module detection Peter Langfelder and Steve

More information

Simulation studies of module preservation: Simulation study of weak module preservation

Simulation studies of module preservation: Simulation study of weak module preservation Simulation studies of module preservation: Simulation study of weak module preservation Peter Langfelder and Steve Horvath October 25, 2010 Contents 1 Overview 1 1.a Setting up the R session............................................

More information

Tutorial for the WGCNA package for R: III. Using simulated data to evaluate different module detection methods and gene screening approaches

Tutorial for the WGCNA package for R: III. Using simulated data to evaluate different module detection methods and gene screening approaches Tutorial for the WGCNA package for R: III. Using simulated data to evaluate different module detection methods and gene screening approaches 8. Visualization of gene networks Steve Horvath and Peter Langfelder

More information

Short tutorial on studying module preservation: Preservation of female mouse liver modules in male data

Short tutorial on studying module preservation: Preservation of female mouse liver modules in male data Short tutorial on studying module preservation: Preservation of female mouse liver modules in male data Peter Langfelder and Steve Horvath October 1, 0 Contents 1 Overview 1 1.a Setting up the R session............................................

More information

Preservation of protein-protein interaction networks Simple simulated example

Preservation of protein-protein interaction networks Simple simulated example Preservation of protein-protein interaction networks Simple simulated example Peter Langfelder and Steve Horvath May, 0 Contents Overview.a Setting up the R session............................................

More information

Supplemental Data. Cañas et al. Plant Cell (2017) /tpc

Supplemental Data. Cañas et al. Plant Cell (2017) /tpc Supplemental Method 1. WGCNA script. #Microarray and Trait data load getwd() workingdir = "C:/Users/..." setwd(workingdir) library(wgcna) library(flashclust) options(stringsasfactors = FALSE) femdata =

More information

Clustering using WGCNA

Clustering using WGCNA Clustering using WGCNA Overview: The WGCNA package (in R) uses functions that perform a correlation network analysis of large, high-dimensional data sets (RNAseq datasets). This unbiased approach clusters

More information

Identification of consensus modules in Adenocarcinoma data

Identification of consensus modules in Adenocarcinoma data Identification of consensus modules in Adenocarcinoma data Peter Langfelder and Steve Horvath June 9, 01 Contents 1 Overview 1 Setting up the R session 1 Loading of data Scale-free topology analysis Identification

More information

Supplementary text S6 Comparison studies on simulated data

Supplementary text S6 Comparison studies on simulated data Supplementary text S Comparison studies on simulated data Peter Langfelder, Rui Luo, Michael C. Oldham, and Steve Horvath Corresponding author: shorvath@mednet.ucla.edu Overview In this document we illustrate

More information

General instructions:

General instructions: R Tutorial: Geometric Interpretation of Gene Co-Expression Network Analysis, Applied to Female Mouse Liver Microarray Data Jun Dong, Steve Horvath Correspondence: shorvath@mednet.ucla.edu, http://www.ph.ucla.edu/biostat/people/horvath.htm

More information

Meta-analysis of aging methylation data sets Validation success of various meta-analysis methods in selecting genes

Meta-analysis of aging methylation data sets Validation success of various meta-analysis methods in selecting genes Meta-analysis of aging methylation data sets Validation success of various meta-analysis methods in selecting genes Peter Langfelder and Steve Horvath June 27, 2012 Contents 1 Overview 1 2 Setting up the

More information

(1) where, l. denotes the number of nodes to which both i and j are connected, and k is. the number of connections of a node, with.

(1) where, l. denotes the number of nodes to which both i and j are connected, and k is. the number of connections of a node, with. A simulated gene co-expression network to illustrate the use of the topological overlap matrix for module detection Steve Horvath, Mike Oldham Correspondence to shorvath@mednet.ucla.edu Abstract Here we

More information

Clustering. Dick de Ridder 6/10/2018

Clustering. Dick de Ridder 6/10/2018 Clustering Dick de Ridder 6/10/2018 In these exercises, you will continue to work with the Arabidopsis ST vs. HT RNAseq dataset. First, you will select a subset of the data and inspect it; then cluster

More information

Package MODA. January 8, 2019

Package MODA. January 8, 2019 Type Package Package MODA January 8, 2019 Title MODA: MOdule Differential Analysis for weighted gene co-expression network Version 1.8.0 Date 2016-12-16 Author Dong Li, James B. Brown, Luisa Orsini, Zhisong

More information

Analyzing Genomic Data with NOJAH

Analyzing Genomic Data with NOJAH Analyzing Genomic Data with NOJAH TAB A) GENOME WIDE ANALYSIS Step 1: Select the example dataset or upload your own. Two example datasets are available. Genome-Wide TCGA-BRCA Expression datasets and CoMMpass

More information

Package dynamictreecut

Package dynamictreecut Package dynamictreecut November 18, 2013 Version 1.60-2 Date 2013-11-16 Title Methods for detection of clusters in hierarchical clustering dendrograms. Author Peter Langfelder

More information

Meta-analysis of lung cancer expression data sets Validation success of various meta-analysis methods in selecting genes

Meta-analysis of lung cancer expression data sets Validation success of various meta-analysis methods in selecting genes Meta-analysis of lung cancer expression data sets Validation success of various meta-analysis methods in selecting genes Peter Langfelder and Steve Horvath June 26, 2012 Contents 1 Overview 1 2 Setting

More information

Package dynamictreecut

Package dynamictreecut Package dynamictreecut June 13, 2014 Version 1.62 Date 2014-05-07 Title Methods for detection of clusters in hierarchical clustering dendrograms. Author Peter Langfelder and

More information

Package allelematch. R topics documented: February 19, Type Package

Package allelematch. R topics documented: February 19, Type Package Type Package Package allelematch February 19, 2015 Title Identifying unique multilocus genotypes where genotyping error and missing data may be present Version 2.5 Date 2014-09-18 Author Paul Galpern

More information

UNSUPERVISED LEARNING IN R. Introduction to hierarchical clustering

UNSUPERVISED LEARNING IN R. Introduction to hierarchical clustering UNSUPERVISED LEARNING IN R Introduction to hierarchical clustering Hierarchical clustering Number of clusters is not known ahead of time Two kinds: bottom-up and top-down, this course bottom-up Hierarchical

More information

Preliminary Figures for Renormalizing Illumina SNP Cell Line Data

Preliminary Figures for Renormalizing Illumina SNP Cell Line Data Preliminary Figures for Renormalizing Illumina SNP Cell Line Data Kevin R. Coombes 17 March 2011 Contents 1 Executive Summary 1 1.1 Introduction......................................... 1 1.1.1 Aims/Objectives..................................

More information

Basics of Plotting Data

Basics of Plotting Data Basics of Plotting Data Luke Chang Last Revised July 16, 2010 One of the strengths of R over other statistical analysis packages is its ability to easily render high quality graphs. R uses vector based

More information

Intro to R Graphics Center for Social Science Computation and Research, 2010 Stephanie Lee, Dept of Sociology, University of Washington

Intro to R Graphics Center for Social Science Computation and Research, 2010 Stephanie Lee, Dept of Sociology, University of Washington Intro to R Graphics Center for Social Science Computation and Research, 2010 Stephanie Lee, Dept of Sociology, University of Washington Class Outline - The R Environment and Graphics Engine - Basic Graphs

More information

GenViewer Tutorial / Manual

GenViewer Tutorial / Manual GenViewer Tutorial / Manual Table of Contents Importing Data Files... 2 Configuration File... 2 Primary Data... 4 Primary Data Format:... 4 Connectivity Data... 5 Module Declaration File Format... 5 Module

More information

Package comphclust. February 15, 2013

Package comphclust. February 15, 2013 Package comphclust February 15, 2013 Version 1.0-1 Date 2010-02-27 Title Complementary Hierarchical Clustering Author Gen Nowak and Robert Tibshirani Maintainer Gen Nowak Description

More information

LECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I. Part Two. Introduction to R Programming. RStudio. November Written by. N.

LECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I. Part Two. Introduction to R Programming. RStudio. November Written by. N. LECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I Part Two Introduction to R Programming RStudio November 2016 Written by N.Nilgün Çokça Introduction to R Programming 5 Installing R & RStudio 5 The R Studio

More information

Section 1.2. Displaying Quantitative Data with Graphs. Mrs. Daniel AP Stats 8/22/2013. Dotplots. How to Make a Dotplot. Mrs. Daniel AP Statistics

Section 1.2. Displaying Quantitative Data with Graphs. Mrs. Daniel AP Stats 8/22/2013. Dotplots. How to Make a Dotplot. Mrs. Daniel AP Statistics Section. Displaying Quantitative Data with Graphs Mrs. Daniel AP Statistics Section. Displaying Quantitative Data with Graphs After this section, you should be able to CONSTRUCT and INTERPRET dotplots,

More information

Package EnQuireR. R topics documented: February 19, Type Package Title A package dedicated to questionnaires Version 0.

Package EnQuireR. R topics documented: February 19, Type Package Title A package dedicated to questionnaires Version 0. Type Package Title A package dedicated to questionnaires Version 0.10 Date 2009-06-10 Package EnQuireR February 19, 2015 Author Fournier Gwenaelle, Cadoret Marine, Fournier Olivier, Le Poder Francois,

More information

Graphics - Part III: Basic Graphics Continued

Graphics - Part III: Basic Graphics Continued Graphics - Part III: Basic Graphics Continued Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Highway MPG 20 25 30 35 40 45 50 y^i e i = y i y^i 2000 2500 3000 3500 4000 Car Weight Copyright

More information

Introduction to Minitab 1

Introduction to Minitab 1 Introduction to Minitab 1 We begin by first starting Minitab. You may choose to either 1. click on the Minitab icon in the corner of your screen 2. go to the lower left and hit Start, then from All Programs,

More information

Microarray Technology (Affymetrix ) and Analysis. Practicals

Microarray Technology (Affymetrix ) and Analysis. Practicals Data Analysis and Modeling Methods Microarray Technology (Affymetrix ) and Analysis Practicals B. Haibe-Kains 1,2 and G. Bontempi 2 1 Unité Microarray, Institut Jules Bordet 2 Machine Learning Group, Université

More information

June 2, 2016 by India Daniels and Neil Carlson Designed for Tableau Public

June 2, 2016 by India Daniels and Neil Carlson Designed for Tableau Public June 2, 2016 by India Daniels and Neil Carlson Designed for Tableau Public You can view the data visualization online by clicking on this link. Note: The workbook is best viewed on a large HD-compatible

More information

Data Science and Statistics in Research: unlocking the power of your data Session 3.4: Clustering

Data Science and Statistics in Research: unlocking the power of your data Session 3.4: Clustering Data Science and Statistics in Research: unlocking the power of your data Session 3.4: Clustering 1/ 1 OUTLINE 2/ 1 Overview 3/ 1 CLUSTERING Clustering is a statistical technique which creates groupings

More information

Demo yeast mutant analysis

Demo yeast mutant analysis Demo yeast mutant analysis Jean-Yves Sgro February 20, 2018 Contents 1 Analysis of yeast growth data 1 1.1 Set working directory........................................ 1 1.2 List all files in directory.......................................

More information

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber Exploring cdna Data Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber Practical DNA Microarray Analysis http://compdiag.molgen.mpg.de/ngfn/pma0nov.shtml The following exercise will guide you

More information

Exploring cdna Data. Achim Tresch, Andreas Buness, Wolfgang Huber, Tim Beißbarth

Exploring cdna Data. Achim Tresch, Andreas Buness, Wolfgang Huber, Tim Beißbarth Exploring cdna Data Achim Tresch, Andreas Buness, Wolfgang Huber, Tim Beißbarth Practical DNA Microarray Analysis http://compdiag.molgen.mpg.de/ngfn/pma0nov.shtml The following exercise will guide you

More information

Analysis of Genomic and Proteomic Data. Practicals. Benjamin Haibe-Kains. February 17, 2005

Analysis of Genomic and Proteomic Data. Practicals. Benjamin Haibe-Kains. February 17, 2005 Analysis of Genomic and Proteomic Data Affymetrix c Technology and Preprocessing Methods Practicals Benjamin Haibe-Kains February 17, 2005 1 R and Bioconductor You must have installed R (available from

More information

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber Exploring cdna Data Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber Practical DNA Microarray Analysis, Heidelberg, March 2005 http://compdiag.molgen.mpg.de/ngfn/pma2005mar.shtml The following

More information

AA BB CC DD EE. Introduction to Graphics in R

AA BB CC DD EE. Introduction to Graphics in R Introduction to Graphics in R Cori Mar 7/10/18 ### Reading in the data dat

More information

Canadian Bioinforma,cs Workshops.

Canadian Bioinforma,cs Workshops. Canadian Bioinforma,cs Workshops www.bioinforma,cs.ca Module #: Title of Module 2 Modified from Richard De Borja, Cindy Yao and Florence Cavalli R Review Objectives To review the basic commands in R To

More information

IBMSPSSSTATL1P: IBM SPSS Statistics Level 1

IBMSPSSSTATL1P: IBM SPSS Statistics Level 1 SPSS IBMSPSSSTATL1P IBMSPSSSTATL1P: IBM SPSS Statistics Level 1 Version: 4.4 QUESTION NO: 1 Which statement concerning IBM SPSS Statistics application windows is correct? A. At least one Data Editor window

More information

Package comphclust. May 4, 2017

Package comphclust. May 4, 2017 Version 1.0-3 Date 2017-05-04 Title Complementary Hierarchical Clustering Imports graphics, stats Package comphclust May 4, 2017 Description Performs the complementary hierarchical clustering procedure

More information

Introduction to R Reading, writing and exploring data

Introduction to R Reading, writing and exploring data Introduction to R Reading, writing and exploring data R-peer-group QUB February 12, 2013 R-peer-group (QUB) Session 2 February 12, 2013 1 / 26 Session outline Review of last weeks exercise Introduction

More information

Practical 2: Plotting

Practical 2: Plotting Practical 2: Plotting Complete this sheet as you work through it. If you run into problems, then ask for help - don t skip sections! Open Rstudio and store any files you download or create in a directory

More information

Quick introduction to descriptive statistics and graphs in. R Commander. Written by: Robin Beaumont

Quick introduction to descriptive statistics and graphs in. R Commander. Written by: Robin Beaumont Quick introduction to descriptive statistics and graphs in R Commander Written by: Robin Beaumont e-mail: robin@organplayers.co.uk http://www.robin-beaumont.co.uk/virtualclassroom/stats/course1.html Date

More information

Package OmicCircos. R topics documented: November 17, Version Date

Package OmicCircos. R topics documented: November 17, Version Date Version 1.16.0 Date 2015-02-23 Package OmicCircos November 17, 2017 Title High-quality circular visualization of omics data Author Maintainer Ying Hu biocviews Visualization,Statistics,Annotation

More information

Package KEGGprofile. February 10, 2018

Package KEGGprofile. February 10, 2018 Type Package Package KEGGprofile February 10, 2018 Title An annotation and visualization package for multi-types and multi-groups expression data in KEGG pathway Version 1.20.0 Date 2017-10-30 Author Shilin

More information

Lab 1 Introduction to R

Lab 1 Introduction to R Lab 1 Introduction to R Date: August 23, 2011 Assignment and Report Due Date: August 30, 2011 Goal: The purpose of this lab is to get R running on your machines and to get you familiar with the basics

More information

Data input & output. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Data input & output. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University. Data input & output Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University June 2012 1. Working directories 2. Loading data 3. Strings and factors

More information

GWAS Exercises 3 - GWAS with a Quantiative Trait

GWAS Exercises 3 - GWAS with a Quantiative Trait GWAS Exercises 3 - GWAS with a Quantiative Trait Peter Castaldi January 28, 2013 PLINK can also test for genetic associations with a quantitative trait (i.e. a continuous variable). In this exercise, we

More information

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked Plotting Menu: QCExpert Plotting Module graphs offers various tools for visualization of uni- and multivariate data. Settings and options in different types of graphs allow for modifications and customizations

More information

Hierarchical Clustering Tutorial

Hierarchical Clustering Tutorial Hierarchical Clustering Tutorial Ignacio Gonzalez, Sophie Lamarre, Sarah Maman, Luc Jouneau CATI Bios4Biol - Statistical group March 2017 To know about clustering There are two main methods: Classification

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Bradley Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org

More information

Package Daim. February 15, 2013

Package Daim. February 15, 2013 Package Daim February 15, 2013 Version 1.0.0 Title Diagnostic accuracy of classification models. Author Sergej Potapov, Werner Adler and Berthold Lausen. Several functions for evaluating the accuracy of

More information

Package varistran. July 25, 2016

Package varistran. July 25, 2016 Package varistran July 25, 2016 Title Variance Stabilizing Transformations appropriate for RNA-Seq expression data Transform RNA-Seq count data so that variance due to biological noise plus its count-based

More information

Tutorial (Beginner level): Orthomosaic and DEM Generation with Agisoft PhotoScan Pro 1.3 (with Ground Control Points)

Tutorial (Beginner level): Orthomosaic and DEM Generation with Agisoft PhotoScan Pro 1.3 (with Ground Control Points) Tutorial (Beginner level): Orthomosaic and DEM Generation with Agisoft PhotoScan Pro 1.3 (with Ground Control Points) Overview Agisoft PhotoScan Professional allows to generate georeferenced dense point

More information

Introduction to SPSS Edward A. Greenberg, PhD

Introduction to SPSS Edward A. Greenberg, PhD Introduction to SPSS Edward A. Greenberg, PhD ASU HEALTH SOLUTIONS DATA LAB JANUARY 7, 2013 Files for this workshop Files can be downloaded from: http://www.public.asu.edu/~eagle/spss or (with less typing):

More information

Correlation. January 12, 2019

Correlation. January 12, 2019 Correlation January 12, 2019 Contents Correlations The Scattterplot The Pearson correlation The computational raw-score formula Survey data Fun facts about r Sensitivity to outliers Spearman rank-order

More information

Gene Survey: FAQ. Gene Survey: FAQ Tod Casasent DRAFT

Gene Survey: FAQ. Gene Survey: FAQ Tod Casasent DRAFT Gene Survey: FAQ Tod Casasent 2016-02-22-1245 DRAFT 1 What is this document? This document is intended for use by internal and external users of the Gene Survey package, results, and output. This document

More information

Upper left area. It is R Script. Here you will write the code. 1. The goal of this problem set is to learn how to use HP filter in R.

Upper left area. It is R Script. Here you will write the code. 1. The goal of this problem set is to learn how to use HP filter in R. Instructions Setting up R Follow these steps if you are new with R: 1. Install R using the following link http://cran.us.r-project.org/ Make sure to select the version compatible with your operative system.

More information

Package gibbs.met. February 19, 2015

Package gibbs.met. February 19, 2015 Version 1.1-3 Title Naive Gibbs Sampling with Metropolis Steps Author Longhai Li Package gibbs.met February 19, 2015 Maintainer Longhai Li Depends R (>=

More information

Brief Guide on Using SPSS 10.0

Brief Guide on Using SPSS 10.0 Brief Guide on Using SPSS 10.0 (Use student data, 22 cases, studentp.dat in Dr. Chang s Data Directory Page) (Page address: http://www.cis.ysu.edu/~chang/stat/) I. Processing File and Data To open a new

More information

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student Organizing data Learning Outcome 1. make an array 2. divide the array into class intervals 3. describe the characteristics of a table 4. construct a frequency distribution table 5. constructing a composite

More information

CTL mapping in R. Danny Arends, Pjotr Prins, and Ritsert C. Jansen. University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1

CTL mapping in R. Danny Arends, Pjotr Prins, and Ritsert C. Jansen. University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1 CTL mapping in R Danny Arends, Pjotr Prins, and Ritsert C. Jansen University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1 First written: Oct 2011 Last modified: Jan 2018 Abstract: Tutorial

More information

Let s use Technology Use Data from Cycle 14 of the General Social Survey with Fathom for a data analysis project

Let s use Technology Use Data from Cycle 14 of the General Social Survey with Fathom for a data analysis project Let s use Technology Use Data from Cycle 14 of the General Social Survey with Fathom for a data analysis project Data Content: Example: Who chats on-line most frequently? This Technology Use dataset in

More information

No. of blue jelly beans No. of bags

No. of blue jelly beans No. of bags Math 167 Ch5 Review 1 (c) Janice Epstein CHAPTER 5 EXPLORING DATA DISTRIBUTIONS A sample of jelly bean bags is chosen and the number of blue jelly beans in each bag is counted. The results are shown in

More information

A brief introduction to R

A brief introduction to R A brief introduction to R Cavan Reilly September 29, 2017 Table of contents Background R objects Operations on objects Factors Input and Output Figures Missing Data Random Numbers Control structures Background

More information

Statistical Research Consultants Bangladesh (SRCBD) Testing for Normality using SPSS

Statistical Research Consultants Bangladesh (SRCBD)   Testing for Normality using SPSS Testing for Normality using SPSS An assessment of the normality of data is a prerequisite for many statistical tests because normal data is an underlying assumption in parametric testing. There are two

More information

Introduction to R for Epidemiologists

Introduction to R for Epidemiologists Introduction to R for Epidemiologists Jenna Krall, PhD Thursday, January 29, 2015 Final project Epidemiological analysis of real data Must include: Summary statistics T-tests or chi-squared tests Regression

More information

R-Programming Fundamentals for Business Students Cluster Analysis, Dendrograms, Word Cloud Clusters

R-Programming Fundamentals for Business Students Cluster Analysis, Dendrograms, Word Cloud Clusters R-Programming Fundamentals for Business Students Cluster Analysis, Dendrograms, Word Cloud Clusters Nick V. Flor, University of New Mexico (nickflor@unm.edu) Assumptions. This tutorial assumes (1) that

More information

The aplpack Package. April 15, 2006

The aplpack Package. April 15, 2006 The aplpack Package April 15, 2006 Type Package Title Another Plot PACKage: stem.leaf, bagplot, faces, spin3r,... Version 1.0 Date 2006-04-04 Author Peter Wolf, Uni Bielefeld Maintainer Peter Wolf

More information

haplo.score Score Tests for Association of Traits with Haplotypes when Linkage Phase is Ambiguous

haplo.score Score Tests for Association of Traits with Haplotypes when Linkage Phase is Ambiguous haploscore Score Tests for Association of Traits with Haplotypes when Linkage Phase is Ambiguous Charles M Rowland, David E Tines, and Daniel J Schaid Mayo Clinic Rochester, MN E-mail contact: rowland@mayoedu

More information

Package TROM. August 29, 2016

Package TROM. August 29, 2016 Type Package Title Transcriptome Overlap Measure Version 1.2 Date 2016-08-29 Package TROM August 29, 2016 Author Jingyi Jessica Li, Wei Vivian Li Maintainer Jingyi Jessica

More information

CIND123 Module 6.2 Screen Capture

CIND123 Module 6.2 Screen Capture CIND123 Module 6.2 Screen Capture Hello, everyone. In this segment, we will discuss the basic plottings in R. Mainly; we will see line charts, bar charts, histograms, pie charts, and dot charts. Here is

More information

Plotting Segment Calls From SNP Assay

Plotting Segment Calls From SNP Assay Plotting Segment Calls From SNP Assay Kevin R. Coombes 17 March 2011 Contents 1 Executive Summary 1 1.1 Introduction......................................... 1 1.1.1 Aims/Objectives..................................

More information

HIERARCHICAL clustering analysis (or HCA) is an

HIERARCHICAL clustering analysis (or HCA) is an A Data-Driven Approach to Estimating the Number of Clusters in Hierarchical Clustering* Antoine E. Zambelli arxiv:1608.04700v1 [q-bio.qm] 16 Aug 2016 Abstract We propose two new methods for estimating

More information

Introduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010

Introduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010 UCLA Statistical Consulting Center R Bootcamp Irina Kukuyeva ikukuyeva@stat.ucla.edu September 20, 2010 Outline 1 Introduction 2 Preliminaries 3 Working with Vectors and Matrices 4 Data Sets in R 5 Overview

More information

This tutorial is a similar analysis on the GBM data, but only with the 500 most biologically significant genes with respect to the survival time.

This tutorial is a similar analysis on the GBM data, but only with the 500 most biologically significant genes with respect to the survival time. R Tutorial: Geometric Interpretation of Gene Co-Expression Network Analysis, Applied to Brain Cancer Microarray Data Jun Dong, Steve Horvath Correspondence: shorvath@mednet.ucla.edu, http://www.ph.ucla.edu/biostat/people/horvath.htm

More information

Introduction for heatmap3 package

Introduction for heatmap3 package Introduction for heatmap3 package Shilin Zhao April 6, 2015 Contents 1 Example 1 2 Highlights 4 3 Usage 5 1 Example Simulate a gene expression data set with 40 probes and 25 samples. These samples are

More information

Computing with large data sets

Computing with large data sets Computing with large data sets Richard Bonneau, spring 2009 Lecture 8(week 5): clustering 1 clustering Clustering: a diverse methods for discovering groupings in unlabeled data Because these methods don

More information

Package pairsd3. R topics documented: August 29, Title D3 Scatterplot Matrices Version 0.1.0

Package pairsd3. R topics documented: August 29, Title D3 Scatterplot Matrices Version 0.1.0 Title D3 Scatterplot Matrices Version 0.1.0 Package pairsd3 August 29, 2016 Creates an interactive scatterplot matrix using the D3 JavaScript library. See for more information on D3.

More information

Wolf EMR SMART Forms Course workbook

Wolf EMR SMART Forms Course workbook SMART Forms Workbook.book Page 1 Monday, October 26, 2015 11:44 AM Wolf EMR SMART Forms Course workbook Wolf EMR v2015.1.7 Issue 01.02 SMART Forms Workbook.book Page 2 Monday, October 26, 2015 11:44 AM

More information

Stat 428 Autumn 2006 Homework 2 Solutions

Stat 428 Autumn 2006 Homework 2 Solutions Section 6.3 (5, 8) 6.3.5 Here is the Minitab output for the service time data set. Descriptive Statistics: Service Times Service Times 0 69.35 1.24 67.88 17.59 28.00 61.00 66.00 Variable Q3 Maximum Service

More information

Plotting: An Iterative Process

Plotting: An Iterative Process Plotting: An Iterative Process Plotting is an iterative process. First we find a way to represent the data that focusses on the important aspects of the data. What is considered an important aspect may

More information

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Florian Hahne, Wolfgang Huber. June 17, 2005

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Florian Hahne, Wolfgang Huber. June 17, 2005 Exploring cdna Data Achim Tresch, Andreas Buness, Tim Beißbarth, Florian Hahne, Wolfgang Huber June 7, 00 The following exercise will guide you through the first steps of a spotted cdna microarray analysis.

More information

1 Introduction. 1.1 What is Statistics?

1 Introduction. 1.1 What is Statistics? 1 Introduction 1.1 What is Statistics? MATH1015 Biostatistics Week 1 Statistics is a scientific study of numerical data based on natural phenomena. It is also the science of collecting, organising, interpreting

More information

MAIL MERGE DIRECTORY USE THE MAIL MERGE WIZARD

MAIL MERGE DIRECTORY USE THE MAIL MERGE WIZARD MAIL MERGE DIRECTORY USE THE MAIL MERGE WIZARD When working with the Mail Merge feature, it is possible to create several types of documents, such as directories. A directory is a list of the data in the

More information

Package gibbs.met documentation

Package gibbs.met documentation Version 1.1-2 Package gibbs.met documentation Title Naive Gibbs Sampling with Metropolis Steps Author Longhai Li of March 26, 2008 Maintainer Longhai Li

More information

Phone: Fax: Directions for setting up MARCO Insert Item #A-6LI 3 H x 4 W

Phone: Fax: Directions for setting up MARCO Insert Item #A-6LI 3 H x 4 W Phone: 1.866.289.9909 Fax: 1.866.545.5672 www.marcopromotionalproducts.com Directions for setting up MARCO Insert Item #A-6LI 3 H x 4 W Word Perfect Directions Step 1. Open Word Perfect Step 2. Click Format

More information

RhinoCFD Tutorial. Flow Past a Sphere

RhinoCFD Tutorial. Flow Past a Sphere RhinoCFD Tutorial Flow Past a Sphere RhinoCFD Ocial document produced by CHAM September 26, 2017 Introduction Flow Past a Sphere This tutorial will describe a simple calculation of ow around a sphere and

More information

#1#set Working directory #2# Download packages: source(" bioclite("affy") library (affy)

#1#set Working directory #2# Download packages: source(  bioclite(affy) library (affy) #1#set Working directory #2# Download packages: source("http://bioconductor.org/bioclite.r") bioclite("affy") library (affy) #3# Read the CEL files: Med

More information

Why use R? Getting started. Why not use R? Introduction to R: Log into tak. Start R R or. It s hard to use at first

Why use R? Getting started. Why not use R? Introduction to R: Log into tak. Start R R or. It s hard to use at first Why use R? Introduction to R: Using R for statistics ti ti and data analysis BaRC Hot Topics October 2011 George Bell, Ph.D. http://iona.wi.mit.edu/bio/education/r2011/ To perform inferential statistics

More information

Computer lab 2 Course: Introduction to R for Biologists

Computer lab 2 Course: Introduction to R for Biologists Computer lab 2 Course: Introduction to R for Biologists April 23, 2012 1 Scripting As you have seen, you often want to run a sequence of commands several times, perhaps with small changes. An efficient

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Using R for statistics and data analysis

Using R for statistics and data analysis Introduction ti to R: Using R for statistics and data analysis BaRC Hot Topics October 2011 George Bell, Ph.D. http://iona.wi.mit.edu/bio/education/r2011/ Why use R? To perform inferential statistics (e.g.,

More information

R/BioC Exercises & Answers: Unsupervised methods

R/BioC Exercises & Answers: Unsupervised methods R/BioC Exercises & Answers: Unsupervised methods Perry Moerland April 20, 2010 Z Information on how to log on to a PC in the exercise room and the UNIX server can be found here: http://bioinformaticslaboratory.nl/twiki/bin/view/biolab/educationbioinformaticsii.

More information

Practice for Learning R and Learning Latex

Practice for Learning R and Learning Latex Practice for Learning R and Learning Latex Jennifer Pan August, 2011 Latex Environments A) Try to create the following equations: 1. 5+6 α = β2 2. P r( 1.96 Z 1.96) = 0.95 ( ) ( ) sy 1 r 2 3. ˆβx = r xy

More information

Hierarchical clustering

Hierarchical clustering Aprendizagem Automática Hierarchical clustering Ludwig Krippahl Hierarchical clustering Summary Hierarchical Clustering Agglomerative Clustering Divisive Clustering Clustering Features 1 Aprendizagem Automática

More information

Package PropClust. September 15, 2018

Package PropClust. September 15, 2018 Type Package Title Propensity Clustering and Decomposition Version 1.4-6 Date 2018-09-12 Package PropClust September 15, 2018 Author John Michael O Ranola, Kenneth Lange, Steve Horvath, Peter Langfelder

More information

Package hbm. February 20, 2015

Package hbm. February 20, 2015 Type Package Title Hierarchical Block Matrix Analysis Version 1.0 Date 2015-01-25 Author Maintainer Package hbm February 20, 2015 A package for building hierarchical block matrices from

More information