Preservation of protein-protein interaction networks: Simple simulated example

Peter Langfelder and Steve Horvath

Contents

1 Overview
1.a Setting up the R session
2 Calculation of module preservation
3 Analysis of module preservation statistics
A Simulation of PPI networks

1 Overview

This document contains a simple illustration of the use of the function modulePreservation [1] to study the preservation of complexes in protein-protein interaction (PPI) networks. We simulate two PPI networks, each containing ten complexes of different sizes. Five of the complexes, labeled 1-5, are preserved between the two networks, while the other five complexes (labeled 6-10) are not preserved.

We encourage readers unfamiliar with any of the functions used in this tutorial to type, in an active R session, help(functionName) (replacing functionName with the actual name of the function) to get a detailed description of what the function does, what the input arguments mean, and what the output is.

1.a Setting up the R session

After starting R we execute a few commands to set the working directory and load the requisite packages:

    # Display the current working directory
    getwd();
    # If necessary, change the path below to the directory where the data files are stored.
    # "." means current directory. On Windows use a forward slash / instead of the usual \.
    workingDir = ".";
    setwd(workingDir);
    # Load the package
    library(WGCNA);
    # The following setting is important, do not omit.
    options(stringsAsFactors = FALSE);

2 Calculation of module preservation

We use simulated PPI networks that are generated using the code provided in Appendix A. For simplicity, we load the networks saved there.

    load(file = "simulatedppinetworks.rdata");

The above command loads two objects, PPInetwork1 and PPInetwork2. Each of them is a list with two components: the component adjacency contains the network adjacency matrix, and the component labels contains the module (or protein complex) labels. The modules are labeled by numbers. Proteins that are not part of any complex carry the label 0. To get a basic idea of how big the network is, we can use

    dim(PPInetwork1$adjacency)

which reports the number of proteins in the network. Also note that the columns of the adjacency matrix must carry protein names. In our example we simply named the simulated proteins "Protein.1", "Protein.2", and so on:

    colnames(PPInetwork1$adjacency)

Column names for the adjacency matrices are important because they allow the module preservation function to match proteins between the reference and test networks. Here we use the same proteins in the same order, but in practice this may not be the case.

We next create the multi-adjacency and the module multi-labels. These variables are lists with one component per data set. In this example we study two data sets, a reference set (1) and a test set (2). Note that the components of the list must be named. The names are used as identifiers for the data sets.

    multiAdj = list(
      network1 = list(data = PPInetwork1$adjacency),
      network2 = list(data = PPInetwork2$adjacency));
    multiLabels = list(network1 = PPInetwork1$labels);

We now call the modulePreservation function to calculate network module preservation statistics. This calculation may take up to a few hours, depending on the available computational speed.

    # The values of nPermutations and verbose are assumptions; the original settings are not legible.
    mp = modulePreservation(multiAdj, multiLabels,
                            dataIsExpr = FALSE,
                            referenceNetworks = 1,
                            restrictSummaryForGeneralNetworks = FALSE,
                            nPermutations = 100,
                            calculateCor.kIMall = FALSE,
                            verbose = 3);
    # Save the results
    save(mp, file = "mp.rdata");

We saved the results so that the calculation only needs to be run once. The results can be re-loaded using the following command:

    load(file = "mp.rdata");

3 Analysis of module preservation statistics

We now isolate the medianRank and the Z statistics and plot them as a function of module size.

    # [[1]][[2]] selects reference set 1 and test set 2. The first two rows are dropped on the
    # assumption that they hold the unassigned proteins (label 0) and the random reference module;
    # column 1 of the Z table holds the module sizes.
    stats = cbind(medianRank = mp$preservation$observed[[1]][[2]]$medianRank.pres[-c(1,2)],
                  mp$preservation$Z[[1]][[2]][-c(1,2), -1]);
    moduleSizes = mp$preservation$Z[[1]][[2]][-c(1,2), 1];
    # Order rows by module label
    order = order(as.numeric(rownames(stats)))
    stats = stats[order, ]
    moduleSizes = moduleSizes[order]
    labels = as.numeric(rownames(stats))
    # Indicate preserved modules by red color and non-preserved by black color
    preserved = c(1:5);
    presInd = match(preserved, labels);
    presColor = rep(1, length(labels));
    presColor[presInd] = 2;
    # Open a suitably sized graphics window or, alternatively, open a pdf file to hold the plot.
    # Window size, panel layout, margins and text sizes below are assumed values.
    sizeGrWindow(10, 9);
    #pdf(file=spaste("plots/ppisimulation-halfpreserved"), wi=10, he=9)
    # Set sectioning and margins
    par(mfrow = c(3,4))
    par(mar = c(3.3, 3.3, 4, 0.5))
    par(mgp = c(2.0, 0.6, 0))
    # Plot the individual statistics
    for (s in 1:ncol(stats))
    {
      min = min(stats[, s], na.rm = TRUE);
      max = max(stats[, s], na.rm = TRUE);
      if (s > 1)
      {
        # Z statistics: make sure the zero line is visible
        if (min > -max/10) min = -max/10;
      } else {
        # medianRank: reverse the axis so that better (lower) ranks appear at the top
        tmp = min; min = max; max = tmp;
      }
      plot(moduleSizes, stats[, s], main = colnames(stats)[s],
           ylab = colnames(stats)[s], type = "n", xlab = "",
           cex.main = 1.4, ylim = c(min, max))
      text(moduleSizes, stats[, s], labels = labels, col = presColor);
      box = par("usr");
      if (s==1)
        legend(x = box[2], y = (max+min)/2, xjust = 1, yjust = 0.5,
               legend = c("Preserved", "Non-preserved"), fill = c(2,1), cex = 0.8)
      if (s>1)
      {
        abline(h=0)
        abline(h=2, col = "blue", lty = 2);
        abline(h=10, col = "darkgreen", lty = 2);
      }
    }
    # If plotting into a file, close it.
    dev.off();

The resulting plot is shown in Figure 1. We note that in this example the composite statistics medianRank and Zsummary work best at separating the preserved and non-preserved modules. While medianRank appears largely independent of module size, the Z statistics for preserved modules show a marked dependence on module size. This agrees with the intuition that it is more significant to observe preservation of a pattern among many proteins than among a few.
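Because the preservation calculation can run for a long time, it may be worth checking the input format described in Section 2 (a named list with one entry per network, each holding its adjacency matrix in the component data, with protein names as column names) before starting it. The following is a minimal base-R sketch on a made-up four-protein example. The helper checkMultiAdj and the toy matrices are hypothetical and are not part of the WGCNA package.

```r
# Hypothetical helper (not part of WGCNA): basic sanity checks on a multi-adjacency list.
checkMultiAdj = function(multiAdj)
{
  # The list must be named; the names identify the data sets
  stopifnot(is.list(multiAdj), !is.null(names(multiAdj)), all(names(multiAdj) != ""));
  for (set in names(multiAdj))
  {
    adj = multiAdj[[set]]$data;
    stopifnot(is.matrix(adj), nrow(adj) == ncol(adj));
    stopifnot(!is.null(colnames(adj)));       # protein names are needed for matching
    stopifnot(isSymmetric(unname(adj)));      # undirected network
    stopifnot(all(adj >= 0 & adj <= 1));      # adjacencies lie between 0 and 1
  }
  # Return the proteins that are present in every network
  Reduce(intersect, lapply(multiAdj, function(x) colnames(x$data)));
}

# Toy example: two tiny networks that list the same proteins in a different order
proteins = paste0("Protein.", 1:4);
adj1 = diag(4);
adj1[1, 2] = adj1[2, 1] = 1;
dimnames(adj1) = list(proteins, proteins);
perm = c(3, 1, 4, 2);
adj2 = adj1[perm, perm];
toyMultiAdj = list(network1 = list(data = adj1), network2 = list(data = adj2));
common = checkMultiAdj(toyMultiAdj);
print(common);
```

The helper stops with an error at the first failed check and otherwise returns the shared protein names, which is the set of proteins that can be matched between the reference and test networks.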

[Figure 1 panels: medianRank, Zsummary, Zdensity, Zconnectivity, Z.propVarExplained, Z.meanKIM, Z.meanAdj, Z.meanClusterCoeff, Z.cor.kIM, Z.cor.kME, Z.cor.adj, Z.cor.clusterCoeff. The legend in the medianRank panel reads "Preserved" and "Non-preserved".]

Figure 1: Module preservation statistics of simulated modules in this study. Each plot shows one of the preservation statistics (indicated in the title) as a function of the module size. Modules are labeled by their numeric labels; red color denotes preserved and black non-preserved modules. The blue and green dashed lines denote the thresholds Z = 2 and Z = 10. The statistics medianRank and Zsummary do the best job of distinguishing the preserved and non-preserved modules in this study.
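The two thresholds marked in the figure, Z = 2 and Z = 10, are commonly read as follows [1]: values below 2 give no evidence of preservation, values between 2 and 10 give weak to moderate evidence, and values above 10 give strong evidence. A minimal base-R sketch of that binning is given below. The helper classifyPreservation is hypothetical, and the Z values are made up for illustration; they are not results of the simulation in this tutorial.

```r
# Hypothetical helper: bin Zsummary values using the thresholds 2 and 10.
classifyPreservation = function(Z)
{
  # Intervals are [-Inf, 2), [2, 10) and [10, Inf)
  cut(Z, breaks = c(-Inf, 2, 10, Inf),
      labels = c("none", "weak to moderate", "strong"), right = FALSE);
}

# Made-up Zsummary values, for illustration only
toyZ = c(module1 = 35.2, module2 = 12.1, module3 = 6.4, module4 = 1.3, module5 = -0.5);
evidence = classifyPreservation(toyZ);
print(data.frame(Zsummary = toyZ, evidence = evidence));
```

With real output, the same function could be applied to the Zsummary column of the Z table returned by modulePreservation.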

A Simulation of PPI networks

Here we generate the reference and test networks used in this tutorial. We start by defining two functions, one for simulating a protein complex (a group of densely interconnected proteins), and one for simulating a network consisting of several complexes.

    simulateComplex = function(nProteins, minScaledK, maxScaledK)
    {
      # Target connectivities, decreasing linearly from the most to the least connected protein
      k = seq(from = maxScaledK, to = minScaledK, length.out = nProteins) * nProteins;
      K = sum(k);
      adjacency = matrix(1, nProteins, nProteins);
      pmat = matrix(NA, nProteins, nProteins)
      for (i in 1:(nProteins-1)) for (j in (i+1):nProteins)
      {
        # Connection probability; the divisor 2 is an assumption (the digit is not legible)
        p = k[i]*k[j] / (K - (k[i] + k[j])/2);
        if (p > 1) p = 1;
        pmat[i,j] = pmat[j,i] = p;
        adjacency[i,j] = adjacency[j,i] = sample(c(0,1), size = 1, prob = c(1-p, p))
      }
      adjacency;
    }

    # The default values of minScaledK and maxScaledK are assumptions; the original decimals
    # are not legible.
    simulateProteinNetwork = function(
       complexSizes, nSingletons,
       minScaledK = 0.2, maxScaledK = 0.8,
       propMissingLinks = 0, propInterComplexLinks = 0)
    {
      # Derive the number of complexes here rather than relying on a global variable
      nComplexes = length(complexSizes);
      nProteins = sum(complexSizes) + nSingletons;
      adjacency = matrix(0, nProteins, nProteins);
      diag(adjacency) = 1;
      labels = rep(0, nProteins);
      # Block boundaries; the last block holds the singletons, so nSingletons must be positive
      starts = c(1, cumsum(complexSizes)+1);
      ends = c(cumsum(complexSizes), nProteins);
      # Links within complexes
      for (c in 1:nComplexes)
      {
        st = starts[c]; en = ends[c];
        adj.complex = simulateComplex(complexSizes[c], minScaledK, maxScaledK);
        adj.dst = as.dist(adj.complex);
        leaveOut = sample(c(FALSE, TRUE), size = length(adj.dst),
                          prob = c(1-propMissingLinks, propMissingLinks), replace = TRUE);
        adj.dst[leaveOut] = 0;
        adj.complex = as.matrix(adj.dst);
        diag(adj.complex) = 1;
        adjacency[st:en, st:en] = adj.complex;
        labels[st:en] = c;
      }
      # Links between different complexes and among the singletons
      for (c1 in 1:(nComplexes+1))
      {
        if (c1 <= nComplexes)
        {
          cx = c1 + 1
        } else
          cx = c1;
        for (c2 in cx:(nComplexes + 1))
        {
          st1 = starts[c1]; en1 = ends[c1];
          st2 = starts[c2]; en2 = ends[c2];
          n1 = en1 - st1 + 1;
          n2 = en2 - st2 + 1;
          interAdj = sample(c(0, 1), size = n1*n2,
                            prob = c(1-propInterComplexLinks, propInterComplexLinks),
                            replace = TRUE);
          dim(interAdj) = c(n1, n2);
          if (c1==c2)
          {
            interAdj = as.matrix(as.dist(interAdj));
            diag(interAdj) = 1;
          }
          adjacency[st1:en1, st2:en2] = interAdj;
          adjacency[st2:en2, st1:en1] = t(interAdj);
        }
      }
      colnames(adjacency) = spaste("Protein.", c(1:nProteins));
      rownames(adjacency) = spaste("Protein.", c(1:nProteins));
      list(adjacency = adjacency, labels = labels);
    }

We next define basic parameters of the simulation.

    # The complex sizes and the number of singletons are assumed placeholder values; the
    # original settings are not legible. The counts of complexes follow from the Overview.
    nComplexes = 10;
    nPreserved = 5;
    preserved = c(1:nPreserved)
    nNonPreserved = nComplexes - nPreserved;
    nonPreserved = c(1:nComplexes)[-preserved];
    complexSizes = seq(from = 100, to = 20, length.out = nPreserved);
    complexSizes = rep(complexSizes, 2);
    nSingletons = 100;

We call the simulation function twice to generate two separate networks that share the same complex structure but differ slightly in the details of the connections within complexes. For simplicity we do not simulate any connections between proteins in different complexes, although the above functions support it.

    # The seed value is an assumed placeholder.
    set.seed(1);
    PPInetwork1 = simulateProteinNetwork(complexSizes, nSingletons);
    PPInetwork2 = simulateProteinNetwork(complexSizes, nSingletons);

The networks can be visualized, for example, using the image function:

    # Window and file sizes are assumed values.
    sizeGrWindow(8, 8);
    #pdf(file = "Plots/networkImage.pdf", wi=8, he=8);
    image(PPInetwork1$adjacency, xaxt = "n", yaxt = "n")
    # If plotting into a file, close it.
    dev.off();

The plot is shown in Figure 2. The network image verifies that we have simulated complexes of different sizes.

Figure 2: Image of the simulated reference PPI network. Each row and column represents one protein; red color means not connected and white color means connected. Squares along the diagonal with dense connections correspond to simulated complexes.

We now permute the proteins in the non-preserved complexes in the test data set.

    starts = c(1, cumsum(complexSizes)+1);
    ends = cumsum(complexSizes);
    scramble = starts[min(nonPreserved)]:ends[max(nonPreserved)];
    newOrder = sample(scramble);
    PPInetwork2$adjacency[scramble, scramble] = PPInetwork2$adjacency[newOrder, newOrder];
    PPInetwork2$labels[scramble] = PPInetwork2$labels[newOrder];

Lastly, we save the networks for future use.

    save(PPInetwork1, PPInetwork2, file = "simulatedppinetworks.rdata");

The resulting file is used as input at the start of this tutorial.
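The effect of the permutation step can be seen on a small example. Rows and columns inside the scrambled block are reordered together while the protein names stay in place, so each named protein in that block takes over the connections of another protein, and the block outside the scrambled range is left untouched. The sketch below uses base R only, and the six-protein network in it is made up for illustration.

```r
# Toy network: proteins 1-3 form a complex that is kept, proteins 4-6 are scrambled
set.seed(42);
n = 6;
proteinNames = paste0("Protein.", 1:n);
adj = matrix(0, n, n, dimnames = list(proteinNames, proteinNames));
adj[1:3, 1:3] = 1;
adj[4, 5] = adj[5, 4] = 1;     # within the second group only proteins 4 and 5 are linked
diag(adj) = 1;

# Same pattern as in the tutorial: permute rows and columns of the block together
scramble = 4:6;
newOrder = sample(scramble);
scrambled = adj;
scrambled[scramble, scramble] = adj[newOrder, newOrder];

# Compare the scrambled block before and after
print(adj[scramble, scramble]);
print(scrambled[scramble, scramble]);
```

Whatever order is drawn, the result remains a symmetric adjacency matrix with the same number of links, which is why the scrambled complexes look unremarkable as a network but no longer match the reference labels.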

References

[1] Peter Langfelder, Rui Luo, Michael C. Oldham, and Steve Horvath. Is my network module preserved and reproducible? PLoS Comput Biol, 7(1):e1001057, 2011.

Simulation studies of module preservation: Simulation study of weak module preservation

Simulation studies of module preservation: Simulation study of weak module preservation Simulation studies of module preservation: Simulation study of weak module preservation Peter Langfelder and Steve Horvath October 25, 2010 Contents 1 Overview 1 1.a Setting up the R session............................................

More information

Short tutorial on studying module preservation: Preservation of female mouse liver modules in male data

Short tutorial on studying module preservation: Preservation of female mouse liver modules in male data Short tutorial on studying module preservation: Preservation of female mouse liver modules in male data Peter Langfelder and Steve Horvath October 1, 0 Contents 1 Overview 1 1.a Setting up the R session............................................

More information

Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice

Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice 2.b Step-by-step network construction and module detection Peter Langfelder and Steve

More information

Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice. 1. Data input and cleaning

Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice. 1. Data input and cleaning Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice 1. Data input and cleaning Peter Langfelder and Steve Horvath February 13, 2016 Contents

More information

Tutorial for the WGCNA package for R: III. Using simulated data to evaluate different module detection methods and gene screening approaches

Tutorial for the WGCNA package for R: III. Using simulated data to evaluate different module detection methods and gene screening approaches Tutorial for the WGCNA package for R: III. Using simulated data to evaluate different module detection methods and gene screening approaches 8. Visualization of gene networks Steve Horvath and Peter Langfelder

More information

Supplementary text S6 Comparison studies on simulated data

Supplementary text S6 Comparison studies on simulated data Supplementary text S Comparison studies on simulated data Peter Langfelder, Rui Luo, Michael C. Oldham, and Steve Horvath Corresponding author: shorvath@mednet.ucla.edu Overview In this document we illustrate

More information

AA BB CC DD EE. Introduction to Graphics in R

AA BB CC DD EE. Introduction to Graphics in R Introduction to Graphics in R Cori Mar 7/10/18 ### Reading in the data dat

More information

Statistical Programming Camp: An Introduction to R

Statistical Programming Camp: An Introduction to R Statistical Programming Camp: An Introduction to R Handout 3: Data Manipulation and Summarizing Univariate Data Fox Chapters 1-3, 7-8 In this handout, we cover the following new materials: ˆ Using logical

More information

Supplemental Data. Cañas et al. Plant Cell (2017) /tpc

Supplemental Data. Cañas et al. Plant Cell (2017) /tpc Supplemental Method 1. WGCNA script. #Microarray and Trait data load getwd() workingdir = "C:/Users/..." setwd(workingdir) library(wgcna) library(flashclust) options(stringsasfactors = FALSE) femdata =

More information

Graphics - Part III: Basic Graphics Continued

Graphics - Part III: Basic Graphics Continued Graphics - Part III: Basic Graphics Continued Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Highway MPG 20 25 30 35 40 45 50 y^i e i = y i y^i 2000 2500 3000 3500 4000 Car Weight Copyright

More information

Practice for Learning R and Learning Latex

Practice for Learning R and Learning Latex Practice for Learning R and Learning Latex Jennifer Pan August, 2011 Latex Environments A) Try to create the following equations: 1. 5+6 α = β2 2. P r( 1.96 Z 1.96) = 0.95 ( ) ( ) sy 1 r 2 3. ˆβx = r xy

More information

Clustering using WGCNA

Clustering using WGCNA Clustering using WGCNA Overview: The WGCNA package (in R) uses functions that perform a correlation network analysis of large, high-dimensional data sets (RNAseq datasets). This unbiased approach clusters

More information

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber Exploring cdna Data Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber Practical DNA Microarray Analysis, Heidelberg, March 2005 http://compdiag.molgen.mpg.de/ngfn/pma2005mar.shtml The following

More information

Graphics in R STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley

Graphics in R STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley Graphics in R STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 Base Graphics 2 Graphics in R Traditional

More information

Meta-analysis of aging methylation data sets Validation success of various meta-analysis methods in selecting genes

Meta-analysis of aging methylation data sets Validation success of various meta-analysis methods in selecting genes Meta-analysis of aging methylation data sets Validation success of various meta-analysis methods in selecting genes Peter Langfelder and Steve Horvath June 27, 2012 Contents 1 Overview 1 2 Setting up the

More information

Matrix algebra. Basics

Matrix algebra. Basics Matrix.1 Matrix algebra Matrix algebra is very prevalently used in Statistics because it provides representations of models and computations in a much simpler manner than without its use. The purpose of

More information

Module 10. Data Visualization. Andrew Jaffe Instructor

Module 10. Data Visualization. Andrew Jaffe Instructor Module 10 Data Visualization Andrew Jaffe Instructor Basic Plots We covered some basic plots on Wednesday, but we are going to expand the ability to customize these basic graphics first. 2/37 But first...

More information

Introduction to R for Epidemiologists

Introduction to R for Epidemiologists Introduction to R for Epidemiologists Jenna Krall, PhD Thursday, January 29, 2015 Final project Epidemiological analysis of real data Must include: Summary statistics T-tests or chi-squared tests Regression

More information

Graph tool instructions and R code

Graph tool instructions and R code Graph tool instructions and R code 1) Prepare data: tab-delimited format Data need to be inputted in a tab-delimited format. This can be easily achieved by preparing the data in a spread sheet program

More information

Package hbm. February 20, 2015

Package hbm. February 20, 2015 Type Package Title Hierarchical Block Matrix Analysis Version 1.0 Date 2015-01-25 Author Maintainer Package hbm February 20, 2015 A package for building hierarchical block matrices from

More information

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber Exploring cdna Data Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber Practical DNA Microarray Analysis http://compdiag.molgen.mpg.de/ngfn/pma0nov.shtml The following exercise will guide you

More information

Exploring cdna Data. Achim Tresch, Andreas Buness, Wolfgang Huber, Tim Beißbarth

Exploring cdna Data. Achim Tresch, Andreas Buness, Wolfgang Huber, Tim Beißbarth Exploring cdna Data Achim Tresch, Andreas Buness, Wolfgang Huber, Tim Beißbarth Practical DNA Microarray Analysis http://compdiag.molgen.mpg.de/ngfn/pma0nov.shtml The following exercise will guide you

More information

Lab 1 Introduction to R

Lab 1 Introduction to R Lab 1 Introduction to R Date: August 23, 2011 Assignment and Report Due Date: August 30, 2011 Goal: The purpose of this lab is to get R running on your machines and to get you familiar with the basics

More information

Statistical Programming with R

Statistical Programming with R Statistical Programming with R Lecture 9: Basic graphics in R Part 2 Bisher M. Iqelan biqelan@iugaza.edu.ps Department of Mathematics, Faculty of Science, The Islamic University of Gaza 2017-2018, Semester

More information

Shrinkage of logarithmic fold changes

Shrinkage of logarithmic fold changes Shrinkage of logarithmic fold changes Michael Love August 9, 2014 1 Comparing the posterior distribution for two genes First, we run a DE analysis on the Bottomly et al. dataset, once with shrunken LFCs

More information

Package JBTools. R topics documented: June 2, 2015

Package JBTools. R topics documented: June 2, 2015 Package JBTools June 2, 2015 Title Misc Small Tools and Helper Functions for Other Code of J. Buttlar Version 0.7.2.9 Date 2015-05-20 Author Maintainer Collection of several

More information

Introduction for heatmap3 package

Introduction for heatmap3 package Introduction for heatmap3 package Shilin Zhao April 6, 2015 Contents 1 Example 1 2 Highlights 4 3 Usage 5 1 Example Simulate a gene expression data set with 40 probes and 25 samples. These samples are

More information

Package lvm4net. R topics documented: August 29, Title Latent Variable Models for Networks

Package lvm4net. R topics documented: August 29, Title Latent Variable Models for Networks Title Latent Variable Models for Networks Package lvm4net August 29, 2016 Latent variable models for network data using fast inferential procedures. Version 0.2 Depends R (>= 3.1.1), MASS, ergm, network,

More information

Linkage analysis with paramlink Appendix: Running MERLIN from paramlink

Linkage analysis with paramlink Appendix: Running MERLIN from paramlink Linkage analysis with paramlink Appendix: Running MERLIN from paramlink Magnus Dehli Vigeland 1 Introduction While multipoint analysis is not implemented in paramlink, a convenient wrapper for MERLIN (arguably

More information

jackstraw: Statistical Inference using Latent Variables

jackstraw: Statistical Inference using Latent Variables jackstraw: Statistical Inference using Latent Variables Neo Christopher Chung August 7, 2018 1 Introduction This is a vignette for the jackstraw package, which performs association tests between variables

More information

(1) where, l. denotes the number of nodes to which both i and j are connected, and k is. the number of connections of a node, with.

(1) where, l. denotes the number of nodes to which both i and j are connected, and k is. the number of connections of a node, with. A simulated gene co-expression network to illustrate the use of the topological overlap matrix for module detection Steve Horvath, Mike Oldham Correspondence to shorvath@mednet.ucla.edu Abstract Here we

More information

Package HMRFBayesHiC

Package HMRFBayesHiC Package HMRFBayesHiC February 3, 2015 Type Package Title HMRFBayesHiC conduct Hidden Markov Random Field (HMRF) Bayes Peak Calling Method on HiC Data Version 1.0 Date 2015-01-30 Author Zheng Xu Maintainer

More information

Package EnQuireR. R topics documented: February 19, Type Package Title A package dedicated to questionnaires Version 0.

Package EnQuireR. R topics documented: February 19, Type Package Title A package dedicated to questionnaires Version 0. Type Package Title A package dedicated to questionnaires Version 0.10 Date 2009-06-10 Package EnQuireR February 19, 2015 Author Fournier Gwenaelle, Cadoret Marine, Fournier Olivier, Le Poder Francois,

More information

Preliminary Figures for Renormalizing Illumina SNP Cell Line Data

Preliminary Figures for Renormalizing Illumina SNP Cell Line Data Preliminary Figures for Renormalizing Illumina SNP Cell Line Data Kevin R. Coombes 17 March 2011 Contents 1 Executive Summary 1 1.1 Introduction......................................... 1 1.1.1 Aims/Objectives..................................

More information

Package sciplot. February 15, 2013

Package sciplot. February 15, 2013 Package sciplot February 15, 2013 Version 1.1-0 Title Scientific Graphing Functions for Factorial Designs Author Manuel Morales , with code developed by the R Development Core Team

More information

Plotting Complex Figures Using R. Simon Andrews v

Plotting Complex Figures Using R. Simon Andrews v Plotting Complex Figures Using R Simon Andrews simon.andrews@babraham.ac.uk v2017-11 The R Painters Model Plot area Base plot Overlays Core Graph Types Local options to change a specific plot Global options

More information

Plotting Segment Calls From SNP Assay

Plotting Segment Calls From SNP Assay Plotting Segment Calls From SNP Assay Kevin R. Coombes 17 March 2011 Contents 1 Executive Summary 1 1.1 Introduction......................................... 1 1.1.1 Aims/Objectives..................................

More information

INTRODUCTION TO R. Basic Graphics

INTRODUCTION TO R. Basic Graphics INTRODUCTION TO R Basic Graphics Graphics in R Create plots with code Replication and modification easy Reproducibility! graphics package ggplot2, ggvis, lattice graphics package Many functions plot()

More information

Assignment 4: Pimp my streamer

Assignment 4: Pimp my streamer Assignment 4: Pimp my streamer In this assignment you will clean and reorganize your code to produce an efficient Twitter streamer. NEW: 200% more emojis inside (+ native encodings) For the later part

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mining Massive Datasets Jure Leskovec, Stanford University http://cs46.stanford.edu /7/ Jure Leskovec, Stanford C46: Mining Massive Datasets Many real-world problems Web Search and Text Mining Billions

More information

Introduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010

Introduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010 UCLA Statistical Consulting Center R Bootcamp Irina Kukuyeva ikukuyeva@stat.ucla.edu September 20, 2010 Outline 1 Introduction 2 Preliminaries 3 Working with Vectors and Matrices 4 Data Sets in R 5 Overview

More information

Intro to R Graphics Center for Social Science Computation and Research, 2010 Stephanie Lee, Dept of Sociology, University of Washington

Intro to R Graphics Center for Social Science Computation and Research, 2010 Stephanie Lee, Dept of Sociology, University of Washington Intro to R Graphics Center for Social Science Computation and Research, 2010 Stephanie Lee, Dept of Sociology, University of Washington Class Outline - The R Environment and Graphics Engine - Basic Graphs

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 53 INTRO. TO DATA MINING Locality Sensitive Hashing (LSH) Huan Sun, CSE@The Ohio State University Slides adapted from Prof. Jiawei Han @UIUC, Prof. Srinivasan Parthasarathy @OSU MMDS Secs. 3.-3.. Slides

More information

R is a programming language of a higher-level Constantly increasing amount of packages (new research) Free of charge Website:

R is a programming language of a higher-level Constantly increasing amount of packages (new research) Free of charge Website: Introduction to R R R is a programming language of a higher-level Constantly increasing amount of packages (new research) Free of charge Website: http://www.r-project.org/ Code Editor: http://rstudio.org/

More information

CIND123 Module 6.2 Screen Capture

CIND123 Module 6.2 Screen Capture CIND123 Module 6.2 Screen Capture Hello, everyone. In this segment, we will discuss the basic plottings in R. Mainly; we will see line charts, bar charts, histograms, pie charts, and dot charts. Here is

More information

Package rafalib. R topics documented: August 29, Version 1.0.0

Package rafalib. R topics documented: August 29, Version 1.0.0 Version 1.0.0 Package rafalib August 29, 2016 Title Convenience Functions for Routine Data Eploration A series of shortcuts for routine tasks originally developed by Rafael A. Irizarry to facilitate data

More information

Package pandar. April 30, 2018

Package pandar. April 30, 2018 Title PANDA Algorithm Version 1.11.0 Package pandar April 30, 2018 Author Dan Schlauch, Joseph N. Paulson, Albert Young, John Quackenbush, Kimberly Glass Maintainer Joseph N. Paulson ,

More information

## For detailed description of RF clustering theory and algorithm, ## please consult the following references.

## For detailed description of RF clustering theory and algorithm, ## please consult the following references. ###################################################### ## Random Forest Clustering Tutorial ## ## ## ## Copyright 2005 Tao Shi, Steve Horvath ## ## ## ## emails: shidaxia@yahoo.com (Tao Shi) ## ## shorvath@mednet.ucla.edu

More information

Types of Plotting Functions. Managing graphics devices. Further High-level Plotting Functions. The plot() Function

Types of Plotting Functions. Managing graphics devices. Further High-level Plotting Functions. The plot() Function 3 / 23 5 / 23 Outline The R Statistical Environment R Graphics Peter Dalgaard Department of Biostatistics University of Copenhagen January 16, 29 1 / 23 2 / 23 Overview Standard R Graphics The standard

More information

Main Results. Kevin R, Coombes. 10 September 2011

Main Results. Kevin R, Coombes. 10 September 2011 Main Results Kevin R, Coombes 10 September 2011 Contents 1 Executive Summary 1 1.1 Introduction......................................... 1 1.1.1 Aims/Objectives.................................. 1 1.2

More information

Plotting: An Iterative Process

Plotting: An Iterative Process Plotting: An Iterative Process Plotting is an iterative process. First we find a way to represent the data that focusses on the important aspects of the data. What is considered an important aspect may

More information

Package areaplot. October 18, 2017

Package areaplot. October 18, 2017 Version 1.2-0 Date 2017-10-18 Package areaplot October 18, 2017 Title Plot Stacked Areas and Confidence Bands as Filled Polygons Imports graphics, grdevices, stats Suggests MASS Description Plot stacked

More information

STATISTICAL LABORATORY, April 30th, 2010 BIVARIATE PROBABILITY DISTRIBUTIONS

STATISTICAL LABORATORY, April 30th, 2010 BIVARIATE PROBABILITY DISTRIBUTIONS STATISTICAL LABORATORY, April 3th, 21 BIVARIATE PROBABILITY DISTRIBUTIONS Mario Romanazzi 1 MULTINOMIAL DISTRIBUTION Ex1 Three players play 1 independent rounds of a game, and each player has probability

More information

Meta-analysis of lung cancer expression data sets Validation success of various meta-analysis methods in selecting genes
