Preliminary Figures for Renormalizing Illumina SNP Cell Line Data

Kevin R. Coombes
17 March 2011

Contents
1 Executive Summary
1.1 Introduction
1.1.1 Aims/Objectives
1.2 Methods
1.2.1 Description of Data
1.2.2 Statistical Methods
1.3 Results
1.4 Conclusions
2 Details
2.1 Load All Segment Information
2.2 Read Segment Summary Information
3 Exploratory Figures
3.1 Segment Means By Number of BAF Components
3.2 All Segment Means
4 Appendix

1 Executive Summary

1.1 Introduction

This report describes the analysis of a data set from Lynn Barron, a member of the laboratory of Lynne V. Abruzzo. The data set was acquired using Illumina 610K SNP chips. The main goal of the study is to identify genetic abnormalities that are associated with clinical outcome (including overall survival and time-to-treatment). This is the seventh part of a series of related reports.

07-adjustFigs

1.1.1 Aims/Objectives

We noticed in the first figures (from a previous report with lung cancer cell lines) of the log R ratios (LRR) and B allele frequencies (BAF) that some of the data appeared inconsistent with our understanding of how to interpret the plots. We hypothesized that, for many if not most cell lines, the number of chromosomes present in a cell is far in excess of the typical value of 46. If so, this excess would likely cause the Illumina normalization procedure to scale all of the LRR data to values that are too small (since the normalization implicitly assumes that the intensities come from cells with about 46 chromosomes). The objective of this report is to test this hypothesis and to develop methods to correct for any distortions introduced by normalization.

1.2 Methods

1.2.1 Description of Data

The dataset contains measurements on 176 previously untreated patients with CLL. Extensive clinical followup is available.

1.2.2 Statistical Methods

Raw data were processed in BeadStudio to yield genotype calls, log R ratios (LRR), and B allele frequencies (BAF) for each SNP in each sample. Since the study does not include matched normal DNA, the BeadStudio computations were performed relative to the pool of 120 HapMap samples run by Illumina.

In Report 2, we applied the circular binary segmentation (CBS) algorithm to the intensity (log R ratio; LRR) data for each sample and each chromosome. CBS was first described by Olshen et al. [Biostatistics 2004; 5:557-72]; we use the implementation of CBS from the R package DNAcopy. In Report 3, we computed the odds ratio for loss of heterozygosity (LOH) versus no LOH in windows of width 40 along each chromosome. In Report 4, we applied the CBS algorithm to transformed B allele frequency (BAF) values on each chromosome of each cell line sample. In all three of those reports, we saved the segmentation results in per-sample files.

In Report 6, we pooled the segment data from the different algorithms for each sample.
We also computed summary statistics along each resulting segment, including the LRR mean and standard deviation, a summary of the genotypes for the SNPs in the region, and the best fit for modeling the BAF as a mixture of multiple components. In this report, we create figures showing the distribution of segment means as a function of the number of BAF components.

1.3 Results

1. Figures (see Figure 1) showing the segment means of each cell line, separated by chromosome and by the number of bands or components in the BAF plots, are stored in the subdirectory Adjust.
2. Figures (see Figure 2) showing all segments of each cell line, separated by chromosome, are stored in the subdirectory SegMeans.

1.4 Conclusions

The model-based approach to estimate the renormalization constant appears to work well.

2 Details

2.1 Load All Segment Information

We first read the sample names, which were saved by Report 1, so we can use them in the analysis.

> load("allsamplenames.rda")

2.2 Read Segment Summary Information

As mentioned above, Report 5 combined the segmentation information from the three methods (LRR, BAF, and LOH) into a single file per sample. In the next loop, we read all of the segment summaries and combine them into a single structure.

> library(nlme)
> memory.limit(2048)
[1]
> if (.USECACHE && file.exists("segs.rda")) {
+     load("segs.rda")
+ } else {
+     for (cid in shortnames) {
+         temp <- read.table(file.path("newmerge", paste(cid, "tsv", sep='.')),
+                            sep="\t", header=TRUE, fill=TRUE)
+         if (ncol(temp) != .expectedcolumns) {
+             stop(paste(cid, " segments have wrong number, ", ncol(temp), ", of columns", sep=''))
+         }
+         # Handle the case of purely numeric IDs.
+         temp$samid <- factor(temp$samid)
+         # Order the levels of the chromosome factor.
+         temp$chrom <- factor(temp$chrom, levels=c(1:22, "X"))
+         # Compute the percentage of AB calls per segment.
+         temp$abperc <- 100*temp$AB/(temp$AA+temp$AB+temp$BB)
+         # Append this sample's segments to the combined data frame.
+         if (exists('segs')) {
+             segs <- rbind(segs, temp)
+         } else {
+             segs <- temp
+         }
+     }
+     # clean up
+     rm(temp, cid)
+     gc()
+     save(segs, file="segs.rda")
+ }

For this analysis, we accept the conclusions from Report 5 about the number of BAF components. The numbers that were changed from 2 to 3 or 4, or from 3 to 4, are recorded in the file as 3.2, 4.2, or 4.3, respectively. We simply round these back to the nearest integer.

> table(segs$nbafcomp)
> segs$nbafcomp <- round(segs$nbafcomp)
> table(segs$nbafcomp)

3 Exploratory Figures

Here we generate a set of figures, one per sample (cell line). We will illustrate the plots for the following cell line:

> cid <- .EGSAMPLE
> cid
[1] "CL001"

3.1 Segment Means By Number of BAF Components

The underlying motivation is that we think we understand how to interpret a couple of the standard plots of the SNP data.

1. When a sample is homozygous, the BAF plot should show two bands (corresponding to the two possible genotypes, A or B). The LRR plot can be centered at the level corresponding to any half-integer (any value of n/2).
2. When a sample has the normal complement of two (heterozygous) copies of a chromosome, the BAF plot should show three bands (corresponding to the three possible genotypes: AA, AB, or BB) and the LRR plot should be centered along the value 0.

3. When a sample has three copies of a chromosome, the BAF plot should have four bands (corresponding to the genotypes AAA, AAB, ABB, or BBB) and the LRR plot should be centered above 0 (ideally at log(3/2)).

So, we identify all segments where the BAF plot has 2, 3, or 4 bands, using the following block of code.

> segway1 <- segs[!is.na(segs$samid) & segs$samid == cid & segs$num.mark >
+     50 & !is.na(segs$nbafcomp) & segs$nbafcomp == 2, ]
> segway2 <- segs[!is.na(segs$samid) & segs$samid == cid & segs$num.mark >
+     50 & !is.na(segs$nbafcomp) & segs$nbafcomp == 3 & !is.na(segs$abperc) &
+     segs$abperc > 10, ]
> segway3 <- segs[!is.na(segs$samid) & segs$samid == cid & segs$num.mark >
+     50 & !is.na(segs$nbafcomp) & segs$nbafcomp == 4 & !is.na(segs$abperc) &
+     segs$abperc > 10, ]

To plot the segment LRR means for the segments that contain three BAF bands (and are thus nominally regions of normal two-copy heterozygous chromosomes), we use the next block of code.

> if (nrow(segway2) > 0) {
+     temp <- segway2$seg.mean
+     temp[temp < -1.5] <- NA
+     plot(jitter(as.numeric(segway2$chrom)), temp, main = cid, xlim = c(1,
+         23), ylab = "Balanced heterozygous (BAF=3) segment means",
+         xlab = "Chromosome")
+     abline(h = median(segway2$seg.mean), col = "green")
+ } else {
+     plot(c(1, 23), c(0, 0), main = cid, type = "n", ylim = c(-0.1,
+         0.1), ylab = "Two-copy segment means", xlab = "Chromosome")
+ }
> abline(v = seq(1.5, 22.5), col = "gray", lwd = 2)
> abline(h = 0, col = "blue", lwd = 2)

To plot the segment LRR means for the segments that contain four BAF bands (and are thus nominally regions of three-copy heterozygous chromosomes), we use the next block of code.

> if (nrow(segway3) > 0) {
+     temp <- segway3$seg.mean
+     temp[temp < -1.5] <- NA
+     plot(jitter(as.numeric(segway3$chrom)), temp, xlim = c(1, 23),
+         main = cid, ylab = "Unbalanced heterozygous (BAF=4) segment means",
+         xlab = "Chromosome")
+     abline(h = median(segway3$seg.mean), col = "green")
+ } else {
+     plot(c(1, 23), c(0, 0), main = cid, type = "n", ylim = c(-0.1,
+         0.1), ylab = "Three-copy segment means", xlab = "Chromosome")
+ }
> abline(v = seq(1.5, 22.5), col = "gray", lwd = 2)
> abline(h = log10(3/2), col = "green", lwd = 2)
> abline(h = 0, col = "blue", lwd = 2)

To plot the segment LRR means for the segments that contain two BAF bands (and are thus regions of homozygous chromosomes), we use the next block of code.

> if (nrow(segway1) > 0 && sum(segway1$seg.mean > -1.5) > 0) {
+     temp <- segway1$seg.mean
+     temp[temp < -1.5] <- NA
+     plot(jitter(as.numeric(segway1$chrom)), temp, main = cid, xlim = c(1,
+         23), ylab = "Homozygous (BAF=2) segment means", xlab = "Chromosome")
+     abline(h = median(segway1$seg.mean), col = "blue")
+ } else {
+     plot(c(1, 23), c(0, 0), main = cid, type = "n", ylim = c(-0.1,
+         0.1), ylab = "Homozygous segment means", xlab = "Chromosome")
+ }
> abline(v = seq(1.5, 22.5), col = "gray", lwd = 2)
> abline(h = 0, col = "blue", lwd = 2)
> abline(h = log10(1/2), col = "orange", lwd = 2)

The results for cell line CL001 are shown in Figure 1. From the central panel, we see that there are at least three different characteristic intensity levels. The lowest of these levels, at approximately -0.3 or -0.2, presumably corresponds to the normal two-copy situation. Each higher level should double the number of copies (since the three-band BAF plot can only occur with equal numbers of heterozygous chromosomes). Similarly, the lower panel also shows multiple characteristic intensity levels. The lowest level (at approximately -0.3) presumably corresponds to the three-copy situation, with each additional level increasing the number of chromosomes by one (since a four-band BAF plot will occur whenever you have two heterozygous chromosomes in unequal numbers). Both panels strongly suggest that the LRR values are systematically biased down and need to be increased in order to get sensible interpretations.

Now we generate similar plots for all the cell lines and save copies in the subdirectory Adjust.
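Before running the batch loop, the reference levels discussed above can be made concrete with a short calculation. The following is a minimal sketch, assuming the log10(n/2) convention used for the reference lines in this report's plots. The helper name expected_lrr and the observed value of -0.25 are hypothetical and chosen only for illustration, and the additive shift is a naive stand-in, not the model-based estimate mentioned in the Conclusions.

```r
# Expected LRR for n copies relative to a two-copy reference, following the
# log10(n/2) convention of the reference lines drawn in the plots above.
# (expected_lrr is a hypothetical helper, not part of the original analysis.)
expected_lrr <- function(n) log10(n / 2)

# Balanced states (equal numbers of each allele) have even copy number;
# unbalanced heterozygous states start at three copies.
balanced   <- data.frame(copies = c(2, 4, 6))
unbalanced <- data.frame(copies = c(3, 4, 5))
balanced$lrr   <- expected_lrr(balanced$copies)
unbalanced$lrr <- expected_lrr(unbalanced$copies)
print(balanced)
print(unbalanced)

# Illustrative only: suppose the lowest three-band cluster is observed at
# -0.25 and is assumed to be the true two-copy level. The naive additive
# correction is the gap between the expected and observed values.
observed_two_copy <- -0.25
shift <- expected_lrr(2) - observed_two_copy
print(shift)
```

Under this convention the expected three-copy level is log10(3/2), about 0.176, so a lowest four-band cluster that sits well below that value is consistent with LRR values that have been scaled down.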
> if (!file.exists("adjust")) dir.create("adjust")
> for (cid in shortnames) {
+     <<identify.baf.segments>>
+     opar <- par(mfrow=c(3,1), mai=c(0.82, 0.8, 0.5, 0.1), bg='white')
+     <<plot.onecopy.lrr.means>>
+     <<plot.twocopy.lrr.means>>
+     <<plot.threecopy.lrr.means>>
+     par(opar)
+     fn <- file.path("adjust", paste(cid, "png", sep='.'))
+     dev.copy(png, filename=fn, width=800, height=600)
+     dev.off()
+ }

[Figure 1 panels, top to bottom: Homozygous (BAF=2) segment means; Balanced heterozygous (BAF=3) segment means; Unbalanced heterozygous (BAF=4) segment means. Horizontal axis in each panel: Chromosome.]

Figure 1: Segment means, by chromosome, of regions that are homozygous (top), nominally two-copy (center), or nominally three-copy (bottom).

3.2 All Segment Means

We have also found it useful to plot all of the segment means in a single graph per sample. Here is the code to create these plots:

> segway <- segs[segs$samid == cid, ]
> temp <- segway$seg.mean
> temp[temp < -1.5] <- NA
> ot <- order(segway$chrom, segway$loc.start)
> cs <- cumsum(table(segway$chrom))
> cs2 <- ((c(0, cs) + c(cs, max(cs)))/2)[1:23]
> plot(temp[ot], main = cid, pch = 16, cex = 1/2, xlab = "", xaxt = "n",
+     ylab = "LRR")
> mtext(c(1:22, "X"), side = 3, at = cs2, las = 2, line = 0.5)
> mtext(c(1:22, "X"), side = 1, at = cs2, las = 2, line = 0.5)
> abline(v = cs, col = "purple")
> abline(h = 0, col = "blue", lwd = 2)
> abline(h = log10(3/2), col = "green", lwd = 2)
> abline(h = log10(1/2), col = "orange", lwd = 2)

In Figure 2, we plot all the segment means for CL001. We see that the means are roughly centered about LRR = 0, which would be appropriate only if this cell line has close to the normal complement of 46 chromosomes. As we will see later, that is clearly not the case.

[Figure 2 plot: title CL001; vertical axis: LRR; chromosome labels run through X.]

Figure 2: Plot of LRR segment means, ordered along chromosomes, for cell line CL001.

As above, we generate similar plots for each cell line. These are stored in the subdirectory SegMeans.

> if (!file.exists("segmeans")) dir.create("segmeans")
> for (cid in shortnames) {
+     segway <- segs[segs$samid == cid, ]
+     temp <- segway$seg.mean
+     temp[temp < -1.5] <- NA
+     ot <- order(segway$chrom, segway$loc.start)
+     cs <- cumsum(table(segway$chrom))
+     cs2 <- ((c(0, cs) + c(cs, max(cs)))/2)[1:23]
+     plot(temp[ot], main = cid, pch = 16, cex = 1/2, xlab = "", xaxt = "n",
+         ylab = "LRR")
+     mtext(c(1:22, "X"), side = 3, at = cs2, las = 2, line = 0.5)
+     mtext(c(1:22, "X"), side = 1, at = cs2, las = 2, line = 0.5)
+     abline(v = cs, col = "purple")
+     abline(h = 0, col = "blue", lwd = 2)
+     abline(h = log10(3/2), col = "green", lwd = 2)
+     abline(h = log10(1/2), col = "orange", lwd = 2)
+     fn <- file.path("segmeans", paste(cid, "png", sep = "."))
+     dev.copy(png, filename = fn, width = 800, height = 600)
+     dev.off()
+ }

4 Appendix

This analysis was run in the following directory:

> getwd()
[1] "o:/private/abruzzo/snp-cll/aa"

Note that \\mdadqsfs02 is the standard institutional location for storing data and analyses; N: is the name given to that location on this machine.

This analysis was run in the following software environment:

> sessionInfo()
R version ( )
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] nlme_

loaded via a namespace (and not attached):
[1] cluster_ grid_ lattice_ RColorBrewer_1.0-2
[5] xtable_1.5-6

> while (!is.null(dev.list())) dev.off()
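A note on the label placement in Section 3.2: the vector cs2 holds the midpoint of each chromosome's block of plotted segments, and cs holds the block boundaries. The following is a minimal sketch on toy data that shows the arithmetic. The three-chromosome factor and the names ends, starts, mids, and mids_report are hypothetical and exist only for this illustration.

```r
# Toy stand-in for segway$chrom: 4 segments on chromosome 1, 2 on
# chromosome 2, and 6 on X (made-up counts, for illustration only).
chrom <- factor(rep(c("1", "2", "X"), times = c(4, 2, 6)),
                levels = c("1", "2", "X"))

# Index of the last segment on each chromosome; these are the positions
# of the vertical separator lines.
ends <- unname(cumsum(table(chrom)))

# Index just before the first segment on each chromosome.
starts <- c(0, head(ends, -1))

# Midpoints between consecutive boundaries, used to place the labels.
mids <- (starts + ends) / 2

# The same values via the expression used in Section 3.2, with 23
# replaced by the number of chromosomes in the toy data.
mids_report <- ((c(0, ends) + c(ends, max(ends))) / 2)[seq_along(ends)]
stopifnot(isTRUE(all.equal(mids, mids_report)))
print(mids)
```

Writing the midpoints as (starts + ends) / 2 avoids the hard-coded index range 1:23, which silently assumes that all 23 chromosome levels are present in the factor.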


More information

Using the RCircos Package

Using the RCircos Package Using the RCircos Package Hongen Zhang, Ph.D. Genetics Branch, Center for Cancer Research, National Cancer Institute, NIH August 01, 2016 Contents 1 Introduction 1 2 Input Data Format 2 3 Plot Track Layout

More information

Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice. 1. Data input and cleaning

Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice. 1. Data input and cleaning Tutorial for the WGCNA package for R II. Consensus network analysis of liver expression data, female and male mice 1. Data input and cleaning Peter Langfelder and Steve Horvath February 13, 2016 Contents

More information

Statistical Programming Camp: An Introduction to R

Statistical Programming Camp: An Introduction to R Statistical Programming Camp: An Introduction to R Handout 3: Data Manipulation and Summarizing Univariate Data Fox Chapters 1-3, 7-8 In this handout, we cover the following new materials: ˆ Using logical

More information

RAPIDR. Kitty Lo. November 20, Intended use of RAPIDR 1. 2 Create binned counts file from BAMs Masking... 1

RAPIDR. Kitty Lo. November 20, Intended use of RAPIDR 1. 2 Create binned counts file from BAMs Masking... 1 RAPIDR Kitty Lo November 20, 2014 Contents 1 Intended use of RAPIDR 1 2 Create binned counts file from BAMs 1 2.1 Masking.................................................... 1 3 Build the reference 2 3.1

More information

Practical 2: Plotting

Practical 2: Plotting Practical 2: Plotting Complete this sheet as you work through it. If you run into problems, then ask for help - don t skip sections! Open Rstudio and store any files you download or create in a directory

More information

Genome Assembly Using de Bruijn Graphs. Biostatistics 666

Genome Assembly Using de Bruijn Graphs. Biostatistics 666 Genome Assembly Using de Bruijn Graphs Biostatistics 666 Previously: Reference Based Analyses Individual short reads are aligned to reference Genotypes generated by examining reads overlapping each position

More information

Package sciplot. February 15, 2013

Package sciplot. February 15, 2013 Package sciplot February 15, 2013 Version 1.1-0 Title Scientific Graphing Functions for Factorial Designs Author Manuel Morales , with code developed by the R Development Core Team

More information

Step-by-Step Guide to Basic Genetic Analysis

Step-by-Step Guide to Basic Genetic Analysis Step-by-Step Guide to Basic Genetic Analysis Page 1 Introduction This document shows you how to clean up your genetic data, assess its statistical properties and perform simple analyses such as case-control

More information

Introduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010

Introduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010 UCLA Statistical Consulting Center R Bootcamp Irina Kukuyeva ikukuyeva@stat.ucla.edu September 20, 2010 Outline 1 Introduction 2 Preliminaries 3 Working with Vectors and Matrices 4 Data Sets in R 5 Overview

More information

data visualization Show the Data Snow Month skimming deep waters

data visualization Show the Data Snow Month skimming deep waters data visualization skimming deep waters Show the Data Snow 2 4 6 8 12 Minimize Distraction Minimize Distraction Snow 2 4 6 8 12 2 4 6 8 12 Make Big Data Coherent Reveal Several Levels of Detail 1974 1975

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,

More information

36-402/608 HW #1 Solutions 1/21/2010

36-402/608 HW #1 Solutions 1/21/2010 36-402/608 HW #1 Solutions 1/21/2010 1. t-test (20 points) Use fullbumpus.r to set up the data from fullbumpus.txt (both at Blackboard/Assignments). For this problem, analyze the full dataset together

More information

Package R2SWF. R topics documented: February 15, Version 0.4. Title Convert R Graphics to Flash Animations. Date

Package R2SWF. R topics documented: February 15, Version 0.4. Title Convert R Graphics to Flash Animations. Date Package R2SWF February 15, 2013 Version 0.4 Title Convert R Graphics to Flash Animations Date 2012-07-14 Author Yixuan Qiu and Yihui Xie Maintainer Yixuan Qiu Suggests XML, Cairo

More information

Package vipor. March 22, 2017

Package vipor. March 22, 2017 Type Package Package vipor March 22, 2017 Title Plot Categorical Data Using Quasirandom Noise and Density Estimates Version 0.4.5 Date 2017-03-22 Author Scott Sherrill-Mix, Erik Clarke Maintainer Scott

More information

ELAI user manual. Yongtao Guan Baylor College of Medicine. Version June Copyright 2. 3 A simple example 2

ELAI user manual. Yongtao Guan Baylor College of Medicine. Version June Copyright 2. 3 A simple example 2 ELAI user manual Yongtao Guan Baylor College of Medicine Version 1.0 25 June 2015 Contents 1 Copyright 2 2 What ELAI Can Do 2 3 A simple example 2 4 Input file formats 3 4.1 Genotype file format....................................

More information

How to use cghmcr. October 30, 2017

How to use cghmcr. October 30, 2017 How to use cghmcr Jianhua Zhang Bin Feng October 30, 2017 1 Overview Copy number data (arraycgh or SNP) can be used to identify genomic regions (Regions Of Interest or ROI) showing gains or losses that

More information

SNPViewer Documentation

SNPViewer Documentation SNPViewer Documentation Module name: Description: Author: SNPViewer Displays SNP data plotting copy numbers and LOH values Jim Robinson (Broad Institute), gp-help@broad.mit.edu Summary: The SNPViewer displays

More information

Operating instructions for MixtureCalc v1.2 (Freeware Version)

Operating instructions for MixtureCalc v1.2 (Freeware Version) 1 Objective To provide instructions for importing data into and for performing mixture calculations using MixtureCalc-v1.2. MixtureCalc-v1.2 is validated for SPSA casework only and no warranty is provided

More information

Univariate Data - 2. Numeric Summaries

Univariate Data - 2. Numeric Summaries Univariate Data - 2. Numeric Summaries Young W. Lim 2018-08-01 Mon Young W. Lim Univariate Data - 2. Numeric Summaries 2018-08-01 Mon 1 / 36 Outline 1 Univariate Data Based on Numerical Summaries R Numeric

More information

KaryoStudio v1.4 User Guide

KaryoStudio v1.4 User Guide KaryoStudio v1.4 User Guide FOR RESEARCH USE ONLY ILLUMINA PROPRIETARY Part # 11328837 Rev. C June 2011 Notice This document and its contents are proprietary to Illumina, Inc. and its affiliates ("Illumina"),

More information

The nor1mix Package. August 3, 2006

The nor1mix Package. August 3, 2006 The nor1mix Package August 3, 2006 Title Normal (1-d) Mixture Models (S3 Classes and Methods) Version 1.0-6 Date 2006-08-02 Author: Martin Mächler Maintainer Martin Maechler

More information

Release Notes. JMP Genomics. Version 4.0

Release Notes. JMP Genomics. Version 4.0 JMP Genomics Version 4.0 Release Notes Creativity involves breaking out of established patterns in order to look at things in a different way. Edward de Bono JMP. A Business Unit of SAS SAS Campus Drive

More information

Package gsbdesign. March 29, 2016

Package gsbdesign. March 29, 2016 Type Package Title Group Sequential Bayes Design Version 1.00 Date 2016-03-27 Author Florian Gerber, Thomas Gsponer Depends gsdesign, lattice, grid, Imports stats, graphics, grdevices, utils Package gsbdesign

More information

Regression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables:

Regression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables: Regression Lab The data set cholesterol.txt available on your thumb drive contains the following variables: Field Descriptions ID: Subject ID sex: Sex: 0 = male, = female age: Age in years chol: Serum

More information

Microarray Technology (Affymetrix ) and Analysis. Practicals

Microarray Technology (Affymetrix ) and Analysis. Practicals Data Analysis and Modeling Methods Microarray Technology (Affymetrix ) and Analysis Practicals B. Haibe-Kains 1,2 and G. Bontempi 2 1 Unité Microarray, Institut Jules Bordet 2 Machine Learning Group, Université

More information

Introduction for heatmap3 package

Introduction for heatmap3 package Introduction for heatmap3 package Shilin Zhao April 6, 2015 Contents 1 Example 1 2 Highlights 4 3 Usage 5 1 Example Simulate a gene expression data set with 40 probes and 25 samples. These samples are

More information

Analysis of two-way cell-based assays

Analysis of two-way cell-based assays Analysis of two-way cell-based assays Lígia Brás, Michael Boutros and Wolfgang Huber April 16, 2015 Contents 1 Introduction 1 2 Assembling the data 2 2.1 Reading the raw intensity files..................

More information

Package copynumber. November 21, 2017

Package copynumber. November 21, 2017 Package copynumber November 21, 2017 Title Segmentation of single- and multi-track copy number data by penalized least squares regression. Version 1.19.0 Author Gro Nilsen, Knut Liestoel and Ole Christian

More information

> glucose = c(81, 85, 93, 93, 99, 76, 75, 84, 78, 84, 81, 82, 89, + 81, 96, 82, 74, 70, 84, 86, 80, 70, 131, 75, 88, 102, 115, + 89, 82, 79, 106)

> glucose = c(81, 85, 93, 93, 99, 76, 75, 84, 78, 84, 81, 82, 89, + 81, 96, 82, 74, 70, 84, 86, 80, 70, 131, 75, 88, 102, 115, + 89, 82, 79, 106) This document describes how to use a number of R commands for plotting one variable and for calculating one variable summary statistics Specifically, it describes how to use R to create dotplots, histograms,

More information

Affymetrix Genotyping Console 4.2 Release Notes (For research use only. Not for use in diagnostic procedures.)

Affymetrix Genotyping Console 4.2 Release Notes (For research use only. Not for use in diagnostic procedures.) Affymetrix Genotyping Console 4.2 Release Notes (For research use only. Not for use in diagnostic procedures.) Genotyping Console 4.2 includes the following changes and enhancements: 1. Edit Calls within

More information

Package OmicCircos. R topics documented: November 17, Version Date

Package OmicCircos. R topics documented: November 17, Version Date Version 1.16.0 Date 2015-02-23 Package OmicCircos November 17, 2017 Title High-quality circular visualization of omics data Author Maintainer Ying Hu biocviews Visualization,Statistics,Annotation

More information

Graphics in R STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley

Graphics in R STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley Graphics in R STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 Base Graphics 2 Graphics in R Traditional

More information

Importing and visualizing data in R. Day 3

Importing and visualizing data in R. Day 3 Importing and visualizing data in R Day 3 R data.frames Like pandas in python, R uses data frame (data.frame) object to support tabular data. These provide: Data input Row- and column-wise manipulation

More information

Validating predictions on the Chang data

Validating predictions on the Chang data Validating predictions on the Chang data Kevin R. Coombes, Jing Wang, and Keith A. Baggerly 13 March 2007 1 Description of the problem We want to test the predictions from five models, built on training

More information

RLMM - Robust Linear Model with Mahalanobis Distance Classifier

RLMM - Robust Linear Model with Mahalanobis Distance Classifier RLMM - Robust Linear Model with Mahalanobis Distance Classifier Nusrat Rabbee and Gary Wong June 13, 2018 Contents 1 Introduction 1 2 Instructions for Genotyping Affymetrix Mapping 100K array - Xba set

More information

Genetic Analysis. Page 1

Genetic Analysis. Page 1 Genetic Analysis Page 1 Genetic Analysis Objectives: 1) Set up Case-Control Association analysis and the Basic Genetics Workflow 2) Use JMP tools to interact with and explore results 3) Learn advanced

More information

SIBER User Manual. Pan Tong and Kevin R Coombes. May 27, Introduction 1

SIBER User Manual. Pan Tong and Kevin R Coombes. May 27, Introduction 1 SIBER User Manual Pan Tong and Kevin R Coombes May 27, 2015 Contents 1 Introduction 1 2 Using SIBER 1 2.1 A Quick Example........................................... 1 2.2 Dealing With RNAseq Normalization................................

More information

Averages and Variation

Averages and Variation Averages and Variation 3 Copyright Cengage Learning. All rights reserved. 3.1-1 Section 3.1 Measures of Central Tendency: Mode, Median, and Mean Copyright Cengage Learning. All rights reserved. 3.1-2 Focus

More information

An Introduction to the genoset Package

An Introduction to the genoset Package An Introduction to the genoset Package Peter M. Haverty April 4, 2013 Contents 1 Introduction 2 1.1 Creating Objects........................................... 2 1.2 Accessing Genome Information...................................

More information