Preliminary Figures for Renormalizing Illumina SNP Cell Line Data
Kevin R. Coombes
17 March 2011

Contents
1 Executive Summary
  1.1 Introduction
    1.1.1 Aims/Objectives
  1.2 Methods
    1.2.1 Description of Data
    1.2.2 Statistical Methods
  1.3 Results
  1.4 Conclusions
2 Details
  2.1 Load All Segment Information
  2.2 Read Segment Summary Information
3 Exploratory Figures
  3.1 Segment Means By Number of BAF Components
  3.2 All Segment Means
4 Appendix

1 Executive Summary

1.1 Introduction
This report describes the analysis of a data set from Lynn Barron, a member of the laboratory of Lynne V. Abruzzo. This dataset was acquired using Illumina 610K SNP chips. The main goal of the study is to identify genetic abnormalities that are associated with clinical outcome (including overall survival and time-to-treatment). This is the seventh part of a series of related reports.
07-adjustFigs

1.1.1 Aims/Objectives
In the first figures of the log R ratios (LRR) and B allele frequencies (BAF), from a previous report on lung cancer cell lines, we noticed that some of the data appeared inconsistent with our understanding of how to interpret the plots. We hypothesized that, for many if not most cell lines, the number of chromosomes present in a cell is far in excess of the typical value of 46. If so, this excess would likely cause the Illumina normalization procedure to scale all of the LRR data to a value that is too small (since the normalization implicitly assumes that the intensities come from cells with about 46 chromosomes). The objective of this report is to test this hypothesis and to try to develop methods to correct for any distortions introduced by normalization.

1.2 Methods

1.2.1 Description of Data
The dataset contains measurements on 176 previously untreated patients with CLL. Extensive clinical follow-up is available.

1.2.2 Statistical Methods
Raw data were processed in BeadStudio to yield genotype calls, log R ratios (LRR), and B allele frequencies (BAF) for each SNP in each sample. Since the study does not include matched normal DNA, the BeadStudio computations were performed relative to the pool of 120 HapMap samples run by Illumina.

In Report 2, we applied the circular binary segmentation (CBS) algorithm to the intensity (log R ratio; LRR) data for each sample and each chromosome. CBS was first described by Olshen et al. [Biostatistics 2004; 5:557-72]; we use the implementation of CBS from the R package DNAcopy.

In Report 3, we computed the odds ratio for LOH versus no LOH in windows of width 40 along each chromosome.

In Report 4, we applied the CBS algorithm to transformed B allele frequency (BAF) values on each chromosome of each cell line sample.

In all three of those reports, we saved the segmentation results in per-sample files. In Report 6, we pooled the segment data from the different algorithms for each sample. We also computed summary statistics along each resulting segment, including the LRR mean and standard deviation, a summary of the genotypes for the SNPs in the region, and the best fit for modeling the BAF as a mixture of multiple components.

In this report, we create figures showing the distribution of segment means as a function of the number of BAF components.

1.3 Results
1. Figures (see Figure 1) showing the segment means of each cell line, separated by chromosome and by the number of bands (components) in the BAF plots, are stored in the subdirectory Adjust.
2. Figures (see Figure 2) showing all segments of each cell line, separated by chromosome, are stored in the subdirectory SegMeans.
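To make the hypothesis from the Aims/Objectives concrete, the following sketch builds a toy table of segments and summarizes the segment means by the number of BAF components, which is the grouping that the figures listed above display. It assumes an idealized relation in which the observed LRR equals log10(copies / average ploidy); that formula, the helper name `expectedLRR`, the toy copy numbers, and the assumed ploidy of 4 are all introduced here for illustration and are not taken from BeadStudio or from the earlier reports. The base-10 logarithm is used only to match the reference lines log10(3/2) and log10(1/2) drawn in the plotting code.

```r
# Hypothetical helper: expected LRR when the sample's average ploidy,
# rather than two copies, is centered at LRR = 0 (an idealized assumption).
expectedLRR <- function(copies, ploidy = 2) log10(copies / ploidy)

# Toy segments for an assumed near-tetraploid line (made-up values):
# three-band segments carry 2 or 4 copies, four-band segments 3 or 4 copies.
toy <- data.frame(
  nbafcomp = c(3, 3, 3, 4, 4, 4),
  copies   = c(2, 2, 4, 3, 3, 4)
)
toy$seg.mean <- expectedLRR(toy$copies, ploidy = 4)

# Median segment mean for each number of BAF components.
medians <- tapply(toy$seg.mean, toy$nbafcomp, median)

# Offset that would move the lowest three-band level back to 0.
shift <- -min(toy$seg.mean[toy$nbafcomp == 3])

print(round(medians, 3))
print(round(shift, 3))
```

Under these assumptions the three-band median falls near -0.3 rather than 0, which is the kind of downward offset the hypothesis predicts. Real segment means are noisier, so this is only a reference calculation, not an estimation procedure.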
1.4 Conclusions
The model-based approach to estimate the renormalization constant appears to work well.

2 Details

2.1 Load All Segment Information
We first read the sample names, which were saved by Report 1, so we can use them in the analysis.
> load("allsamplenames.rda")

2.2 Read Segment Summary Information
As mentioned above, Report 5(1) combined the segmentation information from the three methods (LRR, BAF, and LOH) into a single file per sample. In the next loop, we read all of the segment summaries and combine them into a single structure.
> library(nlme)
> memory.limit(2048)
[1]
> if (.USECACHE & file.exists("segs.rda")) {
+     load("segs.rda")
+ } else {
+     for (cid in shortnames) {
+         temp <- read.table(file.path("newmerge", paste(cid, "tsv", sep='.')),
+                            sep="\t", header=TRUE, fill=TRUE)
+         if (ncol(temp) != .expectedcolumns) {
+             stop(paste(cid, " segments have wrong number, ", ncol(temp), ", of columns", sep=''))
+         }
+         # Handle the case of purely numeric IDs.
+         temp$samid <- factor(temp$samid)
+         # Order the levels of the chromosome factor.
+         temp$chrom <- factor(temp$chrom, levels=c(1:22, "X"))
+         # Compute the percentage of AB calls per segment.
+         temp$abperc <- 100*temp$AB/(temp$AA+temp$AB+temp$BB)
+         if (exists('segs')) {
+             segs <- rbind(segs, temp)
+         } else {
+             segs <- temp
+         }
+     }
+     # clean up
+     rm(temp, cid)
+     gc()
+     save(segs, file="segs.rda")
+ }

For this analysis, we accept the conclusions from Report 5 about the number of BAF components. The numbers that were changed from 2 to 3 or 4, or from 3 to 4, are recorded in the file as 3.2, 4.2, or 4.3, respectively. We simply round these back to the nearest integer.
> table(segs$nbafcomp)
> segs$nbafcomp <- round(segs$nbafcomp)
> table(segs$nbafcomp)

3 Exploratory Figures
Here we generate a set of figures, one per sample (cell line). We will illustrate the plots for the following cell line:
> cid <- .EGSAMPLE
> cid
[1] "CL001"

3.1 Segment Means By Number of BAF Components
The underlying motivation is that we think we understand how to interpret a couple of the standard plots of the SNP data.
1. When a sample is homozygous, then the BAF plot should show two bands (corresponding to the two possible genotypes, A or B). The LRR plot can be centered along the levels of any half-integer.
2. When a sample has the normal complement of two (heterozygous) copies of a chromosome, then the BAF plot should show three bands (corresponding to the three possible genotypes: AA, AB, or BB) and the LRR plot should be centered along the value 0.
3. When a sample has three copies of a chromosome, then the BAF plot should have four bands (corresponding to the genotypes AAA, AAB, ABB, or BBB) and the LRR plot should be centered above 0 (ideally at log10(3/2)).

So, we identify all segments where the BAF plot has 2, 3, or 4 bands, using the following block of code.
> segway1 <- segs[!is.na(segs$samid) & segs$samid == cid & segs$num.mark > 50 &
+     !is.na(segs$nbafcomp) & segs$nbafcomp == 2, ]
> segway2 <- segs[!is.na(segs$samid) & segs$samid == cid & segs$num.mark > 50 &
+     !is.na(segs$nbafcomp) & segs$nbafcomp == 3 &
+     !is.na(segs$abperc) & segs$abperc > 10, ]
> segway3 <- segs[!is.na(segs$samid) & segs$samid == cid & segs$num.mark > 50 &
+     !is.na(segs$nbafcomp) & segs$nbafcomp == 4 &
+     !is.na(segs$abperc) & segs$abperc > 10, ]

To plot the segment LRR means for the segments that contain three BAF bands (and are thus nominally regions of normal two-copy heterozygous chromosomes), we use the next block of code.
> if (nrow(segway2) > 0) {
+     temp <- segway2$seg.mean
+     temp[temp < -1.5] <- NA
+     plot(jitter(as.numeric(segway2$chrom)), temp, main = cid, xlim = c(1, 23),
+         ylab = "Balanced heterozygous (BAF=3) segment means",
+         xlab = "Chromosome")
+     abline(h = median(segway2$seg.mean), col = "green")
+ } else {
+     plot(c(1, 23), c(0, 0), main = cid, type = "n", ylim = c(-0.1, 0.1),
+         ylab = "Two-copy segment means", xlab = "Chromosome")
+ }
> abline(v = seq(1.5, 22.5), col = "gray", lwd = 2)
> abline(h = 0, col = "blue", lwd = 2)

To plot the segment LRR means for the segments that contain four BAF bands (and are thus nominally three-copy regions), we use the next block of code.
> if (nrow(segway3) > 0) {
+     temp <- segway3$seg.mean
+     temp[temp < -1.5] <- NA
+     plot(jitter(as.numeric(segway3$chrom)), temp, xlim = c(1, 23),
+         main = cid, ylab = "Unbalanced heterozygous (BAF=4) segment means",
+         xlab = "Chromosome")
+     abline(h = median(segway3$seg.mean), col = "green")
+ } else {
+     plot(c(1, 23), c(0, 0), main = cid, type = "n", ylim = c(-0.1, 0.1),
+         ylab = "Three-copy segment means", xlab = "Chromosome")
+ }
> abline(v = seq(1.5, 22.5), col = "gray", lwd = 2)
> abline(h = log10(3/2), col = "green", lwd = 2)
> abline(h = 0, col = "blue", lwd = 2)

To plot the segment LRR means for the segments that contain two BAF bands (and are thus regions of normal homozygous chromosomes), we use the next block of code.
> if (nrow(segway1) > 0 && sum(segway1$seg.mean > -1.5) > 0) {
+     temp <- segway1$seg.mean
+     temp[temp < -1.5] <- NA
+     plot(jitter(as.numeric(segway1$chrom)), temp, main = cid, xlim = c(1, 23),
+         ylab = "Homozygous (BAF=2) segment means", xlab = "Chromosome")
+     abline(h = median(segway1$seg.mean), col = "blue")
+ } else {
+     plot(c(1, 23), c(0, 0), main = cid, type = "n", ylim = c(-0.1, 0.1),
+         ylab = "Homozygous segment means", xlab = "Chromosome")
+ }
> abline(v = seq(1.5, 22.5), col = "gray", lwd = 2)
> abline(h = 0, col = "blue", lwd = 2)
> abline(h = log10(1/2), col = "orange", lwd = 2)

The results for cell line CL001 are shown in Figure 1. From the central panel, we see that there are at least three different characteristic intensity levels. The lowest of these levels, at approximately -0.3 or -0.2, presumably corresponds to the normal two-copy situation. Each higher level should double the number of copies (since the three-band BAF plot can only occur with equal numbers of heterozygous chromosomes). Similarly, the lower panel also shows multiple characteristic intensity levels. The lowest level (at approximately -0.3) presumably corresponds to the three-copy situation, with each additional level increasing the number of chromosomes by one (since a four-band BAF plot will occur whenever you have two heterozygous chromosomes in unequal numbers). Both panels strongly suggest that the LRR values are systematically biased down and need to be increased in order to get sensible interpretations.

Now we generate similar plots for all the cell lines and save copies in the subdirectory Adjust.
> if (!file.exists("adjust")) dir.create("adjust")
> for (cid in shortnames) {
+     <<identify.baf.segments>>
+     opar <- par(mfrow=c(3,1), mai=c(0.82, 0.8, 0.5, 0.1), bg='white')
+     <<plot.onecopy.lrr.means>>
+     <<plot.twocopy.lrr.means>>
+     <<plot.threecopy.lrr.means>>
+     par(opar)
+     fn <- file.path("adjust", paste(cid, "png", sep='.'))
+     dev.copy(png, filename=fn, width=800, height=600)
+     dev.off()
+ }

[Figure 1: Segment means, by chromosome, of regions that are nominally two-copy (top) or three-copy (bottom). Panel titles, top to bottom: "Homozygous (BAF=2) segment means", "Balanced heterozygous (BAF=3) segment means", "Unbalanced heterozygous (BAF=4) segment means"; x-axis: Chromosome.]

3.2 All Segment Means
We have also found it useful to plot all of the segment means in a single graph per sample. Here is the code to create these plots:
> segway <- segs[segs$samid == cid, ]
> temp <- segway$seg.mean
> temp[temp < -1.5] <- NA
> ot <- order(segway$chrom, segway$loc.start)
> cs <- cumsum(table(segway$chrom))
> cs2 <- ((c(0, cs) + c(cs, max(cs)))/2)[1:23]
> plot(temp[ot], main = cid, pch = 16, cex = 1/2, xlab = "", xaxt = "n",
+     ylab = "LRR")
> mtext(c(1:22, "X"), side = 3, at = cs2, las = 2, line = 0.5)
> mtext(c(1:22, "X"), side = 1, at = cs2, las = 2, line = 0.5)
> abline(v = cs, col = "purple")
> abline(h = 0, col = "blue", lwd = 2)
> abline(h = log10(3/2), col = "green", lwd = 2)
> abline(h = log10(1/2), col = "orange", lwd = 2)

In Figure 2, we plot all the segment means for CL001. We see that the means are roughly centered about LRR = 0, which would be appropriate only if this cell line has close to the normal complement of 46 chromosomes. As we will see later, that is clearly not the case.

As above, we generate similar plots for each cell line. These are stored in the subdirectory SegMeans.
> if (!file.exists("segmeans")) dir.create("segmeans")
> for (cid in shortnames) {
+     segway <- segs[segs$samid == cid, ]
+     temp <- segway$seg.mean
+     temp[temp < -1.5] <- NA
+     ot <- order(segway$chrom, segway$loc.start)
+     cs <- cumsum(table(segway$chrom))
+     cs2 <- ((c(0, cs) + c(cs, max(cs)))/2)[1:23]
+     plot(temp[ot], main = cid, pch = 16, cex = 1/2, xlab = "", xaxt = "n",
+         ylab = "LRR")
+     mtext(c(1:22, "X"), side = 3, at = cs2, las = 2, line = 0.5)
+     mtext(c(1:22, "X"), side = 1, at = cs2, las = 2, line = 0.5)
+     abline(v = cs, col = "purple")
+     abline(h = 0, col = "blue", lwd = 2)
+     abline(h = log10(3/2), col = "green", lwd = 2)
+     abline(h = log10(1/2), col = "orange", lwd = 2)
+     fn <- file.path("segmeans", paste(cid, "png", sep = "."))
+     dev.copy(png, filename = fn, width = 800, height = 600)
+     dev.off()
+ }

[Figure 2: Plot of LRR segment means, ordered along chromosomes, for cell line CL001. y-axis: LRR; chromosomes 1 to 22 and X labeled along the top and bottom.]

4 Appendix
This analysis was run in the following directory:
> getwd()
[1] "o:/private/abruzzo/snp-cll/aa"
Note that \\mdadqsfs02 is the standard institutional location for storing data and analyses; N: is the name given to that location on this machine.

This analysis was run in the following software environment:
> sessionInfo()
R version ( )
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] nlme_

loaded via a namespace (and not attached):
[1] cluster_ grid_ lattice_ RColorBrewer_1.0-2
[5] xtable_1.5-6
> while (!is.null(dev.list())) dev.off()
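The final command closes any graphics devices left open by the plotting loops. As a minimal sketch of the same open, draw, and close cycle, the following writes a toy figure directly to a file device instead of copying from the current device with dev.copy. The temporary file name, the toy values, and the fallback to pdf() when PNG support is unavailable are assumptions made for illustration; the report itself writes PNG files into the adjust and segmeans subdirectories.

```r
# Placeholder output file (hypothetical; not one of the report's files).
usePng <- isTRUE(unname(capabilities("png")))
fn <- tempfile(fileext = if (usePng) ".png" else ".pdf")

# Open a file device directly instead of copying from another device.
if (usePng) {
  png(filename = fn, width = 800, height = 600)
} else {
  pdf(file = fn)  # fallback when PNG support is unavailable
}
nOpen <- length(dev.list())

# Toy segment means (made-up values) with the report's reference lines.
plot(c(-0.3, -0.1, 0.05), pch = 16, ylab = "LRR", xlab = "", main = "toy")
abline(h = 0, col = "blue", lwd = 2)
abline(h = log10(3/2), col = "green", lwd = 2)
abline(h = log10(1/2), col = "orange", lwd = 2)
dev.off()

# Close anything still open, as in the final command of the report.
while (!is.null(dev.list())) dev.off()
```

Writing to the file device directly does not depend on which device happens to be current when the script runs, which may matter when the code is executed non-interactively.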
More informationELAI user manual. Yongtao Guan Baylor College of Medicine. Version June Copyright 2. 3 A simple example 2
ELAI user manual Yongtao Guan Baylor College of Medicine Version 1.0 25 June 2015 Contents 1 Copyright 2 2 What ELAI Can Do 2 3 A simple example 2 4 Input file formats 3 4.1 Genotype file format....................................
More informationHow to use cghmcr. October 30, 2017
How to use cghmcr Jianhua Zhang Bin Feng October 30, 2017 1 Overview Copy number data (arraycgh or SNP) can be used to identify genomic regions (Regions Of Interest or ROI) showing gains or losses that
More informationSNPViewer Documentation
SNPViewer Documentation Module name: Description: Author: SNPViewer Displays SNP data plotting copy numbers and LOH values Jim Robinson (Broad Institute), gp-help@broad.mit.edu Summary: The SNPViewer displays
More informationOperating instructions for MixtureCalc v1.2 (Freeware Version)
1 Objective To provide instructions for importing data into and for performing mixture calculations using MixtureCalc-v1.2. MixtureCalc-v1.2 is validated for SPSA casework only and no warranty is provided
More informationUnivariate Data - 2. Numeric Summaries
Univariate Data - 2. Numeric Summaries Young W. Lim 2018-08-01 Mon Young W. Lim Univariate Data - 2. Numeric Summaries 2018-08-01 Mon 1 / 36 Outline 1 Univariate Data Based on Numerical Summaries R Numeric
More informationKaryoStudio v1.4 User Guide
KaryoStudio v1.4 User Guide FOR RESEARCH USE ONLY ILLUMINA PROPRIETARY Part # 11328837 Rev. C June 2011 Notice This document and its contents are proprietary to Illumina, Inc. and its affiliates ("Illumina"),
More informationThe nor1mix Package. August 3, 2006
The nor1mix Package August 3, 2006 Title Normal (1-d) Mixture Models (S3 Classes and Methods) Version 1.0-6 Date 2006-08-02 Author: Martin Mächler Maintainer Martin Maechler
More informationRelease Notes. JMP Genomics. Version 4.0
JMP Genomics Version 4.0 Release Notes Creativity involves breaking out of established patterns in order to look at things in a different way. Edward de Bono JMP. A Business Unit of SAS SAS Campus Drive
More informationPackage gsbdesign. March 29, 2016
Type Package Title Group Sequential Bayes Design Version 1.00 Date 2016-03-27 Author Florian Gerber, Thomas Gsponer Depends gsdesign, lattice, grid, Imports stats, graphics, grdevices, utils Package gsbdesign
More informationRegression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables:
Regression Lab The data set cholesterol.txt available on your thumb drive contains the following variables: Field Descriptions ID: Subject ID sex: Sex: 0 = male, = female age: Age in years chol: Serum
More informationMicroarray Technology (Affymetrix ) and Analysis. Practicals
Data Analysis and Modeling Methods Microarray Technology (Affymetrix ) and Analysis Practicals B. Haibe-Kains 1,2 and G. Bontempi 2 1 Unité Microarray, Institut Jules Bordet 2 Machine Learning Group, Université
More informationIntroduction for heatmap3 package
Introduction for heatmap3 package Shilin Zhao April 6, 2015 Contents 1 Example 1 2 Highlights 4 3 Usage 5 1 Example Simulate a gene expression data set with 40 probes and 25 samples. These samples are
More informationAnalysis of two-way cell-based assays
Analysis of two-way cell-based assays Lígia Brás, Michael Boutros and Wolfgang Huber April 16, 2015 Contents 1 Introduction 1 2 Assembling the data 2 2.1 Reading the raw intensity files..................
More informationPackage copynumber. November 21, 2017
Package copynumber November 21, 2017 Title Segmentation of single- and multi-track copy number data by penalized least squares regression. Version 1.19.0 Author Gro Nilsen, Knut Liestoel and Ole Christian
More information> glucose = c(81, 85, 93, 93, 99, 76, 75, 84, 78, 84, 81, 82, 89, + 81, 96, 82, 74, 70, 84, 86, 80, 70, 131, 75, 88, 102, 115, + 89, 82, 79, 106)
This document describes how to use a number of R commands for plotting one variable and for calculating one variable summary statistics Specifically, it describes how to use R to create dotplots, histograms,
More informationAffymetrix Genotyping Console 4.2 Release Notes (For research use only. Not for use in diagnostic procedures.)
Affymetrix Genotyping Console 4.2 Release Notes (For research use only. Not for use in diagnostic procedures.) Genotyping Console 4.2 includes the following changes and enhancements: 1. Edit Calls within
More informationPackage OmicCircos. R topics documented: November 17, Version Date
Version 1.16.0 Date 2015-02-23 Package OmicCircos November 17, 2017 Title High-quality circular visualization of omics data Author Maintainer Ying Hu biocviews Visualization,Statistics,Annotation
More informationGraphics in R STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley
Graphics in R STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 Base Graphics 2 Graphics in R Traditional
More informationImporting and visualizing data in R. Day 3
Importing and visualizing data in R Day 3 R data.frames Like pandas in python, R uses data frame (data.frame) object to support tabular data. These provide: Data input Row- and column-wise manipulation
More informationValidating predictions on the Chang data
Validating predictions on the Chang data Kevin R. Coombes, Jing Wang, and Keith A. Baggerly 13 March 2007 1 Description of the problem We want to test the predictions from five models, built on training
More informationRLMM - Robust Linear Model with Mahalanobis Distance Classifier
RLMM - Robust Linear Model with Mahalanobis Distance Classifier Nusrat Rabbee and Gary Wong June 13, 2018 Contents 1 Introduction 1 2 Instructions for Genotyping Affymetrix Mapping 100K array - Xba set
More informationGenetic Analysis. Page 1
Genetic Analysis Page 1 Genetic Analysis Objectives: 1) Set up Case-Control Association analysis and the Basic Genetics Workflow 2) Use JMP tools to interact with and explore results 3) Learn advanced
More informationSIBER User Manual. Pan Tong and Kevin R Coombes. May 27, Introduction 1
SIBER User Manual Pan Tong and Kevin R Coombes May 27, 2015 Contents 1 Introduction 1 2 Using SIBER 1 2.1 A Quick Example........................................... 1 2.2 Dealing With RNAseq Normalization................................
More informationAverages and Variation
Averages and Variation 3 Copyright Cengage Learning. All rights reserved. 3.1-1 Section 3.1 Measures of Central Tendency: Mode, Median, and Mean Copyright Cengage Learning. All rights reserved. 3.1-2 Focus
More informationAn Introduction to the genoset Package
An Introduction to the genoset Package Peter M. Haverty April 4, 2013 Contents 1 Introduction 2 1.1 Creating Objects........................................... 2 1.2 Accessing Genome Information...................................
More information