Analysis of ChIP-seq Data with mosaics Package
|
|
- Aldous Wheeler
- 5 years ago
- Views:
Transcription
1 Analysis of ChIP-seq Data with mosaics Package Dongjun Chung 1, Pei Fen Kuan 2 and Sündüz Keleş 1,3 1 Department of Statistics, University of Wisconsin Madison, WI Department of Biostatistics, University of North Carolina at Chapel Hill Chapel Hill, NC Department of Biostatistics and Medical Informatics, University of Wisconsin Madison, WI September 24, Overview This vignette provides an introduction to the analysis of ChIP-seq data with mosaics package. R package mosaics implements MOSAiCS, a statistical framework for the analysis of ChIP-seq data, proposed in [1]. MOSAiCS stands for MOdel-based one and two Sample Analysis and Inference for ChIP-Seq Data. Based on careful investigation of biases in ChIP-seq data such as mappability and GC content, MOSAiCS has been developed as a flexible mixture modeling approach for detecting peaks of one-sample (ChIP sample) or two-sample (ChIP sample and matched control sample) ChIP-seq data. mosaics package assumes chromosome-wise analysis of ChIP-seq data. In this vignette, we use chromosome 22 of the data from a ChIP-seq experiment of STAT1 binding in interferon-γstimulated HeLa S3 cells from [2] The package can be loaded with the command: R> library("mosaics") 2 Workflow: One-Sample Analysis 2.1 Read Bin-Level Data into the R Environment For the one-sample analysis, MOSAiCS requires four information: bin-level ChIP data, mappability score, GC content score, and sequence ambiguity score. For human genome (HG18) and mouse genome (MM9), we provide mappability score, GC content score, and sequence ambiguity score for each chromosome. Please check for further information about these files and how to download them. Bin-level ChIP data can be obtained from the aligned data (e.g., a file obtained from ELAND) using the perl codes we provide. Users can download these perl codes from Bin-level data obtained from our perl codes can be imported to the R environment with the command: R> bin1 <- readbins(type = c("chip", "M", "GC", "N"), filename = c("./chip_chr22.txt", + "./M_chr22.txt", "./GC_chr22.txt", "./N_chr22.txt")) 1
2 In type, chip, M, GC, N indicate bin-level ChIP data, mappability score, GC content score, and sequence ambiguity score, respectively. User needs to provide the corresponding file names in filename. mosaics package assumes that each file name in filename is provided in the same order as in type. This method also provides information about percentage of bins with ambiguous sequences (i.e., sequence ambiguity score = 1 ) and first/last coordinates of bins before and after preprocessing. R package mosaics provides simple investigation tools for bin-level data. The following command prints out basic information about the bin-level data, such as number of coordinates and total effective tag counts. Total effective tag counts are defined as the sum of tag counts of all bins. This value is usually larger than the number of all the tags used in the analysis. Total effective tag counts provide more useful information about the size of data used in the analysis. R> bin1 Summary: bin-level data (class: BinData) # of coordinates = total effective tag counts = (sum of tag counts of all bins) - input sample is incorporated - mappability is incorporated - GC content is incorporated - uniquely matched tags are assumed print method returns the bin-level data in data frame format. R> print(bin1)[1:10, ] coord tagcount mappability gccontent input plot method provides exploratory plots of mean ChIP tag counts. plottype controls which plot to be drawn. plottype="m" and plottype="gc" generate a plot of mean tag counts versus mappability score and GC content score, respectively. If plottype is not specified, this method plots histogram of mean ChIP tag count. Figure 1, 2, and 3 show examples. One can see that mean tag count increases as mappability score increases. As GC content score increases, mean tag count increases to certain value of GC content score (about 0.6) and then it decreases. R> plot(bin1, plottype = "M") R> plot(bin1, plottype = "GC") R> plot(bin1) 2
3 Mappability vs. ChIP mean tag count ChIP mean tag count mappability Figure 1: Plot of mean ChIP tag counts versus mappability score. 2.2 Fit MOSAiCS We are now ready to fit a MOSAiCS model. The bin-level data above (say, bin1) is used as an input. A MOSAiCS model is fitted with the command: R> fit1 <- mosaicsfit(bin1, analysistype = "OS") analysistype="os" indicates implementation of the one-sample analysis. Without input sample, only the one-sample analysis is permitted. mosaicsfit fits both one-signal-component and two-signal-component models. When calling peaks, user can choose the number of signal components used for the final model. The optimal choice of the number of signal components depends on the characteristics of data. In order to support user in choice of optimal signal model, mosaics package provides Bayesian Information Criterion (BIC) values and Goodness of Fit (GOF) plot of these signal models. The following command shows BIC of signal models, with information about the parameters used in fitting the null (non-enriched) distribution. Lower BIC indicates better model fit and we can see that two-signal-component model has lower BIC in our case. 3
4 GC content vs. ChIP mean tag count ChIP mean tag count GC content Figure 2: Plot of mean ChIP tag counts versus GC content score. R> fit1 Summary: MOSAiCS model fitting (class: MosaicsFit) analysis type: one-sample analysis parameters used: k = 3, meanthres = 0 BIC of one-signal-component model = BIC of two-signal-component model = plot method provides GOF plot. Using the GOF plot, user can compare null model, onesignal-component model, and two-signal-component model, with the actual data. Specifically, user can check which signal model fits the actual data better. Figure 4 shows an example GOF plot. One can see that two-signal-component model provides better fit for our data. R> plot(fit1) 4
5 Histogram of ChIP tag count Frequency e ChIP Tag count Figure 3: Histogram of mean ChIP tag counts. 2.3 Call Peaks [Note: We may be going to include explanation of peak refinement and the procedure to implement it in this section.] We can call peaks with the command: R> peak1 <- mosaicspeak(fit1, signalmodel = "2S", FDR = 0.05, maxgap = 200, + minsize = 50, thres = 10) Info: use two-signal-component model. Info: calculating posterior probabilities... Info: calling peaks... Info: empirical FDR (before thresholding) = 0.05 Info: empirical FDR (after thresholding) = Info: done! Using BIC and GOF plot in the previous section, we can conclude that two-signal-component model fits our data better. signalmodel="2s" indicates two-signal-component model. Similarly, 5
6 Frequency e+05 Actual data Sim:N Sim:N+S1 Sim:N+S1+S Tag count Figure 4: Goodness of Fit (GOF) plot. The plot shows null model (Sim:N), one-signal-component model (Sim:N+S1), and two-signal-component model (Sim:N+S1+S2), with the actual data. one-signal-component model can be specified by signalmodel="1s". Additionally, user can control false discovery rate ( FDR ), distance (in bp) to combine initial peaks ( maxgap ), minimum size (in bp) of the initial peak so that it is claimed as a peak ( minsize ), and minimum ChIP tag count in each bin so that the bin is claimed as a peak ( thres). User needs to choose these paramaters in order to call peaks accurately. We recommend to set maxgap and minsize as the fragment length and bin size, respectively. Initial nearby peaks are merged if the distance between them is less than maxgap. This is because a fragment covers several bins when bin size is chosen to be shorter than the fragment length. The initial peaks of which length are shorter than minsize are removed. This is because very small initial peak (e.g., singleton) is less likely to be a real peak when bin size is chosen to be shorter than the fragment length. When bin size is same as the fragment length, minsize should be smaller than the fragment length. This is because there are many (potentially real) peaks of which size is the same as bin size. thres is employed to filter out initial peaks with small tag counts because these initial peaks might correspond to false discoveries. Optimal choice depends on the sequencing depth of the ChIP-Seq data. If negative value is provided for thres, thresholding is not applied. 6
7 The following command prints out basic information of called peaks such as number of called peaks, median peak length, and empirical false discovery rate. R> peak1 Summary: MOSAiCS peak calling (class: MosaicsPeak) final model: one-sample analysis with two signal components setting: FDR = 0.05, maxgap = 200, minsize = 50, thres = 10 # of peaks = 1690 median peak length = 349 empirical FDR = print method returns the peak calling results in data frame format. This data frame can contain different number of columns, depending on analysistype of mosaicsfit. For example, if analysistype="os", columns are peak start position, peak stop position, peak size, averaged posterior probabilities, minimum posterior probability, averaged ChIP tag counts, and maximum of ChIP tag counts in each peak. Here, posterior probablity of a bin means the probability that the bin is not a peak, conditional on data. Hence, small posterior probability indicates more evidence that the bin is actually a peak. This output is intended as an input for downstream analysis such as motif analysis. R> print(peak1)[1:10, ] peakstart peakstop peaksize avep minp avechipcount e e e e e e e e e e e e e e e e e e e e maxchipcount map GC User can also export peak calling results to text files in diverse file formats. Currently, mosaics package support TXT, BED, and GFF file formats. In the exported file, TXT file format ( type="txt" ) 7
8 includes all the columns that print method returns. type="bed" and type="gff" export peak calling results in standard BED and GFF file format, respectively, where score is summed tag counts in each peak. Peak calling results can be exported in TXT, BED, and GFF file formats, respectively, with the commands: R> export(peak1, type = "txt", fileloc = "./", filename = "OSpeakList.txt", + chrid = "chr22") R> export(peak1, type = "bed", fileloc = "./", filename = "OSpeakList.bed", + chrid = "chr22") R> export(peak1, type = "gff", fileloc = "./", filename = "OSpeakList.gff", + chrid = "chr22") fileloc and filename indicate the directory and the name of the exported file. chrid means chromosome ID. mosaics package assumes chromosome-wise analysis and does not require user to supply chromosome ID during the analysis. In the exported file, the character specified in chrid is used as chromosome ID, which will be the first column in standard BED and GFF file formats. 3 Workflow: Two-Sample Analysis 3.1 Analysis using Mappability and GC Content When input data is additionally provided, the two-sample analysis can be implemented. The procedure to implement the two-sample analysis is analogous to the procedure to implement the onesample analysis. Bin-level data for the two-sample analysis can be imported to the R environment with the command: R> bin2 <- readbins(type = c("chip", "M", "GC", "N", "input"), filename = c("./chip_chr22.txt", + "./M_chr22.txt", "./GC_chr22.txt", "./N_chr22.txt", "./Input_chr22.txt")) In other words, we need one more element, input, in type. The file name of input data also needs to be specified in filename. When input data is provided, plottype="input" generates a plot of mean ChIP tag counts versus input tag count. Moreover, plottype="m input" and plottype="gc input" generate plots of mean ChIP tag counts versus mappability score and GC content score, respectively, conditional on input tag count. R> plot(bin1, plottype = "input") R> plot(bin1, plottype = "M input") R> plot(bin1, plottype = "GC input") In order to fit MOSAiCS model for the two-sample analysis, when calling mosaicsfit method, user should specify analysistype="ts" instead of analysistype="os". R> fit2 <- mosaicsfit(bin2, analysistype = "TS") Peak calling can be done exactly in the same way as before. Also all the other methods can still be used as in the one-sample analysis case. R> peak2 <- mosaicspeak(fit2, signalmodel = "2S", FDR = 0.05, maxgap = 200, + minsize = 50, thres = 10) 8
9 3.2 Analysis without using Mappability and GC Content Application of MOSAiCS methods to the real data showed that consideration of mappability and GC content in the model improves sensitivity and specificity of peak calling [1]. However, mosaics package still permits the two-sample analysis without considering mappability and GC content. In order to fit MOSAiCS model for the two-sample analysis without considering mappability and GC content, analysistype="io" needs to be specified when calling mosaicsfit method. R> fit3 <- mosaicsfit(bin2, analysistype = "IO") Furthermore, in this case, mappability score, GC content score, and sequence ambiguity score might not be incorporated when importing bin-level data. User can import bin-level data and fit MOSAiCS model for the two-sample analysis without mappabilibity and GC content, with the commands: R> bin4 <- readbins(type = c("chip", "input"), filename = c("./chip_chr22.txt", + "./Input_chr22.txt")) R> fit4 <- mosaicsfit(bin4, analysistype = "IO") Note that with these commands mosaics package skips initial processing of bin-level data using mappability score, GC content score, and sequence ambiguity score. We recommend to incorporate mappability score, GC content score, and sequence ambiguity score as long as they are available to users. 4 Tuning of MOSAiCS Parameters For the two-sample analysis, mosaics package permits user to tune the parameters used in fitting the null distribution of MOSAiCS model. When both mappability score and GC content score are available (i.e., the two-sample analysis using mappability and GC content), user can control two tuning parameters, s and d. s means the cut-off to determine lower and higher values of input data. d means the order used in the transformation of input data. Please see [1] for further details about model parameters. For the two-sample analysis using mappability and GC content, mosaics package uses the following parameter setting as default: R> fit3_ts <- mosaicsfit(bin2, analysistype = "TS", s = 2, d = 0.25) When mappability score and GC content score are not available (i.e., the two-sample analysis without using mappability and GC content), user can tune only one parameter, d. For the twosample analysis without using mappability and GC content, mosaics package uses the following parameter setting as default: R> fit3_io <- mosaicsfit(bin2, analysistype = "IO", d = 0.25) 5 Conclusion and Ongoing Development R package mosaics provides effective tools to read and investigate ChIP-seq data, fit MOSAiCS, and claim peaks. We are continuously working on improving mosaics package further in more friendly user interface and more effective and diverse investigation tools. 9
10 References [1] Kuan, PF, D Chung, JA Thomson, R Stewart, and S Keleş (2010), A Statistical Framework for the Analysis of ChIP-Seq Data, submitted ( 19/). [2] Rozowsky, J, G Euskirchen, R Auerbach, D Zhang, T Gibson, R Bjornson, N Carriero, M Snyder, and M Gerstein (2009), PeakSeq enables systematic scoring of ChIP-Seq experiments relative to controls, Nature Biotechnology, 27,
Analysis of ChIP-seq Data with mosaics Package
Analysis of ChIP-seq Data with mosaics Package Dongjun Chung 1, Pei Fen Kuan 2 and Sündüz Keleş 1,3 1 Department of Statistics, University of Wisconsin Madison, WI 53706. 2 Department of Biostatistics,
More informationAnalysis of ChIP-seq Data with mosaics Package
Analysis of ChIP-seq Data with mosaics Package Dongjun Chung 1, Pei Fen Kuan 2 and Sündüz Keleş 1,3 1 Department of Statistics, University of Wisconsin Madison, WI 53706. 2 Department of Biostatistics,
More informationAnalysis of ChIP-seq Data with mosaics Package: MOSAiCS and MOSAiCS-HMM
Analysis of ChIP-seq Data with mosaics Package: MOSAiCS and MOSAiCS-HMM Dongjun Chung 1, Pei Fen Kuan 2, Rene Welch 3, and Sündüz Keleş 3,4 1 Department of Public Health Sciences, Medical University of
More informationPackage mosaics. April 23, 2016
Type Package Package mosaics April 23, 2016 Title MOSAiCS (MOdel-based one and two Sample Analysis and Inference for ChIP-Seq) Version 2.4.1 Depends R (>= 3.0.0), methods, graphics, Rcpp Imports MASS,
More informationPackage mosaics. July 3, 2018
Type Package Package mosaics July 3, 2018 Title MOSAiCS (MOdel-based one and two Sample Analysis and Inference for ChIP-Seq) Version 2.19.0 Depends R (>= 3.0.0), methods, graphics, Rcpp Imports MASS, splines,
More informationUser Manual of ChIP-BIT 2.0
User Manual of ChIP-BIT 2.0 Xi Chen September 16, 2015 Introduction Bayesian Inference of Target genes using ChIP-seq profiles (ChIP-BIT) is developed to reliably detect TFBSs and their target genes by
More informationUser's guide to ChIP-Seq applications: command-line usage and option summary
User's guide to ChIP-Seq applications: command-line usage and option summary 1. Basics about the ChIP-Seq Tools The ChIP-Seq software provides a set of tools performing common genome-wide ChIPseq analysis
More informationPICS: Probabilistic Inference for ChIP-Seq
PICS: Probabilistic Inference for ChIP-Seq Xuekui Zhang * and Raphael Gottardo, Arnaud Droit and Renan Sauteraud April 30, 2018 A step-by-step guide in the analysis of ChIP-Seq data using the PICS package
More informationEBSeqHMM: An R package for identifying gene-expression changes in ordered RNA-seq experiments
EBSeqHMM: An R package for identifying gene-expression changes in ordered RNA-seq experiments Ning Leng and Christina Kendziorski April 30, 2018 Contents 1 Introduction 1 2 The model 2 2.1 EBSeqHMM model..........................................
More informationProtocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification data
Protocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification data Table of Contents Protocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification
More informationHIPPIE User Manual. (v0.0.2-beta, 2015/4/26, Yih-Chii Hwang, yihhwang [at] mail.med.upenn.edu)
HIPPIE User Manual (v0.0.2-beta, 2015/4/26, Yih-Chii Hwang, yihhwang [at] mail.med.upenn.edu) OVERVIEW OF HIPPIE o Flowchart of HIPPIE o Requirements PREPARE DIRECTORY STRUCTURE FOR HIPPIE EXECUTION o
More informationChIP-Seq Tutorial on Galaxy
1 Introduction ChIP-Seq Tutorial on Galaxy 2 December 2010 (modified April 6, 2017) Rory Stark The aim of this practical is to give you some experience handling ChIP-Seq data. We will be working with data
More informationChIP-seq hands-on practical using Galaxy
ChIP-seq hands-on practical using Galaxy In this exercise we will cover some of the basic NGS analysis steps for ChIP-seq using the Galaxy framework: Quality control Mapping of reads using Bowtie2 Peak-calling
More informationCLC Server. End User USER MANUAL
CLC Server End User USER MANUAL Manual for CLC Server 10.0.1 Windows, macos and Linux March 8, 2018 This software is for research purposes only. QIAGEN Aarhus Silkeborgvej 2 Prismet DK-8000 Aarhus C Denmark
More informationAnalysis of ChIP-seq data
Before we start: 1. Log into tak (step 0 on the exercises) 2. Go to your lab space and create a folder for the class (see separate hand out) 3. Connect to your lab space through the wihtdata network and
More informationTriform: peak finding in ChIP-Seq enrichment profiles for transcription factors
Triform: peak finding in ChIP-Seq enrichment profiles for transcription factors Karl Kornacker * and Tony Håndstad October 30, 2018 A guide for using the Triform algorithm to predict transcription factor
More informationPackage BAC. R topics documented: November 30, 2018
Type Package Title Bayesian Analysis of Chip-chip experiment Version 1.42.0 Date 2007-11-20 Author Raphael Gottardo Package BAC November 30, 2018 Maintainer Raphael Gottardo Depends
More informationChIP-seq (NGS) Data Formats
ChIP-seq (NGS) Data Formats Biological samples Sequence reads SRA/SRF, FASTQ Quality control SAM/BAM/Pileup?? Mapping Assembly... DE Analysis Variant Detection Peak Calling...? Counts, RPKM VCF BED/narrowPeak/
More informationPackage polyphemus. February 15, 2013
Package polyphemus February 15, 2013 Type Package Title Genome wide analysis of RNA Polymerase II-based ChIP Seq data. Version 0.3.4 Date 2010-08-11 Author Martial Sankar, supervised by Marco Mendoza and
More informationm6aviewer Version Documentation
m6aviewer Version 1.6.0 Documentation Contents 1. About 2. Requirements 3. Launching m6aviewer 4. Running Time Estimates 5. Basic Peak Calling 6. Running Modes 7. Multiple Samples/Sample Replicates 8.
More informationPackage BayesPeak. R topics documented: September 21, Version Date Title Bayesian Analysis of ChIP-seq Data
Package BayesPeak September 21, 2014 Version 1.16.0 Date 2009-11-04 Title Bayesian Analysis of ChIP-seq Data Author Christiana Spyrou, Jonathan Cairns, Rory Stark, Andy Lynch,Simon Tavar\{}\{}'{e}, Maintainer
More informationExpander 7.2 Online Documentation
Expander 7.2 Online Documentation Introduction... 2 Starting EXPANDER... 2 Input Data... 3 Tabular Data File... 4 CEL Files... 6 Working on similarity data no associated expression data... 9 Working on
More informationA short Introduction to UCSC Genome Browser
A short Introduction to UCSC Genome Browser Elodie Girard, Nicolas Servant Institut Curie/INSERM U900 Bioinformatics, Biostatistics, Epidemiology and computational Systems Biology of Cancer 1 Why using
More informationPackage GreyListChIP
Type Package Version 1.14.0 Package GreyListChIP November 2, 2018 Title Grey Lists -- Mask Artefact Regions Based on ChIP Inputs Date 2018-01-21 Author Gord Brown Maintainer Gordon
More informationThe ChIP-seq quality Control package ChIC: A short introduction
The ChIP-seq quality Control package ChIC: A short introduction April 30, 2018 Abstract The ChIP-seq quality Control package (ChIC) provides functions and data structures to assess the quality of ChIP-seq
More informationWorking with ChIP-Seq Data in R/Bioconductor
Working with ChIP-Seq Data in R/Bioconductor Suraj Menon, Tom Carroll, Shamith Samarajiwa September 3, 2014 Contents 1 Introduction 1 2 Working with aligned data 1 2.1 Reading in data......................................
More informationChIP-seq practical: peak detection and peak annotation. Mali Salmon-Divon Remco Loos Myrto Kostadima
ChIP-seq practical: peak detection and peak annotation Mali Salmon-Divon Remco Loos Myrto Kostadima March 2012 Introduction The goal of this hands-on session is to perform some basic tasks in the analysis
More informationEBSeq: An R package for differential expression analysis using RNA-seq data
EBSeq: An R package for differential expression analysis using RNA-seq data Ning Leng, John A. Dawson, Christina Kendziorski October 9, 2012 Contents 1 Introduction 2 2 The Model 3 2.1 Two conditions............................
More informationChromatin signature discovery via histone modification profile alignments Jianrong Wang, Victoria V. Lunyak and I. King Jordan
Supplementary Information for: Chromatin signature discovery via histone modification profile alignments Jianrong Wang, Victoria V. Lunyak and I. King Jordan Contents Instructions for installing and running
More informationAgilent Genomic Workbench Lite Edition 6.5
Agilent Genomic Workbench Lite Edition 6.5 SureSelect Quality Analyzer User Guide For Research Use Only. Not for use in diagnostic procedures. Agilent Technologies Notices Agilent Technologies, Inc. 2010
More informationAnalyzing ChIP- Seq Data in Galaxy
Analyzing ChIP- Seq Data in Galaxy Lauren Mills RISS ABSTRACT Step- by- step guide to basic ChIP- Seq analysis using the Galaxy platform. Table of Contents Introduction... 3 Links to helpful information...
More informationPackage RAPIDR. R topics documented: February 19, 2015
Package RAPIDR February 19, 2015 Title Reliable Accurate Prenatal non-invasive Diagnosis R package Package to perform non-invasive fetal testing for aneuploidies using sequencing count data from cell-free
More informationWelcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page.
Welcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page. In this page you will learn to use the tools of the MAPHiTS suite. A little advice before starting : rename your
More informationChIP-seq Analysis Practical
ChIP-seq Analysis Practical Vladimir Teif (vteif@essex.ac.uk) An updated version of this document will be available at http://generegulation.info/index.php/teaching In this practical we will learn how
More informationPackage SVM2CRM. R topics documented: May 1, Type Package
Type Package Package SVM2CRM May 1, 2018 Title SVM2CRM: support vector machine for cis-regulatory elements detections Version 1.12.0 Date 2014-03-30 Description Detection of cis-regulatory elements using
More informationR topics documented: 2 R topics documented:
Package cn.mops March 20, 2019 Maintainer Guenter Klambauer Author Guenter Klambauer License LGPL (>= 2.0) Type Package Title cn.mops - Mixture of Poissons for CNV detection in
More informationPackage ChIC. R topics documented: May 4, Title Quality Control Pipeline for ChIP-Seq Data Version Author Carmen Maria Livi
Title Quality Control Pipeline for ChIP-Seq Data Version 1.0.0 Author Carmen Maria Livi Package ChIC May 4, 2018 Maintainer Carmen Maria Livi Quality control pipeline for ChIP-seq
More informationKC-SMART Vignette. Jorma de Ronde, Christiaan Klijn April 30, 2018
KC-SMART Vignette Jorma de Ronde, Christiaan Klijn April 30, 2018 1 Introduction In this Vignette we demonstrate the usage of the KCsmart R package. This R implementation is based on the original implementation
More informationRsubread package: high-performance read alignment, quantification and mutation discovery
Rsubread package: high-performance read alignment, quantification and mutation discovery Wei Shi 14 September 2015 1 Introduction This vignette provides a brief description to the Rsubread package. For
More informationPackage rgreat. June 19, 2018
Type Package Title Client for GREAT Analysis Version 1.12.0 Date 2018-2-20 Author Zuguang Gu Maintainer Zuguang Gu Package rgreat June 19, 2018 Depends R (>= 3.1.2), GenomicRanges, IRanges,
More informationTutorial for the Exon Ontology website
Tutorial for the Exon Ontology website Table of content Outline Step-by-step Guide 1. Preparation of the test-list 2. First analysis step (without statistical analysis) 2.1. The output page is composed
More informationRsubread package: high-performance read alignment, quantification and mutation discovery
Rsubread package: high-performance read alignment, quantification and mutation discovery Wei Shi 14 September 2015 1 Introduction This vignette provides a brief description to the Rsubread package. For
More informationEasy visualization of the read coverage using the CoverageView package
Easy visualization of the read coverage using the CoverageView package Ernesto Lowy European Bioinformatics Institute EMBL June 13, 2018 > options(width=40) > library(coverageview) 1 Introduction This
More informationSICER 1.1. If you use SICER to analyze your data in a published work, please cite the above paper in the main text of your publication.
SICER 1.1 1. Introduction For details description of the algorithm, please see A clustering approach for identification of enriched domains from histone modification ChIP-Seq data Chongzhi Zang, Dustin
More informationQIAseq DNA V3 Panel Analysis Plugin USER MANUAL
QIAseq DNA V3 Panel Analysis Plugin USER MANUAL User manual for QIAseq DNA V3 Panel Analysis 1.0.1 Windows, Mac OS X and Linux January 25, 2018 This software is for research purposes only. QIAGEN Aarhus
More informationASAP - Allele-specific alignment pipeline
ASAP - Allele-specific alignment pipeline Jan 09, 2012 (1) ASAP - Quick Reference ASAP needs a working version of Perl and is run from the command line. Furthermore, Bowtie needs to be installed on your
More informationAutomated Bioinformatics Analysis System on Chip ABASOC. version 1.1
Automated Bioinformatics Analysis System on Chip ABASOC version 1.1 Phillip Winston Miller, Priyam Patel, Daniel L. Johnson, PhD. University of Tennessee Health Science Center Office of Research Molecular
More informationCopyright 2014 Regents of the University of Minnesota
Quality Control of Illumina Data using Galaxy Contents September 16, 2014 1 Introduction 2 1.1 What is Galaxy?..................................... 2 1.2 Galaxy at MSI......................................
More informationCorrelation Motif Vignette
Correlation Motif Vignette Hongkai Ji, Yingying Wei October 30, 2018 1 Introduction The standard algorithms for detecting differential genes from microarray data are mostly designed for analyzing a single
More informationPractical 4: ChIP-seq Peak calling
Practical 4: ChIP-seq Peak calling Shamith Samarajiiwa, Dora Bihary September 2017 Contents 1 Calling ChIP-seq peaks using MACS2 1 1.1 Assess the quality of the aligned datasets..................................
More informationIntroduction to Galaxy
Introduction to Galaxy Dr Jason Wong Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW Day 1 Thurs 28 th January 2016 Overview What is Galaxy? Description of
More informationDesign and Annotation Files
Design and Annotation Files Release Notes SeqCap EZ Exome Target Enrichment System The design and annotation files provide information about genomic regions covered by the capture probes and the genes
More informationBadRegionFinder an R/Bioconductor package for identifying regions with bad coverage
BadRegionFinder an R/Bioconductor package for identifying regions with bad coverage Sarah Sandmann April 30, 2018 Contents 1 Introduction 1 1.1 Loading the package.................................... 2
More informationPackage saascnv. May 18, 2016
Version 0.3.4 Date 2016-05-10 Package saascnv May 18, 2016 Title Somatic Copy Number Alteration Analysis Using Sequencing and SNP Array Data Author Zhongyang Zhang [aut, cre], Ke Hao [aut], Nancy R. Zhang
More informationRunning SNAP. The SNAP Team October 2012
Running SNAP The SNAP Team October 2012 1 Introduction SNAP is a tool that is intended to serve as the read aligner in a gene sequencing pipeline. Its theory of operation is described in Faster and More
More informationAdvanced UCSC Browser Functions
Advanced UCSC Browser Functions Dr. Thomas Randall tarandal@email.unc.edu bioinformatics.unc.edu UCSC Browser: genome.ucsc.edu Overview Custom Tracks adding your own datasets Utilities custom tools for
More informationStats with R and RStudio Practical: basic stats for peak calling Jacques van Helden, Hugo varet and Julie Aubert
Stats with R and RStudio Practical: basic stats for peak calling Jacques van Helden, Hugo varet and Julie Aubert 2017-01-08 Contents Introduction 2 Peak-calling: question...........................................
More informationThe list of data and results files that have been used for the analysis can be found at:
BASIC CHIP-SEQ AND SSA TUTORIAL In this tutorial we are going to illustrate the capabilities of the ChIP-Seq server using data from an early landmark paper on STAT1 binding sites in γ-interferon stimulated
More informationPackage enrich. September 3, 2013
Package enrich September 3, 2013 Type Package Title An R package for the analysis of multiple ChIP-seq data Version 2.0 Date 2013-09-01 Author Yanchun Bao , Veronica Vinciotti
More informationSFDR (Stratified False Discovery Rate) Software Documentation. Version 1.6 Feb 7, 2010
SFDR (Stratified False Discovery Rate) Software Documentation 1. Overview of the methods Version 1.6 Feb 7, 2010 Yun Joo Yoo, Shelley B. Bull, Andrew D.Paterson, Daryl Waggott, Lei Sun FDR, SFDR, WFDR,
More informationChIP-seq Analysis. BaRC Hot Topics - March 21 st 2017 Bioinformatics and Research Computing Whitehead Institute.
ChIP-seq Analysis BaRC Hot Topics - March 21 st 2017 Bioinformatics and Research Computing Whitehead Institute http://barc.wi.mit.edu/hot_topics/ Outline ChIP-seq overview Experimental design Quality control/preprocessing
More informationCopyright 2014 Regents of the University of Minnesota
Quality Control of Illumina Data using Galaxy August 18, 2014 Contents 1 Introduction 2 1.1 What is Galaxy?..................................... 2 1.2 Galaxy at MSI......................................
More informationBiTrinA. Tamara J. Blätte, Florian Schmid, Stefan Mundus, Ludwig Lausser, Martin Hopfensitz, Christoph Müssel and Hans A. Kestler.
BiTrinA Tamara J. Blätte, Florian Schmid, Stefan Mundus, Ludwig Lausser, Martin Hopfensitz, Christoph Müssel and Hans A. Kestler January 29, 2017 Contents 1 Introduction 2 2 Methods in the package 2 2.1
More informationStatistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte
Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,
More informationAll About PlexSet Technology Data Analysis in nsolver Software
All About PlexSet Technology Data Analysis in nsolver Software PlexSet is a multiplexed gene expression technology which allows pooling of up to 8 samples per ncounter cartridge lane, enabling users to
More informationPackage MethylSeekR. January 26, 2019
Type Package Title Segmentation of Bis-seq data Version 1.22.0 Date 2014-7-1 Package MethylSeekR January 26, 2019 Author Lukas Burger, Dimos Gaidatzis, Dirk Schubeler and Michael Stadler Maintainer Lukas
More informationpanda Documentation Release 1.0 Daniel Vera
panda Documentation Release 1.0 Daniel Vera February 12, 2014 Contents 1 mat.make 3 1.1 Usage and option summary....................................... 3 1.2 Arguments................................................
More informationRunning SNAP. The SNAP Team February 2012
Running SNAP The SNAP Team February 2012 1 Introduction SNAP is a tool that is intended to serve as the read aligner in a gene sequencing pipeline. Its theory of operation is described in Faster and More
More informationPackage PICS. R topics documented: February 4, Type Package Title Probabilistic inference of ChIP-seq Version
Type Package Title Probabilistic inference of ChIP-seq Version 2.22.0 Package PICS February 4, 2018 Author Xuekui Zhang , Raphael Gottardo Maintainer Renan Sauteraud
More informationPackage MIRA. October 16, 2018
Version 1.2.0 Date 2018-04-17 Package MIRA October 16, 2018 Title Methylation-Based Inference of Regulatory Activity MIRA measures the degree of ``dip'' in methylation level surrounding a regulatory site
More informationIntroduction to the TSSi package: Identification of Transcription Start Sites
Introduction to the TSSi package: Identification of Transcription Start Sites Julian Gehring, Clemens Kreutz October 30, 2017 Along with the advances in high-throughput sequencing, the detection of transcription
More informationYou will be re-directed to the following result page.
ENCODE Element Browser Goal: to navigate the candidate DNA elements predicted by the ENCODE consortium, including gene expression, DNase I hypersensitive sites, TF binding sites, and candidate enhancers/promoters.
More informationTutorial. OTU Clustering Step by Step. Sample to Insight. June 28, 2018
OTU Clustering Step by Step June 28, 2018 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com ts-bioinformatics@qiagen.com
More informationChIP-seq hands-on practical using Galaxy
ChIP-seq hands-on practical using Galaxy In this exercise we will cover some of the basic NGS analysis steps for ChIP-seq using the Galaxy framework: Quality control Mapping of reads using Bowtie2 Peak-calling
More informationPart 1: How to use IGV to visualize variants
Using IGV to identify true somatic variants from the false variants http://www.broadinstitute.org/igv A FAQ, sample files and a user guide are available on IGV website If you use IGV in your publication:
More informationClick on "+" button Select your VCF data files (see #Input Formats->1 above) Remove file from files list:
CircosVCF: CircosVCF is a web based visualization tool of genome-wide variant data described in VCF files using circos plots. The provided visualization capabilities, gives a broad overview of the genomic
More informationPackage CopywriteR. April 20, 2018
Type Package Package CopywriteR April 20, 2018 Title Copy number information from targeted sequencing using off-target reads Version 2.11.0 Date 2016-02-23 Author Thomas Kuilman Maintainer Thomas Kuilman
More informationUSING BRAT-BW Table 1. Feature comparison of BRAT-bw, BRAT-large, Bismark and BS Seeker (as of on March, 2012)
USING BRAT-BW-2.0.1 BRAT-bw is a tool for BS-seq reads mapping, i.e. mapping of bisulfite-treated sequenced reads. BRAT-bw is a part of BRAT s suit. Therefore, input and output formats for BRAT-bw are
More informationCHAPTER 6 RESULTS AND DISCUSSIONS
151 CHAPTER 6 RESULTS AND DISCUSSIONS In this chapter the performance of the personal identification system on the PolyU database is presented. The database for both Palmprint and Finger Knuckle Print
More informationPackage icnv. R topics documented: March 8, Title Integrated Copy Number Variation detection Version Author Zilu Zhou, Nancy Zhang
Title Integrated Copy Number Variation detection Version 1.2.1 Author Zilu Zhou, Nancy Zhang Package icnv March 8, 2019 Maintainer Zilu Zhou Integrative copy number variation
More informationseqcna: A Package for Copy Number Analysis of High-Throughput Sequencing Cancer DNA
seqcna: A Package for Copy Number Analysis of High-Throughput Sequencing Cancer DNA David Mosen-Ansorena 1 October 30, 2017 Contents 1 Genome Analysis Platform CIC biogune and CIBERehd dmosen.gn@cicbiogune.es
More informationCustomizable information fields (or entries) linked to each database level may be replicated and summarized to upstream and downstream levels.
Manage. Analyze. Discover. NEW FEATURES BioNumerics Seven comes with several fundamental improvements and a plethora of new analysis possibilities with a strong focus on user friendliness. Among the most
More informationPackage GOTHiC. November 22, 2017
Title Binomial test for Hi-C data analysis Package GOTHiC November 22, 2017 This is a Hi-C analysis package using a cumulative binomial test to detect interactions between distal genomic loci that have
More informationmethylmnm Tutorial Yan Zhou, Bo Zhang, Nan Lin, BaoXue Zhang and Ting Wang January 14, 2013
methylmnm Tutorial Yan Zhou, Bo Zhang, Nan Lin, BaoXue Zhang and Ting Wang January 14, 2013 Contents 1 Introduction 1 2 Preparations 2 3 Data format 2 4 Data Pre-processing 3 4.1 CpG number of each bin.......................
More informationITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013
ITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013 1. Data and objectives We will use the data from GEO (GSE35368, Toedling, Servant et al. 2011). Two samples were
More informationChIP-seq Analysis. BaRC Hot Topics - Feb 23 th 2016 Bioinformatics and Research Computing Whitehead Institute.
ChIP-seq Analysis BaRC Hot Topics - Feb 23 th 2016 Bioinformatics and Research Computing Whitehead Institute http://barc.wi.mit.edu/hot_topics/ Outline ChIP-seq overview Experimental design Quality control/preprocessing
More informationThe software comes with 2 installers: (1) SureCall installer (2) GenAligners (contains BWA, BWA- MEM).
Release Notes Agilent SureCall 4.0 Product Number G4980AA SureCall Client 6-month named license supports installation of one client and server (to host the SureCall database) on one machine. For additional
More informationEval: A Gene Set Comparison System
Masters Project Report Eval: A Gene Set Comparison System Evan Keibler evan@cse.wustl.edu Table of Contents Table of Contents... - 2 - Chapter 1: Introduction... - 5-1.1 Gene Structure... - 5-1.2 Gene
More informationStep-by-Step Guide to Relatedness and Association Mapping Contents
Step-by-Step Guide to Relatedness and Association Mapping Contents OBJECTIVES... 2 INTRODUCTION... 2 RELATEDNESS MEASURES... 2 POPULATION STRUCTURE... 6 Q-K ASSOCIATION ANALYSIS... 10 K MATRIX COMPRESSION...
More informationIdentiyfing splice junctions from RNA-Seq data
Identiyfing splice junctions from RNA-Seq data Joseph K. Pickrell pickrell@uchicago.edu October 4, 2010 Contents 1 Motivation 2 2 Identification of potential junction-spanning reads 2 3 Calling splice
More informationTable of contents Genomatix AG 1
Table of contents! Introduction! 3 Getting started! 5 The Genome Browser window! 9 The toolbar! 9 The general annotation tracks! 12 Annotation tracks! 13 The 'Sequence' track! 14 The 'Position' track!
More informationAgilent Genomic Workbench 6.5
Agilent Genomic Workbench 6.5 Product Overview Guide For Research Use Only. Not for use in diagnostic procedures. Agilent Technologies Notices Agilent Technologies, Inc. 2010, 2015 No part of this manual
More informationSome Basic ChIP-Seq Data Analysis
Some Basic ChIP-Seq Data Analysis July 28, 2009 Our goal is to describe the use of Bioconductor software to perform some basic tasks in the analysis of ChIP-Seq data. We will use several functions in the
More informationPackage les. October 4, 2013
Package les October 4, 2013 Type Package Title Identifying Differential Effects in Tiling Microarray Data Version 1.10.0 Date 2011-10-10 Author Julian Gehring, Clemens Kreutz, Jens Timmer Maintainer Julian
More informationHigh-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg
High-throughput sequencing: Alignment and related topic Simon Anders EMBL Heidelberg Established platforms HTS Platforms Illumina HiSeq, ABI SOLiD, Roche 454 Newcomers: Benchtop machines 454 GS Junior,
More informationCisGenome User s Manual
CisGenome User s Manual 1. Overview 1.1 Basic Framework of CisGenome 1.2 Installation 1.3 Summary of All Functions 1.4 A Quick Start Analysis of a ChIP-chip Experiment 2. Genomics Toolbox I Establishing
More informationCNV-seq Manual. Xie Chao. May 26, 2011
CNV-seq Manual Xie Chao May 26, 20 Introduction acgh CNV-seq Test genome X Genomic fragments Reference genome Y Test genome X Genomic fragments Reference genome Y 2 Sampling & sequencing Whole genome microarray
More informationPeak Finder Meta Server (PFMS) User Manual. Computational & Systems Biology, ICM Uppsala University
Peak Finder Meta Server (PFMS) 1.1 Overview User Manual Computational & Systems Biology, ICM Uppsala University http://www.icm.uu.se/ PFMS is a free software application which identifies genome wide transcription
More informationDemultiplexing Illumina sequencing data containing unique molecular indexes (UMIs)
next generation sequencing analysis guidelines Demultiplexing Illumina sequencing data containing unique molecular indexes (UMIs) See what more we can do for you at www.idtdna.com. For Research Use Only
More informationGeneral Help & Instructions to use with Examples
General Help & Instructions to use with Examples Contents Types of Searches and their Purposes... 2 Basic Search:... 2 Advance search option... 6 List Search:... 7 Details Page... 8 Results Grid functionalities:...
More information