Analysis of ChIP-seq Data with mosaics Package

Size: px
Start display at page:

Download "Analysis of ChIP-seq Data with mosaics Package"

Transcription

1 Analysis of ChIP-seq Data with mosaics Package Dongjun Chung 1, Pei Fen Kuan 2 and Sündüz Keleş 1,3 1 Department of Statistics, University of Wisconsin Madison, WI Department of Biostatistics, University of North Carolina at Chapel Hill Chapel Hill, NC Department of Biostatistics and Medical Informatics, University of Wisconsin Madison, WI September 24, Overview This vignette provides an introduction to the analysis of ChIP-seq data with mosaics package. R package mosaics implements MOSAiCS, a statistical framework for the analysis of ChIP-seq data, proposed in [1]. MOSAiCS stands for MOdel-based one and two Sample Analysis and Inference for ChIP-Seq Data. Based on careful investigation of biases in ChIP-seq data such as mappability and GC content, MOSAiCS has been developed as a flexible mixture modeling approach for detecting peaks of one-sample (ChIP sample) or two-sample (ChIP sample and matched control sample) ChIP-seq data. mosaics package assumes chromosome-wise analysis of ChIP-seq data. In this vignette, we use chromosome 22 of the data from a ChIP-seq experiment of STAT1 binding in interferon-γstimulated HeLa S3 cells from [2] The package can be loaded with the command: R> library("mosaics") 2 Workflow: One-Sample Analysis 2.1 Read Bin-Level Data into the R Environment For the one-sample analysis, MOSAiCS requires four information: bin-level ChIP data, mappability score, GC content score, and sequence ambiguity score. For human genome (HG18) and mouse genome (MM9), we provide mappability score, GC content score, and sequence ambiguity score for each chromosome. Please check for further information about these files and how to download them. Bin-level ChIP data can be obtained from the aligned data (e.g., a file obtained from ELAND) using the perl codes we provide. Users can download these perl codes from Bin-level data obtained from our perl codes can be imported to the R environment with the command: R> bin1 <- readbins(type = c("chip", "M", "GC", "N"), filename = c("./chip_chr22.txt", + "./M_chr22.txt", "./GC_chr22.txt", "./N_chr22.txt")) 1

2 In type, chip, M, GC, N indicate bin-level ChIP data, mappability score, GC content score, and sequence ambiguity score, respectively. User needs to provide the corresponding file names in filename. mosaics package assumes that each file name in filename is provided in the same order as in type. This method also provides information about percentage of bins with ambiguous sequences (i.e., sequence ambiguity score = 1 ) and first/last coordinates of bins before and after preprocessing. R package mosaics provides simple investigation tools for bin-level data. The following command prints out basic information about the bin-level data, such as number of coordinates and total effective tag counts. Total effective tag counts are defined as the sum of tag counts of all bins. This value is usually larger than the number of all the tags used in the analysis. Total effective tag counts provide more useful information about the size of data used in the analysis. R> bin1 Summary: bin-level data (class: BinData) # of coordinates = total effective tag counts = (sum of tag counts of all bins) - input sample is incorporated - mappability is incorporated - GC content is incorporated - uniquely matched tags are assumed print method returns the bin-level data in data frame format. R> print(bin1)[1:10, ] coord tagcount mappability gccontent input plot method provides exploratory plots of mean ChIP tag counts. plottype controls which plot to be drawn. plottype="m" and plottype="gc" generate a plot of mean tag counts versus mappability score and GC content score, respectively. If plottype is not specified, this method plots histogram of mean ChIP tag count. Figure 1, 2, and 3 show examples. One can see that mean tag count increases as mappability score increases. As GC content score increases, mean tag count increases to certain value of GC content score (about 0.6) and then it decreases. R> plot(bin1, plottype = "M") R> plot(bin1, plottype = "GC") R> plot(bin1) 2

3 Mappability vs. ChIP mean tag count ChIP mean tag count mappability Figure 1: Plot of mean ChIP tag counts versus mappability score. 2.2 Fit MOSAiCS We are now ready to fit a MOSAiCS model. The bin-level data above (say, bin1) is used as an input. A MOSAiCS model is fitted with the command: R> fit1 <- mosaicsfit(bin1, analysistype = "OS") analysistype="os" indicates implementation of the one-sample analysis. Without input sample, only the one-sample analysis is permitted. mosaicsfit fits both one-signal-component and two-signal-component models. When calling peaks, user can choose the number of signal components used for the final model. The optimal choice of the number of signal components depends on the characteristics of data. In order to support user in choice of optimal signal model, mosaics package provides Bayesian Information Criterion (BIC) values and Goodness of Fit (GOF) plot of these signal models. The following command shows BIC of signal models, with information about the parameters used in fitting the null (non-enriched) distribution. Lower BIC indicates better model fit and we can see that two-signal-component model has lower BIC in our case. 3

4 GC content vs. ChIP mean tag count ChIP mean tag count GC content Figure 2: Plot of mean ChIP tag counts versus GC content score. R> fit1 Summary: MOSAiCS model fitting (class: MosaicsFit) analysis type: one-sample analysis parameters used: k = 3, meanthres = 0 BIC of one-signal-component model = BIC of two-signal-component model = plot method provides GOF plot. Using the GOF plot, user can compare null model, onesignal-component model, and two-signal-component model, with the actual data. Specifically, user can check which signal model fits the actual data better. Figure 4 shows an example GOF plot. One can see that two-signal-component model provides better fit for our data. R> plot(fit1) 4

5 Histogram of ChIP tag count Frequency e ChIP Tag count Figure 3: Histogram of mean ChIP tag counts. 2.3 Call Peaks [Note: We may be going to include explanation of peak refinement and the procedure to implement it in this section.] We can call peaks with the command: R> peak1 <- mosaicspeak(fit1, signalmodel = "2S", FDR = 0.05, maxgap = 200, + minsize = 50, thres = 10) Info: use two-signal-component model. Info: calculating posterior probabilities... Info: calling peaks... Info: empirical FDR (before thresholding) = 0.05 Info: empirical FDR (after thresholding) = Info: done! Using BIC and GOF plot in the previous section, we can conclude that two-signal-component model fits our data better. signalmodel="2s" indicates two-signal-component model. Similarly, 5

6 Frequency e+05 Actual data Sim:N Sim:N+S1 Sim:N+S1+S Tag count Figure 4: Goodness of Fit (GOF) plot. The plot shows null model (Sim:N), one-signal-component model (Sim:N+S1), and two-signal-component model (Sim:N+S1+S2), with the actual data. one-signal-component model can be specified by signalmodel="1s". Additionally, user can control false discovery rate ( FDR ), distance (in bp) to combine initial peaks ( maxgap ), minimum size (in bp) of the initial peak so that it is claimed as a peak ( minsize ), and minimum ChIP tag count in each bin so that the bin is claimed as a peak ( thres). User needs to choose these paramaters in order to call peaks accurately. We recommend to set maxgap and minsize as the fragment length and bin size, respectively. Initial nearby peaks are merged if the distance between them is less than maxgap. This is because a fragment covers several bins when bin size is chosen to be shorter than the fragment length. The initial peaks of which length are shorter than minsize are removed. This is because very small initial peak (e.g., singleton) is less likely to be a real peak when bin size is chosen to be shorter than the fragment length. When bin size is same as the fragment length, minsize should be smaller than the fragment length. This is because there are many (potentially real) peaks of which size is the same as bin size. thres is employed to filter out initial peaks with small tag counts because these initial peaks might correspond to false discoveries. Optimal choice depends on the sequencing depth of the ChIP-Seq data. If negative value is provided for thres, thresholding is not applied. 6

7 The following command prints out basic information of called peaks such as number of called peaks, median peak length, and empirical false discovery rate. R> peak1 Summary: MOSAiCS peak calling (class: MosaicsPeak) final model: one-sample analysis with two signal components setting: FDR = 0.05, maxgap = 200, minsize = 50, thres = 10 # of peaks = 1690 median peak length = 349 empirical FDR = print method returns the peak calling results in data frame format. This data frame can contain different number of columns, depending on analysistype of mosaicsfit. For example, if analysistype="os", columns are peak start position, peak stop position, peak size, averaged posterior probabilities, minimum posterior probability, averaged ChIP tag counts, and maximum of ChIP tag counts in each peak. Here, posterior probablity of a bin means the probability that the bin is not a peak, conditional on data. Hence, small posterior probability indicates more evidence that the bin is actually a peak. This output is intended as an input for downstream analysis such as motif analysis. R> print(peak1)[1:10, ] peakstart peakstop peaksize avep minp avechipcount e e e e e e e e e e e e e e e e e e e e maxchipcount map GC User can also export peak calling results to text files in diverse file formats. Currently, mosaics package support TXT, BED, and GFF file formats. In the exported file, TXT file format ( type="txt" ) 7

8 includes all the columns that print method returns. type="bed" and type="gff" export peak calling results in standard BED and GFF file format, respectively, where score is summed tag counts in each peak. Peak calling results can be exported in TXT, BED, and GFF file formats, respectively, with the commands: R> export(peak1, type = "txt", fileloc = "./", filename = "OSpeakList.txt", + chrid = "chr22") R> export(peak1, type = "bed", fileloc = "./", filename = "OSpeakList.bed", + chrid = "chr22") R> export(peak1, type = "gff", fileloc = "./", filename = "OSpeakList.gff", + chrid = "chr22") fileloc and filename indicate the directory and the name of the exported file. chrid means chromosome ID. mosaics package assumes chromosome-wise analysis and does not require user to supply chromosome ID during the analysis. In the exported file, the character specified in chrid is used as chromosome ID, which will be the first column in standard BED and GFF file formats. 3 Workflow: Two-Sample Analysis 3.1 Analysis using Mappability and GC Content When input data is additionally provided, the two-sample analysis can be implemented. The procedure to implement the two-sample analysis is analogous to the procedure to implement the onesample analysis. Bin-level data for the two-sample analysis can be imported to the R environment with the command: R> bin2 <- readbins(type = c("chip", "M", "GC", "N", "input"), filename = c("./chip_chr22.txt", + "./M_chr22.txt", "./GC_chr22.txt", "./N_chr22.txt", "./Input_chr22.txt")) In other words, we need one more element, input, in type. The file name of input data also needs to be specified in filename. When input data is provided, plottype="input" generates a plot of mean ChIP tag counts versus input tag count. Moreover, plottype="m input" and plottype="gc input" generate plots of mean ChIP tag counts versus mappability score and GC content score, respectively, conditional on input tag count. R> plot(bin1, plottype = "input") R> plot(bin1, plottype = "M input") R> plot(bin1, plottype = "GC input") In order to fit MOSAiCS model for the two-sample analysis, when calling mosaicsfit method, user should specify analysistype="ts" instead of analysistype="os". R> fit2 <- mosaicsfit(bin2, analysistype = "TS") Peak calling can be done exactly in the same way as before. Also all the other methods can still be used as in the one-sample analysis case. R> peak2 <- mosaicspeak(fit2, signalmodel = "2S", FDR = 0.05, maxgap = 200, + minsize = 50, thres = 10) 8

9 3.2 Analysis without using Mappability and GC Content Application of MOSAiCS methods to the real data showed that consideration of mappability and GC content in the model improves sensitivity and specificity of peak calling [1]. However, mosaics package still permits the two-sample analysis without considering mappability and GC content. In order to fit MOSAiCS model for the two-sample analysis without considering mappability and GC content, analysistype="io" needs to be specified when calling mosaicsfit method. R> fit3 <- mosaicsfit(bin2, analysistype = "IO") Furthermore, in this case, mappability score, GC content score, and sequence ambiguity score might not be incorporated when importing bin-level data. User can import bin-level data and fit MOSAiCS model for the two-sample analysis without mappabilibity and GC content, with the commands: R> bin4 <- readbins(type = c("chip", "input"), filename = c("./chip_chr22.txt", + "./Input_chr22.txt")) R> fit4 <- mosaicsfit(bin4, analysistype = "IO") Note that with these commands mosaics package skips initial processing of bin-level data using mappability score, GC content score, and sequence ambiguity score. We recommend to incorporate mappability score, GC content score, and sequence ambiguity score as long as they are available to users. 4 Tuning of MOSAiCS Parameters For the two-sample analysis, mosaics package permits user to tune the parameters used in fitting the null distribution of MOSAiCS model. When both mappability score and GC content score are available (i.e., the two-sample analysis using mappability and GC content), user can control two tuning parameters, s and d. s means the cut-off to determine lower and higher values of input data. d means the order used in the transformation of input data. Please see [1] for further details about model parameters. For the two-sample analysis using mappability and GC content, mosaics package uses the following parameter setting as default: R> fit3_ts <- mosaicsfit(bin2, analysistype = "TS", s = 2, d = 0.25) When mappability score and GC content score are not available (i.e., the two-sample analysis without using mappability and GC content), user can tune only one parameter, d. For the twosample analysis without using mappability and GC content, mosaics package uses the following parameter setting as default: R> fit3_io <- mosaicsfit(bin2, analysistype = "IO", d = 0.25) 5 Conclusion and Ongoing Development R package mosaics provides effective tools to read and investigate ChIP-seq data, fit MOSAiCS, and claim peaks. We are continuously working on improving mosaics package further in more friendly user interface and more effective and diverse investigation tools. 9

10 References [1] Kuan, PF, D Chung, JA Thomson, R Stewart, and S Keleş (2010), A Statistical Framework for the Analysis of ChIP-Seq Data, submitted ( 19/). [2] Rozowsky, J, G Euskirchen, R Auerbach, D Zhang, T Gibson, R Bjornson, N Carriero, M Snyder, and M Gerstein (2009), PeakSeq enables systematic scoring of ChIP-Seq experiments relative to controls, Nature Biotechnology, 27,

Analysis of ChIP-seq Data with mosaics Package

Analysis of ChIP-seq Data with mosaics Package Analysis of ChIP-seq Data with mosaics Package Dongjun Chung 1, Pei Fen Kuan 2 and Sündüz Keleş 1,3 1 Department of Statistics, University of Wisconsin Madison, WI 53706. 2 Department of Biostatistics,

More information

Analysis of ChIP-seq Data with mosaics Package

Analysis of ChIP-seq Data with mosaics Package Analysis of ChIP-seq Data with mosaics Package Dongjun Chung 1, Pei Fen Kuan 2 and Sündüz Keleş 1,3 1 Department of Statistics, University of Wisconsin Madison, WI 53706. 2 Department of Biostatistics,

More information

Analysis of ChIP-seq Data with mosaics Package: MOSAiCS and MOSAiCS-HMM

Analysis of ChIP-seq Data with mosaics Package: MOSAiCS and MOSAiCS-HMM Analysis of ChIP-seq Data with mosaics Package: MOSAiCS and MOSAiCS-HMM Dongjun Chung 1, Pei Fen Kuan 2, Rene Welch 3, and Sündüz Keleş 3,4 1 Department of Public Health Sciences, Medical University of

More information

Package mosaics. April 23, 2016

Package mosaics. April 23, 2016 Type Package Package mosaics April 23, 2016 Title MOSAiCS (MOdel-based one and two Sample Analysis and Inference for ChIP-Seq) Version 2.4.1 Depends R (>= 3.0.0), methods, graphics, Rcpp Imports MASS,

More information

Package mosaics. July 3, 2018

Package mosaics. July 3, 2018 Type Package Package mosaics July 3, 2018 Title MOSAiCS (MOdel-based one and two Sample Analysis and Inference for ChIP-Seq) Version 2.19.0 Depends R (>= 3.0.0), methods, graphics, Rcpp Imports MASS, splines,

More information

User Manual of ChIP-BIT 2.0

User Manual of ChIP-BIT 2.0 User Manual of ChIP-BIT 2.0 Xi Chen September 16, 2015 Introduction Bayesian Inference of Target genes using ChIP-seq profiles (ChIP-BIT) is developed to reliably detect TFBSs and their target genes by

More information

User's guide to ChIP-Seq applications: command-line usage and option summary

User's guide to ChIP-Seq applications: command-line usage and option summary User's guide to ChIP-Seq applications: command-line usage and option summary 1. Basics about the ChIP-Seq Tools The ChIP-Seq software provides a set of tools performing common genome-wide ChIPseq analysis

More information

PICS: Probabilistic Inference for ChIP-Seq

PICS: Probabilistic Inference for ChIP-Seq PICS: Probabilistic Inference for ChIP-Seq Xuekui Zhang * and Raphael Gottardo, Arnaud Droit and Renan Sauteraud April 30, 2018 A step-by-step guide in the analysis of ChIP-Seq data using the PICS package

More information

EBSeqHMM: An R package for identifying gene-expression changes in ordered RNA-seq experiments

EBSeqHMM: An R package for identifying gene-expression changes in ordered RNA-seq experiments EBSeqHMM: An R package for identifying gene-expression changes in ordered RNA-seq experiments Ning Leng and Christina Kendziorski April 30, 2018 Contents 1 Introduction 1 2 The model 2 2.1 EBSeqHMM model..........................................

More information

Protocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification data

Protocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification data Protocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification data Table of Contents Protocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification

More information

HIPPIE User Manual. (v0.0.2-beta, 2015/4/26, Yih-Chii Hwang, yihhwang [at] mail.med.upenn.edu)

HIPPIE User Manual. (v0.0.2-beta, 2015/4/26, Yih-Chii Hwang, yihhwang [at] mail.med.upenn.edu) HIPPIE User Manual (v0.0.2-beta, 2015/4/26, Yih-Chii Hwang, yihhwang [at] mail.med.upenn.edu) OVERVIEW OF HIPPIE o Flowchart of HIPPIE o Requirements PREPARE DIRECTORY STRUCTURE FOR HIPPIE EXECUTION o

More information

ChIP-Seq Tutorial on Galaxy

ChIP-Seq Tutorial on Galaxy 1 Introduction ChIP-Seq Tutorial on Galaxy 2 December 2010 (modified April 6, 2017) Rory Stark The aim of this practical is to give you some experience handling ChIP-Seq data. We will be working with data

More information

ChIP-seq hands-on practical using Galaxy

ChIP-seq hands-on practical using Galaxy ChIP-seq hands-on practical using Galaxy In this exercise we will cover some of the basic NGS analysis steps for ChIP-seq using the Galaxy framework: Quality control Mapping of reads using Bowtie2 Peak-calling

More information

CLC Server. End User USER MANUAL

CLC Server. End User USER MANUAL CLC Server End User USER MANUAL Manual for CLC Server 10.0.1 Windows, macos and Linux March 8, 2018 This software is for research purposes only. QIAGEN Aarhus Silkeborgvej 2 Prismet DK-8000 Aarhus C Denmark

More information

Analysis of ChIP-seq data

Analysis of ChIP-seq data Before we start: 1. Log into tak (step 0 on the exercises) 2. Go to your lab space and create a folder for the class (see separate hand out) 3. Connect to your lab space through the wihtdata network and

More information

Triform: peak finding in ChIP-Seq enrichment profiles for transcription factors

Triform: peak finding in ChIP-Seq enrichment profiles for transcription factors Triform: peak finding in ChIP-Seq enrichment profiles for transcription factors Karl Kornacker * and Tony Håndstad October 30, 2018 A guide for using the Triform algorithm to predict transcription factor

More information

Package BAC. R topics documented: November 30, 2018

Package BAC. R topics documented: November 30, 2018 Type Package Title Bayesian Analysis of Chip-chip experiment Version 1.42.0 Date 2007-11-20 Author Raphael Gottardo Package BAC November 30, 2018 Maintainer Raphael Gottardo Depends

More information

ChIP-seq (NGS) Data Formats

ChIP-seq (NGS) Data Formats ChIP-seq (NGS) Data Formats Biological samples Sequence reads SRA/SRF, FASTQ Quality control SAM/BAM/Pileup?? Mapping Assembly... DE Analysis Variant Detection Peak Calling...? Counts, RPKM VCF BED/narrowPeak/

More information

Package polyphemus. February 15, 2013

Package polyphemus. February 15, 2013 Package polyphemus February 15, 2013 Type Package Title Genome wide analysis of RNA Polymerase II-based ChIP Seq data. Version 0.3.4 Date 2010-08-11 Author Martial Sankar, supervised by Marco Mendoza and

More information

m6aviewer Version Documentation

m6aviewer Version Documentation m6aviewer Version 1.6.0 Documentation Contents 1. About 2. Requirements 3. Launching m6aviewer 4. Running Time Estimates 5. Basic Peak Calling 6. Running Modes 7. Multiple Samples/Sample Replicates 8.

More information

Package BayesPeak. R topics documented: September 21, Version Date Title Bayesian Analysis of ChIP-seq Data

Package BayesPeak. R topics documented: September 21, Version Date Title Bayesian Analysis of ChIP-seq Data Package BayesPeak September 21, 2014 Version 1.16.0 Date 2009-11-04 Title Bayesian Analysis of ChIP-seq Data Author Christiana Spyrou, Jonathan Cairns, Rory Stark, Andy Lynch,Simon Tavar\{}\{}'{e}, Maintainer

More information

Expander 7.2 Online Documentation

Expander 7.2 Online Documentation Expander 7.2 Online Documentation Introduction... 2 Starting EXPANDER... 2 Input Data... 3 Tabular Data File... 4 CEL Files... 6 Working on similarity data no associated expression data... 9 Working on

More information

A short Introduction to UCSC Genome Browser

A short Introduction to UCSC Genome Browser A short Introduction to UCSC Genome Browser Elodie Girard, Nicolas Servant Institut Curie/INSERM U900 Bioinformatics, Biostatistics, Epidemiology and computational Systems Biology of Cancer 1 Why using

More information

Package GreyListChIP

Package GreyListChIP Type Package Version 1.14.0 Package GreyListChIP November 2, 2018 Title Grey Lists -- Mask Artefact Regions Based on ChIP Inputs Date 2018-01-21 Author Gord Brown Maintainer Gordon

More information

The ChIP-seq quality Control package ChIC: A short introduction

The ChIP-seq quality Control package ChIC: A short introduction The ChIP-seq quality Control package ChIC: A short introduction April 30, 2018 Abstract The ChIP-seq quality Control package (ChIC) provides functions and data structures to assess the quality of ChIP-seq

More information

Working with ChIP-Seq Data in R/Bioconductor

Working with ChIP-Seq Data in R/Bioconductor Working with ChIP-Seq Data in R/Bioconductor Suraj Menon, Tom Carroll, Shamith Samarajiwa September 3, 2014 Contents 1 Introduction 1 2 Working with aligned data 1 2.1 Reading in data......................................

More information

ChIP-seq practical: peak detection and peak annotation. Mali Salmon-Divon Remco Loos Myrto Kostadima

ChIP-seq practical: peak detection and peak annotation. Mali Salmon-Divon Remco Loos Myrto Kostadima ChIP-seq practical: peak detection and peak annotation Mali Salmon-Divon Remco Loos Myrto Kostadima March 2012 Introduction The goal of this hands-on session is to perform some basic tasks in the analysis

More information

EBSeq: An R package for differential expression analysis using RNA-seq data

EBSeq: An R package for differential expression analysis using RNA-seq data EBSeq: An R package for differential expression analysis using RNA-seq data Ning Leng, John A. Dawson, Christina Kendziorski October 9, 2012 Contents 1 Introduction 2 2 The Model 3 2.1 Two conditions............................

More information

Chromatin signature discovery via histone modification profile alignments Jianrong Wang, Victoria V. Lunyak and I. King Jordan

Chromatin signature discovery via histone modification profile alignments Jianrong Wang, Victoria V. Lunyak and I. King Jordan Supplementary Information for: Chromatin signature discovery via histone modification profile alignments Jianrong Wang, Victoria V. Lunyak and I. King Jordan Contents Instructions for installing and running

More information

Agilent Genomic Workbench Lite Edition 6.5

Agilent Genomic Workbench Lite Edition 6.5 Agilent Genomic Workbench Lite Edition 6.5 SureSelect Quality Analyzer User Guide For Research Use Only. Not for use in diagnostic procedures. Agilent Technologies Notices Agilent Technologies, Inc. 2010

More information

Analyzing ChIP- Seq Data in Galaxy

Analyzing ChIP- Seq Data in Galaxy Analyzing ChIP- Seq Data in Galaxy Lauren Mills RISS ABSTRACT Step- by- step guide to basic ChIP- Seq analysis using the Galaxy platform. Table of Contents Introduction... 3 Links to helpful information...

More information

Package RAPIDR. R topics documented: February 19, 2015

Package RAPIDR. R topics documented: February 19, 2015 Package RAPIDR February 19, 2015 Title Reliable Accurate Prenatal non-invasive Diagnosis R package Package to perform non-invasive fetal testing for aneuploidies using sequencing count data from cell-free

More information

Welcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page.

Welcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page. Welcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page. In this page you will learn to use the tools of the MAPHiTS suite. A little advice before starting : rename your

More information

ChIP-seq Analysis Practical

ChIP-seq Analysis Practical ChIP-seq Analysis Practical Vladimir Teif (vteif@essex.ac.uk) An updated version of this document will be available at http://generegulation.info/index.php/teaching In this practical we will learn how

More information

Package SVM2CRM. R topics documented: May 1, Type Package

Package SVM2CRM. R topics documented: May 1, Type Package Type Package Package SVM2CRM May 1, 2018 Title SVM2CRM: support vector machine for cis-regulatory elements detections Version 1.12.0 Date 2014-03-30 Description Detection of cis-regulatory elements using

More information

R topics documented: 2 R topics documented:

R topics documented: 2 R topics documented: Package cn.mops March 20, 2019 Maintainer Guenter Klambauer Author Guenter Klambauer License LGPL (>= 2.0) Type Package Title cn.mops - Mixture of Poissons for CNV detection in

More information

Package ChIC. R topics documented: May 4, Title Quality Control Pipeline for ChIP-Seq Data Version Author Carmen Maria Livi

Package ChIC. R topics documented: May 4, Title Quality Control Pipeline for ChIP-Seq Data Version Author Carmen Maria Livi Title Quality Control Pipeline for ChIP-Seq Data Version 1.0.0 Author Carmen Maria Livi Package ChIC May 4, 2018 Maintainer Carmen Maria Livi Quality control pipeline for ChIP-seq

More information

KC-SMART Vignette. Jorma de Ronde, Christiaan Klijn April 30, 2018

KC-SMART Vignette. Jorma de Ronde, Christiaan Klijn April 30, 2018 KC-SMART Vignette Jorma de Ronde, Christiaan Klijn April 30, 2018 1 Introduction In this Vignette we demonstrate the usage of the KCsmart R package. This R implementation is based on the original implementation

More information

Rsubread package: high-performance read alignment, quantification and mutation discovery

Rsubread package: high-performance read alignment, quantification and mutation discovery Rsubread package: high-performance read alignment, quantification and mutation discovery Wei Shi 14 September 2015 1 Introduction This vignette provides a brief description to the Rsubread package. For

More information

Package rgreat. June 19, 2018

Package rgreat. June 19, 2018 Type Package Title Client for GREAT Analysis Version 1.12.0 Date 2018-2-20 Author Zuguang Gu Maintainer Zuguang Gu Package rgreat June 19, 2018 Depends R (>= 3.1.2), GenomicRanges, IRanges,

More information

Tutorial for the Exon Ontology website

Tutorial for the Exon Ontology website Tutorial for the Exon Ontology website Table of content Outline Step-by-step Guide 1. Preparation of the test-list 2. First analysis step (without statistical analysis) 2.1. The output page is composed

More information

Rsubread package: high-performance read alignment, quantification and mutation discovery

Rsubread package: high-performance read alignment, quantification and mutation discovery Rsubread package: high-performance read alignment, quantification and mutation discovery Wei Shi 14 September 2015 1 Introduction This vignette provides a brief description to the Rsubread package. For

More information

Easy visualization of the read coverage using the CoverageView package

Easy visualization of the read coverage using the CoverageView package Easy visualization of the read coverage using the CoverageView package Ernesto Lowy European Bioinformatics Institute EMBL June 13, 2018 > options(width=40) > library(coverageview) 1 Introduction This

More information

SICER 1.1. If you use SICER to analyze your data in a published work, please cite the above paper in the main text of your publication.

SICER 1.1. If you use SICER to analyze your data in a published work, please cite the above paper in the main text of your publication. SICER 1.1 1. Introduction For details description of the algorithm, please see A clustering approach for identification of enriched domains from histone modification ChIP-Seq data Chongzhi Zang, Dustin

More information

QIAseq DNA V3 Panel Analysis Plugin USER MANUAL

QIAseq DNA V3 Panel Analysis Plugin USER MANUAL QIAseq DNA V3 Panel Analysis Plugin USER MANUAL User manual for QIAseq DNA V3 Panel Analysis 1.0.1 Windows, Mac OS X and Linux January 25, 2018 This software is for research purposes only. QIAGEN Aarhus

More information

ASAP - Allele-specific alignment pipeline

ASAP - Allele-specific alignment pipeline ASAP - Allele-specific alignment pipeline Jan 09, 2012 (1) ASAP - Quick Reference ASAP needs a working version of Perl and is run from the command line. Furthermore, Bowtie needs to be installed on your

More information

Automated Bioinformatics Analysis System on Chip ABASOC. version 1.1

Automated Bioinformatics Analysis System on Chip ABASOC. version 1.1 Automated Bioinformatics Analysis System on Chip ABASOC version 1.1 Phillip Winston Miller, Priyam Patel, Daniel L. Johnson, PhD. University of Tennessee Health Science Center Office of Research Molecular

More information

Copyright 2014 Regents of the University of Minnesota

Copyright 2014 Regents of the University of Minnesota Quality Control of Illumina Data using Galaxy Contents September 16, 2014 1 Introduction 2 1.1 What is Galaxy?..................................... 2 1.2 Galaxy at MSI......................................

More information

Correlation Motif Vignette

Correlation Motif Vignette Correlation Motif Vignette Hongkai Ji, Yingying Wei October 30, 2018 1 Introduction The standard algorithms for detecting differential genes from microarray data are mostly designed for analyzing a single

More information

Practical 4: ChIP-seq Peak calling

Practical 4: ChIP-seq Peak calling Practical 4: ChIP-seq Peak calling Shamith Samarajiiwa, Dora Bihary September 2017 Contents 1 Calling ChIP-seq peaks using MACS2 1 1.1 Assess the quality of the aligned datasets..................................

More information

Introduction to Galaxy

Introduction to Galaxy Introduction to Galaxy Dr Jason Wong Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW Day 1 Thurs 28 th January 2016 Overview What is Galaxy? Description of

More information

Design and Annotation Files

Design and Annotation Files Design and Annotation Files Release Notes SeqCap EZ Exome Target Enrichment System The design and annotation files provide information about genomic regions covered by the capture probes and the genes

More information

BadRegionFinder an R/Bioconductor package for identifying regions with bad coverage

BadRegionFinder an R/Bioconductor package for identifying regions with bad coverage BadRegionFinder an R/Bioconductor package for identifying regions with bad coverage Sarah Sandmann April 30, 2018 Contents 1 Introduction 1 1.1 Loading the package.................................... 2

More information

Package saascnv. May 18, 2016

Package saascnv. May 18, 2016 Version 0.3.4 Date 2016-05-10 Package saascnv May 18, 2016 Title Somatic Copy Number Alteration Analysis Using Sequencing and SNP Array Data Author Zhongyang Zhang [aut, cre], Ke Hao [aut], Nancy R. Zhang

More information

Running SNAP. The SNAP Team October 2012

Running SNAP. The SNAP Team October 2012 Running SNAP The SNAP Team October 2012 1 Introduction SNAP is a tool that is intended to serve as the read aligner in a gene sequencing pipeline. Its theory of operation is described in Faster and More

More information

Advanced UCSC Browser Functions

Advanced UCSC Browser Functions Advanced UCSC Browser Functions Dr. Thomas Randall tarandal@email.unc.edu bioinformatics.unc.edu UCSC Browser: genome.ucsc.edu Overview Custom Tracks adding your own datasets Utilities custom tools for

More information

Stats with R and RStudio Practical: basic stats for peak calling Jacques van Helden, Hugo varet and Julie Aubert

Stats with R and RStudio Practical: basic stats for peak calling Jacques van Helden, Hugo varet and Julie Aubert Stats with R and RStudio Practical: basic stats for peak calling Jacques van Helden, Hugo varet and Julie Aubert 2017-01-08 Contents Introduction 2 Peak-calling: question...........................................

More information

The list of data and results files that have been used for the analysis can be found at:

The list of data and results files that have been used for the analysis can be found at: BASIC CHIP-SEQ AND SSA TUTORIAL In this tutorial we are going to illustrate the capabilities of the ChIP-Seq server using data from an early landmark paper on STAT1 binding sites in γ-interferon stimulated

More information

Package enrich. September 3, 2013

Package enrich. September 3, 2013 Package enrich September 3, 2013 Type Package Title An R package for the analysis of multiple ChIP-seq data Version 2.0 Date 2013-09-01 Author Yanchun Bao , Veronica Vinciotti

More information

SFDR (Stratified False Discovery Rate) Software Documentation. Version 1.6 Feb 7, 2010

SFDR (Stratified False Discovery Rate) Software Documentation. Version 1.6 Feb 7, 2010 SFDR (Stratified False Discovery Rate) Software Documentation 1. Overview of the methods Version 1.6 Feb 7, 2010 Yun Joo Yoo, Shelley B. Bull, Andrew D.Paterson, Daryl Waggott, Lei Sun FDR, SFDR, WFDR,

More information

ChIP-seq Analysis. BaRC Hot Topics - March 21 st 2017 Bioinformatics and Research Computing Whitehead Institute.

ChIP-seq Analysis. BaRC Hot Topics - March 21 st 2017 Bioinformatics and Research Computing Whitehead Institute. ChIP-seq Analysis BaRC Hot Topics - March 21 st 2017 Bioinformatics and Research Computing Whitehead Institute http://barc.wi.mit.edu/hot_topics/ Outline ChIP-seq overview Experimental design Quality control/preprocessing

More information

Copyright 2014 Regents of the University of Minnesota

Copyright 2014 Regents of the University of Minnesota Quality Control of Illumina Data using Galaxy August 18, 2014 Contents 1 Introduction 2 1.1 What is Galaxy?..................................... 2 1.2 Galaxy at MSI......................................

More information

BiTrinA. Tamara J. Blätte, Florian Schmid, Stefan Mundus, Ludwig Lausser, Martin Hopfensitz, Christoph Müssel and Hans A. Kestler.

BiTrinA. Tamara J. Blätte, Florian Schmid, Stefan Mundus, Ludwig Lausser, Martin Hopfensitz, Christoph Müssel and Hans A. Kestler. BiTrinA Tamara J. Blätte, Florian Schmid, Stefan Mundus, Ludwig Lausser, Martin Hopfensitz, Christoph Müssel and Hans A. Kestler January 29, 2017 Contents 1 Introduction 2 2 Methods in the package 2 2.1

More information

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,

More information

All About PlexSet Technology Data Analysis in nsolver Software

All About PlexSet Technology Data Analysis in nsolver Software All About PlexSet Technology Data Analysis in nsolver Software PlexSet is a multiplexed gene expression technology which allows pooling of up to 8 samples per ncounter cartridge lane, enabling users to

More information

Package MethylSeekR. January 26, 2019

Package MethylSeekR. January 26, 2019 Type Package Title Segmentation of Bis-seq data Version 1.22.0 Date 2014-7-1 Package MethylSeekR January 26, 2019 Author Lukas Burger, Dimos Gaidatzis, Dirk Schubeler and Michael Stadler Maintainer Lukas

More information

panda Documentation Release 1.0 Daniel Vera

panda Documentation Release 1.0 Daniel Vera panda Documentation Release 1.0 Daniel Vera February 12, 2014 Contents 1 mat.make 3 1.1 Usage and option summary....................................... 3 1.2 Arguments................................................

More information

Running SNAP. The SNAP Team February 2012

Running SNAP. The SNAP Team February 2012 Running SNAP The SNAP Team February 2012 1 Introduction SNAP is a tool that is intended to serve as the read aligner in a gene sequencing pipeline. Its theory of operation is described in Faster and More

More information

Package PICS. R topics documented: February 4, Type Package Title Probabilistic inference of ChIP-seq Version

Package PICS. R topics documented: February 4, Type Package Title Probabilistic inference of ChIP-seq Version Type Package Title Probabilistic inference of ChIP-seq Version 2.22.0 Package PICS February 4, 2018 Author Xuekui Zhang , Raphael Gottardo Maintainer Renan Sauteraud

More information

Package MIRA. October 16, 2018

Package MIRA. October 16, 2018 Version 1.2.0 Date 2018-04-17 Package MIRA October 16, 2018 Title Methylation-Based Inference of Regulatory Activity MIRA measures the degree of ``dip'' in methylation level surrounding a regulatory site

More information

Introduction to the TSSi package: Identification of Transcription Start Sites

Introduction to the TSSi package: Identification of Transcription Start Sites Introduction to the TSSi package: Identification of Transcription Start Sites Julian Gehring, Clemens Kreutz October 30, 2017 Along with the advances in high-throughput sequencing, the detection of transcription

More information

You will be re-directed to the following result page.

You will be re-directed to the following result page. ENCODE Element Browser Goal: to navigate the candidate DNA elements predicted by the ENCODE consortium, including gene expression, DNase I hypersensitive sites, TF binding sites, and candidate enhancers/promoters.

More information

Tutorial. OTU Clustering Step by Step. Sample to Insight. June 28, 2018

Tutorial. OTU Clustering Step by Step. Sample to Insight. June 28, 2018 OTU Clustering Step by Step June 28, 2018 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com ts-bioinformatics@qiagen.com

More information

ChIP-seq hands-on practical using Galaxy

ChIP-seq hands-on practical using Galaxy ChIP-seq hands-on practical using Galaxy In this exercise we will cover some of the basic NGS analysis steps for ChIP-seq using the Galaxy framework: Quality control Mapping of reads using Bowtie2 Peak-calling

More information

Part 1: How to use IGV to visualize variants

Part 1: How to use IGV to visualize variants Using IGV to identify true somatic variants from the false variants http://www.broadinstitute.org/igv A FAQ, sample files and a user guide are available on IGV website If you use IGV in your publication:

More information

Click on "+" button Select your VCF data files (see #Input Formats->1 above) Remove file from files list:

Click on + button Select your VCF data files (see #Input Formats->1 above) Remove file from files list: CircosVCF: CircosVCF is a web based visualization tool of genome-wide variant data described in VCF files using circos plots. The provided visualization capabilities, gives a broad overview of the genomic

More information

Package CopywriteR. April 20, 2018

Package CopywriteR. April 20, 2018 Type Package Package CopywriteR April 20, 2018 Title Copy number information from targeted sequencing using off-target reads Version 2.11.0 Date 2016-02-23 Author Thomas Kuilman Maintainer Thomas Kuilman

More information

USING BRAT-BW Table 1. Feature comparison of BRAT-bw, BRAT-large, Bismark and BS Seeker (as of on March, 2012)

USING BRAT-BW Table 1. Feature comparison of BRAT-bw, BRAT-large, Bismark and BS Seeker (as of on March, 2012) USING BRAT-BW-2.0.1 BRAT-bw is a tool for BS-seq reads mapping, i.e. mapping of bisulfite-treated sequenced reads. BRAT-bw is a part of BRAT s suit. Therefore, input and output formats for BRAT-bw are

More information

CHAPTER 6 RESULTS AND DISCUSSIONS

CHAPTER 6 RESULTS AND DISCUSSIONS 151 CHAPTER 6 RESULTS AND DISCUSSIONS In this chapter the performance of the personal identification system on the PolyU database is presented. The database for both Palmprint and Finger Knuckle Print

More information

Package icnv. R topics documented: March 8, Title Integrated Copy Number Variation detection Version Author Zilu Zhou, Nancy Zhang

Package icnv. R topics documented: March 8, Title Integrated Copy Number Variation detection Version Author Zilu Zhou, Nancy Zhang Title Integrated Copy Number Variation detection Version 1.2.1 Author Zilu Zhou, Nancy Zhang Package icnv March 8, 2019 Maintainer Zilu Zhou Integrative copy number variation

More information

seqcna: A Package for Copy Number Analysis of High-Throughput Sequencing Cancer DNA

seqcna: A Package for Copy Number Analysis of High-Throughput Sequencing Cancer DNA seqcna: A Package for Copy Number Analysis of High-Throughput Sequencing Cancer DNA David Mosen-Ansorena 1 October 30, 2017 Contents 1 Genome Analysis Platform CIC biogune and CIBERehd dmosen.gn@cicbiogune.es

More information

Customizable information fields (or entries) linked to each database level may be replicated and summarized to upstream and downstream levels.

Customizable information fields (or entries) linked to each database level may be replicated and summarized to upstream and downstream levels. Manage. Analyze. Discover. NEW FEATURES BioNumerics Seven comes with several fundamental improvements and a plethora of new analysis possibilities with a strong focus on user friendliness. Among the most

More information

Package GOTHiC. November 22, 2017

Package GOTHiC. November 22, 2017 Title Binomial test for Hi-C data analysis Package GOTHiC November 22, 2017 This is a Hi-C analysis package using a cumulative binomial test to detect interactions between distal genomic loci that have

More information

methylmnm Tutorial Yan Zhou, Bo Zhang, Nan Lin, BaoXue Zhang and Ting Wang January 14, 2013

methylmnm Tutorial Yan Zhou, Bo Zhang, Nan Lin, BaoXue Zhang and Ting Wang January 14, 2013 methylmnm Tutorial Yan Zhou, Bo Zhang, Nan Lin, BaoXue Zhang and Ting Wang January 14, 2013 Contents 1 Introduction 1 2 Preparations 2 3 Data format 2 4 Data Pre-processing 3 4.1 CpG number of each bin.......................

More information

ITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013

ITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013 ITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013 1. Data and objectives We will use the data from GEO (GSE35368, Toedling, Servant et al. 2011). Two samples were

More information

ChIP-seq Analysis. BaRC Hot Topics - Feb 23 th 2016 Bioinformatics and Research Computing Whitehead Institute.

ChIP-seq Analysis. BaRC Hot Topics - Feb 23 th 2016 Bioinformatics and Research Computing Whitehead Institute. ChIP-seq Analysis BaRC Hot Topics - Feb 23 th 2016 Bioinformatics and Research Computing Whitehead Institute http://barc.wi.mit.edu/hot_topics/ Outline ChIP-seq overview Experimental design Quality control/preprocessing

More information

The software comes with 2 installers: (1) SureCall installer (2) GenAligners (contains BWA, BWA- MEM).

The software comes with 2 installers: (1) SureCall installer (2) GenAligners (contains BWA, BWA- MEM). Release Notes Agilent SureCall 4.0 Product Number G4980AA SureCall Client 6-month named license supports installation of one client and server (to host the SureCall database) on one machine. For additional

More information

Eval: A Gene Set Comparison System

Eval: A Gene Set Comparison System Masters Project Report Eval: A Gene Set Comparison System Evan Keibler evan@cse.wustl.edu Table of Contents Table of Contents... - 2 - Chapter 1: Introduction... - 5-1.1 Gene Structure... - 5-1.2 Gene

More information

Step-by-Step Guide to Relatedness and Association Mapping Contents

Step-by-Step Guide to Relatedness and Association Mapping Contents Step-by-Step Guide to Relatedness and Association Mapping Contents OBJECTIVES... 2 INTRODUCTION... 2 RELATEDNESS MEASURES... 2 POPULATION STRUCTURE... 6 Q-K ASSOCIATION ANALYSIS... 10 K MATRIX COMPRESSION...

More information

Identiyfing splice junctions from RNA-Seq data

Identiyfing splice junctions from RNA-Seq data Identiyfing splice junctions from RNA-Seq data Joseph K. Pickrell pickrell@uchicago.edu October 4, 2010 Contents 1 Motivation 2 2 Identification of potential junction-spanning reads 2 3 Calling splice

More information

Table of contents Genomatix AG 1

Table of contents Genomatix AG 1 Table of contents! Introduction! 3 Getting started! 5 The Genome Browser window! 9 The toolbar! 9 The general annotation tracks! 12 Annotation tracks! 13 The 'Sequence' track! 14 The 'Position' track!

More information

Agilent Genomic Workbench 6.5

Agilent Genomic Workbench 6.5 Agilent Genomic Workbench 6.5 Product Overview Guide For Research Use Only. Not for use in diagnostic procedures. Agilent Technologies Notices Agilent Technologies, Inc. 2010, 2015 No part of this manual

More information

Some Basic ChIP-Seq Data Analysis

Some Basic ChIP-Seq Data Analysis Some Basic ChIP-Seq Data Analysis July 28, 2009 Our goal is to describe the use of Bioconductor software to perform some basic tasks in the analysis of ChIP-Seq data. We will use several functions in the

More information

Package les. October 4, 2013

Package les. October 4, 2013 Package les October 4, 2013 Type Package Title Identifying Differential Effects in Tiling Microarray Data Version 1.10.0 Date 2011-10-10 Author Julian Gehring, Clemens Kreutz, Jens Timmer Maintainer Julian

More information

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg High-throughput sequencing: Alignment and related topic Simon Anders EMBL Heidelberg Established platforms HTS Platforms Illumina HiSeq, ABI SOLiD, Roche 454 Newcomers: Benchtop machines 454 GS Junior,

More information

CisGenome User s Manual

CisGenome User s Manual CisGenome User s Manual 1. Overview 1.1 Basic Framework of CisGenome 1.2 Installation 1.3 Summary of All Functions 1.4 A Quick Start Analysis of a ChIP-chip Experiment 2. Genomics Toolbox I Establishing

More information

CNV-seq Manual. Xie Chao. May 26, 2011

CNV-seq Manual. Xie Chao. May 26, 2011 CNV-seq Manual Xie Chao May 26, 20 Introduction acgh CNV-seq Test genome X Genomic fragments Reference genome Y Test genome X Genomic fragments Reference genome Y 2 Sampling & sequencing Whole genome microarray

More information

Peak Finder Meta Server (PFMS) User Manual. Computational & Systems Biology, ICM Uppsala University

Peak Finder Meta Server (PFMS) User Manual. Computational & Systems Biology, ICM Uppsala University Peak Finder Meta Server (PFMS) 1.1 Overview User Manual Computational & Systems Biology, ICM Uppsala University http://www.icm.uu.se/ PFMS is a free software application which identifies genome wide transcription

More information

Demultiplexing Illumina sequencing data containing unique molecular indexes (UMIs)

Demultiplexing Illumina sequencing data containing unique molecular indexes (UMIs) next generation sequencing analysis guidelines Demultiplexing Illumina sequencing data containing unique molecular indexes (UMIs) See what more we can do for you at www.idtdna.com. For Research Use Only

More information

General Help & Instructions to use with Examples

General Help & Instructions to use with Examples General Help & Instructions to use with Examples Contents Types of Searches and their Purposes... 2 Basic Search:... 2 Advance search option... 6 List Search:... 7 Details Page... 8 Results Grid functionalities:...

More information