Package GOTHiC. November 22, 2017

Similar documents
Package RAPIDR. R topics documented: February 19, 2015

Package GreyListChIP

Basic4Cseq: an R/Bioconductor package for the analysis of 4C-seq data

Basic4Cseq: an R/Bioconductor package for the analysis of 4C-seq data

Package FitHiC. October 16, 2018

Package bnbc. R topics documented: April 1, Version 1.4.0

Package MIRA. October 16, 2018

segmentseq: methods for detecting methylation loci and differential methylation

Package PlasmaMutationDetector

segmentseq: methods for detecting methylation loci and differential methylation

Package HTSeqGenie. April 16, 2019

Package MethylSeekR. January 26, 2019

Package mosaics. July 3, 2018

Package PGA. July 6, 2018

Package FunciSNP. November 16, 2018

Package DNaseR. November 1, 2013

RAPIDR. Kitty Lo. November 20, Intended use of RAPIDR 1. 2 Create binned counts file from BAMs Masking... 1

Package GUIDEseq. October 12, 2016

Package dmrseq. September 14, 2018

Package roar. August 31, 2018

Package seqcat. March 25, 2019

Package metagene. November 1, 2018

Package scruff. November 6, 2018

Package CopywriteR. April 20, 2018

Package GUIDEseq. April 23, 2016

Package h5vc. R topics documented: October 2, Type Package

Package LOLA. December 24, 2018

Practical 4: ChIP-seq Peak calling

Package icnv. R topics documented: March 8, Title Integrated Copy Number Variation detection Version Author Zilu Zhou, Nancy Zhang

Package Basic4Cseq. August 24, 2018

Package customprodb. September 9, 2018

Package ChIC. R topics documented: May 4, Title Quality Control Pipeline for ChIP-Seq Data Version Author Carmen Maria Livi

An Introduction to VariantTools

MMDiff2: Statistical Testing for ChIP-Seq Data Sets

Package MEDIPS. September 22, 2014

Package multihiccompare

methylmnm Tutorial Yan Zhou, Bo Zhang, Nan Lin, BaoXue Zhang and Ting Wang January 14, 2013

Package NGScopy. April 10, 2015

Package methylmnm. January 14, 2013

Package EMDomics. November 11, 2018

Package mosaics. April 23, 2016

An Introduction to ShortRead

Package MEDIPS. August 13, 2018

Representing sequencing data in Bioconductor

HIPPIE User Manual. (v0.0.2-beta, 2015/4/26, Yih-Chii Hwang, yihhwang [at] mail.med.upenn.edu)

Package CorporaCoCo. R topics documented: November 23, 2017

R topics documented: 2 R topics documented:

Package AnnotationHub

Package PICS. R topics documented: February 4, Type Package Title Probabilistic inference of ChIP-seq Version

BadRegionFinder an R/Bioconductor package for identifying regions with bad coverage

Working with aligned nucleotides (WORK- IN-PROGRESS!)

Package DupChecker. April 11, 2018

ssviz: A small RNA-seq visualizer and analysis toolkit

Package MACPET. January 16, 2019

Package CAGEr. September 4, 2018

Package ChIPseeker. July 12, 2018

Package RTCGAToolbox

Package DMCHMM. January 19, 2019

Triform: peak finding in ChIP-Seq enrichment profiles for transcription factors

STREAMING FRAGMENT ASSIGNMENT FOR REAL-TIME ANALYSIS OF SEQUENCING EXPERIMENTS. Supplementary Figure 1

Some Basic ChIP-Seq Data Analysis

Package ArrayExpressHTS

Package wavcluster. January 23, 2019

Package geneslope. October 26, 2016

Package GUIDEseq. May 3, 2018

Package DNAshapeR. February 9, 2018

Package methylpipe. November 20, 2018

Visualisation, transformations and arithmetic operations for grouped genomic intervals

Package VSE. March 21, 2016

Package CNVrd2. March 18, 2019

Using the qrqc package to gather information about sequence qualities

PICS: Probabilistic Inference for ChIP-Seq

Package diffhic. October 9, 2015

Package EventPointer

Package MsatAllele. February 15, 2013

Package MiRaGE. October 16, 2018

Package SVM2CRM. R topics documented: May 1, Type Package

Package HMMcopy. September 6, 2011

Wave correction for arrays

Package SGSeq. December 31, 2018

Package cleanupdtseq

Package polyphemus. February 15, 2013

Package methimpute. August 31, 2018

The rtracklayer package

Package DRIMSeq. December 19, 2018

riboseqr Introduction Getting Data Workflow Example Thomas J. Hardcastle, Betty Y.W. Chung October 30, 2017

CNV-seq Manual. Xie Chao. May 26, 2011

Practical: Read Counting in RNA-seq

Package rmat. November 1, 2018

Introduction to the TSSi package: Identification of Transcription Start Sites

Package gmapr. January 22, 2019

IRanges and GenomicRanges An introduction

Package BayesPeak. R topics documented: September 21, Version Date Title Bayesian Analysis of ChIP-seq Data

Package Hapi. July 28, 2018

Package gsalib. R topics documented: February 20, Type Package. Title Utility Functions For GATK. Version 2.1.

Package M3D. March 21, 2019

Package igc. February 10, 2018

Package genotypeeval

Merge Conflicts p. 92 More GitHub Workflows: Forking and Pull Requests p. 97 Using Git to Make Life Easier: Working with Past Commits p.

Package NB. R topics documented: February 19, Type Package

Transcription:

Title Binomial test for Hi-C data analysis Package GOTHiC November 22, 2017 This is a Hi-C analysis package using a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments. It takes mapped paired NGS reads as input and gives back the list of significant interactions for a given bin size in the genome. Version 1.15.0 Date 2013-06-07 Author Borbala Mifsud and Robert Sugar Maintainer Borbala Mifsud <b.mifsud@qmul.ac.uk> Depends R (>= 2.15.1), methods, utils, GenomicRanges, Biostrings, BSgenome, data.table Imports BiocGenerics, S4Vectors (>= 0.9.38), IRanges, Rsamtools, ShortRead, rtracklayer, ggplot2 Suggests HiCDataLymphoblast Enhances parallel License GPL-3 biocviews Sequencing, Preprocessing, Epigenetics, HiC NeedsCompilation no R topics documented: filtered............................................ 2 GOTHiC........................................... 2 GOTHiChicup........................................ 4 mapreadstorestrictionsites................................ 6 pairreads.......................................... 7 Index 9 1

2 GOTHiC filtered A GenomicRangesList object used as an example in the GOTHiC package Format filtered is a GenomicRangesList example object used as an example for the binomialhic package. This GenomicRangesList contains reads from a human lymphoblastoid cell line HiC experiment (Lieberman-Aiden et al. 2009) for chr20, that were mapped to the genome, paired and PCR duplicate-filtered. data(lymphoid_chr20_paired_filtered) The format is: GenomicRangesList with 2 slots: $paired_reads_1 contains the coordinates for one end of the paired reads $paired_reads_2 contains the coordinates for the other end of the paired reads Borbala Gerle and Robert Sugar mapreadstorestrictionsites data(lymphoid_chr20_paired_filtered) GOTHiC Genome Organisation Through HiC GOTHiC performs a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments. It takes mapped paired NGS reads as input and gives back the list of significant interactions for a given bin size in the genome. GOTHiC(fileName1, filename2, samplename, res, BSgenomeName='BSgenome.Hsapiens.UCSC.hg19', genome=bsgenome.hsapiens.ucsc.hg19, restrictionsite='a^agctt', enzyme='hindiii',cistrans='all',filterdist=10000, DUPLICATETHRESHOLD=1, filetype='bam', parallel=false, cores=null)

GOTHiC 3 Arguments Value filename1 filename2 samplename res BSgenomeName File containing the mapped reads of the first fragment ends (BAM or Bowtie format) File containing the mapped reads of the second fragment ends (BAM or Bowtie format) A character string that will be used to name the exported BedGraph file containing the coverage, R object files with paired and mapped reads, and the final data frame with the results from the binomial test. They will be saved in the current directory. An integer that gives the required bin size or resolution of the contact map e.g. 1000000. A character string of the BSgenome package required to make the restriction fragment file containing information for both the organism the experiment was made in, and the genome version the reads were mapped to. The default is the current human genome build BSgenome.Hsapiens.UCSC.hg19. genome The BSgenome package required to make the restriction fragment file containing information for both the organism the experiment was made in, and the genome version the reads were mapped to. The default is the current human genome build BSgenome.Hsapiens.UCSC.hg19. restrictionsite A character string that specifies the enzymes recognition site, ^ indicating where the enzyme actually cuts. The default is the HindIII restriction site: A^AGCTT. enzyme cistrans A character string containing the name of the enzyme used during the Hi-C experiment (i.e. "HindIII", "NcoI"). The default is "HindIII". A character string with three possibilities. "all" runs the binomial test on all interactions, "cis" runs the binomial test only on intrachromosomal/cis interactions, "trans" runs the binomial test only on interchromosomal/trans interactions. filterdist An integer specifying the distance between the midpoint of fragments under which interactions are filtered out in order to filter for those read-pairs where the digestion was incomplete. The default is 10000. DUPLICATETHRESHOLD An integer specifying the maximum amount of duplicated paired-end reads allowed, over that value it is expected to be PCR bias. The default is 1. filetype parallel cores A data.frame containing elements A character string specifying the format of the aligned reads. The default is BAM. Other accepted format is Bowtie. Logical argument. If TRUE the mapping and the binomial test will be performed faster using multiple cores. The default is FALSE. An integer specifying the number of cores used in the parallel processing if parellel=true. The default is NULL. chr1 / chr2 chromosome(s) containing interacting regions 1 and 2 locus1 / locus2 start positions of the interacting regions 1 and 2 in the corresponding chromosome(s)

4 GOTHiChicup relcoverage1 / relcoverage2 relative coverage corresponding to regions 1 and 2 probability expected readcount pvalue expected frequency expected number of reads observed reads number binomial p-value qvalue binomial p-value corrected for multi-testing with Benjamini-Hochberg logobservedoverexpected observed/expected read numbers log ratio Borbala Mifsud and Robert Sugar binom.test, pairreads, mapreadstorestrictionsites library(gothic) dirpath <- system.file("extdata", package="hicdatalymphoblast") filename1 <- list.files(dirpath, full.names=true)[1] filename2 <- list.files(dirpath, full.names=true)[2] binom=gothic(filename1, filename2, samplename='lymphoid_chr20', res=1000000, BSgenomeName='BSgenome.Hsapiens.UCSC.hg18', genome=bsgenome.hsapiens.ucsc.hg18, restrictionsite='a^agctt', enzyme='hindiii',cistrans='all', filterdist=10000, DUPLICATETHRESHOLD=1, filetype='table', parallel=false, cores=null) GOTHiChicup Genome Organisation Through HiC from HiCUP output GOTHiChicup performs a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments. It takes mapped and filtered paired NGS reads from HiCUP as input and gives back the list of significant interactions for a given bin size in the genome. GOTHiChicup(fileName, samplename, res, restrictionfile, cistrans='all', parallel=false, cores=nul Arguments filename samplename A character string with the name of the file containing the mapped, filtered reads from HiCUP, after the default HiCUP output is converted to a table containing only the read ID, chromosome, start and end positions columns. Can be gzipped. (Tab separated text format) A character string that will be used to name the quality control plot. It will be saved in the current directory.

GOTHiChicup 5 res An integer that gives the required bin size or resolution of the contact map e.g. 1000000, for fragment level use 1. restrictionfile A character string with the name of the digest file from HiCUP. It is used to map reads to restriction fragments. (.txt file name) cistrans parallel cores A character string with three possibilities. "all" runs the binomial test on all interactions, "cis" runs the binomial test only on intrachromosomal/cis interactions, "trans" runs the binomial test only on interchromosomal/trans interactions. Logical argument. If TRUE the mapping and the binomial test will be performed faster using multiple cores. The default is FALSE. An integer specifying the number of cores used in the parallel processing if parellel=true. The default is NULL. Value A data.frame containing elements chr1 / chr2 chromosome(s) containing interacting regions 1 and 2 locus1 / locus2 start positions of the interacting regions 1 and 2 in the corresponding chromosome(s) relcoverage1 / relcoverage2 relative coverage corresponding to regions 1 and 2 probability expected readcount pvalue qvalue expected frequency expected number of reads observed reads number binomial p-value binomial p-value corrected for multi-testing with Benjamini-Hochberg logobservedoverexpected observed/expected read numbers log ratio Borbala Mifsud and Robert Sugar binom.test library(gothic) dirpath <- system.file("extdata", package="hicdatalymphoblast") filename <- list.files(dirpath, full.names=true)[4] restrictionfile <- list.files(dirpath, full.names=true)[3] binom=gothichicup(filename, samplename='lymphoid_chr20', res=1000000, restrictionfile, cistrans='all', parallel=false, cores=null)

6 mapreadstorestrictionsites mapreadstorestrictionsites Function to map aligned and paired reads to the restriction fragments This function takes mapped paired NGS reads in the format of a GenomicRangesList object where the two end of the reads are in the GenomicRanges paired_reads_1 and paired_reads_2. It prepares the digestion file from the genome supplied to it with the given restriction enzyme and specificity and maps the reads to the fragments. mapreadstorestrictionsites(pairedreadsfile, samplename, BSgenomeName, genome, restrictionsite, enzyme, parallel=f, cores=1) Arguments Value pairedreadsfile R object of GenomicRangesList containing paired_reads_1 and paired_reads_2 GenomicRanges with the paired mapped reads from a Hi-C experiment. samplename BSgenomeName A character string that will be used to name the exported R object file with the mapped reads containing a GenomicRangesList with slots locus1 and locus2. It will be saved in the current directory. A character string of the BSgenome package required to make the restriction fragment file containing information for both the organism the experiment was made in, and the genome version the reads were mapped to. The default is the current human genome build BSgenome.Hsapiens.UCSC.hg19. genome The BSgenome package required to make the restriction fragment file containing information for both the organism the experiment was made in, and the genome version the reads were mapped to. The default is the current human genome build BSgenome.Hsapiens.UCSC.hg19. restrictionsite A character string that specifies the enzymes recognition site, ^ indicating where the enzyme actually cuts. The default is the HindIII restriction site: A^AGCTT. enzyme parallel cores A GenomicRangesList A character string containing the name of the enzyme used during the Hi-C experiment (i.e. "HindIII", "NcoI"). The default is "HindIII". Logical argument. If TRUE the mapping will be performed faster using multiple cores. The default is FALSE. An integer specifying the number of cores used in the parallel processing if parellel=true. The default is 1. locus1 locus2 GenomicRanges with the coordinates of the start of the fragment where one end of the read mapped GenomicRanges with the coordinates of the start of the fragment where the other end of the read mapped

pairreads 7 Borbala Mifsud and Robert Sugar pairreads, GOTHiC library(gothic) data(lymphoid_chr20_paired_filtered) mapped=mapreadstorestrictionsites(filtered, samplename='lymphoid_chr20', BSgenomeName='BSgenome.Hsapiens.UCSC.hg18', genome=bsgenome.hsapiens.ucsc.hg18, restrictionsite='a^agctt', enzyme='hindiii', parallel=false, cores=1) pairreads Function pairs aligned paired NGS reads This function takes bowtie output files, pairs the reads, only keeps those where both ends mapped, filters for perfect duplicates to avoid PCR bias, and saves and returns a GenomicRangesList object that contains the paired_reads_1 and paired_reads_2 GenomicRanges with the paired reads pairreads(filename1, filename2, samplename, DUPLICATETHRESHOLD = 1, filetype='bam') Arguments Value filename1 filename2 File containing the mapped reads of the first fragment ends (BAM or Bowtie format) File containing the mapped reads of the second fragment ends (BAM or Bowtie format) samplename A character string that will be used to name the exported BedGraph file containing the coverage, and the R object file with paired reads. They will be saved in the current directory. DUPLICATETHRESHOLD An integer specifying the maximum amount of duplicated paired-end reads allowed, over that value it is expected to be PCR bias. The default is 1. filetype A GenomicRangesList called filtered A character string specifying the format of the aligned reads. The default is BAM. Other accepted format is Bowtie. paired_reads_1 GenomicRanges with the coordinates of where one end of the read mapped paired_reads_2 GenomicRanges with the coordinates of where the other end of the read mapped

8 pairreads Borbala Mifsud and Robert Sugar mapreadstorestrictionsites, GOTHiC library(gothic) dirpath <- system.file("extdata", package="hicdatalymphoblast") filename1 <- list.files(dirpath, full.names=true)[1] filename2 <- list.files(dirpath, full.names=true)[2] paired <- pairreads(filename1, filename2, samplename='lymphoid_chr20', DUPLICATETHRESHOLD = 1, filetype='table')

Index Topic datasets filtered, 2 Topic manip GOTHiC, 2 GOTHiChicup, 4 mapreadstorestrictionsites, 6 pairreads, 7 Topic package GOTHiC, 2 GOTHiChicup, 4 filtered, 2 GOTHiC, 2 GOTHiChicup, 4 lymphoid_chr20_paired_filtered (filtered), 2 mapreadstorestrictionsites, 6 pairreads, 7 9