Package dupradar. R topics documented: July 12, Type Package

Similar documents
Package SC3. September 29, 2018

Package rdgidb. November 10, 2018

Package scmap. March 29, 2019

Package scruff. November 6, 2018

Package seqcat. March 25, 2019

Package TMixClust. July 13, 2018

Package RNAseqNet. May 16, 2017

Package SC3. November 27, 2017

Package EventPointer

Package MIRA. October 16, 2018

Package multihiccompare

Package metagene. November 1, 2018

Package omicplotr. March 15, 2019

Package ChIC. R topics documented: May 4, Title Quality Control Pipeline for ChIP-Seq Data Version Author Carmen Maria Livi

Package GARS. March 17, 2019

Package dmrseq. September 14, 2018

Package SNPediaR. April 9, 2019

Package EMDomics. November 11, 2018

Package bacon. October 31, 2018

Package DMCHMM. January 19, 2019

Package DRIMSeq. December 19, 2018

Package biosigner. March 6, 2019

Package BASiNET. October 2, 2018

Package mfa. R topics documented: July 11, 2018

Package Linnorm. October 12, 2016

Package RNASeqR. January 8, 2019

Package HTSeqGenie. April 16, 2019

Package CoGAPS. January 24, 2019

Package RTCGAToolbox

Package ffpe. October 1, 2018

Package gpart. November 19, 2018

Package pgca. March 17, 2019

Package FitHiC. October 16, 2018

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package RTNduals. R topics documented: March 7, Type Package

Package IsoCorrectoR. R topics documented:

Package RmiR. R topics documented: September 26, 2018

Package phantasus. March 8, 2019

Package wavcluster. January 23, 2019

Package BDMMAcorrect

Package RcisTarget. July 17, 2018

Package bnbc. R topics documented: April 1, Version 1.4.0

Package Rsubread. July 21, 2013

The ChIP-seq quality Control package ChIC: A short introduction

Package geneslope. October 26, 2016

Package icnv. R topics documented: March 8, Title Integrated Copy Number Variation detection Version Author Zilu Zhou, Nancy Zhang

Package BgeeDB. January 5, 2019

Package svalues. July 15, 2018

Package rqt. November 21, 2017

Package LOLA. December 24, 2018

Package ECctmc. May 1, 2018

Package frma. R topics documented: March 8, Version Date Title Frozen RMA and Barcode

Package spade. R topics documented:

Package RNASeqBias. R topics documented: March 7, Type Package. Title Bias Detection and Correction in RNA-Sequencing Data. Version 1.

Package datasets.load

Package M3Drop. August 23, 2018

Package aop. December 5, 2016

Rsubread package: high-performance read alignment, quantification and mutation discovery

Package EDASeq. December 30, 2018

Package gifti. February 1, 2018

Package anota2seq. January 30, 2018

Package ccmap. June 17, 2018

Package igc. February 10, 2018

Package tximport. November 25, 2017

Rsubread package: high-performance read alignment, quantification and mutation discovery

Package ChIPseeker. July 12, 2018

Package mgc. April 13, 2018

Package SIMLR. December 1, 2017

Package OLScurve. August 29, 2016

Package roar. August 31, 2018

Package tximport. May 13, 2018

Package conumee. February 17, 2018

Package BEARscc. May 2, 2018

Package linkspotter. July 22, Type Package

Package PGA. July 6, 2018

Package sigqc. June 13, 2018

Package SCORPIUS. June 29, 2018

Package GRmetrics. R topics documented: September 27, Type Package

Package docxtools. July 6, 2018

Package NetWeaver. January 11, 2017

Package snm. July 20, 2018

Package preprosim. July 26, 2016

Package STROMA4. October 22, 2018

Package MSstatsQC. R topics documented: November 3, Version Type Package

Package MEAL. August 16, 2018

Package WhiteStripe. April 19, 2018

Package sparsenetgls

Package Numero. November 24, 2018

Package EFS. R topics documented:

Package decomptumor2sig

Package fastqcr. April 11, 2017

Package macorrplot. R topics documented: October 2, Title Visualize artificial correlation in microarray data. Version 1.50.

Package Repliscope. February 6, 2019

Package NanoStringQCPro

Package fastdummies. January 8, 2018

m6aviewer Version Documentation

Package M3D. March 21, 2019

Package customprodb. September 9, 2018

Package GCAI.bias. February 19, 2015

Package ggmosaic. February 9, 2017

Transcription:

Type Package Package dupradar July 12, 2018 Title Assessment of duplication rates in RNA-Seq datasets Version 1.10.0 Date 2015-09-26 Author Sergi Sayols <sergisayolspuig@gmail.com>, Holger Klein <holger.klein@gmail.com> Maintainer Sergi Sayols <sergisayolspuig@gmail.com>, Holger Klein <holger.klein@gmail.com> Duplication rate quality control for RNA-Seq datasets. VignetteBuilder knitr Suggests BiocStyle, knitr, rmarkdown, AnnotationHub Depends R (>= 3.2.0) Imports Rsubread (>= 1.14.1) LazyData true License GPL-3 biocviews Technology, Sequencing, RNASeq, QualityControl NeedsCompilation no git_url https://git.bioconductor.org/packages/dupradar git_branch RELEASE_3_7 git_last_commit 6eea570 git_last_commit_date 2018-04-30 Date/Publication 2018-07-12 R topics documented: analyzeduprates....................................... 2 bamutilmarkduplicates................................... 3 cumulativedupratebarplot................................. 4 dm.............................................. 4 dm.bad........................................... 5 dupradar.......................................... 5 dupradar_examples..................................... 5 duprateexpboxplot..................................... 6 1

2 analyzeduprates duprateexpdensplot.................................... 6 duprateexpfit........................................ 7 duprateexpidentify..................................... 8 duprateexpplot....................................... 9 expressionhist........................................ 10 getbinduplication...................................... 10 getbinrpkmean....................................... 11 getbin....................................... 11 getstats....................................... 12 getdynamicrange...................................... 12 getrpkbinreadcountfraction............................... 13 getrpkcumulativereadcountfraction........................... 13 markduplicates....................................... 14 picardmarkduplicates.................................... 15 readcountexpboxplot.................................... 15 Index 17 analyzeduprates Read in a BAM file and count the tags falling on the features described in the GTF file analyzeduprates returns a data.frame with tag counts analyzeduprates(bam, gtf, stranded = 0, paired = FALSE, threads = 1, verbose = FALSE,...) bam gtf stranded paired threads verbose The bam file containing the duplicate-marked reads The gtf file describing the features Whether the reads are strand specific Paired end experiment? The number of threads to be used for counting Whether to output Rsubread messages into the console... Other params sent to featurecounts This function makes use of the Rsubread package to count tags on the GTF features in different scenarios. The scenarios are the 4 possible combinations of allowing multimappers (yes/no) and duplicate reads (yes/no). A data.frame with counts on features, with and without taking into account multimappers/duplicated reads

bamutilmarkduplicates 3 bam <- system.file("extdata", "wgencodecaltechrnaseqgm12878r1x75dalignsrep2v2_duprm.bam", package="dupradar") gtf <- system.file("extdata","genes.gtf",package="dupradar") stranded <- 2 # '0' (unstranded), '1' (stranded) and '2' (reverse) paired <- FALSE threads <- 4 # call the duplicate marker and analyze the reads dm <- analyzeduprates(bam,gtf,stranded,paired,threads) bamutilmarkduplicates Mark duplicates using bamutil bamutilmarkduplicates Mark duplicated reads from a BAM file by calling bamutil bamutilmarkduplicates(bam, out, path, verbose) bam out path verbose The bam file to mark duplicates from Regular expression describing the transformation on the original filename to get the output filename. By default, a "_duprm" suffix is added before the bam extension Path to the duplicate marker binaries Redirect all the program output to the R console This function is supposed to be called through the markduplicates wrapper The return code of the system call

4 dm cumulativedupratebarplot Barplot showing the cumulative read counts fraction cumulativedupratebarplot Barplot showing the cumulative read counts fraction cumulativedupratebarplot(, stepsize = 0.05,...) stepsize The size of the windows used for plotting... Other params sent to barplot This function makes a barplot showing the cumulative read counts fraction from the duplication matrix calculated by analyzeduprates. nothing # call the plot cumulativedupratebarplot(=dm) dm Duplication matrix of a good RNASeq experiment A dataset containing the duplication matrix of a good RNASeq experiment, in terms of duplicates. Comes from the GM12878 SR1x75 replicate 2 from Caltech (UCSC s table Browser name: wgencodecaltechrnaseqgm12878r1x75dalignsrep2v2) data(dupradar_examples) Format A data frame with 23228 rows and 14 variables

dm.bad 5 dm.bad Duplication matrix of a failed RNASeq experiment A dataset containing the duplication matrix of a failed RNASeq experiment, containing unusual duplication rate. Comes from the HCT116 PE2x75 replicate 1 from Caltech (UCSC s table Browser name: wgencodecaltechrnaseqhct116r2x75il200alignsrep1v2) data(dupradar_examples) Format A data frame with 23228 rows and 14 variables dupradar dupradar. dupradar. dupradar_examples Example data containing precomputed matrices for two RNASeq experiments Precomputed duplication matrices for two RNASeq experiments used as examples of a good and a failed (in terms of high redundancy of reads) experiments. The experiments come from the EN- CODE project, as a source of a broad variety of protocols, library types and sequencing facilities. data(dupradar_examples) Format A list with two example duplication matrices

6 duprateexpdensplot duprateexpboxplot Duplication rate ~ total reads per kilobase (RPK) boxplot duprateexpboxplot Duplication rate ~ total reads per kilobase (RPK) boxplot duprateexpboxplot(, stepsize = 0.05,...) stepsize Expression bin seze for the boxplot... Other params sent to boxplot This function makes a boxplot showing the distribution of per gene duplication rate versus the reads per kilobase (RPK) inside gene expression bins. nothing # duprate boxplot duprateexpboxplot(=dm) duprateexpdensplot Duplication rate ~ total read count plot duprateexpdensplot Duplication rate ~ total read count plot duprateexpdensplot(, pal = c("cyan", "blue", "green", "yellow", "red"), tnoalternative = TRUE, trpkm = TRUE, trpkmval = 0.5, tfit = TRUE, addlegend = TRUE,...)

duprateexpfit 7 pal The color palette to use to display the density tnoalternative Display threshold of 1000 reads per kilobase trpkm trpkmval tfit addlegend Display threshold at a given RPKM level The given RPKM level Whether to fit the model Whether to add a legend to the plot... Other parameters sent to plot() This function makes a scatter plot showing the per gene duplication rate versus the total read count. nothing # duprate plot duprateexpdensplot(=dm) duprateexpfit Duplication rate ~ total read count fit model duprateexpdensplot Duplication rate ~ total read count fit model duprateexpfit() Fit a Generalized Linear Model using a logit function between thegene duplication rate and the total read count. The GLM and the coefficients of the fitted logit function

8 duprateexpidentify # duprate plot duprateexpfit(=dm) duprateexpidentify Identify genes plotted by duprateexpplot duprateexpidentify Identify genes plotted by duprateexpplot duprateexpidentify(, idcol = "ID") idcol The column from the duplication matrix containing the labels This function makes a barplot showing the cumulative read counts fraction from the duplication matrix calculated by analyzeduprates. The identified points. x and y values match the ones from duprateexpplot # call the plot and identify genes duprateexpplot(=dm) duprateexpidentify(=dm)

duprateexpplot 9 duprateexpplot Duplication rate ~ total read count plot duprateexpplot Duplication rate ~ total read count plot duprateexpplot(, tnoalternative = TRUE, trpkm = TRUE, trpkmval = 0.5, addlegend = TRUE,...) tnoalternative Display threshold of 1000 reads per kilobase trpkm trpkmval addlegend Display threshold at a given RPKM level The given RPKM level Whether to add a legend to the plot... Other parameters sent to smoothscatter() This function makes a smooth scatter plot showing the per gene duplication rate versus the total read count. nothing # duprate plot duprateexpplot(=dm)

10 getbinduplication expressionhist Draw histogram with the expression values expressionhist Draw histogram with the expression values expressionhist(, value = "RPK",...) value The column from the duplication matrix containing the expression values... Other parameters sent to hist() This function draws a histogram of the expression values from the duplication matrix calculated by analyzeduprates. nothing # histogram of expression values for annotation expressionhist(=dm) getbinduplication Helper function used in duprateexpboxplot getbinduplication get duplication rate for a subset of the duplication matrix getbinduplication(p, stepsize, ) p stepsize The vector of bins The window size

getbinrpkmean 11 The duplication rate per bin getbinrpkmean Helper function used in duprateexpboxplot getbinrpkmean get mean duplication rate per bin getbinrpkmean(p, stepsize, ) p stepsize The vector of bins The window size The averaged RPK per bin getbin Helper function used in getbinduplication and getbinrpkmean getbin get a subset of the matrix for values in a specific bin defined by the upper bound p and stepsize getbin(p, stepsize = 0.05, value = "allcounts", ) p stepsize value The vector of bins The window size The column to be subset The subseted matrix

12 getdynamicrange getstats Report duplication stats on regions getstats Report duplication stats based on the data calculated in the duplication matrix getstats() A data.frame containing the stats about the number of genes covered (1+ tags) and the number of genes containing duplicates (1+) # call the plot and identify genes getstats(=dm) getdynamicrange Dynamic range getdynamicrange Calculate the dynamic range of the RNAseq experiment getdynamicrange(dm) dm This function calculates the dynamic range of the RNAseq eperiment A list with 2 elements, containing the dynamic range counting all reads and the dynamic range after removing duplicates.

getrpkbinreadcountfraction 13 # calculate the dynamic range getdynamicrange(dm) getrpkbinreadcountfraction Helper function used in readcountexpressionboxplot readcountexpressionboxplot Calculates the fraction of total reads in a vector of bins getrpkbinreadcountfraction(p, stepsize = stepsize, = ) p stepsize The vector of bins The window size The fraction of total reads in a vector of bins getrpkcumulativereadcountfraction Helper function used in readcountexpressionboxplot getrpkcumulativereadcountfraction get the cumulative read count fraction getrpkcumulativereadcountfraction(p, = ) p The vector of bins The cumulative read count fraction

14 markduplicates markduplicates Program dispatchers to mark duplicated reads from a BAM file markduplicates Mark duplicated reads from a BAM file by calling widely used tools. markduplicates(dupremover = "bamutil", bam = NULL, out = gsub("\\.bam$", "_duprm.bam", bam), rminput = TRUE, path = ".", verbose = TRUE,...) dupremover bam out rminput path verbose The tool to be called. Currently, "picard" and "bamutils" are supported The bam file to mark duplicates from Regular expression describing the transformation on the original filename to get the output filename. By default, a "_duprm" suffix is added before the bam extension Whether to keep the original, non duplicate-marked, bam file Path to the duplicate marker binaries Redirect all the program output to the R console... Other parameters sent to the caller function This function works as a wrapper for several tools widely adopted tr mark duplicated reads in a BAM file. Currently, it supports PICARD and BamUtils. The output filename ## Not run: bam <- system.file("extdata","sample1aligned.out.bam",package="dupradar") gtf <- "genes.gtf" stranded <- 2 # '0' (unstranded), '1' (stranded) and '2' (reverse) paired <- FALSE threads <- 4 # call the duplicate marker and analyze the reads bamduprm <- markduplicates(dupremover="bamutil",bam, path="/opt/bamutil-master/bin",rminput=false) dm <- analyzeduprates(bamduprm,gtf,stranded,paired,threads) ## End(Not run)

picardmarkduplicates 15 picardmarkduplicates Mark duplicates using Picard tools picardmarkduplicates Mark duplicated reads from a BAM file by calling picard tools picardmarkduplicates(bam, out, path, verbose, threads = 1, maxmem = "4g") bam out path verbose threads maxmem The bam file to mark duplicates from Regular expression describing the transformation on the original filename to get the output filename. By default, a "_duprm" suffix is added before the bam extension Path to the duplicate marker binaries Redirect all the program output to the R console Number of threads to use Max memory assigned to the jvm This function is supposed to be called through the markduplicates wrapper The return code of the system call readcountexpboxplot Barplot of percentage of reads falling into expression bins readcountexpboxplot Barplot of percentage of reads falling into expression bins readcountexpboxplot(, stepsize = 0.05,...) stepsize The number of bars to be shown... Other parameters sent to barplot()

16 readcountexpboxplot This function makes a barplot of percentage of reads falling into expression bins nothing Other parameters sent to barplot() # barplot of percentage of reads falling into expression bins readcountexpboxplot(=dm)

Index Topic datasets dm, 4 dm.bad, 5 dupradar_examples, 5 analyzeduprates, 2 bamutilmarkduplicates, 3 cumulativedupratebarplot, 4 dm, 4 dm.bad, 5 dupradar, 5 dupradar-package (dupradar), 5 dupradar_examples, 5 duprateexpboxplot, 6 duprateexpdensplot, 6 duprateexpfit, 7 duprateexpidentify, 8 duprateexpplot, 9 expressionhist, 10 getbinduplication, 10 getbinrpkmean, 11 getbin, 11 getstats, 12 getdynamicrange, 12 getrpkbinreadcountfraction, 13 getrpkcumulativereadcountfraction, 13 markduplicates, 14 picardmarkduplicates, 15 readcountexpboxplot, 15 17