Package vipor. March 22, 2017

Similar documents
Package ClustGeo. R topics documented: July 14, Type Package

Package beanplot. R topics documented: February 19, Type Package

Package lvplot. August 29, 2016

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package balance. October 12, 2018

Package mgc. April 13, 2018

Package r2d2. February 20, 2015

Package apastyle. March 29, 2017

Package strat. November 23, 2016

Package kdetrees. February 20, 2015

Package spark. July 21, 2017

Package areaplot. October 18, 2017

Package mmtsne. July 28, 2017

Package ipft. January 4, 2018

Package qualmap. R topics documented: September 12, Type Package

Package areal. December 31, 2018

Package lumberjack. R topics documented: July 20, 2018

Package qicharts2. March 3, 2018

Package deductive. June 2, 2017

Package TPD. June 14, 2018

Package VSE. March 21, 2016

Package fitur. March 11, 2018

Package Mondrian. R topics documented: March 4, Type Package

Package cgh. R topics documented: February 19, 2015

Package Numero. November 24, 2018

Package blocksdesign

Package reval. May 26, 2015

Package ssa. July 24, 2016

Package ggextra. April 4, 2018

Package panelview. April 24, 2018

Package kirby21.base

Package autoshiny. June 25, 2018

Package MonteCarlo. April 23, 2017

Package datasets.load

Package fastdummies. January 8, 2018

Package docxtools. July 6, 2018

Package MeanShift. R topics documented: August 29, 2016

Package OLScurve. August 29, 2016

Package blocksdesign

Package narray. January 28, 2018

Package PCADSC. April 19, 2017

Package optimus. March 24, 2017

Package nonlinearicp

Package desplot. R topics documented: April 3, 2018

Package sparsereg. R topics documented: March 10, Type Package

Package extremevalues

Package rafalib. R topics documented: August 29, Version 1.0.0

Package qicharts. October 7, 2014

Package dyncorr. R topics documented: December 10, Title Dynamic Correlation Package Version Date

Package zebu. R topics documented: October 24, 2017

Package SFtools. June 28, 2017

Package blandr. July 29, 2017

Package SC3. November 27, 2017

Mixed models in R using the lme4 package Part 2: Lattice graphics

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Package bisect. April 16, 2018

Package treedater. R topics documented: May 4, Type Package

Package WordR. September 7, 2017

Package qboxplot. November 12, qboxplot... 1 qboxplot.stats Index 5. Quantile-Based Boxplots

Frequency Distributions

Package vip. June 15, 2018

Package GLDreg. February 28, 2017

Package plotluck. November 13, 2016

Outline. Part 2: Lattice graphics. The formula/data method of specifying graphics. Exploring and presenting data. Presenting data.

Package voxel. R topics documented: May 23, 2017

Package pwrrasch. R topics documented: September 28, Type Package

Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Package ggimage. R topics documented: December 5, Title Use Image in 'ggplot2' Version 0.1.0

Package kdevine. May 19, 2017

Package hypercube. December 15, 2017

Package sigqc. June 13, 2018

Package SparseFactorAnalysis

Package kdtools. April 26, 2018

Package beast. March 16, 2018

Package Rtsne. April 14, 2017

Package spcadjust. September 29, 2016

Package readobj. November 2, 2017

Package samplesizecmh

Package harrypotter. September 3, 2018

Package curvhdr. April 16, 2017

Package comparedf. February 11, 2019

Package radiomics. March 30, 2018

Package nmslibr. April 14, 2018

Package glmnetutils. August 1, 2017

Package sankey. R topics documented: October 22, 2017

Package autocogs. September 22, Title Automatic Cognostic Summaries Version 0.1.1

Package OptimaRegion

Package densityclust

Package woebinning. December 15, 2017

Package LVGP. November 14, 2018

Package ICSShiny. April 1, 2018

Package ggimage. R topics documented: November 1, Title Use Image in 'ggplot2' Version 0.0.7

Package mixsqp. November 14, 2018

Package fso. February 19, 2015

Package dmrseq. September 14, 2018

Package splithalf. March 17, 2018

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package scmap. March 29, 2019

Package queuecomputer

Package simed. November 27, 2017

Transcription:

Type Package Package vipor March 22, 2017 Title Plot Categorical Data Using Quasirandom Noise and Density Estimates Version 0.4.5 Date 2017-03-22 Author Scott Sherrill-Mix, Erik Clarke Maintainer Scott Sherrill-Mix <shescott@upenn.edu> Generate a violin point plot, a combination of a violin/histogram plot and a scatter plot by offsetting points within a category based on their density using quasirandom noise. License GPL (>= 2) LazyData True Depends R (>= 3.0.0) Imports stats, graphics Suggests testthat, beeswarm, lattice, ggplot2, beanplot, vioplot, ggbeeswarm, RoxygenNote 6.0.1 NeedsCompilation no Repository CRAN Date/Publication 2017-03-22 22:18:11 UTC R topics documented: avewithargs........................................ 2 counties........................................... 3 digits2number........................................ 3 generatepermutestring................................... 4 integrations......................................... 5 number2digits........................................ 5 offsetx........................................... 6 permute........................................... 8 1

2 avewithargs topbottomdistribute.................................... 8 tukeypermutes........................................ 9 tukeyt............................................ 10 tukeytexture........................................ 10 vandercorput........................................ 11 vipor............................................. 12 vpplot............................................ 12 Index 14 avewithargs the ave() function but with arguments passed to FUN A function is applied to subsets of x where each subset consist of those observations with the same groupings in y avewithargs(x, y, FUN = mean,...) x a vector to apply FUN to y a vector or list of vectors of grouping variables all of the same length as x FUN function to apply for each factor level combination.... additional arguments to FUN A numeric vector of the same length as x where an each element contains the output from FUN after FUN was applied on the corresponding subgroup for that element (repeated if necessary within a subgroup). See Also ave avewithargs(1:10,rep(1:5,2)) avewithargs(c(1:9,na),rep(1:5,2),max,na.rm=true)

counties 3 counties Census ata on US counties A dataset containing data from the US census burea counties Format Source A data frame with 3143 rows and 8 variables: id GEO.id from original data state state in which the county is located county name of the county population population of the county housingunits housing units in the county totalarea Area in square miles - Total area waterarea Area in square miles - Water area landarea Area in square miles - Land area http://factfinder.census.gov/bkmk/table/1.0/en/dec/10_sf1/gctph1.us05pr, system.file("dataraw", "makecounties.r", package = "vipor") References https://www.census.gov/prod/cen2010/cph-2-1.pdf digits2number Convert a vector of integers representing digits in an arbitrary base to an integer Takes a vector of integers representing digits in an arbitrary base e.g. binary or octal and converts it into an integer (or the integer divided by base^length(digits) for the number of digits if fractional is TRUE). Note that the first digit in the input is the least significant.

4 generatepermutestring digits2number(digits, base = 2, fractional = FALSE) digits base fractional a vector of integers representing digits in an arbitrary base the base for the numeral system (e.g. 2 for binary or 8 for octal) divide the output by the max for this number of digits and base. Note that this is base^length(digits) not base^length(digits)-1. an integer References https://en.wikipedia.org/wiki/radix digits2number(c(4,4,1),8) digits2number(number2digits(100)) generatepermutestring Generate a permutation string meeting Tukey criteria Find a random string of concatenated permutations of 1:n fulfilling Tukey s criteria that there are no runs of 3 or more increases or decreases in a row. Tukey just uses the default n=5. generatepermutestring(nreps = 20, n = 5) nreps n number of permutations to concatenate permutations from 1 to n a vector of nreps*n integers giving concatenated permutations tukeypermutes() tukeypermutes(6,3)

integrations 5 integrations Data on HIV integration sites from several studies A dataset containing data from a meta-analysis looking for differences between active and inactive HIV integrations. Each row represents a provirus integrated somewhere in a human chromosome with whether viral expression was detectd, the distance to the nearest gene and the number of reads from H4K12ac ChIP-Seq mapped to within 50,000 bases of the integration. integrations Format Source A data frame with 12436 rows and 4 variables: study the cell population infected by HIV latent whether the provirus was active (expressed) or inactive (latent) nearestgene distance to nearest gene (transcription unit) (0 if in a gene) H4K12ac number of reads aligned within +- 50,000 bases in a H4K12ac ChIP-Seq http://www.retrovirology.com/content/10/1/90/additional, system.file("data-raw", "makeintegrations.r", package = "vipor") References http://www.retrovirology.com/content/10/1/90 number2digits Convert an integer to an arbitrary base Takes an integer and converts it into an arbitrary base e.g. binary or octal. Note that the first digit in the output is the least significant. number2digits(n, base = 2)

6 offsetx n base the integer to be converted the base for the numeral system (e.g. 2 for binary or 8 for octal) a vector of length ceiling(log(n+1,base)) respresenting each digit for that numeral system References https://en.wikipedia.org/wiki/radix number2digits(100) number2digits(100,8) offsetx Offset data using quasirandom noise to avoid overplotting Arranges data points using quasirandom noise (van der Corput sequence), pseudorandom noise or alternatively positioning extreme values within a band to the left and right to form beeswarm/onedimensional scatter/strip chart style plots. That is a plot resembling a cross between a violin plot (showing the density distribution) and a scatter plot (showing the individual points). This function returns a vector of the offsets to be used in plotting. offsetx(y, x = rep(1, length(y)), width = 0.4, varwidth = FALSE,...) offsetsinglegroup(y, maxlength = NULL, method = c("quasirandom", "pseudorandom", "smiley", "maxout", "frowney", "minout", "tukey", "tukeydense"), nbins = NULL, adjust = 1) y x width varwidth vector of data points a grouping factor for y (optional) the maximum spacing away from center for each group of points. Since points are spaced to left and right, the maximum width of the cluster will be approximately width*2 (0 = no offset, default = 0.4) adjust the width of each group based on the number of points in the group... additional arguments to offsetsinglegroup

offsetx 7 maxlength method nbins adjust multiply the offset by sqrt(length(y)/maxlength) if not NULL. The sqrt is to match boxplot (allows comparison of order of magnitude different ns, scale with standard error) method used to distribute the points: quasirandom: points are distributed within a kernel density estimate of the distribution with offset determined by quasirandom Van der Corput noise pseudorandom: points are distributed within a kernel density estimate of the distribution with offset determined by pseudorandom noise a la jitter maxout: points are distributed within a kernel density with points in a band distributed with highest value points on the outside and lowest in the middle minout: points are distributed within a kernel density with points in a band distributed with highest value points in the middle and lowest on the outside tukey: points are distributed as described in Tukey and Tukey "Strips displaying empirical distributions: I. textured dot strips" tukeydense: points are distributed as described in Tukey and Tukey but are constrained with the kernel density estimate the number of points used to calculate density (defaults to 1000 for quasirandom and pseudorandom and 100 for others) adjust the bandwidth used to calculate the kernel density (smaller values mean tighter fit, larger values looser fit, default is 1) a vector with of x-offsets of the same length as y ## Generate fake data dat <- list(rnorm(50), rnorm(500), c(rnorm(100), rnorm(100,5)), rcauchy(100)) names(dat) <- c("normal", "Dense Normal", "Bimodal", "Extremes") ## Plot each distribution with a variety of parameters par(mfrow=c(4,1), mar=c(2,4, 0.5, 0.5)) sapply(names(dat),function(label) { y<-dat[[label]] offsets <- list( 'Default'=offsetX(y), 'Smoother'=offsetX(y, adjust=2), 'Tighter'=offsetX(y, adjust=0.1), 'Thinner'=offsetX(y, width=0.1) ) ids <- rep(1:length(offsets), sapply(offsets,length)) plot(unlist(offsets) + ids, rep(y, length(offsets)), ylab=label, xlab='', xaxt='n', pch=21, las=1) axis(1, 1:4, c("default", "Adjust=2", "Adjust=0.1", "Width=10%")) })

8 topbottomdistribute permute Return all permutations of a vector Recursively generates all permutations of a vector. The result will be factorial(length(vals)) long so be careful with any longer vectors (e.g. longer than 10). permute(vals) vals a vector of elements to be permuted A list of vectors containing all permutation of the values See Also sample permute(letters[1:3]) permute(1:5) topbottomdistribute Produce offsets such that points are sorted with most extreme values to right and left Produce offsets to generate smile-like or frown-like distributions of points. That is sorting the points so that the most extreme values alternate between the left and right e.g. (max,3rd max,...,4th max, 2nd max). The function returns either a proportion between 0 and 1 (useful for plotting) or an order topbottomdistribute(x, frowney = FALSE, prop = TRUE)

tukeypermutes 9 x frowney prop the elements to be sorted if TRUE then sort minimums to the outside, otherwise sort maximums to the outside if FALSE then return an ordering of the data with extremes on the outside. If TRUE then return a sequence between 0 and 1 sorted by the ordering a vector of the same length as x with values ranging between 0 and 1 if prop is TRUE or an ordering of 1 to length(x) topbottomdistribute(1:10) topbottomdistribute(1:10,true) tukeypermutes Find permutations meeting Tukey criteria Find all permutations of 1:n fulfilling Tukey s criteria that there are no runs of 3 or more increases or decreases in a row. Tukey just uses the default n=5 and limit=2. tukeypermutes(n = 5, limit = 2) n limit permutations from 1 to n the maximum number of increases or decreases in a row a list of vectors containing valid permutations tukeypermutes() tukeypermutes(6,3)

10 tukeytexture tukeyt Combine multiple permutation strings into one Combine base+1 permutation strings to generate offsets tukeyt(nreps = 10, base = 5) nreps base number of permutations to paste together generate permutations of integers 1:base A nreps*base length vector giving offset positions based on Tukey s algorithm tukeyt() tukeyt() tukeyt(5,4) tukeytexture Generate random positions based on Tukey texture algorithm Generate partly random, partly constrained lateral displacements based on Tukey texture algorithm from Tukey and Tukey 1990 tukeytexture(x, jitter = TRUE, thin = FALSE, hollow = FALSE, delta = diff(stats::quantile(x, c(0.25, 0.75))) * 0.03) x jitter thin hollow delta the points to be jittered. really only used to calculate length if TRUE add random jitter to each point if TRUE then push points to the center in thin regions if TRUE then expand points outward to avoid hollowness a reasonably small value used in edge straightening and thinning

vandercorput 11 a vector of length length(x) giving displacements for each corresponding point in x x<-rnorm(200) plot(tukeytexture(x),x) x<-1:100 plot(tukeytexture(x),x) plot(tukeytexture(log10(counties$landarea),true,true),log10(counties$landarea),cex=.25) vandercorput Generate van der Corput sequences Generates the first (or an arbitrary offset) n elements of the van der Corput low-discrepancy sequence for a given base vandercorput(n, base = 2, start = 1) n base start the first n elements of the van der Corput sequence the base to use for calculating the van der Corput sequence start at this position in the sequence a vector of length n with values ranging between 0 and 1 References https://en.wikipedia.org/wiki/van_der_corput_sequence vandercorput(100)

12 vpplot vipor Functions to generate violin scatter plots Details Arranges data points using quasirandom noise (van der Corput sequence) to create a plot resembling a cross between a violin plot (showing the density distribution) and a scatter plot (showing the individual points). The development version of this package is on http://github.com/sherrillmix/ vipor The main functions are: offsetx: calculate offsets in X position for plotting (groups of) one dimensional data vpplot: a simple wrapper around plot and offsetx to generate plots of grouped data Author(s) See Also Scott Sherrill-Mix, <shescott@upenn.edu> http://github.com/sherrillmix/vipor dat<-list(rnorm(100),rnorm(50,1,2)) ids<-rep(1:length(dat),sapply(dat,length)) offset<-offsetx(unlist(dat),ids) plot(unlist(dat),ids+offset) vpplot Plot data using offsets by quasirandom noise to generate a violin point plot Arranges data points using quasirandom noise (van der Corput sequence), pseudorandom noise or alternatively positioning extreme values within a band to the left and right to form beeswarm/onedimensional scatter/strip chart style plots. That is a plot resembling a cross between a violin plot (showing the density distribution) and a scatter plot (showing the individual points) and so here we ll call it a violin point plot. vpplot(x = rep("data", length(y)), y, xaxt = "y", offsetxargs = NULL,...)

vpplot 13 x y xaxt See Also offsetxargs a grouping factor for y (optional) vector of data points if n then no x axis is plotted a list with arguments for offsetx... additional arguments to plot invisibly return the adjusted x positions of the points offsetx dat<-list( 'Mean=0'=rnorm(200), 'Mean=1'=rnorm(50,1), 'Bimodal'=c(rnorm(40,-2),rnorm(60,2)), 'Gamma'=rgamma(50,1) ) labs<-factor(rep(names(dat),sapply(dat,length)),levels=names(dat)) vpplot(labs,unlist(dat))

Index Topic datasets counties, 3 integrations, 5 ave, 2 avewithargs, 2 counties, 3 digits2number, 3 generatepermutestring, 4 integrations, 5 number2digits, 5 offsetsinglegroup (offsetx), 6 offsetx, 6, 12, 13 permute, 8 sample, 8 topbottomdistribute, 8 tukeypermutes, 9 tukeyt, 10 tukeytexture, 10 vandercorput, 11 vipor, 12 vipor-package (vipor), 12 vpplot, 12, 12 14