Package wrswor. R topics documented: February 2, Type Package

Similar documents
Package ECctmc. May 1, 2018

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package validara. October 19, 2017

Package bisect. April 16, 2018

Package kdtools. April 26, 2018

Package SEMrushR. November 3, 2018

Package semver. January 6, 2017

Package WordR. September 7, 2017

Package knitrprogressbar

Package rprojroot. January 3, Title Finding Files in Project Subdirectories Version 1.3-2

Package fastdummies. January 8, 2018

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package rsppfp. November 20, 2018

Package messaging. May 27, 2018

Package tibble. August 22, 2017

Package sfc. August 29, 2016

Package gtrendsr. October 19, 2017

Package triebeard. August 29, 2016

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package queuecomputer

Package milr. June 8, 2017

Package humanize. R topics documented: April 4, Version Title Create Values for Human Consumption

Package vinereg. August 10, 2018

Package nmslibr. April 14, 2018

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package ramcmc. May 29, 2018

Package docxtools. July 6, 2018

Package strat. November 23, 2016

Package statsdk. September 30, 2017

Package vip. June 15, 2018

Package datasets.load

Package widyr. August 14, 2017

Package canvasxpress

Package qualmap. R topics documented: September 12, Type Package

Package calpassapi. August 25, 2018

Package hyphenatr. August 29, 2016

Package splithalf. March 17, 2018

Package projector. February 27, 2018

Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Package deductive. June 2, 2017

Package mixsqp. November 14, 2018

Package githubinstall

Package TrafficBDE. March 1, 2018

Package tidyselect. October 11, 2018

Package cancensus. February 4, 2018

Package ggimage. R topics documented: November 1, Title Use Image in 'ggplot2' Version 0.0.7

Package facerec. May 14, 2018

Package barcoder. October 26, 2018

Package readobj. November 2, 2017

Package gtrendsr. August 4, 2018

Package dbx. July 5, 2018

Package coga. May 8, 2018

Package sessioninfo. June 21, 2017

Package fastqcr. April 11, 2017

Package shinyfeedback

Package phonics. February 13, Type Package Title Phonetic Spelling Algorithms Version Date Encoding UTF-8

Package embed. November 19, 2018

Package ggimage. R topics documented: December 5, Title Use Image in 'ggplot2' Version 0.1.0

Package stapler. November 27, 2017

Package opencage. January 16, 2018

Package auctestr. November 13, 2017

Package geogrid. August 19, 2018

Package ompr. November 18, 2017

Package elmnnrcpp. July 21, 2018

Package crossrun. October 8, 2018

Package jpmesh. December 4, 2017

Package mdftracks. February 6, 2017

Package repec. August 31, 2018

Package styler. December 11, Title Non-Invasive Pretty Printing of R Code Version 1.0.0

Package fusionclust. September 19, 2017

Package jdx. R topics documented: January 9, Type Package Title 'Java' Data Exchange for 'R' and 'rjava'

Package nngeo. September 29, 2018

Package effectr. January 17, 2018

Package robotstxt. November 12, 2017

Package postgistools

Package understandbpmn

Package fitur. March 11, 2018

Package fingertipsr. May 25, Type Package Version Title Fingertips Data for Public Health

Package ether. September 22, 2018

Package spark. July 21, 2017

Package snakecase. R topics documented: March 25, Version Date Title Convert Strings into any Case

Package ecoseries. R topics documented: September 27, 2017

Package rgho. R topics documented: January 18, 2017

Package condusco. November 8, 2017

Package wikitaxa. December 21, 2017

Package raker. October 10, 2017

Package ruler. February 23, 2018

Package combiter. December 4, 2017

Package AhoCorasickTrie

Package pdfsearch. July 10, 2018

Package keyholder. May 19, 2018

Package fastrtext. December 10, 2017

Package packcircles. April 28, 2018

Package farver. November 20, 2018

Package diffusr. May 17, 2018

Package quadprogxt. February 4, 2018

Package haven. April 9, 2015

Package omu. August 2, 2018

Package balance. October 12, 2018

Package RcppProgress

Transcription:

Type Package Package wrswor February 2, 2018 Title Weighted Random Sampling without Replacement Version 1.1 Date 2018-02-02 Description A collection of implementations of classical and novel algorithms for weighted sampling without replacement. License GPL-3 URL http://krlmlr.github.io/wrswor BugReports https://github.com/krlmlr/wrswor/issues Depends R (>= 3.0.2) Imports logging (>= 0.4-13), Rcpp Suggests BatchExperiments, dplyr, ggplot2, import, kimisc (>= 0.2-4), knitcitations, knitr, metap, microbenchmark, rmarkdown, roxygen2, rticles (>= 0.1), sampling, testthat, tidyr, tikzdevice (>= 0.9-1) LinkingTo Rcpp (>= 0.11.5) Encoding UTF-8 LazyData true RoxygenNote 6.0.1.9000 URLNote https://github.com/krlmlr/wrswor NeedsCompilation yes Author Kirill Müller [aut, cre] Maintainer Kirill Müller <krlmlr+r@mailbox.org> Repository CRAN Date/Publication 2018-02-02 18:26:36 UTC R topics documented: wrswor-package...................................... 2 sample_int_r........................................ 2 1

2 sample_int_r Index 6 wrswor-package Faster weighted sampling without replacement Description R s default sampling without replacement using base::sample.int() seems to require quadratic run time, e.g., when using weights drawn from a uniform distribution. For large sample sizes, this is too slow. This package contains several alternative implementations. Details Implementations are adapted from http://stackoverflow.com/q/15113650/946850. Author(s) Kirill Müller References Efraimidis, Pavlos S., and Paul G. Spirakis. "Weighted random sampling with a reservoir." Information Processing Letters 97, no. 5 (2006): 181-185. Wong, Chak-Kuen, and Malcolm C. Easton. "An efficient method for weighted sampling without replacement." SIAM Journal on Computing 9, no. 1 (1980): 111-113. Examples sample_int_rej(100, 50, 1:100) sample_int_r Weighted sampling without replacement Description These functions implement weighted sampling without replacement using various algorithms, i.e., they take a sample of the specified size from the elements of 1:n without replacement, using the weights defined by prob. The call sample_int_*(n, size, prob) is equivalent to sample.int(n, size, replace = F, p (The results will most probably be different for the same random seed, but the returned samples are distributed identically for both calls.) Except for sample_int_r() (which has quadratic complexity as of this writing), all functions have complexity O(n log n) or better and often run faster than R s implementation, especially when n and size are large.

sample_int_r 3 Usage sample_int_r(n, size, prob) sample_int_ccrank(n, size, prob) sample_int_crank(n, size, prob) sample_int_expj(n, size, prob) sample_int_expjs(n, size, prob) sample_int_rank(n, size, prob) sample_int_rej(n, size, prob) Arguments n size prob a positive number, the number of items to choose from. See Details. a non-negative integer giving the number of items to choose. a vector of probability weights for obtaining the elements of the vector being sampled. Details sample_int_r() is a simple wrapper for base::sample.int(). sample_int_expj() and sample_int_expjs() implement one-pass random sampling with a reservoir with exponential jumps (Efraimidis and Spirakis, 2006, Algorithm A-ExpJ). Both functions are implemented in Rcpp; *_expj() uses log-transformed keys, *_expjs() implements the algorithm in the paper verbatim (at the cost of numerical stability). sample_int_rank(), sample_int_crank() and sample_int_ccrank() implement one-pass random sampling (Efraimidis and Spirakis, 2006, Algorithm A). The first function is implemented purely in R, the other two are optimized Rcpp implementations (*_crank() uses R vectors internally, while *_ccrank() uses std::vector; surprisingly, *_crank() seems to be faster on most inputs). It can be shown that the order statistic of U (1/wi) has the same distribution as random sampling without replacement (U = uniform(0, 1) distribution). To increase numerical stability, log(u)/w i is computed instead; the log transform does not change the order statistic. sample_int_rej() uses repeated weighted sampling with replacement and a variant of rejection sampling. It is implemented purely in R. This function simulates weighted sampling without replacement using somewhat more draws with replacement, and then discarding duplicate values (rejection sampling). If too few items are sampled, the routine calls itself recursively on a (hopefully) much smaller problem. See also http://stats.stackexchange.com/q/20590/6432. Value An integer vector of length size with elements from 1:n.

4 sample_int_r Author(s) Dinre (for *_rank()), Kirill Müller (for all other functions) References http://stackoverflow.com/q/15113650/946850 Efraimidis, Pavlos S., and Paul G. Spirakis. "Weighted random sampling with a reservoir." Information Processing Letters 97, no. 5 (2006): 181-185. See Also base::sample.int() Examples # Base R implementation s <- sample_int_r(2000, 1000, runif(2000)) tbl <- table(replicate(sample_int_r(6, 3, p), ## Algorithm A, Rcpp version using std::vector s <- sample_int_ccrank(20000, 10000, runif(20000)) tbl <- table(replicate(sample_int_ccrank(6, 3, p), ## Algorithm A, Rcpp version using R vectors s <- sample_int_crank(20000, 10000, runif(20000)) tbl <- table(replicate(sample_int_crank(6, 3, p), ## Algorithm A-ExpJ (with log-transformed keys) s <- sample_int_expj(20000, 10000, runif(20000))

sample_int_r 5 tbl <- table(replicate(sample_int_expj(6, 3, p), ## Algorithm A-ExpJ (paper version) s <- sample_int_expjs(20000, 10000, runif(20000)) tbl <- table(replicate(sample_int_expjs(6, 3, p), ## Algorithm A s <- sample_int_rank(20000, 10000, runif(20000)) tbl <- table(replicate(sample_int_rank(6, 3, p), ## Rejection sampling s <- sample_int_rej(20000, 10000, runif(20000)) tbl <- table(replicate(sample_int_rej(6, 3, p),

Index Topic package wrswor-package, 2 base::sample.int(), 2 4 sample_int_ccrank (sample_int_r), 2 sample_int_crank (sample_int_r), 2 sample_int_expj (sample_int_r), 2 sample_int_expjs (sample_int_r), 2 sample_int_r, 2 sample_int_rank (sample_int_r), 2 sample_int_rej (sample_int_r), 2 wrswor (wrswor-package), 2 wrswor-package, 2 6