Package preprosim. July 26, 2016

Similar documents
Package preprocomb. June 26, 2016

Package metaheur. June 30, 2016

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package reval. May 26, 2015

Package fastdummies. January 8, 2018

Package vip. June 15, 2018

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package ECctmc. May 1, 2018

Package datasets.load

Package rsppfp. November 20, 2018

Package docxtools. July 6, 2018

Package weco. May 4, 2018

Package fitbitscraper

Package validara. October 19, 2017

Package ggimage. R topics documented: December 5, Title Use Image in 'ggplot2' Version 0.1.0

Package vinereg. August 10, 2018

Package balance. October 12, 2018

Package SEMrushR. November 3, 2018

Package ggimage. R topics documented: November 1, Title Use Image in 'ggplot2' Version 0.0.7

Package canvasxpress

Package gtrendsr. October 19, 2017

Package bisect. April 16, 2018

Package gppm. July 5, 2018

Package calpassapi. August 25, 2018

Package QCAtools. January 3, 2017

Package kirby21.base

Package deductive. June 2, 2017

Package TrafficBDE. March 1, 2018

Package catenary. May 4, 2018

Package splithalf. March 17, 2018

Package WordR. September 7, 2017

Package embed. November 19, 2018

Package projector. February 27, 2018

Package ASICS. January 23, 2018

Package barcoder. October 26, 2018

Package lumberjack. R topics documented: July 20, 2018

Package omu. August 2, 2018

Package GetITRData. October 22, 2017

Package edfreader. R topics documented: May 21, 2017

Package benchmarkme. R topics documented: February 26, Type Package Title Crowd Sourced System Benchmarks Version 0.2.

Package gtrendsr. August 4, 2018

Package qualmap. R topics documented: September 12, Type Package

Package climber. R topics documented:

Package censusapi. August 19, 2018

Package svalues. July 15, 2018

Package kdtools. April 26, 2018

Package enpls. May 14, 2018

Package zoomgrid. January 3, 2019

Package ClusterSignificance

Package editdata. October 7, 2017

Package NFP. November 21, 2016

Package PUlasso. April 7, 2018

Package postgistools

Package effectr. January 17, 2018

Package mdftracks. February 6, 2017

Package wrswor. R topics documented: February 2, Type Package

Package geneslope. October 26, 2016

Package fastqcr. April 11, 2017

Package fusionclust. September 19, 2017

Package shinyfeedback

Package prozor. July 26, 2018

Package semver. January 6, 2017

Package cregulome. September 13, 2018

Package kerasformula

Package UNF. June 13, 2017

Package sfc. August 29, 2016

Package blandr. July 29, 2017

Package ezknitr. September 16, 2016

Package triebeard. August 29, 2016

Package dyncorr. R topics documented: December 10, Title Dynamic Correlation Package Version Date

Package robotstxt. November 12, 2017

Package SC3. September 29, 2018

Package wikitaxa. December 21, 2017

Package CuCubes. December 9, 2016

Package rollply. R topics documented: August 29, Title Moving-Window Add-on for 'plyr' Version 0.5.0

Package PCADSC. April 19, 2017

Package ggmosaic. February 9, 2017

Package narray. January 28, 2018

Package states. May 4, 2018

Package TestDataImputation

Package queuecomputer

Package profvis. R topics documented:

Package elmnnrcpp. July 21, 2018

Package facerec. May 14, 2018

Package ggextra. April 4, 2018

Package nngeo. September 29, 2018

Package strat. November 23, 2016

Package nonlinearicp

Package liftr. R topics documented: May 14, Type Package

Package meme. November 2, 2017

Package SC3. November 27, 2017

Package mgc. April 13, 2018

Package repec. August 31, 2018

Package dkanr. July 12, 2018

Package ezsummary. August 29, 2016

Package condusco. November 8, 2017

Package condir. R topics documented: February 15, 2017

Package avirtualtwins

Package textrank. December 18, 2017

Transcription:

Package preprosim July 26, 2016 Type Package Title Lightweight Data Quality Simulation for Classification Version 0.2.0 Date 2016-07-26 Data quality simulation can be used to check the robustness of data analysis findings and learn about the impact of data quality contaminations on classification. This package helps to add contaminations (noise, missing values, outliers, low variance, irrelevant features, class swap (inconsistency), class imbalance and decrease in data volume) to data and then evaluate the simulated data sets for classification accuracy. As a lightweight solution simulation runs can be set up with no or minimal up-front effort. License GPL-2 LazyData TRUE Imports DMwR, reshape2, ggplot2, methods, stats, caret, doparallel, foreach, e1071 Suggests gbm, preprocomb, preproviz, knitr, rmarkdown URL https://github.com/mvattulainen/preprosim BugReports https://github.com/mvattulainen/preprosim/issues VignetteBuilder knitr RoxygenNote 5.0.1 NeedsCompilation no Author Markus Vattulainen [aut, cre] Maintainer Markus Vattulainen <markus.vattulainen@gmail.com> Repository CRAN Date/Publication 2016-07-26 12:14:58 1

2 changeparam R topics documented: changeparam........................................ 2 getpreprosimdata...................................... 3 getpreprosimdf....................................... 4 newparam.......................................... 4 preprosimanalysis-class................................... 5 preprosimparameter-class.................................. 5 preprosimplot........................................ 7 preprosimrun........................................ 7 Index 9 changeparam Change simulation control parameter object Preprosim parameter objects contain eight contaminations: noise, lowvar, misval, irfeature, classswap, classimbalance, volumedecrease and outlier. Each contamination has three sub parameters: cols as columns the contamination is applied to, param as the parameter of the contaminations itself (i.e. intensity of contamination) and order as order in which the parameter is applied to the data. changeparam(object, contamination, param, value) object contamination param value (preprosimparameter object) (character) one of the following: noise, lowvar, misval, irfeature, classswap, classimbalance, volumedecrease, outlier (character) one of the following: cols, param, order (numeric) scalar (for order) or vector (for cols and param) of parameter values The order of contaminations (cols parameter) must be between 1 and 8, and no two contaminations can have the same order. The contamination parameter (param parameter) must start with 0 (e.g. param="param", value=c(0,0.3)) Value preprosimparameter class object

getpreprosimdata 3 pa <- newparam(iris) pa <- changeparam(pa, "noise", "cols", value=1) pa <- changeparam(pa, "noise", "param", value=c(0,0.1)) pa <- changeparam(pa, "noise", "order", value=1) getpreprosimdata Get simulation run result data Get simulation run result data getpreprosimdata(object, type = "accuracy", x, z) object type x z (preprosimanalysis class object) object (character) type of data: accuracy, varimportance, outliers or xz (character) x axis contamination (character) z axis contamination contaminations are : noise, lowvar, misval, irfeature, classswap, classimbalance, volumedecrease, outlier ## res <- preprosimrun(iris) ## getpreprosimdata(res, "accuracy") ## getpreprosimdata(res, type="xz", x="misval", z="noise")

4 newparam getpreprosimdf Get a contaminated data frame Get a contaminated data frame getpreprosimdf(object, paramvector) object paramvector (preprosimanalysis class object) object to be plotted (numeric) contamination combinations to be searched for ## res <- preprosimrun(iris) ## df <- preprosimdf(res, c(0,0,0,0,0,0,0,0)) # returns uncontaminated original data set newparam Create new simulation control parameter object Preprosim parameter objects contain eight contaminations: noise, lowvar, misval, irfeature, classswap, classimbalance, volumedecrease and outlier. Each contamination has three sub parameters: cols as columns the contamination is applied to, param as the parameter of the contamination itself (i.e. intensity of contamination) and order as order in which the parameter is applied to the data. newparam(dataframe, type = "default", x, z) dataframe type x z (data frame) original data to be used in simulations (character) creation type: empty, default or custom, defaults to "default" (character) primary contamination of interest such as "misval" (character) secondary contamination of interest such as "noise"

preprosimanalysis-class 5 For argument type: empty creates a preprosimparameter object with empty params (but not empty cols or order). default creates 6561 combinations with all params 0, 0.1, 0.2. custom creates params seq(0, 0.9, by 0.1) for primary (x) and 0., 0.1, 0.2 for secondary (z). The implicit y (not an argument) refers to classification accuracy. Value preprosimparameter class object pa <- newparam(iris) pa1 <- newparam(iris, "empty") pa2 <- newparam(iris, "custom", "misval", "noise") preprosimanalysis-class An S4 class representing simulation run results An S4 class representing simulation run results Slots grid (data frame) data frame consisting of combinations of preprosimparameters data (list) list of simulated data sets output (numeric) vector of classification accuracies variableimportance (data frame) data frame consisting of variable importance values outliers (numeric) vector of outlier scores preprosimparameter-class An S4 class representing simulation control parameters An S4 class representing simulation control parameters

6 preprosimparameter-class Slots noisecol (numeric) noiseparam (numeric) noiseorder (numeric) noisefunction (character) lowvarcol (numeric) lowvarparam (numeric) lowvarorder (numeric) lowvarfunction (character) misvalcol (numeric) misvalparam (numeric) misvalorder (numeric) misvalfunction (character) irfeaturecol (numeric) irfeatureparam (numeric) irfeatureorder (numeric) irfeaturefunction (character) classswapcol (numeric) classswapparam (numeric) classswaporder (numeric) classswapfunction (character) classimbalancecol (numeric) classimbalanceparam (numeric) classimbalanceorder (numeric) classimbalancefunction (character) volumedecreasecol (numeric) volumedecreaseparam (numeric) volumedecreaseorder (numeric) volumedecreasefunction (character) outliercol (numeric) outlierparam (numeric) outlierorder (numeric) outlierfunction (character)

preprosimplot 7 preprosimplot Plot simulation run results Plot simulation run results preprosimplot(object, type = "accuracy", x, z) object type x z (preprosimanalysis class object) object to be plotted (character) type of plot: accuracy, varimportance, outliers or xz; defaults to accuracy (character) x axis contamination (character) z axis contamination plotted as panels contaminations are : noise, lowvar, misval, irfeature, classswap, classimbalance, volumedecrease, outlier ## res <- preprosimrun(iris) ## preprosimplot(res) ## preprosimplot(res, type="xz", x="misval", z="noise") preprosimrun Run simulation Run simulation preprosimrun(data, param = newparam(data, "default"), seed = 1, caretmodel = "gbm", holdoutrounds = 10, cores = 1, verbose = TRUE, fitmodels = TRUE)

8 preprosimrun data Value param (data frame) one factor columns for class labels, other columns numeric, no missing values (preprosimparameter object) simulation parameters, defaults to parameters set automatically for data. seed (integer) seed to be used for reproducible results, defaults to 1 caretmodel (character) a model from package Caret, defaults to gbm (gbm must be installed before preprosimrun) holdoutrounds (integer) number of holdout rounds, defaults to 10 cores (integer) number of cores used in parallel processing, defaults to 1 verbose fitmodels (boolean) progress information outputted, defaults to TRUE (boolean) whether classification models are fitted, defaults to TRUE (FALSE: get only the contaminated datasets) caretmodel must be able to deal with missing values and have in-build variable importance such as rpart and gbm. Note: caret message will be outputted regardless of verbose. preprosimanalysis class object res <- preprosimrun(iris, param=newparam(iris, "custom", x="misval", z="noise"), fitmodels=false)

Index changeparam, 2 getpreprosimdata, 3 getpreprosimdf, 4 newparam, 4 preprosimanalysis-class, 5 preprosimparameter-class, 5 preprosimplot, 7 preprosimrun, 7 9