Package CuCubes. December 9, 2016


Title MultiDimensional Feature Selection (MDFS)
Version 0.1.0
URL https://featureselector.uco.uwb.edu.pl/pub/cucubes/
Description Functions for MultiDimensional Feature Selection (MDFS):
    * calculating multidimensional information gains,
    * finding interesting tuples for chosen variables,
    * scoring variables,
    * finding important variables,
    * plotting selection results.
    CuCubes is also known as CUDA Cubes and it is a library that allows fast CUDA-accelerated computation of information gains in binary classification problems. This package wraps CuCubes and provides an alternative CPU version as well as helper functions for building MultiDimensional Feature Selectors.
Depends R (>= 3.2.1)
License GPL-3
Encoding UTF-8
LazyData true
RoxygenNote 5.0.1
NeedsCompilation yes
Author Radosław Piliszek [aut, cre], Andrzej Sułecki [aut], Paweł Tabaszewski [aut], Krzysztof Mnich [aut], Witold Rudnicki [cph]
Maintainer Radosław Piliszek <r.piliszek@uwb.edu.pl>
Repository CRAN
Date/Publication 2016-12-09 13:13:25
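The steps listed in the description (computing information gains, scoring variables, finding important variables, plotting) might be chained roughly as follows. This is a sketch built only from the signatures documented in this manual, not a tested recipe. It assumes that the vector returned by ComputeMaxInfoGains() can be passed directly as the IGs argument of MDFS(), and that dimensions and divisions should match between the two calls. The wrapper name mdfs_workflow_sketch is hypothetical.

```r
# Sketch only: chains the functions documented in this manual.
# Assumption: the output of ComputeMaxInfoGains() is a valid IGs input for MDFS().
mdfs_workflow_sketch <- function() {
  madelon <- CuCubes::madelon

  # 1. Max information gain per variable (arguments copied from the manual's example).
  igs <- CuCubes::ComputeMaxInfoGains(
    data = madelon$data, decision = madelon$decision,
    discretizations = 1, range = 0, divisions = 22, dimensions = 1
  )

  # 2. Build the feature selector (assumption: same dimensions/divisions as step 1).
  fs <- CuCubes::MDFS(IGs = igs, dimensions = 1, divisions = 22)

  # 3. Indices of variables judged relevant, using the documented defaults.
  relevant <- CuCubes::RelevantVariables(fs, level = 0.05, score = "FDR")

  # 4. Diagnostic plots, using the documented default selection.
  plot(fs, plots = c("i", "p", "FDR"))

  relevant
}

# Run only where the package is actually installed.
if (requireNamespace("CuCubes", quietly = TRUE)) {
  print(mdfs_workflow_sketch())
}
```

The requireNamespace() guard keeps the script harmless on systems without the package.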

R topics documented:
    ComputeInterestingTuples
    ComputeMaxInfoGains
    madelon
    MDFS
    plot.mdfs
    RelevantVariables
    RelevantVariables.MDFS
    Index

ComputeInterestingTuples    Interesting tuples

Description
    Interesting tuples

Usage
    ComputeInterestingTuples(acceleration.type = "scalar", dimensions = 1,
      divisions = 1, discretizations = 1, seed = 0, range = 1,
      pseudo.count = 0.001, reduce.method = "max", ig.thr,
      interesting.vars = c(), data, decision)

Arguments
    acceleration.type   acceleration type ("scalar" for none, "avx" / "avx2" for use of the AVX/AVX2 instruction set respectively)
    dimensions          number of dimensions
    divisions           number of divisions
    discretizations     number of discretizations
    seed                seed for PRNG used during discretizations
    range               discretization range (from 0.0 to 1.0)
    pseudo.count        pseudo count
    reduce.method       discretization reduce method (either "max" or "mean")
    ig.thr              IG threshold above which the tuple is interesting
    interesting.vars    variables for which to check the IGs (none = all)
    data                input data where columns are variables and rows are observations
    decision            decision variable as a boolean vector of length equal to number of observations

Value
    none (the function prints results)

ComputeMaxInfoGains    Max information gains

Description
    Max information gains

Usage
    ComputeMaxInfoGains(acceleration.type = "scalar", dimensions = 1,
      divisions = 1, discretizations = 1, seed = 0, range = 1,
      pseudo.count = 0.001, reduce.method = "max", data, decision)

Arguments
    acceleration.type   acceleration type ("scalar" for none, "avx" / "avx2" for use of the AVX/AVX2 instruction set respectively, "cuda" for CUDA)
    dimensions          number of dimensions
    divisions           number of divisions
    discretizations     number of discretizations
    seed                seed for PRNG used during discretizations
    range               discretization range (from 0.0 to 1.0)
    pseudo.count        pseudo count
    reduce.method       discretization reduce method (either "max" or "mean")
    data                input data where columns are variables and rows are observations
    decision            decision variable as a boolean vector of length equal to number of observations

Value
    numeric vector with max information gain for each input variable

Examples
    ComputeMaxInfoGains(data = madelon$data, decision = madelon$decision,
      discretizations = 1, range = 0, divisions = 22, dimensions = 1)
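To make the quantity returned above more concrete, here is a minimal base-R sketch of a one-dimensional information gain between a discretized variable and a binary decision. It is an illustration, not the package's algorithm: the helper names entropy_bits and info_gain_1d are hypothetical, the bins are plain equal-frequency bins (the package's own discretization is randomized through seed and range), and adding pseudo.count to every cell of the contingency table is an assumption about how the pseudo count is applied.

```r
# Entropy in bits of a vector of non-negative (possibly fractional) counts.
entropy_bits <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Hypothetical helper: IG(Y; X) = H(Y) - H(Y | X) for one numeric variable x
# and a 0/1 (or logical) decision vector.
info_gain_1d <- function(x, decision, divisions = 1, pseudo.count = 0.001) {
  # Assumption: `divisions` cut points produce divisions + 1 equal-frequency bins.
  probs <- seq(0, 1, length.out = divisions + 2)
  breaks <- unique(quantile(x, probs))
  bins <- cut(x, breaks = breaks, include.lowest = TRUE)

  # Contingency table bin x class, smoothed with the pseudo count.
  classes <- factor(as.integer(decision), levels = c(0L, 1L))
  counts <- table(bins, classes) + pseudo.count

  h_y <- entropy_bits(colSums(counts))
  h_y_given_x <- sum(apply(counts, 1, function(row) {
    sum(row) / sum(counts) * entropy_bits(row)
  }))
  h_y - h_y_given_x
}

# Toy usage: one informative variable and one pure-noise variable.
set.seed(1)
decision <- rep(c(0, 1), each = 100)
x_informative <- decision + rnorm(200, sd = 0.1)
x_noise <- rnorm(200)
ig_informative <- info_gain_1d(x_informative, decision)
ig_noise <- info_gain_1d(x_noise, decision)
```

In this toy data the informative variable separates the classes at the median, so its gain is close to the 1-bit entropy of the balanced decision, while the noise variable scores near zero.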

madelon    An artificial dataset called MADELON

Description
    An artificial dataset containing data points grouped in 32 clusters placed on the vertices of a five-dimensional hypercube and randomly labeled 0/1.

Usage
    madelon

Format
    A list of two elements:
    data        2000 by 500 matrix of 2000 objects with 500 features
    decision    vector of 2000 decisions (labels 0/1)

Details
    The five dimensions constitute 5 informative features. 15 linear combinations of those features are added to form a set of 20 (redundant) informative features. There are 480 distractor features called probes having no predictive power.
    Included is the original training set with label -1 changed to 0.

Source
    https://archive.ics.uci.edu/ml/datasets/madelon

MDFS    Build MultiDimensional Feature Selector from IGs

Description
    Build MultiDimensional Feature Selector from IGs

Usage
    MDFS(IGs, dimensions, divisions, response_divisions = 1, IG_bits = TRUE,
      IG_doubled = FALSE, ignore_lowest = length(IGs)%/%10,
      variable_number = length(IGs), calc_variable_number = TRUE,
      mode_1d = "exp", min_variable_number = variable_number%/%2,
      max_ignore_lowest = variable_number%/%3, max_iterations = 20,
      acceptable_error = 0.05)

Arguments
    IGs                     max conditional information gains
    dimensions              number of dimensions
    divisions               number of divisions
    response_divisions      number of response divisions (i.e. categories-1)
    IG_bits                 input is in binary log (as opposed to natural log)
    IG_doubled              input is doubled (to follow the chi-squared distribution)
    ignore_lowest           number of variables with the lowest IG to ignore (ignored if computed)
    variable_number         number of irrelevant variables (ignored if computed)
    calc_variable_number    whether to compute the number of neglected and irrelevant variables
    mode_1d                 "exp" - exponential distribution, "lin" - linear function of chi-squared, "raw" - raw chi-squared
    min_variable_number     minimum number of irrelevant variables
    max_ignore_lowest       maximum number of ignored variables
    max_iterations          maximum number of iterations in variable number calculation
    acceptable_error        acceptable error level for distribution parameter

Value
    MDFS (list-based S3 class object) with the following named elements:
    "IGs"            is a vector of information gains (input copy)
    "order"          is a vector of ordinal numbers (order of variables by decreasing score)
    "chi.squared"    is a vector of chi-squared p-values
    "p.values"       is a vector of eventual p-values
    "scores"         is a list of two vectors FDR and FWER with FDR and FWER scores respectively
    "lo.sq.dev."     is a vector of square deviations used to calculate the number of ignored variables
    "hi.sq.dev."     is a vector of square deviations used to calculate the number of irrelevant variables
    "ign.lowest"     is a number of ignored variables
    "var.number"     is a number of irrelevant variables
    "dist.param."    is an exponential distribution parameter or linear coefficient
    "err.param."     is a square error of the parameter

plot.mdfs    Plot MDFS details

Description
    Plot MDFS details

Usage
    ## S3 method for class 'MDFS'
    plot(x, plots = c("i", "p", "FDR"), ...)

Arguments
    x        an MDFS object
    plots    plots to plot (I for max IG, p for p-values, FDR, FWER)
    ...      ignored

RelevantVariables    Find indices of relevant variables

Description
    Find indices of relevant variables

Usage
    RelevantVariables(fs, ...)

Arguments
    fs     feature selector
    ...    arguments passed to methods

Value
    indices of important variables

RelevantVariables.MDFS    Find indices of relevant variables from MDFS

Description
    Find indices of relevant variables from MDFS

Usage
    ## S3 method for class 'MDFS'
    RelevantVariables(fs, level = 0.05, score = "FDR", ...)

Arguments
    fs       an MDFS object
    level    statistical significance level
    score    score to use
    ...      ignored

Value
    indices of relevant variables
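The selection step above can be illustrated with base R alone: given per-variable p-values, apply a multiple-testing correction and keep the indices below the chosen level. This sketch is not the package's implementation. The helper name relevant_indices is hypothetical, and the use of Benjamini-Hochberg for "FDR" and Holm for "FWER" is an assumption, since the manual does not say which procedures produce the scores stored in the MDFS object.

```r
# Hypothetical helper: indices of variables whose adjusted p-value is below `level`.
# Assumption: "FDR" maps to Benjamini-Hochberg and "FWER" maps to Holm.
relevant_indices <- function(p.values, level = 0.05, score = c("FDR", "FWER")) {
  score <- match.arg(score)
  method <- if (score == "FDR") "BH" else "holm"
  adjusted <- p.adjust(p.values, method = method)
  which(adjusted < level)
}

# Toy p-values for five variables.
p <- c(0.001, 0.2, 0.0004, 0.8, 0.04)
selected_fdr <- relevant_indices(p, level = 0.05, score = "FDR")
selected_fwer <- relevant_indices(p, level = 0.05, score = "FWER")
```

Controlling FWER is the stricter criterion, so with the same level it can only select the same variables as FDR control or fewer.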

Index

Topic datasets
    madelon

ComputeInterestingTuples
ComputeMaxInfoGains
madelon
MDFS
plot.mdfs
RelevantVariables
RelevantVariables.MDFS