Package SFtools. June 28, 2017

Similar documents
Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package datasets.load

Package kirby21.base

Package MFDFA. April 18, 2018

Package fastdummies. January 8, 2018

Package spark. July 21, 2017

Package repec. August 31, 2018

Package Mondrian. R topics documented: March 4, Type Package

Package TVsMiss. April 5, 2018

Package mdftracks. February 6, 2017

Package gtrendsr. October 19, 2017

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package sessioninfo. June 21, 2017

Package fastrtext. December 10, 2017

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package crossword.r. January 19, 2018

Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Package messaging. May 27, 2018

Package jstree. October 24, 2017

Package bisect. April 16, 2018

Package geogrid. August 19, 2018

Package gtrendsr. August 4, 2018

Package tidyimpute. March 5, 2018

Package docxtools. July 6, 2018

Package tm.plugin.lexisnexis

Package clipr. June 23, 2018

Package sankey. R topics documented: October 22, 2017

Package ECctmc. May 1, 2018

Package gifti. February 1, 2018

Package ClusterBootstrap

Package TrafficBDE. March 1, 2018

Package qualmap. R topics documented: September 12, Type Package

Package preprosim. July 26, 2016

Package gcite. R topics documented: February 2, Type Package Title Google Citation Parser Version Date Author John Muschelli

Package validara. October 19, 2017

Package canvasxpress

Package strat. November 23, 2016

Package splithalf. March 17, 2018

Package nngeo. September 29, 2018

Package nonlinearicp

Package calpassapi. August 25, 2018

Package harrypotter. September 3, 2018

Package condusco. November 8, 2017

Package vinereg. August 10, 2018

Package pairsd3. R topics documented: August 29, Title D3 Scatterplot Matrices Version 0.1.0

Package CatEncoders. March 8, 2017

Package fusionclust. September 19, 2017

Package raker. October 10, 2017

Package CorporaCoCo. R topics documented: November 23, 2017

Package reval. May 26, 2015

Package SEMrushR. November 3, 2018

Package deductive. June 2, 2017

Package QCAtools. January 3, 2017

Package regexselect. R topics documented: September 22, Version Date Title Regular Expressions in 'shiny' Select Lists

Package ggimage. R topics documented: November 1, Title Use Image in 'ggplot2' Version 0.0.7

Package sure. September 19, 2017

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package curry. September 28, 2016

Package intccr. September 12, 2017

Package ecoseries. R topics documented: September 27, 2017

Package ConvergenceClubs

Package vip. June 15, 2018

Package rcv. August 11, 2017

Package textrank. December 18, 2017

Package available. November 17, 2017

Package panelview. April 24, 2018

Package librarian. R topics documented:

Package IATScore. January 10, 2018

Package spelling. December 18, 2017

Package queuecomputer

Package areaplot. October 18, 2017

Package ggimage. R topics documented: December 5, Title Use Image in 'ggplot2' Version 0.1.0

Package dyncorr. R topics documented: December 10, Title Dynamic Correlation Package Version Date

Package tm.plugin.factiva

Package estprod. May 2, 2018

Package editdata. October 7, 2017

Package hypercube. December 15, 2017

Package PCADSC. April 19, 2017

Package rsppfp. November 20, 2018

Package sfc. August 29, 2016

Package postgistools

Package rprojroot. January 3, Title Finding Files in Project Subdirectories Version 1.3-2

Package gameofthrones

Package tm.plugin.factiva

Package basictrendline

Package iccbeta. January 28, 2019

Package ASICS. January 23, 2018

Package desplot. R topics documented: April 3, 2018

Package opencage. January 16, 2018

Package projector. February 27, 2018

Package ipft. January 4, 2018

Package scraep. July 3, Index 6

Package PxWebApiData

Package orthodr. R topics documented: March 26, Type Package

Package textstem. R topics documented:

Package oec. R topics documented: May 11, Type Package

Package rxylib. December 20, 2017

Package dbx. July 5, 2018

Package robotstxt. November 12, 2017

Package nprotreg. October 14, 2018

Transcription:

Type Package Title Space Filling Based Tools for Data Mining Version 0.1.0 Author Mohamed Laib and Mikhail Kanevski Package SFtools June 28, 2017 Maintainer Mohamed Laib <laib.med@gmail.com> Contains space filling based tools for machine learning and data mining. Some functions offer several computational techniques and deal with the out of memory for large big data by using the ff package. Imports wordspace, doparallel, ff, parallel, stats License GPL-3 URL https://sites.google.com/site/mohamedlaibwebpage/ BugReports https://github.com/mlaib/sftools/issues Encoding UTF-8 LazyData true Note The GPU computing version can be found soon, on following url https://sites.google.com/site/mohamedlaibwebpage/software. RoxygenNote 6.0.1 NeedsCompilation no Repository CRAN Date/Publication 2017-06-28 15:53:43 UTC R topics documented: SFtools-package....................................... 2 Infinity............................................ 2 UfsCov........................................... 3 UfsCov_ff.......................................... 5 UfsCov_par......................................... 6 Index 9 1

2 Infinity SFtools-package SFtools: Space Filling Based Tools for Data Mining Contains space filling based tools for machine learning and data mining. Some functions offer several computational techniques and deal with the out of memory for large big data by using the ff package. Mohamed Laib <Mohamed.Laib@unil.ch> and Mikhail Kanevski <Mikhail.Kanevski@unil.ch>, Maintainer: Mohamed Laib <laib.med@gmail.com> References M. Laib and M. Kanevski (2017). Unsupervised Feature Selection Based on Space Filling Concept, arxiv:1706.08894. J. A. Royle, D. Nychka, An algorithm for the construction of spatial coverage designs with implementation in Splus, Computers and Geosciences 24 (1997) p. 479 488. J. Franco, Planification d expériences numériques en phase exploratoire pour la simulation des phénomènes complexes, Thesis (2008) 282. D. Dupuy, C. Helbert, J. Franco (2015). DiceDesign and DiceEval: Two R Packages for Design and Analysis of Computer Experiments. Journal of Statistical Software, 65(11), 1-38. Jstatsoft. See Also Useful links: https://sites.google.com/site/mohamedlaibwebpage/ Report bugs at https://github.com/mlaib/sftools/issues Infinity Simulated dataset Generates a simulated dataset (the Infinity dataset) Usage Infinity(n=1000)

UfsCov 3 Arguments n Number of generated data points (by default: n=1000). Value A data.frame of simulated dataset, with 7 features (4 of them are redundants) Mohamed Laib <Mohamed.Laib@unil.ch> Examples infinity<-infinity(n=1000) plot(infinity$x1,infinity$x2) ## Not run: #### Visualisation of the infinity dataset (3D) #### require(rgl) require(colorramps) c <- cut(infinity$z,breaks=100) cols <- matlab.like(100)[as.numeric(c)] plot3d(infinity$x1,infinity$x2,infinity$z,radius=0.01, col=cols, type="s",xlab="x1",ylab="x2",zlab="z",box=f) grid3d(c("x","y","z"),col="black",lwd=1) ## End(Not run) UfsCov UfsCov algorithm for unsupervised feature selection Applies the UfsCov algorithm based on the space filling concept, by using a sequatial forward search (SFS). Usage UfsCov(data) Arguments data Data of class: matrix or data.frame.

4 UfsCov Details Value Note Since the algorithm is based on pairwise distances, and according to the computing power of your machine, large number of data points can take much time and needs more memory. See UfsCov_par for parellel computing, or UfsCov_ff for memory efficient storage of large data on disk and fast access (by using the ff and the ffbase packages). A list of two elements: CovD a vector containing the coverage measure of each step of the SFS. IdR a vector containing the added variables during the selection procedure. The algorithm does not deal with missing values and constant features. Please make sure to remove them. Mohamed Laib <Mohamed.Laib@unil.ch> References M. Laib and M. Kanevski (2017). Unsupervised Feature Selection Based on Space Filling Concept, arxiv:1706.08894. Examples infinity<-infinity(n=800) Results<- UfsCov(infinity) cou<-colnames(infinity) nom<-cou[results[[2]]] par(mfrow=c(1,1), mar=c(5,5,2,2)) names(results[[1]])<-cou[results[[2]]] plot(results[[1]],pch=16,cex=1,col="blue", axes = FALSE, xlab = "Added Features", ylab = "Coverage measure") lines(results[[1]],cex=2,col="blue") grid(lwd=1.5,col="gray" ) box() axis(2) axis(1,1:length(nom),nom) which.min(results[[1]]) ## Not run: #### UfsCov on the Butterfly dataset #### require(idmining) N <- 1000

UfsCov_ff 5 raw_dat <- Butterfly(N) dat<-raw_dat[,-9] Results<- UfsCov(dat) cou<-colnames(dat) nom<-cou[results[[2]]] par(mfrow=c(1,1), mar=c(5,5,2,2)) names(results[[1]])<-cou[results[[2]]] plot(results[[1]],pch=16,cex=1,col="blue", axes = FALSE, xlab = "Added Features", ylab = "Coverage measure") lines(results[[1]],cex=2,col="blue") grid(lwd=1.5,col="gray" ) box() axis(2) axis(1,1:length(nom),nom) which.min(results[[1]]) ## End(Not run) UfsCov_ff UfsCov for unsupervised features selection Usage Applies the UfsCov algorithm based on the space filling concept, by using a sequatial forward search (for memory efficient storage of large data on disk and fast access). UfsCov_ff(data, blocks=2) Arguments data blocks Data of class: matrix or data.frame. Number of splits to facilitate the computation of the distance matrix (by default: blocks=2). Value A list of two elements: CovD a vector containing the coverage measure of each step of the SFS. IdR a vector containing the added variables during the selection procedure. Note This function is still under developement.

6 UfsCov_par Mohamed Laib <Mohamed.Laib@unil.ch> References M. Laib and M. Kanevski (2017). Unsupervised Feature Selection Based on Space Filling Concept, arxiv:1706.08894. Examples ## Not run: #### Infinity dataset #### N <- 1000 dat<-infinity(n) Results<- UfsCov_ff(dat) cou<-colnames(dat) nom<-cou[results[[2]]] par(mfrow=c(1,1), mar=c(5,5,2,2)) names(results[[1]])<-cou[results[[2]]] plot(results[[1]],pch=16,cex=1,col="blue", axes = FALSE, xlab = "Added Features", ylab = "Coverage measure") lines(results[[1]],cex=2,col="blue") grid(lwd=1.5,col="gray" ) box() axis(2) axis(1,1:length(nom),nom) which.min(results[[1]]) #### Butterfly dataset #### require(idmining) N <- 1000 raw_dat <- Butterfly(N) dat<-raw_dat[,-9] Results<- UfsCov_ff(dat) ## End(Not run) UfsCov_par UfsCov algorithm for unsupervised feature selection Applies the UfsCov algorithm based on the space filling concept, by using a sequatial forward search (SFS).This function offers a parellel computing.

UfsCov_par 7 Usage UfsCov_par(data, ncores=2) Arguments data ncores Data of class: matrix or data.frame. Number of cores to use (by default: ncores=2). Details Value Note Since the algorithm is based on pairwise distances, and according to the computing power of your machine, large number of data points needs more memory. See UfsCov_ff for memory efficient storage of large data on disk and fast access (by using the ff and the ffbase packages). A list of two elements: CovD a vector containing the coverage measure of each step of the SFS. IdR a vector containing the added variables during the selection procedure. The algorithm does not deal with missing values and constant features. Please make sure to remove them. Note that it is not recommanded to use this function with small data, it takes more time than using the standard UfsCov function. Mohamed Laib <Mohamed.Laib@unil.ch> References M. Laib and M. Kanevski (2017). Unsupervised Feature Selection Based on Space Filling Concept, arxiv:1706.08894. See Also UfsCov, UfsCov_ff Examples N <- 800 dat<-infinity(n) Results<- UfsCov_par(dat,ncores=2) cou<-colnames(dat) nom<-cou[results[[2]]] par(mfrow=c(1,1), mar=c(5,5,2,2)) names(results[[1]])<-cou[results[[2]]]

8 UfsCov_par plot(results[[1]],pch=16,cex=1,col="blue", axes = FALSE, xlab = "Added Features", ylab = "Coverage measure") lines(results[[1]],cex=2,col="blue") grid(lwd=1.5,col="gray" ) box() axis(2) axis(1,1:length(nom),nom) which.min(results[[1]]) ## Not run: N<-5000 dat<-infinity(n) ## Little comparison: system.time(uf<-ufscov(dat)) system.time(uf.p<-ufscov_par(dat, ncores = 4)) ## End(Not run)

Index Infinity, 2 SFtools (SFtools-package), 2 SFtools-package, 2 UfsCov, 3, 7 UfsCov_ff, 4, 5, 7 UfsCov_par, 4, 6 9