Package snn. August 23, 2015

Type: Package
Title: Stabilized Nearest Neighbor Classifier
Version: 1.1
Date: 2015-08-22
Author: Wei Sun, Xingye Qiao, and Guang Cheng
Maintainer: Wei Sun <sunweisurrey8@gmail.com>
Description: Implements the K-nearest neighbor classifier, weighted nearest
    neighbor classifier, bagged nearest neighbor classifier, optimal weighted
    nearest neighbor classifier, and stabilized nearest neighbor classifier,
    and performs model selection for them via 5-fold cross-validation. The
    package also provides functions for computing the classification error
    and the classification instability of a classification procedure.
License: GPL-3
Depends: R (>= 3.0.0), stats
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-08-23 10:22:09

R topics documented:
    snn-package
    cv.tune
    mybnn
    mycis
    mydata
    myerror
    myknn
    myownn
    mysnn
    mywnn

snn-package    Package for the Stabilized Nearest Neighbor Classifier

Description
    A package implementing various nearest neighbor classifiers, including the
    K-nearest neighbor classifier, weighted nearest neighbor classifier, bagged
    nearest neighbor classifier, optimal weighted nearest neighbor classifier,
    and a new stabilized nearest neighbor classifier. The package also provides
    functions for computing the classification error and the classification
    instability of a classification procedure.

Details
    Package:  snn
    Type:     Package
    Version:  1.0
    Date:     2015-07-31
    License:  GPL-3

    The package provides 8 main functions:
    (1) the classification error (myerror);
    (2) the classification instability (mycis);
    (3) the K-nearest neighbor classifier (myknn);
    (4) the weighted nearest neighbor classifier (mywnn);
    (5) the bagged nearest neighbor classifier (mybnn);
    (6) the optimal weighted nearest neighbor classifier (myownn);
    (7) the stabilized nearest neighbor classifier (mysnn);
    (8) model selection via cross-validation (cv.tune) for the K-nearest
        neighbor, bagged nearest neighbor, optimal weighted nearest neighbor,
        and stabilized nearest neighbor classifiers.

    Maintainer: Wei Sun <sunweisurrey8@gmail.com>

References
    W. Sun, X. Qiao, and G. Cheng (2015). Stabilized Nearest Neighbor
    Classifier and Its Statistical Properties. Available at
    arxiv.org/abs/1405.6642.
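Note: the following is a minimal end-to-end sketch combining the functions
documented below (data generation, tuning, prediction, and evaluation). It uses
only the documented interfaces; the seed and sample sizes are arbitrary choices,
not values from the package documentation.

    library(snn)

    # Generate training and test sets from the package's mixture generator
    set.seed(1)
    n = 100; d = 10
    train = mydata(n, d)
    test  = mydata(100, d)
    test.x = test[, 1:d]
    test.y = test[, d + 1]

    # Tune lambda for the stabilized nearest neighbor classifier
    tuned = cv.tune(train, classifier = "snn")

    # Predict and evaluate
    pred = mysnn(train, test.x, lambda = tuned$parameter.opt)
    myerror(pred, test.y)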

cv.tune    Tuning via 5-fold Cross-Validation

Description
    Implements the tuning procedure for the K-nearest neighbor classifier,
    bagged nearest neighbor classifier, optimal weighted nearest neighbor
    classifier, and stabilized nearest neighbor classifier.

Usage
    cv.tune(train, numgrid = 20, classifier = "snn")

Arguments
    train       Matrix of training data. An n by (d+1) matrix, where n is the
                sample size and d is the dimension; the last column is the
                class label.
    numgrid     Number of grid points in the search.
    classifier  The classifier to tune. Possible choices are "knn", "bnn",
                "ownn", and "snn".

Details
    For the K-nearest neighbor classifier (knn), the search grid consists of
    equally spaced integers in [1, n/2]. Given the best k for the K-nearest
    neighbor classifier, the best parameter for the bagged nearest neighbor
    classifier (bnn) is computed via (3.5) in Samworth (2012), and the best
    parameter for Samworth's optimal weighted nearest neighbor classifier
    (ownn) is computed via (2.9) in Samworth (2012).

    For the stabilized nearest neighbor classifier (snn), we first identify the
    set of lambda values whose estimated risks fall within the lowest 10th
    percentile, and then choose from this set the lambda with the minimal
    estimated classification instability. The grid of lambda values is chosen
    so that each lambda corresponds to a point on an evenly spaced grid of k in
    [1, n/2]. See Sun et al. (2015) for details.

Value
    The returned list contains:
    parameter.opt   The best tuning parameter for the chosen classifier, i.e.,
                    the best K for knn and ownn, the best ratio for bnn, and
                    the best lambda for snn.
    parameter.list  The list of parameters in the grid search for the chosen
                    classifier.

References
    R.J. Samworth (2012). Optimal Weighted Nearest Neighbor Classifiers.
    Annals of Statistics, 40:5, 2733-2763.
    W. Sun, X. Qiao, and G. Cheng (2015). Stabilized Nearest Neighbor
    Classifier and Its Statistical Properties. Available at
    arxiv.org/abs/1405.6642.

Examples
    ## Training data (the generation lines were lost in extraction; a typical
    ## setup with the package's own generator is reconstructed here)
    set.seed(1)
    n = 100
    d = 10
    data = mydata(n, d)

    ## Tuning procedure
    out.tune = cv.tune(data, classifier = "knn")
    out.tune
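Note: for readers who want to see what the (2.9) step produces, Samworth's
optimal weights have a closed form. The sketch below illustrates that weight
scheme for a given k and dimension d; it is not code from the snn package, and
the helper name ownn.weights is hypothetical.

    # Illustration of Samworth's (2012) optimal weights: the i-th nearest
    # neighbor (i <= k.star) receives the weight below; neighbors beyond
    # k.star get weight 0. Hypothetical helper, not part of the snn API.
    ownn.weights = function(k.star, d) {
      i = 1:k.star
      (1/k.star) * (1 + d/2 - (d/(2*k.star^(2/d))) *
                      (i^(1 + 2/d) - (i - 1)^(1 + 2/d)))
    }
    sum(ownn.weights(10, 2))  # the weights sum to 1 and decrease with rank i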

mybnn    Bagged Nearest Neighbor Classifier

Description
    Implements the bagged nearest neighbor classification algorithm to predict
    the label of a new input using a training data set.

Usage
    mybnn(train, test, ratio)

Arguments
    train   Matrix of training data. An n by (d+1) matrix, where n is the
            sample size and d is the dimension; the last column is the class
            label.
    test    Vector of a test point. A matrix input is also admitted, with each
            row representing a new test point.
    ratio   Resampling ratio.

Details
    The bagged nearest neighbor classifier is asymptotically equivalent to a
    weighted nearest neighbor classifier whose i-th weight is a function of the
    resampling ratio, the sample size n, and i. See Hall and Samworth (2005)
    for details. The tuning parameter ratio can be selected via
    cross-validation; see the cv.tune function for the tuning procedure.

Value
    The predicted class label of the new test point. If the input is a matrix,
    a vector containing the predicted class labels of all the new test points
    is returned.

References
    Hall, P. and Samworth, R. (2005). Properties of Bagged Nearest Neighbor
    Classifiers. Journal of the Royal Statistical Society, Series B, 67,
    363-379.

Examples
    ## Training data (generation lines lost in extraction; reconstructed setup)
    set.seed(1)
    n = 100
    d = 10
    data = mydata(n, d)

    ## Testing data
    set.seed(2015)
    ntest = 100
    TEST = mydata(ntest, d)
    TEST.x = TEST[,1:d]

    ## Bagged nearest neighbor classifier
    mybnn(data, TEST.x, ratio = 0.5)
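Note: to make the equivalence in Details concrete, consider resampling without
replacement with ratio q. The probability that the i-th nearest neighbor of a
test point is the nearest neighbor within a resample is approximately
q(1-q)^(i-1) for large n, giving geometric limiting weights. The snippet below
illustrates that form only; the choice of resampling scheme is an assumption
here, and this is not the package's internal computation.

    # Geometric limiting weights for bagged NN under without-replacement
    # resampling with ratio q (illustrative assumption, not snn code).
    bnn.weights = function(n, q) q * (1 - q)^(0:(n - 1))
    head(bnn.weights(100, 0.5))  # weight decays geometrically with rank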

mycis    Classification Instability

Description
    Computes the classification instability (CIS) of a classification
    procedure.

Usage
    mycis(predict1, predict2)

Arguments
    predict1   The list of predicted labels based on one training data set.
    predict2   The list of predicted labels based on another training data set.

Details
    The CIS of a classification procedure is defined as the probability that
    the same object is classified into two different classes by the procedure
    when it is trained on two i.i.d. data sets. The arguments predict1 and
    predict2 should therefore be generated on the same test data by the same
    classification procedure trained on two i.i.d. training data sets. CIS
    lies in [0,1], and a smaller CIS indicates a more stable classification
    procedure. See Section 2 of Sun et al. (2015) for details.

References
    W. Sun, X. Qiao, and G. Cheng (2015). Stabilized Nearest Neighbor
    Classifier and Its Statistical Properties. Available at
    arxiv.org/abs/1405.6642.

Examples
    ## Training data (generation lines lost in extraction; reconstructed setup)
    set.seed(1)
    n = 100
    d = 10
    data = mydata(n, d)

    ## Testing data
    set.seed(2015)
    ntest = 100
    TEST = mydata(ntest, d)
    TEST.x = TEST[,1:d]

    ## Compute classification instability for knn, bnn, ownn, and snn with
    ## given parameters, splitting the training data into two halves
    nn = floor(n/2)
    permindex = sample(n)

    predict1.knn = myknn(data[permindex[1:nn],], TEST.x, K = 5)
    predict2.knn = myknn(data[permindex[-(1:nn)],], TEST.x, K = 5)

    predict1.bnn = mybnn(data[permindex[1:nn],], TEST.x, ratio = 0.5)
    predict2.bnn = mybnn(data[permindex[-(1:nn)],], TEST.x, ratio = 0.5)

    predict1.ownn = myownn(data[permindex[1:nn],], TEST.x, K = 5)
    predict2.ownn = myownn(data[permindex[-(1:nn)],], TEST.x, K = 5)

    predict1.snn = mysnn(data[permindex[1:nn],], TEST.x, lambda = 10)
    predict2.snn = mysnn(data[permindex[-(1:nn)],], TEST.x, lambda = 10)

    mycis(predict1.knn, predict2.knn)
    mycis(predict1.bnn, predict2.bnn)
    mycis(predict1.ownn, predict2.ownn)
    mycis(predict1.snn, predict2.snn)
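Note: since CIS is a probability of disagreement, its empirical version on a
test set is simply the fraction of test points the two trained copies classify
differently. A one-line sketch of that estimate follows; it illustrates the
definition and may differ from the package's exact implementation.

    # Empirical CIS: proportion of test points on which the two copies of the
    # procedure disagree (illustrative sketch).
    cis.estimate = function(predict1, predict2) mean(predict1 != predict2)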

mydata    Data Generator

Description
    Generates random data from a mixture of two Gaussian distributions.

Usage
    mydata(n, d, mu = 0.8, portion = 1/2)

Arguments
    n         The number of observations (sample size).
    d         The number of variables (dimension).
    mu        In the Gaussian mixture model, the first component has zero mean
              and identity covariance matrix; the second component has identity
              covariance matrix and mean equal to the d-dimensional vector with
              all entries mu.
    portion   The prior probability of the first Gaussian component.

Value
    A data matrix with n rows and d+1 columns. Each row is a sample generated
    from the mixture Gaussian distribution; the first d columns are the
    features and the last column is the class label of the sample.

Examples
    ## Generate data (setup lines lost in extraction; reconstructed)
    n = 100
    d = 10
    DATA = mydata(n, d)
    DATA.x = DATA[,1:d]
    DATA.y = DATA[,d+1]
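Note: for intuition about what mydata returns, the documented model is easy to
reproduce. The sketch below mirrors the description above rather than the
package source; in particular, the 1/2 coding of the class labels is an
assumption the documentation leaves implicit.

    # Sketch of the documented mixture: class 1 ~ N(0, I_d) with probability
    # `portion`; class 2 ~ N(mu * 1_d, I_d). Label coding (1/2) is assumed.
    gen.mixture = function(n, d, mu = 0.8, portion = 1/2) {
      y = ifelse(runif(n) < portion, 1, 2)
      x = matrix(rnorm(n * d), n, d) + mu * (y == 2)  # shift class-2 rows by mu
      cbind(x, y)
    }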

myerror    Classification Error

Description
    Computes the error of a list of predicted labels given the list of true
    labels.

Usage
    myerror(predict, true)

Arguments
    predict   The list of predicted labels.
    true      The list of true labels.

Value
    The misclassification error of the predicted labels produced by a
    classification algorithm.

Examples
    ## Training data (generation lines lost in extraction; reconstructed setup)
    set.seed(1)
    n = 100
    d = 10
    data = mydata(n, d)

    ## Testing data
    set.seed(2015)
    ntest = 100
    TEST = mydata(ntest, d)
    TEST.x = TEST[,1:d]
    TEST.y = TEST[,d+1]

    ## Compute the errors for knn, bnn, ownn, and snn with given parameters
    predict.knn = myknn(data, TEST.x, K = 5)
    predict.bnn = mybnn(data, TEST.x, ratio = 0.5)
    predict.ownn = myownn(data, TEST.x, K = 5)
    predict.snn = mysnn(data, TEST.x, lambda = 10)

    myerror(predict.knn, TEST.y)
    myerror(predict.bnn, TEST.y)
    myerror(predict.ownn, TEST.y)
    myerror(predict.snn, TEST.y)
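Note: as with mycis, the returned quantity has a simple empirical form,
presumably the disagreement rate between predicted and true labels. This is an
illustrative sketch, not necessarily the package's exact code.

    # Empirical misclassification rate (illustrative sketch).
    error.estimate = function(predict, true) mean(predict != true)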

myknn    K Nearest Neighbor Classifier

Description
    Implements the K nearest neighbor classification algorithm to predict the
    label of a new input using a training data set.

Usage
    myknn(train, test, K)

Arguments
    train   Matrix of training data. An n by (d+1) matrix, where n is the
            sample size and d is the dimension; the last column is the class
            label.
    test    Vector of a test point. A matrix input is also admitted, with each
            row representing a new test point.
    K       Number of nearest neighbors considered.

Details
    The tuning parameter K can be selected via cross-validation; see the
    cv.tune function for the tuning procedure.

Value
    The predicted class label of the new test point. If the input is a matrix,
    a vector containing the predicted class labels of all the new test points
    is returned.

References
    Fix, E. and Hodges, J.L., Jr. (1951). Discriminatory Analysis,
    Nonparametric Discrimination: Consistency Properties. Randolph Field,
    Texas, Project 21-49-004, Report No. 4.

Examples
    ## Training data (generation lines lost in extraction; reconstructed setup)
    set.seed(1)
    n = 100
    d = 10
    data = mydata(n, d)

    ## Testing data
    set.seed(2015)
    ntest = 100
    TEST = mydata(ntest, d)
    TEST.x = TEST[,1:d]

    ## K nearest neighbor classifier
    myknn(data, TEST.x, K = 5)

myownn    Optimal Weighted Nearest Neighbor Classifier

Description
    Implements Samworth's optimal weighted nearest neighbor classification
    algorithm to predict the label of a new input using a training data set.

Usage
    myownn(train, test, K)

Arguments
    train   Matrix of training data. An n by (d+1) matrix, where n is the
            sample size and d is the dimension; the last column is the class
            label.
    test    Vector of a test point. A matrix input is also admitted, with each
            row representing a new test point.
    K       Number of nearest neighbors considered.

Details
    The tuning parameter K can be selected via cross-validation; see the
    cv.tune function for the tuning procedure.

Value
    The predicted class label of the new test point. If the input is a matrix,
    a vector containing the predicted class labels of all the new test points
    is returned.

References
    R.J. Samworth (2012). Optimal Weighted Nearest Neighbor Classifiers.
    Annals of Statistics, 40:5, 2733-2763.

Examples
    ## Training data (generation lines lost in extraction; reconstructed setup)
    set.seed(1)
    n = 100
    d = 10
    data = mydata(n, d)

    ## Testing data
    set.seed(2015)
    ntest = 100
    TEST = mydata(ntest, d)
    TEST.x = TEST[,1:d]

    ## Optimal weighted nearest neighbor classifier
    myownn(data, TEST.x, K = 5)

mysnn    Stabilized Nearest Neighbor Classifier

Description
    Implements the stabilized nearest neighbor classification algorithm to
    predict the label of a new input using a training data set. The stabilized
    nearest neighbor classifier contains the K-nearest neighbor classifier and
    the optimal weighted nearest neighbor classifier as two special cases.

Usage
    mysnn(train, test, lambda)

Arguments
    train    Matrix of training data. An n by (d+1) matrix, where n is the
             sample size and d is the dimension; the last column is the class
             label.
    test     Vector of a test point. A matrix input is also admitted, with
             each row representing a new test point.
    lambda   Tuning parameter controlling the degree of stabilization of the
             nearest neighbor classification procedure. The larger lambda, the
             more stable the procedure.

Details
    The tuning parameter lambda can be selected via cross-validation; see
    cv.tune for the tuning procedure.

Value
    The predicted class label of the new test point. If the input is a matrix,
    a vector containing the predicted class labels of all the new test points
    is returned.

References
    W. Sun, X. Qiao, and G. Cheng (2015). Stabilized Nearest Neighbor
    Classifier and Its Statistical Properties. Available at
    arxiv.org/abs/1405.6642.

Examples
    ## Training data (generation lines lost in extraction; reconstructed setup)
    set.seed(1)
    n = 100
    d = 10
    data = mydata(n, d)

    ## Testing data
    set.seed(2015)
    ntest = 100
    TEST = mydata(ntest, d)
    TEST.x = TEST[,1:d]

    ## Stabilized nearest neighbor classifier
    mysnn(data, TEST.x, lambda = 10)

mywnn    Weighted Nearest Neighbor Classifier

Description
    Implements the weighted nearest neighbor classification algorithm to
    predict the label of a new input using a training data set.

Usage
    mywnn(train, test, weight)

Arguments
    train    Matrix of training data. An n by (d+1) matrix, where n is the
             sample size and d is the dimension; the last column is the class
             label.
    test     Vector of a test point.
    weight   The weight vector for all n nearest neighbors.

Value
    The predicted class label of the new test point.

Examples
    ## Training data (generation lines lost in extraction; reconstructed setup)
    set.seed(1)
    n = 100
    d = 10
    data = mydata(n, d)

    ## Weighted nearest neighbor classifier
    weight.vec = c(rep(0.02,50), rep(0,50))
    mywnn(data, rep(-5,d), weight = weight.vec)
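Note: the weight vector gives the vote carried by each neighbor, ordered from
nearest to farthest, so the example above puts equal weight on the 50 nearest
points and none on the rest. The following is a minimal sketch of that voting
rule, assuming Euclidean distance and weights aligned to distance ranks; it is
an illustration, not the package source, and wnn.vote is a hypothetical helper.

    # Illustrative weighted-NN vote: rank training rows by distance to the
    # test point, sum each class's weight over the ranking, and return the
    # class with the largest total weight.
    wnn.vote = function(train, test, weight) {
      d = ncol(train) - 1
      dist = sqrt(colSums((t(train[, 1:d]) - test)^2))
      lab = train[order(dist), d + 1]
      votes = tapply(weight, lab, sum)
      as.numeric(names(votes)[which.max(votes)])
    }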

Index

    cv.tune
    mybnn
    mycis
    mydata
    myerror
    myknn
    myownn
    mysnn
    mywnn
    snn (snn-package)
    snn-package