Package penalizedlda


Package 'penalizedlda'                                    July 11, 2015

Type: Package
Title: Penalized Classification using Fisher's Linear Discriminant
Version: 1.1
Date: 2015-07-09
Author: Daniela Witten
Maintainer: Daniela Witten <dwitten@u.washington.edu>
Description: Implements the penalized LDA proposal of "Witten and
  Tibshirani (2011), Penalized classification using Fisher's linear
  discriminant, to appear in Journal of the Royal Statistical Society,
  Series B".
License: GPL (>= 2)
LazyLoad: yes
Depends: R (>= 1.9.0)
Imports: flsa, graphics, stats, utils
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-07-11 01:39:15

R topics documented:
  penalizedlda-package
  PenalizedLDA
  PenalizedLDA.cv
  plot.penlda
  plot.penldacv
  predict.penlda
  print.penlda
  print.penldacv
  Index

penalizedlda-package    Penalized linear discriminant analysis using lasso and fused lasso penalties

Description

  This package performs penalized linear discriminant analysis, intended for the high-dimensional setting in which the number of features p exceeds the number of observations n. Fisher's discriminant problem is modified in two ways:

  1. A diagonal estimate of the within-class covariance matrix is used.
  2. Lasso or fused lasso penalties are applied to the discriminant vectors in order to encourage sparsity, or sparsity and smoothness.

Details

  Package: penalizedlda
  Type: Package
  Version: 1.1
  Date: 2015-07-09
  License: GPL (>= 2.0)
  LazyLoad: yes

  The main functions are PenalizedLDA, which performs penalized linear discriminant analysis, and PenalizedLDA.cv, which performs cross-validation in order to select the optimal tuning parameters for penalized LDA.

Author(s)

  Daniela Witten
  Maintainer: Daniela Witten <dwitten@u.washington.edu>

References

  D. Witten and R. Tibshirani (2011) Penalized classification using Fisher's linear discriminant. To appear in Journal of the Royal Statistical Society, Series B.

Examples

  out <- PenalizedLDA(x, y, lambda = .14, K = 2)
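The diagonal within-class covariance estimate in modification 1 above can be sketched in a few lines of base R. This is a minimal illustration of the idea, not the package's internal code; the function name `wcsd2` and the 1/n pooling convention are assumptions.

```r
# Diagonal within-class covariance estimate: for each feature, pool the
# squared deviations from its class-specific mean across all classes.
# x: n x p data matrix; y: class labels coded 1, 2, ..., nclasses.
wcsd2 <- function(x, y) {
  resid <- x
  for (k in unique(y)) {
    rows <- which(y == k)
    # subtract the class-k mean of each feature
    resid[rows, ] <- scale(x[rows, , drop = FALSE], center = TRUE, scale = FALSE)
  }
  colSums(resid^2) / nrow(x)  # per-feature pooled within-class variance
}

set.seed(1)
x <- matrix(rnorm(20 * 5), ncol = 5)
y <- rep(1:2, each = 10)
round(wcsd2(x, y), 3)
```

Using only the diagonal sidesteps inverting a p x p within-class covariance matrix, which is singular whenever p > n.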

PenalizedLDA    Perform penalized linear discriminant analysis using L1 or fused lasso penalties

Description

  Solve Fisher's discriminant problem in high dimensions using (a) a diagonal estimate of the within-class covariance matrix, and (b) lasso or fused lasso penalties on the discriminant vectors.

Usage

  PenalizedLDA(x, y, xte = NULL, type = "standard", lambda, K = 2,
               chrom = NULL, lambda2 = NULL, standardized = FALSE,
               wcsd.x = NULL, ymat = NULL, maxiter = 20, trace = FALSE)

Arguments

  x: An n x p data matrix; n observations on the rows and p features on the columns.

  y: An n-vector containing the class labels. Should be coded as 1, 2, ..., nclasses, where nclasses is the number of classes.

  xte: An m x p data matrix; m test observations on the rows and p features on the columns. Predictions will be made at these test observations. If NULL, then no predictions will be output.

  type: Either "standard" or "ordered". The former results in the use of lasso penalties, the latter in fused lasso penalties. "Ordered" is appropriate if the features are ordered and it makes sense for the discriminant vector(s) to preserve that ordering.

  lambda: If type="standard", this is the lasso penalty tuning parameter. If type="ordered", this is the tuning parameter for the sparsity component of the fused lasso penalty term.

  K: The number of discriminant vectors desired. Must be no greater than (number of classes - 1).

  chrom: Only applies to type="ordered". Should be used only if the p features correspond to chromosomal locations; in this case, a numeric vector of length p indicating which "chromosome" each feature belongs to. The purpose is to avoid imposing smoothness between chromosomes.

  lambda2: If type="ordered", this penalty controls the smoothness term in the fused lasso penalty. Larger lambda2 leads to more smoothness.

  standardized: Have the features of the data already been standardized to have mean zero and within-class standard deviation 1? In general, set standardized=FALSE.

  wcsd.x: If the within-class standard deviation for each feature has already been computed, it can be passed in. Usually will be NULL.

  ymat: If y has already been converted into an n x nclasses matrix of indicator variables, it can be passed in. Usually will be NULL.

  maxiter: Maximum number of iterations to be performed; default is 20.

  trace: Print out progress through the iterations?

Details

  Assume that the features (columns) of x have been standardized to have mean 0 and standard deviation 1. Then, if type="standard", the optimization problem for the first discriminant vector is

    maximize_b  b' Sigma_bet b - lambda * d * sum_j |b_j|
    subject to  ||b||^2 <= 1,

  where d is the largest eigenvalue of Sigma_bet. Alternatively, if type="ordered", the optimization problem for the first discriminant vector is

    maximize_b  b' Sigma_bet b - lambda * d * sum_j |b_j| - lambda2 * d * sum_j |b_j - b_(j-1)|
    subject to  ||b||^2 <= 1.

  For details about the optimization problem for obtaining subsequent discriminant vectors, see the paper below.

Value

  ypred: An m x K matrix of predicted class labels for the test observations; output ONLY if xte was passed in. The kth column gives the test-set classifications if the first k discriminant vectors are used. If K=1, then simply an m-vector is output.

  discrim: A p x K matrix of penalized discriminant vectors. Note that these discriminant vectors were computed after first scaling each feature of the data to have mean zero and within-class standard deviation one.

  xproj: An n x K matrix of the data projected onto the discriminant vectors.

  xteproj: An m x K matrix of the test data projected onto the discriminant vectors; output ONLY if xte was passed in.

  crits: The value of the objective at each iteration. If K>1, then this is a list with the objective values obtained for each discriminant vector.

References

  D. Witten and R. Tibshirani (2011) Penalized classification using Fisher's linear discriminant. To appear in JRSSB.

Examples

  out <- PenalizedLDA(x, y, lambda = .14, K = 2)
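The type="standard" criterion above can be written out directly in base R. The sketch below only evaluates the objective for a candidate vector b; it does not solve the maximization. The function name `penlda_objective` and the plug-in construction of Sigma_bet from class means are illustrative assumptions, not the package's internals.

```r
# Value of the type="standard" objective for a candidate discriminant vector b:
#   b' Sigma_bet b - lambda * d * sum(|b_j|),
# where d is the largest eigenvalue of Sigma_bet.
penlda_objective <- function(b, Sigma_bet, lambda) {
  d <- max(eigen(Sigma_bet, symmetric = TRUE, only.values = TRUE)$values)
  as.numeric(t(b) %*% Sigma_bet %*% b) - lambda * d * sum(abs(b))
}

# A toy between-class covariance built from three class means
# (features assumed standardized, equal class sizes):
mu <- rbind(c(1, 0, 0, 0),
            c(0, 1, 0, 0),
            c(-1, -1, 0, 0))
Sigma_bet <- crossprod(mu) / nrow(mu)

b <- c(1, 0, 0, 0)  # candidate vector satisfying ||b||^2 <= 1
penlda_objective(b, Sigma_bet, lambda = .14)
```

Increasing lambda makes the L1 term dominate, which drives more entries of the maximizing b exactly to zero; that is how sparsity enters the discriminant vectors.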

  # For more examples, try "?PenalizedLDA.cv"

PenalizedLDA.cv    Perform cross-validation for penalized linear discriminant analysis

Description

  Performs cross-validation for the PenalizedLDA function.

Usage

  PenalizedLDA.cv(x, y, lambdas = NULL, K = NULL, nfold = 6, folds = NULL,
                  type = "standard", chrom = NULL, lambda2 = NULL)

Arguments

  x: An n x p data matrix; n is the number of observations and p is the number of features.

  y: An n-vector containing the class labels, represented as 1, 2, ..., nclasses.

  lambdas: A vector of lambda values to be considered.

  K: The number of discriminant vectors to be used. If K is not specified, then cross-validation will be performed in order to choose the number of discriminant vectors to use.

  nfold: Number of cross-validation folds.

  folds: Optional; one can pass in a list containing the observations that should be used as the test set in each cross-validation fold.

  type: Either "standard" or "ordered". The former results in the use of lasso penalties, the latter in fused lasso penalties. "Ordered" is appropriate if the features are ordered and it makes sense for the discriminant vector(s) to preserve that ordering.

  chrom: Only applies to type="ordered". Should be used only if the p features correspond to chromosomal locations; in this case, a numeric vector of length p indicating which "chromosome" each feature belongs to. The purpose is to avoid imposing smoothness between chromosomes.

  lambda2: If type is "ordered", the value of lambda2 to be used. Note that cross-validation is performed over lambda (and possibly over K) but not over lambda2.

Value

  errs: The mean cross-validation error rates obtained; either a vector of length(lambdas), or a length(lambdas) x (length(unique(y))-1) matrix. The former occurs if K is specified; the latter occurs otherwise, in which case cross-validation is performed over K as well as over lambda.

  nnonzero: A vector or matrix of the same dimension as "errs". Entries indicate the number of nonzero features involved in the corresponding classifier.

  bestK: Value of K (the number of discriminant vectors) that minimizes the cross-validation error.

  bestlambda: Value of "lambdas" that minimizes the cross-validation error.

  bestlambda.1se: Given that K equals bestK, this is the largest value of lambda such that the corresponding error is within 1 standard error of the minimum. This is the "one standard error" rule for selecting the tuning parameter.

  lambdas: Values of lambda considered.

  Ks: Values of K considered; output only if K=NULL was input.

  folds: Folds used in cross-validation.

References

  D. Witten and R. Tibshirani (2011) Penalized classification using Fisher's linear discriminant. To appear in JRSSB.

Examples

  # Generate data.
  # NOTE: several setup lines (the definitions of n, p, x, and the shifts
  # for classes 1 and 2) were missing from the source transcription; they
  # are reconstructed here by analogy with the test-set code below.
  n <- 20   # number of training obs
  m <- 40   # number of test obs
  p <- 100  # number of features
  x <- matrix(rnorm(n*p), ncol=p)
  xte <- matrix(rnorm(m*p), ncol=p)
  y <- c(rep(1,5), rep(2,5), rep(3,6), rep(4,4))
  yte <- rep(1:4, each=10)
  x[y==1,1:10] <- x[y==1,1:10] + 2
  x[y==2,11:20] <- x[y==2,11:20] - 2
  x[y==3,21:30] <- x[y==3,21:30] - 2.5
  xte[yte==1,1:10] <- xte[yte==1,1:10] + 2
  xte[yte==2,11:20] <- xte[yte==2,11:20] - 2
  xte[yte==3,21:30] <- xte[yte==3,21:30] - 2.5

  # Perform cross-validation.
  # Use type="ordered" -- that is, we are assuming that the features have
  # some sort of spatial structure
  cv.out <- PenalizedLDA.cv(x, y, type="ordered",
                            lambdas=c(1e-4,1e-3,1e-2,.1,1,10), lambda2=.3)
  print(cv.out)
  plot(cv.out)

  # Perform penalized LDA.
  out <- PenalizedLDA(x, y, xte=xte, type="ordered",
                      lambda=cv.out$bestlambda, K=cv.out$bestK, lambda2=.3)

  print(table(out$ypred[,out$K], yte))

  # Now repeat the penalized LDA computations, but this time use
  # type="standard" -- i.e. don't exploit spatial structure.
  # Perform cross-validation.
  cv.out <- PenalizedLDA.cv(x, y, lambdas=c(1e-4,1e-3,1e-2,.1,1,10))
  print(cv.out)
  plot(cv.out)
  # Perform penalized LDA.
  out <- PenalizedLDA(x, y, xte=xte, lambda=cv.out$bestlambda, K=cv.out$bestK)
  print(table(out$ypred[,out$K], yte))

plot.penlda    Plot PenalizedLDA output

Description

  Make some plots!

Usage

  ## S3 method for class 'penlda'
  plot(x, ...)

Arguments

  x: A "penlda" object; this is the output of the PenalizedLDA function.

  ...: Additional arguments; not used.

References

  D. Witten and R. Tibshirani (2011) Penalized classification using Fisher's linear discriminant. To appear in JRSSB.

Examples

  # (The setup lines defining n, p, x, and y were lost in transcription.)
  xte <- matrix(rnorm(n*p), ncol=p)
  xte[y==1,1:10] <- xte[y==1,1:10] + 2
  xte[y==2,11:20] <- xte[y==2,11:20] - 2
  out <- PenalizedLDA(x, y, xte, lambda=.14, K=2)
  pred.out <- predict(out, xte=xte)
  cat("Predictions obtained using the PenalizedLDA function and the predict.penlda function are the same.")
  print(cor(pred.out$ypred, out$ypred))

plot.penldacv    Plot penalized LDA CV results

Description

  Make a plot of the CV output.

Usage

  ## S3 method for class 'penldacv'
  plot(x, ...)

Arguments

  x: A "penldacv" object.

  ...: Additional arguments; not used.

References

  D. Witten and R. Tibshirani (2011) Penalized classification using Fisher's linear discriminant. To appear in JRSSB.

Examples

  # (The setup lines defining n, p, x, and y were lost in transcription.)
  xte <- matrix(rnorm(n*p), ncol=p)
  xte[y==1,1:10] <- xte[y==1,1:10] + 2
  xte[y==2,11:20] <- xte[y==2,11:20] - 2
  out <- PenalizedLDA(x, y, xte, lambda=.14, K=2)
  pred.out <- predict(out, xte=xte)
  cat("Predictions obtained using the PenalizedLDA function and the predict.penlda function are the same.")
  print(cor(pred.out$ypred, out$ypred))

predict.penlda    Make PenalizedLDA predictions on a new data set

Description

  Given output from the PenalizedLDA function, make predictions on a new data set.

Usage

  ## S3 method for class 'penlda'
  predict(object, xte, ...)

Arguments

  object: A "penlda" object; this is the output of the PenalizedLDA function.

  xte: A data set on which predictions should be made.

  ...: Additional arguments; not used.

Value

  ypred: A matrix with nrow(xte) rows and K columns, where K is the number of discriminant vectors in the "penlda" object passed in. The first column contains predictions obtained if only the 1st discriminant vector is used, the 2nd column contains predictions obtained if the first 2 discriminant vectors are used, and so on. If there is only 1 discriminant vector in the "penlda" object passed in, then just a vector is output.
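Conceptually, prediction of this kind amounts to projecting the test data onto the discriminant vectors and assigning each test point to the class with the nearest centroid in that projected space, as is standard for LDA. The base-R sketch below illustrates that idea under those assumptions; the function `nearest_centroid_predict` is illustrative and is not the package's internal code.

```r
# Nearest-centroid classification in the projected (discriminant) space.
# discrim: p x K matrix of discriminant vectors; x, y: training data/labels.
nearest_centroid_predict <- function(x, y, xte, discrim) {
  xproj   <- x %*% discrim     # n x K training projections
  xteproj <- xte %*% discrim   # m x K test projections
  # K-dimensional centroid of each class in the projected space
  centroids <- t(sapply(sort(unique(y)),
                        function(k) colMeans(xproj[y == k, , drop = FALSE])))
  # Assign each test point to the class whose centroid is closest
  apply(xteproj, 1, function(z) which.min(colSums((t(centroids) - z)^2)))
}

set.seed(1)
x <- matrix(rnorm(30 * 2), ncol = 2)
y <- rep(1:3, each = 10)
x[y == 1, 1] <- x[y == 1, 1] + 3
x[y == 2, 2] <- x[y == 2, 2] + 3
discrim <- diag(2)  # stand-in "discriminant vectors" for illustration
nearest_centroid_predict(x, y, x, discrim)
```

With K discriminant vectors, repeating this using only the first k columns of discrim for k = 1, ..., K yields the K columns of ypred described above.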

References

  D. Witten and R. Tibshirani (2011) Penalized classification using Fisher's linear discriminant. To appear in JRSSB.

Examples

  # (The setup lines defining n, p, x, and y were lost in transcription.)
  xte <- matrix(rnorm(n*p), ncol=p)
  xte[y==1,1:10] <- xte[y==1,1:10] + 2
  xte[y==2,11:20] <- xte[y==2,11:20] - 2
  out <- PenalizedLDA(x, y, xte, lambda=.14, K=2)
  pred.out <- predict(out, xte=xte)
  cat("Predictions obtained using the PenalizedLDA function and the predict.penlda function are the same.")
  print(cor(pred.out$ypred, out$ypred))

print.penlda    Print PenalizedLDA output

Description

  A nice way to display the results of PenalizedLDA.

Usage

  ## S3 method for class 'penlda'
  print(x, ...)

Arguments

  x: A "penlda" object; this is the output of the PenalizedLDA function.

  ...: Additional arguments; not used.

References

  D. Witten and R. Tibshirani (2011) Penalized classification using Fisher's linear discriminant. To appear in JRSSB.

Examples

  # (The setup lines defining n, p, x, and y were lost in transcription.)
  xte <- matrix(rnorm(n*p), ncol=p)
  xte[y==1,1:10] <- xte[y==1,1:10] + 2
  xte[y==2,11:20] <- xte[y==2,11:20] - 2
  out <- PenalizedLDA(x, y, xte, lambda=.14, K=2)
  pred.out <- predict(out, xte=xte)
  cat("Predictions obtained using the PenalizedLDA function and the predict.penlda function are the same.")
  print(cor(pred.out$ypred, out$ypred))

print.penldacv    Print PenalizedLDA CV results

Description

  A nice print-out of the results obtained from running the CV function for PenalizedLDA.

Usage

  ## S3 method for class 'penldacv'
  print(x, ...)

Arguments

  x: A "penldacv" object; this is the output of the PenalizedLDA.cv function.

  ...: Additional arguments; not used.

References

  D. Witten and R. Tibshirani (2011) Penalized classification using Fisher's linear discriminant. To appear in JRSSB.

Examples

  # (The setup lines defining n, p, x, and y were lost in transcription.)
  xte <- matrix(rnorm(n*p), ncol=p)
  xte[y==1,1:10] <- xte[y==1,1:10] + 2
  xte[y==2,11:20] <- xte[y==2,11:20] - 2
  out <- PenalizedLDA(x, y, xte, lambda=.14, K=2)
  pred.out <- predict(out, xte=xte)
  cat("Predictions obtained using the PenalizedLDA function and the predict.penlda function are the same.")
  print(cor(pred.out$ypred, out$ypred))

Index

  Topic package:
    penalizedlda-package

  PenalizedLDA
  penalizedlda (penalizedlda-package)
  penalizedlda-package
  PenalizedLDA.cv
  plot.penlda
  plot.penldacv
  predict.penlda
  print.penlda
  print.penldacv