Package esabcv. May 29, 2015

Title: Estimate Number of Latent Factors and Factor Matrix for Factor Analysis
Version: 1.2.1
Description: These functions estimate the latent factors of a given matrix, whether or not it is high-dimensional. The package first estimates the number of factors using bi-cross-validation and then estimates the latent factor matrix and the noise variances. For more information about the method, see the 2015 arXiv article on factor models by Art B. Owen and Jingshu Wang (http://arxiv.org/abs/1503.03515).
Depends: R (>= 3.0.2)
License: GPL (>= 2)
LazyData: true
Imports: corpcor, svd
Suggests: MASS
Author: Art B. Owen [aut], Jingshu Wang [aut, cre]
Maintainer: Jingshu Wang <wangjingshususan@gmail.com>
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-05-29 08:45:53

R topics documented:

    ESA
    EsaBcv
    esabcv_package
    plot.esabcv
    simdat

ESA    Estimate Latent Factor Matrix With Known Number of Factors

Description

Estimate the latent factor matrix and noise variance using early stopping alternation (ESA) given the number of factors.

Usage

    ESA(Y, r, X = NULL, center = F, niter = 3, svd.method = "fast")

Arguments

    Y           observed data matrix of dimension c(n, p), where n is the sample size and p is the number of variables.
    r           the number of factors to use.
    X           the known predictors of size c(n, k), if any, where k is the number of known covariates. Default is NULL (no known predictors).
    center      logical, whether to add an intercept term to the model. Default is FALSE.
    niter       the number of iterations for ESA. Default is 3.
    svd.method  either "fast", "propack" or "standard". "fast" uses the fast.svd function in package corpcor to compute the SVD, "propack" uses propack.svd, and "standard" uses the svd function in the base package. Because of PROPACK issues, "propack" fails for some matrices; when that happens, the function falls back to "fast" for that matrix. Default is "fast".

Details

The model used is

    $$Y = 1\mu^{T} + X\beta + n^{1/2} U D V^{T} + E\Sigma^{1/2}$$

where D and Σ are diagonal matrices, U and V are orthogonal, and $\mu^{T}$ and $V^{T}$ denote the transposes of µ and V. The entries of E are assumed to be i.i.d. standard Gaussian. The model allows heteroscedastic noise and works especially well for high-dimensional data. The method is based on Owen and Wang (2015).

Note that when a non-NULL X is given, or when centering is requested (which is essentially adding a known covariate of all ones), identifiability requires $\langle X, U \rangle = 0$ or $\langle 1, U \rangle = 0$, respectively. The method then first rotates the data matrix to remove the known predictors or centers, and uses the last n - k (or n - k - 1 if centering is requested) samples of the rotated data to estimate the latent factors.
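The rotation step described in the Details can be illustrated with a short base R sketch. This is not the package's internal code: it assumes the rotation is the full orthogonal factor from a QR decomposition of X, and the names `Y_rot`, `Y_free` and `max_leak` are made up for the illustration. Latent factors are left out of the toy data to keep the example small.

```r
set.seed(1)
n <- 20; p <- 6; k <- 2

X    <- matrix(rnorm(n * k), n, k)      # known predictors, c(n, k)
beta <- matrix(rnorm(k * p), k, p)      # coefficients, c(k, p)
E    <- matrix(rnorm(n * p), n, p)      # i.i.d. standard Gaussian noise
Y    <- X %*% beta + E                  # toy data without latent factors

# Rotate with the full n x n orthogonal Q from the QR decomposition of X.
# t(Q) %*% X is zero below its first k rows, so the predictor part of the
# rotated data lives entirely in the first k rows.
qrX        <- qr(X)
Y_rot      <- qr.qty(qrX, Y)            # t(Q) %*% Y, still c(n, p)
signal_rot <- qr.qty(qrX, X %*% beta)   # rotated predictor contribution

# The last n - k rows are free of the known predictors.
Y_free   <- Y_rot[(k + 1):n, , drop = FALSE]
max_leak <- max(abs(signal_rot[(k + 1):n, ]))
```

Here `max_leak` is zero up to floating-point error, which is why the remaining n - k rows can be used to estimate the latent factors without interference from X.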

Value

The returned value is a list with components

    estsigma    the diagonal entries of the estimated Σ, a vector of length p.
    estu        the estimated U. Dimension is c(n, r).
    estd        the estimated diagonal entries of D, a vector of length r.
    estv        the estimated V. Dimension is c(p, r).
    beta        the estimated β, a matrix of size c(k, p). NULL if the argument X is NULL.
    ests        the estimated signal (factor) matrix S, where $S = 1\mu^{T} + X\beta + n^{1/2} U D V^{T}$.
    mu          the sample centers of each variable, a vector of length p. It is an estimate of µ. NULL if the argument center is FALSE.

References

Art B. Owen and Jingshu Wang (2015), Bi-cross-validation for factor analysis, http://arxiv.org/abs/1503.03515

Examples

    Y <- matrix(rnorm(100), nrow = 10) + 3 * rnorm(10) %*% t(rep(1, 10))
    ESA(Y, 1)

EsaBcv    Estimate Latent Factor Matrix

Description

Find the best number of factors using Bi-Cross-Validation (BCV) with Early-Stopping-Alternation (ESA), and then estimate the factor matrix.

Usage

    EsaBcv(Y, X = NULL, r.limit = 20, niter = 3, nrepeat = 12, only.r = F,
           svd.method = "fast", center = F)

Arguments

    Y           observed data matrix of dimension c(n, p), where n is the sample size and p is the number of variables.
    X           the known predictors of size c(n, k), if any, where k is the number of known covariates. Default is NULL (no known predictors).
    r.limit     the maximum number of factors to try. Default is 20. Can be set to Inf.

    niter       the number of iterations for ESA. Default is 3.
    nrepeat     number of repeats of BCV. In other words, the random partition of Y is repeated nrepeat times. Default is 12.
    only.r      whether only to estimate and return the number of factors.
    svd.method  either "fast", "propack" or "standard". "fast" uses the fast.svd function in package corpcor to compute the SVD, "propack" uses propack.svd, and "standard" uses the svd function in the base package. Because of PROPACK issues, "propack" fails for some matrices; when that happens, the function falls back to "fast" for that matrix. Default is "fast".
    center      logical, whether to add an intercept term to the model. Default is FALSE.

Details

The model is

    $$Y = 1\mu^{T} + X\beta + n^{1/2} U D V^{T} + E\Sigma^{1/2}$$

where D and Σ are diagonal matrices, U and V are orthogonal, and $\mu^{T}$ and $V^{T}$ denote the transposes of µ and V. The entries of E are assumed to be i.i.d. standard Gaussian. The model allows heteroscedastic noise and works especially well for high-dimensional data. The method is based on Owen and Wang (2015).

Note that when a non-NULL X is given, or when centering is requested (which is essentially adding a known covariate of all ones), identifiability requires $\langle X, U \rangle = 0$ or $\langle 1, U \rangle = 0$, respectively. The method then first rotates the data matrix to remove the known predictors or centers, and uses the last n - k (or n - k - 1 if centering is requested) samples of the rotated data to estimate the latent factors. The rotation idea first appears in Sun et al. (2012).

Value

EsaBcv returns an object of class "esabcv". The function plot plots the cross-validation results and marks the estimated number of factors.

An object of class "esabcv" is a list containing the following components:

    best.r      the best number of factors estimated.
    estsigma    the diagonal entries of the estimated Σ, a vector of length p.
    estu        the estimated U. Dimension is c(n, r).
    estd        the estimated diagonal entries of D, a vector of length r.
    estv        the estimated V. Dimension is c(p, r).
    beta        the estimated β, a matrix of size c(k, p). NULL if the argument X is NULL.
    ests        the estimated signal (factor) matrix S, where $S = 1\mu^{T} + X\beta + n^{1/2} U D V^{T}$.
    mu          the sample centers of each variable, a vector of length p. It is an estimate of µ. NULL if the argument center is FALSE.

    max.r       the actual maximum number of factors used. For details of how this is decided, please refer to Owen and Wang (2015).
    result.list a matrix with dimension c(nrepeat, (max.r + 1)) storing the detailed BCV entrywise MSE of each repeat for r from 0 to max.r.

References

Art B. Owen and Jingshu Wang (2015), Bi-cross-validation for factor analysis, http://arxiv.org/abs/1503.03515

Yunting Sun, Nancy R. Zhang and Art B. Owen (2012), Multiple hypothesis testing adjusted for latent variables, with an application to the AGEMAP gene expression data. The Annals of Applied Statistics, 6(4): 1664-1688.

See Also

ESA, plot.esabcv

Examples

    Y <- matrix(rnorm(100), nrow = 10)
    EsaBcv(Y)

esabcv_package    esabcv

Description

The esabcv package provides functions to estimate the latent factors of a given matrix, whether or not it is high-dimensional. It first estimates the number of factors using bi-cross-validation and then estimates the latent factor matrix and the noise variances using an early-stopping-alternation method. The method was proposed by Art B. Owen and Jingshu Wang (2015).

Author(s)

Maintainer: Jingshu Wang <wangjingshususan@gmail.com>

See Also

Owen and Wang (2015), Bi-cross-validation for factor analysis, http://arxiv.org/abs/1503.03515

Examples

    ## Not run:
    data(simdat)
    result <- EsaBcv(simdat$Y)
    plot(result)
    ## End(Not run)

plot.esabcv    Plot Bi-cross-validation (BCV) Errors

Description

Plot the average BCV entrywise MSE against the number of factors tried, with error bars and the best number of factors picked.

Usage

    ## S3 method for class 'esabcv'
    plot(x, start.r = 0, end.r = NA, xlab = "Number of Factors", ylab = "BCV MSE",
         main = "Bi-cross-validation Error", col.line = "BLUE", ...)

Arguments

    x           esabcv object, typically the result of EsaBcv.
    start.r     the starting number of factors to display in the plot.
    end.r       the largest number of factors allowed to display in the plot. Default is NA, which sets end.r to max.r.
    xlab        title for the x axis.
    ylab        title for the y axis.
    main        title for the plot.
    col.line    the line color.
    ...         other parameters to be passed through to plotting functions.

Details

The esabcv object contains the raw BCV result result.list, which is a matrix with dimension c(nrepeat, (max.r + 1)), where nrepeat is the number of BCV repeats and max.r is the maximum number of factors tried. If either tail of the error curve dominates, the user has the option to change the start and end rank for plotting.
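As a rough illustration of how a matrix shaped like result.list can be summarized, the base R sketch below builds a made-up stand-in (`bcv_mse`, with an invented error curve) and computes the per-rank average and spread. It assumes the picked rank is simply the minimizer of the average error and that the spread is one standard deviation across repeats; it does not call the package and is not its plotting code.

```r
set.seed(2)
nrepeat <- 12; max.r <- 5

# Hypothetical stand-in for result.list: rows are BCV repeats,
# columns correspond to r = 0, 1, ..., max.r.
true_curve <- c(2.0, 1.4, 1.0, 1.1, 1.3, 1.6)
bcv_mse <- matrix(rep(true_curve, each = nrepeat), nrepeat, max.r + 1) +
  matrix(rnorm(nrepeat * (max.r + 1), sd = 0.01), nrepeat, max.r + 1)

ranks    <- 0:max.r
mean_mse <- colMeans(bcv_mse)          # average BCV MSE per rank
sd_mse   <- apply(bcv_mse, 2, sd)      # spread across repeats per rank
picked_r <- ranks[which.min(mean_mse)] # assumed selection rule: smallest mean

# Restrict the displayed range, in the spirit of start.r and end.r,
# for example to hide a dominating error at r = 0.
start.r <- 1; end.r <- max.r
show       <- ranks >= start.r & ranks <= end.r
shown_rank <- ranks[show]
shown_mse  <- mean_mse[show]
```

The vectors `shown_rank`, `shown_mse` and `sd_mse[show]` could then be passed to base graphics calls such as `plot()` and `arrows()` to draw a curve with error bars.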

Value

A plot of the average BCV entrywise MSE against the number of factors tried (start.r to max.r + 1), with error bars (one standard deviation) in grey and the selected number of factors marked by a red cross.

Examples

    ## Not run:
    data(simdat)
    result <- EsaBcv(simdat$Y)
    plot(result)
    plot(result, start.r = 1)
    ## End(Not run)

simdat    Example Dataset

Details

The data is a simulated data set where the data matrix is generated from the latent factor model

    $$Y = n^{1/2} U D V^{T} + E\Sigma^{1/2}$$

where D and Σ are diagonal matrices, U and V are orthogonal, and $V^{T}$ denotes the transpose of V. For the factors, we include one giant factor, five useful factors, one harmful factor and one undetectable factor. For more details of the simulation method used, please refer to Appendix A.1 of Owen and Wang (2015), Bi-cross-validation for factor analysis, http://arxiv.org/abs/1503.03515.

The dataset is a list of components:

    Y           a data matrix of 200 by 1000, where each row is a sample and each column is a variable.
    U           the orthogonal factor matrix U of size 200 by 8.
    V           the orthogonal factor matrix V of size 1000 by 8.
    D           the vector of diagonal entries of D.
    Sigma       the vector of diagonal entries of Σ.
    oracle.r    the oracle rank (the optimal number of factors that should be kept) of the factor matrix.
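To make the generating model concrete, the following base R sketch draws a much smaller data set of the same form. The sizes, the values of `D` and `Sigma`, and the list name `toy` are all made up for illustration; this is not the simulation recipe of Appendix A.1 in Owen and Wang (2015), and it omits `oracle.r`.

```r
set.seed(3)
n <- 50; p <- 80; r <- 3

# Matrices with orthonormal columns, obtained from QR decompositions.
U <- qr.Q(qr(matrix(rnorm(n * r), n, r)))   # c(n, r)
V <- qr.Q(qr(matrix(rnorm(p * r), p, r)))   # c(p, r)

D     <- c(3, 2, 1)                         # diagonal entries of D (made up)
Sigma <- runif(p, 0.5, 1.5)                 # diagonal entries of Sigma (made up)
E     <- matrix(rnorm(n * p), n, p)         # i.i.d. standard Gaussian noise

# Y = n^{1/2} U D V^T + E Sigma^{1/2}
signal <- sqrt(n) * U %*% diag(D) %*% t(V)
noise  <- sweep(E, 2, sqrt(Sigma), `*`)     # scales column j by sqrt(Sigma[j])
Y      <- signal + noise

toy <- list(Y = Y, U = U, V = V, D = D, Sigma = Sigma)
```

With the package installed, `EsaBcv(toy$Y)` would be the analogue of the documented call `EsaBcv(simdat$Y)`.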
