The fastclime Package for Linear Programming and Large-Scale Precision Matrix Estimation in R

Journal of Machine Learning Research (2013)    Submitted ; Published

Haotian Pang (hpang@princeton.edu)
Han Liu (hanliu@princeton.edu)
Robert Vanderbei (rvdb@princeton.edu)
Princeton University, Princeton, NJ 08540, USA

Editor:

Abstract

We develop an R package fastclime for solving a family of regularized linear programming (LP) problems. Our package efficiently implements the parametric simplex algorithm, which provides a scalable and sophisticated tool for solving large-scale linear programs. As an illustrative example, one use of our LP solver is to implement an important sparse precision matrix estimation method called CLIME (Constrained L1 Minimization Estimator). Compared with existing packages for this problem such as clime and flare, our package has three advantages: (1) it efficiently calculates the full piecewise-linear regularization path; (2) it provides an accurate dual certificate as a stopping criterion; (3) it is completely coded in C and is highly portable. This package is designed to be useful to statisticians and machine learning researchers for solving a wide range of problems.

Keywords: High dimensional data, sparse precision matrix, linear programming, parametric simplex method, undirected graphical model

1. Introduction and Parametric Simplex Method

We introduce an R package, fastclime, that efficiently solves a family of regularized LP problems. Our package has two major components. First, we provide an interface function that implements the parametric simplex method (PSM). This algorithm can efficiently solve large-scale LPs. Second, we apply the PSM to implement an important sparse precision matrix estimation method called CLIME (Cai et al., 2011), which is useful in high-dimensional graphical models. In the rest of this section, we briefly describe the main idea of the PSM. We refer readers who are unfamiliar with simplex methods in general to Vanderbei (2008). Consider the LP problem

    max  c^T β    subject to:   Aβ ≤ b,   β ≥ 0,    (1)

where A ∈ R^{n×d}, c ∈ R^d and b ∈ R^n are given, and ≤ and ≥ are defined componentwise. Simplex methods expect problems in equality form. Therefore, the first task is to introduce new variables, which we tack onto the end of β to make a longer vector β̂. We rewrite the constraints with the new variables on the left and the old ones on the right:

    max  c^T β_N    subject to:   β_B = b − Aβ_N,   β_B ≥ 0,   β_N ≥ 0.    (2)

Here, N = {1, 2, ..., d}, B = {d+1, d+2, ..., d+n}, and β_B and β_N denote the subvectors of β̂ associated with the indices in each set. In each iteration, one variable on the left is swapped with one on the right. In general, the variables on the left are called basic variables and the variables on the right are called nonbasic variables. As the algorithm progresses, the set of nonbasic variables changes and the objective function is re-expressed purely in terms of the current nonbasic variables, so the coefficients of the objective function change. In a similar manner, the coefficients in the linear equality constraints also change. We denote these changed quantities by Â, b̂, and ĉ. Associated with each of these updated representations of the problem is a particular candidate solution obtained by setting β_N = 0 and reading off the corresponding values of the basic variables, β_B = b̂. If b̂ ≥ 0, then the candidate solution is feasible (i.e., it satisfies all constraints). If in addition ĉ ≤ 0, then the solution is optimal.
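
To make the data layout of problem (1) concrete, the following sketch solves a tiny instance with the generic lpSolve package. This is purely illustrative: lpSolve is not part of fastclime, the problem data A, b, c below are arbitrary, and fastclime's own LP interface is documented in the package vignette.

> library(lpSolve)                            # generic LP solver, used here only for illustration
> A = rbind(c(1, 2), c(3, 1))                 # constraint matrix A
> b = c(4, 6)                                 # right-hand side b
> obj = c(3, 2)                               # objective vector c
> sol = lp("max", obj, A, rep("<=", 2), b)    # max c'beta s.t. A beta <= b, beta >= 0
> sol$solution                                # optimal beta
> sol$objval                                  # optimal value of c'beta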

Each variant of the simplex method is defined by the rule for choosing the pair of variables to swap at each iteration. The PSM's rule is described as follows (Vanderbei, 2008). Before the algorithm starts, it parametrically perturbs b and c:

    max  (c + λc*)^T β    subject to:   Aβ ≤ b + λb*,   β ≥ 0.    (3)

Here b* ≥ 0 and c* ≤ 0; they are called perturbation vectors. With this choice, the perturbed problem is optimal for large λ. The method then uses pivots to systematically reduce λ to smaller values while maintaining optimality as it goes. Once the interval of optimal λ values covers zero, we simply set λ = 0 and read off the solution to the original problem. Sometimes there is a natural choice of the perturbation vectors b* and c* suggested by the underlying problem, for which it is known that the initial solution to the perturbed problem is optimal for some value of λ. Otherwise, the solver generates perturbations on its own.

If we are only interested in solving generic LPs, the PSM is comparable to any other variant of the simplex method. However, as we will see in the next section, the parametric simplex method is particularly well suited to machine learning problems, since the relaxation parameter λ in (3) is naturally related to the regularization parameter in sparse learning problems. This connection allows us to solve a full range of learning problems corresponding to all values of the regularization parameter. If a regularized learning problem can be formulated as (3), then the entire solution path can be obtained by solving one LP with the PSM. More precisely, at each iteration of the PSM, the current solution is the optimal solution for some interval of λ values. If these solutions are stored, then when λ reaches 0, we have the optimal solution to every λ-perturbed problem for all λ between 0 and the starting value.
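
The brute-force sketch below illustrates what a solution path over λ means: it simply re-solves the perturbed problem (3) on a small grid of λ values with the generic lpSolve package and prints how the solution changes. This is not how the PSM works (the PSM obtains the entire path from a single run without re-solving), and it is not how fastclime operates internally; the perturbation vectors and λ grid are illustrative choices only.

> library(lpSolve)
> A = rbind(c(1, 2), c(3, 1)); b = c(4, 6); obj = c(3, 2)   # same toy data as above
> b_star = c(1, 1); c_star = c(-1, -1)                      # perturbation vectors: b* >= 0, c* <= 0
> for (lambda in c(4, 2, 1, 0)) {                           # decrease lambda down to 0 (original problem)
+   sol = lp("max", obj + lambda * c_star, A, rep("<=", 2), b + lambda * b_star)
+   cat("lambda =", lambda, " beta =", round(sol$solution, 3), "\n")
+ }
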
We describe the application of the PSM to sparse precision matrix estimation in Section 2. In Section 3, we show the user interface and give an example. Numerical benchmarks and comparisons with other implementations of precision matrix estimation are provided in Section 4. For details of the examples and how to use the package, we refer the user to the companion vignette and package references.

2. Application to Sparse Precision Matrix Estimation

Estimating large covariance and precision matrices is a fundamental problem with many applications in modern statistics and machine learning. We denote Ω = Σ^{-1}, where Σ is the population covariance matrix. Under the Gaussian model, the sparse precision matrix Ω encodes the conditional independence relationships among the variables, which is why sparse precision matrices are closely related to undirected graphs. Recently, several sparse precision matrix estimation methods have been proposed, including penalized maximum-likelihood estimation (MLE) (Banerjee et al., 2008; Friedman et al., 2007a,b, 2010), the neighborhood selection method (Meinshausen and Bühlmann, 2006) and LP-based methods (Cai et al., 2011; Yuan, 2011).

In general, solvers based on penalized MLE methods, such as QUIC (Hsieh et al., 2011) and HUGE (Zhao and Liu, 2012), are faster than the others. However, these MLE methods aim to find an approximate solution quickly, whereas the linear programming methods are designed to find solutions that are correct essentially to machine precision. A comparison of classification performance can be found in Cai et al. (2011), where it is shown that CLIME uniformly outperforms the MLE methods. Because of the good theoretical properties of CLIME, we would like to develop a fast algorithm for implementing this method, which can serve as an important building block for more sophisticated learning algorithms.

CLIME solves the following optimization problem:

    min  ‖Ω‖_1    subject to:   ‖Σ_n Ω − I_d‖_max ≤ λ,   Ω ∈ R^{d×d},    (4)

where I_d is the d-dimensional identity matrix, Σ_n is the sample covariance matrix, and λ > 0 is a tuning parameter. Here ‖Ω‖_1 = Σ_{j,k} |Ω_{j,k}| and ‖·‖_max is the elementwise sup-norm. This minimization problem can be further decomposed into d smaller problems, which allows us to recover the precision matrix in a column-by-column fashion. For the i-th subproblem, we obtain the i-th column of Ω, denoted β̂, by solving

    min  ‖β‖_1    subject to:   ‖Σ_n β − e_i‖_∞ ≤ λ,   β ∈ R^d,    (5)

where ‖β‖_1 = Σ_{j=1}^d |β_j| and e_i ∈ R^d is the i-th basis vector. The original clime package manually sets a default path for λ and solves a separate LP problem for each value of λ. In this paper, we propose to use the PSM to solve this problem more efficiently.

CLIME can easily be put in parametric simplex LP form. Let β+ and β− be the positive and negative parts of β. Since β = β+ − β− and ‖β‖_1 = 1^T(β+ + β−), problem (5) becomes

    min  1^T(β+ + β−)    subject to:   [Σ_n, −Σ_n; −Σ_n, Σ_n] [β+; β−] ≤ [λ1 + e_i; λ1 − e_i],   β+ ≥ 0,   β− ≥ 0,    (6)

where [x; y] denotes vertical stacking of vectors or matrix blocks. Comparing (3) and (6), we can make the following identification:

    A = [Σ_n, −Σ_n; −Σ_n, Σ_n],   b = [e_i; −e_i],   b* = [1; 1],   c = [−1; −1],   c* = [0; 0],

where 1, −1, and 0 denote vectors of all ones, all minus ones, and all zeros, respectively. The path of λ defined by the PSM corresponds to the path of λ as described in CLIME. Therefore, CLIME can be solved efficiently by the PSM; furthermore, when the optimal solution is sparse, the parametric simplex method is able to find the optimal solution after very few iterations.
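
As a concrete illustration of the reformulation (6), the short sketch below builds the LP data A, b, b*, c, c* for the i-th column subproblem from a sample covariance matrix. It only constructs the data; fastclime performs this reformulation and the PSM pivots internally in C, so none of the intermediate objects below are exposed by the package.

> # Build the LP data in (6) for the i-th CLIME column subproblem (illustration only)
> set.seed(1)
> X = matrix(rnorm(200 * 10), 200, 10)       # n = 200 observations, d = 10 variables
> Sigma_n = cov(X)                           # sample covariance matrix
> d = ncol(Sigma_n); i = 1
> e_i = as.numeric(seq_len(d) == i)          # i-th basis vector
> A = rbind(cbind(Sigma_n, -Sigma_n), cbind(-Sigma_n, Sigma_n))
> b = c(e_i, -e_i)                           # unperturbed right-hand side
> b_star = rep(1, 2 * d)                     # perturbation of b (multiplied by lambda)
> obj = rep(-1, 2 * d)                       # c: maximizing -sum(beta+ + beta-) minimizes the l1 norm
> c_star = rep(0, 2 * d)                     # no perturbation of the objective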

3. User Interface: An Example

In this section, we present a short example of using fastclime to estimate the precision matrix. See the package vignette (Pang et al., 2013) for the full documentation and for examples of how to use the PSM LP solvers. Our package fastclime consists of two major parts: a pair of LP solvers, and a series of functions used to estimate the precision matrix: a data generator, a graph estimator, and a data visualizer. The data generator generates multivariate Gaussian data with different graph structures. The graph estimator is the main function used to estimate the graph. The data visualizer provides a method to plot the undirected graph at each point along the solution path. The user interface for estimating the precision matrix is illustrated by the following example. We first generate 200 samples from a 50-dimensional Gaussian distribution with hub structure. The program automatically sets up a sequence of regularization parameters λ and estimates the graph path.

> library(fastclime)                                    # Load the package
> L = fastclime.generator(n = 200, d = 50, graph = "hub")  # Generate data
> out = fastclime(L$data)                               # Estimate solution path
> fastclime.roc(out$path, L$theta)                      # Plot ROC curve
> fastclime.plot(out$path[[3]])                         # Plot the third graph on the solution path

4. Performance Benchmark

For our experiments we focus solely on CLIME. We compare the timing performance of our package with the packages flare and clime. Flare uses the Alternating Direction Method of Multipliers (ADMM) algorithm to evaluate CLIME (Li et al., 2012), whereas clime solves a sequence of LP problems for a specified set of λ values. As explained in Section 1, our method calculates the solution for all λ, while flare and clime use a discrete set of λ values as specified in the function call. We fix the sample size n to be 200 and vary the data dimension d from 50 to 800. We generate our data using fastclime.generator, without imposing any particular graph structure. Clime and fastclime are based on algorithms that solve problems essentially to machine precision (10^-5). Flare, on the other hand, is an ADMM-based algorithm that stops when the change from one iteration to the next drops below the same threshold.

As shown in Table 1, fastclime performs significantly faster than clime when d equals 50 and 100. When d becomes large, we are not able to obtain results from clime within one hour. We also notice that, in most cases, fastclime performs consistently better than flare, and it has a smaller deviation across runs than flare. The reason fastclime outperforms the other methods is primarily that the PSM solves a single LP problem to obtain the entire solution path over all λ, quickly and without using much memory. The experiments were run on an i5-3320 2.6GHz computer with 8GB of RAM, using R version 2.15.0.

Table 1: Average Timing Performance of Three Solvers in Seconds (deviations in parentheses)

Method      d=50            d=100           d=200           d=400            d=800
clime       103.52 (9.11)   937.37 (6.77)   N/A             N/A              N/A
flare       0.632 (0.335)   1.886 (0.755)   10.770 (0.184)  74.106 (33.940)  763.632 (135.724)
fastclime   0.248 (0.0148)  0.928 (0.0268)  9.928 (3.702)   53.038 (1.488)   386.880 (58.210)
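
For readers who want to run a similar (much smaller) timing experiment, a minimal sketch is shown below. It uses the functions already introduced in Section 3 together with base R's system.time; it assumes the generator's default settings and makes no attempt to reproduce the exact configuration behind Table 1.

> library(fastclime)
> L = fastclime.generator(n = 200, d = 100)   # synthetic data with the generator's default structure
> system.time(out <- fastclime(L$data))       # wall-clock time for the full solution path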

5. Summary and Acknowledgement

We developed a new package, fastclime, for solving linear programming problems with a relaxation parameter and for high-dimensional sparse precision matrix estimation. We plan to maintain and support this package in the future. Han Liu is supported by NSF grants IIS-1116730 and IIS-1332109. Haotian Pang and Robert Vanderbei are supported by a grant from ONR.

References

O. Banerjee, L. E. Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation. Journal of Machine Learning Research, 9:485-516, 2008.

T. Cai, W. Liu, and X. Luo. A constrained l1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106:594-607, 2011.

J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. Annals of Applied Statistics, 1(2):302-332, 2007a.

J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2007b.

J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 2010.

C.-J. Hsieh, M. A. Sustik, I. S. Dhillon, and P. Ravikumar. Sparse inverse covariance matrix estimation using quadratic approximation. Advances in Neural Information Processing Systems, 24, 2011.

X. Li, T. Zhao, X. Yuan, and H. Liu. An R package flare for high dimensional linear regression and precision matrix estimation. R package vignette, 2012.

N. Meinshausen and P. Bühlmann. High dimensional graphs and variable selection with the lasso. Annals of Statistics, 34(3):1436-1462, 2006.

H. Pang, H. Liu, and R. Vanderbei. The fastclime package for linear programming and constrained l1-minimization approach to sparse precision matrix estimation in R. R package vignette, 2013.

R. Vanderbei. Linear Programming: Foundations and Extensions. Springer, 2008.

M. Yuan. High dimensional inverse covariance matrix estimation via linear programming. Journal of Machine Learning Research, 11:2261-2286, 2011.

T. Zhao and H. Liu. The huge package for high-dimensional undirected graph estimation in R. Journal of Machine Learning Research, 13:1059-1062, 2012.