Sparse Screening for Exact Data Reduction

1 Sparse Screening for Exact Data Reduction. Jieping Ye, Arizona State University. Joint work with Jie Wang and Jun Liu.

2 [Figure: wide data versus tall data]

3 How to do exact data reduction? The model learnt from the reduced data is identical to the model learnt from the full data:
- Lasso for wide data (feature reduction)
- SVM for tall data (sample reduction)

4 Center for Evolutionary Medicine and Informatics

5 Lasso/Basis Pursuit (Tibshirani, 1996; Chen, Donoho, and Saunders, 1999): $y = Ax + z$, with $y \in \mathbb{R}^{n \times 1}$, $A \in \mathbb{R}^{n \times p}$, $x \in \mathbb{R}^{p \times 1}$, $z \in \mathbb{R}^{n \times 1}$. Simultaneous feature selection and regression.

6 Imaging Genetics (Thompson et al., 2013)

7 Sparse Reduced-Rank Regression (Vounou et al., 2010, 2012)

8 Structured Sparse Models: Group Lasso, Fused Lasso, Graph Lasso, Tree Lasso

9 Sparsity has become an important modeling tool in genomics, genetics, signal and audio processing, image processing, neuroscience (theory of sparse coding), machine learning, and statistics.

10 Optimization Algorithms: min loss(x) + λ penalty(x)
- Coordinate descent
- Subgradient descent
- Augmented Lagrangian method
- Gradient descent
- Accelerated gradient descent
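As one concrete instance of the template above, here is a minimal sketch of cyclic coordinate descent for the Lasso, assuming the squared loss 0.5 * ||y - A x||^2 and the penalty ||x||_1, with the notation of slide 5. The function names and the plain-list data layout are illustrative choices, not taken from the slides, and only the standard library is used so the sketch stays self-contained.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the l1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(A, y, lam, max_sweeps=500, tol=1e-10):
    """Minimize 0.5 * ||y - A x||^2 + lam * ||x||_1 by cyclic coordinate descent.

    A is a list of n rows, each a list of p floats; y is a list of n floats.
    """
    n, p = len(A), len(A[0])
    x = [0.0] * p
    residual = [float(v) for v in y]  # residual = y - A x, starting from x = 0
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(max_sweeps):
        largest_update = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # Correlation of column j with the residual, excluding x[j]'s own contribution.
            rho = sum(A[i][j] * residual[i] for i in range(n)) + col_sq[j] * x[j]
            new_xj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * A[i][j]
                x[j] = new_xj
                largest_update = max(largest_update, abs(delta))
        if largest_update < tol:
            break  # no coordinate moved noticeably in this sweep
    return x


# With an identity design matrix the solution is the soft-thresholded response.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
solution = lasso_coordinate_descent(identity, [3.0, -0.5, 1.5], lam=1.0)
```

Each sweep costs on the order of n times p operations, which is why removing features before the solver runs pays off directly.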

11 Lasso, Fused Lasso, Group Lasso, Sparse Group Lasso, Tree Structured Group Lasso, Overlapping Group Lasso, Sparse Inverse Covariance Estimation, Trace Norm Minimization

12 More Efficiency?
- Very high dimensional data
- Non-smooth sparsity-inducing norms
- Multiple runs in model selection
- A large number of runs in permutation tests

13 How to make any existing Lasso solver much more efficient?

14 Data Reduction/Compression: original data (1M) → reduced data (1K)

15 Data Reduction
- Heuristic-based data reduction: sure screening, random projection/selection. The resulting model is an approximation of the true model.
- Proposed data reduction methods: exact data reduction via sparse screening. The model based on reduced data is identical to the one constructed from complete data.

16 Sparse Screening. Without screening: solve on the full data (1M). With screening: reduce 1M to 1K, then solve. Same solution.

17 Large-Scale Sparse Screening

18 Screening Rule: Motivation

19 Large-Scale Sparse Screening (Cont'd)

20 More on the Dual Formulation
- Solving the dual formulation is difficult.
- Providing a good (not exact) estimate of the optimal dual solution is easier.
- A good estimate of the optimal dual solution is sufficient for effective feature screening.
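As a reference point for these claims, the block below writes out the Lasso primal, its dual, and the optimality condition that screening exploits. It assumes the standard objective with the notation of slide 5 ($a_i$ denotes the $i$-th column of $A$), as used in the cited DPP work; the exact form shown on the slides may differ in scaling.

```latex
\begin{align*}
\text{primal:}\quad & \min_{x \in \mathbb{R}^p} \ \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_1, \\
\text{dual:}\quad & \max_{\theta \in \mathbb{R}^n} \ \tfrac{1}{2}\|y\|_2^2
    - \tfrac{\lambda^2}{2}\Big\|\theta - \tfrac{y}{\lambda}\Big\|_2^2
    \quad \text{s.t.}\quad |a_i^\top \theta| \le 1,\ i = 1,\dots,p, \\
\text{link:}\quad & y = A x^*(\lambda) + \lambda\,\theta^*(\lambda), \\
\text{KKT:}\quad & |a_i^\top \theta^*(\lambda)| < 1 \ \Longrightarrow\ x_i^*(\lambda) = 0 .
\end{align*}
```

The dual optimum $\theta^*(\lambda)$ is the projection of $y/\lambda$ onto the feasible polytope. If a region $\Theta$ is known to contain $\theta^*(\lambda)$, then $\sup_{\theta \in \Theta} |a_i^\top \theta| < 1$ already certifies $x_i^*(\lambda) = 0$, which is why an estimate of the dual solution is enough to discard features safely.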

21 Screening Rule

22 Sketch of Sparse Screening

23 How to Estimate the Region Θ? Non-expansiveness: J. Wang et al., NIPS '13; J. Liu et al., ICML '14
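To make the non-expansiveness idea concrete, here is a minimal sketch of the basic DPP test started from λ_max, assuming the Lasso objective 0.5 * ||y - A x||^2 + λ ||x||_1. At λ_max the dual optimum is known in closed form (y / λ_max), and because projection onto a convex set is non-expansive, the dual optimum at λ lies within a ball of radius ||y|| (1/λ - 1/λ_max) around it. The function name and the list-based layout are illustrative; this is not the authors' released code, and the sharper EDPP and SDPP variants are not shown.

```python
import math


def dpp_discard_mask(A, y, lam):
    """Basic DPP test started from lam_max; True means feature j can be dropped.

    A is a list of n rows with p entries each, y is a list of n floats, and the
    Lasso objective is assumed to be 0.5 * ||y - A x||^2 + lam * ||x||_1.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    n, p = len(A), len(A[0])
    corr = [sum(A[i][j] * y[i] for i in range(n)) for j in range(p)]  # a_j^T y
    lam_max = max(abs(c) for c in corr)
    if lam >= lam_max:
        return [True] * p  # the Lasso solution is all zeros in this regime
    col_norm = [math.sqrt(sum(A[i][j] ** 2 for i in range(n))) for j in range(p)]
    y_norm = math.sqrt(sum(v * v for v in y))
    # Radius of the ball around y / lam_max that must contain the dual optimum at lam.
    radius = y_norm * (1.0 / lam - 1.0 / lam_max)
    # Discard j when even the worst case inside the ball keeps |a_j^T theta| below 1.
    return [abs(corr[j]) / lam_max < 1.0 - col_norm[j] * radius for j in range(p)]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
response = [3.0, -0.5, 1.5]
near_max = dpp_discard_mask(identity, response, lam=2.9)  # [False, True, True]
far_from_max = dpp_discard_mask(identity, response, lam=1.0)  # [False, False, False]
```

The second call shows the known weakness of starting from λ_max: far from λ_max the ball is large and nothing is discarded, even though the second coefficient is zero in the exact solution. Sequential variants tighten the ball by starting from the solution at the previous parameter value.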

24 Results on MNIST for a sequence of 100 parameter values along the λ/λ_max scale from 0.05 to 1. The data matrix is of size 784 × 50,000.

25 Evaluation on MNIST. [Table: time (s) for solver, SAFE, DPP, EDPP, SDPP. Figure: speedup for SDPP, EDPP, DPP, SAFE.]

26 Evaluation on ADNI. Problem: GWAS to MRI ROI prediction (ADNI). The size of the data matrix is 747 by
[Table: columns Method, ROI3, ROI8, ROI30, ROI69, ROI76, ROI83; rows Lasso Solver, SR, SR+Lasso, EDPP, EDPP+Lasso.]
Running time (in seconds) of the Lasso solver, strong rule (Tibshirani et al., 2012), and EDPP. The parameter sequence contains 100 values along the log λ/λ_max scale from 100 log 0.95 to log 0.95.
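The parameter sequence described above can be generated as follows. This is a small sketch under the assumption that the 100 values are equally spaced on the log scale, which gives λ_k / λ_max = 0.95^k for k = 1, ..., 100; the function name and the choice to list the largest value first are illustrative.

```python
import math


def lambda_grid(lam_max, base=0.95, count=100):
    """Return lam_max * base**k for k = 1..count, largest value first."""
    return [lam_max * base ** k for k in range(1, count + 1)]


grid = lambda_grid(lam_max=1.0)
# Consecutive values differ by log(0.95) on the log scale.
log_steps = [math.log(grid[k + 1]) - math.log(grid[k]) for k in range(len(grid) - 1)]
```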

27 Sparse Screening Extensions
- Group Lasso: J Wang, J Liu, J Ye. Efficient Mixed-Norm Regularization: Algorithms and Safe Screening Methods. arXiv preprint arXiv:
- Sparse Logistic Regression: J Wang, J Zhou, P Wonka, J Ye. A Safe Screening Rule for Sparse Logistic Regression. arXiv preprint arXiv:
- Sparse Inverse Covariance Estimation: S Huang, J Li, L Sun, J Liu, T Wu, K Chen, A Fleisher, E Reiman, J Ye. Learning brain connectivity of Alzheimer's disease by exploratory graphical models. NeuroImage 50, Witten, Friedman and Simon (2011), Mazumder and Hastie (2012)
- Multiple Graphical Lasso: S Yang, Z Pan, X Shen, P Wonka, J Ye. Fused Multiple Graphical Lasso. arXiv preprint arXiv:

28 Wide versus Tall Data [Figure: wide data, tall data]

29 Support Vector Machines. SVM is a maximum margin classifier. [Figure: points labeled +1 and -1, with the margin marked]

30 Support Vectors. SVM is determined by the so-called support vectors. [Figure: points labeled +1 and -1] Support vectors are those data points that the margin pushes up against. The non-support vectors are irrelevant to the classifier. Can we make use of this observation?
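A toy illustration of that observation, restricted to one-dimensional, linearly separable data where every +1 point lies to the right of every -1 point. In that special case the hard-margin separator is the midpoint between the two closest opposite-class points, so no solver is needed. The function name is hypothetical and the sketch does not cover the soft-margin SVM treated in the talk.

```python
def max_margin_threshold_1d(points, labels):
    """Hard-margin separator for 1D data with all +1 points right of all -1 points.

    Returns (threshold, support_indices); a point x is classified +1 when x > threshold.
    """
    positives = [i for i, label in enumerate(labels) if label == 1]
    negatives = [i for i, label in enumerate(labels) if label == -1]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    rightmost_negative = max(negatives, key=lambda i: points[i])
    leftmost_positive = min(positives, key=lambda i: points[i])
    if points[rightmost_negative] >= points[leftmost_positive]:
        raise ValueError("classes are not separable in this orientation")
    # The margin is maximized by the midpoint of the two closest opposite-class points.
    threshold = 0.5 * (points[rightmost_negative] + points[leftmost_positive])
    return threshold, sorted([rightmost_negative, leftmost_positive])


points = [-3.0, -2.0, -0.5, 1.0, 2.5, 4.0]
labels = [-1, -1, -1, 1, 1, 1]
full_threshold, support = max_margin_threshold_1d(points, labels)

# Refit using only the two support vectors: the separator does not move.
reduced_threshold, _ = max_margin_threshold_1d(
    [points[i] for i in support], [labels[i] for i in support]
)
```

Sample screening aims to identify the non-support vectors before training, so that the solver only sees the smaller problem.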

31 The Idea of Sample Screening: Original Problem → Screening → Smaller Problem to Solve

32 Guidelines for Sample Screening (J. Wang, P. Wonka, and J. Ye, ICML '14)

33 Relaxed Guidelines

34 Sketch of SVM Screening

35 Synthetic Studies. We use the rejection rate to measure the performance of the screening rules: the ratio of the number of data instances whose membership can be identified by the rule to the total number of data instances.
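The rejection rate defined above amounts to a single ratio. A minimal sketch, assuming the rule's output is available as one boolean per data instance (the function name and input format are illustrative):

```python
def rejection_rate(identified):
    """Fraction of data instances whose membership the screening rule identified."""
    if not identified:
        raise ValueError("need at least one data instance")
    return sum(1 for flag in identified if flag) / len(identified)


# Three of four instances were identified by the rule.
rate = rejection_rate([True, False, True, True])
```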

36 Performance of DVI for SVM on Real Data Sets. Comparison of SSNSV (Ogawa et al., ICML '13), ESSNSV and DVIs for SVM on three real data sets (IJCNN, Wine, Covertype). For each data set the table lists Solver (Total), Solver + SSNSV (SSNSV, Init, Total), Solver + ESSNSV (ESSNSV, Init, Total) and Solver + DVI (DVI, Init, Total), with a Speedup column. Recoverable entries:

Rule      IJCNN   Wine   Covertype
SSNSV     2.08    0.02   2.73
ESSNSV    2.09    0.03   2.89
DVI       0.99    0.01   1.27

37 Experiments on Real Data Sets. Comparison of SSNSV (Ogawa et al., ICML '13), ESSNSV and DVIs for LAD on three real data sets (Telescope, Computer, Telescope). For each data set the table lists Solver (Total) and Solver + DVI (DVI, Init, Total), with a Speedup column. Recoverable entries: Solver Total for the second data set (Computer): 5.85; DVI: 0.28, 0.08, 0.06 for the three data sets in order; Init for the third data set: 0.1.

38 Summary
- Developed exact data reduction approaches
  - Exact data reduction via feature screening
  - Exact data reduction via sample screening
  - The model based on reduced data is identical to the one constructed from complete data
- Results show screening leads to a significant speedup.
- Extend exact data reduction to other sparse learning formulations
  - Sparsity on features, samples, networks, etc.

39 Resource: tutorial webpages of our screening rules, which include sample code, implementation instructions, illustration materials, etc. A seven-line implementation of the EDPP rule. The list is growing quickly.

