Johnson-Lindenstrauss Lemma, Random Projection, and Applications

Hu Ding
Computer Science and Engineering, Michigan State University

JL-Lemma
The original version: Given a set P of n points in $\mathbb{R}^d$, let $k = O(\log n/\varepsilon^2)$. Then there exists a Lipschitz mapping $f: \mathbb{R}^d \to \mathbb{R}^k$ such that for all $u, v \in P$,
$$(1-\varepsilon)\|u-v\|^2 \le \|f(u)-f(v)\|^2 \le (1+\varepsilon)\|u-v\|^2.$$
The algorithmic version using Gaussians: Given a set P of n points in $\mathbb{R}^d$, let $k = O(\log n/\varepsilon^2)$ and let $A \in \mathbb{R}^{k \times d}$ have each entry sampled independently from $N(0, 1)$. Then with high probability, for all $u, v \in P$,
$$(1-\varepsilon)\|u-v\|^2 \le \left\|\tfrac{1}{\sqrt{k}}Au - \tfrac{1}{\sqrt{k}}Av\right\|^2 \le (1+\varepsilon)\|u-v\|^2.$$
An Elementary Proof of a Theorem of Johnson and Lindenstrauss by Dasgupta and Gupta.
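A minimal NumPy sketch of the algorithmic version, for illustration only. The helper names (`jl_min_dim`, `gaussian_jl`, `pairwise_sq_dists`) are hypothetical; the explicit target dimension uses the bound $k \ge 4\ln n/(\varepsilon^2/2 - \varepsilon^3/3)$ from the Dasgupta-Gupta proof cited above, and the points are random Gaussian demo data.

```python
import numpy as np


def jl_min_dim(n: int, eps: float) -> int:
    """Target dimension from the bound k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    return int(np.ceil(4 * np.log(n) / (eps**2 / 2 - eps**3 / 3)))


def gaussian_jl(X: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Map each row x of X (n x d) to (1/sqrt(k)) * A x, with A_ij ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((k, X.shape[1]))
    return X @ A.T / np.sqrt(k)


def pairwise_sq_dists(X: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between all pairs of rows (via the Gram matrix)."""
    sq = np.sum(X**2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T


rng = np.random.default_rng(1)
n, d, eps = 100, 5000, 0.5
X = rng.standard_normal((n, d))       # demo data: n random points in R^d
k = jl_min_dim(n, eps)                # 222 for n = 100, eps = 0.5
Y = gaussian_jl(X, k)
iu = np.triu_indices(n, 1)            # each unordered pair u != v once
ratios = pairwise_sq_dists(Y)[iu] / pairwise_sq_dists(X)[iu]
print(k, ratios.min(), ratios.max())
```

The ratios $\|f(u)-f(v)\|^2/\|u-v\|^2$ should typically lie in $[1-\varepsilon, 1+\varepsilon]$. Note that $k$ depends on $n$ and $\varepsilon$ but not on $d$.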

JL-Lemma
Database-friendly JL-Transform: the projection matrix uses only the entries ±1 and 0, so the projection can be computed with additions and subtractions alone. Database-friendly random projections: Johnson-Lindenstrauss with binary coins by Achlioptas.
Fast JL-Transform: first densify x with a randomized Hadamard transform, so that a sparse projection matrix S suffices; this speeds up the JL-Transform. Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform by Ailon and Chazelle.
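A minimal sketch of the database-friendly variant, assuming Achlioptas's sparse distribution: entries $\sqrt{3}\cdot\{+1, 0, -1\}$ with probabilities $1/6, 2/3, 1/6$, followed by the same $1/\sqrt{k}$ scaling as the Gaussian version. The function names are hypothetical. Each output coordinate is formed from sums and differences of coordinates of x, with one multiplication by $\sqrt{3/k}$ at the end. The Fast JL-Transform is not shown.

```python
import numpy as np


def achlioptas_matrix(k: int, d: int, seed: int = 0) -> np.ndarray:
    """k x d matrix with entries +1, 0, -1 drawn with probabilities 1/6, 2/3, 1/6."""
    rng = np.random.default_rng(seed)
    values = np.array([1, 0, -1], dtype=np.int8)
    return rng.choice(values, size=(k, d), p=[1 / 6, 2 / 3, 1 / 6])


def achlioptas_project(x: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Compute sqrt(3/k) * R x using only additions and subtractions per coordinate."""
    k = R.shape[0]
    # Each output coordinate: (sum of x over +1 entries) - (sum of x over -1 entries).
    sums = np.array([x[row == 1].sum() - x[row == -1].sum() for row in R])
    return np.sqrt(3.0 / k) * sums  # single rescaling so E[||f(x)||^2] = ||x||^2


rng = np.random.default_rng(1)
d, k = 2000, 400
u, v = rng.standard_normal(d), rng.standard_normal(d)
R = achlioptas_matrix(k, d)
diff = achlioptas_project(u, R) - achlioptas_project(v, R)
ratio = np.sum(diff**2) / np.sum((u - v) ** 2)
print(ratio)  # typically close to 1
```

About two thirds of the entries of R are zero, so R can also be stored sparsely.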

Comparing to PCA
Lower time complexity.
Data oblivious (the projection is chosen without looking at the data), which helps in:
1. Distributed computing: the data matrix X is divided among multiple parties (column or row partition).
2. Privacy preserving.
3. Streaming data.
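A small sketch of why data obliviousness helps, assuming a dense Gaussian projection and a hypothetical shared random seed that all parties agree on in advance. Because R never depends on the data, each party can regenerate it locally, and by linearity row-partitioned, column-partitioned, and streamed data all yield exactly the centralized sketch $XR$. PCA, by contrast, must see the whole data set before it can choose its projection.

```python
import numpy as np

SHARED_SEED = 42  # hypothetical seed agreed on by all parties in advance
n, d, k = 60, 500, 50

# R depends only on the seed, never on the data, so any party can rebuild it.
R = np.random.default_rng(SHARED_SEED).standard_normal((d, k)) / np.sqrt(k)

X = np.random.default_rng(7).standard_normal((n, d))  # rows are points
centralized = X @ R

# Row partition: each party holds a subset of the points and projects them locally.
row_sketch = np.vstack([part @ R for part in np.array_split(X, 3, axis=0)])

# Column partition: each party holds a subset of the features; X R = sum_i X_i R_i,
# where R_i are the rows of R that match party i's features.
col_sketch = sum(X[:, cols] @ R[cols, :] for cols in np.array_split(np.arange(d), 3))

# Streaming: an update x[i] += delta changes the sketch x R by delta * R[i].
x, y = np.zeros(d), np.zeros(k)
for i, delta in [(3, 2.0), (10, -1.5), (3, 0.5)]:
    x[i] += delta
    y += delta * R[i]
```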

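Application I: Large Matrix
Improved Approximation Algorithms for Large Matrices via Random Projections by Sarlós. Below, S is a JL-Transform matrix with far fewer rows than the dimension it compresses.
Matrix multiplication: $AB \approx (AS^\top)(SB)$.
Linear regression: $\|b - Ax\|_2 \approx \|Sb - SAx\|_2$, so one can solve the much smaller problem $\min_x \|Sb - SAx\|_2$ instead. More recent results can be found in the tutorial Sketching as a Tool for Numerical Linear Algebra by Woodruff.
SVD: work with the smaller sketch $SA$ instead of $A$ (its row space approximately captures the top singular subspace of $A$); this can also speed up PCA.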
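A minimal sketch-and-solve illustration of the regression item, assuming a dense Gaussian S with a hand-picked sketch size $k = 400$ on random demo data. A dense S keeps the code short but is not itself faster than exact least squares; in practice a fast or sparse JL-Transform is what makes computing $SA$ cheap.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 5000, 20, 400            # tall problem, sketch size k << n (hand-picked)
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact least squares: min_x ||Ax - b||_2 on the full n x d problem.
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]

# Sketch-and-solve: compress the n rows of A and b with the same S.
S = rng.standard_normal((k, n)) / np.sqrt(k)
x_sketch = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]

# Compare residuals on the ORIGINAL problem; the ratio is always >= 1.
ratio = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b)
print(ratio)  # typically only slightly above 1
```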

Application II: k-means
k-means is equivalent to minimizing the total within-cluster squared pairwise distances (each cluster's sum normalized by its size). Why?
Use a JL-Transform to reduce the dimensionality, and run k-means in the low-dimensional space (here k is the number of clusters):
1. Directly using the JL-Lemma, target dimension $O(\log n/\varepsilon^2)$ gives a $(1+\varepsilon)$-approximation.
2. Through SVD and the JL-Lemma, $O(k/\varepsilon^2)$ gives a $(2+\varepsilon)$-approximation. Random Projections for k-means Clustering by Boutsidis et al.
3. Recently, (1) $O(k/\varepsilon^2)$ for a $(1+\varepsilon)$-approximation and (2) $O(\log k/\varepsilon^2)$ for a $(9+\varepsilon)$-approximation. Dimensionality Reduction for k-means Clustering and Low Rank Approximation by Cohen et al.
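One way to answer the "why?": for a cluster C with mean $\mu_C$, expanding the squares gives $\sum_{x\in C}\|x-\mu_C\|^2 = \frac{1}{2|C|}\sum_{x\in C}\sum_{y\in C}\|x-y\|^2$, so the k-means cost of a partition depends only on pairwise distances (weighted by $1/|C|$). Hence a JL map that preserves all pairwise squared distances within $1\pm\varepsilon$ preserves the cost of every partition within $1\pm\varepsilon$, which is item 1. The NumPy check below (hypothetical helper names, random demo data, an arbitrary fixed partition) verifies the identity and compares the cost before and after a Gaussian projection.

```python
import numpy as np


def cost_centroid(X, labels):
    """k-means cost: squared distance from each point to its cluster mean."""
    return sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2)
               for c in np.unique(labels))


def cost_pairwise(X, labels):
    """Same cost written as sum_C (1 / (2|C|)) * sum_{x, y in C} ||x - y||^2."""
    total = 0.0
    for c in np.unique(labels):
        P = X[labels == c]
        diffs = P[:, None, :] - P[None, :, :]   # all ordered pairs within C
        total += np.sum(diffs**2) / (2 * len(P))
    return total


rng = np.random.default_rng(0)
n, d, target_dim = 90, 2000, 300
X = rng.standard_normal((n, d))
labels = np.arange(n) % 3                       # an arbitrary fixed 3-way partition
Y = X @ rng.standard_normal((d, target_dim)) / np.sqrt(target_dim)  # Gaussian JL map

print(cost_centroid(X, labels), cost_pairwise(X, labels))  # equal up to rounding
print(cost_centroid(Y, labels) / cost_centroid(X, labels))  # close to 1
```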

Application III: SVM
The Support Vector Machine (SVM) is equivalent to a high-dimensional polytope distance problem (discussed in detail in a later lecture). So it is natural to use a JL-Transform to reduce the dimensionality while approximately preserving the pairwise distances. Random Projections for Linear Support Vector Machines by Paul et al.

Application IV: Compressive Sensing
A signal can usually be represented as a sparse vector (e.g., after a Fourier transform). Define a k-sparse vector $x \in \mathbb{R}^d$ as a vector with at most k non-zero entries, where $k \ll d$. The Johnson-Lindenstrauss Lemma Meets Compressed Sensing by Baraniuk et al.
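A rough empirical illustration of the link to the JL-Lemma, not a recovery algorithm: a Gaussian measurement matrix $\Phi \in \mathbb{R}^{m\times d}$ with $N(0, 1/m)$ entries approximately preserves the norms of random k-sparse vectors. The sizes are arbitrary demo choices. Sampling a few hundred sparse vectors is much weaker than the restricted isometry property, which requires this for all k-sparse vectors at once; roughly, the cited paper obtains that uniform guarantee from the same concentration inequality used to prove the JL-Lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, m = 1000, 10, 200   # ambient dimension d, sparsity k, number of measurements m
Phi = rng.standard_normal((m, d)) / np.sqrt(m)   # E[||Phi x||^2] = ||x||^2

ratios = []
for _ in range(500):
    x = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)  # at most k non-zero entries
    x[support] = rng.standard_normal(k)
    ratios.append(np.sum((Phi @ x) ** 2) / np.sum(x**2))
ratios = np.array(ratios)
print(ratios.min(), ratios.max())  # both typically within a modest band around 1
```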

Application V: Wireless Sensor Networks
Use a JL-Transform to reduce the communication complexity:
Compressive data gathering for large-scale wireless sensor networks by Luo et al.
Towards Distributed Ensemble Clustering for Networked Sensing Systems: A Novel Geometric Approach by Ding et al.

Application VI: Beyond JL-Lemma
Random projection can be viewed as a probe for inferring a huge object, intuitively similar to X-ray tomography.
High-dimensional nearest neighbor search (a very important topic; details in a later lecture).
Manifold learning. Random projection trees and low dimensional manifolds by Dasgupta and Freund.
Probabilistic inference. A Hybrid Approach for Probabilistic Inference using Random Projections by Zhu and Ermon.
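A simplified sketch of the random projection tree idea from the Dasgupta and Freund reference: recursively split the data at the median of its projection onto a random direction, and answer a nearest-neighbor-style query by descending to one leaf. The function names and the leaf-size parameter are hypothetical, and the paper's refinements (such as its randomized choice of split point) are omitted. The demo data lie near a 2-D plane in $\mathbb{R}^{50}$, a crude stand-in for a low-dimensional manifold.

```python
import numpy as np


def build_rp_tree(X, idx=None, leaf_size=20, rng=None):
    """Split the points X[idx] at the median of their projection onto a random direction."""
    rng = np.random.default_rng(0) if rng is None else rng
    idx = np.arange(len(X)) if idx is None else idx
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    direction = rng.standard_normal(X.shape[1])
    direction /= np.linalg.norm(direction)        # random unit vector
    proj = X[idx] @ direction
    threshold = np.median(proj)
    left, right = idx[proj <= threshold], idx[proj > threshold]
    if len(left) == 0 or len(right) == 0:         # degenerate split (ties): stop here
        return {"leaf": idx}
    return {"direction": direction, "threshold": threshold,
            "left": build_rp_tree(X, left, leaf_size, rng),
            "right": build_rp_tree(X, right, leaf_size, rng)}


def query_leaf(tree, q):
    """Descend to the leaf whose cell contains q; return its point indices as candidates."""
    while "leaf" not in tree:
        side = "left" if q @ tree["direction"] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]


def leaves(tree):
    """List the index arrays stored in all leaves."""
    if "leaf" in tree:
        return [tree["leaf"]]
    return leaves(tree["left"]) + leaves(tree["right"])


rng = np.random.default_rng(1)
basis = rng.standard_normal((2, 50))
X = rng.uniform(-1, 1, size=(500, 2)) @ basis + 0.01 * rng.standard_normal((500, 50))
tree = build_rp_tree(X, leaf_size=20)
candidates = query_leaf(tree, X[0])
```

The candidates returned by `query_leaf` would then be searched exhaustively; a query can miss its true nearest neighbor when that neighbor falls on the other side of a split.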

Thank you! Any questions?
