Learning Compact and Effective Distance Metrics with Diversity Regularization. Pengtao Xie. Carnegie Mellon University

2. Distance Metric Learning
Learn a distance metric from pairs of data labeled similar or dissimilar. Wide applications in retrieval, clustering and classification.

3. Distance Metric Learning
Mahalanobis distance formulation: $d_M(x, y) = (x - y)^\top M (x - y)$, where $M \in \mathbb{R}^{d \times d}$ is positive semidefinite.
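
A minimal NumPy sketch of the formulation above. The helper name `mahalanobis_sq` and the random PSD matrix (built as $B^\top B$) are illustrative assumptions, not part of the talk; setting $M = I$ recovers the squared Euclidean distance.

```python
import numpy as np

def mahalanobis_sq(x: np.ndarray, y: np.ndarray, M: np.ndarray) -> float:
    """Squared Mahalanobis distance (x - y)^T M (x - y); M is assumed PSD."""
    diff = x - y
    return float(diff @ M @ diff)

# Hypothetical PSD matrix built as B^T B from an arbitrary random B.
rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
M = B.T @ B
x, y = rng.standard_normal(3), rng.standard_normal(3)

print(mahalanobis_sq(x, y, M))          # nonnegative because M is PSD
print(mahalanobis_sq(x, y, np.eye(3)))  # M = I gives the squared Euclidean distance
```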

4. Distance Metric Learning
Reparametrization: $M = A^\top A$, where $A \in \mathbb{R}^{k \times d}$ is a projection matrix. The formulation becomes
$(x - y)^\top M (x - y) = \|Ax - Ay\|_2^2$
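
The reparametrization can be checked numerically: with $M = A^\top A$, the quadratic form equals the squared Euclidean distance between the projections $Ax$ and $Ay$, which only needs the $k \times d$ matrix $A$. The sizes `d` and `k` below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 50, 5                          # illustrative sizes: input dimension d, k latent factors
A = rng.standard_normal((k, d))       # projection matrix; each row is one latent factor
M = A.T @ A                           # d x d, PSD by construction, rank at most k

x, y = rng.standard_normal(d), rng.standard_normal(d)
full = (x - y) @ M @ (x - y)                  # uses the d x d matrix M
projected = np.sum((A @ x - A @ y) ** 2)      # O(k d); O(k) once Ax and Ay are cached
print(np.isclose(full, projected))            # True
```

This is why $k$ controls the computational cost of every distance evaluation.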

5. Tradeoff between Effectiveness and Efficiency
Small k: high computational efficiency, low effectiveness.
Large k: low computational efficiency, high effectiveness.
Can we achieve the best of both worlds, i.e. a small k with high computational efficiency and high effectiveness?

6. Insight from Principal Component Analysis
[Figure panels: without orthogonal constraint; with orthogonal constraint]

7. Distance Metric Learning
[Figure: 32 latent factors]

8. Diversify the Latent Factors
[Figure: 23 latent factors]

9. Diversity Regularized DML
Goal: encourage the latent factors to spread out, improving the coverage of long-tail topics.
Approach:
Define a metric to measure the diversity of the latent factors.
Use the diversity metric to regularize the learning of the latent factors.

10. Diversity Metric [Xie et al., KDD15]
Measure the dissimilarity between two vectors.
Measure the diversity of a vector set.

11. Dissimilarity between Two Vectors
Desired property: invariant to the scale, translation, rotation and orientation of the two vectors.
Euclidean distance and L1 distance: sensitive to scale.
Cosine similarity: sensitive to orientation.

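12. Dissimilarity between Two Vectors
Non-obtuse angle θ: invariant to the scale, translation, rotation and orientation of the two vectors.
Definition: $\theta(x, y) = \arccos \frac{|x \cdot y|}{\|x\| \, \|y\|}$
The absolute value replaces an obtuse angle with its supplement, so $\theta \in [0, \pi/2]$ and flipping either vector leaves θ unchanged.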
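
A small sketch contrasting the three measures on one pair of 2-D vectors; the function names are illustrative. Scaling both vectors changes the Euclidean distance, flipping one vector flips the sign of the cosine similarity, and the non-obtuse angle is unaffected by either.

```python
import numpy as np

def nonobtuse_angle(x, y):
    """theta(x, y) = arccos(|x . y| / (||x|| ||y||)), always in [0, pi/2]."""
    cos = abs(np.dot(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, 0.0, 1.0)))  # clip guards against rounding past 1

def cosine_similarity(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
print(np.linalg.norm(x - y), np.linalg.norm(3 * x - 3 * y))  # 1.0 3.0: changes with scale
print(cosine_similarity(x, y), cosine_similarity(x, -y))     # sign flips with orientation
print(nonobtuse_angle(x, y), nonobtuse_angle(3 * x, -y))     # both pi/4
```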

13. Measure the Diversity of a Vector Set
Based on the pairwise dissimilarity between vectors. The diversity of a set of vectors $A = \{a_i\}_{i=1}^{K}$ is defined as
$\Omega(A) = \operatorname{mean}(\theta_{ij}) - \operatorname{var}(\theta_{ij}), \quad 1 \le i, j \le K,\ i \neq j$, where $\theta_{ij} = \arccos \frac{|a_i \cdot a_j|}{\|a_i\| \, \|a_j\|}$
Mean: summarizes how different the vectors are from each other on the whole.
Variance: penalized so that the vectors spread out evenly.
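
A minimal sketch of $\Omega(A)$ computed over the rows of a matrix. The slide does not say how the variance is normalized; this sketch assumes the population variance (NumPy's default `ddof=0`). Mutually orthogonal rows reach $\pi/2$ with zero variance, while nearly parallel rows score much lower.

```python
import numpy as np

def pairwise_angles(A):
    """Non-obtuse angles theta_ij between the rows of A, for all pairs i != j."""
    U = A / np.linalg.norm(A, axis=1, keepdims=True)  # unit-norm rows
    cos = np.clip(np.abs(U @ U.T), 0.0, 1.0)
    off_diag = ~np.eye(A.shape[0], dtype=bool)         # drop the i == j entries
    return np.arccos(cos[off_diag])

def diversity(A):
    """Omega(A) = mean(theta_ij) - var(theta_ij); population variance assumed."""
    theta = pairwise_angles(A)
    return float(theta.mean() - theta.var())

near_parallel = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.0]])
print(diversity(np.eye(3)))       # approx pi/2: mutually orthogonal factors
print(diversity(near_parallel))   # much smaller: the factors overlap
```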

14. Diversity Regularized DML

15. Optimization
Relax the constraints.
Eliminate the constraints.

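16. Optimization
Reparametrization: $A = \operatorname{diag}(g)\,\tilde{A}$, i.e. $a_i = g_i \tilde{a}_i$.
Because the non-obtuse angle is invariant to scale, $\Omega(A) = \Omega(\tilde{A})$ for nonzero $g_i$, so the diversity term depends only on $\tilde{A}$.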
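
A sketch of the reparametrization, assuming (not stated on the slide) that $g_i = \|a_i\|$ so that the rows of $\tilde{A}$ have unit norm. It checks that $\operatorname{diag}(g)\,\tilde{A}$ reproduces $A$ and that the diversity $\Omega = \operatorname{mean}(\theta_{ij}) - \operatorname{var}(\theta_{ij})$ is the same for $A$ and $\tilde{A}$.

```python
import numpy as np

def diversity(A):
    """Omega(A) = mean - var of the non-obtuse pairwise angles between rows."""
    U = A / np.linalg.norm(A, axis=1, keepdims=True)
    cos = np.clip(np.abs(U @ U.T), 0.0, 1.0)
    theta = np.arccos(cos[~np.eye(A.shape[0], dtype=bool)])
    return float(theta.mean() - theta.var())

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
g = np.linalg.norm(A, axis=1)          # assumed choice: g_i = ||a_i||
A_tilde = A / g[:, None]               # unit-norm rows a~_i

print(np.allclose(np.diag(g) @ A_tilde, A))           # True: A = diag(g) A~
print(np.isclose(diversity(A), diversity(A_tilde)))   # True: Omega depends only on A~
```

Splitting magnitude from direction this way lets the two blocks of variables be updated separately.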

17. Optimization
Alternate between two steps:
Fix $\tilde{A}$, optimize $g$.
Fix $g$, optimize $\tilde{A}$.

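18. Optimization
Lower bound (for unit-norm rows $\tilde{a}_i$ of $\tilde{A}$):
$\Omega(\tilde{A}) \geq \Gamma(\tilde{A}) = \arcsin\left(\sqrt{\det(\tilde{A}\tilde{A}^\top)}\right) - \left(\frac{\pi}{2} - \arcsin\left(\sqrt{\det(\tilde{A}\tilde{A}^\top)}\right)\right)^2$
Every pairwise angle lies between $\arcsin\left(\sqrt{\det(\tilde{A}\tilde{A}^\top)}\right)$ and $\pi/2$, so the first term bounds the mean from below and the squared term bounds the variance from above.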
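
A numerical sanity check of the bound $\arcsin(\sqrt{\det(\tilde{A}\tilde{A}^\top)}) - (\pi/2 - \arcsin(\sqrt{\det(\tilde{A}\tilde{A}^\top)}))^2 \le \Omega(\tilde{A})$ on random matrices with unit-norm rows. This is an empirical spot check under the stated unit-norm assumption, not a proof; the helper names are illustrative.

```python
import numpy as np

def diversity(A):
    """Omega = mean - var of the non-obtuse pairwise angles between rows."""
    U = A / np.linalg.norm(A, axis=1, keepdims=True)
    cos = np.clip(np.abs(U @ U.T), 0.0, 1.0)
    theta = np.arccos(cos[~np.eye(A.shape[0], dtype=bool)])
    return float(theta.mean() - theta.var())

def lower_bound(A_tilde):
    """arcsin(sqrt(det G)) - (pi/2 - arcsin(sqrt(det G)))^2 with G = A~ A~^T."""
    det = np.linalg.det(A_tilde @ A_tilde.T)
    s = np.arcsin(np.sqrt(np.clip(det, 0.0, 1.0)))  # clip guards against rounding
    return float(s - (np.pi / 2 - s) ** 2)

rng = np.random.default_rng(3)
for _ in range(1000):
    A = rng.standard_normal((4, 6))
    A_tilde = A / np.linalg.norm(A, axis=1, keepdims=True)  # unit-norm rows
    assert diversity(A_tilde) >= lower_bound(A_tilde) - 1e-12
print("bound held on all samples")
```

The bound is tight for orthonormal rows: both sides then equal $\pi/2$.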

19. Theorem
Maximizing the lower bound with projected gradient ascent (PGA):
can increase the diversity metric;
can increase the mean of the angles;
can reduce the variance of the angles.
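
A hedged sketch of projected gradient ascent on the lower bound $\arcsin(\sqrt{\det G}) - (\pi/2 - \arcsin(\sqrt{\det G}))^2$, $G = \tilde{A}\tilde{A}^\top$. The gradient follows the chain rule through $\partial \det(G)/\partial \tilde{A} = 2\det(G)\,G^{-1}\tilde{A}$. The projection step (renormalizing each row to unit length), the step size and the iteration count are assumptions for illustration, not the talk's settings.

```python
import numpy as np

def lower_bound(A_tilde):
    det = np.clip(np.linalg.det(A_tilde @ A_tilde.T), 0.0, 1.0)
    s = np.arcsin(np.sqrt(det))
    return float(s - (np.pi / 2 - s) ** 2)

def lower_bound_grad(A_tilde, eps=1e-12):
    """Gradient of the bound w.r.t. A~ via the chain rule through det(A~ A~^T)."""
    G = A_tilde @ A_tilde.T
    det = np.clip(np.linalg.det(G), eps, 1.0 - eps)   # keep the derivatives finite
    s = np.arcsin(np.sqrt(det))
    dbound_ds = 1.0 + 2.0 * (np.pi / 2 - s)
    ds_ddet = 1.0 / (2.0 * np.sqrt(det * (1.0 - det)))  # d/dx arcsin(sqrt(x))
    ddet_dA = 2.0 * det * np.linalg.solve(G, A_tilde)   # 2 det(G) G^{-1} A~
    return dbound_ds * ds_ddet * ddet_dA

def projected_gradient_ascent(A_tilde, lr=0.01, steps=100):
    A = A_tilde.copy()
    for _ in range(steps):
        A = A + lr * lower_bound_grad(A)                   # ascent step
        A = A / np.linalg.norm(A, axis=1, keepdims=True)   # assumed projection: unit-norm rows
    return A

rng = np.random.default_rng(4)
A0 = rng.standard_normal((3, 5))
A0 /= np.linalg.norm(A0, axis=1, keepdims=True)
A1 = projected_gradient_ascent(A0)
print(lower_bound(A0), "->", lower_bound(A1))
```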

20. Geometric Interpretation
The gradient of the lower bound w.r.t. $\tilde{a}_i$ lies in the orthogonal complement of the space spanned by $\{\tilde{a}_1, \tilde{a}_2, \dots, \tilde{a}_K\} \setminus \{\tilde{a}_i\}$.
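
The lower bound is an increasing function of $\det(\tilde{A}\tilde{A}^\top)$ on $(0, 1)$, so its gradient is a positive multiple of the gradient of the determinant, $2\det(G)\,G^{-1}\tilde{A}$. A quick check of the orthogonality claim: since that gradient times $\tilde{A}^\top$ equals $2\det(G)\,I$, row $i$ of the gradient has zero inner product with every other row $\tilde{a}_j$. Sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
K, d = 4, 7
A = rng.standard_normal((K, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)            # unit-norm rows a~_1..a~_K
G = A @ A.T
grad_det = 2.0 * np.linalg.det(G) * np.linalg.solve(G, A)  # d det(A A^T) / dA

inner = grad_det @ A.T                # entry (i, j) = <gradient row i, a~_j>
off_diag = ~np.eye(K, dtype=bool)
print(np.abs(inner[off_diag]).max())  # ~0 up to rounding: orthogonal to the other rows
```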

22. Experiments
Datasets: 20-News, 15-Scenes and 6-Activities, each with 200K data pairs.
Baselines: Euclidean distance (EUC); Distance Metric Learning (DML); Large Margin Nearest Neighbor (LMNN) DML; Information-Theoretic Metric Learning (ITML); Distance Metric Learning with Eigenvalue Optimization (DML-eig); Information-theoretic Semi-supervised Metric Learning via Entropy Regularization (Seraph).
Evaluation:
Retrieval: precision.
Clustering: accuracy and normalized mutual information (NMI).
Classification: accuracy.
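
A sketch of how retrieval precision and k-NN classification accuracy can be measured in the learned space $x \mapsto Ax$. The exact protocol (number of retrieved items, train/test splits) is not given on the slides; the toy data, the projection `A` and the helper names below are hypothetical. The projection drops a noisy second feature, which raises precision from 0.5 to 1.0 on this toy set.

```python
import numpy as np

def project(X, A):
    """Map each row x of X to A x in the k-dimensional latent space."""
    return X @ A.T

def retrieval_precision(Z, labels, n):
    """Mean fraction of each query's n nearest neighbors (excluding itself) with its label."""
    D = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D, np.inf)                 # a query does not retrieve itself
    nn = np.argsort(D, axis=1)[:, :n]
    return float((labels[nn] == labels[:, None]).mean())

def knn_accuracy(Z_train, y_train, Z_test, y_test, k=3):
    """Majority-vote k-NN classification accuracy in the projected space."""
    D = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(D, axis=1)[:, :k]
    pred = np.array([np.bincount(v).argmax() for v in y_train[nn]])
    return float((pred == y_test).mean())

X = np.array([[0.0, 5.0], [0.1, -5.0], [0.2, 0.0],
              [3.0, 5.0], [3.1, -5.0], [3.2, 0.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
A = np.array([[1.0, 0.0]])        # hypothetical 1-factor projection
Z = project(X, A)

print(retrieval_precision(X, labels, n=2))  # 0.5 with the Euclidean distance
print(retrieval_precision(Z, labels, n=2))  # 1.0 in the projected space
train, test = [0, 1, 3, 4], [2, 5]
print(knn_accuracy(Z[train], labels[train], Z[test], labels[test], k=3))  # 1.0
```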

23. Retrieval Precision
[Figure: retrieval precision (%) of DML and DDML vs. number of latent factors k on 20-News, 15-Scenes and 6-Activities]

24. Retrieval Precision
[Table: EUC, DML, LMNN, ITML, DML-eig, Seraph and DDML on 20-News, 15-Scenes and 6-Activities]

25. Clustering Accuracy
[Figure: clustering accuracy (%) of DML and DDML vs. number of latent factors k on 20-News, 15-Scenes and 6-Activities]

26. Clustering Accuracy
[Table: EUC, DML, LMNN, ITML, DML-eig, Seraph and DDML on 20-News, 15-Scenes and 6-Activities]

27. Clustering Normalized Mutual Information
[Figure: clustering NMI (%) of DML and DDML vs. number of latent factors k on 20-News, 15-Scenes and 6-Activities]

28. Clustering Normalized Mutual Information
[Table: EUC, DML, LMNN, ITML, DML-eig, Seraph and DDML on 20-News, 15-Scenes and 6-Activities]

29. 3-NN Classification Accuracy
[Figure: 3-NN accuracy (%) of DML and DDML vs. number of latent factors k on 20-News, 15-Scenes and 6-Activities]

30. 3-NN Classification Accuracy
[Table: EUC, DML, LMNN, ITML, DML-eig, Seraph and DDML on 20-News, 15-Scenes and 6-Activities]

31. 10-NN Classification Accuracy
[Figure: 10-NN accuracy (%) of DML and DDML vs. number of latent factors k on 20-News, 15-Scenes and 6-Activities]

32. 10-NN Classification Accuracy
[Table: EUC, DML, LMNN, ITML, DML-eig, Seraph and DDML on 20-News, 15-Scenes and 6-Activities]

33. Sensitivity to Parameters
[Figure: sensitivity of DDML to the tradeoff parameter λ on (a) 20-News, (b) 15-Scenes, (c) 6-Activities]

34. Sensitivity to Parameters
[Figure: sensitivity of DDML to the number of latent factors k on (a) 20-News, (b) 15-Scenes, (c) 6-Activities]

35. Conclusions
Problem: learn compact and effective distance metrics, keeping the number of latent factors small for computational efficiency while retaining effectiveness.
Solution: impose a diversity regularizer over the latent factors to encourage them to be uncorrelated.
Each factor can capture some unique information that is hard for the other factors to capture.
A small number of latent factors can suffice to capture a large proportion of the information.
Results: experiments demonstrate that a small set of factors learned with diversity regularization achieves comparable or even better performance than a large factor set learned without regularization.

36. Thank you! Questions?
