Metric Learning Applied for Automatic Large Image Classification

1 September, 2014. UPC. Metric Learning Applied for Automatic Large Image Classification. SAHILU WENDESON / IT4BI. Supervisors: TOON CALDERS (PhD) / ULB, SALIM JOUILI (PhD) / EuraNova

2 Image Database Classification. How? Using k-nearest neighbor (kNN), which depends on the quality of the distance measure.

3 k-Nearest Neighbor (kNN) Classifier. The classifier depends on the distance measure: the k neighbors are chosen based on the Euclidean distance measure. [Figure: neighbors of a query Q for k = 1 and k = 5.]
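As a rough illustration of the kNN rule on this slide, the sketch below classifies a query by majority vote among its k nearest training points. The function names and the toy data are made up for illustration and are not taken from the thesis; the `dist` parameter is left pluggable because the rest of the deck replaces the Euclidean distance with a learned one.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, labels, query, k=1, dist=euclidean):
    """Return the majority label among the k training points closest to query."""
    # Rank training indices by their distance to the query.
    order = sorted(range(len(train)), key=lambda i: dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated classes in 2-D.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]
```

Calling `knn_predict(train, labels, (5.2, 5.1), k=5)` takes the three "b" points and two "a" points as neighbors, so the vote goes to "b".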

4 Outline 1) METRIC LEARNING 2) INDEXING 3) OBJECTIVES and CONTRIBUTIONS 4) EXPERIMENTAL RESULTS 5) DISCUSSION, CONCLUSION and FUTURE WORKS

5 1. METRIC LEARNING
Goal: maximize the accuracy of kNN by learning the distance measure.
The traditional (Euclidean) metric space has two limitations [1]: it does not consider correlation between features, and it cannot provide curved as well as linear decision boundaries.
Metric learning addresses these limitations by using the Mahalanobis metric space.
[1] R. O. Duda, "Pattern Recognition for HCI," Department of Electrical Engineering San Jose State University, p

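6 Mahalanobis Metric Space
$d_M(x_i, x_j) = \sqrt{(x_i - x_j)^T M (x_i - x_j)}$, where $M \in \mathbb{S}_+^d$ and $\mathbb{S}_+^d$ is the cone of symmetric PSD $d \times d$ matrices.
Euclidean space: $M = I$. Mahalanobis space: any $M \in \mathbb{S}_+^d$.
Using the Cholesky decomposition, $M = G^T G$, and $d_M$ can be rewritten as follows: $d_M(x_i, x_j) = \lVert G x_i - G x_j \rVert_2$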
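The equivalence between the two forms of the Mahalanobis distance can be checked numerically. The following is a minimal sketch, assuming G is given as a list of rows; the helper names are hypothetical and the matrix is hand-picked rather than learned.

```python
import math

def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def gram(G):
    """Return M = G^T G for a matrix G given as a list of rows."""
    d = len(G[0])
    return [[sum(row[i] * row[j] for row in G) for j in range(d)] for i in range(d)]

def mahalanobis_via_G(x, y, G):
    """d_M(x, y) computed as the Euclidean norm of G(x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(v * v for v in mat_vec(G, diff)))

def mahalanobis_via_M(x, y, M):
    """d_M(x, y) computed as sqrt((x - y)^T M (x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(a * b for a, b in zip(diff, mat_vec(M, diff))))

# Hand-picked G (an assumption for illustration): stretch the first axis by 2.
G = [[2.0, 0.0], [0.0, 1.0]]
M = gram(G)
```

With the identity matrix in place of G, both functions reduce to the ordinary Euclidean distance.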

7 Euclidean vs. Mahalanobis Space [Figure: two plots with axes Age and Weight, one in Euclidean space and one in Mahalanobis space, related by M = G^T G.]

8 Metric Learning Algorithms
Approaches to metric learning: driven by nearest neighbors, information-theoretical, online, etc.
Nearest neighbor approaches: MMC, Xing et al. (2002); NCA, based on leave-one-out (LOO) cross-validation, Goldberger et al. (2004); MCML, Globerson and Roweis (2005); LMNN, Weinberger et al. (2005).

9 LMNN [Figure: a local neighborhood under the Euclidean metric and under the Mahalanobis metric (transformation G, M = G^T G), showing the target neighbors, the impostors, and the margin.]
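The roles of target neighbors, impostors, and the margin can be made concrete with the LMNN objective as published by Weinberger and Saul: a pull term over target neighbors plus a hinge-loss push term over differently labeled points that come within the margin. The sketch below only evaluates that loss for a fixed G; it does not optimize it. The weight `mu = 0.5`, the unit margin, and the toy data are assumptions for illustration, not values from the thesis.

```python
def sq_dist(G, a, b):
    """Squared distance ||G(a - b)||^2 under the linear map G (list of rows)."""
    diff = [p - q for p, q in zip(a, b)]
    proj = [sum(g * v for g, v in zip(row, diff)) for row in G]
    return sum(v * v for v in proj)

def lmnn_loss(G, X, y, targets, mu=0.5, margin=1.0):
    """Evaluate the LMNN loss for a fixed G.

    targets maps a point index i to the indices of its target neighbors
    (same-label points chosen in advance).
    """
    pull, push = 0.0, 0.0
    for i, neighbors in targets.items():
        for j in neighbors:
            d_ij = sq_dist(G, X[i], X[j])
            pull += d_ij  # keep target neighbors close
            for l in range(len(X)):
                if y[l] != y[i]:
                    # Impostor penalty: active only if l invades the margin.
                    push += max(0.0, margin + d_ij - sq_dist(G, X[i], X[l]))
    return (1 - mu) * pull + mu * push

# Toy data (hypothetical): point 2 has a different label and sits inside the margin.
X = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.5)]
y = [0, 0, 1]
targets = {0: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[1.0, 0.0], [0.0, 3.0]]  # hand-picked, not learned
```

Under the identity map the impostor contributes to the loss (0.875 in total); stretching the second axis pushes it outside the margin and the loss drops to the pull term alone (0.5).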

10 Summary: LMNN
Advantages: scales to large datasets; has fast test-time performance; is a convex optimization problem, so it can be solved efficiently; makes no assumption about the data.
Limitations: the number of target neighbors and their assignment are fixed a priori in Euclidean space (multi-pass LMNN addresses this); sensitive to outliers.
Dimension reduction: Principal Component Analysis (PCA).

11 PCA Dimension Reduction. PCA is a linear dimension reduction technique that is optimal in the mean-square error sense and is based on the covariance matrix of the variables. It is used to reduce computation time and avoid overfitting.
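To show how PCA follows from the covariance matrix, here is a small sketch that centers the data, builds the sample covariance matrix, and extracts only the leading principal component by power iteration. Power iteration is a simplification chosen to keep the example dependency-free; a full PCA would compute several eigenvectors. All names and the toy data are hypothetical.

```python
import math

def center(X):
    """Subtract the per-feature mean from every row."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def covariance(Xc):
    """Sample covariance matrix of already-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(C, iters=200):
    """Leading eigenvector of C by power iteration from a fixed start vector."""
    v = [1.0] * len(C)
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in C]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(X, component):
    """Reduce each row to its coordinate along one principal component."""
    return [sum(a * b for a, b in zip(row, component)) for row in center(X)]

# Toy data (hypothetical): points lying exactly on the line y = 2x.
X = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
pc1 = top_component(covariance(center(X)))
reduced = project(X, pc1)
```

For this data the leading component points along (1, 2), so the two features collapse to a single coordinate without losing any variance.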

12 [Diagram: metric learning pipeline] The labeled dataset is split into a training set (build, 70%) and a testing set (test, 30%). The data is normalized and reduced in dimension. LMNN is plugged in on the training set to learn the best PSD matrix, M = G^T G, and build the model. The model is then tested and evaluated with the intra/inter distance ratio and the kNN error ratio.

13 Intra/Inter Distance Ratio [Figure: intra/inter distance ratio per class, Mahalanobis vs. Euclidean, on Mnist (10 classes); number of target neighbors = 3.]
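The slide does not spell out how the ratio is computed. One plausible reading, assumed here, is the mean distance between same-class pairs divided by the mean distance between different-class pairs, so that a lower value means tighter, better separated classes. The sketch computes it over the whole dataset rather than per class, and the transformation G is hand-picked rather than learned.

```python
import math
from itertools import combinations

def transform(G, x):
    """Apply the linear map G (list of rows) to a vector."""
    return [sum(g * v for g, v in zip(row, x)) for row in G]

def intra_inter_ratio(X, y, G=None):
    """Mean same-class distance divided by mean different-class distance.

    With G=None distances are Euclidean; otherwise they are measured
    after mapping every point through G (Mahalanobis with M = G^T G).
    """
    pts = [transform(G, x) for x in X] if G is not None else [list(x) for x in X]
    intra, inter = [], []
    for i, j in combinations(range(len(pts)), 2):
        d = math.dist(pts[i], pts[j])
        (intra if y[i] == y[j] else inter).append(d)
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))

# Toy data (hypothetical): classes differ along the first axis only.
X = [(0.0, 0.0), (0.0, 2.0), (1.0, 0.0), (1.0, 2.0)]
y = [0, 0, 1, 1]
G = [[3.0, 0.0], [0.0, 0.5]]  # stretch the informative axis, shrink the other

ratio_euclidean = intra_inter_ratio(X, y)
ratio_mahalanobis = intra_inter_ratio(X, y, G)
```

On this toy data the transformed space gives a clearly smaller ratio than the Euclidean one, which is the kind of effect the figure compares.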

15 kNN Error Ratio [Figure: kNN error rate, LMNN (Mahalanobis) vs. Euclidean metrics, on the Mnist, ISOLET, Bal, Faces, and Iris datasets; k = 5, number of target neighbors = 3.]

16 Statistics
[Table: for the Mnist, Letters, Isolet, Bal, Wines, and Iris datasets, the rows #inputs, #features, #reduced dimensions, #training examples, #testing examples, and #classes, followed by a kNN comparison of Euclidean, PCA, RCA, MMC, NCA, and LMNN (PCA, Multiple Passes). MMC and NCA are marked N/A for three datasets.]
Weinberger, K. Q. and L. K. Saul (2009). "Distance metric learning for large margin nearest neighbor classification." The Journal of Machine Learning Research 10:

17 Image Database Classification using kNN. Problem: exhaustive search is intractable because of its time complexity. Solution: Approximate Nearest Neighbor (ANN) search.

18 Outline 1) METRIC LEARNING 2) INDEXING 3) OBJECTIVES and CONTRIBUTIONS 4) EXPERIMENTAL RESULTS 5) DISCUSSION, CONCLUSION and FUTURE WORKS

19 2. Locality Sensitive Hashing
Idea: use hash functions under which similar objects are more likely to have the same hash [1], which gives sub-linear time search.
Hashing methods are used to do fast Approximate Nearest Neighbor (ANN) search. Approximation ratio; P = 4.
LSH families have been designed for: cosine similarity, the L_p distance measure, Hamming distance, and the Jaccard index for set similarity.
[Figure: query Q with radius r.]
[1] [Indyk-Motwani 98]

20 Example: LSH. Take random projections of the data, then quantize each projection with a few bits. [Figure: feature vector.]

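21 Cosine Similarity LSH
r is a d-dimensional random hyperplane drawn from a Gaussian distribution.
Basic hashing function [1]: $h_r(x) = 1$ if $r^T x \ge 0$, and $0$ otherwise.
Learned hashing function: $h_{r,G}(x) = 1$ if $r^T G x \ge 0$, and $0$ otherwise.
$h_{r_1}, \dots, h_{r_b}$ is a series of b randomized LSH functions.
[Figure: the image database and the query Q are hashed with $h_{r_1}, \dots, h_{r_4}$, in the learned case after applying G.]
In both cases only the colliding instances are searched, and these are far fewer than n.
[1] Jain, B. Kulis, and K. Grauman. Fast Image Search for Learned Metrics. In CVPR,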
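A minimal sketch of random-hyperplane hashing may help here: each of b Gaussian hyperplanes contributes one sign bit, and points sharing the full b-bit signature land in the same bucket. Passing a matrix G maps the input through G before hashing, which is how the learned variant is assumed to work in this example. The function names, the seed, and the toy points are hypothetical.

```python
import random

def random_hyperplanes(b, d, seed=0):
    """Draw b random d-dimensional hyperplane normals with Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(b)]

def signature(x, planes, G=None):
    """b-bit signature: bit is 1 if r . x >= 0 (or r . Gx >= 0 when G is given)."""
    if G is not None:
        x = [sum(g * v for g, v in zip(row, x)) for row in G]
    return tuple(1 if sum(r_i * x_i for r_i, x_i in zip(r, x)) >= 0 else 0
                 for r in planes)

def build_table(points, planes, G=None):
    """Group point indices into buckets keyed by signature."""
    table = {}
    for idx, p in enumerate(points):
        table.setdefault(signature(p, planes, G), []).append(idx)
    return table

def candidates(query, table, planes, G=None):
    """Indices of the points that collide with the query."""
    return table.get(signature(query, planes, G), [])

# Toy data (hypothetical).
points = [(1.0, 0.2, 0.0), (0.9, 0.3, 0.1), (-1.0, 0.5, 2.0)]
planes = random_hyperplanes(b=8, d=3)
table = build_table(points, planes)
```

Only the bucket returned by `candidates` needs an exact distance computation, which is where the saving over an exhaustive scan comes from. Because the bits depend only on signs, scaling a vector does not change its signature.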

22 Euclidean Space Hashing
Basic Euclidean space: $h_{a,b}(x) = \lfloor (a^T x + b) / w \rfloor$
Learned Euclidean space: $h_{a,b}(x) = \lfloor (a^T G x + b) / w \rfloor$
Here a is a d-dimensional vector whose entries are chosen independently from a p-stable distribution. It defines a random line that is partitioned into equal-width segments of width w, and b is a real number chosen randomly from the range [0, w].
To guarantee accuracy, L hash tables are used to probe for near neighbors in each bucket, with K hash functions per table.
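The scheme above can be sketched as a small index with L tables and K hash functions per table. The example assumes Gaussian entries for a (the Gaussian distribution is 2-stable, which matches the Euclidean case) and, for the learned variant, assumes the point is mapped through a square matrix G before hashing. The class name, parameters, and toy data are hypothetical and do not reflect the thesis implementation.

```python
import math
import random

class E2LSHIndex:
    """L hash tables, each keyed by K floor((a . x + b) / w) values."""

    def __init__(self, d, K, L, w, G=None, seed=0):
        rng = random.Random(seed)
        self.w = w
        self.G = G  # optional square d x d map applied before hashing
        # One (a, b) pair per hash function: a is Gaussian, b uniform in [0, w].
        self.funcs = [[([rng.gauss(0.0, 1.0) for _ in range(d)], rng.uniform(0.0, w))
                       for _ in range(K)] for _ in range(L)]
        self.tables = [dict() for _ in range(L)]

    def _map(self, x):
        if self.G is None:
            return list(x)
        return [sum(g * v for g, v in zip(row, x)) for row in self.G]

    def _key(self, z, t):
        return tuple(math.floor((sum(ai * zi for ai, zi in zip(a, z)) + b) / self.w)
                     for a, b in self.funcs[t])

    def add(self, idx, x):
        z = self._map(x)
        for t, table in enumerate(self.tables):
            table.setdefault(self._key(z, t), []).append(idx)

    def query(self, x):
        """Union of the buckets the query falls into across all L tables."""
        z = self._map(x)
        found = set()
        for t, table in enumerate(self.tables):
            found.update(table.get(self._key(z, t), []))
        return found

# Toy data (hypothetical).
points = [(0.0, 0.0), (0.1, 0.1), (9.0, 9.0), (9.1, 8.9)]
index = E2LSHIndex(d=2, K=2, L=3, w=4.0)
for i, p in enumerate(points):
    index.add(i, p)
```

Increasing K makes each bucket more selective, while increasing L raises the chance that a true near neighbor collides with the query in at least one table.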

23 Euclidean Space Hashing [Figure: an image database with no indexing involved, next to L = 3 hash tables (L1, L2, L3), where L is the number of hash tables and K is the number of hash functions. Each table maps keys to values, the stored points X, Y, W, R, and S, and the query Q is looked up in the tables.]

24 Outline 1) METRIC LEARNING 2) INDEXING 3) OBJECTIVES and CONTRIBUTIONS 4) EXPERIMENTAL RESULTS 5) DISCUSSION, CONCLUSION and FUTURE WORKS

25 3. OBJECTIVES and CONTRIBUTIONS
The main objectives of this thesis are: to study and implement a metric learning algorithm, a dimension reduction technique, and LSH in different metric spaces; and to establish and implement machine learning evaluation techniques.
The original contribution of the thesis is to formulate a new learned approach for both cosine similarity and Euclidean metric space hashing.

26 Outline 1) METRIC LEARNING 2) INDEXING 3) OBJECTIVES and CONTRIBUTIONS 4) EXPERIMENTAL RESULTS 5) DISCUSSION, CONCLUSION and FUTURE WORKS

27 [Diagram: experimental pipeline] The labeled dataset is split into a training set (build, 90%) and a query set (test, 10%), then normalized and reduced in dimension. LMNN is plugged in to learn the best PSD matrix M, which is decomposed as M = G^T G. Hashing (LSH) is applied in two forms: cosine similarity hashing (basic and learned) and Euclidean space hashing (original and learned). The model is tested and evaluated on time complexity, computational complexity, and query accuracy.

28 Time Complexity
[Figure: exhaustive search vs. Euclidean space hashing, 3NN time (Msecond), on the LetterRecognition, Isolet, and Mnist datasets.]
[Table: instances and dimension for the LetterRecognition, Isolet, and Mnist datasets. Instances 20, ,000 Dimension]

29 Computation Complexity
We use [...] to guarantee searching of NN.

| Steps | CosineLSH | CosineLSH + LMNN | E2LSH | LMNN + E2LSH | Euclidean | LMNN |
|---|---|---|---|---|---|---|
| Metric learning (offline) projection | O(d) | O(d^2) | O(d) | O(d^2) | O(d) | O(d^2) |
| Hash functions | O(b) | O(b) | O(Lk) | O(Lk) | O(0) | O(0) |
| Signature (to represent the data point) | O(1) | O(1) | O(L) | O(L) | O(1) | O(1) |
| Hashing: compute | O(bd) | O(bd) | O(dLk) | O(dLk) | O(0) | O(0) |
| Search: identify the query's ANNs | O(Md) | O(Md) | O(LMd) | O(LMd) | O(dN) | O(dN) |

M. S. Charikar, "Similarity estimation techniques from rounding algorithms," in Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, 2002, pp

30 [Diagram: the experimental pipeline of slide 27, with the cosine similarity hashing variants labeled randomized and learned; K = 10.]

31 Accuracy Rate (Cosine Similarity Hashing) [Figure: accuracy rate against number of bits on ISOLET, randomized vs. learned hashing.]

32 Accuracy Rate (Cosine Similarity Hashing) [Figure: accuracy rate against number of bits on MNIST, randomized vs. learned hashing.]

33 Accuracy Rate (Cosine Similarity Hashing) [Figure: accuracy rate against number of bits on Cifar, randomized vs. learned hashing.]

34 Accuracy Rate (Euclidean Space Hashing) [Figure: accuracy rate, E2LSH vs. LMNN+E2LSH, on the cifar-100, Letter, Mnist, ISOLET, and OliverFaces datasets.]

35 Summary [Figure: time against kNN classification accuracy for three groups of methods: exhaustive techniques (Euclidean, LMNN), Euclidean space hashing (E2LSH, LMNN + E2LSH), and cosine space hashing (basic cosine, learned cosine).]

36 Outline 1) METRIC LEARNING 2) INDEXING 3) OBJECTIVES and CONTRIBUTIONS 4) EXPERIMENTAL RESULTS 5) DISCUSSION, CONCLUSION and FUTURE WORKS

37 5. DISCUSSION and CONCLUSION
Metric learning (LMNN) is used to learn a distance metric in the Mahalanobis metric space.
E2LSH outperforms both unlearned and learned cosine similarity hashing.
Incorporating a metric learning algorithm (LMNN) into both kinds of metric space hashing (cosine and Euclidean, E2LSH) can improve performance significantly.
Incorporating LMNN into E2LSH (LMNN + E2LSH) improves on E2LSH.

38 5. DISCUSSION and CONCLUSION
The main goal of this research is to devise a classifier by combining a metric learning algorithm with a hashing technique in the context of large-scale image classification.
The implementation uses Java and the Eclipse IDE.

39 FUTURE WORKS
Extend this research by adding a feature extraction technique on top, to set up the input data.
Use other LMNN extension algorithms and compare them with the results of this thesis.
Advance this study using unsupervised (clustering) metric learning algorithms.

40 THANK YOU! Q & A

Metric Learning Applied for Automatic large Scale Image Classification

Universitat Politècnica de Catalunya (UPC) Master in Information Technology for Business Intelligence Metric Learning Applied for Automatic large Scale Image Classification THESIS SUBMITTED IN PARTIAL