Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost

Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost
Florent Perronnin (1)
Work published at ECCV 2012 with Thomas Mensink (1,2), Jakob Verbeek (2) and Gabriela Csurka (1)
(1) Xerox Research Centre Europe, (2) INRIA
NIPS BigVision Workshop, December 7, 2012

Motivation
Real-life image datasets are always evolving:
- new images are added every second
- new labels, tags, faces and products appear over time
- for example: Facebook, Flickr, Twitter, Amazon...
We need to annotate these items for indexing and retrieval.
We are therefore interested in large-scale visual classification methods where new images and new classes can be added on the fly, at near-zero cost.

Outline
1. Introduction
2. Distance-Based Classifiers
3. Metric Learning for the NCM Classifier
4. Experimental Evaluation
5. Conclusion

Introduction
Recent focus on large-scale image classification:
- ImageNet dataset [1]: currently over 14 million images and 20 thousand classes
Standard large-scale classification pipeline:
- high-dimensional features: Super Vector [3] and Fisher Vector [4]
- linear 1-vs-Rest SVM classifiers [2,3,4]
- Stochastic Gradient Descent (SGD) training [3,4]
In this work, we take the features for granted and focus on the learning problem.
1. Deng et al., ImageNet: A large-scale hierarchical image database, CVPR 09
2. Deng et al., What does classifying more than 10,000 image categories tell us?, ECCV 10
3. Lin et al., Large-scale image classification: Fast feature extraction and SVM training, CVPR 11
4. Sánchez and Perronnin, High-dimensional signature compression for large-scale image classification, CVPR 11

Challenges of open-ended datasets
1-vs-Rest + SGD might look ideal for our problem:
- 1-vs-Rest: classes are trained independently
- SGD: an online algorithm can accommodate new data
Still, several issues need to be addressed:
- Given a new sample, should it be fed to all classifiers? This is costly and suboptimal [1].
- How to balance the negatives and positives?
- How to regularize (and choose the step size)?
We therefore turn to distance-based classifiers.
1. Perronnin et al., Towards good practice in large-scale learning for image classification, CVPR 12

Distance-Based Classifiers
Classify based on the distance between images, or between an image and class representatives:
- k-Nearest Neighbors
- Nearest Class Mean classification
Adding new images or new classes is trivial.
Performance depends critically on the distance function.

k-Nearest Neighbor Classifier
Assign an image i to the most common class among the k closest images from the training set.
- Very flexible non-linear model
- Easy to integrate new images
- Easy to integrate new classes
- Expensive at test time!
Metric learning: Large Margin Nearest Neighbors (LMNN) [1]
1. Weinberger et al., Distance Metric Learning for Large Margin Nearest Neighbor Classification, NIPS 06
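The k-NN rule can be sketched in a few lines. This is a minimal, illustrative version assuming plain squared Euclidean distance (the ℓ2 baseline, no learned metric) and an in-memory list of `(features, label)` pairs; `knn_predict` and `sq_dist` are hypothetical helper names. It makes the trade-off above concrete: adding an image or a class is a list append, while every prediction scans the whole training set.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (the metric-free l2 baseline)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_predict(train: List[Tuple[Vector, Hashable]], x: Vector, k: int = 3) -> Hashable:
    """Return the most common label among the k training images closest to x.

    Each call sorts the full training set by distance, which is why k-NN is
    expensive at test time.
    """
    neighbors = sorted(train, key=lambda item: sq_dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common keeps first-seen order on ties, so the closer neighbor's class wins
    return votes.most_common(1)[0][0]


train = [([0.0, 0.0], "cat"), ([0.1, 0.2], "cat"), ([1.0, 1.0], "dog"), ([0.9, 1.1], "dog")]
print(knn_predict(train, [0.2, 0.1], k=3))   # cat
train.append(([5.0, 5.0], "bird"))            # a new class: no retraining needed
print(knn_predict(train, [5.1, 4.9], k=1))   # bird
```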

Nearest Class Mean Classifier
Assign an image i to the class with the closest class mean:
$\mu_c = \frac{1}{N_c} \sum_{i : y_i = c} x_i, \qquad c^* = \arg\min_c d(x, \mu_c)$
- Very fast at test time: linear model
- Easy to integrate new images
- Easy to integrate new classes
- Each class is represented only by its mean: not flexible enough?
We introduce metric learning.
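A minimal sketch of the NCM classifier, assuming squared Euclidean distance and running per-class sums so that each mean $\mu_c$ can be updated incrementally. The class `NearestClassMean` and its methods are hypothetical names; with a learned metric one would first project each image with $W$ before calling `add` and `predict`.

```python
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


class NearestClassMean:
    """NCM classifier: each class is represented only by the mean of its images."""

    def __init__(self) -> None:
        self.sums: Dict[Hashable, List[float]] = {}
        self.counts: Dict[Hashable, int] = {}

    def add(self, x: Vector, label: Hashable) -> None:
        """Add one image; a previously unseen label simply creates a new class."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + xi for s, xi in zip(self.sums[label], x)]
        self.counts[label] += 1

    def mean(self, label: Hashable) -> List[float]:
        """mu_c = (1 / N_c) * sum of the images with label c."""
        n = self.counts[label]
        return [s / n for s in self.sums[label]]

    def predict(self, x: Vector) -> Hashable:
        """c* = argmin_c d(x, mu_c), here with the squared Euclidean distance."""
        def dist(label: Hashable) -> float:
            return sum((xi - mi) ** 2 for xi, mi in zip(x, self.mean(label)))
        return min(self.sums, key=dist)


ncm = NearestClassMean()
for x, y in [([0.0, 0.0], "cat"), ([0.2, 0.0], "cat"), ([1.0, 1.0], "dog")]:
    ncm.add(x, y)
print(ncm.mean("cat"))            # [0.1, 0.0]
print(ncm.predict([0.3, 0.2]))    # cat
ncm.add([4.0, 4.0], "bird")       # a new class costs one running-sum update
print(ncm.predict([3.5, 3.8]))    # bird
```

Prediction costs one distance per class, independent of the number of training images, which is the source of the speed advantage over k-NN.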

Mahalanobis Distance Learning
$d(x, x') = (x - x')^\top M (x - x')$, and with $M = W^\top W$: $d_W(x, x') = \|W x - W x'\|_2^2$
1. $M = I$: Euclidean distance
   - likely to be suboptimal
2. $M$ of size $D \times D$: full Mahalanobis distance
   - huge number of parameters for large D
   - expensive to compute distances, in $O(D^2)$
3. $M = W^\top W$: low-rank projection, with $W$ of size $m \times D$
   - controllable number of parameters: $mD$
   - allows compressing images to only m dimensions
   - cheap computation of distances between projected images, in $O(m)$
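The equivalence between the low-rank Mahalanobis form and the Euclidean distance after projection can be checked numerically. The sketch below uses a hypothetical 2 x 3 matrix `W` and plain Python lists (no linear-algebra library assumed); all helper names are illustrative.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(A: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def mahalanobis(x: Sequence[float], y: Sequence[float], M: Matrix) -> float:
    """d(x, x') = (x - x')^T M (x - x'): O(D^2) work for a full D x D matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(d * md for d, md in zip(diff, matvec(M, diff)))


def low_rank_distance(x: Sequence[float], y: Sequence[float], W: Matrix) -> float:
    """d_W(x, x') = ||W x - W x'||^2 with W of size m x D.

    Projecting an image costs O(mD) and can be done once and stored; comparing
    two stored projections then costs only O(m).
    """
    px, py = matvec(W, x), matvec(W, y)
    return sum((a - b) ** 2 for a, b in zip(px, py))


W = [[1.0, 0.0, 2.0],          # hypothetical projection, m = 2, D = 3
     [0.0, 1.0, -1.0]]
M = matmul(transpose(W), W)    # M = W^T W: positive semi-definite, rank <= m
x, y = [1.0, 2.0, 3.0], [0.0, 1.0, 1.0]
print(low_rank_distance(x, y, W), mahalanobis(x, y, M))   # 26.0 26.0
```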

NCM Metric Learning (NCMML)
Probabilistic formulation using the soft-min function:
$p(c \mid x) = \frac{\exp(-d_W(x, \mu_c))}{\sum_{c'=1}^{C} \exp(-d_W(x, \mu_{c'}))}$
- Corresponds to the class posterior in a generative model $p(x \mid c) = \mathcal{N}(x; \mu_c, \Sigma)$ with a shared covariance matrix $\Sigma$
- Crucial point: the parameters $W$ and $\{\mu_c, c = 1, \dots, C\}$ can be learned independently, on different data subsets.
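A small sketch of the soft-min posterior, written against a hypothetical list of precomputed distances $d_W(x, \mu_c)$. Subtracting the smallest distance before exponentiating is a standard log-sum-exp trick added here for numerical stability; it leaves the probabilities unchanged and is an implementation choice, not something the slides specify.

```python
import math
from typing import List, Sequence


def ncm_posteriors(dists: Sequence[float]) -> List[float]:
    """p(c|x) = exp(-d_c) / sum_c' exp(-d_c'), where d_c = d_W(x, mu_c).

    Shifting every distance by the minimum cancels in the ratio but prevents
    exp() from underflowing to zero when all distances are large.
    """
    d_min = min(dists)
    weights = [math.exp(-(d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


probs = ncm_posteriors([1.0, 1.0 + math.log(3.0)])
print([round(p, 6) for p in probs])        # [0.75, 0.25]
print(ncm_posteriors([1000.0, 1001.0]))    # still well defined despite large distances
```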

NCM Metric Learning (NCMML)
Discriminative maximum-likelihood training; we maximize with respect to $W$:
$\mathcal{L}(W) = \sum_{i=1}^{N} \ln p(y_i \mid x_i)$
- Implicit regularization through the rank of $W$
Stochastic Gradient Descent (SGD), at time t:
- pick a random sample $(x_t, y_t)$
- update $W^{(t)} = W^{(t-1)} + \eta_t \nabla_W \ln p(y_t \mid x_t) \big|_{W = W^{(t-1)}}$
- mini-batches are more efficient
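The SGD update needs $\nabla_W \ln p(y \mid x)$. Writing $z_c = x - \mu_c$ and differentiating the soft-min posterior gives $2 \sum_c (p(c \mid x) - [c = y]) (W z_c) z_c^\top$. The sketch below implements this for a single sample, assuming the class means stay fixed while $W$ is learned, and checks one entry against a central finite difference. The function names and toy numbers are hypothetical; a real implementation would vectorize this and average over mini-batches.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def project(W: Matrix, v: Sequence[float]) -> List[float]:
    """Compute W v for an m x D matrix W."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def log_posterior_and_grad(W: Matrix, means: List[List[float]],
                           x: Sequence[float], y: int) -> Tuple[float, Matrix]:
    """Return ln p(y|x) under the soft-min NCM model and its gradient w.r.t. W.

    With z_c = x - mu_c and d_c = ||W z_c||^2, the gradient is
        2 * sum_c (p(c|x) - [c == y]) * (W z_c) z_c^T.
    """
    zs = [[xi - mi for xi, mi in zip(x, mu)] for mu in means]
    wzs = [project(W, z) for z in zs]
    dists = [sum(v * v for v in wz) for wz in wzs]
    d_min = min(dists)                                    # shift for numerical stability
    weights = [math.exp(-(d - d_min)) for d in dists]
    total = sum(weights)
    probs = [w / total for w in weights]
    log_p = -(dists[y] - d_min) - math.log(total)
    grad = [[0.0] * len(x) for _ in W]
    for c, (z, wz) in enumerate(zip(zs, wzs)):
        alpha = 2.0 * (probs[c] - (1.0 if c == y else 0.0))
        for j in range(len(W)):
            for k in range(len(x)):
                grad[j][k] += alpha * wz[j] * z[k]       # outer product (W z_c) z_c^T
    return log_p, grad


def sgd_step(W: Matrix, means: List[List[float]], x: Sequence[float],
             y: int, lr: float) -> Matrix:
    """One step W <- W + eta_t * grad_W ln p(y_t|x_t) (gradient ascent)."""
    _, grad = log_posterior_and_grad(W, means, x, y)
    return [[w + lr * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W, grad)]


W = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]                 # hypothetical 2 x 3 projection
means = [[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
x, y = [0.2, 0.1, 0.9], 1
log_p, grad = log_posterior_and_grad(W, means, x, y)

# Central finite difference on one entry of W, to check the analytic gradient.
eps = 1e-5
W_hi = [row[:] for row in W]
W_hi[0][2] += eps
W_lo = [row[:] for row in W]
W_lo[0][2] -= eps
numeric = (log_posterior_and_grad(W_hi, means, x, y)[0]
           - log_posterior_and_grad(W_lo, means, x, y)[0]) / (2 * eps)
print(abs(numeric - grad[0][2]) < 1e-6)                  # True
```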

[Figure: Illustration of learned distances]

Relationship to FDA
Example: three classes that are not linearly separable.
- Fisher Discriminant Analysis (FDA) maximizes the variance between all class means.
- NCMML maximizes the variance between nearby class means.

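Relation to other linear classifiers
Each classifier below scores class c with a linear function $f_c(x) = b_c + w_c^\top x$:
- Linear SVM: learn $\{b_c, w_c\}$ per class.
- WSABIE [1]: $w_c^\top = v_c^\top W$ with $W \in \mathbb{R}^{d \times D}$; learn $\{v_c\}$ per class and a shared $W$.
- Nearest Class Mean: $b_c = -\|W \mu_c\|_2^2$, $w_c^\top = 2 \mu_c^\top W^\top W$; learn only a shared $W$. This holds because $-d_W(x, \mu_c) = -\|W x\|_2^2 + b_c + w_c^\top x$, and the first term is the same for every class.
1. Weston et al., Scaling up to large vocabulary image annotation, IJCAI 11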
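The NCM parameterization follows from expanding the squared distance. The sketch below checks it with a hypothetical projection `W` and toy class means: the class with the smallest $d_W(x, \mu_c)$ is also the class with the largest linear score $f_c(x)$. All names are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def project(W: Matrix, v: Sequence[float]) -> List[float]:
    """Compute W v for an m x D matrix W."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def d_W(W: Matrix, x: Sequence[float], mu: Sequence[float]) -> float:
    """Squared distance ||W x - W mu||^2 in the projected space."""
    return sum((a - b) ** 2 for a, b in zip(project(W, x), project(W, mu)))


def ncm_as_linear(W: Matrix, mu: Sequence[float]) -> Tuple[float, List[float]]:
    """Return (b_c, w_c) with b_c = -||W mu_c||^2 and w_c = 2 W^T W mu_c."""
    w_mu = project(W, mu)                                  # W mu_c, length m
    b = -sum(v * v for v in w_mu)
    w = [2.0 * sum(W[j][k] * w_mu[j] for j in range(len(W))) for k in range(len(mu))]
    return b, w


def linear_score(b: float, w: Sequence[float], x: Sequence[float]) -> float:
    """f_c(x) = b_c + w_c^T x."""
    return b + sum(wi * xi for wi, xi in zip(w, x))


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]                    # hypothetical 2 x 3 projection
means = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = [0.9, 0.8, 0.1]

params = [ncm_as_linear(W, mu) for mu in means]
by_distance = min(range(len(means)), key=lambda c: d_W(W, x, means[c]))
by_linear = max(range(len(means)), key=lambda c: linear_score(*params[c], x))
print(by_distance, by_linear)                              # 1 1
```

Because the class-independent term $-\|Wx\|_2^2$ drops out of the argmax, NCM can be evaluated at test time as a linear classifier with one dot product per class.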

Experimental Evaluation
Datasets:
- ILSVRC 10: 1,000 classes; 1.2M training + 50K validation + 150K test images
- INET10K: 10K classes; 4.5M training + 50K validation + 4.5M test images
Features:
- 4K- and 64K-dimensional Fisher Vectors [1]
- PQ compression of the 64K-dimensional features [2]
1. Perronnin et al., Improving the Fisher kernel for large-scale image classification, ECCV 10
2. Jégou et al., Product quantization for nearest neighbor search, PAMI 11

Evaluation: ILSVRC 10 (top-5 accuracy, %)
- k-NN and NCM both improve with metric learning
- NCM outperforms the more flexible k-NN
- NCM is competitive with SVM and WSABIE
4K Fisher Vectors; columns 256, 512 and 1024 give the projection dimensionality, ℓ2 is the plain Euclidean distance.
Method                       256    512    1024   ℓ2
k-NN, LMNN [1] (dynamic)     61.0   60.9   59.6   44.1
NCM, learned metric          62.6   63.0   63.0   32.0
WSABIE [2]                   61.6   61.3   61.5   -
Baseline: 1-vs-Rest SVM      61.8
1. Weinberger et al., Distance Metric Learning for Large Margin Nearest Neighbor Classification, NIPS 06
2. Weston et al., Scaling up to large vocabulary image annotation, IJCAI 11

Generalization on INET10K (top-1 accuracy, %)
Nearest Class Mean classifier:
- compute the means of the 10K classes in about 1 CPU hour
- re-use the metric learned on ILSVRC 10
1-vs-Rest SVM baseline:
- train 10K SVM classifiers in about 280 CPU days
Method     Feat. dim.   Flat top-1
NCM        64K          13.9
SVM        64K          21.9
SVM [1]    21K           6.4
SVM [2]    128K         19.1
DL [3]     60K          19.2
1. Deng et al., What does classifying more than 10,000 image categories tell us?, ECCV 10
2. Perronnin et al., Towards good practice in large-scale learning for image classification, CVPR 12
3. Le et al., Building high-level features using large scale unsupervised learning, ICML 12

Transfer Learning - Zero-Shot Prior
Use the ImageNet class hierarchy to estimate a mean for a new class [1].
[Figure: class hierarchy showing internal nodes, training nodes and a new class]
1. Rohrbach et al., Evaluating knowledge transfer and zero-shot learning in a large-scale setting, CVPR 11

Transfer Learning - Results
ILSVRC 10:
- Step 1: metric learning on 800 classes
- Step 2: estimate the means of the remaining 200 classes, used for evaluation, either as
  - the data mean (maximum likelihood), or
  - the zero-shot prior combined with the data mean (maximum a posteriori)
[Figure: top-5 accuracy (0 to 80%) versus number of samples per class (0, 1, 10, 100, 1000)]
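One way to combine a zero-shot prior with observed samples is the posterior mean of a Gaussian model with a conjugate prior: $\mu_c = (n_c \bar{x}_c + m \mu_0) / (n_c + m)$, where $\bar{x}_c$ is the data mean of the $n_c$ samples, $\mu_0$ the prior mean and $m$ the prior weight. This is a hedged sketch: the prior weight `m`, the choice of $\mu_0$ as the average of sibling-class means in the hierarchy, and all function names are assumptions for illustration, not necessarily the exact estimator used in these experiments.

```python
from typing import List, Sequence


def mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def zero_shot_prior(sibling_means: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical prior: average of the means of training classes that share
    the new class's parent node in the class hierarchy."""
    return mean(sibling_means)


def map_class_mean(samples: Sequence[Sequence[float]], prior_mean: Sequence[float],
                   m: float = 1.0) -> List[float]:
    """MAP estimate (n * data_mean + m * prior_mean) / (n + m).

    With n = 0 this is the pure zero-shot prior; as n grows it approaches the
    maximum-likelihood data mean. m is a hypothetical prior weight.
    """
    n = len(samples)
    if n == 0:
        return list(prior_mean)
    data_mean = mean(samples)
    return [(n * d + m * p) / (n + m) for d, p in zip(data_mean, prior_mean)]


prior = zero_shot_prior([[0.0, 2.0], [2.0, 0.0]])        # [1.0, 1.0]
print(map_class_mean([], prior))                          # [1.0, 1.0]
print(map_class_mean([[3.0, 3.0]], prior, m=1.0))         # [2.0, 2.0]
print(map_class_mean([[3.0, 3.0]] * 9, prior, m=1.0))     # close to the data mean
```

The prior matters most when only a handful of samples per class are available, which matches the small-sample regime on the left of the plot.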

Conclusion
Nearest Class Mean (NCM) classification:
- We proposed NCM Metric Learning (NCMML)
- Outperforms k-NN; on par with SVM and WSABIE
Advantages of NCM over the alternatives:
- Allows adding new images and classes at near-zero cost
- Shows competitive results on unseen classes
- Can benefit from class priors for small sample sizes
Further improvements:
- Extension using multiple class centroids [1]
1. Mensink et al., Large Scale Metric Learning for Distance-Based Image Classification, Tech. report, 2012