Mixtures of Gaussians and Advanced Feature Encoding


Mixtures of Gaussians and Advanced Feature Encoding. Computer Vision, Ali Borji, UWM. Many slides from James Hayes, Derek Hoiem, Florent Perronnin, and Hervé

Why do good recognition systems go bad? E.g., why isn't our Bag of Words classifier at 90% instead of 70%? Training data: a huge issue, but not necessarily a variable you can manipulate. Learning method: probably not such a big issue, unless you're learning the representation (e.g. deep learning). Representation: are the local features themselves lossy? What about feature quantization? That's VERY lossy.

Standard k-means Bag of Words. http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

Today's Class. More advanced quantization/encoding methods that represent the state of the art in image classification and image retrieval: soft assignment (a.k.a. kernel codebook), VLAD, the Fisher vector, and mixtures of Gaussians.

Motivation. Bag of Visual Words only counts the number of local descriptors assigned to each Voronoi region. Why not include other statistics? http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

We already looked at the Spatial Pyramid (levels 0, 1, and 2), but today we're not talking about ways to preserve spatial information.

Motivation. Bag of Visual Words only counts the number of local descriptors assigned to each Voronoi region. Why not include other statistics? For instance: the mean of the local descriptors, or the (co)variance of the local descriptors. http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

Simple case: Soft Assignment. Called "kernel codebook" encoding by Chatfield et al., 2011. Cast a weighted vote into the most similar clusters. This is fast and easy to implement (try it for HW 5), but it does have a downside for image retrieval: the inverted file index becomes less sparse.
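A minimal NumPy sketch of this encoding (the function and parameter names here are illustrative, not from any particular package): each descriptor distributes one unit of vote mass across all codewords with Gaussian weights, instead of a single hard vote.

```python
import numpy as np

def kernel_codebook_encode(descriptors, centers, sigma=0.5):
    """Soft-assignment (kernel codebook) encoding: every descriptor casts a
    Gaussian-weighted vote into every cluster rather than one hard vote.

    descriptors: (T, D) local descriptors; centers: (K, D) codebook."""
    # Squared Euclidean distance between every descriptor and every center: (T, K)
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Gaussian kernel weights, normalized per descriptor so each votes a total of 1
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    # The encoding is the sum of soft votes over all descriptors
    return w.sum(axis=0)
```

As sigma shrinks toward 0 the soft votes concentrate on the nearest center and the result approaches the hard BOW histogram.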

A first example: the VLAD. Given a codebook {µ_1, ..., µ_5}, e.g. learned with k-means, and a set of local descriptors x: (1) assign each x to its nearest centroid µ_i; (2) compute the residual x - µ_i; (3) sum the residuals per cell, v_i = sum of (x - µ_i) over descriptors assigned to cell i; (4) concatenate the v_i's (v_1, v_2, v_3, v_4, v_5) and normalize. Jégou, Douze, Schmid and Pérez, Aggregating local descriptors into a compact image representation, CVPR'10.

A first example: the VLAD. [Figure: a graphical representation of VLAD descriptors.] From Jégou, Douze, Schmid and Pérez, Aggregating local descriptors into a compact image representation, CVPR'10.
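The four steps above can be sketched in NumPy as follows (a simplified illustration with hypothetical names; the final normalization here is plain L2, one of several variants used in practice):

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """VLAD: hard-assign each descriptor to its nearest centroid, sum the
    residuals x - mu_i per Voronoi cell, concatenate, then L2-normalize."""
    K, D = centers.shape
    # (1) hard assignment to the nearest centroid
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nn = d2.argmin(axis=1)
    # (2)-(3) accumulate residuals x - mu_i per cell
    v = np.zeros((K, D))
    for i in range(K):
        assigned = descriptors[nn == i]
        if len(assigned):
            v[i] = (assigned - centers[i]).sum(axis=0)
    # (4) concatenate and L2-normalize
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

The output is a fixed K*D-dimensional vector regardless of how many descriptors the image has.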

The Fisher vector: score function. Given a likelihood function u_λ with parameters λ, the score function of a given sample X is the gradient of its log-likelihood: G_λ^X = ∇_λ log u_λ(X). This is a fixed-length vector whose dimensionality depends only on the number of parameters. Intuition: it is the direction in which the parameters λ of the model should be modified to better fit the data.

Aside: Mixture of Gaussians (GMM). For Fisher vector image representations, u_λ is a GMM. A GMM can be thought of as a soft k-means. Each component has a mixture weight (e.g. 0.4, 0.5, 0.05, 0.05), a mean, and a standard deviation along each direction (or a full covariance).

This looks like a soft version of k-means.
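The soft assignment γ_t(i) that makes a GMM a "soft k-means" can be computed as below (a NumPy sketch assuming diagonal covariances, with illustrative names; log-space normalization is used for numerical stability):

```python
import numpy as np

def gmm_posteriors(x, weights, means, variances):
    """Soft assignment gamma_t(i) of each descriptor to each diagonal-covariance
    Gaussian component -- the 'soft k-means' view of a GMM.

    x: (T, D); weights: (K,); means: (K, D); variances: (K, D)."""
    # Log-density of each descriptor under each component (diagonal covariance)
    diff2 = (x[:, None, :] - means[None, :, :]) ** 2                      # (T, K, D)
    log_p = -0.5 * (np.log(2 * np.pi * variances)[None]
                    + diff2 / variances[None]).sum(axis=2)               # (T, K)
    log_p += np.log(weights)[None]
    # Normalize in log space: gamma_t(i) = w_i u_i(x_t) / sum_j w_j u_j(x_t)
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    return gamma / gamma.sum(axis=1, keepdims=True)
```

Each row of the result sums to 1: every descriptor splits its membership across the Gaussians, just as in the soft-assignment encoding.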

The Fisher vector: relationship with the BOW. In the FV formulas, the gradient w.r.t. the mixture weights w gives a "soft BOW", where γ_t(i) is the soft assignment of patch t to Gaussian i. The gradients w.r.t. µ and σ, compared to the BOW, include higher-order statistics (up to order 2). Denoting D = feature dimension and N = number of Gaussians, the BOW is N-dimensional while the FV is 2DN-dimensional. Hence the FV is much higher-dimensional than the BOW for a given visual vocabulary size, yet much faster to compute than a BOW of the same feature dimension. Perronnin and Dance, Fisher kernels on visual categories for image categorization, CVPR'07.
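For reference, the FV gradients for a diagonal-covariance GMM are usually written as follows (this is the normalized form used in later FV papers; the exact normalization constants vary slightly across publications):

```latex
% Soft assignment of descriptor x_t to Gaussian i
\gamma_t(i) = \frac{w_i\, u_i(x_t)}{\sum_{j=1}^{N} w_j\, u_j(x_t)}

% Gradient w.r.t. the mean mu_i (one D-dim block of the 2DN-dim FV)
\mathcal{G}^X_{\mu_i} = \frac{1}{T\sqrt{w_i}} \sum_{t=1}^{T}
  \gamma_t(i)\, \frac{x_t - \mu_i}{\sigma_i}

% Gradient w.r.t. the standard deviation sigma_i (second-order statistics)
\mathcal{G}^X_{\sigma_i} = \frac{1}{T\sqrt{2 w_i}} \sum_{t=1}^{T}
  \gamma_t(i) \left[ \frac{(x_t - \mu_i)^2}{\sigma_i^2} - 1 \right]
```

Concatenating the µ and σ blocks over all N Gaussians gives the 2DN-dimensional vector mentioned above.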

The Fisher vector: dimensionality reduction on local descriptors. Perform PCA on the local descriptors: uncorrelated features are more consistent with the diagonal-covariance assumption of the GMM. The FK itself performs whitening and enhances low-energy (possibly noisy) dimensions.
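A sketch of that PCA step in plain NumPy (illustrative names; real pipelines also keep the mean and basis so query descriptors can be projected the same way):

```python
import numpy as np

def pca_project(descriptors, out_dim):
    """PCA on local descriptors before GMM fitting: the projected dimensions are
    uncorrelated, which better matches a diagonal-covariance GMM.
    Returns (projected data, mean, basis)."""
    mean = descriptors.mean(axis=0)
    x = descriptors - mean
    # Eigendecomposition of the covariance; eigh returns ascending eigenvalues
    cov = x.T @ x / len(x)
    eigvals, eigvecs = np.linalg.eigh(cov)
    basis = eigvecs[:, ::-1][:, :out_dim]   # top components, descending variance
    return x @ basis, mean, basis
```

After projection the per-dimension covariance of the training descriptors is diagonal, which is exactly the structure the GMM assumes.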

The Fisher vector: normalization for variance stabilization. Variance-stabilizing transforms of the form z ↦ sign(z)|z|^α (with α = 0.5 by default) can be applied to the FV (or the VLAD) to reduce the impact of bursty visual elements. Jégou, Douze, Schmid, On the burstiness of visual elements, ICCV'09.
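The transform is a one-liner; a sketch (illustrative function name) combining the power normalization with the usual L2 normalization:

```python
import numpy as np

def power_l2_normalize(v, alpha=0.5):
    """Variance-stabilizing power normalization sign(z)|z|^alpha followed by
    L2 normalization; damps large (bursty) components of an FV or VLAD."""
    v = np.sign(v) * np.abs(v) ** alpha
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

With alpha = 0.5 this is the common signed-square-root normalization; alpha = 1 leaves the vector unchanged apart from the L2 scaling.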

Datasets for image retrieval. INRIA Holidays dataset: 1,491 shots of personal holiday snapshots; 500 queries, each associated with a small number of relevant results (1-11); plus 1 million distractor images (with some false positives). Hervé Jégou, Matthijs Douze and Cordelia Schmid, Hamming Embedding and Weak Geometric Consistency for large-scale image search, ECCV'08.

Examples: retrieval on Holidays. From Jégou, Perronnin, Douze, Sánchez, Pérez and Schmid, Aggregating local descriptors into compact codes, TPAMI'11. Observations: second-order statistics are not essential for retrieval; even for the same feature dimension, the FV/VLAD can beat the BOW; soft assignment + whitening of the FV helps when the number of Gaussians increases; after dimensionality reduction, however, the FV and VLAD perform similarly.

Examples: classification on PASCAL VOC 2007. From Chatfield, Lempitsky, Vedaldi and Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, BMVC'11.

Method  Feature dim  mAP
VQ      25K          55.30
KCB     25K          56.26
LLC     25K          57.27
SV      41K          58.16
FV      132K         61.69

The FV outperforms the BOW-based techniques, including VQ (plain vanilla BOW), KCB (BOW with soft assignment), and LLC (BOW with sparse coding). Including 2nd-order information is important for classification.

Packages.
The INRIA package: http://lear.inrialpes.fr/src/inria_fisher/
The Oxford package: http://www.robots.ox.ac.uk/~vgg/research/encoding_eval/
VLFeat does it too: http://www.vlfeat.org

Summary. We've looked at methods to better characterize the distribution of visual words in an image: soft assignment (a.k.a. kernel codebook), VLAD, and the Fisher vector. A mixture of Gaussians is conceptually a soft form of k-means that can better model the data distribution.