Large-scale visual recognition: Efficient matching
Florent Perronnin (XRCE) and Hervé Jégou (INRIA)
CVPR tutorial, June 16, 2012

Outline
- Preliminary
- Locality Sensitive Hashing: the two modes
  - Hashing
  - Embedding
- Searching with Product Quantization

Finding neighbors
- Nearest neighbor search is a critical step in object recognition:
  - to compute the image descriptor itself, e.g., assignment with k-means to a large vocabulary;
  - to find the most similar images/patches in a database, for instance the closest one w.r.t. the Euclidean distance: NN(x) = argmin_y ||x - y||.
- Problems:
  - exact exhaustive search is a costly operation: O(n*d);
  - for high-dimensional vectors, the best exact-search strategy is the naive exhaustive comparison.
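For reference, the exhaustive baseline is a few lines of NumPy; the array sizes below are placeholders (much smaller than the 1M-vector setting discussed next) so the sketch runs quickly.

```python
import numpy as np

def knn_bruteforce(queries, database, k=10):
    """Exact k-NN by exhaustive comparison: O(n_queries * n_database * d)."""
    # Squared Euclidean distances via ||q - y||^2 = ||q||^2 - 2 q.y + ||y||^2
    q_norms = (queries ** 2).sum(axis=1)[:, None]
    y_norms = (database ** 2).sum(axis=1)[None, :]
    dists = q_norms - 2.0 * queries @ database.T + y_norms
    # Indices of the k smallest distances per query
    knn = np.argsort(dists, axis=1)[:, :k]
    return knn, np.take_along_axis(dists, knn, axis=1)

# Placeholder sizes (the slides use 1000 queries against 1M 128-D descriptors)
rng = np.random.default_rng(0)
database = rng.standard_normal((10_000, 128)).astype(np.float32)
queries = rng.standard_normal((100, 128)).astype(np.float32)
idx, d2 = knn_bruteforce(queries, database, k=10)
```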

The cost of (efficient) exact matching
- What about actual timings, with an efficient implementation?
- Finding the 10-NN of 1000 distinct queries in 1 million vectors (128-D Euclidean descriptors), i.e., 1 billion distances computed on an 8-core machine: 5.5 seconds.
- Assigning 2000 SIFT descriptors to a visual vocabulary of size k = 100,000: 1.2 seconds.
- Hamming distance, 1000 queries against 1M database vectors: computing the 1 billion distances takes 0.6 seconds.

Need for approximate nearest neighbors
- 1 million images, 1000 descriptors per image:
  - 1 billion distances per local descriptor, i.e., 10^12 distances in total;
  - about 1 hour 30 minutes to perform one query with Euclidean vectors.
- To improve scalability, we only require the nearest neighbors to be found with high probability: approximate nearest neighbor (ANN) search.
- Three (contradictory) performance criteria for ANN schemes:
  - search quality (retrieved vectors are actual nearest neighbors);
  - speed;
  - memory usage.

Locality Sensitive Hashing (LSH)
- The best-known ANN technique [Charikar 98, Gionis 99, Datar 04, ...].
- LSH is associated with two distinct search algorithms:
  - as an indexing (partitioning) technique involving several hash functions;
  - as a binarization technique.

LSH partitioning technique
- General idea:
  - define m hash functions in parallel;
  - each vector is associated with m distinct hash keys;
  - each hash key is associated with a hash table.
- At query time:
  - compute the hash keys associated with the query;
  - for each hash function, retrieve all database vectors assigned to the same key (for this hash function);
  - compute the exact distance on this short-list.
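A minimal sketch of this partitioning mode, with one hash table per hash function and exact re-ranking on the short-list; the random-projection hash keys, table count, and bit count are illustrative choices, not the E2LSH functions of the next slide.

```python
import numpy as np
from collections import defaultdict

class LSHIndex:
    """m independent hash functions, one hash table per function (sketch)."""
    def __init__(self, dim, n_tables=4, n_bits=12, seed=0):
        rng = np.random.default_rng(seed)
        # One random-projection hash function per table (placeholder hash family)
        self.projections = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _key(self, table_id, x):
        bits = (self.projections[table_id] @ x > 0).astype(np.uint8)
        return bits.tobytes()                          # hash key for this table

    def add(self, vectors):
        self.vectors = vectors
        for i, x in enumerate(vectors):
            for t in range(len(self.tables)):
                self.tables[t][self._key(t, x)].append(i)

    def query(self, x, k=10):
        # Union of the buckets sharing a key with the query (one bucket per table)
        shortlist = set()
        for t in range(len(self.tables)):
            shortlist.update(self.tables[t].get(self._key(t, x), []))
        shortlist = np.fromiter(shortlist, dtype=int)
        if shortlist.size == 0:
            return shortlist, np.empty(0)
        # Exact distances computed on the short-list only
        d2 = ((self.vectors[shortlist] - x) ** 2).sum(axis=1)
        order = np.argsort(d2)[:k]
        return shortlist[order], d2[order]
```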

E2LSH: hash functions for Euclidean vectors [Datar 04]
1) Project the vector onto m random directions: h_i(x) = floor((a_i . x + b_i) / w), for i = 1..m.
2) Construct l hash functions g_j, each concatenating m indexes h_i.
3) For each g_j, compute two hash values with universal hash functions u_1(.), u_2(.).
4) Store the vector id in a hash table, as for an inverted file.
(The original slides illustrate the scalar quantization along one projection direction and the resulting grid of cells in 2-D.)
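The E2LSH hash family itself can be sketched as follows; the values of m and w and the final universal-hashing step are placeholders.

```python
import numpy as np

def e2lsh_hash(x, A, b, w):
    """One E2LSH key: m scalar quantizations of random projections,
    h_i(x) = floor((a_i . x + b_i) / w)."""
    return tuple(np.floor((A @ x + b) / w).astype(int))

# Hypothetical parameters: m projections per hash function g_j
dim, m, w = 128, 8, 4.0
rng = np.random.default_rng(0)
A = rng.standard_normal((m, dim))          # random directions a_i
b = rng.uniform(0, w, size=m)              # random offsets b_i

x = rng.standard_normal(dim)
key = e2lsh_hash(x, A, b, w)
# In practice the tuple is reduced to two universal hash values u1(key), u2(key)
# and the vector id is stored in a conventional hash table (as in an inverted file).
bucket = hash(key) % (1 << 20)             # stand-in for the universal hashing step
```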

Alternative hash functions
- Instead of using random projections, why not directly use a structured quantizer?
  - Vector quantizers achieve better compression performance than scalar ones.
  - Structured vector quantizers: lattice quantizers [Andoni 06], e.g., hexagonal (d=2), E8 (d=8), Leech (d=24).
- But still: lack of adaptation to the data.

Alternative hash functions: learned
- Any hash function can be used in LSH; we just need a set of functions.
- Therefore, they can be learned on sample data: in particular k-means, hierarchical k-means, and KD-trees (from [Pauleve 10]).
- Better data adaptation than with structured quantizers.

Alternative hash functions: learned
- Significantly better than structured quantizers.
- Example: search quality for a single hash function (from [Pauleve 10]):
  - BOV: k-means;
  - HKM: hierarchical k-means, with some loss compared to flat k-means [Nister 06].

Multi-probe LSH
- Multiple hash functions use a lot of memory: per vector and per hash table, at least one id must be stored.
- Multi-probe LSH [Lv 07]:
  - use fewer hash functions (possibly a single one);
  - but probe several (closest) cells per hash function, which saves a lot of memory.
- Similar in spirit to multiple assignment with BOV.
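A learned hash function in the spirit of [Pauleve 10] can simply be a k-means quantizer, and multi-probing then amounts to visiting the p closest cells. A sketch follows (vocabulary size and probe count are arbitrary; scikit-learn's KMeans stands in for the codebook training):

```python
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
database = rng.standard_normal((20_000, 128)).astype(np.float32)

# Learned hash function: a k-means quantizer (vocabulary size is arbitrary here)
km = KMeans(n_clusters=256, n_init=4, random_state=0).fit(database)
centroids = km.cluster_centers_

# Single hash table: cell id -> list of vector ids (an inverted file)
table = defaultdict(list)
for i, c in enumerate(km.labels_):
    table[int(c)].append(i)

def multiprobe_query(x, n_probe=8, k=10):
    """Probe the n_probe closest cells instead of only the nearest one."""
    d2c = ((centroids - x) ** 2).sum(axis=1)
    cells = np.argsort(d2c)[:n_probe]
    shortlist = np.array([i for c in cells for i in table[int(c)]])
    d2 = ((database[shortlist] - x) ** 2).sum(axis=1)
    order = np.argsort(d2)[:k]
    return shortlist[order], d2[order]
```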

FLANN
- ANN package described in Muja's VISAPP paper [Muja 09].
- Multiple randomized kd-trees or a k-means tree, with auto-tuning under given constraints (remark: self-tuned LSH was proposed in [Dong 07]).
- Still a high memory requirement for large vector sets.
- Excellent package: high integration quality and interface.
- See http://www.cs.ubc.ca/~mariusm/index.php/flann/flann

Issue for large scale: final verification
- For this second ("re-ranking") stage, we need the raw descriptors, which means:
  - either a huge amount of memory (128 GB for 1 billion SIFTs);
  - or performing disk accesses, which severely impacts efficiency.

Issue for large scale: final verification, not feasible on a large scale
- Some techniques, like BOV, keep all vectors (no verification).
- Better: use very short codes for the filtering stage: Hamming Embedding [Jegou 08] or Product Quantization [Jegou 11].

LSH for binarization [Charikar 98, J. 08, Weiss 09, etc.]
- Idea: design/learn a function mapping the original space into a compact Hamming space.
- Objective: neighborhoods in the Hamming space should reflect the original neighborhoods.
- Advantages: compact descriptor, fast comparison.

LSH for binarization [Charikar 98, J. 08, Weiss 09, etc.]
- Given B random projection directions a_i, compute a binary code from a vector x as b(x) = (sgn(a_1 . x), ..., sgn(a_B . x)).
- Spectral Hashing: a theoretical framework for finding good hash functions.
  - In practice: PCA + binarization along the different axes (based on variance).
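A sketch of this sign-of-random-projection binarization together with the linear Hamming-space scan; the code length B = 64 and the unpackbits-based population count are illustrative (a production implementation would use popcnt, as noted on the next slide).

```python
import numpy as np

def binarize(X, A):
    """b(x) = (sgn(a_1 . x), ..., sgn(a_B . x)), packed into bytes."""
    bits = (X @ A.T > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)

def hamming_search(query_code, codes, k=10):
    """Linear scan in the Hamming space (population count via bit unpacking)."""
    diff = np.bitwise_xor(codes, query_code)          # differing bits, bytewise
    dist = np.unpackbits(diff, axis=1).sum(axis=1)    # population count per code
    order = np.argsort(dist)[:k]
    return order, dist[order]

# Hypothetical setup: 64-bit codes for 128-D vectors
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))                    # B = 64 random directions a_i
database = rng.standard_normal((100_000, 128))
codes = binarize(database, A)

q = rng.standard_normal((1, 128))
q_code = binarize(q, A)
nn, d = hamming_search(q_code, codes, k=10)
```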

LSH: the two modes (approximate guidelines)

Partitioning technique
- Sublinear search.
- Several hash indexes (integers): large memory overhead.
  - Hash-table overhead (ids must be stored).
  - The original vectors are needed for re-ranking: a lot of memory, or disk accesses.
- Interesting when (e.g., FLANN):
  - the dimensionality is not too large;
  - the dataset is small enough to fit in memory.
- Very good variants/software (FLANN).

Binarization technique
- Linear search.
- Produces one binary code per vector: very compact.
  - Bit-vectors, concatenated (no ids).
- Very fast comparison: Hamming distance (popcnt, SSE4), about 1 billion comparisons per second.
- Interesting for very high-dimensional vectors, or when memory is critical.
- Simple to implement; a very active problem with many variants.

Other topics on LSH
- More general metrics: Kernelized LSH [Kulis 09], RMMH [Joly 11], ...
- Binary LSH (= searching binary vectors): like E2LSH, but with a random subset of bits instead of projections.
  - In this CVPR: an exact binary search variant [Norouzi 12].
- Joint optimization with dimensionality reduction:
  - PCA + random rotation, or PCA + variance balancing [Jegou 10];
  - ITQ [Gong 11].
- Asymmetric distances:
  - idea: do not approximate the query; only the database is binarized;
  - proposed with sketches [Dong 08];
  - significant improvement for any binarization technique [Gordo 11].

Hamming Embedding
- Introduced as an extension of BOV [Jegou 08]: the combination of
  - a partitioning technique (k-means);
  - a binary code that refines the descriptor.
- Representation of a descriptor x: vector-quantized to q(x) as in standard BOV, plus a short binary vector b(x) for additional localization within the Voronoi cell.
- Two descriptors x and y match iff q(x) = q(y) and h(b(x), b(y)) <= h_t, where h(.,.) denotes the Hamming distance and h_t a fixed threshold.
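The matching rule can be sketched directly; the coarse assignments q(x), the binary signatures b(x), and the threshold h_t below are placeholders for the learned components of [Jegou 08].

```python
import numpy as np

def he_match(qx, bx, qy, by, h_t=24):
    """Two descriptors match iff they share the visual word AND their binary
    signatures are within Hamming distance h_t."""
    if qx != qy:                               # different Voronoi cells: no match
        return False
    hamming = np.count_nonzero(bx != by)       # bx, by: arrays of 0/1 bits (e.g., length 64)
    return hamming <= h_t

# Toy usage with hypothetical signatures
rng = np.random.default_rng(0)
bx = rng.integers(0, 2, 64)
by = bx.copy(); by[:10] ^= 1                   # flip 10 bits
print(he_match(3, bx, 3, by, h_t=24))          # True: same word, Hamming distance 10 <= 24
```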

ANN evaluation of Hamming Embedding
(Figure: NN recall vs. rate of points retrieved; HE+BOW curves for Hamming thresholds h_t = 16 to 32, BOW curves for vocabulary sizes from 200 to 50,000.)
- Compared to BOW: at least 10 times fewer points in the short-list for the same level of accuracy.
- Hamming Embedding provides a much better trade-off between recall and the removal of false positives.

Matching points, 20k-word vocabulary
(Figure: 201 matches and 240 matches on the two example image pairs.) Many matches with the non-corresponding image!

Matching points, 200k-word vocabulary
(Figure: 69 matches and 35 matches.) Still many matches with the non-corresponding image.

Matching points, 20k-word vocabulary + HE
(Figure: 83 matches and 8 matches.) About 10x more matches with the corresponding image!

A typical source coding system
(Diagram: decorrelation -> quantization -> entropy coding, with rate-distortion optimization.)
- A simple source coding system consists of:
  - decorrelation, e.g., PCA;
  - quantization;
  - entropy coding.
- A code e(x) is associated with a unique reconstruction value q(x), i.e., the visual word.
- Focus here: the quantization (lossy) step.

Relationship between reconstruction and distance estimation
- Assume y is quantized to q_c(y), and x is a query vector.
- If we estimate the distance d(x, y) by d(x, q_c(y)), then one can show that the error on the squared distance is statistically bounded by the quantization error.

Searching with quantization [Jegou 11]
- Main idea: a compressed representation of the database vectors.
- Each database vector y is represented by q_c(y), where q_c(.) is a product quantizer.
- Search becomes a distance approximation problem. The key is to estimate the distances in the compressed domain such that:
  - quantization is fast enough;
  - quantization is precise, i.e., there are many different possible indexes (e.g., 2^64).
- Regular k-means is not appropriate: it cannot be run for k = 2^64 centroids.

Product Quantizer
- The vector is split into m subvectors: y = [y_1, ..., y_m].
- The subvectors are quantized separately, by subquantizers q_1, ..., q_m.
- Example: a 16-dim vector y split into 8 subvectors of dimension 2, each quantized with 3 bits (8 centroids), giving a 24-bit quantization index.
- In practice: 8 bits per subquantizer (256 centroids).
  - SIFT: m = 4 to 16.
  - VLAD/Fisher: 4 to 128 bytes per indexed vector.
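A sketch of product-quantizer training and encoding, with an independent k-means codebook per subvector (scikit-learn's KMeans as the subquantizer; m = 8 and 256 centroids follow the usual setting but remain arbitrary here):

```python
import numpy as np
from sklearn.cluster import KMeans

class ProductQuantizer:
    """Split vectors into m subvectors and quantize each with its own k-means codebook."""
    def __init__(self, m=8, ks=256):
        self.m, self.ks = m, ks

    def fit(self, X):
        d = X.shape[1]
        assert d % self.m == 0
        self.ds = d // self.m                           # subvector dimension
        self.codebooks = []
        for j in range(self.m):
            sub = X[:, j * self.ds:(j + 1) * self.ds]
            km = KMeans(n_clusters=self.ks, n_init=4, random_state=j).fit(sub)
            self.codebooks.append(km.cluster_centers_)
        return self

    def encode(self, X):
        """Each vector becomes m one-byte indices (for ks = 256)."""
        codes = np.empty((X.shape[0], self.m), dtype=np.uint8)
        for j in range(self.m):
            sub = X[:, j * self.ds:(j + 1) * self.ds]
            cb = self.codebooks[j]
            d2 = (sub ** 2).sum(1)[:, None] - 2 * sub @ cb.T + (cb ** 2).sum(1)[None, :]
            codes[:, j] = d2.argmin(axis=1)
        return codes

# Hypothetical usage: 128-D SIFT-like vectors -> 8-byte codes
rng = np.random.default_rng(0)
X = rng.standard_normal((20_000, 128)).astype(np.float32)
pq = ProductQuantizer(m=8, ks=256).fit(X)
codes = pq.encode(X)
```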

Asymmetric distance computation (ADC)
- Compute the squared distance approximation in the compressed domain.
- To compute the distances between a query and many codes:
  - for each query subvector, compute its squared distance to all possible centroids and store them in look-up tables (a fixed cost, as for quantizing the query);
  - for each database code, sum the m elementary squared distances read from the tables.
- Each 8x8 = 64-bit code requires only m = 8 additions per distance.
- IVFADC: combination with an inverted file to avoid exhaustive search.
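Continuing the hypothetical ProductQuantizer sketch above, asymmetric distance computation reduces to building m look-up tables from the query and then summing m table entries per database code:

```python
import numpy as np

def adc_search(pq, query, codes, k=10):
    """Asymmetric distance computation: the query stays uncompressed,
    database vectors are PQ codes."""
    # 1) Look-up tables: squared distance from each query subvector to every centroid
    #    (fixed cost, independent of the database size)
    tables = np.empty((pq.m, pq.ks), dtype=np.float32)
    for j in range(pq.m):
        sub = query[j * pq.ds:(j + 1) * pq.ds]
        tables[j] = ((pq.codebooks[j] - sub) ** 2).sum(axis=1)
    # 2) For each code: sum m elementary squared distances read from the tables
    dists = tables[np.arange(pq.m)[None, :], codes].sum(axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Usage with the pq and codes from the previous sketch:
# order, d = adc_search(pq, X[0], codes, k=10)
```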

Combination with an inverted file system (IVFADC)
Algorithm:
- A coarse k-means quantizer is used as the hash function; database vectors are stored, per cell, as PQ codes of their residuals.
- At query time, select the k closest centroids c_i and the corresponding cells.
- Compute the residual vector x - c_i of the query vector.
- Apply the PQ search method within the visited cells; the distance is approximated by d(x, y) = d(x - c_i, q(y - c_i)).
Example timing: 3.5 ms per query vector for a search in 2 billion vectors.
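Composing the pieces above (the hypothetical ProductQuantizer and adc_search) gives a compact IVFADC sketch; the number of coarse centroids and of visited cells are arbitrary choices:

```python
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def build_ivfadc(X, pq_factory, n_coarse=1024):
    """Index: coarse-quantize each vector, PQ-encode its residual, store (id, code) per cell."""
    coarse = KMeans(n_clusters=n_coarse, n_init=4, random_state=0).fit(X)
    residuals = X - coarse.cluster_centers_[coarse.labels_]
    pq = pq_factory().fit(residuals)
    codes = pq.encode(residuals)
    lists = defaultdict(list)
    for i, c in enumerate(coarse.labels_):
        lists[int(c)].append((i, codes[i]))
    return coarse, pq, lists

def ivfadc_query(x, coarse, pq, lists, n_probe=8, k=10):
    """Visit the n_probe closest cells; run ADC on the residual x - c_i within each cell."""
    d2c = ((coarse.cluster_centers_ - x) ** 2).sum(axis=1)
    results = []
    for c in np.argsort(d2c)[:n_probe]:
        if not lists[int(c)]:
            continue
        ids, cell_codes = zip(*lists[int(c)])
        order, d = adc_search(pq, x - coarse.cluster_centers_[c], np.array(cell_codes), k=k)
        results += [(d[j], ids[order[j]]) for j in range(len(order))]
    results.sort()
    return results[:k]

# Usage, reusing X from the PQ sketch:
# coarse, pq, lists = build_ivfadc(X, lambda: ProductQuantizer(m=8, ks=256))
# hits = ivfadc_query(X[0], coarse, pq, lists)
```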

Performance evaluation
- Comparison with other memory-efficient approximate nearest neighbor techniques, i.e., binarization techniques:
  - Spectral Hashing [Weiss 09] (exhaustive search);
  - Hamming Embedding [Jegou 08] (non-exhaustive search).
- Performance measured by searching 1M vectors (recall@R for varying R).
(Figures: searching in 1M SIFT descriptors; searching in 1M GIST descriptors.)

Variants
- Adapted codebooks for residual vectors [Uchida 11]: learn the product quantizer separately in the different cells.
- Re-ranking with source coding [Jegou 11]: exploit the explicit reconstruction provided by PQ, and refine the database vector with a short extra code.
- In this CVPR: the inverted multi-index [Babenko 12]:
  - replace the coarse k-means by a product quantizer;
  - add a priority-based selection of cells;
  - then apply the PQ distance estimation.

Product Quantization: some applications
- PQ search was first proposed for searching local descriptors [J 09-11], i.e., to replace bag-of-words or Hamming Embedding.
- [J 10]: encoding a global image representation (VLAD/Fisher).
- [Gammeter et al 10]: fast geometrical re-ranking with local descriptors.
- [Perronnin et al 11]: large-scale classification (ImageNet):
  - combined with a stochastic gradient descent SVM;
  - decompression on the fly when feeding the classifier;
  - won the ILSVRC competition in 2011.
- [Vedaldi and Zisserman 12] (CVPR): learning in the PQ-compressed domain, typically a 10x acceleration with no loss.

Conclusion
- Nearest neighbor search is a key component of image indexing systems.
- The product quantization-based approach offers:
  - competitive search accuracy;
  - a compact footprint: a few bytes per indexed vector.
- Tested on 220 million video frames and 100 million still images; extrapolation to 1 billion images: 20 GB of RAM, query time < 1 s on 8 cores.
- Also tested on audio descriptors, text descriptors, etc.
- A toy Matlab package is available on my web page.
- 10 million images indexed on my laptop: 21 bytes per indexed image.

Estimated distances versus true distances
- The estimator d(x, q(y))^2 of d(x, y)^2 is biased.
- The bias can be removed by adding quantization error terms, but this does not improve the NN search quality.
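For reference, the relation behind the bias is the exact expansion of the squared distance around the reconstruction (written here with the notation of the slides):

\|x - y\|^2 = \|x - q(y)\|^2 + \|y - q(y)\|^2 - 2\,\langle x - q(y),\; y - q(y)\rangle

The ADC estimator keeps only the first term; the missing \|y - q(y)\|^2 is the quantization error term that can be added back to correct the bias, but, as stated above, this correction does not improve the NN search quality in practice.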