Efficient Representation of Local Geometry for Large Scale Object Retrieval

Similar documents
Efficient Representation of Local Geometry for Large Scale Object Retrieval

Min-Hashing and Geometric min-hashing

Maximally Stable Extremal Regions and Local Geometry for Visual Correspondences

Large Scale Image Retrieval

Large scale object/scene recognition

Three things everyone should know to improve object retrieval. Relja Arandjelović and Andrew Zisserman (CVPR 2012)

Evaluation of GIST descriptors for web scale image search

Large-scale Image Search and location recognition Geometric Min-Hashing. Jonathan Bidwell

Learning a Fine Vocabulary

Video Google: A Text Retrieval Approach to Object Matching in Videos

Bundling Features for Large Scale Partial-Duplicate Web Image Search

Large-scale visual recognition The bag-of-words representation

Hamming embedding and weak geometric consistency for large scale image search

Keypoint-based Recognition and Object Search

arxiv: v1 [cs.cv] 15 Mar 2014

SEARCH BY MOBILE IMAGE BASED ON VISUAL AND SPATIAL CONSISTENCY. Xianglong Liu, Yihua Lou, Adams Wei Yu, Bo Lang

Instance-level recognition II.

Towards Visual Words to Words

Object Recognition and Augmented Reality

Instance-level recognition part 2

Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Query Adaptive Similarity for Large Scale Object Retrieval

Pattern Spotting in Historical Document Image

Instance Search with Weak Geometric Correlation Consistency

Indexing local features and instance recognition May 16 th, 2017

Learning Vocabularies over a Fine Quantization

Geometric VLAD for Large Scale Image Search. Zixuan Wang 1, Wei Di 2, Anurag Bhardwaj 2, Vignesh Jagadesh 2, Robinson Piramuthu 2

Compressed local descriptors for fast image and video search in large databases

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

WISE: Large Scale Content Based Web Image Search. Michael Isard Joint with: Qifa Ke, Jian Sun, Zhong Wu Microsoft Research Silicon Valley

Construction of Precise Local Affine Frames

Image Retrieval with a Visual Thesaurus

Recognition. Topics that we will try to cover:

Total Recall II: Query Expansion Revisited

Indexing local features and instance recognition May 14 th, 2015

Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval O. Chum, et al.

From Structure-from-Motion Point Clouds to Fast Location Recognition

Paper Presentation. Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval

IMPROVING VLAD: HIERARCHICAL CODING AND A REFINED LOCAL COORDINATE SYSTEM. Christian Eggert, Stefan Romberg, Rainer Lienhart

On the burstiness of visual elements

Enhanced and Efficient Image Retrieval via Saliency Feature and Visual Attention

ON AGGREGATION OF LOCAL BINARY DESCRIPTORS. Syed Husain and Miroslaw Bober

IMAGE MATCHING - ALOK TALEKAR - SAIRAM SUNDARESAN 11/23/2010 1

CS 4495 Computer Vision Classification 3: Bag of Words. Aaron Bobick School of Interactive Computing

An Evaluation of Two Automatic Landmark Building Discovery Algorithms for City Reconstruction

Negative evidences and co-occurrences in image retrieval: the benefit of PCA and whitening

CLSH: Cluster-based Locality-Sensitive Hashing

Mixtures of Gaussians and Advanced Feature Encoding

Spatial Coding for Large Scale Partial-Duplicate Web Image Search

Lecture 12 Recognition

Indexing local features and instance recognition May 15 th, 2018

Binary SIFT: Towards Efficient Feature Matching Verification for Image Search

Perception IV: Place Recognition, Line Extraction

Object retrieval and localization with spatially-constrained similarity measure and k-nn re-ranking

Published in: MM '10: proceedings of the ACM Multimedia 2010 International Conference: October 25-29, 2010, Firenze, Italy

Speeded-up, relaxed spatial matching

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013

Detection of Cut-And-Paste in Document Images

Visual query expansion with or without geometry: refining local descriptors by feature aggregation

Specular 3D Object Tracking by View Generative Learning

TRECVid 2013 Experiments at Dublin City University

Automatic Ranking of Images on the Web

Revisiting the VLAD image representation

Recognizing object instances

A Vote-and-Verify Strategy for Fast Spatial Verification in Image Retrieval

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Large-scale visual recognition Efficient matching

6.819 / 6.869: Advances in Computer Vision

Descriptor Learning for Efficient Retrieval

Light-Weight Spatial Distribution Embedding of Adjacent Features for Image Search

Modern-era mobile phones and tablets

Efficient Re-ranking in Vocabulary Tree based Image Retrieval

Visual Word based Location Recognition in 3D models using Distance Augmented Weighting

Evaluation of GIST descriptors for web-scale image search

3D model search and pose estimation from single images using VIP features

Large Scale 3D Reconstruction by Structure from Motion

Visual Recognition and Search April 18, 2008 Joo Hyun Kim

THE matching of local visual features plays a critical role

Scalable Recognition with a Vocabulary Tree

Video Google faces. Josef Sivic, Mark Everingham, Andrew Zisserman. Visual Geometry Group University of Oxford

Scalable Recognition with a Vocabulary Tree

Learning bottom-up visual processes using automatically generated ground truth data

Part-based and local feature models for generic object recognition

Stereo and Epipolar geometry

Basic Problem Addressed. The Approach I: Training. Main Idea. The Approach II: Testing. Why a set of vocabularies?

Instance-level recognition

RANSAC RANdom SAmple Consensus

Instance-level recognition

Oriented pooling for dense and non-dense rotation-invariant features

arxiv: v1 [cs.cv] 1 Feb 2017

Descriptor Learning for Efficient Retrieval

By Suren Manvelyan,

Recognizing Object Instances. Prof. Xin Yang HUST

Learning to Match Images in Large-Scale Collections

Dynamic Match Kernel with Deep Convolutional Features for Image Retrieval

INRIA-LEAR S VIDEO COPY DETECTION SYSTEM. INRIA LEAR, LJK Grenoble, France 1 Copyright detection task 2 High-level feature extraction task

Handling Urban Location Recognition as a 2D Homothetic Problem

Mobile Visual Search with Word-HOG Descriptors

A Systems View of Large- Scale 3D Reconstruction

Reinforcement Matching Using Region Context

Transcription:

Efficient Representation of Local Geometry for Large Scale Object Retrieval Michal Perďoch Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University in Prague IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2009 22 nd June 2009

Large Scale Object Retrieval Large (web) scale real-time search involves millions(billions) of images Indexing structure should fit into RAM, failing to do so results in a order of magnitude increase in response time 1 bit per feature requires ~1.16GB in a retrieval system with 5 million images, ~2000 features per image Scalability and the cost of the search engine is determined by memory footprint -> memory is the limiting factor Methods based on local features Are able to retrieve objects, not only images Robust to occlusions and background changes Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 2/20

Image vs. Object Retrieval Image query Object query Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 3/20

Object Retrieval vs. Global Methods Query Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 4/20

Image Representation by Local Features Local features = appearance + geometry Bag of words approaches focus on appearance. Geometry benefits object retrieval (in spatial verification) [Philbin 07] The problem: How to efficiently store geometric information of local features? Currently, the straightforward representation of local geometry requires roughly 4-5x more memory than the appearance I.e. ~32bits for appearance vs. ~160bits for geometry. Our solution: a learnt geometric vocabulary Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 5/20

Image Representation by Local Features Keypoint Detection Local Appearance Bag of words, Video Google [Šivic 03] SIFT Description [Lowe 04] Visual Vocabulary graffiti Local Geometry graffiti? Visual Words word 1, word 2, word 8,... word 948534,word 998125 graffiti Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 6/20

Object Retrieval with Spatial Verification Inverted file 1 4 8 11 79... 948534 998423 query word 1 2 3... 1000000 +TF-IDF weights 1 1 1 1 2 2 2 2 8 8 8 8............ 948534 948534 948534 948534 998125 998125 998125 998125 bark trees graffiti bike TF-IDF ranking 0.85 0.81... 0.013 0.001... graffiti graffiti2 bark all_souls query Geometry bikes bikes graffiti bark LO-RANSAC H A i A j Final ranking 210 138... 3 1 graffiti2 graffiti bark all_souls Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 7/20

Local Geometry of Affine Features Coordinates x 0 and an ellipse E ellipse E can be decomposed up to an arbitrary rotation Q We fix ambiguity in Q by choosing R is given by a dominant orientation as in [Lowe 04] Local affine transformation is defined by a single correspondence Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 8/20

Geometry Compression Representing transformation A i A i (independent of rotation R ), captures the shape of the ellipse Let a transformation B i be a representant of A i, ideally However, to compress the representation, B will be chosen from a vocabulary and shared by multiple A i, so that How to measure and minimize error E of H? Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 9/20

Minimizing Geometric Error of H We capture the quality of H by integrating reprojection error over unit circle using simple manipulations Frobenius norm can be used in space of transformations as a similarity Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 10/20

K-means-like Clustering For a set of transformations A of cardinality N we are looking for a best set B of K representative candidates B j that minimizes error E over all possible assignments A j. 1. assignment step 2. refinement step which leads to a closed form solution. 3. go to 1, until no reassignments occurred or error is below threshold Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 11/20

Geometric vocabularies Assuming that the scale and shape of the ellipse are independent, scale can be extracted and quantized separately Each geometric vocabulary is characterized by two numbers (x, y) and denoted SxEy x number of bits for encoding scale y number of bits for encoding ellipse shape, K = 2 y Geometric vocabularies learnt on a set of local geometries of 10M affine covariant points (for visualisation scale was quantized separately) y = 2, K = 2 2 y = 3, K = 2 3 y = 4, K = 2 4 y = 5, K = 2 5 Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 12/20

Geometric vocabularies S0E8 S4E4 S8E0 S4E12 256 ellipses 16 scales, 16 ellipses 256 scales, circles 16 scales, 4k ellipses Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 13/20

Efficient Geometry Representation Keypoint Detection Local Appearance SIFT Description [Lowe 04] Visual Vocabulary graffiti Local Geometry graffiti Geom. Vocabulary x 1,y 1,B 1 x 2,y 2,B 5 x 3,y 3,B 3... x N,y N,B N graffiti Visual Words word 1, word 2, word 8,... word 948534,word 998125 graffiti Coordinates x i, y i are quantized separately and encoded using 16bits Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 14/20

Gravity Vector Assumption Transformation A has one eigenvector (0,1) > pointing down in the image we call it a gravity vector Surprisingly, upright direction is often preserved (R = I ) Gravity vector assumption was used solely in spatial verification [Philbin 07] Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 15/20

Gravity Vector Assumption Performance comparison SIFT dominant orientation [Lowe 04] produces 50% more descriptors Robustness of gravity vector assumption Robustness of SIFT to small imprecision in orientation Correct final geometry due to LO-step in RANSAC Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 16/20

Experiments Datasets and Protocol Oxford Buildings Dataset (Ox5K) [Philbin 07] 5062 images of buildings collected from Flickr, 55 queries with ground truth, 11 landmarks each with 5 different queries map mean Average Precision, average area below the precisionrecall curves for all queries Additional 100k of distractor images, together dataset (Ox105K) Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 17/20

Experiments Geometric Vocabularies Two different visual vocabularies, trained on Oxford5K and ~6000 images of tourist spots in Paris, protocol from [Philbin 07] No geometry Spatial verification with 8bit per feature geometric vocabulary for ellipse shape performs almost as exact geometry It might be necessary to use more challenging datasets to highlight the benefit of affine shape. In Oxford buildings dataset 86% of all GT pairs have anisotropy under 1.5 (72% is under 1.25) 90% of all GT pairs have skew under 20 (76% is under 10 ) Full geometry Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 18/20

Experiments Performance Comparison Oxford Buildings Dataset Proposed method outperforms all state of the art methods Geometry representation S0E8 was used Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 19/20

Conclusions Contributions We have proposed a method for learning an efficient, compact representation of local geometry Method achieved state of the art results (0.916 map) on Oxford buildings dataset and competitive results on INRIA Holidays dataset. Proposed method requires only 45.2 bits/feature on average Local geometry in 24 bits (16 bits for position, 8 bits for shape) Visual word labels and TF/IDF weights in 21.2 bits We have shown that the use of gravity vector assumption in feature description significantly improves recognition rate and further reduces the memory footprint Future work Compare performance of scale vs. affine covariant points on standard dataset Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 20/20