Geometric VLAD for Large Scale Image Search. Zixuan Wang 1, Wei Di 2, Anurag Bhardwaj 2, Vignesh Jagadesh 2, Robinson Piramuthu 2



2 Our Goal
1) Robust to various imaging conditions
2) Small memory footprint
3) Speed (<1 s per query)

3 Issues with Matching Images (1/2): Photometric Invariance
Brightness, exposure

4 Issues with Matching Images (2/2): Geometric Invariance
Rotation, translation, scale

5 State of the Art: Bag-of-Words (BoW)
BoW computation: Image inventory → Keypoint detection → Descriptor computation → Codebook construction → BoW encoding
Image inventory → Bag-of-Words image encoding (size = 200k) → Inverted indices
Slide adapted from Fei-Fei Li's Bag-of-Words slides.
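The BoW pipeline above can be sketched in a few lines. This is a minimal illustration, assuming hard assignment to the nearest visual word by squared Euclidean distance; the function names, the toy 2-D codebook and the image data are hypothetical, and a real system would use a far larger vocabulary (200k on the slide) and a faster assignment step.

```python
from collections import defaultdict


def nearest_word(descriptor, codebook):
    """Index of the closest visual word (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for idx, center in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, center))
        if dist < best_dist:
            best, best_dist = idx, dist
    return best


def bow_histogram(descriptors, codebook):
    """Hard-assign each descriptor to a visual word and count occurrences."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1
    return hist


def build_inverted_index(images, codebook):
    """Map each visual word to the images (and counts) that contain it."""
    index = defaultdict(dict)
    for image_id, descriptors in images.items():
        for word, count in enumerate(bow_histogram(descriptors, codebook)):
            if count:
                index[word][image_id] = count
    return index


# Hypothetical toy data: 3 visual words, 2-D descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
images = {
    "a": [[0.1, 0.0], [0.9, 1.2]],
    "b": [[4.8, 5.1], [5.2, 4.9], [0.0, 0.2]],
}
index = build_inverted_index(images, codebook)
print(dict(index))
```

At query time only the posting lists of the query's visual words are scanned, which is why the inverted index grows with the vocabulary and the database.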

6 Issues with BoW Matching
- Weak matching scheme:
  - for a small visual dictionary: too many false matches
  - for a large visual dictionary: many true matches are missed
- Hard to find a good vocabulary-size trade-off
- Large inverted index size

7 Recent Approaches for Very Large Scale Indexing
BoW computation: Image inventory → Keypoint detection → Descriptor computation → Codebook construction → Vector encoding (in place of Bag-of-Words)
Image inventory → Vector compression (size = 128) → Nearest neighbor search
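Once every image is a short fixed-length vector, retrieval reduces to nearest neighbor search. The sketch below is an exhaustive (brute-force) search by squared Euclidean distance, assuming the vectors have already been compressed; `knn_search` and the toy database are hypothetical, and the slide does not say which search method is used.

```python
import heapq


def knn_search(query, database, k=3):
    """Return the ids of the k database vectors closest to the query."""
    scored = (
        (sum((q - x) ** 2 for q, x in zip(query, vec)), image_id)
        for image_id, vec in database.items()
    )
    # nsmallest keeps only k candidates instead of sorting everything
    return [image_id for _, image_id in heapq.nsmallest(k, scored)]


# Hypothetical compressed vectors (2-D here instead of 128-D).
database = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
print(knn_search([0.9, 1.2], database, k=2))
```

Brute force is linear in the database size; at the 1M-image scale it is usually replaced by an approximate search, but the ranking criterion stays the same.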

8 VLAD: Vector of Locally Aggregated Descriptors
[Figure: descriptors $x$ assigned to cluster centers $c_i$]
For a given image:
- assign each descriptor $x$ to its closest center $c_i$
- accumulate (sum) the descriptors per cell: $v_i := v_i + (x - c_i)$
- the residual $(x - c_i)$ adds useful information
VLAD dimension: $D = k \times d$; typical $k$ = 64 to 128
A lower-dimensional VLAD has better performance than a 65k BoW!
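The aggregation rule can be written out directly. A minimal sketch, assuming plain Python lists and hard assignment by squared Euclidean distance; the function names and toy data are illustrative, and the normalization commonly applied to VLAD afterwards is omitted.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vlad(descriptors, centers):
    """Vector of Locally Aggregated Descriptors (unnormalized)."""
    k, d = len(centers), len(centers[0])
    v = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # assign the descriptor to its closest center c_i
        i = min(range(k), key=lambda j: squared_distance(x, centers[j]))
        # accumulate the residual: v_i := v_i + (x - c_i)
        for t in range(d):
            v[i][t] += x[t] - centers[i][t]
    # concatenate the k residual sums into one vector of dimension D = k * d
    return [value for cell in v for value in cell]


centers = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 2.0], [11.0, 10.0]]
print(vlad(descriptors, centers))  # [1.0, 2.0, 1.0, 0.0]
```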

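9 Issue with VLAD
[Figure: two different configurations of descriptors $x$ around center $c_i$, each with residual $r$]
VLAD: $v_i := v_i + (x - c_i)$
First configuration, VLAD: $r + r + r + r = 4r$
Second configuration, VLAD: $r + r + r + r = 4r$
VLAD fails to capture geometry information.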
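The failure case can be reproduced with a single cell. In this sketch every keypoint has the same residual $r$ but a different keypoint angle; the angles are hypothetical values chosen for illustration. Because VLAD only sums residuals and never looks at the angle, both configurations collapse to $4r$.

```python
def vlad_cell(keypoints, center):
    """VLAD for one visual word: the sum of residuals (x - c_i).

    Each keypoint is (descriptor, angle); VLAD ignores the angle.
    """
    v = [0.0] * len(center)
    for descriptor, _angle in keypoints:
        for t, (x_t, c_t) in enumerate(zip(descriptor, center)):
            v[t] += x_t - c_t  # v_i := v_i + (x - c_i)
    return v


center = [0.0, 0.0]
x = [1.0, 0.5]  # every keypoint has the same residual r = x - c_i
spread = [(x, 10.0), (x, 60.0), (x, 150.0), (x, 200.0)]  # hypothetical angles
clustered = [(x, 10.0), (x, 20.0), (x, 30.0), (x, 40.0)]
print(vlad_cell(spread, center))     # [4.0, 2.0] = 4r
print(vlad_cell(clustered, center))  # [4.0, 2.0] = 4r, indistinguishable
```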

10 gVLAD: Incorporating Geometry in VLAD
[Figure: descriptors $x$ around center $c_i$ with residual $r$, split into angle Bin 1 and Bin 2]
gVLAD: take 2 angle bins, $[-30°, 120°)$ and $[120°, 330°)$, and accumulate $v_i := v_i + (x - c_i)$ per angle bin.
One configuration, gVLAD: Bin 1: $r + r$; Bin 2: $r + r$ → $(2r, 2r)$
The other configuration, gVLAD: Bin 1: $r + r + r + r$; Bin 2: $0$ → $(4r, 0)$
Angle binning captures different geometric configurations!
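Splitting the accumulation by keypoint angle separates the two configurations. A minimal sketch for a single visual word, using the slide's two bins $[-30°, 120°)$ and $[120°, 330°)$; the helper names and angle values are hypothetical, and the full descriptor would repeat this for every center ($k \times d \times$ number of bins values). It illustrates the binning idea only, not the authors' exact implementation or normalization.

```python
def angle_bin(angle_deg, edges):
    """Index b of the half-open bin [edges[b], edges[b+1]) holding the angle.

    `edges` must be increasing and span exactly 360 degrees, e.g. [-30, 120, 330].
    """
    start = edges[0]
    a = (angle_deg - start) % 360.0 + start  # wrap into [start, start + 360)
    for b in range(len(edges) - 1):
        if edges[b] <= a < edges[b + 1]:
            return b
    return 0  # modulo rounded up to 360: the angle equals edges[0], i.e. bin 0


def gvlad_cell(keypoints, center, edges):
    """gVLAD for one visual word: residuals accumulated separately per angle bin."""
    d, n_bins = len(center), len(edges) - 1
    v = [[0.0] * d for _ in range(n_bins)]
    for descriptor, angle in keypoints:
        b = angle_bin(angle, edges)
        for t in range(d):
            v[b][t] += descriptor[t] - center[t]  # v_i := v_i + (x - c_i), per bin
    return v


edges = [-30.0, 120.0, 330.0]  # the slide's bins [-30, 120) and [120, 330)
center = [0.0, 0.0]
x = [1.0, 0.5]  # every keypoint has the same residual r = x - c_i
spread = [(x, 10.0), (x, 60.0), (x, 150.0), (x, 200.0)]  # hypothetical angles
clustered = [(x, 10.0), (x, 20.0), (x, 30.0), (x, 40.0)]
print(gvlad_cell(spread, center, edges))     # [[2.0, 1.0], [2.0, 1.0]] -> (2r, 2r)
print(gvlad_cell(clustered, center, edges))  # [[4.0, 2.0], [0.0, 0.0]] -> (4r, 0)
```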

11 Power of Keypoint Angle Features
Retrieval performance (mAP); the Angle Bin rows use only the keypoint angle histogram (dimension in parentheses):
Angle Bin (8)     0.15
Angle Bin (18)    0.24
Angle Bin (36)    0.26
Angle Bin (72)    0.27
GIST (544)        0.35
BoW (20,000)      0.45
With only 72 dimensions, the angle histogram performs well!
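An angle-only descriptor of this kind can be sketched as a normalized histogram of keypoint angles. The normalization and the histogram-intersection similarity below are assumptions for illustration; the slide does not state which normalization or similarity produced its mAP numbers, and the function names are hypothetical.

```python
def angle_histogram(angles_deg, n_bins):
    """n_bins-dimensional histogram of keypoint angles, normalized to sum to 1."""
    width = 360.0 / n_bins
    hist = [0.0] * n_bins
    for a in angles_deg:
        b = int((a % 360.0) // width)
        hist[min(b, n_bins - 1)] += 1.0  # guard: a % 360.0 can round up to 360.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms (1.0 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


h = angle_histogram([0.0, 5.0, 90.0, 181.0, 359.0], n_bins=4)
print(h)  # [0.4, 0.2, 0.2, 0.2]
```

With `n_bins=72` this gives the 72-D variant from the table; the descriptor ignores appearance entirely, which is what makes its retrieval score notable.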

12 Datasets & Vocabularies
- Oxford 5K: 5062 images / 55 queries
- Paris 6K: 6412 images / 60 queries
- Holidays: 1491 images / 500 queries
- Rotated Holidays
- Large-scale distractors: Flickr 100K, Flickr 1M
- Vocabulary: k-means clustering on SURF descriptors with k = 256, on the Paris dataset

13 Datasets: Holidays & Oxford
[Figure: Holidays example queries; Oxford example queries]

14 Example Distractors: Flickr

15 gVLAD: Keypoint Detection & Descriptor Extraction
[Figure labels: feature descriptor, angle]

16 gVLAD: Learning Angle Membership
[Figure: mAP vs. angle-bin offset; panels: Rotated Holidays, 4 bins; Oxford, 4 bins]
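One way to read the offset experiment: with B equal-width bins, an offset rotates every bin boundary, and the best offset is picked by evaluating retrieval mAP over a grid. The sketch assumes equal-width bins; `score_fn` is a hypothetical stand-in for a full mAP evaluation, which is not reproduced here. Offsets only need to be searched in $[0, 360/B)$, because shifting by a full bin width gives the same partition with relabeled bins.

```python
def offset_bin(angle_deg, n_bins, offset_deg):
    """Equal-width angle bins rotated by offset_deg.

    Bin b covers [offset + b*w, offset + (b+1)*w) modulo 360, with w = 360 / n_bins.
    """
    width = 360.0 / n_bins
    # the final % n_bins also absorbs the rare case where the modulo rounds to 360
    return int(((angle_deg - offset_deg) % 360.0) // width) % n_bins


def best_offset(score_fn, n_bins, step_deg=5.0):
    """Grid-search the offset in [0, w) that maximizes score_fn(offset)."""
    width = 360.0 / n_bins
    candidates = [i * step_deg for i in range(int(width // step_deg))]
    return max(candidates, key=score_fn)


# Hypothetical placeholder for "compute mAP with these bins".
toy_score = lambda off: -abs(off - 45.0)
print(best_offset(toy_score, n_bins=4))
```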

17 gVLAD: Learning Angle Membership
[Figure: von Mises distribution fitted to Holidays SURF keypoint angles (8,233,763 keypoints)]
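Fitting a von Mises distribution to keypoint angles needs only a mean direction and a concentration estimate. This sketch uses the mean resultant direction for $\mu$ and a widely used piecewise approximation for $\kappa$ found in circular-statistics texts; how exactly the authors turned the fitted distribution into angle memberships is not stated on the slide, so treat this as an illustrative assumption. The function names are hypothetical.

```python
import math


def fit_von_mises(angles_rad):
    """Approximate ML fit: mean direction mu and concentration kappa."""
    n = len(angles_rad)
    c = sum(math.cos(a) for a in angles_rad) / n
    s = sum(math.sin(a) for a in angles_rad) / n
    mu = math.atan2(s, c)
    r = math.hypot(c, s)  # mean resultant length in [0, 1]
    if r > 1.0 - 1e-12:
        return mu, math.inf  # all angles identical
    # piecewise approximation to the inverse of A(kappa) = I1(kappa) / I0(kappa)
    if r < 0.53:
        kappa = 2 * r + r ** 3 + 5 * r ** 5 / 6
    elif r < 0.85:
        kappa = -0.4 + 1.39 * r + 0.43 / (1 - r)
    else:
        kappa = 1 / (r ** 3 - 4 * r ** 2 + 3 * r)
    return mu, kappa


def bessel_i0(x):
    """Modified Bessel function I0 via its power series sum ((x/2)^2m / (m!)^2)."""
    total, term, m = 1.0, 1.0, 0
    while term > 1e-16 * total:
        m += 1
        term *= (x / 2.0) ** 2 / (m * m)
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Density exp(kappa cos(theta - mu)) / (2 pi I0(kappa)); fine for moderate kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2 * math.pi * bessel_i0(kappa))


angles = [math.radians(a) for a in (-10.0, 0.0, 10.0)]
mu, kappa = fit_von_mises(angles)
print(math.degrees(mu), kappa)
```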

18 gVLAD: Vocabulary Adaptation
- Adapt existing codebooks to an incremental dataset
- Alleviate the need for frequent large-scale codebook training
[Figure legend: initial data set; new data set; initial codebook; adapted codebook; centers k1, k2, k3]
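The slide does not give the adaptation rule. One simple possibility, shown here purely as an assumption, is a MacQueen-style online k-means update: each new descriptor is assigned to its nearest existing center, which then moves to the running mean of everything it has absorbed, so the initial data set never has to be reclustered. `adapt_codebook` and the stored per-center counts are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adapt_codebook(centers, counts, new_descriptors):
    """One incremental pass of online k-means over the new data only.

    counts[i] is how many descriptors center i has absorbed so far.
    """
    centers = [list(c) for c in centers]
    counts = list(counts)
    for x in new_descriptors:
        i = min(range(len(centers)), key=lambda j: squared_distance(x, centers[j]))
        counts[i] += 1
        # running mean: c_i <- c_i + (x - c_i) / n_i
        centers[i] = [c + (a - c) / counts[i] for c, a in zip(centers[i], x)]
    return centers, counts


initial = [[0.0, 0.0], [10.0, 10.0]]
new_centers, new_counts = adapt_codebook(initial, [1, 1], [[2.0, 0.0], [10.0, 12.0]])
print(new_centers, new_counts)
```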

19 gVLAD: Compute Descriptors
[Chart: Rotated Holidays dataset; Inter-norm: ~17.7%]

20 gVLAD: PCA Whitening
[Chart: whitened gVLAD on the Rotated Holidays dataset; 16.6%; lower dimension]

21 gVLAD: PCA Whitening
Dimension reduction on the original gVLAD using PCA: from 65,536 to 128 dimensions, the mAP decreases by only about 1%.
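PCA whitening projects onto the leading principal directions and divides each coordinate by the square root of its eigenvalue, so every kept dimension has unit variance. The 65,536 dimensions are consistent with k = 256 words × 4 angle bins × 64-D SURF descriptors, assuming the standard 64-D SURF variant. The pure-Python sketch below uses power iteration with deflation on a toy 2-D example; the names are hypothetical, and at 65,536 dimensions one would use an optimized linear algebra library instead.

```python
import math


def fit_pca_whitening(vectors, n_components, n_iter=500):
    """Top principal directions via power iteration with deflation."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[t] for v in vectors) / n for t in range(d)]
    centered = [[v[t] - mean[t] for t in range(d)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    components, eigenvalues = [], []
    for _ in range(n_components):
        u = [1.0 + 0.1 * t for t in range(d)]  # arbitrary non-zero start vector
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * u[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            u = [x / norm for x in w]
        # Rayleigh quotient u^T C u = variance of the data along u
        cu = [sum(cov[i][j] * u[j] for j in range(d)) for i in range(d)]
        lam = sum(a * b for a, b in zip(u, cu))
        components.append(u)
        eigenvalues.append(lam)
        # deflation: remove this direction before finding the next one
        cov = [[cov[i][j] - lam * u[i] * u[j] for j in range(d)] for i in range(d)]
    return mean, components, eigenvalues


def pca_whiten(vector, mean, components, eigenvalues, eps=1e-12):
    """Project onto the principal directions and rescale each to unit variance."""
    centered = [x - m for x, m in zip(vector, mean)]
    return [sum(a * b for a, b in zip(u, centered)) / math.sqrt(lam + eps)
            for u, lam in zip(components, eigenvalues)]


data = [[-2.0, -2.1], [-1.0, -0.9], [0.0, 0.1], [1.0, 1.1], [2.0, 1.8]]
mean, comps, eigs = fit_pca_whitening(data, n_components=1)
reduced = [pca_whiten(v, mean, comps, eigs) for v in data]  # 2-D -> 1-D
```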

22 Experiment: Full-Size gVLAD on Holidays & Oxford
[Table: mAP performance (%); highlighted: 7.1%]
- Full-size gVLAD descriptors
- Compared with state-of-the-art results
- SURF detector and SURF descriptor are used
- Best performances are in bold

23 Experiment: Low-Dimensional gVLAD on Holidays & Oxford
[Table: mAP performance (%); highlighted: 15.2%]
- Low-dimensional descriptors
- $K = 128$, $k_w = 128$
- Comparison with state-of-the-art
- Best performances are in bold

24 Experiment: Large-Scale Datasets
[Table: highlighted averages ~12.5% and ~16.3%]
- Large-scale data with 100K/1M distractors
- Comparison with state-of-the-art
- Best performances are in bold

25 Take-Home Message
[Chart, axis up to 0.8, comparing: Ours (gVLAD), VLAD, VLAD+SSR, BoW, Improved Fisher, MultiVoc+VLAD]

26 Thank You

27 BACKUP

28 Speed and Memory
- Speed: ~750 ms per query
- Memory: 0.5 KB per image for 128-D features; 0.5 GB for 1M images; 500 GB for 1B images
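The memory figures follow from storing each 128-D vector as 4-byte floats, an assumption that matches 0.5 KB per image (128 × 4 = 512 bytes). A quick check with a hypothetical helper, counting raw vector storage only (no index overhead):

```python
def index_size_bytes(n_images, dims=128, bytes_per_value=4):
    """Raw storage for one dims-dimensional float vector per image."""
    return n_images * dims * bytes_per_value


for n in (1, 10**6, 10**9):
    print(n, index_size_bytes(n))
# 1 image -> 512 bytes (0.5 KB); 1M -> 512,000,000 bytes (~0.5 GB); 1B -> ~512 GB
```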

29 Comparison with CNN-Based Approaches
[Chart: gVLAD compared with Neural Codes and MOP-CNN]
- Neural Codes: A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky, "Neural Codes for Image Retrieval," arXiv, April
- MOP-CNN: Y. Gong, L. Wang, R. Guo, and S. Lazebnik, "Multi-scale Orderless Pooling of Deep Convolutional Activation Features," arXiv, March

arxiv: v1 [cs.cv] 15 Mar 2014

Geometric VLAD for Large Scale Image Search
Zixuan Wang, Wei Di, Anurag Bhardwaj, Vignesh Jagadeesh, Robinson Piramuthu
Dept. of Electrical Engineering, Stanford University, CA 94305
eBay Research Labs,