Supplementary Material for Ensemble Diffusion for Retrieval

Similar documents
Ensemble Diffusion for Retrieval

Regularized Diffusion Process for Visual Retrieval

Evaluation of GIST descriptors for web scale image search

Re-ranking by Multi-feature Fusion with Diffusion for Image Retrieval

Joint Unsupervised Learning of Deep Representations and Image Clusters Supplementary materials

IMAGE RETRIEVAL USING VLAD WITH MULTIPLE FEATURES

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material

Multi-modal image retrieval with random walk on multi-layer graphs

Local Descriptor based on Texture of Projections

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material

Bundling Features for Large Scale Partial-Duplicate Web Image Search

Discriminative classifiers for image recognition

DeepIndex for Accurate and Efficient Image Retrieval

Fuzzy based Multiple Dictionary Bag of Words for Image Classification

Multiple Kernel Learning for Emotion Recognition in the Wild

Large scale object/scene recognition

Periocular Biometrics: When Iris Recognition Fails

Light-Weight Spatial Distribution Embedding of Adjacent Features for Image Search

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material

Affinity Learning with Diffusion on Tensor Product Graph. Xingwei Yang, Lakshman Prasad, and Longin Jan Latecki, Senior Member, IEEE

III. OVERVIEW OF THE METHODS

Packing and Padding: Coupled Multi-index for Accurate Image Retrieval

arxiv: v1 [cs.cv] 11 Feb 2014

Fast Face Recognition Based on 2D Fractional Fourier Transform

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Semantic-aware Co-indexing for Image Retrieval

A Comparison of SIFT, PCA-SIFT and SURF

Evaluation and comparison of interest points/regions

K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors

Dynamic Match Kernel with Deep Convolutional Features for Image Retrieval

A New Feature Local Binary Patterns (FLBP) Method

Selective Search for Object Recognition

DeepPano: Deep Panoramic Representation for 3-D Shape Recognition

Binary SIFT: Towards Efficient Feature Matching Verification for Image Search

Hamming embedding and weak geometric consistency for large scale image search

Learning Affine Robust Binary Codes Based on Locality Preserving Hash

International Journal of Computer Techniques Volume 4 Issue 1, Jan Feb 2017

Aggregating Descriptors with Local Gaussian Metrics

Supplementary A. Overview. C. Time and Space Complexity. B. Shape Retrieval. D. Permutation Invariant SOM. B.1. Dataset

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong

LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS

Nagoya University at TRECVID 2014: the Instance Search Task

Efficient Re-ranking in Vocabulary Tree based Image Retrieval

SURF. Lecture6: SURF and HOG. Integral Image. Feature Evaluation with Integral Image

Local Image Features

Learning a Fine Vocabulary

LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL

Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials

Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

Spatial Coding for Large Scale Partial-Duplicate Web Image Search

An Evaluation of Volumetric Interest Points

Affine-invariant scene categorization

arxiv: v3 [cs.cv] 3 Oct 2012

SEARCH BY MOBILE IMAGE BASED ON VISUAL AND SPATIAL CONSISTENCY. Xianglong Liu, Yihua Lou, Adams Wei Yu, Bo Lang

Color Local Texture Features Based Face Recognition

THIS paper considers the task of accurate image retrieval

Land-use scene classification using multi-scale completed local binary patterns

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition

Compressed local descriptors for fast image and video search in large databases

TEXTURE CLASSIFICATION METHODS: A REVIEW

Content Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features

LARGE-SCALE image retrieval based on visual features has

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking

Selection of Scale-Invariant Parts for Object Class Recognition

A Bayesian Approach to Hybrid Image Retrieval

Beyond Bags of features Spatial information & Shape models

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

Mixtures of Gaussians and Advanced Feature Encoding

Human detection solution for a retail store environment

Texture Feature Extraction Using Improved Completed Robust Local Binary Pattern for Batik Image Retrieval

Large-scale visual recognition Efficient matching

Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

Improving Shape retrieval by Spectral Matching and Meta Similarity

SEMANTIC-SPATIAL MATCHING FOR IMAGE CLASSIFICATION

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016

A Novel Extreme Point Selection Algorithm in SIFT

Object detection using non-redundant local Binary Patterns

Learning to Match Images in Large-Scale Collections

Robust PDF Table Locator

DescriptorEnsemble: An Unsupervised Approach to Image Matching and Alignment with Multiple Descriptors

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

MULTI-LEVEL 3D CONVOLUTIONAL NEURAL NETWORK FOR OBJECT RECOGNITION SAMBIT GHADAI XIAN LEE ADITYA BALU SOUMIK SARKAR ADARSH KRISHNAMURTHY

CS 223B Computer Vision Problem Set 3

Palm Vein Recognition with Local Binary Patterns and Local Derivative Patterns

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Visual Object Recognition

CS 231A Computer Vision (Fall 2012) Problem Set 3

An Associate-Predict Model for Face Recognition FIPA Seminar WS 2011/2012

arxiv: v1 [cs.lg] 20 Dec 2013

Facial-component-based Bag of Words and PHOG Descriptor for Facial Expression Recognition

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015

arxiv: v1 [cs.cv] 20 Dec 2016

Patch Descriptors. CSE 455 Linda Shapiro

WISE: Large Scale Content Based Web Image Search. Michael Isard Joint with: Qifa Ke, Jian Sun, Zhong Wu Microsoft Research Silicon Valley

A FRAMEWORK FOR ANALYZING TEXTURE DESCRIPTORS

Decorrelated Local Binary Pattern for Robust Face Recognition

Facial Expression Recognition with Emotion-Based Feature Fusion

Transcription:

Supplementary Material for Ensemble Diffusion for Retrieval

Song Bai 1, Zhichao Zhou 1, Jingdong Wang 2, Xiang Bai 1, Longin Jan Latecki 3, Qi Tian 4
1 Huazhong University of Science and Technology, 2 Microsoft Research Asia, 3 Temple University, 4 University of Texas at San Antonio
{songbai,zzc,xbai}@hust.edu.cn, jingdw@microsoft.com, latecki@temple.edu, qi.tian@utsa.edu

This document contains the supplementary material for Ensemble Diffusion for Retrieval. Its primary goal is to provide, in Sec. 1, the proofs of the propositions that are omitted in the main paper due to the space limitation. As a secondary goal, we present additional experiments for a more comprehensive evaluation: quantitative evaluations in Sec. 2, qualitative evaluations in Sec. 3, and parameter discussions in Sec. 4.

1. Propositions

Proposition 1. Eq. (5) converges to Eq. (6).

Proof. By applying $\mathrm{vec}(\cdot)$ to both sides of Eq. (5), we obtain

$$\tilde{A}^{(t+1)} = \alpha S \tilde{A}^{(t)} + (1-\alpha)\tilde{I},$$

where $S = S^{(1)} \otimes S^{(2)}$. Therefore, $\tilde{A}^{(t+1)}$ can be expanded as

$$\tilde{A}^{(t+1)} = (\alpha S)^{t}\,\tilde{A}^{(1)} + (1-\alpha)\sum_{i=0}^{t-1} (\alpha S)^{i}\,\tilde{I}.$$

It is known that the spectral radii of both $S^{(1)}$ and $S^{(2)}$ are no larger than 1. Hence, all the eigenvalues of $S$ are in $[-1, 1]$. Considering that $0 < \alpha < 1$, we have

$$\lim_{t\to\infty} (\alpha S)^{t}\,\tilde{A}^{(1)} = 0, \qquad \lim_{t\to\infty} (1-\alpha)\sum_{i=0}^{t-1} (\alpha S)^{i}\,\tilde{I} = (1-\alpha)(I-\alpha S)^{-1}\tilde{I}.$$

Therefore, one can easily deduce that

$$\lim_{t\to\infty} \tilde{A}^{(t+1)} = (1-\alpha)(I-\alpha S)^{-1}\tilde{I},$$

which is identical to Eq. (6) after applying $\mathrm{vec}^{-1}$ to both sides.

Proposition 2. The closed-form solution of Eq. (7) is Eq. (6).

Proof. Define $Y = N(i-1)+k$, $Z = N(j-1)+l$, $D = D^{(1)} \otimes D^{(2)} \in \mathbb{R}^{N^2 \times N^2}$ and $W = W^{(1)} \otimes W^{(2)} \in \mathbb{R}^{N^2 \times N^2}$. Then the left term of Eq. (7) can be transformed to

$$\frac{1}{2}\sum_{Y,Z=1}^{N^2} W_{YZ}\left(\frac{\tilde{A}_Y}{\sqrt{D_{YY}}}-\frac{\tilde{A}_Z}{\sqrt{D_{ZZ}}}\right)^{2} = \sum_{Y=1}^{N^2}\tilde{A}_Y^{2} - \sum_{Y,Z=1}^{N^2}\frac{\tilde{A}_Y W_{YZ}\tilde{A}_Z}{\sqrt{D_{YY}D_{ZZ}}} = \tilde{A}^{T}\left(I - D^{-1/2} W D^{-1/2}\right)\tilde{A} = \tilde{A}^{T}(I-S)\tilde{A}.$$

The right term can be written as $\mu\,\|\tilde{A}-\tilde{I}\|_2^2$. Therefore, the objective function in Eq. (7) becomes

$$\min_{\tilde{A}}\; \tilde{A}^{T}(I-S)\tilde{A} + \mu\,\|\tilde{A}-\tilde{I}\|_2^2,$$

whose partial derivative with respect to $\tilde{A}$ is

$$2(I-S)\tilde{A} + 2\mu(\tilde{A}-\tilde{I}).$$

To obtain the optimal solution, we set the derivative to zero and substitute $\mu = 1/\alpha - 1$, which yields

$$\tilde{A} = (1-\alpha)(I-\alpha S)^{-1}\tilde{I}.$$

This is equivalent to Eq. (6) after applying $\mathrm{vec}^{-1}$ to both sides.

Proposition 3. Eq. (14) converges to the solution in Eq. (12).

Proof. Eq. (14) can be vectorized to

$$\tilde{A}^{(t+1)} = S\tilde{A}^{(t)} + \Big(1-\sum_{v=1}^{M}\alpha_v\Big)\tilde{I} = S^{t}\,\tilde{A}^{(1)} + \Big(1-\sum_{v=1}^{M}\alpha_v\Big)\sum_{i=0}^{t-1} S^{i}\,\tilde{I},$$

where $S = \sum_{v=1}^{M}\alpha_v S_v$. Similar to Proposition 1, we only need to prove that all the eigenvalues of $S$ are in $(-1, 1)$. Since all the eigenvalues of $S_v$ ($1 \le v \le M$) are in $[-1, 1]$, the spectral radius of $S$ is bounded by $\sum_{v=1}^{M}\alpha_v$. Considering $\mu > 0$, $\beta_v > 0$ and Eq. (13), we have

$$\sum_{v=1}^{M}\alpha_v = \frac{\sum_{v=1}^{M}\beta_v}{\mu + \sum_{v=1}^{M}\beta_v} < 1.$$

Therefore, $\lim_{t\to\infty} S^{t}\,\tilde{A}^{(1)} = 0$, and

$$\lim_{t\to\infty}\tilde{A}^{(t+1)} = \Big(1-\sum_{v=1}^{M}\alpha_v\Big)\Big(I-\sum_{v=1}^{M}\alpha_v S_v\Big)^{-1}\tilde{I}.$$

The proof is complete.
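The proofs above are easy to verify numerically. The following NumPy sketch (our illustration, not part of the original supplement) runs the vectorized update from Proposition 1 on two small, randomly generated normalized similarities and checks that the iterate matches the closed form $(1-\alpha)(I-\alpha S)^{-1}\tilde{I}$; the helper `normalized_similarity` and all sizes are assumptions made for the demo.

```python
# Numerical sanity check for Propositions 1 and 2 (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 5, 0.8

def normalized_similarity(n):
    """Build a symmetrically normalized affinity D^{-1/2} W D^{-1/2},
    whose spectral radius is no larger than 1."""
    W = rng.random((n, n))
    W = (W + W.T) / 2                       # symmetric non-negative affinity
    d = W.sum(axis=1)
    return W / np.sqrt(np.outer(d, d))      # D^{-1/2} W D^{-1/2}

S = np.kron(normalized_similarity(n), normalized_similarity(n))  # S = S^(1) ⊗ S^(2)
I_tilde = np.eye(n).reshape(-1)             # vec of the identity initialization

A = I_tilde.copy()
for _ in range(500):                        # Eq. (5), vectorized: Ã ← αSÃ + (1-α)Ĩ
    A = alpha * S @ A + (1 - alpha) * I_tilde

# Closed form shared by Eq. (6) and Proposition 2.
A_closed = (1 - alpha) * np.linalg.solve(np.eye(n * n) - alpha * S, I_tilde)
print(np.max(np.abs(A - A_closed)))         # tiny residual: the iteration reaches the closed form
```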

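The same kind of check works for the ensemble iteration of Proposition 3. In this sketch (again ours, not from the supplement) the M similarity matrices and the weights β_v are random placeholders, and the α_v follow the form of Eq. (13) reconstructed above.

```python
# Illustrative convergence check for Proposition 3 (placeholders, not the paper's data).
import numpy as np

rng = np.random.default_rng(1)
n, M, mu = 4, 3, 0.5
beta = rng.random(M) + 0.1                  # placeholder positive weights β_v
alpha = beta / (mu + beta.sum())            # α_v from Eq. (13); hence Σ α_v < 1

def normalized_similarity(n):
    W = rng.random((n, n)); W = (W + W.T) / 2
    d = W.sum(axis=1)
    return W / np.sqrt(np.outer(d, d))

S_v = [np.kron(normalized_similarity(n), normalized_similarity(n)) for _ in range(M)]
S = sum(a * s for a, s in zip(alpha, S_v))  # S = Σ_v α_v S_v, spectral radius < 1
I_tilde = np.eye(n).reshape(-1)

A = I_tilde.copy()
for _ in range(2000):                       # Eq. (14), vectorized
    A = S @ A + (1 - alpha.sum()) * I_tilde

A_closed = (1 - alpha.sum()) * np.linalg.solve(np.eye(n * n) - S, I_tilde)
print(np.max(np.abs(A - A_closed)))         # matches the limit stated in Proposition 3
```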
2. Quantitative Evaluation

2.1. Face Retrieval

We follow [3] to evaluate retrieval performance on the ORL face dataset, which comprises 400 grayscale face images divided into 40 categories. The evaluation metric is the bull's eye score, which counts the average recall within the top-15 candidates of the ranking list. For each face image, we extract a 128-dimensional SIFT [6], a 14-dimensional HoG [2], a 3-dimensional LBP [8] and a 512-dimensional GIST [9] descriptor, whose baseline performances are 84.95%, 73.70%, 73.15% and 83.67%, respectively.

The detailed performance of tensor product fusion is given in Table 1, and the comparison of different fusion methods is given in Table 2. Again, RED yields the best performance. Compared with [3], which enumerates 72 kinds of diffusion processes (by varying 4 different affinity initializations, 6 different transition matrices and 3 different update schemes), RED gives a gain of nearly 20 percent. The reasons for the performance gain are twofold. First, instead of using only one similarity, we leverage more than two similarities, which can be handled well by the proposed framework. Second, the proposed RED is a more robust fusion-with-diffusion method with automatic weight learning.

Table 1. The bull's eye scores (%) of tensor product fusion on the ORL dataset.

         SIFT     HOG      LBP      GIST
SIFT     -        90.00    89.75    97.17
HOG      90.60    -        87.5     92.45
LBP      90.50    87.13    -        94.38
GIST     96.83    91.60    93.05    -

Table 2. The performance (%) comparison on the ORL face dataset.

Methods                           Bull's eye score
Generic Diffusion Process [3]     77.4
Naive Fusion                      93.53
Tensor Product Fusion             87.13 to 97.17
RED                               97.75

2.2. The Performance of Tensor Product Fusion

Table 3 and Table 4 present the detailed retrieval performance of tensor product fusion on the Holidays dataset [5] and the Ukbench dataset [7], respectively. The best results (marked in red) and the worst results (marked in blue) are the upper and the lower bounds of the performance of tensor product fusion presented in the main paper.

Table 3. The mAPs of tensor product fusion on the Holidays dataset.

          NetVLAD   SPoC     ResNet   HSV
NetVLAD   -         92.36    91.85    88.4
SPoC      92.46     -        90.0     87.55
ResNet    91.85     90.09    -        85.44
HSV       87.77     87.33    85.1     -

Table 4. The N-S scores of tensor product fusion on the Ukbench dataset.

          NetVLAD   SPoC     ResNet   HSV
NetVLAD   -         3.871    3.874    3.66
SPoC      3.876     -        3.854    3.69
ResNet    3.884     3.861    -        3.69
HSV       3.680     3.685    3.68     -
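For completeness, tensor product fusion of one pair of similarities can be sketched in a few lines by directly reusing the closed form of Eq. (6). This is our own illustration; the function names and the toy inputs are assumptions, with sim_a and sim_b standing in for, e.g., the SIFT and GIST similarities.

```python
# Tensor product fusion of two similarity matrices via the closed form of Eq. (6).
import numpy as np

def symmetric_normalize(W):
    """Return D^{-1/2} W D^{-1/2} for an affinity matrix W."""
    d = W.sum(axis=1)
    return W / np.sqrt(np.outer(d, d))

def tensor_product_fusion(sim_a, sim_b, alpha=0.8):
    """vec(A) = (1 - alpha) * (I - alpha * S_a ⊗ S_b)^{-1} vec(I),
    reshaped back to an N x N fused similarity."""
    n = sim_a.shape[0]
    S = np.kron(symmetric_normalize(sim_a), symmetric_normalize(sim_b))
    vec_A = (1 - alpha) * np.linalg.solve(np.eye(n * n) - alpha * S,
                                          np.eye(n).reshape(-1))
    return vec_A.reshape(n, n)

# Toy usage with two random symmetric affinities.
rng = np.random.default_rng(2)
W = rng.random((20, 20)); sim_a = (W + W.T) / 2
W = rng.random((20, 20)); sim_b = (W + W.T) / 2
fused = tensor_product_fusion(sim_a, sim_b)
ranking = np.argsort(-fused[0])             # ranked list for query 0 under the fused similarity
```

Note that the dense solve on the N² x N² system costs O(N⁶), which is why, for realistic N, one would instead iterate the update of Eq. (5) and exploit the Kronecker structure so that the big matrix is never formed explicitly.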
3. Qualitative Evaluation

In this section, we present three sample retrieval results for qualitative evaluation on the ModelNet40 dataset [12]. In each figure, the retrieval results of the 4 baseline similarities (Volumetric CNN [11], GIFT [1], ResNet [4], PANORAMA [10]) are presented in the first 4 rows, and the retrieval results of the 3 fusion methods (naive fusion, tensor product fusion, RED) are presented in the next 3 rows. The results of tensor product fusion are obtained by fusing ResNet [4] and GIFT [1]. As can be seen from Fig. 2 and Fig. 3, the fusion methods clearly outperform the baseline similarities. Moreover, RED yields 100% retrieval precision in the top-10 retrieved list, which is superior to the other fusion methods. In Fig. 4, one can observe that all 4 baseline similarities fail on this query. However, by exploiting the complementary nature among them, RED still improves the retrieval performance remarkably.

4. Parameter Discussion

In Fig. 1(a), we plot the influence of µ used by RED on the retrieval performance on the Holidays dataset. As can be seen, RED is not sensitive to the change of µ as long as it stays in a reasonable range.
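One way to see why a reasonable range of µ suffices (our own reading, using the weight form α_v = β_v / (µ + Σ_v β_v) from Proposition 3, with placeholder β_v): varying µ rescales all α_v together but leaves their ratios unchanged, so it tunes the overall diffusion strength rather than the relative importance of the similarities.

```python
# Effect of µ on the diffusion weights α_v (illustrative; β_v are placeholders).
import numpy as np

beta = np.array([1.2, 0.8, 0.5])              # placeholder β_v (learned in the main paper)
for mu in [0.05, 0.1, 0.2, 0.3]:
    alpha = beta / (mu + beta.sum())          # the form used in Proposition 3
    print(f"mu={mu:.2f}  alpha={np.round(alpha, 3)}  sum={alpha.sum():.3f}")
# alpha_v / alpha_w is independent of mu, so moderate changes in mu only
# rescale the total diffusion strength (the sum of alpha_v).
```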

Figure 2. The first sample retrieval results on the ModelNet40 dataset. False positives are in red boxes.

Figure 3. The second sample retrieval results on the ModelNet40 dataset. False positives are in red boxes.

As suggested in [3, 13], it is crucial to determine the number of nearest neighbors k on the affinity graph. Fig. 1(b) analyzes the influence of k on the retrieval performance on the Holidays dataset. It can be observed that once k ≥ 5, the performance is significantly improved (around 93 in mAP). As k keeps increasing, the performance drops slightly due to the inclusion of noisy edges on the affinity graph. We can infer that automatically selecting a proper k on the affinity graph remains an open issue.
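To make the role of k concrete, a common way to sparsify an affinity matrix to its k nearest neighbors before diffusion is sketched below (a generic construction under our own assumptions; the paper's exact graph construction may differ):

```python
# Generic k-NN sparsification of an affinity matrix before diffusion (illustrative).
import numpy as np

def knn_affinity(W, k):
    """Keep each node's k strongest edges, then symmetrize.
    Small k risks disconnecting the graph; large k keeps noisy edges."""
    n = W.shape[0]
    A = np.zeros_like(W)
    for i in range(n):
        # indices of the k largest affinities in row i, excluding the node itself
        neighbors = np.argsort(-W[i])
        neighbors = neighbors[neighbors != i][:k]
        A[i, neighbors] = W[i, neighbors]
    return np.maximum(A, A.T)               # symmetrize so the graph stays undirected

# Toy usage: sparsify a dense random affinity with k = 5, the sweet spot in Fig. 1(b).
rng = np.random.default_rng(3)
W = rng.random((50, 50)); W = (W + W.T) / 2
A5 = knn_affinity(W, k=5)
print((A5 > 0).sum(axis=1).mean())          # average degree, roughly between k and 2k
```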

Figure 4. The third sample retrieval results on the ModelNet40 dataset. False positives are in red boxes.

Figure 1. The influence of (a) µ and (b) the number of nearest neighbors k on the Holidays dataset (mAP).

References

[1] S. Bai, X. Bai, Z. Zhou, Z. Zhang, and L. J. Latecki. GIFT: A real-time and scalable 3D shape search engine. In CVPR, 2016.
[2] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, pages 886-893, 2005.
[3] M. Donoser and H. Bischof. Diffusion processes for retrieval revisited. In CVPR, pages 1320-1327, 2013.
[4] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[5] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, pages 304-317, 2008.
[6] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91-110, 2004.
[7] D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In CVPR, pages 2161-2168, 2006.
[8] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. TPAMI, 24(7):971-987, 2002.
[9] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV, 42(3):145-175, 2001.
[10] P. Papadakis, I. Pratikakis, T. Theoharis, and S. J. Perantonis. PANORAMA: A 3D shape descriptor based on panoramic views for unsupervised 3D object retrieval. IJCV, 89(2-3):177-192, 2010.
[11] C. R. Qi, H. Su, M. Niessner, A. Dai, M. Yan, and L. J. Guibas. Volumetric and multi-view CNNs for object classification on 3D data. In CVPR, 2016.

[12] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In CVPR, 2015.
[13] X. Yang, S. Koknar-Tezel, and L. J. Latecki. Locally constrained diffusion process on locally densified distance spaces with applications to shape retrieval. In CVPR, pages 357-364, 2009.