GATED SQUARE-ROOT POOLING FOR IMAGE INSTANCE RETRIEVAL

Size: px
Start display at page:

Download "GATED SQUARE-ROOT POOLING FOR IMAGE INSTANCE RETRIEVAL"

Transcription

1 GATED SQUARE-ROOT POOLING FOR IMAGE INSTANCE RETRIEVAL Ziqian Chen 1, Jie Lin 2, Vijay Chandrasekhar 2,3, Ling-Yu Duan 1 Peking University, China 1 I 2 R, Singapore 2 NTU, Singapore 3 ABSTRACT Recently Convolutional Neural Networks (CNNs) have achieved great success in different fields including image instance retrieval. However traditional global pooling approaches fail to capture all possible discriminative information of CNN activations and treat activations over channels equally regardless of the different importance between channels. In this work, we focus on the mentioned problem of global feature pooling over CNN activations for image instance retrieval. We make two contributions. First, we introduce a channel-wise SQUare-root (SQU) pooling (2- norm) approach,which makes better use of information over activation maps and is superior to Average (1-norm) and Max pooling (infinity norm), in the context of instance retrieval. Second, we further improve SQU by learning a gating function that weights the contributions of different channels, in an end-to-end manner. Extensive experiments on 6 benchmark datasets show that the proposed strategies achieve considerable improvements over state-of-the-art. Index Terms Instance Retrieval, CNN, Square-root pooling, Learning to gate. 1. INTRODUCTION Image instance retrieval is the discovery of images from a database representing the same object or scene as the one depicted in a query image. In this field, descriptors play a significant role in its performance. Previous works for image instance retrieval are mainly image-level descriptors aggregated from handcrafted local features, such as the well-known SIFT [1], VLAD [2] and Fisher vectors [3]. These approaches can be further improved by numerous strategies [4, 5, 6, 7] Motivated by the remarkable success of CNNs [8, 9, 10] in image classification [11], CNN-based descriptors [12, 13, 14, 15, 16, 17, 18] have been progressively replacing handcrafted descriptors [6] as state-of-the-art for image instance retrieval. Generally speaking, deeply learned descriptors in image instance retrieval are aggregated by pooling over activation maps extracted from intermediate layers [19, 20, 12, 13, 14].Initial study [20] proposed to use representations extracted from fully connected layer of CNN.More compact descriptors [12] can be derived by performing either global max The first two authors contributed equally. (MAC [14]) or average pooling (e.g. SPoC [12]) over activation maps output by convolutional/pooling layers. However, global max pooling focus on the max activation and global average pooling regards low and high response the same and thus these approaches may ignore potentially discriminative information in a single activation map. Further improvements are obtained by spatial and channel-wise weighting of activation maps [13], making us noticing that CNN-based features over different channels may devote different contributions to the performance. Inspired by R-CNN [21] for object detection, Tolias et al. [14] proposed ROI-based pooling on activation maps, Regional Maximum Activation of Convolutions (R-MAC), which significantly improves global pooling approaches. However, R-MAC still follows the idea of Maximum pooling, which will discard some potentially necessary information and also treat different channels equally. In recent works [20, 15, 16, 17, 18], image classification pre-trained CNNs are repurposed for instance retrieval, by fine-tuning them with ranking loss functions such as contrastive loss [22] and triplet loss [23]. Results show that finetuned models outperform pre-trained ones by a large margin, when training and test datasets belong to similar domains. For the better usage of potentially discriminative information in a CNN feature map, we introduce Square-root pooling (SQU) as an alternative operator to regular global average and max pooling, in the context of instance retrieval.squ improves the quality of global descriptors with either global average or max pooling, which is our first contribution. Our second contribution is we propose to further boost the performance of SQU by learning a channel-wise gating function to weight the importance of different channels, termed Gated- SQU. In particular, to adapt the gating function for instance retrieval, we integrate GatedSQU layer with ranking loss for learning the gating parameters in an end-to-end manner. We perform systematical evaluations for both SQU and Gated- SQU on 6 benchmark datasets. Results demonstrate the effectiveness of SQU and GatedSQU over state-of-the-art. 2. METHOD 2.1. Aggregated CNN Descriptors for Instance Retrieval Consider an image X as input to CNN, we describe the image with activation maps extracted from intermediate layer, denoted as X = {x 1,..., x C }, where x c represents a activation

2 map of width W and height H, C is the number of channels. Global feature pooling is performed to convert each activation map with size N = W H into a single value, f α (x c ) = ( 1 N N (x c,i ) α ) 1 α. (1) i=1 f α (x c ) for all channels are concatenated to form a C- dimensional CNN descriptor. Eq. 1 encompasses recent works on global pooling [12, 13, 14] for image retrieval. For instance, α = 1 represents 1-norm average pooling for SPoC [12], while α = denotes max pooling for MAC [14]. The aggregated global CNN descriptors can be further improved by post-processing techniques such as PCA whitening [12, 13, 14]. Specifically, the descriptors are firstly L2 normalized, followed by PCA projection and whitening with a pre-trained PCA matrix. Finally, the whitened vectors are L2 normalized and compared with inner product SQU: SQUare-root pooling Theoretical analysis [24] has shown that max (or average) pooling may perform better than average (or max), depending on dataset as well as features. This motivates us to find an intermediate α-norm between and 1 that is superior to either of them. A natural choice could be Square-root pooling (SQU) with α = 2, i.e. fα SQU (x c ) = ( 1 N N i=1 x c,i 2 ) 1 2. In fact, SQU suppressed other pooling operators in image classification with Bag-of-Words built on SIFT [24]. Here, we explore the feasibility of SQU in the context of image instance retrieval with CNN features. Retrieval experiments in Section 3.3 highlight that SQU outperforms MAC and SPoC by a large margin GatedSQU: learning to gate SQU Recently, Kalantidis et al. [13] proposed CroW to improve SPoC by spatial and channel-wise weighted average pooling. However, we observe that CroW is limited to SPoC, there is a significant drop in performance if apply CroW to other pooling operations like MAC. Instead of the heuristic approach like CroW, we propose to learn a data-driven gating function that weights the contributions of channels. Moreover, the gating mechanism can benefit from domain-specific knowledge for the purpose of image instance retrieval. To this end, we design an end-to-end pipeline to learn a desirable gating function with deep neural networks. First, we crop the CNN to the last pooling layer (e.g. pool5), then append a new GatedSQU layer which performs pooling over activation maps output by pool5, finally followed by a triplet loss tailored for the retrieval task. In particular, the GatedSQU layer applies a gating function on SQU. GatedSQU layer. There are many ways to learn a gating function [25]. In this work, we design a simple channel-wise gating function on SQU, which is defined as fα GatedSQU (x c ) = σ(s w c )fα SQU (x c ), (2) where w = {w 1,...w C } denote channel-wise weights, σ(.) is the sigmoid function. s is a scale constant to control the speed of driving σ(.) towards 0 or 1 (i.e. a logic gate over channels). Eq. 2 is differentiable, thus, the gating parameters w can be optimized via stochastic gradient descent. More precisely, we compute the gradient of Eq. 2 w.r.t gating parameter w c and activation map element x c,i as follows f GatedSQU α wc = σ(s w c )(1 σ(s w c ))f SQU α (x c ), (3) f GatedSQU α xc,i = x c,iσ(s w c ) Nf SQU α (x c ) Finally, we apply l 2 normalization after GatedSQU, resulting in a C-dimensional normalized descriptor. Learning with triplet Loss. Following existing works [15, 17, 18], we choose the triplet loss which is widely used for the retrieval task. A triplet (X q, X +, X ) contains a query image X q, a positive image X + and a negative image X. Query X q is more similar to positive image X + than to negative image X. Image similarity is measured by l 2 normalized GatedSQU descriptors. Thus, the triplet needs to meet the condition that k(x q, X + ) > k(x q, X ), where k(, ) denotes the similarity of a pair. Accordingly, we define the triplet loss as L q,+, = max{0, m + k(x q, X ) k(x q, X + )}, where m is a positive margin constant Datasets and Metrics 3. EXPERIMENTS We evaluate our method on six benchmark datasets. INRIA Holidays [26] dataset is composed of 1491 scene-centric images, 500 of them are queries. Following [20], we use the rotated version of Holidays, where all images are with upright orientation. Oxford5k [27] and Paris6k [28] are buildings datasets respectively consisting of 5062 and 6412 images. For both datasets, there are 55 queries composed of 11 landmarks, each represented by 5 queries. To evaluate the performance at large scale, we additionally combine 100k Flickr images [27] with Oxford5k and Paris6k respectively, referred to as Oxford105k and Paris106k from here on. The University of Kentucky Benchmark (UKBench) [29] consists of VGA size images, organized into 2550 groups of common objects, each object represented by 4 images. All images are serving as queries. Following the standard protocols, for Holidays, Oxford5k and Paris6k, retrieval performances are measured by mean Average Precision (map). For UKBench, we report the average number of true positives within the top 4 returned images (4 Recall@4). (4)

3 Table 1. Comparison of SQU with MAC [14], SPoC [12] and R-MAC [14]. The former (latter) number in each cell represents performance generated by off-the-shelf AlexNet (VGG- 16). represents we generate the results of MAC, SPoC and R-MAC, based on the authors released codes. All experiments are performed without PCA Whitening. Method Holidays Oxford5k Paris6k UKBench R-MAC 79.8/ / / /3.73 MAC 73.7/ / / /3.65 SPoC 77.5/ / / /3.68 SQU 81.0/ / / / Implementation Notes In this work, we consider 2 CNN architectures : AlexNet [8] and VGG16 [9]. We test both off-the-shelf networks pretrained on ImageNet ILSVRC classification data set and finetuned one tailored for image retrieval [16]. To learn the gating parameters w in the GatedSQU layer, we leverage the training dataset released by [16], referred to as 3D-Landmarks from here on. 3D-Landmarks contains images, organized into 713 clusters of famous landmarks worldwide. Following [16], we select 551 clusters (22156 images, 5974 of them are queries) for training and 162 clusters (6403 images, 1691 of them are queries) for validation. Each training tuple contains 1 query, 1 positive and 5 hard negative images. We sample hard negatives by 3 criteria: (1) query and negative belong to different clusters, (2) negative has most similar descriptor to query and (3) 5 negatives for each query are from different clusters. We initialize the gating parameters w = 0 (i.e. equal contribution 0.5 for all channels) and set scale factor s = 10 for the GatedSQU layer, margin m 0.1, learning rate dividing by 2 every 5 epochs, moment 0.9, weight decay 0.001, and batch size of 5 tuples. To accelerate feature extraction, we resize all training and validation images to with aspect ratio killed. We train the network for 30 epochs. The trained model with the highest map on validation set is chosen for testing. Finally, post-processing can be applied to the GatedSQU descriptors. We choose PCA whitening in this work. To be consistent with most related papers [12, 14, 13], we learn PCA matrix on Paris6k when evaluating on Oxford5k and vice versa. For Holidays and UKBench, we simply use the 3D-Landmarks dataset for PCA learning Evaluation on SQU SQU vs. MAC, SPoC and R-MAC. We conduct retrieval experiments following standard practice in [14, 16, 17]: (1) Image size. We use the original image for Oxford5k, Paris6k and UKBench, while for Holidays we down sample the resolution to longer side equals to 1024 with aspect ratio maintained and (2) Cropped query. For Oxford5k and Paris6k, we crop query images using the provided bounding boxes. These setups are applied to the subsequent sections as well. Table 2. Comparison of GatedSQU with SQU and CroW [13]. denotes our implementations of SPoC and CroW based on the authors released codes. All experiments are performed without PCA whitening. Method Holidays Oxford5k Paris6k UKBench SPoC CroW SQU GatedSQU SQU GatedSQU Table 1 shows the comparisons of SQU with MAC, SPoC and R-MAC, using off-the-shelf AlexNet and VGG16. We observe that SQU outperforms MAC and SPoC by a large margin on all datasets for both networks. Compared to R- MAC, SQU performs worse on Oxford5k and Paris6k, but achieves better performance on Holidays and UKbench Evaluation on GatedSQU We train the GatedSQU layer with 2 configurations: Gated- SQU with off-the-shelf VGG16 pre-trained on ImageNet (denotes as GatedSQU), and GatedSQU with fine-tuned VGG16 for instance retrieval task [16] (denotes as GatedSQU ). From SQU to GatedSQU. Table 2 studies the advantage of GatedSQU over SQU. GatedSQU consistently improves its SQU counterpart on all four test datasets. For GatedSQU, similar trends can be observed on Oxford5k and Paris6k. Overall, the improvements on Oxford5k and Paris6k is relatively larger than on Holidays and UKBench. This is reasonable as 3D-Landmarks training set is building-oriented like Oxford5k and Paris6k, while Holidays and UKBench are scene-centric and object-centric respectively. GatedSQU vs. CroW. Table 2 compares GatedSQU to CroW [13], which performs both channel weighting and spatial weighting to improve SPoC [12]. First, consistently with [13], we find CroW is superior to SPoC. Second, SQU performs comparable or better than CroW on all datasets. GatedSQU further increases the gap against CroW. Off-the-shelf vs. Fine-tuned. Table 2 also compares GatedSQU with GatedSQU. We observe GatedSQU largely outperforms GatedSQU on buildings dataset (e.g. from 64.1% from 76.3% on Oxford5k), but is slightly worse than GatedSQU on Holidays and UKBench. Again, this is probably due to the fine-tuned network is biased toward a buildingcentric retrieval set. Training on more diverse datasets would improve the generalization ability of the network to general image retrieval scenarios Comparison with state-of-the-art In this section, we compare PCA whitened GatedSQU with the state-of-the-art. We split the comparisons into two parts according to the network used for experiments: off-the-shelf VGG16 and fine-tuned VGG16.

4 Table 3. Comparison of GatedSQU with the state-of-the-art, using off-the-shelf VGG16. Post.: post-processing, Dim.: dimensionality. denotes query images for Oxford5k/Paris6k are without query object cropping, results reported in another paper [16], query object cropping is performed on activation maps rather than images for Oxford5k/Paris6k, our implementations based on the authors codes, following the same setup as GatedSQU. Gray background refers to global pooling approaches, in which the best results are highlighted in bold. Method Post. Dim. Holidays Oxford5k Paris6k UKBench Oxford105k Paris106k Triang.+ Democr. aggr. [6] PCA Triang.+ Democr. aggr. [6] PCA Neural Codes [20] PCA Orderless Pooling [19] PCAW R-MAC [14] PCAW MAC [14] PCAW SPoC [12] PCAW SPoC [12] PCAW CroW [13] PCAW GatedSQU (Ours) PCAW Table 4. Comparison of GatedSQU with the state-of-the-art, using fine-tuned VGG16. Most of the marks are with the same meaning as in Table 3, expect that denotes PCA is trained end-to-end with tri-r-mac [17, 18]. Method Post. Dim. Holidays Oxford5k Paris6k UKBench Oxford105k Paris106k Neural Codes [20] PCA NetVLAD [15] PCAW NetVLAD [15] PCAW tri-r-mac [17, 18] PCAW tri-r-mac + RPN [17, 18] PCAW sia-r-mac [16] PCAW sia-r-mac [16] Lw sia-mac [16] PCAW sia-mac [16] Lw GatedSQU (Ours) PCAW Off-the-shelf network. Table 3 compares GatedSQU to state-of-the-art with off-the-shelf VGG16. Most of the approaches are post-processed with PCA whitening, and with the same dimensionality (i.e., 512). We observe that Gated- SQU performs consistently better than other global pooling operations (in gray background) on all six test datasets. More importantly, GatedSQU is superior to R-MAC on all datasets expect Paris6k and Paris106k. Fine-tuned network. Table 4 compares GatedSQU to the state-of-the-art with VGG16 fine-tuned. GatedSQU outperforms NetVLAD [15] by a large margin on Oxford5k and Paris6k, e.g. +20% when NetVLAD performs query cropping directly on images (denoted as ), while it performs a bit worse on Holidays and UKBench. We compare GatedSQU to sia-mac/sia-r-mac [16] which fine-tuned off-the-shelf VGG16 + MAC pooling with contrastive loss. Both sia-mac and sia-r-mac are post-processed by either PCA whitening or more advanced discriminative projection learned with labels (denoted as Lw) [16, 30]. We observe that GatedSQU significantly outperforms all variants of sia-mac/sia-r-mac on all datasets. In Table 4, we also present the best results of ROI-based pooling method tri-r-mac [17, 18], which simultaneously trains all VGG16 layers, RPN layers, PCA whitening and R-MAC pooling with triplet loss in an end-to-end framework, with a much larger training set which contains 200k noisy landmark images and 50k clean ones selected from the noisy set. Our global GatedSQU performs slightly better than tri-r-mac without RPN (i.e. with default ROI sampling in R-MAC) on Oxford5k and Paris6k, though the GatedSQU layer and PCA whitening are trained separately with the much smaller 3D-Landmarks dataset. 4. CONCLUSION In this work, we study the problem of pooling on deep convolutional features for image instance retrieval. We propose two strategies to improve global pooling. The first is a squareroot pooling (SQU), a 2-norm operator between the classic 1-norm average and infinite norm max operations. The second is a gating function that further enhances SQU. Both SQU and GatedSQU achieve remarkable performance compared to state-of-the-art. 5. ACKNOWLEDGE This work is supported by the National Natural Science Foundation of China ( ,U ) and in part by the PKU-NTU Joint Research Institute (JRI) sponsored by a donation from the Ng Teng Fong Charitable Foundation. And this work is partially supported by A*STAR AME Programmatic Fund A1892b0026.

5 6. REFERENCES [1] D. Lowe, Distinctive Image Features from Scaleinvariant Keypoints, IJCV, vol. 60, no. 2, pp , [2] H. Jégou, M. Douze, C. Schmid, and P. Perez, Aggregating Local Descriptors into a Compact Image Representation, in CVPR, [3] F. Perronnin, Y. Liu, J. Sanchez, and H. Poirier, Largescale Image Retrieval with Compressed Fisher Vectors, in CVPR, [4] R. Arandjelović and A. Zisserman, All about vlad, in CVPR, [5] G. Tolias, Y. Avrithis, and H. Jegou, To aggregate or not to aggregate: Selective match kernels for image search, in ICCV, [6] Hervé Jégou and Andrew Zisserman, Triangulation embedding and democratic aggregation for image search, in CVPR, [7] Relja Arandjelović and Andrew Zisserman, Three things everyone should know to improve object retrieval, in CVPR, [8] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton, Imagenet classification with deep convolutional neural networks, in NIPS, [9] K Simonyan and A Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, in ICLR, [10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep residual learning for image recognition, CVPR, [11] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, Imagenet: A large-scale hierarchical image database, in CVPR, [12] A. Babenko and V. Lempitsky, Aggregating local deep features for image retrieval, in ICCV, [13] Yannis Kalantidis, Clayton Mellina, and Simon Osindero, Cross-dimensional weighting for aggregated deep convolutional features, in arxiv: , [14] Giorgos Tolias, Ronan Sicre, and Hervé Jégou, Particular object retrieval with integral max-pooling of CNN activations, in ICLR, [15] R. Arandjelović, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, NetVLAD: CNN architecture for weakly supervised place recognition, in CVPR, [16] Filip Radenović, Giorgos Tolias, and Ondřej Chum, CNN image retrieval learns from BoW: Unsupervised fine-tuning with hard examples, in ECCV, [17] Albert Gordo, Jon Almazán, Jérome Revaud, and Diane Larlus, Deep image retrieval: Learning global representations for image search, in ECCV, [18] Albert Gordo, Jon Almazán, Jérome Revaud, and Diane Larlus, End-to-end learning of deep visual representations for image retrieval, in IJCV, [19] Yunchao Gong, Liwei Wang, Ruiqi Guo, and Svetlana Lazebnik, Multi-scale orderless pooling of deep convolutional activation features, in ECCV, [20] Artem Babenko, Anton Slesarev, Alexandr Chigorin, and Victor Lempitsky, Neural Codes for Image Retrieval, in ECCV, [21] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in CVPR, [22] Sumit Chopra, Raia Hadsell, and Yann Lecun, Learning a similarity metric discriminatively, with application to face verification, in CVPR, [23] J.Wang, Y.Wang, and et al., Learning Fine-grained Image Similarity with Deep Ranking, in CVPR, [24] Y-Lan Boureau, Jean Ponce, and Yann LeCun, A theoretical analysis of feature pooling in vision algorithms, in ICML, [25] Chen-Yu Lee, Patrick W. Gallagher, and Zhuowen Tu, Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree, in AISTATS, [26] H. Jégou, M. Douze, and C. Schmid, Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search, in ECCV, [27] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, Object Retrieval with Large Vocabularies and Fast Spatial Matching, in CVPR, [28] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, Lost in Quantization - Improving Particular Object Retrieval in Large Scale Image Databases, in CVPR, [29] D. Nistér and H. Stewénius, Scalable Recognition with a Vocabulary Tree, in CVPR, [30] Krystian Mikolajczyk and Jiri Matas, Improving descriptors for fast tree matching by optimal linear projection, in ICCV.

Aggregated Deep Feature from Activation Clusters for Particular Object Retrieval

Aggregated Deep Feature from Activation Clusters for Particular Object Retrieval Aggregated Deep Feature from Activation Clusters for Particular Object Retrieval Zhenfang Chen Zhanghui Kuang The University of Hong Kong Hong Kong zfchen@cs.hku.hk Sense Time Hong Kong kuangzhanghui@sensetime.com

More information

arxiv: v1 [cs.cv] 1 Feb 2017

arxiv: v1 [cs.cv] 1 Feb 2017 Siamese Network of Deep Fisher-Vector Descriptors for Image Retrieval arxiv:702.00338v [cs.cv] Feb 207 Abstract Eng-Jon Ong, Sameed Husain and Miroslaw Bober University of Surrey Guildford, UK This paper

More information

arxiv: v2 [cs.cv] 30 Jul 2016

arxiv: v2 [cs.cv] 30 Jul 2016 arxiv:1512.04065v2 [cs.cv] 30 Jul 2016 Cross-dimensional Weighting for Aggregated Deep Convolutional Features Yannis Kalantidis, Clayton Mellina and Simon Osindero Computer Vision and Machine Learning

More information

TWO-STAGE POOLING OF DEEP CONVOLUTIONAL FEATURES FOR IMAGE RETRIEVAL. Tiancheng Zhi, Ling-Yu Duan, Yitong Wang, Tiejun Huang

TWO-STAGE POOLING OF DEEP CONVOLUTIONAL FEATURES FOR IMAGE RETRIEVAL. Tiancheng Zhi, Ling-Yu Duan, Yitong Wang, Tiejun Huang TWO-STAGE POOLING OF DEEP CONVOLUTIONAL FEATURES FOR IMAGE RETRIEVAL Tiancheng Zhi, Ling-Yu Duan, Yitong Wang, Tiejun Huang Institute of Digital Media, School of EE&CS, Peking University, Beijing, 100871,

More information

Neural Codes for Image Retrieval. David Stutz July 22, /48 1/48

Neural Codes for Image Retrieval. David Stutz July 22, /48 1/48 Neural Codes for Image Retrieval David Stutz July 22, 2015 David Stutz July 22, 2015 0/48 1/48 Table of Contents 1 Introduction 2 Image Retrieval Bag of Visual Words Vector of Locally Aggregated Descriptors

More information

DEEP REGIONAL FEATURE POOLING FOR VIDEO MATCHING

DEEP REGIONAL FEATURE POOLING FOR VIDEO MATCHING DEEP REGIONAL FEATURE POOLING FOR VIDEO MATCHING Yan Bai 12 Jie Lin 3 Vijay Chandrasekhar 34 Yihang Lou 12 Shiqi Wang 4 Ling-Yu Duan 1 Tiejun Huang 1 Alex Kot 4 1 Institute of Digital Media Peking University

More information

CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples

CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples Filip Radenović Giorgos Tolias Ondřej Chum CMP, Faculty of Electrical Engineering, Czech Technical University in Prague

More information

Geometric VLAD for Large Scale Image Search. Zixuan Wang 1, Wei Di 2, Anurag Bhardwaj 2, Vignesh Jagadesh 2, Robinson Piramuthu 2

Geometric VLAD for Large Scale Image Search. Zixuan Wang 1, Wei Di 2, Anurag Bhardwaj 2, Vignesh Jagadesh 2, Robinson Piramuthu 2 Geometric VLAD for Large Scale Image Search Zixuan Wang 1, Wei Di 2, Anurag Bhardwaj 2, Vignesh Jagadesh 2, Robinson Piramuthu 2 1 2 Our Goal 1) Robust to various imaging conditions 2) Small memory footprint

More information

arxiv: v1 [cs.cv] 26 Oct 2015

arxiv: v1 [cs.cv] 26 Oct 2015 Aggregating Deep Convolutional Features for Image Retrieval Artem Babenko Yandex Moscow Institute of Physics and Technology artem.babenko@phystech.edu Victor Lempitsky Skolkovo Institute of Science and

More information

End-to-end Learning of Deep Visual Representations for Image Retrieval

End-to-end Learning of Deep Visual Representations for Image Retrieval Noname manuscript No. (will be inserted by the editor) End-to-end Learning of Deep Visual Representations for Image Retrieval Albert Gordo Jon Almazán Jerome Revaud Diane Larlus Instance-level image retrieval,

More information

arxiv: v1 [cs.cv] 24 Apr 2015

arxiv: v1 [cs.cv] 24 Apr 2015 Object Level Deep Feature Pooling for Compact Image Representation arxiv:1504.06591v1 [cs.cv] 24 Apr 2015 Konda Reddy Mopuri and R. Venkatesh Babu Video Analytics Lab, SERC, Indian Institute of Science,

More information

arxiv: v1 [cs.cv] 4 Jul 2017

arxiv: v1 [cs.cv] 4 Jul 2017 arxiv:17.009v1 [cs.cv] 4 Jul 2017 ABSTRACT Tuan Hoang Singapore University of Technology and Design nguyenanhtuan_hoang@mymail.sutd.edu.sg Dang-Khoa Le Tan Singapore University of Technology and Design

More information

Compressing Deep Neural Networks for Recognizing Places

Compressing Deep Neural Networks for Recognizing Places Compressing Deep Neural Networks for Recognizing Places by Soham Saha, Girish Varma, C V Jawahar in 4th Asian Conference on Pattern Recognition (ACPR-2017) Nanjing, China Report No: IIIT/TR/2017/-1 Centre

More information

CONTENT based image retrieval has received a sustained attention over the last decade, leading to

CONTENT based image retrieval has received a sustained attention over the last decade, leading to PARTICULAR OBJECT RETRIEVAL WITH INTEGRAL MAX-POOLING OF CNN ACTIVATIONS Giorgos Tolias Center for Machine Perception FEE CTU Prague Ronan Sicre Irisa Rennes Hervé Jégou Facebook AI Research ABSTRACT arxiv:1511.05879v2

More information

IN instance image retrieval an image of a particular object,

IN instance image retrieval an image of a particular object, Fine-tuning CNN Image Retrieval with No Human Annotation Filip Radenović Giorgos Tolias Ondřej Chum arxiv:7.0252v [cs.cv] 3 Nov 207 Abstract Image descriptors based on activations of Convolutional Neural

More information

DeepIndex for Accurate and Efficient Image Retrieval

DeepIndex for Accurate and Efficient Image Retrieval DeepIndex for Accurate and Efficient Image Retrieval Yu Liu, Yanming Guo, Song Wu, Michael S. Lew Media Lab, Leiden Institute of Advance Computer Science Outline Motivation Proposed Approach Results Conclusions

More information

Compact Deep Invariant Descriptors for Video Retrieval

Compact Deep Invariant Descriptors for Video Retrieval Compact Deep Invariant Descriptors for Video Retrieval Yihang Lou 1,2, Yan Bai 1,2, Jie Lin 4, Shiqi Wang 3,5, Jie Chen 2,5, Vijay Chandrasekhar 3,5, Lingyu Duan 2,5, Tiejun Huang 2,5, Alex Chichung Kot

More information

Efficient Object Instance Search Using Fuzzy Objects Matching

Efficient Object Instance Search Using Fuzzy Objects Matching Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Efficient Object Instance Search Using Fuzzy Objects Matching Tan Yu, 1 Yuwei Wu, 1,2 Sreyasee Bhattacharjee, 1 Junsong

More information

arxiv: v1 [cs.cv] 9 Jul 2017

arxiv: v1 [cs.cv] 9 Jul 2017 JIMENEZ, ALVAREZ, GIRO-I-NIETO: CLASS-WEIGHTED CONV. FEATS 1 arxiv:1707.02581v1 [cs.cv] 9 Jul 2017 Class-Weighted Convolutional Features for Visual Instance Search Albert Jimenez 12 albertjimenezsanfiz@gmail.com

More information

Mixtures of Gaussians and Advanced Feature Encoding

Mixtures of Gaussians and Advanced Feature Encoding Mixtures of Gaussians and Advanced Feature Encoding Computer Vision Ali Borji UWM Many slides from James Hayes, Derek Hoiem, Florent Perronnin, and Hervé Why do good recognition systems go bad? E.g. Why

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

IMPROVING VLAD: HIERARCHICAL CODING AND A REFINED LOCAL COORDINATE SYSTEM. Christian Eggert, Stefan Romberg, Rainer Lienhart

IMPROVING VLAD: HIERARCHICAL CODING AND A REFINED LOCAL COORDINATE SYSTEM. Christian Eggert, Stefan Romberg, Rainer Lienhart IMPROVING VLAD: HIERARCHICAL CODING AND A REFINED LOCAL COORDINATE SYSTEM Christian Eggert, Stefan Romberg, Rainer Lienhart Multimedia Computing and Computer Vision Lab University of Augsburg ABSTRACT

More information

Part Localization by Exploiting Deep Convolutional Networks

Part Localization by Exploiting Deep Convolutional Networks Part Localization by Exploiting Deep Convolutional Networks Marcel Simon, Erik Rodner, and Joachim Denzler Computer Vision Group, Friedrich Schiller University of Jena, Germany www.inf-cv.uni-jena.de Abstract.

More information

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Deep learning for object detection. Slides from Svetlana Lazebnik and many others Deep learning for object detection Slides from Svetlana Lazebnik and many others Recent developments in object detection 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before deep

More information

Oriented pooling for dense and non-dense rotation-invariant features

Oriented pooling for dense and non-dense rotation-invariant features ZHAO, JÉGOU, GRAVIER: ORIENTED POOLING 1 Oriented pooling for dense and non-dense rotation-invariant features Wan-Lei Zhao 1 http://www.cs.cityu.edu.hk/~wzhao2 Hervé Jégou 1 http://people.rennes.inria.fr/herve.jegou

More information

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong TABLE I CLASSIFICATION ACCURACY OF DIFFERENT PRE-TRAINED MODELS ON THE TEST DATA

More information

Large-Scale Image Retrieval with Attentive Deep Local Features

Large-Scale Image Retrieval with Attentive Deep Local Features Large-Scale Image Retrieval with Attentive Deep Local Features Hyeonwoo Noh André Araujo Jack Sim Tobias Weyand Bohyung Han POSTECH, Korea {shgusdngogo,bhhan}@postech.ac.kr Google Inc. {andrearaujo,jacksim,weyand}@google.com

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Panorama to panorama matching for location recognition

Panorama to panorama matching for location recognition Panorama to panorama matching for location recognition Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Teddy Furon, Ondřej Chum To cite this version: Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Teddy Furon,

More information

Compressed local descriptors for fast image and video search in large databases

Compressed local descriptors for fast image and video search in large databases Compressed local descriptors for fast image and video search in large databases Matthijs Douze2 joint work with Hervé Jégou1, Cordelia Schmid2 and Patrick Pérez3 1: INRIA Rennes, TEXMEX team, France 2:

More information

arxiv: v1 [cs.cv] 29 Mar 2018

arxiv: v1 [cs.cv] 29 Mar 2018 Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking Filip Radenović 1 Ahmet Iscen 1 Giorgos Tolias 1 Yannis Avrithis Ondřej Chum 1 1 VRG, FEE, CTU in Prague Inria Rennes arxiv:1803.1185v1

More information

Object Detection Based on Deep Learning

Object Detection Based on Deep Learning Object Detection Based on Deep Learning Yurii Pashchenko AI Ukraine 2016, Kharkiv, 2016 Image classification (mostly what you ve seen) http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf

More information

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

More information

Bilinear Models for Fine-Grained Visual Recognition

Bilinear Models for Fine-Grained Visual Recognition Bilinear Models for Fine-Grained Visual Recognition Subhransu Maji College of Information and Computer Sciences University of Massachusetts, Amherst Fine-grained visual recognition Example: distinguish

More information

Large-scale visual recognition Efficient matching

Large-scale visual recognition Efficient matching Large-scale visual recognition Efficient matching Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline!! Preliminary!! Locality Sensitive Hashing: the two modes!! Hashing!! Embedding!!

More information

Regional Attention Based Deep Feature for Image Retrieval

Regional Attention Based Deep Feature for Image Retrieval KIM ET AL.: REGIONAL ATTENTION BASED DEEP FEATURE FOR IMAGE RETRIEVAL 1 Regional Attention Based Deep Feature for Image Retrieval Jaeyoon Kim jaeyoon1603@gmail.com Sung-Eui Yoon sungeui@kaist.edu School

More information

Multiple VLAD encoding of CNNs for image classification

Multiple VLAD encoding of CNNs for image classification Multiple VLAD encoding of CNNs for image classification Qing Li, Qiang Peng, Chuan Yan 1 arxiv:1707.00058v1 [cs.cv] 30 Jun 2017 Abstract Despite the effectiveness of convolutional neural networks (CNNs)

More information

Supplementary Material for Ensemble Diffusion for Retrieval

Supplementary Material for Ensemble Diffusion for Retrieval Supplementary Material for Ensemble Diffusion for Retrieval Song Bai 1, Zhichao Zhou 1, Jingdong Wang, Xiang Bai 1, Longin Jan Latecki 3, Qi Tian 4 1 Huazhong University of Science and Technology, Microsoft

More information

Negative evidences and co-occurrences in image retrieval: the benefit of PCA and whitening

Negative evidences and co-occurrences in image retrieval: the benefit of PCA and whitening Negative evidences and co-occurrences in image retrieval: the benefit of PCA and whitening Hervé Jégou, Ondrej Chum To cite this version: Hervé Jégou, Ondrej Chum. Negative evidences and co-occurrences

More information

Faster R-CNN Features for Instance Search

Faster R-CNN Features for Instance Search Faster R-CNN Features for Instance Search Amaia Salvador, Xavier Giró-i-Nieto, Ferran Marqués Universitat Politècnica de Catalunya (UPC) Barcelona, Spain {amaia.salvador,xavier.giro}@upc.edu Shin ichi

More information

Robust Scene Classification with Cross-level LLC Coding on CNN Features

Robust Scene Classification with Cross-level LLC Coding on CNN Features Robust Scene Classification with Cross-level LLC Coding on CNN Features Zequn Jie 1, Shuicheng Yan 2 1 Keio-NUS CUTE Center, National University of Singapore, Singapore 2 Department of Electrical and Computer

More information

arxiv: v1 [cs.cv] 3 Aug 2018

arxiv: v1 [cs.cv] 3 Aug 2018 LATE FUSION OF LOCAL INDEXING AND DEEP FEATURE SCORES FOR FAST IMAGE-TO-VIDEO SEARCH ON LARGE-SCALE DATABASES Savas Ozkan, Gozde Bozdagi Akar Multimedia Lab., Middle East Technical University, 06800, Ankara,

More information

Compact Global Descriptors for Visual Search

Compact Global Descriptors for Visual Search Compact Global Descriptors for Visual Search Vijay Chandrasekhar 1, Jie Lin 1, Olivier Morère 1,2,3, Antoine Veillard 2,3, Hanlin Goh 1,3 1 Institute for Infocomm Research, Singapore 2 Université Pierre

More information

Group Invariant Deep Representations for Image Instance Retrieval

Group Invariant Deep Representations for Image Instance Retrieval arxiv:1601.02093v2 [cs.cv] 13 Jan 2016 CBMM Memo No. 043 January 11, 2016 roup Invariant Deep Representations for Image Instance Retrieval by Olivier Morère, Antoine Veillard, Jie Lin, Julie Petta, Vijay

More information

Large-scale visual recognition The bag-of-words representation

Large-scale visual recognition The bag-of-words representation Large-scale visual recognition The bag-of-words representation Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline Bag-of-words Large or small vocabularies? Extensions for instance-level

More information

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 Vector visual representation Fixed-size image representation High-dim (100 100,000) Generic, unsupervised: BoW,

More information

Aggregating Descriptors with Local Gaussian Metrics

Aggregating Descriptors with Local Gaussian Metrics Aggregating Descriptors with Local Gaussian Metrics Hideki Nakayama Grad. School of Information Science and Technology The University of Tokyo Tokyo, JAPAN nakayama@ci.i.u-tokyo.ac.jp Abstract Recently,

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period starts

More information

Revisiting the VLAD image representation

Revisiting the VLAD image representation Revisiting the VLAD image representation Jonathan Delhumeau, Philippe-Henri Gosselin, Hervé Jégou, Patrick Pérez To cite this version: Jonathan Delhumeau, Philippe-Henri Gosselin, Hervé Jégou, Patrick

More information

Object detection with CNNs

Object detection with CNNs Object detection with CNNs 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before CNNs After CNNs 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year Region proposals

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

More information

arxiv: v1 [cs.cv] 18 Apr 2016

arxiv: v1 [cs.cv] 18 Apr 2016 arxiv:1604.04994v1 [cs.cv] 18 Apr 2016 Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval Xiu-Shen Wei, Jian-Hao Luo, and Jianxin Wu National Key Laboratory for Novel Software

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Announcements Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Seminar registration period starts on Friday We will offer a lab course in the summer semester Deep Robot Learning Topic:

More information

Supplementary material for Analyzing Filters Toward Efficient ConvNet

Supplementary material for Analyzing Filters Toward Efficient ConvNet Supplementary material for Analyzing Filters Toward Efficient Net Takumi Kobayashi National Institute of Advanced Industrial Science and Technology, Japan takumi.kobayashi@aist.go.jp A. Orthonormal Steerable

More information

arxiv: v1 [cs.cv] 5 Sep 2017

arxiv: v1 [cs.cv] 5 Sep 2017 Learning Non-Metric Visual Similarity for Image Retrieval Noa Garcia Aston University, UK garciadn@aston.ac.uk George Vogiatzis Aston University, UK g.vogiatzis@aston.ac.uk arxiv:1709.01353v1 [cs.cv] 5

More information

Cascade Region Regression for Robust Object Detection

Cascade Region Regression for Robust Object Detection Large Scale Visual Recognition Challenge 2015 (ILSVRC2015) Cascade Region Regression for Robust Object Detection Jiankang Deng, Shaoli Huang, Jing Yang, Hui Shuai, Zhengbo Yu, Zongguang Lu, Qiang Ma, Yali

More information

LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL

LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL Hantao Yao 1,2, Shiliang Zhang 3, Dongming Zhang 1, Yongdong Zhang 1,2, Jintao Li 1, Yu Wang 4, Qi Tian 5 1 Key Lab of Intelligent Information Processing

More information

Yiqi Yan. May 10, 2017

Yiqi Yan. May 10, 2017 Yiqi Yan May 10, 2017 P a r t I F u n d a m e n t a l B a c k g r o u n d s Convolution Single Filter Multiple Filters 3 Convolution: case study, 2 filters 4 Convolution: receptive field receptive field

More information

IMage retrieval has always been an attractive research topic

IMage retrieval has always been an attractive research topic JOURNAL OF L A TEX CLASS FILES, VOL. **, NO. **, AUGUST 2018 1 Deep Feature Aggregation and Image Re-ranking with Heat Diffusion for Image Retrieval Shanmin Pang, Jin Ma, Jianru Xue, Member, IEEE, Jihua

More information

arxiv: v1 [cs.cv] 15 Mar 2014

arxiv: v1 [cs.cv] 15 Mar 2014 Geometric VLAD for Large Scale Image Search Zixuan Wang, Wei Di, Anurag Bhardwaj, Vignesh Jagadeesh, Robinson Piramuthu Dept.of Electrical Engineering, Stanford University, CA 94305 ebay Research Labs,

More information

To aggregate or not to aggregate: Selective match kernels for image search

To aggregate or not to aggregate: Selective match kernels for image search To aggregate or not to aggregate: Selective match kernels for image search Giorgos Tolias INRIA Rennes, NTUA Yannis Avrithis NTUA Hervé Jégou INRIA Rennes Abstract This paper considers a family of metrics

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

Large scale object/scene recognition

Large scale object/scene recognition Large scale object/scene recognition Image dataset: > 1 million images query Image search system ranked image list Each image described by approximately 2000 descriptors 2 10 9 descriptors to index! Database

More information

arxiv: v1 [cs.cv] 30 Oct 2016

arxiv: v1 [cs.cv] 30 Oct 2016 Accurate Deep Representation Quantization with Gradient Snapping Layer for Similarity Search Shicong Liu, Hongtao Lu arxiv:1610.09645v1 [cs.cv] 30 Oct 2016 Abstract Recent advance of large scale similarity

More information

arxiv: v1 [cs.mm] 3 May 2016

arxiv: v1 [cs.mm] 3 May 2016 Bloom Filters and Compact Hash Codes for Efficient and Distributed Image Retrieval Andrea Salvi, Simone Ercoli, Marco Bertini and Alberto Del Bimbo Media Integration and Communication Center, Università

More information

YOLO9000: Better, Faster, Stronger

YOLO9000: Better, Faster, Stronger YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object

More information

Channel Locality Block: A Variant of Squeeze-and-Excitation

Channel Locality Block: A Variant of Squeeze-and-Excitation Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan

More information

Deeply Cascaded Networks

Deeply Cascaded Networks Deeply Cascaded Networks Eunbyung Park Department of Computer Science University of North Carolina at Chapel Hill eunbyung@cs.unc.edu 1 Introduction After the seminal work of Viola-Jones[15] fast object

More information

Three things everyone should know to improve object retrieval. Relja Arandjelović and Andrew Zisserman (CVPR 2012)

Three things everyone should know to improve object retrieval. Relja Arandjelović and Andrew Zisserman (CVPR 2012) Three things everyone should know to improve object retrieval Relja Arandjelović and Andrew Zisserman (CVPR 2012) University of Oxford 2 nd April 2012 Large scale object retrieval Find all instances of

More information

Content-Based Image Recovery

Content-Based Image Recovery Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China zhouhy@lamda.nju.edu.cn wujx2001@nju.edu.cn Abstract. We propose

More information

Performance Evaluation of SIFT and Convolutional Neural Network for Image Retrieval

Performance Evaluation of SIFT and Convolutional Neural Network for Image Retrieval Performance Evaluation of SIFT and Convolutional Neural Network for Image Retrieval Varsha Devi Sachdeva 1, Junaid Baber 2, Maheen Bakhtyar 2, Ihsan Ullah 2, Waheed Noor 2, Abdul Basit 2 1 Department of

More information

A practical guide to CNNs and Fisher Vectors for image instance retrieval

A practical guide to CNNs and Fisher Vectors for image instance retrieval A practical guide to CNNs and Fisher Vectors for image instance retrieval Vijay Chandrasekhar,1, Jie Lin,1, Olivier Morère,1,2 Hanlin Goh 1, Antoine Veillard 2 1 Institute for Infocomm Research (A*STAR),

More information

Unsupervised object discovery for instance recognition

Unsupervised object discovery for instance recognition Unsupervised object discovery for instance recognition Oriane Siméoni 1 Ahmet Iscen 2 Giorgos Tolias 2 Yannis Avrithis 1 Ondřej Chum 2 1 Inria Rennes 2 VRG, FEE, CTU in Prague {oriane.simeoni,ioannis.avrithis}@inria.fr

More information

Deep Residual Learning

Deep Residual Learning Deep Residual Learning MSRA @ ILSVRC & COCO 2015 competitions Kaiming He with Xiangyu Zhang, Shaoqing Ren, Jifeng Dai, & Jian Sun Microsoft Research Asia (MSRA) MSRA @ ILSVRC & COCO 2015 Competitions 1st

More information

Evaluation of GIST descriptors for web scale image search

Evaluation of GIST descriptors for web scale image search Evaluation of GIST descriptors for web scale image search Matthijs Douze Hervé Jégou, Harsimrat Sandhawalia, Laurent Amsaleg and Cordelia Schmid INRIA Grenoble, France July 9, 2009 Evaluation of GIST for

More information

arxiv: v1 [cs.cv] 6 Jul 2016

arxiv: v1 [cs.cv] 6 Jul 2016 arxiv:607.079v [cs.cv] 6 Jul 206 Deep CORAL: Correlation Alignment for Deep Domain Adaptation Baochen Sun and Kate Saenko University of Massachusetts Lowell, Boston University Abstract. Deep neural networks

More information

Multi-Scale Orderless Pooling of Deep Convolutional Activation Features

Multi-Scale Orderless Pooling of Deep Convolutional Activation Features Multi-Scale Orderless Pooling of Deep Convolutional Activation Features Yunchao Gong, Liwei Wang 2, Ruiqi Guo 2, and Svetlana Lazebnik 2 University of North Carolina at Chapel Hill yunchao@cs.unc.edu 2

More information

Learning Compact Visual Attributes for Large-scale Image Classification

Learning Compact Visual Attributes for Large-scale Image Classification Learning Compact Visual Attributes for Large-scale Image Classification Yu Su and Frédéric Jurie GREYC CNRS UMR 6072, University of Caen Basse-Normandie, Caen, France {yu.su,frederic.jurie}@unicaen.fr

More information

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python. Inception and Residual Networks Hantao Zhang Deep Learning with Python https://en.wikipedia.org/wiki/residual_neural_network Deep Neural Network Progress from Large Scale Visual Recognition Challenge (ILSVRC)

More information

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ Stop Line Detection and Distance Measurement for Road Intersection based on Deep Learning Neural Network Guan-Ting Lin 1, Patrisia Sherryl Santoso *1, Che-Tsung Lin *ǂ, Chia-Chi Tsai and Jiun-In Guo National

More information

Spatial Localization and Detection. Lecture 8-1

Spatial Localization and Detection. Lecture 8-1 Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday

More information

Real-time Object Detection CS 229 Course Project

Real-time Object Detection CS 229 Course Project Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection

More information

Efficient Representation of Local Geometry for Large Scale Object Retrieval

Efficient Representation of Local Geometry for Large Scale Object Retrieval Efficient Representation of Local Geometry for Large Scale Object Retrieval Michal Perďoch Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University in Prague IEEE Computer Society

More information

Large Scale Image Retrieval

Large Scale Image Retrieval Large Scale Image Retrieval Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University in Prague Features Affine invariant features Efficient descriptors Corresponding regions

More information

Learning to Match Images in Large-Scale Collections

Learning to Match Images in Large-Scale Collections Learning to Match Images in Large-Scale Collections Song Cao and Noah Snavely Cornell University Ithaca, NY, 14853 Abstract. Many computer vision applications require computing structure and feature correspondence

More information

Stable hyper-pooling and query expansion for event detection

Stable hyper-pooling and query expansion for event detection 3 IEEE International Conference on Computer Vision Stable hyper-pooling and query expansion for event detection Matthijs Douze INRIA Grenoble Jérôme Revaud INRIA Grenoble Cordelia Schmid INRIA Grenoble

More information

An Exploration of Computer Vision Techniques for Bird Species Classification

An Exploration of Computer Vision Techniques for Bird Species Classification An Exploration of Computer Vision Techniques for Bird Species Classification Anne L. Alter, Karen M. Wang December 15, 2017 Abstract Bird classification, a fine-grained categorization task, is a complex

More information

Learning Deep Representations for Visual Recognition

Learning Deep Representations for Visual Recognition Learning Deep Representations for Visual Recognition CVPR 2018 Tutorial Kaiming He Facebook AI Research (FAIR) Deep Learning is Representation Learning Representation Learning: worth a conference name

More information

IMAGE RETRIEVAL USING VLAD WITH MULTIPLE FEATURES

IMAGE RETRIEVAL USING VLAD WITH MULTIPLE FEATURES IMAGE RETRIEVAL USING VLAD WITH MULTIPLE FEATURES Pin-Syuan Huang, Jing-Yi Tsai, Yu-Fang Wang, and Chun-Yi Tsai Department of Computer Science and Information Engineering, National Taitung University,

More information

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation Object detection using Region Proposals (RCNN) Ernest Cheung COMP790-125 Presentation 1 2 Problem to solve Object detection Input: Image Output: Bounding box of the object 3 Object detection using CNN

More information

Bloom Filters and Compact Hash Codes for Efficient and Distributed Image Retrieval

Bloom Filters and Compact Hash Codes for Efficient and Distributed Image Retrieval 2016 IEEE International Symposium on Multimedia Bloom Filters and Compact Hash Codes for Efficient and Distributed Image Retrieval Andrea Salvi, Simone Ercoli, Marco Bertini and Alberto Del Bimbo MICC

More information

learning stage (Stage 1), CNNH learns approximate hash codes for training images by optimizing the following loss function:

learning stage (Stage 1), CNNH learns approximate hash codes for training images by optimizing the following loss function: 1 Query-adaptive Image Retrieval by Deep Weighted Hashing Jian Zhang and Yuxin Peng arxiv:1612.2541v2 [cs.cv] 9 May 217 Abstract Hashing methods have attracted much attention for large scale image retrieval.

More information

Metric Learning for Large Scale Image Classification:

Metric Learning for Large Scale Image Classification: Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Thomas Mensink 1,2 Jakob Verbeek 2 Florent Perronnin 1 Gabriela Csurka 1 1 TVPA - Xerox Research Centre

More information

Facial Expression Classification with Random Filters Feature Extraction

Facial Expression Classification with Random Filters Feature Extraction Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle

More information

Dynamic Match Kernel with Deep Convolutional Features for Image Retrieval

Dynamic Match Kernel with Deep Convolutional Features for Image Retrieval IEEE TRANSACTIONS ON IMAGE PROCESSING Match Kernel with Deep Convolutional Features for Image Retrieval Jufeng Yang, Jie Liang, Hui Shen, Kai Wang, Paul L. Rosin, Ming-Hsuan Yang Abstract For image retrieval

More information

Siamese Network Features for Image Matching

Siamese Network Features for Image Matching Siamese Network Features for Image Matching Iaroslav Melekhov Department of Computer Science Aalto University, Finland Email: iaroslav.melekhov@aalto.fi Juho Kannala Department of Computer Science Aalto

More information

Towards Deep Kernel Machines

Towards Deep Kernel Machines Towards Deep Kernel Machines Julien Mairal Inria, Grenoble Ohrid, September, 2016 Julien Mairal Towards deep kernel machines 1/58 Part I: Scientific Context Julien Mairal Towards deep kernel machines 2/58

More information

Kaggle Data Science Bowl 2017 Technical Report

Kaggle Data Science Bowl 2017 Technical Report Kaggle Data Science Bowl 2017 Technical Report qfpxfd Team May 11, 2017 1 Team Members Table 1: Team members Name E-Mail University Jia Ding dingjia@pku.edu.cn Peking University, Beijing, China Aoxue Li

More information

Structured Prediction using Convolutional Neural Networks

Structured Prediction using Convolutional Neural Networks Overview Structured Prediction using Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Structured predictions for low level computer

More information

PErson re-identification (re-id) is usually viewed as an

PErson re-identification (re-id) is usually viewed as an JOURNAL OF L A TEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1 A Discriminatively Learned CNN Embedding for Person Re-identification Zhedong Zheng, Liang Zheng and Yi Yang arxiv:1611.05666v2 [cs.cv] 3 Feb

More information