Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization

Similar documents
Unconstrained Salient Object Detection via Proposal Subset Optimization

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Final Report: Smart Trash Net: Waste Localization and Classification

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

Object Detection Based on Deep Learning

arxiv: v1 [cs.cv] 26 Jun 2017

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015

Project 3 Q&A. Jonathan Krause

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Structured Prediction using Convolutional Neural Networks

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Channel Locality Block: A Variant of Squeeze-and-Excitation

Gradient of the lower bound

Object Detection on Self-Driving Cars in China. Lingyun Li

Pixel Offset Regression (POR) for Single-shot Instance Segmentation

Object detection with CNNs

Spatial Localization and Detection. Lecture 8-1

Unified, real-time object detection

arxiv: v4 [cs.cv] 6 Jul 2016

End-to-End Airplane Detection Using Transfer Learning in Remote Sensing Images

RSRN: Rich Side-output Residual Network for Medial Axis Detection

Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

A Novel Representation and Pipeline for Object Detection

Rich feature hierarchies for accurate object detection and semantic segmentation

Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials

Pseudo Mask Augmented Object Detection

arxiv: v1 [cs.cv] 14 Dec 2015

Fully Convolutional Networks for Semantic Segmentation

PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL

BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation

arxiv: v1 [cs.cv] 24 May 2016

Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset

Real-time Object Detection CS 229 Course Project

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

DeepBox: Learning Objectness with Convolutional Networks

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Pseudo Mask Augmented Object Detection

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network

Dense Volume-to-Volume Vascular Boundary Detection

Content-Based Image Recovery

Supplementary material for Analyzing Filters Toward Efficient ConvNet

Multi-Glance Attention Models For Image Classification

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

Lecture 5: Object Detection

Adaptive Object Detection Using Adjacency and Zoom Prediction

Hand Detection For Grab-and-Go Groceries

Supplementary Material for submission 2147: Traditional Saliency Reloaded: A Good Old Model in New Shape

Feature-Fused SSD: Fast Detection for Small Objects

Rich feature hierarchies for accurate object detection and semant

CNN BASED REGION PROPOSALS FOR EFFICIENT OBJECT DETECTION. Jawadul H. Bappy and Amit K. Roy-Chowdhury

arxiv: v1 [cs.cv] 4 Jun 2015

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material

arxiv: v1 [cs.cv] 19 Apr 2016

Face Recognition via Active Annotation and Learning

arxiv: v1 [cs.cv] 15 Oct 2018

Mimicking Very Efficient Network for Object Detection

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL

Traffic Multiple Target Detection on YOLOv2

arxiv: v1 [cs.cv] 6 Jul 2016

Dense Image Labeling Using Deep Convolutional Neural Networks

Co-Saliency Detection within a Single Image

arxiv: v1 [cs.cv] 5 Oct 2015

Return of the Devil in the Details: Delving Deep into Convolutional Nets

arxiv: v3 [cs.cv] 18 Oct 2017

Rich feature hierarchies for accurate object detection and semantic segmentation

YOLO9000: Better, Faster, Stronger

FCHD: A fast and accurate head detector

Object cosegmentation using deep Siamese network

arxiv: v2 [cs.cv] 18 Jul 2017

Learning Detection with Diverse Proposals

Cascade Region Regression for Robust Object Detection

RoI Pooling Based Fast Multi-Domain Convolutional Neural Networks for Visual Tracking

arxiv: v2 [cs.cv] 26 Mar 2018

Tiny ImageNet Visual Recognition Challenge

Subspace Alignment Based Domain Adaptation for RCNN Detector

arxiv: v1 [cs.cv] 6 Sep 2016

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA

Part Localization by Exploiting Deep Convolutional Networks

Actions and Attributes from Wholes and Parts

A MultiPath Network for Object Detection

Deep Networks for Saliency Detection via Local Estimation and Global Search

Gated Bi-directional CNN for Object Detection

Texture Complexity based Redundant Regions Ranking for Object Proposal

In Defense of Fully Connected Layers in Visual Representation Transfer

Instance-Level Salient Object Segmentation

arxiv: v1 [cs.cv] 31 Mar 2016

arxiv: v1 [cs.cv] 12 Sep 2016

CSE255 Assignment 1 Improved image-based recommendations for what not to wear dataset

WeText: Scene Text Detection under Weak Supervision

arxiv: v3 [cs.cv] 29 Mar 2017

Object Detection and Its Implementation on Android Devices

FOr object proposal generation, we are interested in

arxiv: v1 [cs.cv] 3 Apr 2016

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Transcription:

Supplementary Material: Unconstrained Salient Object via Proposal Subset Optimization 1. Proof of the Submodularity According to Eqns. 10-12 in our paper, the objective function of the proposed optimization formulation can be represented as: n h(o) = max w i (x i ) φ O γ K ij, i=1 x i Õ {0} 2 i,j Õ:i j (1) where w i (j) logp(x i = j I) and K ij is shorthand for the bounding box similarity measurek(b i,b j ). Õ denotes the index set corresponding to the selected windows ino. Proposition 1. h(o) is a submodular function. Proof. Let where h(o) = n A i (O)+φB(O)+γC(O), (2) i=1 A i (O) = max x i Õ {0} w i (x i ), B(O) = O, C(O) = 1 K ij. 2 i,j Õ:i j Because φ and γ are non-negative, it suffices to show A i (O), B(O) and C(O) are all submodular, since the class of submodular functions is closed under non-negative linear combinations. Recall that O B = {B i } n 1, where B is the overall window proposal set. Let X and Y denote two subsets of B, and X Y. Also, let x denote an arbitrary window proposal such thatx B\Y. To show a function f is submodular, we just need to prove that f(x {x}) f(x) f(y {x}) f(y) [6, p. 766]. First, B(O) is submodular because B(X {x}) B(X) = X {x} + X = Y {x} + Y = B(Y {x}) B(Y). Second,C(O) is submodular because C(X {x}) C(X) = i XK(B i,b x ) i Ỹ K(B i,b x ) C(Y {x}) C(Y), where X, Ỹ and x are the corresponding indices of X, Y andxw.r.t.b. Note thatk(b i,b x ) is a similarity measure, and it is non-negative. Lastly, we show that A i (O) is submodular. Note that A i is a monotone set function, so A i (Y) A i (X). Furthermore,A i (X {x}) = max{a i (X),A x i }, whereax i A i ({x}). Thus, A i (Y {x}) A i (X {x}) =max{a i (Y),A x i} max{a i (X),A x i} A i (Y) A i (X). It is easy to see the last inequality by checking the cases when A x i A i(x), A i (X) < A x i A i(y) and A i (Y) < A x i respectively. Then it follows that A i (X {x}) A i (X) A i (Y {x}) A i (Y). Therefore, A i is submodular. 2. Merging Annotated Bounding Boxes The MSRA [5] and DUT-O [9] datasets provide raw bounding box annotations from different subjects. To obtain a set of ground truth windows for each image, we use a greedy algorithm to merge bounding box annotations labeled by different subjects. Let B = {B i } n i denote the bounding box annotations for an image. For each bounding box B i, we calculate an overlap score: S i = IOU(B i,b j ). j:j i Based on the overlap score, we do a greedy non-maximumsuppression with the IOU threshold of 0.5 to get a set 1

of candidate windows. To suppress outlier annotations, a candidate window B i is removed if there are fewer than two other windows in B that significantly overlap with B i (IOU > 0.5). The remaining candidates are output as the ground truth windows for the given image. 3. CNN Model Training Details We generate 100 exemplar windows by doing K-means clustering on the bounding box annotations of the SOS training set. The training images are resized to 224 224 regardless of their original dimensions. Training images are augmented by flipping and random cropping. Bounding box annotations that overlap with the cropping window by less than 50% are discarded. We use Caffe [3] to train the CNN model with a minibatch size of 8 and a fixed base learning rate of 10 4. We fine-tune all the fully-connected layers together with conv5 1, conv5 2 and conv5 3 layers by backpropagation. Other training settings are the same as in [7]. We fine-tune the model on the ILSVRC-2014 detection dataset for 230K iterations, when the validation error plateaus. Then we continue to fine-tune the model on the SOS training dataset for 2000 iterations, where the iteration number is chosen via 5-fold cross validation. Fine-tuning takes about 20 hours on the ILSVRC dataset, and 20 mins on the SOS dataset using a single NVIDIA K40C GPU. 4. The Maximum Marginal Relevance Baseline In our experiments, the Maximum Marginal Relevance (MMR) baseline follows the formulation in [1]. The MMR re-scores each proposal by iteratively selecting the proposal with maximum marginal relevance w.r.t. the previously selected proposals. The maximum marginal relevance is formulated by MMR = argmax h i H\H p [ ] s(h i ) θ max IOU(h i,h j ), (3) h j H p whereh p is the previously selected proposals. We optimize the parameter θ for the MMR baseline w.r.t. the AP score. For SalCNN, we use θ = 1.3, and for MBox, we use θ = 0.05. 5. Sample Results We show sample detection results of our method on the four datasets: MSO [10] (Figs. 1 and 2), VOC07 [2] (Figs. 3 and 4), DUT-O [9] (Figs. 5 and 6) and MSRA [5] (Figs. 7 and 8). 6. Results on PASCAL-S For completeness, we further evaluate our method on the PASCAL-S dataset [4]. The images of PASCAL-S are from the PASCAL VOC07 dataset [2], and the ground truth is Table 1: AP scores on the PASCAL-S dataset MBox+NMS MBox+MAP SalCNN+NMS SalCNN+MAP.605.624.547.599 labeled based on the eye fixation data and the object segmentation masks provided by VOC07. This dataset contains some images with multiple objects, and most of its images contain at least one salient object. Table 1 show the results of our MAP formulation compared with the NMS baseline for our SalCNN and MBox [8]. We find that our MAP formulation consistently improves over the NMS baseline for both MBox and our Sal- CNN on this dataset. MBox+MAP is slightly better than SalCNN+MAP on this dataset. Note that the ground truth annotations of PASCAL-S are still limited to the 20 categories in PASCAL VOC that MBox is trained on and optimized for. References [1] J. Carreira and C. Sminchisescu. Cpmc: Automatic object segmentation using constrained parametric min-cuts. PAMI, 34(7):1312 1328, 2012. [2] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascalnetwork.org/challenges/voc/voc2007/workshop/index.html. [3] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arxiv preprint arxiv:1408.5093, 2014. [4] Y. Li, X. Hou, C. Koch, J. M. Rehg, and A. L. Yuille. The secrets of salient object segmentation. In CVPR, 2014. [5] T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng, X. Tang, and H.-Y. Shum. Learning to detect a salient object. PAMI, 33(2):353 367, 2011. [6] A. Schrijver. Combinatorial optimization: polyhedra and efficiency, volume 24. Springer Science & Business Media, 2003. [7] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015. [8] C. Szegedy, S. Reed, D. Erhan, and D. Anguelov. Scalable, high-quality object detection. arxiv preprint arxiv:1412.1441, 2014. [9] C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang. Saliency detection via graph-based manifold ranking. In CVPR, 2013. [10] J. Zhang, S. Ma, M. Sameki, S. Sclaroff, M. Betke, Z. Lin, X. Shen, B. Price, and R. Mĕch. Salient object subitizing. In CVPR, 2015. 2

Figure 1: Sample detection results of our method when λ = 0.1 on the MSO dataset [10]. 3

Figure 2: Sample detection results of our method when λ = 0.1 on the MSO dataset [10]. 4

Figure 3: Sample detection results of our method when λ = 0.1 on the VOC07 dataset [2]. 5

Figure 4: Sample detection results of our method when λ = 0.1 on the VOC07 dataset [2]. 6

Figure 5: Sample detection results of our method when λ = 0.1 on the DUT-O dataset [9]. 7

Figure 6: Sample detection results of our method when λ = 0.1 on the DUT-O dataset [9]. 8

Figure 7: Sample detection results of our method when λ = 0.1 on the MSRA dataset [5]. 9

Figure 8: Sample detection results of our method 10 when λ = 0.1 on the MSRA dataset [5].