Supplementary Materials for Salient Object Detection: A Discriminative Regional Feature Integration Approach

Huaizu Jiang, Zejian Yuan, Ming-Ming Cheng, Yihong Gong, Nanning Zheng, and Jingdong Wang

Abstract. In this supplementary material, we present more details on learning a random forest saliency regressor. More evaluation results against state-of-the-art algorithms are also presented.

H. Jiang, Z. Yuan, Y. Gong, and N. Zheng are with Xi'an Jiaotong University. M.-M. Cheng is with Oxford University. J. Wang is with Microsoft Research Asia. A preliminary version of this work appeared at CVPR [1]. Project website: jianghz.com/drfi.

1 LEARNING

1.1 Learning a Similarity Score between Two Adjacent Superpixels

To learn the similarity score of two adjacent superpixels $s_i$ and $s_j$, they are described by a 222-dimensional feature vector, including their saliency features, the feature contrast, and the geometry features between them. The saliency features are already introduced in our paper. The feature contrast and superpixel boundary geometry features are presented in Fig. 1.

Feature Contrast | Dim
c1. abs. diff. of average RGB values | 3
c2. χ² distance of RGB histograms | 1
c3. abs. diff. of average HSV values | 3
c4. χ² distance of HSV histograms | 1
c5. abs. diff. of average L*a*b* values | 3
c6. χ² distance of L*a*b* histograms | 1
c7. abs. diff. of average responses of filter bank | 15
c8. χ² distance of maximum responses of filter bank | 1
c9. χ² distance of texton histograms | 1

Boundary Geometry | Dim
g1. average x coordinates | 1
g2. average y coordinates | 1
g3. 10th percentile of x coordinates | 1
g4. 10th percentile of y coordinates | 1
g5. 90th percentile of x coordinates | 1
g6. 90th percentile of y coordinates | 1
g7. normalized length | 1

Fig. 1. Feature contrast and superpixel boundary geometry features between two adjacent superpixels.

1.2 Feature Importance in a Random Forest

Training a random forest regressor amounts to independently building each decision tree. For the $t$-th decision tree, the training samples are randomly drawn from all training samples with replacement, $X_t = \{x^{(t_1)}, x^{(t_2)}, \ldots, x^{(t_Q)}\}$, $A_t = \{a^{(t_1)}, a^{(t_2)}, \ldots, a^{(t_Q)}\}$, where $t_i \in [1, Q]$, $i \in [1, Q]$.

Since the training samples of a decision tree are drawn with replacement, some samples are not used for training. These samples are called out-of-bag (oob) data. After constructing a decision tree, the oob data can be utilized to estimate the importance of features. Suppose that the feature $f$ was used to construct one of the nodes of the tree and $D_{oob}$ are the oob samples. We first compute the prediction error on the oob data of the $i$-th decision tree,

$$E(f, i) = \sum_{j \in D_{oob}} \left( \tilde{a}_i^{(j)}(f) - a^{(j)} \right)^2, \qquad (1)$$

where $\tilde{a}_i^{(j)}(f)$ is the prediction for the $j$-th oob sample given by the $i$-th tree based on the feature $f$. The values of the feature $f$ are then randomly permuted among all the oob samples, and the permuted prediction error is computed,

$$E_p(f, i) = \sum_{j \in D_{oob}} \left( \tilde{b}_i^{(j)}(f) - a^{(j)} \right)^2, \qquad (2)$$

where $\tilde{b}_i^{(j)}(f)$ is the prediction for the $j$-th oob sample given by the $i$-th tree based on the randomly permuted feature $f$. Finally, the importance of the feature $f$ is computed as

$$I(f) = \frac{1}{T} \sum_{i=1}^{T} \left( E_p(f, i) - E(f, i) \right). \qquad (3)$$

The importance measure can be interpreted as follows: if a feature is not important, the predictions will not be affected much even when its values are randomly permuted; permuting an important feature, in contrast, will greatly change the predictions. Therefore, the average difference of prediction errors $I(f)$ over all $T$ decision trees can be used to measure a feature's importance.
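The out-of-bag permutation procedure of Eqs. (1)–(3) can be sketched in code as follows. This is an illustrative NumPy implementation, not the one used in the paper: bagged depth-one regression trees ("stumps") stand in for the full random forest, the data are synthetic, and all names in the snippet are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, a):
    """Fit a depth-1 regression tree: pick the (feature, threshold) split
    minimizing squared error; each leaf predicts the mean target."""
    n, d = X.shape
    best = (np.inf, 0, 0.0, a.mean(), a.mean())
    for f in range(d):
        for thr in np.quantile(X[:, f], [0.25, 0.5, 0.75]):
            left = X[:, f] <= thr
            if left.all() or not left.any():
                continue
            ml, mr = a[left].mean(), a[~left].mean()
            err = ((a[left] - ml) ** 2).sum() + ((a[~left] - mr) ** 2).sum()
            if err < best[0]:
                best = (err, f, thr, ml, mr)
    return best[1:]

def predict_stump(stump, X):
    f, thr, ml, mr = stump
    return np.where(X[:, f] <= thr, ml, mr)

# Toy data: the target depends strongly on feature 0, weakly on
# feature 1, and not at all on feature 2.
n, d, T = 400, 3, 25
X = rng.normal(size=(n, d))
a = 2.0 * X[:, 0] + 0.2 * X[:, 1] + 0.1 * rng.normal(size=n)

# Bagging: each tree sees a bootstrap sample; samples never drawn
# for that tree form its out-of-bag (oob) set.
trees, oob_masks = [], []
for t in range(T):
    idx = rng.integers(0, n, size=n)           # draw with replacement
    oob = np.ones(n, bool); oob[idx] = False
    trees.append(fit_stump(X[idx], a[idx]))
    oob_masks.append(oob)

def importance(f):
    """I(f) of Eq. (3): mean increase of oob squared error after
    permuting feature f among each tree's oob samples."""
    diffs = []
    for stump, oob in zip(trees, oob_masks):
        Xo, ao = X[oob], a[oob]
        e = ((predict_stump(stump, Xo) - ao) ** 2).sum()    # Eq. (1)
        Xp = Xo.copy()
        Xp[:, f] = rng.permutation(Xp[:, f])
        ep = ((predict_stump(stump, Xp) - ao) ** 2).sum()   # Eq. (2)
        diffs.append(ep - e)
    return np.mean(diffs)

scores = [importance(f) for f in range(d)]
print(scores)  # feature 0 should dominate
```

The permutation keeps the marginal distribution of the feature but breaks its association with the target, which is exactly why only features the trees actually rely on produce a large error increase.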
Fig. 2. Average annotation map of the DUT-OMRON* data set.

2 ADDITIONAL EVALUATIONS

Extensive evaluations have been performed on six data sets, including MSRA-B [2] (http://research.microsoft.com/en-us/um/people/jiansun/), iCoseg [3] (http://chenlab.ece.cornell.edu/projects/touch-coseg/), SED2 [4] (http://www.wisdom.weizmann.ac.il/~vision/Seg_Evaluation_DB/), ECSSD [5] (http://www.cse.cuhk.edu.hk/leojia/projects/hsaliency/), DUT-OMRON [6] (http://ice.dlut.edu.cn/lu/dut-omron/homepage.htm), and the DUT-OMRON* data set. In this section, we provide additional quantitative and qualitative comparisons with state-of-the-art approaches for a more comprehensive evaluation.

2.1 DUT-OMRON* Data Set

In the paper, we sample 635 images from the DUT-OMRON data set (we call it the DUT-OMRON* data set), where salient objects touch the image border and are far from the image center, in order to check the robustness of our approach. The average annotation of salient objects is shown in Fig. 2. As can be seen, there is a strong off-center bias; it is also clear that salient objects touch the image border. These two factors make the DUT-OMRON* data set challenging for our approach, which depends on the pseudo-background assumption and the geometric distributions of salient objects discovered from training images. Some sample images are presented in Fig. 3. The complete list of images in the DUT-OMRON* data set is available at our project webpage jianghz.com/drfi.

Fig. 3. Sample images of the DUT-OMRON* data set.

2.2 Additional Benchmark Data Sets

Due to limited space, we only consider six benchmark data sets in the paper for evaluations. In this supplementary material, we provide more results on three other data sets, which are also widely adopted for salient object detection evaluation.

MSRA1k (http://ivrgwww.epfl.ch/supplementary_material/RK_CVPR09/). This data set [7], containing 1000 images sampled from MSRA-B [2], is the first large-scale data set for salient object detection with pixel-wise ground-truth annotation. However, the performance of recent approaches starts to saturate on this benchmark, as the background and salient regions are relatively homogeneous and simple. Since some images of MSRA1k are used as training data for our approach, we discard them in the testing phase; 396 images are left.

SED1 (http://www.wisdom.weizmann.ac.il/~vision/Seg_Evaluation_DB/). It has 100 images, each containing exactly one salient object. Pixel-wise ground-truth annotations of the salient objects are provided. Similar to MSRA1k, it is a relatively simple data set with clean backgrounds.

SOD (http://elderlab.yorku.ca/SOD/). This data set [8] is a collection of salient object boundaries based on the Berkeley segmentation data set [9]. Seven subjects were asked to choose the salient object(s) in 300 images. We generate the pixel-wise annotations of the salient objects as in [10]. This data set contains many images with multiple objects, which makes it challenging.

2.3 Additional Quantitative Comparisons

In the paper, we provide quantitative comparisons of our approach (its single-level and multi-level versions) on six benchmark data sets. In this section, we provide additional results on the MSRA1k, SED1, and SOD data sets. In addition to the PR curve, ROC curve, and AUC (Area Under ROC Curve) scores, we also report the MAE (Mean Absolute Error) score of each approach. The MAE score directly reflects the mean absolute difference between the ground-truth annotation and a saliency map. As in the paper, we compare our method with 12 state-of-the-art approaches [10]–[21]. PR curves and ROC curves are plotted in Fig. 4 and Fig. 5, respectively.

Fig. 4. Quantitative comparisons of saliency maps produced by different approaches on each data set (MSRA-B, DUT-OMRON, MSRA1k, iCoseg, SED2, SED1, ECSSD, DUT-OMRON*, and SOD) in terms of PR curves.
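For concreteness, the three summary measures can be sketched for a single saliency map as follows. This is a minimal NumPy example on a synthetic map; the function name `evaluate_saliency` and the toy data are illustrative, and the actual benchmark aggregates statistics over all images of a data set before plotting curves.

```python
import numpy as np

def evaluate_saliency(sal, gt, thresholds=np.linspace(0, 1, 51)):
    """Binarize a saliency map at each threshold and compare it with the
    binary ground-truth mask (PR and ROC points); also report MAE."""
    sal = sal.astype(float)
    gt = gt.astype(bool)
    P, N = gt.sum(), (~gt).sum()
    prec, rec, tpr, fpr = [], [], [], []
    for t in thresholds:
        pred = sal >= t
        tp = (pred & gt).sum()
        fp = (pred & ~gt).sum()
        prec.append(tp / max(pred.sum(), 1))
        rec.append(tp / max(P, 1))
        tpr.append(tp / max(P, 1))   # recall doubles as the TPR
        fpr.append(fp / max(N, 1))
    # AUC: trapezoidal area under the ROC curve (sorted by FPR).
    order = np.argsort(fpr)
    fpr_a, tpr_a = np.array(fpr)[order], np.array(tpr)[order]
    auc = np.sum(np.diff(fpr_a) * (tpr_a[1:] + tpr_a[:-1]) / 2)
    # MAE: mean absolute difference between map and annotation.
    mae = np.abs(sal - gt.astype(float)).mean()
    return np.array(prec), np.array(rec), auc, mae

# Toy example: a map that is bright inside the object, dark outside.
gt = np.zeros((64, 64), bool)
gt[16:48, 16:48] = True
noise = 0.05 * np.random.default_rng(0).normal(size=gt.shape)
sal = np.clip(np.where(gt, 0.9, 0.1) + noise, 0, 1)

prec, rec, auc, mae = evaluate_saliency(sal, gt)
print(round(auc, 3), round(mae, 3))
```

Note that a map can have a near-perfect AUC (correct ranking of pixels) while still incurring a visible MAE penalty whenever its saliency values are not saturated at 0 and 1, which is why the two measures can disagree in the comparisons above.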
AUC scores and MAE scores are presented in Fig. 6 and Fig. 7, respectively. As can be seen, our approach is slightly better than most state-of-the-art methods on MSRA1k and SED1 according to both the PR and ROC curves.
Fig. 5. Quantitative comparisons of saliency maps produced by different approaches on each data set (MSRA-B, DUT-OMRON, MSRA1k, iCoseg, SED2, SED1, ECSSD, DUT-OMRON*, and SOD) in terms of ROC curves.

With the multi-level enhancement, our approach performs much better. On the challenging SOD data set, both versions of our approach significantly outperform the other methods. Regarding AUC scores, our multi-level version performs consistently the best on MSRA1k, SED1, and SOD, and the single-level version is ranked second best on SED1 and SOD.

Since our approach computes the saliency score of each region independently, the smoothness constraint between adjacent regions is ignored. Therefore, our approach does not perform as well in terms of MAE scores as it does in terms of AUC scores. Specifically, our multi-level version still performs the best on three benchmark data sets and third best on SOD, while the single-level version is ranked best on two data sets, second best on three data sets, and third best on two data sets. Note that more sophisticated post-processing steps are utilized in other top-performing approaches; for instance, some adopt Bayesian integration (in addition to multi-level enhancement), while others adopt quadratic optimization or manifold diffusion. As stated in the paper regarding future work, better MAE scores can be expected if more advanced post-processing is integrated.

2.4 Additional Qualitative Comparisons

In this supplementary material, we present more qualitative comparisons of the different approaches on all benchmark data sets. Saliency maps of randomly chosen images from each benchmark data set are shown in Fig. 8 to Fig. 16. Generally, our approach produces more appealing saliency maps than the other approaches; see, e.g., the second and sixth rows of Fig. 8. For the extremely challenging DUT-OMRON* data set, our approach generates slightly better saliency maps than the others in most cases; see Fig. 13 for examples.

REFERENCES

[1] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, "Salient object detection: A discriminative regional feature integration approach," in IEEE CVPR, 2013, pp. 2083–2090.
[2] T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng, X. Tang, and H.-Y. Shum, "Learning to detect a salient object," IEEE TPAMI, vol. 33, no. 2, pp. 353–367, 2011.
Fig. 6. AUC (Area Under ROC Curve) scores of different approaches on each data set (larger is better). The best three results are highlighted with red, green, and blue fonts, respectively.

Fig. 7. MAE (Mean Absolute Error) scores of different approaches on each data set (smaller is better). The best three results are highlighted with red, green, and blue fonts, respectively.

[3] D. Batra, A. Kowdle, D. Parikh, J. Luo, and T. Chen, "Interactively co-segmenting topically related images with intelligent scribble guidance," International Journal of Computer Vision, vol. 93, no. 3, pp. 273–292, 2011.
[4] S. Alpert, M. Galun, R. Basri, and A. Brandt, "Image segmentation by probabilistic bottom-up aggregation and cue integration," in CVPR, 2007.
[5] Q. Yan, L. Xu, J. Shi, and J. Jia, "Hierarchical saliency detection," in CVPR, 2013, pp. 1155–1162.
[6] C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang, "Saliency detection via graph-based manifold ranking," in CVPR, 2013.
[7] R. Achanta, S. S. Hemami, F. J. Estrada, and S. Süsstrunk, "Frequency-tuned salient region detection," in CVPR, 2009.
[8] V. Movahedi and J. H. Elder, "Design and perceptual validation of performance measures for salient object segmentation," in POCV, 2010.
[9] D. R. Martin, C. Fowlkes, and J. Malik, "Learning to detect natural image boundaries using local brightness, color, and texture cues," IEEE TPAMI, vol. 26, no. 5, pp. 530–549, 2004.
[10] Y. Wei, F. Wen, W. Zhu, and J. Sun, "Geodesic saliency using background priors," in ECCV (3), 2012, pp. 29–42.
[11] K.-Y. Chang, T.-L. Liu, H.-T. Chen, and S.-H. Lai, "Fusing generic objectness and visual saliency for salient object detection," in ICCV, 2011, pp. 914–921.
[12] S. Goferman, L. Zelnik-Manor, and A. Tal, "Context-aware saliency detection," in CVPR, 2010, pp. 2376–2383.
[13] H. Jiang, J. Wang, Z. Yuan, T. Liu, N. Zheng, and S. Li, "Automatic salient object segmentation based on context and shape prior," in BMVC, 2011.
[14] M.-M. Cheng, N. J. Mitra, X. Huang, P. H. S. Torr, and S.-M. Hu, "Global contrast based salient region detection," IEEE TPAMI, 2014.
[15] F. Perazzi, P. Krähenbühl, Y. Pritch, and A. Hornung, "Saliency filters: Contrast based filtering for salient region detection," in CVPR, 2012, pp. 733–740.
[16] X. Shen and Y. Wu, "A unified approach to salient object detection via low rank matrix recovery," in CVPR, 2012.
[17] Q. Yan, L. Xu, J. Shi, and J. Jia, "Hierarchical saliency detection," in CVPR, 2013.
[18] R. Margolin, A. Tal, and L. Zelnik-Manor, "What makes a patch distinct?" in CVPR, 2013.
[19] B. Jiang, L. Zhang, H. Lu, C. Yang, and M.-H. Yang, "Saliency detection via absorbing Markov chain," in ICCV, 2013.
[20] X. Li, H. Lu, L. Zhang, X. Ruan, and M.-H. Yang, "Saliency detection via dense and sparse reconstruction," in ICCV, 2013.
[21] W. Zhu, S. Liang, Y. Wei, and J. Sun, "Saliency optimization from robust background detection," in CVPR, 2014.
Fig. 8. Qualitative comparisons of different salient object detection approaches on the MSRA-B data set.
Fig. 9. Qualitative comparisons of different salient object detection approaches on the iCoseg data set.
Fig. 10. Qualitative comparisons of different salient object detection approaches on the ECSSD data set.
Fig. 11. Qualitative comparisons of different salient object detection approaches on the DUT-OMRON data set.
Fig. 12. Qualitative comparisons of different salient object detection approaches on the SED2 data set.
Fig. 13. Qualitative comparisons of different salient object detection approaches on the DUT-OMRON* data set.
Fig. 14. Qualitative comparisons of different salient object detection approaches on the MSRA1k data set.
Fig. 15. Qualitative comparisons of different salient object detection approaches on the SED1 data set.
Fig. 16. Qualitative comparisons of different salient object detection approaches on the SOD data set.