Weakly Supervised Semantic Segmentation with Latent CRFs

PhD advisor: Dr. Ambedkar Dukkipati
Department of Computer Science and Automation
gaurav.pandey@csa.iisc.ernet.in

Objective
Given a training set that comprises images and image-level labels only, infer the pixel-level labels of a test set that contains only images.

Other problems addressed during PhD
Generative Models
1. Hierarchical completely random measures for topic modelling (ICML-16)
2. A variational approach to deep conditional generative modelling (IJCNN-17)
Discriminative Models
1. Deep neural networks of infinite width (ICML-14)
2. Continuous learning with deep nonparametric neural networks (in progress)
3. Discriminative Bayesian clustering (in progress)
4. Latent conditional random fields for semantic segmentation (under submission)

The proposed model
1. We treat the pixel-level labels as the latent features of a CRF.
2. The pixels and the image-level label are the observed features.
3. The prior is P(z|x) ∝ exp(−Σ_{j<i} k(x_i, x_j) µ(z_i, z_j)).
4. It enforces that neighboring pixels with similar color also have the same label (the local consistency constraint).
5. The aim is to maximize P(y|x) = Σ_z P(y|z) P(z|x).
6. Computation of P(y|x) is intractable.
7. Hence, we maximize its lower bound.
Figure 1: The latent CRF

Variational lower bound
1. We choose a variational distribution q(z|x, y) and obtain a lower bound on the log-likelihood:
   log p(y|x) ≥ −KL(q(z|y, x) || p(z|x)) + E_{q(z|x,y)}[log p(y|z, x)]
2. In this work, we assume that the variational distribution q factorizes completely, that is,
   q(z|x, y) = Π_{j=1}^{m} q(z_j|y, x).   (1)
3. Moreover,
   q(z_{jk} = 1|x, y) = exp(g_{jk}(x)) / Σ_{k'=1}^{K} exp(g_{jk'}(x)) ≜ φ_{jk}(x),   (2)
   where g is a fully convolutional neural network and {g_{jk}(x), 1 ≤ j ≤ m, 1 ≤ k ≤ K} are the outputs of g when x is fed as input.
4. The KL-divergence term forces the variational distribution to be close to the prior.
5. This ensures that the output of q(z|x, y) also respects local consistency.
6. The second term ensures that the pixel-level labels are consistent with the global labels.
Figure 2: Inference network

Gradient of the lower bound
1. The first term is the KL-divergence loss on the output of the CNN g, whose gradient can be computed exactly.
2. The gradient of the second term is approximated using reparametrization (a minimal sketch follows the poster text below).
3. The multinomial distribution q is approximated by its continuous relaxation, the Gumbel-softmax distribution.
4. Next, samples from the Gumbel-softmax approximation are generated and fed to the classification network.
5. The gradient of log p(y|x, z) is computed with respect to the relaxed sample and backpropagated.
6. Note that log p(y|z, x) contains no trainable parameters.
Figure 3: Implementation of the proposed model

Qualitative results on the VOC 2012 dataset
Table 1: Segmentation masks predicted by the model (image, ground truth, predicted).

Quantitative results on the VOC 2012 dataset
Saliency maps localize the important regions of an image and can drastically improve performance.
1. Approaches that don't use saliency maps: (a) MIL+ILP, (b) EM-Adapt, (c) CCNN.
2. Approaches that use saliency maps: (a) SEC, (b) STC.
3. The intersection over union of the predicted segmentation masks for each class is
   IoU = true positives / (true positives + false positives + false negatives).

class        MIL+ILP  EM-Adapt  CCNN   SEC    STC    Ours
background   77.2     67.2      68.5   82.4   84.5   84.75
aeroplane    37.3     29.2      25.5   62.9   68.0   72.36
bike         18.4     17.6      17.0   26.4   19.5   25.2
bird         25.4     28.6      25.4   61.6   60.5   64.1
boat         28.2     22.2      20.2   27.6   42.5   29.6
bottle       31.9     29.6      26.3   38.1   44.8   53.6
bus          41.6     47.0      46.8   66.6   68.4   53.1
car          48.1     44.0      47.1   62.7   64.0   62.9
cat          50.7     44.2      48.0   75.2   64.8   70.5
tvmonitor    35.0     31.6      36.9   45.3   31.2   51.6
mean         36.6     33.8      35.3   50.7   49.8   50.4
Table 2: IoU (%) of predicted segmentation masks

Conclusions
1. This is the first work that uses a CNN as the inference network in a CRF for semantic segmentation.
2. The proposed model drastically outperforms all methods that don't use saliency maps.
3. The proposed model achieves performance comparable with other methods that use saliency maps.
4. The proposed model suggests that traditional probabilistic models, when combined with deep networks, can achieve drastically improved performance.
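To make steps 2-5 of the "Gradient of the lower bound" section concrete, here is a minimal NumPy sketch of Gumbel-softmax sampling. The array shapes, temperature, and class count are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=0.5, rng=None):
    """Draw a continuous relaxation of a one-hot sample.

    logits : (..., K) unnormalized log-probabilities, e.g. the
             per-pixel outputs g_jk(x) of the inference CNN.
    tau    : temperature; as tau -> 0 the sample approaches one-hot.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    # Softmax over the class axis yields a point on the simplex that
    # is a differentiable function of the logits.
    y = (logits + gumbel) / tau
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

# Example: relaxed labels for a 4x4 image with K = 21 classes.
logits = np.random.randn(4, 4, 21)
z_relaxed = gumbel_softmax_sample(logits, tau=0.5)
print(z_relaxed.shape, z_relaxed.sum(axis=-1).round(3))  # class weights sum to 1 per pixel
```

Under automatic differentiation, the gradient of log p(y|x, z) flows through the relaxed sample back into the logits of g, which is the backpropagation described in step 5.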

Weakly Supervised Semantic Segmentation with Latent CRFs

PhD advisor: Ambedkar Dukkipati
Department of Computer Science & Automation, Indian Institute of Science

Outline
- Problems addressed during PhD
  - Generative models
  - Discriminative models
- Weakly Supervised Semantic Segmentation

Problems addressed during PhD

Generative Models
- Hierarchical completely random measures for topic modelling [Pandey and Dukkipati, 2016a] (ICML)
- A variational approach to deep conditional generative modelling [Pandey and Dukkipati, 2016b] (IJCNN)

Discriminative Models
- Deep neural networks of infinite width [Pandey and Dukkipati, 2014] (ICML)
- Continuous learning with deep nonparametric neural networks (in progress)
- Discriminative Bayesian clustering (in progress)
- Latent conditional random fields for semantic segmentation (under submission)

Objective

Given a training set that comprises images and image-level labels only, infer the pixel-level labels of a test set that contains only images.

Model

We treat the pixel-level labels as the latent features of a conditional random field. The pixels and the image-level label are the observed features.

Conditional Random Field

Prior: P(z|x) ∝ exp(−Σ_{j<i} k(x_i, x_j) µ(z_i, z_j))

The prior enforces that neighboring pixels with similar color also have the same label (the local consistency constraint). The aim is to maximize

P(y|x) = Σ_z P(y|z) P(z|x).

Computation of P(y|x) is intractable; hence, we maximize its lower bound.
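The slide does not specify the kernel k or the compatibility µ, so the sketch below assumes one standard instantiation: a Gaussian color kernel and a Potts penalty, with P(z|x) ∝ exp(−energy). The bandwidth sigma is an illustrative assumption.

```python
import numpy as np

def pairwise_energy(colors, labels, sigma=10.0):
    """Unnormalized negative log-prior energy of a labelling.

    colors : (n, 3) pixel colors x_i
    labels : (n,)   integer labels z_i
    Assumed forms:
      k(x_i, x_j)  = exp(-||x_i - x_j||^2 / (2 sigma^2))  (color kernel)
      mu(z_i, z_j) = 1 if z_i != z_j else 0               (Potts penalty)
    """
    n = len(labels)
    energy = 0.0
    for i in range(n):
        for j in range(i):
            k = np.exp(-np.sum((colors[i] - colors[j]) ** 2) / (2 * sigma**2))
            # Similar-color pixels with different labels pay a high penalty.
            energy += k * float(labels[i] != labels[j])
    return energy  # P(z | x) is proportional to exp(-energy)
```

The double loop is quadratic in the number of pixels and is only meant to expose the energy; practical dense-CRF implementations evaluate such sums with efficient high-dimensional filtering.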

Variational Lower Bound

We choose a variational distribution q(z|x, y) and obtain a lower bound on the log-likelihood:

log p(y|x) ≥ −KL(q(z|y, x) || p(z|x)) + E_{q(z|x,y)}[log p(y|z, x)]

We assume that the variational distribution q factorizes completely, that is,

q(z|x, y) = Π_{j=1}^{m} q(z_j|y, x)   (1)

Moreover,

q(z_{jk} = 1|x, y) = exp(g_{jk}(x)) / Σ_{k'=1}^{K} exp(g_{jk'}(x))   (2)

where g is a fully convolutional neural network.
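Equation (2) is simply a per-pixel softmax over the outputs of g. A minimal NumPy sketch, assuming g returns an (H, W, K) logit map:

```python
import numpy as np

def variational_q(logits):
    """q(z_jk = 1 | x, y) = softmax_k(g_jk(x)), independently per pixel j.

    logits : (H, W, K) output map of the fully convolutional network g.
    Returns an array of the same shape whose entries sum to 1 over k.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)
```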

Variational Lower Bound

The distribution q(z|x, y) is parametrized by a CNN.

Variational Lower Bound

The KL-divergence term forces the variational distribution to be close to the prior. This ensures that the output of q(z|x, y) also respects local consistency. The second term ensures that the pixel-level labels are consistent with the global labels.

Gradient of the Lower Bound

The first term is the KL-divergence loss on the output of the CNN g; the gradient of this loss can be computed exactly. The gradient of the second term is approximated using MCMC samples (one standard sample-based estimator is sketched below).
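The slide does not spell out the estimator, so the sketch below shows one standard way to turn samples from q into a gradient of the second term: the score-function (REINFORCE) form. All function arguments are hypothetical placeholders for parts of the actual model; the poster version instead uses the Gumbel-softmax relaxation sketched earlier.

```python
import numpy as np

def mc_gradient_estimate(sample_q, log_p_y_given_zx, grad_log_q, num_samples=8):
    """Score-function (REINFORCE) estimate of d/dtheta E_q[log p(y | z, x)].

    sample_q         : () -> z, draws a labelling z from q(z | x, y)
    log_p_y_given_zx : z -> float, the classification log-likelihood
    grad_log_q       : z -> array, gradient of log q(z | x, y) with
                       respect to the inference-network parameters
    """
    total = 0.0
    for _ in range(num_samples):
        z = sample_q()
        # log p(y | z, x) has no trainable parameters (step 6 of the
        # poster), so it enters only as a scalar weight on grad log q.
        total = total + log_p_y_given_zx(z) * grad_log_q(z)
    return total / num_samples

# Toy usage: a single pixel with K = 3 classes, q = softmax(theta).
theta = np.zeros(3)
probs = np.exp(theta) / np.exp(theta).sum()
rng = np.random.default_rng(0)
grad = mc_gradient_estimate(
    sample_q=lambda: rng.choice(3, p=probs),
    log_p_y_given_zx=lambda z: 0.0 if z == 0 else -1.0,  # prefers class 0
    grad_log_q=lambda z: np.eye(3)[z] - probs,           # d log q(z) / d theta
    num_samples=100,
)
print(grad)  # on average, points toward increasing theta[0]
```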

Qualitative results on the VOC 2012 dataset [Everingham et al.]

Table: Examples of predicted segmentation masks. The middle row is the ground truth.

Compared methods

Saliency maps localize the important regions of an image, and can drastically improve performance.

Approaches that don't use saliency maps:
- MIL+ILP [Pinheiro and Collobert, 2015]
- EM-Adapt [Papandreou et al., 2015]
- CCNN [Pathak et al., 2015]

Approaches that use saliency maps:
- SEC [Kolesnikov and Lampert, 2016]
- STC [Wei et al., 2016]

Evaluation metric

For each pixel, the class label is predicted. For each class, the intersection over union (IoU) score is calculated as

IoU = true positives / (true positives + false positives + false negatives).

The mean IoU is simply the average over all classes (a minimal computation is sketched below).
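A minimal NumPy version of the metric just described; the 21-class example mirrors PASCAL VOC (20 object classes plus background).

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c) for each class c."""
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        denom = tp + fp + fn
        # NaN marks classes absent from both prediction and ground truth.
        ious.append(tp / denom if denom > 0 else float("nan"))
    return ious

# Example on random masks; mean IoU is the nan-aware class average.
pred = np.random.randint(0, 21, size=(50, 50))
gt = np.random.randint(0, 21, size=(50, 50))
print(np.nanmean(per_class_iou(pred, gt, 21)))
```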

Quantitative results on the VOC 2012 dataset

class        MIL+ILP  EM-Adapt  CCNN   SEC    STC    Ours
background   77.2     67.2      68.5   82.4   84.5   84.75
aeroplane    37.3     29.2      25.5   62.9   68.0   72.36
bike         18.4     17.6      17.0   26.4   19.5   25.2
bird         25.4     28.6      25.4   61.6   60.5   64.1
boat         28.2     22.2      20.2   27.6   42.5   29.6
bottle       31.9     29.6      26.3   38.1   44.8   53.6
bus          41.6     47.0      46.8   66.6   68.4   53.1
car          48.1     44.0      47.1   62.7   64.0   62.9
mean         36.6     33.8      35.3   50.7   49.8   50.4

Table: Results on the PASCAL VOC 2012 val set (IoU in %).

Conclusions

- This is the first work that uses a CNN as the inference network in a CRF for semantic segmentation.
- The proposed model drastically outperforms all methods that don't use saliency maps.
- The proposed model achieves performance comparable with other methods that use saliency maps.
- The proposed model suggests that traditional probabilistic models, when combined with deep networks, can achieve drastically improved performance.

References

Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.

Kolesnikov, A. and Lampert, C. H. (2016). Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In European Conference on Computer Vision, pages 695–711. Springer.

Pandey, G. and Dukkipati, A. (2014). Learning by stretching deep networks. In Proceedings of The 31st International Conference on Machine Learning, pages 1719–1727.

Pandey, G. and Dukkipati, A. (2016a). On collapsed representation of hierarchical completely random measures. In Proceedings of The 33rd International Conference on Machine Learning, pages 1605–1613.

Pandey, G. and Dukkipati, A. (2016b). Variational methods for conditional multimodal deep learning. arXiv preprint arXiv:1603.01801.

Papandreou, G., Chen, L.-C., Murphy, K., and Yuille, A. L. (2015). Weakly- and semi-supervised learning of a DCNN for semantic image segmentation. arXiv preprint arXiv:1502.02734.

Pathak, D., Krahenbuhl, P., and Darrell, T. (2015). Constrained convolutional neural networks for weakly supervised segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pages 1796–1804.

Pinheiro, P. O. and Collobert, R. (2015). From image-level to pixel-level labeling with convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1713–1721.

Wei, Y., Liang, X., Chen, Y., Shen, X., Cheng, M.-M., Feng, J., Zhao, Y., and Yan, S. (2016). STC: A simple to complex framework for weakly-supervised semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.