Recurrent Pixel Embedding for Grouping Shu Kong

Size: px

Start display at page:

Download "Recurrent Pixel Embedding for Grouping Shu Kong"

Augustus Stevens
6 years ago
Views:

1 Recurrent Pixel Embedding for Grouping Shu Kong CS, ICS, UCI

2 1. Problem Statement -- Pixel Grouping 2. Pixel-Pair Spherical Max-Margin Embedding 3. Recurrent Mean Shift Grouping 4. Experiment Outline 5. Conclusion and Extension Note: the slides were made before paper submission, please treat them as supplemental material and refer to the paper for updated content.

3 Tasks diving into pixels -- Pixel Labeling

4 Pixel Labeling: Low-Level Vision Tasks diving into pixels -- Low-level vision: edge, boundary, contour

5 Pixel Labeling: Mid-Level Vision Tasks diving into pixels -- Low-level vision: edge, boundary, contour Mid-level vision: object proposal

Mid-level vision: object proposal High-level vision:

6 Pixel Labeling: High-Level Vision Tasks diving into pixels -- Low-level vision: edge, boundary, contour Mid-level vision: object proposal High-level vision: semantic segmentation instance-level semantic segmentation

7 Pixel Labeling: Learning Tasks diving into pixels -- Low-level vision: logistic loss edge, boundary, contour Mid-level vision: object proposal logistic loss for score regression for location High-level vision: semantic segmentation instance-level semantic segmentation logistic loss for mask&score cross-entropy for category

8 Pixel Labeling: New Framework A new framework consisting of two novel modules --

9 Pixel Labeling: New Framework This A new framework framework consisting of two novel modules -- is agnostic to architecture, so ignore deep learning for now!

10 Pixel Labeling: New Framework A new framework consisting of two novel modules pixel-pair spherical max-margin regression 2. recurrent mean shift grouping

11 Pixel Labeling: New Framework A new framework consisting of two novel modules pixel-pair spherical max-margin regression learning an embedding space on the hyper-sphere such that if meeting the pair-wise criterion, learn to push pixels to be close to each other, e.g. both are boundaries, from same instance; if not, learn to pull them apart. 2. recurrent mean shift grouping

12 Pixel Labeling: New Framework A new framework consisting of two novel modules pixel-pair spherical max-margin regression learning an embedding space on the hyper-sphere such that if meeting the pair-wise criterion, learn to push pixels to be close to each other; e.g. both are boundaries, from same instance; if not, learn to pull them apart. 2. recurrent mean shift grouping iteratively group the pixels into discrete clusters, such as criteria: boundary vs. non-boundary; object proposals; semantic segments

13 Pixel-Pair Spherical Max-Margin Regression

14 Pixel-Pair Spherical Max-Margin Regression date back to Fisher Linear discriminant analysis (LDA)

15 Pixel-Pair Spherical Max-Margin Regression date back to Fisher Linear discriminant analysis (LDA) To utilize the label information in finding informative projection, maximizing the following objective where

16 Pixel-Pair Spherical Max-Margin Regression What loss functions can we use at pixel-level?

17 Pixel-Pair Spherical Max-Margin Regression What loss functions can we use at pixel-level? Principle for positive pairs of pixels (meeting the criterion), minimizing the pair-wise discrepancy/distance; 2. for negative pairs, minimizing the similarity.

Pixel-Pair Spherical Max-Margin Regression What loss functions can we use at pixel-level? Principle -- 1.

18 Pixel-Pair Spherical Max-Margin Regression What loss functions can we use at pixel-level? Principle for positive pairs of pixels (meeting the criterion), minimizing the pair-wise discrepancy/distance; 2. for negative pairs, minimizing the similarity. Bert De Brabandere, Davy Neven, Luc Van GoolSemantic Instance Segmentation with a Discriminative Loss Function, arxiv, 2017

19 Pixel-Pair Spherical Max-Margin Regression What loss functions can we use at pixel-level? Principle for positive pairs of pixels (meeting the criterion), minimizing the pair-wise discrepancy/distance; 2. for negative pairs, minimizing the similarity. for example: Euclidean distance between pixel feature vectors for measuring distance. Its inverse, or Gaussian transform, can measure the similarity.... Bert De Brabandere, Davy Neven, Luc Van GoolSemantic Instance Segmentation with a Discriminative Loss Function, arxiv, 2017 Alejandro Newell, Jia Deng, Associative Embedding: End-to-End Learning for Joint Detection and Grouping, NIPS, 2017 Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, Kevin P. Murphy, Semantic Instance Segmentation via Deep Metric Learning

20 Pixel-Pair Spherical Max-Margin Regression We propose the module to learn a hyper-sphere (embedding space), such that positive pairs have high cosine similarity; negative pairs have low cosine similarity.

21 Pixel-Pair Spherical Max-Margin Regression Why cosine similarity? E. B. Saff and A. B. Kuijlaars. Distributing many points on a sphere. The mathematical intelligencer, 19(1):5 11, L. Lovisolo ; E.A.B. da Silva, uniform distribution of points on a hyper-sphere with applications to vector bit-plane encoding, IEE Proc. Vision, Image and Signal Processing, 2001

22 Pixel-Pair Spherical Max-Margin Regression Why cosine similarity? 1. scale-invariant to the length of feature vector; E. B. Saff and A. B. Kuijlaars. Distributing many points on a sphere. The mathematical intelligencer, 19(1):5 11, L. Lovisolo ; E.A.B. da Silva, uniform distribution of points on a hyper-sphere with applications to vector bit-plane encoding, IEE Proc. Vision, Image and Signal Processing, 2001

23 Pixel-Pair Spherical Max-Margin Regression Why cosine similarity? 1. scale-invariant to the length of feature vector; 2. easy to analyze how to set hyper-parameters; E. B. Saff and A. B. Kuijlaars. Distributing many points on a sphere. The mathematical intelligencer, 19(1):5 11, L. Lovisolo ; E.A.B. da Silva, uniform distribution of points on a hyper-sphere with applications to vector bit-plane encoding, IEE Proc. Vision, Image and Signal Processing, 2001

Distributing many points on a sphere. The mathematical intelligencer, 19(1):5 11, 1997. L. Lovisolo ; E.A.B.

24 Pixel-Pair Spherical Max-Margin Regression Why cosine similarity? 1. scale-invariant to the length of feature vector; 2. easy to analyze how to set hyper-parameters; E. B. Saff and A. B. Kuijlaars. Distributing many points on a sphere. The mathematical intelligencer, 19(1):5 11, L. Lovisolo ; E.A.B. da Silva, uniform distribution of points on a hyper-sphere with applications to vector bit-plane encoding, IEE Proc. Vision, Image and Signal Processing, 2001

25 Pixel-Pair Spherical Max-Margin Regression We use the calibrated cosine similarity as below

26 Pixel-Pair Spherical Max-Margin Regression We use the calibrated cosine similarity as below loss function contains postive and negative pairs

27 Pixel-Pair Spherical Max-Margin Regression We use the calibrated cosine similarity as below loss function contains postive and negative pairs alpha is the margin, hyper parameter to be set.

28 Pixel-Pair Spherical Max-Margin Regression We use the calibrated cosine similarity as below loss function contains postive and negative pairs alpha is the margin, hyper parameter to be set. Gradient is one, didn't penalize hard pixels in sensitive regions, say nearby boundary, segments, etc.

29 Pixel-Pair Spherical Max-Margin Regression Important theories 1. the loss has a lower bound, minimum; 2. the lower bound does not depend on the dimension of the embedding space.

30 2D case Pixel-Pair Spherical Max-Margin Regression

31 3D case Pixel-Pair Spherical Max-Margin Regression

32 Pixel-Pair Spherical Max-Margin Regression

33 Pixel-Pair Spherical Max-Margin Regression

34 Pixel-Pair Spherical Max-Margin Regression

35 One more Pixel-Pair Spherical Max-Margin Regression

36 Pixel-Pair Spherical Max-Margin Regression Last one -- Combination-aware Weighting

37 Recurrent Mean Shift Grouping From good embedding space to pixel labeling How to get the instances? How to group the pixels?

38 Recurrent Mean Shift Grouping From good embedding space to pixel labeling How to get the instances? How to group the pixels? k-means, k-medoids?

39 Recurrent Mean Shift Grouping From good embedding space to pixel labeling How to get the instances? How to group the pixels? k-means, k-medoids? mean shift

40 Recurrent Mean Shift Grouping mean shift R.Collins, CSE, PSU, CSE598G Spring 2006

41 Recurrent Mean Shift Grouping mean shift R.Collins, CSE, PSU, CSE598G Spring 2006

42 Recurrent Mean Shift Grouping mean shift K. Fukunaga, L. Hostetler, The Estimation of the Gradient of a Density Function, with Applications in Pattern - Recognition, PAMI, 1975

43 Recurrent Mean Shift Grouping mean shift Other than estimating the PDF directly, estimating the gradient --

44 Recurrent Mean Shift Grouping mean shift then

45 Recurrent Mean Shift Grouping mean shift: iteratively updating by shifting the data by such an amount

46 Recurrent Mean Shift Grouping mean shift: iteratively updating by shifting the data by such an amount

47 Recurrent Mean Shift Grouping mean shift: iteratively updating by shifting the data by such an amount Gaussian blurring mean-shift (GBMS) algorithm the new iterate is the data average under the posterior probabilities given the current iterate:

48 Recurrent Mean Shift Grouping Gaussian blurring mean-shift (GBMS) algorithm Miguel A. Carreira-Perpinán, Fast Nonparametric Clustering with Gaussian Blurring Mean-Shift, ICML, 2006

49 Recurrent Mean Shift Grouping Gaussian blurring mean-shift (GBMS) algorithm It's guaranteed to converge without gradient vanishing or exploding Miguel A. Carreira-Perpinán, Fast Nonparametric Clustering with Gaussian Blurring Mean-Shift, ICML, 2006

50 Recurrent Mean Shift Grouping Gaussian blurring mean-shift (GBMS) algorithm It's guaranteed to converge without gradient vanishing or exploding But, are the updated data still on the sphere? Miguel A. Carreira-Perpinán, Fast Nonparametric Clustering with Gaussian Blurring Mean-Shift, ICML, 2006

51 Recurrent Mean Shift Grouping L2 normalization in the loop It's guaranteed to converge without gradient vanishing or exploding. Takumi Kobayashi, Nobuyuki Otsu, Von Mises-Fisher Mean Shift for Clustering on a Hypersphere, ICPR, 2010

52 Recurrent Mean Shift Grouping running the von-mises Fisher mean shift offline

53 Recurrent Mean Shift Grouping mean shift as recurrent module

54 Recurrent Mean Shift Grouping mean shift as recurrent module

55 Recurrent Mean Shift Grouping mean shift as recurrent module

56 Recurrent Mean Shift Grouping mean shift grouping in the loop

57 Recurrent Mean Shift Grouping What does it mean by mean shift gradient?

58 Recurrent Mean Shift Grouping What does it mean by mean shift gradient?

59 Recurrent Mean Shift Grouping mean shift grouping in the loop input image loop-0 loop-5

60 Learning to Group Low-level vision: edge, boundary, contour Mid-level vision: object proposal End-to-end trainable from data; with the cross-entropy loss. High-level vision: semantic segmentation instance-level semantic segmentation

61 Backbone architecture agnostic -- we use ResNet K He, X Zhang, S Ren, J Sun, Deep Residual Learning for Image Recognition, CVPR, 2016

62 Experiment: Boundary Detection boundary detection one of most imbalanced problem, 85% pixels are non-boundary 1. learn the embedding space of 3-dim with our loss; 2. after convergence, adding logistic loss and fine-tuning; 3. averaging multiple outputs at resblock2~5 followed by thinning (NMS).

63 Experiment: Boundary Detection visualize the 3-dim embedding maps as rgb image before&after fine-tuning with logistic loss

64 Experiment: Boundary Detection visualize the 3-dim embedding maps as rgb image before&after fine-tuning with logistic loss

65 quantitative comparison Experiment: Boundary Detection

66 Experiment: Boundary Detection test image input image

67 Experiment: Boundary Detection test image aesthetically colorful input image res2 res3 rand-proj res4 res5

68 Experiment: Boundary Detection [zoom-in] encoding orientation, distance transform

69 Experiment: Boundary Detection [zoom-in] encoding orientation, distance transform the Mobius strip

70 Experiment: Object Proposal Detection object proposal detection class-agnostic reduce search space for subsequential tasks, e.g. object detection the proposal framework is particularity suitable for this tasks How suitable?

71 Experiment: Object Proposal Detection object proposal detection -- How suitable? Average Recall: averaging recall at IoU=[0.5:0.05:0.95] Achieving very high average recall (AR) with a dozen proposals per image!

72 Experiment: Object Proposal Detection Quanlitatively: Ours vs. SharpMask

73 Experiment: Semantic Segmentation semantic segmentation with cross-entropy loss The pixel pair loss can fill in the holes.

74 Experiment: Semantic Instance Segmentation Instance-level semantic segmentation Using the semantic segmentation result to vote for the semantic label within each object proposal

75 Experiment: Semantic Instance Segmentation Instance-level semantic segmentation Using the semantic segmentation result to vote for the semantic label within each object proposal

76 Experiment: Semantic Instance Segmentation Instance-level semantic segmentation Using the semantic segmentation result to vote for the semantic label within each object proposal

77 Experiment: Semantic Instance Segmentation Quanlitatively div8 vs div4

78 Conclusion and Extension The framework is architecture-wise agnostic, conceptually simple, computationally efficient, practically effective, and theoretically abundant;

79 Conclusion and Extension The framework is architecture-wise agnostic, conceptually simple, computationally efficient, practically effective, and theoretically abundant; it can be purposed for boundary detection, object proposal detection, generic and instance-level segmentation, spanning low-, mid- and high-level vision tasks.

80 Conclusion and Extension The framework is architecture-wise agnostic, conceptually simple, computationally efficient, practically effective, and theoretically abundant; it can be purposed for boundary detection, object proposal detection, generic and instance-level segmentation, spanning low-, mid- and high-level vision tasks. Experiments demonstrate that the new framework achieves state-of-the-art performance on all these tasks.

81 Reference 1. R. Fisher. Dispersion on a sphere. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, volume 217, pages The Royal Society, E. B. Saff and A. B. Kuijlaars. Distributing many points on a sphere. The mathematical intelligencer, 19(1):5 11, K. Fukunaga and L. Hostetler. The estimation of the gradient of a density function, with applications in pattern recogni- tion. IEEE Transactions on information theory, 21(1):32 40, V. A. Epanechnikov. Non-parametric estimation of a multi-variate probability density. Theory of Probability & Its Applications, 14(1): , 1969

82 Thanks

Team G-RMI: Google Research & Machine Intelligence

Team G-RMI: Google Research & Machine Intelligence Alireza Fathi (alirezafathi@google.com) Nori Kanazawa, Kai Yang, George Papandreou, Tyler Zhu, Jonathan Huang, Vivek Rathod, Chen Sun, Kevin Murphy, et