Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction



1 Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction Marc Pollefeys Joint work with Nikolay Savinov, Christian Häne, Ľubor Ladický

2 Comparison to Volumetric Fusion

3 Higher-order ray potentials to model visibility Volumetric formulation (Savinov et al, CVPR15)
Energy terms: ray potentials and a pairwise regularizer
Ray potential: cost based on the first occupied voxel along the ray
Figure labels: freespace, depth, label
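A minimal sketch of a two-label ray potential, to make "cost based on the first occupied voxel along the ray" concrete. The function name, the argument names and the convention that voxels are ordered from the camera outwards are assumptions for illustration, not taken from the slides.

```python
from typing import Sequence


def ray_potential(occupied: Sequence[bool],
                  depth_costs: Sequence[float],
                  freespace_cost: float) -> float:
    """Cost of one ray given the occupancy of the voxels it traverses.

    occupied[i]    -- True if the i-th voxel along the ray is occupied
                      (index 0 is closest to the camera; assumed ordering).
    depth_costs[i] -- cost paid when voxel i is the first occupied voxel.
    freespace_cost -- cost paid when every voxel on the ray is free.
    """
    if len(occupied) != len(depth_costs):
        raise ValueError("occupied and depth_costs must have the same length")
    for is_occupied, cost in zip(occupied, depth_costs):
        if is_occupied:
            # Only the first occupied voxel matters; voxels behind it are hidden.
            return cost
    return freespace_cost
```

Voxels behind the first occupied one do not change the cost, which is how the potential encodes visibility.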

4 Higher-order ray potentials to model visibility Discrete formulation using QPBO relaxation (Savinov et al, CVPR15)
Variables along a ray: x_0, x_1, x_2, x_3, x_4, x_5, x_6
Our goal is to find a function that:
1) Is pairwise
2) Has a number of edges that grows linearly with the length of a ray
3) Is symmetric, to inherit QPBO properties

5 Two-label problem (Savinov et al, CVPR15)
To find this pairwise function, we take these steps:
1) Polynomial representation of the ray potential
2) Transformation into a submodular function over x and its negation x̄
3) Pairwise construction using auxiliary variables z
4) Merging variables [Ramalingam12] for linear complexity
5) Symmetrization of the graph
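As an illustration of step 1, one possible polynomial form of a two-label ray potential is sketched below. It assumes x_i = 1 means voxel i is occupied, voxels are indexed 0..N-1 from the camera, \phi(i) is the cost when voxel i is the first occupied one, and \phi(N) is the freespace cost. These symbols are assumptions for this sketch and the paper may use a different but equivalent form.

```latex
\psi(\mathbf{x}) \;=\; \sum_{i=0}^{N-1} \phi(i)\, x_i \prod_{j=0}^{i-1} (1 - x_j)
\;+\; \phi(N) \prod_{j=0}^{N-1} (1 - x_j)
```

For any binary assignment exactly one term is non-zero: the one for the first occupied voxel, or the last one if the whole ray is free.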

6 Symmetric graph construction for higher-order ray potential (Savinov et al, CVPR15)

7 Multi-label problem (Savinov et al, CVPR15)
Standard alpha-expansion: in each expansion move the multi-label ray potential projects into a 2-label ray potential
Variables not labelled by QPBO are labelled using ICM
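A small sketch of iterated conditional modes (ICM) restricted to a subset of variables, in the spirit of labelling only what QPBO left unlabelled. It uses a toy 1D chain with a Potts pairwise term. The energy, the function names and the `free` mask are assumptions for illustration and not the authors' implementation.

```python
from typing import List, Sequence


def local_energy(unary: Sequence[Sequence[float]], smoothness: float,
                 labels: Sequence[int], i: int, label: int) -> float:
    """Energy terms that depend on variable i when it takes `label`."""
    energy = unary[i][label]
    if i > 0 and labels[i - 1] != label:
        energy += smoothness  # Potts penalty with the left neighbour
    if i + 1 < len(labels) and labels[i + 1] != label:
        energy += smoothness  # Potts penalty with the right neighbour
    return energy


def icm(unary: Sequence[Sequence[float]], smoothness: float,
        labels: Sequence[int], free: Sequence[bool],
        max_sweeps: int = 10) -> List[int]:
    """Greedy coordinate descent; only variables with free[i] == True move."""
    labels = list(labels)
    num_labels = len(unary[0])
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(labels)):
            if not free[i]:
                continue  # keep labels fixed by the previous solver
            best = min(range(num_labels),
                       key=lambda l: local_energy(unary, smoothness, labels, i, l))
            if (local_energy(unary, smoothness, labels, i, best)
                    < local_energy(unary, smoothness, labels, i, labels[i])):
                labels[i] = best
                changed = True
        if not changed:
            break  # local minimum reached
    return labels
```

Each update can only lower the energy, so the procedure terminates in a local minimum that respects the fixed labels.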

8 Implementation details (Savinov et al, CVPR15)
Semantic cost: semantic classifier [Ladický ICCV09]
Depth cost: multi-view stereo depth matches using zero-mean NCC; for the top n matches: 0-1
Use [GoldbergAlgo11] for graph cut
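For reference, zero-mean normalized cross-correlation between two patches can be sketched as below. This is the generic textbook definition on flattened patches; the function name and the `eps` guard are assumptions, and the way the authors turn match scores into a depth cost is not shown on the slide.

```python
import math
from typing import Sequence


def zncc(patch_a: Sequence[float], patch_b: Sequence[float],
         eps: float = 1e-12) -> float:
    """Zero-mean normalized cross-correlation of two flattened patches.

    Returns a value in [-1, 1]; 0.0 is returned for textureless patches
    whose variance is (numerically) zero.
    """
    if len(patch_a) != len(patch_b) or len(patch_a) == 0:
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(patch_a) / len(patch_a)
    mean_b = sum(patch_b) / len(patch_b)
    centred_a = [a - mean_a for a in patch_a]
    centred_b = [b - mean_b for b in patch_b]
    numerator = sum(a * b for a, b in zip(centred_a, centred_b))
    denominator = math.sqrt(sum(a * a for a in centred_a)
                            * sum(b * b for b in centred_b))
    if denominator < eps:
        return 0.0
    return numerator / denominator
```

Subtracting the mean and dividing by the norms makes the score invariant to gain and offset changes in brightness between the views.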

9 Results (Savinov et al, CVPR15) Panels: inference, generative

10 Results Input Depth Semantics 3D model (Savinov et al, CVPR15)

11 Results Input Depth Semantics 3D model (Savinov et al, CVPR15)

12 Results (Savinov et al, CVPR15)

13 Joint depth-semantic cost?

15 Single-View Depth using a Joint Depth-Semantic Classifier Marc Pollefeys Joint work with Ľubor Ladický and Jianbo Shi (UPenn)

16 Single-View Depth Estimation

17-24 Single-View Depth Estimation
Standard approaches:
1. Model fitting [Barinova et al. ECCV08]: requires strong prior knowledge; ignores small objects
2. 3D-detection based [Hoiem et al. CVPR06]: works only for foreground objects (things)
3. Depth from semantic labels [Liu et al. CVPR10]: requires strong priors for semantic classes
4. Data driven [Saxena et al. NIPS05]: requires lots of data (depth does not generalize across classes); balancing the data is a problem

25 Data-driven Depth Estimation Impossible?

26 Data-driven Depth Estimation
No common structure of the scene
Ground plane not always visible
Large variation of viewpoints and of objects in the scene
Both things and stuff in the scene

27-34 Data-driven Depth Estimation
Desired properties:
1. Pixel-wise classifier: super-pixels are not necessarily planar
2. Translation invariant: the classifier response for a pixel x at a depth d depends only on the w × h window around the point x in the image I
3. Depth transforms with inverse scaling: it is sufficient to train a binary classifier predicting a single canonical depth d_C; for other depths d, the same classifier is applied to the image rescaled by d / d_C
Generalized to multiple semantic classes (the response is additionally indexed by the semantic label)
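The inverse-scaling property can be illustrated with a pinhole camera: apparent size is inversely proportional to depth, so rescaling the image by d / d_C makes a surface at depth d look as it would at the canonical depth d_C. The sketch below assumes an ideal pinhole model; the function and parameter names are hypothetical.

```python
def apparent_size(focal_px: float, object_size_m: float, depth_m: float) -> float:
    """Size in pixels of an object under an ideal pinhole camera."""
    return focal_px * object_size_m / depth_m


def rescale_factor(depth_m: float, canonical_depth_m: float) -> float:
    """Image scaling that maps appearance at depth_m to the canonical depth."""
    return depth_m / canonical_depth_m


# An object 20 m away, seen in an image enlarged by 20 / 5 = 4, has the same
# pixel size as the same object 5 m away in the original image.
size_far = apparent_size(focal_px=700.0, object_size_m=1.5, depth_m=20.0)
size_canonical = apparent_size(focal_px=700.0, object_size_m=1.5, depth_m=5.0)
scaled = size_far * rescale_factor(20.0, 5.0)
```

This is why a classifier trained for one depth, evaluated over an image pyramid, can cover the whole depth range.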

35-39 Training the classifier
1. Image pyramid is built
2. Training data randomly sampled
3. Samples of each class at d_C used as positives
4. Samples of other classes or at d ≠ d_C used as negatives
5. Multi-class classifier trained
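The positive/negative split in steps 3 and 4 might be sketched as follows. The sketch assumes that a sample taken from a pyramid level with scale s has effective depth d / s, and that "at d_C" means within half a step of the depth quantization in log space. Both the tolerance and all names are assumptions for illustration.

```python
import math


def effective_depth(true_depth: float, image_scale: float) -> float:
    """Depth at which the sample appears to lie after rescaling the image."""
    return true_depth / image_scale


def is_positive(sample_class: int, true_depth: float, image_scale: float,
                target_class: int, canonical_depth: float,
                depth_ratio: float = 1.25) -> bool:
    """True if the sample is a positive for `target_class` at the canonical depth.

    Samples of other classes, or of the right class at another effective
    depth, are negatives.
    """
    if sample_class != target_class:
        return False
    depth = effective_depth(true_depth, image_scale)
    # Offset from the canonical depth, measured in quantization steps.
    offset = math.log(depth / canonical_depth, depth_ratio)
    return abs(offset) < 0.5
```

The same pixel therefore acts as a positive at one pyramid level and as a negative at the others.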

40-42 Classifying the patch
Dense features: SIFT, LBP, Self Similarity, Texton
Representation: soft BOW representations in the set of random rectangles
Classifier: AdaBoost

43 Experiments
KITTI dataset: 30 training & 30 test images (1382 x 512); 12 semantic labels; depth 2-50 m (except sky); ratio of neighbouring depths d_{i+1} / d_i = 1.25
NYU2 dataset: 725 training & 724 test images (640 x 480); 40 semantic labels; depth in the range 1-10 m; ratio of neighbouring depths d_{i+1} / d_i = 1.25
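A constant ratio between neighbouring depths gives a geometric sequence of depth levels. The sketch below enumerates such levels; starting the sequence exactly at the minimum depth is an assumption, since the slide gives only the range and the ratio.

```python
from typing import List


def depth_levels(min_depth: float, max_depth: float,
                 ratio: float = 1.25) -> List[float]:
    """Geometric depth levels d_0 = min_depth, d_{i+1} = ratio * d_i <= max_depth."""
    if min_depth <= 0 or ratio <= 1:
        raise ValueError("min_depth must be positive and ratio greater than 1")
    levels = []
    depth = min_depth
    while depth <= max_depth:
        levels.append(depth)
        depth *= ratio
    return levels


kitti_levels = depth_levels(2.0, 50.0)  # KITTI range from the slide
nyu2_levels = depth_levels(1.0, 10.0)   # NYU2 range from the slide
```

Under this assumption the KITTI range yields 15 levels and the NYU2 range 11, so the depth problem becomes a classification over a small number of labels.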

44 KITTI results

45 NYU2 results

46 NYU2 results

47 Quantitative results (semantic)
Quantitative results on the KITTI dataset (recall)
Quantitative results on the NYU2 dataset (frequency-weighted I/U)

48 Quantitative results (depth) The ratio of pixels below the relative error

49 Quantitative results (depth) The distribution of the relative errors of an estimated depth in log_1.25 space
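One plausible reading of the two depth measures is sketched below: the relative error of a pixel is the absolute log ratio between estimated and ground-truth depth, expressed in base 1.25, and the summary statistic is the fraction of pixels whose error falls below a threshold. The slides do not spell out the exact definition, so treat it and the function names as assumptions.

```python
import math
from typing import Sequence


def log_relative_error(estimated: float, ground_truth: float,
                       base: float = 1.25) -> float:
    """|log_base(estimated / ground_truth)|, symmetric in over/under-estimation."""
    return abs(math.log(estimated / ground_truth, base))


def ratio_below(estimated: Sequence[float], ground_truth: Sequence[float],
                threshold: float, base: float = 1.25) -> float:
    """Fraction of pixels whose log relative error is below `threshold`."""
    if len(estimated) != len(ground_truth) or len(estimated) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [log_relative_error(e, g, base)
              for e, g in zip(estimated, ground_truth)]
    return sum(1 for err in errors if err < threshold) / len(errors)
```

With base 1.25, an error of 1 corresponds to being off by exactly one depth quantization step.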

50 Single-View Depth: Conclusions
Things and stuff have intrinsic visual scale
Depth can be recovered from a single image
Classification improves with scale normalized classifier
Data does not need to be balanced over scale

51 Discriminatively Trained Dense Surface Normal Estimation Marc Pollefeys Joint work with Ľubor Ladický and Bernhard Zeisl

52-53 Surface Normal Estimation
Not explored much in the literature, so how to approach it?
Pixels or super-pixels?

54-55 Pixel-based Classifiers
Input image, feature representation
Context-based (context pixels or rectangles) feature representations [Shotton06, Shotton08]
Classifier typically noisy and does not follow object boundaries

56-58 Segment-based Classifiers
Input image, feature representation
Based on feature statistics in segments
Segments expected to be label-consistent
One particular segmentation has to be chosen

59-61 Joint Regularization
Input image, independent classifiers
Existing optimization methods (Ladicky09) designed for discrete labels
Not obvious how to generalize for continuous problems
Maybe we can directly learn joint classifier

62-65 Joint Learning
Input image, segment representation
How to convert segment representation into pixel representation?
Representation of a pixel the same as of the segment it belongs to
Equivalent to weighted segment based approach
Concatenation to combine pixel and multiple segment representations
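The concatenation idea can be sketched as follows: a pixel inherits the feature vector of the segment that contains it in each segmentation, and these vectors are appended to the pixel's own features. The data layout (one dictionary of segment features per segmentation) and all names are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def joint_pixel_representation(
        pixel_features: Sequence[float],
        segment_ids: Sequence[int],
        segment_features: Sequence[Dict[int, Sequence[float]]]) -> List[float]:
    """Concatenate a pixel's features with those of its enclosing segments.

    segment_ids[k]      -- id of the segment containing the pixel in the
                           k-th segmentation.
    segment_features[k] -- maps segment ids of the k-th segmentation to
                           their feature vectors.
    """
    if len(segment_ids) != len(segment_features):
        raise ValueError("one segment id is needed per segmentation")
    joint = list(pixel_features)
    for segment_id, features in zip(segment_ids, segment_features):
        # The pixel takes the representation of the segment it belongs to.
        joint.extend(features[segment_id])
    return joint
```

Because several segmentations contribute, no single segmentation has to be chosen in advance.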

66-67 Joint Learning
To simplify the regression problem:
Normals clustered using K-means clustering
Each represented as weighted sums of cluster centres using local coding
Learning formulated as a regression into local coding coordinates
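A minimal sketch of coding a normal as a weighted sum of cluster centres, assuming the code uses the three centres of an enclosing triangle on the unit sphere (the test-time slides refer to a "most probable triangle"). The weights are obtained by solving a 3 x 3 linear system with Cramer's rule and normalizing them to sum to one; this particular scheme and all names are assumptions.

```python
from typing import Sequence, Tuple

Vec3 = Sequence[float]


def det3(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Determinant of the 3 x 3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def triangle_coding(normal: Vec3,
                    centres: Tuple[Vec3, Vec3, Vec3]) -> Tuple[float, float, float]:
    """Weights w with sum(w) == 1 such that sum_k w[k] * centres[k] is parallel to normal."""
    c0, c1, c2 = centres
    det = det3(c0, c1, c2)
    if abs(det) < 1e-12:
        raise ValueError("cluster centres are linearly dependent")
    # Cramer's rule: replace one column at a time by the target normal.
    raw = (det3(normal, c1, c2) / det,
           det3(c0, normal, c2) / det,
           det3(c0, c1, normal) / det)
    total = sum(raw)
    if abs(total) < 1e-12:
        raise ValueError("weights cannot be normalized")
    return (raw[0] / total, raw[1] / total, raw[2] / total)
```

For a normal inside the triangle all three weights are non-negative, so the code is sparse and local.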

68 Pipeline of our Method

69-71 AdaBoost Regression
Response for each cluster centre l
Learning optimizes weighted expected loss
Empirical risk minimization in each iteration

72-73 AdaBoost Regression
Introducing two sets of weights, the problem transforms into a recursive problem
Closed-form solution for the parameters of the weak classifier (see the paper)

74-76 Test-time Evaluation
1. The most probable triangle is found by maximization
2. The local coding coefficients are found as an expected value under a probabilistic interpretation
3. The normal is recovered by projecting the weighted sum onto the unit sphere
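The final recovery step can be sketched as below: the cluster centres are combined with the predicted coefficients and the result is normalized to unit length. How the coefficients are predicted is not shown here, and the function name is an assumption.

```python
import math
from typing import Sequence, Tuple


def decode_normal(weights: Sequence[float],
                  centres: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Project the weighted sum of cluster centres onto the unit sphere."""
    if len(weights) != len(centres):
        raise ValueError("one weight is needed per cluster centre")
    summed = [sum(w * c[axis] for w, c in zip(weights, centres))
              for axis in range(3)]
    norm = math.sqrt(sum(v * v for v in summed))
    if norm == 0.0:
        raise ValueError("weighted sum is the zero vector")
    return (summed[0] / norm, summed[1] / norm, summed[2] / norm)
```

The normalization is needed because a convex combination of unit vectors generally lies inside the sphere rather than on it.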

77-78 Results (grids of input images, each annotated with its error)

79 Normal segmentation: conclusions
Normal estimation might not be as hard as it seems
Proposed joint pixel & segment learning useful for other recognition tasks
Results improvable by incorporating regularization (potentially joint with depth)

80 Conclusion
Volumetric multi-view approach which performs joint reconstruction, recognition and segmentation
Strong coupling between geometry and appearance via class-dependent anisotropic smoothness term
Clean energy formulation (with tight convex relaxation)
Significant qualitative improvement with respect to the state of the art (regularized solutions make more sense)
Semantic reconstruction is more useful

81 Challenges and future research
Scale to many classes: leverage sparsity of semantic interactions, class hierarchies
Scale to large volumes: adaptive space discretization and basis representation
Dynamic scenes: spatio-temporal interactions, extensions to 4D volumes and Wulff shapes
Exploration and robotics: enable real-time navigation and exploration; predict information gain of perception actions

82 Thank you for your attention! Questions?
