CS468: 3D Deep Learning on Point Cloud Data. class label part label. Hao Su. image. May 10, 2017

Size: px

Start display at page:

Download "CS468: 3D Deep Learning on Point Cloud Data. class label part label. Hao Su. image. May 10, 2017"

Marvin Weaver
5 years ago
Views:

1 CS468: 3D Deep Learning on Point Cloud Data class label part label Hao Su image. May 10, 2017

2 Agenda Point cloud generation Point cloud analysis

3 CVPR 17, Point Set Generation Pipeline render

4 CVPR 17, Point Set Generation Pipeline 2K object categories 200K shapes ~10M image/point set pairs render sample Groundtruth point set

5 CVPR 17, Point Set Generation Pipeline Prediction Shape predictor (f) render sample Groundtruth point set

6 CVPR 17, Point Set Generation Pipeline Prediction Shape predictor (f) render A set is invariant up to permutation sample Groundtruth point set

7 CVPR 17, Point Set Generation Pipeline Prediction render Shape predictor (f) Loss on sets (L) sample Groundtruth point set

8 CVPR 17, Point Set Generation Pipeline Prediction render Shape predictor (f) Loss on sets (L) sample Groundtruth point set

9 CVPR 17, Point Set Generation Set comparison Given two sets of points, measure their discrepancy

10 CVPR 17, Point Set Generation Set comparison Given two sets of points, measure their discrepancy Key challenge: correspondence problem

11 CVPR 17, Point Set Generation Correspondence (I): optimal assignment Given two sets of points, measure their discrepancy a.k.a Earth Mover s distance (EMD)

12 CVPR 17, Point Set Generation Correspondence (II): closest point Given two sets of points, measure their discrepancy a.k.a Chamfer distance (CD)

13 CVPR 17, Point Set Generation Required properties of distance metrics Geometric requirement Computational requirement

14 CVPR 17, Point Set Generation Required properties of distance metrics Geometric requirement Reflects natural shape differences Induce a nice space for shape interpolations Computational requirement

15 CVPR 17, Point Set Generation How distance metric affects learning? A fundamental issue: inherent ambiguity in 2D-3D dimension lifting

16 CVPR 17, Point Set Generation How distance metric affects learning? A fundamental issue: inherent ambiguity in 2D-3D dimension lifting

17 CVPR 17, Point Set Generation How distance metric affects learning? A fundamental issue: inherent ambiguity in 2D-3D dimension lifting

18 CVPR 17, Point Set Generation How distance metric affects learning? A fundamental issue: inherent ambiguity in 2D-3D dimension lifting By loss minimization, the network tends to predict a mean shape that averages out uncertainty

19 Distance metrics affect mean shapes The mean shape carries characteristics of the distance metric continuous hidden variable (radius) Input EMD mean Chamfer mean CVPR 17, Point Set Generation

20 Mean shapes from distance metrics The mean shape carries characteristics of the distance metric continuous hidden variable (radius) discrete hidden variable (add-on location) Input EMD mean Chamfer mean CVPR 17, Point Set Generation

21 CVPR 17, Point Set Generation Comparison of predictions by EMD versus CD Input EMD Chamfer

22 CVPR 17, Point Set Generation Required properties of distance metrics Geometric requirement Reflects natural shape differences Induce a nice space for shape interpolations Computational requirement Defines a loss function that is numerically easy to optimize

23 CVPR 17, Point Set Generation Computational requirement of metrics To be used as a loss function, the metric has to be Differentiable with respect to point locations Efficient to compute

24 CVPR 17, Point Set Generation Computational requirement of metrics Differentiable with respect to point location Chamfer distance Earth Mover s distance - Simple function of coordinates - In general positions, the correspondence is unique - With infinitesimal movement, the correspondence does not change Conclusion: differentiable almost everywhere

25 CVPR 17, Point Set Generation Computational requirement of metrics Efficient to compute Chamfer distance: trivially parallelizable on CUDA Earth Mover s distance (optimal assignment): - We implement a distributed approximation algorithm on CUDA - Based upon [Bertsekas, 1985], -approximation

26 CVPR 17, Point Set Generation Pipeline Deep network (f) Prediction Loss on sets (L) sample

27 CVPR 17, Point Set Generation Pipeline Encoder shape embedding (f) Predictor Loss on sets (L) sample

28 CVPR 17, Point Set Generation Pipeline Encoder shape embedding (f) Predictor Loss on sets (L) sample

29 CVPR 17, Point Set Generation Pipeline conv... Predictor Loss on sets (L) sample

30 CVPR 17, Point Set Generation Pipeline conv... Predictor Loss on sets (L) sample

31 Natural statistics of geometry Many local structures are common e.g., planar patches, cylindrical patches strong local correlation among point coordinates CVPR 17, Point Set Generation

32 Natural statistics of geometry Many local structures are common e.g., planar patches, cylindrical patches strong local correlation among point coordinates Also some intricate structures points have high local variation CVPR 17, Point Set Generation

33 CVPR 17, Point Set Generation Pipeline Capture common structures conv... Capture intricate structures Deconv branch FC branch Loss on sets (L) sample

34 CVPR 17, Point Set Generation Pipeline Capture common structures conv... Capture intricate structures Deconv branch FC branch Loss on sets (L) sample

35 Review: deconv network n Output D arrays, e.g., 2D segmentation map Common local patterns are learned from data Predict locally correlated data well Weight sharing reduces the number of params Deconv network for image segmentation Credit: FCNN, Long et al.

36 Review: deconv network n Output D arrays, e.g., 2D segmentation map Common local patterns are learned from data Predict locally correlated data well Weight sharing reduces the number of params How to predict curved surfaces in 3D? Deconv network for image segmentation Credit: FCNN, Long et al.

37 Prediction of curved 2D surfaces in 3D Surface parametrization (2D 3D mapping) Credit: Discrete Differential Geometry, Crane et al.

38 Prediction of curved 2D surfaces in 3D Surface parametrization (2D 3D mapping) Credit: Discrete Differential Geometry, Crane et al.

39 Prediction of curved 2D surfaces in 3D Surface parametrization (2D 3D mapping) x-map y-map z-map coordinate maps Credit: Discrete Differential Geometry, Crane et al.

40 CVPR 17, Point Set Generation Parametrization prediction by deconv network Capture common structures conv... Capture intricate structures Deconv branch FC branch Loss on sets (L) sample

41 CVPR 17, Point Set Generation Parametrization prediction by deconv network Capture common structures deconv conv Capture intricate structures FC branch coordinate maps Loss on sets (L) sample

42 CVPR 17, Point Set Generation Parametrization prediction by deconv network Capture common structures deconv conv coordinate maps Note that FC branch Capture The parametrization intricate structures(2d/3d mapping) is learned from data i.e., obtains a network and data friendly parametrization Loss on sets (L) sample

43 Visualization of the learned parameterization Observation: Learns a smooth parametrization Because deconv net tends to predict data with local correlation (xk, yk, zk ) map of x coord map of y coord map of z coord

44 Visualization of the learned parameterization Observation: Learns a smooth parametrization Because deconv net tends to predict data with local correlation Corresponds to smooth surfaces! (xk, yk, zk ) map of x coord map of y coord map of z coord

46 Natural statistics of geometry Many local structures are common e.g., planar patches, cylindrical patches strong local correlation among point coordinates Also some intricate structures points have high local variation CVPR 17, Point Set Generation

47 CVPR 17, Point Set Generation Pipeline Capture common structures deconv conv Capture intricate structures FC branch Loss on sets (L) sample

48 CVPR 17, Point Set Generation Pipeline Capture common structures deconv conv dense Capture intricate structures dense fc Loss on sets (L) sample

49 CVPR 17, Point Set Generation Pipeline Capture common structures deconv conv dense Capture intricate structures sample dense fc Points are predicted independently Dense connection introduces more parameters to accommodate high variance Loss on sets (L)

50 Visualization of the effect of FC branch Observation: The arrangement of predicted points are uncorrelated x-coord y-coord z-coord red CVPR 17, Point Set Generation

51 Visualization of the effect of FC branch Observation: The arrangement of predicted points are uncorrelated Located at fine structures x-coord y-coord z-coord red CVPR 17, Point Set Generation

52 Q: Which color corresponds to the deconv branch? FC branch? CVPR 17, Point Set Generation

53 CVPR 17, Point Set Generation Q: Which color corresponds to the deconv branch? FC branch? blue: deconv branch large, smooth structures red: FC branch intricate structures

54 CVPR 17, Point Set Generation Comparison to state-of-the-art Input Ours Ours (post-processed) Groundtruth state-of-the-art (3D-R2N2) Better global structure Better details

55 CVPR 17, Point Set Generation Comparison to state-of-the-art Trained/tested on 2K object categories 1.8 Chamfer Distance (Error) Baseline (mean shape) [Choy et. al, ECCV16] Ours

56 Extension: shape completion for RGBD data RGBD map (input) 90 view of input output: completed point cloud CVPR 17, Point Set Generation

57 How about learning to predict geometric forms? Candidates: Rasterized form (regular grids) Geometric form (irregular) multi-view images depth map volumetric polygonal mesh point cloud primitive-based CAD models

58 Primitive-based assembly We learn to predict a corresponding shape composed by primitives. It allows us to predict consistent compositions across objects.

59 Unsupervised parsing Each point is colored according to the assigned primitive

60 A historical overview Generalized Cylinders, Binford (1971)

61 Approach Encoder Predictor for primitive parameters We predict primitive parameters: size, rotation, translation of M cuboids.

62 Approach We predict primitive parameters: size, rotation, translation of M cuboids. Variable number of parts? We predict primitive existence probability

63 Loss function Loss

64 Loss function construction Basic idea: Chamfer distance!

65 Loss function construction Sample points on the groundtruth mesh and predicted assembly Each point is a linear function of mesh/primitive vertex coordinates Differentiable!

66 Loss function construction Sample points on the groundtruth mesh and predicted assembly Each point is a linear function of mesh/primitive vertex coordinates Differentiable! Speed up the computation leveraging parameterization of primitives

67 Consistent primitive configurations Primitive locations are consistent due to the smoothness of primitive prediction network

68 Unsupervised parsing Mean accuracy (face area) on Shape COSEG chairs.

69 Analysis Shapes become more parsimonious as training progresses (due to our parsimony reward)

70 Image-based modeling

71 Agenda Point cloud generation Point cloud analysis

72 Applications of Point Set Learning Robot Perception What and where are the objects in a LiDAR scanned scene?

73 Applications of Point Set Learning Molecular Biology Can we infer an enzyme s category (reactions they catalyze) from its structure? EcoRV restriction enzyme molecule, LAGUNA DESIGN/SCIENCE PHOTO LIBRARY

74 Directly process point cloud data End-to-end learning for unstructured, unordered point data PointNet Motivation ShapeNet PointNet 74

75 Directly process point cloud data End-to-end learning for unstructured, unordered point data Unified framework for various tasks Object Classification PointNet Part Segmentation Scene Parsing... Motivation ShapeNet PointNet 75

76 Properties of a desired neural network on point clouds Point cloud: N orderless points, each represented by a D dim coordinate D N 2D array representation Motivation ShapeNet PointNet 76

77 Properties of a desired neural network on point clouds Point cloud: N orderless points, each represented by a D dim coordinate D N Permutation invariance 2D array representation Transformation invariance Motivation ShapeNet PointNet 77

78 Properties of a desired neural network on point clouds Point cloud: N orderless points, each represented by a D dim coordinate D D N represents the same set as N 2D array representation Permutation invariance Motivation ShapeNet PointNet 78

79 Permutation invariance: Symmetric function f (x 1, x 2,, x n ) f (x π1, x π 2,, x π n ), x i! D Examples: f (x 1, x 2,, x n ) = max{x 1, x 2,, x n } f (x 1, x 2,, x n ) = x 1 + x x n Motivation ShapeNet PointNet 79

80 Construct symmetric function family Observe: f (x 1, x 2,, x n ) = γ! g(h(x 1 ),,h(x n )) is symmetric if g is symmetric Motivation ShapeNet PointNet 80

81 Construct symmetric function family Observe: f (x 1, x 2,, x n ) = γ! g(h(x 1 ),,h(x n )) is symmetric if g is symmetric h (1,2,3) (1,1,1) (2,3,2) (2,3,4) Motivation ShapeNet PointNet 81

82 Construct symmetric function family Observe: f (x 1, x 2,, x n ) = γ! g(h(x 1 ),,h(x n )) is symmetric if g is symmetric h (1,2,3) (1,1,1) g simple symmetric function (2,3,2) (2,3,4) Motivation ShapeNet PointNet 82

83 Construct symmetric function family Observe: f (x 1, x 2,, x n ) = γ! g(h(x 1 ),,h(x n )) is symmetric if g is symmetric h (1,2,3) (1,1,1) g simple symmetric function γ (2,3,2) (2,3,4) PointNet (vanilla) Motivation ShapeNet PointNet 83

84 Q: What symmetric functions can be constructed by PointNet? Symmetric functions PointNet (vanilla) Motivation ShapeNet PointNet 84

85 A: Universal approximation to continuous symmetric functions Theorem: A Hausdorff continuous symmetric function approximated by PointNet. f :2 X! can be arbitrarily S! d, PointNet (vanilla) Motivation ShapeNet PointNet 85

86 PointNet Architecture

87 Results on Object Classification Object Classification Accuracy on ModelNet40

88 Results on Object Part Segmentation

89 Results on Object Part Segmentation Part Segmentation miou on ShapeNet Part Dataset

90 Results on Semantic Scene Segmentation

91 Results on Semantic Scene Parsing Semantic Segmentation (point based) on Stanford Semantic Parsing dataset 3D Object Detection (bounding box based)

92 Robustness to Data Corruption

93 Robustness to Data Corruption

94 Visualizing Point Functions Compact View: 1x3 FCs 1x1024 Expanded View: 1x3 FC FC FC FC FC x1024 Which input point will activate neuron j? Find the top-k points in a dense volumetric grid that activates neuron j.

95 Visualizing Point Functions

96 Visualizing Global Point Cloud Features What s captured and left out here?

97 Visualizing Global Point Cloud Features

98 Visualizing Global Point Cloud Features (OOS)

99 Previous Work: PointNet v1.0 Segmentation Network Point Embeddings

100 Previous Work: PointNet v1.0 Segmentation Network Tiled Global Features Point Embeddings

101 Previous Work: PointNet v1.0 Segmentation Network Tiled Global Features Point Embeddings No local context for each point!

102 Limitations of PointNet v1.0 Hierarchical Feature Learning Increasing receptive field 3D CNN (Wu et al.)

103 Limitations of PointNet v1.0 Hierarchical Feature Learning Increasing receptive field Global Feature Learning Receptive field: one point OR all points v.s. 3D CNN (Wu et al.) PointNet (vanilla) (Qi et al.)

104 Limitations of PointNet v1.0 Artifacts in segmentation tasks: Semantic segmentation in randomly translated table-cup scene. Instance segmentation in tablechair-cup scene

105 Limitations of PointNet v1.0 Artifacts in segmentation tasks: Semantic segmentation in randomly translated table-cup scene. Global feature depends on absolute XYZ! Instance segmentation in tablechair-cup scene Hard to generalize to unseen point configurations

106 Question How to learn local context feature for points? Use PointNet in local regions, aggregate local region features by PointNet again.. Hierarchical feature learning!

107 Multi-Scale PointNet for Hierarchical Feature Learning Charles Ruizhongtai Qi Multi-Scale PointNet for Hierarchical Feature Learning in Point Sets 107

108 PointNet v2.0: Multi-Scale PointNet N points in (x,y)

109 PointNet v2.0: Multi-Scale PointNet Local Coordinates! N points in (x,y) k local points in (x,y )

110 PointNet v2.0: Multi-Scale PointNet PointNet v1.0 N points in (x,y) k local points in (x,y ) feature vector (mark as ) for local point cloud

111 PointNet v2.0: Multi-Scale PointNet PointNet Module/Layer: Farthest Point Sampling + Grouping + PointNet v1.0 N points in (x,y) N1 points in (x,y,f)

112 PointNet v2.0: Multi-Scale PointNet N points in (x,y) N1 points in (x,y,f) N2 points in (x,y,f )

113 PointNet v2.0: Multi-Scale PointNet N points in (x,y) N1 points in (x,y,f) N2 points in (x,y,f )

114 PointNet v2.0: Multi-Scale PointNet N points in (x,y) N1 points in (x,y,f) N2 points in (x,y,f ) 1. Larger receptive field in higher layers 2. Less points in higher layers (more scalable) 3. Weight sharing 4. Translation invariance (local coordinates in local regions)

115 Discussions on Multi-Scale PointNet Charles Ruizhongtai Qi Multi-Scale PointNet for Hierarchical Feature Learning in Point Sets 115

116 Multi-Scale PointNet v.s. PointNet v1.0 Three-layer Multi-Scale PointNet: N points in (x,y) N1 points in (x,y,f) N2 points in (x,y,f ) One-layer Multi-Scale PointNet <=> PointNet v1.0 N points in (x,y)

117 PointNet Layer v.s. Convolution Layer PointNet Layer Convolution Layer PointNet v1.0 Input: Operation: Neighbor -hood: Point set MLP + max pooling Distance query Dense array Multiply and add Array index

118 Multi-Scale PointNet v.s. Graph CNN Unexpectedly strong relation with Graph CNN: Joan Bruna et al. Spectral Networks and Deep Locally Connected Networks on Graphs. ICLR 2014

119 Multi-Scale PointNet v.s. Graph CNN Local feature extraction, graph coarsening, repeat.. Joan Bruna et al. Spectral Networks and Deep Locally Connected Networks on Graphs. ICLR 2014

120 Multi-Scale PointNet v.s. Graph CNN In Graph CNN s perspective: Multi-Scale PointNet defines 1. Graph connectivity through Euclidean distance 2. Graph coarsening by farthest point sampling 3. Local feature extraction with PointNet (v1.0) Multi-scale PointNet Graph CNN

121 Relations to OctNet (Octree based 3D CNN) OctNet in Graph CNN s perspective: 1. Both connectivity and graph coarsening are defined by the Octree. 2. Local feature extraction by convolution layer. OctNet: Learning Deep 3D Representations at High Resolutions Gernot Riegler, Ali Osman Ulusoy and Andreas Geiger

Relations to OctNet (Octree based 3D CNN) OctNet in Graph CNN s perspective: In Multi-Scale PointNet 1. Both connectivity and graph coarsening are defined by the Octree. 2.

122 Relations to OctNet (Octree based 3D CNN) OctNet in Graph CNN s perspective: In Multi-Scale PointNet 1. Both connectivity and graph coarsening are defined by the Octree. 2. Local feature extraction by convolution layer. By ground distance By PointNet (v1.0) OctNet: Learning Deep 3D Representations at High Resolutions Gernot Riegler, Ali Osman Ulusoy and Andreas Geiger

Seeing the unseen. Data-driven 3D Understanding from Single Images. Hao Su

Seeing the unseen. Data-driven 3D Understanding from Single Images. Hao Su Seeing the unseen Data-driven 3D Understanding from Single Images Hao Su Image world Shape world 3D perception from a single image Monocular vision a typical prey a typical predator Cited from https://en.wikipedia.org/wiki/binocular_vision