PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Charles R. Qi* Hao Su* Kaichun Mo Leonidas J. Guibas

Size: px

Start display at page:

Download "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Charles R. Qi* Hao Su* Kaichun Mo Leonidas J. Guibas"

Johnathan Scott
6 years ago
Views:

1 PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation Charles R. Qi* Hao Su* Kaichun Mo Leonidas J. Guibas

2 Big Data + Deep Representation Learning Robot Perception Augmented Reality source: Scott J Grunewald source: Google Tango Emerging 3D Applications Shape Design source: solidsolutions

3 Big Data + Deep Representation Learning Robot Perception Augmented Reality source: Scott J Grunewald Shape Design source: Google Tango Need for 3D Deep Learning! source: solidsolutions

4 3D Representations Point Cloud Mesh Volumetric Projected View RGB(D)

5 3D Representation: Point Cloud Point cloud is close to raw sensor data LiDAR Depth Sensor Point Cloud

6 3D Representation: Point Cloud Point cloud is close to raw sensor data Point cloud is canonical Mesh LiDAR Volumetric Depth Sensor Point Cloud Depth Map

7 Previous Works Most existing point cloud features are handcrafted towards specific tasks Source:

8 Previous Works Point cloud is converted to other representations before it s fed to a deep neural network Conversion Voxelization Projection/Rendering Feature extraction Deep Net 3D CNN 2D CNN Fully Connected

9 Research Question: Can we achieve effective feature learning directly on point clouds?

10 Our Work: PointNet End-to-end learning for scattered, unordered point data PointNet

11 Our Work: PointNet End-to-end learning for scattered, unordered point data Unified framework for various tasks Object Classification PointNet Object Part Segmentation Semantic Scene Parsing...

12 Our Work: PointNet End-to-end learning for scattered, unordered point data Unified framework for various tasks

13 Challenges Unordered point set as input Model needs to be invariant to N! permutations. Invariance under geometric transformations Point cloud rotations should not alter classification results.

14 Challenges Unordered point set as input Model needs to be invariant to N! permutations. Invariance under geometric transformations Point cloud rotations should not alter classification results.

15 Unordered Input Point cloud: N orderless points, each represented by a D dim vector D N

16 Unordered Input Point cloud: N orderless points, each represented by a D dim vector D D N represents the same set as N

17 Unordered Input Point cloud: N orderless points, each represented by a D dim vector D D N represents the same set as N Model needs to be invariant to N! permutations

18 Permutation Invariance: Symmetric Function f (x 1, x 2,, x n ) f (x π1, x π 2,, x π n ), x i! D

19 Permutation Invariance: Symmetric Function f (x 1, x 2,, x n ) f (x π1, x π 2,, x π n ), x i! D Examples: f (x 1, x 2,, x n ) = max{x 1, x 2,, x n } f (x 1, x 2,, x n ) = x 1 + x x n

20 Permutation Invariance: Symmetric Function f (x 1, x 2,, x n ) f (x π1, x π 2,, x π n ), x i! D Examples: f (x 1, x 2,, x n ) = max{x 1, x 2,, x n } f (x 1, x 2,, x n ) = x 1 + x x n How can we construct a family of symmetric functions by neural networks?

21 Permutation Invariance: Symmetric Function Observe: f (x 1, x 2,, x n ) = γ! g(h(x 1 ),,h(x n )) is symmetric if g is symmetric

22 Permutation Invariance: Symmetric Function Observe: f (x 1, x 2,, x n ) = γ! g(h(x 1 ),,h(x n )) is symmetric if g is symmetric (1,2,3) (1,1,1) (2,3,2) h (2,3,4)

23 Permutation Invariance: Symmetric Function Observe: f (x 1, x 2,, x n ) = γ! g(h(x 1 ),,h(x n )) is symmetric if g is symmetric (1,2,3) (1,1,1) (2,3,2) (2,3,4) h g simple symmetric function

24 Permutation Invariance: Symmetric Function Observe: f (x 1, x 2,, x n ) = γ! g(h(x 1 ),,h(x n )) is symmetric if g is symmetric (1,2,3) (1,1,1) (2,3,2) (2,3,4) h g simple symmetric function γ PointNet (vanilla)

25 Permutation Invariance: Symmetric Function What symmetric functions can be constructed by PointNet? Symmetric functions PointNet (vanilla)

26 Universal Set Function Approximator Theorem: A Hausdorff continuous symmetric function arbitrarily approximated by PointNet. f :2 X! can be S! d PointNet (vanilla)

27 Basic PointNet Architecture Empirically, we use multi-layer perceptron (MLP) and max pooling: h (1,2,3) (1,1,1) MLP MLP g γ (2,3,2) MLP max MLP (2,3,4) MLP PointNet (vanilla)

28 Challenges Unordered point set as input Model needs to be invariant to N! permutations. Invariance under geometric transformations Point cloud rotations should not alter classification results.

29 Input Alignment by Transformer Network Idea: Data dependent transformation for automatic alignment 3 T-Net transform 3 params N Transform N Data Transformed Data

30 Input Alignment by Transformer Network Idea: Data dependent transformation for automatic alignment 3 T-Net transform 3 params N Transform N Data Transformed Data

31 Input Alignment by Transformer Network Idea: Data dependent transformation for automatic alignment 3 T-Net transform 3 params N Transform N Data Transformed Data

32 Input Alignment by Transformer Network The transformation is just matrix multiplication! 3 T-Net transform 3 params: 3x3 N Matrix Mult. Data Transformed Data

33 Embedding Space Alignment T-Net transform params: 64x64 Matrix Mult. Input embeddings: Nx64 Transformed embeddings: Nx64

34 Embedding Space Alignment T-Net transform params: 64x64 Matrix Mult. Regularization: Transform matrix A 64x64 close to orthogonal: Input embeddings: Nx64 Transformed embeddings: Nx64

35 PointNet Classification Network

36 PointNet Classification Network

37 PointNet Classification Network

38 PointNet Classification Network

39 PointNet Classification Network

40 PointNet Classification Network

41 PointNet Classification Network

42 Extension to PointNet Segmentation Network local embedding global feature

43 Extension to PointNet Segmentation Network local embedding global feature

44 Results

45 Results on Object Classification 3D CNNs dataset: ModelNet40; metric: 40-class classification accuracy (%)

46 Results on Object Part Segmentation

47 Results on Object Part Segmentation dataset: ShapeNetPart; metric: mean IoU (%)

48 Results on Semantic Scene Parsing Output Input dataset: Stanford 2D-3D-S (Matterport scans)

49 Robustness to Data Corruption dataset: ModelNet40; metric: 40-class classification accuracy (%)

50 Robustness to Data Corruption Less than 2% accuracy drop with 50% missing data dataset: ModelNet40; metric: 40-class classification accuracy (%)

51 Robustness to Data Corruption dataset: ModelNet40; metric: 40-class classification accuracy (%)

52 Robustness to Data Corruption Why is PointNet so robust to missing data? 3D CNN

53 Visualizing Global Point Cloud Features n MLP shared maxpool global feature Which input points are contributing to the global feature? (critical points)

54 Visualizing Global Point Cloud Features Original Shape: Critical Point Sets:

55 Visualizing Global Point Cloud Features n MLP shared maxpool global feature Which points won t affect the global feature?

56 Visualizing Global Point Cloud Features Original Shape: Critical Point Set: Upper bound set:

57 Visualizing Global Point Cloud Features (OOS) Original Shape: Critical Point Set: Upper bound Set:

58 Conclusion PointNet is a novel deep neural network that directly consumes point cloud. A unified approach to various 3D recognition tasks. Rich theoretical analysis and experimental results. Code & Data Available! See you at Poster 9!

59 Thank you!

60 THE END

61 Speed and Model Size Inference time 11.6ms, 25.3ms GTX1080, batch size 8

62 Permutation Invariance: How about Sorting? Sort the points before feeding them into a network. Unfortunately, there is no canonical order in high dim space. (1,2,3) (1,1,1) (2,3,2) (2,3,4) lexsorted (1,1,1) (1,2,3) (2,3,2) (2,3,4) MLP

63 Permutation Invariance: How about Sorting? Sort the points before feeding them into a network. Unfortunately, there is no canonical order in high dim space. lexsorted Multi-Layer Perceptron (ModelNet shape classification) (1,2,3) (1,1,1) (2,3,2) (2,3,4) (1,1,1) (1,2,3) (2,3,2) (2,3,4) MLP Accuracy Unordered Input 12% Lexsorted Input 40% PointNet (vanilla) 87%

64 Permutation Invariance: How about RNNs? Train RNN with permutation augmentation. However, RNN forgets and order matters. LSTM LSTM LSTM LSTM MLP MLP MLP MLP (1,2,3) (1,1,1) (2,3,2) (2,3,4)

65 Permutation Invariance: How about RNNs? Train RNN with permutation augmentation. However, RNN forgets and order matters. LSTM LSTM LSTM LSTM LSTM Network (ModelNet shape classification) MLP MLP MLP MLP Accuracy LSTM 75% (1,2,3) (1,1,1) (2,3,2) (2,3,4) PointNet (vanilla) 87%

66 PointNet Classification Network ModelNet40 Accuracy PointNet (vanilla) 87.1% + input 3x3 87.9% + feature 64x % + feature 64x64 + reg 87.4% + both 89.2%

67 Visualizing Point Functions Compact View: 1x3 FCs 1x1024 Expanded View: 1x3 FC FC FC FC FC x1024 Which input point will activate neuron X? Find the top-k points in a dense volumetric grid that activates neuron X.

68 Visualizing Point Functions

CS468: 3D Deep Learning on Point Cloud Data. class label part label. Hao Su. image. May 10, 2017

CS468: 3D Deep Learning on Point Cloud Data class label part label Hao Su image. May 10, 2017 Agenda Point cloud generation Point cloud analysis CVPR 17, Point Set Generation Pipeline render CVPR 17, Point