PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Charles R. Qi* Hao Su* Kaichun Mo Leonidas J. Guibas


Emerging 3D Applications Big Data + Deep Representation Learning: Robot Perception (source: Scott J Grunewald), Augmented Reality (source: Google Tango), Shape Design (source: solidsolutions). Need for 3D Deep Learning!

3D Representations: Point Cloud, Mesh, Volumetric, Projected View RGB(D)

3D Representation: Point Cloud Point cloud is close to raw sensor data (LiDAR, depth sensor). Point cloud is canonical: meshes, volumetric grids, and depth maps can all be converted to point clouds.

Previous Works Most existing point cloud features are handcrafted towards specific tasks Source: https://github.com/pointcloudlibrary/pcl/wiki/overview-and-comparison-of-features

Previous Works The point cloud is converted to other representations before it's fed to a deep neural network: voxelization feeds a 3D CNN, projection/rendering feeds a 2D CNN, and feature extraction feeds a fully connected network.

Research Question: Can we achieve effective feature learning directly on point clouds?

Our Work: PointNet End-to-end learning for scattered, unordered point data. A unified framework for various tasks: Object Classification, Object Part Segmentation, Semantic Scene Parsing...

Challenges Unordered point set as input Model needs to be invariant to N! permutations. Invariance under geometric transformations Point cloud rotations should not alter classification results.


Unordered Input Point cloud: N orderless points, each represented by a D-dimensional vector. An N×D array and any row permutation of it represent the same set, so the model needs to be invariant to all N! permutations.

Permutation Invariance: Symmetric Function

f(x₁, x₂, …, xₙ) = f(x_π(1), x_π(2), …, x_π(n)) for every permutation π, with xᵢ ∈ ℝᴰ.

Examples:
f(x₁, x₂, …, xₙ) = max{x₁, x₂, …, xₙ}
f(x₁, x₂, …, xₙ) = x₁ + x₂ + … + xₙ

How can we construct a family of symmetric functions with neural networks?
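The two example aggregations can be checked numerically; a minimal NumPy sketch, reusing the slide's toy point values:

```python
import numpy as np

# The slide's toy point set: 4 points, 3 coordinates each.
points = np.array([[1, 2, 3],
                   [1, 1, 1],
                   [2, 3, 2],
                   [2, 3, 4]], dtype=float)

# Shuffle the rows: the same set in a different order.
perm = np.random.permutation(len(points))
shuffled = points[perm]

# Symmetric aggregations give identical output for any row order.
assert np.allclose(points.max(axis=0), shuffled.max(axis=0))  # max
assert np.allclose(points.sum(axis=0), shuffled.sum(axis=0))  # sum
```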

Permutation Invariance: Symmetric Function Observe: f(x₁, x₂, …, xₙ) = γ ∘ g(h(x₁), …, h(xₙ)) is symmetric if g is symmetric. Example: points (1,2,3), (1,1,1), (2,3,2), (2,3,4) each pass through a shared h, then a simple symmetric function g, then γ. This composition is PointNet (vanilla).

Permutation Invariance: Symmetric Function What symmetric functions can be constructed by PointNet (vanilla)?

Universal Set Function Approximator Theorem: any Hausdorff continuous symmetric function f : 2^X → ℝ, defined on point sets S ⊆ ℝᵈ, can be arbitrarily approximated by PointNet (vanilla).

Basic PointNet Architecture Empirically, we use multi-layer perceptrons (MLPs) and max pooling: each input point, e.g. (1,2,3), (1,1,1), (2,3,2), (2,3,4), passes through a shared MLP (h); max pooling aggregates the per-point features (g); a final MLP produces the output (γ). This is PointNet (vanilla).
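A minimal NumPy sketch of the vanilla composition γ(g(h(x₁), …, h(xₙ))), with assumed toy layer sizes (the real network uses deeper 64-64-64-128-1024 shared MLPs):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_layer(x, w, b):
    """One fully connected layer with ReLU, shared across points."""
    return np.maximum(x @ w + b, 0.0)

def vanilla_pointnet(points, w1, b1, w2, b2):
    """h = shared per-point MLP, g = max pool, gamma = final MLP."""
    h = mlp_layer(points, w1, b1)          # (N, 64): per-point features
    g = h.max(axis=0)                      # (64,): symmetric aggregation
    return mlp_layer(g[None, :], w2, b2)   # (1, 16): global descriptor

# Hypothetical tiny random weights, for illustration only.
w1, b1 = rng.normal(size=(3, 64)), np.zeros(64)
w2, b2 = rng.normal(size=(64, 16)), np.zeros(16)

pts = rng.normal(size=(100, 3))
out = vanilla_pointnet(pts, w1, b1, w2, b2)
shuffled = vanilla_pointnet(pts[rng.permutation(100)], w1, b1, w2, b2)
assert np.allclose(out, shuffled)  # invariant to point order
```

Because max pooling is the only step that mixes points, shuffling the input rows cannot change the output.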

Challenges Unordered point set as input Model needs to be invariant to N! permutations. Invariance under geometric transformations Point cloud rotations should not alter classification results.

Input Alignment by Transformer Network Idea: a data-dependent transformation for automatic alignment. A T-Net predicts transform params from the N×3 input data, and the transform is applied to produce the transformed (aligned) data.

Input Alignment by Transformer Network The transformation is just matrix multiplication! The T-Net predicts a 3×3 matrix, and the N×3 data is multiplied by it.
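The alignment step itself is one matrix product per cloud; a sketch where an assumed fixed rotation stands in for the learned T-Net output:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(1024, 3))  # N x 3 input cloud

# In PointNet the 3x3 matrix comes from a learned T-Net; here we
# substitute a fixed z-axis rotation to show the alignment step itself.
theta = np.pi / 6
transform = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta),  np.cos(theta), 0.0],
                      [0.0,            0.0,           1.0]])

aligned = points @ transform  # one matrix multiplication per cloud
assert aligned.shape == (1024, 3)
```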

Embedding Space Alignment The same idea applies to the N×64 input embeddings: a T-Net predicts 64×64 transform params, applied by matrix multiplication to give transformed N×64 embeddings. Regularization: keep the 64×64 transform matrix A close to orthogonal, e.g. by penalizing ||I − AAᵀ||².
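A minimal sketch of such an orthogonality penalty, using the Frobenius form ||I − AAᵀ||² as the concrete choice:

```python
import numpy as np

def orthogonality_reg(A):
    """||I - A A^T||_F^2: pushes the feature transform toward orthogonal."""
    I = np.eye(A.shape[0])
    return np.sum((I - A @ A.T) ** 2)

# A perfectly orthogonal matrix incurs zero penalty...
assert np.isclose(orthogonality_reg(np.eye(64)), 0.0)

# ...while an arbitrary 64x64 matrix is penalized.
rng = np.random.default_rng(0)
assert orthogonality_reg(rng.normal(size=(64, 64))) > 0.0
```

In training this term would be added to the task loss with a small weight, nudging A toward a near-rotation of the embedding space.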

PointNet Classification Network


Extension to PointNet: Segmentation Network Each point's local embedding is concatenated with the global feature, so per-point predictions can use both local and global context.

Results

Results on Object Classification (compared with 3D CNNs) dataset: ModelNet40; metric: 40-class classification accuracy (%)

Results on Object Part Segmentation dataset: ShapeNetPart; metric: mean IoU (%)

Results on Semantic Scene Parsing (input and output shown) dataset: Stanford 2D-3D-S (Matterport scans)

Robustness to Data Corruption Less than 2% accuracy drop with 50% missing data. dataset: ModelNet40; metric: 40-class classification accuracy (%)

Robustness to Data Corruption Why is PointNet so robust to missing data (compared with a 3D CNN)?

Visualizing Global Point Cloud Features Architecture recap: n×3 input → shared MLP → n×1024 → max pool → global feature. Which input points are contributing to the global feature? (critical points)

Visualizing Global Point Cloud Features Original Shape: Critical Point Sets:

Visualizing Global Point Cloud Features Architecture recap: n×3 input → shared MLP → n×1024 → max pool → global feature. Which points won't affect the global feature?

Visualizing Global Point Cloud Features Original Shape: Critical Point Set: Upper bound set:
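Critical points can be read off the max-pooling step: a point is critical if it attains the per-dimension maximum for at least one feature. A NumPy sketch with assumed random per-point features:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1024, 64                     # toy sizes; PointNet uses 1024-dim features
features = rng.normal(size=(n, d))  # assumed per-point features h(x_i)

# The global feature keeps, per dimension, the max over all points.
# Critical points are those attaining the max in at least one dimension;
# removing any other point leaves the global feature unchanged.
critical_idx = np.unique(features.argmax(axis=0))

global_feat = features.max(axis=0)
reduced = features[critical_idx].max(axis=0)
assert np.allclose(global_feat, reduced)  # critical points alone suffice
```

This is also why PointNet tolerates missing data: dropping non-critical points cannot change the global feature at all.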


Conclusion PointNet is a novel deep neural network that directly consumes point clouds. A unified approach to various 3D recognition tasks. Rich theoretical analysis and experimental results. Code & Data Available! http://stanford.edu/~rqi/pointnet See you at Poster 9!

Thank you!

THE END

Speed and Model Size Inference time: 11.6 ms (classification) / 25.3 ms (segmentation) on a GTX 1080 with batch size 8.

Permutation Invariance: How about Sorting? Sort the points before feeding them into a network, e.g. lexicographically: (1,2,3), (1,1,1), (2,3,2), (2,3,4) → lexsorted → (1,1,1), (1,2,3), (2,3,2), (2,3,4) → MLP. Unfortunately, there is no canonical order in high-dimensional space.

ModelNet shape classification accuracy (MLP):
Unordered input: 12%
Lexsorted input: 40%
PointNet (vanilla): 87%

Permutation Invariance: How about RNNs? Train an RNN (an LSTM over the point sequence, each point first embedded by an MLP) with permutation augmentation. However, the RNN forgets early inputs and its output still depends on order.

ModelNet shape classification accuracy:
LSTM: 75%
PointNet (vanilla): 87%

PointNet Classification Network: ablation on ModelNet40 accuracy.
PointNet (vanilla): 87.1%
+ input 3×3 transform: 87.9%
+ feature 64×64 transform: 86.9%
+ feature 64×64 transform + regularization: 87.4%
+ both: 89.2%

Visualizing Point Functions Compact view: 1×3 input → FCs → 1×1024. Expanded view: 1×3 → FC(64) → FC(64) → FC(64) → FC(128) → FC(1024). Which input points will activate neuron X? Find the top-k points in a dense volumetric grid that activate neuron X.
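A sketch of this probing procedure, with an assumed single ReLU unit standing in for the trained point function:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed stand-in for one trained per-point function h : R^3 -> R
# (in the talk, a neuron of the shared MLP); a single ReLU unit here.
w = rng.normal(size=3)

# Dense volumetric grid over [-1, 1]^3.
axis = np.linspace(-1.0, 1.0, 32)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"),
                axis=-1).reshape(-1, 3)

# Activation of "neuron X" at every grid point, then the top-k points.
activations = np.maximum(grid @ w, 0.0)
k = 100
top_k_points = grid[np.argsort(activations)[-k:]]
assert top_k_points.shape == (k, 3)
```

Plotting the top-k points for each neuron reveals the region of space that neuron responds to.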

Visualizing Point Functions