POINT CLOUD DEEP LEARNING

Similar documents
MULTI-LEVEL 3D CONVOLUTIONAL NEURAL NETWORK FOR OBJECT RECOGNITION SAMBIT GHADAI XIAN LEE ADITYA BALU SOUMIK SARKAR ADARSH KRISHNAMURTHY

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Charles R. Qi* Hao Su* Kaichun Mo Leonidas J. Guibas

Deep Learning with Tensorflow AlexNet

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016

3D Object Classification via Spherical Projections

Convolutional Neural Networks: Applications and a short timeline. 7th Deep Learning Meetup Kornel Kis Vienna,

3D Object Detection with Sparse Sampling Neural Networks. Ryan Goy

Keras: Handwritten Digit Recognition using MNIST Dataset

Supplementary A. Overview. C. Time and Space Complexity. B. Shape Retrieval. D. Permutation Invariant SOM. B.1. Dataset

Deep Learning for Computer Vision II

3D model classification using convolutional neural network

Keras: Handwritten Digit Recognition using MNIST Dataset

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Generative Modeling with Convolutional Neural Networks. Denis Dus Data Scientist at InData Labs

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

Learning 3D Shapes as Multi-Layered Height-maps using 2D Convolutional Networks

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA

Deep Residual Learning

arxiv: v1 [cs.cv] 2 Dec 2018

Deep Learning Explained Module 4: Convolution Neural Networks (CNN or Conv Nets)

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Fuzzy Set Theory in Computer Vision: Example 3

3D Deep Learning on Geometric Forms. Hao Su

CS468: 3D Deep Learning on Point Cloud Data. class label part label. Hao Su. image. May 10, 2017

Deep Learning for 3D Shape Classification Based on Volumetric Density and Surface Approximation Clues

DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Structured Prediction using Convolutional Neural Networks

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan

Advanced Video Analysis & Imaging

3D Deep Learning

Perceptron: This is convolution!

NVIDIA FOR DEEP LEARNING. Bill Veenhuis

arxiv: v1 [cs.cv] 28 Nov 2018

Deep Learning for Computer Vision with MATLAB By Jon Cherrie

Sparse 3D Convolutional Neural Networks for Large-Scale Shape Retrieval

Defense Data Generation in Distributed Deep Learning System Se-Yoon Oh / ADD-IDAR

COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017

Inception Network Overview. David White CS793

Convolutional Neural Networks

Deep Learning for Computer Vision

Beam Search for Learning a Deep Convolutional Neural Network of 3D Shapes

CNNS FROM THE BASICS TO RECENT ADVANCES. Dmytro Mishkin Center for Machine Perception Czech Technical University in Prague

Classification of 3D Shapes with Convolutional Neural Networks

Deconvolutions in Convolutional Neural Networks

Multi-Task Self-Supervised Visual Learning

Intro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn

3D Object Classification using Shape Distributions and Deep Learning

Fuzzy Set Theory in Computer Vision: Example 3, Part II

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder

Learning to generate 3D shapes

Spatial Localization and Detection. Lecture 8-1

3D Convolutional Neural Networks for Landing Zone Detection from LiDAR

Cross-domain Deep Encoding for 3D Voxels and 2D Images

Content-Based Image Recovery

Learning Descriptor Networks for 3D Shape Synthesis and Analysis

Using Faster-RCNN to Improve Shape Detection in LIDAR

3D CONVOLUTIONAL NEURAL NETWORKS BY MODAL FUSION

Implementing Deep Learning for Video Analytics on Tegra X1.

arxiv: v4 [cs.cv] 27 Nov 2016

CNN Basics. Chongruo Wu

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material

Face Recognition A Deep Learning Approach

A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017

Learning Adversarial 3D Model Generation with 2D Image Enhancer

Study of Residual Networks for Image Recognition

MoonRiver: Deep Neural Network in C++

DEEP LEARNING FOR 3D SHAPE CLASSIFICATION FROM MULTIPLE DEPTH MAPS. Pietro Zanuttigh and Ludovico Minto

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center

All You Want To Know About CNNs. Yukun Zhu

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

Dynamic Routing Between Capsules

Unified, real-time object detection

Deep Learning for Embedded Security Evaluation

Code Mania Artificial Intelligence: a. Module - 1: Introduction to Artificial intelligence and Python:

Tutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY

Tutorial on Machine Learning Tools

Fei-Fei Li & Justin Johnson & Serena Yeung

Automotive 3D Object Detection Without Target Domain Annotations

MIXED PRECISION TRAINING: THEORY AND PRACTICE Paulius Micikevicius

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

A performance comparison of Deep Learning frameworks on KNL

An Empirical Study of Generative Adversarial Networks for Computer Vision Tasks

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

Deep Learning and Its Applications

Deep Aggregation of Local 3D Geometric Features for 3D Model Retrieval

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

A Network Architecture for Point Cloud Classification via Automatic Depth Images Generation

Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning

Deep Learning Workshop. Nov. 20, 2015 Andrew Fishberg, Rowan Zellers

Accelerating Convolutional Neural Nets. Yunming Zhang

Know your data - many types of networks

A Deeper Look at 3D Shape Classifiers

Yiqi Yan. May 10, 2017

A Deep Learning Approach to Vehicle Speed Estimation

Transcription:

POINT CLOUD DEEP LEARNING Innfarn Yoo, 3/29/28 / 57

Introduction AGENDA Previous Work Method Result Conclusion 2 / 57

INTRODUCTION 3 / 57

2D OBJECT CLASSIFICATION Deep Learning for 2D Object Classification Convolutional Neural Network (CNN) for 2D images works really well AlexNet, ResNet, & GoogLeNet R-CNN Fast R-CNN Faster R-CNN Mask R-CNN Recent 2D image classification can even extract precise boundaries of objects (FCN Mask R-CNN) [] He et al., Mask R-CNN (27) 4 / 57

3D OBJECT CLASSIFICATION Deep Learning for 3D Object Classification 3D object classification approaches are getting more attentions Collecting 3D point data is easier and cheaper than before (LiDAR & other sensors) Size of data is bigger than 2D images Open datasets are increasing Recent researches approaches human level detection accuracy MVCNN, ShapeNet, PointNet, VoxNet, VoxelNet, & VRN Ensemble [2] Zhou and Tuzel, VoxelNet (27) 5 / 57

GOALS The goals of our method Evaluating & comparing different types of Neural Network models for 3D object classification Providing the generic framework to test multiple 3D neural network models Simple & easy to implement neural network models Fast preprocessing (remove bottleneck of loading, sampling, & jittering 3D data) 6 / 57

PREVIOUS WORK 7 / 57

3D POINT-BASED APPROACHES 3D Points Neural Nets PointNet First 3D point-based classification Unordered dataset Transform Multi-Layer Perceptron (MLP) Max Pool (MP) Classification [5] Qi et al., PointNet (27) 8 / 57

PIXEL-BASED APPROACHES 3D 2D Projections Neural Nets Multi-Layer Perceptron (MLP) Convolutional Neural Network (CNN) [4] Krizhevsky et al., AlexNet (22) Multi-View Convolutional Neural Network (MVCNN) [3] Su et al., MVCNN (25) 9 / 57

VOXEL-BASED APPROACHES 3D Points Voxels Neural Nets VoxNet [7] Broke et al., VRN Ensemble (26) VRN Ensemble VoxelNet [6] Maturana and Scherer, VoxNet (25) [2] Zhou and Tuzel, VoxelNet (27) / 57

METHOD / 57

PREPROCESSING Requirement Loading 3D polygonal objects Required Operations on 3D objects Sampling, Shuffling, Jittering, Scaling, & Rotating Projection, & Voxelization Python interface is not that good for multi-core processing (or multi-threading) # of objects is notoriously for single-core processing 2 / 57

nn3d_trainer (C++) Trainer: C++ program Load 3D Objects Sample Points Thread 3D Point Sample FRAMEWORK Basic Pipeline Thread 2 3D Point Sample Loading 3D Object Sampler Threads Thread 3 3D Point Sample Epoch #i Thread N 3D Point Sample Converter Pixel, Point, Voxel Converter Pixel, Point, Voxel Converter Pixel, Point, Voxel Converter Pixel, Point, Voxel Increase epoch Call Python NN Model Functions: Train, Test, Eval, Report, & Save 3 / 57

3D DATASETS MODELNET MODELNET4 SHAPENET CORE V2 Princeton ModelNet Data http://modelnet.cs.princeton.edu/ Categories 4,93 Objects (2 GB) OFF (CAD) File Format Princeton ModelNet Data http://modelnet.cs.princeton.edu/ 4 Categories 2,43 Objects ( GB) OFF (CAD) File Format ShapeNet https://www.shapenet.org/ 55 Categories 5,9 Objects (9 GB) OBJ File Format 4 / 57

Point-Based Models NEURAL NETWORK MODELS Pixel-Based Models Voxel-Based Models 5 / 57

POINT-BASED NEURAL NETWORK MODELS Types of Models Preprocessing: Rotate randomly Scale randomly Uniform sampling on 3D object surfaces Sample 248 points Shuffle points Tested Models Multi-Layer Perceptron (MLP) Multi Rotational MLPs Single Orientation CNN Multi Rotational CNNs Multi Rotational Resample & Max Pool Layers ResNet-like 6 / 57

7 / 57

POINT-BASED NEURAL NETWORK MODELS MLP 3D points ReLU + Dropout Softmax Cross Entropy Flatten Vector Fully Connected Layer Class Onehot Vector 8 / 57

POINT-BASED NEURAL NETWORK MODELS Multi Rotational MLPs 3D points ReLU + Dropout Softmax Cross Entropy Random 3x3 Rotation 3D Conv Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 9 / 57

POINT-BASED NEURAL NETWORK MODELS Single Orientation CNN 3D points ReLU + Dropout ReLU + Dropout Softmax Cross Entropy Random 3x3 Rotation 3D Conv Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 2 / 57

POINT-BASED NEURAL NETWORK MODELS Multi Rotational CNNs ReLU + Dropout ReLU + Dropout Softmax Cross Entropy 3D points Random 3x3 Rotation 3D Conv Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 2 / 57

POINT-BASED NEURAL NETWORK MODELS Multi Rotational Resample & Max Pool Layers ReLU + Dropout ReLU + Dropout Softmax Cross Entropy 3D points Random 3x3 Rotation Resample Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 22 / 57

POINT-BASED NEURAL NETWORK MODELS ResNet-like 3D points ReLU + Dropout ReLU + Dropout ReLU + Dropout Softmax Cross Entropy Random 3x3 Rotation Resample Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 23 / 57

PIXEL-BASED NEURAL NETWORK MODELS Types of Models Preprocessing: Sample 892 points Tested Models: MLP Same as point-based models Depth-only orthogonal projection Depth-Only Orthogonal MVCNN 32x32 or 64x64 Generating multiple rotations 64x64x5 & 64x64x 24 / 57

25 / 57

PIXEL-BASED NEURAL NETWORK MODELS MLP Images (32x32x5) Softmax Cross Entropy Flatten Vector Fully Connected Layer Class Onehot Vector 26 / 57

PIXEL-BASED NEURAL NETWORK MODELS Depth-Only Orthogonal MVCNN Images (32x32x5) Softmax Cross Entropy Concat Image Separation 3D Conv Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 27 / 57

VOXEL-BASED NEURAL NETWORK MODELS Types of Models Preprocessing: Sample 892 points Same as point based models Voxelization Tested Models: MLP CNN ResNet-like 3D points Voxels Each voxel has intensity. ~. how many points hit same voxel 32x32x32 & 64x64x64 28 / 57

29 / 57

VOXEL-BASED NEURAL NETWORK MODELS MLP Images (32x32x5) Softmax Cross Entropy Flatten Vector Fully Connected Layer Class Onehot Vector 3 / 57

VOXEL-BASED NEURAL NETWORK MODELS CNN Voxels 32x32x32 Softmax Cross Entropy 3D Conv Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 3 / 57

VOXEL-BASED NEURAL NETWORK MODELS ResNet-like Voxels 32x32x32 Softmax Cross Entropy Concat 3D Conv Layer Avg Pooling Layer Resample Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 32 / 57

IMPLEMENTATION System Setup System: Ubuntu 6.4, RAM 32 GB & 64 GB, & SSD 52 GB NVIDIA Quadro P6, Quadro M6, & GeForce Titan X GCC 5.2. for C++ x Python 3.5 TensorFlow-GPU v.5. NumPy. 33 / 57

HYPER PARAMETERS Object Perturbation Random Rotations: -25 ~ 25 degree Random Scaling:.7 ~. Learning Rate:. Keep Probability (Dropout layer):.7 Max Epochs: Batch Size: 32 Number of Random Rotations: 2 Voxel Dim: 32x32x32 MVCNN Number of Views: 5 34 / 57

RESULT 35 / 57

MODELNET # OF TEST & TRAIN OBJECTS # of Test Models # of Train Models 9 8 7 6 5 4 3 2 36 / 57 table toilet monitor bathtub sofa chair desk dresser night_stand bed

MODELNET ACCURACY % Iter: Train Accu Test Accu map 9 8 7 6 5 4 3 2 37 / 57 PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet

MODELNET4 # OF TEST & TRAIN OBJECTS # of Test Models # of Train Models 9 8 7 6 5 4 3 2 38 / 57

MODELNET4 ACCURACY % Iter: Train Accu Test Accu map 9 8 7 6 5 4 3 2 39 / 57 PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet

MODELNET4 ACCURACY 7 CATEGORIES (# OF TRAIN OBJECTS > 4) # of Test Models # of Train Models 9 8 7 6 5 4 3 2 4 / 57

MODELNET4, 7 CATEGORIES ACCURACY % Iter: Train Accu Test Accu map 9 8 7 6 5 4 3 2 4 / 57 PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet

MODELNET4 ACCURACY CATEGORIES (# OF TRAIN OBJECTS > 3) # of Test Models # of Train Models 9 8 7 6 5 4 3 2 42 / 57

MODELNET4, CATEGORIES ACCURACY Train Accu Test Accu map 9 8 7 6 5 4 3 2 43 / 57 PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet

MODELNET4 ACCURACY 7 CATEGORIES (# OF TRAIN OBJECTS > 2) # of Test Models # of Train Models 9 8 7 6 5 4 3 2 44 / 57

MODELNET4,7 CATEGORIES ACCURACY % Iter: Train Accu Test Accu map 9 8 7 6 5 4 3 2 45 / 57 PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet

hours 3. MODELNET PERFORMANCE seconds.7 2.5.6 2..5.4.5.3..2.5.. PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet Total Training Time Inference Time Per Batch 46 / 57

hours 8. MODELNET4 PERFORMANCE seconds.7 7..6 6..5 5..4 4..3 3. 2..2... PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet Total Training Time Inference Time Per Batch 47 / 57

SHAPENET CORE V2 ACCURACY ShapeNet has pretty big dataset % Train Accu Test Accu map Iter: 6 9 GB of dataset 9 Only tested 3 NN models 8 Point MP 7 6 Depth-Only MVCNN 5 4 Voxel CNN 3 2 PC MP PX MVCNN VX CNN 48 / 57

TRAIN ACCURACY GRAPH.8.8.8.8.6.6.6.6.4.4.4.4.2.2.2.2 PC MLP PC MLPs PC CNN PC CNNs 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54.8.8.8.8.6.6.6.6.4.4.4.4.2.2.2.2 PC MP PC ResNet PX MLP PX MVCNN 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54.8.6.4.2.8.6.4.2.8.6.4.2 VX MLP VX CNN VX ResNet 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 49 / 57

TEST ACCURACY GRAPH.8.8.8.8.6.6.6.6.4.4.4.4.2.2.2.2 PC MLP PC MLPs PC CNN PC CNNs 2 3 4 2 3 4 2 3 4 2 3 4.8.8.8.8.6.6.6.6.4.4.4.4.2.2.2.2 PC MP PC ResNet PX MLP PX MVCNN 2 3 4 2 3 4 2 3 4 2 3 4.8.6.4.2.8.6.4.2.8.6.4.2 VX MLP VX CNN VX ResNet 2 3 4 2 3 4 2 3 4 5 / 57

2.5 CROSS ENTROPY CONVERGENCE GRAPH PC MLP 2.5 PC MLPs 2.5 PC CNN 2.5 PC CNNs 2 2 2 2.5.5.5.5.5.5.5.5 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 2.5 PC MP 2.5 PC ResNet 2.5 PX MLP 2.5 PX MVCNN 2 2 2 2.5.5.5.5.5.5.5.5 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 2.5 VX MLP 2.5 VX CNN 2.5 VX ResNet 2 2 2.5.5.5.5.5.5 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 5 / 57

CONCLUSION 52 / 57

CONCLUSION Models Voxel ResNet and Voxel CNN provide best result on 3D object classification Unordered vs. Ordered: Unordered data (point cloud) could be learned, but lower accuracy and higher computation cost than ordered data Projection vs. Voxelization: Voxelization provides better result with similar computation cost (converge faster, & more precise) Strict comparisons with previous methods are not done yet Sampling and jittering methods are different (cannot directly compare yet) 53 / 57

CONCLUSION Dataset Evaluation ModelNet & ModelNet4 Some categories have too small training and testing objects Lowering classification accuracy ModelNet4, 7 categories (# of training objects > 4) Achieved 98% accuracy Partial dataset from ModelNet4 (Categories that have more than 4 3D objects) # of train 3D objects in a category matters ShapeNetCore.v2 # of 3D objects are big enough, but many object but many duplicated objects 54 / 57

FUTURE WORK Running entire procedures in CUDA Using NVIDIA GVDB Will have advantages of Sparse Voxels Strict comparisons with previous methods PointNet, VRN Ensemble, etc Extending methods to captured dataset (e.g. KITTI) Train set is too small Need to investigate whether Generative Adversarial Networks (GAN) can alleviate this problem or not 55 / 57

REFERENCE [] He et al., 27, Mask R-CNN [2] Zhou and Tuzel, 27, VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection [3] Su et al., 25, Multi-view convolutional neural networks for 3d shape recognition [4] Krizhevsky et al., 22, ImageNet Classification with Deep Convolutional Neural Networks [5] Qi et al., 27, Pointnet: Deep learning on point sets for 3d classification and segmentation [6] Maturana and Scherer, 25, VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition [7] Broke et al., 26, Generative and Discriminative Voxel Modeling with Convolutional Neural Networks [8] He et al., 26, Deep residual learning for image recognition [9] Goodfellow et al., 24, Generative adversarial nets [] Girshick, 25, Fast R-CNN [] Ren et al., 25, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [2] Chang et al., 25, ShapeNet: An Information-Rich 3D Model Repository [3] Wu et al., 25, 3D ShapeNets: A Deep Representation for Volumetric Shapes [4] Wu et al., 24, 3D ShapeNets for 2.5D Object Recognition and Next-Best-View Prediction 56 / 57

57 / 57