POINT CLOUD DEEP LEARNING Innfarn Yoo, 3/29/28 / 57
Introduction AGENDA Previous Work Method Result Conclusion 2 / 57
INTRODUCTION 3 / 57
2D OBJECT CLASSIFICATION Deep Learning for 2D Object Classification Convolutional Neural Network (CNN) for 2D images works really well AlexNet, ResNet, & GoogLeNet R-CNN Fast R-CNN Faster R-CNN Mask R-CNN Recent 2D image classification can even extract precise boundaries of objects (FCN Mask R-CNN) [] He et al., Mask R-CNN (27) 4 / 57
3D OBJECT CLASSIFICATION Deep Learning for 3D Object Classification 3D object classification approaches are getting more attentions Collecting 3D point data is easier and cheaper than before (LiDAR & other sensors) Size of data is bigger than 2D images Open datasets are increasing Recent researches approaches human level detection accuracy MVCNN, ShapeNet, PointNet, VoxNet, VoxelNet, & VRN Ensemble [2] Zhou and Tuzel, VoxelNet (27) 5 / 57
GOALS The goals of our method Evaluating & comparing different types of Neural Network models for 3D object classification Providing the generic framework to test multiple 3D neural network models Simple & easy to implement neural network models Fast preprocessing (remove bottleneck of loading, sampling, & jittering 3D data) 6 / 57
PREVIOUS WORK 7 / 57
3D POINT-BASED APPROACHES 3D Points Neural Nets PointNet First 3D point-based classification Unordered dataset Transform Multi-Layer Perceptron (MLP) Max Pool (MP) Classification [5] Qi et al., PointNet (27) 8 / 57
PIXEL-BASED APPROACHES 3D 2D Projections Neural Nets Multi-Layer Perceptron (MLP) Convolutional Neural Network (CNN) [4] Krizhevsky et al., AlexNet (22) Multi-View Convolutional Neural Network (MVCNN) [3] Su et al., MVCNN (25) 9 / 57
VOXEL-BASED APPROACHES 3D Points Voxels Neural Nets VoxNet [7] Broke et al., VRN Ensemble (26) VRN Ensemble VoxelNet [6] Maturana and Scherer, VoxNet (25) [2] Zhou and Tuzel, VoxelNet (27) / 57
METHOD / 57
PREPROCESSING Requirement Loading 3D polygonal objects Required Operations on 3D objects Sampling, Shuffling, Jittering, Scaling, & Rotating Projection, & Voxelization Python interface is not that good for multi-core processing (or multi-threading) # of objects is notoriously for single-core processing 2 / 57
nn3d_trainer (C++) Trainer: C++ program Load 3D Objects Sample Points Thread 3D Point Sample FRAMEWORK Basic Pipeline Thread 2 3D Point Sample Loading 3D Object Sampler Threads Thread 3 3D Point Sample Epoch #i Thread N 3D Point Sample Converter Pixel, Point, Voxel Converter Pixel, Point, Voxel Converter Pixel, Point, Voxel Converter Pixel, Point, Voxel Increase epoch Call Python NN Model Functions: Train, Test, Eval, Report, & Save 3 / 57
3D DATASETS MODELNET MODELNET4 SHAPENET CORE V2 Princeton ModelNet Data http://modelnet.cs.princeton.edu/ Categories 4,93 Objects (2 GB) OFF (CAD) File Format Princeton ModelNet Data http://modelnet.cs.princeton.edu/ 4 Categories 2,43 Objects ( GB) OFF (CAD) File Format ShapeNet https://www.shapenet.org/ 55 Categories 5,9 Objects (9 GB) OBJ File Format 4 / 57
Point-Based Models NEURAL NETWORK MODELS Pixel-Based Models Voxel-Based Models 5 / 57
POINT-BASED NEURAL NETWORK MODELS Types of Models Preprocessing: Rotate randomly Scale randomly Uniform sampling on 3D object surfaces Sample 248 points Shuffle points Tested Models Multi-Layer Perceptron (MLP) Multi Rotational MLPs Single Orientation CNN Multi Rotational CNNs Multi Rotational Resample & Max Pool Layers ResNet-like 6 / 57
7 / 57
POINT-BASED NEURAL NETWORK MODELS MLP 3D points ReLU + Dropout Softmax Cross Entropy Flatten Vector Fully Connected Layer Class Onehot Vector 8 / 57
POINT-BASED NEURAL NETWORK MODELS Multi Rotational MLPs 3D points ReLU + Dropout Softmax Cross Entropy Random 3x3 Rotation 3D Conv Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 9 / 57
POINT-BASED NEURAL NETWORK MODELS Single Orientation CNN 3D points ReLU + Dropout ReLU + Dropout Softmax Cross Entropy Random 3x3 Rotation 3D Conv Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 2 / 57
POINT-BASED NEURAL NETWORK MODELS Multi Rotational CNNs ReLU + Dropout ReLU + Dropout Softmax Cross Entropy 3D points Random 3x3 Rotation 3D Conv Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 2 / 57
POINT-BASED NEURAL NETWORK MODELS Multi Rotational Resample & Max Pool Layers ReLU + Dropout ReLU + Dropout Softmax Cross Entropy 3D points Random 3x3 Rotation Resample Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 22 / 57
POINT-BASED NEURAL NETWORK MODELS ResNet-like 3D points ReLU + Dropout ReLU + Dropout ReLU + Dropout Softmax Cross Entropy Random 3x3 Rotation Resample Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 23 / 57
PIXEL-BASED NEURAL NETWORK MODELS Types of Models Preprocessing: Sample 892 points Tested Models: MLP Same as point-based models Depth-only orthogonal projection Depth-Only Orthogonal MVCNN 32x32 or 64x64 Generating multiple rotations 64x64x5 & 64x64x 24 / 57
25 / 57
PIXEL-BASED NEURAL NETWORK MODELS MLP Images (32x32x5) Softmax Cross Entropy Flatten Vector Fully Connected Layer Class Onehot Vector 26 / 57
PIXEL-BASED NEURAL NETWORK MODELS Depth-Only Orthogonal MVCNN Images (32x32x5) Softmax Cross Entropy Concat Image Separation 3D Conv Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 27 / 57
VOXEL-BASED NEURAL NETWORK MODELS Types of Models Preprocessing: Sample 892 points Same as point based models Voxelization Tested Models: MLP CNN ResNet-like 3D points Voxels Each voxel has intensity. ~. how many points hit same voxel 32x32x32 & 64x64x64 28 / 57
29 / 57
VOXEL-BASED NEURAL NETWORK MODELS MLP Images (32x32x5) Softmax Cross Entropy Flatten Vector Fully Connected Layer Class Onehot Vector 3 / 57
VOXEL-BASED NEURAL NETWORK MODELS CNN Voxels 32x32x32 Softmax Cross Entropy 3D Conv Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 3 / 57
VOXEL-BASED NEURAL NETWORK MODELS ResNet-like Voxels 32x32x32 Softmax Cross Entropy Concat 3D Conv Layer Avg Pooling Layer Resample Layer Max Pooling Layer Flatten Vector Fully Connected Layer Class Onehot Vector 32 / 57
IMPLEMENTATION System Setup System: Ubuntu 6.4, RAM 32 GB & 64 GB, & SSD 52 GB NVIDIA Quadro P6, Quadro M6, & GeForce Titan X GCC 5.2. for C++ x Python 3.5 TensorFlow-GPU v.5. NumPy. 33 / 57
HYPER PARAMETERS Object Perturbation Random Rotations: -25 ~ 25 degree Random Scaling:.7 ~. Learning Rate:. Keep Probability (Dropout layer):.7 Max Epochs: Batch Size: 32 Number of Random Rotations: 2 Voxel Dim: 32x32x32 MVCNN Number of Views: 5 34 / 57
RESULT 35 / 57
MODELNET # OF TEST & TRAIN OBJECTS # of Test Models # of Train Models 9 8 7 6 5 4 3 2 36 / 57 table toilet monitor bathtub sofa chair desk dresser night_stand bed
MODELNET ACCURACY % Iter: Train Accu Test Accu map 9 8 7 6 5 4 3 2 37 / 57 PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet
MODELNET4 # OF TEST & TRAIN OBJECTS # of Test Models # of Train Models 9 8 7 6 5 4 3 2 38 / 57
MODELNET4 ACCURACY % Iter: Train Accu Test Accu map 9 8 7 6 5 4 3 2 39 / 57 PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet
MODELNET4 ACCURACY 7 CATEGORIES (# OF TRAIN OBJECTS > 4) # of Test Models # of Train Models 9 8 7 6 5 4 3 2 4 / 57
MODELNET4, 7 CATEGORIES ACCURACY % Iter: Train Accu Test Accu map 9 8 7 6 5 4 3 2 4 / 57 PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet
MODELNET4 ACCURACY CATEGORIES (# OF TRAIN OBJECTS > 3) # of Test Models # of Train Models 9 8 7 6 5 4 3 2 42 / 57
MODELNET4, CATEGORIES ACCURACY Train Accu Test Accu map 9 8 7 6 5 4 3 2 43 / 57 PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet
MODELNET4 ACCURACY 7 CATEGORIES (# OF TRAIN OBJECTS > 2) # of Test Models # of Train Models 9 8 7 6 5 4 3 2 44 / 57
MODELNET4,7 CATEGORIES ACCURACY % Iter: Train Accu Test Accu map 9 8 7 6 5 4 3 2 45 / 57 PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet
hours 3. MODELNET PERFORMANCE seconds.7 2.5.6 2..5.4.5.3..2.5.. PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet Total Training Time Inference Time Per Batch 46 / 57
hours 8. MODELNET4 PERFORMANCE seconds.7 7..6 6..5 5..4 4..3 3. 2..2... PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet PC MLP PC CNN PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet Total Training Time Inference Time Per Batch 47 / 57
SHAPENET CORE V2 ACCURACY ShapeNet has pretty big dataset % Train Accu Test Accu map Iter: 6 9 GB of dataset 9 Only tested 3 NN models 8 Point MP 7 6 Depth-Only MVCNN 5 4 Voxel CNN 3 2 PC MP PX MVCNN VX CNN 48 / 57
TRAIN ACCURACY GRAPH.8.8.8.8.6.6.6.6.4.4.4.4.2.2.2.2 PC MLP PC MLPs PC CNN PC CNNs 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54.8.8.8.8.6.6.6.6.4.4.4.4.2.2.2.2 PC MP PC ResNet PX MLP PX MVCNN 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54.8.6.4.2.8.6.4.2.8.6.4.2 VX MLP VX CNN VX ResNet 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 49 / 57
TEST ACCURACY GRAPH.8.8.8.8.6.6.6.6.4.4.4.4.2.2.2.2 PC MLP PC MLPs PC CNN PC CNNs 2 3 4 2 3 4 2 3 4 2 3 4.8.8.8.8.6.6.6.6.4.4.4.4.2.2.2.2 PC MP PC ResNet PX MLP PX MVCNN 2 3 4 2 3 4 2 3 4 2 3 4.8.6.4.2.8.6.4.2.8.6.4.2 VX MLP VX CNN VX ResNet 2 3 4 2 3 4 2 3 4 5 / 57
2.5 CROSS ENTROPY CONVERGENCE GRAPH PC MLP 2.5 PC MLPs 2.5 PC CNN 2.5 PC CNNs 2 2 2 2.5.5.5.5.5.5.5.5 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 2.5 PC MP 2.5 PC ResNet 2.5 PX MLP 2.5 PX MVCNN 2 2 2 2.5.5.5.5.5.5.5.5 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 2.5 VX MLP 2.5 VX CNN 2.5 VX ResNet 2 2 2.5.5.5.5.5.5 5 45 785 6 54 5 45 785 6 54 5 45 785 6 54 5 / 57
CONCLUSION 52 / 57
CONCLUSION Models Voxel ResNet and Voxel CNN provide best result on 3D object classification Unordered vs. Ordered: Unordered data (point cloud) could be learned, but lower accuracy and higher computation cost than ordered data Projection vs. Voxelization: Voxelization provides better result with similar computation cost (converge faster, & more precise) Strict comparisons with previous methods are not done yet Sampling and jittering methods are different (cannot directly compare yet) 53 / 57
CONCLUSION Dataset Evaluation ModelNet & ModelNet4 Some categories have too small training and testing objects Lowering classification accuracy ModelNet4, 7 categories (# of training objects > 4) Achieved 98% accuracy Partial dataset from ModelNet4 (Categories that have more than 4 3D objects) # of train 3D objects in a category matters ShapeNetCore.v2 # of 3D objects are big enough, but many object but many duplicated objects 54 / 57
FUTURE WORK Running entire procedures in CUDA Using NVIDIA GVDB Will have advantages of Sparse Voxels Strict comparisons with previous methods PointNet, VRN Ensemble, etc Extending methods to captured dataset (e.g. KITTI) Train set is too small Need to investigate whether Generative Adversarial Networks (GAN) can alleviate this problem or not 55 / 57
REFERENCE [] He et al., 27, Mask R-CNN [2] Zhou and Tuzel, 27, VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection [3] Su et al., 25, Multi-view convolutional neural networks for 3d shape recognition [4] Krizhevsky et al., 22, ImageNet Classification with Deep Convolutional Neural Networks [5] Qi et al., 27, Pointnet: Deep learning on point sets for 3d classification and segmentation [6] Maturana and Scherer, 25, VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition [7] Broke et al., 26, Generative and Discriminative Voxel Modeling with Convolutional Neural Networks [8] He et al., 26, Deep residual learning for image recognition [9] Goodfellow et al., 24, Generative adversarial nets [] Girshick, 25, Fast R-CNN [] Ren et al., 25, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [2] Chang et al., 25, ShapeNet: An Information-Rich 3D Model Repository [3] Wu et al., 25, 3D ShapeNets: A Deep Representation for Volumetric Shapes [4] Wu et al., 24, 3D ShapeNets for 2.5D Object Recognition and Next-Best-View Prediction 56 / 57
57 / 57