Multi-Level 3D Convolutional Neural Network for Object Recognition
Sambit Ghadai, Xian Lee, Aditya Balu, Soumik Sarkar, Adarsh Krishnamurthy


2 Outline
- Object Recognition
- Multi-Level Volumetric Representations for CAD Models
- Object Recognition using Dense Voxels
- Object Recognition using Multi-level Voxels
March 26,


3 Motivation
- Object recognition of 3D models from volumetric data
- Learn volumetric features from CAD models
  - Local features
  - 3D spatial features
- Memory-efficient way to learn from volumetric data

4 Boundary Representation (B-Rep) CAD Models
- De facto representation for CAD models
- Can be easily tessellated into triangles for rendering
- Difficult to interpret volumetric information
  - Size of a feature
  - Internal location of a feature


5 Voxel Representation
- Binary occupancy information
- Augmented with extra geometry information
- Can be used as direct input to a convolutional neural network
- A dense, high-resolution voxel grid has high memory and computation requirements


6 Why do we need Multi-Resolution?
- As the resolution increases, the fraction of occupied voxels decreases
- Empty voxels still need to be stored
- A hierarchical (multi-level) representation is useful to capture key features at a finer resolution
Figure: Level 1 voxels and Level 2 voxels [2]
[2] http://openaccess.
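
The drop in occupancy with resolution can be illustrated with a small experiment. This is only a sketch under stated assumptions: the model is taken to be a thin surface shell (a sphere of radius 0.4, chosen arbitrarily), and the function name `surface_voxel_fraction` is hypothetical. It counts the grid cells crossed by the surface and reports their fraction of the full grid.

```python
import numpy as np

def surface_voxel_fraction(n, radius=0.4):
    """Fraction of cells in an n^3 grid over [-0.5, 0.5]^3 crossed by a sphere surface."""
    edges = np.linspace(-0.5, 0.5, n + 1)
    lo, hi = edges[:-1], edges[1:]
    # Per-axis nearest and farthest distance from the origin to each cell interval.
    near = np.where((lo <= 0) & (hi >= 0), 0.0, np.minimum(np.abs(lo), np.abs(hi)))
    far = np.maximum(np.abs(lo), np.abs(hi))
    near2 = near[:, None, None] ** 2 + near[None, :, None] ** 2 + near[None, None, :] ** 2
    far2 = far[:, None, None] ** 2 + far[None, :, None] ** 2 + far[None, None, :] ** 2
    # A cell is crossed when its nearest point lies inside the sphere
    # and its farthest point lies outside it.
    crossed = (near2 <= radius ** 2) & (far2 >= radius ** 2)
    return float(crossed.mean())

# Assumed resolutions, chosen only for illustration.
for n in (8, 16, 32, 64):
    print(n, round(surface_voxel_fraction(n), 4))
```

The number of crossed cells grows roughly with the square of the resolution while the grid grows with its cube, so the occupied fraction shrinks as the grid is refined.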


7 ModelNet10 Dataset
- 3D CAD models for objects
- 10 categories of objects: Bathtub, Chair, Dresser, Night Stand, Table, Bed, Desk, Monitor, Sofa, Toilet
- Source: Princeton ModelNet
[1] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang and J. Xiao, "3D ShapeNets: A Deep Representation for Volumetric Shapes," Proceedings of the 28th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015).

8 Outline
- Object Recognition
- Multi-Level Volumetric Representations for CAD Models
- Object Recognition using Dense Voxels
- Object Recognition using Multi-level Voxels


9 Volumetric Voxelization of ModelNet10
- Overlay a regular voxel grid on the object
- Test point membership of the voxel bounding-box center points; classify each as in or out
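
The center-point test above can be sketched as follows. The slides do not say how membership is evaluated against the CAD model, so this example assumes a caller-supplied predicate `inside(points)`; the names `voxelize` and `inside_box` are hypothetical, and an analytic box stands in for a real model.

```python
import numpy as np

def voxelize(inside, bbox_min, bbox_max, n):
    """Binary occupancy grid: a voxel is 'in' when its center passes the inside test."""
    bbox_min = np.asarray(bbox_min, dtype=float)
    bbox_max = np.asarray(bbox_max, dtype=float)
    cell = (bbox_max - bbox_min) / n
    # Integer (i, j, k) index of every voxel, one row per voxel.
    idx = np.indices((n, n, n)).reshape(3, -1).T
    centers = bbox_min + (idx + 0.5) * cell
    return inside(centers).reshape(n, n, n)

def inside_box(points):
    """Stand-in membership test: an axis-aligned box with half-width 0.25."""
    return np.all(np.abs(points) <= 0.25, axis=1)

grid = voxelize(inside_box, [-0.5] * 3, [0.5] * 3, n=8)
print(grid.shape, int(grid.sum()))
```

For a triangulated model, `inside` would have to be replaced by a point-in-solid query such as ray casting, which is outside the scope of this sketch.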

10 Identifying Boundary Voxels
- Boundary voxels need to be identified in order to generate the fine-level voxel grid
- Identify the voxels that contain vertices
- Use the separating-axis test for all other voxels within the bound
Figure panels: Classify Vertices; Triangle-Box Intersection
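
A triangle-box separating-axis test can be sketched as below. The slides name the test but not a specific formulation, so this follows the standard 13-axis version (3 box normals, 1 triangle normal, 9 edge cross products); the function name `triangle_box_overlap` and its argument layout are assumptions.

```python
import numpy as np

def triangle_box_overlap(tri, center, half):
    """Return True if a triangle intersects an axis-aligned box (separating-axis test)."""
    # Work in a frame where the box is centered at the origin.
    v = np.asarray(tri, dtype=float) - np.asarray(center, dtype=float)
    half = np.asarray(half, dtype=float)
    edges = np.array([v[1] - v[0], v[2] - v[1], v[0] - v[2]])
    box_axes = np.eye(3)

    axes = list(box_axes)                      # 3 box face normals
    axes.append(np.cross(edges[0], edges[1]))  # triangle normal
    axes += [np.cross(e, a) for e in edges for a in box_axes]  # 9 edge cross products

    for axis in axes:
        if not np.any(axis):
            continue  # degenerate axis (edge parallel to a box axis)
        p = v @ axis                  # projections of the triangle vertices
        r = half @ np.abs(axis)       # projected half-extent of the box
        if p.min() > r or p.max() < -r:
            return False              # found a separating axis
    return True

box_center, box_half = [0.0, 0.0, 0.0], [0.5, 0.5, 0.5]
print(triangle_box_overlap([(-1, 0, 0), (1, 0, 0), (0, 1, 0)], box_center, box_half))
print(triangle_box_overlap([(2, 0, 0), (0, 2, 0), (0, 0, 2)], box_center, box_half))
```

The second triangle has a bounding box that overlaps the voxel, yet its plane misses the voxel; only the triangle-normal axis separates the two, which is why a bounding-box check alone is not enough.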

11 Fine Level Voxelization (Level 2)
- Same method as coarse level
- Clip the model using the AABB of the boundary voxels
- Perform a similar triangle-box intersection test to identify Level 2 boundary voxels
- All the information is stored in a flat data structure
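
One possible flat layout for the two levels is sketched below. The slides do not describe the actual data structure, so everything here is an assumption: the field names, the rule that a coarse voxel counts as occupied when any of its fine voxels is filled, and the shortcut of deriving both levels from a dense grid instead of clipping the model.

```python
import numpy as np

def build_two_level(dense, n_fine):
    """Split a dense occupancy grid into a flat coarse array plus fine blocks for boundary voxels."""
    n = dense.shape[0] // n_fine
    # Group the dense grid into n^3 blocks of n_fine^3 voxels each, one row per block.
    blocks = dense.reshape(n, n_fine, n, n_fine, n, n_fine)
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5).reshape(n ** 3, n_fine ** 3)
    filled = blocks.sum(axis=1)
    coarse = filled > 0                                   # flat Level 1 occupancy
    # Boundary voxels are the partially filled blocks; only these keep fine data.
    boundary_index = np.flatnonzero((filled > 0) & (filled < n_fine ** 3))
    return {"coarse": coarse, "boundary_index": boundary_index, "fine": blocks[boundary_index]}

dense = np.zeros((32, 32, 32), dtype=bool)
dense[4:22, 4:22, 4:22] = True          # a solid cube as a stand-in model
levels = build_two_level(dense, n_fine=4)
stored = levels["coarse"].size + levels["boundary_index"].size + levels["fine"].size
print(stored, dense.size)
```

Fine-level data is kept only for partially filled coarse voxels, so the stored element count stays well below the dense grid size for this example.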

12 Outline
- Object Recognition
- Multi-Level Volumetric Representations for CAD Models
- Object Recognition using Dense Voxels
- Object Recognition using Multi-level Voxels

13 3D CNN on Dense Voxel Grid
- Dense voxel grid as input model
- 3D-CNN with two convolutional layers and a max-pooling layer for feature extraction
- Fully connected dense layers map the flattened features to a 10-class classification
Pipeline: Dense Voxel Grid -> Convolution Layer 1 -> Convolution Layer 2 -> Pooling Layer -> Dense Layer 1 -> Dense Layer 2 -> 10 Classes
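
To get a feel for the size of such a network, the layer shapes and parameter counts can be tallied by hand. The filter counts (64, 128, 256) and the 32 x 32 x 32 input are taken from the results slides; the 3 x 3 x 3 kernels, stride 1, unpadded convolutions, and 2 x 2 x 2 pooling are assumptions, since the slides do not state them. The helper name `cnn_summary` is hypothetical.

```python
def cnn_summary(input_size=32, conv_filters=(64, 128), kernel=3, pool=2, dense_units=(256, 10)):
    """List (name, output shape, parameter count) for a conv-conv-pool-dense-dense 3D CNN."""
    layers = []
    size, channels = input_size, 1
    for i, filters in enumerate(conv_filters, start=1):
        size = size - kernel + 1                       # unpadded convolution, stride 1 (assumed)
        params = kernel ** 3 * channels * filters + filters   # weights plus biases
        layers.append((f"conv{i}", (size, size, size, filters), params))
        channels = filters
    size //= pool                                      # non-overlapping max pooling (assumed)
    layers.append(("maxpool", (size, size, size, channels), 0))
    features = size ** 3 * channels                    # flattened feature vector length
    for i, units in enumerate(dense_units, start=1):
        layers.append((f"dense{i}", (units,), features * units + units))
        features = units
    return layers

for name, shape, params in cnn_summary():
    print(f"{name:8s} {str(shape):22s} {params:>12,d}")
```

Under these assumptions the first dense layer dominates the parameter count, because it is fed by the full flattened feature volume.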

14 Data Augmentation
- ModelNet10: 3991 training and 908 testing 3D models
- Dataset size is insufficient to train the parameters of the 3D-CNN
- 6 rigid body transformations on the voxel grid for data augmentation
  - Rotation (x, y, z axis)
  - Mirroring (x, y, z axis)
- 7x original data size used for training
Figure: original model and its 90° rotation about the z-axis (Rot-z)
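
The six transformations listed above can be sketched with NumPy array operations. The 90° rotation angle matches the Rot-z figure; applying the same angle about x and y is an assumption, as is the function name `augment`.

```python
import numpy as np

def augment(grid):
    """Return the original grid plus 3 rotated and 3 mirrored copies (7 grids in total)."""
    out = [grid]
    # A rotation about x, y, or z turns the plane spanned by the other two axes.
    for plane in [(1, 2), (0, 2), (0, 1)]:
        out.append(np.rot90(grid, k=1, axes=plane))
    # Mirroring reverses the grid along one axis.
    for axis in range(3):
        out.append(np.flip(grid, axis=axis))
    return out

grid = np.arange(4 ** 3).reshape(4, 4, 4)   # distinct values make every copy distinguishable
copies = augment(grid)
print(len(copies))
```

Each copy keeps the grid shape and the set of voxel values, so labels carry over unchanged and the training set grows to seven times its original size.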

15 Outline
- Object Recognition
- Multi-Level Volumetric Representations for CAD Models
- Object Recognition using Dense Voxels
- Object Recognition using Multi-level Voxels

16 Need to learn from Multi-Resolution data
- Learn efficiently from complex and intricate features of a CAD model
- Improve performance with fewer computations
- Amenable to model interpretability by learning finer features at specific spatial locations
- Low memory usage


17 Data Augmentation
- Similar to data augmentation at coarse level voxels
- Rigid body transformation first applied on coarse voxels
- Transformation then applied on finer voxels inside each coarse voxel
Figure: 90° Rot-z applied at Level 1, then at Level 2
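
The two-step transformation can be checked against a plain rotation of the equivalent dense grid. This sketch assumes, for simplicity, that every coarse voxel carries a fine block (in the multi-level representation only boundary voxels do); the array layout `(n1, n1, n1, n2, n2, n2)` and the function names are hypothetical.

```python
import numpy as np

def rotate_z_two_level(blocks):
    """Rotate 90 degrees about z: first move the coarse voxels, then rotate each fine block."""
    moved = np.rot90(blocks, k=1, axes=(0, 1))     # Level 1: rearrange coarse voxels
    return np.rot90(moved, k=1, axes=(3, 4))       # Level 2: rotate contents of each voxel

def to_dense(blocks):
    """Assemble blocks[i, j, k, a, b, c] into a dense grid of side n1 * n2."""
    n1, n2 = blocks.shape[0], blocks.shape[3]
    return blocks.transpose(0, 3, 1, 4, 2, 5).reshape(n1 * n2, n1 * n2, n1 * n2)

n1, n2 = 2, 3
blocks = np.arange((n1 * n2) ** 3).reshape(n1, n1, n1, n2, n2, n2)
two_level = to_dense(rotate_z_two_level(blocks))
dense = np.rot90(to_dense(blocks), k=1, axes=(0, 1))
print(np.array_equal(two_level, dense))
```

Applying only the coarse-level step would move each block to the right place but leave its contents unrotated, which is why the second step is needed.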


18 Multi-Level 3D CNN
Forward pass:
- Level-2 forward: boundary (fine) voxels, 4 x 4 x 4 voxel grid -> convolution layers -> pooling -> dense -> sigmoid output
- Linking Level-2 with Level-1: coarse level fusion
- Level-1 forward: 8 x 8 x 8 voxel grid -> Convolution Layer 1 -> Convolution Layer 2 -> Pooling Layer -> Dense Layer 1 -> Dense Layer 2 -> classification (10 classes)
Backward pass:
- Compute loss
- Compute Level-1 gradients
- Extract voxel gradients based on forward pass
- Compute Level-2 gradients
- Update weights
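
The linking step can be sketched in a few lines. The slides do not specify how the Level-2 sigmoid outputs are fused into the coarse grid, so this example assumes each boundary voxel's coarse value is simply replaced by its Level-2 output. A single linear layer stands in for the Level-2 network, and all names (`fuse_levels`, `weights`, `bias`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_levels(coarse, boundary_index, fine, weights, bias):
    """Write one Level-2 score per boundary voxel into a copy of the coarse grid."""
    # Stand-in for the Level-2 network: linear layer followed by a sigmoid.
    scores = sigmoid(fine @ weights + bias)
    fused = coarse.astype(float).ravel().copy()
    fused[boundary_index] = scores          # assumed fusion rule: overwrite boundary voxels
    return fused.reshape(coarse.shape)

coarse = np.ones((8, 8, 8))
boundary_index = np.array([1, 5])            # flat indices of two boundary voxels
fine = np.ones((2, 4 ** 3))                  # one flattened 4 x 4 x 4 block per boundary voxel
fused = fuse_levels(coarse, boundary_index, fine, weights=np.zeros(4 ** 3), bias=0.0)
print(fused.ravel()[:6])
```

The fused grid would then serve as the Level-1 input. Because the same flat indices identify where each score was written, they can also be used to pick out the matching entries of the Level-1 input gradient during the backward pass.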

19 Results
Multi-level training parameters:
- Batch size: 64
- 3D models of size 8x8x8 coarse & 4x4x4 fine voxels
- Optimizer: SGD with learning rate of
- Loss function: Softmax cross-entropy
Network (Level-1):
- Convolution: 64 filters
- Convolution: 128 filters
- Max Pooling
- Dense Layer: 256 units
Network (Level-2):
- Convolution: 8 filters
- Convolution: 16 filters
- Max Pooling
- Dense Layer: 32 units

20 Results (Contd.)
Dense level training parameters:
- Batch size: 64
- 3D models of size 32 x 32 x 32 voxels
- Optimizer: SGD with learning rate of
- Loss function: Softmax cross-entropy
Network A:
- Convolution: 64 filters
- Max Pooling
- Convolution: 128 filters
- Max Pooling
- Dense Layer: 256 units
Network B:
- Convolution: 64 filters
- Convolution: 128 filters
- Max Pooling
- Dense Layer: 256 units
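
Both training setups use a softmax cross-entropy loss. A minimal NumPy version is sketched below for reference; it is the textbook formula, not code from the presentation, and the function name is an assumption.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    # Subtract the row maximum for numerical stability; the softmax is unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

logits = np.zeros((2, 10))          # an untrained network: all 10 classes equally likely
labels = np.array([0, 3])
print(softmax_cross_entropy(logits, labels))
```

With uniform predictions over the 10 ModelNet10 classes the loss equals ln(10), a useful sanity value for the first training step.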

21 Results (Contd.)
Figure: accuracy for three configurations: (1) Coarse, 8x8x8; (2) Multi-Level, 8x8x8 and 4x4x4; (3) Dense, 32x32x32

22 Results (Contd.)

23 Results (Contd.)
- Memory usage in GPU of multi-resolution voxel training & equivalent single resolution training
Figure: memory usage in GPU (MB) for Multi-Level, Dense with MaxPool, and Dense without MaxPool

24 Conclusions
- We have developed methods to represent CAD models using a multi-resolution voxel grid
- Developed a multi-level 3D-CNN for object recognition using the multi-resolution voxel grid
- Memory usage of the multi-level 3D-CNN is much lower than that of the dense voxel 3D-CNN, without compromising accuracy

25 Future work
- Efficient training algorithms for Level-2 3D-CNN
- Explore the effect of different resolutions on training the 3D-CNN
- Build model interpretability for hierarchical learning
- Experiment with the algorithm on different datasets

26 Acknowledgements
AI-based Design and Manufacturability Lab (ADAM Lab)
- Xian Lee
- Aditya Balu
- Gavin Young
Funding Sources
- National Science Foundation CMMI: CM: Machine-Learning Driven Decision Support in Design for Manufacturability
- NVIDIA Titan Xp GPU for Academic Research

27 Thank You! Questions?
