Multi-Level 3D Convolutional Neural Network for Object Recognition
Sambit Ghadai, Xian Lee, Aditya Balu, Soumik Sarkar, Adarsh Krishnamurthy

Outline
- Object Recognition
- Multi-Level Volumetric Representations for CAD Models
- Object Recognition using Dense Voxels
- Object Recognition using Multi-level Voxels

March 26, 2018

Motivation
- Object recognition of 3D models from volumetric data
- Learn volumetric features from CAD models
  - Local features
  - 3D spatial features
- Memory-efficient way to learn from volumetric data

Boundary Representation (B-Rep) CAD Models
- De facto representation for CAD models
- Can be easily tessellated into triangles for rendering
- Difficult to interpret volumetric information
  - Size of a feature
  - Internal location of a feature

Voxel Representation
- Binary occupancy information
- Augmented with extra geometry information
- Can be used as direct input to a convolutional neural network
- A dense, high-resolution voxel grid has high memory and computation requirements

Why Do We Need Multi-Resolution?
- As the resolution increases, the fraction of occupied voxels decreases
- Empty voxels still need to be stored
- A hierarchical (multi-level) representation is useful to capture key features at a finer resolution
[Figure: Level 1 voxels and Level 2 voxels]
[2] http://openaccess.thecvf.com/content_cvpr_2017/poster/1319_poster.pdf
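
The occupancy trend can be illustrated with a small sketch. It assumes a sphere surface as a stand-in for a CAD boundary (the slides use ModelNet10 meshes), and the function name `surface_voxel_fraction` is hypothetical. It counts the voxels that the surface passes through at several grid resolutions.

```python
def surface_voxel_fraction(n, radius=0.4):
    """Fraction of voxels in an n x n x n grid over the unit cube that the
    surface of a sphere centered at (0.5, 0.5, 0.5) passes through.
    The sphere is an assumed stand-in for a CAD model boundary."""
    h = 1.0 / n          # voxel edge length
    c = 0.5              # sphere center coordinate on every axis
    r2 = radius * radius
    count = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                near2 = 0.0  # squared distance from center to nearest point of the voxel
                far2 = 0.0   # squared distance from center to farthest corner of the voxel
                for idx in (i, j, k):
                    lo, hi = idx * h, (idx + 1) * h
                    near = max(lo - c, 0.0, c - hi)
                    far = max(abs(lo - c), abs(hi - c))
                    near2 += near * near
                    far2 += far * far
                # The surface crosses the voxel when the radius lies between
                # the nearest and farthest distances.
                if near2 <= r2 <= far2:
                    count += 1
    return count / n ** 3


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, round(surface_voxel_fraction(n), 3))
```

Because the number of surface voxels grows roughly with the square of the resolution while the grid grows with its cube, the printed fraction shrinks as the grid is refined.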

ModelNet10 Dataset
- 3D CAD models for objects
- 10 categories of objects: Bathtub, Chair, Dresser, Night Stand, Table, Bed, Desk, Monitor, Sofa, Toilet
Source: Princeton ModelNet
[1] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang and J. Xiao, 3D ShapeNets: A Deep Representation for Volumetric Shapes, Proceedings of the 28th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015)

Volumetric Voxelization of ModelNet10
- Overlay a regular voxel grid on the object
- Test point membership of the voxel bounding-box center points and classify each as in or out
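
The center-point test can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `inside` callback is a hypothetical stand-in for a point-membership query against a mesh, and an analytic sphere is used here so the example is self-contained.

```python
def voxelize(inside, n, lo=0.0, hi=1.0):
    """Return the set of (i, j, k) voxels whose center point is inside the object.

    `inside` is an assumed point-membership predicate taking an (x, y, z) tuple;
    for a real CAD mesh it would be replaced by an inside/outside query.
    """
    h = (hi - lo) / n  # voxel edge length of the regular grid
    occupied = set()
    for i in range(n):
        for j in range(n):
            for k in range(n):
                center = (lo + (i + 0.5) * h,
                          lo + (j + 0.5) * h,
                          lo + (k + 0.5) * h)
                if inside(center):
                    occupied.add((i, j, k))
    return occupied


def in_sphere(p, center=(0.5, 0.5, 0.5), radius=0.4):
    """Example predicate: is point p inside a sphere?"""
    return sum((a - b) ** 2 for a, b in zip(p, center)) <= radius * radius


if __name__ == "__main__":
    grid = voxelize(in_sphere, 8)
    print(len(grid), "of", 8 ** 3, "voxels are occupied")
```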

Identifying Boundary Voxels
- Boundary voxels need to be identified in order to generate the fine-level voxel grid
- Identify the voxels that contain vertices
- Use the separating-axis test for all other voxels within the bound
[Figure: Classify Vertices; Triangle-Box Intersection]
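
A triangle-box separating-axis test can be sketched as below. This follows the commonly used 13-axis formulation (three box axes, the triangle normal, and nine edge cross products); whether the authors used exactly this variant is an assumption, and the function name `tri_box_overlap` is hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def tri_box_overlap(center, half, tri):
    """Separating-axis test between an axis-aligned box and a triangle.

    center: box center, half: box half-extents, tri: three vertices.
    Returns True when no separating axis exists, i.e. the two overlap.
    """
    # Work in a frame where the box is centered at the origin.
    v = [sub(p, center) for p in tri]
    edges = [sub(v[1], v[0]), sub(v[2], v[1]), sub(v[0], v[2])]
    box_axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

    axes = list(box_axes)                                   # 3 box face normals
    axes.append(cross(edges[0], edges[1]))                  # triangle normal
    axes += [cross(e, b) for e in edges for b in box_axes]  # 9 edge cross products

    for a in axes:
        proj = [dot(p, a) for p in v]
        # Radius of the box projected onto axis a.
        r = sum(half[i] * abs(a[i]) for i in range(3))
        if min(proj) > r or max(proj) < -r:
            return False  # found a separating axis
    return True


if __name__ == "__main__":
    box_center, box_half = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)
    tri = ((0.2, 0.2, 0.5), (0.8, 0.2, 0.5), (0.5, 0.8, 0.5))
    print(tri_box_overlap(box_center, box_half, tri))
```

A degenerate cross product (a zero vector) projects everything to zero and therefore never reports a separation, so no special case is needed for edges parallel to a box axis.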

Fine Level Voxelization (Level 2)
- Same method as coarse level
- Clip the model using the AABB of the boundary voxels
- Perform a similar triangle-box intersection to identify Level 2 boundary voxels
- All the information is stored in a flat data structure
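
The slides do not describe the layout of the flat data structure, so the following is only one possible sketch. It assumes a flat coarse array, a per-voxel slot index (-1 for voxels without a fine grid), and a single flat array holding one fine block per boundary voxel. The class name `TwoLevelGrid` and its methods are hypothetical.

```python
class TwoLevelGrid:
    """Two-level voxel grid stored in flat lists (an assumed layout)."""

    def __init__(self, n1, n2):
        self.n1, self.n2 = n1, n2
        self.coarse = [0] * (n1 ** 3)   # Level 1 occupancy, flattened
        self.slot = [-1] * (n1 ** 3)    # index of the fine block, -1 if none
        self.fine = []                  # all Level 2 blocks, n2^3 entries each

    def _coarse_index(self, i, j, k):
        return (i * self.n1 + j) * self.n1 + k

    def _fine_index(self, slot, a, b, c):
        return slot * self.n2 ** 3 + (a * self.n2 + b) * self.n2 + c

    def set_coarse(self, i, j, k, value=1):
        self.coarse[self._coarse_index(i, j, k)] = value

    def refine(self, i, j, k):
        """Allocate a fine block for a boundary voxel (if not yet allocated)."""
        ci = self._coarse_index(i, j, k)
        if self.slot[ci] < 0:
            self.slot[ci] = len(self.fine) // (self.n2 ** 3)
            self.fine.extend([0] * (self.n2 ** 3))
        return self.slot[ci]

    def set_fine(self, i, j, k, a, b, c, value=1):
        slot = self.refine(i, j, k)
        self.fine[self._fine_index(slot, a, b, c)] = value

    def get(self, i, j, k, a, b, c):
        """Fine occupancy if the voxel is refined, else the coarse occupancy."""
        ci = self._coarse_index(i, j, k)
        slot = self.slot[ci]
        if slot < 0:
            return self.coarse[ci]
        return self.fine[self._fine_index(slot, a, b, c)]


if __name__ == "__main__":
    g = TwoLevelGrid(8, 4)
    g.set_coarse(1, 2, 3)
    g.set_fine(1, 2, 3, 0, 1, 2)
    print(len(g.coarse) + len(g.fine), "stored values vs", 32 ** 3, "for a dense grid")
```

With this layout, storage grows with the number of boundary voxels rather than with the cube of the effective resolution.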

3D CNN on Dense Voxel Grid
- Dense voxel grid as input model
- 3D-CNN with two convolutional layers and a max-pooling layer for feature extraction
- Fully connected dense layers on the flattened features for 10-class classification
[Figure: Dense Voxel Grid -> Convolution Layer 1 -> Convolution Layer 2 -> Pooling Layer -> Dense Layer 1 -> Dense Layer 2 -> 10 Classes]

Data Augmentation
- ModelNet10: 3991 training and 908 testing 3D models
- Dataset size is insufficient to train the parameters of the 3D-CNN
- 6 rigid body transformations on the voxel grid for data augmentation
  - Rotation (x, y, z axis)
  - Mirroring (x, y, z axis)
- 7x the original data size used for training
[Figure: Original model and its 90° rotation about the z axis (Rot-z)]
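
The six transformations can be sketched on a sparse set of occupied voxel indices. This is a minimal illustration assuming 90° rotations (as in the Rot-z figure) and a cubic n x n x n grid; the helper names `rot90`, `mirror`, and `augment` are hypothetical.

```python
def rot90(occ, n, axis):
    """Rotate occupied (i, j, k) indices by 90 degrees about an axis (0=x, 1=y, 2=z)."""
    a, b = [(1, 2), (2, 0), (0, 1)][axis]  # the two axes spanning the rotation plane
    out = set()
    for v in occ:
        w = list(v)
        w[a] = n - 1 - v[b]
        w[b] = v[a]
        out.add(tuple(w))
    return out


def mirror(occ, n, axis):
    """Mirror occupied indices along an axis (0=x, 1=y, 2=z)."""
    out = set()
    for v in occ:
        w = list(v)
        w[axis] = n - 1 - v[axis]
        out.add(tuple(w))
    return out


def augment(occ, n):
    """Original grid plus 3 rotations and 3 mirrorings: 7 grids in total."""
    grids = [set(occ)]
    grids += [rot90(occ, n, axis) for axis in range(3)]
    grids += [mirror(occ, n, axis) for axis in range(3)]
    return grids


if __name__ == "__main__":
    model = {(0, 0, 0), (1, 0, 0), (1, 2, 3)}
    print(len(augment(model, 4)), "grids from one model")
```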

Need to Learn from Multi-Resolution Data
- Learn efficiently from complex and intricate features of a CAD model
- Improve performance with fewer computations
- Amenable to model interpretability by learning finer features at specific spatial locations
- Low memory usage

Data Augmentation
- Similar to data augmentation at coarse level voxels
- Rigid body transformation first applied on coarse voxels
- Transformation then applied on finer voxels inside each coarse voxel
[Figure: 90° Rot-z applied at Level 1 and at Level 2]
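
The two-step transformation can be sketched as follows, assuming the multi-level grid is held as a mapping from each coarse index to the set of occupied fine indices inside it (a hypothetical layout chosen for brevity). The same 90° rotation about z is applied to the coarse index and then to the fine indices, and the result matches rotating the equivalent full-resolution grid.

```python
def rot90_z_index(idx, n):
    """Rotate one (i, j, k) index by 90 degrees about z in an n x n x n grid."""
    i, j, k = idx
    return (n - 1 - j, i, k)


def rot90_z_multilevel(grid, n1, n2):
    """Rotate a two-level grid: coarse indices first, then the fine indices
    inside each coarse voxel.

    grid: dict mapping coarse (i, j, k) -> set of fine (a, b, c) indices.
    """
    return {
        rot90_z_index(coarse, n1): {rot90_z_index(f, n2) for f in fines}
        for coarse, fines in grid.items()
    }


def flatten(grid, n2):
    """Expand a two-level grid to full-resolution voxel indices."""
    return {
        (c[0] * n2 + f[0], c[1] * n2 + f[1], c[2] * n2 + f[2])
        for c, fines in grid.items()
        for f in fines
    }


if __name__ == "__main__":
    n1, n2 = 8, 4
    grid = {(1, 2, 3): {(0, 1, 2), (3, 3, 0)}, (7, 0, 0): {(2, 2, 2)}}
    rotated = rot90_z_multilevel(grid, n1, n2)
    dense_rotated = {rot90_z_index(g, n1 * n2) for g in flatten(grid, n2)}
    print(flatten(rotated, n2) == dense_rotated)
```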

Multi-Level 3D CNN
[Figure: Multi-level 3D-CNN architecture and training loop]
- Inputs: 8 x 8 x 8 coarse (Level-1) voxel grid; 4 x 4 x 4 fine (Level-2) voxel grid for the boundary voxels
- Level-2 network (fine voxels): Convolution layers, Pooling, Dense, Sigmoid Output
- Level-1 network (after coarse level fusion): Convolution Layer 1, Convolution Layer 2, Pooling Layer, Dense Layer 1, Dense Layer 2, 10 Classes
- Forward pass: Level-2 Forward; Linking Level-2 with Level-1; Level-1 Forward; Classification
- Backward pass: Compute Loss; Compute Level-1 Gradients; Extract voxel gradients based on forward pass; Compute Level-2 Gradients; Update Weights

Results
Multi-level training parameters:
- Batch size: 64
- 3D models of size 8x8x8 coarse and 4x4x4 fine voxels
- Optimizer: SGD with learning rate of 0.001
- Loss function: Softmax cross-entropy
- Network (Level-1):
  - Convolution: 64 filters
  - Convolution: 128 filters
  - Max Pooling
  - Dense Layer: 256 units
- Network (Level-2):
  - Convolution: 8 filters
  - Convolution: 16 filters
  - Max Pooling
  - Dense Layer: 32 units
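
For reference, the loss and optimizer named above can be written out in a few lines. This is a generic sketch of softmax cross-entropy and a plain SGD update with the stated learning rate of 0.001, not the authors' training code; the function names are hypothetical, and the update is applied directly to the logits only to keep the example small.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Softmax cross-entropy loss for one sample with an integer class label."""
    return -math.log(softmax(logits)[label])


def grad_logits(logits, label):
    """Gradient of the loss with respect to the logits: softmax(z) - one_hot(label)."""
    grad = softmax(logits)
    grad[label] -= 1.0
    return grad


def sgd_step(params, grads, lr=0.001):
    """One plain SGD update: w <- w - lr * g."""
    return [w - lr * g for w, g in zip(params, grads)]


if __name__ == "__main__":
    logits = [0.0] * 10  # 10 classes, as in ModelNet10
    label = 3
    updated = sgd_step(logits, grad_logits(logits, label))
    print(cross_entropy(logits, label), cross_entropy(updated, label))
```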

Results (Contd.)
Dense level training parameters:
- Batch size: 64
- 3D models of size 32 x 32 x 32 voxels
- Optimizer: SGD with learning rate of 0.001
- Loss function: Softmax cross-entropy
- Network A:
  - Convolution: 64 filters
  - Max Pooling
  - Convolution: 128 filters
  - Max Pooling
  - Dense Layer: 256 units
- Network B:
  - Convolution: 64 filters
  - Convolution: 128 filters
  - Max Pooling
  - Dense Layer: 256 units

Results (Contd.)
[Figure: Accuracy of three models: (1) Coarse, 8x8x8; (2) Multi-Level, 8x8x8 and 4x4x4; (3) Dense, 32x32x32]

Results (Contd.)
[Figure: GPU memory usage (MB) of multi-resolution voxel training and equivalent single-resolution training; bars: Multi-Level, Dense with MaxPool, Dense without MaxPool]

Conclusions
- Developed methods to represent CAD models using a multi-resolution voxel grid
- Developed a multi-level 3D-CNN for object recognition using the multi-resolution voxel grid
- Memory usage of the multi-level 3D-CNN is much lower than that of the dense voxel 3D-CNN, without compromising accuracy

Future Work
- Efficient training algorithms for the Level-2 3D-CNN
- Explore the effect of different resolutions on training the 3D-CNN
- Build model interpretability for hierarchical learning
- Test the algorithm on different datasets

Acknowledgements
- AI-based Design and Manufacturability Lab (ADAM Lab): Xian Lee, Aditya Balu, Gavin Young
- Funding sources:
  - National Science Foundation CMMI:1644441, CM: Machine-Learning Driven Decision Support in Design for Manufacturability
  - NVIDIA Titan Xp GPU for Academic Research

Thank You!
Questions?