Supplementary A. Overview

This supplementary document provides more technical details and experimental results for the main paper. Shape retrieval experiments on the ShapeNet Core55 dataset are presented in Sec. B. The time and space complexity is analyzed in Sec. C, followed by a detailed illustration of our permutation invariant SOM training algorithms in Sec. D. More experiments and results are presented in Sec. E.

B. Shape Retrieval

Our object classification network can be easily extended to the task of 3D shape retrieval by regarding the classification score vector as the feature vector. Given a query shape and a shape library, the similarity between the query and each candidate can be computed as the distance between their feature vectors.

B.1. Dataset

We perform the 3D shape retrieval task on the ShapeNet Core55 dataset, which contains 51,190 shapes from 55 categories and 204 subcategories. Specifically, we adopt the dataset split provided by the 3D Shape Retrieval Contest 2016 (SHREC16), where 70% of the models are used for training, 10% for validation and 20% for testing. Since the 3D shapes are represented as CAD models, i.e., vertices and faces, we sample 5,000 points and the corresponding surface normal vectors from each CAD model. Data augmentation is identical to the previous classification and segmentation experiments: random jitter and scale.

B.2. Procedures

We train a classification network on the ShapeNet Core55 dataset using configurations identical to our ModelNet40 classification experiment, i.e., a SOM of size 8 × 8 and k = 3. For simplicity, the softmax loss is minimized with only the category labels, without any subcategory information. The classification score vector of length 55 is used as the feature vector. We calculate the L2 feature distance between each shape in the test set and all shapes of the same predicted category in the test set (including itself). The corresponding retrieval list is constructed by sorting these shapes according to their feature distances.

B.3. Performance

SHREC16 provides several evaluation metrics, including the precision-recall curve, F-score, mean average precision (mAP) and normalized discounted cumulative gain (NDCG). These metrics are computed under two contexts: the macro metric is a simple average across all categories, while the micro metric is a weighted average according to the number of shapes in each category. As shown in Table 3, our SO-Net outperforms state-of-the-art approaches on most metrics. The precision-recall curves are illustrated in Fig. 9, where SO-Net demonstrates the largest area under curve (AUC). Some shape retrieval results are visualized in Fig. 11.

| Method | P@N (micro) | R@N (micro) | F1@N (micro) | mAP (micro) | NDCG@N (micro) | P@N (macro) | R@N (macro) | F1@N (macro) | mAP (macro) | NDCG@N (macro) |
|---|---|---|---|---|---|---|---|---|---|---|
| Tatsuma_DB-FMCD-FUL-LCDR | 0.427 | 0.689 | 0.472 | 0.728 | 0.875 | 0.154 | 0.730 | 0.203 | 0.596 | 0.806 |
| Wang_CCMLT | 0.718 | 0.350 | 0.391 | 0.823 | 0.886 | 0.313 | 0.536 | 0.286 | 0.661 | 0.820 |
| Li_ViewAggregation | 0.508 | 0.868 | 0.582 | 0.829 | 0.904 | 0.147 | 0.813 | 0.201 | 0.711 | 0.846 |
| Bai_GIFT [3] | 0.706 | 0.695 | 0.689 | 0.825 | 0.896 | 0.444 | 0.531 | 0.454 | 0.740 | 0.850 |
| Su_MVCNN [33] | 0.770 | 0.770 | 0.764 | 0.873 | 0.899 | 0.571 | 0.625 | 0.575 | 0.817 | 0.880 |
| Kd-Net [18] | 0.760 | 0.768 | 0.743 | 0.850 | 0.905 | – | – | – | – | – |
| O-CNN [36] | 0.778 | 0.782 | 0.776 | 0.875 | 0.905 | 0.492 | 0.676 | 0.519 | 0.746 | 0.864 |
| Ours | 0.799 | 0.800 | 0.795 | 0.869 | 0.907 | 0.615 | 0.673 | 0.622 | 0.805 | 0.888 |

Table 3. 3D shape retrieval results on SHREC16. Our SO-Net outperforms state-of-the-art deep networks on most metrics. (Macro metrics for one entry were not reported in the flattened source and are left as dashes.)

Figure 9. Precision-recall curves for the micro (a) and macro (b) metrics in the 3D shape retrieval task, comparing Bai_GIFT, Li_ViewAggregation, Su_MVCNN, Tatsuma_DB-FMCD-FUL-LCDR, Wang_CCMLT and SO-Net. In both plots, SO-Net demonstrates the largest AUC.
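As a concrete companion to the protocol in Sec. B.2, here is a minimal NumPy sketch of how the retrieval lists might be built from the classification outputs. The function and array names are illustrative assumptions, not our released code.

```python
import numpy as np

def build_retrieval_lists(scores, pred_labels):
    """Rank test shapes by L2 distance between classification score vectors.

    scores: (N, 55) classification score vectors of the test set.
    pred_labels: (N,) predicted category of each shape, e.g. scores.argmax(1).
    Returns one ranked retrieval list (array of test-set indices) per shape.
    """
    lists = []
    for i in range(len(scores)):
        # Candidates: all test shapes with the same predicted category,
        # including the query itself (Sec. B.2).
        candidates = np.flatnonzero(pred_labels == pred_labels[i])
        # L2 feature distance to the query, sorted in ascending order.
        dist = np.linalg.norm(scores[candidates] - scores[i], axis=1)
        lists.append(candidates[np.argsort(dist)])
    return lists
```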
C. Time and Space Complexity

We evaluate the model size, forward (inference) time and training time of several point cloud based networks on the ModelNet40 classification task, as shown in Table 4. The forward timings are acquired with a batch size of 8 and an input point cloud size of 1024. In the comparison, we choose the networks with the best classification accuracy among the various configurations of PointNet and PointNet++, i.e., PointNet with transformations and PointNet++ with multi-scale grouping (MSG). Because of the parallelizability and simplicity of our network design, our model size is smaller and the training speed is significantly faster compared to PointNet and its successor PointNet++. Meanwhile, our inference time is around 1/3 of that of PointNet++.

| Method | Size / MB | Forward / ms | Train / h |
|---|---|---|---|
| PointNet [26] | 40 | 25.3 | 3–6 |
| PointNet++ [28] | 12 | 163.2 | 20 |
| Kd-Net [18] | – | – | 16 |
| Ours | 11.5 | 59.6 | 1.5 |

Table 4. Time and space complexity of point cloud based networks in ModelNet40 classification.

D. Permutation Invariant SOM

We apply two methods to ensure that the SOM is invariant to the permutation of the input points: fixed initialization and a deterministic training rule.

D.1. Initialization for SOM Training

In addition to being permutation invariant, the initialization should be reasonable so that the SOM training is less prone to local minima. A suboptimal SOM may contain many isolated nodes lying outside the coverage of the input points. For simplicity, we use a fixed initialization for every point cloud input, although there are other initialization approaches that are permutation invariant, e.g., principal component initialization. Because the input point clouds vary in shape, we generate a set of node coordinates uniformly distributed in a unit ball to serve as a reasonable initialization. Unfortunately, as shown in Fig. 10, isolated nodes are inevitable even with uniform initialization. Isolated nodes may not be associated with any point during the kNN search, so their corresponding node features are set to zero, i.e., those node features are invalid. Nevertheless, our SO-Net is robust to a small number of invalid nodes, as demonstrated in the experiments.

We propose a simple algorithm based on potential field methods to generate the initialization, as shown in Algorithm 1, where $S = \{s_j \in \mathbb{R}^3,\ j = 0, \dots, M-1\}$ represents the SOM nodes and $\lambda$ is the learning rate. The key idea is to apply a repulsion force between every pair of nodes, together with an external force that attracts each node toward the origin. The parameter $\beta$ controls the weighting between the repulsion and attraction forces, so that the resulting nodes remain within the unit ball.

Figure 10. Results of SOM training with uniform initialization. Isolated nodes are inevitable even with uniform initialization.

Algorithm 1 Potential field method for SOM initialization
  Set random seed.
  Random initialization: S ~ U(−1, 1)
  repeat
    for all s_j ∈ S do
      f_j^wall ← −s_j
      f_j^node ← 0
      for all s_k ∈ S, k ≠ j do
        f_j^node ← f_j^node + β (s_j − s_k) / ‖s_j − s_k‖₂²
    for all s_j ∈ S do
      s_j ← s_j + λ (f_j^wall + f_j^node)
  until convergence
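For illustration, Algorithm 1 might be implemented with NumPy as below; the specific values of β and λ and the use of a fixed iteration count in place of a convergence test are assumptions for this sketch. Since the seed is fixed, the output is deterministic and can be computed once and reused for every input point cloud, which is what makes the initialization permutation invariant.

```python
import numpy as np

def som_initialization(M=64, beta=0.05, lam=0.01, iters=2000, seed=0):
    """Potential field dispersion of SOM nodes (sketch of Algorithm 1)."""
    rng = np.random.RandomState(seed)            # fixed seed -> fixed initialization
    S = rng.uniform(-1.0, 1.0, size=(M, 3))      # random initialization S ~ U(-1, 1)
    for _ in range(iters):                       # stand-in for "until convergence"
        diff = S[:, None, :] - S[None, :, :]     # s_j - s_k, shape (M, M, 3)
        dist2 = (diff ** 2).sum(-1) + np.eye(M)  # squared distances; eye avoids 0/0 at k = j
        f_node = beta * (diff / dist2[..., None]).sum(axis=1)  # pairwise repulsion
        f_wall = -S                              # attraction toward the origin
        S = S + lam * (f_wall + f_node)          # one potential field step
    return S
```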

D.2. Batch Update Training

Instead of updating the SOM once per point, the batch update rule performs one update after accumulating the effect of all points in the point cloud. As a result, each SOM update iteration is independent of the order of the points, i.e., the training is permutation invariant.

During SOM training, each training sample affects the winner node and all of its neighbors. We define the neighborhood function as a Gaussian distribution:

$$w_{xy}(x, y \mid p, q, \sigma_x, \sigma_y) = \frac{\exp\bigl(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\bigr)}{\sqrt{(2\pi)^2 \lvert\Sigma\rvert}}, \qquad \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \boldsymbol{\mu} = \begin{bmatrix} p \\ q \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{bmatrix} \tag{6}$$

The pseudo code of the training scheme is shown in Algorithm 2, where $P = \{p_i \in \mathbb{R}^3,\ i = 0, \dots, N-1\}$ and $S = \{s_j \in \mathbb{R}^3,\ j = 0, \dots, M-1\}$ represent the input points and the SOM nodes, respectively. The learning rate $\eta_t$ and the neighborhood parameters $(\sigma_x, \sigma_y)$ should be decreased slowly during training. In addition, Algorithm 2 can be easily implemented as matrix operations, which are highly efficient on GPUs.

Algorithm 2 SOM batch update rule
  Initialize the m × m SOM S with Algorithm 1
  for t < maxIter do
    ▷ Set the update vectors to zero
    for all s_xy ∈ S do
      D_xy ← 0
    ▷ Accumulate the effect of all points
    for all p_i ∈ P do
      Obtain the winner node coordinate (p, q) by nearest neighbor search
      for all s_xy ∈ S do
        w_xy ← Eq. (6)
        D_xy ← D_xy + w_xy (p_i − s_xy)
    ▷ Conduct one update
    for all s_xy ∈ S do
      s_xy ← s_xy + η_t D_xy
    t ← t + 1
    Adjust σ_x, σ_y and η_t
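As noted above, Algorithm 2 reduces to dense matrix arithmetic, which is why it runs efficiently on GPUs. A vectorized NumPy sketch of a single batch update follows; it assumes a shared neighborhood width σ = σ_x = σ_y for brevity, and the signature is illustrative.

```python
import numpy as np

def som_batch_update(P, S, grid, eta, sigma):
    """One iteration of Algorithm 2.

    P: (N, 3) input points; S: (M, 3) SOM nodes; grid: (M, 2) integer (x, y)
    coordinates of the nodes on the m x m map; eta, sigma: current learning
    rate and neighborhood width (their decay schedule is assumed elsewhere).
    """
    # Winner search: index of the nearest node for every point.
    d2 = ((P[:, None, :] - S[None, :, :]) ** 2).sum(-1)   # (N, M)
    winners = d2.argmin(axis=1)                           # (N,)
    # Gaussian neighborhood weights of Eq. (6), with sigma_x = sigma_y.
    g2 = ((grid[None, :, :] - grid[winners][:, None, :]) ** 2).sum(-1)  # (N, M)
    w = np.exp(-0.5 * g2 / sigma ** 2) / (2 * np.pi * sigma ** 2)
    # Accumulate D_xy = sum_i w_xy (p_i - s_xy), then conduct one update.
    D = (w[:, :, None] * (P[:, None, :] - S[None, :, :])).sum(axis=0)   # (M, 3)
    return S + eta * D
```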

E. More Experiments

E.1. MNIST Classification

We evaluate our network on the 2D MNIST dataset, which contains 60,000 28 × 28 images for training and 10,000 images for testing. 2D coordinates are extracted from the non-zero pixels of the images. In order to upsample these 2D coordinates into point clouds of a fixed size, e.g., 512 in our experiments, we augment the original pixel coordinates with Gaussian noise N(0, 0.01). Other than the acquisition of the point clouds, the data augmentation is exactly the same as in the other experiments using ModelNet or ShapeNetPart. We reduce the SOM size to 4 × 4 and set k = 4 because the point clouds are 2D and relatively small. The neurons in the shared fully connected layers are reduced as well: 2-64-64-128-128 during point feature extraction and (128+2)-256-512-512-1024 during node feature extraction. Similar to the 3D classification tasks, our network outperforms existing point cloud based deep networks, although the best performance still comes from well engineered 2D ConvNets, as shown in Table 6. Despite using a point cloud representation instead of images, our network demonstrates better results than ConvNets such as Network in Network [22] and LeNet5 [20].
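To make the point cloud acquisition concrete, a small sketch of the conversion described above is given here. The non-zero pixel extraction, the upsampling to 512 points and the N(0, 0.01) perturbation follow the text; the normalization of pixel coordinates to [−1, 1] and sampling with replacement are assumptions.

```python
import numpy as np

def mnist_to_point_cloud(img, num_points=512, seed=0):
    """Convert a 28 x 28 grayscale digit into a fixed-size 2D point cloud."""
    rng = np.random.RandomState(seed)
    ys, xs = np.nonzero(img)                            # coordinates of non-zero pixels
    coords = np.stack([xs, ys], 1).astype(np.float32)
    coords = coords / 27.0 * 2.0 - 1.0                  # assumed normalization to [-1, 1]
    idx = rng.randint(0, len(coords), size=num_points)  # upsample by resampling
    noise = rng.normal(0.0, 0.1, size=(num_points, 2))  # std 0.1, i.e. variance 0.01
    return coords[idx] + noise
```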
E.2. Classification with SOM Only

There are two sources of information utilized by the SO-Net: the point cloud and the trained SOM. The information from the SOM is used explicitly when the node coordinates are concatenated with the node features at the beginning of node feature extraction. Additionally, the SOM is utilized implicitly because the point normalization, the kNN search and the max pooling are all based on the nodes. To analyze the contribution of the SOM, we perform classification using the SOM nodes alone, without the point coordinates of the point cloud, by feeding the SOM nodes into a 3-layer MLP on the MNIST, ModelNet10 and ModelNet40 datasets. Similarly, in Kd-Net [18], experiments are conducted using the kd-tree split directions without point information, i.e., feeding the directions of the splits into an MLP. The results are shown in Table 5. It is interesting that reasonable classification performance can be achieved by combining the SOM with a simple MLP, but there is still a large gap between this variant and the full SO-Net, which suggests that the integration of the SOM and the point cloud is important. Another intriguing phenomenon is that the SOM-based MLP achieves better results than the split-based MLP, which suggests that the SOM may be more expressive than kd-trees in the context of classification.
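For reference, a hedged PyTorch sketch of this SOM-only baseline is given below. The paper specifies only that the SOM nodes are fed into a 3-layer MLP; the hidden widths here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SOMNodeMLP(nn.Module):
    """Sec. E.2 baseline: a 3-layer MLP over flattened SOM node coordinates."""

    def __init__(self, num_nodes=64, dim=3, num_classes=40):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_nodes * dim, 256), nn.ReLU(),   # widths are assumptions
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, nodes):              # nodes: (batch, num_nodes, dim)
        return self.mlp(nodes.flatten(1))  # class logits: (batch, num_classes)
```

For ModelNet40 this would be instantiated as SOMNodeMLP(64, 3, 40), matching the 8 × 8 SOM used in the classification experiments.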

| Method | Input | MNIST | ModelNet10 | ModelNet40 |
|---|---|---|---|---|
| Kd-Net split-based MLP [18] | splits | 82.4 | 83.4 | 73.2 |
| Kd-Net depth 10 [18] | points | 99.1 | 93.3 | 90.6 |
| Ours - SOM-based MLP | SOM nodes | 91.37 | 88.9 | 75.7 |
| Ours | points | 99.56 | 94.5 | 92.3 |

Table 5. Classification results using structure information only: SOM nodes and kd-tree split directions.

| Method | Error rate (%) |
|---|---|
| Multi-column DNN [10] | 0.23 |
| Network in Network [22] | 0.47 |
| LeNet5 [20] | 0.80 |
| Multi-layer perceptron [31] | 1.60 |
| PointNet [26] | 0.78 |
| PointNet++ [28] | 0.51 |
| Kd-Net [18] | 0.90 |
| ECC [32] | 0.63 |
| Ours | 0.44 |

Table 6. MNIST classification results.

E.3. Result Visualization

To visualize the shape retrieval results, we present the top 5 retrieved shapes for a few queries in Fig. 11. For the point cloud autoencoder, we present results from two networks. The first network consumes 1024 points and reconstructs 1280 points on the ShapeNetPart dataset (Fig. 12), while the second consumes 5,000 points and reconstructs 4608 points on the ModelNet40 dataset (Fig. 13). We present one instance for each category. For the results of object part segmentation on the ShapeNetPart dataset, we visualize one instance per category in Fig. 14. The inputs to the network are point clouds of size 1024 and the corresponding surface normal vectors.

Figure 11. Top 5 retrieval results. First column: query shapes. Columns 2-6: retrieved shapes, ordered by feature similarity.

Figure 12. Results of our ShapeNetPart autoencoder. Red points are recovered by the convolution branch, and green points by the fully connected branch. Odd rows: input point clouds. Even rows: reconstructed point clouds.

Figure 13. Results of our ModelNet40 autoencoder. Red points are recovered by the convolution branch, and green points by the fully connected branch. Odd rows: input point clouds. Even rows: reconstructed point clouds.

Figure 14. Results of object part segmentation. Odd rows: ground truth segmentation. Even rows: predicted segmentation.