3D Shape Analysis with Multi-view Convolutional Networks. Evangelos Kalogerakis


1 3D Shape Analysis with Multi-view Convolutional Networks Evangelos Kalogerakis

2 3D model repositories [3D Warehouse - video]

3 3D geometry acquisition [KinectFusion - video]

4 3D shapes come in various flavors: polygon meshes, analytic surfaces, point clouds. They may have different resolution, non-manifold geometry, arbitrary or no texture and interior, disjoint parts, and noise.

5 We need algorithms that understand shapes. Office chair. Geometric representation.

6 We need algorithms that understand shapes. Office chair: back, seat, base. Geometric representation.

7 We need algorithms that understand shapes. Office chair: back, seat, base. Geometric representation. Correspondences (structure, function, style, point-based).

8 Why shape understanding? Generative models of shapes

9 Why shape understanding? Generative models of shapes Kalogerakis, Chaudhuri, Koller, Koltun, SIGGRAPH 2012

10 Why shape understanding? Scene design Lun, Kalogerakis, Wang, Sheffer, SIGGRAPH ASIA 2016

11 Why shape understanding? Texturing. Part labels: ear, head, torso, back, upper arm, lower arm, hand, upper leg, lower leg, foot, tail. Kalogerakis, Hertzmann, Singh, SIGGRAPH 2010

12 Why shape understanding? Character animation. Part labels: ear, head, torso, back, upper arm, lower arm, hand, upper leg, lower leg, foot, tail. Simari, Nowrouzezahrai, Kalogerakis, Singh, SGP 2009

13 How can we perform shape understanding? It is very hard to do with manually specified rules and hand-engineered descriptors.

14 The importance of good shape descriptors. [Figure: shapes plotted on descriptor axes, labeled Motorbike / Not Motorbike.] Old-style descriptors: surface curvature, spin images, PCA.

15 The importance of good shape descriptors. [Figure: a learning algorithm fits a classification boundary between Motorbike and Not Motorbike in descriptor space, then answers "Is this a Motorbike?" for a new shape.] Old-style descriptors: surface curvature, spin images, PCA.

16 The importance of good shape descriptors. [Figure: the same setup with semantic descriptors ("tank?", "engine?") fed to the learning algorithm to separate Motorbike from Not Motorbike.] Need descriptors that capture semantics and function.

17 From shallow mappings. Old-style approach: the output is a direct function of hand-engineered shape descriptors: $y = f(\mathbf{x}) = \sigma(\mathbf{w} \cdot \mathbf{x})$, with inputs $x_1, x_2, x_3, \ldots, x_d, 1$.
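
A minimal sketch of such a shallow mapping, assuming a sigmoid for the nonlinearity (the slide does not say which one is used). The descriptor values, weights, and bias below are made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic nonlinearity (an assumption; the slide does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))

def shallow_predict(descriptor, weights, bias):
    """y = sigmoid(w . x + b): the output depends directly on the descriptor."""
    z = sum(w * x for w, x in zip(weights, descriptor)) + bias
    return sigmoid(z)

# Hypothetical 3-dimensional hand-engineered descriptor and weights.
x = [0.2, 0.7, 0.1]
w = [1.5, -2.0, 0.5]
b = 0.1
score = shallow_predict(x, w, b)  # read as P(shape is a motorbike)
```

The constant input 1 on the slide plays the role of the bias `b` here.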

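18 to neural nets. Introduce intermediate learned functions that yield optimized descriptors: $y = \sigma(\mathbf{w}^{(2)} \cdot \mathbf{h})$, with $h_1 = \sigma(\mathbf{w}_1^{(1)} \cdot \mathbf{x})$, $h_2 = \sigma(\mathbf{w}_2^{(1)} \cdot \mathbf{x})$, and inputs $x_1, x_2, x_3, \ldots, x_d, 1$.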
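
The two-layer mapping can be sketched as follows. This is an illustration only: the sigmoid nonlinearity, the function names, and all weight values are assumptions, not taken from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weight_rows, biases):
    """One layer: each output unit is sigmoid(w_k . inputs + b_k)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weight_rows, biases)
    ]

def two_layer_predict(x, W1, b1, w2, b2):
    """h = layer 1 (learned descriptors), y = layer 2 applied to h."""
    h = dense(x, W1, b1)
    y = dense(h, [w2], [b2])[0]
    return h, y

# Hypothetical weights: 3 inputs, 2 hidden units, 1 output.
x = [0.2, 0.7, 0.1]
W1 = [[1.0, -1.0, 0.5], [0.3, 0.8, -0.2]]
b1 = [0.0, 0.1]
w2 = [2.0, -1.5]
b2 = 0.05
h, y = two_layer_predict(x, W1, b1, w2, b2)
```

In training, `W1` and `w2` would be fitted jointly, so the hidden values `h` act as descriptors optimized for the task rather than designed by hand.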

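19 Stack several layers to deep neural nets. [Figure: layered network with inputs $x_1, x_2, x_3, \ldots, x_d, 1$, a first hidden layer $h_1, h_2, h_3, \ldots, h_m, 1$, a second hidden layer $h'_1, h'_2, h'_3, \ldots, h'_n, 1$, and output $y$.]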

20 Convolutional neural networks Think of these intermediate functions as convolutional filters acting on small adjacent windows

21 Convolutional neural networks. Basic idea: alternate several convolutional and pooling (subsampling) layers. Source:
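
To make the convolution and pooling steps concrete, here is a dependency-free sketch on a tiny image. The filter is hand-picked for illustration (a real convnet learns it), and, as in most deep learning libraries, the "convolution" is implemented as cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over every window that fits inside the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2x2(fmap):
    """Subsample by keeping the max of each non-overlapping 2x2 block."""
    return [
        [
            max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
            for j in range(0, len(fmap[0]) - 1, 2)
        ]
        for i in range(0, len(fmap) - 1, 2)
    ]

# 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_filter = [[-1, 1]]  # responds to left-to-right intensity increases
response = conv2d_valid(image, edge_filter)
pooled = max_pool2x2(response)
```

The filter fires only at the edge, and pooling keeps that response while halving the resolution.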

22 The image processing success story. The convolution filters capture various hierarchical patterns (edges, sub-parts, parts, ...). Convnets have achieved high accuracy in several image processing tasks. Matthew D. Zeiler and Rob Fergus, Visualizing and Understanding Convolutional Networks, 2014

23 How can we apply convnets for 3D shapes? Motivated by the success of image based architectures and the fact that 3D shapes are often designed for viewing

24 View-based convnets for 3D shapes ... we introduced view-based convnets for 3D shape analysis! [Figure: Projective Convnet output with part labels fuselage, wing, vert. stabilizer, horiz. stabilizer.] E. Kalogerakis, M. Averkiou, S. Maji, S. Chaudhuri, CVPR 2017 (oral)

25 Input: shape as a collection of rendered views For each input shape, infer a set of viewpoints that maximally cover its surface across multiple distances.

26 Input: shape as a collection of rendered views. Render depth and shaded images (shading = normal dot view vector). [Figure: shaded images, depth images.]
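
The shading rule on this slide (surface normal dotted with the view vector) can be sketched per pixel as below. Clamping back-facing values to zero and normalizing both vectors are assumptions made for this example, not details given on the slide.

```python
import math

def normalize(v):
    """Scale a non-zero 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def shade(normal, view_dir):
    """Shading intensity = n . v, clamped at 0 for back-facing normals."""
    n = normalize(normal)
    v = normalize(view_dir)
    return max(0.0, sum(a * b for a, b in zip(n, v)))

facing = shade((0.0, 0.0, 1.0), (0.0, 0.0, 2.0))    # surface faces the camera
grazing = shade((1.0, 0.0, 0.0), (0.0, 0.0, 1.0))   # surface seen edge-on
backface = shade((0.0, 0.0, -1.0), (0.0, 0.0, 1.0)) # surface faces away
```

Here `view_dir` is taken to point from the surface towards the camera.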

27 Input: shape as a collection of rendered views. Perform in-plane camera rotations for rotational invariance: 0°, 90°, 180°, 270°. [Figure: shaded images, depth images.]
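
For illustration, the four in-plane rotations can be produced from one rendered image as follows. The slides rotate the camera; rotating the image in steps of 90° is used here as an equivalent stand-in, and the function names are hypothetical.

```python
def rotate90(image):
    """Rotate a 2D grid (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]

def in_plane_rotations(image):
    """Return the image rotated by 0, 90, 180 and 270 degrees."""
    rotations = [[list(row) for row in image]]
    for _ in range(3):
        rotations.append(rotate90(rotations[-1]))
    return rotations

view = [[1, 2],
        [3, 4]]
rotated_views = in_plane_rotations(view)
```

The same rotation would be applied to the shaded and the depth image of a view so that the pair stays aligned.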

28 Projective convnet architecture. Each pair of depth and shaded images is processed by a convnet. Views are not ordered (no view correspondence across shapes). Convnets have shared parameters. [Figure: shaded and depth images, fully convolutional convnet branches, surface confidence.]

29 Projective convnet architecture. The output of each convnet branch is a confidence map per part label. [Figure: image-based confidence for the label hor. stabilizer.]

30 Projective convnet architecture. The output of each convnet branch is a confidence map per part label. [Figure: image-based confidence for the label wing.]

31 Projective convnet architecture. Since we want our output on the surface, we aggregate the image confidences across all views onto the surface. [Figure: image-based confidence mapped to surface confidence.]

32 Projective convnet architecture. For each face / surface point, find all pixels that include it across all views, and use the max of the confidence per label. [Figure: max over image-based confidences gives surface confidence.]
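
The max aggregation described above can be sketched as follows. The data layout is an assumption made for this example: each view supplies a flat list of pixels, a per-pixel list of label confidences, and a per-pixel face index (with -1 for background).

```python
def aggregate_max(view_confidences, view_face_ids, num_faces, num_labels):
    """surface[face][label] = max confidence over all pixels showing that face."""
    surface = [[0.0] * num_labels for _ in range(num_faces)]
    for confidences, face_ids in zip(view_confidences, view_face_ids):
        for pixel, face in enumerate(face_ids):
            if face < 0:  # background pixel: no surface point visible
                continue
            for label in range(num_labels):
                surface[face][label] = max(surface[face][label],
                                           confidences[pixel][label])
    return surface

# Two views, two pixels each, two labels (made-up numbers).
view_confidences = [
    [[0.2, 0.8], [0.6, 0.4]],  # view 0: pixel -> label confidences
    [[0.9, 0.1], [0.3, 0.7]],  # view 1
]
view_face_ids = [
    [0, 1],    # view 0: pixel 0 shows face 0, pixel 1 shows face 1
    [0, -1],   # view 1: pixel 0 shows face 0, pixel 1 is background
]
surface = aggregate_max(view_confidences, view_face_ids, num_faces=3, num_labels=2)
```

Face 2 is not visible in either view, so its confidences stay at the initial value of zero in this sketch.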

33 Projective convnet architecture: CRF layer. The last layer performs inference in a probabilistic model defined on the surface to promote coherent labeling. $R_1, R_2, R_3, R_4, \ldots$ are random variables taking values in {fuselage, wing, vert. stabilizer, hor. stabilizer}.

34 Projective convnet architecture: CRF layer. It has the form of a Conditional Random Field whose unary term represents the surface-based label confidences: $P(R_1, R_2, R_3, R_4, \ldots \mid \text{shape}) = \frac{1}{Z} \prod_{f=1..n} P(R_f \mid \text{views}) \prod_{f,f'} P(R_f, R_{f'} \mid \text{surface})$. Unary factor (convnet): $P(R_f \mid \text{views})$.

35 Projective convnet architecture: CRF layer. Pairwise terms favor the same label for triangles or points with similar surface normals and small geodesic distance: $P(R_1, R_2, R_3, R_4, \ldots \mid \text{shape}) = \frac{1}{Z} \prod_{f=1..n} P(R_f \mid \text{views}) \prod_{f,f'} P(R_f, R_{f'} \mid \text{surface})$. Pairwise factor (geodesic + normal distance): $P(R_f, R_{f'} \mid \text{surface})$.

36 Projective convnet architecture: CRF layer. Inference aims to find the most likely joint assignment to all surface random variables (optimization problem): $\max_{R_1, R_2, \ldots} P(R_1, R_2, R_3, R_4, \ldots \mid \text{shape}) = \max_{R_1, R_2, \ldots} \frac{1}{Z} \prod_{f=1..n} P(R_f \mid \text{views}) \prod_{f,f'} P(R_f, R_{f'} \mid \text{surface})$. MAP assignment (mean field inference).
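
A rough sketch of mean field inference on a tiny surface graph. It assumes a simple Potts-style pairwise term (a fixed reward `weight` when two neighbouring faces agree). In the slides the pairwise factor depends on geodesic distance and normal difference, which here is folded into that hypothetical per-edge weight.

```python
import math

def mean_field(unary, edges, num_iters=10):
    """unary[f][l] ~ P(R_f = l | views); edges = [(f, g, weight), ...]."""
    n, num_labels = len(unary), len(unary[0])
    neighbors = [[] for _ in range(n)]
    for f, g, weight in edges:
        neighbors[f].append((g, weight))
        neighbors[g].append((f, weight))
    q = [row[:] for row in unary]  # initialise beliefs from the unary term
    for _ in range(num_iters):
        new_q = []
        for f in range(n):
            # log-unary plus expected agreement reward from neighbours
            scores = [
                math.log(unary[f][l] + 1e-12)
                + sum(w * q[g][l] for g, w in neighbors[f])
                for l in range(num_labels)
            ]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            new_q.append([e / total for e in exps])
        q = new_q
    labels = [max(range(num_labels), key=lambda l: q[f][l]) for f in range(n)]
    return q, labels

# Three faces in a chain; the middle face weakly prefers label 1.
unary = [[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]]
edges = [(0, 1, 2.0), (1, 2, 2.0)]
beliefs, labels = mean_field(unary, edges)
_, independent_labels = mean_field(unary, [])
```

With the pairwise term the middle face is pulled towards its neighbours' label; without edges each face simply keeps its most confident label.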

37 Training. The architecture is trained end to end with analytic gradients. Training starts from a pretrained image-based net (VGG16). [Figure: convnet, image-based confidence, surface-based confidence, CRF layer, output labeling. Forward pass / joint inference (convnet + CRF). Backpropagation / joint training (convnet + CRF).]

38 Training. The architecture is trained end to end with analytic gradients. Training starts from a pretrained image-based net (VGG16). [Figure: convnet, image-based confidence, surface-based confidence, CRF layer, output labeling. Backpropagation / joint training (convnet + CRF).]

39 What are the learned filters doing? They are activated in the presence of certain surface patterns / patches. [Figure panels: conv4, conv5, fc6.]

40 Dataset used in experiments. Evaluation on ShapeNetCore (human-labeled shapes). Per category, 50% of the shapes are used for training and 50% for testing. [Yi et al. 2016]

41 ShapeNetCore: 8% improvement in labeling accuracy for complex categories (vehicles, furniture, etc.)

42 ShapeNetCore: 8% improvement in labeling accuracy for complex categories (vehicles, furniture, etc.)

43 ground-truth ShapeBoost ShapePFCN

44 ground-truth ShapeBoost ShapePFCN

45 Shape recognition with multi-view CNNs. An earlier version of a view-based CNN for shape recognition. [Figure: per-view CNN 1 branches, view pooling, CNN 2.] Su, Maji, Kalogerakis, Learned-Miller, ICCV 2015
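
View pooling can be illustrated as an element-wise max over the per-view feature vectors. This is a sketch under that assumption; the feature values are made up, and in the real network the features would come from the first CNN stage.

```python
def view_pool(view_features):
    """Element-wise max across views: one pooled vector per shape."""
    return [max(values) for values in zip(*view_features)]

# Three views, each described by a 4-dimensional feature vector.
view_features = [
    [0.1, 0.9, 0.0, 0.3],
    [0.5, 0.2, 0.0, 0.4],
    [0.3, 0.1, 0.7, 0.2],
]
shape_descriptor = view_pool(view_features)
```

Because the max does not depend on the order of the views, the pooled descriptor is the same however the views are listed.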

46 Summary. Inspired by human vision: view-based convnets analyze what can be seen under view projections. Aggregate information from multiple views selected to maximally cover the surface. Fast processing at high resolutions. Robust to artifacts in the input geometric representation (e.g., irregular tessellation, polygon soups). Initialized from image-based architectures pretrained on massive image datasets (filters capture shape + texture).

Thank you! Acknowledgements: NSF (CHS-1422441, CHS-1617333, IIS-1617917), NVidia, Adobe, Facebook, Qualcomm.

47 Experiments were performed in the UMass GPU cluster (400 GPUs!) obtained under a grant by the MassTech Collaborative. Our project web page:

3D Shape Segmentation with Projective Convolutional Networks. Evangelos Kalogerakis (1), Melinos Averkiou (2), Subhransu Maji (1), Siddhartha Chaudhuri (3). (1) University of Massachusetts Amherst, (2) University of Cyprus