Learning from 3D Data

Size: px

Start display at page:

Download "Learning from 3D Data"

Earl Lamb
6 years ago
Views:

1 Learning from 3D Data Thomas Funkhouser Princeton University* * On sabbatical at Stanford and Google

2 Disclaimer: I am talking about the work of these people Shuran Song Andy Zeng Fisher Yu Yinda Zhang Maciej Halber Jerry Liu Manolis Savva Angel Chang Matthias Niessner Jianxiong Xiao Matt Fisher Angela Dai

3 Goal Learning to 3D objects in indoor environments Detect Segment Match Label Complete Synthesize Edit etc.

4 Motivation Applications Scene understanding Robot manipulation Object modeling etc.

5 Scale of region Focus of This Talk Learning patterns in 3D space Small Patch Object View Scene Large For each scale: Motivating application? Source of training data? Methods? Results?

6 Scale of region Talk Outline (Part 1 of 4) Learning patterns in 3D space Small Patch Object View Scene Large A. Zeng, S. Song, M. Niessner, M. Fisher, J. Xiao, T. Funkhouser, 3DMatch: Learning Local Geometric Descriptors from 3D Reconstructions, CVPR 2017

7 3DMatch Goal: train a discriminating 3D local shape descriptor from data Local shape descriptor Local shape descriptor Match!

8 3DMatch Challenge: where to get training data?

9 3DMatch Approach: train on wide-baseline correspondences in RGB-D reconstructions

10 3DMatch Approach: train on wide-baseline correspondences in RGB-D reconstructions Ground truth match between RGB-D Images from different views

11 3DMatch Method: sample true/false correspondences from RGB-D reconstructions, train Siamese network

12 3DMatch Result: learns to discriminate local shapes found in real-world data

13 3DMatch Results Result 1: learned feature descriptor predicts RGB-D point correspondences more accurately than hand-tuned descriptors Match classification error at 95% recall Fragment Alignment Success Rate

of small objects in Amazon Picking Challenge Object pose

14 3DMatch Results Result 2: feature descriptor learned from RGB-D reconstructions provides matching for recognizing poses of small objects in Amazon Picking Challenge Object pose prediction accuracy Predicting pose of 3D object model in RGB-D scan

15 3DMatch Results Result 3: feature descriptor learned from RGB-D reconstructions provides discriminative matching of semantic correspondences on 3D meshes

16 Scale of region Talk Outline (Part 2 of 4) Learning patterns in 3D space Small Patch Object View Scene Large J. Liu, F. Yu, T. Funkhouser, Interactive 3D Modeling with a Generative Adversarial Network,, 3DV 2017

17 Interactive 3D Modeling with a GAN Goal: interactive modeling of objects with a SIMPLE user interface

18 Interactive 3D Modeling with a GAN Challenge: difficult to create complete, detailed, realistic shapes

19 Interactive 3D Modeling with a GAN Approach: learn a shape manifold, and then use it to help a user to design new realistic shapes Latent Space of Shape Manifold

20 Interactive 3D Modeling with a GAN Where to get training data? ShapeNet Chang et al., ShapeNet: An Information-Rich 3D Model Repository, arxiv, December 2015.

21 Interactive 3D Modeling with a GAN Prior work: editing images with shape manifold J-Y Zhu, P. Krähenbühl, E. Shechtman, A. Efros, Generative Visual Manipulation on the Natural Image Manifold ECCV 2016

22 Interactive 3D Modeling with a GAN New idea: allow users to snap partial edits to 3D shape manifold

23 Interactive 3D Modeling with a GAN New idea: edit, snap edit, snap

24 Interactive 3D Modeling with a GAN Method: constrain edits to shape manifold learned with 3DGAN x

25 Interactive 3D Modeling with a GAN Method: map user edits to shape manifold with learned projection operator x G(z) Projection z=p(x) x x P S (x)

26 Interactive 3D Modeling with a GAN Method: two step projection operator accounts for similarity and realism Step 1: project into latent space Step 2: optimize for realism Realism of output Similarity to input

27 Interactive 3D Modeling with a GAN Results: comparison to one-step projection network

28 Interactive 3D Modeling with a GAN Results: comparison of snap operator to retrieval of nearest neighbor

29 Interactive 3D Modeling with a GAN Results: example editing sequences for chairs

30 Interactive 3D Modeling with a GAN Results: example editing sequences for planes and tables

31 Scale of region Talk Outline (Part 3 of 4) Learning patterns in 3D space Small Patch Object View Scene Large S. Song, F. Yu, A. Zeng, A. Chang, M. Savva, and T. Funkhouser, Semantic Scene Completion from a Single Depth Image, CVPR 2017

32 Semantic Scene Completion Goal: given an RGB-D image, label all voxels by semantic class Input: Single view depth map Output: Semantic scene completion

33 Semantic Scene Completion Goal: given an RGB-D image, label all voxels by semantic class visible surface free space occluded space outside view outside room 3D Scene

34 Semantic Scene Completion Goal: given an RGB-D image, label all voxels by semantic class visible surface free space occluded space outside view outside room 3D Scene

35 Semantic Scene Completion Prior work: segmentation OR completion surface segmentation Silberman et al. scene completion Firman et al. 3D Scene The occupancy and the object This identity paper are tightly intertwined! semantic scene completion

36 Semantic Scene Completion: SSCNet Approach: end-to-end deep network Prediction: N+1 classes SSCNet 3D ConvNet Input: Single view depth map Output: Semantic scene completion

37 Semantic Scene Completion : SSCNet

38 Semantic Scene Completion : SSCNet

39 Semantic Scene Completion : SSCNet Encode 3D space using flipped TSDF

40 Semantic Scene Completion : SSCNet Encode 3D space using flipped TSDF Voxel size: 0.02 m

41 Semantic Scene Completion : SSCNet Local geometry Receptive field: 0.98 m

42 Semantic Scene Completion : SSCNet High-level 3D context via big receptive field provided by dilated convolution Receptive field: 2.26

43 Semantic Scene Completion : SSCNet Multi-scale aggregation Receptive field: 0.98 m Receptive field:1.62 m Receptive field: 2.26 m

44 Semantic Scene Completion: SSCNet Experiments Where to get training data?

45 Semantic Scene Completion: SSCNet Experiments Where to get training data? NYU: only visible surfaces SUN3D: No semantic labels No dense volumetric ground truth with semantic labels for a complete scene

46 Semantic Scene Completion: SSCNet Experiments SUNCG dataset

47 Semantic Scene Completion: SSCNet Experiments SUNCG dataset 46K houses 50K floors 400K rooms 5.6M object instances

48 Semantic Scene Completion: SSCNet Experiments SUNCG dataset synthetic camera views depth ground truth semantic scene completion

49 Semantic Scene Completion: SSCNet Experiments Train on SUNCG Test on NYU

50 Semantic Scene Completion: SSCNet Results Result: better than previous volumetric completion algorithms Comparison to previous algorithms for volumetric completion

51 Color Image Observed Surface Ground Truth Zhang et al. Firman et al. Ours(SSCNet)

52 Semantic Scene Completion: SSCNet Results Result: better than previous 3D model fitting algorithms Comparison to previous algorithms for 3D model fitting

53 Color Image Observed Surface Ground Truth Lin et al. Geiger and Wang Ours(SSCNet)

54 Color Image Observed Surface Ground Truth Lin et al. Geiger and Wang Ours(SSCNet)

55 Scale of region Talk Outline (Part 4 of 4) Learning patterns in 3D space Small Patch Object View Scene Large Y. Zhang, M. Bai, J. Xiao, P. Kohli, and S. Izadi, DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding, ICCV 2017

56 DeepContext Goal: given a RGB-D image, perform holistic 3D scene understanding wall lamp nightstand bed ceiling dresser dresser Input RGB-D Image floor All Objects Identified with 3D Bounding Boxes

57 DeepContext Goal: given a RGB-D image, perform holistic 3D scene understanding wall lamp nightstand bed ceiling dresser dresser Input RGB-D Image floor

58 DeepContext Challenge: how parse whole scene with a single deep network?? Want learned models for: Object shapes Object co-occurances Object spatial relationships Object supports etc. wall lamp nightstand floor bed ceiling dresser dresser

59 DeepContext Challenge: how parse whole scene with a single deep network? wall lamp ceiling dresser nightstand bed dresser 3D Convolutional Neural Network? floor

60 DeepContext Approach: different deep network templates for different scene types Sleeping Area Office Area Lounging Area Table & Chair Set

61 Sleeping Area dresser with mirror sofa ottoman dresser bed dresser dresser with mirror nightstand lamp wall

62 DeepContext Sleeping Area Office Area Lounging Area Table & Chair Set

63 1. Rotation 2. Translation Depth Image 3D Volume (TSDF) Transformation Network

64 1. Rotation 2. Translation Depth Image 3D Volume (TSDF) Transformation Network

65 1. Rotation 2. Translation Depth Image 3D Volume (TSDF) Transformation Network

66 1. Rotation 2. Translation Depth Image Initial Template Alignment (Top View) 3D Volume (TSDF) 3D Context Network 1. Object Existence 2. Box offset Transformation Network

67 1. Rotation 2. Translation Depth Image Initial Template Alignment (Top View) 3D Volume (TSDF) 3D Context Network 1. Object Existence 2. Box offset Transformation Network

68 1. Rotation 2. Translation Depth Image Initial Template Alignment (Top View) 3D Volume (TSDF) 3D Context Network 1. Object Existence 2. Box offset Transformation Network

69 1. Rotation 2. Translation Depth Image Initial Template Alignment (Top View) 3D Volume (TSDF) 3D Context Network 1. Object Existence 2. Box offset Transformation Network

70 Depth Image Initial Template Alignment (Top View) 3D Volume (TSDF) 3D Context Network lamp wall ceiling dresser Transformation Network 1. Object Existence 2. Box offset nightstand bed dresser 1. Rotation 2. Translation floor

71 DeepContext Results Where to get training data? SUN RGB-D S. Song, S. Lichtenberg, and J. Xiao. SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite, CVPR 2015

72 DeepContext - Results Depth + Truth Result 2D Initial Alignment Initial Alignment Top View

73 DeepContext - Results Depth + Truth Result 2D Initial Alignment Initial Alignment Top View

74 DeepContext - Results Depth + Truth Result 2D Initial Alignment Initial Alignment Top View

detections than previous methods: Depth + Ground

75 DeepContext Results Experiments with SUN RGB-D show that DeepContext provides better 3D object detections than previous methods: Depth + Ground Truth 3D Object Detections Average Precision for 3D Object Detection

76 Summary Four projects where ConvNets are trained on 3D voxel grids with different Scales Tasks Training data Loss functions Network architectures Training protocols Problem structures Application domains

77 Future Challenges More efficient 3D convolutions Larger 3D data sets Larger 3D contexts Learning 3D interactions Better 3D representations

78 Future Challenges More efficient 3D convolutions Larger 3D data sets Larger 3D contexts Learning 3D interactions OctNet Riegler et al., CVPR 2017 Better 3D representations PointNet++ Qi et al., NIPS 2017

, CVPR 2017) 700 rooms, 36K labeled objects Synthetic RGB-D Image RGB-D Video Object ShapeNet Intel RealSense

79 Future Challenges More efficient 3D convolutions Larger 3D data sets Larger 3D contexts Learning 3D interactions Better 3D representations SUN RGB-D (Song et al, CVPR 2017) 10K labeled images ScanNet (Dai et al., CVPR 2017) 700 rooms, 36K labeled objects Synthetic RGB-D Image RGB-D Video Object ShapeNet Intel RealSense Redwood Room SUNCG SUN RGB-D ScanNet Multiroom SUNCG Matterport3D SUN3D Largest 3D datasets available today for indoor environments

annotations Object annotations A. Chang, A.

80 Future Challenges More efficient 3D convolutions Matterport3D Larger 3D data sets Larger 3D contexts Learning 3D interactions Better 3D representations Whole houses Room annotations Object annotations A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, Matteport3D: Learning from RGB-D Data in Indoor Environments, 3DV 2017

Future Challenges More efficient 3D convolutions Larger 3D data sets Larger 3D contexts Learning 3D interactions Better 3D representations MINOS:

81 Future Challenges More efficient 3D convolutions Larger 3D data sets Larger 3D contexts Learning 3D interactions Better 3D representations MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments Manolis Savva, Angel X. Chang, Alexey Dosovitskiy, Thomas Funkhouser, Vladlen Koltun, ArXiv 2017

82 Future Challenges More efficient 3D convolutions Larger 3D data sets Larger 3D contexts Learning 3D interactions Better 3D representations Geometry, materials, lighting, physical properties, support relationships, spatial relationships, affordances, activities, dynamics, etc. Ikea

interactions Better 3D representations Geometry,

relationships, spatial relationships, affordances,

83 Future Challenges More efficient 3D convolutions Larger 3D data sets Larger 3D contexts Learning 3D interactions Better 3D representations Geometry, materials, lighting, physical properties, support relationships, spatial relationships, affordances, activities, dynamics, etc. Wu et al. Visual Genome DeepContext

84 Acknowledgments Princeton: Angel X. Chang, Kyle Genova, Maciej Halber, Manolis Savva, Elena Sizikova, Shuran Song, Jianxiong Xiao, Fisher Yu, Yinda Zhang, Andy Zeng Collaborators: Angela Dai, Matt Fisher, Matthias Niessner, Jianxiong Xiao Data: SUN3D, NYU, Trimble, Planner5D, Matterport Funding: NSF, Google, Intel, Facebook, Adobe, Pixar Thank You!

3D Scene Understanding from RGB-D Images. Thomas Funkhouser

3D Scene Understanding from RGB-D Images Thomas Funkhouser Recent Ph.D. Student Current Postdocs Current Ph.D. Students Disclaimer: I am talking about the work of these people Shuran Song Yinda Zhang Andy