Multi-view 3D Models from Single Images with a Convolutional Network

1 Multi-view 3D Models from Single Images with a Convolutional Network. Maxim Tatarchenko, University of Freiburg. Skoltech, 2nd Christmas Colloquium on Computer Vision

2 Humans have prior knowledge about 3D 2

3 Humans have prior knowledge about 3D Side view? 2

6 3D-awareness: How can we teach similar 3D-awareness to neural networks? 3

7 Convolutional network (example output label: "cat"). *Slides partially provided by Alexey Dosovitskiy 4

8 Up-convolutional network 5

9 Up-convolutions. Pooling: shrinking the feature maps. Unpooling: expanding the feature maps. Up-convolution = unpooling + convolution 6
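
The "unpooling + convolution" recipe can be sketched in a few lines of NumPy. This is a minimal illustration, not the network code from the talk: it assumes a single-channel feature map, a 2x unpooling that writes each value into the top-left corner of a 2x2 block and leaves the rest zero, and a hand-picked 3x3 kernel. The function names `unpool2x`, `conv2d_same` and `up_convolution` are made up for this example.

```python
import numpy as np

def unpool2x(x):
    """Expand an (H, W) map to (2H, 2W): each value is written to the
    top-left corner of its 2x2 block, the other three entries stay zero."""
    h, w = x.shape
    out = np.zeros((2 * h, 2 * w), dtype=float)
    out[::2, ::2] = x
    return out

def conv2d_same(x, kernel):
    """Plain 2D cross-correlation with zero padding; output has the same
    size as the input (kernel sides are assumed to be odd)."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def up_convolution(x, kernel):
    """Up-convolution = unpooling followed by a convolution."""
    return conv2d_same(unpool2x(x), kernel)

feature_map = np.array([[1.0, 2.0],
                        [3.0, 4.0]])
kernel = np.ones((3, 3))          # illustrative kernel; a real layer learns it
upsampled = up_convolution(feature_map, kernel)
print(upsampled.shape)            # spatial size doubles: (4, 4)
```

In a trained network the kernel weights are learned and there are many channels; the sketch only shows how the spatial resolution grows while the convolution fills in the zeros left by unpooling.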

10 Application: generating chairs. Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox 7

11 Cats are complicated 8

12 Chairs are simpler 9

13 Training data: chairs from [Aubry et al. 2014]; cars and tables from ShapeNet. Figure from dimatura/seeing3d 10

14 Training data: chairs from [Aubry et al. 2014]; cars and tables from ShapeNet 11

15 CNN for generating objects. [1] A. Dosovitskiy, J. T. Springenberg and T. Brox, "Learning to Generate Chairs with Convolutional Neural Networks", CVPR 2015. [2] A. Dosovitskiy, J. T. Springenberg, M. Tatarchenko and T. Brox, "Learning to Generate Chairs, Tables and Cars with Convolutional Neural Networks", PAMI

16 Generated images - transformations: translation, rotation, zoom, squeeze, saturation, brightness, color 13

17 Style interpolation - chairs 14

20 Style interpolation - cars 15

21 Style interpolation - chairs to tables 16
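
One way to picture style interpolation is as a straight-line walk between two style input vectors, each intermediate vector being fed to the generator. The sketch below covers only that input side and is an assumption-laden illustration: it supposes styles are encoded as one-hot vectors, uses a made-up count of 4 styles, and the helper names `one_hot` and `interpolate_styles` are hypothetical. The generator network itself is not modeled.

```python
import numpy as np

def one_hot(index, num_styles):
    """Style code for a single training object (assumed encoding)."""
    v = np.zeros(num_styles)
    v[index] = 1.0
    return v

def interpolate_styles(style_a, style_b, num_steps):
    """Return num_steps vectors moving linearly from style_a to style_b."""
    alphas = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - a) * style_a + a * style_b for a in alphas])

chair = one_hot(0, 4)   # hypothetical: style 0 is a chair
table = one_hot(3, 4)   # hypothetical: style 3 is a table
path = interpolate_styles(chair, table, 5)
# Each row of `path` would be passed to the generator together with a
# fixed viewpoint to render one frame of the interpolation.
```

The midpoint row mixes both styles equally, which is the kind of input that produces the in-between shapes shown on the interpolation slides.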

22 Chair arithmetic 17

23 Viewpoint interpolation (transfer learning). Source set: 90% of styles, all viewpoints available. Target set: 10% of styles, only some viewpoints available. Task: interpolate the missing angles in the target set. Shown: 15 azimuth angles available 18

26 Viewpoint interpolation (transfer learning). Shown: 8 azimuths available, 4 azimuths available, 2 azimuths available, 1 azimuth available 19
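
The source/target setup can be written down as a data split. The sketch below is illustrative only: the number of styles (20), the azimuth grid (every 30 degrees) and the subset given to the target set are made-up values, not the dataset's, and `split_styles` and `training_pairs` are hypothetical helper names. It shows which (style, azimuth) pairs are seen in training and which are left for the network to interpolate.

```python
import random

def split_styles(style_ids, target_fraction=0.1, seed=0):
    """Shuffle the styles and split them into (source, target) lists."""
    rng = random.Random(seed)
    shuffled = list(style_ids)
    rng.shuffle(shuffled)
    num_target = max(1, round(len(shuffled) * target_fraction))
    return shuffled[num_target:], shuffled[:num_target]

def training_pairs(source, target, all_azimuths, target_azimuths):
    """Source styles are seen from every azimuth, target styles only
    from the restricted subset."""
    pairs = [(s, a) for s in source for a in all_azimuths]
    pairs += [(s, a) for s in target for a in target_azimuths]
    return pairs

all_azimuths = list(range(0, 360, 30))      # illustrative grid of 12 angles
target_azimuths = all_azimuths[::3]         # illustrative: 4 angles kept
source, target = split_styles(range(20))    # 90% / 10% of 20 made-up styles
train = training_pairs(source, target, all_azimuths, target_azimuths)

# Pairs the network never sees and has to interpolate at test time.
held_out = [(s, a) for s in target
            for a in all_azimuths if a not in target_azimuths]
```

Shrinking `target_azimuths` reproduces the progression on the slides, where fewer and fewer views of the target styles are available.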

27 Let's add an inference network! 20

28 Novel view prediction: adding an inference net. M. Tatarchenko, A. Dosovitskiy and T. Brox, "Multi-view 3D Models from Single Images with a Convolutional Network", ECCV

29 Performance on synthetic data 22

31 Performance on synthetic data - video 23

32 Segmentation. Training data: [image] + [image] = [image]. Network predictions 24

33 Segmentation - video 25

34 Trained on synthetic, works on natural 26

35 Network learns consistent 3D representation 27

39 3D reconstruction - video 28

40 Comparison with IGN (Kulkarni et al., NIPS)

41 Comparison with no inference network (Dosovitskiy et al., CVPR)

42 Comparison with recurrent network (Yang et al., NIPS)

43 Comparison with appearance flow (Zhou et al., ECCV)

44 Informative inputs lead to better predictions 33

47 Informative inputs lead to better predictions 34

48 Informative inputs lead to better predictions 35

49 Interpolation between cars 36

50 Internal representation is invariant 37

51 Internal representation is invariant pairwise distances 37
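
A pairwise-distance check of this kind can be sketched as follows. The codes below are small synthetic vectors standing in for the network's internal representation, so this is only an illustration of the measurement and not a reproduction of the result: two made-up objects, three made-up views each, and a hypothetical helper `pairwise_distances`. If the representation is invariant to the input viewpoint, codes of the same object should sit much closer together than codes of different objects.

```python
import numpy as np

def pairwise_distances(codes):
    """Euclidean distance between every pair of rows in `codes`."""
    diff = codes[:, None, :] - codes[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Synthetic stand-in codes: rows 0-2 are three views of object A,
# rows 3-5 are three views of object B.
codes = np.array([[1.0, 0.0], [1.1, 0.0], [0.9, 0.1],
                  [0.0, 1.0], [0.1, 1.0], [0.0, 0.9]])
D = pairwise_distances(codes)

within_a = D[:3, :3].max()    # largest distance among views of object A
between = D[:3, 3:].min()     # smallest distance between A and B
print(within_a < between)     # True for these synthetic codes
```

Plotted as a matrix, such distances show dark blocks along the diagonal (same object, different views) when the representation ignores the viewpoint.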

53 Summary. High-resolution images can be generated with a convolutional network from a set of high-level parameters. The network learns meaningful continuous manifolds. Adding an encoder allows a 3D representation to be inferred from a single image. The internal 3D representation can be explicitly decoded into a consistent point cloud by fusing multiple output depth maps 38
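
Fusing depth maps into a point cloud amounts to back-projecting every pixel into 3D and moving the points of each view into a common world frame. The sketch below is one minimal way to do this, under assumptions the slides do not state: a pinhole camera with a single focal length and the principal point at the image center, known 4x4 camera-to-world poses, and zero depth marking background. The names `backproject` and `fuse_depth_maps` are hypothetical.

```python
import numpy as np

def backproject(depth, focal, cam_to_world):
    """Turn one (H, W) depth map into an (N, 3) array of world points."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                  # pixel row and column indices
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0      # assumed principal point
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal
    pts = np.stack([x, y, depth, np.ones_like(depth)], axis=-1).reshape(-1, 4)
    valid = np.isfinite(pts[:, 2]) & (pts[:, 2] > 0)   # drop background
    return (pts[valid] @ cam_to_world.T)[:, :3]

def fuse_depth_maps(depth_maps, focal, poses):
    """Concatenate the back-projected points of all views."""
    clouds = [backproject(d, focal, p) for d, p in zip(depth_maps, poses)]
    return np.concatenate(clouds, axis=0)

# Two tiny made-up views: the same flat surface at depth 2, with the
# second camera shifted by one unit along the world x axis.
depth = np.full((2, 2), 2.0)
pose_a = np.eye(4)
pose_b = np.eye(4)
pose_b[0, 3] = 1.0
cloud = fuse_depth_maps([depth, depth], focal=1.0, poses=[pose_a, pose_b])
print(cloud.shape)   # (8, 3)
```

A real pipeline would also filter outliers and merge overlapping points; the sketch stops at concatenation.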

54 Thank you! Code available: 39