Learning to generate 3D shapes

Similar documents
3D Deep Learning on Geometric Forms. Hao Su

Learning from 3D Data

Deep Models for 3D Reconstruction

Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs Supplementary Material

MULTI-LEVEL 3D CONVOLUTIONAL NEURAL NETWORK FOR OBJECT RECOGNITION SAMBIT GHADAI XIAN LEE ADITYA BALU SOUMIK SARKAR ADARSH KRISHNAMURTHY

3D Shape Analysis with Multi-view Convolutional Networks. Evangelos Kalogerakis

An Empirical Study of Generative Adversarial Networks for Computer Vision Tasks

Bilinear Models for Fine-Grained Visual Recognition

3D Shape Segmentation with Projective Convolutional Networks

POINT CLOUD DEEP LEARNING

Perceiving the 3D World from Images and Videos. Yu Xiang Postdoctoral Researcher University of Washington

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016

Alternatives to Direct Supervision

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material

Unsupervised Learning

CAPNet: Continuous Approximation Projection For 3D Point Cloud Reconstruction Using 2D Supervision

Deep Learning for Visual Manipulation and Synthesis

3D Object Recognition and Scene Understanding from RGB-D Videos. Yu Xiang Postdoctoral Researcher University of Washington

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting

GAL: Geometric Adversarial Loss for Single-View 3D-Object Reconstruction

Learning Adversarial 3D Model Generation with 2D Image Enhancer

Computer Vision: Making machines see

3D Deep Learning

GANs for Exploiting Unlabeled Data. Presented by: Uriya Pesso Nimrod Gilboa Markevich

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction

Multiresolution Tree Networks for 3D Point Cloud Processing

CS468: 3D Deep Learning on Point Cloud Data. class label part label. Hao Su. image. May 10, 2017

Two Routes for Image to Image Translation: Rule based vs. Learning based. Minglun Gong, Memorial Univ. Collaboration with Mr.

Pixels, voxels, and views: A study of shape representations for single view 3D object shape prediction

Deformable Part Models

Using Machine Learning to Optimize Storage Systems

AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation

S+U Learning through ANs - Pranjit Kalita

Seeing the unseen. Data-driven 3D Understanding from Single Images. Hao Su

Lecture 7: Semantic Segmentation

Deep Incremental Scene Understanding. Federico Tombari & Christian Rupprecht Technical University of Munich, Germany

Stochastic Simulation with Generative Adversarial Networks

arxiv: v1 [cs.cv] 21 Jun 2017

arxiv: v1 [cs.cv] 27 Mar 2019

MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material

arxiv: v2 [cs.cv] 31 Mar 2018

arxiv: v1 [cs.cv] 10 Sep 2018

CS 395T Numerical Optimization for Graphics and AI (3D Vision) Qixing Huang August 29 th 2018

CSGNet: Neural Shape Parser for Constructive Solid Geometry

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform. Xintao Wang Ke Yu Chao Dong Chen Change Loy

arxiv: v1 [cs.cv] 30 Sep 2018

CAPNet: Continuous Approximation Projection For 3D Point Cloud Reconstruction Using 2D Supervision

Structured Prediction using Convolutional Neural Networks

Supplementary A. Overview. C. Time and Space Complexity. B. Shape Retrieval. D. Permutation Invariant SOM. B.1. Dataset

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Charles R. Qi* Hao Su* Kaichun Mo Leonidas J. Guibas

Multi-view 3D Models from Single Images with a Convolutional Network

Unsupervised Learning of Spatiotemporally Coherent Metrics

Web-Scale Image Search and Their Applications

What are we trying to achieve? Why are we doing this? What do we learn from past history? What will we talk about today?

DeepPose & Convolutional Pose Machines

GAN Related Works. CVPR 2018 & Selective Works in ICML and NIPS. Zhifei Zhang

Generative Adversarial Network

LEARNING TO GENERATE CHAIRS WITH CONVOLUTIONAL NEURAL NETWORKS

Su et al. Shape Descriptors - III

08 An Introduction to Dense Continuous Robotic Mapping

Fully Convolutional Networks for Semantic Segmentation

Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning

Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization

Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction

UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss

CAP 6412 Advanced Computer Vision

Visual Computing TUM

3D model classification using convolutional neural network

Visual Recommender System with Adversarial Generator-Encoder Networks

Announcements. Recognition I. Gradient Space (p,q) What is the reflectance map?

3D Object Classification via Spherical Projections

Spatial Localization and Detection. Lecture 8-1

Analysis and Synthesis of 3D Shape Families via Deep Learned Generative Models of Surfaces

Human Pose Estimation with Deep Learning. Wei Yang

arxiv: v1 [cs.cv] 19 Jul 2017

Generative Adversarial Text to Image Synthesis

arxiv: v1 [cs.cv] 17 Nov 2016

Is 2D Information Enough For Viewpoint Estimation? Amir Ghodrati, Marco Pedersoli, Tinne Tuytelaars BMVC 2014

GENERATIVE ADVERSARIAL NETWORKS (GAN) Presented by Omer Stein and Moran Rubin

Generative Modeling with Convolutional Neural Networks. Denis Dus Data Scientist at InData Labs

Unsupervised Deep Learning. James Hays slides from Carl Doersch and Richard Zhang

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus

Presented at the FIG Congress 2018, May 6-11, 2018 in Istanbul, Turkey

Learning visual odometry with a convolutional network

Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz Supplemental Material

Introduction to Deep Learning

arxiv: v1 [cs.cv] 31 Mar 2016

Geometric and Semantic 3D Reconstruction: Part 4A: Volumetric Semantic 3D Reconstruction. CVPR 2017 Tutorial Christian Häne UC Berkeley

3D Reconstruction of Dynamic Textures with Crowd Sourced Data. Dinghuang Ji, Enrique Dunn and Jan-Michael Frahm

(University Improving of Montreal) Generative Adversarial Networks with Denoising Feature Matching / 17

Learning and Recognizing Visual Object Categories Without First Detecting Features

Detecting and Parsing of Visual Objects: Humans and Animals. Alan Yuille (UCLA)

Lecture 19: Generative Adversarial Networks

Transcription:

Learning to generate 3D shapes Subhransu Maji College of Information and Computer Sciences University of Massachusetts, Amherst http://people.cs.umass.edu/smaji August 10, 2018 @ Caltech

Creating 3D shapes is not easy Image from Autodesk 3D Maya

Inferring 3D shapes from images What shapes are puffins? What shapes are pumpkinseed fish? 3

Creating 3D shapes is not easy Many techniques for recognizing 3D data, but relatively few techniques for generating them Representations for generation? Voxels Multiview Geometry images Shape basis Set-based (points, triangles, etc.) Procedural, e.g., constructive solid geometry

Talk overview Generative models for 3D shapes and applications Multiview [3DV 17] Multiresolution tree networks [ECCV 18] Constructive solid geometry [CVPR 18] Learning 3D shapes with weak supervision [3DV 17]

Part 1 3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks Zhaoliang Lun Matheus Gadelha Evangelos Kalogerakis Subhransu Maji Rui Wang 3DV 2017

Part 1 3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks Zhaoliang Lun Matheus Gadelha Evangelos Kalogerakis Subhransu Maji Rui Wang 3DV 2017

Why line drawings? Simple & intuitive medium to convey shape! Image from Suggestive Contour Gallery, DeCarlo et al. 2003

Goal: 2D line drawings in, 3D shapes out! front view ShapeMVD side view 3D shape

Deep net architecture: U-net structure Feature representations in the decoder depend on previous layer & encoder s corresponding layer front view output view 1 side view output view 12 U-net: Ronneberger et al. 2015, Isola et al. 2016

Training: full loss Penalize per-pixel depth reconstruction error: & per-pixel normal reconstruction error: & unreal outputs: pixels pixels d d pred log P( real) gt (1 n n ) pred gt Discriminator Network Real? Fake? front view front view Generator Network output view 1 Discriminator Network Real? Fake? side view output view 12 cgan: Isola et al. 2016

Training data Character 10K models Chair 10K models Airplane 3K models Models from The Models Resource & 3D Warehouse

Training data Synthetic line drawings 12 views Training depth and normal maps

Test time Predict multi-view depth and normal maps! front view output view 1 side view output view 12

Multi-view depth & normal map fusion Optimization problem output view 1 Depth derivatives should be consistent with normals Corresponding depths and normals across different views should agree output view 12 Multi-view depth & normal maps Consolidated point cloud

Surface reconstruction output view 1 output view 12 Multi-view depth & normal maps Consolidated point cloud Surface reconstruction [Kazhdan et al. 2013]

Surface deformation output view 1 output view 12 Multi-view depth & normal maps Consolidated point cloud Surface reconstruction [Kazhdan et al. 2013] Surface fine-tuning [Nealen et al. 2005]

Surface deformation output view 1 output view 12 Multi-view depth & normal maps Consolidated point cloud Surface reconstruction [Kazhdan et al. 2013] Surface fine-tuning [Nealen et al. 2005]

Experiments

Qualitative Results reference shape reference shape nearest retrieval our result volumetric net nearest retrieval our result volumetric net

Qualitative Results reference shape reference shape nearest retrieval our result volumetric net nearest retrieval our result volumetric net

Quantitative Results Character (human drawing) Metric Our method Volumetric NN Hausdorff distance 0.120 0.638 0.242 Chamfer distance 0.023 0.052 0.045 normal distance 34.27 56.97 47.94 volumetric distance 0.309 0.497 0.550 Man-made (human drawing) Hausdorff distance 0.171 0.211 0.228 Chamfer distance 0.028 0.032 0.038 normal distance 34.19 48.81 43.75 volumetric distance 0.439 0.530 0.560

Single vs two input line drawings Single sketch Resulting shape Two sketches Resulting shape

More results

Part 2 Multiresolution Tree Networks for 3D Point Cloud Processing Matheus Gadelha Subhransu Maji Rui Wang ECCV 18

Part 2 Multiresolution Tree Networks for 3D Point Cloud Processing Matheus Gadelha Subhransu Maji Rui Wang ECCV 18

Point-cloud decoders Global shape basis or fully-connected decoders Requires perfect correspondence Morphable models [Figure from Booth et al., 16]

How important is correspondence? Gadelha et al., BMVC 17 Related work: Fan et al., CVPR 2017

How important is correspondence?

Multiresolution tree networks Addresses the lack of convolutional structure, and coarse to fine reasoning Basic idea: linearize 3D points and use 1D convolutions Points colored with kdtree sort index

Multiresolution tree networks Addresses the lack of convolutional structure, and coarse to fine reasoning Basic idea: linearize 3D points and use 1D convolutions Implicit mulitresolution structure

Multiresolution tree networks Architecture for encoding and decoding Multiresolution convolution block

Does multiresolution analysis help? fully-connected single-resolution multi-resolution Color indicates the position in the list

Other shape tasks with MRTNet

Single image shape reconstruction

Quantitative evaluation: ShapeNet dataset Chamfer distance: pred g GT / GT g pred voxel-based fully-conn. multiview Lin et al. AAAI 2018

Quantitative evaluation: ShapeNet dataset Chamfer distance: pred g GT / GT g pred voxel-based fully-conn. multiview

MRTNet summary A generic architecture for Point cloud classification (91.7% on ModelNet40) Semantic segmentation (see results in the paper) Generation Project page: http://mgadelha.me/mrt/index.html

Part 3 CSGNet: Neural Shape Parser for Constructive Solid Geometry Gopal Sharma Rishabh Goyal Difan Liu Evangelos Kalogerakis Subhransu Maji CVPR 18 interpretable and editable

Constructive 2D geometry

Constructive solid geometry

Constructive solid geometry Circle1 Triangle1 - Circle2 - Triangle2 - Triangle3 - [STOP] Postfix notation CNN RNN Program

Learning Supervised setting: learn to predict programs directly Unsupervised setting: No ground-truth programs. Learn parameters to minimize a reconstruction error through policy gradients [REINFORCE, Willams 1992] reconstruction loss CNN RNN Program

CSGNet: 2D/3D a programs

CSGNet: 2D/3D a programs Train on synthetic data and adapt to new domains using policy gradients Synthetic CAD shapes Logos

CSGNet: 2D/3D a programs How well does the nearest neighbor perform? Chamfer distance 1.88 (NN), 1.36 (CSGNet) with 675K training examples

CSGNet: 2D/3D a programs How well does the nearest neighbor perform? Chamfer distance 1.88 (NN), 1.36 (CSGNet) with 675K training examples

CSGNet: 2D/3D a programs CAD shapes dataset: Chamfer distance 1.94 (NN), 0.51 (CSGNet)

CSGNet: 2D/3D a programs Input More results in the paper: reward shaping, comparison to Faster R-CNN for primitive detection, results on 3D, etc. Preprint available: https://arxiv.org/abs/1712.08290

Part 4 Learning 3D Shape Representations with Weak Supervision Matheus Gadelha Subhransu Maji Rui Wang 3DV 2017

Related Work 3D shape from collection of images Visual hull same instance, known viewpoints Photometric stereo same instance, known lighting, simple reflectance Structure from motion same instance (or 3D) Non-rigid structure from motion known shape family (e.g., faces) Our work unknown shape family, unknown viewpoints 3D shape from single image Optimization-based approaches; Recognition-based approaches;

A motivating example Small cubes are reddish Big cubes are bluish Hypothesis: It is easier to generate these images by reasoning in 3D

Approach Our goal is to learn a 3D shape generator whose projections match the provided set of the views Generator Projection

Approach Our goal is to learn a 3D shape generator whose projections match the provided set of the views Generator Projection How do we match distributions?

Approach Our goal is to learn a 3D shape generator whose projections match the provided set of the views Generator Projection generated min G How do we match distributions? true D KL(G D) =min z G E apple log G(z) D(z)

Approach Our goal is to learn a 3D shape generator whose projections match the provided set of the views Generator Projection How do we match distributions? generated min G true D KL(G D) =min z G E apple log G(z) D(z) estimate using logistic regression

Approach Our goal is to learn a 3D shape generator whose projections match the provided set of the views Generator Projection generated min G min G How do we match distributions? true D KL(G D) =min z G E max apple log G(z) D(z) E x D [log d(x)] + E z G [log(1 estimate using logistic regression d Generative adversarial networks [Goodfellow et al.] d(z))]

PrGAN Generator maps z to a voxel occupancy grid and a viewpoint Projection using line integration along the view direction Z 1 I(x) =1 exp V (x + r)dr 0

Dataset generation

Airplanes input generated

Airplanes input generated

Vases input generated

Vases input generated

Mixed categories input generated

MMD metric: 2D-GAN 90.1, PrGAN 88.3

Trained with 3D supervision MMD metric 3D-GAN 347.5 PrGAN 442.9

Projection GAN The model is able to recover the coarse 3D structure But should use side information when available Viewpoint Landmarks / part labels Pose estimates Iterative: bootstrap 3D to estimate pose & viewpoint

Thank you! Collaborators: Matheus Gadelha, Zhaoliang Lun, Gopal Sharma, Rui Wang, Evangelos Kalogerakis Funding from NSF, NVIDIA, Facebook https://people.cs.umass.edu/smaji/projects.html