Imagining the Unseen: Stability-based Cuboid Arrangements for Scene Understanding

Similar documents
Imagining the Unseen: Stability-based Cuboid Arrangements for Scene Understanding

Separating Objects and Clutter in Indoor Scenes

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Support surfaces prediction for indoor scene understanding

3D-Based Reasoning with Blocks, Support, and Stability

Contexts and 3D Scenes

arxiv: v1 [cs.cv] 3 Jul 2016

Contexts and 3D Scenes

3D Reasoning from Blocks to Stability

Object Detection by 3D Aspectlets and Occlusion Reasoning

CS395T paper review. Indoor Segmentation and Support Inference from RGBD Images. Chao Jia Sep

Learning from 3D Data

Room Reconstruction from a Single Spherical Image by Higher-order Energy Minimization

Motion Tracking and Event Understanding in Video Sequences

Efficient Grasping from RGBD Images: Learning Using a New Rectangle Representation. Yun Jiang, Stephen Moseson, Ashutosh Saxena Cornell University

Detection III: Analyzing and Debugging Detection Methods

Finding Surface Correspondences With Shape Analysis

3D Deep Learning on Geometric Forms. Hao Su

Trademark Matching and Retrieval in Sport Video Databases

Urban Scene Segmentation, Recognition and Remodeling. Part III. Jinglu Wang 11/24/2016 ACCV 2016 TUTORIAL

Colored Point Cloud Registration Revisited Supplementary Material

Sports Field Localization

CS 558: Computer Vision 13 th Set of Notes

Multiple View Geometry

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting

arxiv: v3 [cs.cv] 18 Aug 2017

Seeing the unseen. Data-driven 3D Understanding from Single Images. Hao Su

Depth Estimation Using Collection of Models

Multi-View 3D Object Detection Network for Autonomous Driving

Segmentation. Bottom up Segmentation Semantic Segmentation

Learning based face hallucination techniques: A survey

PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding

IDE-3D: Predicting Indoor Depth Utilizing Geometric and Monocular Cues

arxiv: v1 [cs.cv] 28 Sep 2018

Exploiting Depth Camera for 3D Spatial Relationship Interpretation

3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding

Planning, Execution and Learning Application: Examples of Planning in Perception

Learning to Exploit Stability for 3D Scene Parsing

CHAPTER 3. Single-view Geometry. 1. Consequences of Projection

WP1: Video Data Analysis

Local cues and global constraints in image understanding

Presented at the FIG Congress 2018, May 6-11, 2018 in Istanbul, Turkey

P-CNN: Pose-based CNN Features for Action Recognition. Iman Rezazadeh

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching

Scene Understanding by Reasoning Stability and Safety

Correcting User Guided Image Segmentation

Detecting and Parsing of Visual Objects: Humans and Animals. Alan Yuille (UCLA)

Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Edge-Semantic Learning Strategy for Layout Estimation in Indoor Environment

From 3D Scene Geometry to Human Workspace

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material

arxiv: v1 [cs.cv] 23 Mar 2018

Adaptive Point Cloud Rendering

3D Scanning. Qixing Huang Feb. 9 th Slide Credit: Yasutaka Furukawa

Supplementary Materials for Learning to Parse Wireframes in Images of Man-Made Environments

Geometric-Edge Random Graph Model for Image Representation

Seminar Heidelberg University

Shape from Silhouettes I CV book Szelisky

3D Spatial Layout Propagation in a Video Sequence

Irradiance Gradients. Media & Occlusions

3D Scene Understanding by Voxel-CRF

Scalable Object Classification using Range Images

Data-driven Depth Inference from a Single Still Image

Symmetry Detection Competition

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction

FPM: Fine Pose Parts-Based Model with 3D CAD Models

Recurrent Transformer Networks for Semantic Correspondence

3DNN: 3D Nearest Neighbor

Volumetric Scene Reconstruction from Multiple Views

From 3D descriptors to monocular 6D pose: what have we learned?

Unfolding an Indoor Origami World

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Understanding Indoor Scenes using 3D Geometric Phrases

Tracking People. Tracking People: Context

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material

Learning Semantic Environment Perception for Cognitive Robots

Multi-view stereo. Many slides adapted from S. Seitz

Georgios (George) Papaioannou

Learning to Grasp Objects: A Novel Approach for Localizing Objects Using Depth Based Segmentation

BIL Computer Vision Apr 16, 2014

3D Shape Segmentation with Projective Convolutional Networks

GAN Related Works. CVPR 2018 & Selective Works in ICML and NIPS. Zhifei Zhang

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing

Learning to Segment Object Candidates

Let s start with occluding contours (or interior and exterior silhouettes), and look at image-space algorithms. A very simple technique is to render

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching

Structured Attention Networks

An Evaluation of Volumetric Interest Points

Understanding Bayesian rooms using composite 3D object models

Dense 3-D Reconstruction of an Outdoor Scene by Hundreds-baseline Stereo Using a Hand-held Video Camera

Multiview Reconstruction

Deformable Segmentation using Sparse Shape Representation. Shaoting Zhang

Video based Animation Synthesis with the Essential Graph. Adnane Boukhayma, Edmond Boyer MORPHEO INRIA Grenoble Rhône-Alpes

CS231A Midterm Review. Friday 5/6/2016

LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image

Amodal and Panoptic Segmentation. Stephanie Liu, Andrew Zhou

CS231A Course Notes 4: Stereo Systems and Structure from Motion

INDOOR 3D MODEL RECONSTRUCTION TO SUPPORT DISASTER MANAGEMENT IN LARGE BUILDINGS Project Abbreviated Title: SIMs3D (Smart Indoor Models in 3D)

Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction

Transcription:

: Stability-based Cuboid Arrangements for Scene Understanding Tianjia Shao* Aron Monszpart Youyi Zheng Bongjin Koo Weiwei Xu Kun Zhou * Niloy J. Mitra *

Background A fundamental problem for single view data acquisition Missing observations due to scene occlusion Limiting the understanding of indoor scenes From the capture view From another view

How do We Imagine Unseen Data?

How do We Imagine Unseen Data?

Our Motivation Mimic the process of human s imagination based on physical stability

Our Motivation Physical stability can help to reason about the scene structure Relationship among parts touching or fixed

Related Works: Proxy-based Scene Understanding Structured output of primitives [Li et al. 2011] [Lafarge et al. 2013] Creating abstracted geometry [Arikan et al. 2013] Studying spatial layout [Gupta et al. 2010] [Lee et al. 2010] [Hartley et al. 2012] Image manipulation [Zheng et al. 2012]

Related Works: Cuboid Detection Statistical deformable cuboid model [Fidler et al. 2012] Cuboid corner point model [Xiao et al. 2012] Image space contrast-based features [Hedau et al. 2012]

Related Works: Physical Validity Constraints Penetration free [Hedau et al. 2010] [Jiang and Xiao 2013] Improve segmentation [Jia et al. 2013] Voxel-based scene parsing [Zheng et al. 2013] [Kim et al. 2013]

Our Goal Recover the underlying structure of indoor scenes Abstracting indoor scenes as collections of cuboids Hallucinate geometry in the occluded regions Identifying part parameters (size and orientation) Identifying part relations (touching or fixed)

Algorithm Overview

Algorithm Overview Input scan Initial cuboids Optimized cuboids + recovered structure

Algorithm Overview Input scan Initial cuboids Optimized cuboids + recovered structure

A B C D E G Inferring Geometry and Relations Key observation: relations are coupled with geometry G supports A & E; E supports B & C & D G supports E; E supports A & C & D; C supports B Optimized geometry Optimized geometry

Inferring Geometry and Relations Encode the discrete interaction among the cuboids as a multiconnection graph G V, E V: cuboids; E: contact types (and corresponding cuboid extensions) a b e ab 2 a e ab 1 e ab 0 b e ab 3

Inferring Geometry and Relations Optimization formulation Edge selection problem x ij k = 1: edge type e ij k selected; x ij k = 0: not selected. Goal: Physically stable arrangement of cuboids with minimal number of fixed contacts min xij k #(e ij k = fixed joint) Constraints: k x ij k = 1 e ab 2 a e ab 1 e ab 0 b e ab 3

Inferring Geometry and Relations Pruning the solution space Pre-pruning Penetration check Visibility check

Inferring Geometry and Relations Pruning the solution space Pre-pruning Penetration check Visibility check Branch-and-bound Most stable

Inferring Geometry and Relations Assessing physical stability Local reasoning is not enough

Inferring Geometry and Relations Measuring global stability through static equilibrium The net force and torque should be zero Decompose the contact forces to compression forces and friction forces E s A min f Df + w 2 s. t. f i n 0 and f i u, fi v < μfi n f 1 f n 2 f v 2 f 2 w f u 2 f 3

Overview Ground truth Metrics Robustness Validity Applications

Overview Ground truth Metrics Robustness Validity Applications

Evaluation Ground Truth

Evaluation Ground Truth 20 scenes High occlusion 3 dimensions measured by hand

Evaluation Ground Truth Fixed Touching 20 scenes High occlusion 3 dimensions measured by hand 700 scenes (NYU2) Medium occlusion Cuboids, floors, walls Support graph Initialization

Evaluation Ground Truth Fixed Touching 20 scenes High occlusion 3 dimensions measured by hand 700 scenes (NYU2) Medium occlusion Online: 720 scenes, annotator tool Cuboids, floors, walls Support graph Initialization

Overview Ground truth Metrics Robustness Validity Applications

Evaluation Metrics Approximation error L 1 -norm

Evaluation Metrics Fixed Touching Approximation error L 1 -norm Correctness of structure graph F 1 score

Evaluation Approximation error Initial

Evaluation Approximation error Initial Ground truth {err i init }

Evaluation Approximation error Initial Ground truth Optimized {err i init } {err i optimized }

Evaluation Approximation error Initial Ground truth Optimized {err i init } {err i optimized } err i init err i optimized >? 5%

Evaluation Approximation error Initial Ground truth Optimized {err i init } {err i optimized } err i init err i optimized >? 5% 35% 3%

RGB(-D) Evaluation F 1 score Ground Truth Fixed Touching

RGB(-D) Evaluation F 1 score Ground Truth Output Fixed Touching

RGB(-D) Evaluation F 1 score Ground Truth Output Fixed Touching

RGB(-D) Evaluation F 1 score Precision Recall TP TP + FP TP TP + FN F 1 200 Precision Recall Precision + Recall Ground Truth Output Fixed Touching

Evaluation F 1 score RGB(-D) #Scenes Initial Optimized Synthetic 12 32.2 87.2 NYU2 700 40.0 60.5 Own 20 56.5 95.9 Kinfu 2 32.1 89.7 Ground Truth Output Fixed Touching

Evaluation F 1 score RGB(-D) Initialization Ground Truth Fixed Touching

Evaluation F 1 score RGB(-D) Initialization Ground Truth Optimized Fixed Touching

Evaluation F 1 score RGB(-D) Initialization Ground Truth Optimized Fixed Touching

Overview Ground truth Metrics Robustness Validity Applications

Robustness to viewpoint 1 Viewpoint 1 Fixed Touching

Robustness to viewpoint 1 Viewpoint 1 Viewpoint 2 Fixed Touching 2

Robustness to scene complexity Input scene

Robustness to scene complexity Input cloud (back view)

Robustness to scene complexity Initialization

Robustness to scene complexity Fixed Touching Initialization Optimized Structure graph

Robustness to scene complexity Fixed Touching Initialization Optimized Structure graph

Overview Ground truth Metrics Robustness Validity Applications

Evaluation Validity RGB-D input Initial (side view)

Evaluation Validity RGB-D input Initial (side view) Optimized (side view)

Evaluation Validity Less occluded scan

Evaluation Validity Less occluded scan + Optimized

Evaluation Validity Less occluded scan + Optimized Extra proxies removed Front view

Overview Ground truth Metrics Robustness Validity Applications

Applications #1 Model retrieval

Applications #1 Model retrieval

Applications #1 Model retrieval

Applications #2 Re-arrangement

Applications #2 Re-arrangement

Applications #3 - Completion RGB-D Input

Applications #3 - Completion RGB-D Input Physically stable approximation

Applications #3 - Completion RGB-D Input Completed models

Applications #3 - Completion RGB-D Input Completed models

Limitations heterogeneous [http://azfoo.net/places/az/mesa/cemetery] [GravityGlue.com]

Limitations Fixed Touching [http://azfoo.net/places/az/mesa/cemetery] [GravityGlue.com]

Code + Data:

Robustness to initialization No perturbation 5 10