Layered Scene Decomposition via the Occlusion-CRF Supplementary material


Chen Liu 1    Pushmeet Kohli 2    Yasutaka Furukawa 1
1 Washington University in St. Louis    2 Microsoft Research Redmond

1. Additional experimental results

All additional experimental results are in Figure 1, Figure 2, Figure 3, and Figure 4. See the captions for details.

Figure 1. More experimental results on our datasets.

Figure 2. More experimental results on our datasets.

Figure 3. More experimental results on the NYU V2 dataset.

Figure 4. More experimental results on the NYU V2 dataset.

Figure 5. In the main paper, we compared our Fusion Space (FS) algorithm against the Fusion Move (FM) algorithm using aggregate statistics. This figure shows qualitative comparisons of the two methods. The first column shows the input image and the depth image; the remaining columns compare the FS (top) and FM (bottom) methods.

Figure 6. Typical failure cases: (1) When there are many objects in a scene, our method may produce many small surfaces that fail to represent objects in a meaningful way. (2) Our method captures most background structures accurately, but may fail when the background is complex and contains windows, open doors, or large free space. (3) Our method sometimes fails to capture small objects accurately, for example the bottle on the table and the foot of the chair. (4) In some cases, our method produces many unnecessary surfaces due to an imbalance between the data term and the MDL term.

2. Data term details

Our data term consists of the following four terms:

E_data(F) = Σ_{p∈I} [ λ_depth · E_depth(f_p) + λ_norm · E_norm(f_p) + λ_color · E_color(f_p) + E_order(f_p) ].

We provide the details of the first three terms in this supplementary document.

For the first (depth) term, we compute the difference between the model depth at the first non-empty layer and the input depth along the model surface normal. Denoting this depth difference (in meters) by d, E_depth(f_p) is set to 1 − exp(−d² / (2σ_d²)). σ_d = 0.1m and λ_depth = 2000 are used throughout the experiments. If the first non-empty surface is at the backgroundmost layer, we tolerate a small amount of fitting error (0.05m) and use max(d − 0.05, 0) as the depth deviation, to prefer smooth surfaces in the background.

The second (normal) term is simply the angle between the model normal and the input normal. λ_norm = 200 is used.

The color term measures the likelihood of observing a pixel color given a color model for the segment. For each segment (when generated in a proposal), we train a mixture-of-Gaussians model with two clusters. To alleviate the effects of lighting and exposure variations, we convert RGB colors into the HSV space and ignore the V channel. The color term is a truncated linear function of the negative log-likelihood, with the truncation at 20. λ_color = 10 is used.

3. More proposal designing schemes

Single surface expansion proposal: This proposal seeks to expand a surface to explain the input data more accurately, or to inpaint invisible surfaces to minimize regularization penalties. Given a current surface segment, we have two mechanisms to specify the solution space. For each pair of a surface and a mechanism, we update the model. We would like to merge the solution spaces and update the model in far fewer steps, but the convergence became poor in our experiments.
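The per-pixel data term of Section 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and scalar inputs (a per-pixel depth difference and a precomputed negative log-likelihood) are our own assumptions, while the weights, truncation value, and slack come from the text above. E_order is omitted.

```python
import numpy as np

# Constants reported in the text.
SIGMA_D = 0.1          # depth noise scale, meters
LAMBDA_DEPTH = 2000.0
LAMBDA_NORM = 200.0
LAMBDA_COLOR = 10.0
COLOR_TRUNCATION = 20.0
BG_SLACK = 0.05        # tolerated fitting error in the backgroundmost layer, meters

def depth_term(model_depth, input_depth, backgroundmost=False):
    """E_depth = 1 - exp(-d^2 / (2 sigma_d^2)) for depth difference d (meters)."""
    d = abs(model_depth - input_depth)
    if backgroundmost:
        d = max(d - BG_SLACK, 0.0)  # prefer smooth surfaces in the background
    return 1.0 - np.exp(-d * d / (2.0 * SIGMA_D ** 2))

def normal_term(model_normal, input_normal):
    """E_norm = angle (radians) between the model and input unit normals."""
    return float(np.arccos(np.clip(np.dot(model_normal, input_normal), -1.0, 1.0)))

def color_term(neg_log_likelihood):
    """E_color = truncated linear function of the negative log-likelihood."""
    return min(neg_log_likelihood, COLOR_TRUNCATION)

def data_term(model_depth, input_depth, model_normal, input_normal,
              neg_log_likelihood, backgroundmost=False):
    """Weighted sum of the first three terms (E_order omitted in this sketch)."""
    return (LAMBDA_DEPTH * depth_term(model_depth, input_depth, backgroundmost)
            + LAMBDA_NORM * normal_term(model_normal, input_normal)
            + LAMBDA_COLOR * color_term(neg_log_likelihood))
```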
Given a surface s ∈ S in the model, the first mechanism simply allows this surface to be used at all the pixels in all the layers. The second mechanism seeks to expand this surface behind other surfaces in the same layer, while moving the other surfaces to frontal layers. We allow other surfaces in the same layer to be at the same pixel locations in frontal layers, while allowing s to be at all the pixels in this layer.

Backward merging proposal: The first background hull proposal may fail to recover the proper background structure, and this proposal is specifically designed to improve the background layer by merging more surfaces. For every surface that is not in the background layer, we allow it at the same pixels in the background layer, while excluding impossible pixels where the surface depth is in front of the input depth or behind the current background layer. We grow the region as before.

Structure expansion proposal: One limitation of the single surface expansion is that the proposal expands only one surface. This proposal seeks to expand two surfaces simultaneously, and is more powerful in recovering the structures of large objects such as tables and chairs that consist of more than one surface. This proposal is conducted only in the middle layers, where large objects likely reside. We exhaustively pick pairs of surfaces in a middle layer that share a common boundary and are orthogonal to each other with a tolerance of 20°. We compute the score of each combination using the metric used in the smooth hull proposal, and randomly pick a candidate pair with probability proportional to the score. We grow each surface as before. We allow pixels in the layer to take either the current label or the predicted structure. We also allow current labels to be in frontal layers to move objects to the front, as in the background hull proposal.
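The pair selection in the structure expansion proposal could be sketched as below. The helper names and the surface representation (unit normals as NumPy vectors) are illustrative assumptions; only the 20° tolerance and the score-proportional sampling come from the text.

```python
import numpy as np

TOLERANCE_DEG = 20.0  # orthogonality tolerance reported in the text

def is_orthogonal(n1, n2, tol_deg=TOLERANCE_DEG):
    """True when the angle between two unit normals is within tol_deg of 90 degrees."""
    # abs() ignores normal orientation, folding the angle into [0, 90] degrees
    angle = np.degrees(np.arccos(np.clip(abs(float(np.dot(n1, n2))), 0.0, 1.0)))
    return angle >= 90.0 - tol_deg

def sample_pair(pairs, scores, rng=None):
    """Pick one candidate pair with probability proportional to its score."""
    rng = rng or np.random.default_rng(0)
    p = np.asarray(scores, dtype=float)
    return pairs[rng.choice(len(pairs), p=p / p.sum())]
```

For example, two surfaces of a table (top and leg) with normals (0, 0, 1) and (1, 0, 0) pass the orthogonality test, so the pair becomes a candidate for simultaneous expansion.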
4. Proposal designing for the Fusion Move method

We design proposals for the Fusion Move (FM) method in the same manner to make a fair comparison. The key difference is that for the FM approach, only one new label is allowed for each pixel. So here we explain the modification required for each proposal.

Surface adding proposal: For the surface adding proposal, we allow each surface to grow to cover more pixels in order to better find the extent of a surface. For FM, however, a surface cannot grow to a pixel that is already occupied by another surface, so the extent of growth is limited.

Background hull proposal: We first extract the background hull in the same way. The difference is that our FS approach can allow other surfaces to appear in any frontal layer or to be removed, but the FM approach cannot afford this because only one new label is allowed. So we always move other surfaces one layer to the front. This difference is critical, as the lack of freedom makes frontal surfaces reluctant to move.

Segment refitting proposal: Our FS approach can refit multiple new surfaces for one surface at the same time, and in our layer representation, different layers can choose either a refitted surface or the original one independently. For the FM approach, as only one new label is allowed, we refit one new surface for each surface, and a pixel has to choose either using the new surfaces for all layers or using the original surfaces for all layers. The growth of surfaces is also limited, as in the surface adding proposal.

Single surface expansion proposal: For the FM approach, only one mechanism is allowed for the single surface expansion proposal because of the label space limitation: the chosen surface is expanded in its layer, and other surfaces in the same layer are moved one layer to the front.

Layer swap proposal: This proposal is limited by the FM approach. Now each surface has only one new layer to choose: surfaces in the first layer have the second layer as their option, while other surfaces can choose to be moved one layer to the front. Also, a pixel has to choose either moving all surfaces or moving none.

Backward merging proposal: The limitation of FM has little impact on this proposal, as surfaces can still choose whether to be moved to the background layer. The difference is that the growth of surfaces in the background layer is limited, as in the surface adding proposal.

Structure expansion proposal: Similar to the background hull proposal, we can expand a structure as we did for our FS approach, but we can only move other surfaces in the same layer one layer to the front, which severely affects the efficiency of this proposal.

In summary, for the FM approach all these proposals are limited by the fact that only one new label is allowed for each pixel, and they become less effective for our layer decomposition task. As Table 1 in our main paper shows, for the FM approach the energy stays at a high value and cannot be decreased further by any proposal, even though a much lower value is reachable by our FS approach.

5. Background hull construction algorithm

Given a set of planar surface segments in an image, we compute a background hull by combining them. Each surface carries information such as its 3D geometry parameters and the set of pixels it covers in the image, so we can calculate the depth of pixel p when it is assigned to surface f, which we denote as d_p^f. We denote the set of image pixels f covers as P_f. See Alg. 1 for the algorithm.

Note that this is a greedy algorithm based on a simple heuristic: the convexity between two surfaces should be preserved. To determine convexity, we extend each surface and check occlusion in the image region the other surface covers. See Fig. 7. If two surfaces f_1 and f_2 form a convex structure, then the extended part of f_1 will occlude f_2 in the region covered by f_2, and likewise for f_2. If instead f_1 and f_2 form a concave structure, then each extended part is occluded by the other surface. So we determine convexity between two surfaces by counting how many of the pixels a surface covers are occluded by, or occluding, the other surface's extended part.

Figure 7. Left: Two surfaces are convex; the extended part of one surface occludes the other surface. Right: Two surfaces are concave; the extended part of one surface is occluded by the other surface.

Once we determine convexity between every pair of surfaces, for each pixel in R we iterate over all surfaces in S. Each time, we choose either the corresponding surface or the currently selected surface based on the convexity between them. This algorithm might not find the proper background hull in some extreme cases, but it works well in practice in our experiments, and our FS method is robust against wrong background hulls. This algorithm is also used for the structure construction applied in the structure expansion proposal.
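The per-pixel depth d_p^f can be evaluated by intersecting the pixel's viewing ray with the surface's plane. The sketch below assumes a plane given as n · X = o in camera coordinates and a back-projected ray for the pixel; this representation and the helper name are our assumptions, not notation from the paper.

```python
import numpy as np

def plane_depth(normal, offset, ray):
    """Distance along `ray` to the plane {X : normal . X = offset}.

    Solves normal . (t * ray) = offset for t, giving the depth d_p^f of the
    pixel whose viewing ray is `ray` under the planar surface (normal, offset).
    """
    denom = float(np.dot(normal, ray))
    if abs(denom) < 1e-9:
        return float("inf")  # viewing ray parallel to the plane
    return offset / denom
```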

Algorithm 1: Background Hull Construction
input : a set of surfaces S and the set of pixels R in the ROI
output: an assignment H of surfaces in S to pixels in R

for f_1 in S do
    for f_2 in S \ {f_1} do
        Convexity[f_1, f_2] ← calcConvexity(f_1, f_2)
for p in R do
    f_h ← empty
    for f in S do
        if f_h = empty then
            f_h ← f
        else if Convexity[f, f_h] = true then
            if d_p^f > d_p^{f_h} then f_h ← f
        else
            if d_p^f < d_p^{f_h} then f_h ← f
    H[p] ← f_h
return H

Func calcConvexity(f_1, f_2):
    num_occluding_pixels ← 0
    num_occluded_pixels ← 0
    for p in P_{f_1} do
        if d_p^{f_2} < d_p^{f_1} then
            num_occluding_pixels ← num_occluding_pixels + 1
        else
            num_occluded_pixels ← num_occluded_pixels + 1
    for p in P_{f_2} do
        if d_p^{f_1} < d_p^{f_2} then
            num_occluding_pixels ← num_occluding_pixels + 1
        else
            num_occluded_pixels ← num_occluded_pixels + 1
    return num_occluding_pixels > num_occluded_pixels
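Algorithm 1 could be implemented as the following sketch. It assumes a simplified surface representation (a dict holding the pixel set P_f and a precomputed extended depth map over the ROI, so d_p^f is a lookup, with smaller depth meaning closer to the camera); that representation is our assumption, while the greedy rule and the convexity test follow the pseudocode.

```python
def calc_convexity(f1, f2):
    """Convex iff the surfaces' extended parts mostly occlude each other."""
    occluding = occluded = 0
    for p in f1["pixels"]:                   # region P_{f1} covered by f1
        if f2["depth"][p] < f1["depth"][p]:  # extended f2 lies in front of f1
            occluding += 1
        else:
            occluded += 1
    for p in f2["pixels"]:                   # region P_{f2} covered by f2
        if f1["depth"][p] < f2["depth"][p]:
            occluding += 1
        else:
            occluded += 1
    return occluding > occluded

def background_hull(surfaces, roi):
    """Greedy per-pixel assignment of surface indices preserving pairwise convexity."""
    n = len(surfaces)
    convexity = {(i, j): calc_convexity(surfaces[i], surfaces[j])
                 for i in range(n) for j in range(n) if i != j}
    hull = {}
    for p in roi:
        best = None
        for i, f in enumerate(surfaces):
            if best is None:
                best = i
            elif convexity[(i, best)]:
                # convex pair: the background hull keeps the farther surface
                if f["depth"][p] > surfaces[best]["depth"][p]:
                    best = i
            # concave pair: the background hull keeps the closer surface
            elif f["depth"][p] < surfaces[best]["depth"][p]:
                best = i
        hull[p] = best
    return hull
```

For example, two walls meeting at a room corner form a concave pair, so the hull keeps the closer plane at every pixel, which is the visible wall.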