CS395T paper review. Indoor Segmentation and Support Inference from RGBD Images. Chao Jia Sep

Similar documents
CS381V Experiment Presentation. Chun-Chen Kuo

Multi-view Stereo. Ivo Boyadzhiev CS7670: September 13, 2011

Contexts and 3D Scenes

Separating Objects and Clutter in Indoor Scenes

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Contexts and 3D Scenes

3D Reasoning from Blocks to Stability

Segmentation. Bottom up Segmentation Semantic Segmentation

Classification and Detection in Images. D.A. Forsyth

Midterm Examination CS 534: Computational Photography

OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE

Chapter 3 Image Registration. Chapter 3 Image Registration

Automatic Photo Popup

Applied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University

Recurrent Convolutional Neural Networks for Scene Labeling

Data-driven Depth Inference from a Single Still Image

Support surfaces prediction for indoor scene understanding

Indoor Object Recognition of 3D Kinect Dataset with RNNs

3D-Based Reasoning with Blocks, Support, and Stability

Learning video saliency from human gaze using candidate selection

Feature Descriptors. CS 510 Lecture #21 April 29 th, 2013

CRF Based Point Cloud Segmentation Jonathan Nation

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction

Exploiting Depth Camera for 3D Spatial Relationship Interpretation

Conditional Random Fields as Recurrent Neural Networks

LEARNING BOUNDARIES WITH COLOR AND DEPTH. Zhaoyin Jia, Andrew Gallagher, Tsuhan Chen

Object Detection by 3D Aspectlets and Occlusion Reasoning

Lecture 13 Segmentation and Scene Understanding Chris Choy, Ph.D. candidate Stanford Vision and Learning Lab (SVL)

Segmentation as Selective Search for Object Recognition in ILSVRC2011

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

Correcting User Guided Image Segmentation

Classification of objects from Video Data (Group 30)

Learning to Segment Object Candidates

Region-based Segmentation and Object Detection

Uncertainties: Representation and Propagation & Line Extraction from Range data

Room Reconstruction from a Single Spherical Image by Higher-order Energy Minimization

Superpixel Segmentation using Depth

Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation

Tri-modal Human Body Segmentation

Depth Estimation from a Single Image Using a Deep Neural Network Milestone Report

Instance-aware Semantic Segmentation via Multi-task Network Cascades

Semantic Parsing for Priming Object Detection in RGB-D Scenes

Segmentation and Grouping

Single-view 3D Reconstruction

Segmentation and Tracking of Partial Planar Templates

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Supplementary Material: Piecewise Planar and Compact Floorplan Reconstruction from Images

Multi-Class Segmentation with Relative Location Prior

Undirected Graphical Models. Raul Queiroz Feitosa

CS 558: Computer Vision 13 th Set of Notes

Learning Depth-Sensitive Conditional Random Fields for Semantic Segmentation of RGB-D Images

Object Segmentation and Tracking in 3D Video With Sparse Depth Information Using a Fully Connected CRF Model

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University

3 October, 2013 MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos

Learning Semantic Environment Perception for Cognitive Robots

Towards Spatio-Temporally Consistent Semantic Mapping

Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University.

Computer Vision I. Dense Stereo Correspondences. Anita Sellent 1/15/16

Applications. Foreground / background segmentation Finding skin-colored regions. Finding the moving objects. Intelligent scissors

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

Learning Spatial Context: Using Stuff to Find Things

Automatic Colorization of Grayscale Images

Learning and Recognizing Visual Object Categories Without First Detecting Features

Indoor Scene Understanding with RGB-D Images

Supervised Learning for Image Segmentation

Estimating Human Pose in Images. Navraj Singh December 11, 2009

Multi-Camera Calibration, Object Tracking and Query Generation

Multi-View 3D Object Detection Network for Autonomous Driving

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

4/13/ Introduction. 1. Introduction. 2. Formulation. 2. Formulation. 2. Formulation

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Segmentation and Grouping April 19 th, 2018

08 An Introduction to Dense Continuous Robotic Mapping

Automatic Indoor 3D Scene Classification using RGB-D data

Fast Semantic Segmentation of RGB-D Scenes with GPU-Accelerated Deep Neural Networks

Articulated Pose Estimation with Flexible Mixtures-of-Parts

Seminar Heidelberg University

Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing

Geometric and Semantic 3D Reconstruction: Part 4A: Volumetric Semantic 3D Reconstruction. CVPR 2017 Tutorial Christian Häne UC Berkeley

Outline. Segmentation & Grouping. Examples of grouping in vision. Grouping in vision. Grouping in vision 2/9/2011. CS 376 Lecture 7 Segmentation 1

Motion and Tracking. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE)

Fully Convolutional Network for Depth Estimation and Semantic Segmentation

Robust Segmentation of Cluttered Scenes Using RGB-Z Images

Methods for Automatically Modeling and Representing As-built Building Information Models

Rich feature hierarchies for accurate object detection and semant

Multiple View Geometry

IDE-3D: Predicting Indoor Depth Utilizing Geometric and Monocular Cues

Natural Language Processing

Learning the Ecological Statistics of Perceptual Organization

Machine Learning Applications in Exploration and Mining

Depth from Stereo. Dominic Cheng February 7, 2018

Automatic Tracking of Moving Objects in Video for Surveillance Applications

Structured Models in. Dan Huttenlocher. June 2010

Selective Search for Object Recognition

CS 231A Computer Vision (Winter 2014) Problem Set 3

Transcription:

CS395T paper review Indoor Segmentation and Support Inference from RGBD Images Chao Jia Sep 28 2012

Introduction What do we want -- Indoor scene parsing Segmentation and labeling Support relationships Different colors show different kinds of objects; Support relationships help understand the scene and interact with scene elements.

Introduction What do we have Color image Depth image (3D coordinates) How 3D cues can best inform a structured 3D interpretation Dataset with 1449 densely labeled images

scene structure region segmentation supporting relationships General Steps How 3D cues help scene interpretation Integer programming formulation

scene structure region segmentation supporting relationships Scene Structure Modeling Align the room with the 3 principle directions Compute 3D lines and surface normals Find the most probable X-Y-Z axis Segment the visible regions into 3D planes Propose 3D planes using RANSAC Segment the image into the proposed planes

scene structure region segmentation supporting relationships Aligning to Room Coordinates Preparation using 3D coordinates Straight line segments 3D surface normals at each pixel Propose candidates (100-200) All the straight 3D lines Mean-shift modes of surface normals Search for the most probable X-Y-Z triple Random sample a triple, compute the score Manhattan world assumption Choose the triple with highest score Warp the image to align with principle directions

scene structure region segmentation supporting relationships Proposing and Segmenting Planes Generating potential planes Sample the grid of pixel and propose planes (>2500 inliers) Assign each pixel a label to a certain plane Latent variables to infer: plane label Observable variables: 3D coordinates, RGB intensities, surface normals Conditional random field modeling solved by graph cuts 3D coordinates unary term surface normals pairwise term RGB intensities

scene structure region segmentation supporting relationships Proposing and Segmenting Planes Unary term Geometrically validate the labels _ from RANSAC plane proposing Pairwise term smoothness weighed by RGB intensity difference

scene structure region segmentation supporting relationships Segmentation Oversegmentation into superpixels Boundaries detection from RGB intensities Force consistency with 3D planes regions Iterative merging of regions Regions with minimum boundary strength are merged Boundary strength: Trained boosted decision tree classifier y: labels of regions x: paired regions features

scene structure region segmentation supporting relationships Segmentation Paired region features RGB features: crucial for nearby or touching objects 3D features (plane labels, surface normals, depth): help differentiate between texture and object edges

scene structure region segmentation supporting relationships Modeling Support Relationships Variables to infer for each region ( R regions in total) the support region supported by other regions supported by an invisible region not supported (ground) supported from below/behind structure class 1: Ground 2: Furniture (large objects that cannot be carried) 3: Prop (small objects that can be easily carried) 4: Structure (walls, ceiling, columns)

scene structure region segmentation supporting relationships Modeling Support Relationships Energy minimization Factorize posterior distribution Final problem likelihood + factorization Prior likelihood + factorization Prior

scene structure region segmentation supporting relationships Modeling Support Relationships Prior term Transition prior (supporting relationship between two structure classes) Support consistency (between 3D structure and support relationship) which combination is more likely Global ground consistency Everything is above floor Ground consistency No support for floor

scene structure region segmentation supporting relationships Modeling Support Relationships Likelihood term support relation classifier structure classifier support features proximity, containment, characteristics of supporting objects, absolute 3D locations of candidate objects structure features SIFT features, color histogram, (object classification) Classifiers trained by logistic regression

scene structure region segmentation supporting relationships Modeling Support Relationships Introduce Boolean indicator variables: Problem is linearized! Integer programming à relax the integrality constraints

Experiments Segmentation evaluation measured as average overlap over ground truth regions for best-matching segmented region

Support Relationships Evaluation Evaluate proposed inference model against Image plane rules (no structure class assignment) Structure class rules (class assignment by trained classifier) Support classifier (no structure class assignment; infer the support relationship between every pair of regions) Metric Percentage of correct supports

Support Relationships Evaluation

Experiments Structure class prediction evaluation only slightly better than local classification

More results Using ground-truth segmentation

More results Using proposed segmentation

Summary Pros 3D features (planes, surface normals, 3D coordinates) help segmentation and support relationship inference Globally infer the support relationships with high accuracy (50% - 70%) Cons Too many functions based on training ---- training time and training data size What is a good factorization of the posterior distribution in inference of support relationships ---- Are structure class features and support features really separable? Should we consider more kinds of objects instead of just props (to make features more distinguishable)?