Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients
Authors: Zhile Ren, Erik B. Sudderth
Presented by: Shannon Kao, Max Wang
October 19, 2016

Introduction
Given an image of a realistic indoor scene, how do you classify the objects in it?

Goals
Indoor scene understanding: develop new representations and algorithms for 3D object detection and spatial layout prediction in cluttered indoor scenes
Main challenge: images of indoor (home or office) environments are typically highly cluttered and have substantial occlusions

Previous Work: CAD Detection
CAD models are used to learn object shapes and alternative viewpoints, but:
  The supply of available models is limited
  Models do not cover different classes of the same object type
  Computationally inefficient
References:
  M. Aubry, D. Maturana, A. Efros, B. Russell, and J. Sivic. Seeing 3D chairs: Exemplar part-based 2D-3D alignment using a large dataset of CAD models. In CVPR, 2014.
  J. J. Lim, H. Pirsiavash, and A. Torralba. Parsing IKEA objects: Fine pose estimation. In ICCV, 2013.
  J. J. Lim, A. Khosla, and A. Torralba. FPM: Fine pose parts-based model with 3D CAD models. In ECCV, pages 478-493. Springer, 2014.

Previous Work: Layout Proposal
Manhattan structure to infer 2D projections of the 3D structure
Integral representation to explore exponentially many layout proposals
Previous work focused on restricted environments and does not generalize to cluttered scenes
Manual heuristics to reduce scene-parsing false positives
Scalability issues
References:
  J. M. Coughlan and A. L. Yuille. Manhattan world: Compass direction from a single image by Bayesian inference. In ICCV, volume 2, pages 941-947. IEEE, 1999.
  Z. Wu, S. Song, A. Khosla, X. Tang, and J. Xiao. 3D ShapeNets for 2.5D object recognition and next-best-view prediction. arXiv preprint arXiv:1406.5670, 2014.
  S. Song and J. Xiao. Sliding shapes for 3D object detection in depth images. In ECCV, pages 634-651. Springer, 2014.

Proposed Solution
Representation
  Geometric features
  Clouds of oriented gradients (COG)
  Novel Manhattan voxel structure (layout)
Training
  Structured SVM (on cuboid + layout)
  Cascaded classification framework

RGB-D to Voxel Features

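Geometric Features
Point cloud density: 3D density feature $\phi^a_{i\ell} = N_{i\ell} / A_{i\ell}$ for voxel $\ell$ of cuboid $i$, where $N_{i\ell}$ is the number of points in the voxel and $A_{i\ell}$ is the area of the voxel's projection in the image
3D normal orientations: find the normal orientation for each 3D point using a plane fit to its 15 nearest neighbors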
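The plane-fit step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `estimate_normals`, the brute-force neighbor search, and the choice to fit the plane to each point together with its k nearest neighbors are all assumptions made here.

```python
import numpy as np

def estimate_normals(points, k=15):
    """Estimate one unit normal per 3D point from a plane fit to its neighborhood."""
    points = np.asarray(points, dtype=float)
    # Brute-force pairwise squared distances; adequate for a small illustrative cloud.
    diffs = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diffs, diffs)
    normals = np.empty_like(points)
    for i in range(len(points)):
        # Assumption: the neighborhood is the point itself plus its k nearest neighbors.
        nbrs = np.argsort(dist2[i])[: k + 1]
        centered = points[nbrs] - points[nbrs].mean(axis=0)
        # The fitted plane's normal is the direction of least variance,
        # i.e. the right singular vector with the smallest singular value.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        normals[i] = vt[-1]
    return normals
```

The sign of each normal is arbitrary in this sketch; a real pipeline would typically flip normals to face the camera before building orientation histograms.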

Clouds of Oriented Gradients (COG)
Computes gradients on the RGB channels of the 2D image by applying gradient filters
The maximum responses across color channels are the gradients $(d_x, d_y)$ in the x and y directions, with magnitude $\sqrt{d_x^2 + d_y^2}$
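A small sketch of the per-pixel gradient step, under two assumptions that the slide does not confirm: the filter is a centered difference ([-1, 0, 1] and its transpose, as in standard HOG), and "maximum response" means keeping the channel with the largest gradient magnitude at each pixel. The function name `max_channel_gradients` is hypothetical.

```python
import numpy as np

def max_channel_gradients(image):
    """Return (dx, dy, magnitude) for an H x W x 3 image, using the strongest channel per pixel."""
    img = np.asarray(image, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # Centered differences (assumed [-1, 0, 1] filter); border pixels stay at zero.
    gx[:, 1:-1, :] = img[:, 2:, :] - img[:, :-2, :]
    gy[1:-1, :, :] = img[2:, :, :] - img[:-2, :, :]
    mag = np.sqrt(gx ** 2 + gy ** 2)
    # For each pixel, pick the color channel whose gradient magnitude is largest.
    best = mag.argmax(axis=2)[..., None]
    dx = np.take_along_axis(gx, best, axis=2)[..., 0]
    dy = np.take_along_axis(gy, best, axis=2)[..., 0]
    return dx, dy, np.hypot(dx, dy)
```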

Clouds of Oriented Gradients (COG)
Uses nine 3D orientation bins from 0 to 180 degrees
Uses perspective projection to find the corresponding 2D bin boundaries

COG Normalization and Aliasing
Bilinearly interpolates gradient magnitudes between neighboring orientation bins
Normalizes each histogram using a small $\epsilon > 0$
Dimension of COG: $6^3 \times 9 = 1944$
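The interpolation and normalization can be illustrated with a simplified sketch. It assumes nine uniform 2D bins of 20 degrees each (COG itself projects 3D bin boundaries into the image, so the real boundaries vary per voxel) and an L2 normalization with epsilon under the square root, which is a common HOG-style choice rather than a formula recovered from the slide. `soft_bin` and `normalize` are hypothetical names.

```python
import math

def soft_bin(angle_deg, magnitude, num_bins=9):
    """Split one gradient's magnitude between the two nearest unsigned-orientation bins."""
    width = 180.0 / num_bins
    # Position in units of bin width, measured from bin centers (bin b is centered at (b + 0.5) * width).
    pos = (angle_deg % 180.0) / width - 0.5
    lower = math.floor(pos)
    frac = pos - lower
    hist = [0.0] * num_bins
    # Unsigned orientations wrap around: 180 degrees is the same direction as 0.
    hist[lower % num_bins] += magnitude * (1.0 - frac)
    hist[(lower + 1) % num_bins] += magnitude * frac
    return hist

def normalize(hist, eps=1e-6):
    """L2-normalize a histogram; eps keeps the division stable for empty voxels."""
    norm = math.sqrt(sum(v * v for v in hist) + eps)
    return [v / norm for v in hist]
```

A gradient at a bin center contributes to that bin only, while one on a bin boundary is shared equally by the two adjacent bins, which reduces aliasing when orientations shift slightly.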

Room Layout Geometry: Manhattan Voxels
Room layout prediction: floor, ceiling, walls
Discretize the vertical space between floor and ceiling into 6 equal bins
Threshold of 0.15 m to separate points near walls from hypothesized layout
Use diagonal lines to split bins at room corners, creating 12 x 6 = 72 bins

Manhattan Voxels (cont.)
Regions:
  1-4: Scene interior, where objects could be placed anywhere (point cloud distribution varies widely)
  5-8: Model points near the assumed Manhattan wall structure. Here, 5 and 6 contain orthogonal planes.
  9-12: Points outside of the predicted layout
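The 12 x 6 indexing can be sketched as below. This only illustrates the vertical discretization and the flattening into 72 voxel indices; how a point is assigned to one of the 12 floor-plane regions is not reproduced here. The function names are hypothetical, and regions are numbered 0-11 in the code rather than 1-12 as on the slide.

```python
def vertical_bin(height, floor, ceiling, num_bins=6):
    """Index of the equal-height bin (0 = nearest the floor) containing a point's height."""
    t = (height - floor) / (ceiling - floor)
    # Clamp so points on or slightly beyond the floor/ceiling land in the end bins.
    return min(max(int(t * num_bins), 0), num_bins - 1)

def voxel_index(region, height, floor, ceiling, num_regions=12, num_bins=6):
    """Flatten (floor-plane region, vertical bin) into one of 12 x 6 = 72 voxel indices."""
    if not 0 <= region < num_regions:
        raise ValueError("region out of range")
    return region * num_bins + vertical_bin(height, floor, ceiling, num_bins)
```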

Proposed Solution
Representation
  Geometric features
  Clouds of oriented gradients (COG)
  Novel Manhattan voxel structure
Training
  Structured SVMs (on cuboid + layout)
  Cascaded classification framework

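Cuboid Detection
Cuboid $i$ is described by $(I_i, B_i)$, with features $\phi(I_i, B_i) = \{\phi^a_{i\ell}, \phi^b_{i\ell}, \phi^c_{i\ell}\}_{\ell=1}^{216}$
  $\phi^a_{i\ell}$: point cloud density feature
  $\phi^b_{i\ell}$: 25 surface normal histogram features
  $\phi^c_{i\ell}$: 9 COG features
Find a prediction function $h_c : \mathcal{I} \to \mathcal{B}$, where $B = (L, \theta, S)$
  $L$: center of cuboid in 3D
  $\theta$: cuboid orientation
  $S$: physical size of cuboid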

Cuboid Detection
Training: n-slack formulation of structural SVM
  $n$: number of training examples (one slack variable per example)
  $I_i$: input image for cuboid $i$
  $B_i$: bounding box for cuboid $i$
  $C$: regularization constant
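For reference, the generic n-slack structural SVM with margin rescaling can be written with this slide's symbols as below. This is the textbook form, not an equation recovered from the slide: the weight vector $w$, slack variables $\xi_i$, feature map $\phi$, loss $\Delta$, competing box $\bar{B}$, and the $C/n$ scaling are assumptions introduced here.

```latex
\min_{w,\ \xi \ge 0} \quad \frac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n}\xi_i
\quad \text{s.t.} \quad
w^\top\!\left[\phi(I_i, B_i) - \phi(I_i, \bar{B})\right] \ge \Delta(B_i, \bar{B}) - \xi_i
\quad \forall i,\ \forall \bar{B}.
```

Each constraint asks the annotated box $B_i$ to outscore every competing box $\bar{B}$ by a margin that grows with the loss, with $\xi_i$ absorbing violations for example $i$.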

Cuboid Detection
Loss function:
  $\bar{B}$: 3D bounding box
  $B$: ground-truth bounding box
  $\bar{\theta}$: orientation with respect to ground
  $\theta$: ground-truth orientation
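One plausible loss built from these four quantities is sketched below. The specific form, 1 - IOU * (1 + cos(orientation difference)) / 2, is an assumption for illustration and is not taken from the slide; the 3D IOU is passed in as a number because computing it for oriented cuboids is outside this sketch. `cuboid_loss` is a hypothetical name.

```python
import math

def cuboid_loss(iou, theta, theta_gt):
    """Loss in [0, 1]: zero only when the boxes coincide (IOU = 1) and the orientations agree."""
    # Orientation agreement is 1 for equal angles and 0 for opposite directions.
    orientation_agreement = (1.0 + math.cos(theta - theta_gt)) / 2.0
    return 1.0 - iou * orientation_agreement
```

With this form, a box that overlaps the ground truth perfectly but faces the opposite way is penalized as heavily as a box with no overlap at all.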

Cuboid Hypothesis
Cuboid hypotheses are computed using a sliding window
  Width quantiles: {0.1, 0.3, 0.5, 0.7, 0.9}
  Depth quantiles: {0.25, 0.5, 0.75}
  Height quantiles: {0.3, 0.5, 0.8}
All combinations of voxel size, 3D location, and orientation (from 16 candidate orientations) are evaluated.
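The enumeration of candidate shapes can be sketched as follows. It assumes the quantiles are taken over the dimensions of training cuboids for one category and that the 16 orientations are evenly spaced over a full turn about the gravity axis; neither detail is stated on the slide. The helper names are hypothetical, and sliding over 3D locations is left out.

```python
import itertools
import math

WIDTH_Q = (0.1, 0.3, 0.5, 0.7, 0.9)
DEPTH_Q = (0.25, 0.5, 0.75)
HEIGHT_Q = (0.3, 0.5, 0.8)
NUM_ORIENTATIONS = 16

def quantile(values, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a non-empty list."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def candidate_shapes(widths, depths, heights):
    """All (width, depth, height, orientation) combinations for one object category."""
    ws = [quantile(widths, q) for q in WIDTH_Q]
    ds = [quantile(depths, q) for q in DEPTH_Q]
    hs = [quantile(heights, q) for q in HEIGHT_Q]
    # Assumption: orientations are evenly spaced over a full turn about gravity.
    thetas = [2.0 * math.pi * k / NUM_ORIENTATIONS for k in range(NUM_ORIENTATIONS)]
    return list(itertools.product(ws, ds, hs, thetas))
```

Under these assumptions there are 5 x 3 x 3 x 16 = 720 shape-and-orientation candidates per category, each of which is then slid across 3D locations.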

Layout Detection
$M = (L, \theta, S)$
Trained using the same SSVM method, with a loss based on free-space IOU; the ground truth is the hypothesis with the largest free-space IOU

Layout Hypothesis
Layout hypotheses must capture 80% of candidate points.
Floors and ceilings are predicted at the 0.001 and 0.999 quantiles of the 3D points (along the gravity direction).
5,000-20,000 hypotheses for a typical scene
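The two numeric rules on this slide can be sketched as below. The linear-interpolation quantile and the reading of "capture 80%" as "at least 80%" are assumptions, and the function names are hypothetical.

```python
import math

def floor_and_ceiling(heights, low_q=0.001, high_q=0.999):
    """Floor and ceiling heights as extreme quantiles of the points' coordinates along gravity."""
    s = sorted(heights)

    def at(q):
        # Linearly interpolated quantile of the sorted heights.
        pos = q * (len(s) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    return at(low_q), at(high_q)

def captures_enough(points_inside, total_points, min_fraction=0.8):
    """True when a layout hypothesis contains at least min_fraction of the candidate points."""
    return points_inside >= min_fraction * total_points
```

Using extreme quantiles instead of the minimum and maximum keeps a few stray depth measurements from moving the estimated floor or ceiling.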

Learning Spatial Context
Problem: a portion of a large object is detected as a smaller object
Solution: cascaded classification

Evaluation

Experiment Setup
Dataset: SUN RGB-D
Parameters
  Compared with: Sliding Shapes, baseline layout, HOG
  10 object categories
Performance metrics
  Cuboid performance evaluated using IOU with ground-truth cuboids
  Layout performance evaluated using free-space IOU with human annotations

Experiment Results
Precision scores for 10 object categories

Summary
Novel representations
  Clouds of oriented gradients (COG) for cuboids
  Manhattan voxels for layouts
  Uses RGB-D data; does not rely on CAD model information
Learning
  Objects classified using SSVM
  Cascaded learning framework applied to remove false positives

Q&A

Backup

Cascaded Classification
First-stage detections become input features to second-stage classifiers that estimate confidence
Essentially a directed graphical model with hidden variables; marginalizing the first-stage variables recovers a standard, fully connected undirected graph
More efficient:
  Training decomposes into independent learning problems for each node (object category)
  Optimal test classification is possible via a rapid sequence of local decisions

Cascaded Classification
First stage
  Outputs the layout and a set of {bounding box, confidence score, object category}
Second stage
  Adds contextual features:
    Object-object overlap
    Object-layout context: distance and angle to nearest wall
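The object-layout context feature can be illustrated with a simplified sketch. It assumes a top-down view in which the layout is an axis-aligned rectangle (that is, coordinates have already been rotated into the layout's frame) and measures distance from the object's center; the function name `wall_context` and these conventions are assumptions, not the authors' definition.

```python
import math

def wall_context(center, theta, layout_min, layout_max):
    """Distance to the nearest wall and the angle between the object's heading and that wall."""
    x, y = center
    # Each wall: (distance from the object center, direction the wall runs in, in radians).
    walls = [
        (x - layout_min[0], math.pi / 2),  # wall at x = min runs along y
        (layout_max[0] - x, math.pi / 2),  # wall at x = max runs along y
        (y - layout_min[1], 0.0),          # wall at y = min runs along x
        (layout_max[1] - y, 0.0),          # wall at y = max runs along x
    ]
    distance, wall_dir = min(walls, key=lambda w: abs(w[0]))
    # Fold the relative angle into [0, pi/2]: parallel gives 0, perpendicular gives pi/2.
    rel = abs(theta - wall_dir) % math.pi
    angle = min(rel, math.pi - rel)
    return abs(distance), angle
```

Features like these let a second-stage classifier learn, for example, that beds and bookshelves tend to sit close to and aligned with a wall.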

Learning Spatial Context
Training
  Standard SVM with radial basis function (RBF) kernel
  Binary classification: true or false positive
Prediction
  Second-stage classifier outputs a new contextual confidence
  Overall confidence is the sum of the first- and second-stage confidences