CS 558: Computer Vision, 13th Set of Notes

CS 558: Computer Vision, 13th Set of Notes. Instructor: Philippos Mordohai. Webpage: www.cs.stevens.edu/~mordohai. E-mail: Philippos.Mordohai@stevens.edu. Office: Lieb 215

Overview: Context and Spatial Layout; Relating Objects and Geometry; Putting Objects in Perspective; Interpretation of indoor scenes. Based on slides by D. Hoiem

Context in Recognition Objects are usually surrounded by a scene that can provide context in the form of nearby objects, surfaces, scene category, geometry, etc.

Context provides clues for function What is this? Examples from Antonio Torralba

Context provides clues for function What is this? Now can you tell?

Sometimes context is the major component of recognition What is this?

Sometimes context is the major component of recognition What is this? Now can you tell?

More Low-Res What are these blobs?

More Low-Res The same pixels! (a car)

There are many types of context (from Divvala et al., CVPR 2009):
- Local pixel context: window, surround, image neighborhood, object boundary/shape, global image statistics
- 2D Scene: gist, global image statistics
- 3D Geometric: 3D scene layout, support surface, surface orientations, occlusions, contact points, etc.
- Semantic: event/activity depicted, scene category, objects present in the scene and their spatial extents, keywords
- Photogrammetric: camera height, orientation, focal length, lens distortion, radiometric response function
- Illumination: sun direction, sky color, cloud cover, shadow contrast, etc.
- Geographic: GPS location, terrain type, land use category, elevation, population density, etc.
- Temporal: nearby frames of video, photos taken at similar times, videos of similar scenes, time of capture
- Cultural: photographer bias, dataset selection bias, visual clichés, etc.

Jason Salavon: http://salavon.com/specialmoments/newlyweds.shtml Cultural context

Cultural context: Mildred and Lisa. Who is Mildred? Who is Lisa? Andrew Gallagher: http://chenlab.ece.cornell.edu/people/andy/projectpage_names.html

Cultural context Age given Appearance Age given Name Andrew Gallagher: http://chenlab.ece.cornell.edu/people/andy/projectpage_names.html

Spatial layout is especially important 1. Context for recognition

Spatial layout is especially important 1. Context for recognition

Spatial layout is especially important 1. Context for recognition 2. Scene understanding

Spatial Layout: 2D vs. 3D

But object relations are in 3D Close Not Close

Highly Structured 3D Models Figs from Hoiem - Savarese 2011 book

High detail, Low abstraction Depth Map Saxena, Chung & Ng 2005, 2007

Medium detail, High abstraction Room as a Box Hedau Hoiem Forsyth 2009

Med-High detail, High abstraction Complete 3D Layout Guo Zou Hoiem 2015

Surface Layout: describe 3D surfaces with geometric classes. Support; Vertical: Planar (Left/Center/Right), Non-Planar Porous, Non-Planar Solid; Sky

The challenge???

Our World is Structured Abstract World Our World Image Credit (left): F. Cunin and M.J. Sailor, UCSD

Learn the Structure of the World Training Images

Infer the most likely interpretation Unlikely Likely

Geometry estimation as recognition: features (color, texture, perspective, position) are fed to a surface geometry classifier (e.g., "vertical, planar" for a region), trained on labeled training data.

Use a variety of image cues: color, texture, image location; vanishing points, lines; texture gradient
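The region-classification step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: scikit-learn's gradient-boosted trees stand in for the boosted decision trees, and the features (mean color, color spread, normalized centroid position) are placeholders for the full cue set.

```python
# Minimal sketch, not the authors' code: classify regions into geometric
# classes from simple color/position cues, with scikit-learn's gradient-boosted
# trees standing in for boosted decision trees.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def region_features(rgb, region_mask):
    """Illustrative cues for one region: mean color, color spread, and the
    normalized image position of the region centroid."""
    pix = rgb[region_mask].astype(float)      # (N, 3) pixels inside the region
    ys, xs = np.nonzero(region_mask)
    h, w = region_mask.shape
    return np.concatenate([pix.mean(axis=0), pix.std(axis=0),
                           [ys.mean() / h, xs.mean() / w]])

# X: one feature vector per labeled training region; y: geometric class ids
# (e.g. 0 = support, 1 = vertical, 2 = sky).
clf = GradientBoostingClassifier(n_estimators=200, max_depth=3)
# clf.fit(X, y)
# probs = clf.predict_proba(region_features(image, mask)[None, :])
```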

Surface Layout Algorithm: Input Image -> Segmentation -> Features (perspective, color, texture, position) -> Trained Region Classifier (learned from training data) -> Surface Labels. Hoiem, Efros, Hebert (2007)

Surface Layout Algorithm: Input Image -> Multiple Segmentations -> Features (perspective, color, texture, position) -> Trained Region Classifier (learned from training data) -> Confidence-Weighted Predictions -> Final Surface Labels. Hoiem, Efros, Hebert (2007)

Geometric Classes: Ground; Vertical: Planar facing Left, Center, or Right; Non-planar: Solid (X), Porous (O) or wiry; Sky

Surface Description Result

Automatic Photo Popup: Labeled Image -> Fit Ground-Vertical Boundary with Line Segments -> Form Segments into Polylines -> Cut and Fold -> Final Pop-up Model. [Hoiem, Efros, Hebert 2005]

The World Behind the Image

Geometric Cues Color Texture Location Perspective

Need Good Spatial Support: color, texture, and perspective cues computed on a 50x50 patch (two example patches shown).

Need Good Spatial Support

Image Segmentation: a single segmentation won't work. Solution: multiple segmentations

Labeling Segments: for each segment, get P(good segment | data) and P(label | good segment, data)

Image Labeling: from labeled segmentations to labeled pixels. P(label | data) = Σ over segments of P(good segment | data) P(label | good segment, data)
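This marginalization maps directly to a confidence-weighted sum of per-segment label distributions. A minimal sketch, assuming the per-segment quantities P(good segment | data) and P(label | good segment, data) have already been estimated (variable names are mine):

```python
# Sketch of the combination rule on the slide:
#   P(label | data) = sum_segments P(good segment | data) * P(label | good segment, data)
# Assumes per-segment probabilities are already available.
import numpy as np

def combine_segmentations(segmentations, p_good, p_label, n_labels):
    """segmentations: list of (H, W) integer maps of segment ids.
       p_good[k][s]:  P(segment s of segmentation k is a 'good' segment | data).
       p_label[k][s]: length-n_labels array, P(label | good segment, data)."""
    h, w = segmentations[0].shape
    pixel_probs = np.zeros((h, w, n_labels))
    for k, seg_map in enumerate(segmentations):
        for s in np.unique(seg_map):
            mask = seg_map == s
            pixel_probs[mask] += p_good[k][s] * np.asarray(p_label[k][s])
    # Normalize each pixel's label distribution.
    return pixel_probs / (pixel_probs.sum(axis=2, keepdims=True) + 1e-12)
```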

Labeling Results Input image Ground Truth Hoiem et al.

Results Input Image Ground Truth Hoiem et al.

Results Input Image Ground Truth Hoiem et al.

Results Input Image Ground Truth Hoiem et al.

Failures: Reflections, Rare Viewpoint Input Image Ground Truth Hoiem et al.

Average Accuracy Main Class: 88% Subclasses: 61%

Automatic Photo Popup [SIGGRAPH 2005]

Relating Objects and Geometry

Knowledge of Geometry: Critical for Object Detection?

Geometrically Coherent Interpretation: Geometry, Objects, Image, Camera Parameters (v0: horizon position, yc: camera height)

What can we do with these models? Local queries (marginalization): What is the likelihood that this is a car? What is the distribution of the camera height given the image? Global queries (maximization): What is the most likely complete hypothesis of objects, geometry, and parameters?

Objects. Object detector: defines a set of objects; estimates the likelihood of object identities at possible locations/scales. Learn the distribution of object heights in the 3D world, e.g. from consumerreports.com for cars.

Geometry. Image labels: Ground; Vertical (Left, Center, Right, Porous, Solid); Sky. Geometry estimates produce probability maps for each label. Compute: likelihoods of object identities at each position given the geometry and image data; likelihood of the geometry given the image data. Initial estimates (top of ground, bottom of sky) help the horizon estimate.

Camera Parameters. Camera height: estimate a prior from training images. Horizon position: estimate based on the prior, the estimated geometry, and potential vanishing points. Identified objects of known height distribution help refine the camera parameter estimates. (Diagram: image plane with the horizon position and the object's image height; the camera at its height above the ground; the object's world height in 3D.)
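The diagram's quantities are tied together by a standard single-view relation, sketched below under the usual assumptions (object resting on the ground plane, roughly level camera). Symbols: f focal length, z object depth, y_c camera height, v_0 horizon row, v_b the row of the object's bottom, h its height in the image, Y its world height.

```latex
% Standard single-view relation assumed by this kind of model
% (ground-plane object, roughly level camera):
\[
  v_b - v_0 = \frac{f\,y_c}{z}, \qquad h = \frac{f\,Y}{z}
  \quad\Longrightarrow\quad
  Y \approx y_c\,\frac{h}{v_b - v_0}.
\]
```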

Local Queries (Marginalization). Exact marginalization is not tractable. Assumptions: objects depend on local geometry; local geometry is independent; objects are independent given camera parameters. Approximations: marginalize only over hypotheses that have at most one other object; discard extremely unlikely objects/camera parameters early.

How far can camera parameters get us? What the system knows: estimated horizon position; camera height prior (in meters); distribution of car heights in the world (in meters); no other image data! What the system tells you: 50% confidence interval for the size of a car (in the image) given its bottom-center position; 50% confidence intervals for camera parameters.
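As a toy illustration of the first query, the priors can be pushed through the single-view relation above by sampling. The specific numbers below are made up for the example and are not the paper's priors:

```python
# Toy sketch (made-up priors, not the paper's): predict a 50% interval for a
# car's image height from its bottom row, an estimated horizon, a camera-height
# prior, and a car-height distribution, via  h = Y * (v_b - v_0) / y_c.
import numpy as np

rng = np.random.default_rng(0)
v0 = 240.0                                    # estimated horizon row (pixels)
cam_height = rng.normal(1.6, 0.3, 10_000)     # assumed camera-height prior (m)
car_height = rng.normal(1.5, 0.2, 10_000)     # assumed car-height distribution (m)

def image_height_interval(v_bottom):
    """50% confidence interval for the car's height in the image (pixels)."""
    h = car_height * (v_bottom - v0) / cam_height
    return np.percentile(h, [25, 75])

print(image_height_interval(400.0))  # car whose bottom is 160 px below the horizon
```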

Hallucinations: Camera Parameters No Objects Given 1 Object Given

Hallucinations: Camera Parameters No Objects Given 1 Object Given

How far can camera parameters and geometry get us? What the system knows: same as before, but with geometry estimates; no other image data! What the system tells you: most likely position/size of a car in the image; 50% confidence intervals for camera parameters.

Hallucinations: Geometry and Camera Parameters Estimated Geometry No Object Given 1 Object Given

How much do geometry estimates help (with local image data)? 40 contextual features based on average confidence values of geometric labels within windows above, within, and below the object window. [ICCV 2005]
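A simplified sketch of such context features: average the geometric-label confidences in windows above, inside, and below the candidate window. The window layout is my assumption and the paper's exact 40-feature design differs; this only illustrates the idea.

```python
# Simplified sketch (assumed window layout, not the paper's exact 40 features):
# average geometric-label confidences above, inside, and below a candidate box.
import numpy as np

def context_features(label_conf, box):
    """label_conf: (H, W, L) per-pixel confidences for L geometric labels.
       box: (x1, y1, x2, y2) candidate object window in pixel coordinates."""
    x1, y1, x2, y2 = box
    bh = y2 - y1
    L = label_conf.shape[2]
    windows = [label_conf[max(0, y1 - bh):y1, x1:x2],                        # above
               label_conf[y1:y2, x1:x2],                                     # object window
               label_conf[y2:min(label_conf.shape[0], y2 + bh), x1:x2]]      # below
    feats = [w.reshape(-1, L).mean(axis=0) if w.size else np.zeros(L)
             for w in windows]
    return np.concatenate(feats)   # one average confidence per label per window
```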

Example Results [Murphy et al., 2003] [Hoiem et al., 2005] + 40 context features [ICCV 2005]

Global Queries (Maximization). Provides a full image interpretation (for modeled aspects of the scene). Finding the optimal solution is intractable; use branch-and-bound or greedy algorithms. Usefulness depends on the peakedness of the joint distribution.

Putting Objects in Perspective. Derek Hoiem, Alexei A. Efros, Martial Hebert. Carnegie Mellon University Robotics Institute

Local Object Detection: true detections, false detections, and missed detections from the local detector. Local Detector: [Dalal-Triggs 2005]

Real Relationships are 3D Close Not Close

Objects and Scenes. Hock, Romanski, Galie, & Williams 1978. Biederman's Relations among Objects in a Well-Formed Scene (1981): Support, Size, Position, Interposition, Likelihood of Appearance

Contribution of this Research. Hock, Romanski, Galie, & Williams 1978. Biederman's Relations among Objects in a Well-Formed Scene (1981): Support, Size, Position, Interposition, Likelihood of Appearance

Object Support

Object Size in the Image: Is the person or the car taller?

Object Size Camera Viewpoint Input Image Loose Viewpoint Prior

Object Size Camera Viewpoint Object Position/Sizes Viewpoint

Object Size Camera Viewpoint Object Position/Sizes Viewpoint

Object Size Camera Viewpoint Object Position/Sizes Viewpoint

Object Size Camera Viewpoint Object Position/Sizes Viewpoint

What do surfaces and viewpoint say about objects? From the image: P(surfaces), P(viewpoint), P(object); these give P(object | surfaces) and P(object | viewpoint)

What do surfaces and viewpoint say about objects? From the image: P(surfaces), P(viewpoint), P(object); combined, these give P(object | surfaces, viewpoint)
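A crude way to use these quantities is to re-weight the local detector score by the surface and viewpoint terms under a naive independence assumption. This is my simplification for illustration; the paper instead performs joint inference over objects, surfaces, and viewpoint.

```python
# Naive-fusion sketch (my simplification; the paper performs joint inference):
# re-weight a local detection by its consistency with surfaces and viewpoint.
def rescore(p_local, p_given_surfaces, p_given_viewpoint, p_prior):
    """Combine P(object | image), P(object | surfaces), P(object | viewpoint)
    assuming the cues are independent; dividing by p_prior**2 avoids counting
    the object prior three times."""
    return p_local * p_given_surfaces * p_given_viewpoint / (p_prior ** 2)

# A detection with a strong local score but implausible surfaces/viewpoint
# support is pushed down:
print(rescore(0.8, 0.05, 0.1, 0.5))   # -> 0.016
```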

Scene Parts Are All Interconnected Objects Camera Viewpoint 3D Surfaces

Input to Algorithm Object Detection Surface Estimates Viewpoint Prior Local Car Detector Local Ped Detector Local Detector: [Dalal-Triggs 2005] Surfaces: [Hoiem-Efros-Hebert 2005]

Scene Parts Are All Interconnected Objects Viewpoint 3D Surfaces

Approximate Model Objects Viewpoint 3D Surfaces

Object detection, true/false positive counts (Local Detector: [Dalal-Triggs 2005]):
                Initial (Local)   Final (Global)
Car Detection   4 TP / 2 FP       4 TP / 1 FP
Ped Detection   3 TP / 2 FP       4 TP / 0 FP

Experiments on LabelMe Dataset Testing with LabelMe dataset: Cars as small as 14 pixels Peds as small as 36 pixels

More Tasks, Better Detection. Car and pedestrian detection with a local detector from Murphy et al. 2003, comparing Objects Only, Objects + Geometry, Objects + Viewpoint, and All Information. [Hoiem Efros Hebert 2006]

Good Detectors Become Better. Car and pedestrian detection with the local detector from Dalal-Triggs 2005: All Information vs. Objects Only.

Better Detectors, Better Viewpoint. Median horizon error: Horizon Prior 8.5%; Using 2003 Local Detector 3.8%; Using 2005 Local Detector 3.0%.

More is Better. More objects give better viewpoint estimates: Detect Cars Only 7.3% error; Detect Peds Only 5.0% error; Detect Both 3.8% error. Better viewpoint gives better object detection: 10% fewer false positives at the same detection rate.

Qualitative Results Car: TP / FP Ped: TP / FP Initial: 6 TP / 1 FP Final: 9 TP / 0 FP

Qualitative Results Car: TP / FP Ped: TP / FP Initial: 3 TP / 3 FP Final: 5 TP / 1 FP

Putting Objects in Perspective: recovered scene with pedestrian and car positions shown in meters.

Interpretation of indoor scenes

Vision = assigning labels to pixels? Lamp Wall Sofa Table Floor Floor

Vision = interpreting within physical space Wall Sofa Table Floor

Physical space needed for affordance Is this a good place to sit? Could I stand over here? Can I put my cup here? Walkable path

Physical space needed for recognition Apparent shape depends strongly on viewpoint

Physical space needed for recognition Vanishing point

Key challenges How to represent the physical space? Requires seeing beyond the visible How to estimate the physical space? Requires simplified models Requires learning from examples

Box Layout (Hedau, Hoiem, Forsyth, ICCV 2009). The room is an oriented 3D box: three vanishing points specify its orientation; two pairs of sampled rays specify its position/size.
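A minimal sketch of this parameterization, under my assumed construction: two rays from one vanishing point and two from another intersect in the four corners of the far wall; lines from those corners toward the third VP would then be extended to the image borders to carve out floor, ceiling, and side walls.

```python
# Sketch of box-candidate generation (assumed construction, illustrative only):
# rays sampled from two vanishing points intersect in the far-wall corners.
import numpy as np

def ray(vp, angle):
    """Homogeneous line through vanishing point vp at the given angle."""
    p2 = vp + np.array([np.cos(angle), np.sin(angle)])
    return np.cross(np.append(vp, 1.0), np.append(p2, 1.0))

def intersect(l1, l2):
    p = np.cross(l1, l2)
    return p[:2] / p[2]          # homogeneous point -> pixel coordinates

def far_wall_corners(vp1, vp2, angles1, angles2):
    """Two ray angles per VP give 2 x 2 = 4 corners of the candidate's far wall."""
    rays1 = [ray(vp1, a) for a in angles1]
    rays2 = [ray(vp2, a) for a in angles2]
    return [intersect(r1, r2) for r1 in rays1 for r2 in rays2]

# Illustrative numbers: horizontal VPs to the right and left of the image.
corners = far_wall_corners(np.array([900.0, 200.0]), np.array([-400.0, 220.0]),
                           angles1=(2.9, 3.3), angles2=(-0.2, 0.2))
```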

Another box consistent with the same vanishing points Box Layout

Image Cues for Box Layout. Straight edges: edges on floor/wall surfaces are usually oriented towards the VPs; edges on objects might mislead. Appearance of visible surfaces: floor, wall, ceiling, and object labels should be consistent with the box (floor, left wall, right wall, objects).

Box Layout Algorithm
1. Detect edges
2. Estimate 3 orthogonal vanishing points
3. Apply a region classifier to label pixels with visible surfaces (boosted decision trees on regions, based on color, texture, edges, position)
4. Generate box candidates by sampling pairs of rays from the VPs
5. Score each box based on edges and pixel labels (score learned via structured learning)
6. Jointly refine box layout and pixel labels to get the final estimate
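Step 5 can be sketched as a simple consistency score between a candidate's faces, the pixel-level surface labels, and the detected edges. The weights here are hand-set for illustration; the paper learns the scoring function with structured learning.

```python
# Sketch of step 5 (hand-set weights; the paper learns the score with
# structured learning): agreement between a candidate box and the image cues.
import numpy as np

def score_box(face_map, surface_probs, edge_support, w_label=1.0, w_edge=0.5):
    """face_map: (H, W) face index per pixel for this candidate
                 (0=floor, 1=left wall, 2=middle wall, 3=right wall, 4=ceiling).
       surface_probs: (H, W, 5) region-classifier confidences for the same faces.
       edge_support: fraction of detected straight-edge pixels that lie near a
                     box boundary and point toward the matching vanishing point."""
    h, w = face_map.shape
    # Mean probability that each pixel's surface label matches its assigned face.
    label_agreement = surface_probs[np.arange(h)[:, None],
                                    np.arange(w)[None, :],
                                    face_map].mean()
    return w_label * label_agreement + w_edge * edge_support
```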

Evaluation Dataset: 308 indoor images Train with 204 images, test with 104 images

Experimental results Detected Edges Surface Labels Box Layout Detected Edges Surface Labels Box Layout

Experimental results Detected Edges Surface Labels Box Layout Detected Edges Surface Labels Box Layout

Experimental results. Joint reasoning over surface labels and box layout helps: pixel error 26.5% -> 21.2%; corner error 7.4% -> 6.3%. Similar performance for cluttered and uncluttered rooms.

Using room layout to improve object detection. Box layout helps: 1. predict the appearance of objects, because they are often aligned with the room; 2. predict the position and size of objects, due to physical constraints and size consistency. (Comparison: 2D bed detection vs. 3D bed detection with scene geometry.) Hedau, Hoiem, Forsyth, ECCV 2010, CVPR 2012