What are we trying to achieve? Why are we doing this? What do we learn from past history? What will we talk about today?

Similar documents
Seeing 3D chairs: Exemplar part-based 2D-3D alignment using a large dataset of CAD models

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Contexts and 3D Scenes

Contexts and 3D Scenes

3D Spatial Layout Propagation in a Video Sequence

3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding

Data-Driven 3D Primitives for Single Image Understanding

Unfolding an Indoor Origami World

From 3D Scene Geometry to Human Workspace

Object Detection by 3D Aspectlets and Occlusion Reasoning

Focusing Attention on Visual Features that Matter

Perceiving the 3D World from Images and Videos. Yu Xiang Postdoctoral Researcher University of Washington

CS 558: Computer Vision 13 th Set of Notes

Semantic Classification of Boundaries from an RGBD Image

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016

3D Object Recognition and Scene Understanding from RGB-D Videos. Yu Xiang Postdoctoral Researcher University of Washington

Context. CS 554 Computer Vision Pinar Duygulu Bilkent University. (Source:Antonio Torralba, James Hays)

Joint Inference in Image Databases via Dense Correspondence. Michael Rubinstein MIT CSAIL (while interning at Microsoft Research)

Multi-view stereo. Many slides adapted from S. Seitz

Revisiting 3D Geometric Models for Accurate Object Shape and Pose

Learning from 3D Data

Seeing the unseen. Data-driven 3D Understanding from Single Images. Hao Su

Can Similar Scenes help Surface Layout Estimation?

arxiv: v1 [cs.cv] 3 Jul 2016

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image. Supplementary Material

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction

LEARNING BOUNDARIES WITH COLOR AND DEPTH. Zhaoyin Jia, Andrew Gallagher, Tsuhan Chen

EECS 442 Computer vision. 3D Object Recognition and. Scene Understanding

CS395T paper review. Indoor Segmentation and Support Inference from RGBD Images. Chao Jia Sep

Single Image Super-resolution. Slides from Libin Geoffrey Sun and James Hays

Shape Anchors for Data-driven Multi-view Reconstruction

Deformable Part Models

Local cues and global constraints in image understanding

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

3D Wikipedia: Using online text to automatically label and navigate reconstructed geometry

Finding Structure in Large Collections of 3D Models

Structured Prediction using Convolutional Neural Networks

FPM: Fine Pose Parts-Based Model with 3D CAD Models

Support surfaces prediction for indoor scene understanding

Unsupervised discovery of category and object models. The task

Building Part-based Object Detectors via 3D Geometry

Combining Semantic Scene Priors and Haze Removal for Single Image Depth Estimation

Learning to generate 3D shapes

Segmentation. Bottom up Segmentation Semantic Segmentation

Painting-to-3D Model Alignment Via Discriminative Visual Elements. Mathieu Aubry (INRIA) Bryan Russell (Intel) Josef Sivic (INRIA)

arxiv: v3 [cs.cv] 18 Aug 2017

Closing the Loop in Scene Interpretation

3D layout propagation to improve object recognition in egocentric videos

Depth from Stereo. Dominic Cheng February 7, 2018

Revisiting Depth Layers from Occlusions

Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics

Announcements. Stereo Vision Wrapup & Intro Recognition

Opportunities of Scale

Parsing Indoor Scenes Using RGB-D Imagery

Understanding Bayesian rooms using composite 3D object models

arxiv: v2 [cs.cv] 24 Apr 2017

Geometric and Semantic 3D Reconstruction: Part 4A: Volumetric Semantic 3D Reconstruction. CVPR 2017 Tutorial Christian Häne UC Berkeley

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

Automatic Photo Popup

Understanding Indoor Scenes using 3D Geometric Phrases

arxiv: v1 [cs.cv] 25 Oct 2017

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Detecting and Parsing of Visual Objects: Humans and Animals. Alan Yuille (UCLA)

arxiv: v1 [cs.cv] 5 May 2016

Video Object Proposals

The Hilbert Problems of Computer Vision. Jitendra Malik UC Berkeley & Google, Inc.

Window based detectors

Marr Revisited: 2D-3D Alignment via Surface Normal Prediction

Single-view 3D reasoning

DEPT: Depth Estimation by Parameter Transfer for Single Still Images

Patch to the Future: Unsupervised Visual Prediction

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing

PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding

Fitting (LMedS, RANSAC)

By Suren Manvelyan,

Detection III: Analyzing and Debugging Detection Methods

Learning-based Localization

Part-Based Models for Object Class Recognition Part 3

CS381V Experiment Presentation. Chun-Chen Kuo

Deep Incremental Scene Understanding. Federico Tombari & Christian Rupprecht Technical University of Munich, Germany

Data-driven Depth Inference from a Single Still Image

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

3D Scene Understanding by Voxel-CRF

Object Detection by 3D Aspectlets and Occlusion Reasoning

Lecture 8 Active stereo & Volumetric stereo

Detecting Object Instances Without Discriminative Features

Lecture 18: Human Motion Recognition

EE795: Computer Vision and Intelligent Systems

Binge Watching: Scaling Affordance Learning from Sitcoms

Object Category Detection. Slides mostly from Derek Hoiem

CS468: 3D Deep Learning on Point Cloud Data. class label part label. Hao Su. image. May 10, 2017

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

Local Features and Bag of Words Models

Scale-less Dense Correspondences

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Charles R. Qi* Hao Su* Kaichun Mo Leonidas J. Guibas

Combining Monocular Geometric Cues with Traditional Stereo Cues for Consumer Camera Stereo

3D Deep Learning on Geometric Forms. Hao Su

Some Thoughts on Visibility

Dynamic visual understanding of the local environment for an indoor navigating robot

Deep Models for 3D Reconstruction

Transcription:

Introduction

What are we trying to achieve? Why are we doing this? What do we learn from past history? What will we talk about today?

What are we trying to achieve?

Example from Scott Satkin

3D interpretation from image Geometric labels Volumetric layout

Sparse primitives Dense reconstruction 3D scene model

Why are we doing this? Applications

3D for motion when there is no direct way to get 3D

Informing detectors

D. Hoiem, A. A. Efros, and M. Hebert. Putting Objects in Perspective. International Journal of Computer Vision, Vol. 80, No. 1, October, 2008.

Learned Primitives (Examples) 747/203

Contact points

Object surfaces + Contact points

Editing images

Pose sampling

Predicting actions

Input Image Estimated Geometry

Application: Affordance Estimation Estimated Geometry Predicted Sitting Locations

Application: Affordance Estimation Sitting Upright Sitting Reclined Laying Down Reaching (4 poses)

Application: Affordance Estimation

Application: Affordance Estimation Sitting Upright Sitting Reclined Laying Down Reaching (4 poses)

Separating style and structure

Style vs. structure? Tenenbaum& Freeman. Separating Style and Content with Bilinear Models. Neural Computation. 2000.

Casablanca Hotel, New York

What do we learn from past history? Historical perspective

First era: Geometric/symbolic reasoning

Huffman 71, Clowes 71, Kanade80, 81 Sugihara 86, Malik 87, etc.

Kanade s Origami World, 1978

Kanade schair (Artificial Intelligence, 1981)

Scene parsing [Ohta & Kanade 1978] Guzman (SEE), 1968 Yakimovsky & Feldman, 1973 Hansen & Riseman(VISIONS), 1978 Barrow & Tenenbaum 1978 Brooks (ACRONYM), 1979 Ohta& Kanade, 1978

Issues Assumed good (perfect?) geometric elements inferred from the image. Limitations on computation, data, inference techniques prevented practical estimation of geometric primitives.

Second era: Statistical machine learning

Input Learned model Classification Training data

Now we have the opposite problem: Powerful tools to estimate low-level geometric cues (e.g., surface labels) Does not incorporate high-level geometric constraints (e.g., orthogonality, intersections, etc.) Does not incorporate higher-level reasoning

Now (Part I): learning+reasoning

Structured prediction tools Orientation maps(lee et al. 2009) Geometric Context(Hoiem et al. 2007) [Schwing/Urtasun 2012]

Structured prediction tools+ search Input image features Generate hypotheses Search through hypotheses to pick the best one Best = maximum score Scene configuration hypotheses Score computation learned from data using structured prediction tools Final scene configuration

Optimization tools Convex Concave [Fouhey et al. ECCV 2014]

+ Richer representations, including reasoning about geometric primitives (e.g., relative placement of surfaces, contact relationships, etc.) - Taming the combinatorics: How to generate and search hypothesis space efficiently? - Summarizes a large amount of training data into a simple model - Difficult to capture the richness of big data

Now (Part II): Data-driven interpretation

Label transfer Karsch, Liu, Kang. Depth Extraction from Video Using Non-parametric Sampling. ECCV 2012. Fouhey, Gupta, Hebert. Data-Driven 3D Primitives for Single-Image Understanding. ICCV 2013.

Object transfer Input image DPM output Lots of object models Output Matched models Seeing 3D chairs: exemplar part-based 2D-3D alignment using a large dataset of CAD models. M. Aubry, D. Maturana, A. Efros, B. Russell and J. Sivic CVPR, 2014.

Object transfer Training Testing W. Choi, Y.-W. Chao, C. Pantofaru, and S. Savarese. Understanding Indoor Scenes using 3D Geometric Phrases. CVPR 2013.

Scene transfer Lots of 3D models Nearestneighbor search

(Arbitrarily) richer description: Transfer of semantics, 3D poses, segmentation, material properties, etc. How to relate 2D/appearance features to purely 3D geometric representations? What matching score/distance metric should be used? How to rank matches?

What will we talk about today?

Tutorial Outline Bottom up classifiers More explicit constraint+reasoning Qualitative Explicit/Quantitative

Outline Part 1: Derek Bottom-up Methods for Regions and Boundaries, Global Constraints Part 2: Abhinav Volumetric and Functional Constraints Part 3: David Data-driven Models Region labels + Boundaries and objects Stronger geometric constraints from domain knowledge + physical constraints +functional constraints

Big Questions How to estimate geometric properties from an image? How to incorporate geometric constraints and which ones? How to combine reasoning tools with statistical classification/regression tools? How to use large-scale 3D data (3D models, kinect) How to combine with other 3D estimation methods?