Methods for Representing and Recognizing 3D objects

Methods for Representing and Recognizing 3D Objects, part 1. Silvio Savarese, University of Michigan at Ann Arbor

Object: building, 45° pose, 8-10 m away. Object: person, back view, 1-2 m away. Object: police car, side view, 4-5 m away.

Object: building's name; what are the businesses inside? Street or intersection name.

Applications: scene understanding, navigation, interaction, augmentation, manipulation. Visual technology: from an image/video to objects 1...N, each with semantics and geometry. Goals: recognizing objects under arbitrary viewing conditions, and recognizing their pose.

Training examples: car, front-right view; iron, top-rear-left view. Minimal supervision.

Single 3D object recognition: Ballard 81; Lowe 87; Grimson & Lozano-Perez 87; Edelman et al. 91; Ullman & Basri 91; Rothwell 92; Lindeberg 94; Murase & Nayar 94; Zhang et al. 95; Schmid & Mohr 96; Schiele & Crowley 96; Lowe 99; Jacobs & Basri 99; Rothganger et al. 04; Ferrari et al. 05; Brown & Lowe 05; Snavely et al. 06

Recognition for virtual sightseeing (Nokia).

Object manipulation: robot arm for automatic gas fill-up.

Object manipulations

Single-view object categorization: Leung et al. 99; Weber et al. 00; Ullman et al. 02; Fergus et al. 03; Felzenszwalb & Huttenlocher 03; Torralba et al. 03; Fei-Fei et al. 04; Leibe et al. 04; Kumar & Hebert 04; Sivic et al. 05; Shotton et al. 05; Grauman et al. 05; Sudderth et al. 05; Torralba et al. 05; Lazebnik et al. 06; Todorovic et al. 06; Bosch et al. 07; Vedaldi & Soatto 08; Zhu et al. 08

Applications: security, safe driving, photography.

3D object categorization: Johnson & Hebert 99; Weber et al. 00; Schneiderman et al. 01; Capel et al. 02; Bronstein et al. 03; Ruiz-Correa et al. 03; Funkhouser et al. 03; Bart et al. 04; Thomas et al. 06; Kushal et al. 07; Savarese et al. 07, 08; Chiu et al. 07; Hoiem et al. 07; Yan et al. 07

Overview Single 3D object recognition Single view object categorization 3D object categorization

Single 3D object recognition: Ballard 81; Lowe 87; Grimson & Lozano-Perez 87; Edelman et al. 91; Ullman & Basri 91; Rothwell 92; Lindeberg 94; Murase & Nayar 94; Zhang et al. 95; Schmid & Mohr 96; Schiele & Crowley 96; Lowe 99; Jacobs & Basri 99; Rothganger et al. 04; Ferrari et al. 05; Brown & Lowe 05; Snavely et al. 06

Basic scheme:
- Representation: features; descriptors; model
- Model learning
- Recognition: hypothesis generation; model verification

Object representation: a collection of patches in 3D [Rothganger et al. 06]. Each patch stores a 3D position (x, y, z), in-plane axes (h, v), and an appearance descriptor. Courtesy of Rothganger et al.

Model learning [Rothganger et al. 03, 06]. Build a 3D model from N images of the object taken from N different viewpoints:
- Match keypoints between consecutive views [create the sample set]
- Use affine structure from motion to compute 3D patch locations and orientations, plus camera locations [RANSAC]
- Find connected components
- Use bundle adjustment to refine the model
- Upgrade the model to Euclidean, assuming zero skew and square pixels
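The affine structure-from-motion step above can be sketched with the classic Tomasi-Kanade factorization: stack the 2D tracks into a measurement matrix, subtract per-view centroids, and recover cameras and structure (up to an affine ambiguity) via a rank-3 SVD. This is a minimal sketch on synthetic, hypothetical data, without the RANSAC and bundle-adjustment stages of the full pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: P 3-D points observed by F affine cameras.
F_views, P = 6, 30
X = rng.normal(size=(3, P))                 # 3-D points (3 x P)
W = np.zeros((2 * F_views, P))              # measurement matrix of 2-D tracks
for f in range(F_views):
    M = rng.normal(size=(2, 3))             # random affine camera
    t = rng.normal(size=(2, 1))             # translation
    W[2*f:2*f+2] = M @ X + t                # projected tracks

# Tomasi-Kanade: subtract per-view centroids, then rank-3 factorization.
W_c = W - W.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(W_c, full_matrices=False)
M_hat = U[:, :3] * np.sqrt(s[:3])           # stacked cameras (2F x 3)
X_hat = np.sqrt(s[:3])[:, None] * Vt[:3]    # structure (3 x P), up to affine ambiguity

residual = np.linalg.norm(W_c - M_hat @ X_hat)
print(round(residual, 6))  # 0.0: noise-free centered tracks are exactly rank 3
```

With noisy tracks the rank-3 truncation becomes a least-squares fit rather than an exact factorization, which is where the robust estimation discussed next comes in.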

Affine structure from motion. Books: Faugeras 95; Hartley & Zisserman 00; Ma, Soatto, et al. 05. Affine epipolar geometry between image pairs: the fundamental matrix F imposes x'^T F x = 0, and l' = F x is the epipolar line associated with x.
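A quick numeric check of the epipolar relations above, using a hypothetical affine fundamental matrix (for an affine camera pair, F is nonzero only in its last row and column): any point on the epipolar line l' = F x satisfies the constraint x'^T F x = 0.

```python
import numpy as np

# Hypothetical affine fundamental matrix (last row/column nonzero).
F = np.array([[0.0, 0.0, 0.3],
              [0.0, 0.0, -0.5],
              [0.2, 0.4, 1.0]])

x = np.array([2.0, 1.0, 1.0])      # point in image 1 (homogeneous)
l = F @ x                          # epipolar line l' = F x in image 2

# With l' = (a, b, c), the point (0, -c/b, 1) lies on the line (b != 0),
# so it must satisfy the epipolar constraint x'^T F x = 0.
a, b, c = l
x_prime = np.array([0.0, -c / b, 1.0])
print(abs(x_prime @ F @ x) < 1e-9)  # True
```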

Learnt models Rothganger et al. 03 06 Courtesy of Rothganger et al

Basic scheme:
- Representation: features; descriptors; model
- Model learning
- Recognition [object instance from a single image]: hypothesis generation; model verification

Recognition [Rothganger et al. 03 06] 1. Find matches between model and test image features

Recognition [Rothganger et al. 03, 06]: 1. Find matches between model and test-image features. 2. Generate a hypothesis: compute the transformation M from N matched points (N = 2; affine camera; affine keypoints). 3. Model verification.

Recognition [Rothganger et al. 03, 06]. Goal: estimate (fit) the best M in the presence of outliers.

Line fitting with outliers. Given observed points P_i in I and model parameters theta, fit by minimizing the total residual: min_theta sum_i f(P_i, theta). With squared residuals this is least squares, whose closed-form solution has the familiar form theta = (P^T P)^(-1) P^T y.
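A minimal least-squares sketch of the fitting objective above, on hypothetical points near y = 2x + 1 plus two gross outliers. The closed-form fit is pulled far off the true line by the outliers, which is exactly what motivates robust estimators such as RANSAC:

```python
import numpy as np

# Hypothetical data: five points on y = 2x + 1, plus two gross outliers.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 1.0, 3.0])
ys = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 20.0, -15.0])

# Least squares: theta = (A^T A)^{-1} A^T y, with A = [x, 1] per point.
A = np.stack([xs, np.ones_like(xs)], axis=1)
theta = np.linalg.lstsq(A, ys, rcond=None)[0]
print(theta)  # far from the true (2, 1): the outliers dominate the fit
```

Fitting only the first five (clean) points recovers (2, 1) exactly; adding the two outliers flips the estimated slope sign.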

Fitting homographies for stitching a panorama. Images h and k provide matched points P_i^h in I_h and P_i^k in I_k. Fit by minimizing min_theta sum_i f(P_i^h, P_i^k, theta), such that P^h = H P^k (theta collects the entries of the homography H).

Fitting homographies for stitching panorama
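One standard way to compute H from point matches (a common choice, not necessarily the lecture's exact method) is the direct linear transform: each correspondence contributes two linear constraints on the nine entries of H, and the solution is the SVD null vector. A sketch with a hypothetical ground-truth homography and four noise-free correspondences:

```python
import numpy as np

def fit_homography(src, dst):
    """DLT: solve for H (up to scale) from >= 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 3)      # null vector = flattened H

# Hypothetical ground-truth homography and four source points.
H_true = np.array([[1.2, 0.1, 5.0],
                   [0.0, 0.9, -3.0],
                   [0.001, 0.0, 1.0]])
src = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 80.0], [0.0, 80.0]])
h = np.c_[src, np.ones(4)] @ H_true.T
dst = h[:, :2] / h[:, 2:]            # projected (inhomogeneous) points

H = fit_homography(src, dst)
H /= H[2, 2]                         # fix the scale for comparison
print(np.allclose(H, H_true, atol=1e-6))  # True
```

With noisy or mismatched correspondences, this DLT step becomes the model-fitting stage inside the RANSAC loop described next.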

RANSAC (RANdom SAmple Consensus), Fischler & Bolles, 1981. Basic philosophy: a learning technique to estimate the parameters of a model by random sampling of the observed data. Data elements are used to vote for one (or multiple) models. Robust to outliers and missing data. Assumption 1: noisy features will not vote consistently for any single model (few outliers). Assumption 2: there are enough features to agree on a good model (few missing data).

RANSAC. Sample set = set of points in 2D. Algorithm:
1. Select a random sample of the minimum size required to fit the model (s = 2 for a line)
2. Compute a putative model from the sample set
3. Compute the set of inliers to this model from the whole data set
Repeat 1-3 until the model with the most inliers over all samples is found.

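The three RANSAC steps above, sketched for 2D line fitting on hypothetical data (minimal sample size s = 2; the best consensus set is refit by least squares at the end):

```python
import numpy as np

rng = np.random.default_rng(1)

def ransac_line(pts, n_iters=100, thresh=0.2):
    """RANSAC for a 2-D line y = a*x + b (minimal sample size s = 2)."""
    best_inliers = np.zeros(len(pts), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(pts), size=2, replace=False)  # 1. random minimal sample
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if np.isclose(x1, x2):
            continue                                        # vertical pair: skip
        a = (y2 - y1) / (x2 - x1)                           # 2. putative model
        b = y1 - a * x1
        resid = np.abs(pts[:, 1] - (a * pts[:, 0] + b))
        inliers = resid < thresh                            # 3. consensus set
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the best consensus set with ordinary least squares.
    A = np.stack([pts[best_inliers, 0], np.ones(best_inliers.sum())], axis=1)
    return np.linalg.lstsq(A, pts[best_inliers, 1], rcond=None)[0]

# Hypothetical data: 40 inliers on y = 2x + 1, plus 10 uniform outliers.
x = rng.uniform(0, 10, 40)
inl = np.stack([x, 2 * x + 1 + rng.normal(0, 0.05, 40)], axis=1)
out = rng.uniform([0, -20], [10, 20], size=(10, 2))
a, b = ransac_line(np.vstack([inl, out]))
print(round(a, 1), round(b, 1))  # ~2.0 1.0, despite the outliers
```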

How many samples? Choose the number of samples N so that, with probability p, at least one random sample is free from outliers (e.g. p = 0.99), given outlier ratio e. Initial number of points s: typically the minimum number needed to fit the model. Distance threshold t: choose t so that the probability that an inlier falls within it is high (e.g. 0.95); for zero-mean Gaussian noise with standard deviation σ, t² = 3.84 σ².

N = log(1 - p) / log(1 - (1 - e)^s)

  s \ e   5%  10%  20%  25%  30%  40%   50%
  2        2    3    5    6    7   11    17
  3        3    4    7    9   11   19    35
  4        3    5    9   13   17   34    72
  5        4    6   12   17   26   57   146
  6        4    7   16   24   37   97   293
  7        4    8   20   33   54  163   588
  8        5    9   26   44   78  272  1177

Source: M. Pollefeys
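The table values follow directly from the formula; a small helper reproduces a few entries for p = 0.99:

```python
import math

def ransac_trials(p=0.99, e=0.5, s=2):
    """N = log(1 - p) / log(1 - (1 - e)^s), rounded up to an integer."""
    return math.ceil(math.log(1 - p) / math.log(1 - (1 - e) ** s))

# A few entries of the table (p = 0.99):
print(ransac_trials(e=0.50, s=2))  # 17
print(ransac_trials(e=0.30, s=4))  # 17
print(ransac_trials(e=0.50, s=8))  # 1177
```

Note how N grows rapidly with the sample size s: this is why RANSAC always uses the minimal sample needed to instantiate the model.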

Recognition [Rothganger et al. 03, 06]: 1. Find matches between model and test-image features. 2. Generate a hypothesis: compute the transformation M from N matched points (N = 2; affine camera; affine keypoints). 3. Model verification.

Object to recognize Initial matches based on appearance Matches after pose verification Recovered pose Courtesy of Rothganger et al

3D object recognition results [Rothganger et al. 03, 06]. Courtesy of Rothganger et al. Handles severe clutter.

A comparative experiment

Other multi-view matching algorithms: Ferrari et al. 04, 06; Lazebnik et al. 04; Brown et al. 05; Toshev, Shi & Daniilidis 07.

Overview Single 3D object recognition Single view object categorization 3D object categorization

3D object categorization. Mixtures of 2D single-view models: Weber et al. 00; Schneiderman et al. 01; Bart et al. 04. Full 3D models: Johnson & Hebert 99; Capel et al. 02; Bronstein et al. 03; Ruiz-Correa et al. 03; Funkhouser et al. 03. Multi-view models: Thomas et al. 06; Savarese et al. 07, 08; Chiu et al. 07; Hoiem et al. 07; Yan et al. 07; Kushal et al. 07; Liebelt et al. 08; Sun et al. 08.

Mixture of single-view 2D models [Weber et al. 00; Schneiderman et al. 01]: the 3D category model is a collection of single-view models.

Mixture of single-view 2D models Mixture of 2-D models Weber, Welling and Perona CVPR 00 Component 1 Component 2

Mixture of single-view 2D models, limitations: the single-view models are independent; no information is shared between them; there is no sense of correspondence of parts under 3D transformations.


Full 3D models [Johnson & Hebert 99; Capel et al. 02; Osada et al. 02; Bronstein et al. 03; Ruiz-Correa et al. 03; Funkhouser et al. 03; Kazhdan et al. 03; Amberg et al. 08]: a 3D category model is built from a collection of 3D model instances, i.e. 3D range data or CAD models.

Shape distributions: Osada et al. 02. Spherical harmonics: Kazhdan et al. 03.
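The D2 shape distribution of Osada et al. 02 describes a shape by the histogram of distances between random point pairs sampled on it. A rough sketch on hypothetical point clouds (a unit sphere vs. a thin rod), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def d2_shape_distribution(points, n_pairs=10000, bins=32):
    """D2 descriptor: histogram of distances between random point pairs."""
    i = rng.integers(len(points), size=n_pairs)
    j = rng.integers(len(points), size=n_pairs)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    hist, _ = np.histogram(d, bins=bins, range=(0, d.max()), density=True)
    return hist

# Hypothetical shapes: 500 points on a unit sphere vs. on a thin rod.
sphere = rng.normal(size=(500, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
rod = np.c_[rng.uniform(-2, 2, 500), rng.normal(0, 0.01, (500, 2))]

h_sphere = d2_shape_distribution(sphere)
h_rod = d2_shape_distribution(rod)
print(np.abs(h_sphere - h_rod).sum() > 0)  # True: the distributions differ
```

The descriptor needs no alignment in orientation (distances are rotation-invariant), though scale normalization is still required, illustrating the alignment issues noted below.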

Full 3D models, limitations: building a 3D model is expensive; it is difficult to incorporate appearance information; one must cope with 3D alignment (orientation, scale, etc.).


Multi-view models. 3D category model: a sparse set of interest points or parts of the object is linked across views.

Multi-view models by rough 3D shapes: Yan et al. 07.

Multi-view models by rough 3D shapes: Hoiem et al. 07.

Multi-view models by ISM representations [Thomas et al. 06]. Multi-view model: a sparse set of interest points or parts of the object is linked across views.

Multi-view models by ISM representations [Thomas et al. 06]. ISM representation: Leibe, Leonardis, and Schiele, ECCV Workshop on Statistical Learning in Computer Vision, 2004. Each visual codeword stores displacement vectors toward the object position; the visual codebook is used to index votes for the object position (a generalized Hough transform).
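The ISM-style generalized Hough voting above can be sketched as follows: each detected codeword casts votes for the object centre using the displacement vectors stored during training, and the peak of the accumulator gives the hypothesized position. The codebook, displacements, and detections here are hypothetical toy values:

```python
import numpy as np

# Hypothetical codebook: each visual word stores displacement vectors
# (offsets from feature location to object centre) seen during training.
codebook = {
    0: [np.array([10, 0]), np.array([12, 1])],
    1: [np.array([-8, 4])],
    2: [np.array([0, -15])],
}

def hough_votes(detections, shape=(60, 60)):
    """Each detected codeword casts votes for the object centre."""
    acc = np.zeros(shape, dtype=int)
    for word, (x, y) in detections:
        for d in codebook[word]:
            cx, cy = np.array([x, y]) + d
            if 0 <= cx < shape[0] and 0 <= cy < shape[1]:
                acc[int(cx), int(cy)] += 1   # accumulate the vote
    return acc

# Detected features consistent with an object centred at (30, 30):
dets = [(0, (20, 30)), (0, (18, 29)), (1, (38, 26)), (2, (30, 45))]
acc = hough_votes(dets)
peak = tuple(int(v) for v in np.unravel_index(acc.argmax(), acc.shape))
print(peak)  # (30, 30)
```

In the real ISM, votes are weighted by match quality and cast in a scale-space accumulator, but the indexing-and-voting structure is the same.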

Combining multi-view and ISM models: region tracks [Ferrari et al. 04, 06; Thomas et al. 06]. A set of region tracks connects the model views; each track is composed of the image regions of a single physical surface patch across the model views in which it is visible. Courtesy of Thomas et al. 06.

Summary of properties compared across single-view/mixture and multi-view models: viewpoint invariance (category and pose), degree of supervision, number of categories handled (~300 vs. 2), sharing of information across views, view synthesis, and pose estimation.