Combining Appearance and Topology for Wide Baseline Matching Dennis Tell and Stefan Carlsson Presented by: Josh Wills

Image Point Correspondences Critical foundation for many vision applications: 3-D reconstruction, object detection/recognition, etc. Trivial problem for the human visual system; extremely difficult problem computationally.

Correspondences Correspondences are pairs of points (one from the left image and one from the right image) that correspond to the same real world point. They are often represented by point-line pairs in a set of images.

Difficulties in matching Many matches are only available via higher-level reasoning; the perceived match disappears locally. Matches may rely on prior knowledge of objects: we understand what a given object would look like from varying viewpoints and lighting conditions.

Previous Work Invariant descriptors: Schmid and Mohr (affine invariant descriptors), Belongie (shape contexts). Intensity profiles: Tell and Carlsson. Robust model estimation: Torr (RANSAC).

Presentation Outline Common approach to matching Intensity Profiles Topological Constraints Results Discussion

Common approach to matching Locate interest points (Harris corners). Attach a descriptor to each point and compute a measure of similarity for each pair of points (normalized correlation on intensity). Choose candidate matches (maximum similarity or the Hungarian algorithm). Validate matches using robust estimation (RANSAC for a homography or fundamental matrix).
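As a concrete illustration, here is a minimal sketch of that generic pipeline, assuming OpenCV and NumPy; the patch size, correlation threshold, and greedy maximum-similarity matching are illustrative choices rather than the settings of any particular system.

```python
import cv2
import numpy as np

def detect_corners(gray, max_pts=500, quality=0.01, min_dist=8):
    # Harris-style interest points
    pts = cv2.goodFeaturesToTrack(gray, max_pts, quality, min_dist,
                                  useHarrisDetector=True, k=0.04)
    return pts.reshape(-1, 2) if pts is not None else np.empty((0, 2))

def patch(gray, p, r=7):
    # Fixed-size square patch centered on the point (None near the border)
    x, y = int(round(p[0])), int(round(p[1]))
    if r <= x < gray.shape[1] - r and r <= y < gray.shape[0] - r:
        return gray[y - r:y + r + 1, x - r:x + r + 1].astype(np.float32)
    return None

def ncc(a, b):
    # Normalized cross-correlation of two patches
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

def match_images(img1, img2):
    g1, g2 = (cv2.cvtColor(i, cv2.COLOR_BGR2GRAY) for i in (img1, img2))
    p1, p2 = detect_corners(g1), detect_corners(g2)
    patches2 = [(b, patch(g2, b)) for b in p2]
    patches2 = [(b, pb) for b, pb in patches2 if pb is not None]
    pairs = []
    for a in p1:
        pa = patch(g1, a)
        if pa is None or not patches2:
            continue
        # Greedy "maximum similarity" choice; the Hungarian algorithm
        # would give a globally one-to-one assignment instead
        s, b = max(((ncc(pa, pb), b) for b, pb in patches2), key=lambda t: t[0])
        if s > 0.8:
            pairs.append((a, b))
    if len(pairs) < 4:
        return None, None
    src = np.float32([a for a, _ in pairs])
    dst = np.float32([b for _, b in pairs])
    # Robust validation: fit a homography with RANSAC, keep the inliers
    return cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
```

The inlier mask returned by findHomography is the result of the final validation step; swapping findHomography for findFundamentalMat gives the epipolar-geometry variant.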

Interest Points Many of the pixels in an image are redundant. Computation is expensive, so we want a set of points that are interesting. The types of points we are looking for are those centered in rank 1 or 2 neighborhoods.

Harris Corner Detector For each point, compute a corner-ness score for an image patch centered on that point: R = det(M) - k*(trace(M))^2, where M is the second-moment matrix built from the patch gradients and k is routinely 0.04. Keep all points above some threshold. When only the trace is large there is an edge (rank 1); when the determinant is also large there is a corner (rank 2).
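A short sketch of that corner-ness computation, assuming NumPy and SciPy; the Sobel gradients, Gaussian window, and sigma are illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(gray, sigma=1.5, k=0.04):
    # Image gradients
    Ix = sobel(gray.astype(float), axis=1)
    Iy = sobel(gray.astype(float), axis=0)
    # Entries of the second-moment matrix M, averaged over a local window
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # Corner-ness: R = det(M) - k * trace(M)^2
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2   # threshold this map to keep interest points
```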

Point Descriptors Local v. Relational Local point descriptors attempt to capture the appearance of a small patch of the image centered on a given point. Gray values Filter responses With relational point descriptors, a point is characterized by its relation to the other points in the scene. Shape contexts

Gray value point descriptors The gray values in a small neighborhood centered on a point may be strung out as a feature vector for that point. These vectors may be compared using SSD or cross correlations These descriptors allow for very efficient comparison of points, but have little invariance to changes between the two images

Filter-based point descriptors A set of oriented filters is applied to each image, and the response of each filter at a given point forms another type of feature vector. This allows for more invariance to absolute intensity, and a degree of rotational invariance is possible by cyclically rotating the feature vectors.
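A sketch of the idea, assuming NumPy and SciPy; the first-derivative-of-Gaussian filter bank and its parameters below are illustrative stand-ins for whatever oriented filters a particular system would use.

```python
import numpy as np
from scipy.ndimage import convolve

def oriented_filter_bank(size=15, sigma=3.0, n_orient=8):
    # Derivative-of-Gaussian filters at n_orient orientations around the circle
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    bank = []
    for i in range(n_orient):
        t = 2 * np.pi * i / n_orient
        d = np.cos(t) * xx + np.sin(t) * yy      # coordinate along direction t
        f = -d / sigma ** 2 * g                  # derivative of Gaussian along t
        bank.append(f / np.abs(f).sum())
    return bank

def descriptors(gray, points, bank):
    # Feature vector for an (x, y) point = the response of every filter there
    responses = [convolve(gray.astype(float), f) for f in bank]
    return [np.array([r[y, x] for r in responses]) for x, y in points]

def rotation_tolerant_distance(d1, d2):
    # An image rotation by a multiple of 2*pi/n roughly shifts the response
    # vector cyclically, so compare under all cyclic shifts
    return min(np.linalg.norm(np.roll(d1, s) - d2) for s in range(len(d1)))
```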

Shape contexts The points are classified by their positional relation to the other points in the image. Log polar histograms are centered on each point and the chi-squared score for each pair is computed. There is a degree of affine/rotational invariance in using this point descriptor
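A compact sketch of a shape-context-style descriptor and its chi-squared comparison, assuming NumPy; the bin counts, radial limits, and scale normalization are illustrative rather than Belongie's exact settings.

```python
import numpy as np

def shape_context(points, center, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    # Log-polar histogram of the positions of the other points relative to 'center'
    pts = np.asarray(points, dtype=float)
    d = pts - np.asarray(center, dtype=float)
    r = np.hypot(d[:, 0], d[:, 1])
    keep = r > 1e-9                          # drop the center point itself
    r, theta = r[keep], np.arctan2(d[keep, 1], d[keep, 0])
    r = r / (r.mean() + 1e-9)                # crude scale normalization
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
    t_edges = np.linspace(-np.pi, np.pi, n_theta + 1)
    hist, _, _ = np.histogram2d(r, theta, bins=[r_edges, t_edges])
    return hist.ravel() / max(hist.sum(), 1.0)

def chi_squared(h1, h2):
    # Chi-squared score between two normalized histograms
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-9))
```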

Validating Point Matches Due to the difference in appearance between the two images and the limitations of the point matching approaches, mismatches are all but guaranteed. One method for sifting out these mismatches is to fit an image relation model to the data and see which matches are consistent with the estimated model. Robust estimation is used to suppress the effect of outliers on the model estimation.

Robust Estimation What set of model parameters best explains the data? Here, what a and b values in the equation for a line result in the best fit to this point set, which is known to have outliers?

Least Squares Try to minimize the squared error for each of the points in the point set Very fast and efficient Highly influenced by outliers - notice the effect of the outlier on the right

RANSAC - Random Sample Consensus Find the parameters that result in the largest number of inliers - points that fit the model. Repeatedly choose samples of the minimal size, recover the appropriate model, and count inliers. P.H.S. Torr brought this approach to computer vision.
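The contrast between the two estimators can be made concrete with a line-fitting sketch, assuming NumPy; the iteration count and inlier threshold are illustrative.

```python
import numpy as np

def fit_line_lsq(pts):
    # Ordinary least squares for y = a*x + b; a single gross outlier
    # can drag the whole fit
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

def fit_line_ransac(pts, n_iter=500, thresh=1.0, seed=None):
    # RANSAC: repeatedly fit the minimal sample (2 points) and keep the
    # hypothesis with the largest consensus set of inliers
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iter):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if abs(x2 - x1) < 1e-12:
            continue                          # degenerate (vertical) sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = np.abs(pts[:, 1] - (a * pts[:, 0] + b)) < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None:
        return None, None
    # Refit by least squares on the inliers of the best hypothesis only
    return fit_line_lsq(pts[best_inliers]), best_inliers
```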

Estimation for image relations This same approach is applied to the estimation of the image relation. Here we see the application of a homography that was computed via RANSAC. The outliers are shown in red.

Homography A cyclic (circular) order-preserving transformation, just as a similarity is angle-preserving and an affinity is parallel-line-preserving. It is also the space of transformations relating perspective views of a plane.

Description of the Approach Extract interest points with the Harris corner detector. For each intra-image pair of points, extract the intensity profile joining them. Compare each inter-image pair of profiles via their Fourier feature vectors and add votes for the endpoints of similar profiles. Perform string matching to ensure one-to-one matching and to preserve cyclic order. Using RANSAC, eliminate more outliers and find the homography and/or fundamental matrix.
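A brute-force sketch of the profile extraction and voting steps, assuming NumPy. The sample count and distance threshold are illustrative, the `describe` argument is a stand-in for the Fourier descriptor sketched further below, only one endpoint ordering is voted on, and the nested loops ignore the efficiency concerns a real implementation would face.

```python
import numpy as np

def sample_profile(gray, p, q, n=32):
    # Intensity profile: gray values at n evenly spaced positions along
    # the straight line segment joining interest points p and q
    t = np.linspace(0.0, 1.0, n)
    xs = np.clip((p[0] + t * (q[0] - p[0])).round().astype(int), 0, gray.shape[1] - 1)
    ys = np.clip((p[1] + t * (q[1] - p[1])).round().astype(int), 0, gray.shape[0] - 1)
    return gray[ys, xs].astype(float)

def accumulate_votes(gray1, pts1, gray2, pts2, describe, thresh=0.5):
    # Every sufficiently similar pair of profiles casts a vote for the
    # correspondence of its endpoints
    desc2 = {(k, l): describe(sample_profile(gray2, pts2[k], pts2[l]))
             for k in range(len(pts2)) for l in range(k + 1, len(pts2))}
    votes = np.zeros((len(pts1), len(pts2)))
    for i in range(len(pts1)):
        for j in range(i + 1, len(pts1)):
            d1 = describe(sample_profile(gray1, pts1[i], pts1[j]))
            for (k, l), d2 in desc2.items():
                if np.linalg.norm(d1 - d2) < thresh:
                    votes[i, k] += 1
                    votes[j, l] += 1
    return votes   # string matching and RANSAC then prune this vote matrix
```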

Planar Assumption For intensity profiles to be reliable and for the preservation of cyclic order, image regions must be related by a homography. This requires that the regions correspond to real-world planar objects.

Intensity Profiles Invariant to affine transformations if the profile is on a planar surface. For many perspective transformations, there is an affine approximation that is close for a given 1D line. The only difference will be a change in scale and possibly some high-frequency variation.

Fourier Feature Vectors These features are chosen primarily for their scale invariance and invariance to high-frequency noise. Choosing N coefficients gives N/2 sine and N/2 cosine basis functions.
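A minimal sketch of such a descriptor, assuming NumPy; the fixed resampling length and number of coefficients are illustrative choices.

```python
import numpy as np

def fourier_descriptor(profile, n_coeff=8):
    # Low-order Fourier coefficients of a 1D intensity profile: resampling to
    # a fixed length and dropping high frequencies gives tolerance to scale
    # change and high-frequency noise; removing the mean and normalizing the
    # magnitude gives tolerance to intensity offset and contrast change
    x = np.interp(np.linspace(0, 1, 64), np.linspace(0, 1, len(profile)), profile)
    x = x - x.mean()
    c = np.fft.rfft(x)[1:n_coeff + 1]          # first n_coeff non-DC coefficients
    feat = np.concatenate([c.real, c.imag])    # cosine and sine coefficients
    return feat / (np.linalg.norm(feat) + 1e-9)
```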

Intensity Profile Matching Matches constitute votes for the correspondences of the vertices of the intensity profiles

Drawbacks of the Original Method Biased voting - points with common intensity profiles may receive superfluous votes for correspondences in the vote matrix Only local information - matches that are ambiguous using only local information may be unique when considering relational data.

Biased voting The point p would have 4 votes for a match with itself, but it would have 6 votes for a match with q. This can be solved by enforcing a one-to-one constraint on the votes.
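One way to impose that constraint is to treat the vote matrix as an assignment problem, as sketched below with SciPy's Hungarian-algorithm solver; the min_votes cutoff is an illustrative parameter, not a value from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_one_matches(votes, min_votes=3):
    # One-to-one constraint: each point in image 1 is paired with at most one
    # point in image 2, chosen so the total number of votes is maximized
    rows, cols = linear_sum_assignment(-votes)     # negate to maximize
    return [(r, c) for r, c in zip(rows, cols) if votes[r, c] >= min_votes]
```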

Local and Topological features Another drawback of the original paper is that it doesn't use some of the semi-local information to disambiguate the matches. The cyclic order of correct matches is preserved for planar regions.

String Matching Here we can see the intensity profiles emanating from the points p and q. Each of these profiles can represent a letter in a string, and the maximum number of matching profiles that preserve cyclic order can be formulated as a maximum substring matching problem.

String Matching Algorithm

Cyclic String Matching We don't know the starting points of the strings, which can affect the length of the match that is found. An algorithm by Gregor and Thomason extends the standard string matching approach to cyclic string matching with a running time of O(mn log n).
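A brute-force sketch of the idea, assuming the profiles around each point have already been mapped to symbols; the longest-common-subsequence length is maximized over every rotation of one string, which is the quantity that the Gregor and Thomason algorithm computes far more efficiently.

```python
def lcs_length(a, b):
    # Standard longest-common-subsequence dynamic program, O(len(a) * len(b))
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ca == cb else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def cyclic_match_length(a, b):
    # The starting point of the strings is unknown, so try every rotation of
    # one of them; O(m * n^2) compared with the O(mn log n) algorithm
    return max(lcs_length(a[s:] + a[:s], b) for s in range(len(a)))
```

For example, cyclic_match_length("abcde", "cdeab") returns 5, since one string is a rotation of the other.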

Results of String Matching The string matching step significantly reduced the noise in the vote matrix by disambiguating many matches The combination of local and semi-local feature comparison results in more discriminative power

Results 149 matches 47 matches

More Results 109 matches 29 matches

Discussion Contributions Intensity profiles Topological constraints Limitation Planar assumption Possible extensions/applications Pairing with an existing method Tracking for image mosaicing

Intensity Profiles Invariant to affine transformations and tolerant to perspective transformations on planar surfaces Very good descriptors for wide-baseline pairs since often the image change is so severe that most other methods will not find a match However, they are unstable if the localization of the end points is not exact. A slight change in the vertices may completely change the content of the intensity profile

Topological Constraints Provide more discriminative power and lead to disambiguated matches. Require that interest points are detected in both images - something that is not solved. Schmid and Mohr show that no interest point detector (short of one that chooses all pixels) is guaranteed to find both projections of a real-world point.

Planar assumption This may be very limiting. Many scenes contain few or no planar regions. Parallax and occlusion are common in wide-baseline stereo, and both will break intensity profile matching. This method will only allow the estimation of a fundamental matrix if at least two planes are present - a point set lying on a single plane leads to a degenerate solution for the matrix.

Pairing with existing systems If one out of the seven points used to estimate the fundamental matrix is not coplanar with the rest of the points, then a solution is possible. The matches found with this method could be combined with the matches from another method (say, Jones and Malik). Since the sample space of RANSAC is correspondences, it doesn't matter how those correspondences were generated, and there is no difficulty in integrating multiple sets.

Tracking/Image Mosaicing Most of the scene appears roughly planar since there is a large distance from the camera to the points in the scene. This may require a slight change in the comparison of the intensity profiles since there will be occlusion from the vehicle.

Sentence to walk away with By combining intensity profile matching and topological constraints, Tell and Carlsson balanced the opposing forces of invariance and discriminative power to achieve good results in point matching for planar regions.

References
Dennis Tell and Stefan Carlsson. Wide Baseline Point Matching Using Affine Invariants Computed from Intensity Profiles. In ECCV, vol. 1, pp. 814-828, 2000.
P. H. S. Torr and D. W. Murray. The Development and Comparison of Robust Methods for Estimating the Fundamental Matrix. IJCV, pp. 271-300, 1997.
C. Schmid, R. Mohr, and C. Bauckhage. Evaluation of Interest Point Detectors. IJCV, 37(2), pp. 151-172, 2000.
David Jones and Jitendra Malik. A Computational Framework for Determining Stereo Correspondence from a Set of Linear Spatial Filters. In ECCV, pp. 395-410, 1992.
Dennis Tell. Wide Baseline Matching with Applications to Visual Servoing. PhD thesis, Dept. of Numerical Analysis and Computer Science, KTH, Stockholm, Sweden, 2002. TRITA-NA-0205, ISSN 0348-2952, ISBN 91-7283-254-1.

Acknowledgements Ben Ochoa, for providing the mosaicing videos. The music of Sameer Agarwal.