Local Features: Detection, Description & Matching Lecture 08 Computer Vision
Material Citations: Dr. George Stockman, Professor Emeritus, Michigan State University; Dr. David Lowe, Professor, University of British Columbia; Dr. Richard Szeliski, Head of Interactive Visual Media Group at Microsoft Research; Dr. Steve Seitz, Professor, University of Washington; Dr. Kristen Grauman, Associate Professor, University of Texas at Austin; Dr. Allan Jepson, Professor, University of Toronto
Suggested Readings: Chapter 4 of Richard Szeliski, Computer Vision: Algorithms and Applications, Springer, 2011. D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, 2004.
Recall: Corners as distinctive interest points. Since M is symmetric, we can diagonalize it as M = X [λ1 0; 0 λ2] X^T. The eigenvalues λ1, λ2 of M reveal the amount of intensity change in the two principal orthogonal gradient directions in the window.
Recall: Corners as distinctive interest points. Edge: λ1 >> λ2 or λ2 >> λ1. Corner: λ1 and λ2 are both large, with λ1 ~ λ2. Flat region: λ1 and λ2 are both small. One way to score the cornerness: R = λ1 λ2 - k (λ1 + λ2)^2.
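The cornerness score above can be sketched in a few lines. This is my own illustration: the function names and the classification threshold are hypothetical, and k is typically chosen in the range 0.04 to 0.06.

```python
def harris_response(lam1, lam2, k=0.05):
    """Harris cornerness score R = lam1*lam2 - k*(lam1 + lam2)^2,
    i.e. det(M) - k * trace(M)^2 in terms of the eigenvalues of M."""
    return lam1 * lam2 - k * (lam1 + lam2) ** 2

def classify(lam1, lam2, k=0.05, threshold=1.0):
    """Rough classification following the slide's three cases.
    The threshold value is illustrative, not from the lecture."""
    r = harris_response(lam1, lam2, k)
    if r > threshold:
        return "corner"   # both eigenvalues large, R strongly positive
    if r < -threshold:
        return "edge"     # one eigenvalue dominates, R negative
    return "flat"         # both eigenvalues small, |R| near zero

print(classify(10.0, 9.0))   # -> corner (lam1 ~ lam2, both large)
print(classify(10.0, 0.1))   # -> edge (lam1 >> lam2)
print(classify(0.1, 0.1))    # -> flat region
```

In practice the eigenvalues never need to be computed explicitly, since det(M) and trace(M) give the same score directly.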
Properties of the Harris corner detector. Rotation invariant? Yes: rotation changes the eigenvectors X in M = X [λ1 0; 0 λ2] X^T, but not the eigenvalues λ1, λ2. Scale invariant?
Properties of the Harris corner detector. Rotation invariant? Yes. Scale invariant? No: at a fine scale, all points along a rounded corner are classified as edges; only when the window covers the whole structure is it detected as a corner.
Image matching by Diva Sian by swashford
Harder case by Diva Sian by scgbt
Harder still? NASA Mars Rover images
Look for tiny colored squares.. NASA Mars Rover images with SIFT feature matches Figure by Noah Snavely
Image Matching. At an interesting point, let's define a coordinate system (x, y axes). Use the coordinate system to pull out a patch at that point.
Image Matching
Local features: main components. Detection: identify the interest points. Description: extract a feature vector descriptor surrounding each interest point, e.g. X1 = [x1^(1), ..., xd^(1)] and X2 = [x1^(2), ..., xd^(2)]. Matching: determine correspondence between descriptors in two views.
SIFT: Scale Invariant Feature Transform. D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, 2004. SIFT provides both a detector and a descriptor. SIFT is computationally expensive and was patented (the patent expired in 2020).
SIFT - Feature detector A SIFT keypoint is a circular image region with an orientation It is described by a geometric frame of four parameters: - Keypoint center coordinates x and y - Its scale (the radius of the region) - and its orientation (an angle expressed in radians)
Steps for Extracting Key Points. 1. Scale-space peak selection: potential locations for finding features. 2. Key point localization: accurately locating the feature key points. 3. Orientation assignment: assigning an orientation to each key point. 4. Key point descriptor: describing the key point as a high-dimensional vector.
Invariant local features. The algorithm for finding points and representing their patches should produce similar results even when conditions vary. The buzzword is invariance: geometric invariance (translation, rotation, scale) and photometric invariance (brightness, exposure, etc.).
What makes a good feature? Say we have 2 images of this scene we'd like to align by matching local features. What would be good local features (ones easy to match)? Uniqueness.
Illumination Types of invariance
Illumination Scale Types of invariance
Types of invariance Illumination Scale Rotation
Types of invariance Illumination Scale Rotation Affine
Types of invariance Illumination Scale Rotation Affine Perspective
Scale invariant interest points Independently select interest points in each image, such that the detections are repeatable across different scales
Automatic scale selection. Intuition: find the scale that gives a local maximum of some function f in both position and scale. (Figure: f plotted against region size for Image 1 and Image 2; the maxima occur at corresponding region sizes s1 and s2.)
Automatic scale selection. Intuition: find the scale that gives a local maximum of some function f in both position and scale. A common definition of f: the Laplacian, or the difference between two Gaussian-filtered images with different sigmas.
Recall: Edge detection. Convolve the signal f with the derivative of a Gaussian, d/dx g. An edge corresponds to a maximum of the response f * (d/dx g).
Recall: Edge detection. Convolve f with the second derivative of a Gaussian (the Laplacian), d^2/dx^2 g. An edge corresponds to a zero crossing of the response f * (d^2/dx^2 g).
From edges to blobs. Edge: a single ripple in the response. Blob: a superposition of two ripples. Spatial selection: the magnitude of the Laplacian response achieves a maximum at the center of the blob, provided the scale of the Laplacian is matched to the scale of the blob.
Blob detection in 2D. Laplacian of Gaussian: a circularly symmetric operator for blob detection in 2D: ∇^2 g = ∂^2 g/∂x^2 + ∂^2 g/∂y^2.
Blob detection in 2D: scale selection. The Laplacian-of-Gaussian ∇^2 g = ∂^2 g/∂x^2 + ∂^2 g/∂y^2 acts as a blob detector; its response is computed over a range of filter scales.
Blob detection in 2D. We define the characteristic scale as the scale that produces the peak of the Laplacian response.
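The characteristic-scale idea can be illustrated with a closed-form 1-D example (my own sketch, not from the slides): for a Gaussian blob of width s, the scale-normalized Laplacian response at the blob center works out to σ^2 s / (s^2 + σ^2)^(3/2), whose magnitude peaks at σ = √2 · s. Locating the peak over σ therefore recovers the blob size.

```python
def normalized_log_response(sigma, s):
    """Magnitude of the scale-normalized Laplacian at the center of a
    1-D Gaussian blob f(x) = exp(-x^2 / (2 s^2)); the sigma^2 factor
    is the usual scale normalization."""
    return sigma ** 2 * s / (s ** 2 + sigma ** 2) ** 1.5

s = 4.0                                   # "true" blob width
sigmas = [0.5 * i for i in range(1, 41)]  # candidate scales 0.5 .. 20.0
best = max(sigmas, key=lambda sig: normalized_log_response(sig, s))
print(best)  # 5.5, close to sqrt(2) * 4 ~ 5.66: the characteristic scale
```

Without the σ^2 normalization the response would decay with scale and the peak would not track the blob size.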
Scale space blob detector- Example Original image at ¾ the size Original image
Scale space blob detector- Example. Original image down-sampled to ¾ the size, then stretched back to match the original size.
Scale invariant interest points. Interest points are local maxima in both position and scale. The squared response maps of the Laplacian-of-Gaussian filter, Lxx(σ) + Lyy(σ), are computed at scales σ1, ..., σ5; the output is a list of (x, y, σ).
Scale invariant interest points: Pyramids. Smooth the image, subsample the smoothed image, and repeat until the image is small. Each cycle of this process results in a smaller image with increased smoothing but decreased spatial sampling density (that is, decreased image resolution).
Pyramids
Scale invariant interest points: Scale space (DoG method). Build a pyramid, but fill the gaps with blurred images, and take features from the differences of these images (Difference of Gaussians). If a feature is repeatably present across the Difference-of-Gaussians images, it is scale invariant and we should keep it.
Building a Scale Space. Scale-space representation L(x, y; t) at scales t = 0, 1, 4, 16, 64, 256; t = 0 corresponds to the original image.
Building a Scale Space. All scales must be examined to identify scale-invariant features. An efficient approach is to compute a Laplacian pyramid (Difference of Gaussians).
Approximation of LoG by Difference of Gaussians. From the heat equation, ∂G/∂σ = σ ∇^2 G, so the rate of change of a Gaussian filter with σ gives the Laplacian: G(x, y, kσ) - G(x, y, σ) ≈ (k - 1) σ^2 ∇^2 G. Typical values: σ = 1.6, k = √2.
Scale invariant interest points - Technical detail. We can approximate the Laplacian with a difference of Gaussians, which is more efficient to implement. Laplacian: ∇^2 L ≈ Gxx(x, y, σ) + Gyy(x, y, σ). Difference of Gaussians: DoG = G(x, y, kσ) - G(x, y, σ).
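As a quick sanity check on the (k - 1) σ^2 ∇^2 G relation, the sketch below (my own, 1-D for simplicity) compares the central value of the DoG to the scale-normalized second derivative of the Gaussian; the ratio tends to k - 1 as k approaches 1, which is why the approximation works for the small k used between pyramid levels.

```python
import math

def gauss(x, sigma):
    # 1-D Gaussian with unit area
    return math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

def gauss_xx(x, sigma):
    # second derivative of the 1-D Gaussian with respect to x
    return gauss(x, sigma) * (x * x / sigma ** 4 - 1.0 / sigma ** 2)

sigma = 1.6
for k in (2.0, math.sqrt(2), 1.1, 1.01):
    dog = gauss(0.0, k * sigma) - gauss(0.0, sigma)
    ratio = dog / (sigma ** 2 * gauss_xx(0.0, sigma))
    # the ratio approaches k - 1 as k -> 1
    print(f"k={k:.4f}: DoG / (sigma^2 * Gxx) = {ratio:.4f}, k - 1 = {k - 1:.4f}")
```

At the center the ratio is exactly 1 - 1/k, so for k = 1.01 it matches k - 1 to within about one percent.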
Building a Scale Space. Within each octave, Gaussian-blurred images are produced at scales σ, kσ, k^2 σ, k^3 σ, k^4 σ; adjacent images are subtracted to form the Difference-of-Gaussian images, and the next octave repeats the process at half resolution. Peak detection then finds local maxima/minima of the Difference-of-Gaussian images.
Building a Scale Space http://www.vlfeat.org/api/sift.html
Scale Space Peak Detection. Compare a pixel (X) with its 26 neighbors in the current and adjacent scales (9 + 9 + 8 = 26). Select the pixel if it is larger/smaller than all 26 neighbors. This yields a large number of extrema and is computationally expensive, so detect the most stable subset with a coarse sampling of scales. http://www.vlfeat.org/api/sift.html
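The 26-neighbor comparison can be sketched directly. This toy example is my own, using a hand-made 3-scale stack of 3x3 DoG images; a sample is kept only if it beats every neighbor in its 3x3x3 neighborhood.

```python
def is_extremum(dog, s, y, x):
    """dog: list of 2-D lists, one per scale. True if dog[s][y][x] is a
    strict local max or min over its 26 neighbors (8 in its own scale
    plus 9 in each adjacent scale)."""
    v = dog[s][y][x]
    neighbors = [dog[s + ds][y + dy][x + dx]
                 for ds in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if not (ds == 0 and dy == 0 and dx == 0)]
    return all(v > n for n in neighbors) or all(v < n for n in neighbors)

# Toy stack: 3 scales of 3x3 images; the center of the middle scale is a peak.
dog = [[[0, 0, 0], [0, 1, 0], [0, 0, 0]],
       [[1, 1, 1], [1, 9, 1], [1, 1, 1]],
       [[2, 2, 2], [2, 3, 2], [2, 2, 2]]]
print(is_extremum(dog, 1, 1, 1))  # True: 9 exceeds all 26 neighbors
```

Only samples with one scale above and one below can be tested, which is why the first and last DoG images of each octave contribute no keypoints.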
Key Point Localization. Candidates are chosen from extrema detection. (Figure: original image and extrema locations.)
Initial Outlier Rejection. Reject low-contrast candidates and poorly localized candidates along an edge. Take a Taylor series expansion of the DoG function D around each candidate: D(x) = D + (∂D/∂x)^T x + (1/2) x^T (∂^2 D/∂x^2) x. The minimum or maximum is located at x̂ = -(∂^2 D/∂x^2)^(-1) (∂D/∂x). The value of D at the extremum must be large: |D(x̂)| > th.
Initial Outlier Rejection Reduced from 832 key points to 729 key points, th=0.03
Further Outlier Rejection. The DoG has a strong response along edges. Treat the DoG as a surface and compute its principal curvatures: along the edge the principal curvature is very low, while across the edge it is high.
Further Outlier Rejection. Analogous to the Harris corner detector: compute the Hessian of D, H = [Dxx Dxy; Dxy Dyy], and remove outliers by evaluating the ratio Tr(H)^2 / Det(H).
Further Outlier Rejection. The quantity (r + 1)^2 / r is minimum when r = 1 and increases with r. Eliminate key points if Tr(H)^2 / Det(H) > (r + 1)^2 / r (Lowe uses r = 10).
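The edge-response test can be sketched as a small predicate; this is my own illustration (the function name is hypothetical), using the 2x2 Hessian entries of the DoG at a candidate keypoint.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if trace(H)^2 / det(H) < (r + 1)^2 / r,
    i.e. the two principal curvatures are of comparable magnitude."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # curvatures of opposite sign: reject outright
    return trace * trace / det < (r + 1) ** 2 / r

print(passes_edge_test(10.0, 9.0, 1.0))   # blob-like: similar curvatures -> keep
print(passes_edge_test(100.0, 1.0, 0.0))  # edge-like: one curvature dominates -> reject
```

Working with the trace and determinant avoids computing the eigenvalues themselves, exactly as in the Harris detector.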
Further Outlier Rejection Reduced from 729 key points to 536 key points
Local features: main components. Detection: identify the interest points. Description: extract a feature vector descriptor surrounding each interest point, e.g. X1 = [x1^(1), ..., xd^(1)] and X2 = [x1^(2), ..., xd^(2)]. Matching: determine correspondence between descriptors in two views.
How to achieve invariance. Need both of the following: 1. Make sure your detector is invariant. Harris is invariant to translation and rotation; scale is trickier: a common approach is to detect features at many scales using a Gaussian pyramid, and more sophisticated methods find the best scale to represent each feature (e.g., SIFT). 2. Design an invariant feature descriptor. A descriptor captures the information in a region around the detected feature point. The simplest descriptor is a square window of pixels; a better approach is the SIFT descriptor.
SIFT - Feature descriptors. The SIFT descriptor is a 3-D spatial histogram of the image gradients characterizing the appearance of a keypoint.
SIFT - Feature descriptors. Use histograms to bin pixels within sub-patches according to their orientation (0 to 2π).
Making the descriptor rotation invariant. To achieve rotation invariance, compute the central derivatives, gradient magnitude, and gradient direction of L (the smoothed image) at the scale of the key point (x, y).
Orientation assignment. Create a weighted direction histogram in a neighborhood of a key point: 36 bins of 10 degrees each, covering 360 degrees. The weights are the gradient magnitudes, multiplied by a spatial Gaussian with σ = 1.5 × the scale of the key point.
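The 36-bin histogram can be sketched as follows. This is an illustrative example of my own (the spatial Gaussian weight is omitted for brevity; only the gradient-magnitude weighting is shown).

```python
def orientation_histogram(angles_deg, magnitudes, n_bins=36):
    """Each gradient sample votes into a 10-degree bin,
    weighted by its gradient magnitude."""
    hist = [0.0] * n_bins
    for theta, mag in zip(angles_deg, magnitudes):
        b = int(theta % 360 // (360 // n_bins))
        hist[b] += mag
    return hist

# Toy patch whose gradients cluster around 90 degrees:
angles = [88, 91, 93, 250, 89]
mags = [2.0, 3.0, 1.0, 0.5, 2.5]
hist = orientation_histogram(angles, mags)
dominant = max(range(36), key=lambda b: hist[b])
print(dominant * 10)  # 80: the dominant bin covers [80, 90) degrees
```

The peak bin gives the key point's direction; full SIFT also creates additional keypoints for any secondary peak within 80% of the highest one.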
Orientation assignment. Select the highest peak as the direction of the key point. Gradient orientation histograms give a robust representation.
Orientation assignment. Find the dominant orientation of the image patch. This is given by x+, the eigenvector of H corresponding to λ+ (the larger eigenvalue). Rotate the patch according to this angle.
Orientation assignment. Rotate the patch according to its dominant gradient orientation; this puts the patches into a canonical orientation. Gradient orientation is more stable than raw intensity values.
SIFT - Feature descriptors. An extraordinarily robust matching technique: it can handle changes in viewpoint (up to about 60 degrees of out-of-plane rotation), can handle significant changes in illumination (sometimes even day vs. night), and is fast and efficient enough to run in real time. http://people.csail.mit.edu/albert/ladypack/wiki/index.php?title=known_implementations_of_sift
SIFT - Feature descriptors. Take a 16x16 square window around the detected feature. Compute the edge orientation (angle of the gradient) for each pixel. Throw out weak edges (threshold the gradient magnitude). Create a histogram of the surviving edge orientations over 0 to 2π. (Figure: image gradients, angle histogram, keypoint descriptor.)
SIFT - Feature descriptors. Divide the 16x16 window into a 4x4 grid of cells (a 2x2 case is shown below). Compute an orientation histogram for each cell: 16 cells × 8 orientations = a 128-dimensional descriptor.
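The 4x4x8 layout can be sketched as below. This is a rough illustration of the descriptor's structure only: real SIFT additionally applies a Gaussian weighting over the window, trilinear interpolation between bins, and normalization of the final vector, all of which are omitted here.

```python
def sift_like_descriptor(orient, mag):
    """orient, mag: 16x16 lists of gradient orientation (degrees) and
    magnitude. Returns an unnormalized 128-vector: a 4x4 grid of cells,
    each summarized by an 8-bin orientation histogram."""
    desc = []
    for cy in range(4):
        for cx in range(4):
            hist = [0.0] * 8
            for y in range(cy * 4, cy * 4 + 4):
                for x in range(cx * 4, cx * 4 + 4):
                    b = int(orient[y][x] % 360 // 45)  # 8 bins of 45 degrees
                    hist[b] += mag[y][x]
            desc.extend(hist)
    return desc

# Synthetic 16x16 patch with varying orientations and unit magnitudes:
orient = [[(x * 23 + y * 7) % 360 for x in range(16)] for y in range(16)]
mag = [[1.0] * 16 for _ in range(16)]
d = sift_like_descriptor(orient, mag)
print(len(d))  # 128
```

Pooling gradients into coarse cells is what makes the descriptor tolerant to small shifts of the patch.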
Local features: main components. Detection: identify the interest points. Description: extract a feature vector descriptor surrounding each interest point, e.g. X1 = [x1^(1), ..., xd^(1)] and X2 = [x1^(2), ..., xd^(2)]. Matching: determine correspondence between descriptors in two views.
Matching local features. Given a feature in I1, how do we find the best match in I2? Define a distance function that compares two descriptors.
Matching local features. To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD). Simplest approach: compare them all and take the closest (or the closest k, or all within a thresholded distance).
Matching local features. How to define the difference between two features f1, f2? A simple approach is SSD(f1, f2): the sum of squared differences between the entries of the two descriptors. However, SSD can give good scores to very ambiguous (bad) matches!
Matching local features. How to define the difference between two features f1, f2? A better approach: ratio = SSD(f1, f2) / SSD(f1, f2'), where f2 is the best SSD match to f1 in I2 and f2' is the 2nd-best SSD match to f1 in I2. This gives large values (near 1) for ambiguous matches and small values for distinctive ones.
Ambiguous matches. To add robustness to matching, consider the ratio: Euclidean distance to the best match / Euclidean distance to the second-best match. If low, the first match looks good; if high, the match could be ambiguous.
Matching SIFT Descriptors. Nearest neighbor (Euclidean distance); threshold the ratio of the nearest to the 2nd-nearest descriptor distance.
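Nearest-neighbor matching with the ratio test can be sketched as follows. This is my own minimal version: descriptors are plain lists, SSD is the distance, and since SSD is a squared distance the ratio threshold is squared before comparison (Lowe's paper applies a 0.8 threshold to Euclidean distances).

```python
def ssd(a, b):
    # sum of squared differences between two descriptors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match(desc1, desc2, ratio_thresh=0.8):
    """Return (i, j) pairs where desc1[i]'s nearest neighbor desc2[j]
    passes the ratio test against the second-nearest neighbor."""
    matches = []
    for i, f1 in enumerate(desc1):
        order = sorted(range(len(desc2)), key=lambda j: ssd(f1, desc2[j]))
        best, second = order[0], order[1]
        # squared threshold because SSD is a squared distance
        if ssd(f1, desc2[best]) < ratio_thresh ** 2 * ssd(f1, desc2[second]):
            matches.append((i, best))
    return matches

d1 = [[0.0, 0.0], [5.0, 5.0]]
d2 = [[0.1, 0.0], [5.1, 5.0], [9.0, 9.0]]
print(match(d1, d2))  # [(0, 0), (1, 1)]: both features match unambiguously
```

A brute-force scan like this is quadratic; real implementations use approximate nearest-neighbor search for large descriptor sets.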
Evaluating the results. How can we measure the performance of a feature matcher? (Figure: three candidate matches with feature distances 50, 75, and 200.)
True/false positives. The distance threshold affects performance. True positives = # of detected matches that are correct; suppose we want to maximize these: how to choose the threshold? False positives = # of detected matches that are incorrect; suppose we want to minimize these: how to choose the threshold? (Figure: distances 50 and 75 are true matches; 200 is a false match.)
Evaluating the results. How can we measure the performance of a feature matcher? True positive rate = # true positives / # matching features (positives). False positive rate = # false positives / # unmatched features (negatives). (Figure: an example operating point with true positive rate 0.7 at false positive rate 0.1.)
Evaluating the results. How can we measure the performance of a feature matcher? ROC curve ("Receiver Operating Characteristic"): generated by counting the # of correct/incorrect matches for different thresholds, plotting the true positive rate (# true positives / # matching features) against the false positive rate (# false positives / # unmatched features). We want to maximize the area under the curve (AUC); this is useful for comparing different feature matching methods. For more info: http://en.wikipedia.org/wiki/Receiver_operating_characteristic
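One point on the ROC curve can be computed as below; this toy example is my own (the distances and ground-truth labels are made up), and sweeping the threshold over all distances traces out the full curve.

```python
def roc_point(distances, is_true_match, threshold):
    """A match is 'detected' when its distance falls below the threshold.
    Returns (true positive rate, false positive rate)."""
    tp = sum(1 for d, t in zip(distances, is_true_match) if d < threshold and t)
    fp = sum(1 for d, t in zip(distances, is_true_match) if d < threshold and not t)
    positives = sum(is_true_match)              # total true matches
    negatives = len(is_true_match) - positives  # total false matches
    return tp / positives, fp / negatives

dists = [50, 75, 200, 60, 180]
truth = [True, True, False, True, False]
print(roc_point(dists, truth, threshold=100))  # (1.0, 0.0): a perfect threshold
```

Here a threshold of 100 accepts all three true matches and neither false one; a very large threshold would accept everything, giving the (1.0, 1.0) corner of the curve.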
More on feature detection/description
Object recognition results David Lowe