Local Image Features. Ali Borji, UWM. Many slides from James Hays, Derek Hoiem, and the Grauman & Leibe 2008 AAAI tutorial.
Overview of Keypoint Matching: 1. Find a set of distinctive keypoints. 2. Define a region around each keypoint. 3. Extract and normalize the region content. 4. Compute a local descriptor from the normalized region. 5. Match local descriptors: accept a correspondence between features f_A and f_B when d(f_A, f_B) < T. K. Grauman, B. Leibe
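The matching step (5) can be sketched as a nearest-neighbour search with a distance threshold T. A minimal sketch in NumPy; the function name and the toy 3-D descriptors are illustrative stand-ins (real descriptors such as SIFT are 128-D):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, T=0.5):
    """For each descriptor f_A in desc_a, accept its nearest
    neighbour f_B in desc_b when d(f_A, f_B) < T."""
    matches = []
    for i, fa in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - fa, axis=1)  # Euclidean d(f_A, f_B)
        j = int(np.argmin(dists))
        if dists[j] < T:
            matches.append((i, j))
    return matches

# Toy 3-D descriptors standing in for real 128-D SIFT vectors.
A = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
B = np.array([[0.1, 0.0, 0.9], [0.0, 1.0, 0.0]])
print(match_descriptors(A, B, T=0.5))  # -> [(0, 0)]: only the first pair is close
```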
Why local features? Local Feature = Interest Point = Keypoint = Feature Point = Distinguished Region = Covariant Region (plus a local descriptor). Local: robust to occlusion and clutter. Invariant: to scale, viewpoint, and illumination changes. A local feature can also be discriminative. Slide credit: Liangliang Cao
4 Slide credit: B. Freeman and A. Torralba
Image/patch representations Templates Intensity, gradients, etc. Histograms Color, texture, SIFT descriptors, etc.
Development of Low-Level Image Features (circa 1999 onward). Classical features: raw pixels; histogram features (color histogram, edge histogram); frequency analysis; image filters; texture features (LBP); scene features (GIST); shape descriptors; edge detection; corner detection. Local descriptors: SIFT, HOG, SURF, DAISY, BRIEF. Detectors: DoG, Hessian detector, Harris-Laplace, FAST, ORB. Slide credit: Liangliang Cao
Concatenating Raw Pixels as a 1D Vector. Credit: The Face Research Lab. Slide credit: Liangliang Cao
Concatenated Raw Pixels. Famous applications (widely used in the ML field): face recognition and hand-written digit recognition. Pictures courtesy of Sam Roweis. Slide credit: Liangliang Cao
Raw patches as local descriptors. The simplest way to describe the neighborhood around an interest point is to write down the list of intensities to form a feature vector. But this is very sensitive to even small shifts and rotations. Slide credit: K. Grauman
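The raw-patch descriptor described above fits in a few lines; a sketch assuming a grayscale NumPy image, with `raw_patch_descriptor` as a hypothetical helper name. The last lines illustrate the shift sensitivity the slide warns about:

```python
import numpy as np

def raw_patch_descriptor(image, x, y, half=2):
    """Describe a keypoint by the flattened intensities of the
    (2*half+1) x (2*half+1) window centred on (x, y)."""
    patch = image[y - half:y + half + 1, x - half:x + half + 1]
    return patch.astype(np.float32).ravel()

img = np.arange(100, dtype=np.uint8).reshape(10, 10)
f = raw_patch_descriptor(img, x=5, y=5, half=1)
print(f.shape)  # (9,) -- a 3x3 window flattened

# Shift the window by a single pixel: every entry changes,
# illustrating the sensitivity to small shifts.
g = raw_patch_descriptor(img, x=6, y=5, half=1)
print(np.linalg.norm(f - g))  # -> 3.0 (each of the 9 entries shifts by 1)
```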
Tiny Images. Antonio Torralba et al. proposed resizing images to 32x32 color thumbnails, called tiny images. Related applications: scene recognition, object recognition. Fast, but with limited accuracy. Torralba et al., MIT CSAIL report, 2007. Slide credit: Liangliang Cao
Problems with raw-pixel representations: they rely heavily on good alignment, assume the images are of similar scale, suffer under occlusion, and make recognition from a different viewpoint difficult. We want more powerful features for real-world problems like the following. Slide credit: Liangliang Cao
Image Representations: Histograms. A global histogram represents the distribution of features: color, texture, depth, etc. (example: Space Shuttle cargo bay image). Images from Dave Kauchak
Image Representations: Histograms. A histogram stores the probability or count of data in each bin. Joint histogram: requires lots of data; loses resolution to avoid empty bins. Marginal histogram: requires independent features; more data per bin than a joint histogram. Images from Dave Kauchak
Color Histogram. Each pixel is described by a vector of pixel values (r, g, b); the distribution of color vectors is described by a histogram. Note: there are different choices of color space: RGB, HSV, Lab, etc. For gray images, we usually use 256 or fewer histogram bins. [Swain and Ballard 91] Slide credit: Liangliang Cao
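The color histogram above is a joint histogram over (r, g, b). A sketch with NumPy's `histogramdd`, using 8 bins per channel (8^3 = 512 bins in total); the toy two-color image is illustrative:

```python
import numpy as np

def color_histogram(image, bins=8):
    """3-D joint histogram over (r, g, b), normalized to sum to 1.
    `image` is H x W x 3 with values in [0, 256)."""
    pixels = image.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=bins, range=[(0, 256)] * 3)
    return hist / hist.sum()

img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :2] = [255, 0, 0]     # left half red
img[:, 2:] = [0, 0, 255]     # right half blue
h = color_histogram(img)
print(h.shape)  # (8, 8, 8)
print(h.max())  # 0.5 -- half the pixels fall into a single (r, g, b) bin
```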
What kind of things do we compute histograms of? Color (L*a*b* color space, HSV color space); texture (filter banks or HOG over regions).
Benefits of Histogram Representations. No longer sensitive to alignment, scale transforms, or even global rotation. Images with similar content yield similar color histograms (after normalization). Slide credit: Liangliang Cao
Limitation of Global Histogram Global histogram has no location information at all All of these images have the same color histogram 17 Slide credit: Ali Farhadi
Spatial pyramid representation. Extension of a bag of features: a locally orderless representation at several levels of resolution (level 0, level 1, level 2). Lazebnik, Schmid & Ponce (CVPR 2006). Slide credit: Ali Farhadi
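The pyramid can be sketched as concatenated per-cell bag-of-words histograms; `spatial_pyramid` is a hypothetical helper, the input is a precomputed map of visual-word labels, and the per-level weighting used by Lazebnik et al. is omitted for brevity:

```python
import numpy as np

def spatial_pyramid(label_map, n_words, levels=3):
    """Bag-of-words histograms at several grid resolutions.
    Level l splits the image into 2^l x 2^l cells; the per-cell
    histograms are concatenated into one long vector."""
    h, w = label_map.shape
    feats = []
    for l in range(levels):
        cells = 2 ** l
        for i in range(cells):
            for j in range(cells):
                cell = label_map[i*h//cells:(i+1)*h//cells,
                                 j*w//cells:(j+1)*w//cells]
                feats.append(np.bincount(cell.ravel(), minlength=n_words))
    return np.concatenate(feats)

# Hypothetical 16x16 map of visual-word indices (vocabulary of 10 words).
words = np.random.default_rng(0).integers(0, 10, size=(16, 16))
v = spatial_pyramid(words, n_words=10)
print(v.shape)  # (210,): (1 + 4 + 16) cells x 10 words
```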
General Histogram Representation. Color histogram: extract a color descriptor from each patch, then accumulate a histogram. The same pipeline works for other descriptors: edge histograms, shape context histograms, local binary patterns. Slide credit: Liangliang Cao
Local Binary Pattern (LBP). For each pixel p, create an 8-bit number b1 b2 b3 b4 b5 b6 b7 b8, where bi = 0 if neighbor i has a value less than or equal to p's value, and 1 otherwise. Represent the texture in the image (or a region) by the histogram of these numbers. Example: for the 3x3 patch [[100, 101, 103], [40, 50, 80], [50, 60, 90]] with center value 50, reading the neighbors clockwise from the top-left gives the bits 1 1 1 1 1 1 0 0. Ojala et al., PR 96, PAMI 02. Slide credit: Liangliang Cao
LBP Histogram. Divide the examined window into cells (e.g. 16x16 pixels per cell). Compute, over each cell, the histogram of the frequency of each LBP number occurring. Optionally normalize the histogram. Concatenate the (normalized) histograms of all cells. Slide credit: Liangliang Cao
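The two LBP steps (a per-pixel 8-bit code, then a histogram of codes) can be sketched as follows. This is the basic, non-uniform variant, with the neighbour ordering chosen to match the slide's example:

```python
import numpy as np

def lbp_image(gray):
    """Basic 8-neighbour LBP: bit i is 1 when neighbour i is strictly
    greater than the centre pixel (0 when less than or equal)."""
    # 8 neighbour offsets, clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = gray[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neigh > centre).astype(np.uint8) << (7 - bit)
    return out

# The slide's worked example: centre 50 -> bits 1 1 1 1 1 1 0 0.
patch = np.array([[100, 101, 103],
                  [ 40,  50,  80],
                  [ 50,  60,  90]])
code = int(lbp_image(patch)[0, 0])
print(format(code, '08b'))  # -> 11111100

# Texture representation for a cell: histogram of the 256 possible codes.
region = np.random.default_rng(0).integers(0, 256, size=(18, 18))
hist = np.bincount(lbp_image(region).ravel(), minlength=256)
```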
Histogram of Oriented Gradients (HOG) descriptor: steps. Histograms of Oriented Gradients for Human Detection [N. Dalal and B. Triggs, CVPR 2005]
SIFT descriptor [Lowe 2004]. Full version: divide the 16x16 window into a 4x4 grid of cells; compute an orientation histogram (8 bins over [0, 2π)) for each cell; 16 cells * 8 orientations = a 128-dimensional descriptor. Why subpatches? Why does SIFT have some illumination invariance? Slide credit: K. Grauman
SIFT descriptor. Full version: divide the 16x16 window into a 4x4 grid of cells (a 2x2 case is shown below); compute an orientation histogram for each cell; 16 cells * 8 orientations = a 128-dimensional descriptor. Threshold-normalize the descriptor: normalize to unit length, clamp each entry to at most 0.2, then renormalize. Adapted from a slide by David Lowe
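The 4x4-cell layout and the threshold normalization can be sketched as below. This is a simplified sketch, not Lowe's implementation: Gaussian weighting, trilinear interpolation, and the rotation to the dominant orientation are all omitted:

```python
import numpy as np

def sift_like_descriptor(patch):
    """SIFT-style layout for a 16x16 patch: a 4x4 grid of 4x4-pixel
    cells, each contributing an 8-bin gradient-orientation histogram
    weighted by gradient magnitude -> 128-D vector."""
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx) % (2 * np.pi)               # angles in [0, 2*pi)
    bins = np.minimum((ori / (2 * np.pi) * 8).astype(int), 7)
    desc = np.zeros(128)
    for cy in range(4):
        for cx in range(4):
            sl = np.s_[cy*4:(cy+1)*4, cx*4:(cx+1)*4]
            base = (cy * 4 + cx) * 8                     # this cell's 8 bins
            np.add.at(desc, base + bins[sl].ravel(), mag[sl].ravel())
    # Illumination invariance: normalize, clamp at 0.2, renormalize.
    desc /= np.linalg.norm(desc) + 1e-12
    desc = np.minimum(desc, 0.2)
    desc /= np.linalg.norm(desc) + 1e-12
    return desc

patch = np.random.default_rng(0).random((16, 16))
d = sift_like_descriptor(patch)
print(d.shape)  # (128,), unit length after the final normalization
```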
Making the descriptor rotation invariant: rotate the patch according to its dominant gradient orientation. This puts the patches into a canonical orientation. Image from Matthew Brown. Slide credit: K. Grauman
SIFT properties. Extraordinarily robust matching technique. Can handle changes in viewpoint: up to about 60 degrees of out-of-plane rotation. Can handle significant changes in illumination, sometimes even day vs. night (below). Fast and efficient: can run in real time. Lots of code available: http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT Slide credit: K. Grauman, S. Seitz
Slide credit: K. Grauman Example NASA Mars Rover images
Example NASA Mars Rover images with SIFT feature matches Figure by Noah Snavely Slide credit: K. Grauman
Example: Object Recognition SIFT is extremely powerful for object instance recognition, especially for well-textured objects Lowe, IJCV04
Example: Google Goggle
Slide credit: K. Grauman SIFT properties Invariant to Scale Rotation Partially invariant to Illumination changes Camera viewpoint Occlusion, clutter
38 Slide credit: R. Urtasun
SIFT Repeatability Lowe IJCV 2004
SIFT Repeatability
Recognition of specific objects, scenes Schmid and Mohr 1997 Sivic and Zisserman, 2003 Kristen Grauman Rothganger et al. 2003 Lowe 2002
When does SIFT fail? Patches SIFT thought were the same but aren't:
Other methods: DAISY. Circular gradient binning (compare SIFT's square grid). Picking the best DAISY, S. Winder, G. Hua, M. Brown, CVPR 09
Local Descriptors: SURF (Speeded Up Robust Features). Fast approximation of the SIFT idea: efficient computation via 2D box filters and integral images; about 6 times faster than SIFT; equivalent quality for object identification. GPU implementation available: feature extraction at 200 Hz (detector + descriptor, 640x480 image). http://www.vision.ee.ethz.ch/~surf [Bay, ECCV 06], [Cornelis, CVGPU 08]. K. Grauman, B. Leibe
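The integral-image trick behind SURF's box filters: after one pass to build a summed-area table, the sum over any box costs four lookups regardless of the box size. A minimal sketch:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in four lookups, independent of box size."""
    return int(ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0])

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))  # -> 30, the sum of the central 2x2 block
print(box_sum(ii, 0, 0, 4, 4))  # -> 120, the sum of the whole image
```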
Other methods: SURF. For computational efficiency, only compute the gradient histogram with 4 bins. SURF: Speeded Up Robust Features, Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, ECCV 2006
Other methods: BRIEF. Randomly sample pairs of pixels (a, b); the bit is 1 if a > b, else 0. Store the resulting binary vector. BRIEF: Binary Robust Independent Elementary Features, M. Calonder, V. Lepetit, C. Strecha, ECCV 2010
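A sketch of the BRIEF test: each bit compares the intensities at one sampled pair of locations, and descriptors are compared by Hamming distance. The pair locations and patch contents below are random stand-ins, not the sampling pattern from the paper:

```python
import numpy as np

def brief_descriptor(patch, pairs):
    """BRIEF bit string: bit k is 1 iff intensity at a_k > intensity at b_k."""
    a, b = pairs[:, 0], pairs[:, 1]
    return (patch[a[:, 0], a[:, 1]] > patch[b[:, 0], b[:, 1]]).astype(np.uint8)

def hamming(d1, d2):
    """Binary descriptors are compared by Hamming distance."""
    return int(np.count_nonzero(d1 != d2))

rng = np.random.default_rng(0)
pairs = rng.integers(0, 32, size=(256, 2, 2))    # 256 random (a, b) pixel pairs
patch1 = rng.random((32, 32))
patch2 = patch1 + 0.01 * rng.random((32, 32))    # slightly perturbed copy
patch3 = rng.random((32, 32))                    # unrelated patch
d1, d2, d3 = (brief_descriptor(p, pairs) for p in (patch1, patch2, patch3))
print(hamming(d1, d2))  # few bits flip for the near-duplicate
print(hamming(d1, d3))  # roughly half of the 256 bits differ for the unrelated patch
```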
Local Descriptors: Shape Context. Count the number of points inside each bin (e.g. Count = 4, ..., Count = 10). Log-polar binning: more precision for nearby points, more flexibility for farther points. Belongie & Malik, ICCV 2001. K. Grauman, B. Leibe
Shape Context Descriptor
Shape Context Descriptor. For a feature descriptor to be useful, it needs certain invariances: in particular, invariance to translation, scale, small perturbations, and, depending on the application, rotation. Translational invariance comes naturally to shape context. Scale invariance is obtained by normalizing all radial distances by the mean distance between all point pairs in the shape, although the median distance can also be used. Shape contexts have been empirically demonstrated to be robust to deformations, noise, and outliers in synthetic point-set matching experiments. One can also provide complete rotation invariance: one way is to measure angles at each point relative to the direction of the tangent at that point (since the points are chosen on edges). This yields a completely rotationally invariant descriptor, but that is not always desired, since some local features lose their discriminative power if not measured relative to the same frame; many applications in fact forbid rotation invariance, e.g. distinguishing a "6" from a "9". [from Wikipedia]
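A log-polar shape-context histogram, with radii normalized by the mean pairwise distance as described above. The bin counts and radial edges below are illustrative choices, not the exact values from Belongie & Malik:

```python
import numpy as np

def shape_context(points, ref_idx, n_r=5, n_theta=12):
    """Log-polar histogram of the other points, as seen from
    points[ref_idx]. Radial distances are normalized by the mean
    pairwise distance over the shape (scale invariance)."""
    pts = np.asarray(points, dtype=np.float64)
    others = np.ones(len(pts), dtype=bool)
    others[ref_idx] = False
    d = pts[others] - pts[ref_idx]
    # Mean distance over all ordered point pairs -> scale normalization.
    diffs = pts[:, None, :] - pts[None, :, :]
    mean_dist = np.linalg.norm(diffs, axis=-1).sum() / (len(pts) * (len(pts) - 1))
    r = np.linalg.norm(d, axis=1) / mean_dist
    theta = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)
    # Log-spaced radial edges: finer bins near the reference point.
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    t_edges = np.linspace(0, 2 * np.pi, n_theta + 1)
    hist, _, _ = np.histogram2d(r, theta, bins=[r_edges, t_edges])
    return hist  # n_r x n_theta count matrix

# Toy shape: a unit square plus its centre, described from the centre.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
h = shape_context(square, ref_idx=4)
print(h.shape, int(h.sum()))  # (5, 12) bins; all 4 corner points counted
```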
Local Descriptors: Geometric Blur. Compute edges at four orientations; extract a patch in each channel; apply a spatially varying blur and sub-sample (example descriptor vs. idealized signal). Berg & Malik, CVPR 2001. http://www.acberg.com/gb.html K. Grauman, B. Leibe
Self-similarity Descriptor Matching Local Self-Similarities across Images and Videos, Shechtman and Irani, 2007
Learning Local Image Descriptors, Winder and Brown, 2007
Choosing a detector: what do you want it for? Precise localization in x-y: Harris. Good localization in scale: Difference of Gaussians. Flexible region shape: MSER. The best choice is often application dependent: Harris-/Hessian-Laplace/DoG work well for many natural categories; MSER works well for buildings and printed things. Why choose? Get more points by running more detectors. There have been extensive evaluations/comparisons [Mikolajczyk et al., IJCV 05, PAMI 05]; all detectors/descriptors shown here work well.
Comparison of Keypoint Detectors Tuytelaars Mikolajczyk 2008
Local Descriptors. Most features can be thought of as templates, histograms (counts), or combinations of the two. The ideal descriptor should be robust, distinctive, compact, and efficient. Most available descriptors focus on edge/gradient information: they capture texture, and color is rarely used. K. Grauman, B. Leibe
Things to remember. Keypoint detection: repeatable and distinctive (corners, blobs, stable regions; Harris, DoG). Descriptors: robust and selective (spatial histograms of orientation; SIFT).