SIFT (Scale Invariant Feature Transform) descriptor

Local descriptors

SIFT keypoints at location (x, y) and scale σ are obtained by a procedure that guarantees illumination and scale invariance. By assigning a consistent orientation, the SIFT keypoint descriptor can also be made orientation invariant. The SIFT descriptor is therefore obtained with the following steps:
1. Determine location and scale by maximizing the DoG in scale and in space (i.e. find the blurred image of closest scale).
2. Sample the points around the keypoint and take the dominant gradient direction as the keypoint's local orientation.
3. Use this scale and orientation to make all further computations invariant to scale and rotation (i.e. rotate the gradients and coordinates by the dominant orientation).
4. Separate the region around the keypoint into subregions and compute an 8-bin gradient orientation histogram for each subregion, weighting the samples with a Gaussian (σ equal to 1.5 times the keypoint scale).

SIFT aggregation window
To derive a descriptor for the keypoint region we could sample the intensities around the keypoint, but they are sensitive to lighting changes and to slight errors in x, y, θ. To make the keypoint estimate more reliable, it is usually preferable to use an aggregation window (i.e. the Gaussian kernel size) larger than the detection window.
The SIFT descriptor is hence obtained from thresholded image gradients sampled over a 16x16 array of locations in the neighbourhood of the detected keypoint, using the level of the Gaussian pyramid at which the keypoint was detected. The samples of each 4x4 region are accumulated into a gradient orientation histogram with 8 sampled orientations. The 16x16 array contains 4x4 such regions, so the result is a 4x4x8 = 128-dimensional feature vector.
[Figure: 16x16 keypoint neighbourhood divided into 4x4 regions, each with 8 sampled orientations]
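The 4x4x8 layout can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the slides: hard binning instead of interpolation, no Gaussian weighting, no gradient thresholding and no rotation to the dominant orientation. The function name `sift_like_descriptor` is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Build a 4x4x8 = 128-d gradient orientation histogram from a 16x16 patch.

    Simplified sketch: hard binning, no Gaussian weighting, no rotation
    to the dominant orientation (all of these are assumptions).
    """
    patch = np.asarray(patch, dtype=float)
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch)                   # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)      # orientation in [0, 2*pi)
    # Quantize the orientation into 8 bins; the final % 8 guards rounding at 2*pi.
    bins = np.floor(angle / (2 * np.pi) * 8).astype(int) % 8
    descriptor = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            # Each sample votes in the histogram of its 4x4 region.
            descriptor[r // 4, c // 4, bins[r, c]] += magnitude[r, c]
    return descriptor.ravel()

# Usage: a horizontal intensity ramp has unit gradient pointing along +x.
ramp = np.tile(np.arange(16.0), (16, 1))
ramp_descriptor = sift_like_descriptor(ramp)
```

For the ramp, every sample falls into orientation bin 0, so each of the 16 regions contributes a single non-zero entry.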

SIFT gradient orientation histogram
The gradient magnitudes are downweighted by a Gaussian function (red circle in the figure) with σ equal to 1.5 times the keypoint scale, in order to reduce the influence of gradients far from the center, as these are more affected by small misregistrations.
The 8-bin gradient orientation histogram in each 4x4 quadrant is formed by softly adding the weighted gradient magnitudes. The soft distribution of values to adjacent histogram bins is performed by trilinear interpolation, to reduce the effects of location and dominant orientation misestimation.
[Figure: orientation histogram and dominant direction]

Image from: Jonas Hurrelmann

SIFT illumination invariance
The keypoint descriptor is normalized to unit length to make it invariant to intensity change, i.e. to reduce the effects of contrast or gain.
To make the descriptor robust to other photometric variations, gradient magnitude values are clipped to 0.2 and the resulting vector is once again renormalized to unit length.
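A minimal sketch of this normalize, clip, renormalize sequence, assuming a non-negative descriptor vector (as produced by histogram accumulation). The function name and the `eps` guard against division by zero are illustrative additions, not part of the slides.

```python
import numpy as np

def normalize_descriptor(vec, clip=0.2, eps=1e-12):
    """Normalize to unit length, clip large entries, renormalize."""
    vec = np.asarray(vec, dtype=float)
    vec = vec / max(np.linalg.norm(vec), eps)    # invariance to contrast/gain
    vec = np.minimum(vec, clip)                  # limit the weight of large gradients
    return vec / max(np.linalg.norm(vec), eps)   # back to unit length

# Usage: one dominant entry is damped, and a gain factor has no effect.
raw = np.array([10.0, 1.0, 1.0, 1.0])
normalized = normalize_descriptor(raw)
normalized_gain = normalize_descriptor(3.0 * raw)
```

Scaling the input by a constant gain leaves the output unchanged, which is the invariance the first normalization is meant to give.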

SIFT has been empirically found to show very good performance, invariant to image rotation, scale, intensity change, and to moderate affine transformations.
Extensive survey [Mikolajczyk & Schmid 2005]: 80% repeatability at:
- 10% image noise
- 45° viewing angle
- 1k-100k keypoints in database

Color SIFT descriptors
Local descriptors like SIFT are usually based only on luminance and shape: they use grey-scale values and ignore color, mainly because it is very difficult to select a color model that is sufficiently robust and general. Nevertheless, color is very important to describe and distinguish objects or scenes.
Different types of descriptors can be combined to improve the representation; the most common combination is between a local shape descriptor (e.g. SIFT) and a color descriptor (e.g. a color histogram in a suitable color space such as Luv or HSV).
An example is the (sparse) color-SIFT descriptor of van de Weijer and Schmid (ECCV 2006): the combined descriptor is obtained by fusing the standard SIFT descriptor and a hue descriptor.
Courtesy J. van de Weijer

Dense gray and color SIFT
SIFT descriptors can also be taken at fixed locations by defining a regular grid over the image. A 128-d SIFT descriptor is then computed at each center point. In this case the descriptor accounts for the distribution of the gradient orientation but does not have the scale invariance obtained from the DoG detector. Multiple descriptors are therefore computed to allow for scale variation.
[Figure labels: 128-d SIFT, (128 x 3)-d SIFT]

SIFT descriptors in numbers
- 1 patch (1 SIFT descriptor) = 128 floats = 128 * 4 bytes (32 bit) = 512 bytes.
- 1 well-textured 320x240 image ~ 600 keypoints = 600 * 512 bytes = 307200 bytes = 300 KB.
- This presentation in memory = 51 slides = 300 KB * 51 = 15300 KB ~ 15 MB...
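The same arithmetic as a small Python helper, useful for trying other keypoint counts or descriptor sizes. The function name is hypothetical; the numbers are the ones quoted on the slide.

```python
def descriptor_storage_bytes(num_descriptors, dim=128, bytes_per_value=4):
    """Bytes needed to store `num_descriptors` float descriptors of length `dim`."""
    return num_descriptors * dim * bytes_per_value

per_descriptor = descriptor_storage_bytes(1)   # one 128-d float descriptor
per_image = descriptor_storage_bytes(600)      # ~600 keypoints per image
per_image_kb = per_image / 1024
deck_mb = 51 * per_image_kb / 1024             # one image per slide, 51 slides
```

With 1 KB = 1024 bytes the per-image figure is exactly 300 KB, and the 51-slide total comes out slightly under 15 MB.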

SIFT: available implementations
A. David Lowe's: the first code of the SIFT algorithm, by its creator (binary only).
B. OpenCV 2.2 implementation: a wrapper around Vedaldi's code.
C. Implementation by Rob Hess. Best ever.
Lowe's original SIFT algorithm is quite slow (~6 sec for an image of size 1280x768): it is computationally expensive and patented.
References:
A. http://www.cs.ubc.ca/~lowe/keypoints/
B. http://opencv.willowgarage.com/documentation/cpp/features2d_feature_detection_and_description.html#sift
C. http://blogs.oregonstate.edu/hess/code/sift/

Rob Hess SIFT implementation
Features:
- Best open source SIFT implementation in terms of speed, efficiency and similarity to Lowe's binary.
- The SIFT library is written in C, with versions available for both Linux and Windows.
- Easy to integrate with an OpenCV project.
- Contains a suite of algorithms: SIFT keypoint detection/description, KD-tree matching, robust plane-to-plane transformation.
References:
1. https://web.engr.oregonstate.edu/%7ehess/publications/siftlib-acmmm10.pdf
2. http://blogs.oregonstate.edu/hess/code/sift/

Compile and install
Unix platforms are preferred. On a Debian-based system simply:
1. sudo apt-get install build-essential libgtk2.0-dev libcv-dev libcvaux-dev
2. cd <path-to-sift>/ && make all
3. ./bin/siftfeat -h
Output:
Usage: siftfeat [options] <img_file>
Options:
  -h              Display this message and exit
  -o <out_file>   Output keypoints to text file
  -m <out_img>    Output keypoint image file (format determined by extension)
  -i <intervals>  Set number of sampled intervals per octave in scale space pyramid (default 3)
  -s <sigma>      Set sigma for initial gaussian smoothing at each octave (default 1.6000)
  -c <thresh>     Set threshold on keypoint contrast |D(x)| based on [0,1] pixel values (default 0.1400)
  -r <thresh>     Set threshold on keypoint ratio of principle curvatures (default 10)
  -n <width>      Set width of descriptor histogram array (default 4)
  -b <bins>       Set number of bins per histogram in descriptor array (default 8)
  -d              Toggle image doubling (default on)
  -x              Turn off keypoint display

SIFT parameters to tune (in sift.h or as arguments of ./bin/siftfeat):
- SIFT_CONTR_THR [0.04] is the default threshold on keypoint contrast |D(x)|. Higher values yield fewer but stronger keypoints, and vice versa.
- SIFT_DESCR_HIST_BINS [8] is the default number of bins per histogram in the descriptor array. It is a trade-off between distinctiveness and efficiency.
- SIFT_IMG_DBL [0/1] tells whether the image must be doubled in size before keypoint localization. Doubling increases the number of detectable keypoints at the cost of performance.
NN matching parameters to tune (in ./src/match.c):
- KDTREE_BBF_MAX_NN_CHKS [200] is the maximum number of keypoint NN candidates to check during BBF search.
- NN_SQ_DIST_RATIO_THR [0.49] is the threshold on the squared ratio of the distances to the 1st NN and the 2nd NN. It is used to reject noisy matches in high dimensions.
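The role of the squared distance ratio threshold can be illustrated with a brute-force nearest-neighbour search in Python. This is a sketch, not the library's code: the library uses a KD-tree with BBF search, while the version below compares every pair, and the function name `ratio_test_matches` is hypothetical. It assumes the second set contains at least two descriptors.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, sq_ratio_thr=0.49):
    """Match rows of desc_a to rows of desc_b, keeping only unambiguous matches.

    A match is kept when the squared distance to the 1st NN is smaller than
    sq_ratio_thr times the squared distance to the 2nd NN.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    matches = []
    for i, d in enumerate(desc_a):
        sq_dist = np.sum((desc_b - d) ** 2, axis=1)   # squared distances to all of desc_b
        order = np.argsort(sq_dist)
        nearest, second = order[0], order[1]
        if sq_dist[nearest] < sq_ratio_thr * sq_dist[second]:
            matches.append((i, int(nearest)))
    return matches

# Usage: the first query is close to one candidate, the second is ambiguous.
database = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
queries = np.array([[0.1, 0.0], [5.0, 0.0]])
kept = ratio_test_matches(queries, database)
```

A threshold of 0.49 on squared distances corresponds to a ratio of 0.7 on the distances themselves.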

SIFT alternatives
- GLOH (Gradient Location and Orientation Histogram): larger initial descriptor + PCA. TPAMI 2005.
- SURF (Speeded Up Robust Features): faster than SIFT and sometimes more robust. ECCV 2006 (343 Google citations).
- GIST: Rapid Biologically-Inspired Scene Classification Using Features Shared with Visual Attention, TPAMI 2007.
- LESH: Head Pose Estimation in Face Recognition Across Pose Scenarios, VISAPP 2008.
- PCA-SIFT: A More Distinctive Representation for Local Image Descriptors, CVPR 2004.
- Spin Image: Sparse Texture Representation Using Affine-Invariant Neighborhoods, CVPR 2003.

GLOH (Gradient Location and Orientation Histogram) descriptor
GLOH is a method for local shape description very similar to SIFT, introduced by Mikolajczyk and Schmid in 2005.
Differently from SIFT, it employs a log-polar location grid:
- 3 bins in the radial direction
- 8 bins in the angular direction (the central bin is not divided)
- 16 bins for gradient orientation quantization
The GLOH descriptor is therefore a higher-dimensional vector with a total of 17 (i.e. 2x8+1) * 16 = 272 bins. PCA dimension reduction is then applied to the vector representation space.

[Figure: GLOH example, 2x8+1 location bins with 16 orientation bins each]

SURF (Speeded Up Robust Features) descriptor
SURF is a high-performance scale- and rotation-invariant interest point detector and descriptor. It approximates or even outperforms SIFT and other local descriptors with respect to repeatability, distinctiveness and robustness, yet can be computed and compared much faster.
This is achieved by:
- relying on integral images for image convolutions
- building on the strengths of the leading existing detectors and descriptors (using a Hessian matrix-based measure for the detector, and a distribution-based descriptor)
- simplifying these methods to the essential

The approach of SURF is similar to that of SIFT, but:
- Integral images in conjunction with Haar wavelets are used to increase robustness and decrease computation time.
- Instead of iteratively reducing the image size (as in the SIFT approach), the use of integral images allows the up-scaling of the filter at constant cost.
[Figure: SIFT approach vs. SURF approach]
The SURF descriptor computation can be divided into two tasks:
1. Orientation assignment
2. Extraction of descriptor components

SURF integral images
Much of the performance increase in SURF can be attributed to the use of integral images:
- The integral image is computed rapidly from an input image I.
- It is then used to speed up the calculation of the sum over any upright rectangular area.
Given an input image I and a point (x, y), the integral image I_Σ is calculated as the sum of the pixel intensities between the point and the origin:
\[ I_\Sigma(x, y) = \sum_{i=0}^{x} \sum_{j=0}^{y} I(i, j) \]
Using this representation, the cost of computing the sum over a rectangular area is only four lookups in the integral image, combined by additions and subtractions.

The integral image ii(x, y) at location (x, y) is an intermediate representation of the image i(x, y) that contains the sum of all the pixels above and to the left of (x, y):
\[ ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y') \]
It can be computed in one pass over the original image, integrating along rows and along columns:
\[ s(x, y) = s(x, y-1) + i(x, y) \]
\[ ii(x, y) = ii(x-1, y) + s(x, y) \]
where s(x, y) is the cumulative row sum, s(x, -1) = 0 and ii(-1, y) = 0.

[Figure: integral image ii(x, y) with origin (0,0), showing s(x, y) = s(x, y-1) + i(x, y) and ii(x, y) = ii(x-1, y) + s(x, y)]
Using the integral image representation one can compute the value of any rectangular sum in constant time. For example, the sum inside rectangle D is computed as: ii(4) + ii(1) - ii(2) - ii(3), where 4 and 1 are the bottom-right and top-left corners of D, and 2 and 3 the other two corners.
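The one-pass recurrences and the four-corner rectangle sum can be sketched in Python as follows. The array is padded with a leading row and column of zeros to play the role of the boundary conditions; the function names and the half-open rectangle convention are choices made for this sketch, not something stated in the slides.

```python
import numpy as np

def integral_image(img):
    """Return a (h+1, w+1) integral image; entry [y+1, x+1] sums img[:y+1, :x+1]."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    s = np.zeros((h + 1, w + 1))    # cumulative sums, zero padded: s(x, -1) = 0
    ii = np.zeros((h + 1, w + 1))   # integral image, zero padded: ii(-1, y) = 0
    for y in range(h):
        for x in range(w):
            s[y + 1, x + 1] = s[y, x + 1] + img[y, x]           # s(x,y) = s(x,y-1) + i(x,y)
            ii[y + 1, x + 1] = ii[y + 1, x] + s[y + 1, x + 1]   # ii(x,y) = ii(x-1,y) + s(x,y)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] from four lookups: ii(4) + ii(1) - ii(2) - ii(3)."""
    return ii[bottom, right] + ii[top, left] - ii[top, right] - ii[bottom, left]

# Usage on a small test image.
image = np.arange(12.0).reshape(3, 4)
ii = integral_image(image)
inner = rect_sum(ii, 1, 1, 3, 3)   # sum of the 2x2 block image[1:3, 1:3]
```

The cost of `rect_sum` does not depend on the rectangle size, which is what lets SURF enlarge its box filters at constant cost.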

SURF fast Hessian detection
The SURF detector approximates the second-order Gaussian derivatives of the image at point x, also referred to as Laplacian of Gaussians (LoG), using box filter representations of the Gaussian kernels (differently from SIFT, which uses Difference of Gaussians (DoG) instead).
In the SURF approach interest points are detected at locations where the determinant of the Hessian matrix is maximum.
SURF permits a very efficient implementation, making good use of integral images to perform fast convolutions with box filters of varying size (at near constant time).
The Hessian matrix H is calculated as a function of both space and scale:
\[ H(x, \sigma) = \begin{pmatrix} L_{xx}(x, \sigma) & L_{xy}(x, \sigma) \\ L_{xy}(x, \sigma) & L_{yy}(x, \sigma) \end{pmatrix} \]
where L_xx(x, σ), L_yy(x, σ), L_xy(x, σ) are the second-order Gaussian derivatives of the image at point x (LoGs).

SURF LoG approximation
The second-order derivatives (LoGs) are approximated with box filters. The 9x9 box filters in the figure are approximations of the second-order Gaussian derivatives L_yy(x, σ) and L_xy(x, σ) of the image at point x, for a Gaussian with σ = 1.2, which represents the lowest scale.
Haar wavelets are simple filters which can be used to find gradients in the x and y directions. For each template, the corresponding feature value is the sum of the pixel intensities lying under the black part minus the sum of the pixel intensities lying under the white part.
[Figure: Haar wavelet templates for the x response and the y response, with weights +1 and -1]
When used with integral images, each wavelet requires 6 operations to compute.

SURF orientation assignment
In order to achieve invariance to image rotation, each detected interest point is assigned a reproducible orientation. To determine the orientation, Haar wavelet responses of size 4σ are calculated for a set of pixels within a radius of 6σ of the detected point.
The Haar wavelet responses are represented as vectors. The dominant orientation is estimated by calculating the sum of all responses within a sliding orientation window of 60 degrees; the longest resulting vector gives the dominant orientation.
[Figure: circular neighborhood of radius 6σ around (x, y, σ) and sliding orientation window]
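The sliding-window search can be sketched in Python, assuming the Haar responses (dx, dy) inside the circular neighbourhood have already been computed. The 5-degree step between window positions (`steps=72`) and the absence of any weighting of the responses are assumptions of this sketch; the function name is hypothetical.

```python
import numpy as np

def dominant_orientation(dx, dy, window=np.pi / 3, steps=72):
    """Angle (radians) of the longest summed response vector over all 60-degree windows."""
    dx = np.asarray(dx, dtype=float).ravel()
    dy = np.asarray(dy, dtype=float).ravel()
    angles = np.arctan2(dy, dx) % (2 * np.pi)
    best_length, best_angle = -1.0, 0.0
    for start in np.linspace(0.0, 2 * np.pi, steps, endpoint=False):
        # Responses whose direction lies in [start, start + window), with wrap-around.
        in_window = ((angles - start) % (2 * np.pi)) < window
        sum_x, sum_y = dx[in_window].sum(), dy[in_window].sum()
        length = np.hypot(sum_x, sum_y)
        if length > best_length:
            best_length, best_angle = length, np.arctan2(sum_y, sum_x)
    return best_angle

# Usage: three strong responses at 45 degrees and one weak response at 180 degrees.
responses_x = np.array([1.0, 1.0, 1.0, -0.1])
responses_y = np.array([1.0, 1.0, 1.0, 0.0])
theta = dominant_orientation(responses_x, responses_y)
```

Because the window is only 60 degrees wide, the weak response pointing the opposite way never enters the same sum as the strong ones.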

SURF extraction of descriptor components
To extract the descriptor components, a square window of size 20σ is taken around the interest point, oriented along the dominant orientation.
- The window is divided into 4x4 regular subregions.
- Haar wavelets of size 2σ are calculated for 25 regularly distributed sample points in each subregion.
- The x and y wavelet responses (dx, dy) are collected for each subregion according to:
\[ v_{\text{subregion}} = \left( \sum dx,\; \sum dy,\; \sum |dx|,\; \sum |dy| \right) \]
Each subregion therefore contributes 4 values to the SURF descriptor vector, leading to an overall vector of length 4x4x4 = 64.
[Figure: the green square bounds one of the 16 subregions and the blue circles represent the sample points at which the wavelet responses are computed; the x and y responses are calculated relative to the dominant orientation]
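A sketch of how the 64 values are assembled, assuming the four values per subregion are the sums of dx, dy, |dx| and |dy| (the standard SURF formulation) and that the 16 x 25 = 400 wavelet responses are already available as 20x20 arrays aligned with the dominant orientation. The function name is hypothetical, and Gaussian weighting and final normalization are left out.

```python
import numpy as np

def surf_like_descriptor(dx, dy):
    """Collect (sum dx, sum dy, sum |dx|, sum |dy|) over 4x4 subregions of 5x5 samples."""
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    assert dx.shape == (20, 20) and dy.shape == (20, 20)
    values = []
    for r in range(4):
        for c in range(4):
            block_x = dx[5 * r:5 * r + 5, 5 * c:5 * c + 5]   # 25 samples of one subregion
            block_y = dy[5 * r:5 * r + 5, 5 * c:5 * c + 5]
            values.extend([block_x.sum(), block_y.sum(),
                           np.abs(block_x).sum(), np.abs(block_y).sum()])
    return np.array(values)

# Usage: constant responses give the same four values in every subregion.
descriptor = surf_like_descriptor(np.ones((20, 20)), -np.ones((20, 20)))
```

Keeping both the signed sums and the sums of absolute values lets the descriptor separate a uniform gradient from an oscillating pattern with the same total energy.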

SURF computational cost
Computational cost of SURF (detector and descriptor) against the most-used detectors and the SIFT descriptor.
[Table not recovered]

SURF implementations available
A. Herbert Bay's code (creator): library and source code.
B. OpenCV implementation. See matcher_simple.cpp in samples/cpp in OpenCV 2.2.
C. OpenSurf1 at Google Code.
References:
A. http://www.vision.ee.ethz.ch/~surf/
B. http://opencv.willowgarage.com/documentation/cpp/features2d_feature_detection_and_description.html#surf
C. http://code.google.com/p/opensurf1/

SURF alternative versions
U-SURF (upright version)
For some applications rotation invariance is not necessary. The upright version of SURF does not implement the orientation assignment step. It is faster, while maintaining robustness to rotations of about +/- 15 degrees.
SURF-128
SURF-128 implements a descriptor vector of length 128. The sums of dx and |dx| are computed separately for dy < 0 and dy ≥ 0 (and similarly the sums of dy and |dy|, according to the sign of dx). It is more precise and not much slower to compute, although slower to match.

Comparative table for invariance of main descriptors
[Table not recovered; surviving entries: "Only moderate" (twice), "Van Gool 06" for SURF. Credit: Fei-Fei Li]

Affine invariant descriptors
Fit an ellipse to the auto-correlation matrix (using eigenvalue analysis) and then use the principal axes and ratios of this fit as the affine coordinate frame. The square root of the moment matrix can be used to transform local patches into a frame which is similar up to rotation.
1. Find the affine normalized frame. For two patches with points p and q related by an affine map A:
\[ \Sigma_1 = p p^T, \qquad \Sigma_2 = q q^T \]
\[ \Sigma_1^{-1} = A_1^T A_1, \qquad \Sigma_2^{-1} = A_2^T A_2 \]
After normalization by A_1 and A_2 the two patches differ only by a rotation.
2. Compute a rotation-invariant descriptor in this normalized frame, e.g. the intensity-domain spin image (a 2-D histogram of brightness) in the affine-normalized patch.
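The claim that normalization by the square root of the moment matrix leaves only a rotation can be checked numerically. The sketch below is illustrative only: the point sets are random, the affine map `M` is an arbitrary example, and the symmetric inverse square root obtained from an eigendecomposition is one of several valid choices of normalizing transform.

```python
import numpy as np

def inverse_sqrt(sigma):
    """Symmetric matrix A with A.T @ A = inv(sigma), via eigenvalue analysis."""
    eigvals, eigvecs = np.linalg.eigh(sigma)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def normalizing_transform(points):
    """points: 2xN array of zero-mean coordinates; returns the normalizing matrix."""
    return inverse_sqrt(points @ points.T)

rng = np.random.default_rng(0)
p = rng.normal(size=(2, 200))
p -= p.mean(axis=1, keepdims=True)          # zero-mean patch coordinates
M = np.array([[2.0, 0.5], [0.3, 1.0]])      # example affine map between the two views
q = M @ p

A1 = normalizing_transform(p)
A2 = normalizing_transform(q)
R = A2 @ M @ np.linalg.inv(A1)              # maps normalized p onto normalized q
```

`R` turns out to be orthogonal with determinant +1, so any rotation-invariant descriptor computed in the normalized frames is the same for both views.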

Affine invariant color moments are another affine invariant descriptor:
\[ m_{pq}^{abc} = \iint_{\text{region}} x^p \, y^q \, R^a(x, y) \, G^b(x, y) \, B^c(x, y) \, dx \, dy \]
Different combinations of these moments are fully affine invariant. They are also invariant to affine transformations of intensity, I → aI + b.
F. Mindru et al. Recognizing Color Patterns Irrespective of Viewpoint and Illumination. CVPR99