SIFT - Scale-Invariant Feature Transform
Konrad Schindler
Institute of Geodesy and Photogrammetry
Invariant interest points
Goal: match points between images with very different scale, orientation and projective distortion. The method must be invariant to these transformations (e.g. patches around conventional corner points are not!).
Different strategies:
- design an interest point detector that already compensates for the distortion: the usual strategy for scale
- detect the interest point, then normalize the local region around it to a canonical value: the usual strategy for rotation
- build invariance to the distortion into the region descriptor: the usual strategy for other (small) geometric distortions
- build invariance to the distortion into the similarity measure: a possible strategy for photometric changes
Image scale-space
Represent the image signal at all scales (in practice, at a discrete number of levels).
- the original image is the finest available scale
- to create coarser scales, suppress fine details by smoothing
- it can be shown that only Gaussian smoothing ensures that no spurious structures are introduced at coarser scales
- all levels together form a Gaussian scale space:
$I_\sigma(x,y) = G_\sigma(x,y) * I_0(x,y), \qquad G_\sigma(x,y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2+y^2}{2\sigma^2}\right)$
[Figure: original image $I_0$ and smoothed levels at $\sigma = 4, 16, 64$]
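A minimal sketch of building a discrete Gaussian scale space, assuming NumPy and SciPy's `scipy.ndimage.gaussian_filter`. The helper name `gaussian_scale_space` is hypothetical; each level is smoothed directly from $I_0$ with its own $\sigma$.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def gaussian_scale_space(image, sigmas):
    """Hypothetical helper: stack of levels I_sigma = G_sigma * I_0, one per sigma."""
    img = np.asarray(image, dtype=float)
    # every level is computed from the original image I_0 (not from the previous level)
    return np.stack([gaussian_filter(img, s) for s in sigmas])
```

For example, `gaussian_scale_space(I0, [4, 16, 64])` would produce the three smoothed levels of the figure; fine detail, and with it the gray-value variance, shrinks as $\sigma$ grows.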
General strategy
- run the interest point detector on all scales
- select strong local maxima in $(x, y, \sigma)$-space
Example: Laplace detector, which responds to blobs of high contrast against the background
Regions with strong contrast to their surroundings are local curvature maxima:
- find image locations with maximal 2nd derivative
- combine the smoothing kernel and the Laplace kernel (magnitude of the 2nd derivative) into one filter
- result: Laplacian-of-Gaussian (LoG); interest points are detected by convolution with the LoG kernel
$\nabla^2 G_\sigma = \frac{\partial^2 G_\sigma}{\partial x^2} + \frac{\partial^2 G_\sigma}{\partial y^2}$
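To make "combine smoothing and Laplace kernel into one filter" concrete, here is a sketch that samples the analytic $\nabla^2 G_\sigma$ and convolves it with the image, assuming NumPy and `scipy.ndimage.convolve`. The helper names `log_kernel` and `log_response` are hypothetical, the truncation radius of about $4\sigma$ is an assumption, and the response is multiplied by $\sigma^2$ (scale normalization) so that different scales are comparable.

```python
import numpy as np
from scipy.ndimage import convolve


def log_kernel(sigma, radius=None):
    """Hypothetical helper: sampled Laplacian-of-Gaussian kernel."""
    if radius is None:
        radius = int(np.ceil(4 * sigma))  # assumed truncation at about 4 sigma
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    t = (x ** 2 + y ** 2) / (2 * sigma ** 2)
    # analytic d2G/dx2 + d2G/dy2 for G = exp(-t) / (2 pi sigma^2)
    k = (t - 1) / (np.pi * sigma ** 4) * np.exp(-t)
    # remove the small residual sum caused by sampling/truncation,
    # so that a constant image gives exactly zero response
    return k - k.mean()


def log_response(image, sigma):
    """Hypothetical helper: scale-normalized response sigma^2 * (LoG * I)."""
    return sigma ** 2 * convolve(np.asarray(image, dtype=float), log_kernel(sigma))
```

With this sign convention a bright blob on a dark background gives a strongly negative response at its center, and flat regions give zero.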
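DoG - difference of Gaussians
Computationally efficient approximation: the scale-normalized LoG is approximated by the difference between adjacent scale-space levels
$\mathrm{LoG} = \sigma^2 \left( G_{xx}(x,y,\sigma) + G_{yy}(x,y,\sigma) \right)$
$\mathrm{DoG} = G(x,y,k\sigma) - G(x,y,\sigma) \approx (k-1)\,\mathrm{LoG}$
This follows from $\partial G / \partial \sigma = \sigma \nabla^2 G$; since $k$ is constant across scales, the factor $k-1$ does not change where the extrema are.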
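A quick numerical check of the approximation, assuming SciPy's `gaussian_filter` and `gaussian_laplace`. The helper names `dog` and `scaled_log` are hypothetical, and $k = \sqrt{2}$ is an assumed level spacing. The sketch compares the two response maps on a synthetic disk by their correlation rather than claiming equal amplitudes.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace


def dog(image, sigma, k):
    """Hypothetical helper: G(k*sigma) * I - G(sigma) * I."""
    img = np.asarray(image, dtype=float)
    return gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)


def scaled_log(image, sigma, k):
    """Hypothetical helper: (k - 1) * sigma^2 * (LoG * I), the quantity DoG approximates."""
    return (k - 1) * sigma ** 2 * gaussian_laplace(np.asarray(image, dtype=float), sigma)
```

On a bright disk, both maps are negative just inside the boundary and positive just outside, and their correlation is high; only differences of already smoothed levels are needed, which is why DoG is cheap once the scale space exists.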
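Scale selection
At which scale is the response maximal for an ideal circle of radius r? When the zero-crossings of the DoG kernel lie on the circle boundary; for the LoG kernel, whose zero-crossings lie at radius $\sqrt{2}\sigma$, this gives $\sigma = r/\sqrt{2}$.
[Figure: signal, image, Laplacian]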
The scale of an interest point is the scale with the highest (absolute) DoG response: its characteristic scale.
Lindeberg, T., Feature detection with automatic scale selection. International Journal of Computer Vision, 1998.
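A sketch of automatic scale selection at a fixed pixel, assuming SciPy's `gaussian_laplace` and using the scale-normalized LoG in place of DoG. The helper name `characteristic_scale` and the sampled list of sigmas are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace


def characteristic_scale(image, y, x, sigmas):
    """Hypothetical helper: sigma with the largest |sigma^2 * LoG| response at pixel (y, x)."""
    img = np.asarray(image, dtype=float)
    responses = [abs(s ** 2 * gaussian_laplace(img, s)[y, x]) for s in sigmas]
    return float(sigmas[int(np.argmax(responses))])
```

For a disk of radius 10 sampled on a pixel grid, the selected sigma lands close to $10/\sqrt{2} \approx 7.07$, matching the circle argument above.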
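The response is also high next to strong gray-value edges. Edge points are unwanted, since they cannot be reliably localized along the edge; we only want points where the image intensity has high curvature in both directions.
Test value (cf. Förstner operator), computed from the Hessian $H$:
$H = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}, \qquad \frac{\operatorname{trace}(H)^2}{\det(H)} = \frac{(I_{xx}+I_{yy})^2}{I_{xx} I_{yy} - I_{xy}^2} < T$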
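The threshold $T$ can be read in terms of the principal curvatures. A short derivation, following the reasoning in Lowe (2004), where $\alpha \ge \beta$ are the eigenvalues of $H$ and $r = \alpha/\beta$:

```latex
\operatorname{trace}(H) = I_{xx} + I_{yy} = \alpha + \beta, \qquad
\det(H) = I_{xx} I_{yy} - I_{xy}^2 = \alpha\beta

\frac{\operatorname{trace}(H)^2}{\det(H)}
  = \frac{(\alpha + \beta)^2}{\alpha\beta}
  = \frac{(r\beta + \beta)^2}{r\beta^2}
  = \frac{(r+1)^2}{r}
```

The ratio depends only on $r$ and grows with $r$ for $r \ge 1$, so choosing $T = (r+1)^2/r$ rejects points whose curvature ratio exceeds $r$. With $r = 10$, the value used by Lowe (2004), $T = 121/10 = 12.1$. If $\det(H) \le 0$, the curvatures have opposite signs (or one vanishes) and the point is discarded as well.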
Steps
1. create Gaussian scale space
2. compute differences between adjacent levels
3. find local extrema (maxima and minima) in the 3-dimensional $(x, y, \sigma)$-space
4. remove responses on edges
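The four steps can be sketched as a single-octave detector, assuming NumPy and SciPy. This is not Lowe's full implementation: there are no octaves or downsampling and no sub-pixel refinement. The names `detect_dog_keypoints` and `_passes_edge_test` are hypothetical; `sigma0 = 1.6` follows Lowe (2004), while `k = sqrt(2)`, `n_levels = 6` and `contrast_thresh = 0.02` (for intensities in [0, 1]) are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter


def _passes_edge_test(d, y, x, r):
    """Hypothetical helper: trace(H)^2 / det(H) < (r+1)^2 / r on one DoG level."""
    # finite-difference Hessian at (y, x)
    dxx = d[y, x + 1] - 2.0 * d[y, x] + d[y, x - 1]
    dyy = d[y + 1, x] - 2.0 * d[y, x] + d[y - 1, x]
    dxy = (d[y + 1, x + 1] - d[y + 1, x - 1] - d[y - 1, x + 1] + d[y - 1, x - 1]) / 4.0
    det = dxx * dyy - dxy ** 2
    if det <= 0:  # curvatures of opposite sign: reject
        return False
    return (dxx + dyy) ** 2 / det < (r + 1) ** 2 / r


def detect_dog_keypoints(image, sigma0=1.6, k=2 ** 0.5, n_levels=6,
                         contrast_thresh=0.02, r=10.0):
    """Hypothetical single-octave DoG detector; returns (y, x, sigma) tuples."""
    img = np.asarray(image, dtype=float)
    # step 1: Gaussian scale space at sigma0 * k**i
    sigmas = [sigma0 * k ** i for i in range(n_levels)]
    gauss = np.stack([gaussian_filter(img, s) for s in sigmas])
    # step 2: differences between adjacent levels
    dog = gauss[1:] - gauss[:-1]
    # step 3: local extrema over 3x3x3 neighborhoods in (sigma, y, x)
    is_ext = (dog == maximum_filter(dog, size=3)) | (dog == minimum_filter(dog, size=3))
    candidates = np.argwhere(is_ext & (np.abs(dog) > contrast_thresh))
    n_dog, h, w = dog.shape
    keypoints = []
    for s, y, x in candidates:
        # only interior levels and pixels have a complete 3x3x3 neighborhood
        if not (0 < s < n_dog - 1 and 0 < y < h - 1 and 0 < x < w - 1):
            continue
        # step 4: remove edge responses
        if _passes_edge_test(dog[s], y, x, r):
            keypoints.append((int(y), int(x), sigmas[s]))  # sigma of the lower level
    return keypoints
```

On a single bright disk, the sketch reports a keypoint at the disk center; a straight step edge produces large DoG values along the edge, but those candidates fail the curvature-ratio test.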
Descriptors
Goal: assign a descriptor to each interest point that allows it to be matched robustly across views.
- in aerial photogrammetry, the surrounding image region directly serves as descriptor (e.g. for normalized cross-correlation)
- problem 1: comparing image regions is not invariant to rotation
- problem 2: comparing gray values directly is not invariant to small misalignments due to variations in scale, viewpoint, etc.
Solution:
- rotation invariance (in fact covariance): estimate a dominant rotation and align the regions
- misalignments: use a robust encoding of the region rather than raw intensity values
Result: the SIFT descriptor
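To see problem 1 in numbers, here is a sketch of normalized cross-correlation between two equally sized patches, assuming NumPy; the function name `ncc` is hypothetical. NCC is unchanged under a gain and offset of the gray values, but drops sharply when one patch is rotated.

```python
import numpy as np


def ncc(a, b):
    """Hypothetical helper: normalized cross-correlation of two patches, in [-1, 1]."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```

For a random patch `p`, `ncc(p, 3 * p + 7)` is 1 up to rounding, while `ncc(p, np.rot90(p))` is close to 0, even though both patches show the same content.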
Descriptors
Normalizing rotation
- intuition: we need to find the same direction in object space in the local neighborhoods of corresponding points
- idea: find the dominant gradient direction in the region
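A sketch of finding the dominant gradient direction with a magnitude-weighted orientation histogram, assuming NumPy. The helper name `dominant_orientation` is hypothetical. Lowe (2004) uses 36 bins over 360 degrees, additionally weights samples with a Gaussian window and interpolates the peak; this sketch omits the window and the interpolation and returns only the center of the strongest bin.

```python
import numpy as np


def dominant_orientation(patch, n_bins=36):
    """Hypothetical helper: center (radians, in [-pi, pi]) of the strongest orientation bin."""
    # np.gradient returns derivatives along rows (y) first, then columns (x)
    gy, gx = np.gradient(np.asarray(patch, dtype=float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)  # measured in array coordinates (y axis points down)
    hist, edges = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi), weights=mag)
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])
```

The patch would then be rotated by minus this angle before computing the descriptor, so that corresponding points are described in the same reference direction.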
Descriptors
Encoding local structure
- raw gradients (or intensities) are sensitive to alignment errors
- global histograms are invariant to small location errors, but contain no spatial information
- to be robust against small errors while preserving some local structure, build histograms over local neighborhoods
[Figure: orientation histogram, axis from 0 to 2π]
Descriptors
SIFT descriptor
- divide the patch into 4 x 4 sub-windows
- in each sub-window, compute an 8-bin histogram of gradient orientations
- each location contributes with its gradient magnitude; the contribution is distributed over adjacent bins (in position and orientation) with trilinear interpolation
Descriptors
SIFT descriptor
- result: a 128-dimensional vector (4 x 4 x 8 histogram bins)
- gradients are invariant to a constant offset in brightness
- invariance to linear contrast scaling: normalize the feature vector to unit length
Lowe, D., Distinctive image features from scale-invariant keypoints, IJCV, 2004.
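A simplified sketch of the descriptor computation, assuming NumPy and a square patch that has already been rotated to its dominant orientation. The name `sift_like_descriptor` is hypothetical. Compared with Lowe (2004), it omits the Gaussian weighting window and the step that clips large entries (at 0.2) and renormalizes; it does keep the 4 x 4 x 8 layout, magnitude weighting, trilinear interpolation and unit-length normalization.

```python
import numpy as np


def sift_like_descriptor(patch, n_cells=4, n_bins=8):
    """Hypothetical helper: 4x4x8 gradient-orientation histogram, unit length."""
    gy, gx = np.gradient(np.asarray(patch, dtype=float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)  # orientation in [0, 2*pi)
    h, w = mag.shape
    hist = np.zeros((n_cells, n_cells, n_bins))
    # continuous cell coordinates of pixel centers (cell centers at integers)
    rows = (np.arange(h) + 0.5) * n_cells / h - 0.5
    cols = (np.arange(w) + 0.5) * n_cells / w - 0.5
    obin = ori * n_bins / (2 * np.pi)
    for i in range(h):
        r0 = int(np.floor(rows[i]))
        dr = rows[i] - r0
        for j in range(w):
            c0 = int(np.floor(cols[j]))
            dc = cols[j] - c0
            o0 = int(np.floor(obin[i, j]))
            do = obin[i, j] - o0
            # trilinear interpolation: spread magnitude over 2 x 2 x 2 neighboring bins
            for ri, wr in ((r0, 1 - dr), (r0 + 1, dr)):
                if not 0 <= ri < n_cells:
                    continue
                for ci, wc in ((c0, 1 - dc), (c0 + 1, dc)):
                    if not 0 <= ci < n_cells:
                        continue
                    for oi, wo in ((o0, 1 - do), (o0 + 1, do)):
                        # orientation bins wrap around at 2*pi
                        hist[ri, ci, oi % n_bins] += mag[i, j] * wr * wc * wo
    vec = hist.ravel()  # 4 * 4 * 8 = 128 entries
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```

Because gradients ignore a constant offset and the final normalization removes a global gain, `sift_like_descriptor(2 * p + 10)` equals `sift_like_descriptor(p)` up to rounding.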
Matching descriptors
Similarity measure: Euclidean distance
Nearest-neighbor search:
- finding the nearest point in high dimensions is expensive
- use approximate nearest-neighbor search with suitable data structures (kd-trees, vantage-point trees, ...)
Variant to avoid ambiguities:
- find the nearest descriptor $v_1$ and the second-nearest descriptor $v_2$
- check that $v_1$ is significantly closer than $v_2$ (say, by 30%), otherwise discard the match
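A sketch of the nearest/second-nearest test, assuming SciPy's `scipy.spatial.KDTree`. The function name `match_descriptors` is hypothetical. `ratio=0.7` encodes the slide's "30% closer", and `eps > 0` turns SciPy's kd-tree query into an approximate search. `desc2` must contain at least two descriptors.

```python
import numpy as np
from scipy.spatial import KDTree


def match_descriptors(desc1, desc2, ratio=0.7, eps=0.0):
    """Hypothetical helper: list of (i, j) pairs passing the nearest/second-nearest test."""
    tree = KDTree(desc2)
    # distances and indices of the two nearest neighbors for every row of desc1
    dists, idx = tree.query(desc1, k=2, eps=eps)
    matches = []
    for i, ((d1, d2), (j1, _)) in enumerate(zip(dists, idx)):
        if d1 < ratio * d2:  # v1 clearly closer than v2: accept
            matches.append((i, int(j1)))
    return matches
```

If a descriptor has two near-identical candidates (e.g. from a repeated texture), both distances are similar and the match is discarded instead of being assigned arbitrarily.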