SCALE INVARIANT FEATURE TRANSFORM (SIFT)


OUTLINE
- SIFT Background
- SIFT Extraction
- Application in Content-Based Image Search
- Conclusion

SIFT BACKGROUND
Scale-invariant feature transform (SIFT): a method to detect and describe local features in images.
- Proposed by David Lowe (Professor at UBC) at ICCV 1999; refined in IJCV 2004.
- Cited more than 12,000 times to date.
- Widely used in image search, object recognition, video tracking, gesture recognition, etc.

WHY IS SIFT SO POPULAR?
[Figure: an instance of object matching]

WHY IS SIFT SO POPULAR?
Desired properties of SIFT:
- Invariant to scale change
- Invariant to rotation change
- Invariant to illumination change
- Robust to added noise
- Robust to a substantial range of affine transformations
- Robust to changes in 3D viewpoint
- Highly distinctive, which helps discrimination

CLAIMED ADVANTAGES OF SIFT
- Locality: features are local, so they are robust to occlusion and clutter (no prior segmentation needed).
- Distinctiveness: individual features can be matched to a large database of objects.
- Quantity: many features can be generated even for small objects.
- Efficiency: close to real-time performance.
- Extensibility: can easily be extended to a wide range of differing feature types, each adding robustness.

HOW TO EXTRACT SIFT
[Figure: test image]
- Detector: where are the local features?
- Descriptor: how to describe them?

SIFT DETECTOR
Desired properties for the detector:
- Position: repeatable across different changes
- Scale: automatic scale estimation
Intuition: find the scale that gives a local maximum of some function f in both position and scale.

WHAT CAN BE THE SIGNATURE FUNCTION f?
Laplacian-of-Gaussian (LoG) = blob detector.
The 2-D LoG can be approximated by a 5 × 5 convolution kernel.
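To make the 5 × 5 kernel concrete, here is a minimal sketch that samples the continuous LoG on a 5 × 5 grid. The choice σ = 1.0, the function name `log_kernel`, and the mean subtraction (so that flat image regions give zero response) are assumptions for illustration, not values taken from the slides.

```python
import numpy as np

def log_kernel(size=5, sigma=1.0):
    """Sample the 2-D Laplacian of Gaussian on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    r2 = x**2 + y**2
    gauss = np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    kernel = (r2 - 2 * sigma**2) / sigma**4 * gauss
    # Assumption: subtract the mean so the truncated kernel sums to zero.
    return kernel - kernel.mean()

kernel = log_kernel(5, 1.0)
print(np.round(kernel, 3))
```

The center weight is negative and the surrounding ring is positive, which is why the filter responds strongly to blob-like structures of matching size.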

AT A GIVEN POINT IN THE IMAGE:
We define the characteristic scale as the scale that produces the peak of the Laplacian response.
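The definition above can be illustrated with a small experiment: evaluate the scale-normalized LoG response at the center of a synthetic disk for a range of σ and pick the σ with the largest magnitude. The disk radius, the σ grid, and the helper name are assumptions chosen for the sketch.

```python
import numpy as np

def normalized_log_at_center(image, sigma):
    """Scale-normalized LoG response (sigma^2 * LoG) at the image center."""
    h, w = image.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    r2 = (x - w // 2)**2 + (y - h // 2)**2
    gauss = np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    log = (r2 - 2 * sigma**2) / sigma**4 * gauss
    return sigma**2 * np.sum(log * image)

# Synthetic blob: a bright disk of radius 8 on a dark background.
radius = 8
yy, xx = np.mgrid[0:65, 0:65]
disk = ((xx - 32)**2 + (yy - 32)**2 <= radius**2).astype(float)

sigmas = np.arange(2.0, 12.25, 0.25)
responses = np.array([abs(normalized_log_at_center(disk, s)) for s in sigmas])
characteristic_scale = sigmas[np.argmax(responses)]
print(characteristic_scale)
```

For an ideal disk of radius r the response magnitude peaks near σ = r / √2, so the selected scale tracks the blob size.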

SCALE-SPACE BLOB DETECTION

SCALE-SPACE BLOB DETECTOR: EXAMPLE

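LOG VS. DOG
With $G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}$, the Laplacian of Gaussian (LoG) is

$$\nabla^2 G(x, y) = \frac{\partial^2 G}{\partial x^2} + \frac{\partial^2 G}{\partial y^2} = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\, G(x, y)$$

and the difference of Gaussians (DoG) approximates it up to a constant factor:

$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2\, \nabla^2 G$$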

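TECHNICAL DETAIL
We can approximate the Laplacian with a difference of Gaussians, which is more efficient to implement.

$$\sigma^2 \nabla^2 G = \sigma^2 \left( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) \right) \quad \text{(Laplacian)}$$

$$DoG = G(x, y, k\sigma) - G(x, y, \sigma) \quad \text{(Difference of Gaussians)}$$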
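The approximation can be checked numerically. The sketch below samples a DoG and the scaled LoG, (k - 1) σ² ∇²G, on the same grid and reports their largest difference relative to the LoG peak. The values σ = 2.0 and k = 1.01 are assumptions; the approximation is first order in (k - 1), so the error grows as k moves away from 1.

```python
import numpy as np

def gaussian(x, y, sigma):
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def laplacian_of_gaussian(x, y, sigma):
    return (x**2 + y**2 - 2 * sigma**2) / sigma**4 * gaussian(x, y, sigma)

sigma, k = 2.0, 1.01
y, x = np.mgrid[-8:9, -8:9].astype(float)

dog = gaussian(x, y, k * sigma) - gaussian(x, y, sigma)
approx = (k - 1) * sigma**2 * laplacian_of_gaussian(x, y, sigma)

relative_error = np.max(np.abs(dog - approx)) / np.max(np.abs(approx))
print(relative_error)
```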

HOW MANY SCALES?
Convolution cost: at each scale, filtering an M × N image with a K × K kernel takes M × N × K × K operations.
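A short arithmetic sketch of that cost follows. The rule for kernel width (truncating the Gaussian at 3σ on each side) and the 640 × 480 image size are assumptions made for illustration; the point is only that the cost per scale grows quickly as σ, and hence K, increases.

```python
import math

def conv_ops(m, n, k):
    """Multiply-adds for a direct K x K convolution of an M x N image."""
    return m * n * k * k

def kernel_width(sigma):
    # Assumption: truncate the Gaussian at 3 sigma on each side.
    return 2 * math.ceil(3 * sigma) + 1

for sigma in (1.6, 3.2, 6.4):
    k = kernel_width(sigma)
    print(f"sigma={sigma}: K={k}, operations={conv_ops(640, 480, k)}")
```

Doubling σ roughly doubles K and so roughly quadruples the work at full resolution, which motivates down-sampling the image instead of enlarging the kernel.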

LOWE'S PYRAMID SCHEME
Scale space is separated into octaves: octave 1 uses scale σ, octave 2 uses scale 2σ, etc.
- In each octave, the initial image is repeatedly convolved with Gaussians to produce a set of scale-space images.
- Adjacent Gaussian images are subtracted to produce the DoG images.
- After each octave, the Gaussian image is down-sampled by a factor of 2 to produce an image ¼ the size, which starts the next octave.
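The scheme above can be sketched with NumPy alone. This is a simplified illustration, not Lowe's implementation: the function names are hypothetical, the blur is a plain separable convolution truncated at 3σ, the input is assumed to carry no prior blur, and the defaults (s = 3 intervals per octave, base scale 1.6) are assumptions. Each octave holds s + 3 Gaussian images, giving s + 2 DoG images.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with reflected borders (kernel cut at 3 sigma)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def dog_pyramid(image, num_octaves=3, s=3, sigma0=1.6):
    """Return one array of shape (s + 2, H, W) of DoG images per octave."""
    k = 2.0 ** (1.0 / s)
    base = gaussian_blur(image.astype(float), sigma0)
    pyramid = []
    for _ in range(num_octaves):
        gaussians = [base]
        for i in range(1, s + 3):
            prev, cur = sigma0 * k ** (i - 1), sigma0 * k ** i
            # Blur incrementally so the total blur reaches sigma0 * k**i.
            step = np.sqrt(cur**2 - prev**2)
            gaussians.append(gaussian_blur(gaussians[-1], step))
        # Subtract adjacent Gaussian images to get the DoG images.
        pyramid.append(np.stack([b - a for a, b in zip(gaussians, gaussians[1:])]))
        # The image with twice the base blur, down-sampled by 2, starts the next octave.
        base = gaussians[s][::2, ::2]
    return pyramid

rng = np.random.default_rng(0)
pyramid = dog_pyramid(rng.random((64, 64)))
print([octave.shape for octave in pyramid])
```

Down-sampling keeps the kernel size fixed across octaves, so each successive octave costs about a quarter of the previous one.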

DOG IMAGE PYRAMID
[Figure: DoG image pyramid]

LOCAL EXTREMA DETECTION
Find the maxima and minima of the DoG: compare each sample x with its 26 neighbors across 3 scales (8 in its own scale and 9 in each of the two adjacent scales).
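The 26-neighbor comparison can be sketched as follows, assuming the DoG images of one octave are stacked in an array indexed as `dog[scale, row, column]` and that the sample is not on the border of that array. The function name is hypothetical.

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is strictly above or below all 26 neighbors."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = dog[s, y, x]
    # Index 13 is the center of the flattened 3 x 3 x 3 cube.
    neighbors = np.delete(cube.ravel(), 13)
    return bool(center > neighbors.max() or center < neighbors.min())

dog = np.zeros((3, 5, 5))
dog[1, 2, 2] = 1.0
print(is_extremum(dog, 1, 2, 2), is_extremum(dog, 1, 1, 1))
```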

SCALE-SPACE EXTREMA DETECTION
Experimental results over 32 images that were synthetically transformed and had noise added. The top line of the first graph shows the percent of keypoints that are repeatably detected at the same location and scale in a transformed image, as a function of the number of scales sampled per octave. The lower line shows the percent of keypoints that have their descriptors correctly matched to a large database. The second graph shows the total number of keypoints detected in a typical image as a function of the number of scale samples.

FREQUENCY OF SAMPLING IN THE SPATIAL DOMAIN
There is a trade-off between sampling frequency and rate of detection. The prior smoothing used is σ = 1.6.

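ACCURATE KEYPOINT LOCALIZATION
Expand the DoG function D in a Taylor series around the sample point, where D and its derivatives are evaluated at the sample point and $\mathbf{x} = (x, y, \sigma)^T$ is the offset from that point:

$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2}\, \mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2}\, \mathbf{x}$$

The location of the accurate extremum is

$$\hat{\mathbf{x}} = -\left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$$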

ELIMINATING UNSTABLE KEYPOINTS
- If the offset $\hat{\mathbf{x}}$ is larger than 0.5 in any dimension, the extremum lies closer to a different sample point.
- All extrema with a value of $|D(\hat{\mathbf{x}})| < 0.03$ were discarded as low contrast.
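The sub-sample refinement and the two rejection tests can be sketched together. The sketch assumes the DoG images are stacked as `dog[scale, row, column]`, estimates the derivatives of D with central finite differences, and orders the offset as (x, y, σ). The function names are hypothetical, and the 0.03 threshold presumes pixel values scaled to [0, 1]. As a simplification, a keypoint whose offset exceeds 0.5 is simply rejected here rather than re-evaluated at the neighboring sample.

```python
import numpy as np

def refine_keypoint(dog, s, y, x):
    """Return (offset, value): the offset (x, y, sigma) and D at the refined extremum."""
    d = dog.astype(float)
    c = d[s, y, x]
    # First derivatives (central differences).
    g = 0.5 * np.array([d[s, y, x + 1] - d[s, y, x - 1],
                        d[s, y + 1, x] - d[s, y - 1, x],
                        d[s + 1, y, x] - d[s - 1, y, x]])
    # Second derivatives.
    dxx = d[s, y, x + 1] - 2 * c + d[s, y, x - 1]
    dyy = d[s, y + 1, x] - 2 * c + d[s, y - 1, x]
    dss = d[s + 1, y, x] - 2 * c + d[s - 1, y, x]
    dxy = 0.25 * (d[s, y + 1, x + 1] - d[s, y + 1, x - 1]
                  - d[s, y - 1, x + 1] + d[s, y - 1, x - 1])
    dxs = 0.25 * (d[s + 1, y, x + 1] - d[s + 1, y, x - 1]
                  - d[s - 1, y, x + 1] + d[s - 1, y, x - 1])
    dys = 0.25 * (d[s + 1, y + 1, x] - d[s + 1, y - 1, x]
                  - d[s - 1, y + 1, x] + d[s - 1, y - 1, x])
    hessian = np.array([[dxx, dxy, dxs],
                        [dxy, dyy, dys],
                        [dxs, dys, dss]])
    offset = -np.linalg.solve(hessian, g)
    value = c + 0.5 * g @ offset
    return offset, value

def is_stable(offset, value, max_offset=0.5, min_contrast=0.03):
    """Keep a keypoint only if it is well localized and has enough contrast."""
    return bool(np.all(np.abs(offset) <= max_offset) and abs(value) >= min_contrast)

# Quadratic test surface with its peak (value 1.0) between grid samples.
si, yi, xi = np.mgrid[0:3, 0:5, 0:5].astype(float)
surface = 1.0 - ((xi - 2.3)**2 + (yi - 1.8)**2 + (si - 1.1)**2)
offset, value = refine_keypoint(surface, 1, 2, 2)
print(offset, value, is_stable(offset, value))
```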

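ELIMINATING EDGE RESPONSES
Motivation: DoG aims to detect blobs, but the DoG function also has a strong response along edges. Such keypoints are removed by analyzing the Hessian matrix.

Hessian matrix:

$$\mathbf{H} = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$$

Formulation: a keypoint is kept only if

$$\frac{\mathrm{Tr}(\mathbf{H})^2}{\mathrm{Det}(\mathbf{H})} < \frac{(r + 1)^2}{r}$$

where r bounds the ratio between the two principal curvatures (Lowe uses r = 10).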
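The Hessian-based edge test can be sketched on a single DoG image. The 2 × 2 Hessian is estimated with finite differences, and the keypoint is kept only if Tr(H)² / Det(H) is below (r + 1)² / r. The threshold r = 10 follows Lowe's 2004 paper; the function name and the synthetic test surfaces are assumptions.

```python
import numpy as np

def passes_edge_test(dog_image, y, x, r=10.0):
    """True if the principal-curvature ratio at (y, x) is below r."""
    d = dog_image.astype(float)
    dxx = d[y, x + 1] - 2 * d[y, x] + d[y, x - 1]
    dyy = d[y + 1, x] - 2 * d[y, x] + d[y - 1, x]
    dxy = 0.25 * (d[y + 1, x + 1] - d[y + 1, x - 1]
                  - d[y - 1, x + 1] + d[y - 1, x - 1])
    trace = dxx + dyy
    det = dxx * dyy - dxy**2
    if det <= 0:
        # Curvatures of opposite sign (or zero): not a well-defined extremum.
        return False
    return bool(trace**2 / det < (r + 1)**2 / r)

yy, xx = np.mgrid[-2:3, -2:3].astype(float)
blob = -(xx**2 + yy**2)            # equal curvature in both directions
edge = -(xx**2 + 0.01 * yy**2)     # strong curvature across, weak along
print(passes_edge_test(blob, 2, 2), passes_edge_test(edge, 2, 2))
```

Using the trace and determinant avoids computing the eigenvalues explicitly; only their ratio matters.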

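ORIENTATION
Gradient magnitude and angle, computed on the Gaussian-smoothed image L at the keypoint's scale:

$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$$

$$\theta(x, y) = \tan^{-1}\!\left( \frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \right)$$

Orientation selection: the peak of the gradient orientation histogram around the keypoint gives its dominant orientation.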
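A minimal sketch of orientation selection: compute the gradient magnitude and angle with pixel differences, accumulate a magnitude-weighted orientation histogram, and take its peak. The 36-bin histogram follows Lowe's paper; this sketch leaves out the Gaussian weighting window and the peak interpolation used there, and it returns only the center of the winning bin. Angles are measured with y increasing down the image rows. The function name is hypothetical.

```python
import numpy as np

def dominant_orientation(smoothed, num_bins=36):
    """Return the center (in degrees) of the strongest orientation bin."""
    L = smoothed.astype(float)
    dx = L[1:-1, 2:] - L[1:-1, :-2]    # L(x+1, y) - L(x-1, y)
    dy = L[2:, 1:-1] - L[:-2, 1:-1]    # L(x, y+1) - L(x, y-1)
    magnitude = np.sqrt(dx**2 + dy**2)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    hist, edges = np.histogram(angle, bins=num_bins, range=(0.0, 360.0),
                               weights=magnitude)
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1])

yy, xx = np.mgrid[0:9, 0:9].astype(float)
print(dominant_orientation(xx))        # intensity increases along x
print(dominant_orientation(xx + yy))   # intensity increases along the diagonal
```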

SIFT DESCRIPTOR
Making the descriptor rotation invariant: rotate the patch according to its dominant gradient orientation. This puts the patches into a canonical orientation.

SIFT DESCRIPTOR
Use histograms to bin pixels within sub-patches according to their gradient orientation.

SUMMARY OF SIFT FEATURE
- Descriptor: 128-D. 4 × 4 sub-patches, each with an 8-D gradient angle histogram: 4 × 4 × 8 = 128. Normalized to reduce the effects of illumination change.
- Position (x, y): where the feature is located.
- Scale: controls the region size for descriptor extraction.
- Orientation: used to achieve a rotation-invariant descriptor.
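The 4 × 4 × 8 layout can be sketched for a 16 × 16 patch that is assumed to be already rotated to its canonical orientation. This is a simplified illustration: it omits the Gaussian weighting and the interpolation between bins described in Lowe's paper, while the clipping at 0.2 followed by renormalization is taken from that paper. The function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-D descriptor: 4 x 4 cells, each an 8-bin gradient-angle histogram."""
    patch = patch.astype(float)
    dy, dx = np.gradient(patch)              # derivatives along rows, columns
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    bins = np.minimum((angle / (2 * np.pi) * 8).astype(int), 7)

    desc = np.zeros((4, 4, 8))
    for y in range(16):
        for x in range(16):
            desc[y // 4, x // 4, bins[y, x]] += magnitude[y, x]
    desc = desc.ravel()

    norm = np.linalg.norm(desc)
    if norm > 0:
        desc = desc / norm                   # cancels contrast changes
        desc = np.minimum(desc, 0.2)         # limit the influence of large gradients
        desc = desc / np.linalg.norm(desc)
    return desc

rng = np.random.default_rng(0)
patch = rng.random((16, 16))
descriptor = sift_like_descriptor(patch)
print(descriptor.shape)
```

Because the descriptor is built from gradients, a constant brightness offset drops out, and the normalization removes a uniform contrast scaling.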

LOCAL FEATURES

MATCHING LOCAL FEATURES

AMBIGUOUS MATCHES

MATCHING SIFT DESCRIPTORS
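These slides carry no text, so the following is only a sketch of one common way to match SIFT descriptors and reject ambiguous matches. It assumes nearest-neighbor search by Euclidean distance plus the distance-ratio test from Lowe's 2004 paper, with a threshold of 0.8. The function name and the toy descriptors are hypothetical.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (index_a, index_b) pairs that pass the distance-ratio test."""
    matches = []
    for i, d in enumerate(desc_a):
        dist = np.linalg.norm(desc_b - d, axis=1)
        nearest, second = np.argsort(dist)[:2]
        # Accept only if the best match is clearly better than the runner-up.
        if dist[nearest] < ratio * dist[second]:
            matches.append((i, int(nearest)))
    return matches

desc_b = np.eye(4)
desc_a = np.array([[1.0, 0.0, 0.0, 0.05],    # clearly closest to row 0
                   [0.5, 0.5, 0.0, 0.0]])    # equally close to rows 0 and 1
matches = match_descriptors(desc_a, desc_b)
print(matches)
```

The second query is discarded because its two nearest neighbors are equally distant, which is the ambiguous case the ratio test is meant to filter out.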