CAP5415 Computer Vision, Lectures 5 and 6: Finding Features, Affine Invariance, SIFT. Ulas Bagci, bagci@ucf.edu
Outline: the concept of scale; pyramids; scale-space approaches (briefly); scale-invariant region selection; SIFT, an image region descriptor (David Lowe, IJCV 2004 paper [please read it!]).
Reminder: Motivation. Image matching is a fundamental aspect of many problems: object recognition, 3D structure, stereo correspondence, motion tracking. What are the desired features to conduct these tasks?
Review of Corner Detection: corners are invariant to translation and rotation, but not to scaling.
SCALE. The extrema in a signal and its first few derivatives provide a useful general-purpose description for many kinds of signals (edges, corners). In physics, objects live on a range of scales. Neighborhood operations can only extract local features (at a scale of at most a few pixels); however, images contain information at larger scales.
Scale-Space. Any given image can contain objects that exist at scales different from other objects in the same image, or that even exist at multiple scales simultaneously.
Multi-scale Representation. Small-scale features in an image are extracted with smaller filter masks, large-scale features with larger filter masks. Computational cost increases: doubling the scale leads to a four-fold increase in the number of operations in 2D.
Multi-Scale Representation: Pyramid. A pyramid is one way to represent images in multi-scale. The pyramid is built using multiple copies of the image; each level is 1/4 of the size of the previous level. The lowest level has the highest resolution; the highest level has the lowest resolution.
A pyramid can capture both global and local features.
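The pyramid construction described above (blur, then keep every other row and column) can be sketched in a few lines of NumPy. This is an illustration only, not the lecture's code; the 5-tap binomial kernel [1, 4, 6, 4, 1]/16 is a common discrete stand-in for a Gaussian.

```python
import numpy as np

# 5-tap binomial kernel, a standard discrete approximation of a Gaussian.
KERNEL = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0

def blur(img):
    """Separable blur: convolve columns, then rows, with the 1-D kernel."""
    smooth = lambda a: np.convolve(a, KERNEL, mode="same")
    img = np.apply_along_axis(smooth, 0, img)
    return np.apply_along_axis(smooth, 1, img)

def gaussian_pyramid(img, levels):
    """Repeatedly blur and drop every other row/column (1/4 the pixels per level)."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(blur(pyr[-1])[::2, ::2])
    return pyr
```

Each level has a quarter of the pixels of the one below it, matching the 1/4-size statement above.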
Gaussian Pyramids (Source: Forsyth)
Laplacian Pyramids (Source: Forsyth)
Laplacian Pyramids. Laplacian of Gaussian (LoG): apply a Gaussian first to smooth the image, then perform the Laplacian operation: $\nabla^2 (G * I) = I * \nabla^2 G$.
Laplacian Pyramids. The x-derivative of the Gaussian: $G_x(x, y) = -\frac{x}{2\pi\sigma^4}\, e^{-(x^2+y^2)/(2\sigma^2)}$. Repeat the same derivative for the y component, then take second derivatives: $\nabla^2 G(x, y) = \frac{1}{2\pi\sigma^4}\left(\frac{x^2+y^2-2\sigma^2}{\sigma^2}\right) e^{-(x^2+y^2)/(2\sigma^2)}$.
Laplacian Pyramid Construction
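A minimal sketch of the construction: each Laplacian level stores the detail lost between two adjacent Gaussian levels, so the original image can be rebuilt exactly from the pyramid. For brevity this sketch uses nearest-neighbour upsampling and omits the pre-downsampling blur; real implementations blur before decimating and interpolate when expanding. All function names here are mine, not from the lecture.

```python
import numpy as np

def downsample(img):
    return img[::2, ::2]

def upsample(img, shape):
    # Nearest-neighbour expansion; production code would interpolate.
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels):
    gauss = [img.astype(float)]
    for _ in range(levels - 1):
        gauss.append(downsample(gauss[-1]))
    # Each Laplacian level = Gaussian level minus the expanded next level.
    lap = [g - upsample(g_next, g.shape) for g, g_next in zip(gauss, gauss[1:])]
    return lap + [gauss[-1]]  # keep the coarsest level for reconstruction

def reconstruct(lap):
    img = lap[-1]
    for level in reversed(lap[:-1]):
        img = level + upsample(img, level.shape)
    return img
```

Because the residual at each level is stored, reconstruction is exact by construction.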
Decimation and Interpolation. Lowpass filter (e.g., a Gaussian): $y(n) = x(n) * h(n) = \sum_k h(k)\, x(n-k)$. Downsample: $z(n) = y(2n)$. Combined: $z(n) = \sum_k h(k)\, x(2n-k)$.
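The two-step form (filter, then keep every other sample) and the combined form above compute the same samples, which a short sketch can verify (illustrative only; names are mine):

```python
import numpy as np

def decimate_two_step(x, h):
    """Filter with h, then keep every other sample: z(n) = y(2n)."""
    y = np.array([sum(h[k] * x[n - k]
                      for k in range(len(h)) if 0 <= n - k < len(x))
                  for n in range(len(x))])
    return y[::2]

def decimate_combined(x, h):
    """Same result computed directly: z(n) = sum_k h(k) x(2n - k)."""
    n_out = (len(x) + 1) // 2
    return np.array([sum(h[k] * x[2 * n - k]
                         for k in range(len(h)) if 0 <= 2 * n - k < len(x))
                     for n in range(n_out)])
```

The combined form does half the filtering work, since it never computes the odd output samples that decimation would discard.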
Difference of Gaussian (DoG). A common approximation of LoG (better run time): $D_{\sigma, k\sigma}(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$. It is the difference between a blurred copy of image I and an even more blurred copy of I: $(k-1)\,\sigma^2 \nabla^2 G \approx G_{k\sigma}(x, y) - G_{\sigma}(x, y)$.
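The approximation above can be checked numerically with a pure-NumPy sketch (illustrative, not Lowe's implementation): build a DoG response on a synthetic blob and compare it against a finite-difference Laplacian of the blurred image.

```python
import numpy as np

def gauss_kernel(sigma):
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gauss_blur(img, sigma):
    """Separable Gaussian blur."""
    k = gauss_kernel(sigma)
    smooth = lambda a: np.convolve(a, k, mode="same")
    return np.apply_along_axis(smooth, 1, np.apply_along_axis(smooth, 0, img))

def dog(img, sigma, k=1.6):
    """D(x, y) = L(x, y, k*sigma) - L(x, y, sigma)."""
    return gauss_blur(img, k * sigma) - gauss_blur(img, sigma)

def laplacian(img):
    """5-point finite-difference Laplacian."""
    return (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
            np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)

# A synthetic Gaussian blob: the DoG should closely track the LoG response.
yy, xx = np.mgrid[-32:32, -32:32]
blob = np.exp(-(xx**2 + yy**2) / (2 * 4.0**2))
d = dog(blob, 2.0)
log_response = laplacian(gauss_blur(blob, 2.0))
```

Both responses share the same mexican-hat shape (negative at the blob center, positive ring around it), which is why the cheaper DoG is used in place of the LoG.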
Scale Selection (Automated). Find the scale that gives a local maximum of some function f in both position and scale. (Figure: response f vs. region size for two images, with corresponding scales s1 and s2. K. Grauman)
A Good Function? A good function for scale detection has one stable, sharp peak in its response vs. region size; flat or multi-peaked responses are bad. For usual images, a good function is one that responds to contrast (sharp local intensity change).
Patch Size Corresponding to Scale: $f(I_{i_1 \ldots i_m}(x, \sigma)) = f(I_{i_1 \ldots i_m}(x', \sigma'))$. How do we find corresponding patch sizes?
Patch Size Corresponding to Scale. Compare function responses for increasing scale (the scale signature), $f(I_{i_1 \ldots i_m}(x, \sigma))$, across the two images; corresponding patches are those whose signatures peak at matching scales. (K. Grauman, B. Leibe)
Normalization
Scale Invariant Feature Transform (SIFT). Lowe, D., 2004, IJCV (cited > 25K).
Scale Invariant Feature Transform (SIFT). Shows the existence, importance, and value of invariant types of detectors, and demonstrates the richness that feature descriptors can bring to feature matching. Instead of using LoG (Laplacian of Gaussian) as in Harris- and Hessian-based operators, SIFT uses DoG (Difference of Gaussian).
Scale Invariant Feature Transform (SIFT). Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters.
Overall Procedure at a High Level. (1) Scale-space extrema detection: search over multiple scales and image locations. (2) Keypoint localization: fit a model to determine location and scale; select keypoints based on a measure of stability. (3) Orientation assignment: compute the best orientation(s) for each keypoint region. (4) Keypoint description: use local image gradients at the selected scale and rotation to describe each keypoint region.
Basic Idea: SIFT Descriptor. Take an n x n (e.g., n=16) square window around a feature/interest point, divide it into m x m (e.g., m=4) cells, compute the gradient orientations within each cell, and create a histogram (over 0 to 2π) of edge orientations weighted by magnitude.
SIFT Descriptor: 16 histograms x 8 orientations = 128 features.
SIFT Descriptor (full version). Divide the 16x16 window into a 4x4 grid of cells (a 2x2 case is shown in the figure). Compute an orientation histogram for each cell: 16 cells * 8 orientations = 128-dimensional descriptor. Threshold-normalize the descriptor: normalize to unit length, clamp each component to at most 0.2, then renormalize.
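The threshold-normalization step can be sketched directly (the 0.2 cap follows Lowe's paper; the function name is mine):

```python
import numpy as np

def threshold_normalize(desc, cap=0.2):
    """Normalize to unit length, clamp components to `cap` (reduces the
    influence of large gradient magnitudes, e.g. from saturation),
    then renormalize to unit length."""
    desc = desc / np.linalg.norm(desc)
    desc = np.minimum(desc, cap)
    return desc / np.linalg.norm(desc)
```

A descriptor dominated by one huge component comes out far more balanced after the clamp, which is exactly the robustness-to-saturation effect discussed later.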
Properties of SIFT. An extraordinarily robust matching technique: it can handle changes in viewpoint and significant changes in illumination, and it is efficient (real time). Lots of code available: http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT
Revisit SIFT Steps. (1) Scale-space extrema detection: extract scale- and rotation-invariant interest points (i.e., keypoints). (2) Keypoint localization: determine location and scale for each interest point; eliminate weak keypoints. (3) Orientation assignment: assign one or more orientations to each keypoint. (4) Keypoint descriptor: use local image gradients at the selected scale.
(1) Scale-Space Extrema Detection. We want to find points that give us information about the objects in the image. The information about an object is in its edges, so we represent the image in a way that yields these edges as the representation's extrema points.
(1) Scale-Space Extrema Detection. Harris-Laplace: find local maxima of the Harris detector in space (x, y) and of the LoG in scale. SIFT: find local maxima of the Hessian in space (x, y) and of the DoG in scale.
(1) Scale-Space Extrema Detection. Extract local extrema (i.e., minima or maxima) in the DoG pyramid $D(\sigma), D(k\sigma), D(k^2\sigma)$: compare each point to its 8 neighbors at the same level, 9 neighbors in the level above, and 9 neighbors in the level below (i.e., 26 total).
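The 26-neighbour test can be sketched as a brute-force scan over a DoG stack (illustrative only; real implementations vectorize this and also apply the contrast and edge thresholds that come next):

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """Compare D[s, y, x] to its 26 neighbours in the 3x3x3 scale-space cube."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = dog[s, y, x]
    return center == cube.max() or center == cube.min()

def find_extrema(dog):
    """Scan all interior points of a (scales, rows, cols) DoG stack."""
    pts = []
    S, H, W = dog.shape
    for s in range(1, S - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                if is_extremum(dog, s, y, x):
                    pts.append((s, y, x))
    return pts
```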
(1) Scale-Space Extrema Detection. Laplacian of Gaussian kernel, scale-normalized (multiplied by σ²), as proposed by Lindeberg. Scale-space detection: find local maxima across scale and space. A good blob detector.
(2) Keypoint Localization. There are still a lot of points, and some of them are not good enough: the locations of keypoints may be inaccurate, and edge points must be eliminated.
(2) Keypoint Localization. Reject (1) points with low contrast (flat regions) and (2) points poorly localized along an edge. Edge test: reject if $\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} > \frac{(r+1)^2}{r}$, where $r = \lambda_1 / \lambda_2$ is the ratio of the principal curvatures; in practice SIFT uses r = 10.
Inaccurate Keypoint Localization. With poor contrast, the extremum detected at a sampling point can differ from the true extremum of the underlying signal. (Figure: true vs. detected extrema along x.)
Inaccurate Keypoint Localization: the solution. Take a Taylor expansion of D around the sample point: $D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2}\,\mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2}\,\mathbf{x}$. Minimize to find the accurate extremum: $\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$. If the offset from the sampling point is larger than 0.5, the keypoint should be assigned to a different sampling point. (Brown & Lowe 2002)
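SIFT applies this refinement in 3D over (x, y, σ); the same idea in 1D makes a compact sketch (function name is mine):

```python
import numpy as np

def refine_offset(f_m1, f_0, f_p1):
    """Fit a quadratic through three samples at x = -1, 0, +1 and return the
    offset of its extremum from the centre: x_hat = -D'(0) / D''(0),
    using central finite differences."""
    d1 = (f_p1 - f_m1) / 2.0   # first derivative estimate
    d2 = f_p1 - 2.0 * f_0 + f_m1  # second derivative estimate
    return -d1 / d2
```

For a true quadratic the recovered offset is exact; if |offset| > 0.5 the extremum actually belongs to a neighbouring sample point, as stated above.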
Local extrema; then remove low-contrast features; then remove edge responses. (Figures show the keypoints remaining after each step.)
(3) Orientation Assignment. Create a histogram of gradient directions within a region around the keypoint, at the selected scale: $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$; $m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$; $\theta(x, y) = \mathrm{atan2}\big(L(x, y+1) - L(x, y-1),\; L(x+1, y) - L(x-1, y)\big)$. Use 36 bins (i.e., 10° per bin) covering 0 to 2π. Histogram entries are weighted by (i) gradient magnitude and (ii) a Gaussian function with σ equal to 1.5 times the scale of the keypoint.
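A simplified sketch of the 36-bin orientation histogram (the Gaussian weighting with σ = 1.5× the keypoint scale is omitted here for brevity; only the magnitude weighting is shown):

```python
import numpy as np

def orientation_histogram(L, y, x, radius=4, bins=36):
    """36-bin (10 degrees each) histogram of gradient orientations in a
    square region around (y, x), weighted by gradient magnitude."""
    hist = np.zeros(bins)
    for j in range(y - radius, y + radius + 1):
        for i in range(x - radius, x + radius + 1):
            dx = L[j, i + 1] - L[j, i - 1]
            dy = L[j + 1, i] - L[j - 1, i]
            m = np.hypot(dx, dy)                      # gradient magnitude
            theta = np.arctan2(dy, dx) % (2 * np.pi)  # orientation in [0, 2*pi)
            hist[int(theta / (2 * np.pi) * bins) % bins] += m
    return hist
```

The keypoint orientation is then taken from the dominant peak of this histogram (plus any peaks within 80% of it, in Lowe's scheme).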
(4) Keypoint Descriptor. Partial voting: distribute histogram entries into adjacent bins (for additional robustness to shifts). Each entry is added to the adjacent bins, multiplied by a weight of 1 - d, where d is the distance from the bin it belongs to.
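The 1 - d weighting for a single entry can be sketched as follows (bin wrap-around is assumed circular, as is natural for orientation bins; the function name is mine):

```python
import numpy as np

def vote(hist, pos, magnitude):
    """Distribute `magnitude` between the two nearest bins; each bin receives
    a weight of 1 - d, where d is the distance from `pos` to that bin."""
    lo = int(np.floor(pos)) % len(hist)
    hi = (lo + 1) % len(hist)
    d = pos - np.floor(pos)          # distance to the lower bin
    hist[lo] += magnitude * (1.0 - d)
    hist[hi] += magnitude * d        # = magnitude * (1 - distance to upper bin)
```

An entry that falls exactly on a bin centre votes entirely for that bin; one that falls between two centres splits its vote, so a small orientation shift changes the histogram smoothly instead of abruptly.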
(4) Keypoint Descriptor. The descriptor depends on two main parameters: (1) the number of orientations, and (2) the size n of the n x n array of orientation histograms.
Evaluating Results. How can we measure the performance of a feature matcher? With an ROC curve ("Receiver Operating Characteristic"). True positive rate = (# true positives matched) / (# true positives): among features that really do have a match, how often the matcher correctly found it. False positive rate = (# false positives matched) / (# true negatives): how often the matcher said yes when the right answer was no. (Figure: example curve reaching a 0.7 true positive rate at a 0.1 false positive rate.)
Change of Illumination. A change of brightness does not affect gradients (differences of pixel values). A change of contrast does not affect gradients (up to normalization). Saturation (a non-linear change of illumination) affects magnitudes much more than orientations. => Threshold gradient magnitudes to 0.2 and renormalize.
Illumination invariance? 1. Before you start, you can normalize the images. 2. Difference-based metrics can be used (Haar, SIFT, ...).
Extract features; compute putative matches; then loop: hypothesize a transformation T (from a small group of putative matches that are related by T), and verify the transformation (search for other matches consistent with T).
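The hypothesize-and-verify loop can be sketched for the simplest case, a pure translation estimated from a single match (a deterministic stand-in for the usual RANSAC-style random sampling; the function name and tolerance are mine):

```python
import numpy as np

def fit_translation(src, dst, tol=1.0):
    """Hypothesize a translation T from each single putative match, then
    verify it by counting how many other matches are consistent with T."""
    best_t, best_inliers = None, 0
    for s, d in zip(src, dst):                  # each match hypothesizes one T
        t = d - s
        residuals = np.linalg.norm(dst - (src + t), axis=1)
        inliers = int((residuals < tol).sum())  # verify against all matches
        if inliers > best_inliers:
            best_t, best_inliers = t, inliers
    return best_t, best_inliers
```

Even with several wrong matches, the translation supported by the most matches wins, which is exactly the verification step described above.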
Object Recognition. For training images: extract keypoints by SIFT; create a descriptor database. For query images: extract keypoints by SIFT; for each descriptor, find the nearest neighbor in the database; find clusters of at least 3 keypoints; perform a detailed geometric fit check for each cluster.
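The nearest-neighbor step is typically paired with Lowe's ratio test: keep a match only if the best distance is clearly smaller than the second best. A sketch (function name is mine; 0.8 is the ratio from Lowe's paper):

```python
import numpy as np

def match(query, db, ratio=0.8):
    """For each query descriptor, find its nearest neighbour in `db`; keep the
    match only if the nearest distance beats the second nearest by `ratio`."""
    matches = []
    for qi, q in enumerate(query):
        dists = np.linalg.norm(db - q, axis=1)
        order = np.argsort(dists)
        if dists[order[0]] < ratio * dists[order[1]]:
            matches.append((qi, int(order[0])))
    return matches
```

Ambiguous descriptors, whose two best matches are nearly equidistant, are discarded rather than matched incorrectly.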
Recognition/Matching
Image Registration