Key properties of local features


Key properties of local features
- Locality: robust against occlusions
- Distinctiveness: a good feature should allow for correct object identification with a low probability of mismatch
- Efficiency: easy to extract and match
- Quantity: many features even from small objects
- Invariance to noise, changes in illumination, scaling, rotation, and viewing direction (to a certain extent)

Scale Invariant Feature Transform (SIFT) Well-engineered local features, designed to address these requirements. Proposed by David Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, Vol. 60, No. 2, 2004, pp. 91-110. Slides adapted from D. Lowe and Cordelia Schmid, Recognition and Matching based on local invariant features, CVPR 2003.

Steps
1. Scale-space extrema detection
2. Keypoint localization
3. Orientation assignment
4. Descriptor generation

Finding keypoints: scale matters (an intuitive and informal explanation)

Finding keypoints Detect points at which the Laplacian (of Gaussian) of the image is large (Lindeberg 1994). This has been experimentally shown to be a more stable image feature than gradients, the Hessian, or the Harris corner detection function. To achieve invariance to scale, use the scale-normalized LoG:

$$\mathrm{LoG}_{\mathrm{norm}} = \sigma^2 \nabla^2 G = \sigma^2 \left( \frac{\partial^2 G}{\partial x^2} + \frac{\partial^2 G}{\partial y^2} \right)$$

Build a scale space: each octave is a collection of images smoothed in sequence with $\sigma, k\sigma, k^2\sigma, \ldots$ When we reach a new octave (i.e. sigma doubles), downsample the image and start the next octave. If each octave is made of $s$ intervals, then $k = 2^{1/s}$. Adjacent image scales are subtracted to produce the difference-of-Gaussian images.

Recall that the Laplacian of Gaussian can be approximated as a difference of two Gaussians. This is efficient: the smoothed images $L$ can be computed quickly, and $D$ is then a simple image subtraction:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$

$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G * I(x, y)$$

(Figure: two Gaussians with $\sigma_1 > \sigma_2$ and their difference.)
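A minimal sketch of this construction, assuming NumPy and SciPy are available; the function and parameter names are illustrative, not Lowe's exact implementation (which blurs incrementally rather than from the base image):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(image, sigma=1.6, s=3):
    """Build one octave of the DoG scale space.

    Produces s+3 Gaussian-blurred images so that s+2 DoG images
    are available for extrema detection, as in Lowe (2004).
    """
    k = 2 ** (1.0 / s)  # scale ratio between adjacent images
    gaussians = [gaussian_filter(image, sigma * (k ** i)) for i in range(s + 3)]
    # Adjacent smoothed images are subtracted to form the DoG images.
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    return gaussians, dogs
```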

(Figure: the scale-space pyramid; within each octave the image is repeatedly convolved with Gaussians, and the result is downsampled to start the next octave.)

Local extrema detection Find local extrema (max or min) of the DoG: compare each point with its eight neighbors at the same scale and its nine neighbors at each of the two adjacent scales. A candidate keypoint is $\hat{\mathbf{x}} = (x, y, \sigma)$.
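A sketch of the 26-neighbor test, assuming the `dogs` list from the previous snippet (the helper name and early contrast threshold are illustrative):

```python
import numpy as np

def is_extremum(dogs, i, r, c, contrast_threshold=0.03):
    """Check whether DoG pixel (r, c) at scale index i is a local
    extremum of its 3x3x3 neighborhood: 8 same-scale neighbors plus
    9 neighbors in each of the two adjacent scales."""
    value = dogs[i][r, c]
    if abs(value) < contrast_threshold:  # early low-contrast rejection
        return False
    cube = np.stack([d[r - 1:r + 2, c - 1:c + 2] for d in dogs[i - 1:i + 2]])
    return value == cube.max() or value == cube.min()
```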

Improve reliability For each candidate keypoint: discard low-contrast keypoints, and discard edge responses. The difference-of-Gaussian has a strong response along edges, but unfortunately edge locations are unstable (under translation or noise).

If $|D(\hat{\mathbf{x}})| < t$, discard the point as low contrast. Then check the principal curvature of $D$: edges have a large principal curvature across the edge but a small one in the perpendicular direction. Use the Hessian

$$H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}$$

estimated using finite differences. The eigenvalues of $H$ are proportional to the principal curvatures of $D$; as for the Harris corner detector, we can avoid computing the eigenvalues explicitly.

$$\mathrm{tr}(H) = D_{xx} + D_{yy} = \lambda_1 + \lambda_2, \qquad \det(H) = D_{xx} D_{yy} - (D_{xy})^2 = \lambda_1 \lambda_2$$

Check that $\det(H) > 0$; then, with $r = \lambda_1 / \lambda_2$,

$$\frac{\mathrm{tr}(H)^2}{\det(H)} = \frac{(\lambda_1 + \lambda_2)^2}{\lambda_1 \lambda_2} = \frac{(r + 1)^2}{r}$$

which increases with $r$. Pick a maximum allowed value $\hat{r}$ and discard points for which

$$\frac{\mathrm{tr}(H)^2}{\det(H)} > \frac{(\hat{r} + 1)^2}{\hat{r}}$$
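The same test in code, as a hedged sketch: the central-difference stencil and the threshold $\hat{r} = 10$ follow Lowe's paper, but the function name is ours.

```python
def passes_edge_test(dog, r, c, r_max=10.0):
    """Reject edge-like keypoints via the ratio of principal curvatures.

    Computes the 2x2 Hessian of the DoG image at (r, c) with central
    differences and applies the tr^2/det < (r_max+1)^2 / r_max test.
    """
    dxx = dog[r, c + 1] + dog[r, c - 1] - 2 * dog[r, c]
    dyy = dog[r + 1, c] + dog[r - 1, c] - 2 * dog[r, c]
    dxy = (dog[r + 1, c + 1] - dog[r + 1, c - 1]
           - dog[r - 1, c + 1] + dog[r - 1, c - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:  # curvatures of opposite sign: reject
        return False
    return trace ** 2 / det < (r_max + 1) ** 2 / r_max
```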

Orientation assignment For the selected keypoint, at the closest scale: compute a gradient orientation histogram, determine the dominant orientation, and assign this orientation to the keypoint. Gradient magnitude and orientation are computed from pixel differences:

$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$$

$$\theta(x, y) = \arctan\!\big( (L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y)) \big)$$

Orientation assignment: technicalities The histogram has 36 bins. Use the gradient magnitude as the weight in the histogram, multiplied by a Gaussian weight: given scale $s$, use a Gaussian with $\sigma = 1.5 s$. Detect the highest peak and any other local peak within 80% of the highest peak; these local peaks generate extra keypoints (experimentally this happens about 15% of the time). Fit a parabola to the 3 histogram values closest to each peak and re-detect the maximum for better accuracy.
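A condensed sketch of the histogram voting and peak handling; the function name is ours, and the Gaussian weighting is assumed to be folded into `magnitudes` beforehand:

```python
import numpy as np

def dominant_orientations(magnitudes, orientations, num_bins=36, peak_ratio=0.8):
    """Return the dominant gradient orientations (radians) around a keypoint.

    `magnitudes` should already include the Gaussian weighting
    (sigma = 1.5 * scale) described above.
    """
    hist = np.zeros(num_bins)
    bins = (orientations % (2 * np.pi)) / (2 * np.pi) * num_bins
    np.add.at(hist, bins.astype(int) % num_bins, magnitudes)

    peaks = []
    for i in range(num_bins):
        left, right = hist[(i - 1) % num_bins], hist[(i + 1) % num_bins]
        if hist[i] > left and hist[i] > right and hist[i] >= peak_ratio * hist.max():
            # Parabolic interpolation over the three closest bins.
            offset = 0.5 * (left - right) / (left - 2 * hist[i] + right)
            peaks.append((i + offset) * 2 * np.pi / num_bins)
    return peaks
```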

Descriptor Thresholded image gradients are sampled over a 16x16 array of locations in scale space. Create an array of orientation histograms. To achieve orientation invariance, gradient orientations are rotated relative to the keypoint orientation (previous slide). 8 orientations in a 4x4 histogram array = 128 dimensions. Below: a schematic representation (in this case of a 2x2 descriptor from an 8x8 array of locations).

Orientations are weighted with a Gaussian with sigma = scale/2. A brightness change will not affect gradients, since they are computed from pixel differences. A change in image contrast, in which all pixels of an image are multiplied by a constant, multiplies gradients by that same amount. To reduce the effect of illumination change, keypoint descriptors are normalized to unit length. To reduce artifacts due to saturation, large gradient values (> 0.2) are clipped after the first normalization, and the result is renormalized.
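This normalization step is simple enough to show directly; a sketch assuming a 128-dimensional NumPy vector:

```python
import numpy as np

def normalize_descriptor(vec, clip=0.2):
    """Unit-normalize, clip large components (saturation robustness),
    then renormalize, as in Lowe (2004)."""
    vec = vec / max(np.linalg.norm(vec), 1e-12)
    vec = np.minimum(vec, clip)  # clip components above 0.2
    return vec / max(np.linalg.norm(vec), 1e-12)
```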

Example of keypoint detection (a) 233x189 image (b) 832 DoG extrema (c) 729 left after the peak-value threshold (d) 536 left after testing the ratio of principal curvatures

Keypoint matching The best match for each keypoint is found as the nearest neighbor in a database of SIFT features from training images, using the Euclidean distance between descriptors: for $k_1(l), k_2(l)$ with $l \in [1, 128]$, $d(k_1, k_2) = \|k_1 - k_2\|_2$. How do we discard features that do not have a good match? Pick a global threshold? Lowe suggests using the ratio of the distance to the nearest neighbor over the distance to the second-nearest neighbor. This measure performs well: a correct match needs its closest match significantly closer than the closest incorrect match, whereas for a false match there are likely to be a number of other false matches at similar distances, due to the high dimensionality of the feature space.

Matching: for each query descriptor $k_i(l)$, $i = 1, \ldots, M$, compute the distances $d_{i,j}$ to all database descriptors $k_j$ and sort them, obtaining $k_{\text{1st}}, k_{\text{2nd}}, k_{\text{3rd}}, \ldots$ Define $\text{score}_i = d_{i,\text{1st}} / d_{i,\text{2nd}}$ and consider the match good if $\text{score}_i < 0.8$.
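A brute-force version of this ratio test as a sketch (names are ours; OpenCV's `BFMatcher` with `knnMatch` offers equivalent functionality):

```python
import numpy as np

def ratio_test_matches(query, database, ratio=0.8):
    """Match each query descriptor to its nearest database descriptor,
    keeping only matches whose nearest/second-nearest distance ratio
    is below `ratio` (Lowe's test)."""
    matches = []
    for i, q in enumerate(query):
        dists = np.linalg.norm(database - q, axis=1)
        j1, j2 = np.argsort(dists)[:2]  # two closest neighbors
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches
```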

Curiosity: real implementation To avoid exhaustive search, Lowe proposes an approximate algorithm called Best-Bin-First, which returns the closest neighbor with high probability: a priority search with a cut-off after checking the first 200 candidates. For a database of 100K keypoints this gives a speed-up of 2 orders of magnitude, with less than a 5% loss of correct matches.
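Best-Bin-First itself is rarely reimplemented by hand; as an assumption on tooling, OpenCV's FLANN-based matcher provides a comparable approximate kd-tree search with a bounded number of leaf checks:

```python
import cv2
import numpy as np

# KD-tree index with a capped number of leaf checks, in the spirit of
# Best-Bin-First's priority search with an early cut-off.
index_params = dict(algorithm=1, trees=4)  # FLANN_INDEX_KDTREE = 1
search_params = dict(checks=200)           # stop after 200 candidate bins
flann = cv2.FlannBasedMatcher(index_params, search_params)

# knnMatch returns the two nearest neighbors, ready for Lowe's ratio test
# (desc_query / desc_train are hypothetical float32 descriptor arrays):
# matches = flann.knnMatch(desc_query.astype(np.float32),
#                          desc_train.astype(np.float32), k=2)
```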

Conclusions SIFT features are reasonably invariant to rotation, scaling, and illumination changes. We can use them for matching and object recognition, among other things. They are robust to occlusion: as long as we can see at least 3 features from the object, we can compute its location and pose. On-line matching is efficient: recognition can be performed in close-to-real time (at least for small object databases).

Empirical evaluation of the parameters In the paper, most parameters are determined empirically on 32 real images: outdoor scenes, human faces, aerial photographs, industrial images. A range of synthetic transformations was applied: rotation, scaling, affine stretch, change in brightness and contrast, noise. Because the transformations were synthetic, it was possible to know where each feature should appear in the transformed image and to verify each match. Repeatability was checked as: the number of keypoints that are detected after the transformation at the correct location and scale, and the number of keypoints that are successfully matched using the nearest-descriptor technique (details in Lowe 2004, IJCV).

1) Frequency of sampling in scale: the best value appears to be 3 scales per octave

2) Smoothing for each octave: $\sigma = 1.6$ (a compromise between performance and efficiency)

Size of the descriptor: $r$ is the number of orientations in the histograms, $n$ is the width of the $n \times n$ array of orientation histograms; this gives an $r \cdot n \cdot n$ descriptor vector. Examples: $8 \cdot 2 \cdot 2 = 32$, $8 \cdot 4 \cdot 4 = 128$.

3) Size of the descriptor: 8 orientations with a 4x4 array (128 dimensions)

4) Threshold: 0.8 is a threshold that eliminates 90% of false matches at the cost of discarding only 5% of good matches

Results

Location recognition

3D Object Recognition

3D Object Recognition

Recognition under occlusion

Improvements

(Figure: Resample, Blur, Subtract.) Keypoint localization Refine the detection of maxima and minima: fit a quadratic to the surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002). Taylor expansion around the candidate point, with offset $\mathbf{x} = (x, y, \sigma)^T$:

$$D(\hat{\mathbf{x}} + \mathbf{x}) = D(\hat{\mathbf{x}}) + \nabla D(\hat{\mathbf{x}})^T \mathbf{x} + \frac{1}{2}\, \mathbf{x}^T H(\hat{\mathbf{x}})\, \mathbf{x}$$

Offset of the extremum (use finite differences for the derivatives):

$$\mathbf{x}^{*} = -H^{-1}(\hat{\mathbf{x}})\, \nabla D(\hat{\mathbf{x}})$$

$D$ and its derivatives are evaluated at the candidate point $\hat{\mathbf{x}}$; $\mathbf{x}^{*}$ is the offset from it.
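A sketch of this refinement, assuming the 3D DoG stack from the earlier snippets and using finite differences throughout; the helper is hypothetical and simplified to a single Newton step:

```python
import numpy as np

def refine_keypoint(dogs, i, r, c):
    """One Newton step of sub-pixel/sub-scale interpolation:
    solves H x = -grad(D) for the offset x = (dx, dy, dsigma),
    where x indexes columns, y rows, and sigma the scale axis."""
    D = np.stack(dogs[i - 1:i + 2])  # 3 x H x W cube around the point
    # First derivatives (central differences) in x, y, sigma.
    g = 0.5 * np.array([D[1, r, c + 1] - D[1, r, c - 1],
                        D[1, r + 1, c] - D[1, r - 1, c],
                        D[2, r, c] - D[0, r, c]])
    # Second derivatives for the 3x3 Hessian.
    dxx = D[1, r, c + 1] + D[1, r, c - 1] - 2 * D[1, r, c]
    dyy = D[1, r + 1, c] + D[1, r - 1, c] - 2 * D[1, r, c]
    dss = D[2, r, c] + D[0, r, c] - 2 * D[1, r, c]
    dxy = (D[1, r + 1, c + 1] - D[1, r + 1, c - 1]
           - D[1, r - 1, c + 1] + D[1, r - 1, c - 1]) / 4.0
    dxs = (D[2, r, c + 1] - D[2, r, c - 1]
           - D[0, r, c + 1] + D[0, r, c - 1]) / 4.0
    dys = (D[2, r + 1, c] - D[2, r - 1, c]
           - D[0, r + 1, c] + D[0, r - 1, c]) / 4.0
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
    return np.linalg.solve(H, -g)  # offset (dx, dy, dsigma)
```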

Speeded-Up Robust Features (SURF) H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346-359, 2008. Similar to SIFT but faster and more robust (according to the paper). Available in OpenCV.
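A minimal usage sketch; note that in current OpenCV builds SURF lives in the opencv-contrib `xfeatures2d` module and may be disabled for patent reasons, so treat this as an assumption about your build (the input filename is hypothetical):

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
keypoints, descriptors = surf.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)               # N x 64 descriptors
```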

SURF keypoints Compute:

$$H(\mathbf{x}, \sigma) = \begin{pmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{pmatrix}, \qquad L_{xx} = \frac{\partial^2 g}{\partial x^2} * I, \; \ldots$$

In practice, compute an approximation. Search for maxima by solving

$$\operatorname{argmax}_{\mathbf{x}, \sigma} \det\big(H(\mathbf{x}, \sigma)\big)$$

Idea: Gaussians are in practice discretized and cropped anyway, so compute an approximation of $H$ using box filters for $\frac{\partial^2 g}{\partial x^2}$, $\frac{\partial^2 g}{\partial y^2}$, $\frac{\partial^2 g}{\partial x \partial y}$. Box filters can be computed efficiently using the integral image (next). Computing these filters is fast irrespective of their size: there is no need to perform pyramid decomposition and downsampling.

Instead of down-sampling the image, up-scale the filters

Descriptor Similar to SIFT: describe the distribution of intensities. Compute orientations using square filters (Haar wavelets). The dimension of the descriptor is 64, to speed up matching.

From the scale $s$ of the keypoint, first compute the descriptor's principal orientation: compute the responses to Haar wavelets in a neighborhood of 6s points (filter size 4s points), and weight the responses with a Gaussian with sigma = 2s. The response of each filter, $d_x$ and $d_y$, is represented as a point in the $(x, y)$ plane; slide an angular window of size $\pi/3$, compute the sums of $d_x$ and $d_y$ within the window, and take the maximum.

Descriptor The size of the descriptor is 4x4x4 = 64 (a 4x4 grid of subregions with 4 values each).

Integral Image The integral image representation allows for fast computation of rectangular two-dimensional image features. Originally proposed by Viola & Jones (2001) for face detection; used in the SURF algorithm to compute box filters at different scales.

Examples of features

Integral Image The integral image at location $(x, y)$ contains the sum of all pixels above and to the left of $(x, y)$:

$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$

It is computed once, in a single pass, using the recurrences

$$s(x, y) = s(x, y - 1) + i(x, y), \qquad ii(x, y) = ii(x - 1, y) + s(x, y)$$

with $s(x, -1) = 0$ and $ii(-1, y) = 0$, where $s(x, y)$ is the cumulative row sum.

The sum of the pixels within rectangle $D$ can be computed with four array references. With $ii(1) = A$, $ii(2) = A + B$, $ii(3) = A + C$, $ii(4) = A + B + C + D$:

$$\text{sum}(D) = ii(4) + ii(1) - ii(2) - ii(3) = (A + B + C + D) + A - (A + B) - (A + C)$$
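A sketch of both steps with NumPy, where cumulative sums replace the explicit recurrence (function names are ours):

```python
import numpy as np

def integral_image(img):
    """ii(x, y): sum of all pixels above and to the left, inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of pixels in the rectangle [r0, r1] x [c0, c1] (inclusive)
    using four array references: ii(4) + ii(1) - ii(2) - ii(3)."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```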