Class 2: Low-level Representation Liangliang Cao, Jan 31, 2013 EECS 6890 Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/visualrecognitionandsearch Visual Recognition And Search 1
For students who have submitted HW0: Form a group (3 people) before Feb. 7 Teamwork on presentations and projects Project proposal due in Feb (date to be announced) For others Notice Please drop the class if you haven t learned Computer Vision or Learning before If you think you have strong vision background, talk to the instructor after the class You are welcome to audit the class Visual Recognition And Search 2
Development of Low Level Features Classical features Raw pixel Histogram feature Color Histogram Edge histogram Frequency analysis Image filters Texture features LBP Scene features GIST Shape descriptors Edge detection Corner detection 1999 SIFT HOG SURF DAISY BRIEF DoG Hessian detector Laplacian of Harris FAST ORB Local Descriptors Local Detectors Visual Recognition And Search 3
Outline Traditional features Raw Pixels and Histograms Image Filters Classic local feature: SIFT SIFT Descriptor SIFT Detector More recent local features SURF Binary Local Features Hands on Instruction Visual Recognition And Search 4
Outline Traditional features Raw Pixels and Histograms Image Filters Classic local feature: SIFT SIFT Descriptor SIFT Detector More recent local features SURF Binary Local Features Hands on Instruction Visual Recognition And Search 5
Traditional Low Level Features Raw Pixels and Histograms Visual Recognition And Search 6
Raw Pixels and Histograms Concatenating Raw Pixels As 1D Vector Credit: The Face Research Lab Visual Recognition And Search 7
Raw Pixels and Histograms Concatenated Raw Pixels Famous applications (widely used in ML field) Face recognition Hand written digits Pictures courtesy to Sam Roweis Visual Recognition And Search 8
Raw Pixels and Histograms Click to Tiny edit Images title style Antonio Torralba et al proposed to resize images to 32x32 color thumbnails, which are called tiny images Related applications Scene recognition Object recognition Fast speed with limited accuracy Torralba et al, MIT CSAIL report, 2007 Visual Recognition And Search 9
Raw Pixels and Histograms Problem of raw-pixel based representation Rely heavily on good alignment Assume the images are of similar scale Suffer from occlusion Recognition from different view point will be difficult We want more powerful features for real-world problems like the following Visual Recognition And Search 10
Raw Pixels and Histograms Click to Color edit Histogram title style Each pixel is described Distribution of color vectors by a vector of pixel values is described by a histogram r g b Note: There are different choices for color space: RGB, HSV, Lab, etc. For gray images, we usually use 256 or fewer bins for histogram. [Swain and Ballard 91] Visual Recognition And Search 11
Raw Pixels and Histograms Benefits of Histogram Representation No longer sensitive to alignment, scale transform, or even global rotation. Similar color histograms (after normalization). Visual Recognition And Search 12
Raw Pixels and Histograms Limitation of Global Histogram Global histogram has no location information at all They re equal in terms of global histogram Example courtesy to Erik Learned-Miller Visual Recognition And Search 13
Raw Pixels and Histograms Histogram with Spatial Layout Concatenated histogram for each region. Example courtesy to Erik Learned-Miller Visual Recognition And Search 14
Raw Pixels and Histograms IBM IMARS Spatial Gridding Visual Recognition And Search 15
Raw Pixels and Histograms Spatial Pyramid Matching Lazebnik, Schmid and Ponce, CVPR 06 Visual Recognition And Search 16
Raw Pixels and Histograms General Histogram Representation Color histogram patch patch patch Extract color descriptor histogram accumulation General histogram patch patch patch Extract other descriptors edge histogram shape context histogram local binary patterns histogram accumulation Visual Recognition And Search 17
Raw Pixels and Histograms Local Binary Pattern (LBP) For each pixel p, create an 8-bit number b1 b2 b3 b4 b5 b6 b7 b8, where bi = 0 if neighbor i has value less than or equal to p s value and 1 otherwise. Represent the texture in the image (or a region) by the histogram of these numbers. 8 1 2 3 100 101 103 40 50 80 50 60 90 7 6 4 5 1 1 1 1 1 1 0 0 Ojala et al, PR 96, PAMI 02 Visual Recognition And Search 18
Raw Pixels and Histograms LBP Histogram Divide the examined window to cells (e.g. 16x16 pixels for each cell). Compute the histogram, over the cell, of the frequency of each "number" occurring. Optionally normalize the histogram. Concatenate normalized histograms of all cells. Visual Recognition And Search 19
Image Filters Visual Recognition And Search 20
Image Filters Haar Wavelet Haar-like Feature Haar like features Given two adjacent rectangular regions, sums up the pixel intensities in each region and calculates the difference between the two sums Application: face detection [Viola and Jones]: Efficiently processing via integral images Visual Recognition And Search 21
Image Filters Integral Image In integral image, each location saves the sum of gray scale pixel value of the sub-image to the left top corner. Cost four additions operation only Visual Recognition And Search 22
Image Filters Gabor Filtering Gabor filters are defined by a harmonic function multiplied by a Gaussian function. They reflect edges at different scales and spatial frequencies Widely used in fingerprint, iris, OCR, texture and face recognition. Computationally expensive. J. G. Daugman discovered that simple cells in the visual cortex can be modeled by Gabor functions Visual Recognition And Search 23
Scale-Invariant Feature Transform (SIFT) David G. Lowe - Distinctive image features from scale-invariant keypoints, IJCV 2004 - Object recognition from local scale-invariant features, ICCV 1999 Visual Recognition And Search 24
SIFT Why Local Features? Local Feature = Interest Point = Keypoint = Feature Point = Distinguished Region = Covariant Region ( ) local descriptor Local : robust to occlusion/clutter Invariant : to scale/view/illumination changes Local feature can be discriminant! Visual Recognition And Search 25
SIFT Why Local Features? (2) Visual Recognition And Search 26
SIFT How to Get Good Local Features Good descriptor Distinctiveness: is able to distinguish any objects Invariance: find the same object across different viewpoint/illumination change Good detector Locality: features are local, so robust to occlusion and clutter Quantity: many features can be generated for even small objects Visual Recognition And Search 27
SIFT Distinctive Descriptor Histogram of gradient orientation - Histogram is more robust to positioin than raw pixels - Edge gradient is more distinctive than color for local patches Concatenate histograms in spatial cells for better discriminancy Histogram of gradient orientation Concatenated histogram SIFT Descriptor [D. Lowe, 1999] Visual Recognition And Search 28
SIFT Distinctive Descriptor Compute image gradients at each location Histogram of the gradient orientation (8 bins) Concatenate histograms of 4 x 4 spatial grid Feature Dimension: 8 orientation x (4x4) spatial grid = 128 Visual Recognition And Search 29
SIFT Distinctive Descriptor David Lowe s excellent performance tuning: Good parameters: 4 ori, 4 x 4 grid Soft-assignment to spatial bins Gaussian weighting over spatial location Visual Recognition And Search 30
SIFT More Invariant to Illumination Changes Normalize the descriptor to norm one - A change in image contrast can change the magitude but not the orientation Reduce the influence of large gradient magnitudes -Threshold the values in the unit feature vector to be no larger than 0.2 - Renormalize the feature vector after thresholding. Visual Recognition And Search 31
SIFT Rotation Invariance Estimate the dominant orientation of each patch Rotate patch in dominant direction Each patch is associated with (X, Y, Scale, Orientation) Note: - Not every package provides such a procedure in SIFT implementation - Few existing work makes good use of the dominant orientation - With rotation invariance, SIFT may be less discriminant in general recognition Visual Recognition And Search 32
SIFT How to Get Good Local Features Good descriptor Distinctiveness: is able to distinguish any objects Invariance: find the same object across different viewpoint/illumination change Can we do better? Visual Recognition And Search 33
SIFT How to Get Good Local Features Good descriptor Distinctiveness: is able to distinguish any objects Invariance: find the same object across different viewpoint/illumination change Can we do better? More computational efficient, esp for mobile application More compact storage Combining SIFT with traditional features, e.g., LBP Improving poor color description: - SIFT is robust to illumination change, but color is not - SIFT in individual color channels (RGB): more expensive but slightly better Visual Recognition And Search 34
SIFT How to Get Good Local Features Good descriptor Distinctiveness: is able to distinguish any objects Invariance: find the same object across different viewpoint/illumination change Good detector Locality: features are local, so robust to occlusion and clutter Quantity: many features can be generated for even small objects Visual Recognition And Search 35
SIFT SIFT Detector: A First Impression SIFT employs A variant of Histogram of Gradient Orientation as descriptor DoG (Difference of Gaussians) as local region detector Hard to beat Relatively easier to beat Why? DoG has good theoretical support but may not necessarily reflect the requirements in practice Trade-off between invariance and number of detection: In many scenarios, combining multiple detectors -> good performance. Visual Recognition And Search 36
SIFT Detector Concerns for Detectors Detecting Corners Corners are located more robustly than edges Selecting Scales Invariant to scale changes Finding Affine Covariance Invariant to affine transform; matching ellipse with circles Only useful when there is such transform in your dataset! Visual Recognition And Search 37
SIFT Detector DoG Detector Detecting Corners Selecting Scales Finding Affine Covariance Detected region: (red circle) Figure courtesy to Matas and Mikolajczyk Visual Recognition And Search 38
SIFT Detector Difference of Gaussians Visual Recognition And Search 39
SIFT Detector Sampling Frequency in Scale More points are found as sampling frequency increases, but accuracy of matching decreases after 3 scales/octave Visual Recognition And Search 40
SIFT Detector Resample Blur Subtract DoG and Scale Space Theory Scale space representation is a family of the convolution of the given signal with Gaussian kernel = 0 = 1 = 4 = 16 Visual Recognition And Search 41
SIFT Detector DoG and Scale Space Theory (2) Let t =, which represents a "temperature of the intensity values of the image as "temperature distribution" in the image plane Then the scale-space family can be defined as the solution of the diffusion equation where the Gaussian kernel arises as the Green's function of this specific partial differential equation. In scale-space, the corner can be obtained by Laplacian of Gaussian where DoG is an efficient approximation of Laplacian of Gaussian. Visual Recognition And Search 42
SIFT Detector Resample Blur Subtract Accurate Key point localization Detect maxima and minima of difference-of-gaussian in scale space Fit a quadratic to surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002) Taylor expansion around point: Offset of extremum (use finite differences for derivatives): Visual Recognition And Search 43
SIFT Detector Resample Blur Subtract Eliminate Unstable Detections To get stable detection, David Lowe rejects 1. Keypoints with low contrast 2. Keypoints with strong edge but not clear corners. For 1, we filter out keypoints with small For 2, those keypoints correspond to large principle curvature along the edge but small one in the perpendicular direction. To efficient compute the curvature ratio, we need only compute the trace and determinant of local hessian. Visual Recognition And Search 44
SIFT Detector Example of SIFT Keypoint Detection (a) 233x189 image (b) 832 DOG extrema (c) 729 left after peak value threshold (d) 536 left after testing ratio of principle curvatures Visual Recognition And Search 45
SIFT Detector How to Do Better than DoG? Combine more than one detectors Harris detector Hessian detector Dense-sampling instead of sparse detector Grid based sampling + multiple resolutions More computation, more storage, but better accuracy Visual Recognition And Search 46
SIFT Detector Harris Detector Scale selection: Laplacian of Harris detector Visual Recognition And Search 47
SIFT Detector Hessian Detector Visual Recognition And Search 48
Modern Local Features Speeded Up Robust Features (SURF) Bay, Tuytelaars, and Van Gool, SURF: Speeded Up Robust Features, ECCV 2006 Visual Recognition And Search 49
SURF Motivation SIFT is slow and expensive, especially for large scale and mobile applications Recall: Integral Image, a useful technique to compute Fast box filter (in detection) Fast Haar wavelet (in description) Visual Recognition And Search 50
SURF Integral Image Using integral images for major speed up Integral Image (summed area tables) is an intermediate representation for fast computation Second order derivative and Haar-wavelet response Cost four additions operation only Visual Recognition And Search 51
SURF Fast Detection Approximated second order derivatives with box filters (mean/average filter) Both box filters and following Haar wavelets can be computed with integral images. Visual Recognition And Search 52
SURF Fast Description Approximate gradient orientation with Haar wavelet 1. Split the interest region into 4 x 4 cell with 5 x 5 regularly spaced sample points inside 2. Compute gradient histogram with 4 bins 3. Sum the response over each sub-region for d x and d y separately dim= 64 4. In order to bring in information about the polarity of the intensity changes, extract the sum of absolute value of the responses dim= 128 5. Normalize the vector Visual Recognition And Search 53
SURF Compared with SIFT Visual Recognition And Search 54
SURF Compared with SIFT (con t) SURF is an order or magnitude (3~5 times) faster than SIFT In object recognition tasks, SURF obtains a comparable or sometimes a little worse results than SIFT (esp. with illumination change or viewpoint change). But we can see how traditional techniques (integral images) can help! Visual Recognition And Search 55
Binary Local Feature Descriptors Visual Recognition And Search 56
Binary Local Descriptors DAISY Tested different configurations Steerable filters Summation Normalization Descriptor quantization PCA dimensions Better performance than SIFT with 13 bytes descriptors (vs. 128 bytes for SIFT) Slide courtesy of Michele Merler S. Winder, G. Hua, M. Brown, Picking the Best DAISY, CVPR 09 Visual Recognition And Search 57
Binary Local Descriptors BRIEF Identify a 31x31 neighborhood for each interest point For each interest point, randomly create 256 pairs patches (patch size 9 x 9) For each pair a, b, if a>b, set the bit to 1, else set 0 The descriptor: 256 bit vector Calonder, Lepetit, Strecha, BRIEF: binary robust independent elementary features, ECCV 2010 Visual Recognition And Search 58
Binary Local Descriptors ORB FAST detector Interest points are detected as >= 12 contiguous pixel brighter than P in a ring of radius 3 around P. BRIEF detector ~100x faster than SIFT, good viewpoint invariance Rublee et al, ORB: an efficient alternative to SIFT or SURF, ICCV 11 Visual Recognition And Search 59
Binary Local Descriptors BRISK Combination of SIFT-like scale-space detection and BRIEF-like descriptor Leutenegger, Chli, Siegwart, BRISK: Binary Robust Invariant Scalable Keypoints, ICCV 2011 Visual Recognition And Search 60
Hands-on Experience Most of slides in this section are courtesy to Andrea Vedaldi Visual Recognition And Search 61
Existing softwares VLFeat http://www.vlfeat.org/ OpenCV http://opencv.org/ Amsterdam Color Descriptor http://koen.me/research/colordescriptors/ Oxford VGG http://www.robots.ox.ac.uk/~vgg/research/affine/ Visual Recognition And Search 62
Existing softwares VLFeat http://www.vlfeat.org/ OpenCV http://opencv.org/ Amsterdam Color Descriptor http://koen.me/research/colordescriptors/ Oxford VGG http://www.robots.ox.ac.uk/~vgg/research/affine/ Visual Recognition And Search 63
VLFeat SIFT Detector and Descriptor Visual Recognition And Search 64
VLFeat What Does Frame Mean? Visual Recognition And Search 65
VLFeat Other Corner Detectors Visual Recognition And Search 66
VLFeat Other Corner Detectors Visual Recognition And Search 67
VLFeat Dominant Orientation Visual Recognition And Search 68
VLFeat Compute Descriptors Visual Recognition And Search 69
VLFeat Custom Frames Visual Recognition And Search 70
OpenCV List of Options - OpenCV 2.1: Harris, Tomasi&Shi, MSER, DoG, SURF, STAR and FAST, DoG + SIFT, SURF, HARRIS, GoodFeaturesToTrack - OpenCV 2.4: ORB, BRIEF, and FREAK Visual Recognition And Search 71
OpenCV An Example (Extracting SIFT) Courtesy to Unapiedra@stackoverflow Visual Recognition And Search 72
Next class: How to Use SIFT for Object Classification Todo before next class - Form a group and email the instructors your group information - Read the three papers on sparse coding and prepare for the discussion Any volunteer? Reminder: do not forget to drop the class if you don t plan to attend! Visual Recognition And Search 73
Visual Recognition And Search 74
A Demo Photo Tourism Snavely, Seitz and Szeliski, Photo tourism: Exploring photo collections in 3D, SIGGRAPH, 2006 Snavely, Seitz and Szeliski, Modeling the world from Internet photo collections, IJCV 2007 Visual Recognition And Search 75
Visual Recognition And Search 76
Photo Tourism Detect SIFT Visual Recognition And Search 77
Photo Tourism Detect SIFT Visual Recognition And Search 78
Photo Tourism Match SIFT Visual Recognition And Search 79
Photo Tourism From Correspondence To Geometry Visual Recognition And Search 80
Do You Want Another Video? Visual Recognition And Search 81
Build Rome in A Day Agarwal et al, Building Rome in a Day, ICCV, 2009 Visual Recognition And Search 82
Visual Recognition And Search 83