Click to edit title style - PDF Free Download

Class 2: Low-level Representation Liangliang Cao, Jan 31, 2013 EECS 6890 Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/visualrecognitionandsearch Visual Recognition And Search 1

For students who have submitted HW0: Form a group (3 people) before Feb. 7 Teamwork on presentations and projects Project proposal due in Feb (date to be announced) For others Notice Please drop the class if you haven t learned Computer Vision or Learning before If you think you have strong vision background, talk to the instructor after the class You are welcome to audit the class Visual Recognition And Search 2

Development of Low Level Features Classical features Raw pixel Histogram feature Color Histogram Edge histogram Frequency analysis Image filters Texture features LBP Scene features GIST Shape descriptors Edge detection Corner detection 1999 SIFT HOG SURF DAISY BRIEF DoG Hessian detector Laplacian of Harris FAST ORB Local Descriptors Local Detectors Visual Recognition And Search 3

Outline Traditional features Raw Pixels and Histograms Image Filters Classic local feature: SIFT SIFT Descriptor SIFT Detector More recent local features SURF Binary Local Features Hands on Instruction Visual Recognition And Search 4

Traditional Low Level Features Raw Pixels and Histograms Visual Recognition And Search 6

Raw Pixels and Histograms Concatenating Raw Pixels As 1D Vector Credit: The Face Research Lab Visual Recognition And Search 7

Raw Pixels and Histograms Concatenated Raw Pixels Famous applications (widely used in ML field) Face recognition Hand written digits Pictures courtesy to Sam Roweis Visual Recognition And Search 8

Raw Pixels and Histograms Click to Tiny edit Images title style Antonio Torralba et al proposed to resize images to 32x32 color thumbnails, which are called tiny images Related applications Scene recognition Object recognition Fast speed with limited accuracy Torralba et al, MIT CSAIL report, 2007 Visual Recognition And Search 9

Raw Pixels and Histograms Problem of raw-pixel based representation Rely heavily on good alignment Assume the images are of similar scale Suffer from occlusion Recognition from different view point will be difficult We want more powerful features for real-world problems like the following Visual Recognition And Search 10

Raw Pixels and Histograms Click to Color edit Histogram title style Each pixel is described Distribution of color vectors by a vector of pixel values is described by a histogram r g b Note: There are different choices for color space: RGB, HSV, Lab, etc. For gray images, we usually use 256 or fewer bins for histogram. [Swain and Ballard 91] Visual Recognition And Search 11

Raw Pixels and Histograms Benefits of Histogram Representation No longer sensitive to alignment, scale transform, or even global rotation. Similar color histograms (after normalization). Visual Recognition And Search 12

Raw Pixels and Histograms Limitation of Global Histogram Global histogram has no location information at all They re equal in terms of global histogram Example courtesy to Erik Learned-Miller Visual Recognition And Search 13

Raw Pixels and Histograms Histogram with Spatial Layout Concatenated histogram for each region. Example courtesy to Erik Learned-Miller Visual Recognition And Search 14

Raw Pixels and Histograms IBM IMARS Spatial Gridding Visual Recognition And Search 15

Raw Pixels and Histograms Spatial Pyramid Matching Lazebnik, Schmid and Ponce, CVPR 06 Visual Recognition And Search 16

Raw Pixels and Histograms General Histogram Representation Color histogram patch patch patch Extract color descriptor histogram accumulation General histogram patch patch patch Extract other descriptors edge histogram shape context histogram local binary patterns histogram accumulation Visual Recognition And Search 17

Raw Pixels and Histograms Local Binary Pattern (LBP) For each pixel p, create an 8-bit number b1 b2 b3 b4 b5 b6 b7 b8, where bi = 0 if neighbor i has value less than or equal to p s value and 1 otherwise. Represent the texture in the image (or a region) by the histogram of these numbers. 8 1 2 3 100 101 103 40 50 80 50 60 90 7 6 4 5 1 1 1 1 1 1 0 0 Ojala et al, PR 96, PAMI 02 Visual Recognition And Search 18

Raw Pixels and Histograms LBP Histogram Divide the examined window to cells (e.g. 16x16 pixels for each cell). Compute the histogram, over the cell, of the frequency of each "number" occurring. Optionally normalize the histogram. Concatenate normalized histograms of all cells. Visual Recognition And Search 19

Image Filters Visual Recognition And Search 20

Image Filters Haar Wavelet Haar-like Feature Haar like features Given two adjacent rectangular regions, sums up the pixel intensities in each region and calculates the difference between the two sums Application: face detection [Viola and Jones]: Efficiently processing via integral images Visual Recognition And Search 21

Image Filters Integral Image In integral image, each location saves the sum of gray scale pixel value of the sub-image to the left top corner. Cost four additions operation only Visual Recognition And Search 22

Image Filters Gabor Filtering Gabor filters are defined by a harmonic function multiplied by a Gaussian function. They reflect edges at different scales and spatial frequencies Widely used in fingerprint, iris, OCR, texture and face recognition. Computationally expensive. J. G. Daugman discovered that simple cells in the visual cortex can be modeled by Gabor functions Visual Recognition And Search 23

Scale-Invariant Feature Transform (SIFT) David G. Lowe - Distinctive image features from scale-invariant keypoints, IJCV 2004 - Object recognition from local scale-invariant features, ICCV 1999 Visual Recognition And Search 24

SIFT Why Local Features? Local Feature = Interest Point = Keypoint = Feature Point = Distinguished Region = Covariant Region ( ) local descriptor Local : robust to occlusion/clutter Invariant : to scale/view/illumination changes Local feature can be discriminant! Visual Recognition And Search 25

SIFT Why Local Features? (2) Visual Recognition And Search 26

SIFT How to Get Good Local Features Good descriptor Distinctiveness: is able to distinguish any objects Invariance: find the same object across different viewpoint/illumination change Good detector Locality: features are local, so robust to occlusion and clutter Quantity: many features can be generated for even small objects Visual Recognition And Search 27

SIFT Distinctive Descriptor Histogram of gradient orientation - Histogram is more robust to positioin than raw pixels - Edge gradient is more distinctive than color for local patches Concatenate histograms in spatial cells for better discriminancy Histogram of gradient orientation Concatenated histogram SIFT Descriptor [D. Lowe, 1999] Visual Recognition And Search 28

SIFT Distinctive Descriptor Compute image gradients at each location Histogram of the gradient orientation (8 bins) Concatenate histograms of 4 x 4 spatial grid Feature Dimension: 8 orientation x (4x4) spatial grid = 128 Visual Recognition And Search 29

SIFT Distinctive Descriptor David Lowe s excellent performance tuning: Good parameters: 4 ori, 4 x 4 grid Soft-assignment to spatial bins Gaussian weighting over spatial location Visual Recognition And Search 30

SIFT More Invariant to Illumination Changes Normalize the descriptor to norm one - A change in image contrast can change the magitude but not the orientation Reduce the influence of large gradient magnitudes -Threshold the values in the unit feature vector to be no larger than 0.2 - Renormalize the feature vector after thresholding. Visual Recognition And Search 31

SIFT Rotation Invariance Estimate the dominant orientation of each patch Rotate patch in dominant direction Each patch is associated with (X, Y, Scale, Orientation) Note: - Not every package provides such a procedure in SIFT implementation - Few existing work makes good use of the dominant orientation - With rotation invariance, SIFT may be less discriminant in general recognition Visual Recognition And Search 32

SIFT How to Get Good Local Features Good descriptor Distinctiveness: is able to distinguish any objects Invariance: find the same object across different viewpoint/illumination change Can we do better? More computational efficient, esp for mobile application More compact storage Combining SIFT with traditional features, e.g., LBP Improving poor color description: - SIFT is robust to illumination change, but color is not - SIFT in individual color channels (RGB): more expensive but slightly better Visual Recognition And Search 34

SIFT How to Get Good Local Features Good descriptor Distinctiveness: is able to distinguish any objects Invariance: find the same object across different viewpoint/illumination change Good detector Locality: features are local, so robust to occlusion and clutter Quantity: many features can be generated for even small objects Visual Recognition And Search 35

SIFT SIFT Detector: A First Impression SIFT employs A variant of Histogram of Gradient Orientation as descriptor DoG (Difference of Gaussians) as local region detector Hard to beat Relatively easier to beat Why? DoG has good theoretical support but may not necessarily reflect the requirements in practice Trade-off between invariance and number of detection: In many scenarios, combining multiple detectors -> good performance. Visual Recognition And Search 36

SIFT Detector Concerns for Detectors Detecting Corners Corners are located more robustly than edges Selecting Scales Invariant to scale changes Finding Affine Covariance Invariant to affine transform; matching ellipse with circles Only useful when there is such transform in your dataset! Visual Recognition And Search 37

SIFT Detector DoG Detector Detecting Corners Selecting Scales Finding Affine Covariance Detected region: (red circle) Figure courtesy to Matas and Mikolajczyk Visual Recognition And Search 38

SIFT Detector Difference of Gaussians Visual Recognition And Search 39

SIFT Detector Sampling Frequency in Scale More points are found as sampling frequency increases, but accuracy of matching decreases after 3 scales/octave Visual Recognition And Search 40

SIFT Detector Resample Blur Subtract DoG and Scale Space Theory Scale space representation is a family of the convolution of the given signal with Gaussian kernel = 0 = 1 = 4 = 16 Visual Recognition And Search 41

SIFT Detector DoG and Scale Space Theory (2) Let t =, which represents a "temperature of the intensity values of the image as "temperature distribution" in the image plane Then the scale-space family can be defined as the solution of the diffusion equation where the Gaussian kernel arises as the Green's function of this specific partial differential equation. In scale-space, the corner can be obtained by Laplacian of Gaussian where DoG is an efficient approximation of Laplacian of Gaussian. Visual Recognition And Search 42

SIFT Detector Resample Blur Subtract Accurate Key point localization Detect maxima and minima of difference-of-gaussian in scale space Fit a quadratic to surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002) Taylor expansion around point: Offset of extremum (use finite differences for derivatives): Visual Recognition And Search 43

SIFT Detector Resample Blur Subtract Eliminate Unstable Detections To get stable detection, David Lowe rejects 1. Keypoints with low contrast 2. Keypoints with strong edge but not clear corners. For 1, we filter out keypoints with small For 2, those keypoints correspond to large principle curvature along the edge but small one in the perpendicular direction. To efficient compute the curvature ratio, we need only compute the trace and determinant of local hessian. Visual Recognition And Search 44

SIFT Detector Example of SIFT Keypoint Detection (a) 233x189 image (b) 832 DOG extrema (c) 729 left after peak value threshold (d) 536 left after testing ratio of principle curvatures Visual Recognition And Search 45

SIFT Detector How to Do Better than DoG? Combine more than one detectors Harris detector Hessian detector Dense-sampling instead of sparse detector Grid based sampling + multiple resolutions More computation, more storage, but better accuracy Visual Recognition And Search 46

SIFT Detector Harris Detector Scale selection: Laplacian of Harris detector Visual Recognition And Search 47

SIFT Detector Hessian Detector Visual Recognition And Search 48

Modern Local Features Speeded Up Robust Features (SURF) Bay, Tuytelaars, and Van Gool, SURF: Speeded Up Robust Features, ECCV 2006 Visual Recognition And Search 49

SURF Motivation SIFT is slow and expensive, especially for large scale and mobile applications Recall: Integral Image, a useful technique to compute Fast box filter (in detection) Fast Haar wavelet (in description) Visual Recognition And Search 50

SURF Integral Image Using integral images for major speed up Integral Image (summed area tables) is an intermediate representation for fast computation Second order derivative and Haar-wavelet response Cost four additions operation only Visual Recognition And Search 51

SURF Fast Detection Approximated second order derivatives with box filters (mean/average filter) Both box filters and following Haar wavelets can be computed with integral images. Visual Recognition And Search 52

SURF Fast Description Approximate gradient orientation with Haar wavelet 1. Split the interest region into 4 x 4 cell with 5 x 5 regularly spaced sample points inside 2. Compute gradient histogram with 4 bins 3. Sum the response over each sub-region for d x and d y separately dim= 64 4. In order to bring in information about the polarity of the intensity changes, extract the sum of absolute value of the responses dim= 128 5. Normalize the vector Visual Recognition And Search 53

SURF Compared with SIFT Visual Recognition And Search 54

SURF Compared with SIFT (con t) SURF is an order or magnitude (3~5 times) faster than SIFT In object recognition tasks, SURF obtains a comparable or sometimes a little worse results than SIFT (esp. with illumination change or viewpoint change). But we can see how traditional techniques (integral images) can help! Visual Recognition And Search 55

Binary Local Feature Descriptors Visual Recognition And Search 56

Binary Local Descriptors DAISY Tested different configurations Steerable filters Summation Normalization Descriptor quantization PCA dimensions Better performance than SIFT with 13 bytes descriptors (vs. 128 bytes for SIFT) Slide courtesy of Michele Merler S. Winder, G. Hua, M. Brown, Picking the Best DAISY, CVPR 09 Visual Recognition And Search 57

Binary Local Descriptors BRIEF Identify a 31x31 neighborhood for each interest point For each interest point, randomly create 256 pairs patches (patch size 9 x 9) For each pair a, b, if a>b, set the bit to 1, else set 0 The descriptor: 256 bit vector Calonder, Lepetit, Strecha, BRIEF: binary robust independent elementary features, ECCV 2010 Visual Recognition And Search 58

Binary Local Descriptors ORB FAST detector Interest points are detected as >= 12 contiguous pixel brighter than P in a ring of radius 3 around P. BRIEF detector ~100x faster than SIFT, good viewpoint invariance Rublee et al, ORB: an efficient alternative to SIFT or SURF, ICCV 11 Visual Recognition And Search 59

Binary Local Descriptors BRISK Combination of SIFT-like scale-space detection and BRIEF-like descriptor Leutenegger, Chli, Siegwart, BRISK: Binary Robust Invariant Scalable Keypoints, ICCV 2011 Visual Recognition And Search 60

Hands-on Experience Most of slides in this section are courtesy to Andrea Vedaldi Visual Recognition And Search 61

Existing softwares VLFeat http://www.vlfeat.org/ OpenCV http://opencv.org/ Amsterdam Color Descriptor http://koen.me/research/colordescriptors/ Oxford VGG http://www.robots.ox.ac.uk/~vgg/research/affine/ Visual Recognition And Search 62

VLFeat SIFT Detector and Descriptor Visual Recognition And Search 64

VLFeat What Does Frame Mean? Visual Recognition And Search 65

VLFeat Other Corner Detectors Visual Recognition And Search 66

VLFeat Other Corner Detectors Visual Recognition And Search 67

VLFeat Dominant Orientation Visual Recognition And Search 68

VLFeat Compute Descriptors Visual Recognition And Search 69

VLFeat Custom Frames Visual Recognition And Search 70

OpenCV List of Options - OpenCV 2.1: Harris, Tomasi&Shi, MSER, DoG, SURF, STAR and FAST, DoG + SIFT, SURF, HARRIS, GoodFeaturesToTrack - OpenCV 2.4: ORB, BRIEF, and FREAK Visual Recognition And Search 71

OpenCV An Example (Extracting SIFT) Courtesy to Unapiedra@stackoverflow Visual Recognition And Search 72

Next class: How to Use SIFT for Object Classification Todo before next class - Form a group and email the instructors your group information - Read the three papers on sparse coding and prepare for the discussion Any volunteer? Reminder: do not forget to drop the class if you don t plan to attend! Visual Recognition And Search 73

Visual Recognition And Search 74

A Demo Photo Tourism Snavely, Seitz and Szeliski, Photo tourism: Exploring photo collections in 3D, SIGGRAPH, 2006 Snavely, Seitz and Szeliski, Modeling the world from Internet photo collections, IJCV 2007 Visual Recognition And Search 75

Visual Recognition And Search 76

Photo Tourism Detect SIFT Visual Recognition And Search 77

Photo Tourism Detect SIFT Visual Recognition And Search 78

Photo Tourism Match SIFT Visual Recognition And Search 79

Photo Tourism From Correspondence To Geometry Visual Recognition And Search 80

Do You Want Another Video? Visual Recognition And Search 81

Build Rome in A Day Agarwal et al, Building Rome in a Day, ICCV, 2009 Visual Recognition And Search 82

Visual Recognition And Search 83