CS/EE 5590 / ENG 401 Special Topics, Spring 2018 - Image Analysis & Retrieval
Lec 12: Mid-Term Review
Zhu Li, Dept of CSEE, UMKC, http://l.web.umkc.edu/lizhu
Office hour: Tue/Thu 2:30-4pm @ FH560E. Contact: lizhu@umkc.edu, Ph: x2346.
Created using WPS Office and the EqualX LaTeX equation editor.
p.1
Outline
- HW-2
- Quiz-1
- Review Part I: Image Features & Classifiers
  - Image Formation: Homography
  - Color Space & Color Features
  - Filtering and Edge Features
  - Harris Detector
  - SIFT
  - Aggregation: BoW, VLAD, Fisher Vector, and Supervector
  - Image Retrieval System Metrics: TPR, FPR, Precision, Recall, mAP
  - Classification: kNN, GMM Bayesian, SVM
p.2
Homework-2: Aggregation
- Data set: CDVS data set
- Global codebook for VLAD and Fisher Vector:
  - Compute SIFT and DenseSIFT from each image, and stack them into two arrays: hw2_sift, hw2_dsift
  - Randomly sample a subset of them: use randperm()
  - Compute PCA:
    indx = randperm(n_sift); indx = indx(1:20*2^10);  % sample 20k descriptors
    [A0, s, eigv] = princomp(hw2_sift(indx, :));
  - Project SIFT for dimension reduction:
    x = sift * A0(:, 1:kd);
  - Compute K-means and GMM models for VLAD and FV aggregation (VLFeat):
    vl_kmeans(), vl_gmm()
p.3
Fisher Vector Aggregation
- FV encoding: gradient w.r.t. the mean, for GMM component k, j = 1..d
- In the end we have a 2*K*D-dimensional aggregation from the derivatives w.r.t. the means and variances
- Normalize by the Fisher kernel
p.4
Quiz-1
- Covers what we reviewed today
- Format:
  - True or False
  - Multiple Choice
  - Problem Solving
  - Extra Credit (25%)
- Closed book; one A4 cheat sheet allowed.
  - The cheat sheet must be turned in with your name and ID number
  - The cheat sheet will also be graded, for up to 10 pts
- Relax :) The quiz is actually on me.
p.5
Best Cheat Sheet Award from 2017 :)
- The cheat sheet is also graded, up to 10 pts extra
- A nice way to summarize, organize, and index your knowledge base
- Cheat sheet + HW = what you really get from the class
p.6
Classification in Image Retrieval
- An image retrieval pipeline (hand-crafted features):
  Image Formation -> Feature Computing -> Feature Aggregation -> Classification, backed by a Knowledge/Data Base
  - Image Formation: homography, color space
  - Feature Computing: color histogram; filtering, edge detection; HoG, Harris detector, SIFT
  - Feature Aggregation: BoW, VLAD, Fisher Vector, Supervector
p.8
Image Formation - Geometry
Perspective projection with intrinsic matrix K:
\[
\mathbf{x} = K\,[\,I \;|\; \mathbf{0}\,]\,\mathbf{X}, \qquad
\begin{bmatrix} u \\ v \\ w \end{bmatrix} =
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
\]
Intrinsic assumptions: unit aspect ratio, optical center at (0,0), no skew of pixels.
Extrinsic assumptions: no rotation, camera at (0,0,0).
Slide credit: GaTech Computer Vision class, 2015.
p.9
Image Formation: Homography
- In general, a homography H maps 2D points (in homogeneous coordinates) according to x' = Hx
  - Defined up to a scale, as [x, y, w] ~ [sx, sy, sw], so H has 8 DoF
- Affine transform, 6 DoF: contains a translation [t1, t2] and an invertible 2x2 affine matrix A
- Similarity transform, 4 DoF: a rigid transform that preserves distance when s = 1
p.10
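A minimal sketch of how a homography acts on points (the helper name `apply_homography` is hypothetical, not from the lecture code): map through H in homogeneous coordinates, then divide by w. Scaling H by any s leaves the mapped point unchanged, which is why H has only 8 DoF.

```python
def apply_homography(H, pt):
    """Map a 2-D point through a 3x3 homography H (row-major nested lists):
    [u, v, w] = H [x, y, 1], then normalize by w."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)  # points are defined only up to the scale w

# translation by (2, 3) written as a homography
H_t = [[1.0, 0.0, 2.0],
       [0.0, 1.0, 3.0],
       [0.0, 0.0, 1.0]]
```

Note that `2 * H_t` (every entry doubled) maps every point to the same place as `H_t`, illustrating the up-to-scale property.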
VL_FEAT Implementation
- VLFeat provides an implementation of homography estimation.
p.11
Image Formation - Color Space
- RGB color space: 8-bit Red, Green, and Blue channels, mixed together
- YUV/YCbCr color space
- HSV color space: Hue (color angle in polar coordinates), Saturation, Value
p.12
Color Histogram - Fixed Codebook
- Color histogram: fixed codebook to generate the histogram
- Example: MPEG-7 Scalable Color Descriptor
  - HSV color space, 16x4x4 partition for bins
  - Scalability through quantization and the Haar transform
p.13
Color Histogram - Adaptive Codebook
- Adaptive color histogram: no fixed codebook
- Example: MPEG-7 Dominant Color Descriptor
  - Has its own distance metric
  - Not index-able
p.14
Image Filters
- Basic filter operation: convolution with a kernel, e.g. a 3x3 averaging (box) filter with all entries 1/9
- Region of support
p.15
Filter Properties
Linear filters are:
- Shift-invariant: if f*h = g, then f(m-k, n-j)*h = g(m-k, n-j)
- Associative: (f*h1)*h2 = f*(h1*h2) - this can save a lot of computation
- Distributive: f*h1 + f*h2 = f*(h1+h2) - useful in SIFT's DoG filtering
p.16
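The distributive property above can be checked numerically with a tiny 1-D convolution (the helper `conv1d` is a hypothetical illustration, not lecture code): filtering with h1 and h2 separately and summing gives the same output as filtering once with h1 + h2.

```python
def conv1d(f, h):
    """Full 1-D convolution of sequences f and h."""
    out = [0.0] * (len(f) + len(h) - 1)
    for i, fv in enumerate(f):
        for j, hv in enumerate(h):
            out[i + j] += fv * hv
    return out

f  = [1.0, 2.0, 3.0, 4.0]   # toy signal
h1 = [0.5, 0.5]             # small smoothing kernel
h2 = [1.0, -1.0]            # difference kernel

# f*h1 + f*h2  vs  f*(h1 + h2)
lhs = [a + b for a, b in zip(conv1d(f, h1), conv1d(f, h2))]
rhs = conv1d(f, [a + b for a, b in zip(h1, h2)])
```

The same identity is what lets SIFT compute a DoG response as one filtering pass instead of two.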
Edge Detection
- The need for smoothing: Gaussian smoothing
- Combine differentiation and Gaussian smoothing: filter with the derivative of Gaussian
p.17
Image Gradient from DoG Filtering
- Image gradient (at a scale):
  - The gradient points in the direction of most rapid increase in intensity
  - The edge strength is given by the gradient magnitude
  - The gradient direction is given by the angle of the gradient vector
p.18
HOG - Histogram of Oriented Gradients
- Generate a local histogram per cell; with 2x2-cell blocks and 9 orientation bins, the HoG size is n*m*4*9 = 36nm for n x m cells
- Each pixel votes into the orientation bin containing its gradient angle, either by binary voting (counting) or by gradient-magnitude voting
[Figure: per-cell angle and magnitude maps, and the resulting 9-bin (20-degree) orientation histograms under binary vs. magnitude voting]
p.19
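The per-cell voting step can be sketched as follows (the helper `hog_cell_hist` and the toy angle/magnitude values are hypothetical, chosen only to illustrate the binning): each pixel drops its gradient magnitude into the 20-degree bin containing its orientation.

```python
def hog_cell_hist(angles, mags, n_bins=9):
    """Orientation histogram for one HOG cell: each pixel votes its
    gradient magnitude into the 20-degree bin containing its angle
    (angles taken modulo 180, unsigned gradients)."""
    bin_width = 180 // n_bins  # 20 degrees per bin
    hist = [0.0] * n_bins
    for a, m in zip(angles, mags):
        hist[int(a % 180) // bin_width] += m
    return hist

# hypothetical cell: gradient angles in degrees and matching magnitudes
angles = [10, 25, 95, 100, 170]
mags   = [2.0, 1.0, 3.0, 1.0, 0.5]
```

For binary voting, replace the magnitude `m` with `1` in the accumulation.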
Harris Detector
Smoothed second-moment matrix:
\[
M(\sigma_I, \sigma_D) = g(\sigma_I) *
\begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}
\]
1. Image derivatives I_x, I_y (optionally blur first)
2. Squares of derivatives: I_x^2, I_y^2, I_x I_y
3. Gaussian filter g(\sigma_I): g(I_x^2), g(I_y^2), g(I_x I_y)
4. Harris response function - detects corners without computing the eigenvalues, using det M = \lambda_1 \lambda_2 and trace M = \lambda_1 + \lambda_2:
\[
R = \det M - \alpha\,(\operatorname{trace} M)^2
  = g(I_x^2)\,g(I_y^2) - [\,g(I_x I_y)\,]^2 - \alpha\,[\,g(I_x^2) + g(I_y^2)\,]^2
\]
p.20
Harris Detector Steps
- Response function R(x,y) -> threshold -> local maxima
- The Harris detector is NOT scale invariant
p.21
Scale Space Theory - Lindeberg
Laplacian of Gaussian:
\[
\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}
\]
The LoG response across scales identifies an image structure's characteristic scale.
p.22
SIFT Detector via Difference of Gaussians
- However, LoG computation is expensive: the filter is non-separable
- SIFT uses Difference of Gaussian (DoG) filters to approximate the LoG
  - Gaussian filters are separable, so this is much faster
  - A box-filter approximation of the LoG you will see later
- Scale space construction by Gaussian filtering and image differencing
p.23
Peak Strength & Edge Removal
- Peak strength: interpolate the true DoG response and pixel location by Taylor expansion
- Edge removal: re-do a Harris-type detection to remove edge responses, on a much reduced pixel set
p.24
Rotation Invariance through Dominant Orientation Coding
- Voting for the dominant orientation
  - Weighted by a Gaussian window, to give more emphasis to gradients closer to the center
p.25
SIFT Description
- Rotate to the dominant direction, divide the image patch (16x16 pixels) around the SIFT point into 4x4 cells
- For each cell, compute an 8-bin edge-orientation histogram over angles 0 to 2*pi
- Overall, we have a 4x4x8 = 128-dimensional descriptor
p.26
SIFT Matching and Repeatability Prediction
- SIFT distance ratio test (the "true love test": the nearest match must be much closer than the second-nearest)
- Not all SIFTs are created equal:
  - Peak strength (DoG response at the interpolated position)
  - Combined scale/peak-strength pmf
p.27
Why Aggregation?
- Curse of dimensionality
- Decision boundary / indexing
p.29
Histogram Aggregation: Bag-of-Words
[Figure: n local descriptors hard-assigned to a k-word codebook, producing a k-bin histogram]
p.30
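A minimal BoW sketch (the helper `bow_histogram` and the toy 2-D codebook are hypothetical; real codebooks come from K-means over SIFT descriptors): hard-assign each descriptor to its nearest codeword, count, and normalize.

```python
def bow_histogram(descriptors, codebook):
    """Bag-of-Words: hard-assign each local descriptor to its nearest
    codeword (squared Euclidean distance), count assignments, and
    normalize the histogram to sum to 1."""
    hist = [0] * len(codebook)
    for d in descriptors:
        dists = [sum((di - ci) ** 2 for di, ci in zip(d, c)) for c in codebook]
        hist[dists.index(min(dists))] += 1
    total = sum(hist)
    return [h / total for h in hist]

codebook = [(0.0, 0.0), (1.0, 1.0)]   # toy 2-word codebook in 2-D
descs = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.2, -0.1)]
```

The kernel-codebook variant on the next slide replaces the hard argmin with soft weights from a kernel on the descriptor-to-codeword distance.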
Histogram Aggregation: Kernel Code Book Soft Encoding p.31
VLAD - Vector of Locally Aggregated Descriptors
1. Assign each descriptor x to its nearest codeword mu_i (mu_1 ... mu_5)
2. Compute the residual x - mu_i
3. v_i = sum of residuals x - mu_i for cell i; concatenate v_1 ... v_5
p.32
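The three steps above can be sketched in a few lines (the helper `vlad` and the toy centroids are hypothetical; VLFeat's `vl_vlad` is the implementation used in HW-2):

```python
def vlad(descriptors, centroids):
    """VLAD: hard-assign each descriptor to its nearest centroid,
    sum the residuals (x - mu_i) per centroid, then concatenate
    and L2-normalize the K*D-dimensional result."""
    D = len(centroids[0])
    v = [[0.0] * D for _ in centroids]
    for x in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
        i = dists.index(min(dists))
        for j in range(D):
            v[i][j] += x[j] - centroids[i][j]
    flat = [e for row in v for e in row]
    norm = sum(e * e for e in flat) ** 0.5 or 1.0
    return [e / norm for e in flat]

# toy example: K=2 centroids in 2-D -> a 2*2 = 4-dimensional VLAD
centroids = [(0.0, 0.0), (2.0, 2.0)]
vlad_descs = [(0.5, 0.0), (2.0, 2.5)]
```

Unlike BoW, VLAD keeps first-order residual information, which is why a small codebook (K around 64-256) already works well.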
Fisher Vector Aggregation
- FV encoding: gradient w.r.t. the mean, for GMM component k, j = 1..d
- In the end we have a 2*K*D-dimensional aggregation from the derivatives w.r.t. the means and variances
- Normalize by the Fisher kernel
p.33
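Written out (in the standard Perronnin-style formulation these slides follow; here \(\gamma_k(x_i)\) is the posterior of component \(k\) for descriptor \(x_i\), \(w_k\) the mixture weight, and \(N\) the number of descriptors), the two gradient blocks are:

```latex
\mathcal{G}^X_{\mu_k}(j) = \frac{1}{N\sqrt{w_k}}
  \sum_{i=1}^{N} \gamma_k(x_i)\,\frac{x_{ij} - \mu_{kj}}{\sigma_{kj}},
\qquad
\mathcal{G}^X_{\sigma_k}(j) = \frac{1}{N\sqrt{2 w_k}}
  \sum_{i=1}^{N} \gamma_k(x_i)\left[\frac{(x_{ij} - \mu_{kj})^2}{\sigma_{kj}^2} - 1\right]
```

Stacking both blocks over k = 1..K components and j = 1..D dimensions gives the 2*K*D-dimensional Fisher Vector.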
Supervector Aggregation p.34
AKULA - Adaptive KLUster Aggregation
- AKULA descriptor: no fixed codebook!
  - Outer loop: subspace optimization
  - Inner loop: cluster centroids + SIFT counts
  - A1 = {yc1_1, yc1_2, ..., yc1_k; pc1_1, pc1_2, ..., pc1_k}
  - A2 = {yc2_1, yc2_2, ..., yc2_k; pc2_1, pc2_2, ..., pc2_k}
- Distance metric: minimum centroid distance, weighted by SIFT count (akin to a manifold tangent distance)
p.35
Image Retrieval Performance Metrics
- True Positive Rate: TPR = TP/(TP + FN) (= Recall)
- False Positive Rate: FPR = FP/(FP + TN)
- Precision = TP/(TP + FP)
- Recall = TP/(TP + FN)
- F-measure = 2*(Precision*Recall)/(Precision + Recall)
p.36
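These definitions are easy to get wrong under quiz pressure; a minimal sketch (the helper `retrieval_metrics` is hypothetical) that computes all five from one confusion matrix:

```python
def retrieval_metrics(tp, fp, tn, fn):
    """Confusion-matrix metrics. Note TPR divides by (TP + FN), the
    positives in the ground truth, while FPR divides by (FP + TN),
    the negatives in the ground truth."""
    tpr = tp / (tp + fn)            # = recall
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    recall = tpr
    f_measure = 2 * precision * recall / (precision + recall)
    return tpr, fpr, precision, recall, f_measure
```

For example, 8 true positives, 2 false positives, 90 true negatives, and 2 false negatives give precision = recall = 0.8, so the F-measure is also 0.8.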
Retrieval Performance: Mean Average Precision
- mAP measures the retrieval performance across all queries
p.37
mAP example
- mAP is computed across all queries as the mean of each query's average precision over its recall levels
p.38
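A minimal sketch of the computation (helper names `average_precision` / `mean_average_precision` are hypothetical; input is each query's ranked result list as 0/1 relevance flags):

```python
def average_precision(ranked_relevance):
    """AP for one query: the mean of precision@k taken at each rank k
    where a relevant item appears (0/1 relevance flags, ranked order)."""
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(queries):
    """mAP: average the per-query AP over all queries."""
    return sum(average_precision(q) for q in queries) / len(queries)
```

For the ranked list [relevant, miss, relevant, miss], AP = (1/1 + 2/3) / 2 = 5/6; averaging such APs over every query gives the mAP.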
Compute TPR-FPR
- ROC curve (Receiver Operating Characteristic)
- Confusion/distance matrix: d(j,k); ground truth: m(j,k)
p.39
Classification in Image Retrieval
- A typical image retrieval pipeline:
  Image Formation -> Feature Computing -> Feature Aggregation -> Classification, backed by a Knowledge/Data Base
p.40
kNN Classifier
- kNN is a nonlinear classifier; operation: majority vote among the k nearest neighbors
- Performance bound: asymptotically, the nearest-neighbor error rate is no worse than twice the Bayes error rate
  - As k increases it improves, but not by much
- Accelerating the NN retrieval operation:
  - kd-tree: a space partition scheme that limits the number of data-point comparison operations
  - kd-tree supported kNN search: approximate kNN search; kd-forest supported approximate search
p.41
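The brute-force version of the majority vote can be sketched in a few lines (the helper `knn_classify` and the toy data are hypothetical; kd-trees only change how the k nearest points are found, not the vote):

```python
from collections import Counter

def knn_classify(query, data, labels, k=3):
    """Brute-force kNN: rank training points by squared Euclidean
    distance to the query, then majority-vote the k nearest labels."""
    ranked = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, pt)), lbl)
        for pt, lbl in zip(data, labels)
    )
    votes = Counter(lbl for _, lbl in ranked[:k])
    return votes.most_common(1)[0][0]

# toy 2-class training set in 2-D
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
train_labels = ['a', 'a', 'a', 'b', 'b']
```

A kd-tree (or kd-forest, for approximate search) replaces the O(n) distance scan in `sorted(...)` with a space-partitioned lookup.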
Gaussian Generative Model Classifier
- The decision boundary is shaped by the relative shapes of the class covariance matrices
- Matlab: [rec_label, err] = classify(q, x, y, method); with method = 'quadratic'
p.42
The Support Vector Solution
- Why is it called SVM? The decision function is determined only by the support vectors, evaluated through the kernel K(x_i, x)
p.43
Support Vector Machine
- Matlab:
  % train svm
  svm = svmtrain(x, y, 'showplot', true);
  % svm classification
  for k = 1:5
      figure(41); hold on; grid on; axis([0 1 0 1]);
      [u, v] = ginput(1);
      q = [u, v];
      plot(u, v, '*b');
      rec = svmclassify(svm, q);
      fprintf('\n recognized as %c', rec);
  end
p.44
Summary
- Image Analysis & Retrieval Part I:
  - Image formation
  - Color and texture features
  - Interest point detection: Harris detector, invariant interest point detection, SIFT
  - Aggregation
  - Classification tools: kNN, GMM, SVM
  - Performance metrics: TP, FP, TN, FN, TPR/Recall, FPR, Precision, mAP
- Part II: the holistic approach - just throw the pixels at the algorithm and let it figure out what the good features are
  - Subspace approaches: Eigenface, Fisherface, Laplacian embedding
  - Deep learning: CNN
  - Hashing
p.45