Image Analysis & Retrieval CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W 4-5:15pm @ Bloch 0012 Lec 18 Image Hashing Zhu Li Dept of CSEE, UMKC Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346. http://l.web.umkc.edu/lizhu
Outline Recap Lec 17: Sparse Signal Recovery L1 norm and L1 Magic Solution Application in occluded face recognition Application in super resolution Media Data Hashing LSH Spectral Hashing Grassmann Hashing Summary
Sparse Signal Recovery Sparse Signal Processing If a signal is sparse in some (possibly unknown) domain, then from random measurements we can reliably recover the signal via L1 minimization. L1Magic: $\min_x \|x\|_1, \ \text{s.t. } y = Ax$
Sparse Signal Recovery - L1Magic
% example setup (illustrative, not on the original slide): k-sparse signal, random measurements
n = 512; m = 120; k = 20;
x = zeros(n,1); x(randperm(n,k)) = randn(k,1);   % k-sparse signal
A = orth(randn(n,m))';                           % m x n measurement matrix with orthonormal rows
% observations
y = A*x;
% initial guess = min energy
x0 = A'*y;
% solve with primal-dual method
xp = l1eq_pd(x0, A, [], y, 1e-3);
subplot(3,1,3); plot(xp); title('x(t) recovered by L1 magic');
Sparsity in Face Models Assume y belongs to class i; then $y \approx A_i \alpha_i$, or, stacking the training samples of all classes into one dictionary, $y = A x$ with $A = [A_1, A_2, \ldots, A_K]$ and $x = [0, \ldots, 0, \alpha_i^T, 0, \ldots, 0]^T$, where only a small number of coefficients in x have non-zero entries, thus x is sparse.
Illustration of Recovery from Sparsity Assume y belongs to class 1; then most coefficients related to the other classes are zero, and only a small number of non-zero coefficients appear in $\alpha_1$.
Coupled Dictionary Learning Pre-train a common set of coupled low- and high-resolution dictionaries. Super-resolve by solving an L1 minimization on the low-resolution patch, and reuse the same coefficients to super-resolve the high-resolution patch (see the sketch below).
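A minimal sketch of the super-resolution step, assuming pre-trained coupled dictionaries Dl (low-resolution) and Dh (high-resolution) with one atom per column, a vectorized low-resolution patch yl, and the l1eq_pd solver from L1-Magic on the path; patch extraction and overlap blending are omitted, and all variable names are illustrative.

% Super-resolve one patch with coupled dictionaries (sketch).
% Dl: dl x K low-res dictionary, Dh: dh x K high-res dictionary, yl: dl x 1 patch.
alpha0 = Dl' * yl;                          % min-energy initial guess
alpha  = l1eq_pd(alpha0, Dl, [], yl, 1e-3); % min ||alpha||_1  s.t.  yl = Dl*alpha
xh     = Dh * alpha;                        % reuse the same sparse coefficients
xh     = reshape(xh, hi_sz, hi_sz);         % back to a high-res patch (hi_sz assumed)

Because the dictionaries were trained to share sparse coefficients, the code alpha recovered from the low-resolution patch is carried over directly to the high-resolution dictionary.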
Dictionary Training Training data: low- and high-resolution image patches $Y_l = \{y_k\}$, $X_h = \{x_k\}$: enforce common sparse coefficients across the two dictionaries.
Results: 3x super-resolution (figure) comparing the low-resolution input, bicubic interpolation, neighbor embedding [Chang, CVPR 04], the coupled dictionary method, and the original image.
Outline Recap Lec 17: Sparse Signal Recovery L1 norm and L1 Magic Solution Application in occluded face recognition Application in super resolution Media Data Hashing LSH & Spectral Hashing Grassmann Hashing Complementary Hashing Summary
Media Data Hashing Use Case Internet-scale image retrieval: the Internet contains billions of images to search. Challenges: Scale: a very large repository requires a compact representation. Speed: a hash offers fast binary operations. Accuracy: the hash needs to preserve the desired similarity in Hamming distance.
Media Data Hashing Recall MPEG CDVS Scalable Fisher Vector: aggregate N SIFT features against a GMM, then binarize the aggregated representation to obtain the hash. Objective: find an image feature and a feature aggregation/projection, then binarize the representation to generate a hash such that the pair-wise relationship is preserved by the Hamming distance of the hash.
Tree Based Hash Kd-Tree Hash: a data-partition solution; iteratively split the data along the dimensions so that each leaf node has an equal number of data points; assign hash bits 1/0 while traversing down the kd-tree. Octree/Quadtree Hash: a space-partition solution; iteratively split the space into $2^d$ equal-size pieces; each node is addressed by a byte code, resulting in a prefix hash (see the sketch below).
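A minimal sketch of the space-partition prefix hash, assuming points normalized to the unit hypercube; at each level every dimension contributes one bit (2^d children per node), so the bit string is a prefix code for the cell containing x. The function name and interface are illustrative.

function bits = prefix_hash(x, L)
% Quad/octree-style prefix hash for a point x in [0,1)^d, L levels deep.
% Each level appends d bits, one per dimension (which half of the current cell).
d = numel(x);
lo = zeros(1, d);                    % current cell lower corner
hi = ones(1, d);                     % current cell upper corner
bits = false(1, L * d);
for lev = 1:L
    mid = (lo + hi) / 2;
    b = x(:)' >= mid;                % which child along each dimension
    bits((lev-1)*d + (1:d)) = b;
    lo(b)  = mid(b);                 % shrink the cell to the chosen child
    hi(~b) = mid(~b);
end
end

For example, prefix_hash([0.7 0.2], 3) returns 6 bits, two per quadtree level; points that share a long bit prefix fall in the same small cell.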
Curse of Dimensionality When the data dimension is large, say > 20, tree-based solutions break down and degenerate to linear search with O(N) complexity.
Nearest Neighbor Search Definitions Nearest Neighbor (NN): given a query q, return the point p in the dataset that minimizes d(q, p). r-NN (range query): return the points p with d(q, p) <= r. Credit: P. Indyk, Approx NN search in High Dimensional Space, http://www.mit.edu/~andoni/lsh/
Approx. NN Search Definition: c-approximate r-near neighbor: if there exists a point p* with d(q, p*) <= r, return some point p with d(q, p) <= c r.
Motivation for LSH If p and q are close, then the projections Ap and Aq must also be close; the converse does not hold.
Locality Sensitivity Definition: a hash family H is (p_1, p_2, r, cr)-sensitive if for any points p, q: if d(p, q) <= r then Pr[h(p) = h(q)] >= p_1, and if d(p, q) >= cr then Pr[h(p) = h(q)] <= p_2, with p_1 > p_2.
LSH Locality Sensitive Hashing Basic Idea: Reduce images to some features {x_k} in R^d, where d is usually large (e.g., SCFV: d = 32 x 128). Select random projections y = A_j x, where each A_j is 1 x d, and assign a bit 1 or 0 from the sign of the projection. Aggregate the bits produced by all these projections as the hash for the image; no learning is involved (see the sketch below). (Figure: three projections y = A_1 x, A_2 x, A_3 x produce bits 1, 0, 1, i.e., the hash 101.)
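A minimal sketch of sign-based random-projection LSH as just described; the dimensions, the number of bits, and the variable names are illustrative.

% Random-projection (sign) LSH: no learning, just k random hyperplanes.
d = 1024;                           % feature dimension (illustrative)
k = 32;                             % number of hash bits
A = randn(k, d);                    % k random projection directions

lsh = @(X) (X * A') > 0;            % X: n x d features -> n x k logical hash bits

X = randn(1000, d);                 % some feature vectors (illustrative)
H = lsh(X);                         % n x k binary codes
% Hamming distance from the first code to all codes:
hamm = sum(bsxfun(@xor, H, H(1, :)), 2);

Candidates for a query are the database items whose codes collide with (or lie within a small Hamming radius of) the query's code, so exact distances are computed only on a short list.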
LSH Analysis Intuition: If two points p, q are close, they hash to the same bucket with probability p_1; if they are far apart, they hash to the same bucket with probability p_2 < p_1. For binary codes, $\Pr[h(p) = h(q)] = (1 - d(p,q)/D)^k$, where D is the number of dimensions in the binary representation and k is the size of the subset of hash bits used for a bucket. We can vary the probability by changing k: adding more hash bits means gathering more evidence. (Figure: collision probability vs. distance for k = 1 and k = 2.)
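As a worked example of the formula above: with D = 64 binary dimensions, codes differing in d(p,q) = 8 positions, and a bucket keyed on k = 4 randomly chosen bits, $\Pr[h(p)=h(q)] = (1 - 8/64)^4 = 0.875^4 \approx 0.59$; raising k to 8 drops the collision probability to about 0.34, so more distant points are filtered out more aggressively as k grows.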
Indyk's LSH Results Color histogram dataset from Corel Draw: 20,000 images, 64 dimensions. Used 1k, 2k, 5k, 10k, and 19k points for training; 1k points were used for queries. Computed the miss ratio: the fraction of queries with no hits.
Grassmann Hashing Main Motivation Allow multiple low-dimensional projections, generating multiple bits per projection. Penalize the subspaces already selected, avoiding similar bits that waste the hash bit budget. GRASH: GRASH introduces the Grassmann metric to measure the similarity between different hashing subspaces, so the hashing functions can better capture the data diversity. GRASH incorporates discriminant information into the hashing functions; GRASH extends the original LSH's 1-D hashing subspaces to m-D; GRASH applies non-uniform bucket sizes to generate hashing codes, so the distortion can be minimized.
GRASH Discriminative Projection via Learning (can be LDA/LPP) Do FLDA and get the first d Fisher faces, $W = \arg\max_{W} \frac{|W^T S_B W|}{|W^T S_W W|}$. Hash Projection Candidates: find Hashing Subspace Candidates (HSC) by traversing the combinations of m Fisher faces out of the d, where m is the number of hashing dimensions. Record the discriminant energy of each derived HSC $W_i = [w_{i_1}, w_{i_2}, \ldots, w_{i_m}]$, defined as the Fisher criterion on that subspace, $E_i = \frac{|W_i^T S_B W_i|}{|W_i^T S_W W_i|}$.
GRASH Penalizing similar subspaces already chosen: select the optimal k hashing functions by trading off discriminant energy against the Grassmann distance to the subspaces already selected, i.e., pick $\arg\max_i \; (1-\mu)\, E_i + \mu \sum_{j \in U} d_{\mathrm{Arc}}^2(i, j)$, where U is the set of already-selected hashing subspaces and $d_{\mathrm{Arc}}$ is the arc-length Grassmann distance (see the sketch below).
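A minimal sketch of the arc-length Grassmann distance used in the penalty term, assuming each hashing subspace is given by a d x m basis matrix; the principal angles between the two subspaces come from the singular values of the product of orthonormal bases. Function and variable names are illustrative.

function dist = grassmann_arc(W1, W2)
% Arc-length Grassmann distance between span(W1) and span(W2).
% W1, W2: d x m basis matrices (need not be orthonormal).
Q1 = orth(W1);
Q2 = orth(W2);
s = svd(Q1' * Q2);              % cosines of the principal angles
s = min(max(s, -1), 1);         % guard against round-off
theta = acos(s);                % principal angles
dist = norm(theta);             % arc length = sqrt(sum(theta.^2))
end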
LSH vs GRASH
GRASH Bucket Design Non-uniform bucket design for hashing codes: apply the Lloyd-Max algorithm to minimize the quantization distortion $D = E\left[\|x - \hat{x}\|^2\right]$ (a sketch follows).
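A minimal sketch of a 1-D Lloyd-Max quantizer for designing the bucket boundaries, assuming the scalar projections of the training data are collected in a vector v and nbkt buckets are wanted; all names are illustrative.

function [edges, levels] = lloyd_max_1d(v, nbkt, niter)
% 1-D Lloyd-Max quantizer: alternately update bucket edges and representative
% levels to reduce the distortion D = E[(v - vhat)^2].
v = v(:);
vs = sort(v);
idx = max(1, round(((1:nbkt) - 0.5) / nbkt * numel(v)));
levels = vs(idx); levels = levels(:)';            % initial representative levels
for it = 1:niter
    edges = [-inf, (levels(1:end-1) + levels(2:end)) / 2, inf];  % nearest-level boundaries
    for b = 1:nbkt                                % centroid of each bucket
        in_b = v > edges(b) & v <= edges(b+1);
        if any(in_b), levels(b) = mean(v(in_b)); end
    end
end
end

A new projection value val is then mapped to bucket index sum(val > edges(1:end-1)), so denser regions of the projection get narrower buckets.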
Experiments Datasets: a large human face dataset combining YALE, ESSEX, ORL, etc., 6,680 faces of 417 individuals; the MSRA-MM dataset, around 10,000 images from 10 classes, each image with an 899-D feature (e.g., RGB histogram and wavelet texture features). Performance evaluation: the intersection rate, defined as $I = \frac{1}{|Q|} \sum_{q \in Q} \frac{|U_{q,\mathrm{GRASH}} \cap U_{q,*}|}{|U_{q,*}|}$, where $U_{q,\mathrm{GRASH}}$ is the neighbor set retrieved via the hash and $U_{q,*}$ is the true nearest-neighbor set of query q.
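A minimal sketch of the intersection-rate metric as reconstructed above, assuming retrieved{q} and true_nn{q} are cell arrays holding the hash-retrieved and the exact nearest-neighbor index lists for each query; all names are illustrative.

% Intersection rate: fraction of true neighbors recovered, averaged over queries.
nQ = numel(true_nn);
rate = 0;
for q = 1:nQ
    hit = numel(intersect(retrieved{q}, true_nn{q}));
    rate = rate + hit / numel(true_nn{q});
end
rate = rate / nQ;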
Experiments Face Hash Face dataset (figure): intersection rate vs. μ (number of hashing functions = 20, 8-NN).
Experiments MSRA Data Set:
Method       4-NNS   8-NNS   16-NNS  32-NNS
LSH-1bit     23.9%   28.8%   33.6%   35.1%
LSH-2bit     31.5%   34.6%   39.3%   39.8%
LSH-4bit     40.6%   45.7%   51.2%   55.1%
GRASH-1bit   39.3%   42.4%   49.7%   53.2%
GRASH-2bit   52.8%   55.8%   68.3%   72.3%
GRASH-4bit   63.9%   69.7%   73.6%   80.3%
MSRA-MM dataset: Experiments
Spectral Hashing To simplify the problem, first assume that the items have already been embedded in a Euclidean space; try to embed the data into a Hamming space. A Hamming space is a binary space, e.g., 010101001. (Fergus et al.)
Some definitions Let $\{y_i\}_{i=1}^{n}$ be the list of code words (binary vectors of length k) for the n data points. Affinity map: $W_{i,j} = \exp(-\|x_i - x_j\|^2 / h^2)$ is the affinity matrix characterizing the similarities between data points.
Objective function: minimize the average Hamming distance between similar points, $\min \sum_{i,j} W_{i,j} \|y_i - y_j\|^2$, subject to $y_i \in \{-1, 1\}^k$, $\sum_i y_i = 0$, and $\frac{1}{n}\sum_i y_i y_i^T = I$. What does this objective function mean? The generated hash $\{y_i\}$ has balanced 1/0 bits, and $W_{i,j}$ enforces that similar data points stay close in the Hamming distance of the $y_i$.
Objective of Spectral Hashing Spectral Hashing explained: minimize the average Hamming distance between similar neighbors in the Euclidean space; the code is binary; each bit has a 50% chance of being 0 or 1; and the bits are uncorrelated (the constraints bounding the objective).
Spectral Relaxation Dropping the binary constraint, we obtain an easy problem whose solutions are simply the k eigenvectors of $D - W$ with minimal (non-trivial) eigenvalues. Observation: this is similar to spectral graph partitioning, and can be solved as a (generalized) eigenvalue problem on the graph Laplacian (see the sketch below).
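A minimal sketch of the relaxed problem on a small training set, assuming data X (n x d) and a kernel width h; it builds the affinity W, forms the Laplacian L = D - W, keeps the k eigenvectors with the smallest non-trivial eigenvalues, and thresholds them at zero. This only codes the training items; the out-of-sample extension is discussed next. All names are illustrative.

% Spectral relaxation on the training set (illustrative, small n only).
n = size(X, 1);
k = 8;                                         % number of hash bits
sq = sum(X.^2, 2);
D2 = bsxfun(@plus, sq, sq') - 2 * (X * X');    % squared Euclidean distances
W  = exp(-D2 / h^2);                           % affinity matrix
Dg = diag(sum(W, 2));                          % degree matrix
L  = Dg - W;                                   % graph Laplacian
L  = (L + L') / 2;                             % symmetrize against round-off
[V, E] = eig(L);
[~, order] = sort(diag(E), 'ascend');
V = V(:, order(2:k+1));                        % skip the trivial constant eigenvector
Y = V > 0;                                     % threshold at 0: n x k binary codes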
New Sample After Embedding Problem The relaxation only tells us how to compute the code representation of items in the training set. What about the testing set, e.g., a new query image? Computing the code for unseen points is called the out-of-sample extension. (Figure: a new point v_9 arrives; what should its hash be?)
New Sample Hash Assignment We need a function that maps new points into the code space. Take the limit of the eigenvectors as n goes to infinity (the graph Laplacian must be normalized carefully); an analytical form of the eigenfunctions exists for certain data distributions (uniform, Gaussian), giving constant-time code computation for a new point. (Figure: for a uniform distribution, the eigenfunctions and their 1/0 threshold assignment.)
The Algorithm Input: data $\{x_i\}$ of dimensionality d; desired number of bits k.
1. Fit a multidimensional rectangle: run PCA to align the axes, then bound the data with a uniform distribution over the resulting box.
2. Calculate the analytical one-dimensional eigenfunctions along each PCA direction of the fitted rectangle.
3. Pick the k eigenfunctions with the smallest eigenvalues (e.g., k = 3).
4. Threshold the chosen eigenfunctions at zero to obtain the binary code (a sketch of steps 1-4 follows).
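A minimal sketch of steps 1-4 plus the out-of-sample code assignment, assuming training data X (n x d), query points Q (m x d), and a uniform-distribution eigenfunction of the form $\Phi_m(x) = \sin(\pi/2 + m\pi (x-a)/(b-a))$ on each PCA axis, with eigenvalue growing with $(m\pi/(b-a))^2$ so the lowest-frequency modes are kept. All names and the exact mode-selection details are illustrative.

% 1. PCA-align the data and fit an axis-aligned box.
nbits = 16;
mu = mean(X, 1);
Xc = bsxfun(@minus, X, mu);
[U, ~, ~] = svd(Xc' * Xc);
npca = min(nbits, size(X, 2));
U  = U(:, 1:npca);
Xp = Xc * U;
a = min(Xp, [], 1);  b = max(Xp, [], 1);  r = b - a;

% 2./3. Enumerate (dimension j, mode m) eigenfunction candidates and keep the
%       nbits with the smallest frequency (m*pi/r_j)^2.
[J, M] = meshgrid(1:npca, 1:nbits+1);
rj  = r(J(:)); rj = rj(:);
lam = (M(:) * pi ./ rj) .^ 2;
[~, order] = sort(lam, 'ascend');
sel   = order(1:nbits);
dims  = J(sel);  modes = M(sel);

% 4. Hash new points Q by evaluating the chosen eigenfunctions and thresholding at 0.
Qp = bsxfun(@minus, Q, mu) * U;
H  = false(size(Q, 1), nbits);
for t = 1:nbits
    j = dims(t);  m = modes(t);
    phi = sin(pi/2 + m * pi * (Qp(:, j) - a(j)) / r(j));   % analytical eigenfunction
    H(:, t) = phi > 0;
end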
Back to the 2-D toy example: hashing the new data points with 3, 7, and 15 bits. (Figure legend: distance of 0 bits in red, 1 bit in green, 2 bits in blue.)
2-D uniform toy example: comparison (Fergus et al.)
Some results on the LabelMe dataset Observation: spectral hashing gets the best performance.
Summary Image hashing is a very useful technique in large-scale image retrieval. Locality Sensitive Hashing: random projections generate the hash bits; a sufficient number of projections preserves similarity in Hamming distance, since closeness of d(p, q) is preserved under projection; it is not very efficient, though (see Complementary Hashing). Grassmann Hashing: allows flexible multi-dimensional projections and bucket design, penalizing similar projections with the Grassmann metric. Spectral Hashing: uses graph Laplacian eigenfunctions to generate hash bits, which amounts to a spectral partitioning (segmentation) assignment.