Hashing with Graphs. Sanjiv Kumar (Google), and Shih Fu Chang (Columbia) June, 2011

Size: px

Start display at page:

Download "Hashing with Graphs. Sanjiv Kumar (Google), and Shih Fu Chang (Columbia) June, 2011"

Erick Robinson
6 years ago
Views:

1 Hashing with Graphs Wei Liu (Columbia Columbia), Jun Wang (IBM IBM), Sanjiv Kumar (Google), and Shih Fu Chang (Columbia) June, 2011

2 Overview Graph Hashing Outline Anchor Graph Hashing Experiments Conclusions 2

3 Fast NN Search In many ML/CV/IR problems, one needsa reliable content based similarity search engine. Thesimplest method is nearest neighbor (NN) search using some similarity measure such as L p, cosine, kernel. Theexhaustivelinear exhaustive search into n data items takes time. KD tree is faster but seriously cursed by dimensionality (c<=d d, d<20 works). 3

4 Hashing Low memoryusage and high search efficiency. Locality Sensitive Hashing (LSH) Indyk et al. (since 1998) Semantic Hashing (RBMs) Salakhutdinov & Hinton (2006,2007) 2007) Spectral Hashing (SH) Weiss et al. (2009) Binary Reconstruction Embedding (BRE) Kulis & Darrell ll(2010) Semi Supervised Hashing (SSH) Wang et al. (2010) randomized hash function long bits/multi tables learned hash function short bits/single table 4

Hashing Using Compact Codes Learning based methods try to learn compact binary codes whose Hamming distance approaches or preserves the given similarity, e.g., L 2, cosine, kernel (local measure).

5 Hashing Using Compact Codes Learning based methods try to learn compact binary codes whose Hamming distance approaches or preserves the given similarity, e.g., L 2, cosine, kernel (local measure). Achieve constant time search (O(1)) by flipping several bits and looking up the hash table. learning hash function buckets NN list 5

6 Our Mission Can load millions of data, eg e.g., images, into memory. Very compact codes! Beatthe the existing unsupervised hashing methods (LSH, PCAH, USPLH, SH, KLSH, SIKH). Exploit global measure on manifolds to learn compact codes. Use graphs! Graphs tend to capture the semantic neighborhoods. Beat L 2 linear scan!

7 Overview Graph Hashing Outline Anchor Graph Hashing Experiments Conclusions 7

8 Graph Hashing (GH) A standard graph based hashing framework [Weiss et al. 09] drop it by spectral relaxation x2 balanced bits uncorrelated bits x1 A12 A: the affinity matrix of the data graph. L: the graph Laplaciandiag(A1) A. A x 1 x 2 x x n data points 8

9 Steps Training: 1) Building a sparse matrix A of an exact neighborhood graph (e.g., k NN graph) onn n d dim data points needs. 2) Computing & binarizing r eigenvectors of the sparse graph Laplacian L needs. Test (hashing): 1) Generalizing r eigenvectors to any unseen point and binarizing needs (interpolations over n entries). Training 1) is intractable for offline training; Test 1) is infeasible for online hashing! 9

10 Spectral Hashing [Weiss et al. 09] SHdirectlyobtainsthe the analyticaleigenfunctions cos(akx+b) of 1D graph Laplacians based on the strong assumptions: i) data dimensions are independent to each other; ii) at each dimension data is uniformly distributed. The 1Dgraph Laplacian is derived from a complete graph. Manifold structure of raw data is almost ignored. Pseudo eigenfunctions & pseudo linear hash functions. 10

11 Toy Data 12k 2D points: blue color denotes the positive eigenvector entries and red color denotes the negative entries. GH smooth yet balanced our approach SH st eigenvector nd eigenvector

12 Overview Graph Hashing Outline Anchor Graph Hashing Experiments Conclusions 12

13 Idea In Training 1), we utilize our proposed large graph Anchor Graph whose space/time complexities are both linear. W. Liu, J. He, and S. F. Chang Large Graph Construction for Scalable Semi Supervised Learning ICML 2010 (code released) 13

14 Anchor Graph [Liu et al. 10] Nonnegative, sparse, and low rank affinity matrices. Approximate the exact neighborhood graph when the number of anchors is sufficiently large. 100 anchor points 10 NN graph Anchor Graph 14

15 Anchors Introduce a few anchor points (K means clustering centers) and do the data to anchor similarity computation (nxm). The data to anchor similarity matrix Obtain the data to data similarity matrix Â using Z. Â18>0 x8 x1 Z16 anchor points Z12 Z11 u1 u6 Inner product? u2 u3 Â14=0 u5 u4 data points x4 15

16 Low Rank Affinity Matrix Kernel based truncated Z design ( ): Note Z1=1 such that Z acts as a mapping anchors data The Anchor Graph affinity matrix Rank<= m, only computing and saving Z in memory; a doubly stochastic matrix, so the graph Laplacian is I Â and has the same eigenvectors as those of Â. 16

17 Anchor Graph Hashing (AGH) Solvethe relaxed GH problem using the Anchor Graph Due to low rank of Â, its r eigenvectors Y = ZW (excluding the trivial 1) can be easily solved via a small eigenvalue decomposition on the mxm matrix. looks like r eigenvectors on m anchors. The r dim Hamming Embeddings are sgn(zw). 17

18 Eigenvector to Eigenfunction Supposeoneeigenvector one eigenvector y = Zw w 18

19 Theorem Given m anchor points U={u j } and any sample x, define a feature map as follows where =1 iff anchor u j is one of s nearest anchors of x in U according to the distance function D( ). ( ) Then the NystrÖm eigenfunction extended from the Anchor Graph Laplacian eigenvector y k = Zw k is interpolation i over m entries 19

20 Steps Training: 1) Building an Anchor Graph (Z) on nd dim data points needs (T isthe iterationnumber number ofk means) K means). 2) Computing & binarizing r eigenvectors of the Anchor Graph Laplacian needs. 3) Build the hash table O(n). Test (hashing): 1) Generalizing r eigenvectors to any unseen point via NystrÖm extension and binarizing needs. 2) Inverse lookup in the hash table O(1). Linear training time & constant hashing time! 20

21 Hierarchical Hashing The higher eigenvec/eigenfunc corresponding to the higher graph Laplacian eigenvalue is of low quality for partitioning. [Shi et al. 00] We propose two layer hashing to revisit the lower (smoother) eigenfuncs to generate multiple hash bits. 1 Eigenvalues of Anchor Graph Laplacian reuse

22 Hierarchical Hashing balanced partitioning 22

23 1 AGH vs. 2 AGH One Layer AGH = Binarizing r eigenfuncs into r hash bits hash functions r bit code Two Layer AGH = Hierarchically thresholding r/2 eigenfuncs into r hash bits r bit code 23

24 Overview Graph Hashing Outline Anchor Graph Hashing Experiments Conclusions 24

25 MNIST 70kdigit imagesfrom 10classes classes, 1000queries uniformly sampled from 10 classes. Compare 8 unsupervised hashing methods, L 2 linear scan, and Spectral Embedding (SE) L 2 linear scan (real valued version of 1 AGH). Measure Hamming radius 2 precision (truly hashing) and Hamming ranking accuracy (brute force search) in terms of true neighbors which take the same class label. 25

26 MNIST Pre ecision with hin Hammin ng radius (a) Precision vs. # bits (b) Precision vs. # anchors much higher LSH AGH>SE L 2 > L 2 PCAH 0.7 drop # bits USPLH SH KLSH SIKH 1-AGH 2AGH 2-AGH top-5000 neighbors Precision of l 2 Scan SE l 2 Scan (48 dims) 1-AGH (48 bits) 2-AGH (48 bits) # anchors (a) Precision within Hamming radius 2 using hash lookup (fix m=300 to run AGH); (b) Hamming ranking precision of top 5k ranked neighbors. 26

27 MNIST Precision (c) Precision curves (1-AGH vs. 2-AGH) l 2 Scan 1-AGH (24 bits) 1-AGH (48 bits) 2-AGH (48 bits) Recall AGH (r) > 1 AGH (r/2) # samples (d) Recall curves (1-AGH vs. 2-AGH) Recall 2 AGH (r) > 1 AGH (r) l 2 Scan 1-AGH (24 bits) AGH (48 bits) 2-AGH (48 bits) # samples x 10 4 (c) Hamming ranking precision curves (fix m=300); (d) Hamming ranking recall curves (fix m=300). 27

28 NUS WIDE within Ham mming radiu us 2 Precision About 270k web images from 81 semantic tags, 2100 query images sampled from 21 frequent tags. True neighbors are those sharing at least one semantic tag. (a) Precision vs. # bits LSH PCAH USPLH SH 0.48 KLSH SIKH AGH 2-AGH 0.47 n of top neighbo ors (b) Precision vs. # anchors much higher 2 AGH>SE L 2 > L 2 drop Precisio l 2 Scan #bit bits # anchors SE l 2 Scan (48 dims) 1-AGH (48 bits) 2-AGH (48 bits)

29 NUS WIDE (a)(c)(d) fix m=300. (b) top 5k hammingranking precision. Pre cision (c) Precision curves (1-AGH vs. 2-AGH) l 2 Scan 1-AGH (24 bits) 1-AGH (48 bits) 2-AGH (48 bits) # samples Re ecall AGH (r) > 1 AGH (r/2) (d) Recall curves (1-AGH vs. 2-AGH) Recall 2 AGH (r) > 1 AGH (r) l 2 Scan AGH (24 bits) 1-AGH (48 bits) 2-AGH (48 bits) # samples x

30 Hamming Ranking Method MAP on MNIST MP/5k on NUS WIDE r=24 r=48 r=24 r=48 supervised L 2 Scan SE L 2 Scan BRE (NIPS 10) SH (NIPS 09) KLSH (ICCV 09) linear SIKH (NIPS 10) USPLH (ICML 10) AGH AGH

31 Overview Graph Hashing Outline Anchor Graph Hashing Experiments Conclusions 31

32 Conclusions Underthe unsupervised scenario, almost all hashing methods are inferior to L 2 linear scan in search accuracy. Our AGH method outperforms L 2 scan in terms of searching semantic neighbors. We have applied Anchor Graphs to achieve scalability in SSL, clustering, and web image search. AGH will have wide applications, i even only the compact code generation scheme. 32

Rongrong Ji (Columbia), Yu Gang Jiang (Fudan), June, 2012

Supervised Hashing with Kernels Wei Liu (Columbia Columbia), Jun Wang (IBM IBM), Rongrong Ji (Columbia), Yu Gang Jiang (Fudan), and Shih Fu Chang (Columbia Columbia) June, 2012 Outline Motivations Problem