Hashing with Graphs. Sanjiv Kumar (Google), and Shih-Fu Chang (Columbia). June 2011

1 Hashing with Graphs. Wei Liu (Columbia), Jun Wang (IBM), Sanjiv Kumar (Google), and Shih-Fu Chang (Columbia). June 2011

2 Outline: Overview, Graph Hashing, Anchor Graph Hashing, Experiments, Conclusions

3 Fast NN Search. In many ML/CV/IR problems, one needs a reliable content-based similarity search engine. The simplest method is nearest neighbor (NN) search using some similarity measure such as Lp, cosine, or kernel. Exhaustive linear search over n data items takes O(n) time. A KD-tree is faster but seriously cursed by dimensionality (it works for d < 20).
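
The exhaustive linear scan mentioned above can be sketched in a few lines. This is a minimal illustration with NumPy on made-up data; the function name `linear_scan` and the toy sizes are assumptions, not something taken from the slides.

```python
import numpy as np

def linear_scan(database, query, k=5):
    """Return indices of the k database rows closest to `query` under L2 distance."""
    # One distance per database item: n distance evaluations, each costing O(d).
    dists = np.linalg.norm(database - query, axis=1)
    # argsort produces a full ranking; keep only the k closest items.
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))   # n = 1000 toy points, d = 16
q = X[42] + 1e-3                  # a query that lies very close to item 42
neighbors = linear_scan(X, q, k=5)
```

Every query touches all n items, which is the cost that hashing tries to avoid.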

4 Hashing. Low memory usage and high search efficiency. Randomized hash functions (long bits, multiple tables): Locality-Sensitive Hashing (LSH), Indyk et al. (since 1998). Learned hash functions (short bits, single table): Semantic Hashing (RBMs), Salakhutdinov & Hinton (2006, 2007); Spectral Hashing (SH), Weiss et al. (2009); Binary Reconstructive Embedding (BRE), Kulis & Darrell (2010); Semi-Supervised Hashing (SSH), Wang et al. (2010).

5 Hashing Using Compact Codes. Learning-based methods try to learn compact binary codes whose Hamming distances approximate or preserve the given similarity, e.g., L2, cosine, or kernel (a local measure). They achieve constant-time, O(1), search by flipping a few bits of the query code and looking up the hash table. (Figure: learning, hash function, buckets, NN list.)
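
The "flip several bits and look up the hash table" step can be sketched as follows. This is an illustrative toy using only the Python standard library; the helper names `build_table` and `lookup` and the 4-bit codes are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

def build_table(codes):
    """Map each integer code to the list of item ids stored in that bucket."""
    table = defaultdict(list)
    for item_id, code in enumerate(codes):
        table[code].append(item_id)
    return table

def lookup(table, query_code, n_bits, radius=2):
    """Collect items whose code differs from query_code in at most `radius` bits."""
    hits = []
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            probe = query_code
            for p in positions:
                probe ^= 1 << p              # flip bit p
            hits.extend(table.get(probe, []))
    return sorted(hits)

codes = [0b0000, 0b0001, 0b0011, 0b0111, 0b1111]
table = build_table(codes)
result = lookup(table, 0b0000, n_bits=4, radius=2)
```

The number of probed buckets depends only on the code length and the radius, not on the number of stored items, which is why the lookup cost is treated as constant.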

6 Our Mission. Load millions of data items, e.g., images, into memory: very compact codes! Beat the existing unsupervised hashing methods (LSH, PCAH, USPLH, SH, KLSH, SIKH). Exploit a global measure on manifolds to learn compact codes. Use graphs! Graphs tend to capture the semantic neighborhoods. Beat L2 linear scan!

7 Outline: Overview, Graph Hashing, Anchor Graph Hashing, Experiments, Conclusions

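8 Graph Hashing (GH). A standard graph-based hashing framework [Weiss et al. 09]. The binary constraint on the codes is dropped by spectral relaxation; the remaining constraints ask for balanced bits and uncorrelated bits. A: the affinity matrix of the data graph (e.g., A12 is the affinity between x1 and x2). L: the graph Laplacian, L = diag(A1) - A. (Figure: a graph over the n data points x1, x2, ..., xn.)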
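
The optimization problem on this slide did not survive transcription. As a reference point, the standard formulation from the spectral hashing literature can be written as below; the symbols Y (the n x r code matrix) and Y_i (its i-th row) are notation assumed here, chosen to match A and L above.

```latex
\min_{Y}\; \frac{1}{2}\sum_{i,j=1}^{n} A_{ij}\,\lVert Y_i - Y_j \rVert^2
  \;=\; \operatorname{tr}\!\left(Y^\top L Y\right)
\quad \text{s.t.} \quad
Y \in \{1,-1\}^{n \times r}, \qquad
\mathbf{1}^\top Y = 0, \qquad
Y^\top Y = n\, I_{r \times r}
```

The first constraint is the binary one that spectral relaxation drops, the second gives balanced bits, and the third gives uncorrelated bits. After relaxation the solution is given by the r eigenvectors of L with the smallest nonzero eigenvalues.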

9 Steps. Training: 1) Building a sparse affinity matrix A of an exact neighborhood graph (e.g., a k-NN graph) on n d-dimensional data points needs O(dn^2) time. 2) Computing and binarizing r eigenvectors of the sparse graph Laplacian L. Test (hashing): 1) Generalizing the r eigenvectors to any unseen point and binarizing needs interpolation over n entries. Training 1) is intractable for offline training; Test 1) is infeasible for online hashing!
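
The two training steps above can be sketched on toy data as follows. This is a minimal illustration, not the authors' implementation: it uses a dense 0/1 k-NN affinity and a dense eigensolver for brevity (the slide refers to a sparse matrix), and it assumes the graph is connected so that only one eigenvector is trivial. The name `graph_hashing` is hypothetical.

```python
import numpy as np

def graph_hashing(X, k=10, r=2):
    """Naive GH sketch: exact k-NN graph -> Laplacian -> r eigenvectors -> sign."""
    n = X.shape[0]
    # All pairwise squared distances: this is the O(d n^2) step.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D2, np.inf)                 # exclude self-matches
    # 0/1 affinity: link each point to its k nearest neighbors, then symmetrize.
    nn = np.argsort(D2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    A = np.maximum(A, A.T)
    L = np.diag(A.sum(axis=1)) - A               # L = diag(A1) - A
    # eigh returns ascending eigenvalues; skip the constant eigenvector
    # (assumes a connected graph, so exactly one eigenvalue is zero).
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:r + 1]
    return np.where(Y >= 0, 1, -1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
codes = graph_hashing(X, k=10, r=2)
```

Both the pairwise-distance matrix and the eigenvectors are tied to the n training points, which is why neither step scales and why unseen points need a separate extension.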

10 Spectral Hashing [Weiss et al. 09]. SH directly obtains the analytical eigenfunctions cos(akx+b) of 1D graph Laplacians based on two strong assumptions: i) the data dimensions are independent of each other; ii) along each dimension the data is uniformly distributed. The 1D graph Laplacian is derived from a complete graph. The manifold structure of the raw data is almost ignored. Pseudo eigenfunctions and pseudo-linear hash functions.

11 Toy Data. 12k 2D points: blue denotes the positive eigenvector entries and red denotes the negative entries. (Figure: 1st and 2nd eigenvectors for GH, our approach, and SH; annotation: smooth yet balanced.)

12 Outline: Overview, Graph Hashing, Anchor Graph Hashing, Experiments, Conclusions

13 Idea. In Training 1), we utilize our previously proposed large-graph model, the Anchor Graph, whose space and time complexities are both linear in n. Reference: W. Liu, J. He, and S.-F. Chang. Large Graph Construction for Scalable Semi-Supervised Learning. ICML 2010 (code released).

14 Anchor Graph [Liu et al. 10]. Nonnegative, sparse, and low-rank affinity matrices. Approximates the exact neighborhood graph when the number of anchors is sufficiently large. (Figure: 100 anchor points; 10-NN graph; Anchor Graph.)

15 Anchors. Introduce a few anchor points (K-means clustering centers) and compute the data-to-anchor similarities (an n x m matrix). Z: the data-to-anchor similarity matrix. Obtain the data-to-data similarity matrix Â using Z (inner product?). (Figure: data points x1, x4, x8; anchor points u1 to u6; similarities Z11, Z12, Z16; Â18 > 0, Â14 = 0.)

16 Low-Rank Affinity Matrix. Kernel-based truncated design of Z: each data point keeps nonzero similarities only to its nearest anchors. Note Z1 = 1, so that Z acts as a mapping from anchors to data. The Anchor Graph affinity matrix is Â = Z Λ^{-1} Z^T with Λ = diag(Z^T 1). Its rank is at most m, and only Z needs to be computed and kept in memory. Â is a doubly stochastic matrix, so the graph Laplacian is I - Â and has the same eigenvectors as Â.
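
The construction of Z and the properties claimed for Â can be checked numerically. The sketch below assumes a Gaussian kernel exp(-D^2/t) with squared L2 distance, as in the published Anchor Graph papers; the bandwidth t, the number of kept anchors s, and the use of randomly sampled data points in place of K-means centers are simplifications for illustration.

```python
import numpy as np

def anchor_graph_Z(X, anchors, s=2, t=1.0):
    """Truncated kernel similarities: each row keeps its s nearest anchors and sums to 1."""
    # Squared L2 distances between every point and every anchor, shape (n, m).
    D2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(D2, axis=1)[:, :s]          # indices of the s nearest anchors
    rows = np.arange(X.shape[0])[:, None]
    K = np.exp(-D2[rows, nearest] / t)               # Gaussian kernel on the kept entries
    Z = np.zeros_like(D2)
    Z[rows, nearest] = K / K.sum(axis=1, keepdims=True)
    return Z

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
anchors = X[rng.choice(500, size=20, replace=False)]  # stand-in for K-means centers
Z = anchor_graph_Z(X, anchors)
Lam = Z.sum(axis=0)                                   # diagonal of Lambda = diag(Z^T 1)
A_hat = (Z / Lam) @ Z.T                               # Z Lambda^{-1} Z^T, formed only to check
```

In practice the n x n matrix `A_hat` would never be materialized; it is built here only to confirm that its rows sum to one and that it is symmetric and nonnegative.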

17 Anchor Graph Hashing (AGH). Solve the relaxed GH problem using the Anchor Graph. Due to the low rank of Â, its r eigenvectors Y = ZW (excluding the trivial eigenvector 1) can be easily solved via a small eigenvalue decomposition on an m x m matrix. W looks like r eigenvectors on the m anchors. The r-dimensional Hamming embeddings are sgn(ZW).
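
A sketch of the small m x m eigenproblem follows. It assumes the construction described in the published AGH paper: M = Λ^{-1/2} Z^T Z Λ^{-1/2} is decomposed, the trivial eigenvalue 1 is dropped, and W = sqrt(n) Λ^{-1/2} V Σ^{-1/2}. The helper names, the Gaussian kernel, and the toy anchors are illustrative assumptions.

```python
import numpy as np

def truncated_Z(X, anchors, s=3, t=1.0):
    """Row-normalized Gaussian similarities to the s nearest anchors (assumed design)."""
    D2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(D2, axis=1)[:, :s]
    rows = np.arange(X.shape[0])[:, None]
    K = np.exp(-D2[rows, idx] / t)
    Z = np.zeros_like(D2)
    Z[rows, idx] = K / K.sum(axis=1, keepdims=True)
    return Z

def agh_train(Z, r):
    """Return W (m x r) such that Y = Z @ W holds r nontrivial eigenvectors of A_hat."""
    n = Z.shape[0]
    lam = Z.sum(axis=0)                          # Lambda = diag(Z^T 1)
    inv_sqrt = 1.0 / np.sqrt(lam)
    ZL = Z * inv_sqrt                            # Z Lambda^{-1/2} (scales columns)
    M = ZL.T @ ZL                                # m x m matrix
    sigma, V = np.linalg.eigh(M)                 # ascending eigenvalues
    sigma, V = sigma[::-1], V[:, ::-1]           # reorder to descending
    sigma, V = sigma[1:r + 1], V[:, 1:r + 1]     # drop the trivial eigenvalue 1
    return np.sqrt(n) * (inv_sqrt[:, None] * V) / np.sqrt(sigma)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
anchors = X[rng.choice(500, size=20, replace=False)]  # stand-in for K-means centers
Z = truncated_Z(X, anchors)
W = agh_train(Z, r=4)
Y = Z @ W                                        # relaxed (real-valued) solution
codes = np.where(Y >= 0, 1, -1)                  # r-bit Hamming embedding sgn(ZW)
```

With this scaling the columns of Y satisfy Y^T Y = n I, the relaxed form of the uncorrelated-bits constraint, and the only eigendecomposition performed is on a 20 x 20 matrix.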

18 Eigenvector to Eigenfunction. Suppose one eigenvector y = Zw.

19 Theorem. Given m anchor points U = {u_j} and any sample x, define a feature map z(x) over the m anchors, where the indicator δ_j = 1 iff anchor u_j is one of the s nearest anchors of x in U according to the distance function D(·). Then the Nyström eigenfunction extended from the Anchor Graph Laplacian eigenvector y_k = Zw_k is φ_k(x) = w_k^T z(x), an interpolation over m entries.
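
The explicit form of the feature map was lost in transcription. A version consistent with the kernel-based truncated design of Z is given below; the Gaussian kernel and its bandwidth t follow the published AGH paper and should be read as an assumption here, as should the symbols δ_j, φ_k, and h_k.

```latex
z(x) = \frac{\bigl[\,\delta_1 \exp\!\bigl(-D^2(x,u_1)/t\bigr),\; \ldots,\; \delta_m \exp\!\bigl(-D^2(x,u_m)/t\bigr)\bigr]^\top}
            {\sum_{j=1}^{m} \delta_j \exp\!\bigl(-D^2(x,u_j)/t\bigr)},
\qquad
\phi_k(x) = w_k^\top z(x),
\qquad
h_k(x) = \operatorname{sgn}\!\bigl(w_k^\top z(x)\bigr)
```

For a training point x_i, z(x_i) is the i-th row of Z, so φ_k(x_i) reproduces the i-th entry of the eigenvector y_k = Z w_k.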

20 Steps. Training: 1) Building an Anchor Graph (Z) on n d-dimensional data points needs O(dmnT) time for K-means (T is the number of K-means iterations) plus O(dmn) for the data-to-anchor similarities. 2) Computing and binarizing r eigenvectors of the Anchor Graph Laplacian needs time linear in n. 3) Building the hash table needs O(n). Test (hashing): 1) Generalizing the r eigenvectors to any unseen point via Nyström extension and binarizing needs time independent of n. 2) Inverse lookup in the hash table needs O(1). Linear training time and constant hashing time!
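
The test-time steps can be sketched as below. The anchors and the weight matrix W are random stand-ins rather than trained values, and the Gaussian kernel, `s`, and `t` are assumptions; the point is only to show that hashing a new sample touches the m anchors and never the n training points.

```python
import numpy as np
from collections import defaultdict

def feature_map(x, anchors, s=2, t=1.0):
    """z(x): normalized kernel weights on the s nearest anchors, zeros elsewhere."""
    d2 = ((anchors - x) ** 2).sum(axis=1)      # distances to the m anchors only
    idx = np.argsort(d2)[:s]
    k = np.exp(-d2[idx] / t)
    z = np.zeros(len(anchors))
    z[idx] = k / k.sum()
    return z

def hash_code(x, anchors, W, s=2, t=1.0):
    """r-bit code as a tuple of 0/1: the sign pattern of W^T z(x)."""
    bits = W.T @ feature_map(x, anchors, s, t) >= 0
    return tuple(int(b) for b in bits)

rng = np.random.default_rng(0)
anchors = rng.normal(size=(10, 2))             # hypothetical anchors
W = rng.normal(size=(10, 4))                   # stand-in for trained weights
data = rng.normal(size=(200, 2))

table = defaultdict(list)                      # one pass over the data: O(n)
for i, x in enumerate(data):
    table[hash_code(x, anchors, W)].append(i)

query = data[7]
bucket = table[hash_code(query, anchors, W)]   # a single dictionary lookup
```

Probing neighboring buckets within a small Hamming radius would be layered on top of this single lookup.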

21 Hierarchical Hashing. A higher eigenvector/eigenfunction, corresponding to a higher graph Laplacian eigenvalue, is of low quality for partitioning [Shi et al. 00]. We propose two-layer hashing, which revisits (reuses) the lower (smoother) eigenfunctions to generate multiple hash bits. (Figure: eigenvalues of the Anchor Graph Laplacian.)

22 Hierarchical Hashing. (Figure: balanced partitioning.)

23 1-AGH vs. 2-AGH. One-layer AGH (1-AGH): binarizing r eigenfunctions into r hash bits (r hash functions, r-bit code). Two-layer AGH (2-AGH): hierarchically thresholding r/2 eigenfunctions into r hash bits (r-bit code).
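
The idea of drawing two bits from one eigenfunction can be sketched as follows. The first bit thresholds at zero; the second bit re-thresholds each half separately. The slides do not show how the second-layer thresholds are chosen, so the medians used here are a stand-in chosen only to keep the second bit balanced, not the authors' procedure.

```python
import numpy as np

def two_layer_bits(y):
    """Turn one real-valued eigenvector y into two bits per sample (values in {1, -1})."""
    first = np.where(y >= 0, 1, -1)                    # layer 1: threshold at zero
    pos, neg = y >= 0, y < 0
    # Layer 2: a separate threshold inside each half (median is a stand-in choice).
    b_pos = np.median(y[pos])
    b_neg = np.median(y[neg])
    second = np.empty_like(first)
    second[pos] = np.where(y[pos] >= b_pos, 1, -1)
    second[neg] = np.where(y[neg] >= b_neg, -1, 1)     # mirrored sign on the negative half
    return np.stack([first, second], axis=1)

rng = np.random.default_rng(0)
y = rng.normal(size=1000)                              # stand-in for one eigenvector
bits = two_layer_bits(y)
```

Applying this to r/2 eigenvectors yields an r-bit code, which is the 2-AGH setting contrasted with 1-AGH above.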

24 Outline: Overview, Graph Hashing, Anchor Graph Hashing, Experiments, Conclusions

25 MNIST. 70k digit images from 10 classes; 1000 queries uniformly sampled from the 10 classes. Compare 8 unsupervised hashing methods, L2 linear scan, and Spectral Embedding (SE) followed by L2 linear scan (the real-valued version of 1-AGH). Measure Hamming radius 2 precision (true hashing via hash lookup) and Hamming ranking accuracy (brute-force search), where true neighbors are those sharing the query's class label.
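
The Hamming radius 2 precision described above can be computed as in this sketch. The function names and the tiny hand-made codes are illustrative; counting a query that retrieves nothing as zero precision is an assumed convention.

```python
import numpy as np

def hamming_dist(query_codes, db_codes):
    """Pairwise Hamming distances between two 0/1 code matrices, shape (q, n)."""
    return (query_codes[:, None, :] != db_codes[None, :, :]).sum(axis=2)

def precision_radius(query_codes, query_labels, db_codes, db_labels, radius=2):
    """Mean over queries of (# same-label items within radius) / (# items within radius)."""
    dist = hamming_dist(query_codes, db_codes)
    hit = dist <= radius
    same = query_labels[:, None] == db_labels[None, :]
    retrieved = hit.sum(axis=1)
    correct = (hit & same).sum(axis=1)
    # A query that retrieves nothing counts as zero precision (assumed convention).
    per_query = np.where(retrieved > 0, correct / np.maximum(retrieved, 1), 0.0)
    return float(per_query.mean())

db_codes = np.array([[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1]])
db_labels = np.array([0, 1, 1])
q_codes = np.array([[0, 0, 0, 0]])
q_labels = np.array([0])
p = precision_radius(q_codes, q_labels, db_codes, db_labels)
```

In the toy example the query retrieves two items within radius 2, one of which shares its label.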

26 MNIST. (Figure: (a) precision within Hamming radius 2 using hash lookup vs. # bits, with m = 300 fixed for AGH; methods: LSH, PCAH, USPLH, SH, KLSH, SIKH, 1-AGH, 2-AGH. (b) Hamming ranking precision of the top-5k ranked neighbors vs. # anchors; methods: L2 scan, SE L2 scan (48 dims), 1-AGH (48 bits), 2-AGH (48 bits). Annotations: much higher; drop; 2-AGH > SE L2 > L2.)

27 MNIST. (Figure: (c) Hamming ranking precision curves and (d) Hamming ranking recall curves vs. # samples, 1-AGH vs. 2-AGH, with m = 300 fixed; methods: L2 scan, 1-AGH (24 bits), 1-AGH (48 bits), 2-AGH (48 bits). Annotations: 2-AGH (r) > 1-AGH (r/2); 2-AGH (r) > 1-AGH (r).)

28 NUS-WIDE. About 270k web images with 81 semantic tags; 2100 query images sampled from 21 frequent tags. True neighbors are those sharing at least one semantic tag. (Figure: (a) precision within Hamming radius 2 vs. # bits; methods: LSH, PCAH, USPLH, SH, KLSH, SIKH, 1-AGH, 2-AGH. (b) precision of top-ranked neighbors vs. # anchors; methods: L2 scan, SE L2 scan (48 dims), 1-AGH (48 bits), 2-AGH (48 bits). Annotations: much higher; drop; 2-AGH > SE L2 > L2.)

29 NUS-WIDE. (a), (c), (d) fix m = 300; (b) shows top-5k Hamming ranking precision. (Figure: (c) precision curves and (d) recall curves vs. # samples, 1-AGH vs. 2-AGH; methods: L2 scan, 1-AGH (24 bits), 1-AGH (48 bits), 2-AGH (48 bits). Annotations: 2-AGH (r) > 1-AGH (r/2); 2-AGH (r) > 1-AGH (r).)

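30 Hamming Ranking. (Table: MAP on MNIST and MP/5k on NUS-WIDE at r = 24 and r = 48, one row per method: L2 scan, SE L2 scan, BRE (NIPS 10), SH (NIPS 09), KLSH (ICCV 09), SIKH (NIPS 10), USPLH (ICML 10), 1-AGH, 2-AGH. Row annotations: supervised; linear. The numeric entries are missing from the transcript.)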
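
For reference, mean average precision (MAP) under Hamming ranking can be computed as in the sketch below. The function names are illustrative, and ties in Hamming distance are broken by database order via a stable sort, which is an assumption rather than the evaluation protocol of the slides.

```python
import numpy as np

def average_precision(relevant_ranked):
    """AP for one query given a boolean array ordered by increasing Hamming distance."""
    relevant_ranked = np.asarray(relevant_ranked, dtype=bool)
    if not relevant_ranked.any():
        return 0.0
    hits = np.cumsum(relevant_ranked)                  # relevant items seen so far
    ranks = np.arange(1, len(relevant_ranked) + 1)
    return float(np.mean(hits[relevant_ranked] / ranks[relevant_ranked]))

def hamming_ranking_map(query_codes, query_labels, db_codes, db_labels):
    """Mean AP over queries, ranking the whole database by Hamming distance."""
    aps = []
    for code, label in zip(query_codes, query_labels):
        dist = (db_codes != code).sum(axis=1)
        order = np.argsort(dist, kind="stable")        # ties keep database order
        aps.append(average_precision(db_labels[order] == label))
    return float(np.mean(aps))

db_codes = np.array([[0, 0], [0, 1], [1, 1]])
db_labels = np.array([1, 0, 1])
q_codes = np.array([[0, 0]])
q_labels = np.array([1])
score = hamming_ranking_map(q_codes, q_labels, db_codes, db_labels)
```

MP/5k, the other column of the table, would instead average the precision of the first 5000 ranked items per query.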

31 Outline: Overview, Graph Hashing, Anchor Graph Hashing, Experiments, Conclusions

32 Conclusions. Under the unsupervised scenario, almost all hashing methods are inferior to L2 linear scan in search accuracy. Our AGH method outperforms L2 scan in terms of searching semantic neighbors. We have applied Anchor Graphs to achieve scalability in SSL, clustering, and web image search. AGH will have wide applications, even if only its compact code generation scheme is used.

Rongrong Ji (Columbia), Yu Gang Jiang (Fudan), June, 2012

Rongrong Ji (Columbia), Yu Gang Jiang (Fudan), June, 2012 Supervised Hashing with Kernels Wei Liu (Columbia Columbia), Jun Wang (IBM IBM), Rongrong Ji (Columbia), Yu Gang Jiang (Fudan), and Shih Fu Chang (Columbia Columbia) June, 2012 Outline Motivations Problem

More information

Approximate Nearest Neighbor Search. Deng Cai Zhejiang University

Approximate Nearest Neighbor Search. Deng Cai Zhejiang University Approximate Nearest Neighbor Search Deng Cai Zhejiang University The Era of Big Data How to Find Things Quickly? Web 1.0 Text Search Sparse feature Inverted Index How to Find Things Quickly? Web 2.0, 3.0

More information

over Multi Label Images

over Multi Label Images IBM Research Compact Hashing for Mixed Image Keyword Query over Multi Label Images Xianglong Liu 1, Yadong Mu 2, Bo Lang 1 and Shih Fu Chang 2 1 Beihang University, Beijing, China 2 Columbia University,

More information

Supervised Hashing for Image Retrieval via Image Representation Learning

Supervised Hashing for Image Retrieval via Image Representation Learning Supervised Hashing for Image Retrieval via Image Representation Learning Rongkai Xia, Yan Pan, Cong Liu (Sun Yat-Sen University) Hanjiang Lai, Shuicheng Yan (National University of Singapore) Finding Similar

More information

Adaptive Binary Quantization for Fast Nearest Neighbor Search

Adaptive Binary Quantization for Fast Nearest Neighbor Search IBM Research Adaptive Binary Quantization for Fast Nearest Neighbor Search Zhujin Li 1, Xianglong Liu 1*, Junjie Wu 1, and Hao Su 2 1 Beihang University, Beijing, China 2 Stanford University, Stanford,

More information

Graph-based Techniques for Searching Large-Scale Noisy Multimedia Data

Graph-based Techniques for Searching Large-Scale Noisy Multimedia Data Graph-based Techniques for Searching Large-Scale Noisy Multimedia Data Shih-Fu Chang Department of Electrical Engineering Department of Computer Science Columbia University Joint work with Jun Wang (IBM),

More information

Adaptive Hash Retrieval with Kernel Based Similarity

Adaptive Hash Retrieval with Kernel Based Similarity Adaptive Hash Retrieval with Kernel Based Similarity Xiao Bai a, Cheng Yan a, Haichuan Yang a, Lu Bai b, Jun Zhou c, Edwin Robert Hancock d a School of Computer Science and Engineering, Beihang University,

More information

Large-Scale Face Manifold Learning

Large-Scale Face Manifold Learning Large-Scale Face Manifold Learning Sanjiv Kumar Google Research New York, NY * Joint work with A. Talwalkar, H. Rowley and M. Mohri 1 Face Manifold Learning 50 x 50 pixel faces R 2500 50 x 50 pixel random

More information

Isometric Mapping Hashing

Isometric Mapping Hashing Isometric Mapping Hashing Yanzhen Liu, Xiao Bai, Haichuan Yang, Zhou Jun, and Zhihong Zhang Springer-Verlag, Computer Science Editorial, Tiergartenstr. 7, 692 Heidelberg, Germany {alfred.hofmann,ursula.barth,ingrid.haas,frank.holzwarth,

More information

Hashing with Binary Autoencoders

Hashing with Binary Autoencoders Hashing with Binary Autoencoders Ramin Raziperchikolaei Electrical Engineering and Computer Science University of California, Merced http://eecs.ucmerced.edu Joint work with Miguel Á. Carreira-Perpiñán

More information

Image Analysis & Retrieval. CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W Lec 18.

Image Analysis & Retrieval. CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W Lec 18. Image Analysis & Retrieval CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W 4-5:15pm@Bloch 0012 Lec 18 Image Hashing Zhu Li Dept of CSEE, UMKC Office: FH560E, Email: lizhu@umkc.edu, Ph:

More information

Supervised Hashing for Image Retrieval via Image Representation Learning

Supervised Hashing for Image Retrieval via Image Representation Learning Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Supervised Hashing for Image Retrieval via Image Representation Learning Rongkai Xia 1, Yan Pan 1, Hanjiang Lai 1,2, Cong Liu

More information

Large-scale visual recognition Efficient matching

Large-scale visual recognition Efficient matching Large-scale visual recognition Efficient matching Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline!! Preliminary!! Locality Sensitive Hashing: the two modes!! Hashing!! Embedding!!

More information

learning stage (Stage 1), CNNH learns approximate hash codes for training images by optimizing the following loss function:

learning stage (Stage 1), CNNH learns approximate hash codes for training images by optimizing the following loss function: 1 Query-adaptive Image Retrieval by Deep Weighted Hashing Jian Zhang and Yuxin Peng arxiv:1612.2541v2 [cs.cv] 9 May 217 Abstract Hashing methods have attracted much attention for large scale image retrieval.

More information

CLSH: Cluster-based Locality-Sensitive Hashing

CLSH: Cluster-based Locality-Sensitive Hashing CLSH: Cluster-based Locality-Sensitive Hashing Xiangyang Xu Tongwei Ren Gangshan Wu Multimedia Computing Group, State Key Laboratory for Novel Software Technology, Nanjing University xiangyang.xu@smail.nju.edu.cn

More information

Supervised Hashing with Kernels

Supervised Hashing with Kernels Supervised Hashing with Kernels Wei Liu Jun Wang Rongrong Ji Yu-Gang Jiang Shih-Fu Chang Electrical Engineering Department, Columbia University, New York, NY, USA IBM T. J. Watson Research Center, Yorktown

More information

Learning to Hash on Structured Data

Learning to Hash on Structured Data Learning to Hash on Structured Data Qifan Wang, Luo Si and Bin Shen Computer Science Department, Purdue University West Lafayette, IN 47907, US wang868@purdue.edu, lsi@purdue.edu, bshen@purdue.edu Abstract

More information

Progressive Generative Hashing for Image Retrieval

Progressive Generative Hashing for Image Retrieval Progressive Generative Hashing for Image Retrieval Yuqing Ma, Yue He, Fan Ding, Sheng Hu, Jun Li, Xianglong Liu 2018.7.16 01 BACKGROUND the NNS problem in big data 02 RELATED WORK Generative adversarial

More information

Learning independent, diverse binary hash functions: pruning and locality

Learning independent, diverse binary hash functions: pruning and locality Learning independent, diverse binary hash functions: pruning and locality Ramin Raziperchikolaei and Miguel Á. Carreira-Perpiñán Electrical Engineering and Computer Science University of California, Merced

More information

Evaluation of Hashing Methods Performance on Binary Feature Descriptors

Evaluation of Hashing Methods Performance on Binary Feature Descriptors Evaluation of Hashing Methods Performance on Binary Feature Descriptors Jacek Komorowski and Tomasz Trzcinski 2 Warsaw University of Technology, Warsaw, Poland jacek.komorowski@gmail.com 2 Warsaw University

More information

Spectral Clustering X I AO ZE N G + E L HA M TA BA S SI CS E CL A S S P R ESENTATION MA RCH 1 6,

Spectral Clustering X I AO ZE N G + E L HA M TA BA S SI CS E CL A S S P R ESENTATION MA RCH 1 6, Spectral Clustering XIAO ZENG + ELHAM TABASSI CSE 902 CLASS PRESENTATION MARCH 16, 2017 1 Presentation based on 1. Von Luxburg, Ulrike. "A tutorial on spectral clustering." Statistics and computing 17.4

More information

Asymmetric Discrete Graph Hashing

Asymmetric Discrete Graph Hashing Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Asymmetric Discrete Graph Hashing Xiaoshuang Shi, Fuyong Xing, Kaidi Xu, Manish Sapkota, Lin Yang University of Florida,

More information

Large Scale Mobile Visual Search

Large Scale Mobile Visual Search Large Scale Mobile Visual Search Ricoh, HotPaper (by Mac Funamizu) Shih-Fu Chang June 2012 The Explosive Growth of Visual Data broadcast Social portals video blogs 70,000 TB/year, 100 million hours 60

More information

Variable Bit Quantisation for LSH

Variable Bit Quantisation for LSH Sean Moran School of Informatics The University of Edinburgh EH8 9AB, Edinburgh, UK sean.moran@ed.ac.uk Variable Bit Quantisation for LSH Victor Lavrenko School of Informatics The University of Edinburgh

More information

arxiv: v2 [cs.cv] 27 Nov 2017

arxiv: v2 [cs.cv] 27 Nov 2017 Deep Supervised Discrete Hashing arxiv:1705.10999v2 [cs.cv] 27 Nov 2017 Qi Li Zhenan Sun Ran He Tieniu Tan Center for Research on Intelligent Perception and Computing National Laboratory of Pattern Recognition

More information

Smart Hashing Update for Fast Response

Smart Hashing Update for Fast Response Smart Hashing Update for Fast Response Qiang Yang, Longkai Huang, Wei-Shi Zheng*, and Yingbiao Ling School of Information Science and Technology, Sun Yat-sen University, China Guangdong Province Key Laboratory

More information

Learning Affine Robust Binary Codes Based on Locality Preserving Hash

Learning Affine Robust Binary Codes Based on Locality Preserving Hash Learning Affine Robust Binary Codes Based on Locality Preserving Hash Wei Zhang 1,2, Ke Gao 1, Dongming Zhang 1, and Jintao Li 1 1 Advanced Computing Research Laboratory, Beijing Key Laboratory of Mobile

More information

Learning Low-rank Transformations: Algorithms and Applications. Qiang Qiu Guillermo Sapiro

Learning Low-rank Transformations: Algorithms and Applications. Qiang Qiu Guillermo Sapiro Learning Low-rank Transformations: Algorithms and Applications Qiang Qiu Guillermo Sapiro Motivation Outline Low-rank transform - algorithms and theories Applications Subspace clustering Classification

More information

RECENT years have witnessed the rapid growth of image. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval

RECENT years have witnessed the rapid growth of image. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval Jian Zhang, Yuxin Peng, and Junchao Zhang arxiv:607.08477v [cs.cv] 28 Jul 206 Abstract The hashing methods have been widely used for efficient

More information

Similarity-Preserving Binary Hashing for Image Retrieval in large databases

Similarity-Preserving Binary Hashing for Image Retrieval in large databases Universidad Politécnica de Valencia Master s Final Project Similarity-Preserving Binary Hashing for Image Retrieval in large databases Author: Guillermo García Franco Supervisor: Dr. Roberto Paredes Palacios

More information

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 Vector visual representation Fixed-size image representation High-dim (100 100,000) Generic, unsupervised: BoW,

More information

Complementary Hashing for Approximate Nearest Neighbor Search

Complementary Hashing for Approximate Nearest Neighbor Search Complementary Hashing for Approximate Nearest Neighbor Search Hao Xu Jingdong Wang Zhu Li Gang Zeng Shipeng Li Nenghai Yu MOE-MS KeyLab of MCC, University of Science and Technology of China, P. R. China

More information

Metric Learning Applied for Automatic Large Image Classification

Metric Learning Applied for Automatic Large Image Classification September, 2014 UPC Metric Learning Applied for Automatic Large Image Classification Supervisors SAHILU WENDESON / IT4BI TOON CALDERS (PhD)/ULB SALIM JOUILI (PhD)/EuraNova Image Database Classification

More information

Iterative Quantization: A Procrustean Approach to Learning Binary Codes

Iterative Quantization: A Procrustean Approach to Learning Binary Codes Iterative Quantization: A Procrustean Approach to Learning Binary Codes Yunchao Gong and Svetlana Lazebnik Department of Computer Science, UNC Chapel Hill, NC, 27599. {yunchao,lazebnik}@cs.unc.edu Abstract

More information

Visual Representations for Machine Learning

Visual Representations for Machine Learning Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering

More information

Fast Indexing and Search. Lida Huang, Ph.D. Senior Member of Consulting Staff Magma Design Automation

Fast Indexing and Search. Lida Huang, Ph.D. Senior Member of Consulting Staff Magma Design Automation Fast Indexing and Search Lida Huang, Ph.D. Senior Member of Consulting Staff Magma Design Automation Motivation Object categorization? http://www.cs.utexas.edu/~grauman/slides/jain_et_al_cvpr2008.ppt Motivation

More information

Angular Quantization-based Binary Codes for Fast Similarity Search

Angular Quantization-based Binary Codes for Fast Similarity Search Angular Quantization-based Binary Codes for Fast Similarity Search Yunchao Gong, Sanjiv Kumar, Vishal Verma, Svetlana Lazebnik Google Research, New York, NY, USA Computer Science Department, University

More information

Scalable Graph Hashing with Feature Transformation

Scalable Graph Hashing with Feature Transformation Proceedings of the Twenty-ourth International Joint Conference on Artificial Intelligence (IJCAI 2015) Scalable Graph Hashing with eature Transformation Qing-Yuan Jiang and Wu-Jun Li National Key Laboratory

More information

Inductive Hashing on Manifolds

Inductive Hashing on Manifolds Inductive Hashing on Manifolds Fumin Shen, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, Zhenmin Tang The University of Adelaide, Australia Nanjing University of Science and Technology, China Abstract

More information

Supervised Hashing with Latent Factor Models

Supervised Hashing with Latent Factor Models Supervised Hashing with Latent Factor Models Peichao Zhang Shanghai Key Laboratory of Scalable Computing and Systems Department of Computer Science and Engineering Shanghai Jiao Tong University, China

More information

Sequential Projection Learning for Hashing with Compact Codes

Sequential Projection Learning for Hashing with Compact Codes Jun Wang jwang@ee.columbia.edu Department of Electrical Engineering, Columbia University, New Yor, NY 27, USA Sanjiv Kumar Google Research, New Yor, NY, USA sanjiv@google.com Shih-Fu Chang sfchang@ee.columbia.com

More information

Searching in one billion vectors: re-rank with source coding

Searching in one billion vectors: re-rank with source coding Searching in one billion vectors: re-rank with source coding Hervé Jégou INRIA / IRISA Romain Tavenard Univ. Rennes / IRISA Laurent Amsaleg CNRS / IRISA Matthijs Douze INRIA / LJK ICASSP May 2011 LARGE

More information

arxiv: v1 [cs.cv] 19 Oct 2017 Abstract

arxiv: v1 [cs.cv] 19 Oct 2017 Abstract Improved Search in Hamming Space using Deep Multi-Index Hashing Hanjiang Lai and Yan Pan School of Data and Computer Science, Sun Yan-Sen University, China arxiv:70.06993v [cs.cv] 9 Oct 207 Abstract Similarity-preserving

More information

Aarti Singh. Machine Learning / Slides Courtesy: Eric Xing, M. Hein & U.V. Luxburg

Aarti Singh. Machine Learning / Slides Courtesy: Eric Xing, M. Hein & U.V. Luxburg Spectral Clustering Aarti Singh Machine Learning 10-701/15-781 Apr 7, 2010 Slides Courtesy: Eric Xing, M. Hein & U.V. Luxburg 1 Data Clustering Graph Clustering Goal: Given data points X1,, Xn and similarities

More information

Spectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014

Spectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014 Spectral Clustering Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014 What are we going to talk about? Introduction Clustering and

More information

Deep Supervised Hashing with Triplet Labels

Deep Supervised Hashing with Triplet Labels Deep Supervised Hashing with Triplet Labels Xiaofang Wang, Yi Shi, Kris M. Kitani Carnegie Mellon University, Pittsburgh, PA 15213 USA Abstract. Hashing is one of the most popular and powerful approximate

More information

Learning to Hash with Binary Reconstructive Embeddings

Learning to Hash with Binary Reconstructive Embeddings Learning to Hash with Binary Reconstructive Embeddings Brian Kulis Trevor Darrell Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2009-0

More information

Geometric data structures:

Geometric data structures: Geometric data structures: Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade Sham Kakade 2017 1 Announcements: HW3 posted Today: Review: LSH for Euclidean distance Other

More information

Locality-Sensitive Codes from Shift-Invariant Kernels Maxim Raginsky (Duke) and Svetlana Lazebnik (UNC)

Locality-Sensitive Codes from Shift-Invariant Kernels Maxim Raginsky (Duke) and Svetlana Lazebnik (UNC) Locality-Sensitive Codes from Shift-Invariant Kernels Maxim Raginsky (Duke) and Svetlana Lazebnik (UNC) Goal We want to design a binary encoding of data such that similar data points (similarity measures

More information

Binary Embedding with Additive Homogeneous Kernels

Binary Embedding with Additive Homogeneous Kernels Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-7) Binary Embedding with Additive Homogeneous Kernels Saehoon Kim, Seungjin Choi Department of Computer Science and Engineering

More information

Learning to Hash with Binary Reconstructive Embeddings

Learning to Hash with Binary Reconstructive Embeddings Learning to Hash with Binary Reconstructive Embeddings Brian Kulis and Trevor Darrell UC Berkeley EECS and ICSI Berkeley, CA {kulis,trevor}@eecs.berkeley.edu Abstract Fast retrieval methods are increasingly

More information

Machine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016

Machine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016 Machine Learning 10-701, Fall 2016 Nonparametric methods for Classification Eric Xing Lecture 2, September 12, 2016 Reading: 1 Classification Representing data: Hypothesis (classifier) 2 Clustering 3 Supervised

More information

Locally Linear Landmarks for large-scale manifold learning

Locally Linear Landmarks for large-scale manifold learning Locally Linear Landmarks for large-scale manifold learning Max Vladymyrov and Miguel Á. Carreira-Perpiñán Electrical Engineering and Computer Science University of California, Merced http://eecs.ucmerced.edu

More information

Complementary Projection Hashing

Complementary Projection Hashing 23 IEEE International Conference on Computer Vision Complementary Projection Hashing Zhongg Jin, Yao Hu, Yue Lin, Debing Zhang, Shiding Lin 2, Deng Cai, Xuelong Li 3 State Key Lab of CAD&CG, College of

More information

Thorsten Joachims Then: Universität Dortmund, Germany Now: Cornell University, USA

Thorsten Joachims Then: Universität Dortmund, Germany Now: Cornell University, USA Retrospective ICML99 Transductive Inference for Text Classification using Support Vector Machines Thorsten Joachims Then: Universität Dortmund, Germany Now: Cornell University, USA Outline The paper in

More information

Task Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval

Task Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval Case Study 2: Document Retrieval Task Description: Finding Similar Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 11, 2017 Sham Kakade 2017 1 Document

More information

A Taxonomy of Semi-Supervised Learning Algorithms

A Taxonomy of Semi-Supervised Learning Algorithms A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph

More information

Large scale object/scene recognition

Large scale object/scene recognition Large scale object/scene recognition Image dataset: > 1 million images query Image search system ranked image list Each image described by approximately 2000 descriptors 2 10 9 descriptors to index! Database

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu [Kumar et al. 99] 2/13/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu

More information

Problem 1: Complexity of Update Rules for Logistic Regression

Problem 1: Complexity of Update Rules for Logistic Regression Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 16 th, 2014 1

More information

The Normalized Distance Preserving Binary Codes and Distance Table *

The Normalized Distance Preserving Binary Codes and Distance Table * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 32, XXXX-XXXX (2016) The Normalized Distance Preserving Binary Codes and Distance Table * HONGWEI ZHAO 1,2, ZHEN WANG 1, PINGPING LIU 1,2 AND BIN WU 1 1.

More information

Clustering. So far in the course. Clustering. Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. dist(x, y) = x y 2 2

Clustering. So far in the course. Clustering. Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. dist(x, y) = x y 2 2 So far in the course Clustering Subhransu Maji : Machine Learning 2 April 2015 7 April 2015 Supervised learning: learning with a teacher You had training data which was (feature, label) pairs and the goal

More information

Lost in Binarization: Query-Adaptive Ranking for Similar Image Search with Compact Codes

Lost in Binarization: Query-Adaptive Ranking for Similar Image Search with Compact Codes Lost in Binarization: Query-Adaptive Ranking for Similar Image Search with Compact Codes Yu-Gang Jiang, Jun Wang, Shih-Fu Chang Columbia University, New York, NY 10027, USA IBM T.J. Watson Research Center,

More information

Large Scale Nearest Neighbor Search Theories, Algorithms, and Applications. Junfeng He

Large Scale Nearest Neighbor Search Theories, Algorithms, and Applications. Junfeng He Large Scale Nearest Neighbor Search Theories, Algorithms, and Applications Junfeng He Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School

More information

arxiv: v5 [cs.cv] 15 May 2018

arxiv: v5 [cs.cv] 15 May 2018 A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search The State Key Lab of CAD&CG, College of Computer Science, Zhejiang University, China Alibaba-Zhejiang University Joint Institute

More information

Instances on a Budget

Retrieving Similar or Informative Instances on a Budget. Kristen Grauman, Dept. of Computer Science, University of Texas at Austin. Work with Sudheendra Vijayanarasimhan,

Assigning affinity-preserving binary hash codes to images

Jason Filippou (jasonfil@cs.umd.edu), Varun Manjunatha (varunm@umiacs.umd.edu). June 10, 2014. Abstract: We tackle the problem of generating binary hashcodes

MTTTS17 Dimensionality Reduction and Visualization. Spring 2018 Jaakko Peltonen. Lecture 11: Neighbor Embedding Methods continued

This Lecture: Neighbor embedding by generative modeling. Some supervised neighbor

6.819 / 6.869: Advances in Computer Vision

Image Retrieval. Retrieval: information, images, objects, large-scale. Website: http://6.869.csail.mit.edu/fa15/ Instructor: Yusuf Aytar. Lecture: TR 9:30AM-11:00AM

Search Engines. Information Retrieval in Practice

All slides © Addison Wesley, 2008. Classification and Clustering: Classification and clustering are classical pattern recognition / machine learning problems

Locality-Sensitive Hashing: Random Projections for NN Search

Case Study 2: Document Retrieval. Locality-Sensitive Hashing: Random Projections for NN Search. Machine Learning for Big Data, CSE547/STAT548, University of Washington. Sham Kakade, April 18, 2017.

Fast Supervised Hashing with Decision Trees for High-Dimensional Data

Guosheng Lin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, David Suter. The University of Adelaide, SA 5005, Australia. Abstract

Semi-Supervised Hashing for Scalable Image Retrieval

Jun Wang, Columbia University, New York, NY, 10027, jwang@ee.columbia.edu. Sanjiv Kumar, Google Research, New York, NY, 10011, sanjiv@google.com. Shih-Fu Chang

Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. 2 April 2015, 7 April 2015

So far in the course. Supervised learning: learning with a teacher. You had training data which was (feature, label) pairs

Compact Hash Code Learning with Binary Deep Neural Network

Thanh-Toan Do, Dang-Khoa Le Tan, Tuan Hoang, Ngai-Man Cheung. arXiv:7.0956v [cs.CV] 6 Feb 08. Abstract: In this work, we firstly propose deep network

CS 340 Lec. 4: K-Nearest Neighbors

AD, January 2011. K-Nearest Neighbors: Introduction, Choice of Metric, Overfitting and Underfitting, Selection

Scalable Similarity Search with Optimized Kernel Hashing

Junfeng He, Columbia University, New York, NY, 7, jh7@columbia.edu. Wei Liu, Columbia University, New York, NY, 7, wliu@ee.columbia.edu. Shih-Fu Chang, Columbia

Network embedding. Cheng Zheng

Outline: Problem definition. Factorization based algorithms: Laplacian Eigenmaps (NIPS, 2001). Random walk based algorithms: DeepWalk (KDD, 2014), node2vec (KDD, 2016). Deep

Learning to Hash for Indexing Big Data: A Survey

INVITED PAPER. This paper provides readers with a systematic understanding of insights, pros, and cons of the emerging indexing and search methods for Big

Image Analysis & Retrieval. CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W Lec 16

Fall 2016, M/W 4-5:15pm @ Bloch 0012. Lec 16: Subspace/Transform Optimization. Zhu Li, Dept of CSEE, UMKC. Office: FH560E, Email:

Semi-supervised Data Representation via Affinity Graph Learning

Weiya Ren, College of Information System and Management, National University of Defense Technology, Changsha, Hunan, P.R China, 410073

Surfaces, meshes, and topology

Surfaces from Point Samples. A surface is a 2-manifold embedded in 3-dimensional Euclidean space. Such surfaces are often approximated by triangle meshes. Triangle mesh

Globally and Locally Consistent Unsupervised Projection

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence. Hua Wang, Feiping Nie, Heng Huang. Department of Electrical Engineering

A Sparse Embedding and Least Variance Encoding Approach to Hashing

Xiaofeng Zhu, Lei Zhang, Member, IEEE, Zi Huang. Abstract: Hashing is becoming increasingly important in large-scale image retrieval for

Dimension Reduction CS534

Why dimension reduction? High dimensionality: large number of features. E.g., documents represented by thousands of words, millions of bigrams. Images represented by thousands of

Digital Geometry Processing

Spring 2011. Physical model, acquired point cloud, reconstructed model. Digital Michelangelo Project. Range Scanning Systems. Passive: Stereo Matching. Find and match features in

Time and Space Efficient Spectral Clustering via Column Sampling

Mu Li, Xiao-Chen Lian, James T. Kwok, Bao-Liang Lu. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai

SPECTRAL SPARSIFICATION IN SPECTRAL CLUSTERING

Alireza Chakeri, Hamidreza Farhidzadeh, Lawrence O. Hall. Department of Computer Science and Engineering, College of Engineering, University of South Florida

Deep Semantic-Preserving and Ranking-Based Hashing for Image Retrieval

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16). Ting Yao, Fuchen Long, Tao Mei,

Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink

Rajesh Bordawekar, IBM T. J. Watson Research Center, bordaw@us.ibm.com. Pidad D'Souza, IBM Systems, pidsouza@in.ibm.com. Outline

Nearest neighbor search plays an important role in

EFANNA: An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph. Cong Fu, Deng Cai. arXiv:1609.07228v3 [cs.CV] 3 Dec 2016. Abstract: Approximate nearest neighbor (ANN) search

Chapter 7. Learning Hypersurfaces and Quantisation Thresholds

The research presented in this Chapter has been previously published in Moran (2016). 7.1 Introduction. In Chapter 1 I motivated this thesis

Nearest Neighbor with KD Trees

Case Study 2: Document Retrieval. Finding Similar Documents Using Nearest Neighbors. Machine Learning/Statistics for Big Data, CSE599C1/STAT592, University of Washington. Emily Fox, January 22nd, 2013. Nearest

442 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 15, NO. 2, FEBRUARY 2013

Query-Adaptive Image Search With Hash Codes. Yu-Gang Jiang, Jun Wang, Member, IEEE, Xiangyang Xue, Member, IEEE, and Shih-Fu Chang, Fellow,

Introduction to Machine Learning

Clustering. Varun Chandola, Computer Science & Engineering, State University of New York at Buffalo, Buffalo, NY, USA. chandola@buffalo.edu. CSE 474/574. Outline

Algorithms for Nearest Neighbors

State-of-the-Art. Yury Lifshits, Steklov Institute of Mathematics at St.Petersburg. Yandex Tech Seminar, April 2007. Outline: 1 Problem Statement, Applications, Data Models

Bilinear discriminant analysis hashing: A supervised hashing approach for high-dimensional data

Author: Liu, Yanzhen; Bai, Xiao; Yan, Cheng; Zhou, Jun. Published: 2017. Journal Title: Lecture Notes in Computer

ECS289: Scalable Machine Learning

Cho-Jui Hsieh, UC Davis. Sept 22, 2016. Course Information. Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ECS289G_Fall2016/main.html My office: Mathematical Sciences

The Constrained Laplacian Rank Algorithm for Graph-Based Clustering

Feiping Nie, Xiaoqian Wang, Michael I. Jordan, Heng Huang. Department of Computer Science and Engineering, University of Texas, Arlington

Bit-Scalable Deep Hashing with Regularized Similarity Learning for Image Retrieval and Person Re-identification

IEEE TRANSACTIONS ON IMAGE PROCESSING. Ruimao Zhang, Liang Lin, Rui Zhang, Wangmeng Zuo,