Locality-Sensitive Hashing: Random Projections for NN Search

Similar documents
Geometric data structures:

Nearest Neighbor with KD Trees

Task Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval

Problem 1: Complexity of Update Rules for Logistic Regression

Lecture 24: Image Retrieval: Part II. Visual Computing Systems, CMU, Fall 2013

Nearest neighbors. Focus on tree-based methods. Clément Jamin, GUDHI project, Inria March 2017

High Dimensional Indexing by Clustering

Nearest Neighbors Classifiers

kd-trees Idea: Each level of the tree compares against 1 dimension. Lets us have only two children at each node (instead of 2^d)

Going nonparametric: Nearest neighbor methods for regression and classification

Instance-Based Learning: Nearest neighbor and kernel regression and classification

CS 340 Lec. 4: K-Nearest Neighbors

doc. RNDr. Tomáš Skopal, Ph.D. Department of Software Engineering, Faculty of Information Technology, Czech Technical University in Prague

Large Scale Nearest Neighbor Search Theories, Algorithms, and Applications. Junfeng He

7 Approximate Nearest Neighbors

Going nonparametric: Nearest neighbor methods for regression and classification

1 The range query problem

Near Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri

Web-Scale Multimedia: Optimizing LSH. Malcolm Slaney, Yury Lifshits, Junfeng He. Y! Research

Image Analysis & Retrieval. CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W Lec 18.

Spatial Data Structures

Data Structures and Algorithms

Spatial Data Structures

1 Unweighted Set Cover

Branch and Bound. Algorithms for Nearest Neighbor Search: Lecture 1. Yury Lifshits

Machine Learning and Pervasive Computing

Chap4: Spatial Storage and Indexing. 4.1 Storage: Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary

Computational Geometry

Datenbanksysteme II: Multidimensional Index Structures 2. Ulf Leser

Balanced Trees Part Two

Spatial Data Structures

Big Data Analytics. Special Topics for Computer Science CSE CSE April 14

Hot topics and Open problems in Computational Geometry. My (limited) perspective. Class lecture for CSE 546,

Algorithms for Nearest Neighbors

Spatial Data Structures

Orthogonal Range Queries

CSC 411: Lecture 05: Nearest Neighbors

Computational Geometry

L9: Hierarchical Clustering

Classifiers and Detection. D.A. Forsyth

Computational Geometry

15.083J Integer Programming and Combinatorial Optimization Fall Enumerative Methods

Approximate Nearest Neighbor Search. Deng Cai Zhejiang University

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015

Compact Data Representations and their Applications. Moses Charikar Princeton University

Supervised Learning: Nearest Neighbors

K-Nearest Neighbour (Continued) Dr. Xiaowei Huang

Clustering Documents. Document Retrieval. Case Study 2: Document Retrieval

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Tree-based methods for classification and regression

A Computational Theory of Clustering

Computational Geometry

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Clustering Documents. Case Study 2: Document Retrieval

Data Mining and Machine Learning: Techniques and Algorithms

CPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016

CPSC 340: Machine Learning and Data Mining. Finding Similar Items Fall 2017

Approximation Algorithms for Clustering Uncertain Data

Balanced Box-Decomposition trees for Approximate nearest-neighbor. Manos Thanos (MPLA) Ioannis Emiris (Dept Informatics) Computational Geometry

Assignment 5: Solutions

Fast Indexing Method. Dongliang Xu 22th.Feb.2008

Finding Similar Sets. Applications Shingling Minhashing Locality-Sensitive Hashing

Nearest Pose Retrieval Based on K-d tree

Nearest Neighbor Search by Branch and Bound

Scalable Nearest Neighbor Algorithms for High Dimensional Data Marius Muja (UBC), David G. Lowe (Google) IEEE 2014

Computational Geometry

CSE373: Data Structures & Algorithms Lecture 28: Final review and class wrap-up. Nicki Dell Spring 2014

Nearest neighbor classification DSE 220

CS 229 Midterm Review

Multidimensional Indexes [14]

Physical Level of Databases: B+-Trees

Chapter 5 Efficient Memory Information Retrieval

Horn Formulae. CS124 Course Notes 8 Spring 2018

Announcements. Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday

Clustering. Unsupervised Learning

Algorithms for Nearest Neighbors

Material You Need to Know

Recitation 9. Prelim Review

Motivation for B-Trees

11. APPROXIMATION ALGORITHMS

Query Processing and Advanced Queries. Advanced Queries (2): R-TreeR

CSC 261/461 Database Systems Lecture 17. Fall 2017

Nominal Data. May not have a numerical representation Distance measures might not make sense. PR and ANN

CS369G: Algorithmic Techniques for Big Data Spring

Informed Search A* Algorithm

All lecture slides will be available at CSC2515_Winter15.html

Machine Learning (CSE 446): Concepts & the i.i.d. Supervised Learning Paradigm

We will show that the height of a RB tree on n vertices is approximately 2*log n. In class I presented a simple structural proof of this claim:

Clustering Billions of Images with Large Scale Nearest Neighbor Search

Semi-supervised learning and active learning

15-451/651: Design & Analysis of Algorithms November 4, 2015 Lecture #18 last changed: November 22, 2015

Algorithms for Sensor-Based Robotics: Sampling-Based Motion Planning

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19

6.856 Randomized Algorithms

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17

Polynomial-Time Approximation Algorithms

V Advanced Data Structures

Topic: Local Search: Max-Cut, Facility Location Date: 2/13/2007

Transcription:

Case Study 2: Document Retrieval. Locality-Sensitive Hashing: Random Projections for NN Search. Machine Learning for Big Data, CSE547/STAT548, University of Washington. Sham Kakade, April 18, 2017.

Announcements: HW2 posted. Project milestones: start early; lit. review (>= 3 papers read carefully); first rounds of experiments. Today: review of ball trees and cover trees, then locality-sensitive hashing.

Case Study 2: Document Retrieval. Task Description: Finding Similar Items. Machine Learning for Big Data, CSE547/STAT548, University of Washington. Sham Kakade, April 13, 2017.

Where is FAST similarity search important?

1-Nearest Neighbor. Articles. Query: 1-NN. Goal: Formulation:

KD-Tree Construction. Keep one additional piece of information at each node: the (tight) bounds of the points at or below this node.
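A minimal construction sketch under those conventions: split on one dimension per level at the median, and store the tight bounding box of the points at or below each node. Points are assumed to be d-dimensional tuples; the Node class and its field names are ours, not the lecture's.

```python
from statistics import median

class Node:
    def __init__(self, points, dim, d):
        # Tight bounds of the points at or below this node, per dimension.
        self.bounds = [(min(p[k] for p in points),
                        max(p[k] for p in points)) for k in range(d)]
        if len(points) <= 1:
            self.points, self.left, self.right = points, None, None
            return
        # Each level compares against one dimension; split at the median.
        self.split_dim = dim
        self.split_val = median(p[dim] for p in points)
        left = [p for p in points if p[dim] <= self.split_val]
        right = [p for p in points if p[dim] > self.split_val]
        if not left or not right:   # degenerate: all equal on this dim
            self.points, self.left, self.right = points, None, None
            return
        self.points = None
        self.left = Node(left, (dim + 1) % d, d)
        self.right = Node(right, (dim + 1) % d, d)
```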

Nearest Neighbor with KD Trees. Traverse the tree looking for the nearest neighbor of the query point. Examine nearby points first: explore the branch of the tree closest to the query point first. When we reach a leaf node: compute the distance to each point in the node. Then backtrack and try the other branch at each node visited. Each time a new closest node is found, update the distance bound. Using the distance bound and the bounding box of each node: prune parts of the tree that could NOT include the nearest neighbor.
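A sketch of exactly this search, assuming the Node class from the construction sketch above. The helper dist_to_box is the standard lower bound from the query to a node's bounding box; a subtree is pruned whenever that bound cannot beat the current best distance.

```python
import math

def dist_to_box(q, bounds):
    # Lower bound on the distance from q to any point inside the box.
    return math.sqrt(sum(max(lo - qk, 0, qk - hi) ** 2
                         for qk, (lo, hi) in zip(q, bounds)))

def nn_search(node, q, best=(math.inf, None)):
    # Prune: this subtree cannot contain anything closer than the bound.
    if node is None or dist_to_box(q, node.bounds) >= best[0]:
        return best
    if node.points is not None:            # leaf: check every point
        for p in node.points:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)              # update the distance bound
        return best
    # Examine nearby points first: closer branch, then backtrack.
    near, far = ((node.left, node.right)
                 if q[node.split_dim] <= node.split_val
                 else (node.right, node.left))
    best = nn_search(near, q, best)
    return nn_search(far, q, best)
```

For example, nn_search(Node(points, 0, d), q) returns a (distance, point) pair for the exact nearest neighbor.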

Complexity. For (nearly) balanced, binary trees:
Construction. Size: O(N). Depth: O(log N). Median + send points left/right: O(N) per level. Construction time: O(N log N).
1-NN query. Traverse down the tree to the starting point: O(log N). Maximum backtrack and traverse: O(N) in the worst case. Complexity range: O(log N) to O(N) per query. Under some assumptions on the distribution of points, we get O(log N), but exponential in d (see citations in reading).

KD-trees: What about NN searches in high dimensions? What is going wrong? Can this be easily fixed? What do we have to utilize? Utilize the triangle inequality of the metric. New ideas: ball trees and cover trees.

Ball Trees.

Ball Tree Construction. Node: every node defines a ball (hypersphere) containing a subset of the points (to be searched): a center, and a (tight) radius of the points. Construction: Root: start with a ball which contains all the data. Then take a ball and make two children (nodes) as follows: make two spheres, assign each point (in the parent sphere) to its closer sphere, and make the two spheres in a reasonable manner.

Ball Tree Search. Given a point x, how do we find its nearest neighbor quickly? Approach: Start: follow a greedy path through the tree. Backtrack and prune: rule out other paths based on the triangle inequality (just like in KD-trees). How good is it? Guarantees: Practice:
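The pruning rule is the triangle inequality: for any point p inside a ball with center c and radius r, dist(q, p) >= dist(q, c) - r. A minimal sketch of the greedy-descend-then-backtrack search under that bound, assuming nodes with center, radius, points (at leaves only), and two children at every internal node; these field names are ours.

```python
import math

def ball_lower_bound(q, center, radius):
    # Triangle inequality: every point in the ball is at least
    # dist(q, center) - radius away from the query q.
    return max(math.dist(q, center) - radius, 0.0)

def ball_search(node, q, best=(math.inf, None)):
    if node is None or ball_lower_bound(q, node.center, node.radius) >= best[0]:
        return best                        # prune this ball entirely
    if node.points is not None:            # leaf: check every point
        for p in node.points:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)
        return best
    # Greedy path: descend into the closer child first, then backtrack.
    for child in sorted((node.left, node.right),
                        key=lambda c: math.dist(q, c.center)):
        best = ball_search(child, q, best)
    return best
```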

Cover trees. What about exact NNs in general metric spaces? Same idea: utilize the triangle inequality of the metric (so allow for an arbitrary metric). What does the dimension even mean? The cover-tree idea:

Intrinsic Dimension. How does the volume grow, from radius R to 2R? Can we relax this idea to get at the intrinsic dimension? This is the doubling dimension: the smallest D such that every ball of radius 2R can be covered by at most 2^D balls of radius R.

Cover trees: data structure. Ball trees: each node had an associated center (tight), radius, and points. Cover trees: center (tight), radius, points.

Cover Tree Complexity. Construction. Size: Construction time: 1-NN query: check all paths with the triangle inequality. Maximum time complexity: Under the assumption that the doubling dimension is D. A provable method for data structure construction.

kd-trees Wrapping Up. Important points: Tons of variants: on the construction of trees (heuristics for splitting, stopping, representing branches, ...), and other representational data structures for fast NN search (e.g., cover trees, ball trees, ...). Nearest neighbor search: the distance metric and data representation are crucial to the answer returned (for both). High dimensional spaces are hard! The number of kd-tree searches can be exponential in dimension. Rule of thumb: need N >> 2^d; typically useless. Ball trees and cover trees are more effective here!

What you need to know. The document retrieval task. Document representation (bag of words), tf-idf. Also, think about image search! Nearest neighbor search: formulation; different distance metrics and sensitivity to the choice; challenges with large N, d. kd-trees for nearest neighbor search: construction of the tree; the NN search algorithm using the tree; complexity of construction and query; challenges with large d.

Case Study 2: Document Retrieval. Locality-Sensitive Hashing: Random Projections for NN Search. Machine Learning for Big Data, CSE547/STAT548, University of Washington. Sham Kakade, April 18, 2017.

Intuition (?): NN in 1D and Sorting. How do we do 1-NN searches in 1 dimension? Pre-processing time: Query time:
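The 1-D answer the slide is hinting at: sort once (O(N log N) pre-processing), then binary-search each query (O(log N)). A sketch using Python's bisect, assuming a non-empty list of scalar points:

```python
import bisect

def preprocess(points):
    return sorted(points)          # O(N log N), done once

def nn_1d(index, q):
    # O(log N): find the insertion point for q, then the nearest
    # neighbor must be one of the two values adjacent to it.
    i = bisect.bisect_left(index, q)
    candidates = index[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda p: abs(p - q))
```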

Using Hashing to Find Neighbors. KD-trees are cool, but: non-trivial to implement efficiently; problems with high-dimensional data. Approximate neighbor finding: don't find the exact neighbor, but that's OK for many apps, especially with Big Data. What if we could use hash functions: hash elements into buckets; look for neighbors that fall in the same bucket as x. But, by design...

What to hash? Before: we were hashing words/strings. Remember, we can think of hash functions abstractly. Idea of LSH: try to hash similar items into the same buckets and different items into different buckets.
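The bucket idea in two functions, assuming some hash function h chosen so that similar items tend to collide (choosing such an h is exactly what LSH is about); only items in x's own bucket are ever compared:

```python
from collections import defaultdict

def build_buckets(items, h):
    buckets = defaultdict(list)
    for x in items:
        buckets[h(x)].append(x)    # one pass over the data
    return buckets

def same_bucket_candidates(buckets, h, x):
    # But, by design, true neighbors that hashed to a
    # different bucket are missed.
    return buckets.get(h(x), [])
```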

Locality Sensitive Hashing (LSH). Suppose we have a set of functions H and a distribution over these functions. An LSH family H satisfies (for example), for some distance function d, for r > 0, α > 1, 1 > P1 > P2 > 0:
if d(x, x′) ≤ r, then Pr_H(h(x) = h(x′)) is high, with prob ≥ P1;
if d(x, x′) > α·r, then Pr_H(h(x) = h(x′)) is low, with prob ≤ P2;
(in between, we make no claim about the probability).

LSH: basic paradigm. Step 0: pick a simple way to construct LSH functions. Step 1: (amplification) make another hash function by repeating this construction. Step 2: the output of this function specifies the index of a bucket. Step 3: use multiple hash tables; for recall, search for similar items in the same buckets.

Example: hashing binary strings. Suppose x and x′ are binary strings, with the Hamming distance metric ‖x − x′‖. What is a simple family of hash functions? Suppose x and x′ are R-close; what is P1? Suppose ‖x − x′‖ > cR; what is P2?

Amplification: improving P1 and P2. Now the hash function is: The choice of m is a parameter.
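The simple family here is bit sampling: pick a random coordinate i and let h(x) = x_i. Two d-bit strings then collide with probability exactly 1 − ‖x − x′‖/d, so P1 = 1 − R/d and P2 = 1 − cR/d. A sketch, including the amplification step with parameter m; the function names are ours:

```python
import random

def make_bit_sampler(d):
    # LSH family for Hamming distance on d-bit strings: hash x
    # to one randomly chosen bit. Collision probability is
    # 1 - dist(x, x') / d.
    i = random.randrange(d)
    return lambda x: x[i]

def make_amplified(d, m):
    # Amplification: concatenate m independent bit samplers.
    # Collision probability becomes (1 - dist/d)^m, which widens
    # the gap between P1^m (near pairs) and P2^m (far pairs).
    hs = [make_bit_sampler(d) for _ in range(m)]
    return lambda x: tuple(h(x) for h in hs)
```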

Review: Random Projection Illustration. Pick a random vector v: independent Gaussian coordinates. Preserves separability for most vectors. Gets better with more random vectors.

Multiple Random Projections: Approximating Dot Products. Pick m random vectors v(i): independent Gaussian coordinates. Approximate dot products: cheaper, e.g., learn in a smaller m-dimensional space. Only a logarithmic number of dimensions is needed! For N data points, to approximate dot products within ε > 0: m = O(log N / ε²). But all sparsity is lost.
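A NumPy sketch of this construction, assuming the rows of X are the data points. With m = O(log N / ε²) Gaussian directions, all pairwise dot products are preserved within ε with high probability (Johnson-Lindenstrauss), but the dense Gaussian vectors destroy any sparsity in X:

```python
import numpy as np

def random_projection(X, m, seed=0):
    # m random vectors v(i) with independent Gaussian coordinates,
    # scaled so dot products are preserved in expectation:
    # E[(proj x) . (proj y)] = x . y.
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(X.shape[1], m)) / np.sqrt(m)
    return X @ P   # result is dense even if X was sparse
```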

LSH Example function: Sparser Random Projection for Dot Products. Pick a random vector v. Simple 0/1 projection: h(x) = 1[v·x ≥ 0]. Now each vector is approximated by a single bit. This is an LSH function, though with poor α and P2.

LSH Example continued: Amplification with multiple projections. Pick random vectors v(i). Simple 0/1 projection: φ_i(x) = 1[v(i)·x ≥ 0]. Now each vector is approximated by a bit-vector. Dot-product approximation:
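A sketch of the 0/1 projection and the dot-product recovery, assuming h(x) = 1[v·x ≥ 0] as above. For Gaussian v, Pr[one bit differs] = angle(x, x′)/π, so the Hamming fraction of the m-bit signatures estimates the angle, which can be inverted to a dot product:

```python
import numpy as np

def simhash(X, m, seed=0):
    # phi_i(x) = 1 if v(i) . x >= 0 else 0, for m Gaussian v(i):
    # each row of X becomes an m-bit vector.
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(X.shape[1], m))
    return (X @ V >= 0).astype(np.uint8)

def approx_dot(x, y, bx, by):
    # Hamming fraction estimates angle(x, y) / pi; invert it.
    angle = np.pi * np.mean(bx != by)
    return np.linalg.norm(x) * np.linalg.norm(y) * np.cos(angle)
```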

LSH for Approximate Neighbor Finding. Very similar elements fall in exactly the same bin. And nearby bins are also nearby. Simple neighbor finding with LSH: for bins b of increasing Hamming distance to φ(x): look for neighbors of x in bin b; stop when we run out of time. Pick m such that N/2^m is smallish, and use multiple tables.

LSH: using multiple tables.
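A sketch of the whole pipeline with L tables, assuming the SimHash signatures above as the bin index. Bins are probed in order of increasing Hamming distance from φ(x), with a flip budget standing in for "stop when we run out of time"; the function names are ours:

```python
import numpy as np
from collections import defaultdict
from itertools import combinations

def build_tables(X, m, L, seed=0):
    # L independent hash tables, each keyed by an m-bit SimHash code.
    rng = np.random.default_rng(seed)
    tables = []
    for _ in range(L):
        V = rng.normal(size=(X.shape[1], m))
        table = defaultdict(list)
        for idx, bits in enumerate((X @ V >= 0).astype(np.uint8)):
            table[bits.tobytes()].append(idx)
        tables.append((V, table))
    return tables

def lsh_candidates(tables, x, max_flips=1):
    # Probe x's own bin in every table, then all bins at Hamming
    # distance 1, ..., max_flips (nearby bins are also nearby).
    found = set()
    for V, table in tables:
        bits = (x @ V >= 0).astype(np.uint8)
        for k in range(max_flips + 1):
            for flips in combinations(range(bits.size), k):
                probe = bits.copy()
                probe[list(flips)] ^= 1
                found.update(table.get(probe.tobytes(), []))
    return found
```

Candidates are then ranked by exact distance; choosing m so that N/2^m is small keeps each bin cheap, and the multiple tables recover the recall a single table loses.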

NN complexities.

Hash Kernels: Even Sparser LSH for Learning. Two big problems with random projections: the data is sparse, but the random projection can be a lot less sparse; and you have to sample m huge random projection vectors. And we still have the problem with new dimensions, e.g., new words. Hash Kernels: a very simple but powerful idea: combine sketching for learning with random projections. Pick 2 hash functions: h, just like in Count-Min hashing; ξ, a sign hash function, which removes the bias found in Count-Min hashing (see homework). Define a kernel, a projection ϕ for x: ϕ_i(x) = Σ_{j: h(j)=i} ξ(j) x_j.
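A minimal sketch of this projection (feature hashing), assuming sparse vectors given as Python dicts {feature_name: value}, with two illustrative hash functions built from hashlib; the names feature_hash and _h are ours, not the lecture's:

```python
import hashlib

def _h(key, m, salt):
    # Illustrative hash: map a feature name to a value in [0, m).
    digest = hashlib.md5((salt + key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % m

def feature_hash(x, m):
    # phi_i(x) = sum over j with h(j) = i of xi(j) * x_j.
    # The projection is defined implicitly by h and xi, so new
    # features (e.g., new words) need no precomputed vectors.
    phi = [0.0] * m
    for j, value in x.items():
        i = _h(j, m, "bucket")                 # h: bucket, as in Count-Min
        sign = 1 if _h(j, 2, "sign") else -1   # xi: +/-1 sign hash
        phi[i] += sign * value                 # sign removes the bias
    return phi
```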

Hash Kernels, Random Projections and Sparsity. Hash kernel as a random projection: what is the random projection vector for coordinate i of ϕ? The projection is implicitly defined by h and ξ, so there is no need to compute it a priori, and it automatically deals with new dimensions. Sparsity of ϕ, if x has s non-zero coordinates:

What you need to know. Locality-Sensitive Hashing (LSH): nearby points hash to the same or nearby bins. LSH uses random projections: only O(log N / ε²) vectors are needed, but the vectors and results are not sparse. Use LSH for nearest neighbors by mapping elements into bins: the bin index is defined by the bit vector from the LSH; find nearest neighbors by going through bins. Hash kernels: a sparse representation for feature vectors. Very simple: use two hash functions. (You can even use one hash function and take the least significant bit to define ξ.) Quickly generate the projection ϕ(x) and learn in the projected space.