Geometric data structures:


Geometric data structures: Machine Learning for Big Data, CSE547/STAT548, University of Washington. Sham Kakade, 2017.

Announcements
- HW3 posted
- Today:
  - Review: LSH for Euclidean distance
  - Other ideas: kd-trees, ball trees, cover trees

Image Search

LSH for Euclidean distance
- The family of hash functions:
- Recall R, cR, P1, P2
- Pre-processing time:
- Query time:
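
The slide leaves the hash family blank. One standard family for Euclidean distance is the random-projection hash h(x) = floor((a . x + b) / w), with a drawn from a Gaussian and b uniform in [0, w). The sketch below assumes that family; the bucket width `w`, the number of hashes, and the names `make_hash` and `signature` are illustrative choices, not taken from the lecture.

```python
import numpy as np

def make_hash(dim, w, rng):
    """Draw one hash function h(x) = floor((a . x + b) / w)."""
    a = rng.standard_normal(dim)   # random Gaussian direction
    b = rng.uniform(0.0, w)        # random offset in [0, w)

    def h(x):
        return int(np.floor((np.dot(a, x) + b) / w))

    return h

rng = np.random.default_rng(0)
# Assumed parameters: 2-dimensional data, bucket width 4, 8 hashes per signature.
hashes = [make_hash(dim=2, w=4.0, rng=rng) for _ in range(8)]

def signature(x):
    """Concatenate the hash values; points sharing a signature share a bucket."""
    return tuple(h(x) for h in hashes)
```

Points that are close in Euclidean distance project to nearby values of a . x, so they are more likely to fall in the same width-w bucket than points that are far apart.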

What other guarantees might we hope for?
- Recall sorting:
- LSH:
- Voronoi:
- How about other geometric data structures?
- What is the key inequality to exploit?

KD-Trees
- Smarter approach: kd-trees
  - Structured organization of documents
  - Recursively partitions points into axis-aligned boxes
- Enables more efficient pruning of the search space
  - Examine nearby points first
  - Ignore any points that are farther than the nearest point found so far
- kd-trees work well in low to medium dimensions (we'll get back to this)

KD-Tree Construction
Start with a list of d-dimensional points.

Pt   X     Y
1    0.00  0.00
2    1.00  4.31
3    0.13  2.85

KD-Tree Construction
Split the points into 2 groups by:
- Choosing dimension $d_j$ and value $V$ (methods to be discussed)
- Separating the points into those with $x^i_{d_j} > V$ and those with $x^i_{d_j} \le V$, where $x^i_{d_j}$ is coordinate $d_j$ of point $x^i$

X > .5?
  NO:   Pt 1 (0.00, 0.00), Pt 3 (0.13, 2.85)
  YES:  Pt 2 (1.00, 4.31)
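
The split on this slide can be reproduced directly on the three example points. This is a minimal sketch; the helper name `split` is hypothetical, and indices are 0-based, so index 0 is Pt 1 on the slide.

```python
import numpy as np

# The three example points from the slide: columns are X and Y.
points = np.array([[0.00, 0.00],
                   [1.00, 4.31],
                   [0.13, 2.85]])

def split(points, idx, dim, value):
    """Partition point indices by comparing one coordinate against a value."""
    idx = np.asarray(idx)
    goes_right = points[idx, dim] > value
    return idx[~goes_right], idx[goes_right]   # (<= value, > value)

# Split on X at 0.5, as in the slide's "X > .5" node.
no_side, yes_side = split(points, [0, 1, 2], dim=0, value=0.5)
```

Here `no_side` holds Pt 1 and Pt 3 and `yes_side` holds Pt 2, matching the two groups shown on the slide.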

KD-Tree Construction
Consider each group separately and possibly split again (along the same or a different dimension).
- Stopping criterion to be discussed

X > .5?
  NO:   Y > .1?
          NO:   Pt 1 (0.00, 0.00)
          YES:  Pt 3 (0.13, 2.85)
  YES:  Pt 2 (1.00, 4.31)

KD-Tree Construction
- Continuing to split the points in each set creates a binary tree structure
- Each leaf node contains a list of points
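
The recursive splitting can be sketched as below. The lecture defers the choice of split dimension, split value and stopping rule, so this example assumes three simple ones: cycle through the dimensions by depth, split at the median, and stop when a node holds at most `leaf_size` points (or when the split would leave one side empty). The dict layout and the names `build` and `leaves` are illustrative.

```python
import numpy as np

def build(points, idx=None, leaf_size=1, depth=0):
    """Return a nested dict: internal nodes hold a split, leaves hold point indices."""
    if idx is None:
        idx = np.arange(len(points))
    pts = points[idx]
    dim = depth % points.shape[1]            # assumed rule: cycle through dimensions
    value = float(np.median(pts[:, dim]))    # assumed rule: split at the median
    right = pts[:, dim] > value
    # Stop when the node is small enough or the split would not separate anything.
    if len(idx) <= leaf_size or right.all() or not right.any():
        return {"leaf": True, "idx": idx}
    return {"leaf": False, "dim": dim, "value": value,
            "left": build(points, idx[~right], leaf_size, depth + 1),
            "right": build(points, idx[right], leaf_size, depth + 1)}

def leaves(node):
    """List the point indices stored in each leaf, left to right."""
    if node["leaf"]:
        return [node["idx"].tolist()]
    return leaves(node["left"]) + leaves(node["right"])

# The three example points from the construction slides (0-based indices).
points = np.array([[0.00, 0.00], [1.00, 4.31], [0.13, 2.85]])
tree = build(points)
```

With these assumed rules the root splits on X and the left child splits on Y, which gives the same tree shape as the slides although the split values differ from the .5 and .1 used there.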

KD-Tree Construction
- Keep one additional piece of information at each node: the (tight) bounds of the points at or below this node
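
The tight bounds are what make pruning possible later: the distance from a query to a node's box is a lower bound on its distance to every point stored below that node. A minimal sketch, with hypothetical helper names `bounding_box` and `dist_to_box`:

```python
import numpy as np

def bounding_box(points):
    """Tight axis-aligned bounds (lo, hi) of a set of points."""
    return points.min(axis=0), points.max(axis=0)

def dist_to_box(q, lo, hi):
    """Euclidean distance from q to the nearest point of the box [lo, hi]."""
    nearest = np.clip(q, lo, hi)     # clamp each coordinate into the box
    return float(np.linalg.norm(q - nearest))

# Box around Pt 1 and Pt 3 from the construction example.
pts = np.array([[0.00, 0.00], [0.13, 2.85]])
lo, hi = bounding_box(pts)
```

A query inside the box has distance 0 to it, so such a node can never be pruned on the basis of its box alone.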

KD-Tree Construction
Use heuristics to make splitting decisions:
- Which dimension do we split along?
- Which value do we split at?
- When do we stop?

Many heuristics
- Median heuristic
- Center-of-range heuristic
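
The slide names the two heuristics without defining them. The sketch below assumes the usual reading: both split along the widest dimension, the median heuristic at the median coordinate and the center-of-range heuristic halfway between the minimum and maximum coordinate. The function names are illustrative.

```python
import numpy as np

def widest_dimension(pts):
    """Dimension in which the points have the largest range."""
    return int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))

def median_split(pts, dim):
    """Median heuristic: roughly half of the points land on each side."""
    return float(np.median(pts[:, dim]))

def center_of_range_split(pts, dim):
    """Center-of-range heuristic: cut the bounding box in half along dim."""
    return float((pts[:, dim].min() + pts[:, dim].max()) / 2.0)

# Made-up points with one outlier at X = 10 to show how the two rules differ.
pts = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, 0.1], [10.0, 0.3]])
dim = widest_dimension(pts)
```

On this data the median split (1.5) balances the point counts, while the center-of-range split (5.0) balances the box sizes and leaves three points on one side.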

Nearest Neighbor with KD Trees
- Traverse the tree looking for the nearest neighbor of the query point

Nearest Neighbor with KD Trees
- Examine nearby points first: explore the branch of the tree closest to the query point first

Nearest Neighbor with KD Trees
- When we reach a leaf node: compute the distance to each point in the node

Nearest Neighbor with KD Trees
- Then backtrack and try the other branch at each node visited

Nearest Neighbor with KD Trees
- Each time a new closest point is found, update the distance bound

Nearest Neighbor with KD Trees
- Using the distance bound and the bounding box of each node: prune parts of the tree that could NOT include the nearest neighbor
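
The search steps on the preceding slides (descend toward the query, scan the leaf, backtrack, prune by bounding box) fit in one short recursive routine. This is a sketch under assumed construction rules (median split on the widest dimension, leaves of at most `leaf_size` points); the class and function names are illustrative.

```python
import numpy as np

class Node:
    """kd-tree node storing its tight bounding box; leaves keep point indices."""
    def __init__(self, points, idx, leaf_size=2):
        pts = points[idx]
        self.idx, self.lo, self.hi = idx, pts.min(axis=0), pts.max(axis=0)
        self.left = self.right = None
        dim = int(np.argmax(self.hi - self.lo))        # widest dimension
        value = np.median(pts[:, dim])
        if value >= self.hi[dim]:                      # degenerate median: use the center
            value = (self.lo[dim] + self.hi[dim]) / 2.0
        right = pts[:, dim] > value
        if len(idx) > leaf_size and right.any() and not right.all():
            self.left = Node(points, idx[~right], leaf_size)
            self.right = Node(points, idx[right], leaf_size)

def dist_to_box(q, node):
    """Lower bound on the distance from q to any point stored under node."""
    return float(np.linalg.norm(q - np.clip(q, node.lo, node.hi)))

def nearest(points, node, q, best=(np.inf, -1)):
    """Return (distance, index) of the nearest neighbor of q."""
    if dist_to_box(q, node) >= best[0]:
        return best                                    # prune: box cannot hold a closer point
    if node.left is None:                              # leaf: check every point in it
        d = np.linalg.norm(points[node.idx] - q, axis=1)
        j = int(np.argmin(d))
        return (float(d[j]), int(node.idx[j])) if d[j] < best[0] else best
    # Explore the closer branch first, then backtrack into the other one.
    for child in sorted((node.left, node.right), key=lambda c: dist_to_box(q, c)):
        best = nearest(points, child, q, best)
    return best

rng = np.random.default_rng(0)
points = rng.random((200, 3))
root = Node(points, np.arange(200))
dist, index = nearest(points, root, np.array([0.5, 0.5, 0.5]))
```

Visiting the closer child first matters because it tends to shrink the distance bound early, which lets the backtracking step discard the sibling subtree more often.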

Complexity
For (nearly) balanced binary trees:
- Construction
  - Size: O(N) nodes
  - Depth: O(log N)
  - Median + send points left/right: O(N) at each level of the tree
  - Construction time: O(N log N)
- 1-NN query
  - Traverse down tree to starting point: O(log N)
  - Maximum backtrack and traverse: O(N) in the worst case
  - Complexity range: O(log N) to O(N)
- Under some assumptions on the distribution of points, we get O(log N), but exponential in d (see citations in reading)
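
The construction time for a balanced tree follows from a standard divide-and-conquer recurrence. The derivation below is a sketch that assumes the median along the chosen dimension can be found in linear time; T(N) denotes the time to build a tree on N points and c is a hypothetical constant for the per-point work of finding the median and sending points left or right.

```latex
\begin{aligned}
T(N) &= 2\,T(N/2) + cN \\
     &= 4\,T(N/4) + 2cN \\
     &= \dots \\
     &= 2^{k}\,T(N/2^{k}) + k\,cN \\
     &= N\,T(1) + cN\log_2 N \qquad (k = \log_2 N) \\
     &= O(N \log N)
\end{aligned}
```

The same picture explains the depth: each split halves the number of points, so a leaf is reached after about log_2 N splits.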

Complexity

K-NN with KD Trees
- Exactly the same algorithm, but maintain the distance bound as the distance to the furthest of the current k nearest neighbors
- Complexity is:
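
One way to maintain the distance to the furthest of the current k neighbors is a max-heap of size k. The sketch below assumes that approach and a center-of-range split for the tree; `build`, `box_dist` and `knn` are hypothetical names, and the tree is stored as nested tuples to keep the example short.

```python
import heapq
import numpy as np

def build(points, idx, leaf_size=4):
    """kd-tree as nested tuples (lo, hi, idx, left, right); leaves have no children."""
    pts = points[idx]
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    dim = int(np.argmax(hi - lo))
    right = pts[:, dim] > (lo[dim] + hi[dim]) / 2.0   # center-of-range split (assumed)
    if len(idx) <= leaf_size or not right.any() or right.all():
        return (lo, hi, idx, None, None)
    return (lo, hi, idx,
            build(points, idx[~right], leaf_size),
            build(points, idx[right], leaf_size))

def box_dist(q, node):
    """Distance from q to the node's bounding box."""
    return float(np.linalg.norm(q - np.clip(q, node[0], node[1])))

def knn(points, node, q, k, heap=None):
    """Return a heap of (-distance, index) holding the k nearest points found."""
    heap = [] if heap is None else heap
    # Bound = distance to the furthest of the current k neighbors (infinite until k are found).
    bound = -heap[0][0] if len(heap) == k else np.inf
    if box_dist(q, node) >= bound:
        return heap                                    # prune
    lo, hi, idx, left, right = node
    if left is None:                                   # leaf
        for i in idx:
            d = float(np.linalg.norm(points[i] - q))
            if len(heap) < k:
                heapq.heappush(heap, (-d, int(i)))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, int(i)))  # evict the current furthest
        return heap
    for child in sorted((left, right), key=lambda c: box_dist(q, c)):
        knn(points, child, q, k, heap)
    return heap

rng = np.random.default_rng(1)
points = rng.random((300, 2))
root = build(points, np.arange(300))
neighbors = sorted((-neg_d, i) for neg_d, i in knn(points, root, np.array([0.5, 0.5]), k=5))
```

Storing negated distances turns Python's min-heap into a max-heap, so the current pruning bound is always available at `heap[0]`.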

Approximate K-NN with KD Trees
- Before: prune when distance to bounding box > r
- Now: prune when distance to bounding box > r/α, for an approximation factor α > 1
- Will prune more than allowed, but can guarantee that if we return a neighbor at distance r, then there is no neighbor closer than r/α
- In practice this bound is loose: the returned neighbor can be closer to optimal
- Saves lots of search time at little cost in quality of the nearest neighbor
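
The relaxed pruning rule is a one-line change to the exact search. The sketch below shows it for a single nearest neighbor, assuming an approximation factor `alpha` >= 1 (the symbol is an assumption, since the transcript lost the divisor) and a center-of-range split for the tree. All names are illustrative.

```python
import numpy as np

def build(points, idx, leaf_size=4):
    """kd-tree as nested tuples (lo, hi, idx, left, right); leaves have no children."""
    pts = points[idx]
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    dim = int(np.argmax(hi - lo))
    right = pts[:, dim] > (lo[dim] + hi[dim]) / 2.0   # center-of-range split (assumed)
    if len(idx) <= leaf_size or not right.any() or right.all():
        return (lo, hi, idx, None, None)
    return (lo, hi, idx,
            build(points, idx[~right], leaf_size),
            build(points, idx[right], leaf_size))

def box_dist(q, node):
    """Distance from q to the node's bounding box."""
    return float(np.linalg.norm(q - np.clip(q, node[0], node[1])))

def approx_nearest(points, node, q, alpha=1.0, best=(np.inf, -1)):
    """Nearest-neighbor search that prunes a box when its distance exceeds r / alpha."""
    if box_dist(q, node) > best[0] / alpha:            # alpha = 1 gives the exact search
        return best
    lo, hi, idx, left, right = node
    if left is None:                                   # leaf: check every point in it
        d = np.linalg.norm(points[idx] - q, axis=1)
        j = int(np.argmin(d))
        return (float(d[j]), int(idx[j])) if d[j] < best[0] else best
    for child in sorted((left, right), key=lambda c: box_dist(q, c)):
        best = approx_nearest(points, child, q, alpha, best)
    return best

rng = np.random.default_rng(2)
points = rng.random((500, 4))
root = build(points, np.arange(500))
r, index = approx_nearest(points, root, np.array([0.5, 0.5, 0.5, 0.5]), alpha=2.0)
```

Every pruned box lies farther than r/alpha from the query, so no point can be closer than r/alpha, which is the guarantee stated on the slide.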

KD-trees: What about NN searches in high dimensions?
- What is going wrong?
- Can this be easily fixed?
- What do we have to utilize?
  - Utilize the triangle inequality of the metric
- New ideas: ball trees and cover trees

Ball Trees

Ball Tree Construction
- Node: every node defines a ball (hypersphere) containing a subset of the points (to be searched)
  - A center
  - A (tight) radius of the points
- Construction:
  - Root: start with a ball which contains all the data
  - Take a ball and make two children (nodes) as follows:
    - Make two spheres, and assign each point (in the parent sphere) to its closer sphere
    - Make the two spheres in a reasonable manner
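
The slide leaves open what a "reasonable manner" is. The sketch below assumes one common choice: pick two far-apart pivot points (the point farthest from the center, then the point farthest from that one) and assign every point to its closer pivot. Using the mean as the center, the `leaf_size` stopping rule, and the class name `Ball` are also assumptions.

```python
import numpy as np

class Ball:
    """Ball-tree node: a center, a tight radius, and either two children or a leaf."""
    def __init__(self, points, idx, leaf_size=4):
        pts = points[idx]
        self.idx = idx
        self.center = pts.mean(axis=0)                      # assumed center: the mean
        from_center = np.linalg.norm(pts - self.center, axis=1)
        self.radius = float(from_center.max())              # tight radius for this center
        self.children = []
        if len(idx) > leaf_size and self.radius > 0.0:
            # Assumed pivot rule: a is farthest from the center, b is farthest from a.
            a = pts[np.argmax(from_center)]
            b = pts[np.argmax(np.linalg.norm(pts - a, axis=1))]
            closer_to_a = (np.linalg.norm(pts - a, axis=1)
                           <= np.linalg.norm(pts - b, axis=1))
            # Each point goes to the child whose pivot is closer.
            self.children = [Ball(points, idx[closer_to_a], leaf_size),
                             Ball(points, idx[~closer_to_a], leaf_size)]

rng = np.random.default_rng(0)
points = rng.random((100, 3))
root = Ball(points, np.arange(100))
```

Unlike kd-tree boxes, the two child balls may overlap; what the search relies on is only that each ball contains all of its own points.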

Ball Tree Search
- Given point x, how do we find its nearest neighbor quickly?
- Approach:
  - Start: follow a greedy path through the tree
  - Backtrack and prune: rule out other paths based on the triangle inequality (just like in kd-trees)
- How good is it?
  - Guarantees:
  - Practice:
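
The triangle inequality gives the pruning rule: every point p in a ball with center c and radius ρ satisfies d(x, p) >= d(x, c) - ρ. The sketch below uses that bound for both the greedy descent and the backtracking step. The construction rule (two far-apart pivots) and all names are assumptions for illustration.

```python
import numpy as np

class Ball:
    """Ball-tree node built by assigning points to the closer of two pivots (assumed rule)."""
    def __init__(self, points, idx, leaf_size=4):
        pts = points[idx]
        self.idx, self.center = idx, pts.mean(axis=0)
        from_center = np.linalg.norm(pts - self.center, axis=1)
        self.radius = float(from_center.max())
        self.children = []
        if len(idx) > leaf_size and self.radius > 0.0:
            a = pts[np.argmax(from_center)]
            b = pts[np.argmax(np.linalg.norm(pts - a, axis=1))]
            near_a = np.linalg.norm(pts - a, axis=1) <= np.linalg.norm(pts - b, axis=1)
            self.children = [Ball(points, idx[near_a], leaf_size),
                             Ball(points, idx[~near_a], leaf_size)]

def lower_bound(x, ball):
    """Triangle inequality: no point in the ball is closer to x than this."""
    return max(0.0, float(np.linalg.norm(x - ball.center)) - ball.radius)

def nearest(points, ball, x, best=(np.inf, -1)):
    """Return (distance, index) of the nearest neighbor of x."""
    if lower_bound(x, ball) >= best[0]:
        return best                                    # prune this ball
    if not ball.children:                              # leaf: check every point in it
        d = np.linalg.norm(points[ball.idx] - x, axis=1)
        j = int(np.argmin(d))
        return (float(d[j]), int(ball.idx[j])) if d[j] < best[0] else best
    # Greedy path: try the more promising child first, then backtrack.
    for child in sorted(ball.children, key=lambda c: lower_bound(x, c)):
        best = nearest(points, child, x, best)
    return best

rng = np.random.default_rng(0)
points = rng.random((200, 5))
root = Ball(points, np.arange(200))
dist, index = nearest(points, root, np.full(5, 0.5))
```

Only distances appear in the bound, never coordinates, which is why the same idea carries over to metrics other than the Euclidean one.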

Cover trees
- What about exact NNs in general metric spaces?
- Same idea: utilize the triangle inequality of the metric (so allow for an arbitrary metric)
- What does the dimension even mean?
- Cover-tree idea:

Intrinsic Dimension
- How does the volume grow, from radius R to 2R?
- Can we relax this idea to get at the intrinsic dimension?
- This is the doubling dimension:
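
The slide leaves the answers blank. In d-dimensional Euclidean space the volume of a ball scales as R^d, so doubling the radius multiplies the volume by 2^d. A standard relaxation, shown below as a sketch, replaces volume with the number of data points in the ball; the set S and the constant c are notation assumed here, and whether this matches the lecture's exact definition is not confirmed by the transcript.

```latex
% Volume growth of a ball in R^d when the radius doubles
\frac{\operatorname{vol} B(x, 2R)}{\operatorname{vol} B(x, R)}
  = \frac{(2R)^d}{R^d} = 2^d
\qquad\Longrightarrow\qquad
d = \log_2 \frac{\operatorname{vol} B(x, 2R)}{\operatorname{vol} B(x, R)}

% Relaxation to a finite point set S: count points instead of measuring volume
|B(x, 2R) \cap S| \;\le\; c \, |B(x, R) \cap S|
\quad \text{for all } x \in S,\ R > 0,
\qquad
d_{\text{intrinsic}} \approx \log_2 c
```

A closely related definition takes the smallest d such that every ball of radius 2R can be covered by 2^d balls of radius R. Either way the quantity depends only on the metric and the data, not on the number of coordinates used to store the points.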

NN complexities