VQ Encoding is Nearest Neighbor Search


VQ Encoding is Nearest Neighbor Search
Given an input vector, find the closest codeword in the codebook and output its index. Closest is measured in squared Euclidean distance. For two vectors $(w_1, x_1, y_1, z_1)$ and $(w_2, x_2, y_2, z_2)$, the squared distance is $d = (w_1 - w_2)^2 + (x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2$.
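As a baseline, VQ encoding can be sketched as a linear scan over the codebook. This is a minimal illustration, not the method the slides go on to develop: the names `squared_distance` and `vq_encode` and the three-codeword codebook are assumptions made up for the example.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def vq_encode(vector, codebook):
    """Return the index of the codeword closest to `vector` (linear scan)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


# Hypothetical 4-dimensional codebook, for illustration only.
codebook = [(0, 0, 0, 0), (10, 10, 10, 10), (20, 0, 20, 0)]
index = vq_encode((9, 11, 8, 12), codebook)
print(index)
```

The scan costs one distance computation per codeword, which is the cost a tree-based search tries to avoid.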

kd-trees
Invented in the 1970s by Jon Bentley. The name originally meant 3d-trees, 4d-trees, etc., where k was the number of dimensions. Now, people say "kd-tree of dimension d". Idea: each level of the tree compares against one dimension. This lets us have only two children at each node (instead of 2^d).

k-d Tree (Jon Bentley, 1975)
Tree used to store spatial data. Supports nearest neighbor search and range queries. Fast look-up! When every split divides the points in half, a k-d tree has depth about log_2 n, where n is the number of points in the set. Traditionally, k-d trees store points in d-dimensional space (equivalent to vectors in d-dimensional space).

k-d tree construction
If there is just one point, form a leaf with that point. Otherwise, divide the points in half by a line perpendicular to one of the axes. Recursively construct k-d trees for the two sets of points. Division strategies:
- divide the points perpendicular to the axis with the widest spread
- divide in a round-robin fashion
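The construction can be sketched as follows, using the widest-spread strategy. This is a sketch under stated assumptions: the tuple layout `("leaf", point)` / `("node", axis, value, left, right)` and the helper `depth` are illustrative choices, and the code re-sorts the points at every level for brevity rather than presorting them once.

```python
def build(points):
    """Build a k-d tree with the points stored at the leaves.

    Returns ("leaf", point) or ("node", axis, value, left, right).
    Points whose coordinate on `axis` is below `value` end up on the left;
    ties may fall on either side in this simple version.
    """
    if not points:
        raise ValueError("need at least one point")
    if len(points) == 1:
        return ("leaf", points[0])
    dims = range(len(points[0]))
    # Pick the axis with the widest spread (max minus min).
    axis = max(dims, key=lambda a: max(p[a] for p in points)
                                   - min(p[a] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2              # divide the points in half
    value = ordered[mid][axis]           # splitting value
    return ("node", axis, value, build(ordered[:mid]), build(ordered[mid:]))


def depth(tree):
    """Number of edges on the longest root-to-leaf path."""
    if tree[0] == "leaf":
        return 0
    return 1 + max(depth(tree[3]), depth(tree[4]))


points = [(30, 40), (5, 25), (10, 12), (70, 70),
          (50, 30), (35, 45), (60, 80), (1, 10)]
tree = build(points)
print(depth(tree))
```

With eight points every split is 4/4, 2/2, 1/1, so the tree is perfectly balanced.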

k-d tree construction example


k-d tree Construction Complexity First sort the points in each dimension: O(dn log n) time and dn storage. These are stored in A[1..d,1..n] Finding the widest spread and equally dividing into two subsets can be done in O(dn) time. Constructing the k-d tree can be done in O(dn log n) and dn storage

kd-trees
Each level has a cutting dimension. Cycle through the dimensions as you walk down the tree. Each node contains a point P = (x, y). To find (x', y') you only compare the coordinate from the cutting dimension, e.g. if the cutting dimension is x, then you ask: is x' < x?

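kd-tree example
Insert: (30,40), (5,25), (10,12), (70,70), (50,30), (35,45). Resulting tree, by level and cutting dimension:
- level 0 (x): (30,40)
- level 1 (y): (5,25), (70,70)
- level 2 (x): (10,12) below (5,25); (50,30) below (70,70)
- level 3 (y): (35,45) below (50,30)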
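The insertion order of this example can be replayed with a short sketch. The class `Node` and the function `insert` are illustrative names, and sending ties to the right subtree is an assumption of this sketch.

```python
class Node:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(node, point, depth=0):
    """Insert `point` below `node`; return the (possibly new) subtree root."""
    if node is None:
        return Node(point)
    cd = depth % len(point)              # cutting dimension cycles with depth
    if point[cd] < node.point[cd]:
        node.left = insert(node.left, point, depth + 1)
    else:                                # assumption: ties go right
        node.right = insert(node.right, point, depth + 1)
    return node


root = None
for p in [(30, 40), (5, 25), (10, 12), (70, 70), (50, 30), (35, 45)]:
    root = insert(root, p)

print(root.point, root.left.point, root.right.point)
```

Only one coordinate is compared at each node, so an insertion follows a single root-to-leaf path.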

FindMin in kd-trees
FindMin(d): find the point with the smallest value in the dth dimension. Recursively traverse the tree.
- If cutdim(current_node) = d, then the minimum can't be in the right subtree, so recurse on just the left subtree. If there is no left subtree, then the current node is the min for the tree rooted at this node.
- If cutdim(current_node) ≠ d, then the minimum could be in either subtree, so recurse on both subtrees (unlike in 1-d structures, we often have to explore several paths down the tree).
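The two cases can be sketched as below. `Node` and `find_min` are illustrative names; the cutting dimension is assumed to be `depth % k`, and the nine-point tree is built by hand for the example.

```python
class Node:
    def __init__(self, point, left=None, right=None):
        self.point = point
        self.left = left
        self.right = right


def find_min(node, d, depth=0):
    """Return the point with the smallest coordinate in dimension `d`."""
    if node is None:
        return None
    cd = depth % len(node.point)         # cutting dimension at this level
    if cd == d:
        # The minimum cannot be in the right subtree.
        if node.left is None:
            return node.point
        return find_min(node.left, d, depth + 1)
    # Otherwise the minimum may be here or in either subtree.
    candidates = [node.point,
                  find_min(node.left, d, depth + 1),
                  find_min(node.right, d, depth + 1)]
    return min((c for c in candidates if c is not None), key=lambda p: p[d])


# Hand-built tree: x at the root, then y, x, y.
root = Node((51, 75),
            Node((25, 40),
                 Node((10, 30), Node((1, 10))),
                 Node((35, 90), None, Node((50, 50)))),
            Node((70, 70),
                 Node((55, 1)),
                 Node((60, 80))))

print(find_min(root, 0), find_min(root, 1))
```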

FindMin example
Tree, by level and cutting dimension:
- level 0 (x): (51,75)
- level 1 (y): (25,40), (70,70)
- level 2 (x): (10,30), (35,90), (55,1), (60,80)
- level 3 (y): (1,10), (50,50)

FindMin(x-dimension): the result is (1,10).

FindMin(y-dimension): the result is (55,1).

FindMin(y-dimension): space searched (figure).

Delete in kd-trees
Want to delete node A. Assume the cutting dimension of A is cd. In a BST, we'd findmin(A.right). Here, we have to findmin(A.right, cd). Call the result B and let it replace A, with Q the left subtree and P the remaining right subtree. Everything in Q has cd-coord < B, and everything in P has cd-coord ≥ B.

Delete in kd-trees: No Right Subtree
What if the right subtree is empty? Possible idea: find the max in the left subtree? Why might this not work? Suppose the node to delete is T = (x,y) with cutting dimension x, and findmax(T.left) returns the point (a,b). It's possible that T.left contains another point (a,c) with x = a. After (a,b) replaces T, that point would sit in the left subtree with an x-coordinate equal to the node's. Now, our equal coordinate invariant is violated!

No right subtree: Solution
- Swap the subtrees of the node to be deleted, so the old left subtree becomes the right subtree.
- B = findmin(T.left), taken in the cutting dimension of the deleted node.
- Replace the deleted node by B.
Now, if there is another point with x = a, it appears in the right subtree, where it should.
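Both deletion cases can be sketched together. The names `Node`, `insert`, `find_min` and `delete` are illustrative, the cutting dimension is assumed to be `depth % k`, and ties are assumed to go right on insertion. The small tree at the end is a made-up example in which the root has no right subtree and two points share x = 30.

```python
class Node:
    def __init__(self, point):
        self.point, self.left, self.right = point, None, None


def insert(node, point, depth=0):
    if node is None:
        return Node(point)
    cd = depth % len(point)
    if point[cd] < node.point[cd]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node


def find_min(node, d, depth):
    if node is None:
        return None
    if depth % len(node.point) == d:
        return node.point if node.left is None else find_min(node.left, d, depth + 1)
    found = [node.point, find_min(node.left, d, depth + 1),
             find_min(node.right, d, depth + 1)]
    return min((p for p in found if p is not None), key=lambda p: p[d])


def delete(node, point, depth=0):
    """Delete `point` from the subtree at `node`; return the new subtree root."""
    if node is None:
        raise KeyError(point)
    cd = depth % len(point)
    if node.point == point:
        if node.right is not None:
            # Replace by the cd-minimum of the right subtree.
            node.point = find_min(node.right, cd, depth + 1)
            node.right = delete(node.right, node.point, depth + 1)
        elif node.left is not None:
            # No right subtree: take the cd-minimum of the left subtree and
            # move what remains of the left subtree to the right.
            node.point = find_min(node.left, cd, depth + 1)
            node.right = delete(node.left, node.point, depth + 1)
            node.left = None
        else:
            return None                  # leaf: just remove it
    elif point[cd] < node.point[cd]:
        node.left = delete(node.left, point, depth + 1)
    else:
        node.right = delete(node.right, point, depth + 1)
    return node


root = None
for p in [(50, 50), (30, 60), (30, 20), (40, 10)]:
    root = insert(root, p)
root = delete(root, (50, 50))            # root has no right subtree
print(root.point, root.right.point, root.right.right.point)
```

After the deletion the other point with x = 30 sits in the right subtree of the new root, which is what the invariant requires.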

Codebook for 2-d vector

Node Structure for k-d Tree
A node has 5 fields:
- axis (splitting axis)
- value (splitting value)
- left (left subtree)
- right (right subtree)
- point (holds a point if left and right children are null)
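One way to write this node down is as a small dataclass. The class name `KDNode`, the helper `locate`, and the rule that coordinates below `value` go left are assumptions of this sketch; the three-leaf tree is a made-up example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KDNode:
    axis: Optional[int] = None                  # splitting axis (internal nodes)
    value: Optional[float] = None               # splitting value (internal nodes)
    left: Optional["KDNode"] = None             # left subtree
    right: Optional["KDNode"] = None            # right subtree
    point: Optional[Tuple[float, ...]] = None   # set only when both children are None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def locate(node: KDNode, query) -> Tuple[float, ...]:
    """Descend to the leaf whose cell contains `query` and return its point."""
    while not node.is_leaf():
        # Assumption: coordinates below the splitting value go left.
        node = node.left if query[node.axis] < node.value else node.right
    return node.point


# Hypothetical tree: split on x at 30, then the right half on y at 50.
root = KDNode(axis=0, value=30,
              left=KDNode(point=(5, 25)),
              right=KDNode(axis=1, value=50,
                           left=KDNode(point=(50, 30)),
                           right=KDNode(point=(70, 70))))

print(locate(root, (60, 60)))
```

Note that `locate` only finds the cell containing the query; the point in that cell is not necessarily the nearest neighbor.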

Nearest Neighbor Searching in kd-trees
Nearest neighbor queries are very common: given a point Q, find the point P in the data set that is closest to Q. Doesn't work: find the cell that would contain Q and return the point it contains. Reason: the nearest point to Q in space may be far from Q in the tree. E.g. NN(52,52) in the tree with points (51,75), (25,40), (70,70), (10,30), (35,90), (55,1), (60,80), (1,10), (50,50): the nearest point is (50,50), which lies in the left subtree of the root (51,75), while (52,52) falls in the right subtree.

kd-trees Nearest Neighbor
Idea: traverse the whole tree, BUT make two modifications to prune the search space:
1. Keep a variable for the closest point C found so far. Prune subtrees once their bounding boxes say that they can't contain any point closer than C.
2. Search the subtrees in the order that maximizes the chance for pruning.

Nearest Neighbor: Ideas, continued
Let d be the distance from the query point Q to the bounding box BB(T) of the subtree rooted at T. If d > dist(C, Q), then no point in BB(T) can be closer to Q than C. Hence, no reason to search the subtree rooted at T. Update the best point so far, if T is better: if dist(C, Q) > dist(T.data, Q), C := T.data. Recurse, but start with the subtree closer to Q: first search the subtree that would contain Q if we were inserting Q below T.
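The pruning rule and the traversal order can be sketched as follows. The names `Node`, `sq_dist`, `box_sq_dist` and `nearest` are illustrative; the sketch compares squared distances (which preserves the ordering), assumes the cutting dimension is `depth % k`, and tracks each subtree's bounding box as a pair of corner lists `lo` and `hi`.

```python
import math


class Node:
    def __init__(self, point, left=None, right=None):
        self.point, self.left, self.right = point, left, right


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def box_sq_dist(q, lo, hi):
    """Squared distance from q to the box [lo, hi]; 0 if q is inside."""
    return sum(max(l - x, 0, x - h) ** 2 for x, l, h in zip(q, lo, hi))


def nearest(node, q, depth=0, lo=None, hi=None, best=None):
    """Return the stored point closest to q; `best` plays the role of C."""
    if node is None:
        return best
    k = len(q)
    if lo is None:                       # the root's box is the whole space
        lo, hi = [-math.inf] * k, [math.inf] * k
    if best is not None and box_sq_dist(q, lo, hi) > sq_dist(best, q):
        return best                      # prune: the box cannot beat C
    if best is None or sq_dist(node.point, q) < sq_dist(best, q):
        best = node.point                # update the best point so far
    cd = depth % k
    split = node.point[cd]
    left_hi = hi[:cd] + [split] + hi[cd + 1:]
    right_lo = lo[:cd] + [split] + lo[cd + 1:]
    sides = [(node.left, lo, left_hi), (node.right, right_lo, hi)]
    if q[cd] >= split:                   # visit the side containing q first
        sides.reverse()
    for child, c_lo, c_hi in sides:
        best = nearest(child, q, depth + 1, c_lo, c_hi, best)
    return best


root = Node((51, 75),
            Node((25, 40),
                 Node((10, 30), Node((1, 10))),
                 Node((35, 90), None, Node((50, 50)))),
            Node((70, 70), Node((55, 1)), Node((60, 80))))

print(nearest(root, (52, 52)))
```

For the query (52,52), the subtree rooted at (10,30) is pruned once (50,50) has been found, since its bounding box is farther from the query than (50,50) is.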

Nearest Neighbor Facts
Might have to search close to the whole tree in the worst case: O(n). In practice, runtime is closer to O(2^d + log n): log n to find cells near the query point, and 2^d to search around cells in that neighborhood. Three important concepts that recur in range / nearest neighbor searching:
- storing partial results: keep the best so far, and update
- pruning: reduce the search space by eliminating irrelevant trees
- traversal order: visit the most promising subtree first

Why does k-d tree work?

k-d Tree Nearest Neighbor Search


Notes on Nearest Neighbor Search Has been shown to run in O(log n) average time per search in a reasonable model. (Assuming d a constant) For VQ it appears that O(log n) is correct. Storage for the k-d tree is O(n). Preprocessing time is O(n log n) assuming d is a constant.

Notes on Nearest Neighbor Search
- Orchard's Algorithm (1991): uses O(n^2) storage but is very fast.
- Annulus Algorithm: similar to Orchard but uses O(n) storage. Does many more distance calculations.
- Principal Component Partitioning (PCP): Zatloukal, Johnson, Ladner (1999). Similar to k-d trees. Also very fast.

PCP Tree

PCP Tree vs. k-d Tree

Search Time

Notes on VQ
Works well in some applications. Requires training. Has some interesting algorithms: codebook design and nearest neighbor search. Variable length codes for VQ:
- PTSVQ: pruned tree structured VQ (Chou, Lookabaugh and Gray, 1989)
- ECVQ: entropy constrained VQ (Chou, Lookabaugh and Gray, 1989)