A note on quickly finding the nearest neighbour
David Barber
Department of Computer Science, University College London
May 19,

1 Finding your nearest neighbour quickly

Consider that we have a set of datapoints x^1, ..., x^N and a new query vector q. Our task is to find the nearest neighbour to the query,

    n* = argmin_n d(q, x^n),  n = 1, ..., N.

For the Euclidean distance

    d(x, y) = √((x − y)²) = √( Σ_{i=1}^D (x_i − y_i)² )    (1)

and D-dimensional vectors, it takes O(D) operations to compute this distance. For a set of N vectors, computing the nearest neighbour to q would then take O(DN) operations. For large datasets this can be prohibitively expensive. Is there a way to avoid calculating all the distances? This is a large research area (see [2] for a review); we focus here first on methods that make use of the triangle inequality for metric distances, and secondly on KD-trees, which form a spatial data structure.

2 Using the triangle inequality to speed up search

2.1 The triangle inequality

For the Euclidean distance we have

    (x − y)² = (x − z + z − y)² = (x − z)² + (z − y)² + 2 (x − z)^T (z − y)    (2)

Since the scalar product between two vectors a and b relates the lengths of the vectors |a|, |b| and the angle θ between them by a^T b = |a| |b| cos(θ), we have (using |x| ≡ √(x^T x))

    |x − y|² = |x − z|² + |z − y|² + 2 cos(θ) |x − z| |z − y|    (3)

Using cos(θ) ≤ 1 we obtain the triangle inequality

    |x − y| ≤ |x − z| + |z − y|    (4)

Geometrically this simply says that for a triangle formed by the points x, y, z, it is shorter to go directly from x to y than to go from x to an intermediate point z and then from z to y. More generally, a distance d(x, y) satisfies the triangle inequality if it is of the form¹

    d(x, y) ≤ d(x, z) + d(y, z)    (5)

Formally, the distance is a metric if it is symmetric, d(x, y) = d(y, x), non-negative, d(x, y) ≥ 0, and d(x, y) = 0 ⇔ x = y.

¹ Note here that d(·, ·) is defined as the Euclidean distance, not the squared Euclidean distance.
Figure 1: If we know that x^i is close to q, but that x^i and x^j are not close, namely d(q, x^i) ≤ ½ d(x^i, x^j), then we can infer that x^j cannot be closer to q than x^i, i.e. d(q, x^i) ≤ d(q, x^j).

A useful basic fact that we can deduce for such metric distances is the following: if d(x, y) ≤ ½ d(z, y), then d(x, y) ≤ d(x, z), meaning that we do not need to compute d(x, z). To see this, consider

    d(y, z) ≤ d(y, x) + d(x, z)    (6)

If we are in the situation that d(x, y) ≤ ½ d(z, y), then 2 d(x, y) ≤ d(z, y), so we can write

    2 d(x, y) ≤ d(y, x) + d(x, z)    (7)

and hence d(x, y) ≤ d(x, z). In the nearest neighbour context, we can infer that if d(q, x^i) ≤ ½ d(x^i, x^j) then d(q, x^i) ≤ d(q, x^j), see fig(1).

2.2 Using all datapoint-to-datapoint distances

One way to use the above result is to first precompute all the distance pairs d_{ij} ≡ d(x^i, x^j) in the dataset.

Orchard's algorithm

Given these distances, for each i we can compute an ordered list L_i = { j_1^i, j_2^i, ..., j_{N−1}^i } of those vectors x^j that are closest to x^i, with d(x^i, x^{j_1^i}) ≤ d(x^i, x^{j_2^i}) ≤ d(x^i, x^{j_3^i}) ≤ .... We then start with some vector x^i as our current best guess for the nearest neighbour to q and compute d(q, x^i). We then examine the first element of the list L_i and consider the following cases:

- If d(q, x^i) ≤ ½ d_{i, j_1^i} then x^{j_1^i} cannot be closer than x^i to q; furthermore, neither can any other member of the list, since each automatically satisfies this bound as well. In this fortunate situation, x^i must be the nearest neighbour to q.

- If d(q, x^i) > ½ d_{i, j_1^i} then we compute d(q, x^{j_1^i}). If d(q, x^{j_1^i}) < d(q, x^i) we have found a better candidate j_1^i than our current best guess, and we jump to the start of the new list L_{j_1^i}. Otherwise we continue to traverse the current list, checking whether d(q, x^i) ≤ ½ d_{i, j_2^i}, and so on [6].
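The metric elimination rule above is easy to verify numerically. The following sketch (in Python for illustration; the accompanying files for this note are MATLAB) checks the implication on a concrete example:

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

q  = (0.0, 0.0)   # query
xi = (1.0, 0.0)   # a point close to the query
xj = (5.0, 0.0)   # a point far from xi

# Since d(q, xi) <= (1/2) d(xi, xj), xj cannot beat xi,
# without ever computing d(q, xj):
assert dist(q, xi) <= 0.5 * dist(xi, xj)
assert dist(q, xi) <= dist(q, xj)   # the guaranteed consequence
```

Here d(q, x^i) = 1 and d(x^i, x^j) = 4, so the premise holds and d(q, x^j) = 5 is indeed no smaller than d(q, x^i).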
In summary, this process of traversing a list continues until we either: (i) find a better candidate, jump to the top of its list and restart traversing that new candidate list; (ii) find that the bound criterion is met and declare the current candidate optimal; or (iii) reach the end of a list, in which case the current candidate is also optimal. See algorithm(1) and fastnnorchard.m for a formal description of the algorithm.

Algorithm 1 Orchard's Nearest Neighbour Search

     1: Compute all pairwise distances metric(data{m},data{n})
     2: For each datapoint n, compute the list{n} that stores the distance list{n}.distance and index
        list{n}.index of each other datapoint, sorted by increasing distance,
        list{n}.distance(1) ≤ list{n}.distance(2) ≤ ...
     3: for all query points do
     4:   cand.index = randi(N)        % assign the first candidate index randomly
     5:   cand.distance = metric(query, data{cand.index})
     6:   Assign all nodes the state 'not tested'
     7:   i = 1                        % start at the beginning of the list
     8:   while i ≤ N − 1 and list{cand.index}.distance(i) < 2*cand.distance do
     9:     node = list{cand.index}.index(i)     % index of the i-th member of the current list
    10:     if tested(node) = false then         % avoid computing this distance again
    11:       tested(node) = true
    12:       querydistance = metric(query, data{node})
    13:       if querydistance < cand.distance then   % found a better candidate
    14:         cand.index = node
    15:         cand.distance = querydistance
    16:         i = 1                  % go to the start of the new list
    17:       else
    18:         i = i + 1              % continue to traverse the current list
    19:       end if
    20:     else
    21:       i = i + 1
    22:     end if
    23:   end while
    24:   cand.index and cand.distance contain the nearest neighbour and the distance thereto
    25: end for

Complexity

Orchard's algorithm requires O(N²) storage to precompute the distance-sorted lists. In the worst case, it can take O(N) distance calculations to find the nearest neighbour. To see this, consider a simple one-dimensional dataset:

    x^{n+1} = x^n + 1, n = 1, ..., N − 1, with x^1 = 0    (8)

If the query point is, for example, q = x^N + 1 and our initial candidate for the nearest neighbour is x^1, then at iteration k the nearest non-visited datapoint will be x^{k+1}, meaning that we simply walk through all the data, see fig(2).

Figure 2: A worst-case scenario for Orchard's algorithm (datapoints at 0, 1, 2, 3, 4 and query q = 5). If the initial candidate is x^1, then x^2 becomes the next best candidate, subsequently x^3, etc.
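Orchard's algorithm as just described can be sketched compactly in Python (illustrative only; function and variable names here are ours, not those of fastnnorchard.m):

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def orchard_nn(data, q):
    """Exact nearest neighbour of q in data, Orchard-style.

    Precomputes all pairwise distances (O(N^2) storage) and, for each
    point, the other points sorted by increasing distance to it."""
    N = len(data)
    d = [[dist(data[i], data[j]) for j in range(N)] for i in range(N)]
    lists = [sorted((j for j in range(N) if j != i), key=lambda j, i=i: d[i][j])
             for i in range(N)]
    cand = random.randrange(N)          # arbitrary initial candidate
    cand_dist = dist(q, data[cand])
    tested = [False] * N
    tested[cand] = True
    i = 0
    while i < N - 1 and d[cand][lists[cand][i]] < 2 * cand_dist:
        node = lists[cand][i]
        if not tested[node]:
            tested[node] = True
            qd = dist(q, data[node])
            if qd < cand_dist:          # found a better candidate:
                cand, cand_dist = node, qd
                i = 0                   # jump to the start of its list
                continue
        i += 1
    return cand, cand_dist
```

The loop exits either when the half-distance bound is met or when a list is exhausted; in both cases the current candidate is the exact nearest neighbour.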
Approximating and Eliminating Search Algorithm (AESA)

The triangle inequality can be used to form a lower bound

    d(q, x^j) ≥ |d(q, x^i) − d(x^i, x^j)|    (9)

For datapoints x^i, i ∈ I, for which d(q, x^i) has already been computed, one can then maximise over the lower bounds to find the tightest lower bound on each remaining d(q, x^j)²:

    d(q, x^j) ≥ max_{i∈I} |d(q, x^i) − d(x^i, x^j)| ≡ L_j    (10)

All datapoints x^j whose lower bound is greater than the current best nearest neighbour distance can then be eliminated, see fig(3). One may then select the next (non-eliminated) candidate datapoint x^j corresponding to the lowest bound and continue, updating the bounds and eliminating [7]; see algorithm(2) and fastnnaesa.m.

² In the classic AESA one only retains the best current nearest neighbour, I = {best}.
Figure 3: We can eliminate datapoints x^2 and x^3 since their lower bounds on the distance to the query are greater than the current best candidate distance d(q, x*). After eliminating these points, we use the datapoint with the lowest bound, in this case x^5, to suggest the next candidate.

Algorithm 2 AESA Nearest Neighbour Search

     1: For all datapoints compute and store the distances d(x^i, x^j)
     2: for all queries do
     3:   best.dist = ∞
     4:   I = ∅                       % datapoints examined
     5:   J = {1, ..., N}             % datapoints not examined
     6:   L(n) = 0, n = 1, ..., N     % initial lower bounds
     7:   while J is not empty do
     8:     cand.ind = argmin_{j∈J} L(j)           % select candidate with the lowest bound
     9:     cand.dist = metric(query, data{cand.ind})
    10:     distquerydata(cand.ind) = cand.dist    % store computed distances
    11:     I = I ∪ {cand.ind}                     % add candidate to examined list
    12:     if cand.dist < best.dist then          % candidate is nearer than current best
    13:       best.dist = cand.dist
    14:       best.ind = cand.ind
    15:     end if
    16:     L(j) = max_{i∈I} |distquerydata(i) − d(x^i, x^j)|, j ∈ J \ {cand.ind}   % lower bound
    17:     J = {j such that L(j) < best.dist}     % eliminate
    18:   end while
    19: end for

Complexity

As for Orchard's algorithm, AESA requires O(N²) storage to precompute the distance matrix d(x^i, x^j). During the first lower bound computation, when no datapoints have yet been eliminated, AESA needs to compute all N − 1 lower bounds for the datapoints other than the first candidate examined. Whilst each lower bound is fast to compute, this still requires an O(N) computation. A similar computation is required to update the bounds at later stages, with the worst case being that N/2 datapoints are left to examine at each stage, meaning that computing the optimal lower bounds scales as O(N²) in this case. One can limit this complexity by restricting the elements of I in the max operation used to compute the bound; however, this may result in more iterations being required, since the bound is then potentially weaker.
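The select–examine–eliminate loop of algorithm(2) can be sketched in Python as follows (an illustration under our own naming; fastnnaesa.m is the reference implementation accompanying this note):

```python
import math

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def aesa_nn(data, q):
    """Exact NN search; returns (index, distance, #distance calls to q)."""
    N = len(data)
    # O(N^2) precomputed distance matrix:
    d = [[dist(data[i], data[j]) for j in range(N)] for i in range(N)]
    L = [0.0] * N                 # lower bounds on d(q, x_j)
    J = set(range(N))             # datapoints not yet examined
    best_ind, best_dist, ncalls = None, float("inf"), 0
    while J:
        cand = min(J, key=lambda j: L[j])   # lowest lower bound first
        J.remove(cand)
        cd = dist(q, data[cand])
        ncalls += 1
        if cd < best_dist:
            best_ind, best_dist = cand, cd
        for j in J:                          # tighten the lower bounds
            L[j] = max(L[j], abs(cd - d[cand][j]))
        J = {j for j in J if L[j] < best_dist}   # eliminate
    return best_ind, best_dist, ncalls
```

On well-clustered data, most points are eliminated by the bound update without their distance to the query ever being computed.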
The experiments in demofastnn.m also suggest that the AESA method typically requires fewer distance calculations than Orchard's approach. However, there remains at least an O(N) calculation required to compute the bounds in AESA, which may be prohibitive, depending on the application.

2.3 Using the datapoint-to-buoy distances

Both Orchard's algorithm and AESA can significantly reduce the number of distance calculations required. However, we pay an O(N²) storage cost. For very large datasets, this storage cost is likely to be prohibitive. Given the difficulty of storing d_{i,j}, an alternative is to consider the distances between the training points and a smaller number of strategically placed buoys³ b^1, ..., b^B, with B < N. These buoys can be either a subset of the original datapoints, or new positions.

³ Also called pivots or basis vectors by other authors.
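The buoy bounds developed below all rest on the two-sided consequence of the triangle inequality: for any buoy b, |d(x, b) − d(y, b)| ≤ d(x, y) ≤ d(x, b) + d(y, b). A quick numerical sanity check of this fact (a Python sketch, not part of the accompanying MATLAB):

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

random.seed(0)
rand_point = lambda: tuple(random.uniform(-1, 1) for _ in range(3))

for _ in range(1000):
    x, y, b = rand_point(), rand_point(), rand_point()
    # the buoy b gives both a lower and an upper bound on d(x, y):
    assert abs(dist(x, b) - dist(y, b)) <= dist(x, y) + 1e-12
    assert dist(x, y) <= dist(x, b) + dist(y, b) + 1e-12
```

The lower bound is what pre-elimination and LAESA use; the upper bound gives the Ũ quantities below.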
Figure 4: We can eliminate datapoint x^2 since there is another datapoint (either x^1 or x^5) that has an upper bound Ũ lower than L_2. We can similarly eliminate x^3.

Figure 5: The lower bounds of the non-eliminated datapoints from fig(4), relabelled such that L_1 ≤ L_2 ≤ .... In linear AESA we use these lower bounds to order the search for the nearest neighbour, starting with x^1. If we reach a bound L_m that is greater than our current best distance, then all remaining distances must also be greater than our current best distance, and the algorithm terminates.

Pre-elimination

Given the buoys, the triangle inequality gives the following lower and upper bounds on the distance from the query to each datapoint:

    d(q, x^n) ≥ max_m |d(q, b^m) − d(b^m, x^n)| ≡ L_n    (11)
    d(q, x^n) ≤ min_m [ d(q, b^m) + d(b^m, x^n) ] ≡ Ũ_n    (12)

We can then immediately eliminate any m for which there is some n ≠ m with L(m) > Ũ(n), see fig(4). This enables one to pre-eliminate datapoints at a cost of B distance calculations, see fastnnbuoyselim.m. The remaining candidates can then be used in the Orchard or AESA algorithms (either the standard ones described above or the buoy variants described below).

AESA with buoys

In place of the exact distances to the datapoints, an alternative is to relabel the datapoints according to L_n, lowest bound first, L_1 ≤ L_2 ≤ .... We then compute the distance d(q, x^1) and compare this to L_2. If d(q, x^1) ≤ L_2 then x^1 must be the nearest neighbour, and the algorithm terminates. Otherwise we move on to the next candidate x^2. If this datapoint has a lower distance than our current best guess, we update our current best guess accordingly. We then move on to the next candidate in the list and continue. If we reach a candidate in the list for which d(q, x^best) ≤ L_m, the algorithm terminates, see fig(5).
This algorithm is also called linear AESA (LAESA) [5], see fastnnlaesa.m. The gain here is that the storage cost is reduced to O(NB), since we now only need to precompute the distances between the buoys and the dataset vectors. By choosing B ≪ N, this can be a significant saving. The loss is that, since we are now using a bound rather than the true distance, we may need more distance calculations d(q, x^i).

Orchard with buoys

For Orchard's algorithm we can also use buoys to construct bounds on d_{i,j} on the fly. Consider an arbitrary vector b; then

    |d(x^i, b) − d(x^j, b)| ≤ d(x^i, x^j)    (13)
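The LAESA idea can be sketched in Python (illustrative; fastnnlaesa.m is the accompanying implementation). Only the N×B data-to-buoy distances are stored, and the search visits datapoints in order of increasing lower bound:

```python
import math

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def laesa_nn(data, buoys, q):
    """Exact NN search storing only data-to-buoy distances (O(N*B))."""
    db = [[dist(x, b) for b in buoys] for x in data]   # precomputed offline
    qb = [dist(q, b) for b in buoys]                   # B distance calls per query
    # buoy-based lower bound on d(q, x_n), equation (11):
    L = [max(abs(qm - dm) for qm, dm in zip(qb, row)) for row in db]
    order = sorted(range(len(data)), key=lambda n: L[n])
    best_ind, best_dist = None, float("inf")
    for n in order:
        if L[n] >= best_dist:   # all remaining bounds are at least this large
            break               # so no remaining point can be closer: done
        dn = dist(q, data[n])
        if dn < best_dist:
            best_ind, best_dist = n, dn
    return best_ind, best_dist
```

Because the points are visited in increasing order of L_n, the first time a bound meets the current best distance the search can stop with the exact nearest neighbour.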
Figure 6: Example of elimination using buoys. All points except for the query are datapoints. Using buoys, we can pre-eliminate (crossed datapoints) a large number of the datapoints from further consideration. See demofastnn.m.

We can use this in Orchard's approach since, if there is a vector b such that

    2 d(q, x^i) ≤ |d(x^i, b) − d(x^j, b)|    (14)

then it follows that d(q, x^i) ≤ d(q, x^j). This suggests that we can replace the exact distance d(x^i, x^j) with a bound on it. If we have a set of such buoys b^1, ..., b^B, we use the tightest (highest) lower bound on the true distance d(x^i, x^j):

    2 d(q, x^i) ≤ max_m |d(x^i, b^m) − d(x^j, b^m)| ≡ d̃_{i,j}    (15)

These surrogate distances d̃_{i,j} can then be used as in the standard Orchard algorithm to form (on the fly) a list L_i of the vectors x^j closest to the current candidate x^i, sorted according to this surrogate distance. The algorithm then proceeds as before, see fastnnorchardbuoys.m.

Whilst this may seem useful, one can show that AESA-buoys dominates Orchard-buoys. In AESA-buoys the lower bound on d(q, x^j) is given by

    L_j = max_m |d(q, b^m) − d(x^j, b^m)|    (16)

Suppose the Orchard-buoys termination criterion holds at candidate x^i: for every remaining j there is a maximising buoy m* with

    d̃_{i,j} = |d(x^i, b^{m*}) − d(x^j, b^{m*})| ≥ 2 d(q, x^i)    (17)

so that x^i is declared the nearest neighbour. Furthermore,

    L_j ≥ |d(q, b^{m*}) − d(x^j, b^{m*})|    (18)

and, by the triangle inequality applied to q, x^i and b^{m*},

    |d(q, b^{m*}) − d(x^i, b^{m*})| ≤ d(q, x^i)    (19)

Hence, for all remaining j,

    L_j ≥ |d(x^i, b^{m*}) − d(x^j, b^{m*})| − |d(q, b^{m*}) − d(x^i, b^{m*})|
        ≥ 2 d(q, x^i) − d(q, x^i) = d(q, x^i)    (20)
Figure 7: Consider one-dimensional data in which the datapoints are partitioned into those that lie to the left of v and those to the right. If the current best candidate x has a distance to the query q that is less than the distance of the query q to v, then none of the points to the left of v can be the nearest neighbour.

which is precisely the AESA-buoys termination criterion. Hence Orchard-buoys terminating implies that AESA-buoys terminates. Since AESA-buoys has a different (weaker) termination criterion to Orchard-buoys, AESA-buoys has the opportunity to terminate before Orchard-buoys. This explains why Orchard-buoys cannot outperform AESA-buoys in terms of the number of distance calculations d(q, x^i). This observation is also borne out in demofastnn.m; see also fig(6) for a D = 2 demonstration.

2.4 Other uses of nearest neighbours

Note also that nearest neighbour calculations are required in other applications, for example k-means clustering; see [4] for a fast algorithm based on similar applications of the triangle inequality.

3 KD trees

K-dimensional trees [1] are a way to form a partition of the space that can be used to help speed up search. Before introducing the tree, we'll discuss the basic idea on which the potential speed-up is based.

3.1 Basic idea in one dimension

Consider first one-dimensional data x^n, n = 1, ..., N. We can partition the data into points with value less than a chosen value v, and those with value greater than or equal to v, see fig(7). If the distance of the current best candidate x to the query point q is smaller than the distance of the query to v, then points to the left of v cannot be the nearest neighbour. To see this, consider a query point in the right half-space, q ≥ v, and a candidate x in the left half-space, x < v. Then

    (x − q)² = (x − v + v − q)² = (x − v)² + 2 (x − v)(v − q) + (v − q)² ≥ (v − q)²    (22)

since (x − v)² ≥ 0 and (x − v)(v − q) ≥ 0. Let the distance of the current best candidate to the query be δ² ≡ (x − q)².
Then if (v − q)² ≥ δ², it follows that all points in the left space are further from q than x.

In the more general K-dimensional case, consider a query vector q. Imagine that we have partitioned the datapoints into those with first dimension x_1 less than a defined value v (to its 'left'), and those with a value greater than or equal to v (to its 'right'):

    L = {x^n : x^n_1 < v},    R = {x^n : x^n_1 ≥ v}    (23)

Let's also say that our current best nearest neighbour candidate x^{i*} has squared Euclidean distance δ² = (q − x^{i*})² from q, and that q_1 ≥ v. The squared Euclidean distance of any datapoint x ∈ L to the query satisfies

    (x − q)² = Σ_k (x_k − q_k)² ≥ (x_1 − q_1)² ≥ (v − q_1)²    (24)

If (v − q_1)² > δ², then (x − q)² > δ². That is, all points in L must be further from q than the current best point x^{i*}. On the other hand, if (v − q_1)² ≤ δ², then it is possible that some point in L is closer to q than our current best nearest neighbour candidate, and we need to check these points. The KD-tree approach is essentially a recursive application of the above observation.
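The pruning test in inequality (24) is the heart of the KD-tree search. As a minimal Python illustration (the names here are ours):

```python
def can_prune(v, qk, best_sq_dist):
    """The query lies on one side of a split at value v along dimension k.
    If the gap along that single dimension already exceeds the current best
    squared distance, no point on the far side of the split can be closer."""
    return (v - qk) ** 2 > best_sq_dist

# 1-D example: query at 9, split at v = 7, current best candidate at 8
q, v, x = 9.0, 7.0, 8.0
delta_sq = (x - q) ** 2            # squared distance of the best candidate
assert can_prune(v, q, delta_sq)   # (7-9)^2 = 4 > 1: skip everything left of v
```

If the test fails, the far side of the split must still be searched, since some point there may lie within δ of the query.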
3.2 Constructing the tree

For N datapoints, the tree consists of N nodes. Each node contains a datapoint, along with the axis along which the data is split. We first need to define a routine

    [middata, leftdata, rightdata] = splitdata(x, axis)

that splits the data along a specified dimension. Whilst not necessary, it is customary to use the median value along the split dimension to partition. The routine should return:

middata: We first form the set of scalar values given by the axis components of x. These are sorted, and middata is the datapoint closest to the median of the data.
leftdata: The datapoints to the left of the middata datapoint.
rightdata: The datapoints to the right of the middata datapoint.

For example, if the datapoints are (2,3), (5,4), (9,6), (4,7), (8,1), (7,2) and we split along dimension 1, then we would have

    middata = (7, 2)
    leftdata = (2, 3), (5, 4), (4, 7)
    rightdata = (9, 6), (8, 1)

The tree can be constructed recursively as follows. We start with node(1) and create a temporary storage node(1).data that contains the complete dataset. We then call

    [middata, leftdata, rightdata] = splitdata(node(1).data, 1)

forming node(1).x = middata. We now form two child nodes and populate them with data: node(2).data = leftdata; node(3).data = rightdata. We also store in node(1).axis which axis was used to split the data. The temporary data node(1).data can now be removed. We then move down to the second layer, splitting along the next dimension for nodes in this layer, and go through each of the nodes splitting the corresponding data, forming a layer beneath. If leftdata is empty, the corresponding child node is not formed, and similarly if rightdata is empty.
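A Python sketch of such a splitdata routine (using the median index of the sorted points; this mirrors, but is not a transcription of, the MATLAB splitdata described above):

```python
def splitdata(points, axis):
    """Split points about the median along `axis`.
    Returns (middata, leftdata, rightdata)."""
    pts = sorted(points, key=lambda p: p[axis])
    m = len(pts) // 2            # index of the median point
    return pts[m], pts[:m], pts[m + 1:]

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
mid, left, right = splitdata(points, 0)
# mid is (7, 2); left holds the points with smaller first coordinate,
# right the points with larger first coordinate
```

On the text's example this reproduces middata = (7, 2), with {(2,3), (5,4), (4,7)} to the left and {(9,6), (8,1)} to the right.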
Algorithm 3 KD tree construction

    function [node, A] = KDTreeMake(x)
    [D, N] = size(x);
    A = sparse(N, N);                 % child adjacency matrix of the tree
    node(1).data = x;
    for level = 1:1+floor(log2(N))    % generate the binary tree
        split_dimension = rem(level-1, D) + 1;   % cycle over the split dimensions
        if level > 1
            pa = layer{level-1};      % parents
            ch = children(A, pa);     % children
        else
            ch = 1;
        end
        idx = max(ch);
        layer{level} = ch;
        for i = ch
            [node(i).x, leftdata, rightdata] = splitdata(node(i).data, split_dimension);
            node(i).split_dimension = split_dimension;
            if ~isempty(leftdata)
                idx = idx + 1; node(idx).data = leftdata; A(i, idx) = 1;
            end
            if ~isempty(rightdata)
                idx = idx + 1; node(idx).data = rightdata; A(i, idx) = 1;
            end
            node(i).data = [];        % remove to save storage
        end
    end

In this way, the top layer splits the data along axis 1, and then nodes 2 and 3 in the second layer split the data along dimension 2. The nodes in the third layer don't require any more splitting since they contain single datapoints. Recursive programming is a natural way to construct the tree; alternatively, one can avoid recursion by explicitly constructing the tree layer by layer. See algorithm(3) and KDTreeMake.m for the full algorithm details in MATLAB. For the example data, the datapoint that each node represents is:

    node 1: (7, 2)  — children node 2: (5, 4) and node 3: (9, 6)
    node 2: (5, 4)  — children node 4: (2, 3) and node 5: (4, 7)
    node 3: (9, 6)  — child node 6: (8, 1)

The hierarchical partitioning of the space that this tree represents can also be visualised, see fig(8). There are many extensions of the KD tree, for example searching for the nearest K neighbours, or searching for datapoints in a particular range. Different hierarchical partitioning strategies, including non-axis-aligned partitions, can also be considered. See [3] for further discussion.

Complexity

Building a KD tree has O(N log N) time complexity and O(KN) space complexity.

3.3 Nearest neighbour search

To search, we can make recursive use of our simple observation in section(3.1).
For a query point q we first find a leaf node of the tree by traversing from the root, at each node checking whether the corresponding component of q lies to the left or to the right of the node's split value. For the above tree and the query q = (9, 2), we would first consider the value 9 (since the first layer splits along dimension 1). Since 9 is to
Figure 8: KD tree. The data is plotted along with its node index and corresponding split dimension (either horizontal or vertical). The KD tree partitions the space into hyperrectangles.

the right of 7 (the first dimension of node 1), we go now to node 3. In this case, there is a unique child of node 3, so we continue to the leaf, node 6. Node 6 now represents our first guess for the nearest neighbour. This has squared distance (9 − 8)² + (2 − 1)² = 2 ≡ δ² from the query. We set this as our current best guess of the nearest neighbour and corresponding distance. We then go up the tree to the parent, node 3, and check if this is a better neighbour. It has squared distance (9 − 9)² + (2 − 6)² = 16, so this is worse. Since there are no other children to consider, we move up another level, this time to node 1. This has squared distance (9 − 7)² + (2 − 2)² = 4, which is also worse than our current best. We now need to consider whether any nodes in the other branch of the tree, containing nodes 2, 4, 5, could be better than our current best. We know that nodes 2, 4, 5 all have first dimension less than 7. We can then check whether the corresponding datapoints are necessarily further than our current best guess by checking whether (v − q_1)² > δ², namely whether (7 − 9)² > 2. Since this is true, all the points in nodes 2, 4, 5 must be further away from the query than our current best guess. At this point the algorithm terminates, having examined only 3 of the 6 datapoints to find the nearest neighbour. The full procedure is given in MATLAB in algorithm(4), KDTreeNN.m (again using non-recursive programming). See also demokdtree.m for a demonstration.

Complexity

Searching a KD tree has worst-case O(N) time complexity. For dense and non-pathological data the algorithm is typically much more efficient, and O(log N) time complexity can be achieved.
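For illustration, the construction and search can also be written recursively in Python in a few lines (a sketch under our own naming, not a transcription of KDTreeMake.m/KDTreeNN.m; it reproduces the walkthrough above):

```python
def build_kdtree(points, depth=0):
    """Recursive KD-tree: each node stores a point, its split axis, children."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle over the split dimensions
    pts = sorted(points, key=lambda p: p[axis])
    m = len(pts) // 2                      # median point becomes the node
    return {"point": pts[m], "axis": axis,
            "left": build_kdtree(pts[:m], depth + 1),
            "right": build_kdtree(pts[m + 1:], depth + 1)}

def nn_search(node, q, best=None):
    """Return (point, squared_distance) of the nearest tree point to q."""
    if node is None:
        return best
    sq = sum((a - b) ** 2 for a, b in zip(node["point"], q))
    if best is None or sq < best[1]:
        best = (node["point"], sq)
    axis, v = node["axis"], node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if q[axis] < v
                 else (node["right"], node["left"]))
    best = nn_search(near, q, best)        # descend the query's side first
    if (v - q[axis]) ** 2 <= best[1]:      # prune the far side if possible
        best = nn_search(far, q, best)
    return best

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nn_search(tree, (9, 2)))   # ((8, 1), 2): the left branch is never visited
```

The pruning test `(v - q[axis])**2 <= best[1]` is exactly the check (v − q_1)² > δ² from the text, applied at every level of the recursion.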
Algorithm 4 KD tree nearest neighbour search

    function [bestx, bestdist] = KDTreeNN(q, x)
    [node, A] = KDTreeMake(x);
    bestdist = realmax;
    N = size(x, 2);
    tested = false(1, N);
    treenode = 1;                           % start at the top of the tree
    for loop = 1:N                          % maximum possible number of loops
        % assign the query point to a leaf node (in the remaining tree):
        for level = 1:1+floor(log2(N))
            ch = children(A, treenode);
            if isempty(ch); break; end      % hit a leaf
            if length(ch) == 1
                treenode = ch(1);
            else
                if q(node(treenode).split_dimension) < node(treenode).x(node(treenode).split_dimension)
                    treenode = ch(1);
                else
                    treenode = ch(2);
                end
            end
        end
        % check if the leaf is closer than the current best:
        if ~tested(treenode)
            testx = node(treenode).x;
            dist = sum((q - testx).^2);
            tested(treenode) = true;
            if dist < bestdist; bestdist = dist; bestx = testx; end
        end
        parentnode = parents(A, treenode);
        if isempty(parentnode); break; end  % finished searching all nodes
        A(parentnode, treenode) = 0;        % remove child from tree to stop searching
        % first check if the parent is a better node:
        if ~tested(parentnode)
            testx = node(parentnode).x;
            dist = sum((q - testx).^2);
            if dist < bestdist; bestdist = dist; bestx = testx; end
            tested(parentnode) = true;
        end
        % see if there could be closer points on the other branch:
        if (node(parentnode).x(node(parentnode).split_dimension) ...
                - q(node(parentnode).split_dimension))^2 > bestdist
            A(parentnode, children(A, parentnode)) = 0;  % if not, remove this branch
        end
        treenode = parentnode;              % move up to the parent in the tree
    end

4 Curse of Dimensionality

Whilst KD-trees and Orchard's approach can work well when the data is reasonably uniformly distributed over the space, their efficacy can reduce dramatically when the space is not well covered by the data. If we split each of the D dimensions into 2 parts (for example, 4 quadrants in 2 dimensions), we obtain 2^D partitions of the space. For data to be uniformly distributed, we need datapoints to occupy each of these partitions. This means that we need at least 2^D datapoints in order to begin to see reasonable coverage of the space. Without this exponentially large number of datapoints, most of the partitions will be empty, with the effectively small number of datapoints very sparsely scattered throughout the space. In this case, little speed-up can be expected from either the triangle inequality or KD tree methods.

It is also instructive to understand that in high dimensions, two datapoints will typically be far apart. To see this, consider the volume of a hypersphere of radius r in D dimensions:

    V = π^{D/2} r^D / (D/2)!    (25)

The fraction of the volume of a unit hypercube occupied by its inscribed hypersphere is therefore

    π^{D/2} / ( 2^D (D/2)! )    (26)
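The fraction (26) is easily computed for any D (including odd D) by writing the factorial as (D/2)! = Γ(D/2 + 1); a Python sketch:

```python
import math

def sphere_fraction(D):
    """Fraction of the unit hypercube's volume occupied by the inscribed
    hypersphere: pi^(D/2) / (2^D * Gamma(D/2 + 1)), from equation (26)."""
    return math.pi ** (D / 2) / (2 ** D * math.gamma(D / 2 + 1))

for D in (1, 2, 3, 10, 20):
    print(D, sphere_fraction(D))
# D=1 gives 1 (the interval fills the cube), D=2 gives pi/4 ~ 0.785,
# and by D=20 the fraction has already collapsed to around 2.5e-8
```

The exponential collapse of this fraction is the quantitative content of the remark that nearly all of the hypercube's volume lies in its corners.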
This value drops exponentially quickly with the dimension D, meaning that nearly all of the hypercube's volume lies in its corners. Hence two randomly chosen points will typically be in different corners and far apart: nearest neighbours are likely to be a long distance away. The triangle inequality and KD tree approaches are effective in cases where data is locally concentrated, a situation highly unlikely to occur for high dimensional data.

References

[1] J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517, 1975.

[2] K. L. Clarkson. Nearest-neighbor searching and metric space dimensions. In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice. MIT Press.

[3] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer.

[4] C. Elkan. Using the Triangle Inequality to Accelerate k-means. In T. Fawcett and N. Mishra, editors, International Conference on Machine Learning. AAAI Press, Menlo Park, CA, 2003.

[5] L. Micó, J. Oncina, and E. Vidal. A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recognition Letters, 15(1):9–17, 1994.

[6] M. T. Orchard. A fast nearest-neighbor search algorithm. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 4, 1991.

[7] E. Vidal. An algorithm for finding nearest neighbours in (approximately) constant average time. Pattern Recognition Letters, 4(3), 1986.
2 Preliminaries: Size Measures and Shape Coordinates 2.1 Configuration Space Definition 2.1 The configuration is the set of landmarks on a particular object. The configuration matrix X is the k m matrix
More informationLesson 3. Prof. Enza Messina
Lesson 3 Prof. Enza Messina Clustering techniques are generally classified into these classes: PARTITIONING ALGORITHMS Directly divides data points into some prespecified number of clusters without a hierarchical
More informationApproximate Nearest Line Search in High Dimensions
Approximate Nearest Line Search in High Dimensions Sepideh Mahabadi MIT mahabadi@mit.edu Abstract We consider the Approximate Nearest Line Search (NLS) problem. Given a set L of N lines in the high dimensional
More informationIntroduction to Machine Learning Lecture 4. Mehryar Mohri Courant Institute and Google Research
Introduction to Machine Learning Lecture 4 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Nearest-Neighbor Algorithms Nearest Neighbor Algorithms Definition: fix k 1, given a labeled
More informationkd-trees Idea: Each level of the tree compares against 1 dimension. Let s us have only two children at each node (instead of 2 d )
kd-trees Invented in 1970s by Jon Bentley Name originally meant 3d-trees, 4d-trees, etc where k was the # of dimensions Now, people say kd-tree of dimension d Idea: Each level of the tree compares against
More information11. APPROXIMATION ALGORITHMS
11. APPROXIMATION ALGORITHMS load balancing center selection pricing method: vertex cover LP rounding: vertex cover generalized load balancing knapsack problem Lecture slides by Kevin Wayne Copyright 2005
More informationFast Hierarchical Clustering via Dynamic Closest Pairs
Fast Hierarchical Clustering via Dynamic Closest Pairs David Eppstein Dept. Information and Computer Science Univ. of California, Irvine http://www.ics.uci.edu/ eppstein/ 1 My Interest In Clustering What
More informationDISTANCE MATRIX APPROACH TO CONTENT IMAGE RETRIEVAL. Dmitry Kinoshenko, Vladimir Mashtalir, Elena Yegorova
International Book Series "Information Science and Computing" 29 DISTANCE MATRIX APPROACH TO CONTENT IMAGE RETRIEVAL Dmitry Kinoshenko, Vladimir Mashtalir, Elena Yegorova Abstract: As the volume of image
More informationProblem 1: Complexity of Update Rules for Logistic Regression
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 16 th, 2014 1
More informationNearest Pose Retrieval Based on K-d tree
Nearest Pose Retrieval Based on K-d tree Contents 1.1 Introduction of KDTree... 2 1.2 The curse of dimensionality... 4 1.3 Nearest neighbor search (NNS)... 4 1.4 Python Implementation of KDtree... 5 1.5
More informationOrganizing Spatial Data
Organizing Spatial Data Spatial data records include a sense of location as an attribute. Typically location is represented by coordinate data (in 2D or 3D). 1 If we are to search spatial data using the
More informationNearest Neighbor Classification
Nearest Neighbor Classification Charles Elkan elkan@cs.ucsd.edu October 9, 2007 The nearest-neighbor method is perhaps the simplest of all algorithms for predicting the class of a test example. The training
More informationA Pivot-based Index Structure for Combination of Feature Vectors
A Pivot-based Index Structure for Combination of Feature Vectors Benjamin Bustos Daniel Keim Tobias Schreck Department of Computer and Information Science, University of Konstanz Universitätstr. 10 Box
More informationLecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013
Lecture 24: Image Retrieval: Part II Visual Computing Systems Review: K-D tree Spatial partitioning hierarchy K = dimensionality of space (below: K = 2) 3 2 1 3 3 4 2 Counts of points in leaf nodes Nearest
More informationTree-Weighted Neighbors and Geometric k Smallest Spanning Trees
Tree-Weighted Neighbors and Geometric k Smallest Spanning Trees David Eppstein Department of Information and Computer Science University of California, Irvine, CA 92717 Tech. Report 92-77 July 7, 1992
More informationBranch and Bound. Algorithms for Nearest Neighbor Search: Lecture 1. Yury Lifshits
Branch and Bound Algorithms for Nearest Neighbor Search: Lecture 1 Yury Lifshits http://yury.name Steklov Institute of Mathematics at St.Petersburg California Institute of Technology 1 / 36 Outline 1 Welcome
More information7 Approximate Nearest Neighbors
7 Approximate Nearest Neighbors When we have a large data set with n items where n is large (think n = 1,000,000) then we discussed two types of questions we wanted to ask: (Q1): Which items are similar?
More informationParallel Physically Based Path-tracing and Shading Part 3 of 2. CIS565 Fall 2012 University of Pennsylvania by Yining Karl Li
Parallel Physically Based Path-tracing and Shading Part 3 of 2 CIS565 Fall 202 University of Pennsylvania by Yining Karl Li Jim Scott 2009 Spatial cceleration Structures: KD-Trees *Some portions of these
More informationNearest Neighbor Search by Branch and Bound
Nearest Neighbor Search by Branch and Bound Algorithmic Problems Around the Web #2 Yury Lifshits http://yury.name CalTech, Fall 07, CS101.2, http://yury.name/algoweb.html 1 / 30 Outline 1 Short Intro to
More informationSpatial Data Management
Spatial Data Management [R&G] Chapter 28 CS432 1 Types of Spatial Data Point Data Points in a multidimensional space E.g., Raster data such as satellite imagery, where each pixel stores a measured value
More informationDynamic Skyline Queries in Metric Spaces
Dynamic Skyline Queries in Metric Spaces Lei Chen and Xiang Lian Department of Computer Science and Engineering Hong Kong University of Science and Technology Clear Water Bay, Kowloon Hong Kong, China
More informationSpace-based Partitioning Data Structures and their Algorithms. Jon Leonard
Space-based Partitioning Data Structures and their Algorithms Jon Leonard Purpose Space-based Partitioning Data Structures are an efficient way of organizing data that lies in an n-dimensional space 2D,
More informationMultidimensional Indexing The R Tree
Multidimensional Indexing The R Tree Module 7, Lecture 1 Database Management Systems, R. Ramakrishnan 1 Single-Dimensional Indexes B+ trees are fundamentally single-dimensional indexes. When we create
More informationData Mining and Machine Learning: Techniques and Algorithms
Instance based classification Data Mining and Machine Learning: Techniques and Algorithms Eneldo Loza Mencía eneldo@ke.tu-darmstadt.de Knowledge Engineering Group, TU Darmstadt International Week 2019,
More informationJanuary 10-12, NIT Surathkal Introduction to Graph and Geometric Algorithms
Geometric data structures Sudebkumar Prasant Pal Department of Computer Science and Engineering IIT Kharagpur, 721302. email: spp@cse.iitkgp.ernet.in January 10-12, 2012 - NIT Surathkal Introduction to
More information11 - Spatial Data Structures
11 - Spatial Data Structures cknowledgement: Marco Tarini Types of Queries Graphic applications often require spatial queries Find the k points closer to a specific point p (k-nearest Neighbours, knn)
More informationIntersection Acceleration
Advanced Computer Graphics Intersection Acceleration Matthias Teschner Computer Science Department University of Freiburg Outline introduction bounding volume hierarchies uniform grids kd-trees octrees
More informationSpatial Data Management
Spatial Data Management Chapter 28 Database management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1 Types of Spatial Data Point Data Points in a multidimensional space E.g., Raster data such as satellite
More informationClustering Billions of Images with Large Scale Nearest Neighbor Search
Clustering Billions of Images with Large Scale Nearest Neighbor Search Ting Liu, Charles Rosenberg, Henry A. Rowley IEEE Workshop on Applications of Computer Vision February 2007 Presented by Dafna Bitton
More informationCS 580: Algorithm Design and Analysis. Jeremiah Blocki Purdue University Spring 2018
CS 580: Algorithm Design and Analysis Jeremiah Blocki Purdue University Spring 2018 Chapter 11 Approximation Algorithms Slides by Kevin Wayne. Copyright @ 2005 Pearson-Addison Wesley. All rights reserved.
More informationIntroduction to Indexing R-trees. Hong Kong University of Science and Technology
Introduction to Indexing R-trees Dimitris Papadias Hong Kong University of Science and Technology 1 Introduction to Indexing 1. Assume that you work in a government office, and you maintain the records
More informationIntersection of an Oriented Box and a Cone
Intersection of an Oriented Box and a Cone David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International
More informationComputational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs
Computational Optimization ISE 407 Lecture 16 Dr. Ted Ralphs ISE 407 Lecture 16 1 References for Today s Lecture Required reading Sections 6.5-6.7 References CLRS Chapter 22 R. Sedgewick, Algorithms in
More informationChapter 4: Non-Parametric Techniques
Chapter 4: Non-Parametric Techniques Introduction Density Estimation Parzen Windows Kn-Nearest Neighbor Density Estimation K-Nearest Neighbor (KNN) Decision Rule Supervised Learning How to fit a density
More informationApproximation Algorithms
Approximation Algorithms Given an NP-hard problem, what should be done? Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one of three desired features. Solve problem to optimality.
More informationData Mining. Covering algorithms. Covering approach At each stage you identify a rule that covers some of instances. Fig. 4.
Data Mining Chapter 4. Algorithms: The Basic Methods (Covering algorithm, Association rule, Linear models, Instance-based learning, Clustering) 1 Covering approach At each stage you identify a rule that
More information18.S34 (FALL 2007) PROBLEMS ON HIDDEN INDEPENDENCE AND UNIFORMITY
18.S34 (FALL 2007) PROBLEMS ON HIDDEN INDEPENDENCE AND UNIFORMITY All the problems below (with the possible exception of the last one), when looked at the right way, can be solved by elegant arguments
More informationIntroduction to Supervised Learning
Introduction to Supervised Learning Erik G. Learned-Miller Department of Computer Science University of Massachusetts, Amherst Amherst, MA 01003 February 17, 2014 Abstract This document introduces the
More informationQuadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase
Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Bumjoon Jo and Sungwon Jung (&) Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107,
More informationData Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners
Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager
More informationChapter 2 Basic Structure of High-Dimensional Spaces
Chapter 2 Basic Structure of High-Dimensional Spaces Data is naturally represented geometrically by associating each record with a point in the space spanned by the attributes. This idea, although simple,
More informationStriped Grid Files: An Alternative for Highdimensional
Striped Grid Files: An Alternative for Highdimensional Indexing Thanet Praneenararat 1, Vorapong Suppakitpaisarn 2, Sunchai Pitakchonlasap 1, and Jaruloj Chongstitvatana 1 Department of Mathematics 1,
More informationBumptrees for Efficient Function, Constraint, and Classification Learning
umptrees for Efficient Function, Constraint, and Classification Learning Stephen M. Omohundro International Computer Science Institute 1947 Center Street, Suite 600 erkeley, California 94704 Abstract A
More informationFast Indexing Method. Dongliang Xu 22th.Feb.2008
Fast Indexing Method Dongliang Xu 22th.Feb.2008 Topics (Nearest Neighbor Searching) Problem Definition Basic Structure Quad-Tree KD-Tree Locality Sensitive Hashing Application: Learning BoostMap: A Method
More informationResearch Collection. Cluster-Computing and Parallelization for the Multi-Dimensional PH-Index. Master Thesis. ETH Library
Research Collection Master Thesis Cluster-Computing and Parallelization for the Multi-Dimensional PH-Index Author(s): Vancea, Bogdan Aure Publication Date: 2015 Permanent Link: https://doi.org/10.3929/ethz-a-010437712
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 4
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationDatenbanksysteme II: Multidimensional Index Structures 2. Ulf Leser
Datenbanksysteme II: Multidimensional Index Structures 2 Ulf Leser Content of this Lecture Introduction Partitioned Hashing Grid Files kdb Trees kd Tree kdb Tree R Trees Example: Nearest neighbor image
More informationAccelerating Geometric Queries. Computer Graphics CMU /15-662, Fall 2016
Accelerating Geometric Queries Computer Graphics CMU 15-462/15-662, Fall 2016 Geometric modeling and geometric queries p What point on the mesh is closest to p? What point on the mesh is closest to p?
More informationNeedle in a Haystack: Searching for Approximate k- Nearest Neighbours in High-Dimensional Data
Needle in a Haystack: Searching for Approximate k- Nearest Neighbours in High-Dimensional Data Liang Wang*, Ville Hyvönen, Teemu Pitkänen, Sotiris Tasoulis, Teemu Roos, and Jukka Corander University of
More informationA Scalable Index Mechanism for High-Dimensional Data in Cluster File Systems
A Scalable Index Mechanism for High-Dimensional Data in Cluster File Systems Kyu-Woong Lee Hun-Soon Lee, Mi-Young Lee, Myung-Joon Kim Abstract We address the problem of designing index structures that
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationGoing nonparametric: Nearest neighbor methods for regression and classification
Going nonparametric: Nearest neighbor methods for regression and classification STAT/CSE 46: Machine Learning Emily Fox University of Washington May 8, 28 Locality sensitive hashing for approximate NN
More informationTopic: Local Search: Max-Cut, Facility Location Date: 2/13/2007
CS880: Approximations Algorithms Scribe: Chi Man Liu Lecturer: Shuchi Chawla Topic: Local Search: Max-Cut, Facility Location Date: 2/3/2007 In previous lectures we saw how dynamic programming could be
More informationData Structures for Approximate Proximity and Range Searching
Data Structures for Approximate Proximity and Range Searching David M. Mount University of Maryland Joint work with: Sunil Arya (Hong Kong U. of Sci. and Tech) Charis Malamatos (Max Plank Inst.) 1 Introduction
More information4 INFORMED SEARCH AND EXPLORATION. 4.1 Heuristic Search Strategies
55 4 INFORMED SEARCH AND EXPLORATION We now consider informed search that uses problem-specific knowledge beyond the definition of the problem itself This information helps to find solutions more efficiently
More informationClustering Part 4 DBSCAN
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationMining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams
Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06
More informationComputer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Divide and Conquer
Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Divide and Conquer Divide and-conquer is a very common and very powerful algorithm design technique. The general idea:
More informationPoint Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology
Point Cloud Filtering using Ray Casting by Eric Jensen 01 The Basic Methodology Ray tracing in standard graphics study is a method of following the path of a photon from the light source to the camera,
More informationAlgorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs
Algorithms in Systems Engineering ISE 172 Lecture 16 Dr. Ted Ralphs ISE 172 Lecture 16 1 References for Today s Lecture Required reading Sections 6.5-6.7 References CLRS Chapter 22 R. Sedgewick, Algorithms
More informationHierarchical Intelligent Cuttings: A Dynamic Multi-dimensional Packet Classification Algorithm
161 CHAPTER 5 Hierarchical Intelligent Cuttings: A Dynamic Multi-dimensional Packet Classification Algorithm 1 Introduction We saw in the previous chapter that real-life classifiers exhibit structure and
More informationLecture 8 13 March, 2012
6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture 8 13 March, 2012 1 From Last Lectures... In the previous lecture, we discussed the External Memory and Cache Oblivious memory models.
More informationSimilarity Searching:
Similarity Searching: Indexing, Nearest Neighbor Finding, Dimensionality Reduction, and Embedding Methods, for Applications in Multimedia Databases Hanan Samet hjs@cs.umd.edu Department of Computer Science
More informationCluster Analysis. Angela Montanari and Laura Anderlucci
Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a
More informationFast nearest-neighbor search algorithms based on approximation-elimination search
Pattern Recognition 33 (2000 1497}1510 Fast nearest-neighbor search algorithms based on approximation-elimination search V. Ramasubramanian, Kuldip K. Paliwal* Computer Systems and Communications Group,
More informationSpanners of Complete k-partite Geometric Graphs
Spanners of Complete k-partite Geometric Graphs Prosenjit Bose Paz Carmi Mathieu Couture Anil Maheshwari Pat Morin Michiel Smid May 30, 008 Abstract We address the following problem: Given a complete k-partite
More informationSpatial Data Structures
CSCI 420 Computer Graphics Lecture 17 Spatial Data Structures Jernej Barbic University of Southern California Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees [Angel Ch. 8] 1 Ray Tracing Acceleration
More informationAssignment 5: Solutions
Algorithm Design Techniques Assignment 5: Solutions () Port Authority. [This problem is more commonly called the Bin Packing Problem.] (a) Suppose K = 3 and (w, w, w 3, w 4 ) = (,,, ). The optimal solution
More informationMultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A
MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 205-206 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI BARI
More informationPtolemaic Indexing. Eirik Benum Reksten. Master of Science in Computer Science. An Evaluation
Ptolemaic Indexing An Evaluation Eirik Benum Reksten Master of Science in Computer Science Submission date: June 2010 Supervisor: Magnus Lie Hetland, IDI Norwegian University of Science and Technology
More informationLecture 3 February 23, 2012
6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture 3 February 23, 2012 1 Overview In the last lecture we saw the concepts of persistence and retroactivity as well as several data structures
More informationNearest Neighbors Classifiers
Nearest Neighbors Classifiers Raúl Rojas Freie Universität Berlin July 2014 In pattern recognition we want to analyze data sets of many different types (pictures, vectors of health symptoms, audio streams,
More informationWorking with Unlabeled Data Clustering Analysis. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan
Working with Unlabeled Data Clustering Analysis Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan chanhl@mail.cgu.edu.tw Unsupervised learning Finding centers of similarity using
More informationGoing nonparametric: Nearest neighbor methods for regression and classification
Going nonparametric: Nearest neighbor methods for regression and classification STAT/CSE 46: Machine Learning Emily Fox University of Washington May 3, 208 Locality sensitive hashing for approximate NN
More informationCluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008
Cluster Analysis Jia Li Department of Statistics Penn State University Summer School in Statistics for Astronomers IV June 9-1, 8 1 Clustering A basic tool in data mining/pattern recognition: Divide a
More information26 The closest pair problem
The closest pair problem 1 26 The closest pair problem Sweep algorithms solve many kinds of proximity problems efficiently. We present a simple sweep that solves the two-dimensional closest pair problem
More informationOrthogonal Range Search and its Relatives
Orthogonal Range Search and its Relatives Coordinate-wise dominance and minima Definition: dominates Say that point (x,y) dominates (x', y') if x
More information
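As a concrete illustration, the list traversal of Orchard's algorithm described in section 2.2 can be sketched in a few lines of Python. This is a minimal sketch only: the function name, the dense N x N distance matrix and the NumPy layout are illustrative choices, not part of the original note.

```python
import numpy as np

def orchard_search(X, D, L, q, start=0):
    """Exact nearest-neighbour search via Orchard's algorithm.

    X : (N, dim) array of datapoints x^1, ..., x^N.
    D : (N, N) precomputed pairwise distances d(x^i, x^j).
    L : (N, N-1) array; L[i] lists the indices j sorted so that
        D[i, L[i][0]] <= D[i, L[i][1]] <= ...
    q : query vector.
    Returns (index of the nearest neighbour, its distance to q).
    """
    best = start
    d_best = np.linalg.norm(q - X[best])
    pos = 0
    while pos < L.shape[1]:
        j = L[best][pos]
        # Triangle-inequality prune: if d(q, x^best) <= d(x^best, x^j)/2
        # then x^j cannot be closer to q than x^best, and neither can any
        # later member of the list, so x^best is the nearest neighbour.
        if d_best <= 0.5 * D[best, j]:
            break
        d_j = np.linalg.norm(q - X[j])
        if d_j < d_best:
            # Better candidate found: jump to the start of its list.
            best, d_best, pos = j, d_j, 0
        else:
            pos += 1
    return best, d_best

# Small demo: 200 random 5-dimensional points.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # all pairs, done once
L = np.argsort(D, axis=1)[:, 1:]  # ordered lists L_i; drop self (distance 0)
q = rng.normal(size=5)
n, d = orchard_search(X, D, L, q)
```

The loop terminates because a jump only happens when the candidate distance strictly decreases, and it returns the exact nearest neighbour: either the pruning bound eliminates the remainder of the current list, or every other point has been compared directly. The main practical cost, as noted above, is the precomputation and storage of all O(N^2) pairwise distances.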