A Mixed Hierarchical Algorithm for Nearest Neighbor Search


Carlo del Mundo
Virginia Tech
222 Kraft Dr., Knowledge Works II Building
Blacksburg, VA

Mariam Umar
Virginia Tech
222 Kraft Dr., Knowledge Works II Building
Blacksburg, VA
mariam.umar@vt.edu

ABSTRACT

The k nearest neighbor (knn) search is a computationally intensive application critical to fields such as image processing, statistics, and biology. Recent works have demonstrated the efficacy of k-d tree based implementations on multi-core CPUs. It is unclear, however, whether such tree-based implementations are amenable to execution on high-density processors typified today by the graphics processing unit (GPU). This work seeks to map and optimize knn to massively parallel architectures such as the GPU. Our approach synthesizes a clustering technique, k-means, with traditional brute force methods to prune the search space while taking advantage of data-parallel execution of knn on the GPU. Overall, our general-case GPU version outperforms a single-threaded CPU by factors as high as 18.

1. INTRODUCTION & MOTIVATION

knn is a fundamental algorithm for classifying objects. It works by finding the nearest neighbors of one or several query points in a metric space. Figure 1 depicts an example of knn for a 2D Euclidean metric space. In the context of this paper, the input data set is referred to as the reference points (shown as circles), and the targets are referred to as query points (shown as an X). The two closest neighbors (K = 2) of the query point are shown in green.

Figure 1: Nearest neighbor search for N = 12 and K = 2. The red X represents the query point, q, and its nearest neighbors are shown in green.

Computing knn presents a prohibitive cost for large inputs and dimensionalities. This work seeks to capitalize on the rich parallel resources of GPUs to accelerate knn calculations. Traditional techniques focus on k-d tree based data structures to achieve O(log N) searches by pruning the search space at each level of the tree. To the best of our knowledge, no works have focused on search-space pruning techniques for knn on the GPU. We explore the use of a clustering technique, known as k-means, to perform offline groupings of like-coordinate points. Our approach takes advantage of the properties of clusters. Points are clustered together based on a convergence criterion, and each cluster of near-proximity points has a center, known as the centroid, calculated as the average of the coordinates of the points within the cluster. We assume that the nearest neighbor of a query point, q, belongs to the cluster closest to q. This fundamental assumption prunes the search space by discarding points that do not belong to the closest cluster. This tree-like pruning behavior can significantly cut down on the number of data points to test while avoiding branch-divergence penalties.

In this work, we characterize the brute force (BF) linear knn method for both the CPU and GPU. We then demonstrate the efficacy of our hierarchical algorithm on the GPU against the BF CPU. To that end, our contributions are as follows.

1. A characterization of the data-parallel brute force algorithm on CPU and GPU
2. The design, implementation, and characterization of a mixed hierarchical knn algorithm that prunes the search space via clustering

The rest of the document is outlined as follows. Section 2 discusses related work, Section 3 presents our hierarchical approach, and finally, Section 4 summarizes and discusses our results.

2. RELATED WORK

2.1 K-means

Clustering is a widely studied problem, of which k-means is the canonical clustering method. Pelleg et al. [8] discuss implementation issues in k-means, such as poor scaling and convergence to local minima, and propose several solutions to these problems. Alsabti et al. [1] explored a k-d tree based implementation of k-means and claim that computing k-means with a k-d tree improves performance by two orders of magnitude. Bradley et al. [4] show that the choice of initial points is very important to performance and argue that a better initialization helps k-means converge to a better minimum; finding better initial points improves solutions for both continuous and discrete data sets. Kanungo et al. [7] use Lloyd's algorithm for k-means. Their approach differs from the conventional one in that they construct a k-d tree over the data points rather than the query points, and they claim that their implementation performs better on both synthetically generated and real data sets.

Although k-means has been a popular clustering algorithm for many decades, its theoretical bounds have only recently been established. Arthur et al. [2] show that, while the method is simple and fast in practice, its worst-case running time is superpolynomial, even when the initial clusters and their corresponding centers are chosen uniformly at random. They establish this lower bound using "reset widgets," gadgets introduced to force the k-means computation to run much longer. Farivar et al. [5] implemented k-means on GPUs, structuring their implementation around the architecture of the GPU and taking performance and power efficiency into account to demonstrate that data-intensive tasks run well on the GPU. Their speedups reach factors as high as 68 on the NVIDIA 8800 Ultra GTX. A serious concern about their speedup is that it does not account for data-transfer times between CPU and GPU, which may become a bottleneck for larger data sets.

2.2 knn

One prominent tree-based nearest neighbor algorithm is based on the work of Arya et al. [3]. This work focuses on a CPU tree data structure that cuts search and space complexity down to O(log N) and O(N), respectively. The authors discuss the implications of nearest neighbor search in higher dimensionalities (d > 2) and how to avoid common pitfalls. Their methodology involves a modified k-d tree, also known as a bounding box decomposition tree. Finally, they relax the constraints of the nearest neighbor problem in order to gain substantial speedup with respect to algorithmic complexity.

Garcia et al. propose a GPU-based implementation of the knn algorithm [6]. They implemented a brute-force approach to the knn problem by composing the computation as a series of matrix and sorting operations. By leveraging CUDA and CUBLAS, the authors show substantial speedups, by factors as high as 64 and 189, over Arya et al.'s implementation on multi-core CPUs.

3. APPROACH

There are two common approaches to knn: (1) a linear brute force (BF) approach and (2) the k-d tree approach. Instead, we propose a mixed hierarchical algorithm that uses a combination of BF and clustering. In the BF algorithm, a simple Euclidean distance kernel is applied to an array of points. This approach requires no data structure other than an array.
The BF approach is relatively slow, on the order of O(N), where N is the number of points in the list. More efficient partitioning techniques use spatial data structures such as k-d trees, reducing the search complexity to O(log N) [3]. Unfortunately, k-d tree based implementations of nearest neighbor search have only been widely studied on the CPU.

We propose a mixed hierarchical algorithm that first compresses the data via a clustering scheme. Since points are clustered by a distance metric, we can assume that a query point and its neighbors fall within the same cluster. The cluster is identified by computing the Euclidean distance between the query point and each cluster's centroid. Finally, a brute force, data-parallel pass traverses the remaining data points within the identified cluster. This approach effectively prunes the number of reference points by a factor related to the cluster size.

Figure 2 compares and contrasts the following algorithms: (1) brute force, (2) the k-d tree, and (3) our proposed hierarchical algorithm. The brute force algorithm, shown in (a), calculates distances between the query point and every other point (O(N)). In the k-d tree method, the coordinate space is subdivided into a set of tiles. Though this yields an algorithmic complexity of O(log N) for search, it is unclear whether such a data structure maps well onto the GPU. Finally, our hierarchical algorithm, shown in (c), first partitions data points into clusters. Distance calculations are first performed between each cluster's centroid and the query point; once the closest cluster is determined, a brute force approach is applied to the points within that cluster. Our implementations are based on the pseudocode outlined in Sections 3.1, 3.2, and 3.3 for the brute force knn, k-means clustering, and our mixed algorithm, respectively.

3.1 Brute force knn algorithm

Given a set of query points, Q, and a set of input data points, I, for each query point q_i:

1. Compute the Euclidean distance between q_i and each point in I.
2. Sort the distances in ascending order. The k nearest neighbors of q_i are the first k entries in the sorted array.
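As a concrete illustration, here is a minimal NumPy sketch of this brute-force procedure. It is our own illustration, not the authors' CUDA implementation, and the function name is hypothetical; the paper's GPU version realizes the same two stages as a distance kernel followed by a sort.

```python
import numpy as np

def brute_force_knn(query, points, k):
    """Brute-force kNN: distance to every reference point, then a sort.

    query:  (d,) coordinate vector
    points: (n, d) array of reference points
    Returns the k nearest points and their distances.
    """
    # Stage 1: Euclidean distance from the query to every point in I.
    dists = np.linalg.norm(points - query, axis=1)
    # Stage 2: sort ascending; the first k entries are the k nearest neighbors.
    order = np.argsort(dists)[:k]
    return points[order], dists[order]
```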

Figure 2: Approaches to knn. In (a), the brute force nearest neighbor algorithm is shown: for a query point, q, a distance calculation is performed against every other point. In (b), the k-d tree nearest neighbor algorithm subdivides the coordinate space into equally spaced tiles; the search complexity is of order O(log N). Finally, in (c), an example of our proposed hierarchical algorithm is shown. Instead of traversing a tree structure on the GPU, clustering is performed on the CPU and the clusters are transferred to the GPU. To narrow a query point down to its nearest neighbors, a distance calculation is performed from the query point to each cluster's centroid, and the brute force approach is then applied to the points in the closest cluster. Our assumption is that the nearest neighbor is contained within the closest cluster.

3.2 K-means clustering algorithm

The k-means clustering algorithm takes a parameter, C, the total number of clusters into which the reference data points are grouped.

1. Choose C initial points as the initial clusters; the centroid of each such cluster is the coordinate of its initial point.
2. Calculate the Euclidean distance between each point and each current centroid, and assign each point to its nearest cluster.
3. Recalculate the centroid of each cluster as the average of all points belonging to that cluster.
4. Repeat the previous two steps until the centroids converge.
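The same steps in a compact NumPy sketch, again as our own illustration (the paper runs k-means offline on the CPU; the tolerance and iteration cap below are assumed defaults, not values from the paper):

```python
import numpy as np

def kmeans(points, c, tol=1e-4, max_iter=100, seed=0):
    """Lloyd-style k-means: returns (centroids, labels) for `points`."""
    rng = np.random.default_rng(seed)
    # Step 1: choose c reference points as the initial centroids.
    centroids = points[rng.choice(len(points), size=c, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its members
        # (keeping the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(c)
        ])
        # Step 4: stop once the centroids have converged.
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return centroids, labels
```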
3.3 Mixed Algorithm

1. Calculate a set of C clusters using the algorithm in Section 3.2.
2. For each query point, q_i:
   (a) Determine the closest cluster to q_i by calculating the Euclidean distance between each cluster's centroid and the query point.
   (b) Apply the brute force algorithm of Section 3.1 to the points in that closest cluster.
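Putting the pieces together, a sketch of the mixed search follows, reusing the hypothetical brute_force_knn and kmeans helpers from the sketches above; the cluster count in the usage example mirrors the setting described in Section 4.1.

```python
import numpy as np

def mixed_knn(query, points, centroids, labels, k):
    """Mixed hierarchical kNN: prune to the closest cluster, then brute force."""
    # (a) Closest cluster: distance from the query to each centroid.
    nearest_cluster = np.linalg.norm(centroids - query, axis=1).argmin()
    # (b) Brute-force search restricted to the points of that cluster.
    members = points[labels == nearest_cluster]
    return brute_force_knn(query, members, k)

# Example usage (2-D points; K = 1 and 10 clusters, as in Section 4.1):
pts = np.random.default_rng(1).random((100_000, 2))
centroids, labels = kmeans(pts, c=10)
neighbors, dists = mixed_knn(np.array([0.5, 0.5]), pts, centroids, labels, k=1)
```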
3.4 Limitations

Boundary cases. Like the k-d tree implementation, boundary cases are an issue. Suppose a query point lies on the boundary between two clusters. In this situation, both clusters must be traversed in order to determine the nearest neighbor. The problem is further exacerbated for query points on the boundaries between N clusters. This could be alleviated by creating a bounding volume for each cluster, thereby identifying all potential clusters that may contain the nearest neighbor.

Costs of creating the clusters. The start-up cost of clustering can be prohibitive for large sample sizes. We assume that the cost of creating the clusters beforehand is amortized by fast query searches.

Number and size of clusters. Empirical testing must be performed to determine the optimal number of clusters and the sizes of the respective clusters. Many clusters increase the overhead of determining which cluster a query point belongs to; similarly, a large cluster increases the overhead of the brute force phase.

4. RESULTS AND DISCUSSION

Here, we outline our experimental setup, results, and discussion.

4.1 Experimental Testbed

Table 1: Experimental Testbed.
  OS/Kernel: Debian Wheezy 7.0, 64-bit
  Software:  CUDA, driver v313.3
  CPU:       Intel Celeron E3300 (2 cores, 2.50 GHz)
  GPU:       NVIDIA Tesla C2070 (448 cores, 1.15 GHz)
  Compiler:  nvcc -O3 (optimizes only the CPU code)

Our experimental testbed is listed in Table 1. Throughout our experiments, we compile with nvcc using the -O3 flag to amortize the cost of having a slow CPU; this flag improved CPU performance by a factor of five over the unoptimized build. For our dataset, we use the USA-Central road network nodes and vary the input size from 1 to 64 MB. We fix the number of query points (Q) and neighbor points (K) to one, and we fix the number of clusters to 10.

Our implementations are broken down into three kernels: (1) distance computation, (2) sort, and (3) k-means. We run these kernels on the CPU, the GPU, or both. Our experiments are as follows.

1. BF CPU: distance computation (CPU), sort (CPU).
2. BF GPU: distance computation (GPU), sort (GPU).
3. BF GPU + k-means: k-means (CPU), distance computation (GPU), sort (GPU).

Since the number of neighbor points (K) is fixed to one, a reduction operation can be substituted for the sorting operation, and we demonstrate the efficacy of reduction vs. sorting (a minimal sketch of the substitution follows Section 4.2). We implemented the distance computation for both CPU and GPU and k-means for the CPU; we used std::sort and thrust::sort for the CPU and GPU sorts, respectively, and Thrust's reduction for the GPU reduction operation. We do not include the execution time of k-means in the measurements for our mixed algorithm; that is, we assume the cost of cluster creation is negligible.

4.2 Results

Figure 3: Results for BF CPU, BF GPU, and BF GPU with k-means. Execution time (ms) vs. number of reference points (MB) for CPU BF, GPU BF, GPU BF + KM, and GPU BF + KM + Reduction.

Our primary results are shown in Figure 3; note that both axes are on a logarithmic scale. In all cases, our GPU versions outperform the CPU version, with performance improving with each successive GPU implementation.

Figure 4 depicts the execution time at N = 64 MB for {GPU BF + KM} and {GPU BF + KM + Reduction}, broken down into their constituent stages: distance/sort and distance/reduction, respectively. Recall that the sorting operation in the BF algorithm can be replaced by a reduction when the number of neighbor points (K) is one. Substituting a reduction for the sort improves performance by a factor of 2; in addition, the execution time is then no longer dominated by the sorting stage, but by the distance stage.

Figure 4: Percentage of execution time spent in the distance and sort/reduction stages for N = 64 MB. In (a), the breakdown of {BF GPU + KM} is shown (distance 8%, sort 91%), and in (b), that of {BF GPU + KM + Reduction} (distance 65.3%, reduction 34.7%). Sorting is the dominant component of the GPU BF algorithm with k-means, comprising 91% of the execution time at 64 MB. When a reduction is substituted for the sort, the distance component of BF GPU becomes the dominant factor.
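As a minimal sketch of the K = 1 substitution discussed above (ours; the paper's GPU version uses Thrust's reduction): when only the single nearest neighbor is needed, a running-minimum reduction replaces the sort entirely.

```python
import numpy as np

def nearest_neighbor(query, points):
    """K = 1 special case: a single O(N) min-reduction instead of a sort."""
    dists = np.linalg.norm(points - query, axis=1)
    i = dists.argmin()  # min-reduction over the distances
    return points[i], dists[i]
```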
4.3 Discussion

The linear growth seen in all experiments is expected, since the BF algorithm requires O(N) search time. The differences in execution time among the GPU versions can be attributed to differences in algorithmic design. The naive {GPU BF} version computes both distance and sort over all points in the data set. {GPU BF + KM} computes distance and sort over only a subset of the points (those within the selected cluster). Finally, {GPU BF + KM + Reduction} is similar to {GPU BF + KM}, but performs a reduction in lieu of sorting. Overall, our fastest GPU version, {GPU BF + KM + Reduction}, outperforms our CPU implementation by factors as high as 822. We note, however, that this is a special corner case of the knn computation (K = 1). Therefore, for the general case, {GPU BF + KM} outperforms our CPU implementation by factors as high as 18.

5. REFERENCES

[1] Khaled Alsabti. An efficient k-means clustering algorithm. In Proceedings of the IPPS/SPDP Workshop on High Performance Data Mining, 1998.

[2] David Arthur and Sergei Vassilvitskii. How slow is the k-means method? In Nina Amenta and Otfried Cheong, editors, Proceedings of the Symposium on Computational Geometry, ACM, 2006.

[3] Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, and Angela Y. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the ACM, 45(6):891-923, November 1998.

[4] P. S. Bradley and Usama M. Fayyad. Refining initial points for k-means clustering. In Proceedings of the International Conference on Machine Learning, Morgan Kaufmann, 1998.

[5] Reza Farivar, Daniel Rebolledo, Ellick Chan, and Roy H. Campbell. A parallel implementation of k-means clustering on GPUs. In PDPTA, 2008.

[6] V. Garcia, E. Debreuve, F. Nielsen, and M. Barlaud. K-nearest neighbor search: fast GPU-based implementations and application to high-dimensional feature matching. In Proceedings of the 17th IEEE International Conference on Image Processing (ICIP), 2010.

[7] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine Piatko, Ruth Silverman, and Angela Y. Wu. The analysis of a simple k-means clustering algorithm, 2000.

[8] Dan Pelleg and Andrew Moore. X-means: extending k-means with efficient estimation of the number of clusters. In Proceedings of the 17th International Conference on Machine Learning, Morgan Kaufmann, 2000.
