Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing
1 Unsupervised Data Mining: Clustering
2
1. Supervised Data Mining: Classification, Regression, Outlier detection, Frequent pattern mining
2. Unsupervised Data Mining: Clustering, Feature Extraction
definition, real use-cases, method, pros and cons
4 Unsupervised Data Mining
- descriptive or undirected
- finds hidden structure and relations within the data
- determines the existence of classes or clusters in the data
- exploratory analysis
- all variables are treated in the same way
5 Overview: Clustering
- Main principles: definition, types of clustering, applications
- Clustering techniques: distance metrics, k-means clustering
6 Clustering
7 Definition
Clustering is a data mining function that partitions the data points into natural groups called clusters.
The goal: the points within a cluster are very similar, whereas points across clusters are as dissimilar as possible.
- Unsupervised (requires data, not labels)
- Outcome: clusters
10 Types of Clustering
- partitional: divides data points into non-overlapping clusters; each point is in exactly one subset
- hierarchical: finds clusters using previously built clusters
  - agglomerative: start with single-element clusters and merge them
- exclusive: a data point belongs to a single cluster
- non-exclusive: a data point may belong to multiple clusters
- fuzzy, probabilistic: a point belongs to every cluster with a weight between 0 and 1
11 Applications
(1) useful when you don't know what you're looking for
(2) used as a stand-alone tool to get insight into the data
(3) used as a preprocessing tool for other algorithms (outlier detection, data compression)
- Astronomy: aggregation of stars, galaxies, or super galaxies
- Spatial Data Analysis: create thematic maps in GIS by clustering feature spaces
- Image Processing
12 Applications (continued)
- Weblogs: discover groups of similar access patterns
- City planning: identifying groups of houses according to their house type, value, and geographical location
- Land use: identification of areas of similar land use in an earth observation database
- Earthquake studies: observed earthquake epicentres should be clustered along continental faults
- Summarisation: reduce the size of large data sets
- Marketing
13 Google News
14 Applications
15 What is a Cluster?
- a subset of objects which are similar
- a subset in which the distance between any two objects in the cluster is less than the distance between any object in the cluster and any object outside it
- a connected region of a multidimensional space containing a relatively high density of objects
16 What Makes a Clustering Good?
A good clustering method will produce high-quality clusters in which:
- intra-cluster similarity is high
- inter-cluster similarity is low
The quality of the result:
- depends on the similarity metric and its implementation
- is also judged by the ability to discover all or some hidden patterns
17 Distance Metrics
(1) Euclidean Distance
(2) Manhattan Distance
(3) Minkowski Distance
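The slide only names the three metrics, so here is a minimal sketch of how they relate. The Minkowski distance of order $p$ is $d_p(x, y) = \left(\sum_i |x_i - y_i|^p\right)^{1/p}$; Manhattan is the case $p = 1$ and Euclidean the case $p = 2$. The function names below are illustrative choices, not taken from the slides.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p (p >= 1) between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def manhattan(x, y):
    # Special case p = 1: sum of absolute coordinate differences.
    return minkowski(x, y, 1)


def euclidean(x, y):
    # Special case p = 2: straight-line distance.
    return minkowski(x, y, 2)


a, b = (0.0, 0.0), (3.0, 4.0)
print(manhattan(a, b))  # 7.0
print(euclidean(a, b))  # 5.0
```

Writing the two common metrics as special cases of one function makes it easy to try other values of p when experimenting with a clustering method.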
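18 Calculating Cluster Distances
(1) Single link: $\mathrm{dist}(k_i, k_j) = \min_{p,q} \mathrm{dist}(x_{i,p}, y_{j,q})$
(2) Complete link: $\mathrm{dist}(k_i, k_j) = \max_{p,q} \mathrm{dist}(x_{i,p}, y_{j,q})$
(3) Average distance: $\mathrm{dist}(k_i, k_j) = \mathrm{avg}_{p,q}\, \mathrm{dist}(x_{i,p}, y_{j,q})$
Here $x_{i,p}$ ranges over the points of cluster $k_i$ and $y_{j,q}$ over the points of cluster $k_j$.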
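The three linkage rules can be sketched as follows, assuming Euclidean distance between individual points (the slide does not fix the point-level metric). The helper names are hypothetical.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """All point-to-point Euclidean distances between two clusters."""
    return [math.dist(x, y) for x, y in product(cluster_a, cluster_b)]


def single_link(cluster_a, cluster_b):
    # Distance of the closest pair of points.
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_link(cluster_a, cluster_b):
    # Distance of the farthest pair of points.
    return max(pairwise_distances(cluster_a, cluster_b))


def average_link(cluster_a, cluster_b):
    # Mean over all cross-cluster pairs.
    d = pairwise_distances(cluster_a, cluster_b)
    return sum(d) / len(d)


k_i = [(0.0, 0.0), (1.0, 0.0)]
k_j = [(4.0, 0.0), (6.0, 0.0)]
print(single_link(k_i, k_j))    # 3.0
print(complete_link(k_i, k_j))  # 6.0
print(average_link(k_i, k_j))   # 4.5
```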
19 Centroid vs. Medoid
Centroid: the middle of a cluster C
$\mathrm{centroid} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad n = |C|$
- does not have to be one of the data points in the cluster
Medoid: the central point of a cluster C
- the data point that is "least dissimilar" from all of the other data points
- has to be one of the data points in the cluster
Centroid distance: $\mathrm{dist}(k_i, k_j) = \mathrm{dist}(\mathrm{centroid}_i, \mathrm{centroid}_j)$
Medoid distance: $\mathrm{dist}(k_i, k_j) = \mathrm{dist}(\mathrm{medoid}_i, \mathrm{medoid}_j)$
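A small sketch of the two definitions, assuming Euclidean distance as the dissimilarity and a made-up five-point cluster with one outlier. It illustrates that the centroid need not be a data point, while the medoid always is.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of the points; may not coincide with any data point."""
    n = len(cluster)
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / n for d in range(dims))


def medoid(cluster):
    """The data point with the smallest total distance to all other points."""
    return min(cluster, key=lambda c: sum(math.dist(c, p) for p in cluster))


# Four points close together plus one outlier (illustrative data).
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (10.0, 10.0)]
print(centroid(cluster))  # (3.2, 3.2), pulled towards the outlier
print(medoid(cluster))    # (2.0, 2.0), an actual member of the cluster
```

The outlier drags the centroid away from the dense group, whereas the medoid stays inside it.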
23 k-means
- very popular algorithm for clustering
- object = n-dimensional vector
- the user specifies k (the number of clusters)
- generic sketch:
  (1) pick k random vectors as centroids
  (2) assign each vector to its closest centroid; this forms the clusters
  (3) compute the centroid of each cluster
  (4) repeat from (2) until the clusters converge or a fixed maximum number of iterations is reached
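The generic sketch could be written roughly as below. This is a minimal illustration under a few assumptions that the slide does not state: Euclidean distance, initial centroids sampled from the data points, and an empty cluster simply keeping its previous centroid. The `init` and `seed` parameters are hypothetical conveniences for reproducibility.

```python
import math
import random


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples.

    init: optional list of k starting centroids (hypothetical parameter);
    if omitted, k distinct data points are sampled at random.
    """
    rng = random.Random(seed)
    # (1) pick k starting centroids
    centroids = list(init) if init is not None else rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # (2) assign every vector to its closest centroid
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # (4) assignments stopped changing: converged
        assignment = new_assignment
        # (3) recompute the centroid of each cluster
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2, init=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Calling `kmeans(points, k=2)` without `init` uses random starting points, so the cluster numbering, and on harder data the clusters themselves, can differ between seeds.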
24 k-means Algorithm
25 k-means in action (figure sequence; source: Introduction to Machine Learning, Xiaojin Zhu)
- The dataset. Input k=5
- Randomly pick 5 positions as initial cluster centers (not necessarily data points)
- Each point finds which cluster center it is closest to (very much like 1NN). The point belongs to that cluster.
- Each cluster computes its new centroid, based on which points belong to it
- Repeat until convergence (cluster centers no longer move)
- Figures: initial cluster centers, successive iterations, k-means stops
40 k-means Algorithm
42 Why does k-means converge?
- Whenever an assignment is changed, the sum of squared distances of the data points from their assigned cluster centers is reduced.
- Whenever a cluster center is moved, the sum of squared distances of the data points from their currently assigned cluster centers is reduced.
- If the assignments do not change in the assignment step, we have converged.
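The argument can be made explicit with a short derivation. The notation is assumed here and does not appear on the slides: $J$ is the sum of squared distances, $C_j$ the set of points in cluster $j$, $\mu_j$ its center, $\bar{x}_j$ the mean of $C_j$, and $a(x)$, $a'(x)$ the cluster index of $x$ before and after the assignment step.

```latex
% Objective: within-cluster sum of squared distances
J = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2

% Assignment step (centers fixed): every point moves to its nearest center,
% so the term it contributes to J cannot grow.
\lVert x - \mu_{a'(x)} \rVert^2
  = \min_{j} \lVert x - \mu_j \rVert^2
  \le \lVert x - \mu_{a(x)} \rVert^2

% Update step (assignments fixed): for any candidate center \mu of cluster j,
\sum_{x \in C_j} \lVert x - \mu \rVert^2
  = \sum_{x \in C_j} \lVert x - \bar{x}_j \rVert^2
  + |C_j| \, \lVert \bar{x}_j - \mu \rVert^2
% which is smallest exactly when \mu = \bar{x}_j, the centroid.
```

Neither step can increase $J$, $J$ is bounded below by zero, and a finite dataset has only finitely many partitions into k clusters, so with consistent tie-breaking the procedure must stop. The stopping point may be a local rather than a global minimum of $J$.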
43 k-means Convergence
(1) assign each point to its nearest centroid
(2) compute the centroid of each cluster
The algorithm terminates when neither (1) nor (2) results in a change of configuration.
45 Initial Centroids
- affect the final clusters (inter-cluster and intra-cluster distances)
- often chosen randomly
- clusters vary from one run to another
- one solution:
  (1) pick a random point $x_1$ from the dataset
  (2) find the point $x_2$ farthest from $x_1$ in the dataset
  (3) find $x_3$ farthest from the closer of $x_1$, $x_2$
  (4) pick k points like this and use them as starting cluster centroids for the k clusters
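The selection rule described on this slide can be sketched as below, assuming Euclidean distance. Each new center is the point whose distance to its closest already-chosen center is largest. The `first` and `seed` parameters are hypothetical additions so that the example is reproducible.

```python
import math
import random


def farthest_first(points, k, first=None, seed=0):
    """Choose k starting centroids by repeatedly taking the point that is
    farthest from its nearest already-chosen center."""
    rng = random.Random(seed)
    centers = [first if first is not None else rng.choice(points)]
    while len(centers) < k:
        # Score each point by the distance to the closer (closest) chosen center.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0), (0.0, 10.0)]
print(farthest_first(points, k=3, first=(0.0, 0.0)))
# [(0.0, 0.0), (10.0, 1.0), (0.0, 10.0)]
```

Because the rule favours extreme points, it can also pick outliers as starting centroids, which is worth keeping in mind on noisy data.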
46 k-means Properties
- unsupervised, non-deterministic and iterative
- there are always k clusters
- there is always at least one point in each cluster (assuming a cluster that becomes empty is re-seeded or otherwise handled)
- clusters are non-hierarchical and they do not overlap
47 Pros and Cons
Pros:
- fast, robust and easy to understand
- relatively efficient
- best results when data are well separated from each other
48 Pros and Cons
Cons:
- requires a priori specification of k
- unable to handle noisy data and outliers
  - the centroid is the average of the cluster members
  - an outlier can dominate the average computation
  - solution: k-medoids
- different initial partitions can result in different final clusters
More informationClustering Analysis Basics
Clustering Analysis Basics Ke Chen Reading: [Ch. 7, EA], [5., KPM] Outline Introduction Data Types and Representations Distance Measures Major Clustering Methodologies Summary Introduction Cluster: A collection/group
More informationCluster Analysis: Basic Concepts and Algorithms
Cluster Analysis: Basic Concepts and Algorithms Data Warehousing and Mining Lecture 10 by Hossen Asiful Mustafa What is Cluster Analysis? Finding groups of objects such that the objects in a group will
More informationClustering (Basic concepts and Algorithms) Entscheidungsunterstützungssysteme
Clustering (Basic concepts and Algorithms) Entscheidungsunterstützungssysteme Why do we need to find similarity? Similarity underlies many data science methods and solutions to business problems. Some
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 4th, 2014 Wolf-Tilo Balke and José Pinto Institut für Informationssysteme Technische Universität Braunschweig The Cluster
More informationCHAPTER 4 K-MEANS AND UCAM CLUSTERING ALGORITHM
CHAPTER 4 K-MEANS AND UCAM CLUSTERING 4.1 Introduction ALGORITHM Clustering has been used in a number of applications such as engineering, biology, medicine and data mining. The most popular clustering
More informationUnsupervised Learning
Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,
More informationData Mining Algorithms
for the original version: -JörgSander and Martin Ester - Jiawei Han and Micheline Kamber Data Management and Exploration Prof. Dr. Thomas Seidl Data Mining Algorithms Lecture Course with Tutorials Wintersemester
More informationCluster analysis. Agnieszka Nowak - Brzezinska
Cluster analysis Agnieszka Nowak - Brzezinska Outline of lecture What is cluster analysis? Clustering algorithms Measures of Cluster Validity What is Cluster Analysis? Finding groups of objects such that
More informationIntroduction to Computer Science
DM534 Introduction to Computer Science Clustering and Feature Spaces Richard Roettger: About Me Computer Science (Technical University of Munich and thesis at the ICSI at the University of California at
More informationClustering. Supervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationA COMPARATIVE STUDY ON K-MEANS AND HIERARCHICAL CLUSTERING
A COMPARATIVE STUDY ON K-MEANS AND HIERARCHICAL CLUSTERING Susan Tony Thomas PG. Student Pillai Institute of Information Technology, Engineering, Media Studies & Research New Panvel-410206 ABSTRACT Data
More informationPreprocessing DWML, /33
Preprocessing DWML, 2007 1/33 Preprocessing Before you can start on the actual data mining, the data may require some preprocessing: Attributes may be redundant. Values may be missing. The data contains
More informationCOSC 6339 Big Data Analytics. Fuzzy Clustering. Some slides based on a lecture by Prof. Shishir Shah. Edgar Gabriel Spring 2017.
COSC 6339 Big Data Analytics Fuzzy Clustering Some slides based on a lecture by Prof. Shishir Shah Edgar Gabriel Spring 217 Clustering Clustering is a technique for finding similarity groups in data, called
More informationUnderstanding Clustering Supervising the unsupervised
Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data
More informationCS Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 8: ADVANCED CLUSTERING (FUZZY AND CO -CLUSTERING) Review: Basic Cluster Analysis Methods (Chap. 10) Cluster Analysis: Basic Concepts
More informationFlat Clustering. Slides are mostly from Hinrich Schütze. March 27, 2017
Flat Clustering Slides are mostly from Hinrich Schütze March 7, 07 / 79 Overview Recap Clustering: Introduction 3 Clustering in IR 4 K-means 5 Evaluation 6 How many clusters? / 79 Outline Recap Clustering:
More informationArtificial Intelligence. Programming Styles
Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationClustering: Overview and K-means algorithm
Clustering: Overview and K-means algorithm Informal goal Given set of objects and measure of similarity between them, group similar objects together K-Means illustrations thanks to 2006 student Martin
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework
More information数据挖掘 Introduction to Data Mining
数据挖掘 Introduction to Data Mining Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities philfv8@yahoo.com Spring 2019 S8700113C 1 Introduction Last week: Association Analysis
More informationSupervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationStatistics 202: Data Mining. c Jonathan Taylor. Clustering Based in part on slides from textbook, slides of Susan Holmes.
Clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group will be similar (or
More information