Olmo S. Zavala Romero. Clustering Hierarchical Distance Group Dist. K-means. Center of Atmospheric Sciences, UNAM.


Center of Atmospheric Sciences, UNAM November 16, 2016

Cluster Analysis. Cluster analysis, or clustering, is the task of grouping a set of objects so that objects in the same group (called a cluster) are more similar, in some chosen sense, to each other than to objects in other groups (clusters). Cluster analysis is used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. https://en.wikipedia.org/wiki/cluster_analysis

Types of cluster models (Hierarchical). Typical cluster models include hierarchical clustering, which is based on distance connectivity. It comes in two forms.
Agglomerative (bottom up): each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
Divisive (top down): all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
https://en.wikipedia.org/wiki/_clustering https://upload.wikimedia.org/wikibooks/en/2/28/agglomerative_clustering_dendogram.png

Types of cluster models (Hierarchical). In order to decide which clusters should be combined (for agglomerative), or where a cluster should be split (for divisive), a measure of dissimilarity between sets of observations is required. Some commonly used metrics are:
Euclidean distance. $\|a - b\|_2 = \sqrt{\sum_i (a_i - b_i)^2}$
Squared Euclidean distance. $\|a - b\|_2^2 = \sum_i (a_i - b_i)^2$
Manhattan distance. $\|a - b\|_1 = \sum_i |a_i - b_i|$
Maximum distance. $\|a - b\|_\infty = \max_i |a_i - b_i|$
Mahalanobis distance. $\sqrt{(a - b)^T S^{-1} (a - b)}$ (S is the covariance matrix).
https://en.wikipedia.org/wiki/_clustering
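The metrics above can be sketched in a few lines of plain Python. This is only an illustration: the function names are made up for this example, and the Mahalanobis version assumes the inverse covariance matrix has already been computed and is passed in as a nested list (here called s_inv).

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    # Same as above, without the square root.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def maximum(a, b):
    # Largest absolute coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))

def mahalanobis(a, b, s_inv):
    # s_inv is assumed to be the inverse covariance matrix (nested lists).
    diff = [x - y for x, y in zip(a, b)]
    n = len(diff)
    weighted = [sum(s_inv[i][j] * diff[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))

a, d = [0, 0], [4, 2]
print(squared_euclidean(a, d), manhattan(a, d), maximum(a, d))  # 20 6 4
```

With the identity matrix as s_inv, the Mahalanobis distance reduces to the Euclidean distance, which is a handy sanity check.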

Linkage criteria. The linkage criterion determines the distance between sets of observations as a function of the pairwise distances between observations. Some commonly used linkage criteria are:
Maximum or complete-linkage. $\max\{d(a, b) : a \in A, b \in B\}$
Minimum or single-linkage. $\min\{d(a, b) : a \in A, b \in B\}$
Mean or average-linkage. $\frac{1}{|A||B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$
Centroid linkage. $\|c_a - c_b\|$, where $c_a$ and $c_b$ are the centroids of clusters A and B respectively.
https://en.wikipedia.org/wiki/_clustering
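As a rough sketch, each linkage criterion can be written as a function of two clusters A and B (lists of points) and a point-to-point distance d. The function names are illustrative, and using the same distance d between centroids is an assumption, since the slide does not say which norm the centroid linkage uses.

```python
def manhattan(a, b):
    # Point-to-point distance used in this illustration.
    return sum(abs(x - y) for x, y in zip(a, b))

def complete_linkage(A, B, d):
    # Largest pairwise distance between the two clusters.
    return max(d(a, b) for a in A for b in B)

def single_linkage(A, B, d):
    # Smallest pairwise distance between the two clusters.
    return min(d(a, b) for a in A for b in B)

def average_linkage(A, B, d):
    # Mean of all |A| * |B| pairwise distances.
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

def centroid(points):
    # Coordinate-wise mean of a cluster.
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def centroid_linkage(A, B, d):
    # Distance between the two cluster centroids (norm choice is an assumption).
    return d(centroid(A), centroid(B))

A = [[0, 0], [0, 1]]
B = [[10, 3], [4, 2]]
print(complete_linkage(A, B, manhattan))  # 13
print(single_linkage(A, B, manhattan))    # 5
print(average_linkage(A, B, manhattan))   # 9.0
print(centroid_linkage(A, B, manhattan))  # 9.0
```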

Example. Using complete linkage and the Manhattan distance, perform cluster analysis on the following data points: a = [0,0], b = [0,1], c = [10,3], d = [4,2].
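One way to check a hand-worked answer to this example is a small agglomerative loop. The sketch below is a naive illustration, not an efficient implementation: it repeatedly merges the two clusters with the smallest complete-linkage Manhattan distance and records each merge. The names agglomerate and points are chosen for this example only.

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def complete_linkage(A, B, points, dist):
    # A and B are tuples of point names; points maps names to coordinates.
    return max(dist(points[a], points[b]) for a in A for b in B)

def agglomerate(points, linkage, dist):
    """Return the merge history as (cluster_1, cluster_2, distance) tuples."""
    clusters = [(name,) for name in points]  # every point starts alone
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dij = linkage(clusters[i], clusters[j], points, dist)
                if best is None or dij < best[0]:
                    best = (dij, i, j)
        dij, i, j = best
        merges.append((clusters[i], clusters[j], dij))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

points = {"a": [0, 0], "b": [0, 1], "c": [10, 3], "d": [4, 2]}
for left, right, height in agglomerate(points, complete_linkage, manhattan):
    print(left, "+", right, "at distance", height)
```

For these four points the merges happen at distances 1 (a with b), 6 (d with {a, b}), and 13 (c with the rest), which are the heights one would draw in the dendrogram.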

Exercise. Using single linkage and the maximum distance, perform cluster analysis on the following data points: a = [0,0], b = [0,1], c = [10,3], d = [4,2].

Types of cluster models. Other types of cluster models:
Centroid models.
Distribution models. Clusters are modeled using statistical distributions.
Density models. Clusters are defined as connected dense regions. Examples: DBSCAN and OPTICS.
Graph-based models. Clusters are represented as subsets of nodes in a graph. Connected nodes belong to the same cluster.
Types of clustering:
Hard clustering. Each object either belongs to a cluster or does not.
Fuzzy clustering. Each object belongs to each cluster to a certain degree.

K-means clustering (James MacQueen, 1967) aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. This results in a partitioning of the data space into Voronoi cells. Given a set of observations $x_1, x_2, \dots, x_n$, where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into $k \le n$ sets $S = \{S_1, S_2, \dots, S_k\}$ so as to minimize the within-cluster sum of squares. In other words, its objective is to find:
$$\operatorname*{arg\,min}_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \|x - \mu_i\|^2 \qquad (1)$$
where $\mu_i$ is the mean of the points in $S_i$. https://en.wikipedia.org/wiki/_clustering
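To make objective (1) concrete, the sketch below evaluates the within-cluster sum of squares for two candidate partitions of the four points from the earlier example. The helper names are illustrative, and the two partitions are simply picked by hand for comparison.

```python
def mean(points):
    # Coordinate-wise mean of a list of points.
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def within_cluster_ss(partition):
    # Sum over clusters of squared distances from each point to its cluster mean.
    total = 0.0
    for cluster in partition:
        mu = mean(cluster)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mu)) for p in cluster)
    return total

a, b, c, d = [0, 0], [0, 1], [10, 3], [4, 2]
partition_1 = [[a, b, d], [c]]
partition_2 = [[a, b], [c, d]]
print(within_cluster_ss(partition_1))  # about 12.67
print(within_cluster_ss(partition_2))  # 19.0
```

Of these two candidates, the first has the smaller objective value, so it is the better partition in the k-means sense for k = 2.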

K-means algorithm
1. Randomly initialize the k cluster centers $\mu_i$.
2. Assign each observation to its closest cluster center:
$$S_i^{(t)} = \{ x_p : \|x_p - \mu_i^{(t)}\|^2 \le \|x_p - \mu_j^{(t)}\|^2 \ \ \forall j,\ 1 \le j \le k \} \qquad (2)$$
3. Update each cluster center to the mean of the observations belonging to that cluster:
$$\mu_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{x_j \in S_i^{(t)}} x_j \qquad (3)$$
4. Repeat steps 2 and 3 until the positions of the centers no longer change, or the change falls below a small tolerance.

Example. Program k-means in Python! Yeah!
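One possible starting point, using only the standard library, is sketched below. It follows the assign and update steps of the algorithm directly. Several choices here are assumptions rather than part of the slides: the initial centers are k randomly sampled observations, an empty cluster keeps its previous center, and the parameter names max_iter, tol, and seed are made up for this example.

```python
import random

def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def mean(points):
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def assign(data, centers):
    # Step 2: index of the closest center for every observation.
    return [min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
            for p in data]

def kmeans(data, k, max_iter=100, tol=1e-9, seed=0):
    rng = random.Random(seed)
    # Step 1 (assumed initialization): k distinct observations as centers.
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(max_iter):
        labels = assign(data, centers)
        # Step 3: move each center to the mean of its members.
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(data, labels) if lab == i]
            # Assumption: an empty cluster keeps its old center.
            new_centers.append(mean(members) if members else centers[i])
        shift = max(squared_distance(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        # Step 4: stop when the centers have essentially stopped moving.
        if shift <= tol:
            break
    return centers, assign(data, centers)

data = [[0, 0], [0, 1], [1, 0], [1, 1],
        [10, 10], [10, 11], [11, 10], [11, 11]]
centers, labels = kmeans(data, k=2)
print(sorted(centers))  # centers near [0.5, 0.5] and [10.5, 10.5]
print(labels)
```

Because the result depends on the random initialization, a common practice is to run the algorithm several times with different seeds and keep the partition with the smallest within-cluster sum of squares.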