An Introduction to Cluster Analysis
Zhaoxia Yu, Department of Statistics, Vice Chair of Undergraduate Affairs
zhaoxia@ics.uci.edu

What can you say about the figure?
[Scatterplot: signal T (x-axis) vs. signal C (y-axis); 1500 subjects, two measurements per subject.]

[The same scatterplot: signal T vs. signal C.]

[The same scatterplot with the three groups labeled: CC, CT, and TT.]

Cluster Analysis
- Seeks rules to group data: large between-cluster differences, small within-cluster differences.
- Exploratory: aims to understand/learn the unknown substructure of multivariate data.

Cluster Analysis vs. Classification
Cluster analysis:
- Data are unlabeled.
- The number of clusters is unknown.
- Unsupervised learning.
- Goal: find unknown structures.
Classification:
- The labels for the training data are known.
- The number of classes is known.
- Supervised learning.
- Goal: allocate new observations, whose labels are unknown, to one of the known classes.

The Iris Data
A famous data set that has been widely used in textbooks; collected by Edgar Anderson and made famous by R. A. Fisher.
Four features:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm

The Iris Data
Three types: Setosa, Versicolor, Virginica.

The Iris Data

Iris Setosa:
       Sepal L.  Sepal W.  Petal L.  Petal W.
 [1,]      5.1       3.5       1.4       0.2
 [2,]      4.9       3.0       1.4       0.2
 [3,]      4.7       3.2       1.3       0.2
 [4,]      4.6       3.1       1.5       0.2
 [5,]      5.0       3.6       1.4       0.2
 [6,]      5.4       3.9       1.7       0.4
 [7,]      4.6       3.4       1.4       0.3
 [8,]      5.0       3.4       1.5       0.2
 [9,]      4.4       2.9       1.4       0.2
  ...
[45,]      5.1       3.8       1.9       0.4
[46,]      4.8       3.0       1.4       0.3
[47,]      5.1       3.8       1.6       0.2
[48,]      4.6       3.2       1.4       0.2
[49,]      5.3       3.7       1.5       0.2
[50,]      5.0       3.3       1.4       0.2

Iris Versicolor:
       Sepal L.  Sepal W.  Petal L.  Petal W.
 [1,]      7.0       3.2       4.7       1.4
 [2,]      6.4       3.2       4.5       1.5
 [3,]      6.9       3.1       4.9       1.5
 [4,]      5.5       2.3       4.0       1.3
 [5,]      6.5       2.8       4.6       1.5
 [6,]      5.7       2.8       4.5       1.3
 [7,]      6.3       3.3       4.7       1.6
 [8,]      4.9       2.4       3.3       1.0
 [9,]      6.6       2.9       4.6       1.3
  ...
[45,]      5.6       2.7       4.2       1.3
[46,]      5.7       3.0       4.2       1.2
[47,]      5.7       2.9       4.2       1.3
[48,]      6.2       2.9       4.3       1.3
[49,]      5.1       2.5       3.0       1.1
[50,]      5.7       2.8       4.1       1.3

Iris Virginica:
       Sepal L.  Sepal W.  Petal L.  Petal W.
 [1,]      6.3       3.3       6.0       2.5
 [2,]      5.8       2.7       5.1       1.9
 [3,]      7.1       3.0       5.9       2.1
 [4,]      6.3       2.9       5.6       1.8
 [5,]      6.5       3.0       5.8       2.2
 [6,]      7.6       3.0       6.6       2.1
 [7,]      4.9       2.5       4.5       1.7
 [8,]      7.3       2.9       6.3       1.8
 [9,]      6.7       2.5       5.8       1.8
  ...
[45,]      6.7       3.3       5.7       2.5
[46,]      6.7       3.0       5.2       2.3
[47,]      6.3       2.5       5.0       1.9
[48,]      6.5       3.0       5.2       2.0
[49,]      6.2       3.4       5.4       2.3
[50,]      5.9       3.0       5.1       1.8

The Iris Data
[Scatterplot matrix of Sepal L., Sepal W., Petal L., and Petal W., with points colored by species: Setosa, Versicolor, Virginica.]

Clustering Methods
- Model-free (based on similarity measures):
  - Nonhierarchical clustering, e.g., K-means
  - Hierarchical clustering
- Model-based clustering

Model-Free Clustering
Nonhierarchical Clustering: K-Means

K-Means
- Assign each observation to the cluster with the nearest mean.
- "Nearest" is usually defined by Euclidean distance.

K-Means: Algorithm
Step 0: Preprocess the data; standardize if appropriate.
Step 1: Partition the observations into K initial clusters.
Step 2:
  2a (update step): Calculate the cluster centroids.
  2b (assignment step): Assign each observation to its nearest centroid.
Repeat Step 2 until the assignments no longer change. (A from-scratch sketch follows below.)
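To make the two-step loop concrete, here is a minimal from-scratch sketch in R. It is an illustration under the assumptions noted in the comments (random initial partition, Euclidean distance, empty clusters not handled), not the course's own code; in practice, base R's kmeans() is the standard tool.

    # A minimal K-means sketch (illustrative only)
    kmeans_simple <- function(x, K, max_iter = 100) {
      x <- as.matrix(x)
      # Step 1: random initial partition into K clusters
      assign <- sample(rep(1:K, length.out = nrow(x)))
      for (iter in 1:max_iter) {
        # Step 2a (update): centroid of each cluster
        # (empty clusters are not handled in this sketch)
        centers <- t(sapply(1:K, function(k) colMeans(x[assign == k, , drop = FALSE])))
        # Step 2b (assignment): move each observation to its nearest centroid
        d <- as.matrix(dist(rbind(centers, x)))[-(1:K), 1:K]  # n-by-K Euclidean distances
        new_assign <- max.col(-d)                             # index of the nearest centroid
        if (all(new_assign == assign)) break                  # no changes => converged
        assign <- new_assign
      }
      list(cluster = assign, centers = centers)
    }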

[Figure: the progress of the K-means algorithm, from An Introduction to Statistical Learning.]

Remarks
- Before convergence, each step is guaranteed to decrease the within-cluster sum of squares objective.
- The algorithm converges within a finite number of steps, but possibly only to a local minimum.
- Remedy: rerun with several different random initial values and keep the best solution.

Different Initial Values
[Figure: K-means solutions obtained from different random initializations, from An Introduction to Statistical Learning.]

Example: Cluster Analysis of Iris Data (Petal L & W)
- Pretend that the iris types of the observations are unknown => cluster analysis.
- As an example, and for illustration purposes, we will use only petal length and petal width.
- Choose K = 3 and run K-means. (A short R session follows below.)
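One way to run this example in R might be the following; the column names are those of the built-in iris data, and nstart = 25 (25 random starts, keeping the best solution, per the remarks above) is an illustrative choice:

    # K-means on petal length and width, pretending the species are unknown
    x <- iris[, c("Petal.Length", "Petal.Width")]
    fit <- kmeans(x, centers = 3, nstart = 25)  # 25 random starts; keep the best
    table(fit$cluster, iris$Species)            # compare clusters with the true species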

K-Means Clustering: Iris (Petal L & W)
Note: the animation in the figure doesn't work properly on Mac.

K-Means Clustering: Iris (Petal L & W)

K-Means Clustering: Iris (Petal L & W)
[Nine panels showing the K-means cluster assignments at iterations 1 through 9; points labeled Setosa, Versicolor, Virginica.]

K-Means Clustering: Iris (Petal L & W)
[Figure: cluster assignments; points labeled Setosa, Versicolor, Virginica.]
Note: the animation in the figure doesn't work properly on Mac.

Model-Free Clustering: Hierarchical Clustering

Hierarchical Clustering
- The number of clusters is not required in advance.
- Gives a tree-based representation of the observations: a dendrogram.
- Each leaf represents an observation.
- Leaves that are similar to each other are fused into branches.
- Branches (or leaves and branches) that are similar to each other are fused into larger branches.

[Dendrogram of the iris data, with leaves labeled Setosa, Versicolor, and Virginica.]

Hierarchical Clustering
To grow a tree, we need to define dissimilarities (distances) between leaves/branches:
- Two leaves: easy; use a dissimilarity measure.
- A leaf and a branch: there are different options.
- Two branches: as with a leaf and a branch, there are different options.

Distance between Two Branches/Clusters
- Single linkage: the smallest pairwise distance between the two clusters.
- Complete linkage: the largest pairwise distance.
- Average linkage: the average of all pairwise distances.
- Many other options! (See the R sketch below.)
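As a sketch, the linkages above can be compared in base R with hclust(); using petal length and width and cutting the tree into 3 clusters are illustrative choices:

    d <- dist(iris[, 3:4])                        # Euclidean dissimilarities between leaves
    hc_single   <- hclust(d, method = "single")   # min pairwise distance between clusters
    hc_complete <- hclust(d, method = "complete") # max pairwise distance
    hc_average  <- hclust(d, method = "average")  # mean pairwise distance
    plot(hc_complete, labels = FALSE)             # draw the dendrogram
    groups <- cutree(hc_complete, k = 3)          # cut the tree into 3 clusters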

Model-Based Clustering

Model-Based Clustering: Mixture Model
Consider a random variable X. We say it follows a mixture of K distributions if its density can be represented using K component densities:
    f(x) = p_1 f_1(x) + p_2 f_2(x) + ... + p_K f_K(x).
The weights p_k, k = 1, ..., K, are nonnegative numbers that add up to 1.
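For intuition, a minimal R sketch of drawing from a two-component Gaussian mixture: first sample the latent component label with probability p, then draw from that component. The values p = 0.5, N(1,1), and N(3,1) are borrowed from the simulated example later in these slides:

    set.seed(1)
    n <- 20
    p <- 0.5                                             # mixing weight for component 1
    z <- rbinom(n, 1, p)                                 # latent component label (1 or 0)
    x <- rnorm(n, mean = ifelse(z == 1, 1, 3), sd = 1)   # N(1,1) if z = 1, N(3,1) if z = 0
    x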

Cluster Analysis Based on a Mixture Model
(I present a frequentist version.)
1. Choose an appropriate model, e.g., a Gaussian mixture model with K = 2 clusters.
2. Write down the likelihood function.
3. Find the maximum likelihood estimates of the parameters.
4. Calculate Pr(cluster k | observation x_i) for i = 1, ..., n and k = 1, 2; by Bayes' rule,
       Pr(cluster k | x_i) = p_k f_k(x_i) / (p_1 f_1(x_i) + ... + p_K f_K(x_i)).

The Maximum Likelihood Estimates (MLEs) of the Parameters
An easy-to-implement algorithm for finding the MLEs is the Expectation-Maximization (EM) algorithm:
- Initialize the parameters.
- E step: calculate a conditional expectation, where "conditional" means conditional on the current estimates of the parameters.
  - This step involves calculating Pr(cluster k | observation x_i, current parameter estimates) for k = 1, ..., K and i = 1, ..., n.
  - This step is similar to the assignment step in a K-means algorithm.

The Maximum Likelihood Estimates (MLEs) of the Parameters
- M step: find the parameter values that maximize the conditional expectation calculated in the E step; this step updates the parameter values.
- Repeat the E and M steps until convergence. (A minimal implementation is sketched below.)
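A minimal EM sketch in R for a two-component univariate Gaussian mixture, following the E and M steps above. The function name em_gmm2, the median-split initialization, and the stopping rule are my own illustrative choices; in practice, see the MCLUST package below.

    em_gmm2 <- function(x, max_iter = 200, tol = 1e-8) {
      # Crude initialization: split the data at the median
      mu    <- c(mean(x[x <= median(x)]), mean(x[x > median(x)]))
      sigma <- c(sd(x), sd(x))
      p     <- 0.5
      ll_old <- -Inf
      for (iter in 1:max_iter) {
        # E step: responsibilities Pr(cluster 1 | x_i, current parameters)
        d1 <- p * dnorm(x, mu[1], sigma[1])
        d2 <- (1 - p) * dnorm(x, mu[2], sigma[2])
        r  <- d1 / (d1 + d2)
        # M step: weighted MLEs given the responsibilities
        p     <- mean(r)
        mu    <- c(sum(r * x) / sum(r), sum((1 - r) * x) / sum(1 - r))
        sigma <- c(sqrt(sum(r * (x - mu[1])^2) / sum(r)),
                   sqrt(sum((1 - r) * (x - mu[2])^2) / sum(1 - r)))
        # Log-likelihood at the parameters used in this E step; stop when it stalls
        ll <- sum(log(d1 + d2))
        if (abs(ll - ll_old) < tol) break
        ll_old <- ll
      }
      list(p = p, mu = mu, sigma = sigma, posterior = cbind(r, 1 - r))
    }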

EM vs. K-Means
EM:
- Step 1: initialization.
- E step: calculate conditional probabilities.
- M step: find optimal values for the parameters.
- Repeat the E and M steps until convergence.
- Allows clusters to have different shapes.
K-means:
- Step 1: initialization.
- Step 2a: guess the cluster memberships.
- Step 2b: find the cluster centers.
- Repeat 2a-2b until convergence.

Example: Gaussian Mixture Model
Observed data (simulated from two normal distributions):
    0.37 1.18 0.16 2.60 1.33 0.18 1.49 1.74 3.58 2.69
    4.51 3.39 2.38 0.79 4.12 2.96 2.98 3.94 3.82 3.59
Assuming K = 2. Parameters: μ1, μ0, σ1, σ0, p. (A worked call is sketched below.)
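Feeding these 20 observations into the EM sketch from the previous slide might look like this; the exact estimates depend on the initialization:

    x <- c(0.37, 1.18, 0.16, 2.60, 1.33, 0.18, 1.49, 1.74, 3.58, 2.69,
           4.51, 3.39, 2.38, 0.79, 4.12, 2.96, 2.98, 3.94, 3.82, 3.59)
    fit <- em_gmm2(x)          # EM sketch defined above
    fit$mu; fit$sigma; fit$p   # estimated means, sds, and mixing weight
    fit$posterior              # Pr(cluster k | x_i) for each observation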

Example: Simulated Data
Group 1 ~ N(1, 1); Group 2 ~ N(3, 1).
[Animated figure.] Note: the animation in the figure doesn't work properly on Mac.

Example: Cluster Analysis of Iris Data Using Petal Length
[Animated figure; points labeled Setosa and Versicolor.]
Note: the animation in the figure doesn't work properly on Mac.

R Package: MCLUST
- Developed by Adrian Raftery and colleagues.
- Gaussian mixture models fit by EM.
- Clustering, classification, density estimation.
- Please try it out! (A minimal session is sketched below.)
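A minimal mclust session might look like the following; fitting all four iris measurements with G = 3 components is an illustrative choice matching the three species:

    library(mclust)                    # install.packages("mclust") if needed
    fit <- Mclust(iris[, 1:4], G = 3)  # Gaussian mixture with 3 components, fit by EM
    summary(fit)                       # chosen covariance model, log-likelihood, etc.
    table(fit$classification, iris$Species)  # compare clusters with the true species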

Cluster Analysis for Multidimensional Data

Multidimensional Data
- Human faces, images
- 3D objects
- Text documents
- Brain imaging

Whole-Brain Connectivity
[Grid of connectivity matrices: rows are subjects 1-4; columns are scans task 1-3 and rest 1-3.]

Brain Connectivity vs. Fingerprint
[Figure indexed by subject ID.]

[Figure: the six scans (task 1-3, rest 1-3).]

Some Technical Details?