CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen

Similar documents
Clustering Lecture 5: Mixture Model

Distance-based Methods: Drawbacks

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm

Unsupervised Learning : Clustering

Introduction to Mobile Robotics

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A

Clustering CS 550: Machine Learning

University of Florida CISE department Gator Engineering. Clustering Part 2

COMP 465: Data Mining Still More on Clustering

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample

Biclustering for Microarray Data: A Short and Comprehensive Tutorial


CS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Gene Clustering & Classification

Biclustering with δ-pcluster John Tantalo. 1. Introduction

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Slides From Lecture Notes for Chapter 8. Introduction to Data Mining

Statistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1

ALTERNATIVE METHODS FOR CLUSTERING

ECLT 5810 Clustering

Unsupervised Learning. Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team

Mixture Models and the EM Algorithm

IBL and clustering. Relationship of IBL with CBR

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING

Data Mining. Clustering. Hamid Beigy. Sharif University of Technology. Fall 1394

A Review on Cluster Based Approach in Data Mining

Clustering in Data Mining

ECLT 5810 Clustering

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining, 2 nd Edition

Methods for Intelligent Systems

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

Unsupervised Learning

Expectation Maximization!

Clustering in Ratemaking: Applications in Territories Clustering

Clustering: Classic Methods and Modern Views

Data Mining: Concepts and Techniques. Chapter March 8, 2007 Data Mining: Concepts and Techniques 1

Clustering Part 4 DBSCAN

Understanding Clustering Supervising the unsupervised

Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University

Segmentation: Clustering, Graph Cut and EM

CHAPTER 4: CLUSTER ANALYSIS

Kapitel 4: Clustering

COMS 4771 Clustering. Nakul Verma

CSE 5243 INTRO. TO DATA MINING

10701 Machine Learning. Clustering

Clustering Lecture 3: Hierarchical Methods

University of Florida CISE department Gator Engineering. Clustering Part 4

Data Mining 4. Cluster Analysis

k-means demo Administrative Machine learning: Unsupervised learning" Assignment 5 out

Chapter VIII.3: Hierarchical Clustering

STATS306B STATS306B. Clustering. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010

Unsupervised Learning and Clustering

Working with Unlabeled Data Clustering Analysis. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing

CS570: Introduction to Data Mining

What is Cluster Analysis? COMP 465: Data Mining Clustering Basics. Applications of Cluster Analysis. Clustering: Application Examples 3/17/2015

Lecture on Modeling Tools for Clustering & Regression

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

INF4820. Clustering. Erik Velldal. Nov. 17, University of Oslo. Erik Velldal INF / 22

Knowledge Discovery in Databases

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme

Clustering algorithms

Unsupervised: no target value to predict

Cluster Analysis. Ying Shen, SSE, Tongji University

Data Mining: Concepts and Techniques. Chapter 11. (3 rd ed.)

Supervised vs. Unsupervised Learning

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Big Data SONY Håkan Jonsson Vedran Sekara

Note Set 4: Finite Mixture Models and the EM Algorithm

Expectation Maximization (EM) and Gaussian Mixture Models

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

Unsupervised Learning and Clustering

[7.3, EA], [9.1, CMB]

Olmo S. Zavala Romero. Clustering Hierarchical Distance Group Dist. K-means. Center of Atmospheric Sciences, UNAM.

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

K-means clustering Based in part on slides from textbook, slides of Susan Holmes. December 2, Statistics 202: Data Mining.

Biclustering Algorithms for Gene Expression Analysis

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018)

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

Biclustering Bioinformatics Data Sets. A Possibilistic Approach

Gene expression & Clustering (Chapter 10)

Network Traffic Measurements and Analysis

SGN (4 cr) Chapter 11

DATA MINING - 1DL105, 1Dl111. An introductory class in data mining

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Chapter 6 Continued: Partitioning Methods

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/10/2017)

Lecture-17: Clustering with K-Means (Contd: DT + Random Forest)

On Mining Micro-array data by Order-Preserving Submatrix

Cluster analysis formalism, algorithms. Department of Cybernetics, Czech Technical University in Prague.

Clustering Basic Concepts and Algorithms 1

Cluster analysis. Agnieszka Nowak - Brzezinska

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Machine Learning (BSMC-GA 4439) Wenke Liu

Clustering from Data Streams

CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 8: ADVANCED CLUSTERING (FUZZY AND CO-CLUSTERING)

Review: Basic Cluster Analysis Methods (Chap. 10)

Cluster Analysis: Basic Concepts
- Group data so that object similarity is high within clusters but low across clusters

Partitioning Methods
- k-means and k-medoids algorithms and their refinements

Hierarchical Methods
- Agglomerative and divisive methods, BIRCH, Chameleon

Density-Based Methods
- DBSCAN, OPTICS, and DENCLUE

Grid-Based Methods
- STING and CLIQUE (subspace clustering)

Evaluation of Clustering
- Assess clustering tendency, determine the number of clusters, and measure clustering quality

Outline of Advanced Clustering Analysis

Probability Model-Based Clustering
- Each object may take a probability to belong to a cluster

Clustering High-Dimensional Data
- Curse of dimensionality: difficulty of distance measures in high-dimensional space

Clustering Graphs and Network Data
- Similarity measurement and clustering methods for graphs and networks

Clustering with Constraints
- Cluster analysis under different kinds of constraints, e.g., those arising from background knowledge or the spatial distribution of the objects

Chapter 11. Cluster Analysis: Advanced Methods
- Probability Model-Based Clustering
- Clustering High-Dimensional Data
- Clustering Graphs and Network Data
- Clustering with Constraints
- Summary

Fuzzy Set and Fuzzy Cluster
- Clustering methods discussed so far: every data object is assigned to exactly one cluster
- Some applications may call for fuzzy or soft cluster assignment
  - Ex. An e-game could belong to both entertainment and software
- Methods: fuzzy clusters and probabilistic model-based clusters
- Fuzzy cluster: a fuzzy set S, given by a membership function F_S: X -> [0, 1] (each value between 0 and 1)
  - Example: the popularity of cameras can be defined as a fuzzy mapping; e.g., A(0.05), B(1), C(0.86), D(0.27)
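A minimal sketch of the fuzzy-set idea in Python, using the camera popularity scores from the example above (the dictionary layout is my own, not from the slide):

```python
# A fuzzy set assigns each object a membership degree in [0, 1]
# rather than a crisp in/out decision. Scores from the slide's example.
popularity = {"A": 0.05, "B": 1.0, "C": 0.86, "D": 0.27}

def membership(camera):
    """F_S: X -> [0, 1]: the degree to which a camera is 'popular'."""
    return popularity.get(camera, 0.0)  # objects outside X get degree 0
```

A crisp set would force each camera to be either in or out of the "popular" set; the fuzzy set retains the degree instead.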

Fuzzy (Soft) Clustering

Example: Let cluster features be
- C_1: digital camera and lens
- C_2: computer

Fuzzy clustering: k fuzzy clusters C_1, ..., C_k, represented as a partition matrix M = [w_ij]
- P1: for each object o_i and cluster C_j, 0 <= w_ij <= 1 (fuzzy set)
- P2: for each object o_i, sum_{j=1..k} w_ij = 1 (equal participation in the clustering)
- P3: for each cluster C_j, 0 < sum_{i=1..n} w_ij < n (ensures there is no empty cluster)

Let c_1, ..., c_k be the centers of the k clusters
- For an object o_i, the sum of the squared error (SSE), where p > 1 is a parameter:
  SSE(o_i) = sum_{j=1..k} w_ij^p dist(o_i, c_j)^2
- For a cluster C_j, the SSE:
  SSE(C_j) = sum_{i=1..n} w_ij^p dist(o_i, c_j)^2
- To measure how well a clustering fits the data:
  SSE(C) = sum_{i=1..n} sum_{j=1..k} w_ij^p dist(o_i, c_j)^2
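The partition-matrix properties and the SSE objective above can be sketched as a standard fuzzy c-means iteration. This is a hedged illustration, not code from the slides; the membership update (inverse-distance weighting with fuzzifier p) is the usual fuzzy c-means rule:

```python
import numpy as np

def fuzzy_c_means(X, k, p=2.0, iters=50, seed=0):
    """Sketch of fuzzy c-means on points X (n x d) with fuzzifier p > 1.
    Returns the partition matrix W (n x k), centers C (k x d), and SSE."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    W = rng.random((len(X), k))
    W /= W.sum(axis=1, keepdims=True)            # P2: rows sum to 1
    for _ in range(iters):
        Wp = W ** p
        C = Wp.T @ X / Wp.sum(axis=0)[:, None]   # M-step: weighted centers
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
        W = 1.0 / d ** (2.0 / (p - 1.0))         # E-step: new memberships
        W /= W.sum(axis=1, keepdims=True)
    return W, C, float((W ** p * d ** 2).sum())  # SSE(C) over all i, j

# Two well-separated 1-d groups:
X = [[0.0], [0.5], [9.0], [10.0]]
W, C, sse = fuzzy_c_means(X, k=2)
```

With well-separated data the centers land near the group means, and every row of W sums to 1 as property P2 requires.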

The EM (Expectation Maximization) Algorithm

The k-means algorithm has two steps at each iteration:
- Expectation step (E-step): given the current cluster centers, each object is assigned to the cluster whose center is closest to it; an object is expected to belong to the closest cluster
- Maximization step (M-step): given the cluster assignment, for each cluster the algorithm adjusts the center so that the sum of the distances from the objects assigned to this cluster to the new center is minimized

The EM algorithm: a framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models
- E-step: assign objects to clusters according to the current fuzzy clustering or parameters of the probabilistic clusters
- M-step: find the new clustering or parameters that minimize the sum of squared error (SSE) or maximize the expected likelihood
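The two k-means steps described above can be written down directly. A minimal sketch (initial centers are passed in explicitly rather than chosen randomly, which is my simplification):

```python
import numpy as np

def kmeans(X, centers, iters=20):
    """The two steps above, iterated: the E-step assigns each object to
    the nearest center; the M-step moves each center to the mean of its
    assigned objects."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(centers, dtype=float)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        labels = d.argmin(axis=1)                 # E-step: closest center
        for j in range(len(C)):                   # M-step: recompute centers
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return labels, C

labels, C = kmeans([[0.0], [1.0], [10.0], [11.0]], centers=[[0.0], [10.0]])
```

The mean minimizes the sum of squared distances within a cluster, which is why the M-step is a simple average here.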

Fuzzy Clustering Using the EM Algorithm
- Initially, let c_1 = a and c_2 = b
- 1st E-step: assign each object o to the clusters with membership weights w (compute the partition matrix)
- 1st M-step: recalculate the centroids according to the partition matrix, minimizing the sum of squared error (SSE)
- Iteratively calculate these steps until the cluster centers converge or the change is small enough

Probabilistic Model-Based Clustering
- Cluster analysis is to find hidden categories
- A hidden category (i.e., probabilistic cluster) is a distribution over the data space, which can be mathematically represented using a probability density function (or distribution function)
  - Ex. Two categories for digital cameras sold: consumer line vs. professional line; density functions f_1, f_2 for C_1, C_2 are obtained by probabilistic clustering
- A mixture model assumes that a set of observed objects is a mixture of instances from multiple probabilistic clusters, and conceptually each observed object is generated independently
- Our task: infer a set of k probabilistic clusters that is most likely to generate D using the above data generation process

Model-Based Clustering

A set C of k probabilistic clusters C_1, ..., C_k with probability density functions f_1, ..., f_k, respectively, and their probabilities ω_1, ..., ω_k
- Probability of an object o generated by cluster C_j: ω_j f_j(o)
- Probability of o generated by the set of clusters C: P(o | C) = sum_{j=1..k} ω_j f_j(o)
- Since objects are assumed to be generated independently, for a data set D = {o_1, ..., o_n} we have:
  P(D | C) = prod_{i=1..n} P(o_i | C)
- Task: find a set C of k probabilistic clusters such that P(D | C) is maximized
- However, maximizing P(D | C) is often intractable, since the probability density function of a cluster can take an arbitrarily complicated form
- To make it computationally feasible (as a compromise), assume the probability density functions are parameterized distributions

Univariate Gaussian Mixture Model

Let O = {o_1, ..., o_n} (n observed objects), Θ = {θ_1, ..., θ_k} (parameters of the k distributions), and let P_j(o_i | θ_j) be the probability that o_i is generated from the j-th distribution using parameter θ_j. We have:
  P(o_i | Θ) = sum_{j=1..k} ω_j P_j(o_i | θ_j),  P(O | Θ) = prod_{i=1..n} P(o_i | Θ)

Univariate Gaussian mixture model
- Assume the probability density function of each cluster follows a 1-d Gaussian distribution. Suppose there are k clusters
- The probability density function of cluster j is centered at μ_j with standard deviation σ_j; with θ_j = (μ_j, σ_j), we have:
  P_j(o_i | θ_j) = (1 / (sqrt(2π) σ_j)) exp(-(o_i - μ_j)^2 / (2 σ_j^2))
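The univariate mixture density can be transcribed into code almost verbatim. A minimal sketch (the mixture weights ω_j are passed explicitly; the example parameters are hypothetical):

```python
import math

def gaussian(o, mu, sigma):
    """P_j(o | theta_j): 1-d Gaussian density with theta_j = (mu_j, sigma_j)."""
    return math.exp(-(o - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def mixture_density(o, weights, params):
    """P(o | Theta) = sum_j w_j * P_j(o | theta_j) over the k components."""
    return sum(w * gaussian(o, mu, s) for w, (mu, s) in zip(weights, params))

# Two equally weighted clusters centered at 0 and 10:
p = mixture_density(0.0, [0.5, 0.5], [(0.0, 1.0), (10.0, 1.0)])
```

With a single component of weight 1, the mixture density reduces to the plain Gaussian density, which is a quick sanity check.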

Computing Mixture Models with EM
- Given n objects O = {o_1, ..., o_n}, we want to mine a set of parameters Θ = {θ_1, ..., θ_k} such that P(O | Θ) is maximized, where θ_j = (μ_j, σ_j) are the mean and standard deviation of the j-th univariate Gaussian distribution
- We initially assign random values to the parameters θ_j, then iteratively conduct the E- and M-steps until convergence or until the change is sufficiently small
- At the E-step, for each object o_i, calculate the probability that o_i belongs to each distribution:
  P(Θ_j | o_i, Θ) = P(o_i | θ_j) / sum_{l=1..k} P(o_i | θ_l)
- At the M-step, adjust the parameters θ_j = (μ_j, σ_j) so that the expected likelihood P(O | Θ) is maximized:
  μ_j = sum_{i=1..n} o_i P(Θ_j | o_i, Θ) / sum_{i=1..n} P(Θ_j | o_i, Θ)
  σ_j = sqrt( sum_{i=1..n} P(Θ_j | o_i, Θ) (o_i - μ_j)^2 / sum_{i=1..n} P(Θ_j | o_i, Θ) )
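The E- and M-steps above can be combined into a short loop for a 1-d Gaussian mixture. This is a hedged sketch under an equal-component-weight assumption; initializing the means by spreading them over the data range is my own choice, not the slides':

```python
import numpy as np

def em_gmm_1d(O, k=2, iters=100):
    """EM for a univariate Gaussian mixture with equal component weights.
    Returns the estimated means mu_j and standard deviations sigma_j."""
    O = np.asarray(O, dtype=float)
    mu = np.linspace(O.min(), O.max(), k)      # spread the initial means
    sigma = np.full(k, O.std() + 1e-6)
    for _ in range(iters):
        # E-step: P(Theta_j | o_i, Theta), proportional to P(o_i | theta_j)
        dens = np.exp(-(O[:, None] - mu) ** 2 / (2 * sigma ** 2)) / sigma
        R = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate theta_j = (mu_j, sigma_j) from responsibilities
        Nj = R.sum(axis=0)
        mu = (R * O[:, None]).sum(axis=0) / Nj
        sigma = np.sqrt((R * (O[:, None] - mu) ** 2).sum(axis=0) / Nj) + 1e-6
    return mu, sigma

mu, sigma = em_gmm_1d([-0.5, 0.0, 0.5, 9.5, 10.0, 10.5], k=2)
```

The normalizing constant 1/sqrt(2π) cancels in the E-step ratio, so it is omitted from `dens`; the small additive constants guard against a degenerate zero variance.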

Advantages and Disadvantages of Mixture Models

Strengths
- Mixture models are more general than partitioning and fuzzy clustering
- Clusters can be characterized by a small number of parameters
- The results may satisfy the statistical assumptions of the generative models

Weaknesses
- Converges to a local optimum (mitigation: run multiple times with random initialization)
- Computationally expensive if the number of distributions is large
- Needs large data sets
- Hard to estimate the number of clusters

Bi-Clustering Methods

Bi-clustering: cluster both objects and attributes simultaneously (treat objects and attributes in a symmetric way)

Four requirements:
- Only a small set of objects participate in a cluster
- A cluster only involves a small number of attributes
- An object may participate in multiple clusters, or in no cluster at all
- An attribute may be involved in multiple clusters, or in no cluster at all

Examples:
- Ex. 1. Gene expression or microarray data: a gene-sample/condition matrix; each element in the matrix, a real number, records the expression level of a gene under a specific condition
- Ex. 2. Clustering customers and products: another bi-clustering problem

Types of Bi-clusters
- Let A = {a_1, ..., a_n} be a set of genes and B = {b_1, ..., b_m} a set of conditions
- A bi-cluster: a submatrix (over a gene subset I and condition subset J) where the genes and conditions follow consistent patterns
- Four types of bi-clusters (ideal cases):
  - Bi-clusters with constant values: for any i in I and j in J, e_ij = c
  - Bi-clusters with constant values on rows: e_ij = c + α_i (similarly, constant values on columns)
  - Bi-clusters with coherent values (a.k.a. pattern-based clusters): e_ij = c + α_i + β_j
  - Bi-clusters with coherent evolutions on rows: (e_{i1,j1} - e_{i1,j2})(e_{i2,j1} - e_{i2,j2}) >= 0, i.e., we are only interested in the up- or down-regulated changes across genes or conditions, without constraining the exact values
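The coherent-evolution condition can be checked mechanically over all row and column pairs. A sketch (my own transcription of the inequality, not code from the slides):

```python
import numpy as np

def coherent_evolution(E):
    """Check 'coherent evolutions on rows': for every pair of rows and
    pair of columns, (e_{i1,j1}-e_{i1,j2})(e_{i2,j1}-e_{i2,j2}) >= 0,
    i.e., all rows move up or down together across conditions."""
    E = np.asarray(E, dtype=float)
    # diffs[i, j, l] = sign(e_ij - e_il): each row's up/down pattern
    diffs = np.sign(E[:, :, None] - E[:, None, :])
    # incoherent iff some column pair has opposite signs in two rows
    return not np.any((diffs.max(axis=0) > 0) & (diffs.min(axis=0) < 0))

print(coherent_evolution([[1, 5, 2], [3, 9, 4]]))  # True: same up/down pattern
print(coherent_evolution([[1, 5], [9, 4]]))        # False: opposite trends
```

Note that the exact values never matter, only the signs of the within-row differences, which is exactly what the inequality expresses.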

Bi-Clustering Methods

Real-world data is noisy: try to find approximate bi-clusters

Methods: optimization-based methods vs. enumeration methods
- Optimization-based methods
  - Try to find one submatrix at a time that achieves the best significance as a bi-cluster
  - Due to the computational cost, greedy search is employed to find locally optimal bi-clusters
  - Ex. δ-cluster algorithm (Cheng and Church, ISMB 2000)
- Enumeration methods
  - Use a tolerance threshold to specify the degree of noise allowed in the bi-clusters to be mined
  - Then try to enumerate all submatrices that satisfy the requirements as bi-clusters
  - Ex. δ-pCluster algorithm (H. Wang et al., SIGMOD 2002; MaPle: Pei et al., ICDM 2003)
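Optimization-based methods in the style of Cheng and Church score a candidate submatrix by its mean squared residue. A sketch of that H score (my own transcription of the standard definition, not code from the slides):

```python
import numpy as np

def mean_squared_residue(E):
    """H score of a submatrix: the mean squared residue left after
    removing row means, column means, and the overall mean. A submatrix
    qualifies as a delta-bi-cluster when H(E) <= delta."""
    E = np.asarray(E, dtype=float)
    resid = E - E.mean(axis=1, keepdims=True) - E.mean(axis=0) + E.mean()
    return float((resid ** 2).mean())

# A perfect coherent-values (shift) pattern has residue 0; noise raises it.
perfect = [[1.0, 3.0], [2.0, 4.0]]
noisy = [[1.0, 3.5], [2.0, 4.0]]
```

A greedy δ-cluster search repeatedly deletes or adds the row or column that most improves this score until H drops below the tolerance δ.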

Bi-Clustering for Micro-Array Data Analysis
- Left figure: micro-array raw data shows 3 genes and their values in a multi-dimensional space; it is difficult to find their patterns
- Right two figures: some subsets of dimensions form nice shift and scaling patterns
- No globally defined similarity/distance measure
- Clusters may not be exclusive: an object can appear in multiple clusters