Mixture models and clustering


Lecture topics: Mixture models and clustering, k-means; distance and clustering.

Mixture models and clustering

We have so far used mixture models as flexible ways of constructing probability models for prediction tasks. The motivation behind the mixture model was that the available data may include unobserved (sub)groups, and by incorporating such structure in the model we could obtain more accurate predictions. We are also interested in uncovering that group structure. This is a clustering problem. For example, in the case of modeling exam scores it would be useful to understand what types of students there are. In a biological context, it would be useful to uncover which genes are active (expressed) in which cell types when the measurements are from tissue samples involving multiple cell types in unknown quantities.

Clustering problems are ubiquitous. There are many different types of clustering algorithms. Some generate a series of nested clusters by merging simple clusters into larger ones (hierarchical agglomerative clustering), while others try to find a pre-specified number of clusters that best capture the data (e.g., k-means). Which algorithm is the most appropriate to use depends in part on what we are looking for, so it is very useful to know more than one clustering method.

Mixture models as generative models require us to articulate the type of clusters or subgroups we are looking to identify. The simplest type of clusters we could look for are spherical Gaussian clusters, i.e., we would be estimating Gaussian mixtures of the form

P(x; θ, m) = Σ_{j=1}^{m} P(j) N(x; μ_j, σ_j² I)   (1)

where the parameters θ include {P(j)}, {μ_j}, and {σ_j²}. Note that we are estimating the mixture models with a different objective in mind: we are more interested in finding where the clusters are than in how good the mixture model is as a generative model.

There are many questions to resolve. For example, how many such spherical Gaussian clusters are there in the data? This is a model selection problem. If such clusters exist, do we have enough data to identify them? If we have enough data, can we hope to find the clusters via the EM algorithm? Is our approach robust, i.e., does our method degrade gracefully when the data contain background samples or impurities along with spherical Gaussian clusters? Can we make the clustering algorithm more robust? Similar questions apply to other clustering algorithms. We will touch on some of these issues in the context of each algorithm.
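As a concrete illustration of equation (1), here is a minimal NumPy sketch of evaluating a spherical Gaussian mixture density; the function name and argument layout are illustrative choices, not part of the notes:

```python
import numpy as np

def spherical_gmm_density(x, weights, means, variances):
    """Evaluate P(x; theta, m) = sum_j P(j) N(x; mu_j, sigma_j^2 I)  (equation 1)."""
    x = np.asarray(x, dtype=float)
    d = x.shape[-1]
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        # Spherical Gaussian: covariance sigma_j^2 * I, so only the squared
        # Euclidean distance to the mean matters.
        norm_const = (2.0 * np.pi * var) ** (-d / 2.0)
        sq_dist = np.sum((x - np.asarray(mu)) ** 2)
        total += w * norm_const * np.exp(-sq_dist / (2.0 * var))
    return total
```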

Mixtures and K-means

It is often helpful to search for simple clusters. For example, consider a mixture of two Gaussians where the variances of the spherical Gaussians may be different:

P(x; θ) = Σ_{j=1}^{2} P(j) N(x; μ_j, σ_j² I)   (2)

Points far away from the two means, in whatever direction, would always be assigned to the Gaussian with the larger variance (the tails of that distribution approach zero much more slowly; the posterior assignment is based on the ratio of probabilities). While this is perfectly fine for modeling the density, it does not quite agree with the clustering goal. To avoid such issues (and to keep the discussion simpler) we will restrict the covariance matrices to be all identical and equal to σ²I, where σ² is common to all clusters. Moreover, we will fix the mixing proportions to be uniform, P(j) = 1/m. With such restrictions we can more easily understand the type of clusters we will discover. The resulting simplified mixture model has the form

P(x; θ) = (1/m) Σ_{j=1}^{m} N(x; μ_j, σ² I)   (3)

Let's begin by trying to understand how points are assigned to clusters within the EM algorithm. To this end, we can see how the mixture model partitions the space into regions such that within each region a specific component has the highest posterior probability. To start with, consider any two components i and j and the boundary where P(i|x, θ) = P(j|x, θ), i.e., the set of points for which components i and j have the same posterior probability. Since the Gaussians have equal prior probabilities, this happens when N(x; μ_j, σ²I) = N(x; μ_i, σ²I). Both Gaussians have the same spherical covariance matrix, so the probability they assign to a point is based on the Euclidean distance to their mean vectors. The posteriors therefore can be equal only when the component means are equidistant from the point:

‖x − μ_i‖ = ‖x − μ_j‖, or equivalently 2x^T(μ_j − μ_i) = μ_j^T μ_j − μ_i^T μ_i   (4)
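To make the posterior assignment concrete, here is a small sketch (an illustration, not taken from the notes) of computing P(j|x) for the simplified model of equation (3), where all components share σ² and the mixing proportions are uniform:

```python
import numpy as np

def posterior_assignments(x, means, sigma2):
    """Posterior P(j | x) for the simplified mixture of equation (3)."""
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    # With uniform P(j) and a common sigma^2, the posterior depends only on
    # the squared Euclidean distances to the component means.
    sq_dists = np.sum((means - x) ** 2, axis=1)
    logits = -sq_dists / (2.0 * sigma2)
    logits -= logits.max()          # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

Shrinking sigma2 in this sketch makes the returned posteriors approach hard 0/1 assignments, which previews the K-means limit discussed below.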

The boundary is therefore linear in x (the linearity holds whenever the two Gaussians have the same covariance matrix, spherical or not). We can draw such boundaries between any pair of components as illustrated in Figure 1. The pairwise comparisons induce a Voronoi partition of the space where, for example, the region enclosing μ_1 in the figure corresponds to all the points whose closest mean is μ_1. This is also the region where P(1|x, θ) takes the highest value among all the posterior probabilities.

Figure 1: Voronoi regions resulting from pairwise comparison of posterior assignment probabilities. Each line corresponds to a decision between the posterior probabilities of two components. A line is highlighted (solid) when the boundary is an active constraint for deciding the highest-probability component.

Note that the posterior assignment probabilities P(j|x, θ) evaluated in the E-step of the EM algorithm do not vanish across the boundaries. The regions merely highlight where one assignment has a higher probability than the others. The overall variance parameter σ² controls how sharply the posterior probabilities change when we cross each boundary. Small σ² results in very sharp transitions (e.g., from nearly zero to nearly one) while the posteriors change smoothly when σ² is large.

K-means. We can define a simpler and faster version of the EM algorithm by just assigning each point to the component with the highest posterior probability (its closest mean). Geometrically, the points within each Voronoi region are assigned to one component. The M-step then simply repositions each mean vector to the center of the points within the region. The updated means define new Voronoi regions, and so on. The resulting algorithm for finding cluster means is known as the K-means algorithm:

E-step: assign each point x_t to its closest mean, i.e., j_t = arg min_j ‖x_t − μ_j‖²
M-step: recompute each μ_j as the mean of the points assigned to it.
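A minimal sketch of the two alternating steps just described, assuming the data are rows of a matrix X and the means are given as an initial (k, d) array (names and stopping rule are illustrative):

```python
import numpy as np

def kmeans(X, means, n_iters=100):
    """Alternate the hard E-step and M-step of the K-means algorithm."""
    X = np.asarray(X, dtype=float)
    means = np.asarray(means, dtype=float).copy()
    for _ in range(n_iters):
        # E-step: assign each point to its closest mean.
        sq_dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # M-step: recompute each mean as the average of its assigned points
        # (keep the old mean if a cluster happens to be empty).
        new_means = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else means[j] for j in range(len(means))])
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels
```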

This is a highly popular hard-assignment version of the EM algorithm. We can view the algorithm as optimizing the complete log-likelihood of the data with respect to the assignments j_1, ..., j_n (E-step) as well as the means μ_1, ..., μ_m (M-step). This is equivalent to minimizing the overall squared error (distortion) to the cluster means:

J(j_1, ..., j_n, μ_1, ..., μ_m) = Σ_{t=1}^{n} ‖x_t − μ_{j_t}‖²   (5)

Each step of the algorithm, E-step or M-step, decreases the objective until convergence. The algorithm can never return to the same assignments j_1, ..., j_n, since that would yield the same value of the objective. Since there are only finitely many possible assignments of the n points to the k means (at most k^n), the algorithm has to converge in a finite number of steps. At convergence, the cluster means μ̂_1, ..., μ̂_m are locally optimal solutions to the minimum-distortion objective

J(μ̂_1, ..., μ̂_m) = Σ_{t=1}^{n} min_j ‖x_t − μ̂_j‖²   (6)

The speed of the K-means algorithm comes at a cost: it is even more sensitive to proper initialization than a mixture of Gaussians trained with EM. A typical and reasonable initialization corresponds to setting the first mean to be a randomly selected training point, the second to be the training point furthest from the first mean, the third to be the training point furthest from the first two means, and so on. This initialization is a greedy packing algorithm.

Identifiability. K-means or a mixture of Gaussians trained with the EM algorithm can only do as well as the maximum likelihood solution they aim to recover. In other words, even when the underlying clusters are spherical, the number of training examples may be too small for the globally optimal (non-trivial) maximum likelihood solution to provide any reasonable clusters. As the number of points increases, the quality of the maximum likelihood solution improves, although it may still be hard to find with the EM algorithm; with a large enough number of training points the solution also becomes easier to find. We could therefore, roughly speaking, divide the clustering problem into three regimes depending on the number of training points: not solvable, solvable but hard, and easy. For a recent discussion of such regimes, see Srebro et al., "An Investigation of Computational and Informational Limits in Gaussian Mixture Clustering," ICML 2006.
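A sketch of the greedy packing initialization described above, which could be used to produce the initial means for the kmeans sketch earlier (the helper name and use of NumPy's random generator are assumptions for illustration):

```python
import numpy as np

def farthest_point_init(X, k, seed=None):
    """Greedy packing initialization: pick a random first mean, then repeatedly
    pick the training point furthest from all means chosen so far."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    means = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Distance of each point to its nearest already-chosen mean.
        sq_dists = np.min(((X[:, None, :] - np.array(means)[None, :, :]) ** 2).sum(axis=2),
                          axis=1)
        means.append(X[np.argmax(sq_dists)])
    return np.array(means)
```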

Distance and clustering

Gaussian mixture models and the K-means algorithm make use of the Euclidean distance between points. Why should the points be compared in this manner? In many cases the vector representation of the objects to be clustered is derived, i.e., comes from some feature transformation. It is not at all clear that the Euclidean distance is an appropriate way of comparing the resulting feature vectors. Consider, for example, a document clustering problem. We might treat each document as a bag of words and map it into a feature vector based on term frequencies:

n_w(x) = number of times word w appears in document x   (7)

f(w|x) = n_w(x) / Σ_{w'} n_{w'}(x) = term frequency   (8)

φ_w(x) = f(w|x) · log( # of docs / # of docs with word w )   (9)

where the log factor is the inverse document frequency (IDF) and the feature vector φ(x) is known as the TFIDF mapping (there are many variations of this). The IDF term aims to de-emphasize words that appear in all documents, words that are unlikely to be useful for clustering or classification. We can now interpret the vectors φ(x) as points in a Euclidean space and apply, e.g., the K-means clustering algorithm.

There are, however, many other ways of defining a distance metric for clustering documents. The distance metric plays a central role in clustering, regardless of the algorithm. For example, a simple hierarchical agglomerative clustering algorithm is defined almost entirely on the basis of the distance function: the algorithm proceeds by successively merging the two closest points or closest clusters (e.g., in terms of average squared distance), as illustrated in Figure 2. In solving clustering problems, the focus should be on defining the distance (similarity) metric, perhaps at the expense of a specific algorithm; most algorithms could be applied with the chosen metric.

One avenue for defining a distance metric is through model selection. Let's see how this can be done. The simplest model over the words in a document is a unigram model, where each word is an independent sample from a multinomial distribution {θ_w}, Σ_w θ_w = 1. The probability of all the words in document x is therefore

P(x|θ) = Π_{w ∈ x} θ_w = Π_w θ_w^{n_w(x)}   (10)
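Returning for a moment to the TFIDF mapping of equations (7)-(9), here is a minimal sketch of that mapping with documents represented as lists of word tokens; the helper name and the dictionary-based sparse representation are illustrative assumptions:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document (a list of tokens) to a sparse TFIDF vector,
    following equations (7)-(9): term frequency * log(#docs / #docs with word)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))       # count documents containing each word
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        vectors.append({w: (c / total) * math.log(n_docs / doc_freq[w])
                        for w, c in counts.items()})
    return vectors
```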

Figure 2: (a)-(c) successive merging of clusters in a hierarchical clustering algorithm, and (d) the resulting cluster hierarchy.

It is often a good idea to normalize the documents so they have the same length. We could, for example, say that each document is of length one and that the word counts are given by the term frequencies f(w|x). Accordingly,

P(x̃|θ) = Π_w θ_w^{f(w|x)}   (11)

where x̃ denotes the normalized version of document x. Consider a cluster of documents C. The unigram model that best describes the normalized documents in the cluster can be obtained by maximizing the log-likelihood

l(C; θ) = Σ_{x ∈ C} log P(x̃|θ) = Σ_{x ∈ C} Σ_w f(w|x) log θ_w   (12)

The maximum likelihood solution is, as before, obtained through empirical counts (represented here by term frequencies):

θ̂_w = (1/|C|) Σ_{x ∈ C} f(w|x)   (13)
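A small sketch of the maximum likelihood estimate in equation (13), averaging the per-document term frequencies over a cluster (the function name and token-list representation are assumptions, matching the earlier TFIDF sketch):

```python
from collections import Counter

def cluster_unigram_mle(cluster_docs):
    """Equation (13): theta_hat_w = (1/|C|) * sum over documents of f(w|x)."""
    theta = Counter()
    for doc in cluster_docs:            # each doc is a list of word tokens
        counts = Counter(doc)
        total = sum(counts.values())
        for w, c in counts.items():
            theta[w] += (c / total) / len(cluster_docs)
    return dict(theta)                  # the values sum to one over the vocabulary
```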

The resulting log-likelihood value is

l(C; θ̂) = Σ_{x ∈ C} Σ_w f(w|x) log θ̂_w   (14)
        = |C| Σ_w [ (1/|C|) Σ_{x ∈ C} f(w|x) ] log θ̂_w   (15)
        = |C| Σ_w θ̂_w log θ̂_w = −|C| H(θ̂)   (16)

where H(θ̂) is the entropy of the unigram distribution.

We can now proceed to define a distance corresponding to the unigram model. Consider any two clusters C_i and C_j and their combination C = C_i ∪ C_j. When C_i and C_j are treated as distinct clusters, we use a different unigram model to capture the term frequencies of each. The combined cluster C, on the other hand, is modeled with a single unigram model. We can now treat the documents in the two clusters as observations and devise a model selection problem for deciding whether the clusters should be viewed as separate or merged. To this end, we evaluate the resulting log-likelihoods in two ways, corresponding to whether the clusters are modeled as separate or combined. When separate:

l(C_i; θ̂_i) + l(C_j; θ̂_j) = −|C_i| H(θ̂_i) − |C_j| H(θ̂_j)   (17)
                           = −(|C_i| + |C_j|) Σ_{y ∈ {i,j}} P̂(y) H(θ̂_y)   (18)

where P̂(y = i) = |C_i| / (|C_i| + |C_j|) is the probability that a randomly drawn document from C = C_i ∪ C_j comes from cluster C_i. The parameter estimates θ̂_i and θ̂_j are obtained as before. Similarly, when the two clusters are modeled with the same unigram:

l(C_i, C_j; θ̂) = −(|C_i| + |C_j|) H(θ̂)   (19)

where the parameter estimate θ̂ can be written as θ̂_w = Σ_{y ∈ {i,j}} P̂(y) θ̂_{y,w}, i.e., as a cluster-size-weighted combination of the estimates from the individual clusters. We can now define a (squared) distance between C_i and C_j according to how much better they are modeled as separate clusters (a log-likelihood ratio statistic):

d²(C_i, C_j) = l(C_i; θ̂_i) + l(C_j; θ̂_j) − l(C_i, C_j; θ̂)   (20)
             = (|C_i| + |C_j|) [ H(θ̂) − Σ_{y ∈ {i,j}} P̂(y) H(θ̂_y) ]   (21)
             = (|C_i| + |C_j|) Î(w; y)   (22)

where Î(w; y) is the mutual information between a word sampled from the combined cluster C and the identity of the cluster (i or j) that the sample came from. (Recall a similar derivation in the feature selection context.)
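A minimal sketch of equations (20)-(22), computing d² from the two cluster unigram estimates (as returned by the cluster_unigram_mle sketch above) and the cluster sizes; the helper names are illustrative:

```python
import math

def entropy(theta):
    """Entropy H(theta) of a unigram distribution given as a dict of probabilities."""
    return -sum(p * math.log(p) for p in theta.values() if p > 0)

def cluster_distance_sq(theta_i, size_i, theta_j, size_j):
    """d^2(C_i, C_j) = (|C_i| + |C_j|) * I(w; y), equations (20)-(22)."""
    total = size_i + size_j
    p_i, p_j = size_i / total, size_j / total
    # Combined unigram: cluster-size-weighted mixture of the two estimates.
    combined = {w: p_i * theta_i.get(w, 0.0) + p_j * theta_j.get(w, 0.0)
                for w in set(theta_i) | set(theta_j)}
    mutual_info = entropy(combined) - p_i * entropy(theta_i) - p_j * entropy(theta_j)
    return total * mutual_info
```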

More precisely, the mutual information is computed on the basis of the joint distribution P̂(w, y) = P̂(y) θ̂_{y,w}. Note that d²(C_i, C_j) is symmetric and non-negative but need not satisfy all the properties of a metric. It nevertheless compares the two clusters in a very natural way: we measure how distinguishable they are based on their word distributions. In other words, Î(w; y) tells us how much a word sampled at random from a document in the combined cluster C tells us about which cluster, C_i or C_j, the sample came from. Low information content means that the two clusters have nearly identical distributions over words.

The (squared) distance d²(C_i, C_j) can be used immediately, e.g., in a hierarchical agglomerative clustering algorithm (the same algorithm, derived from a different perspective, is known as the agglomerative information bottleneck method); a sketch of such a merge loop follows below. We could take the model selection idea further and decide when to stop merging clusters, rather than simply providing a scale for deciding how different two clusters are.
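A naive sketch of plugging d² into a hierarchical agglomerative merge loop, assuming the cluster_unigram_mle and cluster_distance_sq sketches above are available; a practical implementation would cache the pairwise distances rather than recompute them at every step:

```python
def agglomerative_merge(clusters, n_final):
    """Repeatedly merge the pair of clusters with the smallest d^2 until
    n_final clusters remain. Each cluster is a list of tokenized documents."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > n_final:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d2 = cluster_distance_sq(cluster_unigram_mle(clusters[a]), len(clusters[a]),
                                         cluster_unigram_mle(clusters[b]), len(clusters[b]))
                if best is None or d2 < best[0]:
                    best = (d2, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    return clusters
```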
