Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

Size: px
Start display at page:

Download "Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin"

Transcription

1 Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin

2 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin

3 Clustering web search results Carlos Guestrin

4 Example (Taken from Kevin Murphy s ML textbook) Data: gene expression levels Goal: cluster genes with similar expression trajectories Carlos Guestrin

5 Some Data Carlos Guestrin

6 K-means 1. Ask user how many clusters they d like. (e.g. k=5) Carlos Guestrin

7 K-means 1. Ask user how many clusters they d like. (e.g. k=5) 2. Randomly guess k cluster Center locations Carlos Guestrin

8 K-means 1. Ask user how many clusters they d like. (e.g. k=5) 2. Randomly guess k cluster Center locations 3. Each datapoint finds out which Center it s closest to. (Thus each Center owns a set of datapoints) Carlos Guestrin

9 K-means 1. Ask user how many clusters they d like. (e.g. k=5) 2. Randomly guess k cluster Center locations 3. Each datapoint finds out which Center it s closest to. 4. Each Center finds the centroid of the points it owns Carlos Guestrin

10 K-means 1. Ask user how many clusters they d like. (e.g. k=5) 2. Randomly guess k cluster Center locations 3. Each datapoint finds out which Center it s closest to. 4. Each Center finds the centroid of the points it owns 5. and jumps there 6. Repeat until terminated! Carlos Guestrin

11 K-means Randomly initialize k centers µ (0) = µ 1 (0),, µ k (0) Classify: Assign each point j {1, N} to nearest center: Recenter: µ i becomes centroid of its point: Equivalent to µ i average of its points! Carlos Guestrin

12 What is K-means optimizing? Potential function F(µ,C) of centers µ and point allocations C: N Optimal K-means: min µ min C F(µ,C) Carlos Guestrin

13 Does K-means converge??? Part 1 Optimize potential function: Fix µ, optimize C Carlos Guestrin

14 Does K-means converge??? Part 2 Optimize potential function: Fix C, optimize µ Carlos Guestrin

15 Coordinate descent algorithms Want: min a min b F(a,b) Coordinate descent: fix a, minimize b fix b, minimize a repeat Converges!!! if F is bounded to a (often good) local optimum as we saw in applet (play with it!) (For LASSO it converged to the global optimum, because of convexity) K-means is a coordinate descent algorithm! Carlos Guestrin

16 Mixtures of Gaussians Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin

17 (One) bad case for k-means Clusters may overlap Some clusters may be wider than others Carlos Guestrin

18 Nonspherical data Carlos Guestrin

19 Quick Review of Gaussians Univariate and multivariate Gaussians Carlos Guestrin

20 Two-Dimensional Gaussians 5 spherical 10 diagonal 6 full full spherical 0.2 diagonal Carlos 10 Guestrin

21 Gaussians in d Dimensions P(x) = 1 (2π ) d/2 Σ exp # 1 1/2 $ % 2 x µ ( ) T Σ 1 ( x µ ) & ' ( Carlos Guestrin

22 Learning Gaussians P(x) = 1 (2π ) d/2 Σ exp # 1 1/2 $ % 2 x µ ( ) T Σ 1 ( x µ ) & ' ( Given data: MLE for mean: MLE for covariance: Carlos Guestrin

23 When the world is not Gaussian Distribution of male heights in US Distribution of male heights in Sweden What if we mix these together? Carlos Guestrin

24 Gaussian Mixture Model Most commonly used mixture model Observations: Parameters: Cluster indicator: (a) Per-cluster likelihood: Ex. z i = country of origin, x i = height of i th person k th mixture component = distribution of heights in country k Carlos Guestrin

25 Generative Model We can think of sampling observations from the model For each observation i, Sample a cluster assignment (a) Sample the observation from the selected Gaussian Carlos Guestrin

26 Density Estimation Estimate a density based on x 1,,x N Carlos Guestrin

27 Density Estimation Contour Plot of Joint Density Carlos Guestrin

28 Density as Mixture of Gaussians Approximate density with a mixture of Gaussians Mixture of 3 Gaussians Contour Plot of Joint Density Carlos Guestrin

29 Density as Mixture of Gaussians Approximate density with a mixture of Gaussians Mixture of 3 Gaussians p(x i,µ, ) = Carlos Guestrin

30 Density as Mixture of Gaussians Approximate with density with a mixture of Gaussians Mixture of 3 Gaussians Our actual observations (b) C. Bishop, Pattern Recognition & Machine Learning Carlos Guestrin

31 Clustering our Observations Imagine we have an assignment of each x i to a Gaussian Our actual observations 1 (a) 1 (b) Complete data labeled by true cluster assignments C. Bishop, Pattern Recognition & Machine Learning Carlos Guestrin

32 Clustering our Observations Imagine we have an assignment of each x i to a Gaussian 1 (a) Introduce latent cluster indicator variable z i 0.5 Then we have p(x i z i,,µ, ) = Complete data labeled by true cluster assignments C. Bishop, Pattern Recognition & Machine Learning Carlos Guestrin

33 Clustering our Observations We must infer the cluster assignments from the observations (c) Posterior probabilities of assignments to each cluster *given* model parameters: r ik = p(z i = k x i,,µ, ) = Soft assignments to clusters C. Bishop, Pattern Recognition & Machine Learning Carlos Guestrin

34 Unsupervised Learning: not as hard as it looks Sometimes easy Sometimes impossible and sometimes in between Carlos Guestrin

35 Summary of GMM Concept Estimate a density based on x 1,,x N p(x i, µ, ) = 1 (a) KX z i =1 z in (x i µ z i, z i) Complete data labeled by true cluster assignments Surface Plot of Joint Density, Marginalizing Cluster Assignments Carlos Guestrin

36 Summary of GMM Components Observations x i 2 R d, x i i =1, 2,...,N Hidden cluster labels z i 2{1, 2,...,K}, i =1, 2,...,N Hidden mixture means µ k 2 R d, k =1, 2,...,K Hidden mixture covariances Hidden mixture probabilities k 2 R d d, k =1, 2,...,K k, KX k =1 k=1 Gaussian mixture marginal and conditional likelihood : KX p(x i, µ, ) = z i p(x i z i,µ, ) z i =1 p(x i z i,µ, ) =N (x i µ z i, z i) Carlos Guestrin

37 Application to Document Modeling Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin

38 Cluster Documents Cluster documents based on topic Carlos Guestrin

39 Document Representation Bag of words model Carlos Guestrin

40 Issues with Document Representation Words counts are bad for standard similarity metrics Term Frequency Inverse Document Frequency (tf-idf) Increase importance of rare words Carlos Guestrin

41 TF-IDF Term frequency: tf(t, d) = Could also use {0, 1}, 1 + logt f(t, d),... Inverse document frequency: idf(t, D) = tf-idf: tfidf(t, d, D) = High for document d with high frequency of term t (high term frequency ) and few documents containing term t in the corpus (high inverse doc frequency ) Carlos Guestrin

42 A Generative Model Documents: Associated topics: Parameters simple mixture of Gaussians: Carlos Guestrin

43 What you get from mixture model for documents Words give topic: Topic proportions: Topic distribution of each document: Carlos Guestrin

44 Results from Wikipedia data using similar model (LDA) Carlos Guestrin

45 Expectation Maximization Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin

46 Next back to Density Estimation What if we want to do density estimation with multimodal or clumpy data? Carlos Guestrin

47 Learning Model Parameters Want to learn model parameters Mixture of 3 Gaussians Our actual observations (b) C. Bishop, Pattern Recognition & Machine Learning Carlos Guestrin

48 ML Estimate of Mixture Model Params Log likelihood L x ( ), log p({x i } ) = X i log X z i p(x i,z i ) Want ML estimate ˆ ML = Neither convex nor concave and local optima Carlos Guestrin

49 Complete Data Imagine we have an assignment of each x i to a cluster Our actual observations 1 (a) 1 (b) Complete data labeled by true cluster assignments C. Bishop, Pattern Recognition & Machine Learning Carlos Guestrin

50 If complete data were observed z i Assume class labels were observed in addition to L x,z ( ) = X log p(x i,z i ) i x i Compute ML estimates Separates over clusters k! Example: mixture of Gaussians (MoG) = { k,µ k, k } K k=1 Carlos Guestrin

51 Cluster Responsibilities We must infer the cluster assignments from the observations (c) Posterior probabilities of assignments to each cluster *given* model parameters: r ik = p(z i = k x i,, )= Soft assignments to clusters C. Bishop, Pattern Recognition & Machine Learning Carlos Guestrin

52 Iterative Algorithm Motivates a coordinate ascent-like algorithm: 1. Infer missing values given estimate of parameters 2. Optimize parameters to produce new given filled in data 3. Repeat z i ˆ ˆ z i Example: MoG 1. Infer responsibilities r ik = p(z i = k x i, ˆ (t 1) )= 2. Optimize parameters max w.r.t. k : max w.r.t. k : Carlos Guestrin

53 E.M. Convergence EM is coordinate ascent on an interesting potential function Coord. ascent for bounded pot. func. è convergence to a local optimum guaranteed This algorithm is REALLY USED. And in high dimensional state spaces, too. E.G. Vector Quantization for Speech Data Carlos Guestrin

54 Gaussian Mixture Example: Start Carlos Guestrin

55 After first iteration Carlos Guestrin

56 After 2nd iteration Carlos Guestrin

57 After 3rd iteration Carlos Guestrin

58 After 4th iteration Carlos Guestrin

59 After 5th iteration Carlos Guestrin

60 After 6th iteration Carlos Guestrin

61 After 20th iteration Carlos Guestrin

62 Some Bio Assay data Carlos Guestrin

63 GMM clustering of the assay data Carlos Guestrin

64 Resulting Density Estimator Carlos Guestrin

65 Initialization In mixture model case where there are many ways to initialize the EM algorithm Examples: y i = {z i,x i } Choose K observations at random to define each cluster. Assign other observations to the nearest centriod to form initial parameter estimates Pick the centers sequentially to provide good coverage of data Grow mixture model by splitting (and sometimes removing) clusters until K clusters are formed Can be quite important to convergence rates in practice Carlos Guestrin

66 Label switching Color = label does not matter Can switch labels and likelihood is unchanged Carlos Guestrin

67 What you should know K-means for clustering: algorithm converges because it s coordinate ascent EM for mixture of Gaussians: How to learn maximum likelihood parameters (locally max. like.) in the case of unlabeled data Remember, E.M. can get stuck in local minima, and empirically it DOES EM is coordinate ascent Carlos Guestrin

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014

More information

Clustering web search results

Clustering web search results Clustering K-means Machine Learning CSE546 Emily Fox University of Washington November 4, 2013 1 Clustering images Set of Images [Goldberger et al.] 2 1 Clustering web search results 3 Some Data 4 2 K-means

More information

ECE 5424: Introduction to Machine Learning

ECE 5424: Introduction to Machine Learning ECE 5424: Introduction to Machine Learning Topics: Unsupervised Learning: Kmeans, GMM, EM Readings: Barber 20.1-20.3 Stefan Lee Virginia Tech Tasks Supervised Learning x Classification y Discrete x Regression

More information

Expectation Maximization. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University

Expectation Maximization. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University Expectation Maximization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University April 10 th, 2006 1 Announcements Reminder: Project milestone due Wednesday beginning of class 2 Coordinate

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Clustering and EM Barnabás Póczos & Aarti Singh Contents Clustering K-means Mixture of Gaussians Expectation Maximization Variational Methods 2 Clustering 3 K-

More information

Expectation Maximization: Inferring model parameters and class labels

Expectation Maximization: Inferring model parameters and class labels Expectation Maximization: Inferring model parameters and class labels Emily Fox University of Washington February 27, 2017 Mixture of Gaussian recap 1 2/26/17 Jumble of unlabeled images HISTOGRAM blue

More information

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed

More information

Inference and Representation

Inference and Representation Inference and Representation Rachel Hodos New York University Lecture 5, October 6, 2015 Rachel Hodos Lecture 5: Inference and Representation Today: Learning with hidden variables Outline: Unsupervised

More information

Clustering Lecture 5: Mixture Model

Clustering Lecture 5: Mixture Model Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics

More information

K-Means and Gaussian Mixture Models

K-Means and Gaussian Mixture Models K-Means and Gaussian Mixture Models David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 43 K-Means Clustering Example: Old Faithful Geyser

More information

Nearest Neighbor with KD Trees

Nearest Neighbor with KD Trees Case Study 2: Document Retrieval Finding Similar Documents Using Nearest Neighbors Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox January 22 nd, 2013 1 Nearest

More information

Unsupervised Learning: Clustering

Unsupervised Learning: Clustering Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning

More information

Clustering. Image segmentation, document clustering, protein class discovery, compression

Clustering. Image segmentation, document clustering, protein class discovery, compression Clustering CS 444 Some material on these is slides borrowed from Andrew Moore's machine learning tutorials located at: Clustering The problem of grouping unlabeled data on the basis of similarity. A key

More information

Machine Learning Department School of Computer Science Carnegie Mellon University. K- Means + GMMs

Machine Learning Department School of Computer Science Carnegie Mellon University. K- Means + GMMs 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University K- Means + GMMs Clustering Readings: Murphy 25.5 Bishop 12.1, 12.3 HTF 14.3.0 Mitchell

More information

COMS 4771 Clustering. Nakul Verma

COMS 4771 Clustering. Nakul Verma COMS 4771 Clustering Nakul Verma Supervised Learning Data: Supervised learning Assumption: there is a (relatively simple) function such that for most i Learning task: given n examples from the data, find

More information

Clustering in R d. Clustering. Widely-used clustering methods. The k-means optimization problem CSE 250B

Clustering in R d. Clustering. Widely-used clustering methods. The k-means optimization problem CSE 250B Clustering in R d Clustering CSE 250B Two common uses of clustering: Vector quantization Find a finite set of representatives that provides good coverage of a complex, possibly infinite, high-dimensional

More information

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of

More information

10. MLSP intro. (Clustering: K-means, EM, GMM, etc.)

10. MLSP intro. (Clustering: K-means, EM, GMM, etc.) 10. MLSP intro. (Clustering: K-means, EM, GMM, etc.) Rahil Mahdian 01.04.2016 LSV Lab, Saarland University, Germany What is clustering? Clustering is the classification of objects into different groups,

More information

COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning

COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning Associate Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551

More information

Expectation Maximization: Inferring model parameters and class labels

Expectation Maximization: Inferring model parameters and class labels Expectation Maximization: Inferring model parameters and class labels Emily Fox University of Washington February 27, 2017 Mixture of Gaussian recap 1 2/27/2017 Jumble of unlabeled images HISTOGRAM blue

More information

Machine Learning. Unsupervised Learning. Manfred Huber

Machine Learning. Unsupervised Learning. Manfred Huber Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training

More information

Clustering & Dimensionality Reduction. 273A Intro Machine Learning

Clustering & Dimensionality Reduction. 273A Intro Machine Learning Clustering & Dimensionality Reduction 273A Intro Machine Learning What is Unsupervised Learning? In supervised learning we were given attributes & targets (e.g. class labels). In unsupervised learning

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Unsupervised learning Daniel Hennes 29.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Supervised learning Regression (linear

More information

Unsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning

Unsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning Unsupervised Learning Clustering and the EM Algorithm Susanna Ricco Supervised Learning Given data in the form < x, y >, y is the target to learn. Good news: Easy to tell if our algorithm is giving the

More information

CSC 411: Lecture 12: Clustering

CSC 411: Lecture 12: Clustering CSC 411: Lecture 12: Clustering Raquel Urtasun & Rich Zemel University of Toronto Oct 22, 2015 Urtasun & Zemel (UofT) CSC 411: 12-Clustering Oct 22, 2015 1 / 18 Today Unsupervised learning Clustering -means

More information

1 Case study of SVM (Rob)

1 Case study of SVM (Rob) DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how

More information

Mixture Models and the EM Algorithm

Mixture Models and the EM Algorithm Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is

More information

Note Set 4: Finite Mixture Models and the EM Algorithm

Note Set 4: Finite Mixture Models and the EM Algorithm Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for

More information

The EM Algorithm Lecture What's the Point? Maximum likelihood parameter estimates: One denition of the \best" knob settings. Often impossible to nd di

The EM Algorithm Lecture What's the Point? Maximum likelihood parameter estimates: One denition of the \best knob settings. Often impossible to nd di The EM Algorithm This lecture introduces an important statistical estimation algorithm known as the EM or \expectation-maximization" algorithm. It reviews the situations in which EM works well and its

More information

K-Means Clustering 3/3/17

K-Means Clustering 3/3/17 K-Means Clustering 3/3/17 Unsupervised Learning We have a collection of unlabeled data points. We want to find underlying structure in the data. Examples: Identify groups of similar data points. Clustering

More information

Mixture Models and EM

Mixture Models and EM Table of Content Chapter 9 Mixture Models and EM -means Clustering Gaussian Mixture Models (GMM) Expectation Maximiation (EM) for Mixture Parameter Estimation Introduction Mixture models allows Complex

More information

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science. Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ 1 Image Segmentation Some material for these slides comes from https://www.csd.uwo.ca/courses/cs4487a/

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

Clustering and The Expectation-Maximization Algorithm

Clustering and The Expectation-Maximization Algorithm Clustering and The Expectation-Maximization Algorithm Unsupervised Learning Marek Petrik 3/7 Some of the figures in this presentation are taken from An Introduction to Statistical Learning, with applications

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Latent Variable Models and Expectation Maximization

Latent Variable Models and Expectation Maximization Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15

More information

Content-based image and video analysis. Machine learning

Content-based image and video analysis. Machine learning Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all

More information

Expectation-Maximization. Nuno Vasconcelos ECE Department, UCSD

Expectation-Maximization. Nuno Vasconcelos ECE Department, UCSD Expectation-Maximization Nuno Vasconcelos ECE Department, UCSD Plan for today last time we started talking about mixture models we introduced the main ideas behind EM to motivate EM, we looked at classification-maximization

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 9: Data Mining (4/4) March 9, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These slides

More information

SGN (4 cr) Chapter 11

SGN (4 cr) Chapter 11 SGN-41006 (4 cr) Chapter 11 Clustering Jussi Tohka & Jari Niemi Department of Signal Processing Tampere University of Technology February 25, 2014 J. Tohka & J. Niemi (TUT-SGN) SGN-41006 (4 cr) Chapter

More information

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany

More information

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06 Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,

More information

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim,

More information

Cluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008

Cluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008 Cluster Analysis Jia Li Department of Statistics Penn State University Summer School in Statistics for Astronomers IV June 9-1, 8 1 Clustering A basic tool in data mining/pattern recognition: Divide a

More information

Introduction to Pattern Recognition Part II. Selim Aksoy Bilkent University Department of Computer Engineering

Introduction to Pattern Recognition Part II. Selim Aksoy Bilkent University Department of Computer Engineering Introduction to Pattern Recognition Part II Selim Aksoy Bilkent University Department of Computer Engineering saksoy@cs.bilkent.edu.tr RETINA Pattern Recognition Tutorial, Summer 2005 Overview Statistical

More information

Unsupervised Learning

Unsupervised Learning Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised

More information

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically

More information

University of Washington Department of Computer Science and Engineering / Department of Statistics

University of Washington Department of Computer Science and Engineering / Department of Statistics University of Washington Department of Computer Science and Engineering / Department of Statistics CSE 547 / Stat 548 Machine Learning (Statistics) for Big Data Homework 2 Winter 2014 Issued: Thursday,

More information

Machine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves

Machine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves Machine Learning A 708.064 11W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence I [2 P] a) [1 P] Give an example for a probability distribution P (A, B, C) that disproves

More information

Fall 09, Homework 5

Fall 09, Homework 5 5-38 Fall 09, Homework 5 Due: Wednesday, November 8th, beginning of the class You can work in a group of up to two people. This group does not need to be the same group as for the other homeworks. You

More information

CS 229 Midterm Review

CS 229 Midterm Review CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask

More information

Gaussian Mixture Models For Clustering Data. Soft Clustering and the EM Algorithm

Gaussian Mixture Models For Clustering Data. Soft Clustering and the EM Algorithm Gaussian Mixture Models For Clustering Data Soft Clustering and the EM Algorithm K-Means Clustering Input: Observations: xx ii R dd ii {1,., NN} Number of Clusters: kk Output: Cluster Assignments. Cluster

More information

University of Washington Department of Computer Science and Engineering / Department of Statistics

University of Washington Department of Computer Science and Engineering / Department of Statistics University of Washington Department of Computer Science and Engineering / Department of Statistics CSE 599 / Stat 592 Machine Learning (Statistics) for Big Data Homework 2 Winter 2013 Issued: Thursday,

More information

ALTERNATIVE METHODS FOR CLUSTERING

ALTERNATIVE METHODS FOR CLUSTERING ALTERNATIVE METHODS FOR CLUSTERING K-Means Algorithm Termination conditions Several possibilities, e.g., A fixed number of iterations Objects partition unchanged Centroid positions don t change Convergence

More information

732A54/TDDE31 Big Data Analytics

732A54/TDDE31 Big Data Analytics 732A54/TDDE31 Big Data Analytics Lecture 10: Machine Learning with MapReduce Jose M. Peña IDA, Linköping University, Sweden 1/27 Contents MapReduce Framework Machine Learning with MapReduce Neural Networks

More information

CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points]

CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points] CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, 2015. 11:59pm, PDF to Canvas [100 points] Instructions. Please write up your responses to the following problems clearly and concisely.

More information

Lecture 8: The EM algorithm

Lecture 8: The EM algorithm 10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 8: The EM algorithm Lecturer: Manuela M. Veloso, Eric P. Xing Scribes: Huiting Liu, Yifan Yang 1 Introduction Previous lecture discusses

More information

Markov Random Fields and Segmentation with Graph Cuts

Markov Random Fields and Segmentation with Graph Cuts Markov Random Fields and Segmentation with Graph Cuts Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem Administrative stuffs Final project Proposal due Oct 27 (Thursday) HW 4 is out

More information

Perceptron as a graph

Perceptron as a graph Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 10 th, 2007 2005-2007 Carlos Guestrin 1 Perceptron as a graph 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0-6 -4-2

More information

Statistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1

Statistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1 Week 8 Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Part I Clustering 2 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group

More information

ECE521: Week 11, Lecture March 2017: HMM learning/inference. With thanks to Russ Salakhutdinov

ECE521: Week 11, Lecture March 2017: HMM learning/inference. With thanks to Russ Salakhutdinov ECE521: Week 11, Lecture 20 27 March 2017: HMM learning/inference With thanks to Russ Salakhutdinov Examples of other perspectives Murphy 17.4 End of Russell & Norvig 15.2 (Artificial Intelligence: A Modern

More information

22 October, 2012 MVA ENS Cachan. Lecture 5: Introduction to generative models Iasonas Kokkinos

22 October, 2012 MVA ENS Cachan. Lecture 5: Introduction to generative models Iasonas Kokkinos Machine Learning for Computer Vision 1 22 October, 2012 MVA ENS Cachan Lecture 5: Introduction to generative models Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Center for Visual Computing Ecole Centrale Paris

More information

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010 INFORMATICS SEMINAR SEPT. 27 & OCT. 4, 2010 Introduction to Semi-Supervised Learning Review 2 Overview Citation X. Zhu and A.B. Goldberg, Introduction to Semi- Supervised Learning, Morgan & Claypool Publishers,

More information

10701 Machine Learning. Clustering

10701 Machine Learning. Clustering 171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among

More information

Supervised vs. Unsupervised Learning

Supervised vs. Unsupervised Learning Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now

More information

http://www.xkcd.com/233/ Text Clustering David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Administrative 2 nd status reports Paper review

More information

An Introduction to PDF Estimation and Clustering

An Introduction to PDF Estimation and Clustering Sigmedia, Electronic Engineering Dept., Trinity College, Dublin. 1 An Introduction to PDF Estimation and Clustering David Corrigan corrigad@tcd.ie Electrical and Electronic Engineering Dept., University

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework IEEE SIGNAL PROCESSING LETTERS, VOL. XX, NO. XX, XXX 23 An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework Ji Won Yoon arxiv:37.99v [cs.lg] 3 Jul 23 Abstract In order to cluster

More information

CS 6140: Machine Learning Spring 2016

CS 6140: Machine Learning Spring 2016 CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa?on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Logis?cs Exam

More information

Warped Mixture Models

Warped Mixture Models Warped Mixture Models Tomoharu Iwata, David Duvenaud, Zoubin Ghahramani Cambridge University Computational and Biological Learning Lab March 11, 2013 OUTLINE Motivation Gaussian Process Latent Variable

More information

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation

More information

K-means clustering Based in part on slides from textbook, slides of Susan Holmes. December 2, Statistics 202: Data Mining.

K-means clustering Based in part on slides from textbook, slides of Susan Holmes. December 2, Statistics 202: Data Mining. K-means clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 K-means Outline K-means, K-medoids Choosing the number of clusters: Gap test, silhouette plot. Mixture

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 9, 2012 Today: Graphical models Bayes Nets: Inference Learning Readings: Required: Bishop chapter

More information

CS325 Artificial Intelligence Ch. 20 Unsupervised Machine Learning

CS325 Artificial Intelligence Ch. 20 Unsupervised Machine Learning CS325 Artificial Intelligence Cengiz Spring 2013 Unsupervised Learning Missing teacher No labels, y Just input data, x What can you learn with it? Unsupervised Learning Missing teacher No labels, y Just

More information

k-means demo Administrative Machine learning: Unsupervised learning" Assignment 5 out

k-means demo Administrative Machine learning: Unsupervised learning Assignment 5 out Machine learning: Unsupervised learning" David Kauchak cs Spring 0 adapted from: http://www.stanford.edu/class/cs76/handouts/lecture7-clustering.ppt http://www.youtube.com/watch?v=or_-y-eilqo Administrative

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 19 th, 2007 2005-2007 Carlos Guestrin 1 Why not just use Linear Regression? 2005-2007 Carlos Guestrin

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover

More information

Generative and discriminative classification techniques

Generative and discriminative classification techniques Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14

More information

An Introduction to Cluster Analysis. Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs

An Introduction to Cluster Analysis. Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs An Introduction to Cluster Analysis Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs zhaoxia@ics.uci.edu 1 What can you say about the figure? signal C 0.0 0.5 1.0 1500 subjects Two

More information

LogisBcs. CS 6140: Machine Learning Spring K-means Algorithm. Today s Outline 3/27/16

LogisBcs. CS 6140: Machine Learning Spring K-means Algorithm. Today s Outline 3/27/16 LogisBcs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and InformaBon Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Exam

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University March 4, 2015 Today: Graphical models Bayes Nets: EM Mixture of Gaussian clustering Learning Bayes Net structure

More information

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm DATA MINING LECTURE 7 Hierarchical Clustering, DBSCAN The EM Algorithm CLUSTERING What is a Clustering? In general a grouping of objects such that the objects in a group (cluster) are similar (or related)

More information

Speech Recognition Lecture 8: Acoustic Models. Eugene Weinstein Google, NYU Courant Institute Slide Credit: Mehryar Mohri

Speech Recognition Lecture 8: Acoustic Models. Eugene Weinstein Google, NYU Courant Institute Slide Credit: Mehryar Mohri Speech Recognition Lecture 8: Acoustic Models. Eugene Weinstein Google, NYU Courant Institute eugenew@cs.nyu.edu Slide Credit: Mehryar Mohri Speech Recognition Components Acoustic and pronunciation model:

More information

University of Washington Department of Computer Science and Engineering / Department of Statistics

University of Washington Department of Computer Science and Engineering / Department of Statistics University of Washington Department of Computer Science and Engineering / Department of Statistics CSE 547 / Stat 548 Machine Learning (Statistics) for Big Data Homework 2 Spring 2015 Issued: Thursday,

More information

Problem 1: Complexity of Update Rules for Logistic Regression

Problem 1: Complexity of Update Rules for Logistic Regression Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 16 th, 2014 1

More information

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample CS 1675 Introduction to Machine Learning Lecture 18 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem:

More information

Quantitative Biology II!

Quantitative Biology II! Quantitative Biology II! Lecture 3: Markov Chain Monte Carlo! March 9, 2015! 2! Plan for Today!! Introduction to Sampling!! Introduction to MCMC!! Metropolis Algorithm!! Metropolis-Hastings Algorithm!!

More information

Introduction to Machine Learning. Xiaojin Zhu

Introduction to Machine Learning. Xiaojin Zhu Introduction to Machine Learning Xiaojin Zhu jerryzhu@cs.wisc.edu Read Chapter 1 of this book: Xiaojin Zhu and Andrew B. Goldberg. Introduction to Semi- Supervised Learning. http://www.morganclaypool.com/doi/abs/10.2200/s00196ed1v01y200906aim006

More information

Clustering. Shishir K. Shah

Clustering. Shishir K. Shah Clustering Shishir K. Shah Acknowledgement: Notes by Profs. M. Pollefeys, R. Jin, B. Liu, Y. Ukrainitz, B. Sarel, D. Forsyth, M. Shah, K. Grauman, and S. K. Shah Clustering l Clustering is a technique

More information

Cluster Analysis. Debashis Ghosh Department of Statistics Penn State University (based on slides from Jia Li, Dept. of Statistics)

Cluster Analysis. Debashis Ghosh Department of Statistics Penn State University (based on slides from Jia Li, Dept. of Statistics) Cluster Analysis Debashis Ghosh Department of Statistics Penn State University (based on slides from Jia Li, Dept. of Statistics) Summer School in Statistics for Astronomers June 1-6, 9 Clustering: Intuition

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP

More information

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple

More information

Expectation Maximization!

Expectation Maximization! Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University and http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Steps in Clustering Select Features

More information

Expectation Maximization (EM) and Gaussian Mixture Models

Expectation Maximization (EM) and Gaussian Mixture Models Expectation Maximization (EM) and Gaussian Mixture Models Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 2 3 4 5 6 7 8 Unsupervised Learning Motivation

More information

COMPUTATIONAL STATISTICS UNSUPERVISED LEARNING

COMPUTATIONAL STATISTICS UNSUPERVISED LEARNING COMPUTATIONAL STATISTICS UNSUPERVISED LEARNING Luca Bortolussi Department of Mathematics and Geosciences University of Trieste Office 238, third floor, H2bis luca@dmi.units.it Trieste, Winter Semester

More information

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned

More information

Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University

Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University Kinds of Clustering Sequential Fast Cost Optimization Fixed number of clusters Hierarchical

More information

Clustering. Clustering in R d

Clustering. Clustering in R d Clustering Clustering in R d Two common uses of clustering: Vector quantization Find a finite set of representatives that provides good coverage of a complex, possibly infinite, high-dimensional space.

More information

Machine Learning A WS15/16 1sst KU Version: January 11, b) [1 P] For the probability distribution P (A, B, C, D) with the factorization

Machine Learning A WS15/16 1sst KU Version: January 11, b) [1 P] For the probability distribution P (A, B, C, D) with the factorization Machine Learning A 708.064 WS15/16 1sst KU Version: January 11, 2016 Exercises Problems marked with * are optional. 1 Conditional Independence I [3 P] a) [1 P] For the probability distribution P (A, B,

More information