Machine Learning Department, School of Computer Science, Carnegie Mellon University. K-Means + GMMs
1 Introduction to Machine Learning. Machine Learning Department, School of Computer Science, Carnegie Mellon University. K-Means + GMMs. Clustering Readings: Murphy 25.5; Bishop 12.1, 12.3; HTF; Mitchell. EM / GMM Readings: Murphy; Bishop 9; HTF; Mitchell. Matt Gormley. Lecture 16. March 20,
2 Reminders Homework 5: Readings / Application of ML Release: Wed, Mar. 08 Due: Wed, Mar. 22 at 11:59pm 2
3 Peer Tutoring Tutor Tutee 3
4 K-MEANS 5
5 K-Means Outline Clustering: Motivation / Applications. Optimization Background: Coordinate Descent; Block Coordinate Descent. Clustering: Inputs and Outputs; Objective-based Clustering. K-Means: K-Means Objective; Computational Complexity; K-Means Algorithm / Lloyd's Method. K-Means Initialization: Random; Farthest Point; K-Means++. Last Lecture / This Lecture 6
6 Clustering, Informal Goals Goal: Automatically partition unlabeled data into groups of similar datapoints. Question: When and why would we want to do this? Useful for: Automatically organizing data. Understanding hidden structure in data. Preprocessing for further analysis. Representing high- dimensional data in a low- dimensional space (e.g., for visualization purposes). Slide courtesy of Nina Balcan
7 Applications (Clustering comes up everywhere) Cluster news articles or web pages or search results by topic. Cluster protein sequences by function or genes according to expression profile. Cluster users of social networks by interest (community detection). Facebook network. Twitter network. Slide courtesy of Nina Balcan
8 Applications (Clustering comes up everywhere) Cluster customers according to purchase history. Cluster galaxies or nearby stars (e.g. Sloan Digital Sky Survey). And many, many more applications. Slide courtesy of Nina Balcan
9 Whiteboard: Optimization Background Coordinate Descent Block Coordinate Descent 10
10 Clustering Question: Which of these partitions is better? 11
11 Whiteboard: Clustering Inputs and Outputs; Objective-based Clustering 12
12 Whiteboard: K-Means; K-Means Objective; Computational Complexity; K-Means Algorithm / Lloyd's Method 13
13 Whiteboard: K-Means Initialization; Random; Furthest Traversal; K-Means++ 14
14 Lloyd's method: Random Initialization Slide courtesy of Nina Balcan
15 Lloyd's method: Random Initialization Example: Given a set of datapoints Slide courtesy of Nina Balcan
16 Lloyd's method: Random Initialization Select initial centers at random Slide courtesy of Nina Balcan
17 Lloyd's method: Random Initialization Assign each point to its nearest center Slide courtesy of Nina Balcan
18 Lloyd's method: Random Initialization Recompute optimal centers given a fixed clustering Slide courtesy of Nina Balcan
19 Lloyd's method: Random Initialization Assign each point to its nearest center Slide courtesy of Nina Balcan
20 Lloyd's method: Random Initialization Recompute optimal centers given a fixed clustering Slide courtesy of Nina Balcan
21 Lloyd's method: Random Initialization Assign each point to its nearest center Slide courtesy of Nina Balcan
22 Lloyd's method: Random Initialization Recompute optimal centers given a fixed clustering Get a good quality solution in this example. Slide courtesy of Nina Balcan
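The loop animated in the slides above, assign each point to its nearest center and then recompute each center as the mean of its cluster, can be sketched in a few lines of NumPy. This is a minimal sketch of our own (the function and variable names are not from the lecture):

```python
import numpy as np

def lloyds_method(X, centers, max_iter=100):
    """Alternate between assigning points to their nearest center and
    recomputing each center as the mean of its assigned points."""
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = np.array([X[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(len(centers))])
        if np.allclose(new_centers, centers):
            break  # converged: assignments will no longer change
        centers = new_centers
    return centers, labels
```

As the following slides discuss, this always converges, but only to a local optimum of the k-means objective; the quality depends heavily on the initial centers.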
23 Lloyd's method: Performance It always converges, but it may converge at a local optimum that is different from the global optimum, and in fact could be arbitrarily worse in terms of its score. Slide courtesy of Nina Balcan
24 Lloyd's method: Performance Local optimum: every point is assigned to its nearest center and every center is the mean value of its points. Slide courtesy of Nina Balcan
25 Lloyd's method: Performance It can be arbitrarily worse than the optimal solution. Slide courtesy of Nina Balcan
26 Lloyd's method: Performance This bad performance can happen even with well-separated Gaussian clusters. Slide courtesy of Nina Balcan
27 Lloyd's method: Performance This bad performance can happen even with well-separated Gaussian clusters. Some Gaussians are combined. Slide courtesy of Nina Balcan
28 Lloyd's method: Performance If we do random initialization, as k increases it becomes more likely that we won't have picked exactly one center per Gaussian in our initialization (so Lloyd's method will output a bad solution). For k equal-sized Gaussians, Pr[each initial center is in a different Gaussian] ≈ k!/k^k ≈ 1/e^k, which becomes unlikely as k gets large. Slide courtesy of Nina Balcan
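The probability on this slide, k!/k^k for k equal-sized Gaussians, is easy to evaluate numerically; a small sketch (the function name is ours):

```python
from math import factorial

def prob_one_per_gaussian(k):
    """Pr[k uniformly random initial centers land one per Gaussian,
    for k equal-sized Gaussians] = k! / k^k, which decays roughly
    like e^{-k}."""
    return factorial(k) / k ** k
```

For k = 2, 5, 10 this gives 0.5, about 0.038, and about 0.00036, so a perfect random draw is already very unlikely at modest k.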
29 Another Initialization Idea: Furthest Point Heuristic Choose c_1 arbitrarily (or at random). For j = 2, ..., k: pick c_j among the datapoints x_1, x_2, ..., x_n that is farthest from the previously chosen c_1, c_2, ..., c_{j-1}. Fixes the Gaussian problem. But it can be thrown off by outliers. Slide courtesy of Nina Balcan
30 Furthest point heuristic does well on previous example Slide courtesy of Nina Balcan
31 Furthest point initialization heuristic sensitive to outliers Assume k=3. Points: (0,1), (-2,0), (3,0), (0,-1). Slide courtesy of Nina Balcan
32 Furthest point initialization heuristic sensitive to outliers Assume k=3. Points: (0,1), (-2,0), (3,0), (0,-1). Slide courtesy of Nina Balcan
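The heuristic and its outlier sensitivity can be checked with a short sketch (our own implementation and names, not lecture code), run on the four points above: the outlier at (3,0) steals a center, so (0,1) and (0,-1) are left sharing one.

```python
import numpy as np

def furthest_point_init(X, k, first=0):
    """Choose the first center arbitrarily, then repeatedly add the
    datapoint furthest from all centers chosen so far."""
    centers = [X[first]]
    for _ in range(k - 1):
        # Distance from each point to its nearest already-chosen center.
        d = np.min(np.linalg.norm(
            X[:, None, :] - np.asarray(centers)[None, :, :], axis=2), axis=1)
        centers.append(X[d.argmax()])
    return np.array(centers)

# The slide's example with k=3: starting from (0,1), the heuristic
# picks the outlier (3,0) next, then (-2,0), never (0,-1).
X = np.array([[0.0, 1.0], [-2.0, 0.0], [3.0, 0.0], [0.0, -1.0]])
centers = furthest_point_init(X, 3)
```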
33 K-means++ Initialization: D² sampling [AV07] Interpolate between random and furthest-point initialization. Let D(x) be the distance between a point x and its nearest center. Choose the next center proportional to D²(x). Choose c_1 at random. For j = 2, ..., k: pick c_j among x_1, x_2, ..., x_n according to the distribution Pr(c_j = x_i) ∝ min_{j' < j} ||x_i - c_{j'}||² = D²(x_i). Theorem: K-means++ always attains an O(log k) approximation to the optimal k-means solution in expectation. Running Lloyd's can only further improve the cost. Slide courtesy of Nina Balcan
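A minimal sketch of D² sampling (our own implementation and names, not the lecture's):

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """D^2 sampling: choose c_1 uniformly at random, then choose each
    subsequent center with probability proportional to the squared
    distance from each point to its nearest already-chosen center."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # D^2(x_i): squared distance to the nearest chosen center.
        d2 = np.min(np.linalg.norm(
            X[:, None, :] - np.asarray(centers)[None, :, :], axis=2) ** 2, axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```

Points coinciding with a chosen center have D²(x) = 0 and can never be picked again, which is what pushes centers toward uncovered regions without always chasing the single furthest (possibly outlying) point.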
34 K-means++ Idea: D^α sampling Interpolate between random and furthest-point initialization. Let D(x) = min_{j' < j} ||x - c_{j'}|| be the distance between a point x and its nearest center. Choose the next center proportional to D^α(x). α = 0: random sampling. α = ∞: furthest point (side note: it actually works well for k-center). α = 2: k-means++. Side note: α = 1 works well for k-median. Slide adapted from Nina Balcan
35 K-means++ Fix (0,1) (-2,0) (3,0) (0,-1) Slide courtesy of Nina Balcan
36 K-means++ / Lloyd's Running Time K-means++ initialization: one pass over the data, O(nd) time, to select each next center, so O(nkd) time in total. Lloyd's method: repeat until there is no change in the cost: for each j, C_j ← {x ∈ S whose closest center is c_j}; for each j, c_j ← mean of C_j. Each round takes time O(nkd). Exponential # of rounds in the worst case [AV07]. Expected polynomial time in the smoothed analysis (non-worst-case) model! Slide courtesy of Nina Balcan
37 K-means++ / Lloyd's Summary K-means++ always attains an O(log k) approximation to the optimal k-means solution in expectation. Running Lloyd's can only further improve the cost. Exponential # of rounds in the worst case [AV07]. Expected polynomial time in the smoothed analysis model! Does well in practice. Slide courtesy of Nina Balcan
38 What value of k? Heuristic: find a large gap between the (k-1)-means cost and the k-means cost. Hold-out validation / cross-validation on an auxiliary task (e.g., a supervised learning task). Try hierarchical clustering. Slide courtesy of Nina Balcan
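The gap heuristic can be illustrated with the k-means objective itself. In this toy sketch (the data, names, and hand-picked centers are ours: we score the true group means rather than centers learned by Lloyd's), the cost drops sharply going from k=2 to k=3 and would barely change afterwards:

```python
import numpy as np

def kmeans_cost(X, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return float((d.min(axis=1) ** 2).sum())

# Three tight groups of two points each; score progressively larger
# sets of candidate centers (here, the true group means).
X = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0],
              [0.0, 5.0], [0.2, 5.0]])
means = np.array([[0.1, 0.0], [5.1, 5.0], [0.1, 5.0]])
c1 = kmeans_cost(X, means[:1])  # k=1: everything far from one center
c2 = kmeans_cost(X, means[:2])  # k=2: one group still unserved
c3 = kmeans_cost(X, means)      # k=3: only within-group spread remains
```

The large drop from c2 to c3, followed by essentially no further improvement, is the signature the heuristic looks for.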
39 EM AND GMMS 43
40 Expectation-Maximization Outline Background: Multivariate Gaussian Distribution; Marginal Probabilities. Building up to GMMs: Distinction #1: Model vs. Objective Function; Gaussian Naïve Bayes (GNB); Gaussian Discriminant Analysis; Gaussian Mixture Model (GMM). Expectation-Maximization: Distinction #2: Data Likelihood vs. Marginal Data Likelihood; Distinction #3: Latent Variables vs. Parameters; Objective Functions for EM; Hard (Viterbi) EM Algorithm; Example: K-Means as Hard EM; Soft (Standard) EM Algorithm; Example: Soft EM for GMM. Properties of EM: Nonconvexity / Local Optimization; Example: Grammar Induction; Variants of EM 44
41 Whiteboard Background Multivariate Gaussian Distribution Marginal Probabilities 46
42 GAUSSIAN MIXTURE MODEL (GMM) 47
43 Whiteboard Building up to GMMs Distinction #1: Model vs. Objective Function Gaussian Naïve Bayes (GNB) Gaussian Discriminant Analysis Gaussian Mixture Model (GMM) 48
44 Gaussian Discriminant Analysis Data: D = {(x^(i), z^(i))}_{i=1}^N where x^(i) ∈ R^M and z^(i) ∈ {1, ..., K}. Generative Story: z ~ Categorical(φ); x ~ Gaussian(μ_z, Σ_z). Model: Joint: p(x, z; φ, μ, Σ) = p(x | z; μ, Σ) p(z; φ). Log-likelihood: ℓ(φ, μ, Σ) = Σ_{i=1}^N log p(x^(i), z^(i); φ, μ, Σ) = Σ_{i=1}^N [log p(x^(i) | z^(i); μ, Σ) + log p(z^(i); φ)] 49
45 Gaussian Discriminant Analysis Data: D = {(x^(i), z^(i))}_{i=1}^N where x^(i) ∈ R^M and z^(i) ∈ {1, ..., K}. Log-likelihood: ℓ(φ, μ, Σ) = Σ_{i=1}^N [log p(x^(i) | z^(i); μ, Σ) + log p(z^(i); φ)]. Maximum Likelihood Estimates: take the derivative of the Lagrangian, set it equal to zero, and solve: φ_k = (1/N) Σ_{i=1}^N I(z^(i) = k), ∀k; μ_k = Σ_{i=1}^N I(z^(i) = k) x^(i) / Σ_{i=1}^N I(z^(i) = k), ∀k; Σ_k = Σ_{i=1}^N I(z^(i) = k) (x^(i) − μ_k)(x^(i) − μ_k)^T / Σ_{i=1}^N I(z^(i) = k), ∀k. Implementation: just counting. 50
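These estimates really are "just counting": priors are class fractions, means are class averages, and covariances are class scatter. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def gda_mle(X, z, K):
    """Closed-form MLEs for Gaussian discriminant analysis:
    class priors phi_k, per-class means mu_k, and per-class
    (biased / maximum-likelihood) covariances Sigma_k."""
    # phi_k: fraction of examples with label k.
    phi = np.array([(z == k).mean() for k in range(K)])
    # mu_k: mean of the examples with label k.
    mu = np.array([X[z == k].mean(axis=0) for k in range(K)])
    # Sigma_k: MLE covariance (bias=True divides by the class count).
    sigma = np.array([np.cov(X[z == k].T, bias=True) for k in range(K)])
    return phi, mu, sigma
```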
46 Gaussian Mixture Model Assume we are given data, D, consisting of N fully unsupervised examples in M dimensions. Data: D = {x^(i)}_{i=1}^N where x^(i) ∈ R^M. Generative Story: z ~ Categorical(φ); x ~ Gaussian(μ_z, Σ_z). Model: Joint: p(x, z; φ, μ, Σ) = p(x | z; μ, Σ) p(z; φ). Marginal: p(x; φ, μ, Σ) = Σ_{z=1}^K p(x | z; μ, Σ) p(z; φ). (Marginal) Log-likelihood: ℓ(φ, μ, Σ) = Σ_{i=1}^N log p(x^(i); φ, μ, Σ) = Σ_{i=1}^N log Σ_{z=1}^K p(x^(i) | z; μ, Σ) p(z; φ) 51
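The marginal log-likelihood above can be evaluated directly by summing out z over the K components for each example; a minimal sketch (our own names):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a multivariate Gaussian N(x; mu, sigma)."""
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)
    norm = np.sqrt((2 * np.pi) ** len(mu) * np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

def gmm_log_likelihood(X, phi, mu, sigma):
    """Marginal log-likelihood: sum_i log sum_z phi_z N(x_i; mu_z, Sigma_z)."""
    return sum(np.log(sum(phi[z] * gaussian_pdf(x, mu[z], sigma[z])
                          for z in range(len(phi))))
               for x in X)
```

Note the log sits outside the sum over z; this is exactly the coupling that makes the unsupervised objective hard to optimize, as the next slides discuss.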
47 Mixture Model Assume we are given data, D, consisting of N fully unsupervised examples in M dimensions. Data: D = {x^(i)}_{i=1}^N where x^(i) ∈ R^M. Generative Story: z ~ Categorical(φ); x ~ p_θ(· | z). Model: Joint: p_{θ,φ}(x, z) = p_θ(x | z) p_φ(z). Marginal: p_{θ,φ}(x) = Σ_{z=1}^K p_θ(x | z) p_φ(z). (Marginal) Log-likelihood: ℓ(θ, φ) = Σ_{i=1}^N log p_{θ,φ}(x^(i)) = Σ_{i=1}^N log Σ_{z=1}^K p_θ(x^(i) | z) p_φ(z) 52
48 Mixture Model Same model as the previous slide, with an annotation on p_θ(x | z): this could be any arbitrary distribution parameterized by θ. Today we're thinking about the case where it is a Multivariate Gaussian. 53
49 Learning a Mixture Model Supervised Learning: the parameters decouple! D = {(x^(i), z^(i))}_{i=1}^N: θ*, φ* = argmax_{θ,φ} Σ_{i=1}^N log p_θ(x^(i) | z^(i)) p_φ(z^(i)), so θ* = argmax_θ Σ_{i=1}^N log p_θ(x^(i) | z^(i)) and φ* = argmax_φ Σ_{i=1}^N log p_φ(z^(i)). Unsupervised Learning: the parameters are coupled by marginalization. D = {x^(i)}_{i=1}^N: θ*, φ* = argmax_{θ,φ} Σ_{i=1}^N log Σ_{z=1}^K p_θ(x^(i) | z) p_φ(z). (Graphical models shown: latent Z with observed X_1, X_2, ..., X_M.) 54
50 Learning a Mixture Model Supervised Learning: the parameters decouple! D = {(x^(i), z^(i))}_{i=1}^N. Unsupervised Learning: the parameters are coupled by marginalization. D = {x^(i)}_{i=1}^N: θ*, φ* = argmax_{θ,φ} Σ_{i=1}^N log Σ_{z=1}^K p_θ(x^(i) | z) p_φ(z). Training certainly isn't as simple as the supervised case. In many cases, we could still use some black-box optimization method (e.g. Newton-Raphson) to solve this coupled optimization problem. This lecture is about a more problem-specific method: EM. 55
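As a preview of the EM algorithm developed in the rest of the lecture, here is a minimal soft-EM sketch for a GMM. This is our own code, not the lecture's reference implementation: names are ours, and a small ridge term is added to keep covariances invertible.

```python
import numpy as np

def em_gmm(X, phi, mu, sigma, iters=20):
    """Soft EM for a GMM: the E-step computes responsibilities
    r[i, k] = p(z = k | x_i); the M-step re-estimates phi, mu, sigma
    from responsibility-weighted counts, means, and scatters."""
    N, M = X.shape
    K = len(phi)
    for _ in range(iters):
        # E-step: unnormalized r[i, k] = phi_k * N(x_i; mu_k, Sigma_k).
        r = np.zeros((N, K))
        for k in range(K):
            diff = X - mu[k]
            quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(sigma[k]), diff)
            r[:, k] = phi[k] * np.exp(-0.5 * quad) / np.sqrt(
                (2 * np.pi) ** M * np.linalg.det(sigma[k]))
        r /= r.sum(axis=1, keepdims=True)  # normalize to p(z | x_i)
        # M-step: weighted maximum-likelihood estimates.
        Nk = r.sum(axis=0)
        phi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        sigma = np.array([(r[:, k, None] * (X - mu[k])).T @ (X - mu[k]) / Nk[k]
                          + 1e-6 * np.eye(M)  # ridge for numerical stability
                          for k in range(K)])
    return phi, mu, sigma
```

Replacing the soft responsibilities with a hard argmax assignment (and spherical covariances) recovers K-Means, which is the "Hard EM" connection in the outline.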
51 EXAMPLE: K-MEANS VS GMM 56
52-59 Example: K-Means (figure-only slides: an animation of K-Means iterations)
60-81 Example: GMM (figure-only slides: an animation of GMM / EM iterations)
82 K-Means vs. GMM Convergence: K-Means tends to converge much faster than a GMM. Speed: each iteration of K-Means is computationally less intensive than each iteration of a GMM. Initialization: to initialize a GMM, we typically first run K-Means and use the resulting cluster centers as the means of the Gaussian components. Output: a GMM yields a probability distribution over the cluster assignment for each point, whereas K-Means gives a single hard assignment. 87
A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling
More informationA Brief Look at Optimization
A Brief Look at Optimization CSC 412/2506 Tutorial David Madras January 18, 2018 Slides adapted from last year s version Overview Introduction Classes of optimization problems Linear programming Steepest
More informationLecture 2 The k-means clustering problem
CSE 29: Unsupervised learning Spring 2008 Lecture 2 The -means clustering problem 2. The -means cost function Last time we saw the -center problem, in which the input is a set S of data points and the
More informationScan Matching. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics
Scan Matching Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Scan Matching Overview Problem statement: Given a scan and a map, or a scan and a scan,
More informationCS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample
Lecture 9 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem: distribute data into k different groups
More informationADVANCED MACHINE LEARNING MACHINE LEARNING. Kernel for Clustering kernel K-Means
1 MACHINE LEARNING Kernel for Clustering ernel K-Means Outline of Today s Lecture 1. Review principle and steps of K-Means algorithm. Derive ernel version of K-means 3. Exercise: Discuss the geometrical
More informationExpectation-Maximization. Nuno Vasconcelos ECE Department, UCSD
Expectation-Maximization Nuno Vasconcelos ECE Department, UCSD Plan for today last time we started talking about mixture models we introduced the main ideas behind EM to motivate EM, we looked at classification-maximization
More informationCSE 573: Artificial Intelligence Autumn 2010
CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationLatent Variable Models and Expectation Maximization
Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15
More informationL9: Hierarchical Clustering
L9: Hierarchical Clustering This marks the beginning of the clustering section. The basic idea is to take a set X of items and somehow partition X into subsets, so each subset has similar items. Obviously,
More informationCS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample
CS 1675 Introduction to Machine Learning Lecture 18 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem:
More informationINF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering
INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,
More informationClustering gene expression data
Clustering gene expression data 1 How Gene Expression Data Looks Entries of the Raw Data matrix: Ratio values Absolute values Row = gene s expression pattern Column = experiment/condition s profile genes
More informationKapitel 4: Clustering
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases WiSe 2017/18 Kapitel 4: Clustering Vorlesung: Prof. Dr.
More informationStatistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1
Week 8 Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Part I Clustering 2 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group
More informationColorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.
Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ 1 Image Segmentation Some material for these slides comes from https://www.csd.uwo.ca/courses/cs4487a/
More informationContent-based image and video analysis. Machine learning
Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all
More informationLecture 10: Semantic Segmentation and Clustering
Lecture 10: Semantic Segmentation and Clustering Vineet Kosaraju, Davy Ragland, Adrien Truong, Effie Nehoran, Maneekwan Toyungyernsub Department of Computer Science Stanford University Stanford, CA 94305
More informationStatistics 202: Data Mining. c Jonathan Taylor. Outliers Based in part on slides from textbook, slides of Susan Holmes.
Outliers Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Concepts What is an outlier? The set of data points that are considerably different than the remainder of the
More information( ) =cov X Y = W PRINCIPAL COMPONENT ANALYSIS. Eigenvectors of the covariance matrix are the principal components
Review Lecture 14 ! PRINCIPAL COMPONENT ANALYSIS Eigenvectors of the covariance matrix are the principal components 1. =cov X Top K principal components are the eigenvectors with K largest eigenvalues
More informationSemi- Supervised Learning
Semi- Supervised Learning Aarti Singh Machine Learning 10-601 Dec 1, 2011 Slides Courtesy: Jerry Zhu 1 Supervised Learning Feature Space Label Space Goal: Optimal predictor (Bayes Rule) depends on unknown
More informationExpectation Maximization: Inferring model parameters and class labels
Expectation Maximization: Inferring model parameters and class labels Emily Fox University of Washington February 27, 2017 Mixture of Gaussian recap 1 2/26/17 Jumble of unlabeled images HISTOGRAM blue
More information