Clustering & Dimensionality Reduction. 273A Intro Machine Learning


What is Unsupervised Learning? In supervised learning we were given attributes & targets (e.g. class labels). In unsupervised learning we are only given attributes. Our task is to discover structure in the data. Example I: the data may be structured in clusters. Example II: the data may live on a lower-dimensional manifold.

Why Discover Structure? Data compression: if you have a good model you can encode the data more cheaply. Example (PCA): to encode the data I have to encode the x and y position of each data-case. However, I could also encode the offset and angle of the line plus the deviations from the line. Small numbers can be encoded more cheaply than large numbers with the same precision. This idea is the basis for model selection: the complexity of your model (e.g. the number of parameters) should be such that you can encode the data-set with the fewest bits (up to a certain precision). Clustering: represent every data-case by a cluster representative plus deviations. ML often tries to find semantically meaningful representations (abstractions). These are good as a basis for making new predictions.
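
As a concrete illustration of this encoding idea, here is a minimal NumPy sketch (synthetic data and my own variable names, not code from the lecture) that re-encodes a 2-D point cloud as an offset along its principal direction plus a small deviation from that line:

```python
import numpy as np

# Encode each 2-D point as one offset along the main direction of the data
# plus a (small) deviation from that line.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.5], [0.5, 0.3]])  # elongated cloud

Xc = X - X.mean(axis=0)                         # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
direction = Vt[0]                               # first principal direction (unit vector)

offsets = Xc @ direction                        # one (large) number per point
deviations = Xc - np.outer(offsets, direction)  # small residuals off the line

print("variance along the line:   ", offsets.var())
print("variance of the deviations:", deviations.var(axis=0))
```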

Clustering: K-means. We iterate two operations: 1. update the assignment of data-cases to clusters; 2. update the locations of the clusters. Denote by $z_i$ the assignment of data-case $i$ to a cluster $c$, by $\mu_c$ the position of cluster $c$ in a $d$-dimensional space, and by $x_i$ the location of data-case $i$. Then iterate until convergence: 1. For each data-case, compute the distance to each cluster and assign it to the closest one: $z_i = \arg\min_c \|x_i - \mu_c\|^2$. 2. For each cluster, compute the mean location of all data-cases assigned to it: $\mu_c = \frac{1}{N_c} \sum_{i \in S_c} x_i$, where $N_c$ is the number of data-cases in cluster $c$ and $S_c$ is the set of data-cases assigned to cluster $c$.
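
A minimal NumPy sketch of these two steps (the function name and defaults are mine, not from the lecture):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means sketch following the two steps above.
    X: (N, d) array of data-cases. Returns assignments z (N,) and means mu (K, d)."""
    rng = np.random.default_rng(seed)
    # initialization: place cluster locations on K randomly chosen data-cases
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iters):
        # step 1: assign each data-case to the closest cluster
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K)
        z = dists.argmin(axis=1)
        # step 2: move each cluster to the mean of the data-cases assigned to it
        new_mu = np.array([X[z == c].mean(axis=0) if np.any(z == c) else mu[c]
                           for c in range(K)])
        if np.allclose(new_mu, mu):                                   # converged
            break
        mu = new_mu
    return z, mu
```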

K-means cost function: $C = \sum_{i=1}^N \|x_i - \mu_{z_i}\|^2$. Each step in k-means decreases this cost function. Initialization is often very important since $C$ has very many local minima. A relatively good initialization: place the cluster locations on $K$ randomly chosen data-cases. How to choose $K$? Add a complexity term that grows with $K$ (e.g. a BIC-style penalty such as $C + \frac{1}{2} K d \log N$) and minimize over $K$ as well, or use cross-validation, or Bayesian methods.
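
A hypothetical sketch of choosing $K$ this way, reusing the kmeans() sketch above; the BIC-style penalty $\frac{1}{2} K d \log N$ is an assumption, since the exact complexity term is not shown in the transcript:

```python
import numpy as np

# Sweep K, compute the K-means cost C, and add a complexity penalty.
def kmeans_cost(X, z, mu):
    return ((X - mu[z]) ** 2).sum()

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (0.0, 3.0, 6.0)])
N, d = X.shape
for K in range(1, 7):
    z, mu = kmeans(X, K)                              # kmeans() from the sketch above
    C = kmeans_cost(X, z, mu)
    penalized = C + 0.5 * K * d * np.log(N)           # assumed BIC-style penalty
    print(f"K={K}  cost={C:7.1f}  penalized={penalized:7.1f}")
```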

Vector Quantization. K-means divides the space up into a Voronoi tessellation. Every point on a tile is summarized by that tile's code-book vector, i.e. the cluster mean. This clearly allows for data compression!
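
A small sketch of the resulting compression scheme, assuming the K-means means serve as the code-book (function names are mine):

```python
import numpy as np

# Each point is stored as the index of its tile and decoded back to
# that tile's code-book vector.
def quantize(X, codebook):
    dists = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)        # one small integer per data-case

def decode(codes, codebook):
    return codebook[codes]             # lossy reconstruction of the data

# usage: z, mu = kmeans(X, K); codes = quantize(X, mu); X_hat = decode(codes, mu)
```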

Mixtures of Gaussians. K-means assigns each data-case to exactly one cluster. But what if clusters are overlapping? Maybe we are uncertain which cluster a data-case really belongs to. The mixture-of-Gaussians algorithm assigns data-cases to clusters with a certain probability.

MoG Clustering. Each cluster is modeled by a Gaussian density; its covariance determines the shape of its (elliptical) contours. Idea: fit these Gaussian densities to the data, one per cluster.

EM Algorithm: E-step. The responsibility $r_{ic}$ is the probability that data-case $i$ belongs to cluster $c$, and $\pi_c$ is the a priori probability of being assigned to cluster $c$: $r_{ic} = \frac{\pi_c\, \mathcal{N}(x_i;\, \mu_c, \Sigma_c)}{\sum_{c'} \pi_{c'}\, \mathcal{N}(x_i;\, \mu_{c'}, \Sigma_{c'})}$. Note that if the Gaussian has high probability on data-case $i$ (i.e. the bell shape sits on top of the data-case) then it claims high responsibility for this data-case. The denominator just normalizes the responsibilities so that $\sum_c r_{ic} = 1$.
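
A minimal sketch of this E-step, assuming NumPy/SciPy and the notation above (the function name is mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pi, mu, Sigma):
    """E-step sketch: responsibilities r[i, c] = P(data-case i belongs to cluster c).
    pi: (K,) mixing weights, mu: (K, d) means, Sigma: (K, d, d) covariances."""
    # numerator: prior probability of the cluster times its Gaussian density at x_i
    r = np.column_stack([pi[c] * multivariate_normal.pdf(X, mean=mu[c], cov=Sigma[c])
                         for c in range(len(pi))])
    # denominator: normalize so the responsibilities of each data-case sum to 1
    return r / r.sum(axis=1, keepdims=True)
```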

EM Algorithm: M-step. Total responsibility claimed by cluster $c$: $N_c = \sum_i r_{ic}$. Expected fraction of data-cases assigned to this cluster: $\pi_c = N_c / N$. Weighted sample mean, where every data-case is weighted according to the probability that it belongs to that cluster: $\mu_c = \frac{1}{N_c} \sum_i r_{ic}\, x_i$. Weighted sample covariance: $\Sigma_c = \frac{1}{N_c} \sum_i r_{ic}\, (x_i - \mu_c)(x_i - \mu_c)^\top$.
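
A matching sketch of the M-step updates (again, the function name is mine):

```python
import numpy as np

def m_step(X, r):
    """M-step sketch: re-estimate the mixture parameters from responsibilities r (N, K)."""
    N, d = X.shape
    Nc = r.sum(axis=0)                       # total responsibility claimed by each cluster
    pi = Nc / N                              # expected fraction of data-cases per cluster
    mu = (r.T @ X) / Nc[:, None]             # weighted sample means
    Sigma = np.empty((len(Nc), d, d))
    for c in range(len(Nc)):
        diff = X - mu[c]
        Sigma[c] = (r[:, c, None] * diff).T @ diff / Nc[c]   # weighted sample covariances
    return pi, mu, Sigma
```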

EM-MoG. EM stands for expectation maximization; we won't go through the derivation. If we are forced to decide, we should assign a data-case to the cluster that claims the highest responsibility. For a new data-case, we should compute responsibilities as in the E-step and pick the cluster with the largest responsibility. The E and M steps should be iterated until convergence (which is guaranteed). Every step increases the following objective function, the total log-probability of the data under the model we are learning: $L = \sum_{i=1}^N \log \sum_{c=1}^K \pi_c\, \mathcal{N}(x_i;\, \mu_c, \Sigma_c)$.
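
Putting the pieces together, here is a sketch of the full EM loop and the log-likelihood it monitors, built on the e_step()/m_step() sketches above; the initialization choices are assumptions, not taken from the lecture:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, pi, mu, Sigma):
    """Total log-probability of the data under the current mixture model."""
    dens = np.column_stack([pi[c] * multivariate_normal.pdf(X, mean=mu[c], cov=Sigma[c])
                            for c in range(len(pi))])
    return np.log(dens.sum(axis=1)).sum()

def em_mog(X, K, n_iters=100, tol=1e-6, seed=0):
    """EM loop sketch for a mixture of Gaussians."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, size=K, replace=False)].copy()     # init on random data-cases
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)  # shared broad covariance
    prev = -np.inf
    for _ in range(n_iters):
        r = e_step(X, pi, mu, Sigma)
        pi, mu, Sigma = m_step(X, r)
        L = log_likelihood(X, pi, mu, Sigma)
        if L - prev < tol:       # L is non-decreasing; stop once it stops improving
            break
        prev = L
    # hard assignment, if forced to decide: the cluster with the highest responsibility
    return pi, mu, Sigma, r.argmax(axis=1)
```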