Lecture 11: Clustering Introduction and Projects Machine Learning


Andrew Rosenberg, March 12, 2010

Last Time: Junction Tree Algorithm; Efficient Marginals in Graphical Models

Today: Clustering; Project Details

Clustering. Clustering is an unsupervised machine learning task: the goal is to group similar entities together.

We do this all the time [figure-only slides]

We can do this in many dimensions [figure-only slides]

We can do this to many degrees [figure-only slide]

How do we set this up computationally? In machine learning, we optimize objective functions to find the best solution: maximum likelihood (for frequentists), maximum a posteriori (for Bayesians), empirical risk minimization, loss-function minimization. What makes a good cluster? How do we define a loss or a likelihood for a clustering solution?

Cluster Evaluation. Intrinsic evaluation: evaluate the compactness of the clusters. Extrinsic evaluation: compare the results to some gold-standard labeled data (not covered today).

Intrinsic Evaluation. Intracluster variability (IV): how different are the data points within the same cluster? Extracluster variability (EV): how different are the data points that are in distinct clusters? We want to minimize IV while maximizing EV, e.g. minimize the ratio IV/EV, where

IV = Σ_{C} Σ_{x ∈ C} d(x, c),   with d(x, c) = ‖x − c‖ and c the centroid of cluster C.
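IV is straightforward to compute. A minimal sketch (using squared Euclidean distance, the form that reappears in the k-means discussion; the function name and example data are illustrative, not from the lecture):

```python
def intracluster_variability(clusters):
    """IV = sum over clusters of squared distances from each point to
    its cluster centroid (the squared-distance form used by k-means)."""
    iv = 0.0
    for points in clusters:
        dim = len(points[0])
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        iv += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim))
                  for p in points)
    return iv

# Two tight clusters vs. everything lumped into one cluster.
tight = [[(0.0, 0.0), (0.0, 1.0)], [(4.0, 0.0), (4.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]]
print(intracluster_variability(tight))  # 1.0
print(intracluster_variability(loose))  # 17.0
```

Tighter clusterings give a smaller IV, which is why "minimize IV" is a sensible objective.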

Degenerate Clustering Solutions: One Cluster

Degenerate Clustering Solutions: N Clusters (one per data point, driving IV to zero)

Clustering Approaches: Hierarchical Clustering; Partitional Clustering

Hierarchical Clustering: Recursive Partitioning [sequence of figure-only slides showing the data being split step by step]

Hierarchical Clustering: Agglomerative Clustering [sequence of figure-only slides showing clusters being merged step by step]

K-Means Clustering. K-means is a partitional clustering algorithm: it identifies a partition of the space for a fixed number of clusters. Input: a value K, the number of clusters. Output: the K cluster centers (centroids).

K-Means Clustering [figure-only slide]

K-Means Clustering Algorithm. Given an integer K specifying the number of clusters:
1. Initialize K cluster centroids, either by selecting K points from the data set at random or by selecting K points from the space at random.
2. Assign each point in the data set to the cluster whose centroid is closest: argmin_i d(x, c_i).
3. Update each centroid based on the points assigned to its cluster: c_i = (1/|C_i|) Σ_{x ∈ C_i} x.
4. If any data point has changed clusters, repeat from step 2.
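The steps above can be sketched in a few lines of Python (a toy implementation for illustration, not the course's reference code; names and the example data are made up):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and
    centroid updates until no point changes cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # step 1: K random data points
    assign = None
    for _ in range(iters):
        # step 2: assign each point to its nearest centroid
        new_assign = [
            min(range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            for p in points
        ]
        if new_assign == assign:           # step 4: stop once nothing changes
            break
        assign = new_assign
        for i in range(k):                 # step 3: centroid = mean of members
            members = [p for p, a in zip(points, assign) if a == i]
            if members:                    # leave centroid in place if cluster empties
                centroids[i] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assign

data = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
centroids, labels = kmeans(data, k=2)      # the two tight pairs end up together
```

On this toy data the two nearby pairs always end up in the same cluster, whatever the random initialization picks.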

Why does K-Means Work? When an assignment is changed, the squared distance from that data point to its assigned centroid is reduced, so IV is reduced. When a cluster centroid is moved, the sum of squared distances of the data points within that cluster is reduced, so IV is reduced. At convergence we have found a local minimum of IV.

K-Means Clustering [sequence of figure-only slides showing the iterations]

Soft K-Means. In K-means, we forced every data point to be a member of exactly one cluster. We can relax this constraint and assign each point to every cluster with some weight, for example

p(x, c_i) = d(x, c_i) / Σ_j d(x, c_j)

or, based on minimizing the entropy of the cluster assignment,

p(x, c_i) = exp{−d(x, c_i)} / Σ_j exp{−d(x, c_j)}.

We still define a cluster by a centroid, but we calculate the centroid as a weighted center of all the data points:

c_i = Σ_x p(x, c_i) x / Σ_x p(x, c_i).

Convergence is based on a stopping threshold rather than on changed assignments.
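A minimal sketch of the soft variant, using the exponential (softmax) assignment rule above with an inverse-temperature parameter; the `beta` parameter, the first-K-points initialization, and the example data are assumptions for the demo, not from the lecture:

```python
import math

def soft_kmeans(points, k, beta=1.0, iters=100, tol=1e-6):
    """Soft assignments p(x, c_i) proportional to exp(-beta * d(x, c_i));
    centroids are responsibility-weighted means; stop when centroids
    move less than tol (a stopping threshold, not changed assignments)."""
    centroids = [list(p) for p in points[:k]]   # simple init: first K points
    for _ in range(iters):
        # soft assignment: softmax over negative (squared) distances
        resp = []
        for p in points:
            w = [math.exp(-beta * sum((a - b) ** 2 for a, b in zip(p, c)))
                 for c in centroids]
            z = sum(w)
            resp.append([wi / z for wi in w])
        # weighted centroid update
        moved = 0.0
        for i in range(k):
            total = sum(r[i] for r in resp)
            new_c = [sum(r[i] * p[d] for r, p in zip(resp, points)) / total
                     for d in range(len(points[0]))]
            moved = max(moved, max(abs(a - b) for a, b in zip(new_c, centroids[i])))
            centroids[i] = new_c
        if moved < tol:
            break
    return centroids, resp

data = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9)]
centroids, resp = soft_kmeans(data, k=2)
```

With well-separated data the responsibilities saturate near 0 or 1 and the centroids land where hard k-means would put them; with overlapping clusters the weights stay genuinely fractional.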


Potential Problems with K-Means. Optimal? K-means approaches a local minimum, but this is not guaranteed to be globally optimal. Could you design an approach which is globally optimal? Sure, in NP. Consistent? Different starting clusters can lead to different cluster solutions.
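Both problems show up on a tiny example (an illustrative sketch; `lloyd`, `iv`, and the rectangle data are made up for the demo): two different initializations converge to two different fixed points, and the worse one is a genuine local minimum of IV.

```python
def lloyd(points, centroids, iters=50):
    """Plain Lloyd's iterations from a fixed initialization
    (assumes no cluster ever becomes empty)."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[i].append(p)
        centroids = [tuple(sum(xs) / len(c) for xs in zip(*c)) for c in clusters]
    return centroids

def iv(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
               for p in points)

# Four corners of a wide rectangle: the natural split is left vs. right.
pts = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
good = lloyd(pts, [(0.0, 0.5), (4.0, 0.5)])   # converges to the left/right split
bad = lloyd(pts, [(2.0, 0.0), (2.0, 1.0)])    # stuck at a top/bottom split
print(iv(pts, good), iv(pts, bad))  # same data, two different local minima
```

The top/bottom split is stable (no point would change assignment), so k-means never escapes it even though its IV is 16 times worse.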

Suboptimality in K-Means [figure-only slide]

Inconsistency in K-Means [sequence of figure-only slides]

More Clustering: K-Nearest Neighbors, Gaussian Mixture Models, Spectral Clustering. We will return to these.

The Project: Research Paper or Project

Research Paper: 8-10 pages, reporting on work in 4-5 papers. Scope: one application area, or one technique.

Research Paper: Application Areas. Identify an application that has made use of machine learning and discuss how.
Graphics: Object Recognition; Optical Character Recognition; Superresolution; Segmentation
Natural Language Processing: Parsing; Sentiment Analysis; Information Extraction
Speech: Recognition; Synthesis; Discourse Analysis; Intonation
Game Playing: Scrabble; Craps; Prisoner's Dilemma
Financials: Stock Prediction
Review Systems: Amazon; Netflix; Facebook

Research Paper: Techniques. Identify a machine learning technique; describe its use and variants. Examples:
L1-regularization; Non-linear Kernels; Loopy Belief Propagation; Non-parametric Belief Propagation; Soft Decision Trees; Analysis of Neural Network Hidden Layers; Structured Learning; Generalized Expectation; Evaluation Measures (Cluster Evaluation, Semi-supervised Evaluation); Graph Embedding; Dimensionality Reduction; Feature Selection; Graphical Model Construction; Non-parametric Bayesian Methods; Latent Dirichlet Allocation

Project. Run a machine learning experiment: identify a problem/task and data set, implement one or more ML algorithms, evaluate the approach, and write a report of the experiment (4 pages including references):
Abstract: 1 paragraph summarizing the experiment
Introduction: describe the problem
Data: describe the data set, features extracted, etc.
Method: describe the algorithm/approach
Results: present and discuss results
Conclusion: summarize the experiment and results

Project Ideas: Tasks. Projects can take any combination of tasks and approaches.
Graphics: Object Classification; Facial Recognition; Fingerprint Identification; Optical Character Recognition; Handwriting Recognition (for languages/character systems other than English...)
Language: Topic Classification; Sentiment Analysis
Speech: Recognition; Speaker Identification; Punctuation Restoration
Other: Semantic Segmentation; Recognition of Emotion, Sarcasm, etc.; SMS Text Normalization; Chat Participant Identification; Twitter Classification/Threading

Games: Chess; Checkers; Poker (Poker Academy Pro); Blackjack
Recommenders (Collaborative Filtering): Netflix; Courses; Jokes; Books; Facebook?
Video: Classification; Motion Classification; Segmentation

Next: Hidden Markov Models; Viterbi Decoding