Unsupervised Learning


Unsupervised learning

Until now, we have assumed that our training samples are labeled with their category membership. Methods that use labeled samples are said to be supervised. However, there are problems where the definition of the classes, and even the number of classes, may be unknown. Machine learning methods that deal with such data are said to be unsupervised.

Questions: Why would one even be interested in learning from unlabeled samples? Is it even possible, in principle, to learn anything of value from unlabeled samples?

Why unsupervised learning? (in no particular order)

1. To limit the cost of the often surprisingly expensive process of collecting and labeling a large set of sample patterns. E.g., videos are virtually free, but accurately labeling the video pixels is expensive and time-consuming.
2. To obtain a larger training set than the one available, via semi-supervised learning: train a classifier on a small set of labeled samples, then tune it to run without supervision on a large, unlabeled set. Or, in the reverse direction, let a large set of unlabeled data group automatically, then label the groupings found.
3. To detect a gradual change of patterns over time.
4. To find features that will be useful for categorization.
5. To gain insight into the nature or structure of the data during the early stages of an investigation.

Unsupervised learning: clustering In practice, unsupervised learning methods implement what is usually referred to as data clustering. Qualitatively and generally, the problem of data clustering can be defined as: Grouping of objects into meaningful categories Given a representation of N objects, find k clusters based on a similarity measure.

Data clustering

The problem can be tackled from several points of view.

Statistics: represent the density function of all the data as a mixture of a number of different distributions, p(y) = Σ_i w_i p(y | ω_i), and fit the set of weights w_i and the component densities p(y | ω_i) to the given data.
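As a concrete illustration of this mixture-density view, the sketch below fits a Gaussian mixture to unlabeled 2-D data with scikit-learn. The synthetic data, the number of components K, and all variable names are assumptions made for the example, not part of the original slides.

```python
# Sketch: the statistical view of clustering as mixture-density fitting.
# Assumes NumPy and scikit-learn are available; data and K are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic unlabeled data drawn from three blobs (for illustration only).
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(100, 2)),
    rng.normal(loc=(3.0, 3.0), scale=0.5, size=(100, 2)),
    rng.normal(loc=(0.0, 4.0), scale=0.5, size=(100, 2)),
])

K = 3  # assumed number of mixture components
gmm = GaussianMixture(n_components=K, random_state=0).fit(X)

print("mixture weights w_i:", gmm.weights_)
print("component means:", gmm.means_)
labels = gmm.predict(X)  # hard assignment: most probable component per sample
```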

Data clustering

The problem can be tackled from several points of view.

Geometry/topology: partition the pattern space such that the data belonging to each partition are highly homogeneous (i.e., similar to one another).

More directly related to classification: group (label) the data such that the average intra-group distance is minimized and the average inter-group distance is maximized (yet another optimization problem!).

Data clustering

Why data clustering?
- Natural classification: degree of similarity among species.
- Data exploration: discover the underlying structure, generate hypotheses, detect anomalies.
- Data compression: for organizing, indexing, storing, and broadcasting data.
- Applications: useful to any scientific field that collects data!
- Relevance: 2340 papers about data clustering were indexed in Scopus in 2014!

Data clustering: examples

800,000 scientific papers clustered into 776 topics, based on how often the papers were cited together by the authors of other papers.

Data clustering

Given a set of N unlabeled examples D = {x_1, x_2, ..., x_N} in a d-dimensional feature space, D is partitioned into a number of disjoint subsets D_j:

D = ∪_{j=1..k} D_j,  with  D_i ∩ D_j = ∅ for i ≠ j,

where the points in each subset are similar to each other according to a given criterion.

Data clustering

A partition is denoted by p = (D_1, D_2, ..., D_k), and the problem of data clustering is thus formulated as

p* = argmin_p f(p),

where the objective f(·) is formulated according to the chosen clustering criterion.

Data clustering

A general optimization (minimization) algorithm for a classification criterion J(Y, W), where Y is the dataset and W is the ordered set of labels assigned to the samples, can be described as follows:

Choose an initial classification W_0
repeat (step i)
    change the classification such that J decreases
until the classification is the same as in the previous step

If the variables were continuous, a gradient method could be used. Here, however, we are dealing with discrete variables.

Data clustering

A reasonable algorithm, based on the simplifying assumption that the optimization problem is separable (i.e., that the minimum of an n-dimensional function can be found by minimizing it along each dimension separately), would assign to each sample the label that yields the largest negative change ΔJ.

N.B. Since the problem is not actually separable, there is no guarantee that J decreases by the sum of the individual ΔJ's; it may even increase! A better but slower solution, which guarantees monotonicity, is to change, at each step, only the single label that causes the greatest negative ΔJ.
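A minimal sketch of the slower, monotone variant just described: at every step it evaluates all possible single-label changes and applies only the one with the greatest decrease in J. The choice of J (within-cluster sum of squared distances to the centroids) and all names are assumptions made to keep the example concrete.

```python
# Sketch of greedy single-label-change descent on a clustering criterion J.
# J is taken here to be the within-cluster sum of squared distances (an assumption).
import numpy as np

def cost_J(X, labels, k):
    """Sum of squared distances of the samples to their cluster centroids."""
    J = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members) > 0:
            J += np.sum((members - members.mean(axis=0)) ** 2)
    return J

def greedy_label_descent(X, labels, k, max_iter=100):
    """At each step, apply the single label change that decreases J the most."""
    labels = labels.copy()
    for _ in range(max_iter):
        J_current = cost_J(X, labels, k)
        best_delta, best_move = 0.0, None
        for i in range(len(X)):
            for c in range(k):
                if c == labels[i]:
                    continue
                trial = labels.copy()
                trial[i] = c
                delta = cost_J(X, trial, k) - J_current
                if delta < best_delta:
                    best_delta, best_move = delta, (i, c)
        if best_move is None:   # no single change decreases J: local minimum reached
            break
        labels[best_move[0]] = best_move[1]
    return labels
```

Re-evaluating J for every candidate change makes this approach slow; k-means, introduced next, achieves a similar descent far more cheaply.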

K-means clustering

K-means clustering is obtained by choosing the Euclidean distance as the similarity criterion and

J = Σ_{k=1..Nc} Σ_{y^(i)~ω_k} ||y^(i) − m_k||²

as the function to be optimized, where y^(i) is the i-th sample, m_k is the centroid of the k-th cluster, and y^(i)~ω_k denotes the samples y^(i) assigned to cluster k. Then M = {m_1, m_2, ..., m_Nc} is the set of reference vectors, each of which represents the prototype of a class. For a given assignment, J is minimized by choosing m_k as the sample mean of the data having label ω_k.

K-means clustering

In practice, the algorithm partitions the input space S into a predefined number of subspaces induced by the Euclidean distance. Each subspace s_i of S is defined as

s_i = { x_j ∈ S : d(x_j, m_i) = min_t d(x_j, m_t) }.

This induces a so-called Voronoi tessellation of the input space (the slide shows an example limited to 2-D patterns).
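To make the tessellation concrete, this sketch draws the Voronoi cells induced by a handful of 2-D centers using SciPy; the centers and the plotting details are illustrative assumptions, not part of the slides.

```python
# Sketch: Voronoi tessellation of a 2-D input space induced by cluster centers.
# Requires NumPy, SciPy and Matplotlib; the centers below are arbitrary examples.
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0], [4.0, 0.5], [2.0, 1.5]])
vor = Voronoi(centers)

# Each cell contains the points that are closer to its center than to any other.
voronoi_plot_2d(vor)
plt.scatter(centers[:, 0], centers[:, 1], c="red", marker="x")
plt.title("Voronoi tessellation induced by 5 centers")
plt.show()
```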

K-means clustering

Randomly initialize M = {m_1, m_2, ..., m_Nc}
repeat
    classify the N samples according to the nearest m_i
    recompute each m_i as the mean of the patterns assigned to cluster i
until there is no change in M
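A compact NumPy sketch of the loop above; the initialization strategy (sampling Nc distinct points from the data), the convergence test, and the variable names are assumptions made for the example.

```python
# Sketch of the k-means loop: assign to the nearest centroid, then recompute means.
import numpy as np

def kmeans(X, n_clusters, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the centroids with Nc randomly chosen samples (one common choice).
    m = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(max_iter):
        # Classify each sample according to the nearest centroid m_i.
        dists = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each m_i as the mean of the patterns assigned to cluster i.
        new_m = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else m[i]
            for i in range(n_clusters)
        ])
        if np.allclose(new_m, m):   # stop when the centroids no longer change
            break
        m = new_m
    return labels, m
```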

Improving on k-means

The main problem with k-means is the need to set the number of desired clusters a priori. A large number of algorithms have been proposed that overcome this limitation by determining a suitable number of clusters at runtime. The basic idea behind these algorithms is splitting and merging:
- a cluster is split into two clusters when a measure of its homogeneity falls below a certain threshold;
- two clusters are merged into one when a measure of their separation falls below a certain threshold.

Isodata

N_D = approximate (desired) number of clusters
T = threshold on the (minimum) number of samples in a cluster
Set Nc = N_D

1. Cluster the data into Nc clusters, eliminating data and clusters with fewer than T members and decreasing Nc accordingly. Exit if the classification has not changed.
2. If Nc ≤ N_D/2, or (Nc < 2·N_D and the iteration is odd):
   a. Split the clusters whose samples are sufficiently spread out and increase Nc accordingly.
   b. If any cluster has been split, go to 1.
3. Merge any pair of clusters whose samples are sufficiently close and/or overlapping and decrease Nc accordingly.
4. Go to step 1.

Isodata (cluster computation)

1. Cluster the data into Nc clusters, eliminating data and clusters with fewer than T members and decreasing Nc accordingly. Exit if the classification has not changed.

For each cluster k the following quantities are computed:

d_k = (1/N_k) Σ_{y^(i)~ω_k} ||y^(i) − m_k||       average distance of the samples from the mean (centroid) of cluster k
σ_k² = max_j { (1/N_k) Σ_{y^(i)~ω_k} (y^(i)_j − m_kj)² }       largest variance along the coordinate axes
d̄ = (1/N) Σ_{k=1..Nc} N_k d_k       overall average distance of the samples
N_k = number of samples in cluster k
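These quantities translate directly into NumPy; the snippet below is a sketch under the assumption that X holds the samples row-wise, labels holds integer cluster indices, and m holds the current centroids.

```python
# Sketch: the per-cluster ISODATA statistics d_k, sigma_k^2, and d_bar.
import numpy as np

def isodata_stats(X, labels, m):
    """Per-cluster average distance d_k, largest per-axis variance sigma2_k,
    and the overall average distance d_bar."""
    n_clusters, N = len(m), len(X)
    d = np.zeros(n_clusters)
    sigma2 = np.zeros(n_clusters)
    N_k = np.zeros(n_clusters, dtype=int)
    for k in range(n_clusters):
        members = X[labels == k]
        N_k[k] = len(members)
        if N_k[k] == 0:
            continue
        # d_k: average Euclidean distance of the cluster's samples from m_k
        d[k] = np.linalg.norm(members - m[k], axis=1).mean()
        # sigma_k^2: largest variance along the coordinate axes
        sigma2[k] = ((members - m[k]) ** 2).mean(axis=0).max()
    d_bar = (N_k * d).sum() / N   # overall average distance of the samples
    return d, sigma2, d_bar, N_k
```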

Isodata (split)

σ_s² = maximum-spread threshold for splitting (no splitting below σ_s²)

For k = 1, ..., Nc:
    If σ_k² > σ_s²:
        If d_k > d̄ and ( N_k > 2T + 1 or Nc ≤ N_D/2 or (Nc < 2·N_D and the iteration is odd) ):
            split cluster k and increase Nc accordingly.

Splitting means replacing the original center with two new centers, displaced slightly (usually by a fraction of σ_m) in opposite directions along the axis m of largest variance.
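A sketch of the splitting operation alone (the eligibility test above is not repeated); the displacement fraction and the names are assumptions.

```python
# Sketch: split one center into two, displaced along the axis of largest variance.
import numpy as np

def split_center(members, center, fraction=0.5):
    """Replace `center` by two centers moved +/- fraction * sigma_m along the
    coordinate axis m of largest variance (a common ISODATA choice)."""
    variances = ((members - center) ** 2).mean(axis=0)
    m_axis = variances.argmax()              # axis of largest variance
    sigma_m = np.sqrt(variances[m_axis])
    offset = np.zeros_like(center, dtype=float)
    offset[m_axis] = fraction * sigma_m
    return center + offset, center - offset
```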

Isodata (merge)

D_m = maximum distance separation for merging
N_max = maximum number of cluster pairs that can be merged in one iteration

i. For each pair of clusters i, j (i ≠ j), compute d_ij = ||m_i − m_j||.
ii. Sort the distances d_ij < D_m in ascending order.
iii. For each sorted d_ij, if neither cluster i nor cluster j has already been merged, and while the number of merges is below N_max, merge clusters i and j and decrease Nc accordingly.

The new center m of the merged cluster is computed as

m = (N_i m_i + N_j m_j) / (N_i + N_j).
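A minimal sketch of the merge step's new-center formula; the example values are purely illustrative.

```python
# Sketch: merge two clusters; the new center is the sample-count-weighted mean.
import numpy as np

def merge_centers(m_i, m_j, N_i, N_j):
    """Center of the merged cluster, weighted by the two clusters' sample counts."""
    return (N_i * np.asarray(m_i) + N_j * np.asarray(m_j)) / (N_i + N_j)

# Example (illustrative values): two centers with 40 and 10 members.
print(merge_centers([0.0, 0.0], [1.0, 2.0], 40, 10))   # -> [0.2 0.4]
```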