Statistics 202: Data Mining. Week 9: Hierarchical Clustering, Model-Based Clustering, and Outliers
Week 9. Based in part on slides from the textbook and slides of Susan Holmes. December 2, 2012.

Part I: Hierarchical Clustering

Description
Hierarchical clustering produces a set of nested clusters organized as a hierarchical tree. The result can be visualized as a dendrogram: a tree-like diagram that records the sequence of merges or splits.

[Figure: a clustering and its dendrogram, from Tan, Steinbach & Kumar, Introduction to Data Mining.]
Strengths
- No need to assume any particular number of clusters: each horizontal cut of the tree yields a clustering.
- The tree may correspond to a meaningful taxonomy (e.g., the animal kingdom, phylogeny reconstruction).
- Only a similarity or distance matrix is needed for implementation.

Two main types
- Agglomerative: start with the points as individual clusters; at each step, merge the closest pair of clusters until only one cluster (or some fixed number k of clusters) remains.
- Divisive: start with one all-inclusive cluster; at each step, split a cluster until each cluster contains a single point (or there are k clusters).

Agglomerative clustering algorithm
1. Compute the proximity matrix.
2. Let each data point be a cluster.
3. While there is more than one cluster:
   (a) merge the two closest clusters;
   (b) update the proximity matrix.
The major difference between methods is how the proximity of two clusters is computed.
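The three steps above can be sketched directly. A minimal, unoptimized single-linkage agglomeration in Python (the function name and toy points are mine, for illustration only):

```python
import numpy as np

def agglomerate(points, k, link=min):
    # Step 1: compute the proximity (distance) matrix.
    D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    # Step 2: let each data point be its own cluster.
    clusters = [[i] for i in range(len(points))]
    # Step 3: while more than k clusters remain, merge the closest pair.
    while len(clusters) > k:
        a, b = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            # cluster proximity = link (min for single linkage) over point pairs
            key=lambda ab: link(D[i, j] for i in clusters[ab[0]] for j in clusters[ab[1]]),
        )
        clusters[a] += clusters.pop(b)  # merge; the matrix update is implicit here
    return clusters

pts = np.array([[0.0, 0], [0, 1], [5, 5], [5, 6]])
print(agglomerate(pts, 2))  # two tight pairs: [[0, 1], [2, 3]]
```

Passing link=max instead gives complete linkage; a real implementation would update the proximity matrix incrementally rather than rescanning all pairs.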
Intermediate situation
After some merging steps, we have clusters C1, ..., C5 and a proximity matrix over those clusters. Suppose we merge C2 and C5. The question is: how do we update the proximity matrix, i.e., what are the proximities between C2 U C5 and the remaining clusters?

[Figures: starting point for agglomerative clustering; an intermediate point with five clusters; the proximity matrix before and after merging C2 and C5.]
How to define inter-cluster similarity
We need a notion of similarity (or distance) between clusters:
- MIN: single linkage uses the minimum distance between points in the two clusters.
- MAX: complete linkage uses the maximum distance.
- Group average: group average linkage uses the average distance between the groups.
- Distance between centroids.
- Other methods driven by an objective function: Ward's method uses squared error.
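Each criterion reduces to a simple operation on the matrix of pairwise distances between the two clusters. A small illustration (the toy clusters are mine):

```python
import numpy as np

def pairwise(A, B):
    # all distances d(x, y) for x in cluster A, y in cluster B
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

A = np.array([[0.0, 0], [1, 0]])
B = np.array([[4.0, 0], [6, 0]])
d = pairwise(A, B)                               # distances: 4, 6, 3, 5
d_min = d.min()                                  # single linkage (MIN): 3.0
d_max = d.max()                                  # complete linkage (MAX): 6.0
d_avg = d.mean()                                 # group average: 4.5
d_cent = np.linalg.norm(A.mean(0) - B.mean(0))   # centroid distance: 4.5
```

Group average and centroid distance agree here only by coincidence of the toy data; in general they differ.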
Hierarchical clustering: MIN (single linkage)
Similarity of two clusters is based on the two most similar (closest) points in the different clusters. It is therefore determined by a single pair of points, i.e., by one link in the proximity graph.

Centroid linkage uses the distance between the centroids of the clusters (this presumes centroids can be computed).

[Figures: distance matrix for nested clusterings; proximity matrix, nested cluster representation, and dendrogram of single linkage.]
The Iris data (single linkage)
[Figure: single-linkage clustering of the Iris data.]

Single linkage
- Can handle irregularly shaped regions fairly naturally.
- Sensitive to noise and outliers, in the form of chaining.

Cluster similarity: MAX (complete linkage)
Similarity of two clusters is based on the two least similar (most distant) points in the different clusters, so it is determined by all pairs of points in the two clusters.

[Figure: proximity matrix and dendrogram of complete linkage.]
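In practice one would use a library implementation rather than code the loop by hand. A sketch with SciPy, assuming scipy is available (the two simulated blobs stand in for data like Iris):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# two well-separated blobs of 20 points each
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(3, 0.2, (20, 2))])

Z = linkage(X, method="single")       # also: "complete", "average", "ward"
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram at 2 clusters
```

Z encodes the full merge sequence, so the same tree can be cut at any height without re-running the algorithm.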
Hierarchical clustering: MAX
[Figure: nested cluster representation and dendrogram of complete linkage.]

Complete linkage
- Less sensitive to noise and outliers than single linkage.
- Regions are generally compact, but may violate closeness: a point may be much closer to some points in a neighbouring cluster than to points in its own cluster. This manifests itself as breaking large clusters.
- Clusters are biased to be globular.

[Figures: complete-linkage clustering of the Iris data.]
Hierarchical clustering: group average
[Figures: complete-linkage clustering of the Iris data; nested cluster representation and dendrogram of group average linkage.]

Average linkage
Given two elements of the partition C_r, C_s, we might consider

    d_GA(C_r, C_s) = (1 / (|C_r| |C_s|)) * sum over x in C_r, y in C_s of d(x, y)

- A compromise between single and complete linkage.
- Shares the globular clusters of complete linkage, but is less sensitive to noise than single linkage.
The Iris data (average linkage)
[Figure: average-linkage clustering of the Iris data.]

Ward's linkage
- Similarity of two clusters is based on the increase in squared error when the two clusters are merged.
- Similar to average linkage if the dissimilarity between points is squared distance; hence it shares many properties of average linkage.
- A hierarchical analogue of K-means; sometimes used to initialize K-means.

[Figures: Ward's-linkage clustering of the Iris data.]
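Ward's merge cost (the increase in squared error) can be written down directly; a minimal sketch (function names and toy clusters are mine):

```python
import numpy as np

def wss(C):
    # within-cluster sum of squared distances to the centroid
    return ((C - C.mean(0)) ** 2).sum()

def ward_cost(Ca, Cb):
    # increase in squared error caused by merging Ca and Cb;
    # Ward's method merges the pair with the smallest such cost
    return wss(np.vstack([Ca, Cb])) - wss(Ca) - wss(Cb)

Ca = np.array([[0.0, 0], [1, 0]])
Cb = np.array([[4.0, 0]])
cost = ward_cost(Ca, Cb)
```

A known identity makes this easy to check: the cost equals |Ca||Cb| / (|Ca| + |Cb|) times the squared distance between the two centroids, here (2 * 1 / 3) * 3.5^2.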
[Figures: NCI data under complete, single, average, and Ward's linkage.]
Computational issues
- O(n^2) space, since the method uses the proximity matrix.
- O(n^3) time in many cases: there are n steps, and at each step a matrix of size up to n^2 must be updated and/or searched.

Statistical issues
- Once a decision is made to combine two clusters, it cannot be undone.
- No objective function is directly minimized.
- Different schemes have problems with one or more of the following: sensitivity to noise and outliers; difficulty handling different-sized clusters and convex shapes; breaking large clusters.

Part II: Model-Based Clustering

General approach
- Choose a type of mixture model (e.g., multivariate normal) and a maximum number of clusters, K.
- Use a specialized hierarchical clustering technique.
- Use some criterion to determine the optimal model and number of clusters.
Model-based clustering

Choosing a mixture model
General form of the mixture model:

    f(x) = sum_{j=1}^{k} pi_j f(x; theta_j)

For the multivariate normal, theta_j = (mu_j, Sigma_j). The EM algorithm we discussed before assumed the Sigma_j are all different in the different classes. Other possibilities: Sigma_j = Sigma, Sigma_j = lambda_j I, etc.

Generally, we can write

    Sigma_j = c_j D_j A_j D_j^T

with c_j diag(A_j) the eigenvalues of Sigma_j and max(A_j) = 1. The parameter c_j is the size, A_j is the shape, and D_j is the orientation.

Model-based agglomerative clustering: Ward's criterion
A hierarchical clustering algorithm merges k clusters {C_1^k, ..., C_k^k} into k - 1 clusters based on

    WSS = sum_{j=1}^{k-1} WSS(C_j^{k-1})

where WSS is the within-cluster sum of squared distances. The procedure merges the two clusters C_i^k, C_l^k that produce the smallest increase in WSS.
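The decomposition Sigma_j = c_j D_j A_j D_j^T is just a scaled eigendecomposition. A quick numerical check (the example covariance is arbitrary):

```python
import numpy as np

Sigma = np.array([[3.0, 1.0], [1.0, 2.0]])
evals, D = np.linalg.eigh(Sigma)   # Sigma = D diag(evals) D^T, D orthogonal
c = evals.max()                    # size: the largest eigenvalue
A = np.diag(evals / c)             # shape: eigenvalues scaled so max(A) = 1
# D holds the orientation; reassembling recovers Sigma exactly:
assert np.allclose(c * D @ A @ D.T, Sigma)
```

Constraining c_j, A_j, or D_j across components (equal size, equal shape, equal orientation, and so on) generates the family of normal mixture models to be compared later.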
Model-based agglomerative clustering
If Sigma_j = sigma^2 I, then Ward's criterion is equivalent to merging based on the criterion -2 log L(theta, l), where

    L(theta, l) = prod_{i=1}^{n} f(x_i; theta_{l_i})

is called the classification likelihood. This idea can be used to make a hierarchical clustering algorithm for other types of multivariate normal models, e.g., equal shape, same size, etc.

After a merge, the clusters are taken as initial starting points for the EM algorithm. This results in several mixture models: one for each number of clusters and each type of mixture model considered. How do we choose? This raises the topic of model selection.

Model selection and BIC

Bayesian Information Criterion
Suppose we have several possible models M = {M_1, ..., M_T} for a data set given by a data matrix X (n x p). These models have parameters Theta = {theta_1, ..., theta_T}. Further, suppose that each one has a likelihood L_j, and that hat(theta)_1, ..., hat(theta)_T are the maximum likelihood estimators.

We could compare -2 log L_j(hat(theta)_j), but this ignores how much fitting each model does. A common approach is to add a penalty that makes different models comparable. The BIC of a model is usually

    BIC(M_j) = -2 log L_j(hat(theta)_j) + (log n) * (# parameters in M_j)

The BIC can be thought of as approximating P(M_j is correct | X) under an appropriate Bayesian model for X.
Model selection and BIC
Typically, statisticians try to prove that choosing the model with the best BIC yields the correct model. Some theoretical justification is needed for this, and it breaks down for mixture models; nevertheless, BIC is still used. Another common criterion is the AIC (Akaike Information Criterion):

    AIC(M_j) = -2 log L_j(hat(theta)_j) + 2 * (# parameters in M_j)

Model-based clustering: summary
1. Choose a type of mixture model (e.g., multivariate normal) and a maximum number of clusters, K.
2. Use a specialized hierarchical clustering technique: model-based hierarchical agglomeration.
3. Use the clusters from the previous step to initialize EM for the mixture model.
4. Use BIC to compare different mixture models and models with different numbers of clusters.

The Iris data
[Figures: the Iris data; best model: equal shape, 2 components.]
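As a toy illustration of the BIC formula, here is a comparison of two hypothetical nested Gaussian models (the data and models are made up purely to show the penalty at work, not taken from the slides):

```python
import numpy as np

def gauss_loglik(x, mu, sigma2):
    # log-likelihood of x under N(mu, sigma2)
    n = len(x)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - ((x - mu) ** 2).sum() / (2 * sigma2)

def bic(loglik_hat, n_params, n):
    # BIC(M) = -2 log L_hat + (log n) * (# parameters in M)
    return -2 * loglik_hat + np.log(n) * n_params

rng = np.random.default_rng(1)
x = rng.normal(5.0, 1.0, 200)
n = len(x)

# M0: mean fixed at 0, only sigma^2 fitted (1 parameter)
bic0 = bic(gauss_loglik(x, 0.0, (x ** 2).mean()), 1, n)
# M1: both mean and sigma^2 fitted (2 parameters)
bic1 = bic(gauss_loglik(x, x.mean(), x.var()), 2, n)
```

Here bic1 < bic0: the extra parameter costs log n, but the gain in log-likelihood dwarfs it, so BIC correctly prefers the richer model.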
Part III: Outliers

Concepts

What is an outlier?
The set of data points that are considerably different from the remainder of the data.

When do they appear in data mining tasks?
- Given a data matrix X, find all the cases x_i in X with anomaly/outlier scores greater than some threshold t, or the top-n outlier scores.
- Given a data matrix X containing mostly normal (but unlabeled) data points, and a test case x_new, compute an anomaly/outlier score of x_new with respect to X.

Applications: credit card fraud detection; network intrusion detection; misspecification of a model.
Issues
- How many outliers are there in the data?
- The method is unsupervised, similar to clustering, or to finding clusters with only one point in them.
- Usual assumption: there are considerably more normal observations than abnormal observations (outliers/anomalies) in the data.

General steps
- Build a profile of the normal behavior. The profile generally consists of summary statistics of this normal population.
- Use these summary statistics to detect anomalies, i.e., points whose characteristics are very far from the normal profile.
- General types of schemes involve a statistical model of "normal", with "far" measured in terms of likelihood. Other schemes based on distances can be quasi-motivated by such statistical techniques.

Statistical approach
Assume a parametric model describing the distribution of the data (e.g., a normal distribution), then apply a statistical test that depends on:
- the data distribution (e.g., normal);
- the parameters of the distribution (e.g., mean, variance);
- the number of expected outliers (confidence limit, alpha, or Type I error).

Grubbs' test
Suppose we have a sample of n numbers Z = {Z_1, ..., Z_n}, i.e., an n x 1 data matrix. Assuming the data are from a normal distribution, Grubbs' test uses the distribution of

    max_{1 <= i <= n} (Z_i - Zbar) / SD(Z)

to search for outlying large values.
Grubbs' test
- Lower-tail variant: min_{1 <= i <= n} (Z_i - Zbar) / SD(Z).
- Two-sided variant: max_{1 <= i <= n} |Z_i - Zbar| / SD(Z).

Having chosen a test statistic, we must determine a threshold for our decision rule. Often this is set via a hypothesis test to control the Type I error. For a large positive outlier, the threshold is based on choosing some acceptable Type I error alpha and finding c_alpha so that

    P_0( max_{1 <= i <= n} (Z_i - Zbar) / SD(Z) >= c_alpha ) <= alpha

where P_0 denotes the distribution of Z under the assumption that there are no outliers. If the Z_i are IID N(mu, sigma^2), it is generally possible to compute a decent approximation of this probability using Bonferroni.

The two-sided critical level has the form

    c_alpha = ((n - 1) / sqrt(n)) * sqrt( t_{alpha/(2n), n-2}^2 / (n - 2 + t_{alpha/(2n), n-2}^2) )

where t_{gamma, k} is the upper-tail quantile of the t distribution with k degrees of freedom, i.e., P(T_k >= t_{gamma, k}) = gamma. In R, you can use the functions pnorm, qnorm, pt, qt for these quantities.

Model-based techniques: linear regression with outliers
First build a model. Points which don't fit the model well are identified as outliers. For example, when a least squares regression model is appropriate, the residuals can be fed into Grubbs' test (or a Bonferroni variant).

[Figure: linear regression with outliers; residuals from the model can be fed into Grubbs' test.]
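The two-sided statistic and its critical level can be assembled with SciPy's t quantiles (scipy.stats.t.ppf plays the role of R's qt; the simulated data and function name are mine):

```python
import numpy as np
from scipy import stats

def grubbs_two_sided(z, alpha=0.05):
    n = len(z)
    # test statistic: max |Z_i - Zbar| / SD(Z)
    g = np.abs(z - z.mean()).max() / z.std(ddof=1)
    # upper-tail t quantile at level alpha/(2n) with n-2 df, squared
    t2 = stats.t.ppf(1 - alpha / (2 * n), n - 2) ** 2
    # critical level c_alpha from the formula above
    c = (n - 1) / np.sqrt(n) * np.sqrt(t2 / (n - 2 + t2))
    return g, c

rng = np.random.default_rng(2)
z = np.append(rng.normal(0, 1, 30), 8.0)  # 30 normal points plus one gross outlier
g, c = grubbs_two_sided(z)
```

With the planted 8-sigma point, g comfortably exceeds c, so the outlier is flagged at alpha = 0.05.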
Multivariate data
If the non-outlying data are assumed to be multivariate Gaussian, what is the analogue of Grubbs' statistic max_{1 <= i <= n} |Z_i - Zbar| / SD(Z)? Answer: use the Mahalanobis distance

    max_{1 <= i <= n} (Z_i - Zbar)^T Sigma^{-1} (Z_i - Zbar)

Each individual statistic has what looks like a Hotelling's T^2 distribution.

Likelihood approach
Assume the data are a mixture

    F = (1 - lambda) M + lambda A

where M is the distribution of most of the data and A is an outlier distribution, which could be uniform on a bounding box for the data. This is a mixture model; if M is parametric, then the EM algorithm fits naturally here. Any points assigned to A are outliers.
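A sketch of the Mahalanobis score on simulated data (variable names and the planted outlier are mine):

```python
import numpy as np

rng = np.random.default_rng(3)
# a correlated Gaussian cloud plus one point that violates the correlation
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], 200)
X = np.vstack([X, [6.0, -6.0]])

mu = X.mean(0)
Sinv = np.linalg.inv(np.cov(X, rowvar=False))
# d2[i] = (Z_i - Zbar)^T Sigma^{-1} (Z_i - Zbar)
d2 = np.einsum("ij,jk,ik->i", X - mu, Sinv, X - mu)
worst = int(d2.argmax())   # index of the most outlying point
```

The planted point (6, -6) is flagged even though each of its coordinates alone is only moderately extreme: the inverse covariance penalizes its disagreement with the positive correlation.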
Likelihood approach
Do we estimate lambda or fix it? The book describes an algorithm that tries to maximize the equivalent classification likelihood

    L(theta_M, theta_A; l) = (1 - lambda)^{#l_M} prod_{i in l_M} f_M(x_i; theta_M) * lambda^{#l_A} prod_{i in l_A} f_A(x_i; theta_A)

Algorithm: form iterative estimates (M_t, A_t) of the normal and outlying data points.
1. At each stage, try to move individual points from M_t to A_t.
2. Find (hat(theta)_M, hat(theta)_A) based on the new partition (if necessary).
3. If the increase in likelihood is large enough, call these the new sets (M_{t+1}, A_{t+1}).
4. Repeat until no further changes.

Nearest neighbour approach
There are many ways to define outliers:
- data points for which there are fewer than k neighbouring points within a distance epsilon;
- the n points whose distance to the k-th nearest neighbour is largest;
- the n points whose average distance to the first k nearest neighbours is largest.
Each of these methods depends on the choice of some parameters (k, n, epsilon), and it is difficult to choose these in a systematic way.

Density-based: the LOF approach
For each point x_i, compute a density estimate f_{x_i,k} using its k nearest neighbours N(x_i, k):

    f_{x_i,k} = ( sum_{y in N(x_i,k)} d(x_i, y) / #N(x_i, k) )^{-1}

The local outlier factor (LOF) of x_i is the average of the densities of its neighbours relative to its own density:

    LOF(x_i) = ( sum_{y in N(x_i,k)} f_{y,k} / #N(x_i, k) ) / f_{x_i,k}

Outliers are the points with the largest LOF values. In the nearest-neighbour approach, a point p_2 at the edge of a sparse cluster may not be considered an outlier, while the LOF approach finds both an isolated point p_1 and such a point p_2 as outliers.

[Figure: nearest neighbour vs. density-based detection, with points p_1 and p_2.]

Detection rate
- Let P(O) be the proportion of outliers or anomalies.
- Let P(D | O) be the probability of declaring an outlier when it truly is an outlier: this is the detection rate.
- Let P(D | O^c) be the probability of declaring an outlier when it is truly not an outlier.
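The density and LOF formulas above transcribe directly into code; a small, unoptimized sketch (k and the toy data are arbitrary choices of mine):

```python
import numpy as np

def knn_indices(D, i, k):
    # indices of the k nearest neighbours of point i (excluding i itself)
    return np.argsort(D[i])[1:k + 1]

def lof_scores(X, k=3):
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    n = len(X)
    # density estimate: inverse of the mean distance to the k nearest neighbours
    dens = np.array([1.0 / D[i, knn_indices(D, i, k)].mean() for i in range(n)])
    # LOF: average neighbour density divided by the point's own density
    return np.array([dens[knn_indices(D, i, k)].mean() / dens[i] for i in range(n)])

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), [[5.0, 5.0]]])
scores = lof_scores(X)
```

Points inside the cloud get LOF near 1 (their density matches their neighbours'), while the isolated point at (5, 5) gets a very large LOF.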
Bayesian detection rate
The Bayesian detection rate is

    P(O | D) = P(D | O) P(O) / ( P(D | O) P(O) + P(D | O^c) P(O^c) )

The false alarm rate or false discovery rate is

    P(O^c | D) = P(D | O^c) P(O^c) / ( P(D | O^c) P(O^c) + P(D | O) P(O) )
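Plugging hypothetical rates into the formula shows the base-rate effect (all numbers below are invented for illustration):

```python
def bayesian_detection_rate(p_o, p_d_given_o, p_d_given_not_o):
    # P(O | D) via Bayes' rule
    num = p_d_given_o * p_o
    return num / (num + p_d_given_not_o * (1 - p_o))

# e.g. 1% outliers, 95% detection rate, 5% false alarms on normal points
post = bayesian_detection_rate(0.01, 0.95, 0.05)
```

Even with a 95% detection rate, rare outliers (1%) and a 5% false-alarm rate give a posterior P(O | D) of only about 0.16: most declared outliers are actually normal points.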
Statistics 202: Data Mining. (c) Jonathan Taylor.
More informationUnsupervised Learning
Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised
More informationIntroduction to Mobile Robotics
Introduction to Mobile Robotics Clustering Wolfram Burgard Cyrill Stachniss Giorgio Grisetti Maren Bennewitz Christian Plagemann Clustering (1) Common technique for statistical data analysis (machine learning,
More informationK-means and Hierarchical Clustering
K-means and Hierarchical Clustering Xiaohui Xie University of California, Irvine K-means and Hierarchical Clustering p.1/18 Clustering Given n data points X = {x 1, x 2,, x n }. Clustering is the partitioning
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationCOMS 4771 Clustering. Nakul Verma
COMS 4771 Clustering Nakul Verma Supervised Learning Data: Supervised learning Assumption: there is a (relatively simple) function such that for most i Learning task: given n examples from the data, find
More informationFinding Clusters 1 / 60
Finding Clusters Types of Clustering Approaches: Linkage Based, e.g. Hierarchical Clustering Clustering by Partitioning, e.g. k-means Density Based Clustering, e.g. DBScan Grid Based Clustering 1 / 60
More informationClustering: K-means and Kernel K-means
Clustering: K-means and Kernel K-means Piyush Rai Machine Learning (CS771A) Aug 31, 2016 Machine Learning (CS771A) Clustering: K-means and Kernel K-means 1 Clustering Usually an unsupervised learning problem
More informationINF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering
INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,
More informationIBL and clustering. Relationship of IBL with CBR
IBL and clustering Distance based methods IBL and knn Clustering Distance based and hierarchical Probability-based Expectation Maximization (EM) Relationship of IBL with CBR + uses previously processed
More informationCOMP 551 Applied Machine Learning Lecture 13: Unsupervised learning
COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning Associate Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551
More informationUnsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning
Unsupervised Learning Clustering and the EM Algorithm Susanna Ricco Supervised Learning Given data in the form < x, y >, y is the target to learn. Good news: Easy to tell if our algorithm is giving the
More information10701 Machine Learning. Clustering
171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among
More informationClustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic
Clustering SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering is one of the fundamental and ubiquitous tasks in exploratory data analysis a first intuition about the
More informationECS 234: Data Analysis: Clustering ECS 234
: Data Analysis: Clustering What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed
More informationAn Introduction to Cluster Analysis. Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs
An Introduction to Cluster Analysis Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs zhaoxia@ics.uci.edu 1 What can you say about the figure? signal C 0.0 0.5 1.0 1500 subjects Two
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim,
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More informationCluster Analysis. Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX April 2008 April 2010
Cluster Analysis Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX 7575 April 008 April 010 Cluster Analysis, sometimes called data segmentation or customer segmentation,
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationSTATS306B STATS306B. Clustering. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010
STATS306B Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Outline K-means, K-medoids, EM algorithm choosing number of clusters: Gap test hierarchical clustering spectral
More informationSolution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013
Your Name: Your student id: Solution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013 Problem 1 [5+?]: Hypothesis Classes Problem 2 [8]: Losses and Risks Problem 3 [11]: Model Generation
More informationWorking with Unlabeled Data Clustering Analysis. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan
Working with Unlabeled Data Clustering Analysis Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan chanhl@mail.cgu.edu.tw Unsupervised learning Finding centers of similarity using
More informationWhat is machine learning?
Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship
More information3. Cluster analysis Overview
Université Laval Multivariate analysis - February 2006 1 3.1. Overview 3. Cluster analysis Clustering requires the recognition of discontinuous subsets in an environment that is sometimes discrete (as
More informationClustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search
Informal goal Clustering Given set of objects and measure of similarity between them, group similar objects together What mean by similar? What is good grouping? Computation time / quality tradeoff 1 2
More informationClustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York
Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework
More informationDS504/CS586: Big Data Analytics Big Data Clustering II
Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: AK 232 Fall 2016 More Discussions, Limitations v Center based clustering K-means BFR algorithm
More informationNotes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/10/2017)
1 Notes Reminder: HW2 Due Today by 11:59PM TA s note: Please provide a detailed ReadMe.txt file on how to run the program on the STDLINUX. If you installed/upgraded any package on STDLINUX, you should
More informationComputational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions
Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Thomas Giraud Simon Chabot October 12, 2013 Contents 1 Discriminant analysis 3 1.1 Main idea................................
More informationSpatial Outlier Detection
Spatial Outlier Detection Chang-Tien Lu Department of Computer Science Northern Virginia Center Virginia Tech Joint work with Dechang Chen, Yufeng Kou, Jiang Zhao 1 Spatial Outlier A spatial data point
More informationMultivariate Analysis
Multivariate Analysis Cluster Analysis Prof. Dr. Anselmo E de Oliveira anselmo.quimica.ufg.br anselmo.disciplinas@gmail.com Unsupervised Learning Cluster Analysis Natural grouping Patterns in the data
More information3. Cluster analysis Overview
Université Laval Analyse multivariable - mars-avril 2008 1 3.1. Overview 3. Cluster analysis Clustering requires the recognition of discontinuous subsets in an environment that is sometimes discrete (as
More informationINF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Erik Velldal & Stephan Oepen Language Technology Group (LTG) September 23, 2015 Agenda Last week Supervised vs unsupervised learning.
More informationBased on Raymond J. Mooney s slides
Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit
More informationUnderstanding Clustering Supervising the unsupervised
Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data
More information