Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering, National Central University. 2003/3/11
2 Introduction
Cluster analysis is the formal study of algorithms and methods for grouping data. It is a tool for exploring the structure of the data. Applications: a variety of engineering and scientific disciplines.
3 Applications of Cluster Analysis (1)
Biology, psychology, archeology, geology, marketing, information retrieval, remote sensing, etc.
4 Applications of Cluster Analysis (2)
Characterizing customer groups based on purchasing patterns. Categorizing Web documents. Grouping genes and proteins that have similar functionality. Grouping spatial locations prone to earthquakes based on seismological data. Feature extraction. Image segmentation.
5 Backgrounds
While it is easy to give a functional definition of a cluster, it is very difficult to give an operational one. Functionally, a cluster is a set of entities which are alike, and entities from different clusters are not alike. Alike at a global level or at a local level?
7 Data Representation (1)
8 Data Representation (2)
Pattern matrix: It can be viewed as an n x d matrix, where n and d represent the number of objects and features, respectively.
9 Data Representation (3)
Proximity matrix: It accumulates the pairwise indices of proximity in a matrix in which each row and column represents a pattern. Note: All proximity matrices are symmetric.
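The relation between the two representations can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `proximity_matrix` and the sample data are hypothetical, and Euclidean distance is assumed as the proximity index.

```python
import math

def proximity_matrix(patterns):
    """Build the n x n proximity matrix from an n x d pattern matrix.

    Assumption: the proximity index is the Euclidean distance.
    """
    n = len(patterns)
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            dist = math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(patterns[i], patterns[k])))
            # Fill both halves so the matrix is symmetric.
            prox[i][k] = dist
            prox[k][i] = dist
    return prox

# Hypothetical pattern matrix: n = 3 objects, d = 2 features.
pattern_matrix = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
prox = proximity_matrix(pattern_matrix)
for row in prox:
    print(row)
```

Each row of the pattern matrix is one object; entry (i, k) of the result is the dissimilarity between objects i and k, with zeros on the diagonal.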
10 Data Types and Scales (1)
Data type: the degree of quantization in the data. Binary: 0/1, yes/no. Discrete: a finite number of possible values. Continuous: a point on the real line.
11 Data Types and Scales (2)
Data scale: It indicates the relative significance of numbers. Qualitative (nominal and ordinal) scales: discrete numbers can be coded on these qualitative scales.
(1) A nominal scale is not really a scale at all because numbers are simply used as names. E.g., a (yes, no) response could be coded as (0, 1), (1, 0), or (50, 100).
(2) On an ordinal scale, the numbers have meaning only in relation to one another. E.g., (1, 2, 3), (10, 20, 30), and (100, 200, 300) are all equivalent from an ordinal viewpoint.
12 Data Types and Scales (3)
Quantitative (interval and ratio) scales: on an interval scale a unit of measurement exists; on a ratio scale an absolute zero exists along with a unit of measurement.
(1) Interval: The interpretation of the numbers depends on the unit. E.g., 90 degrees Fahrenheit vs. 90 degrees Celsius, or judged satisfaction.
(2) Ratio: The ratio between two numbers has meaning. E.g., the distance between two cities.
13 Proximity Indices
A proximity index between the ith and kth patterns is denoted d(i,k). The most common proximity index for patterns is the Minkowski metric, which measures dissimilarity (the upper limit d of the sum is the number of features):
$$d(i,k) = \left( \sum_{j=1}^{d} |x_{ij} - x_{kj}|^{r} \right)^{1/r}$$
14 Common Distance Metrics
Euclidean distance (r=2):
$$d(i,k) = \left( \sum_{j=1}^{d} (x_{ij} - x_{kj})^{2} \right)^{1/2} = \left[ (x_i - x_k)^{T} (x_i - x_k) \right]^{1/2}$$
Manhattan or city block distance (r=1):
$$d(i,k) = \sum_{j=1}^{d} |x_{ij} - x_{kj}|$$
Mahalanobis distance:
$$d(i,k) = (x_i - x_k)^{T} \, \Sigma^{-1} (x_i - x_k)$$
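As a small illustration of the Minkowski metric, the sketch below evaluates it for r=2 (Euclidean) and r=1 (Manhattan). The function name `minkowski` and the sample patterns are hypothetical; the Mahalanobis distance is left out because it additionally needs the inverse covariance matrix.

```python
def minkowski(x_i, x_k, r):
    """Minkowski distance of order r between two patterns of equal length."""
    return sum(abs(a - b) ** r for a, b in zip(x_i, x_k)) ** (1.0 / r)

# Hypothetical two-feature patterns.
x_i = [0.0, 0.0]
x_k = [3.0, 4.0]

euclidean = minkowski(x_i, x_k, 2)   # r = 2
manhattan = minkowski(x_i, x_k, 1)   # r = 1
print(euclidean, manhattan)
```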
15 Normalization (1)
Some normalization is usually employed based on the requirements of the analysis.
16 Normalization (2)
Zero mean and unit variance. With $x^{*}_{ij}$ denoting the raw data, the mean and variance of feature j are
$$m_j = \frac{1}{N} \sum_{i=1}^{N} x^{*}_{ij}, \qquad \sigma_j^{2} = \frac{1}{N} \sum_{i=1}^{N} (x^{*}_{ij} - m_j)^{2}$$
(1) Invariant to rigid displacements:
$$x_{ij} = x^{*}_{ij} - m_j$$
(2) All features have zero mean and unit variance:
$$x_{ij} = \frac{x^{*}_{ij} - m_j}{\sigma_j}$$
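The zero-mean, unit-variance normalization can be sketched as follows. The function name `standardize` and the data are hypothetical, the variance uses the 1/N convention, and the sketch assumes every feature has nonzero variance (a constant feature would cause a division by zero).

```python
import math

def standardize(patterns):
    """Rescale each feature (column) to zero mean and unit variance.

    Assumption: no feature is constant, so every sigma_j is nonzero.
    """
    n = len(patterns)
    d = len(patterns[0])
    means = [sum(row[j] for row in patterns) / n for j in range(d)]
    sigmas = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in patterns) / n)
              for j in range(d)]
    return [[(row[j] - means[j]) / sigmas[j] for j in range(d)]
            for row in patterns]

# Hypothetical raw data: the second feature is on a much larger scale.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = standardize(raw)
for row in normalized:
    print(row)
```

After the transformation both features contribute on the same scale to a distance such as the Euclidean distance, which is usually the reason for normalizing before clustering.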
17 Classification Types (1)
Clustering is a special kind of classification.
18 Classification Types (2)
Exclusive vs. nonexclusive: An exclusive classification assigns each object to exactly one subset, or cluster. A nonexclusive classification can assign an object to several classes.
Unsupervised vs. supervised: An unsupervised classification uses only the proximity matrix to perform the classification. A supervised classification uses category labels on the objects as well as the proximity matrix.
19 Classification Types (3)
Hierarchical vs. partitional: A hierarchical classification is a nested sequence of partitions, whereas a partitional classification is a single partition.
20 Hierarchical Clustering (1)
A picture of a hierarchical clustering is much easier for a human being to comprehend than a list of abstract symbols. A dendrogram is a special type of tree structure that provides a convenient picture of a hierarchical clustering. Two types: agglomerative and divisive.
Agglomerative: It starts with the disjoint clustering, which places each of the n objects in an individual cluster, and then merges clusters in a nested procedure.
Divisive: It performs the task in the reverse order.
21 Hierarchical Clustering (2)
Step 1: Assign each object to its own cluster.
Step 2: Compute the distances between all clusters.
Step 3: Merge the two clusters that are closest to each other.
Step 4: Return to step 2 until there is only one cluster left.
22 Hierarchical Clustering (3)
{X1}, {X2}, {X3}, {X4}, {X5}
{X1, X2}, {X3}, {X4}, {X5}
{X1, X2}, {X3, X4}, {X5}
{X1, X2, X3, X4}, {X5}
{X1, X2, X3, X4, X5}
Note: Cutting a dendrogram horizontally creates a clustering.
23 Hierarchical Clustering (4)
The single-linkage algorithm:
$$D_{SL}(C_i, C_j) = \min_{a \in C_i,\, b \in C_j} d(a, b)$$
The complete-linkage algorithm:
$$D_{CL}(C_i, C_j) = \max_{a \in C_i,\, b \in C_j} d(a, b)$$
The average-linkage algorithm (N_i and N_j are the numbers of objects in C_i and C_j):
$$D_{AL}(C_i, C_j) = \frac{1}{N_i N_j} \sum_{a \in C_i,\, b \in C_j} d(a, b)$$
24 Hierarchical Clustering (5)
The single-linkage algorithm allows clusters to grow long and thin. The complete-linkage algorithm produces more compact clusters. Both the single-linkage algorithm and the complete-linkage algorithm are susceptible to distortion by outliers or deviant observations. The average-linkage algorithm is an attempt to compromise between the extremes of the single-linkage algorithm and the complete-linkage algorithm.
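The agglomerative steps and the three linkage rules can be sketched together in plain Python. This is an illustrative brute-force version, not an efficient implementation: the names `agglomerate` and `cluster_distance` and the one-dimensional sample points are hypothetical, Euclidean distance is assumed for d(a, b), and ties are broken by taking the first closest pair found.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def cluster_distance(ci, cj, points, linkage):
    """Distance between two clusters (lists of point indices)."""
    dists = [euclidean(points[a], points[b]) for a in ci for b in cj]
    if linkage == "single":
        return min(dists)
    if linkage == "complete":
        return max(dists)
    if linkage == "average":
        return sum(dists) / len(dists)
    raise ValueError("unknown linkage: " + linkage)

def agglomerate(points, linkage="single"):
    """Return the nested sequence of partitions, from n clusters down to 1."""
    clusters = [[i] for i in range(len(points))]       # Step 1
    history = [[list(c) for c in clusters]]
    while len(clusters) > 1:                           # Step 4
        best = None
        for i in range(len(clusters)):                 # Step 2
            for j in range(i + 1, len(clusters)):
                dist = cluster_distance(clusters[i], clusters[j],
                                        points, linkage)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best                                 # Step 3
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
        history.append([list(c) for c in clusters])
    return history

# Hypothetical one-dimensional data: two tight pairs and one outlier.
points = [[0.0], [1.0], [5.0], [6.0], [20.0]]
history = agglomerate(points, linkage="single")
for partition in history:
    print(sorted(partition))
```

Each entry of `history` corresponds to one horizontal cut of the dendrogram, so picking an entry is the same as choosing the number of clusters.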
25 Hierarchical Clustering (6)
26 Partitional Clustering
Partitional: We generate a single partition of the data in an attempt to recover natural groups present in the data.
Basic idea: Simply select a criterion, evaluate it for all possible partitions containing K clusters, and pick the partition that optimizes the criterion.
Hierarchical techniques: biological, social, and behavioral sciences, because of the need to construct taxonomies.
Partitional techniques: engineering applications where single partitions are important.
27 Algorithm for Iterative Partitional Clustering
Step 1. Select an initial partition with K clusters.
Step 2. Generate a new partition by assigning each pattern to its closest cluster center.
Step 3. Compute new cluster centers as the centroids of the clusters.
Step 4. Repeat steps 2 and 3 until an optimum value of the criterion function is found.
Step 5. Adjust the number of clusters by merging and splitting existing clusters or by removing small, or outlier, clusters.
28 The K-Means Algorithm (1)
Step 1: Choose K initial cluster centers $c_1(1), \ldots, c_K(1)$.
Step 2: At the kth iterative step, distribute the samples among the K cluster domains, using the relation
$$x \in S_j(k) \quad \text{if} \quad \|x - c_j(k)\| < \|x - c_i(k)\| \quad \text{for all } i \neq j$$
Step 3: Compute the new cluster centers
$$c_j(k+1) = \frac{1}{N_j} \sum_{x \in S_j(k)} x, \qquad j = 1, \ldots, K$$
where $N_j$ is the number of samples in $S_j(k)$.
Step 4: If $c_j(k+1) = c_j(k)$ for $j = 1, \ldots, K$, the algorithm has converged and the procedure is terminated. Otherwise go to Step 2.
29 The K-Means Algorithm (2)
Seed patterns can be the first K patterns or K randomly chosen data points. Different initial partitions can lead to different final clustering results. If the clustering results using several different initial partitions all lead to the same final partition, we have some confidence in the result. The Euclidean distance can be replaced by the Mahalanobis distance.
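The four K-means steps can be sketched in plain Python as follows. The function name `kmeans` and the data are hypothetical. Two choices below are assumptions that the slides do not specify: a sample equidistant from two centers goes to the lower-numbered one, and a cluster that ends up empty keeps its previous center.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(samples, initial_centers, max_iter=100):
    """Return (centers, labels) after iterating Steps 2-4."""
    centers = [list(c) for c in initial_centers]        # Step 1
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every sample to its closest center.
        labels = [min(range(len(centers)),
                      key=lambda j: squared_distance(x, centers[j]))
                  for x in samples]
        # Step 3: recompute each center as the mean of its domain.
        new_centers = []
        for j, old_center in enumerate(centers):
            domain = [x for x, lab in zip(samples, labels) if lab == j]
            if domain:
                new_centers.append([sum(col) / len(domain)
                                    for col in zip(*domain)])
            else:
                new_centers.append(old_center)          # assumption: keep it
        # Step 4: stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical one-dimensional data; seeds are the first K = 2 patterns.
samples = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
centers, labels = kmeans(samples, initial_centers=samples[:2])
print(centers, labels)
```

Running the function from several different seeds and comparing the resulting partitions is one way to act on the advice about initial partitions.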
30 The K-Means Algorithm (3)
31 The K-Means Algorithm (4)
32 The K-Means Algorithm (5)
33 Nearest-Neighbor Clustering Algorithm (1)
Step 1: Set i=1 and k=1. Assign pattern $x_1$ to cluster $C_1$.
Step 2: Set i=i+1. Find the nearest neighbor of $x_i$ among the patterns already assigned to clusters. Let $d_m$ denote the distance from $x_i$ to its nearest neighbor. Suppose that the nearest neighbor is in cluster $C_m$.
Step 3: If $d_m \le t$ (a prespecified threshold), then assign $x_i$ to $C_m$. Otherwise, set k=k+1 and assign $x_i$ to a new cluster $C_k$.
Step 4: If every pattern has been assigned to a cluster, stop. Else, go to step 2.
Note: The number of clusters generated, K, is a function of the parameter t. As the value of t increases, fewer clusters are generated.
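A minimal Python sketch of these steps is given below. The function name `nearest_neighbor_clustering` and the data are hypothetical, Euclidean distance is assumed, and cluster labels are zero-based, so label 0 plays the role of C_1.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def nearest_neighbor_clustering(patterns, t):
    """Return one cluster label per pattern, processed in the given order."""
    labels = [0]          # Step 1: the first pattern starts the first cluster
    k = 1                 # number of clusters created so far
    for i in range(1, len(patterns)):
        # Step 2: nearest neighbor among the patterns already assigned.
        d_m, m = min((euclidean(patterns[i], patterns[p]), labels[p])
                     for p in range(i))
        # Step 3: join that neighbor's cluster or open a new one.
        if d_m <= t:
            labels.append(m)
        else:
            labels.append(k)
            k += 1
    return labels          # Step 4: the loop ends when all are assigned

# Hypothetical one-dimensional patterns.
patterns = [[0.0], [1.0], [5.0], [6.0], [2.0]]
for t in (0.5, 1.5, 10.0):
    labels = nearest_neighbor_clustering(patterns, t)
    print(t, labels, "clusters:", len(set(labels)))
```

The loop over thresholds shows the note in action: a larger t yields fewer clusters. The result also depends on the order in which the patterns are presented.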
34 Nearest-Neighbor Clustering Algorithm (2)
35 Nearest-Neighbor Clustering Algorithm (3)
36 Projections
Projection algorithms map a set of N n-dimensional patterns onto an m-dimensional space, where m<n. The main motivation for projection algorithms is to permit visual examination of multidimensional data, so that one can cluster by eye and qualitatively validate clustering results. Projection algorithms can be categorized into two types: linear and nonlinear.
37 Linear Projections (1)
$$y_i = H x_i \quad \text{for } i = 1, \ldots, N$$
Linear projection algorithms are relatively simple to use and have well-understood mathematical properties. Eigenvector projection (the Karhunen-Loeve method) is commonly used. The eigenvectors of the covariance matrix define a linear projection that replaces the features in the raw data with uncorrelated features.
38 Linear Projections (2)
Let $\Sigma$ denote the covariance matrix of the data, let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ denote the eigenvalues of $\Sigma$, and let $c_1, c_2, \ldots, c_d$ denote the corresponding eigenvectors (principal components), where
$$m = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad \Sigma = \frac{1}{N} \sum_{i=1}^{N} (x_i - m)(x_i - m)^{T}$$
39 Linear Projections (3)
Define the m x d transformation matrix H as
$$H = \begin{bmatrix} c_1^{T} \\ c_2^{T} \\ \vdots \\ c_m^{T} \end{bmatrix}$$
40 Linear Projections (4)
This matrix projects the pattern space into an m-dimensional subspace (m<d) whose axes are in the directions of the eigenvectors with the largest eigenvalues of $\Sigma$, as follows:
$$y_i = H x_i \quad \text{for } i = 1, \ldots, N$$
The covariance matrix in the new space becomes the diagonal matrix
$$\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$$
41 Linear Projections (5)
This implies that the m new features are uncorrelated. One could choose m so that
$$r_m = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{d} \lambda_i} \ge 0.95$$
which would assure that 95% of the variance is retained in the new space. Thus a good eigenvector projection is one that retains a large proportion of the variance present in the original feature space with only a few features in the transformed space.
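Choosing m from the retained-variance ratio can be sketched as below. The function name `choose_m` and the eigenvalues are hypothetical, and the sketch assumes the eigenvalues of the covariance matrix have already been computed by some eigen-solver and sum to a positive number.

```python
def choose_m(eigenvalues, retained=0.95):
    """Smallest m whose leading eigenvalues keep at least `retained` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)   # lambda_1 >= ... >= lambda_d
    total = sum(ordered)
    running = 0.0
    for m, lam in enumerate(ordered, start=1):
        running += lam
        if running / total >= retained:           # r_m >= threshold
            return m
    return len(ordered)

# Hypothetical eigenvalues of a 4 x 4 covariance matrix.
eigenvalues = [6.0, 3.0, 0.6, 0.4]
m = choose_m(eigenvalues)
print(m)
```

For these values r_1 = 0.60, r_2 = 0.90, and r_3 = 0.96, so three of the four features in the transformed space are enough to keep 95% of the variance.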
42 Linear Projections (6)
43 Linear Projections (7)
44 Linear Projections (8)
45 Linear Projections (9)
There is no guarantee that the features with the largest eigenvalues will be best for preserving the separation among categories.
46 Nonlinear Projections
The inability of linear projections to preserve complex data structures has made nonlinear projections more popular in recent years. Most nonlinear projection algorithms are based on maximizing or minimizing an objective function. Nonlinear projection algorithms are expensive to use, so several heuristics are employed to reduce the search time for the optimal solution. In exploratory data analysis, we seek two-dimensional projections to visually perceive the structure present in the data.
47 Sammon's Algorithm (1)
Sammon proposed a nonlinear technique that tries to create a two-dimensional configuration of points in which interpattern distances are preserved. Let $\{x_i\}$ denote a set of N n-dimensional patterns and let d(i,j) denote the distance between patterns $x_i$ and $x_j$ in the n-dimensional space. Let $\{y_i\}$ denote a set of N corresponding m-dimensional patterns to be found and let D(i,j) denote the distance between patterns $y_i$ and $y_j$ in the m-dimensional space.
48 Sammon's Algorithm (2)
Sammon suggested minimizing the error function E, called stress:
$$E = \frac{1}{\sum_{i<j} d(i,j)} \sum_{i<j} \frac{[d(i,j) - D(i,j)]^{2}}{d(i,j)}$$
Sammon's algorithm starts with a random configuration of N patterns in m dimensions and uses the method of steepest descent to reconfigure the patterns so as to minimize E in an iterative fashion. Because steepest descent can stop in a local minimum, the algorithm should be applied with several initial configurations to improve the chance of reaching the global minimum of E.
49 Sammon's Algorithm (3)
$$y_{ij}(t+1) = y_{ij}(t) - \alpha \frac{\partial E(t)}{\partial y_{ij}(t)} = y_{ij}(t) + \alpha \, \frac{2}{\lambda} \sum_{k=1,\, k \neq i}^{N} \left[ \frac{d(i,k) - D(i,k)}{d(i,k)\, D(i,k)} \right] (y_{ij} - y_{kj})$$
where
$$\lambda = \sum_{i<j} d(i,j)$$
Ref: N. R. Pal and V. K. Eluri, "Two efficient connectionist schemes for structure preserving dimensionality reduction," IEEE Trans. on Neural Networks, vol. 9, no. 6, 1998.
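The steepest-descent update can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: the function names, the fixed step size `alpha`, the iteration count, and the sample data are all hypothetical, the input patterns are assumed to be distinct (so every d(i,k) is positive), and a tiny floor on D(i,k) guards against division by zero.

```python
import math
import random

def pairwise_distances(points):
    n = len(points)
    return [[math.sqrt(sum((a - b) ** 2
                           for a, b in zip(points[i], points[k])))
             for k in range(n)] for i in range(n)]

def stress(d, y):
    """Sammon's error E for original distances d and configuration y."""
    D = pairwise_distances(y)
    n = len(y)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    lam = sum(d[i][j] for i, j in pairs)
    return sum((d[i][j] - D[i][j]) ** 2 / d[i][j] for i, j in pairs) / lam

def stress_gradient(d, y):
    """Partial derivatives of E with respect to every coordinate y[i][j]."""
    D = pairwise_distances(y)
    n, m = len(y), len(y[0])
    lam = sum(d[i][j] for i in range(n) for j in range(i + 1, n))
    grad = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if k == i:
                continue
            D_ik = max(D[i][k], 1e-12)        # guard against coincident points
            weight = (d[i][k] - D_ik) / (d[i][k] * D_ik)
            for j in range(m):
                grad[i][j] -= (2.0 / lam) * weight * (y[i][j] - y[k][j])
    return grad

def sammon(x, m=2, alpha=0.3, iterations=200, seed=0):
    """Steepest descent on E from a random m-dimensional configuration."""
    rng = random.Random(seed)
    d = pairwise_distances(x)
    y = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in x]
    for _ in range(iterations):
        grad = stress_gradient(d, y)
        y = [[y[i][j] - alpha * grad[i][j] for j in range(m)]
             for i in range(len(y))]
    return y

# Hypothetical data: four three-dimensional patterns mapped to two dimensions.
x = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = sammon(x)
print(stress(pairwise_distances(x), y))
```

Calling `sammon` with several values of `seed` and keeping the configuration with the lowest stress follows the advice to try several initial configurations.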
50 Sammon's Algorithm (4)
Figure: (a) iris data set; (b) 10-dimensional data set.
More informationHierarchical clustering
Hierarchical clustering Rebecca C. Steorts, Duke University STA 325, Chapter 10 ISL 1 / 63 Agenda K-means versus Hierarchical clustering Agglomerative vs divisive clustering Dendogram (tree) Hierarchical
More informationAdministration. Final Exam: Next Tuesday, 12/6 12:30, in class. HW 7: Due on Thursday 12/1. Final Projects:
Administration Final Exam: Next Tuesday, 12/6 12:30, in class. Material: Everything covered from the beginning of the semester Format: Similar to mid-term; closed books Review session on Thursday HW 7:
More informationClustering algorithms
Clustering algorithms Machine Learning Hamid Beigy Sharif University of Technology Fall 1393 Hamid Beigy (Sharif University of Technology) Clustering algorithms Fall 1393 1 / 22 Table of contents 1 Supervised
More informationUnsupervised Learning. Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team
Unsupervised Learning Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team Table of Contents 1)Clustering: Introduction and Basic Concepts 2)An Overview of Popular Clustering Methods 3)Other Unsupervised
More informationCLUSTER ANALYSIS. V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi
CLUSTER ANALYSIS V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi-110 012 In multivariate situation, the primary interest of the experimenter is to examine and understand the relationship amongst the
More informationDATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm
DATA MINING LECTURE 7 Hierarchical Clustering, DBSCAN The EM Algorithm CLUSTERING What is a Clustering? In general a grouping of objects such that the objects in a group (cluster) are similar (or related)
More informationClustering. Chapter 10 in Introduction to statistical learning
Clustering Chapter 10 in Introduction to statistical learning 16 14 12 10 8 6 4 2 0 2 4 6 8 10 12 14 1 Clustering ² Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 1990). ² What
More informationDimension reduction : PCA and Clustering
Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental
More informationMultivariate Analysis
Multivariate Analysis Cluster Analysis Prof. Dr. Anselmo E de Oliveira anselmo.quimica.ufg.br anselmo.disciplinas@gmail.com Unsupervised Learning Cluster Analysis Natural grouping Patterns in the data
More informationImage Analysis - Lecture 5
Texture Segmentation Clustering Review Image Analysis - Lecture 5 Texture and Segmentation Magnus Oskarsson Lecture 5 Texture Segmentation Clustering Review Contents Texture Textons Filter Banks Gabor
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany
More informationHierarchical Clustering
Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram A tree like diagram that records the sequences of merges or splits 0 0 0 00
More informationUnsupervised Learning
Unsupervised Learning Chapter 14: The Elements of Statistical Learning Presented for 540 by Len Tanaka Objectives Introduction Techniques: Association Rules Cluster Analysis Self-Organizing Maps Projective
More informationClustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search
Informal goal Clustering Given set of objects and measure of similarity between them, group similar objects together What mean by similar? What is good grouping? Computation time / quality tradeoff 1 2
More informationAn Unsupervised Technique for Statistical Data Analysis Using Data Mining
International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 5, Number 1 (2013), pp. 11-20 International Research Publication House http://www.irphouse.com An Unsupervised Technique
More informationCommunity Detection. Community
Community Detection Community In social sciences: Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group a.k.a. group,
More informationFinding Clusters 1 / 60
Finding Clusters Types of Clustering Approaches: Linkage Based, e.g. Hierarchical Clustering Clustering by Partitioning, e.g. k-means Density Based Clustering, e.g. DBScan Grid Based Clustering 1 / 60
More informationINF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering
INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,
More informationClustering. Pattern Recognition IX. Michal Haindl. Clustering. Outline
Clustering cluster - set of patterns whose inter-pattern distances are smaller than inter-pattern distances for patterns not in the same cluster a homogeneity and uniformity criterion no connectivity little
More informationStatistics 202: Data Mining. c Jonathan Taylor. Clustering Based in part on slides from textbook, slides of Susan Holmes.
Clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group will be similar (or
More informationMultivariate Methods
Multivariate Methods Cluster Analysis http://www.isrec.isb-sib.ch/~darlene/embnet/ Classification Historically, objects are classified into groups periodic table of the elements (chemistry) taxonomy (zoology,
More informationWhat is Clustering? Clustering. Characterizing Cluster Methods. Clusters. Cluster Validity. Basic Clustering Methodology
Clustering Unsupervised learning Generating classes Distance/similarity measures Agglomerative methods Divisive methods Data Clustering 1 What is Clustering? Form o unsupervised learning - no inormation
More informationUnsupervised Learning : Clustering
Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/28/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 4th, 2014 Wolf-Tilo Balke and José Pinto Institut für Informationssysteme Technische Universität Braunschweig The Cluster
More information