Distance-based Methods: Drawbacks
1 Distance-based Methods: Drawbacks
- Hard to find clusters with irregular shapes
- Hard to specify the number of clusters
- Heuristic: a cluster must be dense
Jian Pei: CMPT 459/741 Clustering (3)
2 How to Find Irregular Clusters?
- Divide the whole space into many small areas
- The density of an area can be estimated
- Areas may or may not be exclusive
- A dense area is likely in a cluster
- Start from a dense area, traverse connected dense areas, and discover clusters of irregular shape
3 Directly Density Reachable
- Parameters
  - Eps: maximum radius of the neighborhood
  - MinPts: minimum number of points in an Eps-neighborhood of that point
- $N_{Eps}(p) = \{q \mid dist(p, q) \le Eps\}$
- Core object p: $|N_{Eps}(p)| \ge MinPts$
- A core object is in a dense area
- Point q is directly density-reachable from p iff $q \in N_{Eps}(p)$ and p is a core object
- (Figure: points p and q, with MinPts = 3 and Eps = 1 cm.)
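The definitions on this slide can be written down almost literally. The following is a minimal sketch, assuming points are coordinate tuples and Euclidean distance; the function names and the sample points are illustrative and do not come from the slides.

```python
from math import dist  # Euclidean distance, Python 3.8+


def eps_neighborhood(points, p, eps):
    """N_Eps(p): every point q with dist(p, q) <= eps (p itself is included)."""
    return [q for q in points if dist(p, q) <= eps]


def is_core(points, p, eps, min_pts):
    """p is a core object if its Eps-neighborhood holds at least MinPts points."""
    return len(eps_neighborhood(points, p, eps)) >= min_pts


def directly_density_reachable(points, q, p, eps, min_pts):
    """q is directly density-reachable from p iff p is core and q is in N_Eps(p)."""
    return is_core(points, p, eps, min_pts) and dist(p, q) <= eps


# Illustrative data: three nearby points and one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (3.0, 3.0)]
print(is_core(points, (0.0, 0.0), eps=1.0, min_pts=3))
print(directly_density_reachable(points, (0.5, 0.0), (0.0, 0.0), 1.0, 3))
```

Note that the relation is not symmetric: a border point can be directly density-reachable from a core point without the reverse holding.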
4 Density-Based Clustering
- Density-reachable
  - Given a chain of directly density-reachable steps $p_1 \to p_2, p_2 \to p_3, \ldots, p_{n-1} \to p_n$, the point $p_n$ is density-reachable from $p_1$
- Density-connected
  - If points p and q are both density-reachable from o, then p and q are density-connected
- (Figure: points p, q, $p_1$, and o.)
5 DBSCAN
- A cluster: a maximal set of density-connected points
- Discovers clusters of arbitrary shape in spatial databases with noise
- (Figure: outlier, border, and core points, with Eps = 1 cm and MinPts = 5.)
6 DBSCAN: the Algorithm
- Arbitrarily select a point p
- Retrieve all points density-reachable from p wrt Eps and MinPts
- If p is a core point, a cluster is formed
- If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database
- Continue the process until all of the points have been processed
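The steps above can be sketched as a short program. This is a simplified illustration, not the original implementation: it assumes Euclidean distance, uses a brute-force neighborhood query instead of a spatial index, and the names `dbscan` and `NOISE` are chosen here for the example.

```python
from math import dist

NOISE = -1  # label for points that end up in no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Brute-force Eps-neighborhood query (a spatial index would be faster).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not core; may be relabeled as a border point later
            continue
        labels[i] = cluster_id  # i is a core point, so a cluster is formed
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # only core points extend the cluster
                queue.extend(reachable)
        cluster_id += 1
    return labels


data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (50, 50)]
print(dbscan(data, eps=1.5, min_pts=3))
```

On this toy input the two squares form two clusters and the isolated point is labeled as noise.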
7 Challenges for DBSCAN
- Different clusters may have very different densities
- Clusters may be in hierarchies
8 OPTICS: A Cluster-ordering Method
- Idea: ordering points to identify the clustering structure
- Group points by density connectivity
  - Hierarchies of clusters
- Visualize clusters and the hierarchy
9 Ordering Points
- Points that are strongly density-connected should be close to one another
- Clusters that are density-connected should be close to one another and form a cluster of clusters
10 OPTICS: An Example
- (Figure: reachability-distance, possibly undefined, plotted against the cluster-order of the objects, with ε thresholds marked.)
11 DENCLUE: Using Density Functions
- DENsity-based CLUstEring
- Major features
  - Solid mathematical foundation
  - Good for data sets with large amounts of noise
  - Allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets
  - Significantly faster than existing algorithms (faster than DBSCAN by a factor of up to 45)
  - But needs a large number of parameters
12 DENCLUE: Techniques
- Use grid cells
  - Only keep grid cells actually containing data points
  - Manage cells in a tree-based access structure
- Influence function: describes the impact of a data point on its neighborhood
- Overall density of the data space is the sum of the influence functions of all data points
- Clustering by identifying density attractors
- Density attractor: a local maximum of the overall density function
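To make the influence function and density attractors concrete, here is a small sketch. It assumes a Gaussian influence function with bandwidth `sigma`, which the slide does not specify, and it climbs toward an attractor with a mean-shift style fixed-point update rather than the grid-based machinery listed above. All names are illustrative.

```python
from math import exp


def influence(x, y, sigma):
    """Assumed Gaussian influence of data point y at location x."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-d2 / (2 * sigma ** 2))


def density(x, data, sigma):
    """Overall density at x: the sum of the influences of all data points."""
    return sum(influence(x, y, sigma) for y in data)


def climb(x, data, sigma, steps=100, tol=1e-6):
    """Move x uphill toward a density attractor (a local maximum of density).

    Each step replaces x by the influence-weighted mean of the data, which
    does not decrease the Gaussian density. Assumes x starts close enough to
    the data that the weights do not all underflow to zero.
    """
    for _ in range(steps):
        weights = [influence(x, y, sigma) for y in data]
        total = sum(weights)
        new_x = tuple(
            sum(w * y[k] for w, y in zip(weights, data)) / total
            for k in range(len(x))
        )
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
        (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
print(climb((0.3, 0.3), data, sigma=0.5))
print(climb((4.8, 4.8), data, sigma=0.5))
```

Points whose climbs end at the same attractor would be grouped into the same cluster; the two starting points here end near different groups.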
13 Density Attractor
14 Center-defined and Arbitrary Clusters
15 A Shrinking-based Approach
- Difficulties of multi-dimensional clustering
  - Noise (outliers)
  - Clusters of various densities
  - Not well-defined shapes
- A novel preprocessing concept: shrinking
- A shrinking-based clustering approach
16 Intuition & Purpose
- For data points in a data set, what if we could make them move towards the centroid of the natural subgroup they belong to?
- Natural sparse subgroups become denser, and thus easier to detect
- Noise is further isolated
17 Inspiration
- Newton's Universal Law of Gravitation
- Any two objects exert a gravitational force of attraction on each other
- The direction of the force is along the line joining the objects
- The magnitude of the force is directly proportional to the product of the gravitational masses of the objects, and inversely proportional to the square of the distance between them
- $F_g = G \frac{m_1 m_2}{r^2}$
- G: universal gravitational constant, $G = 6.67 \times 10^{-11}\ \mathrm{N\,m^2/kg^2}$
18 The Concept of Shrinking
- A data preprocessing technique
- Aims to optimize the inner structure of real data sets
- Each data point is attracted by other data points and moves in the direction in which the attraction is the strongest
- Can be applied in different fields
19 Applying Shrinking to Clustering
- Shrink the natural sparse clusters to make them much denser, which facilitates the subsequent cluster-detecting process
- (Figure: multi-attribute hyperspace.)
20 Data Shrinking
- Each data point moves along the direction of the density gradient, and the data set shrinks towards the inside of the clusters
- Points are attracted by their neighbors and move to create denser clusters
- It proceeds iteratively, repeating until the data are stabilized or the number of iterations exceeds a threshold
21 Approximation & Simplification
- Problem: computing the mutual attraction of every pair of data points is too time-consuming, $O(n^2)$
- Solution
  - No Newton's constant G; $m_1$ and $m_2$ are set to unit
  - Only aggregate the gravitation surrounding each data point
  - Use grids to simplify the computation
22 Termination Condition
- Average movement of all points in the current iteration is less than a threshold
- The number of iterations exceeds a threshold
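The iteration and its two termination conditions can be illustrated with a deliberately simplified sketch. It is not the grid-based method from the slides: it assumes each point is pulled to the centroid of its neighbors within a fixed `radius` and uses a brute-force neighbor search. The function name and parameters are hypothetical.

```python
from math import dist


def shrink(points, radius, move_threshold=1e-3, max_iter=50):
    """Iteratively move each point to the centroid of its nearby points.

    Stops when the average movement in an iteration drops below
    move_threshold, or after max_iter iterations.
    """
    pts = [tuple(p) for p in points]
    for _ in range(max_iter):
        new_pts = []
        total_move = 0.0
        for p in pts:
            # Neighbors within the radius; p itself is always included.
            nbrs = [q for q in pts if dist(p, q) <= radius]
            centroid = tuple(sum(c) / len(nbrs) for c in zip(*nbrs))
            new_pts.append(centroid)
            total_move += dist(p, centroid)
        pts = new_pts
        if total_move / len(pts) < move_threshold:
            break  # the data are stabilized
    return pts


data = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
print(shrink(data, radius=2))
```

In this toy run the four nearby points collapse onto their common centroid while the isolated point does not move, which matches the intuition that clusters get denser and noise stays isolated.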
23 OPTICS on Pendigits Data
- (Figure: two panels, before data shrinking and after data shrinking.)
24 Biclustering
- Clustering both objects and attributes simultaneously
- Four requirements
  - Only a small set of objects in a cluster (bicluster)
  - A bicluster only involves a small number of attributes
  - An object may participate in multiple biclusters or no biclusters
  - An attribute may be involved in multiple biclusters or no biclusters
Jian Pei: Big Data Analytics -- Clustering
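25 Application Examples
- Recommender systems
  - Objects: users
  - Attributes: items
  - Values: user ratings
- Microarray data
  - Objects: genes
  - Attributes: samples
  - Values: expression levels
- Data matrix (rows: genes; columns: samples/conditions):
  $$\begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ w_{31} & w_{32} & \cdots & w_{3m} \\ \vdots & \vdots & & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}$$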
26 Biclusters with Constant Values
- (Figure: example matrix with rows a and columns $b_6$, $b_{12}$, $b_{36}$, $b_{99}$; panel: constant values on rows.)
27 Biclusters with Coherent Values
- Also known as pattern-based clusters
28 Biclusters with Coherent Evolutions
- Only up- or down-regulated changes over rows or columns
- (Figure: coherent evolutions on rows.)
29 Differences from Subspace Clustering
- Subspace clustering uses a global distance/similarity measure
- Pattern-based clustering looks at patterns
- A subspace cluster found with a globally defined similarity measure may not follow the same pattern
30 Objects Follow the Same Pattern?
- pScore
- (Figure: object blue and object green on attributes $D_1$ and $D_2$.)
- The smaller the pScore, the more consistent the objects
31 Pattern-based Clusters
- pScore: the similarity between two objects $r_x, r_y$ on two attributes $a_u, a_v$
  $$pscore\begin{pmatrix} r_x.a_u & r_x.a_v \\ r_y.a_u & r_y.a_v \end{pmatrix} = |(r_x.a_u - r_y.a_u) - (r_x.a_v - r_y.a_v)|$$
- δ-pCluster (R, D): for any objects $r_x, r_y \in R$ and any attributes $a_u, a_v \in D$,
  $$pscore\begin{pmatrix} r_x.a_u & r_x.a_v \\ r_y.a_u & r_y.a_v \end{pmatrix} \le \delta \quad (\delta \ge 0)$$
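The pScore and the δ-pCluster condition translate directly into code. The sketch below assumes the data are held in a list of rows (objects) indexed by column (attribute); the function names and the sample matrix are illustrative only.

```python
from itertools import combinations


def pscore(x_u, x_v, y_u, y_v):
    """pScore of objects x, y on attributes u, v: |(x.u - y.u) - (x.v - y.v)|."""
    return abs((x_u - y_u) - (x_v - y_v))


def is_delta_pcluster(matrix, rows, cols, delta):
    """Check that every pair of objects and every pair of attributes has pScore <= delta."""
    for rx, ry in combinations(rows, 2):
        for au, av in combinations(cols, 2):
            score = pscore(matrix[rx][au], matrix[rx][av],
                           matrix[ry][au], matrix[ry][av])
            if score > delta:
                return False
    return True


# Rows 0 and 1 follow the same pattern (row 1 is row 0 shifted by 10).
matrix = [
    [1, 2, 3],
    [11, 12, 13],
    [5, 9, 2],
]
print(is_delta_pcluster(matrix, rows=[0, 1], cols=[0, 1, 2], delta=0))
print(is_delta_pcluster(matrix, rows=[0, 1, 2], cols=[0, 1, 2], delta=2))
```

The first check succeeds because a pure shift leaves every pScore at 0; adding the third row breaks the pattern.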
32 Maximal pCluster
- If (R, D) is a δ-pCluster, then every subcluster (R', D') is a δ-pCluster, where $R' \subseteq R$ and $D' \subseteq D$
  - An anti-monotonic property
  - A large pCluster is accompanied by many small pClusters! Inefficacious
- Idea: mine only the maximal pClusters!
  - A δ-pCluster is maximal if no proper super-cluster of it is a δ-pCluster
33 Mining Maximal pClusters
- Given
  - A cluster threshold δ
  - An attribute threshold $min_a$
  - An object threshold $min_o$
- Task: mine the complete set of significant maximal δ-pClusters
  - A significant δ-pCluster has at least $min_o$ objects on at least $min_a$ attributes
34 pClusters and Frequent Itemsets
- A transaction database can be modeled as a binary matrix
- Frequent itemset: a sub-matrix of all 1's
  - 0-pCluster on binary data
  - $min_o$: support threshold
  - $min_a$: no fewer than $min_a$ attributes
  - Maximal pClusters correspond to closed itemsets
- Frequent itemset mining algorithms cannot be extended straightforwardly for mining pClusters on numeric data
35 Where Should We Start from?
- How about the pClusters having only 2 objects or 2 attributes?
  - MDS (maximal dimension set)
  - A pCluster must have at least 2 objects and 2 attributes
- Finding MDSs
- (Figure: table of objects x and y over attributes a through h, with a row for the difference x - y.)
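One way to find the maximal dimension sets of an object pair, suggested by the x - y row in the figure, is to sort the attributes by the difference x - y and slide a window over the sorted list. The sketch below follows that idea under stated assumptions: the attribute values are made up for illustration (the table from the slide is not recoverable), and `pairwise_mds` is a hypothetical helper name.

```python
def pairwise_mds(x, y, delta, min_a=2):
    """Maximal attribute sets on which objects x and y form a delta-pCluster.

    x and y map attribute names to values. On a set of attributes, the pair is
    a delta-pCluster exactly when the differences x - y span at most delta, so
    maximal sets are maximal windows in the sorted list of differences.
    """
    diffs = sorted((x[a] - y[a], a) for a in x)
    result = []
    end = 0
    prev_end = -1
    for start in range(len(diffs)):
        end = max(end, start)
        # Extend the window to the right while the spread stays within delta.
        while end + 1 < len(diffs) and diffs[end + 1][0] - diffs[start][0] <= delta:
            end += 1
        # The window is maximal only if it reaches past the previous window.
        if end > prev_end and end - start + 1 >= min_a:
            result.append(sorted(a for _, a in diffs[start:end + 1]))
        prev_end = end
    return result


# Made-up values for two objects over attributes a..h.
x = {"a": 13, "b": 11, "c": 9, "d": 7, "e": 9, "f": 13, "g": 2, "h": 15}
y = {"a": 7, "b": 4, "c": 10, "d": 1, "e": 12, "f": 3, "g": 4, "h": 7}
print(pairwise_mds(x, y, delta=2))
```

Sorting dominates the cost, so each object pair is handled in O(m log m) time for m attributes.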
36 How to Assemble Larger pClusters?
- Systematically enumerate every combination of attributes D
  - For each attribute subset, find the maximal subsets of objects R s.t. (R, D) is a pCluster
  - Check whether (R, D) is maximal
- Prune search branches as early as possible
- Why attribute-first-object-later?
  - # of objects >> # of attributes
- Algorithm MaPle (Pei et al., 2003)
37 More Pruning Techniques
- Only possible attributes should be considered to get larger pClusters
- Pruning local maximal pClusters having insufficient possible attributes
- Extracting common attributes from the possible attribute set directly
- Prune non-maximal pClusters
38 Gene-Sample-Time Series Data
- (Figure: a three-dimensional data array with gene, sample, and time axes; its slices are the sample-time matrix, the gene-sample matrix, and the gene-time matrix; each cell holds the expression level of gene i on sample j at time k.)
39 Mining GST Microarray Data
- Reduce the gene-sample-time series data to gene-sample data
  - Use Pearson's correlation coefficient as the coherence measure
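Pearson's correlation coefficient itself is standard; how it is applied to the time dimension is not spelled out on the slide. The sketch below only shows the coefficient, applied to two hypothetical expression time series, as one plausible reading of "coherence".

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical expression levels of one gene over four time points on two samples.
sample_1 = [1.0, 2.0, 3.0, 4.0]
sample_2 = [2.0, 4.0, 6.0, 8.0]
print(pearson(sample_1, sample_2))
```

Because the coefficient is invariant to shifting and positive scaling, two series with the same shape but different magnitudes still count as coherent.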
40 Basic Approaches
- Sample-gene search
  - Enumerate the subsets of samples systematically
  - For each subset of samples, find the genes that are coherent on the samples
- Gene-sample search
  - Enumerate the subsets of genes systematically
  - For each subset of genes, find the samples on which the genes are coherent
41 Basic Tools
- Set enumeration tree
- Sample-gene search and gene-sample search are not symmetric!
  - Many genes, but a few samples
  - No requirement on samples coherent on genes
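A set enumeration tree lists every subset exactly once by letting each node extend its set only with items that come later in a fixed order. The generator below is a minimal sketch of a depth-first walk over such a tree; the function name and the sample identifiers are illustrative.

```python
def enumerate_subsets(items, prefix=()):
    """Depth-first walk of the set enumeration tree over items.

    Each node is a subset (as a tuple); its children add one item that comes
    after the last item already chosen, so no subset is generated twice.
    """
    yield prefix
    for i, item in enumerate(items):
        yield from enumerate_subsets(items[i + 1:], prefix + (item,))


samples = ("s1", "s2", "s3")
for subset in enumerate_subsets(samples):
    print(subset)
```

In a real search, a branch would be pruned as soon as its subset can no longer lead to a coherent result, instead of being expanded in full as here.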
42 Phenotypes and Informative Genes
- (Figure: expression matrix over samples; informative genes: gene 1 to gene 4; non-informative genes: gene 5 to gene 7.)
43 The Phenotype Mining Problem
- Input: a microarray matrix and k
- Output: phenotypes and informative genes
  - Partitioning the samples into k exclusive subsets (phenotypes)
  - Informative genes discriminating the phenotypes
- Machine learning methods
  - Heuristic search
  - Mutual reinforcing adjustment
44 Requirements
- The expression levels of each informative gene should be similar over the samples within each phenotype
- The expression levels of each informative gene should display a clear dissimilarity between each pair of phenotypes
45 To-Do List
- Read Chapters 10.4 and 11.2
- Assignment 3
Biclustering with δ-pcluster John Tantalo 1. Introduction The subject of biclustering is chiefly concerned with locating submatrices of gene expression data that exhibit shared trends between genes. That
More informationClustering Algorithms for Spatial Databases: A Survey
Clustering Algorithms for Spatial Databases: A Survey Erica Kolatch Department of Computer Science University of Maryland, College Park CMSC 725 3/25/01 kolatch@cs.umd.edu 1. Introduction Spatial Database
More informationDBRS: A Density-Based Spatial Clustering Method with Random Sampling. Xin Wang and Howard J. Hamilton Technical Report CS
DBRS: A Density-Based Spatial Clustering Method with Random Sampling Xin Wang and Howard J. Hamilton Technical Report CS-2003-13 November, 2003 Copyright 2003, Xin Wang and Howard J. Hamilton Department
More informationCSE 347/447: DATA MINING
CSE 347/447: DATA MINING Lecture 6: Clustering II W. Teal Lehigh University CSE 347/447, Fall 2016 Hierarchical Clustering Definition Produces a set of nested clusters organized as a hierarchical tree
More informationExploratory data analysis for microarrays
Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA
More informationEfficient Parallel DBSCAN algorithms for Bigdata using MapReduce
Efficient Parallel DBSCAN algorithms for Bigdata using MapReduce Thesis submitted in partial fulfillment of the requirements for the award of degree of Master of Engineering in Software Engineering Submitted
More informationSponsored by AIAT.or.th and KINDML, SIIT
CC: BY NC ND Table of Contents Chapter 4. Clustering and Association Analysis... 171 4.1. Cluster Analysis or Clustering... 171 4.1.1. Distance and similarity measurement... 173 4.1.2. Clustering Methods...
More informationDATA MINING - 1DL105, 1Dl111. An introductory class in data mining
1 DATA MINING - 1DL105, 1Dl111 Fall 007 An introductory class in data mining http://user.it.uu.se/~udbl/dm-ht007/ alt. http://www.it.uu.se/edu/course/homepage/infoutv/ht07 Kjell Orsborn Uppsala Database
More informationMathematical Morphology and Distance Transforms. Robin Strand
Mathematical Morphology and Distance Transforms Robin Strand robin.strand@it.uu.se Morphology Form and structure Mathematical framework used for: Pre-processing Noise filtering, shape simplification,...
More informationThe Parameter-less Randomized Gravitational Clustering algorithm with online clusters structure characterization
Prog Artif Intell (2014) 2:217 236 DOI 10.1007/s13748-014-0054-5 REGULAR PAPER The Parameter-less Randomized Gravitational Clustering algorithm with online clusters structure characterization Jonatan Gomez
More informationA Survey on DBSCAN Algorithm To Detect Cluster With Varied Density.
A Survey on DBSCAN Algorithm To Detect Cluster With Varied Density. Amey K. Redkar, Prof. S.R. Todmal Abstract Density -based clustering methods are one of the important category of clustering methods