Hierarchical Clustering 4/5/17


Hypothesis Space. Continuous inputs; the output is a binary tree with the data points as leaves. Useful for explaining the training data, but not useful for making new predictions.

Direction of Clustering. Two basic algorithms for building the tree: Agglomerative (bottom-up): each point starts in its own cluster; repeatedly merge the two most-similar clusters until only one remains. Divisive (top-down): all points start in a single cluster; repeatedly split the data into the two most self-similar subsets.

Agglomerative Clustering. Each point starts in its own cluster. Repeatedly merge the two most-similar clusters until only one remains. How do we decide which clusters are most similar? We need a measure of similarity between points, and a measure of similarity between clusters of points.
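A minimal Python sketch of this merge loop, assuming numpy is available; the function names, the Euclidean point distance, and the single/complete-link options are illustrative choices, not from the slides:

```python
import numpy as np

def agglomerative(points, linkage="single"):
    """Naive agglomerative clustering: start with one cluster per point and
    repeatedly merge the closest pair until a single cluster remains.
    Returns the merge history as a list of (cluster, cluster) index tuples."""
    clusters = [[i] for i in range(len(points))]           # each point starts alone
    merges = []

    def cluster_dist(a, b):
        # all pairwise Euclidean distances between the two clusters' points
        d = [np.linalg.norm(points[i] - points[j]) for i in a for j in b]
        return min(d) if linkage == "single" else max(d)   # single vs. complete link

    while len(clusters) > 1:
        # greedily find the most-similar (closest) pair of clusters
        i, j = min(((i, j) for i in range(len(clusters))
                           for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        merges.append((tuple(clusters[i]), tuple(clusters[j])))
        clusters[i] = clusters[i] + clusters[j]             # merge cluster j into i
        del clusters[j]
    return merges

# Example: the two tight pairs of 2-D points merge first, then the pairs merge.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(agglomerative(X))
```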

Similarity Measures. Euclidean distance (2-norm). Other distance metrics: 1-norm, ∞-norm. Cosine similarity: the cosine of the angle between two vectors. Edit distance: the smallest number of mutations/deletions/insertions needed to transform one word into another (e.g. aaabbb → aabab requires 3 edits); good for genomes and text documents.
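For the edit-distance measure, here is a small dynamic-programming sketch (my own illustration, not the lecture's code), shown on the classic kitten/sitting pair:

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))                # cost of building prefixes of b from ""
    for i, ca in enumerate(a, start=1):
        curr = [i]                                # cost of deleting the first i chars of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match for free)
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```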

p-norm. $\|x\|_p = \left(\sum_{i=1}^{d} |x_i|^p\right)^{1/p}$. p = 1: Manhattan distance. p = 2: Euclidean distance. p = ∞: the largest distance in any dimension.
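A quick numpy illustration of these norms, plus the cosine similarity from the previous slide; the example vectors are arbitrary:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
y = np.array([0.0,  1.0, 1.0])
diff = x - y

print(np.linalg.norm(diff, 1))       # 1-norm (Manhattan): sum of absolute differences
print(np.linalg.norm(diff, 2))       # 2-norm (Euclidean)
print(np.linalg.norm(diff, np.inf))  # infinity-norm: largest difference in any one dimension

# Cosine similarity: cosine of the angle between the two vectors
print(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
```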

Cluster Similarity. If we've chosen a point-similarity measure, we still need to decide how to extend it to clusters. Distance between the closest points in each cluster (single link). Distance between the farthest points in each cluster (complete link). Average distance over all pairs of points (average link), or the distance between the cluster centroids (centroid link).
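The linkage options can be written as small functions over two clusters of points; this is a sketch assuming the clusters are numpy arrays, with illustrative names:

```python
import numpy as np

def pairwise_dists(A, B):
    """All Euclidean distances between points of cluster A and cluster B."""
    return np.array([[np.linalg.norm(a - b) for b in B] for a in A])

def single_link(A, B):    # distance between the closest pair of points
    return pairwise_dists(A, B).min()

def complete_link(A, B):  # distance between the farthest pair of points
    return pairwise_dists(A, B).max()

def average_link(A, B):   # average distance over all cross-cluster pairs
    return pairwise_dists(A, B).mean()

def centroid_link(A, B):  # distance between the two cluster centroids
    return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))
```

Plugging any of these in as the cluster-distance function of the merge loop above changes which clusters get merged first.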

Which clusters should be merged next? Under single link? Under complete link? Under average link? Use Euclidean distance.

Divisive Clustering. All points start in a single cluster. Repeatedly split the data into the two most self-similar subsets. How do we split the data into subsets? We need a subroutine for 2-clustering. Options include k-means and EM.
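A rough sketch of the top-down recursion, using a bare-bones 2-means as the splitting subroutine; the depth-based stopping rule and all names here are illustrative assumptions, not from the slides:

```python
import numpy as np

def two_means(X, iters=20, seed=0):
    """Split X into two groups with a basic k-means (k = 2)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin([[np.linalg.norm(x - c) for c in centers] for x in X], axis=1)
        # recompute centers (keep the old one if a group went empty)
        centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                            for k in range(2)])
    return labels

def divisive(X, depth=2):
    """Recursively split the data top-down, `depth` levels deep."""
    if depth == 0 or len(X) < 2:
        return [X]
    labels = two_means(X)
    left, right = X[labels == 0], X[labels == 1]
    if len(left) == 0 or len(right) == 0:      # degenerate split: stop here
        return [X]
    return divisive(left, depth - 1) + divisive(right, depth - 1)

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 5.1], [9.0, 0.0], [9.1, 0.2]])
for part in divisive(X, depth=2):
    print(part)
```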

Strengths and weaknesses of hierarchical clustering. Strengths: it creates easy-to-visualize output (dendrograms); we can pick which level of the hierarchy to use after the fact; it's often robust to outliers. Weaknesses: it's extremely slow (the basic agglomerative clustering algorithm is O(n^3), and divisive is even worse); each step is greedy, so the overall clustering may be far from optimal; it doesn't generalize well to new points; and it's bad for online applications, because adding new points requires recomputing from the start.

Growing Neural Gas. A different approach to unsupervised learning: adaptively learn a map of the data set. Start with a two-node graph, then repeatedly pick a random data point and: 1. Pull nodes of the graph closer to the data point. 2. Occasionally add new nodes and edges in the places where we had to adjust the graph a lot. 3. Discard nodes and edges that haven't been near any data points in a long time.

Growing Neural Gas Demo https://www.youtube.com/watch?v=1zydhqn6p4c

Growing Neural Gas Algorithm. Start with two random connected nodes, then repeat steps 1...9: 1. Pick a random data point. 2. Find the two closest nodes to the data point. 3. Increment the age of all edges from the closest node. 4. Add the squared distance to the error of the closest node. 5. Move the closest node and all of its neighbors toward the data point; move the closest node more than its neighbors. 6. Connect the two closest nodes, or reset their edge's age to zero. 7. Remove old edges; if a node becomes isolated, delete it. 8. Every λ iterations, add a new node between the highest-error node and its highest-error neighbor. 9. Decay all errors.
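A compact numpy sketch of this loop. The parameter names and defaults (eps_b, eps_n, age_max, lam, alpha, decay) and the omission of isolated-node deletion in step 7 are my simplifications, not the lecture's:

```python
import numpy as np

def gng(data, steps=5000, eps_b=0.05, eps_n=0.006,
        age_max=50, lam=100, alpha=0.5, decay=0.995, seed=0):
    """Growing Neural Gas sketch: returns node positions and the edge list."""
    rng = np.random.default_rng(seed)
    # Start with two random nodes (jittered copies of data points), joined by one edge.
    nodes = [data[rng.integers(len(data))].astype(float) +
             rng.normal(scale=1e-3, size=data.shape[1]) for _ in range(2)]
    error = [0.0, 0.0]
    edges = {(0, 1): 0}                                  # (i, j) with i < j  ->  age

    def neighbors(i):
        return [b if a == i else a for (a, b) in edges if i in (a, b)]

    for step in range(1, steps + 1):
        x = data[rng.integers(len(data))]                # 1. pick a random data point
        dists = [np.linalg.norm(x - w) for w in nodes]
        order = np.argsort(dists)
        s1, s2 = int(order[0]), int(order[1])            # 2. two closest nodes
        for j in neighbors(s1):                          # 3. age the winner's edges
            edges[(min(s1, j), max(s1, j))] += 1
        error[s1] += dists[s1] ** 2                      # 4. accumulate squared error
        nodes[s1] += eps_b * (x - nodes[s1])             # 5. move winner toward x ...
        for j in neighbors(s1):
            nodes[j] += eps_n * (x - nodes[j])           #    ... and its neighbors, less
        edges[(min(s1, s2), max(s1, s2))] = 0            # 6. connect / refresh the pair
        edges = {e: age for e, age in edges.items()      # 7. drop old edges
                 if age <= age_max}                      #    (node deletion omitted here)
        if step % lam == 0:                              # 8. periodically insert a node
            q = int(np.argmax(error))                    #    highest-error node
            nq = neighbors(q)
            if nq:
                f = max(nq, key=lambda j: error[j])      #    its highest-error neighbor
                nodes.append((nodes[q] + nodes[f]) / 2.0)
                r = len(nodes) - 1
                edges.pop((min(q, f), max(q, f)), None)
                edges[(min(q, r), max(q, r))] = 0
                edges[(min(f, r), max(f, r))] = 0
                error[q] *= alpha
                error[f] *= alpha
                error.append(error[q])
        error = [e * decay for e in error]               # 9. decay all errors
    return np.array(nodes), sorted(edges)

# Example: fit a map to ring-shaped 2-D data
theta = np.random.default_rng(1).uniform(0, 2 * np.pi, 1000)
ring = np.c_[np.cos(theta), np.sin(theta)]
nodes, edges = gng(ring)
print(len(nodes), "nodes,", len(edges), "edges")
```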

Adjusting nodes based on a data point

Adjusting nodes based on a data point. This node's error increases; this edge's age is set to zero; these edges' ages increase. If an edge's age grows too great, delete the edge.

Every λ iterations, add a new node between the highest-error node and its highest-error neighbor.

Consider the GNG hypothesis space. What does the output of the GNG look like? What unsupervised learning problem is growing neural gas solving? Is it clustering? Is it dimensionality reduction? Is it something else?