Clustering and Dimensionality Reduction. Stony Brook University CSE545, Fall 2017

Goal: Generalize to new data. (Diagram: Original Data → Model → New Data?) Does the model accurately reflect new data?

Supervised vs. Unsupervised Supervised Predicting an outcome: Loss function used to characterize quality of prediction

Supervised vs. Unsupervised Expected value of y (something we are trying to predict) based on X (our features or evidence for what y should be) Supervised Predicting an outcome: Loss function used to characterize quality of prediction

Supervised vs. Unsupervised. Supervised: Predicting an outcome; loss function used to characterize quality of prediction. Unsupervised: No outcome to predict. Goal: Infer properties of the data without a supervised loss function. Often larger data. Don't need to worry about conditioning on another variable.

Concept, In Matrix Form: a matrix whose rows o1, o2, o3, ..., on are the N observations and whose columns f1, f2, f3, f4, ..., fp are the p features.

Concept, In Matrix Form: observations o1, o2, o3, ..., on by features f1, f2, f3, f4, ..., fp.

Dimensionality reduction. Concept, In Matrix Form: map the observations-by-features matrix (columns f1, f2, f3, f4, ..., fp) to a smaller matrix over the same observations o1, o2, o3, ..., on but with new columns c1, c2, c3, c4, ...; try to best represent the original data with fewer than p columns.

Clustering: Group observations based on the features (i.e. like reducing the number of observations into K groups). Concept, In Matrix Form: rows o1, o2, o3, ..., on of the observations-by-features matrix are grouped into Cluster 1, Cluster 2, Cluster 3, ...

Concept: in 2-d (clustering), each point is an observation. (Scatter plot over axes Feature 1 and Feature 2.)

Concept: in 2-d (clustering). (The same scatter plot over Feature 1 and Feature 2.)

Clustering. Typical formalization: Given: a set of points, a distance metric (Euclidean, cosine, etc.), and the number of clusters (not always provided). Do: Group observations together that are similar. Ideally, members of the same cluster are similar to each other, and members of different clusters are different. Keep in mind: usually many more than 2 dimensions.
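The distance metric is a free choice. Below is a minimal sketch of the two metrics named above (Euclidean and cosine) in numpy; the function names and example vectors are illustrative, not part of the slides.

import numpy as np

def euclidean(x, y):
    """Straight-line distance: square root of the summed squared feature differences."""
    return np.sqrt(np.sum((x - y) ** 2))

def cosine_distance(x, y):
    """1 - cosine similarity: depends only on the angle between the vectors, not their length."""
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

a, b = np.array([1.0, 2.0, 0.0]), np.array([2.0, 4.0, 0.0])
print(euclidean(a, b), cosine_distance(a, b))   # 2.236..., 0.0 (same direction)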

Clustering Often many dimensions and no clean separation.

Clustering Supposes observations have a true cluster. Often many dimensions and no clean separation.

K-Means Clustering. Clustering: Group similar observations, often over unlabeled data. K-means: A prototype method (i.e. not based on an algebraic model). Euclidean Distance: d(x, x') = sqrt( Σ_j (x_j − x'_j)^2 ).

K-Means Clustering. Clustering: Group similar observations, often over unlabeled data. K-means: A prototype method (i.e. not based on an algebraic model). Euclidean Distance: d(x, x') = sqrt( Σ_j (x_j − x'_j)^2 ). Algorithm: centers = a random selection of k cluster centers; until centers converge: 1. For all x_i, find the closest center (according to d). 2. Recalculate each center as the mean (in Euclidean space) of the points assigned to it.

K-Means Clustering. Clustering: Group similar observations, often over unlabeled data. K-means: A prototype method (i.e. not based on an algebraic model). Euclidean Distance: d(x, x') = sqrt( Σ_j (x_j − x'_j)^2 ). Algorithm: centers = a random selection of k cluster centers; until centers converge: 1. For all x_i, find the closest center (according to d). 2. Recalculate each center as the mean (in Euclidean space) of the points assigned to it. Example: http://shabal.in/visuals/kmeans/6.html
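A minimal numpy sketch of the loop above, assuming Euclidean distance; the function name, the random initialization, and the convergence test are illustrative choices, not the course's reference implementation.

import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """X: (n, p) array of observations; k: number of clusters."""
    rng = np.random.default_rng(seed)
    # Start from a random selection of k observations as cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # 1. For all x_i, find the closest center (Euclidean distance d).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2. Recalculate each center as the mean of the points assigned to it
        #    (keep the old center if a cluster ends up empty).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):   # centers converged
            break
        centers = new_centers
    return centers, labels

centers, labels = kmeans(np.random.randn(500, 2), k=3)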

K-Means Clustering Understanding K-Means (source: Scikit-Learn)

The Curse of Dimensionality. Problems with high-dimensional spaces: 1. All points (i.e. observations) are nearly equally far apart. 2. The angle between vectors is almost always 90 degrees (i.e. they are orthogonal).
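A quick numpy check of both claims (the sample sizes, dimensions, and random data below are arbitrary): as p grows, the ratio of the smallest to the largest pairwise distance drifts toward 1, and random vectors become nearly orthogonal.

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for p in (2, 10, 100, 1000):
    X = rng.random((200, p))                 # 200 random points in [0, 1]^p
    d = pdist(X)                             # all pairwise Euclidean distances
    V = rng.standard_normal((200, p))
    cos = 1.0 - pdist(V, metric="cosine")    # pairwise cosine similarities
    # min/max -> 1: points become nearly equally far apart.
    # mean |cos| -> 0: angles concentrate near 90 degrees.
    print(f"p={p:4d}  min/max distance = {d.min()/d.max():.2f}  mean |cos| = {np.abs(cos).mean():.2f}")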

Hierarchical Clustering. Concept, In Matrix Form: rows o1, o2, o3, ..., on of the observations-by-features matrix grouped into Cluster 1, Cluster 2, Cluster 3, Cluster 4.

Hierarchical Clustering. Concept, In Matrix Form: the same rows, with the clusters combined into higher-level clusters Cluster 5 and Cluster 6.

Hierarchical Clustering. Agglomerative (bottom up): initially, each point is a cluster; repeatedly combine the two nearest clusters into one. Divisive (top down): start with one cluster and recursively split it.

Hierarchical Clustering. Agglomerative (bottom up): initially, each point is a cluster; repeatedly combine the two nearest clusters into one. Divisive (top down): start with one cluster and recursively split it. Regular K-Means is "point assignment" clustering: maintain a set of clusters; points belong to the nearest cluster.

Hierarchical Clustering. Agglomerative (bottom up): initially, each point is a cluster; repeatedly combine the two nearest clusters into one.

Hierarchical Clustering. Agglomerative (bottom up): initially, each point is a cluster; repeatedly combine the two nearest clusters into one. Stop when reaching a threshold in: distance between points in a cluster, or maximum distance of points from the center, or maximum number of points.

Hierarchical Clustering. Agglomerative (bottom up): initially, each point is a cluster; repeatedly combine the two nearest clusters into one. Stop when reaching a threshold in: distance between points in a cluster, or maximum distance from the center, or maximum number of points. In Euclidean space.

Hierarchical Clustering. Agglomerative (bottom up): initially, each point is a cluster; repeatedly combine the two nearest clusters into one. But what if we have no centroid? (such as when using cosine distance)
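One way to sketch this in code is with scipy's hierarchical-clustering routines (an assumed library choice, not part of the slides): average linkage only needs pairwise distances, so it works with cosine distance and no centroid. The data and the distance threshold below are arbitrary placeholders.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))        # 30 observations, 5 features

# Agglomerative: every point starts as its own cluster; the two nearest
# clusters are merged repeatedly.  Average linkage compares clusters by the
# mean pairwise distance between their members, so no centroid is required.
Z = linkage(X, method="average", metric="cosine")

# Stop merging at a distance threshold to read off flat cluster labels.
labels = fcluster(Z, t=0.9, criterion="distance")
print(labels)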

Clustering: Applications

Clustering: Applications (musicmachinery.com)

Concept: Dimensionality Reduction in 3-D, 2-D, and 1-D. Data (or, at least, what we want from the data) may be accurately represented with fewer dimensions.

Dimensionality reduction. Concept, In Matrix Form: map the observations-by-features matrix (columns f1, f2, f3, f4, ..., fp) to a smaller matrix over the same observations o1, o2, o3, ..., on but with new columns c1, c2, c3, c4, ...; try to best represent the original data with fewer than p columns.

Dimensionality Reduction. Rank: Number of linearly independent columns of A (i.e. columns that can't be derived from the other columns through scaling and addition). Q: What is the rank of this matrix?
[ 1 -2  3 ]
[ 2 -3  5 ]
[ 1  1  0 ]

Dimensionality Reduction. Rank: Number of linearly independent columns of A (i.e. columns that can't be derived from the other columns). Q: What is the rank of this matrix?
[ 1 -2  3 ]
[ 2 -3  5 ]
[ 1  1  0 ]
A: 2. The 1st column is just the sum of the other two columns, so the matrix can be represented with linear combinations of 2 vectors: (1, 2, 1) and (−2, −3, 1).
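To check the answer numerically (a small aside, not on the slide), numpy can compute the rank directly:

import numpy as np

A = np.array([[1, -2, 3],
              [2, -3, 5],
              [1,  1, 0]])
print(np.linalg.matrix_rank(A))   # 2 -- column 1 equals column 2 + column 3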

Dimensionality Reduction - PCA. Linear approximations of the data in r dimensions, found via Singular Value Decomposition: X[nxp] = U[nxr] D[rxr] V[pxr]T. X: original matrix, D: singular values (diagonal), U: left singular vectors, V: right singular vectors.

Dimensionality Reduction - PCA. Linear approximations of the data in r dimensions, found via Singular Value Decomposition: X[nxp] = U[nxr] D[rxr] V[pxr]T. X: original matrix, D: singular values (diagonal), U: left singular vectors, V: right singular vectors. (Diagram: the shapes of the n x p matrix X and its factors.)

Dimensionality Reduction - PCA - Example X[nxp] = U[nxr] D[rxr] V[pxr]T Users to movies matrix

Dimensionality Reduction - PCA - Example X[nxp] = U[nxr] D[rxr] V[pxr]T

Dimensionality Reduction - PCA - Example X[mxn] = U[mxr] D[rxr] VT[nxr]

Dimensionality Reduction - PCA - Example. X[mxn] = U[mxr] D[rxr] VT[nxr]; V = (matrix shown on slide).

Dimensionality Reduction - PCA - Example. X[mxn] = U[mxr] D[rxr] VT[nxr]; (UD)T = (matrix shown on slide).

Dimensionality Reduction - PCA. Linear approximations of the data in r dimensions, found via Singular Value Decomposition: X[nxp] = U[nxr] D[rxr] V[pxr]T. X: original matrix, D: singular values (diagonal), U: left singular vectors, V: right singular vectors. Projection (dimensionality-reduced space) in 3 dimensions: U[nx3] D[3x3] V[px3]T. To reduce features in a new dataset: Xnew V = Xnew_small.
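A minimal numpy sketch of the decomposition and the two uses above (a rank-3 approximation, and reducing the features of new data with V). The random matrices are placeholders, and the column-centering step usually done before PCA is noted but left optional here.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))       # n = 100 observations, p = 10 features
# For PCA proper, center the columns first: X = X - X.mean(axis=0)

# Thin SVD: X = U @ diag(d) @ Vt, with d the singular values.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# Rank-3 approximation: U[n x 3] D[3 x 3] V[p x 3]^T
X_rank3 = U[:, :3] @ np.diag(d[:3]) @ Vt[:3, :]

# Reduce features in a new dataset: X_new V = X_new_small
X_new = rng.standard_normal((5, 10))
X_new_small = X_new @ V[:, :3]
print(X_rank3.shape, X_new_small.shape)  # (100, 10) (5, 3)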

Dimensionality Reduction - PCA. Linear approximations of the data in r dimensions, found via Singular Value Decomposition: X[nxp] = U[nxr] D[rxr] V[pxr]T. U, D, and V are unique; D's entries (the singular values) are always non-negative.

Dimensionality Reduction v. Clustering. Clustering: group n observations into k clusters. Soft Clustering: assign observations to k clusters with some weight or probability. Dimensionality Reduction: assign m features to p components with some weight or probability.

Dimensionality Reduction v. Clustering. Clustering: group n observations into k clusters. Soft Clustering: assign observations to k clusters with some weight or probability. Dimensionality Reduction: assign m features to p components with some weight or probability. Can often use one to do the other with one extra step. Examples. From Dimensionality Reduction to Clusters: use U (instead of V) from the SVD = mapping observations to soft clusters; project based on V, then apply a threshold on U = mapping observations to clusters; threshold V (or use sparse PCA) = soft clustering of features. From Clusters to Dimensionality Reduction: use soft cluster ids as features.
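Continuing the SVD sketch above, here is one possible reading of these bullets in numpy; the number of components and the argmax thresholding rule are illustrative choices, not prescribed by the slides.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
U, d, Vt = np.linalg.svd(X, full_matrices=False)

k = 3
soft = np.abs(U[:, :k])                  # observation-to-component weights (soft clusters)
hard = soft.argmax(axis=1)               # threshold/argmax on U: one hard cluster per observation
feature_soft = np.abs(Vt[:k]).T          # feature-to-component weights (soft feature clusters)

# From clusters back to dimensionality reduction: use the soft cluster
# weights themselves as a reduced (n x k) feature matrix.
X_reduced = soft
print(hard[:10], X_reduced.shape)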