Big Data Analytics: Special Topics for Computer Science (CSE 4095-001 / CSE 5095-005), Feb 11. Fei Wang, Associate Professor, Department of Computer Science and Engineering, fei_wang@uconn.edu
Clustering II
Spectral Clustering Algorithms that cluster points using the eigenvectors of matrices derived from the data. They obtain a data representation in a low-dimensional space that can be easily clustered. There is a variety of methods that use the eigenvectors differently, and it can be difficult to understand why they work.
Graph Theory Basics A graph G = (V, E) consists of a vertex set V and an edge set E. If G is a directed graph, each edge is an ordered pair of vertices. A bipartite graph is one whose vertices can be divided into two groups so that all edges join vertices in different groups.
Similarity Graph As distance decreases, similarity increases. Represent the dataset as a weighted graph G(V, E): V = {x_i} is a set of n vertices representing the data points, and E = {W_ij} is a set of weighted edges indicating pairwise similarity between points. W_ij represents the similarity between vertices i and j; W_ij = 0 means the two points have no similarity, and W_ii = 0. [Figure: a 6-node example graph with edge weights such as 0.1, 0.6, and 0.7.]
Graph Partitioning Clustering can be viewed as partitioning a similarity graph. Bi-partitioning task: divide the vertices into two disjoint groups (A, B). [Figure: the example graph split into groups A = {x1, x2, x3} and B = {x4, x5, x6}.]
Partitioning Criterion Traditional definition of a good clustering: points assigned to the same cluster should be highly similar, and points assigned to different clusters should be highly dissimilar. [Figure: the example similarity graph.]
Graph Cut Express the partitioning objective as a function of the edge cut of the partition. Cut: the set of edges with exactly one vertex in each group; cut(A, B) is the total weight of those edges. We want to find the partition of the vertices with minimal cut between the groups. [Figure: the example graph with the cut edges separating A from B.]
Min-Cut Minimize the weight of the connections between the groups: min cut(A, B). Drawback: the minimum cut often isolates small groups of vertices rather than producing balanced clusters. [Figure: a graph where the minimum cut differs from the optimal cut.]
Normalized Cut Consider the connectivity between groups relative to the density of each group: normalize the association between groups by volume, Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B). Vol(A): the total weight of the edges originating from group A. Why use this criterion? Minimizing the normalized cut is equivalent to maximizing the normalized association, and it produces more balanced partitions.
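As a sketch, the cut and normalized cut of a candidate bipartition can be computed directly from the weight matrix. The weights below are my reading of the slides' 6-node running example, so treat the exact numbers as illustrative:

```python
import numpy as np

# 6-node running example (edge weights as reconstructed from the slides).
W = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])
A, B = [0, 1, 2], [3, 4, 5]     # candidate partition {x1,x2,x3} | {x4,x5,x6}

cut = W[np.ix_(A, B)].sum()     # weight of edges with one endpoint in each group
vol_A = W[A].sum()              # vol(A): total weight of edges originating from A
vol_B = W[B].sum()
ncut = cut / vol_A + cut / vol_B
```

With these weights, the cut is 0.3 while the volumes are 4.7 and 4.9, so the normalized cut is small, which is what makes this the good partition.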
Spectral Graph Theory Possible approach: represent the similarity graph as a matrix and apply knowledge from linear algebra. The eigenvalues and eigenvectors of a matrix provide global information about its structure. Spectral graph theory analyzes the spectrum of the matrix representing a graph. Spectrum: the eigenvectors of a graph, ordered by the magnitude (strength) of their corresponding eigenvalues.
Matrix Representation Similarity (adjacency) matrix W of the 6-node example:

      x1    x2    x3    x4    x5    x6
x1   0     0.8   0.6   0     0.1   0
x2   0.8   0     0.8   0     0     0
x3   0.6   0.8   0     0.2   0     0
x4   0     0     0.2   0     0.8   0.7
x5   0.1   0     0     0.8   0     0.8
x6   0     0     0     0.7   0.8   0

Degree matrix D (diagonal, D_ii = sum_j W_ij):

      x1    x2    x3    x4    x5    x6
x1   1.5   0     0     0     0     0
x2   0     1.6   0     0     0     0
x3   0     0     1.6   0     0     0
x4   0     0     0     1.7   0     0
x5   0     0     0     0     1.7   0
x6   0     0     0     0     0     1.5
Laplacian Matrix L = D - W:

      x1    x2    x3    x4    x5    x6
x1   1.5  -0.8  -0.6   0    -0.1   0
x2  -0.8   1.6  -0.8   0     0     0
x3  -0.6  -0.8   1.6  -0.2   0     0
x4   0     0    -0.2   1.7  -0.8  -0.7
x5  -0.1   0     0    -0.8   1.7  -0.8
x6   0     0     0    -0.7  -0.8   1.5
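A minimal sketch of building the Laplacian for the running example (weights as I read them off the slides) and checking its defining properties: rows sum to zero, it is symmetric, and it is positive semi-definite.

```python
import numpy as np

# Weight matrix of the 6-node running example (values reconstructed
# from the slides; treat them as illustrative).
W = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])
D = np.diag(W.sum(axis=1))   # degree matrix: D_ii = sum_j W_ij
L = D - W                    # unnormalized graph Laplacian
```

Because each row of L contains the degree on the diagonal and the negated weights off it, every row sums to zero, so the constant vector is always an eigenvector with eigenvalue 0.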
Normalized Laplacian L_sym = D^(-1/2) L D^(-1/2) = I - D^(-1/2) W D^(-1/2). Every diagonal entry equals 1, and each off-diagonal entry is -W_ij / sqrt(d_i d_j). For the running example:

      x1     x2     x3     x4     x5     x6
x1   1.00  -0.52  -0.39   0     -0.06   0
x2  -0.52   1.00  -0.50   0      0      0
x3  -0.39  -0.50   1.00  -0.12   0      0
x4   0      0     -0.12   1.00  -0.47  -0.44
x5  -0.06   0      0     -0.47   1.00  -0.50
x6   0      0      0     -0.44  -0.50   1.00
Spectral Clustering Three basic stages: 1. Pre-processing: construct a matrix representation of the dataset. 2. Decomposition: compute the eigenvalues and eigenvectors of the matrix, and map each point to a lower-dimensional representation based on one or more eigenvectors. 3. Grouping: assign points to two or more clusters based on the new representation.
Min-cut Decompose the example graph's Laplacian as L = X Λ X^T, with the eigenvalues in Λ in ascending order and the corresponding eigenvectors as the columns of X. The smallest eigenvalue is 0, and its eigenvector is constant. The second-smallest eigenvector takes approximately one value on {x1, x2, x3} and an opposite-signed value on {x4, x5, x6}; thresholding its components at zero recovers the min-cut bipartition.
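The sign structure of the second-smallest eigenvector (the Fiedler vector) can be checked numerically. A sketch on the reconstructed example weights:

```python
import numpy as np

# Running example (weights reconstructed from the slides).
W = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])
L = np.diag(W.sum(axis=1)) - W

vals, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
fiedler = vecs[:, 1]                  # eigenvector of the 2nd-smallest eigenvalue
labels = (fiedler > 0).astype(int)    # sign pattern gives the bipartition
```

The smallest eigenvalue comes out as 0 (the graph is connected), and the sign of the Fiedler vector separates {x1, x2, x3} from {x4, x5, x6}.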
k-way Partitioning Partition using only one eigenvector at a time, applying the procedure recursively. Example: image segmentation uses the 2nd smallest eigenvector to define the optimal cut, recursively generating two clusters with each cut.
k-way Partitioning Use k eigenvectors (k chosen by the user) and directly compute the k-way partitioning. This has been observed experimentally to work better.
Spectral Clustering with Data Vectors Given a set of data points: form the pairwise affinity matrix, construct the normalized Laplacian matrix, stack the k smallest eigenvectors as columns of a matrix, renormalize each row of that matrix to unit norm, and perform k-means on the rows.
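The whole pipeline can be sketched in a few lines of numpy. This is a minimal illustration, not a production implementation: the single global `sigma` is an assumption (the "magic sigma" issue below), and the tiny k-means with deterministic farthest-point initialization is my own stand-in for a real k-means routine.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=100):
    """Sketch of spectral clustering on data vectors (numpy only)."""
    n = len(X)
    # 1. Pairwise Gaussian affinities, zero diagonal.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # 2. Normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # 3. Stack the k smallest eigenvectors as columns.
    _, vecs = np.linalg.eigh(L)          # ascending eigenvalues
    U = vecs[:, :k]
    # 4. Renormalize each row to unit norm.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # 5. k-means on the rows (farthest-point init + Lloyd iterations).
    centers = [U[0]]
    for _ in range(1, k):
        d2 = ((U[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(1)
        centers.append(U[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels
```

On two well-separated blobs this recovers the two groups exactly; the interesting cases are the ones where plain k-means on the raw coordinates fails but the eigenvector embedding separates the clusters.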
Gaussian Affinity W_ij = exp(-||x_i - x_j||^2 / (2*sigma^2)) for i != j, and W_ii = 0, where sigma is a scaling parameter.
Magic Sigma The clustering result depends strongly on the choice of the scaling parameter. [Figure: clustering results for σ = 0.015625, σ = 0.041235, σ = 0.35355, and σ = 1.]
Local Scaling Instead of selecting a single scaling parameter σ, calculate a local scaling parameter σi for each data point, e.g. the distance from the point to its K-th nearest neighbor; the affinity then becomes W_ij = exp(-d^2(x_i, x_j) / (σi σj)).
[Figure 2: The effect of local scaling. (a) Input data points: a tight cluster within a background cluster. (b) The affinity between each point and its surrounding neighbors, indicated by the thickness of the connecting line; the affinities across clusters are larger than the affinities within the background cluster. (c) The corresponding affinities after local scaling: the affinities across clusters are now lower than the affinities within any single cluster.]
http://www.vision.caltech.edu/lihi/demos/selftuningclustering.html
How to determine k Eigengap: the difference between two consecutive eigenvalues. The most stable clustering is generally given by the value of k that maximizes the eigengap δ_k = |λ_{k+1} - λ_k|. [Figure: plot of the first 20 eigenvalues against k; a large gap after the first few eigenvalues suggests the number of clusters.]
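The eigengap heuristic is a one-liner over the sorted eigenvalues. `choose_k` is a hypothetical helper name; it assumes Laplacian eigenvalues sorted in ascending order:

```python
import numpy as np

def choose_k(eigvals, kmax=None):
    """Pick k maximizing the eigengap delta_k = lambda_{k+1} - lambda_k
    over ascending-sorted Laplacian eigenvalues (a common heuristic)."""
    lam = np.sort(eigvals)
    if kmax is None:
        kmax = len(lam) - 1
    gaps = np.diff(lam[:kmax + 1])   # gaps[k-1] = lambda_{k+1} - lambda_k
    return int(np.argmax(gaps)) + 1
```

For example, eigenvalues like (0, 0.01, 0.02, 1.5, 1.6) have their largest gap between the 3rd and 4th values, suggesting k = 3 clusters.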