COSC 6339 Big Data Analytics
Fuzzy Clustering
Some slides based on a lecture by Prof. Shishir Shah
Edgar Gabriel
Spring 2017

Clustering
Clustering is a technique for finding similarity groups, called clusters, in data: it places data instances that are similar to (near) each other in the same cluster and data instances that are very different (far away) from each other in different clusters.
Clustering is often called an unsupervised learning task because no class values denoting an a priori grouping of the data instances are given.

K-means algorithm
Given k, the k-means algorithm works as follows:
1) Randomly choose k data points (seeds) to be the initial centroids (cluster centers).
2) Assign each data point to the closest centroid.
3) Re-compute the centroids using the current cluster memberships.
4) If a convergence criterion is not met, go to 2).

Stopping/convergence criterion
1. no (or minimum) re-assignments of data points to different clusters,
2. no (or minimum) change of centroids, or
3. minimum decrease in the sum of squared error (SSE),
$SSE = \sum_{j=1}^{k} \sum_{x \in C_j} \mathrm{dist}(x, m_j)^2$
where $C_j$ is the jth cluster, $m_j$ is the centroid of cluster $C_j$ (the mean vector of all the data points in $C_j$), and $\mathrm{dist}(x, m_j)$ is the distance between data point x and centroid $m_j$.
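
The four steps and the SSE can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names, the use of squared Euclidean distance for dist, the fixed random seed and the choice of stopping criterion 1 (no re-assignments) are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, assignment, sse) for the given points."""
    rng = random.Random(seed)
    # 1) Randomly choose k data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # 2) Assign each data point to the closest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in points
        ]
        # 4) Stop when no point changes its cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # 3) Re-compute each centroid from its current members.
        for j in range(k):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    # Sum of squared errors of every point to its own centroid.
    sse = sum(squared_distance(x, centroids[a])
              for x, a in zip(points, assignment))
    return centroids, assignment, sse


# Hypothetical toy data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignment, sse = kmeans(points, k=2)
```

Cluster labels depend on which seeds are drawn, so only the grouping of the points, not the label numbers, is meaningful.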

Strengths of k-means
Simple: easy to understand and to implement.
Efficient: time complexity is O(tkn), where n is the number of data points, k is the number of clusters, and t is the number of iterations. Since both k and t are small, k-means is considered a linear algorithm.
K-means is the most popular clustering algorithm.
Note that it terminates at a local optimum if SSE is used. The global optimum is hard to find due to complexity.

Weaknesses of k-means
The algorithm is only applicable if the mean is defined.
  For categorical data, use k-modes: the centroid is represented by the most frequent values.
The user needs to specify k.
The algorithm is sensitive to outliers.
  Outliers are data points that are very far away from other data points.
  Outliers could be errors in the data recording or some special data points with very different values.

Weaknesses of k-means: Problems with outliers

Weaknesses of k-means: outliers
One method is to remove, during the clustering process, data points that are much farther away from the centroids than other data points.
  To be safe, we may want to monitor these possible outliers over a few iterations before deciding to remove them.
Another method is to perform random sampling. Since sampling chooses only a small subset of the data points, the chance of selecting an outlier is very small.
  Assign the rest of the data points to the clusters by distance or similarity comparison, or by classification.
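
The first method can be sketched as below. The slides do not say how much farther a point must be to count as an outlier, so the rule used here (distance greater than a factor times the median distance to the centroid) and the name flag_outliers are hypothetical choices for illustration only.

```python
import math
import statistics


def flag_outliers(points, centroid, factor=3.0):
    """Return the points whose distance to the centroid exceeds
    factor times the median distance (a hypothetical cutoff rule)."""
    distances = [math.dist(p, centroid) for p in points]
    cutoff = factor * statistics.median(distances)
    return [p for p, d in zip(points, distances) if d > cutoff]


# Hypothetical cluster: four nearby points and one far-away point.
cluster = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (10.0, 10.0)]
# Centroid computed with the far-away point still included.
centroid = tuple(sum(c) / len(cluster) for c in zip(*cluster))
suspects = flag_outliers(cluster, centroid)
```

In line with the slide, the flagged points would be monitored over a few iterations rather than dropped immediately.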

Weaknesses of k-means (cont.)
The algorithm is sensitive to initial seeds.

Weaknesses of k-means (cont.)
If we use different seeds: good results.
There are some methods to help choose good seeds.

Weaknesses of k-means (cont.)
The k-means algorithm is not suitable for discovering clusters that are not hyper-ellipsoids (or hyper-spheres).

Weaknesses of k-means (cont.)
Membership of a point in a single cluster is not always clear.
-> Fuzzy clustering can help with that.

Boolean Logic
In Boolean logic, an object either is a member of a set or is not, i.e. its membership function can be expressed as
$\mu_A(x) = \begin{cases} 1 & x \in A \\ 0 & x \notin A \end{cases}$
In Boolean logic, $\mu_{A \cap \sim A}(x) = 0$ and $\mu_{A \cup \sim A}(x) = 1$ for every x in the universe U.
A set is a collection of objects that share a common property.
A Boolean set is also referred to as a crisp set.

Fuzzy Logic
Logic based on continuous variables.
Provides the ability to represent intrinsic ambiguity.
Fuzzification: the process of finding the membership value of a (scalar) number in a fuzzy set.
Defuzzification: the process of converting the outcome of a fuzzy set to a single representative number.

Fuzzy Sets
Indicate that the membership function can take values other than just 0 and 1:
  0 indicates no membership
  1 indicates complete set membership
  values > 0 and < 1 indicate partial membership
Superset of Boolean logic.
A fuzzy set has three principal components:
  Degree of membership
  Possible domain values
  Membership function: a continuous function that connects a domain value to its degree of membership in the set

Fuzzy Numbers
Fuzzy number: a fuzzy set representing an approximation to a number.
[Figure: fuzzy number; y-axis: grade of membership m(x); labels: support set, domain]

Fuzzy number
[Figure: fuzzy number "About 20"; x-axis from 14 to 26; y-axis: grade of membership m(x); label: expectancy]
Expectancy e: degree of spread.
e = 0: normal scalar value.

Other fuzzy sets
[Figure: fuzzy set of tall men; x-axis: height in ft, 4.5 to 7.5; y-axis: grade of membership m(x)]
[Figure: fuzzy set for long project; x-axis: project duration in weeks, 4 to 16; y-axis: grade of membership m(x)]
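
Fuzzification of a scalar against a fuzzy number can be sketched as below. The triangular shape, the helper name about and the expectancy of 6 (read off the 14 to 26 axis range) are assumptions; the shape of the curve in the original figure is not recoverable from the text.

```python
def about(center, expectancy):
    """Build the membership function of the fuzzy number 'about center'.

    Assumes a triangular shape that reaches zero at center +/- expectancy.
    With expectancy 0 the set degenerates to a normal scalar value.
    """
    def membership(x):
        if expectancy == 0:
            return 1.0 if x == center else 0.0
        return max(0.0, 1.0 - abs(x - center) / expectancy)
    return membership


about_20 = about(20, 6)       # fuzzy number "about 20"
grade = about_20(17)          # fuzzification of the scalar 17
exactly_20 = about(20, 0)     # e = 0: crisp scalar 20
```

A wider expectancy makes the set more tolerant: the same value 17 gets a higher grade of membership under about(20, 12) than under about(20, 6).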

Collection of Fuzzy Sets
[Figure: fuzzy sets Child, Teen, Young adult, Middle aged and Senior; x-axis: client age (in years), 10 to 70; y-axis: grade of membership m(x)]
Each underlying fuzzy set defines a portion of the variable's domain.
A portion is not necessarily uniquely defined.

Hedges: Fuzzy set transformers
A hedge acts on a fuzzy set the same way an adjective acts on a noun:
  Increase or decrease the expectancy of a fuzzy number
  Intensify or dilute the membership of a fuzzy set
  Change the shape of a fuzzy set through contrast or restriction

Hedge          Mathematical Expression
A little       $[\mu_A(x)]^{1.3}$
Slightly       $[\mu_A(x)]^{1.7}$
Very           $[\mu_A(x)]^{2}$
Extremely      $[\mu_A(x)]^{3}$
Very very      $[\mu_A(x)]^{4}$
More or less   $\sqrt{\mu_A(x)}$
Somewhat       $\sqrt{\mu_A(x)}$
Indeed         $2[\mu_A(x)]^{2}$ if $0 \le \mu_A \le 0.5$; $1 - 2[1 - \mu_A(x)]^{2}$ if $0.5 < \mu_A \le 1$
[Column "Graphical Representation": one plot per hedge]
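
The hedge table can be applied to a membership value as in the sketch below. The dictionary HEDGES and the helper apply_hedge are hypothetical names, and reading More or less and Somewhat as the square root of the membership is an assumption about the table.

```python
import math

# Each hedge maps a membership value mu in [0, 1] to a new value in [0, 1].
HEDGES = {
    "a little": lambda mu: mu ** 1.3,
    "slightly": lambda mu: mu ** 1.7,
    "very": lambda mu: mu ** 2,
    "extremely": lambda mu: mu ** 3,
    "very very": lambda mu: mu ** 4,
    "more or less": math.sqrt,   # assumed square root
    "somewhat": math.sqrt,       # assumed square root
    # "indeed" pushes values below 0.5 down and values above 0.5 up.
    "indeed": lambda mu: (2 * mu ** 2 if mu <= 0.5
                          else 1 - 2 * (1 - mu) ** 2),
}


def apply_hedge(name, mu):
    """Return the hedged membership value for membership mu."""
    return HEDGES[name](mu)


tall = 0.5                                   # hypothetical membership in "tall"
very_tall = apply_hedge("very", tall)        # intensified
more_or_less_tall = apply_hedge("more or less", tall)  # diluted
```

Exponents above 1 intensify a set (memberships shrink), while the square root dilutes it (memberships grow).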

Alpha Cut Threshold
An alpha cut threshold defines a minimum truth membership level for a fuzzy set.
[Figure: fuzzy set for long project with alpha cut threshold $\mu[0.15]$; x-axis: project duration in weeks, 4 to 16; y-axis: grade of membership m(x)]

Fuzzy AND Operator
[Figure: fuzzy sets Young adult and Middle aged; x-axis: client age (in years), 10 to 70; y-axis: grade of membership m(x)]
Example: region produced by the proposition Young Adult AND Middle Aged.
Mathematical representation:
$\mu_T(x_i) = \min(\mu_A(x_i), \mu_B(x_i))$
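
The alpha cut threshold can be sketched as a wrapper around a membership function. The linear shape of long_project (0 at 4 weeks, 1 at 16 weeks) is a hypothetical stand-in for the curve in the figure, and treating memberships exactly equal to alpha as passing the cut is an assumption.

```python
def long_project(weeks):
    """Hypothetical membership in 'long project': rises linearly
    from 0 at 4 weeks to 1 at 16 weeks."""
    return min(1.0, max(0.0, (weeks - 4) / 12))


def alpha_cut(membership, alpha):
    """Return a membership function that is zero wherever the
    original membership falls below the threshold alpha."""
    def cut(x):
        mu = membership(x)
        return mu if mu >= alpha else 0.0
    return cut


long_project_cut = alpha_cut(long_project, 0.15)
weak = long_project_cut(5)      # membership 1/12 is below 0.15, so cut to 0
strong = long_project_cut(10)   # membership 0.5 passes unchanged
```

The cut removes weak memberships from further processing without changing the values that pass the threshold.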

Fuzzy OR Operator
[Figure: fuzzy sets Young adult and Middle aged; x-axis: client age (in years), 10 to 70; y-axis: grade of membership m(x)]
Example: region produced by the proposition Young Adult OR Middle Aged.
Mathematical representation:
$\mu_T(x_i) = \max(\mu_A(x_i), \mu_B(x_i))$

Fuzzy NOT Operator
[Figure: fuzzy set Middle aged; x-axis: client age (in years), 10 to 70; y-axis: grade of membership m(x)]
Example: region produced by the proposition NOT Middle Aged.
Mathematical representation:
$\mu_T(x_i) = 1 - \mu_A(x_i)$
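
The fuzzy AND, OR and NOT operators can be sketched as functions that combine membership functions. The triangular shapes and the age breakpoints used for young_adult and middle_aged are hypothetical; the slides show these sets only as figures.

```python
def triangle(left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    def membership(x):
        if x <= left or x >= right:
            return 0.0
        if x <= peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)
    return membership


def fuzzy_and(mu_a, mu_b):
    """Fuzzy AND: point-wise minimum of the two memberships."""
    return lambda x: min(mu_a(x), mu_b(x))


def fuzzy_or(mu_a, mu_b):
    """Fuzzy OR: point-wise maximum of the two memberships."""
    return lambda x: max(mu_a(x), mu_b(x))


def fuzzy_not(mu_a):
    """Fuzzy NOT: one minus the membership."""
    return lambda x: 1.0 - mu_a(x)


# Hypothetical age sets (breakpoints in years are illustrative only).
young_adult = triangle(20, 30, 40)
middle_aged = triangle(30, 50, 70)

both = fuzzy_and(young_adult, middle_aged)
either = fuzzy_or(young_adult, middle_aged)
not_middle_aged = fuzzy_not(middle_aged)
```

For a crisp set, whose memberships are only 0 or 1, these three definitions reduce to the ordinary Boolean AND, OR and NOT.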

Fuzzy Clustering: Motivation
Crisp clustering allows each data point to be a member of exactly one cluster.
Fuzzy clustering assigns each data point a membership value for each cluster.
  The value might be zero for some points.

Fuzzy Clustering Concepts
Each data point has an associated degree of membership, in the range [0, 1], for each cluster center.

Fuzzy clustering concepts
Fuzzification parameter m
  m = 1: clusters do not overlap
  m > 1: clusters overlap

Fuzzy c-means clustering
Extension of the k-means algorithm.
Two steps:
  Calculation of cluster centers
  Assignment of points to the clusters with varying degrees of membership
Constraint on the fuzzy membership function associated with each point:
$\sum_{j=1}^{p} \mu_j(x_i) = 1, \quad i = 1, \dots, k$
p: number of clusters
k: number of data points (written n on the following slides)
$x_i$: i-th data point
$\mu_j()$: function returning the membership value of $x_i$ in the j-th cluster

Fuzzy c-means clustering
Minimization of the standard loss function
$\sum_{k=1}^{p} \sum_{i=1}^{n} [\mu_k(x_i)]^m \, \|x_i - c_k\|^2$

Basic algorithm
Initialize
  p = number of clusters
  m = fuzzification parameter
  c_j = cluster centers
Repeat
  for all data points: calculate distance d_ji to all centers c_j
  for i = 1 to n: update µ_j(x_i) using c_j
  for j = 1 to p: update c_j using current µ_j(x_i)
Until c_j estimates stabilize

Fuzzy c-means clustering
With
$\mu_j(x_i) = \frac{(1/d_{ji})^{1/(m-1)}}{\sum_{k=1}^{p} (1/d_{ki})^{1/(m-1)}}$
$d_{ji}$ being the distance of $x_i$ to cluster center $c_j$ (e.g. Euclidean distance), and
$c_j = \frac{\sum_i [\mu_j(x_i)]^m \, x_i}{\sum_i [\mu_j(x_i)]^m}$
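
The basic algorithm and the two update formulas can be sketched in plain Python as below. Several choices are assumptions made for this example: d_ji is taken as the squared Euclidean distance (so that the exponent 1/(m-1) matches the squared norm in the loss function; with plain Euclidean distance the usual exponent is 2/(m-1)), the initial centers are passed in by the caller, a point that coincides with a center gets full membership there, and the loop stops when no center moves by more than a small tolerance. It requires m > 1.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def update_memberships(points, centers, m):
    """memberships[i][j] = membership of point i in cluster j."""
    exponent = 1.0 / (m - 1.0)
    memberships = []
    for x in points:
        d = [sq_dist(x, c) for c in centers]
        zeros = [j for j, dj in enumerate(d) if dj == 0.0]
        if zeros:
            # x sits exactly on a center: avoid dividing by zero.
            row = [1.0 / len(zeros) if j in zeros else 0.0
                   for j in range(len(d))]
        else:
            weights = [(1.0 / dj) ** exponent for dj in d]
            total = sum(weights)
            row = [w / total for w in weights]
        memberships.append(row)
    return memberships


def update_centers(points, memberships, m):
    """Each center is the mean of all points weighted by membership**m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[t] for w, x in zip(weights, points)) / total
            for t in range(dim)))
    return centers


def fuzzy_c_means(points, centers, m=2.0, tol=1e-9, max_iter=200):
    """Alternate both updates until the centers stabilize."""
    for _ in range(max_iter):
        memberships = update_memberships(points, centers, m)
        new_centers = update_centers(points, memberships, m)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Memberships consistent with the final centers.
    return centers, update_memberships(points, centers, m)


# Hypothetical toy data and initial centers.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, memberships = fuzzy_c_means(points, [points[0], points[3]])
```

Each row of memberships sums to 1, which is the constraint on the membership function for every data point.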

Fuzzy c-means clustering
Problem with c-means clustering: outlier data points still have to be assigned to a cluster.

Fuzzy Adaptive Clustering
Alternative formulation of the constraint on membership:
$\sum_{j=1}^{p} \sum_{i=1}^{n} \mu_j(x_i) = n$
The membership values of all sample points sum to n.
An individual point can have a total membership value of less than 1.
=> $\mu_j(x_i) = \frac{n \, (1/d_{ji})^{1/(m-1)}}{\sum_{k=1}^{p} \sum_{z=1}^{n} (1/d_{kz})^{1/(m-1)}}$
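
The fuzzy adaptive clustering membership can be sketched as below, given a precomputed table of distances. The function name, the layout distances[i][j] (point i, center j), the requirement that all distances are positive and the sample values are assumptions made for illustration.

```python
def fac_memberships(distances, m=2.0):
    """Memberships under the constraint that all values sum to n.

    distances[i][j] is the distance of point i to center j (all > 0).
    Returns memberships[i][j] for point i and cluster j.
    """
    n = len(distances)
    exponent = 1.0 / (m - 1.0)
    weights = [[(1.0 / d) ** exponent for d in row] for row in distances]
    # Normalize over all points and all clusters at once, not per point.
    total = sum(sum(row) for row in weights)
    return [[n * w / total for w in row] for row in weights]


# Hypothetical distances: two points near one center each, one far outlier.
distances = [[1.0, 2.0],
             [2.0, 1.0],
             [10.0, 10.0]]
memberships = fac_memberships(distances)
outlier_total = sum(memberships[2])   # total membership of the outlier
```

Because the normalization runs over all points together, the far-away point receives a total membership below 1 instead of being forced to sum to 1 as in fuzzy c-means.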