Random Swap algorithm

Size: px

Start display at page:

Download "Random Swap algorithm"

Amy Arnold
5 years ago
Views:

1 Random Swap algorithm Pasi Fränti

2 Definitions and data Set of N data points: X={x 1, x 2,, x N } Partition of the data: P={p 1, p 2,,p k }, Set of k cluster prototypes (centroids): C={c 1, c 2,,c k },

3 N i P i i C x d dn MSE N i C x P j i k j i 1, argmin 2 1 k j x C j P j P i j i i 1 1, Clustering problem Objective function: Optimality of partition: Optimality of centroid:

4 K-means algorithm X = Data set C = Cluster centroids P = Partition K-Means(X, C) (C, P) REPEAT C prev C; FOR i=1 TO N DO p i FindNearest(x i, C); FOR j=1 TO k DO c j Average of x i p i = j; Optimal partition Optimal centoids UNTIL C = C prev

5 Problems of k-means

6 Swapping strategy

7 Pigeon hole principle CI = Centroid index: P. Fränti, M. Rezaei and Q. Zhao "Centroid index: cluster level similarity measure Pattern Recognition, 47 (9), , September 2014, 2014.

8 Pigeon hole principle S2

9 Aim of the swap One centroid, but two clusters. Two centroids, but only one cluster. S2

10 Random Swap algorithm Random Swap(X) C, P C Select random representatives(x); P Optimal partition(x, C); REPEAT T times (C new,j) Random swap(x, C); P new Local repartition(x, C new, P, j); C new, P new Kmeans(X, C new, P new ); IF f(c new, P new ) < f(c, P) THEN (C, P) C new, P new ; RETURN (C, P);

11 Steps of the swap 1. Random swap: c j x i j random( 1, k), i random(1, N) 2. Re-allocate vectors from old cluster: p i argmind 1 jk 3. Create new cluster: 2 x, c i p j i j 2 1 p arg min d x, c i, N i k j kp i i k i

12 Swap Swap is made from centroid rich area to centroid poor area.

13 Local re-partition Re-allocate vectors Create new cluster

14 Iterate by k-means 1 st iteration

15 Iterate by k-means 2 nd iteration

16 Iterate by k-means 3 rd iteration

17 Iterate by k-means 16 th iteration

18 Iterate by k-means 17 th iteration

19 Iterate by k-means 18 th iteration

20 Iterate by k-means 19 th iteration

21 Final result 25 iterations

22 Extreme example

23 Dependency on initial solution MSE K-means Random + RS K-means + RS Split + RS Bridge Ward + RS

24 Data sets

25 Data sets

26 Data sets visualized Images Bridge House Miss America Europe 4x4 blocks RGB color 4x4 blocks frame differential Differential coordinates 16-d 3-d 16-d 2-d

27 Data sets visualized Artificial S 1 S 2 S 3 S 4 Unbalance DIM 32

28 Data sets visualized Artificial G G G A 1 A 2 A 3

29 Time complexity

30 Efficiency of the random swap Total time to find correct clustering: Time per iteration Number of iterations Time complexity of single iteration: Swap: O(1) Remove cluster: 2kN/k = O(N) Add cluster: 2N = O(N) Centroids: 2N/k + 2N/k + 2 = O(N/k) K-means: IkN = O(IkN) Bottleneck!

31 Efficiency of the random swap Total time to find correct clustering: Time per iteration Number of iterations Time complexity of single iteration: Swap: O(1) Remove cluster: 2kN/k = O(N) Add cluster: 2N = O(N) Centroids: 2N/k + 2N/k + 2 = O(N/k) (Fast) K-means: 4N = O(N) 2 iterations only! T. Kaukoranta, P. Fränti and O. Nevalainen "A fast exact GLA based on code vector activity detection" IEEE Trans. on Image Processing, 9 (8), , August 2000.

32 Estimated and observed steps Bridge N=4096, k=256, N/k=16, 8

33 Processing time profile 10 8 N = 4096 k = Bridge Time (ms) 6 4 KM 2 39 % 2 KM 1 53 % 0 Local repartition

34 Processing time profile N = k = Birch1 Time (ms) KM 2 48 % CI=0 reached 40 0 KM 1 49 % Local repartition 50 % 44 %

35 Effect of K-means iterations 190 MSE Bridge Time (s)

36 Effect of K-means iterations MSE Birch Time (s)

37 How many swaps?

38 Three types of swaps Trial swap Accepted swap Successful swap MSE improves CI improves Before swap Accepted Successful CI=2 CI=2 CI=

39 Accepted and successful swaps

40 Number of swaps needed Example with 35 clusters A3 CI=4 CI=9

41 Number of swaps needed Example from image quantization K-means clustering result (3 swaps needed) Final clustering result Remove Added Remove

42 Statistical observations N=5000, k=15, d=2, N/k=333, % 15 % 10 % 5 % Median = 26 Average = 35 S1 Max = % cases < 70 0 % Swaps

43 Statistical observations N=5000, k=15, d=2, N/k=333, % 15 % 10 % Median = 18 Average = 25 S4 Max = % 90% cases < 50 0 % Swaps

44 Theoretical estimation

45 Probability of good swap Select a proper prototype to remove: There are k clusters in total: p removal =1/k Select a proper new location: There are N choices: p add =1/N Only k are significantly different: p add =1/k Both happens same time: k 2 significantly different swaps. Probability of each different swap is p swap =1/k 2 Open question: how many of these are good? p = (/k)(/k) = O(/k) 2

46 Expected number of iterations Probability of not finding good swap: q 2 1 k 2 T Estimated number of iterations: log q T 2 T log 1 k 2 log q 2 log 1 2 k

47 Probability of failure (q) depending on T q Iterations

48 Observed probability (%) of fail N=5000, k=15, d=2, N/k=333, q =10% 10 q q =1% q =0.1% 0.1 S 1 -S 4 S1 S2 S3 S4 Dataset

49 Observed probability (%) of fail N=1024, k=16, d=32-128, N/k=64, q Dimensions q =10% q =1% q =0.1% Dim

50 Bounds for the iterations Upper limit: T ln 1 ln q - ln q -ln q α / k α / k k α 2 Lower limit similarly; resulting in: T 2 k - ln q 2 α

51 Multiple swaps (w) Probability for performing less than w swaps: Expected number of iterations: i T i w i k k i T q log 1 ˆ k w k i t w i

52 Expected time complexity tˆ N, k log w N -log 1. Linear dependency on N k α w 2. Quadratic dependency on k (With large number of clusters, it can be too slow) 3. Logarithmic dependency on w (Close to constant) Inverse dependency on (Higher the dimensionality, faster the method) α Nk 2

53 Linear dependency on N N< , k=100, d=2, N/k=1000, % Birch2 Iterations N= % Data size (thousands)

54 Quadratic dependency on k N< , k=100, d=2, N/k=1000, Birch2 Iterations Quadratic fit 75% 25% Clusters

55 Logarithmic dependency on w

56 Theory vs. reality

57 Neighborhood size

58 How much is? Voronoi neighbors Neighbors by distance 1 2 S dim: D-dim: 2(3k-6)/k = 6 12/k 2kD/2/k = O(2k D/2-1) Upper limit: k

59 Observed number of neighbors Data set S 2 Frequency 45 % 40 % 35 % 30 % 25 % 20 % 15 % 10 % 5 % 0 % Average = Number of neighbours

60 Estimate Five iterations of random swap clustering Each pair of prototypes A and B: 1. Calculate the half point HP = (A+B)/2 2. Find the nearest prototype C for HP 3. If C=A or C=B they are potential neighbors. Analyze potential neighbors: 1. Calculate all vector distances across A and B 2. Select the nearest pair (a,b) 3. If d(a,b) < min( d(a,c(a), d(b,c(b) ) then Accept = Number of pairs found / k

61 Observed values of

62 Optimality

63 Multiple optima (=plateaus) Very similar result (<0.3% diff. in MSE) CI-values significantly high (9%) Finds one of the near-optimal solutions

64 Experiments

65 Time-versus-distortion N=4096, k=256, d=16, N/k=16, 5.4 MSE K-means (single) 70 (28) (172) 60 (365) (825) 30 (3k) (146) Random Swap Bridge Repeated k-means ( repeats) (13k) (78k) Time 5 61 (10k) 0 (387k) (813k)

66 Time-versus-distortion N=6480, k=256, d=16, N/k=25, 17.1 MSE K-means (single) (62) Random Swap (501) (1632) (136) Missa Repeated k-means ( repeats) 88 (10k) (23k) (70k) (135k) (940k) Time

67 Time-versus-distortion N= , k=100, d=2, N/k=1000, (2) Birch1 MSE (22) 5 (43) Random Swap K-means 7 (single) (606) 0 Repeated k-means (2500 repeats) Time 3 (489)

68 Time-versus-distortion N= , k=100, d=2, N/k=1000, 3.1 MSE (1) 18 K-means (single) 20 (34) 10 (64) 5 Random Swap Repeated k-means (2500 repeats) (41) 0 (2067) Birch2 9 (845) (2500) Time

69 Time-versus-distortion N= , k=256, d=2, N/k=663, Europe MSE (11) 80 (24) Random Swap 60 (44) K-means (single) (305) Repeated k-means (1888) (9k) (500 repeats) Time 10 (58k) (576k)

70 Time-versus-distortion N=6500, k=8, d=2, N/k=821, (4) K-means (single) Unbalance MSE (20) 2 1 (43) (98) 0 (116) Random swap Repeated k-means Time 2 1 (3.219) 1 (20.000)

71 Variation of results 15 % 10 % Random swap Bridge 5 % Repeated k-means % MSE

72 Variation of results 15 % Random swap Birch1 10 % 5 % 0 % 4.64 Repeated k-means MSE

73 Comparison of algorithms k-means (KM) k-means++ repeated k-means (RKM) x-means agglomerative clustering (AC) random swap (RS) global k-means genetic algorithm D. Arthur and S. Vassilvitskii, "K-means++: the advantages of careful seeding", ACM-SIAM Symp. on Discrete Algorithms (SODA 07), New Orleans, LA, , January, D. Pelleg, and A. Moore, "X-means: Extending k- means with efficient estimation of the number of clusters", Int. Conf. on Machine Learning, (ICML 00), Stanford, CA, USA, June P. Fränti, T. Kaukoranta, D.-F. Shen and K.-S. Chang, "Fast and memory efficient implementation of the exact PNN", IEEE Trans. on Image Processing, 9 (5), , May A. Likas, N. Vlassis and J.J. Verbeek, "The global k-means clustering algorithm", Pattern Recognition 36, , P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pat. Rec. Let., 21 (1), 61-68, January 2000.

74 Processing time

75 Clustering quality

76 Conclusions

77 What we learned? 1. Random swap is efficient algorithm 2. It does not converge to sub-optimal result 3. Expected processing has dependency: Linear O(N) dependency on the size of data Quadratic O(k2) on the number of clusters Inverse O(1/) on the neighborhood size Logarithmic O(log w) on the number of swaps

78 References P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 5:13, 1-29, P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), , P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), , August P. Fränti, O. Virmajoki and V. Hautamäki, Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR 08), Tampa, FL, Dec Pseudo code:

fi/web/machine-learning/software Interactive animation:

79 Supporting material Implementations available: (C, Matlab, Java, Javascript, R and Python) Interactive animation: Clusterator:

Pasi Fränti K-means properties on six clustering benchmark datasets Pasi Fränti and Sami Sieranoja Algorithms, 2017.

Pasi Fränti K-means properties on six clustering benchmark datasets Pasi Fränti and Sami Sieranoja Algorithms, 2017. K-means properties Pasi Fränti 17.4.2017 K-means properties on six clustering benchmark datasets Pasi Fränti and Sami Sieranoja Algorithms, 2017. Goal of k-means Input N points: X={x 1, x 2,, x N } Output