Collaborative Clustering with Heterogeneous Algorithms


Jérémie Sublime, Nistor Grozavu, Younès Bennani and Antoine Cornuéjols
AgroParisTech, UMR INRA MIA 518, rue Claude Bernard, Paris Cedex 05, France
Université Paris 13, Sorbonne Paris Cité, LIPN UMR CNRS 7030, 99 av. J-B Clément, F-93430 Villetaneuse, France

Abstract: The aim of collaborative clustering is to reveal the common underlying structures found by different algorithms while analyzing data. The fundamental concept of collaboration is that the clustering algorithms operate locally but collaborate by exchanging information about the local structures found by each algorithm. Within this framework, the purpose of this article is to introduce a new method that reinforces the clustering process by exchanging information between several results acquired from different clustering algorithms. The originality of our proposed approach is that the collaboration step can use clustering results obtained from any type of algorithm during the local phase. This article gives the theoretical foundations of our approach as well as some experimental results. The proposed approach has been validated on several data sets and the results have proven to be very competitive.

I. INTRODUCTION

Data clustering is an important task in the process of knowledge extraction from databases. It aims to discover the intrinsic structure of a set of objects by forming clusters of objects that share similar characteristics. The difficulty of this task has significantly increased over the past two decades as the volume of available data sets has exploded. This increasing complexity, as well as new types of data sets distributed over several sites, has made it difficult for a single clustering algorithm to give competitive results. This problem can however be tackled more easily by having several clustering algorithms work together.

Most of the existing approaches to this problem rely on consensus-based methods [1], [2], [3], [4], [5], which aggregate (or fuse) the clustering results of all algorithms. A voting technique or a clustering technique is then applied in order to get a unique final clustering result. Other methods to solve this kind of problem have recently appeared in the form of collaborative clustering techniques. Collaborative clustering is an emerging problem in data mining, and only a few works on this subject can be found in the literature (e.g. [6], [7], [8], [9]). The fundamental concept of collaboration is that the clustering algorithms operate locally (on individual subsets) and then collaborate by exchanging information about their structures and results in order to improve their models [6], [9], [10]. Collaborative techniques therefore differ from consensus-based methods in that they aim at a mutual improvement of all participating algorithms, whereas consensus methods aim at providing a single result for all the algorithms. While the idea of collaboration is arguably the more interesting one, all collaborative techniques proposed so far, unlike consensus-based methods, require the collaborating algorithms to be from the same algorithm family. In this article, we address this limitation with a collaborative framework that works with most clustering algorithms regardless of their family.
The exact context of our framework is known as horizontal collaboration [9]: all algorithms work either on subsets representing the same data in different feature spaces, or on the exact same data while searching for a different number of clusters, or a mix of both.

The rest of the article is organized as follows: in Section 2 we present our proposed collaborative approach; Section 3 shows the adaptation of our collaborative framework to several clustering algorithms; in Section 4 we present some tests of the proposed approach on different data sets; finally, the paper ends with a conclusion and future works to extend the range of the proposed method.

II. COLLABORATIVE CLUSTERING WITH HETEROGENEOUS ALGORITHMS

A. Principles

In this section we consider a group of clustering algorithms C = {c_1, ..., c_J} and a data set X = {x_1, ..., x_N}, x_i ∈ R^d. We suppose that our clustering algorithms work on the same data, but may or may not have access to all the attribute columns. We note S = {s_1, ..., s_N}, s_i ∈ [1..K], the solution vector that each local algorithm provides by solving:

$$S = \arg\max_S P(S \mid X, \Theta) = \arg\max_S \big( P(X \mid S, \Theta)\, P(S) \big) \qquad (1)$$

We make the assumption that all our clustering algorithms use clusters that can be modeled by a distribution with parameters Θ.

Each algorithm may be based on a different statistical model (Gaussian, multinomial, Poisson, etc.) and may search for a different number of clusters. We make the hypothesis that our collaborative method can only be applied to algorithms that optimize an equation similar to Equation (1). Such algorithms include the K-Means algorithm and its variants, all Expectation-Maximization based algorithms [11], several unsupervised neural network algorithms such as the SOM and GTM algorithms, and a few algorithms from image segmentation such as the Iterated Conditional Modes [12]. Equation (1) can be solved using a local optimization process, as shown below:

$$P(X \mid S, \Theta)\, P(S) = \prod_t P(x_t \mid s_t, \theta_{s_t})\, P(s_t) \qquad (2)$$

In Equation (2), the main issue comes from P(s), the probability of occurrence of each cluster, which is a priori unknown. Each specific algorithm implies its own hypothesis about P(s):

- Equiprobability of all clusters. This hypothesis assumes that all clusters have the same probability of occurrence. It is the hypothesis made in the K-Means algorithm and in some versions of the EM algorithm.
- A posteriori measured probability. This method is followed by most probabilistic unsupervised classifiers. The main idea is to measure the occurrence of each cluster after each iteration.
- The local probability hypothesis. This is mostly used in computer vision and works as follows: instead of measuring a global P(s) for the whole data set, P(s) depends on the neighborhood configuration of the observed data, and thus its value is decided depending on the clusters assigned to the neighboring data (think of pixel dependencies).

For our collaborative process, we follow a hypothesis similar to the one made in computer vision: during the collaborative step, each cluster occurrence probability P(s) will not be a global probability. Instead, P(s) will be bound to the clustering choices made by the other algorithms for the same data point.

First, in the local step, each clustering algorithm processes the data it has access to and produces a clustering result in the form of a solution vector. The result of this local step is a matrix similar to Table I, where each column is a solution vector proposed by one algorithm. Then, from these solution vectors, we compute probabilistic confusion matrices that will be used in the collaborative step to evaluate P(s). This idea is inspired from another work [13], where the same confusion matrices are computed for a consensus-based method.

The first step of our collaboration process is to map the clusters of the different algorithms onto each other. There is no reason or obligation for two given clustering algorithms c_i and c_j to be looking for the same number of clusters.

TABLE I. EXAMPLE OF RESULTS AFTER THE LOCAL STEP (each cell contains the cluster assigned to data point x_n by algorithm c_j)

X \ C    c_1    c_2    ...    c_J
x_1
x_2
...
x_N

We note Ψ^{c_i→c_j} the probabilistic confusion matrix mapping the clusters of algorithm c_i to the clusters of algorithm c_j, and Ψ^{c_i→c_j}_{s_a,s_b} the probability of a data point being put in cluster s_b by algorithm c_j given that it is in cluster s_a of algorithm c_i. Once all the matrices Ψ^{c_i→c_j} have been computed by browsing all the result vectors, it is possible to start the collaborative step. For a given clustering algorithm c_i, given the matrices Ψ and the results of the other algorithms from the local step, the local equation to be maximized with respect to s during the collaborative step becomes as shown in Equations (3) and (4).
$$P(x \mid s, \theta_s)\, P(s) = P(x \mid s, \theta_s) \prod_{c_j \neq c_i} P(s \mid s_{x,c_j}) \qquad (3)$$

What we propose here is to use the solution vectors produced by the other algorithms in order to improve the local results. In Equation (3), we note s_{x,c_j} the cluster decided for data point x by algorithm c_j. We then assume that the P(s | s_{x,c_j}) are independent from each other. Given this assumption, we can use the previously computed matrices Ψ and get the following equation:

$$P(x \mid s, \theta_s)\, P(s) = \frac{1}{Z}\, P(x \mid s, \theta_s) \prod_{c_j \neq c_i} \Psi^{c_j \to c_i}_{s_{x,c_j},\, s} \qquad (4)$$

The general process of our collaborative method is shown in Algorithm 1.

Algorithm 1: Collaborative Clustering for Heterogeneous Algorithms: General Framework
  Local step:
  for each clustering algorithm do
    Apply the clustering algorithm on the data X
    Initialize the local parameters Θ
  Compute all Ψ^{c_i→c_j} matrices
  Collaboration step:
  while the results are not stable do
    for each clustering algorithm do
      Run one iteration of the modified algorithm using Equation (4)
      Update the solution vector S
      Update the local parameters Θ
    Update all Ψ^{c_i→c_j} matrices
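To make the construction of the confusion matrices and the collaborative term of Equation (4) concrete, here is a minimal Python sketch. It assumes hard (non-fuzzy) solution vectors encoded as 0-indexed integer arrays; the function names and the `psi` dictionary keyed by ordered pairs of algorithm indices are our own illustration, not part of the original paper.

```python
import numpy as np

def psi_matrix(labels_i, labels_j, K_i, K_j):
    """Estimate the probabilistic confusion matrix Psi^{c_i -> c_j}:
    entry [a, b] approximates P(x in cluster b of c_j | x in cluster a of c_i),
    obtained by counting co-occurrences in the two solution vectors and
    normalizing each row."""
    counts = np.zeros((K_i, K_j))
    for a, b in zip(labels_i, labels_j):
        counts[a, b] += 1.0
    rows = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(rows, 1.0)  # guard against empty clusters

def collaborative_term(t, s, i, solutions, psi):
    """Collaborative part of Equation (4) for algorithm c_i and data point x_t:
    the product over the other algorithms c_j of Psi^{c_j -> c_i}[s_{x,c_j}, s].
    The constant 1/Z is omitted since it does not depend on s."""
    prod = 1.0
    for j, labels_j in enumerate(solutions):
        if j != i:
            prod *= psi[(j, i)][labels_j[t], s]
    return prod
```

Here `solutions[j]` plays the role of column c_j in Table I, and the `psi[(j, i)]` matrices are recomputed from the current solution vectors after every collaborative iteration, as prescribed by Algorithm 1.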

It is important to highlight that, since the confusion matrices Ψ rely on the original algorithm results to be built, the collaborative process cannot have any positive effect on results that are extremely poor. In such a case, the collaboration matrices built from the contingency probabilities would bring additional noise and may make the results even worse. Our method is intended to improve the results of clustering algorithms that already perform decently on their own. In this case, our collaboration model helps to detect and fix cluster labels that are unlikely given the results of the other algorithms included in the collaborative process. Furthermore, the process becomes more effective as the number of clusters to be found, the size of the data set, and the number of collaborating clustering algorithms increase. During our experiments, we even observed that the worst case scenario is to have only two mildly performing clustering algorithms working together while searching for only two classes.

B. Examples of collaborative schemes

In this subsection, we introduce a few cases where our collaborative framework can be used.

Fig. 1. Collaborative clustering algorithms working in parallel

Figure 1 shows an example of 3 collaborative clustering algorithms working in parallel and improving each other's results. This is the scenario of horizontal collaboration that we have been mentioning since the beginning of this article. Concrete examples of such collaborations are: several algorithms working on several sites of a distributed data set; several algorithms accessing the same data features but looking for a different number of clusters (satellite image multiscale analysis, for example); or algorithms using different models working on a difficult data set, distributed or not. In all three cases, a mutual improvement of the results would be beneficial.

Another possible scenario that our model can handle is shown in Figure 2. In this case, the EM algorithm uses the information from two other classifiers to improve its own results. Such a one-sided information transfer is beneficial when the two other clustering algorithms are specialized algorithms capable of detecting very specific elements of the observed data. The more generic EM algorithm working with those two clustering algorithms then benefits from the information provided by the two specialized clustering algorithms. In this case, the process is analogous to a reinforcement process rather than a collaborative one.

Fig. 2. Collaborative EM reinforced by two algorithms

III. EXAMPLES OF APPLICATIONS OF OUR COLLABORATIVE FRAMEWORK

In this section, we show how to use our collaborative framework with some Gaussian Mixture Model based clustering algorithms. First, we show an application to the Expectation Maximization (EM) algorithm [11] when considering a Gaussian distribution, then we adapt our framework to the K-Means algorithm, and finally we show a more complex adaptation to the ICM algorithm for MRF-based image segmentation. While in this article we only apply our method to clustering algorithms using Gaussian emission laws, it could be adapted to any algorithm using any other probabilistic model. The model is also compatible with fuzzy clustering algorithms, albeit at the cost of more computation power to determine the confusion matrices.
A. Collaborative EM and K-Means Algorithms

In the following example, we suppose the clusters s ∈ [1..K] to follow a Gaussian distribution. We consequently have θ_s = {μ_s, Σ_s}, where μ_s is the mean vector of cluster s and Σ_s its covariance matrix. Given this model, we can consider the K-Means algorithm to be a degenerate case of the EM algorithm with a covariance matrix equal to the identity. Then, in the case of the Expectation-Maximization algorithm, when the results of the collaborating algorithms are fixed, Equation (3) can be developed as follows:

$$P(x \mid s, \theta_s)\, P(s) = \frac{1}{Z}\, \mathcal{N}(\mu_s, \Sigma_s, x) \prod_{c_j \neq c_i} \Psi^{c_j \to c_i}_{s_{x,c_j},\, s} \qquad (5)$$

where:
- Z is a normalizing constant, or partition function, that is independent of s;
- the Ψ^{c_j→c_i} are the confusion matrices mapping the other algorithms' clusters to those of the current algorithm c_i.
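As an illustration of the modified assignment rule of Equation (5), the sketch below implements one collaborative E-step for the Gaussian case. It reuses the hypothetical `collaborative_term` helper from the previous sketch together with SciPy's Gaussian density; it is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def collaborative_e_step(X, mus, sigmas, i, solutions, psi):
    """Collaborative E-step of Equation (5) for algorithm c_i: each point x is
    assigned to the cluster s maximizing N(mu_s, Sigma_s, x) times the product
    of Psi^{c_j -> c_i}[s_{x,c_j}, s] over the other algorithms c_j
    (1/Z is dropped because it does not depend on s)."""
    K = len(mus)
    labels = np.empty(len(X), dtype=int)
    for t, x in enumerate(X):
        scores = [multivariate_normal.pdf(x, mean=mus[s], cov=sigmas[s])
                  * collaborative_term(t, s, i, solutions, psi)
                  for s in range(K)]
        labels[t] = int(np.argmax(scores))
    return labels
```

The M-step is unchanged: μ_s and Σ_s are re-estimated from the new assignments, after which the Ψ matrices are updated, which matches Algorithm 2 below. For the K-Means variant given next in Equation (6), `cov=sigmas[s]` would simply be replaced by the identity matrix.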

An expression similar to Equation (5) can be found for the K-Means algorithm:

$$P(x \mid s, \theta_s)\, P(s) = \frac{1}{Z}\, \mathcal{N}(\mu_s, I_d, x) \prod_{c_j \neq c_i} \Psi^{c_j \to c_i}_{s_{x,c_j},\, s} \qquad (6)$$

The collaborative step of the original EM algorithm is then modified as shown in Algorithm 2.

Algorithm 2: Collaboration step for the EM Algorithm
  Retrieve the initial Ψ matrices
  while the results are not stable up to small variations do
    E-Step: S = arg max_S P(X|S, Θ)P(S):
      for each x ∈ X do
        s = arg max_s P(x|s, θ_s)P(s) using Equation (5)
    M-Step: Θ = arg max_Θ P(X|S, Θ)
    Update Ψ

B. Collaborative Hidden Markov Field Model

In this subsection we introduce an adaptation of our model to the specific case of the Hidden Markov Model for computer vision [14], using the Expectation Maximization and Iterated Conditional Modes framework. In this case, the original algorithm already uses a Markov Random Field based model to take into consideration the inner dependencies between the data. Therefore, by adding our collaborative term, we get the following formula:

$$\frac{1}{Z}\, \mathcal{N}(\mu_s, \Sigma_s, x) \prod_{v \in V_x} e^{-U_N(s_v,\, s)} \prod_{c_j \neq c_i} \Psi^{c_j \to c_i}_{s_{x,c_j},\, s} \qquad (7)$$

The previous expression can then be turned into an MRF-style energy function by taking its negative logarithm. The result is shown in Equation (8):

$$U_{MRF}(s, x, \Psi) = \frac{1}{2}(x - \mu_s)^T \Sigma_s^{-1} (x - \mu_s) + \log\!\left(\sqrt{|\Sigma_s|\,(2\pi)^d}\right) + \sum_{v \in V_x} U_N(s_v, s) - \sum_{c_j \neq c_i} \log\!\left(\Psi^{c_j \to c_i}_{s_{x,c_j},\, s}\right) \qquad (8)$$

where:
- V_x is the set containing all the neighbors of x in the data set;
- U_N(s_v, s) is an inner neighborhood energy based on the clique potentials.

The coupled Expectation Maximization and Iterated Conditional Modes (ICM) algorithm [15] is then modified as shown in Algorithm 3.

Algorithm 3: Collaborative EM+ICM Framework for MRF segmentation
  Local step:
  for each collaborating subset do
    Apply a clustering algorithm on the subset
    Initialize Θ and S
  Initialize Ψ for each collaboration link
  Collaboration step:
  while the results are not stable do
    for each x ∈ X do
      Minimize U_MRF(s, x, Ψ) using Equation (8)
    Update Ψ
  return (S, Θ)

IV. EXPERIMENTAL RESULTS

A. Data sets and indexes

In order to evaluate our approach, we have applied our algorithm to four data sets of different sizes and complexity: the Iris data set, the Wisconsin Diagnostic Breast Cancer (WDBC) data set and the Spam Base data set from the UCI repository [16], as well as the VHR Strasbourg data set from [17]. We give more details on the experiments done on this last data set, as it is by far the most difficult one here.

- Iris data set (Iris): This data set has 150 instances of iris flowers described by 4 attributes. The flowers can be classified in 3 categories: Iris Setosa, Iris Versicolour and Iris Virginica.
- Wisconsin Diagnostic Breast Cancer (WDBC): This data set contains 569 instances having 32 parameters (an id, a diagnosis, and 30 real-valued attributes). Each observation is labeled as benign (357 observations) or malignant (212 observations).
- Spam Base: The Spam Base data set contains 4,601 observations described by 57 attributes and a label column: spam or not spam (1 or 0).
- VHR Strasbourg: This data set contains observations described by 27 attributes, extracted from a very high resolution image of the French city of Strasbourg. This data set is much more complex because each data point has irregular neighborhood dependencies (between 1 and 15 neighbors depending on the data point).

A partial expert-based ground truth was provided with the VHR Strasbourg data set, but it has a few caveats. First, it covers only 90% of the data. Then, the ground truth was hand-made by expert geographers, and the areas they classified do not always match the preprocessed regions of our segmentation. It also features classes that are quite difficult to detect, such as specific types of crops, or multiple classes of forest areas depending on the size of the area covered by the said forest. For these reasons, the indexes computed using this ground truth, such as the Rand Index, have to be taken with caution.

As criteria to validate our approach we have used one internal validation criterion, the Davies-Bouldin Index, and two external criteria: the clustering accuracy for the data sets from the UCI repository, and the visual aspect of the result together with the Rand Index for the VHR Strasbourg data set.

Davies-Bouldin Index: The Davies-Bouldin Index (DB) [18] is an internal validity index that assesses the clusters' internal compactness and determines whether they are well separated from each other.

Rand Index: The Rand Index [19] is a measure of agreement frequently used in cluster validation. It measures the agreement between two partitions, one given by the clustering process and the other defined by external criteria such as an expert ground truth. The Rand Index measures the relation between pairs of data set elements without using class (label) information and can be used to detect problems with a clustering algorithm. In the following subsection, we use the Rand Index to compare our results on the VHR Strasbourg data set with the expert-based ground truth.

B. Results

In the following, the DB Index and the accuracy are shown before and after collaboration in order to analyze the impact of the collaboration on the clustering results.

a) Iris: We have split the Iris data set into two subsets containing respectively the first two and the last two attributes. Then, we ran the regular EM algorithm on the two subsets to assess the ability of the algorithms to detect the 3 different varieties of iris flowers. We also ran an extra EM algorithm looking for only two clusters (merging Versicolour and Virginica) on the first subset containing the first two attributes. We then tried our collaborative EM using these 3 results to collaborate on the two subsets, following the method shown in Figure 1. We note EM 1:2 the results of the regular EM algorithm working on the first two attributes, EM 3:4 the results on the last two attributes, and EM 1:2bis the results on the first two attributes looking for only 2 classes. The notation Algo x (y+z) indicates that algorithm x is collaborating with algorithms y and z.

TABLE II. EXPERIMENTAL RESULTS OF THE COLLABORATIVE APPROACH ON THE IRIS DATA SET

Algorithm                   Accuracy   DB Index
EM 1:2
EM 3:4
EM 1:2bis                   86.67%
EM 1:2 (3:4+1:2bis)         80.67%
EM 1:2 (3:4)
EM 3:4 (1:2bis)             73.33%
EM 3:4 (1:2)
EM 3:4 (1:2+1:2bis)         74.00%

As one can see, the most interesting results on this data set are those achieved when EM 1:2 collaborates with EM 3:4 and EM 1:2bis: a 5% gain in the accuracy of the final clustering can be observed after the collaboration process (EM 1:2 (3:4+1:2bis)). On the other hand, the 11% gain observed in EM 1:2 (3:4) is more problematic, as we can clearly see that the results from the best clustering algorithm are simply replacing those of the weaker one. The same phenomenon can be observed when the collaboration is done in the opposite direction, EM 3:4 (1:2): the results of the weaker clustering algorithm are replaced. However, this is not true for EM 3:4 (1:2bis), where the weaker clustering algorithm EM 1:2bis affects the results of EM 3:4. Our interpretation is that in the latter case EM 1:2bis can bring some diversity to EM 3:4, while EM 1:2 cannot.
Furthermore, when we look at the collaborations EM 3:4 (1:2bis), EM 3:4 (1:2) and EM 3:4 (1:2+1:2bis), while two of them deteriorate the results in terms of classification purity, in all three cases the DB Index either remains the same or improves after the collaborative process. This means that, in spite of a worse purity, the separation of the clusters is better after the collaboration process. Finally, it is worth noting that EM 1:2 (3:4+1:2bis) and EM 3:4 (1:2+1:2bis) give different results, which shows that this collaboration framework is a directional process.

TABLE III. EXPERIMENTAL RESULTS OF THE COLLABORATIVE APPROACH ON THE WDBC DATA SET

Algorithm                    Purity   DB Index
EM 1:10
EM 11:20
EM 21:30
EM 1:10 (11:20+21:30)        90.07%
EM 11:20 (1:10+21:30)        74.87%
EM 21:30 (1:10+11:20)        93.32%

b) WDBC: In order to apply the proposed approach to the WDBC data set, we have split it into 3 subsets of 10 attributes each: the first subset features the first 10 attributes, the second one attributes 11 to 20, and the last one attributes 21 to 30. Then we ran the regular version of the EM algorithm on the 3 subsets and evaluated the purity and DB Index of the clustering results. Finally, we ran our collaborative EM algorithm on the 3 subsets again, using the results of the regular EM on the other 2 subsets for the collaboration process. This process is analogous to the one shown in Figure 1. We note EM 1:10 the results of the regular EM algorithm on the first 10 attributes, EM 11:20 the results on the 10 middle attributes, and EM 21:30 the results on the last 10 attributes.

In EM 1:10 (11:20+21:30), the average EM algorithm collaborates with both a weaker and a better performing EM, and as we can see the results are almost unchanged: the purity and the DB Index are slightly better. In EM 11:20 (1:10+21:30), the weakest EM algorithm collaborates with two better EM algorithms, and as could be expected both the purity and the DB Index are improved. In EM 21:30 (1:10+11:20), as the strongest EM algorithm is collaborating with two weaker ones, the purity is negatively impacted. However, due to the diversity between the different collaborating algorithms' results, this negative impact remains minimal and the DB Index actually slightly improves. For these 3 results, the collaborative process proved to be efficient for all algorithms when it comes to the DB Index: all DB Index scores were improved, even when collaborating with weaker algorithms. This is however not the case for the purity, which behaved in a more predictable way: collaborations with stronger algorithms improved the purity, while collaborations with weaker ones deteriorated it. This highlights the importance of carefully choosing the algorithms to collaborate with, both in terms of result quality and diversity.

TABLE IV. EXPERIMENTAL RESULTS OF THE COLLABORATIVE APPROACH ON THE SPAM BASE DATA SET

Algorithm                   Purity   DB Index
EM 1
EM 2
GTM a                       86.96%
GTM b                       86.59%
EM 1 (GTM a+GTM b)          69.20%
EM 2 (GTM a+GTM b)          74.87%

c) Spam Base: For our experiments on the Spam Base data set, during the local phase the EM algorithm proved to give irregular and average results (EM 1 and EM 2). Therefore, instead of splitting the data set and risking even worse performances, we decided to reinforce the EM algorithm results by enabling a collaboration with two results from the GTM (Generative Topographic Mapping) algorithm [20], applied twice on the data set with different parameters. This reinforcement process is analogous to the one shown in Figure 2.

With the help of the results acquired from the GTM algorithm, the results of our collaborative EM were greatly improved. In the case of EM 1 (GTM a+GTM b), starting from an EM algorithm giving results only slightly better than random, we obtained a sharp improvement in both the purity of the clustering and the DB Index. While the improvements shown in EM 2 (GTM a+GTM b) are less drastic, the positive effect of the collaborative reinforcement process cannot be denied. These results show the importance of the quality of the initial segmentation acquired during the local phase, as it has to be taken into account for the collaboration.

d) VHR Strasbourg: Unlike in the previous experiments, the exact number of clusters for this data set is unknown and depends on the level of detail that one wishes to have: from very large landscape elements such as cities, water bodies or forest areas, down to small details such as trees or cars, there are several possible scales of analysis, each leading to a different number of clusters. Expert geographers provided us with a ground truth containing 15 classes for this data set, but it can be reduced to between 7 and 9 potential clusters when merging elements that could not possibly be distinguished by an unsupervised algorithm. Therefore, instead of splitting the data, we opted for a multi-scale approach and ran the Semantic Rich ICM (SRICM) algorithm described in [21] looking for different numbers of clusters: 7, 8 and 9 clusters to match the expert ground truth, and 5 clusters as it gave the best results from a purely unsupervised point of view. We then ran the collaborative version of the MRF algorithms searching for 7 and 9 clusters, each collaborating with the 3 other results. This process is both a reinforcement process and a collaborative process.

The first important remark is that, unlike in our previous experiments, here the supervised index (Rand Index) always disagrees with the unsupervised Davies-Bouldin Index: good results from a clustering point of view are here the worst ones when compared to our ground truth. As one can see, SRICM 9 is the best result from a supervised point of view, while SRICM 5 and SRICM 7 are the best ones from an unsupervised point of view.
The collaboration SRICM 7 (SRICM 5+SRICM 8+SRICM 9) improves the DB Index value despite the poor DB Index values of SRICM 9 and SRICM 8. In this case, the collaboration leads to better clusters at the cost of a segmentation farther from the ground truth. In the case of SRICM 9 (SRICM 5+SRICM 8+SRICM 7), however, the collaboration does not bring any positive effect and deteriorates both the DB Index and the Rand Index. Our current explanation for these diverging results in very similar cases is that they are caused by our asymmetric probabilistic confusion matrices: since these matrices are asymmetric, they lead to a collaborative process that is equally asymmetric and does not produce the same improvement or deterioration for all the collaborating algorithms.

For a better analysis of the results, all images shown in Section C should be viewed in color and at full scale. Furthermore, these images are only small extracts from the full image. When we look at the 7-cluster segmentations in Figures 4 and 7, the following changes can be noticed after the collaborative process. First, not all the clusters have the same prototypes: the dark green cluster (dense vegetation) from the segmentation without collaboration does not exist anymore and has been merged with other vegetation clusters to form a single cluster for vegetation areas. The clusters representing the buildings are not exactly the same either. The main improvement can be seen when looking at the 3 big buildings near the top left corner of the image: their segmentation is almost homogeneous after the collaboration process. A similar improvement can be seen on the right of the central stadium, where the segmentation proposed after the collaborative process is correct whereas the one before is not. Another noticeable improvement is that there are fewer areas wrongly labeled as water after the collaborative process. These observations confirm what we had already hinted from the DB Index result: the collaborative process improved the quality of the segmentation in this case.

On the other hand, SRICM 9 (see Figure 6), which was the best segmentation (from a supervised and visual point of view) before the collaborative process, has been deteriorated by the 3 weaker algorithms with which the collaborative process was enabled. The most obvious deterioration is the apparent merging of the red and orange clusters, thus merging roads and buildings in Figure 8. Furthermore, many smaller clusters have seen their population shrink. It is already well documented that MRF-based algorithms for image segmentation tend to have the sparser clusters absorbed by the larger ones. This phenomenon is amplified by the fact that the collaboration process that we use is analogous to applying a second Markov Random Field on top of the original Markov Random Field used during the local step.

TABLE V. DB INDEX AND RAND INDEX RESULTS FOR THE DIFFERENT ALGORITHMS ON THE VHR STRASBOURG DATA SET

Algorithm                               Figure     DB Index   Rand Index
SRICM 5 clusters                        Figure 3
SRICM 7 clusters                        Figure 4
SRICM 8 clusters                        Figure 5
SRICM 9 clusters                        Figure 6
SRICM 7 (SRICM 5+SRICM 8+SRICM 9)       Figure 7
SRICM 9 (SRICM 5+SRICM 8+SRICM 7)       Figure 8

C. VHR Strasbourg Figures

Fig. 3. A 5-cluster segmentation using the regular SRICM algorithm.
Fig. 4. A 7-cluster segmentation using the SRICM algorithm.
Fig. 5. An 8-cluster segmentation using the SRICM algorithm.
Fig. 6. A 9-cluster segmentation using the SRICM algorithm.
Fig. 7. Result of the 7-cluster segmentation after collaboration.
Fig. 8. Result of the 9-cluster segmentation after collaboration.

V. CONCLUSION

In this study we have proposed a new collaborative framework for heterogeneous clustering algorithms in the context of horizontal collaboration. We have shown that our framework is flexible and can also be used in reinforcement-like schemes where one algorithm is improved by the results of several others. The strength of our approach is that it does not require the subsets, prototypes or models used by the different collaborating algorithms to be shared during the collaboration step; only the solution vectors produced by the algorithms need to be shared. Our framework is therefore much more generic than the previously proposed collaborative methods.

Our theoretical model has been validated on several data sets and the experimental results have shown competitive performances. However, these results were obtained with at most 4 collaborating algorithms and have to be extended to contexts with more collaborating clustering algorithms and more data sets. This would certainly be our first perspective to extend this work. One could also argue that in our experiments the collaborative process never ends up with a result better than that of the best algorithm from the local phase. While this did not happen during our experiments, it does not mean that it cannot happen at all. Furthermore, the notion of best result in unsupervised learning is highly dependent on the index considered, and it is impossible to know in advance which algorithm will give the best results. Therefore, in most cases, and especially with distributed data, our approach remains valid. Given this issue of strong and weak collaborators, another possible extension of this work would be to study the impact of the quality and diversity of the clustering results in order to weight, or choose, the algorithms with which to collaborate.

ACKNOWLEDGMENT

This work has been supported by the ANR Project COCLICO (ANR-12-MONU).

REFERENCES

[1] A. Strehl and J. Ghosh, "Cluster ensembles - a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, pp. 583-617, 2002.
[2] A. D. Silva, Y. Lechevallier, F. de A. T. de Carvalho, and B. Trousse, "Mining web usage data for discovering navigation clusters," in ISCC, P. Bellavista, C.-M. Chen, A. Corradi, and M. Daneshmand, Eds. IEEE Computer Society, 2006.
[3] A. Cornuéjols and C. Martin, "Unsupervised object ranking using not even weak experts," in ICONIP (1), ser. Lecture Notes in Computer Science, B.-L. Lu, L. Zhang, and J. T. Kwok, Eds. Springer, 2011.
[4] K. Jong, J. Mary, A. Cornuéjols, E. Marchiori, and M. Sebag, "Ensemble feature ranking," in PKDD, ser. Lecture Notes in Computer Science, J.-F. Boulicaut, F. Esposito, F. Giannotti, and D. Pedreschi, Eds. Springer, 2004.
[5] W. Pedrycz and K. Hirota, "A consensus-driven fuzzy clustering," Pattern Recognition Letters, vol. 29, no. 9, 2008.
[6] W. Pedrycz, "Collaborative fuzzy clustering," Pattern Recognition Letters, vol. 23, no. 14, pp. 1675-1686, 2002.
[7] M. Ghassany, N. Grozavu, and Y. Bennani, "Collaborative clustering using prototype-based techniques," International Journal of Computational Intelligence and Applications, vol. 11, no. 3, 2012.
[8] B. Depaire, R. Falcon, K. Vanhoof, and G. Wets, "PSO driven collaborative clustering: a clustering algorithm for ubiquitous environments," Intelligent Data Analysis, vol. 15, 2011.
[9] M. Ghassany, N. Grozavu, and Y. Bennani, "Collaborative clustering using prototype-based techniques," International Journal of Computational Intelligence and Applications, vol. 11, no. 3, 2012.
[10] N. Grozavu and Y. Bennani, "Topological collaborative clustering," Australian Journal of Intelligent Information Processing Systems, vol. 12, no. 3, 2010.
[11] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1-38, 1977.
[12] J. Besag, "On the statistical analysis of dirty pictures," Journal of the Royal Statistical Society, Series B, vol. 48, no. 3, pp. 259-302, 1986.
[13] G. Forestier, P. Gancarski, and C. Wemmert, "Collaborative clustering with background knowledge," Data & Knowledge Engineering, vol. 69, no. 2, 2010.
[14] S. Roth and M. J. Black, "Fields of experts," in Markov Random Fields for Vision and Image Processing, A. Blake, P. Kohli, and C. Rother, Eds. MIT Press, 2011.
[15] Y. Zhang, M. Brady, and S. M. Smith, "Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm," IEEE Transactions on Medical Imaging, vol. 20, no. 1, pp. 45-57, 2001.
[16] A. Frank and A. Asuncion, "UCI machine learning repository," 2010. [Online].
[17] S. Rougier and A. Puissant, "Improvements of urban vegetation segmentation and classification using multi-temporal Pleiades images," in 5th International Conference on Geographic Object-Based Image Analysis, p. 6.
[18] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224-227, 1979.
[19] W. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, pp. 846-850, 1971.
[20] C. M. Bishop, M. Svensén, and C. K. I. Williams, "GTM: the generative topographic mapping," Neural Computation, vol. 10, pp. 215-234, 1998.
[21] J. Sublime, A. Cornuéjols, and Y. Bennani, "A new energy model for the hidden Markov random fields," in ICONIP 2014, Part II, ser. Lecture Notes in Computer Science, vol. 8835, 2014.


More information

A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models

A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models Gleidson Pegoretti da Silva, Masaki Nakagawa Department of Computer and Information Sciences Tokyo University

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering

More information

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim,

More information

MIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA

MIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA Exploratory Machine Learning studies for disruption prediction on DIII-D by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA Presented at the 2 nd IAEA Technical Meeting on

More information

A Population Based Convergence Criterion for Self-Organizing Maps

A Population Based Convergence Criterion for Self-Organizing Maps A Population Based Convergence Criterion for Self-Organizing Maps Lutz Hamel and Benjamin Ott Department of Computer Science and Statistics, University of Rhode Island, Kingston, RI 02881, USA. Email:

More information

Machine Learning nearest neighbors classification. Luigi Cerulo Department of Science and Technology University of Sannio

Machine Learning nearest neighbors classification. Luigi Cerulo Department of Science and Technology University of Sannio Machine Learning nearest neighbors classification Luigi Cerulo Department of Science and Technology University of Sannio Nearest Neighbors Classification The idea is based on the hypothesis that things

More information

Improving interpretability in approximative fuzzy models via multi-objective evolutionary algorithms.

Improving interpretability in approximative fuzzy models via multi-objective evolutionary algorithms. Improving interpretability in approximative fuzzy models via multi-objective evolutionary algorithms. Gómez-Skarmeta, A.F. University of Murcia skarmeta@dif.um.es Jiménez, F. University of Murcia fernan@dif.um.es

More information

COMBINING HIGH SPATIAL RESOLUTION OPTICAL AND LIDAR DATA FOR OBJECT-BASED IMAGE CLASSIFICATION

COMBINING HIGH SPATIAL RESOLUTION OPTICAL AND LIDAR DATA FOR OBJECT-BASED IMAGE CLASSIFICATION COMBINING HIGH SPATIAL RESOLUTION OPTICAL AND LIDAR DATA FOR OBJECT-BASED IMAGE CLASSIFICATION Ruonan Li 1, Tianyi Zhang 1, Ruozheng Geng 1, Leiguang Wang 2, * 1 School of Forestry, Southwest Forestry

More information

Swarm Based Fuzzy Clustering with Partition Validity

Swarm Based Fuzzy Clustering with Partition Validity Swarm Based Fuzzy Clustering with Partition Validity Lawrence O. Hall and Parag M. Kanade Computer Science & Engineering Dept University of South Florida, Tampa FL 33620 @csee.usf.edu Abstract

More information

Cluster Analysis. Angela Montanari and Laura Anderlucci

Cluster Analysis. Angela Montanari and Laura Anderlucci Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a

More information

Using a genetic algorithm for editing k-nearest neighbor classifiers

Using a genetic algorithm for editing k-nearest neighbor classifiers Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Remote Sensing & Photogrammetry W4. Beata Hejmanowska Building C4, room 212, phone:

Remote Sensing & Photogrammetry W4. Beata Hejmanowska Building C4, room 212, phone: Remote Sensing & Photogrammetry W4 Beata Hejmanowska Building C4, room 212, phone: +4812 617 22 72 605 061 510 galia@agh.edu.pl 1 General procedures in image classification Conventional multispectral classification

More information

Learning to Segment Document Images

Learning to Segment Document Images Learning to Segment Document Images K.S. Sesh Kumar, Anoop Namboodiri, and C.V. Jawahar Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, India Abstract.

More information

North Asian International Research Journal of Sciences, Engineering & I.T.

North Asian International Research Journal of Sciences, Engineering & I.T. North Asian International Research Journal of Sciences, Engineering & I.T. IRJIF. I.F. : 3.821 Index Copernicus Value: 52.88 ISSN: 2454-7514 Vol. 4, Issue-12 December-2018 Thomson Reuters ID: S-8304-2016

More information

Nearly-optimal associative memories based on distributed constant weight codes

Nearly-optimal associative memories based on distributed constant weight codes Nearly-optimal associative memories based on distributed constant weight codes Vincent Gripon Electronics and Computer Enginering McGill University Montréal, Canada Email: vincent.gripon@ens-cachan.org

More information

A Comparison of Resampling Methods for Clustering Ensembles

A Comparison of Resampling Methods for Clustering Ensembles A Comparison of Resampling Methods for Clustering Ensembles Behrouz Minaei-Bidgoli Computer Science Department Michigan State University East Lansing, MI, 48824, USA Alexander Topchy Computer Science Department

More information

Comparision between Quad tree based K-Means and EM Algorithm for Fault Prediction

Comparision between Quad tree based K-Means and EM Algorithm for Fault Prediction Comparision between Quad tree based K-Means and EM Algorithm for Fault Prediction Swapna M. Patil Dept.Of Computer science and Engineering,Walchand Institute Of Technology,Solapur,413006 R.V.Argiddi Assistant

More information

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

A Content Based Image Retrieval System Based on Color Features

A Content Based Image Retrieval System Based on Color Features A Content Based Image Retrieval System Based on Features Irena Valova, University of Rousse Angel Kanchev, Department of Computer Systems and Technologies, Rousse, Bulgaria, Irena@ecs.ru.acad.bg Boris

More information

Textural Features for Image Database Retrieval

Textural Features for Image Database Retrieval Textural Features for Image Database Retrieval Selim Aksoy and Robert M. Haralick Intelligent Systems Laboratory Department of Electrical Engineering University of Washington Seattle, WA 98195-2500 {aksoy,haralick}@@isl.ee.washington.edu

More information

Data: a collection of numbers or facts that require further processing before they are meaningful

Data: a collection of numbers or facts that require further processing before they are meaningful Digital Image Classification Data vs. Information Data: a collection of numbers or facts that require further processing before they are meaningful Information: Derived knowledge from raw data. Something

More information

Contents. Preface to the Second Edition

Contents. Preface to the Second Edition Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................

More information

COMS 4771 Clustering. Nakul Verma

COMS 4771 Clustering. Nakul Verma COMS 4771 Clustering Nakul Verma Supervised Learning Data: Supervised learning Assumption: there is a (relatively simple) function such that for most i Learning task: given n examples from the data, find

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 12 Combining

More information

Motivation. Technical Background

Motivation. Technical Background Handling Outliers through Agglomerative Clustering with Full Model Maximum Likelihood Estimation, with Application to Flow Cytometry Mark Gordon, Justin Li, Kevin Matzen, Bryce Wiedenbeck Motivation Clustering

More information

Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications

Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications Anil K Goswami 1, Swati Sharma 2, Praveen Kumar 3 1 DRDO, New Delhi, India 2 PDM College of Engineering for

More information

Background Subtraction in Video using Bayesian Learning with Motion Information Suman K. Mitra DA-IICT, Gandhinagar

Background Subtraction in Video using Bayesian Learning with Motion Information Suman K. Mitra DA-IICT, Gandhinagar Background Subtraction in Video using Bayesian Learning with Motion Information Suman K. Mitra DA-IICT, Gandhinagar suman_mitra@daiict.ac.in 1 Bayesian Learning Given a model and some observations, the

More information

CHAPTER 6 PERCEPTUAL ORGANIZATION BASED ON TEMPORAL DYNAMICS

CHAPTER 6 PERCEPTUAL ORGANIZATION BASED ON TEMPORAL DYNAMICS CHAPTER 6 PERCEPTUAL ORGANIZATION BASED ON TEMPORAL DYNAMICS This chapter presents a computational model for perceptual organization. A figure-ground segregation network is proposed based on a novel boundary

More information