Multi-Clustering Centers Approach to Enhancing the Performance of SOM Clustering Ability


JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 25, 1087-1102 (2009)

Multi-Clustering Centers Approach to Enhancing the Performance of SOM Clustering Ability

CHING-HWANG WANG AND CHIH-HAN KAO*
Department of Construction Engineering, National Taiwan University of Science and Technology, Taipei, 106 Taiwan
*Department of Construction Engineering, National Kinmen Institute of Technology, Kinmen, 892 Taiwan

This paper modifies the weight-adjustment mechanism of the Self-Organizing Map network (SOM) to solve two problems of clustering analysis: topology preservation and the clarification of cluster boundaries in the clustering graph. The modified SOM is named the Multiple Clustering Centers SOM (MCC-SOM). The MCC-SOM changes the winner-takes-all mechanism of competitive learning to allow more than one clustering center, so that a graph of neighboring clusters with blurred boundaries is focused on each of the cluster centers and the boundary of each cluster is highlighted. The mechanism also sets the output units of the topology automatically, without manual tuning, and promotes the topology preservation of the graph through the consistency between the standard of weight adjustment and the case. The case studies show that the MCC-SOM improves the performance of the SOM model: with the MCC-SOM, analysts can use the output topology to produce a more precise classification of cases and raise the accuracy of subsequent predicting or classifying models.

Keywords: clustering analysis, output topology, topology preserving, multiple clustering centers, self-organizing mapping

1. INTRODUCTION

Clustering analysis is an effective data-extraction tool in the data mining process. Most of the original clustering analyses use the k-means method, but the k-means method filters the noise of the background environment poorly and must preset the number of clusters before calculation [1]. These shortcomings cause clusters with blurred boundaries to be disregarded in the topology and make the output imprecise. Several modified clustering algorithms have been proposed, but they cannot remove the presetting of the calculation or the pre-processing of the data [2]. Vesanto [3] noted that subsequent clustering research adopted an unsupervised neural network, the Self-Organizing Map (denoted as SOM in this paper), as a clustering algorithm. The SOM model is fault tolerant and does not have to preset the number of clusters; it is more flexible than the other clustering algorithms and has become one of the major algorithms.

Received September 10, 2007; revised April 9 & July 18, 2008; accepted August 22, 2008. Communicated by Chung-Yu Wu. *Corresponding author.

However, when the SOM model is used for clustering analysis, it must still solve the problems of topology preservation and of clarifying the boundaries of the clustering graph [4, 5]. When the SOM is applied to construct predicting models for construction-engineering decision making, the historical cases of construction engineering do not share consistent background conditions or evaluation standards. The SOM model is therefore usually used to create the desired outputs of the training data; in other words, it pre-processes the data for constructing the predicting model. However, because of the fault tolerance of the SOM model, its flexible network structure allows the topology to twist, which destroys topology preservation. Moreover, the analysts must use a precise classification result so that the training process of the subsequent supervised neural networks can be simplified. For this requirement, the boundary between neighboring clusters needs to be more specific than in the other application fields of the SOM model.

Lo [6] proposed optimizing the learning parameters of the SOM to deal with the problems mentioned above. The modified mechanisms he cited all optimize the learning environment of the SOM's weight adjustment and share the goal of promoting the convergence of network learning. The modified mechanisms of weight adjustment can thus be directly related to promoting the topology preservation of the SOM. To overcome these shortcomings, this paper develops a modified SOM model with a new weight-adjustment mechanism. The modified SOM model improves the topology preservation of the SOM, clarifies the shape of the clustering graph, and raises the accuracy of the SOM in clustering analysis.

2. BASIC CONCEPT OF MODELING

Following the discussion above, this paper surveys the related literature on three issues in order to construct the basic concept of the modified SOM model: the relation between weight adjustment and topology preservation, the feasibility of multiple clustering centers, and the modification of competitive theory.

First, this paper discusses the relation between weight adjustment and topology preservation. Dittenbach [7] proposed a topology that adopts the difference between the feature values of cases and the weights of output units as the evaluation standard for structural adjustment. The adjusted distribution of output units increases or decreases the number of output units so that the distribution of the output points of cases matches the original shape of the mapping graph of the input features. Output units are deleted to decrease the influence on parts of the topology that the imported cases have not affected, and output units are flexibly added to increase the data-handling ability of units located in high-density zones of the original topology. This mechanism of adjusting output units positively influences the correctness of weight adjustment and, likewise, improves topology preservation.

Second, attention is directed to the feasibility of multiple clustering centers. This paper draws the following viewpoint from the topology-preservation evaluation model in Vesanto's research [3]: toward the goal of promoting topology preservation, the evaluation equation of topology preservation admits multiple benchmarks of topology-distance measurement (clustering centers), which shows that multiple clustering centers within a single cluster are possible. In addition, Martinetz [8, 9] proposed the soft-max concept used in the fuzzy c-means cluster model. The fuzzy c-means model applies fuzzy theory to set a flexible threshold for electing the clustering center, so that the definition of the clustering center is not limited by the rule that the output unit must be absolutely the most similar to the case. This enhances the ability of the clustering center to represent the original information. Through this discussion, this paper again shows that the clustering center can be diversified.

Finally, this paper discusses the modification of the competitive theory. DeSieno [10] proposed the conscience mechanism of modified competitive learning. He argued that weight adjustment must allow for the possibility that output units other than the clustering center in the same cluster may also represent cases well. The weight-adjusting mechanism of the SOM should therefore restrain the influence of the cluster center on the weight adjustment of the other output units and increase the influence of all output units, so that the standard of the weight-adjustment equation does not come from the clustering center alone. Moreover, Si [11] proposed the concept of winner-takes-quota, which likewise claims that the treatment of the winner unit needs amending so that potential output units that need adjustment are not ignored. On the basis of these algorithms, this paper confirms the feasibility that a single cluster with multiple clustering centers can reach the optimized weight combination for the best topology preservation and the optimum output topology.

As discussed above, this paper confirms the feasibility of improving the performance of the SOM model by adopting the multiple-clustering-centers mechanism, and the authors implement this basic concept to construct the Multi-Clustering Center Self-Organizing Map model (MCC-SOM). As Fig. 1 details, the MCC-SOM replaces the original competitive rule: instead of adjusting the number of output units, it keeps a fixed number of output units while allowing a flexible number of clustering centers at changeable locations in the topology space. It therefore reduces the weight-adjustment influence of the single clustering center and attends to the weight-adjustment influence of the other output units. More precisely, this paper constructs the modified SOM model by replacing the winner-takes-all rule of weight adjustment of the SOM, the original rule of competitive learning in which the winning output unit is the single clustering center. Under that rule, the clustering center is the benchmark for measuring the topology distance between itself and the other output units in the neighborhood area, and the weight adjustment moves all output units in the neighborhood toward the clustering center (as shown in Fig. 2).

Fig. 1. The basic concept of the MCC-SOM model: identify the clustering centers, reduce the influence strength of the main clustering center, add the weight-adjusting influence of the other output units with representative ability, and build a weight-adjusting mechanism for each location combination of the clustering centers.

Fig. 2. The neighboring area of the single clustering center (main cluster j in topology space A).

Fig. 3. The neighboring area of the multiple clustering centers (main cluster j and a sub-cluster in topology space A).

The MCC-SOM model uses the difference between the weights of an output unit and the feature values of the cases as the electing threshold of the sub-clustering center. The sub-clustering centers are the sub-best-matching units, output units whose weight values are close to the feature values of the cases; they add further references for adjusting the weights of the other output units in the neighborhood area (as shown in Fig. 3). The competitive rule modified by the multiple clustering centers leads the weight adjustment of each output unit to conform more closely to the characteristics of the cases. Moreover, the modified weight-adjusting mechanism has the advantage of a higher level of integration for the output units, and it avoids adding too many output units. Excessive output units increase the computing cost and generate a wide distribution of the output points of cases, leading to clusters with blurred boundaries. The mechanism of multiple clustering centers instead calculates multiple standards of weight adjustment at the same time.

3. PROCESS OF MODEL CONSTRUCTION

This paper establishes the modified competitive rule with multiple clustering centers as the new basic concept of the weight adjustment of the SOM, and uses it to construct the MCC-SOM model. Fig. 4 shows the calculation process of the MCC-SOM model.

Fig. 4. The flow chart of the calculation of the MCC-SOM model: initial setting of the parameters (number of dimensions, number of output units, shape of topology, initial weight values, radius of the neighborhood area); identification of the clustering centers (calculating the EI and net of each output unit, examining the threshold, electing the main clustering center and the sub-clustering centers); determining the relative locations of the clustering centers; calculating either the topology distances to each clustering center or the location of the virtual clustering center and its topology distance; weight adjusting; reducing the radius of the neighborhood area; examining the learning-termination criterion (epoch equal to the preset value); and finally the analysis of unknown cases and the recognized cluster output.
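To make the flow of Fig. 4 concrete, the following is a condensed Python sketch of one possible reading of the MCC-SOM training loop. It is an illustration under simplifying assumptions, not the authors' implementation: every elected center receives an equal influence share instead of the Eq. (3.1)/(3.2) case split developed below, and the linear decay schedules, seed, and variable names are assumptions.

```python
import numpy as np

def train_mcc_som(X, M=6, N=6, epochs=1000, eta0=0.9, sigma0=1.0, S=1.25, seed=0):
    """Illustrative MCC-SOM training loop following Fig. 4 (not the authors' code)."""
    rng = np.random.default_rng(seed)
    K = X.shape[1]
    W = rng.random((M, N, K))                        # initial weights w_kmn
    grid = np.stack(np.meshgrid(np.arange(M), np.arange(N), indexing="ij"), axis=-1)
    for t in range(epochs):
        eta = eta0 * (1.0 - t / epochs)              # assumed decaying learning rate
        sigma = max(sigma0 * (1.0 - t / epochs), 1e-3)  # assumed shrinking radius
        x = X[rng.integers(len(X))]                  # randomly imported case
        EI = np.abs(x - W).sum(axis=2)               # Eq. (1): per-unit index
        c1 = np.unravel_index(np.argmin(EI), EI.shape)           # main center
        subs = [tuple(p) for p in np.argwhere(EI <= S * EI[c1])
                if tuple(p) != c1]                   # Eq. (2): sub-centers
        centers = [c1] + subs
        share = 1.0 / len(centers)                   # assumed equal influence shares
        center_w = {c: W[c].copy() for c in centers}  # snapshot W(t) of the centers
        for c in centers:
            r = np.linalg.norm(grid - np.asarray(c), axis=2)     # topology distances
            H = np.exp(-r ** 2 / (2 * sigma ** 2))   # Gaussian neighborhood function
            mask = (r <= sigma)[..., None]           # only units inside the radius
            W += share * eta * H[..., None] * (x - center_w[c]) * mask
    return W

# Stand-in data: 868 cases with 12 normalized financial ratios, as in the case study
X = np.random.default_rng(1).random((868, 12))
W = train_mcc_som(X)
```

The per-equation sketches in the following subsections unpack the individual steps of this loop.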

3.1 Initial Setting Items of MCC-SOM

In the first step, the MCC-SOM model sets the output type of the topology; its parameters are the number of dimensions and the coordinate system of the topology. The topology space is usually presented in two dimensions, and this paper uses the two-dimensional coordinates (m, n) as the code of an output unit in the topology. The model also selects the shape of the topology, the number of output units, and so on. Next, the model sets the radius σ and the reducing function of the neighborhood so that the learning of the weight adjustment converges, and it selects the measuring function of the topology distance as an important variable of the weight adjustment. Finally, the model sets the initial value of the kth weight of the output unit at coordinate location (m, n), denoted $w_{kmn}$ in this paper. The authors use $W_{mn} = (w_{1mn}, w_{2mn}, \ldots, w_{kmn})$ to present the weight group of an output unit. Moreover, the model normalizes the input values of the cases into feature vectors, denoted $X = [x_1, x_2, \ldots, x_k]$, which are imported into the calculation of the MCC-SOM model.

The initial parameters of the MCC-SOM model are listed in Table 1; they are the basis of the following calculations.

Table 1. The initial parameters of the MCC-SOM model.
| Item | Value |
| Amount of input layers | 1 |
| Amount of feature vectors | 12 |
| Amount of adding vectors | 0 ~ 1 |
| Amount of output layers | 1 |
| Range of the output | 0 ~ 1 |
| Topology function | Grid topology |
| Amount of hidden layers | 0 |
| Topology distance function | Euclidean |
| Radius of the neighboring area | 1 |
| Amount of output units | 6 * 6 |
| Reducing function of the radius of the neighboring area | Gauss |
| Threshold of sub-clustering-center electing | 1.25 * net |
| Amount of weights | 36 * 12 |
| Learning rate | 0.9 |
| Amount of iterations | 1000 |

3.2 Electing Terms of the Clustering Center and Sub-Clustering Center

This paper develops the MCC-SOM model by modifying the electing rule of the clustering center in the competitive theory. The model imports the feature values of samples and calculates the differences between the weights of each output unit and the feature values (each per-dimension difference is denoted net in this paper). The sum of the nets of one output unit is used as an index (denoted EI in this paper) for filtering the clustering center. The output unit with the lowest EI value is chosen as the Best Matching Unit (denoted B.M.U. in this paper). The election of the clustering center is expressed in Eq. (1):

$(i_1, j_1) = \arg\min_{(m,n)} \mathrm{EI}_{mn}$, where $\mathrm{EI}_{mn} = \sum_k |x_k - w_{kmn}|$ (B.M.U.). (1)
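As a concrete illustration of the initialization of section 3.1 and the election in Eq. (1), the following minimal Python sketch uses the 6 * 6 map and 12 feature dimensions of Table 1; the variable names and random initialization are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 6, 6, 12                       # map size and feature dimension (Table 1)
W = rng.random((M, N, K))                # initial weights w_kmn
sigma, eta = 1.0, 0.9                    # neighborhood radius and learning rate

x = rng.random(K)                        # one normalized case X = [x_1, ..., x_K]

# net is the per-dimension difference; EI sums it per output unit
net = np.abs(x - W)                      # shape (M, N, K)
EI = net.sum(axis=2)                     # shape (M, N)

# Eq. (1): the unit with the lowest EI is the Best Matching Unit
i1, j1 = np.unravel_index(np.argmin(EI), EI.shape)
print("B.M.U. (main clustering center):", (int(i1), int(j1)))
```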

The basic concept of the clustering center is that it is the output unit most similar to the characteristics of the samples; the model therefore selects the output unit with the smallest EI value as the clustering center. The model extends this basic concept to allow more clustering centers among the output units with low EI values. The MCC-SOM model thus selects as sub-clustering centers those output units whose EI values are close to the lowest EI value, that is, whose EI is lower than the threshold. The sub-clustering centers are authorized to influence the weight adjustment of the other output units in the same neighborhood area. (To discriminate easily between the clustering center and the sub-clustering centers, this paper names the clustering center the main clustering center in the following description.) The election of a sub-clustering center is expressed in Eq. (2):

$(i_2, j_2) = \arg\min_{(m,n) \neq (i_1, j_1)} \mathrm{EI}_{mn}$; if $\mathrm{EI}(i_2, j_2) \le S \cdot \mathrm{EI}(i_1, j_1)$, then $Y(i_2, j_2) = 1$ (sub-B.M.U.). (2)

The value of Y in Eq. (2) is a binary index indicating whether an output unit is a clustering center or not. The topology distance between a clustering center and the other output units is the important variable of the following weight adjustment of the MCC-SOM model.

3.3 Weight Adjustment by the Combination of Main Clustering Center and Sub-Clustering Center

After the clustering centers have been identified, the model calculates the topology distances between each output unit of a cluster and the main clustering center. When the single clustering center of a cluster is replaced by multiple clustering centers, the weight-adjustment equations of the output units are modified to use multiple topology distances; this is the extending algorithm of the multiple clustering centers. The neighborhood area of a cluster is defined as the area enclosed around the main clustering center or a sub-clustering center within the radius (denoted σ in this paper); it defines whether the output units need to adjust their weights or not. The MCC-SOM model evaluates two variables to select the equation of weight adjustment: one is the topology distance between the two types of clustering centers (denoted $r_{ij}$ in this paper), and the other is the radius σ of the neighborhood area. Once the relative locations of the two types of clustering centers have been defined, there are two types of intersection between the neighborhood area of the main clustering center and that of the sub-clustering center: one where the neighborhood areas are almost fully overlapping, and one where they are not overlapping. The weight-adjusting equations of the output units differ accordingly.
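The sub-center election of Eq. (2) and the overlap test that selects between the two regimes of section 3.3 can be sketched as follows. The threshold S = 1.25 and radius σ = 1 follow Table 1; using the weight-space distance between centers for the overlap test mirrors the entry condition of Eq. (3.1) as reconstructed below, and the remaining names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 6, 12))               # weight map trained so far
x = rng.random(12)                       # one imported case
S, sigma = 1.25, 1.0                     # Table 1 values

EI = np.abs(x - W).sum(axis=2)
i1, j1 = np.unravel_index(np.argmin(EI), EI.shape)   # main clustering center

# Eq. (2): Y marks the sub-B.M.U.s, units whose EI stays within S * min(EI)
Y = EI <= S * EI[i1, j1]
Y[i1, j1] = False                        # exclude the main clustering center

for i2, j2 in np.argwhere(Y):
    # weight-space distance between the two centers decides the regime
    if np.linalg.norm(W[i1, j1] - W[i2, j2]) <= 0.1 * sigma:
        print((int(i2), int(j2)), "-> almost fully overlapping: use Eq. (3.1)")
    else:
        print((int(i2), int(j2)), "-> separated neighborhoods: use Eq. (3.2)")
```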

1. Neighborhood Areas that are Almost Fully Overlapping

When the locations of the main clustering center and the sub-clustering center are very close, this paper defines the standard of overlapping neighborhood areas as the topology distance between the two types of clustering centers being lower than 0.1σ. Because the neighborhood areas of the two clustering centers almost coincide, the difference between the topology distances from the same output unit to each clustering center in the same cluster is small. The model therefore uses the proportional relation of the distances between each clustering center and the same output unit to calculate the averages of the weights of each dimension. These averaged weights define a virtual clustering center that presents the combination of the main clustering center and the sub-clustering center. The weight calculation of the virtual clustering center and the weight adjustments of the output units are expressed in Eq. (3.1):

If $\big[\sum_k (w_{ki_1j_1} - w_{ki_2j_2})^2\big]^{1/2} \le 0.1\sigma$, then
  $r_{ij} = S_1 r_{i_1j_1} + S_2 r_{i_2j_2}$;
  if $\sigma \ge r_{ij}$, then
    $W_{ij} = S_1 W_{i_1j_1} + S_2 W_{i_2j_2}$,
    $H_{ij} = \exp(-r_{ij}^2 / 2\sigma^2)$,
    $W_{mn}(t+1) = W_{mn}(t) + \eta\, H_{ij}\, [X - W_{ij}(t)]$;
  else $W_{mn}(t+1) = W_{mn}(t)$;
else go to Eq. (3.2). (3.1)

In each iteration of the learning process, the model refers to the location of the virtual clustering center to calculate, by Eq. (3.1), the topology distance and the range of the weight adjustments of the output units. When the topology distance between the two types of clustering centers is higher than the preset threshold for selecting the virtual clustering center, the topology distance and the range of weight adjustments are calculated with Eq. (3.2), developed in the next subsection.
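A sketch of the virtual-center update of Eq. (3.1) follows. The proportioning weights S1 and S2 are assumed inputs here (the paper derives them from the proportional relation of the distances but does not print the exact rule), so equal shares are used purely for illustration.

```python
import numpy as np

def update_almost_overlapping(W, x, c1, c2, sigma, eta, S1=0.5, S2=0.5):
    """Apply Eq. (3.1) when the two centers' weights differ by <= 0.1 * sigma."""
    if np.linalg.norm(W[c1] - W[c2]) > 0.1 * sigma:
        return False                       # not overlapping: fall through to Eq. (3.2)
    W_virtual = S1 * W[c1] + S2 * W[c2]    # weights of the virtual clustering center
    M, N, _ = W.shape
    for m in range(M):
        for n in range(N):
            r1 = np.hypot(m - c1[0], n - c1[1])   # distance to the main center
            r2 = np.hypot(m - c2[0], n - c2[1])   # distance to the sub-center
            r = S1 * r1 + S2 * r2                 # combined topology distance r_ij
            if r <= sigma:                        # inside the neighborhood area
                H = np.exp(-r ** 2 / (2 * sigma ** 2))
                W[m, n] += eta * H * (x - W_virtual)
            # else: W_mn(t+1) = W_mn(t), weight unchanged
    return True

rng = np.random.default_rng(0)
W, x = rng.random((6, 6, 12)), rng.random(12)
update_almost_overlapping(W, x, (2, 2), (2, 3), sigma=1.0, eta=0.9)
```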

2. Neighborhood Areas that are not Overlapping

When most of the neighborhood areas of the main clustering center and the sub-clustering center are separated, the weight adjustment of the output units in the neighborhood area of a clustering center of the MCC-SOM is the same as in the original SOM. However, the candidate output units of the sub-clustering center lie in the neighborhood area of the main clustering center, so the weight adjustment must also consider the influence of the sub-clustering center on the output units located in the overlapping part of the two neighborhood areas. When the model adjusts the weights of output units located in the non-overlapping parts of the neighborhood areas, it treats the main clustering center or the sub-clustering center as an independent system. The corresponding equations are expressed in Eq. (3.2), where $r1_{ij}$ and $r2_{ij}$ denote the topology distances from unit (m, n) to the main clustering center and the sub-clustering center:

If $\sigma \ge r1_{ij}$ and $\sigma \ge r2_{ij}$, then $W_{mn}(t+1) = W_{mn}(t) + S_1 \eta_1 H_{i_1j_1}[X - W_{i_1j_1}(t)] + S_2 \eta_2 H_{i_2j_2}[X - W_{i_2j_2}(t)]$;
if $\sigma \ge r1_{ij}$ and $\sigma < r2_{ij}$, then $W_{mn}(t+1) = W_{mn}(t) + S_1 \eta_1 H_{i_1j_1}[X - W_{i_1j_1}(t)]$;
if $\sigma < r1_{ij}$ and $\sigma \ge r2_{ij}$, then $W_{mn}(t+1) = W_{mn}(t) + S_2 \eta_2 H_{i_2j_2}[X - W_{i_2j_2}(t)]$;
else $W_{mn}(t+1) = W_{mn}(t)$. (3.2)

Once the topology distance between the two types of clustering centers and the radius of the neighborhood area have been evaluated, the MCC-SOM model can select the appropriate equation of weight adjustment.

3.4 Examining Termination Criteria

Because the MCC-SOM model is an unsupervised neural network, the decision to proceed with the next iteration is made against a preset number of epochs. The termination criterion is expressed in Eq. (4):

If $t > t_{\mathrm{termin}}$, then the learning procedure is terminated. (4)

If the present epoch is smaller than the preset limit, the MCC-SOM model randomly imports the feature values of the next case. Additionally, as the epoch count increases, the model reduces the radius of the neighborhood area and the learning rate for the next iteration, which helps the learning of the MCC-SOM model converge. The neighborhood function of the next iteration is also recalculated, as expressed in Eq. (5):

$H_{mn} = \exp(-r_{mn}^2 / 2\sigma^2)$ (5)

The reduced range of the neighborhood function is calculated by Eq. (5); the ratio between the topology distance of the output unit and the radius of the neighborhood, in the exponent, determines the reduced range of the neighborhood area.

3.5 Clustering Analysis of Unknown Cases

After the weights of the MCC-SOM model have been adjusted, the feature values of an unknown case are imported into the model and compared with the weights of all output units in the topology to find the output unit with the smallest EI value. The location of the selected output unit is the location of the unknown case mapped into the topology space, and this output unit indicates the features of the unknown case. The cluster of the unknown case is identified by the location of this output unit. Additionally, the distributed density of all cases can be presented by the output topology.
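The following sketch applies Eq. (3.2) as reconstructed above (units are pulled by the centers' errors $X - W_{\text{center}}$, matching the equation as printed) and then classifies one unknown case as in section 3.5. The shares S1, S2 and the per-center learning rates η1, η2 are assumed parameters.

```python
import numpy as np

def update_separated(W, x, c1, c2, sigma, eta1=0.9, eta2=0.9, S1=0.5, S2=0.5):
    """Apply the reconstructed Eq. (3.2) for mostly separated neighborhoods."""
    wc1, wc2 = W[c1].copy(), W[c2].copy()     # snapshot the centers' W(t)
    M, N, _ = W.shape
    for m in range(M):
        for n in range(N):
            r1 = np.hypot(m - c1[0], n - c1[1])   # distance to the main center
            r2 = np.hypot(m - c2[0], n - c2[1])   # distance to the sub-center
            H1 = np.exp(-r1 ** 2 / (2 * sigma ** 2))
            H2 = np.exp(-r2 ** 2 / (2 * sigma ** 2))
            if r1 <= sigma and r2 <= sigma:       # overlapping part
                W[m, n] += (S1 * eta1 * H1 * (x - wc1)
                            + S2 * eta2 * H2 * (x - wc2))
            elif r1 <= sigma:                     # main neighborhood only
                W[m, n] += S1 * eta1 * H1 * (x - wc1)
            elif r2 <= sigma:                     # sub neighborhood only
                W[m, n] += S2 * eta2 * H2 * (x - wc2)
            # else: outside both neighborhoods, the weight is unchanged

rng = np.random.default_rng(0)
W, x = rng.random((6, 6, 12)), rng.random(12)
update_separated(W, x, (1, 1), (4, 4), sigma=1.0)

# classify an unknown case (section 3.5): map it to the unit with the lowest EI
x_new = rng.random(12)
EI = np.abs(x_new - W).sum(axis=2)
print("unknown case maps to unit", np.unravel_index(np.argmin(EI), EI.shape))
```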

3.6 Evaluated Index of Clustering Performance

To evaluate whether the MCC-SOM model meets the data-processing requirements of clustering, this paper needs objective indices of clustering performance to measure how much the MCC-SOM model improves clustering ability. When the SOM model is applied to clustering analysis, there are two requirements: topology preservation and the focusing status of the topology graph of each cluster. First, topology preservation represents the accuracy of the output: the output layer is connected by the weight combination, and the weight combination ensures that the information is not twisted in the process of mapping. It evaluates whether the MCC-SOM model satisfies the topology-preservation requirement of clustering analysis. Second, the focusing status of the topology graph of a cluster presents the precise location of the cluster; it avoids confusing the classification of output points located in the zone between clusters.

This paper implements the AEI to evaluate the topology preservation of the MCC-SOM model. The AEI is the average of the EI values of all output units in the topology, as expressed in Eq. (6):

$\mathrm{AEI} = \sum_{m,n} \mathrm{EI}_{mn} \,/\, (m \times n)$ (6)

Besides, following the suggestions of Vesanto [3] and Dimitriadou [12], this paper implements the Davies-Bouldin index [13] to evaluate the focusing status of the topology graph of the clusters from the MCC-SOM model. The Davies-Bouldin index is calculated from the internal topology distance of a cluster and the separating topology distance between clusters. The internal topology distance of cluster C (L) and the separating topology distance between clusters (D) are expressed in Eqs. (7) and (8), where $N_c$ is the number of output units in cluster C and $(i_1, j_1)$ denotes its main clustering center:

$L(C) = \sum_{(m,n) \in C} \lVert W_{mn} - W_{i_1j_1} \rVert \,/\, (N_c - 1)$ (7)

$D(C_1, C_2) = \lVert W_{C_1, i_1j_1} - W_{C_2, i_1j_1} \rVert$ (8)

The Davies-Bouldin index is then calculated from these two types of topology distance:

$\mathrm{DB} = \frac{1}{N} \sum_{C_1} \max_{C_2 \neq C_1} \big\{ [L(C_1) + L(C_2)] / D(C_1, C_2) \big\}$ (9)

4. CASE STUDY AND RESULT ANALYSIS

To prove that the MCC-SOM model improves on the SOM model with the single-clustering-center weight-adjusting mechanism, in both the correctness and the explicitness of the clustering result, this paper uses Kohonen's SOM model as the benchmark of performance comparison and adopts the financial ratios of large-scale construction contractors in Taiwan as the case study. The training data consist of 868 financial-ratio records of contractors. After the case study has been calculated, this paper analyzes the distribution of the output topology of the case study.
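The evaluation indices of Eqs. (6)-(9) can be sketched as below. Averaging the AEI over a batch of cases, and the dictionary layout for cluster memberships, are assumptions of this illustration rather than details given in the paper.

```python
import numpy as np

def aei(W, X):
    """Eq. (6): average EI over all m*n output units; averaging over the
    batch of cases X is an assumption about how the index is applied."""
    M, N, _ = W.shape
    return sum(np.abs(x - W).sum() for x in X) / (M * N * len(X))

def davies_bouldin(W, centers, labels):
    """Eqs. (7)-(9) on a trained map. centers: cluster id -> (i1, j1) of its
    main clustering center; labels: cluster id -> member unit coordinates."""
    L = {}
    for c, (i1, j1) in centers.items():
        members = labels[c]
        L[c] = sum(np.linalg.norm(W[m, n] - W[i1, j1])
                   for m, n in members) / max(len(members) - 1, 1)   # Eq. (7)
    ids = list(centers)
    ratios = [max((L[c1] + L[c2])
                  / np.linalg.norm(W[centers[c1]] - W[centers[c2]])  # Eq. (8)
                  for c2 in ids if c2 != c1)
              for c1 in ids]
    return float(np.mean(ratios))                                    # Eq. (9)

rng = np.random.default_rng(0)
W = rng.random((6, 6, 12))
centers = {1: (1, 1), 2: (4, 4)}                  # toy clusters for illustration
labels = {1: [(0, 0), (1, 1), (1, 2)], 2: [(4, 4), (4, 5), (5, 5)]}
print("AEI:", aei(W, rng.random((10, 12))), "DB:", davies_bouldin(W, centers, labels))
```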

4.1 Comparison of EI of Output Topology for Choosing the Threshold Value

As mentioned in section 3.2, the threshold value of the sub-clustering center (the ratio S in Eq. (2)) decides the number of sub-clustering centers. Because the adjusting range of an output unit's weight in the SOM is decided by the topology distance between the output unit and the cluster center, the number of sub-clustering centers affects the learning convergence of the MCC-SOM, and the convergence status of the MCC-SOM is reflected in the AEI. Hence, a suitable threshold value of the sub-clustering center must be chosen so that the MCC-SOM learning converges well. This paper uses an arithmetic progression of threshold values ranging from 0.5 to 0.9 as the variable of a sensitivity analysis of the MCC-SOM, and employs the lowest AEI of the MCC-SOM as the criterion for choosing the suitable threshold value. Table 2 shows the result of the sensitivity analysis; the suitable threshold value is 0.8, which has the lowest AEI of all.

Table 2. The result of the sensitivity analysis of the MCC-SOM for the threshold value.
| Threshold value | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
| AEI | 2.03 | 1.77 | 1.71 | 1.64 | 1.89 |

According to the result in Table 2, the trend of the AEI with different threshold values can be explained as follows. When the number of cluster centers is insufficient, the MCC-SOM acts like the weight adjustment of the original SOM, since it has too few sub-clustering centers. When the number of cluster centers is excessive, the unnecessary participants in the weight adjustment of the MCC-SOM make the learning hard to converge. Therefore, the MCC-SOM in the following case study uses 0.8 as the threshold value of the sub-clustering center.

4.2 Comparison of EI of Output Topology for the Benchmark

After the financial ratios of the contractors have been imported into the three types of SOM models for the clustering analysis, the trained weights of the SOM models are obtained. This paper uses the trained weights to calculate the EI of every output unit in the topology; the EI values are shown in Figs. 5 to 7. Because the EI value represents the consistency level between the weight of an output unit and the feature values of the cases, the output unit with the smallest EI is the main clustering center, and an output unit with a large EI lies on the boundary between two neighboring clusters. To present the shape of each cluster, this paper sets the boundary at the center of such an output unit. According to this mechanism, this paper sets the boundaries of the clusters in the topologies of the three types of SOM models and elects the clustering centers from the output units. The case study is first calculated by the MCC-SOM model; as Fig. 5 shows, there are seven clusters in the output topology.

Fig. 5. The output EI distribution of the program of the MCC-SOM (a 6 * 6 grid of EI values with seven clustering centers marked).

Fig. 6. The output EI distribution of the program of Kohonen's SOM (a 6 * 6 grid of EI values with six clustering centers marked).

Fig. 7. The output EI distribution of the program of Si's SOM (a 6 * 6 grid of EI values with six clustering centers marked).

The same case study is calculated by Kohonen's SOM model; the output topology is detailed in Fig. 6, showing six clusters. Additionally, the output topology of Si's SOM is shown in Fig. 7, also with six clusters. From these results it can be seen that the MCC-SOM model appropriately adds clustering centers and thereby increases the number of clusters. The single fused cluster in the topology output by Kohonen's SOM or Si's SOM, the models that adopt the competitive rule of the single clustering center, can be precisely divided into several clusters by the learning mechanism of the multiple clustering centers. The clustering boundary of the MCC-SOM model is thus clearer.

4.3 Comparison of Training Correctness for the Benchmark

This paper compares the AEIs of the clustering centers from the MCC-SOM, Si's SOM, and Kohonen's SOM, which are 1.64, 2.03, and 1.96, respectively. The MCC-SOM model reduces the AEI of Kohonen's SOM model by about 16.33%. It can thus be seen that the MCC-SOM model's approach of adding clustering centers adjusts the weights of the output units suitably: the weight combination maps the information to fit the characteristics of the cases. The AEI of Si's SOM model, however, is a little larger than that of Kohonen's SOM model. Si's SOM preserves the mechanism of the single clustering center and reduces the influence of the main clustering center on the other output units, but it does not attend to the influence of the other output units that are representative of the cases.

4.4 Comparison of Davies-Bouldin Index for the Benchmark

To demonstrate the clarifying effect of the MCC-SOM model on the output topology, this paper uses the Davies-Bouldin index as the evaluation standard. The Davies-Bouldin indexes in Tables 3 to 5 are calculated from the output topologies of the three types of SOM models. Because the cluster counts of the three models differ, this paper uses the average Davies-Bouldin index of pairs of neighboring clusters as the common basis of comparison.

According to Tables 3 to 5, the MCC-SOM model reduces the average Davies-Bouldin index of Kohonen's SOM from 1.14 to 1.063, a reduction of about 6.8%. Si's SOM model reduces the average Davies-Bouldin index of Kohonen's SOM from 1.14 to 1.08, a reduction of about 5.2%. This paper therefore concludes the following. Because the multiple clustering centers are appropriately identified, the output units on the clustering boundary no longer refer to a single standard of weight adjustment when adjusting the direction of their weights. Under a single center, the distance between a boundary output unit and the clustering center can be too great, making the influence of the clustering center insufficient; these conditions cause such output units to end up located in the fuzzy zone between the two neighboring clusters.

Table 3. The result of the Davies-Bouldin index by the MCC-SOM.
| Cluster | Inner topology distance L(C) | Pair | Separating topology distance D(C1, C2) | Davies-Bouldin index |
| 1 | 1.28 | 1&2 | 1.4 | 0.97 |
|   |      | 1&3 | 4.0 | 0.82 |
|   |      | 1&7 | 1.4 | 1.78 |
| 2 | 1.08 | 2&3 | 3.7 | 0.83 |
|   |      | 2&7 | 2.0 | 1.15 |
| 3 | 2.00 | 3&4 | 3.1 | 1.0 |
| 4 | 1.11 | 4&5 | 2.0 | 1.09 |
|   |      | 4&6 | 2.25 | 1.06 |
| 5 | 1.06 | 5&6 | 2.25 | 1.04 |
| 6 | 1.27 | 6&7 | 2.80 | 0.89 |
| 7 | 1.21 |     |      |      |
| Average of Davies-Bouldin index | | | | 1.063 |

Table 4. The result of the Davies-Bouldin index by the Kohonen SOM.
| Cluster | Inner topology distance L(C) | Pair | Separating topology distance D(C1, C2) | Davies-Bouldin index |
| 1 | 1.89 | 1&2 | 3.0 | 1.14 |
|   |      | 1&4 | 5.0 | 0.58 |
|   |      | 1&6 | 2.2 | 1.47 |
| 2 | 1.54 | 2&3 | 1.4 | 1.62 |
| 3 | 0.73 | 3&4 | 2.0 | 0.88 |
| 4 | 1.03 | 4&5 | 3.2 | 1.36 |
| 5 | 1.32 | 5&6 | 2.9 | 0.91 |
| 6 | 1.34 |     |      |      |
| Average of Davies-Bouldin index | | | | 1.14 |

Table 5. The result of the Davies-Bouldin index by the Si's SOM.
| Cluster | Inner topology distance L(C) | Pair | Separating topology distance D(C1, C2) | Davies-Bouldin index |
| 1 | 1.18 | 1&2 | 2.25 | 1.03 |
|   |      | 1&6 | 2.0 | 1.12 |
| 2 | 1.33 | 2&4 | 4.0 | 0.89 |
|   |      | 2&6 | 2.25 | 1.06 |
| 3 | 0.75 | 3&4 | 2.0 | 1.49 |
| 4 | 2.24 | 4&5 | 3.0 | 1.22 |
| 5 | 1.41 | 5&6 | 3.2 | 0.77 |
| 6 | 1.06 |     |      |      |
| Average of Davies-Bouldin index | | | | 1.08 |

As Fig. 7 shows, there is a shadowed zone in the center between two neighboring clusters in the output topology of Si's SOM model. The output units in this zone cannot be assigned to either cluster, which is the proof of the insufficient influence of the cluster centers mentioned above.

In addition, the mechanism of Si's SOM model for reducing the influence of the clustering center on the other output units is similar to the mechanism of the multiple clustering centers. However, the mechanism of Si's SOM model lacks the correct leading direction of weight adjustment that the sub-clustering centers give to the output units. This causes the Davies-Bouldin indexes of Kohonen's SOM model and Si's SOM model to be larger than the Davies-Bouldin index of the MCC-SOM model.

5. CONCLUSION

This paper develops the mechanism of multiple clustering centers in competitive learning and proposes the MCC-SOM model. The model effectively enhances the mapping ability of the connected weights between the input and output layers of the SOM, reduces the error rate of the trained weights, and increases the topology-preservation ability of the SOM model. Next, by allowing more clustering centers in a cluster, this paper reduces the number of output points of cases located in the fuzzy zone between two neighboring clusters. The cluster boundaries of the output topology thus have more clarity, the recognizability of the clusters is enhanced, and the possibility of missing a cluster during identification is reduced. Furthermore, the mechanism of multiple clustering centers of weight adjustment clarifies the cluster graph: a cluster hidden inside a neighboring cluster of the topology can be isolated by the MCC-SOM model. This feature strengthens the advantage that the SOM model does not need to preset the number of clusters before model construction, and it enhances the ability of the MCC-SOM model to explain information in clustering analysis.

Finally, concerning the construction process of the MCC-SOM, this paper suggests that, to enhance the topology-preservation ability of the MCC-SOM model, future research should construct an objective threshold for electing the sub-clustering center under the presuppositions of a reasonable computing cost and acceptable topology preservation. Additionally, the combined neighborhood area of the multiple clustering centers is not a circle, so it is worth paying more attention to finding the fitting dimension and shape of the topology.

REFERENCES

1. H. Ritter and K. Schulten, On the stationary state of Kohonen's self-organizing sensory mapping, Biological Cybernetics, Vol. 54, 1986, pp. 99-106.
2. S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, New Jersey, 1999.
3. J. Vesanto and E. Alhoniemi, Clustering of the self-organizing map, IEEE Transactions on Neural Networks, Vol. 11, 2000, pp. 586-600.
4. A. Baraldi and P. Blonda, A survey of fuzzy clustering algorithms for pattern recognition, Parts I and II, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 29, 1999, pp. 778-801.

5. T. Villmann, R. Der, M. Herrmann, and T. M. Martinetz, Topology preservation in self-organizing feature maps: Exact definition and measurement, IEEE Transactions on Neural Networks, Vol. 8, 1997, pp. 256-266.
6. Z. P. Lo and B. Bavarian, Improved rate of convergence in Kohonen neural networks, in Proceedings of the International Joint Conference on Neural Networks, Vol. 2, 1991, pp. 201-206.
7. M. Dittenbach, D. Merkl, and A. Rauber, The growing hierarchical self-organizing map, in Proceedings of the IEEE International Joint Conference on Neural Networks, Vol. 6, 2000, pp. 15-19.
8. T. M. Martinetz, S. G. Berkovich, and K. Schulten, Neural-gas network for vector quantization and its application to time-series prediction, IEEE Transactions on Neural Networks, Vol. 4, 1993, pp. 558-569.
9. T. Martinetz and K. Schulten, Topology representing networks, Neural Networks, Vol. 7, 1994, pp. 507-522.
10. D. DeSieno, Adding a conscience to competitive learning, in Proceedings of the IEEE International Conference on Neural Networks, Vol. 1, 1988, pp. 117-124.
11. J. Si, S. Lin, and M. A. Vuong, Dynamic topology representing networks, Neural Networks, Vol. 13, 2000, pp. 617-627.
12. E. Dimitriadou, S. Dolnicar, and A. Weingessel, An examination of indexes for determining the number of clusters in binary data sets, Psychometrika, Vol. 67, 2002, pp. 137-160.
13. D. L. Davies and D. W. Bouldin, A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-1, 1979, pp. 224-227.

Ching-Hwang Wang received the M.S. and Ph.D. degrees in Civil Engineering from the University of Washington, Seattle, U.S.A., in 1985 and 1988. He is a Professor in the Department of Construction Engineering at National Taiwan University of Science and Technology, Taipei, Taiwan. His research focuses on construction management and economics, simulation of construction schedules and costs, building investment, and modeling technologies.

Chih-Han Kao received the B.S., M.S., and Ph.D. degrees in Construction Engineering from National Taiwan University of Science and Technology, Taipei, Taiwan, in 1990, 1992, and 2007. He is an Assistant Professor in the Department of Construction Engineering at National Kinmen Institute of Technology, Kinmen, Taiwan. His research interests include construction management and embedded data mining.