Jinbiao Wang, Ailing Tu, Hongwei Huang

Similar documents
EFFICIENT CLUSTERING WITH FUZZY ANTS

Aggregation and Segregation Mechanisms. AM, EE141, Swarm Intelligence, W4-1

Fuzzy Ants as a Clustering Concept

Lorentzian Distance Classifier for Multiple Features

Preliminary Research on Distributed Cluster Monitoring of G/S Model

Hybrid ant colony optimization algorithm for two echelon vehicle routing problem

Design and Development of a High Speed Sorting System Based on Machine Vision Guiding

Rough Set Flow Graphs and Ant Based Clustering in Classification of Disturbed Periodic Biosignals

Two Algorithms of Image Segmentation and Measurement Method of Particle s Parameters

Available online at ScienceDirect. Procedia Manufacturing 6 (2016 ) 33 38

Application of Improved Discrete Particle Swarm Optimization in Logistics Distribution Routing Problem

Ant Colonies, Self-Organizing Maps, and A Hybrid Classification Model

An Efficient Analysis for High Dimensional Dataset Using K-Means Hybridization with Ant Colony Optimization Algorithm

Available online at ScienceDirect. Energy Procedia 69 (2015 )

A new improved ant colony algorithm with levy mutation 1

View-dependent fast real-time generating algorithm for large-scale terrain

Assignment in COMPLEX ADAPTIVE SYSTEMS

CHAPTER 4 VORONOI DIAGRAM BASED CLUSTERING ALGORITHMS

Bipartite Graph Partitioning and Content-based Image Clustering

IMPROVING THE PARTICLE SWARM OPTIMIZATION ALGORITHM USING THE SIMPLEX METHOD AT LATE STAGE

An Application of Genetic Algorithm for Auto-body Panel Die-design Case Library Based on Grid

Mobile Robot Path Planning in Static Environments using Particle Swarm Optimization

ScienceDirect. Analogy between immune system and sensor replacement using mobile robots on wireless sensor networks

A Hand Gesture Recognition Method Based on Multi-Feature Fusion and Template Matching

An Optimization Algorithm of Selecting Initial Clustering Center in K means

Cluster Analysis using Spherical SOM

Text clustering based on a divide and merge strategy

Density Based Clustering using Modified PSO based Neighbor Selection

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

Toward Part-based Document Image Decoding

Dynamic Robot Path Planning Using Improved Max-Min Ant Colony Optimization

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

PSO-based Energy-balanced Double Cluster-heads Clustering Routing for wireless sensor networks

Time Series Clustering Ensemble Algorithm Based on Locality Preserving Projection

ANALYZING AND OPTIMIZING ANT-CLUSTERING ALGORITHM BY USING NUMERICAL METHODS FOR EFFICIENT DATA MINING

Research of Fault Diagnosis in Flight Control System Based on Fuzzy Neural Network

A Rapid Automatic Image Registration Method Based on Improved SIFT

Generic Object Detection Using Improved Gentleboost Classifier

Query-Sensitive Similarity Measure for Content-Based Image Retrieval

Cemetery Organization, Sorting, Graph Partitioning and Data Analysis

REVIEW OF ANT METHODS AND PROPOSED MODIFIED ANT METHOD FOR FUZZY CLUSTERING

Ant Colony. Clustering & Sorting Magnus Erik Hvass Pedersen (971055) Daimi, University of Aarhus, July 2003

The GPU-based Parallel Calculation of Gravity and Magnetic Anomalies for 3D Arbitrary Bodies

A Novel Field-source Reverse Transform for Image Structure Representation and Analysis

Out-of-Plane Rotated Object Detection using Patch Feature based Classifier

Fuzzy Ant Clustering by Centroid Positioning

Video annotation based on adaptive annular spatial partition scheme

Distance-based Methods: Drawbacks

K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors

A Game Map Complexity Measure Based on Hamming Distance Yan Li, Pan Su, and Wenliang Li

An Improved PageRank Method based on Genetic Algorithm for Web Search

Meta- Heuristic based Optimization Algorithms: A Comparative Study of Genetic Algorithm and Particle Swarm Optimization

Unknown Malicious Code Detection Based on Bayesian

PARTICLE SWARM OPTIMIZATION (PSO)

Path-based XML Relational Storage Approach

An Adaptive Flocking Algorithm for Spatial Clustering

Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical application of VOD

Detection of Blue Screen Special Effects in Videos

Keywords: clustering, construction, machine vision

Performance Degradation Assessment and Fault Diagnosis of Bearing Based on EMD and PCA-SOM

Indexing by Shape of Image Databases Based on Extended Grid Files

Fuzzy-Kernel Learning Vector Quantization

Machine Learning Reliability Techniques for Composite Materials in Structural Applications.

An Improved KNN Classification Algorithm based on Sampling

Research on Technologies in Smart Substation

A Feature Selection Method to Handle Imbalanced Data in Text Classification

An Enhanced Density Clustering Algorithm for Datasets with Complex Structures

Application of Geometry Rectification to Deformed Characters Recognition Liqun Wang1, a * and Honghui Fan2

Personalized Search for TV Programs Based on Software Man

A Two-phase Distributed Training Algorithm for Linear SVM in WSN

A new Class of Priority-based Weighted Fair Scheduling Algorithm

Tele-operation Construction Robot Control System with Virtual Reality Technology

Iterative Removing Salt and Pepper Noise based on Neighbourhood Information

Classification with Class Overlapping: A Systematic Study

COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS

Solving the Shortest Path Problem in Vehicle Navigation System by Ant Colony Algorithm

A Population Based Convergence Criterion for Self-Organizing Maps


Enhanced Hemisphere Concept for Color Pixel Classification

Available online at ScienceDirect. Procedia Computer Science 98 (2016 )

A Naïve Soft Computing based Approach for Gene Expression Data Analysis

Feature weighting using particle swarm optimization for learning vector quantization classifier

A Novel Image Transform Based on Potential field Source Reverse for Image Analysis

Methods for Intelligent Systems

The Memetic Algorithm for The Minimum Spanning Tree Problem with Degree and Delay Constraints

Unsupervised learning in Vision

Navigation of Multiple Mobile Robots Using Swarm Intelligence

Homework 4 Computer Vision CS 4731, Fall 2011 Due Date: Nov. 15, 2011 Total Points: 40

Some questions of consensus building using co-association

FEATURE GENERATION USING GENETIC PROGRAMMING BASED ON FISHER CRITERION

A HYBRID METHOD FOR SIMULATION FACTOR SCREENING. Hua Shen Hong Wan

Graph projection techniques for Self-Organizing Maps

An Approach to Polygonal Approximation of Digital CurvesBasedonDiscreteParticleSwarmAlgorithm

Data Complexity in Pattern Recognition

A Study on Clustering Method by Self-Organizing Map and Information Criteria

SWARM INTELLIGENCE -I

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave.

Metaheuristic Development Methodology. Fall 2009 Instructor: Dr. Masoud Yaghini

Dynamic Obstacle Detection Based on Background Compensation in Robot s Movement Space

The design of medical image transfer function using multi-feature fusion and improved k-means clustering

Transcription:

Physics Procedia 24 (2012) 1414 1421 2012 International Conference on Applied Physics and Industrial Engineering An Ant Colony Clustering Algorithm Improved from ATTA Jinbiao Wang, Ailing Tu, Hongwei Huang School of Computer Science and Technology, Civil Aviation University of China TianJin, China, 300300 Abstract DENNEUBOURG presents the first ant-based clustering algorithm in 1991.Ant colony clustering has the characteristics of automatically clustering, high clustering accuracy, irregular cluster shapes and so on. These are important for a clustering algorithm, so ant colony clustering is arousing more and more data mining researchers' concentration. In 2004, J. Handl together with his partners normalized the basic ant colony clustering algorithm, getting the ATTA model with superior performance. In this paper, we ll present an algorithm improved from ATTA which we call it LCA and test LCA on UCI data sets to show its feasibility. 2011 Published by Elsevier B.V. Selection and/or peer-review under responsibility of ICAPIE Organization Committee. 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of [name organizer] Open access under CC BY-NC-ND license. Keywords:Ant colony clustering; Logic; cold; UCI data sets; feature attribute 1. Atta Model In the past two decades, many achievements in computer science depends on the bio-inspired information. Such as ant populations of parallelism, self-organization and robustness that played a particularly important role in promoting the development of intelligent algorithms. Although it is composed of simple individuals, but the entire population in case of no central control was able to complete complex tasks. Deneubourg J L in 1991 was enlightened by the behavior of ant colony classified ant eggs, first proposed Ant Colony Clustering Algorithm[1]. The algorithm was among the first to be used for the robot. Robot to imitate the behavior of ants, which is to identify two different objects, according to local environmental information to decide to pick up or drop the object, and so forth to achieve the purpose of object clustering. Then in 1994,Lumer and Faita used the idea of ant colony clustering to data analysis, proposed the LF algorithm [2],of ant colony clustering mainly in three aspects:1)perception radius, 2)pick up and drop probability calculation,3)mobile mechanism. In 2004, J.Handl and others worked on the specification of the ant colony clustering algorithm such as the norm, got the superior performance ATTA model [3]. Compared with the LF, ATTA's similarity function, picking up and dropping function, perception radius, and it's memory parameters are all more normalized : 1875-3892 2011 Published by Elsevier B.V. Selection and/or peer-review under responsibility of ICAPIE Organization Committee. Open access under CC BY-NC-ND license. doi:10.1016/j.phpro.2012.02.210

Jinbiao Wang et al. / Physics Procedia 24 (2012) 1414 1421 1415 1.1. Three principal elements 1The similarity function: 1 δ(, i j) δ(, i j) (1 ), if j (1 ) > 0 f * 2 () i = σ j L α α ⑴ 0, else 2Individual ants picking up and dropping probability formula: * 1, if f ( i) 1 * p pick ( i) = 1 ⑵, else * 2 f ( i) * * 1, if f ( i) 1 p drop ( i) = ⑶ * 4 f ( i), otherwise These two formulas derived from Experiments,and the two formulas achieve better results accompanied by a perceived increase of radius. 3Perception radius r:during the operation of the algorithm, so that the radius of perception changed all over the time, ATTA model for linear growth of r from 1 to 5, when r changes, σ = 2 r + 1 also changes, because of r = ( σ 1) 2. But here the data object in order to maintain sufficient separation in space, so when r changes, σ constant. 1.2. Several important parameters 1Memory m:similarity f*(i) rather than dissimilarity δ ( i, j ) used as a criterion, when the ants pick up a new individual data I, it put the data into memory in the neighborhood of the point position,calculating the similarity f*(i) of those m locations. The most appropriate location of f*(i) for the largest object in the down position of the data. 2α value changes over time:initially given for each ant a unified single value α between 0 to 1, after that, according to each ant s mobile data object to get change. That ant steps in the mobile # fail r fail = #active=100, the calculation of the data object that controls the failure rate # active, #fail is the number of ants failing to put down its data objects,#activeis is the number of ant movements there is a new formula of α : 3Empirical parameters: Number of ants =10, Memory m=10, tstart=0.45*#iterations, α + 0.01, if α α 0.01, if r r fail fail > 0.99 0.99

1416 Jinbiao Wang et al. / Physics Procedia 24 (2012) 1414 1421 tend=0.55*#iterations space = 10 # items 10 # items stepsize = 20 # items # iterations = 2000 # items #iterations than or equal to 100 million. ATTA clustering performance has been achieved relatively good condition. However, For classes and extract other issues cluster adhesion problems that exist in the basic model, ATTA has not fully solved; Secondly, ATTA clustering accuracy and speed to be further improved. ATTA has an interesting feature: after the formation of the initial class clusters, add an episode stage, that is, between the tstart and tend to change the similarity function f * (i), using 1/N OCC instead of 1/σ 2, here N OCC is the number of grid occupied by data objects in the local neighborhood. In this episode, the data points go through the "divergence - to re-aggregation" process, which improves the quality of clustering. 2. LCA: An improved ant colony clustering algorithm LCA(Logic based cold ants) makes the following improvements on ATTA:Ant populations were initially picked up the data object, and calculate their current location is suitable for down, take the data object which are carried by those ants are not suitable for put down directly to the various objects for maximum similarity value of the position, moreover, in order to allow rapid formation of class cluster center, we propose a similarity measure based on logic. That means that:ant makes objects into similar and dissimilar two categories,and enthusiasm of similar objects (attract), not similar object detached (not processed). 2.1 Similarity calculation based on logic ATTA seek all the properties of the data object to calculate the precise degree of similarity neighborhood (see formula⑴),but the reality is that many attributes are mutually weaken, and noises component contained in the data objects are also a wide gap, Therefore, accurate calculation often become wishful thinking. Practice by a large number of calculations found that the average distance α of the data object is a key parameter. Less than α similar, otherwise dissimilar. As shown below,data object i is carried by an ant at the center of the grid,assuming α is 0.2, data object δ ( i, j) around the neighborhood 8 grid is shown in Table 1. Figure 1

Jinbiao Wang et al. / Physics Procedia 24 (2012) 1414 1421 1417 TABLE 1 Similarity function formula based on the ATTA (1): * 1 δ ( i, j ) f ( i ) = (1 ) 2 σ j L α = 1 0. 6 0. 1 0. 15 [( 1 ) + (1 ) + (1 2 σ + (1 0. 12 0. 4 0. 08 ) + (1 ) + (1 ) + (1 = 1 2 σ ( 0. 75 ) = 0 Calculated result is that the probability that ant puts the data into that region is 0. Obviously, this result is wrong. The reason is simply because the two different points of the δ ( i, j) value is too large. Adjusted as follows for this: 1 Statistics the number of similar points (snum) and the number of dissimilar points (dnum) in the δ ( i, j) ( 1 ) > 0 2 neighborhood, if α, that the data object i and j are alike, snum plus 1 The method for calculation of similarity to: 0, if dnum > snum ⑷ f '( i) = snum, else In the example point 2, point 3, point 4, point 6 and point 8 are similar, snum=5; point 1 and point 5 are not similar, dnum=2. snum>dnum. The result is that ants lay down the data in this location, in line with the actual situation. 2Drop probability by: * 1, if f ( i) > 4 p * ( i) = * drop ⑸ f ( i) 3 ( ) otherwise 4, The formula is obtained by experiment. The algorithm does not need to pick up the object probability formula. 3Perception radius r:throughout the algorithm the perception of the radius of the ant is constantly changing, to allow more data to be a part of the single action of ants. The Perception Radius of the proposed algorithm have linear increased from 1 to 3, and then decreased from 3 to 1. 4Separation of the initial class of cluster centers: The latter part of the algorithm may occur two Class cluster adhesion object suitable for being put down, and then choose one of which to mark as not appropriate to lay down. 2.2 Algorithm flow j 1 2 3 4 5 6 7 8 ( i, j) δ 0.6 0.1 0.15 0.12 0.4 0.08 null 0.1 // Initialization phase 1The data is randomly scattered on a two-dimensional toroidal plane; 2For(j=1; j<= number of ants; j++) { ) 0. 1 )]

1418 Jinbiao Wang et al. / Physics Procedia 24 (2012) 1414 1421 The ant j randomly selected a free data object and pick it up; The ant j randomly selected a location of the environment and jump to the empty position; } // The main loop phase 3for(iteration=1;iteration<= the total number of iterations; iteration++) { Calculate all the labels of ants which are suitalbe to put down their data objects; Increase the separation of the initial cluster centers to change some suitable places into non-down positions; The ants which not suitable for putting down jump into those suitable for putting down, in the neighborhood of the maximum similarity value; Update the free data objects table; If one ant successfully put down the object { Randomly finds a new free data object I; Ant picks up the new data object i; Ant jumps onto the location of the object I; } } 2.3 Simulation of the algorithm 1400 data objects, 10 ants, ATTA model 100 times to run the process and results: Iteration=0: Iteration=70: Iteration=100: Figure 2. clustering process and results for ATTA algorithms In the model ATTA, Previous algorithm requires a longer period of time to run. After forming in the class cluster, the algorithm used the stage of an episode. At this stage, ants consider only the similarity without considering the density to pick up and down the object data, as a results, data objects will be orderly dispersed. When the data points are ordered dispersed, then use of previous similarity formula to Cluster of data objects, this process is equivalent to re-select the cluster center of class. Because the ant colony clustering algorithm is a stochastic algorithm, class of the results of cluster center re-selection has the potential to separate from each other class of cluster center, may also make the new class of cluster center still closely linked. The above model, the lower left corner of the class of cluster centers were separated, but separation was not obvious. 2400 data objects, 100 ants, LCA model 1 time to run the process and results. step=0: step =7: step =19: Figure 3. Run Results of LCA

Jinbiao Wang et al. / Physics Procedia 24 (2012) 1414 1421 1419 Figure 3 shows that LCA only use 19 steps to a good accurate classes cluster together. After the first step of the algorithm, cluster centers have been formed in all categories. Careful observation can be found, the early formation of the lower right corner there is a cluster of red type, but the center is too close to the blue and purple cluster space, so it will not be put down as a suitable location. As the algorithm progresses, the red class cluster clustering slowly up to that position that far apart with others. 3 LCA calculations faster than ATTA increased by 1 ~ 2 orders of magnitude. That is because in the clustering process, a huge gap between the number of data move caused. LCA ATTA Figure 4. ATTA and the LCA algorithm to cluster the results of each comparison the number of moving objects In Figure 4, X axis represents the number to be moving, Y-axis is the number corresponding to the number of objects to move. The average number of ATTA move which is 2129 times while the number of LCA is 5 times. 3. Experiments on UCI Data sets Examples in Table 2 are from the UCI data sets [4]. An object is often expressed through the multidimensional attribute, but clustering practice shows that, only feature attribute contribute to clustering. The following data set of wine as an example for analysis. To get feature attributes, we first experiment the 13-dimensional properties of single-attribute of wine in the algorithm experimental platform, Through a combination of 7 and 13, the clustering results are good, we may conclude that 7 and 13 are feature attributes. Through UCI data sets typical examples of the clustering effect of comparison, we can see the LCA algorithm have significant improvements than ATTA and the classic LF. As indicated in Table 2 attribute 7 attribute 13 attributes 7,13 Figure 5 Clustering result

1420 Jinbiao Wang et al. / Physics Procedia 24 (2012) 1414 1421 TABLE 2 clustering results of data set using different algorithms Data Set Iris Wine Soybean Lenses Balances Ecoli Evaluate Index LF ATTA LCA LF ATTA LCA LF ATTA LCA LF ATTA LCA LF ATTA LCA LF ATTA LCA MacroP(%) 50 72 88 48 61 80 43 59 100 47 44 58 62 79 55 26 27 32 MacroR(%) 89 82 90 88 84 88 69 61 100 85 76 93 58 12 85 81 81 79 MacroF1(%) 63 76 89 61 69 83 53 59 100 59 55 72 58 21 66 39 40 46 C(%) 63 43 100 66 66 100 64 65 100 55 55 66 - -1299 79 54 55 49 136 Iterative rounds 70 150 1 70 180 1 80 210 1 50 100 1 70 250 1 70 250 1 Number of steps per round 10 4 10 4 10 2 10 4 10 4 10 2 10 4 10 4 10 2 10 4 10 4 10 2 10 4 10 4 10 2 10 4 10 4 10 2 Iteration Time (s) 55 69 3.8 106 88 4.9 59 85 3 79 37 2.7 311 162 32 56 132 9.5 Size example 150 178 47 24 625 336 Dimension 4 13 35 4 4 7 Explanation: 1MacroP is the macro average accuracy: n 1 Ii MacroP = ⑹ n M Thereinto, Mi is the number of clustering elements in i-class, Ii is the correct clustering number of elements in Mi. n is the total number of all categories. 2MacroR is the macro average recall rate: i= 1 i n 1 Ii MacroR = ⑺ n N i= 1 i Thereinto, Ni is the number of elements in i-class, Ii is the number of elements clustered in i-class. n is the total number of all categories. 3MacroF1 is the average F1 value of the macro: MacroP MacroR 2 MacroF1 = ⑻ MacroP + MacroR Thereinto, MacroP is the macro average accuracy, MacroR is the macro average recall-rate. 4Clustering data rate: n1, if n1 n2 n2 C = n1 n2, else n2 (9) C more closer to 1 indicates better clustering results. Thereinto, n1 means that clustering system actually produce the number of polymer, n2 means that the classes existed in real data.

Jinbiao Wang et al. / Physics Procedia 24 (2012) 1414 1421 1421 4. Summary and Discussion By this paper proposed, LCA algorithm is useful and feasible. Examples show that ants mode Environmental Recognition Judgement Method logical worth more than accurate calculations in depth. An actual problem is not only facing the multidimensional, but inevitably existing all kinds of noise, therefore, how to more effectively identify the characteristics of property, move the noise is the direction of future efforts. Especially with the advent of multi-core computer the ant algorithm is possible to achieve parallel computation, so that prospect of ant clustering algorithm is worth the wait. 5. Acknowledgment This research is supported by the National Natural Science Foundation of China (Grant Nos. 60472121 and 60979021). The authors are grateful to the editors and the anonymous reviewers: their remarks and suggestions are important for shaping this paper. References [1]J.L.Deneubourg, S.Gross, N.R.Franks, A.Sendova-Franks, C.Detrain, and L.Chretien. The dynamics of collective sorting: Robot-like ants and ant-like robots. Simulation of Adaptative Behavior: From Animals to Animats, pp.356-363, 1991. [2]E.D.Lumer and B.Faieta.Diversity and Adaptation in populations of clustering ants.in Proc. Of the Third International Conference on the Simulation of Adataptive Behavior:From Animals to Animats 3,pp.449-508.Mit Press,1994. [3]J.Handl, J.Knowles, and M.Dorigo. Ant-based clustering and topographic mapping. Artificial Life, Vol.12(1), 2004. [4]http://archive.ics.uci.edu/ml/datasets.html