Vol.3, Issue.1, Jan-Feb. 2013 pp-504-508 ISSN: 2249-6645

A PSO Optimized Layered Approach for Parametric Clustering on a Weather Dataset

Shikha Verma,1 Kiran Jyoti2
1 Student, Guru Nanak Dev Engineering College, Ludhiana, Punjab
2 Asstt. Prof., Guru Nanak Dev Engineering College, Ludhiana, Punjab

Abstract: Clustering is the process of presenting data in an effective and organized way. There are a number of existing clustering approaches, but most of them suffer from the problem of data distribution: if the distribution is non-linear, it introduces impurities into the clustering process. The proposed work aims to improve the accuracy of such clustering algorithms. We demonstrate the work on a time-series-oriented database, using a three-layer architecture in which the first layer performs partitioning based on the time series and the second layer performs the basic clustering. PSO is finally applied to remove the impurities from the dataset.

Keywords: Clustering, CMeans, KMeans, Dataset, Feature

I. INTRODUCTION
Clustering is a fundamental problem in machine learning and has been approached in many ways. Clustering can be considered the most important unsupervised learning technique; like every problem of this kind, it deals with finding a structure in a collection of unlabeled data. There are different clustering algorithms; the k-centers clustering algorithm is one of the popular ones. Clustering is used for several reasons: simplification, pattern detection, usefulness in data concept construction, and as an unsupervised learning process. A new clustering approach called Affinity Propagation was introduced by Brendan J. Frey and Delbert Dueck in 2007. Affinity Propagation has many advantages over other clustering algorithms, such as simplicity, general applicability, lower errors and good performance. Affinity Propagation takes as input a collection of real-valued similarities between data points.
It is a distance-based clustering algorithm, and similarity measures are calculated in the form of Euclidean distance. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. Each data point i sends a responsibility r(i,k) to candidate exemplar k, indicating how much it favors that candidate exemplar over other candidates. An availability a(i,k) is sent from a candidate exemplar to data points, indicating how well the candidate exemplar can act as a cluster center for the data points. At the end of each iteration, availabilities and responsibilities are combined to identify the exemplars.

r(i,k) = s(i,k) - max_{k' != k} { a(i,k') + s(i,k') }

where s(i,k) is the similarity between data point i and candidate exemplar k, a(i,k') is the availability of the other candidate exemplar k' to data point i, and s(i,k') is the similarity between data point i and the other candidate exemplar k'.

a(i,k) = min{ 0, r(k,k) + sum_{i' not in {i,k}} max{0, r(i',k)} }

where a(i,k) is the availability of candidate exemplar k to data point i, r(k,k) is the self-responsibility of candidate exemplar k, and r(i',k) is the responsibility from another data point i' to the same candidate exemplar k.

a(k,k) = sum_{i' != k} max{0, r(i',k)}

where a(k,k) is the self-availability of candidate exemplar k and r(i',k) is the responsibility from another data point i' to the same candidate exemplar k.

Clustering is the process of organizing objects into groups whose members are similar in some way. A cluster is therefore a collection of objects which are similar to one another and dissimilar to the objects belonging to other clusters.

A. Clustering as unsupervised classification
As for classification, we have a multivariate dataset: a set of cases, each of which has multiple attributes/variables. Each case is represented as a record (or row), and each attribute variable has an associated column. The whole comprises the case set.
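The responsibility and availability update rules above can be sketched in code. The following is a minimal illustration in NumPy (not the authors' implementation), assuming a similarity matrix S whose diagonal holds the exemplar preferences, and adding damping of the messages, as in Frey and Dueck's formulation:

```python
import numpy as np

def affinity_propagation(S, max_iter=200, damping=0.5):
    """Minimal affinity-propagation sketch. S is an n x n similarity
    matrix (e.g. negative squared Euclidean distances) whose diagonal
    holds the preferences. Returns the exemplar index of each point."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} { a(i,k') + s(i,k') }
        AS = A + S
        top = AS.argmax(axis=1)
        first = AS[np.arange(n), top]
        AS[np.arange(n), top] = -np.inf          # exclude the max itself
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), top] = S[np.arange(n), top] - second
        R = damping * R + (1 - damping) * Rnew
        # a(i,k) = min{0, r(k,k) + sum_{i' not in {i,k}} max{0, r(i',k)}}
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())       # keep r(k,k) unclipped
        Anew = Rp.sum(axis=0)[None, :] - Rp      # column sums minus own term
        diag = Anew.diagonal().copy()            # a(k,k) = sum_{i'!=k} max{0, r(i',k)}
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, diag)
        A = damping * A + (1 - damping) * Anew
    # each point's exemplar maximizes a(i,k) + r(i,k)
    return (A + R).argmax(axis=1)
```

A common choice for the preferences (the diagonal of S) is the median of the off-diagonal similarities, which tends to produce a moderate number of clusters.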
The attributes may be of any type (nominal, ordinal, continuous, etc.), but in cluster analysis none of these attributes is taken to be a classification variable. The objective is to obtain a rule which puts all the cases into groups, or clusters. It may then define a nominal classification variable which assigns a distinct label to each cluster. We do not know the final classification variable before we do the clustering; the objective is to define such a variable. Hence, clustering is said to be an unsupervised classification method: unsupervised, because we do not know the classes of any of the cases before we do the clustering. We do not have class values to guide (or "supervise") the training of our algorithm to obtain classification rules. We should note that one or more of the attributes in our case set may in fact be nominal classification variables. However, as far as our clustering of the case set is concerned, they are just nominal variables and are not treated as classification variables for the clustering analysis.
B. Distance Metrics
If we have two cases, C1 and C2 say, which have continuous variables x and y, taking values (x1, y1) and (x2, y2) respectively, then we can plot the two cases in x-y space as in Figure 1.

Figure 1: Euclidean distance

Using Pythagoras's theorem, we may write

d = sqrt( (x2 - x1)^2 + (y2 - y1)^2 )

This is the Euclidean distance between the two cases in the x-y state space. We often say that we have defined a "distance metric" when we have defined the formula for the distance between two cases. If the two cases are characterized by p continuous variables (x1, x2, ..., xi, ..., xp say) rather than two (i.e. x, y) [note: here x is x1, so x(case 2) = x1(case 2) = x1(2); similarly y is x2, so y(case 2) = x2(2)], then we may generalize the Euclidean distance to:

d = sqrt( sum_{i=1..p} ( xi(2) - xi(1) )^2 )

This can be generalized further to the Minkowski metric:

d = ( sum_{i=1..p} | xi(2) - xi(1) |^m )^(1/m)

where |x| denotes the absolute value of x (i.e. the size, without the sign).

C. Standardization
If the values of the variables are in different units then it is likely that some of the variables will take very large values, and hence the "distance" between two cases on such a variable can be a big number. Other variables may be small in value, or may not vary much between cases, in which case the difference in that variable between the two cases will be small. Hence the distance metrics considered above depend on the choice of units for the variables involved: the variables with high variability will dominate the metric. We can overcome this by standardizing the variables.

D. Similarity Measures
A measure of the similarity (or closeness) between two cases must take its highest value when the two cases have identical values of all variables (i.e. when the two cases are coincident in multivariable space), and it must decrease monotonically as the distance between the cases increases.
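The formulas of sections B and C can be illustrated with a small sketch (the function names are ours, not from the paper). The Minkowski metric reduces to the Euclidean distance at m = 2; standardization rescales each attribute to zero mean and unit variance; and, anticipating the similarity discussion, s = 1/(1 + d) is one possible (assumed, not the paper's) similarity that takes the value 1 at zero distance:

```python
import numpy as np

def minkowski(a, b, m=2):
    """d = ( sum_i |x_i(2) - x_i(1)|^m )^(1/m); m=2 is Euclidean, m=1 Manhattan."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float((np.abs(b - a) ** m).sum() ** (1.0 / m))

def standardize(X):
    """Rescale each column (attribute) to zero mean and unit variance so
    that high-variability variables do not dominate the distance metric."""
    X = np.asarray(X, float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def similarity(a, b):
    """One possible similarity: monotonically decreasing in distance,
    equal to 1 when the two cases coincide (an illustrative choice)."""
    return 1.0 / (1.0 + minkowski(a, b))
```

For the two-variable case of Figure 1, minkowski((x1, y1), (x2, y2)) reproduces the Pythagorean formula.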
Hence, any monotonically decreasing function of distance is a possible similarity measure. If we want the similarity measure to take the value 1 when the cases are coincident, then we might consider, for example, s = 1/(1 + d).

E. Particle Swarm Optimization
PSO was first proposed by Kennedy and Eberhart [9]. The main principle behind this optimization method is communication. In PSO there is a group of particles that look for the best solution within the search area. If a particle finds a better value for the objective function, the particle communicates this result to the rest of the particles. All the particles in PSO have memory, and they modify these memorized values as the optimization routine advances. The recorded values are: velocity (V), position (p), best previous performance (pbest) and best group performance (gbest). The velocity describes how fast a particle should move from its current position, which contains the coordinates of the particle. The last two parameters are the recorded best values that have been found during the iterations. A simple PSO algorithm is expressed as [9]:

v_i(t+1) = w * v_i(t) + c1 * r1 * (pbest_i - x_i(t)) + c2 * r2 * (gbest - x_i(t))
x_i(t+1) = x_i(t) + v_i(t+1)

where w is the inertia weight, c1 and c2 are acceleration coefficients, and r1, r2 are random numbers in [0, 1].

II. LITERATURE SURVEY
Lei Jiang defined a work on a Hybrid Adaptive Niche to Improve the Particle Swarm Optimization Clustering Algorithm. This paper used an adaptive-niche particle swarm algorithm to improve clustering, and it also studied the impact of different fitness optimization functions on clustering data [1]. Shuai Li, Xin-Jun Wang and Ying Zhang defined a work on X-SPA: Spatial Characteristic PSO Clustering Algorithm with Efficient Estimation of the Number of Clusters. In this paper the authors unify the Particle Swarm Optimization (PSO) algorithm and the Bayesian Information Criterion (BIC) and propose a numeric clustering algorithm. Chaos and space-characteristic ideas are involved in the algorithm to avoid the local-optimum problem. Furthermore, BIC is also involved to provide an efficient estimation of the number of clusters [2]. Rehab F. Abdel-Kader presented a work on a Genetically Improved PSO Algorithm for Efficient Data Clustering.
The proposed algorithm combines the global search ability of evolutionary algorithms with the fast convergence of the k-means algorithm, and can avoid the drawbacks of both. The performance of the proposed algorithm is evaluated on several benchmark datasets. The experimental results show that the proposed algorithm is highly robust and outperforms previous approaches such as SA, ACO, PSO and k-means for the partitional clustering problem [3]. Surat Srinoy presented a work on a Combination of Artificial Ant Clustering and K-PSO Clustering for a Network Security Model. This paper presents a nature-based data mining approach to data clustering. An artificial ant clustering algorithm is used to initially create raw clusters, and these clusters are then refined using k-means particle swarm optimization (KPSO), which has been developed as an evolutionary clustering technique [4].
Shafiq Alam defined a work on Particle Swarm Optimization Based Hierarchical Agglomerative Clustering. The proposed algorithm exploits the swarm intelligence of cooperating agents in a decentralized environment. The experimental results were compared with benchmark clustering techniques, including K-means, PSO clustering, Hierarchical Agglomerative Clustering (HAC) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The results are evidence of the effectiveness of swarm-based clustering and its capability to perform clustering in a hierarchical agglomerative manner [5]. Alireza Ahmadyfard presented a novel work on Combining PSO and k-means to Enhance Data Clustering. That paper proposes a clustering method based on a combination of particle swarm optimization (PSO) and the k-means algorithm. The PSO algorithm was shown to converge successfully during the initial stages of a global search, but around the global optimum the search process becomes very slow. In contrast, the k-means algorithm can achieve faster convergence to an optimum solution [6].

III. PROPOSED APPROACH
The proposed work improves the data clustering algorithm with the effect of Particle Swarm Optimization. The improvement is expected in terms of accuracy and efficiency. The proposed work involves three dimensions of improvement in data clustering. In the first dimension, the data is segmented according to the time series; in the case of temperature, for example, the time series follows the seasonal parameters. Once the data is segmented, the clustering algorithm is applied to it.

Figure 2: Three Dimensional Clustering

The clustering algorithm performs the actual data categorization according to the data values. Here either the C-Means or the K-Means algorithm is applied, depending on the dataset. This work is done only once. The clustering algorithm is the seed of the proposed algorithm.
Once the clustering is done, refinement over the dataset is performed by PSO. PSO uses the output of the clustering algorithm to form a hybrid algorithm. The work of PSO is first to find the best fitness function and then to pass the clustered dataset through this fitness function. The fitness function is generated with respect to each cluster as well as the whole dataset. It accepts a particular data item and, after examining the cluster data, verifies whether the item should belong to the cluster or not. To implement PSO, some parameters are required; these are decided according to the dataset and the deficiencies in the clustered dataset. The basic PSO algorithm that will be implemented is given as under:
1: Initialize a population array of particles with random positions and velocities in D dimensions in the search space.
2: For each particle, evaluate the desired optimization fitness function in D variables.
3: Compare the particle's fitness evaluation with its pbest_i. If the current value is better than pbest_i, then set pbest_i equal to the current value, and p_i equal to the current location x_i in D-dimensional space.
4: Identify the particle in the neighborhood with the best success so far, and assign its index to the variable g.
5: Change the velocity and position of the particle according to equation (3).
6: If a criterion is met (usually a sufficiently good fitness or a maximum number of iterations), exit.
7: If the criteria are not met, go to step 2.

The presented work is implemented for the weather dataset under different attributes, including temperature, rainfall, humidity, etc. The process on this dataset is performed under a three-layered approach. The algorithmic process is given as under:

Table 1: Proposed Algorithm
Step 1: Accept the whole dataset as input
Step 2:
Arrange the dataset according to years
Step 3: Find the instance count for each year
Step 4: Find the ThresholdCount value for the whole dataset
Step 5: Remove the year datasets having Count(Year) < ThresholdCount
Step 6: Represent the remainder as the training dataset
Step 7: Select N points as initial centroids
Step 8: Repeat
a. Form N clusters by assigning each point to its closest centroid
b. Recompute the centroid of each cluster
Step 9: Until the centroids do not change
Step 10: Run PSO on the initial clusters generated by the clustering algorithm
a. Initialize the particles (clusters)
b. Initialize the population size and maximum iterations
c. Initialize the clusters to the input data
d. Evaluate the fitness value and accordingly find the personal best and global best positions
Step 11: Start with an initial set of particles, typically randomly distributed throughout the design space.
Step 12: Calculate a velocity vector for each particle in the swarm.
Step 13: Update the position of each particle using its previous position and the updated velocity vector.
Step 14: Repeat from Step 10 and exit on reaching the stopping criteria (maximum number of iterations).
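Layers 1 and 2 of Table 1 (Steps 2-9) can be sketched as follows. The paper does not define ThresholdCount precisely, so this sketch assumes it is the mean instance count per year; the k-means routine follows Steps 7-9 and its clusters seed the PSO refinement layer:

```python
import numpy as np
from collections import Counter

def prune_by_year(records, years):
    """Layer 1 (Steps 2-6): drop all records of years whose instance
    count falls below ThresholdCount (assumed: mean count per year)."""
    counts = Counter(years)
    threshold = sum(counts.values()) / len(counts)
    keep = [i for i, y in enumerate(years) if counts[y] >= threshold]
    return records[keep]

def kmeans(X, n_clusters, iters=100, seed=0):
    """Layer 2 (Steps 7-9): select N initial centroids, assign each
    point to its closest centroid, recompute until the centroids settle."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == k].mean(axis=0) for k in range(n_clusters)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

The resulting clusters are then handed to the PSO layer (Steps 10-14) as its initial particles.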
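The inner PSO routine (steps 1-7 listed before Table 1) can likewise be sketched. This is an illustrative implementation assuming the common inertia-weight form of the velocity update for equation (3); the parameter values w, c1, c2 and the search bounds are our choices, not the paper's:

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Sketch of steps 1-7: random initialization, fitness evaluation,
    pbest/gbest bookkeeping, and the velocity/position update."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))    # step 1: positions
    v = rng.uniform(-1, 1, (n_particles, dim))    #         and velocities
    pbest = x.copy()                              # per-particle best position
    pbest_val = np.array([f(p) for p in x])       # step 2: evaluate fitness
    g = pbest_val.argmin()                        # step 4: best particle index
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # step 5: inertia + cognitive + social terms (equation (3))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (pbest[g] - x)
        x = x + v
        val = np.array([f(p) for p in x])
        improved = val < pbest_val                # step 3: update pbest
        pbest[improved] = x[improved]
        pbest_val[improved] = val[improved]
        g = pbest_val.argmin()                    # step 4 again
    return pbest[g], pbest_val[g]                 # step 6: stop at max iters
```

For example, pso_minimize(lambda p: float((p ** 2).sum()), dim=2) drives the sphere function toward its minimum at the origin.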
IV. RESULTS
The presented clustering approach is implemented on the weather dataset in the MATLAB 7.8 environment. The initial dataset contains about 10000 instances with 7 attributes, covering the years between 1970 and 2010. In the first layer, the time-series segmentation is performed and the data values of specific years are pruned, as these are found to be less effective. The dataset remaining after the pruning process is shown in Table 2, which lists the retained years.

Table 2: Pruned Dataset
1971, 1972, 1974, 1976, 1977, 1978, 1979, 1981, 1984, 1986, 1987, 1988, 1991, 1993, 1994, 1996, 1998, 2000, 2001, 2003, 2005, 2009

The initial data contains 10000 instances; after the pruning process, about 6700 records are left. To perform the pruning, time-series segmentation has been implemented in this work. After the pruning process, C-Means clustering is performed over the dataset. In this work, we have divided the complete dataset into 4 clusters under the different attributes. Figures 3, 4 and 5 show the outcome of the 1st cluster under three attributes respectively, i.e. temperature, rain and humidity.

Figure 3: Cluster 1 (Temperature)
Figure 3 shows the outcome of clustering on the temperature dataset; the data obtained in cluster 1 is shown. This particular cluster has about 1600 instances and contains temperature values between 2 and 25.

Figure 4: Cluster 1 (Rain)
Figure 4 shows the outcome of clustering on the rain dataset; the data obtained in cluster 1 is shown. This particular cluster has about 1400 instances and contains rainfall values between 150 and 160. Out of 6700 instances, only 1400 instances have that much rainfall.

Figure 5: Cluster 1 (Humidity)
Figure 5 shows the outcome of clustering on the humidity dataset; the data obtained in cluster 1 is shown.
This particular cluster has about 1500 instances and contains humidity values between 0 and 50.

V. CONCLUSION
The presented work aims to improve the clustering process by implementing a three-layered approach. The first layer is the filtration layer, which identifies the most appropriate dataset on which clustering will be performed. On the second layer, the actual clustering process is performed. Finally, PSO is used to adjust the boundary values and to optimize the outcome of the clustering process.
REFERENCES
[1] Lei Jiang, Lixin Ding, Hybrid Adaptive Niche to Improve Particle Swarm Optimization Clustering Algorithm, 978-1-4244-9953-3/11, 2011 IEEE
[2] Shuai Li, Xin-Jun Wang, X-SPA: Spatial Characteristic PSO Clustering Algorithm with Efficient Estimation of the Number of Clusters, Fifth International Conference on Fuzzy Systems and Knowledge Discovery, 978-0-7695-3305-6/08, 2008 IEEE
[3] Rehab F. Abdel-Kader, Genetically Improved PSO Algorithm for Efficient Data Clustering, 2010 Second International Conference on Machine Learning and Computing, 978-0-7695-3977-5/10, 2010 IEEE
[4] Surat Srinoy, Combination Artificial Ant Clustering and K-PSO Clustering Approach to Network Security Model, IEEE 2006
[5] Shafiq Alam, Gillian Dobbie, Patricia Riddle, M. Asif Naeem, Particle Swarm Optimization Based Hierarchical Agglomerative Clustering, 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
[6] Alireza Ahmadyfard, Hamidreza Modares, Combining PSO and k-means to Enhance Data Clustering, 2008 International Symposium on Telecommunications, 978-1-4244-2751-2/08, 2008 IEEE
[7] Dongyun Wang, Clustering Research of Fabric Deformation Comfort Using Bi-Swarm PSO Algorithm, World Congress on Intelligent Control and Automation, July 6-9, 2010, Jinan, China
[8] Junyan Chen, Research on Application of Clustering Algorithm Based on PSO for the Web Usage Pattern, 1-4244-1312-5/07, 2007 IEEE
[9] Jinxin Dong, Minyong Qi, A New Clustering Algorithm Based on PSO with the Jumping Mechanism of SA, 2009 Third International Symposium on Intelligent Information Technology Application
[10] Brendan J. Frey and Delbert Dueck, Clustering by Passing Messages Between Data Points, Science, Vol. 315, pp. 972-976, February 2007
[11] Bryan Conroy and Yongxin Taylor Xi, Semi-Supervised Clustering Using Affinity Propagation, August 4, 2009
[12] Supporting material available at http://www.psi.toronto.edu/affinitypropagation
[13] Delbert Dueck and Brendan J. Frey, Non-metric Affinity Propagation for Unsupervised Image Categorization, IEEE Int. Conf. on Computer Vision (ICCV), 2007