A PSO Optimized Layered Approach for Parametric Clustering on Weather Dataset


Vol. 3, Issue 1, Jan.-Feb. 2013, pp. 504-508, ISSN: 2249-6645

Shikha Verma (Student, Guru Nanak Dev Engineering College, Ludhiana, Punjab) and Kiran Jyoti (Assistant Professor, Guru Nanak Dev Engineering College, Ludhiana, Punjab)

Abstract: Clustering is the process of presenting data in an effective and organized way. There are a number of existing clustering approaches, but most of them suffer from the problem of data distribution: if the distribution is non-linear, it introduces impurities into the clustering process. The proposed work aims to improve the accuracy of such clustering algorithms. We present the work on a time-series-oriented database using a three-layer architecture, where the first layer performs the partitioning based on the time series and the second layer performs the basic clustering. PSO is finally applied to remove the impurities from the dataset.

Keywords: Clustering, C-Means, K-Means, Dataset, Feature

I. INTRODUCTION

Clustering is a fundamental problem in machine learning and has been approached in many ways. Clustering can be considered the most important unsupervised learning technique; as with every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. There are different clustering algorithms; the k-centers algorithm is one of the popular ones. Clustering is used for several reasons: simplification, pattern detection, usefulness in data concept construction, and as an unsupervised learning process.

A new clustering approach called Affinity Propagation was introduced by Brendan J. Frey and Delbert Dueck in 2007. Affinity Propagation has many advantages over other clustering algorithms, such as simplicity, general applicability, lower errors and good performance. Affinity Propagation takes as input a collection of real-valued similarities between data points. It is a distance-based clustering algorithm, and the similarity measures are calculated in the form of Euclidean distance. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. Each data point i sends a responsibility r(i,k) to candidate exemplar k, indicating how much it favors that candidate exemplar over other candidates. The availability a(i,k) is sent from the candidate exemplar to the data points, indicating how well the candidate exemplar can act as a cluster center for the data point. At the end of each iteration, availability and responsibility are combined to identify the exemplars.

r(i,k) = s(i,k) - \max_{k' \neq k} \{ a(i,k') + s(i,k') \}

where s(i,k) is the similarity between data point i and candidate exemplar k, a(i,k') is the availability of another candidate exemplar k' to data point i, and s(i,k') is the similarity between data point i and the other candidate exemplar k'.

a(i,k) = \min\{ 0, \; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\} \}, for i \neq k

where a(i,k) is the availability of candidate exemplar k to data point i, r(k,k) is the self-responsibility of candidate exemplar k, and r(i',k) is the responsibility from another data point i' to the same candidate exemplar k.

a(k,k) = \sum_{i' \neq k} \max\{0, r(i',k)\}

where a(k,k) is the self-availability of candidate exemplar k and r(i',k) is the responsibility from another data point i' to the same candidate exemplar k.

Clustering is the process of organizing objects into groups whose members are similar in some way. A cluster is therefore a collection of objects which are similar to each other and dissimilar to the objects belonging to other clusters.
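As an illustration of the message-passing updates above, the following minimal NumPy sketch iterates the responsibility and availability rules in their standard vectorized form. The damping factor, iteration count, and the use of the diagonal of the similarity matrix as the exemplar preference are illustrative choices, not values taken from this paper.

```python
import numpy as np

def affinity_propagation(S, max_iter=200, damping=0.5):
    """Minimal affinity propagation sketch (Frey & Dueck, 2007).

    S : (n, n) similarity matrix; S[k, k] holds the exemplar preferences.
    Returns, for each point, the index of its chosen exemplar.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities  a(i, k)

    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [ a(i,k') + s(i,k') ]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first_max = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second_max = AS.max(axis=1)
        R_new = S - first_max[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second_max
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())       # keep r(k,k) unclipped
        A_new = Rp.sum(axis=0)[None, :] - Rp     # column sums minus own term
        dA = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, dA)
        A = damping * A + (1 - damping) * A_new

    return np.argmax(A + R, axis=1)
```

With s(i,k) set to the negative squared Euclidean distance between points i and k, a call such as exemplars = affinity_propagation(S) assigns each point to the exemplar that maximizes a(i,k) + r(i,k), as described above.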
A. Clustering as Unsupervised Classification

As for classification, we have a multivariate dataset: a set of cases, each of which has multiple attributes/variables. Each case is represented as a record (or row), and to each attribute variable there is an associated column. The whole comprises the case set. The attributes may be of any type (nominal, ordinal, continuous, etc.), but in cluster analysis none of these attributes is taken to be a classification variable. The objective is to obtain a rule which puts all the cases into groups, or clusters. It may then define a nominal classification variable which assigns a distinct label to each cluster. We do not know the final classification variable before we do the clustering; the objective is to define such a variable. Hence, clustering is said to be an unsupervised classification method: unsupervised, because we do not know the classes of any of the cases before we do the clustering, and we do not have class values to guide (or "supervise") the training of our algorithm to obtain classification rules. We should note that one or more of the attributes in our case set may in fact be nominal classification variables. However, as far as our clustering of the case set is concerned, they are just nominal variables and are not treated as classification variables for the clustering analysis.

B. Distance Matrices

If we have two cases, C1 and C2 say, which have continuous variables x and y, taking values (x_1, y_1) and (x_2, y_2) respectively, then we can plot the two cases in x-y space as in Figure 1.

Figure 1: Euclidean distance

Using Pythagoras's theorem, we may write

d = \sqrt{ (x_2 - x_1)^2 + (y_2 - y_1)^2 }

This is the Euclidean distance between the two cases in the x-y state space. We often say that we have defined a "distance metric" when we have defined the formula for the distance between two cases. If the two cases are characterized by p continuous variables (x_1, x_2, ..., x_i, ..., x_p, say) rather than two (i.e. x, y) [note: here x \equiv x_1, so x(case 2) \equiv x_1(case 2), written x_1(2); similarly y \equiv x_2], then we may generalize the Euclidean distance to:

d = \sqrt{ \sum_{i=1}^{p} ( x_i(2) - x_i(1) )^2 }

This can be generalized further to the Minkowski metric:

d = \left( \sum_{i=1}^{p} | x_i(2) - x_i(1) |^m \right)^{1/m}

where |x| denotes the absolute value of x (i.e. the size, without the sign).

C. Standardization

If the values of the variables are in different units, then it is likely that some of the variables will take very large values, and hence the "distance" between two cases on such a variable can be a big number. Other variables may be small in value, or may not vary much between cases, in which case the difference in this variable between the two cases will be small. Hence the distance metrics considered above depend on the choice of units for the variables involved, and the variables with high variability will dominate the metric. We can overcome this by standardizing the variables.

D. Similarity Measure

A measure of the similarity (or closeness) between two cases must take its highest value when the two cases have identical values of all variables (i.e. when the two cases are coincident in multivariable space). The similarity measure must decrease monotonically as the differences between the case variable values increase, i.e. as the distance between the cases increases. Hence, any monotonically decreasing function of distance is a possible similarity measure; if we want the similarity measure to take the value 1 when the cases are coincident, we can choose such a function that equals 1 at zero distance.

E. Particle Swarm Optimization

PSO was first proposed by Kennedy and Eberhart [9]. The main principle behind this optimization method is communication. In PSO there is a group of particles that look for the best solution within the search area. If a particle finds a better value for the objective function, the particle communicates this result to the rest of the particles. All the particles in PSO have memory, and they modify these memorized values as the optimization routine advances. The recorded values are: velocity (V), position (p), best previous performance (pbest) and best group performance (gbest). The velocity describes how fast a particle should move from its current position, which contains the coordinates of the particle. The last two parameters are the recorded best values found during the iterations. A simple PSO algorithm following [9] is outlined in the proposed approach below.
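The distance and standardization ideas in subsections B-D translate directly into code. The sketch below (NumPy, with variable names of our own choosing) z-score standardizes a small case set and computes the Minkowski distance, which reduces to the Euclidean distance for m = 2; the similarity function at the end is only one possible monotonically decreasing choice equal to 1 at zero distance, not a formula prescribed by the paper.

```python
import numpy as np

def standardize(X):
    """Z-score each column so high-variability attributes do not dominate the metric."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def minkowski(a, b, m=2):
    """Minkowski distance d = (sum_i |a_i - b_i|^m)^(1/m); m = 2 gives Euclidean."""
    return np.sum(np.abs(a - b) ** m) ** (1.0 / m)

def similarity(a, b, m=2):
    """An illustrative similarity measure: 1 when the cases coincide,
    decreasing monotonically with distance."""
    return 1.0 / (1.0 + minkowski(a, b, m))

# Example: three cases described by p = 3 continuous attributes.
X = np.array([[30.5, 120.0, 0.62],
              [28.1,  95.0, 0.70],
              [15.0,  10.0, 0.35]])
Z = standardize(X)
print(minkowski(Z[0], Z[1], m=2))   # Euclidean distance after standardization
print(similarity(Z[0], Z[1]))       # similarity between the first two cases
```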
II. LITERATURE SURVEY

Lei Jiang defined a work on Hybrid Adaptive Niche to Improve Particle Swarm Optimization Clustering Algorithm. The paper used an adaptive niche particle swarm algorithm to improve clustering and also studied the impact of different fitness optimization functions on clustering the data [1]. Shuai Li, Xin-Jun Wang and Ying Zhang defined a work on X-SPA: Spatial Characteristic PSO Clustering Algorithm with Efficient Estimation of the Number of Clusters. In this paper the authors unify the Particle Swarm Optimization (PSO) algorithm and the Bayesian Information Criterion (BIC) and propose a numeric clustering algorithm. Chaos and spatial-characteristic ideas are involved in the algorithm to avoid the local optimum problem. Furthermore, BIC is also used to provide an efficient estimation of the number of clusters [2]. Rehab F. Abdel-Kader presented a work on Genetically Improved PSO Algorithm for Efficient Data Clustering. The proposed algorithm combines the globalized searching ability of evolutionary algorithms with the fast convergence of the k-means algorithm, and can avoid the drawbacks of both. Its performance is evaluated through several benchmark datasets. The experimental results show that the proposed algorithm is highly robust and outperforms previous approaches such as SA, ACO, PSO and k-means for the partitioned clustering problem [3]. Surat Srinoy presented a work on Combination Artificial Ant Clustering and K-PSO Clustering Approach to Network Security Model. This paper presents a nature-based data mining approach to data clustering: an artificial ant clustering algorithm is used to initially create raw clusters, and these clusters are then refined using k-means particle swarm optimization (KPSO), which has been developed as an evolutionary-based clustering technique [4].

Shafiq Alam defined a work on Particle Swarm Optimization Based Hierarchical Agglomerative Clustering. The proposed algorithm exploits the swarm intelligence of cooperating agents in a decentralized environment. The experimental results were compared with benchmark clustering techniques, including K-means, PSO clustering, Hierarchical Agglomerative Clustering (HAC) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The results are evidence of the effectiveness of swarm-based clustering and of its capability to perform clustering in a hierarchical agglomerative manner [5]. Alireza Ahmadyfard presented a novel work on Combining PSO and k-means to Enhance Data Clustering. The authors propose a clustering method based on a combination of particle swarm optimization (PSO) and the k-means algorithm. PSO was shown to converge successfully during the initial stages of a global search, but around the global optimum the search process becomes very slow; on the contrary, the k-means algorithm can achieve faster convergence to the optimum solution [6].

III. PROPOSED APPROACH

The proposed work is an improvement of the data clustering algorithm with the effect of Particle Swarm Optimization. The improvement is expected in terms of accuracy and efficiency. The proposed work involves three dimensions of improvement in data clustering. In the first dimension, the data is segmented according to the time series; in the case of temperature, for example, the time series follows the seasonal parameters. Once the data is segmented, the clustering algorithm is implemented on it.

Figure 2: Three Dimensional Clustering

The clustering algorithm performs the actual data categorization according to the data values. Here either the C-Means or the K-Means algorithm will be implemented, depending on the dataset. This work is done only once; the clustering algorithm is the seed of the proposed algorithm. Once the clustering is done, the refinement over the dataset is performed by the PSO. The PSO uses the output of the clustering algorithm and generates a hybrid algorithm. The work of the PSO is first to find the best fitness function and then to pass the clustered dataset through this fitness function. The fitness function is generated with respect to each cluster as well as the whole dataset. The fitness function accepts a particular data item and, after examining the cluster data, verifies whether or not it should belong to the cluster. To implement the PSO some parameters are required; these parameters are decided according to the dataset and the deficiencies in the clustered dataset. The basic algorithm for the PSO that will be implemented is given as under:

1: Initialize a population array of particles with random positions and velocities in D dimensions in the search space.
2: For each particle, evaluate the desired optimization fitness function in D variables.
3: Compare each particle's fitness evaluation with its pbest_i. If the current value is better than pbest_i, then set pbest_i equal to the current value, and p_i equal to the current location x_i in D-dimensional space.
4: Identify the particle in the neighborhood with the best success so far, and assign its index to the variable g.
5: Change the velocity and position of the particle according to the velocity and position update equations.
6: If a criterion is met (usually a sufficiently good fitness or a maximum number of iterations), exit.
7: If the criteria are not met, go to step 2.
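The update equations referenced in step 5 are not reproduced in this text; the sketch below is a minimal reading of steps 1-7 using the standard inertia-weight, global-best PSO update v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x), x <- x + v. The inertia weight, acceleration coefficients and search bounds are illustrative defaults, not values prescribed by the paper.

```python
import numpy as np

def pso(fitness, dim, n_particles=30, max_iter=100, w=0.7, c1=1.5, c2=1.5,
        lo=-10.0, hi=10.0, seed=0):
    """Minimal global-best PSO for minimization; parameters are illustrative."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))                # positions
    v = rng.uniform(-(hi - lo), hi - lo, (n_particles, dim))   # velocities
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    g = np.argmin(pbest_val)                                   # best particle index
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]

    for _ in range(max_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([fitness(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = np.argmin(pbest_val)
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# Example: minimize the sphere function in 5 dimensions.
best_x, best_f = pso(lambda p: float(np.sum(p ** 2)), dim=5)
print(best_x, best_f)
```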
The presented work is implemented for the weather dataset under different attributes. These attributes include temperature, rainfall, humidity, etc. The process on this dataset is performed under a three-layered approach. The algorithmic process is given as under.

Table 1: Proposed Algorithm

Step 1: Accept the whole dataset as input.
Step 2: Arrange the dataset according to years.
Step 3: Find the instance count for each year.
Step 4: Find the ThresholdCount value for the whole dataset.
Step 5: Remove the year datasets having Count(Year) < ThresholdCount.
Step 6: Represent the remaining data as the training dataset.
Step 7: Select N points as initial centroids.
Step 8: Repeat
  a. Form N clusters by assigning each point to its closest centroid.
  b. Recompute the centroid of each cluster.
Step 9: Until the centroids do not change.
Step 10: Run PSO on the initial clusters generated by the clustering algorithm.
  a. Initialize the particles (clusters).
  b. Initialize the population size and maximum iterations.
  c. Initialize the clusters to the input data.
  d. Evaluate the fitness value and accordingly find the personal best and global best positions.
Step 11: Start with an initial set of particles, typically randomly distributed throughout the design space.
Step 12: Calculate a velocity vector for each particle in the swarm.
Step 13: Update the position of each particle, using its previous position and the updated velocity vector.
Step 14: Repeat from Step 10 and exit on reaching the stopping criteria (maximum number of iterations).
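To make the three layers concrete, the sketch below strings the steps of Table 1 together on a hypothetical weather table. The column names ("Year" and the feature columns), the use of the mean instance count per year as ThresholdCount, the choice of scikit-learn's k-means for the base clustering, and the total within-cluster distance as the PSO fitness are all assumptions for illustration; the paper leaves these details open.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def layered_clustering(df, feature_cols, n_clusters=4, pso_iters=50, seed=0):
    """Sketch of the three-layer pipeline on a hypothetical weather table."""
    # Layer 1 (steps 1-6): prune years with too few instances.
    counts = df.groupby("Year").size()
    threshold = counts.mean()                     # one plausible ThresholdCount
    kept_years = counts[counts >= threshold].index
    train = df[df["Year"].isin(kept_years)]
    X = train[feature_cols].to_numpy(dtype=float)

    # Layer 2 (steps 7-9): base clustering; k-means stands in for C-Means/K-Means.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    centroids = km.cluster_centers_.copy()

    # Layer 3 (steps 10-14): PSO-style refinement of the centroid set,
    # minimizing the total distance of points to their nearest centroid.
    def fitness(flat):
        C = flat.reshape(n_clusters, -1)
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        return d.min(axis=1).sum()

    rng = np.random.default_rng(seed)
    n_particles, dim = 20, centroids.size
    spread = X.std(axis=0).mean()
    x = centroids.flatten() + rng.normal(0.0, 0.1, (n_particles, dim)) * spread
    v = np.zeros((n_particles, dim))
    pbest, pbest_val = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(pso_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        vals = np.array([fitness(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[np.argmin(pbest_val)].copy()

    refined = gbest.reshape(n_clusters, -1)
    labels = np.linalg.norm(X[:, None, :] - refined[None, :, :], axis=2).argmin(axis=1)
    return train.assign(cluster=labels), refined

# Usage (hypothetical column names):
# clustered, centers = layered_clustering(weather_df, ["Temperature", "Rainfall", "Humidity"])
```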

IV. RESULTS

The presented clustering approach is implemented on the weather dataset in the MATLAB 7.8 environment. The initial dataset contains about 10,000 instances with 7 attributes, covering the years 1970 to 2010. In the first layer, the time-series segmentation is performed and the data values of specific years are pruned, as these are found less effective. The years retained after the pruning process are listed in Table 2.

Table 2: Pruned Dataset
1971, 1972, 1974, 1976, 1977, 1978, 1979, 1981, 1984, 1986, 1987, 1988, 1991, 1993, 1994, 1996, 1998, 2000, 2001, 2003, 2005, 2009

The initial data contains 10,000 instances, and after the pruning process about 6,700 records are left. To perform the pruning, time-series segmentation has been implemented in this work. After the pruning process, the C-Means clustering is performed over the dataset. In this work, we have divided the complete dataset into 4 clusters under the different attributes. Figures 3, 4 and 5 show the outcome of the first cluster under three attributes, i.e. temperature, rain and humidity respectively.

Figure 3: Cluster 1 (Temperature)

Figure 3 shows the outcome of clustering on the temperature data. The data obtained in cluster 1 is shown: this particular cluster has about 1,600 instances and contains temperature values between 2 and 25.

Figure 4: Cluster 1 (Rain)

Figure 4 shows the outcome of clustering on the rainfall data. The data obtained in cluster 1 is shown: this particular cluster has about 1,400 instances and contains rainfall values between 150 and 160. Out of 6,700 instances, only 1,400 instances have that much rainfall.

Figure 5: Cluster 1 (Humidity)

Figure 5 shows the outcome of clustering on the humidity data. The data obtained in cluster 1 is shown: this particular cluster has about 1,500 instances and contains humidity values between 20 and 50.

V. CONCLUSION

The presented work aims to improve the clustering process by implementing a three-layered approach. The first layer is the filtration layer, which identifies the most appropriate dataset on which clustering will be performed. On the second layer, the actual clustering process is performed. Finally, the PSO is used to adjust the boundary values and to optimize the outcome of the clustering process.

REFERENCES
[1] Lei Jiang and Lixin Ding, "Hybrid Adaptive Niche to Improve Particle Swarm Optimization Clustering Algorithm," IEEE, 2011.
[2] Shuai Li and Xin-Jun Wang, "X-SPA: Spatial Characteristic PSO Clustering Algorithm with Efficient Estimation of the Number of Clusters," Fifth International Conference on Fuzzy Systems and Knowledge Discovery, IEEE, 2008.
[3] Rehab F. Abdel-Kader, "Genetically Improved PSO Algorithm for Efficient Data Clustering," 2010 Second International Conference on Machine Learning and Computing, IEEE, 2010.
[4] Surat Srinoy, "Combination Artificial Ant Clustering and K-PSO Clustering Approach to Network Security Model," IEEE, 2006.
[5] Shafiq Alam, Gillian Dobbie, Patricia Riddle and M. Asif Naeem, "Particle Swarm Optimization Based Hierarchical Agglomerative Clustering," 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.
[6] Alireza Ahmadyfard and Hamidreza Modares, "Combining PSO and k-means to Enhance Data Clustering," 2008 International Symposium on Telecommunications, IEEE, 2008.
[7] Dongyun Wang, "Clustering Research of Fabric Deformation Comfort Using Bi-swarm PSO Algorithm," World Congress on Intelligent Control and Automation, Jinan, China, 2010.
[8] Junyan Chen, "Research on Application of Clustering Algorithm Based on PSO for the Web Usage Pattern," IEEE, 2007.
[9] Jinxin Dong and Minyong Qi, "A New Clustering Algorithm Based on PSO with the Jumping Mechanism of SA," 2009 Third International Symposium on Intelligent Information Technology Application.
[10] Brendan J. Frey and Delbert Dueck, "Clustering by Passing Messages Between Data Points," Science, Vol. 315, pp. 972-976, February 2007.
[11] Bryan Conroy and Yongxin Taylor Xi, "Semi-Supervised Clustering Using Affinity Propagation," August 2009.
[12] Supporting material available at http://www.psi.toronto.edu/affinitypropagation
[13] Delbert Dueck and Brendan J. Frey, "Non-metric Affinity Propagation for Unsupervised Image Categorization," IEEE International Conference on Computer Vision (ICCV), 2007.