Regression Based Cluster Formation for Enhancement of Lifetime of WSN

K. Lakshmi Joshitha
Assistant Professor
Sri Sai Ram Engineering College, Chennai, India
lakshmijoshitha@yahoo.com

A. Gangasri
PG Scholar
Sri Sai Ram Engineering College, Chennai, India
srivedhu@yahoo.com

Abstract
The objective of the proposed system is to develop an adaptive iterative linear regression (ILR) based clustering scheme for wireless sensor networks. ILR classifies the initial cluster simultaneously in horizontal and vertical patterns, each of which forms two sub-clusters. Of these two classifications, the better one is selected based on the similarity index (SI). The selected classification is taken as the reference, and the iteration continues until the convergence criterion Delta is met. The cluster quality is evaluated using internal and external indices and then compared with existing k-means and hierarchical clustering. The performance indices indicate that ILR clustering gives better cluster quality.

Index Terms
ILR, Horizontal and Vertical classification, SI, Delta, CH, Data replication.

I. INTRODUCTION

A wireless sensor network (WSN) is a network with a number of nodes that gather information from the environment. The nodes are grouped into clusters using a clustering approach in order to minimize transmission overhead and to increase network lifetime [1]. The aim of iterative linear regression (ILR) is to form better-quality clusters, with high intra-cluster similarity and low inter-cluster similarity. Each cluster is allocated a cluster head (CH), which is responsible for communicating the gathered information to other clusters, the network or the base station through a gateway, so that the traffic load is reduced.

The rest of the paper is organized as follows. Section II deals with the related work. Section III explains the mathematical model and gives the description and definition of the terms used in the proposed work. Section IV contains the algorithm of ILR clustering. The working of the proposed system and its flow are explained in Section V.
Section VI shows the simulation results and Section VII tabulates the metrics. The conclusion and future work are given in the final section.

II. RELATED WORK

In the existing K-means clustering [2], K centroids are assigned. Each node in the wireless sensor network is assigned to its nearest centroid, which forms the initial clusters. The position of the centroid in each cluster is then recalculated; if the position of any centroid changes, the clustering process is repeated, otherwise the process stops. The drawback of this method is that it is a centralized approach: if the central node malfunctions or dies, the entire network fails. Moreover, if a packet is dropped while node information is sent to the central node, or while it is sent back from the central node to the individual nodes (which depends largely on the routing algorithm), the affected node is left out.

The existing density-based clustering method [3] starts by randomly selecting a point and checking whether the ε-neighborhood of the point contains at least a minimum number of points. If it does not, the point is considered a noise point; otherwise it is considered a core point and a new cluster is created. The method iteratively adds the data points that do not belong to any cluster and are directly density-reachable from the core points of the new cluster. When the new cluster can no longer be expanded, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) randomly selects an unvisited data point to find the next cluster, and the clustering process continues until all the points are visited and no new point is added to any cluster. A density-based cluster is therefore a set of density-connected data objects [4]. The drawback of density-based clustering is that it is difficult to handle high-dimensional data. In addition, the performance of these algorithms depends on user-defined criteria, which require prior knowledge of the domain.
Further, obtaining such prior knowledge is not feasible for large real-time data.

978-1-5090-6221-8/17/$31.00 © 2017 IEEE

The existing fuzzy-based predictive cluster head selection scheme for wireless sensor networks [5] uses the rate of recurrent communication of the sensor node (RCSN) as a parameter for cluster head selection. The RCSN of a sensor node is defined as how frequently the node communicates with the base station. The other parameters considered for cluster head selection in this method are the distance between node and base station (DNBS), the residual power of sensor nodes (RPSN), sensor node movement (SNM) and the degree of neighboring nodes (DNN). The speed of a sensor node also plays an important role in cluster head selection, because a faster-moving node may lose its energy earlier; slow-moving sensor nodes therefore get priority in CH selection. To be elected as cluster head, a sensor node should meet the conditions given in Table I.

TABLE I. PARAMETERS AND ITS CONDITIONS FOR CH ELECTION

Parameter considered for CH selection    Condition for a node to be elected as CH
RCSN                                     Low
DNBS                                     High
RPSN                                     High
SNM                                      Low
DNN                                      High
Speed of sensor node                     Slow

III. MATHEMATICAL MODEL

Let us consider that N nodes are randomly deployed. In order to perform clustering, the set of nodes is classified in horizontal and vertical manner simultaneously in the first iteration. Horizontal classification of nodes follows equation (1).

(1)

Where,

(2)
(3)

Vertical classification of nodes follows equation (4).

(4)

Where,

(5)
(6)

In the first iteration, the best classification is selected using the Similarity Index (SI). Only the classification with the lower Similarity Index is considered for the next iteration.

(7)

Where,

(8)

Consider the N nodes randomly deployed as shown in Fig. 1 (a) and (b).

Fig. 1 (a) Horizontal Classification
Fig. 1 (b) Vertical Classification

During horizontal and vertical classification, two sub-clusters are formed, namely Ci and Cj.
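The horizontal and vertical classification described above can be illustrated with a short Python sketch. It is a minimal sketch under assumptions that are not taken from the paper's equations: the horizontal classification is assumed to fit a least-squares line y = m*x + c and split the nodes above and below it, the vertical classification is assumed to fit x = m*y + c and split the nodes on either side of it, and the Similarity Index is assumed to be the summed spread of the two sub-clusters divided by the distance between their centres. The function names `fit_line`, `split`, `spread` and `similarity_index` are hypothetical.

```python
import math

def fit_line(u, v):
    """Least-squares slope m and intercept c of the line v = m*u + c."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    sxx = sum((a - u_bar) ** 2 for a in u)
    sxy = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    m = sxy / sxx if sxx else 0.0
    return m, v_bar - m * u_bar

def split(nodes, horizontal=True):
    """Split nodes into two sub-clusters (Ci, Cj) around the regression line.

    Assumption: nodes lying exactly on the line go to the first group.
    """
    xs = [p[0] for p in nodes]
    ys = [p[1] for p in nodes]
    if horizontal:
        m, c = fit_line(xs, ys)          # y regressed on x
        side = lambda p: p[1] >= m * p[0] + c
    else:
        m, c = fit_line(ys, xs)          # x regressed on y
        side = lambda p: p[0] >= m * p[1] + c
    ci = [p for p in nodes if side(p)]
    cj = [p for p in nodes if not side(p)]
    return ci, cj

def centroid(cluster):
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def spread(cluster):
    """Root-mean-square distance of the nodes from the cluster centre."""
    centre = centroid(cluster)
    return math.sqrt(sum(math.dist(p, centre) ** 2 for p in cluster) / len(cluster))

def similarity_index(ci, cj):
    """Assumed SI: intra-cluster compactness over inter-cluster separation.

    Both sub-clusters must be non-empty and have distinct centres.
    """
    return (spread(ci) + spread(cj)) / math.dist(centroid(ci), centroid(cj))

# Hypothetical deployment: two horizontal bands of nodes.
nodes = [(0, 0), (1, 0), (2, 0), (0, 10), (1, 10), (2, 10)]
si_h = similarity_index(*split(nodes, horizontal=True))
si_v = similarity_index(*split(nodes, horizontal=False))
best = "horizontal" if si_h < si_v else "vertical"
print(best, si_h, si_v)
```

For this hypothetical layout the horizontal split separates the two bands cleanly, so it has the lower SI and would be the one carried into the next iteration.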
The similarity index of each of the two sub-clusters is calculated using equations (7) and (8), and then SIH and SIV are determined using equations (9) and (10).

(9)
(10)

2017 Second International Conference On Computing and Communications Technologies (ICCCT 17)

Then the best classification is chosen based on equation (11).

SI_{Min} = \min(SI_H, SI_V)    (11)

For example, if the horizontal classification has the lower SI value, it alone is considered for further processing and the vertical classification is not taken into account, and vice versa. The iteration continues in this way until the convergence criterion Delta (Δ), calculated through equation (12), is met.

\Delta = SI_{Min}(i-1) - SI_{Min}(i)    (12)

The quality of the clusters thus formed is evaluated using the Dunn Index, which is given in equation (13).

DI = \frac{\text{minimal inter-cluster distance}}{\text{maximal intra-cluster distance}}    (13)

TABLE II. DESCRIPTIONS

Symbol        Description
x             x coordinate of nodes
y             y coordinate of nodes
x̄             Mean of x
ȳ             Mean of y
m             Slope of the regression line
c             Intercept of the regression line
i             Indicates the node
Cl            Cluster
H             Horizontal clustering
V             Vertical clustering
I1, I2, I3    Iterations
SIH           Similarity Index for the horizontal classification
SIV           Similarity Index for the vertical classification
SIHCi         Similarity Index for the sub-cluster Ci obtained in horizontal classification
SIHCj         Similarity Index for the sub-cluster Cj obtained in horizontal classification
SIVCi         Similarity Index for the sub-cluster Ci obtained in vertical classification
SIVCj         Similarity Index for the sub-cluster Cj obtained in vertical classification
SIMin         Minimum Similarity Index among SIH and SIV
SIMin(i-1)    Similarity index of previous iteration
SIMin(i)      Similarity index of current iteration
Δ             Convergence criterion
DI            Dunn Index
              Similarity Index of cluster i
              Standard deviation of nodes
              Distance between centres of two clusters
              Number of nodes
              Minimal inter-cluster distance
              Maximal intra-cluster distance

A. Related Definitions

The terms used in the proposed work are given below in detail.

1) Similarity Index (SI): The Similarity Index of a cluster is the ratio between intra-cluster compactness and inter-cluster separation. SI must be low for high-quality clusters, which are well separated and whose nodes are compact.
2) Convergence criterion (Delta): Delta (Δ) is the difference between the Similarity Index of the previous iteration and that of the current iteration.
3) Performance Indices: Performance indices evaluate the cluster quality of various clustering approaches. They are classified into two types, external and internal indices.
i. External Indices: Cluster quality evaluation based on external indices is done using benchmarks predetermined by experts.
ii. Internal Indices: Cluster quality evaluation is done using data that are available after cluster formation.
4) Dunn Index (DI): The Dunn Index, given by equation (13) for the ILR based cluster formation, is the ratio of the minimum distance between nodes of different clusters to the maximum distance between nodes of the same cluster. It is used to assess the quality of the proposed work.

IV. ITERATIVE LINEAR REGRESSION ALGORITHM

ILR algorithm for cluster formation
1. Deploy sensor nodes randomly.
2. Classify the deployed nodes in horizontal manner based on equation (1).
3. Classify the deployed nodes in vertical manner based on equation (4).
4. Calculate the Similarity Index of the horizontal and vertical classifications using equation (7).
5. Follow equations (9), (10) and (11) to select the best classification among the horizontal and vertical classifications, which is to be considered for further iteration.
6. If SIH < SIV, then consider the upper and lower groups of the horizontal classification alone for the next iteration.
7. If SIV < SIH, then consider the upper and lower groups of the vertical classification alone for the next iteration.
8. Continue the iteration process until the convergence criterion Delta given in equation (12) is met.
9. Calculate the Dunn Index using equation (13) to evaluate the cluster quality [6, 7].

V. PROPOSED SYSTEM

The proposed iterative linear regression based cluster formation is designed to provide clusters with better quality. In this method, sensor nodes are deployed in a random manner. Horizontal and vertical classifications are then performed simultaneously on the deployed nodes. Only the best classification, chosen by the value of the Similarity Index (SI), is taken into consideration for further iteration. The regression process continues until the convergence criterion Delta is met. The number of clusters obtained differs with the number and arrangement of nodes. The cluster quality of ILR based clustering is evaluated using the Dunn Index [6, 7].

The ILR (Iterative Linear Regression) based cluster formation is intended to increase the lifetime of the wireless sensor network and is useful in a variety of applications such as robotics, forest fire detection, landslide detection, healthcare, military and surveillance applications. ILR clustering is mainly useful in environmental monitoring, for sensing weather conditions through wireless sensor nodes deployed in hilly areas [8].

Fig. 2 Flow Diagram of Regression Based Cluster Formation
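The Dunn Index used in step 9 follows directly from its verbal definition (minimum distance between nodes of different clusters divided by maximum distance between nodes of the same cluster). The Python sketch below is an illustration of that definition only; the function name `dunn_index` and the representation of clusters as lists of (x, y) tuples are assumptions, and it expects at least two clusters, at least one of which holds two distinct nodes.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Ratio of the smallest inter-cluster node distance to the
    largest intra-cluster node distance.

    `clusters` is a list of clusters, each a list of (x, y) node positions.
    """
    # Smallest distance between two nodes that belong to different clusters.
    min_inter = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between two nodes that belong to the same cluster.
    max_intra = max(
        math.dist(p, q)
        for cluster in clusters
        for p, q in combinations(cluster, 2)
    )
    return min_inter / max_intra

# Hypothetical example: two compact, well separated clusters.
print(dunn_index([[(0, 0), (1, 0)], [(5, 0), (6, 0)]]))
```

A larger value means that the clusters are far apart relative to their own extent, which is why the index is used as a cluster quality measure.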

VI. SIMULATION RESULT

The proposed regression based cluster formation, together with cluster head and candidate cluster head election, is simulated using NS2. Snapshots of the simulation are given in Fig. 3 to Fig. 6.

Fig. 3 Sensor node Deployment using NS2
Fig. 4 Horizontal Classification
Fig. 5 Vertical Classification
Fig. 6 ILR based Cluster Formation for Wireless Sensor Network

VII. METRICS AND PERFORMANCE EVALUATION

Iterative Linear Regression (ILR) based cluster formation is tested by varying the number of sensor nodes as 18, 25 and 50. Table III shows parameters such as the Similarity Index (SI), the Delta value (Δ) and the optimum number of clusters formed (Cl). For the deployment of 18 nodes, three iterations (I1, I2, I3) were required to meet the optimum value of Delta (Δ), for which the threshold is set as 0.2 in the proposed work. During the first iteration (I1), the vertical classification is selected as the best classification with a similarity index of 0.54, and it is taken as the reference for the next iteration because its similarity index is the lower one. During the second iteration, the upper and lower groups of the vertical classification are each classified in horizontal and vertical manner. In the second iteration, the horizontal classification has the lower similarity indices (SIH), which are 0.4 and 0.6. Hence these classifications alone are taken into consideration for the third iteration (I3). The same procedure is repeated for the third iteration. The Dunn Index is calculated using equation (13). The smaller the deviation of Delta from its threshold value, the better the Dunn Index obtained and the better the quality of the clusters. The position of the nodes during deployment decides the number of iterations.
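The stopping rule can be illustrated with a small Python sketch. It assumes that the criterion is met once the absolute difference between the minimum Similarity Index of the previous and the current iteration falls to or below the threshold of 0.2; the exact comparison is not stated in the text, so this is an assumption. The function name `iterations_until_converged` and the SI values in the usage line are hypothetical and are not results from the paper.

```python
def iterations_until_converged(si_min_per_iteration, threshold=0.2):
    """Return (iteration number, Delta) at which Delta first meets the threshold.

    `si_min_per_iteration` lists the minimum SI selected in each iteration.
    Returns None if the criterion is never met (assumed rule: Delta <= threshold).
    """
    previous = None
    for iteration, current in enumerate(si_min_per_iteration, start=1):
        if previous is not None:
            delta = abs(previous - current)  # difference of previous and current SI
            if delta <= threshold:
                return iteration, delta
        previous = current
    return None

# Hypothetical SI values for three iterations.
print(iterations_until_converged([0.9, 0.54, 0.44]))
```

With these hypothetical values the first step changes SI by 0.36, which exceeds the threshold, and the second step changes it by about 0.10, so the process stops at the third iteration.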

TABLE III. COMPARISON ON CLUSTER FORMATION
n I1 I2 I3 Cl DI
18 25 50 0.64 0.54 0.52 0.4 0.66 0.51 0.57 0.58 0.54 0.6 0.4 0.3 0.4 0.1 8 0.2 0.06 4 0.3 0.02 4 0.5

VIII. CONCLUSION AND FUTURE WORK

The proposed work shows that ILR clustering improves cluster quality, as evaluated by the performance indices. A Cluster Head (CH) can further be elected for each cluster thus formed based on residual energy [9], and the role of Cluster Head can be changed periodically using a game theoretic approach [10, 11]. In addition, data replication [12] can be used to obtain reliable communication in the network. In case of a link failure between the Cluster Head and the sink, data replication helps to maintain communication.

REFERENCES

[1] Akyildiz, I. F., Su, W., Sankarasubramaniam, Y., & Cayirci, E. (2002). Wireless Sensor Networks: A Survey. Computer Networks, 38, 393-422.
[2] Sasikumar, P., & Khara, S. (2012, November). K-means clustering in wireless sensor networks. In Computational Intelligence and Communication Networks (CICN), 2012 Fourth International Conference on (pp. 140-144). IEEE.
[3] Amini, A., Wah, T. Y., & Saboohi, H. (2014). On density-based data streams clustering algorithms: A survey. Journal of Computer Science and Technology, 29(1), 116-141.
[4] Tarng, W., Lin, H. W., & Ou, K. L. (2012). A Cluster Allocation and Routing Algorithm based on Node Density for Extending the Lifetime of Wireless Sensor Networks. International Journal of Computer Science & Information Technology (IJCSIT).
[5] Natarajan, H., & Selvaraj, S. A Fuzzy Based Predictive Cluster Head Selection Scheme for Wireless Sensor Networks. In Proceedings of 8th International Conference on Sensing Technology.
[6] Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2002). Cluster validity methods: part I. ACM Sigmod Record, 31(2), 40-45.
[7] Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2002). Clustering validity checking methods: part II. ACM Sigmod Record, 31(3), 19-27.
[8] Prabhu, S. B., & Sophia, S. (2013). Real-world applications of distributed clustering mechanism in dense wireless sensor networks. International Journal of Computing, Communications and Networking, 2(4).
[9] Dasgupta, S., & Dutta, P. (2013). A Novel Game Theoretic Approach for Cluster Head Selection in WSN. International Journal of Innovative Technology and Exploring Engineering (IJITEE), 2(3). ISSN 2278-3075.
[10] Xu, Z., Yin, Y., Chen, X., & Wang, J. (2013). A Game-theory Based Clustering Approach for Wireless Sensor Networks. NGCIT 2013, ASTL, 58-66.
[11] Shi, H. Y., Wang, W. L., Kwok, N. M., & Chen, S. Y. (2012). Game theory for wireless sensor networks: a survey. Sensors, 12(7), 9055-9097.
[12] Zheng, J., Su, J., & Lu, X. (2004, December). A clustering-based data replication algorithm in mobile ad hoc networks for improving data availability. In International Symposium on Parallel and Distributed Processing and Applications (pp. 399-409).