ANOMALY DETECTION IN WIRELESS SENSOR NETWORK (WSN)
LAU WAI FAN
FACULTY OF COMPUTING AND INFORMATICS
UNIVERSITI MALAYSIA SABAH
2015
ABSTRACT

Wireless Sensor Networks (WSNs) are composed of a large number of randomly deployed sensor nodes and are used in signal processing and wireless communication. Besides their communication capability, they are also used to measure environmental quantities such as temperature, humidity and pressure. However, sensor nodes can become abnormal for various reasons, such as limited computational and communication capability, hardware or software faults, and limited coverage areas. A worse situation is when the sensor nodes themselves are compromised. Because of their limited resources, sensor nodes are more vulnerable to attacks. An efficient anomaly detection technique is therefore important to detect anomalies before they cause severe damage to the network. In the work presented in this thesis, the Time Series Classification (TSC) technique is used to detect anomalies in WSNs. The TSC technique used is K-Nearest Neighbour (KNN) with Euclidean distance and with Dynamic Time Warping (DTW). To apply TSC, the collected data set is transformed into point series form using a window-based technique that generates sliding windows. Accuracy, sensitivity and specificity are the metrics used to measure classification performance. KNN with Euclidean distance and with DTW is compared with other classification approaches, namely the Support Vector Machine (SVM) classifier, Naïve Bayes, Neural Networks and Decision Tree. TSC with Euclidean distance and 1-nearest neighbour (1-NN) achieved the best overall classification results, with accuracy, sensitivity and specificity of 99.63%, 65.00% and 100% respectively. Compared with the other approaches, Naïve Bayes achieved the highest sensitivity (75.00%), while TSC with Euclidean distance and 1-NN achieved the highest accuracy and specificity.
The best TSC classification results reported above were obtained with a sliding window length of 10. For KNN, the best result was achieved with K = 1.
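The window-based transformation described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes a single stream of readings from one sensor attribute, a step of one reading between consecutive windows, and a labelling rule under which a window counts as anomalous if any reading inside it is anomalous. The step and the labelling rule are assumptions; only the window length of 10 comes from the text. The function name `sliding_windows` is hypothetical.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    readings: Sequence[float],
    labels: Sequence[int],
    window: int = 10,
    step: int = 1,
) -> Tuple[List[List[float]], List[int]]:
    """Cut a stream of sensor readings into fixed-length point series.

    Assumptions (not taken from the thesis): consecutive windows start
    `step` readings apart, and a window is labelled anomalous (1) if any
    reading inside it is labelled anomalous.
    """
    if len(readings) != len(labels):
        raise ValueError("readings and labels must have the same length")
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")

    series: List[List[float]] = []
    series_labels: List[int] = []
    # Each start index yields one point series of exactly `window` readings.
    for start in range(0, len(readings) - window + 1, step):
        end = start + window
        series.append(list(readings[start:end]))
        series_labels.append(1 if any(labels[start:end]) else 0)
    return series, series_labels
```

With 12 readings and a window of 10, this yields three overlapping point series; a stream shorter than the window yields none.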
ABSTRAK

Wireless Sensor Networks (WSNs) consist of many randomly deployed sensor nodes and are commonly used in signal processing and wireless communication. Besides their communication capability, they are also used to measure temperature, humidity and pressure in the environment. However, sensor nodes can become abnormal for various reasons, such as limited computational and communication capability, hardware or software faults, and limited coverage areas. A worse situation is when the sensor nodes themselves are compromised. Because of their limited resources, sensor nodes are more vulnerable to attacks. An effective anomaly detection technique is important to detect anomalies before they cause major damage to the network. Time Series Classification (TSC) is one of the most standard and simple approaches used to detect anomalies in WSNs; as similarity measures, K-Nearest Neighbour (KNN) with Euclidean distance and with Dynamic Time Warping (DTW) were used. In the work presented in this thesis, the TSC technique is used to detect anomalies in WSNs. The TSC technique used is KNN with Euclidean distance and with DTW. To apply TSC, the collected data set must be transformed into point series form; a window-based technique that generates sliding windows is used for this transformation. Accuracy, sensitivity and specificity are the metrics used to measure classification performance. KNN with Euclidean distance and with DTW is compared with other classification approaches, namely the Support Vector Machine (SVM) classifier, Naïve Bayes, Neural Networks and Decision Tree.

TSC with Euclidean distance and 1-nearest neighbour (1-NN) achieved the best overall classification results, with accuracy, sensitivity and specificity of 99.63%, 65.00% and 100% respectively. Compared with the other approaches, Naïve Bayes achieved the highest sensitivity (75.00%), while TSC with Euclidean distance and 1-NN achieved the highest accuracy and specificity. These best TSC results were obtained with a sliding window of size 10. For KNN, the best result was achieved with K = 1.
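The three performance metrics can be computed from a binary confusion matrix, as in the sketch below. It assumes anomalous windows are the positive class (label 1) and normal windows the negative class (label 0); this encoding, the convention of returning 0.0 when a denominator is empty, and the function name `binary_metrics` are assumptions for illustration, not details from the thesis. The toy labels are invented and do not reproduce the reported results.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity with label 1 (anomalous) as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal windows kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    total = tp + tn + fp + fn
    # Assumed convention: report 0.0 when a denominator would be zero.
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
    }


# Toy example: four normal windows, two anomalous, one anomaly missed.
print(binary_metrics([0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 0]))
```

Because accuracy is a weighted average of sensitivity and specificity, a classifier can reach a high accuracy while still missing a sizeable share of anomalies when anomalous windows are rare, which is why sensitivity is reported separately.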
CHAPTER 1 INTRODUCTION

1.1 Chapter Overview

This section outlines the content of each section of Chapter 1. Section 1.2 introduces Wireless Sensor Networks (WSNs). Section 1.3 discusses the problem background of WSNs. Section 1.4 lists the research questions, from which the objectives in Section 1.5 are derived. Section 1.6 lists the project scopes. Section 1.7 describes the organisation of the report and briefly outlines the following chapters.

1.2 Introduction

Wireless Sensor Networks (WSNs) are a rapidly growing area of research (Akshay, An Efficient Approach for Sensor Deployments in Wireless Sensor Network, 2010); they can be used to monitor a given field of interest for changes in the environment. WSNs consist of a large number of small, low-cost sensor nodes (M. Dagar, 2013) and have many applications in habitat monitoring, disaster management, security and the military. WSNs carry out complex, large-scale environmental monitoring and tracking tasks over a wide range of network areas. Wireless transmission over such large network areas (Y. Zhang, 2010) makes WSNs vulnerable to all kinds of malicious attacks, such as Denial of Service (DoS) attacks, black hole attacks and eavesdropping. (S. Fayssal, 2007) Due to the significant increase in wireless attacks, many wireless intrusion detection and prevention systems have been proposed. An effective Wireless Intrusion Detection System (WIDS) is needed to detect anomalies in the network early, so that action against them can be taken before critical damage is done to the network. (M. Rassam, A Survey of Intrusion Detection Schemes in Wireless Sensor Networks, 2012) Prevention-based security approaches such as cryptography, authentication and key management have been used to protect WSNs from different kinds of attacks, while detection-based approaches have been proposed to protect WSNs from insider attacks and to act as a second line of defence when prevention-based approaches fail. In this research, we focus on implementing anomaly-based detection using the Time Series Classification (TSC) technique.

1.3 Problem Background

Recently, WSNs have attracted many researchers to work on various related issues because of their attractive features. Normally, (M. Rassam, A Survey of Intrusion Detection Schemes in Wireless Sensor Networks, 2012) a WSN is formed by tens to thousands of tiny sensors densely deployed in an unattended environment. Although WSNs have been widely used in various fields, they have limitations. (D. Bhattacharyya, 2010) The sensor nodes in a WSN are constrained by limited processing capability, very low storage capacity and limited communication bandwidth. (S. K. Singh, 2010) These limitations stem from the limited energy and physical size of the sensor nodes. Because of these constraints, it is difficult to employ conventional security mechanisms directly in WSNs. (Z. Hu, 2004) WSNs possess two fundamental characteristics, namely multihop transmission and constrained energy sources. These two characteristics have important implications for the fundamental performance limits of WSNs. (R. Jurdak, 2011) WSNs can experience unexpected problems during deployment due to hardware, software or environmental anomalies.
Anomaly detection in WSNs is important (R. Sutharshan, 2008) to detect abnormal behaviour, which can be caused by malicious attacks or intrusions on a network, faulty sensors in the network, or unusual phenomena in the monitored domain. (K. Leonid, 2014) Different approaches to the anomaly detection problem have been reviewed based on their applications and specific features. Recently, researchers have been searching for the most suitable Time Series Classification (TSC) technique to detect anomalies in WSNs. In some of this research, (S. Stan, 2004) methods for generating models that can detect anomalies in time series data have been studied.

1.4 Research Questions

1. How can the collected WSN data be represented in point series form?
2. How can the Time Series Classification (TSC) technique be applied to detect anomalies in the data generated in (1)?

1.5 Objectives

1. To investigate and identify a feature extraction method that can be used to represent WSN data in point series form.
2. To implement Time Series Classification (TSC) using K-Nearest Neighbour (KNN) with Dynamic Time Warping (DTW) or Euclidean distance on the data features generated in (1).
3. To compare the performance of the techniques in (2) with other approaches, namely Support Vector Machine (SVM), Neural Networks, Naïve Bayes and Decision Tree.

1.6 Project Scopes

1) Extract the features in point series form.
2) Implement two types of Time Series Classification (TSC) technique, namely K-Nearest Neighbour (KNN) with Dynamic Time Warping (DTW) and KNN with Euclidean distance.
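The two TSC variants named in the objectives can be sketched as a K-nearest-neighbour vote over point series, with either Euclidean distance or DTW as the similarity measure. This is a minimal sketch under stated assumptions, not the thesis's code: the DTW here is the classic unconstrained dynamic-programming formulation with a squared point cost (whether the thesis used a warping-window constraint is not stated), and the function names `euclidean`, `dtw` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Series = Sequence[float]


def euclidean(a: Series, b: Series) -> float:
    """Point-by-point Euclidean distance; both series must have equal length."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a: Series, b: Series) -> float:
    """Unconstrained DTW (assumed variant): O(n*m) table of cumulative squared costs."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def knn_predict(
    train_X: List[Series],
    train_y: List[int],
    query: Series,
    k: int = 1,
    distance: Callable[[Series, Series], float] = euclidean,
) -> int:
    """Majority label among the k nearest training series (k = 1 gives 1-NN)."""
    ranked = sorted(range(len(train_X)), key=lambda i: distance(query, train_X[i]))
    votes = [train_y[i] for i in ranked[:k]]
    # most_common is stable, so ties go to the label whose nearest member ranks first.
    return Counter(votes).most_common(1)[0][0]


train_X = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
train_y = [0, 1]  # 0 = normal window, 1 = anomalous window
print(knn_predict(train_X, train_y, [4.9, 5.1, 5.0]))
print(knn_predict(train_X, train_y, [0.1, 0.0, 0.2], distance=dtw))
```

Euclidean distance compares readings at the same positions, whereas DTW lets one series stretch against the other, so a pattern shifted by a reading or two can still score as similar. The trade-off is cost: DTW fills an n × m table per comparison, against a single pass for Euclidean distance.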
1.7 Organisation of Report

Chapter 1 briefly explains WSNs and their uses, and introduces the Time Series Classification (TSC) technique for anomaly detection in WSNs. Chapter 2 reviews previous work on anomaly detection in WSNs and how the TSC technique can be applied. Chapter 3 presents the flow of the methodology, showing the steps of the experimental implementation; it consists of three main steps: data set collection, the proposed methodology, and results analysis. Chapter 4 discusses the proposed methodology in further detail, and its framework illustrates the steps followed when conducting the experiment. Chapter 5 analyses the performance of the classifiers and discusses and justifies the findings. Chapter 6 discusses the limitations and future work.