Intrusion Detection using Neural Network Committee Machine

2013 XXIV International Conference on Information, Communication and Automation Technologies (ICAT)
October 30 - November 01, 2013, Sarajevo, Bosnia and Herzegovina

Alma Husagic-Selman
Department of Computer Science and Engineering, International University of Sarajevo, Sarajevo, BiH
Email: aselman@ius.edu.ba

Rasit Koker
Department of Computer Engineering, Sakarya University, Sakarya, Turkey
Email: rkoker@sakarya.edu.tr

Suvad Selman
Department of Electrical and Electronics Engineering, International University of Sarajevo, Sarajevo, BiH
Email: sselman@ius.edu.ba

Abstract: Intrusion detection plays an important role in today's computer and communication technology. It is therefore very important to design a time-efficient Intrusion Detection System (IDS) that is low in both False Positive Rate (FPR) and False Negative Rate (FNR), but high in attack detection precision. To achieve that, this paper proposes a Neural Network Committee Machine (NNCM) IDS. The NNCM IDS consists of an Input Reduction System based on Principal Component Analysis (PCA) and an Intrusion Detection System, represented by a three-level committee machine in which each level is based on a Back-Propagation Neural Network. To reduce the FNR, the system uses an offline System Update, which retrains the networks when new attacks are introduced. The system shows an overall attack detection success of 99.8%.

Keywords: Intrusion detection, Neural Networks, Committee Machine, Intelligent Intrusion Detection System

I. INTRODUCTION

Since the second half of the last century, computer networks have grown at tremendous speed, and with them the need for security mechanisms that ensure data, privacy and computer security. Many different security mechanisms have been designed, yet none has been reliable enough to protect computer-network systems from ever-evolving threats and attacks.
Firewalls were designed to protect networks from attacks that come from the outside world, but they do not stop intrusions that originate inside the network. Intrusion Detection Systems (IDS), on the other hand, monitor network packets in order to prevent any form of computer attack from within the network [1]-[4]. This work focuses on IDS, since existing commercial IDSs leave considerable room for improvement.

In general, an IDS may be designed to perform misuse detection or anomaly detection [1]-[5]. In misuse detection, all known abnormal behavior is defined and the system is trained to recognize it. It works by comparing each arriving packet with the features of known attack behavior. If a new attack that has not been predefined arrives, the system recognizes it as a normal packet, which causes a high FNR [2]. To avoid a very high FNR, a misuse-based IDS must be retrained very often, sometimes causing delays in the network [6]. Anomaly detection is modeled on normal behavior [7], so any pattern violating that behavior is treated as an attack [1]-[5]. Anomaly detection causes a high FPR, because even some new normal packets, unknown to the system, are identified as attacks. This degrades overall network performance, since some normal packets never reach their destination. For these reasons, most commercial IDSs are designed to perform misuse detection alone.

False alarms, whether false positives or false negatives, limit the performance of an IDS. It is therefore very important to reduce both types of error, and the best way to do so is to combine anomaly and misuse detection [5]-[8]. This paper proposes a fast, efficient and accurate IDS based on a committee machine. The system is updated offline whenever the threshold for unrecognized packets is reached. The system produces five outputs: normal packet and four attack types (DoS, U2R, R2L and Probe) [9].
The paper is organized as follows: related works are discussed in Section II, Section III gives an overview of the methods and algorithms used in this work, Section IV presents the data used for experimentation, Section V describes the system model, Section VI presents and discusses the results, and Section VII concludes the paper.

II. RELATED WORKS

Different machine learning mechanisms, including Artificial Neural Networks, Fuzzy Logic and Genetic Algorithms, have been used on the KDD CUP 1999 data for intrusion detection [1]-[10]. Different neural network algorithms have been used, including Grey Neural Networks [4], RBF [10], [11], Recirculation Neural Networks [2], PCA [6]-[12] and MLP [5], with MLP generally showing better results than the others [2]. These works mainly focus on misuse detection. In order to combine misuse and anomaly detection, many researchers have recently attempted hybrid methods that combine neural networks with other machine learning mechanisms, such as fuzzy logic or genetic algorithms [1], [5], [13]-[15]. A summary of these results is presented in Table I. Table II shows the FPR and FNR for different neural network and hybrid classification algorithms.

TABLE I. SUMMARY OF RESULTS FOR DIFFERENT NEURAL NETWORKS AND HYBRID SYSTEMS

Approach                 DoS, %   Probe, %   R2L, %   U2R, %
Decision Trees            99.80      50.00    33.30    50.00
Bayesian Networks         99.70      52.60    46.20    25.00
Flexible Neural Tree      98.80      99.30    98.80    99.90
Fuzzy NN                 100.00     100.00    99.80    40.00
MLP                       99.90      48.10    93.20    83.30
Advanced NN               98.97      94.62    97.02    59.00
Evolving Fuzzy NN         98.99      99.88    97.26    65.00
Recirculation NN          97.89      98.15    98.22   100.00
PCA & Grey NN             68.00      88.00    58.00    26.00
Fuzzy C-Mean and MLP     100.00      99.80    40.00   100.00
ANFIS, FIS & GA           99.70      84.97    31.68    16.67

TABLE II. SUMMARY OF FPR AND FNR FOR DIFFERENT CLASSIFICATION ALGORITHMS

Approach                        FNR, %   FPR, %
Flexible Neural Tree              1.20     0.30
MLP                               5.80     0.80
Clusterization                    7.00    10.00
K-NN                              9.00     8.00
SVM                               2.00    10.00
Recirculation Neural Networks     1.83     0.03
Fuzzy C-Mean and MLP              0.01     0.01

978-1-4799-0431-0/13/$31.00 © 2013 IEEE

III. ALGORITHMS AND METHODS

Multiple methods are used in this work: PCA for feature reduction, feed-forward Multilayer Perceptron Neural Networks for classification, and a Committee Machine as a boosting mechanism.

A. Principal Component Analysis

PCA is a very useful mathematical algorithm, based on an orthogonal linear transformation, which is widely used for data compression, image processing and feature extraction [6]-[12]. The goal of PCA is to find a set of orthogonal components that minimize the error in the reconstructed data. An equivalent formulation is to find an orthogonal set of vectors that maximize the variance of the projected data [16]. In other words, PCA transforms the data into a different frame of reference with minimal error, using fewer features than the original data while preserving most of the variance in the data [17]. For a more detailed description of the PCA algorithm, refer to [16], [17].

B. Neural Networks

Neural networks are mathematical algorithms whose design was inspired by the biological neural networks found in living organisms. Neural networks produce efficient solutions to a wide range of problems, from simple classification to data compression and image processing [18]. They can be divided into two main categories: feed-forward and recurrent networks. In feed-forward networks, data flows from input to output cells, which can be grouped into layers, and no feedback interconnections exist. Recurrent networks, on the other hand, contain feedback loops, and their dynamical properties are very important. The type of neural network most commonly used in pattern classification tasks is the feed-forward network, which is constructed from layers and has unidirectional weighted connections between neurons [18], [19].

A feed-forward network has a layered structure (Figure 1), with an input layer, a hidden layer and an output layer. Each layer consists of units, where each unit receives its inputs from units in the layer below and sends its output to units in the layer directly above it. There are no connections among the units of a single layer. Common examples of this category are Multilayer Perceptron and Radial Basis Function networks [18].

Fig. 1. Feed-forward network (MLP) with N inputs, one output and one hidden layer

A Multilayer Perceptron (MLP) is defined by establishing the number of neurons from which it is built. This process can be divided into three parts: obtaining the number of inputs, defining the number of outputs, and deciding the number of hidden layers. The last part can be crucial to the accuracy of the classification results [19]. The numbers of input and output neurons can be seen as an external specification of the network, and these parameters are usually found in the task specification. The number of inputs is determined by the number of features that characterize the data, and the outputs typically reflect the number of classes [20].

Fig. 2. Activation function (logistic function) used at neurons

Each neuron within the hidden layer is represented by a transfer function, known as the activation function.
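The layered structure described above can be sketched as a single forward pass through a network with N inputs, one hidden layer and one output, as in Fig. 1. This is only a minimal illustration: the weights are made up, the function names are hypothetical, and applying the logistic function of Fig. 2 at the output unit as well as in the hidden layer is an assumption.

```python
import math
from typing import Sequence

def logistic(x: float) -> float:
    """Logistic activation: maps a real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(
    inputs: Sequence[float],
    hidden_weights: Sequence[Sequence[float]],
    hidden_biases: Sequence[float],
    output_weights: Sequence[float],
    output_bias: float,
) -> float:
    """One forward pass: inputs -> hidden layer -> single output unit."""
    # Each hidden unit sees every input; units in a layer are not connected.
    hidden = [
        logistic(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output unit sees every hidden unit (assumed logistic as well).
    return logistic(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

# Hypothetical weights for a network with 3 inputs and 2 hidden units.
y = forward(
    inputs=[0.5, -1.0, 2.0],
    hidden_weights=[[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.0, -1.0],
    output_bias=0.0,
)
print(y)  # a value strictly between 0 and 1
```

In an intrusion detection setting the inputs would be the features of a record and the output would be thresholded to decide between "normal" and "attack".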
The transfer function should be able to accept an input within any range and to produce an output in a strictly limited range. Figure 2 shows one of the most common transfer functions, the logistic function. In this case, the output is in the range (0, 1), and the function is sensitive to inputs in a range not much larger than (-1, +1). This function is also smooth and easily differentiable. These properties are critical in allowing the network training algorithms to operate [21].

In order to produce the desired set of output states whenever a set of inputs is presented, a neural network has to be configured by setting the strengths, or weights, of the interconnections. This step is known as the learning procedure [21]. Learning rules are roughly divided into three categories: supervised, unsupervised and reinforcement learning methods. For descriptions of these algorithms, refer to [18]-[23].

Back-propagation (backward propagation of errors) is a supervised learning algorithm and the one most useful for feed-forward networks. The algorithm can be divided into two main phases: propagation and weight update. In the propagation phase, inputs are passed to the hidden layer, where the initial weights were set. The resulting outputs are evaluated against the desired outputs, and the error is calculated. In the second phase, the calculated error is used to update the neuron weights. The process continues until optimum weights are obtained [18]-[20].

The problem with back-propagation is slow convergence, which can be improved by using the Scaled Conjugate Gradient (SCG) back-propagation learning mechanism. SCG belongs to the class of Conjugate Gradient Methods, which are characterized by super-linear convergence on most problems [22]. Møller (1993) has shown that SCG is at least an order of magnitude faster than standard back-propagation learning. This speed-up depends on the convergence criterion: the greater the demanded reduction in error, the greater the speed-up. By using a step-size scaling mechanism, SCG avoids a time-consuming line search per learning iteration, which makes the algorithm faster than some other learning algorithms [24].

C. Committee Machine

In this work, artificial neural network committee machines with feed-forward multilayer perceptrons and the back-propagation learning algorithm are used. In committee machines, a complex computational task is solved by dividing it into a number of computationally simple tasks and then combining the solutions of these tasks [18].
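The divide-and-combine idea can be sketched with a simple majority combiner. The three experts below are hypothetical threshold rules standing in for trained networks, and the function name is illustrative; majority voting is only one of several possible ways to combine expert outputs, and for three experts it amounts to agreement of at least two.

```python
from typing import Callable, Sequence

# An expert maps a feature vector to 1 (attack) or 0 (normal).
Expert = Callable[[Sequence[float]], int]

def committee_decision(experts: Sequence[Expert], record: Sequence[float]) -> int:
    """Return the majority decision of the experts for one record."""
    votes = sum(expert(record) for expert in experts)
    return 1 if 2 * votes > len(experts) else 0

# Hypothetical experts: simple threshold rules, each looking at one feature.
experts = [
    lambda r: 1 if r[0] > 0.5 else 0,
    lambda r: 1 if r[1] > 0.5 else 0,
    lambda r: 1 if r[2] > 0.5 else 0,
]

print(committee_decision(experts, [0.9, 0.8, 0.1]))  # two of three vote "attack" -> 1
print(committee_decision(experts, [0.9, 0.2, 0.1]))  # only one votes "attack" -> 0
```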
In other words, computational simplicity is achieved by distributing the learning task among a number of experts, which in turn divide the input space into a set of subspaces. The combination of these experts is said to constitute a committee machine. Basically, it fuses the knowledge acquired by the experts to arrive at an overall decision that is superior to that attainable by any one of them acting alone [20]. This method reduces the FP and FN errors and improves the performance of the system [20].

IV. DATA DESCRIPTION

The data used in this work is the widely used KDD CUP 1999 data set, which was created from the DARPA Intrusion Detection data set collected by MIT Lincoln Laboratory [9]. The data contains 41 features, specifying packet type, protocol and so on, and a class label specifying whether the packet is normal or an attack. The data set contains 22 attack types, which can be divided into four main categories [9], as follows:

- Denial of Service (DoS): denies service to legitimate users, most commonly by overloading existing resources. Six of the 22 attacks fall into this group.
- User-to-Root (U2R): a user with normal user privileges tries to exploit vulnerabilities of the system in order to gain root access. Four of the 22 attacks fall into this group.
- Remote-to-Local (R2L): an unauthorized user on a remote machine tries to access a local machine by exploiting holes in it. Eight of the 22 attacks fall into this group.
- Probing (Probe): an unauthorized user monitors the network in order to obtain information and discover system vulnerabilities. Four of the 22 attacks fall into this group.

The original KDD CUP 1999 training data, consisting of about 5 million records, was too large to analyze, so the concise set known as the 10% training set was used. Out of this concise set of 500 000 records, 13094 records were chosen for training and 6900 records were used for testing.

V. INTRUSION DETECTION MODEL

The Intrusion Detection Model was designed based on anomaly detection, with the Committee Machine recognizing whether an attack exists or not. As mentioned earlier, anomaly detection has a low FNR, and the system update is designed in order to reduce the FPR. Thus, the proposed Intrusion Detection Model consists of three parts: the Input Reduction System, the Committee Machine and the System Update (Figure 3).

Fig. 3. Proposed Intrusion Detection Model

A. Input Reduction System

In systems with large data sets characterized by numerous features, input (feature) reduction should be performed whenever possible. This step helps remove distracting variance from the data set, and as such improves the performance of the classifier and speeds up the classification process. In this work, a single PCA neural network was chosen as the tool for feature reduction. The PCA neural network takes the original 41 inputs and reduces the input size to 13. A general overview of the input reduction system is shown in Figure 4.

Fig. 4. Input/Feature Reduction System

B. Committee Machine

The committee consists of three Back-Propagation Neural Networks set in parallel. Each neural network receives its inputs from the Input Reduction System (IRS) and produces one output, which states whether an attack exists or not (Figure 5).

Fig. 5. Committee Machine with three MLP Neural Networks and voting scheme

In order to train and test the Committee Machine, the test data was divided into three sets, each containing 2300 data records. The initial training set of 13094 data records was used to train the first neural network. The network was then tested with the first test data set, and all test records that the first neural network failed to classify were fed into the training set of the second neural network. The second network was then trained and tested, and all failed test records were fed into the training set of the third neural network. This way of training enables each neural network within the committee machine to become an expert for a different set of records. The voting scheme was designed to choose the output based on the agreement of at least two neural networks.

C. System Update

The system update consists of a database used to hold undefined new packets. Once undefined packets are detected, the system update retrains the networks offline and then updates the weights of the committee machine. Figure 6 shows the flowchart for the System Update.

Fig. 6. System Update flowchart

VI. RESULTS AND DISCUSSION

Simulation showed very good results for the proposed model, with 99.6% classification success. The following figures present the results. Figure 7 shows the confusion matrix for the test data of the third neural network. This network was trained to be an expert for data that the first two neural networks within the committee machine could not recognize. The classification success was 99.6%, with 0.2% FPR (4 records) and 0.3% FNR (7 records).

Fig. 7. Confusion matrix for third neural network within Committee Machine

Figure 8 shows the confusion matrix for the overall data, which includes training, test and validation data. The results show a classification success of 99.8%, with 0.1% FPR (23 records) and 0.1% FNR (13 records).

Fig. 8. Confusion Matrix for overall data
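For reference, classification success, FPR and FNR can be derived from the four cells of a binary confusion matrix as sketched below. The counts are made up for illustration and are not the results of this work; the function name is hypothetical. The rates use the conventional per-class definitions, which can differ from percentages expressed relative to all records, as some confusion-matrix plots show them.

```python
def detection_rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute accuracy, FPR and FNR, treating "attack" as the positive class."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,  # share of correctly classified records
        "fpr": fp / (fp + tn),          # normal records flagged as attacks
        "fnr": fn / (fn + tp),          # attacks passed as normal
    }

# Hypothetical counts: 1000 attack records and 1000 normal records.
rates = detection_rates(tp=980, fn=20, fp=10, tn=990)
for name, value in rates.items():
    print(f"{name}: {value:.3%}")
```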

Figure 9 shows the receiver operating characteristic (ROC) curve for each output class, which plots sensitivity against 1 - specificity for the observations' predicted event probabilities. According to [24], the closer the ROC curve approaches the top-left corner of the plot, the better the classification. From the figure it is clear that beyond an FPR of 0.1% the classification is almost at its maximum.

Fig. 9. Receiver Operating Characteristics

The last figure (Figure 10) shows the mean squared error (mse) for the classification. The best classification was obtained at epoch 37, with an mse of 0.00405. At this point, the test and validation errors reach their best common minimum.

Fig. 10. Mean Squared Error (mse)

VII. CONCLUSION

In this paper we have proposed a new model for an Intrusion Detection System. The model consists of three parts: an input reduction system, an intrusion detection system and an offline system update. The input reduction system is based on PCA, and it reduces the number of inputs from 41 to 13. The intrusion detection system is based on a committee machine, which performs attack classification. It consists of three feed-forward multilayer perceptron neural networks with back-propagation learning, and a voting scheme based on the combination of the best outputs. The system update should be done offline once the threshold for recorded unknown packets is reached. Generally, the system update retrains the neural networks for new, unknown attacks. The overall system showed a classification success of 99.8%, with false positive and false negative rates of 0.1% each, and a mean squared error of 0.004.

REFERENCES

[1] S. Chavan, K. Shah, N. Dave, S. Mukherjee, A. Abraham, and S. Sanyal, "Adaptive neuro-fuzzy intrusion detection systems," in Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. International Conference on, vol. 1, 2004, pp. 70-74.
[2] P. Kachurka and V. Golovko, "Neural network approach to real-time network intrusion detection and recognition," in Intelligent Data Acquisition and Advanced Computing Systems (IDAACS), 2011 IEEE 6th International Conference on, vol. 1, 2011, pp. 393-397.
[3] J. Zhao, M. Chen, and Q. Luo, "Research of intrusion detection system based on neural networks," in Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference on, 2011, pp. 174-178.
[4] D.-X. Xia, S.-H. Yang, and C.-G. Li, "Intrusion detection system based on principal component analysis and grey neural networks," in Networks Security Wireless Communications and Trusted Computing (NSWCTC), 2010 Second International Conference on, vol. 2, 2010, pp. 142-145.
[5] M. Muna and M. Mehrotra, "Design network intrusion detection system using hybrid fuzzy-neural network," International Journal of Computer Science and Security, vol. 4, no. 3, pp. 258-294, 2010.
[6] F. Song, Z. Guo, and D. Mei, "Feature selection using principal component analysis," in System Science, Engineering Design and Manufacturing Informatization (ICSEM), 2010 International Conference on, vol. 1, 2010, pp. 27-30.
[7] D.-K. Kang, D. Fuller, and V. Honavar, "Learning classifiers for misuse and anomaly detection using a bag of system calls representation," in Information Assurance Workshop, 2005. IAW '05. Proceedings from the Sixth Annual IEEE SMC, 2005, pp. 118-125.
[8] G. Wang, J. Hao, J. Ma, and L. Huang, "A new approach to intrusion detection using artificial neural networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225-6232, 2010.
[9] (1999, October) KDD Cup 99 competition. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[10] C. Zhang, J. Jiang, and M. Kamel, "Intrusion detection using hierarchical neural networks," Pattern Recognition Letters, vol. 26, no. 6, pp. 779-791, 2005.
[11] U. Ahmed and A. Masood, "Host based intrusion detection using RBF neural networks," in Emerging Technologies, 2009. ICET 2009. International Conference on, 2009, pp. 48-51.
[12] S. Lakhina, S. Joseph, and B. Verma, "Feature reduction using principal component analysis for effective anomaly-based intrusion detection on NSL-KDD," International Journal of Engineering Science and Technology, vol. 2, no. 6, pp. 1790-1799, 2010.
[13] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Multimedia Technology (ICMT), 2010 International Conference on, 2010, pp. 1-4.
[14] A. N. Toosi and M. Kahani, "A new approach to intrusion detection based on an evolutionary soft computing model using neuro-fuzzy classifiers," Computer Communications, vol. 30, no. 10, pp. 2201-2212, 2007.
[15] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Neural Networks, 2002. IJCNN '02. Proceedings of the 2002 International Joint Conference on, vol. 2, 2002, pp. 1702-1707.

[16] K. I. Diamantaras and S. Y. Kung, Principal Component Neural Networks. John Wiley & Sons, New York, 1996.
[17] I. Jolliffe, Principal Component Analysis. Springer-Verlag, 1986.
[18] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Prentice-Hall, Upper Saddle River, New Jersey, 1999.
[19] J. Goldsmith, "Unsupervised learning of the morphology of a natural language," Computational Linguistics, vol. 27, no. 2, pp. 153-198, 2001.
[20] S. Selman and A. Husagic-Selman, "Multilayered feedforward neural networks as a tool for distinction of the authors of texts," in Information, Communication and Automation Technologies (ICAT), 2011 XXIII International Symposium on, 2011, pp. 1-6.
[21] R. E. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, pp. 192-227, 1990.
[22] M. F. Møller, "A scaled conjugate gradient algorithm for fast supervised learning," Neural Networks, vol. 6, no. 4, pp. 525-533, 1993.
[23] N. J. Nilsson, Learning Machines: Foundations of Trainable Pattern-Classifying Systems. McGraw-Hill, New York, 1965.
[24] M. H. Beale, M. T. Hagan, and H. B. Demuth, Neural Networks Toolbox. User's Guide, MatLab R2012a. MathWorks Inc., 2012.