Towards an Efficient Anomaly-Based Intrusion Detection for Software-Defined Networks

Similar documents
* This manuscript has been accepted for publication in IET Networks.

A Technique by using Neuro-Fuzzy Inference System for Intrusion Detection and Forensics

Selecting Features for Intrusion Detection: A Feature Relevance Analysis on KDD 99 Intrusion Detection Datasets

Network attack analysis via k-means clustering

Unsupervised clustering approach for network anomaly detection

Combination of Three Machine Learning Algorithms for Intrusion Detection Systems in Computer Networks

Ranking and Filtering the Selected Attributes for Intrusion Detection System

This is a repository copy of Deep Learning Approach for Network Intrusion Detection in Software Defined Networking.

CHAPTER 7 Normalization of Dataset

CHAPTER V KDD CUP 99 DATASET. With the widespread use of computer networks, the number of attacks has grown

CHAPTER 4 DATA PREPROCESSING AND FEATURE SELECTION

Discriminant Analysis based Feature Selection in KDD Intrusion Dataset

INTRUSION DETECTION WITH TREE-BASED DATA MINING CLASSIFICATION TECHNIQUES BY USING KDD DATASET

Data Mining Approaches for Network Intrusion Detection: from Dimensionality Reduction to Misuse and Anomaly Detection

Two Level Anomaly Detection Classifier

An Intrusion Prediction Technique Based on Co-evolutionary Immune System for Network Security (CoCo-IDP)

Performance improvement of intrusion detection with fusion of multiple sensors

Intrusion Detection of Multiple Attack Classes using a Deep Neural Net Ensemble

Learning Intrusion Detection: Supervised or Unsupervised?

A Review on Performance Comparison of Artificial Intelligence Techniques Used for Intrusion Detection

A Hierarchical SOM based Intrusion Detection System

INTRUSION DETECTION MODEL IN DATA MINING BASED ON ENSEMBLE APPROACH

On Dataset Biases in a Learning System with Minimum A Priori Information for Intrusion Detection

Anomaly Intrusion Detection System Using Hierarchical Gaussian Mixture Model

Intrusion detection system with decision tree and combine method algorithm

Available online at ScienceDirect. Procedia Computer Science 89 (2016 )

Comparative Analysis of Classification Algorithms on KDD 99 Data Set

Deep Feature Extraction for multi-class Intrusion Detection in Industrial Control Systems

Comparison of variable learning rate and Levenberg-Marquardt back-propagation training algorithms for detecting attacks in Intrusion Detection Systems

Big Data Analytics: Feature Selection and Machine Learning for Intrusion Detection On Microsoft Azure Platform

Towards A New Architecture of Detecting Networks Intrusion Based on Neural Network

Intrusion Detection Based On Clustering Algorithm

INTRUSION DETECTION SYSTEM

A Study on NSL-KDD Dataset for Intrusion Detection System Based on Classification Algorithms

CHAPTER 2 DARPA KDDCUP99 DATASET

Distributed Detection of Network Intrusions Based on a Parametric Model

Flow-based Anomaly Intrusion Detection System Using Neural Network

FUZZY KERNEL C-MEANS ALGORITHM FOR INTRUSION DETECTION SYSTEMS

RUSMA MULYADI. Advisor: Dr. Daniel Zeng

Classification of Attacks in Data Mining

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) PROPOSED HYBRID-MULTISTAGES NIDS TECHNIQUES

Analysis of KDD 99 Intrusion Detection Dataset for Selection of Relevance Features

IDuFG: Introducing an Intrusion Detection using Hybrid Fuzzy Genetic Approach

Feature Selection in the Corrected KDD -dataset

Modeling Intrusion Detection Systems With Machine Learning And Selected Attributes

Extracting salient features for network intrusion detection using machine learning methods

Performance Analysis of various classifiers using Benchmark Datasets in Weka tools

A Comparative Study of Supervised and Unsupervised Learning Schemes for Intrusion Detection. NIS Research Group Reza Sadoddin, Farnaz Gharibian, and

Network Intrusion Detection Using Deep Neural Networks

Analysis of neural networks usage for detection of a new attack in IDS

Independent degree project - first cycle Bachelor s thesis 15 ECTS credits

Review on Data Mining Techniques for Intrusion Detection System

A Neural Network Based Intrusion Detection System For Wireless Sensor Networks

Detection of DDoS Attack on the Client Side Using Support Vector Machine

An Optimized Genetic Algorithm with Classification Approach used for Intrusion Detection

Intrusion Detection System with FGA and MLP Algorithm

A Rough Set Based Feature Selection on KDD CUP 99 Data Set

Using MongoDB Databases for Training and Combining Intrusion Detection Datasets

A Detailed Analysis on NSL-KDD Dataset Using Various Machine Learning Techniques for Intrusion Detection

Performance Analysis of Big Data Intrusion Detection System over Random Forest Algorithm

Enhanced Multivariate Correlation Analysis (MCA) Based Denialof-Service

A Survey And Comparative Analysis Of Data

Fast Feature Reduction in Intrusion Detection Datasets

Analysis of Feature Selection Techniques: A Data Mining Approach

DDoS Detection in SDN Switches using Support Vector Machine Classifier

Hybrid Feature Selection for Modeling Intrusion Detection Systems

2017 IEEE 31st International Conference on Advanced Information Networking and Applications

Network Intrusion Detection System: A Machine Learning Approach

Intrusion Detection -- A 20 year practice. Outline. Till Peng Liu School of IST Penn State University

Multiple Classifier Fusion With Cuttlefish Algorithm Based Feature Selection

Cyber Attack Detection and Classification Using Parallel Support Vector Machine

ANOMALY-BASED INTRUSION DETECTION THROUGH K- MEANS CLUSTERING AND NAIVES BAYES CLASSIFICATION

DDoS Attacks Classification using Numeric Attribute-based Gaussian Naive Bayes

Mining Audit Data for Intrusion Detection Systems Using Support Vector Machines and Neural Networks

Collaborative Security Attack Detection in Software-Defined Vehicular Networks

Leveraging SDN for Collaborative DDoS Mitigation

Deep Learning Approach to Network Intrusion Detection

Anomaly Intrusion Detection System using Hamming Network Approach

Classification of BGP Anomalies Using Decision Trees and Fuzzy Rough Sets

Abnormal Network Traffic Detection Based on Semi-Supervised Machine Learning

Optimized Intrusion Detection by CACC Discretization Via Naïve Bayes and K-Means Clustering

Network Traffic Anomaly Detection Based on Packet Bytes ABSTRACT Bugs in the attack. Evasion. 1. INTRODUCTION User Behavior. 2.

Keywords: Intrusion Detection System, k-nearest neighbor, Support Vector Machine, Primal Dual, Particle Swarm Optimization

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 4, Issue 7, January 2015

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Applying Packet Score Technique in SDN for DDoS Attack Detection

Network Traffic Measurements and Analysis

IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 06, 2014 ISSN (online):

CHAPTER 5 CONTRIBUTORY ANALYSIS OF NSL-KDD CUP DATA SET

A Multi-agent Based Cognitive Approach to Unsupervised Feature Extraction and Classification for Network Intrusion Detection

Review of Detection DDOS Attack Detection Using Naive Bayes Classifier for Network Forensics

Detection and Classification of Attacks in Unauthorized Accesses

A Comparison Between the Silhouette Index and the Davies-Bouldin Index in Labelling IDS Clusters

CSE 565 Computer Security Fall 2018

POLLING AND DUAL-LEVEL TRAFFIC ANALYSIS FOR IMPROVED DOS DETECTION IN IEEE NETWORKS

Hierarchical Adaptive FCM To Detect Attacks Using Layered Approach

Comparative Analysis of Machine Learning Methods in Anomaly-based Intrusion Detection

An Efficient Flow-based Multi-level Hybrid Intrusion Detection System for Software-Defined Networks

Dimension Reduction in Network Attacks Detection Systems

A COMPARATIVE STUDY OF CLASSIFICATION MODELS FOR DETECTION IN IP NETWORKS INTRUSIONS

Transcription:

Towards an Efficient Anomaly-Based Intrusion Detection for Software-Defined Networks In spite of the significant impact of using a centralized controller, the controller itself creates a single point of failure which makes the network more vulnerable compared with the conventional network architecture [2]. On the other hand, the existence of a communication between the OF-enabled switches and the controller opens the door for various attacks such Denial of Service (DoS) [3], Host Location Hijacking and Man in the Middle (MIM) attacks [4]. Therefore, in order to develop an efficient Intrusion Detection System (IDS) for SDNs, the system should be able to make intelligent and real time decisions. Commonly, an IDS designed for SDNs works on the top of the controller which forms an additional burden on the controller itself. Thus, designing a lightweight IDS is considered an advantage, since it helps in effectively detecting of any potential attacks as well as performing other fundamental network operations such as routing and load balancing in a more flexible manner. Scalability, is also an important factor which should be taken into consideration during the designing stage of the system [4]. There are two main groups of intrusion dearxiv:1803.06762v1 [cs.cr] 18 Mar 2018 Majd Latah, Levent Toker Department of Computer Engineering, Ege University, 35100, Bornova, Izmir, Turkey latahmajd@gmail.com, toker1960@gmail.com Abstract Software-defined networking (SDN) is a new paradigm that allows developing more flexible network applications. SDN controller, which represents a centralized controlling point, is responsible for running various network applications as well as maintaining different network services and functionalities. Choosing an efficient intrusion detection system helps in reducing the overhead of the running controller and creates a more secure network. In this study, we investigate the performance of well-known anomaly-based intrusion detection approaches in terms of accuracy, false positive rate, area under ROC curve and execution time. Precisely, we focus on supervised machine-learning approaches where we use the following classifiers: Adaptive Neuro-Fuzzy Inference System (ANFIS), Decision Trees (DT), Extreme Learning Machine (ELM), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Neural Networks (NN), Support Vector Machines (SVM), Random Forest (RT) and K Nearest- Neighbor (KNN). By using the benchmark dataset, we observe that KNN achieves the best testing accuracy. However, in terms of execution time, we conclude that ELM shows the best results for both training and testing stages. 1 Introduction Network security is one of the most important aspects in modern communications. Recently, programmable networks have gained popularity due to their abstracted view of the network which, in turn, provides a better understanding of the complex network operations and increases the effectiveness of the actions that should be taken in the case of any potential threat. Software Defined Networking (SDN) represents an emerging centralized network architecture, in which the forwarding elements are being managed by a central unit, called an SDN controller, which has the ability to obtain traffic statistics from each forwarding element in order to take the appropriate action required for preventing any malicious behavior or abusing of the network. At the same time, the SDN controller uses a programmable network protocol, which is Open- Flow (OF) protocol, in order to communicate and forward its decisions to OF-enabled switches [1].

tection systems: signature-based IDS and anomaly-based IDS. Signature-based IDS searches for defined patterns within the analyzed network traffic. On the other hand, anomaly-based IDS is able to estimate and predict the behavior of system. Signature-based IDS shows a good performance only for specified well-known attacks. On the contrary, anomaly-based IDS enjoys ability to detect unseen intrusion events, which is an important advantage in order to detect zero day attacks [5]. Anomaly-based IDS can be grouped into three main categories [5]: statisticalbased approaches, knowledge-based approaches, and machine learning-based approaches. In this study, we are focusing on machine learning-based approaches. Machine learning techniques can be categorized into four main categories: (i) supervised techniques, (ii) semi-supervised techniques, (iii) unsupervised techniques and (iv) reinforcement techniques. In this paper, we investigate various supervised learning techniques with respect to their accuracy, false positive rate, area under ROC curve and time taken to train and test each classifier. 2 Related Work The previous research efforts made for providing a detailed analysis of supervised machine learning-based intrusion detection are summarized in Table 1. These studies focused on training and testing different machine learning approaches using standard intrusion detection datasets. However, obtaining all these features from an SDN controller could be computationally expensive. Therefore, we can either use a subset of these standard datasets [6] or extract new features based on network traces of standard datasets or statistics provided by the controller [7]. In this study, we use a subset of features extracted from dataset and we consider the following supervised machine learning approaches: Adaptive Neuro-Fuzzy Inference System (ANFIS), Decision Trees (DT), Extreme Learning Machine (ELM), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Neural Networks (NN), Support Vector Machines (SVM), Random Forest (RT) and K Nearest-Neighbor (KNN). 3 Dataset and Selected Features [ht!] As mentioned earlier, in this study we use NSL- KDD dataset. is an improved version of KDD Cup99 dataset, which suffers from huge number of redundant records [10]. Both KDD Cup99 and datasets include the features shown in Table 2. In this study, we use only features which are easy to obtain from SDN controllers [7]. Therefore, features number F1, F2, F5, F6, F23 and F24 are selected for this study. As shown in Table 3, includes a total of 39 attacks where each one of them is classified into one of the following four categories. Moreover, a set of these attacks is introduced only in the testing set. In addition, Table 4 shows the distribution of the normal and attack records in training and testing sets. 4 Evaluation Metrics The performance of each classifier is evaluated in terms of accuracy, False Positive Rate, Area Under ROC Curve (AUC) and execution time. A good IDS should achieve high accuracy with low false positive rate. The accuracy is calculated by: Accuracy = T P + T N T P + T N + F P + F N True Positives (TP) is the number of attack records correctly classified; True Negatives (TN) is the number of normal traffic records correctly classified; False Positives (FP) is the number of normal traffic records falsely classified and False Negatives (FN) is number of attack records instances falsely classified. False positive rate is calculated by: F alse P ositive Rate = F P T N + F P In addition, we evaluate the performance of previously selected classifiers based on execution time as well as the analysis of the receiver operator characteristic (ROC) curve where the area under curve (AUC) can be used to compare each classifier with another one. The higher AUC, the better IDS. (1) (2) 2

Ref. Year Algorithms Dataset [8] 2005 C 4.5, K-nearest neighbor Multi-layer perceptron Regularized discriminant analysis KDD CUP99 Fisher linear discriminant Support vector machines [9] 2007 Decision Trees, Random Forest Naive Bayes, Gaussian classifier KDD CUP99 [10] 2009 J48, SVM, Naive Bayes (NB), NB Tree Random Forest, Random Tree Multi-layer perceptron (MLP) [11] 2010 Discriminative multinomial Nave Bayes classifiers [12] 2013 Principal component analysis based feature selection Genetic algorithm based detector generation, J48, NB, MLP, BF-Tree, NB- Tree, RF Tree. [13] 2013 Feature selection by correlation based feature selection and consistency based filter ADTree, C4.5, J48graft, LADTree, NBTree, RandomTree, RandomForest, REPTree [14] 2013 J48, BayesNet, Logistic, SGD, IBK, JRip, PART, Random Forest, Random Tree and REPTree [15] 2015 Neural Networks [16] 2016 Logistic Regression Gaussian Naive Bayes SVM and Random Forest Table 1: Overview of previous supervised machine learning studies for intrusion detection F. # Feature name. F. # Feature name. F. # Feature name. F1 Duration F15 Su attempted F29 Same srv rate F2 Protocol type F16 Num root F30 Diff srv rate F3 Service F17 Num file creations F31 Srv diff host rate F4 Flag F18 Num shells F32 Dst host count F5 Source bytes F19 Num access files F33 Dst host srv count F6 Destination bytes F20 Num outbound cmds F34 Dst host same srv rate F7 Land F21 Is host login F35 Dst host diff srv rate F8 Wrong fragment F22 Is guest login F36 Dst host same src port rate F9 Urgent F23 Count F37 Dst host srv diff host rate F10 Hot F24 Srv count F38 Dst host serror rate F11 Number failed logins F25 Serror rate F39 Dst host srv serror rate F12 Logged in F26 Srv serror rate F40 Dst host rerror rate Table 2: List of features of KDD Cup 99 dataset. 3

Attack category Denial of service (DoS) Remote to local (R2L) User to root (U2R) Probe Attack name Apache2, Smurf, Neptune, Back, Teardrop, Pod, Land, Mailbomb, Processtable, UDPstorm WarezClient, Guess Password, WarezMaster, Imap, Ftp Write, Named, MultiHop, Phf, Spy SnmpGetAttack, SnmpGuess, Worm, Xsnoop, Xlock, Sendmail Buffer Overflow, Httptuneel, Rootkit, LoadModule, Perl, Xterm, Ps, SQLattack Satan, Saint, Ipsweep, Portsweep, Nmap, Mscan Table 3: List of attacks presented in dataset 5 Experimental Results The experiment is conducted on Intel i5 machine with 12 GB of RAM. Table 5 shows the results obtained for both training and testing stages. In terms of accuracy, the most accurate classifiers for the training stage are: DT, KNN and RF with a slight difference between them. For the testing stage, however, we notice that KNN approach achieved the highest accuracy. In terms of false positive rate, it is worth mentioning that RF approach achieved the best results. In terms of area under ROC curve, as shown in Fig. 1 and Fig. 2, we notice that KNN approach achieves the best AUC for both training and testing tasks followed by DT, RF and ELM with slight difference between each other. Both ANFIS and SVM had nearly the same AUC for the training task. NB, however, achieved the least AUC for both training and testing tasks. For the testing task, as shown in Fig. 2, RF achieved a higher AUC than DT. In the same context, we notice that ELM also achieved a higher AUC than DT approach. In addition, SVM showed a better AUC than LDA and ANFIS. On the other hand, using a subset of flow-based features was fairly good for the training stage. However, it significantly reduced the testing accuracy, which may indicate that selecting optimal features should be taken into consideration for achieving a higher accuracy during the testing stage. In terms of execution time, from Fig. 3 and Fig. 4, we Figure 1: ROC curve for training different supervised machine learning algorithms Figure 2: ROC curve for testing different supervised machine learning algorithms observe that ELM approach achieves the best results for both training and testing tasks. Therefore, we conclude that ELM is a timely-efficient choice for SDNs. It is also observed that ANFIS approach shows the worst training 4

KDD Train KDD Test Total Records 125973 22544 Normal DoS R2L U2R Probe 67343 45927 995 52 11656 53.46% 36.46 0.79% 0.04% 9.25% 9711 7458 2754 200 2421 43.07% 33.08% 12.22% 0.89% 10.74% Table 4: Distribution of attacks and normal records in dataset False Positive Accuracy (%) Method Rate (%) Training Testing Training Testing Naive Bayes 59.27 49.88 3.7227 5.14 NN 84.10 66.22 2.41 1.61 LDA 87.57 69.36 3.26 2.24 ANFIS 88.88 68.80 2.89 2.46 SVM 90.86 71.00 6.55 10.27 ELM 93.16 74.17 2.25 2.31 RandomForest 98.09 75.96 0 0 KNN 98.23 77.09 3.128 4.07 Decision Trees 98.37 74.43 0.306 6.43 Table 5: Accuracy and false positive rate obtained after training and testing different supervised machine learning algorithms Figure 3: Execution time for training different supervised machine learning methods time. On the other hand, in spite of the highest accuracy for the testing stage achieved by KNN approach, it showed the worst testing time, which may indicate that KNN is not the best choice for SDNs where each controller may need to handle thousands of flows per second. Consequently, it seems obvious that there is a trade-off between accuracy and execution time which should be taken into consideration when choosing a supervised machine learning based IDS for SDNs. 6 Conclusion In this paper, we provided a comparative study of choosing an efficient anomaly-based intrusion detection method for SDNs. We focused on supervised machine learning approaches by using the following classifiers: ANFIS, NN, LDA, DT, RF, SVM, KNN, NB, ELM. Using NSL- KDD dataset and based on our experimental studies, we conclude that KNN approach shows the best performance Figure 4: Execution time for testing different supervised machine learning methods 5

in terms of accuracy and AUC. Whereas in terms of false positive rate RF approach achieves the best results. In terms of execution time, ELM achieves the best results. Our future work will be focused on comparing the results obtained from this study with other machine learning approaches and exploring other flow-based features that could be used in order to achieve a higher testing accuracy. References [1] Ha, T., Kim, S., An, N, Narantuya, J., Jeong, C., Kim, J., Lim, H.: Suspicious traffic sampling for intrusion detection in software-defined networks, Comput. Netw, 2016, 109, pp. 172-182. [2] AlErouda, A. and Alsmadib, I.: Identifying cyberattacks on software defined networks: An inferencebased intrusion detection approach, J. Netw. Comput. Appl, 2017, 80, pp. 152164. [3] Cui, Y., Yan, L., Li, S., Xing, H., Pan, W., Zhu, J., Zheng, X.: SD-Anti-DDoS: fast and efficient DDoS defense in software-defined networks, J. Netw. Comput. Appl, 2016, 68, pp. 6579. [4] Hong, S., Xu, L., Wang, H., Gu, G.: Poisoning network visibility in software-defined networks: new attacks and countermeasures. Proc. NDSS. 22nd Annu. Network and Distributed System Security Symposium, California, USA, Feb 2015, pp. 1-15. [5] Garca-Teodoroa, P., Daz-Verdejoa, J. Macia- Fernandeza, G., Vazquezb, E.: Anomaly-based network intrusion detection: Techniques, systems and challenges, Comput. Secur, 2009, 28, pp. 1828. [6] Tang, T. A., Mhamdi, L., McLernon, D., Zaidi, S. A. R., Ghogho, M.: Deep learning approach for network intrusion detection in software defined networking. Proc. International Conf. on Wireless Networks and Mobile Communications, Fez, Morocco, Oct. 2016, pp. 258-263. [7] Braga, R., Mota, E., Passito, A.: Lightweight DDoS flooding attack detection using NOX/OpenFlow. Proc. IEEE 35th Conf. on Local Computer Networks, Denver, USA, Oct. 2010, pp. 408-415. [8] Laskov, P., Dssel, P., Schfer C., Rieck, K.: Learning intrusion detection: supervised or unsupervised?. Proc. 13th International Conf. Image Analysis and ProcessingICIAP, Cagliari, Italy, Sept. 2005, pp. 50-57. [9] F. Gharibian F.,Ghorbani, A. A.: Comparative study of supervised machine learning techniques for intrusion detection. Proc. IEEE Fifth Annu. Conf. on Communication Networks and Services Research, New Brunswick, Canada, May 2007, pp. 350-358. [10] Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A. A.: A detailed analysis of the KDD CUP 99 data set. Proc. IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ontario, Canada, July 2009, pp. 1-6. [11] Panda, M., Abraham, A., Patra, M. R.: Discriminative multinomial naive bayes for network intrusion detection. Proc. IEEE Sixth International Conf. on Information Assurance and Security, Atlanta, Canada, Aug. 2010, pp. 5-10. [12] Aziz, A. S. A., Hassanien, A. E., Hanaf, S. E. O., Tolba, M. F.: Multi-layer hybrid machine learning techniques for anomalies detection and classification approach. Proc. IEEE 13th International Conf. on Hybrid Intelligent Systems, Gammarth, Tunisia, Dec. 2013, pp. 215-220. [13] Thaseen S., and Kumar, C. A.: An analysis of supervised tree based classifiers for intrusion detection system. Proc. IEEE International Conf. on Pattern Recognition, Informatics and Mobile Engineering, Salem, India, Feb. 2013, pp. 294-299. [14] Chauhan, H., Kumar, V., Pundir, S., Pilli E. S.: A comparative study of classification techniques for intrusion detection. Proc. IEEE International Symposium on Computational and Business Intelligence, New Delhi, India, Aug. 2013, pp. 40-43. [15] Ingre B., Yadav, A.: Performance analysis of NSL- KDD dataset using ANN. Proc. IEEE International 6

Conf. on Signal Processing and Communication Engineering Systems, Guntur, India, Jan. 2015, pp. 92-96. [16] Belavagi, M. C., Muniyal, B.: Performance evaluation of supervised machine learning algorithms for intrusion detection. Proc. Twelfth International Multi-Conf. on Information Processing, Bangalore, India, Dec. 2016, pp. 117-123. 7