A Multi-agent Based Cognitive Approach to Unsupervised Feature Extraction and Classification for Network Intrusion Detection


Int'l Conf. on Advances on Applied Cognitive Computing, ACC'17

Kaiser Nahiyan, Samilat Kaiser, Dr. Ken Ferens, Dr. Robert McLeod
Department of Electrical and Computer Engineering, University of Manitoba, Canada
{nahiyank, kaisers3}@myumanitoba.ca, {Ken.Ferens, Robert.McLeod}@umanitoba.ca

Abstract

The importance of finding meaning in unstructured data is increasing. In the field of network intrusion detection, unsupervised learning from unlabeled data is of vital significance, yet there is no universal technique for the purpose. Most approaches, including unsupervised machine learning algorithms, are computationally expensive on large amounts of data, need additional preprocessing, and still do not deliver satisfactory detection accuracy. This work focuses on an automated, agent-based, unsupervised, relatively uncomplicated cognitive approach that segregates attacks from normal events within the large search space with reduced computational demands. The algorithm presented collects features from statistical analysis of the observed attributes over each time-step (much like any intuitive learner would try to infer from a stream of unlabeled data) and isolates the attack events from normal ones by applying unsupervised k-means clustering to the reduced dataset. The computational load on the central node is further reduced by an agent-based architecture in which agents are deployed in hosts: some processing is done at the host and the rest is performed by the node that performs the classification. With an increasing number of small-device networks supporting IoT, mobile and sensor networks, fast, lightweight machine learning models for unsupervised attack identification are required.
We validate our algorithm on two recent datasets with modern-day attacks, and furthermore perform a multi-scale analysis to locate the time-scale of attacks.

I. INTRODUCTION

From leaked debit card details to intrusion into highly classified materials, cyber-attacks have become a real threat and a part of our political and social discourse. Attacks are no longer carried out by isolated individuals; there are now organized crimes orchestrated by hacker groups. Likewise, research in cybersecurity is at its peak. Machine learning has demonstrated much recent success in transforming all sectors, including cyber-security. However, publicly available datasets are scarce in cyber-security: only a small number exist, their generation methods are not uniform, they often contain private data with added formalities, and in many cases there is no ground truth to guide the researchers into what to expect.

Supervised learning methods use labelled training data containing mixed samples of all the classes. Once the learner has been trained, it can be exposed to new samples and can separate attacks from normal traffic. Unlike data in other domains [1], network traffic data is vastly diverse: IPs and ports are categorical data represented as numbers, hardware addresses are categories represented by groups of characters, payloads and user data are often encrypted, network parameters are flags that are often binary, and the list goes on. Hence, achieving consistent detection accuracy on test data is difficult even for supervised techniques, let alone unsupervised ones. In unsupervised machine learning the data is unlabeled, so there is no guidance on how to find meaning in the data or how to use that knowledge to classify the samples.
Conventional techniques like clustering, when applied to the entire dataset, do not deliver satisfactory accuracy, whereas complex methods like deep learning and neural networks require huge sets of data samples and long hours of intensive computation. We argue that much current focus is on implementing complex and resource-hungry machine learning methods, whereas comparable results can be achieved with much less computation power and much less data, and hence

timely actions can be taken to address the intrusion. In the context of cognitive detection, an unsupervised learner would apply simpler techniques like statistical learning, flow analysis and clustering to identify the attacks, which is exactly the approach described in this study.

Having set our focus on simplified learning with less computation power, we introduce an agent-based model into our approach. An Agent Based Model (ABM) is a class of computational models for simulating the actions and interactions of autonomous agents (both individual or collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole [2]. In our model, agents are deployed in the hosts and gateways. The agents at the hosts perform independent analysis of the host traffic and provide the processed information to a gateway agent for further processing. In this manner, the computation load and the time required for convergence are further reduced.

II. RELATED PREVIOUS WORK

Statistical analysis of traffic has been done previously for classification of application or user types. Roughan et al. [3] used nearest neighbor and linear discriminant analysis approaches to map different network applications to different QoS classes. Bernaille et al. [4] proposed a technique using an unsupervised ML algorithm (K-Means clustering) that classifies different types of TCP-based applications using the first few packets of the traffic flow. On the UNSW-NB15 dataset [5], Moustafa et al. [6] applied an Association Rule Mining algorithm as feature selection to generate the strongest features from the dataset, and Gharaee et al. [7] proposed an anomaly-based IDS using a Genetic algorithm and Support Vector Machine (SVM) with a new feature selection method. Moustafa et al.
[8] performed statistical learning of the observations and the attributes of the UNSW dataset, examined the feature correlations, and applied existing classifiers to evaluate the complexity in terms of accuracy against the KDD99 data set. On the Aegean Wi-Fi Intrusion Dataset (AWID) [9], Kolias compared the accuracy of different machine learning techniques on the AWID reduced dataset. Thanthrige et al. [10] applied feature reduction techniques such as Information Gain and Chi-Squared statistics to evaluate performance on the dataset under feature reduction.

However, none of these works has analyzed time-step based statistical features on these datasets. Moreover, none of the works mentioned above has used the agent-based computation model presented here. The accuracy reported by previous authors came at the cost of computation-heavy machine learning methods that had to process every row in the dataset and hence required major processing time. Our approach is much more straightforward, can be easily automated, and can classify big, complex datasets by extracting smaller feature datasets using statistical techniques. It runs much faster than the others and uses a distributed processing architecture, which makes it suitable for micro habitats.

III. PROPOSED METHODOLOGY

Our motive is to classify the dataset into normal and attack in an unsupervised manner, without any training as such, and to find some meaning in the data. We first apply our algorithm to the UNSW-NB15 dataset. We consider all four files of the UNSW-NB15 dataset, which together have 3,239,993 rows; 14.48% are attack rows and the rest are normal. The UNSW dataset contains 49 columns in total. To adapt the datasets to the unsupervised problem, we strip the labels from the dataset during preprocessing. The missing data analysis is shown in Fig..
We impute the missing values and change the categorical data into numeric representations for the columns state, proto, service, srcip and dstip. Such methods are conventional measures that make it easier for the machine to learn. We add two more attributes, srcip_trunc_encoded and dstip_trunc_encoded, which are the subnet addresses of the source and destination IPs, encoded from categories to numbers.

When working with large data sets it is helpful to divide the dataset into smaller fractions that can be analyzed individually. Our sampler divides the large data into time-steps, fragmenting the data set into smaller sections based on the timestamp. These small segments are then processed to extract features from their statistical analysis. Our hypothesis is that a time window that contains attack samples will show significant feature separation from a time window that contains only normal samples. Hence, the sampler collects groups of rows from the dataset that fall in a certain range of timestamps and creates a new data frame. Then,

the feature extraction portion of the algorithm extracts the mean and standard deviation of each attribute in the data frame. We have now transformed 3,239,993 rows into 85,348 rows, each row representing the events that occurred during one time window. At this point the dataset has been reduced by 97%. The statistics of each attribute are extracted as features, and a new data frame is created that consists of the mean and variance of all the attributes except IP, port, time, etc.

Fig. 1: Sampler and Feature Extractor
Fig. 2: Classification and Evaluation

The dataset created from the original data is much reduced in terms of both rows and columns. Hence we reduce the algorithm's computation time by reducing the number of rows that the unsupervised algorithm needs to process.

Fig. 3: Missing Data UNSW
Fig. 4: Missing Data AWID

K-means clustering over the reduced dataset presents the cognitive learner with two sample spaces, one of which has the attack samples. For the intuitive learner to identify which is the attack cluster, it will pick time-step samples from each class and try to determine whether any attack occurred during that time-step. One way of achieving this is to use the internal system logs corresponding to the time given in the time-step; however, this part is out of the scope of this study. For the evaluation of our algorithm, the two clusters are examined individually to check whether they have accurately classified the events as attack and normal. Since we are working with datasets that initially had labels, we can compare the predictions with the actual values and compute accuracy scores from the true and false positives and negatives. After evaluating the accuracy of our machine learning instance, we apply the same algorithm to another dataset as a final validation. For this we use the AWID dataset.
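The steps described above (time-window sampling, mean and standard deviation features, two-cluster k-means, and evaluation against the held-back labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only the Python standard library rather than pandas and scikit-learn, the function names `window_features`, `two_means` and `accuracy` are hypothetical, the flow data is synthetic, the k-means initialisation is a simple deterministic choice, and a window is treated as an attack window when it contains attack rows.

```python
import statistics
from collections import defaultdict


def window_features(rows, step):
    """Group rows of (timestamp, attr_1, ..., attr_m) into windows of `step`
    seconds; return {window_index: [mean_1, std_1, ..., mean_m, std_m]}."""
    windows = defaultdict(list)
    for ts, *attrs in rows:
        windows[int(ts // step)].append(attrs)
    features = {}
    for key in sorted(windows):
        vector = []
        for column in zip(*windows[key]):
            vector.append(statistics.fmean(column))
            vector.append(statistics.pstdev(column))
        features[key] = vector
    return features


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=100):
    """Plain k-means with k = 2. Initial centres: the first point and the
    point farthest from it (an assumption made for this sketch)."""
    centers = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    labels = []
    for _ in range(iters):
        labels = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                  for p in points]
        updated = []
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([statistics.fmean(col) for col in zip(*members)]
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels


def accuracy(pred, truth):
    """Cluster ids are arbitrary, so score both id-to-class mappings."""
    hits = sum(p == t for p, t in zip(pred, truth))
    return max(hits, len(truth) - hits) / len(truth)


# Synthetic flows (timestamp, bytes, packets); seconds 4 to 7 carry "attack" traffic.
rows, window_truth = [], []
for sec in range(8):
    attack = sec >= 4
    window_truth.append(int(attack))
    for i in range(4):
        rows.append((sec + 0.25 * i,
                     (900 if attack else 100) + 10 * i,
                     (50 if attack else 5) + i))

features = window_features(rows, step=1)
labels = two_means(list(features.values()))
print(len(rows), "rows ->", len(features), "windows; accuracy:",
      accuracy(labels, window_truth))
```

The labels are used only in the final scoring step, mirroring the evaluation described above; the clustering itself never sees them. On real traffic the attributes would likely need scaling before clustering, since raw means of large-valued attributes dominate the distance.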
On top of the classification of network intrusion based on statistical features of time-steps, our work presents a multi-scale analysis of the time-steps. We create feature datasets for time-steps of t, 2t, 4t and 8t, where t is a relatively small time window in the dataset that contains a balanced number of events. In other words, we are trying to find the best value of n for the time-step $2^n t$. We run our algorithm on each of the datasets, and our results present the best time-scales for each dataset. Such an analysis can be further used as a benchmark for future research.

IV. MULTI-AGENT BASED MODEL

Without an agent-based approach, the gateway is the node that has access to the entire traffic in the network. The gateway must therefore process the traffic flows from each host, which may include multiple flows from each host within one time window. For example, suppose a host X has initiated 1000 traffic flows at time-step t. If there are 20 hosts like X, the gateway must process 20*1000 traffic flows every time-step, and if we are recording 50 attributes of a traffic flow, the gateway node must perform multiplications and additions over a data size of 20*1000*50 = 1,000,000 for each time-step. If the classification is deployed in another node, external to the gateway, then the gateway must send this much data (for each time-step) over the network to that

classifier node for further processing. In contrast, if we deploy a multi-agent based approach in the following manner, the computation can be moved off the central gateway or classifier node and distributed over the network. Each host has an agent that performs the computation on the traffic flows for that host IP. Hence, the 1000 traffic flows of host X are processed by the agent in X, and the statistical analysis of these 1000 rows yields a single row for host X at time-step t. The classifier now has to compute over a data size of only 20 (nodes) * 1 (row provided by each node) * 50 (attributes recorded) = 1,000 instead of 1,000,000, which is a significant reduction. The other advantage of this approach is that any node, not only the gateway, can be the classifier node.

V. ANALYSIS AND RESULTS

A. Physical Setup

We perform the simulation in Python running on a 64-bit OS; the underlying hardware is an AMD quad-core processor with 8 GB RAM. The data is processed using Python libraries such as pandas and scikit-learn.

B. Results

Out of the 85,348 rows in the reduced dataset, 36,580 positives and 37,902 negatives were correctly identified. The confusion matrix is depicted in Fig. 8. Fig. 5 compares the computation time and shows that our approach is much faster. The classification is 89% correct, which is a very high number for unsupervised learning. The AWID-R dataset showed an accuracy of 29% with basic unsupervised K-Means, and with our algorithm the accuracy was increased by 60%. This is depicted in Fig. 7.

Fig. 5: Comparison of processing time
Fig. 6: Comparison of Rows Processed
Fig. 7: Comparison of Achieved Accuracy

Figs. 9 to 12 depict the time scale analysis of one of the UNSW dataset files.
The plots show the time-step on the x-axis versus the count of total rows observed during that time-step, for various scales from t = 1 second to t = 4 seconds. As the scale increases, the maximum value on the x-axis reduces and the maximum value on the y-axis increases.

Fig. 8: Confusion Matrix for K-means on time scale t = 1 sec for UNSW dataset
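The effect of the time scale on these plots can be reproduced on a toy example. The sketch below is only an illustration under assumptions: the timestamps are synthetic, and the helper names `rows_per_step` and `multiscale_summary` are hypothetical. It counts the rows that fall into each time-step for the scales t, 2t, 4t and 8t and reports, for each scale, the number of time-steps (the extent of the x-axis) and the largest row count in any single time-step (the extent of the y-axis).

```python
from collections import Counter


def rows_per_step(timestamps, step):
    """Count how many rows fall into each time-step of width `step` seconds."""
    return Counter(int(ts // step) for ts in timestamps)


def multiscale_summary(timestamps, base_step, max_power=3):
    """For each scale 2**n * base_step, return
    {step: (number of time-steps, largest row count in one time-step)}."""
    summary = {}
    for n in range(max_power + 1):
        step = (2 ** n) * base_step
        counts = rows_per_step(timestamps, step)
        summary[step] = (len(counts), max(counts.values()))
    return summary


# Synthetic trace: 16 seconds, 3 rows per second, plus a burst of 7 rows at 5.5 s.
timestamps = [sec + 0.1 * i for sec in range(16) for i in range(3)]
timestamps += [5.5] * 7

for step, (n_steps, peak) in multiscale_summary(timestamps, base_step=1).items():
    print(f"t = {step} s: {n_steps} time-steps, peak of {peak} rows")
```

Doubling the step halves the number of time-steps in this trace while the peak count grows, which matches the trend described for the x-axis and y-axis maxima.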

Fig. 9: Time Scale Analysis shown for UNSW (scale t = 1 sec)
Fig. 10: Time Scale Analysis shown for UNSW (scale t = 2 sec)
Fig. 11: Time Scale Analysis shown for UNSW (scale t = 4 sec)
Fig. 12: Time Scale Analysis shown for UNSW (scale t = 8 sec)

The metrics that we use for evaluating the results are Accuracy, Recall, Precision and F1 score. Their descriptions are provided in Table 1. The results, provided in Table 2, show that the algorithm performed best for time scales of 4 seconds or higher. The same is depicted in Fig. 13.

Table 1: Accuracy Metrics

Metric      Formula                                            Description
Accuracy    (TP + TN) / (TP + TN + FP + FN)                    Ratio of positive and negative cases correctly identified
Recall      TP / (TP + FN)                                     Ratio of actual positive cases correctly identified
Precision   TP / (TP + FP)                                     Ratio of predicted positive cases that are actually positive
F1 Score    2 * (Precision * Recall) / (Precision + Recall)    Harmonic mean of the recall and precision

Table 2: Results achieved for different time-scales

Time Scale   Class   Precision   Recall   F1-score
1 sec        0       0.90        0.85     0.87
1 sec        1       0.85        0.89     0.87
1 sec        total   0.87        0.87     0.87
2 sec        0       0.98        0.84     0.91
2 sec        1       0.85        0.98     0.91
2 sec        total   0.92        0.91     0.91
4 sec        0       0.99        0.84     0.91
4 sec        1       0.85        0.99     0.92
4 sec        total   0.92        0.91     0.91
8 sec        0       0.99        0.84     0.91
8 sec        1       0.85        0.99     0.92
8 sec        total   0.92        0.91     0.91

Fig. 13: Accuracy Analysis on various time windows

VI. FUTURE WORK

We need to address the fact that attacks are ever changing. No algorithm can hold up for decades, because attackers keep improving their efforts to imitate normal traffic; soon there will be attacks with normal features. Therefore, our future work of this study will be to synthetically design attack traffic that will

outperform this algorithm, and then to apply other advanced techniques to filter out such attacks. One way of doing this would be to apply fractal analysis to differentiate normal traffic from attacks. This approach has received significant recent attention in the research community.

VII. REFERENCES

[1] R. Sommer and V. Paxson, "Outside the Closed World: On Using Machine Learning for Network Intrusion Detection," in IEEE Symposium on Security and Privacy, Oakland, CA, USA, 2010.
[2] "Wikipedia," [Online]. Available: https://en.wikipedia.org/wiki/agentbased_model. [Accessed 15 May 2017].
[3] M. Roughan, S. Sen, O. Spatscheck, and N. Duffield, "Class of service mapping for QoS: a statistical signature-based approach to IP traffic classification," in 4th ACM SIGCOMM conference on Internet measurement, New York, NY, USA, 2004.
[4] L. Bernaille, R. Teixeira, T. Akodkenou, A. Soule and K. Salamatian, "Traffic Classification On The Fly," in ACM SIGCOMM Computer Communication Review, New York, NY, USA, April 2006.
[5] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, ACT, 2015.
[6] N. Moustafa and J. Slay, "The Significant Features of the UNSW-NB15 and the KDD99 Data Sets for Network Intrusion Detection Systems," in 4th International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), Kyoto, 2015.
[7] H. Gharaee and H. Hosseinvand, "A new feature selection IDS based on genetic algorithm and SVM," in 8th International Symposium on Telecommunications (IST), Tehran, 2016.
[8] N. Moustafa and J. Slay, "The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set," Information Security Journal: A Global Perspective, Vols. 1-3, no.
25, pp. 18-31, 2016.
[9] C. Kolias, G. Kambourakis, A. Stavrou and S. Gritzalis, "Intrusion detection in 802.11 networks: Empirical evaluation of threats and a public dataset," IEEE Communications Surveys & Tutorials, 2015.
[10] U. S. K. P. M. Thanthrige, J. Samarabandu and X. Wang, "Machine learning techniques for intrusion detection on public dataset," in IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Vancouver, 2016.