International Journal of Computer Engineering and Applications, Volume XI, Issue XII, Dec. 17, ISSN

RULE BASED CLASSIFICATION FOR NETWORK INTRUSION DETECTION SYSTEM USING UNSW-NB15 DATASET

Dr C Manju
Assistant Professor, Department of Computer Science, Kanchi Mamunivar Center for Post Graduate Studies, Lawspet, Puducherry

ABSTRACT: Communication plays a vital role in information technology. It involves the transfer of data from one place to another. An intrusion detection system (IDS) is used to detect and manage internal and external attacks and other threats such as botnets, phishing and spoofing. This paper evaluates Network Intrusion Detection Systems using the UNSW-NB15 dataset and rule based classifiers. The direct and indirect methods of rule generation are analysed using the Ripper, One-R, RIDOR, Decision Table and PART procedures. The evaluation shows that the PART classifier, which is an indirect method of rule based classification, gives the best accuracy and the lowest error among the classifiers studied.

Keywords: Intrusion Detection System, UNSW-NB15 dataset, Rule Based Classifiers, Direct Method, Indirect Method

[1] INTRODUCTION

Security in information technology is very important when transmission of data is involved. An IDS detects and manages the various attacks that happen during communication. IDS can be classified as host based and network based. A host based IDS is concerned with local attacks, whereas a network based IDS monitors overall network activity [1][2].

Dr C Manju 130

An IDS can be modelled using a signature-based analysis approach, which monitors traffic against a predetermined list of attacks or signatures. Because it relies on signature matching, it can detect only known attacks. The second is the anomaly based approach, which examines the state of the network traffic and reports whether it is normal or contains an anomaly. The main aim of an IDS is to integrate the network and host based approaches for better detection. Many IDS schemes can be developed to detect novel attacks rather than individual instantiations of known ones. Network data is evaluated using various available data sets.

[2] DATA SET DESCRIPTION

Network intrusion detection systems have been evaluated using the KDD98, KDDCUP99 and NSL-KDD benchmark data sets. These data sets are old and do not reflect the current topology and traffic of networks. The KDDCUP [3] dataset contains a large number of redundant records and many missing values. NSL-KDD is a modification of KDDCUP, but it too cannot serve as an ideal data set for modern network and traffic environments [4]. The Australian Centre for Cyber Security (ACCS) research group created the UNSW-NB15 [4] data set to evaluate NIDS. The IXIA PerfectStorm tool was used in the Cyber Range Lab of the ACCS to create a data set of modern normal and abnormal traffic. The dataset contains 49 fields and nine types of attack, namely Fuzzers, Backdoors, Analysis, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms [4]. Its features can be grouped into flow features (source and destination address, port address, transmission protocol), basic features (data transfer details, load, services), time features, connection features and labelled features.

[3] RULE BASED CLASSIFICATION

Mining is the process of extracting knowledge from available datasets.
Data analysis can be used to extract models that describe classes or to predict future outcomes. Classification analyses categorical labels and predicts the class of new data. Different classification models are available, such as statistical models, fuzzy models, rule based models, ensemble methods and probabilistic methods [5]. A rule based model generates a set of rules for predicting the output. A rule has the form (Condition) -> y, where Condition is a conjunction of attribute tests and y is a class label. A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule. The main advantages of rule based classifiers are that they are as expressive as decision trees and are easy to interpret and generate. New instances can be classified easily by a rule based system. A rule set is mutually exclusive if its rules are independent of each other, so that no instance is covered by more than one rule, and exhaustive if it accounts for every combination of attribute values, so that every instance is covered by at least one rule [6]. There are two ways of building rules: the direct method and the indirect method. The direct method extracts rules directly from the data, and the indirect method extracts them from other classification models. This paper analyses both methods using classification algorithms and evaluates the results.

[3.1] Direct Method

The direct method starts with an empty set of rules and generates rules directly from the data. The rules are then pruned and simplified. The quality of a classification rule can be evaluated by its coverage and accuracy. Coverage is the fraction of all records that satisfy the antecedent of the rule, and accuracy is the fraction of the records covered by the rule that belong to the class on its right-hand side (RHS). Several classifiers are available under this method. Here the One-R, RIDOR, Decision Table and RIPPER classifiers are studied and analysed.
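The definitions of rule coverage and accuracy given above can be illustrated with a minimal Python sketch. The `Rule` class, the helper `coverage_and_accuracy` and the five toy records are hypothetical and are not taken from the paper's experiments; only the attribute names `sttl`, `dmean` and `attack_cat` follow the dataset.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


@dataclass
class Rule:
    """A rule of the form (Condition) -> y. Hypothetical helper type."""
    condition: Callable[[Record], bool]  # conjunction of attribute tests
    label: str                           # class label y on the right-hand side

    def covers(self, record: Record) -> bool:
        # A rule covers an instance if the instance satisfies the condition.
        return self.condition(record)


def coverage_and_accuracy(rule: Rule, records: List[Record],
                          class_attr: str = "attack_cat") -> Tuple[float, float]:
    covered = [r for r in records if rule.covers(r)]
    # Coverage: fraction of all records that satisfy the antecedent.
    coverage = len(covered) / len(records) if records else 0.0
    # Accuracy: fraction of covered records whose class equals the rule's label.
    correct = sum(1 for r in covered if r[class_attr] == rule.label)
    accuracy = correct / len(covered) if covered else 0.0
    return coverage, accuracy


# Toy records (invented for illustration only).
records = [
    {"sttl": 254, "dmean": 120, "attack_cat": "Exploits"},
    {"sttl": 254, "dmean": 110, "attack_cat": "Exploits"},
    {"sttl": 254, "dmean": 130, "attack_cat": "DoS"},
    {"sttl": 31, "dmean": 40, "attack_cat": "Normal"},
    {"sttl": 62, "dmean": 200, "attack_cat": "Generic"},
]
rule = Rule(lambda r: r["sttl"] >= 254 and r["dmean"] >= 107, "Exploits")
cov, acc = coverage_and_accuracy(rule, records)
print(f"coverage={cov:.3f} accuracy={acc:.3f}")
```

In this toy case the rule covers three of the five records, and two of those three carry the predicted class.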

A. One-R

This method finds relations between the variables in a dataset. For each predictor it creates a rule that assigns a target class to each value of the predictor, and it calculates the total error of that rule; the predictor whose rule has the smallest total error is selected. The rules generated are as below:

    < 131580.5 -> Exploits
    < 131883.5 -> Generic
    < 131895.5 -> Fuzzers
    >= 131895.5 -> Generic
    (55580/81694 instances correct)

B. RIDOR

RIDOR is the Ripple-Down Rule learner. It generates a default rule first and then uses incremental reduced error pruning to find the exceptions with the smallest error rate [7].

    Except (id > 123179.5) and (id <= 123213.5) => attack_cat = Generic (3.0/0.0) [2.0/1.0]
    Except (id > 123520.5) and (id <= 123957.5) and (id > 123781) => attack_cat = DoS (33.0/0.0) [13.0/1.0]
    Except (id <= 123931) and (id > 123796) => attack_cat = Generic (25.0/0.0) [9.0/0.0]

The values after each rule specify its coverage and accuracy. The total number of rules (including the default rule) is 141272 and the time taken to build the model is 3015.4 seconds.

C. Decision Table

A decision table specifies only logical rules and is used to assess the quality of a decision. It contains classifier rules created by a simple decision table majority classifier. It returns the majority class of the training set if the decision table entry matching the new instance is empty [8]. In the test, the attribute search ran in the forward direction and evaluated 47 subsets. The number of rules generated is 4326 and the time taken to generate them is 16.71 s.

D. RIPPER

RIPPER stands for Repeated Incremental Pruning to Produce Error Reduction. It divides the training set into a growing set and a pruning set [7]. Its results are easy to interpret and it is applicable to certain kinds of problems. Sample rules generated are as follows:

    (label = Attacked) and (dmean >= 45) and (dmean >= 107) and (sttl >= 254) => attack_cat=exploits (230.0/48.0)
    (label = Attacked) and (sinpkt >= 0.024) and (sinpkt <= 103.884333) and (dloss >= 2) and (dmean <= 55) => attack_cat=exploits (88.0/17.0)

The value (230.0/48.0) specifies the coverage of the rule and its errors: the rule covers 230 instances in the data set, of which 48 are misclassified. The number of rules generated is 39 and the execution time is 8049 s.
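The One-R procedure described in subsection A can be sketched in a few lines of Python. This is a minimal illustration for nominal attributes only, assuming that numeric attributes such as those in the rules above have already been discretized into intervals. The function `one_r` and the toy records are hypothetical and do not reproduce WEKA's implementation.

```python
from collections import Counter, defaultdict


def one_r(records, predictors, target):
    """Return (best_attribute, value_to_class_rule, total_errors).

    For each predictor, map every attribute value to its most frequent
    target class, count the instances that mapping misclassifies, and
    keep the predictor with the fewest errors.
    """
    best = None
    for attr in predictors:
        # Class frequencies for every value of this attribute.
        counts = defaultdict(Counter)
        for r in records:
            counts[r[attr]][r[target]] += 1
        # One rule per attribute: value -> majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are all instances that are not in the majority class.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Toy records (invented for illustration only).
records = [
    {"proto": "tcp", "service": "http", "attack_cat": "Exploits"},
    {"proto": "tcp", "service": "http", "attack_cat": "Exploits"},
    {"proto": "tcp", "service": "dns", "attack_cat": "Generic"},
    {"proto": "udp", "service": "dns", "attack_cat": "Generic"},
    {"proto": "udp", "service": "dns", "attack_cat": "Generic"},
    {"proto": "udp", "service": "http", "attack_cat": "Exploits"},
]
best_attr, best_rule, best_errors = one_r(records, ["proto", "service"], "attack_cat")
print(best_attr, best_rule, best_errors)
```

On this toy data the rule built on `proto` makes two errors and the rule built on `service` makes none, so `service` is selected.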

[3.2] Indirect Method

In this method the rules are extracted from other classification models. The rules generated are mutually exclusive and exhaustive.

A. PART

PART is a rule induction method that extracts rules from partial decision trees. Unlike both C4.5 and RIPPER, it does not need to perform global optimization to produce accurate rule sets, and this added simplicity is its main advantage. It builds a partial decision tree on the current set of instances and creates a rule from that tree. It is a separate-and-conquer rule learner proposed by Eibe Frank [8][9]. It generates a decision list, which is an ordered set of rules, as below:

    id <= 120977 AND inpkt <= 0.004 AND id <= 120774: Fuzzers (5.0/2.0)
    id <= 120977 AND sinpkt <= 0.004 AND d > 120784: DoS (3.0/1.0)

[4] EXPERIMENTAL ANALYSIS

The classifiers above are analysed using the WEKA tool [8]. The evaluation uses data mining techniques that include pre-processing and filtering. Pre-processing is done with CfsSubsetEval, which evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them. The corresponding search uses BestFirst, which searches the space of attribute subsets by greedy hill climbing augmented with a backtracking facility. After pre-processing and filtering, the number of attributes was reduced from 49 to 10. The data is then passed through 10-fold cross-validation. This method splits the dataset into 10 folds; for each fold it builds a model on the other 9 folds and records the prediction error on the held-out fold, repeating the process until each of the 10 folds has served as the test set. The dataset is analysed with the various classifiers in the direct and indirect categories.
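The 10-fold cross-validation procedure described above can be sketched as follows. This is a plain, unstratified illustration in standard-library Python, not the WEKA implementation used in the paper. The helpers `k_fold_indices`, `cross_validate` and `train_majority` are hypothetical, and the majority-class model is only a stand-in for a real rule learner.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(records, labels, train_fn, k=10, seed=0):
    """Return the overall error rate over k held-out folds."""
    folds = k_fold_indices(len(records), k, seed)
    errors = 0
    for i, test_idx in enumerate(folds):
        # Train on the other k-1 folds.
        train_idx = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        model = train_fn([records[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        # Record the prediction errors on the held-out fold.
        errors += sum(1 for j in test_idx if model(records[j]) != labels[j])
    return errors / len(records)


def train_majority(train_x, train_y):
    """Stand-in learner: always predicts the most frequent training class."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda record: majority


# Toy data: 70 "Generic" and 30 "Exploits" instances (invented).
records = list(range(100))
labels = ["Generic"] * 70 + ["Exploits"] * 30
folds = k_fold_indices(len(records))
error_rate = cross_validate(records, labels, train_majority)
print(f"{len(folds)} folds, error rate {error_rate:.2f}")
```

Every instance is used for testing exactly once, so the reported error is an average over the whole dataset rather than over a single split.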
The classifiers are evaluated on accuracy and error parameters.

[4.1] Accuracy Parameters

The accuracy parameters include precision, true positive rate, F-measure, ROC area and the Kappa statistic. Precision measures how accurate the attack predictions are: of the instances that the classifier assigns to an attack class, it is the fraction that actually belong to that class [8]. Accuracy refers to the ability of the model to correctly predict the attacks in new or previously unseen data. It is the percentage of test set instances that the classifier classifies correctly. Precision is defined as

    P = TP / (TP + FP)

Recall R is the number of correctly classified positive instances divided by the number of actual positive instances in the dataset:

    R = TP / (TP + FN)

The F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R). The Receiver Operating Characteristic (ROC) curve is the plot of the true positive rate against the false positive rate, and it also indicates the accuracy of the classifier on the data [9]. It shows the tradeoff between sensitivity and specificity. The area under the ROC curve is a measure of accuracy.
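The accuracy measures defined above can be computed from a confusion matrix, as in the following minimal sketch. The Kappa statistic is not defined in the text; the code assumes the usual Cohen's kappa, (observed agreement - chance agreement) / (1 - chance agreement). The helper names and the 2x2 matrix are hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Precision P = TP/(TP+FP), recall R = TP/(TP+FN), F = 2PR/(P+R)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def cohen_kappa(confusion):
    """Cohen's kappa for a square matrix (rows = actual, columns = predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement from the row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


# Toy confusion matrix (invented): class 0 = attack, class 1 = normal.
confusion = [[40, 10],
             [5, 45]]
tp, fn, fp = confusion[0][0], confusion[0][1], confusion[1][0]
precision, recall, f_measure = precision_recall_f(tp, fp, fn)
kappa = cohen_kappa(confusion)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f} kappa={kappa:.3f}")
```

For a multi-class problem such as the attack categories here, the per-class values would be computed one class at a time and then averaged.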

Fig. 1: ROC curve for PART classification

The rule based systems are evaluated and the accuracy parameters are as specified in Table 1.

    Classifier   TP Rate   FP Rate   Precision   F-Measure   ROC Area   Kappa
    DT           0.72      0.061     0.682       0.691       0.9        0.626
    JRIP         0.679     0.169     0.66        0.621       0.795      0.541
    One-R        0.628     0.099     0.555       0.57        0.788      0.498
    PART         0.771     0.041     0.753       0.754       0.944      0.697
    RIDOR        0.739     0.041     0.725       0.729       0.849      0.6565

Table 1: Accuracy parameters of various classifiers

Fig. 2: Graph representing accuracy parameters

From the graph and the table, the PART classification algorithm has the highest true positive rate (0.771), precision (0.753), F-measure (0.754) and Kappa statistic (0.697), and it shares the lowest false positive rate (0.041) with RIDOR. The area under the ROC curve (0.944) is also highest for the PART classifier. From this, we can conclude that the PART classifier, which is an indirect method of classification, is the most suitable method for evaluating the UNSW-NB15 dataset.

[4.2] Error Rate Evaluation Parameters

The error evaluation parameters include the root mean squared error (RMSE), which measures the deviation between the predicted and the actual classes of the instances in the dataset [10][11]. Lower RMSE values indicate more accurate classification rules. The mean absolute error (MAE) measures the average magnitude of the errors. The classifiers are also evaluated for relative absolute error (RAE) and root relative squared error (RRSE).

    Classifier   MAE      RMSE     RAE (%)   RRSE (%)
    DT           0.0767   0.1901   49.9626   68.6313
    RIDOR        0.0591   0.2286   36.0723   82.6019
    JRIP         0.0986   0.2224   64.2821   80.3133
    One-R        0.0743   0.2726   48.4527   98.4417
    PART         0.0523   0.1748   28.5934   63.1169

Table 2: Error parameters of various classifiers

Fig. 3: Graph representing error rate

As the figure and Table 2 show, the PART algorithm has the lowest value on all four error measures and therefore the highest performance. RIDOR has the next lowest mean absolute error (0.0591) and relative absolute error, although Decision Table has the next lowest RMSE (0.1901). So we can conclude that PART, followed by RIDOR on absolute error, performs better than the other classifiers under study.

[5] CONCLUSION

Intrusion detection systems play a vital role in the secure communication of data. The system is evaluated on the UNSW-NB15 dataset using various rule based classification algorithms. In this paper the performance of the rule classifiers RIDOR, RIPPER, Decision Table, PART and One-R is analysed using 10-fold cross-validation. The performance is evaluated on accuracy and error parameters. The results show that PART, which is an indirect method of rule based classification, gives better accuracy and a lower error rate than the other classifiers under study.
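As a supplementary sketch for the error measures of Section [4.2], the four quantities can be written out in Python. The formulas are the common textbook definitions and are an assumption here: the relative measures compare the classifier with a baseline that always predicts the mean of the actual values, and WEKA's exact computation over class probabilities may differ in detail. The function `error_measures` and the toy values are hypothetical.

```python
import math


def error_measures(actual, predicted):
    """Return (MAE, RMSE, RAE %, RRSE %) for paired numeric values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Errors of the baseline that always predicts the mean actual value.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)

    mae = sum(abs_err) / n                        # mean absolute error
    rmse = math.sqrt(sum(sq_err) / n)             # root mean squared error
    rae = 100.0 * sum(abs_err) / base_abs         # relative absolute error
    rrse = 100.0 * math.sqrt(sum(sq_err) / base_sq)  # root relative squared error
    return mae, rmse, rae, rrse


# Toy values (invented): 1 = instance belongs to the class, 0 = it does not;
# predictions are the probabilities assigned to that class.
actual = [1, 0, 0, 1]
predicted = [0.9, 0.2, 0.1, 0.6]
mae, rmse, rae, rrse = error_measures(actual, predicted)
print(f"MAE={mae:.4f} RMSE={rmse:.4f} RAE={rae:.2f}% RRSE={rrse:.2f}%")
```

Under these definitions a relative error of 100% means the classifier is no better than the mean baseline, which is why lower values in Table 2 indicate better rules.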

REFERENCES

[1] Krishna Kant Tiwari, Susheel Tiwari, Sriram Yadav, Intrusion Detection Using Data Mining Techniques, International Journal of Advanced Computer Technology (IJACT).
[2] Trupti Phutane, Apashabi Pathan, A Survey of Intrusion Detection System Using Different Data Mining Techniques, International Journal of Innovative Research in Computer and Communication Engineering, Vol. 2, Issue 11, November 2014.
[3] Nour Moustafa, UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems (UNSW-NB15 Network Data Set), University of New South Wales at the Australian Defence Force Academy, Canberra, Australia. Conference paper, November 2015. DOI: 10.1109/MilCIS.2015.7348942.
[4] Safaa O. Al-mamory, Firas S. Jassim, Evaluation of Different Data Mining Algorithms with KDD CUP 99 Data Set, Journal of Babylon University/Pure and Applied Sciences, No. (8), Vol. (21), 2013.
[5] Dr C Manju, Performance Evaluation of Intrusion Detection System Using Classification Algorithms, International Journal of Innovative Research in Science, Engineering and Technology, Vol. 6, Issue 7, July 2017.
[6] S. Vijayarani, M. Muthulakshmi, Evaluating the Efficiency of Rule Techniques for File Classification, International Journal of Research in Engineering and Technology, eISSN: 2319-1163, ISSN: 2321-7308.
[7] Gaines, B.R., Paul Compton, J. 1995. Induction of Ripple-Down Rules Applied to Modeling Large Databases.
[8] Petra Kralj Novak, Intell. Inf. Syst. 5(3):211-228, 2009: Classification in WEKA.
[9] Ali, Shawkat, and Kate A. Smith. "On learning algorithm selection for classification." Applied Soft Computing 6.2 (2006): 119-138.
[10] Pankaj Singh, Sudhakar Singh, Comparative Study of Data Mining Algorithms through Weka, International Journal of Emerging Research in Management & Technology, ISSN: 2278-9359 (Volume-4, Issue-9).
[11] Qin, Biao, et al. "A rule-based classification algorithm for uncertain data." Data Engineering, 2009. ICDE'09. IEEE 25th International Conference on. IEEE, 2009.