EVALUATION OF INTRUSION DETECTION TECHNIQUES AND ALGORITHMS IN TERMS OF PERFORMANCE AND EFFICIENCY THROUGH DATA MINING

Similar documents
A Detailed Analysis on NSL-KDD Dataset Using Various Machine Learning Techniques for Intrusion Detection

Review on Data Mining Techniques for Intrusion Detection System

An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network

Feature Selection in the Corrected KDD -dataset

INTRUSION DETECTION MODEL IN DATA MINING BASED ON ENSEMBLE APPROACH

International Journal of Scientific & Engineering Research, Volume 4, Issue 7, July-2013 ISSN

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 4, Issue 7, January 2015

Performance Analysis of various classifiers using Benchmark Datasets in Weka tools

Parallel Misuse and Anomaly Detection Model

Feature Selection for Black Hole Attacks

Ranking and Filtering the Selected Attributes for Intrusion Detection System

A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence

Deep Learning Approach to Network Intrusion Detection

Lecture Notes on Critique of 1998 and 1999 DARPA IDS Evaluations

Intrusion Detection System with FGA and MLP Algorithm

A Survey And Comparative Analysis Of Data

An Optimized Genetic Algorithm with Classification Approach used for Intrusion Detection

Selecting Features for Intrusion Detection: A Feature Relevance Analysis on KDD 99 Intrusion Detection Datasets

Approach Using Genetic Algorithm for Intrusion Detection System

A study on fuzzy intrusion detection

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Intrusion Detection System based on Support Vector Machine and BN-KDD Data Set

INTRUSION DETECTION WITH TREE-BASED DATA MINING CLASSIFICATION TECHNIQUES BY USING KDD DATASET

Flow-based Anomaly Intrusion Detection System Using Neural Network

Keywords Intrusion Detection System, Artificial Neural Network, Multi-Layer Perceptron. Apriori algorithm

Decision Trees for Analyzing Different Versions of KDD Cup Datasets

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

CSE 565 Computer Security Fall 2018

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

EVALUATIONS OF THE EFFECTIVENESS OF ANOMALY BASED INTRUSION DETECTION SYSTEMS BASED ON AN ADAPTIVE KNN ALGORITHM

Disquisition of a Novel Approach to Enhance Security in Data Mining

Intrusion Detection System Based on K-Star Classifier and Feature Set Reduction

INTRUSION DETECTION SYSTEM USING BIG DATA FRAMEWORK

Enhanced Multivariate Correlation Analysis (MCA) Based Denialof-Service

ANOMALY DETECTION IN COMMUNICTION NETWORKS

Detection of Network Intrusions with PCA and Probabilistic SOM

K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection

Application of the Generic Feature Selection Measure in Detection of Web Attacks

A Review on Performance Comparison of Artificial Intelligence Techniques Used for Intrusion Detection

Intrusion Detection System Using K-SVMeans Clustering Algorithm

Unsupervised clustering approach for network anomaly detection

Abnormal Network Traffic Detection Based on Semi-Supervised Machine Learning

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Determining the Number of Hidden Neurons in a Multi Layer Feed Forward Neural Network

Firewalls, Tunnels, and Network Intrusion Detection

Study of Machine Learning Based Intrusion Detection System

IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 06, 2014 ISSN (online):

Based on the fusion of neural network algorithm in the application of the anomaly detection

Combination of Three Machine Learning Algorithms for Intrusion Detection Systems in Computer Networks

CHAPTER V KDD CUP 99 DATASET. With the widespread use of computer networks, the number of attacks has grown

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction

Anomaly Detection in Communication Networks

Domain-specific Concept-based Information Retrieval System

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

REVIEW OF VARIOUS INTRUSION DETECTION METHODS FOR TRAINING DATA SETS

Denial of Service (DoS) Attack Detection by Using Fuzzy Logic over Network Flows

Modeling Intrusion Detection Systems With Machine Learning And Selected Attributes

Intrusion detection in computer networks through a hybrid approach of data mining and decision trees

Intrusion Detection System For Denial Of Service Flooding Attacks In Sip Communication Networks

Data Mining Based Online Intrusion Detection

A Neuro-Fuzzy Classifier for Intrusion Detection Systems

DECISION TREE BASED IDS USING WRAPPER APPROACH

RULE BASED LEARNING INTRUSION DETECTION SYSTEM USING KDD AND NSL KDD DATASET

A Comparison Between the Silhouette Index and the Davies-Bouldin Index in Labelling IDS Clusters

Optimized Intrusion Detection by CACC Discretization Via Naïve Bayes and K-Means Clustering

Toward Building Lightweight Intrusion Detection System Through Modified RMHC and SVM

Comparative Analysis of Machine Learning Methods in Anomaly-based Intrusion Detection

Intrusion Detection Based On Clustering Algorithm

A Survey on Intrusion Detection Using Outlier Detection Techniques

HYBRID INTRUSION DETECTION USING SIGNATURE AND ANOMALY BASED SYSTEMS

IDuFG: Introducing an Intrusion Detection using Hybrid Fuzzy Genetic Approach

Improving the Attack Detection Rate in Network Intrusion Detection using Adaboost Algorithm

Anomaly Intrusion Detection System Using Hierarchical Gaussian Mixture Model

A Data Mining Approach for Intrusion Detection System Using Boosted Decision Tree Approach

EFFICIENT ADAPTIVE PREPROCESSING WITH DIMENSIONALITY REDUCTION FOR STREAMING DATA

A Rough Set Based Feature Selection on KDD CUP 99 Data Set

A FEATURE SELECTION APPROACH USING BINARY FIREFLY ALGORITHM FOR NETWORK INTRUSION DETECTION SYSTEM

Multiple Classifier Fusion With Cuttlefish Algorithm Based Feature Selection

ART 알고리즘특강자료 ( 응용 01)

Classification Trees with Logistic Regression Functions for Network Based Intrusion Detection System

Optimal Sampling for Class Balancing with Machine Learning Technique for Intrusion Detection System

Network Security. Chapter 0. Attacks and Attack Detection

IJCSC Volume 4 Number 2 September 2013 pp ISSN

A SEMI-SUPERVISED MODEL FOR NETWORK TRAFFIC ANOMALY DETECTION

Intrusion Detection Using Data Mining Technique (Classification)

6, 11, 2016 ISSN: X

Dimension Reduction in Network Attacks Detection Systems

Intrusion Detection System

Bayesian Learning Networks Approach to Cybercrime Detection

Classification Of Attacks In Network Intrusion Detection System

Intrusion Detection. Overview. Intrusion vs. Extrusion Detection. Concepts. Raj Jain. Washington University in St. Louis

Classification and Optimization using RF and Genetic Algorithm

Diversified Intrusion Detection with Various Detection Methodologies Using Sensor Fusion

Diverse network environments Dynamic attack landscape Adversarial environment IDS performance strongly depends on chosen classifier

ANOMALY-BASED INTRUSION DETECTION THROUGH K- MEANS CLUSTERING AND NAIVES BAYES CLASSIFICATION

International Journal of Computer Engineering and Applications, Volume XI, Issue XII, Dec. 17, ISSN

Keywords: Intrusion Detection System, k-nearest neighbor, Support Vector Machine, Primal Dual, Particle Swarm Optimization

Computer Security: Principles and Practice

Data Sources for Cyber Security Research

Improved Classification of Known and Unknown Network Traffic Flows using Semi-Supervised Machine Learning

Transcription:

International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR) ISSN 2249-6831 Vol. 3, Issue 2, Jun 2013, 47-54 TJPRC Pvt. Ltd. EVALUATION OF INTRUSION DETECTION TECHNIQUES AND ALGORITHMS IN TERMS OF PERFORMANCE AND EFFICIENCY THROUGH DATA MINING PRABHDEEP KAUR & SHEVETA VASHISHT Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India ABSTRACT Data mining has been one of the approches used as a foundation then there is a érection of intrusion detection systems (IDS).Various algorithms and techniques have been développe to make out different types of vulnérabilités. These techniques help to recoder the performance of the detection of vulnerabilities, calculâtes the overheads, and to make efficient without redundancy. However there is no heuristic to corroborate the accuracy of their results. In this paper we study the techniques and algorithms of the detection of known and unknown attacks through data mining. The methods, metrics and datasets are used to find the known and unknown attacks through data mining. The various techniques like, Scoring, Attack generalization technique, Language model techniques, Novel hybrid technique, C4.5 decision algorithm, Nearest neighbour, Rough Set Theory(RST), Support Vector Machine(SVM),Statically, Ensemble boosted decision tree, apriori algorithm, Rule based techniques Association and Sequence techniques, are used to detect the vulnerabilities through Data mining. In this paper, we have described the outline of all the algorithms and techniques to identify their strengths and limitations. KEYWORDS: KDD 99, Attacks, IDS, Data Mining, Learning, Decision INTRODUCTION With the rapid development of electronics and information technology, computer network plays a significant role in our daily lives. However, more and more people use it as a tool for crime, and it has brought great losses to many companies and countries. [1] Data mining and machine learning technology has been extensively applied in network intrusion detection and prevention systems by discovering user behaviour patterns from the network traffic data. Intrusion Detection System (IDS) is necessary because traditional firewalls cannot provide the complete security against the intrusion. [2] Intrusion Detection (ID) is an active and important research area of network security. The functions for intrusion detection system. Monitoring and analyzing both consumer and structure activities. Analyzing structure configurations and vulnerabilities. Assessing structure and file reliability. Ability to be familiar with patterns typical of attacks. Analysis of uncharacteristic activity patterns. Tracking user strategy violations. IDS are one of the key technologies to guarantee the systems security. IDS make a real time response to intrusion actions and intrusion processes. The goal of Intrusion Detection is to identify all the proper attacks and negatively identify all the non-attacks. The datasets (KDD 99CUP, NSL-KDD, and TCP Dump) are used. And the various techniques for

48 Prabhdeep Kaur & Sheveta Vashisht detection of vulnerabilities that improve the performance of the detection of known and unknown vulnerabilities, and use a dataset which is efficient means without redundancy. INTRUSION DETECTION TECHNIQUES No particular algorithm could detect all attack categories with a high possibility of detection and a low false alarm rate. It reinforce our conviction that different algorithms should be used to deal with different types of network attacks. Figure 1: Classification of for Detection of Malicious Activities These techniques helps to detect the malicious activities from dataset efficiently.for choosing one of best technique is the objective of the user. And having one of best technique or algorithm for detecting intrusion is main task for a successful security. RELATED WORK This section discusses the related Works on IDS, Existing Pre-processing and Classifiers techniques applied on the IDS data. There are many research papers published regarding the pre-processing as well as classifiers in order to detect the intrusions in the dataset. The majority of them suffering with false positive type of errors while classifying the attacks and redundancy in the datasets. The following are the some of the related works suffering with low accuracy and false positive errors. Mahbod Tavallaee et.al (2009) Anomaly detection has attracted the attention of many researchers to overcome the weakness of signature based IDSs in detecting novel attacks, and KDDCUP 99 is the mostly widely used dataset for the evolution of these systems. [3] Having conducted a statistical analysis on this data set, they found two important issues which highly affect the performance of evaluated system, and results in a very poor evaluation of anomaly detection approaches. To solve these issues, they proposed a new dataset, NSL-KDD, which consists of selected records of the complete KDD dataset and does not suffer from any of the mentioned shortcomings. Lei, De-Zhan get et.al (2010) Network security is becoming an increasingly important issue, since the rapid Development of internet. Network intrusion detection systems, as the main security defending techniques, is widely used against such malicious attacks.[4] Data mining and machine learning technology has been extensively applied in network intrusion detection and prevention system by discovering user behaviour patterns from the network traffic data. Association rules and sequence rules are the main techniques of data mining for intrusion detection.

Evaluation of Intrusion Detection and Algorithms in Terms of Performance and Efficiency through Data Mining 49 Radhika Goel et.al (2012) in this paper a novel hybrid model is being proposed for Misuse and anomaly detection. C4.5 based binary decision trees are used for misuse and CBA based classifier is used for anomaly detection. Firstly, the C4.5 based decision tree separates the network traffic into normal and attack categories. The normal traffic is sent to anomaly detector and parallel attacks are sent to a decision trees based classifier for labelling with specific attack type. [5] The CBA based anomaly detection is a single level classifier where as the decision trees based misuse detector is a sequential multilevel classifier which labels one attack at a time in a step by step manner. Results show that 99.995% misuse detection rate with an anomaly detection rate of 99.298% is achievable. The overall attack detection rate is 99.911% and false alarm ratio of the integrated model is 3.229%. To overcome the deficiencies in KDD 99 dataset, a new improved dataset is also proposed. The overall accuracy of integrated model trained on new dataset is 97.495% compared to 97.24% of the old dataset. Mrudula gudade et.al (2010) Since Many current intrusion detection systems are constructed by manual encoding of an expert knowledge, changes to them are expensive and slow.[6] In data mining based intrusion detection system, they propose new ensemble boosted decision tree approach for intrusion detection system. Experimental results shows better results for detecting intrusions as compared to others existing methods. Duanyang Zhao et.al (2012) A truly effective intrusion detection system will employ both technologies. They discusses the differences in host and network-based intrusion detection techniques to demonstrate how the two can work together to provide additionally effective intrusion detection and protection.[7] They propose a hybrid IDS, which combines network and host IDS, with anomaly and misuse detection mode, utilizes auditing programs to extract an extensive set of features that describe each network connection or host session, and applies data mining programs to learn rules that accurately capture the behaviour of intrusions and normal. K.Nageswara rao et.al (2012) it has become essential to evaluate machine learning techniques for web based intrusion detection on the KDD Cup 99 data set.[8] This data set has served well to identify attacks using data mining. Furthermore, selecting the relevant set of attributes for data classification is one of the most significant problems in designing a reliable classifier. Existing C4.5 decision tree technology has a problem in their learning phase to detect automatic relevant attribute selection, while some statistical classification algorithms require the feature subset to be selected in a pre-processing phase. Also, C4.5 algorithm needs strong pre-processing algorithm for numerical attributes in order to improve classifier accuracy in terms of Mean root square error. In this paper, evaluated the influence of attribute pre-selection using Statistical techniques on real-world kddcup99 data set. Experimental result shows that accuracy of the C4.5 classifier could be improved with the robust pre-selection approach when compare to traditional feature selection techniques. COMPARISON OF DIFFERENT ALGORITHMS Comparison between different algorithms and techniques can be done by the two By going through the literature analysis of some of the important intrusion detection algorithms, it is concluded that each algorithm has some relative strengths and limitations. A tabular summary is given below in table 1, which summarizes the techniques, advantages and limitations of some of important intrusion detection algorithms and techiques.some of the highlighted algorithms are given below just for knowledge. Different MACHINE LEARNING TECHNIQUES SCORING TECHNIQUES

50 Prabhdeep Kaur & Sheveta Vashisht ATTACK GENERLIZATION TECHNIQUES LANGUAGE MODEL TECHNIQUES HYBRID MODEL TECHNIQUES RULE BASED TECHNIQUES DIFFERENT ALGORITHMS DECISION TREE ALGORITHMS ENSEMBLE BOOSTED ALGORITHMS C4.5 DECISION ALGORITHMS Table 1: Comparisons of Different and Algorithms for Performance and Efficiency PCA (principal component analysis) Machine Learning Machine Learning (RST & SVM) Machine Learning (NN and SVM) Stastical Rule based approach on genetic programming. Signature Based Noval hybrid techniques or multilevel techniques used Un-Supervised techniques Algorithm or Methods Decision tree and, Nearest Neighbourhood Classifier Algorithms Classifier Algorithms ANN and SVM C4.5 Dcision tree algorithm Genetic Algorithm. Packet sniffing method And testing tools are used Classification algorithms,c4.5 Language based models used Working Limitations Overall Result KDD 99 KDD 99 KDD 99 Dataet KDD 99 KDD 99 DARPHA Network communication between distributed environment KDD 99 DARPHA 99 1.Apply both algo s on. 2. Projected data element into feature space. Classification of different algorithms and make comperisons which is best. Packet capture from network, then RST preprocess the data, Feature selected by RST sent to SVM to learn and test. Applying this approach to the KDD CUP'99 data set demonstrates that the proposed approach performs high performance, especially to U2R and U2L type attacks. Evaluated the influence of attribute pre-selection using Statistical techniques on real-world kddcup99 data set Detecting the noval attacks by using four genetic operators, namely reproduction, mutation, crossover, and dropping condition operators Detect and combat some common attacks on network systems Detect known and unknown attacks based on decision tree algorithms,level wise detection Detect Unknown Attacks, Poor Prediction Rate of R2L class, Not Efficient technique for New attacks Not more efficient This method is Very effective to decrease the space density of data. Redundency in dataset,not much performance Problem in desigining the reliable classifier, Problem in Classifier Accuracy Genetic operations are random,result is not good for some runs. More network traffic. Still redundency in dataset, not much performance with C4.5 algorithm Not proper measure the unknown attacks Improve learning time Efficiency, Space representation of different dataset. Performance Improvement and real-time intrusion Detection Compare the results with (PCA) and show RST and SVM schema could reduce the false positive rate and increase the accuracy. SVM is superior to NN in false alarm rate and in accuracy for Probe, DoS and U2R and R2L attacks, while NN is better than SVM only in accuracy. Well outperformed to detect the outliers. Average, NOT more performance evaluated Good. Good Detection Accuracy 80%, No false rate.

Evaluation of Intrusion Detection and Algorithms in Terms of Performance and Efficiency through Data Mining 51 SELECTION OF AN EFFICIENT DATASET John McHugh was the first to criticize the KDDCUP data sets, pointing out that the synthetic data generated we requite far from what one collects from real networks.[10] But, due to the lack of publicly available data sets, they are still used.sabhani and Serpen showed that no pattern classification or machine learning algorithm can be trained successfully with the KDD data set to perform misuse detection for two of the four attack categories included in the KDD data set userto-root or remote-to-local attack categories. Tavallaee conducted a statistical analysis of the KDDCUP 99 data set and found two important issues which affect the performance of machine learners: the huge number of redundant records in both train and test sets.[11] To correct this, they proposed a new data set which they called NSL.The main task for the KDD 99 classifier learning contest was to provide a predictive model able to distinguish between legitimate (normal) and illegitimate (called intrusion or attacks) connections in a computer network. Each attack type falls exactly into one of the following four categories: Probing: observation and other penetrating, e.g., port scanning; DOS: denial-of-service, e.g. syn flooding; U2R: unofficial access to local marvellous user (root) privileges, e.g., various buffer overflow attacks; R2L: unofficial access from a remote machine, e.g. password guessing.[9] IMPROVING EFFICIENCY By Taking competent properties for a dataset upon where the attacks must be noted down. The KDD-dataset is run by the most realistic environment for the evaluation of efficiency and redundancy.

52 Prabhdeep Kaur & Sheveta Vashisht Above environment shows the result without redundancy. So in future we can use the KDD dataset for evaluation of detection of attacks. Here we have so many datasets available but choosing an efficient dataset is one of the prevalent objectives for providing the efficiency. The performance enhancement by using algorithms or techniques also is a objective of by means of an efficient dataset. So the KDD dataset is one of dataset which proves the efficiency by removing the redundancy. The 39 dissimilar attack types present in the 10% datasets are available here. CONCLUSIONS In this paper we see most of the techniques that are based on machine learning techniques, Decision based techniques, classification and hybrid techniques for detection of known and unknown vulnerabilities and datasets (KDD 99CUP, NSL-KDD) is used. But still problems are there these all techniques cannot detect the both type of attacks known or unknown. Because some have the accuracy problem false alarm rate problems Some techniques only efficient for identification of known attacks so, it is found that decision models are mostly used for detection of attacks that increase the usage of public datasets for attack detection in future.and we can conclude that by using the KDD 99 we improve the efficiency and redundency. REFERENCES 1. Lei Li, De-Zhang Yang, Fang-Cheng Shen, A Novel Rule-based Intrusion detection System Using Data Mining international conference 2010 2. Sandip Sonawane, Shailendra Parde and Ganesh Prasad, A survey on intrusion detection techniques World Journal of Science and Technology 2012, 2(3):127-13ISSN: 2231 2587 3. Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani A Detailed Analysis of the KDD CUP 99 Data Set proceeding on 2009 4. Lei Li, De-Zhang Yang, Fang-Cheng Shen, A Novel Rule-based Intrusion detection System Using Data Mining international conference 2010. 5. Radhika Goel, Anjali Sardana, and Ramesh C. Joshi Parallel Misuse and Anomaly Detection Model International Journal of Network Security, Vol.14, No.4, PP.211-222, July 2012 6. Mrudula Gudadhe, Prakash Prasad A New Data Mining Based Network Intrusion Detection Model International conference on computer and communication technology, ICCCT 10 7. Duanyang Zhao, Analysis of an intrusion detection system in second international conference on security and management 2012(3):127-13ISSN. 8. K.Nageswara Rao, D.Rajya Lakshmi, T.Venkateswara Rao Robust Statistical Outlier based Feature Selection Technique for Network Intrusion Detection International Journal of Soft Computing and Engineering (IJSCE) ISSN: 2231-2307, Volume-2, Issue-1, March 2012. 9. Yacine Bouzida, Fr ed eric Cuppens, Nora Cuppens-Boulahia and Sylvain Gombault, Efficient Intrusion Detection Using Principal Component Analysis, D epartement RSM GET/ ENST Bretagne 2, rue de la Chˆataigneraie, CS 17607 35576 Cesson S evign e CEDEX, FRANCE

Evaluation of Intrusion Detection and Algorithms in Terms of Performance and Efficiency through Data Mining 53 10. J. McHugh, Testing intrusion detection systems: a critique of the 1998 and 1999 darpa intrusion detection system evaluations as performed by lincoln laboratory, ACM Trans. Inf. Syst. Secur., vol. 3, pp. 262 294, November 2000. [Online]. Available: http: //doi.acm.org/10.1145/382912.382923 11. Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani A Detailed Analysis of the KDD CUP 99 Data Set proceeding on 2009 AUTHOR S DETAILS Prabhdeep Kaur, Department of CSE, Lovely Professional University, Phagwara, Punjab, India, 9878282180, (prabhdeep_aulakh@yahoo.com). Sheveta Vashisht, Department Of CSE, Lovely Professional University, Phagwara, Punjab, India, 7508280568, (sheveta.16856@lpu.co.in)

.