IJCSC Volume 4 Number 2 September 2013 pp ISSN


Improving the Performance of IDS using Genetic Algorithm

Kuldeep Kumar, Ramkala Punia
Computer Programmer, CCS Haryana Agriculture University, Hisar, Haryana
*Teaching Associate, Dept. of CSE, Guru Jambheshwar University of Science and Technology, Hisar
verma1.kuldeep@gmail.com, ramkalapunia@gmail.com

Abstract
An intrusion detection system (IDS) aims to detect computer attacks and/or computer misuse, and to alert the proper individuals upon detection. The growing number of Internet threats increasingly motivates applying defense-in-depth concepts to protect computer systems worldwide from intruders seeking to steal information. A very safe and secure intrusion detection system is therefore needed [1]. Intrusion detection has become an important area of research because existing systems are not completely flawless and secure. This paper presents a genetic algorithm based approach to network intrusion detection for analyzing and improving the performance of an IDS. Genetic algorithms (GAs) are search algorithms based on the principles of natural selection and genetics. The aim in developing GAs is to build systems that are as robust and as adaptable to their environment as natural systems. A GA starts from an initial population and evolves it for a number of generations [5]. During each generation, three basic genetic operators, namely selection, crossover and mutation, are applied to each individual with certain probabilities.

Key words: IDS, misuse detection, anomaly detection, genetic algorithm, SNORT.

1. Introduction
Today we suffer from many problems caused by intruders interfering in our communication with other persons and organisations. The growing number of Internet threats increasingly motivates applying defense-in-depth concepts to protect computer systems worldwide from intruders seeking to steal information. A very safe and secure intrusion detection system is needed.
Intrusion detection has therefore become an important area of research, because existing systems are not completely flawless and secure and need to be improved. Many methods have been developed to secure network infrastructure and communication over the Internet [8]. Intrusion detection systems monitor network resources and sense whether a system or network is being used by an unauthorized person.

There are two ways to protect a network against malicious attempts. The first is to build a completely secure network system by applying all the complicated cryptographic, authentication and authorization methods. However, this solution is not realistic: in practice it is impossible to have a completely secure system, because users rely on the operating system and other applications to accomplish their jobs, and almost all applications have one vulnerability or another. The second way is to detect an attack as soon as possible, preferably in real time, and take appropriate action [6]. This is essentially what an intrusion detection system (IDS) or intrusion prevention system (IPS) does. An IDS does not usually take preventive measures when an attack is detected; it is reactive rather than proactive.

There are two general types of intrusion detection systems: misuse detection and anomaly detection. Misuse detection systems detect intruders by matching known attack patterns, while anomaly detection systems identify deviations from normal network behavior and alert on potential unknown attacks [2]. IDSs have three common issues: speed, accuracy and adaptability. The speed issue arises from the large volume of data that must be monitored to observe the entire situation. One existing approach to this problem is to split the network stream into a few more manageable streams and analyze each in real time using a separate IDS [1].
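To make the distinction between the two detection types concrete, the following sketch contrasts them on toy data. It is an illustration under stated assumptions, not the method of this paper: the signature strings and traffic values are hypothetical, misuse detection is reduced to substring matching, and anomaly detection is reduced to a simple three-sigma test on a single feature.

```python
import statistics

# Hypothetical signature set: known-bad payload substrings -> alert name.
SIGNATURES = {
    b"/etc/passwd": "path traversal attempt",
    b"cmd.exe": "Windows shell access",
}

def misuse_detect(payload):
    """Misuse detection: return the names of all known signatures in the payload."""
    return [name for pattern, name in SIGNATURES.items() if pattern in payload]

def anomaly_detect(baseline, value, k=3.0):
    """Anomaly detection: flag values more than k standard deviations from normal."""
    mean = statistics.mean(baseline)
    std = statistics.stdev(baseline)
    return abs(value - mean) > k * std

# Hypothetical profile of normal behaviour: bytes sent per connection.
normal_bytes = [500, 520, 480, 510, 495, 505]

print(misuse_detect(b"GET /../../etc/passwd HTTP/1.0"))
print(anomaly_detect(normal_bytes, 5000), anomaly_detect(normal_bytes, 510))
```

The misuse detector can only name attacks it already has a pattern for, whereas the anomaly detector flags the unusual 5000-byte connection without knowing what kind of attack it is, which is the trade-off described above.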
Traditional network security technology is a static, passive defense that prevents most external attacks but cannot handle internal attacks. To overcome the passive and rigid nature of traditional defense systems, experts have proposed a new security model: the active defense system. The core of an active defense system is intrusion detection, which can detect intrusions against a host or network in real time, coming not only from the Internet, the intranet or operations on individual computers, but also through authorized operations. As soon as an intrusion is detected, the system should immediately report it, collect evidence of the intrusion and even trace the source of the attack [7]. A number of soft computing based approaches have been proposed for detecting network intrusions. The principal constituents of soft computing are fuzzy logic, artificial neural networks, probabilistic reasoning and genetic algorithms. When used for intrusion detection, soft computing techniques are often used in conjunction with rule-based expert systems that acquire expert knowledge, where the knowledge is represented as a set of if-then rules.

This work presents a GA-based approach to network intrusion detection. A GA suits this problem because of several useful properties: it is robust to noise, it needs no gradient information to find a global optimal or sub-optimal solution, and it has self-learning capabilities. In the recent past there has been growing recognition of intelligent techniques for building efficient and reliable intrusion detection systems [10]. All of these techniques have two steps: training and testing. GA-based techniques are appropriate for dealing with rare classes. Because they work with populations of candidate solutions rather than a single solution and employ stochastic operators to guide the search, GAs cope well with attribute interactions and avoid getting stuck in local maxima, which together make them well suited to classifying rare classes. We have gone further by deploying the standard F-measure as the fitness function; the F-measure is known to be well suited to rare classes [10].

2. Genetic Algorithm
A genetic algorithm attempts to incorporate ideas from natural evolution. In general, genetic learning starts as follows: an initial population is created consisting of randomly generated rules, and each rule can be represented by a string of bits [11]. Genetic algorithms are easily parallelizable and have been used for classification as well as other optimization problems. In data mining, they may be used to evaluate the fitness of other algorithms. There is a large class of interesting problems for which no reasonably fast algorithms have been developed, and many of these are optimization problems that arise frequently in applications.
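Why the F-measure suits rare classes can be seen with a little arithmetic. The sketch below uses the standard definitions of precision, recall and the (beta-weighted) F-measure; the counts are a hypothetical scenario of 1000 connections containing only 10 attacks, not results from this paper.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical rare-class scenario: 1000 connections, of which 10 are attacks.
silent = dict(tp=0, fp=0, fn=10, tn=990)   # a rule that never fires
useful = dict(tp=8, fp=4, fn=2, tn=986)    # a rule that catches 8 of 10 attacks

for name, c in [("silent", silent), ("useful", useful)]:
    print(name, accuracy(**c), round(f_measure(c["tp"], c["fp"], c["fn"]), 3))
```

Accuracy barely separates the two rules (0.99 versus 0.994) because the attack class is so small, while the F-measure scores the silent rule at 0 and the useful one at about 0.727, which is the property that makes it attractive as a GA fitness function here.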
For some hard optimization problems we can use probabilistic algorithms. These algorithms do not guarantee the optimum value, but by randomly choosing sufficiently many candidate solutions the probability of error can be made as small as we like [3]. A GA operates on a population of potential solutions, applying the principle of survival of the fittest to produce successively better approximations to the solution of the problem it is trying to solve. At each generation, a new set of approximations is created by selecting individuals according to their fitness in the problem domain and breeding them together using operators borrowed from natural genetics, namely crossover and mutation. This process leads to the evolution of populations of individuals that are better adapted to their environment than the individuals they were created from, just as in natural adaptation. The genetic algorithm is employed to derive a set of classification rules from network audit data, and the support-confidence framework is used as the fitness function to judge the quality of each rule. The generated rules are then used to detect or classify network intrusions in a real-time environment [17]. Figure 1 describes the operation of a general genetic algorithm. The operation starts from an initial population of randomly generated individuals, whose quality is then gradually improved. During each generation, three basic genetic operators, selection, crossover and mutation, are applied sequentially to each individual with certain probabilities. First, a number of best-fit individuals are selected based on a user-defined fitness function, and the remaining individuals are discarded. Next, a number of individuals are selected and paired with each other [1]. Each pair produces one offspring by partially exchanging genes around one or more randomly selected crossing points.
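The loop just described (keep the best-fit individuals, pair survivors to produce one offspring each, then mutate) can be sketched as follows, with the support-confidence framework as the fitness. This is a minimal sketch under several assumptions not stated in the paper: the audit records and their binary features are hypothetical, a rule is a bit string where gene i = 1 means "feature i must be present", the rule predicts "attack" when it fires, and the weights w1 = 0.2, w2 = 0.8 are illustrative only.

```python
import random

# Hypothetical toy audit data: (binary feature vector, is_attack).
# Feature i == 1 means condition i was observed on the connection.
RECORDS = [
    ((1, 1, 0, 0), True),
    ((1, 1, 1, 0), True),
    ((1, 0, 0, 1), False),
    ((0, 1, 0, 0), False),
    ((0, 0, 1, 1), False),
    ((1, 1, 0, 1), True),
]

def matches(rule, features):
    """A rule fires when every feature it requires (gene == 1) is present."""
    return all(f == 1 for g, f in zip(rule, features) if g == 1)

def support_confidence(rule, records, w1=0.2, w2=0.8):
    """Fitness of the rule 'if A then attack' as w1 * support + w2 * confidence."""
    fired = [attack for feats, attack in records if matches(rule, feats)]
    hits = sum(fired)                                  # |A and B|
    support = hits / len(records)                      # |A and B| / N
    confidence = hits / len(fired) if fired else 0.0   # |A and B| / |A|
    return w1 * support + w2 * confidence

def evolve(records, n_genes, pop_size=8, generations=30, mutation_rate=0.1, seed=1):
    rng = random.Random(seed)
    fitness = lambda rule: support_confidence(rule, records)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the best-fit half of the population, discard the rest.
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            # Crossover: a random pair produces one offspring around one cut point.
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randint(1, n_genes - 1)
            child = p1[:cut] + p2[cut:]
            # Mutation: with probability mutation_rate, one random gene flips.
            if rng.random() < mutation_rate:
                i = rng.randrange(n_genes)
                child[i] = 1 - child[i]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve(RECORDS, n_genes=4)
print("best rule:", best, "fitness:", round(support_confidence(best, RECORDS), 3))
```

Because survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next. Raising w2 relative to w1 favours precise rules over rules that cover many attacks, which mirrors the weight trade-off mentioned in the conclusion.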
Finally, a certain number of individuals are selected and mutation is applied: a randomly selected gene of an individual abruptly changes its value.

2.1 Structure of Genetic Algorithm
A GA evolves a population of initial individuals into a population of high-quality individuals, where each individual represents a solution of the problem. Each individual is called a chromosome, and each chromosome is composed of a number of genes that, in the general case, need not be fixed. The quality of each rule is measured by a fitness function, which is a quantitative representation of the rule's adaptation to the environment. The procedure starts from an initial population, which is evolved for a number of generations while its quality, measured by the fitness value, increases. During each generation, three basic genetic operators, selection, crossover and mutation, are applied sequentially to each individual with certain probabilities [5]. Crossover consists of exchanging genes between two chromosomes in a certain way, while mutation consists of randomly changing the value of a randomly chosen gene of a chromosome. Both crossover and mutation are performed with a certain probability, called the crossover rate and the mutation rate respectively.

Figure 1: Process of Genetic Algorithm (steps: create a population of chromosomes; determine the fitness of each individual; select the next generation; perform reproduction using crossover; perform mutation; display result)

SNORT is an open source IDS that runs on Windows or Linux operating systems. Snort is a freely available, rule-based detection engine capable of real-time traffic analysis and packet logging on IP networks. It can detect a variety of attacks: through protocol analysis and content searching, Snort detects thousands of worms, vulnerability exploit attempts, port scans and other suspicious behavior. Snort can be configured in three modes: sniffer mode, packet logger mode and network intrusion detection system mode. In sniffer mode, it simply reads packets from the network and displays them on screen. In packet logger mode, it records the packets to disk. In network intrusion detection system mode, it analyzes network traffic against a user-defined rule set.

Several network features are more likely than others to be involved in network intrusions. In our approach, some rules are selected from the Snort rule set to compose a classification rule [5]. The following are examples of Snort rules:

Rule 1: alert tcp any any -> any any (content:"www.facebook.com"; msg:"Some one visiting facebook at this time"; sid:1000001; rev:2;)
Rule 2: alert tcp $EXTERNAL_NET 10101 -> $HOME_NET any (msg:"scan myscan"; flow:stateless; ack:0; flags:S; ttl:>220; classtype:attempted-recon; sid:613; rev:8;)

When Snort generates an alert message, it will usually look like the following:

[**] [158:11:1] (snort_decoder): T/TCP Detected [**]

The first number is the Generator ID (GID), which tells the user which component of Snort generated the alert. In this case, the event came from the decoder (158) component of Snort.
The second number is the Snort ID (sometimes referred to as the Signature ID). For rule-based alerts, the SID is written directly into the rule with the sid option. In this case, 11 represents a T/TCP event. The third number is the revision ID. This number is primarily used when writing signatures: each revision of a rule should increment this number with the rev option.
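The three-part [GID:SID:rev] header described above can be extracted from an alert line with a regular expression, for example to count how often each rule fires. This is a minimal sketch that assumes the single-line alert format shown in this paper (a timestamp, then "[**] [gid:sid:rev] message [**]"); other Snort output formats are not handled, and the function name parse_alert is hypothetical.

```python
import re

# Matches "[**] [gid:sid:rev] message [**]" anywhere in a Snort alert line.
ALERT_HEADER = re.compile(r"\[\*\*\]\s*\[(\d+):(\d+):(\d+)\]\s*(.*?)\s*\[\*\*\]")

def parse_alert(line):
    """Return the generator ID, signature ID, revision and message, or None."""
    m = ALERT_HEADER.search(line)
    if m is None:
        return None
    gid, sid, rev, msg = m.groups()
    return {"gid": int(gid), "sid": int(sid), "rev": int(rev), "msg": msg}

print(parse_alert("[**] [158:11:1] (snort_decoder): T/TCP Detected [**]"))
```

Applied to the test results in the next section, the same function would report SID 1000001, revision 2, which ties each alert back to Rule 1 through its sid and rev options.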

Snort test result:

02/12-13:10:50.895874 [**] [1:1000001:2] Some one visiting facebook at this time [**] [Priority: 0] {TCP} 91.121.153.107:80 -> 172.17.11.10:8469
02/12-13:17:44.627562 [**] [1:1000001:2] Some one visiting facebook at this time [**] [Priority: 0] {TCP} 91.121.153.107:80 -> 172.17.11.10:8655

2.3 Methodology
The proposed GA-based intrusion detection approach contains two modules, each working in a different stage. In the training stage, a set of classification rules is generated from network audit data using the GA in an offline environment. In the intrusion detection stage, the generated rules are used to classify incoming network connections in real time. Once the rules are generated, intrusion detection is simple and efficient.

The genetic algorithm works by manipulating the fitness values of individuals. The fitness of an individual depends on the similarities between different chromosomes corresponding to Snort rules. A fitness function is a measure of quality used to evaluate a design solution. In genetic programming and genetic algorithms, each design solution is represented as a string of numbers, also known as a chromosome. After each round of testing, the idea is to remove the worst design solutions and to breed new ones from the best. Each design solution therefore needs to be awarded a figure of merit indicating how close it came to meeting the overall specification, and this is generated by applying the fitness function to the results obtained from that solution. Our research uses a similarity function as the fitness function for analyzing the performance of the system. There are a number of possible measures for computing the similarity between two chromosomes; the most common are the Dice, Cosine and Jaccard measures.
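The three similarity measures named above can be compared on binary chromosomes with a few lines of code. This sketch assumes, for illustration only, that a chromosome is a 0/1 vector in which gene i is 1 when a rule tests feature i of Table 1; the two example rules are hypothetical. For 0/1 vectors the dot product X·Y is simply the number of shared 1-genes, and each function returns 0.0 when its denominator is zero.

```python
import math

def overlap(x, y):
    """X.Y for binary chromosomes: the number of positions where both genes are 1."""
    return sum(a & b for a, b in zip(x, y))

def jaccard(x, y):
    denom = sum(x) + sum(y) - overlap(x, y)
    return overlap(x, y) / denom if denom else 0.0

def dice(x, y):
    denom = sum(x) + sum(y)
    return 2 * overlap(x, y) / denom if denom else 0.0

def cosine(x, y):
    denom = math.sqrt(sum(x)) * math.sqrt(sum(y))  # |X| * |Y| for 0/1 vectors
    return overlap(x, y) / denom if denom else 0.0

# Hypothetical chromosomes over the six features of Table 1.
rule_a = [1, 1, 0, 1, 0, 0]
rule_b = [1, 0, 0, 1, 1, 0]
for f in (jaccard, dice, cosine):
    print(f.__name__, round(f(rule_a, rule_b), 3))
```

The two rules share two of their three features each, so Jaccard gives 0.5 while Dice and Cosine both give about 0.667; Jaccard penalises non-shared genes more strongly, which makes it the strictest of the three.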
We use the Jaccard function as the fitness function for finding the best rules. It is defined as follows:

$$ J(X, Y) = \frac{|X \cap Y|}{|X| + |Y| - |X \cap Y|} $$

where X and Y are the two chromosomes being compared, |X ∩ Y| is the number of elements they share, and |X| and |Y| are their sizes.

3.1 IDS Dataset
The dataset was divided into a training dataset and a test dataset. The training dataset is used to train the system presented here, while the test dataset is used to test it. The test dataset contains additional attacks not present in the training dataset. The attacks fall into the four most common categories [8,14]:

Denial of service (DoS): the attacker makes some computing or memory resource too busy to handle legitimate requests. These attacks may be initiated by flooding a system with communications, abusing legitimate resources, targeting implementation bugs or exploiting the system's configuration.

User to root (U2R): the attacker starts with access to a normal user account and exploits vulnerabilities to gain unauthorized root access. The most common U2R attacks cause buffer overflows.

Remote to user (R2L): the attacker sends packets to a machine and then exploits the machine's vulnerabilities to gain local access as a user. Such unauthorized access from a remote machine may include password guessing.

Probing (PROBE): the attacker scans a network to gather information or find known vulnerabilities, through actions such as port scanning.

Table 1 lists the names of some common, important network features.

Table 1: Network Features
Feature No.   Feature Name
1             Flag
2             Src_byte
3             Dst_byte
4             Wrong fragment
5             Urgent
6             Hot
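When working with such a dataset, each raw attack label has to be mapped to one of the four categories above. The sketch below assumes the dataset is the KDD Cup 1999 data cited in the references, whose raw labels end with a period (for example "smurf."); the table lists the 22 attack labels of the KDD training set, and any label not in the table (such as an attack that appears only in the test data) is reported as "unknown". The helper name categorize is hypothetical.

```python
from collections import Counter

# KDD Cup 1999 training-set attack labels grouped into the four categories.
KDD_CATEGORY = {
    # Denial of service
    "back": "DoS", "land": "DoS", "neptune": "DoS", "pod": "DoS",
    "smurf": "DoS", "teardrop": "DoS",
    # User to root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
    # Remote to user
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    # Probing
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
}

def categorize(label):
    """Map a raw label such as 'smurf.' to normal, a category, or unknown."""
    name = label.strip().rstrip(".")
    if name == "normal":
        return "normal"
    return KDD_CATEGORY.get(name, "unknown")

labels = ["normal.", "smurf.", "neptune.", "guess_passwd.", "portsweep.", "rootkit.", "mailbomb."]
print(Counter(categorize(label) for label in labels))
```

Keeping an explicit "unknown" bucket, rather than forcing every label into a category, makes it visible how many test connections belong to attacks the training data never described.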

3.2 Results
In the experiment, the system was trained with the training dataset using the Jaccard fitness function and the following GA parameters: 500 generations, 3 initial rules in the population, a crossover rate of 0.85 with two-point crossover, and a mutation rate of 0.025. When the training process finished, the best-quality rules were taken as the final classification rules. These rules were then used to classify the training data and the testing data respectively. The results are shown in Figure 2.

Figure 2: Population generation (x-axis: generations)

Kiwi Syslog Server [Freeware] Version 8.3.52, Kiwi Syslog Server Statistics
24 hour period ending on: Mon, 04 Feb 2013 21:48:57
Syslog Server started on: Mon, 04 Feb 2013 18:48:29
Syslog Server uptime: 2 hours, 6 minutes
+ Messages received - Total: 8
+ Messages received - Last 24 hours: 8
+ Messages received - Since Midnight: 8
+ Messages received - Last hour: 0
+ Message queue overflow - Last hour: 0
+ Messages received - This hour: 0
+ Message queue overflow - This hour: 0
+ Messages per hour - Average: 4
+ Messages forwarded: 0
+ Messages logged to disk: 8
+ Errors - Logging to disk: 2
+ Errors - Invalid priority tag: 0
+ Errors - No priority tag: 0
+ Errors - Oversize message: 0
+ Disk space remaining on drive C: 34577 MB

Table 2: Breakdown of Syslog messages by severity
Message Level   Messages   Percentage
0 - Emerg       0          0.00%
1 - Alert       4          0.50%
2 - Critical    0          0.00%
3 - Error       2          0.25%
4 - Warning     0          0.00%
5 - Notice      0          0.00%
6 - Info        0          0.00%
7 - Debug       2          0.25%
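The reproduction step with the parameters above (two-point crossover applied with rate 0.85, mutation rate 0.025) can be sketched as follows. The paper does not say whether the mutation rate applies per gene or per individual; this sketch assumes a per-gene bit flip, and the function names are hypothetical.

```python
import random

def two_point_crossover(p1, p2, rng):
    """Swap the segment between two random cut points; returns two children."""
    i, j = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def reproduce(p1, p2, rng, crossover_rate=0.85, mutation_rate=0.025):
    """Cross two parents with probability crossover_rate, then mutate each gene."""
    if rng.random() < crossover_rate:
        c1, c2 = two_point_crossover(p1, p2, rng)
    else:
        c1, c2 = p1[:], p2[:]
    # Per-gene bit-flip mutation (an assumption; see the lead-in).
    mutate = lambda c: [1 - g if rng.random() < mutation_rate else g for g in c]
    return mutate(c1), mutate(c2)

rng = random.Random(42)
print(reproduce([0] * 8, [1] * 8, rng))
```

With all-zero and all-one parents, a crossover without mutation yields two complementary children, each carrying one contiguous block taken from the other parent; with a mutation rate of 0.025 on an 8-gene chromosome, on average only about one child in five receives any flipped gene, so crossover does most of the exploration.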

3.4 Conclusion
This paper presents an approach to improving the performance of an IDS using a genetic algorithm. One major advantage of this technique is that it closely mirrors natural adaptation, which matters because types of intrusion change and become more complicated very rapidly. The proposed detection system can upload and update new rules as new intrusions become known; therefore it is cost-effective and adaptive to real-world environments. The GA approach is used to derive a set of classification rules from network audit data, and a simple but efficient and flexible fitness function, the support-confidence framework, is used to select the appropriate rules. Depending on the choice of fitness-function weight values, the generated rules can be used either to detect network intrusions in general or to classify the types of intrusions precisely.

References:
[1] A. Chittur, Model Generation for an Intrusion Detection System Using Genetic Algorithms, http://www1.cs.columbia.edu/ids/publications/gaids-thesis01.pdf (accessed January 2005).
[2] Ren Hui Gong, Mohammad Zulkernine, Purang, A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection, Proceedings of IEEE, 2005.
[3] Wafa S. Al-Sharafat, Reyadh Sh. Naoum, Adaptive Framework for Network Intrusion Detection by Using Genetic-Based Machine Learning Algorithm, IJCSNS, Vol. 9, April 2009.
[4] Jose M. Moya, Alvaro Araujo, A Genetic Algorithm Based Solution for Intrusion Detection, Journal of Information Assurance and Security, 2009.
[5] D. Dasgupta and F. A. Gonzalez, An Intelligent Decision Support System for Intrusion Detection and Response, MMM-ACNS, Lecture Notes in Computer Science, vol. 2052, pp. 1-14, 2001.
[6] J. Gomez and D. Dasgupta, Evolving Fuzzy Classifiers for Intrusion Detection, Proceedings of the IEEE, 2002.
[7] H. Pohlheim, Genetic and Evolutionary Algorithms: Principles, Methods and Algorithms, http://www.geatbx.com/docu/index.html (accessed January 2005).
[8] MIT Lincoln Laboratory, DARPA datasets (accessed November 2004).
[9] B. Mukherjee, L. T. Heberlein and K. N. Levitt, Network Intrusion Detection, IEEE Network, 8(3), pp. 26-41, May/June 1994.
[10] T. Xiao, G. Qu, S. Hariri and M. Yousif, An Efficient Network Intrusion Detection Method Based on Information Theory and Genetic Algorithm, Proceedings of the 24th IEEE International Performance Computing and Communications Conference (IPCCC '05), Phoenix, AZ, USA, 2005.
[11] S. Selvakani, R. S. Rajesh, Genetic Algorithm for Framing Rules for Intrusion Detection, IJCSNS International Journal of Computer Science and Network Security, Vol. 7, No. 11, November 2007.
[12] A. Christie, W. Fithen, J. McHugh, J. Pickel, E. Stoner, State of the Practice of Intrusion Detection Technologies, Technical Report, Carnegie Mellon University, 2000.
[13] N. Toosi, M. Kahani, A New Approach to Intrusion Detection Based on an Evolutionary Soft Computing Model Using Neuro-Fuzzy Classifiers, Computer Communications 30 (2007), pp. 2201-2212, 2007.
[14] M. Sabhnani, G. Serpen, Application of Machine Learning Algorithms to KDD Intrusion Detection Dataset within Misuse Detection Context, Proceedings of the International Conference on Machine Learning: Models, Technology and Application, Las Vegas, Nevada, USA, June 2003.
[15] Ch. Sinclair, L. Pierce, S. Matzner, An Application of Machine Learning to Network Intrusion Detection, 15th Annual Computer Security Applications Conference, Phoenix, Arizona, December 6-10, 1999.
[16] KDD-CUP 1999 Data.