Improving the performance of IDS using Genetic Algorithm

Kuldeep Kumar, Ramkala Punia
Computer Programmer, CCS Haryana Agriculture University, Hisar, Haryana
*Teaching Associate, Deptt. of CSE, Guru Jambheshwar University of Science and Technology, Hisar
verma1.kuldeep@gmail.com, ramkalapunia@gmail.com

Abstract
Intrusion detection systems (IDS) aim to detect computer attacks and/or computer misuse and to alert the proper individuals upon detection. The growing number of Internet threats increasingly calls for a defense-in-depth approach to protect computer systems worldwide from being intruded upon for the purpose of stealing information. We need a very safe and secure intrusion detection system [1]. Intrusion detection has therefore become an important area of research, because existing systems are not completely flawless and secure. This paper presents a genetic algorithm based approach to network intrusion detection for analyzing and improving the performance of IDS. Genetic algorithms (GAs) are search algorithms based on the principles of natural selection and genetics. The aim in developing GAs is to build systems as robust and as adaptable to their environment as natural systems. The GA methodology starts from an initial population and evolves it for a number of generations [5]. During each generation, three basic genetic operators, i.e. selection, crossover and mutation, are applied to each individual with certain probabilities.

Key words: IDS, misuse detection, anomaly detection, genetic algorithm, SNORT.

1. Introduction
Today we suffer from many problems caused by intruders interfering in our communication with other persons and organisations. The growing number of Internet threats increasingly calls for a defense-in-depth approach to protect computer systems worldwide from being intruded upon for the purpose of stealing information. We need a very safe and secure intrusion detection system.
Intrusion detection has therefore become an important area of research, because existing systems are not completely flawless and secure, and there is a need to improve them. Many methods have been developed to secure the network infrastructure and communication over the Internet [8]. Intrusion detection systems monitor network resources and sense whether a system or network is being used by an authorized person. There are two ways to protect a network against malicious attempts. The first is to build a completely secure network system by applying all the complicated cryptographic, authentication and authorization methods. However, this solution is not realistic. In practice, it is impossible to have a completely secure system, because users rely on an operating system and other applications to accomplish their jobs, and almost all applications have one vulnerability or another. The second way is to detect an attack as soon as possible, preferably in real time, and take appropriate action [6]. This is essentially what an Intrusion Detection and Prevention System (IDS and IPS) does. An IDS does not usually take preventive measures when an attack is detected; it is reactive rather than proactive. There are two general types of intrusion detection systems: misuse detection and anomaly detection. Misuse detection systems detect intruders with known patterns, while anomaly detection systems identify deviations from normal network behavior and raise alerts for potential unknown attacks [2]. IDS have three common issues: speed, accuracy and adaptability. The speed issue arises from the extensive set of data that needs to be monitored in order to observe the entire situation. An existing approach to solving this problem is to split the network stream into a few more manageable streams and analyze each in real time using separate IDSs [1].
Traditional network security technology is a static, passive defense technology, which prevents most external attacks but cannot address internal attacks. To overcome the passive and rigid nature of traditional defense systems, experts have proposed a new kind of security system, the active defense system. The core of an active defense system is intrusion detection, which can detect intrusions into a host or network in real time. It covers not only Internet and intranet activity and certain operations on computers, but also authorized operations; as soon as an intrusion is detected, the system should immediately report it, collect evidence of the intrusion, and even track the source of the attack [7]. A number of soft computing based approaches have been proposed for detecting network intrusions. The principal constituents of soft computing are Fuzzy Logic, Artificial Neural Networks, Probabilistic Reasoning and Genetic Algorithms. When used for intrusion detection, soft computing techniques are often
used in conjunction with rule-based expert systems, which acquire expert knowledge represented as a set of if-then rules. This work presents a GA-based approach to network intrusion detection. GA is a well-suited approach because of several useful properties: it is robust to noise, requires no gradient information to find a globally optimal or sub-optimal solution, and has self-learning capabilities. In the recent past there has been growing recognition of the value of deploying intelligent techniques to create efficient and reliable intrusion detection systems [10]. All of these techniques have two steps: training and testing. GA-based techniques are appropriate for dealing with rare classes. Because they work with populations of candidate solutions rather than a single solution and employ stochastic operators to guide the search process, GAs cope well with attribute interactions and avoid getting stuck in local maxima, which together make them very suitable for classifying rare classes. We have gone further by deploying the standard F-measure as fitness function. The F-value has been shown to be very suitable when dealing with rare classes [10].

2. Genetic Algorithm
Genetic algorithms attempt to incorporate ideas of natural evolution. In general, genetic learning starts as follows. An initial population is created consisting of randomly generated rules. Each rule can be represented by a string of bits [11]. Genetic algorithms are easily parallelizable and have been used for classification as well as other optimization problems. In data mining, they may be used to evaluate the fitness of other algorithms. There is a large class of interesting problems for which no reasonably fast algorithms have been developed. Many of these problems are optimization problems that arise frequently in applications.
For some hard optimization problems we can also use probabilistic algorithms. These algorithms do not guarantee the optimum value, but by randomly choosing sufficiently many candidate solutions the probability of error may be made as small as we like [3]. A GA operates on a population of potential solutions, applying the principle of survival of the fittest to produce better and better approximations to the solution of the problem it is trying to solve. At each generation, a new set of approximations is created by selecting individuals according to their fitness in the problem domain and breeding them together using operators borrowed from natural genetics, i.e. crossover and mutation. This process leads to the evolution of populations of individuals that are better adapted to their environment than the individuals they were created from, just as in natural adaptation. The genetic algorithm is employed to derive a set of classification rules from network audit data, and the support-confidence framework is utilized as fitness function to judge the quality of each rule. The generated rules are then used to detect or classify network intrusions in a real-time environment [17]. Figure 1 describes the operation of a general genetic algorithm. The operation starts from an initial population of randomly generated individuals. Then the qualities of the individuals are gradually improved. During each generation, three basic genetic operators are sequentially applied to each individual with certain probabilities, i.e., selection, crossover, and mutation. First, a number of best-fit individuals are selected based on a user-defined fitness function. The remaining individuals are discarded. Next, a number of individuals are selected and paired with each other [1]. Each individual pair produces one offspring by partially exchanging their genes around one or more randomly selected crossing points.
At the end, a certain number of individuals are selected and the mutation operation is applied, i.e., a randomly selected gene of an individual abruptly changes its value.

2.1 Structure of Genetic algorithm
A GA evolves a population of initial individuals into a population of high-quality individuals, where each individual represents a solution to the problem. Each individual is called a chromosome. Each chromosome is composed of a certain number of genes, which in the general case does not have to be fixed. The quality of each rule is measured by a fitness function, which is a quantitative representation of each rule's adaptation to the environment. The procedure starts from an initial population, which is evolved for a number of generations while the quality of the individuals improves, with increasing fitness value serving as the measure of quality. During each generation, three basic genetic operators are sequentially applied to each individual with certain probabilities, i.e. selection, crossover and mutation [5]. Crossover consists of exchanging genes between two chromosomes in a certain way, while mutation consists of randomly changing the value of a randomly chosen
gene of a chromosome. Both crossover and mutation are performed with a certain probability, called the crossover/mutation rate.

[Figure 1: Process of Genetic Algorithm. Flowchart box labels: create a population of chromosomes; determine the fitness of each individual; select next generation; perform reproduction; perform mutation; display result.]

SNORT is an open source IDS that runs on Windows and Linux operating systems. Snort is a freely available rule-based detection engine. It is capable of performing real-time traffic analysis and packet logging on IP networks, and it can detect a variety of attacks. Through protocol analysis and content searching, Snort detects thousands of worms, vulnerability exploit attempts, port scans and other suspicious behavior. Snort is configurable in three modes: sniffer mode, packet logger mode and network intrusion detection system mode. In sniffer mode it simply reads packets off the network and displays them on screen. In packet logger mode it records the packets to disk. In network intrusion detection system mode it analyzes the network traffic against a user-defined rule set. Several network features have a higher probability of being involved in network intrusions. In our approach, some rules are selected from the Snort rule set to compose a classification rule [5]. Following are some examples of SNORT rules:

Rule 1: alert tcp any any -> any any (content:"www.facebook.com"; msg:"Some one visiting facebook at this time"; sid:1000001; rev:2;)

Rule 2: alert tcp $EXTERNAL_NET 10101 -> $HOME_NET any (msg:"scan myscan"; flow:stateless; ack:0; flags:s; ttl:>220; classtype:attempted-recon; sid:613; rev:8;)

When Snort generates an alert message, it will usually look like the following:

[**] [158:11:1] (snort_decoder): T/TCP Detected [**]

The first number is the Generator ID, which tells the user what component of Snort generated this alert. In this case, we know that this event came from the decode (158) component of Snort.
The second number is the Snort ID (sometimes referred to as Signature ID). Rule-based SIDs are written directly into the rules with the sid option. In this case, 11 represents a T/TCP event. The third number is the revision ID. This number is primarily used when writing signatures, as each rendition of the rule should increment this number with the rev option.
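The three-number header described above can be pulled apart mechanically. The following is a minimal Python sketch, not part of the original experiment: the names ALERT_HEADER and parse_alert_header are hypothetical, and the pattern only assumes the "[**] [generator:sid:rev] message [**]" layout shown in the alert example.

```python
import re

# Hypothetical pattern for the "[**] [gid:sid:rev] message [**]" alert header.
ALERT_HEADER = re.compile(r"\[\*\*\]\s*\[(\d+):(\d+):(\d+)\]\s*(.*?)\s*\[\*\*\]")


def parse_alert_header(line):
    """Return (generator_id, snort_id, revision, message), or None if no header is found."""
    match = ALERT_HEADER.search(line)
    if match is None:
        return None
    gid, sid, rev, message = match.groups()
    return int(gid), int(sid), int(rev), message


# Usage: the alert line quoted in the text.
print(parse_alert_header("[**] [158:11:1] (snort_decoder): T/TCP Detected [**]"))
```

Because the pattern is searched for rather than anchored at the start of the line, the same function should also accept alert lines that carry a leading timestamp.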
Snort Test Result:

02/12-13:10:50.895874 [**] [1:1000001:2] Some one visiting facebook at this time [**] [Priority: 0] {TCP} 91.121.153.107:80 -> 172.17.11.10:8469
02/12-13:17:44.627562 [**] [1:1000001:2] Some one visiting facebook at this time [**] [Priority: 0] {TCP} 91.121.153.107:80 -> 172.17.11.10:8655

2.3 Methodology
The proposed GA-based intrusion detection approach contains two modules, each working in a different stage. In the training stage, a set of classification rules is generated from network audit data using the GA in an offline environment. In the intrusion detection stage, the generated rules are used to classify incoming network connections in a real-time environment. Once the rules are generated, intrusion detection is simple and efficient. The genetic algorithm works by computing and manipulating the fitness values of the individuals. The fitness of an individual depends on the similarities between different chromosomes corresponding to SNORT rules. A fitness function is a measure of quality that is used to evaluate a design solution. In the fields of genetic programming and genetic algorithms, each design solution is represented as a string of numbers, also known as a chromosome. After each round of testing, the idea is to remove the worst design solutions and to breed new ones from the best solutions. Each design solution needs to be awarded a figure of merit to indicate how close it came to meeting the overall specification, and this is generated by applying the fitness function to the results obtained from that solution. Our research uses a similarity function as the fitness function for analyzing the performance of the system. Many types of similarity function can be used to calculate the similarity between two chromosomes; the most common are the Dice, Cosine and Jaccard measures.
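The three similarity measures named above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the paper does not specify the chromosome encoding, so the sketch assumes equal-length vectors of 0/1 genes, and the function names dice, cosine and jaccard are hypothetical.

```python
import math


def _dot(x, y):
    """Inner product of two equal-length gene vectors."""
    return sum(a * b for a, b in zip(x, y))


def dice(x, y):
    """Dice coefficient: 2*XY / (X + Y), with X = x.x and Y = y.y."""
    denom = _dot(x, x) + _dot(y, y)
    return 2 * _dot(x, y) / denom if denom else 0.0


def cosine(x, y):
    """Cosine similarity: XY / sqrt(X * Y)."""
    denom = math.sqrt(_dot(x, x) * _dot(y, y))
    return _dot(x, y) / denom if denom else 0.0


def jaccard(x, y):
    """Jaccard coefficient: XY / (X + Y - XY)."""
    xy = _dot(x, y)
    denom = _dot(x, x) + _dot(y, y) - xy
    return xy / denom if denom else 0.0


# Usage with two assumed 4-gene chromosomes.
a = [1, 1, 0, 1]
b = [1, 0, 0, 1]
print(dice(a, b), cosine(a, b), jaccard(a, b))
```

All three measures return 1 for identical non-zero chromosomes and 0 for chromosomes with no genes in common; returning 0.0 for all-zero inputs is a convention chosen here to avoid division by zero.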
We use the Jaccard function as the fitness function for finding the best rules. It is defined as follows:

$$\mathrm{Jaccard} = \frac{XY}{X + Y - XY}$$

3.1 IDS Dataset
The dataset was divided into a training dataset and a test dataset. The training dataset is used to train the system presented here, while the test dataset is used to test it. The test dataset contains additional attacks not described in the training dataset. The attacks include the four most common categories of attack [8,14]:

Denial of service (DoS) attacks: here, the attacker overloads some computing or memory resource, which makes the system too busy to handle legitimate requests. These attacks may be initiated by flooding a system with communications, abusing legitimate resources, targeting implementation bugs, or exploiting the system's configuration.

User to root (U2R) attacks: here, the attacker starts with access to a normal user account and exploits vulnerabilities to gain unauthorized root access. The most common U2R attacks cause buffer overflows.

Remote to user (R2L) attacks: here, the attacker sends packets to a machine and then exploits the machine's vulnerabilities to gain local access as a user. This unauthorized access from a remote machine may include password guessing.

Probing (PROBE): here, the attacker scans a network to gather information or find known vulnerabilities through actions such as port scanning.

Table 1 shows the names of some common important network features.

Feature No.   Feature Name
1             Flag
2             Src_byte
3             Dst_byte
4             Wrong fragment
5             Urgent
6             hot

Table 1: Network Features
3.2 Results
In the experiment, the system was trained with the training dataset, using the Jaccard fitness function and the following GA parameters: 500 generations, 3 initial rules in the population, a crossover rate of 0.85, two-point crossover, and a mutation rate of 0.025. When the training process was finished, the best-quality rules were taken as the final classification rules. The rules were then used to classify the training data and the testing data respectively. The results are shown in Figure 2.

[Figure 2: Population generation. Axis label: Generations.]

Kiwi Syslog Server [Freeware] Version 8.3.52
/// Kiwi Syslog Server Statistics ///
---------------------------------------------------
24 hour period ending on: Mon, 04 Feb 2013 21:48:57
Syslog Server started on: Mon, 04 Feb 2013 18:48:29
Syslog Server uptime: 2 hours, 6 minutes
---------------------------------------------------
+ Messages received - Total: 8
+ Messages received - Last 24 hours: 8
+ Messages received - Since Midnight: 8
+ Messages received - Last hour: 0
+ Message queue overflow - Last hour: 0
+ Messages received - This hour: 0
+ Message queue overflow - This hour: 0
+ Messages per hour - Average: 4
+ Messages forwarded: 0
+ Messages logged to disk: 8
+ Errors - Logging to disk: 2
+ Errors - Invalid priority tag: 0
+ Errors - No priority tag: 0
+ Errors - Oversize message: 0
+ Disk space remaining on drive C: 34577 MB

Message Level   Messages   Percentage
0 - Emerg       0          0.00%
1 - Alert       4          0.50%
2 - Critical    0          0.00%
3 - Error       2          0.25%
4 - Warning     0          0.00%
5 - Notice      0          0.00%
6 - Info        0          0.00%
7 - Debug       2          0.25%

Table 2: Breakdown of Syslog messages by severity
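To make the reported GA settings concrete (500 generations, crossover rate 0.85, two-point crossover, mutation rate 0.025), the following Python sketch runs a small GA with a Jaccard fitness. It is only an illustration under several assumptions that are not taken from the paper: chromosomes are 0/1 lists, fitness is the Jaccard similarity to a single hypothetical target pattern, the fitter half of the population is selected as parents, the best individual is carried over unchanged, mutation flips each gene independently, and the population size of 20 is an arbitrary choice (the paper reports 3 initial rules). All function names are hypothetical.

```python
import random


def jaccard(x, y):
    """Jaccard similarity XY / (X + Y - XY) for 0/1 gene lists."""
    xy = sum(a * b for a, b in zip(x, y))
    denom = sum(x) + sum(y) - xy
    return xy / denom if denom else 0.0


def two_point_crossover(p1, p2, rng):
    """Copy p1 and replace the segment between two random cut points with p2's genes."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    return p1[:i] + p2[i:j] + p1[j:]


def mutate(chrom, rate, rng):
    """Flip each gene independently with probability `rate` (an assumed variant)."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def evolve(target, pop_size=20, generations=500,
           crossover_rate=0.85, mutation_rate=0.025, seed=0):
    """Evolve bit-string chromosomes toward `target`; returns (best, fitness)."""
    rng = random.Random(seed)

    def fitness(chrom):
        return jaccard(chrom, target)

    population = [[rng.randint(0, 1) for _ in target] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(2, pop_size // 2)]   # selection: keep the fitter half
        next_gen = [ranked[0]]                     # elitism: best survives unchanged
        while len(next_gen) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            if rng.random() < crossover_rate:
                child = two_point_crossover(p1, p2, rng)
            else:
                child = list(p1)
            next_gen.append(mutate(child, mutation_rate, rng))
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)


# Usage with a hypothetical 12-gene target pattern.
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
best, score = evolve(target)
print(best, score)
```

Carrying the best individual over unchanged means the best fitness never decreases from one generation to the next, which makes the effect of the number of generations easy to observe.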
3.4 Conclusion
This paper has presented an approach to improving the performance of IDS using a Genetic Algorithm. One of the major advantages of this technique is its closeness to natural adaptation, which matters because the types of intrusions change and become more complicated very rapidly. The proposed detection system can upload and update new rules as new intrusions become known. It is therefore cost-effective and adaptive to real-world environments. The GA approach is used to derive a set of classification rules from network audit data. A simple but efficient and flexible fitness function, i.e. the support-confidence framework, is used to select the appropriate rules. Depending on the selection of fitness function weight values, the generated rules can be used either to generally detect network intrusions or to precisely classify the types of intrusions.

References:
[1] A. Chittur, Model Generation for an Intrusion Detection System Using Genetic Algorithms, http://www1.cs.columbia.edu/ids/publications/gaids-thesis01.pdf (accessed in January 2005).
[2] Ren Hui Gong, Mohammad Zulkernine, Purang, A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection, Proceeding of IEEE, 2005.
[3] Wafa S. Al-Sharafat, Reyadh Sh. Naoum, Adaptive Framework for Network Intrusion Detection by using Genetic based Machine Learning Algorithm, IJCSNS, Vol. 9, April 2009.
[4] Jose M. Moya, Alvaro Araujo, A genetic algorithm based solution for intrusion detection, Journal of Information Assurance and Security, 2009.
[5] D. Dasgupta and F. A. Gonzalez, An Intelligent Decision Support System for Intrusion Detection and Response, MMM-ACNS, Lecture Notes in Computer Science, vol. 2052, pp. 1-14, 2001.
[6] J. Gomez and D. Dasgupta, Evolving Fuzzy Classifiers for Intrusion Detection, Proceedings of the IEEE, 2002.
[7] H. Pohlheim, Genetic and Evolutionary Algorithms: Principles, Methods and Algorithms, http://www.geatbx.com/docu/index.html (accessed in January 2005).
[8] MIT Lincoln Laboratory, DARPA datasets (accessed in November 2004).
[9] B. Mukherjee, L. T. Heberlein, and K. N. Levitt, Network intrusion detection, IEEE Network, 8(3), pp. 26-41, May/June 1994.
[10] T. Xiao, G. Qu, S. Hariri, and M. Yousif, An Efficient Network Intrusion Detection Method Based on Information Theory and Genetic Algorithm, Proceedings of the 24th IEEE International Performance Computing and Communications Conference (IPCCC 05), Phoenix, AZ, USA, 2005.
[11] S. Selvakani, R. S. Rajesh, Genetic Algorithm for framing rules for intrusion Detection, IJCSNS International Journal of Computer Science and Network Security, Vol. 7, No. 11, November 2007.
[12] A. Christie, W. Fithen, J. McHugh, J. Pickel, E. Stoner, State of the Practice of Intrusion Detection Technologies, Technical Report, Carnegie Mellon University, 2000.
[13] N. Toosi, M. Kahani, A new approach to intrusion detection based on an evolutionary soft computing model using neuro-fuzzy classifiers, Computer Communications 30 (2007), pp. 2201-2212, 2007.
[14] M. Sabhnani, G. Serpen, Application of Machine Learning Algorithms to KDD Intrusion Detection Dataset within Misuse Detection Context, Proceeding of International Conference on Machine Learning: Models, Technology and Application, Las Vegas, Nevada, USA, June 2003.
[15] Ch. Sinclair, L. Pierce, S. Matzner, An Application of Machine Learning to Network Intrusion Detection, 15th Annual Computer Security Applications Conference, Phoenix, Arizona, December 6-10, 1999.
[16] KDD-CUP 1999 Data.