DDoS Attacks Detection Using GA based Optimized Traffic Matrix

2011 Fifth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing DDoS Attacks Detection Using GA based Optimized Traffic Matrix Je Hak Lee yitsup2u@gmail.com Dong Seong Kim Electrical&Computer Eng. Duke University USA dk76@duke.edu Sang Min Lee minuri33@kau.ac.kr Jong Sou Park jspark@kau.ac.kr Abstract Threat of Distributed Denial of Service (DDoS) attacks has been increasing with growth of computer and network infrastructures. DDoS attacks generating mass traffics make network bandwidth and/or system resources depleted. Therefore, it is significant to detect DDoS attacks in early stage. Our previous approach used a traffic matrix to detect DDoS attack. However, it is hard to tune up the parameters of the matrix including (i) size of traffic matrix, (ii) packet based window size, and (iii) threshold value of variance from packets information with respect to various monitoring environments and DDoS attacks. In this paper, we propose an enhanced DDoS attacks detection approach which (i) improves the traffic matrix building operation and (ii) optimizes the parameters of the traffic matrix using Genetic Algorithm (GA). We perform experiments with DARPA 2000 dataset and LBL-PKT-4 dataset of Lawrence Berkeley Laboratory to show its performance in terms of detection accuracy and speed. Keywords-DDoS attacks; intrusion detection; traffic matrix; genetic algorithm I. INTRODUCTION There are many kinds of security threats on the Internet. Distributed Denial of Service (DDoS) attacks have emerged as one of the most serious threats among others [16]. The intensity of DDoS attacks has become stronger according to improvement of network infrastructure. DDoS attacks are launched by generating an extremely large volume of traffics and they rapidly exhaust resources of target systems, such as network bandwidth and computing power. DDoS defense mechanisms can be classified into four categories which are prevention, detection, mitigation, and response [2]. When DDoS attacks occur, first step to thwart DDoS attacks is the detection and it should be done as quickly as possible. However, it is difficult to distinguish between DDoS attack and normal traffics, since DDoS attack traffics often do not contain malicious contents in the packets. Moreover, attackers forge their source addresses to conceal their locations and to make DDoS attacks more sophisticated [9]. DDoS detection schemes should guarantee both short detection delay and high detection rates with low false positives. Computational overheads should be also considered because a detection engine (or module) has to deal with large volume of real-time network traffic. The anomaly schemes which are effective to detect unknown intrusions have been researched with relying on monitoring IP attributes of incoming packets. Peng et al. [11] proposed a simple detection scheme using arrival rates of new source IP addresses but it takes at least over 10 seconds, which is not an appropriate detection delay. Feinstein et al. [3] presented entropy based statistical detection model that is computed on selected IP attributes of some consecutive packets. However, they did not perform any parameters optimization for their detection model so they could not provide optimal window size. In addition, Kim et al. [5] collected a baseline profile on various attribute combinations but the combined attributes increase computational overheads. In this paper, we propose a novel DDoS detection model using a revised traffic matrix from our previous work [6]. The traffic matrix is built up with packet based window and a simple hash function. It makes our proposed model effective in terms of processing overheads and detection delay. Therefore, our approach can be used to detect DDoS attacks at the early stage in a real-time. Furthermore, we use Genetic Algorithm (GA) for optimization of parameters used in the traffic matrix. The GA is a well-known heuristic approach to figure out an optimal value in large search space. To maximize detection rates, we optimize three parameters in our detection model; (i) size of traffic matrix, (ii) packet based window size, and (iii) threshold value of variance from packets information. Then, we carry out experiments on not only LBL-PKT-4 dataset but also DARPA 2000 dataset. The experimental results show the feasibility of our proposed approach. The rest of this paper is organized as follows. In Section 2, related work is presented in brief. Our proposed detection model is presented in Section 3. In Section 4, the experiments and analysis are described. Finally, we conclude this paper in Section 5. II. RELATED WORK Anomaly detection schemes mainly can be divided into following technical categories; rate limiting, data mining, and statistical analysis techniques. Rate limiting techniques are easy to understand and implement as well. They are too simple to detect sophisticated intrusions and it is hard to set up proper threshold values. Data mining techniques are used to build a detection model (classifier) that can discover profile of network features. Lee and Stolfo [7] built a classification model to detect anomalies. They achieved a 978-0-7695-4372-7/11 $26.00 2011 IEEE DOI 10.1109/IMIS.2011.116 254 216

reasonable success in terms of classifying normal and intrusion data. They also reduced misclassification rates by using additional statistical features. A meta-detection model [8] was proposed to improve their previous research. It used combined multiple detection models to increase detection accuracy but multiple models definitely make computation more complex. Many detection techniques have been proposed in the statistical analysis field. We introduce several statistical analysis based detection model, in particular, relying on monitoring IP attributes of arrival packets. Talpade et al. [14] proposed NOMAD which is a scalable and passive network monitoring system. It can detect attacks by analyzing IP packet header information such as time to live (TTL) field, packet delay variation, and traffic flow. It does not support a method for creating the classifier for the high-bandwidth traffic aggregate from distributed sources [2]. Peng et al. [11] proposed a simple detection scheme named Source IP address Monitoring (SIM) to detect high bandwidth attacks. The model monitors arrival rates of new source IP addresses and detects changes of them using a non-parametric Cumulative Sum (CUSUM) algorithm [1][15] which is more suitable for analyzing a complex network environment than parametric algorithm. Their approach showed high detection accuracy with low computational overheads. Attacks including subnet spoofed IP addresses [2][10] can be also detected by this model. However, their experimental results showed that detection delay is between 10 and 127.3 seconds which are not a satisfactory in terms of the detection delay time for a realtime detection system. Feinstein et al. [3] proposed a statistical detection model to identify DDoS attacks by computing entropy and frequency-sorted distributions of specific IP attributes. The entropy could be calculated through a number of consecutive packets called as sliding window of fixed width. They implemented an entropy model as a plug-in for Snort [12][13] and performed experiments to validate it in various network traces. However, size of sliding window, a tunable parameter was not optimized. They set it from their experiential knowledge. Moreover, they did not concern about subnet spoofed attacks in their experiments. The aforementioned approaches do not satisfy several major requirements which should be achieved in DDoS detection approaches such as low processing overheads, short detection delay and high detection rates. Our model is operated in a real-time, since optimized traffic matrix which is constructed by a lightweight hash function. Packet based variable time window enables our model to detect DDoS attacks within very short period of time. The parameters for detection are optimized by GA, it guarantees very high detection rates. In addition, we show that our model can detect DDoS attacks including subnet spoofed packets and relatively low bandwidth attacks. The details of proposed detection model are presented in next section. III. PROPOSED DETECTION MODEL A. Overall Flow of Proposed Model No Start Genetic Algorithms sets three parameters 1. matrix size 2. packet based window size 3. threshold value, T Construct a traffic matrix for one window size Compute variance from the traffic matrix Variance < T? Alert Yes Figure 1. Overall flow of our proposed model Training Testing Figure 1 shows an overall flow of our proposed model. Our model can be divided into three main steps; parameters optimization using GA, traffic matrix construction, and calculation of variance. In the first step, GA optimizes three detection parameters with training dataset in order to maximize detection rates. The detection parameters consist of matrix size, packet based window size, and threshold value, T. In the second step, we build up the traffic matrix by extracting source IP addresses from inbound traffic stream and using a simple hash function to locate packets to the traffic matrix. The size of the traffic matrix and the number of packets for one traffic matrix are set by GA in the previous step. In the final step, variance is computed from the traffic matrix and we can decide whether inbound traffic is attack or normal through comparing the computed variance and a threshold value set by GA. Then, our proposed model alerts when DDoS attack is detected. Otherwise, the second and third steps are repeated continuously. Details of each step are presented in the following subsections. 255 217

B. Parameters Optimization using Genetic Algorithm (GA) Yes End No Generation > 50 Start Initialize population of 30 Evaluation first population Selection operation (Roulette wheel) Crossover operation (Standard crossover) (Pc = 0.6) Mutation operation (Bit inversion) (Pm = 0.05) Evaluate Evolved population Figure 2. Overall flow of GA process Training It is necessary to figure out optimal values of three detection parameters in order to maximize detection rates since victims have different network environments. These parameters consist of traffic matrix size, the number of packets for a time window and a threshold value of variance which makes a decision whether the state of the network is normal or attack. We can get maximum detection rates by optimizing parameters for a corresponding network environment through GA. GA is a heuristic approach for figuring out an optimal value. It is mainly composed of selection, reproduction, and evaluation operations. Repeating these main operations leads to the evolution. An overall flow of GA is shown in Figure 2. We set the pool for parents population to 30. The initial population of 30 is set randomly and fitness function evaluates them with training data. Fitness of a chromosome is proportional to its detection rates. The selection operation elects a new parent group for the next generation. In our detection model, we adopt roulette wheel selection for the selection operation which is commonly used and derives good results. It gives candidates a chance to be selected for the next generation s parent according to their fitness. The candidate of which fitness is high has a high probability to be selected. Therefore, good chromosomes can be passed down the next generation. Roulette wheel selection also maintains the diversity of parent s chromosome because it does not exclude the candidates which have low fitness. Even if there are some candidates who have very low fitness, they still have a little chance to be selected as a parent of next generation. After the selection operation, GA processes a reproduction operation composed of crossover and mutation. In our detection model, we adopt a standard crossover operation. The standard crossover is one-point crossover operation. We set a crossover probability (P c ) for 0.6. Bit inversion is used for our mutation operation. The mutation operation helps the solution not to converge toward local optima. We set a mutation probability (P m ) for 0.05. These values (P c, P m ) should also be set properly through experiential experiments. Otherwise, it may take very long time to get a solution or the local optima problem may be happened. We adopted ordinary value of probabilities for our reproduction parameters. Finally, the fitness function evaluates evolved chromosomes with training data. The above main operations are performed repeatedly until 50 generations. C. Traffic Matrix Construction inbound packets variable time window ex) 10 packets per 1 window H(x) n by n traffic matrix time t Figure 3. Construction of traffic matrices based on variable time window Traffic matrix can be built by capturing inbound traffic stream as shown in Figure 3. Time window is a period of time to make one traffic matrix. Length of each time window depends on amounts of the traffic stream. When a large volume of traffic streams comes into the victim s network, the time window becomes short and vice versa. The time window can be variable according to the network size/condition. All packets coming into the victim s network during one time window are mapped to a traffic matrix by a hash function using source IP addresses of the packets. The hash function produces coordinates for incoming packets to locate them in the traffic matrix. The flow of construction of a traffic matrix is shown in Figure 4. IPv4 address domain is too large to map all IP addresses to the traffic matrix. Since our detection model must run in a real-time environment, the traffic matrix has to have a reasonable size to construct it fast enough and to cover as many as incoming packets. We use a hash function to reduce the scale of the traffic matrix even though it sometimes makes a hash collision. However, the collision problem might be negligible because our model restricts the number of packets for one traffic matrix. 256 218

Packets coming from the network B C B B A B A (2) where, µ M,,if M, 0 32bit Source IP address High 16bit Low 16bit Row = High 16bit mod n Column = Low 16bit mod n The variance (V) of a complete traffic matrix is given by Equation (2). M(i,j) presents an element which has (i, j) index of the traffic matrix (M). The k denotes the number of non-zero elements in M. If the hash collision does not happen during the operation, the k will be equal to the number of source IP addresses. j IV. EXPERIMENTS AND ANALYSIS A. Experimental s 1 TABLE I. DATASETS FOR EXPERIMENTS i 4 Increment value of (i, j) in Traffic Matrix 2 IP spoofing Duration (sec) The number of compromised hosts Average pps LBL-PKT-4 N/A 360 N/A 250 n by n Traffic Matrix DARPA 2000 LLDOS 1.0 32bit 6 Unknown 5500 Figure 4. Constructing a traffic matrix using the hash function i High 16bit of IP mod n j Low 16bit of IP mod n (1) IPv4 address format consists of four octets. In our previous work [6], we used a value that first (third) octet multiplied by second (fourth) octet to calculate index (i, j). It caused more hash collision than using directly the octets. Equation (1) shows that the hash function divides a 32bit source IP address into high and low 16bit address; first two octets and last two octets. Modular operation is applied to each 16bit address to compute an index of a packet. The high 16bit address is divided by the row size of the traffic matrix and the remainder (i) is used as an index of the row. The low 16bit address is divided by the column size of the traffic matrix and the remainder (j) is used as an index of the column. The value of a corresponding element on the index (i, j) is increased by 1. If there is no hash collision, it will indicate the number of incoming packets which have the same IP address during a time window as shown in Figure 3. D. Variance Computation The variance which presents a dispersion of the incoming traffic can be derived from a complete traffic matrix. The variance of DDoS attack traffic which comes from various sources or forges its source IP address by using random generator is much lower than the variance of normal traffic since the packets which have a randomly generated source IP address are spread out to the traffic matrix uniformly. V 1 k M, µ,if M, 0 We used two datasets in our experiments; LBL-PKT-4 and DARPA 2000 LLDOS 1.0 datasets. Their properties, such as IP spoofing, duration, the number of compromised hosts, and packets per second (pps) on average, are summarized in Table 1. LBL-PKT-4 dataset of Lawrence Berkeley Laboratory was used as normal traffic data. It is a normal traffic trace from 14:00 to 15:00 on Friday, January 21, 1994. We extracted first 6 minutes trace from LBL-PKT- 4 dataset for our experiments. The trace is provided with sanitized IP addresses due to a privacy problem. It means that the IP addresses have been renumbered as a positive integer. We converted the sanitized source IP addresses to IPv4 format since our proposed model needs IPv4 format as source IP addresses to construct a traffic matrix. DARPA 2000 LLDOS 1.0 dataset [17] was used as attack traffic data. DARPA 2000 LLDOS 1.0 dataset is a DDoS attack scenario which has been carried out over multiple network and audit sessions. These sessions consist of 5 phases. We employed only the fifth phase traffic because it contains flood attack traffic with 6 seconds duration among the five phases. Therefore, the rest of the phases, first to fourth phases, were not used in our experiments. We combined DARPA 2000 LLDOS 1.0 attack dataset with LBL-PKT-4 normal dataset together and used it for our experiment. B. Experiments for subnet IP spoofed DDoS attacks detection We carried out experiments on DARPA 2000 LLDOS 1.0 dataset. The dataset have volume of attack traffic around 5500pps. Duration is 6 seconds and the entire field of source IP address was generated randomly. Most of flood traffic of DARPA 2000 LLDOS 1.0 dataset can be filtered by ingress/egress filtering techniques [4] which are widely deployed. We used GA to optimize parameters and Table 2 shows the experimental results. 257 219

DARPA 2000 LLDOS 1.0 + LBL- PKT-4 Matrix Size TABLE II. The number of packets for a window OPTIMIZATION RESULT Threshold value T Detection Rates Detection Delay (sec) 86 by 86 795 173.60 1.0 0.13 The optimized parameters may provide high detection rates and short detection delay. Figure 5 shows changes of variance. In Figure 5, each point indicates value of variance derived from a complete traffic matrix. When attack traffic comes between 30 and 36 seconds, the variance becomes lower than threshold value, T, which means that attack is detected. Another marked difference during attack phase is interval between derived variances which represents updating speed. These experimental results showed that our proposed detection model can response to DDoS attacks almost immediately with high accuracy regardless of subnet spoofing technique. Figure 5. Variance VS. time on DARPA 2000 LLDOS 1.0 with LBL-PKT-4 The experimental results showed that our detection model is very accurate and does not need much time to response to DDoS attacks. Also, all three detection parameters were well optimized by GA as it had the highest detection rates. The optimized parameters made our detection model run in different network environments regardless of volume of attack traffic. V. CONCLUSION In this paper, we have proposed a novel detection model to detect the DDoS attacks using the dispersibility of the inbound packets source IP addresses. We have employed a traffic matrix based approach and optimized parameters used in detection through Genetic Algorithm (GA). We showed that our approach meet the major requirements of detection approach such as low processing overheads, short detection delay and high detection rates. Our model can be used in a real-time network environment and it can be implemented easily. Constructing a traffic matrix requires only two fields of IP header such as arrival time of packet and source IP address. A simple hash function was also adopted to locate packets to the traffic matrix. Variable window based on the number of incoming packets makes our model effective in terms of detection delay. Our detection model can also maximize the detection rates by optimizing detection parameters according to corresponding network environments through GA. REFERENCES [1] B. E. Brodsky and B. S. Darkhovsky, Nonparametric Methods in Change-point Problems, Kluwer Academic Publishers, 1993. [2] C. Douligeris and A. Mitrokotsa, DDoS Attacks and Defense Mechanisms: Classification and State-of-the-Art, Computer Networks, Vol. 44, No. 5, pp. 643-666, April 2004. [3] L. Feinstein, D. Schnackenberg, R. Balupari, and D. Kindred, Statistical Approaches to DDoS Attack Detection and Response, In Proc. of DARPA Information Survivability Conf. and Exposition, pp. 303-314, 2003. [4] T. Killalea, Recommended Internet Service Provider Security Services and Procedures, IETF RFC 3013, November 2000. [5] Y. Kim, J.-Y. Jo, and K. Suh, Baseline Profile Stability for Network Anomaly Detection, In Proc. of the 3 rd Int. Conf. on Information Technology: New Generations, pp. 720-725, April 2006. [6] T. Kim, D. Kim, S. Lee, and J. Park, Detecting DDoS Attacks Using Dispersible Traffic Matrix and Weighted Moving Average, In Proc. 3 rd Conf. on Information Security and Assurance, Lecture Notes in Computer Science, Vol. 5567, pp. 290-300, June 2009. [7] W. Lee and S. J. Stolfo, Data Mining Approaches for Intrusion Detection, In Proc. of the 7 th USENIX Security Symp., pp. 79-93, January 1998. [8] W. Lee, S. J. Stolfo, and K. W. Mok, A Data Mining Framework for Building Intrusion Detection Models, In Proc. of the 1999 IEEE Symp. on Security and Privacy, pp. 120-132, May 1999. [9] J. Li, J. Mirkovic, M. Wang, P. Reiher, and L. Zhang, SAVE: Source Address Validity Enforcement Protocol, In Proc. Of INFOCOM 2002, Vol. 3, pp. 1557-1566, June 2002. [10] J. Mirkovic and P. Reiher, A Taxonomy of DDoS Attacks and DDoS Defense Mechanisms, ACM SIGCOMM Computer Comm. Rev., Vol. 34, No. 2, pp. 39-54, April 2004. [11] T. Peng, C. Leckie, and R. Kotagiri, Proactively Detecting DDoS Attack Using Source IP Address Monitoring, In Proc. of 3 rd Int. IFIP-TC6 Networking Conf., Lecture Notes in Computer Science, Vol. 3042, pp. 771-782, May 2004. [12] M. Roesch, Snort - Lightweight Intrusion Detection for Networks, In Proc. of the 13 th USENIX Conf. on System Administration, pp. 229-238, November 1999. [13] M. Roesch, Snort Users Manual: Snort Release 1.8.5, http://www.snort.org/documentation.html, March 2002. [14] R. R. Talpade, G. Kim, and S. Khurana, NOMAD: Traffic-Based Network Monitoring Framework for Anomaly Detection, In Proc. of the 4 th IEEE Symp. on Computers and Comm., July 1998. [15] H. Wang, D. Zhang, and K. G. Shin, Detecting SYN Flooding Attacks, In Proc. of IEEE INFOCOM, Vol. 3, pp. 1530-1539, June 2002. [16] Arbor networks, "World Wide Infrastructure Security Report 2008," 2008. [17] MIT Lincoln lab (2000), DARPA intrusion detection scenario specific datasets, Available from <http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/da ta/index.html>. 258 220