Correlating Alerts with a Data Mining Based Approach

Guang Xiang, Xiaomei Dong, Ge Yu
School of Information Science and Engineering, Northeastern University, China
xguang80@163.com

Abstract

In monitoring anomalous network activities, intrusion detection systems tend to generate a large number of alerts, which greatly increases the workload of post-detection analysis and decision-making. In this paper, we propose a correlation approach based on sequential pattern mining techniques to fuse related alerts for Distributed Denial of Service (DDoS) attacks. By mining alert sequences and iteratively consolidating the matching sequential alert patterns, our approach greatly reduces the number of related alerts and identifies their DDoS membership. The alert reduction and fusing mechanism allows us to concentrate on a higher level of abstraction and thus saves much of the effort otherwise spent on analyzing a large volume of trivial raw alerts. Experimental comparisons of our method with the hidden Markov model (HMM), a powerful stochastic model for sequence analysis, show that our algorithm is slightly better than HMM in terms of DDoS alert sequence identification.

1. Introduction

Extensive deployment of intrusion detection systems (IDSs) helps strengthen cyberspace's immunity to potential attacks. However, it also complicates network management by generating a large number of alerts that must be processed. Among the various network intrusions, a particularly effective one is the Distributed Denial of Service (DDoS) attack. During a DDoS, the adversary usually attacks the chosen victims indirectly. Specifically, the adversary first probes a network to locate live hosts (phase 1). After determining which hosts are up, the adversary breaks into some of them, installs the relevant tools (phase 2), and launches the DDoS via these compromised hosts (phase 3), which serve as the springboard for the attack.
In fact, an attacker can make a DDoS more sophisticated by repeatedly running phases 1 and 2 and launching the DDoS from the last victim on the chain of exploited machines. We will call this stealthier variant DDoS2. In this case, tracing the true source through the alerts is infeasible, and an IDS tends to raise a large volume of false alerts. Hence, irrelevant alerts should be reduced and related ones fused before they are presented to administrators; this facilitates alert processing and ultimately improves detection performance.

In this paper, we propose such an alert correlation approach based on sequential pattern mining techniques. Our method is partly motivated by the observation that an attacker always performs each part of the intrusion process many times to guarantee success. Our goal is twofold: to correlate related alerts into higher-level scenarios and to identify DDoS attacks through these scenarios. For evaluation, we conducted a series of experiments on the DARPA 2000 benchmark data corpus, comparing our method with the hidden Markov model, a powerful stochastic model for sequence analysis. Experimental results show that our approach is slightly better than HMM in terms of DDoS alert sequence identification and alert reduction.

The remainder of this paper is organized as follows. Section 2 gives a brief survey of related work. Section 3 summarizes sequential pattern mining algorithms and presents our correlation approach. In Section 4, we compare our method with HMM in identifying DDoS sequences and correlating alerts, and report the experimental results. Finally, we draw conclusions in Section 5.

2. Related work

To the best of our knowledge, little research has been done in the field of alert correlation. Among probabilistic methods, Valdes and Skinner proposed a unified mathematical framework for alert correlation [1].
They introduce a set of similarity functions to compare observations in the intrusion detection domain, so that a new alert can be merged with the best-matching meta-alert by computing its similarity to the existing meta-alerts. On the data mining front, Dain and Cunningham [2] presented an algorithm capable of fusing multiple heterogeneous alerts into scenarios. The algorithm builds scenarios by computing the probability that a new alert belongs to an existing scenario and adding the alert to the most likely candidate scenario. Three probability estimation methods were evaluated, and the data mining approach was found to outperform the naive and heuristic techniques. In another approach to alert correlation, Ning et al. [3] modeled intrusions by their prerequisites and consequences and developed a formal framework that correlates related hyper-alerts by matching the consequences of earlier alerts with the prerequisites of later ones.

3. Alert correlation

In this section, we first present some preliminaries on sequential pattern mining, followed by a detailed description of our correlation approach. The definitions and notation used in this section follow the classic works [4, 5, 6, 7].

3.1 Preliminaries

Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items. A subset $X \subseteq I$ is called an itemset, and a sequence is an ordered list of itemsets, denoted by $\langle s_1 s_2 \ldots s_k \rangle$, where $s_j$ ($1 \le j \le k$) is an itemset. A sequence $\langle a_1 a_2 \ldots a_k \rangle$ is contained in a sequence $\langle b_1 b_2 \ldots b_m \rangle$ if there exist indices $j_1 < j_2 < \cdots < j_k$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \ldots, a_k \subseteq b_{j_k}$. A sequence database $S$ is a set of tuples $\langle sid, s \rangle$, where $sid$ is the sequence ID and $s$ is a sequence. The support of a sequence $s$ in the sequence database $S$, denoted by $sup(s)$, is the number of tuples $\langle sid, s' \rangle \in S$ whose sequence $s'$ contains $s$. For a given support threshold $minsup$, a sequence $s$ is called a frequent sequence, or sequential pattern, if $sup(s) \ge minsup$. Given a sequence database $S$ and a support threshold $minsup$, the aim of sequential pattern mining is to discover all sequential patterns.

Sequential patterns resemble DDoS attacks in that both are composed of a set of logically related yet temporally separated steps. The effectiveness of our approach is partly attributable to this inherent resemblance. During a DDoS, an IDS generates alerts on the activities that it regards as anomalous. Since many of the detected attacks aim at the same result, every alert is mapped into one of three categories, or stages, namely probe, exploit, and DDoS, according to the alert type assigned to it by the IDS.
Then, we aggregate the alerts from hosts with matching addresses in the temporal order in which they are generated and build them into a set of alert sequences, in which alerts are arranged in ascending order of stage. An appropriate sequential pattern mining algorithm then runs on the sequences, and the alerts contained in each mined sequential pattern are correlated into a single alert. Subsequently, the longest sequential pattern (lspattern) is selected to represent all the sequential patterns mined in a group. The source and destination addresses of an lspattern are set, respectively, to the source address of its first alert and the destination address of its last alert. We then generate DDoS scenarios by iteratively matching the destination address of one lspattern with the source address of another. After these steps, many trivial raw alerts have been fused into only a few, and the correlated alerts are further classified as belonging to the DDoS attack type.

Table 1 shows several examples of alert sequences. Each item in a sequence is an alert after stage mapping, and the sequences are constructed from the raw alert stream by our algorithm. Mining sequential patterns on this sequence set with the support threshold set to 2 eliminates the infrequent alert sequence 7. The two lspatterns obtained are <probe, exploit> and <probe, exploit, DDoS>. They satisfy the fusing criterion and are thus correlated. Hence, fifteen alerts are reduced to one, and the alerts in sequences 1 to 6 are identified as part of a DDoS attack.

Table 1. Examples of alert sequences.
Addresses                    sid   Alert sequence
From address1 to address2    1     probe, exploit
                             2     probe, exploit
                             3     probe, exploit
From address2 to address3    4     probe, exploit, DDoS
                             5     probe, exploit, DDoS
                             6     probe, exploit, DDoS
From address4 to address5    7     probe

A variety of candidate sequential pattern mining algorithms are eligible for our purpose.
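To make the Table 1 example concrete, here is a minimal Python sketch. It mines each address group with a simplified, SPAM-style depth-first miner (per-item vertical bitmaps and S-step extensions only, which suffices here because every element of an alert sequence is a single alert; SPAM [7] itself also supports itemset extensions), keeps the longest frequent pattern of each group, and chains the resulting lspatterns by destination-to-source matching plus the stage heuristic of Algorithm 4, in a simplified single pass. The IP addresses standing in for address1 to address5, and the names `mine`, `s_step`, `support`, and `GROUPS`, are hypothetical; this is an illustration under those assumptions, not the authors' implementation.

```python
import ipaddress
from typing import Dict, List, Tuple

STAGE = {"probe": 1, "exploit": 2, "DDoS": 3}

# Hypothetical IPv4 stand-ins for address1..address5 in Table 1.
A1, A2, A3, A4, A5 = "10.0.0.1", "172.16.115.20", "131.84.1.31", "10.9.9.9", "172.16.112.50"

# Stage-mapped alert sequences grouped by (source, destination), as in Table 1.
GROUPS: Dict[Tuple[str, str], List[List[str]]] = {
    (A1, A2): [["probe", "exploit"]] * 3,          # sids 1-3
    (A2, A3): [["probe", "exploit", "DDoS"]] * 3,  # sids 4-6
    (A4, A5): [["probe"]],                         # sid 7
}


def s_step(bits: int, length: int) -> int:
    """S-step transform: keep only positions strictly after the first set bit."""
    if bits == 0:
        return 0
    first = (bits & -bits).bit_length() - 1
    return ((1 << length) - 1) & ~((1 << (first + 1)) - 1)


def support(bits: List[int]) -> int:
    """Number of sequences in which the pattern occurs (non-empty bitmap)."""
    return sum(1 for b in bits if b)


def mine(db: List[List[str]], minsup: int) -> Dict[Tuple[str, ...], int]:
    """Depth-first mining over per-item vertical bitmaps (S-steps only)."""
    lengths = [len(s) for s in db]
    bitmap: Dict[str, List[int]] = {}
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            bitmap.setdefault(item, [0] * len(db))[sid] |= 1 << pos
    items = sorted(x for x in bitmap if support(bitmap[x]) >= minsup)
    found: Dict[Tuple[str, ...], int] = {}

    def dfs(pattern: Tuple[str, ...], bits: List[int]) -> None:
        found[pattern] = support(bits)
        for x in items:
            ext = [s_step(b, n) & xb for b, n, xb in zip(bits, lengths, bitmap[x])]
            if support(ext) >= minsup:
                dfs(pattern + (x,), ext)

    for x in items:
        dfs((x,), bitmap[x])
    return found


def match(ip1: str, ip2: str, l: int) -> bool:
    return ipaddress.ip_address(ip2) in ipaddress.ip_network(f"{ip1}/{l}", strict=False)


# Keep the longest frequent pattern (lspattern) of each group.
lspatterns = []
for (src, dst), db in GROUPS.items():
    patterns = mine(db, minsup=2)
    if patterns:  # sid 7 has support 1 and yields no pattern
        lspatterns.append((max(patterns, key=len), src, dst))

# Chain lspatterns: the current destination must match the next source, and the
# current lspattern must end in stage 2 or 3 (simplified single pass).
scenario = [lspatterns[0]]
for lsp, src, dst in lspatterns[1:]:
    last_lsp, _, last_dst = scenario[-1]
    if match(last_dst, src, 32) and STAGE[last_lsp[-1]] >= 2:
        scenario.append((lsp, src, dst))

fused = sum(len(seq) for _, src, dst in scenario for seq in GROUPS[(src, dst)])
print([lsp for lsp, _, _ in scenario], fused)
# [('probe', 'exploit'), ('probe', 'exploit', 'DDoS')] 15
```

With minsup = 2, the lone probe of sequence 7 falls below the threshold, so only two lspatterns survive, and the fused scenario covers the alerts of sequences 1 to 6.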
Agrawal and Srikant introduced three algorithms, AprioriAll, AprioriSome, and DynamicSome, in [4]. Based on the Apriori property proposed in association rule mining [8], these algorithms make multiple passes over the database and use a candidate-generation-and-test strategy to mine sequential patterns. Agrawal and Srikant later added time constraints, sliding time windows, and taxonomies to sequential patterns and proposed another algorithm, Generalized Sequential Patterns (GSP) [5]. Pei et al. proposed a novel algorithm named PrefixSpan [6], which recursively mines prefix-projected databases for sequential patterns. As a pattern-growth method, PrefixSpan greatly reduces the cost of the candidate subsequence generation performed by Apriori-based approaches. In more recent work, Ayres et al. proposed SPAM [7], which stores the sequences in a vertical bitmap representation and uses a depth-first traversal strategy to mine sequential patterns. We adopt the SPAM algorithm in our research.

3.2 Data source and approach settings

In our research, we use the DARPA 2000 benchmark repository [9]. It consists of two intrusion scenarios, LLDDOS1.0 and LLDDOS2.0.2, each of which contains DDoS traffic sniffed on both the inside network and the demilitarized zone (DMZ).

In generating alerts, we run snort with the -r option so that it reads data directly from the tcpdump files. In addition, snort is configured according to the network settings of the DARPA 2000 evaluation. Among the many features of the alerts fired by snort, we preserve only a few essential ones: time stamp, source and destination addresses, source and destination ports, alert type, and alert category.

To measure the proximity of two IP addresses, we adopt a variable l, called the address prefix length, which denotes the maximum number of 1 bits in an IPv4 subnet mask under which the two addresses still fall into the same subnet, i.e., the length of their common leading bit prefix. For instance, for two class B addresses whose third octets differ in the highest-order bit, the l value is 16. The matching process is implemented in a function match(ip1, ip2, l), which returns true if the two addresses agree on their first l bits. We also add time constraints to the sequential patterns by specifying the maximum time gap t allowed between adjacent alerts in a sequence. Another parameter to tune is the support threshold s for mining sequential patterns. A very small threshold admits a large number of uninteresting sequential patterns, while an overly large value tends to eliminate many useful sequences by mistake.

3.3 Correlation algorithms

Our correlation approach is formally described by the following four algorithms, in which Algorithm 1 is the main routine and the other three are invoked by it to carry out alert correlation.

Algorithm 1 corralerts: Correlate raw alerts and detect DDoS attacks through the generated DDoS scenarios
Input: alertset containing m raw alerts, address prefix length l, time constraint t, support threshold s
Output: correlated alerts, DDoS scenarios
1: sequenceset ← gensequence(alertset, t, l)
2: lspset ← minepattern(sequenceset, s)
3: scenarios ← corrspattern(lspset, l)
4: output DDoS scenarios and the alerts after correlation

Algorithm 2 describes the alert sequence construction process.
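Algorithms 2 and 4 rely on the address predicate match(ip1, ip2, l), and Algorithm 2 also relies on the time constraint t. Both can be sketched in Python with the standard `ipaddress` and `datetime` modules. This sketch assumes IPv4 addresses given as dotted-quad strings and t expressed in minutes; `common_prefix_length` and `within_gap` are hypothetical helper names, and the example addresses are made up for illustration rather than taken from the paper.

```python
import ipaddress
from datetime import datetime, timedelta


def match(ip1: str, ip2: str, l: int) -> bool:
    """True if ip2 lies in the /l network containing ip1, i.e. the two IPv4
    addresses agree on their first l bits."""
    network = ipaddress.ip_network(f"{ip1}/{l}", strict=False)
    return ipaddress.ip_address(ip2) in network


def common_prefix_length(ip1: str, ip2: str) -> int:
    """Largest l for which match(ip1, ip2, l) holds (hypothetical helper)."""
    diff = int(ipaddress.IPv4Address(ip1)) ^ int(ipaddress.IPv4Address(ip2))
    return 32 - diff.bit_length()


def within_gap(earlier: datetime, later: datetime, t_minutes: int) -> bool:
    """Time constraint: `later` occurs at most t minutes after `earlier`."""
    return timedelta(0) <= later - earlier <= timedelta(minutes=t_minutes)


# Made-up class B addresses whose third octets (112 and 200) differ in the top bit.
a, b = "172.16.112.10", "172.16.200.5"
print(common_prefix_length(a, b), match(a, b, 16), match(a, b, 24))  # 16 True False
```

`common_prefix_length` returns the pairwise quantity that l describes, and `match(ip1, ip2, l)` holds exactly when that prefix is at least l.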
In real-world DDoS attacks, the final DDoS stage usually consists of packets from many spoofed addresses to a variety of ports on the victims. As a result, simply matching addresses does not work for alerts of this stage; we overcome this difficulty by examining the stage information when adding alerts to a sequence. The core of the algorithm consists of three nested loops: the first two determine an alert group among hosts with matching addresses, and the third extracts alerts in successive attack stages from this group to build alert sequences.

Algorithm 2 gensequence: Construct alert sequences
Input: alertset containing m alerts, t, l
Output: sequenceset
1: sequence, seqarray ← φ
2: k = 0
3: for al1 = alert1 to alertm do
4:   if (al1 is not marked)
5:     for al2 = al1 to alertm do
6:       if ((al2 is not marked) ∧ match(al1.srcIP, al2.srcIP, l) ∧ match(al1.dstIP, al2.dstIP, l))
7:         curalert = al2
8:         sequence ← sequence ∪ al2
9:         mark al2
10:        for al3 = al2 + 1 to alertm do
11:          if ((al3 is not marked) ∧ (al3.time − curalert.time ≤ t))
12:            flag = false
13:            if (match(al2.srcIP, al3.srcIP, l) ∧ match(al2.dstIP, al3.dstIP, l) ∧ al3.stage == curalert.stage + 1)
14:              flag = true
15:            else if (curalert.stage == 2 ∧ al3.stage == 3)
16:              flag = true
17:            endif
18:            if (flag)
19:              curalert = al3
20:              sequence ← sequence ∪ al3
21:              mark al3
22:            endif
23:          endif
24:        endfor
25:        seqarray ← seqarray ∪ sequence
26:        sequence ← φ
27:      endif
28:    endfor
29:    sequenceset[k++] ← seqarray
30:    seqarray ← φ
31:  endif
32: endfor
33: output sequenceset

Algorithm 3 finds the sequential alert patterns in the sequence groups created by Algorithm 2. Alerts in each mined pattern are deemed strongly related and are thus correlated. The longest sequential pattern is kept to represent all the sequential patterns in a group. Normal activities of a legitimate user may occasionally exhibit similar patterns and be incorrectly captured into alert sequences.
However, we argue that such sequences are unlikely to be frequent and consequently, these alerts are excluded from further analysis by the mining algorithm.
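A minimal Python sketch of Algorithm 2 follows. It assumes each alert carries a numeric timestamp in minutes and a stage already mapped to 1 (probe), 2 (exploit), or 3 (DDoS), and that alerts are processed in time order; the `Alert` class is hypothetical, and `match` is reimplemented here so the block is self-contained. The loops mirror lines 3 to 32: the two outer loops pick a seed alert and the alerts whose addresses match it, and the inner loop appends later alerts of the next stage within the time gap t, letting any DDoS-stage alert follow an exploit regardless of its (possibly spoofed) addresses.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    time: float  # minutes since the start of the trace (assumed unit)
    src: str
    dst: str
    stage: int   # 1 = probe, 2 = exploit, 3 = DDoS


def match(ip1: str, ip2: str, l: int) -> bool:
    return ipaddress.ip_address(ip2) in ipaddress.ip_network(f"{ip1}/{l}", strict=False)


def gensequence(alerts: List[Alert], t: float, l: int) -> List[List[List[Alert]]]:
    """Returns one group (a list of alert sequences) per unmarked seed alert al1."""
    alerts = sorted(alerts, key=lambda a: a.time)  # assumes a time-ordered stream
    marked = [False] * len(alerts)
    sequenceset: List[List[List[Alert]]] = []
    for i, al1 in enumerate(alerts):
        if marked[i]:
            continue
        seqarray: List[List[Alert]] = []
        for j in range(i, len(alerts)):
            al2 = alerts[j]
            if marked[j] or not (match(al1.src, al2.src, l) and match(al1.dst, al2.dst, l)):
                continue
            cur, sequence = al2, [al2]  # al2 starts a new sequence
            marked[j] = True
            for k in range(j + 1, len(alerts)):
                al3 = alerts[k]
                if marked[k] or al3.time - cur.time > t:
                    continue
                same_hosts = match(al2.src, al3.src, l) and match(al2.dst, al3.dst, l)
                next_stage = same_hosts and al3.stage == cur.stage + 1
                # DDoS packets often carry spoofed addresses, so an exploit may be
                # followed by a DDoS-stage alert from any host.
                if next_stage or (cur.stage == 2 and al3.stage == 3):
                    cur = al3
                    sequence.append(al3)
                    marked[k] = True
            seqarray.append(sequence)
        sequenceset.append(seqarray)
    return sequenceset


s1, s2 = "202.77.162.213", "172.16.115.20"  # made-up attacker and victim
alerts = [
    Alert(0, s1, s2, 1),                      # probe
    Alert(10, s1, s2, 2),                     # exploit
    Alert(20, "66.1.2.3", "131.84.1.31", 3),  # DDoS alert from a spoofed source
    Alert(500, s1, s2, 1),                    # a much later probe
]
groups = gensequence(alerts, t=100, l=24)
print([[a.stage for a in seq] for seq in groups[0]])  # [[1, 2, 3], [1]]
```

The spoofed DDoS alert is still attached to the exploit, which is the behavior lines 15 and 16 provide, while the late probe exceeds the time gap and starts a new sequence in the same group.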

Algorithm 3 minepattern: Mine sequential alert patterns
Input: sequenceset (p alert groups), support s
Output: longest sequential alert pattern set lspset
1: lspset ← φ
2: for i = 0 to p − 1 do
3:   sp ← SPAM(sequenceset[i], s)
4:   if (sp ≠ φ)
5:     correlate all alerts covered by sp
6:     lsp = the longest sequential alert pattern in sp
7:     lsp.srcip ← the srcip of the first alert in lsp
8:     lsp.dstip ← the dstip of the last alert in lsp
9:     lspset ← lspset ∪ lsp
10:  endif
11: endfor
12: output lspset

Algorithm 4 corrspattern: Generate DDoS scenarios
Input: longest sequential pattern set lspset (containing n lspatterns), address prefix length l
Output: DDoS scenarios
1: corrlist, scenariolist ← φ
2: for lsp1 = lspattern1 to lspatternn do
3:   if (lsp1 is not marked)
4:     currentlsp ← lsp1
5:     corrlist.addtotail(lsp1)
6:     mark lsp1
7:     for lsp2 = lsp1 to lspatternn do
8:       if ((lsp2 is not marked) ∧ match(currentlsp.dstip, lsp2.srcip, l) ∧ currentlsp.lastalert.stage ≥ 2)
9:         currentlsp ← lsp2
10:        corrlist.addtotail(lsp2)
11:        mark lsp2
12:      endif
13:    endfor
14:    scenario ← fuse the lspatterns in corrlist
15:    scenariolist ← scenariolist ∪ scenario
16:    corrlist ← φ
17:  endif
18: endfor
19: output DDoS scenarios in scenariolist

In Algorithm 4, DDoS scenarios are produced from the sequential alert patterns obtained. In particular, during correlation we require an lspattern to end with a stage-2 or stage-3 alert before another lspattern can be chained to it. With this heuristic, we prevent alerts such as the one in sequence 7 (Table 1) from being fused.

3.4 Parameter optimization

In our research, we assign three distinct values, 16, 24, and 32, to the address prefix length l. Another vital parameter is the time gap t, which can take values from {20, 50, 100} (minutes); we consider these sufficient for capturing highly related alerts. For the support s, we pick three values: 2, 4, and 8. Thus there are 3 × 3 × 3 = 27 parameter combinations in total. To optimize the parameters, we use the inside tcpdump file from LLDDOS1.0 as the training data and run our algorithm on it 27 times.
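The 27-run search can be sketched as an exhaustive grid search in Python. This sketch assumes that the correlation ratio of Eq. (1) reads C_r = (N - N_c) / (n - n_c), i.e. true alerts removed per false alert removed; `run` is a hypothetical callback standing in for one execution of Algorithm 1 on the training data, and the names `GRID` and `best_parameters` are likewise illustrative.

```python
from itertools import product
from typing import Callable, List, Tuple

# Candidate values from Section 3.4: support s, prefix length l, time gap t (minutes).
S_VALUES, L_VALUES, T_VALUES = (2, 4, 8), (16, 24, 32), (20, 50, 100)
GRID: List[Tuple[int, int, int]] = list(product(S_VALUES, L_VALUES, T_VALUES))


def correlation_ratio(N: int, N_c: int, n: int, n_c: int) -> float:
    """Assumed form of Eq. (1): true alerts removed per false alert removed."""
    removed_false = n - n_c
    if removed_false == 0:
        return float("inf")  # arbitrary choice for the degenerate case
    return (N - N_c) / removed_false


def best_parameters(run: Callable[[int, int, int], Tuple[int, int, int, int]]) -> Tuple[int, int, int]:
    """`run(s, l, t)` (hypothetical) correlates the training alerts and returns
    (N, N_c, n, n_c); the combination with the largest C_r wins."""
    return max(GRID, key=lambda p: correlation_ratio(*run(*p)))


print(len(GRID))  # 27
```

Returning infinity when no false alert is removed is only one possible convention; the paper does not say how that case is handled. Ties are resolved in favor of the earliest combination in `GRID`.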
The criterion by which we gauge the quality of a parameter combination is the correlation ratio, defined by

$$C_r = \frac{N - N_c}{n - n_c} \qquad (1)$$

where $N$ ($n$) is the number of true (false) alerts before correlation, and $N_c$ ($n_c$) is the number of true (false) alerts after running our algorithms. The optimal parameters are those that yield the largest $C_r$ over the training data, and we use them when running our algorithms on the testing sets. Moreover, the mechanisms that map an alert to its intrusion stage differ between training and testing: for training, we manually assign each alert a stage label, whereas for testing, a script assigns stages automatically according to the alert type.

4. Evaluation

To evaluate our approach, we compare it with the hidden Markov model in identifying DDoS alert sequences. In this section, we first briefly introduce HMM and explain its use in multi-stage intrusion detection. We then report the experimental results achieved by the two methods.

4.1 Hidden Markov model

The hidden Markov model is a stochastic process consisting of a finite set of states $\{S_i\}$ ($1 \le i \le N$), each of which can emit observables $O = \{o_1, o_2, \ldots, o_M\}$ according to an associated probability distribution $b_j(k)$ ($1 \le j \le N$, $1 \le k \le M$). Transitions among the states are governed by a set of transition probabilities $\{a_{ij}\}$ ($1 \le i, j \le N$). An HMM is thus a doubly embedded stochastic process with two hierarchical levels. HMMs have been applied extensively in a variety of areas, such as speech recognition and intrusion detection [10]. Several properties of multi-stage DDoS alert sequences match the HMM representation quite well. In an HMM, each state depends probabilistically only on the previous state, much as in a DDoS scenario each stage often builds on the outcome of the previous one.
Moreover, an alert corresponds to an observable in the HMM, while the intrusion stage of the alert can be modeled as a hidden state. An intruder has many ways to accomplish each attack step, just as each HMM state can produce many observables.
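Because the stage of every training alert is known, the HMM parameters can be estimated by counting, and a sequence can then be scored with the Viterbi recursion of Eqs. (2) and (3). The Python sketch below treats stages 1 to 3 as states and alert types as observables. The discount d = 0.5, the particular form of absolute discounting (interpolated with a uniform distribution), the initial distribution π used in δ_1(j) = π_j b_j(o_1) (the standard initialization in Rabiner [11]), the alert-type names, and the decision threshold are all assumptions, since the paper does not specify them.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def discounted(counts: Counter, vocab: Sequence, d: float) -> Dict:
    """Absolute discounting interpolated with a uniform distribution:
    P(x) = max(c(x) - d, 0) / C + (d * seen / C) / |vocab|, for 0 < d <= 1."""
    total = sum(counts[x] for x in vocab)
    if total == 0:
        return {x: 1.0 / len(vocab) for x in vocab}
    seen = sum(1 for x in vocab if counts[x] > 0)
    spare = d * seen / total
    return {x: max(counts[x] - d, 0.0) / total + spare / len(vocab) for x in vocab}


def train(labelled: List[List[Tuple[int, str]]], states, symbols, d: float = 0.5):
    """Supervised estimation from sequences of (stage, alert type) pairs."""
    start = Counter(seq[0][0] for seq in labelled)
    trans = {s: Counter() for s in states}
    emit = {s: Counter() for s in states}
    for seq in labelled:
        for stage, alert_type in seq:
            emit[stage][alert_type] += 1
        for (s1, _), (s2, _) in zip(seq, seq[1:]):
            trans[s1][s2] += 1
    pi = discounted(start, states, d)
    A = {s: discounted(trans[s], states, d) for s in states}
    B = {s: discounted(emit[s], symbols, d) for s in states}
    return pi, A, B


def viterbi(obs: Sequence[str], states, pi, A, B) -> Tuple[float, List[int]]:
    """delta_1(j) = pi_j b_j(o_1); delta_{t+1}(k) = [max_j delta_t(j) a_jk] b_k(o_{t+1});
    the best state sequence is recovered by backtracking the maximizing j."""
    delta = {j: pi[j] * B[j][obs[0]] for j in states}
    back = []
    for o in obs[1:]:
        ptr = {k: max(states, key=lambda j: delta[j] * A[j][k]) for k in states}
        delta = {k: delta[ptr[k]] * A[ptr[k]][k] * B[k][o] for k in states}
        back.append(ptr)
    last = max(states, key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return delta[last], path[::-1]


STATES = [1, 2, 3]                     # probe, exploit, DDoS
SYMBOLS = ["ping", "sadmind", "dos"]   # hypothetical alert types
training = [[(1, "ping"), (2, "sadmind"), (3, "dos")], [(1, "ping"), (2, "sadmind")]]
pi, A, B = train(training, STATES, SYMBOLS)
prob, path = viterbi(["ping", "sadmind", "dos"], STATES, pi, A, B)
THRESHOLD = 0.01                       # hypothetical decision threshold
print(path, prob >= THRESHOLD)         # [1, 2, 3] True
```

Long alert sequences should be scored in log space to avoid floating-point underflow; plain products are kept here for readability.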

Figure 1. A hidden Markov model for DDoS attack detection.

An HMM for DDoS detection is illustrated in Figure 1. In our research, we also use the inside tcpdump file from LLDDOS1.0 as the training data. Specifically, we train the HMM on the alert sequences constructed by Algorithm 2 (Section 3.3). We train the HMM with a supervised learning method rather than the popular Baum-Welch method [11], and, to combat the skewed distribution problem, we apply absolute discounting smoothing during training. Given a trained model, we determine the Viterbi path of an observation sequence [11], i.e., its most likely state sequence, together with the probability of that path. Suppose we need to find the best state sequence $Q = \{q_1 q_2 \cdots q_T\}$ for an observation sequence $O = \{o_1 o_2 \cdots o_T\}$. For this purpose, we define

$$\delta_t(j) = \max_{q_1, q_2, \ldots, q_{t-1}} P[q_1 q_2 \cdots q_t = j,\ o_1 o_2 \cdots o_t \mid \lambda] \qquad (2)$$

where $\lambda$ denotes the model. $\delta_t(j)$ is the highest probability, at time $t$, along a single state sequence that accounts for the first $t$ observations and ends in state $S_j$. It can be computed by induction,

$$\delta_{t+1}(k) = \left[\max_{j} \delta_t(j)\, a_{jk}\right] b_k(o_{t+1}) \qquad (3)$$

and the best state sequence is recovered by backtracking the maximizing arguments. If the probability computed by the Viterbi algorithm is no less than a threshold, the alert sequence is regarded as part of a DDoS attack, and the rest of our algorithms then proceed. Thus, the difference between our correlation approach and HMM lies mainly in the DDoS alert sequence identification step, line 3 of Algorithm 3.

4.2 Experimental results

Section 3.4 describes how the parameters (s, l, t) of our correlation approach are optimized through training. The training results are shown in Table 2.

Table 2. Training results for parameter optimization.
Parameters (s, l)   C_r (t = 20)   C_r (t = 50)   C_r (t = 100)
(2, 16)
(2, 24)
(2, 32)
(4, 16)
(4, 24)
(4, 32)
(8, 16)
(8, 24)
(8, 32)

The winning parameter values are 2, 16, and 100 for s, l, and t, respectively, which yield the maximum correlation ratio $C_r = 4.73$ over the training data. Table 3 summarizes the training and testing performance of our approach and HMM. As shown in Table 3, on the training set our approach reduced the number of true alerts by 298 and HMM by 295, i.e., alert reduction rates of 73.6% and 72.8%, respectively. On the LLDDOS1.0 DMZ data set, our approach correlated almost all the true alerts (a 99.8% reduction rate) and far outperformed HMM, which achieved 93.1%. In this case, our method output only one alert indicating the existence of a DDoS attack. Note that both models performed better on this data set than on the training set. We think two facts may explain this discrepancy. First, the DMZ and Inside data come from the same scenario and thus have very similar intrusive patterns. Second, snort emitted far more true alerts on the LLDDOS1.0 DMZ data (996) than on the LLDDOS1.0 Inside data (405). For the more sophisticated LLDDOS2.0.2 scenario, the performance of both models degraded to some extent. Snort generated only 6 true alerts on the DMZ tcpdump data, and both methods correlated all of them, attaining the same reduction rate of 83.3%.
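The reduction rates quoted above are simply the percentage of true alerts removed by correlation. A quick arithmetic check in Python follows; the 5 removed alerts for the LLDDOS2.0.2 DMZ set is an inference (six alerts fused into one), not a figure stated in the paper, and `reduction_rate` is an illustrative name.

```python
def reduction_rate(reduced: int, total: int) -> float:
    """Percentage of true alerts removed by correlation."""
    return 100.0 * reduced / total


# LLDDOS1.0 Inside (training set): 405 true alerts.
ours = reduction_rate(298, 405)
hmm = reduction_rate(295, 405)
# LLDDOS2.0.2 DMZ: 6 true alerts fused into one, i.e. 5 removed (inferred).
dmz2 = reduction_rate(5, 6)
print(f"{ours:.1f} {hmm:.1f} {dmz2:.1f}")  # 73.6 72.8 83.3
```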
For the Inside part, our correlation approach removed 217 of the 420 true alerts, slightly more than the 214 removed by HMM. Both methods thus more than halved the number of true alerts. Note that the false alerts that were correlated were also deemed part of a DDoS scenario and hence introduced errors into both correlation methods. However, this error largely originates in the intrusion detection system itself and is therefore unavoidable.

Table 3. Correlation performance.
                                          LLDDOS1.0                       LLDDOS2.0.2
                                          DMZ             Inside          DMZ             Inside
#Alerts generated by snort (true/false)   996/            405/129         6/45            420/37
Method                                    Ours    HMM     Ours    HMM     Ours    HMM     Ours    HMM
#True alerts reduced                                      298     295                     217     214
#False alerts reduced
Reduction rate (%)                        99.8    93.1    73.6    72.8    83.3    83.3    51.7    51

5. Conclusion

In this paper, we proposed an alert correlation approach based on sequential pattern mining techniques and gave a formal description of our method via four algorithms. Our approach has a twofold goal: to reduce the number of alerts and to identify the DDoS attack type. Our motivation is that network administrators often struggle to cope with the large number of alerts generated by an IDS. The sequential pattern mining approach proved robust: although many irrelevant alerts can be frequent, they are unlikely to form sequential alert patterns along the track of a DDoS attack and are hence excluded from further correlation. By targeting frequently co-occurring alerts, which are a good indicator of DDoS attacks, our approach has great potential for improving the accuracy of multi-stage intrusion detection. Experimental comparisons with the hidden Markov model showed that our method was slightly better than HMM in terms of DDoS alert sequence identification and alert reduction. Our research is still preliminary, mainly in that the parameter constraints added to the sequential pattern mining algorithm are simple. In future research, we would like to explore more sophisticated constraints that incorporate more of the inherent characteristics of the alert sequences.
We are also considering a new criterion for selecting sequential alert patterns that goes beyond simple support counting.

Acknowledgements

This research was partially funded by the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No and the National Natural Science Foundation of China under Grant No

References

[1] A. Valdes and K. Skinner. Probabilistic alert correlation. In Proceedings of the Fourth International Symposium on Recent Advances in Intrusion Detection, pages 54-68.
[2] O. M. Dain and R. K. Cunningham. Fusing a heterogeneous alert stream into scenarios. In Proceedings of the 2001 ACM Workshop on Data Mining for Security Applications, pages 1-13.
[3] P. Ning, Y. Cui, and D. S. Reeves. Constructing attack scenarios through correlation of intrusion alerts. In Proceedings of the 9th ACM Conference on Computer and Communications Security.
[4] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering (ICDE), pages 3-14.
[5] R. Agrawal and R. Srikant. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the Fifth International Conference on Extending Database Technology (EDBT), volume 1057, pages 3-17.
[6] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the 17th International Conference on Data Engineering (ICDE).
[7] J. Ayres, J. Gehrke, T. Yiu, and J. Flannick. Sequential pattern mining using a bitmap representation. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[8] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB).
[9] J. W. Haines et al. DARPA intrusion detection evaluation.
[10] D. Ourston, S. Matzner, W. Stump, and B. Hopkins. Applications of Hidden Markov Models to detecting multi-stage network attacks. In Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS), page 334b.
[11] L. R. Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE, volume 77(2), 1989.
