Correlating Alerts with a Data Mining Based Approach

Guang Xiang, Xiaomei Dong, Ge Yu
School of Information Science and Engineering, Northeastern University, China
xguang80@163.com

Abstract

In monitoring anomalous network activities, intrusion detection systems tend to generate a large number of alerts, which greatly increase the workload of post-detection analysis and decision-making. In this paper, we propose a correlation approach based on sequential pattern mining techniques to fuse related alerts for Distributed Denial of Service (DDoS) attacks. By mining the alert sequences and iteratively consolidating the matching sequential alert patterns, our approach greatly reduces the number of related alerts and identifies their DDoS membership. The alert reduction and fusing mechanisms let us work at a higher level of abstraction. This saves much of the effort otherwise spent analyzing a large volume of trivial raw alerts. We compare our method experimentally with the hidden Markov model (HMM), a powerful stochastic model for sequence analysis. The results show that our algorithm is slightly better than HMM in terms of DDoS alert sequence identification.

1. Introduction

Extensive deployment of intrusion detection systems (IDSs) helps boost cyberspace immunity to potential attacks. However, it also complicates network management by generating a large number of alerts to be processed. One particularly effective network intrusion is the Distributed Denial of Service (DDoS) attack. During a DDoS attack, the adversary usually attacks the chosen victims indirectly. Specifically, the adversary first probes a network to locate live hosts (phase 1). After determining which hosts are alive, the adversary breaks into some of them and installs the relevant tools (phase 2). The adversary then launches the DDoS attack from these compromised hosts (phase 3), which serve as the springboard of the attack.
An attacker can make a DDoS attack more sophisticated by repeating phases 1 and 2 and launching the attack from the last victim in the chain of exploited machines. We will call this stealthier variant DDoS2. In this case, tracing the true source through the alerts is infeasible, and an IDS tends to raise a large volume of false alerts. Hence, irrelevant alerts should be discarded and related ones fused before they are presented to administrators. This facilitates alert processing and ultimately improves detection performance.

In this paper, we propose such an alert correlation approach based on sequential pattern mining techniques. Our method is justified in part by the observation that an attacker typically repeats each step of the intrusion many times to ensure success. Our goal is twofold: to correlate related alerts into higher-level scenarios and to identify DDoS attacks through these scenarios. For evaluation, we conducted a series of experiments on the DARPA 2000 benchmark data corpus, comparing our method with the hidden Markov model. Experimental results show that our approach performs better than HMM in terms of DDoS alert sequence identification and alert reduction.

The remainder of this paper is organized as follows. Section 2 gives a brief survey of related work. Section 3 summarizes sequential pattern mining algorithms and presents our correlation approach. In Section 4, we compare our method with HMM in identifying DDoS sequences and correlating alerts, and we report the experimental results. Finally, Section 5 concludes the paper.

2. Related work

To the best of our knowledge, relatively little research has been done in the field of alert correlation. Among probabilistic approaches, Valdes and Skinner [1] proposed a unified mathematical framework for alert correlation.
They introduce a set of similarity functions suited to observations in the intrusion detection domain. A new alert is merged with the best-matching meta-alert by computing its similarity to the existing meta-alerts. On the data mining front, Dain and Cunningham [2] presented an algorithm capable of fusing multiple heterogeneous alerts into scenarios. The algorithm computes the probability that a new alert belongs to each existing scenario and adds the alert to the most likely candidate. Three probability estimation methods were evaluated, and the data mining approach was found to outperform the naive and heuristic techniques. In another approach to alert correlation, Ning et al. [3] specified the prerequisites and consequences of intrusions. They developed a formal framework that correlates related hyper-alerts by matching the consequences of earlier alerts with the prerequisites of later ones.

3. Alert correlation

In this section, we first present some preliminaries on sequential pattern mining, followed by a detailed description of our correlation approach. The definitions and notation used in this section follow the classic works [4, 5, 6, 7].

3.1 Preliminaries

Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items. A subset $X \subseteq I$ is called an itemset. A sequence is an ordered list of itemsets, denoted by $\langle s_1 s_2 \ldots s_k \rangle$, where each $s_j$ ($1 \le j \le k$) is an itemset. A sequence database $S$ is a set of tuples $\langle sid, s \rangle$, where $sid$ is the sequence ID and $s$ is a sequence.

The support of a sequence $s$ in $S$, denoted by $sup(s)$, is the number of tuples $\langle sid, s' \rangle \in S$ whose sequence $s'$ contains $s$. Here $s'$ contains $s = \langle s_1 \ldots s_k \rangle$ if there exist indices $i_1 < i_2 < \cdots < i_k$ such that $s_j \subseteq s'_{i_j}$ for every $j$. For a given support threshold $minsup$, a sequence $s$ is called a frequent sequence, or sequential pattern, if $sup(s) \ge minsup$. Given a sequence database $S$ and a support threshold $minsup$, the aim of sequential pattern mining is to discover all sequential patterns.

Sequential patterns closely resemble DDoS attacks: both consist of a set of logically related yet temporally separated steps. The effectiveness of our approach derives in part from this inherent resemblance. During a DDoS attack, an IDS generates alerts on the activities it regards as anomalous. Many of the detected attacks aim to accomplish the same result. We therefore map every alert to one of three categories, or stages (probe, exploit, DDoS), according to the alert type assigned to it by the IDS.
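To make the support definition concrete, here is a minimal Python sketch. It assumes each itemset is a frozenset of alert-stage labels. The helper names (`seq`, `contains`, `support`, `is_sequential_pattern`) and the `TABLE1` database built from the sequences of Table 1 are illustrative, not part of the paper. Containment uses the standard subsequence test given in the definition.

```python
from typing import Dict, FrozenSet, List, Sequence

Itemset = FrozenSet[str]

# Illustrative helpers (not from the paper).


def seq(*items: str) -> List[Itemset]:
    """Build a sequence of single-item itemsets, e.g. <probe, exploit>."""
    return [frozenset([item]) for item in items]


def contains(s: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """True if `pattern` is a subsequence of `s`: each itemset of the pattern
    is a subset of a distinct itemset of `s`, in the same order."""
    pos = 0
    for itemset in pattern:
        # Greedily use the earliest remaining itemset of s that covers this one.
        while pos < len(s) and not itemset <= s[pos]:
            pos += 1
        if pos == len(s):
            return False
        pos += 1
    return True


def support(db: Dict[int, List[Itemset]], pattern: Sequence[Itemset]) -> int:
    """sup(pattern): number of tuples <sid, s'> in db whose s' contains pattern."""
    return sum(1 for s in db.values() if contains(s, pattern))


def is_sequential_pattern(db, pattern, minsup: int) -> bool:
    return support(db, pattern) >= minsup


# The seven sequences of Table 1, keyed by sid.
TABLE1 = {sid: seq("probe", "exploit") for sid in (1, 2, 3)}
TABLE1.update({sid: seq("probe", "exploit", "DDoS") for sid in (4, 5, 6)})
TABLE1[7] = seq("probe")

print(support(TABLE1, seq("probe", "exploit")))          # 6
print(support(TABLE1, seq("probe", "exploit", "DDoS")))  # 3
print(support(TABLE1, seq("exploit", "probe")))          # 0
```

Counted over the whole database, even <probe> is frequent. The approach instead mines each address group separately, which is why sequence 7 is discarded in its own group.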
Next, we aggregate the alerts exchanged between hosts with matching addresses, in the temporal order in which they are generated, and build them into a set of alert sequences. Alerts in a sequence are arranged in ascending order of stage. An appropriate sequential pattern mining algorithm is then run on the sequences, and the alerts contained in each mined sequential pattern are correlated into a single alert. Subsequently, the longest sequential pattern (lspattern) is selected to represent all the sequential patterns mined in a group. The source address of an lspattern is the source address of its first alert, and its destination address is the destination address of its last alert. We then generate DDoS scenarios by iteratively matching the destination address of one lspattern with the source address of another. After these steps, many trivial raw alerts have been fused into only a few, and the correlated alerts are further classified as belonging to the DDoS attack type.

Table 1. Examples of alert sequences.

Addresses                    sid   Alert sequence
From address1 to address2    1     <probe, exploit>
                             2     <probe, exploit>
                             3     <probe, exploit>
From address2 to address3    4     <probe, exploit, DDoS>
                             5     <probe, exploit, DDoS>
                             6     <probe, exploit, DDoS>
From address4 to address5    7     <probe>

Table 1 shows several examples of alert sequences. Each item in a sequence is an alert after stage mapping, and our algorithm constructs the sequences from the raw alert stream. Mining sequential patterns on this sequence set with the support threshold set to 2 eliminates the infrequent alert sequence 7. The two lspatterns obtained are <probe, exploit> and <probe, exploit, DDoS>. They satisfy the fusing criterion and are therefore correlated together. Hence, sixteen alerts are reduced to one, and the alerts in sequences 1 to 6 are identified as part of a DDoS attack. A variety of candidate sequential pattern mining algorithms are suitable for our purpose.
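The Table 1 walk-through can be reproduced with the Python sketch below. It is a simplified stand-in, not the authors' implementation, and makes these assumptions:

- A brute-force enumeration of subsequences replaces SPAM.
- Exact equality of the placeholder addresses replaces match(ip1, ip2, l).
- Each group's lspattern takes the group's source and destination addresses.
- Stage numbers 1 to 3 encode probe, exploit and DDoS.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Pattern = Tuple[str, ...]
LSP = Tuple[str, str, Pattern]          # (srcip, dstip, pattern)
STAGE = {"probe": 1, "exploit": 2, "DDoS": 3}

# Table 1, grouped by (source, destination) address pair.
GROUPS: Dict[Tuple[str, str], List[Pattern]] = {
    ("address1", "address2"): [("probe", "exploit")] * 3,
    ("address2", "address3"): [("probe", "exploit", "DDoS")] * 3,
    ("address4", "address5"): [("probe",)],
}


def is_subseq(pattern: Pattern, s: Pattern) -> bool:
    it = iter(s)
    return all(item in it for item in pattern)  # `in` consumes the iterator


def longest_pattern(seqs: List[Pattern], minsup: int) -> Optional[Pattern]:
    """Brute-force stand-in for SPAM: enumerate all subsequences, keep those
    with support >= minsup, and return the longest one (or None)."""
    candidates = {tuple(s[i] for i in idx)
                  for s in seqs
                  for k in range(1, len(s) + 1)
                  for idx in combinations(range(len(s)), k)}
    frequent = [p for p in candidates
                if sum(is_subseq(p, s) for s in seqs) >= minsup]
    return max(frequent, key=len) if frequent else None


def chain(lsps: List[LSP]) -> List[List[LSP]]:
    """Greedy chaining in the spirit of Algorithm 4: follow dstip -> srcip
    links, requiring the current lspattern to end with an exploit or DDoS
    alert. Simplification: exact address equality stands in for match()."""
    used = [False] * len(lsps)
    scenarios = []
    for i, first in enumerate(lsps):
        if used[i]:
            continue
        used[i] = True
        current, scenario = first, [first]
        for j in range(i + 1, len(lsps)):
            cand = lsps[j]
            if (not used[j] and current[1] == cand[0]
                    and STAGE[current[2][-1]] >= 2):
                used[j] = True
                current = cand
                scenario.append(cand)
        scenarios.append(scenario)
    return scenarios


lsps: List[LSP] = []
for (src, dst), seqs in GROUPS.items():
    lsp = longest_pattern(seqs, minsup=2)
    if lsp is not None:              # the group holding sequence 7 yields nothing
        lsps.append((src, dst, lsp))

scenarios = chain(lsps)
print(lsps)
print(len(scenarios), "scenario(s)")   # 1 scenario(s)
```

The two surviving lspatterns are linked through address2 into a single scenario. Exhaustive enumeration is only viable for sequences as short as these, which is why a real miner such as SPAM is used in practice.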
Agrawal and Srikant introduced three algorithms, AprioriAll, AprioriSome and DynamicSome, in [4]. These algorithms build on the Apriori property proposed for association rule mining [8]. They make multiple passes over the database and mine sequential patterns with a candidate-generation-and-test strategy. Agrawal and Srikant later added time constraints, sliding time windows and taxonomies to sequential patterns and proposed another algorithm, Generalized Sequential Patterns (GSP) [5]. Pei et al. proposed a novel algorithm named PrefixSpan [6], which recursively mines prefix-projected databases for sequential patterns. As a pattern-growth method, PrefixSpan greatly reduces the cost of candidate subsequence generation incurred by the Apriori-based approaches. In more recent work, Ayres et al. introduced an algorithm called SPAM [7]. It stores the sequences in a vertical bitmap representation and mines sequential patterns with a depth-first traversal strategy. We adopt the SPAM algorithm in our research.

3.2 Data source and approach settings

In our research, we use the DARPA 2000 benchmark repository [9]. It consists of two intrusion scenarios, LLDDOS1.0 and LLDDOS2.0.2. Each contains DDoS traffic sniffed on both the inside network and the demilitarized zone (DMZ).
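The vertical bitmap idea behind SPAM can be illustrated in a few lines of Python. This is a simplified sketch that assumes one item per itemset, as in the alert sequences here. Only SPAM's sequence-extension step (S-step) is shown. Item-set extensions (I-steps), SPAM's pruning and its fixed-width bitmap layout are omitted, and Python integers serve as per-sequence bitmaps.

```python
from typing import Dict, List, Tuple

# Simplified SPAM: S-steps only, one bitmap (Python int) per sequence.
Bitmaps = Dict[int, int]  # sid -> bitmap; bit p set = pattern can end at position p


def item_bitmaps(db: Dict[int, List[str]]) -> Dict[str, Bitmaps]:
    """Vertical representation: one bitmap per (item, sequence)."""
    out: Dict[str, Bitmaps] = {}
    for sid, s in db.items():
        for p, item in enumerate(s):
            per_item = out.setdefault(item, {})
            per_item[sid] = per_item.get(sid, 0) | (1 << p)
    return out


def s_step(bm: Bitmaps, lengths: Dict[int, int]) -> Bitmaps:
    """S-step transform: in each sequence, set every position strictly after
    the first set bit and clear the rest."""
    out: Bitmaps = {}
    for sid, b in bm.items():
        if b:
            lowest = b & -b                      # isolate the first set bit
            full = (1 << lengths[sid]) - 1       # all positions of this sequence
            out[sid] = full & ~((lowest << 1) - 1)
    return out


def support(bm: Bitmaps) -> int:
    return sum(1 for b in bm.values() if b)


def spam_s_only(db: Dict[int, List[str]], minsup: int) -> List[Tuple[Tuple[str, ...], int]]:
    """Depth-first mining with S-steps only (no I-steps, no pruning)."""
    lengths = {sid: len(s) for sid, s in db.items()}
    frequent_items = {i: bm for i, bm in item_bitmaps(db).items()
                      if support(bm) >= minsup}
    results: List[Tuple[Tuple[str, ...], int]] = []

    def dfs(prefix: Tuple[str, ...], bm: Bitmaps) -> None:
        results.append((prefix, support(bm)))
        shifted = s_step(bm, lengths)
        for item, ibm in frequent_items.items():
            # AND the transformed prefix bitmap with the item's bitmap.
            new = {sid: shifted[sid] & ibm[sid] for sid in shifted if sid in ibm}
            if support(new) >= minsup:
                dfs(prefix + (item,), new)

    for item, bm in frequent_items.items():
        dfs((item,), bm)
    return results


TABLE1 = {1: ["probe", "exploit"], 2: ["probe", "exploit"], 3: ["probe", "exploit"],
          4: ["probe", "exploit", "DDoS"], 5: ["probe", "exploit", "DDoS"],
          6: ["probe", "exploit", "DDoS"], 7: ["probe"]}
patterns = dict(spam_s_only(TABLE1, minsup=2))
print(patterns[("probe", "exploit", "DDoS")])   # 3
```

The appeal of the bitmap layout is that support counting reduces to a shift-like transform and a bitwise AND per sequence, with no rescanning of the raw sequences.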

Algorithm 1 corralerts: Correlate raw alerts and detect DDoS attacks through the generated DDoS scenarios
Input: alertset containing m raw alerts, address prefix length l, time constraint t, support threshold s
Output: correlated alerts, DDoS scenarios
  1: sequenceset ← gensequence(alertset, t, l)
  2: lspset ← minepattern(sequenceset, s)
  3: scenarios ← corrspattern(lspset, l)
  4: output DDoS scenarios and the alerts after correlation

To generate alerts, we run snort with the -r option so that it reads packets directly from the tcpdump files. In addition, snort is configured according to the network settings of the DARPA 2000 evaluation. Among the many features of the alerts fired by snort, we preserve only a few essential ones: time stamp, source and destination addresses, source and destination ports, alert type, and alert category.

To describe the proximity of two IP addresses, we adopt a variable l, called the address prefix length. It denotes the maximum number of 1 bits in an IPv4 subnet mask that covers both addresses. For instance, for two class B addresses that first differ in the highest-order bit of the third octet, l is 16. The matching process is implemented in a method named match(ip1, ip2, l), which returns true if the two addresses match under address prefix length l. We also add time constraints to the sequential patterns by specifying the maximum time gap t allowed between adjacent alerts in a sequence. Another parameter to tune is the support threshold s for mining sequential patterns. A very small threshold admits a large number of uninteresting sequential patterns, while a very large one tends to eliminate many useful sequences by mistake.

3.3 Correlation algorithms

Our correlation approach is formally described by the following four algorithms. Algorithm 1 is the main routine, and it invokes the other three to perform alert correlation. Algorithm 2 describes the alert sequence construction process.
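A minimal Python sketch of the address prefix length and of match(ip1, ip2, l) follows. It assumes l is the length of the longest common leading bit string of the two IPv4 addresses. The function name `prefix_length` and the example addresses are hypothetical. The standard-library `ipaddress` module converts dotted-quad strings to integers.

```python
import ipaddress


def prefix_length(ip1: str, ip2: str) -> int:
    """Number of leading bits shared by two IPv4 addresses, i.e. the longest
    subnet mask (in 1 bits) under which both fall in the same subnet."""
    a = int(ipaddress.IPv4Address(ip1))
    b = int(ipaddress.IPv4Address(ip2))
    return 32 - (a ^ b).bit_length()   # XOR marks every differing bit


def match(ip1: str, ip2: str, l: int) -> bool:
    """True if the two addresses agree on their first l bits."""
    return prefix_length(ip1, ip2) >= l


# Hypothetical class B addresses whose third octets differ in the highest bit.
print(prefix_length("172.16.0.1", "172.16.128.1"))   # 16
print(match("172.16.112.50", "172.16.115.20", 16))   # True
print(match("172.16.112.50", "172.16.115.20", 24))   # False
```

XOR followed by `bit_length` finds the first differing bit in one step, so no explicit loop over mask lengths is needed.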
In real-world DDoS attacks, the final DDoS stage usually consists of packets from many spoofed addresses to a variety of ports on the victims. As a result, simple address matching does not work for alerts of this stage. We overcome this difficulty by examining the stage information when adding alerts to a sequence. The core of the algorithm is three nested loops. The first two determine an alert group among hosts with matching addresses, and the third extracts alerts of successive attack stages from that group to build alert sequences.

Algorithm 2 gensequence: Construct alert sequences
Input: alertset containing m alerts, t, l
Output: sequenceset
  1: sequence, seqarray ← φ
  2: k ← 0
  3: for al1 = alert1 to alertm do
  4:   if (al1 is not marked)
  5:     for al2 = al1 to alertm do
  6:       if ((al2 is not marked) ∧ match(al1.srcip, al2.srcip, l) ∧ match(al1.dstip, al2.dstip, l))
  7:         curalert ← al2
  8:         sequence ← sequence ∪ al2
  9:         mark al2
 10:         for al3 = al2 + 1 to alertm do
 11:           if ((al3 is not marked) ∧ (al3.time − curalert.time ≤ t))
 12:             flag ← false
 13:             if (match(al2.srcip, al3.srcip, l) ∧ match(al2.dstip, al3.dstip, l) ∧ al3.stage == curalert.stage + 1)
 14:               flag ← true
 15:             else if (curalert.stage == 2 ∧ al3.stage == 3)
 16:               flag ← true
 17:             endif
 18:             if (flag)
 19:               curalert ← al3
 20:               sequence ← sequence ∪ al3
 21:               mark al3
 22:             endif
 23:           endif
 24:         endfor
 25:         seqarray ← seqarray ∪ sequence
 26:         sequence ← φ
 27:       endif
 28:     endfor
 29:     sequenceset[k++] ← seqarray
 30:     seqarray ← φ
 31:   endif
 32: endfor
 33: output sequenceset

Algorithm 3 finds the sequential alert patterns in the sequence groups created by Algorithm 2. Alerts in each mined pattern are deemed strongly related and are thus correlated. The longest sequential pattern is kept to represent all the sequential patterns in a group. The normal activities of a legitimate user may occasionally exhibit similar patterns and be incorrectly captured in alert sequences.
However, we argue that such sequences are unlikely to be frequent. Their alerts are therefore excluded from further analysis by the mining algorithm.
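Algorithm 2 can be sketched in Python as follows. The sketch mirrors the three nested loops of the pseudocode and makes these assumptions:

- Alerts are sorted by time stamp, and times are in minutes.
- Stages are encoded 1 (probe), 2 (exploit) and 3 (DDoS).

The `Alert` class, the inline `match` helper and the sample alerts are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:                # hypothetical alert record
    time: float   # minutes since the start of the trace (assumed unit)
    srcip: str
    dstip: str
    stage: int    # 1 = probe, 2 = exploit, 3 = DDoS


def match(ip1: str, ip2: str, l: int) -> bool:
    """True if the two IPv4 addresses share their first l bits."""
    diff = int(ipaddress.IPv4Address(ip1)) ^ int(ipaddress.IPv4Address(ip2))
    return 32 - diff.bit_length() >= l


def gensequence(alerts: List[Alert], t: float, l: int) -> List[List[List[Alert]]]:
    """Group alerts by matching addresses (loops over al1, al2) and chain
    successive stages within time gap t (loop over al3)."""
    marked = [False] * len(alerts)
    sequenceset: List[List[List[Alert]]] = []
    for i, al1 in enumerate(alerts):
        if marked[i]:
            continue
        seqarray: List[List[Alert]] = []
        for j in range(i, len(alerts)):
            al2 = alerts[j]
            if marked[j] or not (match(al1.srcip, al2.srcip, l)
                                 and match(al1.dstip, al2.dstip, l)):
                continue
            cur, sequence = al2, [al2]
            marked[j] = True
            for k in range(j + 1, len(alerts)):
                al3 = alerts[k]
                if marked[k] or al3.time - cur.time > t:
                    continue
                same_hosts = (match(al2.srcip, al3.srcip, l)
                              and match(al2.dstip, al3.dstip, l))
                # Next stage between the same hosts, or exploit -> DDoS from
                # any (possibly spoofed) address.
                if ((same_hosts and al3.stage == cur.stage + 1)
                        or (cur.stage == 2 and al3.stage == 3)):
                    cur = al3
                    sequence.append(al3)
                    marked[k] = True
            seqarray.append(sequence)
        sequenceset.append(seqarray)
    return sequenceset


# Hypothetical alerts, already sorted by time.
ALERTS = [
    Alert(0, "10.0.0.5", "172.16.112.50", 1),
    Alert(5, "10.0.0.5", "172.16.112.50", 2),
    Alert(10, "192.168.7.9", "131.84.1.31", 3),   # spoofed DDoS packet
    Alert(20, "10.0.0.5", "172.16.112.50", 1),
    Alert(500, "10.0.0.5", "172.16.112.50", 2),   # too late for t = 100
]
groups = gensequence(ALERTS, t=100, l=16)
print([[a.stage for a in s] for s in groups[0]])   # [[1, 2, 3], [1], [2]]
```

The spoofed DDoS alert joins the first sequence only through the stage rule (exploit followed by DDoS), which is exactly the case that plain address matching would miss.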

Algorithm 3 minepattern: Mine sequential alert patterns
Input: sequenceset (p alert groups), support s
Output: longest sequential alert pattern set lspset
  1: lspset ← φ
  2: for i = 0 to p − 1 do
  3:   sp ← SPAM(sequenceset[i], s)
  4:   if (sp ≠ φ)
  5:     correlate all alerts covered by sp
  6:     lsp ← the longest sequential alert pattern in sp
  7:     lsp.srcip ← the srcip of the first alert in lsp
  8:     lsp.dstip ← the dstip of the last alert in lsp
  9:     lspset ← lspset ∪ lsp
 10:   endif
 11: endfor
 12: output lspset

Algorithm 4 corrspattern: Generate DDoS scenarios
Input: longest sequential pattern set lspset (containing n lspatterns), address prefix length l
Output: DDoS scenarios
  1: corrlist, scenariolist ← φ
  2: for lsp1 = lspattern1 to lspatternn do
  3:   if (lsp1 is not marked)
  4:     currentlsp ← lsp1
  5:     corrlist.addtotail(lsp1)
  6:     mark lsp1
  7:     for lsp2 = lsp1 to lspatternn do
  8:       if ((lsp2 is not marked) ∧ match(currentlsp.dstip, lsp2.srcip, l) ∧ currentlsp.lastalert.stage ≥ 2)
  9:         currentlsp ← lsp2
 10:         corrlist.addtotail(lsp2)
 11:         mark lsp2
 12:       endif
 13:     endfor
 14:     scenario ← fuse the lspatterns in corrlist
 15:     scenariolist ← scenariolist ∪ scenario
 16:     corrlist ← φ
 17:   endif
 18: endfor
 19: output DDoS scenarios in scenariolist

In Algorithm 4, DDoS scenarios are produced from the sequential alert patterns obtained. In particular, during correlation we require the first lspattern to end with a stage-2 or stage-3 alert. This heuristic prevents alerts like the one in sequence 7 (Table 1) from being fused.

3.4 Parameter optimization

In our research, we assign three distinct values, 16, 24 and 32, to the address prefix length l. Another vital parameter is the time gap t, which takes values from {20, 50, 100} minutes; we consider these sufficient for capturing highly related alerts. For the support s, we pick three values: 2, 4 and 8. Thus there are a total of 3 × 3 × 3 = 27 parameter compositions. To optimize the parameters, we use the inside tcpdump file from LLDDOS1.0 as training data and run our algorithm on it 27 times.
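The 27-run search over (s, l, t) can be sketched as below. Each composition is scored with the correlation ratio $C_r = (N - N_c)/(n - n_c)$ that the paper defines in Eq. (1). `run` is a hypothetical callback standing in for a full execution of Algorithms 1 to 4 on the training data. The sketch assumes every run removes at least one false alert ($n > n_c$), since the paper does not say how a zero denominator is handled.

```python
from itertools import product
from typing import Callable, Dict, Tuple

Params = Tuple[int, int, int]          # (s, l, t)
Counts = Tuple[int, int, int, int]     # (N, N_c, n, n_c)

S_VALUES, L_VALUES, T_VALUES = (2, 4, 8), (16, 24, 32), (20, 50, 100)


def correlation_ratio(N: int, N_c: int, n: int, n_c: int) -> float:
    """Eq. (1): true alerts removed per false alert removed (assumes n > n_c)."""
    return (N - N_c) / (n - n_c)


def grid_search(run: Callable[[int, int, int], Counts]) -> Tuple[Params, float]:
    """Evaluate all 3 x 3 x 3 = 27 compositions; return the best one and its C_r.

    `run(s, l, t)` is a hypothetical callback that executes the correlation
    pipeline on the training data and returns (N, N_c, n, n_c)."""
    scores: Dict[Params, float] = {}
    for params in product(S_VALUES, L_VALUES, T_VALUES):
        scores[params] = correlation_ratio(*run(*params))
    best = max(scores, key=scores.__getitem__)
    return best, scores[best]
```

With illustrative counts, `correlation_ratio(100, 40, 30, 18)` returns 60 / 12 = 5.0. A higher $C_r$ favors compositions that remove many true (redundant) alerts for each false alert they also absorb.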
We gauge the quality of a parameter composition by its correlation ratio, defined by

$$C_r = \frac{N - N_c}{n - n_c} \qquad (1)$$

Here $N$ ($n$) is the number of true alerts (false alerts) before correlation, and $N_c$ ($n_c$) is the number of true alerts (false alerts) after running our algorithms. The optimal parameters are those that yield the largest $C_r$ on the training data, and we use them when running our algorithms on the testing sets. Moreover, alerts are mapped to intrusion stages differently in training and testing. During training, we manually assign each alert a stage label. During testing, a script assigns stages automatically according to the alert type.

4. Evaluation

To evaluate our approach, we compare it with the hidden Markov model in identifying DDoS alert sequences. In this section, we first briefly introduce HMM and explain its use in multi-stage intrusion detection. We then report the experimental results achieved by the two methods.

4.1 Hidden Markov model

The hidden Markov model is a stochastic process consisting of a finite set of states $\{S_i\}$ ($1 \le i \le N$). Each state can emit observables $O = \{o_1, o_2, \ldots, o_M\}$ according to an associated probability distribution $b_j(k)$ ($1 \le j \le N$, $1 \le k \le M$). Transitions among the states are governed by a set of transition probabilities $\{a_{ij}\}$ ($1 \le i, j \le N$). An HMM is thus a doubly embedded stochastic process with two levels of hierarchy. HMMs have been applied extensively in areas such as speech recognition and intrusion detection [10]. Several properties of multi-stage DDoS alert sequences match the HMM representation quite well. Each state of an HMM depends probabilistically only on the previous state. Likewise, each stage of a DDoS scenario often builds on the outcome of the previous step.
Accordingly, an alert corresponds to an observable in the HMM, while the intrusion stage of the alert can be modeled as a hidden state. An intruder has many ways to accomplish each attack step, just as each HMM state can emit many observables.
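The Viterbi scoring used by the HMM baseline can be sketched in Python for a three-state model (probe, exploit, DDoS). It implements the standard recursion $\delta_{t+1}(k) = [\max_j \delta_t(j)\, a_{jk}]\, b_k(o_{t+1})$ and returns the probability of the best state path together with the path itself. All parameters below are made up for illustration: the initial, transition and emission probabilities, and the alert-type names `ping`, `scan`, `overflow` and `flood`. The paper trains its parameters with supervised learning and absolute discounting, which is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Probs = Dict[str, Dict[str, float]]


def viterbi(obs: Sequence[str], pi: Dict[str, float],
            a: Probs, b: Probs) -> Tuple[float, List[str]]:
    """Return (probability of the best state path, that path) for non-empty `obs`."""
    states = list(pi)
    # Initialisation: delta_1(j) = pi_j * b_j(o_1)
    delta = {j: pi[j] * b[j].get(obs[0], 0.0) for j in states}
    backptrs: List[Dict[str, str]] = []
    for o in obs[1:]:
        ptr: Dict[str, str] = {}
        new: Dict[str, float] = {}
        for k in states:
            # Induction: delta_{t+1}(k) = [max_j delta_t(j) a_jk] * b_k(o_{t+1})
            j_best = max(states, key=lambda j: delta[j] * a[j].get(k, 0.0))
            ptr[k] = j_best
            new[k] = delta[j_best] * a[j_best].get(k, 0.0) * b[k].get(o, 0.0)
        backptrs.append(ptr)
        delta = new
    last = max(states, key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptrs):      # backtrack from the final state
        path.append(ptr[path[-1]])
    return delta[last], path[::-1]


# Illustrative parameters only; a real model would be trained on alert sequences.
PI = {"probe": 1.0, "exploit": 0.0, "DDoS": 0.0}
A = {"probe": {"probe": 0.5, "exploit": 0.5},
     "exploit": {"exploit": 0.4, "DDoS": 0.6},
     "DDoS": {"DDoS": 1.0}}
B = {"probe": {"ping": 0.7, "scan": 0.3},
     "exploit": {"overflow": 0.9, "ping": 0.1},
     "DDoS": {"flood": 1.0}}

p, path = viterbi(["ping", "overflow", "flood"], PI, A, B)
print(path, round(p, 3))   # ['probe', 'exploit', 'DDoS'] 0.189
```

A sequence would be flagged as DDoS when the returned probability is at least a chosen threshold. For long sequences, working with log probabilities avoids numerical underflow.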

Figure 1. A hidden Markov model for DDoS attack detection.

An HMM for DDoS detection is illustrated in Figure 1. In our research, we also use the inside tcpdump file from LLDDOS1.0 as the training data. Specifically, we train the HMM on the alert sequences constructed by Algorithm 2 (Section 3.3). We use a supervised learning method rather than the popular Baum-Welch method [11]. To combat the skewed distribution problem, we apply absolute discounting smoothing during training.

After obtaining a model $\lambda$, we can calculate the probability that a given observation sequence is emitted by the HMM by determining its Viterbi path [11], as follows. Suppose we need to find the best state sequence $Q = q_1 q_2 \cdots q_T$ for a given observation sequence $O = o_1 o_2 \cdots o_T$. For this purpose, we define the quantity

$$\delta_t(j) = \max_{q_1, q_2, \ldots, q_{t-1}} P[q_1 q_2 \cdots q_t = j,\ o_1 o_2 \cdots o_t \mid \lambda] \qquad (2)$$

$\delta_t(j)$ is the highest probability, at time $t$, along a single state sequence that accounts for the first $t$ observations and ends in state $S_j$. The best state sequence can then be found by induction:

$$\delta_{t+1}(k) = \Big[\max_j \delta_t(j)\, a_{jk}\Big]\, b_k(o_{t+1}) \qquad (3)$$

If the probability computed by the Viterbi algorithm is no less than a threshold, the alert sequence is regarded as part of a DDoS attack, and the rest of our algorithms then continue. The difference between our correlation approach and HMM thus lies mainly in the DDoS alert sequence identification step, line 3 of Algorithm 3.

4.2 Experimental results

In Section 3.4, we described how to optimize the parameters (s, l, t) of our correlation approach through training. The training results are shown in Table 2.

Table 2. Training results for parameter optimization.
Correlation ratio for each parameter composition (s, l, t):

(s, l)      t = 20    t = 50    t = 100
(2, 16)
(2, 24)
(2, 32)
(4, 16)
(4, 24)
(4, 32)
(8, 16)
(8, 24)
(8, 32)

The best-performing values are 2, 16 and 100 for s, l and t, respectively, which yield the maximum correlation ratio $C_r$ of 4.73 on the training data. Table 3 summarizes the training and testing performance of our approach and HMM. As Table 3 shows, on the training set our approach removed 298 true alerts and HMM removed 295, corresponding to alert reduction rates of 73.6% and 72.8%, respectively. On the LLDDOS1.0 DMZ data set, our approach correlated almost all the true alerts (a 99.8% reduction rate) and far outperformed HMM, which achieved 93.1%. In this case, our method output only one alert indicating the existence of a DDoS attack. Note that both models performed better on this data set than on the training set. We think two factors might explain this discrepancy. First, the DMZ and inside data come from the same scenario and thus have very similar intrusion patterns. Second, snort emitted far more true alerts on the LLDDOS1.0 DMZ data (996) than on the LLDDOS1.0 inside data (405). For the more sophisticated LLDDOS2.0.2 scenario, the performance of both models degraded to some extent. Snort generated only 6 true alerts on the DMZ tcpdump data; both methods correlated all of them, attaining the same reduction rate of 83.3%.
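The reduction rates quoted in this section are consistent with dividing the number of true alerts removed by the number of true alerts generated. The sketch below checks that reading against the counts given in the text. One entry is an interpretation rather than a stated figure: the LLDDOS2.0.2 DMZ case is treated as six alerts fused into one, so five are removed, which yields 83.3%.

```python
def reduction_rate(reduced: int, generated: int) -> float:
    """Percentage of true alerts removed by correlation (assumed definition)."""
    return 100.0 * reduced / generated


# (true alerts removed, true alerts generated), taken from the text.
REPORTED = {
    ("LLDDOS1.0 Inside", "Ours"): (298, 405),
    ("LLDDOS1.0 Inside", "HMM"): (295, 405),
    ("LLDDOS2.0.2 DMZ", "both"): (5, 6),      # interpretation: 6 alerts fused into 1
    ("LLDDOS2.0.2 Inside", "Ours"): (217, 420),
    ("LLDDOS2.0.2 Inside", "HMM"): (214, 420),
}

for (dataset, method), (reduced, generated) in REPORTED.items():
    print(f"{dataset:20} {method:5} {reduction_rate(reduced, generated):.1f}%")
```

This prints 73.6%, 72.8%, 83.3%, 51.7% and 51.0%, matching the rates in Table 3, which rounds the last one to 51%.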
For the inside part, our correlation approach removed 217 of the 420 true alerts, slightly more than the 214 removed by HMM. Both methods achieved a reduction rate above 50%, cutting the number of true alerts by more than half. Note that the false alerts correlated were also deemed part of a DDoS scenario and hence introduced errors into the correlation methods. However, this error is largely inherited from the intrusion detection system and is therefore unavoidable.

Table 3. Correlation performance.

                                         LLDDOS1.0                    LLDDOS2.0.2
                                         DMZ           Inside         DMZ           Inside
#Alerts generated by snort (true/false)  996/          405/129        6/45          420/37
Method                                   Ours   HMM    Ours   HMM     Ours   HMM    Ours   HMM
#True alerts reduced                                   298    295                   217    214
#False alerts reduced
Reduction rate (%)                       99.8   93.1   73.6   72.8    83.3   83.3   51.7   51

5. Conclusion

In this paper, we propose an alert correlation approach based on sequential pattern mining techniques and give a formal description of our method via four algorithms. The twofold goal of our approach is to reduce the number of alerts and to identify the DDoS attack type. Our motivation is that network administrators often struggle to cope with the large number of alerts generated by an IDS. The sequential pattern mining algorithm proves robust. Although many irrelevant alerts can be frequent, they are unlikely to form sequential alert patterns along the track of a DDoS attack, so they are excluded from further correlation. Such frequently co-occurring alerts are a good indicator of DDoS attacks. By targeting them, our approach has great potential to improve the accuracy of multi-stage intrusion detection. Experimental comparisons with the hidden Markov model showed that our method was slightly better than HMM in terms of DDoS alert sequence identification and alert reduction. Our research is still preliminary, mainly because the parameter constraints added to the sequential pattern mining algorithm are simple. In future work, we would like to explore more complex constraints that capture more of the inherent characteristics of alert sequences.
We are also considering a new criterion for selecting sequential alert patterns, beyond sheer support counting.

Acknowledgements

This research was partially funded by the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No. and the National Natural Science Foundation of China under Grant No.

References

[1] A. Valdes and K. Skinner. Probabilistic alert correlation. In Proceedings of the Fourth International Symposium on Recent Advances in Intrusion Detection, pages 54-68.
[2] O. M. Dain and R. K. Cunningham. Fusing a heterogeneous alert stream into scenarios. In Proceedings of the 2001 ACM Workshop on Data Mining for Security Applications, pages 1-13.
[3] P. Ning, Y. Cui, and D. S. Reeves. Constructing attack scenarios through correlation of intrusion alerts. In Proceedings of the 9th ACM Conference on Computer and Communications Security.
[4] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering (ICDE), pages 3-14.
[5] R. Agrawal and R. Srikant. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the Fifth International Conference on Extending Database Technology (EDBT), volume 1057, pages 3-17.
[6] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the 17th International Conference on Data Engineering (ICDE).
[7] J. Ayres, J. Gehrke, T. Yiu, and J. Flannick. Sequential pattern mining using a bitmap representation. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[8] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB).
[9] J. W. Haines et al. DARPA intrusion detection evaluation.
[10] D. Ourston, S. Matzner, W. Stump, and B. Hopkins. Applications of Hidden Markov Models to detecting multi-stage network attacks. In Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS), page 334b.
[11] L. R. Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recognition. In Proceedings of the IEEE, volume 77(2), 1989.


More information

A Survey of Sequential Pattern Mining

A Survey of Sequential Pattern Mining Data Science and Pattern Recognition c 2017 ISSN XXXX-XXXX Ubiquitous International Volume 1, Number 1, February 2017 A Survey of Sequential Pattern Mining Philippe Fournier-Viger School of Natural Sciences

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition

A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition S.Vigneswaran 1, M.Yashothai 2 1 Research Scholar (SRF), Anna University, Chennai.

More information

Web Data mining-a Research area in Web usage mining

Web Data mining-a Research area in Web usage mining IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,

More information

An Algorithm for Mining Large Sequences in Databases

An Algorithm for Mining Large Sequences in Databases 149 An Algorithm for Mining Large Sequences in Databases Bharat Bhasker, Indian Institute of Management, Lucknow, India, bhasker@iiml.ac.in ABSTRACT Frequent sequence mining is a fundamental and essential

More information

USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS

USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS INFORMATION SYSTEMS IN MANAGEMENT Information Systems in Management (2017) Vol. 6 (3) 213 222 USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS PIOTR OŻDŻYŃSKI, DANUTA ZAKRZEWSKA Institute of Information

More information

ADAPTING QUERY OPTIMIZATION TECHNIQUES FOR EFFICIENT ALERT CORRELATION*

ADAPTING QUERY OPTIMIZATION TECHNIQUES FOR EFFICIENT ALERT CORRELATION* ADAPTING QUERY OPTIMIZATION TECHNIQUES FOR EFFICIENT ALERT CORRELATION* Peng Ning and Dingbang Xu Department of Computer Science North Carolina State University { pning, dxu } @ncsu.edu Abstract Keywords:

More information

Information Extraction Techniques in Terrorism Surveillance

Information Extraction Techniques in Terrorism Surveillance Information Extraction Techniques in Terrorism Surveillance Roman Tekhov Abstract. The article gives a brief overview of what information extraction is and how it might be used for the purposes of counter-terrorism

More information

A Rule-Based Intrusion Alert Correlation System for Integrated Security Management *

A Rule-Based Intrusion Alert Correlation System for Integrated Security Management * A Rule-Based Intrusion Correlation System for Integrated Security Management * Seong-Ho Lee 1, Hyung-Hyo Lee 2, and Bong-Nam Noh 1 1 Department of Computer Science, Chonnam National University, Gwangju,

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Queries on streams

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics

More information

Bipartite Graph Partitioning and Content-based Image Clustering

Bipartite Graph Partitioning and Content-based Image Clustering Bipartite Graph Partitioning and Content-based Image Clustering Guoping Qiu School of Computer Science The University of Nottingham qiu @ cs.nott.ac.uk Abstract This paper presents a method to model the

More information

A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models

A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models Gleidson Pegoretti da Silva, Masaki Nakagawa Department of Computer and Information Sciences Tokyo University

More information

A SYSTEM FOR DETECTION AND PRVENTION OF PATH BASED DENIAL OF SERVICE ATTACK

A SYSTEM FOR DETECTION AND PRVENTION OF PATH BASED DENIAL OF SERVICE ATTACK A SYSTEM FOR DETECTION AND PRVENTION OF PATH BASED DENIAL OF SERVICE ATTACK P.Priya 1, S.Tamilvanan 2 1 M.E-Computer Science and Engineering Student, Bharathidasan Engineering College, Nattrampalli. 2

More information

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization

More information

Monotone Constraints in Frequent Tree Mining

Monotone Constraints in Frequent Tree Mining Monotone Constraints in Frequent Tree Mining Jeroen De Knijf Ad Feelders Abstract Recent studies show that using constraints that can be pushed into the mining process, substantially improves the performance

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

6.001 Notes: Section 4.1

6.001 Notes: Section 4.1 6.001 Notes: Section 4.1 Slide 4.1.1 In this lecture, we are going to take a careful look at the kinds of procedures we can build. We will first go back to look very carefully at the substitution model,

More information

Lecture 2 Wednesday, August 22, 2007

Lecture 2 Wednesday, August 22, 2007 CS 6604: Data Mining Fall 2007 Lecture 2 Wednesday, August 22, 2007 Lecture: Naren Ramakrishnan Scribe: Clifford Owens 1 Searching for Sets The canonical data mining problem is to search for frequent subsets

More information

Generation of Potential High Utility Itemsets from Transactional Databases

Generation of Potential High Utility Itemsets from Transactional Databases Generation of Potential High Utility Itemsets from Transactional Databases Rajmohan.C Priya.G Niveditha.C Pragathi.R Asst.Prof/IT, Dept of IT Dept of IT Dept of IT SREC, Coimbatore,INDIA,SREC,Coimbatore,.INDIA

More information

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining Miss. Rituja M. Zagade Computer Engineering Department,JSPM,NTC RSSOER,Savitribai Phule Pune University Pune,India

More information

Mining Temporal Indirect Associations

Mining Temporal Indirect Associations Mining Temporal Indirect Associations Ling Chen 1,2, Sourav S. Bhowmick 1, Jinyan Li 2 1 School of Computer Engineering, Nanyang Technological University, Singapore, 639798 2 Institute for Infocomm Research,

More information

CS395/495 Computer Security Project #2

CS395/495 Computer Security Project #2 CS395/495 Computer Security Project #2 Important Dates Out: 1/19/2005 Due: 2/15/2005 11:59pm Winter 2005 Project Overview Intrusion Detection System (IDS) is a common tool to detect the malicious activity

More information

Parallel Mining of Maximal Frequent Itemsets in PC Clusters

Parallel Mining of Maximal Frequent Itemsets in PC Clusters Proceedings of the International MultiConference of Engineers and Computer Scientists 28 Vol I IMECS 28, 19-21 March, 28, Hong Kong Parallel Mining of Maximal Frequent Itemsets in PC Clusters Vong Chan

More information

CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM. Please purchase PDF Split-Merge on to remove this watermark.

CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM. Please purchase PDF Split-Merge on   to remove this watermark. 119 CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM 120 CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM 5.1. INTRODUCTION Association rule mining, one of the most important and well researched

More information

ANOMALY NETWORK INTRUSION DETECTION USING HIDDEN MARKOV MODEL. Received August 2015; revised December 2015

ANOMALY NETWORK INTRUSION DETECTION USING HIDDEN MARKOV MODEL. Received August 2015; revised December 2015 International Journal of Innovative Computing, Information and Control ICIC International c 2016 ISSN 1349-4198 Volume 12, Number 2, April 2016 pp. 569 580 ANOMALY NETWORK INTRUSION DETECTION USING HIDDEN

More information

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department

More information

Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors

Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors G. Chen 1, M. Kandemir 1, I. Kolcu 2, and A. Choudhary 3 1 Pennsylvania State University, PA 16802, USA 2 UMIST,

More information

A new algorithm for gap constrained sequence mining

A new algorithm for gap constrained sequence mining 24 ACM Symposium on Applied Computing A new algorithm for gap constrained sequence mining Salvatore Orlando Dipartimento di Informatica Università Ca Foscari Via Torino, 155 - Venezia, Italy orlando@dsi.unive.it

More information

Web Usage Mining: A Research Area in Web Mining

Web Usage Mining: A Research Area in Web Mining Web Usage Mining: A Research Area in Web Mining Rajni Pamnani, Pramila Chawan Department of computer technology, VJTI University, Mumbai Abstract Web usage mining is a main research area in Web mining

More information

CHAPTER 3 ASSOCIATION RULE MINING WITH LEVELWISE AUTOMATIC SUPPORT THRESHOLDS

CHAPTER 3 ASSOCIATION RULE MINING WITH LEVELWISE AUTOMATIC SUPPORT THRESHOLDS 23 CHAPTER 3 ASSOCIATION RULE MINING WITH LEVELWISE AUTOMATIC SUPPORT THRESHOLDS This chapter introduces the concepts of association rule mining. It also proposes two algorithms based on, to calculate

More information

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42 Pattern Mining Knowledge Discovery and Data Mining 1 Roman Kern KTI, TU Graz 2016-01-14 Roman Kern (KTI, TU Graz) Pattern Mining 2016-01-14 1 / 42 Outline 1 Introduction 2 Apriori Algorithm 3 FP-Growth

More information

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery : Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Hong Cheng Philip S. Yu Jiawei Han University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center {hcheng3, hanj}@cs.uiuc.edu,

More information

A Comprehensive Survey on Sequential Pattern Mining

A Comprehensive Survey on Sequential Pattern Mining A Comprehensive Survey on Sequential Pattern Mining Irfan Khan 1 Department of computer Application, S.A.T.I. Vidisha, (M.P.), India Anoop Jain 2 Department of computer Application, S.A.T.I. Vidisha, (M.P.),

More information

A Study on Network Flow Security

A Study on Network Flow Security BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 8, No 3 Sofia 28 A Study on Network Flow Security Tsvetomir Tsvetanov, Stanislav Simeonov 2 Sofia University, Faculty of Mathematics

More information

Associating Terms with Text Categories

Associating Terms with Text Categories Associating Terms with Text Categories Osmar R. Zaïane Department of Computing Science University of Alberta Edmonton, AB, Canada zaiane@cs.ualberta.ca Maria-Luiza Antonie Department of Computing Science

More information

PSEUDO PROJECTION BASED APPROACH TO DISCOVERTIME INTERVAL SEQUENTIAL PATTERN

PSEUDO PROJECTION BASED APPROACH TO DISCOVERTIME INTERVAL SEQUENTIAL PATTERN PSEUDO PROJECTION BASED APPROACH TO DISCOVERTIME INTERVAL SEQUENTIAL PATTERN Dvijesh Bhatt Department of Information Technology, Institute of Technology, Nirma University Gujarat,( India) ABSTRACT Data

More information

A mining method for tracking changes in temporal association rules from an encoded database

A mining method for tracking changes in temporal association rules from an encoded database A mining method for tracking changes in temporal association rules from an encoded database Chelliah Balasubramanian *, Karuppaswamy Duraiswamy ** K.S.Rangasamy College of Technology, Tiruchengode, Tamil

More information

Fuzzy Cognitive Maps application for Webmining

Fuzzy Cognitive Maps application for Webmining Fuzzy Cognitive Maps application for Webmining Andreas Kakolyris Dept. Computer Science, University of Ioannina Greece, csst9942@otenet.gr George Stylios Dept. of Communications, Informatics and Management,

More information

Alert correlation and aggregation techniques for reduction of security alerts and detection of multistage attack

Alert correlation and aggregation techniques for reduction of security alerts and detection of multistage attack Alert correlation and aggregation techniques for reduction of security alerts and detection of multistage attack Faeiz M. Alserhani College of Computer & Information Sciences, Dep. of Computer Engineering

More information

MINING FREQUENT MAX AND CLOSED SEQUENTIAL PATTERNS

MINING FREQUENT MAX AND CLOSED SEQUENTIAL PATTERNS MINING FREQUENT MAX AND CLOSED SEQUENTIAL PATTERNS by Ramin Afshar B.Sc., University of Alberta, Alberta, 2000 THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

More information

HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation

HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation 009 10th International Conference on Document Analysis and Recognition HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation Yaregal Assabie and Josef Bigun School of Information Science,

More information

Association Pattern Mining. Lijun Zhang

Association Pattern Mining. Lijun Zhang Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms

More information

Pincer-Search: An Efficient Algorithm. for Discovering the Maximum Frequent Set

Pincer-Search: An Efficient Algorithm. for Discovering the Maximum Frequent Set Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set Dao-I Lin Telcordia Technologies, Inc. Zvi M. Kedem New York University July 15, 1999 Abstract Discovering frequent itemsets

More information

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Marek Wojciechowski, Krzysztof Galecki, Krzysztof Gawronek Poznan University of Technology Institute of Computing Science ul.

More information

Data- and Rule-Based Integrated Mechanism for Job Shop Scheduling

Data- and Rule-Based Integrated Mechanism for Job Shop Scheduling Data- and Rule-Based Integrated Mechanism for Job Shop Scheduling Yanhong Wang*, Dandan Ji Department of Information Science and Engineering, Shenyang University of Technology, Shenyang 187, China. * Corresponding

More information

Association Rule Mining. Entscheidungsunterstützungssysteme

Association Rule Mining. Entscheidungsunterstützungssysteme Association Rule Mining Entscheidungsunterstützungssysteme Frequent Pattern Analysis Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

Hierarchical Online Mining for Associative Rules

Hierarchical Online Mining for Associative Rules Hierarchical Online Mining for Associative Rules Naresh Jotwani Dhirubhai Ambani Institute of Information & Communication Technology Gandhinagar 382009 INDIA naresh_jotwani@da-iict.org Abstract Mining

More information

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm S.Pradeepkumar*, Mrs.C.Grace Padma** M.Phil Research Scholar, Department of Computer Science, RVS College of

More information

HUMAN activities associated with computational devices

HUMAN activities associated with computational devices INTRUSION ALERT PREDICTION USING A HIDDEN MARKOV MODEL 1 Intrusion Alert Prediction Using a Hidden Markov Model Udaya Sampath K. Perera Miriya Thanthrige, Jagath Samarabandu and Xianbin Wang arxiv:1610.07276v1

More information

Mining Frequent Itemsets Along with Rare Itemsets Based on Categorical Multiple Minimum Support

Mining Frequent Itemsets Along with Rare Itemsets Based on Categorical Multiple Minimum Support IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 6, Ver. IV (Nov.-Dec. 2016), PP 109-114 www.iosrjournals.org Mining Frequent Itemsets Along with Rare

More information

Association Rule Mining from XML Data

Association Rule Mining from XML Data 144 Conference on Data Mining DMIN'06 Association Rule Mining from XML Data Qin Ding and Gnanasekaran Sundarraj Computer Science Program The Pennsylvania State University at Harrisburg Middletown, PA 17057,

More information

An Effective Process for Finding Frequent Sequential Traversal Patterns on Varying Weight Range

An Effective Process for Finding Frequent Sequential Traversal Patterns on Varying Weight Range 13 IJCSNS International Journal of Computer Science and Network Security, VOL.16 No.1, January 216 An Effective Process for Finding Frequent Sequential Traversal Patterns on Varying Weight Range Abhilasha

More information

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH International Journal of Information Technology and Knowledge Management January-June 2011, Volume 4, No. 1, pp. 27-32 DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY)

More information

Mining Quantitative Association Rules on Overlapped Intervals

Mining Quantitative Association Rules on Overlapped Intervals Mining Quantitative Association Rules on Overlapped Intervals Qiang Tong 1,3, Baoping Yan 2, and Yuanchun Zhou 1,3 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China {tongqiang,

More information

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules Manju Department of Computer Engg. CDL Govt. Polytechnic Education Society Nathusari Chopta, Sirsa Abstract The discovery

More information

Mining Frequent Itemsets in Time-Varying Data Streams

Mining Frequent Itemsets in Time-Varying Data Streams Mining Frequent Itemsets in Time-Varying Data Streams Abstract A transactional data stream is an unbounded sequence of transactions continuously generated, usually at a high rate. Mining frequent itemsets

More information

AN ASSOCIATIVE TERNARY CACHE FOR IP ROUTING. 1. Introduction. 2. Associative Cache Scheme

AN ASSOCIATIVE TERNARY CACHE FOR IP ROUTING. 1. Introduction. 2. Associative Cache Scheme AN ASSOCIATIVE TERNARY CACHE FOR IP ROUTING James J. Rooney 1 José G. Delgado-Frias 2 Douglas H. Summerville 1 1 Dept. of Electrical and Computer Engineering. 2 School of Electrical Engr. and Computer

More information

An Efficient Algorithm for finding high utility itemsets from online sell

An Efficient Algorithm for finding high utility itemsets from online sell An Efficient Algorithm for finding high utility itemsets from online sell Sarode Nutan S, Kothavle Suhas R 1 Department of Computer Engineering, ICOER, Maharashtra, India 2 Department of Computer Engineering,

More information

AN ADAPTIVE PATTERN GENERATION IN SEQUENTIAL CLASSIFICATION USING FIREFLY ALGORITHM

AN ADAPTIVE PATTERN GENERATION IN SEQUENTIAL CLASSIFICATION USING FIREFLY ALGORITHM AN ADAPTIVE PATTERN GENERATION IN SEQUENTIAL CLASSIFICATION USING FIREFLY ALGORITHM Dr. P. Radha 1, M. Thilakavathi 2 1Head and Assistant Professor, Dept. of Computer Technology, Vellalar College for Women,

More information

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents E-Companion: On Styles in Product Design: An Analysis of US Design Patents 1 PART A: FORMALIZING THE DEFINITION OF STYLES A.1 Styles as categories of designs of similar form Our task involves categorizing

More information

IMPERATIVE PROGRAMS BEHAVIOR SIMULATION IN TERMS OF COMPOSITIONAL PETRI NETS

IMPERATIVE PROGRAMS BEHAVIOR SIMULATION IN TERMS OF COMPOSITIONAL PETRI NETS IMPERATIVE PROGRAMS BEHAVIOR SIMULATION IN TERMS OF COMPOSITIONAL PETRI NETS Leontyev Denis Vasilevich, Kharitonov Dmitry Ivanovich and Tarasov Georgiy Vitalievich ABSTRACT Institute of Automation and

More information

Categorization of Sequential Data using Associative Classifiers

Categorization of Sequential Data using Associative Classifiers Categorization of Sequential Data using Associative Classifiers Mrs. R. Meenakshi, MCA., MPhil., Research Scholar, Mrs. J.S. Subhashini, MCA., M.Phil., Assistant Professor, Department of Computer Science,

More information

Part 2. Mining Patterns in Sequential Data

Part 2. Mining Patterns in Sequential Data Part 2 Mining Patterns in Sequential Data Sequential Pattern Mining: Definition Given a set of sequences, where each sequence consists of a list of elements and each element consists of a set of items,

More information

Ans 1-j)True, these diagrams show a set of classes, interfaces and collaborations and their relationships.

Ans 1-j)True, these diagrams show a set of classes, interfaces and collaborations and their relationships. Q 1) Attempt all the following questions: (a) Define the term cohesion in the context of object oriented design of systems? (b) Do you need to develop all the views of the system? Justify your answer?

More information

Improving Suffix Tree Clustering Algorithm for Web Documents

Improving Suffix Tree Clustering Algorithm for Web Documents International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015) Improving Suffix Tree Clustering Algorithm for Web Documents Yan Zhuang Computer Center East China Normal

More information

Mining for User Navigation Patterns Based on Page Contents

Mining for User Navigation Patterns Based on Page Contents WSS03 Applications, Products and Services of Web-based Support Systems 27 Mining for User Navigation Patterns Based on Page Contents Yue Xu School of Software Engineering and Data Communications Queensland

More information

Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports

Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports R. Uday Kiran P. Krishna Reddy Center for Data Engineering International Institute of Information Technology-Hyderabad Hyderabad,

More information

CS570 Introduction to Data Mining

CS570 Introduction to Data Mining CS570 Introduction to Data Mining Frequent Pattern Mining and Association Analysis Cengiz Gunay Partial slide credits: Li Xiong, Jiawei Han and Micheline Kamber George Kollios 1 Mining Frequent Patterns,

More information

Multistep Attacks Extraction Using Compiler Techniques 1

Multistep Attacks Extraction Using Compiler Techniques 1 Multistep Attacks Extraction Using Compiler Techniques 1 Safaa O. Al-Mamory, ZHANG Hongli School of Computer Science, Harbin Institute of technology, Harbin, China safaa_vb@yahoo.com, zhl@pact518.hit.edu.cn

More information

A Fully Unsupervised Appliance Modelling Framework for NILM

A Fully Unsupervised Appliance Modelling Framework for NILM A Fully Unsupervised Appliance Modelling Framework for NILM Bo Liu, Wenpeng Luan, Senior Member, IEEE, Yixin Yu *, Senior Member, IEEE Abstract Most of the existing Non-Intrusive Load Monitoring (NILM)

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

Application of Case-Based Reasoning to Multi-Sensor Network Intrusion Detection

Application of Case-Based Reasoning to Multi-Sensor Network Intrusion Detection Application of Case-Based Reasoning to Multi-Sensor Network Intrusion Detection Jidong Long, Daniel Schwartz, and Sara Stoecklin Department of Computer Science Florida State University Tallahassee, Florida

More information

Different attack manifestations Network packets OS calls Audit records Application logs Different types of intrusion detection Host vs network IT

Different attack manifestations Network packets OS calls Audit records Application logs Different types of intrusion detection Host vs network IT Different attack manifestations Network packets OS calls Audit records Application logs Different types of intrusion detection Host vs network IT environment (e.g., Windows vs Linux) Levels of abstraction

More information

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: [35] [Rana, 3(12): December, 2014] ISSN:

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: [35] [Rana, 3(12): December, 2014] ISSN: IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY A Brief Survey on Frequent Patterns Mining of Uncertain Data Purvi Y. Rana*, Prof. Pragna Makwana, Prof. Kishori Shekokar *Student,

More information

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey G. Shivaprasad, N. V. Subbareddy and U. Dinesh Acharya

More information