Combining Clustering with Moving Sequential Patterns Mining: A Novel and Efficient Technique

Size: px
Start display at page:

Download "Combining Clustering with Moving Sequential Patterns Mining: A Novel and Efficient Technique"

Transcription

1 Combining Clustering with Moving Sequential Patterns Mining: A Novel and Efficient Technique Shuai Ma, Shiwei Tang, Dongqing Yang, Tengjiao Wang, Jinqiang Han Department of Computer Science, Peking University, Beijing , China National Laboratory on Machine Perception, Peking University, Beijing , China {mashuai, tjwang, jqhan}@db.pku.edu.cn {tsw, dqyang}@pku.edu.cn Abstract Sequential pattern mining is a well-studied problem. In the context of mobile computing, a special sequential pattern, moving sequential pattern that reflects the moving behavior of mobile users attracted researchers interests recently. Moving sequential patterns can be viewed as a special type of conventional sequential pattern with the extension of support. Mining moving sequential patterns has great significance for effective and efficient location management in wireless communication systems. In this paper a novel and efficient technique is proposed to mine moving sequential patterns. Firstly the idea of clustering is introduced to process the original moving histories into moving sequences as a preprocessing step. And then an efficient algorithm, called PrefixTree, is presented to mine the moving sequences. Performance study shows that PrefixTree outperforms LM algorithm, which is revised to mine moving sequences, in mining large moving sequence databases. Keywords Clustering, Sequential pattern, Moving sequential pattern, Data mining, Mobile Computing Supported by the National High Technology Development 863 Program of China under Grant No. 02AA4Z3440; the National Grand Fundamental Research 973 Program of China under Grant No. G ; the Foundation of the Innovation Research Institute of PKU-IBM. 1

2 1. Introduction In order to provide effective and efficient location management technologies in wireless communication systems, researchers recently tend to focus on characterizing the mobile users moving and/or calling characteristics, such as CMR (Call to Mobility Ratio) [1], Markov model [2], probability distribution based on profile [3], mobility graph [4], and moving behaviors [5,6,7]. User s moving behaviors can help to allocate personal data, pre-fetch useful information, pre-allocate wireless resources, and design personal paging areas, etc. In this paper we focus on mining mobile user maximal moving sequential patterns. Moving sequential patterns is a kind of moving behaviors, and we will first systematically describe the problem of mining moving sequential patterns. It can be viewed as a special case of mining sequential patterns with the extension of support, which helps a more reasonable pattern discovery. There are four major differences when mining conventional sequential patterns and moving sequential patterns. Firstly, if two items are consecutive in a moving sequence α, which is a subsequence of β, those two items must be consecutive in β. That is because we care about what the next move is for a mobile user in a mobile computing environment. Secondly, in mining moving sequential patterns the support considers the number of occurrences in a moving sequence, so the support of a moving sequence is the sum of the number of occurrence in all the moving sequences in the moving sequence database. Thirdly, the Apriori property plays an important role for efficient candidate pruning in mining conventional sequential patterns. For example, suppose <ABC> is a frequent length-3 sequence, and then all the length-2 subsequences {<AB>, <AC>, <BC>} must be frequent in mining sequential patterns. In mining moving sequential patterns <AC> need not be frequent. This is because a mobile user can only move into a neighboring cell in a wireless system and items must be consecutive in mining moving sequential patterns. In addition, <AC> is not a subsequence of <ABC> any more in mining moving sequential patterns, and so the property (any subsequence of a frequent moving sequence must be frequent) is still fulfilled from this meaning, which is called Pseudo-Apriori property. The last difference is that a moving sequence is an order list of items, but not an order list of itemsets, where each item is a cell id. The problem of mining conventional sequential patterns was first introduced by R. Agrawal et al. in [9], and there have been many studies on efficient sequential pattern mining and its applications [10-14], which can be categorized into two classes: (1) Apriori-based [9-12]; (2) Projection-based [13,14]. Wen-Chih Peng et al. present a data-mining algorithm, which involves mining for user moving patterns in a mobile computing environment in [5]. Moving pattern mining is based on a roundtrip model [8], and their LM algorithm selects an initial location S, which is either VLR (Visitor Location Register) or HLR (Home Location Register) whose geography area contains the homes of the mobiles users. Suppose a mobile user goes on errands to a strange place for one month or longer, the method in [5] cannot find the proper moving pattern to characterize the mobile user. A more general method should not give the assumption of the start point of a moving pattern. Basically, algorithm LM is a variant one from GSP [9,10]. The Apriori-based methods can efficiently prune candidate sequence patterns based on Aprior property, but in moving sequential pattern mining we cannot prune candidate sequences efficiently because the moving sequential pattern only preserves Pseudo-Apriori property. In the meanwhile Apriori-based algorithms still encounter problem when a sequence database is large and/or when 2

3 sequential patterns to be mined are numerous and/or long [14]. J. Han et al. propose a method, named FP-tree to mine frequent itemsets without candidate generation in [15], and FP-tree compresses the database into a set of conditional databases (a special kind of projected database), each associated with one frequent item, and mine each database separately. Projection-based methods adopt a divide and conquer idea to confine the search and the growth of subsequence fragments, and leads to efficient processing. Time factor is considered for personal paging area design and location management respectively in [6,7]. Time is also a very important fator in mining moving sequential patterns. G. Das et al. present a clustering based method to discretize a times series in [16], and Hsiao-Kuang Wu et al. also use clustering methods to preprocess the moving logs in personal paging area design [6]. We introduce the idea of clustering into the mining of moving sequential patterns. The idea of using clustering to discretize time is that if a mobile user possesses regular moving behavior, then he often moves on the same set of paths, and the arrival time to each point of the paths is similar. In the meanwhile, day is a natural periodic cycle that a mobile user usually behaves, so we consider the moving histories in a day. Thus the moving sequential patterns reflect a mobile user s moving behaviors in a day unit. In this paper, firstly a clustering algorithm CURD [15] is used to discretize the time attribute of the moving histories, and the moving histories are transformed into moving sequences based on the clustering result. Then based on the idea of projection and Pseudo-Apriori property, we propose an efficient moving sequential pattern mining algorithm PrefixTree, which can effectively represent candidate frequent moving sequences with a key tree structure of prefix trees. Prefix tree is a compact representation of candidate moving sequential patterns. It is easy for us to generate the moving sequential patterns based on the prefix trees. Every moving sequence from the root node to the leaf node is a candidate frequent moving sequences. We can get all the moving sequential patterns by scanning all the prefix trees once. The support of each node decreases with the depth increase, so a new frequent moving sequence is generated when we traverse the prefix trees from the root to the leaves when encountering a node whose count is less than the support threshold. We also use a wireless network topology based optimization method to improve the efficiency of PrefixTree algorithm. Contributions. In this paper, a novel and efficient technique is proposed to mine mobile user maximal moving sequential patterns. Firstly, the idea of clustering is introduced into the mining of moving sequential patterns, whose main role is to discretize the time attribute as a preprocessing step. Secondly, an efficient algorithm, called PrefixTree, is proposed to discover the mobile users moving sequential patterns based on prefix trees. Being different from traditional projection-based sequential algorithms [13,14], PrefixTree doesn t generate projected physical files, which is time-cost work. In addition, PrefixTree only need scan the database three times, which guarantees its efficiency. Thirdly, performance study shows that PrefixTree outperforms LM, which is revised to mine moving sequences, in mining large moving sequence databases. Organization. The rest of the paper is organized as follows. Data preprocessing is given in section 2. In section 3, we describe PrefixTree algorithm. In section 4, we show the experimental results from different viewpoints. Discussion is made in section Data Preprocessing 3

4 User moving history is the moving logs of a mobile user, which is an ordered (c, t) list where c is the cell ID and t is the time when the mobile user reaches cell c. Let MH=<(c 1, t 1 ), (c 2, t 2 ), (c 3, t 3 ), (c n, t n )> be a moving history, and MH means the mobile user enters c 1 at t 1, leaves c 1 and enters c 2 at t 2, leaves c 2 and enters c 3 at t 3 and finally enters c n at t n. Each (c i, t i ) MH (1 i n) is called an element of MH. The time difference between two consecutive elements in a moving history reflects a mobile users moving speed (or sojourn time in a cell). If the difference is high, it shows the mobile user moves at a relative low speed; otherwise, if the difference is low, it shows the mobile user moves at a relative high speed. ID Moving History 1 <(0, 0)(3, 363)(2, 373)(6, 393)(4, 924)(5, 987)(2, 993)(7, 999)(0, 1005)> 2 <(0, 0)(3, 438)(2, 448)(7, 458)(0, 882)> 3 <(0, 0)(3, 383)(2, 393)(6, 403)(4, 770)(5, 795)(2, 801)(7, 807)(0, 813)> 4 <(0, 0)(8, 508)(1, 726)(9, 857)(1, 867)(9, 1163)(0, 1169)> 5 <(0, 0)(8, 551)(1, 1174)(9, 1180)(0, 1186)> 6 <(0, 0)(8, 412)(1, 1022)(9, 1028)(0, 1034)> Table 1 Moving History Database Table 1 shows us a mobile user s moving histories in six days, where the cell set is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and minute is used as the unit of time with the scope from 0 to 1439 in a day period. The following analysis will use the above moving history database as an example to explain the idea of our solution in the whole paper. Cluster ID (Cell ID, T s, T e ) 2.1 Clustering Processing C 01 (0, 0, 0) There are many clustering algorithms, such as [15, 17], and many C 02 (0, 813, 1186) mining tools, such as DB2 intelligent miner for data. A clustering C 1 (1, 726, 1174) method called CURD (Clustering Using Reference and Density) C 21 (2, 373, 448) was presented in [15], which preserves the ability of density based C 22 (2, 801, 993) clustering methods good advantages, and is very efficient because C 3 (3, 363, 438) of its nearly linear time complexity (see [15] for details). Here we C 4 (4, 770, 924) use CURD algorithm to discretize the time attribute. The clustering C 5 (5, 795, 987) processing is explained as follows, C 6 (6, 393, 403) For each cell c in the cell set, we collect all the elements in the C 7 (7, 807, 999) moving history database D, and we denote the element set of c by ES(c)={(c, t) (c, t) MH i MH i D}. Then we use CURD algorithm C 8 (8, 412, 551) to cluster the element set ES(c), where Euclidean distance on time t C 9 (4, 857, 1180) is used as the similarity function between two elements in ES(c). Table 2 Clustering Results This is a clustering problem in one-dimension space. After the clustering processing we have a clustering result as {(c, T s, ID Moving Sequence T e ) T s, T e T T s T e }(T is the time domain), which means the 1 <C 01 C 3 C 21 C 6 C 4 C 5 C 22 C 7 C 02 > mobile user often enters cell c at the period [T s, T e ]. All the 2 <C 01 C 3 C 21 C 7 C 02 > clustering results of the moving history database D are shown 3 <C 01 C 3 C 21 C 6 C 4 C 5 C 22 C 7 C 02 > in Table 2. 4 <C 01 C 8 C 1 C 9 C 1 C 9 C 02 > 2.2 Data Transformation 5 <C 01 C 8 C 1 C 9 C 02 > The main idea of data transformation is to replace the MH 6 <C 01 C 8 C 1 C 9 C 02 > elements with the corresponding clusters that they belong to. Table 3 Moving Sequence Database 4

5 The transformed moving history is called a moving sequence, and the transformed moving history database is called moving sequence database. After transformation the moving history database in Table 2 becomes the moving sequence database in Table 3. Lemma 1: Each moving history can be transformed into one and only one moving sequence. Proof. The clustering method guarantees that any MH element belongs to one and only one cluster, so the above lemma is true. 3. Mining Moving Sequential Patterns 3.1 Related Definitions Let C={CT 1, CT 2, CT n } be a set of all items, each CT i is a tuple, denoted by (c, T s, T e ), where c is a cell id, T s and T e are time points, and T s T e. A moving sequence is an order list of items, and is denoted by <s 1 s 2 s m > (s i C), and each moving sequence records the moving behavior of a mobile user with the same period, such as a day, a month, etc. The number of items in a sequence is called the length of the sequence. A moving sequence with length k is called a k-sequence. A moving sequence α=<a 1 a 2 a n > is called a subsequence of another moving sequence β=<b 1 b 2 b m > and β is a super sequence of α, denoted by α β, if there exists an integer 1 i m such that a 1 =b i, a 2 =b i+1 a n =b i+n-1. A moving sequence database D is a set of tuples <sid, s>, where sid is the sequence id and s is a moving sequence. A tuple <sid, s> is said to contain a sequence α if α s, and the number of occurrences of α within s is denoted by C <sid, s> (α). The support of a sequence α in a moving sequence database D is denoted by support(α)=sum(c <sid, s> (α)), where <sid, s> D and α s. Give a positive integer ζ as the support threshold, a moving sequence α is called a moving sequential pattern, if the support of α is at least ζ in the database, i.e. support(α) ζ. A moving sequential pattern α is called a maximal moving sequential pattern if there is not any other moving sequential pattern β, and α β α β. Lemma 2: A moving sequence α=<a 1 a 2 a n > of length n has n-k+1 subsequences of length k (1 k n). Proof. From the definition of subsequence it is easy to know that the above lemma is true. Lemma 3: A moving sequence α=<a 1 a 2 a n > of length n has 2 subsequences of length n-1. Proof. It follows from lemma 2, if k=n-1, and it is easy know that the above lemma is true. The above two lemmas give the theoretic evidence that the Apriori-based methods fail pruning candidate sequence patterns efficiently in mining moving sequential patterns. Any length n (n>1) candidate moving sequence must be generated from two frequent length n-1 moving sequences. As Lemma 2 shows length n sequence only has 2 subsequence of length n-1 and those subsequences must be frequent, so no candidate sequences can be pruned at all. The above lemmas also give the reason why Pseudo-Apriori property is preserved in mining moving sequential patterns. Definition 1 (Prefix, projection, and postfix) Given a moving sequence α=<a 1 a 2 a n >, a moving sequence β=<b 1 b 2 b m >(m n) is called a prefix of α if and only if b i =a i for (1 i m). Given moving sequences α and β such that β is a subsequence of α, i.e., β α. A subsequence α of α (i.e. α α) is called a projection of α w.r.t. prefix β if and only if (1) α has prefix β and (2) there is no proper super-sequence α of α (i.e. α α but α α ) such that α is a subsequence of α and also has prefix β. 5

6 Let α =<c 1 c 2 c l > be the projection of α w.r.t. prefix β=<c 1 c 2 c k > (k l). Sequence γ=<c k+1 c l > is called the postfix of α w.r.t. prefix β, denoted by γ=α/β. If α =β, postfix of α w.r.t. β are empty, and if β is not a subsequence of α, both projection and postfix of α w.r.t. β are empty. From the above definitions, we can see they are similar to the ones in PrefixSpan [14], and are changed according to the characteristics of moving sequences. Definition 2 (Before relation and After relation) For any two items, their time attributes have the following two relations: For two items C i =(c k, T si, T ei ) and C j =(c m, T sj, T ej ), if (T ei <T sj ), we call item C i is before item C j, denoted by C i C j. For two items C i =(c k, T si, T ei ) and C j =(c m, T sj, T ej ), if (T ei T sj ), we call item C i is after item C j, denoted by C i C j. From the above definitions, we can see that for any two items must fulfill either before or after relation, and if they fulfill before relation they cannot fulfill after relation, vice versa. We now give an intuitionistic explanation about the before and after relations, suppose two items C i and C j both appear in a moving sequence. If C i C j, C i must appear before C j in any other moving sequence, and if C i C j, C i may appear after C j in other moving sequences (sometimes C i may appear before C j ). Definition 3 (Candidate Consecutive Items) For each item C i, we call the items that may appears just after it candidate consecutive items. It is easy to know that only the items after C i in a moving sequence may be the consecutive items of C i, denoted by CCI(C i ), and CCI(C i )={C j C j C i }. 3.2 PrefixTree Algorithm The mining of mobile user maximal moving sequential patterns is divided into six steps as follows, Step 1. Scanning the database once to find the set of frequent items, and then order them according to their values of T s attribute. Step 2. Computing candidate consecutive items in terms of the after relation. Step 3. Scanning the database again to generate frequent length-2 moving sequences, where the candidate frequent length-2 moving sequences are generated based on candidate consecutive items. Based on the frequent length-2 moving sequences, candidate consecutive items are reduced again based on Pseudo-Apriori property. Step 4. Scanning the database to construct prefix trees based on candidate consecutive items and Pseudo-Apriori property, which is the last time to scan the database. Step 5. Generating moving sequential patterns based on the prefix trees. Step 6. Generating maximal moving sequential patterns. We will use the example of moving sequence database D in Table 3 with support=3 to show the details of the above six steps one by one. Step 1 (Generating Frequent Items) This processing is similar to normal sequential pattern mining. PrefixTree first scans D, collects the support for each item, and find the frequent items. Frequent items are listed in an ascending order according to their T s attribute in the form of item: support as below, F_ItemList={C 01 : 6, C 3 : 3, C 21 : 3, C 8 : 3, C 1 : 4, C 7 : 3, C 02 : 6, C 9 : 4}. Those frequent items are also the frequent length-1 moving sequences. Step 2 (Computing Candidate Consecutive Items) 6

7 Each item has two time attributes T s and T e, which form a time ID CCIs region that reflects a mobile user often enters some cell area at some C 01 C 01 C 3 C 21 C 8 C 7 C 1 C 9 C 02 time between T s and T e. We compute the frequent items candidate consecutive items according to the after relation. This processing is very simple. First, we order the frequent items in an ascending order according to their T e attribute. Second, for each frequent item, the frequent items, who s T e are equal to or bigger than it s T s, are listed in its CCI. The details of the frequent items CCIs are showed in Table 4. This step helps to decrease the number of candidate length-2 moving sequential patterns in the next step, and then improves the processing efficiency. Step 3 (Generating Length-2 Moving Sequential Patterns) C 3 C 21 C 8 C 1 C 7 C 02 C 9 C 3 C 21 C 8 C 7 C 1 C 9 C 02 C 3 C 21 C 8 C 7 C 1 C 9 C 02 C 3 C 21 C 8 C 7 C 1 C 9 C 02 C 7 C 1 C 9 C 02 C 7 C 1 C 9 C 02 C 7 C 1 C 9 C 02 C 7 C 1 C 9 C 02 Table 4 Initial CCIs To any item it may have the same number of candidate length-2 moving sequential patterns as the number of all the frequent items. The candidate consecutive items are used to limit the number of candidate length-2 moving sequences, which improves the processing efficiency. For example, the candidate length-2 moving sequential patterns with prefix <C 01 > are {<C 01 C i > C i CCI(C 01 )}. By scanning once the moving sequence database D, we can compute the support for all candidate length-2 moving sequential patterns. Now we have the complete set of length-2 moving sequential patterns in the form of sequence: support as bellow, F_Length2List ={<C 01 C 3 >: 3, <C 01 C 8 >: 3, <C 3 C 21 >: 3, <C 8 C 1 >: 3, <C 1 C 9 >: 4, <C 7 C 02 >: 3, <C 9 C 02 >: 3}. After getting all the length-2 moving sequential patterns, The CCI of ID CCIs each frequent item can be optimized again. Suppose there is a length-n C 01 C 3 C 8 moving sequential pattern α=<a 1 a 2 a n >, and we will generate all the length-(n+1) moving sequential patterns with prefix α. All the candidate frequent moving sequences have the form of <a 1 a 2 a n b>, and if <a 1 a 2 a n b> is frequent, <a n b> must be frequent based on the Pseudo-Apriori property. According to the definition of CCI it is easy to know that b CCI(a n ), so we get the rule that for any item b CCI(a), <ba> must be frequent. Thus candidate consecutive items of each item can be reduced again, which is shown in Table 5. We can see that this results in a C 3 C 21 C 8 C 1 C 7 C 02 C 9 C 21 C 1 C 9 C 02 C 02 huge reduction of the CCI s scale. The behind fact is that a mobile user can Table 5 Reduced CCIs only move into the current cell s neighbor cells. The number of neighbor cells is often small, for example it is two in one-dimension model, and six in two-dimension hexagonal model, and eight in two-dimension mesh model, and is relative small even in graph model []. Any frequent moving sequence α with C as its last item cannot grow into longer frequent moving sequences having prefix α if the CCI of C is empty. Step 4 (Constructing Prefix Trees) In this step we will construct a prefix tree for each frequent item. The moving sequential patterns, i.e. the complete set of frequent moving sequences, can be divided into the following eight subsets according to the eight prefixes: (1) the one having prefix <C 01 >; and (8) the one having prefix <C 9 >. The eight prefixes are ordered by their value of attribute T s. Prefix tree is a compact representation of candidate moving sequential patterns, which has the following characteristics: 1) The root is the frequent item, and is defined at depth one; 7

8 2) Each node contains three attributes: one is the item, one is the count attribute which means the support of the item, and the last one is the flag indicating whether the node is traversed; 3) The items of a node s children are all contained in its candidate consecutive items. Firstly, the prefix trees are initialized based on the frequent items and the frequent length-2 moving sequences. Secondly, by scanning the database once, all the prefix trees, which can effectively represent the complete set of candidate frequent moving sequences, can be constructed. The algorithms of constructing prefix trees are shown as follows, Algorithm1 GPS (Generating Projected Sequences) Input: Moving sequences α and β Output: The projected sequences of α w.r.t. β. 1. S= ; 2. α : =postfix (α, β); //postfix (α, β) is a function, which returns the postfix of α w.r.t. β 3.While (α is not empty) { 4. S=S {α }; 5. α : =postfix (α, β); 6.} 7. Return S Algorithm2 ISPT (Insert a Sequence into a Prefix Tree) Input: A moving sequence α, and a prefix tree PT Output: a prefix tree PT 1. If the root node N R of PT contains no children Then break; 2. For every child node N C of N R { 3. β: =<N R.itemN C.item>; 4. S: =GPS(α, β); 5. For every sequence α in S { 6. b is item that is contained in the current node; 7. While (α is not empty) { 8. c is the first item of α ; 9. If some child node of the current one that contains c, just increase the node s support by one; 10. Else If c CCI(b), generate a new node as the child node of the current one; 11. Otherwise, stop the while loop; 12. α : = postfix (α, <c>); 13. b: = c; 14. } 15. } 16. } 17. Return PT Algorithm3 CPT (Constructing Prefix Trees) Input: A moving sequence database D, the frequent items set FSet, and the set of CCIs Output: The complete set of prefix trees 1. Initializing prefix trees PTSet according to FSet and CCIs; 2. For every moving sequence α in D { 3. For every initial prefix tree pt in PTSet 4. ISPT (α, pt); 5. } 6. Return PTSet Given moving sequences α, algorithm CPT is used to add the information contained in α to all the existing prefix trees; Algorithm ISPT is really used to add the information contained in α to a prefix tree; And algorithm GPS is used to generate the projected sequences. All this work can be done in one scan of the moving database, which guarantees its efficiency. The novelty of PrefixTree is that it doesn t generate projected physical files, which is different from traditional projection-based sequential algorithms [13,14] and is a time-cost work. Step 5 (Generating Moving Sequential Patterns) 8

9 Based on the prefix trees, it is easy for us to generate the moving sequential patterns. Every moving sequence from the root node to the leaf node is a candidate frequent moving sequences. Scanning all the prefix trees once can get all the moving sequential patterns. Here the depth first searching strategy is used to find all the frequent moving sequences in a prefix tree. The support of each node decreases with the depth increase, so a new frequent moving sequence is generated when traversing the prefix trees from the root to the leaves when encountering a node whose count is less than the support threshold. Step 6 (Generating Maximal Moving Sequential Patterns) This step is to delete the moving sequential patterns that are contained in other moving sequential patterns. The method of maximal phase in [9] can be used to get the final maximal moving sequential patterns. 3.3 Optimization of PrefixTree Algorithm As pointed in [21], the initial candidate set generation, especially for the length-2 frequent patterns, is the key issue to improve the performance of data mining. Fortunately, sometimes the apriori of the wireless network topology can be got. From the apriori we can know the neighboring cells for each cell. We have pointed out that for any two consecutive items in a moving sequence, one item must be the other one s neighboring cell or itself. The number of neighbor cells is often small, for example it is two in one-dimension model, and six in two-dimension hexagonal model, and eight in two-dimension mesh model, and is relative small even in graph model []. So based on this apriori the number of the candidate length-2 moving sequential patterns is decreased much, and then we can efficiently generating the frequent length-2 moving sequential patterns in step 3. In fact, this optimization method can be used for other algorithms too, and we show how efficient it is in later experiments. 3.4 Analysis of PrefixTree Algorithm Based on Pseudo-Apriori property, for any frequent length-n moving sequence α=<a 1 a 2 a n >, all the possible frequent length-(n+1) moving sequences having prefix α have the form of <a 1 a 2 a n b>, and if <a 1 a 2 a n b> is frequent, <a n b> must be frequent based on the Pseudo-Apriori property, so b must belong to CCI(a n ). From the above description and analysis of PrefixTree algorithm, it is easy to know the prefix trees contain all the possible frequent moving sequences, and PrefixTree can discover the complete set of maximal moving sequential patterns. PrefixTree needs to scan the moving sequence database once to generate the frequent items and scan the database again to generate the frequent length-2 moving sequences. Then PrefixTree constructs the prefix trees by scanning the database for the last time, thus in all PrefixTree only needs scan the database three times, which guarantees the efficiency of PrefixTree algorithm. 4. Experimental Results and Performance Study All experiments are performed on a 1.7GHz Pentium 4 PC machine with 512M main memory and 60G hard disk, running Microsoft Windows 00 Professional. All the methods are implemented using JBuilder 6.0. The synthetic dataset used in our experiments comes from SUMATRA (Stanford University Mobile Activity TRAces) [22]. BALI-2: Bay Area Location Information (real-time) dataset records the mobile users moving and calling activities in a day. The mobile user averagely moves 7.2 times in a day in 90 zones, so the number of items is 90 and the average length of the moving sequences is 8.2. BALI-2 contains about 40,000 moving sequences, which are used for our 9

10 experiments. Since the details of experimental analysis of CURD are shown in [15], here we focus on the experimental analysis of PrefixTree. We report our experimental results on the performance of PrefixTree in comparison with our version of LM, called Revised LM. Revised LM first computes the frequent items and length-2 moving sequential patterns, and then it uses the original LM to find the moving sequential patterns, and lastly the maximal moving sequential patterns are found. The Revised LM uses the same method as PrefixTree to find the frequent items and length-2 moving sequential patterns, and uses the same method to find the maximal moving sequential patterns (see [5] for details about LM). Before showing the experimental results, we first give a simple I/O analysis of PrefixTree and Revised LM. Let D is the number of moving sequences in database D, and L is the length of the longest moving sequential pattern. Let reading a moving sequence in a data file costs 1 unit of I/O. The I/O cost of Revised LM is equal to L( D ); the I/O cost of PrefixTree is equal to 3( D ). From the I/O costs analysis we could get a coarse conclusion that PrefixTree is more efficient than LM if L is bigger than 3. We have stated that the average length of the moving sequences is 8.2, so the length is a relative small number. Even so, PrefixTree still outperforms Revised LM in our experiments PrefixTree Revised LM PrefixTree Revised LM Rumtime(seconds) Support threshold(50) Figure 8 PrefixTree and Revised LM Support threshold(50) Figure 9 PrefixTree and Revised LM with constraint 140 PrefixTree 35 PrefixTree 1 Revised LM 30 Revised LM Rumtime(seconds) # of sequences(10 thousand) # of sequences(10 thousand) Figure 10 PrefixTree and Revised LM Figure 11 PrefixTree and Revised LM with constraint The experimental results of scalability of PrefixTree and Revised LM according to the support threshold are shown in Figure 8 and Figure 9, and the number of moving sequences is about 40,000. Figure 8 and Figure 9 show the experimental results without and with the wireless network topology constraint respectively. When the support is high, there is only a limited number of moving sequential patterns, and the length of patterns is short, the two methods are close to each other according to their runtime. However as the support threshold decreases, PrefixTree is more efficient than Revised LM because a lower support threshold results in longer moving sequential patterns. If we compare Figure 8 with Figure 9, we can find that the run time of PrefixTree with constraint is about 1/6 of PrefixTree and the one of Revised LM with constraint is 10

11 about 1/3 of Revised LM. That proves the efficiency of the wireless network topology based optimization method. Figure 10 and Figure 11 show the experimental results of the scalability of PrefixTree and Revised LM in terms of the number of moving sequences, and the support threshold is set to 100. Figure 10 and Figure 11 show the experimental results without and with the wireless network topology constraint respectively. Both methods are linearly scalable, but PrefixTree is more efficient than Revised LM. If we compare Figure 10 with Figure 11, we can find that the run time of PrefixTree with constraint is about 1/5 of PrefixTree and the one of Revised LM with constraint is about 1/4 of Revised LM. That proves the efficiency of the wireless network topology based optimization method again. Rumtime(seconds) Step3 Other steps Step3 Other steps Support threshold(50) Figure 12 Details of PrefixTree Support threshold(50) Figure 13 Details of PrefixTree with constraint Figure 12 and Figure 13 show the executing details of PrefixTree from the aspect of support threshold, and the number of moving sequences is about 40,000. Figure 12 shows the result without the wireless network topology constraint, and Figure 12 shows that the step 3, which needs to generate frequent length-2 moving sequences, occupies the main cost of PrefixTree. Compared with step 3 the other steps of PrefixTree occupy only a little time. Figure 13 shows the experimental result with the wireless network topology constraint and Figure 13 shows that the step 3 does not occupy the main cost of PrefixTree any more. Compared with the run time of step 3 in Figure 12, the one of step 3 in Figure 13 is much small. The run time of step 3 with the wireless network topology constraint is only about 1/ of the one of step without optimization, which shows the efficiency and effectivity of the optimization method Step3 Other steps Step3 Other steps # of sequences(10 thousand) Figure 14 Details of PrefixTree # of sequences(10 thousand) Figure 15 Details of PrefixTree with constraint Figure 14 and Figure 15 show the executing details of PrefixTree from the aspect of the number of moving sequences, and the support threshold is set to 100. Figure 14 shows the result without the wireless network topology constraint, and Figure 14 shows that the step 3, which needs to generate frequent length-2 moving sequences, occupies the main cost of PrefixTree. Compared with step 3 the other steps of PrefixTree occupy only a little time. Figure 15 shows the result with the wireless network topology constraint and Figure 15 shows that the step 3 does not occupy the main cost of PrefixTree. Compared with the run time of step 3 in Figure 14, the one of 11

12 step 3 in Figure 15 is much small. The run time of step 3 with the wireless network topology constraint is only about 1/ of the one of step without optimization, which shows the efficiency and effectivity of the optimization method again Step3 with constraint Step Step3 with constraint Step Figure 16 Comparison of Step 3 Figure 17 Comparison of Step 3 Figure 16 and Figure 17 show the experimental results of the comparison of step 3 of PrefixTree from the aspects of with or without the wireless network topology constraint. The run time of step 3 with the wireless network topology constraint is only about 1/ of the one of step without optimization. Both the experimental results show that the optimization based on the wireless network topology improves much the efficiency of generating the length-2 moving sequential patterns and the whole algorithm. In summary, performance study shows that PrefixTree algorithm, which needs only three scans of the moving sequence database, is more efficient and scalable than Revised LM, which needs multiple scans of the moving sequence database limited with the length of the longest moving sequential pattern. In addition, the optimization based on the wireless network topology improves much the efficiency of mining processing. 5. Discussion Support threshold(50) In this paper a clustering method is introduced to discretize the time attribute in moving histories, which transforms the original moving histories into moving sequences based on the clustering results. And then a novel and efficient method, called PrefixTree, is proposed to mine the moving sequences. Its main idea is to generate projected sequences and construct the prefix trees. Performance study shows that PrefixTree is more efficient and scalable than Revised LM. Another valuable function of PrefixTree is supporting support threshold tuning. It is highly desirable because in most cases the user tends to try a few minimum supports before being satisfied with the result. Pruning the prefix trees with a smaller support threshold, which avoid re-mining the whole moving sequence database, can generate the prefix trees with higher support threshold. Acknowledgement # of sequences(10 thousand) We are very grateful to Professor Lu Hong Jun for giving us many good suggestions and advices, which help improve the quality of this paper; we are also thankful to Professor Pei Jian for helping us to understand the PrefixSpan algorithm. 12

13 References [1] T.Imielinski, B.R.Badrinath. Querying in Highly Mobile Distributed Environments. Proc. of VLDB Conference, pp , [2] A. Bar-Noy, I. Kessler, and M. Sidi. Mobile Users: To Update or not to Update? ACM/Baltzer J. Wireless Networks, vol. 1, no. 2, pp , [3] Sami Tabbane. An Alternative Strategy for Location Tracking. IEEE Journal on Selected Areas in Comms, vol. 13, pp , [4] Zohar Naor, Hanoch Levy. Minimizing the Wireless Cost of Tracking Mobile Users: An Adaptive Threshold Scheme. Proc. of INFOCOM Conference, pp , [5] Wen-Chih Peng, Ming-Syan Chen. Mining User Moving Patterns for Personal Data Allocation in a Mobile Computing System. Proc. of ICPP Conference, pp , 00. [6] Hsiao-Kuang Wu, Ming-Hui Jin, Jorng-Tzong Horng. Personal Paging Area Design Based On Mobiles Moving Behaviors. Proc. of INFOCOM Conference, pp , 01. [7] Wenchao Ma and Yuguang Fang. A New Location Management Strategy Based on Mobility Pattern for Wireless Networks. Proc. of LCN Conference, pp , 02. [8] N. Shivakumar, J. Jannink, and J. Widom.Per-user Profile Replication in Mobile Environments: Algorithms Analysis and Simulation Result. ACM/Baltzer Journal of Mobile Networks and Applications, 2(2): , [9] Rakesh Agrawal, Ramakrishnan Srikant. Mining Sequential Patterns. Proc. of ICDE Conference, pp.3-14, [10] Ramakrishnan Srikant, Rakesh Agrawal, Mining Sequential patterns: Generalizations and Performance Improvements. Proc. of EDBT Conference, pp.3-17, [11] Mohammed J. Zaki. SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning, 0, 1-31 (00) Kluwer Academic Publishers. [12] Jay Ayres, Johannes Gehrke, Tomi Yu, and Jason Flannick. Sequential PAttern Mining using A Bitmap Representation. Proc. of KDD Conference, pp , 02. [13] Jiawei Han, Jian Pei, Behzad Mortazavi-Asl et al. FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining. Proc. of KDD Conference, pp , 00. [14] J. Pei, J. Han, B. Mortazavi-Asl et al. PrefixSpan: Mining Sequential Patterns Efficiently by PrefixProjected Pattern Growth. Proc. of ICDE Conference, pp , 01. [15] Shuai Ma, TengJiao Wang, ShiWei Tang et al. A New Fast Clustering Algorithm Based on Reference and Density. G. Dong et al. (Eds.), Proc. of WAIM Conference, LNCS 2762, Springer-Verlag, pp , 03. [16] G. Das, K.-I. Lin, H. Mannila, G. Renganathan, and P. Smyth. Rule Discovery from Time Series. Proc. of KDD Conference, pp , [17] M. Ester, H. P. Kriegel, J. Sander et al. A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proc. of KDD Conference, pp , [18] Jian Pei, Jiawei Han, Wei Wang. Mining Sequential Patterns with Constraints in Llarge Databases. Proc. of CIKM Conference, pp , 02. [19] J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation. Proc. of SIGMOD Conference, pp. 1 12, 00. [] Vincent W.-S. Wong, Victor C. M. Leung. Location Management for Next Generation Personal Communication Networks. IEEE network, vol. 14, no. 5, pp , 00. [21] Jong Soo Park, Ming-Syan Chen, and Philip S. Yu. Using a Hash-Based Method with Transaction Trimming for Mining Association Rules. IEEE TKDE, vol.9, no.5, pp , [22] SUMATRA: Stanford University Mobile Activity TRAces. 13

Mining Frequent Patterns without Candidate Generation

Mining Frequent Patterns without Candidate Generation Mining Frequent Patterns without Candidate Generation Outline of the Presentation Outline Frequent Pattern Mining: Problem statement and an example Review of Apriori like Approaches FP Growth: Overview

More information

A NOVEL ALGORITHM FOR MINING CLOSED SEQUENTIAL PATTERNS

A NOVEL ALGORITHM FOR MINING CLOSED SEQUENTIAL PATTERNS A NOVEL ALGORITHM FOR MINING CLOSED SEQUENTIAL PATTERNS ABSTRACT V. Purushothama Raju 1 and G.P. Saradhi Varma 2 1 Research Scholar, Dept. of CSE, Acharya Nagarjuna University, Guntur, A.P., India 2 Department

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

SeqIndex: Indexing Sequences by Sequential Pattern Analysis

SeqIndex: Indexing Sequences by Sequential Pattern Analysis SeqIndex: Indexing Sequences by Sequential Pattern Analysis Hong Cheng Xifeng Yan Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign {hcheng3, xyan, hanj}@cs.uiuc.edu

More information

Sequential PAttern Mining using A Bitmap Representation

Sequential PAttern Mining using A Bitmap Representation Sequential PAttern Mining using A Bitmap Representation Jay Ayres, Jason Flannick, Johannes Gehrke, and Tomi Yiu Dept. of Computer Science Cornell University ABSTRACT We introduce a new algorithm for mining

More information

Appropriate Item Partition for Improving the Mining Performance

Appropriate Item Partition for Improving the Mining Performance Appropriate Item Partition for Improving the Mining Performance Tzung-Pei Hong 1,2, Jheng-Nan Huang 1, Kawuu W. Lin 3 and Wen-Yang Lin 1 1 Department of Computer Science and Information Engineering National

More information

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set To Enhance Scalability of Item Transactions by Parallel and Partition using Dynamic Data Set Priyanka Soni, Research Scholar (CSE), MTRI, Bhopal, priyanka.soni379@gmail.com Dhirendra Kumar Jha, MTRI, Bhopal,

More information

MINING FREQUENT MAX AND CLOSED SEQUENTIAL PATTERNS

MINING FREQUENT MAX AND CLOSED SEQUENTIAL PATTERNS MINING FREQUENT MAX AND CLOSED SEQUENTIAL PATTERNS by Ramin Afshar B.Sc., University of Alberta, Alberta, 2000 THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

More information

This paper proposes: Mining Frequent Patterns without Candidate Generation

This paper proposes: Mining Frequent Patterns without Candidate Generation Mining Frequent Patterns without Candidate Generation a paper by Jiawei Han, Jian Pei and Yiwen Yin School of Computing Science Simon Fraser University Presented by Maria Cutumisu Department of Computing

More information

Mining Web Access Patterns with First-Occurrence Linked WAP-Trees

Mining Web Access Patterns with First-Occurrence Linked WAP-Trees Mining Web Access Patterns with First-Occurrence Linked WAP-Trees Peiyi Tang Markus P. Turkia Kyle A. Gallivan Dept of Computer Science Dept of Computer Science School of Computational Science Univ of

More information

Data Mining for Knowledge Management. Association Rules

Data Mining for Knowledge Management. Association Rules 1 Data Mining for Knowledge Management Association Rules Themis Palpanas University of Trento http://disi.unitn.eu/~themis 1 Thanks for slides to: Jiawei Han George Kollios Zhenyu Lu Osmar R. Zaïane Mohammad

More information

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA SEQUENTIAL PATTERN MINING FROM WEB LOG DATA Rajashree Shettar 1 1 Associate Professor, Department of Computer Science, R. V College of Engineering, Karnataka, India, rajashreeshettar@rvce.edu.in Abstract

More information

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey G. Shivaprasad, N. V. Subbareddy and U. Dinesh Acharya

More information

Parallelizing Frequent Itemset Mining with FP-Trees

Parallelizing Frequent Itemset Mining with FP-Trees Parallelizing Frequent Itemset Mining with FP-Trees Peiyi Tang Markus P. Turkia Department of Computer Science Department of Computer Science University of Arkansas at Little Rock University of Arkansas

More information

Sequential Pattern Mining Methods: A Snap Shot

Sequential Pattern Mining Methods: A Snap Shot IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-661, p- ISSN: 2278-8727Volume 1, Issue 4 (Mar. - Apr. 213), PP 12-2 Sequential Pattern Mining Methods: A Snap Shot Niti Desai 1, Amit Ganatra

More information

Data Mining Part 3. Associations Rules

Data Mining Part 3. Associations Rules Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets

More information

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery : Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Hong Cheng Philip S. Yu Jiawei Han University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center {hcheng3, hanj}@cs.uiuc.edu,

More information

Comparing the Performance of Frequent Itemsets Mining Algorithms

Comparing the Performance of Frequent Itemsets Mining Algorithms Comparing the Performance of Frequent Itemsets Mining Algorithms Kalash Dave 1, Mayur Rathod 2, Parth Sheth 3, Avani Sakhapara 4 UG Student, Dept. of I.T., K.J.Somaiya College of Engineering, Mumbai, India

More information

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42 Pattern Mining Knowledge Discovery and Data Mining 1 Roman Kern KTI, TU Graz 2016-01-14 Roman Kern (KTI, TU Graz) Pattern Mining 2016-01-14 1 / 42 Outline 1 Introduction 2 Apriori Algorithm 3 FP-Growth

More information

Mining of Web Server Logs using Extended Apriori Algorithm

Mining of Web Server Logs using Extended Apriori Algorithm International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

An Effective Process for Finding Frequent Sequential Traversal Patterns on Varying Weight Range

An Effective Process for Finding Frequent Sequential Traversal Patterns on Varying Weight Range 13 IJCSNS International Journal of Computer Science and Network Security, VOL.16 No.1, January 216 An Effective Process for Finding Frequent Sequential Traversal Patterns on Varying Weight Range Abhilasha

More information

Applying Data Mining to Wireless Networks

Applying Data Mining to Wireless Networks Applying Data Mining to Wireless Networks CHENG-MING HUANG 1, TZUNG-PEI HONG 2 and SHI-JINN HORNG 3,4 1 Department of Electrical Engineering National Taiwan University of Science and Technology, Taipei,

More information

PSEUDO PROJECTION BASED APPROACH TO DISCOVERTIME INTERVAL SEQUENTIAL PATTERN

PSEUDO PROJECTION BASED APPROACH TO DISCOVERTIME INTERVAL SEQUENTIAL PATTERN PSEUDO PROJECTION BASED APPROACH TO DISCOVERTIME INTERVAL SEQUENTIAL PATTERN Dvijesh Bhatt Department of Information Technology, Institute of Technology, Nirma University Gujarat,( India) ABSTRACT Data

More information

Web page recommendation using a stochastic process model

Web page recommendation using a stochastic process model Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,

More information

A Comprehensive Survey on Sequential Pattern Mining

A Comprehensive Survey on Sequential Pattern Mining A Comprehensive Survey on Sequential Pattern Mining Irfan Khan 1 Department of computer Application, S.A.T.I. Vidisha, (M.P.), India Anoop Jain 2 Department of computer Application, S.A.T.I. Vidisha, (M.P.),

More information

CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets

CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets Jianyong Wang, Jiawei Han, Jian Pei Presentation by: Nasimeh Asgarian Department of Computing Science University of Alberta

More information

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Sequence Data Sequence Database: Timeline 10 15 20 25 30 35 Object Timestamp Events A 10 2, 3, 5 A 20 6, 1 A 23 1 B 11 4, 5, 6 B

More information

An Algorithm for Mining Frequent Itemsets from Library Big Data

An Algorithm for Mining Frequent Itemsets from Library Big Data JOURNAL OF SOFTWARE, VOL. 9, NO. 9, SEPTEMBER 2014 2361 An Algorithm for Mining Frequent Itemsets from Library Big Data Xingjian Li lixingjianny@163.com Library, Nanyang Institute of Technology, Nanyang,

More information

Ascending Frequency Ordered Prefix-tree: Efficient Mining of Frequent Patterns

Ascending Frequency Ordered Prefix-tree: Efficient Mining of Frequent Patterns Ascending Frequency Ordered Prefix-tree: Efficient Mining of Frequent Patterns Guimei Liu Hongjun Lu Dept. of Computer Science The Hong Kong Univ. of Science & Technology Hong Kong, China {cslgm, luhj}@cs.ust.hk

More information

PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth

PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth PrefixSpan: ining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth Jian Pei Jiawei Han Behzad ortazavi-asl Helen Pinto Intelligent Database Systems Research Lab. School of Computing Science

More information

An Algorithm for Mining Large Sequences in Databases

An Algorithm for Mining Large Sequences in Databases 149 An Algorithm for Mining Large Sequences in Databases Bharat Bhasker, Indian Institute of Management, Lucknow, India, bhasker@iiml.ac.in ABSTRACT Frequent sequence mining is a fundamental and essential

More information

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE Saravanan.Suba Assistant Professor of Computer Science Kamarajar Government Art & Science College Surandai, TN, India-627859 Email:saravanansuba@rediffmail.com

More information

APPLYING BIT-VECTOR PROJECTION APPROACH FOR EFFICIENT MINING OF N-MOST INTERESTING FREQUENT ITEMSETS

APPLYING BIT-VECTOR PROJECTION APPROACH FOR EFFICIENT MINING OF N-MOST INTERESTING FREQUENT ITEMSETS APPLYIG BIT-VECTOR PROJECTIO APPROACH FOR EFFICIET MIIG OF -MOST ITERESTIG FREQUET ITEMSETS Zahoor Jan, Shariq Bashir, A. Rauf Baig FAST-ational University of Computer and Emerging Sciences, Islamabad

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING Sequence Data: Sequential Pattern Mining Instructor: Yizhou Sun yzsun@cs.ucla.edu November 27, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification

More information

An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams *

An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams * An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams * Jia-Ling Koh and Shu-Ning Shin Department of Computer Science and Information Engineering National Taiwan Normal University

More information

WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity

WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity Unil Yun and John J. Leggett Department of Computer Science Texas A&M University College Station, Texas 7783, USA

More information

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining P.Subhashini 1, Dr.G.Gunasekaran 2 Research Scholar, Dept. of Information Technology, St.Peter s University,

More information

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3

More information

Discover Sequential Patterns in Incremental Database

Discover Sequential Patterns in Incremental Database Discover Sequential Patterns in Incremental Database Nancy P. Lin, Wei-Hua Hao, Hung-Jen Chen, Hao-En, and Chueh, Chung-I Chang Abstract The task of sequential pattern mining is to discover the complete

More information

CS570 Introduction to Data Mining

CS570 Introduction to Data Mining CS570 Introduction to Data Mining Frequent Pattern Mining and Association Analysis Cengiz Gunay Partial slide credits: Li Xiong, Jiawei Han and Micheline Kamber George Kollios 1 Mining Frequent Patterns,

More information

Association Rule Mining

Association Rule Mining Association Rule Mining Generating assoc. rules from frequent itemsets Assume that we have discovered the frequent itemsets and their support How do we generate association rules? Frequent itemsets: {1}

More information

An Improved Frequent Pattern-growth Algorithm Based on Decomposition of the Transaction Database

An Improved Frequent Pattern-growth Algorithm Based on Decomposition of the Transaction Database Algorithm Based on Decomposition of the Transaction Database 1 School of Management Science and Engineering, Shandong Normal University,Jinan, 250014,China E-mail:459132653@qq.com Fei Wei 2 School of Management

More information

Item Set Extraction of Mining Association Rule

Item Set Extraction of Mining Association Rule Item Set Extraction of Mining Association Rule Shabana Yasmeen, Prof. P.Pradeep Kumar, A.Ranjith Kumar Department CSE, Vivekananda Institute of Technology and Science, Karimnagar, A.P, India Abstract:

More information

Searching frequent itemsets by clustering data: towards a parallel approach using MapReduce

Searching frequent itemsets by clustering data: towards a parallel approach using MapReduce Searching frequent itemsets by clustering data: towards a parallel approach using MapReduce Maria Malek and Hubert Kadima EISTI-LARIS laboratory, Ave du Parc, 95011 Cergy-Pontoise, FRANCE {maria.malek,hubert.kadima}@eisti.fr

More information

Mining Frequent Itemsets for data streams over Weighted Sliding Windows

Mining Frequent Itemsets for data streams over Weighted Sliding Windows Mining Frequent Itemsets for data streams over Weighted Sliding Windows Pauray S.M. Tsai Yao-Ming Chen Department of Computer Science and Information Engineering Minghsin University of Science and Technology

More information

International Journal of Scientific Research and Reviews

International Journal of Scientific Research and Reviews Research article Available online www.ijsrr.org ISSN: 2279 0543 International Journal of Scientific Research and Reviews A Survey of Sequential Rule Mining Algorithms Sachdev Neetu and Tapaswi Namrata

More information

ClaSP: An Efficient Algorithm for Mining Frequent Closed Sequences

ClaSP: An Efficient Algorithm for Mining Frequent Closed Sequences ClaSP: An Efficient Algorithm for Mining Frequent Closed Sequences Antonio Gomariz 1,, Manuel Campos 2,RoqueMarin 1, and Bart Goethals 3 1 Information and Communication Engineering Dept., University of

More information

A New Fast Vertical Method for Mining Frequent Patterns

A New Fast Vertical Method for Mining Frequent Patterns International Journal of Computational Intelligence Systems, Vol.3, No. 6 (December, 2010), 733-744 A New Fast Vertical Method for Mining Frequent Patterns Zhihong Deng Key Laboratory of Machine Perception

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

Mining Temporal Association Rules in Network Traffic Data

Mining Temporal Association Rules in Network Traffic Data Mining Temporal Association Rules in Network Traffic Data Guojun Mao Abstract Mining association rules is one of the most important and popular task in data mining. Current researches focus on discovering

More information

Mining Quantitative Association Rules on Overlapped Intervals

Mining Quantitative Association Rules on Overlapped Intervals Mining Quantitative Association Rules on Overlapped Intervals Qiang Tong 1,3, Baoping Yan 2, and Yuanchun Zhou 1,3 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China {tongqiang,

More information

Improved Frequent Pattern Mining Algorithm with Indexing

Improved Frequent Pattern Mining Algorithm with Indexing IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov Dec. 2014), PP 73-78 Improved Frequent Pattern Mining Algorithm with Indexing Prof.

More information

Maintenance of the Prelarge Trees for Record Deletion

Maintenance of the Prelarge Trees for Record Deletion 12th WSEAS Int. Conf. on APPLIED MATHEMATICS, Cairo, Egypt, December 29-31, 2007 105 Maintenance of the Prelarge Trees for Record Deletion Chun-Wei Lin, Tzung-Pei Hong, and Wen-Hsiang Lu Department of

More information

Fast Accumulation Lattice Algorithm for Mining Sequential Patterns

Fast Accumulation Lattice Algorithm for Mining Sequential Patterns Proceedings of the 6th WSEAS International Conference on Applied Coputer Science, Hangzhou, China, April 15-17, 2007 229 Fast Accuulation Lattice Algorith for Mining Sequential Patterns NANCY P. LIN, WEI-HUA

More information

Frequent Itemsets Melange

Frequent Itemsets Melange Frequent Itemsets Melange Sebastien Siva Data Mining Motivation and objectives Finding all frequent itemsets in a dataset using the traditional Apriori approach is too computationally expensive for datasets

More information

A New Fast Clustering Algorithm Based on Reference and Density

A New Fast Clustering Algorithm Based on Reference and Density A New Fast Clustering Algorithm Based on Reference and Density Shuai Ma 1, TengJiao Wang 1, ShiWei Tang 1,2, DongQing Yang 1, and Jun Gao 1 1 Department of Computer Science, Peking University, Beijing

More information

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach ABSTRACT G.Ravi Kumar 1 Dr.G.A. Ramachandra 2 G.Sunitha 3 1. Research Scholar, Department of Computer Science &Technology,

More information

Data Mining: Concepts and Techniques. Chapter Mining sequence patterns in transactional databases

Data Mining: Concepts and Techniques. Chapter Mining sequence patterns in transactional databases Data Mining: Concepts and Techniques Chapter 8 8.3 Mining sequence patterns in transactional databases Jiawei Han and Micheline Kamber Department of Computer Science University of Illinois at Urbana-Champaign

More information

Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data

Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data Shilpa Department of Computer Science & Engineering Haryana College of Technology & Management, Kaithal, Haryana, India

More information

A SHORTEST ANTECEDENT SET ALGORITHM FOR MINING ASSOCIATION RULES*

A SHORTEST ANTECEDENT SET ALGORITHM FOR MINING ASSOCIATION RULES* A SHORTEST ANTECEDENT SET ALGORITHM FOR MINING ASSOCIATION RULES* 1 QING WEI, 2 LI MA 1 ZHAOGAN LU 1 School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450002,

More information

Roadmap. PCY Algorithm

Roadmap. PCY Algorithm 1 Roadmap Frequent Patterns A-Priori Algorithm Improvements to A-Priori Park-Chen-Yu Algorithm Multistage Algorithm Approximate Algorithms Compacting Results Data Mining for Knowledge Management 50 PCY

More information

PTclose: A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets

PTclose: A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets : A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets J. Tahmores Nezhad ℵ, M.H.Sadreddini Abstract In recent years, various algorithms for mining closed frequent

More information

Mining High Average-Utility Itemsets

Mining High Average-Utility Itemsets Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Mining High Itemsets Tzung-Pei Hong Dept of Computer Science and Information Engineering

More information

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH International Journal of Information Technology and Knowledge Management January-June 2011, Volume 4, No. 1, pp. 27-32 DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY)

More information

Frequent Pattern Mining

Frequent Pattern Mining Frequent Pattern Mining...3 Frequent Pattern Mining Frequent Patterns The Apriori Algorithm The FP-growth Algorithm Sequential Pattern Mining Summary 44 / 193 Netflix Prize Frequent Pattern Mining Frequent

More information

EFFICIENT TRANSACTION REDUCTION IN ACTIONABLE PATTERN MINING FOR HIGH VOLUMINOUS DATASETS BASED ON BITMAP AND CLASS LABELS

EFFICIENT TRANSACTION REDUCTION IN ACTIONABLE PATTERN MINING FOR HIGH VOLUMINOUS DATASETS BASED ON BITMAP AND CLASS LABELS EFFICIENT TRANSACTION REDUCTION IN ACTIONABLE PATTERN MINING FOR HIGH VOLUMINOUS DATASETS BASED ON BITMAP AND CLASS LABELS K. Kavitha 1, Dr.E. Ramaraj 2 1 Assistant Professor, Department of Computer Science,

More information

USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS

USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS INFORMATION SYSTEMS IN MANAGEMENT Information Systems in Management (2017) Vol. 6 (3) 213 222 USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS PIOTR OŻDŻYŃSKI, DANUTA ZAKRZEWSKA Institute of Information

More information

Parallel Mining of Maximal Frequent Itemsets in PC Clusters

Parallel Mining of Maximal Frequent Itemsets in PC Clusters Proceedings of the International MultiConference of Engineers and Computer Scientists 28 Vol I IMECS 28, 19-21 March, 28, Hong Kong Parallel Mining of Maximal Frequent Itemsets in PC Clusters Vong Chan

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013-2017 Han, Kamber & Pei. All

More information

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Marek Wojciechowski, Krzysztof Galecki, Krzysztof Gawronek Poznan University of Technology Institute of Computing Science ul.

More information

Survey: Efficent tree based structure for mining frequent pattern from transactional databases

Survey: Efficent tree based structure for mining frequent pattern from transactional databases IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 9, Issue 5 (Mar. - Apr. 2013), PP 75-81 Survey: Efficent tree based structure for mining frequent pattern from

More information

FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking

FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking Shariq Bashir National University of Computer and Emerging Sciences, FAST House, Rohtas Road,

More information

CSCI6405 Project - Association rules mining

CSCI6405 Project - Association rules mining CSCI6405 Project - Association rules mining Xuehai Wang xwang@ca.dalc.ca B00182688 Xiaobo Chen xiaobo@ca.dal.ca B00123238 December 7, 2003 Chen Shen cshen@cs.dal.ca B00188996 Contents 1 Introduction: 2

More information

Incremental Mining of Frequent Patterns Without Candidate Generation or Support Constraint

Incremental Mining of Frequent Patterns Without Candidate Generation or Support Constraint Incremental Mining of Frequent Patterns Without Candidate Generation or Support Constraint William Cheung and Osmar R. Zaïane University of Alberta, Edmonton, Canada {wcheung, zaiane}@cs.ualberta.ca Abstract

More information

A Novel Method of Optimizing Website Structure

A Novel Method of Optimizing Website Structure A Novel Method of Optimizing Website Structure Mingjun Li 1, Mingxin Zhang 2, Jinlong Zheng 2 1 School of Computer and Information Engineering, Harbin University of Commerce, Harbin, 150028, China 2 School

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics

More information

A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition

A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition S.Vigneswaran 1, M.Yashothai 2 1 Research Scholar (SRF), Anna University, Chennai.

More information

Part 2. Mining Patterns in Sequential Data

Part 2. Mining Patterns in Sequential Data Part 2 Mining Patterns in Sequential Data Sequential Pattern Mining: Definition Given a set of sequences, where each sequence consists of a list of elements and each element consists of a set of items,

More information

A Quantified Approach for large Dataset Compression in Association Mining

A Quantified Approach for large Dataset Compression in Association Mining IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 15, Issue 3 (Nov. - Dec. 2013), PP 79-84 A Quantified Approach for large Dataset Compression in Association Mining

More information

Improving Efficiency of Apriori Algorithms for Sequential Pattern Mining

Improving Efficiency of Apriori Algorithms for Sequential Pattern Mining Bonfring International Journal of Data Mining, Vol. 4, No. 1, March 214 1 Improving Efficiency of Apriori Algorithms for Sequential Pattern Mining Alpa Reshamwala and Dr. Sunita Mahajan Abstract--- Computer

More information

FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data

FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data Qiankun Zhao Nanyang Technological University, Singapore and Sourav S. Bhowmick Nanyang Technological University,

More information

A Comparative study of CARM and BBT Algorithm for Generation of Association Rules

A Comparative study of CARM and BBT Algorithm for Generation of Association Rules A Comparative study of CARM and BBT Algorithm for Generation of Association Rules Rashmi V. Mane Research Student, Shivaji University, Kolhapur rvm_tech@unishivaji.ac.in V.R.Ghorpade Principal, D.Y.Patil

More information

Binary Sequences and Association Graphs for Fast Detection of Sequential Patterns

Binary Sequences and Association Graphs for Fast Detection of Sequential Patterns Binary Sequences and Association Graphs for Fast Detection of Sequential Patterns Selim Mimaroglu, Dan A. Simovici Bahcesehir University,Istanbul, Turkey, selim.mimaroglu@gmail.com University of Massachusetts

More information

Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory

Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining Gyozo Gidofalvi Uppsala Database Laboratory Announcements Updated material for assignment 3 on the lab course home

More information

Chapter 13, Sequence Data Mining

Chapter 13, Sequence Data Mining CSI 4352, Introduction to Data Mining Chapter 13, Sequence Data Mining Young-Rae Cho Associate Professor Department of Computer Science Baylor University Topics Single Sequence Mining Frequent sequence

More information

An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent itemsets

An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent itemsets IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.8, August 2008 121 An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent itemsets

More information

Adaption of Fast Modified Frequent Pattern Growth approach for frequent item sets mining in Telecommunication Industry

Adaption of Fast Modified Frequent Pattern Growth approach for frequent item sets mining in Telecommunication Industry American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-4, Issue-12, pp-126-133 www.ajer.org Research Paper Open Access Adaption of Fast Modified Frequent Pattern Growth

More information

ETP-Mine: An Efficient Method for Mining Transitional Patterns

ETP-Mine: An Efficient Method for Mining Transitional Patterns ETP-Mine: An Efficient Method for Mining Transitional Patterns B. Kiran Kumar 1 and A. Bhaskar 2 1 Department of M.C.A., Kakatiya Institute of Technology & Science, A.P. INDIA. kirankumar.bejjanki@gmail.com

More information

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM G.Amlu #1 S.Chandralekha #2 and PraveenKumar *1 # B.Tech, Information Technology, Anand Institute of Higher Technology, Chennai, India

More information

Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports

Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports R. Uday Kiran P. Krishna Reddy Center for Data Engineering International Institute of Information Technology-Hyderabad Hyderabad,

More information

Performance and Scalability: Apriori Implementa6on

Performance and Scalability: Apriori Implementa6on Performance and Scalability: Apriori Implementa6on Apriori R. Agrawal and R. Srikant. Fast algorithms for mining associa6on rules. VLDB, 487 499, 1994 Reducing Number of Comparisons Candidate coun6ng:

More information

A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study

A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study Mirzaei.Afshin 1, Sheikh.Reza 2 1 Department of Industrial Engineering and

More information

Parallel Popular Crime Pattern Mining in Multidimensional Databases

Parallel Popular Crime Pattern Mining in Multidimensional Databases Parallel Popular Crime Pattern Mining in Multidimensional Databases BVS. Varma #1, V. Valli Kumari *2 # Department of CSE, Sri Venkateswara Institute of Science & Information Technology Tadepalligudem,

More information

Improved Algorithm for Frequent Item sets Mining Based on Apriori and FP-Tree

Improved Algorithm for Frequent Item sets Mining Based on Apriori and FP-Tree Global Journal of Computer Science and Technology Software & Data Engineering Volume 13 Issue 2 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

An Algorithm for Frequent Pattern Mining Based On Apriori

An Algorithm for Frequent Pattern Mining Based On Apriori An Algorithm for Frequent Pattern Mining Based On Goswami D.N.*, Chaturvedi Anshu. ** Raghuvanshi C.S.*** *SOS In Computer Science Jiwaji University Gwalior ** Computer Application Department MITS Gwalior

More information

Performance Based Study of Association Rule Algorithms On Voter DB

Performance Based Study of Association Rule Algorithms On Voter DB Performance Based Study of Association Rule Algorithms On Voter DB K.Padmavathi 1, R.Aruna Kirithika 2 1 Department of BCA, St.Joseph s College, Thiruvalluvar University, Cuddalore, Tamil Nadu, India,

More information

A Mining Algorithm to Generate the Candidate Pattern for Authorship Attribution for Filtering Spam Mail

A Mining Algorithm to Generate the Candidate Pattern for Authorship Attribution for Filtering Spam Mail A Mining Algorithm to Generate the Candidate Pattern for Authorship Attribution for Filtering Spam Mail Khongbantabam Susila Devi #1, Dr. R. Ravi *2 1 Research Scholar, Department of Information & Communication

More information

RHUIET : Discovery of Rare High Utility Itemsets using Enumeration Tree

RHUIET : Discovery of Rare High Utility Itemsets using Enumeration Tree International Journal for Research in Engineering Application & Management (IJREAM) ISSN : 2454-915 Vol-4, Issue-3, June 218 RHUIET : Discovery of Rare High Utility Itemsets using Enumeration Tree Mrs.

More information

Mining Recent Frequent Itemsets in Data Streams with Optimistic Pruning

Mining Recent Frequent Itemsets in Data Streams with Optimistic Pruning Mining Recent Frequent Itemsets in Data Streams with Optimistic Pruning Kun Li 1,2, Yongyan Wang 1, Manzoor Elahi 1,2, Xin Li 3, and Hongan Wang 1 1 Institute of Software, Chinese Academy of Sciences,

More information

Sequential Pattern Mining A Study

Sequential Pattern Mining A Study Sequential Pattern Mining A Study S.Vijayarani Assistant professor Department of computer science Bharathiar University S.Deepa M.Phil Research Scholar Department of Computer Science Bharathiar University

More information