CHAPTER 3 ASSOCIATION RULE MINING WITH LEVELWISE AUTOMATIC SUPPORT THRESHOLDS

This chapter introduces the concepts of association rule mining. It also proposes two algorithms, based on Apriori, that calculate and use support thresholds automatically.

3.1 BASIC CONCEPTS

Association rules were first introduced by Agrawal et al (1993) as a means of determining relationships among a set of items in a dataset. Association rule mining is a type of unsupervised learning which has been applied to many fields such as the retail industry, web mining and text mining. The most familiar and renowned example of association rule mining is Market Basket Analysis (MBA).

The problem of association rule mining is generally divided into two sub-problems. The first is to find frequent itemsets; the second is to discover association rules from those frequent itemsets. The extraction of frequent itemsets is the process of extracting, from a dataset D, the sets of items whose frequency (i.e. the number of times the items occur together) is greater than a given threshold. It is then possible to generate association rules of the form A → B, relating a subset of items A with a subset of items B. Such a rule can be interpreted as follows: an itemset A relates to an itemset B with a certain support and a certain confidence.

The number of itemsets and rules that can be extracted from a dataset may be very large. To allow the subsequent interpretation of the extracted set of itemsets and the set of extracted rules, the extracted units need to be pruned. In the following, the principles of frequent itemset search and of the extraction of association rules are introduced.

3.1.1 Frequent Itemset Search

Definition 3.1: Consider a transaction dataset D which comprises a set of records R. A record in R consists of a set of items. An itemset, or a pattern, corresponds to a set of items. The number of items in an itemset determines the length of the itemset. The support of an itemset corresponds to the number of records which include the itemset. An itemset is said to be frequent if its support is greater than or equal to a given support threshold called minimum support (minsup).

To evaluate the performance of the existing and newly proposed algorithms, the datasets in Table 3.4 are used in the thesis. However, a simple dataset D (Table 3.1) is used throughout the thesis to illustrate a running example of those algorithms.

Table 3.1 Dataset D

Record Number   Items
1               Bread, milk
2               Bread, jam
3               Milk, egg
4               Milk, sugar
5               Bread, milk, egg
6               Bread, jam, milk
7               Bread, jam
8               Bread, egg
9               Milk, sugar, egg
10              Milk, egg, sugar, bread
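As a concrete illustration of Definition 3.1, the following minimal Java sketch (illustrative only, not part of the thesis implementation) counts the support of an itemset over the records of dataset D from Table 3.1 and tests whether it is frequent for minsup = 2; the helper names are chosen here for readability.

import java.util.*;

public class SupportCount {

    // One record of dataset D as a set of items.
    static Set<String> rec(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // Dataset D from Table 3.1.
    static final List<Set<String>> D = Arrays.asList(
        rec("bread", "milk"), rec("bread", "jam"), rec("milk", "egg"),
        rec("milk", "sugar"), rec("bread", "milk", "egg"),
        rec("bread", "jam", "milk"), rec("bread", "jam"), rec("bread", "egg"),
        rec("milk", "sugar", "egg"), rec("milk", "egg", "sugar", "bread"));

    // Support of an itemset = number of records containing every item of it.
    static int support(Set<String> itemset) {
        int count = 0;
        for (Set<String> record : D) {
            if (record.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int minsup = 2;
        Set<String> itemset = rec("bread", "milk", "egg");
        int sup = support(itemset);
        // Prints: support = 2, frequent = true (records 5 and 10 contain the itemset).
        System.out.println("support = " + sup + ", frequent = " + (sup >= minsup));
    }
}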

For example, considering the dataset D in Table 3.1, with minsup = 2, in the first level {bread} is a frequent itemset of length 1 and of support 7; {bread, milk} is of length 2, of support 4, and frequent; {bread, milk, egg} is of length 3, of support 2, and frequent; {bread, milk, egg, sugar} is of length 4, of support 1, and not frequent. It can be noticed that the support is a monotonically decreasing function with respect to the length of an itemset.

When the number of items in D is equal to n, the number of potential itemsets is equal to 2^n. Thus, a direct search for the frequent itemsets is not conceivable. Heuristic methods have to be used for pruning the set of all itemsets to be tested. This is the purpose of the levelwise search of frequent itemsets, and the well-known Apriori algorithm (Agrawal et al 1993, Agrawal et al 1994, Mannila et al 1994) relies on two fundamental and dual principles: (i) every subset of a frequent itemset is a frequent itemset, (ii) every superset of an infrequent itemset is infrequent. Apriori can further be summarized as follows:

1. The search for frequent itemsets starts with the search for frequent itemsets of length 1.
2. The initial frequent itemsets are found and combined together to form candidate itemsets of greater length.
3. The infrequent itemsets are removed and, by consequence, all their superitemsets are also removed.
4. The candidate itemsets are then tested, and the process continues in the same way until no more candidates can be produced.

For example, considering the dataset in Table 3.1, with minsup = 2, the frequent itemsets of length 1, with their support, are: {bread} (7), {milk} (6), {egg} (5), {jam} (3), {sugar} (3). All items in the first level are found to be frequent. Then the candidates of length 2 are formed by combining the frequent itemsets of length 1, e.g. {bread, milk}, {bread, egg}, {bread, jam}, {bread, sugar}, etc., and then tested. The frequent itemsets of length 2 are: {bread, milk} (4), {bread, egg} (3), {bread, jam} (3), {milk, egg} (4), {milk, sugar} (3). The candidates of length 3 are formed and tested. The frequent itemsets of length 3 are: {bread, milk, egg} (2), {milk, sugar, egg} (2). Finally, the candidate of length 4 is formed, i.e. {bread, milk, sugar, egg}, tested and found not to be a frequent itemset. No other candidates can be formed, and the algorithm terminates.

3.1.2 Association Rule Extraction

Definition 3.2: An association rule has the form A → B, where A and B are two itemsets. The support of the rule A → B is defined as the support of the itemset A ∪ B. The confidence of the rule A → B is defined as sup(A ∪ B) / sup(A). The confidence can be represented as a conditional probability P(B|A), i.e. the probability of B knowing A. A rule is said to be valid if its confidence is greater than or equal to a confidence threshold or minimum confidence (minconf), and its support is greater than or equal to the support threshold or minimum support (minsup). A valid rule can only be extracted from a frequent itemset. A rule is said to be exact if its confidence is equal to 1 (100%); otherwise the rule is approximate.

For example, with minsup = 3 and minconf = 50%, {bread, milk} is frequent, and the rule bread → milk is valid (with support 4 and confidence 4/7); the rule bread → jam is not valid (with support 3 and confidence 3/7). The generation of valid association rules from frequent itemsets of length greater than or equal to two proceeds in a similar way to the search for frequent itemsets.
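The following minimal Java sketch (again illustrative, not from the thesis) applies Definition 3.2 to the two rules of the example above, using the support values already quoted for dataset D; isValid is a hypothetical helper.

public class RuleValidity {

    // Returns true when the rule A -> B is valid: sup(A ∪ B) >= minsup and
    // sup(A ∪ B) / sup(A) >= minconf (Definition 3.2).
    static boolean isValid(int supAB, int supA, int minsup, double minconf) {
        double confidence = (double) supAB / supA;   // conditional probability P(B|A)
        return supAB >= minsup && confidence >= minconf;
    }

    public static void main(String[] args) {
        int minsup = 3;
        double minconf = 0.5;                        // 50 %

        // bread -> milk: sup({bread, milk}) = 4, sup({bread}) = 7, conf = 4/7 ≈ 0.57
        System.out.println("bread -> milk valid: " + isValid(4, 7, minsup, minconf));

        // bread -> jam: sup({bread, jam}) = 3, sup({bread}) = 7, conf = 3/7 ≈ 0.43
        System.out.println("bread -> jam valid:  " + isValid(3, 7, minsup, minconf));
    }
}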

3.2 LEVELWISE MINING ALGORITHMS

The most well-known algorithm of this kind is Apriori. This algorithm addresses the problem of finding all frequent itemsets in a dataset. Apriori has been followed by many variations, and several of these levelwise algorithms concentrate on a special subset of frequent itemsets, like closed itemsets or generators.

The levelwise algorithm for finding all frequent itemsets is a breadth-first and bottom-up algorithm. It means the following: first it finds all 1-long frequent itemsets, then at each i-th iteration it identifies all i-long frequent itemsets. The algorithm stops when it has identified the largest frequent itemset. Frequent itemsets are computed iteratively, in ascending order of their length. This approach is very simple and efficient for sparse, weakly correlated data. The levelwise algorithm is based on two basic properties.

Property 3.1 (downward closure): All subsets of a frequent itemset are frequent.

Property 3.2 (anti-monotonicity): All supersets of a non-frequent itemset are non-frequent.

3.2.1 Levelwise Exploration of the Frequent Itemsets

The levelwise algorithm discovers frequent itemsets in a level by level manner. At each level i, it uses the i-long frequent itemsets to generate their (i+1)-long supersets. These supersets are called candidates, and only the frequent ones are kept. For storing itemsets, two kinds of tables are used: F_i for i-long frequent itemsets, and C_i for i-long candidates. An itemset of length (i+1) can only be frequent if all its i-long subsets are frequent. If it has an i-long subset not present in F_i, then it has an infrequent subset, and by Property 3.2 the candidate is also infrequent and can be pruned.

With each database pass, the support of the candidates is counted, and itemsets that turn out to be infrequent are pruned. The frequent (i+1)-long itemsets are then used to generate (i+2)-long candidates, etc. The process continues until no new candidates can be generated.

The generation of (i+1)-long candidates from i-long frequent itemsets consists of two steps. First, in the join step, table F_i is joined with itself. Next, in the prune step, candidates in C_{i+1} are deleted if they have an i-long subset not present in F_i. This way itemsets with an infrequent subset are pruned and only the remaining candidates are kept in C_{i+1}. As an example of the join step, consider the F_2 table of dataset D: {(bread, milk), (bread, egg), (bread, jam), (milk, egg), (milk, sugar)}. After the join step, C_3 is: {(bread, milk, egg), (bread, milk, jam), (milk, egg, sugar)}. In the prune step, (bread, milk, jam) is deleted, because its 2-long subset (milk, jam) is not present in F_2; the other candidates are kept in C_3.

The candidate generation and the support counting process require a subset test. In candidate generation, the subsets of an (i+1)-long candidate need to be identified in F_i. In the support counting process, each record in the dataset is read; for each record, the candidates in C_k contained in that record are found, and the support value of each such candidate is incremented by 1. Since these operations must be performed many times, they must be implemented very efficiently for good performance. In this thesis, all the algorithms use a trie data structure (Aho et al 1985) for the subset test.

3.3 DYNAMIC ADAPTIVE SUPPORT APRIORI

3.3.1 Motivation

The support distributions of data items have a strong influence on the performance of association rule mining algorithms. In reality, the support distribution of itemsets is highly skewed. The majority of the items have relatively low support values while a small fraction of them have very high support values.

Datasets that exhibit such a highly skewed support distribution are shown in Table 3.2. This table lists the minimum support value among the items (minsup), the maximum support value among the items (maxsup) and the support distribution of items in various datasets.

Table 3.2 Effect of Skewed Support Distribution

Dataset      minsup (%)   maxsup (%)   Support distribution in % of items
                                       (<1 | 1 to 10 | 11 to 30 | 31 to 60 | 61 to 90 | >90)
Mushroom
Chess
C2D1K
T2I6D1K
T25I1D1K

The T2I6D1K and T25I1D1K synthetic datasets imitate market basket data, which are typically sparse and weakly correlated. Mushroom, Chess and C2D1K are highly correlated datasets. Mushroom describes the characteristics of mushrooms, Chess describes a game dataset and C2D1K is a census dataset. These three datasets reflect the characteristics of real life datasets, and all five datasets are used to carry out the experiments in this thesis. In these datasets, most of the items have low support and only a small number of items have high support. The Chess dataset is an exception: it is a highly dense dataset which contains all ranges of support values.

Choosing the right support threshold for mining these datasets is quite tricky. If the threshold is set too high, then many frequent itemsets involving the low support items will be missed.

In market basket analysis, such low support items may correspond to expensive products that are seldom bought by customers, but whose patterns are still interesting to retailers. Conversely, when the threshold is set too low, it becomes difficult to find the association rules for the following reasons. First, the computational and memory requirements of association rule mining algorithms increase considerably with low support thresholds. Second, the number of extracted patterns also increases substantially, and many of them relate a high-frequency item to a low-frequency item (for example, bread to a gold ring). Such patterns are likely to be spurious. However, the actual number of frequent items depends greatly on the support threshold that is chosen. Similarly, the possible number of association rules is large and is sensitive to the chosen support threshold.

For example, considering the dataset in Table 3.1, with minsup = 2, the frequent itemsets of length 1 are {bread} (7), {milk} (6), {egg} (5), {jam} (3), {sugar} (3). The frequent itemsets of length 2 are {bread, milk} (4), {bread, egg} (3), {bread, jam} (3), {milk, egg} (4), {milk, sugar} (3). The frequent itemsets of length 3 are {bread, milk, egg} (2), {milk, sugar, egg} (2). If minsup is set to 3, then the algorithm terminates at the second level itself.

At the initial levels, the support of items will be high, whereas in subsequent levels the support of the combinations of items will be low. For example, in level 2 the itemset {bread, milk} appears 4 times in the dataset, whereas in level 3 the itemset {bread, milk, egg}, which is a superset of {bread, milk}, appears only twice in the dataset. Hence it is necessary to reduce the minsup threshold in subsequent levels. At each level the support distribution of items has to be analyzed and a suitable minsup threshold has to be chosen. This helps to extract a larger number of lengthy frequent itemsets.

3.3.2 Computation of minsup Threshold

In view of this, an algorithm based on Apriori, called Dynamic Adaptive Support Apriori (DAS_Apriori), is proposed for mining association rules. It employs a new method for calculating the minsup threshold and for mining the large frequent itemsets and frequent association rules. An automatic support threshold should satisfy the following properties:

Property 3.3: The support threshold should be feasible.

Property 3.4: The support threshold should be appropriate.

A support threshold is said to be feasible if its value does not exceed the MAXS value and is not lower than the MINS value, that is, MINS ≤ minsup ≤ MAXS. To obtain an appropriate minsup threshold, the support distribution of the items in the dataset needs to be analyzed. Hence, it is necessary to conduct a statistical analysis of the support distribution of items. Here, the mean (µ) and the standard deviation (SD) are the two statistical values used to compute the appropriate threshold.

The mean (µ) is obtained by dividing the sum of support values by the number of candidates in that level. Let there be n candidates and let the support of each candidate be denoted by sup_i. The mean (µ) can be computed as in (3.1):

    µ = (sup_1 + sup_2 + ... + sup_n) / n                                  (3.1)

The standard deviation (SD) denotes how close the entire set of support values is to the mean value. If the support values lie close to the mean, then the SD will be small. If the support values spread out over a large range of values, the SD will be large. The formula for the standard deviation is given below in equation (3.2):

    SD = sqrt( Σ (sup_i − µ)² / (n − 1) )                                  (3.2)

Using the µ and SD values, the minimum and maximum bounds of the set of supports can be determined. The minimum bound of support (MINS) and the maximum bound of support (MAXS) can be calculated using equations (3.3) and (3.4):

    MINS = µ − SD                                                          (3.3)
    MAXS = µ + SD                                                          (3.4)

To widen or narrow these bounds, another threshold called the Candidate Threshold (CT) is introduced. It is based on the inequality of the mathematician Chebyshev.

Theorem 3.1: Chebyshev's inequality (Grimmett and Stirzaker, 2001): if X is a random variable with standard deviation σ, the probability that the outcome of X is at least a standard deviations away from its mean is no more than 1/a².

The fraction of observations falling between two distinct values, whose differences from the mean have the same absolute value, is related to the variance of the population. Chebyshev's theorem gives a conservative estimate of this fraction: for any population or sample, at least (1 − 1/k²) of the observations in the data set fall within k standard deviations of the mean, where k > 1.

Using the concept of z scores, Chebyshev's theorem can be restated as follows: for any population or sample, the proportion of all observations whose z score has an absolute value less than or equal to k is not less than (1 − 1/k²). For k = 1, the theorem states that the fraction of all observations having a z score between −1 and 1 is (1 − 1/1²) = 0. But for k > 1, Chebyshev's theorem provides a lower bound on the proportion of measurements that are within a certain number of standard deviations from the mean. This lower bound estimate can be very helpful when the distribution of a particular population is unknown or mathematically intractable. The bound can be used to determine how many of the items must lie close to the mean. In particular, for any positive value T, the proportion of the items that lies within T standard deviations of the mean is given in equation (3.5) as CT:

    CT = 1 − 1/T²                                                          (3.5)

For example, if T = 2, CT = 1 − 1/2² = 0.75. To ensure that at least a fraction CT of the items lies within T standard deviations of the mean, the T value is calculated from (3.5). This is given below in equation (3.6):

    T = 1 / sqrt(1 − CT)                                                   (3.6)

To ensure this minimum percentage of items, equations (3.3) and (3.4) can be modified as in equations (3.7) and (3.8):

    MINS = µ − T · SD                                                      (3.7)
    MAXS = µ + T · SD                                                      (3.8)
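As a quick numerical check of equations (3.5) and (3.6), the following minimal Java sketch (illustrative only) converts between CT and T; the method names are hypothetical.

public class CandidateThreshold {

    // Equation (3.5): fraction of items guaranteed within T standard deviations.
    static double ctFromT(double t) {
        return 1.0 - 1.0 / (t * t);
    }

    // Equation (3.6): number of standard deviations needed for a target CT.
    static double tFromCt(double ct) {
        return 1.0 / Math.sqrt(1.0 - ct);
    }

    public static void main(String[] args) {
        System.out.println("T = 2    -> CT = " + ctFromT(2.0));    // 0.75, as in the text
        System.out.println("CT = 0.75 -> T = " + tFromCt(0.75));   // 2.0
        System.out.println("CT = 0.1  -> T = " + tFromCt(0.1));    // about 1.05, used in Table 3.3
    }
}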

As mentioned before, a feasible minsup threshold should lie between the MINS and MAXS values. As a simple central value between these bounds, the minsup threshold is obtained from equation (3.9):

    minsup = (MINS + MAXS) / 2                                             (3.9)

At each level, the MINS and MAXS values, and thus the minsup value, are calculated dynamically. Let there be n k-itemsets in a level. Initially, the MINS and MAXS values are calculated. If the MAXS value is greater than the maximum among the support values, then MAXS is set to the maximum support value; it is not necessary to search for itemsets beyond the maximum support. In each subsequent level, a new minsup threshold is adapted based on the support distribution in that level, and this newly computed minsup threshold is used for itemset pruning.

3.3.3 DAS_Apriori Algorithm

The proposed algorithm, DAS_Apriori, extends the Apriori algorithm. It employs a levelwise search procedure for finding large frequent itemsets. The additional computational overhead incurred by this algorithm is compensated by using an efficient support counting procedure proposed by Szathmary et al (2005). It works as follows: if the dataset has n items, then an (n − 1) × (n − 1) upper triangular matrix is built, such as the one shown in Figure 3.1, which illustrates the support counting method for the dataset D of Table 3.1.

Figure 3.1 Upper Triangular Matrix

This matrix contains the support values of 2-itemsets. First, its entries are initialized to zero. A record of the dataset is decomposed into a list of 2-itemsets, and for each element of this list the value of its corresponding entry in the matrix is incremented by 1. This process is repeated for each record in the dataset.
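The following minimal Java sketch (illustrative, not the thesis implementation) mimics this 2-itemset counting on dataset D; for simplicity it allocates a full n × n array and uses only the cells above the diagonal, rather than a packed (n − 1) × (n − 1) triangular matrix.

import java.util.*;

public class PairSupportMatrix {

    public static void main(String[] args) {
        // Items of dataset D (Table 3.1) in a fixed order.
        List<String> items = Arrays.asList("bread", "milk", "egg", "jam", "sugar");
        int n = items.size();
        int[][] pairSupport = new int[n][n];   // only cells with i < j are used

        List<List<String>> records = Arrays.asList(
            Arrays.asList("bread", "milk"),
            Arrays.asList("bread", "jam"),
            Arrays.asList("milk", "egg"),
            Arrays.asList("milk", "sugar"),
            Arrays.asList("bread", "milk", "egg"),
            Arrays.asList("bread", "jam", "milk"),
            Arrays.asList("bread", "jam"),
            Arrays.asList("bread", "egg"),
            Arrays.asList("milk", "sugar", "egg"),
            Arrays.asList("milk", "egg", "sugar", "bread"));

        // Decompose each record into its 2-itemsets and increment the matching cell.
        for (List<String> record : records) {
            for (int a = 0; a < record.size(); a++) {
                for (int b = a + 1; b < record.size(); b++) {
                    int i = items.indexOf(record.get(a));
                    int j = items.indexOf(record.get(b));
                    pairSupport[Math.min(i, j)][Math.max(i, j)]++;
                }
            }
        }

        System.out.println("sup({bread, milk}) = " + pairSupport[0][1]);  // 4
        System.out.println("sup({milk, egg})   = " + pairSupport[1][2]);  // 4
    }
}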

To facilitate quick retrieval and lookup of subsets, the subset function is implemented using the trie data structure. A trie is a tree for storing strings in which each node corresponds to a prefix. The root is associated with the empty string, and the descendants of each node represent strings that begin with the prefix stored at that node. The name of the data structure comes from the word retrieval. Here, itemsets with their support value form a record. All itemsets are sorted in lexicographic order, and the trie is built over the itemsets when a subset operation has to be performed. Each node in the trie has a value, which is a 1-long item (an attribute). Because of the lexicographic order, the value of each child is greater than the value of its parent. An itemset is represented as a path in the trie, starting from the root node. Each node has a pointer back to its parent, and each terminal node has a pointer to its corresponding record.

Algorithm DAS_Apriori

Variables
    D       Dataset
    F       Frequent itemsets
    C       Candidate itemsets
    k       Level
    CT      Candidate Threshold
    T       Threshold calculated from CT
    µ       Mean
    SD      Standard deviation
    MINS    Minimum support bound
    MAXS    Maximum support bound

Input
    Dataset D
    Candidate Threshold CT

Output
    Large frequent itemsets F

DAS_Apriori(D, CT)
    F_1 = find_frequent_1_itemsets(D)
    for (k = 2; F_{k-1} ≠ Ø; k++) do
        C_k = candidate_gen(F_{k-1})
        for each record r in D do
            C_t = subset(C_k, r)
            for each candidate c ∈ C_t do
                c.count++
            end
        end
        minsup_k = minsup_calc(C_k, CT)
        F_k = {c ∈ C_k | c.count ≥ minsup_k}
    end
    return ∪_k F_k

// To generate candidate itemsets
Procedure candidate_gen(F_{k-1})
    for each itemset l1 ∈ F_{k-1} do
        for each itemset l2 ∈ F_{k-1} do
            c = join(l1, l2)
            if has_infrequent_subset(c, F_{k-1}) then
                prune c
            else
                add c to C_k
            end if
        end
    end
    return C_k

// To prune candidates with an infrequent subset
Procedure has_infrequent_subset(c, F_{k-1})
    for each (k−1)-subset s of c do
        if s ∉ F_{k-1} then
            return true
        end if
    end
    return false

// To calculate the minsup value
Procedure minsup_calc(C_k, CT)
    T = calculate_T(CT)
    µ_k = calculate_mean(C_k)
    SD_k = calculate_SD(C_k, µ_k)
    MINS_k = µ_k − T · SD_k
    MAXS_k = µ_k + T · SD_k
    minsup_k = average(MINS_k, MAXS_k)
    return minsup_k
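A minimal Java sketch (illustrative, not the thesis implementation) of the candidate_gen and has_infrequent_subset procedures above is given next. The F2 used here contains the frequent 2-itemsets of dataset D for minsup = 2, including {egg, sugar}, whose support in Table 3.1 is also 2.

import java.util.*;

public class CandidateGen {

    static Set<String> set(String... items) {
        return new TreeSet<>(Arrays.asList(items));
    }

    // Join pairs of (k-1)-itemsets; keep a union only if it has k items and
    // none of its (k-1)-subsets is infrequent.
    static Set<Set<String>> candidateGen(Set<Set<String>> fPrev, int k) {
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> l1 : fPrev) {
            for (Set<String> l2 : fPrev) {
                Set<String> c = new TreeSet<>(l1);
                c.addAll(l2);
                if (c.size() == k && !hasInfrequentSubset(c, fPrev)) {
                    candidates.add(c);
                }
            }
        }
        return candidates;
    }

    // True when some (k-1)-subset of c is not in F(k-1); such a candidate is pruned.
    static boolean hasInfrequentSubset(Set<String> c, Set<Set<String>> fPrev) {
        for (String item : c) {
            Set<String> subset = new TreeSet<>(c);
            subset.remove(item);
            if (!fPrev.contains(subset)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Set<String>> f2 = new HashSet<>(Arrays.asList(
            set("bread", "milk"), set("bread", "egg"), set("bread", "jam"),
            set("milk", "egg"), set("milk", "sugar"), set("egg", "sugar")));

        // Prints, in some order, the two surviving candidates {bread, egg, milk}
        // and {egg, milk, sugar}; e.g. {bread, jam, milk} is pruned because its
        // subset {jam, milk} is not in F2.
        System.out.println("C3 = " + candidateGen(f2, 3));
    }
}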

The DAS_Apriori algorithm generates all large frequent itemsets by making multiple passes over the data. In each pass, it counts the supports of the itemsets, finds the MINS and MAXS values and thus the minsup threshold. Initially, the itemsets that satisfy the minsup are retrieved. To extract a sufficient number of frequent itemsets, the user can specify the value of another threshold, CT. To calculate the T value used in (3.7) and (3.8), the CT value is chosen between 0.1 and 0.9. Hence, the performance of the algorithm depends on the CT value.

Example

The execution of the DAS_Apriori algorithm on dataset D (Table 3.1) with CT = 0.1 is illustrated in Table 3.3.

Table 3.3 Execution of DAS_Apriori with CT = 0.1 on Dataset D

C1        Sup   F1       C2            Sup   F2
Bread     7     Bread    Bread, milk   4     Bread, milk
Milk      6     Milk     Bread, egg    3     Milk, egg
Egg       5     Egg      Milk, egg     4
Jam       3
Sugar     3

During the first level the calculated values are: µ = 4.8, SD = 1.79, T = 1.05, MINS = 3, MAXS = 7 and thus minsup = 5. Hence there are three items satisfying the minsup value, and they are listed in F1. Now, three candidate itemsets are formed in the second level and listed in C2. During level 2, the values are calculated as µ = 3.67, SD = 0.58, MINS = 3, MAXS = 4 and thus minsup = 4. Two itemsets are found to be frequent and listed in F2. Only one candidate is generated for the third level. Since the SD value is 0, the algorithm terminates.
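The following minimal Java sketch (illustrative, not the thesis code) reproduces the level-1 and level-2 threshold values quoted above for CT = 0.1; rounding each bound to the nearest integer is an assumption made here so that the values match the text.

public class DasMinsupCalc {

    static long minsupCalc(double[] supports, double ct) {
        int n = supports.length;

        double mean = 0;                              // equation (3.1)
        for (double s : supports) mean += s;
        mean /= n;

        double ss = 0;                                // equation (3.2), sample SD
        for (double s : supports) ss += (s - mean) * (s - mean);
        double sd = Math.sqrt(ss / (n - 1));

        double t = 1.0 / Math.sqrt(1.0 - ct);         // equation (3.6)
        double mins = Math.round(mean - t * sd);      // equation (3.7), rounded
        double maxs = Math.round(mean + t * sd);      // equation (3.8), rounded
        return Math.round((mins + maxs) / 2.0);       // equation (3.9)
    }

    public static void main(String[] args) {
        double[] level1 = {7, 6, 5, 3, 3};            // C1 supports of dataset D
        double[] level2 = {4, 3, 4};                  // C2 supports of dataset D
        System.out.println("level 1 minsup = " + minsupCalc(level1, 0.1)); // 5
        System.out.println("level 2 minsup = " + minsupCalc(level2, 0.1)); // 4
    }
}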

3.3.4 Rule Generation

The concept of association rules was introduced by Agrawal et al (1993). The proposed algorithm adopts the confidence-based rule generation method of Apriori. The confidence threshold is used to find the frequent association rules; the confidence of a rule is its support divided by the support of its antecedent. In this process, the first step is to find all frequent itemsets F in dataset D, where sup(F) ≥ minsup. For each frequent itemset f, all nonempty subsets s of f are generated, and the rule s → (f − s) is generated if sup(f) / sup(s) ≥ minconf. The algorithm for rule generation is given below.

Algorithm Rule Generation

Input
    F_k        Frequent itemsets
    minconf    Minimum confidence

Output
    Association rules

Rule_gen(F_k, minconf)
    for each frequent itemset f ∈ F_k do
        for each nonempty subset s of f do
            if sup(f) / sup(s) ≥ minconf then
                output the rule s → (f − s)
            end if
        end
    end
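A minimal Java sketch (illustrative, not the thesis implementation) of the Rule_gen procedure is given below; it enumerates the nonempty proper subsets of a frequent itemset with a bit mask, and the support values are those quoted for dataset D in the running example.

import java.util.*;

public class RuleGen {

    // Support values of the itemsets involved, as quoted in the text.
    static final Map<Set<String>, Integer> SUPPORT = new HashMap<>();
    static {
        SUPPORT.put(new TreeSet<>(Arrays.asList("bread")), 7);
        SUPPORT.put(new TreeSet<>(Arrays.asList("milk")), 6);
        SUPPORT.put(new TreeSet<>(Arrays.asList("egg")), 5);
        SUPPORT.put(new TreeSet<>(Arrays.asList("bread", "milk")), 4);
        SUPPORT.put(new TreeSet<>(Arrays.asList("bread", "egg")), 3);
        SUPPORT.put(new TreeSet<>(Arrays.asList("milk", "egg")), 4);
        SUPPORT.put(new TreeSet<>(Arrays.asList("bread", "egg", "milk")), 2);
    }

    static void ruleGen(Set<String> f, double minconf) {
        List<String> items = new ArrayList<>(f);
        int supF = SUPPORT.get(new TreeSet<>(f));
        // Enumerate nonempty proper subsets s of f with a bit mask.
        for (int mask = 1; mask < (1 << items.size()) - 1; mask++) {
            Set<String> s = new TreeSet<>();
            for (int i = 0; i < items.size(); i++) {
                if ((mask & (1 << i)) != 0) s.add(items.get(i));
            }
            Set<String> consequent = new TreeSet<>(f);
            consequent.removeAll(s);
            double conf = (double) supF / SUPPORT.get(s);
            if (conf >= minconf) {
                System.out.printf("%s -> %s (conf = %.2f)%n", s, consequent, conf);
            }
        }
    }

    public static void main(String[] args) {
        // Emits bread,egg -> milk; bread,milk -> egg; egg,milk -> bread (conf >= 0.5).
        ruleGen(new TreeSet<>(Arrays.asList("bread", "egg", "milk")), 0.5);
    }
}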

The rule generation algorithm works on the following principle. For a frequent itemset f and nonempty subsets s1 ⊆ s2 ⊆ f, the support of s1 is greater than or equal to the support of s2. Thus, the confidence of the rule s1 → (f − s1) is necessarily less than or equal to the confidence of the rule s2 → (f − s2). Hence, if the rule s2 → (f − s2) is not confident, then neither is the rule s1 → (f − s1); conversely, if the rule s1 → (f − s1) is confident, then every rule s2 → (f − s2) with s1 ⊆ s2 is also confident. For example, if the rule milk → bread, egg is confident, then the rules milk, bread → egg and milk, egg → bread are confident as well.

For each frequent itemset f, all association rules with one item in the consequent are generated first. Then, using the sets of 1-long consequents, 2-long consequents are generated. The rules with 2 items in the consequent are kept only when their confidence is greater than or equal to minconf. The 2-long consequents are then used for generating consequents with 3 items, and so on. The confidence of the rules varies from 0% to 100%. Rules with low support and 100% confidence are considered exceptional but highly useful in analyzing critical cases. Rules with 100% confidence are known as exact rules.

3.3.5 Experimental Results

All the experiments in this thesis are carried out on an Intel Pentium IV 1.99 GHz machine running under the Fedora 10 operating system with 2.99 GB RAM. The algorithms are implemented in Java. Testing of the algorithms is carried out on five different benchmark datasets: the MUSHROOM, CHESS, C2D1K, T2I6D1K and T25I1D1K datasets are taken from the CORON platform (Szathmary et al 2005a). The characteristics of these datasets are illustrated in Table 3.4.

Table 3.4 Characteristics of Datasets used for Evaluation

Dataset      Records   Items   #Non-empty items   #Average attributes   Density (%)
Mushroom
Chess
C2D1K
T2I6D1K
T25I1D1K

Table 3.4 contains two kinds of datasets: synthetic and real life datasets. According to the frequency of items, they are further categorized into two types: sparse and dense datasets. This study uses two synthetic datasets, namely T2I6D1K and T25I1D1K, which are typically sparse; in these datasets most of the items have low support. Mushroom and C2D1K are real life, dense datasets. Chess is also a real life dataset with highly dense data covering all ranges of support values. Since these datasets reflect various kinds of support behaviour, they are chosen to carry out all experiments in this study.

Since the proposed algorithms are implemented as improvements over the Apriori algorithm, the performance of each algorithm is tested against Apriori. The dataset and CT are given as input to the proposed DAS_Apriori algorithm. The value of CT should be chosen between 0 and 1. The proposed algorithm calculates the support threshold automatically; it uses CT values from 0.1 to 0.9, and the algorithm is executed for each 0.1 interval. The existing Apriori algorithm requires a minsup value to be specified; to run Apriori, each dataset is analyzed and an optimum minsup value is picked.

First, the frequent itemsets (FI) are generated. The number of FIs generated by Apriori is normalized to one, and the number of FIs obtained from DAS_Apriori is normalized against Apriori for better comparison. The length of the longest frequent itemsets is also compared.

Then the extracted frequent itemsets are used to generate association rules. To generate association rules, a minconf value should be specified. By assigning a 0% minconf value, all association rules for the specified minsup value are generated. The exact rules (100% confidence) at the specified support thresholds are produced by choosing a minconf value of 100%. The response time of the DAS_Apriori algorithm is compared with the response times of the existing algorithms Apriori (Agrawal et al 1993), Pascal (Bastide et al 2000) and Zart (Szathmary et al 2005). The implementations of Pascal and Zart are taken from the CORON platform for comparison.

Table 3.5 Response Time (in Seconds), Length of FIs, #FIs and #Exact Rules of Existing Algorithms

Dataset          Apriori                 Pascal                  Zart                    Length    #FIs   #Exact
                 Response    Response    Response    Response    Response    Response    of FIs           Rules
                 Time (FI)   Time (Rule) Time (FI)   Time (Rule) Time (FI)   Time (Rule)
Mushroom (5%)
Chess (5%)
C2d1k (5%)
T2i6d1k (1%)
T25i1d1k (1%)

Table 3.6 Length of FIs, #FIs, #Exact Rules and %Exact Rules of DAS_Apriori

Dataset      Threshold   Length of FIs   #FIs   #Rules   #Exact Rules   %Exact Rules
Mushroom
Chess
C2D1K

Table 3.5 and Table 3.6 show the performance of the existing algorithms and the proposed algorithm in terms of response time, the number of FIs, the number of rules and the number of exact rules respectively. The performance comparison for each of these factors is also shown graphically. Figures 3.2 to 3.10 show the comparison results for the length of the FIs.

Figure 3.2 Comparison of Length of FIs with CT set to 0.1

Figure 3.3 Comparison of Length of FIs with CT set to 0.2

Figure 3.4 Comparison of Length of FIs with CT set to 0.3

Figure 3.5 Comparison of Length of FIs with CT set to 0.4

Figure 3.6 Comparison of Length of FIs with CT set to 0.5

Figure 3.7 Comparison of Length of FIs with CT set to 0.6

Figure 3.8 Comparison of Length of FIs with CT set to 0.7

Figure 3.9 Comparison of Length of FIs with CT set to 0.8

Figure 3.10 Comparison of Length of FIs with CT set to 0.9

In the case of DAS_Apriori, a lower CT value yields lengthier FIs. It becomes evident that for all strongly correlated datasets, like Mushroom, Chess and C2D1K, the lower CT values lead to the longest frequent itemsets. Only short FIs are generated by the proposed algorithm for the weakly correlated, sparse datasets T2I6D1K and T25I1D1K. The proposed algorithm produces lengthier FIs than Apriori for CT values from 0.1 to 0.5. The DAS_Apriori algorithm is guaranteed to produce FIs for any CT value. This is not true of Apriori: for arbitrary minsup values, the Apriori algorithm may break down and not yield any results.

The numbers of FIs generated by Apriori and DAS_Apriori are compared in Figures 3.11 to 3.19. The #FI value of Apriori is normalized to one, and the value of the proposed DAS_Apriori algorithm is normalized against Apriori. The normalized results are compared because the scales of the #FI values vary for different types of datasets.

From Figures 3.11 to 3.19, for strongly correlated datasets, the proposed algorithm with a low CT value produces better results than the existing algorithm. For the Mushroom dataset, the number of FIs increases by between 0.15% and 33.44% for the chosen CT values. The C2D1K dataset shows an improvement from 0.27% to 8.76% in #FI generation. T2I6D1K shows only a slight improvement of about 0.2%, and T25I1D1K exhibits the same pattern.

Figure 3.11 Comparison of #FIs with CT set to 0.1 (Apriori normalized to one)

Figure 3.12 Comparison of #FIs with CT set to 0.2 (Apriori normalized to one)

Figure 3.13 Comparison of #FIs with CT set to 0.3 (Apriori normalized to one)

Figure 3.14 Comparison of #FIs with CT set to 0.4 (Apriori normalized to one)

Figure 3.15 Comparison of #FIs with CT set to 0.5 (Apriori normalized to one)

Figure 3.16 Comparison of #FIs with CT set to 0.6 (Apriori normalized to one)

Figure 3.17 Comparison of #FIs with CT set to 0.7 (Apriori normalized to one)

Figure 3.18 Comparison of #FIs with CT set to 0.8 (Apriori normalized to one)

Figure 3.19 Comparison of #FIs with CT set to 0.9 (Apriori normalized to one)

The response times of the algorithms during the generation of FIs and rules for the various datasets are shown in Table 3.5 and Table 3.7, and are graphically illustrated in Figures 3.20 to 3.29.

Table 3.7 Response Times of DAS_Apriori for FI and Rule Generation

Dataset      Threshold   FI generation (in seconds)   Rule generation (in seconds)
Mushroom
Chess
C2D1K

Figure 3.20 Response Times of FI generation on Mushroom

Figure 3.21 Response Times of FI generation on Chess

Figure 3.22 Response Times of FI generation on C2D1K

Figure 3.23 Response Times of FI generation on T2I6D1K

Figure 3.24 Response Times of FI generation on T25I1D1K

Figure 3.25 Response Times of Rule generation on Mushroom

Figure 3.26 Response Times of Rule generation on Chess

Figure 3.27 Response Times of Rule generation on C2D1K

Figure 3.28 Response Times of Rule generation on T2I6D1K

Figure 3.29 Response Times of Rule generation on T25I1D1K

With respect to rule generation, the DAS_Apriori algorithm generates a larger number of rules and produces more exact rules than Apriori.

For the Mushroom dataset, DAS_Apriori improves the rule generation by at least 8.4% and by up to 18.75% for the chosen CT values. For the Chess dataset, the algorithm shows an increase from 0.73% to 46.32%, whereas for the C2D1K dataset it exhibits an improvement of only 1.55% to 5.49%. The DAS_Apriori algorithm produces poor results on weakly correlated datasets.

In the case of the Mushroom dataset, the response time of the proposed algorithm is short when a high CT value is picked. The proposed algorithm yields better response times than Apriori for medium CT values. For most of the datasets and almost all values of CT, the proposed algorithm yields quick response times, except for a few cases on Chess during FI generation. In many cases, the performance of the algorithm is better than that of the Pascal and Zart algorithms. DAS_Apriori performs better for three out of five datasets. However, the performance on the sparse datasets can still be improved.

3.4 DYNAMIC AND COLLECTIVE SUPPORT THRESHOLDS

This algorithm is proposed to improve the performance of the DAS_Apriori algorithm. DAS_Apriori requires a user specified threshold called CT for the generation of frequent itemsets and association rules; an improved algorithm is proposed which avoids the use of CT. Also, the DAS_Apriori algorithm does not perform well for sparse datasets. In particular, for weakly correlated sparse datasets, the minsup value has to be reduced significantly in subsequent levels, but if the minsup value is lowered for strongly correlated datasets, it may lead to memory scarcity problems. Keeping this in view, the proposed algorithm considers the collective support of itemsets obtained from the previous level and uses it for the subsequent level.

In this model, two minimum support counts, namely the Dynamic Minimum Support (DMS) and the Collective Minimum Support Count (CMS), are introduced for the itemset generation at each level. Initially, DMS is calculated while scanning the items in the dataset; CMS is calculated during the itemset generation. DMS reflects the frequency of items in the dataset, while CMS reflects the intrinsic nature of items in the dataset by carrying the existing support over to the next level. In each level a different minimum support value is used, that is, the DMS and CMS values are calculated in each level. Initially, the DMS is used for itemset generation, and in the subsequent levels the CMS values are used to find the frequent itemsets.

Let there be n items in the dataset, let sup_i be the support of each item, and let k represent the current level. The MAXS_k and MINS_k values are calculated as in DAS_Apriori, except that the T value is ignored since it is derived from CT. Equations (3.7) and (3.8) are therefore redefined as shown in equations (3.10) and (3.11):

    MINS_k = µ_k − SD_k                                                    (3.10)
    MAXS_k = µ_k + SD_k                                                    (3.11)

The total support of the items considered in each level is TOTOCC_k, shown in equation (3.12):

    TOTOCC_k = Σ_{i=1..n} sup_i                                            (3.12)

    DMS_k = (1/2) · ( TOTOCC_k / n + (MINS_k + MAXS_k) / 2 )               (3.13)

    CMS_k = (DMS_{k-1} + DMS_k) / 4                                        (3.14)

DMS_k and CMS_k are calculated using equations (3.13) and (3.14). The calculation of the DMS_k value is the same at every level.

Here DMS_k represents the value at the current level, whereas DMS_{k-1} represents the value at the previous level. The proposed method is known as Dynamic Collective Support Apriori (DCS_Apriori). It works as follows: in each level k, it counts the supports of the itemsets and finds the MINS_k and MAXS_k values, TOTOCC_k and thus the DMS_k value. Initially, the itemsets that satisfy the DMS_k value are retrieved. The DMS_k value is calculated based on the candidates generated in the previous level.

3.4.1 DCS_Apriori Algorithm

Algorithm DCS_Apriori

Variables
    D       Dataset
    F       Frequent itemsets
    C       Candidate itemsets
    k       Level
    µ       Mean
    SD      Standard deviation
    MINS    Minimum support bound
    MAXS    Maximum support bound

Input
    Dataset D

Output
    Large frequent itemsets F_k

DCS_Apriori(D)
    F_1 = find_frequent_1_itemsets(D)
    for (k = 2; F_{k-1} ≠ Ø; k++) do
        C_k = candidate_gen(F_{k-1})
        for each record r in D do
            C_t = subset(C_k, r)
            for each candidate c ∈ C_t do
                c.count++
            end
        end
        minsup_k = minsup_calc(C_k)
        F_k = {c ∈ C_k | c.count ≥ minsup_k}
    end
    return ∪_k F_k

// To generate candidate itemsets
Procedure candidate_gen(F_{k-1})
    for each itemset l1 ∈ F_{k-1} do
        for each itemset l2 ∈ F_{k-1} do
            c = join(l1, l2)
            if has_infrequent_subset(c, F_{k-1}) then
                prune c
            else
                add c to C_k
            end if
        end
    end
    return C_k

// To prune candidates with an infrequent subset
Procedure has_infrequent_subset(c, F_{k-1})
    for each (k−1)-subset s of c do
        if s ∉ F_{k-1} then
            return true
        end if
    end
    return false

// To calculate the minsup value
Procedure minsup_calc(C_k)
    µ_k = calculate_mean(C_k)
    SD_k = calculate_SD(C_k, µ_k)
    MINS_k = µ_k − SD_k
    MAXS_k = µ_k + SD_k
    TOTOCC_k = sum of c.count for all c ∈ C_k
    DMS_k = (1/2) · ( TOTOCC_k / n + (MINS_k + MAXS_k) / 2 )
    if (k = 1) then
        CMS_k = DMS_k
    else
        CMS_k = (DMS_{k-1} + DMS_k) / 4
    end if
    return CMS_k
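The following minimal Java sketch (illustrative, not the thesis code) computes the DMS and CMS thresholds of equations (3.10) to (3.14) and reproduces the values used in the example that follows; rounding each value to the nearest integer is an assumption made here so that the numbers match the text.

public class DcsThreshold {

    // DMS_k = (TOTOCC_k / n + (MINS_k + MAXS_k) / 2) / 2, equations (3.10)-(3.13).
    static long dms(double[] supports) {
        int n = supports.length;
        double totocc = 0;                       // TOTOCC_k, equation (3.12)
        for (double s : supports) totocc += s;
        double mean = totocc / n;

        double ss = 0;                           // sample standard deviation
        for (double s : supports) ss += (s - mean) * (s - mean);
        double sd = Math.sqrt(ss / (n - 1));

        double mins = Math.round(mean - sd);     // equation (3.10), rounded
        double maxs = Math.round(mean + sd);     // equation (3.11), rounded
        return Math.round((totocc / n + (mins + maxs) / 2) / 2);
    }

    public static void main(String[] args) {
        double[] level1 = {7, 6, 5, 3, 3};       // C1 supports of dataset D
        double[] level2 = {4, 3, 4};             // C2 supports of dataset D

        long dms1 = dms(level1);                 // 5
        long dms2 = dms(level2);                 // 4
        long cms2 = Math.round((dms1 + dms2) / 4.0);  // equation (3.14): about 2
        System.out.println("DMS1=" + dms1 + ", DMS2=" + dms2 + ", CMS2=" + cms2);
    }
}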

3.4.2 DCS_Apriori: An Example

The execution of the DCS_Apriori algorithm on dataset D (Table 3.1) is illustrated in Table 3.8.

Table 3.8 Execution of DCS_Apriori on Dataset D

C1        Sup   F1       C2            Sup   F2            F3
Bread     7     Bread    Bread, milk   4     Bread, milk   Bread, milk, egg
Milk      6     Milk     Bread, egg    3     Bread, egg
Egg       5     Egg      Milk, egg     4     Milk, egg
Jam       3
Sugar     3

During the first level, DMS_1 = (4.8 + 5) / 2 ≈ 5, so there are three items satisfying the condition. Three candidate itemsets are formed in the second level. The support threshold becomes DMS_2 = (3.67 + 3.5) / 2 ≈ 4 and CMS_2 = (5 + 4) / 4 ≈ 2. Thus, the support threshold is 2 and all three candidate itemsets are selected in the second level. Only one candidate itemset, with support 2, is generated in the third level. For a single candidate it is not necessary to calculate DMS_3 and CMS_3 during the third level: its support is directly compared with the level 2 threshold and the itemset is found to be frequent.

3.4.3 Experimental Results

The proposed DCS_Apriori algorithm is tested with the benchmark datasets described in Table 3.4, and compared with Apriori and DAS_Apriori. A minsup value of 5% is considered to be the optimum threshold for the Mushroom, Chess and C2D1K datasets, whereas a 1% minsup is considered optimum for the T2I6D1K and T25I1D1K datasets; hence, these minsup values are taken to run Apriori. The DAS_Apriori algorithm uses a CT value of 0.5, which is considered to be optimum for all datasets used in this comparison.

Table 3.9 Response Time of Algorithms during FI and Rule Generation

Dataset          Apriori      Pascal       Zart         DAS_Apriori   DCS_Apriori
                 FI    Rule   FI    Rule   FI    Rule   FI    Rule    FI    Rule
Mushroom (5%)
Chess (5%)
C2d1k (5%)
T2i6d1k (1%)
T25i1d1k (1%)

Table 3.10 Length of FIs, #FIs, #Rules and #Exact Rules of DAS_Apriori, DCS_Apriori and Apriori

Dataset          DAS_Apriori                           DCS_Apriori                                    Apriori
                 Length of FIs  #FIs  #Exact Rules     Length of FIs  #FIs  #Rules  #Exact Rules      Length of FIs  #FIs  #Rules  #Exact Rules
Mushroom (5%)
Chess (5%)
C2d1k (5%)
T2i6d1k (1%)
T25i1d1k (1%)

Table 3.9 lists the response times of the existing algorithms along with the proposed algorithms DAS_Apriori and DCS_Apriori. The length of the FIs, the number of FIs and the number of rules generated by the algorithms for the benchmark datasets are shown in Table 3.10. Graphical representations are given in Figures 3.30 and 3.31.

Figure 3.30 Comparison of Length of FIs

Figure 3.31 Comparison of #FIs

From the analysis, it is clear that both the DAS_Apriori and DCS_Apriori algorithms produce the longest FIs for strongly correlated datasets. The DAS_Apriori algorithm performs poorly for weakly correlated datasets in terms of the length of FIs, #FIs and #Rules. In the case of DCS_Apriori, the weakly correlated datasets T2I6D1K and T25I1D1K also yield comparatively good results. The proposed DCS_Apriori algorithm produces 33.51% more FIs on the Mushroom dataset, and shows an increase of up to 3% for the other benchmark datasets.

In the case of exact rule generation, the DCS_Apriori algorithm yields an additional 3.46% improvement over the DAS_Apriori algorithm on the Mushroom dataset; the total increase in the number of exact rules produced by DCS_Apriori on the Mushroom dataset is 13.46%. Due to the highly dense nature of Chess, the rule generation algorithm terminates. For the C2D1K dataset, the algorithm exhibits an improvement of only 0.1%. The performance improvement for the T25I1D1K dataset in the number of exact rules generated is 0.14%. The same pattern of improvement is noticed in the rule generation process as well.

The results show that the DCS_Apriori algorithm explores more hidden frequent itemsets, and thus more rules, in all kinds of datasets. DCS_Apriori also shows a significant improvement in the generation of exact association rules. Although the response time of DCS_Apriori is somewhat higher for all datasets, the increase is negligible given that the algorithm explores more hidden rules. It also frees the user from specifying a minimum support and guarantees the generation of interesting association rules.


More information

Association Rules. Berlin Chen References:

Association Rules. Berlin Chen References: Association Rules Berlin Chen 2005 References: 1. Data Mining: Concepts, Models, Methods and Algorithms, Chapter 8 2. Data Mining: Concepts and Techniques, Chapter 6 Association Rules: Basic Concepts A

More information

Association Rule Mining: FP-Growth

Association Rule Mining: FP-Growth Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong We have already learned the Apriori algorithm for association rule mining. In this lecture, we will discuss a faster

More information

Chapter 7: Frequent Itemsets and Association Rules

Chapter 7: Frequent Itemsets and Association Rules Chapter 7: Frequent Itemsets and Association Rules Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2013/14 VII.1&2 1 Motivational Example Assume you run an on-line

More information

FP-Growth algorithm in Data Compression frequent patterns

FP-Growth algorithm in Data Compression frequent patterns FP-Growth algorithm in Data Compression frequent patterns Mr. Nagesh V Lecturer, Dept. of CSE Atria Institute of Technology,AIKBS Hebbal, Bangalore,Karnataka Email : nagesh.v@gmail.com Abstract-The transmission

More information

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM G.Amlu #1 S.Chandralekha #2 and PraveenKumar *1 # B.Tech, Information Technology, Anand Institute of Higher Technology, Chennai, India

More information

Chapter 6: Association Rules

Chapter 6: Association Rules Chapter 6: Association Rules Association rule mining Proposed by Agrawal et al in 1993. It is an important data mining model. Transaction data (no time-dependent) Assume all data are categorical. No good

More information

Mining Association Rules in Large Databases

Mining Association Rules in Large Databases Mining Association Rules in Large Databases Vladimir Estivill-Castro School of Computing and Information Technology With contributions fromj. Han 1 Association Rule Mining A typical example is market basket

More information

Mining Top-K Strongly Correlated Item Pairs Without Minimum Correlation Threshold

Mining Top-K Strongly Correlated Item Pairs Without Minimum Correlation Threshold Mining Top-K Strongly Correlated Item Pairs Without Minimum Correlation Threshold Zengyou He, Xiaofei Xu, Shengchun Deng Department of Computer Science and Engineering, Harbin Institute of Technology,

More information

CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets

CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets Jianyong Wang, Jiawei Han, Jian Pei Presentation by: Nasimeh Asgarian Department of Computing Science University of Alberta

More information

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42 Pattern Mining Knowledge Discovery and Data Mining 1 Roman Kern KTI, TU Graz 2016-01-14 Roman Kern (KTI, TU Graz) Pattern Mining 2016-01-14 1 / 42 Outline 1 Introduction 2 Apriori Algorithm 3 FP-Growth

More information

Purna Prasad Mutyala et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (5), 2011,

Purna Prasad Mutyala et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (5), 2011, Weighted Association Rule Mining Without Pre-assigned Weights PURNA PRASAD MUTYALA, KUMAR VASANTHA Department of CSE, Avanthi Institute of Engg & Tech, Tamaram, Visakhapatnam, A.P., India. Abstract Association

More information

Temporal Weighted Association Rule Mining for Classification

Temporal Weighted Association Rule Mining for Classification Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider

More information

An Algorithm for Mining Large Sequences in Databases

An Algorithm for Mining Large Sequences in Databases 149 An Algorithm for Mining Large Sequences in Databases Bharat Bhasker, Indian Institute of Management, Lucknow, India, bhasker@iiml.ac.in ABSTRACT Frequent sequence mining is a fundamental and essential

More information

signicantly higher than it would be if items were placed at random into baskets. For example, we

signicantly higher than it would be if items were placed at random into baskets. For example, we 2 Association Rules and Frequent Itemsets The market-basket problem assumes we have some large number of items, e.g., \bread," \milk." Customers ll their market baskets with some subset of the items, and

More information

Appropriate Item Partition for Improving the Mining Performance

Appropriate Item Partition for Improving the Mining Performance Appropriate Item Partition for Improving the Mining Performance Tzung-Pei Hong 1,2, Jheng-Nan Huang 1, Kawuu W. Lin 3 and Wen-Yang Lin 1 1 Department of Computer Science and Information Engineering National

More information

Parallel Mining of Maximal Frequent Itemsets in PC Clusters

Parallel Mining of Maximal Frequent Itemsets in PC Clusters Proceedings of the International MultiConference of Engineers and Computer Scientists 28 Vol I IMECS 28, 19-21 March, 28, Hong Kong Parallel Mining of Maximal Frequent Itemsets in PC Clusters Vong Chan

More information

Efficient Remining of Generalized Multi-supported Association Rules under Support Update

Efficient Remining of Generalized Multi-supported Association Rules under Support Update Efficient Remining of Generalized Multi-supported Association Rules under Support Update WEN-YANG LIN 1 and MING-CHENG TSENG 1 Dept. of Information Management, Institute of Information Engineering I-Shou

More information

A Survey of Sequential Pattern Mining

A Survey of Sequential Pattern Mining Data Science and Pattern Recognition c 2017 ISSN XXXX-XXXX Ubiquitous International Volume 1, Number 1, February 2017 A Survey of Sequential Pattern Mining Philippe Fournier-Viger School of Natural Sciences

More information

Mining Top-K Association Rules. Philippe Fournier-Viger 1 Cheng-Wei Wu 2 Vincent Shin-Mu Tseng 2. University of Moncton, Canada

Mining Top-K Association Rules. Philippe Fournier-Viger 1 Cheng-Wei Wu 2 Vincent Shin-Mu Tseng 2. University of Moncton, Canada Mining Top-K Association Rules Philippe Fournier-Viger 1 Cheng-Wei Wu 2 Vincent Shin-Mu Tseng 2 1 University of Moncton, Canada 2 National Cheng Kung University, Taiwan AI 2012 28 May 2012 Introduction

More information

CS570 Introduction to Data Mining

CS570 Introduction to Data Mining CS570 Introduction to Data Mining Frequent Pattern Mining and Association Analysis Cengiz Gunay Partial slide credits: Li Xiong, Jiawei Han and Micheline Kamber George Kollios 1 Mining Frequent Patterns,

More information

Machine Learning: Symbolische Ansätze

Machine Learning: Symbolische Ansätze Machine Learning: Symbolische Ansätze Unsupervised Learning Clustering Association Rules V2.0 WS 10/11 J. Fürnkranz Different Learning Scenarios Supervised Learning A teacher provides the value for the

More information

Efficient Incremental Mining of Top-K Frequent Closed Itemsets

Efficient Incremental Mining of Top-K Frequent Closed Itemsets Efficient Incremental Mining of Top- Frequent Closed Itemsets Andrea Pietracaprina and Fabio Vandin Dipartimento di Ingegneria dell Informazione, Università di Padova, Via Gradenigo 6/B, 35131, Padova,

More information

FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking

FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking Shariq Bashir National University of Computer and Emerging Sciences, FAST House, Rohtas Road,

More information

Data Structure for Association Rule Mining: T-Trees and P-Trees

Data Structure for Association Rule Mining: T-Trees and P-Trees IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 16, NO. 6, JUNE 2004 1 Data Structure for Association Rule Mining: T-Trees and P-Trees Frans Coenen, Paul Leng, and Shakil Ahmed Abstract Two new

More information

Novel Techniques to Reduce Search Space in Multiple Minimum Supports-Based Frequent Pattern Mining Algorithms

Novel Techniques to Reduce Search Space in Multiple Minimum Supports-Based Frequent Pattern Mining Algorithms Novel Techniques to Reduce Search Space in Multiple Minimum Supports-Based Frequent Pattern Mining Algorithms ABSTRACT R. Uday Kiran International Institute of Information Technology-Hyderabad Hyderabad

More information

Parallelizing Frequent Itemset Mining with FP-Trees

Parallelizing Frequent Itemset Mining with FP-Trees Parallelizing Frequent Itemset Mining with FP-Trees Peiyi Tang Markus P. Turkia Department of Computer Science Department of Computer Science University of Arkansas at Little Rock University of Arkansas

More information

Association Rule Discovery

Association Rule Discovery Association Rule Discovery Association Rules describe frequent co-occurences in sets an itemset is a subset A of all possible items I Example Problems: Which products are frequently bought together by

More information

Roadmap DB Sys. Design & Impl. Association rules - outline. Citations. Association rules - idea. Association rules - idea.

Roadmap DB Sys. Design & Impl. Association rules - outline. Citations. Association rules - idea. Association rules - idea. 15-721 DB Sys. Design & Impl. Association Rules Christos Faloutsos www.cs.cmu.edu/~christos Roadmap 1) Roots: System R and Ingres... 7) Data Analysis - data mining datacubes and OLAP classifiers association

More information

Research Report. Constraint-Based Rule Mining in Large, Dense Databases. Roberto J. Bayardo Jr. Rakesh Agrawal Dimitrios Gunopulos

Research Report. Constraint-Based Rule Mining in Large, Dense Databases. Roberto J. Bayardo Jr. Rakesh Agrawal Dimitrios Gunopulos Research Report Constraint-Based Rule Mining in Large, Dense Databases Roberto J. Bayardo Jr. Rakesh Agrawal Dimitrios Gunopulos IBM Research Division Almaden Research Center 650 Harry Road San Jose, California

More information

CHAPTER 5 WEIGHTED SUPPORT ASSOCIATION RULE MINING USING CLOSED ITEMSET LATTICES IN PARALLEL

CHAPTER 5 WEIGHTED SUPPORT ASSOCIATION RULE MINING USING CLOSED ITEMSET LATTICES IN PARALLEL 68 CHAPTER 5 WEIGHTED SUPPORT ASSOCIATION RULE MINING USING CLOSED ITEMSET LATTICES IN PARALLEL 5.1 INTRODUCTION During recent years, one of the vibrant research topics is Association rule discovery. This

More information

APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW

APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW International Journal of Computer Application and Engineering Technology Volume 3-Issue 3, July 2014. Pp. 232-236 www.ijcaet.net APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW Priyanka 1 *, Er.

More information

Market baskets Frequent itemsets FP growth. Data mining. Frequent itemset Association&decision rule mining. University of Szeged.

Market baskets Frequent itemsets FP growth. Data mining. Frequent itemset Association&decision rule mining. University of Szeged. Frequent itemset Association&decision rule mining University of Szeged What frequent itemsets could be used for? Features/observations frequently co-occurring in some database can gain us useful insights

More information

Salah Alghyaline, Jun-Wei Hsieh, and Jim Z. C. Lai

Salah Alghyaline, Jun-Wei Hsieh, and Jim Z. C. Lai EFFICIENTLY MINING FREQUENT ITEMSETS IN TRANSACTIONAL DATABASES This article has been peer reviewed and accepted for publication in JMST but has not yet been copyediting, typesetting, pagination and proofreading

More information

Association Rule Discovery

Association Rule Discovery Association Rule Discovery Association Rules describe frequent co-occurences in sets an item set is a subset A of all possible items I Example Problems: Which products are frequently bought together by

More information

An Automated Support Threshold Based on Apriori Algorithm for Frequent Itemsets

An Automated Support Threshold Based on Apriori Algorithm for Frequent Itemsets An Automated Support Threshold Based on Apriori Algorithm for sets Jigisha Trivedi #, Brijesh Patel * # Assistant Professor in Computer Engineering Department, S.B. Polytechnic, Savli, Gujarat, India.

More information

FHM: Faster High-Utility Itemset Mining using Estimated Utility Co-occurrence Pruning

FHM: Faster High-Utility Itemset Mining using Estimated Utility Co-occurrence Pruning FHM: Faster High-Utility Itemset Mining using Estimated Utility Co-occurrence Pruning Philippe Fournier-Viger 1 Cheng Wei Wu 2 Souleymane Zida 1 Vincent S. Tseng 2 presented by Ted Gueniche 1 1 University

More information

Value Added Association Rules

Value Added Association Rules Value Added Association Rules T.Y. Lin San Jose State University drlin@sjsu.edu Glossary Association Rule Mining A Association Rule Mining is an exploratory learning task to discover some hidden, dependency

More information

Data Mining Algorithms

Data Mining Algorithms Algorithms Fall 2017 Big Data Tools and Techniques Basic Data Manipulation and Analysis Performing well-defined computations or asking well-defined questions ( queries ) Looking for patterns in data Machine

More information

Performance Based Study of Association Rule Algorithms On Voter DB

Performance Based Study of Association Rule Algorithms On Voter DB Performance Based Study of Association Rule Algorithms On Voter DB K.Padmavathi 1, R.Aruna Kirithika 2 1 Department of BCA, St.Joseph s College, Thiruvalluvar University, Cuddalore, Tamil Nadu, India,

More information

Mining Frequent Patterns without Candidate Generation

Mining Frequent Patterns without Candidate Generation Mining Frequent Patterns without Candidate Generation Outline of the Presentation Outline Frequent Pattern Mining: Problem statement and an example Review of Apriori like Approaches FP Growth: Overview

More information

Finding Sporadic Rules Using Apriori-Inverse

Finding Sporadic Rules Using Apriori-Inverse Finding Sporadic Rules Using Apriori-Inverse Yun Sing Koh and Nathan Rountree Department of Computer Science, University of Otago, New Zealand {ykoh, rountree}@cs.otago.ac.nz Abstract. We define sporadic

More information

ANU MLSS 2010: Data Mining. Part 2: Association rule mining

ANU MLSS 2010: Data Mining. Part 2: Association rule mining ANU MLSS 2010: Data Mining Part 2: Association rule mining Lecture outline What is association mining? Market basket analysis and association rule examples Basic concepts and formalism Basic rule measurements

More information

Math 214 Introductory Statistics Summer Class Notes Sections 3.2, : 1-21 odd 3.3: 7-13, Measures of Central Tendency

Math 214 Introductory Statistics Summer Class Notes Sections 3.2, : 1-21 odd 3.3: 7-13, Measures of Central Tendency Math 14 Introductory Statistics Summer 008 6-9-08 Class Notes Sections 3, 33 3: 1-1 odd 33: 7-13, 35-39 Measures of Central Tendency odd Notation: Let N be the size of the population, n the size of the

More information

An Improved Algorithm for Mining Association Rules Using Multiple Support Values

An Improved Algorithm for Mining Association Rules Using Multiple Support Values An Improved Algorithm for Mining Association Rules Using Multiple Support Values Ioannis N. Kouris, Christos H. Makris, Athanasios K. Tsakalidis University of Patras, School of Engineering Department of

More information

Survey on Frequent Pattern Mining

Survey on Frequent Pattern Mining Survey on Frequent Pattern Mining Bart Goethals HIIT Basic Research Unit Department of Computer Science University of Helsinki P.O. box 26, FIN-00014 Helsinki Finland 1 Introduction Frequent itemsets play

More information

Classification by Association

Classification by Association Classification by Association Cse352 Ar*ficial Intelligence Professor Anita Wasilewska Generating Classification Rules by Association When mining associa&on rules for use in classifica&on we are only interested

More information

Discovery of Multi Dimensional Quantitative Closed Association Rules by Attributes Range Method

Discovery of Multi Dimensional Quantitative Closed Association Rules by Attributes Range Method Discovery of Multi Dimensional Quantitative Closed Association Rules by Attributes Range Method Preetham Kumar, Ananthanarayana V S Abstract In this paper we propose a novel algorithm for discovering multi

More information

Chapter 2. Related Work

Chapter 2. Related Work Chapter 2 Related Work There are three areas of research highly related to our exploration in this dissertation, namely sequential pattern mining, multiple alignment, and approximate frequent pattern mining.

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information