Incrementally mining high utility patterns based on pre-large concept


Appl Intell (2014) 40: DOI /s z

Chun-Wei Lin · Tzung-Pei Hong · Guo-Cheng Lan · Jia-Wei Wong · Wen-Yang Lin

Published online: 27 August 2013
Springer Science+Business Media New York 2013

C.-W. Lin
Innovative Information Industry Research Center (IIIRC), Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, P.R. China
e-mail: jerrylin@ieee.org

C.-W. Lin
Shenzhen Key Laboratory of Internet Information Collaboration, School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, P.R. China

T.-P. Hong (corresponding author) · W.-Y. Lin
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung 811, Taiwan, R.O.C.
T.-P. Hong e-mail: tphong@nuk.edu.tw
W.-Y. Lin e-mail: wylin@nuk.edu.tw

T.-P. Hong · J.-W. Wong
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 804, Taiwan, R.O.C.
J.-W. Wong e-mail: jwwong.alex@gmail.com

G.-C. Lan
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 701, Taiwan, R.O.C.
e-mail: rrfoheiay@gmail.com

Abstract In traditional association rule mining, most algorithms are designed to discover frequent itemsets from a binary database. Utility mining was thus proposed to measure the utility values of purchased items for revealing high utility itemsets from a quantitative database. In the past, a two-phase high utility mining algorithm was thus proposed for efficiently discovering high utility itemsets from a quantitative database. In dynamic data mining, transactions may be inserted, deleted, or modified from a database. In this case, a batch mining procedure must rescan the whole updated database to maintain the up-to-date information. Designing an efficient approach for handling dynamic databases is thus a critical research issue in utility mining.
In this paper, an incremental mining algorithm is proposed for efficiently maintaining discovered high utility itemsets based on pre-large concepts. Itemsets are first partitioned into three parts according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database and in inserted transactions. Individual procedures are then executed for each part. Experimental results show that the proposed incremental high utility mining algorithm outperforms existing algorithms.

Keywords Utility mining · Pre-large itemset · High utility itemset · Incremental mining · Two-phase approach

Notation

$I$: A set of $m$ items, $I = \{i_1, i_2, \ldots, i_j, \ldots, i_m\}$, in which each item $i_j$ has its own profit value $p_j$;
$P$: The profit table, $\{p_1, p_2, \ldots, p_j, \ldots, p_m\}$, in which $p_j$ is the profit value of item $i_j$;
$D$: The original quantitative database, $D = \{T_1, T_2, \ldots, T_k, \ldots, T_n\}$, in which each transaction contains several items with purchase quantities;
$d$: The new transactions, $d = \{t_1, t_2, \ldots, t_k, \ldots, t_n\}$, in which each transaction contains several items with purchase quantities;
$U$: The entire updated database, i.e., $D \cup d$;
$TU^D$: The total utility of the transactions in $D$;
$TU^d$: The total utility of the transactions in $d$;

$TU^U$: The total utility of the transactions in $U$;
$q_{kj}$: The quantity of item $i_j$ in transaction $t_k$;
$u_{kj}$: The utility of item $i_j$ in transaction $t_k$, which is calculated as $q_{kj} \times p_j$;
$tu_k$: The transaction utility of the currently processed transaction $t_k$;
$buf$: A buffer used to store the total utility of the last processed transactions for transaction insertion. It is set to 0 after the database is rescanned;
$X$: An itemset containing several items $i_j$;
$S_u$: The upper utility threshold for large (high) transaction-weighted utilization and high utility itemsets. It is the same as the high utility threshold in traditional utility mining;
$S_l$: The lower utility threshold for pre-large transaction-weighted utilization and pre-large itemsets, where $S_u > S_l$;
$f$: The safety transaction utility bound for new transactions;
$C_r$: The set of candidate $r$-itemsets;
Rescan_Items: The set of the itemsets that must be rescanned in the original database;
$HTWU^D_r$: The set of large (high) transaction-weighted utilization $r$-itemsets in the original database;
$PTWU^D_r$: The set of pre-large transaction-weighted utilization $r$-itemsets in the original database;
$HTWU^D$: The set of large (high) transaction-weighted utilization itemsets in the original database;
$PTWU^D$: The set of pre-large transaction-weighted utilization itemsets in the original database;
$HTWU^U_r$: The set of large (high) transaction-weighted utilization $r$-itemsets in the updated database;
$PTWU^U_r$: The set of pre-large transaction-weighted utilization $r$-itemsets in the updated database;
$HTWU^U$: The set of large (high) transaction-weighted utilization itemsets in the updated database;
$PTWU^U$: The set of pre-large transaction-weighted utilization itemsets in the updated database;
$HU^U$: The set of high utility itemsets in the updated database;
$twu^D(X)$: The transaction-weighted utilization of itemset $X$ in the original database;
$twu^d(X)$: The transaction-weighted utilization of itemset $X$ in the new transactions;
$twu^U(X)$: The transaction-weighted utilization of itemset $X$ in the updated database;
$au^D(X)$: The actual utility of itemset $X$ in the original database;
$au^d(X)$: The actual utility of itemset $X$ in the new transactions;
$au^U(X)$: The actual utility of itemset $X$ in the updated database.

1 Introduction

Mining association rules [1, 3, 11, 26, 33] is the most popular approach among data mining techniques [2, 7, 9, 12, 13, 15, 19, 21, 22, 30]. It first finds all frequent itemsets based on a user-defined minimum support threshold and then generates the association rules from the discovered frequent itemsets based on a user-defined minimum confidence threshold. In association rule mining, each item is treated as a binary variable for discovering interesting relationships between itemsets, so the frequent itemsets only reflect the presence or absence of items in a database. Frequency alone, however, is insufficient for identifying highly profitable itemsets, because other factors such as price, quantity, or profit are not considered in association rules. In real-world business, most customers may not buy large amounts of jewels or gourmet goods, yet a retailer may be interested in identifying the most valuable customers. Utility mining [6, 23, 29, 31, 32] was thus proposed to address this limitation of frequent itemsets. The local transaction utility and the external utility are usually defined as quantity and profit, respectively, in utility mining. Frequent-itemset mining can thus be thought of as a special case of utility mining in which all sold quantities and item profits are 1. The utility of an itemset $X$ is defined as $u(X)$, which is the summation of the utilities of $X$ in all transactions containing $X$. In utility mining, the utility of an item in a transaction is its quantity multiplied by its profit.
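As a minimal sketch of this definition (not the authors' implementation), the utility of an itemset can be computed in Python as follows. The function name `itemset_utility`, the dictionary-based transaction layout, and the variable names `toy_db` and `toy_profit` are assumptions made for illustration; the toy data are the two transactions and the profit values of the small example used in this section.

```python
from typing import Dict, Iterable, List

# Hypothetical layout: a transaction maps each purchased item to its quantity.
Transaction = Dict[str, int]


def itemset_utility(itemset: Iterable[str],
                    transactions: List[Transaction],
                    profit: Dict[str, int]) -> int:
    """Return u(X): the sum of quantity * profit over the items of X,
    taken over every transaction that contains all items of X."""
    items = frozenset(itemset)
    total = 0
    for t in transactions:
        if items.issubset(t):  # iterating a dict yields its keys (the items)
            total += sum(t[i] * profit[i] for i in items)
    return total


# Toy data: quantities per transaction and unit profits.
toy_db = [{"B": 2, "C": 1}, {"A": 3, "C": 2}]
toy_profit = {"A": 1, "B": 3, "C": 2}

for x in (["A"], ["B"], ["C"], ["B", "C"], ["A", "C"]):
    print("".join(x), itemset_utility(x, toy_db, toy_profit))
```

An itemset would then be reported as a high utility itemset when this value reaches the minimum utility threshold.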
If $u(X)$ is greater than or equal to the given minimum utility threshold, the itemset $X$ is considered a high utility itemset. An example is given below to illustrate how to calculate the utility value of an itemset. Suppose there are two transactions, TID 1 = {B : 2, C : 1} and TID 2 = {A : 3, C : 2}. The value appended to an item is its quantity. Assume the profits of the items are defined as {A : 1, B : 3, C : 2}. The utilities of the items A, B, and C are then $u(A) = 3 \times 1 = 3$, $u(B) = 2 \times 3 = 6$, and $u(C) = (1 \times 2) + (2 \times 2) = 6$. Likewise, the utilities of the itemsets BC and AC are $u(BC) = (2 \times 3) + (1 \times 2) = 8$ and $u(AC) = (3 \times 1) + (2 \times 2) = 7$.

Liu et al. proposed a two-phase algorithm and designed the transaction-weighted utilization (TWU) property for efficiently extracting high utility itemsets based on the downward closure property [24]. The transaction-weighted utilization serves as an upper bound on the utility of each candidate itemset, which reduces the number of candidate itemsets for later processing in the two-phase algorithm. An additional database scan is then performed to determine the real utility values of the remaining candidates and identify the high utility itemsets.

In the above approaches, the database is assumed to be static and the mining processes are performed in batch mode. In real-world applications, transactions may be dynamically inserted into the original database. The discovered frequent itemsets may become invalid, or some new information may emerge in the updated database. An efficient algorithm for incrementally updating the discovered association rules is thus desirable in dynamic data mining. Hong et al. proposed the pre-large concept, which defines large and pre-large itemsets based on upper and lower thresholds and derives a safety number of new transactions to reduce the number of required database rescans [10]. In this paper, an incremental mining algorithm based on the pre-large concept is proposed to update discovered high utility itemsets for transaction insertion. The contributions of this paper are described below.

1. Traditional utility mining processes the database in a batch way regardless of how few transactions are inserted. Discarding the already discovered information and re-mining the whole database after a small number of insertions is inefficient. In this paper, an efficient approach is proposed for handling transaction insertion in utility mining.
2. The downward closure property from the two-phase approach [24] is applied in this paper to reduce the number of candidate itemsets in utility mining, thus shortening the processing time for mining high utility itemsets.
3. In the proposed algorithm, only a small number of itemsets must be rescanned to maintain the high utility itemsets, thus reducing the computation compared to the batch approach.
4. The upper bound utility and the lower bound utility are defined in this paper as the effective thresholds for respectively deriving high (large) and pre-large utility itemsets.

The remainder of this paper is organized as follows. Some related works are reviewed in Sect. 2. The algorithm for handling transaction insertion in utility mining is described in Sect. 3. An example to illustrate the proposed algorithm is given in Sect. 4. Experimental results showing the performance of the proposed algorithm are provided in Sect. 5, and conclusions are finally given in Sect. 6.

2 Related works

This section reviews association rule mining and high utility mining in turn.

2.1 Association rule mining

Traditional data mining is used to extract useful itemsets or rules from a binary database. The most common approach is to generate association rules from a transactional database, such that the presence of certain items in a transaction implies the presence of some other items. Agrawal and Srikant proposed the Apriori algorithm [1] for mining association rules from a set of transactions level by level. The downward closure property is used to prune unpromising candidate itemsets, thus improving the efficiency of discovering association rules. Many algorithms have been proposed for efficiently discovering the desired association rules [1, 2, 4, 5, 7, 26-28, 33].

In real-world applications, transaction databases usually grow over time, while the procedure for mining association rules is performed in batch mode. Some new association rules may be generated and some existing ones may become invalid. Traditional batch mining algorithms solve this problem by rescanning the entire updated database whenever transactions are inserted, deleted, or modified in the dynamic database.
Cheung et al. thus proposed the Fast UPdate (FUP) algorithm [8] to effectively handle transaction insertion for maintaining frequent itemsets. It divides the itemsets into four parts according to whether they are large or small in the original database and in the newly inserted transactions. Each part has its own procedure to update the discovered information. Under the FUP concept, however, the original database must still be rescanned when an itemset is large in the inserted transactions but small in the original database.

Hong et al. proposed the pre-large concept for efficiently maintaining discovered rules in incremental data mining [10]. A pre-large itemset is not truly large (frequent), but might easily become large in the future through the data insertion process. Two support thresholds are used to respectively discover the large and pre-large itemsets. Since rescanning the original database requires a lot of computational time, reducing the number of rescans in this way reduces the maintenance cost. The procedure for transaction insertion is as follows. When new transactions are inserted into the database, there are nine possible cases (see Fig. 1) [10]. Cases 1, 5, 6, 8, and 9 do not affect the final association rules. Cases 2 and 3 may remove some existing association rules, and Cases 4 and 7 may generate new association rules. If all large and pre-large itemsets from the original database are pre-stored, Cases 2, 3, and 4 can be easily handled. In the maintenance phase, the ratio of new transactions compared to those in the database is usually very small. It has been formally shown that an itemset in Case 7 cannot possibly be large for the entire updated database as long as the number of new transactions is smaller than the safety number $f$ [10]:

$$f = \frac{(S_u - S_l)\, d}{1 - S_u},$$

where $S_u$ is the upper threshold, $S_l$ is the lower threshold, and $d$ is the number of transactions in the original database.

Fig. 1 Nine cases arising from newly inserted transactions into an existing database

In this paper, the pre-large concept is used to reduce the rescanning of the original database by tracking the amount of newly inserted transactions. The original database must be rescanned only when the number of inserted transactions is larger than the safety bound, which is more efficient than the batch approach and the FUP concept [8].

2.2 High utility mining

Utility mining [6, 17, 32], an extension of frequent itemset mining, is based on the measurement of the local transaction utility and the external utility. The utility of an item in a transaction is defined as its quantity multiplied by its profit. The utility of an itemset in a transaction is the sum of the utilities of its items in that transaction. If the utility ratio of an itemset, summed over all transactions containing it, is larger than or equal to the user-specified minimum utility threshold, the itemset is considered a high utility itemset.

Many algorithms have been proposed for mining high utility itemsets. Li et al. proposed the FSM, ShFSM, and DCG methods [16, 18] for finding high utility itemsets based on the level closure property. These approaches delete useless candidate itemsets according to the critical function of each candidate. Yao et al. proposed an algorithm for efficiently mining high utility itemsets based on the mathematical property of utility constraints [31]. Two pruning strategies are used to reduce the search space based on the utility upper bounds and expected utility upper bounds, respectively. Liu et al. proposed a two-phase algorithm for efficiently discovering high utility itemsets [24] based on the downward closure property. The two phases generate and test high utility itemsets level by level. The transaction-weighted utilization (TWU) is defined as an upper bound that maintains the downward closure property in the two-phase model. For an itemset $X$, its TWU is calculated as the sum of the transaction utilities of all the transactions containing $X$. In the first phase, the transaction-weighted utilization is used as an effective upper bound of each candidate itemset, and its downward closure property is applied to the search space to reduce the number of candidate itemsets. In the second phase, an additional database scan is performed to find the real utility values of the remaining candidate itemsets for discovering high utility itemsets.

3 The proposed incremental utility mining algorithm

In the past, the updated database was processed in a batch way to handle transaction insertion. Lin et al. thus proposed the FUP-HUI algorithm [20] to update the discovered utility itemsets based on the FUP concept in incremental utility mining. The database, however, must still be rescanned if some itemsets are small in the original database but large in the newly inserted transactions. In this paper, the proposed incremental mining approach efficiently maintains and updates the discovered high utility itemsets by integrating and modifying the two-phase algorithm [24] and the pre-large concept [10]. The downward closure property is applied in the proposed approach to reduce the number of candidates and thereby the computational time of scanning the database.
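The transaction-weighted utilization that both the two-phase model and the proposed approach rely on can be sketched as follows. This is an illustrative sketch rather than the implementation from [24]; the names `transaction_utility`, `twu`, `toy_db`, and `toy_profit` and the dictionary-based transaction layout are assumptions, and the toy data are the two transactions from the example in Sect. 1.

```python
from typing import Dict, Iterable, List

# Hypothetical layout: a transaction maps each purchased item to its quantity.
Transaction = Dict[str, int]


def transaction_utility(t: Transaction, profit: Dict[str, int]) -> int:
    """Utility of a whole transaction: sum of quantity * profit of its items."""
    return sum(q * profit[i] for i, q in t.items())


def twu(itemset: Iterable[str],
        transactions: List[Transaction],
        profit: Dict[str, int]) -> int:
    """Transaction-weighted utilization: sum of the transaction utilities
    of all transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(transaction_utility(t, profit)
               for t in transactions if items.issubset(t))


toy_db = [{"B": 2, "C": 1}, {"A": 3, "C": 2}]  # illustrative data
toy_profit = {"A": 1, "B": 3, "C": 2}

# Downward closure: a superset never has a larger TWU than its subsets.
assert twu(["B", "C"], toy_db, toy_profit) <= twu(["B"], toy_db, toy_profit)
assert twu(["B", "C"], toy_db, toy_profit) <= twu(["C"], toy_db, toy_profit)
```

Because every transaction containing an itemset also contains each of its subsets, the TWU can only shrink as items are added, which is what allows candidates to be pruned level by level.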
Figure 1 shows that an itemset in Case 7 cannot possibly be large in the updated database as long as the total utility of the new transactions does not exceed the safety transaction utility bound $f$. In that situation, the original database does not need to be rescanned for maintaining the discovered information. When new transactions are inserted into the original database, the proposed incremental algorithm is performed to update the discovered high utility itemsets. The candidate 1-itemsets in the new transactions are first derived with their transaction-weighted utilizations and their actual utility values. The upper utility threshold and the lower utility threshold are extended from the pre-large concept [10] for respectively deriving the high utility itemsets and the pre-large utility itemsets; both are normally defined as a percentage between 0 and 1. Based on the two utility thresholds, the generated candidate 1-itemsets in the new transactions can be divided into three parts with nine cases according to whether they have large, pre-large, or small transaction-weighted utilization in the original database. Individual procedures are then applied to each part to maintain and update the candidate 1-itemsets for generating high transaction-weighted utilization 1-itemsets in the updated database. Note that both the transaction-weighted utilizations and the actual utility values of the generated candidate 1-itemsets are updated at the same time. The candidate 2-itemsets are then formed from the remaining large (high) and pre-large transaction-weighted utilization 1-itemsets. The same procedure is repeated level by level until all high utility itemsets have been maintained and updated. The details of the proposed algorithm are described below. A variable, $buf$, is used to record the total utility value of new transactions since the last rescan of the original database. The notations used in the proposed incremental utility algorithm are given at the beginning of this paper.

3.1 Theoretical foundation

In the pre-large concept [10], the number of new transactions and the number of transactions in the original database are respectively replaced by $TU^d$ and $TU^D$ in utility mining. The safety transaction utility bound $f$ in utility mining is then modified as

$$f = \frac{(S_u - S_l) \times TU^D}{1 - S_u}.$$

It is formally shown below that an itemset in Case 7 cannot possibly be large for the entire updated database as long as the total utility in the new transactions does not exceed the safety bound $f$, that is,

$$TU^d \le \frac{(S_u - S_l) \times TU^D}{1 - S_u}.$$

Theorem 1 Let $S_l$ and $S_u$ be respectively the lower and the upper utility thresholds, and let $TU^D$ and $TU^d$ be respectively the total utility in the original database and in the new transactions. If $TU^d \le \frac{(S_u - S_l) \times TU^D}{1 - S_u}$, the transaction-weighted utilization of an itemset in Case 7 [10] cannot be large in the entire updated database.
Proof Since $1 - S_u > 0$, the condition $TU^d \le \frac{(S_u - S_l) \times TU^D}{1 - S_u}$ gives the following derivation:

$$TU^d \le \frac{(S_u - S_l) \times TU^D}{1 - S_u}$$
$$\Rightarrow TU^d (1 - S_u) \le (S_u - S_l) \times TU^D$$
$$\Rightarrow TU^d - TU^d \times S_u \le TU^D \times S_u - TU^D \times S_l$$
$$\Rightarrow TU^d + TU^D \times S_l \le S_u (TU^D + TU^d)$$
$$\Rightarrow \frac{TU^d + TU^D \times S_l}{TU^D + TU^d} \le S_u. \quad (1)$$

For an itemset in Case 7, the transaction-weighted utilization of the itemset $X$ is small (neither large nor pre-large) in the original database $D$, so its transaction-weighted utilization $twu^D(X)$ in $D$ must be less than $S_l \times TU^D$. Therefore,

$$twu^D(X) < S_l \times TU^D. \quad (2)$$

Besides, the transaction-weighted utilization of the itemset $X$ is large in the new transactions $d$, so $twu^d(X)$ must be larger than or equal to $S_u \times TU^d$ (and hence larger than $S_l \times TU^d$), while it cannot exceed the total utility $TU^d$ of the new transactions. Therefore,

$$TU^d \ge twu^d(X) \ge S_u \times TU^d > S_l \times TU^d. \quad (3)$$

In utility mining, the ratio of an itemset $X$ in the updated database $U$ is calculated as $\frac{twu^U(X)}{TU^D + TU^d}$, which can be bounded by formulas (2) and (3) as:

$$\frac{twu^U(X)}{TU^D + TU^d} = \frac{twu^D(X) + twu^d(X)}{TU^D + TU^d} < \frac{S_l \times TU^D + TU^d}{TU^D + TU^d}. \quad (4)$$

From formulas (1) and (4), the updated ratio for an itemset $X$ always stays below the upper threshold:

$$\frac{twu^U(X)}{TU^D + TU^d} < \frac{S_l \times TU^D + TU^d}{TU^D + TU^d} \le S_u. \quad (5)$$

The transaction-weighted utilization of an itemset $X$ in Case 7 is therefore not large in the entire updated database when the total utility $TU^d$ in the new transactions is smaller than or equal to $\frac{(S_u - S_l) \times TU^D}{1 - S_u}$.

Example Assume the total utility in the original database is set at 100 ($TU^D = 100$), and the lower and upper utility thresholds are respectively set at 50 % and 60 % ($S_l = 0.5$, $S_u = 0.6$). The safety bound $f$ is calculated as:

$$f = \frac{(S_u - S_l) \times TU^D}{1 - S_u} = \frac{(0.6 - 0.5) \times 100}{1 - 0.6} = 25.$$

The database does not need to be rescanned if the total utility in the new transactions does not exceed 25. In that case, an itemset $X$ that is small in the original database is definitely not large in the updated database.
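The bound of Theorem 1 is simple to evaluate; the sketch below is one possible way to do so and is not taken from the paper. The function name `safety_utility_bound` and its parameter names are hypothetical, and exact rational arithmetic via `fractions.Fraction` is an added design choice, made because binary floating point can land slightly below or above the exact bound.

```python
from fractions import Fraction
from typing import Union

Number = Union[int, float, str, Fraction]


def safety_utility_bound(s_u: Number, s_l: Number,
                         tu_original: Number) -> Fraction:
    """Return f = (S_u - S_l) * TU^D / (1 - S_u) as an exact fraction."""
    # Going through str() turns a float such as 0.6 into exactly 3/5.
    su, sl = Fraction(str(s_u)), Fraction(str(s_l))
    if not 0 <= sl < su < 1:
        raise ValueError("thresholds must satisfy 0 <= S_l < S_u < 1")
    return (su - sl) * Fraction(str(tu_original)) / (1 - su)


# Thresholds of 50 % and 60 % with a total utility of 100.
print(safety_utility_bound(0.6, 0.5, 100))
```

A caller would then compare the accumulated utility of the inserted transactions with this value to decide whether a rescan of the original database can be skipped.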
From Theorem 1, the itemsets in Case 7 can thus be efficiently handled.

3.2 The proposed PRE-HUI algorithm

INPUT: A profit table $P$ of items, an original database $D$, an upper utility threshold $S_u$ (the same as the minimum high utility threshold), a lower utility threshold $S_l$, the total utility $TU^D$ of $D$, the large (high) transaction-weighted utilization itemsets $HTWU^D$ and the pre-large transaction-weighted utilization itemsets $PTWU^D$ with their transaction-weighted utilization values and actual utility values discovered from $D$, the safety transaction utility buffer $buf$ for preserving the total utility value of the last processed transactions, and a set $d$ of new transactions.

OUTPUT: A set of high utility itemsets ($HU^U$) for the updated database $U$ ($= D \cup d$).

STEP 1: Calculate the safety transaction utility bound $f$ as:

$$f = \frac{S_u - S_l}{1 - S_u} \times TU^D.$$

STEP 2: Calculate the item utility value $u_{kj}$ of each item $i_j$ in each new transaction $t_k$ as $u_{kj} = q_{kj} \times p_j$, where $q_{kj}$ is the quantity of $i_j$ in $t_k$ and $p_j$ is the profit of $i_j$ in the profit table $P$. Sum up the utility values of all the items in each transaction $t_k$ as the transaction utility $tu_k$:

$$tu_k = \sum_{j=1}^{m} u_{kj}.$$

Add the transaction utilities of all the new transactions in $d$ as the total utility $TU^d$:

$$TU^d = \sum_{k=1}^{n} tu_k.$$

STEP 3: Calculate the total utility $TU^U$ for the updated database as $TU^U = TU^D + TU^d$, where $TU^D$ is the total utility in the original database $D$, and $TU^d$ is the total utility in the new transactions $d$.

STEP 4: Generate the candidate 1-itemsets $C_1$, which are the items appearing in the new transactions $d$.

STEP 5: Set $r = 1$, where $r$ records the number of items in the itemsets currently being processed.

STEP 6: For each candidate $r$-itemset $X$ in $C_r$, calculate the transaction-weighted utilization $twu^d(X)$ and the actual utility $au^d(X)$ respectively as:

$$twu^d(X) = \sum_{t_k \in d \,\wedge\, X \subseteq t_k} tu_k \quad \text{and} \quad au^d(X) = \sum_{t_k \in d \,\wedge\, X \subseteq t_k} \; \sum_{i_j \in X} u_{kj},$$

where $twu^d(X)$ is the sum of the transaction utilities of the new transactions in $d$ that contain the itemset $X$, and $au^d(X)$ is the sum of the utilities of the items of $X$ in those transactions.
STEP 7: For each large (high) transaction-weighted utilization itemset $X$ in $HTWU^D_r$ in the original database, do the following substeps (Cases 1, 2, and 3):

Substep 7.1: Set the updated transaction-weighted utilization $twu^U(X)$ of itemset $X$ in the entire updated database as $twu^U(X) = twu^D(X) + twu^d(X)$, where $twu^D(X)$ was kept in the set of $HTWU^D$ in the original database, and $twu^d(X)$ was calculated in STEP 6 from the new transactions.

Substep 7.2: Set the updated actual utility $au^U(X)$ of itemset $X$ in the entire updated database as $au^U(X) = au^D(X) + au^d(X)$, where $au^D(X)$ was kept in the set of $HTWU^D$ in the original database, and $au^d(X)$ was calculated in STEP 6 from the new transactions.

Substep 7.3: If $\frac{twu^U(X)}{TU^U} \ge S_u$, put itemset $X$ in the set of $HTWU^U_r$, the large (high) transaction-weighted utilization $r$-itemsets in the updated database; if $S_l \le \frac{twu^U(X)}{TU^U} < S_u$, put itemset $X$ in the set of $PTWU^U_r$, the pre-large transaction-weighted utilization $r$-itemsets in the updated database; otherwise, discard itemset $X$ since it is small after the database is updated.

STEP 8: For each pre-large transaction-weighted utilization itemset in $PTWU^D_r$ in the original database, do the substeps in STEP 7 (Cases 4, 5, and 6).

STEP 9: For each candidate $r$-itemset $X$ in $C_r$ that does not appear in the sets of $HTWU^D_r$ and $PTWU^D_r$ and for which $twu^d(X) \ge TU^d \times S_l$, put $X$ in the set of Rescan_Items, which is processed if the original database is rescanned in STEP 10.
STEP 10: If $(buf + TU^d) \le f$ or the set of Rescan_Items is empty, nothing is done in this step; otherwise, do the following substeps for each candidate $r$-itemset $X$ in the set of Rescan_Items:

Substep 10.1: Rescan the original database to calculate the transaction-weighted utilization $twu^D(X)$ and the actual utility $au^D(X)$ respectively as:

$$twu^D(X) = \sum_{t_k \in D \,\wedge\, X \subseteq t_k} tu_k \quad \text{and} \quad au^D(X) = \sum_{t_k \in D \,\wedge\, X \subseteq t_k} \; \sum_{i_j \in X} u_{kj},$$

where $twu^D(X)$ is the sum of the transaction utilities of the transactions in the original database $D$ that contain the itemset $X$, and $au^D(X)$ is the sum of the utilities of the items of $X$ in those transactions.

Substep 10.2: Set the updated transaction-weighted utilization $twu^U(X)$ of itemset $X$ in the entire updated database as $twu^U(X) = twu^D(X) + twu^d(X)$.

Substep 10.3: Set the updated actual utility $au^U(X)$ of itemset $X$ in the entire updated database as $au^U(X) = au^D(X) + au^d(X)$.

Substep 10.4: If $\frac{twu^U(X)}{TU^U} \ge S_u$, put itemset $X$ in the set of $HTWU^U_r$ as a large (high) transaction-weighted utilization $r$-itemset in the updated database; if $S_l \le \frac{twu^U(X)}{TU^U} < S_u$, put itemset $X$ in the set of $PTWU^U_r$ as a pre-large transaction-weighted utilization $r$-itemset in the updated database; otherwise, discard itemset $X$ since it is small after the database is updated.

STEP 11: Form the candidate $(r + 1)$-itemsets $C_{r+1}$ from the large transaction-weighted utilization $r$-itemsets $HTWU^U_r$ and the pre-large transaction-weighted utilization $r$-itemsets $PTWU^U_r$, i.e., from $HTWU^U_r \cup PTWU^U_r$.

STEP 12: Set $r = r + 1$.

STEP 13: Repeat STEPs 6 to 12 until no updated large (high) or pre-large transaction-weighted utilization itemsets are found.

STEP 14: Process each itemset $X$ in the set of $HTWU^U$; if $\frac{au^U(X)}{TU^U} \ge S_u$, itemset $X$ is a high utility itemset. Put itemset $X$ into the set of $HU^U$.

STEP 15: If $(buf + TU^d) > f$, set $TU^D = TU^U$ and $buf = 0$; otherwise, set $buf = buf + TU^d$.

STEP 16: Set $HTWU^D = HTWU^U$ as the set of the original large (high) transaction-weighted utilization itemsets and $PTWU^D = PTWU^U$ as the set of the original pre-large transaction-weighted utilization itemsets for the next transaction insertion in incremental mining.

By STEP 13, the large and the pre-large transaction-weighted utilization itemsets have been found in the first phase. The actual utility values are then confirmed in the second phase, from STEP 14 to STEP 16. After STEP 16, the high utility itemsets have thus been updated. The set of $HU^U$ in STEP 14 includes all the high utility itemsets after the database is updated.

Fig. 2 The flowchart of the proposed PRE-HUI algorithm
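The threshold tests and the buffer bookkeeping in the steps above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the authors' code: the function names `classify`, `needs_rescan`, and `update_state` and their parameters are hypothetical, and all utilities are passed in as plain numbers.

```python
from typing import Collection, Tuple


def classify(twu_updated: float, tu_updated: float,
             s_u: float, s_l: float) -> str:
    """Substeps 7.3 and 10.4: compare twu^U(X) / TU^U with both thresholds."""
    ratio = twu_updated / tu_updated
    if ratio >= s_u:
        return "large"
    if ratio >= s_l:
        return "pre-large"
    return "small"


def needs_rescan(buf: float, tu_new: float, f: float,
                 rescan_items: Collection) -> bool:
    """STEP 10: rescan only if the safety bound is exceeded
    and Rescan_Items is not empty."""
    return (buf + tu_new) > f and len(rescan_items) > 0


def update_state(buf: float, tu_new: float, tu_original: float,
                 tu_updated: float, f: float) -> Tuple[float, float]:
    """STEP 15: return the (TU^D, buf) pair kept for the next insertion."""
    if buf + tu_new > f:
        return tu_updated, 0
    return tu_original, buf + tu_new


# Illustrative call: thresholds of 30 % and 20 %, twu^U = 108, TU^U = 560.
print(classify(108, 560, s_u=0.3, s_l=0.2))
```

Keeping the three decisions as separate pure functions is only one possible design; it makes each step of the procedure easy to check in isolation.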
The procedure of the proposed algorithm is summarized in Fig. 2.

4 An illustrated example of the PRE-HUI algorithm

In this section, an example is given to illustrate the proposed incremental utility mining algorithm for transaction insertion. Before the proposed algorithm can process inserted transactions, the high transaction-weighted utilization itemsets and the pre-large transaction-weighted utilization itemsets must first be derived from the original database. The original database used to derive this information is shown in Table 1. It consists of 8 transactions with 5 items, denoted by A to E. The upper utility threshold is set at 30 %, and the lower utility threshold is set at 20 %. Note that the upper utility threshold is the same as the minimum high utility threshold in traditional utility mining. The profits of the items are shown in Table 2. Based on the two utility thresholds, the two-phase algorithm is first performed to find the high transaction-weighted utilization itemsets and the pre-large transaction-weighted utilization itemsets with their actual utilities. Since the two-phase approach is stated in [24], its procedure for deriving these itemsets is not described here. The results are respectively shown in Tables 3 and 4.

Table 1 Original database in the example (columns: TID, A, B, C, D, E)

Table 2 Profit table
Item    Profit ($)
A       6
B       2
C       15
D       7
E       10

Table 3 Large (high) transaction-weighted utilization itemsets and their actual utilities (columns: itemset, transaction-weighted utilization, actual utility; itemsets: {A}, {C}, {E})

Table 4 Pre-large transaction-weighted utilization itemsets and their actual utilities (columns: itemset, transaction-weighted utilization, actual utility; itemsets: {B}, {AC}, {AE}, {CE}, {ACE})

Assume that the four new transactions shown in Table 5 are inserted into the original database. The proposed incremental utility algorithm processes the new transactions as follows.

Table 5 Four new transactions inserted into the database (columns: TID, A, B, C, D, E)

STEP 1 The safety transaction utility bound is used to evaluate whether the original database should be rescanned for the transaction insertion. It is calculated as

$$f = \frac{S_u - S_l}{1 - S_u} \times TU^D = \frac{0.3 - 0.2}{1 - 0.3} \times 350 = 50,$$

where $S_u$ and $S_l$ are the upper and lower utility thresholds, respectively, and $TU^D$ is the total utility in the original database $D$.

STEP 2 The utility value of each item occurring in each new transaction in Table 5 is first calculated. Take the first transaction as an example to illustrate the process. The items with their quantities in the first transaction are (B : 2, C : 3, D : 7), and the profits for the three 1-itemsets {B}, {C}, and {D} are 2, 15, and 7, respectively. The transaction utility for the first transaction in Table 5 is thus calculated as $tu_9 = (2 \times 2) + (3 \times 15) + (7 \times 7) = 98$. The other transactions in Table 5 are calculated in the same way. The results are shown in Table 6. The total utility of the four new transactions, $TU^d$, is then calculated as the sum of their transaction utilities, which is 210.

Table 6 Transaction utilities for the four new transactions (columns: TID, A, B, C, D, E, tu)

STEP 3 In this example, the total utility of the updated database is calculated as $TU^U = 350 + 210 = 560$.
STEP 4 The items that appear in the new transactions are used to generate the candidate 1-itemsets, which are {B}, {C}, {D}, and {E} in this example.

STEP 5 The variable r is initially set at 1.

STEP 6 The transaction-weighted utilization and the actual utility of each candidate 1-itemset in the new transactions are calculated. Take the 1-itemset {B} as an example. It appears in transactions 9, 11, and 12. The value of $twu^d(B)$ is the sum of the transaction utilities of these three transactions, which is 200. The actual utility $au^d(B)$ is calculated at the same time as $(2 \times 2) + (5 \times 2) + (3 \times 2) = 20$. The other items are calculated in the same way. The results are shown in Table 7.
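The two quantities computed in STEP 6 can be sketched in Python as follows. Only the first transaction (B: 2, C: 3, D: 7) is taken from the text. The other two transactions and all identifiers are hypothetical, so the printed values do not reproduce Table 7.

```python
from collections import defaultdict

PROFIT = {"A": 6, "B": 2, "C": 15, "D": 7, "E": 10}  # unit profits (Table 2)

# Toy batch of inserted transactions. Only t9 comes from the text;
# t10 and t11 are hypothetical and do NOT reproduce Table 5.
NEW_TRANSACTIONS = {
    "t9": {"B": 2, "C": 3, "D": 7},
    "t10": {"E": 1},
    "t11": {"B": 5, "D": 4},
}


def twu_and_au(transactions, profit):
    """Return (twu, au) dictionaries for every 1-itemset in the batch."""
    twu = defaultdict(int)
    au = defaultdict(int)
    for quantities in transactions.values():
        # Transaction utility: total profit of the whole transaction.
        tu = sum(q * profit[item] for item, q in quantities.items())
        for item, q in quantities.items():
            twu[item] += tu               # each contained item inherits tu
            au[item] += q * profit[item]  # the item's own contribution only
    return dict(twu), dict(au)


twu, au = twu_and_au(NEW_TRANSACTIONS, PROFIT)
print(twu)
print(au)
```

The transaction-weighted utilization of an item is never smaller than its actual utility, which is why it can serve as an upper bound for pruning.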

Table 7 Transaction-weighted utilization and the actual utility of each candidate 1-itemset in the new transactions (columns: 1-itemset, twu, au; rows: {B}, {C}, {D}, {E})

STEP 7 Each 1-itemset in the set of large (high) transaction-weighted utilization 1-itemsets $HTWU_1^D$ of the original database is then processed. In this example, the 1-itemsets {A}, {C}, and {E} are processed. The transaction-weighted utilization of 1-itemset {A} in the original database is $twu^D(A) = 108$, as shown in Table 3. Its transaction-weighted utilization in the new transactions is 0. The updated transaction-weighted utilization of {A} is thus $108 + 0 = 108$. The actual utility of {A} in the original database is $au^D(A) = 48$, as shown in Table 3, and its actual utility in the new transactions is $au^d(A) = 0$. The updated actual utility of {A} is thus $48 + 0 = 48$. The 1-itemsets {C} and {E} are processed in the same way. The results are shown in Table 8.

Table 8 The updated 1-itemsets after STEP 7 (columns: 1-itemset, twu, au, Ratio, Updated result)
{A}: twu 108, au 48, ratio 19.3 %, Small
{C}: ratio 57 %, Large
{E}: ratio 28.9 %, Pre-large

In this example, the updated transaction-weighted utilization ratio of 1-itemset {A} is 108/560 (= 19.3 %), which is smaller than the lower utility threshold (20 %). {A} is therefore small in the updated database and is discarded. The updated transaction-weighted utilization ratio of 1-itemset {C} is 57 %, which is larger than the upper utility threshold (30 %). {C} is thus put into the set $HTWU_1^U$. The updated transaction-weighted utilization ratio of 1-itemset {E} is 28.9 %, which is larger than the lower utility threshold but smaller than the upper utility threshold. {E} is thus put into the set $PTWU_1^U$. At this point, $HTWU_1^U = \{C\}$ and $PTWU_1^U = \{E\}$.
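The three-way decision made in STEP 7 can be written as a small Python function. This is a minimal sketch: the function name is illustrative, the treatment of a ratio exactly equal to a threshold (counted as reaching it) is an assumption, and the value 150 in the usage lines is hypothetical.

```python
from fractions import Fraction

UPPER = Fraction(30, 100)  # upper utility threshold S_u
LOWER = Fraction(20, 100)  # lower utility threshold S_l


def classify(twu, total_utility, upper=UPPER, lower=LOWER):
    """Label an itemset by its transaction-weighted utilization ratio.

    Assumption: a ratio equal to a threshold counts as reaching it.
    """
    ratio = Fraction(twu, total_utility)
    if ratio >= upper:
        return "large"
    if ratio >= lower:
        return "pre-large"
    return "small"


print(classify(108, 560))  # {A}: 108/560 is below 20 %
print(classify(287, 560))  # {B}: 287/560 is above 30 %
print(classify(150, 560))  # hypothetical value between the two thresholds
```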
STEP 8 Each 1-itemset in the set of pre-large transaction-weighted utilization 1-itemsets $PTWU_1^D$ of the original database is then processed. In this example, only 1-itemset {B} is processed. The transaction-weighted utilization of {B} in the original database is $twu^D(B) = 87$, as shown in Table 4. Its transaction-weighted utilization in the new transactions is $twu^d(B) = 200$. The updated transaction-weighted utilization of {B} is thus $87 + 200 = 287$. The actual utility of {B} in the original database is $au^D(B) = 24$, as shown in Table 4, and its actual utility in the new transactions is $au^d(B) = 20$. The updated actual utility of {B} is thus $24 + 20 = 44$. The results are shown in Table 9.

Table 9 The updated 1-itemsets after STEP 8
1-itemset: {B}; twu: 287; au: 44; Ratio: 51.2 %; Updated result: Large

In this example, the updated transaction-weighted utilization ratio of 1-itemset {B} is 287/560 (= 51.2 %), which is larger than the upper utility threshold (30 %). {B} is thus put into the set $HTWU_1^U$. At this point, $HTWU_1^U = \{B, C\}$ and $PTWU_1^U = \{E\}$.

STEP 9 The 1-itemsets {B}, {C}, and {E} in Table 7 have already been processed in STEPs 7 and 8. The remaining 1-itemset {D} appears in neither $HTWU_1^D$ nor $PTWU_1^D$. The transaction-weighted utilization of {D} in the new transactions is $twu^d(D) = 200$, as shown in Table 7, and the total utility of the new transactions is $TU^d = 210$. The transaction-weighted utilization ratio of {D} in the new transactions is thus 200/210 (= 95.2 %), which is larger than the lower utility threshold (20 %). {D} is therefore put into the set Rescan_Items, to be used if the database has to be rescanned in STEP 10. The results are shown in Table 10.

Table 10 The values of remaining 1-itemsets in the new transactions
1-itemset: {D}; twu: 200; au: 105; Ratio: 95.2 %; Updated result: Add to the set of Rescan_Items
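STEPs 7 to 9 amount to adding the new counts to the stored ones and flagging itemsets that were not stored at all. The Python sketch below illustrates this with the values quoted in the text. The helper names are hypothetical, and treating a ratio equal to the lower threshold as sufficient is an assumption.

```python
def merge_counts(stored, new):
    """Add the utilities found in the new transactions to the stored ones."""
    return {itemset: value + new.get(itemset, 0) for itemset, value in stored.items()}


def rescan_items(twu_new, kept_itemsets, total_new_utility, lower):
    """Itemsets absent from the stored large and pre-large sets whose ratio
    in the new transactions reaches the lower threshold (assumed >=)."""
    return sorted(
        itemset
        for itemset, value in twu_new.items()
        if itemset not in kept_itemsets and value >= lower * total_new_utility
    )


stored_twu = {"A": 108, "B": 87}  # twu^D values quoted in the text
new_twu = {"B": 200, "D": 200}    # twu^d values quoted in the text

updated_twu = merge_counts(stored_twu, new_twu)
to_rescan = rescan_items(new_twu, {"A", "B", "C", "E"}, 210, 0.2)

print(updated_twu)
print(to_rescan)
```

Only the itemsets returned by `rescan_items` require access to the original database, and only when the safety bound is exceeded.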
STEP 10 Since no transactions have been processed before in this example, buf is initially set at 0. The total utility of the new transactions is $TU^d = 210$. Thus, $buf + TU^d = 0 + 210 = 210$, which is larger than the safety transaction utility bound ($f = 50$). The original database in Table 1 must therefore be rescanned to find the transaction-weighted utilization $twu^D(D)$ and the actual utility $au^D(D)$ of 1-itemset {D} in the original database, which are 67 and 63, respectively. The updated transaction-weighted utilization and actual utility of {D} in the updated database are calculated as $twu^U(D) = twu^D(D) + twu^d(D) = 67 + 200 = 267$ and $au^U(D) = au^D(D) + au^d(D) = 63 + 105 = 168$. The updated transaction-weighted utilization ratio of {D} is 267/560 (= 47.7 %), which is larger than the upper utility threshold (30 %). {D} is thus put into the set of large (high) transaction-weighted utilization 1-itemsets $HTWU_1^U$. The large (high) and pre-large transaction-weighted utilization 1-itemsets with their actual utilities in the updated database are shown in Table 11.

Table 11 Large (high) and pre-large transaction-weighted utilization 1-itemsets with their actual utilities (columns: itemset, transaction-weighted utilization, actual utility; large (high): {B}, {C}, {D}; pre-large: {E})

STEP 11 Based on the downward closure property, the candidate 2-itemsets are formed from Table 11 using an Apriori-like approach. The generated candidates are {BC}, {BD}, {BE}, {CD}, {CE}, and {DE}.

STEP 12 The variable r is set at 2.

STEP 13 STEPs 6 to 12 are repeated until no candidate itemsets are generated. The final results are shown in Table 12.

Table 12 Final results for large (high) and pre-large transaction-weighted utilization itemsets with their actual utilities (columns: itemset, transaction-weighted utilization, actual utility; large (high): {B}, {C}, {D}, {BC}, {BD}, {CD}, {BCD}; pre-large: {E})

STEP 14 The final large (high) transaction-weighted utilization itemsets in Table 12 are then checked to determine whether they are high-utility itemsets in the updated database. Take 1-itemset {B} as an example. Its updated actual utility is 44, so its ratio in the updated database is 44/560 (= 7.9 %), which is smaller than the lower utility threshold and hence also smaller than the upper utility threshold (30 %). The 1-itemset {B} is thus not a high-utility itemset in the updated database.

Table 13 The final results (columns: itemset, actual utility, ratio (%), HU; itemsets marked HU: {C}, {D}, {BD}, {BCD}; itemsets not marked: {B}, {BC}, {CD})
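The candidate generation of STEP 11 and the check applied to {B} in STEP 14 can be sketched in Python as follows. The generator below is a simplified brute-force stand-in for the Apriori-like join, not the authors' implementation: it enumerates all (r+1)-combinations of the surviving items and keeps those whose r-subsets all survived. The function names are illustrative, and the use of >= in the high-utility test is an assumption.

```python
from fractions import Fraction
from itertools import combinations


def generate_candidates(kept_itemsets, r):
    """Build (r+1)-itemset candidates whose r-subsets are all kept
    (downward closure property)."""
    kept = {frozenset(s) for s in kept_itemsets}
    items = sorted(set().union(*kept))
    return [
        combo
        for combo in combinations(items, r + 1)
        if all(frozenset(sub) in kept for sub in combinations(combo, r))
    ]


def is_high_utility(actual_utility, total_utility, upper=Fraction(30, 100)):
    """STEP 14 check: actual utility ratio against the upper threshold."""
    return Fraction(actual_utility, total_utility) >= upper


# 1-itemsets of Table 11 (large and pre-large together).
candidates_2 = generate_candidates([("B",), ("C",), ("D",), ("E",)], 1)
# 2-itemsets listed as large in Table 12.
candidates_3 = generate_candidates([("B", "C"), ("B", "D"), ("C", "D")], 2)

print(candidates_2)
print(candidates_3)
print(is_high_utility(44, 560), is_high_utility(168, 560))
```

With the actual utilities quoted in the text, {B} (44 out of 560) fails the test while {D} (168 out of 560) sits exactly on the 30 % threshold and passes it.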
The other large (high) transaction-weighted utilization itemsets in Table 12 are processed in the same way. The results are shown in Table 13.

STEP 15 Since $buf + TU^d = 0 + 210 = 210$ is larger than the safety transaction utility bound ($f = 50$), buf is set at 0 and the transaction utility of the original database is updated as $TU^D = TU^U = 560$ for the next transaction insertion.

STEP 16 In this example, $HTWU^U = \{B, C, D, BC, BD, CD, BCD\}$ and $PTWU^U = \{E\}$. They are respectively taken as the large (high) and the pre-large transaction-weighted utilization itemsets and are stored as $HTWU^D$ and $PTWU^D$ for the next transaction insertion. The final high-utility itemsets are {C}, {D}, {BD}, and {BCD}.

5 Experimental results

Experiments are conducted to evaluate the performance of the two-phase TP-HUI algorithm in batch mode [24], the incremental algorithm for transaction insertion based on FUP concepts (FUP-HUI) [20], and the proposed PRE-HUI. The algorithms are implemented in Java and executed on a PC with a 3.0 GHz CPU and 4 GB of memory. Two datasets are used in the experiments, namely a simulation dataset from the IBM data generator [14] and the real-world foodmart database [25]. A simulation model [24] was developed to generate the quantities of items in the transactions produced by the IBM data generator [14]. The quantities range from 1 to 5, and the profits in the utility table are randomly set from 0.01 to 10. The foodmart dataset was collected from an anonymous chain store [25]. It is a quantitative database of the products sold by the chain store, with 21,556 transactions and 1,559 items.
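The simulation model for the synthetic data (quantities from 1 to 5, profits from 0.01 to 10) might be sketched in Python as below. The text only states the ranges, so the uniform distributions, the rounding to two decimals, the function names, and the tiny input database are all assumptions made for illustration.

```python
import random


def make_profit_table(items, rng):
    """Assign each item a unit profit between 0.01 and 10.
    A uniform distribution is assumed here; the text does not specify one."""
    return {item: round(rng.uniform(0.01, 10.0), 2) for item in items}


def attach_quantities(transactions, rng):
    """Give every item occurrence an integer purchase quantity from 1 to 5."""
    return [{item: rng.randint(1, 5) for item in t} for t in transactions]


rng = random.Random(0)  # fixed seed so repeated runs give the same data

# Hypothetical binary transactions standing in for the generator's output.
binary_db = [["a", "b"], ["b", "c", "d"], ["a", "d"]]

profits = make_profit_table(["a", "b", "c", "d"], rng)
quantitative_db = attach_quantities(binary_db, rng)

print(profits)
print(quantitative_db)
```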

In the experimental evaluation, the TP-HUI [24], FUP-HUI [20], and the proposed PRE-HUI algorithms are compared. When new transactions are inserted, the TP-HUI algorithm has to rescan the updated database to extract the updated high utility itemsets in batch mode. The FUP-HUI algorithm divides the itemsets into four parts according to their transaction-weighted utilizations in the original database and in the new transactions. Each part is then processed individually to update the discovered information. For the FUP-HUI algorithm, the original database must be rescanned if an itemset has a large (high) transaction-weighted utilization in the new transactions but a small transaction-weighted utilization in the original database.

5.1 Experimental results of the simulation database

The IBM data generator [14] is used to generate two simulation databases called T10I4N4KD200K and T20I8N8KD400K. The average number of items in a transaction is denoted as T, the average length of maximal potentially frequent itemsets is denoted as I, the total number of unique items is denoted as N, and the total number of transactions is denoted as D. For T10I4N4KD200K, the first 200,000 transactions are used to initially mine the large (high) and pre-large transaction-weighted utilization itemsets with their actual utility values. Batches of 2,000 transactions are then sequentially extracted bottom-up from the original database to form the new transactions for each insertion. The minimum high utility threshold (upper utility threshold) is set at 0.2 % to evaluate the performance of TP-HUI, FUP-HUI, and the proposed PRE-HUI. The lower utility threshold is set at 0.17 % for the PRE-HUI algorithm. Figure 3 compares the execution times of the three algorithms. The proposed PRE-HUI algorithm runs faster than the other two algorithms for transaction insertion, since the safety transaction utility bound defined for newly inserted transactions reduces the time spent on rescanning. The FUP-HUI algorithm, however, must rescan the whole database whenever an itemset falls into Case 3, and the TP-HUI algorithm has to process the updated database in batch mode whenever transactions are inserted.

Fig. 3 Comparison of execution time for sequentially inserting transactions

Experiments are also made to evaluate the efficiency of the algorithms under different minimum high utility thresholds. The minimum high utility threshold (upper utility threshold) is varied from 0.2 % to 1.0 % in increments of 0.2 %. The lower utility threshold for the proposed algorithm is correspondingly varied from 0.17 % to 0.97 %, staying 0.03 % below the upper utility threshold. The results are shown in Fig. 4. The execution time of the proposed PRE-HUI is less than those of the TP-HUI and FUP-HUI algorithms for handling transaction insertion under all of these thresholds. The average numbers of high transaction-weighted utilization itemsets and pre-large transaction-weighted utilization itemsets are then compared to show the space complexity of the three algorithms. The results are shown in Table 14. The proposed PRE-HUI algorithm keeps both the high transaction-weighted utilization itemsets and the pre-large transaction-weighted utilization itemsets in order to shorten the execution time. Fig. 4 and Table 14 together show that keeping the additional pre-large transaction-weighted utilization itemsets reduces the execution time compared with the TP-HUI and the FUP-HUI algorithms.

Fig. 4 Comparison of execution time in different minimum utility thresholds

Table 14 Comparison of numbers of high transaction-weighted utilization itemsets and pre-large transaction-weighted utilization itemsets (columns: TP-HUI/FUP-HUI, PRE-HUI; one row per threshold setting)

An experiment is then made to evaluate the efficiency of the proposed PRE-HUI algorithm for various numbers of inserted transactions. The upper utility threshold (minimum high utility threshold) and the lower utility threshold are set at 0.2 % and 0.17 %, respectively. The numbers of inserted transactions are 4,000, 8,000, 12,000, 16,000, and 20,000. The results are shown in Fig. 5, which indicates that the proposed PRE-HUI algorithm is the fastest of the three.

Fig. 5 Comparison of execution time for various numbers of inserted transactions

The T20I8N8KD400K database is also used to evaluate the scalability of the proposed PRE-HUI algorithm. Batches of 4,000 transactions are sequentially extracted bottom-up from the original database to form the new transactions for each insertion. The minimum high utility threshold (upper utility threshold) is set at 0.53 % to evaluate the performance of the TP-HUI, FUP-HUI, and the proposed PRE-HUI algorithms. The lower utility threshold is set at 0.51 % for the PRE-HUI algorithm. The results are shown in Fig. 6, which also shows that the proposed PRE-HUI algorithm runs faster than the other two algorithms for transaction insertion.

Fig. 6 Comparison of execution time for sequentially inserting transactions

Fig. 7 Comparison of execution time in different minimum utility thresholds

Table 15 Comparison of numbers of high transaction-weighted utilization itemsets and pre-large transaction-weighted utilization itemsets (columns: TP-HUI/FUP-HUI, PRE-HUI; one row per threshold setting)

Experiments are then made to evaluate the efficiency of the algorithms with different minimum high utility thresholds. The minimum high utility threshold (upper utility threshold) is varied from 0.53 % to 0.61 % in increments of 0.02 %.
The lower utility threshold for the proposed algorithm is correspondingly varied from 0.51 % to 0.59 %, staying 0.02 % below the upper utility threshold. The results are shown in Fig. 7. The proposed PRE-HUI is faster than the TP-HUI and FUP-HUI algorithms for handling transaction insertion under all of these minimum high utility thresholds. The average numbers of the high transaction-weighted utilization itemsets and the pre-large transaction-weighted utilization itemsets are then compared to show the space complexity of the three algorithms. The results are shown in Table 15. Fig. 7 and Table 15 together show that keeping the additional pre-large transaction-weighted utilization itemsets reduces the execution time compared with the TP-HUI and the FUP-HUI algorithms.

Experiments are then made to evaluate the efficiency of the algorithms for various numbers of inserted transactions. The upper utility threshold (minimum high utility threshold) and the lower utility threshold are set at 0.2 % and 0.17 %, respectively. The numbers of inserted transactions are 4,000, 8,000, 12,000, 16,000, and 20,000. The results are shown in Fig. 8, which indicates that the proposed PRE-HUI algorithm is the fastest of the three.

Fig. 8 Comparison of execution time for various numbers of inserted transactions

5.2 Experimental results of the foodmart dataset

In this section, the foodmart dataset is used to compare the three algorithms. The first 21,556 transactions are initially used to mine the large (high) and pre-large transaction-weighted utilization itemsets with their actual utility values. Batches of 215 transactions are then sequentially extracted bottom-up from the original database to form the new transactions for each insertion. The minimum high utility threshold (upper utility threshold) is set at 0.01 % to evaluate the performance of the three algorithms. The lower utility threshold is set at %. Figure 9 shows the execution times of the three algorithms as the batches of 215 transactions are sequentially inserted into the original database. The proposed PRE-HUI algorithm runs faster than the other two algorithms for transaction insertion.

Fig. 9 Comparison of execution time for transaction insertion

Experiments are also made to evaluate the efficiency of the algorithms under different minimum high utility thresholds. The minimum high utility threshold (upper utility threshold) is set from 0.01 % to %, in increments of % each time. The lower utility threshold for the proposed algorithm is set from % to %, staying % below the upper utility threshold. The results are shown in Fig. 10. The execution time of the proposed PRE-HUI is shorter than those of the TP-HUI and FUP-HUI algorithms for handling transaction insertion in the foodmart dataset.

Fig. 10 Comparison of execution time in different minimum utility thresholds

Table 16 Comparison of numbers of high transaction-weighted utilization itemsets and pre-large transaction-weighted utilization itemsets (columns: TP-HUI/FUP-HUI, PRE-HUI; one row per threshold setting)
The average numbers of the high transaction-weighted utilization itemsets and the pre-large transaction-weighted utilization itemsets are then compared to show the space complexity of the three algorithms. The results are shown in Table 16. Fig. 10 and Table 16 together show that keeping the additional pre-large transaction-weighted utilization itemsets reduces the execution time compared with the TP-HUI and FUP-HUI algorithms.

Finally, experiments are made to evaluate the efficiency of the algorithms for various numbers of inserted transactions. The minimum high utility threshold (upper utility threshold) and the lower utility threshold are set at 0.01 % and %, respectively. The numbers of inserted transactions were 215, 430, 645, 860, and The results are shown in Fig. 11, which indicates that the proposed PRE-HUI algorithm is faster than the other two algorithms. The above results about time and space complexity are thus acceptable.

Fig. 11 Comparison of execution time in various numbers of inserted transactions

6 Conclusion and future works

In real-world applications, new transactions are constantly inserted into the original database. It is thus important to efficiently update and maintain the discovered utility itemsets for transaction insertion. In this paper, an incremental algorithm for efficiently mining high utility itemsets is proposed for transaction insertion based on the pre-large concept. When new transactions are inserted into the original database, the proposed incremental algorithm partitions itemsets into three parts with nine cases according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database and in the new transactions. Each part is then processed individually to maintain the discovered high utility itemsets. The proposed approach has the following advantages: (1) It re-uses the already discovered information to efficiently maintain and update the high utility itemsets without rescanning the entire updated database for every insertion. (2) The downward closure property is kept to reduce the number of candidates when generating the high utility itemsets level by level. (3) The computational time can be greatly reduced by handling only a small number of itemsets in the inserted transactions. (4) The defined upper and lower utility thresholds serve as effective thresholds for deriving the high (large) and pre-large utility itemsets, respectively. The experimental results show that the proposed incremental high utility mining algorithm outperforms the two-phase algorithm and the previous FUP-HUI algorithm in incremental mining, and the correctness of the proposed PRE-HUI algorithm is also proved. Since transaction deletion and transaction modification also occur in real-world applications, these two topics will be explored in the future to develop more efficient approaches for maintaining and updating the discovered knowledge.

References

1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: The 20th international conference on very large data bases
2. Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6)
3. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: International conference on management of data
4. Berzal F, Cubero JC, Marín N, Serrano JM (2001) Tbar: an efficient method for association rule mining in relational databases. Data Knowl Eng 37(1)
5. Brin S, Motwani R, Ullman JD, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. SIGMOD Rec 26(2)
6. Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: IEEE international conference on data mining
7. Chen MS, Han J, Yu PS (1996) Data mining: an overview from a database perspective. IEEE Trans Knowl Data Eng 8(6)
8. Cheung DW, Jiawei H, Ng VT, Wong CY (1996) Maintenance of discovered association rules in large databases: an incremental updating technique. In: The international conference on data engineering
9. Hong TP, Wu CH (2011) An improved weighted clustering algorithm for determination of application nodes in heterogeneous sensor networks. J Inf Hiding Multimed Signal Process 2
10. Hong TP, Wang CY, Tao YH (2001) A new incremental data mining algorithm using pre-large itemsets. Intell Data Anal 5
11. Hong TP, Lin CW, Wu YL (2008) Incrementally fast updated frequent pattern trees. Expert Syst Appl 34(4)
12. Hong TP, Lin CW, Yang KT, Wang SL (2013) Using TF-IDF to hide sensitive itemsets. Appl Intell 38(4)
13. Hu K, Lu Y, Zhou L, Shi C (1999) Integrating classification and association rule mining: a concept lattice framework. In: The international workshop on new directions in rough sets, data mining, and granular-soft computing
14. IBM quest data mining project, Quest synthetic data generation code
15. Lent B, Swami A, Widom J (1997) Clustering association rules. In: The international conference on data engineering
16. Li YC, Yeh JS, Chang CC (2005) Direct candidates generation: a novel algorithm for discovering complete share-frequent itemsets. In: Lecture notes in computer science, vol 3614
17. Li YC, Yeh JS, Chang CC (2005) Efficient algorithms for mining share-frequent itemsets. In: The world congress of international fuzzy systems association
18. Li YC, Yeh JS, Chang CC (2005) Fast algorithm for mining share-frequent itemsets. In: The Asia Pacific web conference
19. Lin CW, Hong TP (2013) A survey of fuzzy web mining. Wiley Interdiscip Rev: Data Min Knowl Discov 3(3)
20. Lin CW, Lan GC, Hong TP (2012) An incremental mining algorithm for high utility itemsets. Expert Syst Appl 39(8)
21. Lin CW, Hong TP, Chang CC, Wang SL (2013) A greedy-based approach for hiding sensitive itemsets by transaction insertion. J Inf Hiding Multimed Signal Process 4(4)
22. Liu YH (2013) Stream mining on univariate uncertain data. Appl Intell. doi: /s
23. Liu Y, Liao W-k, Choudhary A (2005) A fast high utility itemsets mining algorithm. In: The international workshop on utility-based data mining
24. Liu Y, Liao W-k, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Lecture notes in computer science

25. Microsoft. Example database foodmart of Microsoft analysis services. (SQL.80).aspx
26. Park JS, Chen MS, Yu PS (1995) An effective hash-based algorithm for mining association rules. SIGMOD Rec 24(2)
27. Park JS, Chen MS, Yu PS (1997) Using a hash-based method with transaction trimming for mining association rules. IEEE Trans Knowl Data Eng 9(5)
28. Sarda NL, Srinivas NV (1998) An adaptive algorithm for incremental mining of association rules. In: The international workshop on database and expert systems applications
29. Song W, Liu Y, Li J (2013) Mining high utility itemsets by dynamically pruning the tree structure. Appl Intell. doi: /s
30. Sucahyo Y, Gopalan R (2005) Building a more accurate classifier based on strong frequent patterns. In: Lecture notes in computer science
31. Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases. Data Knowl Eng 59(3)
32. Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: The SIAM international conference on data mining
33. Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) New algorithms for fast discovery of association rules. In: International conference on knowledge discovery and data mining

Chun-Wei Lin received his B.S. and M.S. degrees in Information Management from I-Shou University, Taiwan, in 2002 and 2006, respectively, and his Ph.D. degree in Computer Science and Information Engineering from National Cheng Kung University. He is an Associate Professor in the Innovative Information Industry Research Center (IIIRC) at Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China. His research interests include data mining, fuzzy-set theory, machine learning, artificial intelligence, and social computing.

Tzung-Pei Hong received his B.S. degree in chemical engineering from National Taiwan University in 1985, and his Ph.D. degree in computer science and information engineering from National Chiao-Tung University. From 1987 to 1994, he was with the Laboratory of Knowledge Engineering, National Chiao-Tung University, where he was involved in applying techniques of parallel processing to artificial intelligence. He was an associate professor at the Department of Computer Science in Chung-Hua Polytechnic Institute from 1992 to 1994, and at the Department of Information Management in I-Shou University (originally Kaohsiung Polytechnic Institute) from 1994. He was a professor in I-Shou University from 1999. He was in charge of the whole computerization and library planning for National University of Kaohsiung in Preparation from 1997 to 2000 and served as the first director of the library and computer center in National University of Kaohsiung from 2000 to 2001, as the Dean of Academic Affairs from 2003 to 2006, as the Administrative Vice President from 2007 to 2008, and as the Academic Vice President from 2010. He is currently a distinguished professor at the Department of Computer Science and Information Engineering and at the Department of Electrical Engineering. His current research interests include knowledge engineering, soft computing and granular computing.

Guo-Cheng Lan received his B.S. and M.S. degrees from the Department of Information Management in Southern Taiwan University, Tainan, Taiwan, in 2004 and 2006, respectively, and his Ph.D. degree in Computer Science and Information Engineering from National Cheng Kung University, Taiwan. He is a member of the Taiwanese Association for Artificial Intelligence (TAAI). He is currently a postdoctoral research fellow in Computer Science and Information Engineering at National Cheng Kung University, Taiwan. His current research interests include data mining, medical informatics, soft computing, ontology knowledge, fuzzy theory, and www applications.

Jia-Wei Wong received his B.S. degree in Computer Science and Information Engineering from National University of Kaohsiung, Taiwan, in 2010, and his M.S. degree in Computer Science and Information Engineering from National Sun Yat-sen University, Taiwan. His research interests include privacy data mining and fuzzy theory.

Wen-Yang Lin is a professor at the Department of Computer Science and Information Engineering, National University of Kaohsiung. He received his Ph.D. in Computer Science and Information Engineering from National Taiwan University. From 2004 to 2007, he chaired the Department of Computer Science and Information Engineering at National University of Kaohsiung, and he served as the Director of Computer Science and Information Center from 2008. His current research interests include data mining, data warehousing, and evolutionary computation. He has co-edited several special issues of renowned international journals, (co-)authored more than 140 refereed publications, served as co-chair and as a member of program committees, and organized special sessions for many international conferences, including ASONAM, IEEE SMC, WCCI, and IEA/AIE. He is a member of IEEE and the Taiwanese AI Association.


High Utility Web Access Patterns Mining from Distributed Databases High Utility Web Access Patterns Mining from Distributed Databases Md.Azam Hosssain 1, Md.Mamunur Rashid 1, Byeong-Soo Jeong 1, Ho-Jin Choi 2 1 Database Lab, Department of Computer Engineering, Kyung Hee

More information

An Efficient Tree-based Fuzzy Data Mining Approach

An Efficient Tree-based Fuzzy Data Mining Approach 150 International Journal of Fuzzy Systems, Vol. 12, No. 2, June 2010 An Efficient Tree-based Fuzzy Data Mining Approach Chun-Wei Lin, Tzung-Pei Hong, and Wen-Hsiang Lu Abstract 1 In the past, many algorithms

More information

Efficient Mining of Uncertain Data for High-Utility Itemsets

Efficient Mining of Uncertain Data for High-Utility Itemsets Efficient Mining of Uncertain Data for High-Utility Itemsets Jerry Chun-Wei Lin 1(B), Wensheng Gan 1, Philippe Fournier-Viger 2, Tzung-Pei Hong 3,4, and Vincent S. Tseng 5 1 School of Computer Science

More information

Efficient Remining of Generalized Multi-supported Association Rules under Support Update

Efficient Remining of Generalized Multi-supported Association Rules under Support Update Efficient Remining of Generalized Multi-supported Association Rules under Support Update WEN-YANG LIN 1 and MING-CHENG TSENG 1 Dept. of Information Management, Institute of Information Engineering I-Shou

More information

RHUIET : Discovery of Rare High Utility Itemsets using Enumeration Tree

RHUIET : Discovery of Rare High Utility Itemsets using Enumeration Tree International Journal for Research in Engineering Application & Management (IJREAM) ISSN : 2454-915 Vol-4, Issue-3, June 218 RHUIET : Discovery of Rare High Utility Itemsets using Enumeration Tree Mrs.

More information

EFIM: A Fast and Memory Efficient Algorithm for High-Utility Itemset Mining

EFIM: A Fast and Memory Efficient Algorithm for High-Utility Itemset Mining Under consideration for publication in Knowledge and Information Systems EFIM: A Fast and Memory Efficient Algorithm for High-Utility Itemset Mining Souleymane Zida, Philippe Fournier-Viger 2, Jerry Chun-Wei

More information

Maintenance of Generalized Association Rules for Record Deletion Based on the Pre-Large Concept

Maintenance of Generalized Association Rules for Record Deletion Based on the Pre-Large Concept Proceedings of the 6th WSEAS Int. Conf. on Artificial Intelligence, Knowledge Engineering and ata Bases, Corfu Island, Greece, February 16-19, 2007 142 Maintenance of Generalized Association Rules for

More information

A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases *

A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases * A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases * Shichao Zhang 1, Xindong Wu 2, Jilian Zhang 3, and Chengqi Zhang 1 1 Faculty of Information Technology, University of Technology

More information

CHUIs-Concise and Lossless representation of High Utility Itemsets

CHUIs-Concise and Lossless representation of High Utility Itemsets CHUIs-Concise and Lossless representation of High Utility Itemsets Vandana K V 1, Dr Y.C Kiran 2 P.G. Student, Department of Computer Science & Engineering, BNMIT, Bengaluru, India 1 Associate Professor,

More information

SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases

SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases Jinlong Wang, Congfu Xu, Hongwei Dan, and Yunhe Pan Institute of Artificial Intelligence, Zhejiang University Hangzhou, 310027,

More information

An Efficient Algorithm for finding high utility itemsets from online sell

An Efficient Algorithm for finding high utility itemsets from online sell An Efficient Algorithm for finding high utility itemsets from online sell Sarode Nutan S, Kothavle Suhas R 1 Department of Computer Engineering, ICOER, Maharashtra, India 2 Department of Computer Engineering,

More information

A Graph-Based Approach for Mining Closed Large Itemsets

A Graph-Based Approach for Mining Closed Large Itemsets A Graph-Based Approach for Mining Closed Large Itemsets Lee-Wen Huang Dept. of Computer Science and Engineering National Sun Yat-Sen University huanglw@gmail.com Ye-In Chang Dept. of Computer Science and

More information

UP-Growth: An Efficient Algorithm for High Utility Itemset Mining

UP-Growth: An Efficient Algorithm for High Utility Itemset Mining UP-Growth: An Efficient Algorithm for High Utility Itemset Mining Vincent S. Tseng 1, Cheng-Wei Wu 1, Bai-En Shie 1, and Philip S. Yu 2 1 Department of Computer Science and Information Engineering, National

More information

Generation of Potential High Utility Itemsets from Transactional Databases

Generation of Potential High Utility Itemsets from Transactional Databases Generation of Potential High Utility Itemsets from Transactional Databases Rajmohan.C Priya.G Niveditha.C Pragathi.R Asst.Prof/IT, Dept of IT Dept of IT Dept of IT SREC, Coimbatore,INDIA,SREC,Coimbatore,.INDIA

More information

Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Apriori Algorithm For a given set of transactions, the main aim of Association Rule Mining is to find rules that will predict the occurrence of an item based on the occurrences of the other items in the

More information

FHM: Faster High-Utility Itemset Mining using Estimated Utility Co-occurrence Pruning

FHM: Faster High-Utility Itemset Mining using Estimated Utility Co-occurrence Pruning FHM: Faster High-Utility Itemset Mining using Estimated Utility Co-occurrence Pruning Philippe Fournier-Viger 1 Cheng Wei Wu 2 Souleymane Zida 1 Vincent S. Tseng 2 presented by Ted Gueniche 1 1 University

More information

Enhancing the Performance of Mining High Utility Itemsets Based On Pattern Algorithm

Enhancing the Performance of Mining High Utility Itemsets Based On Pattern Algorithm Enhancing the Performance of Mining High Utility Itemsets Based On Pattern Algorithm Ranjith Kumar. M 1, kalaivani. A 2, Dr. Sankar Ram. N 3 Assistant Professor, Dept. of CSE., R.M. K College of Engineering

More information

REDUCTION OF LARGE DATABASE AND IDENTIFYING FREQUENT PATTERNS USING ENHANCED HIGH UTILITY MINING. VIT University,Chennai, India.

REDUCTION OF LARGE DATABASE AND IDENTIFYING FREQUENT PATTERNS USING ENHANCED HIGH UTILITY MINING. VIT University,Chennai, India. International Journal of Pure and Applied Mathematics Volume 109 No. 5 2016, 161-169 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu doi: 10.12732/ijpam.v109i5.19

More information

Minig Top-K High Utility Itemsets - Report

Minig Top-K High Utility Itemsets - Report Minig Top-K High Utility Itemsets - Report Daniel Yu, yuda@student.ethz.ch Computer Science Bsc., ETH Zurich, Switzerland May 29, 2015 The report is written as a overview about the main aspects in mining

More information

Discovering High Utility Change Points in Customer Transaction Data

Discovering High Utility Change Points in Customer Transaction Data Discovering High Utility Change Points in Customer Transaction Data Philippe Fournier-Viger 1, Yimin Zhang 2, Jerry Chun-Wei Lin 3, and Yun Sing Koh 4 1 School of Natural Sciences and Humanities, Harbin

More information

Efficient High Utility Itemset Mining using Buffered Utility-Lists

Efficient High Utility Itemset Mining using Buffered Utility-Lists Noname manuscript No. (will be inserted by the editor) Efficient High Utility Itemset Mining using Buffered Utility-Lists Quang-Huy Duong 1 Philippe Fournier-Viger 2( ) Heri Ramampiaro 1( ) Kjetil Nørvåg

More information

EFIM: A Highly Efficient Algorithm for High-Utility Itemset Mining

EFIM: A Highly Efficient Algorithm for High-Utility Itemset Mining EFIM: A Highly Efficient Algorithm for High-Utility Itemset Mining Souleymane Zida 1, Philippe Fournier-Viger 1, Jerry Chun-Wei Lin 2, Cheng-Wei Wu 3, Vincent S. Tseng 3 1 Dept. of Computer Science, University

More information

UNCORRECTED PROOF ARTICLE IN PRESS. 1 Expert Systems with Applications. 2 Mining knowledge from object-oriented instances

UNCORRECTED PROOF ARTICLE IN PRESS. 1 Expert Systems with Applications. 2 Mining knowledge from object-oriented instances 1 Expert Systems with Applications Expert Systems with Applications xxx (26) xxx xxx wwwelseviercom/locate/eswa 2 Mining knowledge from object-oriented instances 3 Cheng-Ming Huang a, Tzung-Pei Hong b,

More information

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3

More information

An Efficient Generation of Potential High Utility Itemsets from Transactional Databases

An Efficient Generation of Potential High Utility Itemsets from Transactional Databases An Efficient Generation of Potential High Utility Itemsets from Transactional Databases Velpula Koteswara Rao, Ch. Satyananda Reddy Department of CS & SE, Andhra University Visakhapatnam, Andhra Pradesh,

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK EFFICIENT ALGORITHMS FOR MINING HIGH UTILITY ITEMSETS FROM TRANSACTIONAL DATABASES

More information

AN EFFECTIVE WAY OF MINING HIGH UTILITY ITEMSETS FROM LARGE TRANSACTIONAL DATABASES

AN EFFECTIVE WAY OF MINING HIGH UTILITY ITEMSETS FROM LARGE TRANSACTIONAL DATABASES AN EFFECTIVE WAY OF MINING HIGH UTILITY ITEMSETS FROM LARGE TRANSACTIONAL DATABASES 1Chadaram Prasad, 2 Dr. K..Amarendra 1M.Tech student, Dept of CSE, 2 Professor & Vice Principal, DADI INSTITUTE OF INFORMATION

More information

CS570 Introduction to Data Mining

CS570 Introduction to Data Mining CS570 Introduction to Data Mining Frequent Pattern Mining and Association Analysis Cengiz Gunay Partial slide credits: Li Xiong, Jiawei Han and Micheline Kamber George Kollios 1 Mining Frequent Patterns,

More information

Utility Mining: An Enhanced UP Growth Algorithm for Finding Maximal High Utility Itemsets

Utility Mining: An Enhanced UP Growth Algorithm for Finding Maximal High Utility Itemsets Utility Mining: An Enhanced UP Growth Algorithm for Finding Maximal High Utility Itemsets C. Sivamathi 1, Dr. S. Vijayarani 2 1 Ph.D Research Scholar, 2 Assistant Professor, Department of CSE, Bharathiar

More information

Chapter 4: Mining Frequent Patterns, Associations and Correlations

Chapter 4: Mining Frequent Patterns, Associations and Correlations Chapter 4: Mining Frequent Patterns, Associations and Correlations 4.1 Basic Concepts 4.2 Frequent Itemset Mining Methods 4.3 Which Patterns Are Interesting? Pattern Evaluation Methods 4.4 Summary Frequent

More information

Chapter 4: Association analysis:

Chapter 4: Association analysis: Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their day-to-day operations, huge amounts of customer purchase data are collected daily

More information

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery : Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Hong Cheng Philip S. Yu Jiawei Han University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center {hcheng3, hanj}@cs.uiuc.edu,

More information

WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity

WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity Unil Yun and John J. Leggett Department of Computer Science Texas A&M University College Station, Texas 7783, USA

More information

Keywords: Frequent itemset, closed high utility itemset, utility mining, data mining, traverse path. I. INTRODUCTION

Keywords: Frequent itemset, closed high utility itemset, utility mining, data mining, traverse path. I. INTRODUCTION ISSN: 2321-7782 (Online) Impact Factor: 6.047 Volume 4, Issue 11, November 2016 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case

More information

CHAPTER 5 WEIGHTED SUPPORT ASSOCIATION RULE MINING USING CLOSED ITEMSET LATTICES IN PARALLEL

CHAPTER 5 WEIGHTED SUPPORT ASSOCIATION RULE MINING USING CLOSED ITEMSET LATTICES IN PARALLEL 68 CHAPTER 5 WEIGHTED SUPPORT ASSOCIATION RULE MINING USING CLOSED ITEMSET LATTICES IN PARALLEL 5.1 INTRODUCTION During recent years, one of the vibrant research topics is Association rule discovery. This

More information

A Survey of Sequential Pattern Mining

A Survey of Sequential Pattern Mining Data Science and Pattern Recognition c 2017 ISSN XXXX-XXXX Ubiquitous International Volume 1, Number 1, February 2017 A Survey of Sequential Pattern Mining Philippe Fournier-Viger School of Natural Sciences

More information

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the Chapter 6: What Is Frequent ent Pattern Analysis? Frequent pattern: a pattern (a set of items, subsequences, substructures, etc) that occurs frequently in a data set frequent itemsets and association rule

More information

Salah Alghyaline, Jun-Wei Hsieh, and Jim Z. C. Lai

Salah Alghyaline, Jun-Wei Hsieh, and Jim Z. C. Lai EFFICIENTLY MINING FREQUENT ITEMSETS IN TRANSACTIONAL DATABASES This article has been peer reviewed and accepted for publication in JMST but has not yet been copyediting, typesetting, pagination and proofreading

More information

Applying Data Mining to Wireless Networks

Applying Data Mining to Wireless Networks Applying Data Mining to Wireless Networks CHENG-MING HUANG 1, TZUNG-PEI HONG 2 and SHI-JINN HORNG 3,4 1 Department of Electrical Engineering National Taiwan University of Science and Technology, Taipei,

More information

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm S.Pradeepkumar*, Mrs.C.Grace Padma** M.Phil Research Scholar, Department of Computer Science, RVS College of

More information

Mining Frequent Patterns without Candidate Generation

Mining Frequent Patterns without Candidate Generation Mining Frequent Patterns without Candidate Generation Outline of the Presentation Outline Frequent Pattern Mining: Problem statement and an example Review of Apriori like Approaches FP Growth: Overview

More information

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH International Journal of Information Technology and Knowledge Management January-June 2011, Volume 4, No. 1, pp. 27-32 DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY)

More information

PFPM: Discovering Periodic Frequent Patterns with Novel Periodicity Measures

PFPM: Discovering Periodic Frequent Patterns with Novel Periodicity Measures PFPM: Discovering Periodic Frequent Patterns with Novel Periodicity Measures 1 Introduction Frequent itemset mining is a popular data mining task. It consists of discovering sets of items (itemsets) frequently

More information

ISSN Vol.03,Issue.09 May-2014, Pages:

ISSN Vol.03,Issue.09 May-2014, Pages: www.semargroup.org, www.ijsetr.com ISSN 2319-8885 Vol.03,Issue.09 May-2014, Pages:1786-1790 Performance Comparison of Data Mining Algorithms THIDA AUNG 1, MAY ZIN OO 2 1 Dept of Information Technology,

More information

Frequent Pattern Mining. Based on: Introduction to Data Mining by Tan, Steinbach, Kumar

Frequent Pattern Mining. Based on: Introduction to Data Mining by Tan, Steinbach, Kumar Frequent Pattern Mining Based on: Introduction to Data Mining by Tan, Steinbach, Kumar Item sets A New Type of Data Some notation: All possible items: Database: T is a bag of transactions Transaction transaction

More information

Mining High Utility Itemsets from Large Transactions using Efficient Tree Structure

Mining High Utility Itemsets from Large Transactions using Efficient Tree Structure Mining High Utility Itemsets from Large Transactions using Efficient Tree Structure T.Vinothini Department of Computer Science and Engineering, Knowledge Institute of Technology, Salem. V.V.Ramya Shree

More information

Results and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets

Results and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets Results and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets Sheetal K. Labade Computer Engineering Dept., JSCOE, Hadapsar Pune, India Srinivasa Narasimha

More information

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey G. Shivaprasad, N. V. Subbareddy and U. Dinesh Acharya

More information

Efficient High Utility Itemset Mining using extended UP Growth on Educational Feedback Dataset

Efficient High Utility Itemset Mining using extended UP Growth on Educational Feedback Dataset Efficient High Utility Itemset Mining using extended UP Growth on Educational Feedback Dataset Yamini P. Jawale 1, Prof. Nilesh Vani 2 1 Reasearch Scholar, Godawari College of Engineering,Jalgaon. 2 Research

More information

620 HUANG Liusheng, CHEN Huaping et al. Vol.15 this itemset. Itemsets that have minimum support (minsup) are called large itemsets, and all the others

620 HUANG Liusheng, CHEN Huaping et al. Vol.15 this itemset. Itemsets that have minimum support (minsup) are called large itemsets, and all the others Vol.15 No.6 J. Comput. Sci. & Technol. Nov. 2000 A Fast Algorithm for Mining Association Rules HUANG Liusheng (ΛΠ ), CHEN Huaping ( ±), WANG Xun (Φ Ψ) and CHEN Guoliang ( Ξ) National High Performance Computing

More information

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department

More information

Mining Top-K Association Rules. Philippe Fournier-Viger 1 Cheng-Wei Wu 2 Vincent Shin-Mu Tseng 2. University of Moncton, Canada

Mining Top-K Association Rules. Philippe Fournier-Viger 1 Cheng-Wei Wu 2 Vincent Shin-Mu Tseng 2. University of Moncton, Canada Mining Top-K Association Rules Philippe Fournier-Viger 1 Cheng-Wei Wu 2 Vincent Shin-Mu Tseng 2 1 University of Moncton, Canada 2 National Cheng Kung University, Taiwan AI 2012 28 May 2012 Introduction

More information

Discovery of Frequent Itemset and Promising Frequent Itemset Using Incremental Association Rule Mining Over Stream Data Mining

Discovery of Frequent Itemset and Promising Frequent Itemset Using Incremental Association Rule Mining Over Stream Data Mining Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.923

More information

Privacy Breaches in Privacy-Preserving Data Mining

Privacy Breaches in Privacy-Preserving Data Mining 1 Privacy Breaches in Privacy-Preserving Data Mining Johannes Gehrke Department of Computer Science Cornell University Joint work with Sasha Evfimievski (Cornell), Ramakrishnan Srikant (IBM), and Rakesh

More information

Generating All Solutions of Minesweeper Problem Using Degree Constrained Subgraph Model

Generating All Solutions of Minesweeper Problem Using Degree Constrained Subgraph Model 356 Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'16 Generating All Solutions of Minesweeper Problem Using Degree Constrained Subgraph Model Hirofumi Suzuki, Sun Hao, and Shin-ichi Minato Graduate

More information

Research and Improvement of Apriori Algorithm Based on Hadoop

Research and Improvement of Apriori Algorithm Based on Hadoop Research and Improvement of Apriori Algorithm Based on Hadoop Gao Pengfei a, Wang Jianguo b and Liu Pengcheng c School of Computer Science and Engineering Xi'an Technological University Xi'an, 710021,

More information

ANU MLSS 2010: Data Mining. Part 2: Association rule mining

ANU MLSS 2010: Data Mining. Part 2: Association rule mining ANU MLSS 2010: Data Mining Part 2: Association rule mining Lecture outline What is association mining? Market basket analysis and association rule examples Basic concepts and formalism Basic rule measurements

More information

Mining Frequent Itemsets for data streams over Weighted Sliding Windows

Mining Frequent Itemsets for data streams over Weighted Sliding Windows Mining Frequent Itemsets for data streams over Weighted Sliding Windows Pauray S.M. Tsai Yao-Ming Chen Department of Computer Science and Information Engineering Minghsin University of Science and Technology

More information

Utility Mining Algorithm for High Utility Item sets from Transactional Databases

Utility Mining Algorithm for High Utility Item sets from Transactional Databases IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. V (Mar-Apr. 2014), PP 34-40 Utility Mining Algorithm for High Utility Item sets from Transactional

More information

Fast Algorithm for Finding the Value-Added Utility Frequent Itemsets Using Apriori Algorithm

Fast Algorithm for Finding the Value-Added Utility Frequent Itemsets Using Apriori Algorithm Fast Algorithm for Finding the Value-Added Utility Frequent Itemsets Using Apriori Algorithm G.Arumugam #1, V.K.Vijayakumar *2 # Senior Professor and Head, Department of Computer Science Madurai Kamaraj

More information

Data Mining Part 3. Associations Rules

Data Mining Part 3. Associations Rules Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets

More information

Mining Quantitative Association Rules on Overlapped Intervals

Mining Quantitative Association Rules on Overlapped Intervals Mining Quantitative Association Rules on Overlapped Intervals Qiang Tong 1,3, Baoping Yan 2, and Yuanchun Zhou 1,3 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China {tongqiang,

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

Mining Temporal Association Rules in Network Traffic Data

Mining Temporal Association Rules in Network Traffic Data Mining Temporal Association Rules in Network Traffic Data Guojun Mao Abstract Mining association rules is one of the most important and popular task in data mining. Current researches focus on discovering

More information

EFIM: A Fast and Memory Efficient Algorithm for High-Utility Itemset Mining

EFIM: A Fast and Memory Efficient Algorithm for High-Utility Itemset Mining EFIM: A Fast and Memory Efficient Algorithm for High-Utility Itemset Mining 1 High-utility itemset mining Input a transaction database a unit profit table minutil: a minimum utility threshold set by the

More information

BCB 713 Module Spring 2011

BCB 713 Module Spring 2011 Association Rule Mining COMP 790-90 Seminar BCB 713 Module Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline What is association rule mining? Methods for association rule mining Extensions

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Spring 2013 " An second class in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt13 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

A Survey on Efficient Algorithms for Mining HUI and Closed Item sets

A Survey on Efficient Algorithms for Mining HUI and Closed Item sets A Survey on Efficient Algorithms for Mining HUI and Closed Item sets Mr. Mahendra M. Kapadnis 1, Mr. Prashant B. Koli 2 1 PG Student, Kalyani Charitable Trust s Late G.N. Sapkal College of Engineering,

More information

Lecture 2 Wednesday, August 22, 2007

Lecture 2 Wednesday, August 22, 2007 CS 6604: Data Mining Fall 2007 Lecture 2 Wednesday, August 22, 2007 Lecture: Naren Ramakrishnan Scribe: Clifford Owens 1 Searching for Sets The canonical data mining problem is to search for frequent subsets

More information

Web page recommendation using a stochastic process model

Web page recommendation using a stochastic process model Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,

More information

A Review on High Utility Mining to Improve Discovery of Utility Item set

A Review on High Utility Mining to Improve Discovery of Utility Item set A Review on High Utility Mining to Improve Discovery of Utility Item set Vishakha R. Jaware 1, Madhuri I. Patil 2, Diksha D. Neve 3 Ghrushmarani L. Gayakwad 4, Venus S. Dixit 5, Prof. R. P. Chaudhari 6

More information

A Review on Mining Top-K High Utility Itemsets without Generating Candidates

A Review on Mining Top-K High Utility Itemsets without Generating Candidates A Review on Mining Top-K High Utility Itemsets without Generating Candidates Lekha I. Surana, Professor Vijay B. More Lekha I. Surana, Dept of Computer Engineering, MET s Institute of Engineering Nashik,

More information

Efficient Mining of Generalized Negative Association Rules

Efficient Mining of Generalized Negative Association Rules 2010 IEEE International Conference on Granular Computing Efficient Mining of Generalized egative Association Rules Li-Min Tsai, Shu-Jing Lin, and Don-Lin Yang Dept. of Information Engineering and Computer

More information

Mining Top-k High Utility Patterns Over Data Streams

Mining Top-k High Utility Patterns Over Data Streams Mining Top-k High Utility Patterns Over Data Streams Morteza Zihayat and Aijun An Technical Report CSE-2013-09 March 21 2013 Department of Computer Science and Engineering 4700 Keele Street, Toronto, Ontario
