Incrementally mining high utility patterns based on pre-large concept

Appl Intell (2014) 40: DOI /s z

Chun-Wei Lin, Tzung-Pei Hong, Guo-Cheng Lan, Jia-Wei Wong, Wen-Yang Lin

Published online: 27 August 2013
Springer Science+Business Media New York 2013

C.-W. Lin
Innovative Information Industry Research Center (IIIRC), Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, P.R. China
e-mail: jerrylin@ieee.org

C.-W. Lin
Shenzhen Key Laboratory of Internet Information Collaboration, School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, P.R. China

T.-P. Hong (B), W.-Y. Lin
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung 811, Taiwan, R.O.C.
e-mail: tphong@nuk.edu.tw
W.-Y. Lin e-mail: wylin@nuk.edu.tw

T.-P. Hong, J.-W. Wong
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 804, Taiwan, R.O.C.
J.-W. Wong e-mail: jwwong.alex@gmail.com

G.-C. Lan
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 701, Taiwan, R.O.C.
e-mail: rrfoheiay@gmail.com

Abstract In traditional association rule mining, most algorithms are designed to discover frequent itemsets from a binary database. Utility mining was thus proposed to measure the utility values of purchased items for revealing high utility itemsets from a quantitative database. In the past, a two-phase high utility mining algorithm was proposed for efficiently discovering high utility itemsets from a quantitative database. In dynamic data mining, transactions may be inserted into, deleted from, or modified in a database. In this case, a batch mining procedure must rescan the whole updated database to keep the mined information up to date. Designing an efficient approach for handling dynamic databases is thus a critical research issue in utility mining.
In this paper, an incremental mining algorithm is proposed for efficiently maintaining discovered high utility itemsets based on pre-large concepts. Itemsets are first partitioned into three parts according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database and in inserted transactions. Individual procedures are then executed for each part. Experimental results show that the proposed incremental high utility mining algorithm outperforms existing algorithms.

Keywords Utility mining, Pre-large itemset, High utility itemset, Incremental mining, Two-phase approach

Notation

$I$: A set of $m$ items, $I = \{i_1, i_2, \ldots, i_j, \ldots, i_m\}$, in which each item $i_j$ has its own profit value $p_j$;
$P$: The profit table, $\{p_1, p_2, \ldots, p_j, \ldots, p_m\}$, in which $p_j$ is the profit value of item $i_j$;
$D$: The original quantitative database, $D = \{T_1, T_2, \ldots, T_k, \ldots, T_n\}$, in which each transaction contains several items with purchase quantities;
$d$: The new transactions, $d = \{t_1, t_2, \ldots, t_k, \ldots, t_n\}$, in which each transaction contains several items with purchase quantities;
$U$: The entire updated database, i.e., $D \cup d$;
$TU^D$: The total utility of the transactions in $D$;
$TU^d$: The total utility of the transactions in $d$;

$TU^U$: The total utility of the transactions in $U$;
$q_{kj}$: The quantity of item $i_j$ in transaction $t_k$;
$u_{kj}$: The utility of item $i_j$ in transaction $t_k$, which is calculated as $q_{kj} \times p_j$;
$tu_k$: The transaction utility of the currently processed transaction $t_k$;
$buf$: A buffer used to store the total utility of the last processed transactions for transaction insertion. It is set to 0 after the database is rescanned;
$X$: An itemset containing several items $i_j$;
$S_u$: The upper utility threshold for large (high) transaction-weighted utilization and high utility itemsets. It is the same as the high utility threshold in traditional utility mining;
$S_l$: The lower utility threshold for pre-large transaction-weighted utilization and pre-large itemsets, where $S_u > S_l$;
$f$: The safety transaction utility bound for new transactions;
$C_r$: The set of candidate $r$-itemsets;
Rescan_Items: The set of itemsets that must be rescanned in the original database;
$HTWU^D_r$: The set of large (high) transaction-weighted utilization $r$-itemsets in the original database;
$PTWU^D_r$: The set of pre-large transaction-weighted utilization $r$-itemsets in the original database;
$HTWU^D$: The set of large (high) transaction-weighted utilization itemsets in the original database;
$PTWU^D$: The set of pre-large transaction-weighted utilization itemsets in the original database;
$HTWU^U_r$: The set of large (high) transaction-weighted utilization $r$-itemsets in the updated database;
$PTWU^U_r$: The set of pre-large transaction-weighted utilization $r$-itemsets in the updated database;
$HTWU^U$: The set of large (high) transaction-weighted utilization itemsets in the updated database;
$PTWU^U$: The set of pre-large transaction-weighted utilization itemsets in the updated database;
$HU^U$: The set of high utility itemsets in the updated database;
$twu^D(X)$: The transaction-weighted utilization of itemset $X$ in the original database;
$twu^d(X)$: The transaction-weighted utilization of itemset $X$ in the new transactions;
$twu^U(X)$: The transaction-weighted utilization of itemset $X$ in the updated database;
$au^D(X)$: The actual utility of itemset $X$ in the original database;
$au^d(X)$: The actual utility of itemset $X$ in the new transactions;
$au^U(X)$: The actual utility of itemset $X$ in the updated database.

1 Introduction

Mining association rules [1, 3, 11, 26, 33] is the most popular approach among data mining techniques [2, 7, 9, 12, 13, 15, 19, 21, 22, 30]. It first finds all frequent itemsets based on a user-defined minimum support threshold and then generates the association rules from the discovered frequent itemsets based on a user-defined minimum confidence threshold. In association rule mining, each item is treated as a binary variable for discovering interesting relationships between itemsets. The frequency of an itemset, however, is insufficient for identifying highly profitable itemsets, since frequent itemsets only reflect the presence or absence of items in a database. Other factors, such as price, quantity, or profit, are not captured by association rules. In real-world business, however, high-profit items such as jewels or gourmet foods are rarely bought in large quantities, and a retailer may be more interested in identifying the most valuable customers. Utility mining [6, 23, 29, 31, 32] was thus proposed to overcome this limitation of frequent itemsets. In utility mining, the local transaction utility and the external utility are usually defined as quantity and profit, respectively. Frequent-itemset mining can thus be regarded as a special case of utility mining in which all sold quantities and all item profits are 1. The utility of an itemset $X$ is denoted $u(X)$, which is the sum of the utilities of $X$ in all transactions containing $X$. The utility of an item in a transaction is its quantity multiplied by its profit.
If $u(X)$ is greater than or equal to the given minimum utility threshold, the itemset $X$ is considered a high utility itemset. An example is given below to illustrate how to calculate the utility value of an itemset. Suppose there are two transactions, TID 1 = {B: 2, C: 1} and TID 2 = {A: 3, C: 2}. The value appended to an item is its quantity. Assume the profits of the items are defined as {A: 1, B: 3, C: 2}. Thus, the utilities of the items A, B, and C are respectively calculated as $u(A) = 3 \times 1 = 3$, $u(B) = 2 \times 3 = 6$, and $u(C) = (1 \times 2) + (2 \times 2) = 6$. Also, the utilities of the itemsets BC and AC are respectively calculated as $u(BC) = (2 \times 3) + (1 \times 2) = 8$ and $u(AC) = (3 \times 1) + (2 \times 2) = 7$. Liu et al. proposed a two-phase algorithm and designed the transaction-weighted utilization (TWU) property for efficiently extracting high utility itemsets based on the downward closure property [24]. The transaction-weighted utilization is considered as an upper bound of each candidate

itemset in a transaction to reduce the number of candidate itemsets for later processing in a two-phase algorithm. An additional database scan is performed to determine the real utility values of the remaining candidates to identify high utility itemsets. In the above approaches, the database is assumed to be static and the mining processes are performed in batch mode. In real-world applications, transactions may be dynamically inserted into the original database. The discovered frequent itemsets may become invalid, or some new information may emerge in the updated database. Developing an efficient algorithm to incrementally update the discovered association rules is thus desirable in dynamic data mining. Hong et al. proposed the pre-large concept, which defines large and pre-large itemsets based on upper and lower thresholds and derives a safety number of new transactions to reduce the number of required database rescans [10]. In this paper, an incremental mining algorithm based on the pre-large concept is proposed to update discovered high utility itemsets for transaction insertion. The contributions of this paper are described below.

1. Traditional utility mining processes the database in batch mode regardless of whether new transactions are inserted. It is inefficient to discard the already discovered information and re-mine the whole database when only a small number of transactions are inserted. In this paper, an efficient approach is proposed for handling transaction insertion in utility mining.
2. The downward closure property from the two-phase approach [24] is applied in this paper to reduce the number of candidate itemsets in utility mining, thus reducing the processing time for mining high utility itemsets.
3. In the proposed algorithm, only a small number of itemsets must be rescanned to maintain the high utility itemsets, thus reducing the computation compared to the batch approach.
4. The upper bound utility and the lower bound utility are defined in this paper as the effective thresholds for respectively deriving high (large) and pre-large utility itemsets.

The remainder of this paper is organized as follows. Related works are reviewed in Sect. 2. The algorithm for handling transaction insertion in utility mining is described in Sect. 3. An example to illustrate the proposed algorithm is given in Sect. 4. Experimental results showing the performance of the proposed algorithm are provided in Sect. 5, and conclusions are finally given in Sect. 6.

2 Related works

In this section, association rule mining and high utility mining are respectively reviewed.

2.1 Association rule mining

Traditional data mining is used to extract useful itemsets or rules from a binary database. The most common approach is to generate association rules from a transactional database, such that the presence of certain items in a transaction implies the presence of some other items. Agrawal and Srikant proposed the Apriori algorithm [1] for mining association rules from a set of transactions level by level. The downward closure property is used to prune unpromising candidate itemsets, thus improving the efficiency of discovering association rules. Many algorithms have been proposed for efficiently discovering the desired association rules [1, 2, 4, 5, 7, 26–28, 33]. In real-world applications, transaction databases usually grow over time, while the procedure for mining association rules is performed in batch mode. Some new association rules may be generated and some existing ones may become invalid. Traditional batch mining algorithms solve this problem by rescanning the entire updated database when transactions are inserted, deleted, or modified in the dynamic databases.
Cheung et al. thus proposed the Fast UPdate (FUP) algorithm [8] to effectively handle transaction insertion for maintaining frequent itemsets. It divides the itemsets into four parts according to whether they are large or small in the original database and in the newly inserted transactions. Each part has its own procedure to update the discovered information. Under the FUP concept, however, the original database must still be rescanned whenever an itemset is large in the inserted transactions but small in the original database. Hong et al. proposed the pre-large concept for efficiently maintaining discovered rules in incremental data mining [10]. A pre-large itemset is not truly large (frequent), but might easily become large in the future through the data insertion process. Two support thresholds are used to respectively discover the large and pre-large itemsets so that the original database needs to be rescanned less often. Since rescanning the database requires a lot of computational time, the maintenance cost is thus reduced. The procedure for transaction insertion is as follows. When new transactions are inserted into the database, there are nine possible cases (see Fig. 1) [10]. Cases 1, 5, 6, 8, and 9 do not affect the final association rules. Cases 2 and 3 may remove some existing association rules, and Cases 4 and 7 may generate new association rules. If all large and pre-large itemsets from the original database are pre-stored, Cases 2, 3, and 4 can be easily handled. In the maintenance phase, the ratio of new transactions compared to those in the database is usually very small. It has been formally shown that an itemset in Case 7 cannot possibly be

large for the entire updated database as long as the number of new transactions is smaller than the safety number $f$ [10]:

$$f = \frac{(S_u - S_l)\, d}{1 - S_u},$$

where $S_u$ is the upper threshold, $S_l$ is the lower threshold, and $d$ is the number of transactions in the original database. In this paper, the pre-large concept is used to reduce the rescanning of the original database by tracking the amount of newly inserted transactions. The original database is required to be rescanned only when the number of inserted transactions is larger than the safety bound, which is more efficient than the batch approach and the FUP concept [8].

Fig. 1 Nine cases arising from newly inserted transactions into an existing database

2.2 High utility mining

Utility mining [6, 17, 32], an extension of frequent itemset mining, is based on the measurement of the local transaction utility and the external utility. The utility of an item in a transaction is defined as its quantity multiplied by its profit. The utility of an itemset in a transaction is the sum of the utilities of all items of the itemset in this transaction. If the summed utility of an itemset over all transactions, as a ratio of the total utility of the database, is larger than or equal to the user-specified minimum utility threshold, the itemset is considered a high utility itemset. Many algorithms have been proposed for mining high utility itemsets. Li et al. proposed the FSM, ShFSM, and DCG methods [16, 18] for finding high utility itemsets based on the level closure property. These approaches delete useless candidate itemsets according to the critical function of each candidate. Yao et al. proposed an algorithm for efficiently mining high utility itemsets based on the mathematical property of utility constraints [31]. Two pruning strategies are used to reduce the search space based on the utility upper bounds and expected utility upper bounds, respectively. Liu et al.
proposed a two-phase algorithm for efficiently discovering high utility itemsets [24] based on the downward closure property. The two phases generate and test high utility itemsets level by level. The transaction-weighted utilization (TWU) is defined as an upper bound that preserves the downward closure property in the two-phase model. For an itemset $X$, its TWU is calculated as the sum of the transaction utilities of all the transactions containing $X$. In the first phase, the transaction-weighted utilization is used as an effective upper bound on the utility of each candidate itemset, and its downward closure property is applied to the search space to reduce the number of candidate itemsets. In the second phase, an additional database scan is performed to find the real utility values of the remaining candidate itemsets for discovering high utility itemsets.

3 The proposed incremental utility mining algorithm

In the past, the updated database was processed in batch mode to handle transaction insertion. Lin et al. thus proposed the FUP-HUI algorithm [20] to update the discovered utility itemsets based on the FUP concept in incremental utility mining. The database, however, is still required to be rescanned if some itemsets are small in the original database but large in the newly inserted transactions. In this paper, the proposed incremental mining approach can efficiently maintain and update the discovered high utility itemsets by integrating and modifying the two-phase algorithm [24] and the pre-large concept [10]. The downward closure property is applied in the proposed approach to reduce the number of candidates and thereby the computational time of scanning the database.
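The transaction-weighted utilization and the two-phase idea reviewed in Sect. 2.2 can be illustrated with a minimal Python sketch on the two-transaction example from the Introduction. This is not the implementation of [24]; the function names (`transaction_utility`, `twu`, `utility`, `two_phase`) and the choice of a ratio threshold of 0.5 are assumptions made only for illustration.

```python
from itertools import combinations

# Data from the two-transaction example in the Introduction.
PROFIT = {"A": 1, "B": 3, "C": 2}
DB = [{"B": 2, "C": 1}, {"A": 3, "C": 2}]


def transaction_utility(t, profit):
    """tu: sum of quantity * profit over all items of one transaction."""
    return sum(q * profit[i] for i, q in t.items())


def twu(itemset, db, profit):
    """TWU: sum of the transaction utilities of all transactions containing the itemset."""
    return sum(transaction_utility(t, profit)
               for t in db if all(i in t for i in itemset))


def utility(itemset, db, profit):
    """Actual utility: utilities of the itemset's own items in transactions containing it."""
    return sum(sum(t[i] * profit[i] for i in itemset)
               for t in db if all(i in t for i in itemset))


def two_phase(db, profit, min_ratio):
    """Hypothetical level-wise sketch: prune on TWU first, then check actual utility."""
    threshold = min_ratio * sum(transaction_utility(t, profit) for t in db)
    items = sorted({i for t in db for i in t})
    # Phase 1: keep candidates whose TWU reaches the threshold (TWU is downward closed).
    level = [(i,) for i in items if twu((i,), db, profit) >= threshold]
    high_twu = list(level)
    while level:
        known = set(high_twu)
        alive = sorted({i for x in level for i in x})
        size = len(level[0]) + 1
        level = [c for c in combinations(alive, size)
                 if all(s in known for s in combinations(c, size - 1))
                 and twu(c, db, profit) >= threshold]
        high_twu.extend(level)
    # Phase 2: one more pass computes the actual utilities of the survivors.
    result = {}
    for x in high_twu:
        u = utility(x, db, profit)
        if u >= threshold:
            result[x] = u
    return result


print(utility(("B", "C"), DB, PROFIT))  # 8
print(twu(("C",), DB, PROFIT))          # 15
print(two_phase(DB, PROFIT, 0.5))       # {('B', 'C'): 8}
```

With the threshold at 50 % of the total utility (7.5 here), {BC} is a high utility itemset although neither {B} nor {C} is, which is why pruning has to rely on TWU rather than on the actual utility.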
Among the cases in Fig. 1, an itemset in Case 7 cannot possibly be large in the updated database as long as the total transaction utility of the new transactions is smaller than the safety transaction utility bound $f$. Thus, the original database does not need to be rescanned for maintaining the discovered information. When new transactions are inserted into the original database, the proposed incremental algorithm is performed to update the discovered high utility itemsets. The candidate 1-itemsets in the new transactions are first derived with their transaction-weighted utilizations and their actual utility values. The upper utility threshold and the lower utility threshold, which are normally defined as ratios between 0 and 1, are extended from the pre-large concept [10] for respectively deriving the high utility itemsets and the pre-large utility itemsets. Based on the two utility thresholds, the generated candidate 1-itemsets in the new transactions can be divided into three parts with nine cases

according to whether they have large, pre-large, or small transaction-weighted utilization in the original database. Individual procedures are then applied to each part to maintain and update the candidate 1-itemsets for generating high transaction-weighted utilization 1-itemsets in the updated database. Note that both the transaction-weighted utilizations and the actual utility values of the generated candidate 1-itemsets are updated at the same time. The candidate 2-itemsets are then formed from the remaining large (high) and pre-large transaction-weighted utilization 1-itemsets. The same procedure is repeated level by level until all high utility itemsets have been maintained and updated. The details of the proposed algorithm are described below. A variable, $buf$, is used to record the total utility value of new transactions since the last rescan of the original database. The notation used in the proposed incremental utility algorithm is given at the beginning of this paper.

3.1 Theoretical foundation

In the pre-large concept [10], the number of new transactions and the number of transactions in the original database can be respectively extended to $TU^d$ and $TU^D$ in utility mining. The safety transaction utility bound $f$ in utility mining is then modified as

$$f = \frac{(S_u - S_l) \times TU^D}{1 - S_u}.$$

It is formally shown below that an itemset in Case 7 cannot possibly be large for the entire updated database as long as the total utility in the new transactions does not exceed the safety bound $f$, that is,

$$TU^d \le \frac{(S_u - S_l) \times TU^D}{1 - S_u}.$$

Theorem 1 Let $S_l$ and $S_u$ be respectively the lower and the upper utility thresholds, and let $TU^D$ and $TU^d$ be respectively the total utility in the original database and in the new transactions. If $TU^d \le \frac{(S_u - S_l) \times TU^D}{1 - S_u}$, the transaction-weighted utilization of an itemset in Case 7 [10] cannot be large in the entire updated database.
Proof From $TU^d \le \frac{(S_u - S_l) \times TU^D}{1 - S_u}$, the following derivation can be obtained:

$$
\begin{aligned}
TU^d &\le \frac{(S_u - S_l) \times TU^D}{1 - S_u} \\
\Rightarrow\ TU^d (1 - S_u) &\le (S_u - S_l) \times TU^D \\
\Rightarrow\ TU^d - TU^d \times S_u &\le TU^D \times S_u - TU^D \times S_l \\
\Rightarrow\ TU^d + TU^D \times S_l &\le S_u \left(TU^D + TU^d\right) \\
\Rightarrow\ \frac{TU^d + TU^D \times S_l}{TU^D + TU^d} &\le S_u. \qquad (1)
\end{aligned}
$$

For an itemset in Case 7, if the transaction-weighted utilization of an itemset $X$ is small (neither large nor pre-large) in the original database $D$, its transaction-weighted utilization $twu^D(X)$ in the original database $D$ must be less than $S_l \times TU^D$. Therefore,

$$twu^D(X) < S_l \times TU^D. \qquad (2)$$

Besides, if the transaction-weighted utilization of an itemset $X$ is large in the new transactions $d$, its transaction-weighted utilization $twu^d(X)$ in the new transactions $d$ must be larger than or equal to $S_u \times TU^d$ and $S_l \times TU^d$, but smaller than the total utility $TU^d$ in the new transactions. Therefore,

$$TU^d > twu^d(X) \ge S_u \times TU^d > S_l \times TU^d. \qquad (3)$$

In utility mining, the ratio of an itemset $X$ in the updated database $U$ is calculated as $\frac{twu^U(X)}{TU^D + TU^d}$, which can be further expanded from formulas (2) and (3) as:

$$\frac{twu^U(X)}{TU^D + TU^d} = \frac{twu^D(X) + twu^d(X)}{TU^D + TU^d} < \frac{S_l \times TU^D + TU^d}{TU^D + TU^d}. \qquad (4)$$

From formulas (1) and (4), the updated ratio for an itemset $X$ is always below the upper threshold:

$$\frac{twu^U(X)}{TU^D + TU^d} < \frac{S_l \times TU^D + TU^d}{TU^D + TU^d} \le S_u. \qquad (5)$$

It can thus be concluded that the transaction-weighted utilization of an itemset $X$ is not large in the entire updated database when the total utility $TU^d$ in the new transactions is smaller than or equal to $\frac{(S_u - S_l) \times TU^D}{1 - S_u}$.

Example Assume the total utility in the original database is set at 100 ($TU^D = 100$), and the lower and upper utility thresholds are respectively set at 50 % and 60 % ($S_l = 0.5$, $S_u = 0.6$). The safety bound $f$ is calculated as:

$$f = \frac{(S_u - S_l) \times TU^D}{1 - S_u} = \frac{(0.6 - 0.5) \times 100}{1 - 0.6} = 25.$$

The database is not required to be rescanned if the total utility in the new transactions is at most 25. In that case, an itemset $X$ that is small in the original database cannot become large in the updated database.
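The bound in Theorem 1 can be checked numerically with a short Python sketch. The helper names `safety_bound` and `worst_case_ratio` are hypothetical and only mirror the formula for $f$ and the right-hand side of formula (4); exact fractions are used here so that the comparison at the boundary is not affected by floating-point rounding.

```python
from fractions import Fraction


def safety_bound(s_u, s_l, tu_D):
    """f = (S_u - S_l) * TU^D / (1 - S_u), assuming 0 <= S_l < S_u < 1."""
    if not (0 <= s_l < s_u < 1):
        raise ValueError("thresholds must satisfy 0 <= S_l < S_u < 1")
    return (s_u - s_l) * tu_D / (1 - s_u)


def worst_case_ratio(s_l, tu_D, tu_d):
    """Upper bound of formula (4): (S_l * TU^D + TU^d) / (TU^D + TU^d)."""
    return (s_l * tu_D + tu_d) / (tu_D + tu_d)


# Values from the example: TU^D = 100, S_l = 0.5, S_u = 0.6.
s_u, s_l, tu_D = Fraction(6, 10), Fraction(5, 10), 100
f = safety_bound(s_u, s_l, tu_D)
print(f)  # 25

# At TU^d = f the worst-case ratio equals S_u; beyond f it exceeds S_u.
print(worst_case_ratio(s_l, tu_D, 25) == s_u)  # True
print(worst_case_ratio(s_l, tu_D, 26) > s_u)   # True
```

Because the ratio of a Case 7 itemset is strictly below the worst-case value, it stays below $S_u$ for every $TU^d \le f$.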
From Theorem 1, the itemsets in Case 7 can thus be efficiently handled.

The proposed PRE-HUI algorithm

INPUT: A profit table $P$ of items, an original database $D$, an upper utility threshold $S_u$ (the same as the minimum high utility threshold), a lower utility threshold $S_l$, the total utility $TU^D$ of $D$, the large (high) transaction-weighted utilization itemsets $HTWU^D$ and the pre-large transaction-weighted utilization itemsets $PTWU^D$ with their transaction-weighted utilization values and

actual utility values discovered from $D$, the safety transaction utility buffer $buf$ for preserving the total utility value of the last processed transactions, and a set of new transactions $d$.

OUTPUT: A set of high utility itemsets ($HU^U$) for the updated database $U$ ($= D \cup d$).

STEP 1: Calculate the safety transaction utility bound $f$ as:

$$f = \frac{S_u - S_l}{1 - S_u} \times TU^D.$$

STEP 2: Calculate the item utility value $u_{kj}$ of each item $i_j$ in each new transaction $t_k$ as:

$$u_{kj} = q_{kj} \times p_j,$$

where $q_{kj}$ is the quantity of $i_j$ in $t_k$ and $p_j$ is the profit of $i_j$ in the profit table $P$; sum up the utility values of all the items in each transaction $t_k$ as the transaction utility $tu_k$:

$$tu_k = \sum_{j=1}^{m} u_{kj};$$

add the transaction utilities of all the new transactions in $d$ as the total utility $TU^d$:

$$TU^d = \sum_{k=1}^{n} tu_k.$$

STEP 3: Calculate the total utility $TU^U$ for the updated database as:

$$TU^U = TU^D + TU^d,$$

where $TU^D$ is the total utility in the original database $D$, and $TU^d$ is the total utility in the new transactions $d$.

STEP 4: Generate the candidate 1-itemsets $C_1$, which are the items appearing in the new transactions $d$.

STEP 5: Set $r = 1$, where $r$ records the number of items in the itemsets currently being processed.

STEP 6: For each candidate $r$-itemset $X$ in $C_r$, calculate the transaction-weighted utilization $twu^d(X)$ and the actual utility $au^d(X)$ respectively as:

$$twu^d(X) = \sum_{t_k \in d \,\wedge\, X \subseteq t_k} tu_k \quad \text{and} \quad au^d(X) = \sum_{t_k \in d \,\wedge\, X \subseteq t_k} \ \sum_{i_j \in X} u_{kj},$$

where $twu^d(X)$ is the sum of the transaction utilities of the new transactions in $d$ that contain the itemset $X$, and $au^d(X)$ is the sum of the utilities of the items of $X$ in the new transactions in $d$ that contain $X$.
STEP 7: For each large (high) transaction-weighted utilization itemset $X$ in $HTWU^D_r$ in the original database, do the following substeps (Cases 1, 2, and 3):

Substep 7.1: Set the updated transaction-weighted utilization $twu^U(X)$ of itemset $X$ in the entire updated database as:

$$twu^U(X) = twu^D(X) + twu^d(X),$$

where $twu^D(X)$ was kept in the set of $HTWU^D$ in the original database, and $twu^d(X)$ was calculated in STEP 6 from the new transactions.

Substep 7.2: Set the updated actual utility $au^U(X)$ of itemset $X$ in the entire updated database as:

$$au^U(X) = au^D(X) + au^d(X),$$

where $au^D(X)$ was kept in the set of $HTWU^D$ in the original database, and $au^d(X)$ was calculated in STEP 6 from the new transactions.

Substep 7.3: If $\frac{twu^U(X)}{TU^U} \ge S_u$, put itemset $X$ in the set of $HTWU^U_r$, the large (high) transaction-weighted utilization $r$-itemsets in the updated database; if $S_l \le \frac{twu^U(X)}{TU^U} < S_u$, put itemset $X$ in the set of $PTWU^U_r$, the pre-large transaction-weighted utilization $r$-itemsets in the updated database; otherwise, discard itemset $X$ since it is small after the database is updated.

STEP 8: For each pre-large transaction-weighted utilization itemset in $PTWU^D_r$ in the original database, do the substeps in STEP 7 (Cases 4, 5, and 6).

STEP 9: For each candidate $r$-itemset $X$ in $C_r$ that does not appear in the sets of $HTWU^D_r$ and $PTWU^D_r$ and for which $\frac{twu^d(X)}{TU^d} \ge S_l$, put $X$ in the set of Rescan_Items, which is processed if the original database is rescanned in STEP 10.
STEP 10: If $(buf + TU^d) \le f$ or the set of Rescan_Items is empty, nothing is done in this step; otherwise, do the following substeps for each candidate $r$-itemset $X$ in the set of Rescan_Items:

Substep 10.1: Rescan the original database to calculate the transaction-weighted utilization $twu^D(X)$ and the actual utility $au^D(X)$ respectively as:

$$twu^D(X) = \sum_{T_k \in D \,\wedge\, X \subseteq T_k} tu_k \quad \text{and} \quad au^D(X) = \sum_{T_k \in D \,\wedge\, X \subseteq T_k} \ \sum_{i_j \in X} u_{kj},$$

where $twu^D(X)$ is the sum of the transaction utilities of the transactions in the original database $D$ that contain the itemset $X$, and $au^D(X)$ is the sum of the utilities of the items of $X$ in the transactions in $D$ that contain $X$.

Substep 10.2: Set the updated transaction-weighted utilization $twu^U(X)$ of itemset $X$ in the entire updated database as:

$$twu^U(X) = twu^D(X) + twu^d(X).$$

Substep 10.3: Set the updated actual utility $au^U(X)$ of itemset $X$ in the entire updated database as:

$$au^U(X) = au^D(X) + au^d(X).$$

Substep 10.4: If $\frac{twu^U(X)}{TU^U} \ge S_u$, put itemset $X$ in the set of $HTWU^U_r$ as a large (high) transaction-weighted utilization $r$-itemset in the updated database; if $S_l \le \frac{twu^U(X)}{TU^U} < S_u$, put itemset $X$ in the set of $PTWU^U_r$ as a pre-large transaction-weighted utilization $r$-itemset in the updated database; otherwise, discard itemset $X$ since it is still small after the database is updated.

STEP 11: Form the candidate $(r + 1)$-itemsets $C_{r+1}$ from the large transaction-weighted utilization $r$-itemsets $HTWU^U_r$ and the pre-large transaction-weighted utilization $r$-itemsets $PTWU^U_r$ (that is, from $HTWU^U_r \cup PTWU^U_r$).

STEP 12: Set $r = r + 1$.

STEP 13: Repeat STEPs 6 to 12 until no updated large (high) or pre-large transaction-weighted utilization itemsets are found.

STEP 14: Process each itemset $X$ in the set of $HTWU^U$; if $\frac{au^U(X)}{TU^U} \ge S_u$, itemset $X$ is a high utility itemset. Put itemset $X$ into the set of $HU^U$.

STEP 15: If $(buf + TU^d) > f$, set $TU^D = TU^U$ and $buf = 0$; otherwise, set $buf = buf + TU^d$.

STEP 16: Set $HTWU^D = HTWU^U$ as the set of the original large (high) transaction-weighted utilization itemsets and $PTWU^D = PTWU^U$ as the set of the original pre-large transaction-weighted utilization itemsets for the next transaction insertion in incremental mining.

By the end of STEP 13, the large and the pre-large transaction-weighted utilization itemsets have been found, which completes the first phase. The actual utility values are then confirmed in the second phase, from STEPs 14 to 16. After STEP 16, the high utility itemsets have thus been updated. The set of $HU^U$ in STEP 14 includes all the high utility itemsets after the database is updated.

Fig. 2 The flowchart of the proposed PRE-HUI algorithm
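The classification in Substeps 7.3 and 10.4, the merge of stored and new values in STEPs 7 and 8, and the buffer handling in STEPs 10 and 15 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: the function names and the dictionary layout (`itemset -> (twu, au)`) are hypothetical, and candidate generation and the database rescan itself are left out.

```python
def classify(twu_U, tu_U, s_u, s_l):
    """Substeps 7.3 / 10.4: label an itemset by its updated TWU ratio."""
    ratio = twu_U / tu_U
    if ratio >= s_u:
        return "large"
    if ratio >= s_l:
        return "pre-large"
    return "small"


def update_known(stored, new, tu_U, s_u, s_l):
    """STEPs 7-8 for itemsets already kept from the original database.

    stored maps itemset -> (twu_D, au_D); new maps itemset -> (twu_d, au_d).
    Itemsets absent from the new transactions contribute (0, 0).
    """
    large, pre_large = {}, {}
    for x, (twu_D, au_D) in stored.items():
        twu_d, au_d = new.get(x, (0, 0))
        twu_U, au_U = twu_D + twu_d, au_D + au_d
        label = classify(twu_U, tu_U, s_u, s_l)
        if label == "large":
            large[x] = (twu_U, au_U)
        elif label == "pre-large":
            pre_large[x] = (twu_U, au_U)
    return large, pre_large


def needs_rescan(buf, tu_d, f, rescan_items):
    """STEP 10: rescan only if the bound is exceeded and something is queued."""
    return buf + tu_d > f and bool(rescan_items)


def next_state(buf, tu_D, tu_d, f):
    """STEP 15: return the (TU^D, buf) pair to carry into the next insertion."""
    if buf + tu_d > f:
        return tu_D + tu_d, 0
    return tu_D, buf + tu_d


# Itemset {A} from the illustrated example: twu 108 and au 48 in D, absent from d.
print(update_known({("A",): (108, 48)}, {}, 560, 0.3, 0.2))  # ({}, {})
print(next_state(0, 350, 210, 50))                           # (560, 0)
```

Keeping the decision to rescan separate from the per-itemset update makes it clear that the stored large and pre-large itemsets are always updated, while the rescan is only triggered once the accumulated utility exceeds the safety bound.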
The procedure of the proposed algorithm is summarized in Fig. 2.

4 An illustrated example of the PRE-HUI algorithm

In this section, an example is given to illustrate the proposed incremental utility mining algorithm for transaction insertion. Before the proposed algorithm can process inserted transactions, the initial high transaction-weighted utilization itemsets and pre-large transaction-weighted utilization itemsets must first be derived from the original database. The original database used to derive this information is shown in Table 1. It consists of 8 transactions with 5 items, denoted A to E. The upper utility threshold is set at 30 %, and the lower utility threshold is set at 20 %. Note that the upper utility threshold is the same as the minimum high utility threshold in traditional utility mining. The profits of the items are shown in Table 2. Based on the two utility thresholds, the two-phase algorithm is first performed to find the high transaction-weighted utilization itemsets and the pre-large transaction-weighted utilization itemsets with their actual utilities for transaction insertion. Since the two-phase approach is described in [24], its procedure for deriving these itemsets is not repeated here. The results are respectively shown in Tables 3 and 4.

Table 1 Original database in the example
TID   A   B   C   D   E

Table 2 Profit table
Item   Profit ($)
A      6
B      2
C      15
D      7
E      10

Table 3 Large (high) transaction-weighted utilization itemsets and their actual utilities
High transaction-weighted utilization itemsets: {A}, {C}, {E}

Table 4 Pre-large transaction-weighted utilization itemsets and their actual utilities
Pre-large transaction-weighted utilization itemsets: {B}, {AC}, {AE}, {CE}, {ACE}

Table 5 Four new transactions inserted into the database
TID   A   B   C   D   E

Table 6 Transaction utilities for the four new transactions
TID   A   B   C   D   E   tu

Assume that the four new transactions shown in Table 5 are inserted into the original database. The proposed incremental utility algorithm processes the new transactions as follows.

STEP 1 The safety transaction utility bound is used to evaluate whether the original database should be rescanned for the transaction insertion. It is calculated as $f = \frac{S_u - S_l}{1 - S_u} \times TU^D = \frac{0.3 - 0.2}{1 - 0.3} \times 350 = 50$, where $S_u$ and $S_l$ are the upper and lower utility thresholds, respectively, and $TU^D$ is the total utility in the original database $D$.

STEP 2 The utility value of each item occurring in each new transaction in Table 5 is first calculated. Take the first transaction as an example to illustrate the process. The items with their quantities in the first transaction are (B: 2, C: 3, D: 7), and the profits for the three 1-itemsets {B}, {C}, and {D} are 2, 15, and 7, respectively. The transaction utility for the first transaction in Table 5 is thus calculated as $tu_9 = (2 \times 2) + (3 \times 15) + (7 \times 7) = 98$. The other transactions in Table 5 are calculated in the same way. The results are shown in Table 6. The total utility of the four new transactions, $TU^d$, is then calculated as the sum of their transaction utilities, which is 210.

STEP 3 In this example, the total utility of the updated database $TU^U$ is calculated as $350 + 210 = 560$.
STEP 4 The items that appear in the new transactions are used to generate the candidate 1-itemsets, which are {B}, {C}, {D}, and {E} in this example.

STEP 5 The value of r is initially set at 1.

STEP 6 The transaction-weighted utilization and the actual utility of each candidate 1-itemset in the new transactions are calculated. Take the 1-itemset {B} as an example. It appears in transactions 9, 11, and 12. The value of $twu^{d}(B)$ is the sum of the transaction utilities of these three transactions, which is 200. The actual utility $au^{d}(B)$ is calculated at the same time as $(2 \times 2) + (5 \times 2) + (3 \times 2) = 20$. The other items are processed in the same way. The results are shown in Table 7.
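The arithmetic of STEPs 1, 2, and 6 can be checked with a short sketch. The profit values come from Table 2 and the quantities from the text; the function names `safety_bound` and `transaction_utility` are hypothetical, and the value 350 for the original total utility is inferred from the updated total of 560 minus the 210 contributed by the new transactions.

```python
# Profit table from Table 2 of the example.
PROFIT = {"A": 6, "B": 2, "C": 15, "D": 7, "E": 10}


def safety_bound(upper_pct, lower_pct, total_utility):
    """Safety transaction utility bound f = (S_u - S_l) / (1 - S_u) * TU^D.

    Thresholds are passed as percentages (30 means 30 %).
    """
    return (upper_pct - lower_pct) * total_utility / (100 - upper_pct)


def transaction_utility(quantities, profit=PROFIT):
    """Sum of quantity * unit profit over the items of one transaction."""
    return sum(q * profit[item] for item, q in quantities.items())


# STEP 1: thresholds of 30 % and 20 %, with TU^D = 350 (assumed, see lead-in).
f = safety_bound(30, 20, 350)

# STEP 2: the first new transaction (TID 9) contains B:2, C:3, D:7.
tu_9 = transaction_utility({"B": 2, "C": 3, "D": 7})

# STEP 6: B is bought 2, 5, and 3 times in transactions 9, 11, and 12.
au_d_B = sum(q * PROFIT["B"] for q in (2, 5, 3))

print(f, tu_9, au_d_B)  # 50.0 98 20
```

Passing the thresholds as whole percentages keeps the bound computation free of floating-point rounding in this example.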

Table 7 Transaction-weighted utilization and the actual utility of each candidate 1-itemset in the new transactions (columns: 1-itemset, twu, au; rows: {B}, {C}, {D}, {E})

Table 8 The updated 1-itemsets after STEP 7 (columns: 1-itemset, twu, au, Ratio, Updated result; results: {A} small, {C} large, {E} pre-large)

STEP 7 Each 1-itemset in the set of large (high) transaction-weighted utilization 1-itemsets $HTWU_1^{D}$ of the original database is then processed. In this example, the 1-itemsets {A}, {C}, and {E} are processed. The transaction-weighted utilization of 1-itemset {A} in the original database is $twu^{D}(A) = 108$, as shown in Table 3. Its transaction-weighted utilization in the new transactions is 0. The updated transaction-weighted utilization of 1-itemset {A} is then calculated as $108 + 0 = 108$. The actual utility of 1-itemset {A} in the original database is $au^{D}(A) = 48$, as shown in Table 3. Its actual utility in the new transactions is $au^{d}(A) = 0$. The updated actual utility of 1-itemset {A} is thus calculated as $48 + 0 = 48$. The 1-itemsets {C} and {E} are processed in the same way. The results are shown in Table 8.

In this example, the updated transaction-weighted utilization ratio of 1-itemset {A} is calculated as $108 / 560 = 19.3\,\%$, which is smaller than the lower utility threshold (20 %). 1-itemset {A} is thus small after the database is updated and is discarded. The updated transaction-weighted utilization ratio of 1-itemset {C} is 57 %, which is larger than the upper utility threshold (30 %). 1-itemset {C} is thus put into the set $HTWU_1^{U}$. The updated transaction-weighted utilization ratio of 1-itemset {E} is 28.9 %, which is larger than the lower utility threshold but smaller than the upper utility threshold. 1-itemset {E} is thus put into the set $PTWU_1^{U}$. Thus, $HTWU_1^{U} = \{C\}$ and $PTWU_1^{U} = \{E\}$.
STEP 8 Each 1-itemset in the set of pre-large transaction-weighted utilization 1-itemsets $PTWU_1^{D}$ of the original database is then processed. In this example, only 1-itemset {B} is processed. The transaction-weighted utilization of 1-itemset {B} in the original database is $twu^{D}(B) = 87$, as shown in Table 4. Its transaction-weighted utilization in the new transactions is $twu^{d}(B) = 200$. The updated transaction-weighted utilization of 1-itemset {B} is thus $87 + 200 = 287$. The actual utility of 1-itemset {B} in the original database is $au^{D}(B) = 24$, as shown in Table 4. Its actual utility in the new transactions is $au^{d}(B) = 20$. The updated actual utility of 1-itemset {B} is thus calculated as $24 + 20 = 44$. The results are shown in Table 9.

Table 9 The updated 1-itemsets after STEP 8 (columns: 1-itemset, twu, au, Ratio, Updated result; result: {B} large)

In this example, the updated transaction-weighted utilization ratio of 1-itemset {B} is calculated as $287 / 560 = 51.2\,\%$, which is larger than the upper utility threshold (30 %). 1-itemset {B} is thus put into the set $HTWU_1^{U}$. Thus, $HTWU_1^{U} = \{B, C\}$ and $PTWU_1^{U} = \{E\}$.

STEP 9 The 1-itemsets {B}, {C}, and {E} in Table 7 have been processed in STEPs 7 and 8. The remaining 1-itemset {D} appears in neither $HTWU_1^{D}$ nor $PTWU_1^{D}$. The transaction-weighted utilization of 1-itemset {D} in the new transactions is $twu^{d}(D) = 200$, as shown in Table 7. The total transaction utility of the new transactions is $TU^{d} = 210$. Thus, the transaction-weighted utilization ratio of 1-itemset {D} in the new transactions is calculated as $200 / 210 = 95.2\,\%$, which is larger than the lower utility threshold (20 %). 1-itemset {D} is thus put into the set Rescan_Items, to be examined if the database must be rescanned in STEP 10. The results are shown in Table 10.

Table 10 The values of the remaining 1-itemsets in the new transactions (columns: 1-itemset, twu, au, Ratio, Updated result; result: {D} added to the set Rescan_Items)
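The update-and-classify logic of STEPs 7 to 9 can be sketched as below, using only the values stated in the text for {A}, {B}, and {D}. The function name `classify` is hypothetical, and treating a ratio exactly equal to a threshold as meeting it is an assumption of this sketch.

```python
def classify(twu, total_utility, upper_pct=30, lower_pct=20):
    """Label an itemset by its TWU ratio against the two thresholds.

    Integer arithmetic (twu * 100 versus pct * total) avoids float ties.
    """
    scaled = twu * 100
    if scaled >= upper_pct * total_utility:
        return "large"
    if scaled >= lower_pct * total_utility:
        return "pre-large"
    return "small"


TU_U = 560  # total utility of the updated database (STEP 3)
TU_d = 210  # total utility of the new transactions (STEP 2)

# STEPs 7 and 8: updated value = value in original database + value in new transactions.
twu_U = {"A": 108 + 0, "B": 87 + 200}
au_U = {"A": 48 + 0, "B": 24 + 20}
labels = {item: classify(value, TU_U) for item, value in twu_U.items()}

# STEP 9: an item unseen in the kept sets is judged against the new transactions only.
rescan_items = [item for item, twu in {"D": 200}.items()
                if classify(twu, TU_d) != "small"]

print(labels, au_U, rescan_items)
# {'A': 'small', 'B': 'large'} {'A': 48, 'B': 44} ['D']
```

The same `classify` call serves both purposes: against the updated total it decides the final set of an itemset, and against the total of the new transactions it decides whether an unseen item is worth a rescan.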
STEP 10 Since no transactions have been processed previously in this example, buf is initially set at 0. The total utility of the new transactions $TU^{d}$ is 210. Thus, $buf + TU^{d} = 0 + 210 = 210$, which is larger than the safety transaction utility bound ($f = 50$). The original database in Table 1 must therefore be rescanned to find the transaction-weighted utilization $twu^{D}(D)$ and the actual utility $au^{D}(D)$ of 1-itemset {D} in the original database, which are 67 and 63, respectively. The updated transaction-weighted utilization and actual utility of 1-itemset {D} in the updated database are calculated as $twu^{U}(D) = twu^{D}(D) + twu^{d}(D) = 67 + 200 = 267$ and $au^{U}(D) = au^{D}(D) + au^{d}(D) = 63 + 105 = 168$. The updated transaction-weighted utilization ratio of 1-itemset {D} is calculated as $267 / 560 = 47.7\,\%$, which is larger than the upper utility threshold (30 %). 1-itemset {D} is thus put into the set of large (high) transaction-weighted utilization 1-itemsets $HTWU_1^{U}$. The large (high) and pre-large transaction-weighted utilization 1-itemsets with their actual utilities in the updated database are shown in Table 11.

Table 11 Large (high) and pre-large transaction-weighted utilization 1-itemsets with their actual utilities (columns: Itemset, Transaction-weighted utilization, Actual utility; large (high): {B}, {C}, {D}; pre-large: {E})

STEP 11 Based on the downward closure property, the candidate 2-itemsets are formed from Table 11 using an Apriori-like approach. The generated candidates are {BC}, {BD}, {BE}, {CD}, {CE}, and {DE}.

STEP 12 The variable r is set at 2.

STEP 13 STEPs 6 to 12 are repeated until no candidate itemsets are generated. The final results are shown in Table 12.

Table 12 Final results for large (high) and pre-large transaction-weighted utilization itemsets with their actual utilities (columns: Itemset, Transaction-weighted utilization, Actual utility; large (high): {B}, {C}, {D}, {BC}, {BD}, {CD}, {BCD}; pre-large: {E})

Table 13 The final results (columns: Itemset, Actual utility, Ratio (%), HU; large (high) transaction-weighted utilization itemsets: {B}, {C}, {D}, {BC}, {BD}, {CD}, {BCD}; marked HU: {C}, {D}, {BD}, {BCD})

STEP 14 The final large (high) transaction-weighted utilization itemsets in Table 12 are then checked to determine whether they are high-utility itemsets in the updated database. Take 1-itemset {B} as an example. The updated actual utility of 1-itemset {B} is 44; its ratio in the updated database is calculated as $44 / 560 = 7.9\,\%$, which is smaller than even the lower utility threshold (20 %) and therefore smaller than the minimum high utility threshold (30 %). The 1-itemset {B} is thus not a high-utility itemset in the updated database.
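The rescan decision of STEP 10, the candidate generation of STEP 11, and the high-utility check of STEP 14 can be sketched as follows. All function names are hypothetical. The sketch covers only the join from 1-itemsets to 2-itemsets (a full Apriori-like join for longer itemsets is not shown), and it assumes an itemset whose actual utility ratio exactly equals the upper threshold counts as high utility, which matches {D} (168 / 560 = 30 %) being reported as high utility in the example.

```python
from itertools import combinations


def needs_rescan(buf, tu_new, bound):
    """STEP 10: rescan when accumulated inserted utility exceeds the safety bound."""
    return buf + tu_new > bound


def candidate_2_itemsets(items):
    """STEP 11: all pairs of the kept (large and pre-large) 1-itemsets."""
    return ["".join(pair) for pair in combinations(sorted(items), 2)]


def is_high_utility(actual_utility, total_utility, upper_pct=30):
    """STEP 14: compare the actual utility ratio with the upper threshold."""
    return actual_utility * 100 >= upper_pct * total_utility


rescan = needs_rescan(buf=0, tu_new=210, bound=50)
candidates = candidate_2_itemsets({"B", "C", "D", "E"})
hu_B = is_high_utility(44, 560)    # 7.9 %
hu_D = is_high_utility(168, 560)   # 30 %

print(rescan, candidates, hu_B, hu_D)
# True ['BC', 'BD', 'BE', 'CD', 'CE', 'DE'] False True
```

Note that STEP 14 uses the actual utility rather than the transaction-weighted utilization, which is why {B} is large in Table 12 yet not a high-utility itemset.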
The other large (high) transaction-weighted utilization itemsets in Table 12 are processed in the same way. The results are shown in Table 13.

STEP 15 Since $buf + TU^{d} = 0 + 210 = 210$ is larger than the safety transaction utility bound ($f = 50$), buf is reset to 0 and the total transaction utility of the original database is updated as $TU^{D} = TU^{U} = 560$ for the next transaction insertion.

STEP 16 In this example, $HTWU^{U}$ = {B, C, D, BC, BD, CD, BCD} and $PTWU^{U}$ = {E}. They are respectively regarded as the large (high) and the pre-large transaction-weighted utilization itemsets and stored as $HTWU^{D}$ and $PTWU^{D}$ for the next transaction insertion. The final high-utility itemsets are {C}, {D}, {BD}, and {BCD}.

5 Experimental results

Experiments were conducted to evaluate the performance of the two-phase TP-HUI algorithm in batch mode [24], the incremental algorithm for transaction insertion based on the FUP concept (FUP-HUI) [20], and the proposed PRE-HUI algorithm. The algorithms were implemented in Java and executed on a PC with a 3.0 GHz CPU and 4 GB of memory. Two datasets were used in the experiments: a simulation dataset produced by the IBM data generator [14] and the real-world foodmart database [25]. For the simulation dataset, a simulation model [24] was used to assign quantities to the items in the transactions produced by the IBM data generator [14]. The quantities range from 1 to 5, and the profits in the utility table are randomly set from 0.01 to 10. The foodmart dataset was collected from an anonymous chain store [25]. It is a quantitative database of the products sold by the chain store. There are 21,556 transactions and 1,559 items in the dataset.
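As an illustration of the simulation setup just described, the sketch below attaches quantities and profits to a binary transaction database. It is a minimal sketch under stated assumptions, not the simulation model of [24]: the uniform distributions, the two-decimal rounding of profits, the fixed seed, and the function name `attach_utilities` are all assumptions made here; the source states only that quantities range from 1 to 5 and that profits are randomly set from 0.01 to 10.

```python
import random


def attach_utilities(transactions, seed=0):
    """Turn binary transactions into quantitative ones plus a profit table.

    transactions: list of iterables of item identifiers.
    Returns (list of {item: quantity} dicts, {item: unit profit} dict).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    items = sorted({item for t in transactions for item in t})
    # Assumed: profits drawn uniformly from [0.01, 10], rounded to cents.
    profit = {item: round(rng.uniform(0.01, 10.0), 2) for item in items}
    # Assumed: integer quantities drawn uniformly from 1..5.
    quantified = [{item: rng.randint(1, 5) for item in t} for t in transactions]
    return quantified, profit


binary_db = [["a", "b"], ["b", "c", "d"], ["a", "d"]]
quantified_db, profit_table = attach_utilities(binary_db)
```

A different profit distribution could be substituted by changing the single line that fills `profit`.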

In the experimental evaluation, the TP-HUI [24], FUP-HUI [20], and the proposed PRE-HUI algorithms are compared. When new transactions are inserted, the TP-HUI algorithm has to rescan the whole updated database in batch mode to extract the updated high utility itemsets. The FUP-HUI algorithm divides the itemsets into four parts (cases) according to their transaction-weighted utilizations in the original database and in the new transactions. Each part is then processed individually to update the discovered information. For the FUP-HUI algorithm, the original database must be rescanned if an itemset has a large (high) transaction-weighted utilization in the new transactions but a small transaction-weighted utilization in the original database.

5.1 Experimental results of the simulation database

The IBM data generator [14] is used to generate two simulation databases called T10I4N4KD200K and T20I8N8KD400K. The average number of items in a transaction is denoted as T, the average length of maximal potentially frequent itemsets is denoted as I, the total number of unique items is denoted as N, and the total number of transactions is denoted as D. For T10I4N4KD200K, the first 200,000 transactions are used to initially mine the large (high) and pre-large transaction-weighted utilization itemsets with their actual utility values. Batches of 2,000 transactions are then sequentially extracted, bottom-up, from the original database to form the new transactions for each insertion. The minimum high utility threshold (upper utility threshold) is set at 0.2 % to evaluate the performance of TP-HUI, FUP-HUI, and the proposed PRE-HUI. The lower utility threshold is set at 0.17 % for the PRE-HUI algorithm. Figure 3 compares the execution times of the three algorithms.

Fig. 3 Comparison of execution time for sequentially inserting transactions

Figure 3 shows that the proposed PRE-HUI algorithm runs faster than the other two algorithms for transaction insertion, since it reduces the rescanning time by defining a safety transaction utility bound for newly inserted transactions. The FUP-HUI algorithm, however, must rescan the whole database whenever any itemset falls into Case 3 (large in the new transactions but small in the original database), and the TP-HUI algorithm has to process the updated database in batch mode whenever transactions are inserted.

Experiments are also made to evaluate the efficiency of the algorithms under different minimum high utility thresholds. The minimum high utility threshold (upper utility threshold) is varied from 0.2 % to 1.0 % in increments of 0.2 %. The lower utility threshold of the proposed algorithm is varied from 0.17 % to 0.97 % accordingly, so that it is always 0.03 % below the upper utility threshold. The results are shown in Fig. 4.

Fig. 4 Comparison of execution time in different minimum utility thresholds

Figure 4 shows that the execution time of the proposed PRE-HUI is shorter than those of the TP-HUI and FUP-HUI algorithms for handling transaction insertion under the different minimum high utility thresholds. The average numbers of high transaction-weighted utilization itemsets and pre-large transaction-weighted utilization itemsets are then compared to show the space requirements of the three algorithms. The results are shown in Table 14.

Table 14 Comparison of numbers of high transaction-weighted utilization itemsets and pre-large transaction-weighted utilization itemsets (columns: TP-HUI/FUP-HUI, PRE-HUI)

The proposed PRE-HUI algorithm keeps both the high transaction-weighted utilization itemsets and the pre-large transaction-weighted utilization itemsets in order to reduce the execution time. Figure 4 and Table 14 together show that keeping the additional pre-large transaction-weighted utilization itemsets reduces the execution time compared with the TP-HUI and the FUP-HUI algorithms.

An experiment is then made to evaluate the efficiency of the proposed PRE-HUI algorithm for various numbers of inserted transactions. The upper utility threshold (minimum high utility threshold) and the lower utility threshold are set at 0.2 % and 0.17 %, respectively. The numbers of inserted transactions are 4,000, 8,000, 12,000, 16,000, and 20,000. The results are shown in Fig. 5.

Fig. 5 Comparison of execution time for various numbers of inserted transactions

Figure 5 shows that the proposed PRE-HUI algorithm is the fastest of the three.

The T20I8N8KD400K database is also used to evaluate the scalability of the proposed PRE-HUI algorithm. Batches of 4,000 transactions are sequentially extracted, bottom-up, from the original database to form the new transactions for each insertion. The minimum high utility threshold (upper utility threshold) is set at 0.53 % to evaluate the performance of the TP-HUI, FUP-HUI, and the proposed PRE-HUI algorithms. The lower utility threshold is set at 0.51 % for the PRE-HUI algorithm. The results are shown in Fig. 6.

Fig. 6 Comparison of execution time for sequentially inserting transactions

Figure 6 also shows that the proposed PRE-HUI algorithm runs faster than the other two algorithms for transaction insertion. Experiments are then made to evaluate the efficiency of the algorithms with different minimum high utility thresholds. The minimum high utility threshold (upper utility threshold) is varied from 0.53 % to 0.61 % in increments of 0.02 %.
The lower utility threshold of the proposed algorithm is varied from 0.51 % to 0.59 % accordingly, so that it is always 0.02 % below the upper utility threshold. The results are shown in Fig. 7.

Fig. 7 Comparison of execution time in different minimum utility thresholds

Figure 7 shows that the proposed PRE-HUI is faster than the TP-HUI and FUP-HUI algorithms for handling transaction insertion under the different minimum high utility thresholds. The average numbers of the high transaction-weighted utilization itemsets and the pre-large transaction-weighted utilization itemsets are then compared to show the space requirements of the three algorithms. The results are shown in Table 15.

Table 15 Comparison of numbers of high transaction-weighted utilization itemsets and pre-large transaction-weighted utilization itemsets (columns: TP-HUI/FUP-HUI, PRE-HUI)

Figure 7 and Table 15 together show that keeping the additional pre-large transaction-weighted utilization itemsets reduces the execution time compared with the TP-HUI and the FUP-HUI algorithms. Experiments are then made to evaluate the efficiency of the algorithms for various numbers of inserted transactions. The upper utility threshold (minimum high utility threshold) and the lower utility threshold are set at 0.2 % and 0.17 %, respectively. The numbers of inserted transactions are 4,000, 8,000, 12,000, 16,000, and 20,000. The results are shown in Fig. 8. Figure 8 shows that the proposed PRE-HUI algorithm is the fastest of the three.

Fig. 8 Comparison of execution time for various numbers of inserted transactions

5.2 Experimental results of the foodmart dataset

In this section, the foodmart dataset is used to compare the three algorithms. The first 21,556 transactions are initially used to mine the large (high) and pre-large transaction-weighted utilization itemsets with their actual utility values. Batches of 215 transactions are then sequentially extracted, bottom-up, from the original database to form the new transactions for each insertion. The minimum high utility threshold (upper utility threshold) is set at 0.01 % to evaluate the performance of the three algorithms. The lower utility threshold is set at %. Figure 9 shows the execution times of the three algorithms as the batches of 215 transactions are sequentially inserted into the original database.

Fig. 9 Comparison of execution time for transaction insertion

Figure 9 shows that the proposed PRE-HUI algorithm runs faster than the other two algorithms for transaction insertion. Experiments are also made to evaluate the efficiency of the algorithms under different minimum high utility thresholds. The results are shown in Fig. 10. The minimum high utility threshold (upper utility threshold) is set from 0.01 % to %, increments % each time. The lower utility threshold for the proposed algorithm is set at % to %, decrements % each time along with the upper utility threshold.

Fig. 10 Comparison of execution time in different minimum utility thresholds

Figure 10 shows that the execution time of the proposed PRE-HUI is shorter than those of the TP-HUI and FUP-HUI algorithms for handling transaction insertion in the foodmart dataset.

Table 16 Comparison of numbers of high transaction-weighted utilization itemsets and pre-large transaction-weighted utilization itemsets (columns: TP-HUI/FUP-HUI, PRE-HUI)
The average numbers of the high transaction-weighted utilization itemsets and the pre-large transaction-weighted utilization itemsets are then compared to show the space requirements of the three algorithms. The results are shown in Table 16. Figure 10 and Table 16 together show that keeping the additional pre-large transaction-weighted utilization itemsets reduces the execution time compared with the TP-HUI and FUP-HUI algorithms. Finally, experiments are made to evaluate the efficiency of the algorithms for various numbers of inserted transactions. The minimum high utility threshold (upper utility threshold) and the lower utility threshold are respectively set at 0.01 % and %. The numbers of inserted transactions were 215, 430, 645, 860, and The results are shown in Fig. 11. Figure 11 shows that the proposed PRE-HUI algorithm is faster than the other two algorithms. The above results about time and space complexity are thus acceptable.

Fig. 11 Comparison of execution time in various numbers of inserted transactions

6 Conclusion and future works

In real-world applications, new transactions are constantly inserted into the original database. It is thus important to efficiently update and maintain the discovered utility itemsets when transactions are inserted. In this paper, an incremental algorithm based on the pre-large concept is proposed for efficiently mining high utility itemsets under transaction insertion. When new transactions are inserted into the original database, the proposed incremental algorithm partitions itemsets into three parts with nine cases according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database. Each part is then processed individually to maintain the discovered high utility itemsets. The proposed approach has the following advantages: (1) it re-uses the already discovered information to efficiently maintain and update the high utility itemsets without rescanning the entire updated database for each insertion; (2) the downward closure property is kept to reduce the number of candidates when the high utility itemsets are generated level by level; (3) the computational time is greatly reduced by handling only a small number of itemsets in the inserted transactions; (4) the defined upper and lower utility thresholds serve as effective thresholds for deriving the high (large) and pre-large utility itemsets, respectively. The experimental results show that the proposed incremental high utility mining algorithm outperforms the two-phase algorithm and the previous FUP-HUI algorithm in incremental mining, and the correctness of the proposed PRE-HUI algorithm is also proved. Transaction deletion and transaction modification also occur in real-world applications. In the future, these two topics will be explored and studied as well for developing more efficient approaches to maintain and update the discovered knowledge.

References

1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: The 20th international conference on very large data bases
2. Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6)
3. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: International conference on management of data
4. Berzal F, Cubero JC, Marín N, Serrano JM (2001) Tbar: an efficient method for association rule mining in relational databases. Data Knowl Eng 37(1)
5. Brin S, Motwani R, Ullman JD, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. SIGMOD Rec 26(2)
6. Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: IEEE international conference on data mining
7. Chen MS, Han J, Yu PS (1996) Data mining: an overview from a database perspective. IEEE Trans Knowl Data Eng 8(6)
8. Cheung DW, Jiawei H, Ng VT, Wong CY (1996) Maintenance of discovered association rules in large databases: an incremental updating technique. In: The international conference on data engineering
9. Hong TP, Wu CH (2011) An improved weighted clustering algorithm for determination of application nodes in heterogeneous sensor networks. J Inf Hiding Multimed Signal Process 2
10. Hong TP, Wang CY, Tao YH (2001) A new incremental data mining algorithm using pre-large itemsets. Intell Data Anal 5
11. Hong TP, Lin CW, Wu YL (2008) Incrementally fast updated frequent pattern trees. Expert Syst Appl 34(4)
12. Hong TP, Lin CW, Yang KT, Wang SL (2013) Using TF-IDF to hide sensitive itemsets. Appl Intell 38(4)
13. Hu K, Lu Y, Zhou L, Shi C (1999) Integrating classification and association rule mining: a concept lattice framework. In: The international workshop on new directions in rough sets, data mining, and granular-soft computing
14. IBM quest data mining project, Quest synthetic data generation code
15. Lent B, Swami A, Widom J (1997) Clustering association rules. In: The international conference on data engineering
16. Li YC, Yeh JS, Chang CC (2005) Direct candidates generation: a novel algorithm for discovering complete share-frequent itemsets. In: Lecture notes in computer science, vol 3614
17. Li YC, Yeh JS, Chang CC (2005) Efficient algorithms for mining share-frequent itemsets. In: The world congress of international fuzzy systems association
18. Li YC, Yeh JS, Chang CC (2005) Fast algorithm for mining share-frequent itemsets. In: The Asia Pacific web conference
19. Lin CW, Hong TP (2013) A survey of fuzzy web mining. Wiley Interdiscip Rev: Data Min Knowl Discov 3(3)
20. Lin CW, Lan GC, Hong TP (2012) An incremental mining algorithm for high utility itemsets. Expert Syst Appl 39(8)
21. Lin CW, Hong TP, Chang CC, Wang SL (2013) A greedy-based approach for hiding sensitive itemsets by transaction insertion. J Inf Hiding Multimed Signal Process 4(4)
22. Liu YH (2013) Stream mining on univariate uncertain data. Appl Intell
23. Liu Y, Liao W-k, Choudhary A (2005) A fast high utility itemsets mining algorithm. In: The international workshop on utility-based data mining
24. Liu Y, Liao W-k, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Lecture notes in computer science

25. Microsoft. Example database foodmart of Microsoft analysis services. http://msdn.microsoft.com/en-us/library/aa217032(SQL.80).aspx
26. Park JS, Chen MS, Yu PS (1995) An effective hash-based algorithm for mining association rules. SIGMOD Rec 24(2)
27. Park JS, Chen MS, Yu PS (1997) Using a hash-based method with transaction trimming for mining association rules. IEEE Trans Knowl Data Eng 9(5):813-825
28. Sarda NL, Srinivas NV (1998) An adaptive algorithm for incremental mining of association rules. In: The international workshop on database and expert systems applications
29. Song W, Liu Y, Li J (2013) Mining high utility itemsets by dynamically pruning the tree structure. Appl Intell. doi:10.1007/s10489-013-0443-7
30. Sucahyo Y, Gopalan R (2005) Building a more accurate classifier based on strong frequent patterns. In: Lecture notes in computer science
31. Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases. Data Knowl Eng 59(3):603-626
32. Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: The SIAM international conference on data mining
33. Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) New algorithms for fast discovery of association rules. In: International conference on knowledge discovery and data mining

Chun-Wei Lin received his B.S. and M.S. degrees in Information Management from I-Shou University, Taiwan, in 2002 and 2006, respectively, and his Ph.D. degree in Computer Science and Information Engineering from National Cheng Kung University. He is an Associate Professor in the Innovative Information Industry Research Center (IIIRC) at Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China. His research interests include data mining, fuzzy-set theory, machine learning, artificial intelligence, and social computing.

Tzung-Pei Hong received his B.S. degree in chemical engineering from National Taiwan University in 1985, and his Ph.D.
degree in computer science and information engineering from National Chiao-Tung University. From 1987 to 1994, he was with the Laboratory of Knowledge Engineering, National Chiao-Tung University, where he was involved in applying techniques of parallel processing to artificial intelligence. He was an associate professor at the Department of Computer Science in Chung-Hua Polytechnic Institute from 1992 to 1994, and at the Department of Information Management in I-Shou University (originally Kaohsiung Polytechnic Institute) from 1994. He was a professor in I-Shou University from 1999. He was in charge of the whole computerization and library planning for National University of Kaohsiung in Preparation from 1997 to 2000 and served as the first director of the library and computer center in National University of Kaohsiung from 2000 to 2001, as the Dean of Academic Affairs from 2003 to 2006, as the Administrative Vice President from 2007 to 2008, and as the Academic Vice President from 2010. He is currently a distinguished professor at the Department of Computer Science and Information Engineering and at the Department of Electrical Engineering. His current research interests include knowledge engineering, soft computing and granular computing.

Guo-Cheng Lan received his B.S. and M.S. degrees from the Department of Information Management in Southern Taiwan University, Tainan, Taiwan, in 2004 and 2006, respectively, and his Ph.D. degree in Computer Science and Information Engineering from National Cheng Kung University, Taiwan. He is a member of the Taiwanese Association for Artificial Intelligence (TAAI). He is currently a postdoctoral research fellow in Computer Science and Information Engineering at National Cheng Kung University, Taiwan. His current research interests include data mining, medical informatics, soft computing, ontology knowledge, fuzzy theory, and www applications.

Jia-Wei Wong received his B.S.
degree in Computer Science and Information Engineering from National University of Kaohsiung, Taiwan, in 2010, and his M.S. degree in Computer Science and Information Engineering from National Sun Yat-sen University, Taiwan. His research interests include privacy data mining and fuzzy theory.

Wen-Yang Lin is a professor at the Department of Computer Science and Information Engineering, National University of Kaohsiung. He received his Ph.D. in Computer Science and Information Engineering from National Taiwan University. From 2004 to 2007, he chaired the Department of Computer Science and Information Engineering at National University of Kaohsiung, and he served as the Director of Computer Science and Information Center from 2008. His current research interests include data mining, data warehousing, and evolutionary computation. He has co-edited several special issues of renowned international journals, (co-)authored more than 140 refereed publications, and served as a co-chair and program committee member and organized special sessions for many international conferences, including ASONAM, IEEE SMC, WCCI, and IEA/AIE. He is a member of IEEE and the Taiwanese AI Association.
