Utility Pattern Mining: A Concise and Lossless Representation using Up Growth+


Anusmitha.A 1, Renjana Ramachandran 2
M. Tech PG Scholar, Department of CSE, Mangalam College of Engineering, Kottayam, India 1
Assistant Professor, Department of CSE, Mangalam College of Engineering, Kottayam, India 2

Abstract: Profit, or utility, is the ultimate aim of every individual or organization. Identifying high utility item sets in large databases is a challenge. Several algorithms have been proposed to identify high utility item sets, but they generate a large number of candidate item sets, which increases computational complexity and degrades mining performance. The situation becomes more complicated when the database contains a large number of transactions. To overcome this, the paper introduces two new algorithms: UPGrowth+_HC_D (UP Growth+ based algorithm for mining High utility Closed item sets with Discarding isolated item sets) and CHUD+ (Closed+ High utility item set Discovery). Both combine closed item set mining with high utility item set mining to mine closed high utility item sets, using effective strategies to prune unpromising item sets. The algorithms need only two scans of the database and store the information in tree structures named UP tree and IT tree. They are efficient in terms of the number of candidate item sets generated in the phases and in memory utilization, especially when the database contains large transactions.

Keywords: Utility pattern mining, Closed pattern mining, Frequent item set mining, Lossless and concise representation, Data mining

I. INTRODUCTION
Data mining has become an essential research area in computer science, because it helps to find interesting patterns in long transactions and offers many other advantages.
Frequent pattern mining is a fundamental and widely accepted research topic in data mining applications. A large number of studies have proposed methods for mining frequent item sets from databases, and these have been adopted successfully in application domains such as market basket analysis and click stream analysis. In market basket analysis, mining frequent item sets from a database (usually a transactional database) means discovering the item sets that appear frequently in the transactions. However, the unit profits and purchased quantities of the items are not considered in the framework of frequent item set mining, so it cannot satisfy a user who is interested in discovering the item sets with high sales profits. In view of this, utility mining has emerged as an important topic in data mining for discovering item sets with high utility, such as profit.

Mining high utility item sets from databases refers to finding the item sets with high utilities or high profits. The utility of an item means its interestingness, importance or profitability to the users. The utility of items in a transaction database consists of two aspects: the importance of distinct items, called external utility, and the importance of the items within a transaction, called internal utility. An item set is called a high utility item set if its utility is no less than a user-specified threshold; otherwise, it is called a low utility item set. Mining high utility item sets is essential to a wide range of applications such as website click stream analysis, cross-marketing in retail stores, business promotion in chain hypermarkets and even biomedical applications. It is an important task, but it is not easy because the downward closure property [2] does not hold for utility.
Consequently, when databases with large transactions are used, the methods suffer from a large search space. Closed item set mining is an efficient alternative. An item set is closed if it has no proper superset with the same support, so the closed item sets form a concise representation. Many studies have focused on developing concise representations such as free sets, non-derivable sets [5], odds ratio patterns [7], closed patterns [9] and closed item sets. Integrating high utility item set mining with closed pattern mining is a promising combination because it produces a lossless and concise representation. However, it raises some challenges:
1) Even though the representation is lossless, it may not be meaningful to the users.
2) The number of extracted patterns may be high, which again leads to a large search space.
3) The resulting algorithms may be slower than the best algorithms for mining high utility item sets.
This paper addresses these challenges by proposing a concise and meaningful representation of high utility item sets. It provides the following advantages:
1) The proposed representation is lossless due to the UP Tree structure and concise due to the IT Tree, which allows all HUIs and their utilities to be recovered efficiently.
Copyright to IJARCCE DOI /IJARCCE


2) The proposed representation is also compact. Experiments show that it reduces the number of candidate item sets.
3) Two algorithms, UPGrowth+_HC_D (UP Growth based algorithm with discarding unpromising and isolated items) and CHUD+ (Closed+ High Utility item set Discovery), are proposed to find this representation. The algorithms inherit some useful properties from the well-known Apriori algorithm. The CHUD algorithm includes three novel strategies named REG, RML [1] and DCM [1] that greatly enhance its performance.

The remainder of the paper is organized as follows. Section II describes the background of the study, including the motivation, the related works and the preliminary definitions. Section III presents the proposed algorithms, Section IV reports the experimental evaluation and Section V concludes the paper.

II. BACKGROUND
This section covers the motivation of the study, the related works and the preliminaries.

A. MOTIVATION
Frequent item set mining, closed pattern mining [1],[9] and utility mining lead to the discovery of associations and correlations among items in large transactional or relational data sets. With massive amounts of data continuously being collected and stored, many industries are becoming interested in mining such patterns from their databases. The discovery of interesting correlations among huge amounts of business transaction records can help in many business decision-making processes, such as catalog design, cross-marketing and customer shopping behavior analysis. A typical application of utility based closed item set mining is market basket analysis. This process analyzes customer buying habits by finding associations between the different items that customers place in their shopping baskets (Figure 1) and how much utility they provide. A retailer can then arrange the items in a precise and concise manner with less loss, and achieve better utility or profit.

Figure 1. Market Basket Analysis

For example, suppose a customer buys two pens and four pencils. If only the frequency of the item sets is checked, profit may be reduced, because a pen may cost more than a pencil. Considering utility along with frequency when analyzing the market therefore gives better results.

B. RELATED WORKS
Many works have been proposed in the field of frequent pattern mining. The most famous are association rule mining and sequential pattern mining. Well-known algorithms include Apriori [3], the pioneering algorithm for frequent pattern mining; Frequent Pattern Growth (FP Growth), which improved performance; weighted association rule mining, in which, unlike other frequent pattern mining methods, the importance of the items to the user is also considered; and Two Phase. The problems associated with these methods are that some of them may not ensure the downward closure property, they do not consider the utility of the items, and they may generate a large number of candidate item sets. Utility pattern mining was introduced to solve these problems. Several algorithms such as IHUP [6],[8],[9], IHUP L, IHUP TF and IHUP TWU [6],[8],[9],[13] were introduced. UP Growth and UP Growth+ [2],[1] are two strong algorithms in this field.

Closed item set mining is another method of mining items from large databases. Many efficient algorithms have been introduced in this field, such as ECLAT and DCI-CLOSED [9]. Utility pattern mining and closed item set mining are two important advances in data mining. Combining the two can solve, to a large extent, the problems mentioned above, such as the large number of item sets, the high computational cost and the large search space. Apriori_HC [1], Apriori_HC_D [1] and CHUD [1] are the best available algorithms in the area of closed high utility item set mining.

C. HIGH UTILITY ITEM SET MINING
Definition 1 (Support of an item set).
[1] The support count of an item set $X$ is defined as the number of transactions containing $X$ in $D$ and is denoted as $SC(X)$. The support of $X$ is defined as the ratio of $SC(X)$ to $|D|$. The complete set of all the item sets in $D$ is defined as $L = \{X \mid X \subseteq I,\ SC(X) > 0\}$.

Definition 2 [2] (Absolute utility of an item in a transaction). The absolute utility of an item $a_i$ in a transaction $T_R$ is denoted as $au(a_i, T_R)$ and defined as $p(a_i, D) \times q(a_i, T_R)$, where $p(a_i, D)$ is the unit profit (external utility) of $a_i$ and $q(a_i, T_R)$ is its purchase quantity (internal utility) in $T_R$.

Definition 3 [1] (Absolute utility of an item set in a transaction). The absolute utility of an item set $X$ in a transaction $T_R$ is defined as $au(X, T_R) = \sum_{a_i \in X} au(a_i, T_R)$.

Definition 4 [2] (Transaction utility and total utility). The transaction utility (TU) of a transaction $T_R$ is defined as $TU(T_R) = au(T_R, T_R)$. The total utility of a database $D$ is denoted as $TotalU$ and defined as $TotalU = \sum_{T_R \in D} TU(T_R)$.

Definition 5 [1] (Absolute utility of an item set in a database). The absolute utility of an item set $X$ in $D$ is defined as $au(X) = \sum_{X \subseteq T_R \wedge T_R \in D} au(X, T_R)$. The (relative) utility of $X$ is defined as $u(X) = au(X) / TotalU$.

Definition 6 [1] (High utility item set). An item set $X$ is called a high utility item set iff $u(X)$ is no less than a user-specified minimum utility threshold min_utility ($0\% < min\_utility \le 100\%$). Otherwise, $X$ is a low utility item set. Equivalently, $X$ is a high utility item set iff $au(X)$ is no less than $abs\_min\_utility = min\_utility \times TotalU$.
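Definitions 2 to 6 can be made concrete with a short brute-force sketch in Python. It is only an illustration, not the authors' implementation. The transactions are those of the example database (Table 1) used in this paper. The unit profits are an assumption: they are the values consistent with the TU column of Table 1 and with the worked example, in which the unit profit of F is 3. All function names are hypothetical.

```python
from itertools import combinations

# Purchase quantities from Table 1 of the running example.
DATABASE = {
    "T1": {"A": 1, "B": 1, "E": 1, "W": 1},
    "T2": {"A": 1, "B": 1, "E": 3},
    "T3": {"A": 1, "B": 1, "F": 2},
    "T4": {"E": 2, "G": 1},
    "T5": {"A": 1, "B": 1, "F": 3},
}
# Assumed unit profits: the values that reproduce the TU column of Table 1.
PROFIT = {"A": 1, "B": 1, "E": 2, "F": 3, "G": 1, "W": 1}


def au_in_transaction(itemset, tid):
    """Definitions 2-3: sum of unit profit x quantity over the items of X in T_R."""
    transaction = DATABASE[tid]
    return sum(PROFIT[item] * transaction[item] for item in itemset)


def transaction_utility(tid):
    """Definition 4: TU(T_R) = au(T_R, T_R)."""
    return au_in_transaction(DATABASE[tid], tid)


def total_utility():
    """Definition 4: TotalU is the sum of all transaction utilities."""
    return sum(transaction_utility(tid) for tid in DATABASE)


def absolute_utility(itemset):
    """Definition 5: sum of au(X, T_R) over the transactions that contain X."""
    return sum(
        au_in_transaction(itemset, tid)
        for tid, transaction in DATABASE.items()
        if set(itemset).issubset(transaction)
    )


def high_utility_itemsets(abs_min_utility):
    """Definition 6 by exhaustive enumeration (exponential, for illustration only)."""
    items = sorted(PROFIT)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = absolute_utility(itemset)
            if utility >= abs_min_utility:
                result["".join(itemset)] = utility
    return result
```

Calling `high_utility_itemsets(10)` enumerates every item set over the six items. The exhaustive search is exactly the cost that the pruning strategies discussed in this paper are meant to avoid.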


Definition 7 [1] (Complete set of HUIs in the database). Let $S$ be a set of item sets and $f_H$ a function defined as $f_H(S) = \{X \mid X \in S,\ u(X) \ge min\_utility\}$. The complete set of HUIs in $D$ is denoted as $H$ ($H \subseteq L$) and defined as $f_H(L)$. The problem of mining HUIs is to find the set $H$ in $D$.

Definition 8 [2] (TWU of an item set). The transaction-weighted utilization (TWU) of an item set $X$ is the sum of the transaction utilities of all the transactions containing $X$. It is denoted as $TWU(X)$ and defined as $TWU(X) = \sum_{X \subseteq T_R \wedge T_R \in D} TU(T_R)$.

Definition 9 [1] (HTWUI). An item set $X$ is a high transaction-weighted utilization item set (HTWUI) iff $TWU(X) \ge abs\_min\_utility$. An HTWUI of length $K$ is abbreviated as K-HTWUI.

Example 1 (High Utility Item sets). [1] Let Table 1 be a database containing five transactions. Each row in Table 1 represents a transaction, in which each letter represents an item and has a purchase quantity (internal utility). The unit profit of each item is shown in Table 2 (external utility). In Table 1, the absolute utility of the item {F} in the transaction T3 is $au(\{F\}, T_3) = p(\{F\}, D) \times q(\{F\}, T_3) = 3 \times 2 = 6$. The absolute utility of {BF} in T3 is $au(\{BF\}, T_3) = au(\{B\}, T_3) + au(\{F\}, T_3) = 1 + 6 = 7$. The absolute utility of {BF} is $au(\{BF\}) = au(\{BF\}, T_3) + au(\{BF\}, T_5) = 7 + 10 = 17$. If abs_min_utility = 10, the set of HUIs in Table 1 is H = {{E} : 12, {F} : 15, {AE} : 10, {AF} : 17, {BE} : 10, {BF} : 17, {ABE} : 12, {ABF} : 19}, where the number beside each item set is its absolute utility.

TABLE 1
EXAMPLE DATABASE
TID   Transaction               Transaction Utility (TU)
T1    A(1), B(1), E(1), W(1)    5
T2    A(1), B(1), E(3)          8
T3    A(1), B(1), F(2)          8
T4    E(2), G(1)                5
T5    A(1), B(1), F(3)          11

TABLE 2
PROFIT DATABASE
Item          A   B   E   F   G   W
Unit Profit   1   1   2   3   1   1

D. CLOSED ITEM SET MINING
Some definitions about closed item set mining are introduced in this section. Further details can be found in the works on closed item set mining cited above.

Definition 10 [1] (Tidset of an item set). The Tidset of an item set $X$ is denoted as $g(X)$ and defined as the set of Tids of transactions containing $X$.
The support count of $X$ is expressed in terms of $g(X)$ as $SC(X) = |g(X)|$.

Definition 11 (Closure of an item set). [1] The closure of an item set $X \in L$, denoted as $C(X)$, is the largest set $Y \in L$ such that $X \subseteq Y$ and $SC(X) = SC(Y)$. Alternatively, it is defined as $C(X) = \bigcap_{R \in g(X)} T_R$.

Definition 12 (Closed item set). [1] An item set $X \in L$ is a closed item set iff there exists no item set $Y \in L$ such that (1) $X \subset Y$ and (2) $SC(X) = SC(Y)$. Otherwise, $X$ is a non-closed item set. An equivalent definition is that $X$ is closed iff $C(X) = X$. For example, in the database of Table 1, {B} is non-closed because $C(\{B\}) = T_1 \cap T_2 \cap T_3 \cap T_5 = \{AB\}$.

Definition 13 [1] (Complete set of closed item sets in the database). Let $S$ be a set of item sets and $f_C$ a function defined as $f_C(S) = \{X \mid X \in S,\ \nexists Y \in S$ such that $X \subset Y$ and $SC(X) = SC(Y)\}$. The complete set of closed item sets in $D$ is denoted as $C$ ($C \subseteq L$) and defined as $f_C(L)$. For example, the set of closed item sets in Table 1 is $f_C(L)$ = {{E} : 3, {EG} : 1, {AB} : 4, {ABE} : 2, {ABF} : 2, {ABEW} : 1}, in which the number beside each item set is its support count. The supersets of {B} in $f_C(L)$ are {AB} : 4, {ABE} : 2, {ABF} : 2 and {ABEW} : 1. Thus, SC({B}) = max{4, 2, 2, 1} = 4.

III. UPGROWTH+_HC_D AND CHUD+ : EFFICIENT CLOSED HIGH UTILITY MINING ALGORITHMS
This section describes the improvement that can be achieved by combining the concepts of high utility item set mining and closed item set mining. The new representation has been proven theoretically to be lossless, meaningful and concise. Figure 2 shows the block diagram of the proposed algorithms. The framework includes two algorithms, UPGrowth+_HC_D and CHUD+. Both rely entirely on the TWU model and implement effective strategies to mine closed high utility item sets. Both algorithms consist of two phases: the high utility items are mined in phase 1 and the closed high utility item sets are mined in phase 2. In the first phase of UPGrowth+_HC_D, the high utility items are mined.
In the second phase, the closed high utility item sets are mined. UPGrowth+_HC_D uses a horizontal database. CHUD+, on the other hand, is an extension of CHUD. It also has two phases: the high utility items are mined in the first phase, and the closed item sets are mined from the high utility item sets in the second phase. It uses a vertical database.

Figure 2. Block diagram of proposed system
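Definitions 8 and 10 to 13 can be checked on the example database with the following brute-force Python sketch. It is a minimal illustration under stated assumptions, not the procedure used by the proposed algorithms: it relies only on the transactions and the TU column of Table 1, enumerates every candidate item set, and uses hypothetical function names.

```python
from itertools import combinations

# Items of each transaction and the TU column, both taken from Table 1.
DATABASE = {
    "T1": frozenset("ABEW"),
    "T2": frozenset("ABE"),
    "T3": frozenset("ABF"),
    "T4": frozenset("EG"),
    "T5": frozenset("ABF"),
}
TU = {"T1": 5, "T2": 8, "T3": 8, "T4": 5, "T5": 11}


def tidset(itemset):
    """Definition 10: g(X), the Tids of the transactions containing X."""
    items = frozenset(itemset)
    return frozenset(tid for tid, transaction in DATABASE.items() if items <= transaction)


def twu(itemset):
    """Definition 8: sum of TU over the transactions containing X."""
    return sum(TU[tid] for tid in tidset(itemset))


def closure(itemset):
    """Definition 11: intersection of the transactions in g(X); None if X never occurs."""
    tids = tidset(itemset)
    if not tids:
        return None
    return frozenset.intersection(*(DATABASE[tid] for tid in tids))


def closed_itemsets():
    """Definition 13: every item set X with C(X) = X, mapped to its support count."""
    universe = sorted(frozenset().union(*DATABASE.values()))
    closed = {}
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            if closure(candidate) == frozenset(candidate):
                closed["".join(candidate)] = len(tidset(candidate))
    return closed
```

For instance, `closure("B")` intersects T1, T2, T3 and T5, which is how the text obtains C({B}) = {AB}. The tidset view used here corresponds to the vertical database layout that CHUD+ works on.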

Definition 14 [1] (Closed high utility item set). The set of closed high utility item sets is defined as $HC = \{X \mid X \in L,\ X = C(X),\ u(X) \ge min\_utility\}$, that is, $HC = H \cap C$. An item set $X$ is called a non-closed high utility item set iff $X \in H$ and $X \notin C$. For example, if abs_min_utility = 10, the complete set of closed HUIs in Table 1 is HC = {{E}, {ABE}, {ABF}}.

A. THE UPGROWTH+_HC_D ALGORITHM
The algorithm works as follows. It first performs a database scan and identifies the internal utility and external utility of the items. It computes the Transaction Utility (TU) of each transaction and then calculates the Transaction Weighted Utility of each item by adding the TUs of the transactions that contain it. Based on min_utility, the algorithm prunes the unpromising items.

Definition 15 (Promising item) [2]. An item $i_p$ is a promising item iff $TWU(i_p) \ge abs\_min\_utility$. Otherwise, it is an unpromising item.

The algorithm then applies the DGU strategy to remove the globally unpromising items and their utilities from the database and creates a global UPTree (Utility Pattern Tree). Next, the conditional pattern bases of the items are created and the locally unpromising items are removed. Finally, phase 1 yields a UPTree with a smaller number of candidate high utility item sets.

Strategy 1. DGU [2] (Discarding Global Unpromising items) [24]. Discard global unpromising items and their exact utilities from transactions and transaction utilities of the database, respectively.

Rationale. By Property 10, unpromising items play no role in CHUIs. Therefore, unpromising items can be removed from each transaction $T_R$ and their absolute utilities can be subtracted from $TU(T_R)$. Thus, the utilities of unpromising items can be ignored in the calculation of the estimated utilities of item sets (i.e., TWU). For more details about the strategy DGU, readers can refer to [24].

In the second phase, the algorithm identifies the closed item sets by applying the strategy IIDS (Isolated Items Discarding Strategy).
The second strategy is based on the following definition and properties. The closed high utility item sets are the result of phase 2.

Definition 16 (Isolated item) [9]. Let $L_k$ be the set of HTWUIs of length $k$. An item $i_o \in I$ is called an isolated item of level $k$ iff it is not contained in any item set in $L_k$.

Strategy 2. IIDS [12] (Isolated Items Discarding Strategy) [19]: Discard isolated items and their actual utilities from transactions and transaction utilities of the database.

ALGORITHM: UPGROWTH+_HC_D
Input: D: the database, abs_min_utility;
1. CHUI = Ø
2. Scan D
3. UPGrowth+_HC_D_Phase1(D, abs_min_utility)
4. UPGrowth+_HC_D_PhaseII(D, UPTree, abs_min_utility)
Figure 3. UPGrowth+_HC_D algorithm

ALGORITHM: UPGROWTH+_HC_D_Phase1
Input: D: the database, abs_min_utility;
Output: The complete set of HUIs, UPTree
1. Scan D
2. For(i=0; i<D.length; i++)
3. {
4. Compute TU
5. Compute TWU
6. D1 = DGU_Strategy(D)
7. T1 = Tree_construct(D1)
8. }
9. Return(T1)
Figure 4. UPGrowth+_HC_D_Phase1

ALGORITHM: UPGROWTH+_HC_D_PhaseII
Input: D1: the database, abs_min_utility, T1: the UPTree;
1. Scan T1
2. For(i=0; i<length(T1); i++)
3. {
4. I1 = IIDS_Strategy(Ti)
5. CHUI = I1
6. }
Figure 5. UPGrowth+_HC_D_PhaseII

B. THE CHUD+ ALGORITHM
The CHUD+ algorithm is an efficient algorithm inspired by the CHUD algorithm. CHUD+ is introduced to mine closed high utility item sets with effective strategies for reducing the number of candidate item sets. It uses a tree structure called the IT Tree (Item set-tid Tree). Phase 1 of the algorithm works as follows. It starts with a database scan and then computes the Transaction Utility and Transaction Weighted Utility of the items. The unpromising items are pruned using the min_utility threshold. Closed item sets are mined after the high utility item sets have been found. The remaining data is then converted into the IT Tree, which stores each item set together with the Tids of the transactions that contain it.
The IT Tree is therefore also a vertical database. In addition, two ordered sets are used, POST_SET(X) and PRE_SET(X). For each item $a_i$ in the IT Tree, the algorithm creates a node N({a_i}) and puts the frequent items in POST_SET(X). After that, the non-duplicate frequent item sets are explored from POST_SET(X) and put in PRE_SET(X). A local IT Tree is created from the PRE_SET(X) values and is the input to the second phase.

ALGORITHM: CHUD+
Input: D: The database; abs_min_utility;
1. InitialDatabaseScan(D)
2. RemoveUtilityUnpromisingItems(O, GTU)
3. for each item ak ∈ O do
4. {
5. create node N({ak})
6. CHUD+_Phase_I(N({ak}), GIT, abs_min_utility)
7. }
8. CHUD+_Phase_I(LIT, GIT, abs_min_utility)
Figure 6. CHUD+ Algorithm
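The effect of the DGU strategy (Strategy 1) and of Definition 14 can be illustrated on the example database with the Python sketch below. It is a simplified brute-force illustration under stated assumptions, not the UP Tree or IT Tree procedures described above. The unit profits are assumed values consistent with the TU column of Table 1, and all function names are hypothetical.

```python
from itertools import combinations

# Purchase quantities from Table 1; unit profits are assumed values that
# reproduce the TU column of Table 1.
DATABASE = {
    "T1": {"A": 1, "B": 1, "E": 1, "W": 1},
    "T2": {"A": 1, "B": 1, "E": 3},
    "T3": {"A": 1, "B": 1, "F": 2},
    "T4": {"E": 2, "G": 1},
    "T5": {"A": 1, "B": 1, "F": 3},
}
PROFIT = {"A": 1, "B": 1, "E": 2, "F": 3, "G": 1, "W": 1}


def transaction_utility(transaction):
    """TU of one transaction given as an item -> quantity mapping."""
    return sum(PROFIT[item] * qty for item, qty in transaction.items())


def item_twu(database):
    """TWU of every single item: sum of the TUs of the transactions containing it."""
    twu = {}
    for transaction in database.values():
        tu = transaction_utility(transaction)
        for item in transaction:
            twu[item] = twu.get(item, 0) + tu
    return twu


def dgu(database, abs_min_utility):
    """DGU sketch: drop unpromising items and recompute the reduced TUs."""
    twu = item_twu(database)
    pruned = {
        tid: {item: qty for item, qty in transaction.items()
              if twu[item] >= abs_min_utility}
        for tid, transaction in database.items()
    }
    reduced_tu = {tid: transaction_utility(t) for tid, t in pruned.items()}
    return pruned, reduced_tu


def closed_high_utility_itemsets(database, abs_min_utility):
    """Definition 14 by enumeration over the promising items only."""
    pruned, _ = dgu(database, abs_min_utility)
    items = sorted({item for t in pruned.values() for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            x = set(candidate)
            tids = [tid for tid, t in database.items() if x.issubset(t)]
            if not tids:
                continue
            # Closure and utility are computed on the original database.
            closure = set.intersection(*(set(database[tid]) for tid in tids))
            utility = sum(PROFIT[i] * database[tid][i] for tid in tids for i in x)
            if closure == x and utility >= abs_min_utility:
                result["".join(candidate)] = utility
    return result
```

With abs_min_utility = 10, the items G and W have a TWU of 5 and are discarded, so the candidate enumeration runs over four items instead of six. This is the kind of search-space reduction that the strategy aims for before the tree structures are built.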

In the second phase, three effective strategies are used to select the closed high utility item sets and to reduce the number of candidate item sets. The phase takes the tree as input and applies the REG, RML and DCM strategies.

Strategy 3. REG (Removing the Exact utilities of items from the Global IT Tree). Each time an item $a_k \in O$ has been processed in the main procedure, this strategy is applied by calling the REG_Strategy procedure. The procedure is called with $g(a_k)$ and the global utility table GTU. It removes the utility of $a_k$ from the transaction utility of each transaction containing $a_k$ in the global IT-Tree.

Strategy 4. RML [1] (Removing the Mius of items from Local IT-Tree). The strategy consists of using a local TU-Table $TU_X$ for each node N(X) in the IT-Tree. Let N(X) be the current node being processed by Explore and N(Y) be a child node of N(X) that has been created by appending an item $a_k$ from POST-SET(X) to X, such that $Y = X \cup \{a_k\}$. The strategy is applied after the Explore procedure by calling the RML_Strategy sub-function. The RML_Strategy procedure takes as parameters the transaction utilities of N(X) and the set of transactions that contain Y. The procedure first removes $miu(\{a_k\})$, the minimum item utility of $a_k$, from the transaction utility of each transaction containing $a_k$ in $TU_X$. The updated local TU-Table $TU_X$ is used for all child nodes of N(X). This process reduces the estimated utility of N(X) and that of its child nodes. Besides, $miu(\{a_k\}) \times SC(Y)$ is removed from EstU(X). If the updated EstU(X) is less than abs_min_utility, the algorithm will not process $X \cup \{a_k\}$ for any following item $a_k \in$ POST-SET(X).

Strategy 5. DCM [1] (Discarding Candidates with a MAU that is less than the minimum utility threshold). The last strategy is called DCM and is applied in line 3 of the CHUD Phase-I procedure. A candidate $X_C$ can be discarded from Phase II if its estimated utility $EstU(X_C)$ or $MAU(X_C)$ is less than abs_min_utility. The procedure takes as parameter an item set $X_C$. It first computes the maximum utility $MAU(X_C)$ of $X_C$. Then, if $EstU(X_C)$ and $MAU(X_C)$ are both no less than abs_min_utility, $X_C$ is output with its estimated utility.

ALGORITHM: CHUD+_Phase I
Input: D: The database; abs_min_utility;
Output: The complete set of HUIs
1. InitialDatabaseScan(D)
2. Calculate TU, TWU
3. Remove unpromising item sets
4. for each item ak ∈ O do
5. {
6. DGU_Strategy(D)
7. D1 = Select closed item sets
8. ITTree_construct(D1)
9. P = compute POST_SET(X)
10. Pr = compute PRE_SET(X)
11. T1 = ITTree_construct(Pr)
12. }
Figure 7. CHUD+_Phase_I

ALGORITHM: CHUD+_Phase II
Input: abs_min_utility; T1: Local IT Tree
1. InitialScan(T1)
2. T1 = REG_Strategy(T1)
3. T2 = RML_Strategy(T1)
4. T3 = DCM_Strategy(T2)
5. Return T3
Figure 8. CHUD+_Phase II

IV. EXPERIMENTAL EVALUATION
This section evaluates the performance of the proposed algorithms against UPGrowth [2], from which UPGrowth+ evolved. The experiments were performed on a desktop computer with an Intel Core2 Duo processor, 2GB RAM and the Windows 8 operating system. The algorithms were implemented in Java, with MySQL as the backend. Real-life datasets, namely the mushroom dataset and the Foodmart dataset, were used for the experimental evaluations. The Foodmart dataset is a real-life dataset obtained from the Microsoft Foodmart 2000 database. The experiments measure the number of candidate item sets generated in the first phase of the algorithms and the memory usage or search space. Because both algorithms use UPGrowth+ strategies to extract the high utility item sets, they generate far fewer candidate item sets than the other algorithms. Figure 9 shows the number of item sets produced when UP Growth and UP Growth+ are used in phase 1 of the algorithms to extract the high utility item sets. It shows that UP Growth+ achieves better performance.

Figure 9. Number of candidates in phase 1 of UPGrowth+_HC_D (y-axis: number of candidate item sets)

Figure 10(a). Execution time of UPGrowth+_HC_D (series: UPGrowth, UPGrowth+_HC_D)

The results for both algorithms are shown in Figures 10(a) and 10(b). Figure 10(a) shows the execution time needed by UPGrowth+_HC_D compared with the UPGrowth algorithm. Figure 10(b) shows the execution time needed by CHUD+ compared with CHUD and UPGrowth.

Figure 10(b). Execution time of CHUD+ (series: CHUD+, CHUD, UPGrowth)

V. CONCLUSION
Frequent pattern mining and utility mining are important research areas in data mining, and both matter for market analysis and other business applications. Combining frequent pattern mining with utility mining provides better results, reduced computational overhead and greater satisfaction for the users. Algorithms such as AprioriHC, AprioriHC-D and CHUD extract high utility item sets from large datasets and provide a concise and lossless representation of high utility item sets with less computational overhead. This paper proposes two new methods, UPGROWTH+_HC_D and CHUD+, to reduce the number of candidates in phase I as well as the memory consumption. The results are analyzed based on the algorithm and the dataset selected. In this work, we study only the integration of closed item set mining and high utility item set mining. However, there are many other compact representations that have not yet been combined with high utility item set mining. Although closed+ high utility item set mining is essential to many research topics and industrial applications, it is still a novel and challenging problem. Other related research issues are worth exploring in the future.

VI. ACKNOWLEDGEMENT
The first author would like to thank all those who guided and supported this work. Without their valuable guidance and support, this task would not have been possible. The author also thanks colleagues for their discussions and suggestions.

REFERENCES
[1] Vincent S. Tseng, Cheng-Wei Wu, Philippe Fournier-Viger, and Philip S. Yu, Fellow, IEEE.
Efficient Algorithms for Mining the Concise and Lossless Representation of High Utility Item sets, IEEE Transactions on Knowledge and Data Engineering, Vol. 27, No. 3, March 2015.
[2] V. S. Tseng, C.-W. Wu, B.-E. Shie, and P. S. Yu, UP-Growth: An efficient algorithm for high utility item set mining, in Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2010, pp.
[2] R. Agrawal and R. Srikant, Fast algorithms for mining association rules, in Proc. 20th Int. Conf. Very Large Data Bases, 1994, pp.
[3] C. F. Ahmed, S. K. Tanbeer, B.-S. Jeong, and Y.-K. Lee, Efficient tree structures for high utility pattern mining in incremental databases, IEEE Trans. Knowl. Data Eng., vol. 21, no. 12, pp., Dec.
[4] J.-F. Boulicaut, A. Bykowski, and C. Rigotti, Free-sets: A condensed representation of Boolean data for the approximation of frequency queries, Data Mining Knowl. Discovery, vol. 7, no. 1, pp. 5-22.
[5] T. Calders and B. Goethals, Mining all non-derivable frequent item sets, in Proc. Int. Conf. Eur. Conf. Principles Data Mining Knowl. Discovery, 2002, pp.
[6] K. Chuang, J. Huang, and M. Chen, Mining top-k frequent patterns in the presence of the memory constraint, VLDB J., vol. 17, pp.
[7] R. Chan, Q. Yang, and Y. Shen, Mining high utility item sets, in Proc. IEEE Int. Conf. Data Min., 2003, pp.
[8] A. Erwin, R. P. Gopalan, and N. R. Achuthan, Efficient mining of high utility item sets from large datasets, in Proc. Int. Conf. Pacific-Asia Conf. Knowl. Discovery Data Mining, 2008, pp.
[9] T. Hamrouni, Key roles of closed sets and minimal generators in concise representations of frequent patterns, Intell. Data Anal., vol. 16, no. 4, pp.
[10] J. Han, J. Pei, and Y. Yin, Mining frequent patterns without candidate generation, in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2000, pp.
[11] T. Hamrouni, S. Yahia, and E. M. Nguifo, Sweeping the disjunctive search space towards mining new exact concise representations of frequent item sets, Data Knowl. Eng., vol. 68, no. 10, pp.
[12] H.-F. Li, H.-Y. Huang, Y.-C. Chen, Y.-J. Liu, and S.-Y. Lee, Fast and memory efficient mining of high utility item sets in data streams, in Proc. IEEE Int. Conf. Data Mining, 2008, pp.
[13] C.-W. Lin, T.-P. Hong, and W.-H. Lu, An effective tree structure for mining high utility item sets, Expert Syst. Appl., vol. 38, no. 6, pp.
[14] G.-C. Lan, T.-P. Hong, and V. S. Tseng, An efficient projection-based indexing approach for mining high utility item sets, Knowl. Inf. Syst., vol. 38, no. 1, pp.
[15] H. Li, J. Li, L. Wong, M. Feng, and Y. Tan, Relative risk and odds ratio: A data mining perspective, in Proc. ACM SIGACT-SIGMOD-SIGART Symp. Principles Database Syst., 2005, pp.
[16] B. Le, H. Nguyen, T. A. Cao, and B. Vo, A novel algorithm for mining high utility item sets, in Proc. 1st Asian Conf. Intell. Inf. Database Syst., 2009, pp.
[17] Y. Liu, W. Liao, and A. Choudhary, A fast high utility item sets mining algorithm, in Proc. Utility-Based Data Mining Workshop, 2005, pp.
[18] C. Lucchese, S. Orlando, and R. Perego, Fast and memory efficient mining of frequent closed item sets, IEEE Trans. Knowl. Data Eng., vol. 18, no. 1, pp., Jan.
[19] Y.-C. Li, J.-S. Yeh, and C.-C. Chang, Isolated items discarding strategy for discovering high utility item sets, Data Knowl. Eng., vol. 64, no. 1, pp.
[20] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, Efficient mining of association rules using closed item set lattice, J. Inf. Syst., vol. 24, no. 1, pp.
[21] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, H-mine: Fast and space-preserving frequent pattern mining in large databases, IIE Trans., vol. 39, no. 6, pp., Jun.
[22] B.-E. Shie, H.-F. Hsiao, V. S. Tseng, and P. S. Yu, Mining high utility mobile sequential patterns in mobile commerce environments, in Proc. Int. Conf. Database Syst. Adv. Appl., 2011, vol. 6587, pp.
[23] B.-E. Shie, V. S. Tseng, and P. S. Yu, Online mining of temporal maximal utility item sets from data streams, in Proc. Annu. ACM Symp. Appl. Comput., 2010, pp.

BIOGRAPHIES
Anusmitha.A, Department of Computer Science & Engineering, Mangalam College of Engineering, Ettumanoor, Kerala, India.
Renjana Ramachandran, Assistant Professor, Department of Computer Science and Engineering, Mangalam College of Engineering, Ettumanoor, Kerala, India.
