Utility Mining: An Enhanced UP Growth Algorithm for Finding Maximal High Utility Itemsets

C. Sivamathi 1, Dr. S. Vijayarani 2
1 Ph.D Research Scholar, 2 Assistant Professor, Department of CSE, Bharathiar University, Coimbatore, India
1 c.sivamathi@gmail.com, 2 vijimohan_2@yahoo.com

Abstract: Utility mining is the efficient discovery of high utility itemsets from a database. An itemset whose utility value is greater than the utility threshold is known as a high utility itemset. In recent years, several utility mining algorithms have been proposed. In a transaction database, utility mining algorithms extract highly profitable itemsets. In a large transaction database, utility mining produces a large set of itemsets. To retrieve compact patterns, maximal high utility itemsets are introduced. An itemset is defined as a maximal high utility itemset if no proper superset of it is a high utility itemset. The main objective of this work is to retrieve maximal high utility itemsets from a transaction database using the UP Growth algorithm. This work uses the UP Growth approach to store the utility information of itemsets. From the resulting high utility itemsets, it retrieves only the maximal high utility itemsets. A Chess dataset is used for performance analysis of the algorithm. The experimental results discuss performance factors such as execution time, memory space, and the number of maximal high utility itemsets retrieved by the algorithm.

Keywords: Utility Mining, High Utility Itemsets, UP Growth, Maximal High Utility Itemsets, Pruning strategy.

I. INTRODUCTION

Utility mining produces a huge number of high utility itemsets. Post-processing has to be applied to these itemsets to discover the necessary high utility itemsets [1]. Moreover, some itemsets may be irrelevant to users. Hence compact patterns such as maximal and closed itemsets are introduced. An itemset is called maximal if it is not a subset of any other high utility itemset.
An itemset is called closed if it has no superset with the same utility value. These two compact patterns do not reduce the huge number of utility itemsets, but they provide meaningful results. Moreover, if a transaction database is very dense and the minimum utility threshold is very low, then mining all high utility itemsets might not be a good idea. For example, if there is a high utility itemset of size n, then all 2^n - 1 of its nonempty subsets have to be generated. Hence, if the subsets of an itemset are high utility, it is sufficient to discover only the maximal high utility itemsets. Mining high utility itemsets can thus be reduced to mining a border in the itemset lattice. All itemsets above the border are high utility itemsets and those below the border are low utility itemsets. However, mining only maximal itemsets has the following deficiency: from a maximal itemset and its utility, it is known that all its subsets are high utility and that their utility exceeds the minimum threshold, but their exact utility values are not known. In this work, the algorithm discovers maximal high utility itemsets, which are not only high utility but also maximal. The algorithm is based on the UP-Growth algorithm. To the best of our knowledge, this is the first work to retrieve maximal high utility itemsets in data mining.

II. RELATED WORK

Association rule mining is considered an interesting research area and has been studied widely [2] [3] by many researchers. In recent years, some relevant methods have been proposed for mining high utility itemsets from transaction databases. In 1994, R. Agrawal et al. proposed the Apriori algorithm, which exploits the downward closure property [3][4] and is the pioneer for efficiently mining association rules from large databases. This algorithm generates and tests candidate itemsets iteratively. It may scan the database multiple times, so the computational cost is high.
To overcome the disadvantages of the Apriori algorithm and find frequent itemsets efficiently without generating candidate itemsets, frequent pattern growth (FP-Growth) was proposed by J. Han et al. [5]. FP-Growth compresses a database into a tree structure and shows better performance than Apriori. However, it has two limitations: (i) it treats all items as having the same price; (ii) in a transaction each item appears in binary (0/1) form, i.e. either present or absent. In the real world, each item in a supermarket has a different price and a single customer may take the same item multiple times. Therefore, finding only traditional frequent patterns in a database cannot fulfill the requirement of finding the most valuable customers or itemsets that contribute the most to the total profit in a retail business.

In 2006, H. Yao et al. proposed the UMining [6] algorithm to find almost all the high utility itemsets from an original database, but it fails to capture the complete set of high utility itemsets. Later, in 2010, V. S. Tseng et al. [7] proposed the UP-Growth algorithm to rectify the problems of FP-Growth. Another algorithm, named Two-Phase, is able to find high utility itemsets. The Two-Phase algorithm prunes the number of candidates and obtains the complete set of high utility itemsets. In the first phase, the transaction-weighted downward closure property of the search space is used to expedite the identification of candidates. In the second phase, one extra database scan is performed to identify high utility itemsets. However, this algorithm cannot deal with negative item values in utility mining: when high utility itemsets with negative item values are sought, some candidate itemsets are lost. Hence, the Two-Phase algorithm focuses on positive item values and is not suited to negative item values in utility mining [8].

An algorithm named THUI (Temporal High Utility Itemsets) was proposed [9]; it is the first algorithm for finding temporal high utility itemsets in temporal databases. The algorithm integrates the advantages of the Two-Phase algorithm and the SWF algorithm and augments them with incremental mining techniques for mining temporal high utility itemsets efficiently. However, this algorithm also focuses only on high utility itemsets with positive item values, so it cannot find high utility itemsets with negative item values.

Lin et al. first developed the HAUP-tree structure and the HAUP-growth algorithm for mining high average-utility itemsets (HAUIs). In the HAUP-tree, each node at the end of a path stores the average-utility upper bound of the corresponding item as well as the quantities of the preceding items in the same path. This approach can thus be used to speed up the discovery of HAUIs [10]. Lan et al. [11] proposed a projection-based average-utility itemset mining (PAI) algorithm to reveal HAUIs using a level-wise approach. Based on the proposed upper-bound model, the number of unpromising candidates can be greatly reduced compared to previous work based on the TWU model.

III. PROPOSED METHODOLOGY

UP-Growth is one of the efficient algorithms for generating high utility itemsets. It uses a tree structure called the global UP-Tree. This UPMAXTree maintains the transaction information, so that there is no need to scan the database again and again. In a UPMAXTree, each node has the node's item name, a support count, a parent name, a node link that points to another node, and a set of child nodes.
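The node layout listed above can be sketched in Java. This is a minimal sketch, not the authors' implementation: the class name `UpNode`, its field names, and the helpers `child` and `insertPath` are hypothetical, and the `nodeUtility` field is an assumption based on the node utilities that the pruning strategies of this section refer to.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tree node holding the fields listed in the text. */
class UpNode {
    final String item;        // item name (null for the root)
    int count;                // support count of the path prefix ending at this node
    int nodeUtility;          // assumed field: overestimated utility kept at the node
    final UpNode parent;      // parent node (null for the root)
    UpNode nodeLink;          // link to another node that carries the same item
    final List<UpNode> children = new ArrayList<>();

    UpNode(String item, UpNode parent) {
        this.item = item;
        this.parent = parent;
    }

    /** Returns the child labelled with the given item, creating it if absent. */
    UpNode child(String childItem) {
        for (UpNode c : children) {
            if (c.item.equals(childItem)) {
                return c;
            }
        }
        UpNode created = new UpNode(childItem, this);
        children.add(created);
        return created;
    }

    /** Inserts one ordered transaction below this node and bumps the support counts. */
    void insertPath(List<String> items) {
        UpNode current = this;
        for (String it : items) {
            current = current.child(it);
            current.count++;
        }
    }
}
```

Transactions that share a prefix share the corresponding nodes, which is what lets the tree stand in for repeated database scans. Utility bookkeeping and node-link maintenance are left out of the sketch.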
The UPMAXTree maintains a table named the header table. In the header table, each entry records an item name, an overestimated utility, and a link. The construction of a global UP-Tree is performed with two database scans. In the first scan, the Transaction Utility (TU) of every transaction and the Transaction Weighted Utility (TWU) of every item are calculated. An item whose TWU is less than the minimum utility threshold is said to be an unpromising item. An unpromising item and all its supersets are not high utility itemsets.

The DGU strategy (Discarding Global Unpromising Items during Constructing a Global UP-Tree) discards global unpromising items and their actual utilities from the transactions and the transaction utilities of the database. The transactions are inserted into a UPMAXTree in the second scan. When a transaction is retrieved, the unpromising items are removed from the transaction and their utilities are also eliminated from the transaction utility (TU). The new TU calculated after pruning unpromising items is called the reorganized transaction utility (abbreviated as RTU). Then, the reorganized transactions are constructed with the RTU [1].

The DGN strategy (Decreasing Global Node Utilities during Constructing a Global UP-Tree) decreases the node utilities of the global UP-Tree by the actual utilities of descendant nodes. DGN is especially suitable for databases containing many long transactions: the more items a transaction contains, the more utilities DGN can discard. Potential high utility itemsets (PHUIs) are retrieved from the tree. PHUIs are similar to the TWU model in that the utility of every itemset is computed with the help of an estimated utility. Finally, the high utility itemsets are identified from the PHUIs. The global UP-Tree contains many sub-paths. Each path is considered starting from the bottom node of the header table. Such a path is named a conditional pattern base (CPB).

Table I. Example Transaction Database

Transaction Id   Transaction                           Utility of Transaction
T1               (i1,1) (i2,3) (i3,2)                  19
T2               (i1,2) (i3,3) (i4,1)                  21
T3               (i2,2) (i3,1)                         10
T4               (i2,3) (i3,1) (i4,3) (i5,2)           32
T5               (i1,1) (i2,3) (i3,2) (i4,1) (i5,2)    28

Table II. Profit of Items

Item   Profit (utility)
i1     2
i2     3
i3     4
i4     5
i5     2

Table III. TWU of Items

Item   TWU
i1     68
i2     89
i3     110
i4     81
i5     60

The TWU of an item is the sum of the utilities of the transactions in which it appears; for example, i1 appears in T1, T2, and T5, so its TWU is 19 + 21 + 28 = 68. The TWU of all items is shown in Table III. Here the minimum utility threshold was chosen as 70, hence the unpromising items i1 and i5 are removed from the database and the RTU is computed, as shown in Table IV. The items are arranged in descending order of their TWU. The UPMAXTree is then constructed, as given in Figure 1.

Bottom-up tracing: Each branch in the tree is traced from the leaf node. First, a pointer is set to the leftmost leaf node. Then the pointer moves from the leaf node towards the root node. When checking a node N, if N's utility is larger than or equal to the minimum utility threshold, then the itemset is put into a high utility itemset list. The ancestor nodes from N to the root are also labeled as checked, i.e., these nodes do not need to be checked since they cannot be maximal. If N's utility is less than MinU, the pointer goes to the parent node of N. After all branches are checked, the process is finished. From the figure, the high utility itemsets are {2}, {3}, {2, 3}, {4, 3}, {4, 2}, and {4, 2, 3}. The maximal high utility itemset is {4, 2, 3}.

Pseudo code for UPMaxUtilityItemsets

Algorithm: UPMaxUtilityItemsets
Input: UPMAXTree, Header table Hx, Itemset X, min_utility_threshold.
Output: MaximalHighUtilityItemsets.
1. For each entry i in Hx do
2.   Trace each node related to i and calculate nu_sum(i). // nu_sum(i) = sum of node utilities of i
3.   If (nu_sum(i) > min_utility_threshold) do
4.     Generate PHUI Y = X ∪ i.
5.     Set pu(i) as estimated utility of i.
6.     Construct Y-CPB
7.     Put local promising items in Hv
8.     Apply DLU
9.     Apply DLN
10.    Set a pointer pt to the leftmost leaf node of UPMAXTree.
11.    List = bottomUpTracing(UPMAXTree, min_utility_threshold, pt)
12.    Output MaximalHighUtilityItemsets from List.
13.    If (Tx ≠ NULL) then UPMaxUtilityItemsets(Tx, Hx, X)
14.  End if.
15. End for.

Table IV. Reorganized Transactions and RTU

Transaction Id   Reorganized Transaction    RTU
T1               (i3,2) (i2,3)              17
T2               (i3,3) (i4,1)              17
T3               (i3,1) (i2,2)              10
T4               (i3,1) (i2,3) (i4,3)       28
T5               (i3,2) (i2,3) (i4,1)       22

Fig. 1. UPMAX Tree

IV. EXPERIMENTAL EVALUATION

The algorithm is implemented in Java. The software tool used is NetBeans IDE 8.0.
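Because the implementation language is Java, the first-scan quantities of the Section III example (TU, TWU, and RTU) can be reproduced with a short sketch. It is an illustration under stated assumptions, not the authors' code: the class `FirstScan` and all of its method names are hypothetical, the data are the five example transactions and item profits, and a minimum utility threshold of 70 is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper reproducing the first-scan quantities of the worked example. */
class FirstScan {
    /** Builds an ordered item-to-number map from alternating (item, number) arguments. */
    static Map<String, Integer> tx(Object... itemQty) {
        Map<String, Integer> t = new LinkedHashMap<>();
        for (int k = 0; k < itemQty.length; k += 2) {
            t.put((String) itemQty[k], (Integer) itemQty[k + 1]);
        }
        return t;
    }

    /** Unit profit of each item in the example. */
    static Map<String, Integer> exampleProfit() {
        return tx("i1", 2, "i2", 3, "i3", 4, "i4", 5, "i5", 2);
    }

    /** The five example transactions T1 to T5 as item-to-quantity maps. */
    static List<Map<String, Integer>> exampleDb() {
        return Arrays.asList(
            tx("i1", 1, "i2", 3, "i3", 2),
            tx("i1", 2, "i3", 3, "i4", 1),
            tx("i2", 2, "i3", 1),
            tx("i2", 3, "i3", 1, "i4", 3, "i5", 2),
            tx("i1", 1, "i2", 3, "i3", 2, "i4", 1, "i5", 2));
    }

    /** Transaction utility: sum of quantity times unit profit over the items present. */
    static int tu(Map<String, Integer> transaction, Map<String, Integer> profit) {
        int total = 0;
        for (Map.Entry<String, Integer> e : transaction.entrySet()) {
            total += e.getValue() * profit.get(e.getKey());
        }
        return total;
    }

    /** TWU of each item: sum of the TU of every transaction that contains the item. */
    static Map<String, Integer> twu(List<Map<String, Integer>> db, Map<String, Integer> profit) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map<String, Integer> t : db) {
            int utility = tu(t, profit);
            for (String item : t.keySet()) {
                result.merge(item, utility, Integer::sum);
            }
        }
        return result;
    }

    /** DGU step: drops items with TWU below minUtil and orders the rest by descending TWU. */
    static Map<String, Integer> reorganize(Map<String, Integer> t,
                                           Map<String, Integer> twuByItem, int minUtil) {
        List<String> kept = new ArrayList<>();
        for (String item : t.keySet()) {
            if (twuByItem.get(item) >= minUtil) {
                kept.add(item);
            }
        }
        kept.sort((a, b) -> Integer.compare(twuByItem.get(b), twuByItem.get(a)));
        Map<String, Integer> reorganized = new LinkedHashMap<>();
        for (String item : kept) {
            reorganized.put(item, t.get(item));
        }
        return reorganized;
    }
}
```

On the example data, `twu` gives 68, 89, 110, 81, and 60 for i1 to i5, so i1 and i5 fall below 70. Reorganizing T4 leaves (i3,1) (i2,3) (i4,3), and calling `tu` on that result gives its RTU of 28.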
The dataset used in the experiment is Foodmart. It is a dataset of customer transactions from a retail store, obtained and transformed from SQL Server 2000. It consists of 4141 transactions with an average of 6 items per transaction.

Table V. Performance Measures

Columns: Database size; Min. Utility Threshold; Execution Time in ms; Memory Consumption in MB; High Utility Itemset Counts; Generated Max. High Utility Itemsets

Database size: 1 Transactions; 2 Transactions; Transactions; Transactions

78 16.14 16 624 25 13.46 3 1 9 19.95 4219 1545 121 22.27 12958 5489 33 34.72 7245 4728 222 27.62 16 74 386 42.58 185 8925 444 48.2 3945 18524 396 81 987 3694 31 69.7 386 132 454 74.23 24848 12584 67 84.4 55714 2371 21 15.61 17439 5784 378 16.1 769 528 4 12.84 428 192 12 372 11.74 261 154

From Table V, it was found that maximal high utility itemsets give a compact view of the high utility itemsets, because only itemsets that are not contained in a larger high utility itemset are included. Figure 2 shows the comparison of execution time for different numbers of transactions at different utility thresholds. Figure 3 gives the comparison of memory consumption for different numbers of transactions at different utility thresholds, and Figure 4 depicts the number of high utility itemsets retrieved. Figure 5 represents the number of maximal high utility itemsets retrieved.
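The compaction that maximal itemsets provide can be illustrated on the high utility itemsets of the Section III example. The sketch below is a plain post-filter that keeps every itemset without a proper superset in the list. It is not the bottom-up tracing procedure of the proposed algorithm, and the class `MaximalFilter` and its methods are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical post-filter that keeps only maximal itemsets. */
class MaximalFilter {
    /** Convenience constructor for an itemset. */
    static Set<Integer> set(Integer... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    /** Returns the itemsets that are not a proper subset of another itemset in the list. */
    static List<Set<Integer>> maximal(List<Set<Integer>> itemsets) {
        List<Set<Integer>> result = new ArrayList<>();
        for (Set<Integer> candidate : itemsets) {
            boolean hasProperSuperset = false;
            for (Set<Integer> other : itemsets) {
                // A strictly larger set containing every item is a proper superset.
                if (other.size() > candidate.size() && other.containsAll(candidate)) {
                    hasProperSuperset = true;
                    break;
                }
            }
            if (!hasProperSuperset) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

Applied to {2}, {3}, {2, 3}, {4, 3}, {4, 2}, and {4, 2, 3}, the filter returns the single itemset {4, 2, 3}. The pairwise comparison is quadratic in the number of itemsets, which is acceptable for an illustration but would be costly on large result sets.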

International Journal of Electrical Electronics & Computer Science Engineering

Fig. 2. Comparison of Execution Time

Fig. 3. Comparison of Memory Consumed

Fig. 4. Comparison of Number of High Utility Items Retrieved

Fig. 5. Comparison of Number of Maximal High Utility Items Retrieved

V. CONCLUSION

Utility mining considers the utility factors of itemsets and is an emerging topic in data mining. It is very beneficial in several real-life applications. In this work, a novel algorithm for effectively mining maximal high utility itemsets from a transaction database is proposed. It scans the database only once. The maximal high utility itemsets give a compact list of high utility itemsets. In this work, a UPMAXTree was constructed and the maximal high utility itemsets were then retrieved from it. The algorithm also implements the DGU and DGN pruning strategies, and hence reduces the execution time and the number of candidates generated. The algorithm was implemented in Java and experiments were carried out using the Chess dataset. The execution time, the memory space, the number of maximal high utility itemsets, and the number of high utility itemsets retrieved by the algorithm were measured.

VI. REFERENCES

[1] V.S. Tseng, B.-E. Shie, C.-W. Wu, and P.S. Yu, Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases, IEEE Trans. Knowledge and Data Eng., vol. 25, no. 8, Aug. 2013.
[2] C.F. Ahmed, S.K. Tanbeer, B.-S. Jeong, and Y.-K. Lee, Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases, IEEE Trans. Knowledge and Data Eng., vol. 21, no. 12, pp. 1708-1721, Dec. 2009.
[3] R. Agrawal, T. Imielinski, and A. Swami, Mining Association Rules between Sets of Items in Large Databases, Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 207-216, 1993.
[4] R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules, Proc. 20th Int'l Conf. Very Large Databases, Santiago, Chile, pp. 487-499, 1994.
[5] J. Han, J. Pei, Y. Yin, and R. Mao, Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach, Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53-87, 2004.
[6] Y. Liu, W. Liao, and A. Choudhary, A Fast High Utility Itemsets Mining Algorithm, Proc. Utility-Based Data Mining Workshop, Aug. 2005.
[7] Y.-C. Li, J.-S. Yeh, and C.-C. Chang, Isolated Items Discarding Strategy for Discovering High Utility Itemsets, Data and Knowledge Eng., vol. 64, no. 1, Jan. 2008.
[8] C.H. Cai, A.W.C. Fu, C.H. Cheng, and W.W. Kwong, Mining Association Rules with Weighted Items, Proc. Int'l Database Eng. and Applications Symp. (IDEAS 98), 1998.
[9] R. Chan, Q. Yang, and Y. Shen, Mining High Utility Itemsets, Proc. IEEE Third Int'l Conf. Data Mining, pp. 19-26, Nov. 2003.
[10] V.S. Tseng, C.-W. Wu, B.-E. Shie, and P.S. Yu, UP-Growth: An Efficient Algorithm for High Utility Itemsets Mining, Proc. 16th ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD 10), 2010.
[11] H. Yao, H.J. Hamilton, and L. Geng, A Unified Framework for Utility-Based Measures for Mining Itemsets, Proc. ACM SIGKDD Second Workshop Utility-Based Data Mining, Aug. 2006.
[12] J. Han, H. Cheng, D. Xin, and X. Yan, Frequent Pattern Mining: Current Status and Future Directions, Data Mining and Knowledge Discovery, Jan. 2007.