REDUCTION OF LARGE DATABASE AND IDENTIFYING FREQUENT PATTERNS USING ENHANCED HIGH UTILITY MINING. VIT University, Chennai, India.


International Journal of Pure and Applied Mathematics, Volume 109, doi: /ijpam.v109i5.19, ijpam.eu

K. Suresh (1) and V. Pattabiraman (2)
(1,2) School of Computing Science and Engineering, VIT University, Chennai, India.

Abstract: Traditional frequent pattern mining is generally applied without considering profit values, so its prediction of frequent patterns is not efficient. To improve prediction on static as well as dynamic data sets, a new approach is needed, and this paper therefore proposes an enhanced high utility mining technique. Utility datasets grow continuously, so existing predicted patterns may go out of trend. Customer behavior is ever changing, which can be tracked through techniques such as sequential mining, pattern growth mining, and frequent and incremental mining. High utility pattern (HUP) mining is considered because of two major factors: the profit value of the itemsets concerned and the number of items across all transactions. The paper proposes an enhanced HUP technique that prunes candidates effectively during mining. It offers a different approach to utility mining that can serve business analytics under different constraints. The objective of the enhanced HUP technique is to identify itemsets that have low frequency but high profit values for a given utility threshold. Experimental results show that the proposed technique executes faster, with reduced run time, than the existing method.

Key Words: Association rule mining, Utility mining, Interactive mining, Incremental mining

Received: October 1, 2016. Published: October 23, 2016. © 2016 Academic Publications, Ltd.

1. Introduction

In current mining research, finding significant patterns in a large database is one of the important tasks. Knowledge patterns, or efficient itemsets, have been conceived and used to find strong associations in association and utility mining. Association Rule Mining (ARM) was introduced by Agrawal, Imieliński, and Swami. ARM and frequent pattern mining give most importance to finding itemsets whose items are strongly related. Data mining aims to find uncertainties and hidden information in massive databases, and frequent pattern mining is one of the data mining fields that plays a vital role in extracting useful information. Apriori and FP-growth are the two essential frequent pattern mining techniques; they have become a necessary baseline in various frequent pattern mining studies and applications. In addition, there are numerous other approaches, such as sequential frequent patterns with no specific threshold, frequent patterns over data streams, and weighted frequent patterns.

The explosive growth of data motivates the search for meaningful knowledge hidden within immense databases, and this is the main reason for applying data mining techniques. Pattern mining is one of the mining techniques for finding useful patterns in large databases, and frequent pattern mining is currently one of the most actively studied fields in data mining. Recent databases require more immediate processing because an immense volume of data is updated in real time. However, existing methods are not entirely appropriate for such a data environment, since they require a number of database scans. In addition, frequent pattern mining over such data generates a vast number of frequent patterns, which incurs significant overhead.
Weight conditions are extremely helpful in reflecting the importance of each object in the real world, so it is also necessary to apply them in mining techniques to obtain more sensible and significant patterns. High utility pattern techniques evolved to overcome the drawbacks of frequent mining. In recent research, HUP mining has taken on a vital role in data mining and knowledge discovery. HUP mining considers both the frequency and the profit values of the corresponding itemsets in the transaction table. Real-time applications usually handle dynamic rather than static databases, and incremental mining datasets change from day to day. For example, when a customer makes a new transaction, it is automatically added to the existing original database. The recent transactions may produce unreasonable patterns. The author of [9] proposed the Fast Update Pattern (FUP) incremental mining algorithm to address problems such as discovering association rules in streaming data sets. FUP involves two steps: it first estimates the number of new itemsets from the real datasets, and then evaluates them against the existing frequent mining rules from the transactional database. Various methods are compared based on the results. A special feature of FUP is that it minimizes the number of rules that must be re-searched in the transactional database, and it also saves runtime and computational time in incremental mining.

Motivated by the above, this research proposes a novel method for mining high utility patterns over transactional datasets. The technique considers both internal and external utility and introduces a new measure to reduce the database size while patterns are being found.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the existing work, and Section 4 the proposed work. Section 5 presents the experimental results and analysis. Conclusions are given in Section 6.

2. Related Work

Finding frequent patterns is one of the fascinating and basic problems in data mining [11]. Mining has been widely applied in areas such as retail business and trading. The basic mining process finds the products that customers purchase frequently. Frequent mining handles both static and dynamic databases. In [12], a Two-Phase algorithm was proposed to dynamically search for efficient candidates and retrieve high utility itemsets.
In the first phase, they apply the transaction-weighted utility with the downward closure property to discover candidates in the transaction data. In the second phase, the high utility itemsets are identified on different databases; finally, they evaluated the memory speed using a parallelization technique. CTU-Mine [13] is more efficient than the Two-Phase method only in dense databases when the minimum utility threshold is very low. The Isolated Items Discarding Strategy (IIDS) for finding high utility itemsets was proposed to reduce the number of candidates in each database scan. Applying IIDS, the authors developed efficient HUP mining algorithms known as FUM. Efficient tree structures have also been proposed for incremental HUP mining. However, these algorithms are not applicable to HUP mining.

In [10], an algorithm for high Utility Mining using the Maximal Itemset property (UMMI) was proposed. This algorithm can reduce the itemsets in massive databases. The paper reports that UMMI is faster than TWU-mining, CTU-PRO, and the Two-Phase algorithm. In a real-data experiment, UMMI is faster than Two-Phase.

In incremental mining, the sliding window model was first used for streaming data [11]. In this model, transaction tables are divided into window-based partitions. The window has a predetermined size for handling different sliding window data, and batch processing is used to assign fixed-length sliding windows. The sliding window model considers only the recent data that has arrived in the window; the oldest transactions are removed from the window.

In [12], a novel method based on the sliding window model was proposed. The number of occurrences for each sliding window is maintained in a dedicated list. Unexpectedly large datasets are not easy to handle in the sliding window model, so the window size is also compressed. Traditionally, itemsets were inserted into and deleted from the transaction table one row at a time; later this was enhanced to several rows in a single batch. By applying a streaming technique to the sliding window, the transaction data is kept constantly up to date.

3. Methodology

This section provides details of the sliding window techniques and the mathematical model used for the data mining process. Retail business, web log data, e-business, share market data analysis, and network traffic analysis are some examples of data mining applications [18]. In recent years, a lot of attention has been paid to stream data processing.
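The sliding-window model described above (fixed-size batches, only the most recent batches kept, the oldest removed) can be illustrated with a minimal sketch. The class name `SlidingWindow`, the window size of two batches, and the toy transactions are assumptions made for illustration; they are not taken from the paper.

```python
from collections import deque


class SlidingWindow:
    """Keeps only the most recent `size` batches of transactions (hypothetical helper)."""

    def __init__(self, size):
        if size < 1:
            raise ValueError("window size must be at least 1")
        self.batches = deque(maxlen=size)

    def push(self, batch):
        """Add a new batch; return the batch that slid out of the window, if any."""
        full = len(self.batches) == self.batches.maxlen
        evicted = self.batches[0] if full else None
        self.batches.append(list(batch))  # a full deque drops its oldest entry
        return evicted

    def transactions(self):
        """All transactions currently inside the window, oldest first."""
        return [t for batch in self.batches for t in batch]


# Toy usage: a window of two batches; the third batch pushes the first one out.
window = SlidingWindow(size=2)
window.push([{"a", "b"}, {"c"}])      # batch B1
window.push([{"a", "d"}])             # batch B2
slid_out = window.push([{"b", "e"}])  # batch B3 arrives, B1 leaves
```

Returning the evicted batch from `push` is a design choice of this sketch: an incremental miner would need that batch to subtract its contribution from any maintained counts.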
Detecting frequent data items is a crucial task in data stream analysis. Frequency is a basic characteristic in many data mining tasks, such as association rule mining and iceberg queries. In most cases, when a new transaction enters the transactional database, the oldest transaction is automatically dropped from the sliding window [19]. Although the window model has been studied extensively, two important problems have not received ample attention. Firstly, while the data in a window are differentiated from those shifted out of it, all the transactions within the window are treated equally. For a class of time-sensitive data mining applications, such as stock data analysis, transportation traffic analysis, and sensor network data analysis, the importance of the data embedded in a transaction gradually decreases with time. Therefore, when mining frequent patterns from such data, it is not going to [...]

The internal utility, or local transaction utility value, $l(i_p, T_q)$ represents the quantity of item $i_p$ in transaction $T_q$ [8]. For example, in Table 1, $l(d, T_6) = 4$.

The occurrence of a transaction $T_q$, denoted $OC(T_q)$, is the total number of items occurring in $T_q$ [8]. For example, $OC(T_1) = a + b + c + d + e = 2$.

The minimum utility threshold is the ratio between the occurrence sum and the total number of transactions of window $W_k$ [9]. Assuming the minimum threshold $\delta$ is 30 percent, the minimum utility value in this window can be defined as

$minutil_{W_k} = \delta \times SOC(T_q) / n$   (1)

where the occurrence sum $SOC(T_q)$ is taken over window $W_k$ and $n$ is its number of transactions. In this example, $minutil_{W_k}$ is obtained from the values in Table 3.

The external utility $p(i_p)$ is the unit profit value of item $i_p$ [8]. For example, in Fig. 1(b), $p(d) = 8$.

The transaction utility of transaction $T_q$, denoted $tu(T_q)$, is the total profit of that transaction [9], defined by

$tu(T_q) = \sum_{i_p \in T_q} u(i_p, T_q)$   (2)

For example, $tu(T_6) = u(a, T_6) + u(b, T_6) + u(d, T_6) = 75$ in Table 1.

4. Proposed Work

This work develops an algorithm for incremental and interactive HUP mining over stream data using an HUP tree structure. First, it helps to reduce the size of the database. Next, it proposes an algorithm for finding the high utility itemsets. Finally, it finds the high utility patterns with respect to the total number of itemsets using the enhanced HUP tree.
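The utility definitions of Section 3 can be made concrete with a small sketch. The data below is a hypothetical toy database, not the paper's Table 1 or Table 2 (which are not reproduced here), and the function names are illustrative. The sketch takes the utility of an item as quantity times unit profit, following the paper's worked example $u(c, T_2) = 4 \times 8 = 32$.

```python
# Hypothetical toy data; NOT the paper's Table 1 / Table 2.
PROFIT = {"a": 2, "b": 5, "c": 8, "d": 8, "e": 1}  # external utility p(i)
DB = {  # transaction id -> {item: quantity}, i.e. the internal utility l(i, T)
    "T1": {"a": 1, "c": 2},
    "T2": {"c": 4, "e": 3},
    "T3": {"a": 2, "b": 1, "d": 4},
}


def item_utility(item, tid):
    """u(i, T) = l(i, T) * p(i); 0 if the item is not in the transaction."""
    return DB[tid].get(item, 0) * PROFIT[item]


def transaction_utility(tid):
    """tu(T): sum of the utilities of all items in the transaction, as in Eq. (2)."""
    return sum(item_utility(item, tid) for item in DB[tid])


def occurrence(tid):
    """OC(T): number of distinct items occurring in the transaction."""
    return len(DB[tid])


def min_utility(delta):
    """minutil = delta * SOC / n, as in Eq. (1), with SOC the sum of OC(T)."""
    soc = sum(occurrence(tid) for tid in DB)
    return delta * soc / len(DB)


print(item_utility("c", "T2"))    # 4 * 8
print(transaction_utility("T3"))  # 2*2 + 1*5 + 4*8
```

With a threshold of 30 percent, `min_utility(0.3)` evaluates $0.3 \times 7 / 3$ on this toy database, since the three transactions contain 2, 2, and 3 distinct items.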
Utility data arises as a stream in many application areas, such as sensor networks, web logs, and telecommunication data, and may contain an unbounded number of transactions. A batch of transactions contains a non-empty set of transactions. The HUP technique uses an HUS-tree to capture stream data. The tree arranges the items in lexicographic order. A header table is maintained to keep the item order in this tree structure. Every entry in the header table explicitly maintains the item-id and the TWU (Transaction Weighted Utility) value of an item, while each node in the tree maintains item-id information to maintain the flow of data efficiently. To facilitate tree traversals, adjacent links are also maintained in the tree structure.

The mining method of the HUP technique follows a pattern-growth approach. It first creates a prefix tree for the bottom-most item: all branches prefixing that item are taken, together with the TWU of that item. For mining, all the TWU values of a node within the prefix tree are added to indicate its total TWU value.

Consider the benchmark dataset shown in Table 1. It consists of 8 sample transactions and five items, denoted A to E. Assume that the user-defined profit values of the items are given in the utility table shown in Table 2. Table 3 considers the number of occurrences in two ways: the number of occurrences from the itemset perspective, and the total number of occurrences transaction-wise. The total number of occurrences is then used as the threshold for the transactional database.

Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items and $D = \{T_1, T_2, \ldots, T_n\}$ a transaction database, where each transaction $T_i \in D$ is a subset of $I$. The occurrence sum over all transactions, denoted $SOC(T_q)$, is the sum of the occurrences $OC(T_q)$ of all transactions $T_q$.
$SOC(T_q) = \sum_{q=1}^{n} OC(T_q)$   (3)

The utility $u(i_p, T_q)$ is the quantitative measure of utility for item $i_p$ in transaction $T_q$, defined by

$u(i_p, T_q) = l(i_p, T_q) \times p(i_p) / n$   (4)

For example, $u(c, T_2) = 4 \times 8 = 32$ in Table 1. The transaction-weighted utilization (TWU) of an itemset $X$, denoted $twu(X)$, is the sum of the transaction utilities of all transactions containing $X$:

$twu(X) = \sum_{X \subseteq T_q \in D} tu(T_q)$   (5)
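As a rough illustration of Eq. (5), the sketch below computes the TWU of an itemset and uses it to discard unpromising single items, in the spirit of the downward-closure pruning mentioned for the Two-Phase algorithm in Section 2. The toy database, the threshold of 40, and all function names are assumptions made for illustration; item utility is taken as quantity times unit profit, as in the worked example $u(c, T_2) = 32$.

```python
# Hypothetical toy data; NOT the paper's tables.
PROFIT = {"a": 2, "b": 5, "c": 8, "d": 8, "e": 1}
DB = [  # each transaction maps item -> purchased quantity
    {"a": 1, "c": 2},
    {"c": 4, "e": 3},
    {"a": 2, "b": 1, "d": 4},
]


def tu(transaction, profit):
    """Transaction utility: total profit of one transaction."""
    return sum(qty * profit[item] for item, qty in transaction.items())


def itemset_utility(itemset, db, profit):
    """Exact utility of `itemset`, summed over the transactions containing it."""
    items = set(itemset)
    return sum(
        sum(t[i] * profit[i] for i in items)
        for t in db
        if items.issubset(t)
    )


def twu(itemset, db, profit):
    """TWU: sum of tu(T) over all transactions T that contain the itemset."""
    items = set(itemset)
    return sum(tu(t, profit) for t in db if items.issubset(t))


def promising_items(db, profit, min_util):
    """Single items whose TWU reaches min_util; the rest can be pruned because
    TWU is an upper bound on the utility of every superset of the item."""
    all_items = sorted({i for t in db for i in t})
    return [i for i in all_items if twu([i], db, profit) >= min_util]


candidates = promising_items(DB, PROFIT, min_util=40)
```

Because `twu` never underestimates `itemset_utility`, an item dropped by `promising_items` cannot appear in any high utility itemset at that threshold, which is why the pruning is safe.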

A pattern $X$ is a high utility pattern in window $W_k$ if $u_{W_k}(X) \geq minutil_{W_k}$. Finding high utility patterns in window $W_k$ means finding all patterns $X$ that satisfy this criterion. The transaction-weighted utilization of an itemset $X$ in a batch $B_j$, $twu_{B_j}(X)$, is defined by

$twu_{B_j}(X) = \sum_{X \subseteq T_q \in B_j} tu(T_q)$   (6)

For example, $twu_{B_4}(d) = 102$ in Table 3. $X$ is a high transaction-weighted utilization itemset in $W_k$ if $twu_{W_k}(X) \geq minutil_{W_k}$.

Figure 1: Enhanced Tree structure

5. Experimental results and analysis

To evaluate the performance of the proposed algorithm, this work ran various analyses on the synthetic dataset T10I4D100K. This dataset does not include profit values for the itemsets in the transaction table, so random numerical values are used as the profit values. Since the existing HUP algorithm outperforms the other sliding-window-based HUP mining algorithms over data streams, this work compares against it on candidate itemsets and runtime; the differences increase when scanning is done over the original database.
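The random profit assignment mentioned above could look like the following sketch. It assumes each transaction is stored as one line of whitespace-separated item ids; the profit range of 1 to 10, the fixed seed, and the function names are illustrative assumptions, since the paper does not state how the random values were drawn.

```python
import random


def parse_transactions(lines):
    """Each non-empty line is one transaction of whitespace-separated item ids."""
    return [line.split() for line in lines if line.strip()]


def assign_random_profits(transactions, low=1, high=10, seed=42):
    """Give every distinct item a random integer profit in [low, high].

    A fixed seed keeps the generated utility table reproducible across runs.
    """
    rng = random.Random(seed)
    items = sorted({item for t in transactions for item in t})
    return {item: rng.randint(low, high) for item in items}


# Toy usage on a few made-up lines (not actual dataset content).
sample = ["25 52 164", "52 240", "", "25 240 368"]
transactions = parse_transactions(sample)
profits = assign_random_profits(transactions)
```

Sorting the items before drawing the values makes the table independent of set iteration order, so two runs with the same seed produce the same profits.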

Figure 2: Efficient Dataset

Figure 3: Efficient Datasets

Figure 2 shows the total number of transactions on the x-axis and the number of efficient transactions on the y-axis. To obtain 50k efficient transactions, the whole original database (i.e., 50k transactions) must be scanned, whereas with the enhanced HUP method there are only 22k efficient transactions. The reason for this lower time complexity is the reduction in the total number of itemsets in the original database. Efficiency is measured based on the threshold value. Figure 3 shows the runtime against the number of transactions for the original database and for the enhanced algorithm. For 50K transactions, the run time on the original database is 12 seconds, whereas the enhanced HUP method takes 5 seconds. The x-axis is the total number of transactions, in thousands, and the y-axis is the runtime in seconds.

6. Conclusion

In the past, the HUP algorithm was designed to discover high utility itemsets effectively. An additional database scan was performed to find the real utility values of the remaining candidates and to identify the high utility itemsets. In this paper, an efficient incremental algorithm is proposed to update and discover the high utility itemsets for new transactions. Experimental results show that the proposed algorithm executes faster than the HUP algorithm in the intermittent data environment. The algorithm reduces the complexity to at most half that of the existing algorithms and therefore saves memory.
