Vol.15 No.6 J. Comput. Sci. & Technol. Nov.

A Fast Algorithm for Mining Association Rules

HUANG Liusheng, CHEN Huaping, WANG Xun and CHEN Guoliang

National High Performance Computing Center, Department of Computer Science, University of Science & Technology of China, Hefei, P.R. China
{lshuang,hpchen,xwang,glchen}@ustc.edu.cn

Received March 7, 2000; revised May 22.

This work was supported in part by the National `863' High-Tech Programme of China (No ZD06-2).

Abstract  In this paper, the problem of discovering association rules between items in a large database of sales transactions is discussed, and a novel algorithm, BitMatrix, is proposed. The proposed algorithm is fundamentally different from the known algorithms Apriori and AprioriTid. Empirical evaluation shows that the algorithm outperforms the known ones for large databases. Scale-up experiments show that the algorithm scales linearly with the number of transactions.

Keywords  database, data mining, large itemset, association rule, minimum support, minimum confidence

1 Introduction

Data mining is motivated by the decision support problem faced by most large retail organizations. Progress in bar-code technology has made it possible for retail organizations to collect and store massive sales data, referred to as the basket data. A record in such data typically consists of the transaction time and the items bought in the transaction [1,2].

The problem of mining association rules over basket data was introduced in [2]. An example of such a rule might be that 80% of customers who purchase tires and auto accessories also get automobile services done. Finding all such rules is valuable for cross-marketing and attached mailing applications. Other applications include catalog design, add-on sales, store layout, and customer segmentation based on buying patterns. The databases involved in these applications are very large. It is imperative, therefore, to develop fast algorithms for this task.

The following is a formal statement of the problem [2].

Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of literals, called items. Let $D$ be a set of transactions, where each transaction $T$ is a set of items such that $T \subseteq I$. Associated with each transaction is a unique identifier, called its TID. We say that a transaction $T$ contains $X$, a set of some items in $I$, if $X \subseteq T$. An association rule is an implication of the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$. The rule $X \Rightarrow Y$ holds in the transaction set $D$ with confidence $c$ if $c\%$ of transactions in $D$ that contain $X$ also contain $Y$. The rule $X \Rightarrow Y$ has support $s$ in the transaction set $D$ if $s\%$ of transactions in $D$ contain $X \cup Y$.

Given a set of transactions $D$, the problem of mining association rules is to generate all association rules that have support and confidence greater than or equal to the user-specified minimum support (called minsup) and minimum confidence (called minconf) respectively. This problem can be decomposed into two subproblems [3,4].

1) Find all sets of items (itemsets) that have transaction supports above minimum support. The support for an itemset is defined as the fraction of total transactions that contain
this itemset. Itemsets that have minimum support (minsup) are called large itemsets, and all the others small itemsets.

2) Use the large itemsets to generate the desired rules. Here is a straightforward algorithm. For every large itemset $l$, find all non-empty subsets of $l$. For every such subset $a$, output a rule of the form $a \Rightarrow (l - a)$ if $\mathrm{support}(l)/\mathrm{support}(a) \ge \mathit{minconf}$. We need to consider all subsets of $l$ to generate rules with multiple consequents.

It is easy to see that the first subproblem is the key to discovering all association rules. In this paper, therefore, we do not discuss the second subproblem further; readers may refer to [4] for a rule generation algorithm.

Agrawal and Srikant presented two well-known algorithms, Apriori and AprioriTid, for finding all large itemsets [3,4]. They also showed that these two algorithms are much better than earlier algorithms such as AIS [2] and SETM [5]. In this paper, we present a new algorithm, BitMatrix, that is fundamentally different from previous algorithms. We also show experimentally that the proposed algorithm outperforms Apriori and AprioriTid on every dataset tested and that it scales up well, which makes mining association rules over very large databases faster and more practical.

The rest of this paper is organized as follows. In Section 2, we give the new algorithm, BitMatrix, for finding all large itemsets. In Section 3, we compare the performance of BitMatrix with that of Apriori and AprioriTid, and we demonstrate the scale-up properties of our algorithm. Finally, Section 4 concludes and points out some related open problems.

2 Discovering Large Itemsets

Most of the existing algorithms for discovering all large itemsets make multiple passes over the data. In the first pass, they count the support of individual items and determine which of them are large.
In each subsequent pass, candidate itemsets, also called potentially large itemsets, are generated from previously generated large itemsets or from the database. The support counts for these candidate itemsets are then found during the pass over the data. At the end of the pass, the large itemsets are determined by their counts. This process continues until no new large itemsets are found [2,4,5].

In both Apriori and AprioriTid, all the candidate itemsets of the same length must be stored in memory, which wastes space. To generate the large itemsets, Apriori makes as many passes over the database as the length of the longest large itemset. That is, each time new candidate itemsets are generated, the database is scanned and the support of each candidate is counted, which is costly for a large database. Although AprioriTid does not use the database for counting support after the first pass, its performance is worse than that of Apriori in almost all cases, because each entry used for counting support may be larger than the corresponding transaction in the database in the initial pass.

Our BitMatrix algorithm is fundamentally different from Apriori and AprioriTid: we need not store all the candidate itemsets in memory, and we pass over the database only once. We start with the previously generated large itemsets, and as soon as a candidate itemset is generated, the information stored in the bitmatrix is used to determine whether it is a large itemset. After all candidates have been generated, the large itemsets of that pass are known and the pass is over. The new algorithm therefore has the property that the database is not used at all after the bitmatrix has been initialized. Instead, the encoding of the database is used to judge whether a candidate is a large itemset. In the later passes, the size of this encoding can become much smaller than the database, which saves much reading effort.
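For comparison, the scan-based counting that the multi-pass algorithms rely on can be sketched as follows. This is a deliberately naive illustration, not the hash-tree counting of Apriori; the function names and the four-transaction database are assumptions introduced here.

```python
def count_pass(transactions, candidates):
    """One scan-based pass: count how many transactions contain each candidate."""
    counts = {c: 0 for c in candidates}   # all candidates of this pass sit in memory
    for t in transactions:                # every pass re-reads every transaction
        items = set(t)
        for c in candidates:
            if items.issuperset(c):
                counts[c] += 1
    return counts


def large_itemsets(counts, minsup_count):
    """Keep the candidates whose support count reaches the threshold."""
    return {c for c, n in counts.items() if n >= minsup_count}


# Hypothetical database, for illustration only.
db = [[1, 2, 4], [1, 2, 3], [1, 2, 3, 5], [2, 3, 5]]
c2 = [frozenset(pair) for pair in [(1, 2), (1, 5), (3, 5)]]
counts = count_pass(db, c2)
l2 = large_itemsets(counts, minsup_count=2)
```

Each call to `count_pass` touches the whole database again, which is the cost that a one-time encoding of the database is meant to avoid.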

2.1 Algorithm BitMatrix

In the Apriori and AprioriTid algorithms, it is assumed that the items in each transaction are kept sorted in lexicographic order [4]. This is not needed in BitMatrix. With careful programming, we can keep the items within each large itemset, and the large itemsets of the same size, sorted in lexicographic order even if the items in the transactions are not sorted. We call the number of items in an itemset its size, and call an itemset of size $k$ a $k$-itemset. The set of all large $k$-itemsets is denoted by $L_k$. Each $k$-itemset $c$ in $L_k$ consists of items $c[1], c[2], \ldots, c[k]$, where $c[1] < c[2] < \cdots < c[k]$. Associated with each itemset are two fields: a count field that stores the support for this itemset, and an index field (henceforth referred to as the support index) that indicates the transactions that contain the itemset.

The BitMatrix algorithm is described as:

(1) Initialize the bitmatrix;
(2) $L_1$ = {large 1-itemsets};
(3) for ($k = 2$; $L_{k-1} \ne \emptyset$; $k$++) do
(4)     $L_k$ = GenLargeItemsets($L_{k-1}$);
(5) Answer = $\bigcup_k L_k$.

In Step (1) of this algorithm, we initialize the bitmatrix as follows. First we build a matrix with one row per item and one column per transaction. Note that the matrix is a bit-matrix: each entry occupies a single bit of memory. Then we go through the database. If the $j$-th transaction contains items $i_1, i_2, \ldots, i_k$, the bits $a_{i_1 j}, a_{i_2 j}, \ldots, a_{i_k j}$ ($a_{ij}$ denotes the bit in the $i$-th row and $j$-th column) are set to 1 and the other bits in the $j$-th column are set to 0. As an example, Fig.1 shows a database and the corresponding bitmatrix after initialization.

Fig.1. An example: a database (TID, Items) and its bitmatrix (rows: items; columns: transactions).

In Step (2), we simply count the number of 1s in each row to get the support count of every item, which determines the large 1-itemsets.
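Steps (1) and (2) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: each row of the bitmatrix is held in a Python integer, the leftmost bit corresponds to the first transaction, and the function names and the four-transaction database are hypothetical.

```python
def build_bitmatrix(transactions, items):
    """Step (1): one row per item, one bit per transaction, built in a single scan."""
    n = len(transactions)
    rows = {i: 0 for i in items}
    for j, t in enumerate(transactions):
        for i in t:
            rows[i] |= 1 << (n - 1 - j)   # set the bit of row i, column j
    return rows


def large_1_itemsets(rows, minsup_count):
    """Step (2): the support count of an item is the number of 1s in its row."""
    return {(i,): row for i, row in sorted(rows.items())
            if bin(row).count("1") >= minsup_count}


# Hypothetical database, for illustration only.
db = [[1, 2, 4], [1, 2, 3], [1, 2, 3, 5], [2, 3, 5]]
rows = build_bitmatrix(db, items=range(1, 6))
l1 = large_1_itemsets(rows, minsup_count=2)
print({i: format(r, "04b") for i, r in rows.items()})
```

Storing each row as one integer keeps the later intersection of two support indexes down to a single `&` operation.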
In Step (4), the previously generated large $(k-1)$-itemsets are used to generate the large $k$-itemsets. This step repeats until no new large itemsets are generated. It uses the GenLargeItemsets function, which takes $L_{k-1}$ as its argument and returns $L_k$. The function works as follows.

(1) for ($\forall p, q \in L_{k-1}$) do
(2)   if $(p[1] = q[1]) \wedge \cdots \wedge (p[k-2] = q[k-2]) \wedge (p[k-1] < q[k-1])$ then {
(3)     $c = p \cup q$; // c consists of p[1], p[2], ..., p[k-2], p[k-1], q[k-1]
(4)     for all $(k-1)$-subsets $s$ of $c$ do
(5)       if ($s \notin L_{k-1}$) then {delete $c$; $c = \emptyset$; break;}
(6)     if ($c \ne \emptyset$) then {
(7)       c.index = p.index & q.index; // support index
(8)       compute c.count from c.index; // support count
(9)       if (c.count $\ge$ minsup) then $L_k = L_k \cup \{c\}$;
(10)    } //endif
(11)  } //endif

Steps (1) to (5) generate $C_k$, the set of candidate $k$-itemsets (potentially large itemsets; see also [4]). In Step (2), the condition $p[k-1] < q[k-1]$ ensures that no duplicates are generated. However, this algorithm differs from Apriori in that it need not store all the candidates in memory. As soon as a candidate itemset is generated, Steps (7) to (9) determine whether it is a large one.

To decide whether a candidate itemset is large, we associate each large itemset with a support index, a bit vector in which the $j$-th bit indicates whether the $j$-th transaction in the database contains the itemset. The support index of a 1-itemset is the corresponding row of the bitmatrix. For example, in Fig.1, the support index of item 3 (i.e., the 1-itemset {3}) is the 3rd row of the bitmatrix, or 0111. It indicates that the 2nd, 3rd and 4th transactions contain this itemset and the first transaction does not. Since $c$ is the union of $p$ and $q$, we obtain $c$'s support index in Step (7) by applying the bitwise AND operator ("&") to the support indexes of $p$ and $q$.

2.2 The Correctness

We need to show that the GenLargeItemsets function generates exactly the large $k$-itemsets. Agrawal has proved $C_k \supseteq L_k$ [4], and $C_k$ is the set of itemsets $c$ that survive Step (5). Therefore, we only need to demonstrate that no large itemsets are omitted in Steps (7) to (9). In fact, we only need to prove that the equation in Step (7) is correct. Note that the $j$-th bit in the support index associated with an itemset is 1 if and only if the $j$-th transaction contains the itemset. If both the $j$-th bit of $p$'s index and that of $q$'s index are 1, then the $j$-th transaction contains both $p$ and $q$. Therefore, it also contains $p \cup q$ (i.e., $c$). By similar reasoning, if either the $j$-th bit of $p$'s index or that of $q$'s index is 0, the transaction does not contain $p$ or does not contain $q$. So it does not contain $c$, and the $j$-th bit of $c$'s index is 0. Thus, we can conclude that the equation in Step (7) is correct.
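The join, prune and AND steps of GenLargeItemsets can be sketched as below. This is a hedged illustration rather than the authors' code: itemsets are sorted tuples, support indexes are Python integers, and the starting indexes in `l1` are assumed values for a four-transaction database.

```python
from itertools import combinations


def gen_large_itemsets(prev, minsup_count):
    """prev maps each large (k-1)-itemset (sorted tuple) to its support index (int).
    Returns the same kind of mapping for the large k-itemsets."""
    large = {}
    keys = sorted(prev)
    for a, p in enumerate(keys):
        for q in keys[a + 1:]:
            if p[:-1] != q[:-1]:
                break                      # sorted keys: no later q shares p's prefix
            c = p + (q[-1],)               # Step (3): c = p ∪ q, still sorted
            # Steps (4)-(5): every (k-1)-subset of c must itself be large
            if any(s not in prev for s in combinations(c, len(c) - 1)):
                continue
            index = prev[p] & prev[q]      # Step (7): bitwise AND of the two indexes
            if bin(index).count("1") >= minsup_count:   # Steps (8)-(9)
                large[c] = index
    return large


# Assumed support indexes of the large 1-itemsets (leftmost bit = first transaction).
l1 = {(1,): 0b1110, (2,): 0b1111, (3,): 0b0111, (5,): 0b0011}
l2 = gen_large_itemsets(l1, minsup_count=2)
l3 = gen_large_itemsets(l2, minsup_count=2)
l4 = gen_large_itemsets(l3, minsup_count=2)
```

No candidate list is kept: each candidate is either added to the result with its index or dropped immediately, and the database itself is never consulted.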

2.3 An Example

We again use the database in Fig.1 and assume that minsup is 2 transactions. From the bitmatrix in Fig.1, we can see that {1}, {2}, {3}, {5} are the large 1-itemsets. Using the GenLargeItemsets function, we get 6 candidate 2-itemsets, {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}, with the associated support indexes 1110, 0110, 0010, 0111, 0011, 0011. All except {1 5} are large itemsets. Using GenLargeItemsets again, we get 2 candidate 3-itemsets, {1 2 3} and {2 3 5}, with the associated support indexes 0110 and 0011. They are both large 3-itemsets because the number of 1s (i.e., the support count) in each of their support indexes is not less than minsup. There are no more large itemsets, because $L_4$ turns out to be empty when we generate it from $L_3$.

2.4 Buffer Management

The bitmatrix and the support indexes of the large itemsets both take up much memory. However, we need not keep all of them in memory. When $L_k$ is generated, only the indexes of the large $(k-1)$-itemsets are needed. We can release the space used for the indexes of large itemsets shorter than $k$ after we have generated all the large $k$-itemsets. Furthermore, when we use $p$ and $q$ to generate $c$ in the GenLargeItemsets function, the indexes of the large $(k-1)$-itemsets that precede $p$ are no longer used, and we can release them too.

3 Performance

To evaluate the relative performance of the algorithms for discovering large itemsets, we carried out several experiments on a GODEYE workstation with a 133 MHz CPU and 64 MB of main memory, running AIX 4.1.

3.1 Generation of Synthetic Data

To evaluate the performance of the algorithms over a large range of data characteristics, we generated synthetic transaction data using the method proposed in [4]. These transactions simulate the transactions in a retailing environment. Our synthetic data generation program takes the parameters shown in Table 1.

We generated the datasets by setting $N = 1000$, $|L| = 2000$ and $|D| = 100{,}000$. We chose 3 values for $|T|$: 5, 10, and 20, and 3 values for $|I|$: 2, 4, and 6. Table 2 summarizes the dataset parameter settings. For the same $|T|$ and $|D|$ values, the sizes of the datasets in megabytes were roughly equal for different values of $|I|$.

Table 1. Parameters
$|D|$   Number of transactions
$|T|$   Average size of transactions
$|I|$   Average size of potentially large itemsets
$|L|$   Number of potentially large itemsets
$N$     Number of items

Table 2. Parameter Settings
Database      $|T|$   $|I|$   $|D|$   Size (MB)
T5I2D100k     5       2       100k    3.2
T10I2D100k    10      2       100k    5.2
T10I4D100k    10      4       100k    5.2
T20I2D100k    20      2       100k    9.2
T20I4D100k    20      4       100k    9.2
T20I6D100k    20      6       100k

3.2 Relative Performance

Fig.2 shows the execution times on the six synthetic datasets given in Table 2 for decreasing values of minimum support. As the minimum support decreases, the execution times of all the algorithms increase because of the increase in the total number of candidates and large itemsets. The figure shows that BitMatrix outperforms Apriori and AprioriTid for all problem sizes. Table 3 gives the execution times of the algorithms for a minimum support of 0.75%.

Fig.2. Execution times for synthetic data.

Table 3. Execution Times for minsup = 0.75% (s)
Database      AprioriTid   Apriori   BitMatrix
T5I2D100k
T10I2D100k
T10I4D100k
T20I2D100k
T20I4D100k
T20I6D100k

3.3 Scale-up Experiment

Fig.3 shows how the BitMatrix algorithm scales up as the number of transactions increases from 10,000 to 100,000. The combinations of average transaction size and average itemset size are T10I4 and T20I6, and all other parameters are the same as those in Table 2. The minimum support level is set to 0.75%. In this figure, the execution time is normalized with respect to the time for the 10,000-transaction datasets. As shown, the execution time scales almost linearly.

Fig.3. Scale-up property of BitMatrix.

4 Conclusions and Future Work

In this paper, a new algorithm, BitMatrix, is presented for discovering all significant association rules between items in a large database of transactions. The algorithm is compared with the previously known Apriori and AprioriTid algorithms. BitMatrix has the useful feature that it need not scan the original database in every pass. Furthermore, it need not keep a large set of candidate itemsets in memory. Instead, as soon as a candidate is generated, it is checked to see whether it is large, and it is discarded if not. Our algorithm therefore saves much time and space compared with Apriori and AprioriTid over very large databases. This paper has also presented experimental results showing that the proposed algorithm outperforms Apriori and AprioriTid on all the datasets tested. The scale-up experiment shows that BitMatrix scales almost linearly with the number of transactions.

In the future, we plan to extend this work along the following directions:

• A more compact data structure may be found to take the place of the bitmatrix. This structure may reduce the memory requirement and improve the performance.
• The problem of very large $C_2$ will be considered. A specific algorithm has to be worked out for generating $C_2$.
Our idea is that we need not find all the large 2-itemsets, only most of them. The approach will be worthwhile if it improves the performance greatly.
• Multiple taxonomies (is-a hierarchies) over items are often available. An example of such a hierarchy is that a dishwasher is a kitchen utensil as well as an electric appliance. It will be valuable to find association rules that make use of such hierarchies.
• The quantities of the items bought in a transaction have not been considered, but they are useful for some applications. Finding such rules needs further research.

References

[1] Agrawal R, Srikant R. Mining sequential patterns. IBM Research Report.
[2] Agrawal R, Imielinski T, Swami A. Mining association rules between sets in large databases. In Proc. the ACM SIGMOD Conf. Management of Data, May 1993.
[3] Agrawal R, Srikant R. Fast algorithm for mining association rules. IBM Research Report.
[4] Agrawal R, Mannila H, Toivonen H et al. Fast Discovery of Association Rules. In Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996.
[5] Houtsma M, Swami A. Set-oriented mining of association rules. IBM Research Report, Oct.
Improving Efficiency of Apriori Algorithm Ch.Bhavani, P.Madhavi Assistant Professors, Department of Computer Science, CVR college of Engineering, Hyderabad, India. Abstract -- Apriori algorithm has been
More informationMining Top-K Strongly Correlated Item Pairs Without Minimum Correlation Threshold
Mining Top-K Strongly Correlated Item Pairs Without Minimum Correlation Threshold Zengyou He, Xiaofei Xu, Shengchun Deng Department of Computer Science and Engineering, Harbin Institute of Technology,
More informationCHAPTER 3 ASSOCIATION RULE MINING WITH LEVELWISE AUTOMATIC SUPPORT THRESHOLDS
23 CHAPTER 3 ASSOCIATION RULE MINING WITH LEVELWISE AUTOMATIC SUPPORT THRESHOLDS This chapter introduces the concepts of association rule mining. It also proposes two algorithms based on, to calculate
More informationSQL Model in Language Encapsulation and Compression Technique for Association Rules Mining
SQL Model in Language Encapsulation and Compression Technique for Association Rules Mining 1 Somboon Anekritmongkol, 2 Kulthon Kasemsan 1, Faculty of information Technology, Rangsit University, Pathumtani
More informationAssociation Rule Mining Methods as Means of Forming the System of Life Quality Indicators
Association Rule Mining Methods as Means of Forming the System of Life Quality Indicators Luidmila P. Bilgaeva 1, Erzhena Ts. Sadykova 2, Gregory V. Badmaev 1 1 East Siberia State University of Technology
More informationData Access Paths for Frequent Itemsets Discovery
Data Access Paths for Frequent Itemsets Discovery Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science {marekw, mzakrz}@cs.put.poznan.pl Abstract. A number
More informationMining Association Rules with Composite Items *
Mining Association Rules with Composite Items *. Xiiifeng Ye, Department of Computer Science, University of Auckland, New Zealand. John A. Keane, Department of Computation, UMIST, Manchester, UK. Abstract
More informationInternational Journal of Scientific Research and Reviews
Research article Available online www.ijsrr.org ISSN: 2279 0543 International Journal of Scientific Research and Reviews A Survey of Sequential Rule Mining Algorithms Sachdev Neetu and Tapaswi Namrata
More informationModel for Load Balancing on Processors in Parallel Mining of Frequent Itemsets
American Journal of Applied Sciences 2 (5): 926-931, 2005 ISSN 1546-9239 Science Publications, 2005 Model for Load Balancing on Processors in Parallel Mining of Frequent Itemsets 1 Ravindra Patel, 2 S.S.
More informationData Structure for Association Rule Mining: T-Trees and P-Trees
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 16, NO. 6, JUNE 2004 1 Data Structure for Association Rule Mining: T-Trees and P-Trees Frans Coenen, Paul Leng, and Shakil Ahmed Abstract Two new
More informationAssociation Rules: Past, Present & Future. Ramakrishnan Srikant.
Association Rules: Past, Present & Future Ramakrishnan Srikant www.almaden.ibm.com/cs/people/srikant/ R. Srikant Talk Outline Association Rules { Motivation & Denition { Most Popular Computation Approach
More informationConcurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm
Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Marek Wojciechowski, Krzysztof Galecki, Krzysztof Gawronek Poznan University of Technology Institute of Computing Science ul.
More informationAssociation Rules. Berlin Chen References:
Association Rules Berlin Chen 2005 References: 1. Data Mining: Concepts, Models, Methods and Algorithms, Chapter 8 2. Data Mining: Concepts and Techniques, Chapter 6 Association Rules: Basic Concepts A
More informationCompSci 516 Data Intensive Computing Systems
CompSci 516 Data Intensive Computing Systems Lecture 25 Data Mining and Mining Association Rules Instructor: Sudeepa Roy Due CS, Spring 2016 CompSci 516: Data Intensive Computing Systems 1 Announcements
More informationTemporal Weighted Association Rule Mining for Classification
Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider
More informationTo Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set
To Enhance Scalability of Item Transactions by Parallel and Partition using Dynamic Data Set Priyanka Soni, Research Scholar (CSE), MTRI, Bhopal, priyanka.soni379@gmail.com Dhirendra Kumar Jha, MTRI, Bhopal,
More informationCHAPTER 5 WEIGHTED SUPPORT ASSOCIATION RULE MINING USING CLOSED ITEMSET LATTICES IN PARALLEL
68 CHAPTER 5 WEIGHTED SUPPORT ASSOCIATION RULE MINING USING CLOSED ITEMSET LATTICES IN PARALLEL 5.1 INTRODUCTION During recent years, one of the vibrant research topics is Association rule discovery. This
More informationDiscovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials *
Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials * Galina Bogdanova, Tsvetanka Georgieva Abstract: Association rules mining is one kind of data mining techniques
More informationAn Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining
An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining P.Subhashini 1, Dr.G.Gunasekaran 2 Research Scholar, Dept. of Information Technology, St.Peter s University,
More informationChapter 4 Data Mining A Short Introduction
Chapter 4 Data Mining A Short Introduction Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining 3. Clustering 4. Classification Data Mining - 2 2 1. Data Mining Overview
More informationChapter 4: Association analysis:
Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their day-to-day operations, huge amounts of customer purchase data are collected daily
More informationData Mining Query Scheduling for Apriori Common Counting
Data Mining Query Scheduling for Apriori Common Counting Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland {marek,
More informationOptimized Frequent Pattern Mining for Classified Data Sets
Optimized Frequent Pattern Mining for Classified Data Sets A Raghunathan Deputy General Manager-IT, Bharat Heavy Electricals Ltd, Tiruchirappalli, India K Murugesan Assistant Professor of Mathematics,
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 1, Jan-Feb 2015
RESEARCH ARTICLE OPEN ACCESS Reviewing execution performance of Association Rule Algorithm towards Data Mining Dr. Sanjay Kumar 1, Abhishek Shrivastava 2 Associate Professor 1, Research Scholar 2 Jaipur
More informationA Novel Texture Classification Procedure by using Association Rules
ITB J. ICT Vol. 2, No. 2, 2008, 03-4 03 A Novel Texture Classification Procedure by using Association Rules L. Jaba Sheela & V.Shanthi 2 Panimalar Engineering College, Chennai. 2 St.Joseph s Engineering
More informationMining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports
Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports R. Uday Kiran P. Krishna Reddy Center for Data Engineering International Institute of Information Technology-Hyderabad Hyderabad,
More informationPurna Prasad Mutyala et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (5), 2011,
Weighted Association Rule Mining Without Pre-assigned Weights PURNA PRASAD MUTYALA, KUMAR VASANTHA Department of CSE, Avanthi Institute of Engg & Tech, Tamaram, Visakhapatnam, A.P., India. Abstract Association
More informationA Graph-Based Approach for Mining Closed Large Itemsets
A Graph-Based Approach for Mining Closed Large Itemsets Lee-Wen Huang Dept. of Computer Science and Engineering National Sun Yat-Sen University huanglw@gmail.com Ye-In Chang Dept. of Computer Science and
More informationAN ENHANCED SEMI-APRIORI ALGORITHM FOR MINING ASSOCIATION RULES
AN ENHANCED SEMI-APRIORI ALGORITHM FOR MINING ASSOCIATION RULES 1 SALLAM OSMAN FAGEERI 2 ROHIZA AHMAD, 3 BAHARUM B. BAHARUDIN 1, 2, 3 Department of Computer and Information Sciences Universiti Teknologi
More informationA Software Testing Optimization Method Based on Negative Association Analysis Lin Wan 1, Qiuling Fan 1,Qinzhao Wang 2
International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2015) A Software Testing Optimization Method Based on Negative Association Analysis Lin Wan 1, Qiuling Fan
More informationETP-Mine: An Efficient Method for Mining Transitional Patterns
ETP-Mine: An Efficient Method for Mining Transitional Patterns B. Kiran Kumar 1 and A. Bhaskar 2 1 Department of M.C.A., Kakatiya Institute of Technology & Science, A.P. INDIA. kirankumar.bejjanki@gmail.com
More informationUsing Pattern-Join and Purchase-Combination for Mining Web Transaction Patterns in an Electronic Commerce Environment
Using Pattern-Join and Purchase-Combination for Mining Web Transaction Patterns in an Electronic Commerce Environment Ching-Huang Yun and Ming-Syan Chen Department of Electrical Engineering National Taiwan
More informationMining Top-K Association Rules Philippe Fournier-Viger 1, Cheng-Wei Wu 2 and Vincent S. Tseng 2 1 Dept. of Computer Science, University of Moncton, Canada philippe.fv@gmail.com 2 Dept. of Computer Science
More informationSQL Based Frequent Pattern Mining with FP-growth
SQL Based Frequent Pattern Mining with FP-growth Shang Xuequn, Sattler Kai-Uwe, and Geist Ingolf Department of Computer Science University of Magdeburg P.O.BOX 4120, 39106 Magdeburg, Germany {shang, kus,
More informationISSN Vol.03,Issue.09 May-2014, Pages:
www.semargroup.org, www.ijsetr.com ISSN 2319-8885 Vol.03,Issue.09 May-2014, Pages:1786-1790 Performance Comparison of Data Mining Algorithms THIDA AUNG 1, MAY ZIN OO 2 1 Dept of Information Technology,
More informationAn Efficient Algorithm for finding high utility itemsets from online sell
An Efficient Algorithm for finding high utility itemsets from online sell Sarode Nutan S, Kothavle Suhas R 1 Department of Computer Engineering, ICOER, Maharashtra, India 2 Department of Computer Engineering,
More informationAppropriate Item Partition for Improving the Mining Performance
Appropriate Item Partition for Improving the Mining Performance Tzung-Pei Hong 1,2, Jheng-Nan Huang 1, Kawuu W. Lin 3 and Wen-Yang Lin 1 1 Department of Computer Science and Information Engineering National
More informationPC Tree: Prime-Based and Compressed Tree for Maximal Frequent Patterns Mining
Chapter 42 PC Tree: Prime-Based and Compressed Tree for Maximal Frequent Patterns Mining Mohammad Nadimi-Shahraki, Norwati Mustapha, Md Nasir B Sulaiman, and Ali B Mamat Abstract Knowledge discovery or
More informationPerformance Based Study of Association Rule Algorithms On Voter DB
Performance Based Study of Association Rule Algorithms On Voter DB K.Padmavathi 1, R.Aruna Kirithika 2 1 Department of BCA, St.Joseph s College, Thiruvalluvar University, Cuddalore, Tamil Nadu, India,
More informationTransforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm
Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm Expert Systems: Final (Research Paper) Project Daniel Josiah-Akintonde December
More informationDESIGN AND CONSTRUCTION OF A FREQUENT-PATTERN TREE
DESIGN AND CONSTRUCTION OF A FREQUENT-PATTERN TREE 1 P.SIVA 2 D.GEETHA 1 Research Scholar, Sree Saraswathi Thyagaraja College, Pollachi. 2 Head & Assistant Professor, Department of Computer Application,
More informationFast Discovery of Sequential Patterns Using Materialized Data Mining Views
Fast Discovery of Sequential Patterns Using Materialized Data Mining Views Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo
More informationCHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM. Please purchase PDF Split-Merge on to remove this watermark.
119 CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM 120 CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM 5.1. INTRODUCTION Association rule mining, one of the most important and well researched
More informationRoadmap DB Sys. Design & Impl. Association rules - outline. Citations. Association rules - idea. Association rules - idea.
15-721 DB Sys. Design & Impl. Association Rules Christos Faloutsos www.cs.cmu.edu/~christos Roadmap 1) Roots: System R and Ingres... 7) Data Analysis - data mining datacubes and OLAP classifiers association
More informationValue Added Association Rules
Value Added Association Rules T.Y. Lin San Jose State University drlin@sjsu.edu Glossary Association Rule Mining A Association Rule Mining is an exploratory learning task to discover some hidden, dependency
More informationMining Top-K Association Rules. Philippe Fournier-Viger 1 Cheng-Wei Wu 2 Vincent Shin-Mu Tseng 2. University of Moncton, Canada
Mining Top-K Association Rules Philippe Fournier-Viger 1 Cheng-Wei Wu 2 Vincent Shin-Mu Tseng 2 1 University of Moncton, Canada 2 National Cheng Kung University, Taiwan AI 2012 28 May 2012 Introduction
More informationDiscovery of Multi Dimensional Quantitative Closed Association Rules by Attributes Range Method
Discovery of Multi Dimensional Quantitative Closed Association Rules by Attributes Range Method Preetham Kumar, Ananthanarayana V S Abstract In this paper we propose a novel algorithm for discovering multi
More information