An Algorithm for Mining Frequent Itemsets from Library Big Data

Similar documents
PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

Appropriate Item Partition for Improving the Mining Performance

An Improved Frequent Pattern-growth Algorithm Based on Decomposition of the Transaction Database

Association Rule Mining. Introduction. Study core

Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree

WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity

Survey: Efficient tree based structure for mining frequent pattern from transactional databases

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

UDS-FIM: An Efficient Algorithm of Frequent Itemsets Mining over Uncertain Transaction Data Streams

Comparing the Performance of Frequent Itemsets Mining Algorithms

Parallel Implementation of Apriori Algorithm Based on MapReduce

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach


Data Mining for Knowledge Management. Association Rules

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA


Data Mining Part 3. Associations Rules

Mining Frequent Patterns without Candidate Generation

Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data

Efficient Algorithm for Frequent Itemset Generation in Big Data

CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets

An Improved Apriori Algorithm for Association Rules

Improved Frequent Pattern Mining Algorithm with Indexing

Generation of Potential High Utility Itemsets from Transactional Databases

Memory issues in frequent itemset mining

Frequent Pattern Mining with Uncertain Data

Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports

Frequent Itemsets Melange

Parallelizing Frequent Itemset Mining with FP-Trees

Improved Balanced Parallel FP-Growth with MapReduce

FIT: A Fast Algorithm for Discovering Frequent Itemsets in Large Databases

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining


Tutorial on Association Rule Mining

Upper bound tighter Item caps for fast frequent itemsets mining for uncertain data implemented using splay trees

Searching frequent itemsets by clustering data: towards a parallel approach using MapReduce

PTclose: A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets

Maintenance of the Prelarge Trees for Record Deletion

Efficient Tree Based Structure for Mining Frequent Pattern from Transactional Databases

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

Mining Frequent Itemsets for data streams over Weighted Sliding Windows

SQL Based Frequent Pattern Mining with FP-growth

An Efficient Algorithm for finding high utility itemsets from online sell

Web Page Classification using FP Growth Algorithm

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules

Discovery of Frequent Itemset and Promising Frequent Itemset Using Incremental Association Rule Mining Over Stream Data Mining

IWFPM: Interested Weighted Frequent Pattern Mining with Multiple Supports

A Graph-Based Approach for Mining Closed Large Itemsets

Apriori Association Rule Algorithms using VMware Environment

Association Rule Mining

CS570 Introduction to Data Mining

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

Association Rule Mining: FP-Growth

A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study

An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction

Association Rule Mining

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm

AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR UTILITY MINING

Novel Techniques to Reduce Search Space in Multiple Minimum Supports-Based Frequent Pattern Mining Algorithms

Mining High Average-Utility Itemsets

Mining Frequent Itemsets Along with Rare Itemsets Based on Categorical Multiple Minimum Support

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery


Associating Terms with Text Categories

Utility Mining Algorithm for High Utility Item sets from Transactional Databases

Mining Recent Frequent Itemsets in Data Streams with Optimistic Pruning

FREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN

A New Method for Mining High Average Utility Itemsets

Temporal Weighted Association Rule Mining for Classification

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

Parallel Mining of Maximal Frequent Itemsets in PC Clusters

CSCI6405 Project - Association rules mining

Research of Improved FP-Growth (IFP) Algorithm in Association Rules Mining

A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases *

Efficient Mining of Generalized Negative Association Rules

FP-Growth algorithm in Data Compression frequent patterns

Parallel Popular Crime Pattern Mining in Multidimensional Databases


Review of Algorithm for Mining Frequent Patterns from Uncertain Data

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM

A Comparative Study of Association Rules Mining Algorithms

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set

Data Partitioning Method for Mining Frequent Itemset Using MapReduce

This paper proposes: Mining Frequent Patterns without Candidate Generation

An Efficient Algorithm for Mining Erasable Itemsets Using the Difference of NC-Sets

Processing Load Prediction for Parallel FP-growth

Pattern Mining. Knowledge Discovery and Data Mining. Roman Kern, KTI, TU Graz

Association Rules Mining using BOINC based Enterprise Desktop Grid

Medical Data Mining Based on Association Rules

Discovery of Frequent Itemsets: Frequent Item Tree-Based Approach

DESIGN AND CONSTRUCTION OF A FREQUENT-PATTERN TREE

Association rules. Marco Saerens (UCL), with Christine Decaestecker (ULB)

Mining Quantitative Association Rules on Overlapped Intervals

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH

Data Structure for Association Rule Mining: T-Trees and P-Trees

Infrequent Weighted Itemset Mining Using Frequent Pattern Growth



SEQUENTIAL PATTERN MINING FROM WEB LOG DATA


JOURNAL OF SOFTWARE, VOL. 9, NO. 9, SEPTEMBER 2014, pp. 2361-2365

An Algorithm for Mining Frequent Itemsets from Library Big Data

Xingjian Li (lixingjianny@163.com), Library, Nanyang Institute of Technology, Nanyang, Henan 473000, China

Abstract: Frequent itemset mining plays an important part in college library data analysis. Because a library database contains a lot of redundant data, the mining process may generate intra-property frequent itemsets, which hinders efficiency significantly. To address this issue, we propose an improved FP-Growth algorithm, called RFP-Growth, that avoids generating intra-property frequent itemsets; to further boost efficiency, we implement a MapReduce version of it with an additional prune strategy. The proposed algorithm was tested on both synthetic data and real-world library data, and the experimental results showed that it outperformed existing algorithms.

Index Terms: big data; frequent itemset; data mining; library

I. INTRODUCTION

Frequent itemset (frequent pattern) mining is an important technique in data mining, and it has been applied in many fields, such as banking and libraries, as well as serving as a building block for other data mining techniques. Since Apriori, the first frequent itemset mining algorithm, was proposed by Rakesh Agrawal [1], new algorithms have been proposed constantly for various sub-domains of frequent itemset mining, and these algorithms have been applied to many domains [2-5]. Apriori and FP-Growth [1, 6] are representative of level-wise algorithms and pattern-growth algorithms, respectively, and many algorithms are variants of these two. While many algorithms [7-13] have significantly improved performance, they are still not fast enough for large datasets. This situation gave rise to the development of parallel algorithms.
Along with the rapid growth of the MapReduce parallel computing framework, more and more data mining algorithms have been ported to run on it; parallel frequent itemset mining algorithms on MapReduce are based on either Apriori or FP-Growth. The algorithms in [14-18] are based on Apriori and need multiple MapReduce rounds: if the length of the longest frequent pattern is K, K rounds of MapReduce must be applied. The algorithm PFP [19] is based on FP-Growth and needs only two rounds of MapReduce, but the data distributed to each node are heavily redundant and the data chunks cannot be evenly sized, so its time performance is still not satisfying.

Because some datasets, such as library databases, contain a lot of redundant data, the mining process may generate intra-property frequent itemsets, which hinders efficiency significantly. To address this issue, we propose a parallel frequent itemset mining algorithm that needs at most two rounds of MapReduce (in our experiments, one round sufficed in most cases) and distributes data evenly among the data nodes. At the same time, it avoids generating intra-property frequent itemsets, which further boosts efficiency. Experimental results confirmed that our algorithm outperformed existing algorithms significantly in terms of time efficiency.

The rest of this paper is organized as follows: Section 2 gives the problem definitions; Section 3 proposes the new algorithm; Section 4 presents the experimental results; and Section 5 concludes.

II. PROBLEM DEFINITIONS

Let a dataset DB = {t1, t2, ..., tn} contain n transaction itemsets over m distinct items I = {i1, i2, ..., im}. Each transaction T = (TID, P) in DB consists of an itemset P ⊆ I and a unique identifier TID. |DB| denotes the size of dataset DB.
An itemset X = {i_{j1}, i_{j2}, ..., i_{jk}} (1 ≤ j1 < j2 < ... < jk ≤ m) containing k distinct items is called a k-itemset, and k is the length of the itemset X.

Definition 1. The minimum support threshold min_sup is a user-specified percentage of the number of transactions in the given dataset DB; the minimum support number in DB is then defined as min_supnum = min_sup × |DB|.

Definition 2. The support number (sn) of an itemset X is the number of transaction itemsets containing X.

Definition 3. An itemset X is a frequent itemset if its support number is not less than the minimum support number.

Definition 4. Let dataset DB be divided into s small datasets DB = DB1 ∪ DB2 ∪ ... ∪ DBs. min_sup × |DBi| is called the local minimum support number of DBi (1 ≤ i ≤ s). In dataset DBi, an itemset X is a local frequent itemset if its support number is not less than the local minimum support number.

doi:10.4304/jsw.9.9.2361-2365
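Definitions 1-3 can be illustrated with a small brute-force sketch (the dataset and threshold below are made up for the example, not taken from the paper's experiments):

```python
# A brute-force illustration of Definitions 1-3: enumerate every
# itemset of every transaction, count support numbers, and keep the
# itemsets whose support number reaches min_supnum.
from itertools import combinations

def frequent_itemsets(db, min_sup):
    min_supnum = min_sup * len(db)        # Definition 1
    counts = {}
    for t in db:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for x in combinations(items, k):
                counts[x] = counts.get(x, 0) + 1   # Definition 2: sn(X)
    # Definition 3: X is frequent iff sn(X) >= min_supnum
    return {x: sn for x, sn in counts.items() if sn >= min_supnum}

db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = frequent_itemsets(db, min_sup=0.6)   # min_supnum = 3
print(result)
```

This enumeration is exponential in transaction length and only serves to make the definitions concrete; the algorithms below avoid it.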

Property 1. Any non-empty subset of a frequent itemset is a frequent itemset; any superset of a non-frequent itemset is not a frequent itemset.

III. THE PROPOSED ALGORITHM

This section includes three parts: (1) preprocessing the college library data; (2) a new algorithm based on FP-Growth; (3) an implementation of the improved parallel FP-Growth algorithm on the MapReduce framework.

A. Preprocessing Library Data

This paper presents a data preprocessing method that processes the combined data of book bibliographic records, reader information, and borrowing records as follows: (1) replace all property items of the book information with consecutive numbers prefixed with "A"; (2) replace all property items of the reader information with consecutive numbers prefixed with "B".

Definition 5. A frequent itemset is called an intra-property frequent itemset if every item of the itemset has the same prefix (property).

In most cases, the user does not care about intra-property frequent itemsets, or association rules derived from them, but needs to find frequent itemsets across different properties, such as frequent itemsets or association rules between the book information and the reader information.

B. The Proposed Algorithm RFP-Growth

In many cases, FP-Growth outperforms Apriori in mining efficiency. Therefore, this paper improves the FP-Growth algorithm for mining data with a lot of redundancy. The improved algorithm, called RFP-Growth, avoids generating intra-property frequent itemsets in order to improve mining efficiency. The steps of RFP-Growth are as follows:

Step 1: Scan the dataset, count the support number of each item (the number of transactions the item appears in), then save the items and their support numbers to a header table H.
Each header table entry contains three fields: item name, support number, and a link field that records all nodes of the item on the tree.

Step 2: Remove items whose support number is less than the minimum support number from the header table. If the remaining items all belong to one property, or the header table is empty, quit the mining algorithm. Otherwise, sort the remaining items by support number in descending order.

Step 3: Scan the dataset a second time to process the transaction itemsets, as follows:

Step 3.1: Remove the non-frequent items from each transaction itemset.

Step 3.2: Sort the remaining items of the transaction itemset by the order of items in the header table, then add the ordered itemset to a tree T.

Step 3.3: As each ordered itemset is added to the tree T, increase the support number of each corresponding node by 1, then record all new nodes in the link field of the corresponding header table entry.

Step 4: Process the items of the header table starting from the last one. Assuming the current item is Q, the processing steps are as follows:

Step 4.1: Add item Q to a base itemset BI.

Step 4.2: In header table H, Q.link contains all the nodes in the tree T whose item name is Q; denote these k nodes N1, N2, ..., Nk.

Step 4.3: Read all the items on the path from each node Ni (i = 1, 2, ..., k) to the root of tree T, then save these items and their support numbers (the support number of each such item equals the support number of node Ni) to a sub-header table subH (subH has the same structure as the header table H).

Step 4.4: Remove from subH the items whose support number is less than the minimum support number; if the items of the base itemset BI and the items of subH all belong to the same property, or subH is empty, go to Step 5. Otherwise, sort subH by support number in descending order.
Step 4.5: Read all the items on the path from each node Ni (i = 1, 2, ..., k) to the root of tree T in turn, and remove the local non-frequent items from them.

Step 4.6: Sort the remaining itemsets by the sub-header table order, then add the ordered itemsets to a new subtree subT.

Step 4.7: As each ordered itemset is added to the subtree subT, increase the support number of each corresponding node by s (where s is the support number of node Ni), then record all new nodes in the link fields of the corresponding sub-header table entries.

Figure 1. The flow chart of algorithm RFP-MR

Step 4.8: Process the sub-header table subH recursively, starting from Step 4.

Step 5: Remove the current item of the current header table from the base itemset BI, then continue with the next item in the current header table.

The main differences between RFP-Growth and the original FP-Growth algorithm are Step 2 and Step 4.4; these two steps effectively avoid generating intra-property frequent itemsets.
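The header-table pass (Steps 1 and 2) with the intra-property check can be sketched as follows. The sketch assumes the "A"/"B" property prefixes produced by the preprocessing in Section III.A; the dataset, the threshold, and the function name are illustrative:

```python
# Illustrative sketch of RFP-Growth Steps 1-2: build the header table,
# prune infrequent items, and quit early when every surviving item
# shares one property prefix (only intra-property itemsets remain).
from collections import Counter

def build_header_table(db, min_supnum):
    # Step 1: count the support number of each item
    counts = Counter(item for t in db for item in set(t))
    # Step 2: drop items below the minimum support number
    frequent = {i: sn for i, sn in counts.items() if sn >= min_supnum}
    # If all remaining items belong to one property, or the table is
    # empty, quit the mining algorithm (no cross-property itemsets).
    if len({i[0] for i in frequent}) <= 1:
        return None
    # Otherwise sort by support number in descending order
    return sorted(frequent.items(), key=lambda kv: -kv[1])

# "A..." items are book properties, "B..." items are reader properties
db = [["A1", "A2", "B1"], ["A1", "B1"], ["A1", "A2"], ["A2", "B1"]]
header = build_header_table(db, min_supnum=2)
print(header)
```

The same check reappears at Step 4.4 on each sub-header table, which is what prunes whole intra-property branches of the recursion.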

C. The Method of Mining Frequent Itemsets from Library Big Data

Theorem 1: An itemset X is a frequent itemset if the sum of the support numbers of X over only part of the data blocks is not less than the minimum support number.

Proof: If the sum of the support numbers of X over part of the data blocks is not less than the minimum support number, then the sum over all data blocks is certainly not less than the minimum support number. Therefore, X must be a frequent itemset.

Theorem 2: Divide dataset DB into m small datasets DB = {DB1, DB2, ..., DBm}, let FI denote the set of frequent itemsets of DB, and let FI1, FI2, ..., FIm denote the sets of local frequent itemsets of the small datasets. Then the frequent itemsets of DB form a subset of the union of the local frequent itemsets, that is, FI ⊆ FI1 ∪ FI2 ∪ ... ∪ FIm.

Proof: By contradiction. Let the support number of an itemset X on the i-th data block be SNi; then the support number of X on DB is SN = SN1 + SN2 + ... + SNm. If X is not a local frequent itemset on any data block, then its support number on every block is less than the local minimum support number, that is, SNi < min_sup × |DBi| (1 ≤ i ≤ m). Therefore

SN = SN1 + SN2 + ... + SNm < min_sup × |DB1| + min_sup × |DB2| + ... + min_sup × |DBm| = min_sup × |DB| = min_supnum.

In conclusion, X is not a frequent itemset if it is not a local frequent itemset on any block.

According to Theorem 2, mining frequent itemsets from a large dataset can be divided into two steps: (1) use the algorithm RFP-Growth to mine local frequent itemsets from each data block, which generates the candidate set of global frequent itemsets; (2) re-scan the dataset to discover the real frequent itemsets. Each step executes MapReduce once. The parallel RFP-Growth algorithm based on MapReduce is called RFP-MR. Figure 1 shows a flow chart of the algorithm RFP-MR.
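The two-step scheme built on Theorems 1 and 2 can be sketched as below, with the partitions processed sequentially in place of MapReduce and a brute-force local miner standing in for RFP-Growth; the data and function names are illustrative:

```python
# Round 1 mines local frequent itemsets per block (by Theorem 2 their
# union contains every global frequent itemset); Theorem 1 confirms
# candidates whose partial counts already reach the global threshold,
# and Round 2 recounts only the remaining candidates.
from itertools import combinations
from collections import Counter

def local_frequent(block, local_min):
    """Brute-force local miner standing in for RFP-Growth on one block."""
    counts = Counter()
    for t in block:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            counts.update(combinations(items, k))
    return {x: sn for x, sn in counts.items() if sn >= local_min}

def mine(blocks, min_sup):
    min_supnum = min_sup * sum(len(b) for b in blocks)
    # Round 1 (Theorem 2): local frequent itemsets form the candidates
    locals_ = [local_frequent(b, min_sup * len(b)) for b in blocks]
    candidates = set().union(*locals_)
    confirmed, unresolved = set(), set()
    for x in candidates:
        # Theorem 1: a partial sum reaching min_supnum proves X frequent
        if sum(fi.get(x, 0) for fi in locals_) >= min_supnum:
            confirmed.add(x)
        else:
            unresolved.add(x)
    # Round 2: exact recount of the remaining candidates over all blocks
    for x in unresolved:
        sn = sum(1 for b in blocks for t in b if set(x) <= set(t))
        if sn >= min_supnum:
            confirmed.add(x)
    return confirmed

blocks = [[["a", "b"], ["a", "b"], ["a", "c"]],
          [["a", "b"], ["b", "c"], ["a", "b", "c"]]]
res = mine(blocks, min_sup=0.5)
print(sorted(res))
```

When Theorem 1 resolves every candidate in Round 1, the recount loop is empty, which corresponds to the cases in the experiments where the second MapReduce round is unnecessary.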
The transaction datasets go through MapReduce twice to mine the frequent itemsets. Before the second round, the candidate set is cut down: some non-frequent itemsets are removed from the candidates using Property 1, and some frequent itemsets are confirmed directly using Theorem 1. Thus, the time efficiency of the second MapReduce round is improved by decreasing the size of the candidate set.

IV. EXPERIMENTAL RESULTS

To verify the effectiveness of the proposed algorithm, we compare it with the Apriori-based k-round MapReduce algorithm (referred to as Apriori_KMR) [18] and the algorithm PFP [19]. All algorithms in the experiments are implemented in Python. The experimental platform is a cluster of 26 nodes: a master node, a scheduling node, a backup node, and 23 data nodes. Each node has a 2.5 GHz dual-core CPU and 8 GB of memory, and runs Ubuntu 12.04 and Hadoop 0.23.0.

We use two datasets. One is the historical borrowing data of our university (denoted DATA1); it contains 2000K transactions, and each transaction itemset has length 26 (that is, each transaction contains 26 properties). The other is the synthetic dataset T20I10D10000K generated by the IBM data generator; it contains 10000K transactions, and the average length of its transactions is 20. To fully utilize the 23 data nodes, the test data files were divided evenly into 20 small data files.

Figure 2 shows the runtime of the three algorithms under different minimum support thresholds. It can be seen from Figure 2 that the proposed algorithm outperformed Apriori_KMR in terms of time efficiency, because as the minimum support threshold decreases, the number of frequent itemsets increases.
Thus, the algorithm Apriori_KMR produces excessive candidate itemsets and repeatedly invokes MapReduce. The proposed algorithm executes MapReduce once to produce the local frequent itemsets and then cuts down the candidate set by filtering the candidates; in many cases there is no need for a second round of MapReduce to count the support numbers of the candidates. Therefore, the time efficiency of the proposed algorithm is higher than that of Apriori_KMR. On DATA1, since the dataset has intra-property frequent itemsets, which Apriori_KMR cannot avoid generating, the algorithm RFP-MR outperforms Apriori_KMR in terms of time efficiency.

(a) Runtime (s) vs. minimum support threshold (%) on the dataset DATA1

(b) Runtime (s) vs. minimum support threshold (%) on the dataset T20I10D10000K

Figure 2. The runtime of the three algorithms under varied minimum support thresholds

In the algorithm PFP, the data distributed to each node are heavily redundant, and the data chunks cannot be evenly sized, so its time efficiency is still not satisfying. As shown in Figure 2, the proposed algorithm RFP-MR outperforms PFP in terms of time performance.

V. CONCLUSION

With the development of information technology, library data keeps growing. To mine frequent itemsets from it effectively, this paper has made three contributions. First, it presents the mining algorithm RFP-Growth, which effectively avoids generating intra-property frequent itemsets. Second, it applies the improved algorithm RFP-Growth to the MapReduce framework to realize a parallel algorithm that can effectively mine frequent itemsets from big data. Finally, we used the library borrowing data and a synthetic dataset to validate the proposed algorithm RFP-MR, and the experimental results confirmed that RFP-MR significantly outperformed the existing algorithms Apriori_KMR and PFP in terms of time efficiency.

REFERENCES

[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. International Conference on Very Large Data Bases (VLDB 1994). 1994. Santiago, Chile.
[2] B. Li. Non-personalized Book Recommendation Based on Search Behavior. Library Journal, 2013. 8(32): pp. 36-41.
[3] Q. Ruan, H. Lin and H. Tan. Research on the approach to optimize book exhibition based on mining association rules. International Journal of Advancements in Computing Technology, 2012. 4(16): pp. 500-507.
[4] H. He. Analysis of Association Rules in Book Circulation. Library Journal, 2011. 7(30): pp. 63-68.
[5] M. Antonie, O.R. Zaiane and A. Coman. Application of Data Mining Techniques for Medical Image Classification. MDM/KDD, 2001: pp. 94-101.
[6] J. Han, J. Pei and Y. Yin. Mining frequent patterns without candidate generation. ACM SIGMOD International Conference on Management of Data. 2000. Dallas, TX, United States.
[7] L. Wang, L. Feng and M. Wu. AT-Mine: An Efficient Algorithm of Frequent Itemset Mining on Uncertain Dataset. Journal of Computers, 2013. 8(6): pp. 1417-1426.
[8] L. Feng, L. Wang and B. Jin. UT-Tree: Efficient mining of high utility itemsets from data streams. Intelligent Data Analysis, 2013. 17(4): pp. 585-602.
[9] B. Chandra and S. Bhaskar. A novel approach for finding frequent itemsets in data stream. International Journal of Intelligent Systems, 2013. 28(3): pp. 217-241.
[10] J.J. Cameron, A. Cuzzocrea and C.K. Leung. Stream mining of frequent sets with limited memory. 28th Annual ACM Symposium on Applied Computing (SAC 2013). 2013. Coimbra, Portugal.
[11] B. Vo, T. Hong and B. Le. DBV-Miner: A Dynamic Bit-Vector approach for fast mining frequent closed itemsets. Expert Systems with Applications, 2012. 39(8): pp. 7196-7206.
[12] W. Shui and W. Le. An implementation of FP-growth algorithm based on high level data structures of weka-JUNG framework. Journal of Convergence Information Technology, 2010. 5(9): pp. 287-294.
[13] M. El-hajj and O.R. Zaïane. COFI-tree mining: a new approach to pattern growth with reduced candidacy generation. IEEE International Conference on Frequent Itemset Mining Implementations. 2003.
[14] T. Xiao, C. Yuan and Y. Huang. PSON: A parallelized SON algorithm with MapReduce for mining frequent sets. 4th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2011). 2011. Tianjin, China.
[15] M. Riondato, et al. PARMA: A parallel randomized algorithm for approximate association rules mining in MapReduce. 21st ACM International Conference on Information and Knowledge Management (CIKM 2012). 2012. Maui, HI, United States.
[16] X.Y. Yang, Z. Liu and Y. Fu. MapReduce as a programming model for association rules algorithm on Hadoop. 3rd International Conference on Information Sciences and Interaction Sciences (ICIS 2010). 2010. Chengdu, China.
[17] J. Cryans, S. Ratte and R. Champagne. Adaptation of Apriori to MapReduce to build a warehouse of relations between named entities across the web. 2nd International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA 2010). 2010. Menuires, France.
[18] M. Lin, P. Lee and S. Hsueh. Apriori-based frequent itemset mining algorithms on MapReduce. 6th International Conference on Ubiquitous Information Management and Communication. 2012.
[19] H. Li, et al. PFP: Parallel FP-growth for query recommendation. 2nd ACM International Conference on Recommender Systems (RecSys 2008). 2008. Lausanne, Switzerland.