Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree


Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree

Virendra Kumar Shrivastava 1, Parveen Kumar 2, K. R. Pardasani 3
1 Department of Computer Science & Engineering, Singhania University, Rajasthan
2 Department of Computer Science and Engineering, Meerut Institute of Engg. & Technology, Meerut
3 Department of Mathematics & MCA, Maulana Azad National Institute of Technology, Bhopal
1 vk_shrivastava@yahoo.com, 2 drparveen@apiit.edu.in, 3 kamalrajp@hotmail.com

Abstract

The problem of discovering association rules at a single level has received significant research attention, and several efficient algorithms for mining frequent itemsets have been developed. However, discovering association rules at multiple concept levels may lead to the mining of more specific and concrete knowledge from datasets, and multi-level association rules are useful in many applications. In most studies of multi-level association rule mining, the database is scanned repeatedly, which affects the efficiency of the mining process. In this paper, a method for discovering multi-level association rules from a primitive level FP-tree is proposed in order to reduce main memory usage and speed up execution.

Keywords: Data mining, discovery of association rules, multiple-level association rules, FP-tree, FP(l)-tree, COFI-tree, primitive level FP-tree.

1. Introduction

Data mining [4] is the search for relationships and global patterns that exist in large databases but are hidden among the vast amounts of data, such as the relationship between patient data and medical diagnoses. These relationships represent valuable knowledge about the database. Data mining, the extraction of hidden predictive information from large databases, is a powerful technology with great potential for analyzing important information in a data warehouse.
Association rule mining is one of the important techniques of data mining. It can be used to discover unknown or hidden correlations between items found in a database of transactions. An association rule [1, 3, 4] is a rule which implies certain association relationships among a set of objects in a database (such as "occur together" or "one implies the other"). Association rule mining [1, 5] is the discovery of associations among attribute-value conditions that occur frequently together in a given data set, and it is widely used for market basket or transaction data analysis. One of the basic algorithms for mining frequent itemsets is Apriori, proposed by Agrawal and Srikant in 1994. Also called the level-wise algorithm, it is the most popular and influential algorithm for finding all frequent sets. The discovery of multi-level association rules involves items at different levels of abstraction. For many applications, it is difficult to find strong associations among items at the low, or primitive, level of abstraction due to the sparsity of data in the multilevel dimension, while strong associations discovered at higher levels may represent common-sense knowledge. In the discovery of multi-level frequent patterns from a primitive level FP-tree, the first requirement is an efficient method for generating frequent items at multiple levels of abstraction. This requirement can be fulfilled by providing concept taxonomies from the primitive level concepts to higher levels. There are many possible ways to explore efficient discovery of multi-level association rules. One way is to apply existing single-level association rule mining methods to mine multi-level association rules. However, applying the same minimum support and minimum confidence thresholds (as at a single level) to all the levels may lead to some undesirable results.
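As a reference point for the multi-level setting, the classical level-wise Apriori algorithm mentioned above can be sketched as follows. This is a minimal illustrative sketch in Python, not the authors' implementation; the function and variable names are our own.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Minimal level-wise Apriori: returns {itemset: support count}
    for every itemset whose support count is at least minsup."""
    transactions = [frozenset(t) for t in transactions]
    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            s = frozenset([item])
            counts[s] = counts.get(s, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= minsup}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: size-k sets all of whose (k-1)-subsets
        # were frequent at the previous level (the Apriori property).
        items = sorted({i for s in frequent for i in s})
        candidates = [frozenset(c) for c in combinations(items, k)
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))]
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= minsup}
        result.update(frequent)
        k += 1
    return result
```

Each pass over the data corresponds to one "level" of itemset size, which is why repeated database scans become costly and motivate FP-tree-based methods.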
For example, if we apply the Apriori algorithm [2, 7] to find data items at multiple levels of abstraction under the same minimum support and minimum confidence thresholds, it may generate some uninteresting associations at higher or intermediate levels. If, instead, we want to find strong relationships at relatively low levels of the hierarchy, the minimum support threshold must be reduced substantially; however, this may generate many uninteresting associations (such as mouse ⇒ antivirus) and many strong association rules at the primitive concept level. In order to remove the uninteresting rules generated in the mining process, one should apply different minimum support thresholds at different concept levels. Several algorithms have been developed along these lines; progressively reducing the minimum support threshold at deeper levels of abstraction is one such approach [6, 8, 9].

Figure 1: Concept hierarchy of an electronics store

This paper is organized as follows. Section two describes the basic concepts of multi-level association rules. In section

three, a method for the discovery of multi-level association rules from a primitive level FP-tree is proposed. Section four discusses the experimental results and section five presents the conclusions of the proposed research work.

2. Multiple-level Association Rules

To understand multi-level association rule mining, let us assume that the database contains:

i. an item data set which contains the description of each item in I in the form <A_i, description_i>, where A_i ∈ I, and
ii. a transaction data set T, which consists of a set of transactions <T_i, {A_p, ..., A_q}>, where T_i is a transaction identifier and A_j ∈ I (for j = p, ..., q).

2.1 Definition: A pattern, or itemset, A is one item A_i or a set of conjunctive items A_i ∧ ... ∧ A_j, where A_i, ..., A_j ∈ I. The support of a pattern A in a set S, s(A/S), is the number of transactions (in S) which contain A versus the total number of transactions in S. The confidence of A → B in S, c(A → B/S), is the ratio of s(A ∪ B/S) versus s(A/S), i.e., the probability that pattern B occurs in S when pattern A occurs in S. To generate relatively frequently occurring patterns and reasonably strong rule implications, one may specify two thresholds: minimum support s and minimum confidence c. Observe that, for the discovery of multi-level association rules, different minimum support and/or minimum confidence thresholds can be specified at different levels.

2.2 Definition: A pattern A is frequent in a set S at level l if the support of A is no less than its corresponding minimum support threshold s. A rule A → B/S is strong if, for the set S, each ancestor (i.e., the corresponding higher-level item) of every item in A and B, if any, is frequent at its corresponding level, A ∪ B/S is frequent at the current level, and the confidence of A → B/S is no less than the minimum confidence threshold at the current level.

For example, we assume that the database contains an item data set with the description of each item in I in the form <A_i, description_i>, where A_i is the product code.
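To make the support and confidence measures of Definitions 2.1 and 2.2 concrete, both can be computed directly from a transaction list. The following is our own illustrative Python sketch; the function names are not from the paper.

```python
def support(pattern, transactions):
    """s(A/S): fraction of transactions in S that contain every item of A."""
    pattern = set(pattern)
    return sum(1 for t in transactions if pattern <= set(t)) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """c(A -> B/S) = s(A ∪ B / S) / s(A/S)."""
    a, b = set(antecedent), set(consequent)
    return support(a | b, transactions) / support(a, transactions)
```

For instance, with four transactions in which {a} appears three times and {a, b} twice, the rule a → b has confidence (2/4)/(3/4) = 2/3.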
A transaction data set T = {T_1, T_2, T_3, T_4, ..., T_n} consists of a set of transactions T_i = {A_x, ..., A_z} in the format (TID, A_i), where TID is the transaction number and A_i is an item from the item data set. A multi-level database stores a hierarchy-information-encoded transaction table instead of the original transaction table [9, 10, 11]; in the transaction table, each item is encoded as a sequence of digits. The product category table, depicted in Table 1, contains the product category codes and a description for each code, which is only needed for the final display.

For the concept hierarchy depicted in figure 1, we first need to encode the transaction data. For example, we encode the hierarchy information by assigning a number to each item of each level. In figure 1, we have 3 levels (excluding All, since there is only 1 item at the root level). At level 3, we have computer, printer, accessories and software; we can assign 1 to computer, 2 to printer, 3 to accessories and so on.

Table 1: Product category codes of an electronics store

Code   Description
0      All electronics
1      Computer
2      Printer
3      Accessories
4      Software
11     Desktop computer
12     Laptop computer
111    IBM Desktop computer
112    IBM Laptop computer
211    HP Inkjet Printer
221    HP Laser Printer

At the sub-level for computer, level 2, we assign 1 to desktop computer, 2 to laptop computer, 3 to server computer, and so on. Furthermore, at the sub-level for desktop computer, we assign 1 to IBM, 2 to HP and 3 to Dell. As a result, 121 represents an IBM laptop computer, while 211 represents an HP inkjet printer. The encoding can be extended to permit more than 10 alternatives per level. The difference between this encoding method and the single-level case is that in the single-level case all items are encoded explicitly.
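The digit encoding just described, with levels numbered upward from the primitive level (level 1), can be sketched as follows. This is our own illustrative Python, assuming three-digit codes as in the running example.

```python
def generalize(code, level, depth=3):
    """Generalize a digit-encoded item to a concept level by keeping the
    first (depth - level + 1) digits and padding the rest with '*'.
    With depth=3, level 1 is the primitive code itself and level 3 the
    most general category (e.g. '112' -> '11*' at level 2, '1**' at 3)."""
    keep = depth - level + 1
    return code[:keep] + "*" * (depth - keep)

def project_transaction(transaction, level, depth=3):
    """Project a primitive-level transaction to a concept level, merging
    items that generalize to the same code."""
    return sorted({generalize(item, level, depth) for item in transaction})
```

For example, `project_transaction(["111", "112", "311", "411"], 2)` yields `["11*", "31*", "41*"]`: the two desktop/laptop codes collapse to the same level-2 item.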
However, in the multi-level context we add "*" to encode higher-level items. For example, at level 3 we have items such as 1**, 2**, 3** and 4**; at level 2, items such as 11*, 13* and 31*; and at level 1, items such as 111, 112 and 113. Consequently, we get the encoded transaction table shown in Table 2.2.

Table 2.2: Encoded transaction table

TID   Items
T1    111, 112, 311, 411
T2    112, 121, 312, 411, 422
T3    111, 112, 311, 411
T4    113, 112, 311, 411
T5    112, 121, 312, 411, 423
T6    111, 112, 311, 411
T7    112, 121, 312, 411, 422
T8    111, 112, 311, 411
T9    112, 121, 312, 411, 421
T10   111, 112, 311, 411

The distinct items per level refers to the number of different items at each concept level, and the total transaction elements refers to the items that appear in the itemsets, counted with the repetitions occurring within the levels themselves. For example, if we assume that the distinct items at level 1 in an encoded transaction table are 111, 112, 131, 132, 211, 212, 221, 222, 311, 312, 321, 322, 412 and 423, then the number of distinct items at level 1 is 14.
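The per-level bookkeeping above can be sketched in Python. This is our own illustrative code (not from the paper), assuming three-digit codes as in Table 2.2.

```python
def distinct_items_at_level(encoded_db, level, depth=3):
    """Count the distinct generalized codes appearing at a concept level.
    Each code keeps its first (depth - level + 1) digits and the remaining
    positions are padded with '*' (level 1 = primitive, level 3 = top)."""
    keep = depth - level + 1
    return len({item[:keep] + "*" * (depth - keep)
                for t in encoded_db for item in t})
```

On the data of Table 2.2, level 1 has 10 distinct codes, while level 3 has only 3 (1**, 3** and 4**), illustrating how sparsity at the primitive level gives way to dense, general items higher up.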

3. Method for discovery of multi-level association rules from primitive level FP-tree

In this section, we propose a method for the discovery of multi-level association rules from a primitive level FP-tree. The method uses a hierarchy-information-encoded transaction table instead of the original transaction table [10, 11, 12]. It is often beneficial to use an encoded table, although our method does not rely on the derivation of such a table, because the encoding can always be performed on the fly. The method works in a bottom-up manner. Given the FP-tree of the primitive level and the support threshold s and confidence threshold c for a level l, it discovers multi-level association rules as follows:

Input: FP-tree of the primitive level, support threshold s and confidence threshold c for level l.
Output: frequent pattern association rules for level l.

1. Construct the FP(l)-tree from the FP-tree of the primitive level by transforming all items in the header table and all nodes of the primitive level FP-tree.
2. If there are repeated nodes in a path, keep the top node and remove the others.
3. For each item in the header table, if its support count is less than s, eliminate the item and its related nodes from both the header table and the FP(l)-tree.
4. Merge identical items and their related nodes in the FP(l)-tree according to each item in the new header table.
5. Eliminate the duplicate items and related nodes, accumulating the support counts of the related nodes.
6. Arrange the header table and the nodes of the FP(l)-tree in descending order of item frequency.
7. Adjust the node-links and path-links in the FP(l)-tree.
8. Generate frequent itemsets.
9. For each frequent 1-item:
10.    Call COFI-tree (Algorithm 2).
11. Generate frequent pattern association rules for level l.
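Steps 1 to 6, which recount item supports after generalizing every node to level l, can be approximated on the encoded transactions themselves. The following is our own illustrative Python sketch: it operates on the transaction list rather than on the FP-tree, and all names in it are ours, not the paper's.

```python
from collections import Counter

def level_l_header_table(encoded_db, level, minsup, depth=3):
    """Sketch of steps 1-6: generalize every primitive item to `level`
    (keep the first depth-level+1 digits, '*'-pad the rest), merge
    duplicates within each transaction, drop items whose support falls
    below `minsup`, and return the survivors with their counts in
    descending frequency order, as in the new header table."""
    keep = depth - level + 1
    counts = Counter()
    for t in encoded_db:
        # Using a set per transaction mirrors the merging of repeated
        # nodes and duplicate items in steps 2, 4 and 5.
        counts.update({item[:keep] + "*" * (depth - keep) for item in t})
    return [(item, c) for item, c in counts.most_common() if c >= minsup]
```

On the data of Table 2.2 at level 2 with minsup = 5, only 11*, 31* and 41* survive, each with support count 10.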
Method 1: Multi-level frequent pattern association rule mining from a primitive level FP-tree

Algorithm 2: COFI-tree // build the Co-Occurrence Frequent Item tree with pruning, and mine the COFI-trees of the FP(l)-tree
Input: FP(l)-tree, the support threshold s for level l.
Output: frequent itemsets for level l.

1. LFI = the least frequent item in the header table of the FP(l)-tree.
2. While (true) do
   2.1 Add up the frequencies of all items that share a path with item LFI. The frequencies of all items that share the same path are the same as the frequency of the LFI item.
   2.2 Eliminate all non-locally-frequent items from the frequent list of item LFI.
   2.3 Create a root node for the LFI-COFI-tree with both frequency-count and contribution-count = 0.
       2.3.1 LOFI is the list of locally frequent items on the path from item LFI to the root.
       2.3.2 The items on LOFI form a prefix of the LFI-COFI-tree.
       2.3.3 If the prefix is new, set frequency-count = frequency of the LFI node and contribution-count = 0 for all nodes in the path; else
       2.3.4 Update the frequency-count of the previously existing part of the path.
       2.3.5 Update the pointers of the header list if needed.
       2.3.6 Find the next node for item LFI in the FP(l)-tree and go to 2.3.1.
   2.4 Call Mine_COFI_tree (LFI) // described in Algorithm 3.
   2.5 Release the LFI-COFI-tree.
   2.6 LFI = the next frequent item in the header table of the FP(l)-tree.
3. Goto 2.

Algorithm 3: Mine_COFI_tree (LFI)
1. nodeLFI = select next node.
2. While (there are still nodes) do
   2.1 D = the set of nodes from nodeLFI to the root.
   2.2 F = nodeLFI.frequency-count − nodeLFI.contribution-count.
   2.3 Generate all candidate patterns X from the items in D; patterns that do not contain LFI are discarded.
   2.4 Patterns in X that do not exist in the LFI candidate list are added to it with frequency = F; otherwise, their frequency is incremented by F.
   2.5 Increment the contribution-count by F for all items in LOFI.
   2.6 nodeLFI = select next node.
3. Goto 2.
4.
Based on the support threshold s, remove non-frequent patterns from the LFI candidate list.

4. Experimental Results

To test the performance of the proposed method for the discovery of multi-level association rules from a primitive level frequent pattern tree, we collected data from two sources. First, we collected the sales database of S. K. Technologies. Second, we generated synthetic transaction databases using a randomized itemset generation algorithm similar to that described in [2]. The transaction database is converted into an encoded transaction table according to the information about the generalized items in the item description table. The maximum level of the conceptual hierarchy in the item table is set to 5. The encoded transaction database DB1 has the parameter settings shown in Table 3. We generated three synthetic databases, DB2, DB3 and DB4, with the parameter settings shown in Table 4. The experiments were run on an IBM 8175 desktop with an Intel P-IV 2.8 GHz processor, an 865 motherboard, 512 MB of DDR RAM @ 400 MHz and a Western Digital 40 GB IDE hard drive, using the Microsoft Windows XP operating system and Turbo C++.

Table 3: Parameter settings for encoded database DB1

Parameter   Value
N           100
T           150000
I           10
L           5
B           0

Table 4: Parameter settings for DB2

Database   N     T        I   L   B
DB2        100   200000   5   5   0

Figure 2: Running time on DB1
Figure 3: Memory usage in KB on DB1
Figure 4: Running time on DB2
Figure 5: Memory usage in KB on DB2

Experimental results are shown in figures 2, 3, 4 and 5. These figures show that our proposed method performs better than ML_T2L1 [10] in terms of main memory usage and also runs faster.

5. Conclusion

In this research work, a method for the discovery of multi-level association rules from a primitive level FP-tree is proposed. The proposed method constructs the FP(l)-tree from the FP-tree of the primitive level. To generate frequent patterns at multiple levels it uses the COFI-tree method, which reduces memory usage in comparison to ML_T2L1; therefore, it can mine larger databases with less main memory available, and it also runs faster. The method uses a non-recursive mining process, and with a simple traversal of the COFI-tree a full set of frequent items can be generated. It also uses an efficient pruning method that removes all locally non-frequent patterns, leaving the COFI-tree with only locally frequent items. It thus reaps the advantages of both FP-growth and the COFI-tree. Our experimental results show that the proposed method works efficiently in reducing main memory usage and also reduces execution complexity.

References

[1] R. Agrawal, T. Imielinski and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 207-216, Washington, DC, May 1993.
[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94), pages 487-499, Santiago de Chile, Chile, September 1994.
[3] A. Savasere, E. Omiecinski and S. Navathe. An efficient algorithm for mining association rules in large databases. In Proceedings of the 21st VLDB Conference, Zurich, Switzerland, 1995.
[4] R. Srikant, Q. Vu and R. Agrawal. Mining association rules with item constraints. IBM Almaden Research Center, San Jose, CA 95120, USA.
[5] A. K. Pujari. Data Mining Techniques. University Press (India) Pvt. Ltd., 2001.
[6] J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. In Proceedings of the 21st VLDB Conference, Zurich, Switzerland, 1995.
[7] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco, CA, 2001.

[8] Y. Wan, Y. Liang and L. Ding. Mining multilevel association rules with dynamic concept hierarchy. In Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July 2008.
[9] J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. IEEE Transactions on Knowledge and Data Engineering, 11(5):798-804, 1999.
[10] J. Han, J. Pei and Y. Yin. Mining frequent patterns without candidate generation. In ACM SIGMOD Conference on Management of Data, May 2000.
[11] R. S. Thakur, R. C. Jain and K. R. Pardasani. Fast algorithm for mining multi-level association rules in large databases. Asian Journal of Information Management, 1(1):19-26, 2007.
[12] M. El-Hajj and O. R. Zaïane. COFI-tree mining: a new approach to pattern growth in the context of interactive mining. In Proceedings of the 2003 International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD), August 2003.