Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports

Similar documents
Novel Techniques to Reduce Search Space in Multiple Minimum Supports-Based Frequent Pattern Mining Algorithms

Discovering Quasi-Periodic-Frequent Patterns in Transactional Databases

Mining Frequent Itemsets Along with Rare Itemsets Based on Categorical Multiple Minimum Support

Efficient Discovery of Periodic-Frequent Patterns in Very Large Databases

Data Mining Part 3. Associations Rules

Discovering Chronic-Frequent Patterns in Transactional Databases

Mining Frequent Patterns without Candidate Generation

Appropriate Item Partition for Improving the Mining Performance

Generation of Potential High Utility Itemsets from Transactional Databases

Salah Alghyaline, Jun-Wei Hsieh, and Jim Z. C. Lai

WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity

Improved Approaches to Mine Periodic-Frequent Patterns in Non-Uniform Temporal Databases

Association Rule Mining

This paper proposes: Mining Frequent Patterns without Candidate Generation

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

Tutorial on Association Rule Mining

Data Mining for Knowledge Management. Association Rules

EFFICIENT TRANSACTION REDUCTION IN ACTIONABLE PATTERN MINING FOR HIGH VOLUMINOUS DATASETS BASED ON BITMAP AND CLASS LABELS

Memory issues in frequent itemset mining

An Efficient Algorithm for finding high utility itemsets from online sell

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM

Frequent Pattern Mining for Multiple Minimum Supports with Support Tuning and Tree Maintenance on Incremental Database

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA

Available online at ScienceDirect. Procedia Computer Science 45 (2015 )

Maintenance of the Prelarge Trees for Record Deletion

Information Sciences

An Improved Algorithm for Mining Association Rules Using Multiple Support Values

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE

RHUIET : Discovery of Rare High Utility Itemsets using Enumeration Tree

I. INTRODUCTION. Keywords : Spatial Data Mining, Association Mining, FP-Growth Algorithm, Frequent Data Sets

Efficient Remining of Generalized Multi-supported Association Rules under Support Update

IWFPM: Interested Weighted Frequent Pattern Mining with Multiple Supports

Discovering Periodic Patterns in Non-uniform Temporal Databases

Association Rule Mining: FP-Growth

An Efficient Generation of Potential High Utility Itemsets from Transactional Databases

MS-FP-Growth: A multi-support Vrsion of FP-Growth Agorithm

Association Rule Mining

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH

Improved Frequent Pattern Mining Algorithm with Indexing

Mining of Web Server Logs using Extended Apriori Algorithm

Finding Sporadic Rules Using Apriori-Inverse

CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

CHAPTER 3 ASSOCIATION RULE MINING WITH LEVELWISE AUTOMATIC SUPPORT THRESHOLDS

An Improved Apriori Algorithm for Association Rules

PTclose: A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets

Parallelizing Frequent Itemset Mining with FP-Trees

Pamba Pravallika 1, K. Narendra 2

Association mining rules

Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

Utility Mining: An Enhanced UP Growth Algorithm for Finding Maximal High Utility Itemsets

ETP-Mine: An Efficient Method for Mining Transitional Patterns

Monotone Constraints in Frequent Tree Mining

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India

CSCI6405 Project - Association rules mining

Mining High Average-Utility Itemsets

Improved FP-growth Algorithm with Multiple Minimum Supports Using Maximum Constraints

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Survey: Efficent tree based structure for mining frequent pattern from transactional databases

An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams *

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: [35] [Rana, 3(12): December, 2014] ISSN:

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

Parallel Popular Crime Pattern Mining in Multidimensional Databases

Performance Analysis of Data Mining Algorithms

An Algorithm for Mining Frequent Itemsets from Library Big Data

Research of Improved FP-Growth (IFP) Algorithm in Association Rules Mining

A New Approach to Discover Periodic Frequent Patterns

Enhanced SWASP Algorithm for Mining Associated Patterns from Wireless Sensor Networks Dataset

Finding Frequent Patterns Using Length-Decreasing Support Constraints

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent itemsets

An Improved Frequent Pattern-growth Algorithm Based on Decomposition of the Transaction Database

Implementation of Efficient Algorithm for Mining High Utility Itemsets in Distributed and Dynamic Database

ALGORITHM FOR MINING TIME VARYING FREQUENT ITEMSETS

Keshavamurthy B.N., Mitesh Sharma and Durga Toshniwal

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

UDS-FIM: An Efficient Algorithm of Frequent Itemsets Mining over Uncertain Transaction Data Streams

Minig Top-K High Utility Itemsets - Report

Frequent Itemsets Melange

Ascending Frequency Ordered Prefix-tree: Efficient Mining of Frequent Patterns

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Mining Frequent Itemsets for data streams over Weighted Sliding Windows

Mining Frequent Patterns with Counting Inference at Multiple Levels

Improving the Efficiency of Web Usage Mining Using K-Apriori and FP-Growth Algorithm

Mining Top-K Strongly Correlated Item Pairs Without Minimum Correlation Threshold

Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining

Approaches for Mining Frequent Itemsets and Minimal Association Rules

AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR UTILITY MINING. Received April 2011; revised October 2011

arxiv: v1 [cs.db] 11 Jul 2012

Maintenance of fast updated frequent pattern trees for record deletion

Chapter 4: Association analysis:

Association Rule Mining from XML Data

Improved Algorithm for Frequent Item sets Mining Based on Apriori and FP-Tree

Chapter 4: Mining Frequent Patterns, Associations and Correlations

Association Rule Mining. Introduction 46. Study core 46

Finding the boundaries of attributes domains of quantitative association rules using abstraction- A Dynamic Approach

AN INTELLIGENT APPROACH FOR MINING FREQUENT SPATIAL OBJECTS IN GEOGRAPHIC INFORMATION SYSTEM

Transcription:

Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports R. Uday Kiran P. Krishna Reddy Center for Data Engineering International Institute of Information Technology-Hyderabad Hyderabad, India - 500032. uday rage@research.iiit.ac.in and pkreddy@iiit.ac.in Abstract Recently, an approach has been proposed in the literature to extract frequent patterns which occur periodically. In this paper, we have proposed an approach to extract rare periodic-frequent patterns. Normally, the single minsup based frequent pattern mining approaches like Apriori and FP-growth suffer from rare item problem. That is, at high minsup, frequent patterns consisting of rare items will be missed, and at low minsup, number of frequent patterns explode. In the literature, efforts have been made to extract rare frequent patterns under multiple minimum support framework. It was observed that the periodic-frequent pattern mining approach also suffers from the rare item problem. In this paper, we have extended multiple minimum support framework to extract rare periodic-frequent patterns and developed a new algorithm to extract rare periodic-frequent patterns. Experiment results show that the proposed approach is efficient. 1 INTRODUCTION It can be observed that most of the data mining approaches discover the knowledge pertaining to frequently occurring entities. However, real-world datasets are mostly nonuniform in nature containing both frequent and relatively infrequent or rarely occurring entities. In literature, it has been reported that rare knowledge patterns i.e., knowledge pertaining to rare entities may contain interesting knowledge useful in decision making process [1]. The rare knowledge patterns are more difficult to detect because they present in fewer data cases. In the literature, research efforts are being made to investigate efficient approaches to extract rare knowledge patterns like rare association rules and rare class identification [1]. Frequent pattern mining is an important model in data mining. Normally, the single minsup based frequent pattern mining approaches like Apriori [2] and FP-growth [3] 15 th International Conference on Management of Data COMAD 2009, Mysore, India, December 9 12, 2009 c Computer Society of India, 2009 suffer from rare item problem. That is, at high minsup, frequent patterns consisting of rare items will be missed, and at low minsup, number of frequent patterns explode. In the literature, efforts have been made to extract rare frequent patterns under multiple minimum support framework [5, 6, 7, 8]. Several extensions to frequent pattern mining have been investigated in the literature. Recently, an approach has been proposed to extract new kind of frequent patterns, called periodic-frequent patterns []. It considers both the items of a transaction along with the timestamp (occurrence time), and extracts knowledge pertaining to periodic occurrence of patterns. In this paper, we have made an effort to extract rare periodic-frequent patterns. The existing periodic-frequent pattern mining approach also suffers from the rare item problem as it is also based on single minsup constraint []. In this paper, we have developed a new algorithm to extract rare periodic-frequent patterns by extending multiple minimum support framework. Experimental results on synthetic and real-world datasets show that the proposed approach is efficient. The paper is organized as follows. In Section 2, we discuss the background information about the rare item problem in mining frequent patterns and multiple minimum support framework. We also discussed the rare item problem in periodic-frequent pattern mining approach. In Section 3, we discuss the proposed approach. Experimental results conducted on synthetic and real world datasets are presented in Section. Section 5 concludes the paper. 2 Background 2.1 Overview of Frequent Pattern Mining Frequent patterns are important class of regularities which exist in a database. The basic model of frequent patterns is as follows [2]: Let I = {i 1,i 2,,i n } be a set of items. Let T be a set of transactions (dataset), where each transaction t is a set of items such that t I. A pattern (or an itemset) X is a set of items {i 1,i 2,,i k }, where (1 k n) such that X I.

Support of X, denoted as S(X), refers to the proportion of transactions that contain X in the transaction dataset. The pattern X is a frequent pattern, if S(X) minsup, where minsup refers to user-specified minimum support. Table 1: Transaction dataset. TID Items TID Items 1 bread, jam 6 bread, ball, bat 2 bread, ball, pen, bat 7 ball, bed, pillow 3 pen, bed, pillow 8 ball, bat bread, jam 9 bread, jam 5 bread, jam 10 ball, bat, book Example 1 (Running Example). Consider the set of items I = {bread, ball, pen, jam, bat, bed, pillow, book} shown in Table 1. The dataset contains 10 transactions. Let minsup = 0.. The pattern {bread, jam} is one of the frequent pattern because its support is 0., which is equal to minsup. 2.2 Rare Item Problem in Frequent Pattern Mining Rare items are the items having low support values. Rare frequent patterns are the frequent patterns consisting of only rare items, or both frequent and rare items. With a single minsup constraint, mining of rare frequent patterns causes the following problem. At high minsup, frequent patterns involving rare items will be missed, and at low minsup number of frequent patterns will be exploded. This dilemma is called rare item problem [5]. Example 2: We now explain the rare item problem by considering two minsup values for the dataset shown in Table 1. If minsup = 0., the set of frequent patterns generated is {{bread}, {ball}, {jam}, {bat}, {bread, jam}, {bat, ball}}. It can be observed that the rare pattern {bed, pillow} is missed. If minsup = 2, then the set of frequent patterns generated is {{bread}, {ball}, {jam}, {bat}, {bed}, {pillow}, {bread, jam}, {bat, ball}, {bed, pillow}, {bread, ball}, {bread, bat}, {bread, ball, bat}}. It can be observed that even though the missed rare pattern {bed, pillow} is extracted at low minsup value, however the number of frequent patterns is increased. 2.3 Multiple Minimum Support Framework In the literature, efforts are being made to extract rare frequent patterns under multiple minimum support framework [5, 6, 7, 8]. The basic idea is as follows. To extract frequent patterns only a single minsup is used for entire dataset, whereas under multiple minimum support framework, each item is specified with a different minimum support value, called minimum item support (MIS). Rare frequent patterns are extracted by considering MIS values for all items. Under multiple minimum support framework, minsup for a pattern is calculated using Equation 1. ( MIS(i1 ),MIS(i minsup(i 1,i 2,...,i k ) min 2 )...,MIS(i k ) ) (1) where, minsup(i 1,i 2,,i k ) represents the support of a pattern {i 1,i 2,,i k } and MIS(i j ) represents the minimum item support for the item i j I. Example 3: For the dataset of Table 1, let the userspecified MIS values for the items bread, ball, pen, jam, bat, bed and pillow be,, 3, 3, 3, 2 and 2 (in counts) respectively. The frequent patterns generated are {{bread}, {ball}, {jam}, {bat}, {bed}, {pillow}, {bread, jam}, {bat, ball}, {bed, pillow}}. 2. Overview of Periodic-Frequent Pattern Mining Periodic-Frequent patterns are special kind of frequent patterns which occur periodically (or regularly) within a dataset []. In this approach, the time of occurrence of each transaction is taken into account for periodic-frequent pattern mining. So, in the periodic-frequent pattern mining, the tid represents timestamp of a particular transaction t. Let ttid X be the tid in which the pattern X has occurred. Suppose, a pattern X appears in two transactions, say ti X and t X j, where i and j are tids and i < j (i occurred before j). A period of pattern X, say p X, is the difference between t X j ti X. Let, P X denote the set of all periods of X i.e., P X = {p X o,, p X o }. Then, Per(X) = Max(p X o,, p X o ) is called the periodicity of pattern X. A pattern is said to be periodic-frequent, if its periodicity is no greater than the user-specified maximum periodicity (maxper) constraint and its support is no less than the user-given minimum support (minsup) constraint. Example : For the dataset of Table 1, we assume that the first column represents the timestamps of the transactions. The pattern {bread, jam} has occurred in t 1, t, t 5 and t 9. A period for this pattern, p bread, jam by considering the t 1 and t is 3 ( 1). Similarly, the other periodicities are 1 (5-) and (9-5). The periodicity of the pattern {bread, jam}, Per({bread, jam}) = Max(3,1,) =. Let the user-specified minsup and maxper values are and respectively. Then, this pattern is a periodic-frequent pattern because S(bread, jam) minsup and Per(bread, jam) maxper. Conventional frequent pattern mining approaches like Apriori or FP-growth cannot be used for mining periodicfrequent patterns because they do not capture the periodicity of the patterns. Therefore, another pattern-growth approach has been discussed in []. In this approach, a tree, called Periodic-Frequent tree (PF-tree) is constructed and mined using conditional pattern bases to discover complete set of periodic-frequent patterns. 2.5 Rare Item Problem in Periodic-Frequent Pattern Mining As the basic model of periodic-frequent patterns uses only single minsup as one of its constraint, this model also suffers from the dilemma called rare item problem similar to the rare item problem in frequent pattern mining approaches such Apriori and FP-growth.

3 Mining Rare Periodic-Frequent Pattern 3.1 Basic Idea To extract rare periodic-frequent patterns, the multiple minimum support framework has to be extended to the basic model of periodic-frequent patterns. Under multiple minimum support framework, the MIS values for the items are taken into account to extract rare frequent patterns. Similarly, if we extended multiple minimum support framework to the existing approach of periodicfrequent patterns for extracting rare periodic-frequent patterns, the existing approach has to be modified by taking into account the MIS values of the items. For extending multiple minimum support framework to the existing periodic-frequent pattern approach, the following issues are to be addressed. Criteria for specifying minsup for a pattern. Extending the existing Periodic Frequent-tree (PFtree) approach discussed in [] to mine rare periodicfrequent patterns. 3.1.1 Criteria for specifying minsup for a pattern The input to the proposed approach is set of transactions containing items, MIS values for the items and maximum periodicity (maxper). Under multiple minimum support framework for frequent patterns, the criterion for specifying minsup for a pattern is shown in Equation 1. Similarly, for multiple minimum support framework based periodic-frequent pattern mining, the minsup for a pattern is shown in Equation 2. ( ) MIS(i1 ),MIS(i S(i 1,i 2,...,i k ) min 2 ) (2)...,MIS(i k ) and Per(i 1,i 2,...,i k ) maxper where S(i 1,i 2,,i k ) represents the support of a pattern {i 1, i 2,, i k }, Per(i 1,i 2,...,i k ) represents the periodicity of a pattern {i 1, i 2,, i k } and MIS(i j ) represents the minimum item support for the item i j I. 3.1.2 Extending multiple minimum support framework to PF-tree The existing PF-tree approach is summarized as follows. The PF-tree contains two components (i) PF-list and (ii) prefix-tree. PF-list is a list which is used to maintain each item s frequency and periodicity values. Prefix-tree is a tree similar to the FP-tree [3]; however, instead of containing support of an item at each node, the tid of the transaction is maintained at the leaf node of the corresponding item. The corresponding tid list of the item s nodes is called tid-list of that item. The construction of PF-tree is as follows. The PF-list is constructed with the initial scan on the dataset. From the PF-list, items which have support less than minsup and periodicity greater than maxper are pruned. Next, items in the PF-list are sorted in descending order of their support values. Using, the sorted list of items, prefix-tree is constructed by performing another scan on the dataset. Using conditional pattern bases, the constructed PF-tree is mined to discover complete set of periodic-frequent patterns. We now address the issues regarding the extensions to the PF-tree approach to mine rare periodic-frequent patterns. First, we discuss about downward closure property and sorted closure property of the patterns. According to downward closure property, all non-empty subsets of frequent patterns are frequent. According to sorted closure property, if a sorted pattern i 1, i 2,..., i k, for k 2 and MIS(i 1 ) MIS(i 2 )... MIS(i k ), is frequent then all its subsets consisting of the item having lowest MIS value i.e., i 1 must be frequent. However, the other subsets need not necessarily be frequent. (The sorted closure property has been elaborately discussed in [5] [8]). It was observed that the patterns mined using maxper constraint follow downward closure property. Whereas, the patterns mined using multiple minimum support framework follow sorted closure property. The issue here is which property do the rare periodicfrequent patterns follow. It can be noted that the rare periodic-frequent patterns follow sorted closure property. As a result, to extract rare periodic-frequent patterns we have to consider all subsets of a pattern for generating rare periodic-frequent patterns. The other issue is construction of PF-tree (PF-list and prefix-tree) by considering MIS values of the items. To extract rare periodic-frequent patterns, we store the item s MIS values by creating additional column in the PF-list. And, the prefix-tree has to be constructed with MIS descending order of items. 3.2 Description of the Proposed Approach The input to the proposed approach is the transactional dataset (along with the timestamp of each transaction), item s MIS values and maxper constraint. The output is the complete set of periodic-frequent patterns. The approach involves three steps: Construction of MIS-PF-tree, Construction of compact MIS-PF-tree and Mining Patterns from the compact MIS-PF-tree. We briefly discuss these steps. 3.2.1 Construction of MIS-PF-tree The MIS-PF-tree is a tree consisting of two components: MIS-PF-list and prefix-tree. The MIS-PF-list is a list which contains four fields - item name (i), minimum item support (MIS), total support ( f ) and periodicity of item i (p). The structure of the prefix-tree used in MIS-PF-tree is same as the prefix-tree used in PF-tree []. However, items are arranged in support descending order in the prefix-tree of the PF-tree, whereas the items are arranged in MIS descending order in the prefix-tree of MIS-PF-tree. We define the following variables. Let id l be the temporary array which records the tids of last occurring trans-

id l id l id l Bread 0 0 0 Bread 1 1 1 Bread 2 1 2 0 0 0 0 0 0 Bread 1 2 2 Pen 3 0 0 0 Pen 3 0 0 0 Pen 3 1 2 2 Jam 3 0 0 0 Jam 3 1 1 1 Jam:1 Jam 3 1 1 1 Bat 3 0 0 0 Bat 3 0 0 0 Bat 3 1 2 2 Bed 2 0 0 0 Bed 2 0 0 0 Bed 2 0 0 0 Pill. 2 0 0 0 Pill. 2 0 0 0 Pill. 2 0 0 0 Book 2 0 0 0 Book 2 0 0 0 Book 2 0 0 0 Jam:1 Bread Pen Bat:2 (a) (b) (c) id l Bread 6 9 Bread 6 5 10 Bread Pen 5 Pen 3 2 2 3 Pen 3 2 7 Jam 3 9 Jam:1,,5,9 Bed Bed Bat:8 Jam 3 Bat 3 10 Bat 3 Bed 2 2 7 Pen Bat:6 Pillow:3 Pillow:7 Book:10 Bed 2 2 Pill. 2 2 7 Pill. 2 2 Book 2 1 10 10 Bat:2 Book 2 1 10 (d) (e) Figure 1: Construction of MIS-PF-list and MIS-PF-tree. (a) Before scanning before scanning dataset (b) After scanning first transaction (c) After scanning second transaction (d) After scanning complete dataset (e) Updated MIS-PF-list after scanning complete dataset. The term Pill. refers to the item pillow actions of the items in the MIS-PF-list. Let t cur and p cur denote the tid of current transaction and the most recent period for an item. The construction of MIS-PF-tree is as follows. Before scanning the transaction dataset, the MIS- PF-list is constructed by adding the items in descending order of their MIS values and by setting f = 0 and p = 0. In the prefix-tree, a root node is created and labeled as null. The id l value of every item are set to 0. Let the ordered list of items in the MIS-PF-list be L. Next, in each transaction t cur, identify the items in it. For each item in the respective transaction, perform the following steps in the MIS-PF-list: (i) increment f by 1 (ii) calculate p cur as t cur id l and update p = p cur, if p cur > p (iii) set id l = t cur. For the same transaction a branch is created in the prefix-tree and the tid of the respective transaction is added to tid-list of the tail node represented by this transaction. (The creation procedure of a branch in the prefix-tree is same as that in FP-tree; however, we do not maintain the frequency value at each node.) After scanning all the transactions, update the p value of every item in the MIS- PF-list by calculating the p cur value with t cur equivalent to the tid of the last transaction in the transaction dataset. To facilitate tree traversal, an item header table is built so that each item points to its occurrences in the tree via a chain of node-links. We explain the construction of MIS-PF-tree by considering the dataset shown in Table 1. The initial MIS-PF-tree is shown in Figure 1 (a). The MIS-PF-tree generated after scanning first, second and the last transaction are shown in Figure 1(b), Figure 1(c) and Figure 1(d). Figure 1(e) shows the updated MIS-PF-list after scanning the complete transaction dataset. 3.2.2 Constructing the compact MIS-PF-tree The items in the MIS-PF-tree can be pruned using the following observations. If the support of an item is less than the lowest MIS value among all periodic-frequent items, such item will not generate any periodic-frequent pattern. If the periodicity of an item is greater than maxper, such item will not generate any periodic-frequent pattern. In the MIS-PF-list, we collect the set of all periodicfrequent items and identify the lowest MIS value among them. Let us call this value as periodic lowest minimum item support (PLMIS). In the MIS-PF-list, items which have support < PLMIS or periodicity > maxper are identified and pruned from the MIS-PF-tree one after the another because they do not generate any periodic-frequent pattern. The procedure followed for pruning each item from the MIS-PF-tree is as follows. In the MIS-PF-list, completely remove the respective item from the list. In the prefix-tree, perform tree-pruning operation to remove all nodes of the respective item. If a pruned node has a tid-list then tid-list is transferred to its respective parent node. After tree-pruning operation, the resultant prefix-tree may have parent nodes, where each parent node may have multiple child nodes of a same item. Therefore, treemerging operation is performed on the prefix-tree to merge such child nodes. The tree-merging operation involves generating a common branch by merging the tid-lists of different child nodes having same item. The resultant MIS- PF-tree derived after tree-pruning and tree-merging operations is called compact MIS-PF-tree. The MIS-PF-list in the compact MIS-PF-tree is called compact MIS-PF-list. Continuing with the example, the periodic-frequent items in the MIS-PF-list are bread, ball, jam, bat, bed and pillow. The lowest MIS value among all these periodicfrequent items is 2. Therefore, using PLMIS = 2 and Maxper =, we prune the items book and pen from the MIS-PF-list and prefix-tree of MIS-PF-tree. Figure 2(a) and Figure 2(b) show the MIS-PF-tree generated after

pruning item book and item pen. It can be observed in Figure 2(b) that there exists two different branches, one with tid-list = 2 and another with tid-list = 6, sharing same set of items. So we perform tree-merging operation to merge such branches. The compact MIS-PF-tree derived after tree-merging operation is shown in Figure 3(a). Bread 6 5 Bread Pen Pen Bat 3 3 2 8 Jam Bed 3 2 2 Jam:1,,5,9 Pen Bat:6 Bed Pillow:3 Bed Pillow:7 Bat:8,10 Pill. 2 2 Bat:2 Bread 6 5 Bread Bed Jam 3 Bat 3 Jam:1,,5,9 Pillow:3 Bed Bat:8,10 Bed 2 2 Pill. 2 2 Bat:2 Bat:6 Pillow:7 (b) Figure 2: MIS-PF-tree. (a) After pruning item book and (b) After pruning item pen. (a) 3.2.3 Mining the compact MIS-PF-tree Mining the compact MIS-PF-tree is similar to the mining of PF-tree []. However, the difference is we use the MIS value of the suffix item as the minsup for all of its conditional pattern bases. Continuing with the example, mining the periodicfrequent patterns from the compact MIS-PF-tree shown in Figure 3(a) is as follows. The item pillow which is at the bottom-most of the MIS-PF-list is initially chosen for mining periodic-frequent patterns. Its prefix-tree, MIS-PF pillow, shown in Figure 3(b) is constructed with the prefix sub-paths of nodes labeled with item pillow. From MIS-PF pillow, the conditional-tree, MIS-CT pillow (see Figure 3(c)) is constructed by removing all nonperiodic frequent nodes. From MIS-CT pillow, the pattern {bed, pillow} is generated as periodic-frequent pattern because S(bread, jam) MIS(pillow) and P(bread, jam) maxper. To enable construction of prefix-tree for the remaining items in the compact MIS-PF-list, the tid-lists of the item pillow are pushed-up to respective parents nodes in the original compact MIS-PF-tree and all nodes of item pillow in compact MIS-PF-tree are deleted thereafter. Similarly process is carried out for remaining items in the compact MIS-PF-tree. Finally, the periodic-frequent patterns generated are {{bread}, {ball}, {jam}, {bat}, {bed}, {pillow}, {bread, jam}, {bat, ball}, {bed, pillow}}. Experimental Results In this section, we present the performance comparison of the proposed approach with the approach discussed in []. We have evaluated the performance results pertaining to the Synthetic Retail dataset dataset SD = λ(1-β) SD = λ(1-β) β = 0.8 0.25% 0.07% β = 0.6 0.50% 0.1% β = 0. 0.75% 0.21% β = 0.2 1.00% 0.28% Table 2: Support difference values used in synthetic and retail datasets. generation of periodic-frequent patterns by considering two kinds of datasets: synthetic and real world datasets. The synthetic dataset T10.I.D100K, is generated with the data generator [2], which is widely used for evaluating association rule mining algorithms. It contains 1,00,000 number of transactions and 886 items. Another dataset is a real world dataset referred as retail dataset [9]. It contains 88,162 number of transactions and 16,70 items. For our experiments, we used the method discussed in [7] for specifying the MIS values for the items (see Equation 3). MIS(i j ) = S(i j ) SD i f (S(i j ) SD) > LS (3) = LS otherwise SD = λ(1 β) where, S(i j ) refers to support of an item i j I, LS refers to user-specified least support value, MIS(i j ) refers to minimum item support for an item i j I, SD refers to userspecified support difference value, λ represents the parameter like mean, median and mode of the item supports and β [0,1]. For both T10.I.D100k and Retail datasets, the MIS values are calculated by fixing LS value at 0.1% and varying the SD values. The SD values are calculated with λ representing the mean of the items having support values no less than the LS (0.1%) value and β varying at 0.2, 0., 0.6 and 0.8. The derived SD values for synthetic and retail datasets are shown in Table 2. For both T10.I.D100k and Retail datasets, the maximum periodicity (maxper) has been set to 2.5% and 5% respectively. Both Figure (a) and Figure (b) show how the number of periodic-frequent patterns varied at different SD values and maxper = 0.05%. In these figures, the X-axis represents the SD (β) value and the Y-axis represents the number of periodic-frequent patterns generated. The think line in this graph represents the number of periodic-frequent patterns generated using the approach discussed in [], with and maxper = 0.05%. It can be observed that the number of periodic-frequent patterns is significantly reduced by our method with low SD value. When SD value becomes larger, the number of periodic-frequent patterns found by our method gets closer to that found by the approach discussed in []. The reason is, when SD becomes larger more and more items MIS values reach LS. Both Figure (c) and Figure (d) show the runtime taken for generating periodic-frequent patterns at different SD

Bread 6 5 Bread Bed Jam 3 Bat 3 Jam:1,,5,9 Pillow:3 Bed 2 2 Pill. 2 2 Bat:2,6 (a) Bed Bat:8,10 Pillow:7 Bread 1 7 Bed 2 2 (b) Bed:3 Bed:7 Bed 2 2 (c) Bed:3,7 Figure 3: Mining Compact MIS-PF-tree. (a) Compact MIS-PF-tree generated after tree-merging operation (b) Prefix-tree for item pillow (PT pillow ) and (c) Conditional-tree for item pillow (CT pillow ). Periodic-frequent patterns 8000 6000 000 2000 Periodic-frequent patterns 000 3500 2500 2000 1500 1000 Runtime (sec) 200 150 100 50 Runtime (sec) 350 300 250 200 150 100 ( β = 0.8 ) ( β = 0.6) ( β= 0. ) ( β= 0.2) SD = 0.25 SD = 0.5 SD = 0.75 SD = 1.0 (a) ( β = 0.8 ) ( β = 0.6 ) ( β= 0. ) SD = 0.07 SD = 0.1 SD = 0.21 (b) ( β= 0.2) ( β = 0.8 ) ( β = 0.6) ( β= 0. ) ( β= 0.2) ( β = 0.8 ) ( β = 0.6 ) ( β= 0. ) ( β= 0.2) SD = 0.28 SD = 0.25 SD = 0.5 SD = 0.75 SD = 1.0 SD = 0.07 SD = 0.1 SD = 0.21 SD = 0.28 (c) (d) Figure : Periodic-frequent patterns generated at different SD values in (a) Synthetic and (b) Retail datasets. Runtime at different SD values in (c) Synthetic and (d) Retail datasets. values and maxper = 0.05%. The think line in this graph represents the runtime of the approach discussed in []. It can be observed that as the SD value increases, runtime of the proposed approach also increases and gets closer to that of []. The reason is, when SD increases, more number of periodic-frequent patterns are generated. 5 Conclusions and Future Work In this paper, we have proposed an approach to extract rare periodic-frequent patterns. The existing periodic-frequent pattern mining approach suffer from the rare item problem while extracting rare periodic-frequent patterns. We extended multiple minimum support framework which was proposed to extract rare frequent patterns to extract rare periodic-frequent patterns by incorporating the notion of periodicity and proposed a new pattern-growth algorithm to extract rare periodic-frequent patterns. Experiment results on both synthetic and real-world datasets show that the proposed approach is efficient. In this approach, we have extended multiple minimum support framework to only support variable of the periodic-frequent pattern approach. It can be noted that there is a scope to improve the performance by extending multiple minimum support framework to the periodicity variable of the periodic-frequent pattern approach. We are planning to investigate it as a part of future work. References [1] G. M. Weiss, Mining With Rarity: A Unifying Framework, ACM Special Interest Group on Knowledge Discovery and Data Mining Explorations, 200. [2] Agrawal, R., Imielinski, T., and Swami, A. Mining association rules between sets of items in large databases. SIGMOD, 1993, pp. 207-216. [3] Jiawei, H., Jian, P., Yiwen, Y., and Runying, M. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach*., ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 200, pp. 53-87. [] Tanbeer, S. K., Ahmed, C. F., Jeong, B., and Lee, Y. Discovering Periodic-Frequent Patterns in Transactional Databases, Pacific Asia Knowledge Discovery in Databases, 2009. [5] Liu, B., Hsu, W., and Ma, Y. Mining Association Rules with Multiple Minimum Supports. SIGKDD Explorations, 1999. [6] Ya-Han Hu, and Yen-Liang Chen. Mining Association Rules with Multiple Minimum Supports: A New Algorithm and a Support Tuning Mechanism, Decision Support Systems, 2006, Volume 2, Issue 1, pp. 1-2. [7] Uday Kiran, R., and Krishna Reddy, P. An Improved Multiple Minimum Support Based Approach to Mine Rare Association Rules, IEEE Symposium on Computational Intelligence and Data Mining, 2009. [8] Uday Kiran, R., and Krishna Reddy, P. An Improved Frequent Pattern-growth Approach To Discover Rare Association rules, International Conference on Knowledge Discovery and Information Retrieval, 2009. [9] Brijs T., Swinnen G., Vanhoof K., and Wets G., The use of association rules for product assortment decisions - a case study, Knowledge Discovery and Data Mining, 1999.