Cyclic Association Rules: Coupling Multiple Levels and Parallel Dimension Hierarchies


Eya Ben Ahmed, Ahlem Nabli and Faïez Gargouri

Abstract

Data warehouses contain massive volumes of historicized data defined over a set of dimensions and aggregated through multiple levels of granularity. Although many analysis tools help users navigate through those granularity levels, few works exploit the features of the multidimensional model to derive regular, well-fitting knowledge. In this paper, we take advantage of the different dimensions and their parallel levels of granularity to propose a new mining method for extracting cyclic patterns from data cubes. To this end, the definitions and the dedicated algorithm are extended from ordinary cyclic patterns to this particular context. Experiments are reported, showing the significance of our approach.

I. INTRODUCTION

In the last decade, several works have addressed the mining of association rules from data cubes to explain the relationships among multidimensional data. Most of the generated association rules benefit from the features of multidimensional data, i.e., dimensions, measures and concept hierarchies. However, deriving strong associations among data at low levels of abstraction is a laborious task in the multidimensional space because of data sparsity. Multi-level association rules (MLAR) address this problem: they make it possible to mine association rules at multiple levels of abstraction and to traverse easily among different abstraction spaces. To mine MLAR, concept hierarchies must be provided for generalizing primitive-level concepts to high-level ones. Unfortunately, mostly simple hierarchies are used when mining association rules from data cubes. A simple hierarchy is one in which the relationship between the members of the dimension can be represented by a single tree.
Nevertheless, in real situations, a dimension can be aggregated according to several analysis criteria, so that its granularity levels form more than one hierarchy. Taking this analysis context into account during the mining process makes it possible to explore this variety of dimensional views and leads to more specific rules that fit the user's expectations.

In this paper, we focus on cyclic patterns, i.e., rules that occur in user-defined intervals at regular periods. Our main aim is to generalize the use of concept hierarchies for dimensions during the mining process. The main idea behind our approach is to combine the multiple levels forming the concept hierarchies with the parallel concept hierarchies, which express several granularities of a given dimension depending on the analysis context. Hence, we provide a comprehensive framework for the extraction of multi-level hybrid cyclic patterns.

The remainder of the paper is organized as follows. Section 2 introduces a motivating example illustrating our contribution. In section 3, we present a survey of related works. We briefly define the foundations of our method in section 4. We describe our algorithm MIHYCAR for mining multi-level hybrid cyclic patterns in section 5. Through experiments carried out on a real data warehouse, we assess the performance of our approach in section 6. Finally, section 7 concludes by summarizing the strengths of our contribution and sketching future research directions.

II. MOTIVATING EXAMPLE

To illustrate our contribution, we consider the sales data cube depicted in figure 1 and defined over three dimensions: the Time T of the transactions, the Item I that was bought, and the Point Of Sale POS where the item was bought.

Fig. 1. Sales data cube.

The dimensional concept hierarchies of the data cube are given in the following.
Figure 2 illustrates the concept hierarchies of both the Time dimension and the Item dimension. Such concept hierarchies are known as simple hierarchies because their members can be represented using only one tree. The Point of Sale dimension, however, is described using two hierarchies, as depicted in figure 4: the first hierarchy is POS -> City -> Country -> All, and the second is POS -> Sales Group Division -> Sales Group Region -> All. These hierarchies on the Point of Sale dimension account for different analysis criteria: the member values of Point of Sale can be analyzed by geographic location or by organizational structure. Such hierarchies are mutually non-exclusive, i.e., it is possible to compute aggregates grouped by geographic location, by organizational structure, or by both (see figure 2).

Fig. 2. Concept hierarchies of the time and item dimensions.

In such a context, the expert needs to analyze the cyclic correlation between an item such as Astradol and its point of sale, described through its sales group division such as SGDIV1 and its geographic position such as Tunis. This correlation is cyclic: it is repeated every month in the sales data cube (see Table I).

Fig. 3. Parallel concept hierarchies associated to the point of sale dimension.

We aim at building rules that combine several dimensions, where each dimension is organized using simple or parallel hierarchies depending on the user's analysis requirements.

Fig. 4. Parallel concept hierarchies of the point of sale dimension.

Thus, the derived patterns answer different analytical purposes and help explain the correlations within the multi-faceted data.

Time (T)    Item (I)    Point Of Sale (POS)
Jan 2010    Astradol    PosBardo
Feb 2010    Astradol    PosBardo
Mar 2010    Astradol    PosBardo
Apr 2010    Astradol    PosBardo
May 2010    Clarid      PosMarsa
Jun 2010    Clarid      PosMarsa

TABLE I. TABLE T

III. RELATED WORKS

In this section, we focus on the research works closely related to cyclic pattern extraction and multidimensional association rule mining.

A. Cyclic patterns

The extraction of cyclic association rules (CAR) is a major issue in the data mining field. It was introduced by Ozden et al. (1998). It involves mining association rules from items characterized by their regular variation over time. Such association rules can highlight daily, weekly, quarterly or annual regular variations, which are naturally cyclic. Discovering such regularities in the behavior of association rules allows marketers, for example, to better identify sales trends and to predict future demand.
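The notion of a rule that recurs at regular periods can be made concrete with a small sketch. It assumes the formulation usually attributed to Ozden et al., where a cycle is a pair (length, offset) and a rule has that cycle if it holds in every `length`-th time segment starting at `offset`. The function names and the boolean sequence `holds` are illustrative assumptions, not part of the paper.

```python
def has_cycle(holds, length, offset):
    """Return True if the rule holds in every `length`-th segment from `offset`.

    `holds[i]` says whether the rule holds in time segment i (assumed input).
    """
    positions = range(offset, len(holds), length)
    return len(positions) > 0 and all(holds[i] for i in positions)


def find_cycles(holds, max_length):
    """List every (length, offset) cycle with length <= max_length."""
    return [
        (length, offset)
        for length in range(1, max_length + 1)
        for offset in range(length)
        if has_cycle(holds, length, offset)
    ]


# Hypothetical rule that holds in every second segment.
holds = [True, False, True, False, True, False]
print(find_cycles(holds, 3))  # [(2, 0)]
```

The sketch checks each candidate cycle independently; the algorithms cited in this section prune that search instead of enumerating it.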
The transactional data to be analyzed are time-stamped, and the user specifies time intervals that divide the data into disjoint segments. Generally, users opt for a natural segmentation based on months, weeks, days, etc. Indeed, users are best placed to make such a decision based on their understanding of the data.

We briefly present the basic concepts related to cyclic patterns. As in the market basket problem, the databases used for cyclic pattern extraction hold three closely related pieces of data: an identifier of the customer, a list of products, and the date on which the customer bought this set of products. The database is thus composed of itemsets identified by date and customer ID. A cycle is a period of time characterized by its length (a month in our case). The database is therefore considered as a set of cycles of fixed length specified by the user. A cyclic item is a value of an attribute that is repeated cyclically according to the length of the cycle (e.g., Astradol occurs each month of 2007). A cyclic itemset is a set of cyclic items. For example, (Astradol, Clarid) is a cyclic itemset if it appears during both the first and the second quarter.

The crucial challenge for CAR mining algorithms is the efficient extraction of the frequent cyclic patterns. Several algorithms have been proposed, such as INTERLEAVED and SEQUENTIAL introduced in [11], MTP presented by Thuan [14], [15], Chiang's method combining cyclic and sequential patterns [5], and PCAR, proposed in [3]. These proposals rely on the generate-and-prune paradigm, in which candidates are generated and the infrequent ones are then pruned.

B. Multi-dimensional association rules mining

In surveying multidimensional association rules, we focus on the hierarchical aspect.

Fig. 5. Comparison of cyclic and multidimensional association rules approaches. Criteria: Temporality (non-temporal, sequential, cyclic); Dimension (intra-dimensional, inter-dimensional, hybrid); Hierarchy (single-level, multi-level); Constraint (constraint-based, without constraints). Methods compared: (Kamber et al., 1997), (Zhu, 1998), (Ozden et al., 1998), (Imielinski et al., 1999), (Thuan, 2004, 2008), (Tjioe and Taniar, 2005), (Ben Messaoud et al., 2006), (Chiang et al., 2009), (Plantevit et al., 2010), (Ben Ahmed and Gouider, 2010), (Ben Ahmed and Gargouri, 2010, 2011), (Our approach, 2011).

Based on this criterion, we can distinguish two types of rules: (i) single-level association rules, (ii) multi-level association rules.

1) Single-level association rules: Most related works neglect the multiple granularity levels of the dimensions. Kamber et al. introduced the mining of association rules from data warehouses [10]. In [16], based on the number of dimensions and predicates involved in the association rule, Zhu introduces three classes of association rule mining: (i) intra-dimensional (association within one dimension), (ii) inter-dimensional (association among a set of dimensions), and (iii) hybrid (association among a set of dimensions with some items belonging to the same dimension). Ben Ahmed and Gargouri study CAR mining from several dimensions [2]. Subsequently, Ben Ahmed et al. involve the measures during CAR extraction from data cubes [1].

2) Multiple-level association rules: The approach of Imielinski et al. is the first work dealing with multi-level association rules over data warehouses. Then, Tjioe and Taniar present a method for extracting association rules from multiple dimensions within several levels of abstraction [13]. Plantevit et al. take advantage of the different dimensions and levels of granularity to mine sequential patterns [12].
However, all these multi-level approaches consider only one concept hierarchy for each involved dimension. Yet some dimensions are associated with several hierarchies according to different analysis criteria. Such hierarchies are very frequent and are called parallel hierarchies. Only a few works handle association rule mining over parallel dimension hierarchies. To overcome this drawback, we investigate the use of such hierarchies to derive rules with repetitive predicates.

IV. FORMAL BACKGROUND

In this section, we introduce the basic notions and then present the key concepts used in the remainder of the paper.

A. Dimensions and hierarchies

Definition 1: (Concept hierarchy for dimension) A concept hierarchy for a dimension is a tree whose nodes are elements belonging to the domain of this dimension [9]. It is a set of binary relationships between dimension levels. A dimension level participating in a hierarchy is called a hierarchical level, or level for short. The sequence of these levels is called a hierarchical path, or path for short. The number of levels forming a path is called the path length. The first level of a hierarchical path is called the leaf and the last is called the root, generally denoted by ALL. The root represents the most generalized view of the data. The edges are considered as is-a relationships between members. Given two consecutive levels of a hierarchy, the higher level is called the parent and the lower level is called the child. Every instance of a level is called a member.

Example 1: The concept hierarchy of the Time dimension is depicted in figure 2. The ALL attribute is the root, Month is the child and 2011 is a member.

Several types of concept hierarchies for dimensions may be distinguished. In our context, we focus on parallel concept hierarchies.
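The vocabulary of Definition 1 (leaf, root, parent, child, path, path length) can be illustrated with a minimal sketch that stores a simple hierarchy as a child-to-parent map. The concrete members are assumptions: the paper's figure 2 is not reproduced here, so the sketch only uses the Astradol/Antibiotic relationship mentioned in the examples and places Antibiotic directly under ALL.

```python
# Child -> parent map for a simple (single-tree) hierarchy.
# Assumed members: only Astradol -> Antibiotic comes from the paper's examples.
ITEM_PARENT = {
    "Astradol": "Antibiotic",
    "Antibiotic": "ALL",
}


def hierarchical_path(member, parent):
    """Follow is-a edges from `member` up to the root and return the path."""
    path = [member]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


path = hierarchical_path("Astradol", ITEM_PARENT)
print(path)       # ['Astradol', 'Antibiotic', 'ALL']
print(len(path))  # path length: 3
```

A child-to-parent map is enough here because each member of a simple hierarchy has exactly one parent.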
Definition 2: (Parallel concept hierarchies for dimension) Parallel hierarchies arise when a dimension is associated with several hierarchies accounting for different analysis criteria. Such hierarchies can be independent or dependent. In parallel independent hierarchies, the different hierarchies do not share levels, i.e., they represent non-overlapping sets of hierarchies.

Example 2: An example of parallel concept hierarchies is depicted in figure 4. In the first concept hierarchy of the point of sale, each POS is mapped to its corresponding city, which is in turn mapped to its corresponding country. In the second concept hierarchy, each POS is mapped to a sales group division, which is mapped to a sales group region.

In this setting, we propose our key concepts.

B. Dimensions Partition

We place ourselves in a multidimensional context. The three pieces of data needed for cyclic mining in the classical context (Customer, Product, Date) become sets in a multidimensional context. We consider that the table T, related to the sales data and defined on a set D of n dimensions, is partitioned into two sets: the context dimensions $D_C$, which are the investigated dimensions; and the out-of-context dimensions $\overline{D_C}$, which are the remaining, uninvestigated (complementary) dimensions.

The context dimensions can be divided into three subcategories: (i) the temporal dimension $D_T$, which introduces a temporal order relation (the date in the classical context); (ii) the reference dimensions $D_R$, according to whose values the table is segmented (the customer in the classical context); and (iii) the analysis dimensions $D_A = \{D_1, \dots, D_m\}$ with $D_i \subseteq Dom(D_i)$, which correspond to the products in the classical context and are the dimensions from which the cyclic correlations are extracted.

Example 3: In our running example shown in Table I, we consider the whole table as our context, composed of the context dimensions $D_C = \{T, I, POS\}$ with the temporal dimension $D_T = \{T\}$, the reference dimensions $D_R = \emptyset$ and the analysis dimensions $D_A = \{I, POS\}$.

C. Concept Hierarchies Partition

The analysis dimensions may be organized using one or more concept hierarchies. These can be partitioned into two sets: the context concept hierarchies $H_C$, i.e., the set of involved concept hierarchies related to the analysis dimensions $D_A$; and the out-of-context concept hierarchies $\overline{H_C}$, i.e., the set of unexplored concept hierarchies related to the analysis dimensions $D_A$.

Let $T_{D_A} = \{T^{1}_{D_{A_1}}, \dots, T^{n}_{D_{A_m}}\}$ be the set of the n concept hierarchies associated with the m analysis dimensions. The elements of the analysis dimension $D_{A_1}$ are summarized using k concept hierarchies organizing the hierarchical relationships between the elements of this dimension: $T_{D_{A_1}} = \{T^{1}_{D_{A_1}}, \dots, T^{k}_{D_{A_1}}\}$. We assume that the k-th concept hierarchy of the i-th analysis dimension, $T^{k}_{D_{A_i}}$, is an oriented tree such that for every node $n_i \in T^{k}_{D_{A_i}}$, $label(n_i) \in Dom(D_{A_i})$.

Example 4: In our running example, with respect to the concept hierarchies shown in figures 2 and 4, we consider $T_{D_A} = \{T_I, T^{1}_{POS}, T^{2}_{POS}\}$, where $T_I$ is illustrated in figure 2, $T^{1}_{POS}$ is depicted on the left side of figure 4 and $T^{2}_{POS}$ is shown on the right side of figure 4.

D. Generalization / Specialization in the concept hierarchies

We denote by $x^{\uparrow}$ (respectively $x^{\downarrow}$) the set containing x along with all generalizations (respectively specializations) of x with respect to $T_{D_{A_1}}$ that belong to $Dom(D_{A_1})$.
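As a hedged illustration of generalization and specialization sets over parallel hierarchies, the sketch below keeps one child-to-parent map per analysis criterion of the Point of Sale dimension. The member names PosBardo, Tunis, Tunisia and SGDIV1 come from the running example; the region name `SGREG1`, the map names `GEO` and `ORG`, and the function names are hypothetical.

```python
# Two parallel hierarchies over the same leaf level (Point of Sale).
GEO = {"PosBardo": "Tunis", "Tunis": "Tunisia", "Tunisia": "ALL"}
ORG = {"PosBardo": "SGDIV1", "SGDIV1": "SGREG1", "SGREG1": "ALL"}  # SGREG1 is hypothetical


def generalizations(x, parent):
    """Return x together with all of its ancestors in one hierarchy."""
    result = {x}
    while x in parent:
        x = parent[x]
        result.add(x)
    return result


def specializations(x, parent):
    """Return x together with all of its descendants in one hierarchy."""
    result = {x}
    frontier = [x]
    while frontier:
        node = frontier.pop()
        for child, par in parent.items():
            if par == node and child not in result:
                result.add(child)
                frontier.append(child)
    return result


print(sorted(generalizations("Tunis", GEO)))     # ['ALL', 'Tunis', 'Tunisia']
print(sorted(specializations("Tunis", GEO)))     # ['PosBardo', 'Tunis']
print(sorted(generalizations("PosBardo", ORG)))  # ['ALL', 'PosBardo', 'SGDIV1', 'SGREG1']
```

Keeping the hierarchies in separate maps means the same leaf (PosBardo) can be generalized along either criterion without the two paths interfering.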
Each analysis dimension $D_{A_i}$ is instantiated using only one value $d_{A_i}$, considered as a node with a leaf label in the k-th concept hierarchy associated with this dimension.

Example 5: In our running example shown in figure 4, we consider $x = Tunis \in T^{2}_{POS}$; the specialization of x is $x^{\downarrow} = Tunis^{\downarrow} = PosBardo$ and the generalization of x is $x^{\uparrow} = Tunis^{\uparrow} = Tunisia$.

E. Multi-level Dimensional Cyclic Item and Multi-level Hybrid Cyclic Itemset

Definition 3: (Multi-level dimensional cyclic item) Let $D_A = \{D_1, \dots, D_m\}$ be the analysis dimensions and l a cycle length. A multi-level dimensional cyclic item $\alpha$ is an item belonging to one of the analysis dimensions, namely $D_k$, and having the value $d_k$ at the date t and at the date t + l, with $d_k \in \{T_{D_k}\}$ and such that $\forall k \in [1, m]$, $d_k \in Dom(D_k)$. Unlike in transactional databases, a multi-level dimensional cyclic item can be generalized using any node value associated with $d_k$ in the concept hierarchy, without necessarily being a leaf.

Example 6: A typical example of a multi-level dimensional cyclic item, in the multidimensional context shown in Table I and with the context delimitation considered previously, is $\alpha$ = (PosBardo): it belongs to the POS dimension, which is one of the analysis dimensions, and its value PosBardo belongs to the POS domain and is repeated each month of the first quarter.

Definition 4: (Multi-level hybrid cyclic itemset) A multi-level hybrid cyclic itemset F defined on $D_A = \{D_1, \dots, D_m\}$ is a nonempty set of multi-level dimensional cyclic items $F = \{\alpha_1, \dots, \alpha_m\}$ where, $\forall j \in [1, m]$, $\alpha_j$ is a multi-level dimensional cyclic item defined on $D_j$ at the date t and repeated at each date t + l, with $\forall j, k \in [1, m]$, $j \neq k$, $\alpha_j \neq \alpha_k$.

Example 7: An example of a multi-level hybrid cyclic itemset is F = [Astradol, PosBardo], because it is composed of two multi-level dimensional cyclic items, i.e., $\alpha_1$ = (Astradol) and $\alpha_2$ = (PosBardo). It is repeated monthly during the first quarter.

F. Connectivity of multi-level hybrid cyclic itemsets

We study the connectivity by examining the different relationships that may exist between multi-level hybrid cyclic itemsets. Given two multi-level hybrid cyclic itemsets $F = (d_1, \dots, d_m)$ and $G = (d'_1, \dots, d'_m)$, two types of connectivity between those itemsets are considered: 1) connected multi-level hybrid cyclic itemsets; 2) disconnected multi-level hybrid cyclic itemsets.

Definition 5: (Disconnected multi-level hybrid cyclic itemsets) F and G are disconnected iff they do not belong to the same concept hierarchies.

Example 8: F = SGDIV1 and G = Tunisia are disconnected because they do not belong to the same concept hierarchy: $F = SGDIV1 \in T^{1}_{POS}$ and $G = Tunisia \in T^{2}_{POS}$.

Definition 6: (Connected multi-level hybrid cyclic itemsets) F and G are connected iff they belong to the same concept hierarchies.

Example 9: F = PosBardo and G = Tunisia are connected because they belong to the same concept hierarchy.

If the multi-level hybrid cyclic itemsets are connected, two classes of relationships may be outlined: 1) covered multi-level hybrid cyclic itemsets; 2) uncovered multi-level hybrid cyclic itemsets.

Definition 7: (Covered multi-level hybrid cyclic itemsets) F is covered by G iff $\forall d_i$, $d_i = d'_i$ or $d_i^{\uparrow} = d'_i$.

Example 10: F = [Astradol, PosBardo] is covered by G = [Antibiotic, Tunis] because $Tunis = PosBardo^{\uparrow}$ and $Antibiotic = Astradol^{\uparrow}$.

Definition 8: (Uncovered multi-level hybrid cyclic itemsets) F is uncovered by G iff $\exists d_i$, $d_i^{\uparrow} \neq d'_i$.

Example 11: F = [Astradol, PosBardo] is uncovered by G = [Antiviral, Tunis] because $Astradol^{\uparrow} \neq Antiviral$.

If two multi-level hybrid cyclic itemsets are covered, two possible relationships may be highlighted: 1) adjacent multi-level hybrid cyclic itemsets; 2) non-adjacent multi-level hybrid cyclic itemsets.

Definition 9: (Adjacent multi-level hybrid cyclic itemsets) Let F belong to the hierarchical level n. G is considered as an adjacent multi-level hybrid cyclic itemset of F iff G belongs to the level n - 1 or the level n + 1 of the same concept hierarchy.

Example 12: F = [Astradol, PosBardo], belonging to level 1, is adjacent to G = [Antibiotic, Tunis] because G belongs to level 2 in the concept hierarchies depicted in both figure 2 and figure 4.

Definition 10: (Non-adjacent multi-level hybrid cyclic itemsets) Let F belong to the hierarchical level n. G is considered as a non-adjacent multi-level hybrid cyclic itemset of F iff G belongs neither to the level n - 1 nor to the level n + 1 of the same concept hierarchy.

Example 13: F = [Astradol, PosBardo] belongs to level 1 and is not adjacent to G = [Africa, Therapeutic] because G belongs to level 3 of the concept hierarchies, which is not level 2.

G. Support of multi-level hybrid cyclic itemset

Definition 11: (Support of multi-level hybrid cyclic itemset) The support of a multi-level hybrid cyclic itemset F, denoted Supp(F), is the number of tuples that contain the itemset: $Supp(F) = COUNT(F)$.

Example 14: Consider the context shown in Table I and the delimitation already presented. The multi-level hybrid cyclic itemset F = (Antibiotics, PosBardo, SGDIV1) has an absolute support equal to the number of sales of products classified as Antibiotics and sold in PosBardo, within the first sales group division SGDIV1:
$Supp(Antibiotics, PosBardo, SGDIV1) = COUNT(I = Antibiotics, POS = PosBardo \wedge SGDIV1) = 4$

H. Support and Confidence Computing of Multi-level Hybrid Cyclic Rule

Definition 12: (Support of multi-level hybrid cyclic rule) The support of the rule $R: F \Rightarrow G$, denoted Supp(R), is the ratio of the number of tuples that contain F and G to the total number of tuples in the sub-cube:
$Supp(R) = \frac{COUNT(F \cup G)}{COUNT(ALL, ALL)}$
The support of R satisfies $Supp(R) \in [0, 1]$.

Definition 13: (Confidence of multi-level hybrid cyclic rule) The confidence of the rule $R: F \Rightarrow G$, denoted conf(R), is the ratio of the number of tuples that contain F and G to the number of tuples that contain F in the sub-cube:
$conf(R) = \frac{Supp(R)}{Supp(F)}$
The confidence of R satisfies $conf(R) \in [0, 1]$.

Example 15: In our running example, the rule $R: Antibiotics, PosBardo \Rightarrow SGDIV1$ has:
$Supp(R) = COUNT(I = Antibiotics, POS = PosBardo \wedge SGDIV1) = 4$
$conf(R) = \frac{COUNT(I = Astradol, POS = PosBardo \wedge SGDIV1)}{COUNT(I = Astradol, POS = PosBardo)} = \frac{4}{4} = 1$

I. MIHYCAR: A Method for MultI-level HYbrid Cyclic Association Rules

This section introduces a method for mining multi-level hybrid cyclic association rules. It uses a multi-dimensional data cube encoded with hierarchy information instead of the classical data cube, since encoding the relevant data is advantageous. Each encoded predicate string has the form [d-h-l-k], where d is the dimension, h is the concept hierarchy of the dimension d, l indicates the abstraction level in the concept hierarchy, and k represents the size of the itemset. For example, Tunisia is encoded using the string [3-2-3-1], where 3 represents the Point of Sale dimension, 2 represents the second concept hierarchy, 3 describes the level of abstraction in the concept hierarchy and 1 represents a 1-item. Such an encoding requires fewer bits than the corresponding object identifier or bar code. To illustrate the encoding method, we present an abstract example that simulates the real-life example, illustrated in Table II.
Item        Encoded Item    POS         Encoded POS
Astradol    [ ]             PosBardo    [3-*-2-1]
Clarid      [ ]             PosMarsa    [3-*-2-1]

TABLE II. ENCODED DATA CUBE T

The mining of multi-level hybrid cyclic rules is performed using our algorithm MIHYCAR, which proceeds as follows.

Notation                        Description
SC                              Sub-cube
lc                              Length of cycle
d                               Current dimension
nd                              Number of dimensions
l                               Current level
h                               Current hierarchy of the dimension
depth                           Depth of the current concept hierarchy
D_t                             Date t
Minsupp                         Minimum support threshold
C[d,h,l,k] (resp. F[d,h,l,k])   Set of candidates (resp. frequents) from the dimension d belonging to the hierarchy h and the level l having k itemsets
s                               Nonempty subset s of F_i
Supp(C)                         Support of the multi-level hybrid cyclic itemset C

TABLE III. LIST OF USED NOTATIONS IN THE MIHYCAR ALGORITHM.
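The [d-h-l-k] predicate encoding used by MIHYCAR can be sketched as a pair of helper functions. This is only an illustration under assumptions: the function names are hypothetical, and the `*` that appears in the hierarchy position of Table II is treated as an opaque wildcard token because its exact meaning is not spelled out in the text.

```python
def encode(dimension, hierarchy, level, size):
    """Build a predicate string of the form [d-h-l-k]."""
    return "[" + "-".join(str(part) for part in (dimension, hierarchy, level, size)) + "]"


def decode(code):
    """Split a [d-h-l-k] string back into its four components.

    A '*' component is kept as the string '*'; every other component
    is converted to an integer.
    """
    parts = code.strip("[]").split("-")
    if len(parts) != 4:
        raise ValueError(f"expected four components in {code!r}")
    return tuple(part if part == "*" else int(part) for part in parts)


# Components listed in the text for Tunisia: dimension 3, hierarchy 2, level 3, 1-item.
print(encode(3, 2, 3, 1))   # [3-2-3-1]
print(decode("[3-*-2-1]"))  # (3, '*', 2, 1)
```

A compact string like this can serve directly as a dictionary key when counting supports per dimension, hierarchy and level.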

Algorithm 1: MIHYCAR: MultI-level HYbrid Cyclic Association Rules
Data: SC, Minsupp
Result: Multiple-level frequent itemsets
begin
    // initialization
    d = 1; h = 1; l = 1;
    F[d,h,l,1] = Find 1-frequent cyclic itemsets(SC, l, D_t, Minsupp);
    for (d = 1; d <= nd; d++) do                    // scan of dimensions
        for (h = 1; h < depth; h++) do              // scan of concept hierarchies of each dimension
            for (l = 1; F[d,h,l,1] ≠ ∅; l++) do     // scan of concept hierarchy levels of each dimension
                for (k = 2; F[d,h,l,k-1] ≠ ∅; k++) do
                    C[d,h,l,k] = CandidatGeneration(F[d,h,l,k-1]);
                    if C[d,h,l,k] is a hybrid cyclic itemset then
                        foreach transaction T ∈ SC at date D_t do
                            C[d,h,l,t] = subset(C[d,h,l,k], T);
                            foreach candidate c ∈ C[d,h,l,t] do
                                c.support = SupportComputing(SC, l, D_t, c);
                    F[d,h,l,k] = { c ∈ C[d,h,l,k] | c.support > Minsupp };
    Return F[d,h,l] = ∪_k F[d,h,l,k];
end

Starting at dimension 1, we scan all the concept hierarchies related to the first dimension. Thus, for each level l, we derive the frequent multi-level dimensional cyclic items F[1,h,l,1]. A multi-level dimensional cyclic item is frequent if it is cyclic with respect to the cycle length specified by the user and its cyclic occurrences exceed the minimum support threshold (see procedure SupportComputing). For each level l, we extract the frequent multi-level hybrid cyclic itemsets. Only the descendants of the frequent multi-level hybrid cyclic itemsets at level l are considered as candidates for the frequent itemsets of level l + 1. A scan of the dimensions is then performed. We apply the anti-monotonicity property, which states that all the super-itemsets of a non-frequent itemset are necessarily non-frequent. This property is projected onto the multi-level granularity space in order to substantially reduce the search space.
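The level-wise principle described above can be sketched in a few lines. This is a deliberately simplified illustration and not the MIHYCAR algorithm itself: it mines one abstraction level at a time, it does not implement the top-down restriction to descendants of frequent itemsets, and it assumes that the sub-cube has already been grouped into one set of values per time period. The function names, the `periods` structure and the partial parent map are assumptions.

```python
from itertools import combinations


def cyclic_support(itemset, periods, cycle_len):
    """Count the periods t where the itemset occurs at t and again at t + cycle_len."""
    return sum(
        1
        for t in range(len(periods) - cycle_len)
        if itemset <= periods[t] and itemset <= periods[t + cycle_len]
    )


def frequent_cyclic_itemsets(periods, cycle_len, min_supp):
    """Apriori-style search: generate candidates, keep those above min_supp."""
    current = [frozenset([v]) for v in sorted(set().union(*periods))]
    frequent = {}
    while current:
        level = {c: cyclic_support(c, periods, cycle_len) for c in current}
        level = {c: s for c, s in level.items() if s > min_supp}
        frequent.update(level)
        # Join step with anti-monotonicity pruning: a candidate is kept only
        # if every subset one item smaller is itself frequent.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in level for sub in combinations(union, len(a))
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


def generalize(periods, parent):
    """Replace each value by its parent (one level up); unmapped values are kept."""
    return [{parent.get(v, v) for v in period} for period in periods]


# Table I as one set of values per month (Jan to Jun 2010).
periods = [{"Astradol", "PosBardo"}] * 4 + [{"Clarid", "PosMarsa"}] * 2
leaf_level = frequent_cyclic_itemsets(periods, cycle_len=1, min_supp=2)
print(leaf_level[frozenset({"Astradol", "PosBardo"})])  # 3

# Assumed partial parent map, taken from the covering example in the text.
upper = generalize(periods, {"Astradol": "Antibiotic", "PosBardo": "Tunis"})
print(frozenset({"Antibiotic", "Tunis"}) in frequent_cyclic_itemsets(upper, 1, 2))  # True
```

With a monthly cycle, the pair (Astradol, PosBardo) recurs from January to April, which gives three consecutive repetitions and therefore a cyclic support of 3 under this counting convention.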
After finding the frequent multi-level hybrid cyclic itemsets, the set of multi-level hybrid cyclic association rules can be derived according to the minimum confidence threshold MinConf. An example of a generated rule is R: Antibiotics, PosBardo ⇒ SGDIV1.

Function Find 1-frequent cyclic itemsets (SC, l, D_t, Minsupp)
Result: F_1
begin
    while (!End of tuples in SC) do
        foreach transaction T ∈ SC do
            foreach item α ∈ T do
                foreach transaction T' ∈ SC at date D_{t+l} do
                    Supp(α) = COUNT(α);
                if (Supp(α) > Minsupp) then
                    F[d,h,l,1] = F[d,h,l,1] ∪ α;
    Return F[d,h,l,1];
end

Function SupportComputing (SC, l, D_t, C)
Result: Supp(C)
begin
    NoMoreCyclic: Boolean;
    NoMoreCyclic = false;
    while ((!End of tuples in SC) and (!NoMoreCyclic)) do
        C[d,h,l,k] = CandidatGeneration(C[d,h,l,k-1]);
        foreach transaction T ∈ SC at date D_{t+l} do
            if C exists in T then
                Supp(C) = Supp(C) + 1;
            else
                NoMoreCyclic = true;
    Return Supp(C);
end

V. EXPERIMENTAL STUDY

All experiments were carried out on a PC equipped with a 1.73 GHz processor and 1 GB of main memory. In the following, we report experiments performed on a real sales data warehouse¹, which contains three dimensions (the Time dimension, the Item dimension and the Point of Sale dimension) and one sales fact table. The data warehouse is built using relational OLAP (ROLAP) and is modeled as a star schema, which contains dimension tables for the hierarchies and a fact table for the dimensional attributes and measures.

Our objective is to show, through our experimental study: (i) the performance of our algorithm according to the length of cycle and the number of analysis dimensions; (ii) the impact of the hierarchical aspect, with respect to the number of involved concept hierarchies and the average depth of those hierarchies.

Figure 6.(a) plots the runtime needed to generate multi-level hybrid cyclic association rules with respect to the length of cycle.
In terms of efficiency, this figure shows that the running time decreases as the length of cycle increases.

¹ The data warehouse is related to a listed pharmaceutical company. It is built using the information available at http ://

Fig. 6. Performance of our algorithm (runtime in seconds) with respect to the (a) length of cycle (series: 2, 4, 8), (b) number of dimensions (series: 2, 3, 4), (c) number of concept hierarchies (series: 3, 4, 5), (d) average depth of the concept hierarchies (series: 2, 3, 4).

Figure 6.(b) describes the behavior of our approach in terms of runtime according to the number of analysis dimensions. We observe that the slopes of the three plots increase when the number of analysis dimensions increases. Indeed, with more analysis dimensions, more concept hierarchies are included, so the number of generated patterns increases considerably.

In the last experiments, we compare the runtime needed to generate multi-level hybrid cyclic rules over the number of involved concept hierarchies and the average depth of the concept hierarchies, also called the specialization level. As shown in figure 6.(c), the number of concept hierarchies related to the analysis dimensions strongly influences the performance of our algorithm: when more hierarchies are taken into account through parallel hierarchies, the runtime of our algorithm increases significantly. Figure 6.(d) shows the number of generated multi-level hybrid cyclic association rules over the depth of the concept hierarchies. Increasing the size of the concept hierarchies brings additional specialization levels. Accordingly, our algorithm mines fewer frequent patterns, until it cannot mine any more knowledge.

VI. CONCLUSIONS AND FUTURE WORKS

We have extended the scope of mining cyclic association rules from a single level to multiple concept levels and studied methods for mining multiple-level hybrid cyclic association rules from data warehouses. Mining such patterns may lead to the progressive mining of refined knowledge from data. In particular, our method extracts patterns with respect to parallel concept hierarchies of dimensions, which organize the attribute values into different levels of abstraction related to different analysis criteria. The performance study underlines the utility of our approach.

Future extensions of our method address the following issues: (i) developing efficient algorithms for mining multiple-level hybrid cyclic rules across levels of the concept hierarchies; (ii) taking into account the independence of the parallel concept hierarchies, by considering both dependent and independent concept hierarchies; (iii) reducing the redundant rules and filtering out the uninteresting patterns.

REFERENCES

[1] E. Ben Ahmed, A. Nabli and F. Gargouri, Usage des mesures pour la génération des règles d'associations cycliques, 7ème conférence francophone sur les entrepôts de données et l'analyse en ligne (EDA'11), 2011, to appear.
[2] E. Ben Ahmed and F. Gargouri, Règles d'association cycliques dans un contexte multidimensionnel, Atelier des Systèmes Décisionnels (ASD'10), Tunisia.
[3] E. Ben Ahmed and M.S. Gouider, Towards a new mechanism of extracting cyclic association rules based on partition aspect, IEEE International Conference on Research Challenges in Information Science, pp. 69-78, 2010.
[4] R. Ben Messaoud, O. Boussaid, S. Loudcher Rabaséda and R. Missaoui, Enhanced mining of association rules from data cubes, Proceedings of the 9th ACM International Workshop on Data Warehousing and OLAP (DOLAP 2006), pp. 11-18.
[5] D. Chiang, C. Wang, S. Chen and C. Chen, The Cyclic Model Analysis on Sequential Patterns, IEEE Trans. on Knowl. and Data Eng.
[6] G. Dong, J. Han, J. Lam, J. Pei, K. Wang and W. Zou, Mining Constrained Gradients in Large Databases, IEEE Transactions on Knowledge and Data Engineering.
[7] J. Han, W. Gong and Y. Yin, Mining Segment-Wise Periodic Patterns in Time-Related Databases, KDD.
[8] J. Han, W. Gong and Y. Yin, Efficient Mining of Partial Periodic Patterns in Time Series Database, ICDE, 1999.
[9] C.S. Jensen, T.B. Pedersen and C. Thomsen, Multidimensional Databases and Data Warehousing.
[10] M. Kamber, J. Han and J.Y. Chiang, Metarule-guided mining of multidimensional association rules using data cubes, Proceedings of the 1997 International Conference on Knowledge Discovery and Data Mining (KDD'97), 1997.
[11] B. Ozden, S. Ramaswamy and A. Silberschatz, Cyclic Association Rules, Proceedings of the Fourteenth International Conference on Data Engineering.
[12] M. Plantevit, A. Laurent, D. Laurent, M. Teisseire and Y. Choong, Mining multidimensional and multilevel sequential patterns, ACM Transactions on Knowledge Discovery from Data, 2010.
[13] H.C. Tjioe and D. Taniar, Mining Association Rules in Data Warehouses, IJDWM, pp. 28-62.
[14] N.D. Thuan, Mining Cyclic Association Rules in Temporal Database, The Journal Science and Technology Development, Vietnam National University, pp. 12-19.
[15] N.D. Thuan, Mining Time Pattern Association Rules in Temporal Database, SCSS, pp. 7-11.
[16] H. Zhu, On-line analytical mining of association rules, Master's thesis, Simon Fraser University, Burnaby, British Columbia, Canada, 1998.
