Cyclic Association Rules: Coupling Multiple Levels and Parallel Dimension Hierarchies

Eya Ben Ahmed, Ahlem Nabli and Faïez Gargouri

Abstract

Data warehouses contain massive volumes of historicized data defined over a set of dimensions and aggregated through multiple levels of granularity. Although many analysis tools exist for navigating through those granularity levels, few works exploit the features of the multidimensional model to derive regularly recurring knowledge. In this paper, we take advantage of the different dimensions and their parallel levels of granularity to propose a new method for mining cyclic patterns from data cubes. To this end, the definitions and the dedicated algorithm are extended from ordinary cyclic patterns to this particular context. Experiments are reported, showing the significance of our approach.

I. INTRODUCTION

In the last decade, several works have addressed the mining of association rules from data cubes to explain the relationships among multidimensional data. Most of the generated association rules benefit from the features of multidimensional data, i.e., dimensions, measures and concept hierarchies. However, deriving strong associations among data at low levels of abstraction is difficult in the multidimensional space because of data sparsity. Multi-level association rules (MLAR) address this difficulty: they make it possible to mine association rules at multiple levels of abstraction and to move easily among different abstraction spaces. To mine MLAR, concept hierarchies must be provided for generalizing primitive-level concepts to higher-level ones. Unfortunately, mostly simple hierarchies are used when mining association rules from data cubes. In a simple hierarchy, the relationship between the members of the dimension can be represented by a single tree.
Nevertheless, in real situations, a dimension can be aggregated according to several analysis criteria, so that its granularity levels form more than one hierarchy. Taking this analysis context into account during the mining process makes it possible to explore this variety of dimensional analysis views, leading to more specific rules that fit the user's expectations. In this paper, we focus on cyclic patterns, which aim to discover rules that occur at regular periods in user-defined intervals. Our main claim is to generalize the use of concept hierarchies for dimensions during the mining process. The main idea behind our approach is to combine the multiple levels forming the concept hierarchies with the parallel concept hierarchies that express several granularities of a given dimension depending on the analysis context. Hence, we provide a comprehensive framework for multi-level hybrid cyclic pattern extraction.

The remainder of the paper is organized as follows. Section 2 introduces a motivating example illustrating our contribution. In section 3, we survey related works. We briefly define the foundations of our method in section 4. We describe our algorithm MIHYCAR for multi-level hybrid cyclic pattern mining in section 5. Through extensive experiments performed on a real data warehouse, we assess the performance of our approach in section 6. Finally, section 7 concludes by summarizing the strengths of our contribution and sketches future research directions.

II. MOTIVATING EXAMPLE

To illustrate our contribution, we consider the sales data cube depicted in figure 1 and defined over three dimensions: the Time T of the transactions, the Item I that was bought, and the Point Of Sale POS where the item was bought.

Fig. 1. Sales data cube.

The dimensional concept hierarchies of the data cube are given in the following.
Figure 2 illustrates the concept hierarchies for both the Time dimension and the Item dimension. Such concept hierarchies are known as simple hierarchies because their members can be represented using only one tree. The Point of sale dimension, in contrast, is described using two hierarchies, as depicted in figure 4: the first is POS -> City -> Country -> All, and the second is POS -> Sales Group Division -> Sales Group Region -> All. These hierarchies on the Point of sale dimension account for different analysis criteria: for example, the member values of Point of sale can be analyzed by geographic location or by organization structure. Such hierarchies are mutually non-exclusive, i.e., it is possible to compute the aggregates grouped by geographic location and/or organization structure (see figure 2).

Fig. 2. Concept hierarchies of the time and item dimensions.

The expert in such a context needs to analyze the cyclic correlation between an item such as Astradol and its point of sale, described through its sales group division, such as SGDIV1, and its geographic position, such as Tunis. This correlation is cyclic: it is repeated every month in the sales data cube (see Table I).

Fig. 3. Parallel concept hierarchies associated to the point of sale dimension.

We aim at building rules that combine several dimensions, where each dimension is organized using simple or parallel hierarchies depending on the user's analysis requirements.

Fig. 4. Parallel concept hierarchies of the point of sale dimension.

Thus, the derived patterns answer different analytical purposes and help explain the correlations within the multi-faceted data.

Time (T)    Item (I)    Point Of Sale (POS)
Jan 2010    Astradol    PosBardo
Feb 2010    Astradol    PosBardo
Mar 2010    Astradol    PosBardo
Apr 2010    Astradol    PosBardo
May 2010    Clarid      PosMarsa
Jun 2010    Clarid      PosMarsa

TABLE I. TABLE T

III. RELATED WORKS

In this section, we focus on the research works closely related to cyclic pattern extraction and multidimensional association rules mining.

A. Cyclic patterns

The extraction of cyclic association rules (CAR) is a major issue in the data mining field. It was introduced by Ozden et al. (1998). It consists in mining association rules from items characterized by their regular variation over time. Indeed, these association rules can highlight daily, weekly, quarterly or annual regular variations, which are naturally cyclic. Discovering such regularities in the behavior of association rules allows marketers, for example, to better identify sales trends and to provide a relevant prediction of future requests.
The transactional data under analysis are time-stamped, and time intervals are specified by the user to divide the data into disjoint segments. Generally, users opt for a natural segmentation of the data based on months, weeks, days, etc. Indeed, users are best placed to make such a decision based on their understanding of the data.

We briefly present the basic concepts related to cyclic patterns. As in the closely related market-basket problem, the databases used for cyclic pattern extraction hold three kinds of data: the first is a customer identifier, the second is a list of products, and the third is the date on which the customer bought this set of products. The database is therefore composed of itemsets identified by date and customer ID. A cycle is a period in time characterized by its length (a month in our case). The database is thus considered as a set of cycles of fixed length specified by the user. A cyclic item is a value assigned to an attribute that is repeated cyclically according to the length of the cycle (e.g., Astradol occurs each month of 2007). A cyclic itemset is a set of cyclic items. For example, (Astradol, Clarid) is a cyclic itemset if it appears during both the first and the second quarter.

The crucial challenge of CAR mining algorithms is the efficient extraction of the frequent cyclic patterns. Several algorithms were proposed, such as INTERLEAVED and SEQUENTIAL introduced in [11], MTP presented by Thuan [14], [15], Chiang's method combining cyclic and sequential patterns [5], and PCAR proposed in [3]. These proposals rely on the generate-and-prune paradigm, where candidates are generated and infrequent ones are then pruned.

B. Multi-dimensional association rules mining

Our survey of multidimensional association rules focuses on the hierarchical aspect.

Fig. 5. Comparison of cyclic and multidimensional association rules approaches. [Table: the approaches of (Kamber et al., 1997), (Zhu, 1998), (Ozden et al., 1998), (Imielinski et al., 1999), (Thuan, 2004, 2008), (Tjioe and Taniar, 2005), (Ben Messaoud et al., 2006), (Chiang et al., 2009), (Plantevit et al., 2010), (Ben Ahmed and Gouider, 2010), (Ben Ahmed and Gargouri, 2010, 2011) and (Our approach, 2011) are compared along four criteria: temporality (non-temporal, sequential, cyclic), dimension (intra-dimensional, inter-dimensional, hybrid), hierarchy (single-level, multi-level) and constraint (constraint-based, without constraints).]

Based on this criterion, we can distinguish two types of rules: (i) single-level association rules, (ii) multi-level association rules.

1) Single-level association rules: Most related works neglect the multiple granularity levels of the dimensions. Kamber et al. introduced the mining of association rules from data warehouses [10]. In [16], based on the number of dimensions and predicates involved in the association rule, Zhu introduces three classes of association rules: (i) intra-dimensional (association within one dimension), (ii) inter-dimensional (association among a set of dimensions), and (iii) hybrid association mining (association among a set of dimensions with some items belonging to the same dimension). Ben Ahmed and Gargouri study CAR mining from several dimensions [2]. After that, Ben Ahmed et al. involve the measures during CAR extraction from data cubes [1].

2) Multiple-level association rules: The approach of Imielinski et al. is the first work dealing with multi-level association rules over data warehouses. Then, Tjioe and Taniar present a method for extracting association rules from multiple dimensions within several levels of abstraction [13]. Plantevit et al. take advantage of the different dimensions and levels of granularity to mine sequential patterns [12].
However, all the multi-level association rule approaches consider only one concept hierarchy per involved dimension. Yet some dimensions have several hierarchies, each corresponding to a different analysis criterion. Such hierarchies are very frequent and are called parallel hierarchies. Only a few works handle association rules mining from parallel dimensional hierarchies. To overcome this drawback, we investigate the involvement of such hierarchies in deriving rules with repetitive predicates.

IV. FORMAL BACKGROUND

In this section, we introduce the basic notions and then present the key concepts that are used in the remainder of the paper.

A. Dimensions and hierarchies

Definition 1: (Concept hierarchy for dimension) A concept hierarchy for a dimension is a tree whose nodes are elements belonging to the domain of this dimension [9]. It is a set of binary relationships between dimension levels. A dimension level participating in a hierarchy is called a hierarchical level, or level for short. The sequence of these levels is called a hierarchical path, or path for short. The number of levels forming a path is called the path length. The first level of a hierarchical path is called the leaf and the last is called the root, generally denoted by ALL. The root represents the most generalized view of the data. The edges are considered as is-a relationships between members. Given two consecutive levels of a hierarchy, the higher level is called the parent and the lower level is called the child. Every instance of a level is called a member.

Example 1: The concept hierarchy of the Time dimension is depicted in figure 2. The ALL attribute is the root, Month is a child level and 2011 is a member.

Several types of concept hierarchies for dimensions exist. In our context, we focus on parallel concept hierarchies.
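As a rough illustration of Definition 1, a concept hierarchy can be stored as a child-to-parent map, and a hierarchical path recovered by following the is-a edges from a leaf up to the root. The sketch below is in Python; the names `GEO_PARENT` and `hierarchical_path` are hypothetical and introduced here only for illustration, and the members are taken from the geographic hierarchy of the point of sale dimension in the running example.

```python
# Child -> parent ("is-a") edges of one concept hierarchy of the POS dimension.
# Only members named in the running example are listed.
GEO_PARENT = {"PosBardo": "Tunis", "Tunis": "Tunisia", "Tunisia": "ALL"}


def hierarchical_path(member, parent):
    """Follow the is-a edges from a member up to the root of the hierarchy."""
    path = [member]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


path = hierarchical_path("PosBardo", GEO_PARENT)
print(path)       # ['PosBardo', 'Tunis', 'Tunisia', 'ALL']
print(len(path))  # path length: 4 levels, from the leaf to the root ALL
```

A parent map keeps each hierarchy a tree by construction, since every member has at most one parent in a given map.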
Definition 2: (Parallel concept hierarchies for dimension) Parallel hierarchies arise when a dimension has several associated hierarchies accounting for different analysis criteria. Such hierarchies can be independent or dependent. In parallel independent hierarchies, the different hierarchies do not share levels, i.e., they represent non-overlapping sets of hierarchies.

Example 2: An example of parallel concept hierarchies is depicted in figure 4. In the first concept hierarchy of the point of sale, each POS is mapped to its corresponding city, which is in turn mapped to its corresponding country. In the second concept hierarchy, each POS is mapped to a sales group division, which is mapped to a sales group region.

In this setting, we propose our key concepts.

B. Dimensions Partition

We place ourselves in a multidimensional context. The three kinds of data needed for cyclic mining in the classical context (customer, product, date) become sets of dimensions in the multidimensional context. We consider that the table T, related to the sales data issued by customers and defined on a set D of n dimensions, is partitioned into two sets: the context dimensions $D_C$, which are the investigated dimensions; and the out-of-context dimensions $\overline{D_C}$, which are the remaining, uninvestigated (complementary) dimensions. The context dimensions can be divided into three subcategories: (i) the temporal dimension $D_T$, which introduces a temporal order relation (the date in the classical context); (ii) the reference dimensions $D_R$, according to whose values the table is segmented (the customer in the classical context); and (iii) the analysis dimensions $D_A = \{D_1, \ldots, D_m\}$, where each $D_i$ takes its values in $\mathrm{Dom}(D_i)$; they correspond to the products in the classical context and are the dimensions from which the cyclic correlations are extracted.

Example 3: In our running example shown in Table I, we consider the whole table as our context, composed of the context dimensions $D_C = \{T, I, POS\}$, with the temporal dimension $D_T = \{T\}$, the reference dimensions $D_R = \emptyset$ and the analysis dimensions $D_A = \{I, POS\}$.

C. Concept Hierarchies Partition

The analysis dimensions may be organized using one or more concept hierarchies. The latter can be partitioned into two sets: the context concept hierarchies $H_C$, which are the involved concept hierarchies related to the analysis dimensions $D_A$; and the out-of-context concept hierarchies $\overline{H_C}$, which are the unexplored concept hierarchies related to the analysis dimensions $D_A$. Let $T_{D_A} = \{T_{D_{A_1}}, \ldots, T_{D_{A_m}}\}$ be the set of the n concept hierarchies associated with the m analysis dimensions. The elements of the analysis dimension $D_{A_1}$ are summarized using k concept hierarchies organizing the hierarchical relationships between the elements of this dimension: $T_{D_{A_1}} = \{T^{1}_{D_{A_1}}, \ldots, T^{k}_{D_{A_1}}\}$. We assume that the k-th concept hierarchy of the i-th analysis dimension, $T^{k}_{D_{A_i}}$, is an oriented tree; $\forall$ node $n_i \in T^{k}_{D_{A_i}}$, $\mathrm{label}(n_i) \in \mathrm{Dom}(D_{A_i})$.

Example 4: In our running example, with respect to the concept hierarchies shown in figures 2 and 4, we consider $T_{D_A} = \{T_I, T^{1}_{POS}, T^{2}_{POS}\}$, where $T_I$ is illustrated in figure 2, $T^{1}_{POS}$ is depicted on the left side of figure 4 and $T^{2}_{POS}$ is shown on the right side of figure 4.

D. Generalization / Specialization in the concept hierarchies

We denote by $x^{\uparrow}$ (respectively $x^{\downarrow}$) the set containing x along with all generalizations (respectively specializations) of x with respect to $T_{D_{A_1}}$ that belong to $\mathrm{Dom}(D_{A_1})$.
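The upward and downward sets just defined can be sketched as closures over a child-to-parent map. This is a minimal Python sketch under stated assumptions: `GEO_PARENT`, `up` and `down` are hypothetical names, and the sketch follows the set-based wording above, so x itself and the root ALL are included in the result. Restricting the result to the adjacent level only would be a different, narrower convention.

```python
# Child -> parent edges of the geographic POS hierarchy from the running example.
GEO_PARENT = {"PosBardo": "Tunis", "Tunis": "Tunisia", "Tunisia": "ALL"}


def up(x, parent):
    """x together with all of its generalizations (ancestors)."""
    result = {x}
    while x in parent:
        x = parent[x]
        result.add(x)
    return result


def down(x, parent):
    """x together with all of its specializations (descendants)."""
    result, frontier = {x}, [x]
    while frontier:
        node = frontier.pop()
        for child, par in parent.items():
            if par == node and child not in result:
                result.add(child)
                frontier.append(child)
    return result


print(sorted(up("Tunis", GEO_PARENT)))    # ['ALL', 'Tunis', 'Tunisia']
print(sorted(down("Tunis", GEO_PARENT)))  # ['PosBardo', 'Tunis']
```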
Each analysis dimension $D_{A_i}$ is instantiated using only one value $d_{A_i}$, considered as the node carrying the leaf label in the k-th concept hierarchy $T^{k}_{D_{A_i}}$ associated with the dimension.

Example 5: In our running example shown in figure 4, we consider $x = \mathrm{Tunis} \in T^{2}_{POS}$; the specialization of x is $x^{\downarrow} = \mathrm{Tunis}^{\downarrow} = \mathrm{PosBardo}$ and the generalization of x is $x^{\uparrow} = \mathrm{Tunis}^{\uparrow} = \mathrm{Tunisia}$.

E. Multi-level Dimensional Cyclic Item and Multi-level Hybrid Cyclic Itemset

Definition 3: (Multi-level dimensional cyclic item) Let $D_A = \{D_1, \ldots, D_m\}$ be the analysis dimensions and l a cycle length. A multi-level dimensional cyclic item $\alpha$ is an item belonging to one of the analysis dimensions, namely $D_k$, and having the value $d_k$ at the date t and at the date t + l, with $d_k \in T_{D_k}$ and such that $\forall k \in [1, m]$, $d_k \in \mathrm{Dom}(D_k)$. Unlike in transactional databases, a multi-level dimensional cyclic item can be generalized using any node associated with $d_k$ in the k-th concept hierarchy; it is not necessarily a leaf.

Example 6: A typical example of a multi-level dimensional cyclic item, in the multidimensional context shown in Table I and with the context delimitation given previously, is $\alpha$ = (PosBardo): it belongs to the POS dimension, which is one of the analysis dimensions, and its value PosBardo belongs to the POS domain and is repeated each month of the first quarter.

Definition 4: (Multi-level hybrid cyclic itemset) A multi-level hybrid cyclic itemset F defined on $D_A = \{D_1, \ldots, D_m\}$ is a nonempty set of multi-level dimensional cyclic items $F = \{\alpha_1, \ldots, \alpha_m\}$ where, $\forall j \in [1, m]$, $\alpha_j$ is a multi-level dimensional cyclic item defined on $D_j$ at the date t and repeated at each date t + l, and where, $\forall j, k \in [1, m]$ with $j \neq k$, $\alpha_j \neq \alpha_k$.

Example 7: An example of a multi-level hybrid cyclic itemset is F=[Astradol, PosBardo], because it is composed of two multi-level dimensional cyclic items, $\alpha_1$ = (Astradol) and $\alpha_2$ = (PosBardo). It is repeated monthly during the first quarter.

F. Connectivity of multi-level hybrid cyclic itemsets

We study connectivity by examining the different relationships that may exist between multi-level hybrid cyclic itemsets. Given two multi-level hybrid cyclic itemsets $F = (d_1, \ldots, d_m)$ and $G = (d'_1, \ldots, d'_m)$, two types of connectivity between those itemsets are considered: 1) connected multi-level hybrid cyclic itemsets; 2) disconnected multi-level hybrid cyclic itemsets.

Definition 5: (Disconnected multi-level hybrid cyclic itemsets) F and G are disconnected iff they do not belong to the same concept hierarchies.

Example 8: F=SGDIV1 and G=Tunisia are disconnected because they do not belong to the same concept hierarchies: $F = \mathrm{SGDIV1} \in T^{1}_{POS}$ and $G = \mathrm{Tunisia} \in T^{2}_{POS}$.

Definition 6: (Connected multi-level hybrid cyclic itemsets) F and G are connected iff they belong to the same concept hierarchies.

Example 9: F=PosBardo and G=Tunisia are connected because they belong to the same concept hierarchy.

If the multi-level hybrid cyclic itemsets are connected, two classes of relationships may be outlined: 1) covered multi-level hybrid cyclic itemsets; 2) uncovered multi-level hybrid cyclic itemsets.

Definition 7: (Covered multi-level hybrid cyclic itemsets) F is covered by G iff $\forall i$, $d'_i = d_i$ or $d'_i = d_i^{\uparrow}$.

Example 10: F=[Astradol,PosBardo] is covered by G=[Antibiotic,Tunis] because $\mathrm{Tunis} = \mathrm{PosBardo}^{\uparrow}$ and $\mathrm{Antibiotic} = \mathrm{Astradol}^{\uparrow}$.

Definition 8: (Uncovered multi-level hybrid cyclic itemsets) F is uncovered by G iff $\exists i$ such that $d'_i \neq d_i^{\uparrow}$.
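The connectivity and covering tests of Definitions 5 to 8 can be sketched as simple membership checks. The Python sketch below is illustrative only: `HIERARCHIES`, `connected`, `generalizations` and `covered` are hypothetical names, the maps hold only the members named in the running example, and "covered" is read here as "each value of G is the corresponding value of F or one of its generalizations", which is one plausible reading of Definition 7.

```python
# Child -> parent edges per concept hierarchy (members from the running example only).
HIERARCHIES = {
    "T_I": {"Astradol": "Antibiotic"},
    "T1_POS": {"PosBardo": "SGDIV1"},
    "T2_POS": {"PosBardo": "Tunis", "Tunis": "Tunisia"},
}


def members(parent):
    """All members appearing in one hierarchy."""
    return set(parent) | set(parent.values())


def connected(x, y):
    """True when x and y appear together in at least one concept hierarchy."""
    return any(x in members(p) and y in members(p) for p in HIERARCHIES.values())


def generalizations(x):
    """x together with its ancestors in every hierarchy that contains it."""
    found = {x}
    for parent in HIERARCHIES.values():
        node = x
        while node in parent:
            node = parent[node]
            found.add(node)
    return found


def covered(f, g):
    """True when every value of g equals or generalizes the matching value of f."""
    return all(dg in generalizations(df) for df, dg in zip(f, g))


print(connected("SGDIV1", "Tunisia"))    # False: different POS hierarchies
print(connected("PosBardo", "Tunisia"))  # True: both in the geographic hierarchy
print(covered(["Astradol", "PosBardo"], ["Antibiotic", "Tunis"]))  # True
print(covered(["Astradol", "PosBardo"], ["Antiviral", "Tunis"]))   # False
```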

Example 11: F=[Astradol,PosBardo] is uncovered by G=[Antiviral,Tunis] because $\mathrm{Astradol}^{\uparrow} \neq \mathrm{Antiviral}$.

If two multi-level hybrid cyclic itemsets are covered, two possible relationships may be highlighted: 1) adjacent multi-level hybrid cyclic itemsets; 2) non-adjacent multi-level hybrid cyclic itemsets.

Definition 9: (Adjacent multi-level hybrid cyclic itemsets) Let F belong to the hierarchical level n. G is considered an adjacent multi-level hybrid cyclic itemset of F iff G belongs to level n - 1 or level n + 1 of the same concept hierarchy.

Example 12: F=[Astradol,PosBardo], belonging to the 1-level, is adjacent to G=[Antibiotic,Tunis] because G belongs to the 2-level in the concept hierarchies depicted in both figure 2 and figure 4.

Definition 10: (Non-adjacent multi-level hybrid cyclic itemsets) Let F belong to the hierarchical level n. G is considered a non-adjacent multi-level hybrid cyclic itemset of F iff G belongs neither to level n - 1 nor to level n + 1 of the same concept hierarchy.

Example 13: F=[Astradol,PosBardo] belongs to the 1-level and is not adjacent to G=[Africa,Therapeutic] because G belongs to the 3-level in the concept hierarchies, which is not the 2-level.

G. Support of multi-level hybrid cyclic itemset

Definition 11: (Support of multi-level hybrid cyclic itemset) The support of a multi-level hybrid cyclic itemset F, denoted Supp(F), is the number of tuples that contain the itemset: Supp(F) = COUNT(F).

Example 14: Consider the context shown in Table I and the delimitation already presented. The multi-level hybrid cyclic itemset F=(Antibiotics,PosBardo,SGDIV1) has an absolute support corresponding to the sales of the products considered as Antibiotics that are sold in the first sales group division SGDIV1 at PosBardo: Supp(Antibiotics,PosBardo,SGDIV1) = COUNT(I = Antibiotics, POS = PosBardo ∧ SGDIV1) = 4.

H. Support and Confidence Computing of Multi-level Hybrid Cyclic Rule

Definition 12: (Support of multi-level hybrid cyclic rule) The support of the rule $R: F \Rightarrow G$, denoted Supp(R), is equal to the ratio of the number of tuples that contain F and G to the total number of tuples in the sub-cube:

$\mathrm{Supp}(R) = \frac{\mathrm{COUNT}(F \cup G)}{\mathrm{COUNT}(ALL, ALL)}$, with $\mathrm{Supp}(R) \in [0, 1]$.

Definition 13: (Confidence of multi-level hybrid cyclic rule) The confidence of the rule $R: F \Rightarrow G$, denoted conf(R), is equal to the ratio of the number of tuples that contain F and G to the number of tuples that contain F in the sub-cube:

$\mathrm{conf}(R) = \frac{\mathrm{Supp}(R)}{\mathrm{Supp}(F)}$, with $\mathrm{conf}(R) \in [0, 1]$,

where both supports are expressed in the same unit (both as counts or both as ratios).

Example 15: In our running example, the rule R: Antibiotics, PosBardo ⇒ SGDIV1 has an absolute support COUNT(I = Antibiotics, POS = PosBardo ∧ SGDIV1) = 4 and a confidence conf(R) = COUNT(I = Antibiotics, POS = PosBardo ∧ SGDIV1) / COUNT(I = Antibiotics, POS = PosBardo) = 4/4 = 1.

I. MIHYCAR: A Method for MultI-level HYbrid Cyclic Association Rules

This section introduces a method for mining multi-level hybrid cyclic association rules, which uses a multi-dimensional data cube encoded with hierarchy information instead of the classical data cube. Indeed, it is advantageous to encode the relevant data. The encoded predicate string has the form [d-h-l-k], where d is the dimension, h is the concept hierarchy of the dimension d, l indicates the abstraction level in the concept hierarchy, and k represents the number of items in the itemset. For example, Tunisia is encoded using the string [3-2-3-1], where the first 3 represents the Point of Sale dimension, 2 represents the second concept hierarchy, the second 3 describes the level of abstraction in the concept hierarchy, and 1 indicates a 1-item. Such an encoding requires fewer bits than the corresponding object identifier or bar code. To illustrate the encoding method, we present an abstract example that simulates the real-life example, as illustrated in Table II.
Item        Encoded Item    POS         Encoded POS
Astradol    [ ]             PosBardo    [3-*-2-1]
Clarid      [ ]             PosMarsa    [3-*-2-1]

TABLE II. ENCODED DATA CUBE T

The mining of multi-level hybrid cyclic rules is performed using our algorithm MIHYCAR, which proceeds as follows.

Notation        Description
SC              Sub-Cube
lc              Length of Cycle
d               Current dimension
nd              Number of dimensions
l               Current level
h               Current hierarchy of dimension
depth           Depth of the current concept hierarchy
D_t             Date t
MinSupp         Minimum Support Threshold
C[d,h,l,k]      Set of candidates from the dimension d belonging to the hierarchy h and the level l having k itemsets
F[d,h,l,k]      Set of frequents from the dimension d belonging to the hierarchy h and the level l having k itemsets
s               Nonempty subset s of F_i
Supp(C)         Support of the multi-level hybrid cyclic itemset C

TABLE III. LIST OF USED NOTATIONS IN THE MIHYCAR ALGORITHM.
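The [d-h-l-k] predicate string can be sketched as a small record with an encoder and a decoder. This is a minimal Python sketch, not the encoding routine used by MIHYCAR: `Code`, `encode` and `decode` are hypothetical names, and the `*` seen in Table II is assumed here to mean that no single hierarchy is designated (for instance, a member shared by all hierarchies of the dimension).

```python
from typing import NamedTuple, Optional


class Code(NamedTuple):
    d: int            # dimension
    h: Optional[int]  # concept hierarchy of the dimension; None stands for '*'
    l: int            # abstraction level in the concept hierarchy
    k: int            # number of items in the itemset


def encode(code):
    """Render a code as the bracketed string [d-h-l-k]."""
    h = "*" if code.h is None else str(code.h)
    return f"[{code.d}-{h}-{code.l}-{code.k}]"


def decode(text):
    """Parse a bracketed string [d-h-l-k] back into a Code."""
    d, h, l, k = text.strip("[]").split("-")
    return Code(int(d), None if h == "*" else int(h), int(l), int(k))


print(encode(Code(d=3, h=2, l=3, k=1)))  # [3-2-3-1]
print(decode("[3-*-2-1]"))               # Code(d=3, h=None, l=2, k=1)
```

Comparing two decoded codes field by field makes it cheap to test whether two members share a dimension, a hierarchy or a level without looking up their labels.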

Algorithm 1: MIHYCAR: MultI-level HYbrid Cyclic Association Rules
Data: SC, MinSupp
Result: Multiple-level frequent itemsets
begin
    // initialisation
    d = 1; h = 1; l = 1;
    F[d,h,l,1] = Find 1-frequent cyclic itemsets(SC, l, D_t, MinSupp);
    for (d = 1; d <= nd; d++) do                      // scan of dimensions
        for (h = 1; h < depth; h++) do                // scan of concept hierarchies of each dimension
            for (l = 1; F[d,h,l,1] ≠ ∅; l++) do       // scan of concept hierarchy levels of each dimension
                for (k = 2; F[d,h,l,k-1] ≠ ∅; k++) do
                    C[d,h,l,k] = CandidatGeneration(F[d,h,l,k-1]);
                    if C[d,h,l,k] is a hybrid cyclic itemset then
                        foreach transaction T ∈ SC at date D_t do
                            C[d,h,l,t] = subset(C[d,h,l,k], T);
                            foreach candidate c ∈ C[d,h,l,t] do
                                c.support = SupportComputing(SC, l, D_t, c);
                    F[d,h,l,k] = {c ∈ C[d,h,l,k] | c.support > MinSupp};
    Return F[d,h,l,k] = ∪_k F[d,h,l,k];
end

Starting at dimension 1, we scan all the concept hierarchies related to the first dimension. We thus derive, for each level l, the frequent multi-level dimensional cyclic items F[1,h,l,1]. A multi-level dimensional cyclic item is frequent if it is cyclic, i.e., if, with respect to the cycle length specified by the user, its cyclic occurrences exceed the minimum support threshold (see function SupportComputing). For each level l, we extract the frequent multi-level hybrid cyclic itemsets. Only the descendants of the frequent multi-level hybrid cyclic itemsets at level l are considered as candidates for the frequent itemsets at level l+1. A scan of the dimensions is then performed. We apply the anti-monotonicity property, which states that all super-itemsets of a non-frequent itemset are necessarily non-frequent. This property is projected onto the multi-level granularity space in order to considerably reduce the search space.
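The level-wise idea described above (count cyclic occurrences, keep the itemsets above the threshold, join them into larger candidates, then repeat one level higher in the hierarchies) can be sketched on Table I. This Python sketch is a simplified illustration and not the MIHYCAR algorithm itself: all function names are hypothetical, cycles are assumed to start at the first date, months are numbered 1 to 6, and only the generalizations named in the running example are used, with other members left unchanged when the data is lifted.

```python
from collections import defaultdict
from itertools import combinations

# Table I with months numbered 1..6: (date, item, point of sale).
ROWS = [
    (1, "Astradol", "PosBardo"), (2, "Astradol", "PosBardo"),
    (3, "Astradol", "PosBardo"), (4, "Astradol", "PosBardo"),
    (5, "Clarid", "PosMarsa"), (6, "Clarid", "PosMarsa"),
]
# One-level generalizations known from the running example.
PARENT = {"Astradol": "Antibiotic", "PosBardo": "Tunis"}


def lift(rows, parent):
    """Replace each member by its parent when one is known."""
    return [(date, *(parent.get(v, v) for v in values)) for date, *values in rows]


def cyclic_support(itemset, rows, cycle_length, start=1):
    """Count the dates start, start + l, start + 2l, ... that contain the itemset,
    stopping at the first date where it is missing."""
    by_date = defaultdict(list)
    for date, *values in rows:
        by_date[date].append(set(values))
    support, date = 0, start
    while any(itemset <= values for values in by_date.get(date, [])):
        support += 1
        date += cycle_length
    return support


def frequent_cyclic_itemsets(rows, cycle_length, min_supp):
    """Generate-and-prune search for itemsets whose cyclic support exceeds min_supp."""
    items = sorted({v for _, *values in rows for v in values})
    level = [frozenset([item]) for item in items]
    frequent = {}
    while level:
        kept = []
        for candidate in level:
            support = cyclic_support(candidate, rows, cycle_length)
            if support > min_supp:
                frequent[candidate] = support
                kept.append(candidate)
        # Join step: only unions of two frequent k-itemsets become (k+1)-candidates.
        level = list({a | b for a, b in combinations(kept, 2) if len(a | b) == len(a) + 1})
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent => consequent from absolute supports."""
    return frequent[antecedent | consequent] / frequent[antecedent]


level1 = frequent_cyclic_itemsets(ROWS, cycle_length=1, min_supp=2)
level2 = frequent_cyclic_itemsets(lift(ROWS, PARENT), cycle_length=1, min_supp=2)
print(level1[frozenset({"Astradol", "PosBardo"})])  # 4
print(level2[frozenset({"Antibiotic", "Tunis"})])   # 4
print(confidence(level1, frozenset({"Astradol"}), frozenset({"PosBardo"})))  # 1.0
```

Because the join step only combines itemsets that were kept, any candidate with an infrequent subset among the joined pair is never generated, which is the pruning effect the anti-monotonicity property is used for.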
After finding the frequent multi-level hybrid cyclic itemsets, the set of multi-level hybrid cyclic association rules can be derived according to the minimum confidence threshold MinConf. An example of a generated rule is R: Antibiotics, PosBardo ⇒ SGDIV1.

Function Find 1-frequent cyclic itemsets (SC, l, D_t, MinSupp)
Result: F_1
begin
    while (!End of tuples in SC) do
        foreach transaction T ∈ SC do
            foreach item α ∈ T do
                foreach transaction T ∈ SC at date D_{t+l} do
                    Supp(α) = COUNT(α);
                if (Supp(α) > MinSupp) then
                    F[d,h,l,1] = F[d,h,l,1] ∪ α;
    Return F[d,h,l,1];
end

Function SupportComputing (SC, l, D_t, C)
Result: Supp(C)
begin
    NoMoreCyclic: Boolean;
    NoMoreCyclic = false;
    while ((!End of tuples in SC) and (!NoMoreCyclic)) do
        C[d,h,l,k] = CandidatGeneration(C[d,h,l,k-1]);
        foreach transaction T ∈ SC at date D_{t+l} do
            if C exists in T then
                Supp(C) = Supp(C) + 1;
            else
                NoMoreCyclic = true;
    Return Supp(C);
end

V. EXPERIMENTAL STUDY

All experiments were carried out on a PC equipped with a 1.73 GHz processor and 1 GB of main memory. In the following, we report experiments performed on a real sales data warehouse (see footnote 1), which contains three dimensions (namely the Time dimension, the Item dimension and the point of sale dimension) and one sales fact table. The data warehouse is built using relational OLAP (ROLAP) and is modeled as a star schema, which contains dimension tables for the hierarchies and a fact table for the dimensional attributes and measures. Our objective is to show, through our experimental study: (i) the performance of our algorithm according to the length of cycle and the number of analysis dimensions; (ii) the impact of the hierarchical aspect with respect to the number of involved concept hierarchies and the average depth of those hierarchies.

Figure 6.(a) plots the runtime needed to generate multi-level hybrid cyclic association rules with respect to the length of cycle.
In terms of efficiency, this figure shows that the running time decreases as the length of cycle increases.

Footnote 1: The data warehouse is related to a listed pharmaceutical company. It is built using the available information at http ://

[Figure 6: four plots, each with Runtime (s) on the vertical axis. Legend entries: (a) Length of cycle = 2, 4, 8; (b) Number of dimensions = 2, 3, 4; (c) Number of hierarchies = 3, 4, 5; (d) Average depth of hierarchies = 2, 3, 4.]

Fig. 6. Performance of our algorithm with respect to the (a) length of cycle, (b) number of dimensions, (c) number of concept hierarchies, (d) average depth of the concept hierarchies.

Figure 6.(b) describes the behavior of our approach in terms of runtime according to the number of analysis dimensions. We observe that the slopes of the three plots increase when the number of analysis dimensions increases. Indeed, with more analysis dimensions, more concept hierarchies are included, so the number of generated patterns increases sharply.

In the last experiments, we compare the runtime needed to generate multi-level hybrid cyclic rules over the number of involved concept hierarchies and the average depth of the concept hierarchies, also called the specialization level. As shown in figure 6.(c), the number of concept hierarchies related to the analysis dimensions strongly influences the performance of our algorithm. When more hierarchies are taken into account through the involvement of parallel hierarchies, the runtime of our algorithm increases significantly. Figure 6.(d) shows the number of generated multi-level hybrid cyclic association rules over the depth of the concept hierarchies. Increasing the size of the concept hierarchies brings additional specialization levels. Accordingly, our algorithm mines fewer frequent patterns, until it cannot mine any more knowledge.

VI. CONCLUSIONS AND FUTURE WORKS

We have extended the scope of the study of mining cyclic association rules from a single level to multiple concept levels and studied methods for mining multiple-level hybrid cyclic association rules from data warehouses. Mining such patterns may lead to progressive mining of refined knowledge from data. Specifically, our method extracts patterns with respect to parallel concept hierarchies of dimensions, which organize the attribute values into different levels of abstraction related to different analysis criteria. The performance study underlines the utility of our approach. Future extensions of our method address the following issues: (i) developing efficient algorithms for mining multiple-level hybrid cyclic rules across levels of the concept hierarchies; (ii) taking into account the independence of the parallel concept hierarchies, by considering both dependent and independent concept hierarchies; (iii) reducing the redundant rules and filtering out the uninteresting patterns.

REFERENCES

[1] E. Ben Ahmed, A. Nabli and F. Gargouri, Usage des mesures pour la génération des règles d'associations cycliques, 7ème conférence francophone sur les entrepôts de données et l'analyse en ligne (EDA 11), 2011, to appear.
[2] E. Ben Ahmed and F. Gargouri, Règles d'association cycliques dans un contexte multidimensionnel, Atelier des Systèmes Décisionnels (ASD 10), Tunisia.
[3] E. Ben Ahmed and M. S. Gouider, Towards a new mechanism of extracting cyclic association rules based on partition aspect, IEEE International Conference on Research Challenges in Information Science, pp. 69-78, 2010.
[4] R. Ben Messaoud, O. Boussaid, S. L. Rabaséda and R. Missaoui, Enhanced mining of association rules from data cubes, Proceedings of the 9th ACM International Workshop on Data Warehousing and OLAP (DOLAP 2006), pp. 11-18.
[5] D. Chiang, C. Wang, S. Chen and C. Chen, The Cyclic Model Analysis on Sequential Patterns, IEEE Trans. on Knowl. and Data Eng.
[6] G. Dong, J. Han, J. Lam, J. Pei, K. Wang and W. Zou, Mining Constrained Gradients in Large Databases, IEEE Transactions on Knowledge Discovery and Data Engineering.
[7] J. Han, W. Gong and Y. Yin, Mining Segment-Wise Periodic Patterns in Time-Related Databases, KDD.
[8] J. Han, W. Gong and Y. Yin, Efficient Mining of Partial Periodic Patterns in Time Series Database, ICDE, 1999.
[9] C. S. Jensen, T. B. Pedersen and C. Thomsen, Multidimensional Databases and Data Warehousing.
[10] M. Kamber, J. Han and J. Y. Chiang, Metarule-guided mining of multidimensional association rules using data cubes, Proceedings of the 1997 International Conference on Knowledge Discovery and Data Mining (KDD 97), 1997.
[11] B. Ozden, S. Ramaswamy and A. Silberschatz, Cyclic Association Rules, Proceedings of the Fourteenth International Conference on Data Engineering.
[12] M. Plantevit, A. Laurent, D. Laurent, M. Teisseire and Y. Choong, Mining multidimensional and multilevel sequential patterns, ACM Transactions on Knowledge Discovery from Data, 2010.
[13] H. C. Tjioe and D. Taniar, Mining Association Rules in Data Warehouses, IJDWM, pp. 28-62.
[14] N. D. Thuan, Mining Cyclic Association Rules in Temporal Database, The Journal Science and Technology Development, Vietnam National University, pp. 12-19.
[15] N. D. Thuan, Mining Time Pattern Association Rules in Temporal Database, SCSS, pp. 7-11.
[16] H. Zhu, On-line analytical mining of association rules, Master's thesis, Simon Fraser University, Burnaby, British Columbia, Canada, 1998.


More information

Data Mining Part 3. Associations Rules

Data Mining Part 3. Associations Rules Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets

More information

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining Miss. Rituja M. Zagade Computer Engineering Department,JSPM,NTC RSSOER,Savitribai Phule Pune University Pune,India

More information

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Contents. Foreword to Second Edition. Acknowledgments About the Authors Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1

More information

Roadmap DB Sys. Design & Impl. Association rules - outline. Citations. Association rules - idea. Association rules - idea.

Roadmap DB Sys. Design & Impl. Association rules - outline. Citations. Association rules - idea. Association rules - idea. 15-721 DB Sys. Design & Impl. Association Rules Christos Faloutsos www.cs.cmu.edu/~christos Roadmap 1) Roots: System R and Ingres... 7) Data Analysis - data mining datacubes and OLAP classifiers association

More information

Building Fuzzy Blocks from Data Cubes

Building Fuzzy Blocks from Data Cubes Building Fuzzy Blocks from Data Cubes Yeow Wei Choong HELP University College Kuala Lumpur MALAYSIA choongyw@help.edu.my Anne Laurent Dominique Laurent LIRMM ETIS Université Montpellier II Université de

More information

Improved Frequent Pattern Mining Algorithm with Indexing

Improved Frequent Pattern Mining Algorithm with Indexing IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov Dec. 2014), PP 73-78 Improved Frequent Pattern Mining Algorithm with Indexing Prof.

More information