Mining Approximate Functional Dependencies from Databases Based on Minimal Cover and Equivalent Classes

European Journal of Scientific Research, ISSN X, Vol.33 No.2 (2009), pp
EuroJournals Publishing, Inc

Jalal Atoum
Computer Sci. Dept., PSUT, Amman-Jordan

Abstract

Data Mining (DM) is the process of extracting interesting and previously unknown knowledge from data. Approximate Functional Dependencies (AFDs) mined from database relations represent potentially interesting patterns and have proven useful for tasks such as feature selection for classification, query optimization and query rewriting. The discovery of AFDs remains underexplored and poses a special set of challenges: defining the right interestingness measures for AFDs, employing effective pruning strategies, and traversing the search space of the attribute lattice efficiently. In this paper, we present a new algorithm for finding approximate functional dependencies in large relational databases, based on the approximation measure g3. The algorithm uses concepts from relational database design theory, specifically equivalences and the minimal cover. It yields a large improvement in performance over a modified version of an algorithm called TANE.

Keywords: Data Mining, Approximate Functional Dependencies, Equivalent classes, Minimal Cover.

1. Introduction

The primary motivations for mining functional dependencies (FDs) from databases are discovering useful patterns in data and discovering interesting relations between variables in large databases. In some cases an FD fails to hold only because of a few tuples; such an FD can be said to hold approximately. For example, Language → Nationality may hold approximately. Approximate functional dependencies (AFDs) represent valuable knowledge of the structure of the relation instance.
Discovering such knowledge can help a domain expert extract expertise about a specific domain from a database. AFDs arise in many databases where dependencies between attributes are expected but some tuples contain errors or represent exceptions to the rule. The discovery of unexpected but meaningful approximate dependencies is an exciting and practical goal in many data mining applications. For instance, restating an example from [3]: an AFD in a database of chemical compounds relating various structural attributes to carcinogenicity could give biochemists valuable hints about potential causes of cancer (but cannot be taken as fact without further analysis by domain specialists).

Applications of AFDs include predicting missing values of attributes in relational tables (QPIAD [14]) using the values of the attributes in the determining sets of AFDs, query optimization (CORDS [4]) by maintaining correct selectivity estimates, query rewriting (AIMQ [9], QPIAD [14], QUIC [13]), and database normalization for better performance and efficient storage design. The discovery of AFDs is costly for three reasons: the pruning strategies used for FDs are not applicable to AFDs; the search space grows rapidly for databases with a large number of attributes; and the methods for determining whether a dependency holds are expensive [6].

In this paper, we propose a new algorithm for discovering AFDs from static databases based on the approximation measure g3 [8]. The algorithm also employs concepts from relational database theory, specifically the theory of equivalences and the minimal cover of FDs. It aims to reduce the time required to discover AFDs from databases. We compare its results with a modified version of a well-known earlier algorithm called Tane [3].

2. Previous Research

In recent years, a new research direction has emerged around mining FDs. Researchers have addressed the problem of finding all of the FDs that hold in a given relation instance [3, 5, 6, 8, 11]. AFD discovery research consists of three primary parts: (1) defining an approximation measure for AFDs, (2) developing methods for applying AFDs to pre-existing problems, and (3) developing algorithms for computing AFDs efficiently. Huhtala et al. [3] address the last part by developing an algorithm, Tane, for discovering all AFDs that hold in a relation instance.
Tane uses an approximation measure, g3, proposed in [8], to define when an AFD is deemed to hold (g3 is defined in Section 4). AFD discovery has also been considered in [1, 7, 9]. Kivinen and Mannila [8] define several measures for the error of a dependency and derive bounds for discovering dependencies with errors. The measure g3 is one of their measures. The use of partitions to describe and define functional and approximate dependencies has been suggested in [1].

3. Functional Dependencies

Given a relation R, a set of attributes X in R is said to functionally determine another set of attributes Y, also in R (written X → Y), if and only if each X value is associated with precisely one Y value. An FD denoted by X → A is a constraint between two sets of attributes X and A that are subsets of some relation schema R. It specifies a constraint on all possible tuples t1 and t2 in R: if t1[X] = t2[X], then t1[A] = t2[A] must also hold. This means that the values of the A component of any tuple in R depend on, or are determined by, the values of the X component.

3.1. Functional Dependencies and Equivalent Classes

To discover the set of FDs satisfied by a relation instance, we use the partition method, which divides the tuples of the instance into groups based on the distinct values of each column (attribute). For each attribute, the number of groups equals the number of distinct values of that attribute. Each group is called an equivalence class. For instance, in the relation instance shown in Table 1, attribute A has the value 1 only in tuples 1 and 2, so they form the equivalence class [1]_{A} = [2]_{A} = {1, 2} (we use tuple identifiers to denote tuples). Similarly, attribute A has the value 2 in tuples 3, 4, 5 and the value 3 in tuples 6, 7, 8.
Hence the partition of the tuples with respect to attribute A consists of three equivalence classes: π_{A} = {{1, 2}, {3, 4, 5}, {6, 7, 8}}.
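The partition step described above can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: the names `TABLE_1` and `partition` are hypothetical, and the data is limited to the A, B and E columns of the relation instance in Table 1.

```python
from collections import defaultdict

# Columns A, B and E of Table 1, keyed by tuple identifier (1-based, as in the text).
TABLE_1 = {
    1: {"A": 1, "B": "a", "E": 2},
    2: {"A": 1, "B": "A", "E": 2},
    3: {"A": 2, "B": "A", "E": 0},
    4: {"A": 2, "B": "A", "E": 0},
    5: {"A": 2, "B": "B", "E": 0},
    6: {"A": 3, "B": "B", "E": 1},
    7: {"A": 3, "B": "C", "E": 1},
    8: {"A": 3, "B": "C", "E": 1},
}


def partition(table, attrs):
    """Return the equivalence classes of tuple ids induced by the attributes in attrs."""
    groups = defaultdict(set)
    for tid, row in table.items():
        # Tuples that agree on every attribute in attrs land in the same class.
        groups[tuple(row[a] for a in attrs)].add(tid)
    # Order classes by their smallest tuple id so the output is deterministic.
    return sorted((frozenset(g) for g in groups.values()), key=min)


if __name__ == "__main__":
    for attrs in (["A"], ["B"]):
        print(attrs, [sorted(c) for c in partition(TABLE_1, attrs)])
```

A single pass over the tuples suffices for each attribute set, which is why the partition method scales linearly in the number of rows.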

The partition with respect to the combined attributes {B, C}, for example, is π_{B,C} = {{1}, {2}, {3, 4}, {5}, {6}, {7}, {8}}.

Table 1: Relation Instance

Tuple ID   A   B   E   C   D
1          1   a   2   $   Flower
2          1   A   2       Tulip
3          2   A   0   $   Daffodil
4          2   A   0   $   Flower
5          2   B   0       Lily
6          3   B   1   $   Orchid
7          3   C   1       Flower
8          3   C   1   #   Rose

The concept of partition refinement gives functional dependencies almost directly. A partition π refines another partition π′ if every equivalence class in π is a subset of some equivalence class of π′. An FD X → Y holds if and only if π_X refines π_Y. In our example, attribute A has the equivalence classes {{t1, t2}, {t3, t4, t5}, {t6, t7, t8}}, and attribute E has the equivalence classes {{t1, t2}, {t3, t4, t5}, {t6, t7, t8}}. Since the equivalence classes of attribute A refine those of attribute E (the two partitions are in fact identical), A → E holds on this instance (Table 1).

3.2. Minimal Cover

The concept of minimal FDs, or minimal cover, is useful for eliminating unnecessary FDs so that only the minimal number of dependencies needs to be considered. A set of functional dependencies F is a minimal cover iff:
1. Every functional dependency in F is of the form X → A, where A is a single attribute.
2. For no X → A in F is F - {X → A} equivalent to F.
3. For no X → A in F and Y ⊂ X is (F - {X → A}) ∪ {Y → A} equivalent to F.

Example: {A → C, A → B} is a minimal cover for {AB → C, A → B}.

4. Approximate Functional Dependencies

For some relations, some FDs may not hold for all of the tuples. Such an FD can be said to hold approximately. For example, for cars, Make is determined by Model via an expectation dependency: given that Model = 323, we know that Make = Mazda with high probability, but there is also a small chance that Make = BMW. This expected, or approximate, FD is written Model → Make.
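The refinement criterion stated above for exact FDs (X → Y holds if and only if the partition of X refines the partition of Y) can be checked mechanically. The following is a minimal sketch under stated assumptions: `TABLE_1`, `partition`, `refines` and `fd_holds` are hypothetical names, and only columns A, B and E of Table 1 are used.

```python
from collections import defaultdict

# Columns A, B and E of Table 1, keyed by tuple identifier.
TABLE_1 = {
    1: {"A": 1, "B": "a", "E": 2},
    2: {"A": 1, "B": "A", "E": 2},
    3: {"A": 2, "B": "A", "E": 0},
    4: {"A": 2, "B": "A", "E": 0},
    5: {"A": 2, "B": "B", "E": 0},
    6: {"A": 3, "B": "B", "E": 1},
    7: {"A": 3, "B": "C", "E": 1},
    8: {"A": 3, "B": "C", "E": 1},
}


def partition(table, attrs):
    """Equivalence classes of tuple ids that agree on all attributes in attrs."""
    groups = defaultdict(set)
    for tid, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(tid)
    return [frozenset(g) for g in groups.values()]


def refines(pi_x, pi_y):
    """True if every class of pi_x is a subset of some class of pi_y."""
    return all(any(cx <= cy for cy in pi_y) for cx in pi_x)


def fd_holds(table, lhs, rhs):
    """Exact FD test: lhs -> rhs holds iff partition(lhs) refines partition(rhs)."""
    return refines(partition(table, lhs), partition(table, rhs))


if __name__ == "__main__":
    print("A -> E:", fd_holds(TABLE_1, ["A"], ["E"]))  # partitions are identical
    print("A -> B:", fd_holds(TABLE_1, ["A"], ["B"]))  # class {1, 2} is split by B
```

On this instance the test accepts A → E and rejects A → B, which is the exact-FD behaviour that the approximate measure relaxes.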
A standard definition of an approximate dependency X → A is based on the minimum number of rows that need to be removed from the relation instance r for X → A to hold in r. The error is

$$g_3(X \to A) = 1 - \frac{\max\{\,|s| : s \subseteq r \text{ and } X \to A \text{ holds in } s\,\}}{|r|}$$

The measure g3 has a natural interpretation as the fraction of rows with exceptions or errors affecting the dependency. Given an error threshold ε, 0 ≤ ε ≤ 1, we say that X → A is an approximate dependency if and only if g3(X → A) is at most ε [2].

An alternative method of computing g3, which is used in our proposed algorithm, is as follows [3]. An equivalence class c of π_X is the union of one or more equivalence classes c1, c2, … of π_{X ∪ {A}}, and the rows in all but one of the ci must be removed for X → A to hold. The minimum number of rows to remove is thus the size of c minus the size of the largest of the ci. Summing this over all equivalence classes c of π_X gives the total number of tuples to remove. Thus, we have:

$$g_3(X \to A) = 1 - \frac{\sum_{c \in \pi_X} \max\{\,|c'| : c' \in \pi_{X \cup \{A\}} \text{ and } c' \subseteq c\,\}}{|r|}$$
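The partition-based formula for g3 translates directly into code. The sketch below is an illustration only, assuming the A, B and E columns of Table 1; the names `TABLE_1`, `partition` and `g3` are hypothetical and do not come from the paper.

```python
from collections import defaultdict

# Columns A, B and E of Table 1, keyed by tuple identifier.
TABLE_1 = {
    1: {"A": 1, "B": "a", "E": 2},
    2: {"A": 1, "B": "A", "E": 2},
    3: {"A": 2, "B": "A", "E": 0},
    4: {"A": 2, "B": "A", "E": 0},
    5: {"A": 2, "B": "B", "E": 0},
    6: {"A": 3, "B": "B", "E": 1},
    7: {"A": 3, "B": "C", "E": 1},
    8: {"A": 3, "B": "C", "E": 1},
}


def partition(table, attrs):
    """Equivalence classes of tuple ids that agree on all attributes in attrs."""
    groups = defaultdict(set)
    for tid, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(tid)
    return [frozenset(g) for g in groups.values()]


def g3(table, lhs, rhs):
    """Fraction of rows that must be removed for lhs -> rhs to hold exactly."""
    pi_x = partition(table, lhs)
    pi_xa = partition(table, lhs + rhs)
    # For each class c of pi_X keep only its largest sub-class in pi_{X u A}.
    kept = sum(max(len(sub) for sub in pi_xa if sub <= c) for c in pi_x)
    return 1 - kept / len(table)


if __name__ == "__main__":
    print("g3(A -> B) =", g3(TABLE_1, ["A"], ["B"]))
    print("g3(A -> E) =", g3(TABLE_1, ["A"], ["E"]))
```

For A → B the kept rows number 1 + 2 + 2 = 5 out of 8, so the function returns 0.375, while the exact dependency A → E has error 0.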

For instance, in our example, to test whether A → B holds, we find the equivalence classes π_A = {{1, 2}, {3, 4, 5}, {6, 7, 8}} and π_B = {{1}, {2, 3, 4}, {5, 6}, {7, 8}}. The equivalence class {1, 2} of π_A is not contained in any class of π_B, and the same is true of the other classes of π_A. Therefore, A → B does not hold. However, A → B may hold in our example with some error g3 if we remove some tuples from the given relation. Following the alternative method of computing g3 above, we first find π_{A ∪ {B}} = {{1}, {2}, {3, 4}, {5}, {6}, {7, 8}}. The equivalence class {1, 2} of π_A equals {1} ∪ {2} from π_{A ∪ {B}}, and the maximum size of {1} and {2} is 1. The equivalence class {3, 4, 5} of π_A equals {3, 4} ∪ {5}, and the maximum size of {3, 4} and {5} is 2. Finally, the equivalence class {6, 7, 8} of π_A equals {6} ∪ {7, 8}, and the maximum size of {6} and {7, 8} is 2. Hence, in our example, g3(A → B) = 1 - (1 + 2 + 2)/8 = 0.375. In other words, at least three of the 8 tuples in Table 1 must be removed for A → B to hold. The FD A → B is therefore said to hold approximately on the relation shown in Table 1 with an error rate of 0.375.

This process of discovering AFDs is repeated for all attributes and for all of their combinations (the candidate set). For instance, given a relation with five attributes (A, B, C, D, E), the candidate set is {φ, A, B, C, D, E, AB, AC, AD, AE, BC, BD, BE, CD, CE, DE, ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE, ABCD, ABCE, ABDE, ACDE, BCDE, ABCDE}, for a total of 32 (i.e. 2^5) combinations. The candidate attribute sets of this relation are represented as a lattice, as shown in Figure 1. Each node in Figure 1 represents a candidate attribute set. An edge between two nodes, such as E and DE, indicates that the AFD E → D needs to be checked.
Hence, all known algorithms for this task have running times that can, in the worst case, be exponential in the number of tuples and in the number of attributes [8].

Figure 1: Lattice for the Attributes of the Relation in Table 1

5. Modified Tane Algorithm

The original Tane algorithm [3] finds all non-trivial FDs by searching the lattice in a levelwise manner. A level L_l is the collection of attribute sets of size l such that the sets in L_l can potentially be used to construct dependencies from the lattice. The algorithm starts with level L_1 = {{A} | A ∈ R}, and computes L_2 from L_1, L_3 from L_2, and so on.

This algorithm employs the set C(X), the collection of rhs candidates of a set X ⊆ R, which is formally defined as C(X) = {A ∈ X | X \ {A} → A does not hold} ∪ (R \ X). Furthermore, C+(X) denotes the collection of rhs+ candidates of a set X ⊆ R. C+(X) is formally defined as C+(X) = {A ∈ R | for all B ∈ X, X \ {A, B} → B does not hold}.

The original Tane algorithm is defined below:

Algorithm TANE: levelwise search of dependencies.
1 L_0 := {∅}
2 C+(∅) := R
3 L_1 := {{A} | A ∈ R}
4 l := 1
5 while L_l ≠ ∅
6     COMPUTE-DEPENDENCIES(L_l)
7     PRUNE(L_l)
8     L_{l+1} := GENERATE-NEXT-LEVEL(L_l)
9     l := l + 1

The specification of the procedure COMPUTE-DEPENDENCIES is:

Procedure COMPUTE-DEPENDENCIES(L_l)
1 for each X ∈ L_l do
2     C+(X) := ∩_{A ∈ X} C+(X \ {A})
3 for each X ∈ L_l do
4     for each A ∈ X ∩ C+(X) do
5         if X \ {A} → A is valid then
6             output X \ {A} → A
7             remove A from C+(X)
8             remove all B in R \ X from C+(X)

The specification of the procedure PRUNE is:

Procedure PRUNE(L_l)
1 for each X ∈ L_l do
2     if C+(X) = ∅ do
3         delete X from L_l
4     if X is a (super)key do
5         for each A ∈ C+(X) \ X do
6             if A ∈ ∩_{B ∈ X} C+(X ∪ {A} \ {B}) then
7                 output X → A
8         delete X from L_l

The specification of GENERATE-NEXT-LEVEL is:

L_{l+1} = {X | |X| = l + 1 and for all Y with Y ⊂ X and |Y| = l we have Y ∈ L_l}

The TANE algorithm was modified to compute all approximate dependencies X → A with g3(X → A) ≤ ε, for a given threshold value ε [3]. The key modification is the change of the validity test on line 5 of procedure COMPUTE-DEPENDENCIES to:

5         if g3(X \ {A} → A) ≤ ε then

In addition, line 8 of COMPUTE-DEPENDENCIES has been replaced by:

8             if X \ {A} → A holds exactly then
9                 remove all B in R \ X from C+(X)
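The GENERATE-NEXT-LEVEL specification above can be illustrated with a short Python sketch. It implements only the set-theoretic specification (every subset of size l must be present in the current level), not the prefix-block join that an efficient implementation would use; the function name and the representation of attribute sets as frozensets are assumptions made for this example.

```python
from itertools import combinations


def generate_next_level(level):
    """Return all sets of size l + 1 whose subsets of size l all belong to level.

    level is a collection of frozensets of attribute names, all of size l.
    """
    level = set(level)
    if not level:
        return set()
    size = len(next(iter(level)))
    attributes = set().union(*level)
    next_level = set()
    for x in level:
        for a in attributes - x:
            candidate = x | {a}
            # Keep the candidate only if no subset of size l has been pruned.
            if all(frozenset(y) in level for y in combinations(candidate, size)):
                next_level.add(candidate)
    return next_level


if __name__ == "__main__":
    level = {frozenset(a) for a in "ABCDE"}
    total = 1  # the empty set at the bottom of the lattice
    while level:
        total += len(level)
        level = generate_next_level(level)
    print("lattice nodes:", total)
```

Without any pruning, the loop visits levels of size 5, 10, 10, 5 and 1, which together with the empty set gives the 32 candidates of the five-attribute lattice. Removing a set from a level removes all of its supersets from later levels, which is where the pruning pays off.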

5.1. Time Complexity of the Modified Tane Algorithm

Consider a relation with |R| attributes and |r| tuples. The time complexity of the modified TANE algorithm depends on the number of tuples in the database |r|, on the number of sets in all levels of the candidate attribute lattice s = O(2^|R|), and on the number of keys K = O(2^|R| / |R|). According to [3], the modified Tane algorithm has the following total time complexity: O(s(|r| + |R|^2) + K|R|^3).

6. Suggested Work

In this paper, we suggest an algorithm that discovers all AFDs with an approximation error of at most ε from databases. It is called Approximate discovery of Functional Dependency using Minimal Cover and Equivalent Classes (AFDMCEC). The algorithm reduces the number of attributes and AFDs to be checked by incorporating two concepts from relational database design theory. The first concept is an incremental minimal cover computation of AFDs during each phase of discovery; its aim is to minimize the number of AFDs to be checked. The second concept is the computation of the equivalence of attributes based on their nontrivial closure. For each pair of attributes whose closures are found to be equal, we remove one of them from the candidate set of attributes and record the fact that the two attributes are approximately equivalent (↔). This reduces the number of attributes to be checked during each phase of the proposed algorithm. Figure 2 presents the main procedure of the AFDMCEC algorithm.

Figure 2: The Main Procedure of the AFDMCEC Algorithm

AFDMCEC Algorithm:
Input: dataset D and its attributes X1, X2, …, Xn
       ε: error threshold, 0 ≤ ε ≤ 1
Output: Minimal ApproximateFD_Set, Candidate_Set for next level, EQ_Set
1. Initialization step
     Set R = attributes (X1, X2, …, Xn)
     Set NRows = number of rows in the database
     Set FD_Set = φ
     Set ApproximateFD_Set = φ
     Set EQ_Set = φ
     Set Candidate_Set = {X1, X2, …, Xn}
2. While Candidate_Set ≠ φ Do
     For all Xi ∈ Candidate_Set Do
         ApproximateFD_Set = ComputeMinimalApproximate_FD(Xi)
     GenerateNextLevelCandidates(Candidate_Set)
3. Display ApproximateFD_Set

The main procedure of the AFDMCEC algorithm calls ComputeMinimalApproximate_FD(Xi) for each Xi in Candidate_Set, as shown in Figure 3. For each attribute Y ∈ R - Xi, if g3(Xi → Y) ≤ ε then Xi → Y is added to ApproximateFD_Set, and if the approximate closure of Xi is the same as the approximate closure of Y then Y → Xi is added to ApproximateFD_Set, Xi ↔ Y is added to EQ_Set, and Y is removed from Candidate_Set. Finally, Figure 4 presents the GenerateNextLevelCandidates procedure.

Figure 3: ComputeMinimalApproximate_FD Procedure

ComputeMinimalApproximate_FD(Xi)
For each Y ∈ R - Xi Do
    M = Π_{Xi}
    N = Π_{Xi ∪ Y}
    TmpList = φ
    For all T ∈ M Do
        Max = 0
        For all S ∈ N Do
            If S ⊆ T and Max < Len(S) then Max = Len(S)
        Add Max to TmpList
    J = 0
    For I = 1 to Len(TmpList)
        J = J + TmpList(I)
    Result = 1 - J / NRows
    If Result ≤ ε Then
        Add Xi → Y to ApproximateFD_Set
    If (approximate_closure(Xi) = approximate_closure(Y)) then
        Add Y → Xi to ApproximateFD_Set
        Add Xi to closure'[Y]
        Add Xi ↔ Y to EQ_Set
        Remove Y from Candidate_Set

Figure 4: GenerateNextLevelCandidates Procedure

Procedure GenerateNextLevelCandidates(CANDIDATE_SET)
For each Xi ∈ CANDIDATE_SET do
    For each Xj ∈ CANDIDATE_SET do
        If (Xi[1] = Xj[1], …, Xi[k-2] = Xj[k-2] and Xi[k-1] < Xj[k-1]) then
            Set Xij = Xi join Xj
            If Xij ∈ TmpList then delete Xij
            else Compute the partition Π_{Xij} of Xij

6.1. Time Complexity of AFDMCEC Algorithm

Initially, the proposed algorithm scans the whole table of |r| tuples in order to find all equivalence classes, for a time complexity of |r|. The main body of the AFDMCEC algorithm then has a loop that iterates |R| times, so the main body has a time complexity of |R|. Within each iteration of this loop, there is a call to each of the following procedures:
1. ComputeMinimalApproximate_FD(): each call of this procedure takes |R| iterations. In each of these iterations there is a loop that scans all of the candidates in that level, of size s = 2^|R|. Hence the total time of this step is s|R|.
2. GenerateNextLevelCandidates(Candidate_Set): this procedure performs two nested loops, each with |R| iterations, for a total time of |R|^2.

Therefore, the total time complexity required by the AFDMCEC algorithm is:

$$O\big(|r| + |R|\,(s|R| + |R|^2)\big) = O\big(|r| + s|R|^2 + |R|^3\big)$$
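The equivalence-based pruning in ComputeMinimalApproximate_FD can be illustrated on single attributes. The sketch below makes a simplifying assumption that is not taken from the paper: instead of comparing approximate closures, it treats two attributes X and Y as approximately equivalent when both g3(X → Y) ≤ ε and g3(Y → X) ≤ ε. All names (`TABLE_1`, `partition`, `g3`, `prune_equivalent_attributes`) are hypothetical, and only columns A, B and E of Table 1 are used.

```python
from collections import defaultdict

# Columns A, B and E of Table 1, keyed by tuple identifier.
TABLE_1 = {
    1: {"A": 1, "B": "a", "E": 2},
    2: {"A": 1, "B": "A", "E": 2},
    3: {"A": 2, "B": "A", "E": 0},
    4: {"A": 2, "B": "A", "E": 0},
    5: {"A": 2, "B": "B", "E": 0},
    6: {"A": 3, "B": "B", "E": 1},
    7: {"A": 3, "B": "C", "E": 1},
    8: {"A": 3, "B": "C", "E": 1},
}


def partition(table, attrs):
    """Equivalence classes of tuple ids that agree on all attributes in attrs."""
    groups = defaultdict(set)
    for tid, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(tid)
    return [frozenset(g) for g in groups.values()]


def g3(table, lhs, rhs):
    """Fraction of rows that must be removed for lhs -> rhs to hold exactly."""
    pi_xa = partition(table, lhs + rhs)
    kept = sum(max(len(s) for s in pi_xa if s <= c) for c in partition(table, lhs))
    return 1 - kept / len(table)


def prune_equivalent_attributes(table, attrs, eps):
    """Level-1 pass: collect AFDs, record equivalences, drop redundant attributes.

    Assumption: mutual approximate dependence stands in for closure equality.
    """
    candidates = list(attrs)
    afds, eq_set = [], []
    for x in list(attrs):
        if x not in candidates:
            continue  # x was already removed as equivalent to an earlier attribute
        for y in list(candidates):
            if y == x:
                continue
            if g3(table, [x], [y]) <= eps:
                afds.append((x, y))
                if g3(table, [y], [x]) <= eps:
                    afds.append((y, x))
                    eq_set.append((x, y))
                    candidates.remove(y)  # y need not be combined at later levels
    return afds, eq_set, candidates


if __name__ == "__main__":
    print(prune_equivalent_attributes(TABLE_1, ["A", "B", "E"], eps=0.0))
```

With ε = 0 the pass finds A → E and E → A, records A ↔ E, and leaves only A and B as candidates for the next level, so no attribute set containing E has to be generated or tested.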

7. Experimental Analysis

Both algorithms (Modified Tane and AFDMCEC) generated the same set of AFDs on the UCI datasets [12]. Table 2 shows the actual times required by the Modified TANE algorithm and by the AFDMCEC algorithm to discover all AFDs on these datasets, which vary in their numbers of attributes and tuples, for different values of the threshold ε.

Table 2: Actual Time Requirements at All Levels for Both Algorithms (Modified TANE and AFDMCEC) for Some UCI Datasets for Different Thresholds ε

Columns: DataBase; then, for each of ε = 0.0, ε = 0.05, ε = 0.25 and ε = 0.5, Time (Min) for AFDMCEC and Time (Min) for ModTane.
Rows: Abalone, Balance-scale, Breast-cancer, Bridge, Chess, Echocardiogram, Glass, Iris, Nursery, Machine.

From Table 2, the same AFDs are found more efficiently by our proposed algorithm than by the modified version of the Tane algorithm. This results from the larger number of equivalence classes found and, consequently, the larger number of equivalent attributes: more equivalent attributes mean fewer AFDs to be checked for satisfaction. Furthermore, we notice that the higher the threshold ε, the lower the time requirements of both algorithms. This is because a higher threshold ε allows higher error rates, so fewer tuples need to be removed when computing g3.

8. Time Complexity Comparisons

Table 3 presents a comparison of the time complexities computed earlier for the AFDMCEC algorithm and for the modified TANE algorithm.

Table 3: Time Complexity Comparison Based on T(n) for Both Algorithms

Columns: Database Name; # of Attributes; # of Tuples; Modified Tane, s(|r| + |R|^2) + K|R|^3; AFDMCEC, |r| + s|R|^2 + |R|^3.

Database Name     # of Attributes   # of Tuples
Abalone           9                 4,
Balance-scale
Breast-cancer
Bridge
Chess             7                 28,
Echocardiogram
Glass
Iris
Nursery           9                 12,
Machine

9. Conclusions

We have suggested a new algorithm for discovering AFDs from large relational databases, based on the approximation measure g3, which employs the concepts of equivalence properties and the minimal (canonical) cover of FDs. The aim of this algorithm is to reduce the time requirements compared with a modification of a previous algorithm called TANE. In our analyses, the AFDMCEC algorithm performed better than the modified version of the TANE algorithm. Furthermore, simulation results for both algorithms show that as the threshold ε increases, both algorithms perform much better, with dramatic decreases in time requirements. With higher threshold values, more AFDs are discovered from the database. In this case, however, most of the discovered AFDs are of little use as knowledge, since they have high error rates (high threshold values).

References

[1] Dalkilic, M.M., Gucht, D. V., and Robertson, E. L., CE: the Classifier-Estimator Framework for Data Mining. In Proceedings of the 7th IFIP 2.6 Working Conference on Database Semantics (DS-7), Leysin, Switzerland, Oct. Chapman and Hall.
[2] Giannella, Chris and Robertson, Edward, 2004. On Approximation Measures for Functional Dependencies, Information Systems Archive 29(6).
[3] Huhtala, Y., Karkkainen, J., Porkka, P., and Toivonen, H., Tane: An Efficient Algorithm for Discovering Functional and Approximate Dependencies. The Computer Journal, 42(2).
[4] Ilyas, I. F., Markl, V., Haas, P., Brown, P., and Aboulnaga, A., Cords: Automatic Discovery of Correlations and Soft Functional Dependencies. In SIGMOD 04: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, New York, NY, USA.
[5] Jiang, N. and Gruenwald, L., "Research Issues in Data Stream Association Rule Mining", SIGMOD Record, Vol. 35, No. 1.
[6] Kalavagattu, Aravind Krishna, Mining Approximate Dependencies as Condensed Representations of Association Rules, Master Thesis, Arizona State University.
[7] Kramer, S. and Pfahringer, B., Efficient Search of Strong Partial Determinations. In E. Simoudis, J. Han, and U. Fayyad, editors, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD 96), Portland, OR, Aug. AAAI Press.
[8] Kivinen, J., and Mannila, H., Approximate Inference of Functional Dependencies From Relations. Theoretical Computer Science, 149.
[9] Nambiar, U. and Kambhampati, S., Answering Imprecise Queries over Autonomous Web Databases. In ICDE, page 45.
[10] Novelli, N., and Cicchetti, R., Fun: An Efficient Algorithm for Mining Functional and Embedded Dependencies. Proceedings of the 8th International Conference on Database Theory (ICDT).
[11] Perugini, S., and Ramakrishnan, N., Mining Web Functional Dependencies for Flexible Information Access, Journal of the American Society for Information Science and Technology.
[12] UCI Machine Learning Repository, /~mlearn/ MLRepository.html.
[13] Wolf, G., Khatri, H., Chen, Y., and Kambhampati, S., Quic: A System for Handling Imprecision & Incompleteness in Autonomous Databases (demo). In CIDR.
[14] Wolf, G., Khatri, H., Chokshi, B., Fan, J., Chen, Y., and Kambhampati, S., Query Processing over Incomplete Autonomous Databases. In VLDB 07: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment.


More information
