The Use of Statistics in Semantic Query Optimisation


Ayla Sayli (saylia@essex.ac.uk) and Barry Lowden (lowdb@essex.ac.uk)
University of Essex, Dept. of Computer Science, Wivenhoe Park, Colchester, CO4 3SQ, Essex, UK

Abstract

An important aspect of semantic query optimisation is automatic rule derivation. The derived rules are used to make the query process more intelligent. However, the set of generated rules may become very large, and some rules in the set may not be useful. This problem is normally referred to as the utility problem. Our paper is concerned with limiting the rules set by using the chi-square test from statistics to determine a relationship degree between the antecedent attribute and the consequent attribute of a candidate rule. We construct the chi-square table according to the condition on the antecedent attribute of the rule and the condition on the consequent attribute of the rule. If the rule does not have a strong relationship degree, it is added to a secondary rules set. This secondary rules set can be used as a filter to avoid deriving similar weak rules for future queries. Otherwise, the rule is added to a primary rules set, which contains all strong rules. For large databases, it is hoped that this additional test can reduce the size of the rules set used for semantic query optimisation.

1 Introduction

Semantic Query Optimisation (SQO) is a comparatively recent approach which can be used to transform a query into an alternative query that has the same answer but can be processed more efficiently. The main difference between the semantic approach and other optimisation approaches is its use of rules during query optimisation [Graefe and Dewitt, 1987; King, 1981]. Moreover, these rules are derived automatically for the given query when they are needed. The derivation and use of rules makes SQO more intelligent and less expensive.
Using any of a number of approaches to SQO, it is possible to derive all rules from a given query and database. The approaches may be classified as heuristic-based systems [Siegel et al., 1992], logic-based systems [Chakravarthy et al., 1990], graph-based systems [Shenoy and Ozsoyoglu, 1989] and data-driven systems [Hsu and Knoblock, 1994; Lowden et al., 1995; Shekhar et al., 1993]. However, a large rules set remains a problem in all the existing systems, since rules are produced automatically regardless of how effective they might be in the query transformation process [Chan and Wong, 1991; Han et al., 1993; Piatetsky-Shapiro and Matheus, 1993; Savnik and Flach, 1993; Ziarko, 1991]. This is known as the utility problem. For this reason we suggest using the chi-square test to measure the relationship degree of a Candidate Rule (CR) at a given confidence level and degree of freedom [Chan and Wong, 1991]. If the relationship degree of the CR is greater than or equal to the decision value of the test, we store it in a primary rules set as a strong rule. Otherwise it is placed in a secondary rules set as a weak rule.

The remainder of this paper is structured as follows. Our approach to SQO with statistical additions is shown in section 2; in section 2.1 the automatic rule derivation for SQO is given for the unmatched CRs using the chi-square test. This test is described in more detail in section 3. In section 4, we show the applicability of the test using a data-driven system approach [Lowden et al., 1995]. Finally, in section 5, the results of three experiments using the system are given.

2 Semantic Query Optimisation

In general, SQO takes a given query in a query language such as SQL, which is the language used in this paper. First, it adopts one of the approaches [Chakravarthy et al., 1990; Lowden et al., 1995; Shenoy and Ozsoyoglu, 1989; Siegel et al., 1992] to generate a CR from the represented query and a given database.
Secondly, a check is made to see whether there is a rule in the rules set that matches the CR; if so, the matching rule may be used to derive an alternative query. Otherwise, the unmatched CR goes into the rule derivation process for future queries. In this paper we use two rules sets, a primary rules set and a secondary rules set. We first check the CR against the primary rules set. If there is no matching rule, we examine the secondary rules set to see whether a weak rule matches the CR. If the CR is unmatched in both rules sets, the new CR can be used to derive a new rule for future queries (see section 3). Thirdly, when a matching rule is found, SQO transforms the query into an alternative query according to the standard transformation rules of Constraint INtroduction (CIN) and Constraint REmoval (CRE) [Chakravarthy et al., 1990; Graefe and Dewitt, 1987; King, 1981; Shenoy and Ozsoyoglu, 1989; Siegel et al., 1992]. Finally, when all alternative queries have been found together with their transformation costs, the optimum query is selected. A matching rule is more likely to be found in the primary rules set, since this set contains the strongest rules for the given database. The system is illustrated in Figure 1.
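The order in which the two rules sets are consulted can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the class names `Rule` and `RuleBase`, the action labels returned by `lookup`, and the simplification of rule matching to exact equality of condition strings are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """A rule or candidate rule 'antecedent -> consequent', kept as condition strings."""
    antecedent: str
    consequent: str


@dataclass
class RuleBase:
    primary: set = field(default_factory=set)    # strong rules
    secondary: set = field(default_factory=set)  # weak rules, kept as a filter

    def lookup(self, cr: Rule) -> str:
        """Return a (hypothetical) action label for a candidate rule."""
        if cr in self.primary:
            return "transform"   # matched strong rule: apply CIN or CRE
        if cr in self.secondary:
            return "known-weak"  # matched weak rule: no new derivation needed
        return "derive"          # unmatched in both sets: derive a new rule


rules = RuleBase(
    primary={Rule("dcode = 'MKTG'", "dname = 'Marketing'")},
    secondary={Rule("dname = 'Marketing'", "project >= 2")},
)
print(rules.lookup(Rule("dcode = 'MKTG'", "dname = 'Marketing'")))  # transform
```

A real matcher would compare conditions semantically (for example, ranges that subsume one another) rather than by string equality; the sketch only fixes the lookup order of primary set, secondary set, then derivation.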

[Figure 1. Semantic Query Optimiser. Components shown: presenting a given query in SQL; producing CRs of the given query by one of the approaches in SQO; matching CRs against the primary rules set; automatic rule derivation; transforming the given query by matching rules using CIN or CRE and estimating the costs of the transformations; optimum query. Inputs: given query, database schema.]

[Figure 2. Automatic Rule Derivation. Components shown: unmatched CRs and costs of CRs; rule derivation processing by one of the approaches for SQO; chi-square test (chi-square value >= decision value); database schema; primary rules set; secondary rules set; system catalog (consulted if the rule contains indexed attributes).]

2.1 Automatic Rule Derivation

As mentioned before, an important aspect of SQO is that it derives its own rules automatically when needed. For example, the process of automatic rule derivation by heuristic-based systems [Schkolnick and Tiberio, 1985; Siegel et al., 1992] is shown with four modules: defining rule characteristics, selecting requested rules, query generation, and rule management. However, the set of these rules contains non-useful rules and may become very large. Non-useful rules make query processing in SQO inefficient and slow. Moreover, the system may have to perform a large number of comparisons, which makes SQO relatively expensive. For these reasons, our suggested approach is to use the chi-square test to calculate a relationship degree for each new rule at a given confidence level and degree of freedom. If the calculated relationship degree of the new rule is greater than or equal to the decision value in the list of chi-square decision values, the new rule is entered into the primary rules set; otherwise it is entered into the secondary rules set. In contrast to CHAID [Kass, 1980], our method of using the chi-square test does not split the rules set into two sets against a given database. Our method uses the conditions of the CR to construct the chi-square table in order to measure the relationship degree of each rule, not the relationship degrees of its attribute classes.
In the case of categorising dependent variables in very large databases, CHAID could be very useful in future work. Our system incorporating the chi-square statistical test is illustrated in Figure 2.

3 Analysing the Rules Set by Chi-square Test

As mentioned earlier, if CRs are kept in rules sets according to their reliability in the database, it is possible to limit the number of rules in the sets [Chan and Wong, 1991; Imam et al., 1993; Piatetsky-Shapiro and Matheus, 1993; Siegel et al., 1992]. We now explain, step by step, how we use the chi-square test in the rule derivation process, and then give examples of the use of the test in SQO in section 4.

3.1 Chi-square Test on the Rule Derivation Process

Assume that an unmatched CR is $A\,\theta_1\,a \rightarrow B\,\theta_2\,b$, where R is a relation of a given database, A is the antecedent attribute of the rule in R, B is the consequent attribute of the rule in R, $\theta_1$ is the condition on the antecedent attribute (A), $\theta_2$ is the condition on the consequent attribute (B), and a and b are constants. It is important to determine whether this rule should be in the primary rules set. The first step of the test is to arrange a chi-square table as follows:

i) Rows and columns in the table are represented as below:
- The first row characterises occurrences of the condition ($\theta_1$) on the antecedent attribute (A) in the database, $(A\,\theta_1\,a)$.
- The second row characterises occurrences of the negation of the condition ($\theta_1$) on the antecedent attribute (A) in the database, $\neg(A\,\theta_1\,a)$.
- The first column characterises occurrences of the condition ($\theta_2$) on the consequent attribute (B), $(B\,\theta_2\,b)$.
- The second column characterises occurrences of the negation of the condition ($\theta_2$) on the consequent attribute (B), $\neg(B\,\theta_2\,b)$.

ii) Cell values of the table are the counted numbers of occurrences according to the conditions on the attributes. For example, the first cell value, $X_{11}$, is the number of occurrences of $B\,\theta_2\,b$ where $A\,\theta_1\,a$ is true; the second cell value, $X_{12}$, is the number of occurrences of $\neg(B\,\theta_2\,b)$ where $A\,\theta_1\,a$ is true; and so on.

iii) The values of the last row and the last column in Table 1 are the totals of each row and column. For example, $TR_1$ is the total of all values in row 1 ($TR_1 = X_{11} + X_{12}$), $TR_2$ is the total of all values in row 2 ($TR_2 = X_{21} + X_{22}$), and so on. T is the total of all values in the chi-square table ($T = X_{11} + X_{12} + X_{21} + X_{22}$).

Table 1 is the table of the chi-square test showing all the symbols used.

\[
\begin{array}{l|cc|c}
\text{Chi-square Table} & B\,\theta_2\,b & \neg(B\,\theta_2\,b) & TR \\ \hline
A\,\theta_1\,a & X_{11} & X_{12} & TR_1 \\
\neg(A\,\theta_1\,a) & X_{21} & X_{22} & TR_2 \\ \hline
TC & TC_1 & TC_2 & T
\end{array}
\]

Table 1. Table of the chi-square test

The second step is to measure the relationship degree of the rule, which can be found using the values of Table 1 and the chi-square formula from statistics.
This formula, (1), is given as:

\[
\text{Chi-square Value} = \chi^2 = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(X_{ij} - Y_{ij})^2}{Y_{ij}}, \quad (n \geq 2,\ m \geq 2) \tag{1}
\]

where

\[
Y_{ij} = \frac{TR_i \cdot TC_j}{T}, \qquad TR_i = \sum_{j=1}^{m} X_{ij} \qquad \text{and} \qquad TC_j = \sum_{i=1}^{n} X_{ij}.
\]

For a 2*2 chi-square table, a shorter formula can be used instead of (1). This formula is given below:

\[
\text{Chi-square Value} = \chi^2 = \frac{T \cdot (X_{11} \cdot X_{22} - X_{12} \cdot X_{21})^2}{TC_1 \cdot TC_2 \cdot TR_1 \cdot TR_2}
\]

After the relationship degree has been determined, it is compared to the decision value of the chi-square test for a given confidence level and degree of freedom. The degree of freedom is v = (n-1)*(m-1) = (2-1)*(2-1) = 1, where n is the number of rows in the table (n=2) and m is the number of columns in the table (m=2).

The final stage is to evaluate the CR. If the relationship degree of the rule is less than the decision value and the CR does not contain any indexed attributes, the CR may be excluded from the primary rules set directly, because it has a low relationship degree between the antecedent attribute and the consequent attribute of the rule. In other words, it is not likely to be relevant to future queries. However, the same weak rule may be derived again for future queries. Therefore, if we store this tested rule in the secondary rules set as a weak rule, we can avoid a subsequent derivation process. Another advantage of having the secondary rules set in SQO is that it limits the primary rules set, because all weak rules are stored in the secondary rules set. Otherwise, the CR is added to the primary rules set since it has a high relationship degree in the database; in other words, it is a promising rule for optimising future queries. Limiting the size of the primary rules set can be a solution to the utility problem in large databases.

4 Applicability of Chi-square Test on Automatic Rule Derivation for Semantic Query Optimisation

As mentioned before, the applicability of the test can be shown using one of the current SQO approaches.
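Before the examples, the computation defined by formula (1) and the final evaluation stage can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' software: the function names, the dictionary of decision values and the sample counts are introduced here for illustration. The decision values 3.841 and 6.635 are the standard chi-square critical values for one degree of freedom at the 95% and 99% confidence levels.

```python
from typing import List

# Standard chi-square critical values for v = 1 degree of freedom.
DECISION_VALUES_V1 = {0.95: 3.841, 0.99: 6.635}


def chi_square(table: List[List[float]]) -> float:
    """Formula (1) for an n*m table; assumes every row and column total is non-zero."""
    row_totals = [sum(row) for row in table]        # TR_i
    col_totals = [sum(col) for col in zip(*table)]  # TC_j
    total = sum(row_totals)                         # T
    value = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):          # observed = X_ij
            expected = row_totals[i] * col_totals[j] / total  # Y_ij
            value += (observed - expected) ** 2 / expected
    return value


def target_rules_set(value: float, has_indexed_attribute: bool,
                     confidence: float = 0.99) -> str:
    """Primary if the degree reaches the decision value or the CR has an indexed attribute."""
    if value >= DECISION_VALUES_V1[confidence] or has_indexed_attribute:
        return "primary"
    return "secondary"


counts = [[30, 10], [20, 40]]  # hypothetical counts X_11, X_12 / X_21, X_22
degree = chi_square(counts)
print(round(degree, 3), target_rules_set(degree, has_indexed_attribute=False))
```

The lookup table only covers v = 1 because the rules considered here always produce a 2*2 table; a larger table would need the critical value for v = (n-1)*(m-1).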
Our software is based on a data-driven approach [Lowden et al., 1995]. Our examples are based on a small database consisting of the DEPARTMENT relation, which has 15 instances and 4 attributes (Dcode, Dname, Project and Manager), and we assume initially that the system does not have any rules in the primary rules set or in the secondary rules set. Dcode is an indexed attribute of the relation. This database is given in Table 2. In this section, examples are given to show how the chi-square test can be used to limit the rules set in the rule derivation process for SQO.

Dcode    Dname         Project  Manager
'ACCT'   'Accounting'  1        'ACCT01'
'ACCT'   'Accounting'  1        'ACCT02'
'ACCT'   'Accounting'  5        'ACCT01'
'MKTG'   'Marketing'   2        'MKTG01'
'MKTG'   'Marketing'   2        'MKTG04'
'MKTG'   'Marketing'   2        'MKTG05'
'MKTG'   'Marketing'   3        'MKTG02'
'MKTG'   'Marketing'   4        'MKTG03'
'MKTG'   'Marketing'   5        'MKTG01'
'MKTG'   'Marketing'   5        'MKTG04'
'MKTG'   'Marketing'   5        'MKTG05'
'MKTG'   'Marketing'   6        'MKTG02'
'MKTG'   'Marketing'   6        'MKTG03'
'PRSN'   'Personnel'   7        'PRSN01'
'PRSN'   'Personnel'   8        'PRSN02'

Table 2. The database of the DEPARTMENT relation

Assume that we are looking for dname where dcode = 'MKTG'. This query can be represented in SQL as:

select dname from DEPARTMENT where dcode = 'MKTG';

Using the data-driven approach, two rules will be derived: (i) dcode = 'MKTG' → dname = 'Marketing' and (ii) dname = 'Marketing' → project >= 2. We take the first rule (i) to show how the system works. Firstly, we construct the chi-square table of the rule using Table 2, where $A\,\theta_1\,a$ = (dcode = 'MKTG'), $\neg(A\,\theta_1\,a)$ = (dcode ≠ 'MKTG'), $B\,\theta_2\,b$ = (dname = 'Marketing') and $\neg(B\,\theta_2\,b)$ = (dname ≠ 'Marketing'). Counting the rows of Table 2 gives the following table:

\[
\begin{array}{l|cc|c}
\text{Chi-square Table} & B\,\theta_2\,b & \neg(B\,\theta_2\,b) & TR \\ \hline
A\,\theta_1\,a & 10 & 0 & 10 \\
\neg(A\,\theta_1\,a) & 0 & 5 & 5 \\ \hline
TC & 10 & 5 & 15
\end{array}
\]

Using formula (1) and the table, the relationship degree of the rule is found to be 15. This degree is greater than the decision value for the 99% confidence level and degree of freedom v = 1. From this result, the rule should be added to the primary rules set as a strong rule. When the same process is performed on the second rule (ii), the relationship degree of the rule is found to be 4.6. It is less than the decision value, and the CR does not contain the indexed attribute. Therefore it is added to the secondary rules set.
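The two relationship degrees quoted above can be reproduced with a short Python sketch. It assumes the 15 rows listed in Table 2, with a project value of 1 assumed for the first two Accounting rows, and uses the short 2*2 formula; the function names and the constant `DECISION_VALUE` (the standard 99% critical value for v = 1) are introduced here for illustration only.

```python
# (dcode, dname, project) for the rows of Table 2; the Manager column is not needed.
DEPARTMENT = (
    [("ACCT", "Accounting", p) for p in (1, 1, 5)]
    + [("MKTG", "Marketing", p) for p in (2, 2, 2, 3, 4, 5, 5, 5, 6, 6)]
    + [("PRSN", "Personnel", p) for p in (7, 8)]
)

DECISION_VALUE = 6.635  # chi-square critical value, 99% confidence, v = 1


def contingency(rows, antecedent, consequent):
    """Count X_11, X_12, X_21, X_22 for the two conditions of a candidate rule."""
    x11 = x12 = x21 = x22 = 0
    for row in rows:
        a, b = antecedent(row), consequent(row)
        if a and b:
            x11 += 1
        elif a:
            x12 += 1
        elif b:
            x21 += 1
        else:
            x22 += 1
    return x11, x12, x21, x22


def chi_square_2x2(x11, x12, x21, x22):
    """Short formula for a 2*2 table; assumes no row or column total is zero."""
    tr1, tr2 = x11 + x12, x21 + x22
    tc1, tc2 = x11 + x21, x12 + x22
    t = tr1 + tr2
    return t * (x11 * x22 - x12 * x21) ** 2 / (tc1 * tc2 * tr1 * tr2)


# Rule (i): dcode = 'MKTG' -> dname = 'Marketing'
rule_i = contingency(DEPARTMENT, lambda r: r[0] == "MKTG", lambda r: r[1] == "Marketing")
# Rule (ii): dname = 'Marketing' -> project >= 2
rule_ii = contingency(DEPARTMENT, lambda r: r[1] == "Marketing", lambda r: r[2] >= 2)

for name, cells in (("(i)", rule_i), ("(ii)", rule_ii)):
    degree = chi_square_2x2(*cells)
    print(name, cells, round(degree, 2), "strong" if degree >= DECISION_VALUE else "weak")
```

Under these assumptions rule (i) yields the cell counts (10, 0, 0, 5) and a degree of 15, while rule (ii) yields (10, 0, 3, 2) and a degree of about 4.6, which falls below the decision value.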
Experiment 3 of the next section shows how likely it is that weak rules and strong rules are derived.

5 Computational Results

Our first experiment on the query optimisation process was carried out using only the primary rules set; its results are given in Table 3. The second experiment was carried out using all rules; its results are given in Table 4. Table 5 shows the time savings between the first and second experiments. Our approach has been tested on a database of 7000 instances. The relational schema of the database is given below.

STUDENT (name char(30), regno integer, logname char(8), advisor integer, entry integer, year char(2), scheme char(6), uccacode char(6), status char(1), examno integer, study char(8), school char(4))

Figure 3. Relational Schema of STUDENT database

The database is indexed on the name attribute and the regno attribute. The ten specified queries were as follows:

Q1) select * from student where status = 'A' and school = 'CSG';
Q2) select * from student where advisor = 479 and uccacode = 'B70002';
Q3) select * from student where entry < 9 and entry > 83 and examno = 0 and school = 'SEG';
Q4) select * from student where advisor = 259 and entry = 90 and status = 'A';
Q5) select scheme from student where entry = 90 and study = 'PHD CHEM';
Q6) select name, regno, advisor from student where entry = 85 and year = 'G';
Q7) select * from student where uccacode = 'Q008' and entry = 90;
Q8) select advisor from student where uccacode = 'B70002';
Q9) select name, regno, advisor from student where entry = 85 and regno <= and regno >= ;
Q10) select examno from student where school = 'SEG';

The above queries were chosen for the rule derivation procedure either because of their features or to illustrate the query optimisation process. All queries in the tables are shown with their numbers, e.g. Query 9 is presented as Q9. If a query does not appear in one of the tables, then the query is not useful. All results are averages over five separate executions of each query.

5.1 Experiment 1

The results shown in Table 3 are for the query optimisation procedure using the primary rules set only. The first column of the table shows times (in seconds) for the given queries without reformulation. The second column shows times with reformulation using the primary rules set. The third column gives the number of fired rules for each query. The last column shows the difference in execution time between the first column and the second column. The times in the last column show that reformulation of the given queries can reduce the execution times of the original queries. In Table 3, the greatest savings are from Q5 and Q10, because the conditions of Q5 are refuted by a rule in the primary rules set and the answer to Q10 is found from a rule without executing the given query. Q4, Q6 and Q7 are transformed by rules containing indexed attributes. The execution time of the reformulated query for Q9 is higher than that of the original query because the original query contains indexed attribute(s) in its where clause. This type of query is already inexpensive to execute without reformulation.

Table 3. System operating with the primary rules set from Query to Query2. The system has 33 strong rules in the primary rules set and 12 weak rules in the secondary rules set. Columns: execution time without reformulation (secs.); execution time with reformulation (secs.); number of fired rules; saved time. Rows: Q1 to Q10.

Table 4. System operating with all rules (45 rules). Columns: execution time without reformulation (secs.); execution time with reformulation (secs.); number of fired rules; saved time. Rows: Q1 to Q10.

5.2 Experiment 2

This experiment was performed to show the query optimisation process using all rules, i.e. both primary and secondary, in order to compare the results of our system. Results are shown in Table 4. Table 5 gives a comparison between Experiments 1 and 2. Differences between the numbers of fired rules in Table 3 and Table 4 show how many weak rules are fired to transform the given queries.

Table 5. Efficiency of using only the primary rules set instead of all rules. Columns: Saved Time (1); Saved Time (2); Saved Time [(1) - (2)]. Rows: Q1, Q3, Q4, Q5, Q6, Q7, Q8, Q10.

5.3 Experiment 3

Our last experiment gave timings for the rule derivation process both with and without the chi-square test; these are shown in Graph 1. The graph is based on the averaged times for each rule for a running query. The proportion of weak and strong rules in the rules sets was found to be 29% for weak rules and 71% for strong rules on the STUDENT database. Finally, an 8% saving was made by splitting the rules into two rules sets according to the chi-square test. We are currently testing the system on a database which has 27,266 instances.

[Graph 1. Evaluation times for each rule in the rule derivation process (seconds). Horizontal axis: rules; vertical axis: evaluation times; series: with the test, without the test.]

6 Conclusion

This paper has focused on the use of the chi-square test in automatic rule derivation to limit the set of rules. From the results, the test promises to eliminate weak rules. It may be possible to use other statistical methods to limit the rules set and to select only useful rules in the process. Our aim is to extend our current work to other methods in statistics and knowledge discovery systems, in order to find the most appropriate cost function for query processing based on derivability, maintenance and selectivity factors [Graefe and Dewitt, 1987; Schkolnick and Tiberio, 1985; Ziarko, 1991].

Acknowledgement

Ayla Sayli is sponsored by the University of Yildiz.

References

[Chakravarthy et al., 1990] U. S. Chakravarthy, J. Grant and J. Minker. Logic-based approach to semantic query optimisation. ACM Transactions on Database Systems, Vol. 15, No. 2, June 1990.
[Chan and Wong, 1991] K. C. Chan and A. K. C. Wong. A statistical test for extracting classificatory knowledge from databases. Knowledge Discovery in Databases, The AAAI Press, 107-123, 1991.
[Graefe and Dewitt, 1987] G. Graefe and D. Dewitt. The EXODUS optimiser generator. In Proceedings of the 1987 ACM-SIGMOD Conference on Management of Data, San Francisco, 60-7, May 1987.
[Han et al., 1993] J. Han, Y. Cai and N. Cercone. Data-driven discovery of quantitative rules in relational databases. IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 1, 29-40, February 1993.
[Hanson and Widom, 1993] E. N. Hanson and J. Widom. An overview of production rules in database systems. The Knowledge Engineering Review, Vol. 8:2, 2-43, 1993.
[Hsu and Knoblock, 1994] C. Hsu and C. A. Knoblock. Rule induction for semantic query optimisation. In Proceedings of the Eleventh International Conference on Machine Learning, 1994.
[Imam et al., 1993] I. F. Imam, R. S. Michalski and L. Kerschberg. Discovering attribute dependence in database by integrating symbolic learning and statistical analysis tests. Knowledge Discovery in Databases Workshop, 1993.
[Kass, 1980] G. V. Kass. An Exploratory Technique for Investigating Large Quantities of Categorical Data. Appl. Statist., 29, No. 2, 119-127, 1980.
[King, 1981] J. J. King. QUIST: A system for semantic query optimisation in relational databases. In Proceedings of the 7th VLDB Conference, 50-57, Sept. 1981.
[Lowden et al., 1995] B. G. T. Lowden, J. Robinson and K. Y. Lim. A semantic query optimiser using automatic rule derivation. Proc. Fifth Annual Workshop on Information Technologies and Systems, Netherlands, 68-76, December 1995.
[Piatetsky-Shapiro and Matheus, 1993] G. Piatetsky-Shapiro and C. Matheus. Measuring data dependencies in large databases. Knowledge Discovery in Databases Workshop, 62-73, 1993.
[Savnik and Flach, 1993] I. Savnik and P. A. Flach. Bottom-up induction of functional dependencies from relations. Knowledge Discovery in Databases Workshop, 74-85, 1993.
[Schkolnick and Tiberio, 1985] M. Schkolnick and P. Tiberio. Estimating the cost of updates in a relational database. ACM Trans. Database Systems, 10, 2, 63-79, June 1985.
[Shekhar et al., 1993] S. Shekhar, B. Hamidzadeh and A. Kohli. Learning transformation rules for semantic query optimisation: a data-driven approach. IEEE, 1993.
[Shenoy and Ozsoyoglu, 1989] S. T. Shenoy and Z. M. Ozsoyoglu. Design and implementation of semantic query optimiser. IEEE Transactions on Knowledge and Data Engineering, Vol. 1, No. 3, Sept. 1989.
[Siegel et al., 1992] M. D. Siegel, E. Sciore and S. Salveter. A method for automatic rule derivation to support semantic query optimisation. ACM Transactions on Database Systems, Vol. 17, No. 4, Dec. 1992.
[Yu and Sun, 1989] C. Yu and W. Sun. Automatic knowledge acquisition and maintenance for semantic query optimisation. IEEE Trans. Knowl. Data Eng., 1, 3, Sept. 1989.
[Ziarko, 1991] W. Ziarko. The discovery, analysis, and representation of data dependencies in databases. In Knowledge Discovery in Databases, The AAAI Press, 1991.


More information

Mining High Average-Utility Itemsets

Mining High Average-Utility Itemsets Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Mining High Itemsets Tzung-Pei Hong Dept of Computer Science and Information Engineering

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

Analyzing and Segmenting Finger Gestures in Meaningful Phases

Analyzing and Segmenting Finger Gestures in Meaningful Phases 2014 11th International Conference on Computer Graphics, Imaging and Visualization Analyzing and Segmenting Finger Gestures in Meaningful Phases Christos Mousas Paul Newbury Dept. of Informatics University

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab Duwairi Dept. of Computer Information Systems Jordan Univ. of Sc. and Technology Irbid, Jordan rehab@just.edu.jo Khaleifah Al.jada' Dept. of

More information

Data integration supports seamless access to autonomous, heterogeneous information

Data integration supports seamless access to autonomous, heterogeneous information Using Constraints to Describe Source Contents in Data Integration Systems Chen Li, University of California, Irvine Data integration supports seamless access to autonomous, heterogeneous information sources

More information

A SURVEY OF DIFFERENT ASSOCIATIVE CLASSIFICATION ALGORITHMS

A SURVEY OF DIFFERENT ASSOCIATIVE CLASSIFICATION ALGORITHMS Asian Journal Of Computer Science And Information Technology 3 : 6 (2013)88-93. Contents lists available at www.innovativejournal.in Asian Journal of Computer Science And Information Technology Journal

More information

Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach +

Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach + Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach + Abdullah Al-Hamdani, Gultekin Ozsoyoglu Electrical Engineering and Computer Science Dept, Case Western Reserve University,

More information

Novel Materialized View Selection in a Multidimensional Database

Novel Materialized View Selection in a Multidimensional Database Graphic Era University From the SelectedWorks of vijay singh Winter February 10, 2009 Novel Materialized View Selection in a Multidimensional Database vijay singh Available at: https://works.bepress.com/vijaysingh/5/

More information

Content Based Image Retrieval system with a combination of Rough Set and Support Vector Machine

Content Based Image Retrieval system with a combination of Rough Set and Support Vector Machine Shahabi Lotfabadi, M., Shiratuddin, M.F. and Wong, K.W. (2013) Content Based Image Retrieval system with a combination of rough set and support vector machine. In: 9th Annual International Joint Conferences

More information

Representing the Knowledge of a Mediator. Introduction

Representing the Knowledge of a Mediator. Introduction Query Processing in the SIMS Information Mediator Yigal Arens, Chun-Nan Hsu, and Craig A. Knoblock Information Sciences Institute and Department of Computer Science University of Southern California 4678

More information

The Markov Reformulation Theorem

The Markov Reformulation Theorem The Markov Reformulation Theorem Michael Kassoff and Michael Genesereth Logic Group, Department of Computer Science Stanford University {mkassoff, genesereth}@cs.stanford.edu Abstract In this paper, we

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

Reducing Redundancy in Characteristic Rule Discovery by Using IP-Techniques

Reducing Redundancy in Characteristic Rule Discovery by Using IP-Techniques Reducing Redundancy in Characteristic Rule Discovery by Using IP-Techniques Tom Brijs, Koen Vanhoof and Geert Wets Limburg University Centre, Faculty of Applied Economic Sciences, B-3590 Diepenbeek, Belgium

More information

Web page recommendation using a stochastic process model

Web page recommendation using a stochastic process model Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,

More information

Efficiently Mining Positive Correlation Rules

Efficiently Mining Positive Correlation Rules Applied Mathematics & Information Sciences An International Journal 2011 NSP 5 (2) (2011), 39S-44S Efficiently Mining Positive Correlation Rules Zhongmei Zhou Department of Computer Science & Engineering,

More information

Implementation of CHUD based on Association Matrix

Implementation of CHUD based on Association Matrix Implementation of CHUD based on Association Matrix Abhijit P. Ingale 1, Kailash Patidar 2, Megha Jain 3 1 apingale83@gmail.com, 2 kailashpatidar123@gmail.com, 3 06meghajain@gmail.com, Sri Satya Sai Institute

More information

second_language research_teaching sla vivian_cook language_department idl

second_language research_teaching sla vivian_cook language_department idl Using Implicit Relevance Feedback in a Web Search Assistant Maria Fasli and Udo Kruschwitz Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom fmfasli

More information

EXTRACTING RULE SCHEMAS FROM RULES, FOR AN INTELLIGENT LEARNING DATABASE SYSTEM

EXTRACTING RULE SCHEMAS FROM RULES, FOR AN INTELLIGENT LEARNING DATABASE SYSTEM EXTRACTING RULE SCHEMAS FROM RULES, FOR AN INTELLIGENT LEARNING DATABASE SYSTEM GEOFF SUTCLIFFE and XINDONG WU Department of Computer Science, James Cook University Townsville, Qld, 4811, Australia ABSTRACT

More information

Speed-up of Parallel Processing of Divisible Loads on k-dimensional Meshes and Tori

Speed-up of Parallel Processing of Divisible Loads on k-dimensional Meshes and Tori The Computer Journal, 46(6, c British Computer Society 2003; all rights reserved Speed-up of Parallel Processing of Divisible Loads on k-dimensional Meshes Tori KEQIN LI Department of Computer Science,

More information

Temporal Weighted Association Rule Mining for Classification

Temporal Weighted Association Rule Mining for Classification Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab M. Duwairi* and Khaleifah Al.jada'** * Department of Computer Information Systems, Jordan University of Science and Technology, Irbid 22110,

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

Detecting Logical Errors in SQL Queries

Detecting Logical Errors in SQL Queries Detecting Logical Errors in SQL Queries Stefan Brass Christian Goldberg Martin-Luther-Universität Halle-Wittenberg, Institut für Informatik, Von-Seckendorff-Platz 1, D-06099 Halle (Saale), Germany (brass

More information

RELEVANT RULE DERIVATION FOR SEMANTIC QUERY OPTIMIZATION

RELEVANT RULE DERIVATION FOR SEMANTIC QUERY OPTIMIZATION RELEVANT RULE DERIVATION FOR SEMANTIC QUERY OPTIMIZATION Junping Sun, Nittaya Kerdprasop, and Kittisak Kerdprasop School of Computer and Information Sciences, Nova Southeastern University Fort Lauderdale,

More information

Collaborative Framework for Testing Web Application Vulnerabilities Using STOWS

Collaborative Framework for Testing Web Application Vulnerabilities Using STOWS Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Mining Quantitative Association Rules on Overlapped Intervals

Mining Quantitative Association Rules on Overlapped Intervals Mining Quantitative Association Rules on Overlapped Intervals Qiang Tong 1,3, Baoping Yan 2, and Yuanchun Zhou 1,3 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China {tongqiang,

More information

UNCORRECTED PROOF ARTICLE IN PRESS. 1 Expert Systems with Applications. 2 Mining knowledge from object-oriented instances

UNCORRECTED PROOF ARTICLE IN PRESS. 1 Expert Systems with Applications. 2 Mining knowledge from object-oriented instances 1 Expert Systems with Applications Expert Systems with Applications xxx (26) xxx xxx wwwelseviercom/locate/eswa 2 Mining knowledge from object-oriented instances 3 Cheng-Ming Huang a, Tzung-Pei Hong b,

More information

Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials *

Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials * Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials * Galina Bogdanova, Tsvetanka Georgieva Abstract: Association rules mining is one kind of data mining techniques

More information

Reducing redundancy in characteristic rule discovery by using integer programming techniques

Reducing redundancy in characteristic rule discovery by using integer programming techniques Intelligent Data Analysis 4 (2000) 229 240 229 IOS Press Reducing redundancy in characteristic rule discovery by using integer programming techniques Tom Brijs, Koen Vanhoof and Geert Wets Department of

More information

Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical application of VOD

Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical application of VOD World Transactions on Engineering and Technology Education Vol.13, No.3, 2015 2015 WIETE Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical

More information

Efficient Re-construction of Document Versions Based on Adaptive Forward and Backward Change Deltas

Efficient Re-construction of Document Versions Based on Adaptive Forward and Backward Change Deltas Efficient Re-construction of Document Versions Based on Adaptive Forward and Backward Change Deltas Raymond K. Wong Nicole Lam School of Computer Science & Engineering, University of New South Wales, Sydney

More information

A Comparative study of CARM and BBT Algorithm for Generation of Association Rules

A Comparative study of CARM and BBT Algorithm for Generation of Association Rules A Comparative study of CARM and BBT Algorithm for Generation of Association Rules Rashmi V. Mane Research Student, Shivaji University, Kolhapur rvm_tech@unishivaji.ac.in V.R.Ghorpade Principal, D.Y.Patil

More information

DESIGN AND IMPLEMENTATION OF TOOL FOR CONVERTING A RELATIONAL DATABASE INTO AN XML DOCUMENT: A REVIEW

DESIGN AND IMPLEMENTATION OF TOOL FOR CONVERTING A RELATIONAL DATABASE INTO AN XML DOCUMENT: A REVIEW DESIGN AND IMPLEMENTATION OF TOOL FOR CONVERTING A RELATIONAL DATABASE INTO AN XML DOCUMENT: A REVIEW Sunayana Kohli Masters of Technology, Department of Computer Science, Manav Rachna College of Engineering,

More information

The Hibernate Framework Query Mechanisms Comparison

The Hibernate Framework Query Mechanisms Comparison The Hibernate Framework Query Mechanisms Comparison Tisinee Surapunt and Chartchai Doungsa-Ard Abstract The Hibernate Framework is an Object/Relational Mapping technique which can handle the data for applications

More information

Monotone Constraints in Frequent Tree Mining

Monotone Constraints in Frequent Tree Mining Monotone Constraints in Frequent Tree Mining Jeroen De Knijf Ad Feelders Abstract Recent studies show that using constraints that can be pushed into the mining process, substantially improves the performance

More information

Structure of Association Rule Classifiers: a Review

Structure of Association Rule Classifiers: a Review Structure of Association Rule Classifiers: a Review Koen Vanhoof Benoît Depaire Transportation Research Institute (IMOB), University Hasselt 3590 Diepenbeek, Belgium koen.vanhoof@uhasselt.be benoit.depaire@uhasselt.be

More information

GIR experiements with Forostar at GeoCLEF 2007

GIR experiements with Forostar at GeoCLEF 2007 GIR experiements with Forostar at GeoCLEF 2007 Simon Overell 1, João Magalhães 1 and Stefan Rüger 2,1 1 Multimedia & Information Systems Department of Computing, Imperial College London, SW7 2AZ, UK 2

More information

Log Linear Model for String Transformation Using Large Data Sets

Log Linear Model for String Transformation Using Large Data Sets Log Linear Model for String Transformation Using Large Data Sets Mr.G.Lenin 1, Ms.B.Vanitha 2, Mrs.C.K.Vijayalakshmi 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology,

More information

Materialized Data Mining Views *

Materialized Data Mining Views * Materialized Data Mining Views * Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland tel. +48 61

More information

USING SOFT COMPUTING TECHNIQUES TO INTEGRATE MULTIPLE KINDS OF ATTRIBUTES IN DATA MINING

USING SOFT COMPUTING TECHNIQUES TO INTEGRATE MULTIPLE KINDS OF ATTRIBUTES IN DATA MINING USING SOFT COMPUTING TECHNIQUES TO INTEGRATE MULTIPLE KINDS OF ATTRIBUTES IN DATA MINING SARAH COPPOCK AND LAWRENCE MAZLACK Computer Science, University of Cincinnati, Cincinnati, Ohio 45220 USA E-mail:

More information

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules Manju Department of Computer Engg. CDL Govt. Polytechnic Education Society Nathusari Chopta, Sirsa Abstract The discovery

More information

Mining High Order Decision Rules

Mining High Order Decision Rules Mining High Order Decision Rules Y.Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 e-mail: yyao@cs.uregina.ca Abstract. We introduce the notion of high

More information

Keyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan

Keyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan Keyword search in relational databases By SO Tsz Yan Amanda & HON Ka Lam Ethan 1 Introduction Ubiquitous relational databases Need to know SQL and database structure Hard to define an object 2 Query representation

More information

Learning Database Abstractions For Query Reformulation*

Learning Database Abstractions For Query Reformulation* From: AAAI Technical Report WS-93-02. Compilation copyright 1993, AAAI (www.aaai.org). All rights reserved. Learning Database Abstractions For Query Reformulation* Chun-Nan Hsu Department of Computer Science

More information

Aggregation and maintenance for database mining

Aggregation and maintenance for database mining Intelligent Data Analysis 3 (1999) 475±490 www.elsevier.com/locate/ida Aggregation and maintenance for database mining Shichao Zhang School of Computing, National University of Singapore, Lower Kent Ridge,

More information

A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases *

A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases * A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases * Shichao Zhang 1, Xindong Wu 2, Jilian Zhang 3, and Chengqi Zhang 1 1 Faculty of Information Technology, University of Technology

More information

The Effects of Dimensionality Curse in High Dimensional knn Search

The Effects of Dimensionality Curse in High Dimensional knn Search The Effects of Dimensionality Curse in High Dimensional knn Search Nikolaos Kouiroukidis, Georgios Evangelidis Department of Applied Informatics University of Macedonia Thessaloniki, Greece Email: {kouiruki,

More information

A Conflict-Based Confidence Measure for Associative Classification

A Conflict-Based Confidence Measure for Associative Classification A Conflict-Based Confidence Measure for Associative Classification Peerapon Vateekul and Mei-Ling Shyu Department of Electrical and Computer Engineering University of Miami Coral Gables, FL 33124, USA

More information

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE Saravanan.Suba Assistant Professor of Computer Science Kamarajar Government Art & Science College Surandai, TN, India-627859 Email:saravanansuba@rediffmail.com

More information

Applying Objective Interestingness Measures. in Data Mining Systems. Robert J. Hilderman and Howard J. Hamilton. Department of Computer Science

Applying Objective Interestingness Measures. in Data Mining Systems. Robert J. Hilderman and Howard J. Hamilton. Department of Computer Science Applying Objective Interestingness Measures in Data Mining Systems Robert J. Hilderman and Howard J. Hamilton Department of Computer Science University of Regina Regina, Saskatchewan, Canada SS 0A fhilder,hamiltong@cs.uregina.ca

More information

Research Article Apriori Association Rule Algorithms using VMware Environment

Research Article Apriori Association Rule Algorithms using VMware Environment Research Journal of Applied Sciences, Engineering and Technology 8(2): 16-166, 214 DOI:1.1926/rjaset.8.955 ISSN: 24-7459; e-issn: 24-7467 214 Maxwell Scientific Publication Corp. Submitted: January 2,

More information

AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR UTILITY MINING. Received April 2011; revised October 2011

AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR UTILITY MINING. Received April 2011; revised October 2011 International Journal of Innovative Computing, Information and Control ICIC International c 2012 ISSN 1349-4198 Volume 8, Number 7(B), July 2012 pp. 5165 5178 AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR

More information

A Model of Machine Learning Based on User Preference of Attributes

A Model of Machine Learning Based on User Preference of Attributes 1 A Model of Machine Learning Based on User Preference of Attributes Yiyu Yao 1, Yan Zhao 1, Jue Wang 2 and Suqing Han 2 1 Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada

More information

Enhanced Associative classification based on incremental mining Algorithm (E-ACIM)

Enhanced Associative classification based on incremental mining Algorithm (E-ACIM) www.ijcsi.org 124 Enhanced Associative classification based on incremental mining Algorithm (E-ACIM) Mustafa A. Al-Fayoumi College of Computer Engineering and Sciences, Salman bin Abdulaziz University

More information

IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL

IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL Lim Bee Huang 1, Vimala Balakrishnan 2, Ram Gopal Raj 3 1,2 Department of Information System, 3 Department

More information

High Utility Web Access Patterns Mining from Distributed Databases

High Utility Web Access Patterns Mining from Distributed Databases High Utility Web Access Patterns Mining from Distributed Databases Md.Azam Hosssain 1, Md.Mamunur Rashid 1, Byeong-Soo Jeong 1, Ho-Jin Choi 2 1 Database Lab, Department of Computer Engineering, Kyung Hee

More information

ASSOCIATION RULE MINING: MARKET BASKET ANALYSIS OF A GROCERY STORE

ASSOCIATION RULE MINING: MARKET BASKET ANALYSIS OF A GROCERY STORE ASSOCIATION RULE MINING: MARKET BASKET ANALYSIS OF A GROCERY STORE Mustapha Muhammad Abubakar Dept. of computer Science & Engineering, Sharda University,Greater Noida, UP, (India) ABSTRACT Apriori algorithm

More information

A Literature Review of Modern Association Rule Mining Techniques

A Literature Review of Modern Association Rule Mining Techniques A Literature Review of Modern Association Rule Mining Techniques Rupa Rajoriya, Prof. Kailash Patidar Computer Science & engineering SSSIST Sehore, India rprajoriya21@gmail.com Abstract:-Data mining is

More information

A Better Approach for Horizontal Aggregations in SQL Using Data Sets for Data Mining Analysis

A Better Approach for Horizontal Aggregations in SQL Using Data Sets for Data Mining Analysis Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 8, August 2013,

More information

An Information-Theoretic Approach to the Prepruning of Classification Rules

An Information-Theoretic Approach to the Prepruning of Classification Rules An Information-Theoretic Approach to the Prepruning of Classification Rules Max Bramer University of Portsmouth, Portsmouth, UK Abstract: Keywords: The automatic induction of classification rules from

More information

DATABASE ISSUES IN KNOWLEDGE DISCOVERY AND DATA MINING ABSTRACT INTRODUCTION

DATABASE ISSUES IN KNOWLEDGE DISCOVERY AND DATA MINING ABSTRACT INTRODUCTION DATABASE ISSUES IN KNOWLEDGE DISCOVERY AND DATA MINING Chris P. Rainsford Defence Science and Technology Organisation Information Technology Division DSTO C3 Research Centre Femhill Park, Canberra 2600

More information

Knowledge Discovery from Client-Server Databases

Knowledge Discovery from Client-Server Databases Knowledge Discovery from Client-Server Databases Nell Dewhurst and Simon Lavington Department of Computer Science, University of Essex, Wivenhoe Park, Colchester CO4 4SQ, UK neilqessex, ac.uk, lavingt

More information

RECOMMENDATION SYSTEM BASED ON ASSOCIATION RULES FOR DISTRIBUTED E-LEARNING MANAGEMENT SYSTEMS

RECOMMENDATION SYSTEM BASED ON ASSOCIATION RULES FOR DISTRIBUTED E-LEARNING MANAGEMENT SYSTEMS ACTA UNIVERSITATIS CIBINIENSIS TECHNICAL SERIES Vol. LXVII 2015 DOI: 10.1515/aucts-2015-0072 RECOMMENDATION SYSTEM BASED ON ASSOCIATION RULES FOR DISTRIBUTED E-LEARNING MANAGEMENT SYSTEMS MIHAI GABROVEANU

More information

Classification Using Unstructured Rules and Ant Colony Optimization

Classification Using Unstructured Rules and Ant Colony Optimization Classification Using Unstructured Rules and Ant Colony Optimization Negar Zakeri Nejad, Amir H. Bakhtiary, and Morteza Analoui Abstract In this paper a new method based on the algorithm is proposed to

More information

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set To Enhance Scalability of Item Transactions by Parallel and Partition using Dynamic Data Set Priyanka Soni, Research Scholar (CSE), MTRI, Bhopal, priyanka.soni379@gmail.com Dhirendra Kumar Jha, MTRI, Bhopal,

More information

Information Discovery, Extraction and Integration for the Hidden Web

Information Discovery, Extraction and Integration for the Hidden Web Information Discovery, Extraction and Integration for the Hidden Web Jiying Wang Department of Computer Science University of Science and Technology Clear Water Bay, Kowloon Hong Kong cswangjy@cs.ust.hk

More information

Effective Estimation of Modules Metrics in Software Defect Prediction

Effective Estimation of Modules Metrics in Software Defect Prediction Effective Estimation of Modules Metrics in Software Defect Prediction S.M. Fakhrahmad, A.Sami Abstract The prediction of software defects has recently attracted the attention of software quality researchers.

More information

Efficient SQL-Querying Method for Data Mining in Large Data Bases

Efficient SQL-Querying Method for Data Mining in Large Data Bases Efficient SQL-Querying Method for Data Mining in Large Data Bases Nguyen Hung Son Institute of Mathematics Warsaw University Banacha 2, 02095, Warsaw, Poland Abstract Data mining can be understood as a

More information